Feature selection based on clustering algorithms for classification tasks (Seleção de atributos baseado em algoritmos de agrupamento para tarefas de classificação)

Bibliographic details
Main author: Dantas, Carine Azevedo
Other authors: Canuto, Anne Magaly De Paula
Format: Dissertation
Language: Portuguese
Published in: Brazil
Item address: https://repositorio.ufrn.br/jspui/handle/123456789/26092
Description
Abstract: As the data sets used in classification systems grow, selecting the most relevant attributes has become one of the main tasks of the pre-processing phase. Ideally, every attribute in a data set would be relevant, but this is not always the case. Selecting a subset of the most relevant attributes reduces the size of the data without degrading performance, and can even improve it, which leads to better results in data classification. Existing feature selection methods choose the best attributes for the data set as a whole, without considering the particularities of each instance. The proposed method, Unsupervised-based Feature Selection, selects the relevant attributes for each instance individually, using clustering algorithms to group instances according to their similarities. This work presents an experimental analysis of different clustering techniques applied to this new feature selection approach. The clustering algorithms k-Means, DBSCAN and Expectation-Maximization (EM) were used as selection methods. Analyses are performed to verify which of these clustering algorithms best fits the new approach. The contribution of this study is thus to present a new approach to attribute selection, in a Semidynamic and a Dynamic version, and to determine which clustering method yields the best selection and the most accurate classifiers.
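
The general idea in the abstract (group instances by clustering, then choose attributes per group rather than for the whole data set) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the dissertation: the function names, the toy data, the farthest-point initialisation of k-means, and above all the scoring rule, which simply ranks attributes by how far the cluster mean lies from the global mean in units of the global standard deviation.

```python
from statistics import mean, pstdev


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, k, iterations=20):
    """Plain Lloyd iterations with a deterministic farthest-point start
    (an illustrative choice, made so the example is reproducible)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its closest existing seed.
        farthest = max(points, key=lambda p: min(
            squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [mean(col) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def select_per_cluster(points, labels, k, n_keep):
    """For each cluster, keep the `n_keep` attributes whose cluster mean
    deviates most from the global mean (hypothetical scoring rule)."""
    columns = list(zip(*points))
    global_mean = [mean(col) for col in columns]
    global_std = [pstdev(col) or 1.0 for col in columns]  # guard constants
    selected = {}
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            selected[c] = list(range(n_keep))
            continue
        cluster_mean = [mean(col) for col in zip(*members)]
        scores = [abs(cm - gm) / gs for cm, gm, gs
                  in zip(cluster_mean, global_mean, global_std)]
        ranked = sorted(range(len(scores)),
                        key=lambda j: scores[j], reverse=True)
        selected[c] = sorted(ranked[:n_keep])
    return selected


def attributes_for(instance, centroids, selected):
    """Attribute subset for a new instance: that of its nearest cluster."""
    return selected[nearest(instance, centroids)]


# Toy data: three groups, each characterised by a different attribute;
# the fourth attribute is constant and carries no information.
data = [
    [10, 0, 0, 1], [9, 1, 0, 1], [10, 1, 1, 1],
    [0, 10, 1, 1], [1, 9, 0, 1], [0, 10, 0, 1],
    [1, 0, 10, 1], [0, 1, 9, 1], [0, 0, 10, 1],
]
centroids, labels = k_means(data, k=3)
selected = select_per_cluster(data, labels, k=3, n_keep=1)
print(selected)
print(attributes_for([8, 0, 1, 1], centroids, selected))
```

In this sketch a classifier would then be trained or applied on the attribute subset returned by `attributes_for` for each instance. Swapping `k_means` for DBSCAN or EM, the other two algorithms named in the abstract, would only change how `labels` is produced.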