Contribuições aos Processos de Clustering com Base em Métricas não-Euclidianas (Contributions to Clustering Processes Based on Non-Euclidean Metrics)

Bibliographic details
Main author: Martins, Allan de Medeiros
Other authors: Dória Neto, Adrião Duarte
Format: Doctoral thesis
Language: Portuguese
Published in: Universidade Federal do Rio Grande do Norte
Online access: https://repositorio.ufrn.br/jspui/handle/123456789/15263
Description
Summary: In this work we present a new clustering method that groups the points of a data set into classes. The method is based on an algorithm that links auxiliary clusters obtained with traditional vector quantization techniques. Several approaches developed during the work are described, all based on measures of distance or dissimilarity (divergence) between the auxiliary clusters. The new method requires only two pieces of a priori information: the number of auxiliary clusters Na and a threshold distance dt used to decide whether or not to link auxiliary clusters. The number of classes can be found automatically by the method, based on the chosen threshold distance dt, or it can be given as additional information to help choose the correct threshold. Some analyses are made and the results are compared with traditional clustering methods. Different dissimilarity metrics are analyzed and a new one is proposed based on the concept of negentropy. Besides grouping the points of a set into classes, a method is proposed for statistically modeling the classes, with the aim of obtaining an expression for the probability that a point belongs to one of the classes. Experiments with several values of Na and dt are carried out on test sets, and the results are analyzed to study the robustness of the method and to suggest heuristics for choosing the correct threshold. Aspects of information theory applied to the calculation of the divergences are explored, specifically the different measures of information and divergence based on the Rényi entropy. The results obtained with the different metrics are compared and discussed. The work also includes an appendix presenting real applications of the proposed method.
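
The two-stage idea in the summary (quantize the data into Na auxiliary clusters, then link those whose mutual distance falls below dt) can be illustrated with a minimal sketch. This is not the thesis's implementation: plain k-means stands in for the vector quantization step, and the Euclidean distance between centroids stands in for the divergence measures the thesis studies. The function names (`kmeans`, `link_clusters`, `cluster`), the parameter names `n_aux` and `d_t`, and the toy data are all hypothetical.

```python
import math
import random


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points]


def kmeans(points, n_aux, n_iter=50, seed=0):
    """Plain Lloyd's k-means, an assumed stand-in for vector quantization."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_aux)
    for _ in range(n_iter):
        labels = assign(points, centroids)
        for k in range(n_aux):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assign(points, centroids)


def link_clusters(centroids, d_t):
    """Merge auxiliary clusters whose centroids are closer than d_t (union-find).

    Assumption: Euclidean centroid distance replaces the thesis's divergences.
    """
    parent = list(range(len(centroids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if math.dist(centroids[i], centroids[j]) < d_t:
                parent[find(i)] = find(j)
    roots = sorted({find(i) for i in range(len(centroids))})
    return [roots.index(find(i)) for i in range(len(centroids))]


def cluster(points, n_aux, d_t, seed=0):
    """Class label per point: auxiliary cluster first, then its linked group."""
    centroids, aux_labels = kmeans(points, n_aux, seed=seed)
    class_of_aux = link_clusters(centroids, d_t)
    return [class_of_aux[a] for a in aux_labels]


# Toy data (hypothetical): two well-separated 5x5 grids of points.
blob_a = [(0.1 * x, 0.1 * y) for x in range(5) for y in range(5)]
blob_b = [(10 + 0.1 * x, 10 + 0.1 * y) for x in range(5) for y in range(5)]
points = blob_a + blob_b
labels = cluster(points, n_aux=4, d_t=2.0)
print(len(set(labels)), "classes found")
```

In this sketch the number of classes is not supplied. It emerges as the number of connected groups of auxiliary clusters, so raising `d_t` merges more groups and lowering it splits them, which mirrors the role the summary assigns to the threshold.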