Applying optimized hierarchical NCM classification to public purchases of products in Brazil

The use of free text to categorize any type of entity causes, in most cases, difficulties related to the identification of such entities. In the Electronic Fiscal Receipt (“Nota Fiscal Eletrônica”, NF-e), issued for all public purchases in Brazil, products are categorized within the Mercosul Common...

Bibliographic details
Main author: Alves Sobrinho, Pitágoras de Azevedo
Other authors: Xavier Júnior, João Carlos
Format: bachelorThesis
Language: English
Published in: Universidade Federal do Rio Grande do Norte
Subjects: Supervised classification; Machine learning; Hierarchical classification; Nota fiscal eletrônica; Product classification
Item URL: https://repositorio.ufrn.br/handle/123456789/48321
id ri-123456789-48321
record_format dspace
spelling ri-123456789-48321
2023-05-02T15:32:26Z
Applying optimized hierarchical NCM classification to public purchases of products in Brazil
Alves Sobrinho, Pitágoras de Azevedo
Xavier Júnior, João Carlos
http://lattes.cnpq.br/0435510237375618
http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4758203U5
Oliveira, Marcel Vinicius Medeiros
http://lattes.cnpq.br/1756952696097255
Santos, Ilueny Constâncio Chaves dos
http://lattes.cnpq.br/8930351118408164
Supervised classification
Machine learning
Hierarchical classification
Nota fiscal eletrônica
Product classification
2022-07-04T14:51:29Z
2022-07-04T14:51:29Z
2022-06-15
bachelorThesis
ALVES SOBRINHO, Pitágoras de Azevedo. Applying optimized hierarchical NCM classification to public purchases of products in Brazil. 2022. 19f. Trabalho de Conclusão de Curso (Residência em Tecnologia da Informação). Instituto Metrópole Digital, Universidade Federal do Rio Grande do Norte, Natal, 2022.
https://repositorio.ufrn.br/handle/123456789/48321
en
application/pdf
Universidade Federal do Rio Grande do Norte
Brasil
UFRN
Residência em Tecnologia da Informação
Instituto Metrópole Digital
institution Repositório Institucional
collection RI - UFRN
language English
topic Supervised classification
Machine learning
Hierarchical classification
Nota fiscal eletrônica
Product classification
description The use of free text to categorize any type of entity causes, in most cases, difficulties in identifying such entities. In the Electronic Fiscal Receipt (“Nota Fiscal Eletrônica”, NF-e), issued for all public purchases in Brazil, products are categorized within the Mercosul Common Nomenclature (NCM). This identifier is necessary to calculate taxes, but it is often filled in incorrectly, which makes it difficult to detect price irregularities and to monitor public expenditures. In this context, an automatic product categorization system was developed based on the textual descriptions present in the NF-e. It consists of a categorization tree that follows the NCM product hierarchy, using the Local Classifier per Parent Node pattern. Each node in the tree is trained to encode the textual descriptions as document embeddings and then to apply a supervised classification algorithm to decide the NCM code. Tree nodes are optimized by selecting both the classification algorithm and its parameters, evaluating the performance of various random configurations. In the results, the hierarchical classification achieved a higher F1 score than the flat classification experiments, and the error propagation problem was mitigated.
author2 Xavier Júnior, João Carlos
author_facet Xavier Júnior, João Carlos
Alves Sobrinho, Pitágoras de Azevedo
format bachelorThesis
author Alves Sobrinho, Pitágoras de Azevedo
title Applying optimized hierarchical NCM classification to public purchases of products in Brazil
publisher Universidade Federal do Rio Grande do Norte
publishDate 2022
url https://repositorio.ufrn.br/handle/123456789/48321
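
Illustrative sketch (not part of the catalogued work): the Local Classifier per Parent Node pattern named in the description trains one classifier at each parent node of the hierarchy and predicts by walking from the root down to a leaf. The Python below is a minimal sketch under stated assumptions, not the thesis implementation. A toy token-count scorer stands in for the document embeddings and supervised classifiers the description mentions, the random search over algorithms and parameters is omitted, the tree levels are assumed to be code prefixes of length 2, 4, 6 and 8, and the names `TokenCountClassifier`, `LCPNTree`, `TEXTS` and `CODES` are hypothetical. The 8-digit codes are illustrative placeholders, not verified NCM assignments.

```python
from collections import Counter, defaultdict

LEVELS = (2, 4, 6, 8)  # assumed prefix lengths that define the tree levels


def tokenize(text):
    return text.lower().split()


class TokenCountClassifier:
    """Toy stand-in for 'embedding + supervised classifier' at a single node."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        return self

    def predict(self, text):
        tokens = tokenize(text)

        def score(label):
            # Share of the label's training tokens that match the query tokens.
            counter = self.counts[label]
            total = sum(counter.values()) or 1
            return sum(counter[t] for t in tokens) / total

        # Sorting makes tie-breaking deterministic (smallest label wins).
        return max(sorted(self.counts), key=score)


class LCPNTree:
    """One local classifier per parent node; prediction walks root to leaf."""

    def fit(self, texts, codes):
        # For each parent prefix, collect (text, child prefix) training pairs.
        groups = defaultdict(lambda: ([], []))
        for text, code in zip(texts, codes):
            parent = ""  # the empty prefix is the root node
            for n in LEVELS:
                child = code[:n]
                groups[parent][0].append(text)
                groups[parent][1].append(child)
                parent = child
        self.nodes = {
            parent: TokenCountClassifier().fit(t, l)
            for parent, (t, l) in groups.items()
        }
        return self

    def predict(self, text):
        node = ""
        # Full 8-digit codes are leaves: they have no classifier, so the loop stops.
        while node in self.nodes:
            node = self.nodes[node].predict(text)
        return node


# Hypothetical training data: product descriptions with placeholder codes.
TEXTS = [
    "notebook computer 14 inch",
    "laptop computer core i5",
    "desktop computer tower",
    "laser printer monochrome",
    "ballpoint pen blue",
    "ballpoint pen black box",
    "graphite pencil hb",
]
CODES = [
    "84713012", "84713012", "84715010", "84433223",
    "96081000", "96081000", "96091000",
]

tree = LCPNTree().fit(TEXTS, CODES)
print(tree.predict("blue ballpoint pen"))  # 96081000
```

Because each node only chooses among its own children, a wrong decision near the root cannot be undone further down. This is the error propagation problem that the description reports as mitigated.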