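A distributed data mining platform for big data: a case study applied to the Tax Secretariat of Rio Grande do Norte (original title: Uma plataforma distribuída de mineração de dados para big data: um estudo de caso aplicado à Secretaria de Tributação do Rio Grande do Norte)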

Bibliographic details
Main author: Santos, Diego Soares dos
Other authors: Xavier Júnior, João Carlos
Format: Dissertation
Language: Portuguese (pt_BR)
Published in: Brazil
Item URL: https://repositorio.ufrn.br/jspui/handle/123456789/27508
Description
Abstract: The volume of data stored and accessed daily is growing geometrically. About 2.5 billion gigabytes are generated every day, and 90% of the world’s data was produced in the last two years. Many terms have been used to describe this huge volume of structured and unstructured stored data; Big Data is one of them. For many researchers, Big Data is the phenomenon in which data is produced in various formats and stored by a large number of devices. Some efforts have been made to offer open source tools and frameworks that can handle and mine this huge amount of data. However, because the data is so diverse, choosing or developing tools to deal with it is a non-trivial problem, and few tools are able to extract knowledge from the data. Knowledge extraction is made harder still by specific characteristics of the data, such as product descriptions, which are free-form text entered without validation. For this reason, in certain problem domains it is necessary to apply data mining techniques to text attributes in order to extract standardized values. The main objective of this work is to propose a distributed data mining platform for the Tax Administration of Rio Grande do Norte that can extract knowledge in a variety of ways, taking into account the specific characteristics of electronic invoices (NFC-e).