Deep learning applied to the classification and behavioral assessment of SARS-CoV-2

Bibliographic details
Main author: Azevedo, Karolayne Santos de
Other authors: Fernandes, Marcelo Augusto Costa
Format: Dissertation
Language: pt_BR
Published in: Universidade Federal do Rio Grande do Norte
Subjects:
Item URL: https://repositorio.ufrn.br/handle/123456789/50820
Description
Abstract: The new betacoronavirus, officially named SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2), is the virus that causes COVID-19. A member of the Coronaviridae family, SARS-CoV-2 is an enveloped, positive-sense, single-stranded RNA virus whose genome contains nearly 30,000 base pairs (bp). RNA viruses tend to undergo more modifications than DNA viruses. Thus, when a virus circulates widely in a population and causes many infections, the probability of its genome undergoing modifications increases, which may alter some of its properties and make it more transmissible and/or more lethal. In this context, this work presents a machine learning tool that uses a deep one-dimensional (1D) convolutional neural network (CNN) to classify and compare viral genomes of the new SARS-CoV-2. The inputs were complete genomic cDNA (complementary DNA) samples ranging from 26,342 to 31,029 bp in length. Unlike most approaches presented in the literature, this tool, which classifies viruses from the same family, achieves high values for the performance metrics and proves more reliable than the works discussed in the state of the art. Subsequently, the architecture was used to examine the behavior and evolution of the genomic sequences of the main variants of concern (beta, gamma and delta), relying on its high sensitivity as reflected in the accuracy values obtained through binary classification of these variants. For this experiment, genomic data were obtained from GISAID (Global Initiative on Sharing All Influenza Data), which also hosts epidemiological and clinical data on SARS-CoV-2. The Anderson-Darling, Jarque-Bera and Kruskal-Wallis tests were performed on the global knowledge scores of each group of variants in order to analyze their statistical behavior.
The test results indicate that the genomic sequences are not normally distributed and, in most experiments, do not follow the same distribution, indicating statistical and behavioral differences among these variants. In some tests, comparisons over relatively shorter periods of time do not show the same distribution across groups, which is consistent with the results obtained with the proposed architecture. This points to the possibility of using the tool for virus classification, as well as for tracking the behavior of SARS-CoV-2 variants over time, owing to the high sensitivity of the network.
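
As an illustration of two of the tests named in the abstract, the sketch below computes the Jarque-Bera statistic (a normality check based on skewness and kurtosis) and the Kruskal-Wallis H statistic (a rank-based check of whether several groups share one distribution) using only the Python standard library. It is not the dissertation's code: the function names and the per-variant scores are hypothetical, the H statistic is computed without tie correction, no p-values are derived, and the Anderson-Darling test is omitted.

```python
from itertools import chain


def jarque_bera(xs):
    """Jarque-Bera statistic; larger values suggest departure from normality."""
    n = len(xs)
    mean = sum(xs) / n
    # Central moments of order 2, 3 and 4 (population form, divided by n).
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for two or more groups."""
    pooled = sorted(chain.from_iterable(groups))
    n = len(pooled)
    # Assign each distinct value the average of the ranks it occupies.
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        i = j
    # Sum over groups of (rank sum squared) / (group size).
    total = sum(sum(ranks[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)


# Hypothetical scores per variant group (made-up numbers, not from the thesis).
beta = [0.91, 0.93, 0.92, 0.94]
gamma = [0.81, 0.83, 0.82, 0.84]
delta = [0.71, 0.73, 0.72, 0.74]

h = kruskal_wallis_h(beta, gamma, delta)
jb = jarque_bera([1, 2, 3, 4, 5])
# With 3 groups, H is compared against a chi-square distribution with
# 2 degrees of freedom; its 5% critical value is about 5.991.
groups_differ = h > 5.991
```

In practice a statistics package would normally be used for these tests, since it also provides p-values and tie handling; the plain-Python version is meant only to show what the statistics measure.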