Deep learning applied to the classification and evaluation of the behavior of SARS-CoV-2
Main author: | |
---|---|
Other authors: | |
Format: | Dissertation |
Language: | pt_BR |
Published in: | Universidade Federal do Rio Grande do Norte |
Subjects: | |
Item URL: | https://repositorio.ufrn.br/handle/123456789/50820 |
Abstract: | The new betacoronavirus, officially named SARS-CoV-2 (Severe Acute Respiratory
Syndrome Coronavirus 2), is the virus that causes COVID-19. A member of the
Coronaviridae family, SARS-CoV-2 is an enveloped, positive-sense, single-stranded RNA
virus whose genome contains nearly 30,000 base pairs (bp). RNA viruses
tend to undergo more modifications than DNA viruses. Thus, when a virus circulates widely in a population and causes many infections, the probability of its genome
undergoing modifications increases, which may alter some of its properties and make it
more transmissible and/or more lethal. In this context, this work
presents a machine-learning tool that uses a deep one-dimensional
(1D) convolutional neural network (CNN) to classify and compare viral
genomes of SARS-CoV-2. The inputs were complete genomic cDNA (complementary DNA) samples ranging from 26,342 to 31,029 bp in length.
Unlike most approaches presented in the literature, this tool, which
classifies viruses from the same family, achieves high values for the performance metrics and proves more reliable than the works discussed
in the state of the art. Subsequently, the architecture was used to verify the behavior and
evolution of the genomic sequences of the main variants of concern (Beta, Gamma and
Delta), relying on its high sensitivity, as reflected in the accuracy values obtained through binary
classification of these variants. For this experiment, genomic data were taken from GISAID (Global
Initiative on Sharing All Influenza Data), which also hosts epidemiological and clinical data on SARS-CoV-2. The Anderson-Darling, Jarque-Bera
and Kruskal-Wallis tests were performed on the global knowledge scores of each
group of variants in order to analyze their statistical behavior. The test results indicate that
the genomic sequences are not normally distributed and, in most experiments, do not follow the same distribution, indicating statistical and behavioral differences among these
variants. For some results, the tests show no comparisons in which the groups share the same probability over relatively shorter periods of time.
Set against the results obtained with the proposed architecture, this points to the possibility of
using the network for virus classification, as well as for tracking the behavior of SARS-CoV-2 variants
over time, owing to the high sensitivity of the network. |
---|
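
The abstract notes that the input genomes vary between 26,342 and 31,029 bp, while a 1D CNN normally expects fixed-size input. The sketch below shows one common way to reconcile the two: one-hot encoding each nucleotide and zero-padding to a fixed length. This is an illustration only. The encoding scheme, the `one_hot_encode` helper, and the handling of ambiguous bases are assumptions, not the preprocessing documented in the dissertation.

```python
# Hypothetical preprocessing sketch: one-hot encoding with zero-padding.
# The abstract does not state how the genomes were encoded; this is an assumption.

NUCLEOTIDES = "ACGT"   # one channel per base (assumed alphabet)
MAX_LEN = 31_029       # longest genome length reported in the abstract


def one_hot_encode(sequence, max_len=MAX_LEN):
    """Return a max_len x 4 matrix of floats for a cDNA sequence.

    Ambiguous bases (e.g. 'N') and padding positions are all-zero rows;
    sequences longer than max_len are truncated.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = [[0.0] * len(NUCLEOTIDES) for _ in range(max_len)]
    for pos, base in enumerate(sequence.upper()[:max_len]):
        channel = index.get(base)
        if channel is not None:
            encoded[pos][channel] = 1.0
    return encoded


# Small usage example with a toy sequence padded to 8 positions.
example = one_hot_encode("ACGTN", max_len=8)
```

A matrix of this shape (sequence length by 4 channels) is the kind of input a 1D convolutional layer can consume directly, with the convolution sliding along the sequence axis.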