Modern Bioinformatics Solutions Used for Genetic Data Analysis
https://doi.org/10.35825/2587-5728-2023-7-4-366-383
Abstract
Effective countermeasures against biological threats, both natural and man-made, require means and methods for rapid and reliable identification of microorganisms and for comprehensive study of their basic biological properties. Over the past decade, the toolkit of domestic microbiologists has been supplemented by numerous methods for analyzing pathogen genomes, most of them based on nucleic acid sequencing. The purpose of this work is to inform the reader about the capabilities of the modern technical and methodological toolkit used for in-depth molecular genetic study of microorganisms, including the bioinformatics solutions used for genetic data analysis. The source base for this research is English-language scientific literature available via the Internet and bioinformatics software documentation. The research method is an analysis of scientific sources from the general to the specific. We considered the features of sequencing platforms, the main stages of genetic information analysis, and current bioinformatics utilities, their interaction, and their organization into a single workflow.
Results and discussion. The performance of modern genetic analyzers allows a bacterial genome to be completely sequenced within one day, including the time required for sample preparation. A key factor that largely determines the effectiveness of genetic analysis methods is the competent use of the appropriate bioinformatics utilities. The standard stages of primary genetic data analysis are quality assessment, data preprocessing, mapping to a reference genome or de novo genome assembly, genome annotation, typing and identification of significant genetic determinants (resistance to antibacterial drugs, pathogenicity factors, etc.), and phylogenetic analysis. For each stage, bioinformatics utilities have been developed that differ in the analysis algorithms they implement.
Conclusion. Open-source utilities that do not require access to remote resources for their operation are of greatest interest, given the specifics of the activities of NBC protection corps units.
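As an illustration of the first two stages listed in the abstract (quality assessment and data preprocessing), the following is a minimal sketch in Python that runs fully offline. It is not taken from the article and does not reproduce any particular utility: the function names, the embedded two-read example, the Phred+33 quality encoding, and the mean-quality threshold of 20 are all assumptions chosen for illustration.

```python
from statistics import mean


def parse_fastq(text):
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 is assumed)."""
    return [ord(ch) - offset for ch in qual]


def filter_reads(records, min_mean_q=20):
    """Keep reads whose mean Phred score is at least min_mean_q (illustrative threshold)."""
    return [
        (read_id, seq, qual)
        for read_id, seq, qual in records
        if mean(phred_scores(qual)) >= min_mean_q
    ]


# Hypothetical two-read input: 'I' encodes Phred 40, '#' encodes Phred 2.
EXAMPLE_FASTQ = """\
@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
########
"""

if __name__ == "__main__":
    records = list(parse_fastq(EXAMPLE_FASTQ))
    for read_id, _seq, qual in records:
        print(read_id, "mean quality:", mean(phred_scores(qual)))
    kept = filter_reads(records)
    print("reads kept:", [read_id for read_id, _seq, _qual in kept])
```

With the example input, only `read1` passes the filter. Real data would normally go through dedicated quality-control and trimming utilities; the sketch only shows the kind of per-read computation that such a stage performs.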
About the Authors
Ya. A. Kibirev
Russian Federation
Yaroslav A. Kibirev - Chief of the Department. Cand. Sci. (Biol.)
Oktyabrsky Avenue 119, Kirov 610000
A. V. Kuznetsovskiy
Russian Federation
Andrey V. Kuznetsovskiy - Deputy Chief of the Branch Office. Cand. Sci. (Biol.)
Oktyabrsky Avenue 119, Kirov 610000
S. G. Isupov
Russian Federation
Sergey G. Isupov - Deputy Chief of the Department. Cand. Sci. (Med.)
Oktyabrsky Avenue 119, Kirov 610000
I. V. Darmov
Russian Federation
Ilya V. Darmov - Leading Researcher. Dr. Sci. (Med.), Professor
Oktyabrsky Avenue 119, Kirov 610000
Review
For citations:
Kibirev Ya.A., Kuznetsovskiy A.V., Isupov S.G., Darmov I.V. Modern Bioinformatics Solutions Used for Genetic Data Analysis. Journal of NBC Protection Corps. 2023;7(4):366-383. (In Russ.) https://doi.org/10.35825/2587-5728-2023-7-4-366-383