Unusual low-quality ONT genomes due to extensive modifications
We sequenced 12 strains of Listeria monocytogenes using Illumina and ONT R9.4 flowcells (~200–990 Mbp, SUP model) (Fig. 1a, Supplementary Tables 1 and 2). The ONT reads were assembled into genomes, and the remaining sequencing errors were polished by Medaka and Homopolish (Supplementary Table 3, see Methods). The Illumina and ONT reads were also hybrid assembled for evaluation purposes (Supplementary Table 4). When compared with the Illumina/ONT hybrid assemblies (Fig. 1b), seven ONT-only genomes exhibited high quality (HQ), ranging from Q47 to Q60 (e.g., R19-2905 and R20-0088). However, five isolates (R20-0026, R20-0030, R20-0127, R20-0148, and R20-0150) showed unexpectedly low quality (LQ), varying from Q26 to Q32. The accuracy of these five LQ genomes did not improve after repeated ONT sequencing. Further investigation of the five LQ genomes revealed an excessive number of mismatch errors (1533–5670) compared with the seven HQ ones (0–40 mismatches) (Fig. 1c). Homopolymer errors (i.e., indels) were not the source of the inferior quality (7–306, Supplementary Table 5).
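To make the Q scores above concrete, the following minimal sketch converts an error count into a Phred-scaled quality, Q = -10 log10(errors / genome length). It is an illustration only, not the evaluation pipeline used in the study: the genome length of about 2.9 Mbp and the cap at Q60 for error-free assemblies are assumptions, and only mismatches (not indels) are counted here.

```python
import math

# Assumption: an approximate genome length, not a value from the text.
ASSUMED_GENOME_LENGTH = 2_900_000


def phred_q(errors: int, genome_length: int, cap: float = 60.0) -> float:
    """Phred-scaled quality, Q = -10 * log10(errors / genome_length).

    The cap (an assumption) avoids an infinite score when no errors are seen.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    if errors <= 0:
        return cap
    return min(cap, -10.0 * math.log10(errors / genome_length))


# Mismatch counts quoted in the text for one LQ and one HQ strain.
q_lq = phred_q(5670, ASSUMED_GENOME_LENGTH)  # R20-0026
q_hq = phred_q(40, ASSUMED_GENOME_LENGTH)    # R19-2905
print(f"R20-0026: Q{q_lq:.1f}  R19-2905: Q{q_hq:.1f}")
```

Under these assumptions, a few thousand mismatches in a genome of this size land in the high Q20s, while a few dozen land in the high Q40s, which matches the scale of the LQ and HQ groups.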
a Workflow of ONT-only and ONT/Illumina hybrid assembly; b Q scores; c number of mismatches (red: LQ, gray: HQ); d comparison of ONT and Illumina reads by IGV; e numbers of 5mC, 6mA, and mismatches between HQ/LQ strains (n=12, red: LQ, gray: HQ). Error bars represent the minimum and maximum values.
Manual inspection revealed that these mismatches were ONT basecalling errors that remained uncorrected after genome polishing (Fig. 1d and Supplementary Fig. 1). As mismatch errors in ONT are mainly due to epigenetic modifications, we computed the frequency of well-known methylation types in these isolates (see Methods and Supplementary Table 6). For 5-methylcytosine (5mC), the numbers of modified loci in the five LQ genomes (~240–340k) were not significantly higher than those in the HQ ones (210–345k, P = 0.89, Fig. 1e). Similarly, the numbers of N6-methyladenine (6mA) modifications showed no significant difference between the LQ and HQ groups (98–218k vs. 126–223k, P = 0.34). Because the numbers of mismatch errors in the LQ genomes were nevertheless significantly higher than those in the HQ ones (P = 0.005), we suspected that the ONT basecalling algorithms failed to distinguish novel modification types in the LQ isolates.
We removed the modifications in all microbial samples by whole-genome amplification (WGA) (Fig. 2a), which randomly amplifies the genome fragments without retaining any epigenetic modification (see Methods). The WGA-demodified samples were sequenced by ONT (R9.4), assembled into chromosomes, and compared with the Illumina/ONT hybrid genomes (Fig. 2a, Supplementary Tables 7 and 8). The five LQ genomes after WGA exhibited significantly higher quality than those without demodification (e.g., Q26 to Q53 in R20-0026) (Fig. 2b, Supplementary Table 9). In particular, the number of mismatch errors decreased significantly after demodification (e.g., 5670 to 16 in R20-0026) (Fig. 2c). Consequently, the unexpectedly low quality of the ONT genomes was due to excessive errors induced by modifications on which the basecalling model had not been trained. Demodification by WGA can produce high-quality ONT genomes without the need for Illumina short reads.
a Workflow of WGA-demodified ONT; b Q scores of the WGA-demodified and ONT-only genomes (gray: ONT, black: WGA ONT); c numbers of mismatches of the WGA-demodified and ONT-only genomes (gray: ONT, black: WGA ONT); d WGA and ONT-only genome quality with respect to sequencing depth (shading: minimum and maximum quality in five replicates, line: median quality); e numbers of active/available pores during WGA-demodified and ordinary ONT sequencing.
However, while WGA successfully erased these modifications, it increased the sequencing cost for two reasons. First, WGA required a higher sequencing depth (~100×) for assembling a complete genome than ordinary ONT sequencing (~30×) (Fig. 2d and Supplementary Figs. 2 and 3). This was due to the uneven amplification of WGA, which led to non-uniform sequencing depth and a fragmented assembly at moderate coverage. Second, the WGA-demodified samples may reduce the ONT yield. We observed that the number of available/active pores could sometimes decrease quickly (e.g., fewer than 100 pores after 12 h) (Fig. 2e), possibly owing to hyperbranched structures left unresolved after WGA [10]. Consequently, the cost of sequencing WGA-demodified samples using ONT is much higher than that of ordinary sequencing.
We developed a novel computational method (called Modpolish) for correcting these modification-mediated errors without WGA or prior knowledge of the modification systems. Modpolish identifies and corrects the modification-mediated errors by leveraging basecalling quality, basecalling consistency, and evolutionary conservation (Fig. 3a, see Methods). Briefly, because the ONT signals are disturbed by modifications, the basecalling quality at modified loci is substantially lower than that at modification-free loci (Supplementary Fig. 4). As such, the basecalled nucleotides are often inconsistent at the modified loci (Supplementary Fig. 5), yet these loci lie within conserved motifs (Supplementary Fig. 6). Using the degree of conservation measured across closely related genomes, Modpolish corrects only the modified loci with ultra-high conservation, which avoids falsely correcting strain variations and keeps specificity high.
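The three signals described above (basecalling quality, basecalling consistency, and conservation) can be combined into a simple per-locus decision rule. The sketch below is a hypothetical illustration of that idea only, not the Modpolish implementation: the `Locus` record, the function names, and all three thresholds are assumptions introduced here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Locus:
    draft_base: str           # base in the draft (Medaka-polished) assembly
    read_bases: str           # bases called by the reads covering this locus
    mean_read_quality: float  # mean basecalling quality at this locus
    homolog_bases: str        # bases at this position in closely related genomes


# Hypothetical thresholds, chosen for illustration only.
LOW_QUALITY = 15.0       # below this, basecalling quality counts as low
MAX_AGREEMENT = 0.8      # below this, read basecalls count as inconsistent
MIN_CONSERVATION = 0.99  # at or above this, the locus counts as ultra-conserved


def major_fraction(bases: str) -> Tuple[str, float]:
    """Return the most common base and its fraction among the given bases."""
    base, count = Counter(bases).most_common(1)[0]
    return base, count / len(bases)


def propose_base(locus: Locus) -> str:
    """Return the conserved base only when all three signals agree."""
    if not locus.read_bases or not locus.homolog_bases:
        return locus.draft_base
    _, agreement = major_fraction(locus.read_bases)
    conserved_base, conservation = major_fraction(locus.homolog_bases)
    suspicious = (locus.mean_read_quality < LOW_QUALITY
                  and agreement < MAX_AGREEMENT)
    if suspicious and conservation >= MIN_CONSERVATION:
        return conserved_base
    return locus.draft_base  # likely a true strain variant, or too uncertain


# Low quality, inconsistent reads, fully conserved homologs: corrected.
modified = Locus("A", "AAGGAGGA", 9.0, "G" * 20)
# High quality, consistent reads: kept, even though homologs disagree.
variant = Locus("A", "AAAAAAAA", 30.0, "G" * 20)
print(propose_base(modified), propose_base(variant))
```

Requiring all three conditions at once is what keeps a rule like this conservative: a confidently basecalled difference from related genomes is treated as a real strain variant and left untouched.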
a Workflow of Modpolish; b Q scores before and after Modpolish; c numbers of mismatches before and after Modpolish (gray: before Modpolish, black: after Modpolish); d the antiviral defending systems encoded by the 12 strains (gray: before Modpolish, black: after Modpolish); e the sequence motif of modification sites in the four mza-encoding strains; f the sequence motif of modification sites on the R20-0026 strain.
We assessed the accuracy of Modpolish by comparing the quality of the ONT-only genomes (polished by Medaka) with those further polished by Modpolish. The results indicated that Modpolish significantly improved the quality of all LQ genomes from Q27–34 to Q60 (Fig. 3b, Supplementary Table 10). The number of mismatches also decreased greatly (e.g., from 5670 to 67 in R20-0026) (Fig. 3c). The numbers of mismatches in some HQ genomes were also reduced by Modpolish. For instance, the mismatches in R19-2905 were reduced from 40 to 6. These results suggest that Modpolish made no false corrections on the HQ genomes (Supplementary Tables 11–13). A comparison of different basecaller versions and models (v4.0.14 vs. v6.3.4, HAC vs. SUP) indicated that these errors persist across versions and that Modpolish successfully removes most of them (Supplementary Fig. 7).
As modification systems are often involved in anti-phage defense (e.g., R-M, BREX, DISARM) [11,12,13], we investigated the defense systems possessed by the HQ and LQ strains (Fig. 3d, Supplementary Data 1). All the HQ genomes encode at least one R-M system (e.g., Type I, II, or III), which is missing in all LQ isolates. Instead, four LQ strains (i.e., R20-0030, R20-0127, R20-0148, and R20-0150) carry a novel methyltransferase-encoding mza defense system, which is absent in all HQ genomes (Supplementary Fig. 8). Analysis of the modification sites in the four mza-encoding LQ strains revealed the pentanucleotide motif GCAGC (Fig. 3e, Supplementary Fig. 6). In contrast, the modification loci in the LQ strain R20-0026 all centered on the motif GCTGG (Fig. 3f). Together, these results suggested that two lineage-specific modification systems extensively edited the five LQ genomes. Although their underlying mechanisms remain unclear, the editing at specific, highly conserved motifs within each lineage allowed cost-effective in silico correction of these errors by Modpolish.
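As an illustration of how candidate sites for a motif such as GCAGC or GCTGG could be located in an assembly, the sketch below scans both strands of a sequence for exact motif matches. It is a minimal example, not the motif analysis used in the study; the function names and the toy sequence are assumptions.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for every exact match.

    Minus-strand sites are reported where the reverse complement of the
    motif occurs on the given (plus) strand. Overlapping matches are kept.
    """
    seq = seq.upper()
    motif = motif.upper()
    patterns = {"+": motif}
    rc = reverse_complement(motif)
    if rc != motif:  # a palindromic motif would otherwise be counted twice
        patterns["-"] = rc
    hits = []
    for strand, pattern in patterns.items():
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Toy sequence (hypothetical) with one site on each strand.
toy = "TTGCAGCAAGCTGCTT"
print(find_motif(toy, "GCAGC"))
```

Comparing the positions returned by such a scan with the positions of residual mismatches is one simple way to check whether errors cluster at a motif.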
We then assessed the performance of Modpolish on public ONT datasets sequenced by R9.4 (SUP) and R10.4 flowcells (SUP, duplex/simplex modes). In the R9.4 dataset [14], we first compared the quality of seven bacterial genomes polished by Medaka and Modpolish (Fig. 4a, Supplementary Table 14). The quality of five genomes significantly improved from ~Q45 to Q60. As before, the improvement was mainly due to the reduction of mismatches (Fig. 4b). For instance, the number of mismatches decreased from 388 to 13 in the Staphylococcus genome after Modpolish. Across all genomes, the mismatch reduction rates ranged from 50% to 96%. Consequently, although these bacterial genomes are not extensively modified, Modpolish can further improve their quality after Medaka without false corrections.
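The mismatch reduction rate quoted above is the fraction of mismatches removed by polishing. A minimal sketch of the arithmetic, using the Staphylococcus counts from the text (the function name is an assumption):

```python
def mismatch_reduction(before: int, after: int) -> float:
    """Fraction of mismatches removed: (before - after) / before."""
    if before <= 0:
        raise ValueError("'before' must be a positive mismatch count")
    return (before - after) / before


# Staphylococcus genome: 388 mismatches after Medaka, 13 after Modpolish.
rate = mismatch_reduction(388, 13)
print(f"{rate:.1%}")
```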
Comparison of Medaka and Modpolish for a Q scores and b mismatches on the R9.4 dataset; comparison of Medaka and Modpolish for c Q scores and d mismatches on the R10.4 dataset.
In the R10.4 (duplex mode) dataset [3], we compared the genome qualities after polishing by Medaka and Modpolish (downsampled to ~60×) (Fig. 4c, Supplementary Table 15). In general, Modpolish made little or no improvement in the duplex dataset. For instance, the number of mismatches on the Bacillus genome was reduced only from 20 to 19 after Modpolish (Fig. 4d). The overall genome quality is already so high (Q60) that no differences can be seen. Modpolish also showed only marginal improvement on a recently published simplex dataset (R10.4, kit 14, Dorado v0.1.1) (Supplementary Fig. 9). Therefore, the quality of ONT R10.4 flowcells, in particular in duplex mode, is higher than that of R9.4 and requires nearly no further correction. On the other hand, Modpolish may be used to fill the accuracy gap between simplex and duplex modes when projects aim for higher throughput.