{"id":1119721,"date":"2023-11-30T20:35:05","date_gmt":"2023-12-01T01:35:05","guid":{"rendered":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/uncategorized\/correcting-modification-mediated-errors-in-nanopore-sequencing-by-nucleotide-demodification-and-reference-based-nature-com\/"},"modified":"2023-11-30T20:35:05","modified_gmt":"2023-12-01T01:35:05","slug":"correcting-modification-mediated-errors-in-nanopore-sequencing-by-nucleotide-demodification-and-reference-based-nature-com","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/transhuman-news-blog\/genome\/correcting-modification-mediated-errors-in-nanopore-sequencing-by-nucleotide-demodification-and-reference-based-nature-com\/","title":{"rendered":"Correcting modification-mediated errors in nanopore sequencing by nucleotide demodification and reference-based &#8230; &#8211; Nature.com"},"content":{"rendered":"<p><p>Unusual low-quality ONT genomes due to extensive modifications<\/p>\n<p>We sequenced 12 microbial strains of Listeria monocytogenes using Illumina and ONT R9.4 flowcells (~200–990 Mbp, SUP model) (Fig. 1a, Supplementary Tables 1 and 2). The ONT reads were assembled into genomes, and the remaining sequencing errors were polished with Medaka and Homopolish (Supplementary Table 3, see Methods). For evaluation purposes, the Illumina and ONT reads were also hybrid-assembled (Supplementary Table 4). When compared with the Illumina\/ONT hybrid assemblies (Fig. 1b), seven ONT-only genomes exhibited high quality (HQ), ranging from Q47 to Q60 (e.g., R19-2905 and R20-0088). However, five isolates (R20-0026, R20-0030, R20-0127, R20-0148, and R20-0150) showed unexpectedly low quality (LQ), ranging from Q26 to Q32. The accuracy of these five LQ genomes did not improve with replicate ONT sequencing. 
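<\/p>\n<p>The Q scores quoted here are Phred-scaled error rates, Q = -10 log10(errors / genome length). The Python sketch below is a minimal illustration, not the authors' evaluation code. It assumes errors are counted as mismatches (ignoring indels) against the hybrid assembly, uses an assumed genome length of about 3 Mb for L. monocytogenes, and applies a hypothetical Q60 ceiling when no errors are observed, since the source does not state how error-free genomes were scored.<\/p>

```python
import math


def phred_q(errors: int, genome_length: int, q_cap: float = 60.0) -> float:
    '''Phred-scaled consensus quality: Q = -10 * log10(errors / genome_length).

    q_cap is a hypothetical ceiling returned when no errors are observed,
    because log10(0) is undefined.
    '''
    if genome_length <= 0:
        raise ValueError('genome_length must be positive')
    if errors == 0:
        return q_cap
    return min(q_cap, -10 * math.log10(errors / genome_length))


# Assumed genome length of ~3 Mb; mismatch counts are taken from the text.
examples = [
    ('R20-0026 before WGA', 5670),
    ('R20-0026 after WGA', 16),
    ('R19-2905 (HQ)', 40),
]
for label, n_errors in examples:
    # prints roughly Q27.2, Q52.7 and Q48.8
    print(f'{label}: Q{phred_q(n_errors, 3_000_000):.1f}')
```

<p>Because the scale is logarithmic, cutting errors by a factor of ten adds 10 to Q, which is why a drop from thousands of mismatches to a few dozen moves a genome from roughly Q27 to roughly Q50.<\/p>\n<p>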
Further investigation of the five LQ genomes revealed excessive numbers of mismatch errors (1533–5670) compared with the seven HQ genomes (0–40 mismatches) (Fig. 1c). Homopolymer errors (i.e., indels) were not the source of the inferior quality (7–306, Supplementary Table 5).<\/p>\n<p>a Workflow of ONT-only and ONT\/Illumina hybrid assembly; b Q scores; c numbers of mismatches (red: LQ, gray: HQ); d comparison of ONT and Illumina reads in IGV; e numbers of 5mC, 6mA, and mismatches in HQ\/LQ strains (n = 12, red: LQ, gray: HQ). Error bars represent the minimum and maximum values.<\/p>\n<p>Manual inspection revealed that these mismatches were ONT basecalling errors that remained uncorrected after genome polishing (Fig. 1d and Supplementary Fig. 1). As mismatch errors in ONT data are mainly attributed to epigenetic modifications, we computed the frequencies of well-known methylation types in these isolates (see Methods and Supplementary Table 6). In terms of 5-methylcytosine (5mC), the numbers of modified loci in the five LQ genomes (~240–340k) were not significantly higher than those in the HQ genomes (210–345k, P = 0.89, Fig. 1e). Similarly, the numbers of N6-methyladenine (6mA) modifications showed no significant difference between the LQ and HQ groups (98–218k vs. 126–223k, P = 0.34). Because the numbers of mismatch errors in the LQ genomes were significantly higher than those in the HQ genomes (P = 0.005), we suspected that the ONT basecalling algorithms failed to handle novel modification types in the LQ isolates.<\/p>\n<p>We removed the modifications from all microbial samples by whole-genome amplification (WGA) (Fig. 2a), which randomly amplifies genome fragments without retaining any epigenetic modifications (see Methods). 
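<\/p>\n<p>The LQ-versus-HQ comparisons reported above (five LQ vs. seven HQ strains for the 5mC, 6mA, and mismatch counts) are small-sample tests whose exact procedure is given in the article's Methods. As a hedged illustration only, the sketch below runs an exact two-sided permutation test on the difference in group means, a swapped-in technique that is not necessarily the test the authors used. With 5 and 7 strains it enumerates all C(12, 5) = 792 relabellings. The values in the usage line are toy numbers, not the study's data.<\/p>

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    '''Two-sided exact permutation test on the difference in group means.

    Every way of relabelling len(group_a) of the pooled values as group A is
    enumerated, so this is only practical for small groups (e.g. 5 vs. 7).
    '''
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so that ties with the observed statistic are counted.
        if abs(mean(a) - mean(b)) >= observed - 1e-9:
            extreme += 1
    return extreme / total


# Toy values (not the study's data): perfectly separated groups of 5 and 7.
print(exact_permutation_p([1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11, 12]))
```

<p>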
The WGA-demodified samples were sequenced by ONT (R9.4), assembled into chromosomes, and compared with the Illumina\/ONT hybrid genomes (Fig. 2a, Supplementary Tables 7 and 8). After WGA, the five LQ genomes exhibited significantly higher quality than those without demodification (e.g., Q26 to Q53 in R20-0026) (Fig. 2b, Supplementary Table 9). In particular, the numbers of mismatch errors were significantly reduced after demodification (e.g., from 5670 to 16 in R20-0026) (Fig. 2c). Consequently, the unexpectedly low quality of these ONT genomes was due to excessive modification-induced errors that the basecalling model had not been trained to handle. Demodification by WGA can therefore produce high-quality ONT genomes without the need for Illumina short reads.<\/p>\n<p>a Workflow of WGA-demodified ONT sequencing; b Q scores of the WGA-demodified and ONT-only genomes (gray: ONT, black: WGA ONT); c numbers of mismatches of the WGA-demodified and ONT-only genomes (gray: ONT, black: WGA ONT); d WGA and ONT-only genome quality with respect to sequencing depth (shading: minimum and maximum quality in five replicates, line: median quality); e numbers of active\/available pores during WGA-demodified and ordinary ONT sequencing.<\/p>\n<p>However, while WGA successfully erased these modifications, it increased the sequencing cost for two reasons. First, WGA required a higher sequencing depth (~100×) to assemble a complete genome than ordinary ONT sequencing (~30×) (Fig. 2d and Supplementary Figs. 2 and 3). This was due to the uneven amplification of WGA, which led to non-uniform sequencing depth and a fragmented assembly at moderate coverage. Second, the WGA-demodified samples may reduce ONT yields. 
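<\/p>\n<p>The first cost factor, uneven WGA amplification, can be illustrated with a simple coverage model. In the hedged sketch below, per-base depth is modelled as Poisson for ordinary sequencing and as an overdispersed gamma-Poisson (negative binomial) for WGA libraries. Both distributions and the shape value of 2.0 are modelling assumptions, not measurements from the study. At the same mean depth, the overdispersed model leaves a much larger fraction of the genome below a minimum depth, which is one way to see why more total sequencing is needed before low-coverage gaps stop fragmenting the assembly.<\/p>

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    '''P(X <= k) for X ~ Poisson(mean); fine for moderate means (below ~700).'''
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def neg_binom_cdf(k: int, mean: float, shape: float) -> float:
    '''P(X <= k) for a gamma-Poisson (negative binomial) with the given mean.

    Smaller shape values give more uneven (overdispersed) coverage.
    '''
    p = shape / (shape + mean)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(i + shape) - math.lgamma(shape) - math.lgamma(i + 1)
                   + shape * math.log(p) + i * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def fraction_below(min_depth: int, mean_depth: float, shape=None) -> float:
    '''Expected fraction of genome positions covered by fewer than min_depth reads.'''
    if shape is None:
        return poisson_cdf(min_depth - 1, mean_depth)
    return neg_binom_cdf(min_depth - 1, mean_depth, shape)


# Hypothetical shape=2.0 stands in for uneven WGA amplification.
for depth in (30, 100):
    print(depth, fraction_below(10, depth), fraction_below(10, depth, shape=2.0))
```

<p>Under these assumptions, raising the mean depth from 30× to 100× shrinks the undercovered fraction of the overdispersed library by roughly an order of magnitude, while the Poisson library is already essentially fully covered at 30×.<\/p>\n<p>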
Regarding yield, we observed that the number of available\/active pores could sometimes decrease quickly (e.g., fewer than 100 pores after 12 h) (Fig. 2e), possibly owing to hyperbranched DNA structures left unresolved after WGA<sup>10<\/sup>. Consequently, the cost of sequencing WGA-demodified samples with ONT is much higher than that of ordinary sequencing.<\/p>\n<p>We developed a novel computational method, called Modpolish, for correcting these modification-mediated errors without WGA and without prior knowledge of the modification systems. Modpolish identifies and corrects modification-mediated errors by leveraging basecalling quality, basecalling consistency, and evolutionary conservation (Fig. 3a, see Methods). Briefly, because the ONT signals are disturbed by modifications, the basecalling quality at modified loci is substantially lower than at modification-free loci (Supplementary Fig. 4). As a result, the basecalled nucleotides are often inconsistent at the modified loci (Supplementary Fig. 5), yet these loci lie within conserved motifs (Supplementary Fig. 6). Combined with the degree of conservation measured across closely related genomes, only modified loci with ultra-high conservation are corrected by Modpolish, which avoids falsely correcting genuine strain variations and gives high specificity.<\/p>\n<p>a Workflow of Modpolish; b Q scores before and after Modpolish; c numbers of mismatches before and after Modpolish (gray: before Modpolish, black: after Modpolish); d the antiviral defense systems encoded by the 12 strains (gray: before Modpolish, black: after Modpolish); e the sequence motif of modification sites in the four mza-encoding strains; f the sequence motif of modification sites in the R20-0026 strain.
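<\/p>\n<p>The three signals that Modpolish combines can be illustrated with a toy per-locus decision rule. The sketch below is hypothetical and is not Modpolish's code. It assumes the read pileup and the aligned bases from closely related genomes have already been extracted for each draft position, and all three thresholds are placeholders rather than the tool's parameters (the article's Methods describe the actual procedure). A locus is corrected only when its reads show low basecalling quality, the basecalls disagree with one another, and the related genomes agree almost unanimously on a different base.<\/p>

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocusEvidence:
    draft_base: str            # base in the Medaka-polished draft genome
    read_bases: List[str]      # basecalls from reads covering this locus
    read_quals: List[int]      # Phred base qualities of those basecalls
    related_bases: List[str]   # bases at the aligned position in related genomes


def propose_correction(ev: LocusEvidence,
                       qual_below: float = 10.0,
                       consistency_below: float = 0.8,
                       conservation_min: float = 0.95) -> Optional[str]:
    '''Return a replacement base, or None to keep the draft base.

    All thresholds are hypothetical placeholders, not Modpolish parameters.
    '''
    if not ev.read_bases or not ev.read_quals or not ev.related_bases:
        return None
    mean_qual = sum(ev.read_quals) / len(ev.read_quals)
    consistency = Counter(ev.read_bases).most_common(1)[0][1] / len(ev.read_bases)
    ref_base, ref_count = Counter(ev.related_bases).most_common(1)[0]
    conservation = ref_count / len(ev.related_bases)

    # Modification-like signal: low quality AND inconsistent basecalls.
    suspicious = mean_qual < qual_below and consistency < consistency_below
    # Only correct towards a base that is (nearly) fixed across related genomes.
    if suspicious and conservation >= conservation_min and ref_base != ev.draft_base:
        return ref_base
    return None


# Toy loci: a noisy, low-quality call inside a conserved site vs. a clean variant.
noisy = LocusEvidence('A', ['A', 'G', 'A', 'G', 'G'], [5, 6, 4, 7, 5], ['G'] * 20)
variant = LocusEvidence('A', ['A'] * 10, [30] * 10, ['G'] * 20)
print(propose_correction(noisy), propose_correction(variant))
```

<p>Requiring all three signals at once keeps this toy rule conservative: a confident, consistent difference from the related genomes is left untouched as a likely strain variant.<\/p>\n<p>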
<\/p>\n<p>We assessed the accuracy of Modpolish by comparing the quality of the ONT-only genomes (polished by Medaka) with that of the same genomes further polished by Modpolish. Modpolish significantly improved the quality of all LQ genomes from Q27–34 to Q60 (Fig. 3b, Supplementary Table 10). The number of mismatches also decreased greatly (e.g., from 5670 to 67 in R20-0026) (Fig. 3c). Modpolish also reduced the numbers of mismatches in some HQ genomes; for instance, the mismatches in R19-2905 were reduced from 40 to 6. Moreover, Modpolish made no false corrections in the HQ genomes (Supplementary Tables 11–13). A comparison of different basecaller versions and models (v4.0.14 vs. v6.3.4, HAC vs. SUP) indicated that these errors persist and that Modpolish successfully removes most of them (Supplementary Fig. 7).<\/p>\n<p>As modification systems are often involved in anti-phage defense (e.g., R-M, BREX, DISARM)<sup>11,12,13<\/sup>, we investigated the defense systems carried by the HQ and LQ strains (Fig. 3d, Supplementary Data 1). All HQ genomes encode at least one R-M system (e.g., Type I, II, or III), which is missing from all LQ isolates. Instead, four LQ strains (i.e., R20-0030, R20-0127, R20-0148, and R20-0150) carry a novel methyltransferase-encoding mza defense system that is absent from all HQ genomes (Supplementary Fig. 8). Analysis of the modification sites of the four mza-encoding LQ strains revealed the pentanucleotide motif GCAGC (Fig. 3e, Supplementary Fig. 6). In contrast, the modification loci in the LQ strain R20-0026 were all centered on the motif GCTGG (Fig. 3f). Together, these results suggest that two lineage-specific modification systems extensively edited the five LQ genomes. 
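<\/p>\n<p>To check whether residual mismatches fall on a lineage's motif, one can first list every occurrence of that motif in a draft genome. The helper below is a minimal, hypothetical sketch that is not part of Modpolish. It reports 0-based start positions of a motif such as GCAGC or GCTGG on the given strand and of its reverse complement, i.e. occurrences on the opposite strand. Neither motif is palindromic (GCAGC pairs with GCTGC, GCTGG with CCAGC), so both strands must be scanned. The naive sliding-window scan is adequate for a bacterial genome of a few megabases.<\/p>

```python
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', 'N': 'N'}


def reverse_complement(seq: str) -> str:
    return ''.join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_motif_sites(genome: str, motif: str):
    '''Return (forward, reverse) lists of 0-based start positions.

    forward: positions where the motif itself occurs on the given strand.
    reverse: positions where its reverse complement occurs, i.e. the motif on
    the opposite strand. For a palindromic motif the two lists coincide.
    '''
    genome, motif = genome.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    forward, reverse = [], []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if window == motif:
            forward.append(i)
        if window == rc:
            reverse.append(i)
    return forward, reverse


print(reverse_complement('GCAGC'), reverse_complement('GCTGG'))  # GCTGC CCAGC
print(find_motif_sites('AAGCAGCTTGCTGCAA', 'GCAGC'))  # ([2], [9])
```

<p>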
Although the underlying mechanisms of these two systems remained unclear, the editing at specific motifs that are highly conserved within each lineage allowed cost-effective in silico correction of these errors by Modpolish.<\/p>\n<p>We then assessed the performance of Modpolish on public ONT datasets sequenced with R9.4 (SUP) and R10.4 flowcells (SUP, duplex\/simplex modes). In the R9.4 dataset<sup>14<\/sup>, we first compared the quality of seven bacterial genomes polished by Medaka and by Modpolish (Fig. 4a, Supplementary Table 14). The quality of five genomes significantly improved, from ~Q45 to Q60. As with the Listeria isolates, the improvement was mainly due to the reduction of mismatches (Fig. 4b). For instance, the number of mismatches in the Staphylococcus genome decreased from 388 to 13 after Modpolish. The mismatch reduction rates across all genomes ranged from 50% to 96%. Thus, although these bacterial genomes are not extensively modified, Modpolish can further improve their quality after Medaka without false corrections.<\/p>\n<p>Comparison of Medaka and Modpolish for a Q scores and b mismatches on the R9.4 dataset; comparison of Medaka and Modpolish for c Q scores and d mismatches on the R10.4 dataset.<\/p>\n<p>In the R10.4 (duplex mode) dataset<sup>3<\/sup>, we compared the genome quality after polishing by Medaka and by Modpolish (with reads downsampled to ~60×) (Fig. 4c, Supplementary Table 15). In general, Modpolish made little or no improvement on the duplex dataset. For instance, Modpolish reduced the mismatches in the Bacillus genome only from 20 to 19 (Fig. 4d). The overall genome quality is so high (Q60) that no differences can be seen. Modpolish also yielded only marginal improvement on a recently published simplex dataset (R10.4, kit 14, Dorado v0.1.1) (Supplementary Fig. 9). 
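<\/p>\n<p>Two pieces of arithmetic recur in these comparisons: the mismatch reduction rate, (before - after) / before, and downsampling reads to a target depth (here ~60×). The sketch below shows both under simple assumptions. The random-subsampling strategy is a generic illustration rather than the procedure used in the study, and the read lengths in the usage lines are made up. For example, the drop from 388 to 13 mismatches in the Staphylococcus genome corresponds to a reduction of about 96.6%, whereas the drop from 20 to 19 in the Bacillus duplex genome is only 5%.<\/p>

```python
import random
from typing import List, Sequence


def mismatch_reduction(before: int, after: int) -> float:
    '''Fraction of mismatches removed by an extra polishing step.'''
    if before == 0:
        return 0.0
    return (before - after) / before


def downsample_reads(read_lengths: Sequence[int], genome_size: int,
                     target_depth: float, seed: int = 0) -> List[int]:
    '''Randomly keep reads until their total length reaches
    target_depth * genome_size; returns the indices of the kept reads
    (all reads if the target cannot be reached).'''
    target_bases = target_depth * genome_size
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)
    kept, total = [], 0
    for idx in order:
        if total >= target_bases:
            break
        kept.append(idx)
        total += read_lengths[idx]
    return kept


print(f'{mismatch_reduction(388, 13):.1%}', f'{mismatch_reduction(20, 19):.1%}')
# Made-up example: 1,000 reads of 10 kb against a 3 Mb genome, downsampled to 2x.
kept = downsample_reads([10_000] * 1_000, 3_000_000, target_depth=2)
print(len(kept))  # 600
```

<p>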
Taken together, the quality of ONT R10.4 flowcells, particularly in duplex mode, is higher than that of R9.4 and requires nearly no further correction. On the other hand, Modpolish may be used to close the accuracy gap between the simplex and duplex modes when a project aims for higher throughput.<\/p>\n<p>View original post here:<br \/>\n<a target=\"_blank\" href=\"https:\/\/www.nature.com\/articles\/s42003-023-05605-4\" title=\"Correcting modification-mediated errors in nanopore sequencing by nucleotide demodification and reference-based ... - Nature.com\" rel=\"noopener\">Correcting modification-mediated errors in nanopore sequencing by nucleotide demodification and reference-based ... - Nature.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> Unusual low-quality ONT genomes due to extensive modifications We sequenced 12 microbial strains of Listeria monocytogenes using Illumina and ONT R9.4 flowcells (~200–990 Mbp, SUP model) (Fig. 1a, Supplementary Tables 1 and 2). The ONT reads were assembled into genomes, and the remaining sequencing errors were polished with Medaka and Homopolish (Supplementary Table 3, see Methods). For evaluation purposes, the Illumina and ONT reads were also hybrid-assembled (Supplementary Table 4).  
<a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/transhuman-news-blog\/genome\/correcting-modification-mediated-errors-in-nanopore-sequencing-by-nucleotide-demodification-and-reference-based-nature-com\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25],"tags":[],"class_list":["post-1119721","post","type-post","status-publish","format-standard","hentry","category-genome"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1119721"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=1119721"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1119721\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=1119721"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?post=1119721"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=1119721"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}