{"id":1119318,"date":"2023-11-15T03:02:54","date_gmt":"2023-11-15T08:02:54","guid":{"rendered":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/uncategorized\/crispr-broad-combined-design-of-multi-targeting-grnas-and-nature-com\/"},"modified":"2023-11-15T03:02:54","modified_gmt":"2023-11-15T08:02:54","slug":"crispr-broad-combined-design-of-multi-targeting-grnas-and-nature-com","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/transhuman-news-blog\/human-genetics\/crispr-broad-combined-design-of-multi-targeting-grnas-and-nature-com\/","title":{"rendered":"CRISPR-broad: combined design of multi-targeting gRNAs and &#8230; &#8211; Nature.com"},"content":{"rendered":"<p><p>CRISPR-broad framework<\/p>\n<p>We developed a procedural pipeline for detecting gRNAs and implemented it in Python as a standalone application (Fig. 1a). To speed up gRNA selection, we employed multithreading and the big-data Python module Pandas. This allowed splitting millions of short sequences for mapping and processing large numbers of uncompressed alignments. The different steps and options of CRISPR-broad are implemented in seven different modules (in Python with the Pandas and PyRanges packages) to avoid re-performing computationally demanding steps. Multiple options for user input are available (Fig. 1b).<\/p>\n<p>Modules and features of CRISPR-broad. (a) Working scheme of the CRISPR-broad tool. Several steps in this pipeline are multithreaded. The input is a multi-FASTA genome file and each step can be executed individually. The indexing and mapping steps are time-limiting and can be performed separately. The output of this pipeline is a ranked list of gRNAs in text format. 
(b) The different modules in execution of CRISPR-broad, their features and applicability as well as the respective options for user input are shown. The different options for running the individual modules are described in detail at <a href=\"https:\/\/github.com\/AlagurajVeluchamy\/CRISPR-broad\" rel=\"nofollow\">https:\/\/github.com\/AlagurajVeluchamy\/CRISPR-broad<\/a>.<\/p>\n<p>Running CRISPR-broad on the C. elegans genome (target window size 50 kb), we obtained 5,734,064 candidate gRNAs with the Cas9 PAM pattern NGG at the 3&#8242;-end and flanked by 20 nt at the 5&#8242;-end. We allowed a range of mismatches from 0 to 3 to map to the C. elegans genome assembly Ce235 using the end-to-end all alignment option in bowtie2. The large pairwise alignment was parsed for indels and matches to calculate a ranking score. About 18% of these candidate gRNAs were mapped to multiple sites. We then filtered out entries that aligned to fewer than five genomic loci. Our analysis resulted in 27,858 gRNAs (at least five hits in the selected window) that could target 6,421 unique 50 kb regions (Supplementary Fig. 2a).<\/p>\n<p>Next, we scanned the human genome (target window size 500 kb) and filtered candidate gRNAs with a cutoff of 50% GC. This resulted in around 120 million gRNAs. We mapped these sequences with a range of mismatches from zero to three and a maximum of 10,000 hits. The multi-mapped positions were verified for PAM sequences at their 3&#8242;-end and pooled. We further processed candidate gRNAs that had at least five hits in the genome. This combined filtering resulted in 2,413,602 (0.6%) gRNAs that target 1,678,629 windows (Supplementary Fig. 2b). The targetable windows with at least five loci for a unique gRNA of the C. elegans and H. sapiens genomes were spread throughout the different chromosomes (Supplementary Fig. 2b). 
The aggregate gRNA score pattern distribution for both sample genomes showed that although the number of off-targets is high (negative score), a significant number of high-scoring regions in these genomes are available for gRNA targeting (Fig. 2a). Irrespective of genome size or sequence content, the aggregate score decreased with the number of off-targets, thereby validating the score-based selection of gRNAs (Fig. 2b,c).<\/p>\n<p>Aggregate gRNA score distribution for two model organisms. (a) The aggregate gRNA score ranging from &#8722;1 to +1 for two datasets is shown in a density plot. Aggregate gRNA score with up to 10k off-target (OT) settings in C. elegans (b) and H. sapiens (c).<\/p>\n<p>Inter-bin distance defines the gap between two target regions and hence illustrates the density of target windows. Analysis of this parameter between different gRNA candidates with or without off-targets revealed that gRNA distribution is not biased over different chromosomes (Fig. 3a). Finding potential gRNAs was further supported by increasing the window size and by selecting gRNAs that are multi-targeting (Fig. 3b).<\/p>\n<p>Distribution of gRNAs along the chromosomes of C. elegans and H. sapiens. (a) gRNA sequences clustered in small intervals are evident from this analysis of the distribution of gRNA hits. Inter-bin distances of multi-hit gRNA sequences with and without off-targets. The distances of gRNA hits are shown in bp (in equal bin size). Note the difference in the distribution of gRNAs with or without off-targets for C. elegans and H. sapiens due to the different repetitiveness of the two genomes. (b) Boxplot showing the relationship between size and number of target bins in the genome of C. elegans. 
Off-target hits represent the sum of gRNA hits that fall outside all the multiple target windows. W, window size; N, number of target windows.<\/p>\n<p>Typical unique sgRNA selection involves reducing off-target hits on multiple genomic regions and finding a unique target sequence. Tandem duplications in the genome are one cause of off-target effects. CRISPR-broad uses these duplication events in detecting gRNAs in bins (a large genomic region). Larger window sizes could reduce the potential off-target effect of gRNAs in our tool. This was evident from the number of on- and off-target hits (Fig. 3b).<\/p>\n<p>Each sgRNA has N total hits in the genome, T hits in the target window and O hits in the off-targets (region outside\/different from the 50\/500 kb target window). When analyzing the C. elegans and H. sapiens genomes, there was no correlation between N and O (Fig. 4a,b). The 50 kb and 500 kb windows showed a vast number of on-targets compared to off-targets, revealing a wide range of selectable regions. Indeed, on-target regions could be identified that showed a high number of gRNA loci with zero off-targets. This included a pericentromeric region of human chromosome 1, which has 272 gRNA loci with no apparent off-targets (Supplementary Fig. 3a). Similarly, in C. elegans, analysis with a window size of 10 kb revealed a region on the X chromosome (chrX:73517361kb) where at least 1000 loci could be found for one gRNA (Supplementary Fig. 3b). The candidate target regions identified in both C. elegans and H. sapiens were not limited to functionally annotated repetitive regions (e.g. telomeres, satellites) that could be directly targeted by classical gRNA design tools such as CHOPCHOP (Supplementary Fig. 3c,d).<\/p>\n<p>Relationship of on-target and off-target sites for each gRNA. 
Multi-hit alignment with a short-read aligner was performed for each gRNA. The numbers of hits within the selected window and in off-target windows were enumerated from the alignment. (a) Off-target distribution in comparison to the number of on-target hits in C. elegans (50 kb window). (b) Off-target distribution in comparison to the number of on-target hits in H. sapiens (500 kb window). (c) Off-targets predicted by Cas-OFFinder compared to the CRISPR-broad scoring system in C. elegans. (d) CRISPR-broad score for gRNAs in H. sapiens compared to off-targets predicted by Cas-OFFinder. The number of off-targets predicted for individual gRNAs is anticorrelated to our CRISPR-broad scoring system.<\/p>\n<p>Global comparison of the CRISPR-broad scores derived from analyzing the C. elegans and H. sapiens genomes to the results of an independent, state-of-the-art off-target scanning tool for individual gRNAs (Cas-OFFinder) indicated that the CRISPR-broad scores are higher for gRNAs that were identified to have a lower number of predicted off-targets (Fig. 4c,d). This supported the notion that our scoring method is relevant for selection of multi-targeting gRNAs.<\/p>\n<p>We calculated cumulative scores for the gRNAs matching to selected loci, including a penalty score in case off-targets were found. These scores range from &#8722;1 to +1. In both genomes analyzed, C. elegans and H. sapiens, we observed a bias towards the extreme values on both sides of the aggregate gRNA score, i.e. many gRNAs are either good candidates for multi-targeting with many hits and no off-targeting (aggregate gRNA score close to +1) or show many off-target hits and mismatches (aggregate gRNA score close to &#8722;1) (Fig. 2a). 
The very high negative aggregate gRNA scores observed are a reflection of repetitive elements such as Alu sequences, LINE-1 retrotransposons, MIR, and human endogenous retroviruses (HERVs), which represent 55% of the human genome, occurring in multiple copies27. Similarly, in the C. elegans genome, MITE sequence repeats might elevate the number of off-targets28. These off-targets are correlated to the aggregate gRNA score (Fig. 2b,c).<\/p>\n<p>sgRNA efficiency has been correlated with the GC content of the nucleotide sequence29. We explored whether the GC content feature impacted the number of available gRNAs (with a significant number of on-target hits and fewer off-target hits). The aggregate gRNA scores (gRNA scores of each window) varied widely across the GC contents of the sequences (Fig. 5). This indicated that CRISPR-broad scans a wide range of gRNAs that may have different levels of repetitive nucleotide sequences. The repetitive elements may be AT-rich and gRNA selection based on gRNA score is not limited by GC content.<\/p>\n<p>gRNA score correlation to GC composition of the 23-nucleotide gRNA sequence. (a, b) Sequence composition was calculated as dinucleotide frequencies. The gRNA score (range from &#8722;2 to +1) and the GC content are depicted in the density plot. Aggregate gRNA score and repetition of sequence (off-target) are independent of the sequence composition. Many candidate gRNAs with high aggregate gRNA scores that correspond to candidate target windows are available for varied GC content.<\/p>\n<p>To elucidate the effects of user-defined bin size and number of distinct gRNA combinations, we scanned the C. elegans genome with two window sizes of 1 kb and 200 kb and targeting window numbers of 3 and 10. 
As expected, the number of off-targets decreased with increasing target window sizes and the number of target regions (Fig. 3b). Our analysis showed that with different bin sizes and using multiple gRNAs, a wide range of regions can be selected for targeting with singular gRNAs.<\/p>\n<p>The dispersion of a gRNA within a bin depends on the number of hits, and this increases with the number of mismatches (0&#8211;3). Nevertheless, most hits for gRNAs were unique with no mismatches. This is revealed by sgRNA mismatch analysis of the whole genome of C. elegans and a random selection of 10,000 sgRNAs in H. sapiens (Fig. 6a and Supplementary Fig. 5a). Also, these mismatches were independent of the position within a bin (Supplementary Fig. 4). Further, the dispersion of individual gRNAs did not correlate with the aggregate gRNA score in either C. elegans or H. sapiens. In C. elegans most gRNAs with higher standard deviation from the mid position of the bin showed lower aggregate gRNA scores (Fig. 6b). Also, in H. sapiens, the standard deviation was not correlated to the gRNA score but was associated with a varied range of gRNA scores (Supplementary Fig. 5b). This difference is because the H. sapiens genome is large and has more multi-targetable regions compared to the C. elegans genome. In both cases, a substantial number of gRNAs of varied standard deviation and with no off-targets could be selected.<\/p>\n<p>Assessment of displacement of gRNAs within on-target windows. (a) CRISPR-broad was used to scan for potential gRNAs with different levels of mismatches, since earlier reports have shown that the efficiency of gRNAs is limited by the number of mismatches. Mismatch levels and number of on-target hits for gRNAs of individual 50 kb windows in C. elegans are shown. 
Mismatch levels are set in the range from 0 to 3. Many selectable gRNAs and their corresponding target windows are available even at a mismatch level of 0. (b) Hexbin plot showing the relationship between aggregate gRNA score and dispersion. Standard deviation (dispersion) was calculated from the position of the gRNA hits within a target window. The aggregate gRNA score ranges from negative to positive values. Higher values of standard deviation correspond to a wider distribution of gRNA hits within a target window. Standard deviation and gRNA score were calculated using 500 kb windows in H. sapiens.<\/p>\n<p>Using PyRanges, we created intervals of user-defined size that overlap with gRNA candidates containing the Cas9 PAM pattern (3&#8242;-NGG-5&#8242;). Since this step is computationally intensive, we have implemented options to narrow down the search with a minimum and maximum number of hits for a target window.<\/p>\n<p>Analysis of the annotation of regions of the C. elegans and H. sapiens genomes that can be targeted by multi-targeting gRNAs indicated that a broad range of features including genes and gene regulatory elements are available for selection. The range of annotated, targetable regions for each genome could be further significantly increased when combining gRNA searches for different genome-targeting systems that use different PAM sequences (Supplementary Fig. 6).<\/p>\n<p>To test CRISPR-broad, we resorted to a previously described method of painting genome regions by targeting dCas9 fused to green fluorescent protein (GFP). Singular gRNAs targeting more than 100 directly repeated sequences within telomeres or pericentromeres identified by classical gRNA design tools have enabled mapping of these functional chromosome elements in cellular context4,5,6. 
Using CRISPR-broad we identified a singular gRNA targeting a 317 kb region on human chromosome 19 at 19p13.2 with 86 hits (Fig. 7a). Human U2OS cells transfected with a plasmid expressing dCas9-3XGFP together with a plasmid expressing the identified sgRNA showed two or four dots of accumulated green fluorescence in the nucleus, in agreement with a 2n (G1- and S-phase) or 4n (G2-phase) chromosome content. In contrast and as described before4,5,6, dCas9-3XGFP in the absence of specific gRNA-mediated targeting displayed nucleolar background staining in the cell nucleus (Fig. 7b). The results indicated that CRISPR-broad can identify large genomic regions for efficient targeting of dCas9 apart from simple and obvious repetitive elements of the genome.<\/p>\n<p>Targeting of a broad region of the genome using a singular gRNA designed by CRISPR-broad. (a) Scheme depicting a 317 kb region on human chromosome 19 that can be targeted by a sgRNA at 86 locations. (b) Fluorescence imaging of U2OS cells transfected with a plasmid expressing dCas9-3XGFP together with a plasmid expressing the sgRNA targeting the region depicted in (a) (top) or the corresponding empty vector (bottom). Focal enrichment of GFP inside the nucleus is marked by arrows. Note that due to the different cell cycle stages two (2n chromosome content, G1- and S-phases) or four (4n chromosome content, G2-phase) labeled spots are expected. Scale bar represents 20 &#181;m. Details on the selection of the presented cells and images can be found in Supplementary Fig. 8.<\/p>\n<p>To assess the wider application and potential of CRISPR-broad, we compared the results of the test runs on the C. elegans and H. 
sapiens genomes using the single Cas9 PAM with annotated (epi-)genetic features using the ENCODE and modENCODE datasets. We found that the multi-targetable windows defined by CRISPR-broad overlap with the following features: transcription factor binding sites (ChIP-seq peak regions), histone modification regions (ChIP-seq peaks), annotated transposable elements in the genome and sites of DNA methylation (WGBS: methylated CpG sites). The fact that the fraction of each of these sites that could be targeted by multi-targeting gRNAs (number of features overlapping a gRNA window of 5 kb\/total number of features) is substantial (Supplementary Fig. 7) indicated that CRISPR-broad could be useful in various strategies of epigenome editing.<\/p>\n<p>CRISPR-broad was developed in Python and the source code is available at <a href=\"https:\/\/github.com\/AlagurajVeluchamy\/CRISPR-broad\" rel=\"nofollow\">https:\/\/github.com\/AlagurajVeluchamy\/CRISPR-broad<\/a>. CRISPR-broad runs in seven independent modules with multiple options for user input (Fig. 1b). The limiting steps are mapping the gRNAs to the genome and obtaining all hits. We tested the performance of the tool on a Linux workstation with 30&#8211;40 threads for genome sizes of 103 Mb (C. elegans) and 3.2 Gb (H. sapiens) (Table 1). With an increase in genome size and in the allowed number of mismatches, the run time increased. The gRNA sequences, aggregate gRNA scores, GC content, number of on- and off-target hits, optimal on-target window of pre-selected size, and co-ordinates of each hit are compiled and exported in a tab-delimited text file (Supplementary Table 2).<\/p>\n<p>Go here to see the original:<br \/>\n<a target=\"_blank\" href=\"https:\/\/www.nature.com\/articles\/s41598-023-46212-x\" title=\"CRISPR-broad: combined design of multi-targeting gRNAs and ... 
- Nature.com\" rel=\"noopener\">CRISPR-broad: combined design of multi-targeting gRNAs and ... - Nature.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> CRISPR-broad framework We developed a procedural pipeline for detecting gRNAs and implemented this in Python as a standalone application (Fig.1a).  <a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/transhuman-news-blog\/human-genetics\/crispr-broad-combined-design-of-multi-targeting-grnas-and-nature-com\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[27],"tags":[],"class_list":["post-1119318","post","type-post","status-publish","format-standard","hentry","category-human-genetics"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1119318"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=1119318"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1119318\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=1119318"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?
post=1119318"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=1119318"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}