CRISPR-broad framework
We developed a procedural pipeline for detecting gRNAs and implemented it in Python as a standalone application (Fig. 1a). To speed up gRNA selection, we used multithreading and the Python data-analysis library Pandas. This allowed us to split millions of short sequences for mapping and to process large numbers of uncompressed alignments. The steps and options of CRISPR-broad are implemented in seven separate modules (in Python with the Pandas and PyRanges packages), so that computationally demanding steps do not have to be re-run. Multiple options for user input are available (Fig. 1b).
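Splitting the pipeline into independent modules means that an expensive step (for example, indexing or mapping) only has to run once. A minimal sketch of that pattern, assuming each step writes a single output file and that an existing file marks the step as done; `run_step` and `produce` are hypothetical names, not part of CRISPR-broad:

```python
import os
import tempfile


def run_step(output_path, produce, force=False):
    """Run a pipeline step unless its output file already exists.

    produce(tmp_path) must write the step's result to tmp_path. The file is
    then atomically renamed to output_path, so an interrupted run never leaves
    a partial file that a later run would mistake for a finished step.
    Returns True if the step ran, False if it was skipped.
    """
    if os.path.exists(output_path) and not force:
        return False  # cached result from an earlier run
    out_dir = os.path.dirname(os.path.abspath(output_path))
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, suffix=".part")
    os.close(fd)
    try:
        produce(tmp_path)
        os.replace(tmp_path, output_path)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return True
```

Writing to a temporary file and renaming it is a design choice made here for robustness; the actual tool may decide differently when to skip a module.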
Modules and features of CRISPR-broad. (a) Working scheme of the CRISPR-broad tool. Several steps in this pipeline are multithreaded. The input is a multiFASTA genome file, and each step can be executed individually. The indexing and mapping steps are rate-limiting and can be performed separately. The output of this pipeline is a ranked list of gRNAs in text format. (b) The modules of CRISPR-broad, their features and applicability, and the respective options for user input. The options for running the individual modules are described in detail at https://github.com/AlagurajVeluchamy/CRISPR-broad.
Running CRISPR-broad on the C. elegans genome (target window size 50 kb), we obtained 5,734,064 candidate gRNAs consisting of 20 nt followed by the Cas9 PAM NGG at the 3′ end. We mapped these candidates to the C. elegans genome assembly Ce235, allowing 0 to 3 mismatches, using bowtie2 in end-to-end mode with the option to report all alignments. The resulting large pairwise alignment was parsed for indels and matches to calculate a ranking score. About 18% of the candidate gRNAs mapped to multiple sites. We then removed entries that aligned to fewer than five genomic loci. This analysis yielded 27,858 gRNAs (five hits in the selected window) that could target 6421 unique 50 kb regions (Supplementary Fig. 2a).
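The candidate definition above (20 nt immediately 5′ of an NGG PAM) can be scanned on both strands with a regular expression. A minimal sketch, assuming an uppercase or soft-masked sequence and ignoring bases other than A, C, G and T; `find_candidates` is a hypothetical helper, not the CRISPR-broad code:

```python
import re

# The lookahead makes finditer report overlapping sites (one per start position).
_SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def find_candidates(seq):
    """Yield (start, strand, protospacer, pam) for every 20-nt + NGG site.

    start is the 0-based forward-strand coordinate of the leftmost base of the
    23-nt site (protospacer plus PAM), whichever strand the site lies on.
    """
    seq = seq.upper()
    n = len(seq)
    for m in _SITE.finditer(seq):
        yield m.start(), "+", m.group(1), m.group(2)
    for m in _SITE.finditer(revcomp(seq)):
        # A site at rc[i:i+23] covers seq[n-i-23:n-i] on the forward strand.
        yield n - m.start() - 23, "-", m.group(1), m.group(2)


print(list(find_candidates("A" * 20 + "TGG")))
```

The printed list contains the single forward-strand site `(0, '+', 'AAAAAAAAAAAAAAAAAAAA', 'TGG')`.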
Next, we scanned the human genome (target window size 500 kb) and filtered candidate gRNAs with a GC cutoff of 50%. This resulted in around 120 million gRNAs. We mapped these sequences allowing zero to three mismatches and a maximum of 10,000 hits. The multi-mapped positions were checked for a PAM sequence at their 3′ end and pooled. Only candidate gRNAs with at least five hits in the genome were processed further. This combined filtering resulted in 2,413,602 (0.6%) gRNAs that target 1,678,629 windows (Supplementary Fig. 2b). For both the C. elegans and H. sapiens genomes, the targetable windows with at least five loci for a unique gRNA were spread across the chromosomes (Supplementary Fig. 2b). The distribution of aggregate gRNA scores for both genomes showed that, although many gRNAs have high off-target counts (negative scores), a substantial number of high-scoring regions are available for gRNA targeting (Fig. 2a). Irrespective of genome size or sequence content, the aggregate score decreased with the number of off-targets, validating the score-based selection of gRNAs (Fig. 2b,c).
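The three filters described above (GC cutoff, PAM check at each mapped position, minimum number of hits) can be combined in one pass over the alignment hits. A minimal sketch, assuming hits are already parsed into `(grna, chrom, start, strand)` tuples with 0-based forward-strand starts of the 20-nt protospacer, that the genome is held in a dict, and that "cutoff of 50% GC" means GC ≥ 0.5 on the protospacer; all function names are hypothetical:

```python
from collections import defaultdict


def gc_fraction(seq):
    """Fraction of G/C bases in an uppercase DNA string."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_pam(chrom_seq, start, strand, length=20):
    """Check for an NGG PAM next to a mapped protospacer.

    On '+' the PAM follows the protospacer. On '-' it lies just to the left
    on the forward strand, where it reads 'CCN'.
    """
    if strand == "+":
        pam = chrom_seq[start + length:start + length + 3]
        return len(pam) == 3 and pam[1:] == "GG"
    if start < 3:
        return False
    return chrom_seq[start - 3:start - 1] == "CC"


def filter_hits(hits, genome, min_gc=0.5, min_hits=5):
    """Keep gRNAs that pass the GC cutoff and have >= min_hits PAM-verified hits.

    Returns a dict mapping each retained gRNA to its list of (chrom, start, strand).
    """
    loci = defaultdict(list)
    for grna, chrom, start, strand in hits:
        if gc_fraction(grna) < min_gc:
            continue
        if has_pam(genome[chrom], start, strand, len(grna)):
            loci[grna].append((chrom, start, strand))
    return {g: locs for g, locs in loci.items() if len(locs) >= min_hits}
```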
Aggregate gRNA score distribution for two model organisms. (a) The aggregate gRNA score, ranging from −1 to +1, for two datasets is shown in a density plot. Aggregate gRNA score with up to 10k off-target (OT) settings in C. elegans (b) and H. sapiens (c).
The inter-bin distance is the gap between two adjacent target regions and therefore reflects the density of target windows. Analysis of this parameter for gRNA candidates with and without off-targets revealed that the gRNA distribution is not biased toward particular chromosomes (Fig. 3a). The identification of potential gRNAs was further facilitated by increasing the window size and by selecting multi-targeting gRNAs (Fig. 3b).
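Given the target windows as intervals, the inter-bin distance defined above is the gap between consecutive windows on the same chromosome. A minimal sketch, assuming half-open `(chrom, start, end)` windows; overlapping windows are given a gap of 0, which is an assumption of this sketch:

```python
from collections import defaultdict


def inter_bin_distances(windows):
    """Return {chrom: [gap, ...]} between consecutive target windows.

    windows: iterable of (chrom, start, end), half-open coordinates.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in windows:
        by_chrom[chrom].append((start, end))
    gaps = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        gaps[chrom] = [max(0, b_start - a_end)
                       for (_, a_end), (b_start, _) in zip(ivs, ivs[1:])]
    return gaps
```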
Distribution of gRNAs along the chromosomes of C. elegans and H. sapiens. (a) The distribution of gRNA hits shows gRNA sequences clustered in small intervals. Inter-bin distances of multi-hit gRNA sequences with and without off-targets are shown in bp (equal bin sizes). Note the difference in the distribution of gRNAs with or without off-targets between C. elegans and H. sapiens, due to the different repetitiveness of the two genomes. (b) Boxplot showing the relationship between the size and number of target bins in the genome of C. elegans. Off-target hits represent the sum of gRNA hits that fall outside all the multiple target windows. W, window size; N, number of target windows.
Typical unique sgRNA selection involves minimizing off-target hits across multiple genomic regions and finding a unique target sequence. Tandem duplications in the genome are one cause of off-target effects. CRISPR-broad instead exploits these duplication events to detect gRNAs within bins (large genomic regions). In our tool, larger window sizes can reduce the potential off-target effects of gRNAs, as is evident from the numbers of on- and off-target hits (Fig. 3b).
Each sgRNA has N total hits in the genome, T hits in the target window and O off-target hits (in regions outside the 50 kb or 500 kb target window), so that N = T + O. For both the C. elegans and H. sapiens genomes, there was no correlation between N and O (Fig. 4a,b). The 50 kb and 500 kb windows showed far more on-targets than off-targets, revealing a wide range of selectable regions. Indeed, we identified on-target regions with a high number of gRNA loci and zero off-targets. These included a pericentromeric region of human chromosome 1 with 272 gRNA loci and no apparent off-targets (Supplementary Fig. 3a). Similarly, in C. elegans, an analysis with a window size of 10 kb revealed a region on the X chromosome (chrX:7351–7361 kb) in which at least 1000 loci could be found for one gRNA (Supplementary Fig. 3b). The candidate target regions identified in both C. elegans and H. sapiens were not limited to functionally annotated repetitive regions (e.g. telomeres, satellites) that could be targeted directly by classical gRNA design tools such as CHOPCHOP (Supplementary Fig. 3c,d).
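For a single gRNA, T can be taken as the largest number of hits that fit in one window of the chosen size, and O as the remaining N − T hits. A minimal sketch, assuming windows are anchored at a hit position and that only one target window is used (CRISPR-broad can also combine several windows); `best_window` is a hypothetical helper:

```python
from bisect import bisect_left


def best_window(hits, window_size):
    """Find the window [start, start + window_size) holding the most hits.

    hits: list of (chrom, pos) for one gRNA.
    Returns (chrom, start, T, O, N) with N = T + O.
    """
    by_chrom = {}
    for chrom, pos in hits:
        by_chrom.setdefault(chrom, []).append(pos)
    best = (None, None, 0)
    for chrom, positions in by_chrom.items():
        positions.sort()
        for i, p in enumerate(positions):
            # Index of the first hit at or beyond the window end.
            j = bisect_left(positions, p + window_size, lo=i)
            if j - i > best[2]:
                best = (chrom, p, j - i)
    chrom, start, t = best
    n = len(hits)
    return chrom, start, t, n - t, n
```

Anchoring windows at hit positions is enough to find the maximum count, because any window can be shifted right until its left edge meets a hit without losing hits.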
Relationship of on-target and off-target sites for each gRNA. Multi-hit alignment with a short-read aligner was performed for each gRNA. The numbers of hits within the selected window and in off-target windows were enumerated from the alignment. (a) Off-target distribution in comparison to the number of on-target hits in C. elegans (50 kb window). (b) Off-target distribution in comparison to the number of on-target hits in H. sapiens (500 kb window). (c) Off-targets predicted by Cas-OFFinder compared to the CRISPR-broad score in C. elegans. (d) CRISPR-broad score for gRNAs in H. sapiens compared to off-targets predicted by Cas-OFFinder. The number of off-targets predicted for individual gRNAs is anticorrelated with the CRISPR-broad score.
A global comparison of the CRISPR-broad scores for the C. elegans and H. sapiens genomes with the results of an independent, state-of-the-art off-target scanning tool for individual gRNAs (Cas-OFFinder) showed that CRISPR-broad scores are higher for gRNAs with fewer predicted off-targets (Fig. 4c,d). This supports the notion that our scoring method is relevant for the selection of multi-targeting gRNAs.
We calculated cumulative scores for the gRNAs matching the selected loci, including a penalty when off-targets were found. These scores range from −1 to +1. In both genomes analyzed, C. elegans and H. sapiens, we observed a bias toward the extreme values of the aggregate gRNA score: many gRNAs are either good candidates for multi-targeting, with many hits and no off-targets (aggregate gRNA score close to +1), or show many off-target hits and mismatches (aggregate gRNA score close to −1) (Fig. 2a). The strongly negative aggregate gRNA scores reflect repetitive elements such as Alu sequences, LINE-1 retrotransposons, MIRs and human endogenous retroviruses (HERVs), which together represent 55% of the human genome and occur in multiple copies27. Similarly, MITE repeats in the C. elegans genome might increase the number of off-targets28. These off-targets correlate with the aggregate gRNA score (Fig. 2b,c).
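The exact CRISPR-broad scoring formula is not given in this section. To illustrate how a cumulative score with an off-target penalty can be bounded to [−1, +1], here is a hypothetical scheme (not the authors' formula): each hit is down-weighted by its mismatch count, and the score is the normalized difference between weighted on-target and off-target hits.

```python
def aggregate_score(on_mismatches, off_mismatches):
    """Hypothetical aggregate score in [-1, +1] (NOT the CRISPR-broad formula).

    on_mismatches / off_mismatches: mismatch counts (0-3) of each on-target
    and off-target hit. Each hit is weighted 1 / (1 + mismatches), so perfect
    matches count fully. The score is +1 with only on-target hits, -1 with
    only off-target hits, and 0.0 when there are no hits at all.
    """
    on = sum(1.0 / (1 + mm) for mm in on_mismatches)
    off = sum(1.0 / (1 + mm) for mm in off_mismatches)
    total = on + off
    return 0.0 if total == 0 else (on - off) / total


print(aggregate_score([0, 0, 0, 0, 0], []))  # five perfect on-target hits
```

The call prints `1.0`. Normalizing by the total weight keeps the score bounded regardless of how many hits a gRNA has, which mirrors the bias toward the extremes described above for gRNAs dominated by either on- or off-target hits.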
sgRNA efficiency has been correlated with the GC content of the nucleotide sequence29. We explored whether GC content affected the number of available gRNAs (those with a substantial number of on-target hits and few off-target hits). The aggregate gRNA scores (the gRNA score of each window) varied widely across the range of GC contents (Fig. 5). This indicates that CRISPR-broad scans a wide range of gRNAs that may differ in their level of repetitive sequence. The repetitive elements may be AT-rich, and gRNA selection based on the gRNA score is not limited by GC content.
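The sequence-composition features used here, GC content and dinucleotide frequencies of the 23-nt gRNA sequence, can be computed directly from the string. A minimal sketch; the function names are hypothetical, and overlapping dinucleotides are counted (a 23-nt sequence has 22):

```python
from collections import Counter
from itertools import product

DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]  # AA, AC, ..., TT


def gc_content(seq):
    """Fraction of G and C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_freqs(seq):
    """Relative frequency of each of the 16 dinucleotides (overlapping)."""
    seq = seq.upper()
    total = len(seq) - 1
    if total < 1:
        raise ValueError("sequence must be at least 2 nt long")
    counts = Counter(seq[i:i + 2] for i in range(total))
    return {d: counts[d] / total for d in DINUCS}
```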
gRNA score correlation with the GC composition of the 23-nucleotide gRNA sequence. (a, b) Sequence composition was calculated as dinucleotide frequencies. The gRNA score (range −2 to +1) and the GC content are depicted in the density plot. The aggregate gRNA score and sequence repetition (off-targets) are independent of sequence composition. Many candidate gRNAs with a high aggregate gRNA score, corresponding to candidate target windows, are available across a wide range of GC contents.
To elucidate the effects of the user-defined bin size and the number of distinct gRNA combinations, we scanned the C. elegans genome with window sizes of 1 kb and 200 kb and with 3 and 10 target windows. As expected, the number of off-targets decreased with increasing target window size and number of target regions (Fig. 3b). Our analysis showed that by varying the bin size and using multiple gRNAs, a wide range of regions can be selected for targeting with singular gRNAs.
The dispersion of a gRNA within a bin depends on the number of hits, which increases with the number of allowed mismatches (0–3). Nevertheless, most gRNA hits were unique and had no mismatches, as revealed by sgRNA mismatch analysis of the whole C. elegans genome and of a random selection of 10,000 sgRNAs in H. sapiens (Fig. 6a and Supplementary Fig. 5a). These mismatches were also independent of the position within a bin (Supplementary Fig. 4). Further, the dispersion of individual gRNAs did not correlate with the aggregate gRNA score in either C. elegans or H. sapiens. In C. elegans, most gRNAs with a higher standard deviation from the mid position of the bin showed lower aggregate gRNA scores (Fig. 6b). In H. sapiens, the standard deviation was not correlated with the gRNA score but was associated with a wide range of gRNA scores (Supplementary Fig. 5b). This difference arises because the H. sapiens genome is larger and has more multi-targetable regions than the C. elegans genome. In both cases, a substantial number of gRNAs with varied standard deviations and no off-targets could be selected.
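The text describes dispersion as the standard deviation of hit positions measured from the mid position of the bin. A minimal sketch of two readings of that definition, since the exact formula is an assumption here: the root-mean-square deviation from the bin midpoint, and the ordinary population standard deviation about the mean hit position. The two agree when the hits are centered in the bin.

```python
import math
import statistics


def spread_about_midpoint(positions, win_start, win_size):
    """Root-mean-square distance of hit positions from the bin midpoint."""
    mid = win_start + win_size / 2.0
    return math.sqrt(sum((p - mid) ** 2 for p in positions) / len(positions))


def spread_about_mean(positions):
    """Population standard deviation of hit positions."""
    return statistics.pstdev(positions)
```

Measuring from the midpoint also penalizes hits clustered at one edge of the window, which the plain standard deviation does not.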
Assessment of displacement of gRNAs within on-target windows. (a) CRISPR-broad was used to scan for potential gRNAs with different levels of mismatches, since earlier reports have shown that the efficiency of gRNAs is limited by the number of mismatches. Mismatch levels and numbers of on-target hits for gRNAs of individual 50 kb windows in C. elegans are shown. Mismatch levels are set in the range from 0 to 3. Many selectable gRNAs and their corresponding target windows are available even at a mismatch level of 0. (b) Hexbin plot showing the relationship between aggregate gRNA score and dispersion. Standard deviation (dispersion) was calculated from the positions of the gRNA hits within a target window. The aggregate gRNA score ranges from negative to positive values. Higher values of standard deviation correspond to a wider spread of gRNA hits within a target window. Standard deviation and gRNA score were calculated using 500 kb windows in H. sapiens.
Using PyRanges, we created intervals of user-defined size that overlap gRNA candidates containing the Cas9 PAM (5′-NGG-3′). Because this step is computationally intensive, we implemented options to narrow down the search by specifying a minimum and maximum number of hits per target window.
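The windowing step with minimum and maximum hit counts can be illustrated without PyRanges. A minimal standard-library sketch, assuming fixed, non-overlapping tiles of the user-defined size and assigning each hit to a tile by its 0-based start position; this does not reproduce the PyRanges calls used by the tool, and the function name is hypothetical:

```python
from collections import Counter


def select_windows(hits, window_size, min_hits=5, max_hits=None):
    """Tile the genome into [k*W, (k+1)*W) windows and keep those whose
    hit count lies within [min_hits, max_hits].

    hits: iterable of (chrom, pos). Returns a sorted list of
    (chrom, window_start, window_end, n_hits).
    """
    counts = Counter((chrom, pos // window_size) for chrom, pos in hits)
    selected = []
    for (chrom, k), n in sorted(counts.items()):
        if n >= min_hits and (max_hits is None or n <= max_hits):
            selected.append((chrom, k * window_size, (k + 1) * window_size, n))
    return selected
```

Fixed tiles are cheap to compute but can split a cluster of hits across a tile boundary. Sliding or hit-anchored windows avoid this at higher cost.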
Analysis of the annotation of regions of the C. elegans and H. sapiens genomes that can be targeted by multi-targeting gRNAs indicated that a broad range of features, including genes and gene regulatory elements, is available for selection. The range of annotated, targetable regions in each genome could be increased substantially by combining gRNA searches for genome-targeting systems that use different PAM sequences (Supplementary Fig. 6).
To test CRISPR-broad, we used a previously described method of painting genome regions by targeting dCas9 fused to green fluorescent protein (GFP). Singular gRNAs targeting more than 100 directly repeated sequences within telomeres or pericentromeres, identified by classical gRNA design tools, have enabled mapping of these functional chromosome elements in a cellular context4,5,6. Using CRISPR-broad, we identified a singular gRNA targeting a 317 kb region on human chromosome 19 at 19p13.2 with 86 hits (Fig. 7a). Human U2OS cells transfected with a plasmid expressing dCas9-3XGFP together with a plasmid expressing the identified sgRNA showed two or four dots of accumulated green fluorescence in the nucleus, in agreement with a 2n (G1 and S phase) or 4n (G2 phase) chromosome content. In contrast, and as described before4,5,6, dCas9-3XGFP in the absence of specific gRNA-mediated targeting displayed nucleolar background staining in the cell nucleus (Fig. 7b). These results indicate that CRISPR-broad can identify large genomic regions for efficient targeting of dCas9 beyond simple and obvious repetitive elements of the genome.
Targeting of a broad region of the genome using a singular gRNA designed by CRISPR-broad. (a) Scheme depicting a 317 kb region on human chromosome 19 that can be targeted by a sgRNA at 86 locations. (b) Fluorescence imaging of U2OS cells transfected with a plasmid expressing dCas9-3XGFP together with a plasmid expressing the sgRNA targeting the region depicted in (a) (top) or the corresponding empty vector (bottom). Focal enrichment of GFP inside the nucleus is marked by arrows. Note that, depending on the cell cycle stage, two (2n chromosome content, G1 and S phases) or four (4n chromosome content, G2 phase) labeled spots are expected. Scale bar represents 20 μm. Details on the selection of the presented cells and images can be found in Supplementary Fig. 8.
To assess the wider applicability of CRISPR-broad, we compared the results of the test runs on the C. elegans and H. sapiens genomes using the single Cas9 PAM with annotated (epi-)genetic features from the ENCODE and modENCODE datasets. The multi-targetable windows defined by CRISPR-broad overlapped with transcription factor binding sites (ChIP-seq peak regions), histone modification regions (ChIP-seq peaks), annotated transposable elements and sites of DNA methylation (WGBS: methylated CpG sites). The fraction of each of these feature types that could be targeted by multi-targeting gRNAs (number of features overlapping a 5 kb gRNA window divided by the total number of features) was substantial (Supplementary Fig. 7), indicating that CRISPR-broad could be useful in various epigenome-editing strategies.
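The overlap fraction defined above (features overlapping a gRNA window divided by all features) is an interval-overlap query. A minimal standard-library sketch, assuming half-open `(chrom, start, end)` intervals for both features and windows; windows are merged per chromosome so that each feature needs only one binary search. Function names are hypothetical:

```python
from bisect import bisect_right
from collections import defaultdict


def _merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return merged


def fraction_features_overlapped(features, windows):
    """Fraction of features that overlap at least one window."""
    by_chrom = defaultdict(list)
    for chrom, s, e in windows:
        by_chrom[chrom].append((s, e))
    index = {}
    for chrom, ivs in by_chrom.items():
        m = _merge(ivs)
        index[chrom] = ([iv[0] for iv in m], [iv[1] for iv in m])
    if not features:
        return 0.0
    overlapped = 0
    for chrom, s, e in features:
        starts, ends = index.get(chrom, ([], []))
        i = bisect_right(ends, s)  # first window ending after the feature start
        if i < len(starts) and starts[i] < e:
            overlapped += 1
    return overlapped / len(features)
```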
CRISPR-broad was developed in Python, and the source code is available at https://github.com/AlagurajVeluchamy/CRISPR-broad. CRISPR-broad runs as seven independent modules with multiple options for user input (Fig. 1b). The rate-limiting steps are mapping the gRNAs to the genome and obtaining all hits. We tested the performance of the tool on a Linux workstation using 30–40 threads for genome sizes of 103 Mb (C. elegans) and 3.2 Gb (H. sapiens) (Table 1). Run time increased with genome size and with the allowed number of mismatches. The gRNA sequences, aggregate gRNA scores, GC content, numbers of on- and off-target hits, the optimal on-target window of preselected size and the coordinates of each hit are compiled and exported as a tab-delimited text file (Supplementary Table 2).
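The exported fields listed above map naturally onto a ranked, tab-delimited table. A minimal sketch using Python's `csv` module; the column names and the `chrom:pos` encoding of hit coordinates are hypothetical choices made here, and the actual layout is the one shown in Supplementary Table 2 and the repository:

```python
import csv
import io

# Hypothetical column names; the tool's real header may differ.
FIELDS = ["grna", "aggregate_score", "gc_content", "on_target_hits",
          "off_target_hits", "best_window", "hit_coordinates"]


def results_to_tsv(rows):
    """Render result rows as TSV text, ranked by descending aggregate score.

    rows: dicts with the keys in FIELDS; 'hit_coordinates' is a list of
    (chrom, pos) tuples, written as comma-separated chrom:pos entries.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["aggregate_score"], reverse=True):
        out = dict(row)
        out["hit_coordinates"] = ",".join(f"{c}:{p}" for c, p in row["hit_coordinates"])
        writer.writerow(out)
    return buf.getvalue()
```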
Source: CRISPR-broad: combined design of multi-targeting gRNAs and ... - Nature.com