{"id":1126835,"date":"2024-07-11T18:50:24","date_gmt":"2024-07-11T22:50:24","guid":{"rendered":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/uncategorized\/saturation-genome-editing-of-bap1-functionally-classifies-somatic-and-germline-variants-nature-com\/"},"modified":"2024-07-11T18:50:24","modified_gmt":"2024-07-11T22:50:24","slug":"saturation-genome-editing-of-bap1-functionally-classifies-somatic-and-germline-variants-nature-com","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/transhuman-news-blog\/genome\/saturation-genome-editing-of-bap1-functionally-classifies-somatic-and-germline-variants-nature-com\/","title":{"rendered":"Saturation genome editing of BAP1 functionally classifies somatic and germline variants &#8211; Nature.com"},"content":{"rendered":"<p><p>Optimized SGE approach improves experiment quality    <\/p>\n<p>    We developed a HAP1 DNA ligase 4 (LIG4)-knockout (KO)    line with genomic integration of a clonally derived Cas9    (HAP1-A5) and confirmed BAP1 essentiality in this line    (Figs. 1 and 2a), high Cas9 activity    (Fig. 2b and Extended Data    Fig. 1a) and robust    maintenance of haploidy (Extended Data Fig. 1b). We also optimized    plasmids and transfection protocols, increasing transfection    efficiency in HAP1 cells from <5% to >60% compared to    other12 SGE experiments    (Extended Data Fig. 1c and Methods). To screen    all coding exons of BAP1, we used five time points: day    (D)4, D7, D10, D14 and D21. Our optimized SGE protocol led to    increased editing by homology-directed repair (HDR), with ~1%    unedited reads (Fig. 2c and Supplementary    Table 1).  <\/p>\n<p>            a, Target regions of 245 bp were designed for            all coding exons of the canonical BAP1            transcript: ENST00000460680.6 (ref. 52). Target            regions were processed in separate experiments to            sequentially cover all regions. For each region,            LIG4-KO, Cas9-expressing HAP1-A5 cells were            transfected in triplicate with an sgRNA-expressing            plasmid and a corresponding variant library; homologous            recombination with this template library at the Cas9            lesion\/cut site results in the introduction of variants            into the genome, generating populations of edited            cells. This allows for assessment of variant function,            because only benign variants will rescue cell fitness            following CRISPRCas9-mediated disruption of            BAP1, an essential gene. Each region was edited            separately using two independent template librarysgRNA            pairs; each variant library (library A or library B)            contained saturating mutations (colored squares) and            library-specific synonymous PPEs (dark red line) to            prevent sgRNACas9-mediated recutting of incorporated            genomic tracts. dsDNA, double-stranded DNA. b,            Cells were cultured over time with pellets collected at            D4, D7, D10, D14 and D21. gDNA, genomic DNA. c,            Sequencing was used to assess the population dynamics            of genomic DNA libraries, generating counts for each            variant using the QUANTS pipeline. DESeq2 was used to            convert counts into an LFC of variant abundance over            time. LFCs were then median scaled and a single            functional score was computed through the aggregation            of library A and library B data. Functional scores were            categorized on the basis of a significance threshold            and assessed for accuracy against variants with known            pathogenic or benign classifications.          <\/p>\n<p>            a, A targeted CRISPRCas9 screen in HAP1-A5            cells confirmed BAP1 essentiality and permitted            selection of sgRNAs with favorable depletion kinetics            for use in SGE (Methods).            b, FACS analysis counts (green fluorescent            protein (GFP)-positive cells) demonstrated that the            HAP1-A5 clone has very high Cas9 activity (arrow),            measured at 48 and 72 h after transduction with a            GFP\/blue fluorescent protein (BFP) activity construct            (Methods:            Ploidy and FACS analysis), compared to the parental            Polyclonal (Cas9+ LIG4)            line. A total of 10,00020,000 cells were analyzed for            each line. Cell count percentages derived from            negative-control lines with no Cas9 showed expected low            levels of Cas9 activity (see Extended Data Fig.            1a and            Supplementary Fig. 1a for            representative FACS data). c, Editing using            pilot SGE conditions: a template library (496 variants)            coupled with sgRNA-A targeting exon 5 of BAP1            was transfected into the polyclonal (Cas9+            LIG4) line and cells were sampled at            D5 and D11 (time points previously10 used in            SGE). More than 10% of the counts were unedited (wild            type), which decreased to <1% when the clonal            (Cas9+ LIG4) cell line            (HAP1-A5) was edited using the same sgRNA and HDR            homology arms with optimized SGE conditions, including            a high-complexity template library (1,040 variants)            sampled over five time points. d, Count            abundance for variants that resulted in synonymous            changes or edited intronic regions did not change            significantly over a 21-day SGE screen (two-sided            MannWhitneyWilcoxon test; D4 versus D21 counts,            P=0.3; NS, not significant), whereas variants            resulting in stop-gained and frameshift consequences            were significantly depleted            (****P<2.21016;            n=8,707 synonymous and intronic variants;            n=5,628 frameshift and stop-gained variants;            mean z-score counts of three biological            replicates at each time point). Boxes show the            interquartile range, the horizontal lines show the            median z-score count and whiskers show the            maximum and minimum values that are not outliers.            e, Density plot showing functional scores            colored by Ensembl Variant Effect Predictor            (VEP)53            mutational consequence. Black tick marks represent            single variant values. f, Jitter plot showing            VEP mutational consequence categories versus functional            score. Data points that have FDR0.01 are            semitransparent and the median synonymous functional            score differs significantly from that for all other            categories except UTR (KruskalWallis test:            P<2.21016,            H=6,692.2; two-sided Dunns BH FDR            ****q<2.21016).          <\/p>\n<p>            Source data          <\/p>\n<p>    Of note, the canonical BAP1 transcript    (ENST00000460680.6) has 17 exons (Fig. 1a), and because    oligonucleotide synthesis lengths are limited, 22 SGE target    regions of 245 bp were designed to saturate all of the coding    sequence, with 20- to 90-bp exon-flanking sequences also    saturated (intron, 5 UTR, 3 UTR). For larger exons, partially    overlapping regions were designed. All HDR template libraries    were designed using VaLiAnT13. These libraries    contained two different synonymous protospacer adjacent motif    (PAM)\/protospacer protection edits (PPEs) that were refractory    to single guide RNA (sgRNA)Cas9 cutting, preventing cleavage    of incorporated tracts. Each SGE region was targeted in two    separate experiments; HDR template library A contained a PPE    for one sgRNA (A) and library B contained a different PPE for a    different sgRNA (B) within the same target region.    Transfections were performed in triplicate for both library A    and library B for all 22 regions, with samples collected at the    five time points mentioned above (Fig. 1b).  <\/p>\n<p>    In total, data from 598 genomic DNA time point-replicate    libraries progressed to data analysis (Fig. 1c), with an average    variant coverage of 535 generated on    the Illumina platform (Supplementary Table 2).  <\/p>\n<p>    We used cell fitness as a biological readout of BAP1    function, first rigorously reconfirming BAP1    essentiality (Extended Data Fig. 2ac) and SGE efficacy    (Extended Data Fig. 2d) in HAP1 cells. To    aid the selection of appropriate sgRNAs for experimentation, we    performed a targeted CRISPRCas9 screen with 193 sgRNAs tiled    across all 17 BAP1 exons (Fig. 2a). sgRNAs for SGE    were selected based principally on design parameters (as    previously described13), with depletion    kinetics also considered (Methods). We    deployed these sgRNAs and variant libraries across all 22    BAP1 target regions and confirmed editing (Extended Data    Fig. 3). As expected for an    essential gene amenable to SGE, scaled counts between D4 and    D21 for stop-gained and frameshift variants were reduced,    suggesting the depletion of cells with these variants, whereas    synonymous and intron variant counts remained unchanged (Fig.    2d). By combining    library A and library B, we calculated a single functional    score for each variant (Methods). This is    the apparent growth rate across D4, D7, D10, D14 and D21    computed by log-linear regression in DESeq2 (ref.    14) and represents    log2-transformed fold change (LFC) per unit time    (Methods). Stop-gained,    frameshift and splice donor\/acceptor variants exhibited    predominantly negative functional scores, whereas synonymous,    intron and UTR variants had a unimodal distribution centered at    0 (Fig. 2e). Missense variants    exhibited a continuum of functional scores with a negatively    skewed unimodal distribution (Fig. 2e).  <\/p>\n<p>    We next used functional scores and standard errors computed    using DESeq2 to accurately define variant effects. For each    variant tested, a z-score distribution of functional    score divided by standard error was used to calculate P    values using a two-tailed z-test (Methods). All    unique variants were collated and the false discovery rate    (FDR) was derived from the P value using the    BenjaminiHochberg (BH) procedure15 to correct for    multiple testing. The behavior of individual variants within    this spectrum was intriguing, with, for example, specific    synonymous alterations appearing disruptive and specific    stop-gained and frameshift alleles, particularly those in the    terminal exon, appearing nondisruptive. Codon deletions    (in-frame, sequentially deleted codons) also exhibited a    spectrum of scores with a bimodal distribution, which allowed    us to refine key residues\/domains within the BAP1 protein.    Individual variant functional scores relative to the FDR    threshold are shown in Fig. 2f. All mutational    consequence categories, except UTR variants, had a    significantly different median functional score from synonymous    variants as measured by Dunns nonparametric pairwise    multiple-comparisons procedure (q<0.0001;    Supplementary Table 3).  <\/p>\n<p>    Functional scores and FDR values were used to categorize    variants into functional classes, following the integration and    validation of data as described below. Variants with an    FDR0.01 were categorized as unchanged, those with an    FDR<0.01 and a negative functional score were categorized    as depleted and those with an FDR<0.01 and a positive    functional score were categorized as enriched. Data for    18,108 unique variants were collected after filtering steps    with variants categorized as follows: 11,912 unchanged, 5,665    depleted and 531 enriched (Supplementary Table 2). Unchanged    variants centered tightly around a zero functional score    (median=0.00; range=0.09) and enriched variants had    modestly increased scores (median=0.01; range=0.03), while    depleted variants exhibited a wider score distribution    (median=0.13; range=0.27; Fig. 3a). As above,    stop-gained variants were depleted consistently across all    exons, except exon 17, suggesting an escape of    nonsense-mediated decay. No enriched variants were observed for    stop-gained alleles (Fig. 3b). Functional scores    for missense variants were significantly different between    exons as measured by KruskalWallis rank sum test    (P<0.0001, H=1,093.3). Interestingly,    while missense variants were depleted in all exons, we noted    that proportionally more of these variants were depleted in    exons 19 and 1517, and that fewer missense variants were    depleted in exons 1014. Exons 19 and 1517 encode conserved    UCH and protein interaction motifs, respectively. Indeed, we    found a significant positive correlation between the depleted    missense functional classification and conservation as measured    by ortholog identity at each amino acid position in the protein    (Spearmans rank: rs=0.45,    P<0.0001). This relationship was also observed for    codon deletions (Spearmans rank: rs=0.44,    P<0.0001).  <\/p>\n<p>            a, Histogram showing all 18,108 unique variants            assayed, grouped into 75 intervals and colored            according to functional classification. Inset shows a            magnified section of functional score intervals with            500 variants. b, Composition of functional            classes by exon and mutational consequence (color key            as in a). c, EVE scores for functional            classes (n variants in class shown). Both            depleted and enriched classes have significantly            different median values from the unchanged class            (KurskalWallis, P<2.21016; two-sided Dunns BH            FDR, ****q<0.0001; depleted            q<2.21016 and enriched            q=3.4105), demonstrating that            depleted and enriched variants are less represented            over evolution compared to unchanged variants and are            therefore more likely to be disruptive. Boxes show the            interquartile range, horizontal lines show the median            EVE score, whiskers show maximum and minimum values            that are not outliers, and outliers are shown as            points. d, The bar chart shows the number of            variants in each class that are in gnomAD and not            ClinVar (n shown) divided by the number of            variants in each class assayed. Fewer depleted and            enriched variants than unchanged variants were observed            in gnomAD (two-sided chi-squared test:            2=49.1,            P<2.141011). e, Heat map            showing amino acid-level substitutions (A:stop)            created by nucleotide-level saturation across 730            codons (single nucleotide variants (SNVs) only),            colored by functional classification (SNV missense            changes with discordant functional classifications            between alternative codons were excluded;            n=158). Of note, codon deletion, alanine            scan and stop scan changes were designed to be            incorporated at each of the 720 nonsplit codons (of 730            total codons). Bar chart shows the percentage identity            calculated from Geneious alignment of the eight species            shown in Fig. 6d. Key protein            regions are shown (UCH, ubiquitin C-terminal hydrolase;            HBM, HCF1 binding motif; BRCA1, BRCA1 binding domain;            ASXL, additional sex combs like 1\/2\/3 interaction; YY1,            Ying Yang 1 binding domain; NLS, nuclear localization            signal). f,g,            AlphaFold54 BAP1            model with SGE-depleted codon deletions colored dark            blue (f). Depleted codon deletions accurately            delineate the UCH domain (purple) and protein            interaction region (cyan), as highlighted in g.            Depletion also occurs in uncharacterized regions,            including the -helix C terminal to the UCH domain,            proximal to the protein interaction region (arrow,            f).          <\/p>\n<p>            Source data          <\/p>\n<p>    Because Evolutionary model of Variant Effect    (EVE)16 scores can be    used as a measure of conservation for missense variants, we    compared this metric to our SGE results and found that depleted    and enriched variants were under more evolutionary constraint    (8,525 of 8,822 total unique missense variant assessed; Fig.    3c). Variants under    more evolutionary constraint are expected to be observed less    frequently in population-ascertained cohorts of healthy    controls from the gnomAD database, which was the case for both    depleted and enriched variants compared to unchanged variants    (chi-squared test; 2=49.1,    P<0.0001; Fig. 3d). We also observed    that the conserved N-terminal UCH domain of BAP1 showed greater    intolerance to missense changes and codon deletions compared to    the more central regions of the protein (Fig. 3e), in keeping with    its amino acid conservation. The conserved C-terminal protein    interaction motifs also demonstrated intolerance to change. Of    note, codon deletions precisely delineated critical domains    with high accuracy and highlight uncharacterized regions (Fig.    3f,g).  <\/p>\n<p>    Before making comparisons to clinical data, we examined the    reproducibility of the functional scores and functional    classifications for each variant by comparing LFCs from    separate genome editing experiments. Overall, 81% of variants    (14,624\/18,108) were separately examined using library A and    library B HDR templates, with close to linear LFC value    correlations (Pearsons R=0.95, P<0.0001;    Fig. 4a). When functional    classifications were computed using library A or library B LFCs    and FDRs, a 90% concordance of variant classification was    observed (13,106\/14,624; Fig. 4a). As LFCs and    functional classifications were found to be highly correlated,    to increase robustness, library A and library B LFCs for each    variant were combined into a single combined LFC and termed    the abovementioned functional score (Methods). As    expected, variant LFCs within PPE codons differed between    libraries (Extended Data Fig. 4a,b). Therefore,    variants in PPE codons examined by only library A or library B    were excluded from downstream analyses (n=140). As    above, our functional score was calculated as the apparent    growth rate over five time points, an analysis previously used    in SGE17. This approach    is appropriate for our data, as LFCs between later time points    were linearly related (Extended Data Fig. 4cf). The functional    scores, functional classifications (depleted, unchanged and    enriched) and downstream comparisons used throughout this study    were derived from these combined LFC values.  <\/p>\n<p>            a, Independent SGE libraries (A and B) were used            to edit most target regions with 13,106 of 14,624            variants showing a concordant functional classification            (dark blue) and 1,518 variants discordant between            libraries (light blue). Of note, the degrees of LFC for            each independent variant measurement were highly            concordant based on Pearsons correlation coefficient            (R) and two-tailed t-test            P<2.21016. b, ROC curve            for SGE functional score, with AUC value shown. Also            shown is the ideal threshold for maximum diagnostic            sensitivity and specificity (plotted as 1             specificity). Calculated using pROC (version            1.18.4)55 in R.            c, Top, a histogram showing the 18,108 unique            variants grouped within 75 intervals of functional            score, colored by ClinVar clinical significance.            Bottom, a magnified region highlights that            pathogenic\/likely pathogenic (dark blue) variants are            depleted. The arrow shows the x-axis position of            the ideal threshold. d, Top, functional            classification by ClinVar clinical significance (1*, 4            September 2023). Bottom, functional classification by            observation in ClinVar and gnomAD (n variants            shown). e, Depleted variants (n=5,665)            categorized into strongly depleted (lower 50%, dark            blue) and weakly depleted (upper 50%, light blue)            variants, either side of the median functional score            (0.1260642). f, More frameshift and stop-gained            variants and fewer missense variants were strongly            depleted compared to weakly depleted variants            (two-sided chi-squared test, 2 =            10,759, P<2.21016). g, Strongly            and weakly depleted missense variants have            significantly different EVE scores (two-sided            MannWhitneyWilcoxon test,            ****P<2.21016). Boxes show the            interquartile range, horizontal lines show the median            EVE score, whiskers show maximum and minimum values            that are not outliers, and outliers are shown as            points. h, Concordance of SGE functional            classification and orthogonal functional assays for VUS            in patients with cancer and developmental            disorders9,25. Color            indicates SGE classification and shape corresponds to            orthogonal assay classification. Control variants (from            a casecontrol study25) are            shown in green text. SGE variants that were strongly            depleted (dark blue) and not tolerated in orthogonal            assays (triangles) are completely concordant. P12A,            which was partially tolerated in an orthogonal assay,            was weakly depleted in SGE. All tolerated variants            (white squares) in assays were unchanged in SGE (gray),            except for E406V, which was enriched (red).          <\/p>\n<p>            Source data          <\/p>\n<p>    Full nucleotide and protein-level variant effect maps are    provided in Extended Data Figs. 5 and 6, respectively. The    full dataset with annotations and scores is also available for    download at <a href=\"https:\/\/github.com\/team113sanger\/Waters_BAP1_SGE\" rel=\"nofollow\">https:\/\/github.com\/team113sanger\/Waters_BAP1_SGE<\/a>    and as Supplementary Data 1 and 2. Variant scores and    classifications can also be searched on the BAP1 Viewer:    <a href=\"https:\/\/bap1-viewer.shinyapps.io\/bap1viewer\/\" rel=\"nofollow\">https:\/\/bap1-viewer.shinyapps.io\/bap1viewer\/<\/a>.  <\/p>\n<p>    To further examine functional scores, we first identified    variants with strong clinical\/functional data in support of    their classification, curating 851 benign (true negative) and    199 pathogenic (true positive) variants that had at least one    star (1*) in ClinVar (downloaded 4 September 2023). We used    the functional scores for these variants to generate a receiver    operating characteristic (ROC) curve, with the area under the    curve (AUC) computed (Fig. 4b). We found that our    functional score was highly accurate at classifying these    variants with a sensitivity of >99%, a specificity of    >98%, a classification error rate close to 0 (<0.002%)    and a precision-recall AUC of >0.999 (Supplementary Table    4). We also used our    data to explore the relationship between functional    score\/classification and reported clinical classifications and    found high concordance (Fig. 4c,d).  <\/p>\n<p>    Of note, many clinically used in silico classifiers, including    EVE16,    SIFT18 and    PolyPhen-219, use    protein-level information to predict function, whereas SGE    assesses function at the nucleotide level, capturing variant    effects on splicing, RNA folding, codon usage and other    non-protein-level processes. We observed that few synonymous    variants were depleted in our screen (Figs. 2e,f and 3b). Importantly,    synonymous variants that were classified as depleted had    significantly higher SpliceAI scores than unchanged synonymous    variants (P<0.0001, two-sided MannWhitneyWilcoxon    test; Extended Data Fig. 7a), suggesting    functional relevance. In the absence of functional or in silico    data, synonymous variants are routinely classified as    VUS20, suggesting that    these variants could be misclassified without SGE. Importantly,    we found that variants (missense, stop-gained and synonymous)    created by SGE through different nucleotide-level changes had    highly correlated LFCs, as expected (Pearsons R=0.91,    P<0.0001; Extended Data Fig. 7b). Missense changes    alone also showed a high correlation in LFCs between    alternative codons (Pearsons R=0.89,    P<0.0001). Of note, 8,822 unique nucleotide-level    changes in our screen resulted in 4,619 unique missense changes    at the protein level, of which 3,993 could be examined using    alternative codon generation, with 16.7% (667\/3,993) showing    different functional classifications. Thus, not all missense    changes have equal effects when encoded by alternative codons,    further highlighting the importance and richness of SGE    functional assessment at the nucleotide level.  <\/p>\n<p>    Because very few BAP1 missense variants have been    ascribed to be pathogenic or benign, a direct comparison of    sensitivity and specificity using an AUC summary metric between    in silico tools and SGE functional scores for missense variants    alone is not possible. However, when we compared experimental    data with in silico tools, we found that EVE, PolyPhen-2 and    CADD21 predicted SGE    classifications of non-splice region missense variants with    7779% accuracy (Extended Data Fig. 7c). With per-variant    examination, it is notable that EVE, PolyPhen-2 and CADD    classify proportionally more missense variants as pathogenic,    probably damaging and likely pathogenic, respectively,    suggesting that SGE may have a relatively higher specificity    (Extended Data Fig. 7dg).  <\/p>\n<p>    Next, we sought to quantify the evidence strength at which    predictions from our assay could be applied using the American    College of Medical Genetics and Genomics (ACMG) framework for    variant interpretation20. To this end, we    generated further truth sets of high-confidence pathogenic and    benign variants (Methods and    Supplementary Table 4) against which to    evaluate assay performance using the established framework from    Brnich et al.22. We aimed to    evaluate assay performance in predicting the impact of missense    variants, which are challenging to classify.  <\/p>\n<p>    We observed that 99.8% (2,419\/2,423) of variants in our    pathogenicity truth set exhibited the expected depletion in the    assay output, whereas 97.1% (134\/138) of variants in our    benignity truth set were unchanged or enriched in the assay    (Table 1). These observations    equate to likelihood ratios toward pathogenicity of 27.6 and    benignity of 470.6, which correspond to strong and very strong    evidence strengths, respectively22,23. Notably, when    using truth sets constructed using ClinVar-classified missense    variants only (1* review status), there was full concordance    with assay results; however, due to small sample numbers, these    truth sets yielded likelihood ratios toward pathogenicity and    benignity of 6.0 and 7.0, respectively, both equating to a    moderate strength of evidence. Further limiting truth sets by    restriction to ClinVar variants of 2* did not allow the    generation of evidence strengths due to the absence of    pathogenic variants.  <\/p>\n<p>    We were intrigued by the observation that some patients with    germline BAP1 variants have been reported as being    predisposed to tumors, whereas others have a neurodevelopmental    disorder. SGE allows us to test whether these variants have    different functional outcomes.  <\/p>\n<p>    To this end, we ranked the 5,665 depleted variants (we excluded    enriched variants) by categorizing them on either side of the    median, defining them as strongly depleted (n=2,833)    or weakly depleted (n=2,832) (Fig. 4e). We observed that    the proportions of mutational consequences seen in strongly and    weakly depleted categories were significantly different from    one another (chi-squared test; 2=10,759,    P<0.0001), with more missense and fewer stop-gained    and frameshift mutations weakly depleted (Fig. 4f). We also observed    that weakly depleted missense variants were less conserved    (P<0.0001, two-sided MannWhitneyWilcoxon test;    Fig. 4g). Strongly depleted    variants were also depleted at an earlier time point (D10) in    the screen compared to most weakly depleted variants (Extended    Data Fig. 7h). Taking these    findings together, it appears that a subset of missense    variants (n=426; strongly depleted) behave similarly    to stop-gained\/frameshift variants and a larger number of    missense variants (n=1,548; weakly depleted) have a    less extreme LFC and slower change in variant abundance.  <\/p>\n<p>    Sixteen BAP1 germline variants have been associated with    developmental disorders9,24. In our screen,    we assayed 15 of these 16 variants and found that 13 of 15 were    classified as depleted (Supplementary Note 1 and Extended Data    Fig. 7i). Functional    studies have previously been performed on variants associated    with development9 and    cancer25, with perfect    concordance observed between these orthogonal assays and SGE to    the degree that a putative hypomorphic allele can be    distinguished (Supplementary Note 1 and Fig.    4h).  <\/p>\n<p>    Next, we analyzed data from a comprehensive clinical analysis    of families with BAP1-tumor predisposition syndrome    (TPDS)26 (Supplementary    Note 1 and Supplementary    Table 5). Interestingly, we    found that carriers of depleted variants had a significantly    earlier age of onset than carriers of unchanged variants    (P<0.01, two-sided MannWhitneyWilcoxon test;    Extended Data Fig. 7j). However, we saw    no differences between strongly and weakly depleted    classifications for age of onset or cancer type (Supplementary    Note 1 and Extended Data    Fig. 7j,k). Moreover, while    there was a difference in molecular consequences (Fig.    4f), conservation (Fig.    4g) and effect sizes,    germline cancer-associated variants did not have different    functional score effect sizes compared to    development-associated variants (Extended Data Fig.    7i,k).  <\/p>\n<p>    Next, we used whole-exome sequencing data from 454,787    individuals in UK Biobank to explore the phenotypic    consequences of BAP1-disruptive    alleles27. We identified    57 SGE-depleted, 80 SGE-enriched and 754 SGE-unchanged variants    in the exomes of 297, 1,960 and 61,333 carriers, respectively    (Supplementary Table 6). We performed a    phenome-wide association study (PheWAS) analysis (Supplementary    Method 14), focusing on    depleted variants only. To evaluate the association of these    variants with overall cancer risk, we generated cancer-type    phenotypic variables and rare variant burden test masks    (variant sets) (Fig. 5a, Extended Data Fig.    8a and Supplementary    Table 7). We found that    SGE-depleted nonsynonymous variants were significantly    associated with all-site cancer predisposition (P=7.85     1003; n=82)    with this variant set\/mask composed of missense and    high-confidence protein-truncating variants (PTVs), which were    classified as depleted by SGE.  <\/p>\n<p>            a, PheWAS forest plot for all-site cancers using            SGE-depleted variants and controls; regression model            effect is shown by data points and  effect standard            error is shown by bars (Supplementary Table            7). Rare            variant burden test masks (and CADD, EVE and            REVEL56            predictors) are shown by color for BAP1 variants            in UK Biobank (n carriers shown in key).            Significance, according to the corrected P value            determined by generalized linear modeling            (Supplementary Method 14), is            indicated by a triangle (significant) or a circle (not            significant). SGE-depleted nonsynonymous variants            (yellow) showed a significant effect and are therefore            associated with increased cancer risk. SGE-depleted            high-confidence (HC) protein-truncating variants (PTVs;            orange) demonstrated a significant effect, as did HC            PTVs (red). b, UK Biobank SGE-depleted            nonsynonymous variant carriers (n=69) had a            significantly higher median blood concentration of            IGF-1 compared to noncarriers (n=398,505);            P<0.005 (P=0.0033,            two-sided MannWhitneyWilcoxon test). Violin plots are            colored by BAP1 variant status, boxes show the            interquartile range, horizontal lines show the median            IGF-1 blood concentration (nmol l1),            whiskers show maximum and minimum values that are not            outliers, and outliers are shown as points. c,            IGF1 mRNA expression levels in transcripts per            million (TPM) obtained from TCGA for 80 uveal melanoma            tumors28.            BAP1-mutant tumors (n=35) have higher            IGF1 expression than those with wild-type            BAP1 (n=45). Colors, outliers and box            description are as in b, except the horizontal            line is the median IGF1 expression in tumors.            P<0.001 (P=0.00029,            two-sided MannWhitneyWilcoxon test). d, The 80            samples from patients with uveal melanoma were ranked            by TCGA IGF1 expression level, with tumors with            the top 50% highest expression levels classified as            having high expression and the bottom 50% classified as            having low expression. Top, KaplanMeier estimates,            with deceased status (overall survival) shown by            vertical tick marks and the model for survival            probability based on the overall survival time (in            days) shown by lines colored to indicate IGF1            expression level. The P value was calculated            using the log-rank test and indicates a significant            difference between the overall survival probability for            tumors with high and low IGF1 expression from            patients in the cohort. Bottom, number at-risk table            shows a higher number of patients alive at each time            increment for patients whose tumor expressed low versus            high levels of IGF-1.          <\/p>\n<p>            Source data          <\/p>\n<p>    Beyond cancer, we also examined the association between UK    Biobank BAP1 variants and quantitative traits    (Supplementary Table 8). As a result, we    identified that circulating IGF-1 levels were significantly    increased in carriers of SGE-depleted nonsynonymous BAP1    variants compared to noncarriers (Fig. 5b;    P<0.005, two-sided MannWhitneyWilcoxon test).    Importantly, IGF-1 levels in carriers with and without a cancer    diagnosis did not differ, indicating that significantly    increased IGF-1 levels are specific to individuals with    SGE-depleted nonsynonymous BAP1 variants rather than a    cancer diagnosis, and suggests a possible mechanism of    BAP1-mediated pathogenicity (Supplementary Note 2).  <\/p>\n<p>    To further investigate the association of SGE-depleted alleles    with increased IGF-1 levels, we obtained The Cancer Genome    Atlas (TCGA) RNA-seq data28 for uveal    melanomas and found a strong association of loss-of-function    BAP1 alleles with IGF1 mRNA expression    (P<0.001, two-sided MannWhitneyWilcoxon test) and    poor prognosis by KaplanMeier estimate (P=0.004)    (Fig. 5c,d).  <\/p>\n<p>    As a further exemplar of the value of our BAP1 SGE data,    we identified a family whose proband presented at the age of 26    years with uveal melanoma. A review of the family history    revealed other BAP1-associated tumors segregating over    three generations (Fig. 6a,b), with sequencing    revealing a germline c.535C>T (R179W) variant in the    BAP1 gene. c.535C>T was depleted in our SGE    experiment, with a functional score of 0.122 and an FDR    of<0.01 (Fig. 6c). This variant had    been classified in the clinic as a VUS, but together with our    SGE data it has been reclassified as likely pathogenic (ACMG,    class IV), a result that will contribute to the clinical    management of this kindred. Of note, R179W falls in a highly    conserved region of BAP1, which includes the proton donor    residue at H169. At codon R179, the only SGE-tolerated    substitution is R179Q, with glutamine being the conserved    residue in the Drosophila melanogaster BAP1 ortholog    Calypso (Fig. 6d,e).  <\/p>\n<p>            a, Pedigree with a proband carrying a            c.535C>T variant (HGVSc,            ENST00000460680.6:c.535C>T; HGVSp,            ENSP00000417132.1:p.Arg179Trp; R179W) in exon 7 of            BAP1. The proband was a 33-year-old male            presenting with uveal melanoma (UM) at 26 years (arrow)            whose father, uncle and grandmother presented with            melanoma (ME), basal cell carcinoma (BCC) and renal            cell carcinoma (RCC), respectively. The probands            mother was not known to be a carrier and died of            metastatic (M) cancer, possibly cholangiocarcinoma            (CCA). The pedigree follows established nomenclature:            black, clinically confirmed disease (malignant tumor);            square, male; circle, female; diagonal line, deceased;            d., age at death; number, age at disease presentation.            An asterisk indicates the patient for whom samples are            shown in b. b, Pathology of the primary            cutaneous melanoma in the patient from a. Top,            micrograph showing hematoxylin and eosin (H&E)            staining. Bottom, micrograph showing BAP1            immunohistochemical staining; staining is absent in            tumor tissue (black arrow) but is present (purple            cells) in immune infiltrate (red arrow). Scale bars,            100m. Micrographs are representative of three            histological sections. c, Functional scores            across exon 7. Exonic\/intronic ranges within the target            region are shown, with points colored by VEP            consequence. Transparency based on FDR. Shape denotes            functional classification. The variant in a is            labeled. d, Multiple-sequence alignment of exon            7 created by global alignment of BAP1 orthologs from            eight species (gap open\/extension penalty=12\/3);            numbers are protein positions of human BAP1            (ENSP00000417132.1) and residues are colored by            identity (black, 100%; dark gray, 80100%; light gray,            6080%; white, <60%). R179 (and the highly conserved            H169 proton donor) is highlighted by a red arrow. Note            that the glutamine residue in Drosophila aligns            at human position R179, the only missense variant at            this position tolerated in SGE. e, Heat map (see            Extended Data Fig. 6 for the full            heat map) of amino acid substitutions for two key            positions, H169 and R179, colored by functional            classification. White space results from SNV saturation            not producing all amino acid substitutions. c.535C>T            produces R179W, which is depleted. R179R, a synonymous            change, is unchanged, other missense changes            (R179P\/L\/G\/A\/*) and R179 codon deletion are depleted            and only R179Q is tolerated. H169 in the catalytic core            is intolerant to all observed changes, except for a            synonymous change. Black circle, key synonymous            changes; white triangle, key missense changes.          <\/p>\n<p>            Source data          <\/p>\n<p>    Finally, to further explore the use of SGE data and identify    novel BAP1 variants, we queried tumor sequence data for    a cohort of 394,756 patient samples in the Foundation Medicine    database29 and found 12,172    (3.1%) unique BAP1-altered specimens harboring 13,283    BAP1 alterations, including all possible changes at    codon R146 (Extended Data Fig. 9ad). Because these    variants were derived from tumor-only sequencing, germline DNA    was obtained30 and sequenced    from a patient with breast\/cholangiocarcinoma whose sister was    diagnosed with renal carcinoma (Extended Data Fig.    9ad). Both patients    were confirmed to carry a germline R146K (c.437G>A) variant    identified as disruptive by SGE (Extended Data Fig.    9ae), providing    another example of how SGE data can help improve diagnostic    precision.  <\/p>\n<p><!-- Auto Generated --><\/p>\n<p>Read more:<br \/>\n<a target=\"_blank\" href=\"https:\/\/www.nature.com\/articles\/s41588-024-01799-3\" title=\"Saturation genome editing of BAP1 functionally classifies somatic and germline variants - Nature.com\" rel=\"noopener\">Saturation genome editing of BAP1 functionally classifies somatic and germline variants - Nature.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> Optimized SGE approach improves experiment quality We developed a HAP1 DNA ligase 4 (LIG4)-knockout (KO) line with genomic integration of a clonally derived Cas9 (HAP1-A5) and confirmed BAP1 essentiality in this line (Figs. 1 and 2a), high Cas9 activity (Fig. 2b and Extended Data Fig <a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/transhuman-news-blog\/genome\/saturation-genome-editing-of-bap1-functionally-classifies-somatic-and-germline-variants-nature-com\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25],"tags":[],"class_list":["post-1126835","post","type-post","status-publish","format-standard","hentry","category-genome"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1126835"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=1126835"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1126835\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=1126835"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?post=1126835"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=1126835"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}