Modelling clinical DNA fragmentation in the development of universal PCR-based assays for bisulfite-converted, formalin-fixed and cell-free DNA sample…

Posted: September 27, 2022 at 8:53 am

Modelling random DNA fragmentation

To begin our study, we required a model that would accurately reflect the properties of a stochastically fragmented DNA sample. The odds that a region targeted by a PCR assay will be interrupted by a DNA breakage in randomly fragmented DNA depend on the length of the region and the size of the fragments. These odds are effectively determined by establishing two adjacent fragment-sized sliding windows (wherein the end of one fragment is the start of another) and calculating the number of times a region is fully within the first fragment window, compared to the number of times the region is situated within both windows (Fig.1).

Diagram depicting example calculation of the proportion of intact copies of a target region (4bp) given a single specified fragment length (6bp). This calculation can be viewed as the probability that a region will not be cleaved at any point along its length if a genome were broken into equal length fragments. The fragment-sized Window 1 sliding across this region depicts all possible fragmentation states for this region. The intact proportion is calculated as the number of states where the region remains entirely within the fragment window over the total number of possible fragmentation states. Window 2 demonstrates that all possible states are represented at the point before the region fully exits Window 1, as these states are then repeated in this adjacent window.

This model is represented in Eq.(1), which determines the probability that a region of DNA will remain unbroken for a given fragment length:

$${text{proportion}};{text{intact}} = frac{{{text{f }}{-}{text{ r }} + { }1}}{{text{f}}},$$

(1)

where r is the length of the DNA region and f is the length at which the DNA is fragmented. However, DNA samples do not fragment at a single length but rather as a distribution, and by incorporating size distribution profiles, which contain the concentration of DNA at each fragment length, the proportion of intact target regions within a fragmented DNA sample can be calculated, as detailed in Eq.(2):

$${text{proportion}};{text{intact }} = frac{{mathop sum nolimits_{{f = r{ }}}^{n} frac{{{text{f }}{-}{text{ r }} + { }1}}{{text{f }}}{ }C_{f} }}{{mathop sum nolimits_{{f = m{ }}}^{n} C_{f} }},$$

(2)

where n is the length of the longest fragment within the sample, m is the length of the shortest fragment, and Cf is the concentration of each fragment length (i.e., pg/l).

We next sought to design qPCR and ddPCR assays that could be used to interrogate DNA fragmentation. A major focus of this assay design was to incorporate design elements that would enable the assays to be used on clinical cancer samples, as these samples are some of the most common types to undergo stochastic fragmentation. However, cancer samples are also prone to chromosomal amplifications and deletions within the genome16,17,18, and PCR assays that intersected with frequently amplified/deleted regions would result in inaccurate measures of concentration when these copy number aberrations (CNAs) occurred (i.e., the concentration of a region that is unique in the human reference genome is assumed to correspond to the overall number of genome copies within the measured sample). To control for this, we undertook an analysis to determine the regions of the human genome that were least affected by CNAs. CNA data that had been tested for statically significant gain or loss was retrieved from the Catalogue of Somatic Mutations in Cancer (COSMIC release v78)19,20. This data was filtered to exclude cell line samples and samples missing total copy number or minor allele values. Only 27 of the 10,637 samples remaining after this filtering were not derived from The Cancer Genome Atlas (TCGA) data21. We, therefore, opted to exclusively use these 10,610 TCGA samples to better ensure a dataset with experimental and analytical consistency in determining copy number changes (S1 Table).

After filtering out regions that were not covered by Affymetrix copy number probes (e.g., centromeres) the only regions completely devoid of CNAs were telomeric and likely artefactual. Outside of telomeres the minimum CNA region contained 5 samples. To determine a reasonable threshold for low copy number variation that might provide us with enough region space to meet the requirements of our assay design, we calculated the number of samples with CNAs in commonly used copy number reference genes. We found that the Human TaqMan Copy Number Reference Assays targeting RNase P and TERT offered by Applied Biosystems had CNAs in 61 and 360 of the total 10,610 samples, respectively, and the well-established standard reference gene RPP30 had CNAs in 23 samples. Based on this we set a threshold at the bottom 10th percentile of regions, excluding those where greater than 34 samples had significant copy number variation (Fig.2A). After applying this filter, we were left with 621 megabases across 858 non-contiguous regions on 22 chromosomes.

Design and performance of PCR assays against copy-neutral regions in the genome. (A) Circos plot depicting the percent of samples that undergo copy number aberration (CNA) in cancer. Chromosomes are shown in the outermost ring and include an overlay of cytogenetic Giemsa banding and centromeres marked with a red band. The second outermost ring shows the 946,615 Affymetrix Genome-Wide Human SNP Array 6.0 copy number probes used for the detection of CNAs by the TCGA. The final layer is a histogram displaying the number of samples that underwent statistically significant CNA (either loss or gain) within each region. Each grid line represents 1% of the 10,610 total samples. Universal assays were designed to target regions in the bottom 10th percentile of CNAs, excluding regions that are not covered by the Affymetrix CNV probes (e.g., centromeres). Less than 35 of the 10,610 samples (<0.33%) have CNAs in these regions, represented on the histogram as a dotted white line (above which regions were excluded). (B, C) Standard curves estimating amplification efficiencies of universal quantitation assays in 4-plex qPCR on gDNA (B) and bisulfite-converted DNA (C). Curves are artificially offset for better visualisation. E=efficiency.

We next designed a single-tube 4-plex quantitative PCR assay targeting these CNA neutral regions, which included a variety of design considerations to maximize the utility of the assay and minimize confounding effects. First, each assay would target a separate chromosome to minimize inaccurate quantification due to the remote possibility that one of the chromosomes, or at least a large portion, may be affected by CNAs. Given the size and number of regions, the second design consideration was identifying assay regions that would be unaffected by bisulfite conversion treatment, since the bisulfite conversion process is used to examine DNA methylation and is a common application in cancer genomics but also leads to substantial sample fragmentation and loss. To address this design consideration the CNA neutral regions were further analysed to identify primer and probe regions that were cytosine-free and would, therefore, be unaffected by the bisulfite conversion process. Notably, use of the assays on bisulfite material requires an extra step in qPCR data analysis to correct for the fact that only one DNA strand is quantified, resulting in a positive shift of 1 cycle threshold when compared to the unconverted genomic DNA (gDNA) counterpart.

The third design criterion was to enable assessment of the degree of sample fragmentation using this 4-plex assay. To achieve this, two of the assays were designed to be 125bp in length, and two were designed to be 175bp long. By taking the ratio of concentrations for the long to short assays, a quantitative metric for sample fragmentation can be imputed for any sample.

Finally, we sought to establish the combination of fluorescent probe chemistries that would enable successful multiplexing quantitation using either standard qPCR or ddPCR. In qPCR four different probe fluorophores (FAM, HEX, Cy5 and Texas Red) were used, whereas ddPCR 4-plex was achieved using a method developed by Dobnik et al. (2016)22 that uses two FAM probes and two HEX probes and varies probe concentrations to alter the resulting levels of fluorescence amplitude, allowing for the detection of two targets per fluorescence channel (S1 and S2 Figs).

After all these design criteria were successfully implemented, we next undertook experiments to verify the amplification fidelity and efficiency of each of the four assays. The fidelity of the assays was established by performing standard PCR and qPCR on a variety of sample types (buffy coat DNA, cfDNA, and bisulfite-converted DNA) and analysing the PCR products by standard DNA gel electrophoresis to confirm that only a single PCR amplicon was produced in singleplex (S3 Fig), and that multiplex assays produced only two bands of the expected sizes (S4 and S5 Figs). Next, the amplification efficiencies of all assays were determined using LinRegPCR window-of-linearity analysis23, and standard titration curves; this was done for all four amplicons in both singleplex and multiplex configurations, for both genomic and bisulfite-converted DNA, using both fluorescent dye and PrimeTime qPCR probes in multiple fluorophore configurations (Fig.2B,C, Table 1). Notably, all assays demonstrated>90% amplification efficiency across all conditions, indicating robust performance. Primer and probe sequences can be found in S2 Table.

The [long]/[short] ratios of any two target region lengths can be determined by applying the following equation to fragment size distribution data (Eq.3):

$${text{[long]/[short]}} = frac{{mathop sum nolimits_{{f = b{ }}}^{n} frac{{{text{f }}{-}{text{ b }} + { }1}}{{text{f }}}{ }C_{f} }}{{mathop sum nolimits_{f = s}^{n} frac{{{text{f }}{-}{text{ s }} + { }1}}{{text{f }}}{ }C_{f} }},$$

(3)

where b is the length of the longer region and s is the length of the shorter region.

To evaluate how well our model of stochastic fragmentation fit with experimental results we compared [175bp]/[125bp] ddPCR and qPCR ratios with those derived using Eq.(3) on Agilent 2100 Bioanalyzer fragment size concentration data. This analysis was performed on seven levels of increasing fragmentation induced by the ultrasonication of pooled buffy coat gDNA. The ddPCR and qPCR [175bp]/[125bp] ratios of our sonicated samples both showed high goodness-of-fit for ratios derived using Eq.(3), with R-square values of 0.995 and 0.989 for ddPCR and qPCR, respectively (Fig.3A,B).

Modelling and quantifying randomly fragmented DNA. (A) Table showing DNA samples sonicated to different fragment lengths, their fragment distribution profiles in electropherogram and pseudo-gel form, and comparison between the theoretically (Eq.3 applied to Bioanalyzer data) and experimentally determined [175bp]/[125bp] differential amplicon length ratios. The full unedited pseudo-gel image for these sonicated samples can be found in S6 Fig. (B) Line graph plotting the [175bp]/[125bp] ratios determined by qPCR, ddPCR, and our mathematical model applied to fragment size distribution data (Bioanalyzer) on differentially fragmented DNA samples. (C) Comparison of nucleic acid quantification methods on fragmented DNA. PCR data points are averages of the two universal assays for each amplicon length per well. Fluorometric (Qubit) and absorbance (Nanodrop) spectroscopy measurements were made on each sample on three separate occasions. Spectroscopy concertation measurements are depicted in ng/l and PCR as copies/l. Axes are scaled so that 1 copy=3.5pg.

Quantification of DNA samples affects all subsequent experimental steps and can lead to costly experimental failures if this step is not performed accurately. Therefore, to further extend our study we next compared the effects of fragmentation on nucleic acid quantification techniques using our sonicated DNA samples, referred to here by their peak (modal) fragment sizes: 150, 195, 283, 694, 828, 1082, and 1504bp.

One overlooked aspect of DNA fragmentation is that it results in fewer adjacent base pairs for fluorescent DNA dyes to intercalate when dye-based fluorometric methods are used. Thus predictably, and as other studies have noted1,2, the mean DNA concentration measured by fluorescence spectroscopy (Qubit 2.0) decreased with increasing fragmentation (p<0.001; one-way ANOVA), with untreated gDNA measuring at 50.40ng/l (SD=0.72), and the most fragmented sample (150bp) at 35.27ng/l (SD=2.14), which calculate to 14,400 (SD=206) and 10,100 (SD=613) genome copies, respectively, assuming 1 genome weighs 3.5pg based on the following formula:

$$begin{aligned} {text{Amount}} left( {{text{pg}}} right) & = frac{{{text{length }}left( {{text{bp}}} right)*{text{pg}}/{text{g}}*{text{weight}};{text{of}};{text{bp}} left( {{text{g}}/{text{mole}}/{text{bp}}} right)*{text{copies }} left( {{text{molecules}}} right)}}{{{text{Avogadro's}};{text{number }} left( {{text{molecules}}/{text{mole}}} right)}} \ Amount left( {pg} right) & = frac{{3,234,830,000* 10^{12} *650*1}}{{6.022*10^{23} }} \ end{aligned}$$

(4)

For absorption spectroscopy (Nanodrop 1000), the mean measurement for intact gDNA was 68.40ng/l (SD=1.97), which calculates to 19,600 (SD=563) genome copies. Although there was no dose-dependent trend towards decreasing concentration with increasing fragmentation, a one-way ANOVA did show a significant difference in concentration (p<0.001), and a Tukey's HSD test found the concentration of intact gDNA to be significantly higher than all seven levels of fragmentation (p<0.001). The highest mean concentration measured for the fragmented gDNA was 63.10ng/l (SD=0.79; 150bp) and the lowest was 57.43ng/l (SD=0.32; 283bp), which calculate to 18,100 (SD=226) and 16,400 (SD=92) genome copies, respectively.

Both qPCR and ddPCR measured substantial downward trends in concentration with increasing fragmentation (Fig.3C). This decline in amplifiable copies with increasing fragmentation reflects an increasing number of breakages in the targeted regions, the magnitude of decline being greater for the 175bp amplicon as longer target regions are more likely to be cleaved. ddPCR on the intact gDNA measured 18,984 (SD=765) and 19,058 (SD=608) mean copies for the two 125bp assays and 18,905 (SD=308) and 19,306 (SD=246) for the two 175bp assays.

The mean absorbance spectroscopy estimate for the number of genome copies in our intact gDNA sample was only 2.8% greater than the combined mean of the four ddPCR assays (M=19,063, SD=150). Whereas, the mean number of genome copies estimate for fluorescence spectroscopy was 25% lower, suggesting this method also underestimated intact, not just fragmented, DNA concentration. Our results, therefore, show that absorbance spectroscopy is the most accurate method for quantifying overall nucleic acid concentration, regardless of the degree of fragmentation. However, this technique lacks sensitivity and becomes increasingly inaccurate at the lower end of its analytical range (15ng/l)24. Absorbance spectroscopy is also highly susceptible to reporting falsely high concentrations due to protein contamination and/or phenolic compounds that absorb UV. PCR-based quantification is highly sensitive and most accurately measures the amount of amplifiable DNA at the amplicon length used. Our universal multiplex assay and accompanying online tool Fragment Calculator, which we detail in the following section, extends this ability to estimate the amount of amplifiable DNA of any given region length, while also providing an estimate of overall concentration when working with human genomic or bisulfite-converted DNA.

In addition to describing the fragmentation of the sample, the dual 175 and 125bp assays, combined with representative DNA samples, can also be leveraged to estimate the concentration of any other sized DNA region. To better enable this we designed the Fragment Calculator online tool to provide a more quantitative and actionable estimate of fragmentation (www.primer-suite.com/fragcalc). This tool uses measured 175bp and 125bp concentrations and the [175bp]/[125bp] ratio to estimate the average fragment length of a genomic or bisulfite-converted human DNA sample, the total number of genome copies in a measured sample, as well as the number of amplifiable (unbroken) instances of a DNA region of any length. This tool uses the fragment size distribution data of our seven sonicated DNA samples with average fragment lengths of 254, 291, 428, 493, 590, 745, and 1274bp, a highly fragmented FFPE DNA sample with an average fragment length of 92bp to represent the lower bounds of random fragmentation, and four gDNA samples with average fragment lengths of 6714, 15,422, 34,625 and 41,496bp for the upper bounds (S1 File).

The number of intact copies of an input DNA region length is estimated by taking the two [175bp]/[125bp] ratios from our representative fragment size distribution data that an input [175bp]/[125bp] ratio falls between (x1,x2), calculating the corresponding [125bp]/[input size] ratios using Eq.(3) on these size distribution data (y1, y2), determining the slope between these points to estimate the [125bp]/[input size] ratio corresponding to the input [175bp]/[125bp] ratio, and dividing the 125bp concentration by this ratio. For example, if the concentration measured for a fragmented DNA sample is 1000 copies for the 125bp amplicon and 700 copies for the 175bp amplicon, the input [175bp]/[125bp] ratio is 0.7, which falls between the [175bp]/[125bp] ratios of the 291bp (0.669) and 428bp (0.778) reference samples. To estimate the concentration of a 50bp region, for example, the corresponding [125bp]/[50bp] ratios determined using Eq.(3) are 0.585 and 0.707, for the 291bp and 428bp reference samples, respectively. The 50bp concentration is then calculated using the following linear equation:

$${text{y}} = mx + y_{0} ,$$

(5)

$$m = frac{{y_{2} - y_{1} }}{{x_{2} - x_{1} }}$$

$$m = frac{{0.707 - 0.585{ }}}{0.778 - 0.669}$$

$$[125{text{bp}}]/[50{text{bp}}] = 1.119*0.7 - 0.164,$$

$$[50{text{bp}}] = frac{{[125{text{bp}}]}}{{[125{text{bp}}]/[50{text{bp}}]}},$$

$$[50{text{bp}}] = frac{{1000 ;{text{copies}}}}{0.619},$$

$$[50{text{bp}}] = 1615 ;{text{copies}},$$

where m is the slope and y0 is the y-intercept. The number of genome copies is also estimated using this same method by dividing the input 125bp concentration by the [125bp]/[1bp] ratio. Similarly, the average fragment length is estimated using the [175bp]/[125bp] ratios from our fragment size distribution data (x1,x2) and their corresponding average fragment lengths (y1, y2) (Fig.4).

Fragment Calculator online tool with example inputs. The concentrations measured by the two amplicon sizes of our universal quantitative PCR 4-plex assay (125bp and 175bp) can be used to estimate the total concentration (i.e., the number of genomic copies), average fragment length of the sample, and the concentration of intact copies of any input region size.

Importantly, Fragment Calculator assumes fragment distributions for the samples being estimated to be similar to those of our representative samples. However, in our experience working with these assays, we have found FFPE samples do not behave like untreated DNA samples. The [175bp]/[125bp] ratio for FFPE samples is generally much lower than the ratio calculated from the size distributions of these samples using Eq.(3). This reveals that there is generally less amplifiable DNA in FFPE samples than their size distribution profiles suggest, which we hypothesise is likely due to a combination of single-stranded breaks and incomplete reversal of DNA crosslinking. Our assays are, therefore, a better indicator of the amount of amplifiable FFPE treated DNA than fragment size distribution data from microfluidic capillary electrophoresis instruments like the Agilent 2100 Bioanalyzer.

Further complicating this, however, is evidence that even regions of the same length can have substantially different concentrations of amplifiable FFPE treated DNA. Some of our routine quality control and quantification analyses of FFPE treated samples have revealed vast differences in the number of copies measured by the two 125bp assays, and these differences are consistent among numerous FFPE samples (S2 File). Despite assays having the same length amplicons, differences in the number of amplifiable copies are likely to occur at high degrees of fragmentation, for instance, due to differences in binding efficiencies among primers when their target regions are truncated. Indeed, we regularly observe statistically significant differences in the number of copies measured by assays of the same size in highly fragmented pooled buffy coat gDNA samples subjected to ultrasonication, some examples of which are forthcoming. However, these differences are relatively small in magnitude and may be due to sequence-specific biases in sonication-induced scission25,26. We hypothesise that the much greater differences we observe in FFPE samples may emerge due to differences in the degree to which crosslinking is reversed among regions, as well as potential differences in their susceptibility to DNA breakage. These differences may reflect an underlying nucleosome footprint given that formaldehyde cross-linking is more efficient in nucleosome-bound DNA, as evidenced by the FAIRE-Seq (Formaldehyde-Assisted Isolation of Regulatory Elements) technique27.

Since PCR-based assays that target both genomic and bisulfite-converted DNA provide more accurate measures of bisulfite conversion recovery than other quantification techniques28, we next assessed the performance and utility of our universal multiplex assay to compare the recovery and degree of fragmentation of three commonly used commercial bisulfite conversion kits (MethylEasy Exceed, EZ DNA Methylation-Gold, and EZ DNA Methylation-Lightning) across three starting concentrations (500, 50 and 5ng) using high molecular weight (HMW) gDNA.

A three-way ANOVA on the qPCR results found significant effects of starting concentration (p<0.001), assay (p<0.001), and conversion kit (p<0.001) on recovery (Fig.5A). Additionally, a significant interaction was found between starting concentration and kit (p<0.001), resulting from an increase in recovery with decreasing concentration in MethylEasy Exceed but a decrease in EZ DNA Methylation-Gold and EZ DNA Methylation-Lightning. Trends were similar for the 125bp and 175bp assays, except in MethylEasy Xceed where the proportional increase in mean recovery between 50 and 5ng was greater in 125bp assays (22%, SD=12 vs. 32%, SD=5) compared to the 175bp assays (16%, SD=10 vs. 20%, SD=5; Fig.5B). As for fragmentation, a two-way ANOVA found a significant effect of conversion kit on the [175bp]/[125bp] ratio (p<0.001), no significant effect of starting concentration (p=0.251), but a significant interaction between kit and concentration (p=0.027) arising from a decrease in the [175bp]/[125bp] ratio of MethylEasy Xceed with decreasing starting concentration.

Universal assay comparisons of DNA recovery and fragmentation by bisulfite conversion kits. (A) Recovery and fragmentation across different starting concentrations as measured by universal quantitation assays in 4-plex qPCR. (B) Plots comparing the recovery and fragmentation trends from qPCR data across decreasing starting concentrations. Recovery data points are averages of the two universal assays for each amplicon length and these values were divided to determine the [175bp]/[125bp] ratios. (C) Recovery and fragmentation measured by ddPCR 4-plex. Also includes results from in-house bisulfite protocol. (A, C) Each conversion was conducted in six replicates per concentration for each kit. [175bp]/[125bp] fragmentation ratios were calculated by dividing the average copies of the two 175bp assays by the average of copies the two 125bp assays. Error bars represent one standard deviation.

Due to the low starting concentration and recovery of the 5ng samples, we did not have enough sample left for ddPCR analysis and therefore only ran the 500ng and 50ng samples. In addition to the three commercial kits, we also included our in-house bisulfite conversion protocol in these ddPCR comparisons (Fig.5C). A three-way ANOVA showed similar results to the qPCR analysis, with significant effects of starting concentration (p<0.001), assay (p<0.001), and conversion kit (p<0.001) on recovery, and a significant interaction between kit and concentration (p=0.001). Similar to qPCR, this interaction resulted from declines in the mean recovery of similar proportions between 500 and 50ng in all kits except MethylEasy Xceed, which showed a mild increase (13%, SD=4 vs. 16%, SD=11). A two-way ANOVA found a slight statistically significant difference in the [175bp]/[125bp] ratios among conversion kits (p=0.033), which a Tukey's HSD test showed resulted from a significant difference (p=0.048) between DNA Methylation-Lightning (M=0.83, SD=0.05) and MethylEasy Xceed (M=0.75, SD=0.07). Our in-house method and DNA Methylation-Gold had mean ratios of 0.77 (SD=0.04) and 0.81 (SD=0.07), respectively. To estimate the absolute nucleic acid recovery and average fragment size after bisulfite conversion we used our Fragment Calculator tool on combined qPCR and ddPCR results (Table 2).

Snyder et al. (2016)11 identified nucleosome protection peaks using deep sequencing of pooled cfDNA samples. Implicit in these analyses is the fact that nucleosome position correlated with the enrichment of fragments at specific locations, which could only occur if nucleosome positions were at least somewhat conserved among people. However, it was unclear the extent to which these peaks might shift between individuals. If little movement occurs and peaks are instead universally conserved, this would have important implications for assay design. Targeting such peaks would maximise an assays sensitivity in cfDNA while failing to consider nucleosome protection could severely reduce sensitivity.

Snyder et al. (2016)11 calculated a Windowed Protection Score (WPS) for each nucleotide position within the mappable human genome by summing the number of sequenced 120180bp cfDNA fragments that wholly overlap a centred 120bp window and subtracting the number that truncate within this window. Peaks in nucleosome-mediated protection were then called by identifying contiguous regions of elevated WPS. Using the nucleosome protection peaks determined for the pooled healthy sample CH01, we designed two cfDNA assays targeting nucleosome protection peaks that could also be used for bisulfite-converted DNA material: a 95bp assay targeting chromosome 2 (cfUQ02) with an above-average WPS of 108 and maximum distance of 62bp from the local maxima, and a 100bp assay targeting chromosome 11 (cfUQ11) with a below-average WPS of 30 and a maximum distance of 56bp. The mean WPS of the nearly 13 million peaks identified in the CH01 sample is 63.7 (SD=41.4). We also designed several staggered PCR assays of varying lengths to flank each of these regions.

15 cfDNA samples isolated from the blood plasma of breast cancer patients were profiled using dye-based ddPCR to compare the number of amplifiable copies of our universal cfDNA assays along with these staggered assays. We observed that some samples displayed substantial differences in amplifiable copies among assays whereas others did not and that this appeared to coincide with the technique used for cfDNA isolation. We measured the fragmentation profiles of these samples and found 6 displayed characteristic~166 peaks with no sign of HMW contamination, which we thus classified as true cfDNA (Fig.6A), 6 had little to no cfDNA peak and were reclassified as contaminating HMW DNA (Fig.6B), and 3 had strong cfDNA peaks but also possible or likely contamination by HMW DNA and were excluded from analysis (S7 Fig). Although high levels of HMW DNA can occur in cfDNA due to non-apoptotic cell death (e.g., necrosis), we suspect the source in these samples was instead the result of poor plasma separation and extraction. Regardless of its source, we only expect to find nucleosome-mediated patterns of fragmentation in the DNA of apoptosed cells, and HMW DNA is likely to obscure these patterns.

Effects of amplicon length and distance from nucleosome protection peak on intact copies in cfDNA. (AB) Electropherograms and pseudo-gel images from Agilent 2100 Bioanalyzer with a High Sensitivity DNA Chip (2100 Expert version B.02.10.SI764). DNA samples are from plasma of breast cancer patients, except sample labelled sDNA which is a pooled buffy coat gDNA sample sonicated and gel-purified to produce a similar fragment size distribution to cfDNA. Samples classified as true cfDNA samples (A) were isolated using our in-house phenolchloroform method (14) and QIAamp Circulating Nucleic Acid Kit with EconoSpin All-In-One Mini Spin Columns (Epoch Life Sciences) instead of columns supplied with the kit (56). Samples classified as contaminating buffy coat DNA (B) were isolated using QIAamp Circulating Nucleic Acid Kit (711) and in-house phenolchloroform method (12). (CE) Plots of ratio to mean copies (assay/sample mean) against amplicon length (C) and against distance from nucleosome protection peak (D) for assays targeting chr11 nucleosome protection peak locus, and against distance from nucleosome protection peak for assays targeting chr2 locus (E). Box and whisker plots are centred above corresponding amplicon positions for each assay, along with cell line data of nucleosome signal (K652 and GM12878) from Kundaje et al. (2012)29 and nucleosome protection peak position (blue tick) from Snyder et al. (2016)11, adjoined by characteristic 146bp nucleosomal DNA length (blue) and 10bp linker DNA (red). Sample numbers for each sample type are gDNA=15, buffy coat DNA=6, breast cancer cfDNA=6, and sonicated DNA=1 (four technical replicates). (F-G) Box and whisker plots for chr11 102bp/56bp and chr2 142bp/62bp differential distance from protection peak copy count ratios (F), as well as the ratio to mean copies across the two loci for cfDNA samples (G). Sample numbers for each sample type are gDNA=20, colon cancer cfDNA=34, brain cancer cfDNA=10, and sonicated DNA=1 (four technical replicates). All four amplicons are 100bp in length. Letters above or below box and whisker plots (DG) represent homogenous subsets determined by post hoc Tukeys HSD analyses (=0.05) of one-way ANOVAs (p values on plots). The bottom line of each box represents the 25th percentile, top line the 75th percentile, and thick middle line the median. Whiskers extend up to a maximum of 1.5 times the height of the box. Any values that fall outside this range are classified as outliers (circles). Values that are greater than 3 times the height of the box are classified as extreme outliers (asterisks).

To normalise among samples of the same category the concentration measured for each assay was divided by the mean concentration of all assays within each region (chr11 or chr2), giving a ratio to mean copies (assay/sample mean). For ddPCR on HMW gDNA, all assays specific for unique regions should measure the same number of copies within the same sample. Therefore, the ratio of copies measured for a single assay to the mean copies of all assays should be 1:1 for intact gDNA, regardless of proximity to nucleosome peaks. Consistent with this, a one-way ANOVA on samples classified as contaminating HMW DNA found no statistical difference in ratio to mean copies among assays in the chr11 (p=0.100) and chr2 (p=0.239) regions. HMW gDNA samples extracted from the blood of 15 healthy individuals were also used as negative controls and similarly showed little variation in ratio to mean copies among assays. No significant difference was found among assays in the chr2 region (p=0.084). However, a significant difference was detected in the chr11 region (p=0.004), resulting from a minor effect of amplicon length on the number of amplifiable copies (Fig.6C). A similar trend appears to exist in the contaminating HMW DNA; however, its effects likely did not reach statistical significance due to the smaller sample size (6 vs. 15).

In contrast, the ratio to mean copies for cfDNA decreased with increasing distance from the nucleosome peak, with the highest ratio for each region being our universal cfDNA assays (cfUQ11 and cfUQ2). However, given that cfDNA is highly fragmented, differential amplicons sizes are likely to result in differences in the number of amplifiable copies, therefore confounding the effects of nucleosome protection. To control for this we used ultrasonication and gel purification to produce a blood pooled gDNA sample with a similar level of fragmentation as cfDNA, which we measured in four technical replicates for each assay to compare the effects of random fragmentation on the number of amplifiable copies. In the chr11 region, which had the greatest variance in amplicon size among assays, similar ratios were observed in the sonicated DNA and cfDNA for each assay tested (Fig.6D). A two-way ANOVA comparing these two sample types found a significant difference among assays (p<0.001) but no statistically significant interaction between sample type and assay, signifying that only the cfDNA level of fragmentation, and not nucleosome protection, was affecting the number of amplifiable copies (p=0.637). These results show that even small differences in amplicon length can have a significant impact on the number of amplifiable copies at such high levels of fragmentation but proximity to the nucleosome protection peak is likely providing little to no differential protection within this region.

Conversely, the assays targeting the chr2 region were far less variable in length and showed little difference in ratio to mean copies in the sonicated DNA, especially when compared to the cfDNA. A one-way ANOVA on the sonicated samples within this region did find significant differences in concentration ratios among assays (p=0.001); however, the magnitudes of these differences were small, they did not track with differences in amplicon length, and they appear to result from a positional effect, perhaps resulting from a sequence-specific bias in fragmentation within this region. Unlike the chr11 assays, the ratio to mean copies for the chr2 assays tracked the distance from the nucleosome peak in cfDNA, rather than the amplicon length. A two-way ANOVA comparing the sonicated and cfDNA samples found a significant difference among assays (p<0.001) as well as a significant interaction between assay and sample (p<0.001), which supports cfDNA having an effect on the number of amplifiable copies in this region beyond that caused by its level of fragmentation on differently sized amplicons (Fig.6E). Notably, a one-way ANOVA on the cfDNA samples showed no significant difference (p=0.495) in ratio to mean copies (M=1.00, SD=0.10 vs. M=0.97, SD=0.13) for the two assays with the most similar maximum distances from the nucleosome peak (92 and 99bp) and only 1bp difference in length (100 vs. 99bp.). Whereas, the 50bp distance (92 vs. 142bp) separating the two 100bp amplicons resulted in a significant decrease (M=1.00, SD=0.10 vs. M=0.78, SD=0.07; p<0.001), and the 95bp universal cfDNA assay with the smallest maximum distance from the nucleosome peak (62bp) had a significantly higher ratio (M=1.25, SD=0.06) than each of the other three assays (p<0.001; Tukeys HSD). Despite HMW contamination, the three samples with substantial cfDNA size peaks excluded from this analysis also revealed differences in copies among assays that match a nucleosome-mediated fragmentation pattern in the chr2 region (S3 File).

To further explore and confirm these results we designed probes for one flanking assay per region (in addition to the probes already designed for the cfUQ11 and cfUQ02 universal cfDNA assays), selecting those with the greatest difference between the sonicated and cfDNA samples. Where necessary, the forward or reverse primer for each assay was redesigned to normalise all amplicons to 100bp while maintaining the same maximum distance from the nucleosome peak. We ran these assays in duplex ddPCR on cfDNA samples extracted from the blood plasma of 34 patients with colorectal cancer and 10 patients with brain cancer, as well as gDNA samples from the blood of 20 healthy donors and four technical replicates of the sonicated gDNA. We then calculated the ratio of copies for the assay furthest to the assay closest to the nucleosome peak (chr11=[102bp]/[56bp] and chr2=[142bp]/[62bp]) and compared the four sample types for each region. For the chr11 region, a one-way ANOVA found no significant difference between the ratios of the colorectal (M=0.95, SD=0.11) or brain cancer (M=0.98, SD=0.11) cfDNA, gDNA (M=1.00, SD=0.06), or sonicated DNA (M=1.07, SD=0.03) samples (p=0.081). Although not significant, these differences tended towards a slight nucleosome-mediated protective effect (Fig.6F).

Conversely, a one-way ANOVA found a significant difference (p<0.001) among sample types for the chr2 region. A post hoc Tukey HSD test showed this difference was due to a drop in the [142bp]/[62bp] ratio in cfDNA, with gDNA (M=1.00, SD=0.09) and sonicated DNA (M=1.00, SD=0.03) being placed in one homogenous subset, and colorectal (M=0.67, SD=0.10) and brain cancer (M=0.62, SD=0.12) cfDNA placed in another (p<0.001). These results strongly reinforce our previous findings, showing that, unlike the chr11 nucleosome peak, the chr2 peak provides substantial and consistent protection from fragmentation among individuals. Furthermore, comparison across these two regions revealed that the stronger chr2 protection peak resulted not only in greater protection than the weaker chr11 peak but greater degradation in the adjacent valley (Fig.6G). A two-way ANOVA found significant differences (p<0.001) in the ratio to mean copies between the four assays, and no significant interaction (p=0.189) between the colorectal and brain cancer samples, indicating that the differences between assays were similar for these two cohorts. A Tukey HSD test showed significant differences between all four assays, with the chr2:142bp (M=0.81, SD=0.09), chr11:102bp (M=0.95, SD=0.06), chr11:56bp (M=1.00, SD=0.08), and chr2:62bp (M=1.24, SD=0.10) assays each being placed into separate homogenous subsets (=0.025). These results are consistent with cfDNA protection peaks being the result of nucleosome occupancy. As predicted, the protection peak with a low WPS provided weaker but more even protection within its occupied region and the peak with a high WPS provided greater but more narrow protection, thus validating the WPS metric that Snyder et al. (2016)11 applied in their analyses.

View post:
Modelling clinical DNA fragmentation in the development of universal PCR-based assays for bisulfite-converted, formalin-fixed and cell-free DNA sample...

Related Posts