Large-scale gene expression alterations introduced by structural variation drive morphotype diversification in Brassica … – Nature.com

Posted: February 18, 2024 at 10:06 am

High-quality genome assembly of representative morphotypes

To construct a pan-genome that encompasses the full range of genetic diversity in B. oleracea, we analyzed the resequencing data of 704 globally distributed B. oleracea accessions covering all different morphotypes and their wild relatives (Supplementary Tables 1 and 2). We identified 3,792,290 SNPs and 528,850 InDels in these accessions using cabbage JZS as reference genome22. A phylogenetic tree was then constructed using SNPs, which classified the 704 accessions into the following three main groups: wild B. oleracea and kales, arrested inflorescence lineage (AIL) and leafy head lineage (LHL; Fig. 1a and Supplementary Note 2). The phylogenetic relationship revealed in our study was generally consistent with those reported previously4,5,24,25. Based on the phylogeny and morphotype diversity, we selected 22 representative accessions for de novo genome assembly (Table 1).

a, Phylogenetic tree of 704 B. oleracea accessions. Different colors of branches indicate accessions from different morphotype groups. The images of the 27 representative accessions were placed next to their branches. The light blue, yellow and green backgrounds denote the following three main clusters: the wild/ancestral group, the arrested inflorescence lineage and the leafy head lineage. The red stars denote the 22 newly assembled genomes and the red rectangles denote five previously reported genomes. b, Phylogenetic tree of the 27 representative B. oleracea accessions, with the genome of B. rapa as the outgroup. c, The estimated insertion time (y axis) of all the full-length LTRs in the 27 B. oleracea genomes along the nine chromosomes (x axis) of B. oleracea. The lengths of nine chromosomes were normalized to 0100, proportional to their physical lengths. Each dot represents one LTR insertion event. The heatmap denotes the density of the full-length LTRs. Purple bars below each chromosome denote centromeric regions detected by centromere-specific repetitive sequences. d, Distribution of insertion time of full-length Copia and Gypsy LTRs in the 27 individual genomes. Each line represents a genome in the left graph. The two circles show the Copia and Gypsy LTRs that can be clustered into groups with sequence similarity of 90%. e, The heatmap shows the TAD prediction on chromosome eight of T10 (as an example), in which the region colored in dark red denotes a TAD structure. The line charts below the heatmap show the density of Copia and Gypsy LTRs, respectively, highlighting the enrichment of Copia LTRs in the centromere region, which is surrounded by high density of Gypsy LTRs.

We assembled genome sequences of the 22 accessions by integrating long-reads (PacBio or Nanopore sequencing), optical mapping molecules (BioNano) or high-throughput chromosome conformation capture data (Hi-C) and Illumina short-reads (Methods; Supplementary Note 2 and Supplementary Tables 37). The total genome size of these assemblies ranged from 539.87 to 584.16Mb with an average contig N50 of 19.18Mb (Table 1). An average of 98% contig sequences were anchored to the nine pseudochromosomes of B. oleracea. The completeness of these genome assemblies was assessed using benchmarking universal single-copy orthologs (BUSCO), with an average of 98.70% complete score in these genomes (Supplementary Table 8).

To minimize artifacts that could arise from different gene prediction approaches, we predicted gene models of both the 22 newly assembled genomes and the five reported high-quality genomes5,21,22,23 using the same annotation pipeline (Methods). Using an integrated strategy combining ab initio, homology-based and transcriptome-assisted prediction, we obtained a range of 50,346 to 55,003 protein-coding genes with a mean BUSCO value of 97.9% in these genomes (Table 1). After gene prediction, a phylogenetic tree constructed based on single-copy orthologous genes clustered the 27 genomes into three groups, similar to the results observed in the population (Fig. 1a and b).

A total range of 53.558.5% sequences in these B. oleracea genomes were TEs, with long terminal repeat retrotransposons (LTR-RTs) being the most abundant type (Supplementary Note 2). We further identified 4,703 to 6,253 full-length LTR-RTs (fl-LTRs) in these genomes (Supplementary Table 9), with recently inserted fl-LTRs enriched in centromeric regions (Fig. 1c). We revealed continuous expansion of Copia and Gypsy in all the genomes since four MYA (Fig. 1d). In addition, Copia TEs were clustered into more and larger groups than Gypsy based on sequence similarity (Fig. 1d), suggesting that Copia was under stronger expansion than Gypsy. More than 80% of the centromeric sequences were annotated as Copia in B. oleracea (Fig. 1e and Supplementary Fig. 1). Interestingly, these enriched Copia islands in centromeres were surrounded by high densities (>50%) of Gypsy in all the nine chromosomes of B. oleracea. Moreover, the topologically associating domain (TAD) structures overlapped with the Copia islands in all nine centromeric regions (Supplementary Fig. 1). This pattern was also found in six of ten chromosomes in B. rapa (Supplementary Fig. 2). These results suggest that Copia has an important role in the organization or function of centromeres through maintaining TAD structures.

We constructed an orthologous pan-genome comprising the 27 B. oleracea genomes. In total, we identified 57,137 orthologous gene families using OrthoFinder26 (Supplementary Note 3 and Supplementary Fig. 3). To investigate the retention variation of homoeologous genes among these mesohexaploid B. oleracea genomes, we further performed syntenic orthologous gene analysis (hereafter referred to as syntenic pan-genome). In the orthologous pan-genome, homoeologs were assigned to one orthologous family, whereas syntenic pan-genome separates them into different syntenic gene families. We detected a total of 87,444 syntenic gene families based on genomic synteny among these genomes of which 32,721, 24,902 and 22,423 families were located at LF, MF1 and MF2 subgenomes, respectively. The number of syntenic gene families increased when adding additional genomes and approached a plateau when n=25 (Fig. 2a), consistent with that of the orthologous pan-genome. We further separated all these syntenic gene families into 20,306 (23.2%), 10,086 (11.5%), 55,205 (63.1%) and 1,847 (2.1%) syntenic core, softcore, dispensable and private gene families, respectively, with a mean of 21,680 (41.5%), 10,724 (20.5%), 17,236 (32.9%) and 2,675 (5.1%) per genome (Fig. 2bd). We found significantly more TE insertions in syntenic dispensable and private genes than in syntenic core and softcore genes (Fig. 2e), suggesting that TEs contribute to genetic variations in these genes. The expression levels of syntenic core and softcore genes were significantly higher than those of syntenic dispensable and private genes (Fig. 2f). Moreover, the Ka/Ks values of the syntenic core genes were significantly lower than that of the orthologous core genes (Supplementary Fig. 4b), supporting more conservation of the syntenic core genes. We found that 44.6% of syntenic private and 38.2% of syntenic dispensable genes belong to orthologous core and softcore genes (Supplementary Fig. 4a), respectively. This illustrates the extensive differential gene loss of homoeologs during the evolution and diversification of B. oleracea.

a, The number of syntenic pan and core gene families in the 27 genomes. b, Composition of the syntenic pan-genome. The histogram shows the frequency distribution of syntenic gene families shared by different numbers of genomes. The pie chart shows the proportion of different groups of syntenic gene families. c, Percentage of different groups of syntenic gene families in each of the 27 genomes. d, Presence and absence information of all syntenic gene families in the 27 genomes. e,f, The average number of TE insertions in genes and the expression level of genes in different groups of syntenic gene families (two-sided Students t test; centerline, median; box limits, first and third quartiles; whiskers, 1.5 IQR). Different lowercase letters above the box plots represent significant differences (P<0.05). g, Functional analysis (gene ontology) of lost genes in the syntenic softcore or dispensable gene families, in different B. oleracea morphotypes, highlighting strong function enrichment associated with specific metabolites. The number of lost genes in different morphotypes is provided in the tree diagram. h, Syntenic gene families were separated into three groups corresponding to the numbers of homoeologs (single-, two- or three-copy) retained from the Brassica mesohexaploidization event. The percentage of gene families in different pan-genome classes for these groups is shown in each of the 27 B. oleracea genomes. IQR, interquartile range.

We dived into genes that were prone to being lost in different lineages/morphotypes of B. oleracea. A total of 20,924 syntenic gene families were lost in one to 14 genomes, while they were retained in 15 to 27 genomes. Among these, 2,786 and 5,139 gene families were lost exclusively in LHL and AIL, respectively (Fig. 2g). Intriguingly, in AIL, 556 syntenic gene families with gene loss specifically in broccoli were enriched in functions of sulfate transport, thioester hydrolase activity and riboflavin biosynthesis. In comparison, 1,134 syntenic gene families with gene loss specifically in cauliflower were enriched in nicotinamine biosynthesis and thiamine metabolism. Similarly, syntenic gene families with gene loss only in specific LHL morphotypes were found to be enriched in functions related to specific metabolites (Fig. 2g). The observations that genes specifically lost in different morphotypes were enriched in functions of biosynthesis or metabolism of various nutrient contents, pointing to unique nutritional composition or flavor of specific B. oleracea crops. In addition, our analysis of homoeologous copy-number variation (CNV) among B. oleracea morphotypes revealed morphotype-specific loss of homoeologous genes, which may contribute to the evolution of these morphotypes through variation in gene copy dosage that is associated with expression dosage (Fig. 2h, Supplementary Note 3 and Supplementary Tables 10 and 11).

The 27 high-quality B. oleracea genomes provide essential resources for the accurate identification of large-scale SVs. We aligned the sequences of 26 B. oleracea genomes to the T10 reference genome using Nucmer27. A total of 502,701 SVs were identified using SyRI28, including 452,148 PAVs (50bp to 3.34Mb), 13,090 CNVs (50bp to 243.14kb), 2,263 inversions (1,022bp to 12.18Mb) and 35,200 translocations (9,002 intrachromosomal and 26,198 interchromosomal translocations; 505bp to 5.59Mb; Fig. 3a and Supplementary Fig. 5a). We randomly selected 30 large SVs (>8kb) and 30 short SVs (<8kb) for validation. Approximately 93% of the selected large SVs were validated by Hi-C paired-end reads; the remaining 7% could not be validated (Supplementary Fig. 6). For the selected short SVs, 97% were validated by long-reads; the remaining 3% were found to be false calls (Supplementary Fig. 7 and Supplementary Table 12).

a, The distribution of GC content (3341%), gene numbers (0200Mb1) and TE density (20100%) in the T10 reference genome, the nonredundant SVs (presence, 2100kb/Mb; absence, 20400kb/Mb and all SVs, 10400kbMb1) among 27 genomes, as well as the SNPs (1040kb1) and InDels (1030kbMb1) identified in the 704 B. oleracea accessions. b, The number of different types of SVs from the nonredundant set of SVs in individual B. oleracea genomes. c, The number of SVs present in different numbers of query genomes. The bottom lines colored in light blue, light orange and light green mark these accessions from the wild/ancestral group, the AIL and the LHL, respectively. The sample IDs colored in light orange and light green denote accessions from broccoli/cauliflower and cabbage, respectively. The red rectangle marks the accessions of broccoli/cauliflower, highlighting the lower number of SVs in broccoli/cauliflower compared to the other accessions. d, The number of private SVs in wild B. oleracea, broccoli/cauliflower and cabbage genomes, showing significantly more private SVs in wild B. oleracea than in others (n=7 versus 5 versus 7; two-sided Wilcoxon rank-sum test; centerline, median; box limits, first and third quartiles; whiskers, 1.5 IQR). e, The frequency distribution of SVs in the following five different genomic regions: upstream (within 3kb), exon, intron, downstream (within +3kb) and intergenic regions. The SV ratios in the five regions were calculated for each of the 27 genomes, and these values were then sorted and plotted from small to large for each of the five regions. f, The density of SV sequences per 100bp in gene bodies and 5kb flanking regions in the 27 B. oleracea genomes. The area plots mark the maximum and minimum values across the 27 B. oleracea genomes, and the lines denote average values.

We merged the 502,701 SVs into 56,697 nonredundant SVs. The number of these SVs ranged from 7,449 to 9,848 per genome (Fig. 3b). A total of 50,153 nonredundant PAVs were used in our subsequent analysis. Similar to that of orthologous and syntenic gene families, the number of SVs increased when adding additional genomes; this increase diminished when n=25 (Supplementary Fig. 5b). Modeling this increase29 predicts a total SV number of 58,4101,452. The number of shared SVs sharply declined for the first three genomes and slowly decreased thereafter. We identified 27 SVs present in all 26 query genomes, 168 SVs present in 2425 query genomes, 26,641 SVs present in 223 query genomes and 18,226 SVs present in only one query genome, opposite to the trend of gene family counts (Fig. 3c). The number of private SVs in wild B. oleracea is significantly higher than in broccoli/cauliflower and cabbage, indicating extensive loss of genetic diversity during domestication of B. oleracea (Fig. 3c and d).

SVs distributed preferably in upstream and downstream regions of genes compared to gene bodies (Fig. 3e). Corroborating with this, SV density was the lowest in gene bodies and increased with distance in flanking regions (Fig. 3f), suggesting that SVs affecting regulatory sequences are likely to be under less stringent selection pressure than those disrupting encoding sequences. Besides, we found that 75% of all SVs overlapped with TEs (Supplementary Fig. 5c). We further identified SV gene, being the closest gene to the given SV within a 10-kb radius. In total, we determined 11,377 SV genes based on the syntenic pan-genome, including 9,442 expressed genes. These expressed SV genes were then separated into six groups based on the distance between SVs and corresponding genes (Fig. 4a). The 27 B. oleracea genomes were separated into two groups (presence and absence) based on the SV genotype of each SV gene. To be independent of the reference genome used for SV calling, we defined the genotype with more sequence as presence and the genotype with less sequence as absence. Comparison of SV gene expression between absence and presence groups revealed high percentages of SVs that have an effect on gene expression, decreasing with distance from 83% when located in the CDS region to 66% when located in 510kb upstream of SV genes (Fig. 4a and Supplementary Table 13). In total, for 69% (6,526) of the 9,442 SV genes, the SV was associated with gene expression changes. Of these 6,526 SV genes, SV presence was associated with significantly (P=1.481011, binomial test) more SV genes with suppressed expression (3,536 SV genes) than promoted expression (2,990 SV genes; Fig. 4b).

a, Different types of SV genes, based on the location of the SV relative to the gene, with data on expression, show a high proportion of SV genes with gene expression changes. b, The expression of SV genes from 6,526 syntenic gene families, with separated expression values for the absence and presence genotype groups (of corresponding SV). The x axis shows two groups, of which 3,536 and 2,990 syntenic gene families associated with suppression and promotion SVs, respectively. The y axis shows the normalized (z score) expression values. The green/yellow lines link the average expression values from each syntenic gene family for their presence and absence of genotype groups. c, Comparison of CpG island density and the ratio of highly methylated CpG islands between different types of SVs in 1.5kb (n=484 versus 369 versus 2,794; permutation test for 10,000 times; centerline, median; triangle, mean; box limits, first and third quartiles; whiskers, 1.5 IQR) or 3kb (n=153 versus 148 versus 1,391; permutation test for 10,000 times; centerline, median; triangle, mean; box limits, first and third quartiles; whiskers, 1.5 IQR). d, The expression fold changes of SV genes between the presence and absence of genotype groups. The black stars below the term Suppress denote DNA methylation modifications. The x axis shows the distance between SV and SV genes.

We also found that methylation was strongly associated with the suppressed expression of SV genes (Supplementary Note 4 and Supplementary Fig. 8a). We examined the sequence signature of the SV presence genotype for the 3,536 suppression SVs and found that their CpG site density was significantly higher than that of the 2,990 promotion SVs (Fig. 4c). The methylation levels of these suppression SVs were also significantly higher than that of the promotion SVs (Supplementary Fig. 8b). Both the increased density of CpG sites and their increased methylation levels resulted in a strong increase of highly methylated CpG islands in suppression SVs compared to promotion SVs (Fig. 4c). Besides suppression SVs, promotion SVs were identified that were associated with increased expression of SV genes. We investigated the sequence composition of promotion SVs and found significant (P<0.001, permutation test) enrichment of transcription factor (TF)-binding sites, including TCP, MYB, NAC, ERF and GRAS (Supplementary Table 14). These specific domains, together with low sequence methylation levels and few CpG islands in promotion SVs, may cause increased transcription of corresponding SV genes.

To further assess the strength of the effect of SVs on gene expression in B. oleracea genomes, we calculated the mean expression of corresponding SV genes for each of the two genotype groups (Fig. 4b). SVs affected gene expression ranging from over tenfold reductions to over tenfold increases, with most expression changes falling between one-third and three times (Fig. 4b,d). Furthermore, SVs that affect gene expression were enriched within 3kb flanking regions of genes. These results indicate the important role of SVs in fine-tuning gene expression levels.

We then used the nonredundant 50,153 SVs to construct an integrated graph-based genome with the T10 genome as a standard linear base reference. By mapping reads of 704 B. oleracea accessions to this graph-based genome, we revealed a total of 40,028 SVs in the population (Supplementary Note 4). We randomly selected 62 SVs, of which 59 were validated by PCR amplification (Supplementary Fig. 9 and Supplementary Table 15). Besides SVs, we identified 4,901,625 SNPs and 573,033 InDels in the population. Linkage disequilibrium (LD) analysis between these SVs and SNPs showed that 54.78% of SVs had weak LD (r2<0.5) with SNPs (Supplementary Fig. 10), indicating that SVs cannot be fully represented by SNPs in this genomic study. Of the 7,685 SV genes found in the B. oleracea population, 4,366 SV genes were expressed and 3,216 SV genes were used for downstream analysis (Methods). The percentage of SVs significantly (P<0.05) associated with the expression of SV genes ranged from 68% in the gene body to 59% 510kb away from the genes. In total, 61% of these SVs were substantially associated with expression changes of their SV genes, slightly less than 69% among the 27-genome assemblies. The SV presence was substantially associated with suppressed expression of 1,071 (55%) genes or promoted expression of 888 (45%) genes, similar to that of the 27-genome analysis (54% suppression, 46% promotion).

We also performed SV-based eGWAS analysis using 17,696 expressed genes and 40,028 SVs as traits and markers, respectively (Methods). The expression of 8,180 genes was significantly associated (P<1.001010) with at least one SV. In total, 50,076 SV signals were identified, among which 23% (11,536) and 77% (38,540) were intrachromosomal and interchromosomal signals, respectively (Supplementary Table 16). Of the 11,536 intrachromosomal SV signals, 1,335 were cis-regulatory SVs, with 49% and 51% of them suppressing and promoting gene expression, respectively. The remaining 48,741 SV signals were trans-regulatory SVs, with 47% and 53% suppressing and promoting gene expression, respectively. These results further indicate the important and complex regulatory role of SVs in gene expression.

We adopted the casecontrol GWAS strategy30,31 to identify SVs associated with different morphotypes of B. oleracea (Methods). Using the cauliflower/broccoli accessions characterized by large arrested inflorescences as the case group, we obtained 1,655 SV signals with P<8.161045, representing the top 5% signals (Fig. 5a). These SVs were assigned to 492 SV genes (SV in gene bodies or 3kb flanking regions), of which 378 were expressed, harboring 122 suppression and 109 promotion SVs. One suppression SV (P=1.5410108; 112bp) was located 643bp upstream of the translation start site of the gene BoPNY (PENNYWISE; Fig. 5b), which functions in maintaining inflorescence meristem identity and floral whorl morphogenesis32. This SV was under strong negative selection in the arrested inflorescence morphotype, being present in 2% (4 of 195) of cauliflower/broccoli accessions, contrasting to a presence of 89% (386 of 434) of control group accessions (Fig. 5c). More importantly, BoPNY was significantly higher expressed (P=3.00103) in the absence genotypes (the major allele in cauliflower/broccoli) than in the presence genotypes (Fig. 5d). The methylation levels of both the presence SV and its flanking sequences were significantly (P=8.55106) higher than that of the absence genotype, which was negatively associated with the transcription level of BoPNY (Fig. 5e). We also identified two promotion SVs located closest to gene BoCKX3. Cytokinin oxidase (CKX) catalyzes the degradation of cytokinin and thus negatively regulates cell proliferation of plants33. Mutants of ckx3 and its ortholog ckx5 form more cells and organs become larger34. One SV (SV1; P=5.8110162) involved a 316-bp Helitron-type TE insertion located 86bp downstream of the translation stop site of BoCKX3 (Fig. 5f). SV1 was present in 97% (208 of 214) of the cauliflower/broccoli accessions, contrasting to only 0.2% (1 of 431) of accessions in the control group (Fig. 5g). The other SV (SV2, 257bp) was located in last exon of BoCKX3, resulting in a frame-shift mutation. SV2 was present in only 0.5% (1 of 213) of cauliflower/broccoli accessions, compared to 29% (126 of 434) of accessions in the control group (Fig. 5f,g). These two SVs form four potential haplotypes of BoCKX3; however, the haplotype containing two SVs does not exist in our B. oleracea population (Fig. 5h). The expression of BoCKX3 in haplotype 3 was significantly higher than in haplotypes 1 and 2 (Fig. 5i), supporting the expression-promoting effect of this downstream SV1. BoCKX3 was highly expressed in leaves but not in other organs such as the curd during curd development in cauliflower/broccoli (Fig. 5j). One hypothesis is that BoCKX3 negatively regulates leaf growth, thus saving energy for fast proliferating of curds. These examples demonstrate the bidirectional impacts of SVs on gene expression, specifically associated with morphotypes of cauliflower/broccoli.

a, Manhattan plot showing the SV signals associated with cauliflower/broccoli (significance was calculated by two-tailed Fishers exact test. A Bonferroni-corrected P<0.05 was interpreted as significant). The light red dots show the top 5% P values and deep red dots show the top 1% P values. b, One SV is associated with BoPNY. c, The number of accessions with presence or absence SV (associated with BoPNY) genotype for broccoli/cauliflower accessions and all the other accessions (statistical test: two-tailed Fishers exact test). d, Expression comparison of BoPNY between SV presence and absence accessions (two-sided Students t test; centerline, median; box limits, first and third quartiles; whiskers, 1.5 IQR). e, Sequence methylation level around BoPNY between absence and presence genotype groups, which is negatively associated with the expression level of the gene. f, Two SVs associated with BoCKX3. g, The number of accessions with presence or absence SV (associated with BoCKX3) genotypes for broccoli/cauliflower accessions and all other accessions (statistical test: two-tailed Fishers exact test). h, The four possible haplotype groups are formed by two SVs. Haplotype 4 was not detected in our population. i, Expression comparison of BoCKX3 between the three haplotype groups (two-sided Students t test; centerline, median; box limits, first and third quartiles; whiskers, 1.5 IQR). j, Expression of BoCKX3 in different tissues of cauliflower and cabbage, highlighting high expression of this gene in leaf 2 of cauliflower. Leaf 1 denotes fresh leaf before curd initiation; leaf 2 denotes fresh leaf during curd development; curd 1 denotes developing curd; curd 2 denotes mature curd. N indicates a missing value as cabbage makes no curds.

GWAS analysis was also performed using cabbage accessions as the case group, characterized by the leafy heads (Supplementary Note 5 and Supplementary Figs. 11 and 12). We revealed two promotion SVs (SV1 and SV2) located closest to BoKAN1, which regulates leaf adaxial/abaxial polarity35,36,37. SV1 was introduced by a 970-bp TE (PIF/Harbinger) insertion, which was under strong negative selection in cabbage accessions (Supplementary Fig. 11b and c), and SV2 was introduced by a 157-bp TE (Helitron) insertion, which was also under negative selection in cabbage accessions. Among the four haplotypes formed by the two SVs (Supplementary Fig. 11d), BoKAN1 was significantly (P=3.60107) lower expressed in haplotypes 1 and 2 that lacked SV1 than in haplotypes 3 and 4 that harbored SV1 (Supplementary Fig. 11e). We also revealed one promotion SV (P=3.691091) located closest to BoACS4 (Supplementary Fig. 12a), which encodes the key regulatory enzyme involved in the biosynthesis of the plant hormone ethylene38,39. This insertion was under strong negative selection in cabbage (Supplementary Fig. 12b). Expression of BoACS4 in cabbage accessions lacking this insertion was significantly lower (P=1.901014) than in control group accessions harboring the insertion (Supplementary Fig. 12c).

Another interesting SV was present in all 18 ornamental kale accessions, but absent in any other accession. This SV was a 280-bp TE (PIF/Harbinger) insertion, located 289bp upstream of the translation start site of a MYB TF (hereafter referred to as BoMYBtf; Fig. 6a and b). Previously, MYB TFs were found to be associated with purple traits in cultivars of B. oleracea, such as kale, kohlrabi and cabbage40. The expression level of BoMYBtf was significantly higher in ornamental kale than in other morphotypes (Fig. 6c), indicating that this TE insertion was associated with the promoted expression of BoMYBtf. TF-binding sites (that is NAC, TCP and ERF), which were substantially enriched in promotion SVs as aforementioned, were also found in this PIF/Harbinger TE sequence (Fig. 6d). We hypothesize that these TF-binding sites, hitchhiking with the TE insertion, are causal factors promoting the transcriptional activity of BoMYBtf.

a, One SV (PIF/Harbinger-type TE insertion) is associated with BoMYBtf. b, The number of accessions with presence or absence of SV (associated with BoMYBtf) genotypes for ornamental kale accessions and all other accessions (statistical test: two-tailed Fishers exact test). c, Expression comparison of BoMYBtf between SV presence and absence accessions (two-sided Students t test; centerline, median; box limits, first and third quartiles; whiskers, 1.5 IQR). d, TF-binding elements identified in the PIF/Harbinger insertion. e, Schematic diagrams of reporter constructs used for the LUC/REN assay. The upstream sequences of BoMYBtf from ornamental kale T18 (with TE, 1,239bp), wild B. oleracea T10 (without TE, 951bp), cabbage T20 (without TE, 968bp) and the SV sequence (TE itself, 280bp). The empty vector was set as mock control. The activities of these promoter constructs are reflected by the LUC/REN ratio (two-sided Students t test; data are presented as the means.d.). f, Distribution of the PIF/Harbinger insertion in the 27 B. oleracea genomes. g, Boxplot showing normalized (z score) expression of 44 syntenic gene families, with a PIF/Harbinger insertion within a 3kb region from the nearest genes. The light blue and light purple backgrounds denote these syntenic gene families with PIF/Harbinger insertions located within 1.5kb and 3kb to 1.5kb, respectively, of corresponding gene members (red stars); whereas the gray dots denote their syntenic gene members without PIF/Harbinger insertion (centerline, median; box limits, first and third quartiles; whiskers, 1.5 IQR).

The role of this PIF/Harbinger TE in increasing transcription of BoMYBtf in ornamental kale was further validated by the luciferase reporter experiment (Fig. 6e). Briefly, the MYB promoters of ornamental kale T18 (with TE), wild B. oleracea T10 (without TE), cabbage JZS T20 (without TE) and the SV (TE itself) were fused in pMini-LUC as reporters and transfected into tobacco leaves (Methods). The LUC/REN ratio of mini-T18 and mini-SV was significantly higher (P<0.05) than that of other samples, while no significant difference was observed between mock, mini-T10 and mini-JZS, confirming the expression promotion effect of this PIF/Harbinger TE. Moreover, we investigated this PIF/Harbinger TE across all the 27 B. oleracea genomes. We found 60 insertions located within 3kb flanking regions of genes, with 44 associated genes being expressed (Fig. 6f). When comparing their expression among the 27 genomes, 31 genes harboring the insertion showed higher expression levels than their counterparts lacking the insertion, whereas this insertion in the remaining 13 genes did not result in increased expression (Fig. 6g). These results further support the common transcription promotion function of this PIF/Harbinger TE insertion in B. oleracea genomes.

More:
Large-scale gene expression alterations introduced by structural variation drive morphotype diversification in Brassica ... - Nature.com

Related Posts