{"id":1122312,"date":"2024-02-18T10:06:35","date_gmt":"2024-02-18T15:06:35","guid":{"rendered":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/uncategorized\/large-scale-gene-expression-alterations-introduced-by-structural-variation-drive-morphotype-diversification-in-brassica-nature-com\/"},"modified":"2024-02-18T10:06:35","modified_gmt":"2024-02-18T15:06:35","slug":"large-scale-gene-expression-alterations-introduced-by-structural-variation-drive-morphotype-diversification-in-brassica-nature-com","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/transhuman-news-blog\/genome\/large-scale-gene-expression-alterations-introduced-by-structural-variation-drive-morphotype-diversification-in-brassica-nature-com\/","title":{"rendered":"Large-scale gene expression alterations introduced by structural variation drive morphotype diversification in Brassica &#8230; &#8211; Nature.com"},"content":{"rendered":"<p><p>High-quality genome assembly of representative morphotypes    <\/p>\n<p>    To construct a pan-genome that encompasses the full range of    genetic diversity in B. oleracea, we analyzed the    resequencing data of 704 globally distributed B.    oleracea accessions covering all different morphotypes and    their wild relatives (Supplementary Tables 1 and 2). We identified    3,792,290 SNPs and 528,850 InDels in these accessions using    cabbage JZS as reference genome22. A phylogenetic    tree was then constructed using SNPs, which classified the 704    accessions into the following three main groups: wild B.    oleracea and kales, arrested inflorescence lineage (AIL)    and leafy head lineage (LHL; Fig. 1a and Supplementary    Note 2). The phylogenetic    relationship revealed in our study was generally consistent    with those reported previously4,5,24,25. Based on the    phylogeny and morphotype diversity, we selected 22    representative accessions for de novo genome assembly (Table    1).  <\/p>\n<p>            a, Phylogenetic tree of 704 B. oleracea            accessions. Different colors of branches indicate            accessions from different morphotype groups. The images            of the 27 representative accessions were placed next to            their branches. The light blue, yellow and green            backgrounds denote the following three main clusters:            the wild\/ancestral group, the arrested inflorescence            lineage and the leafy head lineage. The red stars            denote the 22 newly assembled genomes and the red            rectangles denote five previously reported genomes.            b, Phylogenetic tree of the 27 representative            B. oleracea accessions, with the genome of B.            rapa as the outgroup. c, The estimated            insertion time (y axis) of all the full-length            LTRs in the 27 B. oleracea genomes along the            nine chromosomes (x axis) of B. oleracea.            The lengths of nine chromosomes were normalized to            0100, proportional to their physical lengths. Each dot            represents one LTR insertion event. The heatmap denotes            the density of the full-length LTRs. Purple bars below            each chromosome denote centromeric regions detected by            centromere-specific repetitive sequences. d,            Distribution of insertion time of full-length Copia and            Gypsy LTRs in the 27 individual genomes. Each line            represents a genome in the left graph. The two circles            show the Copia and Gypsy LTRs that can be clustered            into groups with sequence similarity of 90%. e,            The heatmap shows the TAD prediction on chromosome            eight of T10 (as an example), in which the region            colored in dark red denotes a TAD structure. The line            charts below the heatmap show the density of Copia and            Gypsy LTRs, respectively, highlighting the enrichment            of Copia LTRs in the centromere region, which is            surrounded by high density of Gypsy LTRs.          <\/p>\n<p>    We assembled genome sequences of the 22 accessions by    integrating long-reads (PacBio or Nanopore sequencing), optical    mapping molecules (BioNano) or high-throughput chromosome    conformation capture data (Hi-C) and Illumina short-reads    (Methods; Supplementary Note 2 and Supplementary    Tables 37). The total genome    size of these assemblies ranged from 539.87 to 584.16Mb with    an average contig N50 of 19.18Mb (Table 1). An average of 98%    contig sequences were anchored to the nine pseudochromosomes of    B. oleracea. The completeness of these genome assemblies    was assessed using benchmarking universal single-copy orthologs    (BUSCO), with an average of 98.70% complete score in these    genomes (Supplementary Table 8).  <\/p>\n<p>    To minimize artifacts that could arise from different gene    prediction approaches, we predicted gene models of both the 22    newly assembled genomes and the five reported high-quality    genomes5,21,22,23 using the same    annotation pipeline (Methods). Using an integrated strategy    combining ab initio, homology-based and transcriptome-assisted    prediction, we obtained a range of 50,346 to 55,003    protein-coding genes with a mean BUSCO value of 97.9% in these    genomes (Table 1). After gene    prediction, a phylogenetic tree constructed based on    single-copy orthologous genes clustered the 27 genomes into    three groups, similar to the results observed in the population    (Fig. 1a and b).  <\/p>\n<p>    A total range of 53.558.5% sequences in these B.    oleracea genomes were TEs, with long terminal repeat    retrotransposons (LTR-RTs) being the most abundant type    (Supplementary Note 2). We further    identified 4,703 to 6,253 full-length LTR-RTs (fl-LTRs) in    these genomes (Supplementary Table 9), with recently    inserted fl-LTRs enriched in centromeric regions (Fig.    1c). We revealed    continuous expansion of Copia and Gypsy in all the genomes    since four MYA (Fig. 1d). In addition, Copia    TEs were clustered into more and larger groups than Gypsy based    on sequence similarity (Fig. 1d), suggesting that    Copia was under stronger expansion than Gypsy. More than 80% of    the centromeric sequences were annotated as Copia in B.    oleracea (Fig. 1e and Supplementary    Fig. 1). Interestingly,    these enriched Copia islands in centromeres were surrounded by    high densities (>50%) of Gypsy in all the nine chromosomes    of B. oleracea. Moreover, the topologically associating    domain (TAD) structures overlapped with the Copia islands in    all nine centromeric regions (Supplementary Fig. 1). This pattern was    also found in six of ten chromosomes in B. rapa    (Supplementary Fig. 2). These results    suggest that Copia has an important role in the organization or    function of centromeres through maintaining TAD structures.  <\/p>\n<p>    We constructed an orthologous pan-genome comprising the 27    B. oleracea genomes. In total, we identified 57,137    orthologous gene families using OrthoFinder26 (Supplementary    Note 3 and Supplementary    Fig. 3). To investigate    the retention variation of homoeologous genes among these    mesohexaploid B. oleracea genomes, we further performed    syntenic orthologous gene analysis (hereafter referred to as    syntenic pan-genome). In the orthologous pan-genome,    homoeologs were assigned to one orthologous family, whereas    syntenic pan-genome separates them into different syntenic gene    families. We detected a total of 87,444 syntenic gene families    based on genomic synteny among these genomes of which 32,721,    24,902 and 22,423 families were located at LF, MF1 and MF2    subgenomes, respectively. The number of syntenic gene families    increased when adding additional genomes and approached a    plateau when n=25 (Fig. 2a), consistent with    that of the orthologous pan-genome. We further separated all    these syntenic gene families into 20,306 (23.2%), 10,086    (11.5%), 55,205 (63.1%) and 1,847 (2.1%) syntenic core,    softcore, dispensable and private gene families, respectively,    with a mean of 21,680 (41.5%), 10,724 (20.5%), 17,236 (32.9%)    and 2,675 (5.1%) per genome (Fig. 2bd). We found    significantly more TE insertions in syntenic dispensable and    private genes than in syntenic core and softcore genes (Fig.    2e), suggesting that    TEs contribute to genetic variations in these genes. The    expression levels of syntenic core and softcore genes were    significantly higher than those of syntenic dispensable and    private genes (Fig. 2f). Moreover, the    Ka\/Ks values    of the syntenic core genes were significantly lower than that    of the orthologous core genes (Supplementary Fig.    4b), supporting more    conservation of the syntenic core genes. We found that 44.6% of    syntenic private and 38.2% of syntenic dispensable genes belong    to orthologous core and softcore genes (Supplementary Fig.    4a), respectively.    This illustrates the extensive differential gene loss of    homoeologs during the evolution and diversification of B.    oleracea.  <\/p>\n<p>            a, The number of syntenic pan and core gene            families in the 27 genomes. b, Composition of            the syntenic pan-genome. The histogram shows the            frequency distribution of syntenic gene families shared            by different numbers of genomes. The pie chart shows            the proportion of different groups of syntenic gene            families. c, Percentage of different groups of            syntenic gene families in each of the 27 genomes.            d, Presence and absence information of all            syntenic gene families in the 27 genomes.            e,f, The average number of TE insertions            in genes and the expression level of genes in different            groups of syntenic gene families (two-sided Students            t test; centerline, median; box limits, first            and third quartiles; whiskers, 1.5 IQR). Different            lowercase letters above the box plots represent            significant differences (P<0.05). g,            Functional analysis (gene ontology) of lost genes in            the syntenic softcore or dispensable gene families, in            different B. oleracea morphotypes,            highlighting strong function enrichment associated with            specific metabolites. The number of lost genes in            different morphotypes is provided in the tree diagram.            h, Syntenic gene families were separated into            three groups corresponding to the numbers of homoeologs            (single-, two- or three-copy) retained from the            Brassica mesohexaploidization event. The            percentage of gene families in different pan-genome            classes for these groups is shown in each of the 27            B. oleracea genomes. IQR, interquartile range.          <\/p>\n<p>    We dived into genes that were prone to being lost in different    lineages\/morphotypes of B. oleracea. A total of 20,924    syntenic gene families were lost in one to 14 genomes, while    they were retained in 15 to 27 genomes. Among these, 2,786 and    5,139 gene families were lost exclusively in LHL and AIL,    respectively (Fig. 2g). Intriguingly, in    AIL, 556 syntenic gene families with gene loss specifically in    broccoli were enriched in functions of sulfate transport,    thioester hydrolase activity and riboflavin biosynthesis. In    comparison, 1,134 syntenic gene families with gene loss    specifically in cauliflower were enriched in nicotinamine    biosynthesis and thiamine metabolism. Similarly, syntenic gene    families with gene loss only in specific LHL morphotypes were    found to be enriched in functions related to specific    metabolites (Fig. 2g). The observations    that genes specifically lost in different morphotypes were    enriched in functions of biosynthesis or metabolism of various    nutrient contents, pointing to unique nutritional composition    or flavor of specific B. oleracea crops. In addition,    our analysis of homoeologous copy-number variation (CNV) among    B. oleracea morphotypes revealed morphotype-specific    loss of homoeologous genes, which may contribute to the    evolution of these morphotypes through variation in gene copy    dosage that is associated with expression dosage (Fig.    2h, Supplementary Note    3 and Supplementary    Tables 10 and 11).  <\/p>\n<p>    The 27 high-quality B. oleracea genomes provide    essential resources for the accurate identification of    large-scale SVs. We aligned the sequences of 26 B.    oleracea genomes to the T10 reference genome using    Nucmer27. A total of    502,701 SVs were identified using SyRI28, including    452,148 PAVs (50bp to 3.34Mb), 13,090 CNVs (50bp to    243.14kb), 2,263 inversions (1,022bp to 12.18Mb) and 35,200    translocations (9,002 intrachromosomal and 26,198    interchromosomal translocations; 505bp to 5.59Mb; Fig.    3a and Supplementary    Fig. 5a). We randomly    selected 30 large SVs (>8kb) and 30 short SVs (<8kb)    for validation. Approximately 93% of the selected large SVs    were validated by Hi-C paired-end reads; the remaining 7% could    not be validated (Supplementary Fig. 6). For the selected    short SVs, 97% were validated by long-reads; the remaining 3%    were found to be false calls (Supplementary Fig. 7 and Supplementary    Table 12).  <\/p>\n<p>            a, The distribution of GC content (3341%), gene            numbers (0200Mb1) and TE density            (20100%) in the T10 reference genome, the nonredundant            SVs (presence, 2100kb\/Mb; absence, 20400kb\/Mb and            all SVs, 10400kbMb1) among 27 genomes,            as well as the SNPs (1040kb1) and InDels            (1030kbMb1) identified in the 704 B.            oleracea accessions. b, The number of            different types of SVs from the nonredundant set of SVs            in individual B. oleracea genomes. c, The            number of SVs present in different numbers of query            genomes. The bottom lines colored in light blue, light            orange and light green mark these accessions from the            wild\/ancestral group, the AIL and the LHL,            respectively. The sample IDs colored in light orange            and light green denote accessions from            broccoli\/cauliflower and cabbage, respectively. The red            rectangle marks the accessions of broccoli\/cauliflower,            highlighting the lower number of SVs in            broccoli\/cauliflower compared to the other accessions.            d, The number of private SVs in wild B.            oleracea, broccoli\/cauliflower and cabbage genomes,            showing significantly more private SVs in wild B.            oleracea than in others (n=7 versus 5            versus 7; two-sided Wilcoxon rank-sum test; centerline,            median; box limits, first and third quartiles;            whiskers, 1.5 IQR). e, The frequency            distribution of SVs in the following five different            genomic regions: upstream (within 3kb), exon, intron,            downstream (within +3kb) and intergenic regions. The            SV ratios in the five regions were calculated for each            of the 27 genomes, and these values were then sorted            and plotted from small to large for each of the five            regions. f, The density of SV sequences per            100bp in gene bodies and 5kb flanking regions in the            27 B. oleracea genomes. The area plots mark the            maximum and minimum values across the 27 B.            oleracea genomes, and the lines denote average            values.          <\/p>\n<p>    We merged the 502,701 SVs into 56,697 nonredundant SVs. The    number of these SVs ranged from 7,449 to 9,848 per genome (Fig.    3b). A total of 50,153    nonredundant PAVs were used in our subsequent analysis. Similar    to that of orthologous and syntenic gene families, the number    of SVs increased when adding additional genomes; this increase    diminished when n=25 (Supplementary Fig.    5b). Modeling this    increase29 predicts a    total SV number of 58,4101,452. The number of shared SVs    sharply declined for the first three genomes and slowly    decreased thereafter. We identified 27 SVs present in all 26    query genomes, 168 SVs present in 2425 query genomes, 26,641    SVs present in 223 query genomes and 18,226 SVs present in    only one query genome, opposite to the trend of gene family    counts (Fig. 3c). The number of    private SVs in wild B. oleracea is significantly higher    than in broccoli\/cauliflower and cabbage, indicating extensive    loss of genetic diversity during domestication of B.    oleracea (Fig. 3c and d).  <\/p>\n<p>    SVs distributed preferably in upstream and downstream regions    of genes compared to gene bodies (Fig. 3e). Corroborating with    this, SV density was the lowest in gene bodies and increased    with distance in flanking regions (Fig. 3f), suggesting that    SVs affecting regulatory sequences are likely to be under less    stringent selection pressure than those disrupting encoding    sequences. Besides, we found that 75% of all SVs overlapped    with TEs (Supplementary Fig. 5c). We further    identified SV gene, being the closest gene to the given SV    within a 10-kb radius. In total, we determined 11,377 SV genes    based on the syntenic pan-genome, including 9,442 expressed    genes. These expressed SV genes were then separated into six    groups based on the distance between SVs and corresponding    genes (Fig. 4a). The 27 B.    oleracea genomes were separated into two groups (presence    and absence) based on the SV genotype of each SV gene. To be    independent of the reference genome used for SV calling, we    defined the genotype with more sequence as presence and the    genotype with less sequence as absence. Comparison of SV gene    expression between absence and presence groups revealed high    percentages of SVs that have an effect on gene expression,    decreasing with distance from 83% when located in the CDS    region to 66% when located in 510kb upstream of SV genes    (Fig. 4a and Supplementary    Table 13). In total, for    69% (6,526) of the 9,442 SV genes, the SV was associated with    gene expression changes. Of these 6,526 SV genes, SV presence    was associated with significantly    (P=1.481011, binomial test)    more SV genes with suppressed expression (3,536 SV genes) than    promoted expression (2,990 SV genes; Fig. 4b).  <\/p>\n<p>            a, Different types of SV genes, based on the            location of the SV relative to the gene, with data on            expression, show a high proportion of SV genes with            gene expression changes. b, The expression of SV            genes from 6,526 syntenic gene families, with separated            expression values for the absence and presence genotype            groups (of corresponding SV). The x axis shows            two groups, of which 3,536 and 2,990 syntenic gene            families associated with suppression and promotion SVs,            respectively. The y axis shows the normalized            (z score) expression values. The green\/yellow            lines link the average expression values from each            syntenic gene family for their presence and absence of            genotype groups. c, Comparison of CpG island            density and the ratio of highly methylated CpG islands            between different types of SVs in 1.5kb            (n=484 versus 369 versus 2,794; permutation            test for 10,000 times; centerline, median; triangle,            mean; box limits, first and third quartiles; whiskers,            1.5 IQR) or 3kb (n=153 versus 148 versus            1,391; permutation test for 10,000 times; centerline,            median; triangle, mean; box limits, first and third            quartiles; whiskers, 1.5 IQR). d, The            expression fold changes of SV genes between the            presence and absence of genotype groups. The black            stars below the term Suppress denote DNA methylation            modifications. The x axis shows the distance            between SV and SV genes.          <\/p>\n<p>    We also found that methylation was strongly associated with the    suppressed expression of SV genes (Supplementary Note    4 and Supplementary    Fig. 8a). We examined the    sequence signature of the SV presence genotype for the 3,536    suppression SVs and found that their CpG site density was    significantly higher than that of the 2,990 promotion SVs (Fig.    4c). The methylation    levels of these suppression SVs were also significantly higher    than that of the promotion SVs (Supplementary Fig.    8b). Both the    increased density of CpG sites and their increased methylation    levels resulted in a strong increase of highly methylated CpG    islands in suppression SVs compared to promotion SVs (Fig.    4c). Besides    suppression SVs, promotion SVs were identified that were    associated with increased expression of SV genes. We    investigated the sequence composition of promotion SVs and    found significant (P<0.001, permutation test)    enrichment of transcription factor (TF)-binding sites,    including TCP, MYB, NAC, ERF and GRAS (Supplementary Table    14). These specific    domains, together with low sequence methylation levels and few    CpG islands in promotion SVs, may cause increased transcription    of corresponding SV genes.  <\/p>\n<p>    To further assess the strength of the effect of SVs on gene    expression in B. oleracea genomes, we calculated the    mean expression of corresponding SV genes for each of the two    genotype groups (Fig. 4b). SVs affected gene    expression ranging from over tenfold reductions to over tenfold    increases, with most expression changes falling between    one-third and three times (Fig. 4b,d). Furthermore, SVs    that affect gene expression were enriched within 3kb flanking    regions of genes. These results indicate the important role of    SVs in fine-tuning gene expression levels.  <\/p>\n<p>    We then used the nonredundant 50,153 SVs to construct an    integrated graph-based genome with the T10 genome as a standard    linear base reference. By mapping reads of 704 B.    oleracea accessions to this graph-based genome, we revealed    a total of 40,028 SVs in the population (Supplementary Note    4). We randomly    selected 62 SVs, of which 59 were validated by PCR    amplification (Supplementary Fig. 9 and Supplementary    Table 15). Besides SVs, we    identified 4,901,625 SNPs and 573,033 InDels in the population.    Linkage disequilibrium (LD) analysis between these SVs and SNPs    showed that 54.78% of SVs had weak LD    (r2<0.5) with SNPs (Supplementary Fig.    10), indicating that    SVs cannot be fully represented by SNPs in this genomic study.    Of the 7,685 SV genes found in the B. oleracea    population, 4,366 SV genes were expressed and 3,216 SV genes    were used for downstream analysis (Methods). The percentage of    SVs significantly (P<0.05) associated with the    expression of SV genes ranged from 68% in the gene body to 59%    510kb away from the genes. In total, 61% of these SVs were    substantially associated with expression changes of their SV    genes, slightly less than 69% among the 27-genome assemblies.    The SV presence was substantially associated with suppressed    expression of 1,071 (55%) genes or promoted expression of 888    (45%) genes, similar to that of the 27-genome analysis (54%    suppression, 46% promotion).  <\/p>\n<p>    We also performed SV-based eGWAS analysis using 17,696    expressed genes and 40,028 SVs as traits and markers,    respectively (Methods). The expression of 8,180 genes was    significantly associated    (P<1.001010) with at    least one SV. In total, 50,076 SV signals were identified,    among which 23% (11,536) and 77% (38,540) were intrachromosomal    and interchromosomal signals, respectively (Supplementary Table    16). Of the 11,536    intrachromosomal SV signals, 1,335 were cis-regulatory    SVs, with 49% and 51% of them suppressing and promoting gene    expression, respectively. The remaining 48,741 SV signals were    trans-regulatory SVs, with 47% and 53% suppressing and    promoting gene expression, respectively. These results further    indicate the important and complex regulatory role of SVs in    gene expression.  <\/p>\n<p>    We adopted the casecontrol GWAS strategy30,31 to identify SVs    associated with different morphotypes of B. oleracea    (Methods). Using the    cauliflower\/broccoli accessions characterized by large arrested    inflorescences as the case group, we obtained 1,655 SV signals    with P<8.161045,    representing the top 5% signals (Fig. 5a). These SVs were    assigned to 492 SV genes (SV in gene bodies or 3kb flanking    regions), of which 378 were expressed, harboring 122    suppression and 109 promotion SVs. One suppression SV    (P=1.5410108; 112bp) was    located 643bp upstream of the translation start site of the    gene BoPNY (PENNYWISE; Fig. 5b), which functions in    maintaining inflorescence meristem identity and floral whorl    morphogenesis32. This SV was    under strong negative selection in the arrested inflorescence    morphotype, being present in 2% (4 of 195) of    cauliflower\/broccoli accessions, contrasting to a presence of    89% (386 of 434) of control group accessions (Fig.    5c). More importantly,    BoPNY was significantly higher expressed    (P=3.00103) in the absence    genotypes (the major allele in cauliflower\/broccoli) than in    the presence genotypes (Fig. 5d). The methylation    levels of both the presence SV and its flanking sequences were    significantly (P=8.55106)    higher than that of the absence genotype, which was negatively    associated with the transcription level of BoPNY (Fig.    5e). We also identified    two promotion SVs located closest to gene BoCKX3.    Cytokinin oxidase (CKX) catalyzes the degradation of cytokinin    and thus negatively regulates cell proliferation of    plants33. Mutants of    ckx3 and its ortholog ckx5 form more cells and    organs become larger34. One SV (SV1;    P=5.8110162) involved a    316-bp Helitron-type TE insertion located 86bp downstream of    the translation stop site of BoCKX3 (Fig. 5f). SV1 was present in    97% (208 of 214) of the cauliflower\/broccoli accessions,    contrasting to only 0.2% (1 of 431) of accessions in the    control group (Fig. 5g). The other SV (SV2,    257bp) was located in last exon of BoCKX3, resulting in    a frame-shift mutation. SV2 was present in only 0.5% (1 of 213)    of cauliflower\/broccoli accessions, compared to 29% (126 of    434) of accessions in the control group (Fig. 5f,g). These two SVs    form four potential haplotypes of BoCKX3; however, the    haplotype containing two SVs does not exist in our B.    oleracea population (Fig. 5h). The expression of    BoCKX3 in haplotype 3 was significantly higher than in    haplotypes 1 and 2 (Fig. 5i), supporting the    expression-promoting effect of this downstream SV1.    BoCKX3 was highly expressed in leaves but not in other    organs such as the curd during curd development in    cauliflower\/broccoli (Fig. 5j). One hypothesis is    that BoCKX3 negatively regulates leaf growth, thus    saving energy for fast proliferating of curds. These examples    demonstrate the bidirectional impacts of SVs on gene    expression, specifically associated with morphotypes of    cauliflower\/broccoli.  <\/p>\n<p>            a, Manhattan plot showing the SV signals            associated with cauliflower\/broccoli (significance was            calculated by two-tailed Fishers exact test. A            Bonferroni-corrected P<0.05 was interpreted            as significant). The light red dots show the top 5%            P values and deep red dots show the top 1%            P values. b, One SV is associated with            BoPNY. c, The number of accessions with            presence or absence SV (associated with BoPNY)            genotype for broccoli\/cauliflower accessions and all            the other accessions (statistical test: two-tailed            Fishers exact test). d, Expression comparison            of BoPNY between SV presence and absence            accessions (two-sided Students t test;            centerline, median; box limits, first and third            quartiles; whiskers, 1.5 IQR). e, Sequence            methylation level around BoPNY between absence            and presence genotype groups, which is negatively            associated with the expression level of the gene.            f, Two SVs associated with BoCKX3.            g, The number of accessions with presence or            absence SV (associated with BoCKX3) genotypes            for broccoli\/cauliflower accessions and all other            accessions (statistical test: two-tailed Fishers exact            test). h, The four possible haplotype groups are            formed by two SVs. Haplotype 4 was not detected in our            population. i, Expression comparison of            BoCKX3 between the three haplotype groups            (two-sided Students t test; centerline, median;            box limits, first and third quartiles; whiskers, 1.5            IQR). j, Expression of BoCKX3 in            different tissues of cauliflower and cabbage,            highlighting high expression of this gene in leaf 2 of            cauliflower. Leaf 1 denotes fresh leaf before curd            initiation; leaf 2 denotes fresh leaf during curd            development; curd 1 denotes developing curd; curd 2            denotes mature curd. N indicates a missing value as            cabbage makes no curds.          <\/p>\n<p>    GWAS analysis was also performed using cabbage accessions as    the case group, characterized by the leafy heads (Supplementary    Note 5 and Supplementary    Figs. 11 and 12). We revealed two    promotion SVs (SV1 and SV2) located closest to BoKAN1,    which regulates leaf adaxial\/abaxial    polarity35,36,37. SV1 was    introduced by a 970-bp TE (PIF\/Harbinger) insertion, which was    under strong negative selection in cabbage accessions    (Supplementary Fig. 11b and    c), and SV2 was    introduced by a 157-bp TE (Helitron) insertion, which was also    under negative selection in cabbage accessions. Among the four    haplotypes formed by the two SVs (Supplementary Fig.    11d), BoKAN1    was significantly    (P=3.60107) lower expressed    in haplotypes 1 and 2 that lacked SV1 than in haplotypes 3 and    4 that harbored SV1 (Supplementary Fig. 11e). We also    revealed one promotion SV    (P=3.691091) located closest    to BoACS4 (Supplementary Fig. 12a), which encodes    the key regulatory enzyme involved in the biosynthesis of the    plant hormone ethylene38,39. This insertion    was under strong negative selection in cabbage (Supplementary    Fig. 12b). Expression of    BoACS4 in cabbage accessions lacking this insertion was    significantly lower    (P=1.901014) than in control    group accessions harboring the insertion (Supplementary Fig.    12c).  <\/p>\n<p>    Another interesting SV was present in all 18 ornamental kale    accessions, but absent in any other accession. This SV was a    280-bp TE (PIF\/Harbinger) insertion, located 289bp upstream of    the translation start site of a MYB TF (hereafter referred to    as BoMYBtf; Fig. 6a and b). Previously, MYB TFs    were found to be associated with purple traits in cultivars of    B. oleracea, such as kale, kohlrabi and    cabbage40. The expression    level of BoMYBtf was significantly higher in ornamental    kale than in other morphotypes (Fig. 6c), indicating that    this TE insertion was associated with the promoted expression    of BoMYBtf. TF-binding sites (that is NAC, TCP and ERF),    which were substantially enriched in promotion SVs as    aforementioned, were also found in this PIF\/Harbinger TE    sequence (Fig. 6d). We hypothesize    that these TF-binding sites, hitchhiking with the TE insertion,    are causal factors promoting the transcriptional activity of    BoMYBtf.  <\/p>\n<p>            a, One SV (PIF\/Harbinger-type TE insertion) is            associated with BoMYBtf. b, The number of            accessions with presence or absence of SV (associated            with BoMYBtf) genotypes for ornamental kale            accessions and all other accessions (statistical test:            two-tailed Fishers exact test). c, Expression            comparison of BoMYBtf between SV presence and            absence accessions (two-sided Students t test;            centerline, median; box limits, first and third            quartiles; whiskers, 1.5 IQR). d, TF-binding            elements identified in the PIF\/Harbinger insertion.            e, Schematic diagrams of reporter constructs            used for the LUC\/REN assay. The upstream sequences of            BoMYBtf from ornamental kale T18 (with TE,            1,239bp), wild B. oleracea T10 (without TE,            951bp), cabbage T20 (without TE, 968bp) and the SV            sequence (TE itself, 280bp). The empty vector was set            as mock control. The activities of these promoter            constructs are reflected by the LUC\/REN ratio            (two-sided Students t test; data are presented            as the means.d.). f, Distribution of the            PIF\/Harbinger insertion in the 27 B. oleracea            genomes. g, Boxplot showing normalized (z            score) expression of 44 syntenic gene families, with a            PIF\/Harbinger insertion within a 3kb region from the            nearest genes. The light blue and light purple            backgrounds denote these syntenic gene families with            PIF\/Harbinger insertions located within 1.5kb and            3kb to 1.5kb, respectively, of corresponding gene            members (red stars); whereas the gray dots denote their            syntenic gene members without PIF\/Harbinger insertion            (centerline, median; box limits, first and third            quartiles; whiskers, 1.5 IQR).          <\/p>\n<p>    The role of this PIF\/Harbinger TE in increasing transcription    of BoMYBtf in ornamental kale was further validated by    the luciferase reporter experiment (Fig. 6e). Briefly, the MYB    promoters of ornamental kale T18 (with TE), wild B.    oleracea T10 (without TE), cabbage JZS T20 (without TE) and    the SV (TE itself) were fused in pMini-LUC as reporters and    transfected into tobacco leaves (Methods). The    LUC\/REN ratio of mini-T18 and mini-SV was significantly higher    (P<0.05) than that of other samples, while no    significant difference was observed between mock, mini-T10 and    mini-JZS, confirming the expression promotion effect of this    PIF\/Harbinger TE. Moreover, we investigated this PIF\/Harbinger    TE across all the 27 B. oleracea genomes. We found 60    insertions located within 3kb flanking regions of genes, with    44 associated genes being expressed (Fig. 6f). When comparing    their expression among the 27 genomes, 31 genes harboring the    insertion showed higher expression levels than their    counterparts lacking the insertion, whereas this insertion in    the remaining 13 genes did not result in increased expression    (Fig. 6g). These results    further support the common transcription promotion function of    this PIF\/Harbinger TE insertion in B. oleracea genomes.  <\/p>\n<p><!-- Auto Generated --><\/p>\n<p>More:<br \/>\n<a target=\"_blank\" href=\"https:\/\/www.nature.com\/articles\/s41588-024-01655-4\" title=\"Large-scale gene expression alterations introduced by structural variation drive morphotype diversification in Brassica ... - Nature.com\" rel=\"noopener\">Large-scale gene expression alterations introduced by structural variation drive morphotype diversification in Brassica ... - Nature.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> High-quality genome assembly of representative morphotypes To construct a pan-genome that encompasses the full range of genetic diversity in B. oleracea, we analyzed the resequencing data of 704 globally distributed B. oleracea accessions covering all different morphotypes and their wild relatives (Supplementary Tables 1 and 2) <a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/transhuman-news-blog\/genome\/large-scale-gene-expression-alterations-introduced-by-structural-variation-drive-morphotype-diversification-in-brassica-nature-com\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25],"tags":[],"class_list":["post-1122312","post","type-post","status-publish","format-standard","hentry","category-genome"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1122312"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=1122312"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/1122312\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=1122312"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?post=1122312"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=1122312"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}