A supergene underlies linked variation in color and morphology in a Holarctic songbird – Nature.com

Posted: November 28, 2021 at 9:52 pm

To evaluate population structure in redpolls, we sequenced genomes of 73 individuals from the three described redpoll ecotypes (Supplementary Data File1). Our results from whole genomes confirm findings from a previous study using a reduced-representation approach (ddRAD-seq)18: redpolls lack population genetic structure by either geography or ecotype boundaries (Fig.1b), with spatially explicit clustering analyses supporting K=2, and failing to group all individuals according to their species classification. In addition, principal component analysis (PCA) (Supplementary Fig.1) of 25 million single nucleotide polymorphisms (SNPs) further reveal that PC1 explains only 3.14% of total genomic variation across all ecotypes and the majority of their global distribution. However, both PCA and population assignment analyses nonetheless indicate some degree of genetic clustering (Fig.1b, Supplementary Fig.1). PC1 visually separated samples into three clusters, with a left-most cluster containing both lesser and common redpolls, a right-most cluster containing almost entirely hoary redpolls, and a central cluster containing a mix of both common and hoary redpolls. However, many localities were recovered in all three groups, suggesting no influence of geography on genetic structure (Supplementary Fig.2). Because neither geography nor ecotype were perfectly assigned to clusters, we were interested in identifying the genomic regions responsible for generating these clusters, and in investigating their potential evolutionary impacts.

To identify divergent regions of the genome in redpolls, we aligned sequences to a brown-capped rosy-finch (Leucosticte australis) reference genome and searched for local peaks of differentiation between PCA clusters by calculating FST and dXY in 25-kb windows across all chromosomes including all ecotypes. These scans identified a highly differentiated region across 55Mb of chromosome 1 (Fig.2a, c, Supplementary Fig.3). Rerunning PCA and population assignment analyses after the removal of this chromosome either eliminated, or reduced, the variation explained (Supplementary Figs.1 and 4), demonstrating the strong contribution of this region to total genetic differentiation in redpolls. Further, conducting a PCA using only chromosome 1 qualitatively produced much stronger definition in the three clusters originally identified (Fig.2e). Within-group heterozygosity of the middle cluster for the highly differentiated region (0.626) was roughly double that of the outside clusters (0.388 and 0.378 for left and right clusters, respectively), suggesting that the PCA groups represent three possible inversion genotypes. We hereafter refer to these putative genotypes as AA, AB, and BB in left to right order across PC1. We do not distinguish between the ancestral and inverted haplotyes, and use the term inversion to refer to the inversion region rather than the inverted haplotype.

a Chromosome 1 SNPs significantly associated with phenotype (black dots) using mixed model analysis in GEMMA with an alpha of 1105 to correct for multiple comparisons. Numbers indicate the two most significant SNP associations outside (1,2) and inside (3,4) the inversion region and correspond to genotypes in (b) for those specific SNPs. Lightest yellow cells indicate individuals homozygous for the reference allele, darkest indicate homozygous for the alternative allele, intermediate indicate heterozygotes, and white are missing data. c Pi (black and gray lines for AA and AB individuals, respectively) and dXY (green line) in 50-kb windows for the first 80Mb of chromosome 1. d LD calculated as r2 comparing chromosome 1 (black line) to 4 of the other largest chromosomes (shades of gray) and LD within the supergene (black) to the rest of chromosome 1 (gray). e PCA of chromosome 1 with a 40% minor allele frequency cutoff showing three main clusters along PC1. Taxa colors correspond to Fig.1. All sequences were aligned to a brown-capped rosy-finch (Leucosticte australis) reference genome. Source data are provided as Source data file.

Broadly, the pattern of divergence recovered here is consistent with a large pericentric chromosomal inversion10,19,20, including abrupt changes in FST corresponding to the inversion breakpoints, and a central spike at the centromere (Fig.2a, c). Reduced recombination within an inversion is expected to produce patterns of elevated linkage disequilibrium (LD), along with a decrease in nucleotide diversity along the inversion in homozygotes. These patterns are both confirmed here, including a within-cluster decrease in homozygote (AA and BB) nucleotide diversity (), and elevated LD within the inversion when both compared to regions outside the inversion and along other chromosomes (Fig.2c, d). To further characterize the inversion, we selected one individual each from the AA and BB genotype groups to resequence using Oxford Nanopore Technologies MinION long-read sequencing. Structural variant calling with SVIM v1.4.221 identified an inversion extending from 18.9 to 75Mb along chromosome 1; however, overall number of reads supporting the variant call was low due to the size of the inversion and low yield from the MinION runs.

Because redpolls overlap extensively in distribution, species identification is made primarily on the basis of a suite of morphological characters, including plumage coloration (extent of brown and red pigments), bill size and shape, and body size. Transitioning from the AA, to AB, to BB genotype also broadly mirrors a transition in phenotype from dark to light plumage coloration, where the AA genotype is associated with dark plumage, BB is associated with light plumage, and AB is intermediate. Mason and Taylor18 paired phenotypic measurements of plumage and bill morphology with gene expression data to reveal a strong, linear correlation between gene expression and morphology (see ref. 18, Fig.3a). Superimposing inversion genotype on this relationship for the same individuals reveals that inversion homozygotes form the extremes of these categories, while the single heterozygote forms an intermediate (Fig.3a). Although sample size in this comparison is small, it provides strong independent evidence that the chromosome 1 inversion plays a large role in redpoll morphology, and that phenotypic variation may be additive with respect to inversion haplotype copy number.

a Phenotype PC1 and gene expression data (as mds PC1) from Mason and Taylor (2015) colored by inversion genotype, with extreme phenotypes produced by homozygotes shown in gold and pink, and an intermediate phenotype produced by a heterozygote shown in green. b Latitude of sampling site for each individual grouped by inversion genotype demonstrating B haplotypes increase in frequency with latitude (AA n=37, AB n=7, BB n=28). Box hinges represent the first and third quartiles, with centers representing medians. Whiskers represent maximum and minimum values except for BB, where outliers are values exceeding 1.5 times the interquartile range. Source data are provided as Source data file.

In total, we identified 498 annotated genes within the chromosome 1 inversion region. While all genes within the inversion are likely to be linked through the suppression of recombination, and thus could be contributing equally to phenotype, we nonetheless attempted to narrow down candidate gene regions in order to infer which biological processes and pathways were potentially influenced by the inversion and identify associated regions elsewhere in the genome. To do so, we applied two approaches: (1) by compiling a list of genes that fell within the highest FST peaks, and (2) by identifying SNPs significantly associated with species classification using a genome-wide efficient mixed model analysis (GEMMA)22. While species identity does not perfectly correlate with redpoll phenotype because they exhibit continuous phenotypic variation, the fact that species classification relies almost entirely on morphology makes it a reasonable proxy for total phenotypic variation. Finally, we annotated missense mutations within the identified genes based on a variants location with respect to open reading frames using SNPeff v4.323. Our results suggest that the vast majority of SNPs associated with phenotypic variation in redpolls are within or close to the inversion: 99% of 20,443 SNPs significantly associated with redpoll phenotype were located on chromosome 1, with only 167 located elsewhere in the genome (Supplementary Fig.3). To evaluate the reliability of these SNPs in predicting phenotype, we used a Bayesian sparse linear mixed model in a leave-one-out cross validation framework. Predicted phenotypes suggest that allelic variation of the identified SNPs explain a significant proportion of the observed phenotypic variation (R2=0.79; Supplementary Fig.5).

We filtered annotations for genes that either contained, or were adjacent to, significant SNPs as identified by GEMMA or FST outlier analysis, resulting in 322 genes across 7 chromosomes (Supplementary Data File2). Within this gene set, the gene ontology category of biological regulation was overrepresented. While this category is broad and difficult to interpret meaningfully, we note that a number of genes on chromosomes 1 and 2 identified by our analysis had annotations that either relate to coloration or bird bill development or have been implicated in coloration or bird bill development in previous studies (Table1).

Within the chromosome 1 inversion region, some of the most differentiated and significantly associated regions include key genes relating to melanin synthesis: TYR, TYRP2, FZD4, TSKU, FSTL124,25,26,27,28,29. Both TYR and TYRP2 produce melanogenic enzymes that directly synthesize melanin. In addition, FZD4 produces a G protein-coupled receptor in the Wingless-type signaling pathway, which acts as one of the main pathways affecting the regulation of MITF29,30. Previous studies of gene expression in redpolls18 also reported differential gene expression of FZD3, suggesting that Frizzled family receptors may play a significant role in further modulating melanogenesis in this group.

Redpoll phenotype also varies in the amount of red feather coloration resulting from carotenoid pigmentation. Carotenoid pigments are unique in animals in that they cannot be synthesized endogenously and must instead be taken in through their diet before they can be deposited in feathers. Previous studies of genes involved in carotenoid pigmentation in birds highlight the role of two scavenger receptor genes (SCARB131, SCARF232). The proteins produced by these genes likely function in the recognition of the lipoproteins that transport the hydrophobic carotenoid pigments. We identified two genes (ATP8A2, STARD13) within the inversion region that may also be related to carotenoid pigmentation through their involvement in lipid transport. Specifically, STARD13 produces a stAR-related lipid transfer protein, which as a protein family, are involved in intracellular lipid transport, metabolism, and cell signaling events33. While further validation studies are required to understand the role of these genes in carotenoid variation, their functions appear to be in line with other recently reported genes associated with carotenoid pigmentation.

Two additional genes within the chromosome 1 inversion region that could be affecting phenotype are well-characterized: TSKU and FSTL1 are known antagonists of bone morphogenic protein (BMP) signaling24,26. However, the effects of BMP inhibition may influence phenotype in at least two disparate ways: through the regulation of melanogenesis, or by contributing to differences in bill morphology. BMPs are regulators with important roles in epidermal homeostasis and hair follicle growth and pigmentation34. Specifically, BMP4 and BMP6 products have both been demonstrated as inhibiting or stimulating melanogenesis, respectively34. However, other studies have also implicated BMP4 in the development of bird bill morphology35,36. For example, studies of BMP4 in Darwins Finches find strong correlations of BMP4 expression with both bill depth and width35, two traits known to vary in redpolls18,37. Similar to Frizzled, TSKU was also shown to be differentially expressed in redpolls18. We therefore emphasize the observed differences in TSKU and FSTL1 documented here could influence biologically important phenotypic variation in redpoll coloration, bill morphology, or both. Given the implication of BMP4 in multiple pathways affecting different phenotypes, there could be pleiotropic effects resulting from one or more loci altering BMP signaling. Taken together, these candidate loci provide evidence that multiple aspects of redpoll phenotype are likely affected by a single genomic region maintaining associated SNPs from numerous genes in tight physical linkage.

While nearly all associated SNPs with gene annotations were within the chromosome 1 inversion region, three additional genes containing, or neighboring, associated SNPs may also have important phenotypic effects. Two of theseFILIP1L (chromosome 1 but outside of the inversion region), and SFRP4 (chromosome 2)act as regulators in the WNT pathway, suggesting they likely play roles in further modulating melanogenesis38,39. Similar to TSKU and FSTL1, SFRP4 has also been demonstrated to regulate BMP, further emphasizing the possibility of singular or joint effects on plumage coloration and bill morphology.

A third locus outside of the inversion region near an associated SNP on chromosome 2 includes a polyketide synthase (PKS). While this gene was annotated based on similarity to Mycobacterium PKS15/1, its function in birds has yet to be fully validated. However, its synteny with RAB18, and YME1L1, suggests homology with a PKS described in budgerigars (Melopsittacus undulatus)40. Functional validation through yeast-based expression demonstrated that the budgerigar PKS plays a critical role in the accumulation of red/yellow, parrot-specific pigments known as psittacofulvins. The association of PKS with redpoll phenotype indicates that it might play a similar role for organisms that contain carotenoids instead of psittacofulvins. While this requires further investigation, PKSs have been demonstrated elsewhere as important in animal pigment biosynthesis41.

Broadly, redpoll phenotype appears to function as a balanced polymorphism resulting from a 55-Mb inversion that affects plumage coloration and bill morphology. Genetic associations that include loci outside of the inversion region suggest that phenotype is likely modulated further by several independent gene regions to generate the varied forms seen across all redpoll ecotypes. Examination of genotypes at SNPs associated with redpoll morphology (Fig.2b) suggest that the inversion region primarily separates the hoary redpoll from both the common and lesser redpolls, while the additional associations with other genomic regions separate the lesser redpoll from both the hoary and common redpolls. These results demonstrate that the chromosome 1 inversion contains multiple, linked genetic elements that together affect a suite of phenotypic traits in redpolls, providing evidence that redpoll phenotype is broadly controlled by a supergene genetic architecture11. As lesser redpolls form the darkest and smallest end of the redpoll phenotype distribution, the associated SNPs located outside the inversion may also be additive with respect to overall phenotype. Given the range restriction and more extreme phenotype of the lesser redpoll, there is less opportunity for disassortative mating, and its unclear how the derived SNPs outside of the inversion that further modulate phenotype interact with the B inversion haplotype. A previous study of an avian supergene in white-throated sparrows (Zonotrichia albicollis)10 demonstrated that one of the supergene haplotypes in sparrows had likely introgressed from a closely related species. However, topology weighting across windows of the redpoll supergene favored a topology that included a sister relationship between the two haplotypes, with a combined average weight of 54% among the three topologies that included this sister relationship (Supplementary Fig.6), providing evidence that the redpoll supergene likely evolved within the redpoll lineage42.

Considerable theoretical attention has recently been given to the evolution and degradation of supergenes43,44. A primary consequence of supergene-bearing inversions is increased mutational load12,44 due to the difficulty of purging deleterious mutations in the absence (or severe reduction) of recombination. This simple scenario could result in a balanced polymorphism stemming from associative overdominance, where inversion heterozygotes perform best because heterozygosity masks some of the deleterious mutations12. However, redpolls heterozygous for the inversion appear to occur in fewer numbers than homozygotes (7/73 samples in this study), suggesting an alternative mechanism may be responsible for maintaining the polymorphism. Given the presence of all three inversion genotypes in redpolls, no combination of the supergene appears to be lethala finding in contrast to other recently described supergenes of similar size9,45. Because there is no lethal inversion genotype, recombination likely occurs regularly in homozygotes (and possibly at low levels in heterozygotes), potentially allowing for some purging of deleterious mutations. This could have a considerable influence on the maintenance of the variation, and the evolutionary consequences of this supergene.

Understanding the effects of the redpoll supergene, and the forces responsible for its maintenance, is difficult. In the absence of selection (imposed by the environment, or through mate choice), the supergene would function as a single locus with one of the haplotypes eventually becoming fixed or lost due to drift12. Even with some selection, high levels of migration (between inversion genotypes) would swamp out any loci contributing to local adaptation. The persistence of the redpoll supergene is therefore likely dependent on both selection and migration. One scenario that is supported by field data is that the supergene remains balanced through assortative mating. Redpolls often mate assortatively46, but, intermediates and mixed pairs have also been observed from multiple localities37,47. Thus, the strictness of assortative mating may vary depending on the locality or may relax during irruptive population years47. Relaxation of mate choice and mixed pairings would produce the intermediate number of inversion heterozygotes seen in our data, and ultimately maintain the supergene as a stable polymorphism. However, this scenario alone does not provide an explanation for the maintenance of latitudinal differences between ecotypes. Furthermore, no hybrid zone has ever been documented in redpolls, which would be expected under a strict assortative mating scenario. While regions of hybridization have been suggested in places like Iceland, where high color variation exists48,49, previous genetic studies have not recovered support for this hypothesis50.

The phenotypes produced by the supergene are likely subject to environmentally mediated selection: notably, the more northerly distributed redpoll ecotype demonstrates features associated with high-latitude adaptation in other bird species (e.g., whiter color, smaller bill)51,52. Despite including some individuals sampled during the non-breeding season (n=27), we are able to detect differences in latitude by inversion genotype group, with B haplotypes significantly more common at higher latitudes (Fig.3b). This pattern holds when examining only breeding season birds. While it is plausible that an alternative locus is affecting ecotypic distribution, the overall low levels of background genetic variation reflect ongoing gene flow within this system. This pattern could instead reflect incomplete lineage sorting and recent divergence times, however, tests for introgression using an ABBA-BABA framework detected a significant signal of gene flow among redpoll taxa (D=0.0027, p=0.0003). Gene flow among ecotypes would be expected to disrupt linkage between any latitude-associated loci and phenotype through recombination unless those loci were tightly linked as in an inversion. In light of the link between the redpoll supergene and phenotype and differences in breeding distribution between ecotypes18, the supergene may impart local adaptation to the environment. However, given the detection of inversion heterozygotes and the presence of gene flow, the inversion likely does not influence reproductive isolation. Thus, redpolls appear to function as a single species harboring ecotypic variation, rather than as three distinct species.

To explore the evolutionary conditions under which the observed pattern of the inversion polymorphism can remain balanced, we used the program SLiM53 to simulate data under two spatial models of evolution informed by the aspects described above (Fig.4a, b). Both models simulated 100-kb chromosomes, including a 50-kb inversion that contributed to phenotype, in diploid individuals54. The first model included one population with spatially varying selection along the y-axis to approximate differences in fitness for a particular inversion genotype by latitude. In addition, we included assortative mating as determined by an adjustable parameter, and spatial competition such that individuals surrounded by fewer individuals in space received an increase in fitness. The second model also included spatially varying selection but considered two ecotypes as two populations with gene flow. We then varied the strength of selection and the strength of assortative mating or migration parameters and quantified (1) whether or not a simulation resulted in a stable polymorphism, and (2) the inversion genotype ratios that were produced. We compared these ratios to the inversion genotype ratio in redpolls captured by our sampling. These simulations revealed that the strength of assortative mating, or amount of migration, played a larger role in the balancing of the inversion polymorphism than selection did at the levels tested (Fig.4c, d; Supplementary Table1). Regardless of the strength of selection, weak assortative mating or high migration invariably led to the loss of an inversion haplotype due to drift. Strong assortative mating or low levels of migration did maintain both haplotypes but failed to produce inversion heterozygotes. Further, all levels of selection produced spatial stratification of inversion genotypes along the selection gradient.

a, b 1000 diploid individuals were randomly placed in (x,y) coordinate space across one population in model 1 and two populations in model 2, respectively. Phenotype was additive, determined by inversion genotype, with blue squares representing BB individuals, green squares represent AA individuals, and cyan squares represent AB individuals. The strength of selection was determined for all individuals by a combination of the difference between phenotype and position along the y-axis (dotted line), and by a varied selection parameter. The strength of assortative mating or gene flow (solid arrow) was included in model 1 and model 2, respectively, and was varied across iterations. Model 1 also included a fitness adjustment for individuals based on the number of other individuals close by (solid circle). c, d Results from model 1 and model 2, respectively, show the similarity between the genotype ratios produced by simulation and genotype ratio recovered in our sampling of redpolls, where red represents the greatest similarity, gray represents the least similarity, and white represents simulations that failed to produce a stable polymorphism. Scale bar numbers represent the difference between the simulated and empirical genotype ratios with sign indicating either greater or fewer simulated heterozygotes. Numbers inside a cell indicate that only a portion of the 50 simulation iterations produced a stable polymorphism.

While these models are relatively simple and only represent two possibilities, they provide a starting point for further exploration of the complex dynamics that affect the maintenance of supergenes. For example, in redpolls, these simulations suggest that even very weak selection can produce the spatial variation seen among ecotypes, and that some relaxation of assortative mating is likely occurring, as has been proposed for populations in Canada and Alaska (USA)37 and Norway47.

As whole-genome sequences proliferate, an emerging body of literature is providing empirical evidence of intraspecific variation maintained through inversion polymorphism9,10,14,15,55. The maintenance of redpoll ecotypes via an inversion across an environmental gradient places redpolls within this growing number of species. In some cases, such as monkeyflowers (Mimulus guttatus)14, inversion polymorphisms may confer sex-specific effects, and can be maintained within a population through a balance of positive and negative fitness interactions. In other cases, though not exclusive to sex-specific effects, inversions may affect phenotypes related to local adaptation, and species distributed across a heterogeneous environment may retain an inversion polymorphism through spatially varying selection, as suggested here for redpolls. This has recently been demonstrated in seaweed flies (Coelopa frigida)15 and deer mice (Peromyscus maniculatus)55. In addition, studies of Drosophila have reported clinal variation in multiple survival traits controlled by an inversion as a result of spatially varying selection across latitude56. These findings in Drosophila highlight the need for further investigation into the selection pressures and fitness effects of the inversion we report in redpolls.

We provide evidence from whole-genome sequence data of loci associated with redpoll plumage coloration and bill morphology contained within a ~55-Mb inversion supergene. While some authorities classify redpolls as three separate species (e.g., ref. 57), we find no evidence of genome-wide population genetic structure consistent with current taxonomy. Instead, we provide evidence that the suite of morphological traits used to describe redpoll species differences are linked within the identified supergene. The presence of all possible inversion genotypes suggests there are no lethal supergene combinations and indicate that while these traits are likely involved in local adaptation, they are not involved in reproductive isolation. Though breeding distributions vary latitudinally, even minor levels of contemporary gene flow within broad areas of sympatry likely maintain these traits as stable polymorphisms. Manipulations involving common garden experiments, or aviary crosses will help elucidate the strength of selection and may reveal additional unknown genetic interactions with the supergene that are affecting the evolution of redpolls.

With the explosive growth in the number of sequenced genomes and increasing sophistication of analytical tools, detecting structural variants or complex genetic architectures is likely to become common. The large size and high gene content of these classes of variants may in some cases translate to large evolutionary effects. While key theoretical work continues to emerge, the further exploration of empirical patterns between phenotype and supergenes will provide useful insight into the evolutionary effects of similar genetic architectures for a wide range of organisms.

See the rest here:
A supergene underlies linked variation in color and morphology in a Holarctic songbird - Nature.com

Related Posts