{"id":3146,"date":"2012-10-03T21:18:34","date_gmt":"2012-10-03T21:18:34","guid":{"rendered":"http:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/a-map-of-rice-genome-variation-reveals-the-origin-of-cultivated-rice\/"},"modified":"2012-10-03T21:18:34","modified_gmt":"2012-10-03T21:18:34","slug":"a-map-of-rice-genome-variation-reveals-the-origin-of-cultivated-rice","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/transhuman-news-blog\/genome\/a-map-of-rice-genome-variation-reveals-the-origin-of-cultivated-rice\/","title":{"rendered":"A map of rice genome variation reveals the origin of cultivated rice"},"content":{"rendered":"<p><p>Sampling and sequencing    <\/p>\n<p>    The cultivated and wild rice accessions were all from large    collections of rice accessions preserved at the China National    Rice Research Institute in Hangzhou, China, and the National    Institute of Genetics in Mishima, Japan. The accessions were    selected on the basis of the germplasm database records of    phenotypic data and sampling localities to maximize genetic and    geographic diversity. The collection was maintained by selfing    in the laboratories. For each accession, genomic DNA from a    single plant was used for sequencing, and seeds derived from    the same plant were used for following field trials. In total,    the genomes of 1,083 O. sativa accessions, 446 O.    rufipogon accessions and 15 accessions of outgroup species    were sequenced on the Illumina Genome Analyzer IIx generating    73-bp (or 117-bp) paired-end reads, each to approximately    onefold (for O. sativa accessions), twofold (for O.    rufipogon accessions) or threefold (for outgroup species)    coverage. The detailed information, including geographic origin    and sequencing coverage of the rice accessions, was listed in        Supplementary Tables 2, 7 and 8. Library construction and    sequencing of these accessions were performed as    described21. One    representative accession of O. rufipogon, W1943, was    sequenced on the Illumina HiSeq2000, generating 100-bp    paired-end reads with 100-fold genome coverage. An    amplification-free method of library preparation49 was used in    deep sequencing of the rice accession, which reduced the    incidence of duplicate sequences, thus facilitating genome    assembly and variation analysis.  <\/p>\n<p>    The paired-end reads of all the rice accessions were aligned    against the rice reference genome (IRGSP 4.0) using the    software Smalt (version 0.4) with the parameters of -pair 50,    700 and -mthresh 50. SNPs were called using the Ssaha Pileup    package (version 0.5) with detailed procedure described    previously21. Genotypes of    the rice accessions, including 1,083 O. sativa    accessions, 446 O. rufipogon accessions and 15    accessions of outgroup species, were further called at the SNP    sites from the Ssaha Pileup outputs. The genotype calls in 15    accessions of outgroup species were used to determine the    ancestral states of SNPs in O. sativa and O.    rufipogon. SNPs in coding regions, which were defined based    on the gene models in the RAP-DB (release 2), were then    annotated to be synonymous or non-synonymous for calculating    the non-synonymous\/synonymous ratio and dN\/dS    ratio. The genotype data set of the 1,529 rice accessions    (1,083 O. sativa accessions and 446 O. rufipogon    accessions) was generated on the basis of the calls in each    rice accession. Seven sets of genome sequences, which included    bacterial-artificial-chromosome-based Sanger sequences and    high-coverage resequencing data, were used to assess the    accuracy of the genotype data sets (Supplementary    Table 6). The wild rice accessions with sequencing coverage    >9 were selected to investigate the heterozygosity based on    the overlapped reads that were aligned onto the reference    sequence. For each accession, the proportion of heterozygosity    genotypes was calculated at the polymorphic sites (Supplementary    Table 3).  <\/p>\n<p>    The software Haploview was used to calculate linkage    disequilibrium with default settings, using SNPs with    information in 446 O. rufipogon accessions50. Pairwise    r2 was calculated for all the SNPs and then    averaged across the whole genome. The matrix of pairwise    genetic distance derived from simple SNP-matching coefficients    was used to construct phylogenetic trees using the software    PHYLIP51 (version    3.66). The software TreeView and MEGA5 were used for    visualizing the phylogenetic trees. Principal component    analysis of the SNPs was performed using the software    EIGENSOFT52. The sequence    diversity statistics () and the    population-differentiation statistics (FST)    were computed using a 100-kb window. The value of     was calculated for each group    in O. rufipogon and O. sativa, respectively, and    the ratio of  in the full    population (or each clade) of O. rufipogon to that in    the full population (or corresponding subspecies) of O.    sativa was used to detect selective sweeps. The genomic    regions where both O. rufipogon and O. sativa    show a low level of genetic diversity were excluded for further    analysis. To adopt appropriate thresholds to reduce the    false-positive rate but also retain true selection signals,    thresholds were chosen on the basis of both whole-genome    permutation tests and signals at known loci. Permutation tests    were performed to estimate the genome-wide type I error rate    and determine the threshold to call selective sweeps (see        Supplementary Information section 2 for    details)53. The method    cross-population extended haplotype homozygosity (XP-EHH) was    also tested for detecting selective sweeps using the software    xpehh54 (<a href=\"http:\/\/hgdp.uchicago.edu\/Software\/\" rel=\"nofollow\">http:\/\/hgdp.uchicago.edu\/Software\/<\/a>)    (Supplementary    Fig. 20). The genetic distance between two clades was    computed based on the matrix of pairwise genetic distance,    where the distance of all pairs of accessions from the two    clades were retrieved and averaged. A custom Perl script was    developed to plot all O. rufipogon accessions, using the    public geographic information of world borders from the    Thematic Mapping data set (version 0.3). The computational    simulations under different demographic scenarios were    performed using the program SFS_CODE40.  <\/p>\n<p>    For the O. rufipogon population, approximately five    seeds for each accession from the collection of wild rice were    germinated and planted in the experimental field (in Sanya,    China at N 18.65, E 109.80) from March 2011. The leaf sheath    colour was observed and scored directly and the tiller angle    was measured for each plant. The mapping population of 210    backcross inbred lines (BILs) and 61 chromosome segment    substitution lines (CSSLs) was derived from a cross between    O. sativa ssp. indica cv. Guangluai-4 and O.    rufipogon accession W1943. The BILs were developed by one    generation of backcross to Guangluai-4 followed with six    generations of self-fertilization. The CSSLs were developed by    five generations of backcross to Guangluai-4 followed with    three generations of self-fertilization. Phenotyping was    conducted in the experimental field (in Shanghai, China at N    31.13, E 121.28) from May to October, 2011. The fifteen    traits that we phenotyped for this study include germination    rate, tiller angle, heading date, stigma colour, the degree of    stigma exsertion, plant height, panicle length, the degree of    shattering, awn length, grain number per panicle, grain length,    grain width, grain weight per 1,000 grains, hull colour and    pericarp colour. The degree of stigma exsertion was scored    based on the observation of ~20 randomly sampled spikelets of    each line, on a scale of 13 (no, incomplete or complete    exertion). Seed germination rate was measured by using mature    seeds which were placed in a plastic Petri dishes kept at    30C in the    dark for 48h5. Other traits    were phenotyped and scored as described previously21, 22, 29.  <\/p>\n<p>    For the genotype data set in O. rufipogon, genotypes of    446 O. rufipogon accessions were called specifically at    the ~5 million SNP sites that were polymorphic in the O.    rufipogon population. In the panel for GWAS, only the SNPs    that have a minor allele frequency (MAF) of more than 5% and    contain genotype calls of more than 100 accessions were left    for subsequent imputation. The k-nearest neighbour    algorithm-based imputation method was used for inferring    missing calls21. The    specificity of the genotype data set before and after    imputation was assessed using three sets of genome sequences    (Supplementary    Table 6). Association analysis was conducted using the    compressed mixed linear model55. The top five    principle components were used as fixed effects and the matrix    of genetic distance was used to model the variancecovariance    matrix of the random effect. Permutation tests were used to    define the threshold of association signals of the GWAS in the    wild rice population. A total of 20 permutation analyses were    performed (10 independent permutation tests for each of the two    traits, sheath colour and tiller angle), which resulted in two    association signals with the thresholds we set53. Hence, there    were an average of 0.1 false positives (that is, totally two    false positives in 20 permutation tests) in a single    whole-genome scanning analysis. Simulation tests were used to    compare the performance of GWAS between the populations of    cultivated and wild rice.  <\/p>\n<p>    Genomic DNA of each line in the mapping population was    sequenced on the Illumina Genome Analyzer IIx, each to    approximately 0.5 coverage. Both parents of the population,    Guangluai-4 and W1943, were sequenced with at least 20 genome    coverage, in a previous work21 and in this    study, respectively. SNP identification between parents was    conducted as described previously41. Genotype    calling, recombination breakpoint determination and bin map    construction was performed using the software SEG-Map (<a href=\"http:\/\/www.ncgr.ac.cn\/software\/SEG\/\" rel=\"nofollow\">http:\/\/www.ncgr.ac.cn\/software\/SEG\/<\/a>).    QTL analysis of the fifteen traits was conducted with the    composite interval mapping (CIM) method implemented in the    software Windows QTL Cartographer56 (version 2.5)    with a window size of 10cM and a step size of 2cM. QTL with LOD value    higher than 3.5 were called, of which the location was    described according to its LOD peak location. The phenotypic    effect (r2) of each QTL was computed using    Windows QTL Cartographer. QTLs located within selective sweep    regions were further used to associate the selected regions    with their biological functions. It needs to be noted that we    adopted a stringent threshold in the QTL calling    (LOD>3.5), and the genomic regions with LOD    ranging from 2.5 to 3.5 may include many minor QTLs (the    threshold was set to 2.5 in most studies).  <\/p>\n<p>    The genome of W1943 was assembled by using a custom pipeline    integrating Phusion2 (clustering the raw reads into different    groups)57 and Phrap    (then assembling all the reads in each group to generate    contigs)58. The N50    length of the entire assembly was calculated for the initial    contigs with small contigs of <200bp excluded. All the    full-length complementary DNA sequences46 of W1943    were aligned with the final assembly of W1943 genome sequence    using the software GMAP59 (version 6)    with the parameters -K 15000 and -k 0.97. The resulting    contigs from whole-genome de novo assembly were anchored    to the rice reference genome sequence (IRGSP4.0) using the    software MUMmer60 (version 3).  <\/p>\n<p>    Gene models of the genome of the wild rice W1943 were predicted    using the software Fgenesh that was set for a monocot    model61 (version    2.0). The resulting proteome of W1943 was compared with protein    sequences in Rice Genome Annotation Project (version 7.0) using    BLASTP with a cutoff of a minimum of 95% identity. Sequence    variants, including SNPs, indels and imbalanced substitutions,    were called using the diffseq program in the EMBOSS    package62. Indels of    large size were called from the alignment results of MUMmer.    Effects of the sequence variants were predicted according to    the gene models of Nipponbare in the RAP-DB (release 2) across    the rice genome. For indels in genic regions and SNPs with    large effect around the domestication loci, the effects were    mainly based on the reference gene models.  <\/p>\n<p>    The sequence reads of 1,083 O. sativa accessions, 446    O. rufipogon accessions and 15 outgroup accessions were    then aligned against assembled genome sequences of W1943 using    the same parameters with those against the reference Nipponbare    genome sequences. Genotypes of each accession were called at    all sequence variant sites (including SNPs, indels and    imbalanced substitutions that were detected from assembled    sequences), based on the alignment outputs against the two    genome sequences. The allele frequencies at the sequence    variant sites were calculated for each clade of O.    sativa and O. rufipogon. In each clade, variant    sites with information of less than 10 accessions (less than 2    for the outgroups) were then excluded for computing allele    frequencies, namely no data available.  <\/p>\n<\/p>\n<p>See more here:<br \/>\n<a target=\"_blank\" href=\"http:\/\/dx.doi.org\/10.1038\/nature11532\" title=\"A map of rice genome variation reveals the origin of cultivated rice\">A map of rice genome variation reveals the origin of cultivated rice<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> Sampling and sequencing The cultivated and wild rice accessions were all from large collections of rice accessions preserved at the China National Rice Research Institute in Hangzhou, China, and the National Institute of Genetics in Mishima, Japan.  <a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/transhuman-news-blog\/genome\/a-map-of-rice-genome-variation-reveals-the-origin-of-cultivated-rice\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25],"tags":[],"class_list":["post-3146","post","type-post","status-publish","format-standard","hentry","category-genome"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/3146"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=3146"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/3146\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=3146"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/categories?post=3146"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=3146"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}