{"id":69069,"date":"2016-07-03T12:07:01","date_gmt":"2016-07-03T16:07:01","guid":{"rendered":"http:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/the-cost-of-sequencing-a-human-genome\/"},"modified":"2016-07-03T12:07:01","modified_gmt":"2016-07-03T16:07:01","slug":"the-cost-of-sequencing-a-human-genome","status":"publish","type":"post","link":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/transhuman-news-blog\/genome\/the-cost-of-sequencing-a-human-genome\/","title":{"rendered":"The Cost of Sequencing a Human Genome"},"content":{"rendered":"<p><p>The Cost of Sequencing a Human Genome    <\/p>\n<p>    Advances in the field of genomics over the    past quarter-century have led to substantial reductions in the    cost of genome sequencing. The underlying costs associated with    different methods and strategies for sequencing genomes are of    great interest because they influence the scope and scale of    almost all genomics research projects. As a result, significant    scrutiny and attention have been given to genome-sequencing    costs and how they are calculated since the beginning of the    field of genomics in the late 1980s. For example, NHGRI has    carefully tracked costs at    its funded 'genome sequencing centers' for many years (see    Figure 1). With the growing scale of human genetics studies and    the increasing number of clinical applications for genome    sequencing, even greater attention is being paid to    understanding the underlying costs of generating a human genome    sequence.  <\/p>\n<\/p>\n<p>    Accurately determining the cost for sequencing a given    genome (e.g., a human genome) is not simple. There are    many parameters to define and nuances to consider. 
In fact, it is difficult to cite precise genome-sequencing cost figures that mean the same thing to everyone because different researchers, research institutions, and companies typically track and account for such costs in different ways. <\/p>\n<p>A genome consists of all of the DNA contained in a cell's nucleus. DNA is composed of four chemical building blocks, or \"bases\" (abbreviated G, A, T, and C), and the biological information encoded within DNA is determined by the order of those bases. Diploid organisms, like humans and all other mammals, contain duplicate copies of almost all of their DNA (i.e., pairs of chromosomes, with one chromosome of each pair inherited from each parent). The size of an organism's genome is generally considered to be the total number of bases in one representative copy of its nuclear DNA. For diploid organisms (like humans), that corresponds to the sum of the sizes of one chromosome from each pair. <\/p>\n<p>Genome sizes differ widely among organisms. For example, the genome of E. coli (a bacterium that lives in your gut) is ~5 million bases (~5 megabases), that of a fruit fly is ~123 million bases, and that of a human is ~3,000 million bases (~3 billion bases). There are also some surprising extremes, such as the loblolly pine tree, whose genome is ~23 billion bases in size, over seven times larger than ours. The cost to sequence a genome naturally depends on its size. The discussion below focuses on the human genome; keep in mind that a single 'representative' copy of the human genome is ~3 billion bases in size, whereas a given person's actual (diploid) genome is ~6 billion bases in size. <\/p>\n<p>Genomes are large and, at least with today's methods, their bases cannot be 'read out' in order (i.e., sequenced) end-to-end in a single step. 
Rather, to sequence a genome, its DNA must first be broken into smaller pieces, and each resulting piece is then subjected to chemical reactions that allow the identity and order of its bases to be deduced. The base order derived from each piece of DNA is often called a 'sequence read,' and the resulting set of sequence reads (often numbering in the billions) is then computationally assembled to deduce the sequence of the starting genome. Sequencing of human genomes is now aided by the availability of 'reference' sequences of the human genome, which play an important role in the computational assembly process. Historically, the process of breaking down genomes, sequencing the individual pieces of DNA, and then reassembling the individual sequence reads to generate a sequence of the starting genome was called 'shotgun sequencing' (although this term is used less frequently today). When an entire genome is being sequenced, the process is called 'whole-genome sequencing.' See Figure 2 for a comparison of human genome sequencing methods during the time of the Human Genome Project and circa 2016. <\/p>\n<p>An alternative to whole-genome sequencing is the targeted sequencing of part of a genome. Most often, this involves sequencing just the protein-coding regions of a genome, which reside within DNA segments called 'exons' and represent the currently 'best understood' part of most genomes. For example, all of the exons in the human genome (the human 'exome') together account for ~1.5% of the total human genome. Methods are now readily available to experimentally 'capture' (or isolate) just the exons, which can then be sequenced to generate a 'whole-exome sequence' of a genome. Whole-exome sequencing does require extra laboratory manipulations, so a whole-exome sequence costs more than ~1.5% of a whole-genome sequence. 
But since much less DNA is sequenced, whole-exome sequencing is (at least currently) cheaper than whole-genome sequencing. <\/p>\n<p>Another important driver of the costs associated with generating genome sequences is data quality. That quality depends heavily on the average number of times each base in the genome is actually 'read' during the sequencing process. During the Human Genome Project (HGP), the typical levels of quality considered were: (1) 'draft sequence' (covering ~90% of the genome at ~99.9% accuracy); and (2) 'finished sequence' (covering >95% of the genome at ~99.99% accuracy). Producing truly high-quality 'finished' sequence by this definition is very expensive, largely because the process of 'sequence finishing' is very labor-intensive. In fact, most human genome sequences produced today are 'draft sequences' (sometimes above and sometimes below the accuracy defined above). <\/p>\n<p>There are thus a number of factors to consider when calculating the costs associated with genome sequencing. Genome sequences come in different types and quality levels, and the process itself can involve many steps and activities. Understanding the true cost of a genome sequence therefore requires knowing what was and was not included in calculating that cost (e.g., sequence data generation, sequence finishing, upfront activities such as mapping, equipment amortization, overhead, utilities, salaries, data analyses, etc.). In practice, what gets included when estimating genome-sequencing costs often differs from one situation to another. 
<\/p>\n<p>Below is summary information about: (1) the estimated cost of sequencing the first human genome as part of the HGP; (2) the estimated cost of sequencing a human genome in 2006 (i.e., roughly a decade ago); and (3) the estimated cost of sequencing a human genome in 2016 (i.e., the present time). <\/p>\n<p>The HGP generated a 'reference' sequence of the human genome; specifically, it sequenced one representative version of all parts of each human chromosome (totaling ~3 billion bases). In the end, the 'finished' sequence was of very high quality, with an estimated error rate of <1 in 100,000 bases, which is much more accurate than a typical human genome sequence produced today. The generated sequence did not come from one person's genome, and, being a 'reference' sequence of ~3 billion bases, really reflects half of what is generated when an individual person's ~6-billion-base genome is sequenced (see below). <\/p>\n<p>The HGP involved first mapping and then sequencing the human genome. Mapping was required at the time because there was otherwise no 'framework' for organizing the actual sequencing or the resulting sequence data. The maps of the human genome served as 'scaffolds' on which to connect individual segments of assembled DNA sequence. These genome-mapping efforts were quite expensive, but were essential at the time for generating an accurate genome sequence. The costs associated with the 'human genome mapping phase' of the HGP are difficult to estimate, but they were certainly in the tens of millions of dollars (and probably in the hundreds of millions of dollars). <\/p>\n<p>Once significant human genome sequencing began for the HGP, a 'draft' human genome sequence (as described above) was produced over a 15-month period (from April 1999 to June 2000). 
The estimated cost for generating that initial 'draft' human genome sequence is ~$300 million worldwide, of which NIH provided roughly 50-60%. <\/p>\n<p>The HGP then proceeded to refine the 'draft' and produce a 'finished' human genome sequence (as described above), which was achieved by 2003. The estimated cost for advancing the 'draft' human genome sequence to the 'finished' sequence is ~$150 million worldwide. Of note, generating the final human genome sequence by the HGP also relied on the sequences of small targeted regions of the human genome that were generated before the HGP's main production-sequencing phase; the costs associated with these various other genome-sequencing efforts cannot be estimated precisely, but they likely total in the tens of millions of dollars. <\/p>\n<p>The above explanation illustrates the difficulty of arriving at a single, accurate number for the cost of generating that first human genome sequence as part of the HGP. Such a calculation requires a clear delineation of what does and does not get 'counted' in the estimate; further, most of the cost estimates for individual components can only be given as ranges. At the lower bound (~$300 million for the 'draft' sequence plus ~$150 million for finishing, together with at least tens of millions of dollars each for mapping and for the earlier targeted sequencing), this cost figure appears to be at least $500 million; at the upper bound, it could be as high as $1 billion. The truth is likely somewhere in between. <\/p>\n<p>The above estimated cost for generating the first human genome sequence by the HGP should not be confused with the total cost of the HGP. The originally projected cost for the U.S.'s contribution to the HGP was $3 billion; in actuality, the Project took less time (~13 years rather than ~15 years) and required less funding (~$2.7 billion). But the latter number represents the total U.S. 
funding for a wide range of scientific activities under the HGP's umbrella beyond human genome sequencing, including technology development, physical and genetic mapping, model organism genome mapping and sequencing, bioethics research, and program management. Further, this amount does not reflect the additional funds for an overlapping set of activities pursued by other countries that participated in the HGP. <\/p>\n<p>As the HGP was nearing completion, genome-sequencing pipelines had stabilized to the point that NHGRI was able to collect fairly reliable cost information from the major sequencing centers funded by the Institute. Based on these data, NHGRI estimated that the hypothetical 2003 cost to generate a 'second' reference human genome sequence using the then-available approaches and technologies was in the neighborhood of $50 million. <\/p>\n<p>Since the completion of the HGP and the generation of the first 'reference' human genome sequence, efforts have increasingly shifted to generating genome sequences from individual people. Sequencing an individual's 'personal' genome involves establishing the identity and order of ~6 billion bases of DNA (rather than a ~3-billion-base 'reference' sequence; see above). Thus, generating a person's genome sequence is a notably different endeavor from what the HGP did. <\/p>\n<p>Within a few years of the end of the HGP (e.g., in 2006), the landscape of genome sequencing was beginning to change. Although revolutionary new DNA sequencing technologies, such as those in use today, had not yet been widely implemented at that time, genomics groups continued to refine the basic methodologies used during the HGP and to lower the costs of genome sequencing. 
Considerable effort was being devoted to sequencing nonhuman genomes (much more so than human genomes), but the cost-accounting data collected at that time can be used to estimate the approximate cost that would have been associated with human genome sequencing then. <\/p>\n<p>Based on data collected by NHGRI from the Institute's funded genome-sequencing groups, the cost to generate a high-quality 'draft' human genome sequence had dropped to ~$14 million by 2006. Hypothetically, it would likely have cost upwards of $20-25 million to generate a 'finished' human genome sequence: expensive, but still considerably less than the cost of generating the first reference human genome sequence. <\/p>\n<p>The decade following the HGP brought revolutionary advances in DNA sequencing technologies that are fundamentally changing the nature of genomics. So-called 'next-generation' DNA sequencing methods arrived on the scene, and their effects quickly became evident in the form of lower genome-sequencing costs (note that the cost data collected by NHGRI are 'retrospective' in nature and do not always accurately reflect the 'projected' costs of genome sequencing going forward). <\/p>\n<p>In 2015, the most common routine for sequencing an individual's human genome involved generating a 'draft' sequence and comparing it to a reference human genome sequence in order to catalog all sequence variants in that genome; such a routine does not involve any sequence finishing. In short, nearly all human genome sequencing in 2015 yielded high-quality 'draft' (but unfinished) sequence. That sequencing is typically targeted to all exons (whole-exome sequencing) or aimed at the entire ~6-billion-base genome (whole-genome sequencing), as discussed above. 
The quality of the resulting 'draft' sequences depends heavily on the average base redundancy provided by the generated data (i.e., how many times, on average, each base is read), with higher redundancy costing more. <\/p>\n<p>Adding to the complex landscape of genome sequencing in 2015 has been the emergence of commercial enterprises offering genome-sequencing services at competitive prices. Direct comparisons between commercial and academic genome-sequencing operations can be particularly challenging because of the many nuances in what each includes in its cost estimates (with such details often not revealed by private companies). The cost data that NHGRI collects from its funded genome-sequencing groups include information about a wide range of activities and components, such as: reagents, consumables, DNA-sequencing instruments, certain computer equipment, other equipment, laboratory pipeline development, laboratory information management systems, initial data processing, submission of data to public databases, project management, utilities, other indirect costs, labor, and administration. Note that such cost accounting does not typically include activities such as quality assurance\/quality control (QA\/QC), alignment of generated sequence to a reference human genome, sequence assembly, genomic variant calling, or annotation. Companies almost certainly vary in which of the items in the above lists they include in any cost estimate, making direct cost comparisons with academic genome-sequencing groups difficult. It is thus important to consider these variables, along with the distinction between retrospective and projected costs, when comparing genome-sequencing costs claimed by different groups. Anyone comparing costs for genome sequencing should also be aware of the distinction between 'price' and 'cost': a given price may be either higher or lower than the actual cost. 
<\/p>\n<p>Based on the data collected from NHGRI-funded genome-sequencing groups, the cost to generate a high-quality 'draft' whole human genome sequence in mid-2015 was just above $4,000; by late 2015, that figure had fallen below $1,500. The cost to generate a whole-exome sequence was generally below $1,000. Commercial prices for whole-genome and whole-exome sequences have often (but not always) been slightly below these numbers. <\/p>\n<p>Innovation in genome-sequencing technologies and strategies does not appear to be slowing. As a result, one can reasonably expect continued reductions in the cost of human genome sequencing. The key factors to consider when assessing the 'value' associated with an estimated cost for generating a human genome sequence will likely remain largely the same; in particular, these are the amount of the genome sequenced (whole genome versus exome), the sequence quality, and the associated data analysis (if any). With new DNA-sequencing platforms anticipated in the coming years, the nature of the generated sequence data and the associated costs will likely continue to change. As such, continued attention will need to be paid to the way in which the costs associated with genome sequencing are calculated. <\/p>\n<p>Last Updated: June 6, 2016 <\/p>\n<p><!-- Auto Generated --><\/p>\n<p>Read more from the original source:<br \/>\n<a target=\"_blank\" href=\"https:\/\/www.genome.gov\/27565109\/the-cost-of-sequencing-a-human-genome\/\" title=\"The Cost of Sequencing a Human Genome\">The Cost of Sequencing a Human Genome<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p> The Cost of Sequencing a Human Genome Advances in the field of genomics over the past quarter-century have led to substantial reductions in the cost of genome sequencing. 
The underlying costs associated with different methods and strategies for sequencing genomes are of great interest because they influence the scope and scale of almost all genomics research projects. As a result, significant scrutiny and attention have been given to genome-sequencing costs and how they are calculated since the beginning of the field of genomics in the late 1980s <a href=\"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/transhuman-news-blog\/genome\/the-cost-of-sequencing-a-human-genome\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[25],"tags":[],"class_list":["post-69069","post","type-post","status-publish","format-standard","hentry","category-genome"],"_links":{"self":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/69069"}],"collection":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/comments?post=69069"}],"version-history":[{"count":0,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/posts\/69069\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/media?parent=69069"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/c
ategories?post=69069"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euvolution.com\/prometheism-transhumanism-posthumanism\/wp-json\/wp\/v2\/tags?post=69069"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}