The Race to Build a Search Engine for Your DNA

Photo: Andrew Brookes/Corbis

In 2005, next-generation sequencing began to change the field of genetics research. Obtaining a person's entire genome became fast and relatively cheap. Databases of genetic information were growing by the terabyte, and doctors and researchers were in desperate need of a way to efficiently sift through the information for the cause of a particular disorder or for clues to how patients might respond to treatment.

Companies have sprung up over the past five years that are vying to produce the first DNA search engine. All of them have different tactics (some even have their own proprietary databases of genetic information), but most are working to link enough genetic databases so that users can quickly identify a huge variety of mutations. Most companies also craft search algorithms to supplement the genetic information with relevant biomedical literature. But as in the days of the early Web, before Google reigned supreme, a single company has yet to emerge as the clear winner.

Making a functional search engine is a classic big-data problem, says Michael Gonzalez, the vice president of bioinformatics at one such company, ViaGenetics, which was expected to relaunch its platform in March. Before doctors or researchers can use genomic data, it must be organized so that humans can read and search it. The first step toward that is to put it in a standard form called the variant call format, or VCF. As raw data, a person's complete sequenced genome would take up about 100 gigabytes, so a database that adds the genomes of even 10 patients per day would quickly get out of hand. But VCF files are more compact, requiring only a few hundred megabytes per genome, which helps researchers find the specific variants they want to search in a fraction of the time. Unlike a fully sequenced genome, a VCF file records only where a person's genetic data deviates from the standard: the genome originally compiled by the Human Genome Project in 2001.
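To make the idea concrete, here is a minimal Python sketch of how a VCF file might be read and searched. It relies only on the format's standard tab-separated columns (chromosome, position, ID, reference allele, alternate allele). The sample records, the helper names `parse_vcf`, `find_variants`, and `yearly_storage_tb`, and the 300-megabyte figure used for "a few hundred megabytes" are illustrative assumptions, not anything used by the companies described here.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Variant(NamedTuple):
    chrom: str  # chromosome name, e.g. "1"
    pos: int    # 1-based position on the reference genome
    ref: str    # allele found in the reference genome
    alt: str    # allele observed in this person


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per alternate allele in each VCF data line."""
    for line in lines:
        line = line.rstrip("\n")
        # Lines starting with '#' are metadata or the column header.
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.split("\t")[:5]
        # A single record may list several comma-separated alternate alleles.
        for allele in alt.split(","):
            yield Variant(chrom, int(pos), ref, allele)


def find_variants(variants: Iterable[Variant], chrom: str,
                  start: int, end: int) -> List[Variant]:
    """Return the variants on `chrom` whose position lies in [start, end]."""
    return [v for v in variants if v.chrom == chrom and start <= v.pos <= end]


def yearly_storage_tb(gb_per_genome: float, genomes_per_day: int,
                      days: int = 365) -> float:
    """Storage added per year, in terabytes (1 TB taken as 1000 GB)."""
    return gb_per_genome * genomes_per_day * days / 1000


# Made-up records for illustration only; not real patient data.
SAMPLE_VCF = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t12345\t.\tA\tG\t50\tPASS\t.
1\t67890\t.\tC\tT,G\t99\tPASS\t.
2\t555\t.\tG\tA\t30\tPASS\t.
"""

if __name__ == "__main__":
    variants = list(parse_vcf(SAMPLE_VCF.splitlines()))
    for v in find_variants(variants, "1", 60000, 70000):
        print(v)
    print(yearly_storage_tb(100, 10))   # raw genomes, 10 patients per day
    print(yearly_storage_tb(0.3, 10))   # assumed 300 MB per VCF file
```

Under these assumptions, 10 raw genomes a day add 365 terabytes a year, while the same patients stored as VCF files add roughly 1.1 terabytes. A production system would index variants by position rather than scanning a list, but the scan shows why a file that holds only the differences from the reference is so much quicker to search.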

With VCF, sifting the genomes themselves for pinpoint mutations isn't the challenge for search engine companies. Most of these companies are allocating their resources toward efforts to seamlessly compile supplementary information about a specific mutation from other databases across the Web, such as the biomedical research archive PubMed or various troves of electronic medical records. Many of these tools have finely tuned algorithms that prioritize the results by credibility or relevance. "You want to be able to pull together the information known about a mutation in that position [of the genome] and quickly make an assessment," says David Mittelman, the chief scientific officer for Tute Genomics, based in Provo, Utah, another company designing a genetic-search engine.

In an effort to expand the information that can be attached to a genome under examination, ViaGenetics, based in Miami Beach, Fla., is making its newly updated platform useful for researchers who want to collaborate across institutions. "With ViaGenetics' tools, researchers can make their data available to other users, so other people can come across these projects, request access, and form a collaboration," Gonzalez says. "It helps people connect the dots between different researchers and institutions." This is especially helpful for smaller labs that may not have very extensive genome databases or for researchers from different universities working to decode the same mutation.

Although the genomic-search industry is now focused on serving scientists, that might not always be the case. Mittelman envisions that Tute Genomics could eventually serve consumers directly. People are already demanding information about their genomes just to understand themselves better, Mittelman says, but most companies don't yet consider the average person to be their primary customer. In order to make that shift, the tool will have to be even more intuitive and user-friendly. "Fire-hosing someone with data that's not easy to interpret, or using terminology that's not standardized, has the potential to confuse people," he says. Privacy is also a major concern for the average user; the information that Tute users upload isn't stored permanently, Mittelman says, but users will need extra reassurance if the platform becomes available to the lay public.

And a further evolution of the industry is in the offing. Both ViaGenetics and Tute are hoping to be able to run the entire process in-house, from the initial DNA sequencing to the presentation of final searchable results to users. "The market for analyzing and interpreting genomic data is very fragmented, like the computer industry in the 1990s, where you had to go to separate providers to buy a video card or a motherboard and then try to put it together," Mittelman says. Soon this field will consolidate, as the computer industry did.

This article originally appeared in print as "A Google for DNA."
