A Hacked Database Prompts Debate about Genetic Privacy

Experts urge transparency and new regulations to protect DNA donors

Linking a human genome in an anonymous sequencing database to its real-world counterpart wasn't supposed to be possible.

Yaniv Erlich, a geneticist at the Massachusetts Institute of Technology's Whitehead Institute for Biomedical Research, apparently never got the memo. In the end, all that he and M.I.T. undergraduate student Melissa Gymrek needed to decipher the identity of 50 individuals whose DNA is available online in free-access databases was a computer and an Internet connection.

Erlich and Gymrek selected 32 male genomes from the 1000 Genomes Project, which has a publicly accessible database designed to help researchers find genes associated with different human diseases. Next, Erlich and Gymrek used an algorithm to extract genetic markers from the DNA sequences. The algorithm is specially designed to home in on short tandem repeats on a man's Y chromosome, known as Y-STRs. Y-STRs are passed patrilineally with little to no change from one generation to the next, so they provide a way to link an anonymous genome to a particular family surname.
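To make the idea of a short tandem repeat concrete, here is a minimal Python sketch that counts how many times a short motif repeats back to back in a DNA string. It is an illustration only, not the researchers' algorithm, which works on real sequencing data and is far more involved. The function name `longest_tandem_run`, the motif and the toy sequence are all invented for this example.

```python
import re


def longest_tandem_run(sequence, motif):
    """Return the largest number of back-to-back copies of `motif` in `sequence`.

    Hypothetical helper for illustration; returns 0 if the motif never occurs.
    """
    # Match one or more consecutive copies of the motif.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Toy sequence (made up): one run of 11 copies of GATA and a shorter run of 2.
toy_sequence = "ACGT" + "GATA" * 11 + "TTCA" + "GATA" * 2 + "CC"
repeat_count = longest_tandem_run(toy_sequence, "GATA")
print(repeat_count)  # 11
```

The repeat count at a given marker is the value that gets compared between men; a set of such counts across several markers forms a Y-STR profile.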

Using metadata about the anonymous genomes included in the database, the researchers narrowed the field of possible DNA matches down to 10,000 men of a particular age who resided in Utah when they donated their DNA. Erlich and Gymrek then plugged the genomes' Y-STR markers into two of the Web's most popular genealogy sites, Ysearch and SMGF. These recreational sites provide free access to databases that connect Y-STR markers to surnames. The researchers found that eight of their samples strongly matched the surnames of Mormon families in Utah. Erlich and Gymrek's findings were published in the January 17 issue of Science.
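The surname lookup described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the method used by Ysearch, SMGF or the researchers: the records, surnames and repeat counts are invented, the functions `marker_distance` and `candidate_surnames` are hypothetical, and the mismatch threshold is arbitrary. Only the marker names (DYS19, DYS390, DYS391, DYS393) are real Y-STR loci.

```python
from collections import Counter

# Made-up genealogy records: each maps Y-STR marker names to repeat counts.
GENEALOGY_RECORDS = [
    {"surname": "Smith", "markers": {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}},
    {"surname": "Smith", "markers": {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}},
    {"surname": "Jones", "markers": {"DYS19": 15, "DYS390": 22, "DYS391": 10, "DYS393": 12}},
]


def marker_distance(profile_a, profile_b):
    """Count shared markers whose repeat counts differ.

    Returns None when the two profiles have no markers in common.
    """
    shared = profile_a.keys() & profile_b.keys()
    if not shared:
        return None
    return sum(1 for marker in shared if profile_a[marker] != profile_b[marker])


def candidate_surnames(query, records, max_mismatches=1):
    """Rank surnames by how many records lie within `max_mismatches` of the query."""
    hits = Counter()
    for record in records:
        distance = marker_distance(query, record["markers"])
        if distance is not None and distance <= max_mismatches:
            hits[record["surname"]] += 1
    return hits.most_common()


# Made-up profile standing in for markers extracted from an anonymous genome.
anonymous_profile = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
ranked = candidate_surnames(anonymous_profile, GENEALOGY_RECORDS)
print(ranked)  # [('Smith', 2)]
```

A surname on its own points to a family rather than a person. In the study, it was the combination of a likely surname with metadata such as age and state of residence that narrowed the candidates further.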

The results show that a curious party equipped with open-access information can not only tie a three-billion-digit-long genome directly to an individual, but can also use bits and pieces of that same DNA to identify distant relatives, male or female, of the original genetic donor. "If your fourth cousin participated in this database, we could use it to find out about your ancestry," Erlich says.

Whereas privacy concerns about publicly accessible genome data have cropped up in the past with genealogy databases, this is the first time that anyone has connected an anonymous DNA sequence to its donor without donor DNA as a reference.

Genome mining could have serious consequences for DNA donors. Under federal law, health insurance companies cannot use genetic data, but there is currently nothing barring companies from using a person's genome to define life insurance policies or determine long-term disability care. The new research prompted the National Institutes of Health (NIH) to hide people's ages in federally funded genetic databases, such as the 1000 Genomes Project, that allow open access to scientists.

Yet the NIH's strategy may be missing the point, says Lawrence Gostin, a professor of medicine at Georgetown University and director of the World Health Organization's Collaborating Center on Public Health Law and Human Rights. "This is not a long-term solution to the problem because in reality there is nothing more personally identifiable than your genome," he says.
