DNA Data Storage

Researchers code a book into DNA, demonstrating the possibility of using the biological molecule for long-term data storage.

Coding messages into DNA was first demonstrated in the 1980s, but the technology of the time allowed only a single graphical symbol to be encoded. Capacity has grown over the three decades since, yet the largest project to date, completed in 2010, stored just 7,920 bits of data, roughly half a page of typed text. Using a novel technique detailed today in Science, researchers at Harvard and Johns Hopkins Universities have now encoded a 53,000-word book into DNA, along with 11 JPG images and one JavaScript program.

"Others have pointed out that DNA has certain advantages," said study co-author Sriram Kosuri. "But no one had really taken it to a level that we were able to code really useful amounts of information."

Those advantages include information density: one estimate of maximum capacity predicts that a single gram of single-stranded DNA could hold as much as an exabyte (10^18 bytes) of data. However, both synthesizing and sequencing DNA are error-prone. Synthetic DNA typically contains one incorrect nucleotide in every 70, and next-generation sequencing can introduce further mistakes when the stored data are read back.
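To see why a 1-in-70 error rate matters, here is a rough calculation. It is a sketch under simplifying assumptions: errors are taken to strike each base independently, and the strand length of 115 bases is a hypothetical figure derived from the 96 data bits plus 19 address bits per strand, at one bit per base, ignoring any extra flanking sequence a real strand would carry. Under those assumptions only about one copy in five comes out error-free.

```python
# Assumed per-base synthesis error rate, using the figure quoted in the text
# (roughly one incorrect nucleotide in every 70).
ERROR_RATE = 1 / 70


def fraction_error_free(length: int, error_rate: float = ERROR_RATE) -> float:
    """Probability that a strand of `length` bases has no errors at all,
    assuming each base fails independently (a simplification)."""
    return (1 - error_rate) ** length


# Hypothetical payload length: 96 data bits + 19 address bits, one bit per base.
payload_bases = 96 + 19
share = fraction_error_free(payload_bases)
print(f"{payload_bases} bases: {share:.1%} of copies expected to be error-free")
```

Because most individual copies carry at least one mistake, reading any single strand is unreliable, which is what motivates the redundancy described below.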

To keep such errors in check, the team encoded only one bit per base, letting either A or C stand for 0 and either G or T stand for 1. Because every bit can be written with either of two bases, the encoder is free to avoid sequences that are hard to synthesize or read, such as long runs of a single base. The manuscript and its accompaniments (a draft of Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, a book co-authored by one of the study's authors, George Church) were converted to HTML and then translated into a stream of 0s and 1s that could be written into DNA. The resulting stream was 5.27 megabits long, or 5.27 million 0s and 1s.
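The one-bit-per-base mapping can be sketched in a few lines of Python. The A/C = 0 and G/T = 1 assignment comes from the text; the rule for choosing between the two bases (never repeat the previous base) is a simplified, hypothetical stand-in for whatever constraints the researchers actually applied, and the function names are illustrative only.

```python
# One bit per base: A or C encode 0, G or T encode 1.
ZERO_BASES = "AC"
ONE_BASES = "GT"


def text_to_bits(text: str) -> str:
    """Turn text (e.g. HTML) into '0'/'1' characters, 8 bits per UTF-8 byte."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_dna(bits: str) -> str:
    """Write each bit as one base. Of the two allowed bases, pick one that
    differs from the previous base, so no base is ever repeated back to back
    (a simplified stand-in for avoiding hard-to-read sequences)."""
    bases = []
    previous = ""
    for bit in bits:
        options = ZERO_BASES if bit == "0" else ONE_BASES
        base = options[0] if options[0] != previous else options[1]
        bases.append(base)
        previous = base
    return "".join(bases)


def dna_to_bits(dna: str) -> str:
    """Read each base back as a bit; both bases of a pair decode the same way."""
    return "".join("0" if base in ZERO_BASES else "1" for base in dna)


def bits_to_text(bits: str) -> str:
    """Regroup bits into bytes and decode them as UTF-8."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


message = "<p>Regenesis</p>"
dna = bits_to_dna(text_to_bits(message))
assert bits_to_text(dna_to_bits(dna)) == message
print(dna[:16])
```

The design trades density for robustness: DNA could in principle carry two bits per base, but spending the second "choice" on sequence constraints makes the strands easier to write and read.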

Previous methods ran into trouble trying to write an entire stream as one long DNA sequence, a tricky and expensive process. The team's solution was to split the stream into short sections. Each short strand, called an oligonucleotide, carried 96 bits of data plus a 19-bit address recording where that section belongs in the overall stream. Each oligonucleotide was also synthesized many times, so that on reading, the copies could be compared and a consensus sequence reached.
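The block-and-address scheme can be sketched as follows. The 96-bit payload and 19-bit address sizes come from the text; placing the address before the payload and zero-padding the last block are assumptions made for this illustration, and the function names are hypothetical.

```python
import random

DATA_BITS = 96      # payload bits per oligonucleotide, per the text
ADDRESS_BITS = 19   # address bits per oligonucleotide, per the text


def split_into_blocks(bits: str) -> list[str]:
    """Cut the stream into 96-bit blocks, each prefixed with a 19-bit address.
    Address-first layout and zero padding are assumptions of this sketch."""
    blocks = []
    for index, start in enumerate(range(0, len(bits), DATA_BITS)):
        if index >= 2 ** ADDRESS_BITS:
            raise ValueError("stream too long for a 19-bit address space")
        payload = bits[start:start + DATA_BITS].ljust(DATA_BITS, "0")
        address = format(index, f"0{ADDRESS_BITS}b")
        blocks.append(address + payload)
    return blocks


def reassemble(blocks: list[str], total_bits: int) -> str:
    """Blocks come back unordered; sort by address, rejoin, drop padding."""
    ordered = sorted(blocks, key=lambda block: int(block[:ADDRESS_BITS], 2))
    stream = "".join(block[ADDRESS_BITS:] for block in ordered)
    return stream[:total_bits]


bits = "".join(random.choice("01") for _ in range(1000))
blocks = split_into_blocks(bits)
random.shuffle(blocks)  # strands are read back in no particular order
assert reassemble(blocks, len(bits)) == bits
print(len(blocks), "blocks of", len(blocks[0]), "bits")
```

At 96 bits per block, the 5.27-megabit stream needs roughly 55,000 oligonucleotides, comfortably within the 2^19 (524,288) addresses a 19-bit field can label.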

"It's similar in the way that when you sequence the human genome, you don't sequence it once, you sequence it at 30 or 50 times coverage, and you just take consensus at each position," said Kosuri.
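Taking "consensus at each position" amounts to a per-position majority vote. The sketch below assumes the reads of one block have already been grouped by address and aligned to the same length, and that they contain only substitution errors; real pipelines must also cope with insertions and deletions, which this hypothetical `consensus` helper does not handle.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Majority vote at each position across aligned reads of one block.
    On a tie, Counter.most_common returns the symbol encountered first."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must be aligned to the same length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


reads = [
    "ACGTAC",
    "ACGTAC",
    "ACTTAC",  # one substitution error
    "ACGTAA",  # a different error
    "ACGTAC",
]
print(consensus(reads))  # ACGTAC
```

Because independent errors rarely land on the same position in most copies, the majority at each position recovers the intended base even when no single read is perfect.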

After the DNA was synthesized and attached to microarray chips, it was stored at 4 degrees Celsius for three months, then dissolved in water, amplified by PCR, and sequenced. By storing multiple copies and sequencing each copy many times to reach a consensus, the team decoded the entire 5.27-million-bit sequence with only 10 bit errors.
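For a sense of why more copies drive errors down, the chance that a majority vote gets a single bit wrong follows from the binomial distribution. This is an idealized model, not the study's analysis: it assumes each read flips the bit independently with probability p (borrowing the 1-in-70 synthesis figure as a stand-in), ignores lost or correlated strands, and overcounts somewhat, since with A/C = 0 and G/T = 1 a substitution such as A to C leaves the bit unchanged. For comparison, 10 errors in 5.27 million bits is about 1.9 × 10^-6 per bit.

```python
from math import comb


def majority_error(p: float, copies: int) -> float:
    """Chance that a majority vote over `copies` independent reads of one
    bit is wrong, when each read flips the bit with probability p.
    Requires an odd number of copies so there are no ties."""
    if copies % 2 == 0:
        raise ValueError("use an odd number of copies")
    needed = copies // 2 + 1  # wrong reads needed to outvote the right ones
    return sum(
        comb(copies, k) * p**k * (1 - p) ** (copies - k)
        for k in range(needed, copies + 1)
    )


p = 1 / 70  # assumed per-read bit error, borrowed from the synthesis figure
for copies in (1, 3, 5, 7, 9):
    print(copies, f"{majority_error(p, copies):.2e}")
```

Under these assumptions, five copies already push the per-bit error below 1 in 10,000, and each additional pair of copies cuts it further.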

"They've come up with a very clever way of managing error in the creation of the information," said synthetic biologist Steven Benner at the Foundation for Applied Molecular Evolution, who was not involved in the study. "[The authors] provide some clever ways to get around the problems, allowing the reading of the minority molecules containing the desired information amid the larger numbers of molecules that do not."

DNA storage is not rewritable and is not intended to replace your hard drive, but storing large amounts of data in a very small space over the long term has advantages for archiving records and data. Unlike a flat disc such as a CD, where data are inscribed only on the surface, a sheet of DNA stores data throughout its thickness. The major remaining challenge is the cost and efficiency of today's synthesis and sequencing technologies, which currently make the system impractical for regular use. As sequencing costs continue to fall and the technologies advance, however, such DNA storage strategies may soon become much more practical.
