DNA will store all our data
DNA was first isolated in 1869, and its molecular structure remained unresolved until 1953, when the paired, helical structure was determined. Human DNA has about 3 billion base pairs arranged in 23 pairs of chromosomes; a rough draft of how these are arrayed was available in June 2000, and the complete sequencing of the 23 pairs of chromosomes was announced on April 14, 2003. That was a $3B project, and now you can spend under C$100, spit into a test tube, and get results within a few weeks. That’s a lot of personal data.
Viewed as data, each chromosome is made up of genes, which are sections of DNA. Every section is built from a simple alphabet of C, G, A and T (the four nucleotide bases: cytosine, guanine, adenine and thymine). Further, A bonds with T, and C bonds with G. There are fairly simple rules for how the letters of this small, restricted alphabet can combine into “words” and how those words can combine into “paragraphs”, which are the genes everyone speaks of. In another article, I’ll discuss DNA genealogy, some of the research already underway, and areas where there are interesting gaps or good projects for undergraduates.
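To make that pairing rule concrete, here is a minimal sketch in Python (the function name is mine, purely for illustration) that computes the complementary strand implied by the A-T and C-G bonding rules:

```python
# Pairing rules: A bonds with T, C bonds with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the strand that would bond base-by-base with the input."""
    return "".join(PAIR[base] for base in strand.upper())

print(complement("ACGTTAGC"))  # -> TGCAATCG
```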
One way to look at DNA is simply as a highly efficient, encodable and self-repairing biological storage device. The data is densely packed, permanent, very low energy, available for random access, and standardized. Or it could be, once the technical challenges are solved and costs are reduced. This is already being marketed as an emerging technology.
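As a rough illustration of what “encodable” means, here is a sketch assuming the simplest possible mapping of two bits per base; real DNA-storage schemes use more elaborate codes that add error correction and avoid long runs of the same base, so treat this as a toy, not the actual technique in use:

```python
# Toy encoding: 2 bits per base (00->A, 01->C, 10->G, 11->T).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Pack each byte into four bases, two bits at a time."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Reverse the mapping back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```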
Microsoft is working with the University of Washington on a research project. They predict that 100 exabytes (about 10% of the entire Internet) would fit in a shoebox. There’s a good TED-Ed talk that explains more about how this works. It might be a little slow-paced for some, but it is thorough and very easy to understand.
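For a sense of scale, here is a back-of-envelope calculation, assuming the idealized two bits per base from the sketch above and ignoring error-correction, indexing and synthesis overhead:

```python
# How many bases would 100 exabytes need at an idealized 2 bits per base?
EXABYTE = 10**18                       # bytes
data_bits = 100 * EXABYTE * 8          # 100 EB expressed in bits
bases_needed = data_bits // 2          # 2 bits per base
GENOME_BASES = 3 * 10**9               # rough human genome size, as above

print(f"{bases_needed:.2e} bases")                               # 4.00e+20
print(f"{bases_needed / GENOME_BASES:.1e} human-genome-equivalents")  # ~1.3e+11
```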
It looks like the technology is going to have at least another decade of runway to keep up with the exponentially increasing rate of data acquisition. The real challenges are going to be processing the data (processor performance, while growing exponentially, is on a much slower growth curve) and moving the data (for very large datasets, network transfers are easily beaten by copying to storage media and physically shipping them).
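To see why shipping can beat the network, here is a quick comparison; the link speed and courier time are assumptions I picked for illustration, not measurements:

```python
# Compare transfer times: 1 PB over a 1 Gbps link vs. a two-day courier.
PETABYTE_BITS = 10**15 * 8             # 1 PB in bits
link_bps = 1 * 10**9                   # assumed 1 Gbps sustained throughput
network_days = PETABYTE_BITS / link_bps / 86_400

print(f"Network: ~{network_days:.0f} days")      # ~93 days
print("Courier: ~2 days, regardless of size")
```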
When storage is effectively infinite and very, very cheap, what changes? I’m going to save that topic for a future post or two - maybe one utopian viewpoint and one dystopian. We need to think these things through before they’re a fait accompli.