Using DNA to store digital data is one of the most actively researched interdisciplinary fields linking information technology with biotechnology. Many strides have been made in DNA-based information storage over the years, but much remains to be done to make it a practical reality.
DNA holds considerable promise in terms of stability, high storage density and low maintenance cost, but many problems remain, including the ability to accurately rewrite digital information encoded in DNA sequences.
Generally, DNA data storage technology has two modes: the “in vitro hard disk mode” and the “in vivo CD mode.” The primary advantage of the in vivo mode is the low-cost, reliable replication of chromosomal DNA through cell replication, which makes it suitable for rapid, low-cost copying and dissemination of data. However, because the DNA sequences that encode some information contain many repeats and homopolymers, such information can only be “written” and “read,” but cannot be accurately “rewritten.”
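To illustrate why repeats and homopolymers arise, the following minimal Python sketch maps bytes to bases with a naive two-bits-per-base table and then measures the longest single-base run and checks for repeated subsequences. The mapping table and the function names are assumptions made for illustration only; this is not the coding algorithm used by the researchers.

```python
from itertools import groupby

# Hypothetical 2-bit mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def encode_naive(data: bytes) -> str:
    """Encode bytes as a DNA string, two bits per base, with no constraints."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def longest_homopolymer(seq: str) -> int:
    """Return the length of the longest run of one repeated base."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def has_repeat(seq: str, k: int) -> bool:
    """Return True if any subsequence of length k occurs more than once."""
    seen = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in seen:
            return True
        seen.add(kmer)
    return False


if __name__ == "__main__":
    encoded = encode_naive(b"\x00\x00hello")
    print(encoded, longest_homopolymer(encoded), has_repeat(encoded, 4))
```

Under this naive mapping, runs of identical bytes such as padding or blank image regions turn directly into long homopolymers and repeated subsequences, which is the kind of sequence that is hard to target unambiguously.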
To solve the rewriting problem, researchers have recently developed a dual-plasmid editing system for accurately processing digital information in a microbial vector. Their findings were published in Science Advances.
The researchers established a dual-plasmid system in vivo using a rationally designed coding algorithm and an information editing tool. This dual-plasmid system is suitable for storing, reading and rewriting various types of information, including text, codebooks and images. It fully explores the coding capability of DNA sequences without requiring any addressing indices or backup sequences. It is also compatible with various kinds of coding algorithms, thus enabling high coding efficiency. For example, the coding efficiency of the current system reaches 4.0 bits per nucleotide.
To achieve high efficiency and reliability in rewriting complex information stored in exogenous DNA sequences in vivo, the researchers used a variety of CRISPR-associated (Cas) proteins and recombinase. These tools were guided by their corresponding CRISPR RNA (crRNA) to cleave a target locus in a DNA sequence so that specific information could be addressed and rewritten. Because complementary base pairing between nucleic acid molecules is highly specific, the recombinase accurately reconstructed the information-encoding DNA sequences to encode new information. Optimizing the crRNA sequence made the rewriting tool highly adaptable to complex information, yielding a rewriting reliability of up to 94%, which is comparable to that of existing gene-editing systems.
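As a rough, string-level analogy for the address-and-rewrite idea described above, the sketch below treats a guide as a short sequence that must match exactly one site in the stored DNA string before that site is replaced. It does not model Cas cleavage, recombination or any other biochemistry, and the function names and example sequences are hypothetical.

```python
def find_sites(stored: str, target: str) -> list[int]:
    """Return the start index of every (possibly overlapping) match of target."""
    sites = []
    start = stored.find(target)
    while start != -1:
        sites.append(start)
        start = stored.find(target, start + 1)
    return sites


def rewrite_locus(stored: str, target: str, replacement: str) -> str:
    """Replace target with replacement only if target addresses a unique site."""
    sites = find_sites(stored, target)
    if len(sites) != 1:
        # Zero matches: nothing to address. Several matches: ambiguous address.
        raise ValueError(f"target matches {len(sites)} sites, expected exactly 1")
    pos = sites[0]
    return stored[:pos] + replacement + stored[pos + len(target):]


if __name__ == "__main__":
    print(rewrite_locus("AACGTTGCA", "CGTT", "TTAA"))
    # A repetitive sequence gives an ambiguous address and is rejected.
    try:
        rewrite_locus("ACACAC", "ACAC", "GGGG")
    except ValueError as err:
        print("rejected:", err)
```

The uniqueness check mirrors, in a simplified way, why sequences with many repeats are difficult to rewrite accurately: a target that matches several loci cannot be addressed unambiguously.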
The dual-plasmid system can serve as a universal platform for DNA-based information rewriting in vivo, offering a new strategy for information processing and target-specific rewriting of large and complicated data at the molecular level.