Coding over Sets for DNA Storage

Andreas Lenz
PhD Candidate, Technical University of Munich (TUM)
Date: Mar. 8, 2019 / Time: 1:15pm / Room: Packard 202

Abstract

DNA-based storage is a novel technology in which digital information is stored in synthetic DNA molecules. Recent advances in DNA sequencing methods and falling sequencing costs have paved the way for storage methods based on DNA. The natural stability of DNA molecules (the genetic information from fossils is maintained over tens of thousands of years) motivates their use for long-term archival storage. Furthermore, because the information is stored at the molecular level, such storage systems have extremely high data densities. Recent experiments report data densities of 2 PB/gram, which corresponds to the capacity of a thousand conventional hard disk drives in one gram of DNA.

In this talk, we present error-correcting codes for the storage of data in synthetic DNA. We investigate a storage model where data is represented by an unordered set of M sequences, each of length L. Errors within that model are the loss of whole sequences and point errors inside the sequences, such as insertions, deletions, and substitutions. We derive Gilbert-Varshamov lower bounds and sphere-packing upper bounds on the achievable cardinalities of error-correcting codes within this storage model. We further propose explicit code constructions that can correct errors in such a storage system and that can be encoded and decoded efficiently. Comparing the sizes of these codes to the upper bounds, we show that many of the constructions are close to optimal.
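
The storage model can be illustrated with a minimal Python sketch. It is not the construction presented in the talk. It assumes a quaternary alphabet {A, C, G, T} and distinct stored sequences, and the names num_stored_sets and noisy_read are hypothetical helpers introduced here for illustration. The toy channel only models sequence loss and substitutions; insertions and deletions are left out for brevity.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: quaternary DNA alphabet


def num_stored_sets(M, L, q=4):
    """Count unordered sets of M distinct length-L sequences over q symbols."""
    return math.comb(q ** L, M)


def noisy_read(stored, num_lost, num_subs, rng):
    """Toy channel: lose num_lost whole sequences, then apply up to
    num_subs substitutions at random positions of the surviving sequences.
    The output is again an unordered set, so any ordering is discarded."""
    survivors = rng.sample(sorted(stored), len(stored) - num_lost)
    survivors = [list(seq) for seq in survivors]
    for _ in range(num_subs):
        if not survivors:
            break
        seq = rng.choice(survivors)
        pos = rng.randrange(len(seq))
        # Replace the symbol by a different one from the alphabet.
        seq[pos] = rng.choice([c for c in ALPHABET if c != seq[pos]])
    return {"".join(seq) for seq in survivors}


if __name__ == "__main__":
    rng = random.Random(0)
    stored = {"AAAA", "CCCC", "GGGG", "TTTT"}  # M = 4, L = 4
    print(noisy_read(stored, num_lost=1, num_subs=2, rng=rng))
    # log2 of the number of possible sets is the number of bits an
    # error-free system of this size could store.
    print(math.log2(num_stored_sets(M=4, L=4)))
```

Because the receiver sees a set rather than a list, two sequences that become identical after errors merge into one, so the output can contain fewer than M minus num_lost sequences.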

Bio

Andreas Lenz received his B.Sc. and M.Sc. degrees in Electrical Engineering and Information Technology from TUM in 2013 and 2016, respectively (both with high distinction). As part of his Master's studies, he was an exchange student at the University of Alberta, Canada. From 2014 until 2016, he worked on mobile network analysis systems at Rohde & Schwarz. For his Master's thesis, he visited Prof. Swindlehurst at the University of California, Irvine. In 2016, he joined the Coding for Communications and Data Storage group at TUM (Prof. Wachter-Zeh), where he is involved in research on error-correcting codes for insertion and deletion errors. In summer 2017 and autumn 2018, he was a visiting researcher at the University of California, San Diego and at the Technion, Israel Institute of Technology, respectively.