51ÂÜÀò

Submitted by admin on Tue, 09/16/2025 - 20:45
As a potential implementation of data storage using DNA molecules, multiple strands of DNA are stored unordered in a liquid container. When the data are needed, an array of DNA readers will sample the strands with replacement, producing a Poisson-distributed number of noisy reads for each strand. The primary challenge here is to design an algorithm that reconstructs data from these unsorted, repetitive, and noisy reads. In this paper, we lay down a capacity-achieving rateless code along each strand to encode its index; we then lay down a capacity-achieving block code at the same position across all strands to protect the data. These codes weave a low-complexity storage scheme that saturates the fundamental upper limit of DNA. This improves upon the previous work of Weinberger and Merhav, which proves said bound and uses high-complexity random codes to saturate the limit. Our scheme also differs from other concatenation-based implementations of DNA data storage in the sense that, instead of decoding the inner codes first and passing the results to the outer code, our decoder alternates between the rateless codes and the block codes.
Hsin-Po Wang
Venkatesan Guruswami