Random Access Expectation in DNA Storage and Fountain Codes
Pith reviewed 2026-05-20 22:32 UTC · model grok-4.3
The pith
Binary fully symmetric codes achieve a normalized random access expectation of at least π/4 under peeling decoding in the large blocklength limit.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under binary fully symmetric codes and peeling decoding in the large blocklength limit, the random access expectation normalized by the number of information symbols is at least π/4 ≈ 0.7854, while a value of ≈ 0.7869 is achievable.
What carries the argument
The equivalence between binary fully symmetric codes and LT codes, which allows the peeling decoder analysis to determine the random access expectation in the infinite-blocklength limit.
Load-bearing premise
The analysis assumes that generator matrices are fully symmetric, conjectured to be optimal, and that peeling-decoder behavior in the infinite-blocklength limit accurately captures the random access expectation.
What would settle it
An exact calculation or simulation for large but finite blocklengths that produces a normalized random access expectation below π/4 would falsify the claimed lower bound.
Figures
read the original abstract
Motivated by DNA data storage, we study the expected number of coded symbols drawn from a linear code until a desired information symbol can be decoded - the random access expectation. We focus on generator matrices with a type of symmetry, conjectured in prior work to be optimal, which we call fully symmetric. We point out an equivalence between binary fully symmetric codes and LT codes. Using this observation, we analyze the random access expectation of binary fully symmetric codes under a peeling decoder, in the large blocklength limit. Under these assumptions, the random access expectation, normalized by the number of information symbols, is at least $\pi/4 \approx 0.7854$, while a value of $\approx 0.7869$ is achievable.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript examines the random access expectation for linear codes with fully symmetric generator matrices in the context of DNA data storage. It establishes an equivalence between these codes and LT codes, and analyzes the normalized random access expectation using peeling decoding in the large blocklength limit. The main results are a lower bound of π/4 ≈ 0.7854 and an achievable value of approximately 0.7869.
Significance. If the asymptotic analysis holds, this provides a solid theoretical benchmark for random access performance in fountain code-based DNA storage systems. The connection to established LT code theory is a strength, enabling the derivation of clean, parameter-free bounds that extend prior work on degree distribution optimization to this new metric.
minor comments (1)
- Abstract: The specific degree distribution achieving the value ≈0.7869 is not detailed; including a brief description or reference to the relevant section would enhance readability.
Simulated Author's Rebuttal
We thank the referee for their positive summary of our work and for recommending minor revision. The equivalence to LT codes and the resulting parameter-free bounds on normalized random access expectation appear to have been received as a strength, consistent with our motivation from DNA storage applications.
Circularity Check
Minor reliance on prior literature for optimality conjecture; core derivation independent
full rationale
The paper points out an equivalence between binary fully symmetric generator matrices and LT codes, then applies standard peeling-decoder asymptotic analysis in the large-blocklength limit to derive the normalized random access expectation lower bound of π/4 and the achievable value ≈0.7869. The fully symmetric property is noted as conjectured optimal in prior work, but the present bounds follow directly from the modeling choices and imported standard tools without reducing to a fitted parameter, self-definition, or load-bearing self-citation chain. No step equates the claimed result to its inputs by construction, and the analysis remains self-contained against external benchmarks from LT-code literature.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Fully symmetric generator matrices are optimal for minimizing random access expectation
- domain assumption Peeling decoder analysis in the infinite-blocklength limit accurately models the random access expectation
Reference graph
Works this paper leans on
-
[1]
Information-theoretic foundations of DNA data storage,
I. Shomorony and R. Heckel, “Information-theoretic foundations of DNA data storage,”F oundations and Trends® in Communications and Information Theory, vol. 19, no. 1, pp. 1–106, 2022
work page 2022
-
[2]
DNA fountain enables a robust and efficient storage architecture,
Y . Erlich and D. Zielinski, “DNA fountain enables a robust and efficient storage architecture,”science, vol. 355, no. 6328, pp. 950– 954, 2017
work page 2017
-
[3]
Random access in large-scale DNA data storage,
L. Organick, S. D. Ang, Y .-J. Chen, R. Lopez, S. Yekhanin, K. Makarychev, M. Z. Racz, G. Kamath, P. Gopalan, B. Nguyen et al., “Random access in large-scale DNA data storage,”Nature biotechnology, vol. 36, no. 3, pp. 242–248, 2018
work page 2018
-
[4]
Expected re- covery time in DNA-based distributed storage systems,
A. Levy, R. Con, E. Yaakobi, and H. M. Kiah, “Expected re- covery time in DNA-based distributed storage systems,”preprint arXiv:2602.07601, 2026
-
[5]
Cover your bases: How to minimize the sequencing coverage in DNA storage systems,
D. Bar-Lev, O. Sabary, R. Gabrys, and E. Yaakobi, “Cover your bases: How to minimize the sequencing coverage in DNA storage systems,” IEEE Transactions on Information Theory, 2024
work page 2024
-
[6]
A combinato- rial perspective on random access efficiency for DNA storage,
A. Gruica, D. Bar-Lev, A. Ravagnani, and E. Yaakobi, “A combinato- rial perspective on random access efficiency for DNA storage,”IEEE Transactions on Information Theory, 2025
work page 2025
-
[7]
The geometry of codes for random access in dna storage,
A. Gruica, M. Montanucci, and F. Zullo, “The geometry of codes for random access in dna storage,” Oct. 2025
work page 2025
-
[8]
Making it to first: The random access problem in DNA storage,
A. Boruchovsky, O. Elishco, R. Gabrys, A. Gruica, I. Tamo, and E. Yaakobi, “Making it to first: The random access problem in DNA storage,” Aug. 2025
work page 2025
-
[9]
The random variables of the DNA coverage depth problem,
S ¸. Bodur, S. Lia, H. H. L´opez, R. Ludhani, A. Ravagnani, and L. Sec- cia, “The random variables of the DNA coverage depth problem,” Jul. 2025
work page 2025
-
[10]
Random access in DNA storage: Algo- rithms, constructions, and bounds,
C. Wang and E. Yaakobi, “Random access in DNA storage: Algo- rithms, constructions, and bounds,”preprint arXiv:2601.07053, 2026
- [11]
-
[12]
D. J. MacKay, “Fountain codes,”IEE Proceedings-Communications, vol. 152, no. 6, pp. 1062–1068, 2005
work page 2005
-
[13]
Structure of large random hypergraphs,
R. W. R. Darling and J. R. Norris, “Structure of large random hypergraphs,”The Annals of Applied Probability, vol. 15, no. 1A, pp. 125–152, Feb. 2005
work page 2005
-
[14]
New model for rigorous analysis of LT-codes,
E. N. Maneva and A. Shokrollahi, “New model for rigorous analysis of LT-codes,” Dec. 2005
work page 2005
-
[15]
Intermediate performance of rateless codes,
S. Sanghavi, “Intermediate performance of rateless codes,” in2007 IEEE Information Theory Workshop, Sep. 2007, pp. 478–482
work page 2007
-
[16]
S. Boyd and L. Vandenberghe,Convex optimization. Cambridge university press, 2004
work page 2004
-
[17]
Rudin,Real and complex analysis
W. Rudin,Real and complex analysis. McGraw-Hill, 1987
work page 1987
-
[18]
R. G. Gallager,Stochastic processes: theory for applications. Cam- bridge University Press, 2013
work page 2013
-
[19]
E. H. . Lieb and M. . Loss,Analysis, second edition ed., ser. Grad- uate studies in mathematics. Providence, Rhode Island: American Mathematical Society, 2010. 7 APPENDIXA PROOF OFLEMMA2 Lemma 2.Letp (k) be a sequence of degree distributions withlim k→∞ p(k) =p. Ifp 1 >0andg(t,p)is strictly increasing for0< t <1, then lim k→∞ bTk(p(k)) =f(p).(4) F or anyp...
work page 2010
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.