pith. sign in

arxiv: 2605.10919 · v2 · pith:6GGET3YYnew · submitted 2026-05-11 · 💻 cs.IT · math.IT

Random Access Expectation in DNA Storage and Fountain Codes

Pith reviewed 2026-05-20 22:32 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords random access expectationDNA storagefountain codesLT codespeeling decoderfully symmetric codeslinear codesinformation retrieval
0
0 comments X

The pith

Binary fully symmetric codes achieve a normalized random access expectation of at least π/4 under peeling decoding in the large blocklength limit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the random access expectation for linear codes motivated by DNA data storage, defined as the expected number of coded symbols that must be drawn until a chosen information symbol can be decoded. It establishes an equivalence between binary fully symmetric codes and LT codes, then analyzes performance under peeling decoding in the asymptotic large-blocklength regime. The resulting bound shows that the random access expectation divided by the number of information symbols is at least π/4 ≈ 0.7854, while a value of approximately 0.7869 is attainable. This supplies a concrete theoretical target for efficient random access in storage systems where data must be retrieved from unordered pools of symbols.

Core claim

Under binary fully symmetric codes and peeling decoding in the large blocklength limit, the random access expectation normalized by the number of information symbols is at least π/4 ≈ 0.7854, while a value of ≈ 0.7869 is achievable.

What carries the argument

The equivalence between binary fully symmetric codes and LT codes, which allows the peeling decoder analysis to determine the random access expectation in the infinite-blocklength limit.

Load-bearing premise

The analysis assumes that generator matrices are fully symmetric, conjectured to be optimal, and that peeling-decoder behavior in the infinite-blocklength limit accurately captures the random access expectation.

What would settle it

An exact calculation or simulation for large but finite blocklengths that produces a normalized random access expectation below π/4 would falsify the claimed lower bound.

Figures

Figures reproduced from arXiv: 2605.10919 by Christoph Hofmeister, Eitan Yaakobi, Rawad Bitar.

Figure 1
Figure 1. Figure 1: Achievable random access expectations for maximum column weight [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Simulated decoding trajectories for an LT code with a peeling decoder. The relative number of undecoded information symbols over the relative [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 4
Figure 4. Figure 4: Depiction of the optimal degree distributions for values of [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗
Figure 3
Figure 3. Figure 3: The degree distribution p ⋆,100 and the sum of the objective value and the derivative of the objective w.r.t. i (extended to the reals). The KKT conditions state that the support of p is limited to those i, where this sum crosses zero. The degree distribution is given by p1 ≈ 0.19363, p2 ≈ 0.75839, p14 ≈ 0.00004, p15 ≈ 0.04198, and p100 ≈ 0.00596. and strictly decreasing decoding curves across values of d,… view at source ↗
Figure 5
Figure 5. Figure 5: Decoding curves achieving Tb⋆ d for selected values of d between 2 and 104 . We observe similar S-shaped, strictly decreasing curves across d [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗
read the original abstract

Motivated by DNA data storage, we study the expected number of coded symbols drawn from a linear code until a desired information symbol can be decoded - the random access expectation. We focus on generator matrices with a type of symmetry, conjectured in prior work to be optimal, which we call fully symmetric. We point out an equivalence between binary fully symmetric codes and LT codes. Using this observation, we analyze the random access expectation of binary fully symmetric codes under a peeling decoder, in the large blocklength limit. Under these assumptions, the random access expectation, normalized by the number of information symbols, is at least $\pi/4 \approx 0.7854$, while a value of $\approx 0.7869$ is achievable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The manuscript examines the random access expectation for linear codes with fully symmetric generator matrices in the context of DNA data storage. It establishes an equivalence between these codes and LT codes, and analyzes the normalized random access expectation using peeling decoding in the large blocklength limit. The main results are a lower bound of π/4 ≈ 0.7854 and an achievable value of approximately 0.7869.

Significance. If the asymptotic analysis holds, this provides a solid theoretical benchmark for random access performance in fountain code-based DNA storage systems. The connection to established LT code theory is a strength, enabling the derivation of clean, parameter-free bounds that extend prior work on degree distribution optimization to this new metric.

minor comments (1)
  1. Abstract: The specific degree distribution achieving the value ≈0.7869 is not detailed; including a brief description or reference to the relevant section would enhance readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and for recommending minor revision. The equivalence to LT codes and the resulting parameter-free bounds on normalized random access expectation appear to have been received as a strength, consistent with our motivation from DNA storage applications.

Circularity Check

0 steps flagged

Minor reliance on prior literature for optimality conjecture; core derivation independent

full rationale

The paper points out an equivalence between binary fully symmetric generator matrices and LT codes, then applies standard peeling-decoder asymptotic analysis in the large-blocklength limit to derive the normalized random access expectation lower bound of π/4 and the achievable value ≈0.7869. The fully symmetric property is noted as conjectured optimal in prior work, but the present bounds follow directly from the modeling choices and imported standard tools without reducing to a fitted parameter, self-definition, or load-bearing self-citation chain. No step equates the claimed result to its inputs by construction, and the analysis remains self-contained against external benchmarks from LT-code literature.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the conjectured optimality of fully symmetric matrices and the validity of the large-blocklength peeling analysis; no free parameters or new entities are introduced.

axioms (2)
  • domain assumption Fully symmetric generator matrices are optimal for minimizing random access expectation
    Conjectured in prior work and adopted as the focus of the analysis.
  • domain assumption Peeling decoder analysis in the infinite-blocklength limit accurately models the random access expectation
    The derivation is performed exclusively under this asymptotic regime and decoder.

pith-pipeline@v0.9.0 · 5651 in / 1374 out tokens · 43907 ms · 2026-05-20T22:32:39.355373+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

  1. [1]

    Information-theoretic foundations of DNA data storage,

    I. Shomorony and R. Heckel, “Information-theoretic foundations of DNA data storage,”F oundations and Trends® in Communications and Information Theory, vol. 19, no. 1, pp. 1–106, 2022

  2. [2]

    DNA fountain enables a robust and efficient storage architecture,

    Y . Erlich and D. Zielinski, “DNA fountain enables a robust and efficient storage architecture,”science, vol. 355, no. 6328, pp. 950– 954, 2017

  3. [3]

    Random access in large-scale DNA data storage,

    L. Organick, S. D. Ang, Y .-J. Chen, R. Lopez, S. Yekhanin, K. Makarychev, M. Z. Racz, G. Kamath, P. Gopalan, B. Nguyen et al., “Random access in large-scale DNA data storage,”Nature biotechnology, vol. 36, no. 3, pp. 242–248, 2018

  4. [4]

    Expected re- covery time in DNA-based distributed storage systems,

    A. Levy, R. Con, E. Yaakobi, and H. M. Kiah, “Expected re- covery time in DNA-based distributed storage systems,”preprint arXiv:2602.07601, 2026

  5. [5]

    Cover your bases: How to minimize the sequencing coverage in DNA storage systems,

    D. Bar-Lev, O. Sabary, R. Gabrys, and E. Yaakobi, “Cover your bases: How to minimize the sequencing coverage in DNA storage systems,” IEEE Transactions on Information Theory, 2024

  6. [6]

    A combinato- rial perspective on random access efficiency for DNA storage,

    A. Gruica, D. Bar-Lev, A. Ravagnani, and E. Yaakobi, “A combinato- rial perspective on random access efficiency for DNA storage,”IEEE Transactions on Information Theory, 2025

  7. [7]

    The geometry of codes for random access in dna storage,

    A. Gruica, M. Montanucci, and F. Zullo, “The geometry of codes for random access in dna storage,” Oct. 2025

  8. [8]

    Making it to first: The random access problem in DNA storage,

    A. Boruchovsky, O. Elishco, R. Gabrys, A. Gruica, I. Tamo, and E. Yaakobi, “Making it to first: The random access problem in DNA storage,” Aug. 2025

  9. [9]

    The random variables of the DNA coverage depth problem,

    S ¸. Bodur, S. Lia, H. H. L´opez, R. Ludhani, A. Ravagnani, and L. Sec- cia, “The random variables of the DNA coverage depth problem,” Jul. 2025

  10. [10]

    Random access in DNA storage: Algo- rithms, constructions, and bounds,

    C. Wang and E. Yaakobi, “Random access in DNA storage: Algo- rithms, constructions, and bounds,”preprint arXiv:2601.07053, 2026

  11. [11]

    LT codes,

    M. Luby, “LT codes,” inThe 43rd Annual IEEE Symposium on F oundations of Computer Science, 2002. Proceedings., Nov. 2002, pp. 271–280

  12. [12]

    Fountain codes,

    D. J. MacKay, “Fountain codes,”IEE Proceedings-Communications, vol. 152, no. 6, pp. 1062–1068, 2005

  13. [13]

    Structure of large random hypergraphs,

    R. W. R. Darling and J. R. Norris, “Structure of large random hypergraphs,”The Annals of Applied Probability, vol. 15, no. 1A, pp. 125–152, Feb. 2005

  14. [14]

    New model for rigorous analysis of LT-codes,

    E. N. Maneva and A. Shokrollahi, “New model for rigorous analysis of LT-codes,” Dec. 2005

  15. [15]

    Intermediate performance of rateless codes,

    S. Sanghavi, “Intermediate performance of rateless codes,” in2007 IEEE Information Theory Workshop, Sep. 2007, pp. 478–482

  16. [16]

    Boyd and L

    S. Boyd and L. Vandenberghe,Convex optimization. Cambridge university press, 2004

  17. [17]

    Rudin,Real and complex analysis

    W. Rudin,Real and complex analysis. McGraw-Hill, 1987

  18. [18]

    R. G. Gallager,Stochastic processes: theory for applications. Cam- bridge University Press, 2013

  19. [19]

    E. H. . Lieb and M. . Loss,Analysis, second edition ed., ser. Grad- uate studies in mathematics. Providence, Rhode Island: American Mathematical Society, 2010. 7 APPENDIXA PROOF OFLEMMA2 Lemma 2.Letp (k) be a sequence of degree distributions withlim k→∞ p(k) =p. Ifp 1 >0andg(t,p)is strictly increasing for0< t <1, then lim k→∞ bTk(p(k)) =f(p).(4) F or anyp...