Rank Modulated Composite Encoding for Data Storage in DNA
Pith reviewed 2026-06-28 16:52 UTC · model grok-4.3
The pith
The paper determines the capacity of a channel built on rank-modulated composite DNA symbols and gives bounds and constructions for codes over those symbols.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors study two problems arising from rank-modulated composite DNA symbols: they address the capacity of a channel that uses these symbols, and they study bounds and constructions of codes for them.
What carries the argument
The rank-modulated composite symbol, which encodes data solely in the ordering of nucleotide mixture abundances.
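To make the idea concrete, here is a minimal sketch of reading a rank-modulated symbol from sequencing reads at one strand position. The function name `rank_order`, the example read string, and the alphabetical tie-break are assumptions made for illustration; the paper's own symbol definition may differ.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"

def rank_order(reads_at_position):
    """Return the nucleotides sorted from most to least abundant.

    Ties are broken alphabetically, an arbitrary choice made for this sketch.
    """
    counts = Counter(reads_at_position)  # missing nucleotides count as 0
    return tuple(sorted(NUCLEOTIDES, key=lambda n: (-counts[n], n)))

# Hypothetical bases observed at one position across 16 reads:
# A appears 5 times, C 3 times, G 7 times, T once.
observed = "AAAAACCCGGGGGGGT"
print(rank_order(observed))  # ('G', 'A', 'C', 'T')
```

Only the ordering is kept; the exact counts 7, 5, 3, 1 are discarded, which is the sense in which the data sits solely in the rank.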
If this is right
- The computed capacity gives the maximum reliable rate for storing data in rank-modulated composite symbols.
- The code constructions achieve positive rates, below capacity, while correcting errors.
- Bounds on code size limit the number of distinct messages that can be stored reliably.
- The approach exploits the large number of copies produced during DNA synthesis to observe mixture ranks.
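As a rough point of reference for the bullets above, a simple count shows the most that a noiseless rank observation could carry. It assumes each symbol is a strict ranking of q distinguishable components with no ties and no errors; the symbol q and the noiseless setting are assumptions made here, not the paper's channel model or its capacity result.

```latex
\[
  \#\{\text{strict rankings of } q \text{ components}\} = q!,
  \qquad
  R_{\text{noiseless}} \le \log_2 q! \ \text{bits per symbol}.
\]
\[
  q = 4:\quad \log_2 4! = \log_2 24 \approx 4.585 \ \text{bits per symbol},
  \qquad \text{compared with} \qquad
  \log_2 4 = 2 \ \text{bits for a single nucleotide}.
\]
```

Any capacity under a noisy rank channel with the same alphabet can only be at or below this count.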
Where Pith is reading between the lines
- Sequencing hardware could be simplified if only rank detection is required rather than quantitative concentration measurements.
- Similar rank-order techniques might transfer to other mixture-based storage media beyond DNA.
- Testing the constructions in physical DNA experiments would reveal whether synthesis noise preserves the rank model.
Load-bearing premise
The channel model accurately captures DNA synthesis and sequencing behavior when only the rank order of nucleotide mixtures is observed.
What would settle it
An experiment that synthesizes and sequences composite DNA strands and measures the actual frequency of rank-order changes to test whether error statistics match the assumed channel.
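Before a wet-lab experiment, the same question can be explored in simulation. The sketch below estimates how often the observed rank differs from the intended one when reads are drawn independently from a mixture. The independent-draw (multinomial) sampling model, the mixture values, and the function names are all assumptions for illustration, not the paper's channel.

```python
import random
from collections import Counter

def observed_rank(counts, symbols):
    """Rank symbols from most to least frequent; ties broken alphabetically."""
    return tuple(sorted(symbols, key=lambda s: (-counts[s], s)))

def rank_error_rate(mixture, reads_per_symbol, trials, seed=0):
    """Fraction of trials in which the sampled rank differs from the intended rank.

    mixture: dict mapping nucleotide -> intended fraction (hypothetical values).
    """
    rng = random.Random(seed)
    symbols = sorted(mixture)
    weights = [mixture[s] for s in symbols]
    intended = tuple(sorted(symbols, key=lambda s: (-mixture[s], s)))
    errors = 0
    for _ in range(trials):
        # Each read independently reports one nucleotide drawn from the mixture.
        reads = rng.choices(symbols, weights=weights, k=reads_per_symbol)
        if observed_rank(Counter(reads), symbols) != intended:
            errors += 1
    return errors / trials

mixture = {"A": 0.2, "C": 0.4, "G": 0.1, "T": 0.3}  # intended rank: C > T > A > G
for n in (5, 50, 500):
    print(n, rank_error_rate(mixture, n, trials=500))
```

Comparing such simulated rates with measured ones, across sequencing depths, is one way to check whether a candidate error model is plausible.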
Original abstract
This paper studies two problems that are motivated by combining two novel approaches, namely composite DNA and rank modulation. The recent approach of composite DNA takes advantage of the property of DNA synthesis that generates a huge number of copies of every synthesized strand. Under this paradigm, each composite symbol stores not a single nucleotide but a mixture of the four DNA nucleotides. Instead of considering all possible composite symbols, we are interested only in the rank of the motifs in the symbol. The first problem in this paper addresses the capacity of a channel that uses such symbols, while the second studies bounds and constructions of such codes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper studies two problems motivated by combining composite DNA storage with rank modulation. Composite symbols store mixtures of the four nucleotides rather than single bases, and the channel outputs only the rank order of the motif concentrations. The first problem derives the capacity of the resulting rank-modulated composite channel; the second provides bounds and explicit code constructions for this model.
Significance. If the derivations hold, the work supplies a clean information-theoretic abstraction for a DNA storage channel that exploits synthesis multiplicity and rank-order observation. The capacity result (Theorem 1) obtained by mutual-information maximization over the rank-order output alphabet and the explicit constructions with rate bounds in Section IV constitute standard but correctly applied tools that could inform practical code design. The modeling choice is presented explicitly as an abstraction rather than an empirical claim.
Minor comments (2)
- [Section II] The transition probabilities of the rank-modulated composite channel are stated, but an explicit small-alphabet example (e.g., 2-nucleotide mixtures) would clarify the output alphabet and make the capacity computation easier to follow.
- [Section IV] The rate bounds and constructions are given, yet a short comparison table against conventional DNA storage codes (e.g., those based on single-nucleotide or unordered composite symbols) would strengthen the practical relevance of the achieved rates.
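One way to picture the kind of small-alphabet example requested in the first comment is a toy two-nucleotide channel. This is an illustration built on assumptions, not the paper's transition probabilities: the input is one of two ranks, realized by a mixture in which the intended leader has fraction `majority_fraction`; the output is whichever nucleotide is the majority among an odd number of independent reads. Under these assumptions the channel is binary symmetric, so its capacity has a closed form.

```python
import math

def flip_probability(majority_fraction, reads):
    """Probability that the intended leader is NOT the observed majority.

    Assumes an odd number of independent reads, so the rank is never tied.
    """
    if reads % 2 == 0:
        raise ValueError("use an odd number of reads so the rank is never tied")
    p = majority_fraction
    # The rank flips when the intended leader appears in at most (reads-1)/2 reads.
    return sum(
        math.comb(reads, k) * p**k * (1 - p) ** (reads - k)
        for k in range((reads - 1) // 2 + 1)
    )

def binary_entropy(eps):
    """Binary entropy in bits."""
    if eps in (0.0, 1.0):
        return 0.0
    return -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)

def toy_capacity(majority_fraction, reads):
    """Capacity in bits per symbol of the resulting binary symmetric channel."""
    return 1.0 - binary_entropy(flip_probability(majority_fraction, reads))

for n in (1, 3, 11, 51):
    print(n, round(toy_capacity(0.7, n), 4))
```

With three reads and a 0.7 leader fraction, the flip probability is 0.3^3 + 3(0.7)(0.3^2) = 0.216; more reads push it toward zero and the toy capacity toward one bit.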
Simulated Author's Rebuttal
We thank the referee for the positive assessment of our work deriving the capacity of the rank-modulated composite DNA channel and providing bounds and constructions. The recommendation for minor revision is noted; however, the report contains no specific major comments requiring response.
Circularity Check
No significant circularity in derivation chain
The manuscript defines a rank-modulated composite channel in Section II and derives its capacity in Theorem 1 via standard mutual-information maximization over the rank-order output alphabet, with code constructions and rate bounds in Section IV. These steps rest on the stated channel transition probabilities and do not contain self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations that collapse the central claims to their own inputs by construction. The modeling choice is presented as an abstraction, and the derivation remains self-contained against external information-theoretic benchmarks.
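For readers unfamiliar with the phrase, mutual-information maximization over a finite output alphabet can be carried out numerically with the Blahut-Arimoto iteration. The sketch below is a generic illustration and not the derivation behind Theorem 1; the transition matrix in the usage lines is a binary symmetric channel, chosen only because its capacity is known in closed form. The function name and iteration count are assumptions.

```python
import math

def blahut_arimoto(W, iterations=200):
    """Estimate the capacity of a discrete memoryless channel.

    W[x][y] is P(output y | input x). Returns (capacity in bits, input distribution).
    """
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx  # start from the uniform input distribution

    def output_dist(p):
        return [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]

    for _ in range(iterations):
        q = output_dist(p)
        # d[x]: KL divergence (in nats) between row x of W and the output distribution.
        d = [
            sum(W[x][y] * math.log(W[x][y] / q[y]) for y in range(ny) if W[x][y] > 0)
            for x in range(nx)
        ]
        # Reweight inputs toward those whose outputs are most distinguishable.
        weights = [p[x] * math.exp(d[x]) for x in range(nx)]
        total = sum(weights)
        p = [w / total for w in weights]

    q = output_dist(p)
    capacity = sum(
        p[x] * W[x][y] * math.log2(W[x][y] / q[y])
        for x in range(nx)
        for y in range(ny)
        if p[x] > 0 and W[x][y] > 0
    )
    return capacity, p

# Binary symmetric channel with crossover 0.1; closed form is 1 - H(0.1).
bsc = [[0.9, 0.1], [0.1, 0.9]]
capacity, best_input = blahut_arimoto(bsc)
print(capacity, best_input)
```

For a rank-order channel, the rows of `W` would be indexed by input mixtures and the columns by observable rankings; filling in those entries requires the paper's transition probabilities, which are not reproduced on this page.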