Correcting Tail Deletions in Rank Modulated Composite Encoding for Data Storage in DNA
Pith reviewed 2026-05-20 06:57 UTC · model grok-4.3
The pith
Codes over partial permutations correct tail deletions and insertions in rank modulated composite DNA encoding.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This work presents bounds and constructions of efficient codes defined over partial permutations for correcting deletion and insertion errors in rank modulated composite encoding for data storage in DNA.
What carries the argument
Partial permutations of nucleotide frequency ranks in composite DNA symbols, which enable the modeling and correction of tail deletions and insertions by tracking relative orders even with missing or extra elements.
If this is right
- The codes achieve correction of tail deletions without requiring exact frequency measurements.
- Constructions provide practical methods to encode data using permutations of frequencies.
- Bounds indicate the redundancy needed to handle a given number of errors.
- This extends previous rank modulation codes to handle insertion and deletion errors specifically.
Where Pith is reading between the lines
- These codes might integrate with existing DNA storage systems to reduce error rates from synthesis tail effects.
- Further research could explore combinations with other error-correcting techniques for comprehensive coverage.
- Experimental validation on physical DNA could test the assumptions about error patterns.
Load-bearing premise
The tail deletion model and frequency rank measurements in composite symbols match the main errors and capabilities in actual DNA synthesis and sequencing.
What would settle it
Generate sequences of partial permutations from composite DNA mixtures, apply the constructed codes, introduce tail deletions, and verify if the decoder recovers the original information correctly.
read the original abstract
We study the combination of two recent coding approaches, in the context of DNA based data storage. Composite DNA alphabets leverage properties of the DNA synthesis and sequencing process. A composite symbol does not represent a single nucleotide, but rather a designed mixture of DNA nucleotides. Using the high multiplicity that is intrinsic to synthesis and sequencing a composite symbol consists of frequencies in the mixture. Rank modulation codes use permutations to represent information. Combining the two, we construct encoding that uses permutations of nucleotide frequencies rather than the exact frequency values. Codes for this approach were addressed in previous work, under Kendall's tau distances. In this work we study deletion and insertion codes. We present bounds and constructions of efficient codes defined over partial permutations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript studies rank-modulated composite encoding for DNA data storage by combining composite alphabets (frequency mixtures) with permutations of nucleotide frequencies. It shifts focus from Kendall-tau distances to deletion and insertion correction, defining codes over partial permutations and presenting explicit bounds together with constructions for tail-deletion channels.
Significance. If the stated bounds and constructions hold, the work supplies concrete coding-theoretic tools for deletion correction under the rank-modulated model, a practically relevant error type in DNA synthesis and sequencing. The explicit definitions of partial permutations, the tail-deletion channel, and the distance bounds constitute a self-contained contribution that builds directly on prior composite-alphabet and rank-modulation literature.
minor comments (3)
- [Abstract] Abstract: the claim of 'efficient codes' would be strengthened by a brief statement of the achieved rate or asymptotic scaling relative to the deletion fraction.
- [Section 2] Section 2: the definition of partial permutations and the precise mapping from frequency ranks to code symbols should include an explicit example to aid readability.
- [Section 3] The distance metric for the tail-deletion channel is introduced without a short comparison table to the Kendall-tau metric used in prior work; adding such a table would clarify the modeling shift.
Simulated Author's Rebuttal
We thank the referee for the careful reading of our manuscript and for the positive assessment of its contribution to deletion and insertion correction under the rank-modulated composite encoding model. We are pleased with the recommendation for minor revision and will prepare an updated version accordingly.
Circularity Check
No significant circularity
full rationale
The paper defines partial permutations and the tail-deletion channel explicitly, then supplies independent constructions and distance bounds for deletion/insertion-correcting codes over this alphabet in the rank-modulated composite model. It references prior literature on composite alphabets and Kendall-tau rank modulation but does not reduce any central claim to a self-citation chain, fitted parameter renamed as prediction, or self-definitional loop; the new results on partial-permutation codes are mathematically self-contained and externally verifiable.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We present bounds and constructions of efficient codes defined over partial permutations for deletion and insertion correction in rank modulated composite encoding.
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat ≃ Nat unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 5. ... C_det_{q,t} is a t-tail-deletion-detecting code ... DEL_det(q,t) = q! · Σ 1/(i(t+1))!
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Data storage in DNA with fewer synthesis cycles using composite DNA letters,
L. Anavy, I. Vaknin, O. Atar, R. Amit, and Z. Yakhini, “Data storage in DNA with fewer synthesis cycles using composite DNA letters,"Nature Biotechnology, vol. 37, no. 10, pp. 1229–1236, 2019
work page 2019
-
[2]
Codes in permutations and error correction for rank modulation,
A. Barg and A. Mazumdar, "Codes in permutations and error correction for rank modulation," inIEEE Trans. on Inf. Theory, vol. 56, no. 7, pp. 3158-3165, July 2010
work page 2010
-
[3]
Y . Choi et al., “High information capacity DNA-based data storage with augmented encoding characters using degenerate bases,"Scientific Reports, vol.9, 2019
work page 2019
-
[4]
Optimizing the decoding probability and coverage ratio of composite DNA,
T. Cohen and E. Yaakobi, "Optimizing the decoding probability and coverage ratio of composite DNA,"IEEE Int. Symp. Inf. Theory (ISIT), Athens, Greece, 2024, pp. 1949-1954
work page 2024
-
[5]
Rank modulated composite encoding for data storage in DNA,
T. Cohen, Z. Wang, E. Yaakobi and Z. Yakhini, "Rank modulated composite encoding for data storage in DNA,"2025 13th International Symposium on Topics in Coding (ISTC), Los Angeles, CA, USA, 2025, pp. 1-5
work page 2025
-
[6]
Codes correcting era- sures and deletions for rank modulation,
R. Gabrys, E. Yaakobi, F. Farnoud and J. Bruck, "Codes correcting era- sures and deletions for rank modulation,"IEEE International Symposium on Information Theory, Honolulu, HI, USA, 2014, pp. 2759-2763
work page 2014
-
[7]
Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,
N. Goldman et al., “Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,"Nature, vol. 494, no. 7435, pp. 77–80, 2013
work page 2013
-
[8]
Rank modulation for flash memories,
A. Jiang, R. Mateescu, M. Schwartz and J. Bruck, "Rank modulation for flash memories," inIEEE Trans. on Inf. Theory, vol. 55, no. 6, pp. 2659-2673, June 2009
work page 2009
-
[9]
Rank modulation for flash memories,
A. Jiang, R. Mateescu, M. Schwartz, and J. Bruck, “Rank modulation for flash memories,”IEEE Trans. Inf. Theory, vol. 55, no. 6, pp. 2659–2673, Jun. 2009
work page 2009
-
[10]
Correcting charge-constrained errors in the rank-modulation scheme,
A. Jiang, M. Schwartz and J. Bruck, "Correcting charge-constrained errors in the rank-modulation scheme," inIEEE Trans. on Inf. Theory, vol. 56, no. 5, pp. 2112-2120, May 2010
work page 2010
-
[11]
D. E. Knuth,The Art of Computer Programming Volume 3: Sorting and Searching, 2nd ed. Reading, MA: Addison-Wesley, 1998
work page 1998
-
[12]
A. Kobovich, E. Yaakobi, and N. Weinberger, “M-DAB: An input- distribution optimization algorithm for composite DNA storage by the multinomial channel," Arxiv, Sep., 2023
work page 2023
-
[13]
Terminator- free template-independent enzymatic DNA synthesis for digital informa- tion storage,
H. Lee, R. Kalhor, N. Goela, J. Bolot, and G.M. Church, “Terminator- free template-independent enzymatic DNA synthesis for digital informa- tion storage,"Nature Communications, vol. 10, no. 1, Jun., 2019
work page 2019
-
[14]
Binary codes capable of correcting deletions, insertions, and reversals,
V . Levenshtein, "Binary codes capable of correcting deletions, insertions, and reversals,"Soviet Physics Doklady, vol. 10, no. 8, pp. 707–710, 1966
work page 1966
-
[15]
Sequencing coverage analysis for combinatorial DNA-based storage systems,
I. Preuss, B. Galili, Z. Yakhini, and L. Anavy, “Sequencing coverage analysis for combinatorial DNA-based storage systems," BioRxiv Jan., 2024, https://www.biorxiv.org/content/10.1101/ 2024.01.10.574966v1
work page 2024
-
[16]
Error-correcting codes for combinatorial composite DNA,
O. Sabary, I. Preuss, R. Gabrys, Z. Yakhini, L. Anavy and E. Yaakobi, "Error-correcting codes for combinatorial composite DNA,"IEEE Int. Symp. Inf. Theory (ISIT), Athens, Greece, 2024, pp. 109-114
work page 2024
-
[17]
Coding over coupon collector channels for combinatorial motif-based DNA storage,
R. Sokolovskii, P. Agarwal, L. A. Croquevielle, Z. Zhou and T. Heinis, "Coding over coupon collector channels for combinatorial motif-based DNA storage," inIEEE Trans. on Comm., Early Access
-
[18]
B., Olgica Milenkovic, Zohar Yakhini, Yonatan Yehezkeally, Anisha Banerjee, and Frederik Walter
R. B., Olgica Milenkovic, Zohar Yakhini, Yonatan Yehezkeally, Anisha Banerjee, and Frederik Walter. Coding Theory and Algo- rithms for Emerging Technologies in Synthetic Biology (Dagstuhl Seminar 24511). In Dagstuhl Reports, V olume 14, Issue 12, pp. 46-62, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2025) https://doi.org/10.4230/DagRep.14.12.46
-
[19]
Coding for composite DNA to correct substitutions, Strand Losses, and Deletions,
F. Walter, O. Sabary, A. Wachter-Zeh and E. Yaakobi, "Coding for composite DNA to correct substitutions, Strand Losses, and Deletions," IEEE Int. Symp. Inf. Theory (ISIT), Athens, Greece, 2024, pp. 97-102
work page 2024
-
[20]
Efficient Binomial Channel Capacity Computation with an Application to Molecular Communication,
R. D. Wesel, E. E. Wesel, L. Vandenberghe, C. Komninakis and M. Medard, "Efficient Binomial Channel Capacity Computation with an Application to Molecular Communication," 2018 Information Theory and Applications Workshop (ITA), San Diego, CA, USA, 2018, pp. 1-5, doi: 10.1109/ITA.2018.8503225
-
[21]
On codes derivable from the tensor product of check matrices,
J. Wolf, "On codes derivable from the tensor product of check matrices," inIEEE Trans. on Inf. Theory, vol. 11, no. 2, pp. 281-284, April 1965
work page 1965
-
[22]
Limited-magnitude error correction for probability vectors in DNA storage,
W. Zhang, Z. Chen, and Z. Wang, “Limited-magnitude error correction for probability vectors in DNA storage,"IEEE International Conference on Communications (ICC), pp. 3460–3465, 2022
work page 2022
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.