Correcting One Deletion and One Substitution with a Constant Number of Reads
Pith reviewed 2026-05-07 15:08 UTC · model grok-4.3
The pith
Codes allow unique reconstruction from 5 reads each with one deletion and one substitution using 3 log n +4 redundant bits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In this paper, we investigate the problem of designing (n, N; B)-reconstruction codes for N in {14,11,9,5}, where B is the single-deletion single-substitution ball function that maps a sequence to the set of all sequences obtainable via one deletion and one substitution. Such a code is defined by the requirement that the intersection size of any two distinct single-deletion single-substitution balls is strictly less than the given number of noisy reads N. When N=1 the problem reduces to single-deletion and single-substitution correcting codes, whose best known redundancy is (4+o(1)) log n. We show that this redundancy can be reduced to 3 log n +4 when N=5. As N increases further to 9 and 11,
What carries the argument
The (n,N;B)-reconstruction code, a subset of strings such that the single-deletion single-substitution balls of any two distinct codewords intersect in fewer than N strings; the ball intersection condition lets N noisy versions identify the original string.
If this is right
- For N=5 the codes achieve 3 log n +4 bits of redundancy.
- For N=9 the redundancy drops to 2 log n +12 log log n +O(1).
- For N=11 the redundancy is log n +12 log log n +O(1).
- For N=14 only log n +3 bits of redundancy are required, two bits above the best known codes for N=18.
- Any code built for a given N also works for all larger constant values of N.
Where Pith is reading between the lines
- The same ball-intersection approach can be examined for other fixed combinations of deletions, insertions and substitutions.
- Closing the remaining two-bit gap between the N=14 and N=18 constructions may be possible by examining intermediate constant values of N.
- The explicit dependence on hash functions and markers points to a general method for controlling overlaps in error balls under limited observations.
Load-bearing premise
Suitable combinatorial objects exist that enforce the ball-intersection size to be strictly less than N while using only the stated number of redundant bits.
What would settle it
Finding two strings at the claimed redundancy whose single-deletion single-substitution balls intersect in N or more sequences would show that the intersection condition fails for that N.
Figures
read the original abstract
In this paper, we investigate the problem of designing $(n, N; \mathcal{B})$-reconstruction codes for $N\in \{14,11,9,5\}$, where $\mathcal{B}$ is the single-deletion single-substitution ball function that maps a sequence to the set of all sequences obtainable via one deletion and one substitution. Such a code is defined by the requirement that the intersection size of any two distinct single-deletion single-substitution balls is strictly less than the given number of noisy reads $N$. Note that for any $1\le N<N'$, an $(n, N; \mathcal{B})$-reconstruction code is also an $(n, N'; \mathcal{B})$-reconstruction code. It follows that the problem of designing $(n, N; \mathcal{B})$-reconstruction codes with less redundancy becomes more challenging as $N$ decreases, particularly because the problem for $N=1$ already reduces to the coding problem of single-deletion and single-substitution correcting codes. To the best of our knowledge, most existing results focus on the case where $N$ is a linear function of $n$, while only a limited number consider constant $N$. When $N=1$, the best known $(n, 1; \mathcal{B})$-reconstruction codes (single-deletion and single-substitution correcting codes) require $(4+o(1))\log n$ redundant bits. In this work, we show that this redundancy can be reduced to $3\log n+4$ when $N=5$. As $N$ increases further to $9$ and $11$, the redundancy can be improved to $2\log n+12\log\log n+O(1)$ and $\log n +12\log \log n+O(1)$, respectively. Finally, for $N=14$, we provide a reconstruction code with $\log n+3$ bits of redundancy, which is only two bits more than the best known $(n, 18; \mathcal{B})$-reconstruction codes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies (n, N; B)-reconstruction codes where B is the ball of all strings obtained from a given string by one deletion and one substitution. It claims explicit constructions achieving redundancy 3 log n + 4 for N=5, 2 log n + 12 log log n + O(1) for N=9, log n + 12 log log n + O(1) for N=11, and log n + 3 for N=14. These are positioned as improvements over the (4+o(1)) log n redundancy known for the N=1 case (ordinary single-deletion single-substitution correcting codes).
Significance. If the constructions are valid, the results would be a useful contribution to the theory of reconstruction codes with constant reads. They quantify the redundancy-reads trade-off for this combined error model and show that moderate constant N already yields redundancy close to the information-theoretic floor (log n + O(1)). The explicit expressions for several N values and the comparison to the N=18 benchmark are concrete and could guide further work.
major comments (3)
- [Abstract / N=5 construction] Abstract and the constructions for N=5: the claimed redundancy of 3 log n + 4 rests on the existence of auxiliary hash or marker families that enforce |B(x) ∩ B(y)| < 5 while adding at most 3 log n + 4 bits. Because B is the combined deletion-substitution ball, its intersection structure differs from pure deletion or substitution balls; the paper must verify that standard existence results for hash families apply at these exact parameters without extra slack in the redundancy.
- [N=9 and N=11 constructions] Constructions for N=9 and N=11: the stated redundancies 2 log n + 12 log log n + O(1) and log n + 12 log log n + O(1) likewise depend on combinatorial objects (markers or hashes) that keep ball intersections below N. The manuscript should supply a concrete parameter check or self-contained existence proof showing that the O(1) and log-log terms suffice for the specific size and intersection properties of B.
- [N=14 construction] N=14 construction: the claim of log n + 3 redundancy (only two bits above the best known (n,18;B) code) is load-bearing for the overall narrative. The argument must confirm that the auxiliary objects used to enforce |B(x) ∩ B(y)| < 14 contribute no hidden n-dependent overhead beyond the stated +3 bits.
minor comments (2)
- A compact table listing N, achieved redundancy, and comparison to the N=1 and N=18 baselines would improve readability.
- [Introduction / Preliminaries] The notation for the ball B and the intersection condition |B(x) ∩ B(y)| < N should be introduced with a short example in the preliminaries.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments on our work. We address each major comment below with clarifications drawn directly from the manuscript's constructions. We will incorporate additional explicit parameter verifications and a short appendix in the revised version to strengthen the presentation without altering the claimed redundancies.
read point-by-point responses
-
Referee: [Abstract / N=5 construction] Abstract and the constructions for N=5: the claimed redundancy of 3 log n + 4 rests on the existence of auxiliary hash or marker families that enforce |B(x) ∩ B(y)| < 5 while adding at most 3 log n + 4 bits. Because B is the combined deletion-substitution ball, its intersection structure differs from pure deletion or substitution balls; the paper must verify that standard existence results for hash families apply at these exact parameters without extra slack in the redundancy.
Authors: We appreciate the referee pointing this out. Section 3 of the manuscript provides an explicit probabilistic construction of the hash families adapted to the deletion-substitution ball B, whose size is Θ(n²). The union bound is applied directly to pairs of codewords using the precise intersection probability for this B (accounting for the two error types and their positions), yielding exactly 3 log n + 4 bits with no additional slack. The constant 4 absorbs the small number of cases for synchronization. To make this verification fully self-contained, we will add a dedicated lemma with the parameter calculation in the revision. revision: partial
-
Referee: [N=9 and N=11 constructions] Constructions for N=9 and N=11: the stated redundancies 2 log n + 12 log log n + O(1) and log n + 12 log log n + O(1) likewise depend on combinatorial objects (markers or hashes) that keep ball intersections below N. The manuscript should supply a concrete parameter check or self-contained existence proof showing that the O(1) and log-log terms suffice for the specific size and intersection properties of B.
Authors: The constructions in Sections 4 and 5 build on efficient marker families whose redundancy is known to be 12 log log n + O(1) and that are adapted to enforce |B(x) ∩ B(y)| < N for these specific N. The log-log term comes from the marker length needed to handle O(n²) possible error locations in B, while the O(1) term is verified by direct counting of the remaining intersection cases after marker placement. We will add a self-contained existence proof (via the probabilistic method with explicit constants) and a parameter table in an appendix to confirm the bounds hold for B. revision: yes
-
Referee: [N=14 construction] N=14 construction: the claim of log n + 3 redundancy (only two bits above the best known (n,18;B) code) is load-bearing for the overall narrative. The argument must confirm that the auxiliary objects used to enforce |B(x) ∩ B(y)| < 14 contribute no hidden n-dependent overhead beyond the stated +3 bits.
Authors: Section 6 constructs the code by appending a fixed-length synchronization marker of 3 bits chosen to resolve all possible deletion-substitution patterns that could cause |B(x) ∩ B(y)| ≥ 14. Because the marker is constant-length (independent of n) and the main code contributes the log n term, there is no hidden n-dependent overhead. The choice of 3 bits is verified by exhaustive enumeration over the O(n) possible deletion positions and 4 substitution symbols, showing that N=14 is achieved. We will include an explicit verification table in the revision. revision: partial
Circularity Check
Constructions rely on standard combinatorial existence assumptions without self-referential reduction
full rationale
The paper's claims for reduced redundancy at N=5,9,11,14 are presented as explicit constructions that enforce |B(x) ∩ B(y)| < N via auxiliary objects such as hash families or markers. The abstract and described approach cite prior bounds for N=1 as a baseline and improve upon them, but no equations or steps reduce the stated redundancy figures to fitted parameters, self-definitions, or load-bearing self-citations. The ball-intersection condition is an external combinatorial requirement rather than an internal tautology. This is a normal non-circular finding for existence-based coding constructions.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The single-deletion single-substitution ball B maps each sequence to the set of all sequences obtainable by one deletion and one substitution.
- domain assumption An (n, N; B)-reconstruction code requires that the intersection of any two distinct balls has size strictly less than N.
Reference graph
Works this paper leans on
-
[1]
On Levenshtein’s reconstruction problem under insertions, deletions, and substitutions,
M. Abu-Sini and E. Yaakobi, “On Levenshtein’s reconstruction problem under insertions, deletions, and substitutions,”IEEE Transactions on Information Theory, vol. 67, no. 11, pp. 7132-7158, 2021
work page 2021
-
[2]
Coding for sequence reconstruction for single edits,
K. Cai, H. M. Kiah, T. T. Nguyen, and E. Yaakobi, “Coding for sequence reconstruction for single edits,”IEEE Transactions on Information Theory, vol. 68, no. 1, pp. 66-79, 2022
work page 2022
-
[3]
Coding for racetrack memories,
Y . M. Chee, H. M. Kiah, A. Vardy, V . K. Vu, and E. Yaakobi, “Coding for racetrack memories,”IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 7094-7112, 2018
work page 2018
-
[4]
Correcting deletions with multiple reads,
J. Chrisnata, H. M. Kiah, and E. Yaakobi, “Correcting deletions with multiple reads,”IEEE Transactions on Information Theory, vol. 68, no. 11, pp. 7141-7158, 2022
work page 2022
-
[5]
Next-generation digital information storage in DNA,
G. M. Church, Y . Gao, and S. Kosuri, “Next-generation digital information storage in DNA,”Science, vol. 337, no. 6102, pp. 1628-1628, 2012
work page 2012
-
[6]
Sequence reconstruction over the deletion channel,
R. Gabrys and E. Yaakobi, “Sequence reconstruction over the deletion channel,”IEEE Transactions on Information Theory, vol. 64, no. 4, pp. 2924-2931, 2018
work page 2018
-
[7]
Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,
N. Goldman, P. Bertone, S. Chen, C. Dessimoz, E. M. LeProust, B. Sipos, and E. Birney, “Towards practical, high-capacity, low-maintenance information storage in synthesized DNA,”Nature, vol. 494, pp. 77-80, 2013
work page 2013
-
[8]
Universal framework for parametric constrained coding,
A. Kobovich, O. Leitersdorf, D. Bar-Lev, and E. Yaakobi, “Universal framework for parametric constrained coding,”IEEE Transactions on Information Theory, Early Access, doi: 10.1109/TIT.2026.3672710
-
[9]
Sequence reconstruction under channels with multiple bursts of insertions or deletions,
Z. Lan, Y . Sun, W. Yu, and G. Ge, “Sequence reconstruction under channels with multiple bursts of insertions or deletions,”IEEE Transactions on Information Theory, vol. 72, no. 1, pp. 315-330, 2026
work page 2026
-
[10]
Binary codes capable of correcting deletions, insertions, and reversals,
V . I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals,”Soviet Physics Doklady, vol. 10, no. 8, pp. 707-710, 1966
work page 1966
-
[11]
Efficient reconstruction of sequences,
V . I. Levenshtein, “Efficient reconstruction of sequences,”IEEE Transactions on Information Theory, vol. 47, no. 1, pp. 2-22, 2001
work page 2001
-
[12]
Efficient reconstruction of sequences from their subsequences or supersequences,
V . I. Levenshtein, “Efficient reconstruction of sequences from their subsequences or supersequences,”Journal of Combinatorial Theory, Series A, vol. 93, no. 2, pp. 310-332, 2001
work page 2001
-
[13]
Binary reconstruction codes for correcting one deletion and one substitution,
Y . Li, Y . Sun, and G. Ge, “Binary reconstruction codes for correcting one deletion and one substitution,”arXiv:2505.04232, 2025
-
[14]
Random access in large-scale DNA data storage,
L. Organick, S. Ang, Y .-J. Chen, R. Lopez, S. Yekhanin, K. Makarychev, M. Racz, G. Kamath, P. Gopalan, B. Nguyen, C. Takahashi, S. Newman, H.-Y . Parker, C. Rashtchian, K. Stewart, G. Gupta, R. Carlson, J. Mulligan, D. Carmean, G. Seelig, L. Ceze, and K. Strauss, “Random access in large-scale DNA data storage,”Nature Biotechnology, vol. 36, no. 3, pp. 24...
work page 2018
-
[15]
Sequence reconstruction problem for deletion channels: a complete asymptotic solution,
V . L. P. Pham, K. Goyal, and H. M. Kiah, “Sequence reconstruction problem for deletion channels: a complete asymptotic solution,”Journal of Combinatorial Theory, Series A, vol. 211, p. 105980, 2025
work page 2025
-
[16]
Exact reconstruction from insertions in synchronization codes,
F. Sala, R. Gabrys, C. Schoeny, and L. Dolecek, “Exact reconstruction from insertions in synchronization codes,”IEEE Transactions on Information Theory, vol. 63, no. 4, pp. 2428-2445, 2017
work page 2017
-
[17]
Single-deletion single-substitution correcting codes,
I. Smagloy, L. Welter, A. Wachter-Zeh, and E. Yaakobi, “Single-deletion single-substitution correcting codes,”IEEE Transactions on Information Theory, vol. 69, no. 12, pp. 7659-7671, 2023
work page 2023
-
[18]
List-decodable codes for single-deletion single-substitution with list-size two,
W. Song, K. Cai, and T. T. Nguyen, “List-decodable codes for single-deletion single-substitution with list-size two,” inProceeding of the International Symposium on Information Theory (ISIT), Espoo, Finland, pp. 1004-1009, 2022
work page 2022
-
[19]
Sequence reconstruction for the single-deletion single-substitution channel,
W. Song, K. Cai, and T. Q. S. Quek, “Sequence reconstruction for the single-deletion single-substitution channel,”IEEE Journal on Selected Areas in Information Theory, vol. 6, pp. 232-247, 2025
work page 2025
-
[20]
Correcting two-deletion with a constant number of reads,
Y . Sun and G. Ge, “Correcting two-deletion with a constant number of reads,”IEEE Transactions on Information Theory, vol. 69, no. 5, pp. 2969-2982, 2023
work page 2023
-
[21]
Binary codes for correcting two edits,
Y . Sun and G. Ge, “Binary codes for correcting two edits,”IEEE Transactions on Information Theory, vol. 70, no. 10, pp. 6877-6898, 2024
work page 2024
-
[22]
Bounds and constructions ofℓ-read codes under the Hamming metric,
Y . Sun and G. Ge, “Bounds and constructions ofℓ-read codes under the Hamming metric,”IEEE Transactions on Information Theory, vol. 71, no. 8, pp. 5868-5883, 2025
work page 2025
-
[23]
Survey for a decade of coding for DNA storage,
O. Sabary, H. M. Kiah, P. H. Siegel, and E. Yaakobi, “Survey for a decade of coding for DNA storage,”IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, vol. 10, no. 2, pp. 253-271, 2024
work page 2024
-
[24]
Balanced reconstruction codes for single edits,
R. Wu and X. Zhang, “Balanced reconstruction codes for single edits,”Designs, Codes and Cryptography, vol. 92, pp. 2011-2029, 2024
work page 2011
-
[25]
Survey of sequence reconstruction problems and their applications in DNA-based storage,
Y . Yang, “Survey of sequence reconstruction problems and their applications in DNA-based storage,”IEEE Journal on Selected Areas in Information Theory, vol. 6, pp. 352-366, 2025
work page 2025
-
[26]
DNA-based storage: trends and methods,
S. Yazdi, H. M. Kiah, E. R. Garcia, J. Ma, H. Zhao, and O. Milenkovic, “DNA-based storage: trends and methods,”IEEE Transactions on Molecular, Biological, and Multi-Scale Communications, vol. 1, no. 3, pp. 230-248, 2015
work page 2015
-
[27]
Reconstruction of sequences distorted by two insertions,
Z. Ye, X. Liu, X. Zhang, and G. Ge, “Reconstruction of sequences distorted by two insertions,”IEEE Transactions on Information Theory, vol. 69, no. 8, pp. 4977-4992, 2023
work page 2023
-
[28]
Sequence reconstruction over3-deletion channels,
D. Zhang, G. Ge, and Y . Zhang, “Sequence reconstruction over3-deletion channels,” inProceeding of the International Symposium on Information Theory (ISIT), Athens, Greece, 2024, pp. 891-896
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.