pith. sign in

arxiv: 2601.06503 · v2 · submitted 2026-01-10 · 💻 cs.IT · math.IT

Some New Results on Sequence Reconstruction Problem for Deletion Channels

Pith reviewed 2026-05-16 15:42 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords sequence reconstructiondeletion channelsLevenshtein distanceball intersectionN(n,d,t)error correctioncombinatorial bounds
0
0 comments X

The pith

The maximum intersection size of two deletion balls of radius 4 with centers at least distance 3 apart is exactly 20n-166 for sequence length n at least 13.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper determines the exact maximum number of sequences that lie within deletion distance 4 of two different original sequences that are themselves at least distance 3 apart. This quantity, called N(n,3,4), is shown to equal 20n minus 166 for all sufficiently long sequences. The result settles a specific open question by providing matching lower and upper bounds through explicit constructions for the lower bound and case-by-case analysis for the upper bound. A more general lower bound is also given for larger radii t greater than or equal to 4. Knowing this exact overlap size helps determine how many noisy copies are needed to reliably reconstruct the original sequence in channels that only delete symbols.

Core claim

The authors prove that N(n,3,4) equals 20n-166 for every n at least 13. They first construct two centers at distance 3 whose balls of radius 4 intersect in exactly 20n-166 sequences, establishing the lower bound. Then they show that no pair of centers at distance 3 can have a larger intersection by exhaustive enumeration of possible overlapping patterns for n at least 13. This confirms the conjectured value and provides a lower bound expression that is tight only for t=4.

What carries the argument

N(n,d,t), the largest possible intersection size between two balls of radius t in the Levenshtein deletion metric whose centers are at least distance d apart.

If this is right

  • The exact number of common neighbors for distance-3 centers in the deletion graph is now known for radius 4.
  • Reconstruction algorithms for deletion channels can use this bound to determine the minimum number of traces needed.
  • The linear growth rate of 20n is the precise worst-case overlap for these parameters.
  • Techniques involving case analysis on deletion patterns may extend to nearby parameters.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar exact formulas might exist for other small values of d and t beyond those covered here.
  • DNA sequencing applications could apply this bound directly when modeling deletion errors.
  • The general lower bound for t greater than 4 suggests that overlaps grow roughly linearly with n regardless of t in this regime.

Load-bearing premise

The case analysis and constructions remain valid without missing exceptional cases for every sequence length n at least 13.

What would settle it

An explicit pair of length-n sequences at distance exactly 3 whose common deletion ball of radius 4 contains more than 20n-166 strings, for some n at least 13.

Figures

Figures reproduced from arXiv: 2601.06503 by Fang-Wei Fu, Han Li, Weijun Fang, Xiang Wang.

Figure 1
Figure 1. Figure 1: Outline of proof of upper bound. Now, we examine the logical flow shown in [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗
read the original abstract

Levenshtein first introduced the sequence reconstruction problem in $2001$. In the realm of combinatorics, the sequence reconstruction problem is equivalent to determining the value of $N(n,d,t)$, which represents the maximum size of the intersection of two metric balls of radius $t$, given that the distance between their centers is at least $d$ and the sequence length is $n$. In this paper, We present a lower bound on $N(n,3,t)$ for $n\geq \max\{13,t+8\}$ and $t \geq 4$. For $t=4$, we prove that this lower bound is tight. This settles an open question posed by Pham, Goyal, and Kiah, confirming that $N(n,3,4)=20n-166$ for all $n \geq 13$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript studies the sequence reconstruction problem equivalent to determining N(n,d,t), the maximum intersection size of two Levenshtein balls of radius t whose centers are at distance at least d. It establishes a general lower bound on N(n,3,t) for n ≥ max{13,t+8} and t ≥ 4, and proves that the bound is tight for t=4, yielding the exact formula N(n,3,4)=20n−166 for all n≥13. This resolves an open question posed by Pham, Goyal, and Kiah.

Significance. If the claimed equality holds, the result supplies the first exact closed-form determination of N(n,3,4) for sufficiently large n. This is a concrete advance in combinatorial coding theory for deletion channels, as the quantity directly governs the worst-case number of common subsequences that must be disambiguated when reconstructing a sequence from two reads at Levenshtein distance 3. The explicit constructions achieving the lower bound and the matching upper-bound argument via case analysis on ball intersections constitute a non-trivial combinatorial contribution.

major comments (2)
  1. [§4] §4 (upper-bound argument): the claim that N(n,3,4) ≤ 20n−166 for all n≥13 rests on an exhaustive enumeration of all pairs of centers at distance exactly 3 together with the common subsequences they can share inside radius-4 balls. The manuscript must demonstrate that the case division is complete, overlap-free, and free of missed configurations for every n≥13; a single overlooked pair permitting a larger intersection would falsify the exact formula.
  2. [§3] §3 (lower-bound construction): the explicit constructions achieving 20n−166 are stated to work for n≥13, but the verification that they remain valid without exceptions for arbitrarily large n (including the handling of boundary effects when n grows) is not fully detailed in the visible derivation steps.
minor comments (1)
  1. [§2] Notation: the definition of the Levenshtein ball and the precise meaning of “common subsequence inside radius t” should be restated once in the main text (rather than only by reference) to improve readability for readers outside the immediate subfield.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for acknowledging the resolution of the open question on N(n,3,4). We address the major comments point by point below and will revise the manuscript to add the requested clarifications.

read point-by-point responses
  1. Referee: [§4] §4 (upper-bound argument): the claim that N(n,3,4) ≤ 20n−166 for all n≥13 rests on an exhaustive enumeration of all pairs of centers at distance exactly 3 together with the common subsequences they can share inside radius-4 balls. The manuscript must demonstrate that the case division is complete, overlap-free, and free of missed configurations for every n≥13; a single overlooked pair permitting a larger intersection would falsify the exact formula.

    Authors: We agree that explicit verification of completeness is essential. The case analysis in §4 enumerates all possible Levenshtein edit combinations yielding distance exactly 3 and classifies the resulting ball intersections by the positions and types of edits. For n≥13 the patterns are independent of boundary effects. We will add a dedicated subsection (or appendix) that tabulates the cases, proves they are exhaustive and disjoint via exhaustive enumeration of edit position tuples, and confirms no larger intersections exist. revision: yes

  2. Referee: [§3] §3 (lower-bound construction): the explicit constructions achieving 20n−166 are stated to work for n≥13, but the verification that they remain valid without exceptions for arbitrarily large n (including the handling of boundary effects when n grows) is not fully detailed in the visible derivation steps.

    Authors: The constructions in §3 are defined via fixed-length differing blocks embedded in a periodic background that scales linearly with n. The condition n≥13 ensures the blocks fit without overlap or boundary interference. We will expand the derivation with an inductive argument on n (base case n=13 verified directly, inductive step shows the intersection size increases exactly by 20) together with an explicit check that boundary positions never produce extra common subsequences. revision: yes

Circularity Check

0 steps flagged

No circularity: exact value obtained via explicit constructions and case analysis

full rationale

The derivation establishes N(n,3,4)=20n-166 through an explicit combinatorial construction achieving the lower bound and a separate exhaustive case analysis proving the matching upper bound for all n≥13. Neither step reduces to a self-definition, fitted parameter renamed as prediction, or load-bearing self-citation; the case division enumerates pairs of centers at distance 3 and their radius-4 intersections directly from the metric definition. The result settles an external open question without internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The result rests on the standard definition of the deletion metric and the intersection size N(n,d,t) introduced by Levenshtein; no new free parameters, invented entities, or non-standard axioms are introduced.

axioms (1)
  • standard math The deletion distance and the metric-ball intersection size N(n,d,t) are well-defined for all positive integers n, d, t.
    Invoked in the opening paragraph to frame the problem.

pith-pipeline@v0.9.0 · 5440 in / 1185 out tokens · 47297 ms · 2026-05-16T15:42:38.439288+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

  1. [1]

    Next-generation digital information storage in DNA,

    G. M. Church, Y . Gao, and S. Kosuri, “Next-generation digital information storage in DNA,”Science, vol. 337, no. 6102, pp. 1628–1628, 2012

  2. [2]

    DNA-based storage: Trends and methods,

    S. Yazdi, H.M. Kiah, E.R. Garcia, J. Ma, H. Zhao, and O. Milenkovic, “DNA-based storage: Trends and methods,”IEEE Trans. Molecular, Biological, Multi-Scale Commun., vol. 1, no. 3, pp. 230–248, 2015

  3. [3]

    Coding over sets for DNA storage,

    A. Lenz, P. H. Siegel, A. Wachter-Zeh, and E. Yaakobi E, “Coding over sets for DNA storage,”IEEE Trans. Inf. Theory, vol. 66, no. 4, pp. 2331–2351, 2020

  4. [4]

    Magnetic domain-wall racetrack memory,

    S. S. Parkin, M. Hayashi, and L. Thomas, “Magnetic domain-wall racetrack memory,”Science, vol. 320, pp. 190–194, 2008

  5. [5]

    Coding for racetrack memories,

    Y . M. Chee, H. M. Kiah, A. Vardy, E. Yaakobi, and V . K. Vu, “Coding for racetrack memories,”IEEE Trans. Inf. Theory, vol. 64, no. 11, pp. 7094–7112, 2018

  6. [6]

    Reconstruction of permutations distorted by single reversal errors,

    E. Konstantinova, “Reconstruction of permutations distorted by single reversal errors,”Discrete Applied Mathematics, vol. 155, pp. 2426–2434, 2007

  7. [7]

    Efficient reconstruction of sequences,

    V . I. Levenshtein, “Efficient reconstruction of sequences,”IEEE Trans. Inf. Theory, vol. 47, no. 1, pp. 2–22, 2001

  8. [8]

    Efficient reconstruction of sequences from their subsequences or supersequences,

    V . I. Levenshtein, “Efficient reconstruction of sequences from their subsequences or supersequences,”Journal of Combin. Theory, Ser. A, vol. 93, no.2, pp. 310–332, 2001

  9. [9]

    Reconstruction of a graph from2-vicinities of its vertices,

    V . I. Levenshtein, E. Konstantinova, E. Konstantinov, and S. Molodtsov, “Reconstruction of a graph from2-vicinities of its vertices,”Discrete Applied Mathematics, vol. 156, pp. 1399–1406, 2008

  10. [10]

    Error graphs and the reconstruction of elements in groups,

    V . I. Levenshtein and J. Siemons, “Error graphs and the reconstruction of elements in groups,”Journal of Combin. Theory, Ser. A, vol. 116, pp. 795–815, 2009

  11. [11]

    Exact reconstruction from insertions in synchronization codes,

    F. Sala, R. Gabrys, C. Schoeny, and L. Dolecek, “Exact reconstruction from insertions in synchronization codes,”IEEE Trans. Inf. Theory, vol. 63, no. 4, pp. 2428–2445, 2017

  12. [12]

    Sequence reconstruction over the deletion channel,

    R. Gabrys and E. Yaakobi, “Sequence reconstruction over the deletion channel,”IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 2924–2931, 2018

  13. [13]

    Sequence reconstruction problem for deletion channels: a complete asymptotic solution,

    V . L. P. Pham, K. Goyal, and H. M. Kiah, “Sequence reconstruction problem for deletion channels: a complete asymptotic solution,” InProc. IEEE Int. Symp. Inform. Theory, pp. 992–997, 2022. 32

  14. [14]

    Sequence reconstruction problem for deletion channels: a complete asymptotic solution,

    V . L. P. Pham, K. Goyal, and H. M. Kiah, “Sequence reconstruction problem for deletion channels: a complete asymptotic solution,”Journal of Combin. Theory, Ser. A, https://doi.org/10.1016/j.jcta.2024.105980, 2025

  15. [15]

    Sequence reconstruction for Grassmann graphs and permutations,

    E. Yaakobi, M. Schwartz, M. Langberg, and J. Bruck, “Sequence reconstruction for Grassmann graphs and permutations,” InProc. Int. Symp. Inform. Theory, pp. 874–878, 2013

  16. [16]

    Reconstruction of permutations distorted by single Kendallτ-errors,

    X. Wang, “Reconstruction of permutations distorted by single Kendallτ-errors,”Cryptogr. Commun., vol. 15, pp. 131–144, 2023

  17. [17]

    The sequence reconstruction problem for permutations with the Hamming distance,

    X. Wang and E. V . Konstantinova, “The sequence reconstruction problem for permutations with the Hamming distance,”Cryptogr. Commun., vol. 16, pp. 1033–1057, 2024

  18. [18]

    The sequence reconstruction problem for permutations with the Hamming distance,

    X. Wang, F.-W. Fu, and E. V . Konstantinova, “The sequence reconstruction problem for permutations with the Hamming distance,”Des. Codes Cryptogr., vol. 93, no. 1, pp. 11–37, 2025

  19. [19]

    On the computation of Levenshtein’s distances,

    L. Calabi, “On the computation of Levenshtein’s distances,”TN-9-0030, Parke Math. Labs., Inc., Carlisle, MA., 1967

  20. [20]

    Correcting deletions with multiple reads,

    J. Chrisnata, H. M. Kiah, and E. Yaakobi, “Correcting deletions with multiple reads,”IEEE Trans. Inf. Theory, vol. 68, no. 11, pp. 7141–7158, 2022

  21. [21]

    Sequence reconstruction under channels with multiple bursts of insertions or deletions,

    Z. Lan, Y . Sun, W. Yu, and G. Ge, “Sequence reconstruction under channels with multiple bursts of insertions or deletions,”IEEE Trans. Inf. Theory, vol. 72, no. 1, pp. 315-330, 2026

  22. [22]

    Sequence reconstruction under single-burst insertion/deletion/edit channel,

    Y . Sun, Y . Xi, and G. Ge, “Sequence reconstruction under single-burst insertion/deletion/edit channel,”IEEE Trans. Inf. Theory, vol. 69, no. 7, pp. 4466-4483, 2023

  23. [23]

    Sequence reconstruction over 3-deletion Channels,

    D. Zhang, G. Ge, and Y . Zhang, “Sequence reconstruction over 3-deletion Channels,” InProc. IEEE Int. Symp. Inf. Theory, pp. 891-896, 2024

  24. [24]

    Sequence reconstruction for the single-deletion single-substitution channel,

    W. Song, K. Cai, and T. Q. S. Quek, “Sequence reconstruction for the single-deletion single-substitution channel,”IEEE Jou. on Sel. Areas Inf. Theory, vol. 6, pp. 232-247, 2025

  25. [25]

    Coding for sequence reconstruction for single edits,

    K. Cai, H. M. Kiah, T. T. Nguyen, and E. Yaakobi, “Coding for sequence reconstruction for single edits,”IEEE Trans. Inf. Theory, vol. 68, no. 1, pp. 66-79, 2022

  26. [26]

    Correcting two-deletion with a constant number of reads,

    Y . Sun and G. Ge, “Correcting two-deletion with a constant number of reads,”IEEE Trans. Inf. Theory, vol. 69, no. 5, pp. 2969-2982, 2023

  27. [27]

    Bounds and constructions ofℓ-read codes under the Hamming metric,

    Y . Sun and G. Ge, “Bounds and constructions ofℓ-read codes under the Hamming metric,”IEEE Trans. Inf. Theory, vol. 71, no. 8, pp. 5868-5883, 2025

  28. [28]

    Codes for correcting a burst of edits using weighted-summation VT sketch,

    Y . Sun and G. Ge, “Codes for correcting a burst of edits using weighted-summation VT sketch,”IEEE Trans. Inf. Theory, vol. 71, no. 3, pp. 1631-1646, 2025

  29. [29]

    Reconstruction of sequences distorted by two insertions,

    Z. Ye, X. Liu, X. Zhang, and G. Ge, “Reconstruction of sequences distorted by two insertions,”IEEE Trans. Inf. Theory, vol. 69, no. 8, pp. 4977-4992, 2023

  30. [30]

    Balanced reconstruction codes for single edits,

    R. Wu and X. Zhang, “Balanced reconstruction codes for single edits,”Des. Codes Cryptogr.,vol. 92, pp. 2011-2029, 2024