pith. sign in

arxiv: 2504.19363 · v2 · submitted 2025-04-27 · 💻 cs.IT · math.CO· math.IT

Sequence Reconstruction for Sticky Insertion/Deletion Channels

Pith reviewed 2026-05-22 18:12 UTC · model grok-4.3

classification 💻 cs.IT math.COmath.IT
keywords sequence reconstructionsticky insertionsticky deletioninsertion-deletion channelerror correctionDNA data storagereconstruction algorithmtrace reconstruction
0
0 comments X

The pith

A recursive formula determines the fewest distinct outputs needed to uniquely recover messages sent through channels with bounded sticky insertions and deletions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper studies the problem of reconstructing an original sequence after it passes through a channel that can add at most t sticky insertions and delete at most s sticky deletions. Sticky insertions repeat a symbol consecutively, while sticky deletions remove consecutive identical symbols. The authors first derive a recursive formula that gives the smallest number of distinct received sequences guaranteeing that only one original sequence fits all of them. They then give an efficient algorithm that takes those sequences and outputs the original vector. The work targets emerging storage technologies where such run-length errors appear.

Core claim

For the (t, s)-sticky-insdel channel, the minimum number of distinct outputs required to uniquely reconstruct the transmitted vector is given by a recursive formula. An efficient algorithm reconstructs the transmitted vector from a set of erroneous sequences once their count meets or exceeds this number.

What carries the argument

The recursive formula that computes the minimum number of distinct traces sufficient for unique reconstruction in the (t, s)-sticky-insdel model.

If this is right

  • The minimal number of distinct outputs for any fixed t and s can be calculated directly from the recursion.
  • Reconstruction becomes possible in polynomial time once the required number of outputs is supplied.
  • The sticky-insertion-only case reduces to a known tandem-duplication reconstruction problem.
  • Unique recovery is guaranteed precisely when the number of distinct outputs reaches the recursive threshold.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Storage systems that read multiple copies of the same data block could use the formula to decide how many reads suffice for reliable recovery.
  • The same recursive counting technique might apply to other channels whose errors are confined to runs of repeated symbols.
  • An implementation of the reconstruction algorithm could be tested on synthetic traces generated from the exact (t, s) model to measure runtime scaling.

Load-bearing premise

The channel produces only runs of at most t identical-symbol insertions and s identical-symbol deletions, and the distinct outputs behave exactly as this bounded process predicts.

What would settle it

A concrete counter-example in which the number of distinct outputs equals the value given by the recursion yet two different input vectors remain consistent with every output.

read the original abstract

The sequence reconstruction problem for insertion/deletion channels has attracted significant attention owing to their applications recently in some emerging data storage systems, such as racetrack memories, DNA-based data storage. Our goal is to investigate the reconstruction problem for sticky-insdel channels where both sticky-insertions and sticky-deletions occur. If there are only sticky-insertion errors, the reconstruction problem for sticky-insertion channel is a special case of the reconstruction problem for tandem-duplication channel which has been well-studied. In this work, we consider the $(t, s)$-sticky-insdel channel where there are at most $t$ sticky-insertion errors and $s$ sticky-deletion errors when we transmit a message through the channel. For the reconstruction problem, we are interested in the minimum number of distinct outputs from these channels that are needed to uniquely recover the transmitted vector. We first provide a recursive formula to determine the minimum number of distinct outputs required. Next, we provide an efficient algorithm to reconstruct the transmitted vector from erroneous sequences.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper studies the sequence reconstruction problem over the (t,s)-sticky-insdel channel, in which a transmitted vector experiences at most t sticky insertions and s sticky deletions. It claims to derive a recursive formula that gives the minimum number of distinct channel outputs sufficient for unique recovery of the input and to supply an efficient algorithm that performs the reconstruction from those outputs. Sticky insertions are positioned as a special case of tandem-duplication channels.

Significance. A correct recursive characterization together with a polynomial-time reconstruction procedure would be useful for DNA-based storage and racetrack memory applications, where sticky errors arise naturally. The combination of an exact counting formula and an explicit algorithm is a constructive strength; however, the practical value hinges on whether the recursion fully enumerates worst-case distinguishable error patterns.

major comments (2)
  1. [Problem setup and recursive formula] The central claim that the recursive formula yields the exact minimum number of outputs rests on the assumption that all interactions between sticky insertions and deletions (especially on runs of identical symbols) are captured. The abstract and problem formulation do not exhibit an explicit accounting for sequential or overlapping errors on the same run; if such cases are undercounted, the stated threshold would be insufficient to guarantee uniqueness.
  2. [Reconstruction algorithm] The reconstruction algorithm is asserted to be efficient, yet no complexity analysis or correctness argument relative to the recursive count is supplied in the visible sections. This is load-bearing because the algorithm must succeed precisely when the number of outputs meets the recursive threshold.
minor comments (2)
  1. [Abstract] Notation for the (t,s)-sticky-insdel channel should be introduced once and used consistently; the current abstract mixes “sticky-insertion errors” and “tandem-duplication” without a clear mapping.
  2. [Main results] A small example (e.g., t=1, s=1, short binary vector) illustrating both the recursive count and the algorithm output would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of the manuscript and the constructive comments. We respond to each major comment below and outline the revisions that will be incorporated in the next version.

read point-by-point responses
  1. Referee: [Problem setup and recursive formula] The central claim that the recursive formula yields the exact minimum number of outputs rests on the assumption that all interactions between sticky insertions and deletions (especially on runs of identical symbols) are captured. The abstract and problem formulation do not exhibit an explicit accounting for sequential or overlapping errors on the same run; if such cases are undercounted, the stated threshold would be insufficient to guarantee uniqueness.

    Authors: We thank the referee for highlighting this point. The recursive formula is derived by enumerating error patterns that explicitly include sequential and overlapping sticky insertions and deletions acting on the same run of identical symbols; the recursion tracks changes to run lengths after each error type. Nevertheless, we agree that the initial problem formulation section does not make this accounting sufficiently explicit. In the revised manuscript we will expand the problem setup with a dedicated paragraph and two short examples that illustrate how the recursion handles overlapping errors on a single run, thereby confirming that the threshold accounts for all such interactions. revision: yes

  2. Referee: [Reconstruction algorithm] The reconstruction algorithm is asserted to be efficient, yet no complexity analysis or correctness argument relative to the recursive count is supplied in the visible sections. This is load-bearing because the algorithm must succeed precisely when the number of outputs meets the recursive threshold.

    Authors: We agree that an explicit correctness argument tied to the recursive threshold and a formal complexity analysis are required. The current manuscript describes the algorithm and states that it is efficient, but does not supply the supporting proofs. We will add a new subsection that (i) proves by induction on the number of outputs that the algorithm recovers the unique input whenever the number of distinct outputs meets or exceeds the value given by the recursion, and (ii) establishes that the running time is polynomial in the sequence length and the number of outputs. revision: yes

Circularity Check

0 steps flagged

Derivation is self-contained; recursive formula and algorithm introduce independent content

full rationale

The paper states a recursive formula for the minimum number of distinct outputs needed to uniquely recover the transmitted vector under the (t,s)-sticky-insdel channel and supplies an efficient reconstruction algorithm. The observation that sticky-insertion errors form a special case of tandem-duplication channels is presented as a reduction to previously studied results rather than a self-referential definition or fitted input. No equations or steps in the provided description reduce the claimed minimum or the algorithm to the inputs by construction, nor do they rely on load-bearing self-citations whose validity is assumed without external verification. The central claims therefore remain independent of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The contribution rests on the standard combinatorial model of sticky errors and the assumption that bounded sticky insertions and deletions allow unique reconstruction from a sufficient number of distinct outputs.

axioms (1)
  • domain assumption Errors consist of at most t sticky insertions and s sticky deletions.
    This defines the (t,s)-sticky-insdel channel and the reconstruction problem as stated in the abstract.

pith-pipeline@v0.9.0 · 5712 in / 1321 out tokens · 102119 ms · 2026-05-22T18:12:51.508127+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

24 extracted references · 24 canonical work pages

  1. [1]

    Binary codes capable of correcting spu rious insertions and deletion of ones,

    V . Levenshtein, “Binary codes capable of correcting spu rious insertions and deletion of ones,” Problems of Information Transmission, vol.. 1, no.. 1, pp. 8–17, 1965

  2. [2]

    Capacity Bounds for Sticky Channels,

    M. Mitzenmacher “Capacity Bounds for Sticky Channels,” IEEE Transactions on Information Theory , vol. 54, no. 1, pp. 72–77, 2008

  3. [3]

    Sharp analytical capacit y upper bounds for sticky and related channels,

    M. Cheraghchi and J. Ribeiro, “Sharp analytical capacit y upper bounds for sticky and related channels,” IEEE Transaction on Information Theory, vol. 65, no. 11, pp. 6950–6974, 2019

  4. [4]

    Repetition error correct ing sets: Ex- plicit constructions and prefixing methods,

    L. Dolecek and V . Anantharam, “Repetition error correct ing sets: Ex- plicit constructions and prefixing methods,” SIAM Journal on Discrete Mathematics, vol. 23, no. 4, pp. 2120–2146, 2010

  5. [5]

    Mahdavifar, A

    H. Mahdavifar, A. V ardy “Asymptotically Optimal Sticky -Insertion-Correcting Codes with Efficient Encoding and De coding“ 2017 IEEE International Symposium on Information Theory (ISIT) , pp. 2683–2687, 2017

  6. [6]

    On a new class of error control c odes and symmetric functions,

    L. G. Tallini and B. Bose, “On a new class of error control c odes and symmetric functions,” in IEEE International Symposium on Information Theory, pp. 980–984, 2008

  7. [7]

    On efficient repet ition error correcting codes,

    L. G. Tallini, N. Elarief, and B. Bose, “On efficient repet ition error correcting codes,” in IEEE International Symposium on Information Theory, pp. 1012–1016, 2010

  8. [8]

    C oding for racetrack memories,

    Y . M. Chee, H. M. Kiah, A. V ardy, V . K. Vu, and E. Y aakobi, “C oding for racetrack memories,” IEEE Transactions on Information Theory, vol. 64, no. 11, pp. 7094–7112, 2018

  9. [9]

    C odes correcting limited-shift errors in racetrack memorie s,

    Y . M. Chee, H. M. Kiah, A. V ardy, V . K. Vu, and E. Y aakobi, “C odes correcting limited-shift errors in racetrack memorie s,” Proc. IEEE International Symposium on Information Theory, pp. 96–100, 2018

  10. [10]

    Funda mental bounds for sequence reconstruction from nanopore se quencers,

    A. Magner, J. Duda, W. Szpankowski, and A. Grama, “Funda mental bounds for sequence reconstruction from nanopore se quencers,” IEEE Trans Mol Biol Multiscale Commun, vol. 2, no. 1, pp. 92–106, 2016

  11. [11]

    D uplication- correcting codes for data storage in the DNA of l iving organisms,

    S. Jain, F. F. Hassanzadeh, M. Schwartz, and J. Bruck, “D uplication- correcting codes for data storage in the DNA of l iving organisms,” IEEE Transactions on Information Theory, vol. 63, no. 8, pp. 4996–5010, 2017

  12. [12]

    Asymptotically optimal cod es correcting fixed-length duplication errors in DNA storag e systems,

    M. Kovacevic and V . Y . Tan, “Asymptotically optimal cod es correcting fixed-length duplication errors in DNA storag e systems,” IEEE Communications Letters, vol. 22, no. 11, pp. 2194–2197, 2018. 9

  13. [13]

    Efficient reconstruction of sequen ces,

    V . I. Levenshtein, “Efficient reconstruction of sequen ces,” IEEE Transaction on Information Theory, vol. 47, no. 1, pp. 2–22, 2001

  14. [14]

    Efficient reconstruction of sequen ces from their sub-sequences or super-sequences,

    V . I. Levenshtein, “Efficient reconstruction of sequen ces from their sub-sequences or super-sequences,” Journal of Combinatorial Theory, Series A, vol. 93, pp. 310–332, 2001

  15. [15]

    Next-generation di gital information storage in DNA,

    G. M. Church, Y . Gao, and S. Kosuri, “Next-generation di gital information storage in DNA,” Science, vol. 337, no. 6102, pp. 1628, 2012

  16. [16]

    DNA-based storage: Trends and methods,

    S. Y azdi, H. M. Kiah, E. R. Garcia, J. Ma, H. Zhao, and O. Mi lenkovic., “DNA-based storage: Trends and methods,” IEEE Trans. Molecular , Biological, Multi-Scale Commun., vol. 1, no. 3, pp. 230–248, 2015

  17. [17]

    Reconstruction from seletions in racetrack memories,

    Y . M. Chee, R. Gabrys, A. V ardy, V . K. Vu, and E. Y aakobi, “ Reconstruction from seletions in racetrack memories,” in IEEE Information Theory W orkshop,2018

  18. [18]

    Correcting deletions in multiple -heads racetrack memories

    J. Sima and J. Bruck, “Correcting deletions in multiple -heads racetrack memories”, in IEEE International Symposium on Information Theory, 2019

  19. [19]

    Sequence reconstruction ov er the deletion channel,

    R. Gabrys, and E. Y aakobi, “Sequence reconstruction ov er the deletion channel,” IEEE Trans. on Information Theory, vol. 64, no. 4, pp. 2924–2931, 2018

  20. [20]

    Levenshtein’s reconstru ction problem under insertions, deletions, and substituti ons,

    M. Abu-Sini, and E. Y aakobi, “Levenshtein’s reconstru ction problem under insertions, deletions, and substituti ons,” IEEE Transaction on Information Theory, vol. 67, no. 11, pp. 7132–7158, 2021

  21. [21]

    Optimal reconstr uction codes for deletion channels,

    J. Chrisnata, H. M. Kiah, E. Y aakobi, “Optimal reconstr uction codes for deletion channels,” in Proc. International Symposium on Information Theory and its Applications (ISITA), pp. 279–283, 2020

  22. [22]

    Sequence reconst ruction problem for deletion channels: a complete asymptot ic solution,

    V . L. P . Pham, K. Goyal, and H. M. Kiah, “Sequence reconst ruction problem for deletion channels: a complete asymptot ic solution,” Proc. International Symposium on Information Theory, pp. 992–997, 2022

  23. [23]

    Sequence reconstruction for li mited-magnitude errors,

    H. Wei and M. Schwartz, “Sequence reconstruction for li mited-magnitude errors,” IEEE Transaction on Information Theory, vol. 68, no. 7, pp. 4422–4434, 2022

  24. [24]

    Reconstruction codes for DNA sequences with uniform tandem-duplication errors,

    Y . Y ehezkeally and M. Schwartz, “Reconstruction codes for DNA sequences with uniform tandem-duplication errors, ” IEEE Transaction on Information Theory, vol. 66, no. 5, pp. 2658–2668, 2019. 10 Algorithm 1: Check validity of inputs y1, . . . ,yM . Input : y1, . . . ,yM Output: x or FAILURE 1 formation1← [y1,1] 2 pos← 1 3 for j = 2 to n do 4 if y1,j̸= y1...