On the Number of Subsequences in the Nonbinary Deletion Channel
Pith reviewed 2026-05-16 15:47 UTC · model grok-4.3
The pith
A family of r-run non-binary strings maximizes subsequences after t deletions, with the count computable in polynomial time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We characterize a family of r-run non-binary strings with the maximum number of subsequences under any t deletions, and show that this number can be computed in polynomial time.
What carries the argument
The family of r-run non-binary strings characterized by run-count properties that achieve the maximum subsequence count after deletions.
If this is right
- The maximum number of subsequences for any r-run string under t deletions is given by this family.
- This maximum can be found using an efficient polynomial-time procedure.
- Improved bounds on subsequence counts are obtained for non-binary r-run strings.
- These results apply directly to analyzing the deletion channel for non-binary alphabets.
Where Pith is reading between the lines
- This characterization may aid in constructing error-correcting codes optimized for deletion channels with larger alphabets.
- Similar run-based maximization might be explored for other channel models or error types.
- The polynomial-time computability suggests practical applications in sequence analysis for large parameters.
Load-bearing premise
The family achieving the maximum subsequence count depends only on the number of runs and can be identified without regard to particular symbol assignments or alphabet size variations.
What would settle it
An r-run non-binary string outside the characterized family that has more subsequences than the maximum claimed for some specific t would disprove the result.
Figures
read the original abstract
In the deletion channel, an important problem is to determine the number of subsequences derived from a string $U$ of length $n$ when subjected to $t$ deletions. It is well-known that the number of subsequences in the setting exhibits a strong dependence on the number of runs in the string $U$, where a run is defined as a maximal substring of identical characters. In this paper we study the number of subsequences of a non-binary string in this scenario, and propose some improved bounds on the number of subsequences of $r$-run non-binary strings. Specifically, we characterize a family of $r$-run non-binary strings with the maximum number of subsequences under any $t$ deletions, and show that this number can be computed in polynomial time.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies the number of distinct subsequences of length n-t obtained from an r-run non-binary string after t deletions. It characterizes a specific family of r-run strings that achieve the maximum number of such subsequences for any t, proposes improved bounds on this quantity, and shows that the maximum can be computed in polynomial time from the run-length properties.
Significance. If the characterization is correct, the work supplies a precise combinatorial description of subsequence-maximizing strings in the non-binary deletion channel and an efficient algorithm for the maximum count. This strengthens the known dependence of subsequence numbers on run structure and could inform code design or reconstruction algorithms that exploit run-length statistics. The polynomial-time claim is a concrete algorithmic contribution.
major comments (2)
- [Abstract] Abstract and characterization: the claim that the maximum depends only on the run-length vector assumes an alphabet large enough to assign distinct symbols to all r runs. For |Σ| < r, non-adjacent runs may receive the same symbol, causing effective merging after deletions that skip intervening runs and strictly lowering the subsequence count relative to the claimed maximizer. This assumption is load-bearing for both the characterization and the polynomial-time result.
- [Characterization] Characterization and complexity sections: the polynomial-time algorithm is asserted to compute the maximum from run properties alone, but without an explicit statement of how symbol collisions are avoided or bounded, it is unclear whether the DP or recurrence remains valid for arbitrary alphabet sizes. A concrete counter-example or proof that the optimum is always attained with distinct symbols (or a correction for small alphabets) is needed.
minor comments (1)
- [Abstract] The abstract states that improved bounds are proposed but does not indicate whether they follow directly from the characterization or require separate arguments; a short remark linking the two would clarify the contribution.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. The points raised correctly identify that our characterization and algorithmic result rely on an alphabet large enough to assign distinct symbols to runs. We will revise the manuscript to state this assumption explicitly and add supporting discussion. Point-by-point responses follow.
read point-by-point responses
-
Referee: [Abstract] Abstract and characterization: the claim that the maximum depends only on the run-length vector assumes an alphabet large enough to assign distinct symbols to all r runs. For |Σ| < r, non-adjacent runs may receive the same symbol, causing effective merging after deletions that skip intervening runs and strictly lowering the subsequence count relative to the claimed maximizer. This assumption is load-bearing for both the characterization and the polynomial-time result.
Authors: We agree that the result assumes |Σ| ≥ r so that distinct symbols can be assigned to the r runs. This is the natural setting for maximizing the number of subsequences, as any symbol repetition between non-adjacent runs reduces the count by creating effective merges after deletions. We will revise the abstract and introduction to state the assumption explicitly: “for an alphabet of size at least r.” We will also add a brief remark noting that for |Σ| < r the maximum is strictly smaller and depends on the optimal symbol assignment. revision: yes
-
Referee: [Characterization] Characterization and complexity sections: the polynomial-time algorithm is asserted to compute the maximum from run properties alone, but without an explicit statement of how symbol collisions are avoided or bounded, it is unclear whether the DP or recurrence remains valid for arbitrary alphabet sizes. A concrete counter-example or proof that the optimum is always attained with distinct symbols (or a correction for small alphabets) is needed.
Authors: The DP and recurrence compute the maximum under the distinct-symbol assignment, which is always optimal when |Σ| ≥ r. We will insert a short lemma proving that any optimal string can be relabeled with distinct symbols without decreasing the subsequence count (by the pigeonhole principle and the effect of deletions on run merging). For |Σ| < r we will note that the algorithm requires an additional optimization step over symbol assignments; a small counter-example (binary alphabet, r=3) will be added to illustrate the gap. These clarifications will be placed in the characterization section. revision: yes
Circularity Check
No circularity: direct combinatorial characterization from run-length properties
full rationale
The paper's central result is a combinatorial characterization of the r-run non-binary strings maximizing the number of length-(n-t) subsequences for any t, together with a polynomial-time algorithm to compute that number. No equation or definition in the provided abstract reduces the claimed maximum to a fitted parameter, a self-referential quantity, or a result imported solely via self-citation. The derivation is presented as proceeding from the standard run-length decomposition of strings and explicit counting arguments over deletion patterns; these inputs are independent of the final characterization. The skeptic concern about symbol-assignment constraints is a question of proof completeness rather than circularity, as the paper does not define the maximum in terms of itself or rename a fitted quantity as a prediction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption The number of subsequences exhibits a strong dependence on the number of runs in the string U.
Reference graph
Works this paper leans on
-
[1]
Tight bounds on the number of string subsequences,
D. S. Hirschberg and M. Regnier, “Tight bounds on the number of string subsequences,”J. Discrete Algorithms, vol. 1, no. 1, pp. 123-132, 2000
work page 2000
-
[2]
Sequence Reconstruction over 3-Deletion Channels,
D. Zhang, G. Ge, and Y . Zhang, “Sequence Reconstruction over 3-Deletion Channels,”InProc. Int. Symp. Inform. Theory, pp. 891-896, 2024
work page 2024
-
[3]
On the number of subsequences when deleting symbols from a string,
H. Mercier, M. Khabbazian, and V . K. Bhargava, “On the number of subsequences when deleting symbols from a string,”IEEE Trans. Inf. Theory, vol. 54, no. 7, pp. 3279-3285, Jul. 2008
work page 2008
-
[4]
A survey of error-correcting codes for channels with symbol synchronization errors,
H. Mercier, V . K. Bhargava, and V . Tarokh, “A survey of error-correcting codes for channels with symbol synchronization errors,”IEEE Commun. Surveys Tuts., vol. 12, no. 1, pp. 87-96, Feb. 2010
work page 2010
-
[5]
On the zero-error capacity threshold for deletion channels,
I. A. Kash, M. Mitzenmacher, J. Thaler, and J. Ullman, “On the zero-error capacity threshold for deletion channels,” inProc. Inf. Theory Appl. Workshop (ITA), 2011, pp. 1-5
work page 2011
-
[6]
L. Calabi and W. E. Hartnett, “Some general results of coding theory with applications to the study of codes for the correction of synchronization errors,” Inf. Control, vol. 15, no. 3, pp. 235-249, Sep. 1969
work page 1969
-
[7]
A survey of results for deletion channels and related synchronization channels,
M. Mitzenmacher, “A survey of results for deletion channels and related synchronization channels,”Probab. Surv., vol. 6, pp. 1-33, 2009
work page 2009
-
[8]
Sequence reconstruction over the deletion channel,
R. Gabrys and E. Yaakobi, “Sequence reconstruction over the deletion channel,”IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 2924-2931, 2018
work page 2018
-
[9]
Binary codes capable of correcting deletions, insertions, and reversals,
V . I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals,”Soviet Phys. Doklady, vol. 10, no. 8, pp. 707-710, 1966
work page 1966
-
[10]
Efficient reconstruction of sequences from their subsequences or supersequences,
V . I. Levenshtein, “Efficient reconstruction of sequences from their subsequences or supersequences,”Journal of Combin. Theory, Ser. A, vol. 93, no.2, pp. 310-332, 2001
work page 2001
-
[11]
Efficient reconstruction of sequences,
V . I. Levenshtein, “Efficient reconstruction of sequences,”IEEE Trans. Inf. Theory, vol. 47, no. 1, pp. 2-22, 2001
work page 2001
-
[12]
Sequence reconstruction problem for deletion channels: a complete asymptotic solution,
V . L. P. Pham, K. Goyal, and H. M. Kiah, “Sequence reconstruction problem for deletion channels: a complete asymptotic solution,” InProc. IEEE Int. Symp. Inform. Theory, pp. 992-997, 2022
work page 2022
-
[13]
Sequence reconstruction problem for deletion channels: a complete asymptotic solution,
V . L. P. Pham, K. Goyal, and H. M. Kiah, “Sequence reconstruction problem for deletion channels: a complete asymptotic solution,”Journal of Combin. Theory, Ser. A, https://doi.org/10.1016/j.jcta.2024.105980, 2025
-
[14]
A characterization of the number of subsequences obtained via the deletion channel,
Y . Liron and M. Langberg, “A characterization of the number of subsequences obtained via the deletion channel,”IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2300-2312, 2015
work page 2015
-
[15]
Sequence reconstruction under single-burst insertion/deletion/edit channel,
Y . Sun, Y . Xi, and G. Ge, “Sequence reconstruction under single-burst insertion/deletion/edit channel,”IEEE Trans. Inf. Theory, vol. 69, no. 7, pp. 4466-4483, 2023
work page 2023
-
[16]
On the size distribution of the fixed-length Levenshtein balls with radius one,
G. Wang and Q. Wang, “On the size distribution of the fixed-length Levenshtein balls with radius one,”Designs, Codes and Cryptography, vol. 92, pp. 2253-2265, 2024
work page 2024
-
[17]
Sequence reconstruction under channels with multiple bursts of insertions or deletions,
Z. Lan, Y . Sun, W. Yu and G. Ge, “Sequence reconstruction under channels with multiple bursts of insertions or deletions,”IEEE Trans. Inf. Theory, vol. 72, no. 1, pp. 315-330, 2026
work page 2026
-
[18]
On the size of balls and anticodes of small diameter under the fixed-length Levenshtein metric,
D. Bar-Lev, T. Etzion, and E. Yaakobi, “On the size of balls and anticodes of small diameter under the fixed-length Levenshtein metric,”IEEE Trans. Inf. Theory, vol. 69, no. 4, pp. 2324-2340, 2023
work page 2023
-
[19]
The size of Levenshtein ball with radius 2: expectation and concentration bound,
L. He and M. Ye, “The size of Levenshtein ball with radius 2: expectation and concentration bound,” in Proceeding of theInternational Symposium on Information Theory (ISIT), Taipei, Taiwan, 2023, pp. 850-855
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.