On the Number of Subsequences in the Nonbinary Deletion Channel

Fang-Wei Fu; Han Li; Xiang Wang

arxiv: 2601.06493 · v2 · submitted 2026-01-10 · 💻 cs.IT · math.CO· math.IT

On the Number of Subsequences in the Nonbinary Deletion Channel

Han Li , Xiang Wang , Fang-Wei Fu This is my paper

Pith reviewed 2026-05-16 15:47 UTC · model grok-4.3

classification 💻 cs.IT math.COmath.IT

keywords subsequencesdeletion channelnon-binary stringsrunsmaximum numberpolynomial timeinformation theory

0 comments

The pith

A family of r-run non-binary strings maximizes subsequences after t deletions, with the count computable in polynomial time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper investigates the number of subsequences that can be obtained from non-binary strings after undergoing t deletions. It concentrates on strings consisting of exactly r runs of identical symbols. The authors identify a specific family of these r-run strings that yields the highest possible number of subsequences for any given t. Additionally, they provide a method to calculate this maximum value efficiently in polynomial time.

Core claim

We characterize a family of r-run non-binary strings with the maximum number of subsequences under any t deletions, and show that this number can be computed in polynomial time.

What carries the argument

The family of r-run non-binary strings characterized by run-count properties that achieve the maximum subsequence count after deletions.

If this is right

The maximum number of subsequences for any r-run string under t deletions is given by this family.
This maximum can be found using an efficient polynomial-time procedure.
Improved bounds on subsequence counts are obtained for non-binary r-run strings.
These results apply directly to analyzing the deletion channel for non-binary alphabets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This characterization may aid in constructing error-correcting codes optimized for deletion channels with larger alphabets.
Similar run-based maximization might be explored for other channel models or error types.
The polynomial-time computability suggests practical applications in sequence analysis for large parameters.

Load-bearing premise

The family achieving the maximum subsequence count depends only on the number of runs and can be identified without regard to particular symbol assignments or alphabet size variations.

What would settle it

An r-run non-binary string outside the characterized family that has more subsequences than the maximum claimed for some specific t would disprove the result.

Figures

Figures reproduced from arXiv: 2601.06493 by Fang-Wei Fu, Han Li, Xiang Wang.

**Figure 1.** Figure 1: Comparison of upper bounds for the case q = 3, n = 120, r = 24. Our upper bounds derived from balanced strings [Corollary 24] are compared against previous best known bounds. [L] marks the upper bound proven by Levenstein [9]. [HR] marks the upper bound proven by Hirschberg and Regnier [1]. The results are presented as functions of t on a logarithmic scale. for all possible sequences Z (at most n−t+q−2 q−2… view at source ↗

read the original abstract

In the deletion channel, an important problem is to determine the number of subsequences derived from a string $U$ of length $n$ when subjected to $t$ deletions. It is well-known that the number of subsequences in the setting exhibits a strong dependence on the number of runs in the string $U$, where a run is defined as a maximal substring of identical characters. In this paper we study the number of subsequences of a non-binary string in this scenario, and propose some improved bounds on the number of subsequences of $r$-run non-binary strings. Specifically, we characterize a family of $r$-run non-binary strings with the maximum number of subsequences under any $t$ deletions, and show that this number can be computed in polynomial time.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper characterizes the r-run non-binary strings maximizing subsequence counts after t deletions and gives a polynomial-time algorithm for the maximum.

read the letter

The main thing to know is that this paper pins down a specific family of r-run strings over non-binary alphabets that achieve the highest number of distinct subsequences after any t deletions, and it shows how to compute that maximum in polynomial time. This moves past the binary-run results by giving an explicit characterization instead of just bounds. The algorithmic part is the practical win here, since it avoids brute-force enumeration for the count. The work builds directly on the run-length dependence that is already known to control subsequence behavior in deletion channels, so the extension feels natural rather than forced. They appear to have a clean combinatorial argument for which run-length vectors work best and how to assign symbols to realize the maximum. A soft spot worth checking is the alphabet-size assumption. If the construction relies on using distinct symbols for each run to prevent merging under deletions, then the result holds only when the alphabet is large enough relative to r. The abstract does not flag this, and for small alphabets the maximizing family could shift because repeated symbols on non-adjacent runs would collapse some subsequences. The proofs will need to show whether they handle symbol reuse or restrict the claim accordingly. This is a narrow but well-scoped combinatorial result aimed at people working on deletion-channel capacity or exact enumeration bounds. It is the kind of targeted tool that specialists can plug into larger arguments. I would send it to peer review; the characterization and complexity claims are concrete enough for referees to verify directly.

Referee Report

2 major / 1 minor

Summary. The paper studies the number of distinct subsequences of length n-t obtained from an r-run non-binary string after t deletions. It characterizes a specific family of r-run strings that achieve the maximum number of such subsequences for any t, proposes improved bounds on this quantity, and shows that the maximum can be computed in polynomial time from the run-length properties.

Significance. If the characterization is correct, the work supplies a precise combinatorial description of subsequence-maximizing strings in the non-binary deletion channel and an efficient algorithm for the maximum count. This strengthens the known dependence of subsequence numbers on run structure and could inform code design or reconstruction algorithms that exploit run-length statistics. The polynomial-time claim is a concrete algorithmic contribution.

major comments (2)

[Abstract] Abstract and characterization: the claim that the maximum depends only on the run-length vector assumes an alphabet large enough to assign distinct symbols to all r runs. For |Σ| < r, non-adjacent runs may receive the same symbol, causing effective merging after deletions that skip intervening runs and strictly lowering the subsequence count relative to the claimed maximizer. This assumption is load-bearing for both the characterization and the polynomial-time result.
[Characterization] Characterization and complexity sections: the polynomial-time algorithm is asserted to compute the maximum from run properties alone, but without an explicit statement of how symbol collisions are avoided or bounded, it is unclear whether the DP or recurrence remains valid for arbitrary alphabet sizes. A concrete counter-example or proof that the optimum is always attained with distinct symbols (or a correction for small alphabets) is needed.

minor comments (1)

[Abstract] The abstract states that improved bounds are proposed but does not indicate whether they follow directly from the characterization or require separate arguments; a short remark linking the two would clarify the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments. The points raised correctly identify that our characterization and algorithmic result rely on an alphabet large enough to assign distinct symbols to runs. We will revise the manuscript to state this assumption explicitly and add supporting discussion. Point-by-point responses follow.

read point-by-point responses

Referee: [Abstract] Abstract and characterization: the claim that the maximum depends only on the run-length vector assumes an alphabet large enough to assign distinct symbols to all r runs. For |Σ| < r, non-adjacent runs may receive the same symbol, causing effective merging after deletions that skip intervening runs and strictly lowering the subsequence count relative to the claimed maximizer. This assumption is load-bearing for both the characterization and the polynomial-time result.

Authors: We agree that the result assumes |Σ| ≥ r so that distinct symbols can be assigned to the r runs. This is the natural setting for maximizing the number of subsequences, as any symbol repetition between non-adjacent runs reduces the count by creating effective merges after deletions. We will revise the abstract and introduction to state the assumption explicitly: “for an alphabet of size at least r.” We will also add a brief remark noting that for |Σ| < r the maximum is strictly smaller and depends on the optimal symbol assignment. revision: yes
Referee: [Characterization] Characterization and complexity sections: the polynomial-time algorithm is asserted to compute the maximum from run properties alone, but without an explicit statement of how symbol collisions are avoided or bounded, it is unclear whether the DP or recurrence remains valid for arbitrary alphabet sizes. A concrete counter-example or proof that the optimum is always attained with distinct symbols (or a correction for small alphabets) is needed.

Authors: The DP and recurrence compute the maximum under the distinct-symbol assignment, which is always optimal when |Σ| ≥ r. We will insert a short lemma proving that any optimal string can be relabeled with distinct symbols without decreasing the subsequence count (by the pigeonhole principle and the effect of deletions on run merging). For |Σ| < r we will note that the algorithm requires an additional optimization step over symbol assignments; a small counter-example (binary alphabet, r=3) will be added to illustrate the gap. These clarifications will be placed in the characterization section. revision: yes

Circularity Check

0 steps flagged

No circularity: direct combinatorial characterization from run-length properties

full rationale

The paper's central result is a combinatorial characterization of the r-run non-binary strings maximizing the number of length-(n-t) subsequences for any t, together with a polynomial-time algorithm to compute that number. No equation or definition in the provided abstract reduces the claimed maximum to a fitted parameter, a self-referential quantity, or a result imported solely via self-citation. The derivation is presented as proceeding from the standard run-length decomposition of strings and explicit counting arguments over deletion patterns; these inputs are independent of the final characterization. The skeptic concern about symbol-assignment constraints is a question of proof completeness rather than circularity, as the paper does not define the maximum in terms of itself or rename a fitted quantity as a prediction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The claim rests on the standard combinatorial fact that subsequence counts in deletion channels are governed by run structure; no free parameters, new entities, or ad-hoc axioms are introduced in the abstract.

axioms (1)

domain assumption The number of subsequences exhibits a strong dependence on the number of runs in the string U.
Explicitly stated as well-known in the abstract.

pith-pipeline@v0.9.0 · 5429 in / 1155 out tokens · 46094 ms · 2026-05-16T15:47:13.612477+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1]

Tight bounds on the number of string subsequences,

D. S. Hirschberg and M. Regnier, “Tight bounds on the number of string subsequences,”J. Discrete Algorithms, vol. 1, no. 1, pp. 123-132, 2000

work page 2000
[2]

Sequence Reconstruction over 3-Deletion Channels,

D. Zhang, G. Ge, and Y . Zhang, “Sequence Reconstruction over 3-Deletion Channels,”InProc. Int. Symp. Inform. Theory, pp. 891-896, 2024

work page 2024
[3]

On the number of subsequences when deleting symbols from a string,

H. Mercier, M. Khabbazian, and V . K. Bhargava, “On the number of subsequences when deleting symbols from a string,”IEEE Trans. Inf. Theory, vol. 54, no. 7, pp. 3279-3285, Jul. 2008

work page 2008
[4]

A survey of error-correcting codes for channels with symbol synchronization errors,

H. Mercier, V . K. Bhargava, and V . Tarokh, “A survey of error-correcting codes for channels with symbol synchronization errors,”IEEE Commun. Surveys Tuts., vol. 12, no. 1, pp. 87-96, Feb. 2010

work page 2010
[5]

On the zero-error capacity threshold for deletion channels,

I. A. Kash, M. Mitzenmacher, J. Thaler, and J. Ullman, “On the zero-error capacity threshold for deletion channels,” inProc. Inf. Theory Appl. Workshop (ITA), 2011, pp. 1-5

work page 2011
[6]

Some general results of coding theory with applications to the study of codes for the correction of synchronization errors,

L. Calabi and W. E. Hartnett, “Some general results of coding theory with applications to the study of codes for the correction of synchronization errors,” Inf. Control, vol. 15, no. 3, pp. 235-249, Sep. 1969

work page 1969
[7]

A survey of results for deletion channels and related synchronization channels,

M. Mitzenmacher, “A survey of results for deletion channels and related synchronization channels,”Probab. Surv., vol. 6, pp. 1-33, 2009

work page 2009
[8]

Sequence reconstruction over the deletion channel,

R. Gabrys and E. Yaakobi, “Sequence reconstruction over the deletion channel,”IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 2924-2931, 2018

work page 2018
[9]

Binary codes capable of correcting deletions, insertions, and reversals,

V . I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals,”Soviet Phys. Doklady, vol. 10, no. 8, pp. 707-710, 1966

work page 1966
[10]

Efficient reconstruction of sequences from their subsequences or supersequences,

V . I. Levenshtein, “Efficient reconstruction of sequences from their subsequences or supersequences,”Journal of Combin. Theory, Ser. A, vol. 93, no.2, pp. 310-332, 2001

work page 2001
[11]

Efficient reconstruction of sequences,

V . I. Levenshtein, “Efficient reconstruction of sequences,”IEEE Trans. Inf. Theory, vol. 47, no. 1, pp. 2-22, 2001

work page 2001
[12]

Sequence reconstruction problem for deletion channels: a complete asymptotic solution,

V . L. P. Pham, K. Goyal, and H. M. Kiah, “Sequence reconstruction problem for deletion channels: a complete asymptotic solution,” InProc. IEEE Int. Symp. Inform. Theory, pp. 992-997, 2022

work page 2022
[13]

Sequence reconstruction problem for deletion channels: a complete asymptotic solution,

V . L. P. Pham, K. Goyal, and H. M. Kiah, “Sequence reconstruction problem for deletion channels: a complete asymptotic solution,”Journal of Combin. Theory, Ser. A, https://doi.org/10.1016/j.jcta.2024.105980, 2025

work page doi:10.1016/j.jcta.2024.105980 2024
[14]

A characterization of the number of subsequences obtained via the deletion channel,

Y . Liron and M. Langberg, “A characterization of the number of subsequences obtained via the deletion channel,”IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2300-2312, 2015

work page 2015
[15]

Sequence reconstruction under single-burst insertion/deletion/edit channel,

Y . Sun, Y . Xi, and G. Ge, “Sequence reconstruction under single-burst insertion/deletion/edit channel,”IEEE Trans. Inf. Theory, vol. 69, no. 7, pp. 4466-4483, 2023

work page 2023
[16]

On the size distribution of the fixed-length Levenshtein balls with radius one,

G. Wang and Q. Wang, “On the size distribution of the fixed-length Levenshtein balls with radius one,”Designs, Codes and Cryptography, vol. 92, pp. 2253-2265, 2024

work page 2024
[17]

Sequence reconstruction under channels with multiple bursts of insertions or deletions,

Z. Lan, Y . Sun, W. Yu and G. Ge, “Sequence reconstruction under channels with multiple bursts of insertions or deletions,”IEEE Trans. Inf. Theory, vol. 72, no. 1, pp. 315-330, 2026

work page 2026
[18]

On the size of balls and anticodes of small diameter under the fixed-length Levenshtein metric,

D. Bar-Lev, T. Etzion, and E. Yaakobi, “On the size of balls and anticodes of small diameter under the fixed-length Levenshtein metric,”IEEE Trans. Inf. Theory, vol. 69, no. 4, pp. 2324-2340, 2023

work page 2023
[19]

The size of Levenshtein ball with radius 2: expectation and concentration bound,

L. He and M. Ye, “The size of Levenshtein ball with radius 2: expectation and concentration bound,” in Proceeding of theInternational Symposium on Information Theory (ISIT), Taipei, Taiwan, 2023, pp. 850-855

work page 2023

[1] [1]

Tight bounds on the number of string subsequences,

D. S. Hirschberg and M. Regnier, “Tight bounds on the number of string subsequences,”J. Discrete Algorithms, vol. 1, no. 1, pp. 123-132, 2000

work page 2000

[2] [2]

Sequence Reconstruction over 3-Deletion Channels,

D. Zhang, G. Ge, and Y . Zhang, “Sequence Reconstruction over 3-Deletion Channels,”InProc. Int. Symp. Inform. Theory, pp. 891-896, 2024

work page 2024

[3] [3]

On the number of subsequences when deleting symbols from a string,

H. Mercier, M. Khabbazian, and V . K. Bhargava, “On the number of subsequences when deleting symbols from a string,”IEEE Trans. Inf. Theory, vol. 54, no. 7, pp. 3279-3285, Jul. 2008

work page 2008

[4] [4]

A survey of error-correcting codes for channels with symbol synchronization errors,

H. Mercier, V . K. Bhargava, and V . Tarokh, “A survey of error-correcting codes for channels with symbol synchronization errors,”IEEE Commun. Surveys Tuts., vol. 12, no. 1, pp. 87-96, Feb. 2010

work page 2010

[5] [5]

On the zero-error capacity threshold for deletion channels,

I. A. Kash, M. Mitzenmacher, J. Thaler, and J. Ullman, “On the zero-error capacity threshold for deletion channels,” inProc. Inf. Theory Appl. Workshop (ITA), 2011, pp. 1-5

work page 2011

[6] [6]

Some general results of coding theory with applications to the study of codes for the correction of synchronization errors,

L. Calabi and W. E. Hartnett, “Some general results of coding theory with applications to the study of codes for the correction of synchronization errors,” Inf. Control, vol. 15, no. 3, pp. 235-249, Sep. 1969

work page 1969

[7] [7]

A survey of results for deletion channels and related synchronization channels,

M. Mitzenmacher, “A survey of results for deletion channels and related synchronization channels,”Probab. Surv., vol. 6, pp. 1-33, 2009

work page 2009

[8] [8]

Sequence reconstruction over the deletion channel,

R. Gabrys and E. Yaakobi, “Sequence reconstruction over the deletion channel,”IEEE Trans. Inf. Theory, vol. 64, no. 4, pp. 2924-2931, 2018

work page 2018

[9] [9]

Binary codes capable of correcting deletions, insertions, and reversals,

V . I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals,”Soviet Phys. Doklady, vol. 10, no. 8, pp. 707-710, 1966

work page 1966

[10] [10]

Efficient reconstruction of sequences from their subsequences or supersequences,

V . I. Levenshtein, “Efficient reconstruction of sequences from their subsequences or supersequences,”Journal of Combin. Theory, Ser. A, vol. 93, no.2, pp. 310-332, 2001

work page 2001

[11] [11]

Efficient reconstruction of sequences,

V . I. Levenshtein, “Efficient reconstruction of sequences,”IEEE Trans. Inf. Theory, vol. 47, no. 1, pp. 2-22, 2001

work page 2001

[12] [12]

Sequence reconstruction problem for deletion channels: a complete asymptotic solution,

V . L. P. Pham, K. Goyal, and H. M. Kiah, “Sequence reconstruction problem for deletion channels: a complete asymptotic solution,” InProc. IEEE Int. Symp. Inform. Theory, pp. 992-997, 2022

work page 2022

[13] [13]

Sequence reconstruction problem for deletion channels: a complete asymptotic solution,

V . L. P. Pham, K. Goyal, and H. M. Kiah, “Sequence reconstruction problem for deletion channels: a complete asymptotic solution,”Journal of Combin. Theory, Ser. A, https://doi.org/10.1016/j.jcta.2024.105980, 2025

work page doi:10.1016/j.jcta.2024.105980 2024

[14] [14]

A characterization of the number of subsequences obtained via the deletion channel,

Y . Liron and M. Langberg, “A characterization of the number of subsequences obtained via the deletion channel,”IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2300-2312, 2015

work page 2015

[15] [15]

Sequence reconstruction under single-burst insertion/deletion/edit channel,

Y . Sun, Y . Xi, and G. Ge, “Sequence reconstruction under single-burst insertion/deletion/edit channel,”IEEE Trans. Inf. Theory, vol. 69, no. 7, pp. 4466-4483, 2023

work page 2023

[16] [16]

On the size distribution of the fixed-length Levenshtein balls with radius one,

G. Wang and Q. Wang, “On the size distribution of the fixed-length Levenshtein balls with radius one,”Designs, Codes and Cryptography, vol. 92, pp. 2253-2265, 2024

work page 2024

[17] [17]

Sequence reconstruction under channels with multiple bursts of insertions or deletions,

Z. Lan, Y . Sun, W. Yu and G. Ge, “Sequence reconstruction under channels with multiple bursts of insertions or deletions,”IEEE Trans. Inf. Theory, vol. 72, no. 1, pp. 315-330, 2026

work page 2026

[18] [18]

On the size of balls and anticodes of small diameter under the fixed-length Levenshtein metric,

D. Bar-Lev, T. Etzion, and E. Yaakobi, “On the size of balls and anticodes of small diameter under the fixed-length Levenshtein metric,”IEEE Trans. Inf. Theory, vol. 69, no. 4, pp. 2324-2340, 2023

work page 2023

[19] [19]

The size of Levenshtein ball with radius 2: expectation and concentration bound,

L. He and M. Ye, “The size of Levenshtein ball with radius 2: expectation and concentration bound,” in Proceeding of theInternational Symposium on Information Theory (ISIT), Taipei, Taiwan, 2023, pp. 850-855

work page 2023