Simple Finite-Length Achievability and Converse Bounds for the Deletion Channel and the Insertion Channel

Ruslan Morozov; Tolga Mete Duman

arxiv: 2504.20961 · v4 · submitted 2025-04-29 · 💻 cs.IT · math.IT

Simple Finite-Length Achievability and Converse Bounds for the Deletion Channel and the Insertion Channel

Ruslan Morozov , Tolga Mete Duman This is my paper

Pith reviewed 2026-05-22 18:21 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords deletion channelinsertion channelfinite block lengthconverse boundachievability boundcode sizereference output distributionbinary erasure channel

0 comments

The pith

A specific reference output distribution yields tighter finite-length upper bounds on code size for deletion and insertion channels than the binary erasure channel.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes upper bounds on the maximum size of codes that can be used over deletion and insertion channels when the block length is fixed and a target error probability is set. It achieves this by adapting a general converse bound with a carefully chosen reference distribution over possible outputs. This choice leads to a simple formula that is easier to compute than the general version. For the deletion channel the resulting bound is shown to be stricter than the one obtained by treating the channel as a binary erasure channel. Similar bounds are derived for the insertion channel and an algorithm is given to compute matching achievability results.

Core claim

For a general discrete channel a reference output distribution can be selected so that the general converse bound simplifies to a closed-form expression; when this expression is evaluated on the deletion channel with finite block length it produces an upper bound on code size that is tighter than the binary-erasure-channel bound for the same length and frame-error target.

What carries the argument

A reference output distribution chosen for any finite-input finite-output channel that turns the general converse into a simple, explicit upper bound on code size.

If this is right

The upper bound applies directly to both i.i.d. deletion and insertion channels.
It is computable without optimizing over distributions or using data-dependent tuning.
For deletion channels the bound is strictly tighter than the only previously available finite-length converse based on the binary erasure channel.
An exponential-time algorithm computes a corresponding achievability bound for any discrete-input discrete-output channel.
The same reference-distribution technique can be reused for other channels with finite alphabets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These bounds could serve as quick sanity checks when designing practical deletion-correcting codes for short packets.
Extending the reference distribution to channels with memory or continuous alphabets might produce analogous finite-length results.
Comparing the bound against known asymptotic capacity expressions for large block lengths would test consistency.
Implementing the algorithm for small block lengths could reveal how close the converse and achievability bounds sit for deletion channels.

Load-bearing premise

The chosen reference output distribution produces a bound that remains valid and reasonably tight for the specific deletion and insertion channels without requiring further optimization or data-dependent tuning.

What would settle it

For small block lengths, compute the actual largest code size that achieves the target frame error probability over the deletion channel and check whether it exceeds the new upper bound.

Figures

Figures reproduced from arXiv: 2504.20961 by Ruslan Morozov, Tolga Mete Duman.

**Figure 2.** Figure 2: The LO-CVB for the deletion channel (D (δ) m ) n versus the BEC bound and the normal approximation. The x-coordinate is equal to the total number of input bits N = mn. VII. CONCLUSIONS In this paper, we provide an upper bound on the code size for the deletion (insertion) channel with a given deletion (insertion) probability, input length, and target frame error rate. This is done by providing a reference o… view at source ↗

**Figure 3.** Figure 3: GAVB for channel I (ι) 12 . 0.7 0.8 0.9 1 1.1 10 20 50 100 200 500 1000 2000 4000 code rate N=m*n LO−CVB, fer=1e−7 N. app., m=12, fer=1e−7 LO−CVB, fer=1e−5 N. app., m=12, fer=1e−5 LO−CVB, fer=1e−3 N. app., m=12, fer=1e−3 LO−CVB, fer=0.01 N. app., m=12, fer=0.01 LO−CVB, fer=0.1 N. app., m=12, fer=0.1 capacity lower bound capacity upper bound, m=12 insertion channel, ι = 0.001 (a) ι = 0.001 0.1 0.2 0.3 0.4 0… view at source ↗

**Figure 4.** Figure 4: The LO-CVB for the insertion channel (I (ι) m ) n versus the normal approximation and the lower capacity bound [17]. The x-coordinate is equal to the total number of input bits N = mn [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗

read the original abstract

We develop upper bounds on code size for an independent and identically distributed deletion and insertion channels for a given code length and target frame error probability. The bounds are obtained as a variation of a general converse bound, which, though available for any channel, is inefficient and not easily computable without a good reference distribution over the output alphabet. We obtain a reference output distribution for a general finite-input finite-output channel and provide a simple formula for the converse bound on the capacity employing this distribution. We then evaluate the bound for the deletion channel with a finite block length, and show that the resulting upper bound on the code size is tighter than that for a binary erasure channel, which is the only alternative converse bound for the finite-length setting. We also provide similar results for the insertion channel. Furthermore, we present a simple algorithm for computing an achievability bound for a general discrete-input discrete-output channel. Although the algorithm has exponential complexity, it is useful for comparison purposes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives an explicit reference output distribution that produces a simple, tighter finite-length converse for deletion and insertion channels than the binary erasure bound.

read the letter

This paper's main contribution is a simple reference output distribution that turns a general converse into an explicit finite-length upper bound on code size for deletion and insertion channels, and that bound beats the one from the binary erasure channel. They start from a known general converse for any channel that requires choosing a reference distribution over the outputs. For finite alphabets they give a specific choice that yields a closed-form expression. Applying it to the deletion channel with i.i.d. deletions, they get a tighter bound than the erasure alternative for the same block length and error probability. They do the same for insertions. On the achievability side they describe a straightforward algorithm, though it has exponential complexity so it's mainly for benchmarking. The reference distribution idea is new in this context and the comparison to the erasure bound is a clear improvement. The derivation seems direct without hidden fitting or circularity. The potential issue is how the reference distribution handles the random output lengths. Deletion channels map n-bit inputs to outputs of length between 0 and n, so the output space is all binary strings up to length n. If the chosen distribution doesn't properly dominate or match the possible outputs under the capacity-achieving input, the bound might not be as strong as stated or could even have validity questions. The abstract doesn't spell out the construction in detail, so that needs checking in the full paper. This work is aimed at coding theorists interested in synchronization-error channels and finite-length analysis. Someone building bounds for DNA storage or similar applications would find the explicit expressions useful. It is solid enough to go to peer review, even if the improvement is incremental. I would send it for refereeing.

Referee Report

2 major / 2 minor

Summary. The paper develops finite-length upper bounds on code size for i.i.d. deletion and insertion channels by applying a variation of a general converse bound that relies on an explicit reference output distribution over the (variable-length) output alphabet. It derives a simple closed-form expression for the resulting converse and evaluates it numerically to claim a strictly tighter bound than the binary erasure channel alternative; it also supplies an exponential-time algorithm for computing matching achievability bounds on general discrete channels.

Significance. If the reference distribution is shown to be valid for variable-length outputs, the work supplies the first explicit, non-optimized finite-length converse for deletion/insertion channels that improves on the only previously available bound (BEC). The explicit reference distribution and the simple achievability algorithm constitute practical, reproducible tools that enable direct numerical comparison without parameter tuning.

major comments (2)

[Section describing the reference distribution and its application to deletion/insertion channels] The central claim that the derived upper bound is strictly tighter than the BEC bound rests on the reference output distribution remaining valid once the output alphabet is the union of {0,1}^k for k = 0 … n. The manuscript must explicitly construct this distribution over the variable-length space and verify that it dominates the true output measure induced by the capacity-achieving input (or at least that the resulting information-spectrum bound stays valid).
[Evaluation section for the deletion channel] The abstract states that the bound is obtained 'as a variation of a general converse bound' and then 'evaluated' for the deletion channel. The manuscript should include the precise statement of the general converse (including the exact divergence or information-spectrum quantity used) together with the numerical values of the resulting bound versus the BEC bound for at least one block length and error probability, so that the claimed improvement can be reproduced.

minor comments (2)

The achievability algorithm is described as having 'exponential complexity' yet 'useful for comparison purposes.' Adding a short pseudocode listing or a complexity table for small n would improve clarity.
Notation for the output alphabet of the deletion channel (variable-length binary strings) should be introduced once and used consistently when defining the reference distribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and insightful comments on our manuscript. We address each major comment below in a point-by-point manner and indicate planned revisions to strengthen the presentation and reproducibility of the results.

read point-by-point responses

Referee: [Section describing the reference distribution and its application to deletion/insertion channels] The central claim that the derived upper bound is strictly tighter than the BEC bound rests on the reference output distribution remaining valid once the output alphabet is the union of {0,1}^k for k = 0 … n. The manuscript must explicitly construct this distribution over the variable-length space and verify that it dominates the true output measure induced by the capacity-achieving input (or at least that the resulting information-spectrum bound stays valid).

Authors: We agree that an explicit construction over the variable-length output space is essential for rigor. In the revised manuscript we will insert a dedicated subsection that defines the reference distribution explicitly as the product measure induced by independent Bernoulli(1/2) symbols for each possible output length k = 0 to n, normalized appropriately to account for the deletion probability. We will then prove that this distribution satisfies the required dominance property with respect to the true output measure under the uniform input distribution, thereby ensuring the validity of the information-spectrum converse bound. This construction follows directly from the general finite-input finite-output reference distribution derived earlier in the paper but is specialized to the union of binary strings of length at most n. revision: yes
Referee: [Evaluation section for the deletion channel] The abstract states that the bound is obtained 'as a variation of a general converse bound' and then 'evaluated' for the deletion channel. The manuscript should include the precise statement of the general converse (including the exact divergence or information-spectrum quantity used) together with the numerical values of the resulting bound versus the BEC bound for at least one block length and error probability, so that the claimed improvement can be reproduced.

Authors: We concur that the current presentation would benefit from greater explicitness. We will revise the manuscript to include the precise statement of the general converse bound, specifying that the upper bound on the code size is given by the information-spectrum quantity involving the divergence between the induced output distribution and the chosen reference distribution. In addition, we will add a table in the evaluation section that reports the numerical values of our bound and the BEC bound for at least one concrete block length (e.g., n = 20) and target error probability (e.g., 10^{-3}), allowing direct reproduction of the claimed improvement. revision: yes

Circularity Check

0 steps flagged

No significant circularity; bounds derived from general converse with independently obtained reference distribution

full rationale

The paper presents a variation of a general converse bound for finite-length deletion and insertion channels by first obtaining a reference output distribution for any finite-input finite-output channel and then evaluating the resulting code-size upper bound. This reference distribution and the simple formula for the converse are derived as a general tool rather than fitted to deletion/insertion data or reduced to a self-citation chain. The claimed tightness is shown by direct comparison to the external binary erasure channel bound, with no equations or steps in the provided text reducing the target result to its own inputs by construction. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work relies on standard information-theoretic axioms such as the definition of mutual information and the general converse bound; no free parameters or invented entities are introduced beyond the reference distribution choice.

axioms (1)

standard math Existence of a general converse bound applicable to any channel
Invoked to obtain the variation used for deletion and insertion channels

pith-pipeline@v0.9.0 · 5701 in / 1162 out tokens · 59755 ms · 2026-05-22T18:21:22.044737+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

[1]

Three decades of na nopore sequencing,

D. Deamer, M. Akeson, and D. Branton, “Three decades of na nopore sequencing,” Nature biotechnology, vol. 34, pp. 518–524, 05 2016. 10 10−2 100 6 9 12 FER log2 of code size LO−CVB ι=0.05 AVB ι=0.05 LO−CVB ι=0.2 AVB ι=0.2 LO−CVB ι=0.5 AVB ι=0.5 insertion channel, m=12, n=1 Fig. 3: GA VB for channel I (ι) 12 . 0.7 0.8 0.9 1 1.1 10 20 50 100 200 500 1000 200...

work page 2016
[2]

Next-generation dig ital information storage in DNA,

G. M. Church, Y . Gao, and S. Kosuri, “Next-generation dig ital information storage in DNA,” Science, vol. 337, no. 6102, p. 1628, 2012

work page 2012
[3]

Towards practical, high-capacity, low-maintenance inf ormation storage in synthesized DNA,

N. Goldman, P . Bertone, S. Chen et al. , “Towards practical, high-capacity, low-maintenance inf ormation storage in synthesized DNA,” Nature, vol. 494, no. 7435, pp. 77–80, Feb 2013

work page 2013
[4]

Low cost DNA data storage using photolithographic synthe sis and advanced information reconstruction and error corr ection,

P . L. Antkowiak et al. , “Low cost DNA data storage using photolithographic synthe sis and advanced information reconstruction and error corr ection,” Nature Commun. , vol. 11, no. 1, pp. 1–10, Dec 2020

work page 2020
[5]

An end-to-end coding scheme fo r DNA-based data storage with nanopore-sequenced reads,

L. Welter, R. Sokolovskii, T. Heinis, A. Wachter-Zeh, E. Rosnes, and A. Graell i Amat, “An end-to-end coding scheme fo r DNA-based data storage with nanopore-sequenced reads,” 2024. [Online]. Availabl e: https://arxiv.org/abs/2406.12955

work page arXiv 2024
[6]

Channel coding r ate in the ﬁnite blocklength regime,

Y . Polyanskiy, H. V . Poor, and S. V erdu, “Channel coding r ate in the ﬁnite blocklength regime,” IEEE Transactions on Information Theory , vol. 56, no. 5, pp. 2307–2359, 2010

work page 2010
[7]

Approaching the norm al approximation of the ﬁnite blocklength capacity within 0 .025 db by short polar codes,

J. Piao, K. Niu, J. Dai, and C. Dong, “Approaching the norm al approximation of the ﬁnite blocklength capacity within 0 .025 db by short polar codes,” IEEE Wireless Communications Letters , vol. 9, no. 7, pp. 1089–1092, 2020

work page 2020
[8]

Fini te blocklength performance bound for the DNA storage channe l,

I. Maarouf, G. Liva, E. Rosnes, and A. Graell i Amat, “Fini te blocklength performance bound for the DNA storage channe l,” in 2023 12th International Symposium on Topics in Coding (ISTC) , 2023, pp. 1–5

work page 2023
[9]

C. E. Shannon, Lower Bounds to Error Probability for Coding on Discrete Mem oryless Channels. I . IEEE, 1993, pp. 385–423

work page 1993
[10]

On the evaluation of the Polyanskiy-Poor– Verdú converse bound for ﬁnite block-length coding in AWGN,

T. Erseghe, “On the evaluation of the Polyanskiy-Poor– Verdú converse bound for ﬁnite block-length coding in AWGN, ” IEEE Transactions on Information Theory, vol. 61, no. 12, pp. 6578–6590, 2015

work page 2015
[11]

The DNA storage channel: C apacity and error probability bounds,

N. Weinberger and N. Merhav, “The DNA storage channel: C apacity and error probability bounds,” IEEE Transactions on Information Theory , vol. 68, no. 9, pp. 5657–5700, 2022

work page 2022
[12]

Information rates of the noisy nanopore channel,

B. McBain, E. Viterbo, and J. Saunderson, “Information rates of the noisy nanopore channel,” IEEE Transactions on Information Theory , vol. 70, no. 8, pp. 5640–5652, 2024

work page 2024
[13]

Asymptotic Estimates in Information Theory with Non-Vanishing Error Probabilities

V . Y . F. Tan, “Asymptotic estimates in information theo ry with non-vanishing error probabilities,” CoRR, vol. abs/1504.02608, 2015. [Online]. Available: http://arxiv.org/abs/1504.02608

work page internal anchor Pith review Pith/arXiv arXiv 2015
[14]

A general formula for channel cap acity,

S. V erdu and T. S. Han, “A general formula for channel cap acity,” IEEE Transactions on Information Theory , vol. 40, no. 4, pp. 1147–1157, 1994

work page 1994
[15]

Novel bounds on the capaci ty of the binary deletion channel,

D. Fertonani and T. M. Duman, “Novel bounds on the capaci ty of the binary deletion channel,” IEEE Transactions on Information Theory , vol. 56, no. 6, pp. 2753–2765, 2010

work page 2010
[16]

Upper bounds on the capacity of deletion channels using channel fragmentation,

M. Rahmati and T. M. Duman, “Upper bounds on the capacity of deletion channels using channel fragmentation,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 146–156, 2015

work page 2015
[17]

Bounds on the capacity of random insertion and dele tion-additive noise channels,

——, “Bounds on the capacity of random insertion and dele tion-additive noise channels,” IEEE Transactions on Information Theory , vol. 59, no. 9, pp. 5534–5546, 2013

work page 2013

[1] [1]

Three decades of na nopore sequencing,

D. Deamer, M. Akeson, and D. Branton, “Three decades of na nopore sequencing,” Nature biotechnology, vol. 34, pp. 518–524, 05 2016. 10 10−2 100 6 9 12 FER log2 of code size LO−CVB ι=0.05 AVB ι=0.05 LO−CVB ι=0.2 AVB ι=0.2 LO−CVB ι=0.5 AVB ι=0.5 insertion channel, m=12, n=1 Fig. 3: GA VB for channel I (ι) 12 . 0.7 0.8 0.9 1 1.1 10 20 50 100 200 500 1000 200...

work page 2016

[2] [2]

Next-generation dig ital information storage in DNA,

G. M. Church, Y . Gao, and S. Kosuri, “Next-generation dig ital information storage in DNA,” Science, vol. 337, no. 6102, p. 1628, 2012

work page 2012

[3] [3]

Towards practical, high-capacity, low-maintenance inf ormation storage in synthesized DNA,

N. Goldman, P . Bertone, S. Chen et al. , “Towards practical, high-capacity, low-maintenance inf ormation storage in synthesized DNA,” Nature, vol. 494, no. 7435, pp. 77–80, Feb 2013

work page 2013

[4] [4]

Low cost DNA data storage using photolithographic synthe sis and advanced information reconstruction and error corr ection,

P . L. Antkowiak et al. , “Low cost DNA data storage using photolithographic synthe sis and advanced information reconstruction and error corr ection,” Nature Commun. , vol. 11, no. 1, pp. 1–10, Dec 2020

work page 2020

[5] [5]

An end-to-end coding scheme fo r DNA-based data storage with nanopore-sequenced reads,

L. Welter, R. Sokolovskii, T. Heinis, A. Wachter-Zeh, E. Rosnes, and A. Graell i Amat, “An end-to-end coding scheme fo r DNA-based data storage with nanopore-sequenced reads,” 2024. [Online]. Availabl e: https://arxiv.org/abs/2406.12955

work page arXiv 2024

[6] [6]

Channel coding r ate in the ﬁnite blocklength regime,

Y . Polyanskiy, H. V . Poor, and S. V erdu, “Channel coding r ate in the ﬁnite blocklength regime,” IEEE Transactions on Information Theory , vol. 56, no. 5, pp. 2307–2359, 2010

work page 2010

[7] [7]

Approaching the norm al approximation of the ﬁnite blocklength capacity within 0 .025 db by short polar codes,

J. Piao, K. Niu, J. Dai, and C. Dong, “Approaching the norm al approximation of the ﬁnite blocklength capacity within 0 .025 db by short polar codes,” IEEE Wireless Communications Letters , vol. 9, no. 7, pp. 1089–1092, 2020

work page 2020

[8] [8]

Fini te blocklength performance bound for the DNA storage channe l,

I. Maarouf, G. Liva, E. Rosnes, and A. Graell i Amat, “Fini te blocklength performance bound for the DNA storage channe l,” in 2023 12th International Symposium on Topics in Coding (ISTC) , 2023, pp. 1–5

work page 2023

[9] [9]

C. E. Shannon, Lower Bounds to Error Probability for Coding on Discrete Mem oryless Channels. I . IEEE, 1993, pp. 385–423

work page 1993

[10] [10]

On the evaluation of the Polyanskiy-Poor– Verdú converse bound for ﬁnite block-length coding in AWGN,

T. Erseghe, “On the evaluation of the Polyanskiy-Poor– Verdú converse bound for ﬁnite block-length coding in AWGN, ” IEEE Transactions on Information Theory, vol. 61, no. 12, pp. 6578–6590, 2015

work page 2015

[11] [11]

The DNA storage channel: C apacity and error probability bounds,

N. Weinberger and N. Merhav, “The DNA storage channel: C apacity and error probability bounds,” IEEE Transactions on Information Theory , vol. 68, no. 9, pp. 5657–5700, 2022

work page 2022

[12] [12]

Information rates of the noisy nanopore channel,

B. McBain, E. Viterbo, and J. Saunderson, “Information rates of the noisy nanopore channel,” IEEE Transactions on Information Theory , vol. 70, no. 8, pp. 5640–5652, 2024

work page 2024

[13] [13]

Asymptotic Estimates in Information Theory with Non-Vanishing Error Probabilities

V . Y . F. Tan, “Asymptotic estimates in information theo ry with non-vanishing error probabilities,” CoRR, vol. abs/1504.02608, 2015. [Online]. Available: http://arxiv.org/abs/1504.02608

work page internal anchor Pith review Pith/arXiv arXiv 2015

[14] [14]

A general formula for channel cap acity,

S. V erdu and T. S. Han, “A general formula for channel cap acity,” IEEE Transactions on Information Theory , vol. 40, no. 4, pp. 1147–1157, 1994

work page 1994

[15] [15]

Novel bounds on the capaci ty of the binary deletion channel,

D. Fertonani and T. M. Duman, “Novel bounds on the capaci ty of the binary deletion channel,” IEEE Transactions on Information Theory , vol. 56, no. 6, pp. 2753–2765, 2010

work page 2010

[16] [16]

Upper bounds on the capacity of deletion channels using channel fragmentation,

M. Rahmati and T. M. Duman, “Upper bounds on the capacity of deletion channels using channel fragmentation,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 146–156, 2015

work page 2015

[17] [17]

Bounds on the capacity of random insertion and dele tion-additive noise channels,

——, “Bounds on the capacity of random insertion and dele tion-additive noise channels,” IEEE Transactions on Information Theory , vol. 59, no. 9, pp. 5534–5546, 2013

work page 2013