pith. sign in

arxiv: 2504.20961 · v4 · submitted 2025-04-29 · 💻 cs.IT · math.IT

Simple Finite-Length Achievability and Converse Bounds for the Deletion Channel and the Insertion Channel

Pith reviewed 2026-05-22 18:21 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords deletion channelinsertion channelfinite block lengthconverse boundachievability boundcode sizereference output distributionbinary erasure channel
0
0 comments X

The pith

A specific reference output distribution yields tighter finite-length upper bounds on code size for deletion and insertion channels than the binary erasure channel.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes upper bounds on the maximum size of codes that can be used over deletion and insertion channels when the block length is fixed and a target error probability is set. It achieves this by adapting a general converse bound with a carefully chosen reference distribution over possible outputs. This choice leads to a simple formula that is easier to compute than the general version. For the deletion channel the resulting bound is shown to be stricter than the one obtained by treating the channel as a binary erasure channel. Similar bounds are derived for the insertion channel and an algorithm is given to compute matching achievability results.

Core claim

For a general discrete channel a reference output distribution can be selected so that the general converse bound simplifies to a closed-form expression; when this expression is evaluated on the deletion channel with finite block length it produces an upper bound on code size that is tighter than the binary-erasure-channel bound for the same length and frame-error target.

What carries the argument

A reference output distribution chosen for any finite-input finite-output channel that turns the general converse into a simple, explicit upper bound on code size.

If this is right

  • The upper bound applies directly to both i.i.d. deletion and insertion channels.
  • It is computable without optimizing over distributions or using data-dependent tuning.
  • For deletion channels the bound is strictly tighter than the only previously available finite-length converse based on the binary erasure channel.
  • An exponential-time algorithm computes a corresponding achievability bound for any discrete-input discrete-output channel.
  • The same reference-distribution technique can be reused for other channels with finite alphabets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These bounds could serve as quick sanity checks when designing practical deletion-correcting codes for short packets.
  • Extending the reference distribution to channels with memory or continuous alphabets might produce analogous finite-length results.
  • Comparing the bound against known asymptotic capacity expressions for large block lengths would test consistency.
  • Implementing the algorithm for small block lengths could reveal how close the converse and achievability bounds sit for deletion channels.

Load-bearing premise

The chosen reference output distribution produces a bound that remains valid and reasonably tight for the specific deletion and insertion channels without requiring further optimization or data-dependent tuning.

What would settle it

For small block lengths, compute the actual largest code size that achieves the target frame error probability over the deletion channel and check whether it exceeds the new upper bound.

Figures

Figures reproduced from arXiv: 2504.20961 by Ruslan Morozov, Tolga Mete Duman.

Figure 1
Figure 1. Figure 1: The optimal code peformance, the GAVB and the LO-CVB fo [PITH_FULL_IMAGE:figures/full_fig_p009_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The LO-CVB for the deletion channel (D (δ) m ) n versus the BEC bound and the normal approximation. The x-coordinate is equal to the total number of input bits N = mn. VII. CONCLUSIONS In this paper, we provide an upper bound on the code size for the deletion (insertion) channel with a given deletion (insertion) probability, input length, and target frame error rate. This is done by providing a reference o… view at source ↗
Figure 3
Figure 3. Figure 3: GAVB for channel I (ι) 12 . 0.7 0.8 0.9 1 1.1 10 20 50 100 200 500 1000 2000 4000 code rate N=m*n LO−CVB, fer=1e−7 N. app., m=12, fer=1e−7 LO−CVB, fer=1e−5 N. app., m=12, fer=1e−5 LO−CVB, fer=1e−3 N. app., m=12, fer=1e−3 LO−CVB, fer=0.01 N. app., m=12, fer=0.01 LO−CVB, fer=0.1 N. app., m=12, fer=0.1 capacity lower bound capacity upper bound, m=12 insertion channel, ι = 0.001 (a) ι = 0.001 0.1 0.2 0.3 0.4 0… view at source ↗
Figure 4
Figure 4. Figure 4: The LO-CVB for the insertion channel (I (ι) m ) n versus the normal approximation and the lower capacity bound [17]. The x-coordinate is equal to the total number of input bits N = mn [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
read the original abstract

We develop upper bounds on code size for an independent and identically distributed deletion and insertion channels for a given code length and target frame error probability. The bounds are obtained as a variation of a general converse bound, which, though available for any channel, is inefficient and not easily computable without a good reference distribution over the output alphabet. We obtain a reference output distribution for a general finite-input finite-output channel and provide a simple formula for the converse bound on the capacity employing this distribution. We then evaluate the bound for the deletion channel with a finite block length, and show that the resulting upper bound on the code size is tighter than that for a binary erasure channel, which is the only alternative converse bound for the finite-length setting. We also provide similar results for the insertion channel. Furthermore, we present a simple algorithm for computing an achievability bound for a general discrete-input discrete-output channel. Although the algorithm has exponential complexity, it is useful for comparison purposes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops finite-length upper bounds on code size for i.i.d. deletion and insertion channels by applying a variation of a general converse bound that relies on an explicit reference output distribution over the (variable-length) output alphabet. It derives a simple closed-form expression for the resulting converse and evaluates it numerically to claim a strictly tighter bound than the binary erasure channel alternative; it also supplies an exponential-time algorithm for computing matching achievability bounds on general discrete channels.

Significance. If the reference distribution is shown to be valid for variable-length outputs, the work supplies the first explicit, non-optimized finite-length converse for deletion/insertion channels that improves on the only previously available bound (BEC). The explicit reference distribution and the simple achievability algorithm constitute practical, reproducible tools that enable direct numerical comparison without parameter tuning.

major comments (2)
  1. [Section describing the reference distribution and its application to deletion/insertion channels] The central claim that the derived upper bound is strictly tighter than the BEC bound rests on the reference output distribution remaining valid once the output alphabet is the union of {0,1}^k for k = 0 … n. The manuscript must explicitly construct this distribution over the variable-length space and verify that it dominates the true output measure induced by the capacity-achieving input (or at least that the resulting information-spectrum bound stays valid).
  2. [Evaluation section for the deletion channel] The abstract states that the bound is obtained 'as a variation of a general converse bound' and then 'evaluated' for the deletion channel. The manuscript should include the precise statement of the general converse (including the exact divergence or information-spectrum quantity used) together with the numerical values of the resulting bound versus the BEC bound for at least one block length and error probability, so that the claimed improvement can be reproduced.
minor comments (2)
  1. The achievability algorithm is described as having 'exponential complexity' yet 'useful for comparison purposes.' Adding a short pseudocode listing or a complexity table for small n would improve clarity.
  2. Notation for the output alphabet of the deletion channel (variable-length binary strings) should be introduced once and used consistently when defining the reference distribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and insightful comments on our manuscript. We address each major comment below in a point-by-point manner and indicate planned revisions to strengthen the presentation and reproducibility of the results.

read point-by-point responses
  1. Referee: [Section describing the reference distribution and its application to deletion/insertion channels] The central claim that the derived upper bound is strictly tighter than the BEC bound rests on the reference output distribution remaining valid once the output alphabet is the union of {0,1}^k for k = 0 … n. The manuscript must explicitly construct this distribution over the variable-length space and verify that it dominates the true output measure induced by the capacity-achieving input (or at least that the resulting information-spectrum bound stays valid).

    Authors: We agree that an explicit construction over the variable-length output space is essential for rigor. In the revised manuscript we will insert a dedicated subsection that defines the reference distribution explicitly as the product measure induced by independent Bernoulli(1/2) symbols for each possible output length k = 0 to n, normalized appropriately to account for the deletion probability. We will then prove that this distribution satisfies the required dominance property with respect to the true output measure under the uniform input distribution, thereby ensuring the validity of the information-spectrum converse bound. This construction follows directly from the general finite-input finite-output reference distribution derived earlier in the paper but is specialized to the union of binary strings of length at most n. revision: yes

  2. Referee: [Evaluation section for the deletion channel] The abstract states that the bound is obtained 'as a variation of a general converse bound' and then 'evaluated' for the deletion channel. The manuscript should include the precise statement of the general converse (including the exact divergence or information-spectrum quantity used) together with the numerical values of the resulting bound versus the BEC bound for at least one block length and error probability, so that the claimed improvement can be reproduced.

    Authors: We concur that the current presentation would benefit from greater explicitness. We will revise the manuscript to include the precise statement of the general converse bound, specifying that the upper bound on the code size is given by the information-spectrum quantity involving the divergence between the induced output distribution and the chosen reference distribution. In addition, we will add a table in the evaluation section that reports the numerical values of our bound and the BEC bound for at least one concrete block length (e.g., n = 20) and target error probability (e.g., 10^{-3}), allowing direct reproduction of the claimed improvement. revision: yes

Circularity Check

0 steps flagged

No significant circularity; bounds derived from general converse with independently obtained reference distribution

full rationale

The paper presents a variation of a general converse bound for finite-length deletion and insertion channels by first obtaining a reference output distribution for any finite-input finite-output channel and then evaluating the resulting code-size upper bound. This reference distribution and the simple formula for the converse are derived as a general tool rather than fitted to deletion/insertion data or reduced to a self-citation chain. The claimed tightness is shown by direct comparison to the external binary erasure channel bound, with no equations or steps in the provided text reducing the target result to its own inputs by construction. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work relies on standard information-theoretic axioms such as the definition of mutual information and the general converse bound; no free parameters or invented entities are introduced beyond the reference distribution choice.

axioms (1)
  • standard math Existence of a general converse bound applicable to any channel
    Invoked to obtain the variation used for deletion and insertion channels

pith-pipeline@v0.9.0 · 5701 in / 1162 out tokens · 59755 ms · 2026-05-22T18:21:22.044737+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 1 internal anchor

  1. [1]

    Three decades of na nopore sequencing,

    D. Deamer, M. Akeson, and D. Branton, “Three decades of na nopore sequencing,” Nature biotechnology, vol. 34, pp. 518–524, 05 2016. 10 10−2 100 6 9 12 FER log2 of code size LO−CVB ι=0.05 AVB ι=0.05 LO−CVB ι=0.2 AVB ι=0.2 LO−CVB ι=0.5 AVB ι=0.5 insertion channel, m=12, n=1 Fig. 3: GA VB for channel I (ι) 12 . 0.7 0.8 0.9 1 1.1 10 20 50 100 200 500 1000 200...

  2. [2]

    Next-generation dig ital information storage in DNA,

    G. M. Church, Y . Gao, and S. Kosuri, “Next-generation dig ital information storage in DNA,” Science, vol. 337, no. 6102, p. 1628, 2012

  3. [3]

    Towards practical, high-capacity, low-maintenance inf ormation storage in synthesized DNA,

    N. Goldman, P . Bertone, S. Chen et al. , “Towards practical, high-capacity, low-maintenance inf ormation storage in synthesized DNA,” Nature, vol. 494, no. 7435, pp. 77–80, Feb 2013

  4. [4]

    Low cost DNA data storage using photolithographic synthe sis and advanced information reconstruction and error corr ection,

    P . L. Antkowiak et al. , “Low cost DNA data storage using photolithographic synthe sis and advanced information reconstruction and error corr ection,” Nature Commun. , vol. 11, no. 1, pp. 1–10, Dec 2020

  5. [5]

    An end-to-end coding scheme fo r DNA-based data storage with nanopore-sequenced reads,

    L. Welter, R. Sokolovskii, T. Heinis, A. Wachter-Zeh, E. Rosnes, and A. Graell i Amat, “An end-to-end coding scheme fo r DNA-based data storage with nanopore-sequenced reads,” 2024. [Online]. Availabl e: https://arxiv.org/abs/2406.12955

  6. [6]

    Channel coding r ate in the finite blocklength regime,

    Y . Polyanskiy, H. V . Poor, and S. V erdu, “Channel coding r ate in the finite blocklength regime,” IEEE Transactions on Information Theory , vol. 56, no. 5, pp. 2307–2359, 2010

  7. [7]

    Approaching the norm al approximation of the finite blocklength capacity within 0 .025 db by short polar codes,

    J. Piao, K. Niu, J. Dai, and C. Dong, “Approaching the norm al approximation of the finite blocklength capacity within 0 .025 db by short polar codes,” IEEE Wireless Communications Letters , vol. 9, no. 7, pp. 1089–1092, 2020

  8. [8]

    Fini te blocklength performance bound for the DNA storage channe l,

    I. Maarouf, G. Liva, E. Rosnes, and A. Graell i Amat, “Fini te blocklength performance bound for the DNA storage channe l,” in 2023 12th International Symposium on Topics in Coding (ISTC) , 2023, pp. 1–5

  9. [9]

    C. E. Shannon, Lower Bounds to Error Probability for Coding on Discrete Mem oryless Channels. I . IEEE, 1993, pp. 385–423

  10. [10]

    On the evaluation of the Polyanskiy-Poor– Verdú converse bound for finite block-length coding in AWGN,

    T. Erseghe, “On the evaluation of the Polyanskiy-Poor– Verdú converse bound for finite block-length coding in AWGN, ” IEEE Transactions on Information Theory, vol. 61, no. 12, pp. 6578–6590, 2015

  11. [11]

    The DNA storage channel: C apacity and error probability bounds,

    N. Weinberger and N. Merhav, “The DNA storage channel: C apacity and error probability bounds,” IEEE Transactions on Information Theory , vol. 68, no. 9, pp. 5657–5700, 2022

  12. [12]

    Information rates of the noisy nanopore channel,

    B. McBain, E. Viterbo, and J. Saunderson, “Information rates of the noisy nanopore channel,” IEEE Transactions on Information Theory , vol. 70, no. 8, pp. 5640–5652, 2024

  13. [13]

    Asymptotic Estimates in Information Theory with Non-Vanishing Error Probabilities

    V . Y . F. Tan, “Asymptotic estimates in information theo ry with non-vanishing error probabilities,” CoRR, vol. abs/1504.02608, 2015. [Online]. Available: http://arxiv.org/abs/1504.02608

  14. [14]

    A general formula for channel cap acity,

    S. V erdu and T. S. Han, “A general formula for channel cap acity,” IEEE Transactions on Information Theory , vol. 40, no. 4, pp. 1147–1157, 1994

  15. [15]

    Novel bounds on the capaci ty of the binary deletion channel,

    D. Fertonani and T. M. Duman, “Novel bounds on the capaci ty of the binary deletion channel,” IEEE Transactions on Information Theory , vol. 56, no. 6, pp. 2753–2765, 2010

  16. [16]

    Upper bounds on the capacity of deletion channels using channel fragmentation,

    M. Rahmati and T. M. Duman, “Upper bounds on the capacity of deletion channels using channel fragmentation,” IEEE Transactions on Information Theory, vol. 61, no. 1, pp. 146–156, 2015

  17. [17]

    Bounds on the capacity of random insertion and dele tion-additive noise channels,

    ——, “Bounds on the capacity of random insertion and dele tion-additive noise channels,” IEEE Transactions on Information Theory , vol. 59, no. 9, pp. 5534–5546, 2013