Capacity-Achieving Codes for Noisy Insertion Channels
Pith reviewed 2026-05-18 13:07 UTC · model grok-4.3
The pith
The coding capacity of a noisy insertion channel is determined and achieved by asymptotically optimal codes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We determine the coding capacity of the noisy channel and construct asymptotically optimal error-correcting codes achieving the coding capacity. The channel allows infinitely many insertions of complement or identical symbols and up to one insertion of a random symbol.
What carries the argument
The noisy insertion channel model that includes unlimited identical or complementary insertions and at most one random insertion, along with the capacity calculation and code constructions that achieve it.
If this is right
- The capacity value sets the maximum rate for reliable communication over this channel.
- Explicit codes can correct the described insertion errors while operating close to the capacity.
- As block lengths increase, the codes' performance approaches the theoretical limit.
- DNA storage systems can use these codes to improve data recovery reliability.
Where Pith is reading between the lines
- If the model holds, similar capacity results might apply to related channels with different noise levels.
- Testing these codes in actual DNA synthesis and sequencing experiments could validate the approach.
- Extensions to handle multiple random insertions could build on this foundation.
Load-bearing premise
The specific error model with infinitely many complement or identical insertions plus at most one random insertion correctly represents the main errors in DNA storage.
What would settle it
A calculation or proof showing that the capacity is different from what is claimed, or an experiment where codes exceed the stated capacity without error, would disprove the result.
read the original abstract
DNA storage has emerged as a promising solution for large-scale and long-term data preservation. Among various error types, insertions are the most frequent errors occurring in DNA sequences, where the inserted symbol is often identical or complementary to the original, and in practical implementations, noise can further cause the inserted symbol to mutate into a random one, which creates significant challenges to reliable data recovery. In this paper, we investigate a new noisy insertion channel, where infinitely many insertions of symbols complement or identical to the original ones and up to one insertion of random symbol may occur. We determine the coding capacity of the noisy channel and construct asymptotically optimal error-correcting codes achieving the coding capacity.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript investigates a noisy insertion channel for DNA storage, in which the channel permits arbitrarily many insertions of symbols identical or complementary to the original and at most one insertion of a random symbol. It claims to derive the coding capacity of this channel via a mutual-information characterization and to construct asymptotically optimal error-correcting codes that achieve the capacity through a random-coding argument.
Significance. If the derivations hold, the work supplies an explicit capacity formula and a matching achievability result for a channel model that captures dominant insertion statistics in DNA storage. The combination of a converse bound and random-coding construction is a standard and rigorous approach in information theory; the result would strengthen the theoretical toolkit for synchronization-error channels.
major comments (1)
- §4 (Capacity Derivation): the mutual-information expression for capacity must explicitly account for the normalization of output length under unbounded insertions; without a precise statement of the rate definition (e.g., bits per input symbol versus per output symbol), the claimed capacity value cannot be verified as load-bearing.
minor comments (3)
- Abstract: the channel model description is terse; adding one sentence that states the resulting capacity expression would improve accessibility.
- Notation: the symbols for identical, complementary, and random insertions should be introduced once in §2 and used consistently thereafter.
- References: several foundational papers on deletion/insertion channels (e.g., works by Dobrushin and on DNA-specific models) appear to be missing.
Simulated Author's Rebuttal
We thank the referee for the careful reading of our manuscript and for the constructive comment on the capacity derivation. We address the point below and will revise the manuscript accordingly to improve clarity.
read point-by-point responses
-
Referee: §4 (Capacity Derivation): the mutual-information expression for capacity must explicitly account for the normalization of output length under unbounded insertions; without a precise statement of the rate definition (e.g., bits per input symbol versus per output symbol), the claimed capacity value cannot be verified as load-bearing.
Authors: We agree that an explicit statement of the rate definition improves verifiability. In our derivation the coding capacity is defined in the standard information-theoretic sense as the supremum of achievable rates R where R = (1/n) log |C| (bits per input symbol) for block length n, such that there exist codes with vanishing error probability as n → ∞. The mutual-information characterization in Section 4 is obtained by maximizing I(X;Y) under the channel law that incorporates the (at most one) random insertion together with the insertions of identical or complementary symbols; the variable output length is handled asymptotically because the channel model ensures that the number of insertions is o(n) with high probability under the optimizing input distribution. Nevertheless, to address the referee’s concern directly we will add a dedicated paragraph at the beginning of Section 4 that (i) states the rate definition explicitly as bits per input symbol and (ii) explains how the normalization is preserved under the unbounded but probabilistically controlled insertions. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper explicitly defines the noisy insertion channel via transition probabilities (arbitrarily many identical/complement insertions plus at most one random-symbol insertion), then derives capacity as the maximum mutual information over input distributions and proves achievability via random coding that matches the converse. These steps follow standard information-theoretic definitions and proof techniques without reducing the claimed capacity or code optimality to fitted parameters, self-citations, or ansatzes imported from prior work by the same authors. The result is therefore independent of its own outputs.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We determine the coding capacity of the noisy channel and construct asymptotically optimal error-correcting codes achieving the coding capacity.
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
cap_noisy_q = log_q(q-2)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
and Schwartz, M.: On the reverse-complement sequence- duplication system
Ben-Tolila, E. and Schwartz, M.: On the reverse-complement sequence- duplication system. IEEE Trans. Inform. Theory 68, 7184–7197 (2022)
work page 2022
-
[2]
and Kosuri, S.: Next-generation digital information storage in DNA
Church, G.M., Gao, Y. and Kosuri, S.: Next-generation digital information storage in DNA. Science 337, 1628 (2012)
work page 2012
-
[3]
and Bruck, J.: The capacity of sequence-duplication systems
Farnoud, F., Schwartz, M. and Bruck, J.: The capacity of sequence-duplication systems. IEEE Trans. Inform. Theory 62, 811–824 (2016)
work page 2016
-
[4]
and Bruck, J.: Estimation of duplication history under a stochastic model for tandem repeats
Farnoud, F., Schwartz, M. and Bruck, J.: Estimation of duplication history under a stochastic model for tandem repeats. BMC Bioinformatics 20, 1–11 (2019) 15
work page 2019
-
[5]
Goldman, N., Bertone, P., Chen, S., Dessimoz, C., LeProust, E.M., Sipos, B. and Birney, E.: Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature 494, 77–80 (2013)
work page 2013
-
[6]
and Vorobyev, I.: Codes Correcting Long Duplica- tion Errors
Goshkoder, D., Polyanskii, N. and Vorobyev, I.: Codes Correcting Long Duplica- tion Errors. IEEE Trans. Molecular, Biological, Multi-Scale Commun. 10, 272–288 (2024)
work page 2024
-
[7]
and Sala, F.: Codes correcting two deletions
Gabrys, R. and Sala, F.: Codes correcting two deletions. IEEE Trans. Inform. Theory 6, 965-974 (2018)
work page 2018
-
[8]
and Ferreira, H.C.: On multiple insertion/deletion correcting codes
Helberg, A.S. and Ferreira, H.C.: On multiple insertion/deletion correcting codes. IEEE Trans. Inform. Theory 48, 305-308 (2002)
work page 2002
-
[9]
and Bruck, J.: Capacity and expressiveness of genomic tandem duplication
Jain, S., Farnoud, F. and Bruck, J.: Capacity and expressiveness of genomic tandem duplication. IEEE Trans. Inform. Theory 63, 6129–6138 (2017)
work page 2017
-
[10]
and Bruck, J.: Duplication-correcting codes for data storage in the DNA of living organisms
Jain, S., Farnoud, F., Schwartz, M. and Bruck, J.: Duplication-correcting codes for data storage in the DNA of living organisms. IEEE Trans. Inform. Theory 63, 4996–5010 (2017)
work page 2017
-
[11]
and De Figueiredo, P.: DNA watermarking of infectious agents: Progress and prospects
Jupiter, D.C., Ficht, T.A., Samuel, J., Qin, Q.M. and De Figueiredo, P.: DNA watermarking of infectious agents: Progress and prospects. PLoS pathogens, 6(6), p.e1000950. (2010)
work page 2010
-
[12]
Kovaˇ cevi´ c, M.: Zero-error capacity of duplication channels. IEEE Trans. Com- mun. 67, 6735–6742 (2019)
work page 2019
-
[13]
Soviet Physics Doklady 10, 707–710 (1966)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10, 707–710 (1966)
work page 1966
-
[14]
and Yaakobi, E.: Duplication-correcting codes
Lenz, A., Wachter-Zeh, A. and Yaakobi, E.: Duplication-correcting codes. Des. Codes Cryptogr. 87, 277–298 (2019)
work page 2019
-
[15]
and Xing, C.: Explicit construction ofq-ary 2-deletion correcting codes with low redundancy
Liu, S., Tjuawinata, I. and Xing, C.: Explicit construction ofq-ary 2-deletion correcting codes with low redundancy. IEEE Trans. Inform. Theory, 70, 4093-4101 (2024)
work page 2024
-
[16]
and Zhang, Y.:t-deletion-s-insertion-burst correcting codes
Lu, Z. and Zhang, Y.:t-deletion-s-insertion-burst correcting codes. IEEE Trans. Inform. Theory 69, 6401–6413 (2023)
work page 2023
-
[17]
Mundy, N.I. and Helbig, A.J.: Origin and evolution of tandem repeats in the mitochondrial DNA control region of shrikes. J. Molecular Evolution 59, 250–257 (2004)
work page 2004
-
[18]
Nguyen T.T., Cai K., Song W., Immink K.A.S.: Optimal single chromosome- inversion correcting codes for data storage in live DNA. In: Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT2022), Espoo, 16 Finland, pp. 1791–1796 (2022)
work page 2022
-
[19]
Pumpernik, D., Oblak, B. and Boratnik, B.: Replication slippage versus point mutation rates in short tandem repeats of the human genome, Mol. Genet. Genomics 279, 53–61 (2008)
work page 2008
-
[20]
Reinsel, D., Rydning, J., and Gantz, J.: Worldwide global datasphere forecast, 2020–2024: The covid-19 data bump and the future of data growth. Int. Data Corp.(IDC) (2020)
work page 2020
-
[21]
Saeki, T. and Nozaki, T.: An improvement of non-binary code correcting single b-burst of insertions or deletions. In: Proc. Int. Symp. Inform. Theory Appl. (ISITA), 6–10 (2018)
work page 2018
-
[22]
and Yaakobi, E.: Codes correcting a burst of deletions or insertions
Schoeny, C., Wachter-Zeh, A., Gabrys, R. and Yaakobi, E.: Codes correcting a burst of deletions or insertions. IEEE Trans. Inform. Theory 63, 1971–1985 (2017)
work page 1971
-
[23]
In 2017 51st Asilomar Conference on Signals, Systems, and Computers 511-515 (2017)
Schoeny, C., Sala, F., Dolecek, L.: Novel combinatorial coding results for DNA sequencing and data storage. In 2017 51st Asilomar Conference on Signals, Systems, and Computers 511-515 (2017)
work page 2017
-
[24]
and Bruck, J.: Error Correction for DNA Storage
Sima, J., Raviv, N., Schwartz, M. and Bruck, J.: Error Correction for DNA Storage. IEEE BITS Inform. Theory Mag. 3, 78–94 (2023)
work page 2023
-
[25]
and Bruck, J.: Two deletion correcting codes from indicator vectors
Sima, J., Raviv, N. and Bruck, J.: Two deletion correcting codes from indicator vectors. IEEE trans. Inform. Theory 66, 2375-2391 (2019)
work page 2019
-
[26]
Shipman, S.L., Nivala, J., Macklis, J.D. and Church, G.M.: CRISPR-Cas encoding of digital movie into the genomes of a population of living bacteria. Nature 547, 345–349 (2017)
work page 2017
-
[27]
and Cai, K.: Non-binary two-deletion correcting codes and burst- deletion correcting codes
Song, W. and Cai, K.: Non-binary two-deletion correcting codes and burst- deletion correcting codes. IEEE Trans. Inform. Theory 69, 6470-6484 (2023)
work page 2023
-
[28]
and Ge, G.: Asymptotically Optimal Codes for (t, s)- Burst Error
Sun, Y., Lu, Z., Zhang, Y. and Ge, G.: Asymptotically Optimal Codes for (t, s)- Burst Error. IEEE Trans. Inform. Theory 71, 1570-1584 (2025)
work page 2025
-
[29]
and Farnoud, F.: Error-correcting codes for short tandem duplication and edit errors
Tang, Y. and Farnoud, F.: Error-correcting codes for short tandem duplication and edit errors. IEEE Trans. Inform. Theory 68, 871–880 (2021)
work page 2021
-
[30]
and Farnoud, F.: Low-redundancy codes for correcting multiple short-duplication and edit errors
Tang, Y., Wang, S., Lou, H., Gabrys, R. and Farnoud, F.: Low-redundancy codes for correcting multiple short-duplication and edit errors. IEEE Trans. Inform. Theory 69, 2940–2954 (2023)
work page 2023
-
[31]
and Farnoud, F.: Single-error detection and correction for duplication and substitution channels
Tang, Y., Yehezkeally, Y., Schwartz, M. and Farnoud, F.: Single-error detection and correction for duplication and substitution channels. IEEE Trans. Inform. Theory 66, 6908–6919 (2020) 17
work page 2020
-
[32]
and Farnoud, F.: Error-correcting codes for noisy duplication channels
Tang, Y. and Farnoud, F.: Error-correcting codes for noisy duplication channels. IEEE Trans. Inform. Theory 67, 3452-3463 (2021)
work page 2021
-
[33]
and Farnoud, F.: Error-correcting codes for short tandem duplications and at mostpsubstitutions
Tang, Y., Lou, H. and Farnoud, F.: Error-correcting codes for short tandem duplications and at mostpsubstitutions. In IEEE International Symposium on Information Theory. (ISIT), 1835-1840 (2021)
work page 2021
-
[34]
Tenengolts, G.,: Nonbinary codes correcting single deletion or insertion. IEEE Trans. Inform. Theory 30, 766–769 (1984)
work page 1984
-
[35]
and Varshamov, R.: A code that corrects single unsymmetric errors
Tenengolts, G. and Varshamov, R.: A code that corrects single unsymmetric errors. Avtomatika Telemekhanika 26, 288–292 (1965)
work page 1965
-
[36]
and Farnoud, F.: Non-binary codes for correcting a burst of at mosttdeletions
Wang, S., Tang, Y., Sima, J., Gabrys, R. and Farnoud, F.: Non-binary codes for correcting a burst of at mosttdeletions. IEEE Trans. Inform. Theory 70, 964-979 (2023)
work page 2023
-
[37]
and Schwartz, M.: On duplication-free codes for disjoint or equal-length errors
Yu, W. and Schwartz, M.: On duplication-free codes for disjoint or equal-length errors. Des. Codes Cryptogr. 92, 2845–2861 (2024)
work page 2024
-
[38]
Yohananov, L. and Schwartz, M.: On the coding capacity of reverse-complement and palindromic duplication-correcting codes. Des. Codes Cryptogr. 93, 3283- 3302 (2025)
work page 2025
-
[39]
and Gulliver, T.A.: Construction of tandem duplication correcting codes
Zeraatpisheh, M., Esmaeili, M. and Gulliver, T.A.: Construction of tandem duplication correcting codes. IET Commun. 13, 2217–2225 (2019)
work page 2019
-
[40]
and Gulliver, T.A.: Construction of duplication correcting codes
Zeraatpisheh, M., Esmaeili, M. and Gulliver, T.A.: Construction of duplication correcting codes. IEEE Access 8, 96150–96161 (2020) 18
work page 2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.