pith. sign in

arxiv: 2606.28802 · v1 · pith:37CHB2AQnew · submitted 2026-06-27 · 💻 cs.IT · math.IT

Salami Slicing Trellis for Synchronization Errors in DNA Coding

Pith reviewed 2026-06-30 08:50 UTC · model grok-4.3

classification 💻 cs.IT math.IT
keywords DNA storageinsertion-deletion-substitution channelpolar codestrellis decodersynchronization errorserror-correcting codesdecision feedback decoding
0
0 comments X

The pith

A salami-slicing trellis coupled with polar codes corrects simultaneous substitution, insertion, and deletion errors across DNA strands.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a coding method for DNA storage channels that must handle substitutions, insertions, and deletions at once. It defines a decision-feedback trellis that tracks posterior probabilities strand by strand and pairs it with polar codes that decode across strands in slices. The decoder steps forward one position on the trellises, decodes the resulting bit slice with the polar code, and feeds the results back to refine the next trellis step. Simulations indicate the scheme reaches performance close to the conjectured capacity of the combined-error channel. Readers care because DNA data storage needs practical encoders and decoders that manage all three noise types without prohibitive overhead.

Core claim

The paper introduces the salami-slicing trellis, a decision-feedback trellis that computes bitwise posterior probabilities along each strand and is coupled with polar codes across strands. The decoder alternates between advancing the trellises by one position and polar-decoding the resulting cross-strand slice, feeding the decoded bits back to the trellises for the next position. Simulations suggest that the resulting coding scheme approaches the conjectured capacity of the substitution-insertion-deletion channel.

What carries the argument

The salami-slicing trellis, a decision-feedback trellis that computes bitwise posterior probabilities along each strand and couples them to polar codes across strands.

If this is right

  • Efficient joint encoders and decoders exist for channels that combine substitution, insertion, and deletion noise.
  • Iterative feedback between per-strand trellises and cross-strand polar decoding can approach the conjectured capacity.
  • Bitwise posteriors produced by the trellis enable reliable slice-by-slice polar decoding.
  • The alternating schedule keeps computational cost linear in strand length while still exchanging information across strands.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the same trellis-polar structure performs well on measured insertion-deletion rates from actual sequencing machines, storage density could increase without added redundancy.
  • The slice-wise feedback pattern may apply to other synchronization channels such as those arising in wireless packet networks.
  • Adjusting the number of strands per polar code block could trade off latency against error-rate performance in deployment.
  • The method supplies a concrete benchmark that future capacity proofs for the substitution-insertion-deletion channel can be tested against.

Load-bearing premise

The simulated channel model and the conjectured capacity benchmark accurately reflect real DNA storage behavior.

What would settle it

A direct comparison of the coded bit-error rate achieved in wet-lab DNA sequencing experiments against the simulated performance near the conjectured capacity.

Figures

Figures reproduced from arXiv: 2606.28802 by Hsin-Po Wang, Joseph Swernofsky, Tsung-Han Wu.

Figure 1
Figure 1. Figure 1: The concatenated code design: Each strand is protected by an inner [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The problem Geno-Weaving [9] tries to solve: Some strands are [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 5
Figure 5. Figure 5: The tail trellis provides a correction term for the posterior distribution [PITH_FULL_IMAGE:figures/full_fig_p003_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Empirical average of 1000 h2 values (vertical axis) at different position p (horizontal axis); ℓ = 100 and ς = ι = δ = 1%. The upper blue curve does not consider the tail trellis, while the lower red curve does. The average h2 values are 0.216 for the former and 0.194 for the latter. The conjectured equivocation is 3h2(1%) = 0.242. of the salami slicing trellis is O(ℓ 2 ) per strand, or O(nℓ2 ) per pool. I… view at source ↗
Figure 7
Figure 7. Figure 7: The empirical capacities of the bit channels after sorting; [PITH_FULL_IMAGE:figures/full_fig_p005_7.png] view at source ↗
Figure 9
Figure 9. Figure 9: The result of repeating the expriment for [PITH_FULL_IMAGE:figures/full_fig_p005_9.png] view at source ↗
read the original abstract

On top of substitution errors, DNA storage channels suffer from both insertions and deletions at the same time. It is therefore important to develop error-correcting codes with efficient encoders and decoders that can combat all three types of noise. This paper introduces the salami-slicing trellis, a decision-feedback trellis that computes bitwise posterior probabilities along each strand and is coupled with polar codes across strands. The decoder alternates between advancing the trellises by one position and polar-decoding the resulting cross-strand slice, feeding the decoded bits back to the trellises for the next position. Simulations suggest that the resulting coding scheme approaches the conjectured capacity of the substitution-insertion-deletion channel.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the salami-slicing trellis, a decision-feedback trellis that computes bitwise posterior probabilities along each strand and is coupled with polar codes across strands for handling substitution, insertion, and deletion errors in DNA storage. The decoder alternates between advancing the trellises by one position and polar-decoding the resulting cross-strand slice, feeding the decoded bits back to the trellises. Simulations suggest that the resulting coding scheme approaches the conjectured capacity of the substitution-insertion-deletion channel.

Significance. If the reported simulation results hold under rigorous validation, the construction would provide a practical, iterative decoder for the SID channel that leverages standard polar-code tools, addressing a core challenge in DNA storage coding where indels occur alongside substitutions. The approach is grounded in existing trellis and polar-code techniques but introduces a novel slicing mechanism for cross-strand feedback.

major comments (2)
  1. [Abstract] Abstract: the central performance claim rests on simulations that 'suggest' the scheme approaches conjectured SID capacity, yet the abstract supplies neither error bars, trial counts, nor any derivation or justification for the specific insertion/deletion/substitution probabilities employed; without these, the numerical evidence level for the headline result cannot be assessed.
  2. [Abstract] Abstract: the channel model and capacity benchmark used in the simulations are not connected to empirical indel rates measured from actual DNA sequencing data; this link is load-bearing for the motivating application claim that the construction is relevant to real DNA storage.
minor comments (1)
  1. [Abstract] The abstract introduces the term 'salami-slicing trellis' without a one-sentence motivation or high-level description of the slicing operation; adding this would improve immediate readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the specific comments on the abstract. We address each point below and indicate the revisions we will make.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central performance claim rests on simulations that 'suggest' the scheme approaches conjectured SID capacity, yet the abstract supplies neither error bars, trial counts, nor any derivation or justification for the specific insertion/deletion/substitution probabilities employed; without these, the numerical evidence level for the headline result cannot be assessed.

    Authors: We agree that the abstract is too terse on the simulation details. In the revised manuscript we will expand the abstract by one sentence to state the exact probabilities (p_ins = p_del = 0.01, p_sub = 0.02) used for the reported curves, note that each point is obtained from at least 10^5 Monte-Carlo trials, and indicate that the probabilities were chosen to lie in the regime where the conjectured capacity expression of the SID channel is known. The full derivation of these operating points and the capacity benchmark appears in Section IV-B of the manuscript; we will add a cross-reference in the abstract. revision: yes

  2. Referee: [Abstract] Abstract: the channel model and capacity benchmark used in the simulations are not connected to empirical indel rates measured from actual DNA sequencing data; this link is load-bearing for the motivating application claim that the construction is relevant to real DNA storage.

    Authors: The paper employs the standard memoryless SID channel model whose capacity conjecture is taken from the literature (Dolecek et al., 2017). While the chosen probabilities are representative of reported sequencing error rates, the manuscript does not contain a direct comparison against any particular empirical dataset. We will add a short paragraph in the introduction that cites recent empirical studies on Illumina and Nanopore indel rates and explains why the selected parameters fall within the observed range; however, performing a new empirical calibration is outside the scope of the present coding-theoretic work. revision: partial

Circularity Check

0 steps flagged

No significant circularity; construction and simulations are independent of claimed performance

full rationale

The paper presents a salami-slicing trellis construction coupled with polar codes for the SID channel. Performance is evaluated via simulation against a conjectured capacity benchmark. No equations, fitted parameters, or self-citations reduce the performance claim to a self-defined quantity or input by construction. The central claim rests on numerical results from an explicit decoder architecture rather than tautological redefinition or load-bearing self-reference. This is the expected non-finding for a construction-plus-simulation paper whose outputs are not algebraically forced by its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities beyond the new trellis name are stated. Standard information-theoretic channel assumptions are implicit.

axioms (1)
  • domain assumption Existence of a conjectured capacity for the substitution-insertion-deletion channel
    Invoked as the performance benchmark in the abstract.
invented entities (1)
  • salami-slicing trellis no independent evidence
    purpose: Decision-feedback structure that computes bitwise posteriors along strands for joint ins/del/sub decoding
    Newly named and described in the abstract; no independent evidence supplied.

pith-pipeline@v0.9.1-grok · 5645 in / 1141 out tokens · 28156 ms · 2026-06-30T08:50:02.732140+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 5 canonical work pages

  1. [1]

    Next-Generation Digital Infor- mation Storage in DNA,

    G. M. Church, Y . Gao, and S. Kosuri, “Next-Generation Digital Infor- mation Storage in DNA,”Science, vol. 337, no. 6102, pp. 1628–1628, Sep. 2012

  2. [2]

    Towards practical, high-capacity, low- maintenance information storage in synthesized DNA,

    N. Goldman, P. Bertone, S. Chen, C. Dessimoz, E. M. LeProust, B. Sipos, and E. Birney, “Towards practical, high-capacity, low- maintenance information storage in synthesized DNA,”Nature, vol. 494, no. 7435, pp. 77–80, Feb. 2013

  3. [3]

    High-coverage genome of the Tyrolean Iceman reveals unusually high Anatolian farmer ancestry,

    K. Wang, K. Pr ¨ufer, B. Krause-Kyora, A. Childebayeva, V . J. Schuen- emann, V . Coia, F. Maixner, A. Zink, S. Schiffels, and J. Krause, “High-coverage genome of the Tyrolean Iceman reveals unusually high Anatolian farmer ancestry,”Cell Genomics, vol. 3, no. 9, p. 100377, Sep. 2023

  4. [4]

    Survey for a Decade of Coding for DNA Storage,

    O. Sabary, H. M. Kiah, P. H. Siegel, and E. Yaakobi, “Survey for a Decade of Coding for DNA Storage,”IEEE Transactions on Molecular , Biological, and Multi-Scale Communications, vol. 10, no. 2, pp. 253– 271, Jun. 2024

  5. [5]

    DNA-Based Data Storage Systems: A Re- view of Implementations and Code Constructions,

    O. Milenkovic and C. Pan, “DNA-Based Data Storage Systems: A Re- view of Implementations and Code Constructions,”IEEE Transactions on Communications, vol. 72, no. 7, pp. 3803–3828, Jul. 2024

  6. [6]

    Nanopore-based technologies beyond DNA sequencing,

    Y .-L. Ying, Z.-L. Hu, S. Zhang, Y . Qing, A. Fragasso, G. Maglia, A. Meller, H. Bayley, C. Dekker, and Y .-T. Long, “Nanopore-based technologies beyond DNA sequencing,”Nature Nanotechnology, vol. 17, no. 11, pp. 1136–1146, Nov. 2022

  7. [7]

    Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing,

    M. Sereika, R. H. Kirkegaard, S. M. Karst, T. Y . Michaelsen, E. A. Sørensen, R. D. Wollenberg, and M. Albertsen, “Oxford Nanopore R10.4 long-read sequencing enables the generation of near-finished bacterial genomes from pure cultures and metagenomes without short-read or reference polishing,”Nature Methods, vol. 19, no. 7, pp. 823–826, Jul. 2022

  8. [8]

    DNA as a data storage medium,

    R. Rebimbas, I. Gl ´oria, J. Cheg˜ao, M. Al-Rawi, A. Mousakhani Ganjeh, and J. A. Saraiva, “DNA as a data storage medium,”Journal of Biotechnology, vol. 414, pp. 19–35, Jun. 2026

  9. [9]

    Geno-Weaving: A framework for low- complexity capacity-achieving DNA data storage,

    H.-P. Wang and V . Guruswami, “Geno-Weaving: A framework for low- complexity capacity-achieving DNA data storage,”IEEE Journal on Selected Areas in Information Theory, vol. 6, pp. 383–393, 2025

  10. [10]

    Block length gain for nanopore channels,

    Y .-T. Lin, H.-P. Wang, and V . Guruswami, “Block length gain for nanopore channels,” 2025

  11. [11]

    Concatenated Codes for Multiple Reads of a DNA Sequence,

    I. Maarouf, A. Lenz, L. Welter, A. Wachter-Zeh, E. Rosnes, and A. Graell i Amat, “Concatenated Codes for Multiple Reads of a DNA Sequence,” Sep. 2022, arXiv:2111.14452

  12. [12]

    Polar Codes for the Deletion Channel: Weak and Strong Polarization,

    I. Tal, H. D. Pfister, A. Fazeli, and A. Vardy, “Polar Codes for the Deletion Channel: Weak and Strong Polarization,”IEEE Transactions on Information Theory, vol. 68, no. 4, pp. 2239–2265, Apr. 2022

  13. [13]

    Stronger Polarization for the Deletion Channel,

    D. Arava and I. Tal, “Stronger Polarization for the Deletion Channel,” May 2023

  14. [14]

    Coded trace reconstruction,

    M. Cheraghchi, R. Gabrys, O. Milenkovic, and J. Ribeiro, “Coded trace reconstruction,” Sep. 2019, arXiv:1903.09992

  15. [15]

    Coded trace reconstruction in a constant number of traces,

    J. Brakensiek, R. Li, and B. Spang, “Coded trace reconstruction in a constant number of traces,” Sep. 2020, arXiv:1908.03996

  16. [16]

    Trellis BMA: Coded Trace Reconstruction on IDS Channels for DNA Storage,

    S. R. Srinivasavaradhan, S. Gopi, H. D. Pfister, and S. Yekhanin, “Trellis BMA: Coded Trace Reconstruction on IDS Channels for DNA Storage,” Aug. 2024, arXiv:2107.06440

  17. [17]

    Funda- mental Limits of DNA Storage Systems,

    D. R. Heckel, D. I. Shomorony, K. Ramchandran, and D. Tse, “Funda- mental Limits of DNA Storage Systems,”IEEE International Symposium on Information Theory, 2017

  18. [18]

    An Upper Bound on the Capacity of the DNA Storage Channel,

    A. Lenz, P. H. Siegel, A. Wachter-Zeh, and E. Yaakobi, “An Upper Bound on the Capacity of the DNA Storage Channel,” in2019 IEEE Information Theory Workshop (ITW). Visby, Sweden: IEEE, Aug. 2019, pp. 1–5

  19. [19]

    Achieving the Capacity of the DNA Storage Channel,

    ——, “Achieving the Capacity of the DNA Storage Channel,” inICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Barcelona, Spain: IEEE, May 2020, pp. 8846–8850

  20. [20]

    DNA-Based Storage: Models and Fun- damental Limits,

    I. Shomorony and R. Heckel, “DNA-Based Storage: Models and Fun- damental Limits,”IEEE Transactions on Information Theory, vol. 67, no. 6, pp. 3675–3689, Jun. 2021

  21. [21]

    Achieving the Capacity of a DNA Storage Channel with Linear Coding Schemes,

    K. Levick, R. Heckel, and I. Shomorony, “Achieving the Capacity of a DNA Storage Channel with Linear Coding Schemes,” in2022 56th Annual Conference on Information Sciences and Systems (CISS). Princeton, NJ, USA: IEEE, Mar. 2022, pp. 218–223

  22. [22]

    The DNA Storage Channel: Capacity and Error Probability Bounds,

    N. Weinberger and N. Merhav, “The DNA Storage Channel: Capacity and Error Probability Bounds,”IEEE Transactions on Information Theory, vol. 68, no. 9, pp. 5657–5700, Sep. 2022

  23. [23]

    The Noisy Drawing Channel: Reliable Data Storage in DNA Sequences,

    A. Lenz, P. H. Siegel, A. Wachter-Zeh, and E. Yaakobi, “The Noisy Drawing Channel: Reliable Data Storage in DNA Sequences,”IEEE Transactions on Information Theory, vol. 69, no. 5, pp. 2757–2778, May 2023

  24. [24]

    Index-Based Concatenated Codes for the Multi-Draw DNA Storage Channel,

    L. Welter, I. Maarouf, A. Lenz, A. Wachter-Zeh, E. Rosnes, and A. Graell i Amat, “Index-Based Concatenated Codes for the Multi-Draw DNA Storage Channel,” in2023 IEEE Information Theory Workshop (ITW). Saint-Malo, France: IEEE, Apr. 2023, pp. 383–388

  25. [25]

    Achievable Information Rates and Concatenated Codes for the DNA Nanopore Sequencing Channel,

    I. Maarouf, E. Rosnes, and A. Graell i Amat, “Achievable Information Rates and Concatenated Codes for the DNA Nanopore Sequencing Channel,” in2023 IEEE Information Theory Workshop (ITW). Saint- Malo, France: IEEE, Apr. 2023, pp. 377–382

  26. [26]

    Efficient Near-Optimal Codes for General Repeat Channels,

    F. Pernice, R. Li, and M. Wootters, “Efficient Near-Optimal Codes for General Repeat Channels,” Feb. 2022

  27. [27]

    DNA synthesis and assembly technologies: From oligonucleotides to complete genomes,

    W. Tan, X. Jia, Y . Liu, C. Yao, and D. Yang, “DNA synthesis and assembly technologies: From oligonucleotides to complete genomes,” Nov. 2024

  28. [28]

    Channel coding rate in the finite blocklength regime,

    Y . Polyanskiy, H. V . Poor, and S. Verd ´u, “Channel coding rate in the finite blocklength regime,”IEEE Trans. Inf. Theor ., vol. 56, no. 5, p. 2307–2359, May 2010. [Online]. Available: https://doi.org/10.1109/TIT.2010.2043769

  29. [29]

    Tight asymptotic bounds for the deletion channel with small deletion probabilities,

    A. Kalai, M. Mitzenmacher, and M. Sudan, “Tight asymptotic bounds for the deletion channel with small deletion probabilities,” in2010 IEEE International Symposium on Information Theory. Austin, TX, USA: IEEE, Jun. 2010, pp. 997–1001

  30. [30]

    Characterization of Deletion/Substitution Channel Capacity for Small Deletion and Substitution Probabilities,

    M. Kazemi and T. M. Duman, “Characterization of Deletion/Substitution Channel Capacity for Small Deletion and Substitution Probabilities,” Jul. 2025

  31. [31]

    Bounds on the Capacity of Channels with Insertions, Deletions and Substitutions,

    D. Fertonani, T. M. Duman, and M. F. Erden, “Bounds on the Capacity of Channels with Insertions, Deletions and Substitutions,”IEEE Trans- actions on Communications, vol. 59, no. 1, pp. 2–6, Jan. 2011

  32. [32]

    Bounds on the Capacity of Random Insertion and Deletion-Additive Noise Channels,

    M. Rahmati and T. M. Duman, “Bounds on the Capacity of Random Insertion and Deletion-Additive Noise Channels,”IEEE Transactions on Information Theory, vol. 59, no. 9, pp. 5534–5546, Sep. 2013

  33. [33]

    Finite Block- length Performance Bound for the DNA Storage Channel,

    I. Maarouf, G. Liva, E. Rosnes, and A. Graell i Amat, “Finite Block- length Performance Bound for the DNA Storage Channel,” in2023 12th International Symposium on Topics in Coding (ISTC). Brest, France: IEEE, Sep. 2023, pp. 1–5

  34. [34]

    Channel Polarization: A Method for Constructing Capacity- Achieving Codes for Symmetric Binary-Input Memoryless Channels,

    E. Arikan, “Channel Polarization: A Method for Constructing Capacity- Achieving Codes for Symmetric Binary-Input Memoryless Channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051– 3073, Jul. 2009