Error Exponent Bounds for Optimal Short-Read Clustering

Nir Weinberger; Yoav Chachamovitz

arxiv: 2606.21218 · v1 · pith:TIS6AK4Unew · submitted 2026-06-19 · 💻 cs.IT · math.IT

Error Exponent Bounds for Optimal Short-Read Clustering

Yoav Chachamovitz , Nir Weinberger This is my paper

Pith reviewed 2026-06-26 13:22 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords short-read clusteringerror exponent boundsDNA storagememoryless channelunsupervised clusteringoptimal clustering ruleprobability of error

0 comments

The pith

Upper and lower bounds on the error probability of optimal clustering are derived for noisy short sequences from unknown sources

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the unsupervised clustering of short noisy sequences that each come from one of several unknown original sequences after transmission over a memoryless channel. It focuses on the statistically optimal clustering rule and supplies both upper and lower bounds on the probability that this rule makes an incorrect grouping. The bounds are given explicitly in terms of sequence length, the number of reads, and the channel statistics. A reader would care because the results indicate how reliably reads can be grouped without advance knowledge of the sources, which directly affects the design of decoders in applications such as DNA data storage.

Core claim

Focusing on the statistically optimal clustering rule, the paper derives upper and lower bounds on the probability of incorrect clustering as a function of the sequence length, the number of reads, and the channel statistics.

What carries the argument

The statistically optimal clustering rule that minimizes the probability of incorrect grouping of the observed noisy reads

If this is right

The probability of incorrect clustering is bounded from above and below by expressions that depend on sequence length, read count, and channel statistics.
Error probability decreases as sequence length or number of reads increases.
Different channel statistics produce different bound values, allowing comparison across noise models.
The bounds apply without any prior statistical knowledge of the source sequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The bounds could be used to select minimum sequence lengths needed for target reliability in DNA storage systems.
Practical algorithms that approximate the optimal rule might achieve performance close to these bounds.
The same bounding technique might extend to clustering problems outside DNA storage that involve short noisy strings from unknown origins.

Load-bearing premise

The source sequences are completely unknown and the channel is memoryless.

What would settle it

A direct computation or Monte Carlo simulation of the actual error probability for fixed sequence length, read count, and channel, checked against whether the value lies strictly between the derived upper and lower bounds.

Figures

Figures reproduced from arXiv: 2606.21218 by Nir Weinberger, Yoav Chachamovitz.

**Figure 2.** Figure 2: BSC, p ∈ {0.05, 0.10, 0.20}, uniform binary PX. Solid: Lower bound ϕL (Theorem 3). Dashed: Upper bound ϕU (Theorem 4). Dotted: − log B(PX × PX) at each p. Theorem 3 yields a positive bound only when EB(1/β) > 0, i.e. β > β3(p) := 1/DB(PX × PX), equal to 2/dB(0, 1) for the BSC. The condition (64) of Theorem 4 EB(1/β) > − log B(PX × PX), defines a second threshold β4(p) [PITH_FULL_IMAGE:figures/full_fig_p01… view at source ↗

**Figure 3.** Figure 3: Thresholds in the (β, p) plane for the BSC, uniform PX. Thin: Theorem 3 threshold β3(p) (the lower bound is zero for β < β3). Thick: Theorem 4 threshold β4(p) (the curve EB(1/β) = − log B(PX × PX)). Shading distinguishes the three regions of the plane. Values of β4 at p ∈ {0.05, 0.10, 0.20} are annotated [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗

**Figure 4.** Figure 4: q-ary symmetric channel, q ∈ {2, 3, 4}, p ∈ {0.05, 0.10, 0.20}, uniform PX. Solid: Lower bound ϕL (Theorem 3). Dashed: Upper bound ϕU (Theorem 4). Shade and width encode q (darker and thicker is for larger q). Dotted: − log B(PX × PX) for each q. The thresholds extend to the q-ary case [PITH_FULL_IMAGE:figures/full_fig_p014_4.png] view at source ↗

**Figure 5.** Figure 5: Thresholds in the (β, p) plane for the q-ary symmetric channel, q ∈ {2, 3, 4}, uniform PX. Thick: β4(p, q). Thin: β3(p, q). The line style encodes q: dotted (q = 2), dashed (q = 3), solid (q = 4, the DNA alphabet) [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

**Figure 6.** Figure 6: BSC with crossover probability p = 0.1, under uniform and two biased binary PX. Solid: The lower bound ϕL (Theorem 3). Dashed: The upper bound ϕU (Theorem 4). Black: uniform PX = ( 1 2 , 1 2 ). Grey: PX = ( 3 10 , 7 10 ) and ( 1 10 , 9 10 ). Each curve is plotted only over the range where its lower bound is non-zero or where the condition for the upper bound holds. IV. PROOF OF PROPOSITION 2 Let Rm 1 ∈ {0,… view at source ↗

read the original abstract

Motivated by the operation of decoders for DNA storage, we consider the problem of unsupervised clustering of noisy short sequences, each generated from one of multiple possible unknown source sequences after passing through a memoryless channel. Focusing on the statistically optimal clustering rule, we derive upper and lower bounds on the probability of incorrect clustering as a function of the sequence length, the number of reads, and the channel statistics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper derives upper and lower bounds on incorrect clustering probability for the optimal rule when sources are unknown and the channel is memoryless.

read the letter

The main point is that the authors give explicit upper and lower bounds on the probability that the statistically optimal clustering rule fails, expressed only in terms of sequence length, number of reads, and channel statistics.

The new element is the application of error exponent techniques to this unsupervised short-read clustering problem with fixed but unknown sources. Prior work on error exponents usually assumes known distributions or supervised settings, so tailoring the analysis to the DNA storage motivated case with completely unknown sources is a legitimate step.

The paper does a clean job of stating the problem and sticking to the optimal rule without adding source statistics or channel memory. The bounds depend directly on the given parameters, which keeps the result self-contained and matches the setup.

A soft spot is that the abstract alone does not show the actual derivations or any tightness checks, so it is hard to judge how loose the bounds end up or whether they require heavy computation. That is typical for exponent papers but still leaves the practical reach unclear.

This is for information theorists working on storage systems or clustering error analysis. A reader already familiar with likelihood-based methods and error exponents would get the most out of the specific bounds.

I would send it to peer review. The claim is well-posed, the approach is standard for the area, and the result fills a narrow but defined gap.

Referee Report

0 major / 1 minor

Summary. The manuscript addresses unsupervised clustering of noisy short sequences, each generated from one of multiple unknown source sequences and passed through a memoryless channel, motivated by DNA storage decoder operation. It focuses on the statistically optimal clustering rule and derives upper and lower bounds on the probability of incorrect clustering expressed as functions of sequence length, number of reads, and channel statistics only.

Significance. If the derivations hold, the bounds offer parameter-free theoretical limits on clustering error that depend solely on observable channel statistics. This aligns with standard information-theoretic techniques for error exponents in clustering and could inform practical decoder design in DNA storage systems where source sequences are fixed but unknown.

minor comments (1)

The abstract states that bounds are derived, but the provided text does not include the explicit derivation steps or verification; a full assessment of the central claim requires the detailed proofs in the body of the manuscript.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their review and for accurately summarizing the manuscript's focus on deriving upper and lower bounds on clustering error probability for unsupervised clustering of noisy short sequences in the context of DNA storage decoders. We note that the referee raises no specific major comments or concerns about the derivations, technical details, or presentation.

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The paper derives upper and lower bounds on P(incorrect clustering) for the statistically optimal rule, expressed as functions of sequence length, number of reads, and channel statistics under memoryless channels with unknown sources. No load-bearing steps reduce by construction to fitted inputs, self-citations, or ansatzes; the setup matches standard information-theoretic error analysis without hidden dependencies on the target quantities. This is the expected honest non-finding for a direct bounding derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on abstract; no free parameters, axioms, or invented entities are identifiable from the provided text.

pith-pipeline@v0.9.1-grok · 5579 in / 912 out tokens · 16044 ms · 2026-06-26T13:22:58.476168+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

37 extracted references

[1]

Expurgated bounds for the asymmetric broadcast channel.IEEE Transactions on Information Theory, 65(6):3412–3435, 2019

Ran Averbuch, Nir Weinberger, and Neri Merhav. Expurgated bounds for the asymmetric broadcast channel.IEEE Transactions on Information Theory, 65(6):3412–3435, 2019

2019
[2]

Minimum of average conditional entropy for given minimum probability of error.Kybernetika, 2(5):416–422, 1966

Libuše Baladová. Minimum of average conditional entropy for given minimum probability of error.Kybernetika, 2(5):416–422, 1966

1966
[3]

DNA-correcting codes: End-to-end correction in DNA storage systems.IEEE Transactions on Information Theory, 71(6):4214–4227, 2025

Avital Boruchovsky, Daniella Bar-Lev, and Eitan Yaakobi. DNA-correcting codes: End-to-end correction in DNA storage systems.IEEE Transactions on Information Theory, 71(6):4214–4227, 2025

2025
[4]

Chu and J

J. Chu and J. Chueh. Inequalities between information measures and error probability.Journal of the Franklin Institute, 282(2):121–125, 1966

1966
[5]

Church, Yuan Gao, and Sriram Kosuri

George M. Church, Yuan Gao, and Sriram Kosuri. Next-generation digital information storage in DNA.Science, 337(6102):1628–1628, 2012

2012
[6]

I. Csiszár. The method of types.IEEE Transactions on Information Theory, 44(6):2505–2523, 1998

1998
[7]

Csiszár and J

I. Csiszár and J. Körner.Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2011

2011
[8]

DNA fountain enables a robust and efficient storage architecture.Science, 355(6328):950–954, 2017

Yaniv Erlich and Dina Zielinski. DNA fountain enables a robust and efficient storage architecture.Science, 355(6328):950–954, 2017

2017
[9]

R. G. Gallager.Information Theory and Reliable Communication. John Wiley and Sons, 1968

1968
[10]

The random coding bound is tight for the average code (corresp.).IEEE Transactions on Information Theory, 19(2):244–246, 1973

Robert Gallager. The random coding bound is tight for the average code (corresp.).IEEE Transactions on Information Theory, 19(2):244–246, 1973

1973
[11]

Capacity of frequency-based channels: Encoding information in molecular concentrations.IEEE Transactions on Information Theory, 71(8):5788–5808, 2025

Yuval Gerzon, Ilan Shomorony, and Nir Weinberger. Capacity of frequency-based channels: Encoding information in molecular concentrations.IEEE Transactions on Information Theory, 71(8):5788–5808, 2025

2025
[12]

Goldman, S

N. Goldman, S. Bertone, P. Chen, C. Dessimoz, E. M. LeProust, B. Sipos, and E. Birney. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA.Nature, 494(7435):77–80, 2013

2013
[13]

R. N. Grass, R. Heckel, M. Puddu, D. Paunescu, and Wendelin J. Stark. Robust chemical preservation of digital information on DNA in silica with error-correcting codes.Angewandte Chemie International Edition, 54(8):2552–2555, 2015

2015
[14]

Probability of error, equivocation, and the Chernoff bound.IEEE Transactions on Information Theory, 16(4):368–372, 1970

Martin Hellman and Josef Raviv. Probability of error, equivocation, and the Chernoff bound.IEEE Transactions on Information Theory, 16(4):368–372, 1970

1970
[15]

H. M. Kiah, G. J. Puleo, and O. Milenkovic. Codes for DNA sequence profiles.IEEE Transactions on Information Theory, 62(6):3125–3146, 2016

2016
[16]

Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi

Andreas Lenz, Paul H. Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi. An upper bound on the capacity of the DNA storage channel. In2019 IEEE Information Theory Workshop (ITW), pages 1–5. IEEE, 2019

2019
[17]

Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi

Andreas Lenz, Paul H. Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi. Achieving the capacity of the DNA storage channel. In2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8846–8850. IEEE, 2020. 36

2020
[18]

Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi

Andreas Lenz, Paul H. Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi. The noisy drawing channel: Reliable data storage in DNA sequences.IEEE Transactions on Information Theory, 69(5):2757–2778, May 2023

2023
[19]

Efficient reconstruction of sequences.IEEE Transactions on Information Theory, 47(1):2–22, 2002

Vladimir I Levenshtein. Efficient reconstruction of sequences.IEEE Transactions on Information Theory, 47(1):2–22, 2002

2002
[20]

Y . H. Ling, N. Weinberger, and J. Scarlett. Error exponents for DNA storage codes with a variable number of reads.IEEE Journal on Selected Areas in Information Theory, pages 205–216, 2025

2025
[21]

Exact error exponents of concatenated codes for DNA storage.IEEE Transactions on Information Theory, 71(9):6566–6585, 2025

Yan Hao Ling and Jonathan Scarlett. Exact error exponents of concatenated codes for DNA storage.IEEE Transactions on Information Theory, 71(9):6566–6585, 2025

2025
[22]

Optimal overlap detection of shotgun reads.Submitted to IEEE Transactions on Information Theory, 2025

Nir Luria and Nir Weinberger. Optimal overlap detection of shotgun reads.Submitted to IEEE Transactions on Information Theory, 2025

2025
[23]

A toolbox for refined information-theoretic analyses with applications.Foundations and Trends in Communications and Information Theory, 22(1):1–184, 2025

Neri Merhav and Nir Weinberger. A toolbox for refined information-theoretic analyses with applications.Foundations and Trends in Communications and Information Theory, 22(1):1–184, 2025

2025
[24]

Organick, S

L. Organick, S. D. Ang, Y . Chen, R. Lopez, S. Yekhanin, K. Makarychev, M. Z. Racz, G. Kamath, P. Gopalan, and B. Nguyen. Random access in large-scale DNA data storage.Nature biotechnology, 36(3):242–248, 2018

2018
[25]

Arvind Rameshwar and Nir Weinberger

V . Arvind Rameshwar and Nir Weinberger. Information rates over multi-view channels.IEEE Transactions on Information Theory, 71(2):847–861, 2025

2025
[26]

Clustering billions of reads for DNA data storage.Advances in Neural Information Processing Systems, 30, 2017

Cyrus Rashtchian, Konstantin Makarychev, Miklos Racz, Siena Ang, Djordje Jevdjic, Sergey Yekhanin, Luis Ceze, and Karin Strauss. Clustering billions of reads for DNA data storage.Advances in Neural Information Processing Systems, 30, 2017

2017
[27]

Arimoto–Rényi conditional entropy and Bayesianm-ary hypothesis testing.IEEE Transactions on Information theory, 64(1):4–25, 2017

Igal Sason and Sergio Verdú. Arimoto–Rényi conditional entropy and Bayesianm-ary hypothesis testing.IEEE Transactions on Information theory, 64(1):4–25, 2017

2017
[28]

DNA-based storage: Models and fundamental limits.IEEE Transactions on Information Theory, 67(6):3675–3689, 2021

Ilan Shomorony and Reinhard Heckel. DNA-based storage: Models and fundamental limits.IEEE Transactions on Information Theory, 67(6):3675–3689, 2021

2021
[29]

Information-theoretic foundations of DNA data storage.Foundations and Trends® in Communications and Information Theory, 19(1):1–106, 2022

Ilan Shomorony and Reinhard Heckel. Information-theoretic foundations of DNA data storage.Foundations and Trends® in Communications and Information Theory, 19(1):1–106, 2022

2022
[30]

Shulman.Communication over an unknown channel via common broadcasting

N. Shulman.Communication over an unknown channel via common broadcasting. PhD thesis, Tel-Aviv University, Tel-Aviv, Israel, 2003

2003
[31]

On coding over sliced information.IEEE Transactions on Information Theory, 67(5):2793–2807, 2021

Jin Sima, Netanel Raviv, and Jehoshua Bruck. On coding over sliced information.IEEE Transactions on Information Theory, 67(5):2793–2807, 2021

2021
[32]

Error correction for DNA storage.IEEE BITS the Information Theory Magazine, 3(3):78–94, 2023

Jin Sima, Netanel Raviv, Moshe Schwartz, and Jehoshua Bruck. Error correction for DNA storage.IEEE BITS the Information Theory Magazine, 3(3):78–94, 2023

2023
[33]

Achievable rates and error probability bounds of frequency-based channels of unlimited input resolution.IEEE Journal on Selected Areas in Information Theory, 6:283–295, 2025

Ran Tamir and Nir Weinberger. Achievable rates and error probability bounds of frequency-based channels of unlimited input resolution.IEEE Journal on Selected Areas in Information Theory, 6:283–295, 2025

2025
[34]

Error probability bounds for coded-index DNA storage systems.IEEE Transactions on Information Theory, 68(11):7005–7022, 2022

Nir Weinberger. Error probability bounds for coded-index DNA storage systems.IEEE Transactions on Information Theory, 68(11):7005–7022, 2022

2022
[35]

The DNA storage channel: Capacity and error probability bounds.IEEE Transactions on Information Theory, 68(9):5657–5700, 2022

Nir Weinberger and Neri Merhav. The DNA storage channel: Capacity and error probability bounds.IEEE Transactions on Information Theory, 68(9):5657–5700, 2022

2022
[36]

Fundamental limits of reference-based sequence reordering.IEEE Transactions on Information Theory, 70(7):4634– 4654, 2024

Nir Weinberger and Ilan Shomorony. Fundamental limits of reference-based sequence reordering.IEEE Transactions on Information Theory, 70(7):4634– 4654, 2024

2024
[37]

S. M. H. T. Yazdi, Y . Yuan, J. Ma, H. Zhao, and O. Milenkovic. A rewritable, random-access DNA-based storage system.Scientific reports, 5(1):1–10, 2015

2015

[1] [1]

Expurgated bounds for the asymmetric broadcast channel.IEEE Transactions on Information Theory, 65(6):3412–3435, 2019

Ran Averbuch, Nir Weinberger, and Neri Merhav. Expurgated bounds for the asymmetric broadcast channel.IEEE Transactions on Information Theory, 65(6):3412–3435, 2019

2019

[2] [2]

Minimum of average conditional entropy for given minimum probability of error.Kybernetika, 2(5):416–422, 1966

Libuše Baladová. Minimum of average conditional entropy for given minimum probability of error.Kybernetika, 2(5):416–422, 1966

1966

[3] [3]

DNA-correcting codes: End-to-end correction in DNA storage systems.IEEE Transactions on Information Theory, 71(6):4214–4227, 2025

Avital Boruchovsky, Daniella Bar-Lev, and Eitan Yaakobi. DNA-correcting codes: End-to-end correction in DNA storage systems.IEEE Transactions on Information Theory, 71(6):4214–4227, 2025

2025

[4] [4]

Chu and J

J. Chu and J. Chueh. Inequalities between information measures and error probability.Journal of the Franklin Institute, 282(2):121–125, 1966

1966

[5] [5]

Church, Yuan Gao, and Sriram Kosuri

George M. Church, Yuan Gao, and Sriram Kosuri. Next-generation digital information storage in DNA.Science, 337(6102):1628–1628, 2012

2012

[6] [6]

I. Csiszár. The method of types.IEEE Transactions on Information Theory, 44(6):2505–2523, 1998

1998

[7] [7]

Csiszár and J

I. Csiszár and J. Körner.Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, 2011

2011

[8] [8]

DNA fountain enables a robust and efficient storage architecture.Science, 355(6328):950–954, 2017

Yaniv Erlich and Dina Zielinski. DNA fountain enables a robust and efficient storage architecture.Science, 355(6328):950–954, 2017

2017

[9] [9]

R. G. Gallager.Information Theory and Reliable Communication. John Wiley and Sons, 1968

1968

[10] [10]

The random coding bound is tight for the average code (corresp.).IEEE Transactions on Information Theory, 19(2):244–246, 1973

Robert Gallager. The random coding bound is tight for the average code (corresp.).IEEE Transactions on Information Theory, 19(2):244–246, 1973

1973

[11] [11]

Capacity of frequency-based channels: Encoding information in molecular concentrations.IEEE Transactions on Information Theory, 71(8):5788–5808, 2025

Yuval Gerzon, Ilan Shomorony, and Nir Weinberger. Capacity of frequency-based channels: Encoding information in molecular concentrations.IEEE Transactions on Information Theory, 71(8):5788–5808, 2025

2025

[12] [12]

Goldman, S

N. Goldman, S. Bertone, P. Chen, C. Dessimoz, E. M. LeProust, B. Sipos, and E. Birney. Towards practical, high-capacity, low-maintenance information storage in synthesized DNA.Nature, 494(7435):77–80, 2013

2013

[13] [13]

R. N. Grass, R. Heckel, M. Puddu, D. Paunescu, and Wendelin J. Stark. Robust chemical preservation of digital information on DNA in silica with error-correcting codes.Angewandte Chemie International Edition, 54(8):2552–2555, 2015

2015

[14] [14]

Probability of error, equivocation, and the Chernoff bound.IEEE Transactions on Information Theory, 16(4):368–372, 1970

Martin Hellman and Josef Raviv. Probability of error, equivocation, and the Chernoff bound.IEEE Transactions on Information Theory, 16(4):368–372, 1970

1970

[15] [15]

H. M. Kiah, G. J. Puleo, and O. Milenkovic. Codes for DNA sequence profiles.IEEE Transactions on Information Theory, 62(6):3125–3146, 2016

2016

[16] [16]

Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi

Andreas Lenz, Paul H. Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi. An upper bound on the capacity of the DNA storage channel. In2019 IEEE Information Theory Workshop (ITW), pages 1–5. IEEE, 2019

2019

[17] [17]

Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi

Andreas Lenz, Paul H. Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi. Achieving the capacity of the DNA storage channel. In2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8846–8850. IEEE, 2020. 36

2020

[18] [18]

Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi

Andreas Lenz, Paul H. Siegel, Antonia Wachter-Zeh, and Eitan Yaakobi. The noisy drawing channel: Reliable data storage in DNA sequences.IEEE Transactions on Information Theory, 69(5):2757–2778, May 2023

2023

[19] [19]

Efficient reconstruction of sequences.IEEE Transactions on Information Theory, 47(1):2–22, 2002

Vladimir I Levenshtein. Efficient reconstruction of sequences.IEEE Transactions on Information Theory, 47(1):2–22, 2002

2002

[20] [20]

Y . H. Ling, N. Weinberger, and J. Scarlett. Error exponents for DNA storage codes with a variable number of reads.IEEE Journal on Selected Areas in Information Theory, pages 205–216, 2025

2025

[21] [21]

Exact error exponents of concatenated codes for DNA storage.IEEE Transactions on Information Theory, 71(9):6566–6585, 2025

Yan Hao Ling and Jonathan Scarlett. Exact error exponents of concatenated codes for DNA storage.IEEE Transactions on Information Theory, 71(9):6566–6585, 2025

2025

[22] [22]

Optimal overlap detection of shotgun reads.Submitted to IEEE Transactions on Information Theory, 2025

Nir Luria and Nir Weinberger. Optimal overlap detection of shotgun reads.Submitted to IEEE Transactions on Information Theory, 2025

2025

[23] [23]

A toolbox for refined information-theoretic analyses with applications.Foundations and Trends in Communications and Information Theory, 22(1):1–184, 2025

Neri Merhav and Nir Weinberger. A toolbox for refined information-theoretic analyses with applications.Foundations and Trends in Communications and Information Theory, 22(1):1–184, 2025

2025

[24] [24]

Organick, S

L. Organick, S. D. Ang, Y . Chen, R. Lopez, S. Yekhanin, K. Makarychev, M. Z. Racz, G. Kamath, P. Gopalan, and B. Nguyen. Random access in large-scale DNA data storage.Nature biotechnology, 36(3):242–248, 2018

2018

[25] [25]

Arvind Rameshwar and Nir Weinberger

V . Arvind Rameshwar and Nir Weinberger. Information rates over multi-view channels.IEEE Transactions on Information Theory, 71(2):847–861, 2025

2025

[26] [26]

Clustering billions of reads for DNA data storage.Advances in Neural Information Processing Systems, 30, 2017

Cyrus Rashtchian, Konstantin Makarychev, Miklos Racz, Siena Ang, Djordje Jevdjic, Sergey Yekhanin, Luis Ceze, and Karin Strauss. Clustering billions of reads for DNA data storage.Advances in Neural Information Processing Systems, 30, 2017

2017

[27] [27]

Arimoto–Rényi conditional entropy and Bayesianm-ary hypothesis testing.IEEE Transactions on Information theory, 64(1):4–25, 2017

Igal Sason and Sergio Verdú. Arimoto–Rényi conditional entropy and Bayesianm-ary hypothesis testing.IEEE Transactions on Information theory, 64(1):4–25, 2017

2017

[28] [28]

DNA-based storage: Models and fundamental limits.IEEE Transactions on Information Theory, 67(6):3675–3689, 2021

Ilan Shomorony and Reinhard Heckel. DNA-based storage: Models and fundamental limits.IEEE Transactions on Information Theory, 67(6):3675–3689, 2021

2021

[29] [29]

Information-theoretic foundations of DNA data storage.Foundations and Trends® in Communications and Information Theory, 19(1):1–106, 2022

Ilan Shomorony and Reinhard Heckel. Information-theoretic foundations of DNA data storage.Foundations and Trends® in Communications and Information Theory, 19(1):1–106, 2022

2022

[30] [30]

Shulman.Communication over an unknown channel via common broadcasting

N. Shulman.Communication over an unknown channel via common broadcasting. PhD thesis, Tel-Aviv University, Tel-Aviv, Israel, 2003

2003

[31] [31]

On coding over sliced information.IEEE Transactions on Information Theory, 67(5):2793–2807, 2021

Jin Sima, Netanel Raviv, and Jehoshua Bruck. On coding over sliced information.IEEE Transactions on Information Theory, 67(5):2793–2807, 2021

2021

[32] [32]

Error correction for DNA storage.IEEE BITS the Information Theory Magazine, 3(3):78–94, 2023

Jin Sima, Netanel Raviv, Moshe Schwartz, and Jehoshua Bruck. Error correction for DNA storage.IEEE BITS the Information Theory Magazine, 3(3):78–94, 2023

2023

[33] [33]

Achievable rates and error probability bounds of frequency-based channels of unlimited input resolution.IEEE Journal on Selected Areas in Information Theory, 6:283–295, 2025

Ran Tamir and Nir Weinberger. Achievable rates and error probability bounds of frequency-based channels of unlimited input resolution.IEEE Journal on Selected Areas in Information Theory, 6:283–295, 2025

2025

[34] [34]

Error probability bounds for coded-index DNA storage systems.IEEE Transactions on Information Theory, 68(11):7005–7022, 2022

Nir Weinberger. Error probability bounds for coded-index DNA storage systems.IEEE Transactions on Information Theory, 68(11):7005–7022, 2022

2022

[35] [35]

The DNA storage channel: Capacity and error probability bounds.IEEE Transactions on Information Theory, 68(9):5657–5700, 2022

Nir Weinberger and Neri Merhav. The DNA storage channel: Capacity and error probability bounds.IEEE Transactions on Information Theory, 68(9):5657–5700, 2022

2022

[36] [36]

Fundamental limits of reference-based sequence reordering.IEEE Transactions on Information Theory, 70(7):4634– 4654, 2024

Nir Weinberger and Ilan Shomorony. Fundamental limits of reference-based sequence reordering.IEEE Transactions on Information Theory, 70(7):4634– 4654, 2024

2024

[37] [37]

S. M. H. T. Yazdi, Y . Yuan, J. Ma, H. Zhao, and O. Milenkovic. A rewritable, random-access DNA-based storage system.Scientific reports, 5(1):1–10, 2015

2015