Capacity-Achieving Codes with Inverse-Ackermann-Depth Encoders

Yuan Li

arxiv: 2512.11443 · v2 · submitted 2025-12-12 · 💻 cs.IT · math.IT

Capacity-Achieving Codes with Inverse-Ackermann-Depth Encoders

Yuan Li This is my paper

Pith reviewed 2026-05-16 22:58 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords capacity-achieving codesarithmetic circuitsinverse Ackermann depthdisperser graphsencoding complexityadditive noise channelsfinite fieldserror probability

0 comments

The pith

Error-correcting codes approach channel capacity when encoded by linear-size arithmetic circuits of inverse-Ackermann depth

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that for any additive noise channel over a finite field F_q there exist codes achieving rates arbitrarily close to capacity whose encoders are arithmetic circuits over F_q with O(n) size and depth 2α(n), where α(n) is a slow-growing inverse Ackermann function that stays at most 3 for all practical block lengths n. This matters because it demonstrates that capacity-achieving codes can be encoded with circuits whose depth is smaller than any fixed constant while keeping total size linear in n. The construction begins with a known linear code of constant rate and positive relative distance, then adds one layer consisting of a disperser graph whose edge weights are chosen by a probabilistic argument to produce a deterministic encoder with error probability 2^{-Ω(n)}.

Core claim

There exist error-correcting codes approaching the capacity of any additive noise channel over F_q that admit encoders realized as arithmetic circuits over F_q of size O(n) and depth 2α(n). The proof obtains such an encoder by composing a constant-rate linear code with positive relative distance and a disperser graph, then showing that a random assignment of edge weights yields a deterministic circuit whose output distribution is close to the uniform distribution over a codeword set with the required error probability.

What carries the argument

Composition of a constant-rate linear code with a disperser graph, followed by a probabilistic selection of weighted addition gates that derandomizes the encoder while preserving capacity achievement

If this is right

Encoding circuits for capacity-achieving codes can have depth bounded by 6 for every practical block length.
The total number of arithmetic operations required for encoding grows only linearly with block length.
Error probability decays exponentially in block length at any rate strictly below capacity.
The same codes remain reliable under any additive noise distribution over the finite field.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The disperser-layer technique may extend to produce low-depth encoders for list-decodable codes or codes with additional algebraic structure.
If similar compositions apply to decoding circuits, overall communication systems could operate with near-constant depth in both directions.
The result separates encoding complexity from the information-theoretic rate limit more sharply than earlier constructions that required logarithmic or polylogarithmic depth.

Load-bearing premise

The base linear code with constant rate and positive relative distance can be combined with a disperser graph so that some choice of edge weights produces a deterministic encoder with exponentially small error probability at rates below capacity.

What would settle it

An explicit additive noise channel and rate below capacity for which every possible assignment of weights to the edges of every disperser of the required size produces an encoder whose error probability is larger than 2^{-Ω(n)}.

read the original abstract

We prove that for any additive noise channel over $\mathbb{F}_q$, there exist error-correcting codes approaching channel capacity encodable by arithmetic circuits (with weighted addition gates) over $\mathbb{F}_q$ of size $O(n)$ and depth $2\alpha(n)$, where $\alpha(n)$ is a version of the inverse Ackermann function that is at most $3$ for all input lengths $n$ in practice. Our results demonstrate that certain capacity-achieving codes admit highly efficient encoding circuits that are simultaneously of linear size and inverse-Ackermann depth. Our construction composes a linear code with constant rate and relative distance, based on the constructions of G\'{a}l, Hansen, Kouck\'{y}, Pudl\'{a}k, and Viola [IEEE Trans. Inform. Theory 59(10), 2013] and Drucker and Li [COCOON 2023], with an additional layer formed by a disperser graph. A probabilistic argument over the edge weights of the disperser shows the existence of a deterministic encoder achieving error probability $2^{-\Omega(n)}$ at any rate below capacity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper shows existence of capacity-achieving codes with linear-size encoders of depth 2α(n) by layering a disperser with random edge weights on top of known constant-rate linear codes.

read the letter

The main point is that they get encoders that are arithmetic circuits over F_q with O(n) size and depth only twice the inverse Ackermann function, which is at most 3 in practice. That combination of linear size and near-constant depth for codes approaching capacity is the new piece. They start from the linear codes in Gál et al. 2013 and Drucker-Li 2023 that already give constant rate and relative distance, add a disperser graph, and then argue that random weights on the edges produce a deterministic map whose error probability under additive noise is 2^{-Ω(n)} for any rate below capacity. The probabilistic weighting step is what delivers the depth bound, and that specific construction does not appear in the earlier literature on these components. The argument looks internally consistent on the surface and avoids circularity by composing independent prior results with a fresh existence argument. The soft spot is exactly where the stress-test note flags: the base code only guarantees constant distance, so the disperser neighborhoods can create linear dependencies among low-weight error patterns. It is not obvious that a simple union bound or second-moment calculation will still succeed with positive probability when the rate gets arbitrarily close to capacity; the dependencies might push the failure probability above 1 for some parameter regimes. Without the full expansion of that calculation it is hard to tell how tight the bound actually is. This is aimed at people working on the circuit complexity of coding and on efficient encoders for information-theoretic constructions. It is worth sending to a serious referee because the existence claim is clean, the approach is plausible, and the potential flaw is localized enough that referees can check it directly rather than reject outright.

Referee Report

1 major / 2 minor

Summary. The paper claims that for any additive noise channel over F_q, there exist capacity-achieving error-correcting codes encodable by arithmetic circuits over F_q with size O(n) and depth 2α(n), where α(n) is an inverse Ackermann function bounded by 3 in practice. The proof composes a constant-rate constant-distance linear code from prior literature with a disperser graph, selecting edge weights probabilistically to achieve error probability 2^{-Ω(n)}.

Significance. If correct, this shows that optimal codes admit encoders with linear size and inverse-Ackermann depth, which is a strong positive result on the efficiency of capacity-achieving codes. It leverages existing constructions and introduces a disperser-based layer with probabilistic weighting, providing a concrete way to achieve shallow circuits.

major comments (1)

[Probabilistic argument for disperser weights] The load-bearing step is the probabilistic method showing that random edge weights yield success probability >0 for error prob 2^{-Ω(n)} at rates R < C. Since the base code from Gál et al. and Drucker-Li has only constant relative distance, the disperser neighborhoods may induce dependencies among low-weight error events. The manuscript must specify the union bound or moment calculation and confirm it holds arbitrarily close to capacity without the success probability vanishing.

minor comments (2)

[Definition of α(n)] Provide the precise recursive definition of the inverse Ackermann function α(n) used in the depth bound.
[Channel model] Clarify if the additive noise channel is over arbitrary alphabets or specifically F_q, and any assumptions on the noise distribution.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed reading and for identifying the need to strengthen the exposition of the probabilistic argument. We address the comment below and will revise the manuscript to include an explicit calculation.

read point-by-point responses

Referee: The load-bearing step is the probabilistic method showing that random edge weights yield success probability >0 for error prob 2^{-Ω(n)} at rates R < C. Since the base code from Gál et al. and Drucker-Li has only constant relative distance, the disperser neighborhoods may induce dependencies among low-weight error events. The manuscript must specify the union bound or moment calculation and confirm it holds arbitrarily close to capacity without the success probability vanishing.

Authors: The analysis proceeds via a direct union bound over all vectors e of weight at most δn/2 (where δ is the constant relative distance of the base code). For each such e the disperser with random weights maps the support of e to a set of linear forms whose joint distribution is 2^{-Ω(n)}-close to uniform over F_q^{O(n)} by the disperser mixing property; consequently Pr[weighted sum of e equals a nonzero base-codeword difference] ≤ q^{-Ω(n)} = 2^{-Ω(n)}. The number of candidate e is at most 2^{O(n)}, so the union bound yields overall failure probability 2^{-Ω(n)} with positive probability over the choice of weights. Because the base code is linear and the disperser is applied after encoding, the constant distance only limits the weight of e that must be considered; it does not create positive correlations that would invalidate the bound. The exponent remains strictly positive for any rate R < C because the entropy gap C-R appears linearly in the large-deviation rate function of the weighted sums. We will add a new subsection (Section 4.3) that writes out the union bound and the entropy calculation explicitly. revision: yes

Circularity Check

1 steps flagged

Self-citation load-bearing for base linear code; probabilistic disperser argument is independent

specific steps

self citation load bearing [Abstract]
"Our construction composes a linear code with constant rate and relative distance, based on the constructions of Gál, Hansen, Koucký, Pudlák, and Viola [IEEE Trans. Inform. Theory 59(10), 2013] and Drucker and Li [COCOON 2023], with an additional layer formed by a disperser graph. A probabilistic argument over the edge weights of the disperser shows the existence of a deterministic encoder achieving error probability 2^{-Ω(n)} at any rate below capacity."

The load-bearing base code is taken directly from Drucker and Li [COCOON 2023] (co-authored by the present paper's author Yuan Li). The overall existence claim therefore rests on this self-citation for its foundational linear code, even though the subsequent disperser weighting and probabilistic method are new.

full rationale

The derivation composes a prior linear code (constant rate, relative distance) with a new disperser layer and probabilistic weighting argument to establish existence of the desired encoder. The only load-bearing citation is to Drucker and Li 2023, which overlaps in authorship with the present paper; this supplies the base code but does not reduce the central probabilistic existence claim to a self-referential definition or fitted parameter. No self-definitional equations, fitted inputs renamed as predictions, uniqueness theorems imported from the same authors, or ansatz smuggling appear. The construction remains largely self-contained against external benchmarks once the cited base code is granted.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two external results: the existence of suitable constant-rate linear codes from prior papers, and the standard probabilistic method for selecting disperser weights. No free parameters or new entities are introduced.

axioms (2)

domain assumption Existence of linear codes with constant rate and relative distance over F_q from Gál et al. and Drucker-Li constructions
Invoked as the base code in the composition step of the construction.
standard math Probabilistic method guarantees existence of edge weights making the disperser produce the required error probability
Used to convert the probabilistic argument into a deterministic encoder.

pith-pipeline@v0.9.0 · 5495 in / 1425 out tokens · 31770 ms · 2026-05-16T22:58:06.760372+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

[1]

Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels

[Ari09] Erdal Arikan. “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels”. In:IEEE Transactions on information Theory55.7 (2009), pp. 3051–3073. [BGT93] Claude Berrou, Alain Glavieux, and Punya Thitimajshima. “Near Shannon limit error- correcting coding and decoding: Turbo-codes. 1”. In...

work page 2009
[2]

Superconcen- trators, generalizers and generalized connectors with limited depth

IEEE. 1993, pp. 1064–1070. 14 [Dol+83] Danny Dolev, Cynthia Dwork, Nicholas Pippenger, and Avi Wigderson. “Superconcen- trators, generalizers and generalized connectors with limited depth”. In:Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 25-27 April, 1983, Boston, Massachusetts, USA. Ed. by David S. Johnson, Ronald Fagin, Michael L...

work page 1993
[3]

Better extractors for better codes?

[Gur04] Venkatesan Guruswami. “Better extractors for better codes?” In:Proceedings of the thirty-sixth annual ACM symposium on Theory of computing. 2004, pp. 436–444. [GI01] Venkatesan Guruswami and Piotr Indyk. “Expander-based constructions of efficiently decodable codes”. In:Proceedings 42nd IEEE Symposium on Foundations of Computer Science. IEEE. 2001,...

work page arXiv 2004
[4]

Bounds for dispersers, extractors, and depth-two superconcentrators

15 [RT00] Jaikumar Radhakrishnan and Amnon Ta-Shma. “Bounds for dispersers, extractors, and depth-two superconcentrators”. In:SIAM Journal on Discrete Mathematics13.1 (2000), pp. 2–24. [RS03] Ran Raz and Amir Shpilka. “Lower bounds for matrix product in bounded depth circuits with arbitrary gates”. In:SIAM Journal on Computing32.2 (2003), pp. 488–513. [RU...

work page 2000
[5]

A mathematical theory of communication

[Sha48] Claude E Shannon. “A mathematical theory of communication”. In:The Bell system technical journal27.3 (1948), pp. 379–423. [SV84] Larry Stockmeyer and Uzi Vishkin. “Simulation of parallel random access machines by circuits”. In:SIAM Journal on Computing13.2 (1984), pp. 409–422. [Tar75] Robert Endre Tarjan. “Efficiency of a good but not linear set u...

work page 1948
[6]

Graph-theoretic properties in computational complexity

[Val76] Leslie G Valiant. “Graph-theoretic properties in computational complexity”. In:Journal of Computer and System Sciences13.3 (1976), pp. 278–285. [Val77] Leslie G. Valiant. “Graph-theoretic arguments in low-level complexity”. In:Mathe- matical Foundations of Computer Science 1977, 6th Symposium, Tatranska Lomnica, Czechoslovakia, September 5-9, 1977...

work page 1976

[1] [1]

Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels

[Ari09] Erdal Arikan. “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels”. In:IEEE Transactions on information Theory55.7 (2009), pp. 3051–3073. [BGT93] Claude Berrou, Alain Glavieux, and Punya Thitimajshima. “Near Shannon limit error- correcting coding and decoding: Turbo-codes. 1”. In...

work page 2009

[2] [2]

Superconcen- trators, generalizers and generalized connectors with limited depth

IEEE. 1993, pp. 1064–1070. 14 [Dol+83] Danny Dolev, Cynthia Dwork, Nicholas Pippenger, and Avi Wigderson. “Superconcen- trators, generalizers and generalized connectors with limited depth”. In:Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 25-27 April, 1983, Boston, Massachusetts, USA. Ed. by David S. Johnson, Ronald Fagin, Michael L...

work page 1993

[3] [3]

Better extractors for better codes?

[Gur04] Venkatesan Guruswami. “Better extractors for better codes?” In:Proceedings of the thirty-sixth annual ACM symposium on Theory of computing. 2004, pp. 436–444. [GI01] Venkatesan Guruswami and Piotr Indyk. “Expander-based constructions of efficiently decodable codes”. In:Proceedings 42nd IEEE Symposium on Foundations of Computer Science. IEEE. 2001,...

work page arXiv 2004

[4] [4]

Bounds for dispersers, extractors, and depth-two superconcentrators

15 [RT00] Jaikumar Radhakrishnan and Amnon Ta-Shma. “Bounds for dispersers, extractors, and depth-two superconcentrators”. In:SIAM Journal on Discrete Mathematics13.1 (2000), pp. 2–24. [RS03] Ran Raz and Amir Shpilka. “Lower bounds for matrix product in bounded depth circuits with arbitrary gates”. In:SIAM Journal on Computing32.2 (2003), pp. 488–513. [RU...

work page 2000

[5] [5]

A mathematical theory of communication

[Sha48] Claude E Shannon. “A mathematical theory of communication”. In:The Bell system technical journal27.3 (1948), pp. 379–423. [SV84] Larry Stockmeyer and Uzi Vishkin. “Simulation of parallel random access machines by circuits”. In:SIAM Journal on Computing13.2 (1984), pp. 409–422. [Tar75] Robert Endre Tarjan. “Efficiency of a good but not linear set u...

work page 1948

[6] [6]

Graph-theoretic properties in computational complexity

[Val76] Leslie G Valiant. “Graph-theoretic properties in computational complexity”. In:Journal of Computer and System Sciences13.3 (1976), pp. 278–285. [Val77] Leslie G. Valiant. “Graph-theoretic arguments in low-level complexity”. In:Mathe- matical Foundations of Computer Science 1977, 6th Symposium, Tatranska Lomnica, Czechoslovakia, September 5-9, 1977...

work page 1976