Support Size of $\varepsilon$-Capacity-Achieving Inputs for the Amplitude-Constrained AWGN Channel

Alex Dytso; Luca Barletta

arxiv: 2604.14915 · v1 · submitted 2026-04-16 · 💻 cs.IT · math.IT

Support Size of varepsilon-Capacity-Achieving Inputs for the Amplitude-Constrained AWGN Channel

Luca Barletta , Alex Dytso This is my paper

Pith reviewed 2026-05-10 10:13 UTC · model grok-4.3

classification 💻 cs.IT math.IT

keywords amplitude-constrained AWGN channelepsilon-capacity-achieving inputssupport sizeGaussian mixturesmutual informationdiscrete inputschannel capacity

0 comments

The pith

The minimal support size of discrete inputs that achieve capacity within a polynomial gap for the amplitude-constrained AWGN channel scales as Theta of A times square root of log A.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper determines the smallest number of mass points an input distribution supported on the interval from negative A to positive A must use to reach a mutual information value within an epsilon gap of the true channel capacity. It establishes that when this gap epsilon shrinks polynomially with the amplitude limit A, the required support size is on the order of A times the square root of log A. This relaxed version of optimality turns out to be far more tractable than characterizing the exact capacity-achieving distribution, which is known to be discrete but whose support-size scaling has stayed open. The analysis also shows that the variety of growth rates reported in earlier numerical work arises simply from the different epsilon tolerances those experiments implicitly adopted rather than from any property of the exact optimizer.

Core claim

When the gap epsilon to capacity decays polynomially with the amplitude constraint A, specifically epsilon equals A to the power of negative beta for beta at least 1, the smallest support size K_epsilon of A among discrete inputs on the interval from negative A to positive A that achieve mutual information within epsilon of capacity satisfies K_epsilon of A equals Theta of A times square root of log A. For exponentially small gaps the paper obtains matching lower and upper bounds lying between A square root of log A and A to the power of three halves.

What carries the argument

The quantity K_epsilon of A, defined as the minimal support size among discrete inputs on negative A to A that achieve within epsilon of capacity, analyzed through approximation-theoretic bounds on Gaussian mixtures, entropy control via chi-squared divergence, and a wrapping argument that reduces the linear problem to uniform approximation on the circle.

If this is right

For polynomially small epsilon the minimal support size grows exactly as Theta of A times square root of log A.
For exponentially small epsilon the support size is bounded between A square root of log A and A to the power of three halves.
Differences in observed scaling laws across numerical studies trace to the different epsilon regimes those studies employed.
The epsilon-relaxed formulation admits sharp characterizations where the exact capacity-achieving support size problem remains open.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practical systems that tolerate moderate gaps to capacity could therefore employ inputs with far fewer points than the exact optimizer would require.
The same mixture-approximation and wrapping techniques may produce support-size results for other channels that impose amplitude or power constraints.
Improving the constants inside the Gaussian-mixture bounds could extend the Theta scaling result to smaller epsilon regimes.

Load-bearing premise

The error controls from approximating Gaussian mixtures and bounding entropy differences via chi-squared divergence stay tight enough not to change the leading scaling term, while the wrapping step that maps the amplitude interval to a circle adds only negligible error.

What would settle it

For a sequence of growing amplitude limits A and a fixed polynomial gap such as epsilon equal to one over A, compute the minimal number of points needed for some discrete input to reach mutual information at least capacity minus epsilon and check whether that number grows proportionally to A times the square root of log A.

read the original abstract

We study the amplitude-constrained additive white Gaussian noise (AWGN) channel from the perspective of near-optimal input distributions. While it is known that the capacity-achieving input is discrete with finitely many mass points, the precise scaling of its support size as a function of the amplitude constraint remains an open problem. In this work, we instead consider the minimal support size required to achieve capacity up to an $\varepsilon$-gap. We introduce the quantity $K_\varepsilon(A)$, defined as the smallest support size among discrete inputs supported on $[-A,A]$ that achieves mutual information within $\varepsilon$ of capacity. We show that this relaxed formulation is significantly more tractable and admits sharp characterizations across different regimes of $\varepsilon$. In particular, when $\varepsilon$ decays polynomially with $A$, i.e., $\varepsilon = A^{-\beta}$ for $\beta \geq 1$, we establish that $K_\varepsilon(A) = \Theta(A\sqrt{\log A})$. For exponentially small gaps, we obtain bounds of order between $A\sqrt{\log A}$ and $A^{3/2}$. Our approach combines approximation-theoretic bounds for Gaussian mixtures with information-theoretic control of entropy via $\chi^2$-divergence, together with a wrapping argument that relates the problem to approximating the uniform distribution on the circle. Beyond the technical results, our framework provides a conceptual explanation for the variety of scaling laws observed in prior numerical studies, showing that these correspond to different regimes of $\varepsilon$-optimality rather than intrinsic properties of the exact optimizer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper defines K_ε(A) as the smallest support size of a discrete input on [-A,A] whose mutual information is within ε of the capacity of the amplitude-constrained AWGN channel. It claims that when ε decays polynomially as A^{-β} for β≥1, K_ε(A)=Θ(A√log A); for exponentially small ε the support size lies between A√log A and A^{3/2}. The proofs combine approximation-theoretic bounds on Gaussian mixtures, χ²-divergence control of entropy, and a wrapping map that reduces the linear problem to uniform approximation on the circle.

Significance. If the claimed scalings hold with A-independent constants, the work supplies the first sharp characterization of near-optimal discrete inputs for this channel and explains the range of support sizes seen in prior numerical studies as consequences of different ε-regimes rather than properties of the exact optimizer. The framework may extend to other channels with amplitude constraints.

major comments (2)

[Proof of Theorem 1 (upper bound)] The upper-bound construction in the polynomial-ε regime relies on the Gaussian-mixture approximation error being O(ε) with a prefactor independent of both A and β. If the hidden constant in the cited approximation theorem grows polynomially in A (or exponentially in β), the implied support size needed to reach gap ε would exceed the stated O(A√log A) upper bound, undermining the Θ claim.
[Section 4.2, Lemma 4] The wrapping-map reduction (linear interval to circle) must contribute an additive error o(ε) uniformly in the regime ε=A^{-β}. Any A-dependent remainder term would invalidate the reduction to the circular uniform-approximation problem and could alter the derived scaling.

minor comments (2)

[Section 3] Notation for the χ²-divergence bound on entropy (Eq. (12)) should explicitly state the range of validity for the constant C; the current statement leaves open whether C depends on A.
[Figure 2] Figure 2 caption should clarify whether the plotted curves correspond to the exact capacity or to the ε-approximations used in the analysis.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive review of our manuscript. The comments raise important points about the uniformity of constants and error terms in our proofs. We address each major comment below with clarifications and indicate where minor revisions will strengthen the presentation.

read point-by-point responses

Referee: [Proof of Theorem 1 (upper bound)] The upper-bound construction in the polynomial-ε regime relies on the Gaussian-mixture approximation error being O(ε) with a prefactor independent of both A and β. If the hidden constant in the cited approximation theorem grows polynomially in A (or exponentially in β), the implied support size needed to reach gap ε would exceed the stated O(A√log A) upper bound, undermining the Θ claim.

Authors: We appreciate the referee's scrutiny of the constant dependence. The approximation result invoked for Gaussian mixtures (drawn from the uniform approximation theory for convolutions with the Gaussian kernel) yields an error bound whose leading constant depends only on the fixed variance of the noise and the smoothness parameters of the target density, remaining independent of both the amplitude A and the polynomial exponent β. This independence follows because the relevant approximation is performed after a suitable rescaling that normalizes the interval length, and the subsequent χ²-divergence bound converts the density error into an entropy gap of order ε without introducing additional A-factors. We will insert a short paragraph immediately after the statement of the approximation theorem in the proof of Theorem 1 to explicitly track these constants and confirm they are A-independent, thereby preserving the Θ(A√log A) claim. revision: partial
Referee: [Section 4.2, Lemma 4] The wrapping-map reduction (linear interval to circle) must contribute an additive error o(ε) uniformly in the regime ε=A^{-β}. Any A-dependent remainder term would invalidate the reduction to the circular uniform-approximation problem and could alter the derived scaling.

Authors: We thank the referee for emphasizing the need for uniform control on the wrapping error. In the construction of the wrapping map in Lemma 4, the map is defined via a periodic extension with a smooth cutoff whose transition region is of width o(1/A); the resulting additive discrepancy in mutual information is bounded by an exponentially decaying term in A (arising from the tail of the Gaussian density outside the principal period). This term is o(A^{-β}) for any fixed β ≥ 1, uniformly in the polynomial regime. We will augment the proof of Lemma 4 with an explicit lemma bounding this remainder (showing it is at most exp(-cA) for some c>0) to make the o(ε) property fully transparent and uniform. revision: partial

Circularity Check

0 steps flagged

No significant circularity; bounds derived from external approximation theory and standard divergence inequalities

full rationale

The paper defines K_ε(A) as the minimal support size for ε-near capacity and derives its scaling via approximation-theoretic bounds on Gaussian mixtures, χ²-divergence control of entropy, and a wrapping argument to the circle. These are invoked as independent external tools rather than being fitted to the target result or reduced to self-citations. No step equates the claimed Θ(A√log A) scaling to a definition or prior self-result by construction; the polynomial and exponential regimes follow from the stated error controls without tautological reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 3 axioms · 0 invented entities

The central claim rests on standard information-theoretic continuity properties, approximation-theory results for Gaussian mixtures, and a novel wrapping reduction; no free parameters or new entities are introduced in the abstract.

axioms (3)

domain assumption Mutual information is continuous with respect to weak convergence of input distributions under amplitude constraint
Required to define ε-capacity-achieving inputs
standard math χ²-divergence bounds the difference between differential entropies of Gaussian mixtures
Core information-theoretic control step
ad hoc to paper Wrapping map converts the amplitude-constrained problem to uniform approximation on the circle
Novel technique stated in the abstract

pith-pipeline@v0.9.0 · 5584 in / 1397 out tokens · 92891 ms · 2026-05-10T10:13:30.397422+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

[1]

The capacity achieving distribution for the amplitude constrained additive Gaussian channel: An upper bound on the number of mass points,

A. Dytso, S. Yagli, H. V . Poor, and S. Shamai (Shitz), “The capacity achieving distribution for the amplitude constrained additive Gaussian channel: An upper bound on the number of mass points,”IEEE Trans. Inf. Theory, vol. 66, no. 4, pp. 2006– 2022, 2020

work page 2006
[2]

Upper and lower bounds on the capacity of amplitude- constrained MIMO channels,

A. Dytso, M. Goldenbaum, S. Shamai (Shitz), and H. V . Poor, “Upper and lower bounds on the capacity of amplitude- constrained MIMO channels,” inProc. IEEE Global Commun. Conf. (GLOBECOM), Singapore, 2017, pp. 1–6

work page 2017
[3]

When are discrete channel inputs optimal? - Optimization tech- niques and some new results,

A. Dytso, M. Goldenbaum, H. V . Poor, and S. Shamai (Shitz), “When are discrete channel inputs optimal? - Optimization tech- niques and some new results,” inProc. Conf. on Inf. Sci. and Sys., Princeton, NJ, USA, March 2018, pp. 1–6

work page 2018
[4]

A mathematical theory of communication,

C. E. Shannon, “A mathematical theory of communication,”Bell Syst. Tech. J., vol. 27, no. 379-423, 623-656, 1948

work page 1948
[5]

On the Information Capacity of Peak and Average Power Constrained Gaussian Channels,

J. G. Smith, “On the Information Capacity of Peak and Average Power Constrained Gaussian Channels,” PhD dissertation, Uni- versity of California, 1969

work page 1969
[6]

The information capacity of amplitude-and variance- constrained scalar Gaussian channels,

——, “The information capacity of amplitude-and variance- constrained scalar Gaussian channels,”Inform. and Contr., vol. 18, no. 3, pp. 203–219, 1971

work page 1971
[7]

Transition points in the capacity-achieving distribution for the peak-power limited AWGN and free-space optical intensity channels,

N. Sharma and S. Shamai (Shitz), “Transition points in the capacity-achieving distribution for the peak-power limited AWGN and free-space optical intensity channels,”Probl. Inf. Transm., vol. 46, no. 4, pp. 283–299, 2010

work page 2010
[8]

Available: https://arxiv.org/abs/2512.22691

H. Wang, L. Barletta, and A. Dytso, “An improved lower bound on cardinality of support of the amplitude- constrained awgn channel,” 2025. [Online]. Available: https://arxiv.org/abs/2512.22691

work page arXiv 2025
[9]

Maximizing the information learned from finite data selects a simple model,

H. H. Mattingly, M. K. Transtrum, M. C. Abbott, and B. B. Machta, “Maximizing the information learned from finite data selects a simple model,”Proceedings of the National Academy of Sciences, vol. 115, no. 8, pp. 1760–1765, 2018

work page 2018
[10]

A scaling law from discrete to continuous solutions of channel capacity problems in the low- noise limit,

M. C. Abbott and B. B. Machta, “A scaling law from discrete to continuous solutions of channel capacity problems in the low- noise limit,”Journal of Statistical Physics, vol. 176, no. 1, pp. 214–227, 2019

work page 2019
[11]

Discrete noninformative priors,

Z. Zhang, “Discrete noninformative priors,” Ph.D. dissertation, Yale University, New Haven, CT, 1994

work page 1994
[12]

Computation of channel capacity and rate-distortion functions,

R. Blahut, “Computation of channel capacity and rate-distortion functions,”IEEE Trans. Inf. Theory, vol. 18, no. 4, pp. 460–473, 1972

work page 1972
[13]

An algorithm for computing the capacity of arbitrary discrete memoryless channels,

S. Arimoto, “An algorithm for computing the capacity of arbitrary discrete memoryless channels,”IEEE Trans. Inf. Theory, vol. 18, no. 1, pp. 14–20, 1972

work page 1972
[14]

Amplitude constrained MIMO channels: Properties of optimal input distributions and bounds on the capacity,

A. Dytso, M. Goldenbaum, H. V . Poor, and S. Shamai (Shitz), “Amplitude constrained MIMO channels: Properties of optimal input distributions and bounds on the capacity,”Entropy, vol. 21, no. 2, p. 200, 2019

work page 2019
[15]

On the best approximation by finite Gaussian mixtures,

Y . Ma, Y . Wu, and P. Yang, “On the best approximation by finite Gaussian mixtures,”IEEE Trans. Inf. Theory, vol. 71, no. 7, pp. 5469–5492, 2025

work page 2025
[16]

An information theoretical identity and a problem involving capacity,

F. Topsøe, “An information theoretical identity and a problem involving capacity,”Studia Scientiarum Mathematicarum Hun- garica, vol. 2, no. 291-292, p. 246, 1967

work page 1967

[1] [1]

The capacity achieving distribution for the amplitude constrained additive Gaussian channel: An upper bound on the number of mass points,

A. Dytso, S. Yagli, H. V . Poor, and S. Shamai (Shitz), “The capacity achieving distribution for the amplitude constrained additive Gaussian channel: An upper bound on the number of mass points,”IEEE Trans. Inf. Theory, vol. 66, no. 4, pp. 2006– 2022, 2020

work page 2006

[2] [2]

Upper and lower bounds on the capacity of amplitude- constrained MIMO channels,

A. Dytso, M. Goldenbaum, S. Shamai (Shitz), and H. V . Poor, “Upper and lower bounds on the capacity of amplitude- constrained MIMO channels,” inProc. IEEE Global Commun. Conf. (GLOBECOM), Singapore, 2017, pp. 1–6

work page 2017

[3] [3]

When are discrete channel inputs optimal? - Optimization tech- niques and some new results,

A. Dytso, M. Goldenbaum, H. V . Poor, and S. Shamai (Shitz), “When are discrete channel inputs optimal? - Optimization tech- niques and some new results,” inProc. Conf. on Inf. Sci. and Sys., Princeton, NJ, USA, March 2018, pp. 1–6

work page 2018

[4] [4]

A mathematical theory of communication,

C. E. Shannon, “A mathematical theory of communication,”Bell Syst. Tech. J., vol. 27, no. 379-423, 623-656, 1948

work page 1948

[5] [5]

On the Information Capacity of Peak and Average Power Constrained Gaussian Channels,

J. G. Smith, “On the Information Capacity of Peak and Average Power Constrained Gaussian Channels,” PhD dissertation, Uni- versity of California, 1969

work page 1969

[6] [6]

The information capacity of amplitude-and variance- constrained scalar Gaussian channels,

——, “The information capacity of amplitude-and variance- constrained scalar Gaussian channels,”Inform. and Contr., vol. 18, no. 3, pp. 203–219, 1971

work page 1971

[7] [7]

Transition points in the capacity-achieving distribution for the peak-power limited AWGN and free-space optical intensity channels,

N. Sharma and S. Shamai (Shitz), “Transition points in the capacity-achieving distribution for the peak-power limited AWGN and free-space optical intensity channels,”Probl. Inf. Transm., vol. 46, no. 4, pp. 283–299, 2010

work page 2010

[8] [8]

Available: https://arxiv.org/abs/2512.22691

H. Wang, L. Barletta, and A. Dytso, “An improved lower bound on cardinality of support of the amplitude- constrained awgn channel,” 2025. [Online]. Available: https://arxiv.org/abs/2512.22691

work page arXiv 2025

[9] [9]

Maximizing the information learned from finite data selects a simple model,

H. H. Mattingly, M. K. Transtrum, M. C. Abbott, and B. B. Machta, “Maximizing the information learned from finite data selects a simple model,”Proceedings of the National Academy of Sciences, vol. 115, no. 8, pp. 1760–1765, 2018

work page 2018

[10] [10]

A scaling law from discrete to continuous solutions of channel capacity problems in the low- noise limit,

M. C. Abbott and B. B. Machta, “A scaling law from discrete to continuous solutions of channel capacity problems in the low- noise limit,”Journal of Statistical Physics, vol. 176, no. 1, pp. 214–227, 2019

work page 2019

[11] [11]

Discrete noninformative priors,

Z. Zhang, “Discrete noninformative priors,” Ph.D. dissertation, Yale University, New Haven, CT, 1994

work page 1994

[12] [12]

Computation of channel capacity and rate-distortion functions,

R. Blahut, “Computation of channel capacity and rate-distortion functions,”IEEE Trans. Inf. Theory, vol. 18, no. 4, pp. 460–473, 1972

work page 1972

[13] [13]

An algorithm for computing the capacity of arbitrary discrete memoryless channels,

S. Arimoto, “An algorithm for computing the capacity of arbitrary discrete memoryless channels,”IEEE Trans. Inf. Theory, vol. 18, no. 1, pp. 14–20, 1972

work page 1972

[14] [14]

Amplitude constrained MIMO channels: Properties of optimal input distributions and bounds on the capacity,

A. Dytso, M. Goldenbaum, H. V . Poor, and S. Shamai (Shitz), “Amplitude constrained MIMO channels: Properties of optimal input distributions and bounds on the capacity,”Entropy, vol. 21, no. 2, p. 200, 2019

work page 2019

[15] [15]

On the best approximation by finite Gaussian mixtures,

Y . Ma, Y . Wu, and P. Yang, “On the best approximation by finite Gaussian mixtures,”IEEE Trans. Inf. Theory, vol. 71, no. 7, pp. 5469–5492, 2025

work page 2025

[16] [16]

An information theoretical identity and a problem involving capacity,

F. Topsøe, “An information theoretical identity and a problem involving capacity,”Studia Scientiarum Mathematicarum Hun- garica, vol. 2, no. 291-292, p. 246, 1967

work page 1967