A Sufficient-Statistic Reduction of the Information Bottleneck to a Low-Dimensional Problem
Pith reviewed 2026-05-07 12:39 UTC · model grok-4.3
The pith
If the target conditional factors through a sufficient statistic, the Information Bottleneck problem on the full source is exactly equivalent to the problem on the reduced statistic.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
If p(C|T) factors through a sufficient statistic φ(T), then the Information Bottleneck functional I(T;R) − η I(R;C) attains the same minimum value for every η when R is optimized over functions of T or over functions of φ(T), and the optimal representations are related by pullback through φ; consequently the entire IB curve is identical for the original and reduced problems.
What carries the argument
The sufficient statistic φ(T) for the target C, defined by the factorization p(C|T) = p(C|φ(T)), which collapses the source while preserving all information relevant to the IB objective.
If this is right
- The complexity of solving the IB problem scales with the dimension of the sufficient statistic rather than the dimension of the original source.
- The known closed-form Gaussian IB solution is recovered as the special case in which φ extracts the sufficient statistics of the Gaussian conditional.
- The same reduction immediately yields a nonlinear-Gaussian generalization of the IB solution.
- When a low-dimensional sufficient statistic is known, the exact IB curve can be computed by solving the reduced problem at cost set by that statistic.
Where Pith is reading between the lines
- The reduction supplies an exact structural condition that separates tractable from intractable IB instances.
- The same factorization argument may apply to other information-theoretic objectives that depend on the source only through its conditional on a target.
- In settings where approximate sufficient statistics can be learned, the reduction offers a route to approximate but still exact-on-the-curve IB solutions.
Load-bearing premise
There exists a function φ such that the conditional distribution of the target given the source depends on the source only through φ.
What would settle it
An explicit pair of distributions where p(C|T) = p(C|φ(T)) yet the minimal IB Lagrangian value for (T,C) differs from the value for (φ(T),C) at some η.
read the original abstract
We show that if the conditional distribution p(C | T) factors through a sufficient statistic {\phi}(T), then the Information Bottleneck (IB) problem for (T, C) is exactly equivalent to the IB problem for ({\phi}(T), C). The reduction is loss-free: it preserves the full IB curve, the Lagrangian optimum at every trade-off parameter \b{eta}, and the optimal representations up to pullback through {\phi}. As a result, the computational complexity of solving the IB problem is governed by the dimension of the sufficient statistic rather than the ambient dimension of the source. This identifies an exact structural condition under which the generic IB problem becomes tractable, and gives a formal bridge between the discrete and linear-Gaussian regimes. We then show that the classical Gaussian IB solution of Chechik, Globerson, Tishby and Weiss is an immediate corollary of this reduction, and we state a nonlinear-Gaussian generalisation. A small numerical example illustrates the practical consequence: when a low-dimensional sufficient statistic is available, the exact IB curve can be computed on the reduced problem at a cost determined by the statistic rather than by the ambient source dimension.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript shows that if p(C|T) factors through a sufficient statistic φ(T), then the Information Bottleneck problem for the pair (T,C) is exactly equivalent to the IB problem for (φ(T),C). The equivalence is loss-free: it preserves the full IB curve, the Lagrangian optimum for every trade-off parameter η, and the optimal representations up to pullback through φ. The authors recover the classical linear-Gaussian IB solution of Chechik et al. as an immediate corollary, state a nonlinear-Gaussian generalization, and illustrate the computational reduction with a small numerical example.
Significance. If the central reduction holds, the result is significant: it supplies an exact structural condition under which the generally intractable IB problem reduces to a problem whose complexity is governed by the dimension of the sufficient statistic rather than the ambient dimension of T. This provides a formal bridge between discrete and linear-Gaussian regimes and explains why the Gaussian IB admits a closed-form solution. The loss-free preservation of the entire curve and all optima strengthens the practical utility of the reduction for high-dimensional sources.
minor comments (2)
- Abstract: the claim that the reduction 'identifies an exact structural condition under which the generic IB problem becomes tractable' should be qualified by noting that tractability remains governed by the cardinality or dimension of φ(T) itself.
- The nonlinear-Gaussian generalisation is announced but its precise statement and derivation are not visible in the provided abstract; a short proof sketch or explicit statement of the resulting objective would aid verification.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our manuscript, the assessment of its significance, and the recommendation for minor revision. No specific major comments were raised in the report.
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper's core reduction follows directly from the standard definition of a sufficient statistic (p(C|T) = p(C|φ(T))) and the chain-rule decomposition of mutual information I(T;Z) = I(φ(T);Z) + I(T;Z|φ(T)) ≥ I(φ(T);Z), both of which are external mathematical facts independent of the paper. The equivalence of the full IB curve, Lagrangian optima, and optimal representations is a straightforward consequence of the induced Markov chain C ⊥ T | φ(T) and does not involve any fitted parameters, self-referential predictions, or load-bearing self-citations. The Gaussian IB result is recovered as a corollary from prior independent work (Chechik et al.), and the nonlinear-Gaussian extension is stated without circularity. No step reduces by construction to its own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math Definition of sufficient statistic: φ(T) is sufficient for C if p(C|T) = p(C|φ(T))
- domain assumption IB objective is the Lagrangian I(T;Z) - η I(Z;C) minimized over representations Z
Reference graph
Works this paper leans on
-
[1]
Alessandro Achille and Stefano Soatto. Emergence of invariance and disentanglement in deep representations.Journal of Machine Learning Research, 19(50):1–34, 2018
work page 2018
-
[2]
Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, and Kevin Murphy. Deep variational information bottleneck. InInternational Conference on Learning Representations, Toulon, France, 2017. Poster
work page 2017
-
[3]
Alemi, Ben Poole, Ian Fischer, Joshua V
Alexander A. Alemi, Ben Poole, Ian Fischer, Joshua V. Dillon, Rif A. Saurous, and Kevin Murphy. Fixing a broken ELBO. InProceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 159–168, 2018
work page 2018
-
[4]
Richard E. Blahut. Computation of channel capacity and rate-distortion functions.IEEE Transactions on Information Theory, 18(4):460–473, 1972
work page 1972
-
[5]
Gal Chechik, Amir Globerson, Naftali Tishby, and Yair Weiss. Information bottleneck for gaussian variables.Journal of Machine Learning Research, 6(6):165–188, 2005. 8
work page 2005
-
[6]
Thomas M. Cover and Joy A. Thomas.Elements of Information Theory. Wiley-Interscience, Hoboken, NJ, 2 edition, 2006
work page 2006
-
[7]
Springer, New York, 2 edition, 2002
Olav Kallenberg.Foundations of Modern Probability. Springer, New York, 2 edition, 2002
work page 2002
-
[8]
Amichai Painsky and Naftali Tishby. Gaussian lower bound for the information bottleneck limit.Journal of Machine Learning Research, 18(213):1–29, 2018
work page 2018
-
[9]
Noam Slonim, Gurinder Singh Atwal, Gaˇ sper Tkaˇ cik, and William Bialek. Information- based clustering.Proceedings of the National Academy of Sciences of the United States of America, 102(51):18297–18302, 2005
work page 2005
-
[10]
Naftali Tishby, Fernando C. N. Pereira, and William Bialek. The information bottleneck method. InProceedings of the 37th Annual Allerton Conference on Communication, Control, and Computing, pages 368–377, Monticello, IL, 1999. A Notation summary Symbol Meaning T,C,Xsource, relevance variable, representation φsufficient statistic mapφ:T → Z IT,C(R) IB info...
work page 1999
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.