How Does Machine Learning Manage Complexity?
Pith reviewed 2026-05-10 18:54 UTC · model grok-4.3
The pith
Machine learning models minimizing error on pseudorandom generator outputs must produce distributions close to uniform.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We abstract machine learning as producing P/poly-computable distributions with polynomially-bounded max-entropy. We illustrate how learning computable distributions models complexity by showing that if a machine learning model produces a distribution mu that minimizes error against the distribution generated by a cryptographic pseudorandom generator, then mu must be close to uniform.
What carries the argument
P/poly-computable distributions with polynomially-bounded max-entropy, which allow complexity-theoretic analysis of what distributions machine learning can produce.
If this is right
- Machine learning manages complexity by focusing on computable distributions using probability.
- Minimizing error against a pseudorandom generator output requires the learned distribution to be close to uniform.
- The abstraction works without depending on particular training algorithms or neural network architectures.
- This provides a complexity-theoretic explanation for the power of machine learning on complex data.
Where Pith is reading between the lines
- This approach might explain why machine learning generalizes well on real data that contains pseudorandom elements.
- It opens the door to using more advanced complexity results to analyze learning limits.
- One could test the abstraction by checking if trained models on PRG data indeed produce near-uniform outputs.
Load-bearing premise
Machine learning can be faithfully abstracted as producing P/poly-computable distributions with polynomially-bounded max-entropy independent of any particular training algorithm or architecture.
What would settle it
Observe a machine learning model trained on samples from a pseudorandom generator that achieves low error but outputs a distribution far from uniform while still being P/poly-computable with bounded max-entropy.
read the original abstract
We provide a computational complexity lens to understand the power of machine learning models, particularly their ability to model complex systems. Machine learning models are often trained on data drawn from sampleable or more complex distributions, a far wider range of distributions than just computable ones. By focusing on computable distributions, machine learning models can better manage complexity via probability. We abstract away from specific learning mechanisms, modeling machine learning as producing P/poly-computable distributions with polynomially-bounded max-entropy. We illustrate how learning computable distributions models complexity by showing that if a machine learning model produces a distribution $\mu$ that minimizes error against the distribution generated by a cryptographic pseudorandom generator, then $\mu$ must be close to uniform.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper offers a computational complexity lens on machine learning's ability to model complex systems. It abstracts ML models as producing P/poly-computable distributions with polynomially-bounded max-entropy, arguing that this abstraction allows ML to manage complexity via probability. The central illustration claims that any such distribution μ minimizing error to a cryptographic pseudorandom generator's output must be close to uniform.
Significance. If the central illustration were correct, the work would establish a useful connection between learning theory, cryptography, and complexity classes, showing how the P/poly + bounded-entropy abstraction captures why ML can handle pseudorandom complexity without explicit enumeration. This could inform analyses of generalization in settings where the target distribution is computationally indistinguishable from uniform.
major comments (1)
- [Abstract] Abstract, illustration paragraph: the claim that any P/poly-computable μ with polynomially-bounded max-entropy that minimizes error to a PRG distribution must be close to uniform is false under standard statistical distances. The PRG distribution itself satisfies the abstraction (it is P/poly-computable with seed-length entropy), achieves minimal error 0 to itself, and is statistically far from uniform by definition of a PRG. This counterexample is load-bearing for the illustration and requires either a non-statistical error notion or an unstated constraint on how minimization occurs (e.g., via samples from an unknown source).
minor comments (1)
- [Abstract] The notions of 'error' and 'close to uniform' lack explicit definitions or distance measures (TV, KL, etc.), and no proof sketch, error bounds, or formal statement of the minimization process is supplied, hindering verification of the derivation.
Simulated Author's Rebuttal
We thank the referee for their careful reading and for highlighting this critical issue with the central illustration. We agree that the claim as currently phrased in the abstract does not hold under standard statistical distances, and we will revise the manuscript accordingly to incorporate the necessary clarifications and constraints on the minimization process.
read point-by-point responses
-
Referee: [Abstract] Abstract, illustration paragraph: the claim that any P/poly-computable μ with polynomially-bounded max-entropy that minimizes error to a PRG distribution must be close to uniform is false under standard statistical distances. The PRG distribution itself satisfies the abstraction (it is P/poly-computable with seed-length entropy), achieves minimal error 0 to itself, and is statistically far from uniform by definition of a PRG. This counterexample is load-bearing for the illustration and requires either a non-statistical error notion or an unstated constraint on how minimization occurs (e.g., via samples from an unknown source).
Authors: We agree that the stated claim is false under statistical distance, as the PRG distribution itself is a counterexample that satisfies the P/poly + bounded-entropy abstraction while achieving zero error to itself. The original intention of the illustration was to show that an efficient machine learning procedure, when trained to minimize error on samples drawn from a target distribution that is computationally indistinguishable from uniform (as with a PRG), must produce an output distribution close to uniform, since the learner cannot exploit the PRG's hidden structure. To address the referee's point, we will revise the abstract and the relevant illustration section to explicitly specify that minimization occurs via an efficient learning algorithm trained on samples from the unknown target distribution, rather than an arbitrary μ satisfying the abstraction properties. We will also clarify that the error notion is with respect to the outcome of this sample-based training process. This incorporates the suggested constraint and removes the counterexample. revision: yes
Circularity Check
No circularity; derivation relies on external cryptographic assumptions
full rationale
The paper abstracts machine learning as P/poly-computable distributions with polynomially bounded max-entropy and illustrates a property using cryptographic pseudorandom generators. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described chain. The central claim is presented as following from standard crypto assumptions (indistinguishability from uniform) rather than reducing to the paper's own inputs or definitions by construction. The derivation is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Cryptographic pseudorandom generators exist and produce distributions indistinguishable from uniform by polynomial-size circuits.
- domain assumption P/poly captures the computational power relevant to machine learning models.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We abstract away from specific learning mechanisms, modeling machine learning as producing P/poly-computable distributions with polynomially-bounded max-entropy... if a machine learning model produces a distribution μ that minimizes error against the distribution generated by a cryptographic pseudorandom generator, then μ must be close to uniform.
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 6.1... KL(U∥μ) is negligible in n
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Anthropic. Claude Opus 4.6 system card. Technical report, Anthropic, February 2026
work page 2026
-
[2]
Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell
Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency ( FAccT ) , pages 610--623, 2021
work page 2021
-
[3]
Thomas M. Cover and Joy A. Thomas. Elements of Information Theory . Wiley-Interscience, Hoboken, NJ, 2 edition, 2006
work page 2006
-
[4]
L. Fortnow. Fifty years of P vs.\ NP and the possibility of the impossible. Communications of the ACM , 65(1):76–85, January 2022
work page 2022
- [5]
-
[6]
Foundations of Cryptography: Volume 1, Basic Tools
Oded Goldreich. Foundations of Cryptography: Volume 1, Basic Tools . Cambridge University Press, 2001
work page 2001
- [7]
-
[8]
Learning in pessiland via inductive inference
Shuichi Hirahara and Mikito Nanashima. Learning in pessiland via inductive inference. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS) , pages 447--457, 2023
work page 2023
-
[9]
R. Impagliazzo. A personal view of average-case complexity theory. In Proceedings of the 10th Annual Conference on Structure in Complexity Theory , pages 134--147. IEEE Computer Society Press, 1995
work page 1995
-
[10]
Cryptographic limitations on learning boolean formulae and finite automata
Michael Kearns and Leslie Valiant. Cryptographic limitations on learning boolean formulae and finite automata. J. ACM , 41(1):67–95, January 1994
work page 1994
-
[11]
Pierre-Simon Laplace. Recherches, 1 o , sur l'int \'e gration des \'e quations diff \'e rentielles aux diff \'e rences finies, et sur leur usage dans la th \'e orie des hasards. In Oeuvres compl \`e tes de Laplace , volume 8, pages 69--197. Gauthier-Villars, Paris, 1891. Originally published 1776; written 1773. Translation of pp. 144--145 in Charles Couls...
work page 1997
-
[12]
An introduction to Kolmogorov complexity and its applications
Ming Li and Paul Vit \'a nyi. An introduction to Kolmogorov complexity and its applications . Springer Science & Business Media, 2009
work page 2009
-
[13]
Modeling by shortest data description
Jorma Rissanen. Modeling by shortest data description. Automatica , 14(5):465--471, 1978
work page 1978
-
[14]
Claude E. Shannon. A mathematical theory of communication. Bell System Technical Journal , 27(3):379--423, 1948
work page 1948
-
[15]
Ray J. Solomonoff. A formal theory of inductive inference, Part I . Information and Control , 7(1):1--22, 1964
work page 1964
-
[16]
A. Turing. On computable numbers, with an application to the Entscheidungsproblem . Proceedings of the London Mathematical Society , 42:230--265, 1936
work page 1936
-
[17]
L. Valiant. A theory of the learnable. Communications of the ACM , 27(11):1134--1142, 1984
work page 1984
-
[18]
P.M.B. Vitanyi and Ming Li. Minimum description length induction, bayesianism, and kolmogorov complexity. IEEE Transactions on Information Theory , 46(2):446--464, 2000
work page 2000
-
[19]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems ( NeurIPS ) , volume 30, 2017. arXiv:1706.03762
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[20]
Chain-of-thought prompting elicits reasoning in large language models
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, brian ichter, Fei Xia, Ed Chi, Quoc V Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems , volume 35, pages 24824--24837. Curran Ass...
work page 2022
-
[21]
A. Yao. Theory and applications of trapdoor functions. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science , pages 80--91. IEEE, New York, 1982
work page 1982
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.