How Does Machine Learning Manage Complexity?

Lance Fortnow

arxiv: 2604.07233 · v1 · submitted 2026-04-08 · 💻 cs.LG · cs.CC

How Does Machine Learning Manage Complexity?

Lance Fortnow This is my paper

Pith reviewed 2026-05-10 18:54 UTC · model grok-4.3

classification 💻 cs.LG cs.CC

keywords machine learningcomputational complexitypseudorandom generatorscomputable distributionsmaximum entropylearning theorycomplexity theory

0 comments

The pith

Machine learning models minimizing error on pseudorandom generator outputs must produce distributions close to uniform.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper offers a computational complexity perspective on why machine learning can handle complex systems. It abstracts machine learning models as generators of distributions that are computable by polynomial-size circuits and have only polynomially bounded maximum entropy. Using this view, the paper proves that such a model, when trying to match the distribution from a cryptographic pseudorandom generator with minimal error, must output something very close to the uniform distribution. This matters because it shows how learning can manage complexity through probability rather than by computing highly structured outputs directly. The approach sidesteps details of training algorithms to focus on what kinds of distributions are learnable in principle.

Core claim

We abstract machine learning as producing P/poly-computable distributions with polynomially-bounded max-entropy. We illustrate how learning computable distributions models complexity by showing that if a machine learning model produces a distribution mu that minimizes error against the distribution generated by a cryptographic pseudorandom generator, then mu must be close to uniform.

What carries the argument

P/poly-computable distributions with polynomially-bounded max-entropy, which allow complexity-theoretic analysis of what distributions machine learning can produce.

If this is right

Machine learning manages complexity by focusing on computable distributions using probability.
Minimizing error against a pseudorandom generator output requires the learned distribution to be close to uniform.
The abstraction works without depending on particular training algorithms or neural network architectures.
This provides a complexity-theoretic explanation for the power of machine learning on complex data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach might explain why machine learning generalizes well on real data that contains pseudorandom elements.
It opens the door to using more advanced complexity results to analyze learning limits.
One could test the abstraction by checking if trained models on PRG data indeed produce near-uniform outputs.

Load-bearing premise

Machine learning can be faithfully abstracted as producing P/poly-computable distributions with polynomially-bounded max-entropy independent of any particular training algorithm or architecture.

What would settle it

Observe a machine learning model trained on samples from a pseudorandom generator that achieves low error but outputs a distribution far from uniform while still being P/poly-computable with bounded max-entropy.

read the original abstract

We provide a computational complexity lens to understand the power of machine learning models, particularly their ability to model complex systems. Machine learning models are often trained on data drawn from sampleable or more complex distributions, a far wider range of distributions than just computable ones. By focusing on computable distributions, machine learning models can better manage complexity via probability. We abstract away from specific learning mechanisms, modeling machine learning as producing P/poly-computable distributions with polynomially-bounded max-entropy. We illustrate how learning computable distributions models complexity by showing that if a machine learning model produces a distribution $\mu$ that minimizes error against the distribution generated by a cryptographic pseudorandom generator, then $\mu$ must be close to uniform.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The P/poly plus bounded-entropy abstraction is a clean modeling move, but the PRG illustration fails because the generator distribution itself meets every stated condition yet stays far from uniform.

read the letter

The paper abstracts machine learning as the production of distributions computable by polynomial-size circuits that also have polynomially bounded maximum entropy. It then illustrates this by claiming that any such distribution minimizing error to one generated by a cryptographic pseudorandom generator must be close to uniform. This modeling decision is the main new element. It moves away from specific algorithms or architectures and focuses on the computational class of the output distribution. That framing could help organize discussions at the edge of learning theory and complexity. The setup does a decent job highlighting how ML might handle complexity by sticking to computable distributions instead of trying to capture everything directly. The central illustration runs into trouble right away. The pseudorandom generator distribution satisfies the P/poly and entropy conditions, minimizes error to itself at distance zero, and yet is far from uniform. Without a precise definition of what minimizes error means or why the PRG case is excluded, the implication does not follow. The abstract gives no proof sketch or error bounds to check against. If the full paper supplies those missing pieces or uses a computational rather than statistical notion of closeness, the point might stand. As written, though, the argument has a gap. This is the kind of paper that could interest complexity theorists who want to see learning ideas through their lens. A reader comfortable with circuit classes and PRGs would get the most out of it and might see ways to strengthen the model. It is worth sending to a serious referee. The core modeling idea has enough substance that a review could clarify whether the illustration can be repaired or if a different example is needed. I recommend putting it through peer review rather than desk rejecting it.

Referee Report

1 major / 1 minor

Summary. The paper offers a computational complexity lens on machine learning's ability to model complex systems. It abstracts ML models as producing P/poly-computable distributions with polynomially-bounded max-entropy, arguing that this abstraction allows ML to manage complexity via probability. The central illustration claims that any such distribution μ minimizing error to a cryptographic pseudorandom generator's output must be close to uniform.

Significance. If the central illustration were correct, the work would establish a useful connection between learning theory, cryptography, and complexity classes, showing how the P/poly + bounded-entropy abstraction captures why ML can handle pseudorandom complexity without explicit enumeration. This could inform analyses of generalization in settings where the target distribution is computationally indistinguishable from uniform.

major comments (1)

[Abstract] Abstract, illustration paragraph: the claim that any P/poly-computable μ with polynomially-bounded max-entropy that minimizes error to a PRG distribution must be close to uniform is false under standard statistical distances. The PRG distribution itself satisfies the abstraction (it is P/poly-computable with seed-length entropy), achieves minimal error 0 to itself, and is statistically far from uniform by definition of a PRG. This counterexample is load-bearing for the illustration and requires either a non-statistical error notion or an unstated constraint on how minimization occurs (e.g., via samples from an unknown source).

minor comments (1)

[Abstract] The notions of 'error' and 'close to uniform' lack explicit definitions or distance measures (TV, KL, etc.), and no proof sketch, error bounds, or formal statement of the minimization process is supplied, hindering verification of the derivation.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and for highlighting this critical issue with the central illustration. We agree that the claim as currently phrased in the abstract does not hold under standard statistical distances, and we will revise the manuscript accordingly to incorporate the necessary clarifications and constraints on the minimization process.

read point-by-point responses

Referee: [Abstract] Abstract, illustration paragraph: the claim that any P/poly-computable μ with polynomially-bounded max-entropy that minimizes error to a PRG distribution must be close to uniform is false under standard statistical distances. The PRG distribution itself satisfies the abstraction (it is P/poly-computable with seed-length entropy), achieves minimal error 0 to itself, and is statistically far from uniform by definition of a PRG. This counterexample is load-bearing for the illustration and requires either a non-statistical error notion or an unstated constraint on how minimization occurs (e.g., via samples from an unknown source).

Authors: We agree that the stated claim is false under statistical distance, as the PRG distribution itself is a counterexample that satisfies the P/poly + bounded-entropy abstraction while achieving zero error to itself. The original intention of the illustration was to show that an efficient machine learning procedure, when trained to minimize error on samples drawn from a target distribution that is computationally indistinguishable from uniform (as with a PRG), must produce an output distribution close to uniform, since the learner cannot exploit the PRG's hidden structure. To address the referee's point, we will revise the abstract and the relevant illustration section to explicitly specify that minimization occurs via an efficient learning algorithm trained on samples from the unknown target distribution, rather than an arbitrary μ satisfying the abstraction properties. We will also clarify that the error notion is with respect to the outcome of this sample-based training process. This incorporates the suggested constraint and removes the counterexample. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on external cryptographic assumptions

full rationale

The paper abstracts machine learning as P/poly-computable distributions with polynomially bounded max-entropy and illustrates a property using cryptographic pseudorandom generators. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described chain. The central claim is presented as following from standard crypto assumptions (indistinguishability from uniform) rather than reducing to the paper's own inputs or definitions by construction. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The argument rests on standard complexity assumptions about the existence of cryptographic pseudorandom generators and the definition of P/poly; no new free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption Cryptographic pseudorandom generators exist and produce distributions indistinguishable from uniform by polynomial-size circuits.
Invoked to construct the target distribution against which the learned μ is compared.
domain assumption P/poly captures the computational power relevant to machine learning models.
Used to restrict the class of distributions that ML is allowed to produce.

pith-pipeline@v0.9.0 · 5406 in / 1283 out tokens · 26297 ms · 2026-05-10T18:54:22.902253+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We abstract away from specific learning mechanisms, modeling machine learning as producing P/poly-computable distributions with polynomially-bounded max-entropy... if a machine learning model produces a distribution μ that minimizes error against the distribution generated by a cryptographic pseudorandom generator, then μ must be close to uniform.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 6.1... KL(U∥μ) is negligible in n

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 1 internal anchor

[1]

Claude Opus 4.6 system card

Anthropic. Claude Opus 4.6 system card. Technical report, Anthropic, February 2026

work page 2026
[2]

Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell

Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency ( FAccT ) , pages 610--623, 2021

work page 2021
[3]

Cover and Joy A

Thomas M. Cover and Joy A. Thomas. Elements of Information Theory . Wiley-Interscience, Hoboken, NJ, 2 edition, 2006

work page 2006
[4]

L. Fortnow. Fifty years of P vs.\ NP and the possibility of the impossible. Communications of the ACM , 65(1):76–85, January 2022

work page 2022
[5]

Mark Gold

E. Mark Gold. Language identification in the limit. Information and Control , 10(5):447--474, 1967

work page 1967
[6]

Foundations of Cryptography: Volume 1, Basic Tools

Oded Goldreich. Foundations of Cryptography: Volume 1, Basic Tools . Cambridge University Press, 2001

work page 2001
[7]

H stad, R

J. H stad, R. Impagliazzo, L. Levin, and M. Luby. A pseudorandom generator from any one-way function. SIAM Journal on Computing , 28(4):1364--1396, August 1999

work page 1999
[8]

Learning in pessiland via inductive inference

Shuichi Hirahara and Mikito Nanashima. Learning in pessiland via inductive inference. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS) , pages 447--457, 2023

work page 2023
[9]

Impagliazzo

R. Impagliazzo. A personal view of average-case complexity theory. In Proceedings of the 10th Annual Conference on Structure in Complexity Theory , pages 134--147. IEEE Computer Society Press, 1995

work page 1995
[10]

Cryptographic limitations on learning boolean formulae and finite automata

Michael Kearns and Leslie Valiant. Cryptographic limitations on learning boolean formulae and finite automata. J. ACM , 41(1):67–95, January 1994

work page 1994
[11]

Recherches, 1 o , sur l'int \'e gration des \'e quations diff \'e rentielles aux diff \'e rences finies, et sur leur usage dans la th \'e orie des hasards

Pierre-Simon Laplace. Recherches, 1 o , sur l'int \'e gration des \'e quations diff \'e rentielles aux diff \'e rences finies, et sur leur usage dans la th \'e orie des hasards. In Oeuvres compl \`e tes de Laplace , volume 8, pages 69--197. Gauthier-Villars, Paris, 1891. Originally published 1776; written 1773. Translation of pp. 144--145 in Charles Couls...

work page 1997
[12]

An introduction to Kolmogorov complexity and its applications

Ming Li and Paul Vit \'a nyi. An introduction to Kolmogorov complexity and its applications . Springer Science & Business Media, 2009

work page 2009
[13]

Modeling by shortest data description

Jorma Rissanen. Modeling by shortest data description. Automatica , 14(5):465--471, 1978

work page 1978
[14]

Claude E. Shannon. A mathematical theory of communication. Bell System Technical Journal , 27(3):379--423, 1948

work page 1948
[15]

Solomonoff

Ray J. Solomonoff. A formal theory of inductive inference, Part I . Information and Control , 7(1):1--22, 1964

work page 1964
[16]

A. Turing. On computable numbers, with an application to the Entscheidungsproblem . Proceedings of the London Mathematical Society , 42:230--265, 1936

work page 1936
[17]

L. Valiant. A theory of the learnable. Communications of the ACM , 27(11):1134--1142, 1984

work page 1984
[18]

Vitanyi and Ming Li

P.M.B. Vitanyi and Ming Li. Minimum description length induction, bayesianism, and kolmogorov complexity. IEEE Transactions on Information Theory , 46(2):446--464, 2000

work page 2000
[19]

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems ( NeurIPS ) , volume 30, 2017. arXiv:1706.03762

work page internal anchor Pith review Pith/arXiv arXiv 2017
[20]

Chain-of-thought prompting elicits reasoning in large language models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, brian ichter, Fei Xia, Ed Chi, Quoc V Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems , volume 35, pages 24824--24837. Curran Ass...

work page 2022
[21]

A. Yao. Theory and applications of trapdoor functions. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science , pages 80--91. IEEE, New York, 1982

work page 1982

[1] [1]

Claude Opus 4.6 system card

Anthropic. Claude Opus 4.6 system card. Technical report, Anthropic, February 2026

work page 2026

[2] [2]

Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell

Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency ( FAccT ) , pages 610--623, 2021

work page 2021

[3] [3]

Cover and Joy A

Thomas M. Cover and Joy A. Thomas. Elements of Information Theory . Wiley-Interscience, Hoboken, NJ, 2 edition, 2006

work page 2006

[4] [4]

L. Fortnow. Fifty years of P vs.\ NP and the possibility of the impossible. Communications of the ACM , 65(1):76–85, January 2022

work page 2022

[5] [5]

Mark Gold

E. Mark Gold. Language identification in the limit. Information and Control , 10(5):447--474, 1967

work page 1967

[6] [6]

Foundations of Cryptography: Volume 1, Basic Tools

Oded Goldreich. Foundations of Cryptography: Volume 1, Basic Tools . Cambridge University Press, 2001

work page 2001

[7] [7]

H stad, R

J. H stad, R. Impagliazzo, L. Levin, and M. Luby. A pseudorandom generator from any one-way function. SIAM Journal on Computing , 28(4):1364--1396, August 1999

work page 1999

[8] [8]

Learning in pessiland via inductive inference

Shuichi Hirahara and Mikito Nanashima. Learning in pessiland via inductive inference. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS) , pages 447--457, 2023

work page 2023

[9] [9]

Impagliazzo

R. Impagliazzo. A personal view of average-case complexity theory. In Proceedings of the 10th Annual Conference on Structure in Complexity Theory , pages 134--147. IEEE Computer Society Press, 1995

work page 1995

[10] [10]

Cryptographic limitations on learning boolean formulae and finite automata

Michael Kearns and Leslie Valiant. Cryptographic limitations on learning boolean formulae and finite automata. J. ACM , 41(1):67–95, January 1994

work page 1994

[11] [11]

Recherches, 1 o , sur l'int \'e gration des \'e quations diff \'e rentielles aux diff \'e rences finies, et sur leur usage dans la th \'e orie des hasards

Pierre-Simon Laplace. Recherches, 1 o , sur l'int \'e gration des \'e quations diff \'e rentielles aux diff \'e rences finies, et sur leur usage dans la th \'e orie des hasards. In Oeuvres compl \`e tes de Laplace , volume 8, pages 69--197. Gauthier-Villars, Paris, 1891. Originally published 1776; written 1773. Translation of pp. 144--145 in Charles Couls...

work page 1997

[12] [12]

An introduction to Kolmogorov complexity and its applications

Ming Li and Paul Vit \'a nyi. An introduction to Kolmogorov complexity and its applications . Springer Science & Business Media, 2009

work page 2009

[13] [13]

Modeling by shortest data description

Jorma Rissanen. Modeling by shortest data description. Automatica , 14(5):465--471, 1978

work page 1978

[14] [14]

Claude E. Shannon. A mathematical theory of communication. Bell System Technical Journal , 27(3):379--423, 1948

work page 1948

[15] [15]

Solomonoff

Ray J. Solomonoff. A formal theory of inductive inference, Part I . Information and Control , 7(1):1--22, 1964

work page 1964

[16] [16]

A. Turing. On computable numbers, with an application to the Entscheidungsproblem . Proceedings of the London Mathematical Society , 42:230--265, 1936

work page 1936

[17] [17]

L. Valiant. A theory of the learnable. Communications of the ACM , 27(11):1134--1142, 1984

work page 1984

[18] [18]

Vitanyi and Ming Li

P.M.B. Vitanyi and Ming Li. Minimum description length induction, bayesianism, and kolmogorov complexity. IEEE Transactions on Information Theory , 46(2):446--464, 2000

work page 2000

[19] [19]

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems ( NeurIPS ) , volume 30, 2017. arXiv:1706.03762

work page internal anchor Pith review Pith/arXiv arXiv 2017

[20] [20]

Chain-of-thought prompting elicits reasoning in large language models

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, brian ichter, Fei Xia, Ed Chi, Quoc V Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems , volume 35, pages 24824--24837. Curran Ass...

work page 2022

[21] [21]

A. Yao. Theory and applications of trapdoor functions. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science , pages 80--91. IEEE, New York, 1982

work page 1982