pith. sign in

arxiv: 2604.07233 · v1 · submitted 2026-04-08 · 💻 cs.LG · cs.CC

How Does Machine Learning Manage Complexity?

Pith reviewed 2026-05-10 18:54 UTC · model grok-4.3

classification 💻 cs.LG cs.CC
keywords machine learningcomputational complexitypseudorandom generatorscomputable distributionsmaximum entropylearning theorycomplexity theory
0
0 comments X

The pith

Machine learning models minimizing error on pseudorandom generator outputs must produce distributions close to uniform.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper offers a computational complexity perspective on why machine learning can handle complex systems. It abstracts machine learning models as generators of distributions that are computable by polynomial-size circuits and have only polynomially bounded maximum entropy. Using this view, the paper proves that such a model, when trying to match the distribution from a cryptographic pseudorandom generator with minimal error, must output something very close to the uniform distribution. This matters because it shows how learning can manage complexity through probability rather than by computing highly structured outputs directly. The approach sidesteps details of training algorithms to focus on what kinds of distributions are learnable in principle.

Core claim

We abstract machine learning as producing P/poly-computable distributions with polynomially-bounded max-entropy. We illustrate how learning computable distributions models complexity by showing that if a machine learning model produces a distribution mu that minimizes error against the distribution generated by a cryptographic pseudorandom generator, then mu must be close to uniform.

What carries the argument

P/poly-computable distributions with polynomially-bounded max-entropy, which allow complexity-theoretic analysis of what distributions machine learning can produce.

If this is right

  • Machine learning manages complexity by focusing on computable distributions using probability.
  • Minimizing error against a pseudorandom generator output requires the learned distribution to be close to uniform.
  • The abstraction works without depending on particular training algorithms or neural network architectures.
  • This provides a complexity-theoretic explanation for the power of machine learning on complex data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach might explain why machine learning generalizes well on real data that contains pseudorandom elements.
  • It opens the door to using more advanced complexity results to analyze learning limits.
  • One could test the abstraction by checking if trained models on PRG data indeed produce near-uniform outputs.

Load-bearing premise

Machine learning can be faithfully abstracted as producing P/poly-computable distributions with polynomially-bounded max-entropy independent of any particular training algorithm or architecture.

What would settle it

Observe a machine learning model trained on samples from a pseudorandom generator that achieves low error but outputs a distribution far from uniform while still being P/poly-computable with bounded max-entropy.

read the original abstract

We provide a computational complexity lens to understand the power of machine learning models, particularly their ability to model complex systems. Machine learning models are often trained on data drawn from sampleable or more complex distributions, a far wider range of distributions than just computable ones. By focusing on computable distributions, machine learning models can better manage complexity via probability. We abstract away from specific learning mechanisms, modeling machine learning as producing P/poly-computable distributions with polynomially-bounded max-entropy. We illustrate how learning computable distributions models complexity by showing that if a machine learning model produces a distribution $\mu$ that minimizes error against the distribution generated by a cryptographic pseudorandom generator, then $\mu$ must be close to uniform.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper offers a computational complexity lens on machine learning's ability to model complex systems. It abstracts ML models as producing P/poly-computable distributions with polynomially-bounded max-entropy, arguing that this abstraction allows ML to manage complexity via probability. The central illustration claims that any such distribution μ minimizing error to a cryptographic pseudorandom generator's output must be close to uniform.

Significance. If the central illustration were correct, the work would establish a useful connection between learning theory, cryptography, and complexity classes, showing how the P/poly + bounded-entropy abstraction captures why ML can handle pseudorandom complexity without explicit enumeration. This could inform analyses of generalization in settings where the target distribution is computationally indistinguishable from uniform.

major comments (1)
  1. [Abstract] Abstract, illustration paragraph: the claim that any P/poly-computable μ with polynomially-bounded max-entropy that minimizes error to a PRG distribution must be close to uniform is false under standard statistical distances. The PRG distribution itself satisfies the abstraction (it is P/poly-computable with seed-length entropy), achieves minimal error 0 to itself, and is statistically far from uniform by definition of a PRG. This counterexample is load-bearing for the illustration and requires either a non-statistical error notion or an unstated constraint on how minimization occurs (e.g., via samples from an unknown source).
minor comments (1)
  1. [Abstract] The notions of 'error' and 'close to uniform' lack explicit definitions or distance measures (TV, KL, etc.), and no proof sketch, error bounds, or formal statement of the minimization process is supplied, hindering verification of the derivation.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and for highlighting this critical issue with the central illustration. We agree that the claim as currently phrased in the abstract does not hold under standard statistical distances, and we will revise the manuscript accordingly to incorporate the necessary clarifications and constraints on the minimization process.

read point-by-point responses
  1. Referee: [Abstract] Abstract, illustration paragraph: the claim that any P/poly-computable μ with polynomially-bounded max-entropy that minimizes error to a PRG distribution must be close to uniform is false under standard statistical distances. The PRG distribution itself satisfies the abstraction (it is P/poly-computable with seed-length entropy), achieves minimal error 0 to itself, and is statistically far from uniform by definition of a PRG. This counterexample is load-bearing for the illustration and requires either a non-statistical error notion or an unstated constraint on how minimization occurs (e.g., via samples from an unknown source).

    Authors: We agree that the stated claim is false under statistical distance, as the PRG distribution itself is a counterexample that satisfies the P/poly + bounded-entropy abstraction while achieving zero error to itself. The original intention of the illustration was to show that an efficient machine learning procedure, when trained to minimize error on samples drawn from a target distribution that is computationally indistinguishable from uniform (as with a PRG), must produce an output distribution close to uniform, since the learner cannot exploit the PRG's hidden structure. To address the referee's point, we will revise the abstract and the relevant illustration section to explicitly specify that minimization occurs via an efficient learning algorithm trained on samples from the unknown target distribution, rather than an arbitrary μ satisfying the abstraction properties. We will also clarify that the error notion is with respect to the outcome of this sample-based training process. This incorporates the suggested constraint and removes the counterexample. revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on external cryptographic assumptions

full rationale

The paper abstracts machine learning as P/poly-computable distributions with polynomially bounded max-entropy and illustrates a property using cryptographic pseudorandom generators. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described chain. The central claim is presented as following from standard crypto assumptions (indistinguishability from uniform) rather than reducing to the paper's own inputs or definitions by construction. The derivation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The argument rests on standard complexity assumptions about the existence of cryptographic pseudorandom generators and the definition of P/poly; no new free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption Cryptographic pseudorandom generators exist and produce distributions indistinguishable from uniform by polynomial-size circuits.
    Invoked to construct the target distribution against which the learned μ is compared.
  • domain assumption P/poly captures the computational power relevant to machine learning models.
    Used to restrict the class of distributions that ML is allowed to produce.

pith-pipeline@v0.9.0 · 5406 in / 1283 out tokens · 26297 ms · 2026-05-10T18:54:22.902253+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 1 internal anchor

  1. [1]

    Claude Opus 4.6 system card

    Anthropic. Claude Opus 4.6 system card. Technical report, Anthropic, February 2026

  2. [2]

    Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell

    Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency ( FAccT ) , pages 610--623, 2021

  3. [3]

    Cover and Joy A

    Thomas M. Cover and Joy A. Thomas. Elements of Information Theory . Wiley-Interscience, Hoboken, NJ, 2 edition, 2006

  4. [4]

    L. Fortnow. Fifty years of P vs.\ NP and the possibility of the impossible. Communications of the ACM , 65(1):76–85, January 2022

  5. [5]

    Mark Gold

    E. Mark Gold. Language identification in the limit. Information and Control , 10(5):447--474, 1967

  6. [6]

    Foundations of Cryptography: Volume 1, Basic Tools

    Oded Goldreich. Foundations of Cryptography: Volume 1, Basic Tools . Cambridge University Press, 2001

  7. [7]

    H stad, R

    J. H stad, R. Impagliazzo, L. Levin, and M. Luby. A pseudorandom generator from any one-way function. SIAM Journal on Computing , 28(4):1364--1396, August 1999

  8. [8]

    Learning in pessiland via inductive inference

    Shuichi Hirahara and Mikito Nanashima. Learning in pessiland via inductive inference. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS) , pages 447--457, 2023

  9. [9]

    Impagliazzo

    R. Impagliazzo. A personal view of average-case complexity theory. In Proceedings of the 10th Annual Conference on Structure in Complexity Theory , pages 134--147. IEEE Computer Society Press, 1995

  10. [10]

    Cryptographic limitations on learning boolean formulae and finite automata

    Michael Kearns and Leslie Valiant. Cryptographic limitations on learning boolean formulae and finite automata. J. ACM , 41(1):67–95, January 1994

  11. [11]

    Recherches, 1 o , sur l'int \'e gration des \'e quations diff \'e rentielles aux diff \'e rences finies, et sur leur usage dans la th \'e orie des hasards

    Pierre-Simon Laplace. Recherches, 1 o , sur l'int \'e gration des \'e quations diff \'e rentielles aux diff \'e rences finies, et sur leur usage dans la th \'e orie des hasards. In Oeuvres compl \`e tes de Laplace , volume 8, pages 69--197. Gauthier-Villars, Paris, 1891. Originally published 1776; written 1773. Translation of pp. 144--145 in Charles Couls...

  12. [12]

    An introduction to Kolmogorov complexity and its applications

    Ming Li and Paul Vit \'a nyi. An introduction to Kolmogorov complexity and its applications . Springer Science & Business Media, 2009

  13. [13]

    Modeling by shortest data description

    Jorma Rissanen. Modeling by shortest data description. Automatica , 14(5):465--471, 1978

  14. [14]

    Claude E. Shannon. A mathematical theory of communication. Bell System Technical Journal , 27(3):379--423, 1948

  15. [15]

    Solomonoff

    Ray J. Solomonoff. A formal theory of inductive inference, Part I . Information and Control , 7(1):1--22, 1964

  16. [16]

    A. Turing. On computable numbers, with an application to the Entscheidungsproblem . Proceedings of the London Mathematical Society , 42:230--265, 1936

  17. [17]

    L. Valiant. A theory of the learnable. Communications of the ACM , 27(11):1134--1142, 1984

  18. [18]

    Vitanyi and Ming Li

    P.M.B. Vitanyi and Ming Li. Minimum description length induction, bayesianism, and kolmogorov complexity. IEEE Transactions on Information Theory , 46(2):446--464, 2000

  19. [19]

    Attention Is All You Need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, ukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems ( NeurIPS ) , volume 30, 2017. arXiv:1706.03762

  20. [20]

    Chain-of-thought prompting elicits reasoning in large language models

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, brian ichter, Fei Xia, Ed Chi, Quoc V Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems , volume 35, pages 24824--24837. Curran Ass...

  21. [21]

    A. Yao. Theory and applications of trapdoor functions. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science , pages 80--91. IEEE, New York, 1982