Learning mathsf{AC}⁰ Under Graphical Models
Pith reviewed 2026-05-10 19:41 UTC · model grok-4.3
The pith
Quasipolynomial-time algorithms learn AC^0 circuits from samples drawn from graphical models with strong spatial mixing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We give quasipolynomial-time algorithms for learning AC^0 substantially beyond the product setting, when the inputs come from any graphical model with polynomial growth that exhibits strong spatial mixing. The main technical challenge is in giving a workaround to Fourier analysis, which we do by showing how new sampling algorithms allow us to transfer statements about low-degree polynomial approximation under the uniform setting to graphical models. Our approach is general enough to extend to other well-studied function classes, like monotone functions and halfspaces.
What carries the argument
Sampling algorithms that transfer low-degree polynomial approximation statements from the uniform distribution to graphical models with polynomial growth and strong spatial mixing.
If this is right
- The same quasipolynomial algorithms apply directly to learning monotone functions and halfspaces under the same graphical models.
- The runtime bound remains quasipolynomial for every graphical model satisfying the stated growth and mixing conditions.
- Fourier analysis is bypassed entirely by routing through the new sampling transfer.
- Any prior learning result that relied only on uniform low-degree approximations can be lifted to these graphical models via the same mechanism.
Where Pith is reading between the lines
- The transfer technique could be checked on simple grid or tree graphical models to measure actual sample complexity in small cases.
- Other learning problems that use low-degree approximations might be extendable to correlated inputs with similar sampling arguments.
- Realistic data sources with local spatial correlations, such as image patches or sensor networks, become candidate settings for these guarantees.
- Weakening the strong spatial mixing requirement while keeping polynomial growth could enlarge the class of allowable distributions.
Load-bearing premise
The input distribution must be generated by a graphical model that has polynomial growth and exhibits strong spatial mixing.
What would settle it
A concrete graphical model with polynomial growth and strong spatial mixing on which the transferred low-degree approximator for some AC^0 circuit fails to achieve the same accuracy as in the uniform case after quasipolynomial samples.
read the original abstract
In a landmark result, Linial, Mansour and Nisan (J. ACM 1993) gave a quasipolynomial-time algorithm for learning constant-depth circuits given labeled i.i.d. samples under the uniform distribution. Their work has had a deep and lasting legacy in computational learning theory, in particular introducing the $\textit{low-degree algorithm}$. However, an important critique of many results and techniques in the area is the reliance on product structure, which is unlikely to hold in realistic settings. Obtaining similar learning guarantees for more natural correlated distributions has been a longstanding challenge in the field. In particular, we give quasipolynomial-time algorithms for learning $\mathsf{AC}^0$ substantially beyond the product setting, when the inputs come from any graphical model with polynomial growth that exhibits strong spatial mixing. The main technical challenge is in giving a workaround to Fourier analysis, which we do by showing how new sampling algorithms allow us to transfer statements about low-degree polynomial approximation under the uniform setting to graphical models. Our approach is general enough to extend to other well-studied function classes, like monotone functions and halfspaces.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims quasipolynomial-time algorithms for learning AC^0 (and extensions to monotone functions and halfspaces) when examples are drawn from any graphical model with polynomial growth that exhibits strong spatial mixing. The approach develops new sampling algorithms to transfer low-degree polynomial approximation guarantees from the uniform-distribution case (building on Linial-Mansour-Nisan) to these correlated distributions, thereby bypassing direct Fourier analysis under the target measure.
Significance. If the transfer via sampling succeeds with the stated runtime and error bounds, the result would meaningfully broaden the scope of low-degree learning beyond product distributions, addressing a longstanding critique in the field. The explicit identification of polynomial-growth + strong spatial mixing as the sufficient condition is a concrete, testable advance.
minor comments (3)
- [Introduction / §3] The abstract and introduction state that the sampling algorithms 'transfer statements about low-degree polynomial approximation' while preserving quasipolynomial runtime, but do not exhibit the concrete error bounds or the precise dependence on the mixing parameters; a short lemma or theorem statement making this dependence explicit would strengthen the central claim.
- [Preliminaries] Notation for the graphical model (e.g., the precise definition of 'polynomial growth' and the strong spatial mixing rate) is introduced informally; a self-contained definition in §2 would aid readers unfamiliar with the statistical-physics literature.
- [§5] The extension paragraph for monotone functions and halfspaces is stated at high level; a brief indication of which parts of the sampling argument carry over unchanged versus which require new analysis would clarify the generality.
Simulated Author's Rebuttal
We thank the referee for their positive summary of the paper and for recommending minor revision. We appreciate the recognition that the results meaningfully extend low-degree learning beyond product distributions by identifying polynomial growth and strong spatial mixing as sufficient conditions.
Circularity Check
No significant circularity; transfers from independent uniform-case result
full rationale
The paper's core contribution is a sampling-based transfer of low-degree approximation guarantees from the uniform distribution (Linial-Mansour-Nisan 1993, an external result) to graphical models satisfying polynomial growth and strong spatial mixing. The weakest assumption is stated explicitly as the setting, not derived from the target learning algorithm. No equations or steps reduce the claimed quasipolynomial learning guarantee to a fit, self-definition, or load-bearing self-citation chain. The derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
On learning monotone boolean functions
[BBL98] Avrim Blum, Carl Burch, and John Langford. On learning monotone boolean functions. In39th Annual Symposium on Foundations of Computer Science, FOCS 1998, Palo Alto, California, USA, November 8-11, 1998, pages 408–415. IEEE Computer Society,
work page 1998
-
[2]
On Mixing of Markov Chains: Coupling, Spectral Independence, and Entropy Fac- torization
49 [BCC+22] Antonio Blanca, Pietro Caputo, Zongchen Chen, Daniel Parisi, Daniel Stefankovic, and Eric Vigoda. On Mixing of Markov Chains: Coupling, Spectral Independence, and Entropy Fac- torization. InProceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, pages 3670–3692. SIAM,
work page 2022
-
[3]
[BCO+15] Eric Blais, Clément L. Canonne, Igor C. Oliveira, Rocco A. Servedio, and Li-Yang Tan. Learning Circuits with few Negations. InApproximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2015), volume 40 ofLeibniz International Proceedings in Informatics (LIPIcs), pages 512–527, Dagstuhl, Germany,
work page 2015
-
[4]
Efficiently Learning Ising Models on Arbitrary Graphs
[Bre15] Guy Bresler. Efficiently Learning Ising Models on Arbitrary Graphs. InProceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14-17, 2015, pages 771–782. ACM,
work page 2015
-
[5]
Localization Schemes: A Framework for Proving Mixing Bounds for Markov Chains
[CE22] Yuansi Chen and Ronen Eldan. Localization Schemes: A Framework for Proving Mixing Bounds for Markov Chains. In63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, pages 110–122. IEEE,
work page 2022
-
[6]
Modified log-Sobolev Inequalities for Strongly Log-Concave Distributions
[CGM19] Mary Cryan, Heng Guo, and Giorgos Mousa. Modified log-Sobolev Inequalities for Strongly Log-Concave Distributions. In60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 1358–1370. IEEE Computer Society,
work page 2019
- [7]
-
[8]
Efficient Parallel Ising Samplers via Localization Schemes
[CLYZ25] Xiaoyu Chen, Hongyang Liu, Yitong Yin, and Xinyuan Zhang. Efficient Parallel Ising Samplers via Localization Schemes. InApproximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2025,, volume 353 ofLIPIcs, pages 46:1–46:22. Schloss Dagstuhl - Leibniz-Zentrum für Informatik,
work page 2025
-
[9]
[DGJ+10] Ilias Diakonikolas, Parikshit Gopalan, Ragesh Jaiswal, Rocco A Servedio, and Emanuele Viola
ISSN: 1938-7228. [DGJ+10] Ilias Diakonikolas, Parikshit Gopalan, Ragesh Jaiswal, Rocco A Servedio, and Emanuele Viola. Bounded independence fools halfspaces.SIAM Journal on Computing, 39(8):3441–3462,
work page 1938
-
[10]
Estimating Ising Models in Total Variation Distance.CoRR, abs/2511.21008,
[DKY25] Constantinos Daskalakis, Anthimos Vardis Kandiros, and Rui Yao. Estimating Ising Models in Total Variation Distance.CoRR, abs/2511.21008,
-
[11]
A Simple Parallel and Distributed Sampling Tech- nique: Local Glauber Dynamics
[FG18] Manuela Fischer and Mohsen Ghaffari. A Simple Parallel and Distributed Sampling Tech- nique: Local Glauber Dynamics. In32nd International Symposium on Distributed Computing, DISC 2018, volume 121 ofLIPIcs, pages 26:1–26:11. Schloss Dagstuhl - Leibniz-Zentrum für Informatik,
work page 2018
-
[12]
[FJS91] Merrick L. Furst, Jeffrey C. Jackson, and Sean W. Smith. Improved learning ofAC 0 functions. InProceedings of the Fourth Annual Workshop on Computational Learning Theory, COLT 1991, pages 317–325. Morgan Kaufmann,
work page 1991
- [13]
-
[14]
[GMM25] Jason Gaitonde, Ankur Moitra, and Elchanan Mossel. Better Models and Algorithms for Learning Ising Models from Dynamics.arXiv preprint arXiv:2507.15173,
-
[15]
Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications
[HKM17] Linus Hamilton, Frederic Koehler, and Ankur Moitra. Information Theoretic Properties of Markov Random Fields, and their Algorithmic Applications. InAdvances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, pages 2463–2472,
work page 2017
-
[16]
[Kan14a] Daniel M. Kane. The average sensitivity of an intersection of half spaces. InSymposium on Theory of Computing, STOC 2014, pages 437–440. ACM,
work page 2014
-
[17]
[KLMM24] Frederic Koehler, Noam Lifshitz, Dor Minzer, and Elchanan Mossel. Influences in Mixing Measures. InProceedings of the 56th Annual ACM Symposium on Theory of Computing, STOC 2024, pages 527–536. ACM,
work page 2024
-
[18]
[KM15] Varun Kanade and Elchanan Mossel. MCMC Learning. InProceedings of The 28th Conference on Learning Theory, COLT 2015, volume 40 ofJMLR Workshop and Conference Proceedings, pages 1101–1128. JMLR.org,
work page 2015
-
[19]
[KM17] Adam R. Klivans and Raghu Meka. Learning Graphical Models Using Multiplicative Weights. In58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, pages 343–354. IEEE Computer Society,
work page 2017
-
[20]
Smoothed Agnostic Learning of Halfspaces over the Hypercube
[KM25] Yiwen Kou and Raghu Meka. Smoothed Agnostic Learning of Halfspaces over the Hypercube. CoRR, abs/2511.17782,
-
[21]
Learning and Smoothed Analysis
[KST09] Adam Tauman Kalai, Alex Samorodnitsky, and Shang-Hua Teng. Learning and Smoothed Analysis. In50th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2009, pages 395–404. IEEE Computer Society,
work page 2009
-
[22]
[LMRW24] Kuikui Liu, Sidhanth Mohanty, Amit Rajaraman, and David X. Wu. Fast Mixing in Sparse Random Ising Models. In65th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2024, pages 120–128. IEEE,
work page 2024
-
[23]
Lectures on Glauber dynamics for discrete spin models
[Mar04] Fabio Martinelli. Lectures on Glauber dynamics for discrete spin models. InLectures on probability theory and statistics: Ecole d’eté de probailités de saint-flour xxvii-1997, pages 93–191. Springer,
work page 1997
- [24]
-
[25]
Tight bounds on the fourier spectrum of AC0
[Tal17] Avishay Tal. Tight bounds on the fourier spectrum of AC0. In Ryan O’Donnell, editor,32nd Computational Complexity Conference, CCC 2017, Riga, Latvia, July 6-9, 2017, LIPIcs, pages 15:1–15:31. Schloss Dagstuhl - Leibniz-Zentrum für Informatik,
work page 2017
- [26]
-
[27]
55 [WSD19] Shanshan Wu, Sujay Sanghavi, and Alexandros G. Dimakis. Sparse Logistic Regression Learns All Discrete Pairwise Graphical Models. InAdvances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, pages 8069–8079,
work page 2019
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.