Certification from Examples is Hard for Circuits and Transformers under Minimal Overparametrization

Artur Back de Luca; Kimon Fountoulakis

arxiv: 2605.22964 · v1 · pith:6QGMGU6Dnew · submitted 2026-05-21 · 💻 cs.LG

Certification from Examples is Hard for Circuits and Transformers under Minimal Overparametrization

Artur Back de Luca , Kimon Fountoulakis This is my paper

Pith reviewed 2026-05-25 05:59 UTC · model grok-4.3

classification 💻 cs.LG

keywords exact certificationoverparametrizationthreshold circuitstransformerscertificate sizeapproximate certificationneural network verification

0 comments

The pith

Adding one extra gate forces exponential certificate sizes for exact certification of threshold circuits and transformers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that exact certification, which requires proving a learned hypothesis equals the target function, can demand exponentially many labeled examples even under minimal overparametrization. For threshold circuits of depth at least 2, a single additional gate creates this exponential barrier in the input dimension. An analogous result applies to log-precision transformers when only constant architectural overhead is permitted. This matters because deployed networks on reasoning tasks need exactness guarantees to avoid hidden inconsistencies that average accuracy misses, yet slight extra capacity can render the certification process intractable.

Core claim

Exact certification asks for the smallest set of labeled examples sufficient to prove a hypothesis is identical to the target function. The paper proves that threshold circuits of depth greater than or equal to 2 with one extra gate require certificate sizes exponential in the input dimension. It establishes the same hardness for log-precision transformers under only constant architectural overhead. It further shows that approximate certification allowing polynomially many mistakes still needs exponential certificates, while constant relative-error bounds can conceal exponentially many mistakes.

What carries the argument

Certificate size, the minimal number of labeled examples needed to certify that a hypothesis equals the target function exactly.

If this is right

Threshold circuits of depth at least 2 with one extra gate require certificate sizes exponential in input dimension for exact certification.
Log-precision transformers with constant architectural overhead require exponential certificate sizes.
Approximate certification that allows polynomially many mistakes still demands exponentially large certificates.
Constant relative-error guarantees can conceal exponentially many mistakes in the hypothesis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Certification procedures may need to measure overparametrization more carefully to remain tractable in practice.
Training methods that constrain models closer to minimal capacity could yield smaller certificates.
Alternative notions of agreement short of exact identity might make certification feasible for overparametrized models.

Load-bearing premise

The hardness depends on defining minimal overparametrization as precisely one extra gate or constant overhead, and on requiring proof of exact identity rather than approximate agreement.

What would settle it

A threshold circuit of depth 2 with one extra gate for which a polynomial number of examples suffices to certify exact identity to the target function.

Figures

Figures reproduced from arXiv: 2605.22964 by Artur Back de Luca, Kimon Fountoulakis.

**Figure 2.** Figure 2: Surviving deceivers under uniformly sampled certificate candidates. [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

**Figure 3.** Figure 3: Remaining constructed threshold-circuit deceivers under optimally chosen certificate candidates. [PITH_FULL_IMAGE:figures/full_fig_p033_3.png] view at source ↗

**Figure 4.** Figure 4: Distribution of validation and test errors across accepted models. Each point is an accepted addition [PITH_FULL_IMAGE:figures/full_fig_p035_4.png] view at source ↗

**Figure 5.** Figure 5: Easy-versus-hard certification in the two finite-class experiments. Left: the restricted threshold [PITH_FULL_IMAGE:figures/full_fig_p038_5.png] view at source ↗

read the original abstract

As state-of-the-art neural networks are deployed on reasoning and algorithmic tasks, exactness guarantees become increasingly important. However, high average-case accuracy can still mask inconsistent behaviors. This motivates exact certification, which asks for the smallest set of labeled examples needed to certify that a learned hypothesis equals the target. We show that while some hypotheses are easy to certify, even minimal overparametrization can make certification exponentially hard across several hypothesis classes. For threshold circuits of depth $\ge 2$, adding a single extra gate can force certificate sizes exponential in the input dimension. We show an analogous hardness result for log-precision Transformers with only constant architectural overhead. We also characterize approximate certification, showing that allowing only polynomially many mistakes still requires exponentially large certificates, whereas constant relative-error guarantees can hide exponentially many mistakes. Empirically, we study certification for constructed circuits and trained Transformers for recognizing binary addition. While the constructed circuits instantiate the exponential barrier for certification, the trained Transformer analysis shows that imperfect models can evade detection by large uniformly sampled certificate candidates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper proves exponential lower bounds on exact certification size for threshold circuits and log-precision transformers once you add just one extra gate or constant overhead.

read the letter

The core result is that exact certification from examples becomes exponentially hard in the input dimension for depth-2 threshold circuits with one extra gate and for log-precision transformers with only constant extra architecture. The same holds for approximate certification if you allow only polynomially many mistakes. Constant relative error, by contrast, can conceal exponentially many errors. They also run a small empirical check on constructed circuits and trained transformers for binary addition, where the trained models sometimes evade large uniform certificate sets.

Referee Report

0 major / 3 minor

Summary. The paper proves that exact certification from examples is hard for threshold circuits of depth ≥2 and for log-precision Transformers under minimal overparametrization: adding one extra gate (or constant overhead) forces certificate sizes exponential in the input dimension for exact identity to the target function. It further shows that allowing polynomially many mistakes still requires exponential certificates, while constant relative-error guarantees can conceal exponentially many mistakes. Empirical results on constructed circuits and trained Transformers for binary addition illustrate the barrier and show that imperfect models can evade large uniform certificate candidates.

Significance. If the lower bounds hold, the results are significant because they establish concrete computational barriers to exact certification even under minimal overparametrization, with direct implications for verification of neural networks on reasoning tasks. The separation between exact, polynomially approximate, and constant-relative-error regimes is a useful contribution. The empirical instantiation on binary addition provides concrete examples of both the hardness and practical evasion.

minor comments (3)

[Abstract, §2] Abstract and §2: the precise definition of 'minimal overparametrization' (one extra gate or constant overhead) and the model of exact certification (identity vs. approximate agreement) should be stated with a forward reference to the formal definitions used in the hardness proofs.
[§5] §5 (empirical section): clarify whether the certificate candidates for the trained Transformer are drawn uniformly at random and how the observed evasion relates to the theoretical exponential lower bound (e.g., does it only apply to perfect models?).
Notation for certificate size (e.g., the function mapping hypothesis to minimal certificate length) should be introduced once and used consistently; a short table summarizing the exact vs. approximate regimes would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary, recognition of the significance of the lower bounds, and recommendation of minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper establishes theoretical hardness lower bounds showing that minimal overparametrization forces exponential certificate sizes for exact certification in depth-2 threshold circuits and log-precision transformers. These results rely on explicit constructions and reductions rather than any fitted parameters, self-referential definitions, or load-bearing self-citations that collapse the claimed results to the inputs by construction. The derivation chain is self-contained as standard complexity-theoretic arguments with no reduction of predictions to fitted quantities or ansatzes smuggled via prior work by the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract only; no free parameters, axioms, or invented entities are extractable from the provided text.

pith-pipeline@v0.9.0 · 5712 in / 1027 out tokens · 36896 ms · 2026-05-25T05:59:24.736648+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

[1]

Learning regular sets from queries and counterexamples.Information and computation, 75(2):87–106, 1987

Dana Angluin. Learning regular sets from queries and counterexamples.Information and computation, 75(2):87–106, 1987

work page 1987
[2]

Queries and concept learning.Machine learning, 2(4):319–342, 1988

Dana Angluin. Queries and concept learning.Machine learning, 2(4):319–342, 1988

work page 1988
[3]

Negative results for equivalence queries.Machine Learning, 5(2):121–150, 1990

Dana Angluin. Negative results for equivalence queries.Machine Learning, 5(2):121–150, 1990

work page 1990
[4]

On exact specification by examples

Martin Anthony, Graham Brightwell, Dave Cohen, and John Shawe-Taylor. On exact specification by examples. InFifth annual workshop on computational learning theory, pages 311–318, 1992

work page 1992
[5]

Polynomial threshold functions, hyperplane arrangements, and random tensors.SIAM Journal on Mathematics of Data Science, 1(4):699–729, 2019

Pierre Baldi and Roman Vershynin. Polynomial threshold functions, hyperplane arrangements, and random tensors.SIAM Journal on Mathematics of Data Science, 1(4):699–729, 2019

work page 2019
[6]

Bshouty, Richard Cleve, Ricard Gavaldà, Sampath Kannan, and Christino Tamon

Nader H. Bshouty, Richard Cleve, Ricard Gavaldà, Sampath Kannan, and Christino Tamon. Oracles and queries that are sufficient for exact learning.Journal of Computer and System Sciences, 52(3):421–433, 1996

work page 1996
[7]

Bounds in query learning

Hunter Chase and James Freitag. Bounds in query learning. InProceedings of Thirty Third Conference on Learning Theory, volume 125 ofProceedings of Machine Learning Research, pages 1142–1160. PMLR, 2020

work page 2020
[8]

On the recursive teaching dimension of VC classes

Xi Chen, Yu Cheng, and Bo Tang. On the recursive teaching dimension of VC classes. InAdvances in Neural Information Processing Systems, volume 29, 2016

work page 2016
[9]

Transformers in uniform TC0.Transactions on Machine Learning Research, 2025

David Chiang. Transformers in uniform TC0.Transactions on Machine Learning Research, 2025. ISSN 2835–8856

work page 2025
[10]

Learning to add, multiply, and execute algorithmic instructions exactly with neural networks

Artur Back de Luca, George Giapitzakis, and Kimon Fountoulakis. Learning to add, multiply, and execute algorithmic instructions exactly with neural networks. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

work page 2026
[11]

Recursive teaching dimension, VC-dimension and sample compression.Journal of Machine Learning Research, 15(89):3107–3131, 2014

Thorsten Doliwa, Gaojian Fan, Hans Ulrich Simon, and Sandra Zilles. Recursive teaching dimension, VC-dimension and sample compression.Journal of Machine Learning Research, 15(89):3107–3131, 2014

work page 2014
[12]

Preference-based teaching

Ziyuan Gao, Christoph Ries, Hans Simon, and Sandra Zilles. Preference-based teaching. In29th Annual Conference on Learning Theory, volume 49 ofProceedings of Machine Learning Research, pages 971–997. PMLR, 2016

work page 2016
[13]

Goldman and M.J

S.A. Goldman and M.J. Kearns. On the complexity of teaching.Journal of Computer and System Sciences, 50(1):20–31, 1995. ISSN 0022-0000

work page 1995
[14]

Beyond statistical learning: Exact learning is essential for general intelligence.arXiv preprint arXiv:2506.23908, 2025

András György, Tor Lattimore, Nevena Lazić, and Csaba Szepesvári. Beyond statistical learning: Exact learning is essential for general intelligence.arXiv preprint arXiv:2506.23908, 2025

work page arXiv 2025
[15]

Formal language recognition by hard attention transform- ers: Perspectives from circuit complexity.Transactions of the Association for Computational Linguistics, 10:800–810, 2022

Yiding Hao, Dana Angluin, and Robert Frank. Formal language recognition by hard attention transform- ers: Perspectives from circuit complexity.Transactions of the Association for Computational Linguistics, 10:800–810, 2022

work page 2022
[16]

Generalized teaching dimensions and the query complexity of learning

Tibor Hegedüs. Generalized teaching dimensions and the query complexity of learning. InProceedings of the Eighth Annual Conference on Computational Learning Theory, pages 108––117, 1995. doi: 10.1145/225298.225311

work page doi:10.1145/225298.225311 1995
[17]

Exact learning of DNF formulas using DNF hypotheses.Journal of Computer and System Sciences, 70(4):435–470, 2005

Lisa Hellerstein and Vijay Raghavan. Exact learning of DNF formulas using DNF hypotheses.Journal of Computer and System Sciences, 70(4):435–470, 2005. 11

work page 2005
[18]

How many queries are needed to learn?Journal of the ACM (JACM), 43(5):840–862, 1996

Lisa Hellerstein, Krishnan Pillaipakkamnatt, Vijay Raghavan, and Dawn Wilkins. How many queries are needed to learn?Journal of the ACM (JACM), 43(5):840–862, 1996

work page 1996
[19]

Quadratic upper bound for recursive teaching dimension of finite vc classes

Lunjia Hu, Ruihan Wu, Tianhong Li, and Liwei Wang. Quadratic upper bound for recursive teaching dimension of finite vc classes. InProceedings of the 2017 Conference on Learning Theory, volume 65 of Proceedings of Machine Learning Research, pages 1147–1156. PMLR, 2017

work page 2017
[20]

Frontier LLMs still struggle with simple reasoning tasks.arXiv preprint arXiv:2507.07313, 2025

Alan Malek, Jiawei Ge, Nevena Lazic, Chi Jin, András György, and Csaba Szepesvári. Frontier LLMs still struggle with simple reasoning tasks.arXiv preprint arXiv:2507.07313, 2025

work page arXiv 2025
[21]

A logic for expressing log-precision transformers

William Merrill and Ashish Sabharwal. A logic for expressing log-precision transformers. InThirty-seventh Conference on Neural Information Processing Systems, 2023

work page 2023
[22]

The parallelism tradeoff: Limitations of log-precision transformers

William Merrill and Ashish Sabharwal. The parallelism tradeoff: Limitations of log-precision transformers. Transactions of the Association for Computational Linguistics, 11:531–545, 2023

work page 2023
[23]

The expressive power of transformers with chain of thought

William Merrill and Ashish Sabharwal. The expressive power of transformers with chain of thought. In The Twelfth International Conference on Learning Representations, 2024

work page 2024
[24]

Saturated transformers are constant-depth threshold circuits.Transactions of the Association for Computational Linguistics, 10:843–856, 2022

William Merrill, Ashish Sabharwal, and Noah A Smith. Saturated transformers are constant-depth threshold circuits.Transactions of the Association for Computational Linguistics, 10:843–856, 2022

work page 2022
[25]

Alice in wonderland: Simple tasks reveal severe generalization and basic reasoning deficits in state-of-the-art large language models

Marianna Nezhurina, Lucia Cipolina-Kun, Mehdi Cherti, and Jenia Jitsev. Alice in wonderland: Simple tasks reveal severe generalization and basic reasoning deficits in state-of-the-art large language models. InNeurIPS 2024 Workshop on Scientific Methods for Understanding Deep Learning, 2024

work page 2024
[26]

Exact learning of arithmetic with differentiable agents

Hristo Papazov, Francesco D’Angelo, and Nicolas Flammarion. Exact learning of arithmetic with differentiable agents. InThe 5th Workshop on Mathematical Reasoning and AI at NeurIPS 2025, 2025

work page 2025
[27]

On the limits of proper learnability of subclasses of DNF formulas

Krishnan Pillaipakkamnatt and Vijay Raghavan. On the limits of proper learnability of subclasses of DNF formulas. InProceedings of the seventh annual conference on Computational learning theory, pages 118–129, 1994

work page 1994
[28]

On the limits of efficient teachability.Information Processing Letters, 79(6):267–272, 2001

Rocco A Servedio. On the limits of efficient teachability.Information Processing Letters, 79(6):267–272, 2001

work page 2001
[29]

Teachabilityincomputationallearning.New Generation Computing, 8:337–347, 1991

AyumiShinoharaandSatoruMiyano. Teachabilityincomputationallearning.New Generation Computing, 8:337–347, 1991. doi: 10.1007/BF03037091

work page doi:10.1007/bf03037091 1991
[30]

Models of cooperative teaching and learning.Journal of Machine Learning Research, 12:349–384, 2011

Sandra Zilles, Steffen Lange, Robert Holte, and Martin Zinkevich. Models of cooperative teaching and learning.Journal of Machine Learning Research, 12:349–384, 2011. 12 A Generic certification tools A.1 Disjoint-block principle Proposition A.1(Restatement of Proposition 4.1).Let X be a finite domain, letH⊆{h:X→{0, 1}}, and letf⋆∈H. LetT be a finite index ...

work page 2011
[31]

scan inputs in increasing binary order until finding the first disagreement point among the surviving hypotheses

work page
[32]

keep the smaller label side

work page
[33]

36 The process stops when one hypothesis remains

if the split is tied, keep label0. 36 The process stops when one hypothesis remains. Thus, for eachn, the procedure returns a target in the base class and a certificate for that target inside the base class. In the worst case, halving uses logarithmically many rounds in the size of the semantic class. Table 1 reports the exact semantic class sizes and the...

work page

[1] [1]

Learning regular sets from queries and counterexamples.Information and computation, 75(2):87–106, 1987

Dana Angluin. Learning regular sets from queries and counterexamples.Information and computation, 75(2):87–106, 1987

work page 1987

[2] [2]

Queries and concept learning.Machine learning, 2(4):319–342, 1988

Dana Angluin. Queries and concept learning.Machine learning, 2(4):319–342, 1988

work page 1988

[3] [3]

Negative results for equivalence queries.Machine Learning, 5(2):121–150, 1990

Dana Angluin. Negative results for equivalence queries.Machine Learning, 5(2):121–150, 1990

work page 1990

[4] [4]

On exact specification by examples

Martin Anthony, Graham Brightwell, Dave Cohen, and John Shawe-Taylor. On exact specification by examples. InFifth annual workshop on computational learning theory, pages 311–318, 1992

work page 1992

[5] [5]

Polynomial threshold functions, hyperplane arrangements, and random tensors.SIAM Journal on Mathematics of Data Science, 1(4):699–729, 2019

Pierre Baldi and Roman Vershynin. Polynomial threshold functions, hyperplane arrangements, and random tensors.SIAM Journal on Mathematics of Data Science, 1(4):699–729, 2019

work page 2019

[6] [6]

Bshouty, Richard Cleve, Ricard Gavaldà, Sampath Kannan, and Christino Tamon

Nader H. Bshouty, Richard Cleve, Ricard Gavaldà, Sampath Kannan, and Christino Tamon. Oracles and queries that are sufficient for exact learning.Journal of Computer and System Sciences, 52(3):421–433, 1996

work page 1996

[7] [7]

Bounds in query learning

Hunter Chase and James Freitag. Bounds in query learning. InProceedings of Thirty Third Conference on Learning Theory, volume 125 ofProceedings of Machine Learning Research, pages 1142–1160. PMLR, 2020

work page 2020

[8] [8]

On the recursive teaching dimension of VC classes

Xi Chen, Yu Cheng, and Bo Tang. On the recursive teaching dimension of VC classes. InAdvances in Neural Information Processing Systems, volume 29, 2016

work page 2016

[9] [9]

Transformers in uniform TC0.Transactions on Machine Learning Research, 2025

David Chiang. Transformers in uniform TC0.Transactions on Machine Learning Research, 2025. ISSN 2835–8856

work page 2025

[10] [10]

Learning to add, multiply, and execute algorithmic instructions exactly with neural networks

Artur Back de Luca, George Giapitzakis, and Kimon Fountoulakis. Learning to add, multiply, and execute algorithmic instructions exactly with neural networks. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

work page 2026

[11] [11]

Recursive teaching dimension, VC-dimension and sample compression.Journal of Machine Learning Research, 15(89):3107–3131, 2014

Thorsten Doliwa, Gaojian Fan, Hans Ulrich Simon, and Sandra Zilles. Recursive teaching dimension, VC-dimension and sample compression.Journal of Machine Learning Research, 15(89):3107–3131, 2014

work page 2014

[12] [12]

Preference-based teaching

Ziyuan Gao, Christoph Ries, Hans Simon, and Sandra Zilles. Preference-based teaching. In29th Annual Conference on Learning Theory, volume 49 ofProceedings of Machine Learning Research, pages 971–997. PMLR, 2016

work page 2016

[13] [13]

Goldman and M.J

S.A. Goldman and M.J. Kearns. On the complexity of teaching.Journal of Computer and System Sciences, 50(1):20–31, 1995. ISSN 0022-0000

work page 1995

[14] [14]

Beyond statistical learning: Exact learning is essential for general intelligence.arXiv preprint arXiv:2506.23908, 2025

András György, Tor Lattimore, Nevena Lazić, and Csaba Szepesvári. Beyond statistical learning: Exact learning is essential for general intelligence.arXiv preprint arXiv:2506.23908, 2025

work page arXiv 2025

[15] [15]

Formal language recognition by hard attention transform- ers: Perspectives from circuit complexity.Transactions of the Association for Computational Linguistics, 10:800–810, 2022

Yiding Hao, Dana Angluin, and Robert Frank. Formal language recognition by hard attention transform- ers: Perspectives from circuit complexity.Transactions of the Association for Computational Linguistics, 10:800–810, 2022

work page 2022

[16] [16]

Generalized teaching dimensions and the query complexity of learning

Tibor Hegedüs. Generalized teaching dimensions and the query complexity of learning. InProceedings of the Eighth Annual Conference on Computational Learning Theory, pages 108––117, 1995. doi: 10.1145/225298.225311

work page doi:10.1145/225298.225311 1995

[17] [17]

Exact learning of DNF formulas using DNF hypotheses.Journal of Computer and System Sciences, 70(4):435–470, 2005

Lisa Hellerstein and Vijay Raghavan. Exact learning of DNF formulas using DNF hypotheses.Journal of Computer and System Sciences, 70(4):435–470, 2005. 11

work page 2005

[18] [18]

How many queries are needed to learn?Journal of the ACM (JACM), 43(5):840–862, 1996

Lisa Hellerstein, Krishnan Pillaipakkamnatt, Vijay Raghavan, and Dawn Wilkins. How many queries are needed to learn?Journal of the ACM (JACM), 43(5):840–862, 1996

work page 1996

[19] [19]

Quadratic upper bound for recursive teaching dimension of finite vc classes

Lunjia Hu, Ruihan Wu, Tianhong Li, and Liwei Wang. Quadratic upper bound for recursive teaching dimension of finite vc classes. InProceedings of the 2017 Conference on Learning Theory, volume 65 of Proceedings of Machine Learning Research, pages 1147–1156. PMLR, 2017

work page 2017

[20] [20]

Frontier LLMs still struggle with simple reasoning tasks.arXiv preprint arXiv:2507.07313, 2025

Alan Malek, Jiawei Ge, Nevena Lazic, Chi Jin, András György, and Csaba Szepesvári. Frontier LLMs still struggle with simple reasoning tasks.arXiv preprint arXiv:2507.07313, 2025

work page arXiv 2025

[21] [21]

A logic for expressing log-precision transformers

William Merrill and Ashish Sabharwal. A logic for expressing log-precision transformers. InThirty-seventh Conference on Neural Information Processing Systems, 2023

work page 2023

[22] [22]

The parallelism tradeoff: Limitations of log-precision transformers

William Merrill and Ashish Sabharwal. The parallelism tradeoff: Limitations of log-precision transformers. Transactions of the Association for Computational Linguistics, 11:531–545, 2023

work page 2023

[23] [23]

The expressive power of transformers with chain of thought

William Merrill and Ashish Sabharwal. The expressive power of transformers with chain of thought. In The Twelfth International Conference on Learning Representations, 2024

work page 2024

[24] [24]

Saturated transformers are constant-depth threshold circuits.Transactions of the Association for Computational Linguistics, 10:843–856, 2022

William Merrill, Ashish Sabharwal, and Noah A Smith. Saturated transformers are constant-depth threshold circuits.Transactions of the Association for Computational Linguistics, 10:843–856, 2022

work page 2022

[25] [25]

Alice in wonderland: Simple tasks reveal severe generalization and basic reasoning deficits in state-of-the-art large language models

Marianna Nezhurina, Lucia Cipolina-Kun, Mehdi Cherti, and Jenia Jitsev. Alice in wonderland: Simple tasks reveal severe generalization and basic reasoning deficits in state-of-the-art large language models. InNeurIPS 2024 Workshop on Scientific Methods for Understanding Deep Learning, 2024

work page 2024

[26] [26]

Exact learning of arithmetic with differentiable agents

Hristo Papazov, Francesco D’Angelo, and Nicolas Flammarion. Exact learning of arithmetic with differentiable agents. InThe 5th Workshop on Mathematical Reasoning and AI at NeurIPS 2025, 2025

work page 2025

[27] [27]

On the limits of proper learnability of subclasses of DNF formulas

Krishnan Pillaipakkamnatt and Vijay Raghavan. On the limits of proper learnability of subclasses of DNF formulas. InProceedings of the seventh annual conference on Computational learning theory, pages 118–129, 1994

work page 1994

[28] [28]

On the limits of efficient teachability.Information Processing Letters, 79(6):267–272, 2001

Rocco A Servedio. On the limits of efficient teachability.Information Processing Letters, 79(6):267–272, 2001

work page 2001

[29] [29]

Teachabilityincomputationallearning.New Generation Computing, 8:337–347, 1991

AyumiShinoharaandSatoruMiyano. Teachabilityincomputationallearning.New Generation Computing, 8:337–347, 1991. doi: 10.1007/BF03037091

work page doi:10.1007/bf03037091 1991

[30] [30]

Models of cooperative teaching and learning.Journal of Machine Learning Research, 12:349–384, 2011

Sandra Zilles, Steffen Lange, Robert Holte, and Martin Zinkevich. Models of cooperative teaching and learning.Journal of Machine Learning Research, 12:349–384, 2011. 12 A Generic certification tools A.1 Disjoint-block principle Proposition A.1(Restatement of Proposition 4.1).Let X be a finite domain, letH⊆{h:X→{0, 1}}, and letf⋆∈H. LetT be a finite index ...

work page 2011

[31] [31]

scan inputs in increasing binary order until finding the first disagreement point among the surviving hypotheses

work page

[32] [32]

keep the smaller label side

work page

[33] [33]

36 The process stops when one hypothesis remains

if the split is tied, keep label0. 36 The process stops when one hypothesis remains. Thus, for eachn, the procedure returns a target in the base class and a certificate for that target inside the base class. In the worst case, halving uses logarithmically many rounds in the size of the semantic class. Table 1 reports the exact semantic class sizes and the...

work page