Position: Ideas Should be the Center of Machine Learning Research

Jairo Diaz-Rodriguez

arxiv: 2605.15253 · v1 · pith:RNWOONGVnew · submitted 2026-05-14 · 💻 cs.LG

Pith reviewed 2026-05-19 17:07 UTC · model grok-4.3

classification 💻 cs.LG

keywords: machine learning research, ideas first framework, behavioral signatures, benchmark-driven engineering, theory-practice gap, mechanistic hypotheses, research equity, complexity premium

The pith

Machine learning research should center on ideas by testing the behavioral signatures they predict in modern models through tailored experiments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that machine learning has split into two unproductive paths: one that chases benchmark scores without seeking understanding and another that develops theories disconnected from actual working systems. It proposes an Ideas First framework that places ideas at the core, judging each idea by the specific behaviors or patterns it should produce inside today's models. These predicted patterns would be checked with experiments built to reveal them clearly, not with runs aimed at topping leaderboards. A sympathetic reader would care because the shift could make research more about genuine mechanisms while letting people without large compute budgets still test ideas rigorously and contribute.

Core claim

The field should adopt an Ideas First framework in which ideas are valued for the behavioral signatures they predict in modern models, and these signatures are tested through tailored experiments designed to detect the relevant patterns rather than to win leaderboards. This shift bridges the gap between theory and practice and promotes equity by removing the complexity premium, enabling rigorous scientific contributions from researchers with modest computational, financial, and human resources. Benchmarks and theorems serve as instruments for testing mechanistic hypotheses rather than as ends in themselves.

What carries the argument

The Ideas First framework, which treats ideas as the central object whose value comes from the behavioral signatures they imply in contemporary models and verifies those signatures via targeted experiments.

If this is right

Benchmarks and theorems become tools for testing mechanistic hypotheses rather than primary goals of research.
Researchers with modest resources can make rigorous contributions by designing small-scale experiments to detect predicted behaviors.
The gap between idealized theory and practical model behavior narrows as ideas are tested directly in modern systems.
Research culture shifts to value understanding of model mechanisms over metric improvements or mathematical abstraction alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This framework might encourage new methods for probing models to detect targeted behaviors without retraining from scratch.
It could extend to other complex systems where ideas need testing through observable patterns rather than full simulations.
Success metrics for papers might move toward clear demonstrations of predicted versus observed behaviors in models.
Funding and evaluation could prioritize experiments that isolate mechanisms over those that scale model size.

Load-bearing premise

Behavioral signatures predicted by ideas can be isolated and reliably detected in complex modern models using tailored experiments that do not require substantial computational resources or risk being confounded by unrelated model behaviors.

What would settle it

An attempt to isolate a specific behavioral signature predicted by a simple idea inside a large modern model that either requires full-scale training runs comparable to leaderboard work or cannot separate the signature from other effects no matter how the experiment is designed.

Figures

Figures reproduced from arXiv: 2605.15253 by Jairo Diaz-Rodriguez.

**Figure 1.** (Top) Current practice often bifurcates into Benchmark-Driven Engineering (optimizing a single metric on large systems, often obscuring mechanism) and Idealized Theory (rigorous proofs in simplified settings that may not transfer to modern models). (Bottom) Our proposed framework links these worlds: An Idea generates a concrete Signature (an observable behavioral commitment). A Tailored Experiment is the…

**Figure 2.** The framework on a hypothetical example. The Idea (Topic Inertia) predicts a specific Signature: an upward trend in embedding similarity as prompt increases. The Tailored Experiment confirms this signature in Transformer models, while the negative control (RNN) shows no such trend, validating the mechanism.
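
The check described in Figure 2 can be sketched in a few lines. The example below is a minimal illustration, not the author's code. It assumes that "as prompt increases" refers to prompt length, it uses hard-coded placeholder similarity values rather than measurements from any model, and it picks an ordinary least-squares slope with a one-sided permutation test as one reasonable way to ask whether an upward trend is present. The names `slope` and `permutation_p_value` are hypothetical.

```python
import math
import random


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """One-sided p-value: share of shuffles whose slope is >= the observed one."""
    rng = random.Random(seed)
    observed = slope(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if slope(xs, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Placeholder values for illustration only, NOT results from the paper.
prompt_lengths = [8, 16, 32, 64, 128, 256, 512, 1024]
log_len = [math.log2(n) for n in prompt_lengths]
transformer_sim = [0.41, 0.47, 0.50, 0.56, 0.61, 0.64, 0.71, 0.74]
control_sim = [0.52, 0.49, 0.51, 0.50, 0.53, 0.48, 0.51, 0.50]

slope_t = slope(log_len, transformer_sim)
slope_c = slope(log_len, control_sim)
p_t = permutation_p_value(log_len, transformer_sim)
p_c = permutation_p_value(log_len, control_sim)

print(f"transformer: slope={slope_t:.4f}, p={p_t:.4f}")
print(f"control:     slope={slope_c:.4f}, p={p_c:.4f}")
```

With real data, the two similarity lists would come from embeddings extracted from a frozen Transformer and from the RNN control at each prompt length. The statistical step stays the same and needs no training run.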

Original abstract

Machine learning research increasingly bifurcates into two disconnected modes: benchmark-driven engineering that prioritizes metrics over understanding, and idealized theory that often fails to transfer to modern systems. In this position paper, we argue that the field focuses too heavily on these endpoints, neglecting the central scientific object: the idea. We propose an Ideas First framework in which ideas are valued for the behavioral signatures they predict in modern models, and these signatures are tested through tailored experiments designed to detect the relevant patterns rather than to win leaderboards. This shift not only bridges the gap between theory and practice but also promotes equity by removing the "complexity premium," enabling rigorous scientific contributions from researchers with modest computational, financial, and human resources. Ultimately, we advocate for a research culture centered on ideas, treating benchmarks and theorems as instruments for testing mechanistic hypotheses rather than as ends in themselves.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that ML research has bifurcated into benchmark-driven engineering and idealized theory that fails to transfer, neglecting ideas as the central scientific object. It proposes an 'Ideas First' framework in which ideas are valued according to the behavioral signatures they predict in modern models; these signatures are tested via tailored experiments designed to detect relevant patterns rather than to optimize leaderboards. The shift is argued to bridge theory and practice while promoting equity by eliminating the 'complexity premium' and enabling contributions from researchers with modest resources.

Significance. If the framework can be operationalized with low-resource experiments that reliably isolate idea-specific behavioral signatures without substantial compute or confounding, the position could meaningfully shift research culture toward mechanistic hypothesis testing and greater accessibility. The manuscript correctly identifies real tensions between leaderboard chasing and transferable understanding, and its equity argument is a substantive strength worth engaging.

major comments (2)

Abstract: the central claim that 'tailored experiments designed to detect the relevant patterns rather than to win leaderboards' can be executed at modest computational, financial, and human cost while isolating idea-specific signatures is load-bearing for both the scientific and equity arguments, yet the manuscript provides no protocols, case studies, or citations demonstrating such isolation in entangled modern models.
Description of the Ideas First framework: the proposal assumes behavioral signatures can be detected without controlled ablations, multiple training runs, or large-scale evaluations to rule out optimization artifacts and data shifts, but offers no concrete design principles or existing methods that achieve this at low resource scale.

minor comments (1)

The manuscript would be strengthened by explicit references to related work in mechanistic interpretability or low-resource evaluation protocols that could serve as starting points for the proposed tailored experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments on our position paper. We value the recognition of the real tensions between benchmark-driven research and transferable understanding, as well as the equity implications. In our response below, we address the major comments point by point and indicate the revisions made to the manuscript.

Point-by-point responses

Referee: Abstract: the central claim that 'tailored experiments designed to detect the relevant patterns rather than to win leaderboards' can be executed at modest computational, financial, and human cost while isolating idea-specific signatures is load-bearing for both the scientific and equity arguments, yet the manuscript provides no protocols, case studies, or citations demonstrating such isolation in entangled modern models.

Authors: We agree that demonstrating the feasibility of low-cost isolation of idea-specific signatures is important for the arguments presented. Because this is a position paper, the original manuscript prioritized the conceptual framework over detailed empirical protocols. To strengthen the paper, we have added citations to relevant works in mechanistic interpretability and causal analysis that show how targeted experiments can be conducted with modest resources to test specific behavioral predictions. We have also included a brief outline of design principles for such experiments in a new subsection. This addresses the load-bearing claim without changing the paper's character as a position paper.

revision: yes

Referee: Description of the Ideas First framework: the proposal assumes behavioral signatures can be detected without controlled ablations, multiple training runs, or large-scale evaluations to rule out optimization artifacts and data shifts, but offers no concrete design principles or existing methods that achieve this at low resource scale.

Authors: The framework does not preclude the use of controls where necessary; it emphasizes tailoring experiments to efficiently detect predicted patterns. We acknowledge that the manuscript could benefit from more explicit guidance. In the revision, we have elaborated on the description of the framework by incorporating concrete design principles, such as using minimal interventions like input ablations on representative samples, leveraging pre-trained models for quick tests, and drawing on existing low-resource evaluation methods from the literature. We believe this provides the requested concreteness while maintaining accessibility.

revision: yes
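
One way to read the design principles named in the second response (minimal input ablations on representative samples, run against a frozen pre-trained model) is as a paired comparison between a targeted ablation and a matched control ablation. The sketch below illustrates that reading under loud assumptions: `toy_score` is a hypothetical stand-in for a model readout, the four sentences and the topic-word set are invented for illustration, and the percentile bootstrap is simply one common choice of interval, not something the paper prescribes.

```python
import random
import statistics

MASK = "[MASK]"
TOPIC_WORDS = {"river", "water", "flood", "bank"}  # invented for illustration


def ablate(tokens, positions):
    """Return a copy of tokens with the given positions replaced by MASK."""
    drop = set(positions)
    return [MASK if i in drop else tok for i, tok in enumerate(tokens)]


def toy_score(tokens):
    """Hypothetical stand-in for a frozen model's scalar readout.

    Here it is just the fraction of topic words; a real study would run
    a forward pass through a pre-trained model instead.
    """
    return sum(tok in TOPIC_WORDS for tok in tokens) / len(tokens)


def bootstrap_ci(deltas, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean per-sample difference."""
    rng = random.Random(seed)
    n = len(deltas)
    means = sorted(
        statistics.fmean(rng.choices(deltas, k=n)) for _ in range(n_boot)
    )
    lower = means[int(n_boot * alpha / 2)]
    upper = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


samples = [
    "the river rose after the storm and water covered the bank".split(),
    "flood water reached the old bridge near the river".split(),
    "people watched the water climb the bank at night".split(),
    "the flood left mud along the river bank".split(),
]

target_deltas, control_deltas = [], []
for tokens in samples:
    target = [i for i, tok in enumerate(tokens) if tok in TOPIC_WORDS]
    # Matched control: mask the same number of non-topic positions.
    non_topic = [i for i, tok in enumerate(tokens) if tok not in TOPIC_WORDS]
    control = non_topic[: len(target)]
    base = toy_score(tokens)
    target_deltas.append(base - toy_score(ablate(tokens, target)))
    control_deltas.append(base - toy_score(ablate(tokens, control)))

target_ci = bootstrap_ci(target_deltas)
control_ci = bootstrap_ci(control_deltas)
print("targeted ablation effect, 95% CI:", target_ci)
print("control ablation effect, 95% CI:", control_ci)
```

The matched control is the part that carries the argument: an effect that appears under the targeted ablation but not under an equally sized control ablation is harder to attribute to masking in general. Whether such a comparison rules out the confounds the referee raises in a large model is exactly the open question, and this toy does not settle it.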

Circularity Check

0 steps flagged

No circularity: normative position paper with no derivations or self-referential reductions

Full rationale

The manuscript is a position paper proposing an 'Ideas First' framework for ML research. It contains no equations, fitted parameters, predictions, or derivation chains. Claims rest on general observations of benchmark-driven vs. theoretical modes and advocacy for tailored experiments to test behavioral signatures. No self-citations are load-bearing, no ansatzes are smuggled, and no uniqueness theorems or renamings reduce the argument to its own inputs. This is self-contained normative reasoning, consistent with the default non-circular finding for papers lacking mathematical derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the descriptive premise that current research is bifurcated and neglects ideas, plus the normative premise that behavioral signatures can be tested accessibly; no free parameters or mathematical axioms are introduced.

axioms (1)

domain assumption: Machine learning research increasingly bifurcates into benchmark-driven engineering and idealized theory that fail to connect.
This observation is stated as the starting point for the argument in the abstract.

invented entities (1)

Ideas First framework · no independent evidence
purpose: To re-center research on ideas and their testable behavioral signatures in models.
This is a newly proposed conceptual structure without prior empirical grounding or independent validation supplied in the paper.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction

Relation between the paper passage and the cited Recognition theorem: unclear

We propose an Ideas First framework in which ideas are valued for the behavioral signatures they predict in modern models, and these signatures are tested through tailored experiments designed to detect the relevant patterns rather than to win leaderboards.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education

Clancey, William J. Communication, Simulation, and Intelligent Agents: Implications of Personal Intelligent Machines for Medical Education. Proceedings of the Eighth International Joint Conference on Artificial Intelligence (IJCAI-83)

work page

[2] [2]

Classification Problem Solving

Clancey, William J. Classification Problem Solving. Proceedings of the Fourth National Conference on Artificial Intelligence

work page

[3] [3]

, title =

Robinson, Arthur L. , title =. 1980 , doi =. https://science.sciencemag.org/content/208/4447/1019.full.pdf , journal =

work page 1980

[4] [4]

New Ways to Make Microcircuits Smaller---Duplicate Entry

Robinson, Arthur L. New Ways to Make Microcircuits Smaller---Duplicate Entry. Science

work page

[5] [5]

Clancey and Glenn Rennels , abstract =

Diane Warner Hasling and William J. Clancey and Glenn Rennels , abstract =. Strategic explanations for a diagnostic consultation system , journal =. 1984 , issn =. doi:https://doi.org/10.1016/S0020-7373(84)80003-6 , url =

work page doi:10.1016/s0020-7373(84)80003-6 1984

[6] [6]

and Rennels, Glenn R

Hasling, Diane Warner and Clancey, William J. and Rennels, Glenn R. and Test, Thomas. Strategic Explanations in Consultation---Duplicate. The International Journal of Man-Machine Studies

work page

[7] [7]

Poligon: A System for Parallel Problem Solving

Rice, James. Poligon: A System for Parallel Problem Solving

work page

[8] [8]

Transfer of Rule-Based Expertise through a Tutorial Dialogue

Clancey, William J. Transfer of Rule-Based Expertise through a Tutorial Dialogue

work page

[9] [9]

The Engineering of Qualitative Models

Clancey, William J. The Engineering of Qualitative Models

work page

[10] [10]

2017 , eprint=

Attention Is All You Need , author=. 2017 , eprint=

work page 2017

[11] [11]

Pluto: The 'Other' Red Planet

NASA. Pluto: The 'Other' Red Planet

work page

[12] [12]

Troubling Trends in Machine Learning Scholarship

Troubling Trends in Machine Learning Scholarship , author=. arXiv:1807.03341 , year=

work page internal anchor Pith review Pith/arXiv arXiv

[13] [13]

ICLR , year=

Scaling Laws for Neural Language Models , author=. ICLR , year=

work page

[14] [14]

Proceedings of the 36th International Conference on Neural Information Processing Systems , pages=

Training compute-optimal large language models , author=. Proceedings of the 36th International Conference on Neural Information Processing Systems , pages=

work page

[15] [15]

Advances in neural information processing systems , year=

Neural Tangent Kernel: Convergence and Generalization in Neural Networks , author=. Advances in neural information processing systems , year=

work page

[16] [16]

NeurIPS , year=

On the Convergence Rate of Training Recurrent Neural Networks , author=. NeurIPS , year=

work page

[17] [17]

Distill , year=

Zoom In: An Introduction to Circuits , author=. Distill , year=

work page

[18] [18]

Transformer Circuits Thread , year=

A Mathematical Framework for Transformer Circuits , author=. Transformer Circuits Thread , year=

work page

[19] In-context Learning and Induction Heads. Transformer Circuits Thread.

[20] Causal Abstractions of Neural Networks. NeurIPS.

[21] Locating and Editing Factual Knowledge in GPT. NeurIPS.

[22] Deep Double Descent: Where Bigger Models and More Data Hurt. ICLR.

[23] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR.

[24] Denoising Diffusion Probabilistic Models. NeurIPS.

[25] Understanding intermediate layers using linear classifier probes. arXiv preprint arXiv:1610.01644.

[26] A Structural Probe for Finding Syntax in Word Representations. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

[27] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. International Conference on Learning Representations.

[28] Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent. Advances in Neural Information Processing Systems (NeurIPS).

[29] The Implicit Bias of Gradient Descent on Separable Data. Journal of Machine Learning Research.

[30] Gradient Descent Maximizes the Margin of Homogeneous Neural Networks. International Conference on Learning Representations (ICLR).

[31] ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems.

[32] Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[33] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Proceedings of the 36th International Conference on Machine Learning.

[34] Learning Transferable Visual Models From Natural Language Supervision. Proceedings of the 38th International Conference on Machine Learning.

[35] Language models are few-shot learners. Advances in Neural Information Processing Systems.

[36] Spectrally-Normalized Margin Bounds for Neural Networks. Advances in Neural Information Processing Systems.

[37] Norm-Based Capacity Control in Neural Networks. Proceedings of the 28th Conference on Learning Theory.

[38] Exponential Expressivity in Deep Neural Networks through Transient Chaos. Advances in Neural Information Processing Systems.

[39] Deep Information Propagation. International Conference on Learning Representations.

[40] Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks. International Conference on Learning Representations.

[41] Benign Overfitting in Linear Regression. Proceedings of the National Academy of Sciences.

[42] Understanding Deep Learning Requires Rethinking Generalization. International Conference on Learning Representations.

[43] Reconciling Modern Machine-Learning Practice and the Classical Bias-Variance Trade-off. Proceedings of the National Academy of Sciences.

[44] Deep Double Descent: Where Bigger Models and More Data Hurt. International Conference on Learning Representations (ICLR). arXiv preprint arXiv:1912.02292.

[45] Prevalence of Neural Collapse During the Terminal Phase of Deep Learning Training. Proceedings of the National Academy of Sciences.

[46] SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability. Advances in Neural Information Processing Systems.

[47] From Protoscience to Epistemic Monoculture: How Benchmarking Set the Stage for the Deep Learning Revolution. arXiv preprint arXiv:2404.06647.

[48] The Benchmark Lottery. Proceedings of the 9th International Conference on Learning Representations (ICLR).

[49] Accounting for variance in machine learning benchmarks. Proceedings of Machine Learning and Systems.

[50] When benchmarks are targets: Revealing the sensitivity of large language model leaderboards. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).

[51] Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation. Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES).

[52] Cheng, Zerui; Wohnig, Stella; Gupta, Ruchika; Alam, Samiul; Abdullahi, Tassallah; Alves Ribeiro, Jo[…]. Benchmarking is Broken: […]. Advances in Neural Information Processing Systems.

[53] The Leaderboard Illusion. Advances in Neural Information Processing Systems, vol. 39.

[54] Ethayarajh, Kawin; Swayamdipta, Swabha; Choi, Yejin. Utility is in the Eye of the User: A Critique of […].

[55] Global Benchmarks for AI: Mapping the Geography of Evaluation. Preprint.

[56] Deng, Jia; Dong, Wei; Socher, Richard; Li, Li-Jia; Li, Kai; Fei-Fei, Li.

[57] Wang, Alex; Singh, Amanpreet; Michael, Julian; Hill, Felix; Levy, Omer; Bowman, Samuel R.

[58] Why Machine Learning Needs Benchmarks. 2018.

[59] Kiela, Douwe; Bartolo, Maxime; Nie, Yixin; Kaushik, Divyansh; Geiger, Atticus; […]. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics.

[60] Scientific Machine Learning Benchmarks. Philosophical Transactions of the Royal Society A.

[61] Statistical Learning Theory.

[62] Understanding Machine Learning: From Theory to Algorithms.

[63] The Discipline of Machine Learning. Technical Report, Carnegie Mellon University.

[64] Understanding Deep Learning Requires Rethinking Generalization. International Conference on Learning Representations (ICLR).

[65] The Science of Deep Learning. Proceedings of the National Academy of Sciences.

[66] Understanding Deep Learning.

[67] A Survey on Neurodynamics of Deep Learning: A Unified Perspective on Architectures and Optimization Algorithms. 2025.

[68] Theory of Deep Learning: A Primer. Nature Communications.

[69] A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level. Proceedings of the National Academy of Sciences.

[70] How the Laws of Physics Lie.

[71] Science and Partial Truth: A Unitary Approach to Models and Scientific Reasoning.

[72] Scientific perspectivism. 2019.

[73] Models in Science. The Stanford Encyclopedia of Philosophy.

[74] Scientific method: Defend the integrity of physics. Nature. 2014.

[75] Lost in Math: How Beauty Leads Physics Astray.

[76] Representing and Intervening: Introductory Topics in the Philosophy of Natural Science.

[77] No Easy Answers: Science and the Pursuit of Knowledge. 2005.

[78] Experiments in History and Philosophy of Science. In The Philosophy of Scientific Experimentation.

[79] The Philosophy of Scientific Experimentation.

[80] Experiment in Physics. The Stanford Encyclopedia of Philosophy.