Automated Adversarial Collaboration for Advancing Theory Building in the Cognitive Sciences
Pith reviewed 2026-05-07 16:08 UTC · model grok-4.3
The pith
An automated AI framework can adjudicate competing cognitive theories by discovering models and experiments in a closed loop.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework recovers the ground-truth theory across noise settings in a simulation study spanning three classic categorization theories, with weaker reliability in the hardest settings. The system combines LLM-based theory agents, program synthesis, and information-theoretic experimental design in a closed loop to adjudicate among competing theories even when the candidate models and experiments must be discovered during the adjudication process.
What carries the argument
The closed-loop automated adversarial collaboration framework that integrates LLM-based theory agents with program synthesis and information-theoretic experimental design.
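The information-theoretic design step can be illustrated with a minimal sketch. It assumes binary category responses and a discrete posterior over theories, and it scores each candidate experiment by the mutual information between theory and response. The function names, stimulus labels, and prediction values are hypothetical and are not taken from the paper, which may use a different criterion.

```python
import math


def entropy(probs):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior, likelihoods):
    """Mutual information between theory and a binary response.

    prior[i]       : current P(theory i)
    likelihoods[i] : P(response = category A | theory i) for one candidate
                     experiment (hypothetical model predictions)
    """
    p_a = sum(w * q for w, q in zip(prior, likelihoods))
    marginal = entropy([p_a, 1 - p_a])
    conditional = sum(w * entropy([q, 1 - q]) for w, q in zip(prior, likelihoods))
    return marginal - conditional


def select_experiment(prior, candidates):
    """Return the name of the candidate with the highest expected gain."""
    return max(candidates, key=lambda name: expected_information_gain(prior, candidates[name]))


def update_posterior(prior, likelihoods, chose_a):
    """Bayes update over theories after observing one binary response."""
    unnorm = [w * (q if chose_a else 1 - q) for w, q in zip(prior, likelihoods)]
    total = sum(unnorm)
    if total == 0:
        raise ValueError("observed response has zero probability under every theory")
    return [u / total for u in unnorm]


if __name__ == "__main__":
    # Illustrative values only: three theories, two candidate stimuli.
    prior = [1 / 3, 1 / 3, 1 / 3]
    candidates = {
        "stimulus_x": [0.9, 0.9, 0.9],  # all theories agree, so nothing is learned
        "stimulus_y": [0.9, 0.1, 0.5],  # theories disagree, so the response is diagnostic
    }
    chosen = select_experiment(prior, candidates)
    print(chosen, update_posterior(prior, candidates[chosen], chose_a=True))
```

A stimulus on which all theories make the same prediction has zero expected gain, which is why a loop of this kind tends to pick experiments where the competing models disagree.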
If this is right
- Theory evaluation can integrate evidence across multiple tasks instead of remaining limited to narrow paradigms.
- Competing models can be generated and tested automatically without researchers pre-specifying them.
- Adjudication among theories becomes possible through in-silico loops before committing to real-world experiments.
- The approach supplies a concrete proof of concept for closed-loop theory building in cognitive science.
Where Pith is reading between the lines
- This method could lower the influence of researcher-chosen experimental designs on which theories survive.
- The same loop structure might apply to theory comparison in other fields that rely on computational models.
- Testing the framework on existing public datasets from psychology would reveal how well it handles real behavioral variability.
Load-bearing premise
LLM-based theory agents can faithfully represent, discover, and adjudicate among competing cognitive models without introducing systematic biases or hallucinations.
What would settle it
Apply the framework to human data from categorization experiments and check whether it selects a theory whose predictions match independent held-out data better than the other theories.
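One way to make this check concrete is to score each theory by its log-likelihood on held-out trials. The sketch below assumes binary category responses and that each theory supplies a predicted probability of category A per held-out trial. The function names and the example numbers are hypothetical placeholders, not results or code from the paper.

```python
import math


def heldout_log_likelihood(predicted_probs, responses, eps=1e-9):
    """Sum of log P(observed response) under one theory's predictions.

    predicted_probs[i] : predicted P(category A) on held-out trial i
    responses[i]       : 1 if the participant chose A, else 0
    """
    total = 0.0
    for p, r in zip(predicted_probs, responses):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += math.log(p if r == 1 else 1 - p)
    return total


def best_theory(predictions_by_theory, responses):
    """Return (name of best-scoring theory, dict of all scores)."""
    scores = {
        name: heldout_log_likelihood(probs, responses)
        for name, probs in predictions_by_theory.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Placeholder data for illustration only.
    responses = [1, 1, 0, 1]
    predictions = {
        "prototype": [0.8, 0.7, 0.3, 0.9],
        "exemplar": [0.5, 0.5, 0.5, 0.5],
        "rule-based": [0.2, 0.3, 0.7, 0.1],
    }
    print(best_theory(predictions, responses))
```

The framework would pass this test if the theory it selects is also the one with the highest held-out score, on data that played no part in the adjudication loop.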
Original abstract
Cognitive science often evaluates theories through narrow paradigms and local model comparisons, limiting the integration of evidence across tasks and realizations. We introduce an automated adversarial collaboration framework for adjudicating among competing theories even when the candidate models and experiments must be discovered during the adjudication process. The system combines LLM-based theory agents, program synthesis, and information-theoretic experimental design in a closed loop. In a simulation study spanning three classic categorization theories, the framework recovered the ground-truth theory across noise settings with weaker reliability in the hardest settings. Together, the framework and findings provide a concrete proof of concept for closed-loop, in-silico theory adjudication in cognitive science.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces an automated adversarial collaboration framework for theory adjudication in cognitive science. It combines LLM-based theory agents, program synthesis, and information-theoretic experimental design in a closed loop to adjudicate among competing theories, including cases where models and experiments must be discovered dynamically. In a simulation study with three classic categorization theories (prototype, exemplar, rule-based), the framework recovers the ground-truth theory across noise regimes, with weaker reliability in the hardest noise settings. The work is positioned as a proof-of-concept for closed-loop, in-silico theory building.
Significance. If the simulation result holds after addressing implementation details, the framework could meaningfully advance cognitive science by enabling scalable, automated integration of evidence across paradigms and reducing reliance on narrow local comparisons. The closed-loop design incorporating program synthesis and information-theoretic selection is a clear strength, as is the explicit validation against known ground-truth theories in simulation. These elements provide a concrete starting point for automated theory adjudication, though the approach's dependence on LLMs requires rigorous safeguards.
Major comments (2)
- [Abstract / Simulation Study] Abstract and Simulation Study section: The abstract and simulation description provide no details on LLM agent implementation, exact noise models, exclusion criteria for agent outputs, or controls for LLM-specific biases (e.g., pre-training priors). This is load-bearing for the central claim of ground-truth recovery, as the weaker performance in high-noise regimes is precisely where such biases could most distort adjudication.
- [Simulation Study] Simulation Study section: The evaluation relies on externally supplied ground-truth theories, but lacks ablations or controls to isolate whether recovery stems from the adversarial collaboration loop versus statistical regularities in the LLMs' training data. Without these, the proof-of-concept does not yet demonstrate faithful adjudication of novel or under-represented theories.
Minor comments (2)
- [Methods] Clarify in the methods whether the three categorization theories were pre-specified or discovered by the agents during the loop, to better align with the framework's stated capability for dynamic discovery.
- [Results] Ensure simulation results tables or figures explicitly report per-theory recovery rates, noise parameter values, and any statistical tests for reliability across runs.
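The per-theory reporting requested in the last comment could be tabulated as in the following sketch. It assumes each simulation run is logged as a (ground-truth theory, noise level, selected theory) triple and attaches a Wilson score interval to each recovery rate. The record layout, the noise labels, and the choice of interval are assumptions for illustration, not the authors' analysis.

```python
import math
from collections import defaultdict


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    phat = successes / n
    denom = 1 + z * z / n
    center = (phat + z * z / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def recovery_table(runs):
    """Aggregate runs into recovery rates per (true theory, noise level).

    runs: iterable of (true_theory, noise_level, selected_theory) tuples.
    Returns {(true_theory, noise_level): (rate, (ci_low, ci_high))}.
    """
    counts = defaultdict(lambda: [0, 0])  # [recovered, total]
    for true_theory, noise_level, selected in runs:
        cell = counts[(true_theory, noise_level)]
        cell[0] += int(selected == true_theory)
        cell[1] += 1
    return {key: (k / n, wilson_interval(k, n)) for key, (k, n) in counts.items()}


if __name__ == "__main__":
    # Placeholder log entries, not results from the paper.
    runs = [
        ("prototype", "low", "prototype"),
        ("prototype", "low", "exemplar"),
        ("exemplar", "high", "exemplar"),
    ]
    for key, (rate, ci) in recovery_table(runs).items():
        print(key, round(rate, 3), tuple(round(x, 3) for x in ci))
```

Reporting an interval alongside each rate makes it easier to judge whether the weaker recovery in the hardest settings reflects a real drop or run-to-run variability.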
Simulated Author's Rebuttal
We thank the referee for the constructive review and the recommendation for major revision. The comments identify key areas where additional transparency will strengthen the manuscript. We respond to each point below and will incorporate the necessary changes.
- Referee: [Abstract / Simulation Study] Abstract and Simulation Study section: The abstract and simulation description provide no details on LLM agent implementation, exact noise models, exclusion criteria for agent outputs, or controls for LLM-specific biases (e.g., pre-training priors). This is load-bearing for the central claim of ground-truth recovery, as the weaker performance in high-noise regimes is precisely where such biases could most distort adjudication.
Authors: We agree that these implementation details are essential for evaluating the results. In the revised manuscript we will expand the Simulation Study section with a new 'Implementation Details' subsection. This will specify the LLM models and versions used for theory agents, exact prompting strategies and temperature settings, the mathematical formulation of the noise models applied to the categorization data, the exclusion criteria for filtering invalid or inconsistent agent outputs, and controls for LLM-specific biases including sensitivity checks across multiple model backends and explicit bias-detection prompts. These additions will directly support interpretation of the ground-truth recovery rates, especially in the high-noise regime. revision: yes
- Referee: [Simulation Study] Simulation Study section: The evaluation relies on externally supplied ground-truth theories, but lacks ablations or controls to isolate whether recovery stems from the adversarial collaboration loop versus statistical regularities in the LLMs' training data. Without these, the proof-of-concept does not yet demonstrate faithful adjudication of novel or under-represented theories.
Authors: This observation correctly identifies a limitation in the current validation. We will add a dedicated 'Limitations and Scope' paragraph clarifying that the simulation uses established categorization theories to test recovery of known ground truth, while the framework itself is designed for dynamic discovery via program synthesis. We will explain how the closed-loop adversarial process and information-theoretic experiment selection encourage behavior beyond static training-data regularities, as reflected in the differential recovery performance across noise levels. We will also outline planned future ablations using non-LLM theory generators. The revision will therefore position the work more precisely as a proof-of-concept for the closed-loop architecture rather than a comprehensive demonstration for novel theories. revision: partial
Circularity Check
Simulation recovery uses externally supplied ground-truth benchmarks
The paper's central empirical claim is a simulation study in which the automated framework recovers externally provided ground-truth categorization theories (prototype, exemplar, rule-based) across noise levels. This recovery metric is defined by agreement with independent, pre-specified ground-truth models rather than by any parameter fitted to the framework's own outputs or by a self-referential definition. No derivation step, equation, or load-bearing premise reduces to a fitted input renamed as prediction, a self-citation chain, or an ansatz smuggled via prior work. The evaluation therefore remains non-circular and externally falsifiable.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLM-based agents can serve as faithful, unbiased proxies for competing cognitive theories.