Automated Adversarial Collaboration for Advancing Theory Building in the Cognitive Sciences

Akshay Jagadish; George Kachergis; Suyog Chandramouli

arxiv: 2604.25521 · v1 · submitted 2026-04-28 · 💻 cs.AI

Suyog Chandramouli, George Kachergis, Akshay Jagadish

Pith reviewed 2026-05-07 16:08 UTC · model grok-4.3

classification 💻 cs.AI

keywords automated theory adjudication · cognitive science · LLM agents · program synthesis · information-theoretic design · categorization theories · simulation study · adversarial collaboration

The pith

An automated AI framework can adjudicate competing cognitive theories by discovering models and experiments in a closed loop.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that automates the comparison of theories in cognitive science using AI. LLM-based agents represent different theories, program synthesis creates computational models, and information-theoretic methods design experiments, all operating together in a repeating cycle. This setup allows theories to be tested and integrated even when the specific models and experiments are not known in advance. In simulations based on three established theories of how people categorize objects, the system identified the correct underlying theory despite added noise, though accuracy declined in the most difficult cases. The work shows how such a system could reduce reliance on isolated experiments and help connect findings across different cognitive tasks.

Core claim

The framework recovers the ground-truth theory across noise settings in a simulation study spanning three classic categorization theories, with weaker reliability in the hardest settings. The system combines LLM-based theory agents, program synthesis, and information-theoretic experimental design in a closed loop to adjudicate among competing theories even when the candidate models and experiments must be discovered during the adjudication process.

What carries the argument

The closed-loop automated adversarial collaboration framework that integrates LLM-based theory agents with program synthesis and information-theoretic experimental design.
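The loop described above can be illustrated with a deliberately small sketch. It makes several assumptions that are not taken from the paper: the LLM theory agents and program synthesis are replaced by three fixed, hand-written toy models (`prototype`, `exemplar`, `rule`) on a one-dimensional stimulus; noise is modelled as a lapse rate (`with_lapse`); and the information-theoretic design step is taken to be greedy selection of the stimulus with the highest mutual information between model identity and the binary response. All names, the training set, and the parameter values are hypothetical.

```python
import numpy as np

# Hypothetical 1-D training set: label 1 = category A, label 0 = category B.
TRAIN_X = np.array([0.1, 0.2, 0.6, 0.4, 0.8, 0.9])
TRAIN_Y = np.array([1, 1, 1, 0, 0, 0])
A_X, B_X = TRAIN_X[TRAIN_Y == 1], TRAIN_X[TRAIN_Y == 0]

def prototype(x, c=5.0):
    """P(A | x) from similarity to each category mean."""
    sa = np.exp(-c * abs(x - A_X.mean()))
    sb = np.exp(-c * abs(x - B_X.mean()))
    return sa / (sa + sb)

def exemplar(x, c=20.0):
    """P(A | x) from summed similarity to stored exemplars."""
    sa = np.exp(-c * np.abs(x - A_X)).sum()
    sb = np.exp(-c * np.abs(x - B_X)).sum()
    return sa / (sa + sb)

def rule(x, boundary=0.5):
    """P(A | x) from a near-deterministic threshold rule."""
    return 0.95 if x < boundary else 0.05

# Stand-ins for the models that theory agents would propose.
MODELS = [prototype, exemplar, rule]

def with_lapse(p, lapse):
    """Assumed noise model: respond at random on a fraction `lapse` of trials."""
    return (1 - lapse) * p + lapse / 2

def entropy(p):
    """Binary entropy in nats, clipped for numerical safety."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def info_gain(x, posterior, lapse):
    """Mutual information between model identity and the response at x."""
    preds = np.array([with_lapse(m(x), lapse) for m in MODELS])
    return entropy(posterior @ preds) - posterior @ entropy(preds)

def adjudicate(true_idx, lapse=0.1, n_trials=200, seed=0):
    """Run the loop against simulated data; return the posterior over MODELS."""
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 41)
    log_post = np.log(np.full(len(MODELS), 1 / len(MODELS)))
    for _ in range(n_trials):
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        # Design step: pick the most informative stimulus.
        x = max(candidates, key=lambda s: info_gain(s, post, lapse))
        # Data step: simulate a response from the ground-truth model.
        resp = rng.random() < with_lapse(MODELS[true_idx](x), lapse)
        # Update step: Bayesian update of the posterior over models.
        preds = np.array([with_lapse(m(x), lapse) for m in MODELS])
        log_post += np.log(preds if resp else 1 - preds)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()
```

Calling `adjudicate(true_idx=1)` returns a posterior over the three toy models when the exemplar model generates the data. Raising `lapse` toward 1 makes responses less informative, which is one simple way to mimic harder noise settings.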

If this is right

Theory evaluation can integrate evidence across multiple tasks instead of remaining limited to narrow paradigms.
Competing models can be generated and tested automatically without researchers pre-specifying them.
Adjudication among theories becomes possible through in-silico loops before committing to real-world experiments.
The approach supplies a concrete proof of concept for closed-loop theory building in cognitive science.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This method could lower the influence of researcher-chosen experimental designs on which theories survive.
The same loop structure might apply to theory comparison in other fields that rely on computational models.
Testing the framework on existing public datasets from psychology would reveal how well it handles real behavioral variability.

Load-bearing premise

LLM-based theory agents can faithfully represent, discover, and adjudicate among competing cognitive models without introducing systematic biases or hallucinations.

What would settle it

Apply the framework to human data from categorization experiments and check whether it selects a theory whose predictions match independent held-out data better than the other theories.
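One way to make this check concrete is to score each theory's predictions on held-out trials by log-likelihood and pick the highest. The sketch below assumes binary category responses and that each theory supplies a predicted probability of responding "A" per held-out trial; the function names and the dictionary layout are illustrative, not part of the paper.

```python
import numpy as np

def heldout_log_likelihood(pred_prob_a, responses):
    """Sum of log-probabilities a theory assigns to held-out binary responses.

    pred_prob_a: predicted P(response = A) per trial.
    responses:   observed responses, 1 for A and 0 for B.
    """
    p = np.clip(np.asarray(pred_prob_a, dtype=float), 1e-9, 1 - 1e-9)
    r = np.asarray(responses, dtype=int)
    return float(np.sum(r * np.log(p) + (1 - r) * np.log(1 - p)))

def best_theory(predictions, responses):
    """Return the best-scoring theory name and the per-theory scores.

    predictions: dict mapping a theory name to its per-trial P(A).
    """
    scores = {name: heldout_log_likelihood(p, responses)
              for name, p in predictions.items()}
    return max(scores, key=scores.get), scores
```

The framework's selection would count as supported if the theory it picks is also the one `best_theory` returns on data it never saw during adjudication.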

Figures

Figures reproduced from arXiv: 2604.25521 by Akshay Jagadish, George Kachergis, Suyog Chandramouli.

**Figure 1.** Left panel: Adversarial collaboration loop for theory adjudication and open experiment design. Right panel:

Original abstract

Cognitive science often evaluates theories through narrow paradigms and local model comparisons, limiting the integration of evidence across tasks and realizations. We introduce an automated adversarial collaboration framework for adjudicating among competing theories even when the candidate models and experiments must be discovered during the adjudication process. The system combines LLM-based theory agents, program synthesis, and information-theoretic experimental design in a closed loop. In a simulation study spanning three classic categorization theories, the framework recovered the ground-truth theory across noise settings with weaker reliability in the hardest settings. Together, the framework and findings provide a concrete proof of concept for closed-loop, in-silico theory adjudication in cognitive science.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The closed-loop LLM-plus-synthesis framework is a fresh architecture for dynamic theory adjudication, but the simulation recovery is too underspecified to rule out training-data artifacts.

The paper introduces a framework for automated adversarial collaboration in cognitive science that uses LLM theory agents, program synthesis for discovering models and experiments, and information-theoretic design to adjudicate competing theories in a closed loop. The key result is that in simulations with three classic categorization theories, the system recovered the ground-truth theory across various noise levels, though less reliably in the noisiest conditions. This combination is new and addresses a real limitation in the field by allowing dynamic discovery rather than fixed comparisons. The simulation recovery provides initial evidence that such a system can function, which is a solid starting point for the proof of concept.

The main soft spots lie in the lack of transparency around implementation. There are no specifics on how the LLM agents are prompted or constrained to avoid biases, what the exact noise models were, or any controls for the LLMs potentially favoring theories based on their training data. The drop in performance under high noise is concerning because that is where true adjudication is most valuable, and it is consistent with the possibility that success stems from pre-existing knowledge in the models rather than from the process. Without those details, it is difficult to assess how robust or generalizable the approach is.

This paper is for cognitive scientists interested in computational methods to build and test theories more broadly, or AI researchers working on automated science. It deserves peer review because the idea targets an important problem with a concrete architecture and some positive simulation results, even though it will require more rigorous validation and bias analysis to strengthen the claims.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces an automated adversarial collaboration framework for theory adjudication in cognitive science. It combines LLM-based theory agents, program synthesis, and information-theoretic experimental design in a closed loop to adjudicate among competing theories, including cases where models and experiments must be discovered dynamically. In a simulation study with three classic categorization theories (prototype, exemplar, rule-based), the framework recovers the ground-truth theory across noise regimes, with weaker reliability in the hardest noise settings. The work is positioned as a proof-of-concept for closed-loop, in-silico theory building.

Significance. If the simulation result holds after addressing implementation details, the framework could meaningfully advance cognitive science by enabling scalable, automated integration of evidence across paradigms and reducing reliance on narrow local comparisons. The closed-loop design incorporating program synthesis and information-theoretic selection is a clear strength, as is the explicit validation against known ground-truth theories in simulation. These elements provide a concrete starting point for automated theory adjudication, though the approach's dependence on LLMs requires rigorous safeguards.

major comments (2)

[Abstract / Simulation Study] Abstract and Simulation Study section: The abstract and simulation description provide no details on LLM agent implementation, exact noise models, exclusion criteria for agent outputs, or controls for LLM-specific biases (e.g., pre-training priors). This is load-bearing for the central claim of ground-truth recovery, as the weaker performance in high-noise regimes is precisely where such biases could most distort adjudication.
[Simulation Study] Simulation Study section: The evaluation relies on externally supplied ground-truth theories, but lacks ablations or controls to isolate whether recovery stems from the adversarial collaboration loop versus statistical regularities in the LLMs' training data. Without these, the proof-of-concept does not yet demonstrate faithful adjudication of novel or under-represented theories.

minor comments (2)

[Methods] Clarify in the methods whether the three categorization theories were pre-specified or discovered by the agents during the loop, to better align with the framework's stated capability for dynamic discovery.
[Results] Ensure simulation results tables or figures explicitly report per-theory recovery rates, noise parameter values, and any statistical tests for reliability across runs.
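The per-theory reporting requested above could be produced from a table of run outcomes. The sketch below assumes the outcomes are stored as counts indexed by ground-truth theory and then by the theory the framework selected, and it uses a Wilson score interval as one reasonable choice of reliability measure; the paper does not say which statistic it reports, and the function names are illustrative.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (default ~95%)."""
    if n == 0:
        return (0.0, 1.0)
    phat = successes / n
    denom = 1 + z ** 2 / n
    centre = (phat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z ** 2 / (4 * n ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

def recovery_table(confusion):
    """Per-theory recovery rate with interval.

    confusion[true_theory][selected_theory] = number of simulation runs.
    """
    table = {}
    for true_name, row in confusion.items():
        n = sum(row.values())
        k = row.get(true_name, 0)
        table[true_name] = {
            "rate": k / n if n else float("nan"),
            "n": n,
            "ci95": wilson_interval(k, n),
        }
    return table
```

Building one such table per noise setting would make the decline in the hardest settings directly visible, theory by theory.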

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation for major revision. The comments identify key areas where additional transparency will strengthen the manuscript. We respond to each point below and will incorporate the necessary changes.

Referee: [Abstract / Simulation Study] Abstract and Simulation Study section: The abstract and simulation description provide no details on LLM agent implementation, exact noise models, exclusion criteria for agent outputs, or controls for LLM-specific biases (e.g., pre-training priors). This is load-bearing for the central claim of ground-truth recovery, as the weaker performance in high-noise regimes is precisely where such biases could most distort adjudication.

Authors: We agree that these implementation details are essential for evaluating the results. In the revised manuscript we will expand the Simulation Study section with a new 'Implementation Details' subsection. This will specify the LLM models and versions used for theory agents, exact prompting strategies and temperature settings, the mathematical formulation of the noise models applied to the categorization data, the exclusion criteria for filtering invalid or inconsistent agent outputs, and controls for LLM-specific biases including sensitivity checks across multiple model backends and explicit bias-detection prompts. These additions will directly support interpretation of the ground-truth recovery rates, especially in the high-noise regime.

revision: yes

Referee: [Simulation Study] Simulation Study section: The evaluation relies on externally supplied ground-truth theories, but lacks ablations or controls to isolate whether recovery stems from the adversarial collaboration loop versus statistical regularities in the LLMs' training data. Without these, the proof-of-concept does not yet demonstrate faithful adjudication of novel or under-represented theories.

Authors: This observation correctly identifies a limitation in the current validation. We will add a dedicated 'Limitations and Scope' paragraph clarifying that the simulation uses established categorization theories to test recovery of known ground truth, while the framework itself is designed for dynamic discovery via program synthesis. We will explain how the closed-loop adversarial process and information-theoretic experiment selection encourage behavior beyond static training-data regularities, as reflected in the differential recovery performance across noise levels. We will also outline planned future ablations using non-LLM theory generators. The revision will therefore position the work more precisely as a proof-of-concept for the closed-loop architecture rather than a comprehensive demonstration for novel theories.

revision: partial

Circularity Check

0 steps flagged

Simulation recovery uses externally supplied ground-truth benchmarks

The paper's central empirical claim is a simulation study in which the automated framework recovers externally provided ground-truth categorization theories (prototype, exemplar, rule-based) across noise levels. This recovery metric is defined by agreement with independent, pre-specified ground-truth models rather than by any parameter fitted to the framework's own outputs or by a self-referential definition. No derivation step, equation, or load-bearing premise reduces to a fitted input renamed as prediction, a self-citation chain, or an ansatz smuggled via prior work. The evaluation therefore remains non-circular and externally falsifiable.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the untested assumption that LLM agents can act as unbiased proxies for cognitive theories and that program synthesis can generate sufficiently rich experiment spaces; no free parameters or invented physical entities are mentioned.

axioms (1)

Domain assumption: LLM-based agents can serve as faithful, unbiased proxies for competing cognitive theories.
Invoked to enable the closed-loop discovery and adjudication process described in the abstract.

pith-pipeline@v0.9.0 · 5405 in / 1211 out tokens · 60591 ms · 2026-05-07T16:08:24.753340+00:00 · methodology

Reference graph

Works this paper leans on

5 extracted references · 2 canonical work pages

[1]

Ashby, F. G., & Maddox, W. T. (2005). Human Category Learning. Annual Review of Psychology, 56(1), 149–178. https://doi.org/10.1146/annurev.psych.56.091103.070217.
Binz, M., Akata, E., Bethge, M., Brändle, F., Callaway, F., Coda-Forno, J., Dayan, P., Demircan, C., Eckstein, M. K., Éltető, N., et al. (2025). A foundation model to predict and capture huma...

work page doi:10.1146/annurev.psych 2005
[2]

Griffiths, T. L. (2015). Manifesto for a new (computational) cognitive revolution. Cognition, 135, 21–23.
Hartshorne, J. K., de Leeuw, J. R., Goodman, N. D., Jennings, M., & O’Donnell, T. J. (2019). A thousand studies for the price of one: Accelerating psychological science with pushkin. Behavior research methods, 51(4), 1782–1803.
Jagadish, A. K., Rmus, M., W...

work page arXiv 2015
[3]

Love, B. C., Medin, D. L., & Gureckis, T. M. (2004). Sustain: A network model of category learning. Psychological review, 111(2),

2004
[4]

Marr, D., & Vaina, L. (1982). Representation and recognition of the movements of shapes. Proceedings of the Royal Society of London. Series B. Biological Sciences, 214(1197), 501–524.
Mellers, B., Hertwig, R., & Kahneman, D. (2001). Do frequency representations eliminate conjunction effects? an exercise in adversarial collaboration. Psychological Scien...

1982
[5]

Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D., & Griffiths, T. L. (2021). Using large-scale experiments and machine learning to discover theories of human decision-making. Science, 372(6547), 1209–1214. https://doi.org/10.1126/science.abe2629.
Rmus, M., Jagadish, A. K., Mathony, M., Ludwig, T., & Schulz, E. (2025). Generating computationa...

2021
