
arxiv: 2606.26448 · v1 · pith:2ADTQCYPnew · submitted 2026-06-24 · 🧬 q-bio.NC · cs.AI

Closing the Loop to Discover Psychological Theories with an Automated Cognitive Scientist

Pith reviewed 2026-06-26 00:20 UTC · model grok-4.3

classification 🧬 q-bio.NC cs.AI
keywords automated discovery · cognitive modeling · decision making · large language models · theory generation · agentic AI · model comparison · online experiments

The pith

An automated AI system closes the loop on cognitive theory discovery by proposing executable models, running experiments on humans, and iteratively refining them from data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents AutoCog, a fully autonomous agentic-AI system in which large-language-model agents propose competing cognitive theories as executable models, design experiments to discriminate among them, collect behavioral data from online participants, score the models on how well they generate the data, diagnose failures, and synthesize improved successor theories. The cycle repeats to search the space of theories. In decision-making, the system recovers known strategies, including unconventional ones, from simulated data, which indicates that its discoveries are data-driven. With human participants, it generates theories that outperform the seeded established models and generalize to held-out studies in two settings. It also produces a novel multi-cue decision theory featuring diminishing sensitivity to feature values, whose distinctive predictions are confirmed in a preregistered experiment with new participants.

Core claim

AutoCog recovers known decision-making strategies from simulated behavior, including unconventional ones. When run with human participants, it produces theories that outperform the established theories it was seeded with and generalize to held-out studies in two different experimental settings. It also surfaces a novel theory of multi-cue decision-making in which choices show diminishing sensitivity to feature values; the distinctive predictions of this theory are confirmed in a preregistered study with new participants.
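To make "diminishing sensitivity to feature values" concrete, here is a minimal sketch. It is not the paper's model: the abstract gives no functional form, so the sketch assumes a power transform `x ** alpha` with `0 < alpha < 1` applied to each cue before a weighted sum. The function names, the weights, and the value of `alpha` are all hypothetical.

```python
def transformed_value(x, alpha=0.5):
    """Assumed concave transform: equal steps in x matter less at higher x.

    Requires x >= 0 and alpha > 0; alpha = 1 gives a plain linear cue.
    """
    return x ** alpha


def option_score(features, weights, alpha=0.5):
    """Weighted sum of transformed cue values (hypothetical multi-cue rule)."""
    return sum(w * transformed_value(x, alpha) for w, x in zip(weights, features))


def choose(option_a, option_b, weights, alpha=0.5):
    """Deterministic choice between two options; ties go to option A."""
    score_a = option_score(option_a, weights, alpha)
    score_b = option_score(option_b, weights, alpha)
    return "A" if score_a >= score_b else "B"


# One extreme cue (A) versus two moderate cues (B), equal weights.
option_a, option_b, weights = (9, 0), (4, 4), (1, 1)
linear_choice = choose(option_a, option_b, weights, alpha=1.0)   # 9 vs 8
concave_choice = choose(option_a, option_b, weights, alpha=0.5)  # 3 vs 4
```

Under the linear rule the option with one extreme cue wins; under the assumed concave rule the balanced option wins. A preference reversal of this kind is one example of a prediction that could separate a diminishing-sensitivity theory from a linear one, though the paper's actual distinctive predictions are not stated in the abstract.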

What carries the argument

The AutoCog closed-loop system in which LLM agents advocate executable cognitive models, design discriminating experiments, collect online behavioral data, score theories on generative performance, diagnose failures, and synthesize successor theories.
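The loop described above can be sketched in a few lines. This is a toy illustration under strong assumptions, not AutoCog's implementation: the LLM agents are replaced by a simple perturbation step (`synthesize`), the online participants by choices simulated from a hidden theory (`collect_data`), and theories by a one-parameter model. Every name and number here is hypothetical.

```python
import math
import random
from dataclasses import dataclass
from itertools import combinations, product


@dataclass(frozen=True)
class Theory:
    """Executable toy theory: cues pass through x**alpha before being summed."""
    alpha: float

    def p_choose_a(self, a, b):
        diff = sum(x ** self.alpha for x in a) - sum(x ** self.alpha for x in b)
        p = 1.0 / (1.0 + math.exp(-diff))
        return min(max(p, 1e-9), 1.0 - 1e-9)  # clamp so log-likelihoods stay finite


def design_experiment(theories, candidates, n_trials=20):
    """Pick the trials on which the competing theories disagree most."""
    def disagreement(trial):
        preds = [t.p_choose_a(*trial) for t in theories]
        return max(preds) - min(preds)
    return sorted(candidates, key=disagreement, reverse=True)[:n_trials]


def collect_data(trials, rng, hidden=Theory(0.5), n_per_trial=30):
    """Stand-in for online participants: simulate choices from a hidden theory."""
    return [(a, b, rng.random() < hidden.p_choose_a(a, b))
            for a, b in trials for _ in range(n_per_trial)]


def score(theory, data):
    """Mean log-likelihood of the observed choices under the theory."""
    total = 0.0
    for a, b, chose_a in data:
        p = theory.p_choose_a(a, b)
        total += math.log(p if chose_a else 1.0 - p)
    return total / len(data)


def synthesize(best, step=0.1):
    """Stand-in for LLM diagnosis and synthesis: perturb the best theory."""
    return {Theory(round(min(1.5, max(0.1, best.alpha + d)), 2))
            for d in (-step, step)}


def run_loop(seeds, n_rounds=6, seed=0):
    rng = random.Random(seed)
    options = list(product([1, 3, 5, 7, 9], repeat=2))
    candidates = list(combinations(options, 2))
    population, data = set(seeds), []
    for _ in range(n_rounds):
        trials = design_experiment(population, candidates)  # discriminate theories
        data += collect_data(trials, rng)                    # gather behavior
        best = max(population, key=lambda t: score(t, data)) # score on all data
        population |= synthesize(best)                       # propose successors
    best = max(population, key=lambda t: score(t, data))
    return best, population, data


best, population, data = run_loop([Theory(1.0), Theory(0.9)])
```

Because seeded theories stay in the population and every theory is rescored on all accumulated data, the returned `best` can never score worse than a seed on that data. Whether it also generalizes to held-out data is a separate question that this sketch does not test.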

If this is right

  • Theory-building in cognitive science can shift from a manual creative step to an explicit, executable, and cumulative process.
  • Automated systems can search the joint space of theories, models, and experiments without being strictly bound by language-model priors.
  • Novel theories discovered by the system can have their unique predictions tested and confirmed in independent preregistered experiments.
  • Data-driven iteration can produce models that generalize across different experimental settings.
  • The accumulated failures of existing models can be turned into better successor models automatically.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same closed-loop approach could be applied to other subfields of psychology where theory generation remains the main bottleneck.
  • Future versions might handle richer classes of cognitive models if the executability constraint is relaxed.
  • Parallel discovery loops running simultaneously could accelerate the rate at which theories evolve.
  • The method raises the question of how to maintain cumulative progress when different discovery runs start from different seed theories.

Load-bearing premise

That LLM agents can diagnose model failures and synthesize successor theories driven by the collected behavioral data rather than by the language models' pre-existing priors, and that the online experiment platform yields unbiased measurements suitable for model comparison.

What would settle it

A run of the system on human data in which the generated theories do not outperform the seeded ones or the novel theory's distinctive predictions fail to hold in the preregistered follow-up study.

Original abstract

Across the sciences, autonomous systems are increasingly being used in closed-loop discovery, proposing new theories and designing and running experiments to test them. This approach is yet to be applied in the field of cognitive science, where the central bottleneck is theory-building: the creative step of turning the accumulated failures of existing models into better ones. Theory generation has remained manual even as data collection, modeling, and experiment design have been automated. We present the Automated Cognitive Scientist (AutoCog), a fully autonomous agentic-AI system that closes this loop. Large-language-model agents advocate competing theories, each expressed as an executable cognitive model, design experiments that best discriminate them, collect behavioral data from participants recruited online, score theories against collected data based on their generative performance, diagnose why they fail, and synthesize a better successor. Repeating this cycle allows them to search the space of theories, models, and experiments. In the domain of decision-making, AutoCog recovered known decision-making strategies from simulated behavior, including unconventional ones, showing that its discoveries are ultimately driven by the data rather than strictly bound by the priors of the underlying language models. When run with human participants, it produced theories that outperformed the established theories it was seeded with and generalized to held-out studies in two different experimental settings. It also surfaced a novel theory of multi-cue decision-making in which choices show diminishing sensitivity to feature values. The distinctive predictions of this theory were confirmed in a preregistered study with new participants. AutoCog demonstrates how an automated discovery system can be used to turn cognitive theory-building into an explicit, executable, and cumulative science.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces the Automated Cognitive Scientist (AutoCog), an autonomous agentic-AI system that closes the discovery loop in cognitive science. LLM agents propose competing theories as executable cognitive models, design discriminating experiments, collect online participant data, score models on generative performance, diagnose failures, and synthesize improved successors. Applied to decision-making, the system recovered known strategies, including unconventional ones, from simulated data. With human participants, it generated theories that outperformed the seeded established ones and generalized to held-out studies. It also discovered a novel multi-cue decision theory with diminishing sensitivity to feature values, whose predictions were confirmed in a preregistered experiment.

Significance. If the reported results hold, this work would be highly significant as the first demonstration of fully automated, closed-loop theory discovery in cognitive science. By making theory generation explicit, executable, and iterative based on data, it addresses the central bottleneck of manual theory-building. The recovery of known strategies from data, outperformance of seeded theories, generalization, and preregistered confirmation of a novel theory provide evidence that such systems can produce data-driven advances beyond LLM priors. This could transform psychology into a more cumulative science.

major comments (2)
  1. [Abstract] Abstract: The central claims that AutoCog 'produced theories that outperformed the established theories it was seeded with and generalized to held-out studies' and that 'the distinctive predictions of this theory were confirmed in a preregistered study' are load-bearing for the efficacy of the system, yet the available manuscript provides no methods, model specifications, data, statistical comparisons, or preregistration details needed to assess whether these outcomes are supported.
  2. [Abstract] Abstract: The assertion that 'its discoveries are ultimately driven by the data rather than strictly bound by the priors of the underlying language models' is central to the claim of data-driven discovery, but the high-level description of failure diagnosis and successor synthesis provides no concrete mechanism or evidence (e.g., comparisons to LLM-only baselines) to evaluate whether the process escapes LLM priors.

Simulated Author's Rebuttal

2 responses · 2 unresolved

We thank the referee for their careful reading and for identifying the evidentiary gaps in the abstract. We respond to each major comment below. Because the only text available to us is the abstract itself, our responses are necessarily limited to what is stated there.

point-by-point responses
  1. Referee: [Abstract] Abstract: The central claims that AutoCog 'produced theories that outperformed the established theories it was seeded with and generalized to held-out studies' and that 'the distinctive predictions of this theory were confirmed in a preregistered study' are load-bearing for the efficacy of the system, yet the available manuscript provides no methods, model specifications, data, statistical comparisons, or preregistration details needed to assess whether these outcomes are supported.

    Authors: The referee is correct: the abstract contains no methods, model specifications, data, statistical comparisons, or preregistration details. We therefore cannot demonstrate the claims from the provided text. We will revise the manuscript to include a concise methods summary and explicit references to the preregistration and statistical procedures so that the abstract's claims can be evaluated. revision: yes

  2. Referee: [Abstract] Abstract: The assertion that 'its discoveries are ultimately driven by the data rather than strictly bound by the priors of the underlying language models' is central to the claim of data-driven discovery, but the high-level description of failure diagnosis and successor synthesis provides no concrete mechanism or evidence (e.g., comparisons to LLM-only baselines) to evaluate whether the process escapes LLM priors.

    Authors: The referee is correct that the abstract supplies only a high-level description and offers neither a concrete mechanism nor any baseline comparison (e.g., LLM-only runs) that would show escape from model priors. We will revise the abstract to state the limitation explicitly and to indicate where in the full study such comparisons appear. revision: yes

standing simulated objections not resolved
  • The actual methods, model specifications, participant data, statistical comparisons, and preregistration details required to substantiate the abstract's claims are absent from the only available text.
  • Concrete mechanisms for failure diagnosis and successor synthesis, together with any LLM-only baseline comparisons, are not present in the provided abstract.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The abstract describes an iterative agentic system that proposes executable cognitive models, designs experiments, collects fresh behavioral data from participants, scores models on generative performance against that data, diagnoses failures, and synthesizes successors. No equations, fitted parameters, or derivation steps are presented that reduce any claimed prediction or novel theory to its own inputs by construction. The central claim that discoveries are data-driven is supported by explicit use of newly collected and held-out data plus a preregistered confirmation study, rendering the process self-contained against external benchmarks rather than reliant on self-citation chains or definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, background axioms, or new postulated entities; the system is described at the level of LLM agents and executable models without technical specification.

pith-pipeline@v0.9.1-grok · 5828 in / 1184 out tokens · 34736 ms · 2026-06-26T00:20:21.248739+00:00 · methodology
