Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results

2026 · cs.AI · arXiv 2604.21965

3 Pith papers cite this work.

abstract

Recent work has used LLM agents to reproduce empirical social-science results when the agents have access to both the original data and code. We broaden this scope by asking whether agents can reproduce results given only a paper's methods description and the original data. We develop an agentic reproduction system that extracts structured methods descriptions from papers, runs reimplementations under strict information isolation (agents never see the original code, results, or paper), and enables deterministic, cell-level comparison of reproduced outputs against the original results. An error-attribution step traces discrepancies back through each stage of the system to identify their root causes. Evaluating four agent scaffolds and four LLMs on 48 papers with human-verified reproducibility, we find that agents can largely recover published results, but performance varies substantially across models, scaffolds, and papers. Root-cause analysis shows that failures stem both from agent errors and from underspecification in the papers themselves.
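The abstract does not say how the cell-level comparison is implemented. As a rough sketch only, assuming each published table cell can be addressed by a hypothetical (table, row, column) key and that a cell counts as reproduced when it lies within an assumed relative tolerance of the published value, a deterministic comparison might look like this. The names `compare_cells`, `match_rate`, `CellKey`, and the tolerance defaults are illustrative, not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical cell address: (table label, row label, column label).
CellKey = Tuple[str, str, str]


@dataclass(frozen=True)
class CellResult:
    key: CellKey
    original: float
    reproduced: Optional[float]  # None when the agent did not produce this cell
    match: bool


def compare_cells(
    original: Dict[CellKey, float],
    reproduced: Dict[CellKey, float],
    rel_tol: float = 0.01,  # assumed tolerance, not taken from the paper
    abs_tol: float = 1e-9,
) -> List[CellResult]:
    """Compare every published cell with the reproduced value at the same key.

    Keys are visited in sorted order, so identical inputs always yield an
    identical result list. Reproduced cells with no published counterpart
    are ignored; published cells the agent failed to produce count as misses.
    """
    results = []
    for key in sorted(original):
        orig_value = original[key]
        rep_value = reproduced.get(key)
        ok = rep_value is not None and math.isclose(
            orig_value, rep_value, rel_tol=rel_tol, abs_tol=abs_tol
        )
        results.append(CellResult(key, orig_value, rep_value, ok))
    return results


def match_rate(results: List[CellResult]) -> float:
    """Fraction of published cells reproduced within tolerance."""
    if not results:
        return 0.0
    return sum(r.match for r in results) / len(results)
```

For example, a published coefficient of 0.23 and a reproduced 0.2301 match under a 1% relative tolerance, while a cell the agent never produced counts as a failure, so omitting hard cells cannot inflate the score. Because published tables are usually rounded, a real system might instead compare at the printed precision; this sketch does not attempt that.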

representative citing papers

Coding-agents can replicate scientific machine learning papers

cs.AI · 2026-07-02 · unverdicted · novelty 7.0

Paper-replication is a workflow that enables coding agents to replicate computational claims from scientific ML papers by recording targets, reconstructing methods, running experiments, and validating evidence against original claims.

ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

cs.CL · 2026-06-16 · unverdicted · novelty 7.0

ReproRepo uses GitHub issues as natural supervision to benchmark LLM agents on detecting reproducibility blockers across 1,149 ML papers, with the top agent finding related issues for roughly 90% of cases.

Automated reproducibility assessments in the social and behavioral sciences using large language models

cs.AI · 2026-06-11 · conditional · novelty 6.0

LLMs match the original qualitative conclusions in 80% of 180 studies and the original effect sizes in 24%; in a tested subset they perform similarly to humans, which positions them as a screening tool rather than a full replacement for human assessment.
