ReplicatorBench evaluates LLM agents on replicating social and behavioral science claims across retrieval, computation, and interpretation stages, finding strength in experiment execution but weakness in resource retrieval.
7 Pith papers cite this work.
years
2026: 7
representative citing papers
For exchangeable hypotheses the optimal FWER-controlling multiple-testing procedure is computed via elementary symmetric polynomials on likelihood ratios plus a monotonicity theorem that enables an efficient bisection coordinate-descent algorithm.
LLMs given only research questions from 1000 arXiv CS papers recommend a narrower set of methods than the original papers, with effective model-entity diversity dropping from 1232 to 59-96 and stronger agreement among LLMs than with papers.
AI agents automating alignment research are prone to systematic undetected errors in fuzzy tasks, leading to overconfident but flawed safety assessments even without deliberate sabotage.
Longitudinal study of 56,800 AI papers finds sixfold increase in code+data sharing from 2014-2024 with inferred reproducibility rising from 28% to 64%.
Science would advance faster with higher trust if its thinking processes were made visible, trackable, and forkable like software development.
Position paper calling for stronger evidentiary standards and a diagnostic checklist in anthropomorphic misalignment research.
citing papers explorer
- Thinking Like a Scientist? A Structural Study of LLM-Generated Research Methods
  LLMs given only research questions from 1000 arXiv CS papers recommend a narrower set of methods than the original papers, with effective model-entity diversity dropping from 1232 to 59-96 and stronger agreement among LLMs than with papers.
- The Shift Toward Open and Reproducible AI Research
  Longitudinal study of 56,800 AI papers finds sixfold increase in code+data sharing from 2014-2024 with inferred reproducibility rising from 28% to 64%.
- Visible, Trackable, Forkable: Opening the Process of Science
  Science would advance faster with higher trust if its thinking processes were made visible, trackable, and forkable like software development.
- Position: Anthropomorphic Misalignment Research Needs Stronger Evidence
  Position paper calling for stronger evidentiary standards and a diagnostic checklist in anthropomorphic misalignment research.