ARA uses LLMs to build workflow graphs linking sources, methods, and outputs in papers, then scores reproducibility, reaching ~61% accuracy on 213 ReScience C articles and outperforming priors on ReproBench and GoldStandardDB.
State of the art: Reproducibility in artificial intelligence.Proceedings of the AAAI Conference on Artificial Intelligence, 32(1), April 2018
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5verdicts
UNVERDICTED 5roles
background 1polarities
background 1representative citing papers
Longitudinal study of 56,800 AI papers finds sixfold increase in code+data sharing from 2014-2024 with inferred reproducibility rising from 28% to 64%.
Traxia is a proposed agent-native scientific publishing framework with five formalised components: agent identity registry, verifiable publishing layer, four-tier peer review, reputation engine, and knowledge graph with contradiction detection.
Sibyl-AutoResearch introduces self-evolving trial-and-error harnesses with auditable conversion units that link trial signals to updated research behaviors and harness repairs in autonomous systems.
Multi-level bootstrapping models annotator variance using large rater-ID datasets to find optimal tradeoffs between number of items N and ratings per item K for statistically significant AI evaluations.
citing papers explorer
-
ARA: Agentic Reproducibility Assessment For Scalable Support Of Scientific Peer-Review
ARA uses LLMs to build workflow graphs linking sources, methods, and outputs in papers, then scores reproducibility, reaching ~61% accuracy on 213 ReScience C articles and outperforming priors on ReproBench and GoldStandardDB.
-
The Shift Toward Open and Reproducible AI Research
Longitudinal study of 56,800 AI papers finds sixfold increase in code+data sharing from 2014-2024 with inferred reproducibility rising from 28% to 64%.
-
Traxia: A Framework for Verifiable, Agent-Native Scientific Publishing
Traxia is a proposed agent-native scientific publishing framework with five formalised components: agent identity registry, verifiable publishing layer, four-tier peer review, reputation engine, and knowledge graph with contradiction detection.
-
Sibyl-AutoResearch: Autonomous Research Needs Self-Evolving Trial-and-Error Harnesses, Not Paper Generators
Sibyl-AutoResearch introduces self-evolving trial-and-error harnesses with auditable conversion units that link trial signals to updated research behaviors and harness repairs in autonomous systems.
-
Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling
Multi-level bootstrapping models annotator variance using large rater-ID datasets to find optimal tradeoffs between number of items N and ratings per item K for statistically significant AI evaluations.