National Academies Press
6 Pith papers cite this work. Polarity classification is still indexing.
citation roles: background (1)
citation polarities: background (1)
representative citing papers
Visualization researchers propose traceability—recording abundant annotated artifacts, reporting curated research threads, and enabling reading via interfaces—as a way to ensure rigor and transparency in inherently unreproducible design processes.
citing papers explorer
- When Representative Samples Produce Worse Outcomes: Scale-up Decisions and Testing in Small-Budget RCTs
  In small-budget RCTs where significance tests decide scale-up, the optimal pilot sampling strategy shifts from a representative sample to a single homogeneous subpopulation as the budget shrinks.
- ARA: Agentic Reproducibility Assessment For Scalable Support Of Scientific Peer-Review
  ARA uses LLMs to build workflow graphs linking sources, methods, and outputs in papers, then scores reproducibility, reaching ~61% accuracy on 213 ReScience C articles and outperforming prior approaches on ReproBench and GoldStandardDB.
- Automating Computational Reproducibility in Social Science: Comparing Prompt-Based and Agent-Based Approaches
  Agent-based AI workflows repair injected reproducibility failures in R social-science code with 69-96% success, substantially outperforming prompt-based LLM approaches, which reach 31-79%.
- Is K-fold cross validation the best model selection method for Machine Learning?
  K-fold CUBV combines cross-validation with PAC-Bayesian upper bounds on the actual risk, giving a more robust criterion for validating ML accuracy and fewer false positives than standard CV.