In small-budget RCTs where significance tests decide scale-up, the optimal pilot sample shifts from a representative one to a single homogeneous subpopulation as the budget shrinks.
National Academies Press
6 Pith papers cite this work.
Citation roles: background (1)
Citation polarities: background (1)
Representative citing papers
ARA uses LLMs to build workflow graphs linking sources, methods, and outputs in papers, then scores reproducibility, reaching ~61% accuracy on 213 ReScience C articles and outperforming prior approaches on ReproBench and GoldStandardDB.
Agent-based AI workflows repair injected reproducibility failures in R social-science code at 69-96% success, substantially outperforming prompt-based LLM approaches at 31-79%.
Visualization researchers propose traceability—recording abundant annotated artifacts, reporting curated research threads, and enabling reading via interfaces—as a way to ensure rigor and transparency in inherently unreproducible design processes.
K-fold CUBV combines cross-validation with PAC-Bayesian upper bounds on actual risk to provide a more robust criterion for validating ML accuracy and reducing false positives than standard CV.