Nature
7 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
Years: 2026 (7)
Roles: background (2)
Polarities: background (2)
citing papers explorer
- Open Datasets in Learning Analytics: Trends, Challenges, and Best PRACTICE
  A survey of 172 open educational datasets from 204 papers across LAK, EDM, and AIED conferences reveals trends, 143 previously uncatalogued datasets, field gaps, and an 8-item PRACTICE checklist for better data publication.
- Automating Computational Reproducibility in Social Science: Comparing Prompt-Based and Agent-Based Approaches
  Agent-based AI workflows repair injected reproducibility failures in R social-science code at 69-96% success, substantially outperforming prompt-based LLM approaches at 31-79%.
- Improving Reproducibility in Evaluation through Multi-Level Annotator Modeling
  Multi-level bootstrapping models annotator variance using large rater-ID datasets to find optimal tradeoffs between number of items N and ratings per item K for statistically significant AI evaluations.
- ReproScore: Separating Readiness from Outcome in Research Software Reproducibility Assessment
  ReproScore separates readiness (26 static sub-metrics) from outcome (execution probes) and shows near-zero correlation between them on 423 repositories, validating the separation.
- Experiment-as-Code Labs: A Declarative Stack for AI-Driven Scientific Discovery
  The paper introduces Experiment-as-Code Labs as a declarative stack synthesizing AI agents, systems orchestration, and physical lab control for AI-driven discovery.
- Normative Common Ground Replication (NormCoRe): Replication-by-Translation for Studying Norms in Multi-Agent AI
  NormCoRe is a replication-by-translation framework that maps human subject studies onto multi-agent AI environments, showing AI normative judgments on fairness differ from human baselines and vary with model choice and persona language.
- Quantifying the Reconstructability of Astrophysical Methods with Large Language Models and Information Theory: A Case Study in Spectral Reconstruction