Preregistering LLM experiments to run on the first future eligible model blocks p-hacking transfer in roughly 73% of cases across 20 models and 11 configurations on two tasks with known ground truth.
Doubly-robust LLM-as-a-judge: Externally valid estimation with imperfect personas.arXiv preprint arXiv:2509.22957,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
Mitigating LLM-based p-Hacking by Preregistering for the Next LLM
Preregistering LLM experiments to run on the first future eligible model blocks p-hacking transfer in roughly 73% of cases across 20 models and 11 configurations on two tasks with known ground truth.