Angelopoulos, Jacob Eisenstein, Jonathan Berant, Alekh Agarwal, and Adam Fisch
5 Pith papers cite this work, all published in 2026.
Citing papers
-
Optimized Labeling Resource Allocation for Prediction-Assisted Inference via OPAL
OPAL learns optimal smooth labeling policies from ML uncertainty scores to enable low-variance prediction-assisted inference with finite-sample coverage guarantees.
-
Learning U-Statistics with Active Inference
Active inference framework for U-statistics that uses augmented inverse probability weighting (AIPW) to optimize label queries and minimize variance under budget constraints.
-
Towards Reliable LLM Evaluation: Correcting the Winner's Curse in Adaptive Benchmarking
SIREN corrects winner's curse bias in adaptive LLM benchmarking via selection-aware repeated splits and bootstrapping, yielding valid procedure-level confidence intervals.
-
Efficient Evaluation of LLM Performance with Statistical Guarantees
Factorized Active Querying (FAQ) provides up to 5 times as many effective samples for LLM accuracy estimation by combining Bayesian factor models with adaptive querying under a fixed budget, with guaranteed coverage.
-
Causal methods for LLM development and evaluation
Position paper mapping causal inference opportunities across the LLM development pipeline from pretraining to evaluation to address confounding and non-stationarity.
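Several of the summaries above (OPAL, and the AIPW-based U-statistics work) revolve around prediction-assisted inference: label only some points, choosing which ones from model uncertainty, and correct the model's predictions with inverse-probability-weighted residuals. The sketch below illustrates that general idea for estimating a simple mean. It is not OPAL's algorithm. The labeling policy (probability proportional to uncertainty, rescaled to a budget and clipped at a floor), the function names `labeling_probs` and `ipw_mean_estimate`, and the simulated data are all hypothetical. The interval is a CLT-based approximation, not a finite-sample coverage guarantee.

```python
import numpy as np


def labeling_probs(uncertainty, budget_frac, floor=0.05):
    """Hypothetical policy: labeling probability proportional to the uncertainty
    score, scaled so the average probability is about budget_frac, then clipped
    to [floor, 1] so the inverse weights 1/pi stay bounded."""
    u = np.asarray(uncertainty, dtype=float)
    pi = u / u.mean() * budget_frac
    return np.clip(pi, floor, 1.0)


def ipw_mean_estimate(preds, labels, labeled, pi):
    """Prediction-assisted estimate of E[Y]: every point contributes its model
    prediction, and labeled points add the residual weighted by 1/pi.
    Unbiased when each point is labeled independently with probability pi.
    Only labels[labeled] is ever read."""
    terms = np.asarray(preds, dtype=float).copy()
    terms[labeled] += (labels[labeled] - terms[labeled]) / pi[labeled]
    est = terms.mean()
    se = terms.std(ddof=1) / np.sqrt(len(terms))  # CLT standard error
    return est, se


# Simulated example (assumption): heteroscedastic noise, biased predictor.
rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
sigma = 0.1 + 0.9 * np.abs(x)               # noise level, used as the uncertainty score
y = 2.0 + x + sigma * rng.normal(size=n)    # true outcomes, E[Y] = 2
preds = 2.3 + x                             # model predictions with a +0.3 bias

pi = labeling_probs(sigma, budget_frac=0.1)
labeled = rng.random(n) < pi                # Bernoulli(pi) labeling decisions
est, se = ipw_mean_estimate(preds, y, labeled, pi)
lo, hi = est - 1.96 * se, est + 1.96 * se   # approximate 95% interval
print(f"labeled {labeled.sum()} of {n}; estimate {est:.3f}, CI [{lo:.3f}, {hi:.3f}]")
print(f"prediction-only mean: {preds.mean():.3f}")
```

The prediction-only mean inherits the model's bias (about 2.3 here), while the weighted correction recovers an estimate centered on the true mean of 2 using roughly 10% of the labels. Sending more labels to high-uncertainty points is what keeps the variance of the correction term in check; the floor on `pi` is a common safeguard against extreme weights.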