Active inference framework for U-statistics using augmented IPW to optimize label queries and minimize variance under budget constraints.
arXiv preprint arXiv:2506.07949
3 Pith papers cite this work. Polarity classification is still indexing.
fields: stat.ML (3)
years: 2026 (3)
verdicts: UNVERDICTED (3)
representative citing papers
SIREN corrects winner's curse bias in adaptive LLM benchmarking via selection-aware repeated splits and bootstrap for valid procedure-level confidence intervals.
Factorized Active Querying (FAQ) provides up to 5 times more effective samples for LLM accuracy estimation by using Bayesian factor models and adaptive querying under a fixed budget with guaranteed coverage.
citing papers explorer
- Learning U-Statistics with Active Inference
  Active inference framework for U-statistics using augmented IPW to optimize label queries and minimize variance under budget constraints.
- Towards Reliable LLM Evaluation: Correcting the Winner's Curse in Adaptive Benchmarking
  SIREN corrects winner's curse bias in adaptive LLM benchmarking via selection-aware repeated splits and bootstrap for valid procedure-level confidence intervals.
- Efficient Evaluation of LLM Performance with Statistical Guarantees
  Factorized Active Querying (FAQ) provides up to 5 times more effective samples for LLM accuracy estimation by using Bayesian factor models and adaptive querying under a fixed budget with guaranteed coverage.