Active Statistical Inference

Emmanuel J. Candès; Tijana Zrnic

arxiv: 2403.03208 · v3 · submitted 2024-03-05 · stat.ML · cs.LG · stat.ME


classification stat.ML cs.LG stat.ME

keywords active, inference, data, model, learning, budget, collected, collection


Inspired by the concept of active learning, we propose active inference, a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model's predictions where it is confident. Active inference constructs provably valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. The key point is that it achieves the same level of accuracy with far fewer samples than existing baselines relying on non-adaptively-collected data. This means that for the same number of collected samples, active inference enables smaller confidence intervals and more powerful p-values. We evaluate active inference on datasets from public opinion research, census analysis, and proteomics.
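The intuition in the abstract (label where the model is uncertain, trust the model where it is confident) can be illustrated for the simplest target, a population mean. The abstract does not spell out an estimator, so the sketch below is an assumption rather than the authors' implementation: each point is labeled with a probability proportional to a model uncertainty score, and the prediction is corrected by an inverse-probability-weighted residual whenever a label is collected. The function name `active_mean_ci`, the synthetic data, the perfect-in-mean model `preds = x**2`, and the uncertainty score are all hypothetical.

```python
import numpy as np


def active_mean_ci(y, preds, probs, rng, z=1.96):
    """Estimate E[Y] when point i is labeled with probability probs[i].

    Assumed form (not taken from the abstract): each point contributes
    preds[i] + (y[i] - preds[i]) * labeled[i] / probs[i], which is unbiased
    for y[i] as long as probs[i] > 0.
    """
    n = len(preds)
    labeled = rng.random(n) < probs  # Bernoulli(probs[i]) labeling decisions
    # Residuals are used only where a label was actually collected.
    residual = np.where(labeled, y - preds, 0.0)
    increments = preds + residual / probs
    est = increments.mean()
    # Normal-approximation interval; z=1.96 gives a nominal 95% level.
    half = z * increments.std(ddof=1) / np.sqrt(n)
    return est, (est - half, est + half), int(labeled.sum())


# Synthetic example (all quantities hypothetical).
rng = np.random.default_rng(0)
n = 20_000
x = rng.uniform(-1.0, 1.0, n)
noise_sd = 0.1 + np.abs(x)            # label noise grows with |x|
y = x**2 + noise_sd * rng.normal(size=n)
preds = x**2                          # stand-in for a black-box model
uncertainty = noise_sd                # stand-in for a model uncertainty score

budget = 0.2                          # label about 20% of the points
# Active rule: sampling probability proportional to uncertainty.
active_probs = np.clip(budget * uncertainty / uncertainty.mean(), 0.01, 1.0)
# Baseline: same expected budget, but uniform (non-adaptive) sampling.
uniform_probs = np.full(n, budget)

est_active, ci_active, n_active = active_mean_ci(y, preds, active_probs, rng)
est_uniform, ci_uniform, n_uniform = active_mean_ci(y, preds, uniform_probs, rng)

print("active :", est_active, ci_active, "labels:", n_active)
print("uniform:", est_uniform, ci_uniform, "labels:", n_uniform)
```

In this toy setup both rules spend roughly the same number of labels, and the uncertainty-proportional rule should give the narrower interval because it concentrates labels where the residuals are largest. The lower clip on the sampling probabilities keeps the inverse weights bounded; the value 0.01 is an arbitrary choice for the illustration.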


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards
math.ST 2025-06 unverdicted novelty 7.0

The MLA-UCB algorithm uses ML-generated surrogate rewards from auxiliary data to provably lower cumulative regret in multi-armed bandits, achieving asymptotic optimality under joint Gaussian assumptions without requir...

Batch-Adaptive Causal Annotations
stat.ML 2025-02 unverdicted novelty 6.0

Derives closed-form optimal batch sampling probabilities to minimize asymptotic variance of doubly robust ATE estimator with missing outcomes, achieving lower MSE and matching full-sample precision with 75% fewer labe...

High-Dimensional Statistics: Reflections on Progress and Open Problems
math.ST 2026-05 unverdicted novelty 2.0

A survey synthesizing representative advances, common themes, and open problems in high-dimensional statistics while pointing to key entry-point works.