Demystifying prediction powered inference. arXiv preprint arXiv:2601.20819

Yilin Song, Dan M Kluger, Harsh Parikh, Tian Gu · 2026 · arXiv 2601.20819

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method: 1

citation-polarity summary

use method: 1

representative citing papers

In-Context Learning for the Imputation of Public Opinion Data with Large Language Models

cs.CL · 2026-06-08 · unverdicted · novelty 7.0

ICL with LLMs reduces absolute imputation error for survey data versus MICE PMM across MCAR/MAR/MNAR mechanisms and yields narrower intervals with near-nominal coverage.

Prediction-Powered Inference Across Many Tasks for AI Evaluation & Social Science Research

stat.ML · 2026-05-28 · unverdicted · novelty 7.0

Multi-task PPI framework uses cross-task recalibration to improve inference power across related tasks, with a proof that gains require nonlinear proxy-ground-truth structure, shown on synthetic data and a 2024 election LM audit case study.

Calibeating Prediction-Powered Inference

stat.ML · 2026-04-23 · unverdicted · novelty 7.0

Post-hoc calibration of miscalibrated black-box predictions on a labeled sample improves efficiency of prediction-powered inference for semisupervised mean estimation.

Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation

cs.AI · 2026-05-29 · unverdicted · novelty 3.0

GLIDE is a Python library that packages multiple PPI estimators and samplers for reliable GenAI evaluation and reports annotation savings in an agentic case study.

citing papers explorer


In-Context Learning for the Imputation of Public Opinion Data with Large Language Models

cs.CL · 2026-06-08 · unverdicted · none · ref 38

Prediction-Powered Inference Across Many Tasks for AI Evaluation & Social Science Research

stat.ML · 2026-05-28 · unverdicted · none · ref 9

Calibeating Prediction-Powered Inference

stat.ML · 2026-04-23 · unverdicted · none · ref 23

Industrializing Prediction-Powered Inference: The GLIDE Library for Reliable GenAI and Agentic Systems Evaluation

cs.AI · 2026-05-29 · unverdicted · none · ref 11
