Synthetic Counterfactual Labels for Efficient Conformal Counterfactual Inference

Amirmohammad Farzaneh; Matteo Zecchin; Osvaldo Simeone

arXiv: 2509.04112 · v3 · submitted 2025-09-04 · cs.LG · cs.IT · math.IT

Pith reviewed 2026-05-18 19:04 UTC · model grok-4.3

classification cs.LG · cs.IT · math.IT

keywords conformal inference, counterfactual inference, synthetic data augmentation, prediction intervals, treatment effects, risk-controlling prediction sets, importance weighting

The pith

Augmenting conformal calibration with synthetic counterfactual labels from a pre-trained model produces narrower prediction intervals while preserving marginal coverage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that adds synthetic counterfactual outcomes, generated by an existing model, to the set used for calibrating prediction intervals. This augmentation is combined with a debiasing correction drawn from prediction-powered inference and then processed through risk-controlling prediction sets. A reader would care because standard conformal methods for counterfactuals often yield wide intervals when few real counterfactual samples are available, as in cases of treatment imbalance. The approach claims to deliver the same coverage guarantee with materially smaller intervals.

Core claim

The central claim is that the synthetic-data-powered conformal counterfactual inference procedure (SP-CCI) yields tighter intervals than standard conformal counterfactual inference while preserving marginal coverage. It does so by augmenting the calibration set with labels from a pre-trained model and applying a debiasing step before running risk-controlling prediction sets. The supporting guarantees are stated for both exact and approximate importance weighting.
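For readers who want the target property in symbols, the two guarantees involved are usually written as below. This is the standard textbook form rather than a statement copied from the paper: the interval symbol $C(X)$, the threshold $\hat{\lambda}$, the risk $R(\lambda)$ and the confidence parameter $\delta$ are notation assumed here for illustration.

```latex
% Marginal coverage of a prediction interval C(X) for the counterfactual outcome Y(1)
\Pr\bigl( Y(1) \in C(X) \bigr) \;\ge\; 1 - \alpha

% Risk-controlling prediction sets (RCPS): with miscoverage risk
%   R(\lambda) = \Pr\bigl( Y(1) \notin C_{\lambda}(X) \bigr),
% the calibrated threshold \hat{\lambda} satisfies
\Pr\bigl( R(\hat{\lambda}) \le \alpha \bigr) \;\ge\; 1 - \delta
```

In the first line the probability is taken jointly over the calibration data and the test point; in the second it is taken over the calibration data only.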

What carries the argument

SP-CCI, the procedure that inserts synthetic counterfactual labels into the calibration set, applies a prediction-powered-inference debiasing correction, and then runs risk-controlling prediction sets.
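The debiasing idea can be sketched in a few lines. The following is a minimal illustration of a prediction-powered-inference-style estimate of the miscoverage rate at a candidate threshold, not the authors' implementation: the function name, the argument names, and the use of a plain indicator loss are all assumptions, and the sketch omits both the importance weights and the confidence bound that an RCPS calibration would add.

```python
import numpy as np

def ppi_miscoverage(lam, real_scores, synth_scores_at_real, synth_scores_pool):
    """PPI-style estimate of the miscoverage rate P(score > lam).

    All names are hypothetical:
      real_scores          -- nonconformity scores computed with true labels
      synth_scores_at_real -- scores at the same covariates, using synthetic labels
      synth_scores_pool    -- scores using synthetic labels on a larger pool
    """
    real_loss = (np.asarray(real_scores) > lam).astype(float)
    synth_loss_at_real = (np.asarray(synth_scores_at_real) > lam).astype(float)
    synth_loss_pool = (np.asarray(synth_scores_pool) > lam).astype(float)
    # Rectifier: how far the synthetic labels are off, measured where truth is known.
    rectifier = real_loss.mean() - synth_loss_at_real.mean()
    # Synthetic-only estimate on the large pool, corrected by the rectifier.
    return synth_loss_pool.mean() + rectifier

if __name__ == "__main__":
    real = [0.1, 0.5, 0.9, 1.3]
    synth_at_real = [0.2, 0.4, 1.1, 1.2]
    pool = [0.1, 0.3, 0.6, 0.8, 1.0, 1.4, 0.2, 0.7]
    print(ppi_miscoverage(0.75, real, synth_at_real, pool))
```

If the synthetic labels are accurate, the rectifier is close to zero and the estimate inherits the lower variance of the larger synthetic pool; if they are poor, the rectifier pulls the estimate back toward the value the real data alone would give.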

If this is right

Prediction intervals for individual counterfactual outcomes become narrower under treatment imbalance.
Marginal coverage is retained for both exact and approximate importance weighting.
The same coverage target is achieved with fewer real counterfactual samples.
Interval widths decrease consistently across multiple datasets relative to standard conformal counterfactual inference.
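To make the role of importance weighting concrete, the sketch below computes an importance-weighted conformal quantile in the usual weighted-conformal style. It is an illustration under stated assumptions, not the SP-CCI procedure: the function name and arguments are hypothetical, the weights are taken as given (exact or estimated), and SP-CCI itself calibrates through RCPS rather than through this quantile.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Smallest score q whose normalized weighted mass at or below q is >= 1 - alpha.

    The test point carries mass `test_weight` at +infinity, so the result is
    infinite when the calibration points alone cannot reach level 1 - alpha.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    sorted_scores = scores[order]
    total = weights.sum() + test_weight
    cumulative = np.cumsum(weights[order]) / total
    # First index whose cumulative mass reaches the target level.
    idx = np.searchsorted(cumulative, 1.0 - alpha, side="left")
    return float(sorted_scores[idx]) if idx < len(sorted_scores) else float("inf")

if __name__ == "__main__":
    print(weighted_conformal_quantile([3.0, 1.0, 2.0], [1.0, 2.0, 1.0], 1.0, 0.5))
```

With few calibration points or very uneven weights the returned quantile is large or infinite, which is one way to see why intervals become wide when real counterfactual samples are scarce.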

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could be applied to sequential decision problems where new synthetic labels are generated on the fly from an updating model.
It raises the question of how the quality of the pre-trained model affects the degree of tightening in finite samples.
One could test whether replacing the pre-trained model with a simpler baseline still yields net gains after the debiasing step.

Load-bearing premise

The synthetic labels, once corrected by the debiasing step, must preserve the marginal coverage property of the risk-controlling procedure without extra conditions on how accurate the pre-trained model is or how much the data overlap.

What would settle it

An experiment in which the empirical coverage of the resulting intervals falls below the nominal target level on a dataset where the pre-trained counterfactual model is deliberately made inaccurate.
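The check described above reduces to comparing an empirical coverage rate against the nominal level. A minimal sketch follows, with hypothetical function names and a hypothetical `tolerance` argument that leaves room for finite-sample fluctuation; it says nothing about how the intervals themselves are produced.

```python
import numpy as np

def empirical_coverage(lower, upper, y_true):
    """Fraction of true outcomes that fall inside [lower, upper]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((lower <= y_true) & (y_true <= upper)))

def undercovers(coverage, alpha, tolerance=0.0):
    """True when coverage falls below the nominal level 1 - alpha by more than tolerance."""
    return coverage < (1.0 - alpha) - tolerance

if __name__ == "__main__":
    cov = empirical_coverage([0, 0, 0, 0], [1, 1, 1, 1], [0.5, 1.5, -0.2, 1.0])
    print(cov, undercovers(cov, alpha=0.1))
```

Because a single run can dip below the target by chance, a convincing version of the experiment would average this quantity over many random trials before declaring a violation.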

Figures

Figures reproduced from arXiv: 2509.04112 by Amirmohammad Farzaneh, Matteo Zecchin, Osvaldo Simeone.

**Figure 1.** The proposed synthetic data-powered conformal counterfactual inference (SP-CCI) method leverages synthetic counterfactual labels $\hat{Y}(1)$ produced using a pre-trained generative model $\hat{P}_{Y(1)|X}$ from the typically larger dataset $D_0$ ($n_0 \gg n_1$).

**Figure 2.** A Bayesian network representation of the …

**Figure 3.** SP-CCI partitions the synthetic dataset $\tilde{D}_1$ into $n_1$ disjoint groups $\{\tilde{D}_{1,i}\}_{i=1}^{n_1}$, each with $r$ data points. Each group $\tilde{D}_{1,i}$ is assigned to a real data point $(X_i, Y_i)$ from the dataset $D_1$.

**Figure 4.** Synthetic data example from [3]: (a) Distribution of empirical test coverage for CCI [3] and SP…

**Figure 5.** Policy evaluation for counterfactual loss in a wireless handover setting: A mobile device at location …

**Figure 6.** Distribution of average interval widths (in dB) over 50 random trials for optimistic CCI and SP-CCI …
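The grouping described in the Figure 3 caption can be pictured with a small index-partition sketch. It assumes the synthetic set holds exactly n1 * r points and that the assignment is random; the function name, the `seed` argument, and the shuffling step are illustrative choices, not details taken from the paper.

```python
import numpy as np

def partition_synthetic(n1, r, seed=0):
    """Split indices 0 .. n1*r - 1 into n1 disjoint groups of r indices each.

    Row i of the result holds the synthetic indices assigned to real point i.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n1 * r)
    return shuffled.reshape(n1, r)

if __name__ == "__main__":
    groups = partition_synthetic(n1=4, r=3)
    print(groups.shape)
```

Reshaping a single permutation guarantees that the groups are disjoint and of equal size.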

Original abstract

This work addresses the problem of constructing reliable prediction intervals for individual counterfactual outcomes. Existing conformal counterfactual inference (CCI) methods provide marginal coverage guarantees but often produce overly conservative intervals, particularly under treatment imbalance when counterfactual samples are scarce. We introduce synthetic data-powered CCI (SP-CCI), a new framework that augments the calibration set with synthetic counterfactual labels generated by a pre-trained counterfactual model. To ensure validity, SP-CCI incorporates synthetic samples into a conformal calibration procedure based on risk-controlling prediction sets (RCPS) with a debiasing step informed by prediction-powered inference (PPI). We prove that SP-CCI achieves tighter prediction intervals while preserving marginal coverage, with theoretical guarantees under both exact and approximate importance weighting. Empirical results on different datasets confirm that SP-CCI consistently reduces interval width compared to standard CCI across all settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SP-CCI augments conformal calibration with synthetic counterfactual labels and PPI debiasing inside RCPS to tighten intervals under imbalance while claiming coverage for both exact and approximate weights.

This paper's main contribution is a method called SP-CCI that adds synthetic counterfactual labels from a pre-trained model to the calibration set. It then applies a debiasing step from prediction-powered inference inside a risk-controlling prediction sets procedure. The goal is narrower prediction intervals for individual counterfactual outcomes while keeping marginal coverage, especially when real samples are scarce due to treatment imbalance.

Referee Report

2 major / 2 minor

Summary. The paper proposes SP-CCI, a framework for conformal counterfactual inference that augments the calibration set with synthetic counterfactual labels from a pre-trained model. It applies a PPI-based debiasing correction within an RCPS procedure to obtain tighter prediction intervals while preserving marginal coverage, with claimed theoretical guarantees under both exact and approximate importance weighting. Empirical results across datasets are reported to show consistent reductions in interval width relative to standard CCI.

Significance. If the central claims hold, SP-CCI would offer a practical way to mitigate conservatism in CCI under treatment imbalance by leveraging synthetic data without sacrificing validity. The combination of synthetic labels, PPI debiasing, and RCPS is a coherent technical contribution that could generalize to other conformal settings with scarce counterfactual samples. The manuscript supplies both proofs and experiments, which is a strength.

major comments (2)

[Theoretical guarantees for approximate importance weighting] The high-probability coverage statement for the RCPS procedure after PPI correction does not include an explicit bound on the residual bias induced by approximation error in the importance weights (whether from finite-sample estimation or model misspecification). Without such a bound, stated relative to the RCPS risk tolerance, the marginal coverage guarantee may fail under approximate weighting even if the exact-weighting case is correct.
[§3 (method)] The analysis assumes that the pre-trained counterfactual model produces synthetic labels whose distribution, after PPI debiasing, satisfies the conditions for RCPS to retain marginal coverage. No quantitative conditions on model accuracy or on overlap between the observed and synthetic distributions are stated, leaving the validity claim dependent on unverified properties of the pre-trained model.

minor comments (2)

[Abstract] The abstract states results on 'different datasets' without naming them or reporting the precise metrics (e.g., average width reduction, coverage rates) used for comparison.
[Notation and definitions] Notation for the importance weights and the PPI correction term could be introduced earlier and used consistently in both the theoretical and experimental sections to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our manuscript. We address each of the major comments below and describe the revisions we will make to strengthen the theoretical analysis.

Referee: [Theoretical guarantees for approximate importance weighting] The high-probability coverage statement for the RCPS procedure after PPI correction does not include an explicit bound on the residual bias induced by approximation error in the importance weights (whether from finite-sample estimation or model misspecification). Without such a bound, stated relative to the RCPS risk tolerance, the marginal coverage guarantee may fail under approximate weighting even if the exact-weighting case is correct.

Authors: We agree that an explicit bound on the residual bias would make the guarantee more complete. In the revised version, we will add a lemma bounding the approximation error in the importance weights (from both finite samples and potential misspecification) and show how this error propagates to the RCPS risk control. Specifically, we will derive a term that can be absorbed into the risk tolerance α, ensuring that marginal coverage holds with high probability under a mild condition on the approximation quality. revision: yes

Referee: [§3 (method)] The analysis assumes that the pre-trained counterfactual model produces synthetic labels whose distribution, after PPI debiasing, satisfies the conditions for RCPS to retain marginal coverage. No quantitative conditions on model accuracy or on overlap between the observed and synthetic distributions are stated, leaving the validity claim dependent on unverified properties of the pre-trained model.

Authors: This is a valid point. The current presentation relies on the debiasing step to correct for discrepancies, but we did not quantify the required model quality. In the revision, we will introduce an assumption on the accuracy of the pre-trained model, such as a bound on the expected absolute difference between synthetic and true counterfactuals or on the variance of the importance weights. We will also discuss how overlap can be ensured via the importance weighting scheme and add a remark on practical verification of these conditions. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation relies on independent RCPS and PPI steps

The abstract and claims describe SP-CCI as augmenting calibration with synthetic labels from a pre-trained model, then applying RCPS with PPI debiasing to obtain tighter intervals while preserving marginal coverage. No quoted equations or steps reduce any 'prediction' to a fitted parameter by construction, nor does any load-bearing premise collapse to a self-citation or ansatz imported from the authors' prior work. The theoretical guarantees under exact and approximate weighting are presented as separate proofs, leaving the central result self-contained against external conformal baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach depends on the pre-trained counterfactual model generating usable synthetic samples and on the debiasing step correctly compensating for distribution shift; these are not independently verified in the abstract.

axioms (1)

domain assumption: Marginal coverage of RCPS is preserved after augmentation with synthetic labels and PPI debiasing under exact or approximate importance weighting.
This is the load-bearing theoretical claim stated in the abstract.

invented entities (1)

Synthetic counterfactual labels (no independent evidence)
purpose: Augment the calibration set to reduce interval width under treatment imbalance.
Generated by a pre-trained model; no independent evidence of fidelity provided in abstract.

pith-pipeline@v0.9.0 · 5676 in / 1261 out tokens · 47920 ms · 2026-05-18T19:04:07.706288+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: We prove that SP-CCI achieves tighter prediction intervals while preserving marginal coverage, with theoretical guarantees under both exact and approximate importance weighting.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: SP-CCI incorporates synthetic samples into a conformal calibration procedure based on risk-controlling prediction sets (RCPS) with a debiasing step informed by prediction-powered inference (PPI).

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

[1] Glenn Shafer and Vladimir Vovk. A tutorial on conformal prediction. Journal of Machine Learning Research, 9(3), 2008.

[2] Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Algorithmic learning in a random world. Springer, 2005.

[3] Lihua Lei and Emmanuel J Candès. Conformal inference of counterfactuals and individual treatment effects. Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(5):911–938, 2021.

[4] Issa J Dahabreh, Sarah E Robertson, Jon A Steingrimsson, Elizabeth A Stuart, and Miguel A Hernan. Extending inferences from a randomized trial to a new target population. Statistics in medicine, 39(14):1999–2014, 2020.

[5] Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X Charles, D Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, and Ed Snelson. Counterfactual reasoning and learning systems: The example of computational advertising. The Journal of Machine Learning Research, 14(1):3207–3260, 2013.

[6] Adith Swaminathan and Thorsten Joachims. Counterfactual risk minimization: Learning from logged bandit feedback. In International conference on machine learning, pages 814–823. PMLR, 2015.

[7] Ron Kohavi, Diane Tang, and Ya Xu. Trustworthy online controlled experiments: A practical guide to a/b testing. Cambridge University Press, 2020.

[8] Sören R Künzel, Jasjeet S Sekhon, Peter J Bickel, and Bin Yu. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the national academy of sciences, 116(10):4156–4165, 2019.

[9] Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242, 2018.

[10] Nick Pawlowski, Daniel Coelho de Castro, and Ben Glocker. Deep structural causal models for tractable counterfactual inference. Advances in neural information processing systems, 33:857–869, 2020.

[11] Anastasios N Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I Jordan, and Tijana Zrnic. Prediction-powered inference. Science, 382(6671):669–674, 2023.

[12] Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, and Michael Jordan. Distribution-free, risk-controlling prediction sets. Journal of the ACM (JACM), 68(6):1–34, 2021.

[13] Fredrik Johansson, Uri Shalit, and David Sontag. Learning representations for counterfactual inference. In International conference on machine learning, pages 3020–3029. PMLR, 2016.

[14] Uri Shalit, Fredrik D Johansson, and David Sontag. Estimating individual treatment effect: generalization bounds and algorithms. In International conference on machine learning, pages 3076–3085. PMLR, 2017.

[15] Jennifer L Hill. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20(1):217–240, 2011.

[16] Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. Causal effect inference with deep latent-variable models. Advances in neural information processing systems, 30, 2017.

[17] Mingzhang Yin, Claudia Shi, Yixin Wang, and David M Blei. Conformal sensitivity analysis for individual treatment effects. Journal of the American Statistical Association, 119(545):122–135, 2024.

[18] Ahmed M Alaa, Zaid Ahmad, and Mark van der Laan. Conformal meta-learners for predictive inference of individual treatment effects. Advances in neural information processing systems, 36:47682–47703, 2023.

[19] Jinsung Yoon, James Jordon, and Mihaela Van Der Schaar. Ganite: Estimation of individualized treatment effects using generative adversarial nets. In International conference on learning representations, 2018.

[20] Ioana Bica, James Jordon, and Mihaela van der Schaar. Estimating the effects of continuous-valued interventions using generative adversarial networks. Advances in Neural Information Processing Systems, 33:16434–16445, 2020.

[21] Philip Thomas and Emma Brunskill. Data-efficient off-policy policy evaluation for reinforcement learning. In International conference on machine learning, pages 2139–2148. PMLR, 2016.

[22] Bat-Sheva Einbinder, Liran Ringel, and Yaniv Romano. Semi-supervised risk control via prediction-powered inference. arXiv preprint arXiv:2412.11174, 2024.

[23] Jerzy Splawa-Neyman, Dorota M Dabrowska, and Terrence P Speed. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, pages 465–472, 1990.

[24] Donald B Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology, 66(5):688, 1974.

[25] Donald B Rubin. Formal mode of statistical inference for causal effects. Journal of statistical planning and inference, 25(3):279–292, 1990.

[26] Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.

[27] Donald B Rubin. Bayesian inference for causal effects: The role of randomization. The Annals of statistics, pages 34–58, 1978.

[28] Guido W Imbens and Donald B Rubin. Causal inference in statistics, social, and biomedical sciences. Cambridge university press, 2015.

[29] Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.

[30] Nicolai Meinshausen and Greg Ridgeway. Quantile regression forests. Journal of machine learning research, 7(6), 2006.

[31] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.

[32] Judea Pearl. Causality. Cambridge university press, 2009.

[33] Daniele Ballinari. Calibrating doubly-robust estimators with unbalanced treatment assignment. Economics Letters, 241:111838, 2024.

[34] Sangwoo Park, Matteo Zecchin, and Osvaldo Simeone. Adaptive prediction-powered autoeval with reliability and efficiency guarantees. arXiv preprint arXiv:2505.18659, 2025.

[35] Benedikt Koch and Kosuke Imai. Statistical decision theory with counterfactual loss. arXiv preprint arXiv:2505.08908, 2025.

[36] Sara Khosravi, Hossein Shokri-Ghadikolaei, and Marina Petrova. Learning-based handover in mobile millimeter-wave networks. IEEE Transactions on Cognitive Communications and Networking, 7(2):663–674, 2020.

[37] Jakob Hoydis, Sebastian Cammerer, Fayçal Ait Aoudia, Avinash Vem, Nikolaus Binder, Guillermo Marcus, and Alexander Keller. Sionna: An open-source library for next-generation physical layer research. arXiv preprint arXiv:2203.11854, 2022.

[1] [1]

A tutorial on conformal prediction.Journal of Machine Learn- ing Research, 9(3), 2008

Glenn Shafer and Vladimir Vovk. A tutorial on conformal prediction.Journal of Machine Learn- ing Research, 9(3), 2008

work page 2008

[2] [2]

Springer, 2005

Vladimir Vovk, Alexander Gammerman, and Glenn Shafer.Algorithmic learning in a random world. Springer, 2005

work page 2005

[3] [3]

Conformal inference of counterfactuals and individual treat- ment effects.Journal of the Royal Statistical Soci- ety Series B: Statistical Methodology, 83(5):911– 938, 2021

Lihua Lei and Emmanuel J Cand` es. Conformal inference of counterfactuals and individual treat- ment effects.Journal of the Royal Statistical Soci- ety Series B: Statistical Methodology, 83(5):911– 938, 2021

work page 2021

[4] [4]

Extending inferences from a randomized trial to a new target population.Statistics in medicine, 39(14):1999–2014, 2020

Issa J Dahabreh, Sarah E Robertson, Jon A Stein- grimsson, Elizabeth A Stuart, and Miguel A Her- nan. Extending inferences from a randomized trial to a new target population.Statistics in medicine, 39(14):1999–2014, 2020

work page 1999

[5] [5]

Counterfactual reasoning and learning systems: The example of computational advertising.The Journal of Machine Learning Research, 14(1):3207–3260, 2013

L´ eon Bottou, Jonas Peters, Joaquin Qui˜ nonero- Candela, Denis X Charles, D Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, and Ed Snelson. Counterfactual reasoning and learning systems: The example of computational advertising.The Journal of Machine Learning Research, 14(1):3207–3260, 2013

work page 2013

[6] [6]

Counterfactual risk minimization: Learning from logged bandit feedback

Adith Swaminathan and Thorsten Joachims. Counterfactual risk minimization: Learning from logged bandit feedback. InInternational confer- ence on machine learning, pages 814–823. PMLR, 2015

work page 2015

[7] [7]

Cambridge University Press, 2020

Ron Kohavi, Diane Tang, and Ya Xu.Trust- worthy online controlled experiments: A practical guide to a/b testing. Cambridge University Press, 2020

work page 2020

[8] [8]

Metalearners for estimating hetero- geneous treatment effects using machine learning

S¨ oren R K¨ unzel, Jasjeet S Sekhon, Peter J Bickel, and Bin Yu. Metalearners for estimating hetero- geneous treatment effects using machine learning. Proceedings of the national academy of sciences, 116(10):4156–4165, 2019

work page 2019

[9] [9]

Estimation and inference of heterogeneous treatment effects using random forests.Journal of the American Statis- tical Association, 113(523):1228–1242, 2018

Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests.Journal of the American Statis- tical Association, 113(523):1228–1242, 2018

work page 2018

[10] [10]

Deep structural causal models for tractable counterfactual inference.Advances in neural information processing systems, 33:857– 869, 2020

Nick Pawlowski, Daniel Coelho de Castro, and Ben Glocker. Deep structural causal models for tractable counterfactual inference.Advances in neural information processing systems, 33:857– 869, 2020

work page 2020

[11] [11]

Prediction-powered inference.Science, 382(6671):669–674, 2023

Anastasios N Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I Jordan, and Tijana Zr- nic. Prediction-powered inference.Science, 382(6671):669–674, 2023

work page 2023

[12] [12]

Distribution-free, risk-controlling prediction sets

Stephen Bates, Anastasios Angelopoulos, Li- hua Lei, Jitendra Malik, and Michael Jordan. Distribution-free, risk-controlling prediction sets. Journal of the ACM (JACM), 68(6):1–34, 2021

work page 2021

[13] [13]

Learning representations for counterfactual inference

Fredrik Johansson, Uri Shalit, and David Son- tag. Learning representations for counterfactual inference. InInternational conference on machine learning, pages 3020–3029. PMLR, 2016

work page 2016

[14] [14]

Estimating individual treatment effect: gen- eralization bounds and algorithms

Uri Shalit, Fredrik D Johansson, and David Son- tag. Estimating individual treatment effect: gen- eralization bounds and algorithms. InInter- national conference on machine learning, pages 3076–3085. PMLR, 2017

work page 2017

[15] [15]

Bayesian nonparametric modeling for causal inference.Journal of Computational and Graphical Statistics, 20(1):217–240, 2011

Jennifer L Hill. Bayesian nonparametric modeling for causal inference.Journal of Computational and Graphical Statistics, 20(1):217–240, 2011

work page 2011

[16] [16]

Causal effect inference with deep latent-variable models.Advances in neural information process- ing systems, 30, 2017

Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. Causal effect inference with deep latent-variable models.Advances in neural information process- ing systems, 30, 2017

work page 2017

[17] [17]

Conformal sensitivity analysis for individual treatment effects.Journal of the American Statistical Association, 119(545):122– 135, 2024

Mingzhang Yin, Claudia Shi, Yixin Wang, and David M Blei. Conformal sensitivity analysis for individual treatment effects.Journal of the American Statistical Association, 119(545):122– 135, 2024

work page 2024

[18] Ahmed M Alaa, Zaid Ahmad, and Mark van der Laan. Conformal meta-learners for predictive inference of individual treatment effects. Advances in neural information processing systems, 36:47682–47703, 2023.

[19] Jinsung Yoon, James Jordon, and Mihaela Van Der Schaar. Ganite: Estimation of individualized treatment effects using generative adversarial nets. In International conference on learning representations, 2018.

[20] Ioana Bica, James Jordon, and Mihaela van der Schaar. Estimating the effects of continuous-valued interventions using generative adversarial networks. Advances in Neural Information Processing Systems, 33:16434–16445, 2020.

[21] Philip Thomas and Emma Brunskill. Data-efficient off-policy policy evaluation for reinforcement learning. In International conference on machine learning, pages 2139–2148. PMLR, 2016.

[22] Bat-Sheva Einbinder, Liran Ringel, and Yaniv Romano. Semi-supervised risk control via prediction-powered inference. arXiv preprint arXiv:2412.11174, 2024.

[23] Jerzy Splawa-Neyman, Dorota M Dabrowska, and Terrence P Speed. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, pages 465–472, 1990.

[24] Donald B Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology, 66(5):688, 1974.

[25] Donald B Rubin. Formal mode of statistical inference for causal effects. Journal of statistical planning and inference, 25(3):279–292, 1990.

[26] Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.

[27] Donald B Rubin. Bayesian inference for causal effects: The role of randomization. The Annals of statistics, pages 34–58, 1978.

[28] Guido W Imbens and Donald B Rubin. Causal inference in statistics, social, and biomedical sciences. Cambridge university press, 2015.

[29] Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.

[30] Nicolai Meinshausen. Quantile regression forests. Journal of machine learning research, 7(6), 2006.

[31] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.

[32] Judea Pearl. Causality. Cambridge university press, 2009.

[33] Daniele Ballinari. Calibrating doubly-robust estimators with unbalanced treatment assignment. Economics Letters, 241:111838, 2024.

[34] Sangwoo Park, Matteo Zecchin, and Osvaldo Simeone. Adaptive prediction-powered autoeval with reliability and efficiency guarantees. arXiv preprint arXiv:2505.18659, 2025.

[35] Benedikt Koch and Kosuke Imai. Statistical decision theory with counterfactual loss. arXiv preprint arXiv:2505.08908, 2025.

[36] Sara Khosravi, Hossein Shokri-Ghadikolaei, and Marina Petrova. Learning-based handover in mobile millimeter-wave networks. IEEE Transactions on Cognitive Communications and Networking, 7(2):663–674, 2020.

[37] Jakob Hoydis, Sebastian Cammerer, Fayçal Ait Aoudia, Avinash Vem, Nikolaus Binder, Guillermo Marcus, and Alexander Keller. Sionna: An open-source library for next-generation physical layer research. arXiv preprint arXiv:2203.11854, 2022.