arXiv: 2509.04112 · v3 · submitted 2025-09-04 · cs.LG · cs.IT · math.IT

Synthetic Counterfactual Labels for Efficient Conformal Counterfactual Inference

Pith reviewed 2026-05-18 19:04 UTC · model grok-4.3

classification: cs.LG · cs.IT · math.IT
keywords: conformal inference · counterfactual inference · synthetic data augmentation · prediction intervals · treatment effects · risk-controlling prediction sets · importance weighting

The pith

Augmenting conformal calibration with synthetic counterfactual labels from a pre-trained model produces narrower prediction intervals while preserving marginal coverage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that adds synthetic counterfactual outcomes, generated by an existing model, to the set used for calibrating prediction intervals. This augmentation is combined with a debiasing correction drawn from prediction-powered inference and then processed through risk-controlling prediction sets. A reader would care because standard conformal methods for counterfactuals often yield wide intervals when few real counterfactual samples are available, as in cases of treatment imbalance. The approach claims to deliver the same coverage guarantee with materially smaller intervals.

Core claim

The central claim is that synthetic data-powered conformal counterfactual inference (SP-CCI) yields tighter intervals than standard conformal counterfactual inference (CCI) while preserving marginal coverage. It does so by augmenting the calibration set with labels from a pre-trained model and applying a debiasing step before running risk-controlling prediction sets. The supporting guarantees are stated for both exact and approximate importance weighting.

What carries the argument

SP-CCI, the procedure that inserts synthetic counterfactual labels into the calibration set, applies a prediction-powered-inference debiasing correction, and then runs risk-controlling prediction sets.
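
The debiasing idea can be illustrated with a generic prediction-powered mean estimate of miscoverage. This is a minimal sketch, not the paper's estimator: it ignores importance weights, uses one fixed interval for every point, and the function names `miscoverage` and `ppi_miscoverage` are hypothetical. The estimate is the miscoverage measured on a large pool of synthetic labels plus a correction measured on the few points that have both a real and a synthetic label.

```python
import numpy as np

def miscoverage(y, lower, upper):
    """Return 1.0 where y falls outside [lower, upper], else 0.0."""
    y = np.asarray(y, dtype=float)
    return ((y < lower) | (y > upper)).astype(float)

def ppi_miscoverage(y_real, y_synth_paired, y_synth_pool, lower, upper):
    """Prediction-powered estimate of miscoverage for one fixed interval.

    y_real:         real labels on the small labelled set
    y_synth_paired: synthetic labels for those same points
    y_synth_pool:   synthetic labels on the large unlabelled pool
    """
    pool_term = miscoverage(y_synth_pool, lower, upper).mean()
    # Rectifier: how far the synthetic labels misjudge the loss on real data.
    rectifier = (miscoverage(y_real, lower, upper)
                 - miscoverage(y_synth_paired, lower, upper)).mean()
    return pool_term + rectifier

# Toy numbers (assumed for illustration only).
y_real = [0.0, 2.0, 5.0, -3.0]
y_synth_paired = [0.5, 2.5, 2.0, -3.5]
y_synth_pool = [0.0, 1.0, 2.0, 4.0]
estimate = ppi_miscoverage(y_real, y_synth_paired, y_synth_pool, -1.0, 3.0)
```

When the synthetic labels are perfect the rectifier vanishes and the pool alone drives the estimate; when they are poor the rectifier grows, which is why the quality of the pre-trained model matters for how much tightening survives.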

If this is right

  • Prediction intervals for individual counterfactual outcomes become narrower under treatment imbalance.
  • Marginal coverage is retained for both exact and approximate importance weighting.
  • The same coverage target is achieved with fewer real counterfactual samples.
  • Interval widths decrease consistently across multiple datasets relative to standard conformal counterfactual inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could be applied to sequential decision problems where new synthetic labels are generated on the fly from an updating model.
  • It raises the question of how the quality of the pre-trained model affects the degree of tightening in finite samples.
  • One could test whether replacing the pre-trained model with a simpler baseline still yields net gains after the debiasing step.

Load-bearing premise

The synthetic labels, once corrected by the debiasing step, must preserve the marginal coverage property of the risk-controlling procedure without extra conditions on how accurate the pre-trained model is or how much the data overlap.

What would settle it

An experiment in which the empirical coverage of the resulting intervals falls below the nominal target level on a dataset where the pre-trained counterfactual model is deliberately made inaccurate.
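
A harness for such a check might look like the sketch below. It is an assumption-laden illustration: the interval constructor here is plain split conformal prediction, swapped in as a stand-in because the SP-CCI procedure itself is not reproduced on this page, and the data model, the `bias` parameter and all function names are hypothetical. The coverage measurement is the part that would carry over unchanged.

```python
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Fraction of outcomes that land inside their intervals."""
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_true >= lower) & (y_true <= upper)))

def split_conformal_halfwidth(abs_residuals, alpha):
    """Standard split-conformal quantile of absolute calibration residuals."""
    r = np.sort(np.asarray(abs_residuals, dtype=float))
    n = len(r)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return r[k - 1] if k <= n else np.inf

def run_trial(bias, alpha=0.1, n_cal=2000, n_test=20000, seed=0):
    """Coverage of a split-conformal baseline around a biased predictor."""
    rng = np.random.default_rng(seed)
    x_cal, x_test = rng.normal(size=n_cal), rng.normal(size=n_test)
    y_cal = x_cal + rng.normal(size=n_cal)
    y_test = x_test + rng.normal(size=n_test)
    predict = lambda x: x + bias  # deliberately inaccurate when bias != 0
    q = split_conformal_halfwidth(np.abs(y_cal - predict(x_cal)), alpha)
    center = predict(x_test)
    return empirical_coverage(y_test, center - q, center + q)

coverage = run_trial(bias=3.0)  # compare against the nominal 1 - alpha = 0.9
```

For the baseline, an inaccurate predictor widens the intervals but should not break coverage. The settling experiment would replace the interval constructor with SP-CCI and ask whether the same holds.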

Figures

Figures reproduced from arXiv: 2509.04112 by Amirmohammad Farzaneh, Matteo Zecchin, Osvaldo Simeone.

Figure 1: The proposed synthetic data-powered conformal counterfactual inference (SP-CCI) method leverages synthetic counterfactual labels $\hat{Y}(1)$ produced using a pre-trained generative model $\hat{P}_{Y(1)|X}$ from the typically larger dataset $D_0$ ($n_0 \gg n_1$). … as tumor size reduction. The counterfactual outcome $Y^{\mathrm{cf}} = Y(1 - T)$ represents what would have happened had the patient received the other treatment option. Individ…
Figure 2: A Bayesian network representation of the …
Figure 3: SP-CCI partitions the synthetic dataset $\tilde{D}_1$ into $n_1$ disjoint groups $\{\tilde{D}_{1,i}\}_{i=1}^{n_1}$, each with $r$ data points. Each group $\tilde{D}_{1,i}$ is assigned to a real data point $(X_i, Y_i)$ from the dataset $D_1$. … where $X_i$ represents the covariates for the $i$-th data point of dataset $D_0$. As shown in …
Figure 4: Synthetic data example from [3]: (a) Distribution of empirical test coverage for CCI [3] and SP …
Figure 5: Policy evaluation for counterfactual loss in a wireless handover setting: A mobile device at location …
Figure 6: Distribution of average interval widths (in dB) over 50 random trials for optimistic CCI and SP-CCI …
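
The grouping described in the Figure 3 caption can be sketched as an index partition. The caption only states that the groups are disjoint, have $r$ points each, and are assigned one per real data point; the random shuffle and the function name `assign_synthetic_groups` are assumptions made for illustration.

```python
import numpy as np

def assign_synthetic_groups(n_real, r, seed=0):
    """Split indices 0..n_real*r-1 into n_real disjoint groups of size r.

    Row i holds the synthetic-sample indices assigned to real point i.
    """
    rng = np.random.default_rng(seed)
    return rng.permutation(n_real * r).reshape(n_real, r)

# Four real points, three synthetic samples each (toy sizes).
groups = assign_synthetic_groups(n_real=4, r=3)
```

Each synthetic index appears in exactly one row, so no synthetic sample is reused across real data points.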
Abstract

This work addresses the problem of constructing reliable prediction intervals for individual counterfactual outcomes. Existing conformal counterfactual inference (CCI) methods provide marginal coverage guarantees but often produce overly conservative intervals, particularly under treatment imbalance when counterfactual samples are scarce. We introduce synthetic data-powered CCI (SP-CCI), a new framework that augments the calibration set with synthetic counterfactual labels generated by a pre-trained counterfactual model. To ensure validity, SP-CCI incorporates synthetic samples into a conformal calibration procedure based on risk-controlling prediction sets (RCPS) with a debiasing step informed by prediction-powered inference (PPI). We prove that SP-CCI achieves tighter prediction intervals while preserving marginal coverage, with theoretical guarantees under both exact and approximate importance weighting. Empirical results on different datasets confirm that SP-CCI consistently reduces interval width compared to standard CCI across all settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes SP-CCI, a framework for conformal counterfactual inference that augments the calibration set with synthetic counterfactual labels from a pre-trained model. It applies a PPI-based debiasing correction within an RCPS procedure to obtain tighter prediction intervals while preserving marginal coverage, with claimed theoretical guarantees under both exact and approximate importance weighting. Empirical results across datasets are reported to show consistent reductions in interval width relative to standard CCI.

Significance. If the central claims hold, SP-CCI would offer a practical way to mitigate conservatism in CCI under treatment imbalance by leveraging synthetic data without sacrificing validity. The combination of synthetic labels, PPI debiasing, and RCPS is a coherent technical contribution that could generalize to other conformal settings with scarce counterfactual samples. The manuscript supplies both proofs and experiments, which is a strength.

major comments (2)
  1. [Theoretical guarantees for approximate importance weighting] The section on theoretical guarantees for approximate importance weighting: the high-probability coverage statement for the RCPS procedure after PPI correction does not include an explicit bound on the residual bias induced by approximation error in the importance weights (whether from finite-sample estimation or model misspecification). Without such a bound relative to the RCPS risk tolerance, the marginal coverage guarantee may not hold even if the exact-weighting case is correct.
  2. [§3 (method)] §3 (method) and the associated assumptions: the analysis assumes that the pre-trained counterfactual model produces synthetic labels whose distribution, after PPI debiasing, satisfies the conditions for RCPS to retain marginal coverage. No quantitative conditions on model accuracy or overlap between observed and synthetic distributions are stated, leaving the validity claim dependent on unverified properties of the pre-trained model.
minor comments (2)
  1. [Abstract] The abstract states results on 'different datasets' without naming them or reporting the precise metrics (e.g., average width reduction, coverage rates) used for comparison.
  2. [Notation and definitions] Notation for the importance weights and the PPI correction term could be introduced earlier and used consistently in both the theoretical and experimental sections to improve readability.
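
To make the first major comment concrete, the sketch below shows how importance weights enter a weighted split-conformal quantile, the standard device in covariate-shift conformal methods. It is not the paper's RCPS procedure with PPI correction; the function name and toy numbers are assumptions. The point is that changing the weights moves the threshold, so error in estimated weights translates directly into error in the calibrated interval.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, w_test, alpha):
    """Smallest score whose normalized cumulative weight reaches 1 - alpha.

    The test point's weight w_test is placed on +infinity, so the result
    is np.inf when the calibration weights alone cannot reach 1 - alpha.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    s, w = scores[order], weights[order]
    cum = np.cumsum(w) / (w.sum() + w_test)
    idx = int(np.searchsorted(cum, 1 - alpha))
    return s[idx] if idx < len(s) else np.inf

scores = [1.0, 2.0, 3.0, 4.0]
q_uniform = weighted_conformal_quantile(scores, [1, 1, 1, 1], 1.0, alpha=0.5)
# Upweighting the largest score pushes the threshold outward.
q_shifted = weighted_conformal_quantile(scores, [1, 1, 1, 5], 1.0, alpha=0.5)
```

With uniform weights the threshold is 3.0; with the last point upweighted it becomes 4.0. A bound of the kind the referee requests would control how far such shifts can go when the weights are only approximately correct.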

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments on our manuscript. We address each of the major comments below and describe the revisions we will make to strengthen the theoretical analysis.

Point-by-point responses
  1. Referee: [Theoretical guarantees for approximate importance weighting] The section on theoretical guarantees for approximate importance weighting: the high-probability coverage statement for the RCPS procedure after PPI correction does not include an explicit bound on the residual bias induced by approximation error in the importance weights (whether from finite-sample estimation or model misspecification). Without such a bound relative to the RCPS risk tolerance, the marginal coverage guarantee may not hold even if the exact-weighting case is correct.

    Authors: We agree that an explicit bound on the residual bias would make the guarantee more complete. In the revised version, we will add a lemma bounding the approximation error in the importance weights (from both finite samples and potential misspecification) and show how this error propagates to the RCPS risk control. Specifically, we will derive a term that can be absorbed into the risk tolerance α, ensuring the marginal coverage holds with high probability under a mild condition on the approximation quality. revision: yes

  2. Referee: [§3 (method)] §3 (method) and the associated assumptions: the analysis assumes that the pre-trained counterfactual model produces synthetic labels whose distribution, after PPI debiasing, satisfies the conditions for RCPS to retain marginal coverage. No quantitative conditions on model accuracy or overlap between observed and synthetic distributions are stated, leaving the validity claim dependent on unverified properties of the pre-trained model.

    Authors: This is a valid point. The current presentation relies on the debiasing step to correct for discrepancies, but we did not quantify the required model quality. In the revision, we will introduce an assumption on the accuracy of the pre-trained model, such as a bound on the expected absolute difference between synthetic and true counterfactuals or on the variance of the importance weights. We will also discuss how overlap can be ensured via the importance weighting scheme and add a remark on practical verification of these conditions. revision: yes
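
One possible shape for the practical verification mentioned in the second response is a weight diagnostic. The sketch below is an assumption rather than anything the authors describe: it computes the Kish effective sample size and the largest single weight share, two common summaries of how concentrated a set of importance weights is. The function names are hypothetical.

```python
import numpy as np

def effective_sample_size(weights):
    """Kish effective sample size: (sum w)^2 / sum w^2."""
    w = np.asarray(weights, dtype=float)
    return float(w.sum() ** 2 / np.sum(w ** 2))

def weight_report(weights):
    """Summarize how concentrated a set of importance weights is."""
    w = np.asarray(weights, dtype=float)
    return {
        "n": int(w.size),
        "ess": effective_sample_size(w),
        "max_share": float(w.max() / w.sum()),
    }

report = weight_report([1.0, 1.0, 2.0])
```

An effective sample size far below the number of calibration points, or one weight holding most of the mass, would suggest poor overlap and high weight variance.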

Circularity Check

0 steps flagged

No circularity: derivation relies on independent RCPS and PPI steps

full rationale

The abstract and claims describe SP-CCI as augmenting calibration with synthetic labels from a pre-trained model, then applying RCPS with PPI debiasing to obtain tighter intervals while preserving marginal coverage. No quoted equations or steps reduce any 'prediction' to a fitted parameter by construction, nor does any load-bearing premise collapse to a self-citation or ansatz imported from the authors' prior work. The theoretical guarantees under exact and approximate weighting are presented as separate proofs, leaving the central result self-contained against external conformal baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach depends on the pre-trained counterfactual model generating usable synthetic samples and on the debiasing step correctly compensating for distribution shift; these are not independently verified in the abstract.

axioms (1)
  • domain assumption: Marginal coverage of RCPS is preserved after augmentation with synthetic labels and PPI debiasing under exact or approximate importance weighting.
    This is the load-bearing theoretical claim stated in the abstract.
invented entities (1)
  • Synthetic counterfactual labels (no independent evidence)
    purpose: Augment the calibration set to reduce interval width under treatment imbalance.
    Generated by a pre-trained model; no independent evidence of fidelity provided in abstract.


Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

  1. [1] Glenn Shafer and Vladimir Vovk. A tutorial on conformal prediction. Journal of Machine Learning Research, 9(3), 2008.

  2. [2] Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Algorithmic learning in a random world. Springer, 2005.

  3. [3] Lihua Lei and Emmanuel J Candès. Conformal inference of counterfactuals and individual treatment effects. Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(5):911–938, 2021.

  4. [4] Issa J Dahabreh, Sarah E Robertson, Jon A Steingrimsson, Elizabeth A Stuart, and Miguel A Hernan. Extending inferences from a randomized trial to a new target population. Statistics in medicine, 39(14):1999–2014, 2020.

  5. [5] Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X Charles, D Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, and Ed Snelson. Counterfactual reasoning and learning systems: The example of computational advertising. The Journal of Machine Learning Research, 14(1):3207–3260, 2013.

  6. [6] Adith Swaminathan and Thorsten Joachims. Counterfactual risk minimization: Learning from logged bandit feedback. In International conference on machine learning, pages 814–823. PMLR, 2015.

  7. [7] Ron Kohavi, Diane Tang, and Ya Xu. Trustworthy online controlled experiments: A practical guide to a/b testing. Cambridge University Press, 2020.

  8. [8] Sören R Künzel, Jasjeet S Sekhon, Peter J Bickel, and Bin Yu. Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the national academy of sciences, 116(10):4156–4165, 2019.

  9. [9] Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242, 2018.

  10. [10] Nick Pawlowski, Daniel Coelho de Castro, and Ben Glocker. Deep structural causal models for tractable counterfactual inference. Advances in neural information processing systems, 33:857–869, 2020.

  11. [11] Anastasios N Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I Jordan, and Tijana Zrnic. Prediction-powered inference. Science, 382(6671):669–674, 2023.

  12. [12] Stephen Bates, Anastasios Angelopoulos, Lihua Lei, Jitendra Malik, and Michael Jordan. Distribution-free, risk-controlling prediction sets. Journal of the ACM (JACM), 68(6):1–34, 2021.

  13. [13] Fredrik Johansson, Uri Shalit, and David Sontag. Learning representations for counterfactual inference. In International conference on machine learning, pages 3020–3029. PMLR, 2016.

  14. [14] Uri Shalit, Fredrik D Johansson, and David Sontag. Estimating individual treatment effect: generalization bounds and algorithms. In International conference on machine learning, pages 3076–3085. PMLR, 2017.

  15. [15] Jennifer L Hill. Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20(1):217–240, 2011.

  16. [16] Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. Causal effect inference with deep latent-variable models. Advances in neural information processing systems, 30, 2017.

  17. [17] Mingzhang Yin, Claudia Shi, Yixin Wang, and David M Blei. Conformal sensitivity analysis for individual treatment effects. Journal of the American Statistical Association, 119(545):122–135, 2024.

  18. [18] Ahmed M Alaa, Zaid Ahmad, and Mark van der Laan. Conformal meta-learners for predictive inference of individual treatment effects. Advances in neural information processing systems, 36:47682–47703, 2023.

  19. [19] Jinsung Yoon, James Jordon, and Mihaela Van Der Schaar. Ganite: Estimation of individualized treatment effects using generative adversarial nets. In International conference on learning representations, 2018.

  20. [20] Ioana Bica, James Jordon, and Mihaela van der Schaar. Estimating the effects of continuous-valued interventions using generative adversarial networks. Advances in Neural Information Processing Systems, 33:16434–16445, 2020.

  21. [21] Philip Thomas and Emma Brunskill. Data-efficient off-policy policy evaluation for reinforcement learning. In International conference on machine learning, pages 2139–2148. PMLR, 2016.

  22. [22] Bat-Sheva Einbinder, Liran Ringel, and Yaniv Romano. Semi-supervised risk control via prediction-powered inference. arXiv preprint arXiv:2412.11174, 2024.

  23. [23] Jerzy Splawa-Neyman, Dorota M Dabrowska, and Terrence P Speed. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, pages 465–472, 1990.

  24. [24] Donald B Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology, 66(5):688, 1974.

  25. [25] Donald B Rubin. Formal mode of statistical inference for causal effects. Journal of statistical planning and inference, 25(3):279–292, 1990.

  26. [26] Paul R Rosenbaum and Donald B Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41–55, 1983.

  27. [27] Donald B Rubin. Bayesian inference for causal effects: The role of randomization. The Annals of statistics, pages 34–58, 1978.

  28. [28] Guido W Imbens and Donald B Rubin. Causal inference in statistics, social, and biomedical sciences. Cambridge university press, 2015.

  29. [29] Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.

  30. [30] Nicolai Meinshausen and Greg Ridgeway. Quantile regression forests. Journal of machine learning research, 7(6), 2006.

  31. [31] Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.

  32. [32] Judea Pearl. Causality. Cambridge university press, 2009.

  33. [33] Daniele Ballinari. Calibrating doubly-robust estimators with unbalanced treatment assignment. Economics Letters, 241:111838, 2024.

  34. [34] Sangwoo Park, Matteo Zecchin, and Osvaldo Simeone. Adaptive prediction-powered autoeval with reliability and efficiency guarantees. arXiv preprint arXiv:2505.18659, 2025.

  35. [35] Benedikt Koch and Kosuke Imai. Statistical decision theory with counterfactual loss. arXiv preprint arXiv:2505.08908, 2025.

  36. [36] Sara Khosravi, Hossein Shokri-Ghadikolaei, and Marina Petrova. Learning-based handover in mobile millimeter-wave networks. IEEE Transactions on Cognitive Communications and Networking, 7(2):663–674, 2020.

  37. [37] Jakob Hoydis, Sebastian Cammerer, Fayçal Ait Aoudia, Avinash Vem, Nikolaus Binder, Guillermo Marcus, and Alexander Keller. Sionna: An open-source library for next-generation physical layer research. arXiv preprint arXiv:2203.11854, 2022.