
arxiv: 2605.16989 · v1 · pith:7PLKTR26new · submitted 2026-05-16 · 💻 cs.LG

Decision-Aware Proximal Bridge Learning for Optimal Treatment Selection

Pith reviewed 2026-05-19 20:54 UTC · model grok-4.3

classification 💻 cs.LG
keywords proximal causal inference · bridge functions · treatment selection · decision-aware learning · regret bound · hidden confounding · optimal policy · weighted loss

The pith

A policy-targeted weighted bridge loss controls treatment-selection regret through a weighted ill-posedness constant in proximal causal inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Individualized treatment selection with continuous actions needs accurate causal response estimates mainly in the action regions that decide the optimal choice, not uniformly across all observed treatments. Standard proximal bridge methods allocate modeling effort according to the data distribution, which can leave the decision-critical areas poorly estimated even when identification holds via proxies and bridge functions. This paper introduces a weighted bridge loss that rebalances the objective toward policy-relevant regions while retaining global stabilization properties. The authors prove a regret bound in which the resulting treatment-selection regret is controlled by a weighted version of the usual ill-posedness constant. They instantiate the idea in decision-aware versions of several proximal solvers and show empirically that the weighting reduces regret relative to unweighted baselines.

Core claim

The paper establishes that a policy-targeted weighted bridge loss, when used inside proximal bridge estimation, controls treatment-selection regret via a weighted ill-posedness constant. The loss emphasizes decision-relevant treatment regions while preserving identification under the standard proximal causal inference assumptions. Practical algorithms alternate between weighted bridge estimation, response-surface projection, policy update, and iterative weight refinement; experiments indicate lower regret across multiple proximal solvers.
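The four-step alternation can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the paper's algorithm: there are no covariates X, the bridge is a linear sieve fit by a weighted two-stage projection (a stand-in for the proximal solvers the paper adapts), the weight is a Gaussian bump with a uniform floor, and the data come from a hypothetical generating process whose true optimal dose is a = 1. All function names and the values of `tau`, `lam`, and `n_rounds` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000

# Hypothetical proximal data: U is the hidden confounder, Z and W are its proxies.
U = rng.normal(size=n)
Z = U + 0.5 * rng.normal(size=n)           # treatment-side proxy
W = U + 0.5 * rng.normal(size=n)           # outcome-side proxy
A = 0.7 * U + rng.normal(size=n)           # confounded continuous treatment
Y = -(A - 1.0) ** 2 + 2.0 * U + 0.5 * rng.normal(size=n)  # true optimum at a = 1

def bridge_features(a, w):
    # Sieve basis for the bridge h(a, w) = theta . [1, a, a^2, w].
    return np.column_stack([np.ones_like(a), a, a ** 2, w])

def instrument_features(a, z):
    # Basis used to approximate conditional expectations given (A, Z).
    return np.column_stack([np.ones_like(a), a, a ** 2, z, z ** 2, a * z])

def weighted_bridge_fit(Phi, Psi, y, omega, ridge=1e-6):
    # Step 1: project bridge features and outcome onto the instrument basis,
    # then minimise the omega-weighted squared projected residual.
    gram = Psi.T @ Psi
    Phi_hat = Psi @ np.linalg.solve(gram, Psi.T @ Phi)
    y_hat = Psi @ np.linalg.solve(gram, Psi.T @ y)
    lhs = Phi_hat.T @ (omega[:, None] * Phi_hat) + ridge * np.eye(Phi.shape[1])
    rhs = Phi_hat.T @ (omega * y_hat)
    return np.linalg.solve(lhs, rhs)

def response_surface(theta, grid, w_samples):
    # Step 2: average the bridge over the observed W to get mu(a) on a grid.
    return theta[0] + theta[1] * grid + theta[2] * grid ** 2 + theta[3] * w_samples.mean()

def refine_weights(a_obs, a_star, tau, lam):
    # Step 4: Gaussian bump around the current recommendation plus a floor of 1 - lam.
    omega = (1.0 - lam) + lam * np.exp(-0.5 * ((a_obs - a_star) / tau) ** 2)
    return omega / omega.mean()

Phi, Psi = bridge_features(A, W), instrument_features(A, Z)
grid = np.linspace(-2.0, 4.0, 121)
omega = np.ones(n)                          # the first round is the unweighted fit
tau, lam, n_rounds = 0.5, 0.8, 3            # illustrative values only

for _ in range(n_rounds):
    theta = weighted_bridge_fit(Phi, Psi, Y, omega)
    mu_hat = response_surface(theta, grid, W)
    a_star = float(grid[np.argmax(mu_hat)])  # Step 3: policy update
    omega = refine_weights(A, a_star, tau, lam)

print(f"recommended dose: {a_star:.2f}")
```

The weights are frozen while the bridge is fit and only updated between rounds, which mirrors the fixed-weight-per-iteration structure described in the simulated rebuttal. The instrument basis is deliberately larger than the bridge basis; in an exactly identified linear problem the weights would have no effect on the fit.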

What carries the argument

The policy-targeted weighted bridge loss, which reweights the proximal bridge objective to emphasize regions that determine the optimal treatment choice.
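For orientation, the objects involved can be written out. The conditional moment restriction and the zero-risk equivalence are the ones the paper reports for its bridge risk, with the unweighted case corresponding to ω ≡ 1. The quadratic form of the weighted risk and the arguments of ω are assumptions made here for illustration; the summary does not give the paper's exact definition.

```latex
% Outcome-bridge restriction (h is the bridge function; Z, W are proxies):
\mathbb{E}\{\, Y - h(A, W, X) \mid A, Z, X \,\} = 0 \quad \text{a.s.}

% One plausible weighted bridge risk (assumed form), with a weight \omega > 0:
L_{\mathrm{br},\omega}(h)
  = \mathbb{E}\Bigl[\, \omega(A, X)\,
      \bigl( \mathbb{E}\{\, Y - h(A, W, X) \mid A, Z, X \,\} \bigr)^{2} \Bigr]

% Zero-risk equivalence reported in the paper:
L_{\mathrm{br},\omega}(h) = 0
  \iff \mathbb{E}\{\, Y - h(A, W, X) \mid A, Z, X \,\} = 0 \quad \text{a.s.}
```

Under the assumed form, the equivalence follows because a strictly positive weight multiplies a nonnegative integrand, so the weighted and unweighted population risks vanish on the same set of bridge functions.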

If this is right

  • Treatment-selection regret is bounded by a weighted ill-posedness constant rather than the unweighted version.
  • Decision-aware variants can be created for multiple existing proximal bridge solvers.
  • The alternating procedure of weighted estimation, projection, policy update, and weight refinement yields practical algorithms.
  • Empirical results show reduced regret compared with standard proximal methods under hidden confounding.
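Two short formulas help locate where the weighting enters. The first is the standard plug-in regret inequality, which holds for any estimated response surface and shows that only the errors at the true and the recommended action matter. The second is a schematic weighted ill-posedness constant modeled on the sieve measure of ill-posedness from the nonparametric instrumental-variable literature; its exact form here is an assumption, since this summary does not reproduce the paper's definition.

```latex
% Plug-in regret at covariate value x, with
% \hat{a} = \arg\max_a \hat{\mu}(a, x) and a^\star = \arg\max_a \mu(a, x).
% Since \hat{\mu}(\hat{a}, x) \ge \hat{\mu}(a^\star, x):
\mu(a^\star, x) - \mu(\hat{a}, x)
  \le \bigl| \mu(a^\star, x) - \hat{\mu}(a^\star, x) \bigr|
    + \bigl| \hat{\mu}(\hat{a}, x) - \mu(\hat{a}, x) \bigr|

% Assumed form of a weighted ill-posedness constant over a bridge class \mathcal{H},
% where (T h)(A, Z, X) = \mathbb{E}\{\, h(A, W, X) \mid A, Z, X \,\}
% and h_0 is the true bridge:
\tau_{\omega}
  = \sup_{h \in \mathcal{H},\, h \neq h_0}
    \frac{\lVert h - h_0 \rVert_{L_2(\omega)}}
         {\lVert T (h - h_0) \rVert_{L_2(\omega)}}
```

Read together, the two lines suggest the shape of the argument: regret is driven by response error near the decision, and a constant of this kind converts a small weighted bridge risk into a small weighted error in the bridge itself.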

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same weighting idea could be applied to other proximal tasks where the end goal is a decision rather than pure effect estimation.
  • The weighted ill-posedness constant offers a way to quantify how hidden confounding affects decision quality specifically.
  • The method could be tested on longitudinal data to see whether the weighting remains stable when policies change over time.

Load-bearing premise

The proximal causal inference identification assumptions hold, including the existence of suitable bridge functions and proxy variables that recover causal effects despite hidden confounding, and the decision-aware weighting preserves identification.

What would settle it

Run the weighted and unweighted proximal solvers on a synthetic or semi-synthetic dataset with known optimal policy and hidden confounding; if the weighted version does not produce lower treatment-selection regret even when the weights correctly highlight the optimal-action region, the practical value of the regret bound would be called into question.
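A check of this kind needs a regret metric computed against the known response surface. The sketch below shows one way to do that on a grid of doses. The response function, the covariate values, and both candidate policies are hypothetical placeholders, not the paper's benchmarks.

```python
import numpy as np

def selection_regret(true_response, recommended, grid, x):
    """Mean gap between the best achievable and the achieved true response.

    true_response: callable (a, x) -> response, vectorised over NumPy arrays
    recommended:   one recommended action per individual, shape (n,)
    grid:          candidate actions used to locate the oracle optimum, shape (m,)
    x:             one covariate value per individual, shape (n,)
    """
    # Oracle value: best true response over the action grid, per individual.
    values = true_response(grid[None, :], x[:, None])   # shape (n, m)
    best = values.max(axis=1)
    # Value actually obtained by following the recommendations.
    achieved = true_response(recommended, x)
    return float(np.mean(best - achieved))

# Hypothetical ground truth: the optimal dose equals the covariate value.
def toy_response(a, x):
    return -(a - x) ** 2

x = np.linspace(0.0, 1.0, 5)
grid = np.linspace(-1.0, 2.0, 301)

regret_oracle = selection_regret(toy_response, x, grid, x)          # recommends a = x
regret_shifted = selection_regret(toy_response, x + 0.3, grid, x)   # off by 0.3

print(regret_oracle, regret_shifted)
```

Comparing this quantity for a weighted and an unweighted solver over several seeds gives the kind of comparison the check calls for. Counterfactual RMSE over the whole action range measures something different and can move in the opposite direction.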

Figures

Figures reproduced from arXiv: 2605.16989 by Alejandro Almodóvar, Axel Brando, Eduard Serrahima de Cambra, Gerard Sanz, Juan Parras, Tomàs Garriga.

Figure 1: (a) Proximal causal graph with proxy variables. (b) Decision-aware bridge-learning pipeline.
Figure 2: Regret metrics in the synthetic and semi-synthetic datasets. Mean and 95 …
Figure 3: Counterfactual RMSE in the synthetic and semi-synthetic datasets. Mean and 95 …
Figure 4: Single-individual example from the synthetic benchmark. DA-PMMR improves the response estimate near the oracle optimum, yielding a treatment recommendation closer to the optimal dose even though it does not uniformly improve the full response surface.
  Surrounding text: We evaluate on two proximal continuous-treatment benchmarks with hidden confounding and proxy variables. Each observation is O = (X, Z, W, A, Y), where …
Figure 5: Sensitivity of weighting hyperparameters, holding the other two fixed. Mean and …
Figure 6: Regret as a function of the confounding strength for three of the DA models. Mean and 95% confidence intervals over 5 seeds reported.
  Surrounding text: Sensitivity to weighting hyperparameters. We vary the weighting bandwidth τ, localization strength λ, and number of reweighting rounds n_rounds one at a time. The results in …
Figure 7: Synthetic proximal graph. Observed variables are shown in gray with solid nodes, while …
Figure 8: Reference causal and associational curves in the synthetic benchmark. Left: for a fixed …
Figure 9: Semi-synthetic TCGA proximal graph. Observed variables are shown in gray, while the …
Figure 10: Reference causal and associational curves in the semi-synthetic TCGA benchmark. Left: …
Figure 11: Factual RMSE in the synthetic and semi-synthetic datasets. Mean and 95 …
Original abstract

Individualized treatment selection with continuous actions requires accurate causal response estimation in decision-relevant regions, rather than uniformly over the entire action space. Estimating a global causal response surface and then choosing the treatment that maximizes it can therefore be suboptimal, since standard estimation objectives allocate modeling effort according to the observed treatment distribution rather than the regions that determine the optimal decision. While decision-aware approaches have been studied in unconfounded settings, this problem remains underexplored in proximal causal inference, where proxy variables and bridge functions enable identification under suitable assumptions even in the presence of hidden confounding. Despite recent progress, proximal methods have primarily focused on treatment-effect and potential-outcome estimation rather than treatment selection and optimal decision-making. To bridge this gap, we introduce a policy-targeted weighted bridge loss that emphasizes decision-relevant treatment regions while retaining global stabilization. We prove a regret bound showing that the proposed weighted bridge loss controls treatment-selection regret through a weighted ill-posedness constant. We instantiate the framework in decision-aware variants of several proximal bridge solvers, yielding practical algorithms that alternate between weighted bridge estimation, response-surface projection, policy update, and weight refinement. Empirically, we find that decision-aware weighting reduces regret across several bridge solvers, suggesting improved treatment selection in proximal settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a decision-aware proximal bridge learning framework for optimal individualized treatment selection with continuous actions under hidden confounding. It introduces a policy-targeted weighted bridge loss that emphasizes decision-relevant regions of the action space, derives a regret bound showing that this loss controls treatment-selection regret via a weighted ill-posedness constant, and instantiates the approach in iterative algorithms that alternate between weighted bridge estimation, response projection, policy update, and weight refinement. Empirical results indicate reduced regret across multiple proximal bridge solvers.

Significance. If the regret bound is valid and the weighting scheme does not inflate the effective ill-posedness constant in decision-critical regions, the work would represent a useful extension of proximal causal inference methods from effect estimation to direct policy optimization. It addresses an important gap between standard proximal approaches and decision-aware objectives, with potential implications for treatment selection in confounded observational data settings. The empirical improvements provide supporting evidence, though the theoretical contribution hinges on careful control of the weighting-induced terms.

major comments (2)
  1. [§4, Theorem 1] §4, Theorem 1 (regret bound): the derivation claims that the weighted bridge loss controls selection regret through the weighted ill-posedness constant, but does not appear to isolate or bound the cross-term that arises from the dependence between the iteratively refined decision-aware weights (constructed from the estimated response surface) and the bridge function estimation error. Standard proximal analyses do not automatically cancel this term, and its growth could undermine the bound when emphasis is placed on proxy-dependent regions.
  2. [§3.1] §3.1 (weighted loss definition): the policy-targeted weighting is motivated by focusing modeling effort on decision-relevant regions, yet it is unclear from the identification argument whether the weights preserve the existence and uniqueness of the bridge functions under the standard proximal causal inference assumptions when the weights depend on the current response-surface estimate.
minor comments (2)
  1. [§5] The experimental setup would be strengthened by reporting the estimated weighted ill-posedness constants alongside the regret values to allow direct assessment of whether the weighting reduces or inflates this quantity in practice.
  2. [Notation] Notation for the weighted loss and the distinction between global and policy-targeted bridge functions could be introduced earlier in the main text with a small illustrative example.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and insightful comments on the theoretical foundations of our decision-aware proximal bridge framework. We address each major comment below with clarifications and indicate the revisions we will make.

Point-by-point responses
  1. Referee: [§4, Theorem 1] §4, Theorem 1 (regret bound): the derivation claims that the weighted bridge loss controls selection regret through the weighted ill-posedness constant, but does not appear to isolate or bound the cross-term that arises from the dependence between the iteratively refined decision-aware weights (constructed from the estimated response surface) and the bridge function estimation error. Standard proximal analyses do not automatically cancel this term, and its growth could undermine the bound when emphasis is placed on proxy-dependent regions.

    Authors: We appreciate the referee pointing out the potential cross-term arising from the iterative dependence between the decision-aware weights and the bridge estimation error. In the current proof of Theorem 1, the analysis proceeds by conditioning on fixed weights at each iteration and then bounding the resulting regret; however, we agree that an explicit isolation and bound on the cross-term is not fully detailed. To address this, we will revise the proof by adding a supporting lemma that controls the cross-term via the Lipschitz continuity of the weight function with respect to the response-surface estimator and the contraction property of the iterative updates. Under the stated assumptions, this term is shown to be of strictly higher order than the leading terms and does not inflate the weighted ill-posedness constant. The revised proof will appear in the next version of the manuscript. revision: yes

  2. Referee: [§3.1] §3.1 (weighted loss definition): the policy-targeted weighting is motivated by focusing modeling effort on decision-relevant regions, yet it is unclear from the identification argument whether the weights preserve the existence and uniqueness of the bridge functions under the standard proximal causal inference assumptions when the weights depend on the current response-surface estimate.

    Authors: We agree that the dependence of the weights on the current response-surface estimate requires careful justification for identification. Under the standard proximal assumptions (A1–A4), the bridge functions are identified by the unweighted loss. Because our policy-targeted weights are constructed to be continuous, strictly positive, and bounded away from zero over the entire action space (via a smoothed indicator based on the current estimate), the weighted loss remains equivalent to a reweighted version of the original conditional moment restrictions. Consequently, existence and uniqueness are preserved at each fixed-weight step of the iteration. The iterative dependence is handled by freezing the weights during bridge estimation. We will add a short proposition in §3.1 that formally states this inheritance of identification and uniqueness from the unweighted case. revision: yes
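The two properties the simulated responses rely on, a weight bounded away from zero and a weight that changes smoothly when the policy estimate moves, can be checked numerically for a concrete candidate. The Gaussian-bump form below is an assumption standing in for the "smoothed indicator" mentioned above; `tau` and `lam` borrow the bandwidth and localization-strength names from the paper's sensitivity study, and their values are illustrative.

```python
import math
import numpy as np

def policy_targeted_weight(a, a_star, tau=0.5, lam=0.8):
    """Assumed weight: uniform floor (1 - lam) plus a Gaussian bump at a_star."""
    return (1.0 - lam) + lam * np.exp(-0.5 * ((a - a_star) / tau) ** 2)

def lipschitz_in_a_star(tau=0.5, lam=0.8):
    # The bump's slope is steepest one bandwidth from its centre,
    # which gives the Lipschitz constant lam * exp(-1/2) / tau.
    return lam * math.exp(-0.5) / tau

actions = np.linspace(-3.0, 3.0, 601)
w_old = policy_targeted_weight(actions, a_star=0.9)   # weights before a policy update
w_new = policy_targeted_weight(actions, a_star=1.1)   # weights after the update

floor = 1.0 - 0.8                                      # lower bound implied by lam = 0.8
max_change = float(np.max(np.abs(w_new - w_old)))
bound = lipschitz_in_a_star() * abs(1.1 - 0.9)

print(w_old.min() >= floor, max_change <= bound)
```

With this form, setting `lam` below 1 keeps every observed treatment in the objective, and a larger `tau` makes the weights less sensitive to changes in the estimated optimum. Whether the paper's weights share these exact constants is not stated in this summary.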

Circularity Check

0 steps flagged

Derivation chain is self-contained; regret bound rests on standard proximal identification rather than self-referential fits or citations

Full rationale

The paper claims a regret bound in which the policy-targeted weighted bridge loss controls treatment-selection regret through a weighted ill-posedness constant. This construction relies on proximal causal inference identification assumptions (existence of bridge functions and proxy variables) that are external to the present work and not defined in terms of the authors' own fitted parameters or prior self-citations. The decision-aware weighting is introduced as a modification to emphasize decision-relevant regions, but the bounding argument is presented as a mathematical control that does not reduce the target regret to a quantity already fixed by the estimation procedure itself. No equations or steps in the abstract or context exhibit self-definition, fitted-input-as-prediction, or load-bearing self-citation chains. The framework therefore remains non-circular by the stated criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard proximal causal inference assumptions without introducing new free parameters or invented entities in the abstract description.

axioms (1)
  • domain assumption: Existence of bridge functions and proxy variables that identify causal effects under hidden confounding
    This is the core identification assumption of proximal causal inference invoked to justify the bridge learning step.

pith-pipeline@v0.9.0 · 5768 in / 1149 out tokens · 59205 ms · 2026-05-19T20:54:57.459064+00:00 · methodology


