Causal Gaussian Processes for Robust Treatment Effect Evaluation with Unobserved Confounding

Elias Bareinboim; Jingyuan Chen; Junzhe Zhang

arxiv: 2606.21809 · v1 · pith:52VKLOZMnew · submitted 2026-06-20 · 💻 cs.LG

Junzhe Zhang, Jingyuan Chen, Elias Bareinboim

Pith reviewed 2026-06-26 12:45 UTC · model grok-4.3

classification 💻 cs.LG

keywords: causal inference · gaussian processes · unobserved confounding · treatment effect evaluation · continuous domains · universal discretization · interventional distributions · policy evaluation

The pith

Causal Gaussian processes can approximate observational and interventional distributions of any causal model with unobserved confounding.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to enable evaluation of causal effects for continuous treatments and outcomes when unobserved confounding is present, relying only on the temporal order between treatment and outcome. It establishes that any causal model can be approximated to arbitrary accuracy by a universal discretization of the exogenous domains into a finite set of latent states. This approximation property supports the construction of a family of Causal Gaussian Process models that capture both observational and interventional behavior. A reader would care because prior robust methods demand strong prior knowledge or restrict to discrete variables, while this approach targets the continuous case common in applications.

Core claim

The authors introduce a universal discretization of the exogenous domains that approximates the observational and interventional distributions of any causal model with arbitrary accuracy using a finite number of latent states. Building on this newfound universal approximation property, they develop a novel family of Causal Gaussian process (CGP) models that effectively approximate the observational and interventional distributions of any causal model with confounded observations.

What carries the argument

A universal discretization of the exogenous domains into a finite number of latent states, which enables Causal Gaussian process (CGP) models to approximate any causal model.
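To make the discretization idea concrete, here is a minimal sketch on a toy causal model. It is not the paper's construction: the mechanisms `f_x` and `f_y`, the uniform exogenous variable on [0, 1], and the equal-width cells with midpoint representatives are all assumptions chosen so that the true observational and interventional means have closed forms. The sketch replaces the continuous exogenous variable with k latent states and reports how far the k-state model's means are from the true ones.

```python
import numpy as np

def f_x(u):
    """Hypothetical treatment mechanism: X depends on the unobserved U."""
    return 2.0 * u - 1.0

def f_y(x, u):
    """Hypothetical outcome mechanism: Y depends on X and on the same U."""
    return np.sin(3.0 * x) + x * u**2 + u

def latent_states(k):
    """Split [0, 1] into k equal cells; return cell midpoints and their masses."""
    midpoints = (np.arange(k) + 0.5) / k
    masses = np.full(k, 1.0 / k)  # U ~ Uniform(0, 1) puts mass 1/k on each cell
    return midpoints, masses

def observational_mean(k):
    """E[Y] in the k-state model; X is still driven by the latent state."""
    u, p = latent_states(k)
    return float(np.sum(p * f_y(f_x(u), u)))

def interventional_mean(x, k):
    """E[Y | do(X = x)] in the k-state model: x is fixed, U keeps its law."""
    u, p = latent_states(k)
    return float(np.sum(p * f_y(x, u)))

# Closed forms for the toy model above, using E[U] = 1/2 and E[U^2] = 1/3.
TRUE_OBSERVATIONAL_MEAN = 2.0 / 3.0

def true_interventional_mean(x):
    return float(np.sin(3.0 * x) + x / 3.0 + 0.5)

if __name__ == "__main__":
    x0 = 0.5
    for k in (2, 8, 32, 128):
        obs_err = abs(observational_mean(k) - TRUE_OBSERVATIONAL_MEAN)
        int_err = abs(interventional_mean(x0, k) - true_interventional_mean(x0))
        print(f"k={k:4d}  observational error={obs_err:.2e}  "
              f"interventional error={int_err:.2e}")
```

In this toy case both errors shrink as the number of latent states grows. That is only an illustration of the kind of behavior the claim describes, for one smooth model with bounded exogenous support; it says nothing about arbitrary causal models.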

If this is right

Causal effect evaluation is possible over continuous domains from confounded observations.
Only basic temporal ordering between treatment and outcome is needed.
Any causal model can be approximated with arbitrary accuracy.
Robust evaluation works without detailed prior knowledge of the environment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The discretization approach could be combined with other function approximators beyond Gaussian processes for similar causal tasks.
Practical tests could compare CGP-based estimates against ground-truth interventions in simulated continuous confounded systems.
The method opens a route to handling mixed continuous-discrete variables by extending the same discretization idea.
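The simulated comparison suggested above can be sketched as a small harness. This is a hedged illustration, not the paper's experiment and not a CGP implementation: the linear confounded system in `simulate`, the RBF kernel hyperparameters, and the plain NumPy Gaussian process regression are all assumptions. The harness fits an ordinary GP to confounded observational data and compares its prediction at a treatment value with a Monte Carlo estimate of the ground-truth interventional mean, which is the baseline gap any confounding-robust estimator would be tested against.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.5, variance=4.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior_mean(x_train, y_train, x_test, noise_var=3.0):
    """Posterior mean of a zero-mean GP regression with Gaussian noise."""
    k_train = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_cross = rbf_kernel(x_test, x_train)
    return k_cross @ np.linalg.solve(k_train, y_train)

def simulate(n, rng, do_x=None):
    """Hypothetical confounded system: U drives both X and Y.

    With do_x=None the treatment follows its natural mechanism (observational
    regime); otherwise X is set to do_x for every unit (interventional regime).
    """
    u = rng.normal(size=n)
    x = u + rng.normal(size=n) if do_x is None else np.full(n, float(do_x))
    y = x + 2.0 * u + rng.normal(size=n)
    return x, y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_obs, y_obs = simulate(800, rng)
    x_star = 1.0

    # Plain GP regression on confounded data estimates E[Y | X = x_star].
    naive = gp_posterior_mean(x_obs, y_obs, np.array([x_star]))[0]

    # Ground truth E[Y | do(X = x_star)] from simulated interventions.
    _, y_do = simulate(4000, rng, do_x=x_star)
    truth = y_do.mean()

    print(f"GP on observational data: {naive:.2f}")
    print(f"simulated interventional mean: {truth:.2f}")
```

In this assumed system the conditional mean E[Y | X = x] equals 2x while the interventional mean E[Y | do(X = x)] equals x, so the regression fit and the ground truth should disagree by a wide margin at x = 1. A CGP-based estimate would replace `gp_posterior_mean` in the same harness.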

Load-bearing premise

There exists a universal discretization of the exogenous domains that approximates the observational and interventional distributions of any causal model with arbitrary accuracy using a finite number of latent states.

What would settle it

A concrete causal model with continuous exogenous variables for which no finite discretization of those variables can make the induced observational and interventional distributions arbitrarily close to the true ones.

Figures

Figures reproduced from arXiv: 2606.21809 by Elias Bareinboim, Jingyuan Chen, Junzhe Zhang.

**Figure 1.** Samples drawn from the observational P(X, Y) (blue) and interventional Px(Y) (orange) distributions defined by various reward functions: (a) polynomial function; (b) logistic function; (c) phase function; and (d) linear function. The regression function is obtained by applying a Gaussian Process model to the observed data. […] gives a different evaluation on the treatment effects Ex …

**Figure 2.** Causal diagrams of (a) a contextual bandit model …

**Figure 3.** The illustration of the confounding-robust infer…

**Figure 4.** A simple function approximating the reward function fY(x, u) in the ground-truth causal model M of Example 1. The observed trajectories and their approximations are highlighted in blue and orange. Among the above equations, the first condition in f̂Y(x, u) ensures that the outcome variable Ŷ(u) in the canonical model Mc effectively approximates the observation Y(u) in the causal model M. It follows …

**Figure 5.** (a) Stratified observed data based on the assigned functional types; (b, c) posteriors over selected canonical …

**Figure 6.** Simulations comparing the derived posterior approximations over various reward functions using our proposed …

Original abstract

The presence of confounding bias poses a key challenge in policy evaluation, as the target causal effects of actions are not identifiable (i.e., underdetermined) from observational data. On the other hand, existing confounding-robust evaluation strategies require detailed prior knowledge about the environment or apply only to discrete treatments and outcomes. This paper investigates causal effect evaluation over the continuous domain from confounded observations, while requiring only basic temporal ordering between the treatment and the outcome. We introduce a universal discretization of the exogenous domains that approximates the observational and interventional distributions of any causal model with arbitrary accuracy using a finite number of latent states. Building on this newfound universal approximation property, we develop a novel family of Causal Gaussian process (CGP) models that effectively approximate the observational and interventional distributions of any causal model with confounded observations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The universal discretization property claimed for the CGPs probably needs unstated regularity conditions on the causal models.

Hey,

The punchline here is that the claimed universal discretization property, which underpins the whole Causal Gaussian Process construction, probably doesn't go through for arbitrary causal models without some regularity conditions that the paper doesn't mention.

They introduce a discretization of the exogenous domains using a finite number of latent states that can approximate the distributions of any causal model to arbitrary accuracy. This lets them define CGP models that approximate observational and interventional distributions even with unobserved confounding, as long as treatment comes before outcome in time.

What's new is this discretization trick and the family of CGP models built on it. Prior work on confounding-robust evaluation was limited to discrete cases or needed more knowledge about the environment.

It does a good job setting up the continuous domain problem and proposing a flexible GP-based solution that could apply more broadly.

The main soft spot is in that universal property. The stress-test note is correct: to have one finite discretization work across all causal models, you'd need uniformity on things like the support of exogenous variables or continuity of the structural equations. Without those, the number of states needed could depend on the specific model and become unbounded. The abstract presents it as general, so that's a load-bearing assumption that needs clarification or restriction in the paper.

If there are proofs or experiments in the full text showing it works in practice, that would help, but the central claim seems ambitious.

This paper is for researchers working on causal effect estimation in continuous spaces with machine learning tools like GPs. Someone focused on policy evaluation under confounding might get some ideas from the CGP approach.

It deserves a serious referee because the topic matters and the proposal is distinct enough to be worth checking, though revisions on the assumptions are likely.

Referee Report

1 major / 0 minor

Summary. The manuscript claims to introduce a universal discretization of exogenous domains that approximates the observational and interventional distributions of any causal model to arbitrary accuracy using a finite number of latent states. It then develops a family of Causal Gaussian Process (CGP) models that leverage this property to approximate those distributions from confounded observations, enabling robust causal effect evaluation over continuous treatment and outcome domains while requiring only temporal ordering between treatment and outcome.

Significance. If the universal approximation property holds, the CGP framework would represent a notable contribution to causal inference by providing a nonparametric approach to continuous-domain treatment effect estimation under unobserved confounding, extending beyond existing methods limited to discrete variables or requiring strong prior knowledge. The approach could enable more flexible modeling in policy evaluation settings.

major comments (1)

[Abstract] The claim that there exists a 'universal discretization of the exogenous domains' that approximates 'the observational and interventional distributions of any causal model with arbitrary accuracy using a finite number of latent states' is load-bearing for the entire contribution. This property cannot hold for arbitrary causal models without additional uniformity conditions (e.g., compact support of all exogenous variables or uniform Lipschitz bounds on structural equations across the model class) that are not stated; without them the required resolution cannot be bounded independently of the specific model, as the discretization would need to adapt to arbitrarily differing noise supports or function moduli.
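The role of the uniformity conditions named in this comment can be seen in a standard covering argument. The following is a sketch under stated assumptions, not the paper's proof: the exogenous variable U is assumed to live in the unit cube of dimension d, the outcome mechanism f_Y(x, ·) is assumed L-Lipschitz in u with the same constant L for every x and every model in the class, and the symbols m, c(u), and K are introduced here for illustration.

```latex
% Partition [0,1]^d into m^d congruent cubes and let c(u) denote the
% centre of the cube containing u. Every point is close to its centre:
\[
  \lVert u - c(u) \rVert_2 \;\le\; \frac{\sqrt{d}}{2m}.
\]
% The Lipschitz assumption turns this into a uniform bound on the outcome:
\[
  \bigl| f_Y(x, u) - f_Y(x, c(u)) \bigr|
  \;\le\; L \,\lVert u - c(u) \rVert_2
  \;\le\; \frac{L\sqrt{d}}{2m}
  \;\le\; \varepsilon
  \quad\text{whenever}\quad
  m \;\ge\; \frac{L\sqrt{d}}{2\varepsilon}.
\]
% Hence a number of latent states depending only on (L, d, \varepsilon) suffices:
\[
  K(\varepsilon) \;=\; \left\lceil \frac{L\sqrt{d}}{2\varepsilon} \right\rceil^{d}.
\]
% Coupling U with c(U) bounds the distance between the true and the
% discretized interventional distributions:
\[
  W_1\bigl(P_x(Y), \widehat{P}_x(Y)\bigr)
  \;\le\; \mathbb{E}\,\bigl| f_Y(x, U) - f_Y(x, c(U)) \bigr|
  \;\le\; \varepsilon .
\]
```

Under these assumptions K depends on the model only through L and d. If L is not bounded across the model class, or the support of U is not compact, the same argument gives no model-independent K, which is the gap the comment points at.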

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and valuable feedback. We address the single major comment below.

Referee: [Abstract] The claim that there exists a 'universal discretization of the exogenous domains' that approximates 'the observational and interventional distributions of any causal model with arbitrary accuracy using a finite number of latent states' is load-bearing for the entire contribution. This property cannot hold for arbitrary causal models without additional uniformity conditions (e.g., compact support of all exogenous variables or uniform Lipschitz bounds on structural equations across the model class) that are not stated; without them the required resolution cannot be bounded independently of the specific model, as the discretization would need to adapt to arbitrarily differing noise supports or function moduli.

Authors: We agree that the stated claim requires regularity conditions to hold uniformly. Our discretization construction relies on the exogenous variables having compact support and the structural equations satisfying a uniform Lipschitz bound over the model class; these ensure a finite partition suffices for arbitrary accuracy independent of any particular model. These conditions are standard for obtaining uniform approximation guarantees in nonparametric settings but were not explicitly listed in the abstract. In the revision we will update the abstract, the introduction, and the formal statement of the universal discretization result to make the assumptions explicit, thereby qualifying the claim to apply to causal models satisfying them. The remainder of the CGP development and the empirical results are unaffected, as they operate within this class.

revision: yes

Circularity Check

0 steps flagged

No circularity: derivation introduces new discretization property and builds CGP models independently

The paper states a universal discretization of exogenous domains as a new property that approximates any causal model's observational and interventional distributions to arbitrary accuracy with finite latent states, then develops CGP models on this property. No equations or text reduce a prediction or result to a fitted input, self-definition, or self-citation chain by construction. The central approximation claim is presented as a foundational result rather than derived from prior fitted parameters or renamed known results within the paper. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the existence of a universal discretization property and introduces CGP as a new modeling family; no free parameters or invented entities with independent evidence are detailed in the abstract.

axioms (1)

domain assumption: A universal discretization of the exogenous domains approximates the observational and interventional distributions of any causal model with arbitrary accuracy using a finite number of latent states.
This property is invoked as the foundation for the CGP models in the abstract.

invented entities (1)

Causal Gaussian Process (CGP) models (no independent evidence)
purpose: Approximate observational and interventional distributions under unobserved confounding.
New family of models developed in the paper.


[1] [1]

Pymc: a modern, and comprehensive probabilistic programming framework in python

Oriol Abril-Pla, Virgile Andreani, Colin Carroll, Larry Dong, Christopher J Fonnesbeck, Maxim Kochurov, Ravin Kumar, Junpeng Lao, Christian C Luhmann, Osvaldo A Martin, et al. Pymc: a modern, and comprehensive probabilistic programming framework in python. PeerJ Computer Science, 9: 0 e1516, 2023

2023

[2] [2]

Angrist, G.W

J.D. Angrist, G.W. Imbens, and D.B. Rubin. Identification of causal effects using instrumental variables (with comments). Journal of the American Statistical Association, 91 0 (434): 0 444--472, 1996

1996

[3] [3]

Measure, integration & real analysis

Sheldon Axler. Measure, integration & real analysis. Springer Nature, 2020

2020

[4] [4]

Gaussian process linking functions for mind, brain, and behavior

Giwon Bahg, Daniel G Evans, Matthew Galdo, and Brandon M Turner. Gaussian process linking functions for mind, brain, and behavior. Proceedings of the National Academy of Sciences, 117 0 (47): 0 29398--29406, 2020

2020

[5] [5]

Balke and J

A. Balke and J. Pearl. Counterfactual probabilities: Computational methods, bounds, and applications. In R. Lopez de Mantaras and D. Poole, editors, Uncertainty in Artificial Intelligence 10, pages 46--54. Morgan Kaufmann, San Mateo, CA, 1994

1994

[6] [6]

Balke and J

A. Balke and J. Pearl. Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92 0 (439): 0 1172--1176, September 1997

1997

[7] [7]

Doubly robust estimation in missing data and causal inference models

Heejung Bang and James M Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61 0 (4): 0 962--973, 2005

2005

[8] [8]

E.\ Bareinboim and J. Pearl. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 113: 0 7345--7352, 2016

2016

[9] [9]

On pearl’s hierarchy and the foundations of causal inference

E Bareinboim, JD Correa, D Ibeling, and T Icard. On pearl’s hierarchy and the foundations of causal inference. ACM Special Volume in Honor of Judea Pearl, 2020. forthcoming. Also, Technical Report R-60, Causal AI Lab, Columbia University, https://causalai.net/r60.pdf https://causalai.net/r60.pdf

2020

[10] [10]

Variational inference: A review for statisticians

David M Blei, Alp Kucukelbir, and Jon D McAuliffe. Variational inference: A review for statisticians. Journal of the American statistical Association, 112 0 (518): 0 859--877, 2017

2017

[11] [11]

The international stroke trial (ist): a randomized trial of aspirin, subcutaneous heparin, both, or neither among 19435 patients with acute ischaemic stroke

A Carolei et al. The international stroke trial (ist): a randomized trial of aspirin, subcutaneous heparin, both, or neither among 19435 patients with acute ischaemic stroke. The Lancet, 349: 0 1569--1581, 1997

1997

[12] [12]

Gaussian process regression for materials and molecules

Volker L Deringer, Albert P Bart \'o k, Noam Bernstein, David M Wilkins, Michele Ceriotti, and G \'a bor Cs \'a nyi. Gaussian process regression for materials and molecules. Chemical Reviews, 121 0 (16): 0 10073--10141, 2021

2021

[13] [13]

Doubly robust policy evaluation and learning

Miroslav Dud \' k, John Langford, and Lihong Li. Doubly robust policy evaluation and learning. In Proceedings of the 28th International Conference on International Conference on Machine Learning, pages 1097--1104. Omnipress, 2011

2011

[14] [14]

Frangakis and D.B

C.E. Frangakis and D.B. Rubin. Principal stratification in causal inference. Biometrics, 1 0 (58): 0 21--29, 2002

2002

[15] [15]

Galles and J

D. Galles and J. Pearl. Axioms of causal relevance. Artificial Intelligence, 97 0 (1-2): 0 9--43, 1997

1997

[16] [16]

Stochastic relaxation, gibbs distributions, and the bayesian restoration of images

Stuart Geman and Donald Geman. Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE Transactions on pattern analysis and machine intelligence, 0 (6): 0 721--741, 1984

1984

[17] [17]

A tutorial on bayesian nonparametric models

Samuel J Gershman and David M Blei. A tutorial on bayesian nonparametric models. Journal of Mathematical Psychology, 56 0 (1): 0 1--12, 2012

2012

[18] [18]

Huang and M

Y. Huang and M. Valtorta. Pearl's calculus of intervention is complete. In R. Dechter and T.S. Richardson, editors, Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, pages 217--224. AUAI Press, Corvallis, OR, 2006

2006

[19] [19]

Confounding-robust policy improvement

Nathan Kallus and Angela Zhou. Confounding-robust policy improvement. In Advances in neural information processing systems, pages 9269--9279, 2018

2018

[20] [20]

Auto-Encoding Variational Bayes

Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013

[21] [21]

Causalgan: Learning causal implicit generative models with adversarial training

Murat Kocaoglu, Christopher Snyder, Alexandros G Dimakis, and Sriram Vishwanath. Causalgan: Learning causal implicit generative models with adversarial training. In International Conference on Learning Representations, 2018

2018

[22] [22]

Automatic differentiation variational inference

Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M Blei. Automatic differentiation variational inference. Journal of machine learning research, 18 0 (14): 0 1--45, 2017

2017

[23] [23]

Toward minimax off-policy value estimation

Lihong Li, Remi Munos, and Csaba Szepesvari. Toward minimax off-policy value estimation. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, May 2015. URL https://www.microsoft.com/en-us/research/publication/toward-minimax-off-policy-value-estimation/

2015

[24] [24]

Stein variational gradient descent: A general purpose bayesian inference algorithm

Qiang Liu and Dilin Wang. Stein variational gradient descent: A general purpose bayesian inference algorithm. Advances in neural information processing systems, 29, 2016

2016

[25]

Charles F Manski and John V Pepper. Monotone instrumental variables with an application to the returns to schooling, 1998

[26]

Ian C McDowell, Dinesh Manandhar, Christopher M Vockley, Amy K Schmid, Timothy E Reddy, and Barbara E Engelhardt. Clustering gene expression time series data using an infinite Gaussian process mixture model. PLoS Computational Biology, 14(1):e1005896, 2018

[27]

Charles A Micchelli, Yuesheng Xu, and Haizhang Zhang. Universal kernels. Journal of Machine Learning Research, 7(12), 2006

[28]

Rémi Munos, Tom Stepleton, Anna Harutyunyan, and Marc Bellemare. Safe and efficient off-policy reinforcement learning. In Advances in Neural Information Processing Systems, 2016

[29]

Arash Nasr-Esfahany, Mohammad Alizadeh, and Devavrat Shah. Counterfactual identifiability of bijective causal models. In International Conference on Machine Learning, pages 25733--25754. PMLR, 2023

[30]

J. Pearl. Causal diagrams for empirical research. Biometrika, 82(4):669--710, 1995

[31]

J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, NY, 2000. 2nd edition, 2009

[32]

Geoffrey Roeder, Yuhuai Wu, and David K Duvenaud. Sticking the landing: Simple, lower-variance gradient estimators for variational inference. Advances in Neural Information Processing Systems, 30, 2017

[33]

P. Rosenbaum and D. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70:41--55, 1983

[34]

Gideon Schwarz. Estimating the dimension of a model. The Annals of Statistics, pages 461--464, 1978

[35]

Jayaram Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, pages 639--650, 1994

[36]

Arman Shafieloo, Alex G Kim, and Eric V Linder. Gaussian process cosmography. Physical Review D: Particles, Fields, Gravitation, and Cosmology, 85(12):123530, 2012

[37]

I. Shpitser and J. Pearl. Identification of joint interventional distributions in recursive semi-Markovian causal models. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, pages 1219--1226, 2006

[38]

P. Spirtes, C.N. Glymour, and R. Scheines. Causation, Prediction, and Search. MIT Press, Cambridge, MA, 2nd edition, 2001

[39]

Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2000

[40]

Elias M Stein and Rami Shakarchi. Real analysis. Princeton University Press, 2009

[41]

Philip Thomas and Emma Brunskill. Data-efficient off-policy policy evaluation for reinforcement learning. In International Conference on Machine Learning, pages 2139--2148, 2016

[42]

Sumio Watanabe. A widely applicable Bayesian information criterion. Journal of Machine Learning Research, 14(1):867--897, 2013

[43]

Christopher KI Williams. Prediction with Gaussian processes: From linear regression to linear prediction and beyond. In Learning in Graphical Models, pages 599--621. Springer, 1998

[44]

P.G. Wright. The Tariff on Animal and Vegetable Oils. The MacMillan Company, New York, NY, 1928

[45]

Sewall Wright. The method of path coefficients. The Annals of Mathematical Statistics, 5(3):161--215, 1934

[46]

Kevin Xia, Kai-Zhan Lee, Yoshua Bengio, and Elias Bareinboim. The causal-neural connection: Expressiveness, learnability, and inference. Advances in Neural Information Processing Systems, 34:10823--10836, 2021

[47]

Kevin Xia, Yushu Pan, and Elias Bareinboim. Neural causal models for counterfactual identification and estimation. In Proceedings of the Eleventh International Conference on Learning Representations, 2022

[48]

Junzhe Zhang and Elias Bareinboim. Bounding causal effects on continuous outcomes. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, 2021

[49]

Junzhe Zhang, Jin Tian, and Elias Bareinboim. Partial counterfactual identification from observational and experimental data. In International Conference on Machine Learning, pages 26548--26558. PMLR, 2022