Causal Variational Deep Embedding: A Family of Interventional Generators for Confounded Images

Jingyuan Chen; Junzhe Zhang; Kangrui Ruan

arxiv: 2606.21806 · v1 · pith:PGPMFGBFnew · submitted 2026-06-19 · 💻 cs.LG


Pith reviewed 2026-06-26 14:07 UTC · model grok-4.3

classification 💻 cs.LG

keywords: causal generative models, variational autoencoders, confounded images, interventional distributions, structural causal models, mixture models, entropy regularization


The pith

A canonical class of augmented structural causal models, in which the unobserved confounder is a discrete latent cluster, is dense in Wasserstein distance within the class of models compatible with a given causal diagram. This lets a mixture VAE trace a family of interventional image generators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that deep generative models need not commit to a single interventional distribution when an unobserved confounder makes image data ambiguous. Instead, a canonical form collapses the confounder into a discrete latent cluster while absorbing all other variation into independent noise terms, and this form can approximate any diagram-compatible causal model arbitrarily closely. The authors instantiate the form as a mixture variational autoencoder and add an entropy penalty on the cluster posterior whose strength varies the implied causal effect. A sympathetic reader would care because the method produces multiple plausible interventional images from the same observational training set without requiring assumptions strong enough to identify a unique mechanism.

Core claim

We prove that this canonical class is dense, in both observational and interventional Wasserstein distance, in the class of augmented SCMs compatible with a given causal diagram, and instantiate it as a mixture variational autoencoder whose cluster variable plays the role of the canonical confounder. An entropy regularizer with weight γ on the cluster posterior then traces a family of candidate causal effects that fit the observational data to comparable likelihood while spanning the feasible region.
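The entropy term can be made concrete with a small sketch. It assumes the cluster posterior is a categorical distribution q(C | I) obtained by a softmax over K logits, and that the penalty is added to the negative ELBO as γ times the batch-averaged entropy of that posterior. The paper's exact sign convention and averaging are not given here, so treat both as assumptions; `cluster_entropy` and `regularized_loss` are hypothetical names.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one row of cluster logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical cluster posterior q(C | I)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_loss(
    neg_elbo: float, batch_logits: Sequence[Sequence[float]], gamma: float
) -> float:
    """Hypothetical objective: negative ELBO + gamma * mean posterior entropy.

    Assumption: a larger gamma pushes q(C | I) toward sharper (lower-entropy)
    cluster assignments; the paper's exact form may differ.
    """
    entropies = [cluster_entropy(softmax(row)) for row in batch_logits]
    return neg_elbo + gamma * sum(entropies) / len(entropies)
```

With uniform logits `[0.0, 0.0]` the entropy is log 2, its maximum for two clusters, so `regularized_loss(100.0, [[0.0, 0.0]], 2.0)` adds 2 log 2 to the loss. Confident logits such as `[10.0, -10.0]` add almost nothing. Sweeping `gamma` therefore trades reconstruction fit against how decisively each image is assigned to a cluster.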

What carries the argument

The canonical augmented SCM whose unobserved confounder is represented as a discrete latent cluster of bounded support, instantiated as a mixture variational autoencoder with entropy regularization on the cluster posterior.
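To see why a discrete cluster can play the role of the confounder, consider a toy version of the Color-MNIST setup with the images dropped. A binary cluster C drives both the treatment X and the attribute Y, and Y does not depend on X at all. All parameter values below are illustrative assumptions, not the paper's settings. Enumerating C exactly shows that the observational conditional P(Y=1 | X=1) differs from the interventional P(Y=1 | do(X=1)). The latter weights clusters by their prior instead of their posterior given X.

```python
# Toy canonical SCM (illustrative parameters, not taken from the paper):
#   C ~ Bernoulli(PI_C)                 discrete confounder
#   X | C ~ Bernoulli(P_X[c])           treatment (e.g. digit)
#   Y | X, C ~ Bernoulli(P_Y[x][c])     attribute (e.g. background colour)
PI_C = 0.5
P_X = [0.1, 0.9]                 # P(X=1 | C=c)
P_Y = [[0.1, 0.9], [0.1, 0.9]]   # P(Y=1 | X=x, C=c); identical rows => no X -> Y effect


def prior(c: int) -> float:
    return PI_C if c == 1 else 1.0 - PI_C


def p_x_given_c(x: int, c: int) -> float:
    return P_X[c] if x == 1 else 1.0 - P_X[c]


def observational(x: int) -> float:
    """P(Y=1 | X=x): clusters are weighted by the posterior P(C=c | X=x)."""
    joint = [prior(c) * p_x_given_c(x, c) for c in (0, 1)]
    z = sum(joint)
    return sum(joint[c] / z * P_Y[x][c] for c in (0, 1))


def interventional(x: int) -> float:
    """P(Y=1 | do(X=x)): X's equation is cut, so clusters keep their prior weights."""
    return sum(prior(c) * P_Y[x][c] for c in (0, 1))


print(round(observational(1), 4), round(interventional(1), 4))  # 0.82 0.5
```

The gap between 0.82 and 0.5 is entirely spurious, produced by C. A generator that only reproduces the observational distribution would inherit it.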

If this is right

The mixture VAE produces diverse interventional samples on image benchmarks.
The generated images achieve improved FID when measured against an unconfounded reference.
Varying the entropy weight γ yields a family of candidate causal effects that fit the observational data to comparable likelihood.
The density result implies that the canonical form can stand in for any diagram-compatible model to arbitrary precision.
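For reference, FID is the Fréchet distance between Gaussians fitted to Inception-network features of two image sets, and lower is better. Under our reading of "against an unconfounded reference", the reference set is drawn from the unconfounded distribution P(I | do(X = x)) and the generated set comes from the model's interventional sampler:

```latex
\mathrm{FID} = \lVert \mu_{\mathrm{ref}} - \mu_{\mathrm{gen}} \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_{\mathrm{ref}} + \Sigma_{\mathrm{gen}}
  - 2\,(\Sigma_{\mathrm{ref}} \Sigma_{\mathrm{gen}})^{1/2} \right)
```

Here μ and Σ are the mean and covariance of the feature embeddings of each set.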

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same canonical reduction might be applied to non-image domains where confounders produce similar ambiguity in generative models.
The regularization parameter γ offers a tunable control for the strength of the implied causal mechanism.
Downstream tasks such as data augmentation or fairness auditing could use the spanned family to quantify sensitivity to confounding.
Empirical checks on real confounded image pairs with known interventions would test whether the generated family brackets ground-truth effects.

Load-bearing premise

The unobserved confounder can be represented without loss of generality as a discrete latent cluster of bounded support, with all continuous variation absorbed into independent noise terms.
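A one-dimensional toy illustrates the kind of approximation this premise relies on, although it is not the paper's proof. Assume a continuous confounder U ~ Uniform(0, 1) and an outcome mechanism with P(Y = 1 | do(X = 1), U = u) = u². Both are hypothetical. Replacing U by a K-valued cluster placed at the bin midpoints gives an interventional probability of 1/3 − 1/(12K²), which approaches the true value 1/3 as K grows.

```python
def discretized_effect(k: int) -> float:
    """P(Y=1 | do(X=1)) when U ~ Uniform(0, 1) is replaced by k equally likely
    clusters located at the bin midpoints (i + 0.5) / k.

    Hypothetical mechanism: P(Y=1 | do(X=1), U=u) = u ** 2, so the exact
    continuous answer is E[U^2] = 1/3.
    """
    return sum(((i + 0.5) / k) ** 2 for i in range(k)) / k


for k in (1, 2, 5, 10, 50):
    err = abs(discretized_effect(k) - 1.0 / 3.0)
    print(f"K={k:>3}  estimate={discretized_effect(k):.5f}  |error|={err:.2e}")
```

The error shrinks quadratically in K here only because the toy mechanism is smooth. The referee's concern about non-separable mechanisms is exactly the question of whether such convergence holds in general.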

What would settle it

Construct a synthetic image dataset from a known augmented SCM with a continuous confounder, train the model, and check whether the interventional distributions obtained by varying γ include or approach the true interventional distribution in Wasserstein distance.
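This check needs a distance between the true interventional distribution and each member of the γ-family. One option is a one-dimensional summary statistic, such as a scalar attribute score computed per generated image; this is an assumption, since the paper works with full images. For such a statistic, the Wasserstein-1 distance between two equal-size empirical samples is the mean absolute difference of their sorted values. `w1_empirical` and `closest_gamma` are hypothetical helpers.

```python
from typing import Mapping, Sequence


def w1_empirical(a: Sequence[float], b: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1-D empirical distributions."""
    if len(a) != len(b) or not a:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def closest_gamma(
    truth: Sequence[float], family: Mapping[float, Sequence[float]]
) -> tuple[float, float]:
    """Return (gamma, distance) for the family member nearest the true sample."""
    return min(((g, w1_empirical(truth, s)) for g, s in family.items()), key=lambda t: t[1])
```

In the proposed experiment, `family` would map each γ to samples drawn from the model under do(X = x), and `truth` would hold samples from the known generating SCM. A small minimum distance that shrinks with more data would support the claim.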

Figures

Figures reproduced from arXiv: 2606.21806 by Jingyuan Chen, Junzhe Zhang, Kangrui Ruan.

**Figure 1.** Estimated P(Y = 1 | do(X = 1)) on confounded ColorMNIST. Dashed lines indicate Manski's bound [22] identified from observational data. Example 1 (Confounded Color-MNIST). Consider a binary ColorMNIST in which digit X ∈ {0, 1} and background color Y ∈ {green, blue} are linked through a binary confounder U in the training set (Fig. 2a), but are independent in the unconfounded distribution we would like the m…
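The dashed lines in Figure 1 are Manski's no-assumption (natural) bounds. For binary X and Y they follow from the observational joint alone: P(Y=1 | do(X=x)) lies in [P(X=x, Y=1), P(X=x, Y=1) + 1 − P(X=x)]. This holds because the outcome under X = x is unobserved for units with X ≠ x and could take either value. The joint used below is an illustrative toy, not the paper's data.

```python
from typing import Mapping, Tuple

Joint = Mapping[Tuple[int, int], float]  # (x, y) -> P(X=x, Y=y)


def manski_bounds(joint: Joint, x: int) -> Tuple[float, float]:
    """Natural bounds on P(Y=1 | do(X=x)) for binary X and Y from observational data."""
    p_x = joint[(x, 0)] + joint[(x, 1)]
    lower = joint[(x, 1)]              # all unobserved outcomes under X=x are Y=0
    upper = joint[(x, 1)] + 1.0 - p_x  # all unobserved outcomes under X=x are Y=1
    return lower, upper


# Illustrative confounded joint (toy values, not from the paper).
toy = {(0, 0): 0.41, (0, 1): 0.09, (1, 0): 0.09, (1, 1): 0.41}
print(manski_bounds(toy, 1))
```

The interval always has width 1 − P(X = x), so without further assumptions the feasible region cannot shrink below that.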

**Figure 3.** (a) An ASCM with treatment X, pre-treatment Z, post-treatment attribute Y, and image I. (b) Submodel induced by do(X=x), in which the structural equation for X is replaced by the constant X ← x. We use capital letters to denote variables (X), small letters for their values (x), and Ω_X for their domains. For a set X, |X| denotes its cardinality. The probability distribution over variables X is denoted by …
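Figure 3(b) describes the submodel induced by do(X = x). Only the structural equation for X is replaced by a constant; every other mechanism and the exogenous noise distribution stay untouched. Below is a minimal sketch of that operation on a dictionary of structural equations. The toy equations for U, Z, X and Y are hypothetical and omit the image I.

```python
import random
from typing import Callable, Dict

Equation = Callable[[Dict[str, float], random.Random], float]


def toy_scm() -> Dict[str, Equation]:
    """Hypothetical structural equations, listed in a topological order."""
    return {
        "U": lambda v, r: float(r.random() < 0.5),                       # confounder
        "Z": lambda v, r: r.gauss(0.0, 1.0),                             # pre-treatment
        "X": lambda v, r: float(r.random() < (0.9 if v["U"] else 0.1)),  # treatment
        "Y": lambda v, r: float(r.random() < (0.9 if v["U"] else 0.1)),  # depends on U only
    }


def do(scm: Dict[str, Equation], var: str, value: float) -> Dict[str, Equation]:
    """Return the submodel: copy the SCM and replace var's equation by a constant."""
    sub = dict(scm)
    sub[var] = lambda v, r: value
    return sub


def sample(scm: Dict[str, Equation], n: int, seed: int = 0) -> list:
    """Ancestral sampling; dicts preserve insertion (topological) order."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        v: Dict[str, float] = {}
        for name, eq in scm.items():
            v[name] = eq(v, rng)
        rows.append(v)
    return rows
```

Sampling from `do(toy_scm(), "X", 1.0)` yields X = 1 in every row, while U, Z and Y keep their original mechanisms. Because Y depends only on U here, its mean stays near 0.5 regardless of the intervention.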

**Figure 4.** CAUVADE: (a) decoder of (4), where the discrete cluster C instantiates the canonical confounder of Def. 2.1 and the continuous Z bundles the pre-treatment attribute with independent noises; (b) encoder network q_ϕ(Z, C | I) = q_ϕ(Z | I) q_ϕ(C | Z). CAUVADE is itself an instance of CASCM. The starting point is Variational Deep Embedding (VaDE) [12], which models the latent space of a VAE as a Gaussian mixture…
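In VaDE [12], on which the encoder in Figure 4 builds, the cluster posterior for a latent code z is computed from the Gaussian-mixture prior as responsibilities: q(C = c | z) ∝ π_c N(z; μ_c, diag σ_c²). The sketch below assumes diagonal covariances and takes the mixture parameters as given. In CauVaDE these would be learned, and whether q_ϕ(C | Z) takes exactly this form there is an assumption.

```python
import math
from typing import Sequence


def log_normal_diag(z: Sequence[float], mu: Sequence[float], var: Sequence[float]) -> float:
    """log N(z; mu, diag(var))."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (zi - mi) ** 2 / v for zi, mi, v in zip(z, mu, var)
    )


def responsibilities(
    z: Sequence[float],
    weights: Sequence[float],
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
) -> list[float]:
    """q(C=c | z) proportional to pi_c * N(z; mu_c, diag(var_c)), normalized via log-sum-exp."""
    logits = [
        math.log(w) + log_normal_diag(z, mu, var)
        for w, mu, var in zip(weights, means, variances)
    ]
    m = max(logits)
    unnorm = [math.exp(l - m) for l in logits]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

A code equidistant from two equally weighted clusters gets a 50/50 split. A code close to one cluster mean is assigned almost entirely to it, and those responsibilities are what an entropy penalty on the cluster posterior acts on.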

**Figure 5.** CAUVADE samples on Confounded Color-MNIST under X = 1 at (a) γ = 1, (b) γ = 2, (c) γ = 10, (d) γ = 20, (e) γ = 50, and (f) γ = 100. Larger γ shifts the digit–color association, revealing distinct causal mechanisms consistent with the same observational distribution. We assess each by how well its estimated P(Y | do(X)) aligns with the feasible region partially identifiable from observational data, characte…

**Figure 6.** Estimated P(Y | do(X)) on Confounded Color-MNIST and CelebA. VAE and ANCM each collapse to a single point; CAUVADE traces the feasible region (shaded, Manski bound [22]) as γ is swept. Confounded Color-MNIST. We construct a confounded variant of Color-MNIST with digits X ∈ {0, 1} and background colors Y ∈ {green, blue} (encoded as 0, 1), introducing a binary confounder U that jointly governs both. Sample…
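Figure 6 compares the estimates obtained while sweeping γ with the Manski interval. One way to summarize that comparison numerically is the fraction of the bound interval covered by the span of the γ-indexed estimates, clipped to the interval. This is our suggestion rather than a metric the paper reports. `bound_coverage` is a hypothetical helper, and the numbers in the usage line are made up for illustration.

```python
from typing import Iterable


def bound_coverage(estimates: Iterable[float], lower: float, upper: float) -> float:
    """Fraction of [lower, upper] spanned by [min(estimates), max(estimates)] after clipping.

    Returns 1.0 when the sweep reaches both ends of the feasible interval and
    0.0 when all estimates collapse to a single point.
    """
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    pts = [min(max(e, lower), upper) for e in estimates]
    if not pts:
        return 0.0
    return (max(pts) - min(pts)) / (upper - lower)


# Made-up estimates of P(Y=1 | do(X=1)) for several gamma values (illustration only).
print(bound_coverage([0.45, 0.52, 0.70, 0.88], lower=0.41, upper=0.91))
```

Under this summary, a single-point method such as the VAE or ANCM in Figure 6 scores 0. A value near 1 corresponds to the desk editor's request that the family actually reach the boundaries of the feasible region.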

**Figure 7.** (a) Confounded CelebA samples P(I | X = 1); (b) unconfounded ground truth P(I | do(X=1)); (c) VAE; (d) ANCM; (e) CAUVADE at γ = 0; (f) CAUVADE at γ = 1; (g) CAUVADE at γ = 10; (h) CAUVADE at γ = 100. Blue circles mark images with Y = 1 (Heavy Makeup).

**Figure 8.** Confounded MIMIC-CXR-JPG samples: X = 0 (top row, no Pneumonia) and X = 1 (bottom row, Pneumonia). Confounded MIMIC-CXR-JPG. We evaluate CAUVADE on MIMIC-CXR-JPG [13, 14, 5] (377,110 radiographs; …

**Figure 9.** Generated samples from CAUVADE on Colored MNIST across varying constraint strengths: (a) γ = 1; (b) γ = 2; (c) γ = 10; (d) γ = 20; (e) γ = 50; and (f) γ = 100. …stages compute their channel dimensions as multiples of h (i.e., h, 2h, 4h, ...), so the model's representational capacity scales with image complexity. The latent dimension d_z defines the size of the global latent z and is increased on higher-reso…

**Figure 10.** Visualizing data distributions on CelebA. Samples are drawn from: (a) the confounded …

**Figure 11.** Generated samples from CAUVADE under interventions P_{X=0}(I) (top) and P_{X=1}(I) (bottom). …association, potentially propagating diagnostic bias into any downstream model trained on its output, audit, or counterfactual explanation. CAUVADE mitigates this by exposing the feasible region for partial identification rather than collapsing onto a single explanation, giving practitioners a tool to inspect the family …

Original abstract

Deep generative models reproduce the observational distribution of their training data, inheriting any spurious associations it contains. A common source is an unobserved confounder that shapes both an attribute the user wants to control at sampling time and an attribute expected to vary in response. Existing causal generative approaches resolve the resulting ambiguity by imposing structural assumptions strong enough to single out one interventional distribution; in image domains, such assumptions are rarely warranted, and the data is generally consistent with a set of distinct causal mechanisms -- a feasible region of interventional distributions. We propose CauVaDE (Causal Variational Deep Embedding), built on a canonical augmented SCM in which the unobserved confounder collapses, without loss of generality, into a discrete latent cluster of bounded support while continuous variation is absorbed into independent noises. We prove that this canonical class is dense, in both observational and interventional Wasserstein distance, in the class of augmented SCMs compatible with a given causal diagram, and instantiate it as a mixture variational autoencoder whose cluster variable plays the role of the canonical confounder. An entropy regularizer with weight $\gamma$ on the cluster posterior then traces a family of candidate causal effects that fit the observational data to comparable likelihood while spanning the feasible region. Experiments on image data benchmarks show that CauVaDE produces diverse interventional samples and improves FID against an unconfounded reference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

CauVaDE gives a concrete mixture-VAE plus entropy regularizer to span families of interventional image distributions, but the density claim rests on a discrete-cluster reduction whose scope is unclear.


The paper's core move is to replace an arbitrary unobserved confounder with a finite discrete cluster variable whose continuous effects are absorbed into independent noise terms, then prove this canonical class is dense in Wasserstein distance for both observational and interventional distributions. They instantiate the class as a mixture VAE and add an entropy term on the cluster posterior, controlled by gamma, so that varying gamma traces different candidate causal effects that all fit the observational likelihood to a similar degree.

This construction is new relative to prior causal VAE work. It supplies an explicit mechanism for producing a range of interventional distributions rather than committing to one, which matches a practical need when image data is consistent with multiple mechanisms.

The experiments report FID gains against an unconfounded reference and show diverse interventional samples, which is useful evidence that the regularizer has a real effect.

The soft spot is the reduction itself. The claim that any augmented SCM compatible with the diagram can be approximated arbitrarily closely by the discrete-cluster form is presented as without loss of generality. If the confounder acts through non-separable continuous mechanisms that cannot be isolated into additive noises, the interventional marginals may not be preserved and the spanned region would be strictly smaller than the true feasible set. The abstract asserts a proof but gives no derivation steps, so it is impossible to tell whether the argument covers the general case or only separable mechanisms. The experimental section would need to demonstrate that the generated family actually reaches the boundaries of the feasible region, not just produce visually varied outputs.

The work is aimed at people building causal generative models for vision who want to avoid strong structural assumptions. A reader working on that problem would get value from the instantiation and the regularizer idea. It deserves peer review so the density argument and the validation of the spanned region can be checked in detail.

Referee Report

2 major / 2 minor

Summary. The paper proposes CauVaDE, a mixture variational autoencoder for generating interventional images under unobserved confounding. It defines a canonical augmented SCM in which the confounder is replaced by a discrete latent cluster variable of finite support (with continuous effects absorbed into independent noises), proves that this class is dense in both observational and interventional Wasserstein distance among all augmented SCMs compatible with a given causal diagram, and uses an entropy regularizer of weight γ on the cluster posterior to trace a family of candidate interventional distributions that fit the observational data while spanning the feasible region. Experiments on image benchmarks report improved FID scores relative to an unconfounded reference.

Significance. If the density result and the w.l.o.g. reduction hold, the work supplies a principled, assumption-light method for producing diverse interventional samples in image domains where multiple causal mechanisms remain consistent with the data; the entropy-regularized family explicitly parametrizes the feasible region rather than selecting a single effect. The combination of a provable approximation result with a practical VAE instantiation is a substantive contribution to causal generative modeling.

major comments (2)

[Abstract; canonical augmented SCM definition and density proof] The central density claim rests on the assertion (abstract and canonical-SCM section) that any augmented SCM compatible with the diagram can be approximated arbitrarily closely by replacing the unobserved confounder with a discrete cluster of bounded support while pushing all remaining continuous variation into independent noises. This reduction is presented as without loss of generality, yet it is unclear whether interventional marginals are preserved when the confounder enters through non-separable continuous mechanisms; if the reduction fails for such diagrams, the spanned feasible region is strictly smaller than the true set and the density statement does not hold.
[Entropy-regularizer paragraph; density theorem statement] The entropy regularizer is claimed to trace the feasible region independently of the specific fitted γ values. Without the explicit derivation showing that the Wasserstein distance to the target interventional distributions remains controlled uniformly in γ (or an explicit statement of the conditions under which this independence holds), it is impossible to confirm that the family genuinely spans the region rather than collapsing to a single point for some diagrams.

minor comments (2)

[Experiments] The experimental section should report the precise range of γ values tested and the corresponding cluster-posterior entropy values to allow readers to verify that the reported FID improvements correspond to distinct points along the claimed family.
[Model instantiation] Notation for the mixture components and the role of the cluster variable as the canonical confounder should be introduced with an explicit diagram or equation reference in the model section to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below.


Referee: [Abstract; canonical augmented SCM definition and density proof] The central density claim rests on the assertion (abstract and canonical-SCM section) that any augmented SCM compatible with the diagram can be approximated arbitrarily closely by replacing the unobserved confounder with a discrete cluster of bounded support while pushing all remaining continuous variation into independent noises. This reduction is presented as without loss of generality, yet it is unclear whether interventional marginals are preserved when the confounder enters through non-separable continuous mechanisms; if the reduction fails for such diagrams, the spanned feasible region is strictly smaller than the true set and the density statement does not hold.

Authors: The canonical construction absorbs any continuous non-separable effects of the confounder into the independent noise terms by definition, so that the discrete cluster approximates the marginal distribution of the confounder while the SCM structure (and therefore the interventional marginals) is preserved. The density theorem then shows that the resulting class is dense in Wasserstein distance for both observational and interventional measures over all augmented SCMs compatible with the diagram. We will insert a short clarifying sentence in the canonical-SCM section of the revision to make this absorption step explicit for non-separable mechanisms. revision: yes
Referee: [Entropy-regularizer paragraph; density theorem statement] The entropy regularizer is claimed to trace the feasible region independently of the specific fitted γ values. Without the explicit derivation showing that the Wasserstein distance to the target interventional distributions remains controlled uniformly in γ (or an explicit statement of the conditions under which this independence holds), it is impossible to confirm that the family genuinely spans the region rather than collapsing to a single point for some diagrams.

Authors: Different values of γ produce posteriors with different entropies and therefore different effective cluster assignments, each of which corresponds to a distinct interventional distribution inside the feasible region while remaining observationally consistent. Because every such model lies inside the dense canonical class, the Wasserstein distance to any target interventional distribution is controlled uniformly by the density result. We agree that an explicit statement of this uniform control (or the precise conditions) is currently only implicit and will add a short derivation in the revision. revision: yes

Circularity Check

0 steps flagged

No significant circularity; the density proof and the without-loss-of-generality reduction are presented as independent results.

full rationale

The paper states it proves the canonical class (with confounder collapsed to discrete bounded cluster and continuous effects in independent noises) is dense in both observational and interventional Wasserstein distance over augmented SCMs compatible with a given diagram. This proof is invoked to justify the 'without loss of generality' reduction and the subsequent mixture-VAE instantiation. The entropy regularizer with weight γ is described as tracing a family spanning the feasible region rather than defining any target quantity by construction. No equations or steps are shown reducing a claimed prediction or result to a fitted parameter or self-citation by definition. No load-bearing self-citations, uniqueness theorems from prior author work, or ansatzes smuggled in via citation are identified. The central claim is therefore self-contained: it is stated against external benchmarks, namely Wasserstein distances.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the density of the canonical discrete-cluster class and the modeling choice that the confounder can be represented this way without loss of generality; gamma is the only explicit tunable weight mentioned.

free parameters (1)

gamma
Weight on the entropy regularizer of the cluster posterior; controls how the model spans the family of interventional distributions.

axioms (1)

domain assumption: The canonical class of augmented SCMs with discrete bounded-support confounder is dense in observational and interventional Wasserstein distance within the class compatible with a given causal diagram.
Invoked to justify that the mixture VAE instantiation covers the feasible region of causal mechanisms.

invented entities (1)

discrete latent cluster variable as canonical confounder (no independent evidence)
purpose: To collapse the unobserved confounder while preserving the ability to generate interventional samples.
Introduced as the modeling device that makes the density result usable inside a VAE.

pith-pipeline@v0.9.1-grok · 5772 in / 1485 out tokens · 30878 ms · 2026-06-26T14:07:30.764731+00:00 · methodology


Reference graph

Works this paper leans on

41 extracted references · 2 canonical work pages · 2 internal anchors

[1] A. Balke and J. Pearl. Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92(439):1171–1176, 1997.

[2] E. Bareinboim, J. D. Correa, D. Ibeling, and T. Icard. On Pearl's hierarchy and the foundations of causal inference. In H. Geffner, R. Dechter, and J. Y. Halpern, editors, Probabilistic and Causal Inference: The Works of Judea Pearl, ACM Books, pages 507–556. Association for Computing Machinery, New York, NY, USA, 2022.

[3] G. Duarte, N. Finkelstein, D. Knox, J. Mummolo, and I. Shpitser. An automated approach to causal inference in discrete settings. Journal of the American Statistical Association, 119(547):1778–1793, 2024.

[4] C. E. Frangakis and D. B. Rubin. Principal stratification in causal inference. Biometrics, 58(1):21–29, 2002.

[5] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, June 2000. PMID: 10851218.

[6] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. arXiv preprint arXiv:1406.2661, 2014.

[7] J. Hartford, G. Lewis, K. Leyton-Brown, and M. Taddy. Deep IV: A flexible approach for counterfactual prediction. In Proceedings of the 34th International Conference on Machine Learning (ICML), volume 70 of Proceedings of Machine Learning Research, pages 1414–1423. PMLR, 2017.

[8] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner. β-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (ICLR), 2017.

[9] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239, 2020.

[10] A. Hyvärinen and P. Pajunen. Nonlinear independent component analysis: Existence and uniqueness results. Neural Networks, 12(3):429–439, 1999.

[11] A. Javaloy, P. Sánchez-Martín, and I. Valera. Causal normalizing flows: From theory to practice. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), pages 58833–58864, 2023.

[12] Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou. Variational deep embedding: An unsupervised and generative approach to clustering. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), pages 1965–1972, 2017.

[13] A. E. W. Johnson, T. J. Pollard, N. R. Greenbaum, M. P. Lungren, C.-y. Deng, Y. Peng, Z. Lu, R. G. Mark, S. J. Berkowitz, and S. Horng. MIMIC-CXR-JPG - chest radiographs with structured labels. PhysioNet, March 2024. Version 2.1.0.

[14] A. E. W. Johnson, T. J. Pollard, N. R. Greenbaum, M. P. Lungren, C.-ying Deng, Y. Peng, Z. Lu, R. G. Mark, S. J. Berkowitz, and S. Horng. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs, 2019.

[15] I. Khemakhem, D. P. Kingma, R. P. Monti, and A. Hyvärinen. Variational autoencoders and nonlinear ICA: A unifying framework. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS), volume 108 of Proceedings of Machine Learning Research, pages 2207–2217. PMLR, 2020.

[16] I. Khemakhem, R. Monti, R. Leech, and A. Hyvärinen. Causal autoregressive flows. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 130 of Proceedings of Machine Learning Research, pages 3520–3528. PMLR, 2021.

[17] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations (ICLR), 2014.

[18] M. Kocaoglu, C. Snyder, A. G. Dimakis, and S. Vishwanath. CausalGAN: Learning causal implicit generative models with adversarial training. In International Conference on Learning Representations (ICLR), 2018.

[19] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015.

[20] F. Locatello, S. Bauer, M. Lucic, G. Rätsch, S. Gelly, B. Schölkopf, and O. Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 4114–4124. PMLR, 2019.

[21] I. Loshchilov and F. Hutter. Decoupled weight decay regularization, 2019.

[22] C. F. Manski. Nonparametric bounds on treatment effects. The American Economic Review, 80(2):319–323, 1990.

[23] C. F. Manski. Monotone treatment response. Econometrica, 65(6):1311–1334, 1997.

[24] W. Miao, Z. Geng, and E. J. Tchetgen Tchetgen. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika, 105(4):987–993, 2018.

[25] M. Monteiro, F. De Sousa Ribeiro, N. Pawlowski, D. C. Castro, and B. Glocker. Measuring axiomatic soundness of counterfactual image models. In The Eleventh International Conference on Learning Representations (ICLR), 2023.

[26] A. Nasr-Esfahany, M. Alizadeh, and D. Shah. Counterfactual identifiability of bijective causal models. In Proceedings of the 40th International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 25733–25754. PMLR, 2023.

[27] K. Padh, J. Zeitler, D. Watson, M. Kusner, R. Silva, and N. Kilbertus. Stochastic causal programming for bounding treatment effects. In Proceedings of the Second Conference on Causal Learning and Reasoning (CLeaR), volume 213 of Proceedings of Machine Learning Research, pages 142–176. PMLR, 2023.

[28] Y. Pan and E. Bareinboim. Counterfactual image editing. In Proceedings of the 41st International Conference on Machine Learning (ICML), volume 235 of Proceedings of Machine Learning Research, pages 39087–39101. PMLR, 2024.

[29] N. Pawlowski, D. C. Castro, and B. Glocker. Deep structural causal models for tractable counterfactual inference. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 857–869, 2020.

[30] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000.

[31] P. R. Rosenbaum. Observational Studies. Springer, New York, 2nd edition, 2002.

[32] P. Sanchez and S. A. Tsaftaris. Diffusion causal models for counterfactual estimation. In Proceedings of the First Conference on Causal Learning and Reasoning (CLeaR), volume 177 of Proceedings of Machine Learning Research, pages 647–668. PMLR, 2022.

[33] B. Schölkopf, F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Y. Bengio. Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634, 2021.

[34] E. J. Tchetgen Tchetgen, A. Ying, Y. Cui, X. Shi, and W. Miao. An introduction to proximal causal inference. Statistical Science, 39(3):375–390, 2024.

[35] T. J. VanderWeele and P. Ding. Sensitivity analysis in observational research: Introducing the E-value. Annals of Internal Medicine, 167(4):268–274, 2017.

[36] C. Villani. Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, Heidelberg, 2009.

[37] K. Xia, K.-Z. Lee, Y. Bengio, and E. Bareinboim. The causal-neural connection: Expressiveness, learnability, and inference. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.

[38] K. Xia, Y. Pan, and E. Bareinboim. Neural causal models for counterfactual identification and estimation. In The Eleventh International Conference on Learning Representations (ICLR), 2023.

[39] L. Xu, Y. Chen, S. Srinivasan, N. de Freitas, A. Doucet, and A. Gretton. Learning deep features in instrumental variable regression. In International Conference on Learning Representations (ICLR), 2021.

[40] M. Yang, F. Liu, Z. Chen, X. Shen, J. Hao, and J. Wang. CausalVAE: Disentangled representation learning via neural structural causal models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9593–9602, 2021.

[41] J. Zhang, J. Tian, and E. Bareinboim. Partial counterfactual identification from observational and experimental data. In Proceedings of the 39th International Conference on Machine Learning (ICML), volume 162 of Proceedings of Machine Learning Research, pages 26548–26558. PMLR, 2022.

A Related Work. Causal generative models. A growing body of work injects cau...

[1] [1]

Balke and J

A. Balke and J. Pearl. Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92(439):1171–1176, 1997

1997

[2] [2]

Bareinboim, J

E. Bareinboim, J. D. Correa, D. Ibeling, and T. Icard. On Pearl’s hierarchy and the foundations of causal inference. In H. Geffner, R. Dechter, and J. Y . Halpern, editors,Probabilistic and Causal Inference: The Works of Judea Pearl, ACM Books, pages 507–556. Association for Computing Machinery, New York, NY , USA, 2022

2022

[3] [3]

Duarte, N

G. Duarte, N. Finkelstein, D. Knox, J. Mummolo, and I. Shpitser. An automated approach to causal inference in discrete settings.Journal of the American Statistical Association, 119(547):1778–1793, 2024

2024

[4] [4]

C. E. Frangakis and D. B. Rubin. Principal stratification in causal inference.Biometrics, 58(1):21–29, 2002

2002

[5] [5]

A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, June 2000. PMID: 10851218

2000

[6] [6]

I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y . Bengio. Generative adversarial networks.arXiv preprint arXiv:1406.2661, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[7] [7]

Hartford, G

J. Hartford, G. Lewis, K. Leyton-Brown, and M. Taddy. Deep IV: A flexible approach for counterfactual prediction. InProceedings of the 34th International Conference on Machine Learning (ICML), volume 70 ofProceedings of Machine Learning Research, pages 1414–1423. PMLR, 2017

2017

[8] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner. β-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (ICLR), 2017.

[9] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239, 2020.

[10] A. Hyvärinen and P. Pajunen. Nonlinear independent component analysis: Existence and uniqueness results. Neural Networks, 12(3):429–439, 1999.

[11] A. Javaloy, P. Sánchez-Martín, and I. Valera. Causal normalizing flows: From theory to practice. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), pages 58833–58864, 2023.

[12] Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou. Variational deep embedding: An unsupervised and generative approach to clustering. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), pages 1965–1972, 2017.

[13] A. E. W. Johnson, T. J. Pollard, N. R. Greenbaum, M. P. Lungren, C.-y. Deng, Y. Peng, Z. Lu, R. G. Mark, S. J. Berkowitz, and S. Horng. MIMIC-CXR-JPG - chest radiographs with structured labels. PhysioNet, March 2024. Version 2.1.0.

[14] A. E. W. Johnson, T. J. Pollard, N. R. Greenbaum, M. P. Lungren, C.-y. Deng, Y. Peng, Z. Lu, R. G. Mark, S. J. Berkowitz, and S. Horng. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs, 2019.

[15] I. Khemakhem, D. P. Kingma, R. P. Monti, and A. Hyvärinen. Variational autoencoders and nonlinear ICA: A unifying framework. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS), volume 108 of Proceedings of Machine Learning Research, pages 2207–2217. PMLR, 2020.

[16] I. Khemakhem, R. Monti, R. Leech, and A. Hyvärinen. Causal autoregressive flows. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 130 of Proceedings of Machine Learning Research, pages 3520–3528. PMLR, 2021.

[17] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations (ICLR), 2014.

[18] M. Kocaoglu, C. Snyder, A. G. Dimakis, and S. Vishwanath. CausalGAN: Learning causal implicit generative models with adversarial training. In International Conference on Learning Representations (ICLR), 2018.

[19] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015.

[20] F. Locatello, S. Bauer, M. Lucic, G. Rätsch, S. Gelly, B. Schölkopf, and O. Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 4114–4124. PMLR, 2019.

[21] I. Loshchilov and F. Hutter. Decoupled weight decay regularization, 2019.

[22] C. F. Manski. Nonparametric bounds on treatment effects. The American Economic Review, 80(2):319–323, 1990.

[23] C. F. Manski. Monotone treatment response. Econometrica, 65(6):1311–1334, 1997.

[24] W. Miao, Z. Geng, and E. J. Tchetgen Tchetgen. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika, 105(4):987–993, 2018.

[25] M. Monteiro, F. De Sousa Ribeiro, N. Pawlowski, D. C. Castro, and B. Glocker. Measuring axiomatic soundness of counterfactual image models. In The Eleventh International Conference on Learning Representations (ICLR), 2023.

[26] A. Nasr-Esfahany, M. Alizadeh, and D. Shah. Counterfactual identifiability of bijective causal models. In Proceedings of the 40th International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 25733–25754. PMLR, 2023.

[27] K. Padh, J. Zeitler, D. Watson, M. Kusner, R. Silva, and N. Kilbertus. Stochastic causal programming for bounding treatment effects. In Proceedings of the Second Conference on Causal Learning and Reasoning (CLeaR), volume 213 of Proceedings of Machine Learning Research, pages 142–176. PMLR, 2023.

[28] Y. Pan and E. Bareinboim. Counterfactual image editing. In Proceedings of the 41st International Conference on Machine Learning (ICML), volume 235 of Proceedings of Machine Learning Research, pages 39087–39101. PMLR, 2024.

[29] N. Pawlowski, D. C. Castro, and B. Glocker. Deep structural causal models for tractable counterfactual inference. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 857–869, 2020.

[30] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000.

[31] P. R. Rosenbaum. Observational Studies. Springer, New York, 2nd edition, 2002.

[32] P. Sanchez and S. A. Tsaftaris. Diffusion causal models for counterfactual estimation. In Proceedings of the First Conference on Causal Learning and Reasoning (CLeaR), volume 177 of Proceedings of Machine Learning Research, pages 647–668. PMLR, 2022.

[33] B. Schölkopf, F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Y. Bengio. Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634, 2021.

[34] E. J. Tchetgen Tchetgen, A. Ying, Y. Cui, X. Shi, and W. Miao. An introduction to proximal causal inference. Statistical Science, 39(3):375–390, 2024.

[35] T. J. VanderWeele and P. Ding. Sensitivity analysis in observational research: Introducing the E-value. Annals of Internal Medicine, 167(4):268–274, 2017.

[36] C. Villani. Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, Heidelberg, 2009.

[37] K. Xia, K.-Z. Lee, Y. Bengio, and E. Bareinboim. The causal-neural connection: Expressiveness, learnability, and inference. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.

[38] K. Xia, Y. Pan, and E. Bareinboim. Neural causal models for counterfactual identification and estimation. In The Eleventh International Conference on Learning Representations (ICLR), 2023.

[39] L. Xu, Y. Chen, S. Srinivasan, N. de Freitas, A. Doucet, and A. Gretton. Learning deep features in instrumental variable regression. In International Conference on Learning Representations (ICLR), 2021.

[40] M. Yang, F. Liu, Z. Chen, X. Shen, J. Hao, and J. Wang. CausalVAE: Disentangled representation learning via neural structural causal models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9593–9602, 2021.

[41] J. Zhang, J. Tian, and E. Bareinboim. Partial counterfactual identification from observational and experimental data. In Proceedings of the 39th International Conference on Machine Learning (ICML), volume 162 of Proceedings of Machine Learning Research, pages 26548–26558. PMLR, 2022.

A Related Work

Causal generative models. A growing body of work injects cau...