arxiv: 2606.21806 · v1 · pith:PGPMFGBFnew · submitted 2026-06-19 · 💻 cs.LG

Causal Variational Deep Embedding: A Family of Interventional Generators for Confounded Images

Pith reviewed 2026-06-26 14:07 UTC · model grok-4.3

classification 💻 cs.LG
keywords causal generative models · variational autoencoders · confounded images · interventional distributions · structural causal models · mixture models · entropy regularization

The pith

A canonical class of augmented causal models with discrete cluster confounders is dense in Wasserstein distance for any diagram-compatible mechanism, allowing a mixture VAE to trace families of interventional image generators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that deep generative models need not commit to one interventional distribution when an unobserved confounder creates ambiguity in image data. Instead, a canonical form collapses the confounder into a discrete latent cluster while absorbing other variation into noises, and this form can approximate any compatible causal structure arbitrarily closely. The authors instantiate the form as a mixture variational autoencoder and add an entropy penalty on the cluster posterior whose strength varies the implied causal effect. A sympathetic reader would care because the method produces multiple plausible counterfactual images from the same observational training set without requiring assumptions strong enough to identify a unique mechanism.

Core claim

We prove that this canonical class is dense, in both observational and interventional Wasserstein distance, in the class of augmented SCMs compatible with a given causal diagram, and instantiate it as a mixture variational autoencoder whose cluster variable plays the role of the canonical confounder. An entropy regularizer with weight γ on the cluster posterior then traces a family of candidate causal effects that fit the observational data to comparable likelihood while spanning the feasible region.
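
To make the role of γ concrete, here is a minimal sketch of an entropy penalty on a categorical cluster posterior. The function names, the additive form, and the sign of the penalty are assumptions for illustration only; the text above says just that the regularizer carries weight γ, not how it enters the objective.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def regularized_loss(neg_elbo, cluster_posterior, gamma):
    """Hypothetical objective: negative ELBO plus gamma times posterior entropy.

    The sign and exact form of the penalty are assumptions, not the paper's.
    """
    return neg_elbo + gamma * entropy(cluster_posterior)

# Two candidate posteriors over K = 2 clusters for the same image.
sharp = [0.99, 0.01]    # nearly deterministic cluster assignment
diffuse = [0.5, 0.5]    # maximally uncertain assignment

for gamma in (0.0, 1.0, 10.0):
    # With gamma = 0 both posteriors cost the same; larger gamma
    # increasingly favors the sharper assignment under this sign choice.
    print(gamma,
          round(regularized_loss(1.0, sharp, gamma), 4),
          round(regularized_loss(1.0, diffuse, gamma), 4))
```

Under this assumed sign, raising γ pushes the posterior toward harder cluster assignments; a penalty with the opposite sign would do the reverse.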

What carries the argument

The canonical augmented SCM whose unobserved confounder is represented as a discrete latent cluster of bounded support, instantiated as a mixture variational autoencoder with entropy regularization on the cluster posterior.
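
As a rough illustration of how a discrete cluster can sit inside a mixture VAE, the sketch below computes a cluster posterior as Gaussian-mixture responsibilities in latent space, in the style of VaDE. The prior weights, means, and variances are hypothetical, and whether CauVaDE computes its cluster posterior exactly this way is an assumption.

```python
import math

def log_gaussian(z, mean, var):
    """Log density of a diagonal Gaussian evaluated at vector z."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (zi - m) ** 2 / v)
        for zi, m, v in zip(z, mean, var)
    )

def cluster_posterior(z, weights, means, variances):
    """Responsibilities q(C = c | z), proportional to pi_c * N(z; mu_c, var_c)."""
    log_joint = [
        math.log(w) + log_gaussian(z, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(log_joint)  # subtract the max for numerical stability
    unnorm = [math.exp(l - top) for l in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-cluster prior in a 2-D latent space.
weights = [0.5, 0.5]
means = [[-2.0, 0.0], [2.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

# A latent code at the first cluster mean is assigned almost entirely to it;
# a code midway between the two means is split evenly.
print(cluster_posterior([-2.0, 0.0], weights, means, variances))
print(cluster_posterior([0.0, 0.0], weights, means, variances))
```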

If this is right

  • The mixture VAE produces diverse interventional samples on image benchmarks.
  • The generated images achieve improved FID scores relative to an unconfounded reference.
  • Varying the entropy weight γ yields a continuous family of causal effects that all match observational likelihood.
  • The density result implies that the canonical form can stand in for any diagram-compatible model to arbitrary precision.
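
The claim that several causal effects can match the same observational data can be checked on a toy case. The sketch below defines two hypothetical models with a binary discrete confounder C; both are illustrative constructions, not models from the paper. They produce the same joint P(X, Y) but disagree on P(Y = 1 | do(X = 1)).

```python
from collections import defaultdict

def observational(p_c, f_x, f_y):
    """Joint P(X, Y) when X follows its own mechanism X = f_x(C)."""
    joint = defaultdict(float)
    for c, pc in enumerate(p_c):
        x = f_x(c)
        joint[(x, f_y(x, c))] += pc
    return dict(joint)

def interventional(p_c, f_y, x):
    """P(Y = 1 | do(X = x)): X is clamped to x while C keeps its prior."""
    return sum(pc for c, pc in enumerate(p_c) if f_y(x, c) == 1)

p_c = [0.5, 0.5]  # prior over the binary cluster C

# Model A: Y copies X, so the X-Y association is fully causal.
model_a = (lambda c: c, lambda x, c: x)
# Model B: Y copies C, so the X-Y association is fully confounded.
model_b = (lambda c: c, lambda x, c: c)

obs_a = observational(p_c, *model_a)
obs_b = observational(p_c, *model_b)
print(obs_a == obs_b)                       # identical observational joints
print(interventional(p_c, model_a[1], 1))   # effect under model A
print(interventional(p_c, model_b[1], 1))   # effect under model B
```

Here the two effects are 1.0 and 0.5, so no amount of observational data distinguishes them; a family of generators indexed by γ is one way to expose that spread.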

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same canonical reduction might be applied to non-image domains where confounders produce similar ambiguity in generative models.
  • The regularization parameter γ offers a tunable control for the strength of the implied causal mechanism.
  • Downstream tasks such as data augmentation or fairness auditing could use the spanned family to quantify sensitivity to confounding.
  • Empirical checks on real confounded image pairs with known interventions would test whether the generated family brackets ground-truth effects.

Load-bearing premise

The unobserved confounder can be represented without loss of generality as a discrete latent cluster of bounded support, with all continuous variation absorbed into independent noise terms.

What would settle it

Construct a synthetic image dataset from a known augmented SCM with a continuous confounder, train the model, and check whether the interventional distributions obtained by varying γ include or approach the true interventional distribution in Wasserstein distance.
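
A minimal sketch of the comparison step in such a test, assuming each model is summarized by samples of a scalar attribute (for example, a binary Y read off generated images) rather than by full images. The sample values and γ settings below are made up for illustration, and `closest_gamma` is a hypothetical helper.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # For equal-size samples, the optimal coupling matches sorted values.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def closest_gamma(true_samples, samples_by_gamma):
    """Return (gamma, distance) for the family member nearest to the truth."""
    distances = {
        g: wasserstein_1d(true_samples, s) for g, s in samples_by_gamma.items()
    }
    best = min(distances, key=distances.get)
    return best, distances[best]

# Made-up attribute samples under do(X = 1); not results from the paper.
truth = [0, 0, 1, 1, 1, 1, 1, 1]
family = {
    1: [1, 1, 1, 1, 1, 1, 1, 1],
    10: [0, 0, 0, 1, 1, 1, 1, 1],
    100: [0, 0, 0, 0, 0, 1, 1, 1],
}

print(closest_gamma(truth, family))
```

If the smallest distance stays large across the whole sweep of γ, the family does not approach the true interventional distribution on that attribute. Comparing full images would need a multivariate distance, which this sketch does not attempt.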

Figures

Figures reproduced from arXiv: 2606.21806 by Jingyuan Chen, Junzhe Zhang, Kangrui Ruan.

Figure 2
Figure 2: Samples from (a) confounded and (b) unconfounded Color-MNIST; and generated by …
Figure 1
Figure 1: Estimated P(Y=1 | do(X=1)) on confounded Color-MNIST. Dashed lines indicate Manski's bound [22] identified from observational data. Example 1 (Confounded Color-MNIST). Consider a binary Color-MNIST in which digit X ∈ {0, 1} and background color Y ∈ {green, blue} are linked through a binary confounder U in the training set (Fig. 2a), but are independent in the unconfounded distribution we would like the m…
Figure 3
Figure 3: (a) An ASCM with treatment X, pre-treatment Z, post-treatment attribute Y, and image I. (b) Submodel induced by do(X=x), in which the structural equation for X is replaced by the constant X ← x. We use capital letters to denote variables (X), small letters for their values (x), and Ω_X for their domains. For a set X, |X| denotes its cardinality. The probability distribution over variables X is denoted by …
Figure 4
Figure 4: CauVaDE: (a) decoder of (4), where the discrete cluster C instantiates the canonical confounder of Def. 2.1 and the continuous Z bundles the pre-treatment attribute with independent noises; (b) encoder network q_ϕ(Z, C | I) = q_ϕ(Z | I) q_ϕ(C | Z). CauVaDE is itself an instance of CASCM. The starting point is Variational Deep Embedding (VaDE) [12], which models the latent space of a VAE as a Gaussian mixture…
Figure 5
Figure 5: CauVaDE samples on Confounded Color-MNIST under X = 1 at (a) γ = 1, (b) γ = 2, (c) γ = 10, (d) γ = 20, (e) γ = 50, and (f) γ = 100. Larger γ shifts the digit–color association, revealing distinct causal mechanisms consistent with the same observational distribution. We assess each by how well its estimated P(Y | do(X)) aligns with the feasible region partially identifiable from observational data, characte…
Figure 6
Figure 6: Estimated P(Y | do(X)) on Confounded Color-MNIST and CelebA. VAE and ANCM each collapse to a single point; CauVaDE traces the feasible region (shaded, Manski bound [22]) as γ is swept. Confounded Color-MNIST. We construct a confounded variant of Color-MNIST with digits X ∈ {0, 1} and background colors Y ∈ {green, blue} (encoded as 0, 1), introducing a binary confounder U that jointly governs both. Sample…
Figure 7
Figure 7: (a) Confounded CelebA samples P(I | X = 1); (b) unconfounded ground truth P(I | do(X=1)); (c) VAE; (d) ANCM; (e) CauVaDE at γ = 0; (f) CauVaDE at γ = 1; (g) CauVaDE at γ = 10; (h) CauVaDE at γ = 100. Blue circles mark images with Y = 1 (Heavy Makeup).
Figure 8
Figure 8: Confounded MIMIC-CXR-JPG samples: X = 0 (top row, no Pneumonia) and X = 1 (bottom row, Pneumonia). Confounded MIMIC-CXR-JPG. We evaluate CauVaDE on MIMIC-CXR-JPG [13, 14, 5] (377,110 radiographs; …
Figure 9
Figure 9: Generated samples from CauVaDE on Colored MNIST across varying constraint strengths: (a) γ = 1; (b) γ = 2; (c) γ = 10; (d) γ = 20; (e) γ = 50; and (f) γ = 100. … stages compute their channel dimensions as multiples of h (i.e. h, 2h, 4h, . . .), so the model's representational capacity scales with image complexity. The latent dimension d_z defines the size of the global latent z and is increased on higher-reso…
Figure 10
Figure 10: Visualizing data distributions on CelebA. Samples are drawn from: (a) the confounded …
Figure 11
Figure 11: Generated samples from CauVaDE under interventions P_{X=0}(I) (top) and P_{X=1}(I) (bottom). … association, potentially propagating diagnostic bias into any downstream model trained on its output, audit, or counterfactual explanation. CauVaDE mitigates this by exposing the feasible region for partial identification rather than collapsing onto a single explanation, giving practitioners a tool to inspect the family …
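
Several captions above refer to Manski's bound [22] as the feasible region for P(Y | do(X)). For discrete X and Y with no further assumptions, the bound follows from the observational joint alone: the unobserved potential outcomes for units with X ≠ x can lie anywhere between 0 and 1. A minimal sketch, with a made-up joint distribution that is not taken from the paper:

```python
def manski_bounds(joint, x=1, y=1):
    """No-assumptions bounds on P(Y = y | do(X = x)) for discrete X, Y.

    joint maps (x, y) pairs to observational probabilities P(X = x, Y = y).
    """
    p_xy = joint.get((x, y), 0.0)
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    lower = p_xy                    # units with X != x all have Y_x != y
    upper = p_xy + (1.0 - p_x)      # units with X != x all have Y_x == y
    return lower, upper

# Made-up confounded joint over (X, Y); illustrative values only.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

lower, upper = manski_bounds(joint)
print(lower, upper)
```

The width of the interval is 1 − P(X = x), so the bound tightens only as the treated value becomes more common in the observational data.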
Original abstract

Deep generative models reproduce the observational distribution of their training data, inheriting any spurious associations it contains. A common source is an unobserved confounder that shapes both an attribute the user wants to control at sampling time and an attribute expected to vary in response. Existing causal generative approaches resolve the resulting ambiguity by imposing structural assumptions strong enough to single out one interventional distribution; in image domains, such assumptions are rarely warranted, and the data is generally consistent with a set of distinct causal mechanisms -- a feasible region of interventional distributions. We propose CauVaDE (Causal Variational Deep Embedding), built on a canonical augmented SCM in which the unobserved confounder collapses, without loss of generality, into a discrete latent cluster of bounded support while continuous variation is absorbed into independent noises. We prove that this canonical class is dense, in both observational and interventional Wasserstein distance, in the class of augmented SCMs compatible with a given causal diagram, and instantiate it as a mixture variational autoencoder whose cluster variable plays the role of the canonical confounder. An entropy regularizer with weight $\gamma$ on the cluster posterior then traces a family of candidate causal effects that fit the observational data to comparable likelihood while spanning the feasible region. Experiments on image data benchmarks show that CauVaDE produces diverse interventional samples and improves FID against an unconfounded reference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes CauVaDE, a mixture variational autoencoder for generating interventional images under unobserved confounding. It defines a canonical augmented SCM in which the confounder is replaced by a discrete latent cluster variable of finite support (with continuous effects absorbed into independent noises), proves that this class is dense in both observational and interventional Wasserstein distance among all augmented SCMs compatible with a given causal diagram, and uses an entropy regularizer of weight γ on the cluster posterior to trace a family of candidate interventional distributions that fit the observational data while spanning the feasible region. Experiments on image benchmarks report improved FID scores relative to an unconfounded reference.

Significance. If the density result and the w.l.o.g. reduction hold, the work supplies a principled, assumption-light method for producing diverse interventional samples in image domains where multiple causal mechanisms remain consistent with the data; the entropy-regularized family explicitly parametrizes the feasible region rather than selecting a single effect. The combination of a provable approximation result with a practical VAE instantiation is a substantive contribution to causal generative modeling.

major comments (2)
  1. [Abstract; canonical augmented SCM definition and density proof] The central density claim rests on the assertion (abstract and canonical-SCM section) that any augmented SCM compatible with the diagram can be approximated arbitrarily closely by replacing the unobserved confounder with a discrete cluster of bounded support while pushing all remaining continuous variation into independent noises. This reduction is presented as without loss of generality, yet it is unclear whether interventional marginals are preserved when the confounder enters through non-separable continuous mechanisms; if the reduction fails for such diagrams, the spanned feasible region is strictly smaller than the true set and the density statement does not hold.
  2. [Entropy-regularizer paragraph; density theorem statement] The entropy regularizer is claimed to trace the feasible region independently of the specific fitted γ values. Without the explicit derivation showing that the Wasserstein distance to the target interventional distributions remains controlled uniformly in γ (or an explicit statement of the conditions under which this independence holds), it is impossible to confirm that the family genuinely spans the region rather than collapsing to a single point for some diagrams.
minor comments (2)
  1. [Experiments] The experimental section should report the precise range of γ values tested and the corresponding cluster-posterior entropy values to allow readers to verify that the reported FID improvements correspond to distinct points along the claimed family.
  2. [Model instantiation] Notation for the mixture components and the role of the cluster variable as the canonical confounder should be introduced with an explicit diagram or equation reference in the model section to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below.

Point-by-point responses
  1. Referee: [Abstract; canonical augmented SCM definition and density proof] The central density claim rests on the assertion (abstract and canonical-SCM section) that any augmented SCM compatible with the diagram can be approximated arbitrarily closely by replacing the unobserved confounder with a discrete cluster of bounded support while pushing all remaining continuous variation into independent noises. This reduction is presented as without loss of generality, yet it is unclear whether interventional marginals are preserved when the confounder enters through non-separable continuous mechanisms; if the reduction fails for such diagrams, the spanned feasible region is strictly smaller than the true set and the density statement does not hold.

    Authors: The canonical construction absorbs any continuous non-separable effects of the confounder into the independent noise terms by definition, so that the discrete cluster approximates the marginal distribution of the confounder while the SCM structure (and therefore the interventional marginals) is preserved. The density theorem then shows that the resulting class is dense in Wasserstein distance for both observational and interventional measures over all augmented SCMs compatible with the diagram. We will insert a short clarifying sentence in the canonical-SCM section of the revision to make this absorption step explicit for non-separable mechanisms. revision: yes

  2. Referee: [Entropy-regularizer paragraph; density theorem statement] The entropy regularizer is claimed to trace the feasible region independently of the specific fitted γ values. Without the explicit derivation showing that the Wasserstein distance to the target interventional distributions remains controlled uniformly in γ (or an explicit statement of the conditions under which this independence holds), it is impossible to confirm that the family genuinely spans the region rather than collapsing to a single point for some diagrams.

    Authors: Different values of γ produce posteriors with different entropies and therefore different effective cluster assignments, each of which corresponds to a distinct interventional distribution inside the feasible region while remaining observationally consistent. Because every such model lies inside the dense canonical class, the Wasserstein distance to any target interventional distribution is controlled uniformly by the density result. We agree that an explicit statement of this uniform control (or the precise conditions) is currently only implicit and will add a short derivation in the revision. revision: yes

Circularity Check

0 steps flagged

No significant circularity; density proof and wlog reduction presented as independent

full rationale

The paper states it proves the canonical class (with confounder collapsed to discrete bounded cluster and continuous effects in independent noises) is dense in both observational and interventional Wasserstein distance over augmented SCMs compatible with a given diagram. This proof is invoked to justify the 'without loss of generality' reduction and the subsequent mixture-VAE instantiation. The entropy regularizer with weight γ is described as tracing a family spanning the feasible region rather than defining any target quantity by construction. No equations or steps are shown reducing a claimed prediction or result to a fitted parameter or self-citation by definition. No load-bearing self-citations, uniqueness theorems from prior author work, or ansatzes smuggled via citation are identified. The central claim remains self-contained against the stated external benchmarks of Wasserstein distances.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the density of the canonical discrete-cluster class and the modeling choice that the confounder can be represented this way without loss of generality; gamma is the only explicit tunable weight mentioned.

free parameters (1)
  • gamma
    Weight on the entropy regularizer of the cluster posterior; controls how the model spans the family of interventional distributions.
axioms (1)
  • domain assumption: The canonical class of augmented SCMs with discrete bounded-support confounder is dense in observational and interventional Wasserstein distance within the class compatible with a given causal diagram.
    Invoked to justify that the mixture VAE instantiation covers the feasible region of causal mechanisms.
invented entities (1)
  • discrete latent cluster variable as canonical confounder (no independent evidence)
    purpose: To collapse the unobserved confounder while preserving the ability to generate interventional samples.
    Introduced as the modeling device that makes the density result usable inside a VAE.

pith-pipeline@v0.9.1-grok · 5772 in / 1485 out tokens · 30878 ms · 2026-06-26T14:07:30.764731+00:00 · methodology

Reference graph

Works this paper leans on

41 extracted references · 2 canonical work pages · 2 internal anchors

  [1] A. Balke and J. Pearl. Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association, 92(439):1171–1176, 1997.
  [2] E. Bareinboim, J. D. Correa, D. Ibeling, and T. Icard. On Pearl's hierarchy and the foundations of causal inference. In H. Geffner, R. Dechter, and J. Y. Halpern, editors, Probabilistic and Causal Inference: The Works of Judea Pearl, ACM Books, pages 507–556. Association for Computing Machinery, New York, NY, USA, 2022.
  [3] G. Duarte, N. Finkelstein, D. Knox, J. Mummolo, and I. Shpitser. An automated approach to causal inference in discrete settings. Journal of the American Statistical Association, 119(547):1778–1793, 2024.
  [4] C. E. Frangakis and D. B. Rubin. Principal stratification in causal inference. Biometrics, 58(1):21–29, 2002.
  [5] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, June 2000. PMID: 10851218.
  [6] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. arXiv preprint arXiv:1406.2661, 2014.
  [7] J. Hartford, G. Lewis, K. Leyton-Brown, and M. Taddy. Deep IV: A flexible approach for counterfactual prediction. In Proceedings of the 34th International Conference on Machine Learning (ICML), volume 70 of Proceedings of Machine Learning Research, pages 1414–1423. PMLR, 2017.
  [8] I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner. β-VAE: Learning basic visual concepts with a constrained variational framework. In International Conference on Learning Representations (ICLR), 2017.
  [9] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239, 2020.
  [10] A. Hyvärinen and P. Pajunen. Nonlinear independent component analysis: Existence and uniqueness results. Neural Networks, 12(3):429–439, 1999.
  [11] A. Javaloy, P. Sánchez-Martín, and I. Valera. Causal normalizing flows: From theory to practice. In Advances in Neural Information Processing Systems 36 (NeurIPS 2023), pages 58833–58864, 2023.
  [12] Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou. Variational deep embedding: An unsupervised and generative approach to clustering. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), pages 1965–1972, 2017.
  [13] A. E. W. Johnson, T. J. Pollard, N. R. Greenbaum, M. P. Lungren, C.-y. Deng, Y. Peng, Z. Lu, R. G. Mark, S. J. Berkowitz, and S. Horng. MIMIC-CXR-JPG - chest radiographs with structured labels. PhysioNet, March 2024. Version 2.1.0.
  [14] A. E. W. Johnson, T. J. Pollard, N. R. Greenbaum, M. P. Lungren, C.-y. Deng, Y. Peng, Z. Lu, R. G. Mark, S. J. Berkowitz, and S. Horng. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs, 2019.
  [15] I. Khemakhem, D. P. Kingma, R. P. Monti, and A. Hyvärinen. Variational autoencoders and nonlinear ICA: A unifying framework. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS), volume 108 of Proceedings of Machine Learning Research, pages 2207–2217. PMLR, 2020.
  [16] I. Khemakhem, R. Monti, R. Leech, and A. Hyvärinen. Causal autoregressive flows. In Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS), volume 130 of Proceedings of Machine Learning Research, pages 3520–3528. PMLR, 2021.
  [17] D. P. Kingma and M. Welling. Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations (ICLR), 2014.
  [18] M. Kocaoglu, C. Snyder, A. G. Dimakis, and S. Vishwanath. CausalGAN: Learning causal implicit generative models with adversarial training. In International Conference on Learning Representations (ICLR), 2018.
  [19] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015.
  [20] F. Locatello, S. Bauer, M. Lucic, G. Rätsch, S. Gelly, B. Schölkopf, and O. Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. In Proceedings of the 36th International Conference on Machine Learning (ICML), volume 97 of Proceedings of Machine Learning Research, pages 4114–4124. PMLR, 2019.
  [21] I. Loshchilov and F. Hutter. Decoupled weight decay regularization, 2019.
  [22] C. F. Manski. Nonparametric bounds on treatment effects. The American Economic Review, 80(2):319–323, 1990.
  [23] C. F. Manski. Monotone treatment response. Econometrica, 65(6):1311–1334, 1997.
  [24] W. Miao, Z. Geng, and E. J. Tchetgen Tchetgen. Identifying causal effects with proxy variables of an unmeasured confounder. Biometrika, 105(4):987–993, 2018.
  [25] M. Monteiro, F. De Sousa Ribeiro, N. Pawlowski, D. C. Castro, and B. Glocker. Measuring axiomatic soundness of counterfactual image models. In The Eleventh International Conference on Learning Representations (ICLR), 2023.
  [26] A. Nasr-Esfahany, M. Alizadeh, and D. Shah. Counterfactual identifiability of bijective causal models. In Proceedings of the 40th International Conference on Machine Learning (ICML), volume 202 of Proceedings of Machine Learning Research, pages 25733–25754. PMLR, 2023.
  [27] K. Padh, J. Zeitler, D. Watson, M. Kusner, R. Silva, and N. Kilbertus. Stochastic causal programming for bounding treatment effects. In Proceedings of the Second Conference on Causal Learning and Reasoning (CLeaR), volume 213 of Proceedings of Machine Learning Research, pages 142–176. PMLR, 2023.
  [28] Y. Pan and E. Bareinboim. Counterfactual image editing. In Proceedings of the 41st International Conference on Machine Learning (ICML), volume 235 of Proceedings of Machine Learning Research, pages 39087–39101. PMLR, 2024.
  [29] N. Pawlowski, D. C. Castro, and B. Glocker. Deep structural causal models for tractable counterfactual inference. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 857–869, 2020.
  [30] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, New York, 2000.
  [31] P. R. Rosenbaum. Observational Studies. Springer, New York, 2nd edition, 2002.
  [32] P. Sanchez and S. A. Tsaftaris. Diffusion causal models for counterfactual estimation. In Proceedings of the First Conference on Causal Learning and Reasoning (CLeaR), volume 177 of Proceedings of Machine Learning Research, pages 647–668. PMLR, 2022.
  [33] B. Schölkopf, F. Locatello, S. Bauer, N. R. Ke, N. Kalchbrenner, A. Goyal, and Y. Bengio. Toward causal representation learning. Proceedings of the IEEE, 109(5):612–634, 2021.
  [34] E. J. Tchetgen Tchetgen, A. Ying, Y. Cui, X. Shi, and W. Miao. An introduction to proximal causal inference. Statistical Science, 39(3):375–390, 2024.
  [35] T. J. VanderWeele and P. Ding. Sensitivity analysis in observational research: Introducing the E-value. Annals of Internal Medicine, 167(4):268–274, 2017.
  [36] C. Villani. Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin, Heidelberg, 2009.
  [37] K. Xia, K.-Z. Lee, Y. Bengio, and E. Bareinboim. The causal-neural connection: Expressiveness, learnability, and inference. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021.
  [38] K. Xia, Y. Pan, and E. Bareinboim. Neural causal models for counterfactual identification and estimation. In The Eleventh International Conference on Learning Representations (ICLR), 2023.
  [39] L. Xu, Y. Chen, S. Srinivasan, N. de Freitas, A. Doucet, and A. Gretton. Learning deep features in instrumental variable regression. In International Conference on Learning Representations (ICLR), 2021.
  [40] M. Yang, F. Liu, Z. Chen, X. Shen, J. Hao, and J. Wang. CausalVAE: Disentangled representation learning via neural structural causal models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9593–9602, 2021.
  [41] J. Zhang, J. Tian, and E. Bareinboim. Partial counterfactual identification from observational and experimental data. In Proceedings of the 39th International Conference on Machine Learning (ICML), volume 162 of Proceedings of Machine Learning Research, pages 26548–26558. PMLR, 2022.