Causal Representation Learning from General Environments under Nonparametric Mixing

Ignavier Ng; Kun Zhang; Peter Spirtes; Shaoan Xie; Xinshuai Dong

arXiv: 2604.23800 · v1 · submitted 2026-04-26 · cs.LG · stat.ML

Pith reviewed 2026-05-08 06:26 UTC · model grok-4.3

classification: cs.LG · stat.ML

keywords: causal representation learning · latent DAG recovery · nonparametric mixing · sufficient change conditions · nonlinear causal models · general environments · identification up to indeterminacies · third-order derivatives

The pith

Sufficient changes in causal mechanisms up to third-order derivatives allow full recovery of the latent DAG, and of the latent variables up to minor indeterminacies, from general environments under nonparametric mixing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to recover hidden causal variables and their directed acyclic graph structure from low-level observations such as image pixels, across multiple environments where data distributions vary. It shows this recovery is possible even when the mixing from latent variables to observations is nonparametric and the latent causal models are nonlinear, such as additive Gaussian noise or heteroscedastic noise models. The key is to use sufficient changes in the causal mechanisms up to their third-order derivatives. This matters because prior approaches often required specific intervention types or parametric constraints on mixing or models, which frequently fail to hold in practice. The results therefore apply to a broader class of general environments while matching or improving on existing identification guarantees.

Core claim

Under a nonparametric mixing function and nonlinear latent causal models, such as additive Gaussian noise models or heteroscedastic noise models, one can fully recover the latent DAG and identify the latent variables up to minor indeterminacies by properly leveraging sufficient change conditions on the causal mechanisms up to third-order derivatives. The authors describe these as, to their knowledge, the first results to fully recover the latent DAG from general environments under nonparametric mixing, and they report that the results match or improve upon many existing works while requiring less restrictive assumptions about how environments change.
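
The setup behind this claim can be written compactly. The block below is a sketch of the data-generating process as it can be read from the abstract and the Figure 1 caption. The symbols f_i, s_i, and Pa(Z_i), and the choice of which quantities carry the environment index u, are notational assumptions made here, not the paper's own Eq. (4).

```latex
% Observations are a nonparametric mixing of n latent variables;
% g is assumed to be a C^2 diffeomorphism onto its image.
X = g(Z), \qquad Z = (Z_1, \dots, Z_n)

% Additive noise model in environment u (assumed notation):
Z_i = f_i^{(u)}\big(\mathrm{Pa}(Z_i)\big) + \epsilon_i^{(u)}

% Heteroscedastic noise model in environment u (assumed notation):
Z_i = f_i^{(u)}\big(\mathrm{Pa}(Z_i)\big) + s_i^{(u)}\big(\mathrm{Pa}(Z_i)\big)\,\epsilon_i^{(u)}
```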

What carries the argument

Sufficient change conditions on the causal mechanisms, up to third-order derivatives, carry the identification argument: they ensure the latent structure can be distinguished without parametric restrictions on the mixing function and without requiring the latent causal model to be linear.

If this is right

The full latent directed acyclic graph can be recovered from observations in general environments without assuming specific intervention types.
Latent variables become identifiable up to minor indeterminacies under nonlinear models including additive Gaussian noise and heteroscedastic noise.
Nonparametric mixing functions are permitted, removing the need for linearity or other parametric forms on how latents map to observations.
The results achieve identification with weaker assumptions on environment changes than many prior methods, while delivering comparable or stronger guarantees.
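
The first two points can be stated a little more precisely. As far as the paper's theorem statements can be read from the extracted text, recovery holds up to a permutation π of the latent indices, and each recovered variable depends only on its true counterpart and some members of a set the paper writes as sur(Z_i; G_Z). The function name h_i and the subset S_i below are hypothetical notation, and the definition of sur(·) is not reproduced here.

```latex
% Latent variables: identified up to a permutation and a limited entanglement.
\hat{Z}_{\pi(i)} = h_i\big(Z_i, S_i\big), \qquad S_i \subseteq \mathrm{sur}(Z_i; \mathcal{G}_Z)

% Latent DAG: the recovered graph, after relabeling by \pi, equals the true graph.
\mathcal{G}_{\hat{Z}_\pi} = \mathcal{G}_Z
```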

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same change conditions could be tested in continuous-time or streaming data settings to see whether identification still holds when environments evolve smoothly rather than in discrete shifts.
Methods that actively select or generate environments to maximize variation in higher-order derivatives might achieve identification with fewer total environments.
Links to disentanglement objectives in representation learning could be examined, since both rely on changes in underlying factors to separate variables.

Load-bearing premise

The causal mechanisms must change sufficiently in their derivatives up to third order across the general environments.
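
To see why derivatives up to third order can carry information, it may help to work through one case by hand. The derivation below assumes an additive Gaussian noise mechanism with mean function f^{(u)}, noise mean μ_u, and noise variance σ_u² (notation chosen here, not taken from the paper), and differentiates the log conditional density with respect to z_i and one parent z_j. It is an illustration only and does not restate the paper's exact change conditions.

```latex
% Log conditional density of Z_i given its parents in environment u,
% with residual r = z_i - f^{(u)}(\mathrm{pa}_i) - \mu_u.
\log p^{(u)}(z_i \mid \mathrm{pa}_i) = -\frac{r^2}{2\sigma_u^2} - \log \sigma_u - \tfrac{1}{2}\log 2\pi

% Derivatives in z_i alone stop at second order:
\frac{\partial}{\partial z_i}\log p^{(u)} = -\frac{r}{\sigma_u^2}, \qquad
\frac{\partial^2}{\partial z_i^2}\log p^{(u)} = -\frac{1}{\sigma_u^2}, \qquad
\frac{\partial^3}{\partial z_i^3}\log p^{(u)} = 0

% Mixed derivatives with a parent z_j expose the shape of f^{(u)}:
\frac{\partial^2}{\partial z_i \partial z_j}\log p^{(u)} = \frac{1}{\sigma_u^2}\frac{\partial f^{(u)}}{\partial z_j}, \qquad
\frac{\partial^3}{\partial z_i \partial z_j^2}\log p^{(u)} = \frac{1}{\sigma_u^2}\frac{\partial^2 f^{(u)}}{\partial z_j^2}
```

Under these assumptions the last term is zero whenever f^{(u)} is linear in z_j, so in this toy model that third-order quantity can only vary across environments when the mechanism is nonlinear.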

What would settle it

A collection of environments satisfying all other conditions but where the third-order derivatives of the causal mechanisms remain constant, resulting in non-unique recovery of the latent DAG.
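
One way to picture this failure case is a rank check on derivative vectors. The sketch below is an illustration, not the paper's condition: it assumes a single scalar variable whose log-density in each environment is a polynomial (normalizing constants drop out of derivatives), stacks the first three derivatives at a hypothetical point z0, and asks whether the differences from a reference environment span all three orders. The function names, the polynomial family, and the rank criterion are all assumptions made here.

```python
import numpy as np


def derivative_vector(coeffs, z0):
    """Return the 1st, 2nd and 3rd derivatives at z0 of the polynomial
    log-density sum_k coeffs[k] * z**k (an assumed toy family)."""
    poly = np.polynomial.Polynomial(coeffs)
    return np.array([poly.deriv(m)(z0) for m in (1, 2, 3)])


def spans_all_orders(env_coeffs, z0=0.5):
    """Check whether derivative vectors, taken relative to the first
    environment, have full column rank (3). This is a hypothetical
    stand-in for a 'sufficient change' test, not the paper's condition."""
    vecs = np.array([derivative_vector(c, z0) for c in env_coeffs])
    diffs = vecs[1:] - vecs[0]  # shape: (num_envs - 1, 3)
    return int(np.linalg.matrix_rank(diffs)) == 3


# Four environments whose cubic coefficient (index 3) changes.
changing = [
    [0.0, 0.0, -1.0, 0.0, -1.0],
    [0.0, 1.0, -2.0, 0.5, -1.0],
    [0.0, -1.0, -1.0, 1.0, -1.0],
    [0.0, 0.5, -3.0, -0.7, -1.0],
]
# Same environments with the cubic coefficient frozen at zero, so the
# third derivative is identical in every environment.
frozen = [c[:3] + [0.0] + c[4:] for c in changing]

print(spans_all_orders(changing))  # True
print(spans_all_orders(frozen))    # False
```

In the frozen case the third column of the difference matrix is identically zero, so no number of additional environments of that kind would restore full rank.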

Figures

Figures reproduced from arXiv: 2604.23800 by Ignavier Ng, Kun Zhang, Peter Spirtes, Shaoan Xie, Xinshuai Dong.

**Figure 1.** The observed random variables X are generated from the latent variables Z via an unknown, nonparametric mixing function g. The causal mechanism for each latent variable Z_i may vary across different environments specified by u. The gray shading of the nodes indicates that the variables are observable. The mixing function g is assumed to be a C²-diffeomorphism onto its image 𝒳 ⊆ ℝ^d.

**Figure 2.** The recovered latent variables Ẑ versus the true latent variables Z under two latent DAGs: (1) Z1 → Z2 → Z3 and (2) Z1, Z2 → Z3. In the simulation studies, three latent variables follow a predefined DAG and the data generating procedure in Eq. (4); the means and variances of ε_i^(u) are randomly sampled across different environments.

**Figure 3.** The losses versus the training steps with different numbers of estimated latent variables. The ground…

**Figure 4.** The losses versus the training steps with different values of…
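
The paper's simulation study reportedly uses three latent variables on a predefined DAG, with the mean and variance of each noise term randomly sampled per environment. A minimal sketch of that kind of data generation follows, for the chain Z1 → Z2 → Z3. The mechanism functions (tanh, sin), the sampling ranges, the mixing construction, and every name in the code are assumptions for illustration; the paper's Eq. (4) and its actual mixing network are not reproduced here.

```python
import numpy as np


def make_mixing(rng, dim=3):
    """Build a smooth invertible map g (hypothetical construction):
    orthogonal matrix, elementwise increasing map, orthogonal matrix."""
    q1, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    q2, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

    def g(z):
        h = z @ q1
        # h + 0.5 * tanh(h) has derivative of at least 1, so it is invertible.
        return (h + 0.5 * np.tanh(h)) @ q2

    return g


def sample_environment(rng, g, n_samples=1000):
    """Sample (Z, X) from one environment of the chain Z1 -> Z2 -> Z3."""
    # Environment-specific noise means and standard deviations.
    means = rng.uniform(-2.0, 2.0, size=3)
    stds = rng.uniform(0.5, 2.0, size=3)
    eps = means + stds * rng.standard_normal((n_samples, 3))

    z1 = eps[:, 0]
    z2 = np.tanh(z1) + eps[:, 1]  # additive noise mechanism (assumed form)
    z3 = np.sin(z2) + eps[:, 2]   # additive noise mechanism (assumed form)
    z = np.stack([z1, z2, z3], axis=1)
    return z, g(z)


rng = np.random.default_rng(0)
g = make_mixing(rng)
# Each entry is a (latents, observations) pair from a different environment.
environments = [sample_environment(rng, g) for _ in range(5)]
```

The mixing function is shared across environments while the noise parameters are redrawn each time, which mirrors the idea that only the causal mechanisms change.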

read the original abstract

Causal representation learning aims to recover the latent causal variables and their causal relations, typically represented by directed acyclic graphs (DAGs), from low-level observations such as image pixels. A prevailing line of research exploits multiple environments, which assume how data distributions change, including single-node interventions, coupled interventions, or hard interventions, or parametric constraints on the mixing function or the latent causal model, such as linearity. Despite the novelty and elegance of the results, they are often violated in real problems. Accordingly, we formalize a set of desiderata for causal representation learning that applies to a broader class of environments, referred to as general environments. Interestingly, we show that one can fully recover the latent DAG and identify the latent variables up to minor indeterminacies under a nonparametric mixing function and nonlinear latent causal models, such as additive (Gaussian) noise models or heteroscedastic noise models, by properly leveraging sufficient change conditions on the causal mechanisms up to third-order derivatives. These represent, to our knowledge, the first results to fully recover the latent DAG from general environments under nonparametric mixing. Notably, our results match or improve upon many existing works, but require less restrictive assumptions about changing environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a theoretical identifiability result for full latent DAG recovery under nonparametric mixing in general environments, using third-order mechanism changes, which relaxes some prior restrictions but rests on conditions whose practical reach is unclear.

This paper formalizes general environments for causal representation learning and claims that, when causal mechanisms change sufficiently across them, you can recover the full latent DAG plus the latent variables up to minor indeterminacies. The models allowed are nonlinear, including additive Gaussian noise and heteroscedastic cases, with a fully nonparametric mixing function. That combination is the main advance: earlier results typically needed linearity, parametric mixing, or narrower intervention types such as single-node or hard interventions. The authors position their third-order derivative conditions on the mechanisms as the lever that makes the broader setting work, and they argue the result matches or improves on several existing guarantees while dropping stronger assumptions on the environments themselves. That framing is useful and the comparison to prior work is clear.

The soft spots sit where the result is most sensitive. Identification fails if the stated change conditions do not hold, and third-order derivatives are a fairly strong requirement that may not be easy to verify or satisfy in real data. The paper is almost entirely theoretical, so there is little direct evidence on how often these conditions appear in practice or how sensitive the recovery is to small violations. Minor indeterminacies in the latent variables are acknowledged but their downstream effect on tasks like prediction or intervention is not explored in depth. The proofs would need close checking for any hidden restrictions that reintroduce parametric flavor.

This work is aimed at researchers focused on identifiability in causal representation learning rather than immediate applications. A reader already following the line of papers on multi-environment causal discovery will find the relaxed assumptions and the general-environment desiderata worth examining.

It deserves peer review because the claim is substantive, the literature engagement is honest, and the formal result, if the proofs hold, fills a stated gap even if practical validation remains open.

Referee Report

2 major / 2 minor

Summary. The paper claims to establish theoretical identifiability results for causal representation learning in general environments. Under nonparametric mixing functions and nonlinear latent causal models (e.g., additive Gaussian noise or heteroscedastic noise), it shows that the latent DAG can be fully recovered and latent variables identified up to minor indeterminacies by leveraging sufficient change conditions on the causal mechanisms up to third-order derivatives. This is positioned as relaxing prior restrictions on interventions or parametric forms while matching or improving existing guarantees.

Significance. If the result holds, it would represent a meaningful advance by achieving full latent DAG recovery from general environments without parametric constraints on the mixing function, which prior works often require. The use of nonparametric mixing and nonlinear models broadens applicability, and the framing as the first such result under these relaxed conditions on environment changes is a strength if the third-order conditions can be shown to be realistic and sufficient.

major comments (2)

[Main identifiability result (abstract and theorem statement)] The identifiability theorem relies on the assumption that sufficient change conditions on causal mechanisms hold up to third-order derivatives in general environments. This is load-bearing for the claim of applicability beyond parametric or intervention-specific settings; the manuscript should include explicit verification or examples showing these conditions are satisfied for the stated nonlinear models (additive Gaussian noise, heteroscedastic) rather than leaving them as an unverified prerequisite.
[Proof of the main theorem] The central claim of full DAG recovery under nonparametric mixing depends on the sufficiency of the third-order derivative conditions without additional unstated restrictions. A detailed proof outline or key derivation steps should be provided in the main text (or clearly referenced appendix) to allow verification that no derivation gaps exist and that the conditions are not circular or overly restrictive in practice.

minor comments (2)

[Identification result statement] Clarify the precise definition of 'minor indeterminacies' in the identification of latent variables, with an explicit statement of what remains unidentified.
[Related work] Expand the related-work section to include quantitative comparison of assumption strength against at least two prior nonparametric or multi-environment CRL papers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to strengthen the presentation of the identifiability results.

Referee: The identifiability theorem relies on the assumption that sufficient change conditions on causal mechanisms hold up to third-order derivatives in general environments. This is load-bearing for the claim of applicability beyond parametric or intervention-specific settings; the manuscript should include explicit verification or examples showing these conditions are satisfied for the stated nonlinear models (additive Gaussian noise, heteroscedastic) rather than leaving them as an unverified prerequisite.

Authors: We agree that explicit verification strengthens the applicability claims. In the revised manuscript, we will add a dedicated subsection with concrete examples for additive Gaussian noise and heteroscedastic noise models, showing how the third-order derivative conditions on causal mechanisms are satisfied under nonparametric mixing in general environments. This will confirm the conditions hold without additional restrictions.
revision: yes

Referee: The central claim of full DAG recovery under nonparametric mixing depends on the sufficiency of the third-order derivative conditions without additional unstated restrictions. A detailed proof outline or key derivation steps should be provided in the main text (or clearly referenced appendix) to allow verification that no derivation gaps exist and that the conditions are not circular or overly restrictive in practice.

Authors: The full proof appears in the appendix, but we concur that a high-level outline in the main text improves accessibility. We will insert a concise proof sketch in the main body (e.g., near the theorem statement) that walks through the key derivation steps from the third-order conditions to latent DAG recovery and variable identification, explicitly noting the absence of circularity or hidden restrictions.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; identification result is derived from stated assumptions on mechanism changes

full rationale

The paper's central claim is a theoretical identification theorem: under nonparametric mixing and nonlinear latent models (e.g., additive Gaussian noise), the latent DAG and variables are recoverable up to minor indeterminacies when sufficient change conditions hold on causal mechanisms up to third-order derivatives in general environments. This is presented as a derivation from explicitly stated assumptions rather than from fitted parameters, self-referential definitions, or load-bearing self-citations. No equations or steps in the abstract reduce the output to the input by construction; the result is conditioned on external verifiable conditions about mechanism changes and matches or relaxes prior guarantees without circular reduction. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that sufficient changes in causal mechanisms exist and can be characterized up to third-order derivatives; no free parameters or new entities are introduced.

axioms (1)

domain assumption: Sufficient change conditions on the causal mechanisms up to third-order derivatives exist in the general environments.
This condition is explicitly invoked as the key enabler for identification under nonparametric mixing.

pith-pipeline@v0.9.0 · 5513 in / 1190 out tokens · 51594 ms · 2026-05-08T06:26:49.115224+00:00 · methodology

