On the Identifiability of Causal Graphs with the Invariance Principle
Pith reviewed 2026-05-18 06:16 UTC · model grok-4.3
The pith
Under Gaussian noise, the distribution induced by a structural causal model, together with data from as few as two environments whose noise statistics differ sufficiently, uniquely identifies the causal graph.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
If we have access to the distribution induced by a structural causal model and additional data from only two environments that sufficiently differ in the noise statistics, the unique causal graph is identifiable. This is the first result guaranteeing entire causal graph recovery with a constant number of environments and arbitrary nonlinear mechanisms, under the constraint of Gaussian noise terms.
What carries the argument
The invariance principle applied across environments with differing noise statistics, which exploits the duality between nonlinear ICA and causal discovery to enforce graph identifiability.
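A toy simulation can make the invariance idea concrete. The sketch below is an illustration only, not the paper's proof technique: it assumes a two-variable additive-noise model X -> Y, a hypothetical mechanism `mechanism`, and noise scales chosen arbitrarily. It uses the conditional mean as a simple stand-in for "what stays invariant": in the causal direction E[Y | X] is the shared mechanism in both environments, while E[X | Y] shifts when the noise statistics change.

```python
import numpy as np


def mechanism(x):
    """Hypothetical invariant nonlinear mechanism shared by both environments."""
    return x + 0.25 * x**3


def sample_environment(n, sigma_x, sigma_y, rng):
    """Draw (X, Y) from X -> Y with additive Gaussian noise of the given scales."""
    x = rng.normal(0.0, sigma_x, size=n)
    y = mechanism(x) + rng.normal(0.0, sigma_y, size=n)
    return x, y


def binned_mean(target, given, edges):
    """Estimate E[target | given] by averaging target within bins of given."""
    idx = np.digitize(given, edges) - 1
    n_bins = len(edges) - 1
    inside = (idx >= 0) & (idx < n_bins)
    sums = np.bincount(idx[inside], weights=target[inside], minlength=n_bins)
    counts = np.bincount(idx[inside], minlength=n_bins)
    return sums / counts


rng = np.random.default_rng(0)
edges = np.linspace(-1.0, 1.0, 21)  # common region covered by both environments

# The two environments differ only in their noise scales (assumed values).
x1, y1 = sample_environment(200_000, sigma_x=1.0, sigma_y=1.0, rng=rng)
x2, y2 = sample_environment(200_000, sigma_x=2.0, sigma_y=0.5, rng=rng)

# Largest disagreement between the two environments, per direction.
causal_gap = np.max(np.abs(binned_mean(y1, x1, edges) - binned_mean(y2, x2, edges)))
anticausal_gap = np.max(np.abs(binned_mean(x1, y1, edges) - binned_mean(x2, y2, edges)))

print(f"max gap in E[Y|X] across environments: {causal_gap:.3f}")
print(f"max gap in E[X|Y] across environments: {anticausal_gap:.3f}")
```

The noise scales are deliberately changed in different proportions. In a linear Gaussian variant of this toy model, scaling both noise terms by the same factor would leave E[X | Y] unchanged, which gives some intuition for why the environments must differ "sufficiently" rather than arbitrarily.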
If this is right
- The full causal graph is uniquely recoverable from the model distribution plus data from two environments.
- Arbitrary nonlinear mechanisms between variables are permitted.
- Only a constant number of environments is required, whereas nonlinear ICA needs at least as many environments as sources.
- The Gaussian noise constraint can potentially be relaxed through further analysis.
Where Pith is reading between the lines
- Real-world causal discovery in fields like biology could become feasible with limited intervention or observational shifts.
- The same invariance idea might extend to settings where environments differ in other ways besides noise variance.
- This reduces the data burden compared to methods requiring one environment per source variable.
Load-bearing premise
The noise terms must follow Gaussian distributions.
What would settle it
A concrete counterexample consisting of a nonlinear structural causal model with Gaussian noises where two environments with different noise statistics produce data consistent with more than one causal graph.
Original abstract
Causal discovery from i.i.d. observational data is known to be generally ill-posed. We demonstrate that if we have access to the distribution {induced} by a structural causal model, and additional data from (in the best case) only two environments that sufficiently differ in the noise statistics, the unique causal graph is identifiable. Notably, this is the first result in the literature that guarantees the entire causal graph recovery with a constant number of environments and arbitrary nonlinear mechanisms. Our only constraint is the Gaussianity of the noise terms; however, we propose potential ways to relax this requirement. Of interest on its own, we expand on the well-known duality between independent component analysis (ICA) and causal discovery; recent advancements have shown that nonlinear ICA can be solved from multiple environments, at least as many as the number of sources: we show that the same can be achieved for causal discovery while having access to much less auxiliary information.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims that the causal graph of a structural causal model is uniquely identifiable given the observational distribution induced by the model together with data from (in the best case) only two environments whose noise statistics differ sufficiently. The result holds for arbitrary nonlinear mechanisms under the assumption of Gaussian noise; the argument proceeds by expanding the known duality between nonlinear ICA and causal discovery, showing that the causal structure supplies enough additional constraints to achieve identifiability with far fewer environments than the number of sources required by standard nonlinear ICA.
Significance. If the central identifiability theorem is correct, the result would be a meaningful advance: it is the first to guarantee recovery of the entire causal graph with a constant (specifically two) number of environments for arbitrary nonlinear mechanisms. The explicit treatment of the ICA–causal-discovery duality is of independent technical interest and could inform future work on minimal auxiliary-data requirements.
major comments (2)
- [Main identifiability theorem] Main identifiability theorem (likely §3 or §4): the invariance argument shows that conditional distributions are preserved across environments, yet it is not shown explicitly that this invariance, together with the two observed joint distributions, rules out every alternative DAG that could generate the same pair of distributions under nonlinear mechanisms. A separate uniqueness lemma that enumerates and excludes all residual Markov-equivalence classes or ICA-style indeterminacies would be required to support the claim that two environments suffice.
- [Assumption on noise] Assumption on noise (Gaussianity and sufficient difference): the reduction from the usual “number of sources” environments in nonlinear ICA to a fixed two environments rests on the Gaussian noise model supplying exactly the right additional constraints. The manuscript should verify, perhaps via a low-dimensional counter-example or explicit construction, that no other graph remains consistent once the noise variances or higher moments differ between the two environments.
minor comments (2)
- [Abstract] Abstract contains a stray LaTeX brace: “distribution {induced} by”.
- [Introduction / Assumptions] The precise quantitative condition on how much the noise statistics must differ (“sufficiently differ”) should be stated as a formal assumption rather than left informal.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and for highlighting the potential impact of our work on causal discovery. We provide point-by-point responses to the major comments below, and we plan to incorporate revisions to strengthen the manuscript.
- Referee: [Main identifiability theorem] the invariance argument shows that conditional distributions are preserved across environments, yet it is not shown explicitly that this invariance, together with the two observed joint distributions, rules out every alternative DAG that could generate the same pair of distributions under nonlinear mechanisms. A separate uniqueness lemma that enumerates and excludes all residual Markov-equivalence classes or ICA-style indeterminacies would be required to support the claim that two environments suffice.
  Authors: We appreciate this suggestion for improving the clarity of our proof. Our main identifiability result (Theorem 4.1) establishes uniqueness by showing that the invariance principle, when combined with the two distinct joint distributions from the environments, pins down the exact parent sets for each variable. Specifically, for any candidate DAG that differs from the true one, there exists at least one node whose conditional distribution would not be invariant across environments unless the correct parents are used. To address the referee's concern explicitly, we will introduce a new lemma (Lemma 4.2) that formally excludes alternative structures, including those arising from ICA indeterminacies, by leveraging the Gaussian noise assumption and the sufficient difference in noise statistics between the two environments.
  Revision: yes
- Referee: [Assumption on noise] the reduction from the usual “number of sources” environments in nonlinear ICA to a fixed two environments rests on the Gaussian noise model supplying exactly the right additional constraints. The manuscript should verify, perhaps via a low-dimensional counter-example or explicit construction, that no other graph remains consistent once the noise variances or higher moments differ between the two environments.
  Authors: We agree that providing a concrete verification would enhance the reader's understanding. While the proof in Section 3 relies on the properties of Gaussian distributions to ensure that differing variances break the symmetries present in standard nonlinear ICA, we will add an illustrative example in a new subsection (e.g., for a simple chain graph with three variables) demonstrating that only the true causal graph is consistent with the observed distributions and invariances when the noise parameters differ between the two environments. This example will also clarify how the Gaussianity allows us to achieve identifiability with just two environments rather than one per source.
  Revision: yes
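The three-variable chain mentioned in the rebuttal can be mocked up numerically. The following is a hypothetical sketch, not the example the authors promise: the mechanism `f`, the noise scales, and the use of binned conditional means as an invariance check are all assumptions made here for illustration. It compares two candidate parent sets for X3 in the chain X1 -> X2 -> X3: the true parent X2 and the wrong candidate X1.

```python
import numpy as np


def f(u):
    """Hypothetical nonlinear mechanism, reused for both edges of the chain."""
    return u + 0.25 * u**3


def sample_chain(n, sigmas, rng):
    """Draw (X1, X2, X3) from the chain X1 -> X2 -> X3 with Gaussian noise."""
    s1, s2, s3 = sigmas
    x1 = rng.normal(0.0, s1, size=n)
    x2 = f(x1) + rng.normal(0.0, s2, size=n)
    x3 = f(x2) + rng.normal(0.0, s3, size=n)
    return x1, x2, x3


def binned_mean(target, given, edges):
    """Estimate E[target | given] by averaging target within bins of given."""
    idx = np.digitize(given, edges) - 1
    n_bins = len(edges) - 1
    inside = (idx >= 0) & (idx < n_bins)
    sums = np.bincount(idx[inside], weights=target[inside], minlength=n_bins)
    counts = np.bincount(idx[inside], minlength=n_bins)
    return sums / counts


rng = np.random.default_rng(1)
edges = np.linspace(-1.0, 1.0, 21)

# Two environments with the same mechanisms but different (assumed) noise scales.
a1, a2, a3 = sample_chain(200_000, sigmas=(1.0, 1.0, 1.0), rng=rng)
b1, b2, b3 = sample_chain(200_000, sigmas=(2.0, 0.5, 0.5), rng=rng)

# Candidate parent set {X2} (true) versus {X1} (wrong) for the node X3.
true_parent_gap = np.max(np.abs(binned_mean(a3, a2, edges) - binned_mean(b3, b2, edges)))
wrong_parent_gap = np.max(np.abs(binned_mean(a3, a1, edges) - binned_mean(b3, b1, edges)))

print(f"max gap in E[X3|X2] across environments: {true_parent_gap:.3f}")
print(f"max gap in E[X3|X1] across environments: {wrong_parent_gap:.3f}")
```

For this particular cubic mechanism, the mean of X3 given X1 = x1 contains the term 0.75 f(x1) σ2², so it moves whenever the noise scale σ2 of X2 changes between environments. With a linear mechanism that term vanishes, so a conditional-mean check like this one is only a rough proxy for the distribution-level argument the rebuttal describes.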
Circularity Check
No circularity; identifiability theorem is self-contained
The paper advances a theoretical identifiability result for causal graphs from the SCM-induced distribution plus two environments differing in Gaussian noise statistics. It builds on the ICA-causal discovery duality but presents the constant-environment claim as a new reduction. No quoted steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; Gaussianity and environment difference are stated as explicit constraints rather than derived outputs. The derivation relies on external mathematical principles (invariance and ICA) without renaming known results or smuggling ansatzes via self-citation chains.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Noise terms are Gaussian
- domain assumption: Environments differ sufficiently in noise statistics
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: We demonstrate that if we have access to the distribution induced by a structural causal model, and additional data from (in the best case) only two environments that sufficiently differ in the noise statistics, the unique causal graph is identifiable... Our only constraint is the Gaussianity of the noise terms
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: Lemma 1... $\sum_{i \in I_1} D_x^2 \log p(x) - D_x^2 \log p^i(x) = J_{f^{-1}}(x)^\top \Omega_1 J_{f^{-1}}(x)$ ... $J_h(s)^\top \widehat{\Omega}_l J_h(s) = \Omega_l$ ... $J_h(s)^{-1} \widehat{\Omega}_1^{-1} \widehat{\Omega}_2 J_h(s) = \Omega_1^{-1} \Omega_2$
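The last identity in the quoted passage follows from the one before it by elimination. The short derivation below is a sketch under stated assumptions: the extracted symbol "bΩ" is read as an estimated matrix $\widehat{\Omega}$, and $J_h(s)$, $\Omega_1$ and $\widehat{\Omega}_1$ are taken to be invertible. Neither assumption is confirmed by the excerpt.

```latex
\begin{align*}
J_h(s)^\top \widehat{\Omega}_l \, J_h(s) &= \Omega_l, \qquad l \in \{1, 2\} \\
\Omega_1^{-1} &= J_h(s)^{-1} \, \widehat{\Omega}_1^{-1} \, J_h(s)^{-\top}
  && \text{(invert the case } l = 1\text{)} \\
\Omega_1^{-1} \Omega_2
  &= J_h(s)^{-1} \, \widehat{\Omega}_1^{-1} \, J_h(s)^{-\top} \, J_h(s)^\top \, \widehat{\Omega}_2 \, J_h(s)
  && \text{(multiply by the case } l = 2\text{)} \\
  &= J_h(s)^{-1} \, \widehat{\Omega}_1^{-1} \widehat{\Omega}_2 \, J_h(s)
\end{align*}
```

The result is a similarity relation, so $\widehat{\Omega}_1^{-1} \widehat{\Omega}_2$ and $\Omega_1^{-1} \Omega_2$ share the same eigenvalues. How the paper exploits this relation is not shown in the excerpt.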
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.