pith.

arxiv: 2510.13583 · v4 · pith:7F3SAYK5 (new) · submitted 2025-10-15 · 📊 stat.ML · cs.LG

On the Identifiability of Causal Graphs with the Invariance Principle

Pith reviewed 2026-05-18 06:16 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords: causal discovery · identifiability · structural causal models · invariance principle · nonlinear mechanisms · Gaussian noise · multiple environments · ICA duality

The pith

The model-induced distribution plus data from just two environments with sufficiently different noise statistics uniquely identifies the causal graph under Gaussian noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Causal discovery from i.i.d. observational data alone is known to be ill-posed in general. The paper shows that it becomes solvable when the distribution induced by a structural causal model is combined with data from just two environments that differ sufficiently in their noise statistics, even for arbitrary nonlinear causal mechanisms, provided the noise is Gaussian. The approach builds on the duality between independent component analysis and causal discovery and shows that far less auxiliary information than previously needed suffices to recover the full graph. If correct, it would allow unique recovery of causal structures with minimal additional data collection across environments.
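To see concretely why a single observational distribution is not enough, here is a toy sketch that is ours, not the paper's: a two-variable linear SCM X → Y with zero-mean Gaussian noise (the paper itself allows nonlinear mechanisms). The hypothetical helper `fit_reverse` builds the anti-causal model Y → X, and its implied covariance matches the causal model's exactly, so the two graphs describe the same Gaussian distribution and cannot be told apart from it.

```python
import math


def forward_cov(b, var_nx, var_ny):
    """(Var X, Cov(X, Y), Var Y) for the causal SCM X = N_X, Y = b*X + N_Y."""
    return var_nx, b * var_nx, b * b * var_nx + var_ny


def fit_reverse(var_x, cov_xy, var_y):
    """Fit the anti-causal SCM Y = N'_Y, X = c*Y + N'_X to a given covariance."""
    c = cov_xy / var_y                  # population regression slope of X on Y
    var_nx_rev = var_x - c * c * var_y  # residual variance of X given Y (positive)
    return c, var_y, var_nx_rev


def reverse_cov(c, var_ny_rev, var_nx_rev):
    """(Var X, Cov(X, Y), Var Y) implied by the anti-causal SCM."""
    return c * c * var_ny_rev + var_nx_rev, c * var_ny_rev, var_ny_rev


def same_cov(cov_a, cov_b, tol=1e-12):
    """True when two covariance triples agree up to floating-point error."""
    return all(math.isclose(a, b, abs_tol=tol) for a, b in zip(cov_a, cov_b))


if __name__ == "__main__":
    true_cov = forward_cov(b=2.0, var_nx=1.0, var_ny=1.0)
    c, var_ny_rev, var_nx_rev = fit_reverse(*true_cov)
    # Both graphs reproduce the same zero-mean Gaussian, so one environment cannot orient the edge.
    print(same_cov(true_cov, reverse_cov(c, var_ny_rev, var_nx_rev)))  # True
    print(c, var_nx_rev)
```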

Core claim

If we have access to the distribution induced by a structural causal model and additional data from only two environments that sufficiently differ in the noise statistics, the unique causal graph is identifiable. This is the first result guaranteeing entire causal graph recovery with a constant number of environments and arbitrary nonlinear mechanisms, under the constraint of Gaussian noise terms.

What carries the argument

The invariance principle applied across environments with differing noise statistics, which exploits the duality between nonlinear ICA and causal discovery to enforce graph identifiability.
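A toy sketch of the invariance idea, under assumptions that are ours rather than the paper's (a linear mechanism and population covariances instead of samples): keep the mechanism Y = b·X + N_Y fixed, change the noise variances between two environments, and compare the regression slope in each direction. Only the causal direction's slope stays the same. The function names `env_cov` and `invariant_directions` are hypothetical.

```python
def env_cov(b, var_nx, var_ny):
    """(Var X, Cov(X, Y), Var Y) for X = N_X, Y = b*X + N_Y in one environment."""
    return var_nx, b * var_nx, b * b * var_nx + var_ny


def invariant_directions(covs, tol=1e-9):
    """Edge directions whose population regression slope is identical in every environment."""
    y_on_x = [cov_xy / var_x for var_x, cov_xy, _ in covs]  # mechanism slope if X -> Y
    x_on_y = [cov_xy / var_y for _, cov_xy, var_y in covs]  # mechanism slope if Y -> X
    found = []
    if max(y_on_x) - min(y_on_x) < tol:
        found.append("X->Y")
    if max(x_on_y) - min(x_on_y) < tol:
        found.append("Y->X")
    return found


if __name__ == "__main__":
    b = 2.0  # the mechanism is shared by both environments
    env1 = env_cov(b, var_nx=1.0, var_ny=1.0)
    env2 = env_cov(b, var_nx=3.0, var_ny=0.5)  # only the noise statistics change
    print(invariant_directions([env1]))        # ['X->Y', 'Y->X']: one environment cannot decide
    print(invariant_directions([env1, env2]))  # ['X->Y']
```

The design choice is deliberately minimal: in the causal direction the noise is independent of the cause, so the slope equals b whatever the noise variances are, whereas the anti-causal slope mixes b with the noise variances and moves when they change.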

If this is right

  • The full causal graph is uniquely recoverable from the model distribution plus data from two environments.
  • Arbitrary nonlinear mechanisms between variables are permitted.
  • Only a constant number of environments (two, in the best case) is required, whereas recent nonlinear ICA results need at least as many environments as there are sources.
  • The Gaussian noise constraint can potentially be relaxed through further analysis.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the paper makes directly.

  • Real-world causal discovery in fields like biology could become feasible with limited intervention or observational shifts.
  • The same invariance idea might extend to settings where environments differ in other ways besides noise variance.
  • This reduces the data burden compared to methods requiring one environment per source variable.

Load-bearing premise

The noise terms must follow Gaussian distributions.

What would settle it

A concrete counterexample consisting of a nonlinear structural causal model with Gaussian noises where two environments with different noise statistics produce data consistent with more than one causal graph.

Original abstract

Causal discovery from i.i.d. observational data is known to be generally ill-posed. We demonstrate that if we have access to the distribution induced by a structural causal model, and additional data from (in the best case) only two environments that sufficiently differ in the noise statistics, the unique causal graph is identifiable. Notably, this is the first result in the literature that guarantees the entire causal graph recovery with a constant number of environments and arbitrary nonlinear mechanisms. Our only constraint is the Gaussianity of the noise terms; however, we propose potential ways to relax this requirement. Of interest on its own, we expand on the well-known duality between independent component analysis (ICA) and causal discovery; recent advancements have shown that nonlinear ICA can be solved from multiple environments, at least as many as the number of sources: we show that the same can be achieved for causal discovery while having access to much less auxiliary information.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this section is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that the causal graph of a structural causal model is uniquely identifiable given the observational distribution induced by the model together with data from (in the best case) only two environments whose noise statistics differ sufficiently. The result holds for arbitrary nonlinear mechanisms under the assumption of Gaussian noise; the argument proceeds by expanding the known duality between nonlinear ICA and causal discovery, showing that the causal structure supplies enough additional constraints to achieve identifiability with far fewer environments than the number of sources required by standard nonlinear ICA.

Significance. If the central identifiability theorem is correct, the result would be a meaningful advance: it is the first to guarantee recovery of the entire causal graph with a constant number of environments (two, in the best case) for arbitrary nonlinear mechanisms. The explicit treatment of the ICA–causal-discovery duality is of independent technical interest and could inform future work on minimal auxiliary-data requirements.

major comments (2)
  1. [Main identifiability theorem] (likely §3 or §4): the invariance argument shows that conditional distributions are preserved across environments, but it is not shown explicitly that this invariance, together with the two observed joint distributions, rules out every alternative DAG that could generate the same pair of distributions under nonlinear mechanisms. A separate uniqueness lemma that enumerates and excludes all residual Markov-equivalence classes or ICA-style indeterminacies would be needed to support the claim that two environments suffice.
  2. [Assumption on noise] (Gaussianity and sufficient difference): the reduction from the usual requirement of as many environments as sources in nonlinear ICA to just two environments rests on the Gaussian noise model supplying exactly the right additional constraints. The manuscript should verify, perhaps via a low-dimensional counterexample or explicit construction, that no other graph remains consistent once the noise variances or higher moments differ between the two environments.
minor comments (2)
  1. [Abstract] Abstract contains a stray LaTeX brace: “distribution {induced} by”.
  2. [Introduction / Assumptions] The precise quantitative condition on how much the noise statistics must differ (“sufficiently differ”) should be stated as a formal assumption rather than left informal.
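To make the minor comment on "sufficiently differ" concrete, here is a toy calculation under our own assumptions (a linear bivariate SCM X → Y with Gaussian noise and population quantities), not the paper's formal condition. In this toy, the slope of X regressed on Y depends on the noise variances only through their ratio, so scaling every noise variance by the same factor leaves both directions looking invariant, while changing the ratio orients the edge. The names `reverse_coefficient` and `edge_oriented` are hypothetical.

```python
def reverse_coefficient(b, var_nx, var_ny):
    """Slope of X regressed on Y when the true SCM is X = N_X, Y = b*X + N_Y.

    Algebraically equal to b / (b**2 + var_ny / var_nx), so it depends on the
    noise variances only through their ratio. The forward slope is b in every environment.
    """
    return b * var_nx / (b * b * var_nx + var_ny)


def edge_oriented(b, env1, env2, tol=1e-9):
    """True if the anti-causal slope differs between two environments given as (var_nx, var_ny),
    so that only X -> Y keeps an invariant mechanism."""
    return abs(reverse_coefficient(b, *env1) - reverse_coefficient(b, *env2)) > tol


if __name__ == "__main__":
    base = (1.0, 1.0)
    print(edge_oriented(2.0, base, (4.0, 4.0)))  # False: every noise variance scaled by 4
    print(edge_oriented(2.0, base, (3.0, 0.5)))  # True: the variance ratio changed
```

With b = 0 there is no edge and the function returns False, which is the intended behavior since there is nothing to orient.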

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and for highlighting the potential impact of our work on causal discovery. We provide point-by-point responses to the major comments below, and we plan to incorporate revisions to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Main identifiability theorem] the invariance argument shows that conditional distributions are preserved across environments, yet it is not shown explicitly that this invariance, together with the two observed joint distributions, rules out every alternative DAG that could generate the same pair of distributions under nonlinear mechanisms. A separate uniqueness lemma that enumerates and excludes all residual Markov-equivalence classes or ICA-style indeterminacies would be required to support the claim that two environments suffice.

    Authors: We appreciate this suggestion for improving the clarity of our proof. Our main identifiability result (Theorem 4.1) establishes uniqueness by showing that the invariance principle, when combined with the two distinct joint distributions from the environments, pins down the exact parent sets for each variable. Specifically, for any candidate DAG that differs from the true one, there exists at least one node whose conditional distribution would not be invariant across environments unless the correct parents are used. To address the referee's concern explicitly, we will introduce a new lemma (Lemma 4.2) that formally excludes alternative structures, including those arising from ICA indeterminacies, by leveraging the Gaussian noise assumption and the sufficient difference in noise statistics between the two environments. revision: yes

  2. Referee: [Assumption on noise] the reduction from the usual “number of sources” environments in nonlinear ICA to a fixed two environments rests on the Gaussian noise model supplying exactly the right additional constraints. The manuscript should verify, perhaps via a low-dimensional counter-example or explicit construction, that no other graph remains consistent once the noise variances or higher moments differ between the two environments.

    Authors: We agree that providing a concrete verification would enhance the reader's understanding. While the proof in Section 3 relies on the properties of Gaussian distributions to ensure that differing variances break the symmetries present in standard nonlinear ICA, we will add an illustrative example in a new subsection (e.g., for a simple chain graph with three variables) demonstrating that only the true causal graph is consistent with the observed distributions and invariances when the noise parameters differ between the two environments. This example will also clarify how the Gaussianity allows us to achieve identifiability with just two environments rather than one per source. revision: yes
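The rebuttal promises a three-variable chain example. A toy version of such a check is sketched below under assumptions that are ours, not the paper's: linear mechanisms (the paper allows nonlinear ones), known population covariances, and candidate graphs limited to the six causal orderings of X1, X2, X3, each scored by regressing every variable on its predecessors. An ordering is kept only if all of those coefficients agree across the two environments. The names `chain_cov`, `solve`, `regression_coefs`, and `invariant_orderings` are hypothetical.

```python
from itertools import permutations


def chain_cov(b12, b23, noise_vars):
    """Covariance of the chain X1 = N1, X2 = b12*X1 + N2, X3 = b23*X2 + N3."""
    v1, v2, v3 = noise_vars
    c12 = b12 * v1
    c22 = b12 * b12 * v1 + v2
    c13, c23 = b23 * c12, b23 * c22
    c33 = b23 * b23 * c22 + v3
    return [[v1, c12, c13], [c12, c22, c23], [c13, c23, c33]]


def solve(a, rhs):
    """Solve a small system a @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def regression_coefs(cov, target, preds):
    """Population coefficients of variable `target` regressed on the variables in `preds`."""
    a = [[cov[i][j] for j in preds] for i in preds]
    return solve(a, [cov[i][target] for i in preds])


def invariant_orderings(cov1, cov2, tol=1e-9):
    """Orderings whose regress-on-predecessors coefficients agree in both environments."""
    kept = []
    for order in permutations(range(len(cov1))):
        consistent = True
        for pos, target in enumerate(order):
            preds = list(order[:pos])
            coefs1 = regression_coefs(cov1, target, preds)
            coefs2 = regression_coefs(cov2, target, preds)
            if any(abs(u - v) > tol for u, v in zip(coefs1, coefs2)):
                consistent = False
                break
        if consistent:
            kept.append(order)
    return kept


if __name__ == "__main__":
    env1 = chain_cov(1.5, -0.8, (1.0, 1.0, 1.0))
    env2 = chain_cov(1.5, -0.8, (2.0, 0.5, 3.0))  # same mechanisms, new noise variances
    print(invariant_orderings(env1, env2))  # [(0, 1, 2)]: only X1, X2, X3 survives
    # Along that ordering, X3's coefficient on X1 is close to 0, exposing the missing edge X1 -> X3.
    print(regression_coefs(env1, 2, [0, 1]))
```

Passing the same covariance twice keeps all six orderings, which is the single-environment ambiguity in miniature.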

Circularity Check

0 steps flagged

No circularity; identifiability theorem is self-contained

Full rationale

The paper advances a theoretical identifiability result for causal graphs from the SCM-induced distribution plus two environments differing in Gaussian noise statistics. It builds on the ICA-causal discovery duality but presents the constant-environment claim as a new reduction. No quoted steps reduce by construction to fitted parameters, self-definitions, or load-bearing self-citations; Gaussianity and environment difference are stated as explicit constraints rather than derived outputs. The derivation relies on external mathematical principles (invariance and ICA) without renaming known results or smuggling ansatzes via self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption of Gaussian noise and the modeling choice that two environments differ sufficiently in noise statistics; no free parameters or new entities are introduced in the abstract.

axioms (2)
  • Domain assumption: noise terms are Gaussian.
    Explicitly stated as the only constraint on the result.
  • Domain assumption: environments differ sufficiently in noise statistics.
    Required for the two-environment identifiability guarantee.

pith-pipeline@v0.9.0 · 5686 in / 1093 out tokens · 26624 ms · 2026-05-18T06:16:04.106112+00:00


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    We demonstrate that if we have access to the distribution induced by a structural causal model, and additional data from (in the best case) only two environments that sufficiently differ in the noise statistics, the unique causal graph is identifiable... Our only constraint is the Gaussianity of the noise terms

  • IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

    Relation between the paper passage and the cited Recognition theorem.

    Lemma 1... $\sum_{i \in I_1} \left( D_x^2 \log p(x) - D_x^2 \log p^i(x) \right) = J_{f^{-1}}(x)^\top \Omega_1 J_{f^{-1}}(x)$ ... $J_h(s)^\top \hat{\Omega}_l J_h(s) = \Omega_l$ ... $J_h(s)^{-1} \hat{\Omega}_1^{-1} \hat{\Omega}_2 J_h(s) = \Omega_1^{-1} \Omega_2$
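The final identity in the passage above follows from the middle one, written for $l = 1$ and $l = 2$, assuming $J_h(s)$ and $\hat{\Omega}_1$ are invertible (our assumption about the elided context). Abbreviating $J = J_h(s)$:

```latex
\Omega_1^{-1}\Omega_2
  = \left(J^\top \hat{\Omega}_1 J\right)^{-1} \left(J^\top \hat{\Omega}_2 J\right)
  = J^{-1} \hat{\Omega}_1^{-1} J^{-\top} J^\top \hat{\Omega}_2 J
  = J^{-1} \hat{\Omega}_1^{-1} \hat{\Omega}_2 J
```

Hence $\Omega_1^{-1}\Omega_2$ and $\hat{\Omega}_1^{-1}\hat{\Omega}_2$ are similar matrices with the same eigenvalues, which is one way to see how two environments with different noise statistics constrain $J_h(s)$; the paper's full argument is not reproduced here.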

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.