
arxiv: 2606.05942 · v1 · pith:GQIRCMT4new · submitted 2026-06-04 · 📊 stat.ML · cs.LG

EML-CD: Causal Mechanism Recovery via EML Symbolic Trees in Structure Learning

Pith reviewed 2026-06-27 23:44 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords: causal discovery · structure learning · symbolic regression · mechanism recovery · EML trees · DAG learning · interpretable causal models

The pith

EML-CD recovers causal graph structure together with closed-form equations for each mechanism by representing edges as gated symbolic trees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EML-CD as a way to perform causal structure learning while also recovering explicit mathematical equations for the mechanisms on each edge. It replaces black-box neural representations with gated binary trees built from the EML operator, which composes elementary functions into readable closed-form expressions. Because the equations are explicit, Jacobians can be computed analytically to quantify causal effects. On the Sachs protein-signaling dataset (d=11) the method reaches SHD 11.2 ± 0.4 over five seeds, on par with PC and GES, while attaching a closed-form equation to each detected edge (edge precision 0.756, recall 0.365). Controlled experiments show faithful recovery of 10 of 11 elementary function families and a lower held-out mechanism f-MSE than a fixed SINDy dictionary.

Core claim

EML-CD represents each candidate edge mechanism as a gated EML binary tree and jointly optimizes structure and tree parameters to output a DAG whose edges carry closed-form causal equations; these equations are obtained directly from the EML compositions and permit direct Jacobian evaluation without post-hoc extraction.
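
To make "direct Jacobian evaluation" concrete, the following is a minimal sketch rather than the paper's code: forward-mode automatic differentiation with dual numbers, applied to a closed-form expression. The function `mechanism` is a hypothetical example equation, not one recovered by EML-CD, and the names `Dual`, `exp`, `log`, and `derivative` are introduced here only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one scalar input."""
    val: float
    dot: float

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.dot - other.dot)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def exp(x):
    x = _lift(x)
    e = math.exp(x.val)
    return Dual(e, e * x.dot)


def log(x):
    x = _lift(x)
    return Dual(math.log(x.val), x.dot / x.val)


def mechanism(x):
    # HYPOTHETICAL closed-form edge equation: f(x) = exp(0.5 x) - log(1 + x^2).
    return exp(0.5 * x) - log(1.0 + x * x)


def derivative(f, x0):
    """Exact derivative of f at x0, obtained by seeding dot = 1."""
    return f(Dual(x0, 1.0)).dot
```

For this toy equation, `derivative(mechanism, 1.0)` equals 0.5·e^0.5 − 1, roughly −0.176. Because the expression is explicit, no numerical differencing or post-hoc extraction is involved.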

What carries the argument

The gated EML binary tree, which encodes each causal mechanism as a composition of elementary functions through repeated application of a single binary operator together with learned gates.
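
As a rough illustration of how one binary operator plus gates can yield readable equations, here is a small sketch under loudly stated assumptions. This page does not define the EML operator or the gating scheme, so the form `eml(a, b) = exp(a) - ln(b)`, the two-element leaf set {x, 1}, and the temperature-based softmax gate are all assumptions made for this example, not the paper's specification.

```python
import math


def eml(a, b):
    # ASSUMPTION: operator form exp(a) - ln(b); not defined on this page.
    # The second argument must be positive.
    return math.exp(a) - math.log(b)


def gate_weights(logits, temperature=1.0):
    """Softmax over gate logits; a small temperature approaches a hard gate."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gated_leaf(logits, x, temperature=1.0):
    """Soft choice between the input x and the constant 1 (hypothetical leaf set)."""
    w_x, w_one = gate_weights(logits, temperature)
    return w_x * x + w_one * 1.0


def depth1_tree(x, left_logits, right_logits, temperature=1.0):
    """A single operator node fed by two gated leaves."""
    left = gated_leaf(left_logits, x, temperature)
    right = gated_leaf(right_logits, x, temperature)
    return eml(left, right)


def log_via_eml(x):
    """Under the assumed operator, three nested applications give ln(x) for x > 0."""
    return eml(1.0, eml(eml(1.0, x), 1.0))


# Hard gates: the left leaf selects x and the right leaf selects 1,
# so the tree reads as the closed form exp(x).
hard_exp = depth1_tree(0.7, left_logits=[4.0, -4.0],
                       right_logits=[-4.0, 4.0], temperature=1e-3)
```

Once the gates are annealed to one-hot choices, the tree can be read off as a formula: in this toy case the depth-1 tree is exp(x), and the nested composition in `log_via_eml` reduces algebraically to ln(x).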

If this is right

  • Analytical Jacobians become available for every discovered edge, enabling direct quantification of causal effects.
  • On the Sachs data the method returns equations with edge precision 0.756 (recall 0.365) while keeping SHD on par with PC and GES within EML-CD's own seed variance.
  • In bivariate tests with known mechanisms the approach recovers ten of eleven elementary function families with held-out shape correlation at least 0.96; only high-frequency sine is recovered partially.
  • On symbolic synthetic data the mean held-out mechanism f-MSE is 3.67, against 7644 for a fixed SINDy dictionary (the latter inflated by catastrophic extrapolation on one seed).
  • A depth-2 model improves F1 over linear OLS-BIC on the Causal Chambers light-tunnel subset (0.444 vs. 0.273).
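
The structural metrics quoted in the bullets above can be sketched as follows. This is an illustrative implementation under one common convention (a reversed edge counts once toward SHD, and graphs have no self-loops); the page does not say which convention the paper uses, and the toy edge sets are hypothetical.

```python
def edge_metrics(true_edges, pred_edges):
    """Directed-edge precision, recall, and F1."""
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    tp = len(true_edges & pred_edges)
    precision = tp / len(pred_edges) if pred_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def shd(true_edges, pred_edges):
    """Structural Hamming distance: missing, extra, and reversed edges.

    ASSUMED convention: each node pair contributes at most 1, so a
    reversed edge counts once. Self-loops are not supported.
    """
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    pairs = {frozenset(e) for e in true_edges | pred_edges}
    distance = 0
    for pair in pairs:
        a, b = tuple(pair)
        both_directions = {(a, b), (b, a)}
        if both_directions & true_edges != both_directions & pred_edges:
            distance += 1
    return distance


# HYPOTHETICAL toy graphs, not taken from the paper.
true_dag = {("A", "B"), ("B", "C"), ("A", "C")}
pred_dag = {("A", "B"), ("C", "B"), ("A", "D")}

precision, recall, f1 = edge_metrics(true_dag, pred_dag)
distance = shd(true_dag, pred_dag)
```

In the toy example one edge is correct, one is reversed, one is missing, and one is spurious, giving precision and recall of 1/3 each and an SHD of 3.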

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Knowing the explicit equation on each edge could allow direct simulation of interventions without retraining a separate predictor.
  • The same gated-tree representation might be inserted into other continuous structure-learning objectives that currently use neural mechanisms.
  • If tree depth is limited, the method may remain tractable on problems larger than the d=11 Sachs example while still producing human-readable equations.

Load-bearing premise

Real causal mechanisms are sufficiently well approximated by compositions of elementary functions inside gated binary trees so that joint structure-and-mechanism optimization does not create systematic bias.

What would settle it

A benchmark dataset whose ground-truth mechanisms require functions or compositions outside the EML operator's elementary library would produce either high mechanism f-MSE or structure errors that exceed those of PC/GES.

Figures

Figures reproduced from arXiv: 2606.05942 by Sota Asanuma.

Figure 1: True DAG (left), EML-CD (center, representative seed, SHD=11), and CAM (right, SHD=12) on …
Figure 2: Sachs data: analytical Jacobian ∂P38/∂PKC for PKC → P38, computed by automatic differentiation of the gate-annealed (hard-gate) EML-CD mechanism, so the curve is the exact derivative of the displayed Example-1 equation (dense grid; no interpolation). It ranges from ≈ −5.4 to ≈ 2.3 and changes sign with input level, quantifying a state-dependent nonlinear causal effect; the jumps reflect the numerical bounds…
Figure 3: Controlled mechanism recovery: true mechanism (dashed) vs. the recovered gate-annealed EML …
Figure 4: S-Sym held-out mechanism f-MSE per seed (log scale). EML-CD (blue) stays low and stable across …
Original abstract

Neural network (NN)-based nonlinear causal discovery methods recover DAG structure but leave each causal mechanism as a black box. Waxman et al. argued that extracting causal mechanisms from NN weights is ill-posed. We propose EML-CD, a framework that integrates the EML operator (capable of composing elementary functions from a single binary operator) into causal structure learning, with interpretable mechanism recovery as the primary objective. EML-CD represents each edge mechanism as a gated EML binary tree and automatically discovers closed-form causal equations. Analytical Jacobians can be directly computed from the output equations, enabling quantitative understanding of causal effects. On real data (Sachs protein signaling, d=11), EML-CD achieves SHD=11.2 +/- 0.4 (5-seed mean; baselines are single deterministic runs), on par with PC/GES within seed variance and below CAM, while attaching closed-form equations to each detected edge (precision 0.756, recall 0.365). In a controlled bivariate test with known mechanisms, EML-CD recovers 10 of 11 elementary function families faithfully (held-out shape correlation >= 0.96; only high-frequency sine is partial). On a symbolic synthetic benchmark, EML-CD attains a substantially lower and more stable held-out mechanism f-MSE than a fixed SINDy dictionary (mean 3.67 vs. 7644, the latter inflated by catastrophic extrapolation on one seed), although its structure recovery (SHD 14.0) only matches the dictionary and stays below specialized optimizers; on the Causal Chambers light-tunnel subset, a depth-2 model improves F1 over linear OLS-BIC (0.444 vs. 0.273).
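
The mechanism-level metrics in the abstract can be illustrated with a short sketch. The definitions are assumptions, since this page does not give them: f-MSE is taken as the mean squared error between the true and recovered mechanism on held-out inputs, and shape correlation as their Pearson correlation on the same inputs. All functions and numbers below are hypothetical.

```python
import math
import statistics


def f_mse(f_true, f_hat, xs):
    """ASSUMED definition: mean squared error between mechanisms on inputs xs."""
    return sum((f_true(x) - f_hat(x)) ** 2 for x in xs) / len(xs)


def shape_correlation(f_true, f_hat, xs):
    """ASSUMED definition: Pearson correlation of the two mechanisms on xs."""
    a = [f_true(x) for x in xs]
    b = [f_hat(x) for x in xs]
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# HYPOTHETICAL held-out grid and recovered mechanism.
held_out = [i / 10 for i in range(-30, 31)]
f_true = math.sin
f_hat = lambda x: 0.9 * math.sin(x) + 0.1  # right shape, wrong scale and offset

error = f_mse(f_true, f_hat, held_out)
corr = shape_correlation(f_true, f_hat, held_out)

# HYPOTHETICAL per-seed errors with one catastrophic seed.
per_seed = [3.1, 2.8, 4.0, 3.5, 38000.0]
seed_mean = statistics.fmean(per_seed)
seed_median = statistics.median(per_seed)
```

Two points follow from the sketch. A correlation-based shape score is blind to rescaling and offsets, so it is best read alongside an error metric. And a single diverging seed can dominate a mean while leaving the median almost unchanged, which is the kind of inflation the abstract attributes to the SINDy figure.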

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes EML-CD, which integrates the EML operator into causal structure learning by representing each edge mechanism as a gated EML binary tree. This enables joint recovery of DAG structure and closed-form symbolic causal equations, with analytical Jacobians for effect quantification. Claims include SHD=11.2±0.4 on Sachs (on par with PC/GES), recovery of 10/11 elementary functions in bivariate tests (shape correlation ≥0.96), held-out mechanism f-MSE of 3.67 vs. 7644 for fixed SINDy on symbolic synthetics, and F1=0.444 vs. 0.273 for linear OLS-BIC on Causal Chambers light-tunnel data.

Significance. If the central claims hold, the work addresses a key limitation of NN-based causal discovery by delivering interpretable, closed-form mechanisms rather than black-box functions. This is significant for domains requiring mechanistic insight and quantitative causal effects, as it combines structure learning with symbolic regression in a single framework. The reported gains in mechanism fidelity on synthetics drawn from the EML library are a concrete strength.

major comments (3)
  1. [Experimental Results (Sachs)] Experimental section on Sachs protein signaling: SHD is reported as a 5-seed mean (11.2±0.4) for EML-CD while PC/GES and other baselines are single deterministic runs; this protocol mismatch undermines the direct 'on par' claim and requires either matched multi-seed baselines or explicit variance reporting for all methods.
  2. [Methods] Methods and experimental protocol sections: no full hyperparameter protocol, gating threshold schedule, or tree-construction details are provided, nor is the joint optimization procedure for structure and mechanism parameters fully specified; these omissions prevent independent verification of the reported f-MSE and SHD numbers.
  3. [Results (Symbolic Synthetic)] Symbolic synthetic benchmark results: while mechanism f-MSE is lower, structure recovery (SHD=14.0) only matches the dictionary baseline and trails specialized structure optimizers; the manuscript should quantify whether the mechanism-recovery advantage systematically trades off against structure accuracy.
minor comments (2)
  1. [Abstract] Abstract: the citation 'Waxman et al.' should include the full reference and year for clarity.
  2. [Bivariate Experiments] Bivariate test description: clarify how the 11 elementary function families were selected and whether the held-out shape correlation metric is defined in the main text or supplement.
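
The protocol mismatch raised in major comment 1 can be made concrete with a small sketch. The per-seed values and the baseline score below are hypothetical, chosen only so that the mean is 11.2; they are not taken from the paper.

```python
import statistics


def summarize(values):
    """Mean with both standard-deviation conventions."""
    return {
        "mean": statistics.mean(values),
        "population_std": statistics.pstdev(values),  # divides by n
        "sample_std": statistics.stdev(values),       # divides by n - 1
    }


def within_band(baseline, mean, std):
    """True if a single deterministic score lies within mean ± std."""
    return abs(baseline - mean) <= std


seed_shd = [11, 11, 11, 11, 12]  # HYPOTHETICAL per-seed SHD values
baseline_shd = 11                # HYPOTHETICAL deterministic baseline SHD

summary = summarize(seed_shd)
on_par = within_band(baseline_shd, summary["mean"], summary["population_std"])
```

With five seeds the two conventions differ noticeably (0.4 versus about 0.45 here), so a "within seed variance" claim should state which one is used. A deterministic baseline contributes no seed variance of its own, which is why the comparison is one-sided.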

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We respond to each major comment below and indicate the revisions we plan to make.

Point-by-point responses
  1. Referee: Experimental section on Sachs protein signaling: SHD is reported as a 5-seed mean (11.2±0.4) for EML-CD while PC/GES and other baselines are single deterministic runs; this protocol mismatch undermines the direct 'on par' claim and requires either matched multi-seed baselines or explicit variance reporting for all methods.

    Authors: We acknowledge this inconsistency in reporting. PC and GES are deterministic algorithms without inherent randomness, hence single runs. However, to strengthen the comparison, we will add multi-seed evaluations for any stochastic baselines and explicitly state the deterministic nature of PC/GES in the revised manuscript. The 'on par' claim will be qualified to reflect that EML-CD's performance is comparable within its variance. revision: yes

  2. Referee: Methods and experimental protocol sections: no full hyperparameter protocol, gating threshold schedule, or tree-construction details are provided, nor is the joint optimization procedure for structure and mechanism parameters fully specified; these omissions prevent independent verification of the reported f-MSE and SHD numbers.

    Authors: We agree that additional details are necessary for reproducibility. The revised manuscript will include a dedicated section or appendix with the full hyperparameter settings, the gating threshold schedule, the EML tree construction procedure, and a precise description of the joint optimization algorithm used for structure and mechanism parameters. revision: yes

  3. Referee: Symbolic synthetic benchmark results: while mechanism f-MSE is lower, structure recovery (SHD=14.0) only matches the dictionary baseline and trails specialized structure optimizers; the manuscript should quantify whether the mechanism-recovery advantage systematically trades off against structure accuracy.

    Authors: The paper already highlights that structure recovery is not the primary goal and matches the dictionary baseline. To quantify the potential trade-off, we will include additional discussion and possibly new metrics or comparisons in the revision that explicitly contrast the mechanism recovery benefits against any structure accuracy differences relative to specialized structure learning methods. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

Full rationale

The paper's central claims rest on empirical performance against external benchmarks (Sachs protein data, known bivariate function families, symbolic synthetic data, Causal Chambers) and explicit modeling assumptions about EML tree expressivity. No load-bearing step reduces by construction to a fitted parameter renamed as prediction, a self-citation chain, or a self-definitional equivalence. Structure recovery metrics and mechanism f-MSE are reported relative to independent baselines (PC, GES, SINDy) without the target quantities being defined solely in terms of the model's own fitted values. The derivation is therefore self-contained against the stated external validation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The framework rests on the domain assumption that real causal mechanisms admit compact representation via EML-composable elementary functions and that gating can separate structure decisions from parameter fitting without circular dependence on the recovered equations themselves.

free parameters (1)
  • gating thresholds and tree construction hyperparameters
    Parameters that control which edges are active and how deep the symbolic trees grow; these are optimized during learning and directly affect both structure and mechanism outputs.
axioms (1)
  • domain assumption: Causal mechanisms can be expressed as compositions of elementary functions generated from a single binary operator via binary trees.
    Invoked when the paper states that each edge mechanism is represented as a gated EML binary tree.
invented entities (1)
  • gated EML binary tree (no independent evidence)
    purpose: To encode and discover interpretable closed-form causal mechanisms jointly with DAG structure.
    New representational device introduced by the framework; no independent evidence outside the reported benchmarks is supplied.



Reference graph

Works this paper leans on

25 extracted references · 6 canonical work pages · 1 internal anchor

  1. [1] Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, and Juan Parras. KaCGM: Kolmogorov-Arnold causal generative models. arXiv preprint arXiv:2603.20184, 2026.

  2. [2] Kevin Bello, Bryon Aragam, and Pradeep Ravikumar. DAGMA: Learning DAGs via M-matrices and a log-determinant acyclicity characterization. In NeurIPS, 2022. arXiv:2209.08037.

  3. [3] Tiago Brogueira and Mário A. T. Figueiredo. Bivariate causal discovery using rate-distortion MDL: An information dimension approach. arXiv preprint arXiv:2604.05829, 2026.

  4. [4] Philippe Brouillard, Chandler Squires, Jonas Wahl, Konrad P. Körding, Karen Sachs, Alexandre Drouin, and Dhanya Sridhar. The landscape of causal discovery data: Grounding causal discovery in real-world applications. In Proceedings of the Fourth Conference on Causal Learning and Reasoning (CLeaR), volume 275 of Proceedings of Machine Learning Research, pages...

  5. [5] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. PNAS, 113(15):3932–3937, 2016. doi: 10.1073/pnas.1517384113.

  6. [6] Peter Bühlmann, Jonas Peters, and Jan Ernest. CAM: Causal additive models, high-dimensional order search and penalized regression. Annals of Statistics, 42(6):2526–2556, 2014. doi: 10.1214/14-AOS1260.

  7. [7] Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, and Shirley Ho. Discovering symbolic models from deep learning with inductive biases. In NeurIPS, 2020. arXiv:2006.11287.

  8. [8] Juan L. Gamella, Jonas Peters, and Peter Bühlmann. Causal chambers as a real-world physical testbed for AI methodology. Nature Machine Intelligence, 7(1):107–118, 2025. doi: 10.1038/s42256-024-00964-x.

  9. [9] Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In NeurIPS, 2008.

  10. [10] Sébastien Lachapelle, Philippe Brouillard, Tristan Deleu, and Simon Lacoste-Julien. Gradient-based neural DAG learning. In ICLR, 2020. arXiv:1906.02226.

  11. [11] Lars Lorch, Jonas Rothfuss, Bernhard Schölkopf, and Andreas Krause. DiBS: Differentiable Bayesian structure learning. In NeurIPS, 2021.

  12. [12] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In NeurIPS, 2017.

  13. [13] Ignavier Ng, AmirEmad Ghassami, and Kun Zhang. On the role of sparsity and DAG constraints for learning linear DAGs. In NeurIPS, 2020.

  14. [14] Andrzej Odrzywołek. All elementary functions from a single binary operator. arXiv preprint arXiv:2603.21852, 2026. doi: 10.48550/arXiv.2603.21852.

  15. [15] Jonas Peters, Joris M. Mooij, Dominik Janzing, and Bernhard Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15:2009–2053, 2014.

  16. [16] Joseph Ramsey, Bryan Andrews, and Peter Spirtes. Scalable causal discovery from recursive nonlinear data via truncated basis function scores and tests. arXiv preprint arXiv:2510.04276, 2025.

  17. [17] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why Should I Trust You?": Explaining the predictions of any classifier. In KDD, 2016.

  18. [18] Karen Sachs, Omar Perez, Dana Pe'er, Douglas A. Lauffenburger, and Garry P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523–529, 2005. doi: 10.1126/science.1105809.

  19. [19] Gideon Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461–464, 1978. doi: 10.1214/aos/1176344136.

  20. [20] Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, and Antti Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.

  21. [21] Shohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Aapo Hyvärinen, Yoshinobu Kawahara, Takashi Washio, Patrik O. Hoyer, and Kenneth Bollen. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12:1225–1248, 2011.

  22. [22] Daniel Waxman, Kurt Butler, and Petar M. Djurić. DAGMA-DCE: Interpretable, non-parametric differentiable causal discovery. IEEE Open Journal of Signal Processing, 5:393–401, 2024. doi: 10.1109/OJSP.2024.3351593. arXiv:2401.02930.

  23. [23] Xun Zheng, Bryon Aragam, Pradeep Ravikumar, and Eric P. Xing. DAGs with NO TEARS: Continuous Optimization for Structure Learning. In NeurIPS, 2018.

  24. [24] Xun Zheng, Chen Dan, Bryon Aragam, Pradeep Ravikumar, and Eric P. Xing. Learning sparse nonparametric DAGs. In AISTATS, 2020. arXiv:1909.13189.

  25. [25] Jincheng Zhou, Mengbo Wang, Anqi He, Yumeng Zhou, Hessam Olya, Murat Kocaoglu, and Bruno Ribeiro. Differentiable constraint-based causal discovery. In NeurIPS, 2025. arXiv:2510.22031.