EML-CD: Causal Mechanism Recovery via EML Symbolic Trees in Structure Learning
Pith reviewed 2026-06-27 23:44 UTC · model grok-4.3
The pith
EML-CD recovers causal graph structure together with closed-form equations for each mechanism by representing edges as gated symbolic trees.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
EML-CD represents each candidate edge mechanism as a gated EML binary tree and jointly optimizes structure and tree parameters to output a DAG whose edges carry closed-form causal equations; these equations are obtained directly from the EML compositions and permit direct Jacobian evaluation without post-hoc extraction.
What carries the argument
The gated EML binary tree, which encodes each causal mechanism as a composition of elementary functions through repeated application of a single binary operator together with learned gates.
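A minimal sketch of the idea, not the paper's implementation. It assumes the binary operator is eml(x, y) = exp(x) - ln(y) with the constant 1 available as a leaf. The summary above does not state the operator's definition, so that formula, the `Node` class, and the scalar `gate` are all hypothetical. Under that assumption, exp and ln both arise as small trees, and a gate of zero switches an edge off.

```python
import math
from dataclasses import dataclass
from typing import Optional


def eml(x: float, y: float) -> float:
    # ASSUMPTION: the operator is exp(x) - ln(y); it requires y > 0.
    return math.exp(x) - math.log(y)


@dataclass
class Node:
    """Binary tree node: a leaf holds "x" or a constant; an inner node applies eml."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf: Optional[object] = None  # "x" or a float constant

    def evaluate(self, x: float) -> float:
        if self.left is None:  # leaf node
            return x if self.leaf == "x" else float(self.leaf)
        return eml(self.left.evaluate(x), self.right.evaluate(x))


X = Node(leaf="x")
ONE = Node(leaf=1.0)

# exp(x) = eml(x, 1), because ln(1) = 0.
EXP_TREE = Node(X, ONE)

# ln(x) = eml(1, eml(eml(1, x), 1)):
#   eml(1, x) = e - ln(x); feeding that into eml(., 1) gives e^e / x;
#   finally e - ln(e^e / x) = ln(x). Valid for x > 0.
LN_TREE = Node(ONE, Node(Node(ONE, X), ONE))


def gated_mechanism(tree: Node, gate: float, x: float) -> float:
    # Hypothetical gate in [0, 1]: 0 removes the edge, 1 keeps the full mechanism.
    return gate * tree.evaluate(x)
```

Because every inner node is the same operator, the tree shape alone determines the closed-form expression, which is what makes the recovered mechanism readable.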
If this is right
- Analytical Jacobians become available for every discovered edge, enabling direct quantification of causal effects.
- On the Sachs data the method attaches equations to detected edges with edge precision 0.756 (recall 0.365), while its SHD (11.2 ± 0.4 over five seeds) is on par with PC and GES within EML-CD's own seed variance.
- In bivariate tests with known mechanisms the approach recovers ten of eleven elementary function families with held-out shape correlation at least 0.96.
- On symbolic synthetic data the held-out mechanism f-MSE is far lower than that of a fixed SINDy dictionary (mean 3.67 vs. 7644, the latter inflated by catastrophic extrapolation on one seed).
- A depth-2 model improves F1 over linear OLS-BIC on the Causal Chambers light-tunnel subset (0.444 vs. 0.273).
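The first bullet can be illustrated with a short sketch. The equation f(x1, x2) = exp(x1) - ln(x2) is a hypothetical closed-form mechanism, not one reported in the paper. Its partial derivatives are written by hand from the equation and then checked against central finite differences.

```python
import math


def mechanism(x1: float, x2: float) -> float:
    # HYPOTHETICAL closed-form mechanism for one child with two parents.
    return math.exp(x1) - math.log(x2)


def analytic_jacobian(x1: float, x2: float) -> list:
    # Partial derivatives read directly off the equation.
    return [math.exp(x1), -1.0 / x2]


def numeric_jacobian(x1: float, x2: float, h: float = 1e-6) -> list:
    # Central finite differences, used only as a sanity check.
    d1 = (mechanism(x1 + h, x2) - mechanism(x1 - h, x2)) / (2 * h)
    d2 = (mechanism(x1, x2 + h) - mechanism(x1, x2 - h)) / (2 * h)
    return [d1, d2]


point = (0.3, 2.0)
exact = analytic_jacobian(*point)
approx = numeric_jacobian(*point)
max_gap = max(abs(a - b) for a, b in zip(exact, approx))
```

With a black-box neural mechanism only the numerical route (or automatic differentiation) is available; a symbolic equation gives the derivative as another readable expression.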
Where Pith is reading between the lines
- Knowing the explicit equation on each edge could allow direct simulation of interventions without retraining a separate predictor.
- The same gated-tree representation might be inserted into other continuous structure-learning objectives that currently use neural mechanisms.
- If tree depth is limited, the method may remain tractable on problems larger than the d=11 Sachs example while still producing human-readable equations.
Load-bearing premise
Real causal mechanisms are sufficiently well approximated by compositions of elementary functions inside gated binary trees so that joint structure-and-mechanism optimization does not create systematic bias.
What would settle it
A benchmark dataset whose ground-truth mechanisms require functions or compositions outside the EML operator's elementary library would produce either high mechanism f-MSE or structure errors that exceed those of PC/GES.
Original abstract
Neural network (NN)-based nonlinear causal discovery methods recover DAG structure but leave each causal mechanism as a black box. Waxman et al. argued that extracting causal mechanisms from NN weights is ill-posed. We propose EML-CD, a framework that integrates the EML operator (capable of composing elementary functions from a single binary operator) into causal structure learning, with interpretable mechanism recovery as the primary objective. EML-CD represents each edge mechanism as a gated EML binary tree and automatically discovers closed-form causal equations. Analytical Jacobians can be directly computed from the output equations, enabling quantitative understanding of causal effects. On real data (Sachs protein signaling, d=11), EML-CD achieves SHD=11.2 +/- 0.4 (5-seed mean; baselines are single deterministic runs), on par with PC/GES within seed variance and below CAM, while attaching closed-form equations to each detected edge (precision 0.756, recall 0.365). In a controlled bivariate test with known mechanisms, EML-CD recovers 10 of 11 elementary function families faithfully (held-out shape correlation >= 0.96; only high-frequency sine is partial). On a symbolic synthetic benchmark, EML-CD attains a substantially lower and more stable held-out mechanism f-MSE than a fixed SINDy dictionary (mean 3.67 vs. 7644, the latter inflated by catastrophic extrapolation on one seed), although its structure recovery (SHD 14.0) only matches the dictionary and stays below specialized optimizers; on the Causal Chambers light-tunnel subset, a depth-2 model improves F1 over linear OLS-BIC (0.444 vs. 0.273).
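The structure metrics quoted in the abstract can be computed from adjacency matrices along the following lines. This is a sketch on a toy 3-node graph. The SHD convention used here (a reversed edge counts once) is one common choice and is an assumption, since the abstract does not say which convention the paper uses.

```python
def edge_set(adj):
    # adj[i][j] == 1 means a directed edge i -> j.
    n = len(adj)
    return {(i, j) for i in range(n) for j in range(n) if adj[i][j]}


def structure_metrics(true_adj, pred_adj):
    true_e, pred_e = edge_set(true_adj), edge_set(pred_adj)
    tp = len(true_e & pred_e)
    precision = tp / len(pred_e) if pred_e else 0.0
    recall = tp / len(true_e) if true_e else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0

    # SHD: count node pairs whose edge status differs
    # (missing, extra, or reversed; a reversal counts once).
    n = len(true_adj)
    shd = 0
    for i in range(n):
        for j in range(i + 1, n):
            t = (true_adj[i][j], true_adj[j][i])
            p = (pred_adj[i][j], pred_adj[j][i])
            if t != p:
                shd += 1
    return {"precision": precision, "recall": recall, "f1": f1, "shd": shd}


# Toy example: truth is 0 -> 1 -> 2; the prediction keeps 0 -> 1 but reverses 1 -> 2.
true_adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
pred_adj = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]
metrics = structure_metrics(true_adj, pred_adj)
```

Precision and recall are computed over directed edges, so the reversed edge counts as both a false positive and a false negative while adding only one to SHD.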
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes EML-CD, which integrates the EML operator into causal structure learning by representing each edge mechanism as a gated EML binary tree. This enables joint recovery of DAG structure and closed-form symbolic causal equations, with analytical Jacobians for effect quantification. Claims include SHD=11.2±0.4 on Sachs (on par with PC/GES), recovery of 10/11 elementary functions in bivariate tests (shape correlation ≥0.96), held-out mechanism f-MSE of 3.67 vs. 7644 for fixed SINDy on symbolic synthetics, and F1=0.444 vs. 0.273 for linear OLS-BIC on Causal Chambers light-tunnel data.
Significance. If the central claims hold, the work addresses a key limitation of NN-based causal discovery by delivering interpretable, closed-form mechanisms rather than black-box functions. This is significant for domains requiring mechanistic insight and quantitative causal effects, as it combines structure learning with symbolic regression in a single framework. The reported gains in mechanism fidelity on synthetics drawn from the EML library are a concrete strength.
major comments (3)
- [Experimental Results (Sachs)] Experimental section on Sachs protein signaling: SHD is reported as a 5-seed mean (11.2±0.4) for EML-CD while PC/GES and other baselines are single deterministic runs; this protocol mismatch undermines the direct 'on par' claim and requires either matched multi-seed baselines or explicit variance reporting for all methods.
- [Methods] Methods and experimental protocol sections: no full hyperparameter protocol, gating threshold schedule, or tree-construction details are provided, nor is the joint optimization procedure for structure and mechanism parameters fully specified; these omissions prevent independent verification of the reported f-MSE and SHD numbers.
- [Results (Symbolic Synthetic)] Symbolic synthetic benchmark results: while mechanism f-MSE is lower, structure recovery (SHD=14.0) only matches the dictionary baseline and is worse than that of specialized structure optimizers; the manuscript should quantify whether the mechanism-recovery advantage systematically trades off against structure accuracy.
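The protocol mismatch raised in the first major comment can be made concrete with a small sketch. All numbers below are hypothetical and unrelated to the paper's per-seed results; the choice of population standard deviation is also an assumption, since the summary does not say which estimator the paper uses.

```python
import statistics


def summarize(shd_per_seed):
    # Mean and population standard deviation across seeds.
    return statistics.mean(shd_per_seed), statistics.pstdev(shd_per_seed)


# HYPOTHETICAL per-seed SHDs for a stochastic method (illustrative only).
stochastic_runs = [10, 12, 11, 13, 9]
# A deterministic algorithm returns the same graph on every run, so repeating
# it over seeds yields zero spread; the value 12 is a placeholder.
deterministic_runs = [12] * 5

stoch_mean, stoch_std = summarize(stochastic_runs)
det_mean, det_std = summarize(deterministic_runs)

# "Within seed variance" here means the baseline lies inside mean +/- std
# of the stochastic method; the baseline itself contributes no variance.
within_variance = abs(det_mean - stoch_mean) <= stoch_std
```

The comparison is one-sided: only the stochastic method's spread enters the test, which is why reporting how each baseline's number was obtained matters.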
minor comments (2)
- [Abstract] Abstract: the citation 'Waxman et al.' should include the full reference and year for clarity.
- [Bivariate Experiments] Bivariate test description: clarify how the 11 elementary function families were selected and whether the held-out shape correlation metric is defined in the main text or supplement.
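One plausible reading of the shape correlation mentioned in the last comment is the Pearson correlation between the recovered and the true mechanism evaluated on held-out inputs, which ignores shifts and positive rescaling. That definition is an assumption here, not something the summary confirms; the grid and functions below are hypothetical.

```python
import math


def pearson(u, v):
    # Plain Pearson correlation; invariant to shifting and positive rescaling.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# HYPOTHETICAL held-out grid and ground-truth mechanism.
held_out_x = [0.1 * k for k in range(1, 51)]
true_f = [math.log(x) for x in held_out_x]
# A recovered mechanism with the right shape but wrong scale and offset.
rescaled_f = [3.0 * math.log(x) + 2.0 for x in held_out_x]
# A recovered mechanism with the wrong shape (linear instead of logarithmic).
linear_f = list(held_out_x)

corr_rescaled = pearson(true_f, rescaled_f)
corr_linear = pearson(true_f, linear_f)
```

Under this reading a high score certifies the functional form but not the fitted constants, so it complements rather than replaces an error metric such as f-MSE.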
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments. We respond to each major comment below and indicate the revisions we plan to make.
- Referee: Experimental section on Sachs protein signaling: SHD is reported as a 5-seed mean (11.2±0.4) for EML-CD while PC/GES and other baselines are single deterministic runs; this protocol mismatch undermines the direct 'on par' claim and requires either matched multi-seed baselines or explicit variance reporting for all methods.
  Authors: We acknowledge this inconsistency in reporting. PC and GES are deterministic algorithms, which is why each was run once. To strengthen the comparison, we will add multi-seed evaluations for any stochastic baselines and explicitly state the deterministic nature of PC/GES in the revised manuscript. The 'on par' claim will be qualified to state that EML-CD's performance is comparable within its own seed variance.
  Revision: yes
- Referee: Methods and experimental protocol sections: no full hyperparameter protocol, gating threshold schedule, or tree-construction details are provided, nor is the joint optimization procedure for structure and mechanism parameters fully specified; these omissions prevent independent verification of the reported f-MSE and SHD numbers.
  Authors: We agree that additional details are necessary for reproducibility. The revised manuscript will include a dedicated section or appendix with the full hyperparameter settings, the gating threshold schedule, the EML tree construction procedure, and a precise description of the joint optimization algorithm used for structure and mechanism parameters.
  Revision: yes
- Referee: Symbolic synthetic benchmark results: while mechanism f-MSE is lower, structure recovery (SHD=14.0) only matches the dictionary baseline and is worse than that of specialized structure optimizers; the manuscript should quantify whether the mechanism-recovery advantage systematically trades off against structure accuracy.
  Authors: The paper already highlights that structure recovery is not the primary goal and that it matches the dictionary baseline. To quantify the potential trade-off, we will include additional discussion and possibly new metrics or comparisons in the revision that explicitly contrast the mechanism recovery benefits against any structure accuracy differences relative to specialized structure learning methods.
  Revision: partial
Circularity Check
No significant circularity in derivation chain
The paper's central claims rest on empirical performance against external benchmarks (Sachs protein data, known bivariate function families, symbolic synthetic data, Causal Chambers) and explicit modeling assumptions about EML tree expressivity. No load-bearing step reduces by construction to a fitted parameter renamed as prediction, a self-citation chain, or a self-definitional equivalence. Structure recovery metrics and mechanism f-MSE are reported relative to independent baselines (PC, GES, SINDy) without the target quantities being defined solely in terms of the model's own fitted values. The derivation is therefore self-contained against the stated external validation.
Axiom & Free-Parameter Ledger
free parameters (1)
- gating thresholds and tree construction hyperparameters
axioms (1)
- Domain assumption: causal mechanisms can be expressed as compositions of elementary functions generated from a single binary operator via binary trees.
invented entities (1)
- gated EML binary tree (no independent evidence)
Reference graph
Works this paper leans on
[1] Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, and Juan Parras. KaCGM: Kolmogorov-Arnold causal generative models. arXiv preprint arXiv:2603.20184, 2026.
[2] Kevin Bello, Bryon Aragam, and Pradeep Ravikumar. DAGMA: Learning DAGs via M-matrices and a log-determinant acyclicity characterization. In NeurIPS, 2022. arXiv:2209.08037.
[3] Tiago Brogueira and Mário A. T. Figueiredo. Bivariate causal discovery using rate-distortion MDL: An information dimension approach. arXiv preprint arXiv:2604.05829, 2026.
[4] Philippe Brouillard, Chandler Squires, Jonas Wahl, Konrad P. Körding, Karen Sachs, Alexandre Drouin, and Dhanya Sridhar. The landscape of causal discovery data: Grounding causal discovery in real-world applications. In Proceedings of the Fourth Conference on Causal Learning and Reasoning (CLeaR), volume 275 of Proceedings of Machine Learning Research, pages...
[5] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. PNAS, 113(15):3932–3937, 2016. doi: 10.1073/pnas.1517384113.
[6] Peter Bühlmann, Jonas Peters, and Jan Ernest. CAM: Causal additive models, high-dimensional order search and penalized regression. Annals of Statistics, 42(6):2526–2556, 2014. doi: 10.1214/14-AOS1260.
[7] Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, and Shirley Ho. Discovering symbolic models from deep learning with inductive biases. In NeurIPS, 2020. arXiv:2006.11287.
[8] Juan L. Gamella, Jonas Peters, and Peter Bühlmann. Causal chambers as a real-world physical testbed for AI methodology. Nature Machine Intelligence, 7(1):107–118, 2025. doi: 10.1038/s42256-024-00964-x.
[9] Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In NeurIPS, 2008.
[10] Sébastien Lachapelle, Philippe Brouillard, Tristan Deleu, and Simon Lacoste-Julien. Gradient-based neural DAG learning. In ICLR, 2020. arXiv:1906.02226.
[11] Lars Lorch, Jonas Rothfuss, Bernhard Schölkopf, and Andreas Krause. DiBS: Differentiable Bayesian structure learning. In NeurIPS, 2021.
[12] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In NeurIPS, 2017.
[13] Ignavier Ng, AmirEmad Ghassami, and Kun Zhang. On the role of sparsity and DAG constraints for learning linear DAGs. In NeurIPS, 2020.
[14] Andrzej Odrzywołek. All elementary functions from a single binary operator. arXiv preprint arXiv:2603.21852, 2026. doi: 10.48550/arXiv.2603.21852.
[15] Jonas Peters, Joris M. Mooij, Dominik Janzing, and Bernhard Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15:2009–2053, 2014.
[16] Joseph Ramsey, Bryan Andrews, and Peter Spirtes. Scalable causal discovery from recursive nonlinear data via truncated basis function scores and tests. arXiv preprint arXiv:2510.04276, 2025.
[17] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why Should I Trust You?”: Explaining the predictions of any classifier. In KDD, 2016.
[18] Karen Sachs, Omar Perez, Dana Pe’er, Douglas A. Lauffenburger, and Garry P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523–529, 2005. doi: 10.1126/science.1105809.
[19] Gideon Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461–464, 1978. doi: 10.1214/aos/1176344136.
[20] Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, and Antti Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.
[21] Shohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Aapo Hyvärinen, Yoshinobu Kawahara, Takashi Washio, Patrik O. Hoyer, and Kenneth Bollen. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12:1225–1248, 2011.
[22] Daniel Waxman, Kurt Butler, and Petar M. Djurić. DAGMA-DCE: Interpretable, non-parametric differentiable causal discovery. IEEE Open Journal of Signal Processing, 5:393–401, 2024. doi: 10.1109/OJSP.2024.3351593. arXiv:2401.02930.
[23] Xun Zheng, Bryon Aragam, Pradeep Ravikumar, and Eric P. Xing. DAGs with NO TEARS: Continuous Optimization for Structure Learning. In NeurIPS, 2018.
[24] Xun Zheng, Chen Dan, Bryon Aragam, Pradeep Ravikumar, and Eric P. Xing. Learning sparse nonparametric DAGs. In AISTATS, 2020. arXiv:1909.13189.
[25] Jincheng Zhou, Mengbo Wang, Anqi He, Yumeng Zhou, Hessam Olya, Murat Kocaoglu, and Bruno Ribeiro. Differentiable constraint-based causal discovery. In NeurIPS, 2025. arXiv:2510.22031.