EML-CD: Causal Mechanism Recovery via EML Symbolic Trees in Structure Learning

Sota Asanuma

arXiv: 2606.05942 · v1 · pith:GQIRCMT4 · submitted 2026-06-04 · stat.ML · cs.LG



Pith reviewed 2026-06-27 23:44 UTC · model grok-4.3

classification: stat.ML, cs.LG

keywords: causal discovery, structure learning, symbolic regression, mechanism recovery, EML trees, DAG learning, interpretable causal models


The pith

EML-CD recovers causal graph structure together with closed-form equations for each mechanism by representing edges as gated symbolic trees.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents EML-CD as a way to perform causal structure learning while also recovering explicit mathematical equations for the mechanisms on each edge. It replaces black-box neural representations with gated binary trees built from the EML operator, which composes elementary functions into readable closed-form expressions. Because the equations are explicit, Jacobians can be computed analytically to quantify causal effects. On the Sachs protein-signaling dataset, the method reaches structural accuracy comparable to PC and GES while attaching closed-form equations to each detected edge (edge precision 0.756, recall 0.365). Controlled experiments show faithful recovery of ten of eleven elementary function families and lower held-out mechanism error than a fixed SINDy dictionary.
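The review never defines the EML operator itself. As a minimal sketch, assume it is the "exp-minus-log" operator eml(x, y) = exp(x) − ln(y) from the cited Odrzywołek preprint [14], used together with the constant 1; treat this definition and the helper names `eml_exp` and `eml_ln` as illustrative assumptions. The snippet shows how one binary operator can already express both exp and ln:

```python
import math


def eml(x: float, y: float) -> float:
    """Assumed EML operator: exp(x) - ln(y), defined for y > 0."""
    return math.exp(x) - math.log(y)


def eml_exp(x: float) -> float:
    # exp(x) = eml(x, 1), because ln(1) = 0.
    return eml(x, 1.0)


def eml_ln(x: float) -> float:
    # ln(x) = eml(1, eml(eml(1, x), 1)):
    #   eml(1, x)         = e - ln(x)
    #   eml(e - ln(x), 1) = exp(e - ln(x)) = e**e / x
    #   eml(1, e**e / x)  = e - (e - ln(x)) = ln(x)
    return eml(1.0, eml(eml(1.0, x), 1.0))


if __name__ == "__main__":
    for v in (0.5, 2.0, 10.0):
        print(v, eml_exp(v), math.exp(v), eml_ln(v), math.log(v))
```

Deeper compositions of the same operator yield further elementary functions, which is what makes a fixed-shape binary tree a plausible search space for closed-form mechanisms.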

Core claim

EML-CD represents each candidate edge mechanism as a gated EML binary tree and jointly optimizes structure and tree parameters to output a DAG whose edges carry closed-form causal equations; these equations are obtained directly from the EML compositions and permit direct Jacobian evaluation without post-hoc extraction.
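The review does not say which acyclicity constraint EML-CD uses when it optimizes structure and mechanisms jointly. One common choice in continuous structure learning is the NOTEARS penalty h(W) = tr(exp(W ∘ W)) − d from the cited Zheng et al. [23], which is zero exactly when the weighted adjacency matrix W describes a DAG; DAGMA's log-determinant characterization [2] is an alternative. The sketch below evaluates the NOTEARS penalty with a truncated power series in plain Python. It illustrates the idea and is not EML-CD's confirmed objective:

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def notears_h(w: Matrix, terms: int = 30) -> float:
    """NOTEARS acyclicity penalty h(W) = tr(exp(W o W)) - d.

    The matrix exponential is approximated by its power series
    sum_k A^k / k!, truncated after `terms` terms (adequate for small matrices).
    """
    d = len(w)
    a = [[w[i][j] ** 2 for j in range(d)] for i in range(d)]  # elementwise square
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # A^0 = I
    trace = float(d)  # tr(A^0) / 0! = d
    for k in range(1, terms):
        power = matmul(power, a)
        trace += sum(power[i][i] for i in range(d)) / math.factorial(k)
    return trace - d


if __name__ == "__main__":
    dag = [[0.0, 1.5, 0.0], [0.0, 0.0, -0.7], [0.0, 0.0, 0.0]]    # 0 -> 1 -> 2
    cyclic = [[0.0, 1.5, 0.0], [0.0, 0.0, -0.7], [0.4, 0.0, 0.0]]  # adds 2 -> 0
    print(notears_h(dag))     # 0.0 for a DAG
    print(notears_h(cyclic))  # strictly positive once a cycle exists
```

For a DAG every power of W ∘ W has a zero diagonal, so only the identity term survives and h is exactly zero; any directed cycle contributes a positive diagonal entry.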

What carries the argument

The gated EML binary tree, which encodes each causal mechanism as a composition of elementary functions through repeated application of a single binary operator together with learned gates.
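To make the gated tree concrete, here is a minimal sketch under stated assumptions: the operator is taken to be eml(a, b) = exp(a) − ln(b) (the review does not define it), each leaf gates between the parent input x and the constant 1, and the exp/ln arguments are clamped for numerical safety. The class name `GatedEMLTree`, the gate placement, and the clamp values are hypothetical, not EML-CD's actual parameterization. With soft gates the tree is differentiable; rounding the gates yields an expression that can be read off directly:

```python
import math
from dataclasses import dataclass
from typing import List

EXP_MAX = 50.0   # assumed clamp on exp's argument
LOG_MIN = 1e-12  # assumed floor on ln's argument


def safe_eml(a: float, b: float) -> float:
    """Assumed EML operator exp(a) - ln(b), with clamping for numerical safety."""
    return math.exp(min(a, EXP_MAX)) - math.log(max(b, LOG_MIN))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class GatedEMLTree:
    """Hypothetical complete binary EML tree of fixed depth.

    Each leaf mixes the parent input x with the constant 1 through a gate
    g = sigmoid(logit); every internal node applies the EML operator.
    """
    depth: int
    leaf_logits: List[float]  # one logit per leaf, 2**depth in total

    def __post_init__(self) -> None:
        if len(self.leaf_logits) != 2 ** self.depth:
            raise ValueError("need exactly 2**depth leaf logits")

    def gates(self, hard: bool) -> List[float]:
        soft = [sigmoid(z) for z in self.leaf_logits]
        return [1.0 if g >= 0.5 else 0.0 for g in soft] if hard else soft

    def __call__(self, x: float, hard: bool = False) -> float:
        level = [g * x + (1.0 - g) * 1.0 for g in self.gates(hard)]
        while len(level) > 1:  # combine pairs bottom-up until only the root remains
            level = [safe_eml(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        return level[0]

    def expression(self) -> str:
        """Closed-form string of the hard-gated tree."""
        level = ["x" if g == 1.0 else "1" for g in self.gates(hard=True)]
        while len(level) > 1:
            level = [f"eml({level[i]}, {level[i + 1]})" for i in range(0, len(level), 2)]
        return level[0]


if __name__ == "__main__":
    tree = GatedEMLTree(depth=1, leaf_logits=[4.0, -4.0])  # gates close to (x, 1)
    print(tree.expression())                    # eml(x, 1), i.e. exp(x)
    print(tree(0.3, hard=True), math.exp(0.3))
```

In this sketch, structure sparsity and mechanism shape are both carried by gate logits, so a single gradient-based optimizer can update them together before the gates are hardened.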

If this is right

Analytical Jacobians become available for every discovered edge, enabling direct quantification of causal effects.
On the Sachs data the method attaches equations to detected edges with edge precision 0.756 (recall 0.365) while keeping SHD (11.2 ± 0.4, five-seed mean) within seed variance of PC and GES.
In bivariate tests with known mechanisms the approach recovers ten of eleven elementary function families with held-out shape correlation of at least 0.96; only high-frequency sine is recovered partially.
On symbolic synthetic data the held-out mechanism f-MSE is substantially lower than that of a fixed SINDy dictionary (mean 3.67 vs. 7644, the latter inflated by catastrophic extrapolation on one seed).
A depth-2 model improves F1 over linear OLS-BIC on the Causal Chambers light-tunnel subset (0.444 vs. 0.273).
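Because each recovered mechanism is an explicit composition, its derivative follows from the chain rule applied node by node. Assuming the operator eml(a, b) = exp(a) − ln(b), the rule at each node is d/dx eml(u, v) = exp(u)·u′ − v′/v. The sketch propagates (value, derivative) pairs through a hard-gated expression, a simple form of forward-mode differentiation; the nested-tuple encoding and the function name `value_and_grad` are hypothetical choices for illustration. For an edge with a single parent this derivative is the Jacobian entry; Figure 2 shows such a curve for PKC → P38.

```python
import math
from typing import Tuple, Union

# Hypothetical encoding of a hard-gated EML expression:
#   "x" (the parent input), a numeric constant, or ("eml", left, right).
Expr = Union[str, float, Tuple[str, "Expr", "Expr"]]


def value_and_grad(expr: Expr, x: float) -> Tuple[float, float]:
    """Return (f(x), df/dx) by applying the chain rule at every EML node."""
    if isinstance(expr, str):
        if expr != "x":
            raise ValueError(f"unknown variable {expr!r}")
        return x, 1.0
    if isinstance(expr, (int, float)):
        return float(expr), 0.0  # constants have zero derivative
    op, left, right = expr
    if op != "eml":
        raise ValueError(f"unknown operator {op!r}")
    u, du = value_and_grad(left, x)
    v, dv = value_and_grad(right, x)
    # d/dx [exp(u) - ln(v)] = exp(u) * du - dv / v
    return math.exp(u) - math.log(v), math.exp(u) * du - dv / v


if __name__ == "__main__":
    # ln(x) written with the assumed operator: eml(1, eml(eml(1, x), 1)).
    ln_x = ("eml", 1.0, ("eml", ("eml", 1.0, "x"), 1.0))
    for x in (0.5, 1.0, 4.0):
        print(x, value_and_grad(ln_x, x))  # derivative should equal 1 / x
```

Evaluating the derivative on a dense grid of inputs gives exact slopes with no interpolation, which is how a sign change in the causal effect, like the one reported in Figure 2, becomes visible.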

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Knowing the explicit equation on each edge could allow direct simulation of interventions without retraining a separate predictor.
The same gated-tree representation might be inserted into other continuous structure-learning objectives that currently use neural mechanisms.
If tree depth is limited, the method may remain tractable on problems larger than the d=11 Sachs example while still producing human-readable equations.
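The first extension can be made concrete: with an explicit equation on every edge, an intervention do(X = x) amounts to overwriting X's equation and re-evaluating its descendants in topological order. The sketch assumes additive mechanisms (each child is its exogenous term plus the sum of its per-edge functions) and omits noise sampling; the graph, the equations, and the function name `simulate` are hypothetical placeholders, not recovered EML-CD output.

```python
import math
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, Optional, Tuple

# Hypothetical closed-form per-edge mechanisms: edges[(parent, child)] = f.
EdgeFns = Dict[Tuple[str, str], Callable[[float], float]]


def simulate(edges: EdgeFns, exogenous: Dict[str, float],
             do: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Evaluate x_j = u_j + sum_i f_ij(x_i) in topological order under do(...)."""
    do = do or {}
    parents: Dict[str, set] = {node: set() for node in exogenous}
    for (p, c) in edges:
        parents.setdefault(c, set()).add(p)
        parents.setdefault(p, set())
    values: Dict[str, float] = {}
    for node in TopologicalSorter(parents).static_order():  # parents come first
        if node in do:  # intervention: replace the node's own equation
            values[node] = do[node]
            continue
        values[node] = exogenous.get(node, 0.0) + sum(
            f(values[p]) for (p, c), f in edges.items() if c == node)
    return values


if __name__ == "__main__":
    edges = {("A", "B"): math.exp, ("B", "C"): math.log}
    exo = {"A": 0.0, "B": 0.0, "C": 0.0}
    print(simulate(edges, exo))                    # observational values
    print(simulate(edges, exo, do={"B": math.e}))  # C becomes ln(e) = 1
```

No predictor is retrained here; the intervention only changes which equation is evaluated for B, and everything downstream follows from the stored closed forms.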

Load-bearing premise

Real causal mechanisms are sufficiently well approximated by compositions of elementary functions inside gated binary trees so that joint structure-and-mechanism optimization does not create systematic bias.

What would settle it

Run EML-CD on a benchmark whose ground-truth mechanisms require functions or compositions outside the EML operator's elementary library: high mechanism f-MSE, or structure errors exceeding those of PC/GES, would show that the load-bearing premise fails.
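Both criteria in this test require pinned-down metrics, and the review does not define its mechanism metrics. A plausible reading, stated here as an assumption, is that f-MSE is the mean squared error between the true and recovered mechanisms on held-out inputs, and that shape correlation is the Pearson correlation between the two curves on the same inputs, which ignores offset and scale. A minimal sketch:

```python
import math
from statistics import fmean
from typing import Callable, Sequence

Fn = Callable[[float], float]


def f_mse(f_true: Fn, f_hat: Fn, xs: Sequence[float]) -> float:
    """Mean squared error between true and recovered mechanisms on held-out xs."""
    return fmean([(f_true(x) - f_hat(x)) ** 2 for x in xs])


def shape_correlation(f_true: Fn, f_hat: Fn, xs: Sequence[float]) -> float:
    """Pearson correlation of the two curves on xs (insensitive to offset and scale)."""
    a = [f_true(x) for x in xs]
    b = [f_hat(x) for x in xs]
    ma, mb = fmean(a), fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    if var_a == 0.0 or var_b == 0.0:
        return float("nan")  # undefined for a constant curve
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    xs = [i / 10 for i in range(-30, 31)]  # held-out grid
    wrong_scale = lambda x: 2 * math.sin(x) + 1
    print(shape_correlation(math.sin, wrong_scale, xs))  # close to 1: same shape
    print(f_mse(math.sin, wrong_scale, xs))              # yet the error is large
```

The two metrics answer different questions: a mechanism can have the right shape and still be far off in f-MSE, which is why a benchmark outside the EML library should report both.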

Figures

Figures reproduced from arXiv: 2606.05942 by Sota Asanuma.

**Figure 2.** Sachs data: analytical Jacobian ∂P38/∂PKC for PKC → P38, computed by automatic differentiation of the gate-annealed (hard-gate) EML-CD mechanism, so the curve is the exact derivative of the displayed Example-1 equation (dense grid; no interpolation). It ranges from ≈ −5.4 to ≈ 2.3 and changes sign with input level, quantifying a state-dependent nonlinear causal effect; the jumps reflect the numerical bounds…

**Figure 3.** Controlled mechanism recovery: true mechanism (dashed) vs. the recovered gate-annealed EML…

**Figure 4.** S-Sym held-out mechanism f-MSE per seed (log scale). EML-CD (blue) stays low and stable across…

Original abstract

Neural network (NN)-based nonlinear causal discovery methods recover DAG structure but leave each causal mechanism as a black box. Waxman et al. argued that extracting causal mechanisms from NN weights is ill-posed. We propose EML-CD, a framework that integrates the EML operator (capable of composing elementary functions from a single binary operator) into causal structure learning, with interpretable mechanism recovery as the primary objective. EML-CD represents each edge mechanism as a gated EML binary tree and automatically discovers closed-form causal equations. Analytical Jacobians can be directly computed from the output equations, enabling quantitative understanding of causal effects. On real data (Sachs protein signaling, d=11), EML-CD achieves SHD=11.2 +/- 0.4 (5-seed mean; baselines are single deterministic runs), on par with PC/GES within seed variance and below CAM, while attaching closed-form equations to each detected edge (precision 0.756, recall 0.365). In a controlled bivariate test with known mechanisms, EML-CD recovers 10 of 11 elementary function families faithfully (held-out shape correlation >= 0.96; only high-frequency sine is partial). On a symbolic synthetic benchmark, EML-CD attains a substantially lower and more stable held-out mechanism f-MSE than a fixed SINDy dictionary (mean 3.67 vs. 7644, the latter inflated by catastrophic extrapolation on one seed), although its structure recovery (SHD 14.0) only matches the dictionary and stays below specialized optimizers; on the Causal Chambers light-tunnel subset, a depth-2 model improves F1 over linear OLS-BIC (0.444 vs. 0.273).
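The abstract scores structure with SHD, edge precision and recall, and F1. A minimal sketch, assuming directed edge sets, the common convention that a reversed edge counts as one SHD error (some toolkits count it as two), and precision/recall that credit only correctly oriented edges; the paper's exact conventions are not stated in this review:

```python
from typing import FrozenSet, Set, Tuple

Edge = Tuple[str, str]


def skeleton(edges: Set[Edge]) -> Set[FrozenSet[str]]:
    """Undirected version of a directed edge set."""
    return {frozenset(e) for e in edges}


def shd(true_edges: Set[Edge], pred_edges: Set[Edge]) -> int:
    """Structural Hamming distance: missing + extra + reversed (reversal counted once)."""
    missing_or_extra = len(skeleton(true_edges) ^ skeleton(pred_edges))
    reversed_edges = sum(1 for (a, b) in pred_edges
                         if (a, b) not in true_edges and (b, a) in true_edges)
    return missing_or_extra + reversed_edges


def precision_recall_f1(true_edges: Set[Edge], pred_edges: Set[Edge]) -> Tuple[float, float, float]:
    """Edge precision, recall, and F1 over correctly oriented edges."""
    tp = len(true_edges & pred_edges)
    precision = tp / len(pred_edges) if pred_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truth = {("A", "B"), ("B", "C"), ("C", "D")}
    pred = {("A", "B"), ("C", "B"), ("A", "D")}  # one correct, one reversed, one extra
    print(shd(truth, pred))                  # 3
    print(precision_recall_f1(truth, pred))
```

Note that a high-precision, low-recall profile such as the Sachs result means most attached equations sit on true edges while many true edges go undetected.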

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

EML-CD pairs gated symbolic trees with structure learning to output closed-form mechanisms, but real-data validation stays mostly structural.


The main point is that this paper folds EML binary trees into causal discovery so that the output includes both a DAG and explicit equations for each edge. The gated tree representation lets the method optimize structure and mechanism parameters together, which is a concrete response to Waxman et al.'s argument that extracting causal mechanisms from neural-network weights is ill-posed.

It does well on the controlled bivariate tests, recovering ten of eleven elementary function families with high held-out shape correlation. The symbolic synthetic benchmark also shows much lower and more stable mechanism f-MSE than the fixed SINDy dictionary. Structure recovery on Sachs lands within seed variance of PC and GES.

The soft spots are straightforward. Sachs results report only structural metrics; the attached equations get no quantitative mechanism check such as held-out f-MSE or prediction error. The synthetic wins occur on data generated from the same elementary-function families the model assumes, so they do not test how well the approximation holds outside that class. Some baseline numbers come from single deterministic runs while EML-CD reports seed means, which makes direct comparison noisier.

This is for researchers who need explicit functional forms and Jacobians rather than black-box weights in causal pipelines. Readers working on symbolic methods or scientific causal modeling will see the clearest value. The work engages the literature honestly and the central claim is testable, so it deserves a serious referee even with the gaps in real-data mechanism validation.

Referee Report

3 major / 2 minor

Summary. The paper proposes EML-CD, which integrates the EML operator into causal structure learning by representing each edge mechanism as a gated EML binary tree. This enables joint recovery of DAG structure and closed-form symbolic causal equations, with analytical Jacobians for effect quantification. Claims include SHD=11.2±0.4 on Sachs (on par with PC/GES), recovery of 10/11 elementary functions in bivariate tests (shape correlation ≥0.96), held-out mechanism f-MSE of 3.67 vs. 7644 for fixed SINDy on symbolic synthetics, and F1=0.444 vs. 0.273 for linear OLS-BIC on Causal Chambers light-tunnel data.

Significance. If the central claims hold, the work addresses a key limitation of NN-based causal discovery by delivering interpretable, closed-form mechanisms rather than black-box functions. This is significant for domains requiring mechanistic insight and quantitative causal effects, as it combines structure learning with symbolic regression in a single framework. The reported gains in mechanism fidelity on synthetics drawn from the EML library are a concrete strength.

major comments (3)

[Experimental Results (Sachs)] Experimental section on Sachs protein signaling: SHD is reported as a 5-seed mean (11.2±0.4) for EML-CD while PC/GES and other baselines are single deterministic runs; this protocol mismatch undermines the direct 'on par' claim and requires either matched multi-seed baselines or explicit variance reporting for all methods.
[Methods] Methods and experimental protocol sections: no full hyperparameter protocol, gating threshold schedule, or tree-construction details are provided, nor is the joint optimization procedure for structure and mechanism parameters fully specified; these omissions prevent independent verification of the reported f-MSE and SHD numbers.
[Results (Symbolic Synthetic)] Symbolic synthetic benchmark results: while mechanism f-MSE is lower, structure recovery (SHD=14.0) only matches the dictionary baseline and trails specialized structure optimizers; the manuscript should quantify whether the mechanism-recovery advantage systematically trades off against structure accuracy.
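The first major comment turns on what "within seed variance" means. One simple reading, assumed here, is that a deterministic baseline's single SHD lies inside the band mean ± k·SD of the stochastic method's per-seed scores. The helper name and the numbers in the usage lines are hypothetical, and this review does not say whether the reported ± 0.4 is a standard deviation or a standard error:

```python
from statistics import fmean, stdev
from typing import Sequence


def within_seed_band(seed_scores: Sequence[float], baseline: float, k: float = 1.0) -> bool:
    """True if `baseline` lies in mean ± k * (sample standard deviation) of seed_scores."""
    mean = fmean(seed_scores)
    sd = stdev(seed_scores)  # requires at least two seeds
    return abs(baseline - mean) <= k * sd


if __name__ == "__main__":
    seeds = [10.0, 11.0, 12.0]  # illustrative values, not results from the paper
    print(within_seed_band(seeds, 11.5))         # inside one SD
    print(within_seed_band(seeds, 12.5))         # outside one SD
    print(within_seed_band(seeds, 12.5, k=2.0))  # inside two SDs
```

The verdict depends on k and on whether the spread is an SD or an SE, which is why the referee asks for the protocol to be stated explicitly.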

minor comments (2)

[Abstract] Abstract: the citation 'Waxman et al.' should include the full reference and year for clarity.
[Bivariate Experiments] Bivariate test description: clarify how the 11 elementary function families were selected and whether the held-out shape correlation metric is defined in the main text or supplement.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We respond to each major comment below and indicate the revisions we plan to make.


Referee: Experimental section on Sachs protein signaling: SHD is reported as a 5-seed mean (11.2±0.4) for EML-CD while PC/GES and other baselines are single deterministic runs; this protocol mismatch undermines the direct 'on par' claim and requires either matched multi-seed baselines or explicit variance reporting for all methods.

Authors: We acknowledge this inconsistency in reporting. PC and GES are deterministic algorithms without inherent randomness, hence single runs. However, to strengthen the comparison, we will add multi-seed evaluations for any stochastic baselines and explicitly state the deterministic nature of PC/GES in the revised manuscript. The 'on par' claim will be qualified to reflect that EML-CD's performance is comparable within its variance. revision: yes
Referee: Methods and experimental protocol sections: no full hyperparameter protocol, gating threshold schedule, or tree-construction details are provided, nor is the joint optimization procedure for structure and mechanism parameters fully specified; these omissions prevent independent verification of the reported f-MSE and SHD numbers.

Authors: We agree that additional details are necessary for reproducibility. The revised manuscript will include a dedicated section or appendix with the full hyperparameter settings, the gating threshold schedule, the EML tree construction procedure, and a precise description of the joint optimization algorithm used for structure and mechanism parameters. revision: yes
Referee: Symbolic synthetic benchmark results: while mechanism f-MSE is lower, structure recovery (SHD=14.0) only matches the dictionary baseline and trails specialized structure optimizers; the manuscript should quantify whether the mechanism-recovery advantage systematically trades off against structure accuracy.

Authors: The paper already highlights that structure recovery is not the primary goal and matches the dictionary baseline. To quantify the potential trade-off, we will include additional discussion and possibly new metrics or comparisons in the revision that explicitly contrast the mechanism recovery benefits against any structure accuracy differences relative to specialized structure learning methods. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper's central claims rest on empirical performance against external benchmarks (Sachs protein data, known bivariate function families, symbolic synthetic data, Causal Chambers) and explicit modeling assumptions about EML tree expressivity. No load-bearing step reduces by construction to a fitted parameter renamed as prediction, a self-citation chain, or a self-definitional equivalence. Structure recovery metrics and mechanism f-MSE are reported relative to independent baselines (PC, GES, SINDy) without the target quantities being defined solely in terms of the model's own fitted values. The derivation is therefore self-contained against the stated external validation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The framework rests on the domain assumption that real causal mechanisms admit compact representation via EML-composable elementary functions and that gating can separate structure decisions from parameter fitting without circular dependence on the recovered equations themselves.

free parameters (1)

gating thresholds and tree construction hyperparameters
Parameters that control which edges are active and how deep the symbolic trees grow; these are optimized during learning and directly affect both structure and mechanism outputs.
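Figure 2 refers to a gate-annealed (hard-gate) mechanism, but the schedule itself is among the unreported details the referee flags. As an illustration only, one common pattern is a temperature-scaled sigmoid gate g = σ(logit / T) whose temperature decays geometrically, followed by a final threshold at 0.5; the schedule, its constants, and the function names below are hypothetical:

```python
import math


def temperature(step: int, t0: float = 1.0, decay: float = 0.95, t_min: float = 0.01) -> float:
    """Geometric temperature schedule floored at t_min (hypothetical constants)."""
    return max(t_min, t0 * decay ** step)


def soft_gate(logit: float, temp: float) -> float:
    """Numerically stable sigmoid(logit / temp); lower temp pushes it toward 0 or 1."""
    z = logit / temp
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hard_gate(logit: float, threshold: float = 0.5) -> int:
    """Final gate after annealing: keep (1) or prune (0) the branch.

    With threshold 0.5 this reduces to the sign of the logit for any
    positive temperature, so the temperature only shapes training.
    """
    return 1 if soft_gate(logit, 1.0) >= threshold else 0


if __name__ == "__main__":
    for step in (0, 20, 50, 100):
        print(step, temperature(step), soft_gate(0.3, temperature(step)))
    print(hard_gate(0.3), hard_gate(-0.3))
```

Annealing narrows the gap between the soft model that is optimized and the hard model whose equation is reported, which is where the threshold and schedule directly shape both the structure and the mechanism outputs.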

axioms (1)

domain assumption: causal mechanisms can be expressed as compositions of elementary functions generated from a single binary operator via binary trees.
Invoked when the paper states that each edge mechanism is represented as a gated EML binary tree.

invented entities (1)

gated EML binary tree (no independent evidence)
purpose: To encode and discover interpretable closed-form causal mechanisms jointly with DAG structure.
New representational device introduced by the framework; no independent evidence outside the reported benchmarks is supplied.

pith-pipeline@v0.9.1-grok · 5847 in / 1422 out tokens · 32151 ms · 2026-06-27T23:44:08.002196+00:00 · methodology


Reference graph

Works this paper leans on

25 extracted references

[1] Alejandro Almodóvar, Mar Elizo, Patricia A. Apellániz, Santiago Zazo, and Juan Parras. KaCGM: Kolmogorov-Arnold causal generative models. arXiv preprint arXiv:2603.20184, 2026.
[2] Kevin Bello, Bryon Aragam, and Pradeep Ravikumar. DAGMA: Learning DAGs via M-matrices and a log-determinant acyclicity characterization. In NeurIPS, 2022. arXiv:2209.08037.
[3] Tiago Brogueira and Mário A. T. Figueiredo. Bivariate causal discovery using rate-distortion MDL: An information dimension approach. arXiv preprint arXiv:2604.05829, 2026.
[4] Philippe Brouillard, Chandler Squires, Jonas Wahl, Konrad P. Körding, Karen Sachs, Alexandre Drouin, and Dhanya Sridhar. The landscape of causal discovery data: Grounding causal discovery in real-world applications. In Proceedings of the Fourth Conference on Causal Learning and Reasoning (CLeaR), volume 275 of Proceedings of Machine Learning Research, pages...
[5] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. PNAS, 113(15):3932–3937, 2016. doi: 10.1073/pnas.1517384113.
[6] Peter Bühlmann, Jonas Peters, and Jan Ernest. CAM: Causal additive models, high-dimensional order search and penalized regression. Annals of Statistics, 42(6):2526–2556, 2014. doi: 10.1214/14-AOS1260.
[7] Miles Cranmer, Alvaro Sanchez-Gonzalez, Peter Battaglia, Rui Xu, Kyle Cranmer, David Spergel, and Shirley Ho. Discovering symbolic models from deep learning with inductive biases. In NeurIPS, 2020. arXiv:2006.11287.
[8] Juan L. Gamella, Jonas Peters, and Peter Bühlmann. Causal chambers as a real-world physical testbed for AI methodology. Nature Machine Intelligence, 7(1):107–118, 2025. doi: 10.1038/s42256-024-00964-x.
[9] Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jonas Peters, and Bernhard Schölkopf. Nonlinear causal discovery with additive noise models. In NeurIPS, 2008.
[10] Sébastien Lachapelle, Philippe Brouillard, Tristan Deleu, and Simon Lacoste-Julien. Gradient-based neural DAG learning. In ICLR, 2020. arXiv:1906.02226.
[11] Lars Lorch, Jonas Rothfuss, Bernhard Schölkopf, and Andreas Krause. DiBS: Differentiable Bayesian structure learning. In NeurIPS, 2021.
[12] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In NeurIPS, 2017.
[13] Ignavier Ng, AmirEmad Ghassami, and Kun Zhang. On the role of sparsity and DAG constraints for learning linear DAGs. In NeurIPS, 2020.
[14] Andrzej Odrzywołek. All elementary functions from a single binary operator. arXiv preprint arXiv:2603.21852, 2026. doi: 10.48550/arXiv.2603.21852.
[15] Jonas Peters, Joris M. Mooij, Dominik Janzing, and Bernhard Schölkopf. Causal discovery with continuous additive noise models. Journal of Machine Learning Research, 15:2009–2053, 2014.
[16] Joseph Ramsey, Bryan Andrews, and Peter Spirtes. Scalable causal discovery from recursive nonlinear data via truncated basis function scores and tests. arXiv preprint arXiv:2510.04276, 2025.
[17] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. “Why Should I Trust You?”: Explaining the predictions of any classifier. In KDD, 2016.
[18] Karen Sachs, Omar Perez, Dana Pe’er, Douglas A. Lauffenburger, and Garry P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523–529, 2005. doi: 10.1126/science.1105809.
[19] Gideon Schwarz. Estimating the dimension of a model. Annals of Statistics, 6(2):461–464, 1978. doi: 10.1214/aos/1176344136.
[20] Shohei Shimizu, Patrik O. Hoyer, Aapo Hyvärinen, and Antti Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.
[21] Shohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Aapo Hyvärinen, Yoshinobu Kawahara, Takashi Washio, Patrik O. Hoyer, and Kenneth Bollen. DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model. Journal of Machine Learning Research, 12:1225–1248, 2011.
[22] Daniel Waxman, Kurt Butler, and Petar M. Djurić. DAGMA-DCE: Interpretable, non-parametric differentiable causal discovery. IEEE Open Journal of Signal Processing, 5:393–401, 2024. doi: 10.1109/OJSP.2024.3351593. arXiv:2401.02930.
[23] Xun Zheng, Bryon Aragam, Pradeep Ravikumar, and Eric P. Xing. DAGs with NO TEARS: Continuous optimization for structure learning. In NeurIPS, 2018.
[24] Xun Zheng, Chen Dan, Bryon Aragam, Pradeep Ravikumar, and Eric P. Xing. Learning sparse nonparametric DAGs. In AISTATS, 2020. arXiv:1909.13189.
[25] Jincheng Zhou, Mengbo Wang, Anqi He, Yumeng Zhou, Hessam Olya, Murat Kocaoglu, and Bruno Ribeiro. Differentiable constraint-based causal discovery. In NeurIPS, 2025. arXiv:2510.22031.
