Latent Confounded Causal Discovery via Lie Bracket Geometry

Sridhar Mahadevan

arxiv: 2606.19610 · v1 · pith:RLEN4U2Bnew · submitted 2026-06-17 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-26 20:35 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords: causal discovery · latent confounding · Lie brackets · Radon-Nikodym derivatives · intervention geometry · Kan-Do-Calculus · BRIDGE algorithm · SKFM

The pith

Failures of intervention vector fields to close under Lie brackets can reveal latent confounders in causal models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that, in smooth statistical settings, Radon-Nikodym derivatives between observational and interventional measures induce local causal vector fields, and that checking whether these fields close under Lie brackets can reveal latent structure. Two new algorithms, BRIDGE and SKFM, use this geometric check to screen candidate causal arrows and handle latent confounding without exhaustive search. This matters because traditional causal discovery struggles with hidden variables and with the super-exponential number of possible graphs; the method offers a way to prune that space using differential geometry. The work builds on prior Kan-Do-Calculus results to turn intervention effects into computable flows. If correct, it would change how causal structure is inferred from data involving unmeasured variables.

Core claim

In smooth statistical settings, Radon-Nikodym derivatives between observational and interventional measures induce local causal vector fields; failures of these fields to close under Lie brackets become computable Frobenius residuals, which we interpret as witnesses of failed visible integrability and possible latent or unmodeled structure. The algorithms BRIDGE and SKFM use this to discover causal models with latent confounders while collapsing the space of possible DAGs by many orders of magnitude.
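For readers who want the objects in symbols, the following is a minimal sketch under assumed notation: $v_1,\dots,v_k$ for the visible intervention fields and $\Pi_p$ for orthogonal projection onto their span with respect to some chosen metric. Neither is fixed by the abstract. Only the bracket formula is standard; the residual symbol follows the Figure 6 caption.

```latex
% Lie bracket of two smooth vector fields in local coordinates
[v_i, v_j](p) \;=\; Dv_j(p)\,v_i(p) \;-\; Dv_i(p)\,v_j(p)

% Assumed form of the residual: the part of the bracket outside the visible span
[v_i, v_j]^{\perp}(p) \;=\; \bigl(I - \Pi_p\bigr)\,[v_i, v_j](p),
\qquad
\Pi_p = \text{orthogonal projection onto } \operatorname{span}\{v_1(p),\dots,v_k(p)\}
```

Under this reading, the fields close at $p$ exactly when $[v_i, v_j]^{\perp}(p) = 0$ for every pair.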

What carries the argument

Lie bracket residuals (or Frobenius residuals) computed from Radon-Nikodym derivatives of interventional measures, serving as geometric screens for admissible causal arrows and detectors of latent obstructions.
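A toy numerical version of such a residual may help fix ideas. The sketch below assumes a Euclidean metric, central finite differences, and hand-written fields on R^3; the function names and the example fields are hypothetical and are not taken from the paper, whose fields come from estimated Radon-Nikodym derivatives.

```python
import math

def directional_derivative(field, p, direction, h=1e-4):
    """Central-difference estimate of the Jacobian of `field` at p applied to `direction`."""
    plus = field([pi + h * di for pi, di in zip(p, direction)])
    minus = field([pi - h * di for pi, di in zip(p, direction)])
    return [(a - b) / (2 * h) for a, b in zip(plus, minus)]

def lie_bracket(v, w, p):
    """[v, w](p) = Dw(p) v(p) - Dv(p) w(p)."""
    dw_v = directional_derivative(w, p, v(p))
    dv_w = directional_derivative(v, p, w(p))
    return [a - b for a, b in zip(dw_v, dv_w)]

def residual_norm(vec, span):
    """Norm of `vec` after removing its component in the span (Gram-Schmidt)."""
    basis = []
    for s in span:
        s = list(s)
        for e in basis:
            dot = sum(a * b for a, b in zip(s, e))
            s = [a - dot * b for a, b in zip(s, e)]
        norm = math.sqrt(sum(a * a for a in s))
        if norm > 1e-12:
            basis.append([a / norm for a in s])
    r = list(vec)
    for e in basis:
        dot = sum(a * b for a, b in zip(r, e))
        r = [a - dot * b for a, b in zip(r, e)]
    return math.sqrt(sum(a * a for a in r))

def frobenius_residual(v, w, p):
    """Size of the part of [v, w](p) lying outside span{v(p), w(p)}."""
    return residual_norm(lie_bracket(v, w, p), [v(p), w(p)])

# Hypothetical toy fields on R^3
ddx = lambda p: [1.0, 0.0, 0.0]          # d/dx
ddy = lambda p: [0.0, 1.0, 0.0]          # d/dy
twisted = lambda p: [0.0, 1.0, p[0]]     # d/dy + x d/dz

p0 = [0.3, -0.2, 0.5]
closing = frobenius_residual(ddx, ddy, p0)          # commuting pair
non_closing = frobenius_residual(ddx, twisted, p0)  # bracket points along d/dz
```

The commuting pair gives a zero residual, while the second pair leaves a residual of order one because its bracket points out of the plane the two fields span.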

If this is right

BRIDGE combines an interventional density engine with a geometric screen to propose high-recall admissible arrows and identify latent-obstruction candidates.
SKFM learns amortized intervention fields and factors latent curvature spectrally.
Both algorithms can discover causal models with latent confounders.
The methods collapse the super-exponential space of possible DAGs by many orders of magnitude.
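To give a sense of scale for the last point, the sketch below counts labeled DAGs with Robinson's recurrence and compares the total with a crude upper bound on the graphs left after a screen. The node count matches the seven-node example in Figure 6, but the number of surviving arrows is a hypothetical value chosen for illustration, not a figure reported by the paper.

```python
from math import comb, log10

def count_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    a = [1]
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            total += (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
        a.append(total)
    return a[n]

def pruned_upper_bound(kept_edges):
    """At most 2**m directed graphs (hence DAGs) use only m retained candidate arrows."""
    return 2 ** kept_edges

n, kept = 7, 10   # hypothetical: seven nodes, ten arrows surviving a screen
full = count_dags(n)
orders_saved = log10(full) - log10(pruned_upper_bound(kept))
print(full, round(orders_saved, 1))
```

Because the bound ignores acyclicity among the retained arrows, the true pruned family can only be smaller, so the computed saving is conservative.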

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This geometric view of causal discovery might extend to other areas of machine learning where structure is learned from flows or dynamics.
The connection to category theory via Kan extensions could inspire similar adjunction-based methods in other inference problems.
Practical implementations would need efficient ways to estimate the Radon-Nikodym derivatives from data.
If the method works, it could be combined with existing score-based discovery methods to improve their handling of hidden variables.
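On the estimation point above, one familiar way to approximate a Radon-Nikodym derivative from samples is the classifier trick: fit a logistic model that separates interventional from observational draws, and read the log density ratio off its logit when the two samples are the same size. The sketch below is a stand-in under that assumption. The abstract does not specify the paper's own density or ratio engine, and the one-dimensional Gaussian data and the name `fit_log_ratio` are hypothetical.

```python
import math
import random

def fit_log_ratio(obs, intv, iters=25):
    """Newton-fit logit(x) = w0 + w1*x separating intv (label 1) from obs (label 0)."""
    data = [(x, 0.0) for x in obs] + [(x, 1.0) for x in intv]
    w0 = w1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in data:
            s = 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))
            g0 += y - s
            g1 += (y - s) * x
            c = s * (1.0 - s)
            h00 += c
            h01 += c * x
            h11 += c * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        w0 += (h11 * g0 - h01 * g1) / det
        w1 += (h00 * g1 - h01 * g0) / det
    return w0, w1

rng = random.Random(0)
obs = [rng.gauss(0.0, 1.0) for _ in range(2000)]    # toy observational sample
intv = [rng.gauss(1.0, 1.0) for _ in range(2000)]   # toy interventional sample (mean shift)
w0, w1 = fit_log_ratio(obs, intv)

def log_ratio(x):
    """Estimated log dP_int/dP_obs at x (valid here because the samples are equal in size)."""
    return w0 + w1 * x
```

For this toy pair the exact log ratio is x - 0.5, which gives a reference against which to compare the fitted intercept and slope.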

Load-bearing premise

That non-closing Lie brackets computed from Radon-Nikodym derivatives reliably indicate failed visible integrability due to latent structure rather than other statistical or modeling issues.

What would settle it

Running the algorithms on synthetic data from fully observed causal models and checking whether they incorrectly flag latent confounders, and on models with known latent variables to see whether the flagged structures match the true hidden variables.
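A harness for that check could look like the sketch below. It borrows the latent-fork form from the Figure 5 caption (X1 = sin(L) + η1, X2 = cos(L) + η2), while the distribution of L, the noise level, the control model, and the `screen` callable are all assumptions. A faithful test of BRIDGE or SKFM would also need interventional samples, which this toy generator does not produce.

```python
import math
import random

def sample_latent_fork(n, noise=0.1, seed=0):
    """Rows (x1, x2) driven by a hidden L, which is discarded before returning."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        latent = rng.uniform(-math.pi, math.pi)  # assumed distribution of L
        rows.append((math.sin(latent) + rng.gauss(0, noise),
                     math.cos(latent) + rng.gauss(0, noise)))
    return rows

def sample_fully_observed(n, noise=0.1, seed=0):
    """Control rows (x1, x2) with X1 -> X2 and no hidden variable (assumed model)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        rows.append((x1, math.tanh(x1) + rng.gauss(0, noise)))
    return rows

def score_screen(screen, trials=20, n=500):
    """Return (false-positive rate, detection rate) for a latent-pair screen.

    `screen` is any callable that takes the rows and returns True when it
    flags (X1, X2) as a latent-obstruction candidate.
    """
    false_pos = sum(bool(screen(sample_fully_observed(n, seed=t))) for t in range(trials))
    detected = sum(bool(screen(sample_latent_fork(n, seed=t))) for t in range(trials))
    return false_pos / trials, detected / trials

# Placeholder screens that bracket the possible outcomes
never_flags = score_screen(lambda rows: False)
always_flags = score_screen(lambda rows: True)
```

A useful screen would need a false-positive rate near zero on the control data together with a detection rate near one on the fork data; the two placeholders mark the uninformative extremes.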

Figures

Figures reproduced from arXiv: 2606.19610 by Sridhar Mahadevan.

**Figure 1.** BRIDGE pipeline for information-geometric causal discovery. Instead of enumerating every DAG, BRIDGE first estimates local intervention vector fields on the statistical manifold. Directed influence proposes candidate arrows, while Lie-bracket residuals detect failures of closure caused by missing structure or latent confounding. Downstream discovery is then run only inside the geometry-pruned family, insta…

**Figure 2.** Kan extensions in the causal adjunction. The functor …

**Figure 3.** Fisherman-derivative intuition for Lie-bracket causal discovery. A local intervention field …

**Figure 4.** SKFM as an end-to-end counterpart to the main Lie/Frobenius screening pipeline. Unlike …

**Figure 5.** SKFM/BRIDGE fork diagnostic instantiated with the Experiment 2 nonlinear latent-fork data from Section 10. Panel A shows the observed variables X1 = sin(L) + η1 and X2 = cos(L) + η2, colored by the hidden driver L for visualization. Panel B reports the hidden-latent Lie-bracket norms; the dominant non-closing pair is (X1, X2) with B12 = 0.523. Panel C compares hidden-L and observed-L regimes, showing that …

**Figure 6.** BRIDGE geometry walkthrough for the seven-node latent-confounded chain in Experiment 3. The panels show learned local intervention fields, the non-closing Lie-bracket residual [v_i, v_j]⊥, pairwise Frobenius anisotropy, and the final BRIDGE-pruned candidate mask. Green outlines mark the true visible chain arrows, all retained before TCES/BIC scoring. We generated n = 220 samples from a seven-node nonlinear…

Original abstract

Recent work on Kan-Do-Calculus (KDC) has established that the boundary between passive observation and active intervention in causal inference is a category-theoretic bi-adjunction, with interventions modeled by left Kan extensions and conditioning by right Kan extensions. This paper introduces two causal discovery algorithms under latent confounding, building on the information-geometric and categorical consequences of KDC. In smooth statistical settings, Radon-Nikodym derivatives between observational and interventional measures induce local causal vector fields; failures of these fields to close under Lie brackets become computable Frobenius residuals, which we interpret as witnesses of failed visible integrability and possible latent or unmodeled structure. Our first algorithm, BRIDGE (Bracket Residuals for Interventional Discovery and Geometric Estimation), combines an interventional density or Radon-Nikodym-ratio engine with a geometric screen that proposes a high-recall family of admissible arrows, identifies non-closing visible pairs as latent-obstruction candidates, and passes the reduced family to downstream score-based or differentiable discovery routines. The second algorithmic contribution, Spectral Kan-Do Flow Matching (SKFM), learns amortized intervention fields and factors latent curvature spectrally, exposing the direct Lie-space endpoint toward which BRIDGE points. A detailed set of experiments show that both algorithms are capable of discovering causal models with latent confounders while collapsing the super-exponential space of possible DAGs by many orders of magnitude. This paper introduces a new paradigm in causal discovery, where latent structure is inferred directly from the geometry of intervention-induced flows.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The Lie-bracket residual idea is a distinct geometric move on top of KDC, but the step from non-closing brackets to latent confounders still needs an explicit theorem.

The paper's main contribution is BRIDGE, which builds vector fields from Radon-Nikodym ratios between observational and interventional measures, then treats non-vanishing Lie brackets as Frobenius residuals that flag possible latent structure and prune the space of admissible arrows before handing off to standard discovery methods. SKFM adds a spectral factorization step on the learned intervention fields. Both are presented as new and are claimed to cut the DAG search space by orders of magnitude in the reported experiments.

That framing is genuinely different from the usual score-based or constraint-based approaches, and it does give a concrete way to operationalize the earlier Kan-Do-Calculus work. If the geometric screen works as described, it could be useful for high-recall filtering in settings with hidden variables.

The soft spot is the missing link between non-involutivity and latent confounding specifically. Non-closing brackets signal failed integrability, but the abstract does not supply a result showing this happens if and only if (or even typically when) a confounder is present rather than from estimation error, intervention kernel choice, or other model departures. The stress-test note captures this exactly. Without that justification the high-recall filter rests on an interpretive step whose reliability is not yet shown.
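The gap described here can be stated in two lines. The first is the standard Frobenius theorem. The second is only a triangle inequality, written in assumed notation ($R_{ij}$ for the population bracket residual of a visible pair and $\hat R_{ij}$ for its estimate), and it shows why a nonzero estimate alone cannot certify a latent confounder.

```latex
% Frobenius: a smooth constant-rank distribution D = span{v_1, ..., v_k}
% is integrable if and only if it is involutive
\mathcal{D} \text{ integrable}
\iff
[v_i, v_j](p) \in \mathcal{D}_p \quad \text{for all } i, j \text{ and all } p

% Estimated residual versus population residual
\|\hat R_{ij}\| \;\le\; \|R_{ij}\| + \|\hat R_{ij} - R_{ij}\|
\quad\Longrightarrow\quad
\|\hat R_{ij}\| > 0 \ \text{ is compatible with } \ \|R_{ij}\| = 0
```

Even when $R_{ij} \neq 0$, the theorem speaks only to integrability of the visible distribution; tying that to a confounder is the additional step the letter asks for.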

The work is aimed at researchers already comfortable with categorical and information-geometric causal inference. A reader who has followed the KDC papers will see what is being extended. It is coherent on its own terms and engages the literature directly, so it deserves a serious referee even though the central geometric claim needs more support in the full text.

Referee Report

3 major / 2 minor

Summary. The paper claims that in smooth statistical settings, Radon-Nikodym derivatives between observational and interventional measures induce local causal vector fields whose failures to close under Lie brackets yield computable Frobenius residuals; these residuals are interpreted as witnesses of failed visible integrability and latent or unmodeled structure. Building on Kan-Do-Calculus, it introduces BRIDGE, which combines an interventional density engine with a geometric screen to produce a high-recall family of admissible arrows and pass a reduced DAG family to downstream discovery routines, and SKFM, which learns amortized intervention fields and factors latent curvature spectrally. Experiments are reported to show both algorithms discover causal models with latent confounders while collapsing the space of possible DAGs by many orders of magnitude.

Significance. If the central geometric link holds, the work would offer a novel paradigm for causal discovery under latent confounding by directly extracting latent-structure signals from intervention-induced flows rather than exhaustive search or score-based enumeration. The reported ability to reduce super-exponential DAG spaces by orders of magnitude, if substantiated with reproducible code and controlled baselines, would constitute a practical strength for scaling discovery algorithms.

major comments (3)

§4 (BRIDGE algorithm description): the interpretation of non-vanishing Lie brackets as reliable witnesses of latent confounders (as opposed to finite-sample error, intervention-kernel choice, or departures from the smooth model) is presented without a supporting theorem establishing that non-involutivity arises if and only if (or even if) a latent confounder is present. This interpretive step is load-bearing for the high-recall admissible-arrow filter.
§5 (experimental evaluation): the claim that both algorithms collapse the space of possible DAGs by many orders of magnitude is stated without reporting the number of observed variables, sample sizes, number of latent confounders, or quantitative comparison against standard baselines such as FCI or NOTEARS; without these controls the practical significance of the reduction cannot be assessed.
§3.1 (vector-field construction): the definition of the causal vector fields via Radon-Nikodym derivatives between observational and interventional measures is given, but no error analysis or consistency result is supplied showing that the estimated brackets converge to the population quantities under standard regularity conditions; this directly affects the soundness of the Frobenius-residual screen.

minor comments (2)

Notation for the Lie bracket operator and the Radon-Nikodym ratio is introduced without an explicit glossary or consistent symbol table, making cross-references between the geometric and algorithmic sections difficult to follow.
The abstract states that the boundary between observation and intervention is a category-theoretic bi-adjunction, but the main text does not restate the relevant KDC adjunction diagrams or functors, forcing the reader to consult prior work for the categorical foundation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below with clarifications and commitments to revision where the manuscript can be strengthened without misrepresentation.

Referee: §4 (BRIDGE algorithm description): the interpretation of non-vanishing Lie brackets as reliable witnesses of latent confounders (as opposed to finite-sample error, intervention-kernel choice, or departures from the smooth model) is presented without a supporting theorem establishing that non-involutivity arises if and only if (or even if) a latent confounder is present. This interpretive step is load-bearing for the high-recall admissible-arrow filter.

Authors: The current manuscript presents the non-vanishing brackets as geometric indicators of failed visible integrability within the Kan-Do-Calculus bi-adjunction, supporting a high-recall filter rather than a strict equivalence. We will revise §4 to include an explicit discussion of the interpretive scope, potential sources of non-involutivity such as estimation error, and the conditions under which the screen operates in the smooth setting. Revision: partial.

Referee: §5 (experimental evaluation): the claim that both algorithms collapse the space of possible DAGs by many orders of magnitude is stated without reporting the number of observed variables, sample sizes, number of latent confounders, or quantitative comparison against standard baselines such as FCI or NOTEARS; without these controls the practical significance of the reduction cannot be assessed.

Authors: We agree that the experimental claims require these supporting details for proper assessment. The revised version will report the number of observed variables, sample sizes, number of latent confounders, and include direct quantitative comparisons to FCI and NOTEARS. Revision: yes.

Referee: §3.1 (vector-field construction): the definition of the causal vector fields via Radon-Nikodym derivatives between observational and interventional measures is given, but no error analysis or consistency result is supplied showing that the estimated brackets converge to the population quantities under standard regularity conditions; this directly affects the soundness of the Frobenius-residual screen.

Authors: We will add to §3.1 an error analysis together with a consistency result establishing convergence of the estimated brackets to the population quantities under standard regularity conditions (e.g., sufficient smoothness of the densities). This will be stated as a proposition with a sketch of the argument. Revision: yes.

Circularity Check

0 steps flagged

No significant circularity; derivation introduces new algorithms on top of cited prior framework

The provided text (abstract plus context) describes the paper as building new algorithms BRIDGE and SKFM on the consequences of prior KDC work, using Radon-Nikodym derivatives to induce vector fields whose Lie brackets yield Frobenius residuals interpreted as latent-structure signals. No equations or definitions are shown that reduce the target outputs (admissible-arrow filters, spectral factorization) to the inputs by construction. The KDC reference is to prior work establishing a bi-adjunction; the present paper adds geometric screening and amortized flow matching as independent algorithmic content. No self-definitional loop, fitted-input prediction, or load-bearing uniqueness theorem from overlapping authors is exhibited. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review based on abstract only; full details unavailable. The central approach rests on the assumption that smooth statistical settings allow well-defined vector fields from density ratios and that bracket non-closure directly signals latent structure.

axioms (1)

Domain assumption: Kan-Do-Calculus establishes a category-theoretic bi-adjunction between passive observation and active intervention.
The paper states that it builds directly on recent KDC results.

pith-pipeline@v0.9.1-grok · 5791 in / 1083 out tokens · 23948 ms · 2026-06-26T20:35:05.110186+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Infinitesimal Causality
math.CT · 2026-06 · unverdicted · novelty 7.0

Infinitesimal causality is defined via compatibility of categorical and geometric Frobenius structures in Markov categories, with interventions as tangent vectors deforming copy/discard operations and Lie brackets mea...

Reference graph

Works this paper leans on

25 extracted references · 18 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

doi:10.1007/s002110050002. P. J. Bickel, C. A. J. Klaassen, Y. Ritov, and J. A. Wellner. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press,

work page doi:10.1007/s002110050002
[2]

doi:10.1007/s10851-014-0506-3. P. Brouillard, B. Lachapelle, S. Lacoste-Julien, A. Lacoste, and B. Oreshkin. Differentiable causal discovery from interventional data. In Advances in Neural Information Processing Systems,

work page doi:10.1007/s10851-014-0506-3
[3]

doi:10.1111/ectj.12097. D. M. Chickering. Optimal structure identification with greedy equivalence search. Journal of Machine Learning Research, 3:507–554,

work page doi:10.1111/ectj.12097
[5]

URL https://arxiv.org/abs/1510.05468. B. Coecke and A. Kissinger. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press, Cambridge,

Pith/arXiv arXiv
[6]

doi:10.1017/9781316219317. D. Colombo, M. H. Maathuis, M. Kalisch, and T. S. Richardson. Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, 40(1):294–321,

work page doi:10.1017/9781316219317
[7]

doi:10.1214/11-AOS940. M. Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In Advances in Neural Information Processing Systems, volume 26,

work page doi:10.1214/11-aos940
[8]

doi:10.1016/S0004-3702(02)00264-3. C. Glymour, K. Zhang, and P. Spirtes. Review of causal discovery methods based on graphical models. Frontiers in Genetics, 10,

work page doi:10.1016/s0004-3702(02)00264-3
[9]

Frontiers in Genetics

ISSN 1664-8021. doi:10.3389/fgene.2019.00524. URL https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2019.00524. J. Gorham and L. Mackey. Measuring sample quality with kernels. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1292–1301. PMLR,

work page doi:10.3389/fgene.2019.00524 2019
[10]

URL https://www.jmlr.org/papers/v14/hyvarinen13a.html. T. Ikeuchi, M. Ide, Y. Zeng, T. N. Maeda, and S. Shimizu. Python package for causal discovery based on LiNGAM. Journal of Machine Learning Research, 24(14):1–8,

2026
[11]

doi:10.1017/S096012952100027X. S. Lachapelle, P. Brouillard, T. Deleu, and S. Lacoste-Julien. Gradient-based neural dag learning. In International Conference on Learning Representations,

work page doi:10.1017/s096012952100027x
[12]

Lee. Introduction to Smooth Manifolds, volume 218 of Graduate Texts in Mathematics

ISBN 978-1-4419-9982-5. doi:10.1007/978-1-4419-9982-5. Y. Lipman, R. T. Q. Chen, H. Ben-Hamu, M. Nickel, and M. Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations,

work page doi:10.1007/978-1-4419-9982-5
[13]

URL https://proceedings.mlr.press/v161/maeda21a.html. S. Mahadevan. Decentralized causal discovery using judo calculus, 2025a. URL https://arxiv.org/abs/2510.23942. S. Mahadevan. Large causal models from large language models, 2025b. URL https://arxiv.org/abs/2512.07796. S. Mahadevan. Higher algebraic k-theory of causality. Entropy, 27(5), 2025c. ISSN 109...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.3390/e27050531
[14]

ISBN 052189560X. J. Pearl and D. Mackenzie. The Book of Why: The New Science of Cause and Effect. Basic Books, New York,

2026
[15]

doi:10.1016/0024-3795(94)00211-8. D. J. Rezende and S. Mohamed. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 1530–1538. PMLR,

work page doi:10.1016/0024-
[16]

doi:10.1080/01621459.1994.10476818. D. Schmid and A. Sly. On the number and size of markov equivalence classes of random directed acyclic graphs. arXiv:2209.04395v2 [math.PR],

work page doi:10.1080/01621459.1994.10476818 1994
[17]

Submitted 2022; revised

URL https://arxiv.org/abs/2209.04395. Submitted 2022; revised

arXiv 2022
[18]

doi:10.48550/arXiv.2209.04395. S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030,

work page doi:10.48550/arxiv.2209.04395 2003
[19]

doi:10.1063/1.1788852. M. Sugiyama, T. Suzuki, and T. Kanamori. Density Ratio Estimation in Machine Learning. Cambridge University Press,

work page doi:10.1063/1.1788852
[20]

doi:10.1017/CBO9781139035613. T. Tashiro, S. Shimizu, A. Hyvärinen, and T. Washio. ParceLiNGAM: A causal ordering method robust against latent confounders. Neural Computation, 26(1):57–83,

work page doi:10.1017/cbo9781139035613
[21]

doi:10.1162/NECO_a_00533. P. S. Thomas, W. Dabney, S. Giguere, and S. Mahadevan. Projected natural actor-critic. In C. J. C. Burges, L. Bottou, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems

work page doi:10.1162/neco
[22]

Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States, pages 2337–2345,

2013
[24]

URL https://arxiv.org/abs/2506.05202. R. van Belle. Kan Extensions in Probability Theory. PhD thesis, University of Edinburgh,

arXiv
[25]

doi:10.1007/978-1-4419-9782-1. C. Villani. Optimal Transport: Old and New, volume 338 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin,

work page doi:10.1007/978-1-4419-9782-1
[26]

doi:10.1007/978-3-540-71050-9. D. Yeral. Frobenius algebras, factorization homology and the Reshetikhin–Turaev invariants,

work page doi:10.1007/978-3-540-71050-9
[27]

URL https://arxiv.org/abs/2508.16351. X. Zheng, B. Aragam, P. Ravikumar, and E. Xing. DAGs with NO TEARS: Continuous Optimization for Structure Learning. Advances in Neural Information Processing Systems,

arXiv
