pith. sign in

arxiv: 2605.17465 · v1 · pith:YWOXCNDVnew · submitted 2026-05-17 · 💻 cs.LG

TriOpt: A Scalable Algorithm for Linear Causal Discovery

Pith reviewed 2026-05-20 14:59 UTC · model grok-4.3

classification 💻 cs.LG
keywords linear causal discoverytopological orderingconvex optimizationDAG learningscalabilitySherman-Morrisonacyclicity constraints
0
0 comments X

The pith

TriOpt recovers the exact linear DAG by first estimating its topological ordering with rank-1 downdates and then solving a convex problem without acyclicity constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TriOpt, a two-stage method for linear causal discovery that addresses the super-exponential growth of the graph search space. The first stage recovers a topological ordering from observational data by applying the Sherman-Morrison rank-1 downdate to the additive structure of linear kernels. With the ordering in hand, the second stage casts structure learning as a convex continuous optimization task that needs no separate acyclicity penalty because the order already rules out cycles. The authors prove that this procedure exactly recovers the underlying linear DAG whenever the supplied ordering is correct. The resulting algorithm runs orders of magnitude faster than prior linear methods while preserving accuracy on synthetic, semi-synthetic, and real data.

Core claim

TriOpt decomposes linear causal discovery into two stages. First, it recovers the topological ordering by exploiting the Sherman-Morrison rank-1 downdate together with the additive structure of linear kernels. Second, given this ordering, it reformulates structure learning as a convex continuous optimization problem that avoids enforcing acyclicity constraints. The authors show that, under the true ordering, TriOpt exactly recovers the underlying linear DAG.

What carries the argument

Two-stage decomposition that uses Sherman-Morrison rank-1 downdates for ordering recovery and then solves a convex program without acyclicity enforcement.

If this is right

  • Whenever the true ordering is supplied, TriOpt returns exactly the edge coefficients of the linear DAG.
  • The absence of an acyclicity term in the second stage removes a major computational bottleneck that limits other continuous methods.
  • The ordering step scales linearly with the number of variables because each downdate is rank-1.
  • The overall procedure therefore extends practical causal discovery to regimes where exhaustive or penalty-based search becomes infeasible.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the ordering estimator can be made robust to modest noise, the same split could be reused as a fast preprocessor for other structure-learning algorithms.
  • The separation of ordering from coefficient estimation may simplify theoretical analysis for mildly nonlinear or heteroscedastic models if analogous kernel identities exist.
  • In applied settings one could run the fast ordering step on subsamples to quantify uncertainty before committing to the convex solve.

Load-bearing premise

The first stage must correctly recover the true topological ordering from the observational data by using the Sherman-Morrison rank-1 downdate on linear kernels.

What would settle it

A linear Gaussian dataset whose ground-truth ordering is known in advance, yet TriOpt returns a different ordering and therefore a different graph from the true DAG.

Figures

Figures reproduced from arXiv: 2605.17465 by Elena Zheleva, Rafat Ashraf Joy.

Figure 1
Figure 1. Figure 1: Performance comparison across graph types and degrees under [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Results on Nonlinear Data. TriOpt stays robust, delivering accuracy that is better (exponen [PITH_FULL_IMAGE:figures/full_fig_p018_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Normalized SHD (Lower Better) vs α. TriOpt consistently demonstrates stronger robustness across the entire weight range than the baseline methods 19 [PITH_FULL_IMAGE:figures/full_fig_p019_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Robustness of the models with increase in degree. Both versions of TriOpt keep up the best [PITH_FULL_IMAGE:figures/full_fig_p020_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Comparison of Ordering time and Peak Memory usage by 2 variants of TriOpt and CaPS: [PITH_FULL_IMAGE:figures/full_fig_p022_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Comparison of Ordering time and Peak Memory usage by 2 variants of TriOpt on 5000 [PITH_FULL_IMAGE:figures/full_fig_p022_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Comparison of Order Divergence by 2 variants of TriOpt and CaPS on 2000 samples. TriOpt [PITH_FULL_IMAGE:figures/full_fig_p023_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Comparison of Order Divergence by 2 variants of TriOpt on 5000 samples. There is [PITH_FULL_IMAGE:figures/full_fig_p024_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Performance comparison on 5000 samples. Results are shown for (a) runtime, (b) structural [PITH_FULL_IMAGE:figures/full_fig_p025_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Performance comparison on 2000 samples. Results are shown for (a) runtime, (b) structural [PITH_FULL_IMAGE:figures/full_fig_p026_10.png] view at source ↗
read the original abstract

Learning causal relations from observational data is challenging because the graph search space grows super-exponentially with the number of variables. Ordering-based methods reduce this space by first identifying the topological ordering, whereas continuous optimization methods explore most likely regions of the space by casting DAG learning as a differentiable objective with an acyclicity constraint. Despite their conceptual appeal, both paradigms face significant scalability limitations in high-dimensional settings, restricting their practical applicability. In this work, we introduce a new formulation for linear causal discovery that tightly integrates these two paradigms to achieve substantial gains in scalability without sacrificing accuracy. Our approach, TriOpt, decomposes the problem into two efficient stages. First, it recovers the topological ordering by exploiting the Sherman-Morrison rank-1 downdate together with the additive structure of linear kernels, enabling fast and scalable ordering estimation. Second, given this ordering, we reformulate structure learning as a convex continuous optimization problem that entirely avoids the need for enforcing costly acyclicity constraints. We theoretically show that, under the true ordering, TriOpt exactly recovers the underlying linear DAG. Empirically, across synthetic, semi-synthetic, and real-world datasets, TriOpt achieves orders-of-magnitude speedups over state-of-the-art linear causal discovery methods in high-dimensional regimes, while maintaining comparable or superior accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces TriOpt, a two-stage algorithm for linear causal discovery from observational data. The first stage recovers the topological ordering by exploiting the Sherman-Morrison rank-1 downdate together with the additive structure of linear kernels. The second stage reformulates structure learning as an unconstrained convex continuous optimization problem given the ordering, entirely avoiding acyclicity constraints. The authors theoretically show that, under the true ordering, TriOpt exactly recovers the underlying linear DAG. Empirically, it reports orders-of-magnitude speedups over state-of-the-art methods on synthetic, semi-synthetic, and real-world datasets while maintaining comparable accuracy.

Significance. If the conditional exact-recovery result holds and the ordering stage is reliable under standard linear SEM assumptions, this work offers a meaningful advance in scalable causal discovery by combining ordering-based reduction with convex optimization to sidestep costly acyclicity enforcement. The use of rank-1 downdates for efficient ordering estimation and the parameter-free theoretical guarantee (conditional on ordering) are notable strengths that could enable application to higher-dimensional problems.

major comments (1)
  1. [Theoretical Analysis (around the main recovery theorem)] The central theoretical guarantee (exact recovery of the linear DAG) is explicitly conditional on the true ordering being recovered in stage one. While this is clearly stated, the manuscript would benefit from additional analysis or bounds characterizing the success probability of the ordering estimator under the linear SEM assumptions (e.g., minimum edge strength or noise variance), as this step is load-bearing for the practical applicability of the overall claim.
minor comments (2)
  1. [Experiments] The experimental section would be strengthened by including more precise details on hyperparameter selection for baselines and the exact procedure for generating semi-synthetic ground-truth DAGs and data.
  2. [Method (ordering recovery subsection)] Notation for the linear kernel and the downdate formula could be clarified with an explicit equation reference in the ordering stage description to improve readability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the positive assessment and recommendation for minor revision. We address the major comment below.

read point-by-point responses
  1. Referee: The central theoretical guarantee (exact recovery of the linear DAG) is explicitly conditional on the true ordering being recovered in stage one. While this is clearly stated, the manuscript would benefit from additional analysis or bounds characterizing the success probability of the ordering estimator under the linear SEM assumptions (e.g., minimum edge strength or noise variance), as this step is load-bearing for the practical applicability of the overall claim.

    Authors: We appreciate the referee's suggestion regarding the ordering stage. Our theoretical result establishes exact recovery of the linear DAG under the assumption that the true topological ordering has been recovered, which is a deliberate design choice to isolate the contribution of the convex optimization stage and avoid acyclicity constraints. Deriving explicit probabilistic bounds on the success of the Sherman-Morrison-based ordering estimator would require additional assumptions (such as minimum edge strengths or bounds on noise variance) that are not part of the standard linear SEM setup used in the paper and could restrict the generality of the claims. We have instead provided extensive empirical evidence across synthetic, semi-synthetic, and real-world datasets demonstrating reliable ordering recovery in high-dimensional regimes. We therefore do not plan to incorporate new theoretical bounds on the ordering estimator in the revision. revision: no

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained conditional result

full rationale

The paper presents a two-stage algorithm where the second stage's exact-recovery theorem is explicitly conditioned on receiving the true topological ordering from the first stage. This is a standard conditional identifiability claim under linear SEM assumptions (additive noise, known covariance) and does not reduce the final DAG estimate to a fitted quantity or self-citation by construction. The first-stage ordering recovery via Sherman-Morrison rank-1 downdates on linear kernels is described as an independent algorithmic step without equations that define the recovered ordering in terms of the downstream DAG parameters. No load-bearing self-citations, uniqueness theorems imported from prior author work, or ansatzes smuggled via citation are visible in the abstract or reader's summary. The central claim therefore remains non-circular and externally falsifiable given the ordering.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the standard linear SEM assumption and on the claim that the ordering stage works via Sherman-Morrison updates; no new entities are introduced and no free parameters are explicitly fitted in the abstract description.

axioms (2)
  • domain assumption Observational data is generated by a linear structural equation model whose graph is a DAG.
    This is the foundational modeling assumption for all linear causal discovery methods referenced in the abstract.
  • domain assumption The topological ordering can be recovered accurately by exploiting the Sherman-Morrison rank-1 downdate and the additive structure of linear kernels.
    This premise is required for the first stage and for the subsequent exact-recovery guarantee to apply.

pith-pipeline@v0.9.0 · 5759 in / 1429 out tokens · 45116 ms · 2026-05-20T14:59:37.743228+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages

  1. [1]

    The Annals of Statistics , volume=

    CAM: Causal additive models, high-dimensional order search and penalized regression , author=. The Annals of Statistics , volume=. 2014 , publisher=

  2. [2]

    Advances in Neural Information Processing Systems , volume=

    On the role of sparsity and dag constraints for learning linear dags , author=. Advances in Neural Information Processing Systems , volume=

  3. [3]

    ACM Trans

    Xu, Shuyuan and Xu, Da and Korpeoglu, Evren and Kumar, Sushant and Guo, Stephen and Achan, Kannan and Zhang, Yongfeng , title =. ACM Trans. Recomm. Syst. , month = oct, articleno =. 2024 , issue_date =. doi:10.1145/3680296 , abstract =

  4. [4]

    Molecular Genetics & Genomic Medicine , volume=

    A review of causal discovery methods for molecular network analysis , author=. Molecular Genetics & Genomic Medicine , volume=. 2022 , publisher=

  5. [5]

    International Conference on Learning Representations , year =

    Gradient Estimators for Implicit Models , author =. International Conference on Learning Representations , year =

  6. [6]

    Proceedings of the sixth Berkeley symposium on mathematical statistics and probability, volume 2: Probability theory , volume=

    A bound for the error in the normal approximation to the distribution of a sum of dependent random variables , author=. Proceedings of the sixth Berkeley symposium on mathematical statistics and probability, volume 2: Probability theory , volume=. 1972 , organization=

  7. [7]

    Annals of Mathematical Statistics , year=

    Adjustment of an Inverse Matrix Corresponding to a Change in One Element of a Given Matrix , author=. Annals of Mathematical Statistics , year=

  8. [8]

    Advances in Neural Information Processing Systems , volume=

    Ordering-based causal discovery for linear and nonlinear relations , author=. Advances in Neural Information Processing Systems , volume=

  9. [9]

    Proceedings of the 39th International Conference on Machine Learning , pages =

    Score Matching Enables Causal Discovery of Nonlinear Additive Noise Models , author =. Proceedings of the 39th International Conference on Machine Learning , pages =. 2022 , editor =

  10. [10]

    Science , volume=

    Causal protein-signaling networks derived from multiparameter single-cell data , author=. Science , volume=. 2005 , publisher=

  11. [11]

    International Journal of Epidemiology , volume =

    Rodrigues, Daniela and Kreif, Noemi and Lawrence-Jones, Anna and Barahona, Mauricio and Mayer, Erik , title =. International Journal of Epidemiology , volume =. 2022 , month =. doi:10.1093/ije/dyac135 , url =

  12. [12]

    2021 , eprint=

    gCastle: A Python Toolbox for Causal Discovery , author=. 2021 , eprint=

  13. [13]

    Advances in neural information processing systems , volume=

    Dags with no tears: Continuous optimization for structure learning , author=. Advances in neural information processing systems , volume=

  14. [14]

    The Eleventh International Conference on Learning Representations , year=

    Diffusion Models for Causal Discovery via Topological Ordering , author=. The Eleventh International Conference on Learning Representations , year=

  15. [15]

    Advances in Neural Information Processing Systems , volume=

    Dagma: Learning dags via m-matrices and a log-determinant acyclicity characterization , author=. Advances in Neural Information Processing Systems , volume=

  16. [16]

    The Annals of Statistics , number =

    Peter B. The Annals of Statistics , number =. 2014 , doi =

  17. [17]

    The annals of Statistics , pages=

    Estimation of the mean of a multivariate normal distribution , author=. The annals of Statistics , pages=. 1981 , publisher=

  18. [18]

    2024 , doi =

    Leonard Henckel and Theo Würtzen and Sebastian Weichwald , booktitle =. 2024 , doi =

  19. [19]

    Estimation of Non-Normalized Statistical Models by Score Matching , journal =

    Aapo Hyv. Estimation of Non-Normalized Statistical Models by Score Matching , journal =. 2005 , volume =

  20. [20]

    2007 , institution=

    Exploring network structure, dynamics, and function using NetworkX , author=. 2007 , institution=

  21. [21]

    Complex syst , volume=

    The igraph software , author=. Complex syst , volume=

  22. [22]

    SIAM Journal on Matrix Analysis and Applications , volume=

    The scaling and squaring method for the matrix exponential revisited , author=. SIAM Journal on Matrix Analysis and Applications , volume=. 2005 , publisher=

  23. [23]

    BMC bioinformatics , volume=

    SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms , author=. BMC bioinformatics , volume=. 2006 , publisher=

  24. [24]

    , author=

    A linear non-Gaussian acyclic model for causal discovery. , author=. Journal of Machine Learning Research , volume=

  25. [25]

    Advances in neural information processing systems , volume=

    Nonlinear causal discovery with additive noise models , author=. Advances in neural information processing systems , volume=

  26. [26]

    Conference on Causal Learning and Reasoning , pages=

    Causal discovery for non-stationary non-linear time series data using just-in-time modeling , author=. Conference on Causal Learning and Reasoning , pages=. 2023 , organization=

  27. [27]

    BMC systems biology , volume=

    From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data , author=. BMC systems biology , volume=. 2007 , publisher=

  28. [28]

    Scutari , journal =

    M. Scutari , journal =. 2010 , volume =

  29. [29]

    GW Stewart: Selected Works with Commentaries , pages=

    Updating and Downdating Matrix Decompositions , author=. GW Stewart: Selected Works with Commentaries , pages=. 2010 , publisher=