pith. sign in

arxiv: 2605.05568 · v1 · submitted 2026-05-07 · 📊 stat.ML · cs.LG

Relaxed Sparsest-Permutation Formulation for Causal Discovery at Scale

Pith reviewed 2026-05-08 06:02 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords causal discoverysparsest permutationincomplete Cholesky factorizationMarkov equivalence classstructural equation modelsprecision matrixcausal structure learningscalability
0
0 comments X

The pith

A relaxed sparsest-permutation method recovers Markov equivalence classes in linear causal models without requiring exact Cholesky factorization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Causal structure learning from observational data remains slow for large numbers of variables. The paper shows that sparsest-permutation approaches for linear structural equation models do not need exact Cholesky factorization to identify the Markov equivalence class. Instead, a support-level relaxation searches over a precision-support screening graph and evaluates candidate orderings with masked zero-fill incomplete Cholesky factorization. This yields population-level soundness guarantees under no-cancellation and sparsest Markov representation assumptions, plus robustness to ordering misspecification. The resulting SCOPE pipeline matches the accuracy of slower methods while scaling to 10,000 variables on synthetic and real data.

Core claim

We revisit sparsest-permutation learning for linear structural equation models and show that exact Cholesky factorization is unnecessary for structure recovery. This observation motivates a support-level relaxation that searches for sparse triangular factors over a precision-support screening graph. The relaxed formulation can be efficiently evaluated via masked zero-fill incomplete Cholesky factorization, enabling scalable comparison of candidate orderings. At the population level, we establish soundness for Markov equivalence class recovery under no-cancellation and sparsest Markov representation assumptions, as well as robustness to ordering misspecification.

What carries the argument

The support-level relaxation of the sparsest-permutation formulation, evaluated via masked zero-fill incomplete Cholesky factorization over a precision-support screening graph.

If this is right

  • The method enables direct comparison of many candidate orderings at scale.
  • Markov equivalence class recovery remains sound at the population level under the stated assumptions.
  • The pipeline scales to 10,000 variables while matching the accuracy of substantially slower baselines.
  • The approach tolerates some misspecification in the input ordering without losing soundness.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The screening graph step may become the new computational bottleneck in even higher dimensions, suggesting room for faster screening heuristics.
  • The relaxation could extend to settings where only partial order information is available from domain knowledge.
  • Similar incomplete factorization tricks might reduce cost in other precision-matrix-based graphical model tasks beyond causal discovery.

Load-bearing premise

The no-cancellation and sparsest Markov representation assumptions must hold for the population-level soundness guarantees on Markov equivalence class recovery.

What would settle it

Run the method on large synthetic linear SEM data that satisfy the no-cancellation and sparsest assumptions and check whether it recovers the true Markov equivalence class as sample size increases, or observe whether runtime and accuracy degrade sharply beyond a few thousand variables.

Figures

Figures reproduced from arXiv: 2605.05568 by Gunwoong Park, Sang-Yun Oh, Sunmin Oh.

Figure 1
Figure 1. Figure 1: Runtime-accuracy tradeoffs for MEC recovery view at source ↗
Figure 2
Figure 2. Figure 2: Nested patterns induced by an ordering: S1 fill view at source ↗
Figure 3
Figure 3. Figure 3: CPDAG F1 versus runtime across problem sizes (bubble size p); × denotes timed-out runs (shaded region). Two one-pass SCOPE results are shown: when given π0 (SCOPE_π0) and when given AMD ordering (SCOPE_AMD). per, we sample bounded in-degree DAGs where each node selects at most d parents among earlier nodes in an ordering. We draw nonzero edge weights uniformly from ±[0.6, 0.8] and sample error variances fr… view at source ↗
Figure 4
Figure 4. Figure 4: DAG, moral graph, and triangulations under rev-topo, min-fill, and min-degree orderings, with sparsity patterns of view at source ↗
Figure 5
Figure 5. Figure 5: CPDAG-skeleton F1. Shaded bands represent ±1 standard error around the mean across 30 replicates view at source ↗
Figure 6
Figure 6. Figure 6: nSHD for CPDAG recovery. Shaded bands represent view at source ↗
read the original abstract

Despite the growing availability of large datasets, causal structure learning remains computationally prohibitive at scale. We revisit sparsest-permutation learning for linear structural equation models and show that exact Cholesky factorization is unnecessary for structure recovery. This observation motivates a support-level relaxation that searches for sparse triangular factors over a precision-support screening graph. The relaxed formulation can be efficiently evaluated via masked zero-fill incomplete Cholesky factorization, enabling scalable comparison of candidate orderings. At the population level, we establish soundness for Markov equivalence class (MEC) recovery under no-cancellation and sparsest Markov representation assumptions, as well as robustness to ordering misspecification. Motivated by these guarantees, we introduce SCOPE, a sparse-Cholesky pipeline that provides a scalable implementation of the relaxed formulation. Experiments on synthetic and real datasets demonstrate that SCOPE matches the MEC recovery accuracy of substantially slower baselines, while achieving significantly reduced runtime and scaling to 10k variables.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper revisits sparsest-permutation causal discovery for linear SEMs and argues that exact Cholesky factorization is unnecessary for structure recovery. It introduces a support-level relaxation that searches for sparse triangular factors over a precision-support screening graph, which is evaluated efficiently via masked zero-fill incomplete Cholesky factorization. This enables the SCOPE pipeline to compare candidate orderings at scale. At the population level the authors claim soundness for Markov equivalence class recovery under no-cancellation and sparsest-Markov-representation assumptions, plus robustness to ordering misspecification. Experiments on synthetic and real data show that SCOPE matches the MEC-recovery accuracy of slower baselines while scaling to 10k variables.

Significance. If the stated assumptions hold, the work supplies a theoretically grounded, computationally tractable alternative to existing sparsest-permutation methods, with the potential to extend causal discovery to datasets an order of magnitude larger than current practical limits. The combination of a provably sound relaxation and an efficient factorization routine is a concrete engineering contribution.

major comments (2)
  1. [Abstract / theoretical results] The population-level soundness guarantee for MEC recovery is explicitly conditioned on the no-cancellation assumption and the sparsest Markov representation assumption (Abstract). Because these conditions are load-bearing for the central theoretical claim, the manuscript should include a dedicated discussion or corollary that characterizes the measure of parameter space on which they fail and the resulting error in permutation selection.
  2. [factorization routine] The equivalence between the masked zero-fill incomplete Cholesky factorization and the exact relaxed objective is asserted to hold under the same assumptions, yet the manuscript provides no explicit statement of the numerical tolerance or fill-in threshold at which this equivalence breaks (the section describing the factorization routine). A short lemma or numerical counter-example would clarify the practical range of validity.
minor comments (2)
  1. The abstract introduces the term 'precision-support screening graph' without a one-sentence definition; a brief parenthetical gloss would improve readability for readers outside the immediate subfield.
  2. [experiments] Table captions and axis labels in the experimental section should explicitly state whether reported accuracies are averaged over multiple random seeds and whether error bars represent standard deviation or standard error.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation for minor revision. We address each major comment below and will incorporate the requested clarifications into the revised manuscript.

read point-by-point responses
  1. Referee: [Abstract / theoretical results] The population-level soundness guarantee for MEC recovery is explicitly conditioned on the no-cancellation assumption and the sparsest Markov representation assumption (Abstract). Because these conditions are load-bearing for the central theoretical claim, the manuscript should include a dedicated discussion or corollary that characterizes the measure of parameter space on which they fail and the resulting error in permutation selection.

    Authors: We agree that a more explicit discussion of the assumptions would strengthen the paper. In the revision we will add a short dedicated paragraph (new Corollary 3.4) in the theoretical results section. The paragraph will note that, for continuous parameter distributions, the no-cancellation condition fails only on a Lebesgue-measure-zero set and that the sparsest-Markov-representation assumption is generic under the linear SEM model; we will also include a brief analytic bound on the permutation-selection error under small additive perturbations that violate the assumptions, together with a one-paragraph simulation illustration. revision: yes

  2. Referee: [factorization routine] The equivalence between the masked zero-fill incomplete Cholesky factorization and the exact relaxed objective is asserted to hold under the same assumptions, yet the manuscript provides no explicit statement of the numerical tolerance or fill-in threshold at which this equivalence breaks (the section describing the factorization routine). A short lemma or numerical counter-example would clarify the practical range of validity.

    Authors: We thank the referee for highlighting this point. The equivalence holds exactly in exact arithmetic whenever the support mask is respected. In the revision we will insert a short remark (new Remark 4.3) in the incomplete-Cholesky subsection that states the default numerical tolerance (relative residual < 1e-12) and supplies a compact numerical counter-example on a 5-variable instance showing the fill-in threshold at which the masked factorization deviates from the exact relaxed objective. revision: yes

Circularity Check

0 steps flagged

No circularity: soundness derived from explicit external assumptions

full rationale

The paper derives population-level soundness for MEC recovery from two stated domain assumptions (no-cancellation and sparsest Markov representation) plus the definition of the relaxed objective. These assumptions are independent of the proposed method and are not shown to be equivalent to the output by construction. The masked zero-fill incomplete Cholesky is presented as an efficient computational proxy for the relaxed objective, not as a redefinition of the target quantity. No self-citation chains, fitted parameters renamed as predictions, or ansatzes smuggled via prior work appear in the provided text. The central claim therefore remains non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on two standard domain assumptions from causal discovery; no free parameters or new invented entities are introduced in the abstract.

axioms (2)
  • domain assumption No-cancellation assumption
    Invoked to establish population-level soundness for MEC recovery.
  • domain assumption Sparsest Markov representation assumption
    Required for the soundness of the relaxed formulation in recovering the Markov equivalence class.

pith-pipeline@v0.9.0 · 5458 in / 1325 out tokens · 79452 ms · 2026-05-08T06:02:21.773640+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

66 extracted references · 66 canonical work pages

  1. [1]

    Nature Biotechnology , year=

    The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells , author=. Nature Biotechnology , year=

  2. [2]

    Nature Methods , year=

    Single-cell RNA-sequencing technologies and applications , author=. Nature Methods , year=

  3. [3]

    Biostatistics , volume =

    Sparse inverse covariance estimation with the graphical lasso , author =. Biostatistics , volume =. 2008 , publisher =

  4. [4]

    Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS) , series =

    Distributionally Robust Formulation and Model Selection for the Graphical Lasso , author =. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS) , series =

  5. [5]

    Stat , volume=

    Family-wise error rate control in Gaussian graphical model selection via distributionally robust optimization , author=. Stat , volume=. 2022 , publisher=

  6. [6]

    Meijerink, J. A. and van der Vorst, H. A. , booktitle =. An iterative solution method for linear systems of which the coefficient matrix is a symmetric. 1977 , publisher =

  7. [7]

    2003 , publisher =

    Iterative Methods for Sparse Linear Systems , author =. 2003 , publisher =

  8. [8]

    Journal of the American Statistical Association , volume =

    Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation , author =. Journal of the American Statistical Association , volume =

  9. [9]

    High-dimensional semiparametric

    Liu, Han and Han, Fang and Yuan, Ming and Lafferty, John and Wasserman, Larry , journal =. High-dimensional semiparametric. 2012 , publisher =

  10. [10]

    The Annals of Statistics , volume =

    High-dimensional covariance estimation by minimizing _1 -penalized log-determinant divergence , author =. The Annals of Statistics , volume =. 2011 , publisher =

  11. [11]

    Journal of Machine Learning Research , year =

    High-Dimensional Learning of Linear Causal Networks via Inverse Covariance Estimation , author =. Journal of Machine Learning Research , year =

  12. [12]

    Stat , volume =

    Learning directed acyclic graph models based on sparsest permutations , author =. Stat , volume =. 2018 , publisher =

  13. [13]

    Biometrika , volume =

    Joint mean--covariance models with applications to longitudinal data: unconstrained parameterisation , author =. Biometrika , volume =. 1999 , publisher =

  14. [14]

    , booktitle =

    Chickering, David M. , booktitle =. Learning. 1996 , publisher =

  15. [15]

    SIAM Review , volume =

    The Evolution of the Minimum Degree Ordering Algorithm , author =. SIAM Review , volume =

  16. [16]

    SIAM Journal on Matrix Analysis and Applications , volume =

    An Approximate Minimum Degree Ordering Algorithm , author =. SIAM Journal on Matrix Analysis and Applications , volume =

  17. [17]

    and Gilbert, John R

    Berman, Piotr and Garay, Juan A. and Gilbert, John R. , journal =. On the Performance of the Minimum Degree Ordering for

  18. [18]

    Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI) , series =

    Greedy Relaxations of the Sparsest Permutation Algorithm , author =. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI) , series =. 2022 , publisher =

  19. [19]

    Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI) , pages =

    Equivalence and Synthesis of Causal Models , author =. Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI) , pages =

  20. [20]

    and Madigan, David and Perlman, Michael D

    Andersson, Steen A. and Madigan, David and Perlman, Michael D. , journal =. A Characterization of

  21. [21]

    SIAM Journal on Computing , volume =

    Algorithmic Aspects of Vertex Elimination on Graphs , author =. SIAM Journal on Computing , volume =

  22. [22]

    2000 , edition =

    Causation, Prediction, and Search , author =. 2000 , edition =

  23. [23]

    Estimating High-Dimensional Directed Acyclic Graphs with the

    Kalisch, Markus and B. Estimating High-Dimensional Directed Acyclic Graphs with the. Journal of Machine Learning Research , volume =

  24. [24]

    Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI) , pages =

    Optimal Structure Identification with Greedy Search , author =. Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI) , pages =

  25. [25]

    and Aliferis, Constantin F

    Tsamardinos, Ioannis and Brown, Laura E. and Aliferis, Constantin F. , journal =. The

  26. [26]

    Efficient Permutation Discovery in Causal

    Squires, Chandler and Amaniampong, Joshua and Uhler, Caroline , journal =. Efficient Permutation Discovery in Causal

  27. [27]

    , booktitle =

    Zheng, Xun and Aragam, Bryon and Ravikumar, Pradeep and Xing, Eric P. , booktitle =. DAGs with

  28. [28]

    On the Role of Sparsity and

    Ng, Ignavier and Ghassami, AmirEmad and Zhang, Kun , booktitle =. On the Role of Sparsity and. 2020 , pages =

  29. [29]

    Journal of Data Science , volume =

    FROSTY: A High-Dimensional Scale-Free Bayesian Network Learning Method , author =. Journal of Data Science , volume =. 2023 , doi =

  30. [30]

    Nature , volume =

    The Large-Scale Organization of Metabolic Networks , author =. Nature , volume =. 2000 , doi =

  31. [31]

    Nature Reviews Genetics , volume =

    Network biology: understanding the cell's functional organization , author =. Nature Reviews Genetics , volume =. 2004 , doi =

  32. [32]

    SIAM Review , volume =

    The Structure and Function of Complex Networks , author =. SIAM Review , volume =. 2003 , doi =

  33. [33]

    The Annals of Statistics , volume =

    Learning high-dimensional directed acyclic graphs with latent and selection variables , author =. The Annals of Statistics , volume =. 2012 , doi =

  34. [34]

    Davis , title =

    Timothy A. Davis , title =

  35. [35]

    1981 , publisher =

    Computer Solution of Large Sparse Positive Definite Systems , author =. 1981 , publisher =

  36. [36]

    Nature Reviews Neuroscience , volume =

    Complex brain networks: graph theoretical analysis of structural and functional systems , author =. Nature Reviews Neuroscience , volume =. 2009 , doi =

  37. [37]

    IEEE transactions on pattern analysis and machine intelligence , volume=

    Optimizing regularized cholesky score for order-based learning of bayesian networks , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2020 , publisher=

  38. [38]

    Biometrika , volume=

    Consistency guarantees for greedy permutation-based causal inference algorithms , author=. Biometrika , volume=. 2021 , publisher=

  39. [39]

    International Conference on Artificial Intelligence and Statistics , pages=

    Learning linear structural equation models in polynomial time and sample complexity , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2018 , organization=

  40. [40]

    The adaptive and the thresholded

    van de Geer, Sara and B. The adaptive and the thresholded. Electronic Journal of Statistics , year=

  41. [41]

    Bernoulli , volume=

    Least squares after model selection in high-dimensional sparse models , author=. Bernoulli , volume=

  42. [42]

    SIAM Journal on Scientific computing , volume=

    Incomplete Cholesky factorizations with limited memory , author=. SIAM Journal on Scientific computing , volume=. 1999 , publisher=

  43. [43]

    Econometrica , volume =

    Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain , author =. Econometrica , volume =

  44. [44]

    Review of Economic Studies , volume =

    Inference on Treatment Effects after Selection among High-Dimensional Controls , author =. Review of Economic Studies , volume =

  45. [45]

    Annals of Statistics , volume =

    High Dimensional Variable Selection , author =. Annals of Statistics , volume =

  46. [46]

    Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI-05) , pages =

    Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks , author =. Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI-05) , pages =. 2005 , publisher =

  47. [47]

    and Dhillon, Inderjit S

    Hsieh, Cho-Jui and Sustik, Matyas A. and Dhillon, Inderjit S. and Ravikumar, Pradeep , title =. Journal of Machine Learning Research , volume =

  48. [48]

    and Dhillon, Inderjit S

    Hsieh, Cho-Jui and Sustik, Matyas A. and Dhillon, Inderjit S. and Ravikumar, Pradeep and Poldrack, Russell A. , title =. Advances in Neural Information Processing Systems 26 , pages =

  49. [49]

    Journal of Applied Probability , year =

    Robust Wasserstein Profile Inference and Applications to Machine Learning , author =. Journal of Applied Probability , year =

  50. [50]

    Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI) , pages =

    Permutation-Based Causal Structure Learning with Unknown Intervention Targets , author =. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI) , pages =. 2020 , editor =

  51. [51]

    Proceedings of the First Conference on Causal Learning and Reasoning , pages =

    Causal Structure Discovery between Clusters of Nodes Induced by Latent Factors , author =. Proceedings of the First Conference on Causal Learning and Reasoning , pages =. 2022 , editor =

  52. [52]

    Proceedings of the First Conference on Causal Learning and Reasoning , pages =

    Causal Discovery in Linear Structural Causal Models with Deterministic Relations , author =. Proceedings of the First Conference on Causal Learning and Reasoning , pages =. 2022 , editor =

  53. [53]

    Submitted to International Conference on Learning Representations (ICLR) 2024 , year =

    Causal Structure Learning Supervised by Large Language Model , author =. Submitted to International Conference on Learning Representations (ICLR) 2024 , year =

  54. [54]

    Gene Ontology: tool for the unification of biology

    Ashburner, Michael and Ball, Catherine A. and Blake, Judith A. and Botstein, David and Butler, Heather and Cherry, J. Michael and Davis, Allan P. and Dolinski, Kara and Dwight, Selina S. and Eppig, Janan T. and Harris, Midori A. and Hill, David P. and. Gene. Nature Genetics , volume =. doi:10.1038/75556 , urldate =

  55. [55]

    doi:10.1093/nar/gkq1156 , urldate =

    Kamburov, Atanas and Pentchev, Konstantin and Galicka, Hanna and Wierling, Christoph and Lehrach, Hans and Herwig, Ralf , year = 2011, month = jan, journal =. doi:10.1093/nar/gkq1156 , urldate =

  56. [56]

    Koh, Hiromi W. L. and Fermin, Damian and Vogel, Christine and Choi, Kwok Pui and Ewing, Rob M. and Choi, Hyungwon , year = 2019, month = jul, journal =. doi:10.1038/s41540-019-0099-y , urldate =

  57. [57]

    and Quackenbush, J.F

    Parker, Joel S. and Mullins, Michael and Cheang, Maggie C.U. and Leung, Samuel and Voduc, David and Vickery, Tammi and Davies, Sherri and Fauron, Christiane and He, Xiaping and Hu, Zhiyuan and Quackenbush, John F. and Stijleman, Inge J. and Palazzo, Juan and Marron, J.S. and Nobel, Andrew B. and Mardis, Elaine and Nielsen, Torsten O. and Ellis, Matthew J....

  58. [58]

    Mitchell A

    Comprehensive Molecular Portraits of Human Breast Tumours , author =. Nature , volume =. doi:10.1038/nature11412 , urldate =

  59. [59]

    Causal Inference Using Graphical Models with the

    Kalisch, Markus and M. Causal Inference Using Graphical Models with the. Journal of Statistical Software , year =

  60. [60]

    Zhang, Keli and Zhu, Shengyu and Kalander, Marcus and Ng, Ignavier and Ye, Junjian and Chen, Zhitang and Pan, Lujia , year =

  61. [61]

    Squires, Chandler , year =

  62. [62]

    Sociological Methods & Research , year =

    Spirtes, Peter and Richardson, Thomas and Meek, Christopher and Scheines, Richard and Glymour, Clark , title =. Sociological Methods & Research , year =

  63. [63]

    2009 , isbn =

    Pearl, Judea , title =. 2009 , isbn =

  64. [64]

    Graphical Models , author =

  65. [65]

    Large-scale Sparse Inverse Covariance Matrix Estimation , journal =

    Bollh. Large-scale Sparse Inverse Covariance Matrix Estimation , journal =. 2019 , volume =

  66. [66]

    and Cisneros-Velarde, Pedro and Petersen, Alexander and Oh, Sang-Yun , title =

    Tran, C. and Cisneros-Velarde, Pedro and Petersen, Alexander and Oh, Sang-Yun , title =. 2020 , howpublished =