Relaxed Sparsest-Permutation Formulation for Causal Discovery at Scale
Pith reviewed 2026-05-08 06:02 UTC · model grok-4.3
The pith
A relaxed sparsest-permutation method recovers Markov equivalence classes in linear causal models without requiring exact Cholesky factorization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We revisit sparsest-permutation learning for linear structural equation models and show that exact Cholesky factorization is unnecessary for structure recovery. This observation motivates a support-level relaxation that searches for sparse triangular factors over a precision-support screening graph. The relaxed formulation can be efficiently evaluated via masked zero-fill incomplete Cholesky factorization, enabling scalable comparison of candidate orderings. At the population level, we establish soundness for Markov equivalence class recovery under no-cancellation and sparsest Markov representation assumptions, as well as robustness to ordering misspecification.
What carries the argument
The support-level relaxation of the sparsest-permutation formulation, evaluated via masked zero-fill incomplete Cholesky factorization over a precision-support screening graph.
If this is right
- The method enables direct comparison of many candidate orderings at scale.
- Markov equivalence class recovery remains sound at the population level under the stated assumptions.
- The pipeline scales to 10,000 variables while matching the accuracy of substantially slower baselines.
- The approach tolerates some misspecification in the input ordering without losing soundness.
Where Pith is reading between the lines
- The screening graph step may become the new computational bottleneck in even higher dimensions, suggesting room for faster screening heuristics.
- The relaxation could extend to settings where only partial order information is available from domain knowledge.
- Similar incomplete factorization tricks might reduce cost in other precision-matrix-based graphical model tasks beyond causal discovery.
Load-bearing premise
The no-cancellation and sparsest Markov representation assumptions must hold for the population-level soundness guarantees on Markov equivalence class recovery.
What would settle it
Run the method on large synthetic linear SEM data that satisfy the no-cancellation and sparsest assumptions and check whether it recovers the true Markov equivalence class as sample size increases, or observe whether runtime and accuracy degrade sharply beyond a few thousand variables.
Figures
read the original abstract
Despite the growing availability of large datasets, causal structure learning remains computationally prohibitive at scale. We revisit sparsest-permutation learning for linear structural equation models and show that exact Cholesky factorization is unnecessary for structure recovery. This observation motivates a support-level relaxation that searches for sparse triangular factors over a precision-support screening graph. The relaxed formulation can be efficiently evaluated via masked zero-fill incomplete Cholesky factorization, enabling scalable comparison of candidate orderings. At the population level, we establish soundness for Markov equivalence class (MEC) recovery under no-cancellation and sparsest Markov representation assumptions, as well as robustness to ordering misspecification. Motivated by these guarantees, we introduce SCOPE, a sparse-Cholesky pipeline that provides a scalable implementation of the relaxed formulation. Experiments on synthetic and real datasets demonstrate that SCOPE matches the MEC recovery accuracy of substantially slower baselines, while achieving significantly reduced runtime and scaling to 10k variables.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper revisits sparsest-permutation causal discovery for linear SEMs and argues that exact Cholesky factorization is unnecessary for structure recovery. It introduces a support-level relaxation that searches for sparse triangular factors over a precision-support screening graph, which is evaluated efficiently via masked zero-fill incomplete Cholesky factorization. This enables the SCOPE pipeline to compare candidate orderings at scale. At the population level the authors claim soundness for Markov equivalence class recovery under no-cancellation and sparsest-Markov-representation assumptions, plus robustness to ordering misspecification. Experiments on synthetic and real data show that SCOPE matches the MEC-recovery accuracy of slower baselines while scaling to 10k variables.
Significance. If the stated assumptions hold, the work supplies a theoretically grounded, computationally tractable alternative to existing sparsest-permutation methods, with the potential to extend causal discovery to datasets an order of magnitude larger than current practical limits. The combination of a provably sound relaxation and an efficient factorization routine is a concrete engineering contribution.
major comments (2)
- [Abstract / theoretical results] The population-level soundness guarantee for MEC recovery is explicitly conditioned on the no-cancellation assumption and the sparsest Markov representation assumption (Abstract). Because these conditions are load-bearing for the central theoretical claim, the manuscript should include a dedicated discussion or corollary that characterizes the measure of parameter space on which they fail and the resulting error in permutation selection.
- [factorization routine] The equivalence between the masked zero-fill incomplete Cholesky factorization and the exact relaxed objective is asserted to hold under the same assumptions, yet the manuscript provides no explicit statement of the numerical tolerance or fill-in threshold at which this equivalence breaks (the section describing the factorization routine). A short lemma or numerical counter-example would clarify the practical range of validity.
minor comments (2)
- The abstract introduces the term 'precision-support screening graph' without a one-sentence definition; a brief parenthetical gloss would improve readability for readers outside the immediate subfield.
- [experiments] Table captions and axis labels in the experimental section should explicitly state whether reported accuracies are averaged over multiple random seeds and whether error bars represent standard deviation or standard error.
Simulated Author's Rebuttal
We thank the referee for the positive evaluation and the recommendation for minor revision. We address each major comment below and will incorporate the requested clarifications into the revised manuscript.
read point-by-point responses
-
Referee: [Abstract / theoretical results] The population-level soundness guarantee for MEC recovery is explicitly conditioned on the no-cancellation assumption and the sparsest Markov representation assumption (Abstract). Because these conditions are load-bearing for the central theoretical claim, the manuscript should include a dedicated discussion or corollary that characterizes the measure of parameter space on which they fail and the resulting error in permutation selection.
Authors: We agree that a more explicit discussion of the assumptions would strengthen the paper. In the revision we will add a short dedicated paragraph (new Corollary 3.4) in the theoretical results section. The paragraph will note that, for continuous parameter distributions, the no-cancellation condition fails only on a Lebesgue-measure-zero set and that the sparsest-Markov-representation assumption is generic under the linear SEM model; we will also include a brief analytic bound on the permutation-selection error under small additive perturbations that violate the assumptions, together with a one-paragraph simulation illustration. revision: yes
-
Referee: [factorization routine] The equivalence between the masked zero-fill incomplete Cholesky factorization and the exact relaxed objective is asserted to hold under the same assumptions, yet the manuscript provides no explicit statement of the numerical tolerance or fill-in threshold at which this equivalence breaks (the section describing the factorization routine). A short lemma or numerical counter-example would clarify the practical range of validity.
Authors: We thank the referee for highlighting this point. The equivalence holds exactly in exact arithmetic whenever the support mask is respected. In the revision we will insert a short remark (new Remark 4.3) in the incomplete-Cholesky subsection that states the default numerical tolerance (relative residual < 1e-12) and supplies a compact numerical counter-example on a 5-variable instance showing the fill-in threshold at which the masked factorization deviates from the exact relaxed objective. revision: yes
Circularity Check
No circularity: soundness derived from explicit external assumptions
full rationale
The paper derives population-level soundness for MEC recovery from two stated domain assumptions (no-cancellation and sparsest Markov representation) plus the definition of the relaxed objective. These assumptions are independent of the proposed method and are not shown to be equivalent to the output by construction. The masked zero-fill incomplete Cholesky is presented as an efficient computational proxy for the relaxed objective, not as a redefinition of the target quantity. No self-citation chains, fitted parameters renamed as predictions, or ansatzes smuggled via prior work appear in the provided text. The central claim therefore remains non-circular.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption No-cancellation assumption
- domain assumption Sparsest Markov representation assumption
Reference graph
Works this paper leans on
-
[1]
The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells , author=. Nature Biotechnology , year=
-
[2]
Single-cell RNA-sequencing technologies and applications , author=. Nature Methods , year=
-
[3]
Sparse inverse covariance estimation with the graphical lasso , author =. Biostatistics , volume =. 2008 , publisher =
work page 2008
-
[4]
Distributionally Robust Formulation and Model Selection for the Graphical Lasso , author =. Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (AISTATS) , series =
-
[5]
Family-wise error rate control in Gaussian graphical model selection via distributionally robust optimization , author=. Stat , volume=. 2022 , publisher=
work page 2022
-
[6]
Meijerink, J. A. and van der Vorst, H. A. , booktitle =. An iterative solution method for linear systems of which the coefficient matrix is a symmetric. 1977 , publisher =
work page 1977
-
[7]
Iterative Methods for Sparse Linear Systems , author =. 2003 , publisher =
work page 2003
-
[8]
Journal of the American Statistical Association , volume =
Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation , author =. Journal of the American Statistical Association , volume =
-
[9]
High-dimensional semiparametric
Liu, Han and Han, Fang and Yuan, Ming and Lafferty, John and Wasserman, Larry , journal =. High-dimensional semiparametric. 2012 , publisher =
work page 2012
-
[10]
The Annals of Statistics , volume =
High-dimensional covariance estimation by minimizing _1 -penalized log-determinant divergence , author =. The Annals of Statistics , volume =. 2011 , publisher =
work page 2011
-
[11]
Journal of Machine Learning Research , year =
High-Dimensional Learning of Linear Causal Networks via Inverse Covariance Estimation , author =. Journal of Machine Learning Research , year =
-
[12]
Learning directed acyclic graph models based on sparsest permutations , author =. Stat , volume =. 2018 , publisher =
work page 2018
-
[13]
Joint mean--covariance models with applications to longitudinal data: unconstrained parameterisation , author =. Biometrika , volume =. 1999 , publisher =
work page 1999
- [14]
-
[15]
The Evolution of the Minimum Degree Ordering Algorithm , author =. SIAM Review , volume =
-
[16]
SIAM Journal on Matrix Analysis and Applications , volume =
An Approximate Minimum Degree Ordering Algorithm , author =. SIAM Journal on Matrix Analysis and Applications , volume =
-
[17]
Berman, Piotr and Garay, Juan A. and Gilbert, John R. , journal =. On the Performance of the Minimum Degree Ordering for
-
[18]
Greedy Relaxations of the Sparsest Permutation Algorithm , author =. Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI) , series =. 2022 , publisher =
work page 2022
-
[19]
Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI) , pages =
Equivalence and Synthesis of Causal Models , author =. Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI) , pages =
-
[20]
and Madigan, David and Perlman, Michael D
Andersson, Steen A. and Madigan, David and Perlman, Michael D. , journal =. A Characterization of
-
[21]
SIAM Journal on Computing , volume =
Algorithmic Aspects of Vertex Elimination on Graphs , author =. SIAM Journal on Computing , volume =
- [22]
-
[23]
Estimating High-Dimensional Directed Acyclic Graphs with the
Kalisch, Markus and B. Estimating High-Dimensional Directed Acyclic Graphs with the. Journal of Machine Learning Research , volume =
-
[24]
Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI) , pages =
Optimal Structure Identification with Greedy Search , author =. Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI) , pages =
-
[25]
Tsamardinos, Ioannis and Brown, Laura E. and Aliferis, Constantin F. , journal =. The
-
[26]
Efficient Permutation Discovery in Causal
Squires, Chandler and Amaniampong, Joshua and Uhler, Caroline , journal =. Efficient Permutation Discovery in Causal
-
[27]
Zheng, Xun and Aragam, Bryon and Ravikumar, Pradeep and Xing, Eric P. , booktitle =. DAGs with
-
[28]
Ng, Ignavier and Ghassami, AmirEmad and Zhang, Kun , booktitle =. On the Role of Sparsity and. 2020 , pages =
work page 2020
-
[29]
Journal of Data Science , volume =
FROSTY: A High-Dimensional Scale-Free Bayesian Network Learning Method , author =. Journal of Data Science , volume =. 2023 , doi =
work page 2023
-
[30]
The Large-Scale Organization of Metabolic Networks , author =. Nature , volume =. 2000 , doi =
work page 2000
-
[31]
Nature Reviews Genetics , volume =
Network biology: understanding the cell's functional organization , author =. Nature Reviews Genetics , volume =. 2004 , doi =
work page 2004
-
[32]
The Structure and Function of Complex Networks , author =. SIAM Review , volume =. 2003 , doi =
work page 2003
-
[33]
The Annals of Statistics , volume =
Learning high-dimensional directed acyclic graphs with latent and selection variables , author =. The Annals of Statistics , volume =. 2012 , doi =
work page 2012
- [34]
-
[35]
Computer Solution of Large Sparse Positive Definite Systems , author =. 1981 , publisher =
work page 1981
-
[36]
Nature Reviews Neuroscience , volume =
Complex brain networks: graph theoretical analysis of structural and functional systems , author =. Nature Reviews Neuroscience , volume =. 2009 , doi =
work page 2009
-
[37]
IEEE transactions on pattern analysis and machine intelligence , volume=
Optimizing regularized cholesky score for order-based learning of bayesian networks , author=. IEEE transactions on pattern analysis and machine intelligence , volume=. 2020 , publisher=
work page 2020
-
[38]
Consistency guarantees for greedy permutation-based causal inference algorithms , author=. Biometrika , volume=. 2021 , publisher=
work page 2021
-
[39]
International Conference on Artificial Intelligence and Statistics , pages=
Learning linear structural equation models in polynomial time and sample complexity , author=. International Conference on Artificial Intelligence and Statistics , pages=. 2018 , organization=
work page 2018
-
[40]
The adaptive and the thresholded
van de Geer, Sara and B. The adaptive and the thresholded. Electronic Journal of Statistics , year=
-
[41]
Least squares after model selection in high-dimensional sparse models , author=. Bernoulli , volume=
-
[42]
SIAM Journal on Scientific computing , volume=
Incomplete Cholesky factorizations with limited memory , author=. SIAM Journal on Scientific computing , volume=. 1999 , publisher=
work page 1999
-
[43]
Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain , author =. Econometrica , volume =
-
[44]
Review of Economic Studies , volume =
Inference on Treatment Effects after Selection among High-Dimensional Controls , author =. Review of Economic Studies , volume =
-
[45]
Annals of Statistics , volume =
High Dimensional Variable Selection , author =. Annals of Statistics , volume =
-
[46]
Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks , author =. Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI-05) , pages =. 2005 , publisher =
work page 2005
-
[47]
Hsieh, Cho-Jui and Sustik, Matyas A. and Dhillon, Inderjit S. and Ravikumar, Pradeep , title =. Journal of Machine Learning Research , volume =
-
[48]
Hsieh, Cho-Jui and Sustik, Matyas A. and Dhillon, Inderjit S. and Ravikumar, Pradeep and Poldrack, Russell A. , title =. Advances in Neural Information Processing Systems 26 , pages =
-
[49]
Journal of Applied Probability , year =
Robust Wasserstein Profile Inference and Applications to Machine Learning , author =. Journal of Applied Probability , year =
-
[50]
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI) , pages =
Permutation-Based Causal Structure Learning with Unknown Intervention Targets , author =. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI) , pages =. 2020 , editor =
work page 2020
-
[51]
Proceedings of the First Conference on Causal Learning and Reasoning , pages =
Causal Structure Discovery between Clusters of Nodes Induced by Latent Factors , author =. Proceedings of the First Conference on Causal Learning and Reasoning , pages =. 2022 , editor =
work page 2022
-
[52]
Proceedings of the First Conference on Causal Learning and Reasoning , pages =
Causal Discovery in Linear Structural Causal Models with Deterministic Relations , author =. Proceedings of the First Conference on Causal Learning and Reasoning , pages =. 2022 , editor =
work page 2022
-
[53]
Submitted to International Conference on Learning Representations (ICLR) 2024 , year =
Causal Structure Learning Supervised by Large Language Model , author =. Submitted to International Conference on Learning Representations (ICLR) 2024 , year =
work page 2024
-
[54]
Gene Ontology: tool for the unification of biology
Ashburner, Michael and Ball, Catherine A. and Blake, Judith A. and Botstein, David and Butler, Heather and Cherry, J. Michael and Davis, Allan P. and Dolinski, Kara and Dwight, Selina S. and Eppig, Janan T. and Harris, Midori A. and Hill, David P. and. Gene. Nature Genetics , volume =. doi:10.1038/75556 , urldate =
-
[55]
doi:10.1093/nar/gkq1156 , urldate =
Kamburov, Atanas and Pentchev, Konstantin and Galicka, Hanna and Wierling, Christoph and Lehrach, Hans and Herwig, Ralf , year = 2011, month = jan, journal =. doi:10.1093/nar/gkq1156 , urldate =
-
[56]
Koh, Hiromi W. L. and Fermin, Damian and Vogel, Christine and Choi, Kwok Pui and Ewing, Rob M. and Choi, Hyungwon , year = 2019, month = jul, journal =. doi:10.1038/s41540-019-0099-y , urldate =
-
[57]
Parker, Joel S. and Mullins, Michael and Cheang, Maggie C.U. and Leung, Samuel and Voduc, David and Vickery, Tammi and Davies, Sherri and Fauron, Christiane and He, Xiaping and Hu, Zhiyuan and Quackenbush, John F. and Stijleman, Inge J. and Palazzo, Juan and Marron, J.S. and Nobel, Andrew B. and Mardis, Elaine and Nielsen, Torsten O. and Ellis, Matthew J....
-
[58]
Comprehensive Molecular Portraits of Human Breast Tumours , author =. Nature , volume =. doi:10.1038/nature11412 , urldate =
-
[59]
Causal Inference Using Graphical Models with the
Kalisch, Markus and M. Causal Inference Using Graphical Models with the. Journal of Statistical Software , year =
-
[60]
Zhang, Keli and Zhu, Shengyu and Kalander, Marcus and Ng, Ignavier and Ye, Junjian and Chen, Zhitang and Pan, Lujia , year =
-
[61]
Squires, Chandler , year =
-
[62]
Sociological Methods & Research , year =
Spirtes, Peter and Richardson, Thomas and Meek, Christopher and Scheines, Richard and Glymour, Clark , title =. Sociological Methods & Research , year =
- [63]
-
[64]
Graphical Models , author =
-
[65]
Large-scale Sparse Inverse Covariance Matrix Estimation , journal =
Bollh. Large-scale Sparse Inverse Covariance Matrix Estimation , journal =. 2019 , volume =
work page 2019
-
[66]
and Cisneros-Velarde, Pedro and Petersen, Alexander and Oh, Sang-Yun , title =
Tran, C. and Cisneros-Velarde, Pedro and Petersen, Alexander and Oh, Sang-Yun , title =. 2020 , howpublished =
work page 2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.