Score-based Greedy Search for Structure Identification of Partially Observed Linear Causal Models
Pith reviewed 2026-05-18 09:46 UTC · model grok-4.3
The pith
A score-based greedy search identifies causal structures with latent variables up to Markov equivalence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose the Generalized N Factor Model and establish the global consistency: the true structure including latent variables can be identified up to the Markov equivalence class by using score. We then design Latent variable Greedy Equivalence Search (LGES), a greedy search algorithm for this class of model with well-defined operators, which search very efficiently over the graph space to find the optimal structure.
What carries the argument
The Generalized N Factor Model, a linear causal model with latent variables under specific noise and identifiability conditions that makes the full graph recoverable from scores up to Markov equivalence.
If this is right
- The true causal graph with latent variables is recoverable by score maximization rather than constraint testing.
- LGES performs an efficient greedy traversal of the space of graphs that contain latent variables.
- Error propagation from sequential independence tests is avoided because the method relies on a global score.
- The method applies directly to both synthetic linear systems and real data sets that approximately match the model class.
Where Pith is reading between the lines
- If real data only approximately satisfies the factor-model noise assumptions, LGES may still return a useful approximation to the equivalence class.
- The score-based formulation could be paired with downstream tasks such as estimating causal effects after the graph is recovered.
- Similar greedy-search operators might be adapted to other parametric classes that admit latent-variable identifiability.
Load-bearing premise
The observed data must be generated exactly by a linear system obeying the Generalized N Factor Model's noise structure and latent identifiability conditions.
What would settle it
Generate data from a Generalized N Factor Model whose true graph is known, run LGES, and check whether the returned equivalence class always contains the true graph.
Figures
read the original abstract
Identifying the structure of a partially observed causal system is essential to various scientific fields. Recent advances have focused on constraint-based causal discovery to solve this problem, and yet in practice these methods often face challenges related to multiple testing and error propagation. These issues could be mitigated by a score-based method and thus it has raised great attention whether there exists a score-based greedy search method that can handle the partially observed scenario. In this work, we propose the first score-based greedy search method for the identification of structure involving latent variables with identifiability guarantees. Specifically, we propose Generalized N Factor Model and establish the global consistency: the true structure including latent variables can be identified up to the Markov equivalence class by using score. We then design Latent variable Greedy Equivalence Search (LGES), a greedy search algorithm for this class of model with well-defined operators, which search very efficiently over the graph space to find the optimal structure. Our experiments on both synthetic and real-life data validate the effectiveness of our method (code will be publicly available).
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the Generalized N Factor Model for partially observed linear causal systems with latent variables and introduces the Latent variable Greedy Equivalence Search (LGES) algorithm as a score-based greedy search procedure. It claims to establish global consistency: the true structure (including latents) can be identified up to the Markov equivalence class via the score. The method defines well-specified search operators and is evaluated on synthetic and real-life data, with code promised to be released.
Significance. If the global consistency result is rigorously derived and holds under the model assumptions, the work would fill a notable gap by supplying the first score-based greedy method with identifiability guarantees for latent-variable causal models. This could offer practical advantages over constraint-based approaches by avoiding multiple-testing and error-propagation issues. The explicit model definition and promised reproducibility are strengths.
major comments (2)
- [Theoretical results on global consistency] The global consistency theorem (likely in §4 or the theoretical section following the model definition) is derived under the assumption that the data-generating process exactly matches the Generalized N Factor Model, including linear relations, the specific noise structure, and the latent identifiability conditions. The manuscript should add an explicit statement or simulation study quantifying robustness to mild misspecification (e.g., small non-linearities or noise deviations), as this assumption is load-bearing for the claimed identifiability guarantees.
- [Introduction and related work] The abstract and introduction assert that LGES is the 'first' score-based greedy search method with identifiability guarantees for latent variables. A dedicated comparison subsection should enumerate prior score-based methods for partially observed models and explain why they lack the global consistency property established here.
minor comments (2)
- [Experiments] In the experimental section, the synthetic data generation procedure should report the exact parameter ranges used for the Generalized N Factor Model (e.g., number of latents, noise variances) to facilitate exact reproduction.
- [Method] Notation for the score function and the search operators should be introduced with a single consolidated table or definition block early in the methods section to improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major comment point by point below, indicating the changes we plan to incorporate in the revised manuscript.
read point-by-point responses
-
Referee: The global consistency theorem (likely in §4 or the theoretical section following the model definition) is derived under the assumption that the data-generating process exactly matches the Generalized N Factor Model, including linear relations, the specific noise structure, and the latent identifiability conditions. The manuscript should add an explicit statement or simulation study quantifying robustness to mild misspecification (e.g., small non-linearities or noise deviations), as this assumption is load-bearing for the claimed identifiability guarantees.
Authors: We agree that the global consistency result holds under the exact assumptions of the Generalized N Factor Model. In the revised manuscript, we will add an explicit paragraph in the theoretical section that restates these assumptions and their role in the identifiability guarantees. A comprehensive simulation study on robustness to misspecification would require new experiments that extend the current scope; we will instead add a concise discussion of potential limitations and sensitivity to mild deviations, while noting that full quantification remains an important direction for future work. revision: partial
-
Referee: The abstract and introduction assert that LGES is the 'first' score-based greedy search method with identifiability guarantees for latent variables. A dedicated comparison subsection should enumerate prior score-based methods for partially observed models and explain why they lack the global consistency property established here.
Authors: We appreciate the suggestion to strengthen the contextualization. Our claim is that LGES is the first score-based greedy procedure to establish global consistency for structure identification including latent variables under the Generalized N Factor Model. To address the comment, we will insert a new dedicated subsection in the related work that reviews prior score-based methods applicable to partially observed settings and clarifies the distinctions, particularly the absence of comparable global consistency guarantees in those approaches. revision: yes
Circularity Check
No significant circularity; consistency result derived from model assumptions without reduction to inputs
full rationale
The paper defines the Generalized N Factor Model and proves global consistency for identifying the true structure (including latents) up to the Markov equivalence class using a score-based approach. This is a standard theoretical derivation where the identifiability guarantees follow from the stated linear relations, noise structure, and latent conditions of the proposed model. No load-bearing self-citations, fitted parameters renamed as predictions, or self-definitional loops are evident in the provided abstract or description. The central claim remains independent and self-contained against external benchmarks, as the consistency holds conditionally on the model matching the data-generating process, which is explicitly invoked rather than smuggled in circularly.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Data follows linear causal relations with latent variables under the Generalized N Factor Model
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We propose Generalized N Factor Model and establish the global consistency: the true structure including latent variables can be identified up to the Markov equivalence class by using score.
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
scoreML(G, Σ̂X) = max L = −(N/2)(tr((ΣX)−1 Σ̂X) + log det ΣX)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Alexis Bellot and Mihaela van der Schaar. Deconfounded score method: Scoring dags with dense unobserved confounding.arXiv preprint arXiv:2103.15106,
-
[2]
doi: 10.3102/00028312031003645. Barbara M. Byrne.Structural Equation Modeling with Amos: Basic Concepts, Applications, and Programming. Multivariate Application Series. Toutledge, Taylor and Francis Group, New York London, 2nd edition,
-
[3]
Learning Sparse Causal Models is not NP-hard
Tom Claassen, Joris Mooij, and Tom Heskes. Learning sparse causal models is not np-hard.arXiv preprint arXiv:1309.6824,
work page internal anchor Pith review Pith/arXiv arXiv
-
[4]
Bayesian network learning with cutting planes
James Cussens. Bayesian network learning with cutting planes.arXiv preprint arXiv:1202.3713,
work page internal anchor Pith review Pith/arXiv arXiv
-
[5]
Algebraic sparse factor analysis
Mathias Drton, Alexandros Grosdos, Irem Portakal, and Nils Sturma. Algebraic sparse factor analysis. arXiv preprint arXiv:2312.14762,
-
[6]
URL https://psycnet.apa.org/doiLanding?doi=10.1037% 2F0003-066X.48.1.26
doi: 0003-066X. URL https://psycnet.apa.org/doiLanding?doi=10.1037% 2F0003-066X.48.1.26. 11 Under Review Samsad Afrin Himi, Markus Buehner, Matthias Schwaighofer, Anna Klapetek, and Sven Hilbert. Multitasking behavior and its related constructs: Executive functions, working memory capacity, relational integration, and divided attention.Cognition, 189:275–...
-
[7]
Learning directed acyclic graphs based on sparsest permutations
Garvesh Raskutti and Caroline Uhler. Learning directed acyclic graphs based on sparsest permutations. arXiv preprint arXiv:1307.0366v3,
work page internal anchor Pith review Pith/arXiv arXiv
-
[8]
Shohei Shimizu, Patrik O Hoyer, and Aapo Hyv¨arinen. Estimation of linear non-gaussian acyclic models for latent factors.Neurocomputing, 72(7-9):2024–2027,
work page 2024
-
[9]
Parameter and Structure Learning in Nested Markov Models
Ilya Shpitser, Thomas S. Richardson, James M. Robins, and Robin Evans. Parameter and structure learning in nested Markov models.arXiv preprint arXiv:1207.5058,
work page internal anchor Pith review Pith/arXiv arXiv
-
[10]
Causal Inference in the Presence of Latent Variables and Selection Bias
13 Under Review Peter L Spirtes, Christopher Meek, and Thomas S Richardson. Causal inference in the presence of latent variables and selection bias.arXiv preprint arXiv:1302.4983,
work page internal anchor Pith review Pith/arXiv arXiv
-
[11]
Statistically based tests for the number of common factors
James H Steiger. Statistically based tests for the number of common factors. InPaper presented at the Annual Meeting of the Psychometric Society, Iowa Cyty, 1980,
work page 1980
-
[12]
Unpaired multi-domain causal representation learning.arXiv preprint arXiv:2302.00993,
Nils Sturma, Chandler Squires, Mathias Drton, and Caroline Uhler. Unpaired multi-domain causal representation learning.arXiv preprint arXiv:2302.00993,
-
[13]
Empirical Evaluation of Rectified Activations in Convolutional Network
Bing Xu. Empirical evaluation of rectified activations in convolutional network.arXiv preprint arXiv:1505.00853,
work page internal anchor Pith review Pith/arXiv arXiv
-
[14]
Algorithm 2Phase 2 of LGES 1: Input:Empirical covariance ˆΣX over X, tolerance δ, the output Sphase1 and P of Algorithm 1 2:Output:CPDAGS final 3:LetS=S phase1; 4:whileTruedo 5:forL i ∈P(in parallel)do 6:forL j ∈P(in parallel)do 7:ifL i −L j orL i →L j then 8:LetH={L:L−L j and(L−L i orL→L i orL←L i)}; 9:foreach subsetH ′ k ⊆H(in parallel)do 10:S ijk =O LL...
work page 2001
-
[15]
Among them, the most commonly used one and also the earliest developed one might be rank constraints (Sullivant et al., 2010), which generalize the classical Tetrad representation theorem (Spirtes et al.,
work page 2010
-
[16]
and basic conditional independence constraints. Based on these new tools, many algorithms have also been developed (Silva et al., 2003; Huang et al., 2022; Dong et al., 2024a). However, despite their theoretical advancements, most existing methods remain within the constraint-based paradigm, heavily relying on statistical tests that suffer from multiple-t...
work page 2003
-
[17]
Given observation ˆΣX and let G∗ = arg maxG∈GGNF scoreML(G, ˆΣX). If ˆG ∈G ∗ and ˆG ∈arg min G∈G∗ dim(G), then ˆGandG ∗ are Markov equivalent in the large sample limit. Proof. Similar to the proof of Theorem 1, we can show that H( ˆG) =H(G ∗). By Theorem 2, we have ˆGandG ∗ belong to the same MEC. B.4 PROOF OFLEMMA1 Lemma 1(Properties of Initial State).Su...
work page 2002
-
[18]
measures the relative improvement in terms of fit from the baseline model to the proposed model. The sample CFI is estimated as follows: ˆCFI= 1− max(χ2 k −d fk,0) max(χ2 0 −d f0,0) ,(7) where χ2 k and d fk corresponds to the concerned model while χ2 0 and d f0 corresponds to the baseline independent model that can only parameterize a diagonal covariance ...
work page 1973
-
[19]
Data is standardized to have zero mean and unit variance
and LBFGS. Data is standardized to have zero mean and unit variance. The hyper parameter δ in Algorithms 1 and 2 is set as δ= 0.25× log(N) N , where N is the sample size. This design follows the spirit of BIC score such that δ→0 when N→ ∞. In practice, we found that the result is only influenced marginally by a small change ofδ. Bellow we provide a furthe...
work page 2007
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.