Score-based Greedy Search for Structure Identification of Partially Observed Linear Causal Models

Haoyue Dai; Ignavier Ng; Jiaqi Sun; Kun Zhang; Peter Spirtes; Xiangchen Song; Xinshuai Dong

arxiv: 2510.04378 · v2 · submitted 2025-10-05 · 💻 cs.LG

Score-based Greedy Search for Structure Identification of Partially Observed Linear Causal Models

Xinshuai Dong , Ignavier Ng , Haoyue Dai , Jiaqi Sun , Xiangchen Song , Peter Spirtes , Kun Zhang This is my paper

Pith reviewed 2026-05-18 09:46 UTC · model grok-4.3

classification 💻 cs.LG

keywords causal discoverylatent variablesstructure learningscore-based methodsgreedy searchpartially observed systemslinear causal models

0 comments

The pith

A score-based greedy search identifies causal structures with latent variables up to Markov equivalence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops the first score-based greedy search for recovering causal graphs when some variables are hidden. It defines the Generalized N Factor Model and proves that the true graph, including the latent variables, belongs to a unique Markov equivalence class recoverable from scores on the observed data. The authors introduce the Latent variable Greedy Equivalence Search algorithm that uses a small set of well-defined operators to move efficiently through the space of candidate graphs. The approach is motivated by the error propagation and multiple-testing problems that arise in constraint-based methods for the same setting. If the consistency result holds, practitioners gain a direct optimization route to causal structure that incorporates hidden variables without explicit conditional-independence tests.

Core claim

We propose the Generalized N Factor Model and establish the global consistency: the true structure including latent variables can be identified up to the Markov equivalence class by using score. We then design Latent variable Greedy Equivalence Search (LGES), a greedy search algorithm for this class of model with well-defined operators, which search very efficiently over the graph space to find the optimal structure.

What carries the argument

The Generalized N Factor Model, a linear causal model with latent variables under specific noise and identifiability conditions that makes the full graph recoverable from scores up to Markov equivalence.

If this is right

The true causal graph with latent variables is recoverable by score maximization rather than constraint testing.
LGES performs an efficient greedy traversal of the space of graphs that contain latent variables.
Error propagation from sequential independence tests is avoided because the method relies on a global score.
The method applies directly to both synthetic linear systems and real data sets that approximately match the model class.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If real data only approximately satisfies the factor-model noise assumptions, LGES may still return a useful approximation to the equivalence class.
The score-based formulation could be paired with downstream tasks such as estimating causal effects after the graph is recovered.
Similar greedy-search operators might be adapted to other parametric classes that admit latent-variable identifiability.

Load-bearing premise

The observed data must be generated exactly by a linear system obeying the Generalized N Factor Model's noise structure and latent identifiability conditions.

What would settle it

Generate data from a Generalized N Factor Model whose true graph is known, run LGES, and check whether the returned equivalence class always contains the true graph.

Figures

Figures reproduced from arXiv: 2510.04378 by Haoyue Dai, Ignavier Ng, Jiaqi Sun, Kun Zhang, Peter Spirtes, Xiangchen Song, Xinshuai Dong.

**Figure 2.** Figure 2: An illustrative example of the graph that satisfies generalized N factor model in Definition 2 [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: An illustration of the whole process of LGES, where (a) Sinit is the initial state of Algorithm 1, (b) Sphase1 is the output of Algorithm 1, and (c) Sfinal is the final output of Algorithm 2. Our overall objective is to find a graph Gˆ such that it can generate ΣˆX equally well as the ground truth G ∗ , while having a dimension that is as small as possible (as in Theorem 1 and Corollary 1). To efficiently … view at source ↗

**Figure 4.** Figure 4: Causal structure (CPDAG) recovered by LGES on Multi-tasking behavior dataset [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Causal structure (CPDAG) recovered by LGES on Big Five personality dataset [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 7.** Figure 7: Illustrative examples to compare two graphical assumptions, generalized N factor model [PITH_FULL_IMAGE:figures/full_fig_p022_7.png] view at source ↗

**Figure 8.** Figure 8: Examples of graphs considered in our experiments. They satisfy Definition 2. [PITH_FULL_IMAGE:figures/full_fig_p024_8.png] view at source ↗

read the original abstract

Identifying the structure of a partially observed causal system is essential to various scientific fields. Recent advances have focused on constraint-based causal discovery to solve this problem, and yet in practice these methods often face challenges related to multiple testing and error propagation. These issues could be mitigated by a score-based method and thus it has raised great attention whether there exists a score-based greedy search method that can handle the partially observed scenario. In this work, we propose the first score-based greedy search method for the identification of structure involving latent variables with identifiability guarantees. Specifically, we propose Generalized N Factor Model and establish the global consistency: the true structure including latent variables can be identified up to the Markov equivalence class by using score. We then design Latent variable Greedy Equivalence Search (LGES), a greedy search algorithm for this class of model with well-defined operators, which search very efficiently over the graph space to find the optimal structure. Our experiments on both synthetic and real-life data validate the effectiveness of our method (code will be publicly available).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

LGES adapts greedy equivalence search to latents via a new linear factor model and claims global consistency up to MEC, but the result stands or falls on how tightly the data must match their assumptions.

read the letter

The core contribution is LGES, a score-based greedy search that extends the GES framework to partially observed linear systems. They introduce the Generalized N Factor Model and argue that a suitable score recovers the true graph including latent variables up to the Markov equivalence class. This is genuinely new relative to the constraint-based literature on latent causal discovery; most prior methods rely on conditional independence tests and therefore inherit multiple-testing and error-propagation problems that a score can sidestep in principle. The paper also supplies explicit search operators and shows the algorithm runs efficiently over the enlarged graph space, which is a practical plus for anyone who has tried to scale constraint-based methods with hidden variables. Experiments on synthetic data and one real-life example are included to illustrate the claim. The citation pattern is standard and appropriate, drawing on the usual Spirtes, Pearl, and score-based search references without obvious omissions. The central soft spot is the narrow set of assumptions required for the consistency theorem. The data-generating process must be exactly linear with the specific noise structure and latent identifiability conditions they define; even moderate nonlinearity or noise misspecification would invalidate the guarantee that the score distinguishes equivalence classes correctly. The abstract and stress-test note both flag this dependence, and without seeing explicit robustness checks or sensitivity analysis in the full text it is hard to judge how brittle the method becomes in practice. Minor additional concerns are whether the score definition itself introduces any post-hoc adjustments and how the real-data experiment was scored when ground truth is unavailable. This paper is for researchers working on score-based causal discovery who already accept linear-Gaussian or linear-factor settings and want an alternative to PC-style algorithms when latents are present. A reader looking for immediate practical tools on non-linear or heavily misspecified data will get less out of it. The work shows clear engagement with the literature and a concrete algorithmic proposal, so it deserves a serious referee even if the review will likely press on assumption robustness and experimental controls.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the Generalized N Factor Model for partially observed linear causal systems with latent variables and introduces the Latent variable Greedy Equivalence Search (LGES) algorithm as a score-based greedy search procedure. It claims to establish global consistency: the true structure (including latents) can be identified up to the Markov equivalence class via the score. The method defines well-specified search operators and is evaluated on synthetic and real-life data, with code promised to be released.

Significance. If the global consistency result is rigorously derived and holds under the model assumptions, the work would fill a notable gap by supplying the first score-based greedy method with identifiability guarantees for latent-variable causal models. This could offer practical advantages over constraint-based approaches by avoiding multiple-testing and error-propagation issues. The explicit model definition and promised reproducibility are strengths.

major comments (2)

[Theoretical results on global consistency] The global consistency theorem (likely in §4 or the theoretical section following the model definition) is derived under the assumption that the data-generating process exactly matches the Generalized N Factor Model, including linear relations, the specific noise structure, and the latent identifiability conditions. The manuscript should add an explicit statement or simulation study quantifying robustness to mild misspecification (e.g., small non-linearities or noise deviations), as this assumption is load-bearing for the claimed identifiability guarantees.
[Introduction and related work] The abstract and introduction assert that LGES is the 'first' score-based greedy search method with identifiability guarantees for latent variables. A dedicated comparison subsection should enumerate prior score-based methods for partially observed models and explain why they lack the global consistency property established here.

minor comments (2)

[Experiments] In the experimental section, the synthetic data generation procedure should report the exact parameter ranges used for the Generalized N Factor Model (e.g., number of latents, noise variances) to facilitate exact reproduction.
[Method] Notation for the score function and the search operators should be introduced with a single consolidated table or definition block early in the methods section to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment point by point below, indicating the changes we plan to incorporate in the revised manuscript.

read point-by-point responses

Referee: The global consistency theorem (likely in §4 or the theoretical section following the model definition) is derived under the assumption that the data-generating process exactly matches the Generalized N Factor Model, including linear relations, the specific noise structure, and the latent identifiability conditions. The manuscript should add an explicit statement or simulation study quantifying robustness to mild misspecification (e.g., small non-linearities or noise deviations), as this assumption is load-bearing for the claimed identifiability guarantees.

Authors: We agree that the global consistency result holds under the exact assumptions of the Generalized N Factor Model. In the revised manuscript, we will add an explicit paragraph in the theoretical section that restates these assumptions and their role in the identifiability guarantees. A comprehensive simulation study on robustness to misspecification would require new experiments that extend the current scope; we will instead add a concise discussion of potential limitations and sensitivity to mild deviations, while noting that full quantification remains an important direction for future work. revision: partial
Referee: The abstract and introduction assert that LGES is the 'first' score-based greedy search method with identifiability guarantees for latent variables. A dedicated comparison subsection should enumerate prior score-based methods for partially observed models and explain why they lack the global consistency property established here.

Authors: We appreciate the suggestion to strengthen the contextualization. Our claim is that LGES is the first score-based greedy procedure to establish global consistency for structure identification including latent variables under the Generalized N Factor Model. To address the comment, we will insert a new dedicated subsection in the related work that reviews prior score-based methods applicable to partially observed settings and clarifies the distinctions, particularly the absence of comparable global consistency guarantees in those approaches. revision: yes

Circularity Check

0 steps flagged

No significant circularity; consistency result derived from model assumptions without reduction to inputs

full rationale

The paper defines the Generalized N Factor Model and proves global consistency for identifying the true structure (including latents) up to the Markov equivalence class using a score-based approach. This is a standard theoretical derivation where the identifiability guarantees follow from the stated linear relations, noise structure, and latent conditions of the proposed model. No load-bearing self-citations, fitted parameters renamed as predictions, or self-definitional loops are evident in the provided abstract or description. The central claim remains independent and self-contained against external benchmarks, as the consistency holds conditionally on the model matching the data-generating process, which is explicitly invoked rather than smuggled in circularly.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the definition of the Generalized N Factor Model and its identifiability properties for linear systems with latents; no explicit free parameters or invented entities are detailed in the abstract.

axioms (1)

domain assumption Data follows linear causal relations with latent variables under the Generalized N Factor Model
Invoked to establish global consistency of score-based identification up to Markov equivalence class

pith-pipeline@v0.9.0 · 5731 in / 1159 out tokens · 26483 ms · 2026-05-18T09:46:35.404076+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose Generalized N Factor Model and establish the global consistency: the true structure including latent variables can be identified up to the Markov equivalence class by using score.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

scoreML(G, Σ̂X) = max L = −(N/2)(tr((ΣX)−1 Σ̂X) + log det ΣX)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 6 internal anchors

[1]

Deconfounded score method: Scoring dags with dense unobserved confounding.arXiv preprint arXiv:2103.15106,

Alexis Bellot and Mihaela van der Schaar. Deconfounded score method: Scoring dags with dense unobserved confounding.arXiv preprint arXiv:2103.15106,

work page arXiv
[2]

Barbara M

doi: 10.3102/00028312031003645. Barbara M. Byrne.Structural Equation Modeling with Amos: Basic Concepts, Applications, and Programming. Multivariate Application Series. Toutledge, Taylor and Francis Group, New York London, 2nd edition,

work page doi:10.3102/00028312031003645
[3]

Learning Sparse Causal Models is not NP-hard

Tom Claassen, Joris Mooij, and Tom Heskes. Learning sparse causal models is not np-hard.arXiv preprint arXiv:1309.6824,

work page internal anchor Pith review Pith/arXiv arXiv
[4]

Bayesian network learning with cutting planes

James Cussens. Bayesian network learning with cutting planes.arXiv preprint arXiv:1202.3713,

work page internal anchor Pith review Pith/arXiv arXiv
[5]

Algebraic sparse factor analysis

Mathias Drton, Alexandros Grosdos, Irem Portakal, and Nils Sturma. Algebraic sparse factor analysis. arXiv preprint arXiv:2312.14762,

work page arXiv
[6]

URL https://psycnet.apa.org/doiLanding?doi=10.1037% 2F0003-066X.48.1.26

doi: 0003-066X. URL https://psycnet.apa.org/doiLanding?doi=10.1037% 2F0003-066X.48.1.26. 11 Under Review Samsad Afrin Himi, Markus Buehner, Matthias Schwaighofer, Anna Klapetek, and Sven Hilbert. Multitasking behavior and its related constructs: Executive functions, working memory capacity, relational integration, and divided attention.Cognition, 189:275–...

work page doi:10.1016/j.cognition.2019.04.010 2019
[7]

Learning directed acyclic graphs based on sparsest permutations

Garvesh Raskutti and Caroline Uhler. Learning directed acyclic graphs based on sparsest permutations. arXiv preprint arXiv:1307.0366v3,

work page internal anchor Pith review Pith/arXiv arXiv
[8]

Estimation of linear non-gaussian acyclic models for latent factors.Neurocomputing, 72(7-9):2024–2027,

Shohei Shimizu, Patrik O Hoyer, and Aapo Hyv¨arinen. Estimation of linear non-gaussian acyclic models for latent factors.Neurocomputing, 72(7-9):2024–2027,

work page 2024
[9]

Parameter and Structure Learning in Nested Markov Models

Ilya Shpitser, Thomas S. Richardson, James M. Robins, and Robin Evans. Parameter and structure learning in nested Markov models.arXiv preprint arXiv:1207.5058,

work page internal anchor Pith review Pith/arXiv arXiv
[10]

Causal Inference in the Presence of Latent Variables and Selection Bias

13 Under Review Peter L Spirtes, Christopher Meek, and Thomas S Richardson. Causal inference in the presence of latent variables and selection bias.arXiv preprint arXiv:1302.4983,

work page internal anchor Pith review Pith/arXiv arXiv
[11]

Statistically based tests for the number of common factors

James H Steiger. Statistically based tests for the number of common factors. InPaper presented at the Annual Meeting of the Psychometric Society, Iowa Cyty, 1980,

work page 1980
[12]

Unpaired multi-domain causal representation learning.arXiv preprint arXiv:2302.00993,

Nils Sturma, Chandler Squires, Mathias Drton, and Caroline Uhler. Unpaired multi-domain causal representation learning.arXiv preprint arXiv:2302.00993,

work page arXiv
[13]

Empirical Evaluation of Rectified Activations in Convolutional Network

Bing Xu. Empirical evaluation of rectified activations in convolutional network.arXiv preprint arXiv:1505.00853,

work page internal anchor Pith review Pith/arXiv arXiv
[14]

deconfounding

Algorithm 2Phase 2 of LGES 1: Input:Empirical covariance ˆΣX over X, tolerance δ, the output Sphase1 and P of Algorithm 1 2:Output:CPDAGS final 3:LetS=S phase1; 4:whileTruedo 5:forL i ∈P(in parallel)do 6:forL j ∈P(in parallel)do 7:ifL i −L j orL i →L j then 8:LetH={L:L−L j and(L−L i orL→L i orL←L i)}; 9:foreach subsetH ′ k ⊆H(in parallel)do 10:S ijk =O LL...

work page 2001
[15]

Among them, the most commonly used one and also the earliest developed one might be rank constraints (Sullivant et al., 2010), which generalize the classical Tetrad representation theorem (Spirtes et al.,

work page 2010
[16]

Based on these new tools, many algorithms have also been developed (Silva et al., 2003; Huang et al., 2022; Dong et al., 2024a)

and basic conditional independence constraints. Based on these new tools, many algorithms have also been developed (Silva et al., 2003; Huang et al., 2022; Dong et al., 2024a). However, despite their theoretical advancements, most existing methods remain within the constraint-based paradigm, heavily relying on statistical tests that suffer from multiple-t...

work page 2003
[17]

If ˆG ∈G ∗ and ˆG ∈arg min G∈G∗ dim(G), then ˆGandG ∗ are Markov equivalent in the large sample limit

Given observation ˆΣX and let G∗ = arg maxG∈GGNF scoreML(G, ˆΣX). If ˆG ∈G ∗ and ˆG ∈arg min G∈G∗ dim(G), then ˆGandG ∗ are Markov equivalent in the large sample limit. Proof. Similar to the proof of Theorem 1, we can show that H( ˆG) =H(G ∗). By Theorem 2, we have ˆGandG ∗ belong to the same MEC. B.4 PROOF OFLEMMA1 Lemma 1(Properties of Initial State).Su...

work page 2002
[18]

measures the relative improvement in terms of fit from the baseline model to the proposed model. The sample CFI is estimated as follows: ˆCFI= 1− max(χ2 k −d fk,0) max(χ2 0 −d f0,0) ,(7) where χ2 k and d fk corresponds to the concerned model while χ2 0 and d f0 corresponds to the baseline independent model that can only parameterize a diagonal covariance ...

work page 1973
[19]

Data is standardized to have zero mean and unit variance

and LBFGS. Data is standardized to have zero mean and unit variance. The hyper parameter δ in Algorithms 1 and 2 is set as δ= 0.25× log(N) N , where N is the sample size. This design follows the spirit of BIC score such that δ→0 when N→ ∞. In practice, we found that the result is only influenced marginally by a small change ofδ. Bellow we provide a furthe...

work page 2007

[1] [1]

Deconfounded score method: Scoring dags with dense unobserved confounding.arXiv preprint arXiv:2103.15106,

Alexis Bellot and Mihaela van der Schaar. Deconfounded score method: Scoring dags with dense unobserved confounding.arXiv preprint arXiv:2103.15106,

work page arXiv

[2] [2]

Barbara M

doi: 10.3102/00028312031003645. Barbara M. Byrne.Structural Equation Modeling with Amos: Basic Concepts, Applications, and Programming. Multivariate Application Series. Toutledge, Taylor and Francis Group, New York London, 2nd edition,

work page doi:10.3102/00028312031003645

[3] [3]

Learning Sparse Causal Models is not NP-hard

Tom Claassen, Joris Mooij, and Tom Heskes. Learning sparse causal models is not np-hard.arXiv preprint arXiv:1309.6824,

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

Bayesian network learning with cutting planes

James Cussens. Bayesian network learning with cutting planes.arXiv preprint arXiv:1202.3713,

work page internal anchor Pith review Pith/arXiv arXiv

[5] [5]

Algebraic sparse factor analysis

Mathias Drton, Alexandros Grosdos, Irem Portakal, and Nils Sturma. Algebraic sparse factor analysis. arXiv preprint arXiv:2312.14762,

work page arXiv

[6] [6]

URL https://psycnet.apa.org/doiLanding?doi=10.1037% 2F0003-066X.48.1.26

doi: 0003-066X. URL https://psycnet.apa.org/doiLanding?doi=10.1037% 2F0003-066X.48.1.26. 11 Under Review Samsad Afrin Himi, Markus Buehner, Matthias Schwaighofer, Anna Klapetek, and Sven Hilbert. Multitasking behavior and its related constructs: Executive functions, working memory capacity, relational integration, and divided attention.Cognition, 189:275–...

work page doi:10.1016/j.cognition.2019.04.010 2019

[7] [7]

Learning directed acyclic graphs based on sparsest permutations

Garvesh Raskutti and Caroline Uhler. Learning directed acyclic graphs based on sparsest permutations. arXiv preprint arXiv:1307.0366v3,

work page internal anchor Pith review Pith/arXiv arXiv

[8] [8]

Estimation of linear non-gaussian acyclic models for latent factors.Neurocomputing, 72(7-9):2024–2027,

Shohei Shimizu, Patrik O Hoyer, and Aapo Hyv¨arinen. Estimation of linear non-gaussian acyclic models for latent factors.Neurocomputing, 72(7-9):2024–2027,

work page 2024

[9] [9]

Parameter and Structure Learning in Nested Markov Models

Ilya Shpitser, Thomas S. Richardson, James M. Robins, and Robin Evans. Parameter and structure learning in nested Markov models.arXiv preprint arXiv:1207.5058,

work page internal anchor Pith review Pith/arXiv arXiv

[10] [10]

Causal Inference in the Presence of Latent Variables and Selection Bias

13 Under Review Peter L Spirtes, Christopher Meek, and Thomas S Richardson. Causal inference in the presence of latent variables and selection bias.arXiv preprint arXiv:1302.4983,

work page internal anchor Pith review Pith/arXiv arXiv

[11] [11]

Statistically based tests for the number of common factors

James H Steiger. Statistically based tests for the number of common factors. InPaper presented at the Annual Meeting of the Psychometric Society, Iowa Cyty, 1980,

work page 1980

[12] [12]

Unpaired multi-domain causal representation learning.arXiv preprint arXiv:2302.00993,

Nils Sturma, Chandler Squires, Mathias Drton, and Caroline Uhler. Unpaired multi-domain causal representation learning.arXiv preprint arXiv:2302.00993,

work page arXiv

[13] [13]

Empirical Evaluation of Rectified Activations in Convolutional Network

Bing Xu. Empirical evaluation of rectified activations in convolutional network.arXiv preprint arXiv:1505.00853,

work page internal anchor Pith review Pith/arXiv arXiv

[14] [14]

deconfounding

Algorithm 2Phase 2 of LGES 1: Input:Empirical covariance ˆΣX over X, tolerance δ, the output Sphase1 and P of Algorithm 1 2:Output:CPDAGS final 3:LetS=S phase1; 4:whileTruedo 5:forL i ∈P(in parallel)do 6:forL j ∈P(in parallel)do 7:ifL i −L j orL i →L j then 8:LetH={L:L−L j and(L−L i orL→L i orL←L i)}; 9:foreach subsetH ′ k ⊆H(in parallel)do 10:S ijk =O LL...

work page 2001

[15] [15]

Among them, the most commonly used one and also the earliest developed one might be rank constraints (Sullivant et al., 2010), which generalize the classical Tetrad representation theorem (Spirtes et al.,

work page 2010

[16] [16]

Based on these new tools, many algorithms have also been developed (Silva et al., 2003; Huang et al., 2022; Dong et al., 2024a)

and basic conditional independence constraints. Based on these new tools, many algorithms have also been developed (Silva et al., 2003; Huang et al., 2022; Dong et al., 2024a). However, despite their theoretical advancements, most existing methods remain within the constraint-based paradigm, heavily relying on statistical tests that suffer from multiple-t...

work page 2003

[17] [17]

If ˆG ∈G ∗ and ˆG ∈arg min G∈G∗ dim(G), then ˆGandG ∗ are Markov equivalent in the large sample limit

Given observation ˆΣX and let G∗ = arg maxG∈GGNF scoreML(G, ˆΣX). If ˆG ∈G ∗ and ˆG ∈arg min G∈G∗ dim(G), then ˆGandG ∗ are Markov equivalent in the large sample limit. Proof. Similar to the proof of Theorem 1, we can show that H( ˆG) =H(G ∗). By Theorem 2, we have ˆGandG ∗ belong to the same MEC. B.4 PROOF OFLEMMA1 Lemma 1(Properties of Initial State).Su...

work page 2002

[18] [18]

measures the relative improvement in terms of fit from the baseline model to the proposed model. The sample CFI is estimated as follows: ˆCFI= 1− max(χ2 k −d fk,0) max(χ2 0 −d f0,0) ,(7) where χ2 k and d fk corresponds to the concerned model while χ2 0 and d f0 corresponds to the baseline independent model that can only parameterize a diagonal covariance ...

work page 1973

[19] [19]

Data is standardized to have zero mean and unit variance

and LBFGS. Data is standardized to have zero mean and unit variance. The hyper parameter δ in Algorithms 1 and 2 is set as δ= 0.25× log(N) N , where N is the sample size. This design follows the spirit of BIC score such that δ→0 when N→ ∞. In practice, we found that the result is only influenced marginally by a small change ofδ. Bellow we provide a furthe...

work page 2007