pith. sign in

arxiv: 2502.08531 · v3 · submitted 2025-02-12 · 💻 cs.LG · stat.ML

On Different Notions of Redundancy in Conditional-Independence-Based Discovery of Graphical Models

Pith reviewed 2026-05-23 03:18 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords conditional independencegraphical modelsredundancyerror detectionstructure learningindependence testscausal discovery
0
0 comments X

The pith

Conditional independence tests true for all distributions rarely catch errors in graphical models, unlike those implied only by the graph.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines redundant conditional independence tests that remain after a graphical model is learned from data. These unused tests can sometimes detect or correct mistakes introduced by unreliable statistical tests during discovery. The authors separate two kinds of redundancy: statements that must hold no matter what the probability distribution is, and statements that hold only because of the particular graph structure chosen. Only the second kind supplies extra information that can reveal or fix errors. The distinction matters because real datasets produce noisy test results, so algorithms need ways to use leftover checks selectively rather than uniformly.

Core claim

Conditional (in)dependence statements that hold for every probability distribution are unlikely to detect and correct errors in the learned model, in contrast to those that follow only from graphical assumptions.

What carries the argument

The distinction between universal conditional (in)dependencies (true for every distribution) and graph-specific ones (implied solely by the learned structure) when evaluating which redundant tests can expose errors.

If this is right

  • Redundant tests that follow from graphical assumptions can be used to detect or sometimes correct errors produced by faulty statistical tests.
  • Redundant tests that hold for every probability distribution should not be expected to detect or correct such errors.
  • Discovery algorithms must choose which redundant tests to apply according to whether they are graph-specific or universal.
  • The reliability of conditional-independence-based methods improves only when graph-specific redundancies are consulted after initial model construction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Algorithms could be redesigned to reserve graph-specific independence queries for post-learning verification rather than using them only during search.
  • The same distinction may apply to other structure-learning procedures that leave some independence relations untested during model construction.
  • Empirical studies on benchmark datasets with known ground-truth graphs could quantify how often graph-specific checks succeed where universal ones fail.

Load-bearing premise

The theoretical split between universal and graph-specific independencies produces measurable differences in error detection on finite noisy data.

What would settle it

Generate data from a known graph, inject controlled errors into the independence tests used by a discovery algorithm, then measure whether graph-specific redundant tests flag or repair those errors more reliably than universal ones.

Figures

Figures reproduced from arXiv: 2502.08531 by Dominik Janzing, Philipp M. Faller.

Figure 1
Figure 1. Figure 1: The marginal independence tests identify the [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Hierarchy of definitions 1 to 3. We argue to [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: In this graph, every dependence along more [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: A false test Y ̸⊥⊥ Z may lead to the conclusion that the true graph is X → Y → Z. If this were the only error, the test Y ⊥⊥ Z | X would correct that. But Y ̸⊥⊥ Z | X follows via Graphoid axioms from the marginal tests. such a model exists and thus none of the CI-statements is Graphoid-redundant given the others. 3.2 Error correction One might wonder whether it is also possible to correct certain errors. A… view at source ↗
Figure 5
Figure 5. Figure 5: Experimental results CI-statements that PC would do anyway to detect con￾tradictions without discussing the dependences between tests. (Bromberg and Margaritis, 2009) have also noted that contradictions between CI-tests can be corrected to improve discovery results, and (Kim et al., 2024) correct errors by using Graphoid axioms to conduct a set of tests that are statistically better conditioned. In contras… view at source ↗
Figure 6
Figure 6. Figure 6: Example for a modified graph G′ from the proof of proposition 2. All intermediate nodes on the path X − V − W − Y are replaced by two copies and each new path only connects to either X or Y . To see the former, let (A ⊥⊥ B | C) ∈ L. In our construction we did not add any dependences. If A and B were not connected given C in G, they still are not, as we did not add an edge between nodes that were disconnect… view at source ↗
Figure 7
Figure 7. Figure 7: Visualization how the node W from the proof of proposition 3 can be connected to X and Y in case 2 of the proof. Proof (Proof for lemma 2) Concerning (1): Suppose there is not permutation π ∗ such that all CI-statements in Lπ∗ are as in MG∗ . So for every π there is a CI-statement CI(X, Y | Z) that is not as in MG∗ . If X ⊥G∗ Y | Z G∗ does not have an edge between X and Y . But due to X ̸⊥⊥ Y | Z the graph… view at source ↗
Figure 8
Figure 8. Figure 8: When the collider-structure X0 → X1 ← Z is wrongly identified, all edges Xi+1 → Xi will be oriented the wrong way around. Example 6 (Probabilistically but not Graphoid-redundant) In this example, we will see a case where some CI-statements are probabilistically redundant but not Graphoid-redundant. Suppose we have random variables V = {X, Y, Z, W} and let L = {X ⊥ Y | {Z, W}, X ⊥ Y | ∅, Z ⊥ W | X, Z ⊥ W | … view at source ↗
Figure 9
Figure 9. Figure 9: This example illustrates how the PC algorithm arrives at a graph that contradicts a dependence [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗
Figure 11
Figure 11. Figure 11: A and B are coupled over (X, Y, ∅) given ∅ (as defined in definition 5) but neither A and B nor X and Y are coupled given ∅ (as defined in definition 7) as they are not adjacent. G THE MMD ALGORITHM In the following, we will propose an assumption and an algorithm that can correct violations of (a) in lemma 2 in some cases, but at the price of being even more expensive than the SP algorithm. The main purpo… view at source ↗
Figure 12
Figure 12. Figure 12: If this graph is the ground truth but we have the unfaithful independences [PITH_FULL_IMAGE:figures/full_fig_p023_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Structural Hamming distance for simplified PC algorithm and MMD algorithm from proposition 3. [PITH_FULL_IMAGE:figures/full_fig_p024_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Structural Hamming distance when we modify MMD to optimize over different sets of tests. Additional [PITH_FULL_IMAGE:figures/full_fig_p025_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Incorrect predictions of purely graphically versus Graphoid redundant CI-statements on synthetic [PITH_FULL_IMAGE:figures/full_fig_p026_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Ground truth graphs for the experiment in fig. 5b. [PITH_FULL_IMAGE:figures/full_fig_p027_16.png] view at source ↗
read the original abstract

Conditional-independence-based discovery uses statistical tests to identify a graphical model that represents the independence structure of variables in a dataset. These tests, however, can be unreliable, and algorithms are sensitive to errors and violated assumptions. Often, there are tests that were not used in the construction of the graph. In this work, we show that these redundant tests have the potential to detect or sometimes correct errors in the learned model. But we further show that not all tests contain this additional information and that such redundant tests have to be applied with care. Precisely, we argue that the conditional (in)dependence statements that hold for every probability distribution are unlikely to detect and correct errors - in contrast to those that follow only from graphical assumptions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript distinguishes two notions of redundant conditional independence (CI) statements in constraint-based graphical model discovery: universal CIs (true for every probability distribution) versus graph-specific CIs (implied only by the Markov properties of a candidate DAG). It claims that only the latter can detect or correct errors arising from unreliable statistical tests, while the former cannot and must be applied with care.

Significance. The clarification of redundancy notions could help practitioners select informative unused CI tests for post-hoc validation of learned graphs. If the distinction is made operational, it offers a conceptual tool for improving robustness of algorithms such as PC or FCI. The paper receives credit for cleanly separating semantic notions of redundancy that had not been explicitly contrasted in prior literature on CI-based discovery.

major comments (1)
  1. [Abstract / main argument] Abstract and main argument: the central claim that universal CIs are 'unlikely to detect and correct errors' is load-bearing yet rests on a purely semantic distinction without a concrete counter-example, finite-sample bound, or simulation demonstrating that tests of universal CIs systematically fail to flag inconsistencies that graph-specific CIs would catch under realistic type-I/II error rates.
minor comments (1)
  1. Notation for the two classes of redundant statements could be introduced earlier and used consistently to improve readability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the detailed review and the opportunity to clarify our central argument. Below we respond to the major comment.

read point-by-point responses
  1. Referee: [Abstract / main argument] Abstract and main argument: the central claim that universal CIs are 'unlikely to detect and correct errors' is load-bearing yet rests on a purely semantic distinction without a concrete counter-example, finite-sample bound, or simulation demonstrating that tests of universal CIs systematically fail to flag inconsistencies that graph-specific CIs would catch under realistic type-I/II error rates.

    Authors: The claim follows directly from the definitions we introduce. A universal CI holds in every probability distribution; therefore an incorrect learned graph cannot cause its violation. Any statistical rejection of a universal CI must be due to test error alone and supplies no evidence that the graph is wrong. By contrast, a graph-specific CI is implied only by the Markov properties of the candidate DAG; its violation is consistent with either test error or an incorrect graph. This is a logical consequence of the two notions rather than an empirical claim requiring bounds or simulations. We nevertheless agree that an explicit illustrative example would make the distinction more accessible and will add one in revision. revision: yes

Circularity Check

0 steps flagged

No circularity: theoretical distinction between universal and graph-specific CIs is self-contained

full rationale

The paper's core argument distinguishes conditional independencies that hold for every probability distribution (universal) from those implied only by the Markov properties of a specific DAG (graph-specific), asserting the former are unlikely to detect or correct errors in learned models. This distinction follows directly from the definitions of the two classes of statements and is presented as a logical consequence without any fitted parameters, predictions that reduce to inputs, or load-bearing self-citations. No equations or derivations in the provided abstract or described content loop back to the inputs by construction; the claim is advanced as a semantic and theoretical observation rather than an empirical or derived result that presupposes itself. The translation to practical error detection is asserted on the basis of this distinction alone and does not rely on prior author work invoked as an external uniqueness theorem or ansatz.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no visible free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5652 in / 823 out tokens · 23087 ms · 2026-05-23T03:18:48.922806+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

  1. [1]

    Sofia Faltenbacher, Jonas Wahl, Rebecca Herman, and Jakob Runge

    PMLR, 2024. Sofia Faltenbacher, Jonas Wahl, Rebecca Herman, and Jakob Runge. Internal incoherency scores for constraint-based causal discovery algorithms.arXiv preprint arXiv:2502.14719, 2025. Dan Geiger and Judea Pearl. Logical and algorithmic properties of conditional independence and graphical models.The annals of statistics, 21(4):2001–2021, 1993. Lui...

  2. [2]

    Jiji Zhang and Peter Spirtes

    PMLR, 2024. Jiji Zhang and Peter Spirtes. The three faces of faith- fulness.Synthese, 193:1011–1027, 2016. Yujia Zheng, Biwei Huang, Wei Chen, Joseph Ram- sey, Mingming Gong, Ruichu Cai, Shohei Shimizu, Peter Spirtes, and Kun Zhang. Causal-learn: Causal discovery in python.Journal of Machine Learning Research, 25(60):1–8, 2024. Checklist

  3. [3]

    For all models and algorithms presented, check if you include: (a) A clear description of the mathematical setting, assumptions, algorithm, and/or model. Yes. (b) An analysis of the properties and complexity (time, space, sample size) of any algorithm. Yes. (c) (Optional) Anonymized source code, with spe- cificationofalldependencies, includingexternal lib...

  4. [4]

    For any theoretical claim, check if you include: (a) Statements of the full set of assumptions of all theoretical results. Yes. (b) Complete proofs of all theoretical results. Yes. (c) Clear explanations of any assumptions. Yes

  5. [5]

    For all figures and tables that present empirical results, check if you include: (a) The code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL). Yes. (b) All the training details (e.g., data splits, hyper- parameters, how they were chosen). Yes. (c) A clear definition of the specifi...

  6. [6]

    If you are using existing assets (e.g., code, data, models) or curating/releasing new assets, check if you include: (a) Citations of the creator If your work uses ex- isting assets. Yes. (b) The license information of the assets, if applic- able. Yes. (c) New assets either in the supplemental material or as a URL, if applicable. Yes. (d) Information about...

  7. [7]

    Not Applicable

    If you used crowdsourcing or conducted research with human subjects, check if you include: (a) The full text of instructions given to parti- cipants and screenshots. Not Applicable. (b) Descriptions of potential participant risks, with links to Institutional Review Board (IRB) ap- provals if applicable. Not Applicable. (c) The estimated hourly wage paid t...

  8. [8]

    Symmetry:X⊥ ⊥Y|Z⇐ ⇒Y⊥ ⊥X|Z

  9. [9]

    Decomposition:X⊥ ⊥Y∪W|Z=⇒X⊥ ⊥Y|Z∧X⊥ ⊥W|Z

  10. [10]

    Weak Union:X⊥ ⊥Y∪W|Z=⇒X⊥ ⊥Y|Z∪W∧X⊥ ⊥W|Z∪Y

  11. [11]

    Contraction:X⊥ ⊥Y|Z∧X⊥ ⊥W|Z∪Y=⇒X⊥ ⊥Y∪W|Z Mis calledgraphoidif we further have

  12. [12]

    Bouckaert (1995) uses the following definition to graphically characterise CI-statements that follow via the Graphoid axioms, as used in corollary 6

    Intersection:X⊥ ⊥Y|Z∪W∧X⊥ ⊥W|Z∪Y=⇒X⊥ ⊥Y∪W|Z. Bouckaert (1995) uses the following definition to graphically characterise CI-statements that follow via the Graphoid axioms, as used in corollary 6. Definition 7 (coupling)Let G be an undirected graph overVandX , Y, Z ⊆ Vbe disjoint withX ̸= ∅ ̸=Y. ThenXandYarecoupledgivenZif there areX∈X, Y∈YorY∈X, X∈Ysuch th...

  13. [13]

    Analogously addW→W ′ 1 and W→W ′ 2 if W∈V\N (X∼Y| Z)

    If W ′ ∈V\N (X∼Y| Z)add W1 →W ′ and W2 →W ′. Analogously addW→W ′ 1 and W→W ′ 2 if W∈V\N (X∼Y| Z). Finally addX→W 1 and X←W 1 wheneverX→WorX←WforW∈N(X∼Y|Z)andY→W 2 andY←W 2 respectively. Clearly, in this graph we haveX⊥ G′ Y| Z′, whereZ ′ contains all nodes inZ\N (X∼Y| Z)and W1, W2 for W∈ Z ∩N (X∼Y| Z). Let P be a distribution withM as independence model ...

  14. [14]

    IfYlies betweenXandZ, i.e.X−Y ∗ −Zwe getX̸⊥ T ′ Z|YbutX⊥ T ∗ Z|Y

  15. [15]

    So in any case we have another difference betweenT∗ andT ′

    IfXlies betweenYandZ, i.e.Z ∗ −X−Ywe getZ̸⊥ T ′ Y|XbutZ⊥ T ∗ Y|X. So in any case we have another difference betweenT∗ andT ′. W.l.o.g. we can assume thatZ is the node closest toX on the path betweenX and Y, i.e. the node that has a direct edge toX. Now letW be another node (which exists, becausen > 3). There are three cases how this node can be connected ...

  16. [16]

    if we have X−Z ∗ −Y ∗ −W

    IfYis betweenZandW, i.e. if we have X−Z ∗ −Y ∗ −W. Then we haveX⊥ T ′ W|ZbutX̸⊥ T ∗ W|Z, since inT ∗ we had the direct edge fromXtoY

  17. [17]

    if we have the structure in fig

    If Y does not lie on the paths betweenX and W and X not on the one betweenY and W, i.e. if we have the structure in fig. 7, we will further subdivide this into two subcases depending onT∗. (a) InT ∗ we haveYbetweenXandW, i.e.X−Y ∗ −W. ThenX̸⊥ T ′ W|YbutX⊥ T ∗ W|Y. (b) InT ∗ we haveXbetweenYandW, i.e.W ∗ −X−Y. Then we haveY̸⊥ T ′ W|XbutY⊥ T ∗ W|X

  18. [18]

    W ∗ −X−Z ∗ −Y

    IfXis betweenWandY, i.e. W ∗ −X−Z ∗ −Y. Then this case works symmetrically to the first one. Since we choseW arbitrarily, we can repeat this for every node other thanX, Y, Z. This means, we getn− 3 more contradictions between the independence model ofT ∗ and T ′. So overall we haven− 1CI-statements where T ′ and T ∗ disagree. So even ifT ′ fits all the⌊(n...