pith. sign in

arxiv: 2605.27477 · v1 · pith:TODY4EBAnew · submitted 2026-05-26 · 📊 stat.ML · cs.LG

Iterative Causal Discovery: Per-Edge Impossibility Certificates, Tier-Aware Oracle Queries, and the 1+K Lower Bound

Pith reviewed 2026-06-29 15:56 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords causal discoveryDAG orientationoracle queriesimpossibility certificatesidentifiability tiersMarkov equivalence classexpert interventionscontinuous data
0
0 comments X

The pith

Causal discovery recovers any DAG with at most 1 plus the number of non-leaf vertices expert queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a protocol that attaches to every candidate edge either a RESOLVED code naming the identifiability theorem that fixed its direction or an IMPOSSIBLE code listing the exact expert question required to resolve it. Five gated methods (LSNM, IGCI, Stein, MDL, PEIT) extend the bivariate cascade and abstain unless their precondition tests pass. Two oracle primitives, the meta-hub query and the node-children query, together prove an upper bound of 1+K expert interactions that suffices to orient any DAG, where K counts non-leaf vertices. Under an ideal-oracle assumption the bound holds exactly for the asia, sachs, child, and alarm networks.

Core claim

Observational data under Markov and faithfulness conditions identifies only a Markov equivalence class, leaving some edge directions undetermined by the joint distribution. The protocol records per edge whether its orientation follows from one of the five identifiability tiers or requires external resolution. The meta-hub and node-children oracle primitives jointly establish that 1+K interactions recover the full DAG for any graph with K non-leaf vertices, and this count is achieved exactly on the listed benchmarks when every query returns the needed information without error.

What carries the argument

Per-edge impossibility certificates (RESOLVED and IMPOSSIBLE codes) together with the meta-hub query and node-children query primitives.

If this is right

  • Any DAG is recoverable with at most 1+K oracle interactions.
  • The bound is attained exactly on the asia, sachs, child, and alarm benchmarks under ideal oracle responses.
  • The five gated tiers abstain on edges whose preconditions fail, preventing incorrect commitments.
  • Certificates distinguish directions fixed by data from those assigned by untested assumptions.
  • The protocol works on continuous data and attaches a discrete question to each unresolved edge.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The per-edge certificates could be attached to outputs of existing causal discovery algorithms to flag which edges still need external validation.
  • The 1+K bound suggests a minimal expert budget that scales with graph structure rather than total edges.
  • Testing the same protocol on graphs with known ground truth but different sparsity patterns would show whether the bound remains tight.
  • Integration with active learning loops could prioritize which IMPOSSIBLE certificates to submit first.

Load-bearing premise

Every oracle query returns exactly the information needed to resolve the impossibility certificate without error or additional cost.

What would settle it

A concrete DAG on which recovering the complete orientation requires strictly more than 1+K ideal responses to the meta-hub and node-children queries.

Figures

Figures reproduced from arXiv: 2605.27477 by Eichi Uehara.

Figure 1
Figure 1. Figure 1: Existing causal-discovery output formats and the per-edge certificate alternative. [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Protocol overview. Stage 2’s cascade contains five new gated safe tiers (LSNM, IGCI, [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Cross-benchmark 1+K upper bound under the ideal-oracle assumption. Assuming a domain expert who answers each meta-hub and node-children query correctly, the protocol recovers each ground-truth DAG at precision = recall = F1 = 1.000 using exactly 1+K oracle interactions, where K is the count of non-leaf vertices. Query count grows much slower than edge count (alarm: 46 edges committed by 27 interactions). P… view at source ↗
Figure 4
Figure 4. Figure 4: Sachs Pareto frontier. Every operating point holds precision [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: diagrams one complete oracle-query round. The protocol selects the IMPOSSIBLE edge with the highest info-value (Section 3), formulates a certificate-specific question, receives one of FWD/BWD/ABSENT, updates the partial DAG, and runs the three zero-prior-cost auto-resolution passes (propagation, transitive mediation, conditional re-audit) before picking the next edge. 1. Pick highest-info-value IMPOSSIBLE … view at source ↗
Figure 6
Figure 6. Figure 6: Iterative DAG growth on asia. Black solid arrows are committed edges; the red thick [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Sachs walk-through (V = 11, N = 2000, propagation disabled, multi-tier mediator search enabled). Edges shown are representative of GT structure; the multi-tier mediator search drops 8 spurious skeleton edges at audit time, leaving 21 IMPOSSIBLE that get oracle-queried. The third case study is the financial-returns benchmark stocks_banks, on which the cascade defers every candidate to the oracle. stocks_ban… view at source ↗
Figure 8
Figure 8. Figure 8: stocks_banks walk-through. The cascade refuses to auto-commit any edge on dense [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: child walk-through, hub-node subgraph view ( [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: alarm walk-through, cardiovascular-hub subgraph view ( [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗
read the original abstract

Causal-discovery algorithms return a directed graph, yet provide no principled means of distinguishing edge directions identified by the data from those assigned without an identifying assumption. Under the standard Markov and faithfulness conditions, the observational distribution identifies only a Markov equivalence class; orientations within that class are not determined by the joint distribution and cannot be recovered from additional samples alone, but require either a functional restriction or an intervention. We introduce a protocol for observational causal discovery on continuous data that attaches to each candidate edge a discrete impossibility certificate: a RESOLVED code records the identifiability theorem under which the direction was committed, while an IMPOSSIBLE code records the failure mode together with the specific question a domain expert must answer to resolve it. The bivariate cascade is extended with five gated identifiability tiers LSNM, IGCI, Stein, MDL, and PEIT that abstain when their precondition test rejects. Two oracle primitives, the meta-hub query and the node-children query, jointly establish an upper bound of $1+K$ expert interactions sufficient to recover any DAG, where $K$ denotes the number of non-leaf vertices. Under an ideal-oracle assumption, the bound is met exactly on the asia, sachs, child, and alarm benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims to introduce per-edge impossibility certificates (RESOLVED and IMPOSSIBLE) for causal discovery using five gated tiers (LSNM, IGCI, Stein, MDL, PEIT) that abstain when preconditions fail. It proposes two oracle primitives (meta-hub query and node-children query) that establish an upper bound of 1+K expert interactions to recover any DAG (K = number of non-leaf vertices), with the bound achieved on asia, sachs, child, and alarm benchmarks under an ideal oracle.

Significance. Should the mapping between IMPOSSIBLE certificates and the query primitives be verified and the bound proven, this would represent a meaningful contribution to interactive causal discovery by providing a principled, low-query protocol for resolving identifiability issues on continuous data. The certificate approach adds transparency to edge orientations.

major comments (1)
  1. [Abstract] Abstract: the central claim that the meta-hub and node-children queries jointly establish the 1+K upper bound requires an explicit mapping or proof that every IMPOSSIBLE certificate from the five gated tiers is resolvable by exactly one invocation of one primitive without residual ambiguity; no such derivation or coverage argument is supplied.
minor comments (1)
  1. [Title] Title: refers to a '$1+K$ Lower Bound' while the abstract describes an upper bound of 1+K expert interactions; the terminology should be reconciled for consistency.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for recognizing the potential contribution of the certificate-based protocol to interactive causal discovery. We address the single major comment below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that the meta-hub and node-children queries jointly establish the 1+K upper bound requires an explicit mapping or proof that every IMPOSSIBLE certificate from the five gated tiers is resolvable by exactly one invocation of one primitive without residual ambiguity; no such derivation or coverage argument is supplied.

    Authors: We agree that an explicit mapping and coverage argument would make the central claim more transparent. In the revised manuscript we will insert a new subsection immediately after the definitions of the two oracle primitives. This subsection will contain (i) a complete enumeration of every IMPOSSIBLE code that can be emitted by the LSNM, IGCI, Stein, MDL and PEIT tiers, (ii) a table mapping each code to the single primitive (meta-hub or node-children) that resolves it, and (iii) a short proof that the chosen primitive eliminates the recorded failure mode without residual ambiguity. The argument relies only on the definitions already present in Sections 3 and 4 and does not require additional assumptions. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the 1+K bound derivation

full rationale

The paper presents the 1+K upper bound as following directly from the two defined oracle primitives (meta-hub query and node-children query) under the ideal-oracle assumption, with the bound verified exactly on the listed benchmarks. No equations, fitted parameters, or self-citations are shown that would reduce the bound to a definition or input by construction. The derivation chain is self-contained against the stated primitives and external benchmarks, with no load-bearing reductions of the enumerated kinds.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Review performed on abstract only; full methods, proofs, and any additional assumptions are unavailable. The ledger therefore records only the explicit background stated in the abstract.

axioms (1)
  • domain assumption Standard Markov and faithfulness conditions
    Explicitly invoked in the abstract as the setting under which only a Markov equivalence class is identified from observational data.
invented entities (2)
  • RESOLVED and IMPOSSIBLE per-edge certificates no independent evidence
    purpose: To record the identifiability theorem used or the exact expert question required for each candidate edge
    New discrete codes introduced by the protocol; no independent evidence supplied in abstract.
  • meta-hub query and node-children query no independent evidence
    purpose: Oracle primitives that together suffice for the 1+K bound
    Two new query types proposed; no independent evidence supplied in abstract.

pith-pipeline@v0.9.1-grok · 5757 in / 1423 out tokens · 34172 ms · 2026-06-29T15:56:51.042930+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

42 extracted references · 11 canonical work pages · 1 internal anchor

  1. [1]

    pgmpy: Probabilistic graphical models using python

    Ankur Ankan and Abinash Panda. pgmpy: Probabilistic graphical models using python. In Proceedings of the 14th Python in Science Conference (SciPy), pages 6--11, 2015. doi:10.25080/Majora-7b98e3ed-001

  2. [2]

    Variational causal networks: Approximate B ayesian inference over causal structures

    Yashas Annadani, Jonas Rothfuss, Alexandre Lacoste, Nino Scherrer, Anirudh Goyal, Yoshua Bengio, and Stefan Bauer. Variational causal networks: Approximate B ayesian inference over causal structures. arXiv preprint arXiv:2106.07635, 2021

  3. [3]

    The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks

    Ingo A Beinlich, H Jaap Suermondt, R Martin Chavez, and Gregory F Cooper. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. Proceedings of the Second European Conference on Artificial Intelligence in Medicine, pages 247--256, 1989. Origin of the ``alarm'' Bayesian-network benchmark

  4. [4]

    Dagma: Learning dags via m-matrices and a log-determinant acyclicity characterization

    Kevin Bello, Bryon Aragam, and Pradeep Ravikumar. Dagma: Learning dags via m-matrices and a log-determinant acyclicity characterization. In Advances in Neural Information Processing Systems (NeurIPS), 2022

  5. [5]

    Controlling the false discovery rate: a practical and powerful approach to multiple testing

    Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57 0 (1): 0 289--300, 1995

  6. [6]

    Optimal structure identification with greedy search

    David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3: 0 507--554, 2002

  7. [7]

    Learning high-dimensional directed acyclic graphs with latent and selection variables

    Diego Colombo, Marloes H Maathuis, Markus Kalisch, and Thomas S Richardson. Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, 40 0 (1): 0 294--321, 2012

  8. [8]

    BCD nets: Scalable variational approaches for B ayesian causal discovery

    Chris Cundy, Aditya Grover, and Stefano Ermon. BCD nets: Scalable variational approaches for B ayesian causal discovery. In Advances in Neural Information Processing Systems 34 (NeurIPS), 2021

  9. [9]

    On the Number of Experiments Sufficient and in the Worst Case Necessary to Identify All Causal Relations Among N Variables

    Frederick Eberhardt, Clark Glymour, and Richard Scheines. On the number of experiments sufficient and in the worst case necessary to identify all causal relations among N variables. arXiv preprint arXiv:1207.1389, 2005

  10. [10]

    Review of causal discovery methods based on graphical models

    Clark Glymour, Kun Zhang, and Peter Spirtes. Review of causal discovery methods based on graphical models. Frontiers in Genetics, 10: 0 524, 2019. doi:10.3389/fgene.2019.00524

  11. [11]

    A kernel statistical test of independence

    Arthur Gretton, Kenji Fukumizu, Choon Hui Teo, Le Song, Bernhard Sch \"o lkopf, and Alex Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems (NeurIPS), 2007

  12. [12]

    Two optimal strategies for active learning of causal models from interventional data

    Alain Hauser and Peter B \"u hlmann. Two optimal strategies for active learning of causal models from interventional data. International Journal of Approximate Reasoning, 55 0 (4): 0 926--939, 2014. doi:10.1016/j.ijar.2013.11.007

  13. [13]

    Active learning of causal networks with intervention experiments and optimal designs

    Yang-Bo He and Zhi Geng. Active learning of causal networks with intervention experiments and optimal designs. Journal of Machine Learning Research, 9: 0 2523--2547, 2008

  14. [14]

    Chickering

    David Heckerman, Dan Geiger, and David M. Chickering. Learning B ayesian networks: The combination of knowledge and statistical data. Machine Learning, 20 0 (3): 0 197--243, 1995. doi:10.1023/A:1022623210503

  15. [15]

    Nonlinear causal discovery with additive noise models

    Patrik O Hoyer, Dominik Janzing, Joris M Mooij, Jonas Peters, and Bernhard Sch \"o lkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems (NeurIPS), 2008

  16. [16]

    Pairwise likelihood ratios for estimation of non- G aussian structural equation models

    Aapo Hyv \"a rinen and Stephen M Smith. Pairwise likelihood ratios for estimation of non- G aussian structural equation models. Journal of Machine Learning Research, 14: 0 111--152, 2013

  17. [17]

    o lkopf, Peter B \

    Alexander Immer, Christoph Schultheiss, Julia E. Vogt, Bernhard Sch \"o lkopf, Peter B \"u hlmann, and Alexander Marx. On the identifiability and estimation of causal location-scale noise models. In Proceedings of the 40th International Conference on Machine Learning (ICML), volume 202 of PMLR, pages 14316--14332, 2023

  18. [18]

    Causal inference using the algorithmic markov condition

    Dominik Janzing and Bernhard Sch \"o lkopf. Causal inference using the algorithmic markov condition. IEEE Transactions on Information Theory, 56 0 (10): 0 5168--5194, 2010. doi:10.1109/TIT.2010.2060095. Information-theoretic / MDL-style basis for compression-based causal direction

  19. [19]

    Information-geometric approach to inferring causal directions

    Dominik Janzing, Joris Mooij, Kun Zhang, Jan Lemeire, Jakob Zscheischler, Povilas Daniu s is, Bastian Steudel, and Bernhard Sch \"o lkopf. Information-geometric approach to inferring causal directions. Artificial Intelligence, 182--183: 0 1--31, 2012

  20. [20]

    Tearing apart NOTEARS : Controlling the graph prediction via variance manipulation

    Marcus Kaiser and Maksym Sipos. Tearing apart NOTEARS : Controlling the graph prediction via variance manipulation. arXiv preprint arXiv:2206.07195, 2022

  21. [21]

    Estimating high-dimensional directed acyclic graphs with the pc-algorithm

    Markus Kalisch and Peter B \"u hlmann. Estimating high-dimensional directed acyclic graphs with the pc-algorithm. Journal of Machine Learning Research, 8: 0 613--636, 2007

  22. [22]

    Local computations with probabilities on graphical structures and their application to expert systems

    Steffen L Lauritzen and David J Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society: Series B (Methodological), 50 0 (2): 0 157--194, 1988. Origin of the ``asia'' Bayesian-network benchmark

  23. [23]

    DiBS : Differentiable B ayesian structure learning

    Lars Lorch, Jonas Rothfuss, Bernhard Sch \"o lkopf, and Andreas Krause. DiBS : Differentiable B ayesian structure learning. In Advances in Neural Information Processing Systems 34 (NeurIPS), 2021

  24. [24]

    Amortized inference for causal structure learning

    Lars Lorch, Scott Sussex, Jonas Rothfuss, Andreas Krause, and Bernhard Sch \"o lkopf. Amortized inference for causal structure learning. In Advances in Neural Information Processing Systems 35 (NeurIPS), 2022

  25. [25]

    Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Sch \"o lkopf

    Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Sch \"o lkopf. Distinguishing cause from effect using observational data: methods and benchmarks. Journal of Machine Learning Research, 17 0 (32): 0 1--102, 2016

  26. [26]

    Causality: Models, Reasoning, and Inference

    Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition, 2009

  27. [27]

    Elements of Causal Inference: Foundations and Learning Algorithms

    Jonas Peters, Dominik Janzing, and Bernhard Sch \"o lkopf. Elements of Causal Inference: Foundations and Learning Algorithms. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, 2017

  28. [28]

    Beware of the simulated dag! causal discovery benchmarks may be easy to game

    Alexander Reisach, Christof Seiler, and Sebastian Weichwald. Beware of the simulated dag! causal discovery benchmarks may be easy to game. In Advances in Neural Information Processing Systems (NeurIPS), 2021

  29. [29]

    a us Kleindessner, Chris Russell, Dominik Janzing, Bernhard Sch \

    Paul Rolland, Volkan Cevher, Matth \"a us Kleindessner, Chris Russell, Dominik Janzing, Bernhard Sch \"o lkopf, and Francesco Locatello. Score matching enables causal discovery of nonlinear additive noise models. In Proceedings of the 39th International Conference on Machine Learning (ICML), volume 162 of PMLR, pages 18741--18753, 2022. Leaf-node identifi...

  30. [30]

    Argumentation for interactive causal discovery

    Fabrizio Russo and Francesca Toni. Argumentation for interactive causal discovery. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), pages 7374--7382, 2023. doi:10.24963/ijcai.2023/820

  31. [31]

    Causal protein-signaling networks derived from multiparameter single-cell data

    Karen Sachs, Omar Perez, Dana Pe'er, Douglas A Lauffenburger, and Garry P Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308 0 (5721): 0 523--529, 2005

  32. [32]

    Sch \"o lkopf, F

    Bernhard Sch \"o lkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 109 0 (5): 0 612--634, 2021. doi:10.1109/JPROC.2021.3058954

  33. [33]

    Dimakis, and Sriram Vishwanath

    Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, and Sriram Vishwanath. Learning causal graphs with small interventions. In Advances in Neural Information Processing Systems (NeurIPS), 2015

  34. [34]

    A linear non-gaussian acyclic model for causal discovery

    Shohei Shimizu, Patrik O Hoyer, Aapo Hyv \"a rinen, and Antti Kerminen. A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7: 0 2003--2030, 2006

  35. [35]

    Directlingam: a direct method for learning a linear non- G aussian structural equation model

    Shohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Aapo Hyv \"a rinen, Yoshinobu Kawahara, Takashi Washio, Patrik O Hoyer, and Kenneth Bollen. Directlingam: a direct method for learning a linear non- G aussian structural equation model. Journal of Machine Learning Research, 12: 0 1225--1248, 2011

  36. [36]

    B ayesian analysis in expert systems

    David J Spiegelhalter, A Philip Dawid, Steffen L Lauritzen, and Robert G Cowell. B ayesian analysis in expert systems. Statistical Science, 8 0 (3): 0 219--247, 1993. Origin of the ``child'' Bayesian-network benchmark

  37. [37]

    Causation, Prediction, and Search

    Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2nd edition, 2000

  38. [38]

    Strobl and Thomas A

    Eric V. Strobl and Thomas A. Lasko. Identifying patient-specific root causes with the heteroscedastic noise model. Journal of Computational Science, 72: 0 102099, 2023. doi:10.1016/j.jocs.2023.102099. Establishes bivariate identifiability of the location-scale noise model; preprint arXiv:2205.13085 (2022)

  39. [39]

    Interventions, where and how? E xperimental design for causal models at scale

    Panagiotis Tigas, Yashas Annadani, Andrew Jesson, Bernhard Sch \"o lkopf, Yarin Gal, and Stefan Bauer. Interventions, where and how? E xperimental design for causal models at scale. In Advances in Neural Information Processing Systems 35 (NeurIPS), 2022

  40. [40]

    On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias

    Jiji Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 172 0 (16-17): 0 1873--1896, 2008

  41. [41]

    On the identifiability of the post-nonlinear causal model

    Kun Zhang and Aapo Hyv \"a rinen. On the identifiability of the post-nonlinear causal model. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI), pages 647--655, 2009

  42. [42]

    Dags with no tears: Continuous optimization for structure learning

    Xun Zheng, Bryon Aragam, Pradeep Ravikumar, and Eric P Xing. Dags with no tears: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems (NeurIPS), 2018