Iterative Causal Discovery: Per-Edge Impossibility Certificates, Tier-Aware Oracle Queries, and the $1+K$ Lower Bound

Eichi Uehara

arxiv: 2605.27477 · v1 · pith:TODY4EBAnew · submitted 2026-05-26 · 📊 stat.ML · cs.LG

Iterative Causal Discovery: Per-Edge Impossibility Certificates, Tier-Aware Oracle Queries, and the 1+K Lower Bound

Eichi Uehara This is my paper

Pith reviewed 2026-06-29 15:56 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords causal discoveryDAG orientationoracle queriesimpossibility certificatesidentifiability tiersMarkov equivalence classexpert interventionscontinuous data

0 comments

The pith

Causal discovery recovers any DAG with at most 1 plus the number of non-leaf vertices expert queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a protocol that attaches to every candidate edge either a RESOLVED code naming the identifiability theorem that fixed its direction or an IMPOSSIBLE code listing the exact expert question required to resolve it. Five gated methods (LSNM, IGCI, Stein, MDL, PEIT) extend the bivariate cascade and abstain unless their precondition tests pass. Two oracle primitives, the meta-hub query and the node-children query, together prove an upper bound of 1+K expert interactions that suffices to orient any DAG, where K counts non-leaf vertices. Under an ideal-oracle assumption the bound holds exactly for the asia, sachs, child, and alarm networks.

Core claim

Observational data under Markov and faithfulness conditions identifies only a Markov equivalence class, leaving some edge directions undetermined by the joint distribution. The protocol records per edge whether its orientation follows from one of the five identifiability tiers or requires external resolution. The meta-hub and node-children oracle primitives jointly establish that 1+K interactions recover the full DAG for any graph with K non-leaf vertices, and this count is achieved exactly on the listed benchmarks when every query returns the needed information without error.

What carries the argument

Per-edge impossibility certificates (RESOLVED and IMPOSSIBLE codes) together with the meta-hub query and node-children query primitives.

If this is right

Any DAG is recoverable with at most 1+K oracle interactions.
The bound is attained exactly on the asia, sachs, child, and alarm benchmarks under ideal oracle responses.
The five gated tiers abstain on edges whose preconditions fail, preventing incorrect commitments.
Certificates distinguish directions fixed by data from those assigned by untested assumptions.
The protocol works on continuous data and attaches a discrete question to each unresolved edge.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The per-edge certificates could be attached to outputs of existing causal discovery algorithms to flag which edges still need external validation.
The 1+K bound suggests a minimal expert budget that scales with graph structure rather than total edges.
Testing the same protocol on graphs with known ground truth but different sparsity patterns would show whether the bound remains tight.
Integration with active learning loops could prioritize which IMPOSSIBLE certificates to submit first.

Load-bearing premise

Every oracle query returns exactly the information needed to resolve the impossibility certificate without error or additional cost.

What would settle it

A concrete DAG on which recovering the complete orientation requires strictly more than 1+K ideal responses to the meta-hub and node-children queries.

Figures

Figures reproduced from arXiv: 2605.27477 by Eichi Uehara.

**Figure 2.** Figure 2: Protocol overview. Stage 2’s cascade contains five new gated safe tiers (LSNM, IGCI, [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Cross-benchmark 1+K upper bound under the ideal-oracle assumption. Assuming a domain expert who answers each meta-hub and node-children query correctly, the protocol recovers each ground-truth DAG at precision = recall = F1 = 1.000 using exactly 1+K oracle interactions, where K is the count of non-leaf vertices. Query count grows much slower than edge count (alarm: 46 edges committed by 27 interactions). P… view at source ↗

**Figure 4.** Figure 4: Sachs Pareto frontier. Every operating point holds precision [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: diagrams one complete oracle-query round. The protocol selects the IMPOSSIBLE edge with the highest info-value (Section 3), formulates a certificate-specific question, receives one of FWD/BWD/ABSENT, updates the partial DAG, and runs the three zero-prior-cost auto-resolution passes (propagation, transitive mediation, conditional re-audit) before picking the next edge. 1. Pick highest-info-value IMPOSSIBLE … view at source ↗

**Figure 6.** Figure 6: Iterative DAG growth on asia. Black solid arrows are committed edges; the red thick [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗

**Figure 7.** Figure 7: Sachs walk-through (V = 11, N = 2000, propagation disabled, multi-tier mediator search enabled). Edges shown are representative of GT structure; the multi-tier mediator search drops 8 spurious skeleton edges at audit time, leaving 21 IMPOSSIBLE that get oracle-queried. The third case study is the financial-returns benchmark stocks_banks, on which the cascade defers every candidate to the oracle. stocks_ban… view at source ↗

**Figure 8.** Figure 8: stocks_banks walk-through. The cascade refuses to auto-commit any edge on dense [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗

**Figure 9.** Figure 9: child walk-through, hub-node subgraph view ( [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗

**Figure 10.** Figure 10: alarm walk-through, cardiovascular-hub subgraph view ( [PITH_FULL_IMAGE:figures/full_fig_p019_10.png] view at source ↗

read the original abstract

Causal-discovery algorithms return a directed graph, yet provide no principled means of distinguishing edge directions identified by the data from those assigned without an identifying assumption. Under the standard Markov and faithfulness conditions, the observational distribution identifies only a Markov equivalence class; orientations within that class are not determined by the joint distribution and cannot be recovered from additional samples alone, but require either a functional restriction or an intervention. We introduce a protocol for observational causal discovery on continuous data that attaches to each candidate edge a discrete impossibility certificate: a RESOLVED code records the identifiability theorem under which the direction was committed, while an IMPOSSIBLE code records the failure mode together with the specific question a domain expert must answer to resolve it. The bivariate cascade is extended with five gated identifiability tiers LSNM, IGCI, Stein, MDL, and PEIT that abstain when their precondition test rejects. Two oracle primitives, the meta-hub query and the node-children query, jointly establish an upper bound of $1+K$ expert interactions sufficient to recover any DAG, where $K$ denotes the number of non-leaf vertices. Under an ideal-oracle assumption, the bound is met exactly on the asia, sachs, child, and alarm benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's main move is per-edge RESOLVED/IMPOSSIBLE certificates plus two oracle primitives that are claimed to deliver a tight 1+K interaction bound, but the abstract gives no mapping or derivation showing the queries actually cover every certificate type.

read the letter

The concrete advance is the per-edge certificate scheme that tags each direction with either the theorem that resolved it or the exact question an expert must answer. Pairing that with five gated tiers (LSNM, IGCI, Stein, MDL, PEIT) that abstain on precondition failure is a clean way to avoid over-committing on observational data alone. The meta-hub and node-children queries are then presented as sufficient to resolve any remaining IMPOSSIBLE codes with at most 1+K calls, where K is the number of non-leaf nodes, and the bound is reported as tight on the asia, sachs, child, and alarm graphs under an ideal oracle.

What is missing from the abstract is any explicit check that every failure mode produced by the five tiers maps to exactly one invocation of one of the two primitives without residual ambiguity or extra queries. The ideal-oracle assumption simply stipulates that the match occurs; without a derivation or case-by-case accounting, it is not yet clear whether the bound is structural or contingent on that stipulation holding for all DAGs.

The framing is useful for anyone who needs a protocol that stays observational until it must ask a targeted question. Readers working on cost-sensitive causal discovery or on making equivalence-class output more actionable will find the certificate language and the interaction bound worth examining. The citation pattern looks standard for the area and does not appear to hide circularity.

The work is coherent on its own terms and shows clear thinking about how to localize identifiability questions. It deserves referee time because the per-edge reorganization and the claimed bound are specific enough to be checked or refuted in review.

Referee Report

1 major / 1 minor

Summary. The paper claims to introduce per-edge impossibility certificates (RESOLVED and IMPOSSIBLE) for causal discovery using five gated tiers (LSNM, IGCI, Stein, MDL, PEIT) that abstain when preconditions fail. It proposes two oracle primitives (meta-hub query and node-children query) that establish an upper bound of 1+K expert interactions to recover any DAG (K = number of non-leaf vertices), with the bound achieved on asia, sachs, child, and alarm benchmarks under an ideal oracle.

Significance. Should the mapping between IMPOSSIBLE certificates and the query primitives be verified and the bound proven, this would represent a meaningful contribution to interactive causal discovery by providing a principled, low-query protocol for resolving identifiability issues on continuous data. The certificate approach adds transparency to edge orientations.

major comments (1)

[Abstract] Abstract: the central claim that the meta-hub and node-children queries jointly establish the 1+K upper bound requires an explicit mapping or proof that every IMPOSSIBLE certificate from the five gated tiers is resolvable by exactly one invocation of one primitive without residual ambiguity; no such derivation or coverage argument is supplied.

minor comments (1)

[Title] Title: refers to a '$1+K$ Lower Bound' while the abstract describes an upper bound of 1+K expert interactions; the terminology should be reconciled for consistency.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for recognizing the potential contribution of the certificate-based protocol to interactive causal discovery. We address the single major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the meta-hub and node-children queries jointly establish the 1+K upper bound requires an explicit mapping or proof that every IMPOSSIBLE certificate from the five gated tiers is resolvable by exactly one invocation of one primitive without residual ambiguity; no such derivation or coverage argument is supplied.

Authors: We agree that an explicit mapping and coverage argument would make the central claim more transparent. In the revised manuscript we will insert a new subsection immediately after the definitions of the two oracle primitives. This subsection will contain (i) a complete enumeration of every IMPOSSIBLE code that can be emitted by the LSNM, IGCI, Stein, MDL and PEIT tiers, (ii) a table mapping each code to the single primitive (meta-hub or node-children) that resolves it, and (iii) a short proof that the chosen primitive eliminates the recorded failure mode without residual ambiguity. The argument relies only on the definitions already present in Sections 3 and 4 and does not require additional assumptions. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the 1+K bound derivation

full rationale

The paper presents the 1+K upper bound as following directly from the two defined oracle primitives (meta-hub query and node-children query) under the ideal-oracle assumption, with the bound verified exactly on the listed benchmarks. No equations, fitted parameters, or self-citations are shown that would reduce the bound to a definition or input by construction. The derivation chain is self-contained against the stated primitives and external benchmarks, with no load-bearing reductions of the enumerated kinds.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 2 invented entities

Review performed on abstract only; full methods, proofs, and any additional assumptions are unavailable. The ledger therefore records only the explicit background stated in the abstract.

axioms (1)

domain assumption Standard Markov and faithfulness conditions
Explicitly invoked in the abstract as the setting under which only a Markov equivalence class is identified from observational data.

invented entities (2)

RESOLVED and IMPOSSIBLE per-edge certificates no independent evidence
purpose: To record the identifiability theorem used or the exact expert question required for each candidate edge
New discrete codes introduced by the protocol; no independent evidence supplied in abstract.
meta-hub query and node-children query no independent evidence
purpose: Oracle primitives that together suffice for the 1+K bound
Two new query types proposed; no independent evidence supplied in abstract.

pith-pipeline@v0.9.1-grok · 5757 in / 1423 out tokens · 34172 ms · 2026-06-29T15:56:51.042930+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 11 canonical work pages · 1 internal anchor

[1]

pgmpy: Probabilistic graphical models using python

Ankur Ankan and Abinash Panda. pgmpy: Probabilistic graphical models using python. In Proceedings of the 14th Python in Science Conference (SciPy), pages 6--11, 2015. doi:10.25080/Majora-7b98e3ed-001

work page doi:10.25080/majora-7b98e3ed-001 2015
[2]

Variational causal networks: Approximate B ayesian inference over causal structures

Yashas Annadani, Jonas Rothfuss, Alexandre Lacoste, Nino Scherrer, Anirudh Goyal, Yoshua Bengio, and Stefan Bauer. Variational causal networks: Approximate B ayesian inference over causal structures. arXiv preprint arXiv:2106.07635, 2021

work page arXiv 2021
[3]

The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks

Ingo A Beinlich, H Jaap Suermondt, R Martin Chavez, and Gregory F Cooper. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. Proceedings of the Second European Conference on Artificial Intelligence in Medicine, pages 247--256, 1989. Origin of the ``alarm'' Bayesian-network benchmark

1989
[4]

Dagma: Learning dags via m-matrices and a log-determinant acyclicity characterization

Kevin Bello, Bryon Aragam, and Pradeep Ravikumar. Dagma: Learning dags via m-matrices and a log-determinant acyclicity characterization. In Advances in Neural Information Processing Systems (NeurIPS), 2022

2022
[5]

Controlling the false discovery rate: a practical and powerful approach to multiple testing

Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57 0 (1): 0 289--300, 1995

1995
[6]

Optimal structure identification with greedy search

David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3: 0 507--554, 2002

2002
[7]

Learning high-dimensional directed acyclic graphs with latent and selection variables

Diego Colombo, Marloes H Maathuis, Markus Kalisch, and Thomas S Richardson. Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, 40 0 (1): 0 294--321, 2012

2012
[8]

BCD nets: Scalable variational approaches for B ayesian causal discovery

Chris Cundy, Aditya Grover, and Stefano Ermon. BCD nets: Scalable variational approaches for B ayesian causal discovery. In Advances in Neural Information Processing Systems 34 (NeurIPS), 2021

2021
[9]

On the Number of Experiments Sufficient and in the Worst Case Necessary to Identify All Causal Relations Among N Variables

Frederick Eberhardt, Clark Glymour, and Richard Scheines. On the number of experiments sufficient and in the worst case necessary to identify all causal relations among N variables. arXiv preprint arXiv:1207.1389, 2005

work page internal anchor Pith review Pith/arXiv arXiv 2005
[10]

Review of causal discovery methods based on graphical models

Clark Glymour, Kun Zhang, and Peter Spirtes. Review of causal discovery methods based on graphical models. Frontiers in Genetics, 10: 0 524, 2019. doi:10.3389/fgene.2019.00524

work page doi:10.3389/fgene.2019.00524 2019
[11]

A kernel statistical test of independence

Arthur Gretton, Kenji Fukumizu, Choon Hui Teo, Le Song, Bernhard Sch \"o lkopf, and Alex Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems (NeurIPS), 2007

2007
[12]

Two optimal strategies for active learning of causal models from interventional data

Alain Hauser and Peter B \"u hlmann. Two optimal strategies for active learning of causal models from interventional data. International Journal of Approximate Reasoning, 55 0 (4): 0 926--939, 2014. doi:10.1016/j.ijar.2013.11.007

work page doi:10.1016/j.ijar.2013.11.007 2014
[13]

Active learning of causal networks with intervention experiments and optimal designs

Yang-Bo He and Zhi Geng. Active learning of causal networks with intervention experiments and optimal designs. Journal of Machine Learning Research, 9: 0 2523--2547, 2008

2008
[14]

Chickering

David Heckerman, Dan Geiger, and David M. Chickering. Learning B ayesian networks: The combination of knowledge and statistical data. Machine Learning, 20 0 (3): 0 197--243, 1995. doi:10.1023/A:1022623210503

work page doi:10.1023/a:1022623210503 1995
[15]

Nonlinear causal discovery with additive noise models

Patrik O Hoyer, Dominik Janzing, Joris M Mooij, Jonas Peters, and Bernhard Sch \"o lkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems (NeurIPS), 2008

2008
[16]

Pairwise likelihood ratios for estimation of non- G aussian structural equation models

Aapo Hyv \"a rinen and Stephen M Smith. Pairwise likelihood ratios for estimation of non- G aussian structural equation models. Journal of Machine Learning Research, 14: 0 111--152, 2013

2013
[17]

o lkopf, Peter B \

Alexander Immer, Christoph Schultheiss, Julia E. Vogt, Bernhard Sch \"o lkopf, Peter B \"u hlmann, and Alexander Marx. On the identifiability and estimation of causal location-scale noise models. In Proceedings of the 40th International Conference on Machine Learning (ICML), volume 202 of PMLR, pages 14316--14332, 2023

2023
[18]

Causal inference using the algorithmic markov condition

Dominik Janzing and Bernhard Sch \"o lkopf. Causal inference using the algorithmic markov condition. IEEE Transactions on Information Theory, 56 0 (10): 0 5168--5194, 2010. doi:10.1109/TIT.2010.2060095. Information-theoretic / MDL-style basis for compression-based causal direction

work page doi:10.1109/tit.2010.2060095 2010
[19]

Information-geometric approach to inferring causal directions

Dominik Janzing, Joris Mooij, Kun Zhang, Jan Lemeire, Jakob Zscheischler, Povilas Daniu s is, Bastian Steudel, and Bernhard Sch \"o lkopf. Information-geometric approach to inferring causal directions. Artificial Intelligence, 182--183: 0 1--31, 2012

2012
[20]

Tearing apart NOTEARS : Controlling the graph prediction via variance manipulation

Marcus Kaiser and Maksym Sipos. Tearing apart NOTEARS : Controlling the graph prediction via variance manipulation. arXiv preprint arXiv:2206.07195, 2022

work page arXiv 2022
[21]

Estimating high-dimensional directed acyclic graphs with the pc-algorithm

Markus Kalisch and Peter B \"u hlmann. Estimating high-dimensional directed acyclic graphs with the pc-algorithm. Journal of Machine Learning Research, 8: 0 613--636, 2007

2007
[22]

Local computations with probabilities on graphical structures and their application to expert systems

Steffen L Lauritzen and David J Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society: Series B (Methodological), 50 0 (2): 0 157--194, 1988. Origin of the ``asia'' Bayesian-network benchmark

1988
[23]

DiBS : Differentiable B ayesian structure learning

Lars Lorch, Jonas Rothfuss, Bernhard Sch \"o lkopf, and Andreas Krause. DiBS : Differentiable B ayesian structure learning. In Advances in Neural Information Processing Systems 34 (NeurIPS), 2021

2021
[24]

Amortized inference for causal structure learning

Lars Lorch, Scott Sussex, Jonas Rothfuss, Andreas Krause, and Bernhard Sch \"o lkopf. Amortized inference for causal structure learning. In Advances in Neural Information Processing Systems 35 (NeurIPS), 2022

2022
[25]

Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Sch \"o lkopf

Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Sch \"o lkopf. Distinguishing cause from effect using observational data: methods and benchmarks. Journal of Machine Learning Research, 17 0 (32): 0 1--102, 2016

2016
[26]

Causality: Models, Reasoning, and Inference

Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition, 2009

2009
[27]

Elements of Causal Inference: Foundations and Learning Algorithms

Jonas Peters, Dominik Janzing, and Bernhard Sch \"o lkopf. Elements of Causal Inference: Foundations and Learning Algorithms. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, 2017

2017
[28]

Beware of the simulated dag! causal discovery benchmarks may be easy to game

Alexander Reisach, Christof Seiler, and Sebastian Weichwald. Beware of the simulated dag! causal discovery benchmarks may be easy to game. In Advances in Neural Information Processing Systems (NeurIPS), 2021

2021
[29]

a us Kleindessner, Chris Russell, Dominik Janzing, Bernhard Sch \

Paul Rolland, Volkan Cevher, Matth \"a us Kleindessner, Chris Russell, Dominik Janzing, Bernhard Sch \"o lkopf, and Francesco Locatello. Score matching enables causal discovery of nonlinear additive noise models. In Proceedings of the 39th International Conference on Machine Learning (ICML), volume 162 of PMLR, pages 18741--18753, 2022. Leaf-node identifi...

2022
[30]

Argumentation for interactive causal discovery

Fabrizio Russo and Francesca Toni. Argumentation for interactive causal discovery. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), pages 7374--7382, 2023. doi:10.24963/ijcai.2023/820

work page doi:10.24963/ijcai.2023/820 2023
[31]

Causal protein-signaling networks derived from multiparameter single-cell data

Karen Sachs, Omar Perez, Dana Pe'er, Douglas A Lauffenburger, and Garry P Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308 0 (5721): 0 523--529, 2005

2005
[32]

Sch \"o lkopf, F

Bernhard Sch \"o lkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 109 0 (5): 0 612--634, 2021. doi:10.1109/JPROC.2021.3058954

work page doi:10.1109/jproc.2021.3058954 2021
[33]

Dimakis, and Sriram Vishwanath

Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, and Sriram Vishwanath. Learning causal graphs with small interventions. In Advances in Neural Information Processing Systems (NeurIPS), 2015

2015
[34]

A linear non-gaussian acyclic model for causal discovery

Shohei Shimizu, Patrik O Hoyer, Aapo Hyv \"a rinen, and Antti Kerminen. A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7: 0 2003--2030, 2006

2003
[35]

Directlingam: a direct method for learning a linear non- G aussian structural equation model

Shohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Aapo Hyv \"a rinen, Yoshinobu Kawahara, Takashi Washio, Patrik O Hoyer, and Kenneth Bollen. Directlingam: a direct method for learning a linear non- G aussian structural equation model. Journal of Machine Learning Research, 12: 0 1225--1248, 2011

2011
[36]

B ayesian analysis in expert systems

David J Spiegelhalter, A Philip Dawid, Steffen L Lauritzen, and Robert G Cowell. B ayesian analysis in expert systems. Statistical Science, 8 0 (3): 0 219--247, 1993. Origin of the ``child'' Bayesian-network benchmark

1993
[37]

Causation, Prediction, and Search

Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2nd edition, 2000

2000
[38]

Strobl and Thomas A

Eric V. Strobl and Thomas A. Lasko. Identifying patient-specific root causes with the heteroscedastic noise model. Journal of Computational Science, 72: 0 102099, 2023. doi:10.1016/j.jocs.2023.102099. Establishes bivariate identifiability of the location-scale noise model; preprint arXiv:2205.13085 (2022)

work page doi:10.1016/j.jocs.2023.102099 2023
[39]

Interventions, where and how? E xperimental design for causal models at scale

Panagiotis Tigas, Yashas Annadani, Andrew Jesson, Bernhard Sch \"o lkopf, Yarin Gal, and Stefan Bauer. Interventions, where and how? E xperimental design for causal models at scale. In Advances in Neural Information Processing Systems 35 (NeurIPS), 2022

2022
[40]

On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias

Jiji Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 172 0 (16-17): 0 1873--1896, 2008

2008
[41]

On the identifiability of the post-nonlinear causal model

Kun Zhang and Aapo Hyv \"a rinen. On the identifiability of the post-nonlinear causal model. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI), pages 647--655, 2009

2009
[42]

Dags with no tears: Continuous optimization for structure learning

Xun Zheng, Bryon Aragam, Pradeep Ravikumar, and Eric P Xing. Dags with no tears: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems (NeurIPS), 2018

2018

[1] [1]

pgmpy: Probabilistic graphical models using python

Ankur Ankan and Abinash Panda. pgmpy: Probabilistic graphical models using python. In Proceedings of the 14th Python in Science Conference (SciPy), pages 6--11, 2015. doi:10.25080/Majora-7b98e3ed-001

work page doi:10.25080/majora-7b98e3ed-001 2015

[2] [2]

Variational causal networks: Approximate B ayesian inference over causal structures

Yashas Annadani, Jonas Rothfuss, Alexandre Lacoste, Nino Scherrer, Anirudh Goyal, Yoshua Bengio, and Stefan Bauer. Variational causal networks: Approximate B ayesian inference over causal structures. arXiv preprint arXiv:2106.07635, 2021

work page arXiv 2021

[3] [3]

The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks

Ingo A Beinlich, H Jaap Suermondt, R Martin Chavez, and Gregory F Cooper. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. Proceedings of the Second European Conference on Artificial Intelligence in Medicine, pages 247--256, 1989. Origin of the ``alarm'' Bayesian-network benchmark

1989

[4] [4]

Dagma: Learning dags via m-matrices and a log-determinant acyclicity characterization

Kevin Bello, Bryon Aragam, and Pradeep Ravikumar. Dagma: Learning dags via m-matrices and a log-determinant acyclicity characterization. In Advances in Neural Information Processing Systems (NeurIPS), 2022

2022

[5] [5]

Controlling the false discovery rate: a practical and powerful approach to multiple testing

Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57 0 (1): 0 289--300, 1995

1995

[6] [6]

Optimal structure identification with greedy search

David Maxwell Chickering. Optimal structure identification with greedy search. Journal of Machine Learning Research, 3: 0 507--554, 2002

2002

[7] [7]

Learning high-dimensional directed acyclic graphs with latent and selection variables

Diego Colombo, Marloes H Maathuis, Markus Kalisch, and Thomas S Richardson. Learning high-dimensional directed acyclic graphs with latent and selection variables. The Annals of Statistics, 40 0 (1): 0 294--321, 2012

2012

[8] [8]

BCD nets: Scalable variational approaches for B ayesian causal discovery

Chris Cundy, Aditya Grover, and Stefano Ermon. BCD nets: Scalable variational approaches for B ayesian causal discovery. In Advances in Neural Information Processing Systems 34 (NeurIPS), 2021

2021

[9] [9]

On the Number of Experiments Sufficient and in the Worst Case Necessary to Identify All Causal Relations Among N Variables

Frederick Eberhardt, Clark Glymour, and Richard Scheines. On the number of experiments sufficient and in the worst case necessary to identify all causal relations among N variables. arXiv preprint arXiv:1207.1389, 2005

work page internal anchor Pith review Pith/arXiv arXiv 2005

[10] [10]

Review of causal discovery methods based on graphical models

Clark Glymour, Kun Zhang, and Peter Spirtes. Review of causal discovery methods based on graphical models. Frontiers in Genetics, 10: 0 524, 2019. doi:10.3389/fgene.2019.00524

work page doi:10.3389/fgene.2019.00524 2019

[11] [11]

A kernel statistical test of independence

Arthur Gretton, Kenji Fukumizu, Choon Hui Teo, Le Song, Bernhard Sch \"o lkopf, and Alex Smola. A kernel statistical test of independence. In Advances in Neural Information Processing Systems (NeurIPS), 2007

2007

[12] [12]

Two optimal strategies for active learning of causal models from interventional data

Alain Hauser and Peter B \"u hlmann. Two optimal strategies for active learning of causal models from interventional data. International Journal of Approximate Reasoning, 55 0 (4): 0 926--939, 2014. doi:10.1016/j.ijar.2013.11.007

work page doi:10.1016/j.ijar.2013.11.007 2014

[13] [13]

Active learning of causal networks with intervention experiments and optimal designs

Yang-Bo He and Zhi Geng. Active learning of causal networks with intervention experiments and optimal designs. Journal of Machine Learning Research, 9: 0 2523--2547, 2008

2008

[14] [14]

Chickering

David Heckerman, Dan Geiger, and David M. Chickering. Learning B ayesian networks: The combination of knowledge and statistical data. Machine Learning, 20 0 (3): 0 197--243, 1995. doi:10.1023/A:1022623210503

work page doi:10.1023/a:1022623210503 1995

[15] [15]

Nonlinear causal discovery with additive noise models

Patrik O Hoyer, Dominik Janzing, Joris M Mooij, Jonas Peters, and Bernhard Sch \"o lkopf. Nonlinear causal discovery with additive noise models. In Advances in Neural Information Processing Systems (NeurIPS), 2008

2008

[16] [16]

Pairwise likelihood ratios for estimation of non- G aussian structural equation models

Aapo Hyv \"a rinen and Stephen M Smith. Pairwise likelihood ratios for estimation of non- G aussian structural equation models. Journal of Machine Learning Research, 14: 0 111--152, 2013

2013

[17] [17]

o lkopf, Peter B \

Alexander Immer, Christoph Schultheiss, Julia E. Vogt, Bernhard Sch \"o lkopf, Peter B \"u hlmann, and Alexander Marx. On the identifiability and estimation of causal location-scale noise models. In Proceedings of the 40th International Conference on Machine Learning (ICML), volume 202 of PMLR, pages 14316--14332, 2023

2023

[18] [18]

Causal inference using the algorithmic markov condition

Dominik Janzing and Bernhard Sch \"o lkopf. Causal inference using the algorithmic markov condition. IEEE Transactions on Information Theory, 56 0 (10): 0 5168--5194, 2010. doi:10.1109/TIT.2010.2060095. Information-theoretic / MDL-style basis for compression-based causal direction

work page doi:10.1109/tit.2010.2060095 2010

[19] [19]

Information-geometric approach to inferring causal directions

Dominik Janzing, Joris Mooij, Kun Zhang, Jan Lemeire, Jakob Zscheischler, Povilas Daniu s is, Bastian Steudel, and Bernhard Sch \"o lkopf. Information-geometric approach to inferring causal directions. Artificial Intelligence, 182--183: 0 1--31, 2012

2012

[20] [20]

Tearing apart NOTEARS : Controlling the graph prediction via variance manipulation

Marcus Kaiser and Maksym Sipos. Tearing apart NOTEARS : Controlling the graph prediction via variance manipulation. arXiv preprint arXiv:2206.07195, 2022

work page arXiv 2022

[21] [21]

Estimating high-dimensional directed acyclic graphs with the pc-algorithm

Markus Kalisch and Peter B \"u hlmann. Estimating high-dimensional directed acyclic graphs with the pc-algorithm. Journal of Machine Learning Research, 8: 0 613--636, 2007

2007

[22] [22]

Local computations with probabilities on graphical structures and their application to expert systems

Steffen L Lauritzen and David J Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. Journal of the Royal Statistical Society: Series B (Methodological), 50 0 (2): 0 157--194, 1988. Origin of the ``asia'' Bayesian-network benchmark

1988

[23] [23]

DiBS : Differentiable B ayesian structure learning

Lars Lorch, Jonas Rothfuss, Bernhard Sch \"o lkopf, and Andreas Krause. DiBS : Differentiable B ayesian structure learning. In Advances in Neural Information Processing Systems 34 (NeurIPS), 2021

2021

[24] [24]

Amortized inference for causal structure learning

Lars Lorch, Scott Sussex, Jonas Rothfuss, Andreas Krause, and Bernhard Sch \"o lkopf. Amortized inference for causal structure learning. In Advances in Neural Information Processing Systems 35 (NeurIPS), 2022

2022

[25] [25]

Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Sch \"o lkopf

Joris M. Mooij, Jonas Peters, Dominik Janzing, Jakob Zscheischler, and Bernhard Sch \"o lkopf. Distinguishing cause from effect using observational data: methods and benchmarks. Journal of Machine Learning Research, 17 0 (32): 0 1--102, 2016

2016

[26] [26]

Causality: Models, Reasoning, and Inference

Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition, 2009

2009

[27] [27]

Elements of Causal Inference: Foundations and Learning Algorithms

Jonas Peters, Dominik Janzing, and Bernhard Sch \"o lkopf. Elements of Causal Inference: Foundations and Learning Algorithms. Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, 2017

2017

[28] [28]

Beware of the simulated dag! causal discovery benchmarks may be easy to game

Alexander Reisach, Christof Seiler, and Sebastian Weichwald. Beware of the simulated dag! causal discovery benchmarks may be easy to game. In Advances in Neural Information Processing Systems (NeurIPS), 2021

2021

[29] [29]

a us Kleindessner, Chris Russell, Dominik Janzing, Bernhard Sch \

Paul Rolland, Volkan Cevher, Matth \"a us Kleindessner, Chris Russell, Dominik Janzing, Bernhard Sch \"o lkopf, and Francesco Locatello. Score matching enables causal discovery of nonlinear additive noise models. In Proceedings of the 39th International Conference on Machine Learning (ICML), volume 162 of PMLR, pages 18741--18753, 2022. Leaf-node identifi...

2022

[30] [30]

Argumentation for interactive causal discovery

Fabrizio Russo and Francesca Toni. Argumentation for interactive causal discovery. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI), pages 7374--7382, 2023. doi:10.24963/ijcai.2023/820

work page doi:10.24963/ijcai.2023/820 2023

[31] [31]

Causal protein-signaling networks derived from multiparameter single-cell data

Karen Sachs, Omar Perez, Dana Pe'er, Douglas A Lauffenburger, and Garry P Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308 0 (5721): 0 523--529, 2005

2005

[32] [32]

Sch \"o lkopf, F

Bernhard Sch \"o lkopf, Francesco Locatello, Stefan Bauer, Nan Rosemary Ke, Nal Kalchbrenner, Anirudh Goyal, and Yoshua Bengio. Toward causal representation learning. Proceedings of the IEEE, 109 0 (5): 0 612--634, 2021. doi:10.1109/JPROC.2021.3058954

work page doi:10.1109/jproc.2021.3058954 2021

[33] [33]

Dimakis, and Sriram Vishwanath

Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, and Sriram Vishwanath. Learning causal graphs with small interventions. In Advances in Neural Information Processing Systems (NeurIPS), 2015

2015

[34] [34]

A linear non-gaussian acyclic model for causal discovery

Shohei Shimizu, Patrik O Hoyer, Aapo Hyv \"a rinen, and Antti Kerminen. A linear non-gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7: 0 2003--2030, 2006

2003

[35] [35]

Directlingam: a direct method for learning a linear non- G aussian structural equation model

Shohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Aapo Hyv \"a rinen, Yoshinobu Kawahara, Takashi Washio, Patrik O Hoyer, and Kenneth Bollen. Directlingam: a direct method for learning a linear non- G aussian structural equation model. Journal of Machine Learning Research, 12: 0 1225--1248, 2011

2011

[36] [36]

B ayesian analysis in expert systems

David J Spiegelhalter, A Philip Dawid, Steffen L Lauritzen, and Robert G Cowell. B ayesian analysis in expert systems. Statistical Science, 8 0 (3): 0 219--247, 1993. Origin of the ``child'' Bayesian-network benchmark

1993

[37] [37]

Causation, Prediction, and Search

Peter Spirtes, Clark Glymour, and Richard Scheines. Causation, Prediction, and Search. MIT Press, 2nd edition, 2000

2000

[38] [38]

Strobl and Thomas A

Eric V. Strobl and Thomas A. Lasko. Identifying patient-specific root causes with the heteroscedastic noise model. Journal of Computational Science, 72: 0 102099, 2023. doi:10.1016/j.jocs.2023.102099. Establishes bivariate identifiability of the location-scale noise model; preprint arXiv:2205.13085 (2022)

work page doi:10.1016/j.jocs.2023.102099 2023

[39] [39]

Interventions, where and how? E xperimental design for causal models at scale

Panagiotis Tigas, Yashas Annadani, Andrew Jesson, Bernhard Sch \"o lkopf, Yarin Gal, and Stefan Bauer. Interventions, where and how? E xperimental design for causal models at scale. In Advances in Neural Information Processing Systems 35 (NeurIPS), 2022

2022

[40] [40]

On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias

Jiji Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 172 0 (16-17): 0 1873--1896, 2008

2008

[41] [41]

On the identifiability of the post-nonlinear causal model

Kun Zhang and Aapo Hyv \"a rinen. On the identifiability of the post-nonlinear causal model. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI), pages 647--655, 2009

2009

[42] [42]

Dags with no tears: Continuous optimization for structure learning

Xun Zheng, Bryon Aragam, Pradeep Ravikumar, and Eric P Xing. Dags with no tears: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems (NeurIPS), 2018

2018