Causal Discovery in the Era of Agents

Haoyue Dai; Kun Zhang; Mantej Gill; Peter Spirtes; Vishal Verma; Yujia Zheng

arXiv: 2606.23608 · v1 · pith:ZA6T4S2Hnew · submitted 2026-06-22 · cs.AI · cs.LG · cs.SE · stat.AP

Yujia Zheng, Vishal Verma, Mantej Gill, Haoyue Dai, Peter Spirtes, Kun Zhang

Pith reviewed 2026-06-26 08:19 UTC · model grok-4.3

classification cs.AI · cs.LG · cs.SE · stat.AP

keywords causal discovery · large language models · agents · workflow assistance · causal-learn · graph structure · assumption explanation

The pith

Agents should handle data inspection and assumption explanation in causal discovery but must not generate edges, directions, or conclusions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contends that recent efforts to let large language models propose causal graphs or supply priors mix textual associations with data evidence and risk introducing artifacts. It advances the principle that agents assist only supporting steps while all causal claims stay anchored in data, explicit assumptions, formal algorithms, and expert decisions. This separation is realized in the causal-learn+ platform, which coordinates preprocessing, method choice, expert input, discovery runs, and interpretation around the causal-learn ecosystem. A case study with Big Five personality data shows an agent-assisted workflow that avoids turning model unreliability into causal evidence.

Core claim

Agents should inspect data, retrieve context, explain method assumptions and clarify graph outputs, but they should not supply edges, orientations, priors, constraints or causal conclusions. Causal claims remain grounded in data, explicit assumptions, formal algorithms, diagnostics and user or domain-expert decisions, as instantiated in the causal-learn+ platform that coordinates analysis around the causal-learn ecosystem without allowing language-model outputs to become causal evidence.
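One way to picture the boundary stated above is as an allow-list on the kinds of output an agent may hand to the rest of the pipeline. The following Python sketch is illustrative only: the names `OutputKind`, `AgentOutput` and `accept_agent_output` are hypothetical and are not taken from the paper or from causal-learn+, whose internals the source does not describe.

```python
from dataclasses import dataclass
from enum import Enum


class OutputKind(Enum):
    # Supporting outputs that the core claim assigns to agents.
    DATA_INSPECTION = "data_inspection"
    CONTEXT_RETRIEVAL = "context_retrieval"
    ASSUMPTION_EXPLANATION = "assumption_explanation"
    GRAPH_CLARIFICATION = "graph_clarification"
    # Causal content that the core claim says agents must not supply.
    EDGE = "edge"
    ORIENTATION = "orientation"
    PRIOR = "prior"
    CONSTRAINT = "constraint"
    CAUSAL_CONCLUSION = "causal_conclusion"


# Hypothetical allow-list: only supporting kinds may come from an agent.
AGENT_ALLOWED = frozenset({
    OutputKind.DATA_INSPECTION,
    OutputKind.CONTEXT_RETRIEVAL,
    OutputKind.ASSUMPTION_EXPLANATION,
    OutputKind.GRAPH_CLARIFICATION,
})


@dataclass(frozen=True)
class AgentOutput:
    kind: OutputKind
    text: str


def accept_agent_output(output: AgentOutput) -> AgentOutput:
    """Return the output unchanged if its kind is allowed, else raise."""
    if output.kind not in AGENT_ALLOWED:
        raise PermissionError(
            f"agents may not supply '{output.kind.value}' content"
        )
    return output
```

A check like this only inspects the declared kind of an output. It cannot detect causal content phrased as free text inside an allowed kind, which is the leakage risk the load-bearing premise below depends on.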

What carries the argument

The principle that agents assist the workflow while causal claims remain grounded exclusively in data, assumptions, algorithms and expert decisions, implemented as causal-learn+.

If this is right

Causal discovery pipelines can incorporate language models for context retrieval and output clarification without the models determining structure.
Expert knowledge enters only through explicit user or domain-expert input rather than model-proposed constraints.
Method selection and diagnostics stay under algorithmic control even when agents suggest candidates.
Interpretation of results remains traceable to data and formal assumptions rather than textual associations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same separation principle could apply to other scientific workflows where generative models risk substituting associations for measurements.
Platforms built this way may make it easier to audit exactly which steps relied on data versus assistance.
If the separation proves hard to enforce, hybrid human-AI review checkpoints would become necessary at every handoff.

Load-bearing premise

That agent roles can be kept strictly separate from causal inference steps so language-model outputs never leak into final edges, priors or conclusions.

What would settle it

An empirical case where an agent limited to inspection and explanation still produces a causal graph whose structure matches a known language-model hallucination rather than the data diagnostics.

Original abstract

Recent attempts to combine large language models (LLMs) with causal discovery ask models to infer pairwise directions, propose graph structures, or inject language-model outputs as priors and constraints. These approaches promise faster analysis, but they also obscure whether causal evidence is supported by data and assumptions or by textual associations, prompt artifacts and hallucinated mechanisms. We argue for a different role for agents in causal discovery. Agents should inspect data, retrieve context, explain method assumptions and clarify graph outputs, but they should not supply edges, orientations, priors, constraints or causal conclusions. We propose the principle that agents assist the workflow, while causal claims remain grounded in data, explicit assumptions, formal algorithms, diagnostics and user or domain-expert decisions. We instantiate this principle in causal-learn+, an online platform that coordinates data analysis, preprocessing, method recommendation, expert-knowledge incorporation, formal discovery and interpretation around the algorithmic ecosystem of causal-learn. A case study on Big Five personality data illustrates an agent-assisted pipeline of causal discovery without turning language-model unreliability into causal evidence. The platform is available at causallearn.com.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper argues that agents should never supply causal content and presents a platform to support that split, but it offers only a descriptive case study with no quantitative checks.

The main takeaway is that agents and LLMs can inspect data, pull context, explain assumptions, and interpret outputs, but they should never provide edges, orientations, priors, constraints, or conclusions. The authors build causal-learn+ around the existing causal-learn library to keep the workflow split that way, and they walk through a Big Five personality data example.

What stands out is the explicit principle and the warning that current LLM uses risk turning textual associations into apparent causal evidence. The platform coordinates the non-causal steps while leaving formal discovery to the algorithms and domain experts. That framing is useful for anyone already working with causal-learn or similar toolkits.

The soft spots are straightforward. This is a position paper plus one illustrative case study; there are no controlled comparisons or accuracy metrics showing the separation improves results over direct LLM-assisted methods. The description of method recommendation and expert-knowledge incorporation leaves open how those steps stay isolated from model influence, so the enforceability concern from the stress test looks reasonable on the given details. No new derivation or measurement appears.

This is for researchers designing causal discovery tools that involve language models. A reader who wants a clear guideline on role separation will get something concrete to think about. It deserves peer review because the practical issue it flags is worth testing and discussing even if the current evidence stays descriptive.

Referee Report

2 major / 2 minor

Summary. The paper argues that LLMs and agents should not directly infer causal edges, orientations, priors, constraints or conclusions in discovery tasks, as this risks conflating textual associations with data-supported evidence. Instead, agents should only assist by inspecting data, retrieving context, explaining method assumptions and clarifying outputs. The authors propose a principle that causal claims must remain grounded exclusively in data, explicit assumptions, formal algorithms, diagnostics and domain-expert decisions. They instantiate the principle in the causal-learn+ online platform, which coordinates preprocessing, method recommendation, expert-knowledge incorporation, formal discovery via the causal-learn library and interpretation. A descriptive case study on Big Five personality data is presented to illustrate an agent-assisted workflow that avoids turning LLM outputs into causal evidence.

Significance. If the proposed separation of roles can be reliably enforced, the work could help preserve the epistemic grounding of causal discovery methods by excluding unreliable LLM-generated content from the inference pipeline. The availability of the causal-learn+ platform and the explicit statement of the principle provide a concrete starting point for discussion in the causal discovery community.

major comments (2)

[causal-learn+ description] § on causal-learn+ instantiation: the steps of 'method recommendation' and 'expert-knowledge incorporation' necessarily involve agent outputs that select algorithms or shape constraints; no mechanism is described that isolates these outputs from the formal discovery pipeline, so prompt artifacts could still determine which method runs or which domain constraints are applied, directly contradicting the central claim that agents supply neither priors nor constraints.
[case study] Case study section: the illustration on Big Five personality data is purely descriptive and provides no quantitative comparison (e.g., edge recovery rates, false-positive rates, or stability metrics) against either direct LLM-based discovery or a non-agent baseline; without such controls the case study cannot demonstrate that the separation principle improves causal accuracy.
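The metrics the referee names could be computed along the following lines when a reference graph is available, for example on synthetic or benchmark data. This is a minimal Python sketch, not the paper's evaluation: the function `edge_metrics`, the edge-tuple representation and the toy graphs in the usage note are assumptions made for illustration, and it handles only directed edges, not the partially oriented graphs that many discovery algorithms return.

```python
from itertools import permutations


def edge_metrics(nodes, true_edges, predicted_edges):
    """Compare two directed edge sets defined over the same nodes.

    Edges are (cause, effect) tuples. Returns precision, recall and
    false-positive rate, using None where a denominator is zero.
    """
    true_set = set(true_edges)
    pred_set = set(predicted_edges)
    # Every ordered pair of distinct nodes is a candidate directed edge.
    possible = set(permutations(nodes, 2))

    tp = len(pred_set & true_set)        # predicted and present
    fp = len(pred_set - true_set)        # predicted but absent
    fn = len(true_set - pred_set)        # present but not predicted
    negatives = len(possible - true_set) # candidate edges that are absent

    def ratio(num, den):
        return num / den if den else None

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, negatives),
    }
```

For a toy three-node example with true edges A→B and B→C and predicted edges A→B and C→B, this yields precision 0.5, recall 0.5 and a false-positive rate of 0.25, because a reversed edge counts once as a false positive and once as a miss. Real survey data such as the Big Five case study has no known ground-truth graph, so a comparison of this kind would need a separate benchmark.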

minor comments (2)

[abstract/introduction] The abstract and introduction use 'agents' and 'LLMs' interchangeably in places; a brief clarification of the distinction would improve precision.
[platform description] No pseudocode or explicit workflow diagram is provided for how the platform routes agent outputs versus formal algorithm outputs; adding one would clarify the separation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the boundaries of our proposed principle. We respond to each major comment below.

Referee: § on causal-learn+ instantiation: the steps of 'method recommendation' and 'expert-knowledge incorporation' necessarily involve agent outputs that select algorithms or shape constraints; no mechanism is described that isolates these outputs from the formal discovery pipeline, so prompt artifacts could still determine which method runs or which domain constraints are applied, directly contradicting the central claim that agents supply neither priors nor constraints.

Authors: We agree that the current description of causal-learn+ does not sufficiently detail how agent-generated suggestions are isolated from the formal pipeline. The manuscript states that agents assist the workflow while causal claims remain grounded exclusively in data, explicit assumptions, formal algorithms, diagnostics and domain-expert decisions; however, to make this separation explicit, we will revise the causal-learn+ section to describe a mandatory user-approval gate: agent recommendations for methods or constraints are presented as non-binding suggestions, logged separately, and only incorporated after explicit user or expert confirmation. This revision will also add that the formal discovery step (via causal-learn) operates solely on the approved inputs without further agent intervention.
revision: yes

Referee: Case study section: the illustration on Big Five personality data is purely descriptive and provides no quantitative comparison (e.g., edge recovery rates, false-positive rates, or stability metrics) against either direct LLM-based discovery or a non-agent baseline; without such controls the case study cannot demonstrate that the separation principle improves causal accuracy.

Authors: The case study is presented strictly as an illustration of the agent-assisted workflow on real data, showing how the platform coordinates preprocessing, method selection, expert input, algorithmic discovery and interpretation without converting LLM outputs into causal evidence. The manuscript does not claim or attempt to demonstrate that the separation principle yields higher causal accuracy than direct LLM-based methods; such a claim would require a controlled benchmark study, which lies outside the scope of the current work focused on the principle and platform design. We therefore do not plan to add quantitative comparisons to the case study.
revision: no
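
The user-approval gate described in the first response could look roughly like the sketch below. It is a hypothetical Python illustration, not the causal-learn+ implementation: `Suggestion`, `ApprovalGate` and their methods are invented names, and the only behavior modeled is the one the authors state, namely that suggestions are non-binding, logged separately, and passed to formal discovery only after explicit confirmation.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    kind: str              # e.g. "method" or "constraint" (assumed labels)
    content: str
    approved: bool = False


@dataclass
class ApprovalGate:
    # Every agent suggestion is recorded here, approved or not.
    suggestion_log: list = field(default_factory=list)
    # Only explicitly confirmed suggestions are copied here.
    approved_inputs: list = field(default_factory=list)

    def propose(self, kind, content):
        """Record an agent suggestion without passing it downstream."""
        suggestion = Suggestion(kind, content)
        self.suggestion_log.append(suggestion)
        return suggestion

    def confirm(self, suggestion, reviewer):
        """Admit a logged suggestion only when a named human confirms it."""
        if not reviewer:
            raise ValueError("explicit user or expert confirmation is required")
        if not any(s is suggestion for s in self.suggestion_log):
            raise ValueError("suggestion was never logged")
        suggestion.approved = True
        self.approved_inputs.append(
            (suggestion.kind, suggestion.content, reviewer)
        )

    def inputs_for_discovery(self):
        """Return the only items the formal discovery step may see."""
        return list(self.approved_inputs)
```

In this design the discovery step reads from `inputs_for_discovery()` and never from `suggestion_log`, so an unconfirmed suggestion cannot reach the algorithm. The sketch does not address the referee's further point that a confirmed suggestion still originated with the agent.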

Circularity Check

0 steps flagged

No circularity: methodological stance without derivational reduction

The paper advances a normative principle for agent roles in causal discovery and describes its instantiation in the causal-learn+ platform. No equations, fitted parameters, predictions, or formal derivations appear in the provided text. The central claim is an argument for workflow separation grounded in data and explicit algorithms rather than any self-referential reduction or self-citation chain. No load-bearing step reduces by construction to its own inputs, satisfying the criteria for a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that agent assistance tasks can be cleanly isolated from causal inference without introducing bias; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: LLM agents can perform inspection, retrieval and explanation tasks without injecting textual associations or hallucinations into the causal evidence pipeline.
Invoked when stating that agents assist while causal claims remain grounded in data and algorithms.



