
arxiv: 2606.23608 · v1 · pith:ZA6T4S2Hnew · submitted 2026-06-22 · 💻 cs.AI · cs.LG · cs.SE · stat.AP

Causal Discovery in the Era of Agents

Pith reviewed 2026-06-26 08:19 UTC · model grok-4.3

classification 💻 cs.AI · cs.LG · cs.SE · stat.AP
keywords causal discovery · large language models · agents · workflow assistance · causal-learn · graph structure · assumption explanation

The pith

Agents should handle data inspection and assumption explanation in causal discovery but must not generate edges, directions, or conclusions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contends that recent efforts to let large language models propose causal graphs or supply priors mix textual associations with data evidence and risk introducing artifacts. It advances the principle that agents assist only supporting steps while all causal claims stay anchored in data, explicit assumptions, formal algorithms, and expert decisions. This separation is realized in the causal-learn+ platform, which coordinates preprocessing, method choice, expert input, discovery runs, and interpretation around the causal-learn ecosystem. A case study with Big Five personality data shows an agent-assisted workflow that avoids turning model unreliability into causal evidence.

Core claim

Agents should inspect data, retrieve context, explain method assumptions and clarify graph outputs, but they should not supply edges, orientations, priors, constraints or causal conclusions. Causal claims remain grounded in data, explicit assumptions, formal algorithms, diagnostics and user or domain-expert decisions, as instantiated in the causal-learn+ platform that coordinates analysis around the causal-learn ecosystem without allowing language-model outputs to become causal evidence.

What carries the argument

The principle that agents assist the workflow while causal claims remain grounded exclusively in data, assumptions, algorithms and expert decisions, implemented as causal-learn+.
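
One way to picture this separation is a provenance tag on every item that could shape a graph. The sketch below is illustrative only and is not taken from the paper or from causal-learn+: the names `Source`, `Constraint`, `AgentConstraintError` and `discovery_constraints` are hypothetical. It assumes that background knowledge is represented as required or forbidden edges, and it rejects any item whose source is an agent before the formal discovery step sees it.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Where a piece of input came from (hypothetical taxonomy)."""
    DATA = "data"
    ALGORITHM = "algorithm"
    EXPERT = "expert"
    AGENT = "agent"


@dataclass(frozen=True)
class Constraint:
    """A hypothetical background-knowledge item: a required or forbidden edge."""
    kind: str      # "required" or "forbidden"
    cause: str
    effect: str
    source: Source


class AgentConstraintError(ValueError):
    """Raised when an agent-sourced constraint reaches the discovery step."""


def discovery_constraints(constraints):
    """Return the constraints admissible as input to formal discovery.

    Agent-sourced items are rejected loudly instead of being silently
    dropped, so a leak is visible rather than hidden.
    """
    leaked = [c for c in constraints if c.source is Source.AGENT]
    if leaked:
        raise AgentConstraintError(
            f"{len(leaked)} agent-sourced constraint(s) rejected"
        )
    return list(constraints)
```

Raising an error rather than filtering quietly is a design choice in this sketch: it makes any attempted leak show up in logs, which matches the auditing concern raised elsewhere on this page.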

If this is right

  • Causal discovery pipelines can incorporate language models for context retrieval and output clarification without the models determining structure.
  • Expert knowledge enters only through explicit user or domain-expert input rather than model-proposed constraints.
  • Method selection and diagnostics stay under algorithmic control even when agents suggest candidates.
  • Interpretation of results remains traceable to data and formal assumptions rather than textual associations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same separation principle could apply to other scientific workflows where generative models risk substituting associations for measurements.
  • Platforms built this way may make it easier to audit exactly which steps relied on data versus assistance.
  • If the separation proves hard to enforce, hybrid human-AI review checkpoints would become necessary at every handoff.

Load-bearing premise

That agent roles can be kept strictly separate from causal inference steps so language-model outputs never leak into final edges, priors or conclusions.

What would settle it

An empirical case where an agent limited to inspection and explanation still produces a causal graph whose structure matches a known language-model hallucination rather than the data diagnostics.

Original abstract

Recent attempts to combine large language models (LLMs) with causal discovery ask models to infer pairwise directions, propose graph structures, or inject language-model outputs as priors and constraints. These approaches promise faster analysis, but they also obscure whether causal evidence is supported by data and assumptions or by textual associations, prompt artifacts and hallucinated mechanisms. We argue for a different role for agents in causal discovery. Agents should inspect data, retrieve context, explain method assumptions and clarify graph outputs, but they should not supply edges, orientations, priors, constraints or causal conclusions. We propose the principle that agents assist the workflow, while causal claims remain grounded in data, explicit assumptions, formal algorithms, diagnostics and user or domain-expert decisions. We instantiate this principle in causal-learn+, an online platform that coordinates data analysis, preprocessing, method recommendation, expert-knowledge incorporation, formal discovery and interpretation around the algorithmic ecosystem of causal-learn. A case study on Big Five personality data illustrates an agent-assisted causal discovery pipeline that does not turn language-model unreliability into causal evidence. The platform is available at causallearn.com.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper argues that LLMs and agents should not directly infer causal edges, orientations, priors, constraints or conclusions in discovery tasks, as this risks conflating textual associations with data-supported evidence. Instead, agents should only assist by inspecting data, retrieving context, explaining method assumptions and clarifying outputs. The authors propose a principle that causal claims must remain grounded exclusively in data, explicit assumptions, formal algorithms, diagnostics and domain-expert decisions. They instantiate the principle in the causal-learn+ online platform, which coordinates preprocessing, method recommendation, expert-knowledge incorporation, formal discovery via the causal-learn library and interpretation. A descriptive case study on Big Five personality data is presented to illustrate an agent-assisted workflow that avoids turning LLM outputs into causal evidence.

Significance. If the proposed separation of roles can be reliably enforced, the work could help preserve the epistemic grounding of causal discovery methods by excluding unreliable LLM-generated content from the inference pipeline. The availability of the causal-learn+ platform and the explicit statement of the principle provide a concrete starting point for discussion in the causal discovery community.

major comments (2)
  1. [causal-learn+ description] § on causal-learn+ instantiation: the steps of 'method recommendation' and 'expert-knowledge incorporation' necessarily involve agent outputs that select algorithms or shape constraints; no mechanism is described that isolates these outputs from the formal discovery pipeline, so prompt artifacts could still determine which method runs or which domain constraints are applied, directly contradicting the central claim that agents supply neither priors nor constraints.
  2. [case study] Case study section: the illustration on Big Five personality data is purely descriptive and provides no quantitative comparison (e.g., edge recovery rates, false-positive rates, or stability metrics) against either direct LLM-based discovery or a non-agent baseline; without such controls the case study cannot demonstrate that the separation principle improves causal accuracy.
minor comments (2)
  1. [abstract/introduction] The abstract and introduction use 'agents' and 'LLMs' interchangeably in places; a brief clarification of the distinction would improve precision.
  2. [platform description] No pseudocode or explicit workflow diagram is provided for how the platform routes agent outputs versus formal algorithm outputs; adding one would clarify the separation.
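
The metrics named in major comment 2 can be made concrete with a short sketch. The code below is an illustration, not something the paper or the referee provides: `edge_metrics` and `edge_stability` are hypothetical helper names, graphs are assumed to be given as collections of directed `(cause, effect)` pairs, and a reference graph is assumed to be available, which is typically true only for simulated or benchmark data and not for the Big Five case study.

```python
def edge_metrics(true_edges, est_edges, nodes):
    """Directed-edge recall, precision and false-positive rate.

    An edge counts as recovered only if both adjacency and direction match.
    """
    true_edges, est_edges = set(true_edges), set(est_edges)
    # All ordered pairs of distinct nodes are candidate directed edges.
    possible = {(a, b) for a in nodes for b in nodes if a != b}
    tp = len(true_edges & est_edges)
    fp = len(est_edges - true_edges)
    fn = len(true_edges - est_edges)
    tn = len(possible) - tp - fp - fn
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


def edge_stability(graphs):
    """Fraction of runs (for example bootstrap resamples) containing each edge."""
    counts = {}
    for graph in graphs:
        for edge in set(graph):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: c / len(graphs) for edge, c in counts.items()}


if __name__ == "__main__":
    nodes = ["A", "B", "C"]
    truth = [("A", "B"), ("B", "C")]
    estimate = [("A", "B"), ("C", "B")]  # one correct edge, one reversed
    print(edge_metrics(truth, estimate, nodes))
    print(edge_stability([[("A", "B")], [("A", "B"), ("B", "C")]]))
```

Applying the same functions to an agent-assisted run, a direct LLM-proposed graph and a non-agent baseline would give the kind of controlled comparison the comment asks for.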

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the boundaries of our proposed principle. We respond to each major comment below.

  1. Referee: § on causal-learn+ instantiation: the steps of 'method recommendation' and 'expert-knowledge incorporation' necessarily involve agent outputs that select algorithms or shape constraints; no mechanism is described that isolates these outputs from the formal discovery pipeline, so prompt artifacts could still determine which method runs or which domain constraints are applied, directly contradicting the central claim that agents supply neither priors nor constraints.

    Authors: We agree that the current description of causal-learn+ does not sufficiently detail how agent-generated suggestions are isolated from the formal pipeline. The manuscript states that agents assist the workflow while causal claims remain grounded exclusively in data, explicit assumptions, formal algorithms, diagnostics and domain-expert decisions. To make this separation explicit, we will revise the causal-learn+ section to describe a mandatory user-approval gate: agent recommendations for methods or constraints are presented as non-binding suggestions, logged separately, and incorporated only after explicit user or expert confirmation. The revision will also state that the formal discovery step (via causal-learn) operates solely on the approved inputs, without further agent intervention. revision: yes

  2. Referee: Case study section: the illustration on Big Five personality data is purely descriptive and provides no quantitative comparison (e.g., edge recovery rates, false-positive rates, or stability metrics) against either direct LLM-based discovery or a non-agent baseline; without such controls the case study cannot demonstrate that the separation principle improves causal accuracy.

    Authors: The case study is presented strictly as an illustration of the agent-assisted workflow on real data, showing how the platform coordinates preprocessing, method selection, expert input, algorithmic discovery and interpretation without converting LLM outputs into causal evidence. The manuscript does not claim or attempt to demonstrate that the separation principle yields higher causal accuracy than direct LLM-based methods; such a claim would require a controlled benchmark study, which lies outside the scope of the current work focused on the principle and platform design. We therefore do not plan to add quantitative comparisons to the case study. revision: no
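
The user-approval gate described in the first response can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: `Suggestion`, `ApprovalGate` and their methods are hypothetical names, and the example strings are placeholders. The sketch keeps every agent proposal in a separate log, and only items confirmed by a named reviewer become inputs to discovery.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Suggestion:
    """A non-binding agent suggestion (hypothetical structure)."""
    kind: str                       # "method" or "constraint"
    content: str
    approved_by: Optional[str] = None


@dataclass
class ApprovalGate:
    # Everything the agent proposed, with a timestamp, approved or not.
    suggestion_log: list = field(default_factory=list)
    # The only items the formal discovery step is allowed to read.
    approved_inputs: list = field(default_factory=list)

    def propose(self, suggestion):
        """Record an agent suggestion without making it an input."""
        self.suggestion_log.append((datetime.now(timezone.utc), suggestion))
        return suggestion

    def approve(self, suggestion, reviewer):
        """Promote a logged suggestion after explicit confirmation."""
        if not reviewer:
            raise ValueError("approval requires a named user or domain expert")
        if not any(s is suggestion for _, s in self.suggestion_log):
            raise ValueError("only logged suggestions can be approved")
        suggestion.approved_by = reviewer
        self.approved_inputs.append(suggestion)

    def inputs_for_discovery(self):
        """Return approved items only; unapproved suggestions never appear."""
        return [(s.kind, s.content) for s in self.approved_inputs]


if __name__ == "__main__":
    gate = ApprovalGate()
    method = gate.propose(Suggestion("method", "PC with Fisher-z test"))
    gate.propose(Suggestion("constraint", "forbid X -> Y"))  # never approved
    gate.approve(method, reviewer="analyst")
    print(gate.inputs_for_discovery())
```

Keeping the log and the approved inputs as two separate lists is what would let an auditor later compare what the agent proposed with what actually reached the algorithm.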

Circularity Check

0 steps flagged

No circularity: methodological stance without derivational reduction


The paper advances a normative principle for agent roles in causal discovery and describes its instantiation in the causal-learn+ platform. No equations, fitted parameters, predictions, or formal derivations appear in the provided text. The central claim is an argument for workflow separation grounded in data and explicit algorithms rather than any self-referential reduction or self-citation chain. No load-bearing step reduces by construction to its own inputs, satisfying the criteria for a score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that agent assistance tasks can be cleanly isolated from causal inference without introducing bias; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: LLM agents can perform inspection, retrieval and explanation tasks without injecting textual associations or hallucinations into the causal evidence pipeline
    Invoked when stating that agents assist while causal claims remain grounded in data and algorithms.

pith-pipeline@v0.9.1-grok · 5733 in / 1070 out tokens · 30600 ms · 2026-06-26T08:19:56.371532+00:00 · methodology

