arxiv: 2506.16234 · v2 · submitted 2025-06-19 · 💻 cs.LG

Sequential Causal Discovery with Noisy Language Model Priors

Pith reviewed 2026-05-19 08:34 UTC · model grok-4.3

classification 💻 cs.LG
keywords: causal discovery · language models · partial ancestral graphs · sequential optimization · noisy priors · batch data · hybrid framework · observational data

The pith

Shifting to partial ancestral graphs and sequential LM queries enables robust causal discovery from noisy priors and batch data

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that causal discovery remains feasible even when observational data arrives only in biased batches and expert knowledge is supplied by language models that hallucinate or contradict themselves. It does so by replacing strict directed acyclic graphs with partial ancestral graphs that keep track of remaining ambiguities, then using sequential optimization to decide which edges to ask the language model about next so that each new data batch can correct both sampling bias and model noise. A sympathetic reader would care because most applied settings lack complete data and flawless domain experts, making a method that tolerates both imperfections directly useful for recovering causal relations in practice. If the claim holds, the same noisy language-model output can be turned into a reliable prior rather than a source of error.

Core claim

The paper presents a hybrid framework that adaptively combines sequential batches of observational data with noisy language-model knowledge. The framework replaces directed acyclic graphs with partial ancestral graphs to represent causal ambiguities in a single coherent structure and introduces a sequential optimization procedure that selects the most informative edges for language-model queries, thereby grounding global noisy priors in local data while correcting for biases induced by sampling and by the model itself.

What carries the argument

The representation shift from directed acyclic graphs to partial ancestral graphs together with an adaptive sequential optimization scheme that chooses which edges to query from the language model on the basis of current data batches.
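
As a rough illustration of the querying idea, the sketch below keeps a probability distribution over possible edge types for each variable pair and picks the pair with the highest Shannon entropy as the next language-model query. The edge-type vocabulary, the entropy criterion, and every name in the code are assumptions made for illustration; the paper's actual selection score is not reproduced on this page.

```python
import math

# Hypothetical edge-type vocabulary for a PAG edge between an ordered pair (A, B).
EDGE_TYPES = ("A -> B", "A <- B", "A <-> B", "A o-> B", "A o-o B", "no edge")

def entropy(dist):
    """Shannon entropy (in nats) of a dict mapping edge type -> probability."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)

def most_uncertain_edge(edge_beliefs):
    """Return the variable pair whose edge-type distribution has the highest entropy."""
    return max(edge_beliefs, key=lambda pair: entropy(edge_beliefs[pair]))

# Toy beliefs (illustrative values only, not taken from the paper).
uniform = {t: 1.0 / len(EDGE_TYPES) for t in EDGE_TYPES}
beliefs = {
    ("smoking", "cancer"): {"A -> B": 0.9, "A o-> B": 0.1},
    ("age", "income"): dict(uniform),
    ("rain", "traffic"): {"A -> B": 0.6, "A o-o B": 0.4},
}
next_query = most_uncertain_edge(beliefs)  # the pair to ask the LM about next
```

In this toy setup the pair with a uniform belief is selected, since nothing is yet known about it. A real scheme would also update the beliefs after each data batch and each LM answer.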

If this is right

  • Structural accuracy exceeds that of prior hybrid methods across multiple datasets and language models.
  • The same procedure extends from structure learning to estimation of causal parameters.
  • Performance remains stable even when language-model outputs contain hallucinations or systematic biases.
  • The approach handles data that arrive in successive batches subject to sampling bias.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same sequential-query idea could be tested with other imperfect knowledge sources such as crowdsourced annotations.
  • Extending the method to truly streaming data, where each new batch arrives after the previous queries have been answered, would be a natural next step.
  • Examining how the choice of language model changes the rate at which biases are corrected could guide practical deployment.

Load-bearing premise

That partial ancestral graphs combined with sequential optimization can reliably ground global noisy language-model knowledge in local observational data while correctly accounting for both data-induced and language-model-induced biases.

What would settle it

Apply the method to a dataset with a fully known ground-truth causal structure, supply deliberately inconsistent or biased language-model responses, and check whether the recovered partial ancestral graph matches the true structure more closely than a data-only baseline.
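
The comparison described above could be scored with a simple edge-mismatch count, sketched here under stated assumptions. Graphs are stored as dictionaries keyed by variable pairs in a fixed canonical order, and the metric is a plain count of disagreeing pairs; it is not the "Modified Structural Hamming Distance" reported in the paper, whose definition is not given on this page. The three graphs are toy inputs, not results.

```python
def edge_mismatch_count(true_graph, learned_graph):
    """Count variable pairs whose edge mark differs between two graphs.

    Each graph maps a pair (X, Y), written in one canonical order, to a mark
    such as "->", "<->", "o->" or "o-o". A missing pair means "no edge".
    """
    pairs = set(true_graph) | set(learned_graph)
    return sum(true_graph.get(p) != learned_graph.get(p) for p in pairs)

# Toy graphs (hypothetical, for illustration only).
truth = {("A", "B"): "->", ("B", "C"): "->"}
data_only = {("A", "B"): "o-o", ("B", "C"): "o-o"}            # orientations unresolved
with_noisy_lm = {("A", "B"): "->", ("B", "C"): "->",
                 ("A", "C"): "->"}                             # one hallucinated edge

baseline_distance = edge_mismatch_count(truth, data_only)
hybrid_distance = edge_mismatch_count(truth, with_noisy_lm)
```

The check then amounts to asking whether `hybrid_distance` stays below `baseline_distance` as the supplied language-model responses are made more inconsistent or biased.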

Figures

Figures reproduced from arXiv: 2506.16234 by Arno Solin, Atanu R. Sinha, David Arbour, Harshita Chopra, Prakhar Verma, Sunav Choudhary.

Figure 1: PAG-LM streamlines how LMs compose the graph and allows for ambiguities to be indicated in the structure. (a) DAG is constructed by iterative prompting ( ) leading to ambiguities (e.g., ) requiring heuristics that cannot be represented. (b) BLANCE represents the causal structure as a PAG that implicitly allows ambiguities to be represented, providing a richer representation (e.g., ◦−◦). addressed by hybrid…
Figure 2: USER LEVEL DATA - I: Performance evolution across batches for Data-LM methods. Left: Modified Structural Hamming Distance (↓), Middle: Structural Intervention Distance (↓), and Right: F1-Score (↑). BLANCE consistently outperforms other approaches as data accumulation progresses. calls to refine the edge distribution HEi and expand the set of background knowledge Bi. In the edge refinement setting, each ex…
Figure 3: Structure learning ablation: The impact of two key components: selection score and dynamic background threshold. (Left, Middle) Modified SHD (↓) and F1-score (↑) on the USER LEVEL DATA - I dataset, comparing BLANCE's selection score against random selection. (Right) Modified SHD (↓) comparing BLANCE's dynamic threshold with a conventional fixed threshold. With latent confounders, MLE becomes ill-posed and int…
Figure 4: LM-predicted confounders
Figure 5: Parameter Estimation: Convergence of parameters and robustness to prior misspecification as more batches are processed. Parameter estimation: robustness and recovery. We demonstrate robustness of the proposed parameter estimation algorithm under misspecified or ill-informed priors. Based on domain knowledge and observational data, we estimate latent confounder alcohol_content to follow distribution N(1…
Original abstract

Causal discovery from observational data typically assumes access to complete data and availability of perfect domain experts. In practice, data often arrive in batches, are subject to sampling bias, and expert knowledge is scarce. Language Models (LMs) offer a surrogate for expert knowledge but suffer from hallucinations, inconsistencies, and bias. We present a hybrid framework that bridges these gaps by adaptively integrating sequential batch data with LM-derived noisy, expert knowledge while accounting for both data-induced and LM-induced biases. We propose a representation shift from Directed Acyclic Graph (DAG) to Partial Ancestral Graph (PAG), that accommodates ambiguities within a coherent framework, allowing grounding the global LM knowledge in local observational data. To guide LM interactions, we use a sequential optimization scheme that adaptively queries the most informative edges. Across varied datasets and LMs, we outperform prior work in structural accuracy and extend to parameter estimation, showing robustness to LM noise.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript presents a hybrid framework for causal discovery from sequential batch data that incorporates noisy priors from language models. It shifts representation from DAGs to Partial Ancestral Graphs (PAGs) to accommodate LM-induced ambiguities and hallucinations, employs a sequential optimization scheme to adaptively query informative edges, and explicitly models both data-induced and LM-induced biases. Experiments across multiple datasets and LMs report improved structural accuracy over prior methods, with extensions to parameter estimation and demonstrated robustness to LM noise.

Significance. If the central claims hold, the work addresses a practical gap in causal discovery where data arrives incrementally and expert knowledge is replaced by imperfect LM surrogates. The PAG-based grounding of global noisy knowledge in local observational data, combined with adaptive querying, offers a coherent hybrid approach. The empirical validation across varied datasets and LMs, including robustness checks, is a strength of the contribution.

major comments (2)
  1. [§3.2] §3.2, around Eq. (4): The procedure for mapping LM outputs to PAG edge constraints is described at a high level, but the exact mechanism for resolving LM inconsistencies (e.g., conflicting ancestral relations) and propagating them into the score function is not fully derived; this is load-bearing for the claim that the framework correctly accounts for LM-induced biases without circularity.
  2. [Table 3] Table 3, final column: The reported robustness to LM noise shows only modest degradation up to 30% noise, but the experimental protocol does not include a control where LM priors are replaced by random noise of matched strength; without this, it is difficult to confirm that the sequential optimization is actively grounding the priors rather than simply falling back to the observational data.
minor comments (3)
  1. [§4.1] The notation for the bias-correction term in the objective is introduced in §4.1 without a pointer to its derivation in the appendix; adding one would improve readability.
  2. [Figure 4] Figure 4 caption does not specify the number of independent runs or the exact baseline implementations used for the F1-score comparisons; this detail is needed to interpret the error bars.
  3. [§2.3] A few sentences in §2.3 repeat the motivation for using PAGs that was already stated in the introduction; condensing this would tighten the related-work discussion.
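
The control requested in major comment 2 could be built along the following lines. This is a minimal sketch that reads "matched strength" as "same variable pairs and same confidence values, but edge types drawn uniformly at random"; that reading, the prior format, and all names are assumptions, not the paper's protocol.

```python
import random

# Hypothetical set of edge marks an LM prior may assert for a pair.
EDGE_TYPES = ("->", "<-", "<->", "o->", "o-o")

def random_prior_like(lm_prior, seed=0):
    """Build a control prior with the same pairs and confidences as lm_prior,
    but with each edge type replaced by a uniformly random one."""
    rng = random.Random(seed)  # seeded so the control is reproducible
    return {
        pair: (rng.choice(EDGE_TYPES), confidence)
        for pair, (_, confidence) in lm_prior.items()
    }

# Toy LM prior: pair -> (asserted edge type, confidence). Illustrative values only.
lm_prior = {("A", "B"): ("->", 0.9), ("B", "C"): ("o->", 0.6)}
control_prior = random_prior_like(lm_prior)
```

Running the same pipeline once with `lm_prior` and once with `control_prior` would indicate whether the gains come from the content of the LM answers or merely from the observational data.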

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for minor revision. We address the major comments point by point below.

  1. Referee: [§3.2] §3.2, around Eq. (4): The procedure for mapping LM outputs to PAG edge constraints is described at a high level, but the exact mechanism for resolving LM inconsistencies (e.g., conflicting ancestral relations) and propagating them into the score function is not fully derived; this is load-bearing for the claim that the framework correctly accounts for LM-induced biases without circularity.

    Authors: We agree that the current description would benefit from greater precision. LM outputs are parsed into statements about ancestral relations (e.g., X is an ancestor of Y) and encoded as soft constraints on the PAG. When LM statements conflict, the procedure selects the relation that is most consistent with the partial order already implied by the current PAG, using a weighted average over LM-provided confidence scores when available; unresolved conflicts are left as undetermined edges in the PAG. These constraints enter the score function as an additive penalty term that is computed once from the LM prior and held fixed during data-driven updates, avoiding any feedback loop. We will expand the derivation around Eq. (4) with explicit pseudocode and a worked example of conflict resolution in the revised manuscript. revision: yes

  2. Referee: [Table 3] Table 3, final column: The reported robustness to LM noise shows only modest degradation up to 30% noise, but the experimental protocol does not include a control where LM priors are replaced by random noise of matched strength; without this, it is difficult to confirm that the sequential optimization is actively grounding the priors rather than simply falling back to the observational data.

    Authors: We acknowledge the value of this control. Our existing experiments inject structured noise into LM-generated priors, but they do not compare against unstructured random priors of equivalent strength. Including such a baseline would more clearly isolate the contribution of the sequential optimization in grounding informative LM knowledge. We will add the random-noise control to the revised Table 3 and report the corresponding structural accuracy metrics. revision: yes
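
One possible reading of the conflict-resolution step described in the first response is sketched below. It aggregates confidence per relation, leaves the edge undetermined when the two best-supported relations are too close, and computes a fixed additive penalty for violated constraints. The `margin` threshold, the data layout, and all function names are hypothetical, and the sketch omits the consistency check against the partial order implied by the current PAG.

```python
from collections import defaultdict

UNDETERMINED = "o-o"

def resolve_statements(statements, margin=0.2):
    """Combine possibly conflicting LM statements about one variable pair.

    statements: list of (relation, confidence) tuples, e.g. ("->", 0.8).
    Returns (relation, weight). If the two best-supported relations are within
    `margin` of each other after normalisation, the edge stays undetermined.
    """
    support = defaultdict(float)
    for relation, confidence in statements:
        support[relation] += confidence
    total = sum(support.values())
    if total == 0.0:
        return UNDETERMINED, 0.0
    ranked = sorted(((w / total, r) for r, w in support.items()), reverse=True)
    best_share, best_relation = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
    if best_share - runner_up < margin:
        return UNDETERMINED, 0.0
    return best_relation, best_share

def prior_penalty(candidate, constraints):
    """Additive penalty: total weight of soft constraints the candidate violates."""
    return sum(
        weight
        for pair, (relation, weight) in constraints.items()
        if relation != UNDETERMINED and candidate.get(pair) != relation
    )

# Constraints are computed once from the (toy) LM statements and then held fixed.
constraints = {
    ("A", "B"): resolve_statements([("->", 0.9), ("->", 0.7), ("<-", 0.4)]),
    ("B", "C"): resolve_statements([("->", 0.5), ("<-", 0.5)]),
}
penalty = prior_penalty({("A", "B"): "<-", ("B", "C"): "->"}, constraints)
```

Here the evenly split statements about (B, C) leave that edge undetermined, so only the violated (A, B) constraint contributes to the penalty.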

Circularity Check

0 steps flagged

No significant circularity

The paper presents a hybrid framework that takes external LM outputs and observational batch data as independent inputs, then applies PAG representation and sequential optimization to integrate them while explicitly modeling biases. No derivation step reduces by construction to a fitted parameter or self-citation chain; the central claims rest on the combination of these distinct sources rather than re-labeling or re-deriving one from the other. The approach is self-contained against external benchmarks and does not invoke load-bearing self-citations or ansatzes for its core results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that noisy LM outputs can serve as useful priors that are correctable by sequential data; no free parameters or invented entities are explicitly named in the abstract.

axioms (1)
  • domain assumption: Language models can supply noisy but integrable priors for causal edges that can be grounded in observational batches.
    This is the core premise enabling the hybrid framework described in the abstract.

pith-pipeline@v0.9.0 · 5702 in / 1195 out tokens · 51705 ms · 2026-05-19T08:34:02.057447+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
