Multi-Agent Causal Discovery Using Large Language Models
read the original abstract
Causal discovery aims to identify causal relationships between variables and is a fundamental problem across the sciences. Traditional statistical causal discovery (SCD) methods rely solely on observational data and ignore the contextual information available in metadata, whereas recent LLM-based methods exploit metadata but treat the large language model (LLM) as a single agent, leaving its judgments vulnerable to memorized or biased associations. To address this gap, we introduce MAC (Multi-Agent Causal Discovery Framework), which casts causal discovery as a multi-agent debate coupled with the autonomous selection of an SCD algorithm. MAC combines two complementary modules, bridged by a Meta Fusion mechanism: a Debate-Coding Module (DCM) that grounds an initial graph in data by autonomously selecting and executing the best-suited SCD algorithm, and a Meta-Debate Module (MDM) that refines the graph through an adversarial Affirmative-Negative-Judge debate over the metadata. Across five benchmark datasets and three metrics (F1, SHD, NHD), MAC achieves the best aggregate performance among five statistical and four LLM-based baselines, ranking first on 10 of 15 evaluation points with Gemini-2.0-Flash -- including a perfect reconstruction of the Earthquake graph -- and remains robust across three backbone LLMs.
This paper has not been read by Pith yet.
Forward citations
Cited by 4 Pith papers
-
PROMETHEUS: Automating Deep Causal Research Integrating Text, Data and Models
PROMETHEUS builds causal atlases from text and data using local predictive-state models and sheaf gluing to create navigable Topos World Models that expose evidence strength and coherence gaps.
-
Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence
The authors introduce a validation framework showing LLMs can pull causal links from disaster social media but require checks against post-event evidence to avoid relying on model priors.
-
A Survey of Scaling in Large Language Model Reasoning
A survey categorizing scaling in LLM reasoning across input size, steps, rounds, training, and future directions, noting that scaling can negatively affect performance.
-
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
The paper surveys reinforced reasoning techniques for LLMs, covering automated data construction, learning-to-reason methods, and test-time scaling as steps toward Large Reasoning Models.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.