Causal reasoning and large language models: Opening a new frontier for causality
15 Pith papers cite this work.
citing papers explorer
- PROMETHEUS: Automating Deep Causal Research Integrating Text, Data and Models
  PROMETHEUS builds causal atlases from text and data using local predictive-state models and sheaf gluing to create navigable Topos World Models that expose evidence strength and coherence gaps.
- TCD-Arena: Assessing Robustness of Time Series Causal Discovery Methods Against Assumption Violations
  TCD-Arena is a new customizable testing framework that runs millions of experiments to map how 33 different assumption violations affect time series causal discovery methods and shows ensembles can boost overall robustness.
- PRCD-MAP: Learning How Much to Trust Imperfect Priors in Causal Discovery
  PRCD-MAP assigns per-edge trust to imperfect priors in causal discovery via empirical Bayes calibration and MLP propagation, delivering an ε-safety guarantee that vanishes at prior-quality extremes and empirical gains on CausalTime datasets.
- Sequential Causal Discovery with Noisy Language Model Priors
  Proposes a sequential causal discovery framework integrating noisy LM priors with batch data via PAG representation and adaptive edge querying for improved structural accuracy.
- MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data
  MALLM-GAN uses multi-agent LLMs to emulate GAN architecture for generating higher-quality synthetic tabular data from small samples than prior models, while preserving privacy.
- CausalGuard: Conformal Inference under Graph Uncertainty
  CausalGuard aggregates LLM-proposed and data-pruned DAGs to weight doubly robust pseudo-outcomes and applies conformal calibration to deliver finite-sample marginal coverage for conditional average treatment effects under graph uncertainty.
- CIVeX: Causal Intervention Verification for Language Agents
  CIVeX maps agent tool calls to structural causal queries, checks identifiability, and issues auditable verdicts to prevent false executions while preserving utility on confounded benchmarks.
- Diagnosing and Mitigating Sycophancy and Skepticism in LLM Causal Judgment
  Introduces the CAUSALT3 benchmark for causal reasoning across Pearl's ladder and Regulated Causal Anchoring (RCA) to reduce sycophancy and skepticism in LLMs via inference-time verification.
- CounterBench: Evaluating and Improving Counterfactual Reasoning in Large Language Models
  Introduces CounterBench benchmark and CoIn iterative reasoning method showing LLMs perform near random on formal counterfactual tasks but improve substantially with guided backtracking.
- CausalSynth: Generating Structurally Sound Synthetic Data
  CausalSynth combines structural causal models with LLMs and iterative verification to produce synthetic data that respects given causal structures while remaining linguistically natural.
- DeepImagine: Learning Biomedical Reasoning via Successive Counterfactual Imagining
  DeepImagine trains LLMs on counterfactual pairs from clinical trials using supervised fine-tuning and reinforcement learning to improve outcome prediction by approximating causal mechanisms.
- Hume's Representational Conditions for Causal Judgment: What Bayesian Formalization Abstracted Away
  Hume's causal judgment requires experiential grounding, structured retrieval, and vivacity transfer, conditions that Bayesian formalizations abstract away while LLMs retain only statistical updating.
- Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence
  The authors introduce a validation framework showing LLMs can pull causal links from disaster social media but require checks against post-event evidence to avoid relying on model priors.
- Gemma 3 Technical Report
  Gemma 3 introduces multimodal open models with architectural changes for efficient long context, trained via distillation and a new post-training recipe that makes the 4B version competitive with prior 27B models and the 27B version comparable to Gemini-1.5-Pro.
- Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation