CounterBench: Evaluating and Improving Counterfactual Reasoning in Large Language Models

2025 · cs.CL · arXiv 2502.11008

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

Counterfactual reasoning is widely recognized as one of the most challenging and intricate aspects of causality in artificial intelligence. In this paper, we evaluate the performance of large language models (LLMs) in counterfactual reasoning. In contrast to previous studies that primarily focus on commonsense causal reasoning, where LLMs often rely on prior knowledge for inference, we specifically assess their ability to perform counterfactual inference using a set of formal rules. To support this evaluation, we introduce a new benchmark dataset, CounterBench, comprising 1K counterfactual reasoning questions. The dataset is designed with varying levels of difficulty, diverse causal graph structures, distinct types of counterfactual questions, and multiple nonsensical name variants. Our experiments demonstrate that counterfactual reasoning poses a significant challenge for LLMs, with most models performing at levels comparable to random guessing. To enhance LLMs' counterfactual reasoning ability, we propose a novel reasoning paradigm, CoIn, which guides LLMs through iterative reasoning and backtracking to systematically explore counterfactual solutions. Experimental results show that our method significantly improves LLM performance on counterfactual reasoning tasks and consistently enhances performance across different LLMs. Our dataset is available at https://huggingface.co/datasets/CounterBench/CounterBench.

representative citing papers

WorldKernel: A World Model is the Coupling Kernel of Admissible Possible Worlds

cs.AI · 2026-06-09 · unverdicted · novelty 6.0

A world model is a positive semidefinite coupling kernel over admissible possible worlds, with the off-diagonal supplying the structural information for counterfactual queries that standard prediction cannot recover.

Counterfactual Modeling with Fine-Tuned LLMs for Health Intervention Design and Sensor Data Augmentation

cs.LG · 2026-01-21 · conditional · novelty 6.0

Fine-tuned LLMs produce plausible counterfactuals for health interventions and recover 20% F1 via data augmentation in label-scarce sensor datasets.

Causal Tongue-Tie: LLMs Can Encode Causal Direction, But Their Yes/No Outputs Fail to Express

cs.CL · 2026-05-25 · unverdicted · novelty 5.0

LLMs encode causal direction internally via probes but revert to commonsense in Yes/No outputs on anti-commonsense items, showing output accuracy alone does not measure causal understanding.

DeepImagine: Learning Biomedical Reasoning via Successive Counterfactual Imagining

cs.CL · 2026-04-24 · unverdicted · novelty 5.0

DeepImagine trains LLMs on counterfactual pairs from clinical trials using supervised fine-tuning and reinforcement learning to improve outcome prediction by approximating causal mechanisms.

Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform

cs.AI · 2026-05-13 · unverdicted · novelty 4.0

In the Flux environment, RL agents with explicit latent state access achieve ~79% win rate versus ~11% for LLMs on long-horizon tasks, illustrating limitations of sequence prediction for dynamic reasoning.

citing papers explorer

Showing 4 of 4 citing papers after filters.

WorldKernel: A World Model is the Coupling Kernel of Admissible Possible Worlds
cs.AI · 2026-06-09 · unverdicted · none · ref 31 · internal anchor
A world model is a positive semidefinite coupling kernel over admissible possible worlds, with the off-diagonal supplying the structural information for counterfactual queries that standard prediction cannot recover.

Causal Tongue-Tie: LLMs Can Encode Causal Direction, But Their Yes/No Outputs Fail to Express
cs.CL · 2026-05-25 · unverdicted · none · ref 2 · internal anchor
LLMs encode causal direction internally via probes but revert to commonsense in Yes/No outputs on anti-commonsense items, showing output accuracy alone does not measure causal understanding.

DeepImagine: Learning Biomedical Reasoning via Successive Counterfactual Imagining
cs.CL · 2026-04-24 · unverdicted · none · ref 3 · internal anchor
DeepImagine trains LLMs on counterfactual pairs from clinical trials using supervised fine-tuning and reinforcement learning to improve outcome prediction by approximating causal mechanisms.

Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform
cs.AI · 2026-05-13 · unverdicted · none · ref 13 · internal anchor
In the Flux environment, RL agents with explicit latent state access achieve ~79% win rate versus ~11% for LLMs on long-horizon tasks, illustrating limitations of sequence prediction for dynamic reasoning.
