Causal Parrots: Large Language Models May Talk Causality But Are Not Causal, August 2023

2023 · arXiv 2308.13067

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 1 · support 1

representative citing papers

YoCausal: How Far is Video Generation from World Model? A Causality Perspective

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

YoCausal benchmark shows video diffusion models detect the arrow of time but lack genuine causal understanding relative to humans.

ORCA: An End-to-End Interactive Copilot for Optimized Root Cause Analysis

cs.AI · 2026-05-26 · unverdicted · novelty 6.0

ORCA is an agent-orchestrated interactive copilot that automates and guides end-to-end causal analysis from workflow selection to report generation across real-world use cases.

CIVeX: Causal Intervention Verification for Language Agents

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

CIVeX maps agent tool calls to structural causal queries, checks identifiability, and issues auditable verdicts to prevent false executions while preserving utility on confounded benchmarks.

CauSim: Scaling Causal Reasoning with Increasingly Complex Causal Simulators

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

CauSim turns scarce causal reasoning labels into scalable supervised data by having LLMs incrementally construct complex executable structural causal models.

CogInstrument: Modeling Cognitive Processes for Bidirectional Human-LLM Alignment in Planning Tasks

cs.HC · 2026-04-12 · unverdicted · novelty 6.0

CogInstrument represents human reasoning as revisable cognitive motifs in graphical form to support iterative alignment with LLMs during planning tasks, with an N=12 study indicating gains in targeted revision, agency, and trust over standard dialogue interfaces.

CounterBench: Evaluating and Improving Counterfactual Reasoning in Large Language Models

cs.CL · 2025-02-16 · unverdicted · novelty 6.0

Introduces the CounterBench benchmark and the CoIn iterative reasoning method, showing that LLMs perform near random on formal counterfactual tasks but improve substantially with guided backtracking.

Novelty-based Tree-of-Thought Search for LLM Reasoning and Planning

cs.AI · 2026-05-07 · unverdicted · novelty 5.0

Novelty estimation via LLM prompts enables pruning in Tree-of-Thought search, reducing overall token usage on language planning benchmarks.

Does the Model Say What the Data Says? A Simple Heuristic for Model Data Alignment

cs.LG · 2025-11-26 · unverdicted · novelty 5.0

A data-derived baseline using feature effects on binary outcomes provides a model-agnostic way to check if machine learning explanations align with the underlying data structure.

citing papers explorer

Showing 2 of 2 citing papers after filters.

CounterBench: Evaluating and Improving Counterfactual Reasoning in Large Language Models

cs.CL · 2025-02-16 · unverdicted · none · ref 43

Introduces the CounterBench benchmark and the CoIn iterative reasoning method, showing that LLMs perform near random on formal counterfactual tasks but improve substantially with guided backtracking.

Does the Model Say What the Data Says? A Simple Heuristic for Model Data Alignment

cs.LG · 2025-11-26 · unverdicted · none · ref 12

A data-derived baseline using feature effects on binary outcomes provides a model-agnostic way to check if machine learning explanations align with the underlying data structure.
