The YoCausal benchmark shows that video diffusion models detect the arrow of time but lack genuine causal understanding relative to humans.
Causal Parrots: Large Language Models May Talk Causality But Are Not Causal, August 2023
8 Pith papers cite this work.
Citation summary
Verdicts: UNVERDICTED (8)
Roles: background (2)
Representative citing papers
ORCA is an agent-orchestrated interactive copilot that automates and guides end-to-end causal analysis from workflow selection to report generation across real-world use cases.
CIVeX maps agent tool calls to structural causal queries, checks identifiability, and issues auditable verdicts to prevent false executions while preserving utility on confounded benchmarks.
CauSim turns scarce causal reasoning labels into scalable supervised data by having LLMs incrementally construct complex executable structural causal models.
CogInstrument represents human reasoning as revisable cognitive motifs in graphical form to support iterative alignment with LLMs during planning tasks, with an N=12 study indicating gains in targeted revision, agency, and trust over standard dialogue interfaces.
Introduces the CounterBench benchmark and the CoIn iterative reasoning method, showing that LLMs perform near random on formal counterfactual tasks but improve substantially with guided backtracking.
Novelty estimation via LLM prompts enables pruning in Tree-of-Thought search, reducing overall token usage on language planning benchmarks.
A data-derived baseline using feature effects on binary outcomes provides a model-agnostic way to check if machine learning explanations align with the underlying data structure.
Citing papers explorer
- CounterBench: Evaluating and Improving Counterfactual Reasoning in Large Language Models
- Does the Model Say What the Data Says? A Simple Heuristic for Model Data Alignment