From Prompts to Tokens: Internalizing Causal Supervision in Vision-Language Model for Multi-Image Causal Reasoning

Haoping Yu; Jing Ma; Yuanxi Li

arxiv: 2606.11745 · v1 · pith:Y7A3TBXEnew · submitted 2026-06-10 · 💻 cs.CV · cs.AI


Pith reviewed 2026-06-27 10:16 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords: vision-language models, causal reasoning, multi-image inputs, causal tokens, intervention tasks, causal graph, message passing


The pith

BridgeVLM converts induced causal graphs from multi-image inputs into structured tokens executed by RAMP layers for internalized causal message passing in vision-language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to move causal knowledge from external text prompts into the model's own execution path. It induces a causal graph from several images at once, encodes the graph as Causal Tokens, and routes those tokens through newly inserted RAMP layers inside the language-model decoder so that causal influences can be passed along during generation. Training uses a single interface called M3S that supplies supervision at both local and global levels. The result is higher accuracy on questions that ask what would change if an intervention were performed on the scene.

Core claim

BridgeVLM internalizes visual causal reasoning by inducing a causal graph from multi-image inputs and converting it into structured Causal Tokens executed by RAMP layers injected into the LLM decoder for causal message passing. A unified training interface, M3S, supplies fine-grained causal supervision at local and global granularity. On CausalVLBench this raises intervention-task accuracy from 33.2 percent with prompt supervision to 54.4 percent and improves causal-structure F1 from 33.4 percent to 75.1 percent; on Causal3D, accuracy rises from 43.6 percent to 49.0 percent.

What carries the argument

Induction of a causal graph from multi-image inputs followed by its conversion into Causal Tokens that are processed by RAMP layers inserted in the LLM decoder to perform causal message passing.
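To make "causal message passing" concrete, here is a toy sketch in plain Python. It is not the paper's RAMP layer, whose internals are not described in the material above. It assumes a simple rule (each node token adds the mean of its parents' tokens), and the function name `message_passing_step` and the three-node chain are hypothetical.

```python
from typing import List


def message_passing_step(tokens: List[List[float]], adj: List[List[int]]) -> List[List[float]]:
    """One graph-restricted update over node tokens.

    adj[i][j] == 1 means an edge i -> j (node i is a cause of node j).
    Assumed rule for illustration: each node adds the mean of its parents'
    tokens to its own token; nodes without parents are left unchanged.
    """
    n = len(tokens)
    dim = len(tokens[0])
    updated = []
    for j in range(n):
        parents = [i for i in range(n) if adj[i][j]]
        if not parents:
            updated.append(list(tokens[j]))
            continue
        mean = [sum(tokens[i][d] for i in parents) / len(parents) for d in range(dim)]
        # Residual-style update: own state plus the aggregated parent message.
        updated.append([tokens[j][d] + mean[d] for d in range(dim)])
    return updated


# Hypothetical chain A -> B -> C with one-dimensional tokens.
adj = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
tokens = [[1.0], [0.0], [0.0]]  # only A carries a signal, as if A were intervened on
after_one = message_passing_step(tokens, adj)
after_two = message_passing_step(after_one, adj)
```

In this toy chain the signal placed on A reaches B after one step and C only after two, because information can travel along graph edges and nowhere else. Whether the actual RAMP layers behave this way is exactly what the review below asks the authors to demonstrate.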

If this is right

Intervention accuracy on CausalVLBench rises from 33.2 percent to 54.4 percent.
Accuracy on Causal3D rises from 43.6 percent to 49.0 percent.
Causal structure learning F1 on CausalVLBench rises from 33.4 percent to 75.1 percent.
Causal supervision is supplied through a single M3S interface that works at both local and global granularity.
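For readers unfamiliar with structure-learning F1, the sketch below shows one common way to score a predicted edge set against a ground-truth graph. It assumes directed-edge matching; the material above does not say which F1 variant the paper uses, and the variable names in the example graph are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def edge_f1(predicted: Set[Edge], truth: Set[Edge]) -> float:
    """F1 over directed edges: harmonic mean of edge precision and recall."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical circuit-like scene, not taken from the benchmark.
truth = {("switch", "bulb"), ("battery", "bulb"), ("switch", "buzzer")}
predicted = {("switch", "bulb"), ("battery", "bulb"), ("bulb", "switch")}
score = edge_f1(predicted, truth)  # 2 of 3 predicted edges are correct, 2 of 3 true edges are found
```

Under directed matching a reversed edge such as bulb -> switch counts as wrong, so a metric of this kind rewards recovering edge direction and not only which variables are connected.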

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same token-and-layer pattern could be tested on single-image or video inputs to check whether the gain generalizes when the number of images changes.
Removing the RAMP layers at inference time would show whether the performance lift truly depends on the internal message-passing mechanism.
Applying the same conversion step to other structured relations such as temporal or spatial graphs might improve related reasoning tasks.

Load-bearing premise

An induced causal graph from multi-image inputs can be turned into tokens whose processing by the added layers produces genuine causal message passing that holds up outside the training benchmarks.

What would settle it

Run the model on a fresh multi-image causal benchmark whose images and intervention types were never seen in training and measure whether intervention accuracy stays above the prompt-only baseline.

Figures

Figures reproduced from arXiv: 2606.11745 by Haoping Yu, Jing Ma, Yuanxi Li.

**Figure 1.** Task and supervision overview. Given image pairs/sequences from the same causal scene, the goal is to predict the manipulated variable or the resulting variable states. Optional causal supervision can aid prediction, but is typically imperfect.

**Figure 2.** Motivation of BridgeVLM. Compared to traditional VLMs that concatenate images, queries, and supervision as prompts, BridgeVLM closes the (P1) interface gap by routing causal information inside the model and closes the (P2) supervision gap by aligning heterogeneous, imperfect supervision to the appropriate components via M3S, yielding better performance.

**Figure 3.** Method overview. BridgeVLM contains three stages: (1) Inducing latent variable features and DAG; (2) generate node tokens; and (3) generate and update causal tokens. M3S (4) further provides causal supervision at heterogeneous granularity. We use D as the maximum number of latent variable features and G as the number of global Graph Tokens. Token sequences are represented as matrices in $\mathbb{R}^{(\cdot) \times H}$. Attn(·) d…

**Figure 4.** Visualization examples of induced DAG for CIRCUIT scenario on CausalVLBench. RQ5 (structure diagnostic). We use causal graph recovery on CausalVLBench interventions as a diagnostic for whether internal supervision – none (ANS-ONLY), node explanations (NODEALIGN) or supervising the routing DAG with the ground-truth causal graph as oracle adjacency (GRAPHALIGN) – makes the induced routing DAG more aligned …

**Figure 5.** Visualization examples of induced DAG on CausalVLBench.

Original abstract

Visual causal reasoning is essential for understanding and intervening in the physical world, requiring identification of causal variables from visual inputs and reasoning over intervention effects. Despite recent progress, large vision-language models (VLMs) remain brittle at such tasks, especially for interventional and counterfactual queries over multi-image inputs. Most existing explorations inject causal knowledge via textual prompts, leaving causal mechanisms external to model execution and limiting reliable control during inference. To address this problem, we propose BridgeVLM, which internalizes visual causal reasoning by inducing a causal graph from multi-image inputs and converting it into structured Causal Tokens executed by RAMP layers injected into the LLM decoder for causal message passing. We further introduce a unified training interface M3S for fine-grained causal supervision from different granularities (local/global level). BridgeVLM achieves 54.4% accuracy on intervention tasks on CausalVLBench (vs. 33.2% with prompt-level supervision), improves results on Causal3D from 43.6% to 49.0%, and substantially improves causal structure learning on CausalVLBench ($F_1$: 33.4% $\rightarrow$ 75.1%).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

BridgeVLM reports solid benchmark gains by turning induced causal graphs into tokens processed by injected RAMP layers, but the setup lacks ablations that would confirm the gains come from actual causal message passing rather than extra structure or supervision.


BridgeVLM internalizes causal reasoning by inducing a graph from multi-image inputs, packing it into Causal Tokens, and routing those through RAMP layers added to the LLM decoder. The numbers beat prompt-level baselines, but nothing isolates whether the improvement traces to genuine causal propagation inside the model.

The concrete new elements are the token conversion step and the RAMP layers for message passing, along with the M3S training interface that supplies supervision at local and global levels. This shifts causal structure from external prompts into the model execution path. The paper states clear lifts on the benchmarks it uses: intervention accuracy rises from 33.2% to 54.4% on CausalVLBench, Causal3D improves from 43.6% to 49.0%, and causal structure F1 jumps from 33.4% to 75.1%. Those deltas are the main empirical contribution.

The soft spot is the absence of controls that would test the causal claim. No ablation keeps the token format but randomizes or removes the graph edges, and no run replaces RAMP layers with otherwise identical decoder layers while holding token count and objective fixed. Without those, the performance edge could come from finer-grained supervision or added capacity instead of internalized causal mechanics. The graph induction step also receives little detail on how reliably it recovers edges from visual inputs.
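The edge-randomization control mentioned here could be prepared along the following lines. This is a minimal sketch under stated assumptions: nodes are integer indices, the graph is a list of directed edges, and the control keeps the node and edge counts while discarding the induced structure. The function names are hypothetical and nothing here comes from the paper's code.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]  # (parent, child)


def random_dag_with_same_edge_count(num_nodes: int, edges: List[Edge], seed: int) -> List[Edge]:
    """Sample a random DAG with as many edges as the input graph.

    Acyclicity is guaranteed by drawing a random node ordering and only
    allowing edges that point forward in that ordering.
    """
    rng = random.Random(seed)
    order = list(range(num_nodes))
    rng.shuffle(order)
    candidates = [
        (order[a], order[b])
        for a in range(num_nodes)
        for b in range(a + 1, num_nodes)
    ]
    if len(edges) > len(candidates):
        raise ValueError("input has more edges than any DAG on this many nodes")
    return rng.sample(candidates, len(edges))


def is_acyclic(num_nodes: int, edges: List[Edge]) -> bool:
    """Kahn's algorithm: a graph is a DAG iff every node can be peeled off."""
    indegree = [0] * num_nodes
    children: List[List[int]] = [[] for _ in range(num_nodes)]
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    ready = [node for node in range(num_nodes) if indegree[node] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return visited == num_nodes


original_edges = [(0, 1), (1, 2), (0, 3)]  # hypothetical induced graph
control_edges = random_dag_with_same_edge_count(4, original_edges, seed=7)
```

Feeding `control_edges` through the same token conversion and layers, with token count and training objective held fixed, would show how much of the reported gain depends on the induced edges themselves.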

The work targets researchers who want VLMs to handle intervention and counterfactual queries over visual data without relying on prompt engineering. A reader focused on architectural routes for embedding causality will find the proposal and the reported numbers worth examining.

It deserves peer review. The architecture is distinct from the prompt baselines it contrasts against, the benchmarks are relevant, and the empirical claims are stated plainly enough for referees to request the missing controls.

Referee Report

1 major / 2 minor

Summary. The paper proposes BridgeVLM to internalize visual causal reasoning in VLMs: it induces a causal graph from multi-image inputs, converts the graph into structured Causal Tokens, and executes them via injected RAMP layers in the LLM decoder to perform causal message passing. A unified M3S training interface supplies fine-grained (local/global) causal supervision. On CausalVLBench the model reaches 54.4% accuracy on intervention tasks (vs. 33.2% with prompt-level supervision), improves Causal3D from 43.6% to 49.0%, and raises causal-structure F1 from 33.4% to 75.1%.

Significance. If the reported gains are shown to arise from genuine causal message passing rather than extra capacity or supervision granularity, the work would constitute a meaningful advance over prompt-only causal injection, offering a route to more reliable interventional and counterfactual reasoning inside the model itself. The magnitude of the accuracy and F1 lifts on two benchmarks would be noteworthy for multi-image causal VLM research.

major comments (1)

[Abstract and §3] Abstract and §3 (method description of RAMP layers and Causal Tokens): the central claim that the induced graph is converted into tokens whose execution inside RAMP layers produces verifiable causal message passing (i.e., interventions propagate according to graph edges) is load-bearing for the contribution. No ablation is described that (a) keeps the token format but randomizes or removes the graph structure, or (b) replaces RAMP layers with otherwise identical decoder layers while preserving token count and training objective. Without such a control the 54.4% vs. 33.2% delta on intervention tasks cannot be attributed to internalized causal mechanics rather than M3S supervision or added parameters.

minor comments (2)

[Abstract] Abstract: numeric results are given without error bars, number of runs, or statistical tests, which weakens assessment of the reliability of the reported improvements.
[Results] Results section: tables or figures reporting the benchmark numbers should include variance or confidence intervals and clarify whether the baselines use identical training data and compute.
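The variance reporting requested in the minor comments can be as simple as a mean, a standard deviation, and a bootstrap interval over repeated runs. The sketch below assumes a percentile bootstrap over per-seed accuracies; the accuracy values are placeholders for illustration and are not results from the paper.

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_ci(values: List[float], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Placeholder per-seed accuracies (hypothetical, NOT reported numbers).
runs = [0.50, 0.52, 0.51, 0.53, 0.49]
mean_acc = statistics.mean(runs)
std_acc = statistics.stdev(runs)  # sample standard deviation across seeds
ci_low, ci_high = bootstrap_ci(runs)
```

With only a handful of seeds the interval is coarse, so the number of runs should be reported next to it.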

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. We address the major comment below.


Referee: [Abstract and §3] Abstract and §3 (method description of RAMP layers and Causal Tokens): the central claim that the induced graph is converted into tokens whose execution inside RAMP layers produces verifiable causal message passing (i.e., interventions propagate according to graph edges) is load-bearing for the contribution. No ablation is described that (a) keeps the token format but randomizes or removes the graph structure, or (b) replaces RAMP layers with otherwise identical decoder layers while preserving token count and training objective. Without such a control the 54.4% vs. 33.2% delta on intervention tasks cannot be attributed to internalized causal mechanics rather than M3S supervision or added parameters.

Authors: We agree that these specific controls are needed to isolate whether the gains arise from causal message passing. Our prompt-level baseline controls for supervision granularity but not for token structure or RAMP architecture. We will add the requested ablations, namely (a) randomizing graph structure while preserving token format and count and (b) swapping RAMP layers for equivalent decoder layers, and report the results in the revision to strengthen attribution of the intervention accuracy lift.

Revision: yes

Circularity Check

0 steps flagged

No circularity in derivation chain


The provided abstract and description introduce BridgeVLM as an architectural approach that induces causal graphs, converts them to Causal Tokens, and executes them via injected RAMP layers under M3S supervision, with reported empirical gains on CausalVLBench and Causal3D. No equations, derivations, or self-citations appear that reduce these gains or the claimed internalization of causal message passing to quantities defined by the same inputs or prior self-referential results. The performance deltas are framed as outcomes of the new components rather than statistical artifacts of fitting or renaming. The derivation chain is self-contained against external benchmarks with no load-bearing steps that collapse by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Abstract-only review; no explicit free parameters, mathematical axioms, or background assumptions are stated. The central contribution rests on the unelaborated claim that the induced graph and RAMP layers implement causal message passing.

invented entities (2)

Causal Tokens (no independent evidence)
purpose: Structured representation of the induced causal graph that is executed inside the model
Introduced as the mechanism to internalize causal supervision; no independent evidence supplied.

RAMP layers (no independent evidence)
purpose: Injected decoder layers that perform causal message passing on the tokens
New architectural component claimed to enable internal causal reasoning; no independent evidence supplied.

pith-pipeline@v0.9.1-grok · 5740 in / 1384 out tokens · 39698 ms · 2026-06-27T10:16:02.249983+00:00 · methodology


