Title resolution pending

Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, et al · 2024

5 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

EmoTrans: A Benchmark for Understanding, Reasoning, and Predicting Emotion Transitions in Multimodal LLMs

cs.CV · 2026-04-25 · unverdicted · novelty 7.0

EmoTrans is a new video benchmark with four progressive tasks that measures how well current multimodal LLMs handle dynamic emotion transitions rather than static recognition.

DISSECT: Diagnosing Where Vision Ends and Language Priors Begin in Scientific VLMs

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

DISSECT benchmark reveals that VLMs extract visual details from scientific diagrams but frequently lose them during reasoning, with open-source models showing a larger integration gap than closed-source ones.

State Beyond Appearance: Diagnosing and Improving State Consistency in Dial-Based Measurement Reading

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

MLLMs ignore the geometry of dial states and instead cluster them by appearance, which makes their readings inconsistent under appearance variations; TriSCA's state-distance alignment, metadata supervision, and objective alignment improve robustness on clock and gauge benchmarks.

SKG-VLA: Scene Knowledge Graph Priors for Structured Scene Semantics and Multimodal Reasoning for Decision Making

cs.AI · 2026-05-10 · unverdicted · novelty 5.0

SKG-VLA models each complaint as a structured scene via a Scene Knowledge Graph to improve policy-grounded multimodal reasoning and decision accuracy.

An Empirical Study of Perceptions of General LLMs and Multimodal LLMs on Hugging Face

cs.SE · 2026-04-07 · unverdicted · novelty 4.0

Hugging Face discussions show that access barriers, output quality, and setup complexity are the main user concerns for both general and multimodal LLMs.

citing papers explorer

EmoTrans: A Benchmark for Understanding, Reasoning, and Predicting Emotion Transitions in Multimodal LLMs cs.CV · 2026-04-25 · unverdicted · none · ref 37
DISSECT: Diagnosing Where Vision Ends and Language Priors Begin in Scientific VLMs cs.CV · 2026-04-06 · unverdicted · none · ref 18
State Beyond Appearance: Diagnosing and Improving State Consistency in Dial-Based Measurement Reading cs.CV · 2026-04-29 · unverdicted · none · ref 43
SKG-VLA: Scene Knowledge Graph Priors for Structured Scene Semantics and Multimodal Reasoning for Decision Making cs.AI · 2026-05-10 · unverdicted · none · ref 48
An Empirical Study of Perceptions of General LLMs and Multimodal LLMs on Hugging Face cs.SE · 2026-04-07 · unverdicted · none · ref 80