Title resolution pending

Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, et al · 2024

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

browse 5 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images

cs.CV · 2026-05-12 · unverdicted · novelty 7.0 · 3 refs

MLLMs display a large perception-reasoning gap on perspective-conditioned spatial reasoning tasks from omnidirectional images, with sharp accuracy drops on advanced tasks like egocentric rotation, though partial gains are possible via RL reward shaping.

EmoTrans: A Benchmark for Understanding, Reasoning, and Predicting Emotion Transitions in Multimodal LLMs

cs.CV · 2026-04-25 · unverdicted · novelty 7.0

EmoTrans is a new video benchmark with four progressive tasks that measures how well current multimodal LLMs handle dynamic emotion transitions rather than static recognition.

UIPress: Bringing Optical Token Compression to UI-to-Code Generation

cs.CL · 2026-04-10 · unverdicted · novelty 7.0

UIPress is the first encoder-side learned optical compression method for UI-to-Code that compresses visual tokens to 256, outperforming the uncompressed baseline by 7.5% CLIP score and the best inference-time baseline by 4.6% while delivering 9.1x TTFT speedup.

Precise Shield: Explaining and Aligning VLLM Safety via Neuron-Level Guidance

cs.CV · 2026-04-10 · unverdicted · novelty 6.0

Precise Shield identifies safety neurons in VLLMs via activation contrasts and aligns only them with gradient masking, boosting safety, preserving generalization, and enabling zero-shot cross-lingual and cross-modal transfer.

An Empirical Study of Perceptions of General LLMs and Multimodal LLMs on Hugging Face

cs.SE · 2026-04-07 · unverdicted · novelty 4.0

Hugging Face discussions show that access barriers, output quality, and setup complexity are the main user concerns for both general and multimodal LLMs.

citing papers explorer

Showing 5 of 5 citing papers.

Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images cs.CV · 2026-05-12 · unverdicted · none · ref 34 · 3 links
MLLMs display a large perception-reasoning gap on perspective-conditioned spatial reasoning tasks from omnidirectional images, with sharp accuracy drops on advanced tasks like egocentric rotation, though partial gains are possible via RL reward shaping.
EmoTrans: A Benchmark for Understanding, Reasoning, and Predicting Emotion Transitions in Multimodal LLMs cs.CV · 2026-04-25 · unverdicted · none · ref 20
EmoTrans is a new video benchmark with four progressive tasks that measures how well current multimodal LLMs handle dynamic emotion transitions rather than static recognition.
UIPress: Bringing Optical Token Compression to UI-to-Code Generation cs.CL · 2026-04-10 · unverdicted · none · ref 25
UIPress is the first encoder-side learned optical compression method for UI-to-Code that compresses visual tokens to 256, outperforming the uncompressed baseline by 7.5% CLIP score and the best inference-time baseline by 4.6% while delivering 9.1x TTFT speedup.
Precise Shield: Explaining and Aligning VLLM Safety via Neuron-Level Guidance cs.CV · 2026-04-10 · unverdicted · none · ref 22
Precise Shield identifies safety neurons in VLLMs via activation contrasts and aligns only them with gradient masking, boosting safety, preserving generalization, and enabling zero-shot cross-lingual and cross-modal transfer.
An Empirical Study of Perceptions of General LLMs and Multimodal LLMs on Hugging Face cs.SE · 2026-04-07 · unverdicted · none · ref 54
Hugging Face discussions show that access barriers, output quality, and setup complexity are the main user concerns for both general and multimodal LLMs.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer