Imagination helps visual reasoning, but not yet in latent space

You Li, Chi Chen, Yanghao Li, Fanhu Zeng, Kaiyu Huang, Jinan Xu, Maosong Sun · 2026 · arXiv 2602.22766

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Leveraging Latent Visual Reasoning in Silence

cs.CV · 2026-05-18 · conditional · novelty 6.0

Latent visual reasoning improves multimodal models via training effects even without using latent tokens at inference, enabled by an attention-based RL reward that promotes interaction with text tokens.

Retrieve, Integrate, and Synthesize: Spatial-Semantic Grounded Latent Visual Reasoning

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

RIS improves MLLM latent visual reasoning by retrieving spatial-semantic evidence, integrating it via attention bottlenecks, and synthesizing it with language transition tokens, yielding gains on V*, HRBench, MMVP, and BLINK benchmarks.

HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.

What's Holding Back Latent Visual Reasoning?

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

Latent visual reasoning fails in current models because standard datasets make oracle latents uninformative and inference-time latents collapse away from useful representations.

citing papers explorer

Showing 4 of 4 citing papers.

Leveraging Latent Visual Reasoning in Silence · cs.CV · 2026-05-18 · conditional · none · ref 16
Retrieve, Integrate, and Synthesize: Spatial-Semantic Grounded Latent Visual Reasoning · cs.CL · 2026-05-08 · unverdicted · none · ref 14
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering · cs.AI · 2026-04-22 · unverdicted · none · ref 55
What's Holding Back Latent Visual Reasoning? · cs.CV · 2026-05-18 · unverdicted · none · ref 12
