Sketch-in-latents: Eliciting unified reasoning in MLLMs. CoRR, abs/2512.16584

Jintao Tong, Jiaqi Gu, Yujing Lou, Lubin Fan, Yixiong Zou, Yue Wu, Jieping Ye, Ruixuan Li · 2025 · arXiv 2512.16584

5 Pith papers cite this work. Polarity classification is still being indexed.



citation-role summary

baseline: 2 · background: 1

citation-polarity summary

baseline: 2 · background: 1

representative citing papers

UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal LLMs

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

UniVLR unifies textual and visual reasoning in multimodal LLMs by compressing reasoning traces and auxiliary images into visual latent tokens for direct inference without interleaved text CoT.

Hybrid Latent Reasoning with Decoupled Policy Optimization

cs.CV · 2026-04-22 · unverdicted · novelty 7.0

HyLaR with DePO enables effective RL in hybrid discrete-continuous spaces for multimodal models, outperforming prior MLLMs on perception and understanding benchmarks.

Forest Before Trees: Latent Superposition for Efficient Visual Reasoning

cs.CL · 2026-01-11 · unverdicted · novelty 7.0

Laser reformulates visual reasoning via Dynamic Windowed Alignment Learning to maintain a latent superposition of global features, delivering a 5.03% average gain over Monet while using over 97% fewer inference tokens across six benchmarks.

Leveraging Latent Visual Reasoning in Silence

cs.CV · 2026-05-18 · conditional · novelty 6.0

Latent visual reasoning improves multimodal models via training effects even without using latent tokens at inference, enabled by an attention-based RL reward that promotes interaction with text tokens.

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

cs.CV · 2026-04-09 · unverdicted · novelty 5.0

HDPO reframes tool efficiency as a conditional objective within accurate trajectories, enabling Metis to reduce tool invocations by orders of magnitude while raising reasoning accuracy.

citing papers explorer


UniVLR: Unifying Text and Vision in Visual Latent Reasoning for Multimodal LLMs cs.CV · 2026-05-12 · unverdicted · none · ref 13
Hybrid Latent Reasoning with Decoupled Policy Optimization cs.CV · 2026-04-22 · unverdicted · none · ref 28
Forest Before Trees: Latent Superposition for Efficient Visual Reasoning cs.CL · 2026-01-11 · unverdicted · none · ref 30
Leveraging Latent Visual Reasoning in Silence cs.CV · 2026-05-18 · conditional · none · ref 29
Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models cs.CV · 2026-04-09 · unverdicted · none · ref 29
