Better, stronger, faster: Tackling the trilemma in MLLM-based segmentation with simultaneous textual mask prediction, 2025

Jiazhen Liu, Mingkuan Feng, Long Chen · 2025 · arXiv 2512.00395

2 Pith papers cite this work. Polarity classification has not been indexed yet.

representative citing papers

SSR3D-LLM: Structured Spatial Reasoning via Latent Steps for Fine-Grained Grounding in Unified 3D-LLMs

cs.CV · 2026-05-27 · unverdicted · novelty 6.0

SSR3D-LLM improves fine-grained 3D grounding in unified 3D-LLMs by generating and scoring sequences of latent spatial reasoning steps from the query using fixed Mask3D proposals.

Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation

cs.AI · 2026-06-08 · unverdicted · novelty 5.0

DPVR-LF routes saturated vision tokens into a one-layer side branch after layer 4, runs text-only processing through layers 5-17, and performs late fusion at the final layer to reduce visual computation while preserving multimodal performance.

citing papers explorer

SSR3D-LLM: Structured Spatial Reasoning via Latent Steps for Fine-Grained Grounding in Unified 3D-LLMs

cs.CV · 2026-05-27 · unverdicted · none · ref 24

SSR3D-LLM improves fine-grained 3D grounding in unified 3D-LLMs by generating and scoring sequences of latent spatial reasoning steps from the query using fixed Mask3D proposals.
