Visual instruction tuning
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
CAVE is a GRPO-based process-reward method that improves VLMs on fragmented visual reasoning by crediting intermediate actions via belief update, evidence acquisition, and adaptive focus, shown on TRACER-Bench and public benchmarks.
citing papers explorer
- Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture
  TWN attaches separate reasoning and embedding LoRA adapters to a frozen backbone with gradient detachment and a self-supervised gate that decides per input whether to generate CoT, achieving SOTA on MMEB-V2 with 3-5% added parameters and up to 50% fewer reasoning tokens.
- CAVE: A Structured Credit Assignment Approach for Fragmented Visual Evidence Reasoning
  CAVE is a GRPO-based process-reward method that improves VLMs on fragmented visual reasoning by crediting intermediate actions via belief update, evidence acquisition, and adaptive focus, shown on TRACER-Bench and public benchmarks.