Visual instruction tuning

Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee · 2023

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture

cs.CV · 2026-05-14 · unverdicted · novelty 7.0

TWN attaches separate reasoning and embedding LoRA adapters to a frozen backbone with gradient detachment and a self-supervised gate that decides per input whether to generate CoT, achieving SOTA on MMEB-V2 with 3-5% added parameters and up to 50% fewer reasoning tokens.

CAVE: A Structured Credit Assignment Approach for Fragmented Visual Evidence Reasoning

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

CAVE is a GRPO-based process-reward method that improves VLMs on fragmented visual reasoning by crediting intermediate actions via belief update, evidence acquisition, and adaptive focus, shown on TRACER-Bench and public benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture cs.CV · 2026-05-14 · unverdicted · none · ref 23
TWN attaches separate reasoning and embedding LoRA adapters to a frozen backbone with gradient detachment and a self-supervised gate that decides per input whether to generate CoT, achieving SOTA on MMEB-V2 with 3-5% added parameters and up to 50% fewer reasoning tokens.
CAVE: A Structured Credit Assignment Approach for Fragmented Visual Evidence Reasoning cs.CV · 2026-05-13 · unverdicted · none · ref 3
CAVE is a GRPO-based process-reward method that improves VLMs on fragmented visual reasoning by crediting intermediate actions via belief update, evidence acquisition, and adaptive focus, shown on TRACER-Bench and public benchmarks.

Visual instruction tuning

fields

years

verdicts

representative citing papers

citing papers explorer