Improved baselines with visual instruction tuning

Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

PixDLM: A Dual-Path Multimodal Language Model for UAV Reasoning Segmentation

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

The work introduces the UAV Reasoning Segmentation task, the DRSeg benchmark dataset, and PixDLM as a baseline dual-path multimodal language model for reasoning-based segmentation in aerial imagery.

Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

A 0.5B student VLM distills from a 3B teacher using visual-switch distillation and DBiLD loss to gain 3.6 points on average across 10 multimodal benchmarks without architecture changes.

citing papers explorer

Showing 2 of 2 citing papers.

PixDLM: A Dual-Path Multimodal Language Model for UAV Reasoning Segmentation cs.CV · 2026-04-17 · unverdicted · none · ref 23
The work introduces the UAV Reasoning Segmentation task, the DRSeg benchmark dataset, and PixDLM as a baseline dual-path multimodal language model for reasoning-based segmentation in aerial imagery.
Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models cs.CV · 2026-04-16 · unverdicted · none · ref 31
A 0.5B student VLM distills from a 3B teacher using visual-switch distillation and DBiLD loss to gain 3.6 points on average across 10 multimodal benchmarks without architecture changes.

Improved baselines with visual instruction tuning

fields

years

verdicts

representative citing papers

citing papers explorer