Improved baselines with visual instruction tuning

Haotian Liu, Chunyuan Li, Yuheng Li, Yong Jae Lee · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

HumanVBench: Probing Human-Centric Video Understanding in MLLMs with Automatically Synthesized Benchmarks

cs.CV · 2024-12-23 · unverdicted · novelty 7.0

HumanVBench provides a 16-task benchmark for human-centric video understanding in MLLMs, created through automated annotation and distractor synthesis pipelines, and shows top models lag human performance on emotion perception and cross-modal alignment.

When Large Vision-Language Models Meet Person Re-Identification

cs.CV · 2024-11-27 · unverdicted · novelty 6.0

LVLM-ReID guides LVLMs to produce refined semantic tokens as pedestrian identity features for ReID, achieving competitive benchmark results without additional image-text data.

citing papers explorer

HumanVBench: Probing Human-Centric Video Understanding in MLLMs with Automatically Synthesized Benchmarks

cs.CV · 2024-12-23 · unverdicted · none · ref 37

When Large Vision-Language Models Meet Person Re-Identification

cs.CV · 2024-11-27 · unverdicted · none · ref 13

