LLaVA-NeXT: Improved reasoning, OCR, and world knowledge

Haotian Liu, Chunyuan Li, Yuheng Li, Bo Li, Yuanhan Zhang, Sheng Shen, Yong Jae Lee · 2024

3 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

RDFace: A Benchmark Dataset for Rare Disease Facial Image Analysis under Extreme Data Scarcity and Phenotype-Aware Synthetic Generation

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

A new benchmark dataset of 456 real rare-disease face images demonstrates that phenotype-aware synthetic augmentation with landmark filtering improves AI diagnostic accuracy by up to 13.7% in ultra-low-data regimes.

Tell Model Where to Look: Mitigating Hallucinations in MLLMs by Vision-Guided Attention

cs.CV · 2025-11-25 · unverdicted · novelty 6.0

VGA constructs precise visual grounding from token semantics to guide MLLM attention toward relevant regions, dynamically suppressing described areas in captioning, and achieves SOTA dehallucination with negligible overhead.

Growing a Multi-head Twig via Distillation and Reinforcement Learning to Accelerate Large Vision-Language Models

cs.CV · 2025-03-18 · unverdicted · novelty 5.0

TwigVLM adds a twig module to VLMs for twig-guided token pruning and self-speculative decoding, retaining 96% performance after pruning 88.9% visual tokens and delivering 154% speedup on long responses for LLaVA-1.5-7B.

citing papers explorer


RDFace: A Benchmark Dataset for Rare Disease Facial Image Analysis under Extreme Data Scarcity and Phenotype-Aware Synthetic Generation
cs.CV · 2026-04-03 · unverdicted · none · ref 30

Tell Model Where to Look: Mitigating Hallucinations in MLLMs by Vision-Guided Attention
cs.CV · 2025-11-25 · unverdicted · none · ref 15

Growing a Multi-head Twig via Distillation and Reinforcement Learning to Accelerate Large Vision-Language Models
cs.CV · 2025-03-18 · unverdicted · none · ref 33
