
Mixed citations

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

Mixed citation behavior. The most common role is background (40% of classified citations), tied with baseline.

10 Pith papers cite it.


citation-role summary

background: 2 · baseline: 2 · method: 1


representative citing papers

LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs?

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

LLaVA-UHD v4 reduces visual-encoding FLOPs by 55.8% for high-resolution images in MLLMs via slice-based encoding plus intra-ViT early compression while matching or exceeding baseline performance on document, OCR, and VQA benchmarks.

