LLaVA-NeXT: Improved reasoning, OCR, and world knowledge
3 Pith papers cite this work.
Citation summary:
Fields: cs.CV (3)
Verdicts: UNVERDICTED (3)
Roles: method (1)
Polarities: use method (1)
Citing papers:
- RDFace: A Benchmark Dataset for Rare Disease Facial Image Analysis under Extreme Data Scarcity and Phenotype-Aware Synthetic Generation
  A new benchmark dataset of 456 real rare-disease face images demonstrates that phenotype-aware synthetic augmentation with landmark filtering improves AI diagnostic accuracy by up to 13.7% in ultra-low-data regimes.
- Tell Model Where to Look: Mitigating Hallucinations in MLLMs by Vision-Guided Attention
  VGA constructs precise visual grounding from token semantics to guide MLLM attention toward relevant regions, dynamically suppresses regions that have already been described during captioning, and achieves state-of-the-art hallucination mitigation with negligible overhead.
- Growing a Multi-head Twig via Distillation and Reinforcement Learning to Accelerate Large Vision-Language Models
  TwigVLM adds a twig module to a VLM for twig-guided visual token pruning and self-speculative decoding. On LLaVA-1.5-7B it retains 96% of the original performance after pruning 88.9% of the visual tokens and delivers a 154% speedup on long responses.
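The 88.9% pruning figure is easier to read as a token count. LLaVA-1.5 encodes a 336 × 336 image as a 24 × 24 grid of patches, i.e. 576 visual tokens, so pruning 88.9% of them leaves about 64. The sketch below illustrates generic score-based top-k pruning under that assumption. The helper `prune_visual_tokens` is hypothetical and the importance scores are random placeholders; this is not TwigVLM's twig-guided selection criterion or any library API.

```python
import random


def prune_visual_tokens(tokens, scores, prune_ratio):
    """Drop the lowest-scoring fraction of visual tokens (hypothetical helper).

    Keeps round(len(tokens) * (1 - prune_ratio)) tokens, at least one,
    and returns them in their original (spatial) order.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    keep = max(1, round(len(tokens) * (1 - prune_ratio)))
    # Rank token indices by score, highest first, and keep the top `keep`.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])  # restore the original token order
    return [tokens[i] for i in kept]


# LLaVA-1.5: a 24 x 24 patch grid gives 576 visual tokens per image.
NUM_VISUAL_TOKENS = 24 * 24
tokens = [f"v{i}" for i in range(NUM_VISUAL_TOKENS)]

random.seed(0)
scores = [random.random() for _ in tokens]  # placeholder importance scores

kept = prune_visual_tokens(tokens, scores, prune_ratio=0.889)
print(len(kept))  # 64
```

Keeping the surviving tokens in their original order preserves the spatial layout the language model sees; only the ranking criterion would change in a real method.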