Deep Pre-Alignment uses a small VLM perceiver instead of a ViT to pre-align visual features with the LLM's text space, yielding gains of 1.9 to 3.0 points on multimodal benchmarks and 32.9% less language forgetting.
arXiv preprint arXiv:2402.15300 (2024)
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
method 1
citation-polarity summary
fields: cs.CV 3
roles: method 1
polarities: background 1
representative citing papers
Equitable attention via Dominant Object Penalty and Outlier Boost Coefficient reduces object hallucinations in multimodal LLMs without retraining.
Hallucination of Multimodal Large Language Models: A Survey
The survey organizes causes of hallucinations in MLLMs, reviews evaluation benchmarks and metrics, and outlines mitigation approaches plus open questions.