Large vision-language models exhibit severe object hallucination that varies with training instructions, and the proposed POPE polling method evaluates it more stably and flexibly than prior approaches.
Gonzalez, Ion Stoica, and Eric P
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
TempCompass benchmark reveals that state-of-the-art Video LLMs have poor ability to perceive temporal aspects such as speed, direction, and ordering in videos.
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
SFTMix applies mixup regularization to confidence-stratified interpolated examples during LLM instruction tuning to achieve consistent gains across models and datasets.
citing papers explorer
-
Evaluating Object Hallucination in Large Vision-Language Models
Large vision-language models exhibit severe object hallucination that varies with training instructions, and the proposed POPE polling method evaluates it more stably and flexibly than prior approaches.
-
TempCompass: Do Video LLMs Really Understand Videos?
TempCompass benchmark reveals that state-of-the-art Video LLMs have poor ability to perceive temporal aspects such as speed, direction, and ordering in videos.
-
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
UltraChat supplies 1.5 million high-quality multi-turn dialogues that, when used to fine-tune LLaMA, produce UltraLLaMA, which outperforms prior open-source chat models including Vicuna.
-
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe
SFTMix applies mixup regularization to confidence-stratified interpolated examples during LLM instruction tuning to achieve consistent gains across models and datasets.