JointAVBench is a benchmark for joint audio-visual reasoning that shows leading Omni-LLMs reach only 65.3% accuracy, with particular weakness in cross-scene tasks.
Prompt for ambiguity check: "You are a QA evaluation assistant tasked with filtering incorrect or low-quality question-answer pairs based on video and audio context."
One Pith paper cites this work (field: cs.MM; year: 2025; verdict: CONDITIONAL).
Representative citing paper:
- JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation