JointAVBench is a benchmark for joint audio-visual reasoning that shows leading Omni-LLMs reach only 65.3% accuracy, with particular weakness in cross-scene tasks.
- If the explanation provides clear, evidence-based reasoning or logical steps to derive the answer, proceed to the final output
Cited by 1 Pith paper (field: cs.MM; year: 2025; verdict: CONDITIONAL):
JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation