JointAVBench is a benchmark for joint audio-visual reasoning that shows leading Omni-LLMs reach only 65.3% accuracy, with particular weakness in cross-scene tasks.
- Preserve all other emotional/tonal descriptors (e.g., "sad mood," "English accent")
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.MM (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers
JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation