- Preserve all other emotional/tonal descriptors (e.g., “sad mood,” “English accent”)

Output of Emotional Information: For each utterance:
- Only if the vocal traits align with the subtitle and are non-neutral:
  - Replace any speech content in the vocal traits (i

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation

cs.MM · 2025-12-14 · conditional · novelty 7.0

JointAVBench is a benchmark for joint audio-visual reasoning that shows leading Omni-LLMs reach only 65.3% accuracy, with particular weakness in cross-scene tasks.

citing papers explorer

Showing 1 of 1 citing paper.

JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation

cs.MM · 2025-12-14 · conditional · none · ref 12

