VoxSafeBench reveals that speech language models recognize social norms from text but fail to apply them when acoustic cues like speaker or scene determine the appropriate response.
Wavbench: Benchmarking reasoning, colloquialism, and paralinguistics for end-to-end spoken dialogue models
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 1
dataset 1
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3representative citing papers
NVBench provides a standardized bilingual benchmark and evaluation protocol for assessing non-verbal vocalization generation, placement, and salience in text-to-speech systems.
A survey that provides a unified formulation of audio reasoning and reviews advances across Audio-to-Text, Audio-to-Speech, Audio-Visual, and Agentic paradigms while discussing challenges and future directions.
citing papers explorer
-
VoxSafeBench: Not Just What Is Said, but Who, How, and Where
VoxSafeBench reveals that speech language models recognize social norms from text but fail to apply them when acoustic cues like speaker or scene determine the appropriate response.