pith. sign in

InProceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Bench- marks 2021, December 2021, virtual

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

fields

cs.CL 2

years

2026 1 2025 1

representative citing papers

StressTest: Can YOUR Speech LM Handle the Stress?

cs.CL · 2025-05-28 · conditional · novelty 6.0

Speech language models fail at reasoning about sentence stress but improve after fine-tuning on a new 17k-example synthetic dataset that varies stress to alter meaning.

citing papers explorer

Showing 2 of 2 citing papers.

  • StressTest: Can YOUR Speech LM Handle the Stress? cs.CL · 2025-05-28 · conditional · none · ref 2

    Speech language models fail at reasoning about sentence stress but improve after fine-tuning on a new 17k-example synthetic dataset that varies stress to alter meaning.

  • AUDITA: A New Dataset to Audit Humans vs. AI Skill at Audio QA cs.CL · 2026-04-23 · unverdicted · none · ref 6

    AUDITA is a challenging audio QA benchmark where humans score 32% accuracy on average while state-of-the-art models score below 9%, using IRT to reveal systematic model deficiencies.