In NeurIPS Workshop on Self-Supervised Learning for Speech and Audio Processing

The zero resource speech benchmark · 2021 · arXiv 2504.09081

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks

eess.AS · 2025-07-31 · unverdicted · novelty 7.0

MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.

Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

cs.SD · 2026-04-13 · unverdicted · novelty 6.0

AF-Next is a scaled audio-language model with long-context support and Temporal Audio Chain-of-Thought reasoning that outperforms prior open models on audio understanding and reasoning benchmarks.

Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey

eess.AS · 2025-05-21 · accept · novelty 6.0

The survey introduces a four-category taxonomy for LALM evaluations and reviews benchmarks across general auditory processing, knowledge reasoning, dialogue, and fairness-safety.

citing papers explorer

Showing 3 of 3 citing papers.

MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks · eess.AS · 2025-07-31 · unverdicted · none · ref 47
MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music · cs.SD · 2026-04-13 · unverdicted · none · ref 1
AF-Next is a scaled audio-language model with long-context support and Temporal Audio Chain-of-Thought reasoning that outperforms prior open models on audio understanding and reasoning benchmarks.
Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey · eess.AS · 2025-05-21 · accept · none · ref 11
The survey introduces a four-category taxonomy for LALM evaluations and reviews benchmarks across general auditory processing, knowledge reasoning, dialogue, and fairness-safety.
