In NeurIPS Workshop on Self-Supervised Learning for Speech and Audio Processing
3 Pith papers cite this work.
Citing papers
- MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks
  MECAT is a multi-expert benchmark for audio AI. It offers fine-grained captions and QA pairs generated with expert models and LLM reasoning. It is paired with the DATE metric, which combines semantic similarity with cross-sample discriminability to favor detailed outputs.
- Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
  AF-Next is a scaled audio-language model with long-context support and Temporal Audio Chain-of-Thought reasoning. It outperforms prior open models on audio understanding and reasoning benchmarks.
- Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey
  The survey introduces a four-category taxonomy for large audio-language model (LALM) evaluations. It reviews benchmarks across general auditory processing, knowledge reasoning, dialogue, and fairness-safety.