MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.
2506.11350 , archivePrefix =
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
representative citing papers
- Drift-Augmented Scoring: Text-Derived Noise Robustness for Zero-Shot Audio-Language Classification
  DAS adds a cached per-class bonus derived from noise-conditioned text prompts to cosine scores, improving accuracy by 2.60-5.75 points on UrbanSound8K and mAP by 1.50-1.74 points on FSD50K under urban noise.
- Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text
  Dasheng AudioGen uses multi-view captions and a unified semantic-acoustic representation to enable end-to-end generation of mixed audio scenes from text descriptions.