GLAP: General contrastive audio-text pre-training across domains and languages
2 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks
  MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.
- Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text
  Dasheng AudioGen uses multi-view captions and a unified semantic-acoustic representation to enable end-to-end generation of mixed audio scenes from text descriptions.