Glap: General contrastive audio-text pre- training across domains and languages

Heinrich Dinkel, Zhiyong Yan, Tianzi Wang, Yongqing Wang, Xingwei Sun, Yadong Niu, Jizhong Liu, Gang Li, Junbo Zhang, Jian Luan · 2025 · arXiv 2506.11350

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

representative citing papers

MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks

eess.AS · 2025-07-31 · unverdicted · novelty 7.0

MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.

Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text

cs.SD · 2026-05-27 · unverdicted · novelty 6.0

Dasheng AudioGen uses multi-view captions and a unified semantic-acoustic representation to enable end-to-end generation of mixed audio scenes from text descriptions.

citing papers explorer

Showing 2 of 2 citing papers.

MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks eess.AS · 2025-07-31 · unverdicted · none · ref 26
MECAT is a multi-expert benchmark for audio AI offering fine-grained captions and QA pairs generated via expert models and LLM reasoning, paired with the DATE metric that combines semantic similarity and cross-sample discriminability to favor detailed outputs.
Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text cs.SD · 2026-05-27 · unverdicted · none · ref 25
Dasheng AudioGen uses multi-view captions and a unified semantic-acoustic representation to enable end-to-end generation of mixed audio scenes from text descriptions.

Glap: General contrastive audio-text pre- training across domains and languages

fields

years

verdicts

representative citing papers

citing papers explorer