Large-scale contrastive language-audio pretraining (CLAP)

Yusong Wu et al. · 2024 · arXiv 2211.06687

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

TMD-Bench: A Multi-Level Evaluation Paradigm for Music-Dance Co-Generation

cs.SD · 2026-05-03 · unverdicted · novelty 7.0

TMD-Bench is a multi-level benchmark that measures music-dance co-generation quality including beat-level rhythmic synchronization, supported by a new dataset and Music Captioner, and shows commercial models lag in rhythm while a new baseline performs competitively.

MIDI-Informed Singing Accompaniment Generation in a Compositional Song Pipeline

cs.SD · 2026-02-24 · unverdicted · novelty 7.0

MIDI-SAG generates consistent long-form singing accompaniments by feeding symbolic MIDI timing, chords, and structure labels into a compositional pipeline built from pre-trained modules.

Executable Boundary Contracts for Sound Event Traces

cs.LO · 2026-05-19 · unverdicted · novelty 6.0

Defines executable boundary contracts for sound event traces using an STL-embeddable Boolean fragment plus interval and duration clauses, then evaluates them on speech and soundscape data where they disagree with standard scores.

MALEFA: Multi-grAnularity Learning and Effective False Alarm Suppression for Zero-shot Keyword Spotting

eess.AS · 2026-04-04 · unverdicted · novelty 6.0

MALEFA reaches 90% accuracy and 0.007% false alarm rate on AMI for zero-shot KWS via cross-attention and multi-granularity contrastive learning while running efficiently on constrained hardware.

Academic Text-to-Music Grand Challenge: Datasets, Baselines, and Evaluation Methods

cs.SD · 2026-05-20 · accept · novelty 5.0

The paper introduces the ATTM Grand Challenge with a CC-licensed instrumental subset of MTG-Jamendo, two tracks, and evaluation via FAD, CLAP, and a new Concept Coverage Score to support academic text-to-music research.

Woosh: A Sound Effects Foundation Model

cs.SD · 2026-04-02 · accept · novelty 5.0

Woosh is a new publicly released foundation model optimized for high-quality sound effect generation from text or video, showing competitive or better results than open alternatives like Stable Audio Open.

citing papers explorer

Showing 6 of 6 citing papers.

TMD-Bench: A Multi-Level Evaluation Paradigm for Music-Dance Co-Generation · cs.SD · 2026-05-03 · unverdicted · none · ref 18
MIDI-Informed Singing Accompaniment Generation in a Compositional Song Pipeline · cs.SD · 2026-02-24 · unverdicted · none · ref 51
Executable Boundary Contracts for Sound Event Traces · cs.LO · 2026-05-19 · unverdicted · partial · ref 15
MALEFA: Multi-grAnularity Learning and Effective False Alarm Suppression for Zero-shot Keyword Spotting · eess.AS · 2026-04-04 · unverdicted · none · ref 20
Academic Text-to-Music Grand Challenge: Datasets, Baselines, and Evaluation Methods · cs.SD · 2026-05-20 · accept · none · ref 5
Woosh: A Sound Effects Foundation Model · cs.SD · 2026-04-02 · accept · none · ref 29
