Large-scale contrastive language-audio pretraining (CLAP)
6 Pith papers cite this work.
Years: 2026 (6 papers)
citing papers explorer
- TMD-Bench: A Multi-Level Evaluation Paradigm for Music-Dance Co-Generation
  TMD-Bench is a multi-level benchmark that measures music-dance co-generation quality including beat-level rhythmic synchronization, supported by a new dataset and Music Captioner, and shows commercial models lag in rhythm while a new baseline performs competitively.
- MIDI-Informed Singing Accompaniment Generation in a Compositional Song Pipeline
  MIDI-SAG generates consistent long-form singing accompaniments by feeding symbolic MIDI timing, chords, and structure labels into a compositional pipeline built from pre-trained modules.
- Executable Boundary Contracts for Sound Event Traces
  Defines executable boundary contracts for sound event traces using an STL-embeddable Boolean fragment plus interval and duration clauses, then evaluates them on speech and soundscape data where they disagree with standard scores.
- MALEFA: Multi-grAnularity Learning and Effective False Alarm Suppression for Zero-shot Keyword Spotting
  MALEFA reaches 90% accuracy and 0.007% false alarm rate on AMI for zero-shot KWS via cross-attention and multi-granularity contrastive learning while running efficiently on constrained hardware.
- Academic Text-to-Music Grand Challenge: Datasets, Baselines, and Evaluation Methods
  The paper introduces the ATTM Grand Challenge with a CC-licensed instrumental subset of MTG-Jamendo, two tracks, and evaluation via FAD, CLAP, and a new Concept Coverage Score to support academic text-to-music research.
- Woosh: A Sound Effects Foundation Model
  Woosh is a new publicly released foundation model optimized for high-quality sound effect generation from text or video, showing competitive or better results than open alternatives like Stable Audio Open.