MusTBENCH evaluates temporal grounding in large audio-language models via five expert-validated tasks, and MusT improves performance through encoder adaptation, LLM adaptation, supervised fine-tuning, and RL optimization.
Bryan, Zeyu Jin, and Justin Salamon
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
MOSS-Audio is an audio-language model using a 12.5 Hz encoder, DeepStack cross-layer injection, time markers, and an event-preserving annotation pipeline for unified audio understanding.
citing papers explorer
- MusTBENCH: Benchmarking and Advancing Temporal Grounding in Music LLMs
  MusTBENCH evaluates temporal grounding in large audio-language models via five expert-validated tasks, and MusT improves performance through encoder adaptation, LLM adaptation, supervised fine-tuning, and RL optimization.
- MOSS-Audio Technical Report
  MOSS-Audio is an audio-language model using a 12.5 Hz encoder, DeepStack cross-layer injection, time markers, and an event-preserving annotation pipeline for unified audio understanding.