Music Flamingo: Scaling Music Understanding in Audio Language Models

Sreyan Ghosh et al. · 2025 · arXiv 2511.10289

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models

cs.SD · 2026-04-21 · unverdicted · novelty 8.0

HalluAudio is the first large-scale benchmark spanning speech, environmental sound, and music that uses human-verified QA pairs, adversarial prompts, and mixed-audio tests to measure hallucinations in large audio-language models.

DialBGM: A Benchmark for Background Music Recommendation from Everyday Multi-Turn Dialogues

cs.AI · 2026-04-09 · unverdicted · novelty 8.0

DialBGM is a new benchmark dataset revealing that existing AI models fall far short of human performance when recommending fitting background music for open-domain conversations.

VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models

cs.SD · 2026-05-06 · unverdicted · novelty 6.0

VocalParse applies interleaved and Chain-of-Thought prompting to a Large Audio Language Model to jointly transcribe lyrics, melody and word-note alignments, achieving state-of-the-art results on multiple singing datasets.

Academic Text-to-Music Grand Challenge: Datasets, Baselines, and Evaluation Methods

cs.SD · 2026-05-20 · accept · novelty 5.0

The paper introduces the ATTM Grand Challenge with a CC-licensed instrumental subset of MTG-Jamendo, two tracks, and evaluation via FAD, CLAP, and a new Concept Coverage Score to support academic text-to-music research.

citing papers explorer

Showing 4 of 4 citing papers.

HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models · cs.SD · 2026-04-21 · unverdicted · none · ref 10
DialBGM: A Benchmark for Background Music Recommendation from Everyday Multi-Turn Dialogues · cs.AI · 2026-04-09 · unverdicted · none · ref 10
VocalParse: Towards Unified and Scalable Singing Voice Transcription with Large Audio Language Models · cs.SD · 2026-05-06 · unverdicted · none · ref 9
Academic Text-to-Music Grand Challenge: Datasets, Baselines, and Evaluation Methods · cs.SD · 2026-05-20 · accept · none · ref 10
