
MERIT: Learning Disentangled Music Representations for Audio Similarity

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Current music similarity models typically compute a single, monolithic score that entangles distinct musical dimensions such as melody, rhythm, and timbre. This limits user control and interpretability, making it impossible to execute nuanced queries. We introduce MERIT, a framework for learning disentangled, factor-specific music representations tailored to these three core dimensions. To overcome the lack of isolated musical variations in real-world audio, we adopt a novel training strategy that combines conditional audio generation with source-separated stems to strongly encourage single-factor variation in the training data. Our evaluations demonstrate strong factor-wise disentanglement: each head responds strongly to its intended perceptual dimension while remaining near chance on the others, a representational property that holds across both the synthetic training domain and independent real-world audio.
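
To make the idea of a factor-specific query concrete, the following is a minimal sketch, not the authors' implementation. It assumes each track has already been mapped to three separate embeddings (one per head) and that similarity is measured with cosine similarity; the abstract does not state which metric MERIT uses. The names `factor_similarity` and `weighted_similarity` and the toy vectors are hypothetical.

```python
import math

# The three dimensions named in the abstract.
FACTORS = ("melody", "rhythm", "timbre")


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def factor_similarity(query, candidate):
    """Return one similarity score per factor instead of a single monolithic score."""
    return {f: cosine(query[f], candidate[f]) for f in FACTORS}


def weighted_similarity(query, candidate, weights):
    """Blend per-factor scores with user-chosen weights (hypothetical query interface)."""
    total = sum(weights.get(f, 0.0) for f in FACTORS)
    if total == 0.0:
        raise ValueError("at least one factor weight must be non-zero")
    scores = factor_similarity(query, candidate)
    return sum(weights.get(f, 0.0) * scores[f] for f in FACTORS) / total


# Toy embeddings (assumed, for illustration only): same melody and timbre,
# but an orthogonal rhythm embedding.
query = {"melody": [1.0, 0.0], "rhythm": [0.0, 1.0], "timbre": [1.0, 1.0]}
track_a = {"melody": [1.0, 0.0], "rhythm": [1.0, 0.0], "timbre": [1.0, 1.0]}

per_factor = factor_similarity(query, track_a)
melody_only = weighted_similarity(query, track_a, {"melody": 1.0})
```

With separate scores, a query such as "same melody, any rhythm" reduces to setting the rhythm weight to zero, which a single entangled score cannot express.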

fields

cs.SD 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

MERIT: Learning Disentangled Music Representations for Audio Similarity

cs.SD · 2026-05-26 · unverdicted · novelty 6.0

MERIT trains disentangled heads for melody, rhythm, and timbre via conditional audio generation and stem separation, with evaluations showing each head responds strongly to its target dimension and near chance on others across synthetic and real audio.

citing papers explorer

Showing 1 of 1 citing paper.

  • MERIT: Learning Disentangled Music Representations for Audio Similarity cs.SD · 2026-05-26 · unverdicted · none · ref 1 · internal anchor
