The Spheres Dataset: Multitrack Orchestral Recordings for Music Source Separation and Information Retrieval

David Diaz-Guerra; Jaime Garcia-Martinez; John Anderson; Julio J. Carabias-Orti; Pablo Cabañas-Molero; Pedro Vera-Candeas; Ricardo Falcon-Perez; Tuomas Virtanen

arXiv: 2511.21247 · v2 · pith:JVXDPZVVnew · submitted 2025-11-26 · eess.AS · cs.LG · cs.SD


Jaime Garcia-Martinez, David Diaz-Guerra, John Anderson, Ricardo Falcon-Perez, Pablo Cabañas-Molero, Tuomas Virtanen, Julio J. Carabias-Orti, Pedro Vera-Candeas

Pith reviewed 2026-05-17 05:00 UTC · model grok-4.3

classification: eess.AS · cs.LG · cs.SD

keywords: orchestral music · source separation · multitrack dataset · classical music · room impulse responses · music information retrieval · audio processing


The pith

The Spheres dataset supplies multitrack orchestral recordings with isolated stems and room impulse responses for classical music source separation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a collection of orchestral performances recorded with 23 microphones to produce both full mixes and separate instrument tracks. It covers two standard works by Tchaikovsky and Mozart played by the Colibrì Ensemble, plus scales and solos for each instrument. This structure supports training machine learning models to pull individual sounds out of complex orchestral audio, where instruments overlap heavily. Room impulse responses for each position are also included to describe the acoustic space. The goal is to give researchers concrete data for improving separation, dereverberation, and related tasks in the classical domain.

Core claim

The Spheres dataset consists of over one hour of multitrack recordings of Tchaikovsky's Romeo and Juliet and Mozart's Symphony No. 40 performed by the Colibrì Ensemble, together with chromatic scales and solo excerpts for each instrument. The material was captured with a 23-microphone array, which yields isolated stems for supervised training of source separation models as well as realistic stereo mixes with controlled bleeding; estimated room impulse responses additionally characterize the acoustics of the space.

What carries the argument

The 23-microphone array of close spot, main, and ambient microphones that produces both isolated instrument stems and realistic mixes with controlled bleeding.
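The exact bleeding pipeline is not spelled out in this summary (the referee report below asks for it), so the following is only a hypothetical sketch of how a microphone signal with controlled bleeding could be simulated from isolated stems: each stem is convolved with an assumed RIR from its position to the microphone and scaled by a chosen leakage gain. The function `render_mic_signal` and its arguments are illustrative names, not part of the dataset's tooling.

```python
import numpy as np

def render_mic_signal(stems, rirs, gains):
    """Hypothetical helper: simulate one microphone signal with controlled bleeding.

    stems: list of 1-D arrays, one isolated recording per source.
    rirs:  list of 1-D arrays, an (assumed) RIR from each source position to this mic.
    gains: list of linear gains; 1.0 for the target source, smaller values for leakage.
    """
    length = max(len(s) + len(h) - 1 for s, h in zip(stems, rirs))
    mic = np.zeros(length)
    for stem, rir, gain in zip(stems, rirs, gains):
        contribution = gain * np.convolve(stem, rir)  # full linear convolution
        mic[: len(contribution)] += contribution      # sum all sources at the mic
    return mic
```

For example, a gain of 1.0 for the instrument the spot microphone is aimed at and `10 ** (-20 / 20)` for every other stem simulates leakage 20 dB below the target, before any level differences already contained in the RIRs themselves.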

If this is right

Isolated stems enable supervised training of models that separate orchestral instrument families.
Controlled bleeding in the mixes allows direct evaluation of microphone debleeding methods.
Room impulse responses support studies of dereverberation and immersive audio rendering.
Baseline results with X-UMX models establish initial benchmarks while exposing challenges in dense orchestral textures.
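Benchmarks like these are reported as separation metrics in dB. As a minimal sketch of one common choice, here is scale-invariant SDR (SI-SDR) as defined by Le Roux et al.; it is not identical to the BSS Eval SDR computed by toolkits such as museval, and which metrics the paper's tables use beyond the signal-to-interference ratio shown in Figure 7 is not stated in this summary.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR (dB) between a single-channel estimate and its reference."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Remove DC offsets before projecting, as in the SI-SDR definition.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling of the reference so that only the "shape" error is penalized.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(residual, residual) + eps))
```

Because of the projection step, multiplying the estimate by any nonzero constant leaves the score unchanged, which is the property that distinguishes SI-SDR from a plain signal-to-distortion ratio.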

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Models trained here could serve as a starting point for separation in live concert settings if studio and hall acoustics prove similar.
Adding recordings from additional ensembles and venues would be a direct way to test and improve generalization.
The multi-microphone layout may also aid research on instrument localization within the same recordings.

Load-bearing premise

Recordings from one ensemble in one studio with this fixed microphone setup produce data that generalizes to other orchestras, halls, and recording conditions.

What would settle it

Source separation models trained solely on this dataset would show markedly lower performance when tested on orchestral recordings made in different halls or with different ensembles.

Figures

Figures reproduced from arXiv: 2511.21247 by David Diaz-Guerra, Jaime Garcia-Martinez, John Anderson, Julio J. Carabias-Orti, Pablo Cabañas-Molero, Pedro Vera-Candeas, Ricardo Falcon-Perez, Tuomas Virtanen.

**Figure 2.** The approximate placement of instruments and microphones (indicated by M#) in the recording room. Each rounded square indicates the seat of a …

**Figure 3.** A photograph illustrating a session of recording one bassoon line.

**Figure 4.** Estimated RIR for the Violin 2 microphone with the source located at …

**Figure 5.** Time played by every instrument in The Spheres dataset.

**Figure 7.** Signal-to-interference ratio [dB] for the main instrument/section of …

**Figure 8.** Distribution of C50 …

… layers, and a decoder (composed of two linear layers with batch normalization and ReLU activation in the first one). The X-UMX architecture adds a bridging operation between the encoder and the recurrent layers and between the recurrent layers and the decoder, where the latent representations of every branch are averaged so that the branches can share information. We trained the models …

**Figure 9.** Swarmplot of T30 values for each receiver (microphone position). Solid and dashed lines indicate the median and the 25th and 75th percentiles, …

**Figure 10.** Comparison between the theoretical and proposed inverse filters. Top …

**Figure 11.** Frequency-domain validation of the proposed inverse filter. The …

**Figure 12.** Inverse filter computation and validation for the corrupted ESS used …
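Figures 8 and 9 report C50 and T30, room-acoustic parameters defined in ISO 3382-1. As a rough sketch of how such values are commonly derived from an RIR, assuming a broadband response already trimmed to start at the direct sound (real analyses also apply octave-band filtering and noise compensation, both omitted here), T30 can be taken from a line fit to the Schroeder backward-integrated decay between -5 and -35 dB, and C50 from the ratio of energy before and after 50 ms. This is an illustration, not the paper's analysis code.

```python
import numpy as np

def schroeder_decay_db(rir):
    """Normalized energy decay curve (dB) via Schroeder backward integration."""
    energy = np.asarray(rir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]          # energy remaining after each sample
    return 10 * np.log10(edc / edc[0] + 1e-300)

def t30(rir, fs):
    """Reverberation time from a linear fit of the decay between -5 and -35 dB."""
    decay = schroeder_decay_db(rir)
    t = np.arange(len(decay)) / fs
    mask = (decay <= -5.0) & (decay >= -35.0)
    slope, _ = np.polyfit(t[mask], decay[mask], 1)  # slope in dB per second
    return -60.0 / slope                            # extrapolate to a 60 dB decay

def c50(rir, fs):
    """Clarity: energy in the first 50 ms relative to the remaining energy, in dB."""
    energy = np.asarray(rir, dtype=float) ** 2
    n50 = int(round(0.05 * fs))
    return 10 * np.log10(energy[:n50].sum() / energy[n50:].sum())
```

For a synthetic exponentially decaying response with a 0.5 s reverberation time, `t30` returns approximately 0.5 and `c50` approximately 4.74 dB.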

Original abstract

This paper introduces The Spheres dataset, multitrack orchestral recordings designed to advance machine learning research in music source separation and related MIR tasks within the classical music domain. The dataset is composed of over one hour of recordings of musical pieces performed by the Colibrì Ensemble at The Spheres recording studio, capturing two canonical works (Tchaikovsky's Romeo and Juliet and Mozart's Symphony No. 40) along with chromatic scales and solo excerpts for each instrument. The recording setup employed 23 microphones, including close spot, main, and ambient microphones, enabling the creation of realistic stereo mixes with controlled bleeding and providing isolated stems for supervised training of source separation models. In addition, room impulse responses were estimated for each instrument position, offering valuable acoustic characterization of the recording space. We present the dataset structure, acoustic analysis, and baseline evaluations using X-UMX-based models for orchestral family separation and microphone debleeding. Results highlight both the potential and the challenges of source separation in complex orchestral scenarios, underscoring the dataset's value for benchmarking and for exploring new approaches to separation, localization, dereverberation, and immersive rendering of classical music.
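The text accompanying Figure 8 says that X-UMX averages the latent representations of all branches (one branch per target) between the encoder and the recurrent layers, and again between the recurrent layers and the decoder. The toy NumPy sketch below illustrates only that bridging idea: random linear maps stand in for the real encoders, LSTMs, and decoders, and batch normalization and skip connections are omitted, so it is neither a working separator nor the authors' model. All function names and shapes are hypothetical.

```python
import numpy as np

def bridge(latents):
    """Average latents over the branch axis and hand every branch the same mean.

    latents: array of shape (branches, frames, features).
    """
    mean = latents.mean(axis=0, keepdims=True)
    return np.repeat(mean, latents.shape[0], axis=0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def toy_xumx_forward(mix_mag, enc_w, mid_w, dec_w):
    """Hypothetical toy forward pass with two bridging points.

    mix_mag: (frames, bins) mixture magnitude spectrogram.
    enc_w: (branches, bins, hidden); mid_w: (branches, hidden, hidden);
    dec_w: (branches, hidden, bins). Returns (branches, frames, bins) estimates.
    """
    h = np.tanh(np.einsum("tf,bfh->bth", mix_mag, enc_w))  # per-branch encoders
    h = bridge(h)                                            # bridge before the recurrent part
    h = np.tanh(np.einsum("bth,bhk->btk", h, mid_w))         # stand-in for per-branch LSTMs
    h = bridge(h)                                            # bridge before the decoders
    masks = sigmoid(np.einsum("bth,bhf->btf", h, dec_w))     # per-branch decoders -> masks
    return masks * mix_mag[None]                             # masked mixture per branch
```

With four branches (for example, one per orchestral family), a mixture of shape (frames, bins) yields an output of shape (4, frames, bins). Each branch keeps its own weights, but after each bridge all branches start from the same averaged representation.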

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper mainly releases a new multitrack orchestral dataset, with 23 microphones, isolated stems, and room impulse responses, that fills a gap in classical-music source separation.


The main thing to know is that this paper releases a new dataset of multitrack orchestral recordings aimed at source separation and MIR in classical music. It includes over an hour of performances by the Colibrì Ensemble of pieces by Tchaikovsky and Mozart, plus scales and solos, all captured with 23 microphones, with room impulse responses estimated for each instrument position. The recording approach stands out because it combines close spot, main, and ambient microphones to create realistic mixes with controlled bleeding while keeping the stems isolated. The RIRs add further value for acoustic work.

The X-UMX baselines for family separation and debleeding are straightforward and show the real difficulties without hype. The documentation of the setup and dataset structure is clear and practical. A clear limitation is that everything was recorded in one studio with one ensemble, so generalization to other groups or spaces is an open question. The baselines could also be expanded with more recent methods for stronger comparisons.

This is worth attention from anyone building separation models for orchestral music or working on classical MIR tasks. The isolated tracks and acoustic data should let them train and test in a more realistic setting than synthetic mixes allow. I think it deserves peer review as a useful data contribution that documents a concrete resource for the community.

Referee Report

1 major / 2 minor

Summary. The paper introduces The Spheres dataset, consisting of over one hour of multitrack orchestral recordings of Tchaikovsky's Romeo and Juliet and Mozart's Symphony No. 40 performed by the Colibrì Ensemble at The Spheres studio, along with chromatic scales, solo excerpts, and estimated room impulse responses for each instrument position. A 23-microphone setup (close spot, main, and ambient) provides isolated stems as well as realistic mixes with controlled bleeding. The manuscript describes the dataset structure, provides acoustic analysis, and reports baseline evaluations using X-UMX models for orchestral family separation and microphone debleeding.

Significance. If released with complete documentation and access, this dataset would address an important gap in publicly available multitrack resources for classical orchestral music, enabling supervised training of source separation models in a domain where such data is scarce. The inclusion of RIRs and controlled bleeding setups supports additional research on acoustic characterization, dereverberation, localization, and immersive rendering. The baseline results usefully illustrate both the applicability and the remaining challenges of separation in dense orchestral textures.

major comments (1)

[Recording setup] Recording setup section: the description of how isolated stems are derived from the 23-microphone multitrack and the precise post-processing steps used to create controlled bleeding should be expanded with quantitative details (e.g., bleed levels, microphone distances) so that users can replicate the acoustic conditions for new experiments.
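One simple way to express a bleed level in dB, assuming the contribution of each source to a given microphone can be isolated (for example when the mix is simulated from isolated stems), is the energy of all leaked contributions relative to the target contribution. The helper below is hypothetical; it is not confirmed to be the quantity the authors would report.

```python
import numpy as np

def bleed_level_db(target_component, leak_components):
    """Hypothetical helper: total leaked energy relative to target energy in one mic (dB)."""
    target_energy = np.sum(np.square(target_component))
    leak_energy = sum(np.sum(np.square(c)) for c in leak_components)
    return 10.0 * np.log10(leak_energy / target_energy)
```

A value of -20 dB means that the other sources together reach the microphone with one hundredth of the target's energy.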

minor comments (2)

[Abstract] Abstract: adding one or two concrete quantitative results (e.g., SDR or SI-SDR values from the X-UMX baselines) would better substantiate the stated 'potential and challenges'.
[Dataset structure] Dataset structure: explicitly state total duration per work, sampling rate, bit depth, and file formats for all stems and RIRs to improve reproducibility.
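Most of these quantities can be read directly from the audio files. The sketch below uses Python's standard `wave` module, which only reads uncompressed integer PCM WAV (not floating-point WAV), so it assumes the stems and RIRs are distributed in that format; the dataset's actual file formats are not stated here. The helper names and the flat folder layout are hypothetical.

```python
import wave
from pathlib import Path

def describe_wav(path):
    """Return sampling rate, bit depth, channel count and duration of one PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        sample_rate = wf.getframerate()
        info = {
            "sample_rate": sample_rate,
            "bit_depth": 8 * wf.getsampwidth(),
            "channels": wf.getnchannels(),
            "duration_s": wf.getnframes() / sample_rate,
        }
    return info

def summarize_folder(folder):
    """Count WAV files in a folder, sum their durations, and list the distinct formats."""
    infos = [describe_wav(p) for p in sorted(Path(folder).glob("*.wav"))]
    return {
        "n_files": len(infos),
        "total_duration_s": sum(i["duration_s"] for i in infos),
        "formats": sorted({(i["sample_rate"], i["bit_depth"], i["channels"]) for i in infos}),
    }
```

Running `summarize_folder` once per work would give the per-work total duration and confirm whether all stems share a single sampling rate and bit depth.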

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and positive recommendation. We address the major comment point by point below.


Referee: [Recording setup] Recording setup section: the description of how isolated stems are derived from the 23-microphone multitrack and the precise post-processing steps used to create controlled bleeding should be expanded with quantitative details (e.g., bleed levels, microphone distances) so that users can replicate the acoustic conditions for new experiments.

Authors: We agree with the referee that expanding the Recording Setup section with quantitative details will improve the utility and reproducibility of the dataset. In the revised version of the manuscript, we will include specific microphone distances (e.g., spot mics at 0.5 to 1 m from instruments), measured bleed levels in dB between adjacent stems, and a step-by-step description of the post-processing pipeline used to create the controlled bleeding mixes. These details are available from our recording session logs and will be presented in a new table or subsection for clarity.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity in dataset release paper


The paper is a data collection and release contribution describing recordings of two orchestral works with a 23-microphone setup, isolated stems, and room impulse responses, plus baseline evaluations on existing X-UMX models. No equations, derivations, or fitted parameters are presented that reduce any reported result to quantities defined or fitted from the same data by construction. The central claims rest on the concrete recording protocol and external model baselines rather than self-referential steps, self-citations as load-bearing premises, or ansatzes smuggled through prior work. This matches the expected honest non-finding for a resource paper whose value is the new data itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is an empirical dataset contribution; the central claim rests on the existence and utility of the collected recordings rather than on mathematical axioms or fitted parameters.

axioms (1)

Domain assumption: standard acoustic measurement techniques can produce usable room impulse responses from the chosen microphone positions.
Invoked when the authors state that room impulse responses were estimated for each instrument position.
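For context on this assumption: a standard way to estimate an RIR is Farina's exponential sine sweep (ESS) method, in which the recorded sweep is convolved with an inverse filter. Figures 10 to 12 compare a theoretical inverse filter with the authors' proposed one; the sketch below implements only the textbook theoretical inverse (time-reversed sweep with an exponentially decaying envelope), not the proposed filter. It assumes a noise-free, linear measurement and normalizes by the peak of the sweep convolved with its own inverse. Function names and parameters are illustrative.

```python
import numpy as np

def exp_sine_sweep(f1, f2, duration, fs):
    """Exponential sine sweep from f1 to f2 Hz and its theoretical inverse filter."""
    t = np.arange(int(round(duration * fs))) / fs
    rate = duration / np.log(f2 / f1)          # time constant of the exponential sweep
    sweep = np.sin(2 * np.pi * f1 * rate * (np.exp(t / rate) - 1.0))
    # Time-reverse the sweep and apply a decaying envelope so that the sweep's
    # pink (1/f) energy spectrum becomes approximately flat after convolution.
    inverse = sweep[::-1] * np.exp(-t / rate)
    return sweep, inverse

def estimate_rir(recorded, sweep, inverse, rir_len):
    """Deconvolve a recorded sweep response into an RIR estimate (linear part only)."""
    response = np.convolve(recorded, inverse)
    scale = np.max(np.abs(np.convolve(sweep, inverse)))  # peak of sweep * inverse
    start = len(sweep) - 1                               # zero-lag position of the linear IR
    return response[start:start + rir_len] / scale
```

In a real measurement, harmonic distortion products land before `start` in the deconvolved response, which is why only the samples from the zero-lag position onward are kept.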



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear


unclear
Relation between the paper passage and the cited Recognition theorem.

The dataset is composed of over one hour recordings... 23 microphones... room impulse responses were estimated... baseline evaluations using X-UMX based models for orchestral family separation and microphone debleeding.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B

Z. Rafii, A. Liutkus, F.-R. St ¨oter, S. I. Mimilakis, and R. Bittner, “MUSDB18-HQ - an uncompressed version of musdb18,” Dec. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.3338373

work page doi:10.5281/zenodo.3338373 2019

[2] [2]

Music demixing challenge 2021,

Y . Mitsufuji, G. Fabbro, S. Uhlich, F.-R. St ¨oter, A. D ´efossez, M. Kim, W. Choi, C.-Y . Yu, and K.-W. Cheuk, “Music demixing challenge 2021,” Frontiers in Signal Processing, vol. 1, p. 808395, 2022

work page 2021

[3] [3]

The sound demixing challenge 2023 – music demixing track,

G. Fabbro, S. Uhlich, C.-H. Lai, W. Choi, M. Mart ´ınez-Ram´ırez, W. Liao, I. Gadelha, G. Ramos, E. Hsu, H. Rodrigues, F.-R. St ¨oter, A. D ´efossez, Y . Luo, J. Yu, D. Chakraborty, S. Mohanty, R. Solovyev, A. Stempkovskiy, T. Habruseva, N. Goswami, T. Harada, M. Kim, J. Hyung Lee, Y . Dong, X. Zhang, J. Liu, and Y . Mitsufuji, “The sound demixing challen...

work page 2023

[4] [4]

Musical source separation: An introduction,

E. Cano, D. FitzGerald, A. Liutkus, M. D. Plumbley, and F.-R. St ¨oter, “Musical source separation: An introduction,”IEEE Signal Processing Magazine, vol. 36, no. 1, pp. 31–40, 2019

work page 2019

[5] [5]

MTG MASS database. [Online]

M. Vinyes, “MTG MASS database. [Online].” http://www.mtg.upf.edu/static/mass/resources, 2008

work page 2008

[6] [6]

Medleydb 2.0: New data and a system for sustainable data collection,

R. M. Bittner, J. Wilkins, H. Yip, and J. P. Bello, “Medleydb 2.0: New data and a system for sustainable data collection,”ISMIR Late Breaking and Demo Papers, vol. 36, 2016

work page 2016

[7] [7]

Cutting music source separation some Slakh: A dataset to study the impact of training data quality and quantity,

E. Manilow, G. Wichern, P. Seetharaman, and J. Le Roux, “Cutting music source separation some Slakh: A dataset to study the impact of training data quality and quantity,” inProc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE, 2019

work page 2019

[8] [8]

Piano concerto dataset (pcd): A multitrack dataset of piano concertos,

Y . ¨Ozer, S. J. Schw ¨ar, V . Arifi-M ¨uller, J. Lawrence, E. Sen, and M. M ¨uller, “Piano concerto dataset (pcd): A multitrack dataset of piano concertos,”Transactions of the International Society for Music Information Retrieval, vol. 6, no. 1, pp. 75–88, 2023

work page 2023

[9] [9]

Music source separation in the waveform domain

A. D ´efossez, N. Usunier, L. Bottou, and F. Bach, “Music source separation in the waveform domain,”arXiv preprint arXiv:1911.13254, 2019

work page arXiv 1911

[10] [10]

Spleeter: a fast and efficient music source separation tool with pre-trained models,

R. Hennequin, A. Khlif, F. V oituret, and M. Moussallam, “Spleeter: a fast and efficient music source separation tool with pre-trained models,” Journal of Open Source Software, vol. 5, no. 50, p. 2154, 2020

work page 2020

[11] [11]

Hybrid spectrogram and waveform source separation,

A. D ´efossez, “Hybrid spectrogram and waveform source separation,” in Proceedings of the ISMIR 2021 Workshop on Music Source Separation, 2021

work page 2021

[12] [12]

Hybrid transformers for music source separation,

S. Rouard, F. Massa, and A. D ´efossez, “Hybrid transformers for music source separation,” inICASSP 2023 - 2023 IEEE International Confer- ence on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1–5

work page 2023

[13] [13]

Music source separation with band-split rnn,

Y . Luo and J. Yu, “Music source separation with band-split rnn,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 1893–1901, 2023

[14] W. Tong, J. Zhu, J. Chen, S. Kang, T. Jiang, Y. Li, Z. Wu, and H. Meng, “SCNet: Sparse compression network for music source separation,” in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 1276–1280.

[15] W.-T. Lu, J.-C. Wang, Q. Kong, and Y.-N. Hung, “Music source separation with band-split RoPE transformer,” in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 481–485.

[16] M. Miron, J. Janer, and E. Gómez, “Monaural score-informed source separation for classical music using convolutional neural networks,” in Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR), 2017, pp. 55–62.

[17] O. Slizovskaia, G. Haro, and E. Gómez, “Conditioned source separation for musical instrument performances,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 2083–2095, 2021.

[18] Y. Özer and M. Müller, “Source separation of piano concertos using musically motivated augmentation techniques,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1214–1225, 2024.

[19] J. Garcia-Martinez, D. Diaz-Guerra, A. Politis, T. Virtanen, J. J. Carabias-Orti, and P. Vera-Candeas, “SynthSOD: Developing an heterogeneous dataset for orchestra music source separation,” IEEE Open Journal of Signal Processing, vol. 6, pp. 129–137, 2025.

[20] J. Fritsch, “The TRIOS dataset,” Jul. 2022. [Online]. Available: https://doi.org/10.5281/zenodo.6797837

[21] Z. Duan and B. Pardo, “Soundprism: an online system for score-informed source separation of music audio,” IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 6, pp. 1205–1215, 2011.

[22] B. Li, X. Liu, K. Dinesh, Z. Duan, and G. Sharma, “Creating a multitrack classical music performance dataset for multimodal music analysis: Challenges, insights, and applications,” IEEE Transactions on Multimedia, vol. 21, no. 2, pp. 522–535, 2019.

[23] J. Pätynen, V. Pulkki, and T. Lokki, “Anechoic recording system for symphony orchestra,” Acta Acustica united with Acustica, vol. 94, pp. 856–865, Nov. 2008.

[24] U. Kaiser, I. Mestemacher, and M. Vieregg, “Kollektion: Operation Beethoven. Beethovens 4. Sinfonie in Einzelstimmen!” [Collection: Operation Beethoven. Beethoven's 4th Symphony in individual parts!], 2023. [Online]. Available: https://openmusic.academy/docs/4HAB9wcKyiXNGNsmkEFRXD/operation-beethoven-kooperation-der-open-music-academy-mit-der-hofkapelle-muenchen

[25] C.-Y. Chiu, W.-Y. Hsiao, Y.-C. Yeh, Y.-H. Yang, and A. W.-Y. Su, “Mixing-specific data augmentation techniques for improved blind violin/piano source separation,” in 2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP). IEEE, 2020, pp. 1–6.

[26] H. Kim, J. Park, T. Kwon, D. Jeong, and J. Nam, “A study of audio mixing methods for piano transcription in violin-piano ensembles,” in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1–5.

[27] M. Miron, J. Janer Mestres, and E. Gómez Gutiérrez, “Generating data to train convolutional neural networks for classical music source separation,” in Proceedings of the 14th Sound and Music Computing Conference, 2017, pp. 227–233.

[28] E. Tunturi, D. Diaz-Guerra, A. Politis, and T. Virtanen, “Score-informed music source separation: Improving synthetic-to-real generalization in classical music,” 2025. [Online]. Available: https://arxiv.org/abs/2503.07352

[29] E. Manilow, G. Wichern, and J. Le Roux, “Hierarchical musical instrument separation,” in Proceedings of the 21st International Conference on Music Information Retrieval, ISMIR 2020, 2020, pp. 376–383.

[30] S. Sarkar, L. Thorpe, E. Benetos, and M. Sandler, “Leveraging synthetic data for improving chamber ensemble separation,” in 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023, pp. 1–5.

[31] R. Sawata, S. Uhlich, S. Takahashi, and Y. Mitsufuji, “All for one and one for all: Improving music separation by bridging networks,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 51–55.

[32] G. Roa Dabike, T. J. Cox, A. J. Miller, B. M. Fazenda, S. Graetzer, R. R. Vos, M. A. Akeroyd, J. Firth, W. M. Whitmer, S. Bannister, A. Greasley, and J. P. Barker, “The Cadenza woodwind dataset: Synthesised quartets for music information retrieval and machine learning,” Data in Brief, vol. 57, p. 111199, 2024. [Online]. Available: https://www.sciencedirec...

[33] Lennyforever. Tchaikovsky, Romeo and Juliet, Leonard Bernstein, New York Philharmonic. [Online]. Available: https://www.youtube.com/watch?v=BSbzyTNVB1Q

[34] R. Bridgland. Mozart Symphony No 40 in G minor KV550 Leonard Bernstein. [Online]. Available: https://www.youtube.com/watch?v=p8bZ7vm4 6M

[35] A. Farina, “Advancements in impulse response measurements by sine sweeps,” Journal of the Audio Engineering Society, no. 7121, May 2007.

[36] ISO 3382-1:2009, “Acoustics – Measurement of room acoustic parameters – Part 1: Performance spaces,” International Organization for Standardization, Geneva, Switzerland, Standard ISO 3382-1:2009, 2009.

[37] C.-B. Jeon, G. Wichern, F. G. Germain, and J. Le Roux, “Why does music source separation benefit from cacophony?” in 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW). IEEE, 2024, pp. 873–877.

[38] F.-R. Stöter, S. Uhlich, A. Liutkus, and Y. Mitsufuji, “Open-Unmix - a reference implementation for music source separation,” Journal of Open Source Software, vol. 4, no. 41, p. 1667, 2019.

[39] M. Pariente, S. Cornell, J. Cosentino, S. Sivasankaran, E. Tzinis, J. Heitkaemper, M. Olvera, F.-R. Stöter, M. Hu, J. M. Martín-Doñas, D. Ditter, A. Frank, A. Deleforge, and E. Vincent, “Asteroid: the PyTorch-based audio source separation toolkit for researchers,” in Interspeech 2020, Shanghai, China, 2020.

[40] F.-R. Stöter and A. Liutkus, “museval 0.3.0,” Aug. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.3376621

[41] J. Le Roux, S. Wisdom, H. Erdogan, and J. R. Hershey, “SDR – half-baked or well done?” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019, pp. 626–630.

[42] E. Vincent, R. Gribonval, and C. Févotte, “Performance measurement in blind audio source separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 4, pp. 1462–1469, 2006.

[43] B. Delvaux and D. M. Howard, “Using an exponential sine sweep to measure 3D printed vocal tract resonances,” in The 21st International Congress on Sound and Vibration, 2014. [Online]. Available: https://api.semanticscholar.org/CorpusID:55646202
