The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use

Bob L. Sturm

arxiv: 1306.1461 · v2 · pith:EWGSUEHA · submitted 2013-06-06 · 💻 cs.SD

keywords: faults, gtzan, contents, dataset, systems, been, effects, evaluation

The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN, but to use it with consideration of its contents.
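The abstract lists repetitions among GTZAN's faults and argues that they affect evaluation. The following is a minimal Python sketch of one precaution this suggests, not the paper's method. It groups bit-identical excerpts by hashing their raw bytes, then splits train and test so that no repeated excerpt appears on both sides. The function names, the in-memory `clips` mapping, and the file names are hypothetical assumptions. Exact hashing is a deliberate simplification: it misses near-duplicates, such as different excerpts of the same recording, which would need audio fingerprinting.

```python
import hashlib
import random
from collections import defaultdict


def find_exact_repetitions(clips: dict[str, bytes]) -> list[list[str]]:
    """Group clip names whose audio bytes are bit-identical (hypothetical helper).

    Exact hashing only catches byte-for-byte copies; near-duplicates such as
    other excerpts of the same recording would need audio fingerprinting.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in clips.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only digests shared by two or more clips; sort for a stable result.
    return sorted(sorted(names) for names in by_digest.values() if len(names) > 1)


def grouped_split(
    names: list[str],
    groups: list[list[str]],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> tuple[list[str], list[str]]:
    """Split clips into train/test so each repetition group lands on one side only."""
    group_of = {name: i for i, group in enumerate(groups) for name in group}
    # Each unit is either a whole repetition group or a single unrepeated clip.
    units: dict[tuple, list[str]] = defaultdict(list)
    for name in names:
        key = ("group", group_of[name]) if name in group_of else ("clip", name)
        units[key].append(name)
    keys = list(units)
    random.Random(seed).shuffle(keys)
    n_test_target = round(test_fraction * len(names))
    train: list[str] = []
    test: list[str] = []
    for key in keys:
        # Units are never divided, so the test set may slightly exceed its target size.
        (test if len(test) < n_test_target else train).extend(units[key])
    return sorted(train), sorted(test)
```

Passing the groups from `find_exact_repetitions` to `grouped_split` keeps each group intact. Splitting by whole units rather than single clips is what prevents a repeated excerpt from being seen in training and then scored again in testing.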

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

If It's Good Enough for You, It's Good Enough for Me: Transferability of Audio Sufficiencies across Models
cs.SD · 2026-04 · unverdicted · novelty 7.0
Transferability analysis finds that minimal sufficient signals transfer across audio models at rates varying by task, around 26% for music genre classification, with some deepfake models showing distinct behaviors not...

Continuous Audio Thinking for Large Audio Language Models
cs.CL · 2026-06 · unverdicted · novelty 6.0
CoAT adds a continuous latent thinking space to LALMs via expert distillation to retain acoustic information, yielding gains on audio reasoning, understanding, music, emotion, and transcription benchmarks across three models.

Channel-Oriented Design for EEG-to-Music Reconstruction
cs.SD · 2026-06 · unverdicted · novelty 6.0
Introduces a channel-oriented design using per-electrode tokenization, multi-view self-distillation, and structured channel dropout within an encoding-alignment-decoding pipeline to improve EEG-to-music reconstruction...

WQ-Fusion: Dynamic Gated Attention for Cross-Domain Audio Representation
cs.SD · 2026-06 · unverdicted · novelty 4.0
WQ-Fusion combines Whisper and Qwen encoders with gated attention to reach 0.836 on the Interspeech 2026 Audio Encoder Capability Challenge, outperforming single-encoder baselines.