arxiv: 2506.10488 · v3 · pith:SCE4FXXQnew · submitted 2025-06-12 · 💻 cs.CV · cs.DL · cs.IR

Sheet Music Benchmark: Standardized Optical Music Recognition Evaluation

classification 💻 cs.CV · cs.DL · cs.IR
keywords music · benchmark · omr-ned · dataset · error · evaluation · introduce · musical

In this work, we introduce the Sheet Music Benchmark (SMB), a dataset of 685 pages designed specifically to benchmark Optical Music Recognition (OMR) research. SMB covers a diverse range of musical textures, including monophony, pianoform, quartet, and others, all encoded in Common Western Modern Notation using the Humdrum **kern format. Alongside SMB, we introduce the OMR Normalized Edit Distance (OMR-NED), a new metric tailored to evaluating OMR performance. OMR-NED builds upon the widely used Symbol Error Rate (SER) and offers a fine-grained error analysis that covers individual musical elements such as note heads, beams, pitches, accidentals, and other critical notation features. The resulting numeric score supports clear comparisons, enabling researchers and end users alike to identify the best-performing OMR approaches. Our work thus addresses a long-standing gap in OMR evaluation, and we support our contributions with baseline experiments that use standardized SMB dataset splits to train and assess state-of-the-art methods.
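
For readers unfamiliar with the Symbol Error Rate that OMR-NED is said to build on, the following is a minimal sketch of SER in its usual form: the Levenshtein edit distance between a predicted and a reference token sequence, divided by the reference length. This is not OMR-NED itself, whose per-element breakdown is defined in the paper and is not reproduced here. The function names and the toy kern-like tokens are illustrative assumptions, not taken from the SMB dataset or its tooling.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # drop a reference token
            insertion = curr[j - 1] + 1  # add a hypothesis token
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def symbol_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the reference length (plain SER)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical token sequences, loosely styled after **kern note tokens.
reference = ["4c", "4d", "4e", "4f"]
prediction = ["4c", "4e", "4f#"]

print(symbol_error_rate(reference, prediction))  # 0.5 (one deletion, one substitution)
```

A single ratio like this treats every token error alike, which is the limitation the abstract points to when it describes OMR-NED as reporting errors per musical element.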

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Rubato: Transcribing Piano Music with Timestamps

    cs.SD · 2026-05 · unverdicted · novelty 6.0

    The Rubato model, using the InterMo representation, outperforms cascade methods at generating timestamped piano sheet music from audio, even when the cascades receive ground-truth MIDI.

  2. A High-Accuracy Optical Music Recognition Method Based on Bottleneck Residual Convolutions

    cs.CV · 2026-04 · unverdicted · novelty 3.0

    A CNN that combines ResNet-v2-style residual bottleneck blocks with multi-scale dilated convolutions, followed by a BiGRU and trained with CTC loss, achieves a SeER of 7.52% and a SyER of 0.45% on the Camera-PrIMuS dataset for optical music recognition.