Optical Music Recognition with Convolutional Sequence-to-Sequence Models

Eelco van der Wel; Karen Ullrich

arXiv: 1707.04877 · v1 · pith:JNBN5XQ7new · submitted 2017-07-16 · cs.CV · cs.IR · cs.SD

classification cs.CV · cs.IR · cs.SD

keywords learning · models · music · accuracy · available · data · deep · model

Optical Music Recognition (OMR) is an important technology within Music Information Retrieval. Deep learning models show promising results on OMR tasks, but symbol-level annotated data sets of sufficient size to train such models are not available and are difficult to develop. We present a deep learning architecture called a Convolutional Sequence-to-Sequence model to both move towards an end-to-end trainable OMR pipeline and apply a learning process that trains on full sentences of sheet music instead of individually labeled symbols. The model is trained and evaluated on a human-generated data set, with various image augmentations based on real-world scenarios. This data set is the first publicly available set in OMR research with sufficient size to train and evaluate deep learning models. With the introduced augmentations, a pitch recognition accuracy of 81% and a duration accuracy of 94% are achieved, resulting in a note-level accuracy of 80%. Finally, the model is compared to commercially available methods, showing a large improvement over these applications.
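As a rough illustration of how the three reported accuracies relate, the sketch below assumes that a note counts as correct only when both its pitch and its duration are predicted correctly, so note-level accuracy can never exceed either of the other two. The function name `note_accuracies`, the `(pitch, duration)` note encoding, and the toy data are hypothetical; the paper's actual evaluation protocol may differ.

```python
from typing import List, Tuple

# Hypothetical note encoding: (pitch, duration), e.g. ("C4", "quarter").
Note = Tuple[str, str]


def note_accuracies(pred: List[Note], target: List[Note]) -> Tuple[float, float, float]:
    """Return (pitch, duration, note) accuracy for two aligned note sequences.

    Assumption: a note is correct only if pitch AND duration both match.
    """
    if len(pred) != len(target) or not target:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(target)
    pitch_ok = sum(p[0] == t[0] for p, t in zip(pred, target))
    duration_ok = sum(p[1] == t[1] for p, t in zip(pred, target))
    note_ok = sum(p == t for p, t in zip(pred, target))
    return pitch_ok / n, duration_ok / n, note_ok / n


# Toy example: one duration error and one pitch error on different notes.
target = [("C4", "quarter"), ("D4", "quarter"), ("E4", "half"), ("F4", "eighth")]
pred = [("C4", "quarter"), ("D4", "half"), ("G4", "half"), ("F4", "eighth")]
pitch_acc, dur_acc, note_acc = note_accuracies(pred, target)
print(pitch_acc, dur_acc, note_acc)
```

In the toy data the pitch and duration errors fall on different notes, so the note-level accuracy is lower than both of the other two figures. When errors mostly coincide on the same notes, note-level accuracy stays close to the lower of the two, which is consistent with the reported 81%, 94%, and 80%.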

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A High-Accuracy Optical Music Recognition Method Based on Bottleneck Residual Convolutions
cs.CV · 2026-04 · unverdicted · novelty 3.0

A CNN built from ResNet-v2-style residual bottleneck blocks and multi-scale dilated convolutions, followed by a BiGRU and trained with CTC loss, achieves a sequence error rate (SeER) of 7.52% and a symbol error rate (SyER) of 0.45% on the Camera-PrIMuS dataset for optical music recognition.
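The two error rates quoted above can be sketched as follows. This assumes the commonly used definitions: SyER is the total edit distance between predicted and reference symbol sequences divided by the total number of reference symbols, and SeER is the fraction of sequences containing at least one error. The function names and the symbol vocabulary in the example are hypothetical, and the cited paper may compute the metrics differently.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rates(preds: List[List[str]], refs: List[List[str]]) -> Tuple[float, float]:
    """Return (SeER, SyER) as fractions under the assumed definitions."""
    total_symbols = sum(len(r) for r in refs)
    if len(preds) != len(refs) or total_symbols == 0:
        raise ValueError("need matching, non-empty predictions and references")
    total_edits = sum(edit_distance(p, r) for p, r in zip(preds, refs))
    wrong_sequences = sum(p != r for p, r in zip(preds, refs))
    return wrong_sequences / len(refs), total_edits / total_symbols


# Toy example with a made-up symbol vocabulary: one substitution in the first staff.
refs = [["clef-G2", "note-C4_quarter", "barline"], ["clef-G2", "note-E4_half"]]
preds = [["clef-G2", "note-D4_quarter", "barline"], ["clef-G2", "note-E4_half"]]
seer, syer = error_rates(preds, refs)
print(f"SeER={seer:.2%}  SyER={syer:.2%}")
```

A single wrong symbol makes the whole sequence count as an error, which is why a sequence error rate is typically much higher than the symbol error rate measured on the same predictions.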