Deep Polyphonic ADSR Piano Note Transcription

Gerhard Widmer; Rainer Kelz; Sebastian Böck

arXiv: 1906.09165 · v1 · pith:QSRV33IMnew · submitted 2019-06-21 · cs.SD · eess.AS

Pith reviewed 2026-05-25 18:15 UTC · model grok-4.3

keywords: piano transcription, polyphonic music, ADSR envelope, Hidden Markov Model, deep learning, note segmentation, MAPS dataset, late fusion

The pith

A compact network with late fusion to an ADSR-derived HMM produces state-of-the-art piano note segments on the MAPS dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper combines a small neural network that processes piano audio frames with a handcrafted Hidden Markov Model whose transitions encode attack, decay, sustain and release stages. Network outputs are fused across time using this model to form note hypotheses, which are then accepted or rejected by a simple threshold. The resulting system reaches state-of-the-art accuracy on the MAPS dataset and shows a large gain over earlier methods specifically when the task requires correct onsets and offsets together. The architecture remains compact and trains directly with gradient descent because the temporal structure is supplied by the fixed HMM rather than learned from data.
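
The fusion step described above can be sketched as Viterbi decoding over a small left-to-right HMM for a single pitch. This is a minimal illustration, not the authors' implementation: the five-state layout, the values in `TRANS`, the choice of Viterbi as decoder, and the per-state emission scores are all assumptions, since the text here does not specify them.

```python
import math

# Hypothetical state layout for one pitch; index 0 means "no note sounding".
STATES = ["silence", "attack", "decay", "sustain", "release"]

# Hypothetical transition probabilities (each row sums to 1).
# These are illustrative values, not the ones used in the paper.
TRANS = [
    [0.98, 0.02, 0.00, 0.00, 0.00],  # silence -> silence or attack
    [0.00, 0.50, 0.50, 0.00, 0.00],  # attack  -> attack or decay
    [0.00, 0.00, 0.80, 0.20, 0.00],  # decay   -> decay or sustain
    [0.00, 0.00, 0.00, 0.90, 0.10],  # sustain -> sustain or release
    [0.50, 0.00, 0.00, 0.00, 0.50],  # release -> silence or release
]


def _log(p):
    """Logarithm that maps zero probability to minus infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(emissions, trans=TRANS, start_state=0):
    """Return the most likely state path.

    emissions[t][s] is a positive score for state s at frame t, standing in
    for whatever per-state evidence is derived from the network outputs.
    """
    n = len(trans)
    score = [_log(emissions[0][s]) if s == start_state else float("-inf")
             for s in range(n)]
    back = []
    for frame in emissions[1:]:
        new_score, pointers = [], []
        for s in range(n):
            best_prev = max(range(n), key=lambda r: score[r] + _log(trans[r][s]))
            new_score.append(score[best_prev] + _log(trans[best_prev][s]) + _log(frame[s]))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def path_to_segments(path, silence=0):
    """Turn a state path into (start_frame, end_frame_exclusive) note segments."""
    segments, start = [], None
    for t, s in enumerate(path):
        if s != silence and start is None:
            start = t
        elif s == silence and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments
```

In a polyphonic setting one such chain would presumably be decoded per pitch, so overlapping notes of different pitches do not compete for the same state path.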

Core claim

Late fusion of per-frame network predictions with an HMM whose transition probabilities are chosen from an ADSR envelope model, followed by a final binary decision, yields accurate polyphonic note segmentations that outperform prior approaches by a large margin on the MAPS dataset when complete note regions from onset to offset are evaluated.

What carries the argument

Late-fusion stage that combines network outputs over time with a handcrafted HMM whose transition probabilities are set from an ADSR envelope model.
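
One way to turn an envelope model into transition probabilities is to pick an expected dwell time for each phase and use the fact that a state with self-transition probability a has a geometrically distributed duration with mean 1/(1-a) frames. The sketch below does this for a left-to-right chain. The function name, the phase durations, and the note onset probability are hypothetical and are not taken from the paper.

```python
def adsr_transition_matrix(expected_frames, note_onset_prob=0.02):
    """Build a cyclic left-to-right transition matrix for the chain
    silence -> attack -> decay -> sustain -> release -> silence.

    expected_frames: expected dwell time in frames (each >= 1) for the
    four note phases, in the order attack, decay, sustain, release.
    note_onset_prob: hypothetical per-frame probability of leaving silence.
    """
    # Self-transition probability a = 1 - 1/d gives a mean dwell time of d frames.
    stay = [1.0 - note_onset_prob] + [1.0 - 1.0 / d for d in expected_frames]
    n = len(stay)
    matrix = [[0.0] * n for _ in range(n)]
    for s in range(n):
        matrix[s][s] = stay[s]
        # All remaining probability mass moves to the next phase in the cycle.
        matrix[s][(s + 1) % n] += 1.0 - stay[s]
    return matrix


# Illustrative durations only: 2 frames attack, 5 decay, 10 sustain, 2 release.
example = adsr_transition_matrix([2, 5, 10, 2])
```

Because the matrix is fixed in advance, nothing about the temporal prior needs to be trained, which is the design choice this section singles out.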

If this is right

Note offsets are predicted more reliably because the HMM explicitly encodes release behaviour.
Small networks become competitive when supplied with an explicit temporal prior instead of learning dynamics from data.
Polyphonic overlaps are handled by the fusion stage rather than by the network alone.
A final threshold can reject weak hypotheses without additional learned components.
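
The last point can be illustrated with a simple rejection rule. The sketch assumes the rule compares the mean network activation inside each segment with a fixed threshold. The paper's actual rule and threshold value are not given in the text here, so `reject_weak_segments` and its default of 0.5 are hypothetical.

```python
def reject_weak_segments(segments, activations, threshold=0.5):
    """Keep only segments whose mean activation reaches the threshold.

    segments: list of (start_frame, end_frame_exclusive) with end > start.
    activations: per-frame activation for one pitch, values in [0, 1].
    """
    kept = []
    for start, end in segments:
        mean_activation = sum(activations[start:end]) / (end - start)
        if mean_activation >= threshold:
            kept.append((start, end))
    return kept
```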

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same ADSR prior could be tested on other percussive or sustained instruments that share similar amplitude envelopes.
Replacing the handcrafted transitions with parameters learned from data would reveal how much the fixed ADSR model contributes.
The low parameter count suggests the method could run in real time on modest hardware once the HMM is integrated into an online decoder.

Load-bearing premise

The fixed transition probabilities taken from the ADSR model supply a temporal prior that is accurate enough for real piano performances without any learned sequence dynamics.

What would settle it

An experiment on the MAPS test set in which another transcription system records a higher F1 score for complete note regions would falsify the claimed superiority.
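
To make the criterion concrete, the sketch below computes a note-level F1 score in which a note counts as correct only if pitch, onset, and offset all match. The tolerances (50 ms for onsets, and for offsets the larger of 50 ms and 20% of the reference note duration) follow a commonly used convention and are assumptions here, as is the greedy matching, which is a simplification of the optimal matching that evaluation toolkits typically use.

```python
def note_f1_with_offsets(reference, estimated, onset_tol=0.05,
                         offset_ratio=0.2, offset_min_tol=0.05):
    """F1 over notes given as (onset_sec, offset_sec, midi_pitch) tuples.

    Greedy one-to-one matching: each estimated note can satisfy at most
    one reference note.
    """
    unmatched = list(estimated)
    hits = 0
    for r_on, r_off, r_pitch in reference:
        offset_tol = max(offset_min_tol, offset_ratio * (r_off - r_on))
        for candidate in unmatched:
            e_on, e_off, e_pitch = candidate
            if (e_pitch == r_pitch
                    and abs(e_on - r_on) <= onset_tol
                    and abs(e_off - r_off) <= offset_tol):
                hits += 1
                unmatched.remove(candidate)
                break  # stop iterating right after modifying the list
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A system that gets onsets right but releases notes too early or too late is penalized under this metric, which is why it is the relevant one for the full-note claim.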

Original abstract

We investigate a late-fusion approach to piano transcription, combined with a strong temporal prior in the form of a handcrafted Hidden Markov Model (HMM). The network architecture under consideration is compact in terms of its number of parameters and easy to train with gradient descent. The network outputs are fused over time in the final stage to obtain note segmentations, with an HMM whose transition probabilities are chosen based on a model of attack, decay, sustain, release (ADSR) envelopes, commonly used for sound synthesis. The note segments are then subject to a final binary decision rule to reject too weak note segment hypotheses. We obtain state-of-the-art results on the MAPS dataset, and are able to outperform other approaches by a large margin, when predicting complete note regions from onsets to offsets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper pairs a compact neural net with a fixed ADSR-derived HMM for full-note piano transcription and claims large gains on MAPS, but supplies no numbers in the abstract to support that.

The new piece is the late fusion step that takes network frame outputs and runs them through a handcrafted HMM whose transition probabilities come straight from a generic ADSR envelope model, then applies a simple threshold to drop weak segments. That specific combination for producing complete onset-to-offset regions is not just another end-to-end network. The network itself stays small and trains with ordinary gradient descent, which is a practical plus if you want something that runs without huge compute. The final binary decision rule is also straightforward and avoids over-segmentation. Those choices show a clear attempt to bring domain knowledge about note envelopes into the pipeline rather than forcing the network to learn timing from scratch.

The central claim is empirical superiority on MAPS for full note regions. The abstract claims state-of-the-art results and large-margin outperformance, yet gives no F1 scores, no baseline numbers, no error bars, and no dataset details. That makes the size of the improvement impossible to judge from the provided text. The HMM transitions are chosen once from the ADSR model and never fitted or ablated against uniform or learned alternatives. If that concern holds, most of the segmentation quality may already come from the network plus threshold, with the prior adding little or even hurting on expressive timing and pedaling.

The paper is of interest to MIR groups already working on polyphonic transcription who want to test whether a lightweight temporal prior helps. It deserves a serious referee to check whether the full manuscript actually reports the metrics and ablations that would show whether the HMM component is doing real work.

Referee Report

2 major / 2 minor

Summary. The paper proposes a late-fusion piano transcription pipeline in which a compact neural network produces frame-level activations that are then decoded into note segments via a handcrafted HMM whose transition matrix is derived from a generic ADSR envelope model; a final binary threshold rejects weak hypotheses. It claims state-of-the-art performance on the MAPS dataset for full note-region transcription (onsets to offsets), outperforming prior methods by a large margin.

Significance. If the empirical claims hold after proper validation, the work would illustrate that a small, easily trained network plus a domain-derived temporal prior can achieve strong segmentation without learned sequence modeling, offering a lightweight alternative to end-to-end recurrent or attention-based transcription systems.

major comments (2)

[Experiments] Experiments section: the central SOTA and 'large margin' claims are stated without any reported metrics, baselines, dataset splits, error bars, or ablation tables; this absence makes it impossible to verify the contribution of the HMM or to assess whether the network outputs plus simple thresholding already account for most of the reported quality.
[Method] Method (HMM construction): transition probabilities are handcrafted from a fixed ADSR model rather than estimated from MAPS training data or compared against uniform, learned, or data-driven alternatives; because the performance gain is explicitly attributed to this temporal prior, the lack of an ablation directly undermines the load-bearing assumption that the chosen probabilities generalize to real performances with pedaling and expressive timing.

minor comments (2)

[Method] Notation for the final binary decision rule is introduced without an explicit equation or threshold value, making the complete pipeline hard to reproduce from the text alone.
[Abstract] The abstract asserts quantitative superiority but supplies none of the supporting numbers; moving at least the headline F1 or precision-recall figures into the abstract would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We agree that the experimental claims require fuller quantitative support and will revise the manuscript to include the requested metrics, baselines, splits, and ablations.

Referee: [Experiments] Experiments section: the central SOTA and 'large margin' claims are stated without any reported metrics, baselines, dataset splits, error bars, or ablation tables; this absence makes it impossible to verify the contribution of the HMM or to assess whether the network outputs plus simple thresholding already account for most of the reported quality.

Authors: We acknowledge the omission of explicit quantitative results in the experiments section of the current manuscript. In the revision we will add a results table reporting note-level and frame-level F1 scores on the standard MAPS train/test splits, direct numerical comparisons against published baselines, standard deviations across runs where applicable, and an ablation isolating the HMM decoder versus the network outputs with simple thresholding. This will make the SOTA claims and the HMM contribution verifiable.

Revision: yes

Referee: [Method] Method (HMM construction): transition probabilities are handcrafted from a fixed ADSR model rather than estimated from MAPS training data or compared against uniform, learned, or data-driven alternatives; because the performance gain is explicitly attributed to this temporal prior, the lack of an ablation directly undermines the load-bearing assumption that the chosen probabilities generalize to real performances with pedaling and expressive timing.

Authors: The ADSR-derived transitions are deliberately handcrafted to encode a domain-specific acoustic prior rather than learned from the same data used to train the network. We agree an ablation is needed to quantify its benefit. The revised manuscript will include comparisons of the fixed ADSR matrix against both uniform transition probabilities and transition probabilities estimated directly from MAPS training data, together with a brief analysis of performance on pedal-active excerpts to address generalization to expressive timing.

Revision: yes
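
The two alternative priors named in this response could be built as follows. This is a sketch under stated assumptions: it presumes that frame-level state labels (one integer per frame, per pitch) can be derived from the training annotations, and the function names and the additive smoothing constant are hypothetical.

```python
def estimate_transitions(state_sequences, n_states, smoothing=1e-3):
    """Maximum-likelihood transition matrix from labeled state sequences.

    Counts consecutive state pairs and normalizes each row. The additive
    smoothing keeps rows of unseen states from dividing by zero.
    """
    counts = [[smoothing] * n_states for _ in range(n_states)]
    for seq in state_sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def uniform_transitions(n_states):
    """Baseline prior with no temporal structure: every move equally likely."""
    return [[1.0 / n_states] * n_states for _ in range(n_states)]
```

Decoding the same network outputs with each matrix in turn, and scoring the resulting segments with one metric, would isolate how much the handcrafted ADSR prior contributes.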

Circularity Check

0 steps flagged

No significant circularity; empirical results rest on external benchmarks

The paper's central claim is an empirical SOTA result on the MAPS dataset obtained by late fusion of network outputs with a handcrafted HMM whose transition probabilities are taken from the standard ADSR envelope model used in sound synthesis. No equations, parameters, or predictions are shown to reduce to fitted quantities or self-definitions by construction; the HMM prior is fixed and external rather than estimated from the evaluation data or the network itself. The performance comparison is externally falsifiable on a public benchmark and does not rely on self-citation chains or uniqueness theorems imported from the authors' prior work. This is the normal case of a self-contained empirical pipeline.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central performance claim rests on the domain assumption that ADSR envelopes supply useful transition probabilities for piano notes and that late fusion plus a final binary decision suffices to produce accurate segments.

axioms (1)

Domain assumption: the ADSR envelope model supplies appropriate transition probabilities for a piano note HMM.
Explicitly stated as the basis for choosing HMM transitions in the abstract.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 3 internal anchors

[1]

For each note sounding in the recording, we would like to obtain a tuple (s, e, n, v), denoting start, end, MIDI note number and optionally volume

INTRODUCTION Polyphonic transcription is the task of extracting a symbolic score from an audio recording, regardless of how many instruments or notes are playing concurrently. For each note sounding in the recording, we would like to obtain a tuple (s, e, n, v), denoting start, end, MIDI note number and optionally volume. We tackle a somewhat easier sub...

[2]

RELATION TO PREVIOUS WORK It could be shown that modelling different note phases in time with different neural network outputs can be advantageous [2, 4, 5, 8]. The piano transcription approach in [4] uses two separate, bi-directional long-short term recurrent neural networks (BLSTMs) to train a pitched onset detector together with a framewise pitch de...

[3]

Gaussian Dropout

MODELS When predicting multiple targets simultaneously with neural networks, one can consider two ends of a spectrum. One could either branch out immediately after the input layer, and thus have a separate network for each target, or one could branch out immediately before the output layers and have a shared network for all targets. We opt to use a mode...

[4]

Configuration II

EXPERIMENTS We use the MAPS dataset [24] to train and select models. The dataset contains 210 recordings of classical piano music, rendered using 7 samplebank-based synthesizers. Additionally, there are two sets of recordings of a reproducing Disklavier piano: 30 recordings from a microphone in close proximity to the piano, and 30 recordings from a micr...

[5]

CONCLUSION AND FUTURE WORK We have shown that simple, small convolutional neural networks with multiple outputs for different temporal phases of a note, together with sequential probabilistic models can achieve state-of-the-art results on a widely used piano transcription dataset. Some potential improvements for the future include: a global model for ...

[6]

BASIS, Basisprogramm

ACKNOWLEDGMENTS This work is supported by the European Research Council via ERC Grant Agreement 670035, project CON ESPRESSIONE and the Austrian Promotion Agency (FFG) under the “BASIS, Basisprogramm” umbrella program. The Tesla K40 used for this research was donated by the NVIDIA Corporation

[7]

Polyphonic Piano Note Transcription with Recurrent Neural Networks

Sebastian Böck and Markus Schedl, “Polyphonic Piano Note Transcription with Recurrent Neural Networks,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Kyoto, Japan, March 25-30, 2012, pp. 121–124

[8]

Polyphonic Pitch Tracking with Deep Layered Learning

Anders Elowsson, “Polyphonic Pitch Tracking with Deep Layered Learning,” CoRR, vol. abs/1804.02918, 2018

[9]

Polyphonic Pitch Detection with Convolutional Recurrent Neural Networks

Carl Thomé and Sven Ahlbäck, “Polyphonic Pitch Detection with Convolutional Recurrent Neural Networks,” in MIREX 2017 abstracts, 2017

[10]

Onsets and Frames: Dual-Objective Piano Transcription

Curtis Hawthorne, Erich Elsen, Jialin Song, Adam Roberts, Ian Simon, Colin Raffel, Jesse Engel, Sageev Oore, and Douglas Eck, “Onsets and Frames: Dual-Objective Piano Transcription,” in Proceedings of the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, September 23-27, 2018. Published in IEEE Internati...

[11]

A Parallel Fusion Approach to Piano Music Transcription based on Convolutional Neural Network

Fu’ze Cong, Shuchang Liu, Li Guo, and Geraint A. Wiggins, “A Parallel Fusion Approach to Piano Music Transcription based on Convolutional Neural Network,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Calgary, AL, Canada, April 15-20, 2018

[12]

An End-to-End Neural Network for Polyphonic Piano Music Transcription

Siddharth Sigtia, Emmanouil Benetos, and Simon Dixon, “An End-to-End Neural Network for Polyphonic Piano Music Transcription,” IEEE/ACM Trans. Audio, Speech & Language Processing, vol. 24, no. 5, pp. 927–939, 2016

[13]

On the Potential of Simple Framewise Approaches to Piano Transcription

Rainer Kelz, Matthias Dorfer, Filip Korzeniowski, Sebastian Böck, Andreas Arzt, and Gerhard Widmer, “On the Potential of Simple Framewise Approaches to Piano Transcription,” in Proceedings of the 17th International Society for Music Information Retrieval Conference, ISMIR 2016, New York City, United States, August 7-11, 2016, 2016, pp. 475–481

[14]

A Dozen Tricks with Multitask Learning

Rich Caruana, “A Dozen Tricks with Multitask Learning,” in Neural Networks: Tricks of the Trade - Second Edition, pp. 163–189. 2012

[15]

Automatic Transcription of Piano Music based on HMM Tracking of jointly-estimated Pitches

Valentin Emiya, Roland Badeau, and Bertrand David, “Automatic Transcription of Piano Music based on HMM Tracking of jointly-estimated Pitches,” in 2008 16th European Signal Processing Conference, EUSIPCO 2008, Lausanne, Switzerland, August 25-29, 2008, pp. 1–5

[16]

Multipitch Estimation of Piano Music by Exemplar-Based Sparse Representation

Cheng-Te Lee, Yi-Hsuan Yang, and Homer H. Chen, “Multipitch Estimation of Piano Music by Exemplar-Based Sparse Representation,” IEEE Trans. Multimedia, vol. 14, no. 3-1, pp. 608–618, 2012

[17]

Music Transcription with ISA and HMM

Emmanuel Vincent and Xavier Rodet, “Music Transcription with ISA and HMM,” in Independent Component Analysis and Blind Signal Separation, Fifth International Conference, ICA Granada, Spain, September 22-24, Proceedings, 2004, pp. 1197–1204

[18]

Improving Note Segmentation in Automatic Piano Music Transcription Systems with a Two-State Pitch-Wise HMM Method

Dorian Cazau, Yuancheng Wang, Olivier Adam, Qiao Wang, and Grégory Nuel, “Improving Note Segmentation in Automatic Piano Music Transcription Systems with a Two-State Pitch-Wise HMM Method,” in Proceedings of the 18th International Society for Music Information Retrieval Conference, ISMIR 2017, Suzhou, China, October 23-27, 2017, 2017, pp. 523–530

[19]

Factorial Scaled Hidden Markov Model for Polyphonic Audio Representation and Source Separation

Alexey Ozerov, Cédric Févotte, and Maurice Charbit, “Factorial Scaled Hidden Markov Model for Polyphonic Audio Representation and Source Separation,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA ’09, New Paltz, NY, USA, October 18-21, 2009, pp. 121–124

[20]

Explicit Duration Hidden Markov Models for Multiple-Instrument Polyphonic Music Transcription

Emmanouil Benetos and Tillman Weyde, “Explicit Duration Hidden Markov Models for Multiple-Instrument Polyphonic Music Transcription,” in Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR 2013, Curitiba, Brazil, November 4-8, 2013, 2013, pp. 269–274

[21]

An Efficient Temporally-Constrained Probabilistic Model for Multiple-Instrument Music Transcription

Emmanouil Benetos and Tillman Weyde, “An Efficient Temporally-Constrained Probabilistic Model for Multiple-Instrument Music Transcription,” in Proceedings of the 16th International Society for Music Information Retrieval Conference, ISMIR 2015, Málaga, Spain, October 26-30, 2015, 2015, pp. 701–707

[22]

Polyphonic music transcription using note event modeling

M. P. Ryynanen and A. Klapuri, “Polyphonic music transcription using note event modeling,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, Oct 2005, pp. 319–322

[23]

A discriminative model for polyphonic piano transcription

Graham E. Poliner and Daniel P. W. Ellis, “A discriminative model for polyphonic piano transcription,” EURASIP J. Adv. Sig. Proc., vol. 2007

[24]

Bilevel Sparse Models for Polyphonic Music Transcription

Tal Ben Yakar, Roee Litman, Pablo Sprechmann, Alexander M. Bronstein, and Guillermo Sapiro, “Bilevel Sparse Models for Polyphonic Music Transcription,” in Proceedings of the 14th International Society for Music Information Retrieval Conference, ISMIR, Curitiba, Brazil, November 4-8, 2013, pp. 65–70

[25]

Non-negative Hidden Markov Modeling of Audio with Application to Source Separation

Gautham J. Mysore, Paris Smaragdis, and Bhiksha Raj, “Non-negative Hidden Markov Modeling of Audio with Application to Source Separation,” in Latent Variable Analysis and Signal Separation - 9th International Conference, LVA/ICA 2010, St. Malo, France, September 27-30, 2010. Proceedings, 2010, pp. 140–148

[26]

Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)

Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter, “Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs),” CoRR, vol. abs/1511.07289, 2015

[27]

Dropout: A simple Way to Prevent Neural Networks from Overfitting

Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov, “Dropout: A simple Way to Prevent Neural Networks from Overfitting,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014

[28]

madmom: a new Python Audio and Music Signal Processing Library

Sebastian Böck, Filip Korzeniowski, Jan Schlüter, Florian Krebs, and Gerhard Widmer, “madmom: a new Python Audio and Music Signal Processing Library,” in Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 10 2016, pp. 1174–1178

[29]

Experimenting with Musically Motivated Convolutional Neural Networks

Jordi Pons, Thomas Lidy, and Xavier Serra, “Experimenting with Musically Motivated Convolutional Neural Networks,” in 14th International Workshop on Content-Based Multimedia Indexing (CBMI). IEEE, 2016, pp. 1–6

[30]

Multipitch Estimation of Piano Sounds Using a New Probabilistic Spectral Smoothness Principle

Valentin Emiya, Roland Badeau, and Bertrand David, “Multipitch Estimation of Piano Sounds Using a New Probabilistic Spectral Smoothness Principle,” IEEE Trans. Audio, Speech & Language Processing, vol. 18, no. 6, pp. 1643–1654, 2010

[31]

mir_eval: A Transparent Implementation of Common MIR Metrics

Colin Raffel, Brian McFee, Eric J. Humphrey, Justin Salamon, Oriol Nieto, Dawen Liang, and Daniel P. W. Ellis, “mir_eval: A Transparent Implementation of Common MIR Metrics,” in Proceedings of the 15th International Society for Music Information Retrieval Conference, ISMIR 2014, Taipei, Taiwan, October 27-31, 2014, 2014, pp. 367–372.
