An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire

Fred Jalbert-Desforges

arXiv: 2605.06685 · v1 · submitted 2026-04-25 · cs.SD · eess.AS · stat.AP

Pith reviewed 2026-05-11 00:44 UTC · model grok-4.3

classification: cs.SD, eess.AS, stat.AP

keywords: audio transcription, piano repertoire, information theory, Zipfian distribution, harmonic scale degrees, Kullback-Leibler divergence, composer profiling, neoclassical music

The pith

A certified audio transcription pipeline yields composer profiles that separate neoclassical piano artists from historical composers by tighter Zipfian fits in note transitions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds an audio-to-analysis pipeline that starts with a transcription layer certified at F1 = 0.9791 on the MAESTRO benchmark and then extracts empirical distributions over harmonic scale degrees from 1,238 piano pieces. It computes Shannon entropy to gauge predictability, asymmetric Kullback-Leibler divergence to recover stylistic connections, and Zipfian rank-frequency modeling to measure regularity in transitions. The resulting profiles place historical composers along a narrow entropy axis from 3.33 to 3.86 bits and recover known lineages such as Haydn-Beethoven through the smallest divergences. Neoclassical artists show a mean R² of 0.78 on Zipfian fits versus 0.46 for historical composers, a gap larger than the spread inside either group.
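The entropy figures above can be read against a simple upper bound: a distribution over K symbols has at most log2(K) bits. The sketch below computes Shannon entropy on a Laplace-smoothed distribution. It is an illustration only: the 15-symbol alphabet is an assumption taken from the "15×15 transition distribution" quoted further down this page, `alpha = 1.0` is an assumed smoothing constant, and the function names are hypothetical rather than the paper's.

```python
import math
from collections import Counter

def smoothed_distribution(symbols, alphabet, alpha=1.0):
    """Laplace-smoothed probabilities over a fixed alphabet (alpha is assumed)."""
    counts = Counter(symbols)
    total = len(symbols) + alpha * len(alphabet)
    return {s: (counts.get(s, 0) + alpha) / total for s in alphabet}

def shannon_entropy_bits(dist):
    """Shannon entropy in bits of a probability dictionary."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

alphabet = list(range(15))  # hypothetical 15-symbol scale-degree alphabet

# One observation of every symbol gives a uniform distribution.
uniform = smoothed_distribution(alphabet, alphabet)
# Toy counts concentrated on three symbols (synthetic, not from the paper).
skewed = smoothed_distribution([0] * 50 + [4] * 30 + [7] * 20, alphabet)

print(round(shannon_entropy_bits(uniform), 3))  # log2(15), about 3.907
print(shannon_entropy_bits(skewed) < shannon_entropy_bits(uniform))
```

Under the 15-symbol assumption, the reported band of 3.33 to 3.86 bits sits just below the 3.907-bit ceiling, which is one way to see why the referee asks for the alphabet size to be stated explicitly.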

Core claim

Applied to 15 MAESTRO composers with at least ten pieces each, spanning the Baroque to the early twentieth century, plus five contemporary neoclassical artists, the pipeline shows that neoclassical transition distributions follow Zipf's law more closely, with mean R² = 0.78 versus 0.46 for historical composers. The same profiles order composers by harmonic predictability within a narrow entropy band and recover stylistic lineages through minimal KL divergences, while Mendelssohn appears as a consistent outlier.
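An R² for a Zipfian fit is commonly obtained by regressing log frequency on log rank and measuring how much variance the line explains. The sketch below does that with ordinary least squares. The paper's exact fitting procedure is not given on this page, so the function name `zipf_fit`, the use of `numpy.polyfit`, and the synthetic inputs are all assumptions.

```python
import numpy as np

def zipf_fit(frequencies):
    """OLS fit of log(frequency) on log(rank); returns (exponent, r_squared)."""
    freqs = np.sort(np.asarray(frequencies, dtype=float))[::-1]
    freqs = freqs[freqs > 0]                  # log is undefined at zero
    ranks = np.arange(1, freqs.size + 1)
    x, y = np.log(ranks), np.log(freqs)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return -slope, 1.0 - ss_res / ss_tot

ranks = np.arange(1, 226).astype(float)       # 225 cells, as in a 15x15 table

# Synthetic exact power law: the fit should recover the exponent with R^2 near 1.
exponent, r2 = zipf_fit(ranks ** -1.4)

# Synthetic linearly decaying frequencies: not a power law, so R^2 is lower.
_, r2_flat = zipf_fit(np.linspace(1.0, 0.2, 225))

print(exponent, r2, r2_flat)
```

Two distributions can share a similar fitted exponent while differing sharply in R², which is why the page reports fit quality rather than slope as the separating quantity.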

What carries the argument

The audio-to-analysis pipeline with a certified transcription layer that produces harmonic scale degree distributions for subsequent entropy, asymmetric KL divergence, and Zipfian rank-frequency analysis.
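To make the role of asymmetric KL divergence concrete, the sketch below computes D(P || Q) in bits and, for each profile, finds the other profile at the smallest divergence. The three profiles are toy numbers with placeholder labels, not data from the paper, and `nearest_by_kl` is a hypothetical helper name.

```python
import numpy as np

def kl_divergence_bits(p, q):
    """D(P || Q) in bits; q must be positive wherever p is (smoothing ensures this)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))

def nearest_by_kl(profiles):
    """Map each name to the other name minimizing D(P_name || Q_other)."""
    nearest = {}
    for name, p in profiles.items():
        others = {o: kl_divergence_bits(p, q) for o, q in profiles.items() if o != name}
        nearest[name] = min(others, key=others.get)
    return nearest

# Toy three-symbol profiles (synthetic placeholders).
profiles = {
    "composer_a": [0.50, 0.30, 0.20],
    "composer_b": [0.45, 0.35, 0.20],
    "composer_c": [0.10, 0.20, 0.70],
}

print(nearest_by_kl(profiles))
# The divergence is asymmetric: the two directions generally differ.
print(kl_divergence_bits(profiles["composer_a"], profiles["composer_c"]),
      kl_divergence_bits(profiles["composer_c"], profiles["composer_a"]))
```

Because the measure is directional, a lineage claim such as Haydn-Beethoven depends on which composer is treated as the reference distribution.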

Load-bearing premise

Distributions over harmonic scale degrees taken from transcribed audio faithfully reflect each composer's compositional vocabulary without substantial distortion from performance practice, transcription errors on non-standard passages, or the scale-degree representation chosen.

What would settle it

A fresh set of transcriptions for the same 1,238 pieces that removes the R² gap between neoclassical (mean 0.78) and historical (mean 0.46) groups while preserving the within-group spreads would falsify the separation result.

Figures

Figures reproduced from arXiv: 2605.06685 by Fred Jalbert-Desforges.

**Figure 2.** Zipfian R² gap between neoclassical artists

**Figure 3.** Why the R² gap: rank-frequency of scale-degree transitions. Top-30 transitions per composer (dots) against the Zipfian regression line fit on all 225 transitions (dashed red). Slope exponents are comparable (Glass = 1.41, Chopin = 1.40) but the fit quality is radically different: Glass's distribution tracks the line across the full range, while Chopin's distribution is too flat for a power law to explain.…

**Figure 4.** Figure B.1: KL heatmap

Original abstract

We present an audio-to-analysis pipeline that produces composer-level information-theoretic profiles (reflecting compositional vocabulary as it emerges from aggregated performances) from raw recordings, built on a transcription layer whose accuracy we certify on a standard benchmark (F1 = 0.9791 on the MAESTRO v3.0.0 test set). Applied to 1,238 pieces and 15 MAESTRO composers with at least ten attributed pieces, spanning the Baroque through the early twentieth century, the pipeline derives empirical distributions over harmonic scale degrees and analyzes them through Shannon entropy, asymmetric Kullback-Leibler divergence, and Zipfian rank-frequency modeling. The resulting profiles (i) order composers along an interpretable axis of harmonic predictability, with a narrow entropy range (3.33-3.86 bits) that reveals the marginal-level similarity of tonal vocabularies; (ii) recover known stylistic lineages (Haydn-Beethoven, Liszt-Rachmaninoff, Schubert-Schumann) through the smallest KL divergences in the corpus, with Mendelssohn emerging as a stable outlier within this corpus; and (iii) separate contemporary neoclassical artists (Richter, Frahm, Glass, Arnalds, Jóhannsson) from historical composers on the quality of Zipfian fit to the transition distribution, with mean R² = 0.78 for neoclassical versus 0.46 for historical (N ≥ 10 pieces each). This gap is larger than the spread within either group and is consistent with a minimalist compositional tendency: a compact transition vocabulary used with sharper frequency-rank regularity than historical composers. All estimates are reported with Laplace-smoothed bootstrap 95% confidence intervals.
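The abstract reports Laplace-smoothed bootstrap 95% confidence intervals without saying how they are built. One plausible construction, sketched below, resamples whole pieces with replacement, pools their counts, smooths, and takes percentile bounds. The resampling unit (pieces), the percentile method, `alpha = 1.0`, and the synthetic counts are all assumptions rather than details confirmed by the paper.

```python
import numpy as np

def entropy_bits(counts, alpha=1.0):
    """Shannon entropy in bits of Laplace-smoothed counts (alpha is assumed)."""
    p = (counts + alpha) / (counts.sum() + alpha * counts.size)
    return float(-np.sum(p * np.log2(p)))

def bootstrap_ci(piece_counts, stat, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI, resampling pieces (rows) with replacement."""
    rng = np.random.default_rng(seed)
    piece_counts = np.asarray(piece_counts, dtype=float)
    n_pieces = piece_counts.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_pieces, size=n_pieces)
        stats[b] = stat(piece_counts[idx].sum(axis=0))  # pool resampled pieces
    lo, hi = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    return float(lo), float(hi)

# Synthetic data: 12 hypothetical pieces, 400 events each, 15 symbols.
rng = np.random.default_rng(1)
pieces = rng.multinomial(400, np.full(15, 1 / 15), size=12)

point = entropy_bits(pieces.sum(axis=0).astype(float))
low, high = bootstrap_ci(pieces, entropy_bits)
print(point, (low, high))
```

Resampling at the piece level keeps within-piece dependence intact; resampling individual notes instead would usually give narrower intervals.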

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper certifies transcription on MAESTRO, then profiles 1,238 piano pieces with entropy, KL divergence, and Zipfian fits on scale degrees. It recovers some lineages and separates neoclassical from historical composers on transition regularity, but the validation on contemporary recordings is thin.

read the letter

The main takeaway is that this work turns raw audio into composer profiles using a certified transcription step followed by standard information-theoretic tools. It processes over a thousand pieces, recovers some known stylistic links like Haydn to Beethoven through low KL divergence, and finds that neoclassical artists fit Zipf's law better on their scale-degree transitions than historical composers do, with a reported R-squared gap larger than within-group spread.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an audio-to-analysis pipeline with a certified transcription layer (F1 = 0.9791 on MAESTRO v3.0.0) that extracts empirical distributions over harmonic scale degrees from 1,238 piano pieces across 15 historical composers and 5 contemporary neoclassical artists. These distributions are analyzed using Shannon entropy, asymmetric KL divergence, and Zipfian rank-frequency modeling to produce composer-level profiles, with claims that the profiles order composers by harmonic predictability (narrow entropy range 3.33-3.86 bits), recover known stylistic lineages via minimal KL values, and separate neoclassical from historical composers on Zipfian fit quality (mean R² = 0.78 vs. 0.46, N ≥ 10 pieces each, larger than within-group spread). All estimates use Laplace-smoothed bootstrap 95% CIs.

Significance. If the central claims hold, the work supplies a scalable, benchmark-certified method for quantitative stylistic profiling of piano repertoire directly from audio, which could support reproducible musicological comparisons. The explicit certification of the transcription step with a concrete F1 score and bootstrap CIs on a public dataset is a clear strength that enhances reproducibility. The reported separation on Zipfian regularity, if robust, would constitute a falsifiable, information-theoretic signature of minimalist tendencies in contemporary neoclassical composition.

major comments (2)

[Methods (feature extraction and analysis pipeline)] The extraction of harmonic scale degrees from transcribed MIDI (including key detection, scale representation, and transition definition) is not described with sufficient specificity to allow reproduction or error analysis. This is load-bearing for the transition distributions, entropy values, KL divergences, and especially the Zipfian R² computations that underpin the separation claim.
[Results (composer separation and Zipfian modeling)] The separation result (mean R² = 0.78 neoclassical vs. 0.46 historical) is presented as evidence of a compact transition vocabulary in contemporary artists, but the transcription validation is confined to the MAESTRO classical benchmark. No style-specific error rates, perturbation tests, or sensitivity analysis for contemporary acoustics (e.g., repetitive patterns in Glass or ambient elements in Arnalds) are reported, leaving open the possibility that systematic pitch or duration biases differentially affect the empirical transition probabilities and R² values between groups.
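The first major comment concerns the path from transcribed notes to transition counts. A minimal sketch of one such path follows, to show which choices need documenting. It is not the paper's method: the 12-class chromatic mapping relative to a known tonic is a stand-in for the unspecified 15-symbol alphabet, note-to-note transitions and a joint (rather than row-normalized) table are assumed, and both function names are hypothetical.

```python
import numpy as np

def chromatic_degrees(midi_pitches, tonic_pitch_class):
    """Map MIDI pitches to semitone offsets 0..11 above the tonic (assumed mapping)."""
    return [(p - tonic_pitch_class) % 12 for p in midi_pitches]

def transition_matrix(degrees, k, alpha=1.0):
    """Laplace-smoothed joint distribution over consecutive degree pairs."""
    counts = np.zeros((k, k))
    for a, b in zip(degrees[:-1], degrees[1:]):
        counts[a, b] += 1
    return (counts + alpha) / (counts.sum() + alpha * k * k)

# C major fragment with tonic C (pitch class 0); a toy input, not real data.
pitches = [60, 62, 64, 65, 67, 65, 64, 62, 60]
deg = chromatic_degrees(pitches, tonic_pitch_class=0)
P = transition_matrix(deg, k=12)

print(deg)       # [0, 2, 4, 5, 7, 5, 4, 2, 0]
print(P.sum())   # joint table sums to 1 (up to floating point)
```

Each assumption here (key source, handling of modulation, melodic versus chordal transitions) changes the counts, which is why the referee treats the missing description as load-bearing.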

minor comments (2)

[Abstract] The abstract states that all estimates use Laplace-smoothed bootstrap CIs, but the exact smoothing parameter value and its justification are not stated in the provided text; this should be added for full reproducibility.
[Abstract and Results] Notation for composer names (e.g., J'ohannsson) and the precise definition of the harmonic alphabet size used for entropy calculations should be clarified to avoid ambiguity in the narrow reported entropy range.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments identify key areas where greater specificity and robustness checks will strengthen the manuscript. We respond point by point below and commit to the indicated revisions.

read point-by-point responses

Referee: [Methods (feature extraction and analysis pipeline)] The extraction of harmonic scale degrees from transcribed MIDI (including key detection, scale representation, and transition definition) is not described with sufficient specificity to allow reproduction or error analysis. This is load-bearing for the transition distributions, entropy values, KL divergences, and especially the Zipfian R² computations that underpin the separation claim.

Authors: We agree that the current Methods description of the harmonic scale-degree pipeline is insufficiently detailed for reproduction. In the revised manuscript we will expand the relevant subsection to specify: the key-detection algorithm and its parameters (with reference to the implementation), the exact mapping from MIDI pitches to scale degrees (including treatment of chromaticism, modulations, and non-diatonic notes), and the operational definition of transitions (note-to-note versus chordal, temporal windowing if any). We will also insert pseudocode that traces the full path from aligned MIDI events to the Laplace-smoothed empirical transition matrix. These additions will make the entropy, KL, and Zipfian R² calculations directly replicable. revision: yes
Referee: [Results (composer separation and Zipfian modeling)] The separation result (mean R² = 0.78 neoclassical vs. 0.46 historical) is presented as evidence of a compact transition vocabulary in contemporary artists, but the transcription validation is confined to the MAESTRO classical benchmark. No style-specific error rates, perturbation tests, or sensitivity analysis for contemporary acoustics (e.g., repetitive patterns in Glass or ambient elements in Arnalds) are reported, leaving open the possibility that systematic pitch or duration biases differentially affect the empirical transition probabilities and R² values between groups.

Authors: We acknowledge that the certified F1 score is reported only on the MAESTRO classical test set and that no style-specific error analysis was performed for the contemporary neoclassical recordings. In the revision we will add a dedicated sensitivity subsection that (i) manually inspects a stratified sample of the contemporary transcriptions for common error patterns (repeated-note omissions, duration smearing in ambient textures) and (ii) applies controlled perturbations to note onsets, offsets, and pitches at rates consistent with the observed MAESTRO error profile, then recomputes the transition distributions and R² values for both groups. The results of this analysis will be reported with the same bootstrap CIs; if the neoclassical–historical gap remains larger than within-group variability, the separation claim will be retained with an explicit robustness statement; otherwise the interpretation will be qualified. revision: yes
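The controlled-perturbation check described in the second response could be prototyped along the following lines: corrupt a fraction of symbols, rebuild the smoothed transition table, and track how the Zipfian R² moves. Everything specific here is an assumption: the 2% error rate is a placeholder loosely motivated by the reported F1, uniform symbol substitution is only one possible error model, and the degree sequence is synthetic.

```python
import numpy as np

def joint_transitions(seq, k, alpha=1.0):
    """Laplace-smoothed joint distribution over consecutive symbol pairs."""
    counts = np.zeros((k, k))
    np.add.at(counts, (seq[:-1], seq[1:]), 1)
    return (counts + alpha) / (counts.sum() + alpha * k * k)

def zipf_r2(joint):
    """R^2 of an OLS line through log(frequency) versus log(rank)."""
    freqs = np.sort(joint.ravel())[::-1]
    x = np.log(np.arange(1, freqs.size + 1))
    y = np.log(freqs)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - resid.dot(resid) / np.sum((y - y.mean()) ** 2)

def perturb(seq, k, error_rate, rng):
    """Return a copy with a random fraction of symbols redrawn uniformly."""
    out = seq.copy()
    hit = rng.random(out.size) < error_rate
    out[hit] = rng.integers(0, k, size=hit.sum())
    return out

def r2_under_noise(seq, k, error_rate, n_trials=200, seed=0):
    """Zipfian R^2 across repeated random perturbations of one sequence."""
    rng = np.random.default_rng(seed)
    return np.array([
        zipf_r2(joint_transitions(perturb(seq, k, error_rate, rng), k))
        for _ in range(n_trials)
    ])

k = 15
seq = np.random.default_rng(42).integers(0, k, size=5000)  # synthetic sequence
baseline = zipf_r2(joint_transitions(seq, k))
noisy = r2_under_noise(seq, k, error_rate=0.02)            # placeholder rate
print(baseline, noisy.mean(), noisy.std())
```

Running this separately for each group and comparing the shift in R² with the reported 0.78 versus 0.46 gap would show whether transcription noise of this size could plausibly account for the separation.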

Circularity Check

0 steps flagged

No circularity: all profiles are direct empirical computations from transcribed counts

full rationale

The pipeline transcribes audio to obtain empirical distributions over harmonic scale degrees, then applies standard, parameter-light measures (Shannon entropy, asymmetric KL divergence, and R² of Zipfian rank-frequency fit) to those counts. No equation in the described chain defines an output in terms of itself or renames a fitted parameter as a 'prediction.' The reported separation (mean R² 0.78 vs 0.46) is an observed statistical difference between two groups of independently transcribed pieces, not a definitional tautology. Transcription fidelity is validated on the external MAESTRO benchmark rather than by self-reference. No self-citation load-bearing steps or ansatz smuggling appear in the derivation.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claims rest on standard information-theoretic definitions and one domain assumption about what harmonic scale degrees capture; no new entities are introduced and only routine smoothing is used.

free parameters (1)

Laplace smoothing parameter
Applied to empirical distributions before entropy and KL calculations to avoid zero probabilities; value is the conventional additive constant.
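For reference, additive (Laplace) smoothing over an alphabet of $K$ symbols is usually written as below. The symbols $n_i$, $N$, $K$, and $\alpha$ are notation introduced here for illustration; the page does not state the value of $\alpha$, and $\alpha = 1$ is only the common "add-one" convention.

```latex
\hat{p}_i = \frac{n_i + \alpha}{N + \alpha K},
\qquad N = \sum_{j=1}^{K} n_j,
\qquad \alpha > 0 .
```

Every $\hat{p}_i$ is strictly positive, so $\log \hat{p}_i$ and ratios $\hat{p}_i / \hat{q}_i$ stay finite. The cost is a pull toward the uniform distribution that grows with $\alpha K / N$, which matters most for sparse 15×15 transition tables.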

axioms (2)

standard math: Shannon entropy and Kullback-Leibler divergence are the appropriate measures for comparing discrete distributions over scale degrees
Invoked without derivation as the analytic tools for the profiles.
domain assumption: Empirical distributions over harmonic scale degrees extracted from transcribed audio reflect compositional vocabulary
Central premise that allows the pipeline to produce stylistic profiles.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Zipfian rank-frequency fits on the 15×15 transition distribution... mean R² = 0.78 for neoclassical versus 0.46 for historical

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

vega-mir: An information-theoretic Python toolkit for symbolic music, with applications to harmonic graphs and rubato spectra
cs.SD 2026-05 unverdicted novelty 6.0

vega-mir bundles nine metrics for symbolic music and applies network and spectral analysis to find a 0.61 correlation between composer graph centrality and KL divergence plus structured rubato in Bach performers.

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages · cited by 1 Pith paper

[1] Agresti, A., & Coull, B. A. (1998). Approximate is better than “exact” for interval estimation of binomial proportions. The American Statistician, 52(2), 119–126.

[2] Bogdanov, D., et al. (2013). Essentia: An Audio Analysis Library for Music Information Retrieval. ISMIR 2013.

[3] Bradshaw, L., et al. (2025). Aria-MIDI: A Dataset of Piano MIDI Files for Symbolic Music Modeling. ICLR 2025.

[4] Cover, T. M., & Thomas, J. A. (2006). Elements of Information Theory (2nd ed.). Wiley.

[5] Cuthbert, M. S., & Ariza, C. (2010). music21: A toolkit for computer-aided musicology and symbolic music data. ISMIR 2010.

[6] Febres, G., & Jaffé, K. (2017). Music viewed by its entropy content: A novel window for comparative analysis. PLOS ONE, 12(10), e0185757.

[7] Hawthorne, C., Elsen, E., Song, J., Roberts, A., Simon, I., Raffel, C., Engel, J., Oore, S., & Eck, D. (2018). Onsets and Frames: Dual-Objective Piano Transcription. ISMIR 2018.

[8] Hawthorne, C., Stasyuk, A., Roberts, A., Simon, I., Huang, C.-Z. A., Dieleman, S., Elsen, E., Engel, J., & Eck, D. (2019). Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset. ICLR 2019.

[9] Knopoff, L., & Hutchinson, W. (1981). Information Theory for Musical Continua. Journal of Music Theory, 25(1), 17–44.

[10] Knopoff, L., & Hutchinson, W. (1983). Entropy as a Measure of Style: The Influence of Sample Length. Journal of Music Theory, 27(1), 75–97.

[11] Kong, Q., Li, B., Song, X., Wan, Y., & Wang, Y. (2021). High-resolution Piano Transcription with Pedals by Regressing Onset and Offset Times. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 3707–3717.

[12] Kong, Q., et al. (2020). GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music. arXiv:2010.07061.

[13] Liu, L., Wei, J., Zhang, H., Xin, J., & Huang, J. (2013). A statistical physics view of pitch fluctuations in the classical music from Bach to Chopin: Evidence for scaling. PLOS ONE, 8(3), e58710.

[14] Manaris, B., Romero, J., Machado, P., Krehbiel, D., Hirzel, T., Pharr, W., & Davis, R. B. (2005). Zipf’s Law, Music Classification, and Aesthetics. Computer Music Journal, 29(1), 55–69.

[15] McKay, C. (2010). Automatic Music Classification with jSymbolic. PhD thesis, McGill University.

[16] Pearce, M. T., & Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. Music Perception, 23(5), 377–405.

[17] Raffel, C., McFee, B., Humphrey, E. J., Salamon, J., Nieto, O., Liang, D., & Ellis, D. P. W. (2014). mir_eval: A transparent implementation of common MIR metrics. ISMIR 2014.

[18] Sakellariou, J., Tria, F., Loreto, V., & Pachet, F. (2017). Maximum entropy models capture melodic styles. Scientific Reports, 7, 9172.

[19] Serrà, J., Corral, Á., Boguñá, M., Haro, M., & Arcos, J. L. (2019). Zipf’s law in music emerges by a natural choice of Zipfian units. Scientific Reports, 9, 2646.

[20] Temperley, D. (2014). Information Flow and Repetition in Music. Journal of Music Theory, 58(2), 155–178.

[21] Voss, R. F., & Clarke, J. (1975). 1/f noise in music and speech. Nature, 258, 317–318.

[22] Lu, W.-T., Wang, J.-C., Kong, Q., & Hung, Y.-N. (2023). Music Source Separation with Band-Split RoPE Transformer. Sound Demixing Challenge (SDX23). arXiv:2309.02612.

[23] Faraldo, Á., Jordà, S., & Herrera, P. (2016). A Multi-Profile Method for Key Estimation in EDM. AES Conference on Semantic Audio, 2016.

[24] Weiss, C. (2017). Computational Methods for Tonality-Based Style Analysis of Classical Music Audio Recordings. PhD thesis, Technische Universität Ilmenau.

[25] Zipf, G. K. (1949). Human Behavior and the Principle of Least Effort. Addison-Wesley.
