An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire
Pith reviewed 2026-05-11 00:44 UTC · model grok-4.3
The pith
A certified audio transcription pipeline yields composer profiles that separate neoclassical piano artists from historical composers by tighter Zipfian fits in note transitions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Applied to 15 MAESTRO composers with at least ten pieces each, spanning the Baroque to the early twentieth century, plus contemporary neoclassical artists, the pipeline shows that neoclassical transition distributions follow Zipf's law more closely, with mean R² = 0.78 versus 0.46 for historical composers. The same profiles order composers by harmonic predictability within a narrow entropy band and recover stylistic lineages through minimal KL divergences, while Mendelssohn appears as a consistent outlier.
What carries the argument
The audio-to-analysis pipeline with a certified transcription layer that produces harmonic scale degree distributions for subsequent entropy, asymmetric KL divergence, and Zipfian rank-frequency analysis.
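To make the step from transcribed notes to a transition distribution concrete, here is a minimal sketch. It assumes the scale degrees are already encoded as integers and that transitions are first-order (note to next note); the paper's actual encoding, transition definition, and smoothing constant are not given in the text, so `transition_probabilities`, `alphabet_size`, and `alpha` are hypothetical names and values.

```python
from collections import Counter

def transition_probabilities(degrees, alphabet_size, alpha=1.0):
    """Laplace-smoothed first-order transition matrix over scale degrees.

    degrees: sequence of ints in range(alphabet_size) (hypothetical encoding).
    alpha: additive smoothing constant (assumed; not stated in the paper).
    """
    # Count each adjacent pair (current degree, next degree).
    counts = Counter(zip(degrees, degrees[1:]))
    matrix = []
    for i in range(alphabet_size):
        row_total = sum(counts[(i, j)] for j in range(alphabet_size))
        denom = row_total + alpha * alphabet_size
        # Add alpha to every cell so unseen transitions keep nonzero probability.
        matrix.append([(counts[(i, j)] + alpha) / denom
                       for j in range(alphabet_size)])
    return matrix

# Toy sequence over a 5-symbol alphabet, for illustration only.
toy = [0, 4, 0, 4, 2, 0]
P = transition_probabilities(toy, alphabet_size=5)
```

Each row of `P` is a conditional distribution over the next degree. Rows with no observed transitions fall back to uniform, which is one reason the choice of `alpha` matters for sparse composers.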
Load-bearing premise
Distributions over harmonic scale degrees taken from transcribed audio faithfully reflect each composer's compositional vocabulary without substantial distortion from performance practice, transcription errors on non-standard passages, or the scale-degree representation chosen.
What would settle it
A fresh set of transcriptions for the same 1,238 pieces that removes the R² gap between neoclassical (mean 0.78) and historical (mean 0.46) groups while preserving the within-group spreads would falsify the separation result.
Original abstract
We present an audio-to-analysis pipeline that produces composer-level information-theoretic profiles (reflecting compositional vocabulary as it emerges from aggregated performances) from raw recordings, built on a transcription layer whose accuracy we certify on a standard benchmark (F1 = 0.9791 on the MAESTRO v3.0.0 test set). Applied to 1,238 pieces and 15 MAESTRO composers with at least ten attributed pieces, spanning the Baroque through the early twentieth century, the pipeline derives empirical distributions over harmonic scale degrees and analyzes them through Shannon entropy, asymmetric Kullback-Leibler divergence, and Zipfian rank-frequency modeling. The resulting profiles (i) order composers along an interpretable axis of harmonic predictability, with a narrow entropy range (3.33-3.86 bits) that reveals the marginal-level similarity of tonal vocabularies; (ii) recover known stylistic lineages (Haydn-Beethoven, Liszt-Rachmaninoff, Schubert-Schumann) through the smallest KL divergences in the corpus, with Mendelssohn emerging as a stable outlier within this corpus; and (iii) separate contemporary neoclassical artists (Richter, Frahm, Glass, Arnalds, Jóhannsson) from historical composers on the quality of Zipfian fit to the transition distribution, with mean R² = 0.78 for neoclassical versus 0.46 for historical (N ≥ 10 pieces each). This gap is larger than the spread within either group and is consistent with a minimalist compositional tendency: a compact transition vocabulary used with sharper frequency-rank regularity than historical composers. All estimates are reported with Laplace-smoothed bootstrap 95% confidence intervals.
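The abstract does not say how the Zipfian R² is computed. One common reading is an ordinary least-squares line fitted to log frequency against log rank, and the sketch below follows that reading as an assumption. `zipf_fit` is a hypothetical helper, not the authors' code.

```python
import math

def zipf_fit(counts):
    """Return (slope, r_squared) of a least-squares line in log-log space.

    counts: raw frequencies of each transition type, in any order.
    Assumes log(frequency) is regressed on log(rank); the paper may differ.
    """
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    if len(freqs) < 3:
        raise ValueError("need at least three nonzero counts")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if syy == 0:
        raise ValueError("all frequencies equal; R^2 is undefined")
    # For simple linear regression, R^2 equals the squared correlation.
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

# Counts proportional to 1/rank are an exact Zipf law with exponent 1.
slope, r2 = zipf_fit([1200, 600, 400, 300, 240])
```

Under this reading, a higher R² means the rank-frequency curve is closer to a straight line in log-log space. It says nothing by itself about the size of the vocabulary or the slope of the line.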
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an audio-to-analysis pipeline with a certified transcription layer (F1 = 0.9791 on MAESTRO v3.0.0) that extracts empirical distributions over harmonic scale degrees from 1,238 piano pieces across 15 historical composers and 5 contemporary neoclassical artists. These distributions are analyzed using Shannon entropy, asymmetric KL divergence, and Zipfian rank-frequency modeling to produce composer-level profiles, with claims that the profiles order composers by harmonic predictability (narrow entropy range 3.33-3.86 bits), recover known stylistic lineages via minimal KL values, and separate neoclassical from historical composers on Zipfian fit quality (mean R² = 0.78 vs. 0.46, N ≥ 10 pieces each, larger than within-group spread). All estimates use Laplace-smoothed bootstrap 95% CIs.
Significance. If the central claims hold, the work supplies a scalable, benchmark-certified method for quantitative stylistic profiling of piano repertoire directly from audio, which could support reproducible musicological comparisons. The explicit certification of the transcription step with a concrete F1 score and bootstrap CIs on a public dataset is a clear strength that enhances reproducibility. The reported separation on Zipfian regularity, if robust, would constitute a falsifiable, information-theoretic signature of minimalist tendencies in contemporary neoclassical composition.
major comments (2)
- [Methods (feature extraction and analysis pipeline)] The extraction of harmonic scale degrees from transcribed MIDI (including key detection, scale representation, and transition definition) is not described with sufficient specificity to allow reproduction or error analysis. This is load-bearing for the transition distributions, entropy values, KL divergences, and especially the Zipfian R² computations that underpin the separation claim.
- [Results (composer separation and Zipfian modeling)] The separation result (mean R² = 0.78 neoclassical vs. 0.46 historical) is presented as evidence of a compact transition vocabulary in contemporary artists, but the transcription validation is confined to the MAESTRO classical benchmark. No style-specific error rates, perturbation tests, or sensitivity analysis for contemporary acoustics (e.g., repetitive patterns in Glass or ambient elements in Arnalds) are reported, leaving open the possibility that systematic pitch or duration biases differentially affect the empirical transition probabilities and R² values between groups.
minor comments (2)
- [Abstract] The abstract states that all estimates use Laplace-smoothed bootstrap CIs, but the exact smoothing parameter value and its justification are not stated in the provided text; this should be added for full reproducibility.
- [Abstract and Results] Notation for composer names (e.g., J'ohannsson) and the precise definition of the harmonic alphabet size used for entropy calculations should be clarified to avoid ambiguity in the narrow reported entropy range.
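The first minor comment turns on how the smoothing constant enters the interval estimate. As a sketch of what a Laplace-smoothed bootstrap interval could look like, the code below resamples symbols with replacement, smooths each replicate, and takes percentile bounds. This is an assumption on several counts: the paper may resample whole pieces rather than notes, and the values of `alpha` and `n_boot` here are illustrative only.

```python
import math
import random
from collections import Counter

def smoothed_entropy(symbols, alphabet_size, alpha=1.0):
    """Shannon entropy (bits) of the Laplace-smoothed symbol distribution."""
    counts = Counter(symbols)
    total = len(symbols) + alpha * alphabet_size
    probs = [(counts[s] + alpha) / total for s in range(alphabet_size)]
    return -sum(p * math.log2(p) for p in probs)

def bootstrap_entropy_ci(symbols, alphabet_size, alpha=1.0,
                         n_boot=1000, seed=0):
    """Percentile 95% interval for the smoothed entropy.

    Resamples individual symbols, which ignores serial dependence;
    resampling pieces would be the more conservative choice.
    """
    rng = random.Random(seed)
    n = len(symbols)
    stats = sorted(
        smoothed_entropy(rng.choices(symbols, k=n), alphabet_size, alpha)
        for _ in range(n_boot)
    )
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]

# Toy data over a 4-symbol alphabet, for illustration only.
toy_symbols = [0, 1, 2, 3] * 15
low, high = bootstrap_entropy_ci(toy_symbols, alphabet_size=4)
```

Because smoothing pulls every replicate toward the uniform distribution, a larger `alpha` raises the entropy estimates and narrows their spread. This is why the referee asks for the exact value.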
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments identify key areas where greater specificity and robustness checks will strengthen the manuscript. We respond point by point below and commit to the indicated revisions.
-
Referee: [Methods (feature extraction and analysis pipeline)] The extraction of harmonic scale degrees from transcribed MIDI (including key detection, scale representation, and transition definition) is not described with sufficient specificity to allow reproduction or error analysis. This is load-bearing for the transition distributions, entropy values, KL divergences, and especially the Zipfian R² computations that underpin the separation claim.
Authors: We agree that the current Methods description of the harmonic scale-degree pipeline is insufficiently detailed for reproduction. In the revised manuscript we will expand the relevant subsection to specify: the key-detection algorithm and its parameters (with reference to the implementation), the exact mapping from MIDI pitches to scale degrees (including treatment of chromaticism, modulations, and non-diatonic notes), and the operational definition of transitions (note-to-note versus chordal, temporal windowing if any). We will also insert pseudocode that traces the full path from aligned MIDI events to the Laplace-smoothed empirical transition matrix. These additions will make the entropy, KL, and Zipfian R² calculations directly replicable. revision: yes
-
Referee: [Results (composer separation and Zipfian modeling)] The separation result (mean R² = 0.78 neoclassical vs. 0.46 historical) is presented as evidence of a compact transition vocabulary in contemporary artists, but the transcription validation is confined to the MAESTRO classical benchmark. No style-specific error rates, perturbation tests, or sensitivity analysis for contemporary acoustics (e.g., repetitive patterns in Glass or ambient elements in Arnalds) are reported, leaving open the possibility that systematic pitch or duration biases differentially affect the empirical transition probabilities and R² values between groups.
Authors: We acknowledge that the certified F1 score is reported only on the MAESTRO classical test set and that no style-specific error analysis was performed for the contemporary neoclassical recordings. In the revision we will add a dedicated sensitivity subsection that (i) manually inspects a stratified sample of the contemporary transcriptions for common error patterns (repeated-note omissions, duration smearing in ambient textures) and (ii) applies controlled perturbations to note onsets, offsets, and pitches at rates consistent with the observed MAESTRO error profile, then recomputes the transition distributions and R² values for both groups. The results of this analysis will be reported with the same bootstrap CIs; if the neoclassical–historical gap remains larger than within-group variability, the separation claim will be retained with an explicit robustness statement; otherwise the interpretation will be qualified. revision: yes
Circularity Check
No circularity: all profiles are direct empirical computations from transcribed counts
The pipeline transcribes audio to obtain empirical distributions over harmonic scale degrees, then applies standard, parameter-light measures (Shannon entropy, asymmetric KL divergence, and R² of Zipfian rank-frequency fit) to those counts. No equation in the described chain defines an output in terms of itself or renames a fitted parameter as a 'prediction.' The reported separation (mean R² 0.78 vs 0.46) is an observed statistical difference between two groups of independently transcribed pieces, not a definitional tautology. Transcription fidelity is validated on the external MAESTRO benchmark rather than by self-reference. No self-citation load-bearing steps or ansatz smuggling appear in the derivation.
Axiom & Free-Parameter Ledger
free parameters (1)
- Laplace smoothing parameter
axioms (2)
- [standard math] Shannon entropy and Kullback-Leibler divergence are the appropriate measures for comparing discrete distributions over scale degrees
- [domain assumption] Empirical distributions over harmonic scale degrees extracted from transcribed audio reflect compositional vocabulary
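For reference, the two measures named in the first axiom can be written down directly. This is a sketch of the textbook definitions in bits, not the paper's implementation; the function names are illustrative.

```python
import math

def shannon_entropy(p):
    """H(p) = -sum p_i * log2(p_i), in bits; zero-probability terms contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_divergence(p, q):
    """D_KL(p || q) in bits. Requires q_i > 0 wherever p_i > 0,
    which Laplace smoothing guarantees."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

uniform4 = [0.25, 0.25, 0.25, 0.25]
p = [0.5, 0.5]
q = [0.9, 0.1]
h = shannon_entropy(uniform4)   # uniform over 4 symbols: 2 bits
forward = kl_divergence(p, q)   # D_KL(p || q)
backward = kl_divergence(q, p)  # D_KL(q || p), generally different
```

The two directions of the divergence differ, which is what "asymmetric" refers to in the abstract. As a point of scale, a uniform distribution over 15 symbols has log2(15) ≈ 3.91 bits, so the reported 3.33-3.86 bit range requires an alphabet of at least 15 symbols.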
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Zipfian rank-frequency fits on the 15×15 transition distribution... mean R² = 0.78 for neoclassical versus 0.46 for historical
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
-
vega-mir: An information-theoretic Python toolkit for symbolic music, with applications to harmonic graphs and rubato spectra
vega-mir bundles nine metrics for symbolic music and applies network and spectral analysis to find a 0.61 correlation between composer graph centrality and KL divergence plus structured rubato in Bach performers.