
arxiv: 2605.06685 · v1 · submitted 2026-04-25 · cs.SD · eess.AS · stat.AP

An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire

Pith reviewed 2026-05-11 00:44 UTC · model grok-4.3

classification cs.SD · eess.AS · stat.AP
keywords audio transcription · piano repertoire · information theory · Zipfian distribution · harmonic scale degrees · Kullback-Leibler divergence · composer profiling · neoclassical music

The pith

A certified audio transcription pipeline yields composer profiles that separate neoclassical piano artists from historical composers by tighter Zipfian fits in note transitions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds an audio-to-analysis pipeline that starts with a transcription layer certified at F1 = 0.9791 on the MAESTRO benchmark and then extracts empirical distributions over harmonic scale degrees from 1,238 piano pieces. It computes Shannon entropy to gauge predictability, asymmetric Kullback-Leibler divergence to recover stylistic connections, and Zipfian rank-frequency modeling to measure regularity in transitions. The resulting profiles place historical composers along a narrow entropy axis from 3.33 to 3.86 bits and recover known lineages such as Haydn-Beethoven through the smallest divergences. Neoclassical artists show a mean R² of 0.78 on Zipfian fits versus 0.46 for historical composers, a gap larger than the spread inside either group.
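As a rough illustration of the entropy step, the sketch below computes Shannon entropy in bits from symbol counts. The function name, the toy sequence, and the 15-symbol alphabet are assumptions made for this example: the Figure 3 caption mentions 225 transitions, which would be consistent with 15 × 15 symbols, but the page does not state the alphabet size.

```python
import math
from collections import Counter

def shannon_entropy_bits(counts):
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    entropy = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical toy sequence of scale-degree symbols (not data from the paper).
toy_degrees = ["1", "5", "1", "3", "5", "1", "4", "5"]
toy_counts = Counter(toy_degrees)
toy_entropy = shannon_entropy_bits(toy_counts)

# Upper bound reached by a uniform distribution over an ASSUMED 15-symbol alphabet.
max_entropy_15 = math.log2(15)
```

If the alphabet really had 15 symbols, the ceiling would be about 3.91 bits, so a reported band of 3.33 to 3.86 bits would sit close to the uniform limit.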

Core claim

Applied to 15 MAESTRO composers with at least ten pieces each, spanning the Baroque through the early twentieth century, plus contemporary neoclassical artists, the pipeline shows that neoclassical transition distributions follow Zipf's law more closely, with mean R² = 0.78 versus 0.46 for historical composers. The same profiles order composers by harmonic predictability within a narrow entropy band and recover stylistic lineages through minimal KL divergences, while Mendelssohn appears as a consistent outlier.
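One common way to score a Zipfian fit is an ordinary least-squares line through (log rank, log frequency), with R² as the fit quality. The sketch below takes that approach; whether the paper fits in exactly this way is not stated on this page, and the function name and both frequency profiles are invented for illustration.

```python
import math

def zipf_fit(frequencies):
    """Fit log(frequency) = a - s * log(rank) by ordinary least squares.

    Returns (s, r_squared): the Zipf exponent and the coefficient of
    determination of the log-log regression.
    """
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    # For a simple linear regression, R^2 equals the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return -slope, r_squared

# Hypothetical profiles over 225 ranks: an exact power law, and a geometric
# decay that is linear in rank rather than in log(rank).
power_law = [1000.0 / rank ** 1.4 for rank in range(1, 226)]
geometric = [1000.0 * math.exp(-0.05 * rank) for rank in range(1, 226)]

s_power, r2_power = zipf_fit(power_law)
s_geo, r2_geo = zipf_fit(geometric)
```

The exact power law recovers its exponent with R² essentially equal to 1, while the geometric profile scores visibly lower, which is the kind of contrast an R² comparison between groups relies on.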

What carries the argument

The audio-to-analysis pipeline with a certified transcription layer that produces harmonic scale degree distributions for subsequent entropy, asymmetric KL divergence, and Zipfian rank-frequency analysis.
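The asymmetry of the KL step can be seen in a few lines. This is a minimal sketch with made-up three-symbol distributions, assuming both distributions are strictly positive on the shared alphabet (for instance after additive smoothing); the function name is illustrative.

```python
import math

def kl_divergence_bits(p, q):
    """D_KL(p || q) in bits for two distributions over the same symbols.

    Assumes q[k] > 0 wherever p[k] > 0 (e.g. after additive smoothing).
    """
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)

# Hypothetical distributions over three scale degrees (not from the paper).
p = {"1": 0.5, "4": 0.25, "5": 0.25}
r = {"1": 0.7, "4": 0.2, "5": 0.1}

d_pr = kl_divergence_bits(p, r)  # cost of describing p using r
d_rp = kl_divergence_bits(r, p)  # cost of describing r using p
```

Because the two directions differ, a table of pairwise divergences between composers is not symmetric, and "smallest divergence" has to be read with a direction in mind.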

Load-bearing premise

Distributions over harmonic scale degrees taken from transcribed audio faithfully reflect each composer's compositional vocabulary without substantial distortion from performance practice, transcription errors on non-standard passages, or the scale-degree representation chosen.

What would settle it

A fresh set of transcriptions for the same 1,238 pieces that removes the R² gap between neoclassical (mean 0.78) and historical (mean 0.46) groups while preserving the within-group spreads would falsify the separation result.
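The falsification criterion compares a between-group gap with within-group spread. The sketch below shows one way to run that comparison, assuming "spread" means the range (max minus min) within a group; the page does not define it. The per-artist R² values are invented placeholders, chosen only so the group means match the reported 0.78 and 0.46.

```python
from statistics import mean

def gap_exceeds_spread(group_a, group_b):
    """Return (gap, spread_a, spread_b, verdict), using range as the spread."""
    gap = abs(mean(group_a) - mean(group_b))
    spread_a = max(group_a) - min(group_a)
    spread_b = max(group_b) - min(group_b)
    return gap, spread_a, spread_b, gap > max(spread_a, spread_b)

# PLACEHOLDER values, not results from the paper.
neoclassical_r2 = [0.74, 0.76, 0.78, 0.80, 0.82]
historical_r2 = [0.40, 0.43, 0.46, 0.49, 0.52]

gap, spread_neo, spread_hist, separated = gap_exceeds_spread(
    neoclassical_r2, historical_r2
)
```

Rerunning the same check on a fresh set of transcriptions would show whether the separation survives; a standard deviation or a bootstrap interval could replace the range without changing the structure.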

Figures

Figures reproduced from arXiv: 2605.06685 by Fred Jalbert-Desforges.

Figure 1. Shannon entropy of the scale-degree marginal, …
Figure 2. Zipfian R² gap between neoclassical artists …
Figure 3. Why the R² gap: rank-frequency of scale-degree transitions. Top-30 transitions per composer (dots) against the Zipfian regression line fit on all 225 transitions (dashed red). Slope exponents are comparable (Glass = 1.41, Chopin = 1.40) but the fit quality is radically different: Glass's distribution tracks the line across the full range, while Chopin's distribution is too flat for a power law to explain.…
Figure 4. Figure B.1: KL heatmap
Original abstract

We present an audio-to-analysis pipeline that produces composer-level information-theoretic profiles (reflecting compositional vocabulary as it emerges from aggregated performances) from raw recordings, built on a transcription layer whose accuracy we certify on a standard benchmark (F1 = 0.9791 on the MAESTRO v3.0.0 test set). Applied to 1,238 pieces and 15 MAESTRO composers with at least ten attributed pieces, spanning the Baroque through the early twentieth century, the pipeline derives empirical distributions over harmonic scale degrees and analyzes them through Shannon entropy, asymmetric Kullback-Leibler divergence, and Zipfian rank-frequency modeling. The resulting profiles (i) order composers along an interpretable axis of harmonic predictability, with a narrow entropy range (3.33-3.86 bits) that reveals the marginal-level similarity of tonal vocabularies; (ii) recover known stylistic lineages (Haydn-Beethoven, Liszt-Rachmaninoff, Schubert-Schumann) through the smallest KL divergences in the corpus, with Mendelssohn emerging as a stable outlier within this corpus; and (iii) separate contemporary neoclassical artists (Richter, Frahm, Glass, Arnalds, J\'ohannsson) from historical composers on the quality of Zipfian fit to the transition distribution, with mean R² = 0.78 for neoclassical versus 0.46 for historical (N ≥ 10 pieces each). This gap is larger than the spread within either group and is consistent with a minimalist compositional tendency: a compact transition vocabulary used with sharper frequency-rank regularity than historical composers. All estimates are reported with Laplace-smoothed bootstrap 95% confidence intervals.
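A Laplace-smoothed bootstrap interval can be sketched as follows. Several choices here are assumptions rather than the paper's settings: the smoothing constant alpha = 1, the percentile method, resampling individual symbols (the paper may resample pieces instead), and the seven-symbol toy alphabet. All names and data are illustrative.

```python
import math
import random
from collections import Counter

def smoothed_entropy_bits(symbols, alphabet, alpha=1.0):
    """Entropy in bits of the additively smoothed distribution over alphabet."""
    counts = Counter(symbols)
    total = len(symbols) + alpha * len(alphabet)
    entropy = 0.0
    for a in alphabet:
        p = (counts[a] + alpha) / total
        entropy -= p * math.log2(p)
    return entropy

def bootstrap_ci(symbols, alphabet, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the smoothed entropy."""
    rng = random.Random(seed)
    stats = sorted(
        smoothed_entropy_bits(rng.choices(symbols, k=len(symbols)), alphabet)
        for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]

# Hypothetical data: 500 symbols drawn from a skewed 7-symbol alphabet.
alphabet = [str(d) for d in range(1, 8)]
data_rng = random.Random(42)
toy_symbols = data_rng.choices(alphabet, weights=[8, 1, 4, 2, 6, 1, 3], k=500)

point = smoothed_entropy_bits(toy_symbols, alphabet)
ci_low, ci_high = bootstrap_ci(toy_symbols, alphabet)
```

Smoothing inside each resample keeps every probability positive, so the same resamples could also feed a KL computation without division by zero.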

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an audio-to-analysis pipeline with a certified transcription layer (F1 = 0.9791 on MAESTRO v3.0.0) that extracts empirical distributions over harmonic scale degrees from 1,238 piano pieces across 15 historical composers and 5 contemporary neoclassical artists. These distributions are analyzed using Shannon entropy, asymmetric KL divergence, and Zipfian rank-frequency modeling to produce composer-level profiles, with claims that the profiles order composers by harmonic predictability (narrow entropy range 3.33-3.86 bits), recover known stylistic lineages via minimal KL values, and separate neoclassical from historical composers on Zipfian fit quality (mean R² = 0.78 vs. 0.46, N ≥ 10 pieces each, larger than within-group spread). All estimates use Laplace-smoothed bootstrap 95% CIs.

Significance. If the central claims hold, the work supplies a scalable, benchmark-certified method for quantitative stylistic profiling of piano repertoire directly from audio, which could support reproducible musicological comparisons. The explicit certification of the transcription step with a concrete F1 score and bootstrap CIs on a public dataset is a clear strength that enhances reproducibility. The reported separation on Zipfian regularity, if robust, would constitute a falsifiable, information-theoretic signature of minimalist tendencies in contemporary neoclassical composition.

major comments (2)
  1. [Methods (feature extraction and analysis pipeline)] The extraction of harmonic scale degrees from transcribed MIDI (including key detection, scale representation, and transition definition) is not described with sufficient specificity to allow reproduction or error analysis. This is load-bearing for the transition distributions, entropy values, KL divergences, and especially the Zipfian R² computations that underpin the separation claim.
  2. [Results (composer separation and Zipfian modeling)] The separation result (mean R² = 0.78 neoclassical vs. 0.46 historical) is presented as evidence of a compact transition vocabulary in contemporary artists, but the transcription validation is confined to the MAESTRO classical benchmark. No style-specific error rates, perturbation tests, or sensitivity analysis for contemporary acoustics (e.g., repetitive patterns in Glass or ambient elements in Arnalds) are reported, leaving open the possibility that systematic pitch or duration biases differentially affect the empirical transition probabilities and R² values between groups.
minor comments (2)
  1. [Abstract] The abstract states that all estimates use Laplace-smoothed bootstrap CIs, but the exact smoothing parameter value and its justification are not stated in the provided text; this should be added for full reproducibility.
  2. [Abstract and Results] Notation for composer names (e.g., J'ohannsson) and the precise definition of the harmonic alphabet size used for entropy calculations should be clarified to avoid ambiguity in the narrow reported entropy range.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments identify key areas where greater specificity and robustness checks will strengthen the manuscript. We respond point by point below and commit to the indicated revisions.

  1. Referee: [Methods (feature extraction and analysis pipeline)] The extraction of harmonic scale degrees from transcribed MIDI (including key detection, scale representation, and transition definition) is not described with sufficient specificity to allow reproduction or error analysis. This is load-bearing for the transition distributions, entropy values, KL divergences, and especially the Zipfian R² computations that underpin the separation claim.

    Authors: We agree that the current Methods description of the harmonic scale-degree pipeline is insufficiently detailed for reproduction. In the revised manuscript we will expand the relevant subsection to specify: the key-detection algorithm and its parameters (with reference to the implementation), the exact mapping from MIDI pitches to scale degrees (including treatment of chromaticism, modulations, and non-diatonic notes), and the operational definition of transitions (note-to-note versus chordal, temporal windowing if any). We will also insert pseudocode that traces the full path from aligned MIDI events to the Laplace-smoothed empirical transition matrix. These additions will make the entropy, KL, and Zipfian R² calculations directly replicable. revision: yes
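Pending the authors' promised pseudocode, the path from pitches to a smoothed transition matrix might look like the sketch below. Every design choice in it is an assumption: a known tonic, a 12-symbol chromatic degree alphabet, note-to-note transitions in sequence order, row normalization, and alpha = 1. The paper's actual key detection, alphabet, and transition definition are not given on this page.

```python
def to_scale_degrees(midi_pitches, tonic_pitch_class):
    """Map MIDI pitches to chromatic degrees 0..11 relative to the tonic."""
    return [(p - tonic_pitch_class) % 12 for p in midi_pitches]

def transition_matrix(degrees, n_symbols=12, alpha=1.0):
    """Row-normalized, additively smoothed first-order transition matrix."""
    counts = [[0] * n_symbols for _ in range(n_symbols)]
    for prev, nxt in zip(degrees, degrees[1:]):
        counts[prev][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row) + alpha * n_symbols
        matrix.append([(c + alpha) / total for c in row])
    return matrix

# Hypothetical C-major fragment: C4 E4 G4 C5 G4 E4 C4 (tonic pitch class 0).
pitches = [60, 64, 67, 72, 67, 64, 60]
degrees = to_scale_degrees(pitches, tonic_pitch_class=0)
P = transition_matrix(degrees)
```

Here `P[0][4]` is the smoothed probability of moving from the tonic to the major third; with two observed transitions out of degree 0 and twelve pseudo-counts, it comes to 2/14.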

  2. Referee: [Results (composer separation and Zipfian modeling)] The separation result (mean R² = 0.78 neoclassical vs. 0.46 historical) is presented as evidence of a compact transition vocabulary in contemporary artists, but the transcription validation is confined to the MAESTRO classical benchmark. No style-specific error rates, perturbation tests, or sensitivity analysis for contemporary acoustics (e.g., repetitive patterns in Glass or ambient elements in Arnalds) are reported, leaving open the possibility that systematic pitch or duration biases differentially affect the empirical transition probabilities and R² values between groups.

    Authors: We acknowledge that the certified F1 score is reported only on the MAESTRO classical test set and that no style-specific error analysis was performed for the contemporary neoclassical recordings. In the revision we will add a dedicated sensitivity subsection that (i) manually inspects a stratified sample of the contemporary transcriptions for common error patterns (repeated-note omissions, duration smearing in ambient textures) and (ii) applies controlled perturbations to note onsets, offsets, and pitches at rates consistent with the observed MAESTRO error profile, then recomputes the transition distributions and R² values for both groups. The results of this analysis will be reported with the same bootstrap CIs; if the neoclassical–historical gap remains larger than within-group variability, the separation claim will be retained with an explicit robustness statement; otherwise the interpretation will be qualified. revision: yes
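A perturbation test of the kind the authors describe could be prototyped as below. The noise model is a deliberate simplification and an assumption: each symbol is replaced by a uniformly random one with a fixed probability, and the 2% rate is only a loose stand-in suggested by 1 − F1, which is not a per-note substitution rate. Function names and the toy sequence are hypothetical.

```python
import math
import random
from collections import Counter

def zipf_r2(frequencies):
    """R^2 of the least-squares line through (log rank, log frequency)."""
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(f) for f in ranked]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy) if sxx > 0 and syy > 0 else 0.0

def transition_counts(degrees):
    """Count consecutive (previous, next) symbol pairs."""
    return Counter(zip(degrees, degrees[1:]))

def perturb(degrees, error_rate, n_symbols, rng):
    """With probability error_rate, replace a symbol by a uniformly random one."""
    return [rng.randrange(n_symbols) if rng.random() < error_rate else d
            for d in degrees]

def mean_r2_under_noise(degrees, error_rate, n_symbols=12, trials=20, seed=0):
    """Average Zipf R^2 of the transition counts over noisy copies."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        noisy = perturb(degrees, error_rate, n_symbols, rng)
        scores.append(zipf_r2(transition_counts(noisy).values()))
    return sum(scores) / trials

# Hypothetical degree sequence: a repeated arpeggio followed by scale runs.
toy = [0, 4, 7, 4] * 50 + [0, 2, 4, 5, 7, 9, 11] * 10
baseline = zipf_r2(transition_counts(toy).values())
noisy_mean = mean_r2_under_noise(toy, error_rate=0.02)
```

Comparing `baseline` with `noisy_mean` for each group would indicate how much of an R² gap could be produced by transcription noise alone; onset and offset perturbations would need a representation that keeps timing, which this symbol-level sketch omits.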

Circularity Check

0 steps flagged

No circularity: all profiles are direct empirical computations from transcribed counts


The pipeline transcribes audio to obtain empirical distributions over harmonic scale degrees, then applies standard, parameter-light measures (Shannon entropy, asymmetric KL divergence, and R² of Zipfian rank-frequency fit) to those counts. No equation in the described chain defines an output in terms of itself or renames a fitted parameter as a 'prediction.' The reported separation (mean R² 0.78 vs 0.46) is an observed statistical difference between two groups of independently transcribed pieces, not a definitional tautology. Transcription fidelity is validated on the external MAESTRO benchmark rather than by self-reference. No self-citation load-bearing steps or ansatz smuggling appear in the derivation.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claims rest on standard information-theoretic definitions and one domain assumption about what harmonic scale degrees capture; no new entities are introduced and only routine smoothing is used.

free parameters (1)
  • Laplace smoothing parameter
    Applied to empirical distributions before entropy and KL calculations to avoid zero probabilities; value is the conventional additive constant.
axioms (2)
  • standard math: Shannon entropy and Kullback-Leibler divergence are the appropriate measures for comparing discrete distributions over scale degrees
    Invoked without derivation as the analytic tools for the profiles.
  • domain assumption: Empirical distributions over harmonic scale degrees extracted from transcribed audio reflect compositional vocabulary
    Central premise that allows the pipeline to produce stylistic profiles.
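
For reference, the standard definitions behind the ledger entries can be written out as follows. The notation is chosen for this note and is not taken from the paper: n_i is the count of symbol i, N the total count, K the alphabet size, and α the additive smoothing constant, whose value the page does not report.

```latex
\begin{align*}
\hat{p}_i &= \frac{n_i + \alpha}{N + \alpha K}, \qquad N = \sum_{j=1}^{K} n_j
  && \text{(additive smoothing)} \\
H(\hat{p}) &= -\sum_{i=1}^{K} \hat{p}_i \log_2 \hat{p}_i
  && \text{(Shannon entropy, bits)} \\
D_{\mathrm{KL}}(\hat{p} \,\|\, \hat{q}) &= \sum_{i=1}^{K} \hat{p}_i \log_2 \frac{\hat{p}_i}{\hat{q}_i}
  && \text{(asymmetric in } \hat{p}, \hat{q}\text{)}
\end{align*}
```

With α > 0 every smoothed probability is positive, so the divergence is finite for any pair of composers. The entropy is bounded above by log₂ K.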

pith-pipeline@v0.9.0 · 5618 in / 1600 out tokens · 68389 ms · 2026-05-11T00:44:32.310961+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. vega-mir: An information-theoretic Python toolkit for symbolic music, with applications to harmonic graphs and rubato spectra

    cs.SD · 2026-05 · unverdicted · novelty 6.0

    vega-mir bundles nine metrics for symbolic music and applies network and spectral analysis to find a 0.61 correlation between composer graph centrality and KL divergence plus structured rubato in Bach performers.
