
arxiv: 2604.16658 · v1 · submitted 2026-04-17 · 💻 cs.SD

Coexisting Tempo Traditions in Beethoven's Piano and Cello Sonatas: A K-means Clustering Analysis of Recorded Performances, 1930-2012

Pith reviewed 2026-05-10 06:46 UTC · model grok-4.3

classification 💻 cs.SD
keywords tempo traditions · k-means clustering · Beethoven sonatas · recorded performances · performance analysis · empirical musicology · stylistic change · bar-level tempo data

The pith

Beethoven sonata recordings reveal multiple stable tempo traditions rather than uniform stylistic evolution over time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper challenges conventional models that treat tempo change as a single historical trend fitted by linear regression to recording year. Instead it groups more than one hundred recordings of five Beethoven piano and cello sonatas by applying k-means clustering directly to bar-by-bar tempo measurements. The resulting groups—slow, mid-range, and fast—remain internally stable across eight decades, with almost no slope when tempo is plotted against year inside each cluster. The mid-range group accounts for most recordings in every movement, while fast-character movements lack a slow group, indicating a shared sense of their character. Performer nationality, generation, or training shows no link to cluster membership, pointing to individual choice as the driver.

Core claim

Applying k-means clustering with k = 3 to bar-level BPM data from over one hundred recordings of Beethoven's five piano and cello sonatas spanning 1930-2012 shows that every movement contains at least two and usually three discrete tempo traditions. These slow, mid-range, and fast traditions exhibit negligible internal regression slopes against recording year, with R-squared values at or below 0.25 in all but one case. The mid-range tradition dominates, typically holding 55 to 70 percent of recordings, while a slow tradition is absent from fast-character movements such as the Op. 5 rondos and the Op. 69 scherzo. No correlation appears between cluster assignment and performers' generational, national, or pedagogical backgrounds.

What carries the argument

k-means clustering on bar-level beats-per-minute measurements that partitions recordings into stable tempo groups whose internal tempo-year slopes are near zero.
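
The two-step procedure described here (partition recordings by tempo, then regress tempo on recording year inside each group) can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes one mean BPM value per recording rather than the full bar-level series, uses a plain-Python Lloyd's algorithm seeded at quantiles, and runs on synthetic (year, BPM) pairs invented for the example. The names `kmeans_1d`, `slope_and_r2`, and `recordings` are hypothetical.

```python
# Sketch only: synthetic data, hypothetical names, one mean BPM per recording.

def kmeans_1d(values, k=3, iters=100):
    """Lloyd's algorithm on scalars, seeded at evenly spaced quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def slope_and_r2(xs, ys):
    """Ordinary least-squares slope and R-squared of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    r2 = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, r2


# Synthetic (recording year, mean BPM) pairs, invented for illustration.
recordings = [
    (1930, 96.0), (1945, 98.5), (1962, 95.0), (1988, 97.5), (2005, 96.5),
    (1936, 110.0), (1950, 112.5), (1958, 109.0), (1967, 111.0), (1975, 113.0),
    (1983, 110.5), (1994, 112.0), (2001, 109.5), (2010, 111.5),
    (1939, 126.0), (1971, 124.5), (1999, 127.0), (2012, 125.0),
]
years = [y for y, _ in recordings]
tempos = [t for _, t in recordings]
labels, centroids = kmeans_1d(tempos, k=3)

results = {}
for c, name in enumerate(["slow", "mid-range", "fast"]):
    xs = [y for y, lab in zip(years, labels) if lab == c]
    ys = [t for t, lab in zip(tempos, labels) if lab == c]
    slope, r2 = slope_and_r2(xs, ys)
    results[name] = (len(xs), slope, r2)
    print(f"{name:9s} n={len(xs):2d} centroid={centroids[c]:6.1f} "
          f"slope={slope:+.4f} BPM/yr R^2={r2:.3f}")
```

Because the seeds are sorted quantiles, cluster index 0 is the slowest group and index 2 the fastest in this one-dimensional setting. A library implementation such as scikit-learn's `KMeans` would be the more usual choice for multi-dimensional bar-level features.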

If this is right

  • Corpus studies of performance should replace single-line regression models with descriptions of shifting prevalence among multiple traditions.
  • Tempo choice in these sonatas reflects individual interpretive decisions rather than shared cultural or pedagogical inheritance.
  • Fast movements maintain a consensus against slow tempos, visible as the absence of a slow cluster.
  • One moderate exception exists: the mid-range group in Op. 102 No. 1 Allegro con brio shows a small deceleration of about 3.2 BPM over the period.
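
The first point above, describing a corpus by the shifting prevalence of several traditions instead of one regression line, could be operationalised roughly as below. This is a sketch under stated assumptions: the `labelled` rows, the period boundaries, and the function name `prevalence_by_period` are all hypothetical, and the tradition labels would in practice come from a clustering step.

```python
from collections import Counter, defaultdict

# Hypothetical (year, tradition) labels, invented for illustration.
labelled = [
    (1932, "mid"), (1938, "fast"), (1947, "mid"), (1953, "slow"),
    (1959, "mid"), (1964, "mid"), (1968, "fast"), (1972, "slow"),
    (1979, "mid"), (1985, "mid"), (1991, "fast"), (1996, "mid"),
    (2003, "slow"), (2007, "mid"), (2011, "mid"),
]


def prevalence_by_period(rows, edges):
    """Share of each tradition within half-open periods [edges[i], edges[i+1])."""
    counts = defaultdict(Counter)
    for year, tradition in rows:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= year < hi:
                counts[(lo, hi)][tradition] += 1
                break
    shares = {}
    for period, counter in counts.items():
        total = sum(counter.values())
        shares[period] = {t: n / total for t, n in counter.items()}
    return shares


shares = prevalence_by_period(labelled, edges=[1930, 1960, 1990, 2013])
for (lo, hi), dist in sorted(shares.items()):
    row = ", ".join(f"{t}: {p:.0%}" for t, p in sorted(dist.items()))
    print(f"{lo}-{hi - 1}: {row}")
```

In this invented example the mid-range share is constant across periods. A real corpus would show whatever movement in shares the data contain.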

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same clustering method could be tested on recordings of other composers to determine whether plural stable traditions are common or specific to Beethoven.
  • Performance history might be tracked by measuring changes in the relative sizes of the identified tempo groups rather than average tempo alone.
  • The dominance of the mid-range cluster suggests it may represent a practical balance between expressiveness and technical feasibility across many performers.

Load-bearing premise

That the three clusters produced by k-means on the tempo data reflect genuine, distinct interpretive traditions rather than artifacts of the chosen number of groups or the particular recordings analyzed.

What would settle it

Re-running the analysis on an expanded set of recordings or with a different number of clusters and finding either that the groups lose their distinctness or that substantial tempo drift appears within groups would undermine the claim of stable coexisting traditions.

Figures

Figures reproduced from arXiv: 2604.16658 by Ignasi Sole.

Figure 1: Scatter plot of average tempo by recording
Figure 2: Scatter plot of average tempo by recording year
Figure 3: Scatter plot of average tempo by recording
Figure 4: Scatter plot of average tempo by recording
Figure 5: Scatter plot of average tempo by recording year
Figure 6: Scatter plot of average tempo by recording year
Figure 8: Scatter plot of average tempo by record
Figure 9: Scatter plot of average tempo by recording year
Original abstract

Empirical studies of recorded performance have conventionally modelled tempo change as a unidirectional historical process, fitting linear regression lines to tempo data plotted against recording year. This paper argues that such approaches impose a false narrative of uniform stylistic evolution on what is, in fact, a plurality of coexisting interpretive traditions. Applying k-means clustering (k=3) to bar-level BPM data from over one hundred recordings of Beethoven's five piano and cello sonatas (Op. 5 Nos. 1 and 2; Op. 69; Op. 102 Nos. 1 and 2) spanning 1930-2012, this study reveals that every movement supports at least two, and usually three, discrete tempo traditions (slow, mid-range, and fast), whose internal regression slopes are negligible (R-squared <= 0.25 in all but one case), demonstrating that each tradition is independently stable across eight decades. The mid-range cluster dominates in all movements, typically comprising 55-70% of recordings. A slow cluster is absent from fast-character movements (Op. 5 Rondos, Op. 69 Scherzo), reflecting a shared rhetorical consensus about their character. The single case of significant intra-cluster drift (Op. 102 No. 1 Allegro con brio, R-squared=0.246, p=0.013) indicates a moderate mid-range deceleration of approximately 3.2 BPM across the study period. No correlation is found between cluster membership and performers' generational, national, or pedagogical backgrounds, suggesting that tempo tradition reflects individual interpretive choice rather than collective cultural inheritance. The paper proposes an ecological model of stylistic change - coexisting traditions shifting in relative prevalence rather than a single tradition evolving - and argues that this reframing has broad implications for how empirical performance studies interpret corpus-level tempo data.
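
For scale, the one reported drift can be converted into a per-year rate. This is a back-of-envelope check only. It assumes the 3.2 BPM change is spread over the full 1930-2012 span of the corpus, and the abstract does not state the span covered by this particular cluster.

```latex
\[
|\hat{\beta}| \approx \frac{3.2\ \text{BPM}}{(2012 - 1930)\ \text{years}}
  = \frac{3.2}{82}
  \approx 0.039\ \text{BPM per year}
  \approx 0.39\ \text{BPM per decade}.
\]
```

With R-squared = 0.246, recording year accounts for roughly a quarter of the tempo variance within that cluster.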

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript claims that conventional linear regression models of tempo evolution in recorded performances impose a false narrative of uniform change. By applying k-means clustering with k=3 to bar-level BPM data from over one hundred recordings of Beethoven's piano and cello sonatas spanning 1930-2012, the authors identify at least two, usually three, discrete tempo traditions per movement that exhibit stable tempos over time, as shown by low within-cluster R-squared values (≤0.25) for regressions against recording year. The mid-range tradition dominates, slow traditions are absent in fast movements, and no correlation with performer backgrounds is found, supporting an ecological model of coexisting traditions rather than unidirectional evolution.

Significance. If the clustering results are robust, this work could have substantial significance for empirical performance studies by providing an alternative framework to linear historical models. The identification of stable, coexisting tempo traditions across eight decades, based on a large corpus of recordings, offers a data-driven challenge to prevailing assumptions about stylistic change. Strengths include the use of bar-level data and the finding of negligible intra-cluster drift, which could encourage more nuanced interpretations of corpus-level tempo trends. However, the current lack of cluster validation limits the immediate impact.

major comments (3)
  1. The application of k-means clustering with a fixed k=3 is not supported by any reported validation procedure, such as the elbow method, silhouette scores, or gap statistics. This omission is load-bearing for the central claim, as the existence of discrete 'tempo traditions' hinges on the clusters reflecting genuine structure in the bar-level BPM data rather than an arbitrary partitioning.
  2. The low within-cluster R-squared values (≤0.25) are cited as evidence of stability, but without accompanying cluster validity indices or a demonstration that the tempo distributions are multimodal, the partitions could be artifacts of the algorithm. The paper should include sensitivity tests for different k values and alternative clustering methods to confirm the robustness of the three-tradition finding.
  3. The abstract and available text lack specifics on the exact number of recordings per movement, the criteria for selecting the recordings from 1930-2012, and the methodology for extracting bar-level BPM values. These details are necessary to evaluate potential biases and to allow replication of the analysis.
minor comments (1)
  1. The abstract mentions 'over one hundred recordings' but does not specify the exact number or breakdown per sonata/movement.
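
The validation requested in major comments 1 and 2 could start with something like the following sketch, which compares mean silhouette scores across candidate values of k. It is illustrative only. The tempo values are synthetic, the data are reduced to one mean BPM per recording, and `kmeans_1d` and `mean_silhouette` are hypothetical helper names written in plain Python, not the authors' code.

```python
# Sketch only: synthetic tempos, hypothetical helpers, 1-D data.

def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalars, seeded at evenly spaced quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def mean_silhouette(values, labels):
    """Average silhouette width using absolute difference as the distance."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # Convention: singleton clusters score zero.
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(abs(v - u) for u in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Synthetic mean tempos (BPM), invented for illustration.
tempos = [96.0, 98.5, 95.0, 97.5, 96.5,
          110.0, 112.5, 109.0, 111.0, 113.0, 110.5, 112.0, 109.5, 111.5,
          126.0, 124.5, 127.0, 125.0]

scores = {}
for k in range(2, 6):
    labels, _ = kmeans_1d(tempos, k=k)
    scores[k] = mean_silhouette(tempos, labels)
    print(f"k={k}: mean silhouette = {scores[k]:.3f}")
best_k = max(scores, key=scores.get)
print("best k:", best_k)
```

On these deliberately well-separated synthetic values the score peaks at k = 3. Whether the real bar-level data behave the same way is exactly what the referee asks the authors to show. With scikit-learn, `sklearn.metrics.silhouette_score` computes the same quantity.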

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important areas for strengthening the methodological transparency and robustness of the clustering analysis. We address each major comment below and have prepared revisions to incorporate additional validation, sensitivity tests, and expanded methodological details.

Point-by-point responses
  1. Referee: The application of k-means clustering with a fixed k=3 is not supported by any reported validation procedure, such as the elbow method, silhouette scores, or gap statistics. This omission is load-bearing for the central claim, as the existence of discrete 'tempo traditions' hinges on the clusters reflecting genuine structure in the bar-level BPM data rather than an arbitrary partitioning.

    Authors: The choice of k=3 was motivated by the conventional division of tempo into slow, mid-range, and fast categories in performance studies, which aligns with the observed data structure across movements. We acknowledge that formal validation metrics were not reported in the original submission. The revised manuscript will include elbow plots, silhouette scores, and gap statistics to support k=3, together with results for k=2 and k=4 to demonstrate that three clusters provide the most stable and interpretable partition. revision: yes

  2. Referee: The low within-cluster R-squared values (≤0.25) are cited as evidence of stability, but without accompanying cluster validity indices or a demonstration that the tempo distributions are multimodal, the partitions could be artifacts of the algorithm. The paper should include sensitivity tests for different k values and alternative clustering methods to confirm the robustness of the three-tradition finding.

    Authors: The low within-cluster R-squared values are presented as evidence that tempo does not drift systematically within each group over time. To address the concern that the partitions may be algorithmic artifacts, the revision will add cluster validity indices (silhouette coefficient and Davies-Bouldin index), kernel density estimates to assess multimodality of the bar-level BPM distributions, and sensitivity analyses using both different k values and an alternative method (Gaussian mixture models). These additions will be placed in a new subsection of the methods and results. revision: yes

  3. Referee: The abstract and available text lack specifics on the exact number of recordings per movement, the criteria for selecting the recordings from 1930-2012, and the methodology for extracting bar-level BPM values. These details are necessary to evaluate potential biases and to allow replication of the analysis.

    Authors: We agree that these details are required for replication and bias assessment. The original manuscript states the overall corpus size (>100 recordings) but does not break it down per movement or fully specify selection and extraction procedures. The revised version will expand the data and methods section to report the exact number of recordings for each movement, the inclusion criteria (commercial releases with sufficient audio quality and complete movements), and the bar-level BPM extraction pipeline (automated beat tracking with manual correction and tempo smoothing). revision: yes
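
The kernel density check for multimodality promised in the second response might look like the sketch below. It is a rough illustration under stated assumptions: the tempo values are synthetic, the Gaussian kernel and both bandwidths are arbitrary choices made for the example, and `gaussian_kde` and `find_modes` are hypothetical plain-Python helpers.

```python
import math

# Sketch only: synthetic tempos, hypothetical helpers, arbitrary bandwidths.

def gaussian_kde(values, bandwidth):
    """Return a density function built from equal-weight Gaussian kernels."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density


def find_modes(values, bandwidth, step=0.25):
    """Grid locations where the estimated density has a local maximum."""
    density = gaussian_kde(values, bandwidth)
    lo = min(values) - 3 * bandwidth
    hi = max(values) + 3 * bandwidth
    n = int((hi - lo) / step) + 1
    grid = [lo + i * step for i in range(n)]
    dens = [density(x) for x in grid]
    return [grid[i] for i in range(1, n - 1)
            if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]]


# Synthetic mean tempos (BPM), invented for illustration.
tempos = [96.0, 98.5, 95.0, 97.5, 96.5,
          110.0, 112.5, 109.0, 111.0, 113.0, 110.5, 112.0, 109.5, 111.5,
          126.0, 124.5, 127.0, 125.0]

narrow = find_modes(tempos, bandwidth=2.0)
wide = find_modes(tempos, bandwidth=10.0)
print("modes at bandwidth 2 BPM:", [round(m, 2) for m in narrow])
print("modes at bandwidth 10 BPM:", [round(m, 2) for m in wide])
```

The mode count depends on the bandwidth: the same values look trimodal with a narrow kernel and unimodal with a wide one. Any multimodality claim in the revision would therefore need to report how the bandwidth was chosen.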

Circularity Check

0 steps flagged

No significant circularity detected in the derivation chain

The paper applies unsupervised k-means clustering (k=3) directly to independent bar-level BPM measurements drawn from historical recordings and then performs separate linear regressions of BPM against recording year within each resulting cluster. This sequence does not reduce any central claim to a self-definitional loop, a fitted parameter renamed as prediction, or a load-bearing self-citation; the clustering partitions tempo values while the within-cluster R-squared tests constitute an independent check for temporal drift. No uniqueness theorems, ansatzes, or renamings of known results are invoked in a circular manner, leaving the derivation self-contained against the external corpus of performance data.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim depends on interpreting statistical clusters as substantive 'tempo traditions' and on the assumption that low within-cluster R-squared indicates stability rather than insufficient data or model misspecification. The number of clusters is fixed at three without reported justification.

free parameters (1)
  • k (number of clusters)
    Fixed at 3 for k-means; the abstract does not report how this value was selected or validated against alternatives.
axioms (1)
  • domain assumption: Bar-level BPM measurements from selected recordings accurately capture distinct interpretive tempo traditions
    The paper treats tempo clusters as proxies for stable traditions without external validation such as performer interviews or score analysis.
invented entities (1)
  • coexisting tempo traditions (no independent evidence)
    purpose: To model the observed clusters as independently stable interpretive approaches rather than points on a historical continuum
    These are defined by the clustering output; no independent evidence outside the tempo data is cited.


