
arxiv: 2605.16539 · v1 · pith:FVGTENEJnew · submitted 2026-05-15 · 💻 cs.SD · physics.data-an

vega-mir: An information-theoretic Python toolkit for symbolic music, with applications to harmonic graphs and rubato spectra

Pith reviewed 2026-05-19 21:19 UTC · model grok-4.3

classification 💻 cs.SD physics.data-an
keywords rubato · analysis · case · corpus · gould · studies · cygnus · deployed

The pith

vega-mir bundles nine metrics for symbolic music and demonstrates two of them in case studies: network analysis finds a Spearman correlation of 0.61 between gravity-centre PageRank and marginal KL divergence across composers, and spectral analysis finds structured rubato in Bach performers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The toolkit collects standard tools such as Shannon entropy, Kullback-Leibler divergence, and Zipf fits alongside newer ones such as chord-transition network analysis and spectral analysis of timing curves. These are wrapped in a small API and tested on existing music datasets. In the first case study the authors build graphs whose nodes are chords and whose edges are transitions between them, compute the PageRank of each composer's gravity-centre node, and compare it with how much that composer's music diverges from the average. In the second case study they measure periodic patterns in how three pianists vary the tempo of Bach pieces, and find that Gould, the performer with the smallest overall timing changes, shows the most regular repeating structure. The library also includes checks such as Gini coefficients and Higuchi fractal dimension, run on a bundled set of eight composers to confirm that the metrics behave as expected.
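To make the two most basic metrics concrete, here is a minimal sketch of Shannon entropy and Kullback-Leibler divergence over chord labels. It is not vega-mir's API: the function names, the `eps` guard, and the toy chord sequences are assumptions made for illustration, and comparing a composer against a pooled corpus is only one plausible reading of "marginal" divergence.

```python
import math
from collections import Counter


def distribution(symbols):
    """Empirical probability of each symbol in a sequence."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def shannon_entropy(p):
    """Shannon entropy in bits of a probability dictionary."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in bits; eps (an assumption) guards symbols missing from q."""
    return sum(
        v * math.log2(v / max(q.get(s, 0.0), eps))
        for s, v in p.items()
        if v > 0
    )


# Hypothetical toy data: one composer's chords and a pooled corpus.
composer = ["I", "V", "I", "IV", "V", "I", "I", "V"]
corpus = composer + ["ii", "V", "I", "vi", "IV", "V", "I", "iii"]

p = distribution(composer)
q = distribution(corpus)
print("entropy (bits):", shannon_entropy(p))
print("KL composer || corpus (bits):", kl_divergence(p, q))
```

Because the pooled corpus contains every symbol the composer uses, the `eps` guard never fires here; it matters only when the reference distribution is built from different data.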

Core claim

On the fourteen MAESTRO composers with N >= 10 pieces, the PageRank value of the gravity-centre node correlates with the marginal Kullback-Leibler distance at rho = 0.61 (Spearman, composer-level jackknife N = 14); Gould holds the highest periodicity ratio of the three performers on the 247-piece Bach corpus.
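The statistic behind this claim can be sketched as a rank correlation with a leave-one-out pass over composers. The source does not say how the jackknife is used, so the version below (recompute rho with each composer removed, then form the usual jackknife standard error) is an assumption, and the fourteen value pairs are invented placeholders, not the paper's data.

```python
import math


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def jackknife_spearman(x, y):
    """Full-sample rho, jackknife standard error, and leave-one-out values."""
    n = len(x)
    loo = [spearman(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    se = math.sqrt((n - 1) / n * sum((r - mean_loo) ** 2 for r in loo))
    return spearman(x, y), se, loo


# Placeholder values for 14 composers (NOT taken from the paper).
pagerank_gc = [0.12, 0.18, 0.15, 0.22, 0.30, 0.11, 0.25,
               0.19, 0.27, 0.14, 0.21, 0.16, 0.29, 0.24]
marginal_kl = [0.05, 0.09, 0.12, 0.10, 0.20, 0.07, 0.13,
               0.06, 0.18, 0.11, 0.08, 0.15, 0.17, 0.14]

rho, se, loo = jackknife_spearman(pagerank_gc, marginal_kl)
print("rho:", rho, "jackknife SE:", se)
print("leave-one-out range:", min(loo), max(loo))
```

With only fourteen points, the spread of the leave-one-out values shows how much a single composer can move the estimate.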

Load-bearing premise

The gravity-centre node in the chord-transition graphs and the chosen rubato curve extraction method are assumed to capture musically meaningful structure without additional validation against human judgments or alternative graph constructions, as implied by the case-study descriptions in the abstract.

Figures

Figures reproduced from arXiv: 2605.16539 by Fred Jalbert-Desforges.

Figure 1: Harmonic network signatures across 14 MAESTRO composers
Figure 2: Rubato spectral signatures across Bach masters (Schiff, Gould, Richter)
Original abstract

We present vega-mir, an open-source Python library that bundles nine information-theoretic and statistical metrics for the analysis of symbolic music corpora behind a small, tested, citable API, and demonstrates two of them at corpus scale in case studies not addressed by the upstream Cygnus paper. Of the nine metrics, three (Shannon entropy, Kullback-Leibler divergence, Zipfian fits) were deployed in the companion Cygnus arXiv preprint; two (network analysis on chord-transition graphs and spectral analysis of rubato curves) are deployed in full case studies here; the four remaining (multi-dimensional Gini, chi-squared stationarity, Higuchi fractal dimension, interval distribution) are validated against analytic anchors and exercised as sanity checks on a bundled 8-composer dataset. The two case studies yield two main observations. First, on the fourteen MAESTRO composers with N >= 10 pieces, the PageRank value of the gravity-centre node correlates with the marginal Kullback-Leibler distance at rho = 0.61 (Spearman, composer-level jackknife N = 14); the categorical gravity-centre identity takes five distinct values across the corpus but is not itself correlated with marginal KL (rho = 0.13, p = 0.21). Second, on the 247-piece Bach multi-master corpus (Schiff, Gould, Richter), Gould holds the highest periodicity ratio of the three performers, not the lowest, inverting the cliché that low scalar rubato reads as "metronomic": Gould's rubato is small in amplitude but structured in time, with a median dominant period of 66 beats against Schiff's 102 and Richter's 104.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces vega-mir, an open-source Python toolkit bundling nine information-theoretic and statistical metrics for symbolic music analysis. Three metrics overlap with the companion Cygnus preprint; the remaining six are validated on an 8-composer dataset or deployed in two corpus-scale case studies. The first case study constructs chord-transition graphs on the MAESTRO corpus and reports a Spearman correlation of rho = 0.61 (composer-level jackknife, N = 14) between PageRank of the gravity-centre node and marginal Kullback-Leibler distance; the categorical gravity-centre identity itself shows no correlation (rho = 0.13). The second case study extracts rubato curves from a 247-piece Bach multi-performer corpus and finds that Gould exhibits the highest periodicity ratio, with median dominant period 66 beats versus 102 and 104 for Schiff and Richter.

Significance. If the empirical claims survive validation, the work supplies a citable, tested API that lowers the barrier to reproducible information-theoretic analyses of symbolic music. The open-source release, bundled sanity-check dataset, and explicit extension of prior Cygnus results are concrete strengths. The harmonic-graph and rubato-spectral demonstrations illustrate practical utility, though their musical interpretability depends on the soundness of the gravity-centre and periodicity-ratio constructions.

major comments (2)
  1. [Abstract] Abstract, first case study: the rho = 0.61 correlation between PageRank of the gravity-centre node and marginal KL distance is load-bearing for the central claim, yet the manuscript supplies neither an ablation against simpler graph statistics (degree, betweenness, or modal-chord centrality) nor external validation (expert salience ratings on a held-out subset). Without these checks the reported link remains compatible with pipeline artifact.
  2. [Abstract] Abstract, second case study: the claim that Gould holds the highest periodicity ratio (median dominant period 66 beats) rests on an unspecified rubato-curve extraction method and supplies no error bars, sensitivity analysis, or comparison to alternative spectral estimators, leaving the inversion of the 'metronomic' cliché only partially supported.
minor comments (2)
  1. The abstract refers to 'the upstream Cygnus paper' without a complete bibliographic entry; a full citation should appear in the reference list.
  2. Implementation details for the gravity-centre node (exact selection rule, handling of self-loops, edge weighting) are not stated in the abstract and should be moved to the methods section or supplementary material for reproducibility.
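The choices this comment asks the authors to document can be seen in a small sketch of a chord-transition graph with PageRank computed by power iteration. Everything specific here is an assumption rather than the paper's construction: self-loops are dropped by default, edges are weighted by transition counts, the damping factor is 0.85, and the "gravity centre" is taken to be the highest-PageRank node purely as a stand-in for the unstated selection rule.

```python
from collections import defaultdict


def transition_graph(chords, keep_self_loops=False):
    """Directed graph with edges[a][b] = number of a -> b transitions."""
    edges = defaultdict(lambda: defaultdict(float))
    for a, b in zip(chords, chords[1:]):
        if a == b and not keep_self_loops:
            continue
        edges[a][b] += 1.0
    return sorted(set(chords)), edges


def pagerank(nodes, edges, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration; dangling mass is spread evenly."""
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(edges.get(v, {}).values()) for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for a in nodes:
            if out_weight[a] == 0.0:
                continue
            for b, w in edges[a].items():
                new[b] += damping * rank[a] * w / out_weight[a]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def gravity_centre(rank):
    """Hypothetical rule: the node with the highest PageRank."""
    return max(rank, key=rank.get)


# Toy progression (hypothetical data).
chords = ["I", "IV", "V", "I", "vi", "IV", "V", "I", "ii", "V", "I"]
nodes, edges = transition_graph(chords)
rank = pagerank(nodes, edges)
centre = gravity_centre(rank)
print("gravity centre:", centre, "PageRank:", rank[centre])
```

Changing `keep_self_loops` or replacing counts with binary edges alters the ranking, which is why the referee asks for these details to be stated.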

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive report and the opportunity to clarify the manuscript. We respond to each major comment below, indicating planned revisions where appropriate.

point-by-point responses
  1. Referee: [Abstract] Abstract, first case study: the rho = 0.61 correlation between PageRank of the gravity-centre node and marginal KL distance is load-bearing for the central claim, yet the manuscript supplies neither an ablation against simpler graph statistics (degree, betweenness, or modal-chord centrality) nor external validation (expert salience ratings on a held-out subset). Without these checks the reported link remains compatible with pipeline artifact.

    Authors: We agree that the reported Spearman correlation (rho = 0.61, composer-level jackknife N = 14) is central to the first case study. The manuscript already notes that the categorical gravity-centre identity itself shows no correlation with marginal KL (rho = 0.13, p = 0.21), which offers a basic control against the result being driven solely by node selection. However, we did not conduct ablations against degree, betweenness, or modal-chord centrality. We will add these comparisons in the revised manuscript to test specificity to PageRank. External validation via expert salience ratings would require a separate human-subject study outside the scope of this toolkit paper; we will explicitly discuss this limitation in the revised discussion section. revision: partial

  2. Referee: [Abstract] Abstract, second case study: the claim that Gould holds the highest periodicity ratio (median dominant period 66 beats) rests on an unspecified rubato-curve extraction method and supplies no error bars, sensitivity analysis, or comparison to alternative spectral estimators, leaving the inversion of the 'metronomic' cliché only partially supported.

    Authors: The rubato-curve extraction procedure is specified in the full manuscript (Section 4.2): tempo curves are derived from MIDI beat annotations via linear interpolation of inter-onset intervals, followed by FFT-based periodogram estimation of the resulting time series. We will insert a concise description of this pipeline into the abstract. The current text reports only median dominant periods without uncertainty estimates; we will add bootstrap-derived 95% confidence intervals for both the periodicity ratios and the dominant periods. A sensitivity comparison to alternative estimators (e.g., Welch periodogram and Lomb-Scargle) will be included as a short supplementary analysis to strengthen support for the structured-rubato interpretation. revision: yes
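A rough sketch of the pipeline described in this response: beat times become a per-beat tempo curve, and a periodogram of that curve yields a dominant period in beats. Several points are assumptions, not the paper's method: the curve is sampled once per beat so no interpolation step is needed, the periodogram is a naive DFT in place of an FFT library call, and "periodicity ratio" is defined here as the share of spectral power in the strongest bin because the source gives no definition.

```python
import cmath
import math


def tempo_curve(beat_times):
    """Local tempo in BPM from successive beat onsets given in seconds."""
    return [60.0 / (b - a) for a, b in zip(beat_times, beat_times[1:])]


def periodogram(x):
    """Power at k cycles per len(x) beats, k = 1 .. n//2, after mean removal."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centred)
        )
        power.append(abs(coeff) ** 2 / n)
    return power


def dominant_period(x):
    """Return (period in beats, hypothetical periodicity ratio)."""
    power = periodogram(x)
    total = sum(power)
    if total == 0.0:
        return None, 0.0  # perfectly constant tempo: no periodic structure
    k = max(range(len(power)), key=power.__getitem__) + 1
    return len(x) / k, power[k - 1] / total


# Synthetic performance: tempo oscillates with a 16-beat period over 64 beats.
bpm = [100.0 + 5.0 * math.sin(2.0 * math.pi * t / 16.0) for t in range(64)]
beat_times = [0.0]
for v in bpm:
    beat_times.append(beat_times[-1] + 60.0 / v)

period, ratio = dominant_period(tempo_curve(beat_times))
print("dominant period (beats):", period, "periodicity ratio:", ratio)
```

Under this definition a small-amplitude but regular tempo oscillation still scores a high ratio, since the ratio is normalised by total power; that is the distinction the Gould result turns on.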

standing simulated objections not resolved
  • External validation via expert salience ratings on a held-out subset cannot be performed without a new human-subject experiment, which lies outside the scope and resources of the present toolkit paper.

Circularity Check

0 steps flagged

Minor self-citation to companion Cygnus preprint; central claims are direct empirical measurements on external corpora

full rationale

The paper's headline observations (rho=0.61 Spearman correlation between gravity-centre PageRank and marginal KL distance on MAESTRO composers with N>=10; Gould's highest periodicity ratio on 247-piece Bach corpus) are computed directly from external symbolic music data using the toolkit's metrics. These quantities do not reduce to fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations within the present manuscript. The reference to the upstream Cygnus preprint covers only the three previously deployed metrics (entropy, KL, Zipf fits) and does not justify or derive the new graph-based or rubato-spectral case studies. The gravity-centre node is defined inside the chord-transition graphs, but its reported correlation is an external statistical finding rather than a tautological consequence of that definition. No ansatz smuggling, uniqueness theorems, or renaming of known results appears in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claims rest on standard definitions of Shannon entropy, KL divergence, PageRank, and spectral periodicity that are imported from prior literature; no new free parameters, axioms, or invented entities are introduced in the abstract.



Reference graph

Works this paper leans on

11 extracted references · 11 canonical work pages · 1 internal anchor

  1. Cancino-Chacón, Carlos Eduardo, Silvan David Peter, Emmanouil Karystinaios, Francesco Foscarin, Maarten Grachten, and Gerhard Widmer. 2022. "Partitura: A Python Package for Symbolic Music Processing." Journal of Open Source Software 7 (76): 4519. https://doi.org/10.21105/joss.04519

  2. Cuthbert, Michael Scott, and Christopher Ariza. 2010. "Music21: A Toolkit for Computer-Aided Musicology and Symbolic Music Data." Proceedings of the 11th International Society for Music Information Retrieval Conference (ISMIR) (Utrecht, Netherlands), 637–42.

  3. Hagberg, Aric A., Daniel A. Schult, and Pieter J. Swart. 2008. "Exploring Network Structure, Dynamics, and Function Using NetworkX." Proceedings of the 7th Python in Science Conference (SciPy) (Pasadena, CA, USA), 11–15.

  4. Harris, Charles R. et al. 2020. "Array Programming with NumPy." Nature 585 (7825): 357–62. https://doi.org/10.1038/s41586-020-2649-2

  5. Higuchi, Tomoyuki. 1988. "Approach to an Irregular Time Series on the Basis of the Fractal Theory." Physica D: Nonlinear Phenomena 31 (2): 277–83. https://doi.org/10.1016/0167-2789(88)90081-4

  6. Jalbert-Desforges, Fred. 2026. An Audio-to-Analysis Pipeline with Certified Transcription for Information-Theoretic Profiling of the Piano Repertoire. https://doi.org/10.48550/arXiv.2605.06685

  7. Llorens, Ana, Federico Simonetta, Márius Serrano, and Álvaro Torrente. 2023. "Musif: A Python Package for Symbolic Music Feature Extraction." Proceedings of the Sound and Music Computing Conference (SMC) (Stockholm, Sweden). https://doi.org/10.48550/arXiv.2307.01120

  8. McKay, Cory, Julie E. Cumming, and Ichiro Fujinaga. 2018. "jSymbolic 2.2: Extracting Features from Symbolic Music for Use in Musicological and MIR Research." Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR) (Paris, France), 348–54.

  9. Shannon, Claude Elwood. 1948. "A Mathematical Theory of Communication." The Bell System Technical Journal 27 (3): 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x

  10. Virtanen, Pauli et al. 2020. "SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python." Nature Methods 17: 261–72. https://doi.org/10.1038/s41592-019-0686-2

  11. Zipf, George Kingsley. 1949. Human Behavior and the Principle of Least Effort. Addison-Wesley.