
arXiv: 2506.07073 · v1 · pith:LRLZREFQnew · submitted 2025-06-08 · 💻 cs.SD · cs.HC · eess.AS

Insights on Harmonic Tones from a Generative Music Experiment

Pith reviewed 2026-05-22 00:43 UTC · model grok-4.3

classification 💻 cs.SD · cs.HC · eess.AS
keywords generative music · harmonic tones · polyphony · monophonic sequences · music perception · AI music production · bass audio

The pith

A music AI model learned to generate coherent simultaneous melodic lines using only monophonic sequences of harmonic complex tones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

During a studio-lab session, music producers worked with an AI system that outputs bass-like audio and discovered they could use single harmonic complex tones from the model to stand for two or more pitches at once. This practical use showed that the model had acquired the capacity to create structured and coherent polyphonic melodies inside what look like single-tone sequences. Readers might care because the finding links generative AI practice directly to long-standing questions about how people hear harmonics and to new ways of making music.

Core claim

The model had learned to generate structured and coherent simultaneous melodic lines using monophonic sequences of harmonic complex tones, revealed when producers employed the model's single harmonic complex tones to convey two or more pitches.
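
For orientation, the arithmetic behind "one tone, several pitches" can be sketched as follows. A harmonic complex tone with fundamental f0 has partials at integer multiples k·f0, and partial k lies 12·log2(k) equal-tempered semitones above the fundamental. The Python sketch below tabulates those intervals. The function names and the 55 Hz fundamental are illustrative assumptions, not values taken from the paper.

```python
import math


def semitones_above_fundamental(k):
    """Interval of partial k above the fundamental, in equal-tempered semitones."""
    return 12 * math.log2(k)


def partial_table(f0, num_partials):
    """List (partial number, frequency in Hz, semitones above f0) for partials 1..num_partials."""
    return [
        (k, k * f0, round(semitones_above_fundamental(k), 2))
        for k in range(1, num_partials + 1)
    ]


# Hypothetical bass fundamental chosen only for illustration.
for k, freq, semis in partial_table(55.0, 5):
    print(f"partial {k}: {freq:.1f} Hz, {semis:.2f} semitones above the fundamental")
```

Partial 3 sits about an octave and a fifth above the fundamental, and partial 5 about two octaves and a major third above it, so an emphasized upper partial can in principle suggest a pitch class different from the fundamental's. Whether listeners actually hear it as a separate pitch is the perceptual question the paper raises.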

What carries the argument

Producers' interpretation of single harmonic complex tones as standing for multiple pitches, which functions as evidence that the model internally organizes simultaneous melodies within monophonic output.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same producer-AI collaboration method could be tried with other instrument families to check whether implicit polyphony appears in other monophonic generative outputs.
  • The result suggests a practical way to use generative models as probes for testing human harmonic perception without relying solely on synthetic test tones.
  • If confirmed, the observation opens the possibility that training data or model architecture choices can be examined for how they encourage multi-pitch organization inside single-tone streams.

Load-bearing premise

That the producers' choice to treat single tones as multiple pitches directly demonstrates the model's learned internal representation of polyphony instead of their own creative or contextual reading of the audio.

What would settle it

A controlled listening test with participants who have no production context, asking them to report how many distinct pitches they hear in the model's isolated harmonic tones.
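
One way the responses from such a test might be summarized is sketched below in Python. It tallies how many distinct pitches each listener reported for one isolated tone and computes the share of listeners reporting two or more. The function name and the response values are hypothetical and exist only to show the bookkeeping; they are not data from the paper or from any actual listening test.

```python
from collections import Counter


def summarize_pitch_reports(responses):
    """Summarize per-listener reports of how many distinct pitches were heard.

    responses: list of positive integers, one per listener.
    Returns a histogram of reported counts and the share reporting >= 2 pitches.
    """
    if not responses:
        raise ValueError("at least one response is required")
    histogram = dict(sorted(Counter(responses).items()))
    multi = sum(1 for r in responses if r >= 2)
    return {"histogram": histogram, "share_multi_pitch": multi / len(responses)}


# Made-up responses for a single tone, for illustration only.
example_responses = [1, 2, 2, 1, 3, 1, 1, 2]
print(summarize_pitch_reports(example_responses))
```

A real study would also need a baseline, for instance the same question asked about conventional bass tones, before any share of multi-pitch reports could be interpreted.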

Figures

Figures reproduced from arXiv: 2506.07073 by Emmanuel Deruty, Maarten Grachten.

Figure 1: Melatonin, section 2, bars 45–49, one BassNet output. (a) Perceptual transcription. Pitches that can be associated to different partials are highlighted using different colors. (b) STFT of the weighted audio. The horizontal yellow lines denote the pitch as transcribed in (a). Listen to this example: http://mml2024-suppl-mat.s3-website.eu-west-3.amazonaws.com/index.html#LI_animation_12
Figure 2: STFT for the seven ‘modes’ of the ‘808 Woofer Warfare’ patch from the Seismic Shock library in Omnisphere, weighted audio. http://mml2024-suppl-mat.s3-website.eu-west-3.amazonaws.com/index.html#seismic_shock
Figure 3: (a) Alt-J, ‘Hunger of the Pine’, 4’30 to 4’47, power spectrum of the weighted audio. (b) Guitar power chord, power spectrum of the weighted audio.

5 Conclusion. This contribution demonstrates how a studio-lab experiment led to the unexpected result of the machine learning-powered BassNet synthesizer generating two or more independent melodies from a sequence of individual harmonic complex tones. While the…
Original abstract

The ultimate purpose of generative music AI is music production. The studio-lab, a social form within the art-science branch of cross-disciplinarity, is a way to advance music production with AI music models. During a studio-lab experiment involving researchers, music producers, and an AI model for music generating bass-like audio, it was observed that the producers used the model's output to convey two or more pitches with a single harmonic complex tone, which in turn revealed that the model had learned to generate structured and coherent simultaneous melodic lines using monophonic sequences of harmonic complex tones. These findings prompt a reconsideration of the long-standing debate on whether humans can perceive harmonics as distinct pitches and highlight how generative AI can not only enhance musical creativity but also contribute to a deeper understanding of music.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper describes a studio-lab experiment involving researchers, music producers, and a generative AI model producing bass-like audio via monophonic sequences of harmonic complex tones. It claims that producers' observed use of individual generated tones to convey two or more pitches demonstrates that the model has learned to generate structured and coherent simultaneous melodic lines, while also prompting reconsideration of whether humans perceive harmonics as distinct pitches.

Significance. If supported by objective audio evidence, the work could illustrate how generative models capture musical polyphony and how cross-disciplinary studio-lab methods advance both creative production and psychoacoustic understanding. The exploratory framing and emphasis on human-AI collaboration in music are potential strengths.

major comments (2)
  1. [Abstract] Abstract and main claim: The inference that producers' creative reinterpretation of single harmonic complex tones 'revealed' the model learned structured simultaneous melodic lines treats human interpretation as direct evidence of the model's internal polyphonic representation. No acoustic analysis, pitch detection, spectrogram examination, or comparison to training data is described to show detectable multiple fundamentals or coherent polyphonic structure in the outputs rather than rich single-fundamental harmonics.
  2. [Methodology] Methodology and results: The findings rest on qualitative observation during the experiment without quantitative measures, error bars, detailed protocols for how the observation was recorded or validated, or controls for contextual versus model-driven effects. This leaves the central claim interpretive rather than empirically grounded.
minor comments (2)
  1. Clarify terminology around 'monophonic sequences of harmonic complex tones' versus the claimed polyphonic output to avoid ambiguity in how the model generates audio.
  2. Add references to relevant prior work on harmonic perception debates and generative models for polyphonic music to situate the contribution.
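
The kind of acoustic check the first major comment asks for can be sketched in Python using only the standard library. The sketch estimates a fundamental by autocorrelation, then measures which upper partials are strong relative to it, which is one way to tell a rich single-fundamental tone apart from a mixture of fundamentals. Everything here is an assumption made for illustration: the synthetic tone stands in for a model output, and the function names, the 40–200 Hz search range and the 0.5 prominence ratio are hypothetical choices, not the authors' method.

```python
import math


def harmonic_tone(f0, amplitudes, sample_rate=8000, num_samples=800):
    """Synthetic stand-in for a model output: partial k has frequency k * f0."""
    return [
        sum(a * math.sin(2 * math.pi * k * f0 * n / sample_rate)
            for k, a in enumerate(amplitudes, start=1))
        for n in range(num_samples)
    ]


def estimate_f0(signal, sample_rate, fmin=40.0, fmax=200.0):
    """Return sample_rate / lag for the lag with the highest length-normalized autocorrelation."""
    best_lag, best_score = None, -math.inf
    for lag in range(int(sample_rate / fmax), int(sample_rate / fmin) + 1):
        terms = len(signal) - lag
        score = sum(signal[n] * signal[n + lag] for n in range(terms)) / terms
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def partial_amplitude(signal, freq, sample_rate):
    """Single-bin DFT: amplitude of the sinusoidal component at freq."""
    w = 2 * math.pi * freq / sample_rate
    re = sum(x * math.cos(w * n) for n, x in enumerate(signal))
    im = sum(x * math.sin(w * n) for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


def prominent_partials(signal, sample_rate, num_partials=6, ratio=0.5):
    """Upper partials whose amplitude is at least `ratio` times the fundamental's."""
    f0 = estimate_f0(signal, sample_rate)
    base = partial_amplitude(signal, f0, sample_rate)
    strong = [
        k for k in range(2, num_partials + 1)
        if partial_amplitude(signal, k * f0, sample_rate) >= ratio * base
    ]
    return f0, strong


# Hypothetical tone: 50 Hz fundamental with an emphasized third partial.
tone = harmonic_tone(50.0, [1.0, 0.2, 0.9, 0.1])
f0, strong = prominent_partials(tone, 8000)
print(f0, strong)
```

On real audio the analysis would need windowing, a frame-by-frame pass and a tolerance for inharmonicity. The point of the sketch is only that "one fundamental with a dominant upper partial" and "several fundamentals" are distinguishable in principle from the signal alone.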

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments on our exploratory studio-lab study. We address each major point below and indicate planned revisions to better frame the interpretive nature of our observations while preserving the manuscript's focus on human-AI collaboration in music production.

point-by-point responses
  1. Referee: [Abstract] Abstract and main claim: The inference that producers' creative reinterpretation of single harmonic complex tones 'revealed' the model learned structured simultaneous melodic lines treats human interpretation as direct evidence of the model's internal polyphonic representation. No acoustic analysis, pitch detection, spectrogram examination, or comparison to training data is described to show detectable multiple fundamentals or coherent polyphonic structure in the outputs rather than rich single-fundamental harmonics.

    Authors: The manuscript presents an observational account from a studio-lab session in which expert producers used individual generated harmonic complex tones to imply multiple pitches. This behavior is interpreted as suggesting that the model outputs supported polyphonic creative use, but we do not claim direct evidence of the model's internal representations or provide acoustic verification. We agree the language in the abstract could overstate the inference. In the revised version we will temper the abstract and main claim to describe the observation as generating a hypothesis about the model's learned structure, and we will add explicit discussion of the value of future acoustic analyses (pitch detection, spectrograms, training-data comparisons) to test for multiple fundamentals. revision: partial

  2. Referee: [Methodology] Methodology and results: The findings rest on qualitative observation during the experiment without quantitative measures, error bars, detailed protocols for how the observation was recorded or validated, or controls for contextual versus model-driven effects. This leaves the central claim interpretive rather than empirically grounded.

    Authors: The studio-lab approach is deliberately qualitative and process-oriented rather than controlled or quantitative. We will expand the methodology section to include fuller detail on session protocols, how observations were noted, and how they were validated through immediate post-session debriefs with the producers. We acknowledge that quantitative metrics, error bars, and explicit controls separating contextual from model-driven effects are absent; these are not part of the original design. A new limitations subsection will be added to state the interpretive character of the results and to outline how controlled follow-up experiments could address these gaps. revision: yes

standing simulated objections not resolved
  • The original experiment did not collect or analyze raw audio for acoustic evidence of polyphony; performing such analyses now would require new data collection outside the scope of a revision.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's derivation rests on direct experimental observation of music producers interpreting and using the model's monophonic harmonic tone outputs to convey multiple pitches during a studio-lab session. This observation is presented as evidence that the model had learned structured polyphonic melodic lines. No equations, fitted parameters, self-citations, or ansatzes are invoked in the provided abstract or context to create a self-referential loop. The central claim follows from participant behavior external to the model's training or internal state definitions, making the chain self-contained rather than reducing to its inputs by construction. This qualifies as a normal non-finding under the guidelines.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a qualitative observational study with no mathematical derivations, free parameters, or new postulated entities; the central claim rests on the interpretation of participant behavior in the experiment.

pith-pipeline@v0.9.0 · 5658 in / 1166 out tokens · 70705 ms · 2026-05-22T00:43:51.598808+00:00 · methodology

