arxiv: 1906.08916 · v1 · pith:3PW4XD5Xnew · submitted 2019-06-21 · 💻 cs.SD · eess.AS

Understanding and Classifying Cultural Music Using Melodic Features: Case of Hindustani, Carnatic and Turkish Music

Pith reviewed 2026-05-25 18:49 UTC · model grok-4.3

classification 💻 cs.SD eess.AS
keywords: melodic features · music style classification · Hindustani music · Carnatic music · Turkish music · pitch contour · cultural music analysis · improvisational music

The pith

Melodic features based on pitch contours and energy can classify Hindustani, Carnatic, and Turkish music styles in a way that matches human listener judgments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that listeners can discriminate between these three styles using only the melodic contour when other cues like instrumentation are removed. It derives a set of features from musicology that capture pitch transitions, micro-tonal notes, and vocal energy variations to build an automatic classifier. The resulting style labels correlate with subjective judgments according to statistical tests. Adding the melodic features to timbre-based ones raises classification performance. This approach matters because it offers an objective, melody-driven method for understanding cultural distinctions in improvisational music traditions that share similar structures.

Core claim

A set of melody-based features derived from musicology captures distinct characteristics of the melodic contour in Hindustani, Carnatic, and Turkish music. These features exploit pitch contour transitions, the presence of micro-tonal notes, and the nature of variations in vocal energy. When used for automatic classification, the style labels correlate well with subjective listening judgments as verified by statistical tests. The melody features also improve performance when combined with timbre-based features.

What carries the argument

A set of highly discriminatory melodic features that capture pitch contour transitions, micro-tonal notes, and vocal energy variations to distinguish style in the absence of instrumentation cues.

If this is right

  • Style labels produced by the melodic classifier match those from subjective listening tests, as confirmed by statistical comparison.
  • Melody alone suffices to discriminate the styles when similar ragas or makams are used and instrumentation is controlled.
  • Combining the melodic features with timbre features raises overall classification accuracy beyond either set used alone.
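
A chance-corrected agreement score is one way to quantify how closely classifier labels match listener labels. The sketch below uses Cohen's kappa purely as an illustration: the abstract does not name the statistical test the authors used, and the function name and sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Fraction of items on which the two sources agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if both sources labelled independently
    # according to their own label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: H = Hindustani, C = Carnatic, T = Turkish.
listener = ["H", "H", "C", "C", "T", "T"]
classifier = ["H", "H", "C", "T", "T", "T"]
kappa = cohens_kappa(listener, classifier)
```

A kappa near 1 indicates agreement well beyond what label frequencies alone would produce, while a value near 0 indicates agreement at chance level.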

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same feature approach could be tested on other improvisational traditions that share theme-and-structure concert formats.
  • These features might support music search tools that group pieces by melodic style identity rather than artist or region alone.
  • Controlled recordings across multiple performers could check whether the features remain stable when recording conditions vary.

Load-bearing premise

The chosen melodic features derived from musicology are highly discriminatory and capture the distinct characteristics of the melodic contour independent of performer or recording quality.

What would settle it

An experiment in which the automatic classifier using only the melodic features performs no better than chance at matching the style labels that human listeners assign when all non-melodic cues are removed from the audio.
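
A minimal sketch of such a chance-level check, assuming three equally likely styles (so chance agreement is 1/3) and an exact one-sided binomial test. The test choice, the function names, and the counts are illustrative assumptions, not details taken from the paper.

```python
from math import comb


def agreement_count(classifier_labels, listener_labels):
    """Number of clips on which the classifier matches the listener label."""
    return sum(c == l for c, l in zip(classifier_labels, listener_labels))


def binomial_tail(successes, trials, chance):
    """P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance ** k * (1.0 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical outcome: 20 matches out of 60 clips, exactly the chance rate.
p_at_chance = binomial_tail(20, 60, 1 / 3)
# Hypothetical outcome: 40 matches out of 60 clips.
p_well_above = binomial_tail(40, 60, 1 / 3)
```

A large tail probability, as in the first case, is what the falsifying outcome described above would look like; a very small one, as in the second, would count against it.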

Figures

Figures reproduced from arXiv: 1906.08916 by Amruta Vidwans, Prateek Verma, Preeti Rao.

Figure 1: Percentage distribution of labels for each clip marked across all the participants showing the 7 labels marked.
4 Melodic Feature Extraction
Melody is defined by Poliner et al. [17] as "the single (monophonic) pitch sequence that a listener might reproduce if asked to whistle or hum a piece of polyphonic music, and that a listener would recognize as being the ‘essence’ of that music when heard in comparis…
Figure 2: Normalized category-wise response for 60 clips of each style. The dependence on the training can be seen on the markings.
… audio by using melody extraction algorithms [17][16][22], and represented as a time series of pitch values. This, when re-synthesized using sinusoidal basis function weighted by the timbral envelope, will sound like the original melody. The notes present in an audio excerpt can be foun…
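
The re-synthesis step mentioned in the excerpt can be sketched as a single sinusoid that follows the pitch contour. This is a simplified illustration: a per-frame amplitude stands in for the timbral envelope, and the function name, frame hop, and sample rate are assumptions rather than the authors' settings.

```python
import math


def resynthesize(pitch_hz, amplitude, hop_seconds, sample_rate=8000):
    """Render a pitch contour as one sinusoid using phase accumulation.

    pitch_hz and amplitude hold one value per analysis frame. A pitch of 0
    marks an unvoiced frame and is rendered as silence.
    """
    samples_per_frame = int(round(hop_seconds * sample_rate))
    phase = 0.0
    out = []
    for f0, amp in zip(pitch_hz, amplitude):
        for _ in range(samples_per_frame):
            if f0 > 0:
                # Advancing the phase keeps the waveform continuous
                # when the pitch changes between frames.
                phase += 2.0 * math.pi * f0 / sample_rate
                out.append(amp * math.sin(phase))
            else:
                out.append(0.0)
    return out


# One voiced frame at 220 Hz followed by one unvoiced frame, 10 ms each.
audio = resynthesize([220.0, 0.0], [0.5, 0.5], hop_seconds=0.01)
```

Accumulating phase, rather than computing sin(2πft) per frame, avoids clicks at frame boundaries when the pitch glides.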
Figure 3: Pitch contours of Hindustani, Carnatic and Turkish style music. The proportion of steady notes is seen more in Hindustani followed by Turkish and least in Carnatic. The dotted contours show the corresponding energy variations.
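
The proportion of steady notes mentioned in the caption could be estimated from a pitch contour roughly as follows. This is a sketch under stated assumptions: the contour is given in cents, and the tolerance and minimum run length are hypothetical thresholds, not values from the paper.

```python
def steady_proportion(cents, tolerance=25.0, min_frames=5):
    """Fraction of frames lying in runs that stay within `tolerance` cents
    of the run's first frame for at least `min_frames` frames."""
    if not cents:
        return 0.0
    steady = 0
    start = 0
    for i in range(1, len(cents) + 1):
        # Close the current run at the end of the contour or when
        # the pitch drifts away from the run's starting value.
        if i == len(cents) or abs(cents[i] - cents[start]) > tolerance:
            if i - start >= min_frames:
                steady += i - start
            start = i
    return steady / len(cents)


flat = [0.0] * 10                               # one held note
glide = [i * 40.0 for i in range(10)]           # continuous glide
mixed = [0.0] * 6 + [100.0, 200.0, 300.0, 400.0]
```

Under these thresholds the held note counts as fully steady, the glide as not steady at all, and the mixed contour as steady only for its held portion.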
Figure 4: Distribution of Pitch Range (in cents) in alap section by (a) Hindustani vocalist Rashid Khan for raga Todi (b) Carnatic vocalist Sudha Raghunathan for raga Subhapanthuvarali (c) Turkish vocalist Sadettin Kaynak for makam Mahur. Alap is centered around 'S' in Hindustani and 'P' in Carnatic style and even higher for taqsim section in Turkish style.
4.4 Melodic transitions
The feature described in section 4.2…
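
A pitch distribution in cents, as plotted in this figure, requires converting frequencies to cents relative to a reference such as the tonic. The sketch below shows that conversion and a simple histogram. The function names, the 100-cent bin width, and the convention that 0 Hz marks an unvoiced frame are assumptions made for illustration.

```python
import math


def hz_to_cents(freq_hz, tonic_hz):
    """Interval between freq_hz and tonic_hz in cents (1200 per octave)."""
    return 1200.0 * math.log2(freq_hz / tonic_hz)


def cents_histogram(pitch_hz, tonic_hz, bin_width=100.0):
    """Count voiced frames per bin of `bin_width` cents relative to the tonic.

    Keys are the lower edge of each bin in cents.
    """
    counts = {}
    for f in pitch_hz:
        if f <= 0:
            continue  # skip unvoiced frames
        bin_start = math.floor(hz_to_cents(f, tonic_hz) / bin_width) * bin_width
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return counts


hist = cents_histogram([220.0, 440.0, 0.0, 230.0], tonic_hz=220.0)
```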
Figure 5: Concatenated pitch contour (in gray) of (a) Carnatic and (b) Hindustani and (c) Turkish clip. The black dashed line is the 5th level Haar wavelet approximation. The lower levels of approximation will capture the minute details and not the overall trend and may be prone to pitch errors while the higher levels will lead to loss of information. Level 5 wavelet approximation was chosen as optimum thus giving 3…
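
For the Haar wavelet, reconstructing a signal from its level-n approximation coefficients alone yields the mean of each block of 2^n samples, so the dashed trend line in the figure can be sketched without a wavelet library. The function name and the handling of a trailing partial block are assumptions here; the authors' implementation and padding are not described in the excerpt.

```python
def haar_approximation(signal, level):
    """Piecewise-constant Haar approximation at the given level.

    Each block of 2**level samples is replaced by its mean. A trailing
    partial block is averaged over the samples it contains (assumption).
    """
    block = 2 ** level
    approx = []
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        mean = sum(chunk) / len(chunk)
        approx.extend([mean] * len(chunk))
    return approx


contour = [1.0, 3.0, 5.0, 7.0]
level_1 = haar_approximation(contour, 1)
level_2 = haar_approximation(contour, 2)
```

At level 5 each block spans 32 contour samples, which illustrates the trade-off in the caption: smaller blocks follow fine detail and pitch errors, larger blocks flatten the overall trend.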
Figure 6: Presence of tremolo in the energy contour of the taqsim section of Turkish music as seen in (a) represented with a solid line and median filtered output represented in dashed line while (b) shows the periodic variation after subtraction of the energy contour from its median filtered output.
4.6 Microtonality based measure
Turkish music uses 53 Holdrian commas [8] i.e. it has much more notes than in the ca…
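
The detrending step in the caption, removing a median-filtered trend from the energy contour to expose the periodic tremolo, can be sketched as below. The window width and edge handling are hypothetical choices. The sketch computes the contour minus the trend, which is the negative of the difference as literally worded in the caption.

```python
from statistics import median


def median_filter(values, width):
    """Sliding median with an odd window; windows are truncated at the edges."""
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd integer")
    half = width // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def tremolo_residual(energy, width=5):
    """Energy contour minus its median-filtered trend."""
    trend = median_filter(energy, width)
    return [e - t for e, t in zip(energy, trend)]


# A flat contour with one short burst: the median ignores the burst,
# so the residual isolates it.
residual = tremolo_residual([1, 1, 1, 5, 1, 1, 1], width=3)
```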
Figure 7: Illustration of few steps in detecting peaks in a folded histogram for taqsim section of makam rast by artist Hafiz Sesyilmaz (8 cent binning).
We define the four features by making use of the equation below:
$$ l'_k = f(l_k) = \begin{cases} \operatorname{mod}(l_k, 100), & \text{if } \operatorname{mod}(l_k, 100) < 50 \\ 100 - \operatorname{mod}(l_k, 100), & \text{otherwise} \end{cases} \tag{3} $$
where $l_k$ is the location of the peak detected in the unfolded histogram. The mod operation is carried out in order to remove the …
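
Equation (3) in the excerpt above can be written as a short function. Assuming peak locations are expressed in cents, the result is the distance from a peak to the nearest multiple of 100 cents. The function name is hypothetical.

```python
def fold_peak(l_k):
    """Equation (3): fold a peak location onto the range [0, 50].

    Returns the distance from l_k to the nearest multiple of 100.
    """
    folded = l_k % 100
    return folded if folded < 50 else 100 - folded


# Peaks at 230 and 470 cents both lie 30 cents from a multiple of 100.
distances = [fold_peak(230), fold_peak(470), fold_peak(1200)]
```

A peak sitting exactly on the 100-cent grid folds to 0, while a peak midway between two grid points folds to the maximum of 50.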
Figure 8: Percentage responses across all the listeners of few Hindustani clips depicting the need for using two-best ratio over the percentage of the majority label as the confidence.
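
The caption does not define the two-best ratio. The sketch below assumes it is the largest response share divided by the second-largest, which is only one plausible reading. The function name and the sample percentages are hypothetical.

```python
def two_best_ratio(percentages):
    """Largest response share divided by the second-largest (assumed definition)."""
    ranked = sorted(percentages.values(), reverse=True)
    if len(ranked) < 2 or ranked[1] == 0:
        return float("inf")  # no competing label at all
    return ranked[0] / ranked[1]


# Two hypothetical clips with the same 50% majority for Hindustani ("H").
contested = {"H": 50.0, "C": 45.0, "T": 5.0}
clear = {"H": 50.0, "C": 25.0, "T": 25.0}
ratios = (two_best_ratio(contested), two_best_ratio(clear))
```

Under this reading, two clips with an identical majority percentage can differ in confidence, because the ratio also reflects how strongly listeners favoured the runner-up label.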
Original abstract

We present a melody based classification of musical styles by exploiting the pitch and energy based characteristics derived from the audio signal. Three prominent musical styles were chosen which have improvisation as integral part with similar melodic principles, theme, and structure of concerts namely, Hindustani, Carnatic and Turkish music. Listeners of one or more of these genres can discriminate between these based on the melodic contour alone. Listening tests were carried out using melodic attributes alone, on similar melodic pieces with respect to raga/makam, and removing any instrumentation cue to validate our hypothesis that style distinction is evident in the melody. Our method is based on finding a set of highly discriminatory features, derived from musicology, to capture distinct characteristics of the melodic contour. Behavior in terms of transitions of the pitch contour, the presence of micro-tonal notes and the nature of variations in the vocal energy are exploited. The automatically classified style labels are found to correlate well with subjective listening judgments. This was verified by using statistical tests to compare the labels from subjective and objective judgments. The melody based features, when combined with timbre based features, were seen to improve the classification performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper claims that pitch- and energy-derived melodic features (pitch transitions, micro-tonal notes, vocal energy variations) extracted from audio can classify Hindustani, Carnatic, and Turkish music styles. It reports that these features, when used for automatic classification, correlate well with human listening judgments on melodic contour alone (after matching pieces on raga/makam and removing instrumentation cues), with the correlation verified by statistical tests; combining the melodic features with timbre features further improves classification performance.

Significance. If the central claim holds after addressing potential confounds, the work would demonstrate that musicology-derived melodic features can capture style-specific contour characteristics in a manner that aligns with listener perception, offering a concrete contribution to MIR for improvisation-based genres. The explicit use of listening tests with statistical validation and the reported improvement from feature combination are positive elements that would strengthen the result.

major comments (1)
  1. [Abstract] Abstract (and implied Methods): the central claim that the selected melodic features discriminate style independently of performer or recording variables rests on the assumption that pitch/energy quantities encode only style-specific contour; however, no explicit cross-performer or cross-recording controls are described that would isolate the style signal from ornamentation habits or acoustics, leaving open the possibility that automatic labels correlate with listener judgments for spurious reasons.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive feedback. We address the major comment below, providing clarification on the controls employed in the study while acknowledging areas where the manuscript can be strengthened.

Point-by-point responses
  1. Referee: [Abstract] Abstract (and implied Methods): the central claim that the selected melodic features discriminate style independently of performer or recording variables rests on the assumption that pitch/energy quantities encode only style-specific contour; however, no explicit cross-performer or cross-recording controls are described that would isolate the style signal from ornamentation habits or acoustics, leaving open the possibility that automatic labels correlate with listener judgments for spurious reasons.

    Authors: We appreciate this observation on potential confounds. The manuscript describes the selection of pieces matched on raga/makam to control for melodic theme and structure, with instrumentation cues removed in the listening tests to isolate melodic contour. These steps were intended to focus the analysis on style-specific melodic characteristics. However, we acknowledge that explicit cross-performer normalization (e.g., multiple performers per raga across styles) or acoustic normalization across recordings is not detailed beyond the raga/makam matching. The statistical validation against human judgments on melodic attributes alone provides supporting evidence against purely spurious correlations. To address the concern directly, we will revise the Methods and Discussion sections to explicitly describe the controls used, discuss limitations regarding performer ornamentation habits and recording acoustics, and clarify how the feature set targets contour transitions, micro-tones, and energy variations independent of those factors.

    revision: partial
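
One common way to implement the cross-performer control the referee asks for is a leave-one-performer-out evaluation, in which no performer appears in both the training and test sets. The sketch below is illustrative only: it is not described in the paper, and the clip schema and function name are hypothetical.

```python
def leave_one_performer_out(clips):
    """Yield (held_out_performer, train, test) splits over a list of clips.

    Each clip is a dict with a "performer" key (hypothetical schema).
    """
    performers = sorted({clip["performer"] for clip in clips})
    for held_out in performers:
        train = [c for c in clips if c["performer"] != held_out]
        test = [c for c in clips if c["performer"] == held_out]
        yield held_out, train, test


# Hypothetical clip metadata.
clips = [
    {"performer": "A", "style": "Hindustani"},
    {"performer": "A", "style": "Hindustani"},
    {"performer": "B", "style": "Carnatic"},
    {"performer": "C", "style": "Turkish"},
]
splits = list(leave_one_performer_out(clips))
```

If accuracy holds up on performers the classifier has never seen, that would suggest the features capture style rather than an individual's ornamentation habits. Note that this split requires several performers per style, otherwise holding one out removes a whole class from training.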

Circularity Check

0 steps flagged

No circularity: classification validated against independent subjective judgments

Full rationale

The paper derives melodic features (pitch transitions, micro-tones, vocal energy) from audio, trains a classifier on style labels, and validates automatic outputs against separate human listening tests via statistical comparison. No self-definitional steps, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided derivation chain. The central result is an empirical correlation between feature-based labels and external judgments, which does not reduce to the inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Review based on abstract only; details of any free parameters or assumptions in the full paper are unknown.

free parameters (1)
  • feature selection thresholds
    Likely parameters in defining discriminatory features from pitch and energy, though not specified in abstract.
axioms (1)
  • domain assumption: The selected features from musicology capture style-specific melodic characteristics
    Invoked in the method description to justify the feature set.

pith-pipeline@v0.9.0 · 5745 in / 1143 out tokens · 33180 ms · 2026-05-25T18:49:26.777793+00:00 · methodology


Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

  [1] Cmusphinx: The Carnegie Mellon Sphinx project. https://cmusphinx.github.io/wiki/, accessed: 2019-04-21
  [2] "Distinguishing between similar ragas", ITC Sangeet Research Academy. http://www.itcsra.org/sra_raga/sra_raga_index.asp, accessed: 2019-04-21
  [3] Wikipedia contributors. Alap. In: Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/Alap, accessed: 2019-04-21
  [4] Wikipedia contributors. Makam. In: Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/Makam, accessed: 2019-04-21
  [5] Wikipedia contributors. Taqsim. In: Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/Taqsim, accessed: 2019-04-21
  [6] Agarwal, P., Karnick, H., Raj, B.: A comparative study of Indian and Western music forms. In: ISMIR. pp. 29–34 (2013)
  [7] Bello, J.P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., Sandler, M.B.: A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing 13(5), 1035–1047 (2005)
  [8] Bozkurt, B.: Pitch histogram based analysis of makam music in Turkey. Proceedings of Les Corpus de l’oralité, Strasbourg, France (2011)
  [9] Chordia, P., Rae, A.: Automatic raag classification using pitch-class and pitch-class dyad distributions. In: Proceedings of the International Symposium on Music Information Retrieval, Vienna, Austria (2007)
  [10] Kini, S., Gulati, S., Rao, P.: Automatic genre classification of North Indian devotional music. In: 2011 National Conference on Communications (NCC). pp. 1–5. IEEE (2011)
  [11] Koduri, G.K., Gulati, S., Rao, P.: A survey of raaga recognition techniques and improvements to the state-of-the-art. Sound and Music Computing 38, 39–41 (2011)
  [12] Koduri, G.K., Serrà Julià, J., Serra, X.: Characterization of intonation in Carnatic music by parametrizing pitch histograms. In: Gouyon F, Herrera P, Martins LG, Müller M. ISMIR 2012: Proceedings of the 13th International Society for Music Information Retrieval Conference; 2012 Oct 8-12; Porto, Portugal. Porto: FEUP Ediçoes, 2012. International Society...
  [13] Kruspe, A., Lukashevich, H., Abeßer, J., Großmann, H., Dittmar, C.: Automatic classification of musical pieces into global cultural areas. In: Audio Engineering Society Conference: 42nd International Conference: Semantic Audio. Audio Engineering Society (2011)
  [14] Liu, Y., Xiang, Q., Wang, Y., Cai, L.: Cultural style based music classification of audio signals. In: 2009 IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 57–60. IEEE (2009)
  [15] Ng, A.: CS229 lecture notes. CS229 Lecture Notes 1(1), 1–3 (2000)
  [16] Pant, S., Rao, V., Rao, P.: A melody detection user interface for polyphonic music. In: 2010 National Conference on Communications (NCC). pp. 1–5. IEEE (2010)
  [17] Poliner, G.E., Ellis, D.P., Ehmann, A.F., Gómez, E., Streich, S., Ong, B.: Melody transcription from music audio: Approaches and evaluation. IEEE Transactions on Audio, Speech, and Language Processing 15(4), 1247–1256 (2007)
  [18] Rao, V., Gaddipati, P., Rao, P.: Signal-driven window-length adaptation for sinusoid detection in polyphonic music. IEEE Transactions on Audio, Speech, and Language Processing 20(1), 342–348 (2012)
  [19] Rao, V., Gupta, C., Rao, P.: Context-aware features for singing voice detection in polyphonic music. In: International Workshop on Adaptive Multimedia Retrieval. pp. 43–57. Springer (2011)
  [20] Rao, V., Rao, P.: Vocal melody extraction in the presence of pitched accompaniment in polyphonic music. IEEE Transactions on Audio, Speech, and Language Processing 18(8), 2145–2154 (2010)
  [21] Ross, J.C., Vinutha, T., Rao, P.: Detecting melodic motifs from audio for Hindustani classical music. In: ISMIR. pp. 193–198 (2012)
  [22] Salamon, J., Gómez, E.: Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing 20(6), 1759–1770 (2012)
  [23] Salamon, J., Gulati, S., Serra, X.: A multipitch approach to tonic identification in Indian classical music. In: Gouyon F, Herrera P, Martins LG, Müller M. ISMIR 2012: Proceedings of the 13th International Society for Music Information Retrieval Conference; 2012 Oct 8-12; Porto, Portugal. Porto: FEUP Ediçoes; 2012. International Society for Music Informa...
  [24] Salamon, J., Rocha, B., Gómez, E.: Musical genre classification using melody features extracted from polyphonic music signals. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 81–84. IEEE (2012)
  [25] Turnbull, D., Lanckriet, G.R., Pampalk, E., Goto, M.: A supervised approach for detecting boundaries in music using difference features and boosting. In: ISMIR. pp. 51–54 (2007)
  [26] Vidwans, A., Ganguli, K.K., Rao, P.: Classification of Indian classical vocal styles from melodic contours. In: Serra X, Rao P, Murthy H, Bozkurt B, editors. Proceedings of the 2nd CompMusic Workshop; 2012 Jul 12-13; Istanbul, Turkey. Barcelona: Universitat Pompeu Fabra; 2012. p. 139-146. Universitat Pompeu Fabra (2012)
  [27] Vlachos, M., Lin, J., Keogh, E., Gunopulos, D.: A wavelet-based anytime algorithm for k-means clustering of time series. In: Proc. Workshop on Clustering High Dimensionality Data and Its Applications. Citeseer (2003)