pith. sign in

arxiv: 2606.07673 · v1 · pith:GVKB7KD7new · submitted 2026-06-04 · 💻 cs.SD · cs.AI· cs.LG

A Hierarchical Feature Engineering Framework for Automated Classification of Phonotraumatic and Non-Phonotraumatic Vocal Hyperfunction

Pith reviewed 2026-06-27 23:32 UTC · model grok-4.3

classification 💻 cs.SD cs.AIcs.LG
keywords vocal hyperfunctionphonotraumaticnon-phonotraumaticfeature engineeringneck-surface accelerationmachine learning classificationcoupling featuresvoice disorders
0
0 comments X

The pith

Coupling features in a hierarchical framework classify phonotraumatic vocal hyperfunction with AUC 0.891 from neck-surface acceleration data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a hierarchical feature engineering framework to classify phonotraumatic vocal hyperfunction (PVH) and non-phonotraumatic vocal hyperfunction (NPVH) from healthy controls using ambulatory neck-surface acceleration recordings from the NeckVibe Challenge dataset. The framework layers static, dynamic, ratio-based, and coupling features that capture source-filter interactions. Univariate analysis shows strong separability for PVH but limited significance for NPVH, while the machine learning pipeline shows that coupling features are crucial for both tasks. This approach could support non-invasive monitoring of vocal hyperfunction subtypes, with PVH appearing near-linearly separable and NPVH requiring non-linear modeling.

Core claim

The hierarchical feature engineering framework identifies coupling features as crucial for distinguishing PVH and NPVH from healthy controls in the NeckVibe Challenge dataset, achieving an AUC of 0.891 for PVH and 0.728 for NPVH, suggesting that PVH is near-linearly separable while NPVH discrimination benefits from modeling non-linear feature interactions.

What carries the argument

The hierarchical feature engineering framework comprising static, dynamic, ratio-based, and coupling features that capture source-filter interactions.

If this is right

  • Coupling features are crucial for classification performance in both PVH and NPVH tasks.
  • PVH is near-linearly separable from controls using the engineered features.
  • NPVH discrimination benefits from modeling non-linear feature interactions.
  • The framework enables automated classification of vocal hyperfunction subtypes from neck-surface acceleration data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar hierarchical feature engineering may help classify other voice disorders that involve source-filter interactions.
  • The contrast in linear versus non-linear separability between PVH and NPVH suggests distinct underlying physiological mechanisms.
  • Extending the framework to continuous ambulatory monitoring could test its value for real-world clinical tracking of vocal health.

Load-bearing premise

The NeckVibe Challenge dataset labels and recordings accurately represent real-world PVH and NPVH cases without significant selection bias or label noise that would affect the reported separability and AUC values.

What would settle it

Applying the same framework to a new independently labeled dataset of neck-surface acceleration recordings and obtaining AUC values substantially below 0.7 for both PVH and NPVH would falsify the claim that coupling features are crucial.

Figures

Figures reproduced from arXiv: 2606.07673 by Hyunju Lee, June-Woo Kim, Kangwook Jang, Minu Kim.

Figure 1
Figure 1. Figure 1: Comparison of statistical features for PVH (Task 1) and NPVH (Task 2). While PVH exhibits robust group separation char￾acterized by large effect sizes across multiple dynamic and higher-order statistics, NPVH shows only weak and inconsistent differences at the daily summary level, even among the most discriminative features [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 3
Figure 3. Figure 3: Task 2 SHAP summary (LightGBM). an AUC of 0.579 in Task 2. This suggests that the pathologi￾cal contrast in NPVH is insufficiently distinct for multivariate modeling. 5. Discussion and Conclusion The primary objective of this study was to evaluate the incre￾mental utility of complex feature configurations in character￾izing vocal hyperfunction. Our findings reveal a clear diver￾gence in classification perf… view at source ↗
Figure 2
Figure 2. Figure 2: Task 1 SHAP summary (Logistic Regression). order delta statistics of aerodynamic features (e.g., naq), exhibit stronger localized SHAP magnitudes. The broader spread and higher variance of SHAP values across folds suggest a less sta￾ble discriminative structure compared to Task 1. 4.5. Comparison with Univariate Analysis The multivariate modeling results are consistent with the statis￾tical findings report… view at source ↗
read the original abstract

Ambulatory neck-surface acceleration enables non-invasive monitoring of vocal hyperfunction, yet robust biomarkers for its subtypes remain limited. This study investigates the NeckVibe Challenge dataset to distinguish phonotraumatic (PVH) and non-phonotraumatic (NPVH) from healthy controls. We propose a hierarchical feature engineering framework comprising: (i) static, (ii) dynamic, (iii) ratio-based, (iv) coupling features capturing source filter interactions. While univariate statistical analysis shows strong separability for PVH but limited significance for NPVH, our machine learning pipeline, tailored for high-dimensional feature integration, identifies that coupling features are crucial for both tasks. We achieve an AUC of 0.891 for PVH and 0.728 for NPVH, suggesting that while PVH is near-linearly separable, NPVH discrimination benefits from modeling non-linear feature interactions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a hierarchical feature engineering framework (static, dynamic, ratio-based, and coupling features capturing source-filter interactions) applied to neck-surface acceleration signals from the NeckVibe Challenge dataset. It reports that univariate analysis shows strong separability for phonotraumatic vocal hyperfunction (PVH) versus controls but limited significance for non-phonotraumatic vocal hyperfunction (NPVH), while a machine-learning pipeline yields AUC 0.891 for PVH and 0.728 for NPVH, concluding that coupling features are crucial and that PVH is near-linearly separable while NPVH requires modeling non-linear interactions.

Significance. If the reported separability and AUC values prove robust to proper validation, the work could contribute clinically relevant biomarkers for ambulatory monitoring of vocal hyperfunction subtypes, particularly by highlighting the value of coupling features. The approach is grounded in standard supervised learning rather than circular definitions, and the distinction between linear separability for PVH and the need for non-linear modeling for NPVH is a potentially useful observation.

major comments (3)
  1. [Abstract/Methods] Abstract and Methods: The reported AUC values of 0.891 (PVH) and 0.728 (NPVH) are presented without any description of the cross-validation procedure, feature selection steps, or handling of class imbalance. These omissions are load-bearing for the central claim that coupling features are crucial, as the performance gap and the necessity of the hierarchical framework cannot be assessed without them.
  2. [Dataset] Dataset section: No information is provided on diagnostic criteria, inter-rater reliability, cohort demographics, or potential selection bias in the NeckVibe Challenge labels. This directly affects the interpretation of the univariate separability results and the claim that the framework identifies robust biomarkers rather than artifacts of the particular cohort.
  3. [Results] Results: While the abstract states that coupling features are crucial, there is no mention of ablation experiments, feature importance rankings, or statistical tests comparing the full hierarchical set against subsets. Without such evidence, the assertion that coupling features drive the reported performance remains unsupported.
minor comments (2)
  1. [Abstract] The abstract would benefit from stating the number of subjects per class and the total feature dimensionality to allow readers to gauge the scale of the high-dimensional integration task.
  2. [Methods] Notation for the four feature categories (static, dynamic, ratio-based, coupling) should be introduced with explicit definitions or references to equations in the main text for clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments, which have helped us strengthen the manuscript. We address each major comment below and have made revisions to incorporate additional methodological details, dataset information, and supporting analyses as requested.

read point-by-point responses
  1. Referee: [Abstract/Methods] Abstract and Methods: The reported AUC values of 0.891 (PVH) and 0.728 (NPVH) are presented without any description of the cross-validation procedure, feature selection steps, or handling of class imbalance. These omissions are load-bearing for the central claim that coupling features are crucial, as the performance gap and the necessity of the hierarchical framework cannot be assessed without them.

    Authors: We agree that the original submission omitted explicit details on these aspects. The revised manuscript now includes a dedicated subsection in Methods describing the stratified 5-fold cross-validation procedure (with subject-wise partitioning to avoid leakage), recursive feature elimination with cross-validation for feature selection, and SMOTE oversampling to address class imbalance. These additions directly support the robustness of the AUC values and the role of coupling features. revision: yes

  2. Referee: [Dataset] Dataset section: No information is provided on diagnostic criteria, inter-rater reliability, cohort demographics, or potential selection bias in the NeckVibe Challenge labels. This directly affects the interpretation of the univariate separability results and the claim that the framework identifies robust biomarkers rather than artifacts of the particular cohort.

    Authors: The NeckVibe Challenge is a public benchmark whose labeling protocol is defined in the challenge documentation. To address the concern, we have added a new paragraph in the Dataset section summarizing cohort demographics (age, sex distribution), diagnostic criteria (clinical laryngoscopy and voice evaluation by board-certified laryngologists), and a reference to the challenge paper for inter-rater details. Potential selection biases are now explicitly discussed as a limitation. revision: yes

  3. Referee: [Results] Results: While the abstract states that coupling features are crucial, there is no mention of ablation experiments, feature importance rankings, or statistical tests comparing the full hierarchical set against subsets. Without such evidence, the assertion that coupling features drive the reported performance remains unsupported.

    Authors: We acknowledge the need for explicit supporting evidence. The revised Results section now includes ablation experiments (performance with/without coupling features), permutation feature importance rankings, and Wilcoxon signed-rank tests demonstrating statistically significant gains from the full hierarchical set. These are presented with new tables and figures. revision: yes

Circularity Check

0 steps flagged

No circularity in feature engineering or ML classification pipeline

full rationale

The paper presents a standard supervised learning pipeline: hierarchical feature extraction (static/dynamic/ratio/coupling) from the external NeckVibe Challenge dataset, followed by univariate analysis and model training to produce AUC metrics. No equations, derivations, or self-citations reduce the reported separability or AUCs to fitted parameters by construction; the results are empirical outputs of cross-validation on held-out data rather than tautological redefinitions. The central claims rest on observable data patterns and model performance, not on any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the public NeckVibe Challenge dataset provides ground-truth labels and that standard ML pipelines can be applied without domain-specific validation details being required for the reported separability.

axioms (1)
  • domain assumption NeckVibe Challenge dataset labels accurately reflect clinical PVH and NPVH diagnoses
    The paper uses these labels to train and evaluate the classifier.

pith-pipeline@v0.9.1-grok · 5696 in / 1196 out tokens · 18415 ms · 2026-06-27T23:32:39.370620+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

34 extracted references · 1 linked inside Pith

  1. [1]

    Introduction V ocal hyperfunction (VH) is a prevalent condition character- ized by chronic voice misuse, leading to disorders catego- rized as phonotraumatic VH (PVH, e.g., nodules) and non- phonotraumatic VH (NPVH, e.g., muscle tension dyspho- nia) [1, 2]. While clinical diagnosis is standard, ambulatory monitoring using neck-surface acceleration (ACC) h...

  2. [2]

    The challenge defines two binary classifica- tion tasks:(i)PVH versus PVH Control and(ii)NPVH ver- sus NPVH Control

    NeckVibe Challenge Dataset The NeckVibe Challenge provides ambulatory neck surface acceleration-derived measurements for subject-level voice dis- order detection. The challenge defines two binary classifica- tion tasks:(i)PVH versus PVH Control and(ii)NPVH ver- sus NPVH Control. The dataset comprises 3,278 samples from 468 subjects. Specifically, PVH incl...

  3. [3]

    Feature Construction To capture the complex mechanisms of VH, we constructed a hierarchical feature set focusing on static, dynamic, ratio-based, and coupling properties

    Methods 3.1. Feature Construction To capture the complex mechanisms of VH, we constructed a hierarchical feature set focusing on static, dynamic, ratio-based, and coupling properties. We include thevocal dose percentage in all configurations, defined as the proportion of voiced frames relative to total recording time. All frame-level measures were aggrega...

  4. [4]

    Statistical Results A summary of statistically significant features across different feature configurations is provided in Table 2

    Results 4.1. Statistical Results A summary of statistically significant features across different feature configurations is provided in Table 2. For Task 1, a substantial number of features remained statistically significant after FDR correction. The number of significant features in- creased as dynamic and normalized descriptors were incorpo- rated. As i...

  5. [5]

    Our findings reveal a clear diver- gence in classification performance between Task 1 and Task

    Discussion and Conclusion The primary objective of this study was to evaluate the incre- mental utility of complex feature configurations in character- izing vocal hyperfunction. Our findings reveal a clear diver- gence in classification performance between Task 1 and Task

  6. [6]

    In con- trast, the relatively low performance observed in Task 2 reflects a lack of clear discriminatory rationales in the current feature set

    For Task 1, leveraging coupling features notably enhanced model performance, suggesting that PVH conditions leave dis- tinct physiological signatures in ambulatory voice data. In con- trast, the relatively low performance observed in Task 2 reflects a lack of clear discriminatory rationales in the current feature set. In our univariate analysis, no single...

  7. [7]

    Acknowledgement This research was supported by the InnoCORE program of the Ministry of Science and ICT(GIST InnoCORE KH0860), and by the Regional Innovation System & Education(RISE) pro- gram through the Jeonbuk RISE Center, funded by the Min- istry of Education(MOE) and the Jeonbuk State, Republic of Korea(2026-RISE-13-WKU)

  8. [8]

    The authors have verified all technical content and maintain full account- ability for the work

    Generative AI Use Disclosure Generative AI (ChatGPT) was used solely for grammar correc- tion and linguistic polishing of this manuscript. The authors have verified all technical content and maintain full account- ability for the work

  9. [9]

    An updated theoretical framework for vocal hyperfunc- tion,

    R. E. Hillman, C. E. Stepp, J. H. Van Stan, M. Za ˜nartu, and D. D. Mehta, “An updated theoretical framework for vocal hyperfunc- tion,”American Journal of Speech-Language Pathology, vol. 29, no. 4, pp. 2254–2260, 2020

  10. [10]

    Patient-reported factors associated with the onset of hyperfunc- tional voice disorders,

    S. Kridgen, R. E. Hillman, T. Stadelman-Cohen, S. Zeitels, J. A. Burns, T. Hron, C. Krusemark, J. Muise, and J. H. Van Stan, “Patient-reported factors associated with the onset of hyperfunc- tional voice disorders,”Annals of Otology, Rhinology & Laryn- gology, vol. 130, no. 4, pp. 389–394, 2021

  11. [11]

    Mobile voice health monitoring using a wearable ac- celerometer sensor and a smartphone platform,

    D. D. Mehta, M. Zanartu, S. W. Feng, H. A. Cheyne II, and R. E. Hillman, “Mobile voice health monitoring using a wearable ac- celerometer sensor and a smartphone platform,”IEEE Transac- tions on Biomedical Engineering, vol. 59, no. 11, pp. 3090–3096, 2012

  12. [12]

    Using ambulatory voice monitoring to investigate common voice disorders: Research update,

    D. D. Mehta, J. H. Van Stan, M. Za ˜nartu, M. Ghassemi, J. V . Gut- tag, V . M. Espinoza, J. P. Cort´es, H. A. Cheyne, and R. E. Hill- man, “Using ambulatory voice monitoring to investigate common voice disorders: Research update,”Frontiers in bioengineering and biotechnology, vol. 3, p. 155, 2015

  13. [13]

    Differences in weeklong ambulatory vocal behavior between fe- male patients with phonotraumatic lesions and matched controls,

    J. H. Van Stan, D. D. Mehta, A. J. Ortiz, J. A. Burns, L. E. Toles, K. L. Marks, M. Vangel, T. Hron, S. Zeitels, and R. E. Hillman, “Differences in weeklong ambulatory vocal behavior between fe- male patients with phonotraumatic lesions and matched controls,” Journal of Speech, Language, and Hearing Research, vol. 63, no. 2, pp. 372–384, 2020

  14. [14]

    Ambulatory mon- itoring of subglottal pressure estimated from neck-surface vibra- tion in individuals with and without voice disorders,

    J. P. Cort ´es, J. Z. Lin, K. L. Marks, V . M. Espinoza, E. J. Ibarra, M. Za˜nartu, R. E. Hillman, and D. D. Mehta, “Ambulatory mon- itoring of subglottal pressure estimated from neck-surface vibra- tion in individuals with and without voice disorders,”Applied Sci- ences, vol. 12, no. 21, p. 10692, 2022

  15. [15]

    Differences in daily voice use measures between fe- male patients with nonphonotraumatic vocal hyperfunction and matched controls,

    J. H. Van Stan, A. J. Ortiz, J. P. Cortes, K. L. Marks, L. E. Toles, D. D. Mehta, J. A. Burns, T. Hron, T. Stadelman-Cohen, C. Kruse- market al., “Differences in daily voice use measures between fe- male patients with nonphonotraumatic vocal hyperfunction and matched controls,”Journal of Speech, Language, and Hearing Research, vol. 64, no. 5, pp. 1457–1470, 2021

  16. [16]

    Subglottal impedance-based inverse filtering of voiced sounds using neck surface acceleration,

    M. Za ˜nartu, J. C. Ho, D. D. Mehta, R. E. Hillman, and G. R. Wodicka, “Subglottal impedance-based inverse filtering of voiced sounds using neck surface acceleration,”IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 9, pp. 1929–1939, 2013

  17. [17]

    The difference between first and second harmonic amplitudes correlates between glottal airflow and neck-surface ac- celerometer signals during phonation,

    D. D. Mehta, V . M. Espinoza, J. H. Van Stan, M. Za ˜nartu, and R. E. Hillman, “The difference between first and second harmonic amplitudes correlates between glottal airflow and neck-surface ac- celerometer signals during phonation,”The Journal of the Acous- tical Society of America, vol. 145, no. 5, pp. EL386–EL392, 2019

  18. [18]

    Glottal airflow estimation using neck surface acceleration and low-order kalman smoothing,

    A. Morales, J. I. Yuz, J. P. Cort ´es, J. G. Fontanet, and M. Za˜nartu, “Glottal airflow estimation using neck surface acceleration and low-order kalman smoothing,”IEEE/ACM transactions on audio, speech, and language processing, vol. 31, pp. 2055–2066, 2023

  19. [19]

    Learning to detect vocal hyperfunction from ambulatory neck-surface acceleration features: Initial results for vocal fold nodules,

    M. Ghassemi, J. H. Van Stan, D. D. Mehta, M. Za ˜nartu, H. A. Cheyne II, R. E. Hillman, and J. V . Guttag, “Learning to detect vocal hyperfunction from ambulatory neck-surface acceleration features: Initial results for vocal fold nodules,”IEEE Transac- tions on Biomedical Engineering, vol. 61, no. 6, pp. 1668–1675, 2014

  20. [20]

    Es- timating subglottal pressure from neck-surface acceleration dur- ing normal voice production,

    A. S. Fryd, J. H. Van Stan, R. E. Hillman, and D. D. Mehta, “Es- timating subglottal pressure from neck-surface acceleration dur- ing normal voice production,”Journal of Speech, Language, and Hearing Research, vol. 59, no. 6, pp. 1335–1345, 2016

  21. [21]

    Controlling the false discovery rate: a practical and powerful approach to multiple testing,

    Y . Benjamini and Y . Hochberg, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,”Jour- nal of the Royal statistical society: series B (Methodological), vol. 57, no. 1, pp. 289–300, 1995

  22. [22]

    Acoustic biomarkers for schizophrenia spectrum disor- ders and their associations with symptoms and cognitive func- tioning,

    K. Jang, L. Li, T.-H. Le, A. Setiani, F. Z. Rami, H. Kim, and Y . C. Chung, “Acoustic biomarkers for schizophrenia spectrum disor- ders and their associations with symptoms and cognitive func- tioning,”Progress in Neuro-Psychopharmacology and Biological Psychiatry, vol. 138, p. 111339, 2025

  23. [23]

    The regression analysis of binary sequences,

    D. R. Cox, “The regression analysis of binary sequences,”Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 20, no. 2, pp. 215–232, 1958

  24. [24]

    Support-vector networks,

    C. Cortes and V . Vapnik, “Support-vector networks,”Machine learning, vol. 20, no. 3, pp. 273–297, 1995

  25. [25]

    Random forests,

    L. Breiman, “Random forests,”Machine learning, vol. 45, no. 1, pp. 5–32, 2001

  26. [26]

    Xgboost: A scalable tree boosting sys- tem,

    T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting sys- tem,” inProceedings of the 22nd acm sigkdd international con- ference on knowledge discovery and data mining, 2016, pp. 785– 794

  27. [27]

    Lightgbm: A highly efficient gradient boosting de- cision tree,

    G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y . Liu, “Lightgbm: A highly efficient gradient boosting de- cision tree,”Advances in neural information processing systems, vol. 30, 2017

  28. [28]

    A unified approach to interpreting model predictions,

    S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,”Advances in neural information processing systems, vol. 30, 2017

  29. [29]

    Am- bulatory assessment of phonotraumatic vocal hyperfunction us- ing glottal airflow measures estimated from neck-surface acceler- ation,

    J. P. Cort ´es, V . M. Espinoza, M. Ghassemi, D. D. Mehta, J. H. Van Stan, R. E. Hillman, J. V . Guttag, and M. Za ˜nartu, “Am- bulatory assessment of phonotraumatic vocal hyperfunction us- ing glottal airflow measures estimated from neck-surface acceler- ation,”PloS one, vol. 13, no. 12, p. e0209017, 2018

  30. [30]

    Objective assessment of vocal hyperfunction: An experimental framework and initial results,

    R. E. Hillman, E. B. Holmberg, J. S. Perkell, M. Walsh, and C. Vaughan, “Objective assessment of vocal hyperfunction: An experimental framework and initial results,”Journal of Speech, Language, and Hearing Research, vol. 32, no. 2, pp. 373–392, 1989

  31. [31]

    wav2vec 2.0: A framework for self-supervised learning of speech repre- sentations,

    A. Baevski, Y . Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech repre- sentations,”Advances in neural information processing systems, vol. 33, pp. 12 449–12 460, 2020

  32. [32]

    Wavlm: Large-scale self- supervised pre-training for full stack speech processing,

    S. Chen, C. Wang, Z. Chen, Y . Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiaoet al., “Wavlm: Large-scale self- supervised pre-training for full stack speech processing,”IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1505–1518, 2022

  33. [33]

    Intrinsic laryngeal muscle activity in response to auto- nomic nervous system activation,

    L. B. Helou, W. Wang, R. C. Ashmore, C. A. Rosen, and K. V . Abbott, “Intrinsic laryngeal muscle activity in response to auto- nomic nervous system activation,”The Laryngoscope, vol. 123, no. 11, pp. 2756–2765, 2013

  34. [34]

    Deep neural network- based analysis of voice biomarkers for monitoring treatment re- sponse in adolescent major depressive disorder,

    J.-W. Kim, H. Yoon, B.-N. Kim, S.-Y . Lee, D.-J. Kim, S.- E. Moon, Y . Choi, and C.-M. Yang, “Deep neural network- based analysis of voice biomarkers for monitoring treatment re- sponse in adolescent major depressive disorder,”Communications Medicine, 2026