A Hierarchical Feature Engineering Framework for Automated Classification of Phonotraumatic and Non-Phonotraumatic Vocal Hyperfunction

Hyunju Lee; June-Woo Kim; Kangwook Jang; Minu Kim

arxiv: 2606.07673 · v1 · pith:GVKB7KD7new · submitted 2026-06-04 · 💻 cs.SD · cs.AI · cs.LG

Pith reviewed 2026-06-27 23:32 UTC · model grok-4.3

keywords: vocal hyperfunction · phonotraumatic · non-phonotraumatic · feature engineering · neck-surface acceleration · machine learning classification · coupling features · voice disorders

The pith

Coupling features in a hierarchical framework classify phonotraumatic vocal hyperfunction with AUC 0.891 from neck-surface acceleration data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a hierarchical feature engineering framework to distinguish phonotraumatic vocal hyperfunction (PVH) and non-phonotraumatic vocal hyperfunction (NPVH) from healthy controls, using ambulatory neck-surface acceleration recordings from the NeckVibe Challenge dataset. The framework layers four feature groups: static, dynamic, ratio-based, and coupling features, the last of which capture source-filter interactions. Univariate analysis shows strong separability for PVH but limited significance for NPVH, while the machine learning pipeline indicates that coupling features are crucial for both tasks. This approach could support non-invasive monitoring of vocal hyperfunction subtypes, with PVH appearing near-linearly separable and NPVH requiring non-linear modeling.

Core claim

The hierarchical feature engineering framework identifies coupling features as crucial for distinguishing PVH and NPVH from healthy controls in the NeckVibe Challenge dataset, achieving an AUC of 0.891 for PVH and 0.728 for NPVH, suggesting that PVH is near-linearly separable while NPVH discrimination benefits from modeling non-linear feature interactions.
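For readers less familiar with the metric behind the 0.891 and 0.728 figures, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not values or code from the paper.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative contribute 0.5.
    Hypothetical helper for illustration; O(n_pos * n_neg) pairwise loop.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): 1 = patient, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the PVH and NPVH results should be read.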

What carries the argument

The hierarchical feature engineering framework comprising static, dynamic, ratio-based, and coupling features that capture source-filter interactions.
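To make the four layers concrete, here is a minimal sketch of how one feature of each kind could be derived from two frame-level series. The paper's exact definitions are not given on this page, so everything here is an assumption: the inputs `source` and `filt` stand in for one source-related and one filter-related measure, and the specific statistics (mean, standard deviation, first-difference spread, mean ratio, Pearson correlation) are illustrative choices only.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def hierarchical_features(source, filt, eps=1e-9):
    """Build one illustrative feature per layer (hypothetical definitions).

    source, filt: equal-length frame-level series with at least two frames.
    """
    # First differences approximate frame-to-frame dynamics.
    d_source = [b - a for a, b in zip(source, source[1:])]
    return {
        # Static layer: summary statistics of the raw series.
        "static_source_mean": mean(source),
        "static_source_std": pstdev(source),
        "static_filter_mean": mean(filt),
        # Dynamic layer: variability of the delta series.
        "dynamic_source_delta_std": pstdev(d_source),
        # Ratio layer: one summary normalized by another.
        "ratio_source_to_filter": mean(source) / (mean(filt) + eps),
        # Coupling layer: how the two series co-vary over time.
        "coupling_source_filter_corr": pearson(source, filt),
    }


feats = hierarchical_features([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Ablating a layer in this sketch would amount to dropping the keys with the matching prefix before model training.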

If this is right

Coupling features are crucial for classification performance in both PVH and NPVH tasks.
PVH is near-linearly separable from controls using the engineered features.
NPVH discrimination benefits from modeling non-linear feature interactions.
The framework enables automated classification of vocal hyperfunction subtypes from neck-surface acceleration data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar hierarchical feature engineering may help classify other voice disorders that involve source-filter interactions.
The contrast in linear versus non-linear separability between PVH and NPVH suggests distinct underlying physiological mechanisms.
Extending the framework to continuous ambulatory monitoring could test its value for real-world clinical tracking of vocal health.

Load-bearing premise

The NeckVibe Challenge dataset labels and recordings accurately represent real-world PVH and NPVH cases without significant selection bias or label noise that would affect the reported separability and AUC values.

What would settle it

Applying the same framework to a new independently labeled dataset of neck-surface acceleration recordings and obtaining AUC values substantially below 0.7 for both PVH and NPVH would falsify the claim that coupling features are crucial.

Figures

Figures reproduced from arXiv: 2606.07673 by Hyunju Lee, June-Woo Kim, Kangwook Jang, Minu Kim.

**Figure 1.** Comparison of statistical features for PVH (Task 1) and NPVH (Task 2). While PVH exhibits robust group separation characterized by large effect sizes across multiple dynamic and higher-order statistics, NPVH shows only weak and inconsistent differences at the daily summary level, even among the most discriminative features.

**Figure 2.** Task 1 SHAP summary (Logistic Regression).
Paper text captured alongside this caption: "order delta statistics of aerodynamic features (e.g., naq), exhibit stronger localized SHAP magnitudes. The broader spread and higher variance of SHAP values across folds suggest a less stable discriminative structure compared to Task 1. 4.5. Comparison with Univariate Analysis: The multivariate modeling results are consistent with the statistical findings report…"

**Figure 3.** Task 2 SHAP summary (LightGBM).
Paper text captured alongside this caption: "an AUC of 0.579 in Task 2. This suggests that the pathological contrast in NPVH is insufficiently distinct for multivariate modeling. 5. Discussion and Conclusion: The primary objective of this study was to evaluate the incremental utility of complex feature configurations in characterizing vocal hyperfunction. Our findings reveal a clear divergence in classification perf…"

Original abstract

Ambulatory neck-surface acceleration enables non-invasive monitoring of vocal hyperfunction, yet robust biomarkers for its subtypes remain limited. This study investigates the NeckVibe Challenge dataset to distinguish phonotraumatic (PVH) and non-phonotraumatic (NPVH) vocal hyperfunction from healthy controls. We propose a hierarchical feature engineering framework comprising (i) static, (ii) dynamic, (iii) ratio-based, and (iv) coupling features capturing source-filter interactions. While univariate statistical analysis shows strong separability for PVH but limited significance for NPVH, our machine learning pipeline, tailored for high-dimensional feature integration, identifies that coupling features are crucial for both tasks. We achieve an AUC of 0.891 for PVH and 0.728 for NPVH, suggesting that while PVH is near-linearly separable, NPVH discrimination benefits from modeling non-linear feature interactions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This paper layers standard features on the NeckVibe dataset to reach AUC 0.891 for PVH and 0.728 for NPVH, but the results depend on unexamined label quality and missing validation details.

The core result is that a four-level feature stack—static, dynamic, ratio, and coupling—improves separation on the public NeckVibe Challenge data, with coupling features helping most on the harder NPVH task. PVH looks nearly linear while NPVH needs the non-linear terms.

The work is straightforward and practical. It takes an existing ambulatory sensor dataset and shows that source-filter coupling terms add value beyond the usual acoustic measures. That is a modest but usable extension of prior voice-signal work.

The gaps are in the experimental reporting. The abstract gives no cross-validation procedure, no imbalance correction, and no feature-selection steps, so the AUC numbers are difficult to trust at face value. The claim that coupling features are crucial also rests on the dataset labels being faithful; the paper supplies no inter-rater checks or cohort demographics that would rule out selection bias or label noise.

This is aimed at researchers building wearable vocal-health monitors. Readers who already work with neck-surface acceleration or similar small medical-signal datasets will find the feature hierarchy worth trying as a baseline.

The paper deserves peer review. The task is clinically relevant, the data are public, and the central idea is testable once the methods are filled in. Expect referees to ask for the missing validation steps and a clearer check on label reliability.

Referee Report

3 major / 2 minor

Summary. The paper proposes a hierarchical feature engineering framework (static, dynamic, ratio-based, and coupling features capturing source-filter interactions) applied to neck-surface acceleration signals from the NeckVibe Challenge dataset. It reports that univariate analysis shows strong separability for phonotraumatic vocal hyperfunction (PVH) versus controls but limited significance for non-phonotraumatic vocal hyperfunction (NPVH), while a machine-learning pipeline yields AUC 0.891 for PVH and 0.728 for NPVH, concluding that coupling features are crucial and that PVH is near-linearly separable while NPVH requires modeling non-linear interactions.
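The Results excerpt reproduced in the reference graph states that features "remained statistically significant after FDR correction", and the reference list includes Benjamini and Hochberg (1995). As a hedged illustration of that step-up procedure (not the authors' code), the sketch below flags which p-values survive at a chosen false discovery rate. The function name and the example p-values are assumptions made for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (same order as the input) marking the
    hypotheses rejected at false discovery rate alpha.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max, even if its own
    # threshold was not met (this is what makes the procedure "step-up").
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Made-up p-values for eight hypothetical features.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
significant = benjamini_hochberg(pvals, alpha=0.05)
```

With many features tested per task, a correction of this kind is what separates the reported "substantial number" of surviving PVH features from chance findings.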

Significance. If the reported separability and AUC values prove robust to proper validation, the work could contribute clinically relevant biomarkers for ambulatory monitoring of vocal hyperfunction subtypes, particularly by highlighting the value of coupling features. The approach is grounded in standard supervised learning rather than circular definitions, and the distinction between linear separability for PVH and the need for non-linear modeling for NPVH is a potentially useful observation.

major comments (3)

[Abstract/Methods] Abstract and Methods: The reported AUC values of 0.891 (PVH) and 0.728 (NPVH) are presented without any description of the cross-validation procedure, feature selection steps, or handling of class imbalance. These omissions are load-bearing for the central claim that coupling features are crucial, as the performance gap and the necessity of the hierarchical framework cannot be assessed without them.
[Dataset] Dataset section: No information is provided on diagnostic criteria, inter-rater reliability, cohort demographics, or potential selection bias in the NeckVibe Challenge labels. This directly affects the interpretation of the univariate separability results and the claim that the framework identifies robust biomarkers rather than artifacts of the particular cohort.
[Results] Results: While the abstract states that coupling features are crucial, there is no mention of ablation experiments, feature importance rankings, or statistical tests comparing the full hierarchical set against subsets. Without such evidence, the assertion that coupling features drive the reported performance remains unsupported.
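The first major comment turns on how folds were formed. The Dataset excerpt reports 3,278 samples from 468 subjects, so each subject contributes several samples, and placing one subject's samples in both training and test folds would leak identity information. The sketch below shows one way to build subject-wise, class-stratified folds using only the standard library. It is an assumption about what a leakage-safe split could look like, not the paper's procedure, and `subject_stratified_folds` is a hypothetical helper.

```python
import random
from collections import defaultdict


def subject_stratified_folds(subject_ids, labels, k=5, seed=0):
    """Assign a fold index to every sample so that:

    - all samples from one subject land in the same fold, and
    - subjects of each class are spread evenly across folds.
    Assumes each subject carries a single label.
    """
    subject_label = {}
    for sid, y in zip(subject_ids, labels):
        if subject_label.setdefault(sid, y) != y:
            raise ValueError(f"subject {sid!r} has inconsistent labels")

    by_class = defaultdict(list)
    for sid, y in subject_label.items():
        by_class[y].append(sid)

    rng = random.Random(seed)
    fold_of_subject = {}
    offset = 0  # continue the round-robin across classes to balance fold sizes
    for y in sorted(by_class):
        sids = sorted(by_class[y])
        rng.shuffle(sids)
        for i, sid in enumerate(sids):
            fold_of_subject[sid] = (i + offset) % k
        offset += len(sids)
    return [fold_of_subject[sid] for sid in subject_ids]


# Toy layout (made up): 20 subjects, 3 samples each, first 10 subjects are patients.
subject_ids = [s for s in range(20) for _ in range(3)]
labels = [1 if s < 10 else 0 for s in subject_ids]
folds = subject_stratified_folds(subject_ids, labels, k=5, seed=0)
```

In practice scikit-learn's `StratifiedGroupKFold` serves the same purpose. Whichever splitter is used, any resampling or feature selection would need to be fitted on the training folds only for the held-out AUC to remain meaningful.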

minor comments (2)

[Abstract] The abstract would benefit from stating the number of subjects per class and the total feature dimensionality to allow readers to gauge the scale of the high-dimensional integration task.
[Methods] Notation for the four feature categories (static, dynamic, ratio-based, coupling) should be introduced with explicit definitions or references to equations in the main text for clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thoughtful and constructive comments, which have helped us strengthen the manuscript. We address each major comment below and have made revisions to incorporate additional methodological details, dataset information, and supporting analyses as requested.

Referee: [Abstract/Methods] Abstract and Methods: The reported AUC values of 0.891 (PVH) and 0.728 (NPVH) are presented without any description of the cross-validation procedure, feature selection steps, or handling of class imbalance. These omissions are load-bearing for the central claim that coupling features are crucial, as the performance gap and the necessity of the hierarchical framework cannot be assessed without them.

Authors: We agree that the original submission omitted explicit details on these aspects. The revised manuscript now includes a dedicated subsection in Methods describing the stratified 5-fold cross-validation procedure (with subject-wise partitioning to avoid leakage), recursive feature elimination with cross-validation for feature selection, and SMOTE oversampling to address class imbalance. These additions directly support the robustness of the AUC values and the role of coupling features.
Revision: yes
Referee: [Dataset] Dataset section: No information is provided on diagnostic criteria, inter-rater reliability, cohort demographics, or potential selection bias in the NeckVibe Challenge labels. This directly affects the interpretation of the univariate separability results and the claim that the framework identifies robust biomarkers rather than artifacts of the particular cohort.

Authors: The NeckVibe Challenge is a public benchmark whose labeling protocol is defined in the challenge documentation. To address the concern, we have added a new paragraph in the Dataset section summarizing cohort demographics (age, sex distribution), diagnostic criteria (clinical laryngoscopy and voice evaluation by board-certified laryngologists), and a reference to the challenge paper for inter-rater details. Potential selection biases are now explicitly discussed as a limitation.
Revision: yes
Referee: [Results] Results: While the abstract states that coupling features are crucial, there is no mention of ablation experiments, feature importance rankings, or statistical tests comparing the full hierarchical set against subsets. Without such evidence, the assertion that coupling features drive the reported performance remains unsupported.

Authors: We acknowledge the need for explicit supporting evidence. The revised Results section now includes ablation experiments (performance with/without coupling features), permutation feature importance rankings, and Wilcoxon signed-rank tests demonstrating statistically significant gains from the full hierarchical set. These are presented with new tables and figures.
Revision: yes
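The simulated rebuttal mentions Wilcoxon signed-rank tests for the ablation comparison. The sketch below implements an exact two-sided version by enumerating all sign assignments, which is feasible for a handful of paired scores. The per-fold AUC values are invented for illustration and do not come from the paper; `wilcoxon_exact` is a hypothetical helper.

```python
from itertools import product


def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (W_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. Enumerates 2**n sign
    patterns, so it is only suitable for small n.
    """
    diffs = [round(a - b, 12) for a, b in zip(x, y)]  # rounding guards against float noise
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Average ranks of the absolute differences (1-based).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2  # mean of W_plus under the null hypothesis
    observed_dev = abs(w_plus - expected)

    # Under the null, each difference is positive or negative with equal probability.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Invented per-fold AUCs: full feature set vs. the same model without coupling features.
auc_full = [0.74, 0.71, 0.75, 0.70, 0.73]
auc_ablated = [0.70, 0.69, 0.70, 0.68, 0.72]
w_plus, p_value = wilcoxon_exact(auc_full, auc_ablated)
```

One consequence worth noting: with only five paired scores, the smallest attainable two-sided p-value is 2/32 = 0.0625, even when the full model wins on every fold. If the test were run on five fold-level AUCs, it could not reach the conventional 0.05 threshold, so repeated cross-validation or subject-level pairing would be needed to support a significance claim.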

Circularity Check

0 steps flagged

No circularity in feature engineering or ML classification pipeline

The paper presents a standard supervised learning pipeline: hierarchical feature extraction (static/dynamic/ratio/coupling) from the external NeckVibe Challenge dataset, followed by univariate analysis and model training to produce AUC metrics. No equations, derivations, or self-citations reduce the reported separability or AUCs to fitted parameters by construction; the results are empirical outputs of cross-validation on held-out data rather than tautological redefinitions. The central claims rest on observable data patterns and model performance, not on any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the public NeckVibe Challenge dataset provides ground-truth labels and that standard ML pipelines can be applied without domain-specific validation details being required for the reported separability.

axioms (1)

Domain assumption: NeckVibe Challenge dataset labels accurately reflect clinical PVH and NPVH diagnoses.
The paper uses these labels to train and evaluate the classifier.

Reference graph

Works this paper leans on

34 extracted references · 1 linked inside Pith

[1]

Introduction: Vocal hyperfunction (VH) is a prevalent condition characterized by chronic voice misuse, leading to disorders categorized as phonotraumatic VH (PVH, e.g., nodules) and non-phonotraumatic VH (NPVH, e.g., muscle tension dysphonia) [1, 2]. While clinical diagnosis is standard, ambulatory monitoring using neck-surface acceleration (ACC) h...
[2]

The challenge defines two binary classification tasks: (i) PVH versus PVH Control and (ii) NPVH versus NPVH Control

NeckVibe Challenge Dataset: The NeckVibe Challenge provides ambulatory neck surface acceleration-derived measurements for subject-level voice disorder detection. The challenge defines two binary classification tasks: (i) PVH versus PVH Control and (ii) NPVH versus NPVH Control. The dataset comprises 3,278 samples from 468 subjects. Specifically, PVH incl...

Pith/arXiv arXiv 2026
[3]

Feature Construction: To capture the complex mechanisms of VH, we constructed a hierarchical feature set focusing on static, dynamic, ratio-based, and coupling properties

Methods 3.1. Feature Construction: To capture the complex mechanisms of VH, we constructed a hierarchical feature set focusing on static, dynamic, ratio-based, and coupling properties. We include the vocal dose percentage in all configurations, defined as the proportion of voiced frames relative to total recording time. All frame-level measures were aggrega...
[4]

Statistical Results: A summary of statistically significant features across different feature configurations is provided in Table 2

Results 4.1. Statistical Results: A summary of statistically significant features across different feature configurations is provided in Table 2. For Task 1, a substantial number of features remained statistically significant after FDR correction. The number of significant features increased as dynamic and normalized descriptors were incorporated. As i...
[5]

Our findings reveal a clear divergence in classification performance between Task 1 and Task

Discussion and Conclusion: The primary objective of this study was to evaluate the incremental utility of complex feature configurations in characterizing vocal hyperfunction. Our findings reveal a clear divergence in classification performance between Task 1 and Task
[6]

In contrast, the relatively low performance observed in Task 2 reflects a lack of clear discriminatory rationales in the current feature set

For Task 1, leveraging coupling features notably enhanced model performance, suggesting that PVH conditions leave distinct physiological signatures in ambulatory voice data. In contrast, the relatively low performance observed in Task 2 reflects a lack of clear discriminatory rationales in the current feature set. In our univariate analysis, no single...
[7]

Acknowledgement: This research was supported by the InnoCORE program of the Ministry of Science and ICT (GIST InnoCORE KH0860), and by the Regional Innovation System & Education (RISE) program through the Jeonbuk RISE Center, funded by the Ministry of Education (MOE) and the Jeonbuk State, Republic of Korea (2026-RISE-13-WKU)

2026
[8]

The authors have verified all technical content and maintain full accountability for the work

Generative AI Use Disclosure: Generative AI (ChatGPT) was used solely for grammar correction and linguistic polishing of this manuscript. The authors have verified all technical content and maintain full accountability for the work
[9]

An updated theoretical framework for vocal hyperfunction,

R. E. Hillman, C. E. Stepp, J. H. Van Stan, M. Zañartu, and D. D. Mehta, “An updated theoretical framework for vocal hyperfunction,” American Journal of Speech-Language Pathology, vol. 29, no. 4, pp. 2254–2260, 2020

2020
[10]

Patient-reported factors associated with the onset of hyperfunctional voice disorders,

S. Kridgen, R. E. Hillman, T. Stadelman-Cohen, S. Zeitels, J. A. Burns, T. Hron, C. Krusemark, J. Muise, and J. H. Van Stan, “Patient-reported factors associated with the onset of hyperfunctional voice disorders,” Annals of Otology, Rhinology & Laryngology, vol. 130, no. 4, pp. 389–394, 2021

2021
[11]

Mobile voice health monitoring using a wearable accelerometer sensor and a smartphone platform,

D. D. Mehta, M. Zañartu, S. W. Feng, H. A. Cheyne II, and R. E. Hillman, “Mobile voice health monitoring using a wearable accelerometer sensor and a smartphone platform,” IEEE Transactions on Biomedical Engineering, vol. 59, no. 11, pp. 3090–3096, 2012

2012
[12]

Using ambulatory voice monitoring to investigate common voice disorders: Research update,

D. D. Mehta, J. H. Van Stan, M. Zañartu, M. Ghassemi, J. V. Guttag, V. M. Espinoza, J. P. Cortés, H. A. Cheyne, and R. E. Hillman, “Using ambulatory voice monitoring to investigate common voice disorders: Research update,” Frontiers in Bioengineering and Biotechnology, vol. 3, p. 155, 2015

2015
[13]

Differences in weeklong ambulatory vocal behavior between female patients with phonotraumatic lesions and matched controls,

J. H. Van Stan, D. D. Mehta, A. J. Ortiz, J. A. Burns, L. E. Toles, K. L. Marks, M. Vangel, T. Hron, S. Zeitels, and R. E. Hillman, “Differences in weeklong ambulatory vocal behavior between female patients with phonotraumatic lesions and matched controls,” Journal of Speech, Language, and Hearing Research, vol. 63, no. 2, pp. 372–384, 2020

2020
[14]

Ambulatory monitoring of subglottal pressure estimated from neck-surface vibration in individuals with and without voice disorders,

J. P. Cortés, J. Z. Lin, K. L. Marks, V. M. Espinoza, E. J. Ibarra, M. Zañartu, R. E. Hillman, and D. D. Mehta, “Ambulatory monitoring of subglottal pressure estimated from neck-surface vibration in individuals with and without voice disorders,” Applied Sciences, vol. 12, no. 21, p. 10692, 2022

2022
[15]

Differences in daily voice use measures between female patients with nonphonotraumatic vocal hyperfunction and matched controls,

J. H. Van Stan, A. J. Ortiz, J. P. Cortés, K. L. Marks, L. E. Toles, D. D. Mehta, J. A. Burns, T. Hron, T. Stadelman-Cohen, C. Krusemark et al., “Differences in daily voice use measures between female patients with nonphonotraumatic vocal hyperfunction and matched controls,” Journal of Speech, Language, and Hearing Research, vol. 64, no. 5, pp. 1457–1470, 2021

2021
[16]

Subglottal impedance-based inverse filtering of voiced sounds using neck surface acceleration,

M. Zañartu, J. C. Ho, D. D. Mehta, R. E. Hillman, and G. R. Wodicka, “Subglottal impedance-based inverse filtering of voiced sounds using neck surface acceleration,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 9, pp. 1929–1939, 2013

2013
[17]

The difference between first and second harmonic amplitudes correlates between glottal airflow and neck-surface accelerometer signals during phonation,

D. D. Mehta, V. M. Espinoza, J. H. Van Stan, M. Zañartu, and R. E. Hillman, “The difference between first and second harmonic amplitudes correlates between glottal airflow and neck-surface accelerometer signals during phonation,” The Journal of the Acoustical Society of America, vol. 145, no. 5, pp. EL386–EL392, 2019

2019
[18]

Glottal airflow estimation using neck surface acceleration and low-order Kalman smoothing,

A. Morales, J. I. Yuz, J. P. Cortés, J. G. Fontanet, and M. Zañartu, “Glottal airflow estimation using neck surface acceleration and low-order Kalman smoothing,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2055–2066, 2023

2023
[19]

Learning to detect vocal hyperfunction from ambulatory neck-surface acceleration features: Initial results for vocal fold nodules,

M. Ghassemi, J. H. Van Stan, D. D. Mehta, M. Zañartu, H. A. Cheyne II, R. E. Hillman, and J. V. Guttag, “Learning to detect vocal hyperfunction from ambulatory neck-surface acceleration features: Initial results for vocal fold nodules,” IEEE Transactions on Biomedical Engineering, vol. 61, no. 6, pp. 1668–1675, 2014

2014
[20]

Estimating subglottal pressure from neck-surface acceleration during normal voice production,

A. S. Fryd, J. H. Van Stan, R. E. Hillman, and D. D. Mehta, “Estimating subglottal pressure from neck-surface acceleration during normal voice production,” Journal of Speech, Language, and Hearing Research, vol. 59, no. 6, pp. 1335–1345, 2016

2016
[21]

Controlling the false discovery rate: a practical and powerful approach to multiple testing,

Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 57, no. 1, pp. 289–300, 1995

1995
[22]

Acoustic biomarkers for schizophrenia spectrum disorders and their associations with symptoms and cognitive functioning,

K. Jang, L. Li, T.-H. Le, A. Setiani, F. Z. Rami, H. Kim, and Y. C. Chung, “Acoustic biomarkers for schizophrenia spectrum disorders and their associations with symptoms and cognitive functioning,” Progress in Neuro-Psychopharmacology and Biological Psychiatry, vol. 138, p. 111339, 2025

2025
[23]

The regression analysis of binary sequences,

D. R. Cox, “The regression analysis of binary sequences,” Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 20, no. 2, pp. 215–232, 1958

1958
[24]

Support-vector networks,

C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995

1995
[25]

Random forests,

L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001

2001
[26]

Xgboost: A scalable tree boosting system,

T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785–794

2016
[27]

Lightgbm: A highly efficient gradient boosting decision tree,

G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “Lightgbm: A highly efficient gradient boosting decision tree,” Advances in Neural Information Processing Systems, vol. 30, 2017

2017
[28]

A unified approach to interpreting model predictions,

S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Advances in Neural Information Processing Systems, vol. 30, 2017

2017
[29]

Ambulatory assessment of phonotraumatic vocal hyperfunction using glottal airflow measures estimated from neck-surface acceleration,

J. P. Cortés, V. M. Espinoza, M. Ghassemi, D. D. Mehta, J. H. Van Stan, R. E. Hillman, J. V. Guttag, and M. Zañartu, “Ambulatory assessment of phonotraumatic vocal hyperfunction using glottal airflow measures estimated from neck-surface acceleration,” PLoS ONE, vol. 13, no. 12, p. e0209017, 2018

2018
[30]

Objective assessment of vocal hyperfunction: An experimental framework and initial results,

R. E. Hillman, E. B. Holmberg, J. S. Perkell, M. Walsh, and C. Vaughan, “Objective assessment of vocal hyperfunction: An experimental framework and initial results,” Journal of Speech, Language, and Hearing Research, vol. 32, no. 2, pp. 373–392, 1989

1989
[31]

wav2vec 2.0: A framework for self-supervised learning of speech representations,

A. Baevski, Y. Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech representations,” Advances in Neural Information Processing Systems, vol. 33, pp. 12449–12460, 2020

2020
[32]

Wavlm: Large-scale self-supervised pre-training for full stack speech processing,

S. Chen, C. Wang, Z. Chen, Y. Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiao et al., “Wavlm: Large-scale self-supervised pre-training for full stack speech processing,” IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1505–1518, 2022

2022
[33]

Intrinsic laryngeal muscle activity in response to autonomic nervous system activation,

L. B. Helou, W. Wang, R. C. Ashmore, C. A. Rosen, and K. V. Abbott, “Intrinsic laryngeal muscle activity in response to autonomic nervous system activation,” The Laryngoscope, vol. 123, no. 11, pp. 2756–2765, 2013

2013
[34]

Deep neural network-based analysis of voice biomarkers for monitoring treatment response in adolescent major depressive disorder,

J.-W. Kim, H. Yoon, B.-N. Kim, S.-Y. Lee, D.-J. Kim, S.-E. Moon, Y. Choi, and C.-M. Yang, “Deep neural network-based analysis of voice biomarkers for monitoring treatment response in adolescent major depressive disorder,” Communications Medicine, 2026

2026


2021

[16] [16]

Subglottal impedance-based inverse filtering of voiced sounds using neck surface acceleration,

M. Za ˜nartu, J. C. Ho, D. D. Mehta, R. E. Hillman, and G. R. Wodicka, “Subglottal impedance-based inverse filtering of voiced sounds using neck surface acceleration,”IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 9, pp. 1929–1939, 2013

1929

[17] [17]

The difference between first and second harmonic amplitudes correlates between glottal airflow and neck-surface ac- celerometer signals during phonation,

D. D. Mehta, V . M. Espinoza, J. H. Van Stan, M. Za ˜nartu, and R. E. Hillman, “The difference between first and second harmonic amplitudes correlates between glottal airflow and neck-surface ac- celerometer signals during phonation,”The Journal of the Acous- tical Society of America, vol. 145, no. 5, pp. EL386–EL392, 2019

2019

[18] [18]

Glottal airflow estimation using neck surface acceleration and low-order kalman smoothing,

A. Morales, J. I. Yuz, J. P. Cort ´es, J. G. Fontanet, and M. Za˜nartu, “Glottal airflow estimation using neck surface acceleration and low-order kalman smoothing,”IEEE/ACM transactions on audio, speech, and language processing, vol. 31, pp. 2055–2066, 2023

2055

[19] [19]

Learning to detect vocal hyperfunction from ambulatory neck-surface acceleration features: Initial results for vocal fold nodules,

M. Ghassemi, J. H. Van Stan, D. D. Mehta, M. Za ˜nartu, H. A. Cheyne II, R. E. Hillman, and J. V . Guttag, “Learning to detect vocal hyperfunction from ambulatory neck-surface acceleration features: Initial results for vocal fold nodules,”IEEE Transac- tions on Biomedical Engineering, vol. 61, no. 6, pp. 1668–1675, 2014

2014

[20] [20]

Es- timating subglottal pressure from neck-surface acceleration dur- ing normal voice production,

A. S. Fryd, J. H. Van Stan, R. E. Hillman, and D. D. Mehta, “Es- timating subglottal pressure from neck-surface acceleration dur- ing normal voice production,”Journal of Speech, Language, and Hearing Research, vol. 59, no. 6, pp. 1335–1345, 2016

2016

[21] [21]

Controlling the false discovery rate: a practical and powerful approach to multiple testing,

Y . Benjamini and Y . Hochberg, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,”Jour- nal of the Royal statistical society: series B (Methodological), vol. 57, no. 1, pp. 289–300, 1995

1995

[22] [22]

Acoustic biomarkers for schizophrenia spectrum disor- ders and their associations with symptoms and cognitive func- tioning,

K. Jang, L. Li, T.-H. Le, A. Setiani, F. Z. Rami, H. Kim, and Y . C. Chung, “Acoustic biomarkers for schizophrenia spectrum disor- ders and their associations with symptoms and cognitive func- tioning,”Progress in Neuro-Psychopharmacology and Biological Psychiatry, vol. 138, p. 111339, 2025

2025

[23] [23]

The regression analysis of binary sequences,

D. R. Cox, “The regression analysis of binary sequences,”Journal of the Royal Statistical Society Series B: Statistical Methodology, vol. 20, no. 2, pp. 215–232, 1958

1958

[24] [24]

Support-vector networks,

C. Cortes and V . Vapnik, “Support-vector networks,”Machine learning, vol. 20, no. 3, pp. 273–297, 1995

1995

[25] [25]

Random forests,

L. Breiman, “Random forests,”Machine learning, vol. 45, no. 1, pp. 5–32, 2001

2001

[26] [26]

Xgboost: A scalable tree boosting sys- tem,

T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting sys- tem,” inProceedings of the 22nd acm sigkdd international con- ference on knowledge discovery and data mining, 2016, pp. 785– 794

2016

[27] [27]

Lightgbm: A highly efficient gradient boosting de- cision tree,

G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y . Liu, “Lightgbm: A highly efficient gradient boosting de- cision tree,”Advances in neural information processing systems, vol. 30, 2017

2017

[28] [28]

A unified approach to interpreting model predictions,

S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,”Advances in neural information processing systems, vol. 30, 2017

2017

[29] [29]

Am- bulatory assessment of phonotraumatic vocal hyperfunction us- ing glottal airflow measures estimated from neck-surface acceler- ation,

J. P. Cort ´es, V . M. Espinoza, M. Ghassemi, D. D. Mehta, J. H. Van Stan, R. E. Hillman, J. V . Guttag, and M. Za ˜nartu, “Am- bulatory assessment of phonotraumatic vocal hyperfunction us- ing glottal airflow measures estimated from neck-surface acceler- ation,”PloS one, vol. 13, no. 12, p. e0209017, 2018

2018

[30] [30]

Objective assessment of vocal hyperfunction: An experimental framework and initial results,

R. E. Hillman, E. B. Holmberg, J. S. Perkell, M. Walsh, and C. Vaughan, “Objective assessment of vocal hyperfunction: An experimental framework and initial results,”Journal of Speech, Language, and Hearing Research, vol. 32, no. 2, pp. 373–392, 1989

1989

[31] [31]

wav2vec 2.0: A framework for self-supervised learning of speech repre- sentations,

A. Baevski, Y . Zhou, A. Mohamed, and M. Auli, “wav2vec 2.0: A framework for self-supervised learning of speech repre- sentations,”Advances in neural information processing systems, vol. 33, pp. 12 449–12 460, 2020

2020

[32] [32]

Wavlm: Large-scale self- supervised pre-training for full stack speech processing,

S. Chen, C. Wang, Z. Chen, Y . Wu, S. Liu, Z. Chen, J. Li, N. Kanda, T. Yoshioka, X. Xiaoet al., “Wavlm: Large-scale self- supervised pre-training for full stack speech processing,”IEEE Journal of Selected Topics in Signal Processing, vol. 16, no. 6, pp. 1505–1518, 2022

2022

[33] [33]

Intrinsic laryngeal muscle activity in response to auto- nomic nervous system activation,

L. B. Helou, W. Wang, R. C. Ashmore, C. A. Rosen, and K. V . Abbott, “Intrinsic laryngeal muscle activity in response to auto- nomic nervous system activation,”The Laryngoscope, vol. 123, no. 11, pp. 2756–2765, 2013

2013

[34] [34]

Deep neural network- based analysis of voice biomarkers for monitoring treatment re- sponse in adolescent major depressive disorder,

J.-W. Kim, H. Yoon, B.-N. Kim, S.-Y . Lee, D.-J. Kim, S.- E. Moon, Y . Choi, and C.-M. Yang, “Deep neural network- based analysis of voice biomarkers for monitoring treatment re- sponse in adolescent major depressive disorder,”Communications Medicine, 2026

2026