Articulatory strategy as a source of variation in acoustic vowel dynamics
Pith reviewed 2026-05-25 04:37 UTC · model grok-4.3
The pith
Tongue shape during /i/ predicts the timing and steepness of formant transitions in palatal diphthongs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Tongue shape in /i/ is a significant predictor of formant dynamics in diphthongs with a palatal offglide; greater displacement of tongue root and dorsum produces greater distortion from mean shape and requires higher velocities, resulting in relatively earlier and steeper formant transitions.
What carries the argument
Ultrasound tongue imaging of 36 speakers to classify articulatory strategies for /i/, followed by statistical regression linking those measures to formant trajectory parameters in diphthongs.
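As a rough illustration of what "formant trajectory parameters" could look like in practice, the sketch below takes one sampled F2 track and returns its steepest slope and the proportional time at which that slope occurs. The function name, the example values, and the central-difference slope estimate are assumptions made for illustration; the page does not state how the paper parameterises trajectories.

```python
def transition_parameters(times, f2):
    """Return (peak_slope_hz_per_s, relative_time_of_peak) for one F2 track.

    times: sample times in seconds, strictly increasing.
    f2:    F2 values in Hz at those times.
    The slope at each interior sample is a central difference.
    """
    if len(times) != len(f2) or len(times) < 3:
        raise ValueError("need at least three matched samples")
    duration = times[-1] - times[0]
    best_slope, best_time = 0.0, None
    for i in range(1, len(times) - 1):
        slope = (f2[i + 1] - f2[i - 1]) / (times[i + 1] - times[i - 1])
        if best_time is None or abs(slope) > abs(best_slope):
            best_slope, best_time = slope, times[i]
    # Relative time: 0 = vowel onset, 1 = vowel offset.
    return best_slope, (best_time - times[0]) / duration


# Hypothetical F2 track for a diphthong with a palatal offglide.
times = [0.00, 0.05, 0.10, 0.15, 0.20]
f2 = [1200.0, 1300.0, 1700.0, 2100.0, 2200.0]
peak_slope, rel_time = transition_parameters(times, f2)
```

In this framing, an "earlier and steeper" transition would show up as a smaller relative time and a larger absolute peak slope.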
If this is right
- Speaker-specific acoustic patterns in vowels arise in part from consistent articulatory habits rather than vocal tract anatomy alone.
- Formant dynamics become more extreme when the required tongue displacement from the /i/ target is larger.
- Articulatory compensation mechanisms regularize some aspects of vowel production while preserving individual differences in timing and velocity.
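The link between displacement and velocity in the second point can be illustrated with a toy movement model. The sketch below uses a minimum-jerk profile, which is a swapped-in assumption and not the model used in the paper, and it assumes the movement duration stays fixed while displacement changes. Under those assumptions, peak velocity scales linearly with displacement.

```python
def min_jerk_velocity(displacement, duration, t):
    """Velocity at time t of a minimum-jerk movement (assumed model)."""
    tau = t / duration
    return (displacement / duration) * (30 * tau**2 - 60 * tau**3 + 30 * tau**4)


def peak_velocity(displacement, duration, steps=1000):
    """Largest sampled velocity over the movement."""
    return max(
        min_jerk_velocity(displacement, duration, duration * k / steps)
        for k in range(steps + 1)
    )


# Hypothetical tongue movements: 5 mm and 10 mm, both completed in 200 ms.
small = peak_velocity(5.0, 0.2)
large = peak_velocity(10.0, 0.2)
```

For this profile the peak falls at the movement midpoint and equals 1.875 times displacement divided by duration, so doubling the displacement doubles the peak velocity when duration is held constant.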
Where Pith is reading between the lines
- The same tongue-shape predictor may apply to other offglide contexts or languages if the underlying displacement-velocity relation is general.
- Perception experiments could test whether listeners use these dynamic cues to identify speakers even when static formant targets are matched.
- Longitudinal data on the same speakers would show whether the observed strategies remain stable or shift with age or dialect exposure.
Load-bearing premise
The ultrasound tongue images from 36 speakers capture stable individual articulatory strategies whose statistical links to formant dynamics reflect causal effects of movement rather than shared speaker traits or measurement confounds.
What would settle it
Finding that formant transition timing and slope in the same diphthongs show no reliable difference when speakers are grouped by ultrasound tongue shape in /i/, after accounting for vocal tract length and other covariates.
Original abstract
Acoustic vowel dynamics have some speaker-identifying characteristics, which have been ascribed to individual properties of articulatory strategies: formant transitions have a particular shape because speakers move their articulators, using specific and practised movements. However, there is little existing evidence that different articulatory strategies systematically affect formant dynamics. The present study corroborates the link between the two. Ultrasound tongue imaging data from 36 speakers of Northern-Anglo English are used to identify distinct articulatory strategies for the production of palatal vowel /i/. Tongue shape in /i/ is found to be a significant predictor of formant dynamics in diphthongs with a palatal offglide. The observed relationships can be explained by the characteristics of articulatory movement conditioned by vocal tract shape. Greater articulatory displacement of tongue root and/or dorsum produces greater distortion from the mean tongue shape in palatal vowels, and it also requires higher articulatory velocities, resulting in relatively earlier and steeper formant transitions. The results contribute to the conceptual understanding of individuality in speech, by illuminating the regularising and individual aspects of articulatory compensation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript uses ultrasound tongue imaging from 36 speakers of Northern-Anglo English to identify distinct articulatory strategies for /i/ and shows that tongue shape in /i/ is a significant predictor of formant dynamics in diphthongs with a palatal offglide. The observed relationships are attributed to greater articulatory displacement and velocity of tongue root/dorsum, which produce earlier and steeper formant transitions; the work frames this as evidence for how vocal-tract shape conditions individual articulatory compensation.
Significance. If the statistical relationships are shown to survive appropriate controls, the result would supply direct empirical evidence that articulatory strategy contributes to speaker-specific acoustic vowel dynamics, strengthening the conceptual link between vocal-tract geometry, movement kinematics, and formant trajectories.
Major comments (2)
- [Abstract / Results] The claim that tongue shape in /i/ is a 'significant predictor' is presented without any reported coefficients, p-values, effect sizes, model specification, or exclusion criteria for the 36 speakers. This absence prevents evaluation of whether the relationship survives controls for vocal-tract length, habitual jaw height, or speaking rate.
- [Statistical analysis] The manuscript states that the relationships 'can be explained by' displacement and velocity but does not indicate whether the regression distinguishes within-speaker from between-speaker variation or includes covariates that could jointly affect both /i/ shape and diphthong transitions.
Minor comments (2)
- [Methods] Clarify the precise tongue-shape parameters extracted from the ultrasound data and how they were quantified (e.g., principal components or curvature measures).
- [Introduction] The phrase 'Northern-Anglo English' should be defined or referenced to a standard variety description.
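To make the first minor comment concrete, one possible tongue-shape parameter is the distance of a speaker's contour from the mean contour, echoing the abstract's "distortion from the mean tongue shape". The sketch below is a hypothetical measure only: it assumes every contour is sampled as tongue heights (in mm) at the same fixed positions, and the function names are illustrative. The manuscript may use a different quantification, such as principal components.

```python
from math import sqrt


def mean_shape(contours):
    """Point-wise mean of equally sampled tongue contours."""
    n = len(contours)
    return [sum(c[i] for c in contours) / n for i in range(len(contours[0]))]


def rms_distortion(contour, reference):
    """Root-mean-square distance between a contour and a reference shape."""
    return sqrt(
        sum((a - b) ** 2 for a, b in zip(contour, reference)) / len(reference)
    )


# Hypothetical /i/ contours for three speakers (heights in mm at three
# fixed measurement positions).
contours = [
    [10.0, 20.0, 30.0],
    [12.0, 20.0, 28.0],
    [14.0, 20.0, 26.0],
]
reference = mean_shape(contours)
distortions = [rms_distortion(c, reference) for c in contours]
```

A speaker whose contour matches the mean gets a distortion of zero; larger values mark more atypical shapes.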
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We agree that greater transparency in statistical reporting is needed and will revise the manuscript accordingly to strengthen the presentation of the results.
Point-by-point responses
- Referee: [Abstract / Results] The claim that tongue shape in /i/ is a 'significant predictor' is presented without any reported coefficients, p-values, effect sizes, model specification, or exclusion criteria for the 36 speakers. This absence prevents evaluation of whether the relationship survives controls for vocal-tract length, habitual jaw height, or speaking rate.
  Authors: We accept this criticism. The revised manuscript will expand both the abstract and results sections to report the full regression model specifications (including fixed and random effects), coefficients with standard errors, p-values, effect sizes (e.g., R² or standardized betas), and explicit speaker exclusion criteria. We will also add analyses that control for vocal-tract length (estimated from formant spacing), habitual jaw height (from ultrasound), and speaking rate (syllables per second), demonstrating that the predictive relationship between /i/ tongue shape and diphthong formant dynamics remains significant once these covariates are included. Revision: yes.
- Referee: [Statistical analysis] The manuscript states that the relationships 'can be explained by' displacement and velocity but does not indicate whether the regression distinguishes within-speaker from between-speaker variation or includes covariates that could jointly affect both /i/ shape and diphthong transitions.
  Authors: We agree that the current description is insufficient. In the revision we will clarify that all primary models are linear mixed-effects regressions with by-speaker random intercepts and slopes, thereby partitioning within-speaker from between-speaker variance. We will also document the covariate selection process and report models that include vocal-tract length, jaw position, and speaking rate as fixed effects, together with model comparison statistics showing that the tongue-shape predictor retains explanatory power after these controls. Revision: yes.
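The rebuttal mentions estimating vocal-tract length from formant spacing without saying how. A minimal sketch of one common approach follows, assuming the vocal tract behaves like a uniform tube closed at one end, so that formant n sits at (2n - 1)c / (4L). The function names, the averaging step, and the speed-of-sound value are assumptions for illustration and are not taken from the manuscript.

```python
def delta_f(formants_hz):
    """Average formant spacing, assuming F_n = (n - 0.5) * delta_f."""
    return sum(
        f / (i - 0.5) for i, f in enumerate(formants_hz, start=1)
    ) / len(formants_hz)


def vocal_tract_length_cm(formants_hz, speed_of_sound_cm_s=35000.0):
    """Uniform-tube estimate: L = c / (2 * delta_f)."""
    return speed_of_sound_cm_s / (2.0 * delta_f(formants_hz))


# Hypothetical speaker with neutral-vowel formants F1..F4 in Hz.
formants = [500.0, 1500.0, 2500.0, 3500.0]
spacing = delta_f(formants)
vtl = vocal_tract_length_cm(formants)
```

For these idealised formants the spacing is 1000 Hz and the estimate is 17.5 cm. Real vowels deviate from the uniform-tube pattern, so an estimate of this kind is usually averaged over many tokens.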
Circularity Check
No circularity: purely empirical statistical analysis from external speaker data
The paper reports an observational study correlating ultrasound tongue shapes in /i/ with formant transition properties in diphthongs across 36 speakers. No equations, parameter-fitting steps, or predictions are described that reduce to the inputs by construction. No self-citations are invoked as uniqueness theorems or to justify ansatzes. The central claim rests on measured data and regression results rather than definitional equivalence or renaming of known patterns. This is the normal case of a self-contained empirical report.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: Ultrasound tongue imaging provides an accurate and sufficient measure of individual tongue shape and articulatory strategy during vowel production.