arxiv: 2605.23416 · v1 · pith:DPIMIIIYnew · submitted 2026-05-22 · 💻 cs.CL · cs.SD

Articulatory strategy as a source of variation in acoustic vowel dynamics

Pith reviewed 2026-05-25 04:37 UTC · model grok-4.3

classification 💻 cs.CL cs.SD
keywords articulatory strategies · formant dynamics · ultrasound tongue imaging · vowel production · diphthongs · speaker variation · Northern English

The pith

Tongue shape during /i/ predicts the timing and steepness of formant transitions in palatal diphthongs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether distinct ways of positioning the tongue for the vowel /i/ produce measurable differences in how formants move during diphthongs that end in a palatal glide. Ultrasound recordings from 36 speakers of Northern Anglo English supply tongue-shape data for /i/, which then serve as a predictor in models of formant trajectories. The authors explain the observed relationships through movement kinematics: larger displacements from the mean palatal configuration demand higher articulatory velocities, which shift transitions earlier and make their slopes steeper. The paper presents this as evidence that practised articulatory strategies contribute to the speaker-specific character of acoustic vowel dynamics.

Core claim

Tongue shape in /i/ is a significant predictor of formant dynamics in diphthongs with a palatal offglide; greater displacement of tongue root and dorsum produces greater distortion from mean shape and requires higher velocities, resulting in relatively earlier and steeper formant transitions.
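
As a rough illustration of the displacement-velocity step in this claim, the sketch below models a tongue gesture as a critically damped second-order system, a common idealisation in gestural models of speech. This is an assumption made for illustration and is not the authors' analysis; the function names, the stiffness value `omega` and the displacements in millimetres are all hypothetical. With stiffness held fixed, peak velocity scales linearly with displacement (analytically, displacement × omega / e).

```python
import math


def gesture_velocity(t, displacement, omega):
    """Velocity of a critically damped gesture that starts at rest.

    Position is displacement * (1 - (1 + omega*t) * exp(-omega*t)),
    so its time derivative is displacement * omega**2 * t * exp(-omega*t).
    """
    return displacement * omega ** 2 * t * math.exp(-omega * t)


def peak_velocity(displacement, omega, duration=0.3, steps=3000):
    """Largest velocity on a regular time grid spanning the gesture."""
    times = [duration * i / steps for i in range(steps + 1)]
    return max(gesture_velocity(t, displacement, omega) for t in times)


# Hypothetical values: same stiffness, displacement doubled (mm).
small = peak_velocity(displacement=5.0, omega=30.0)
large = peak_velocity(displacement=10.0, omega=30.0)
print(f"peak velocity, 5 mm:  {small:.1f} mm/s")
print(f"peak velocity, 10 mm: {large:.1f} mm/s")
```

The sketch covers only the "steeper" half of the claim. In this linear idealisation the time of peak velocity is 1/omega regardless of displacement, so the earlier transitions reported in the paper would need further assumptions that are not modelled here.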

What carries the argument

Ultrasound tongue imaging of 36 speakers to classify articulatory strategies for /i/, followed by statistical regression linking those measures to formant trajectory parameters in diphthongs.
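
The caption of Figure 2 refers to principal component scores describing variation in tongue shape. The sketch below shows one way such scores could be obtained from per-speaker tongue contours, using power iteration to find the first principal component. It is a minimal illustration under stated assumptions: the three-point contours, their values and the function name `first_principal_component` are hypothetical, and the page does not specify the authors' actual procedure.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (mean, direction, scores) for the first principal component.

    rows: one list of contour heights per speaker.
    Uses power iteration on the sample covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Start from a uniform unit vector; this assumes the leading
    # component is not orthogonal to it.
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centred]
    return mean, vec, scores


# Hypothetical contours: (root, dorsum, blade) heights for four speakers.
contours = [
    [8.0, 16.0, 15.0],
    [9.0, 18.0, 15.0],
    [11.0, 22.0, 15.0],
    [12.0, 24.0, 15.0],
]
mean_shape, direction, scores = first_principal_component(contours)
print("mean shape:", mean_shape)
print("PC1 direction:", [round(x, 3) for x in direction])
print("PC1 scores:", [round(s, 3) for s in scores])
```

Each score measures how far a speaker's contour departs from the mean shape along the dominant axis of variation, which is the kind of per-speaker quantity that can then enter a model of formant trajectories as a predictor.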

If this is right

  • Speaker-specific acoustic patterns in vowels arise in part from consistent articulatory habits rather than vocal tract anatomy alone.
  • Formant dynamics become more extreme when the required tongue displacement from the /i/ target is larger.
  • Articulatory compensation mechanisms regularize some aspects of vowel production while preserving individual differences in timing and velocity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same tongue-shape predictor may apply to other offglide contexts or languages if the underlying displacement-velocity relation is general.
  • Perception experiments could test whether listeners use these dynamic cues to identify speakers even when static formant targets are matched.
  • Longitudinal data on the same speakers would show whether the observed strategies remain stable or shift with age or dialect exposure.

Load-bearing premise

The ultrasound tongue images from 36 speakers capture stable individual articulatory strategies whose statistical links to formant dynamics reflect causal effects of movement rather than shared speaker traits or measurement confounds.

What would settle it

Finding that formant transition timing and slope in the same diphthongs show no reliable difference when speakers are grouped by ultrasound tongue shape in /i/, after accounting for vocal tract length and other covariates.
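
To make "transition timing and slope" concrete, the sketch below extracts both from a sampled formant track by finite differences and compares two synthetic F2 trajectories. The logistic trajectories, their parameter values and the function names are hypothetical stand-ins for measured data, and this is not claimed to be the measure used in the paper.

```python
import math


def logistic_trajectory(times, start_hz, end_hz, midpoint, steepness):
    """Synthetic formant track rising from start_hz to end_hz."""
    span = end_hz - start_hz
    return [start_hz + span / (1.0 + math.exp(-steepness * (t - midpoint)))
            for t in times]


def transition_metrics(times, formant_hz):
    """Return (time, slope) of the steepest interval in the track.

    Slope is in Hz per unit of normalised time; time is the
    midpoint of the steepest sampling interval.
    """
    best_time, best_slope = None, 0.0
    for i in range(len(times) - 1):
        slope = (formant_hz[i + 1] - formant_hz[i]) / (times[i + 1] - times[i])
        if best_time is None or abs(slope) > abs(best_slope):
            best_time = 0.5 * (times[i] + times[i + 1])
            best_slope = slope
    return best_time, best_slope


# Normalised time from 0 to 1 in 101 samples.
times = [i / 100 for i in range(101)]
early_steep = logistic_trajectory(times, 1200.0, 2200.0, midpoint=0.4, steepness=20.0)
late_shallow = logistic_trajectory(times, 1200.0, 2200.0, midpoint=0.6, steepness=10.0)

early_time, early_slope = transition_metrics(times, early_steep)
late_time, late_slope = transition_metrics(times, late_shallow)
print(f"early/steep:  t = {early_time:.3f}, slope = {early_slope:.0f} Hz")
print(f"late/shallow: t = {late_time:.3f}, slope = {late_slope:.0f} Hz")
```

Under the test described above, these two per-token metrics would be compared across speaker groups defined by tongue shape, with covariates such as vocal tract length entered alongside.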

Figures

Figures reproduced from arXiv: 2605.23416 by Justin J. H. Lo, Patrycja Strycharczuk, Sam Kirkham.

Figure 1. By-speaker mean of the …
Figure 2. The effect of principal component scores on the variation in tongue shape. The grey line …
Figure 3. Top: GAMM predictions for normalised F1 and F2 trajectories, depending on tongue …
Figure 4. GAMM predictions for patterns of raising of the posterior part of the dorsum (DLC knot …
Figure 5. GAMM predictions for patterns of tongue root fronting in I-diphthongs (DLC knot 4) in …
Original abstract

Acoustic vowel dynamics have some speaker-identifying characteristics, which have been ascribed to individual properties of articulatory strategies: formant transitions have a particular shape because speakers move their articulators, using specific and practised movements. However, there is little existing evidence that different articulatory strategies systematically affect formant dynamics. The present study corroborates the link between the two. Ultrasound tongue imaging data from 36 speakers of Northern-Anglo English are used to identify distinct articulatory strategies for the production of palatal vowel /i/. Tongue shape in /i/ is found to be a significant predictor of formant dynamics in diphthongs with a palatal offglide. The observed relationships can be explained by the characteristics of articulatory movement conditioned by vocal tract shape. Greater articulatory displacement of tongue root and/or dorsum produces greater distortion from the mean tongue shape in palatal vowels, and it also requires higher articulatory velocities, resulting in relatively earlier and steeper formant transitions. The results contribute to the conceptual understanding of individuality in speech, by illuminating the regularising and individual aspects of articulatory compensation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript uses ultrasound tongue imaging from 36 speakers of Northern-Anglo English to identify distinct articulatory strategies for /i/ and shows that tongue shape in /i/ is a significant predictor of formant dynamics in diphthongs with a palatal offglide. The observed relationships are attributed to greater articulatory displacement and velocity of tongue root/dorsum, which produce earlier and steeper formant transitions; the work frames this as evidence for how vocal-tract shape conditions individual articulatory compensation.

Significance. If the statistical relationships are shown to survive appropriate controls, the result would supply direct empirical evidence that articulatory strategy contributes to speaker-specific acoustic vowel dynamics, strengthening the conceptual link between vocal-tract geometry, movement kinematics, and formant trajectories.

major comments (2)
  1. [Abstract / Results] Abstract and Results section: the claim that tongue shape in /i/ is a 'significant predictor' is presented without any reported coefficients, p-values, effect sizes, model specification, or exclusion criteria for the 36 speakers. This absence prevents evaluation of whether the relationship survives controls for vocal-tract length, habitual jaw height, or speaking rate.
  2. [Statistical analysis] Statistical analysis: the manuscript states that the relationships 'can be explained by' displacement and velocity but does not indicate whether the regression distinguishes within-speaker from between-speaker variation or includes covariates that could jointly affect both /i/ shape and diphthong transitions.
minor comments (2)
  1. [Methods] Clarify the precise tongue-shape parameters extracted from the ultrasound data and how they were quantified (e.g., principal components or curvature measures).
  2. [Introduction] The phrase 'Northern-Anglo English' should be defined or referenced to a standard variety description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We agree that greater transparency in statistical reporting is needed and will revise the manuscript accordingly to strengthen the presentation of the results.

Point-by-point responses
  1. Referee: [Abstract / Results] Abstract and Results section: the claim that tongue shape in /i/ is a 'significant predictor' is presented without any reported coefficients, p-values, effect sizes, model specification, or exclusion criteria for the 36 speakers. This absence prevents evaluation of whether the relationship survives controls for vocal-tract length, habitual jaw height, or speaking rate.

    Authors: We accept this criticism. The revised manuscript will expand both the abstract and results sections to report the full regression model specifications (including fixed and random effects), coefficients with standard errors, p-values, effect sizes (e.g., R² or standardized betas), and explicit speaker exclusion criteria. We will also add analyses that control for vocal-tract length (estimated from formant spacing), habitual jaw height (from ultrasound), and speaking rate (syllables per second), demonstrating that the predictive relationship between /i/ tongue shape and diphthong formant dynamics remains significant after controlling for these covariates.
    revision: yes

  2. Referee: [Statistical analysis] Statistical analysis: the manuscript states that the relationships 'can be explained by' displacement and velocity but does not indicate whether the regression distinguishes within-speaker from between-speaker variation or includes covariates that could jointly affect both /i/ shape and diphthong transitions.

    Authors: We agree that the current description is insufficient. In the revision we will clarify that all primary models are linear mixed-effects regressions with by-speaker random intercepts and slopes, thereby partitioning within-speaker from between-speaker variance. We will also document the covariate selection process and report models that include vocal-tract length, jaw position, and speaking rate as fixed effects, together with model comparison statistics showing that the tongue-shape predictor retains explanatory power after these controls.
    revision: yes
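
The first response mentions estimating vocal-tract length from formant spacing without giving a formula. One standard way to do this, sketched below, treats the vocal tract as a uniform tube closed at one end, so that the i-th resonance is (2i − 1)·c / (4L) and the average spacing ΔF equals c / (2L). This is an illustrative assumption and not necessarily the procedure the authors would use; the function names, the speed-of-sound constant and the example formant values are hypothetical.

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed value for warm, moist air


def formant_spacing(formants_hz):
    """Average spacing dF: the mean of F_i / (i - 0.5) over the formants."""
    ratios = [f / (i - 0.5) for i, f in enumerate(formants_hz, start=1)]
    return sum(ratios) / len(ratios)


def vocal_tract_length_cm(formants_hz, speed=SPEED_OF_SOUND_CM_PER_S):
    """Uniform-tube estimate of vocal-tract length: L = c / (2 * dF)."""
    return speed / (2.0 * formant_spacing(formants_hz))


# Hypothetical neutral-vowel formants for a 17.5 cm uniform tube.
neutral = [500.0, 1500.0, 2500.0]
print(f"dF  = {formant_spacing(neutral):.1f} Hz")
print(f"VTL = {vocal_tract_length_cm(neutral):.2f} cm")
```

An estimate of this kind could enter the promised models as a by-speaker covariate, which is what separating a tongue-shape effect from overall vocal-tract size would require.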

Circularity Check

0 steps flagged

No circularity: purely empirical statistical analysis from external speaker data

Full rationale

The paper reports an observational study correlating ultrasound tongue shapes in /i/ with formant transition properties in diphthongs across 36 speakers. No equations, parameter-fitting steps, or predictions are described that reduce to the inputs by construction. No self-citations are invoked as uniqueness theorems or to justify ansatzes. The central claim rests on measured data and regression results rather than definitional equivalence or renaming of known patterns. This is the normal case of a self-contained empirical report.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the validity of ultrasound as a measure of articulatory strategy and on the assumption that statistical prediction reflects articulatory causation via movement characteristics.

axioms (1)
  • Domain assumption: Ultrasound tongue imaging provides an accurate and sufficient measure of individual tongue shape and articulatory strategy during vowel production.
    Invoked to identify distinct strategies and link them to acoustic outcomes.


