arxiv: 1907.00835 · v1 · pith:ZTKYVNRCnew · submitted 2019-07-01 · 💻 cs.CL · cs.CV · cs.SD · eess.AS · eess.IV

UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions

Pith reviewed 2026-05-25 11:56 UTC · model grok-4.3

classification 💻 cs.CL · cs.CV · cs.SD · eess.AS · eess.IV
keywords UltraSuite · ultrasound · acoustic data · child speech therapy · speech sound disorders · data repository · annotations · speech production

The pith

UltraSuite is a new repository of ultrasound tongue images and sound recordings collected during child speech therapy sessions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents UltraSuite as a curated collection of ultrasound and acoustic data drawn directly from recordings made in real child speech therapy sessions. It supplies three separate data sets—one from typically developing children and two from children with speech sound disorders—together with both manual and automatic annotations plus software for processing and viewing the material. A sympathetic reader would see this as a practical step toward making raw, therapy-sourced speech-production data available for study rather than keeping it locked inside clinics. If the release works as described, researchers gain concrete material for examining tongue movement and sound patterns in children who are and are not receiving therapy.

Core claim

UltraSuite supplies three data collections of ultrasound tongue images paired with acoustic recordings taken from actual child speech therapy sessions, accompanied by manual and automatic annotations and a set of software tools for data processing, transformation, and visualisation.

What carries the argument

UltraSuite, the repository that bundles the ultrasound-acoustic recordings, annotations, and processing tools.

Load-bearing premise

The ultrasound and acoustic files really come from genuine child therapy sessions and the supplied annotations and tools accurately describe and handle that data.

What would settle it

A check that finds the released recordings were not taken during live therapy sessions or that the provided software cannot load and display the data files as claimed.

Figures

Figures reproduced from arXiv: 1907.00835 by Aciel Eshky, Alan Wrench, James Scobbie, Joanne Cleland, Korin Richmond, Manuel Sam Ribeiro, Zoe Roxburgh.

Figure 1: An ultrasound image showing the midsagittal view of a child’s tongue. We store the raw ultrasound reflection data efficiently as a matrix (left), but provide a tool to transform it to real world proportions (right).
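
The transformation the caption mentions, from the raw matrix to real-world proportions, amounts to a polar-to-Cartesian mapping over a fan of scanlines. The Python sketch below illustrates that geometry only; it is not the UltraSuite tool. The function name, the symmetric-fan layout, and the `zero_offset` and `sample_spacing` parameters are assumptions made for illustration. The angular step is expected in radians, so convert it first if the parameter file uses another unit.

```python
import math


def scanline_to_xy(line_idx, sample_idx, *, num_scanlines, angle_step_rad,
                   zero_offset, sample_spacing=1.0):
    """Map one raw sample (scanline index, depth index) to Cartesian (x, y).

    ASSUMPTIONS (not taken from the paper):
      - the fan is symmetric about the vertical axis;
      - zero_offset is the distance from the fan's virtual origin to the
        first sample, in the same units as sample_spacing.
    """
    centre = (num_scanlines - 1) / 2.0
    # Angle of this scanline relative to the central scanline.
    theta = (line_idx - centre) * angle_step_rad
    # Distance of this sample from the virtual origin of the fan.
    radius = zero_offset + sample_idx * sample_spacing
    return radius * math.sin(theta), radius * math.cos(theta)


# Hypothetical values, chosen only to exercise the function.
x, y = scanline_to_xy(0, 411, num_scanlines=63, angle_step_rad=0.01,
                      zero_offset=50.0)
```

Applying the mapping to every cell of a frame and interpolating onto a regular grid would give an image like the right-hand panel of the figure. The interpolation step is left out here.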
Original abstract

We introduce UltraSuite, a curated repository of ultrasound and acoustic data, collected from recordings of child speech therapy sessions. This release includes three data collections, one from typically developing children and two from children with speech sound disorders. In addition, it includes a set of annotations, some manual and some automatically produced, and software tools to process, transform and visualise the data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces UltraSuite, a curated repository of ultrasound and acoustic data collected from child speech therapy sessions. It comprises three data collections—one from typically developing children and two from children with speech sound disorders—together with manual and automatically produced annotations and software tools for processing, transforming, and visualizing the data.

Significance. If the repository contents and documentation match the description, the release supplies a specialized open resource for research on pediatric speech production, ultrasound tongue imaging, and automatic analysis of child speech, including disordered speech. The combination of raw recordings, dual annotation streams, and processing tools supports reproducibility and downstream work in clinical linguistics and speech technology.

minor comments (2)
  1. [Abstract] The scale of the collections (participant counts, session totals) is not quantified; stating it would help readers gauge the resource size at a glance.
  2. The manuscript should confirm that all software tools are accompanied by persistent identifiers (e.g., DOIs from Zenodo archives of GitHub releases) to ensure long-term accessibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending acceptance. We are pleased that the description of UltraSuite, including the data collections, annotations, and tools, was found to be clear and potentially valuable for research in pediatric speech production and ultrasound imaging.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is a descriptive data-release announcement with no derivations, equations, predictions, fitted parameters, or theoretical claims. Its central assertions concern the existence, contents, and accessibility of the UltraSuite repository (three collections, annotations, and tools), which are externally verifiable facts rather than internally derived quantities. No load-bearing steps reduce to self-definition, fitted inputs, or self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a data repository release paper. No mathematical derivations, fitted parameters, or new postulated entities are involved. The contribution rests on the curation and public sharing of collected recordings.

pith-pipeline@v0.9.0 · 5612 in / 1207 out tokens · 52044 ms · 2026-05-25T11:56:42.636346+00:00 · methodology


Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 1 internal anchor

  1. [1]

    UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions

    Introduction Speech sound disorders (SSDs) affect quality of life for a large number of children. In the UK, 11.4% of eight year olds have persistent SSDs, ranging from common clinical distortions to speech that is unintelligible even to close family members [1]. SSDs are similarly prevalent in other countries [1]. Children with disordered speech experi...

  2. [2]

    Data Collection Ethical approval to collect the data was granted by the NHS Research Ethics Service. We recorded the data in the laboratory using the Articulate Assistant Advanced software (AAA), initially storing it in the AAA proprietary data format [22]. All sessions were conducted by a speech and language therapist (SLT), and both the children and...

  3. [3]

    Data Preparation We exported the raw data from the proprietary AAA format to obtain a tuple of four files per utterance:

  4. [4]

    Prompt file: contains text describing the task the child was given and the date-time of recording

  5. [5]

    Audio file: RIFF wave file, sampled at 22.05 kHz, containing the speech of the child and the SLT

  6. [6]

    A single ultrasound frame is recorded as a 2D matrix where each column represents the ultrasound reflection intensities along a single scanline

    Ultrasound file: a sequence of ultrasound frames capturing the midsagittal view of the child’s tongue. A single ultrasound frame is recorded as a 2D matrix where each column represents the ultrasound reflection intensities along a single scanline. The surface of the probe is convex and the scanlines are directed in an equal-angled fan in the scanning ...

  7. [7]

    Parameter file: contains a set of parameters to interpret the ultrasound data and synchronise it with the audio. It gives the number of scanlines in each frame (63), the number of data points per scanline (412), the number of bits used to represent each reflection intensity data point (8), the angle between each scanline (0.038°), the number of ultrasound frame...

  8. [8]

    Pronunciation dictionaries: We prepared a pronunciation dictionary for each of the three datasets

    Data Annotation In addition to the data described in the previous section, we release a set of annotations, including pronunciation dictionaries for each of the datasets, audio transcriptions for UXTD, SLT annotations, automatic speaker labelling and automatic phone alignments, all of which can aid modelling. Pronunciation dictionaries: We prepared a pr...

  9. [9]

    Data Statistics Overall, the data contains 37.28 hours of synchronised audio and raw ultrasound across all datasets. Table 3 shows the distribution of audio in terms of speech (child and SLT) and silences (utterance initial, medial, and final), estimated using the speaker labelling method described in Section 4. Although our speaker labelling method achi...

  10. [10]

    We describe the current contents of the code repository and invite users to contribute their own code

    Companion Code Repository We distribute a code repository containing a set of tools to interpret, transform and visualise the data, in addition to the recipes used to annotate the data. We describe the current contents of the code repository and invite users to contribute their own code. Tools: The repository contains raw ultrasound reflection data, but ...

  11. [11]

    Both can be obtained from the project website: http://www.ultrax-speech.org/ultrasuite

    License and Distribution We distribute UltraSuite under Attribution-NonCommercial 4.0 Generic (CC BY-NC 4.0) and distribute the companion code under Apache License v.2. Both can be obtained from the project website: http://www.ultrax-speech.org/ultrasuite

  12. [12]

    We have described the process of data collection, preparation and standardisation, along with a suggested train-test split

    Conclusions and Future Work We have introduced a new repository of ultrasound and acoustic data which we have collected from child speech therapy sessions. We have described the process of data collection, preparation and standardisation, along with a suggested train-test split. We have described tools to transform and visualise the data, and annotatio...

  13. [13]

    We thank Steve Renals for continued guidance and support, Anna Womack for help collecting the UltraPhonix data and Steve Cowen for technical support

    Acknowledgements Supported by: EPSRC Healthcare Partnerships Programme, grants number EP/I027696/1 (Ultrax) and EP/P02338X/1 (Ultrax2020), and NHS Scotland CSO, grant number ETM/402 (UltraPhonix). We thank Steve Renals for continued guidance and support, Anna Womack for help collecting the UltraPhonix data and Steve Cowen for technical support. We than...

  14. [14]

    Prevalence and predictors of persistent speech sound disorder at eight years old: Findings from a population cohort study,

    Y. Wren, L. L. Miller, T. J. Peters, A. Emond, and S. Roulstone, “Prevalence and predictors of persistent speech sound disorder at eight years old: Findings from a population cohort study,” Journal of Speech, Language, and Hearing Research, vol. 59, no. 4, pp. 647–673, 2016

  15. [15]

    Who to refer for speech therapy at 4 years of age versus who to “watch and wait”?

    A. Morgan, K. T. Eecen, A. Pezic, K. Brommeyer, C. Mei, P. Eadie, S. Reilly, and B. Dodd, “Who to refer for speech therapy at 4 years of age versus who to “watch and wait”?” The Journal of pediatrics, vol. 185, pp. 200–204, 2017

  16. [16]

    Twenty-year follow-up of children with and without speech-language impairments: Family, educational, occupational, and quality of life outcomes,

    C. J. Johnson, J. H. Beitchman, and E. B. Brownlie, “Twenty-year follow-up of children with and without speech-language impairments: Family, educational, occupational, and quality of life outcomes,” American Journal of Speech-Language Pathology, vol. 19, no. 1, pp. 51–65, 2010

  17. [17]

    A nationally representative study of the association between communication impairment at 4–5 years and children’s life activities at 7–9 years,

    J. McCormack, L. J. Harrison, S. McLeod, and L. McAllister, “A nationally representative study of the association between communication impairment at 4–5 years and children’s life activities at 7–9 years,” Journal of Speech, Language, and Hearing Research, vol. 54, no. 5, pp. 1328–1348, 2011

  18. [18]

    Literacy outcomes of children with early childhood speech sound disorders: Impact of endophenotypes,

    B. A. Lewis, A. A. Avrich, L. A. Freebairn, A. J. Hansen, L. E. Sucheston, I. Kuo, H. G. Taylor, S. K. Iyengar, and C. M. Stein, “Literacy outcomes of children with early childhood speech sound disorders: Impact of endophenotypes,” Journal of Speech, Language, and Hearing Research, vol. 54, no. 6, pp. 1628–1643, 2011

  19. [19]

    Cleft palate speech: assessment and intervention

    S. Howard and A. Lohmander, Cleft palate speech: assessment and intervention. John Wiley & Sons, 2011

  20. [20]

    Tongue tracking in ultrasound images using eigentongue decomposition and artificial neural networks,

    D. Fabre, T. Hueber, F. Bocquelet, and P. Badin, “Tongue tracking in ultrasound images using eigentongue decomposition and artificial neural networks,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015

  21. [21]

    Robust contour tracking in ultrasound tongue image sequences,

    K. Xu, Y. Yang, M. Stone, A. Jaumard-Hakoun, C. Leboullenger, G. Dreyfus, P. Roussel, and B. Denby, “Robust contour tracking in ultrasound tongue image sequences,” Clinical Linguistics & Phonetics, vol. 30, no. 3-5, pp. 313–327, 2016

  22. [22]

    Automatic animation of an articulatory tongue model from ultrasound images of the vocal tract,

    D. Fabre, T. Hueber, L. Girin, X. Alameda-Pineda, and P. Badin, “Automatic animation of an articulatory tongue model from ultrasound images of the vocal tract,” Speech Communication, vol. 93, pp. 63–75, 2017

  23. [23]

    Improving child speech disorder assessment by incorporating out-of-domain adult speech,

    D. Smith, A. Sneddon, L. Ward, A. Duenser, J. Freyne, D. Silvera-Tawil, and A. Morgan, “Improving child speech disorder assessment by incorporating out-of-domain adult speech,” in INTERSPEECH, Stockholm, Sweden, August 2017, pp. 2690–2694

  24. [24]

    An ultrasound study of lingual coarticulation in children and adults,

    N. Zharkova, “An ultrasound study of lingual coarticulation in children and adults,” Dataset, 2009

  25. [25]

    High speed ultrasound/acoustic database of lingual articulation in preadolescents and adults,

    ——, “High speed ultrasound/acoustic database of lingual articulation in preadolescents and adults,” Dataset, 2011

  26. [26]

    High speed ultrasound/acoustic database of lingual articulation in typically developing children between three and thirteen years old,

    ——, “High speed ultrasound/acoustic database of lingual articulation in typically developing children between three and thirteen years old,” Dataset, 2016

  27. [27]

    Challenges for computer recognition of children’s speech,

    M. Russell and S. D’Arcy, “Challenges for computer recognition of children’s speech,” in Workshop on Speech and Language Technology in Education, 2007

  28. [28]

    Improving children’s speech recognition through out-of-domain data augmentation

    J. Fainberg, P. Bell, M. Lincoln, and S. Renals, “Improving children’s speech recognition through out-of-domain data augmentation.” in INTERSPEECH, 2016, pp. 1598–1602

  29. [29]

    Methods for eliciting, annotating, and analyzing databases for child speech development,

    M. E. Beckman, A. R. Plummer, B. Munson, and P. F. Reidy, “Methods for eliciting, annotating, and analyzing databases for child speech development,” Computer Speech and Language, vol. 45, pp. 278–299, 2017

  30. [30]

    Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech

    H. Christensen, M. Aniol, P. Bell, P. D. Green, T. Hain, S. King, and P. Swietojanski, “Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech.” in INTERSPEECH, 2013, pp. 3642–3645

  31. [31]

    Undifferentiated lingual gestures in children with articulation/phonological disorders,

    F. E. Gibbon, “Undifferentiated lingual gestures in children with articulation/phonological disorders,” Journal of Speech, Language, and Hearing Research, vol. 42, no. 2, pp. 382–397, 1999

  32. [32]

    Speech production knowledge in automatic speech recognition,

    S. King, J. Frankel, K. Livescu, E. McDermott, K. Richmond, and M. Wester, “Speech production knowledge in automatic speech recognition,” The Journal of the Acoustical Society of America, vol. 121, no. 2, pp. 723–742, 2007

  33. [33]

    The use of articulatory movement data in speech synthesis applications: An overview - application of articulatory movements using machine learning algorithms,

    K. Richmond, Z.-H. Ling, and J. Yamagishi, “The use of articulatory movement data in speech synthesis applications: An overview - application of articulatory movements using machine learning algorithms,” Acoustical Science and Technology, vol. 36, no. 6, pp. 467–477, 2015

  34. [34]

    Silent speech interfaces,

    B. Denby, T. Schultz, K. Honda, T. Hueber, J. M. Gilbert, and J. S. Brumberg, “Silent speech interfaces,” Speech Communication, vol. 52, no. 4, pp. 270–287, 2010

  35. [35]

    Articulate Assistant User Guide: Version 2.11

    A. Wrench, Articulate Assistant User Guide: Version 2.11, Articulate Instruments Ltd., QMU, Musselburgh, United Kingdom, 2010

  36. [36]

    A guide to analysing tongue motion from ultrasound images,

    M. Stone, “A guide to analysing tongue motion from ultrasound images,” Clinical Linguistics and Phonetics, vol. 19, no. 6-7, pp. 455–501, 2005

  37. [37]

    Helping children learn non-native articulations: the implications for ultrasound-based clinical intervention

    J. Cleland, J. Scobbie, S. Naki, and A. Wrench, “Helping children learn non-native articulations: the implications for ultrasound-based clinical intervention.” in Proceedings of the 18th International Congress of Phonetic Sciences. ICPhS 2015, 2015, pp. 1–5

  38. [38]

    Using ultrasound visual biofeedback to treat persistent primary speech sound disorders,

    J. Cleland, J. M. Scobbie, and A. A. Wrench, “Using ultrasound visual biofeedback to treat persistent primary speech sound disorders,” Clinical Linguistics and Phonetics, vol. 29, no. 8-10, pp. 575–597, 2015

  39. [39]

    Covert contrast and covert errors in persistent velar fronting,

    J. Cleland, J. M. Scobbie, C. Heyde, Z. Roxburgh, and A. A. Wrench, “Covert contrast and covert errors in persistent velar fronting,” Clinical Linguistics and Phonetics, vol. 31, no. 1, pp. 35–55, 2017

  40. [40]

    B. Dodd, H. Zhu, S. Crosbie, A. Holm, and A. Ozanne, Diagnostic evaluation of articulation and phonology (DEAP). Psychology Corporation, 2002

  41. [41]

    A new EPG protocol for assessing DDK accuracy scores in children: a Down’s syndrome study,

    J. McCann and A. A. Wrench, “A new EPG protocol for assessing DDK accuracy scores in children: a Down’s syndrome study,” in Proceedings of the 16th International Congress of the ICPhS, 2007, pp. 1985–1988

  42. [42]

    On generating Combilex pronunciations via morphological analysis,

    K. Richmond, R. Clark, and S. Fitt, “On generating Combilex pronunciations via morphological analysis,” in INTERSPEECH, 2010, pp. 1974–1977

  43. [43]

    Robust LTS rules with the Combilex speech technology lexicon,

    K. Richmond, R. A. Clark, and S. Fitt, “Robust LTS rules with the Combilex speech technology lexicon,” in INTERSPEECH, 2009, pp. 1295–1298

  44. [44]

    Praat: doing phonetics by computer [computer program],

    P. Boersma and D. Weenink, “Praat: doing phonetics by computer [computer program],” 2009

  45. [45]

    The Kaldi speech recognition toolkit,

    D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz et al., “The Kaldi speech recognition toolkit,” in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, Dec. 2011

  46. [46]

    Comparison of text-independent speaker recognition methods using VQ–distortion and discrete/continuous HMM’s,

    T. Matsui and S. Furui, “Comparison of text-independent speaker recognition methods using VQ–distortion and discrete/continuous HMM’s,” IEEE Transactions on speech and audio processing, vol. 2, no. 3, pp. 456–459, 1994

  47. [47]

    pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems,

    H. Bredin, “pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems,” in INTERSPEECH, 2017

  48. [48]

    The PF STAR children’s speech corpus,

    A. Batliner, M. Blomberg, S. D’Arcy, D. Elenius, D. Giuliani, M. Gerosa, C. Hacker, M. Russell, S. Steidl, and M. Wong, “The PF STAR children’s speech corpus,” in Ninth European Conference on Speech Communication and Technology, 2005

  49. [49]

    Ultrax2020: Ultrasound technology for optimising the treatment of speech disorders: Clinicians’ resource manual,

    J. Cleland, A. Wrench, S. Lloyd, and E. Sugden, “Ultrax2020: Ultrasound technology for optimising the treatment of speech disorders: Clinicians’ resource manual,” University of Strathclyde, Tech. Rep., 2018
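
Entry 7 in the list above quotes the numbers needed to interpret a raw ultrasound file: 63 scanlines per frame, 412 data points per scanline, and 8 bits per data point. As a rough Python sketch only, the frames could be separated as shown below. It assumes a headerless byte stream stored frame by frame and scanline by scanline, which the excerpt does not state. `split_frames` is a hypothetical helper, not part of the companion code repository.

```python
def split_frames(raw, num_scanlines=63, points_per_line=412):
    """Split a raw 8-bit ultrasound byte stream into frames.

    Returns a list of frames. Each frame is a list of scanlines, and each
    scanline is a bytes object holding one intensity value per byte.

    ASSUMPTIONS (not confirmed by the source): the file has no header and
    the bytes are ordered frame by frame, then scanline by scanline.
    """
    frame_size = num_scanlines * points_per_line
    if frame_size <= 0:
        raise ValueError("frame dimensions must be positive")
    if len(raw) % frame_size != 0:
        raise ValueError("byte count is not a whole number of frames")

    frames = []
    for start in range(0, len(raw), frame_size):
        frame = raw[start:start + frame_size]
        # Cut the frame into consecutive scanlines of equal length.
        scanlines = [frame[i:i + points_per_line]
                     for i in range(0, frame_size, points_per_line)]
        frames.append(scanlines)
    return frames


# Synthetic data standing in for a real file: two all-zero frames.
demo = split_frames(bytes(2 * 63 * 412))
```

For a real recording the default dimensions should be replaced by the values read from that utterance's parameter file rather than hard-coded.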