UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions
Pith reviewed 2026-05-25 11:56 UTC · model grok-4.3
The pith
UltraSuite is a new repository of ultrasound tongue images and sound recordings collected during child speech therapy sessions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UltraSuite supplies three data collections of ultrasound tongue images paired with acoustic recordings taken from actual child speech therapy sessions, accompanied by manual and automatic annotations and a set of software tools for data processing, transformation, and visualisation.
What carries the argument
UltraSuite, the repository that bundles the ultrasound-acoustic recordings, annotations, and processing tools.
Load-bearing premise
The ultrasound and acoustic files really come from genuine child therapy sessions and the supplied annotations and tools accurately describe and handle that data.
What would settle it
A check that finds the released recordings were not taken during live therapy sessions or that the provided software cannot load and display the data files as claimed.
Original abstract
We introduce UltraSuite, a curated repository of ultrasound and acoustic data, collected from recordings of child speech therapy sessions. This release includes three data collections, one from typically developing children and two from children with speech sound disorders. In addition, it includes a set of annotations, some manual and some automatically produced, and software tools to process, transform and visualise the data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces UltraSuite, a curated repository of ultrasound and acoustic data collected from child speech therapy sessions. It comprises three data collections—one from typically developing children and two from children with speech sound disorders—together with manual and automatically produced annotations and software tools for processing, transforming, and visualizing the data.
Significance. If the repository contents and documentation match the description, the release supplies a specialized open resource for research on pediatric speech production, ultrasound tongue imaging, and automatic analysis of child speech, including disordered speech. The combination of raw recordings, dual annotation streams, and processing tools supports reproducibility and downstream work in clinical linguistics and speech technology.
minor comments (2)
- [Abstract] The scale of the collections (participant counts, session totals) is not quantified; stating it would help readers gauge the size of the resource at a glance.
- The manuscript should confirm that all software tools are accompanied by persistent identifiers (e.g., GitHub release DOIs or Zenodo archives) to ensure long-term accessibility.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the manuscript and for recommending acceptance. We are pleased that the description of UltraSuite, including the data collections, annotations, and tools, was found to be clear and potentially valuable for research in pediatric speech production and ultrasound imaging.
Circularity Check
No significant circularity
The paper is a descriptive data-release announcement with no derivations, equations, predictions, fitted parameters, or theoretical claims. Its central assertions concern the existence, contents, and accessibility of the UltraSuite repository (three collections, annotations, and tools), which are externally verifiable facts rather than internally derived quantities. No load-bearing steps reduce to self-definition, fitted inputs, or self-citation chains.
Reference graph
Works this paper leans on
-
[1]
UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions
Introduction. Speech sound disorders (SSDs) affect quality of life for a large number of children. In the UK, 11.4% of eight-year-olds have persistent SSDs, ranging from common clinical distortions to speech that is unintelligible even to close family members [1]. SSDs are similarly prevalent in other countries [1]. Children with disordered speech experi...
-
[2]
Data Collection. Ethical approval to collect the data was granted by the NHS Research Ethics Service. We recorded the data in the laboratory using the Articulate Assistant Advanced software (AAA), initially storing it in the AAA proprietary data format [22]. All sessions were conducted by a speech and language therapist (SLT), and both the children and...
-
[3]
Data Preparation. We exported the raw data from the proprietary AAA format to obtain a tuple of four files per utterance:
-
[4]
Prompt file: contains text describing the task the child was given and the date-time of recording
-
[5]
Audio file: RIFF wave file, sampled at 22.05 kHz, containing the speech of the child and the SLT
-
[6]
Ultrasound file: a sequence of ultrasound frames capturing the midsagittal view of the child’s tongue. A single ultrasound frame is recorded as a 2D matrix where each column represents the ultrasound reflection intensities along a single scanline. The surface of the probe is convex and the scanlines are directed in an equal-angled fan in the scanning ...
-
[7]
Parameter file: contains a set of parameters to interpret the ultrasound data and synchronise it with the audio. It gives the number of scanlines in each frame (63), the number of data points per scanline (412), number of bits used to represent each reflection intensity data point (8), the angle between each scanline (0.038°), the number of ultrasound frame...
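The frame layout described in the ultrasound and parameter file entries can be sketched in Python. This is a minimal illustration, not the repository's own loader: it uses the quoted values (63 scanlines, 412 points per scanline, 8 bits per point) and assumes, without confirmation from the source, that the samples are stored as a headerless stream of unsigned bytes, scanline by scanline and frame after frame. The function name `frames_from_bytes` is hypothetical, and the input here is synthetic rather than a real UltraSuite file.

```python
import numpy as np

# Values quoted in the parameter-file description above.
NUM_SCANLINES = 63
POINTS_PER_SCANLINE = 412


def frames_from_bytes(raw: bytes,
                      num_scanlines: int = NUM_SCANLINES,
                      points_per_scanline: int = POINTS_PER_SCANLINE) -> np.ndarray:
    """Reshape a flat 8-bit stream into (frames, scanlines, points).

    ASSUMPTION: unsigned 8-bit samples, scanline-major order, no header.
    Verify against the real files and their parameter file before use.
    """
    frame_size = num_scanlines * points_per_scanline
    data = np.frombuffer(raw, dtype=np.uint8)
    num_frames = data.size // frame_size
    # Drop any trailing partial frame rather than guessing how to pad it.
    data = data[: num_frames * frame_size]
    return data.reshape(num_frames, num_scanlines, points_per_scanline)


# Synthetic stand-in for a file's contents: two full frames plus 10 stray bytes.
fake = bytes(i % 256 for i in range(2 * NUM_SCANLINES * POINTS_PER_SCANLINE + 10))
frames = frames_from_bytes(fake)
print(frames.shape)  # (2, 63, 412)
```

Under these assumptions, `frames.transpose(0, 2, 1)` gives the view described in the text, where each column of a frame holds the intensities along one scanline.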
-
[8]
Pronunciation dictionaries: We prepared a pronunciation dictionary for each of the three datasets
Data Annotation. In addition to the data described in the previous section, we release a set of annotations, including pronunciation dictionaries for each of the datasets, audio transcriptions for UXTD, SLT annotations, automatic speaker labelling and automatic phone alignments, all of which can aid modelling. Pronunciation dictionaries: We prepared a pr...
-
[9]
Data Statistics. Overall, the data contains 37.28 hours of synchronised audio and raw ultrasound across all datasets. Table 3 shows the distribution of audio in terms of speech (child and SLT) and silences (utterance initial, medial, and final), estimated using the speaker labelling method described in Section 4. Although our speaker labelling method achi...
-
[10]
Companion Code Repository. We distribute a code repository containing a set of tools to interpret, transform and visualise the data, in addition to the recipes used to annotate the data. We describe the current contents of the code repository and invite users to contribute their own code. Tools: The repository contains raw ultrasound reflection data, but ...
-
[11]
License and Distribution. We distribute UltraSuite under Attribution-NonCommercial 4.0 Generic (CC BY-NC 4.0) and distribute the companion code under Apache License v.2. Both can be obtained from the project website: http://www.ultrax-speech.org/ultrasuite
-
[12]
Conclusions and Future Work. We have introduced a new repository of ultrasound and acoustic data which we have collected from child speech therapy sessions. We have described the process of data collection, preparation and standardisation, along with a suggested train-test split. We have described tools to transform and visualise the data, and annotatio...
-
[13]
Acknowledgements. Supported by: EPSRC Healthcare Partnerships Programme, grant numbers EP/I027696/1 (Ultrax) and EP/P02338X/1 (Ultrax2020), and NHS Scotland CSO, grant number ETM/402 (UltraPhonix). We thank Steve Renals for continued guidance and support, Anna Womack for help collecting the UltraPhonix data and Steve Cowen for technical support. We than...
-
[14]
Y. Wren, L. L. Miller, T. J. Peters, A. Emond, and S. Roulstone, “Prevalence and predictors of persistent speech sound disorder at eight years old: Findings from a population cohort study,” Journal of Speech, Language, and Hearing Research, vol. 59, no. 4, pp. 647–673, 2016
-
[15]
A. Morgan, K. T. Eecen, A. Pezic, K. Brommeyer, C. Mei, P. Eadie, S. Reilly, and B. Dodd, “Who to refer for speech therapy at 4 years of age versus who to “watch and wait”?” The Journal of Pediatrics, vol. 185, pp. 200–204, 2017
-
[16]
C. J. Johnson, J. H. Beitchman, and E. B. Brownlie, “Twenty-year follow-up of children with and without speech-language impairments: Family, educational, occupational, and quality of life outcomes,” American Journal of Speech-Language Pathology, vol. 19, no. 1, pp. 51–65, 2010
-
[17]
J. McCormack, L. J. Harrison, S. McLeod, and L. McAllister, “A nationally representative study of the association between communication impairment at 4–5 years and children’s life activities at 7–9 years,” Journal of Speech, Language, and Hearing Research, vol. 54, no. 5, pp. 1328–1348, 2011
-
[18]
B. A. Lewis, A. A. Avrich, L. A. Freebairn, A. J. Hansen, L. E. Sucheston, I. Kuo, H. G. Taylor, S. K. Iyengar, and C. M. Stein, “Literacy outcomes of children with early childhood speech sound disorders: Impact of endophenotypes,” Journal of Speech, Language, and Hearing Research, vol. 54, no. 6, pp. 1628–1643, 2011
-
[19]
S. Howard and A. Lohmander, Cleft palate speech: assessment and intervention. John Wiley & Sons, 2011
-
[20]
D. Fabre, T. Hueber, F. Bocquelet, and P. Badin, “Tongue tracking in ultrasound images using eigentongue decomposition and artificial neural networks,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015
-
[21]
K. Xu, Y. Yang, M. Stone, A. Jaumard-Hakoun, C. Leboullenger, G. Dreyfus, P. Roussel, and B. Denby, “Robust contour tracking in ultrasound tongue image sequences,” Clinical Linguistics & Phonetics, vol. 30, no. 3-5, pp. 313–327, 2016
-
[22]
D. Fabre, T. Hueber, L. Girin, X. Alameda-Pineda, and P. Badin, “Automatic animation of an articulatory tongue model from ultrasound images of the vocal tract,” Speech Communication, vol. 93, pp. 63–75, 2017
-
[23]
D. Smith, A. Sneddon, L. Ward, A. Duenser, J. Freyne, D. Silvera-Tawil, and A. Morgan, “Improving child speech disorder assessment by incorporating out-of-domain adult speech,” in INTERSPEECH, Stockholm, Sweden, August 2017, pp. 2690–2694
-
[24]
N. Zharkova, “An ultrasound study of lingual coarticulation in children and adults,” Dataset, 2009
-
[25]
——, “High speed ultrasound/acoustic database of lingual articulation in preadolescents and adults,” Dataset, 2011
-
[26]
——, “High speed ultrasound/acoustic database of lingual articulation in typically developing children between three and thirteen years old,” Dataset, 2016
-
[27]
M. Russell and S. D’Arcy, “Challenges for computer recognition of children’s speech,” in Workshop on Speech and Language Technology in Education, 2007
-
[28]
J. Fainberg, P. Bell, M. Lincoln, and S. Renals, “Improving children’s speech recognition through out-of-domain data augmentation,” in INTERSPEECH, 2016, pp. 1598–1602
-
[29]
M. E. Beckman, A. R. Plummer, B. Munson, and P. F. Reidy, “Methods for eliciting, annotating, and analyzing databases for child speech development,” Computer Speech and Language, vol. 45, pp. 278–299, 2017
-
[30]
H. Christensen, M. Aniol, P. Bell, P. D. Green, T. Hain, S. King, and P. Swietojanski, “Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech,” in INTERSPEECH, 2013, pp. 3642–3645
-
[31]
F. E. Gibbon, “Undifferentiated lingual gestures in children with articulation/phonological disorders,” Journal of Speech, Language, and Hearing Research, vol. 42, no. 2, pp. 382–397, 1999
-
[32]
S. King, J. Frankel, K. Livescu, E. McDermott, K. Richmond, and M. Wester, “Speech production knowledge in automatic speech recognition,” The Journal of the Acoustical Society of America, vol. 121, no. 2, pp. 723–742, 2007
-
[33]
K. Richmond, Z.-H. Ling, and J. Yamagishi, “The use of articulatory movement data in speech synthesis applications: An overview - application of articulatory movements using machine learning algorithms,” Acoustical Science and Technology, vol. 36, no. 6, pp. 467–477, 2015
-
[34]
B. Denby, T. Schultz, K. Honda, T. Hueber, J. M. Gilbert, and J. S. Brumberg, “Silent speech interfaces,” Speech Communication, vol. 52, no. 4, pp. 270–287, 2010
-
[35]
A. Wrench, Articulate Assistant User Guide: Version 2.11, Articulate Instruments Ltd., QMU, Musselburgh, United Kingdom, 2010
-
[36]
M. Stone, “A guide to analysing tongue motion from ultrasound images,” Clinical Linguistics and Phonetics, vol. 19, no. 6-7, pp. 455–501, 2005
-
[37]
J. Cleland, J. Scobbie, S. Naki, and A. Wrench, “Helping children learn non-native articulations: the implications for ultrasound-based clinical intervention,” in Proceedings of the 18th International Congress of Phonetic Sciences. ICPhS 2015, 2015, pp. 1–5
-
[38]
J. Cleland, J. M. Scobbie, and A. A. Wrench, “Using ultrasound visual biofeedback to treat persistent primary speech sound disorders,” Clinical Linguistics and Phonetics, vol. 29, no. 8-10, pp. 575–597, 2015
-
[39]
J. Cleland, J. M. Scobbie, C. Heyde, Z. Roxburgh, and A. A. Wrench, “Covert contrast and covert errors in persistent velar fronting,” Clinical Linguistics and Phonetics, vol. 31, no. 1, pp. 35–55, 2017
-
[40]
B. Dodd, H. Zhu, S. Crosbie, A. Holm, and A. Ozanne, Diagnostic evaluation of articulation and phonology (DEAP). Psychology Corporation, 2002
-
[41]
J. McCann and A. A. Wrench, “A new EPG protocol for assessing DDK accuracy scores in children: a Down’s syndrome study,” in Proceedings of the 16th International Congress of the ICPhS, 2007, pp. 1985–1988
-
[42]
K. Richmond, R. Clark, and S. Fitt, “On generating Combilex pronunciations via morphological analysis,” in INTERSPEECH, 2010, pp. 1974–1977
-
[43]
K. Richmond, R. A. Clark, and S. Fitt, “Robust LTS rules with the Combilex speech technology lexicon,” in INTERSPEECH, 2009, pp. 1295–1298
-
[44]
P. Boersma and D. Weenink, “Praat: doing phonetics by computer [computer program],” 2009
-
[45]
D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz et al., “The Kaldi speech recognition toolkit,” in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, Dec. 2011
-
[46]
T. Matsui and S. Furui, “Comparison of text-independent speaker recognition methods using VQ–distortion and discrete/continuous HMM’s,” IEEE Transactions on Speech and Audio Processing, vol. 2, no. 3, pp. 456–459, 1994
-
[47]
H. Bredin, “pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems,” in INTERSPEECH, 2017
-
[48]
A. Batliner, M. Blomberg, S. D’Arcy, D. Elenius, D. Giuliani, M. Gerosa, C. Hacker, M. Russell, S. Steidl, and M. Wong, “The PF STAR children’s speech corpus,” in Ninth European Conference on Speech Communication and Technology, 2005
-
[49]
J. Cleland, A. Wrench, S. Lloyd, and E. Sugden, “Ultrax2020: Ultrasound technology for optimising the treatment of speech disorders: Clinicians’ resource manual,” University of Strathclyde, Tech. Rep., 2018