UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions

Aciel Eshky; Alan Wrench; James Scobbie; Joanne Cleland; Korin Richmond; Manuel Sam Ribeiro; Zoe Roxburgh

arXiv: 1907.00835 · v1 · pith:ZTKYVNRCnew · submitted 2019-07-01 · 💻 cs.CL · cs.CV · cs.SD · eess.AS · eess.IV

Pith reviewed 2026-05-25 11:56 UTC · model grok-4.3

keywords: UltraSuite · ultrasound · acoustic data · child speech therapy · speech sound disorders · data repository · annotations · speech production

The pith

UltraSuite is a new repository of ultrasound tongue images and sound recordings collected during child speech therapy sessions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents UltraSuite as a curated collection of ultrasound and acoustic data drawn directly from recordings made in real child speech therapy sessions. It supplies three separate data sets—one from typically developing children and two from children with speech sound disorders—together with both manual and automatic annotations plus software for processing and viewing the material. A sympathetic reader would see this as a practical step toward making raw, therapy-sourced speech-production data available for study rather than keeping it locked inside clinics. If the release works as described, researchers gain concrete material for examining tongue movement and sound patterns in children who are and are not receiving therapy.

Core claim

UltraSuite supplies three data collections of ultrasound tongue images paired with acoustic recordings taken from actual child speech therapy sessions, accompanied by manual and automatic annotations and a set of software tools for data processing, transformation, and visualisation.

What carries the argument

UltraSuite, the repository that bundles the ultrasound-acoustic recordings, annotations, and processing tools.

Load-bearing premise

The ultrasound and acoustic files really come from genuine child therapy sessions and the supplied annotations and tools accurately describe and handle that data.

What would settle it

A check that finds the released recordings were not taken during live therapy sessions or that the provided software cannot load and display the data files as claimed.

Figures

Figures reproduced from arXiv: 1907.00835 by Aciel Eshky, Alan Wrench, James Scobbie, Joanne Cleland, Korin Richmond, Manuel Sam Ribeiro, Zoe Roxburgh.

**Figure 1.** An ultrasound image showing the midsagittal view of a child’s tongue. We store the raw ultrasound reflection data efficiently as a matrix (left), but provide a tool to transform it to real-world proportions (right). [...] a frame contains ultrasound reflection data of a single scanline. To correctly interpret the ultrasound data, we provide a tool to transform the raw representation to the real-world proportions…

Original abstract

We introduce UltraSuite, a curated repository of ultrasound and acoustic data, collected from recordings of child speech therapy sessions. This release includes three data collections, one from typically developing children and two from children with speech sound disorders. In addition, it includes a set of annotations, some manual and some automatically produced, and software tools to process, transform and visualise the data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

UltraSuite is a standard data-release paper that makes child ultrasound and audio recordings from therapy sessions publicly available.

The core contribution here is the release of UltraSuite itself: three collections of ultrasound tongue imaging plus audio, one from typically developing children and two from children with speech sound disorders, along with a mix of manual and automatic annotations and some processing tools. That specific combination from real therapy sessions was not previously available in the cited literature, so the resource is new on its face. The full manuscript supplies the usual descriptive sections on participants, recording setup, annotation pipelines, and access, which is what a data paper needs to do.

Credit to the authors for making the data public rather than keeping it internal. The paper does not claim new methods or results, and it does not overstate what it contains. The main value sits in the data being real and downloadable rather than in any analysis inside the text.

Soft spots are limited and expected for this genre. There are no quantitative claims that need independent verification beyond the data description, and the load-bearing parts (actual recordings happened, tools run) are external facts about the release rather than internal contradictions. A reader who wants validated inter-annotator agreement numbers or benchmark results will not find them here, but that is not the paper's stated goal.

This is for researchers in clinical speech technology or child language who need raw multimodal data to run their own experiments. It is not for someone hunting algorithmic novelty. The work shows clear, honest engagement with the task of releasing a usable resource. I would send it to peer review rather than desk reject if the venue handles data papers; the description is solid enough to merit referee time on the collection and access details.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces UltraSuite, a curated repository of ultrasound and acoustic data collected from child speech therapy sessions. It comprises three data collections—one from typically developing children and two from children with speech sound disorders—together with manual and automatically produced annotations and software tools for processing, transforming, and visualizing the data.

Significance. If the repository contents and documentation match the description, the release supplies a specialized open resource for research on pediatric speech production, ultrasound tongue imaging, and automatic analysis of child speech, including disordered speech. The combination of raw recordings, dual annotation streams, and processing tools supports reproducibility and downstream work in clinical linguistics and speech technology.

minor comments (2)

[Abstract] Abstract: the scale of the collections (participant counts, session totals) is not quantified, which would help readers gauge the resource size at a glance.
The manuscript should confirm that all software tools are accompanied by persistent identifiers (e.g., DOIs minted through a Zenodo archive of a tagged GitHub release) to ensure long-term accessibility.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript and for recommending acceptance. We are pleased that the description of UltraSuite, including the data collections, annotations, and tools, was found to be clear and potentially valuable for research in pediatric speech production and ultrasound imaging.

Circularity Check

0 steps flagged

No significant circularity

The paper is a descriptive data-release announcement with no derivations, equations, predictions, fitted parameters, or theoretical claims. Its central assertions concern the existence, contents, and accessibility of the UltraSuite repository (three collections, annotations, and tools), which are externally verifiable facts rather than internally derived quantities. No load-bearing steps reduce to self-definition, fitted inputs, or self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a data repository release paper. No mathematical derivations, fitted parameters, or new postulated entities are involved. The contribution rests on the curation and public sharing of collected recordings.

pith-pipeline@v0.9.0 · 5612 in / 1207 out tokens · 52044 ms · 2026-05-25T11:56:42.636346+00:00 · methodology

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · 1 internal anchor

[1]

UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions

Introduction. Speech sound disorders (SSDs) affect quality of life for a large number of children. In the UK, 11.4% of eight-year-olds have persistent SSDs, ranging from common clinical distortions to speech that is unintelligible even to close family members [1]. SSDs are similarly prevalent in other countries [1]. Children with disordered speech experi...

[2]

Data Collection. Ethical approval to collect the data was granted by the NHS Research Ethics Service. We recorded the data in the laboratory using the Articulate Assistant Advanced software (AAA), initially storing it in the AAA proprietary data format [22]. All sessions were conducted by a speech and language therapist (SLT), and both the children and...

[3]

Data Preparation. We exported the raw data from the proprietary AAA format to obtain a tuple of four files per utterance:

[4]

Prompt file: contains text describing the task the child was given and the date-time of recording

[5]

Audio file: RIFF wave file, sampled at 22.05 kHz, containing the speech of the child and the SLT

[6]

Ultrasound file: a sequence of ultrasound frames capturing the midsagittal view of the child’s tongue. A single ultrasound frame is recorded as a 2D matrix where each column represents the ultrasound reflection intensities along a single scanline. The surface of the probe is convex and the scanlines are directed in an equal-angled fan in the scanning ...

[7]

Parameter file: contains a set of parameters to interpret the ultrasound data and synchronise it with the audio. It gives the number of scanlines in each frame (63), the number of data points per scanline (412), number of bits used to represent each reflection intensity data point (8), the angle between each scanline (0.038°), the number of ultrasound frame...
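The parameters quoted above are enough to work out how large one raw frame is and to cut a byte buffer into frames. The following is a minimal sketch under stated assumptions, not the companion repository's code: it assumes a headerless buffer of 8-bit samples with frames stored back to back and scanline by scanline, and the function names `frame_size_bytes` and `split_frames` are hypothetical. The actual on-disk layout should be checked against the UltraSuite documentation.

```python
from typing import List

# Values quoted in the parameter-file description above.
NUM_SCANLINES = 63
POINTS_PER_SCANLINE = 412
BITS_PER_POINT = 8


def frame_size_bytes(num_scanlines: int = NUM_SCANLINES,
                     points_per_scanline: int = POINTS_PER_SCANLINE,
                     bits_per_point: int = BITS_PER_POINT) -> int:
    """Return the number of bytes occupied by one ultrasound frame."""
    return num_scanlines * points_per_scanline * bits_per_point // 8


def split_frames(raw: bytes,
                 num_scanlines: int = NUM_SCANLINES,
                 points_per_scanline: int = POINTS_PER_SCANLINE) -> List[List[bytes]]:
    """Split a raw buffer into frames, each a list of scanlines.

    ASSUMPTION (not confirmed by the excerpt): no header, one byte per
    sample, frames stored contiguously, scanline by scanline.
    """
    size = frame_size_bytes(num_scanlines, points_per_scanline, 8)
    if size == 0 or len(raw) % size != 0:
        raise ValueError("buffer length is not a whole number of frames")
    frames = []
    for start in range(0, len(raw), size):
        frame = raw[start:start + size]
        # Each scanline is a run of points_per_scanline consecutive bytes.
        frames.append([frame[i:i + points_per_scanline]
                       for i in range(0, size, points_per_scanline)])
    return frames
```

With the quoted values, one frame is 63 × 412 = 25,956 bytes. Indexing `frames[f][s][p]` then yields the intensity (0 to 255) of point `p` on scanline `s` of frame `f`.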

[8]

Data Annotation. In addition to the data described in the previous section, we release a set of annotations, including pronunciation dictionaries for each of the datasets, audio transcriptions for UXTD, SLT annotations, automatic speaker labelling and automatic phone alignments, all of which can aid modelling. Pronunciation dictionaries: We prepared a pr...

[9]

Data Statistics. Overall, the data contains 37.28 hours of synchronised audio and raw ultrasound across all datasets. Table 3 shows the distribution of audio in terms of speech (child and SLT) and silences (utterance initial, medial, and final), estimated using the speaker labelling method described in Section 4. Although our speaker labelling method achi...

[10]

Companion Code Repository. We distribute a code repository containing a set of tools to interpret, transform and visualise the data, in addition to the recipes used to annotate the data. We describe the current contents of the code repository and invite users to contribute their own code. Tools: The repository contains raw ultrasound reflection data, but ...
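To illustrate what transforming the raw matrix to real-world proportions involves, here is a minimal sketch of the underlying geometry. It is not the companion repository's implementation. It assumes the scanlines form an equal-angled fan centred on the vertical axis; the function name `fan_point` and the parameters `inner_radius` and `sample_spacing` are hypothetical and do not come from the excerpt. The angle step is taken in radians, so the unit given in the parameter file should be checked before it is passed in.

```python
import math
from typing import Tuple


def fan_point(scanline: int, sample: int, *, num_scanlines: int,
              angle_step_rad: float, inner_radius: float = 0.0,
              sample_spacing: float = 1.0) -> Tuple[float, float]:
    """Map a raw-matrix index (scanline, sample) to Cartesian (x, y).

    ASSUMPTIONS: the middle scanline points straight up, neighbouring
    scanlines are rotated by multiples of angle_step_rad, and samples are
    evenly spaced along each scanline starting at inner_radius.
    """
    centre = (num_scanlines - 1) / 2.0
    theta = (scanline - centre) * angle_step_rad
    radius = inner_radius + sample * sample_spacing
    # Polar to Cartesian, with theta measured from the vertical axis.
    return radius * math.sin(theta), radius * math.cos(theta)
```

For example, `fan_point(31, 10, num_scanlines=63, angle_step_rad=0.05)` lands on the vertical axis, because index 31 is the middle of 63 scanlines. Producing an image on a regular pixel grid additionally requires interpolating between these points, which this sketch leaves out.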

[11]

License and Distribution. We distribute UltraSuite under Attribution-NonCommercial 4.0 Generic (CC BY-NC 4.0) and distribute the companion code under Apache License v.2. Both can be obtained from the project website: http://www.ultrax-speech.org/ultrasuite

[12]

Conclusions and Future Work. We have introduced a new repository of ultrasound and acoustic data which we have collected from child speech therapy sessions. We have described the process of data collection, preparation and standardisation, along with a suggested train-test split. We have described tools to transform and visualise the data, and annotatio...

[13]

Acknowledgements. Supported by: EPSRC Healthcare Partnerships Programme, grant numbers EP/I027696/1 (Ultrax) and EP/P02338X/1 (Ultrax2020), and NHS Scotland CSO, grant number ETM/402 (UltraPhonix). We thank Steve Renals for continued guidance and support, Anna Womack for help collecting the UltraPhonix data and Steve Cowen for technical support. We than...

[14]

Y. Wren, L. L. Miller, T. J. Peters, A. Emond, and S. Roulstone, “Prevalence and predictors of persistent speech sound disorder at eight years old: Findings from a population cohort study,” Journal of Speech, Language, and Hearing Research, vol. 59, no. 4, pp. 647–673, 2016

[15]

A. Morgan, K. T. Eecen, A. Pezic, K. Brommeyer, C. Mei, P. Eadie, S. Reilly, and B. Dodd, “Who to refer for speech therapy at 4 years of age versus who to “watch and wait”?” The Journal of Pediatrics, vol. 185, pp. 200–204, 2017

[16]

C. J. Johnson, J. H. Beitchman, and E. B. Brownlie, “Twenty-year follow-up of children with and without speech-language impairments: Family, educational, occupational, and quality of life outcomes,” American Journal of Speech-Language Pathology, vol. 19, no. 1, pp. 51–65, 2010

[17]

J. McCormack, L. J. Harrison, S. McLeod, and L. McAllister, “A nationally representative study of the association between communication impairment at 4–5 years and children’s life activities at 7–9 years,” Journal of Speech, Language, and Hearing Research, vol. 54, no. 5, pp. 1328–1348, 2011

[18]

B. A. Lewis, A. A. Avrich, L. A. Freebairn, A. J. Hansen, L. E. Sucheston, I. Kuo, H. G. Taylor, S. K. Iyengar, and C. M. Stein, “Literacy outcomes of children with early childhood speech sound disorders: Impact of endophenotypes,” Journal of Speech, Language, and Hearing Research, vol. 54, no. 6, pp. 1628–1643, 2011

[19]

S. Howard and A. Lohmander, Cleft palate speech: assessment and intervention. John Wiley & Sons, 2011

[20]

D. Fabre, T. Hueber, F. Bocquelet, and P. Badin, “Tongue tracking in ultrasound images using eigentongue decomposition and artificial neural networks,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015

[21]

K. Xu, Y. Yang, M. Stone, A. Jaumard-Hakoun, C. Leboullenger, G. Dreyfus, P. Roussel, and B. Denby, “Robust contour tracking in ultrasound tongue image sequences,” Clinical Linguistics & Phonetics, vol. 30, no. 3-5, pp. 313–327, 2016

[22]

D. Fabre, T. Hueber, L. Girin, X. Alameda-Pineda, and P. Badin, “Automatic animation of an articulatory tongue model from ultrasound images of the vocal tract,” Speech Communication, vol. 93, pp. 63–75, 2017

[23]

D. Smith, A. Sneddon, L. Ward, A. Duenser, J. Freyne, D. Silvera-Tawil, and A. Morgan, “Improving child speech disorder assessment by incorporating out-of-domain adult speech,” in INTERSPEECH, Stockholm, Sweden, August 2017, pp. 2690–2694

[24]

N. Zharkova, “An ultrasound study of lingual coarticulation in children and adults,” Dataset, 2009

[25]

N. Zharkova, “High speed ultrasound/acoustic database of lingual articulation in preadolescents and adults,” Dataset, 2011

[26]

N. Zharkova, “High speed ultrasound/acoustic database of lingual articulation in typically developing children between three and thirteen years old,” Dataset, 2016

[27]

M. Russell and S. D’Arcy, “Challenges for computer recognition of children’s speech,” in Workshop on Speech and Language Technology in Education, 2007

[28]

J. Fainberg, P. Bell, M. Lincoln, and S. Renals, “Improving children’s speech recognition through out-of-domain data augmentation.” in INTERSPEECH, 2016, pp. 1598–1602

[29]

M. E. Beckman, A. R. Plummer, B. Munson, and P. F. Reidy, “Methods for eliciting, annotating, and analyzing databases for child speech development,” Computer Speech and Language, vol. 45, pp. 278–299, 2017

[30]

H. Christensen, M. Aniol, P. Bell, P. D. Green, T. Hain, S. King, and P. Swietojanski, “Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech.” in INTERSPEECH, 2013, pp. 3642–3645

[31]

F. E. Gibbon, “Undifferentiated lingual gestures in children with articulation/phonological disorders,” Journal of Speech, Language, and Hearing Research, vol. 42, no. 2, pp. 382–397, 1999

[32]

S. King, J. Frankel, K. Livescu, E. McDermott, K. Richmond, and M. Wester, “Speech production knowledge in automatic speech recognition,” The Journal of the Acoustical Society of America, vol. 121, no. 2, pp. 723–742, 2007

[33]

K. Richmond, Z.-H. Ling, and J. Yamagishi, “The use of articulatory movement data in speech synthesis applications: An overview - application of articulatory movements using machine learning algorithms,” Acoustical Science and Technology, vol. 36, no. 6, pp. 467–477, 2015

[34]

B. Denby, T. Schultz, K. Honda, T. Hueber, J. M. Gilbert, and J. S. Brumberg, “Silent speech interfaces,” Speech Communication, vol. 52, no. 4, pp. 270–287, 2010

[35]

A. Wrench, Articulate Assistant User Guide: Version 2.11, Articulate Instruments Ltd., QMU, Musselburgh, United Kingdom, 2010

[36]

M. Stone, “A guide to analysing tongue motion from ultrasound images,” Clinical Linguistics and Phonetics, vol. 19, no. 6-7, pp. 455–501, 2005

[37]

J. Cleland, J. Scobbie, S. Naki, and A. Wrench, “Helping children learn non-native articulations: the implications for ultrasound-based clinical intervention.” in Proceedings of the 18th International Congress of Phonetic Sciences. ICPhS 2015, 2015, pp. 1–5

[38]

J. Cleland, J. M. Scobbie, and A. A. Wrench, “Using ultrasound visual biofeedback to treat persistent primary speech sound disorders,” Clinical Linguistics and Phonetics, vol. 29, no. 8-10, pp. 575–597, 2015

[39]

J. Cleland, J. M. Scobbie, C. Heyde, Z. Roxburgh, and A. A. Wrench, “Covert contrast and covert errors in persistent velar fronting,” Clinical Linguistics and Phonetics, vol. 31, no. 1, pp. 35–55, 2017

[40]

B. Dodd, H. Zhu, S. Crosbie, A. Holm, and A. Ozanne, Diagnostic evaluation of articulation and phonology (DEAP). Psychology Corporation, 2002

[41]

J. McCann and A. A. Wrench, “A new EPG protocol for assessing DDK accuracy scores in children: a Down’s syndrome study,” in Proceedings of the 16th International Congress of the ICPhS, 2007, pp. 1985–1988

[42]

K. Richmond, R. Clark, and S. Fitt, “On generating Combilex pronunciations via morphological analysis,” in INTERSPEECH, 2010, pp. 1974–1977

[43]

K. Richmond, R. A. Clark, and S. Fitt, “Robust LTS rules with the Combilex speech technology lexicon,” in INTERSPEECH, 2009, pp. 1295–1298

[44]

P. Boersma and D. Weenink, “Praat: doing phonetics by computer [computer program],” 2009

[45]

D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz et al., “The Kaldi speech recognition toolkit,” in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, Dec. 2011

[46]

T. Matsui and S. Furui, “Comparison of text-independent speaker recognition methods using VQ–distortion and discrete/continuous HMM’s,” IEEE Transactions on Speech and Audio Processing, vol. 2, no. 3, pp. 456–459, 1994

[47]

H. Bredin, “pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems,” in INTERSPEECH, 2017

[48]

A. Batliner, M. Blomberg, S. D’Arcy, D. Elenius, D. Giuliani, M. Gerosa, C. Hacker, M. Russell, S. Steidl, and M. Wong, “The PF STAR children’s speech corpus,” in Ninth European Conference on Speech Communication and Technology, 2005

[49]

J. Cleland, A. Wrench, S. Lloyd, and E. Sugden, “Ultrax2020: Ultrasound technology for optimising the treatment of speech disorders: Clinicians’ resource manual,” University of Strathclyde, Tech. Rep., 2018

work page 2018

[1] [1]

UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions

Introduction Speech sound disorders (SSDs) affect quality of life for a large number of children. In the UK, 11.4% of eight year olds 1 have persistent SSDs, ranging from common clinical distortions to speech that is unintelligible even to close family members [1]. SSDs are similarly prevalent in other countries [1]. Children with disordered speech experi...

work page internal anchor Pith review Pith/arXiv arXiv 1907

[2] [2]

what’s this?

Data Collection Ethical approval to collect the data was granted by the NHS Re- search Ethics Service. We recorded the data in the laboratory using the Articulate Assistant Advanced software (AAA), ini- tially storing it in the AAA proprietary data format [22]. All sessions were conducted by a speech and language therapist (SLT), and both the children and...

work page 2011

[3] [3]

Data Preparation We exported the raw data from the proprietary AAA format to obtain a tuple of four ﬁles per utterance:

work page

[4] [4]

Prompt ﬁle: contains text describing the task the child was given and the date-time of recording

work page

[5] [5]

Audio ﬁle: RIFF wave ﬁle, sampled at 22.05 KHz, con- taining the speech of the child and the SLT

work page

[6] [6]

A sin- gle ultrasound frame is recorded as a 2D matrix where each column represents the ultrasound reﬂection intensi- ties along a single scanline

Ultrasound ﬁle: a sequence of ultrasound frames cap- turing the midsagittal view of the child’s tongue. A sin- gle ultrasound frame is recorded as a 2D matrix where each column represents the ultrasound reﬂection intensi- ties along a single scanline. The surface of the probe is convex and the scanlines are directed in an equal-angled fan in the scanning ...

work page

[7] [7]

down link pat get

Parameter ﬁle: contains a set of parameters to interpret the ultrasound data and synchronise it with the audio. It gives the number of scanlines in each frame (63), the number of data points per scanline (412), number of bits used to represent each reﬂection intensity data point (8), the angle between each scanline (0.038◦), the number of ultrasound frame...

work page

[8] [8]

Pronunciation dictionaries: We prepared a pronunciation dictionary for each of the three datasets

Data Annotation In addition to the data described in the previous section, we re- lease a set of annotations, including pronunciation dictionaries for each of the datasets, audio transcriptions for UXTD, SLT annotations, automatic speaker labelling and automatic phone alignments, all of which can aid modelling. Pronunciation dictionaries: We prepared a pr...

work page

[9] [9]

Data Statistics Overall, the data contains 37.28 hours of synchronised audio and raw ultrasound across all datasets. Table 3 shows the distri- bution of audio in terms of speech (child and SLT) and silences (utterance initial, medial, and ﬁnal), estimated using the speaker labelling method described in Section 4. Although our speaker labelling method achi...

work page

[10] [10]

We describe the current contents of the code repository and invite users to contribute their own code

Companion Code Repository We distribute a code repository containing a set of tools to inter- pret, transform and visualise the data, in addition to the recipes used to annotate the data. We describe the current contents of the code repository and invite users to contribute their own code. Tools: The repository contains raw ultrasound reﬂection data, but ...

work page

[11] [11]

Both can be obtained from the project website: http://www.ultrax-speech.org/ultrasuite

License and Distribution We distribute UltraSuite under Attribution-NonCommercial 4.0 Generic (CC BY-NC 4.0) and distribute the companion code un- der Apache License v.2. Both can be obtained from the project website: http://www.ultrax-speech.org/ultrasuite

work page

[12] [12]

We have described the process of data collection, preparation and standardisation, along with a suggested train- test split

Conclusions and Future Work We have introduced a new repository of ultrasound and acous- tic data which we have collected from child speech therapy sessions. We have described the process of data collection, preparation and standardisation, along with a suggested train- test split. We have described tools to transform and visualise the data, and annotatio...

work page

[13] [13]

We thank Steve Renals for continued guidance and support, Anna Womack for help collecting the UltraPhonix data and Steve Cowen for technical support

Acknowledgements Supported by: EPSRC Healthcare Partnerships Programme, grants number EP/I027696/1 (Ultrax) and EP/P02338X/1 (Ul- trax2020), and NHS Scotland CSO, grant numberETM/402 (Ul- traPhonix). We thank Steve Renals for continued guidance and support, Anna Womack for help collecting the UltraPhonix data and Steve Cowen for technical support. We than...

work page

[14] [14]

Prevalence and predictors of persistent speech sound disorder at eight years old: Findings from a population cohort study,

Y . Wren, L. L. Miller, T. J. Peters, A. Emond, and S. Roulstonef, “Prevalence and predictors of persistent speech sound disorder at eight years old: Findings from a population cohort study,”Journal of Speech, Language, and Hearing Research , vol. 59, no. 4, pp. 647–673, 2016

work page 2016

[15] [15]

Who to refer for speech therapy at 4 years of age versus who to “watch and wait

A. Morgan, K. T. Eecen, A. Pezic, K. Brommeyer, C. Mei, P. Eadie, S. Reilly, and B. Dodd, “Who to refer for speech therapy at 4 years of age versus who to “watch and wait”?” The Journal of pediatrics, vol. 185, pp. 200–204, 2017

work page 2017

[16] [16]

Twenty- year follow-up of children with and without speech-language impairments: Family, educational, occupational, and quality of life outcomes,

C. J. Johnson, J. H. Beitchman, and E. B. Brownlie, “Twenty- year follow-up of children with and without speech-language impairments: Family, educational, occupational, and quality of life outcomes,”American Journal of Speech-Language Pathology, vol. 19, no. 1, pp. 51–65, 2010

work page 2010

[17] [17]

A nationally representative study of the association between com- munication impairment at 4–5 years and children’s life activities at 7–9 years,

J. McCormack, L. J. Harrison, S. McLeod, and L. McAllister, “A nationally representative study of the association between com- munication impairment at 4–5 years and children’s life activities at 7–9 years,” Journal of Speech, Language, and Hearing Re- search, vol. 54, no. 5, pp. 1328–1348, 2011

work page 2011

[18] [18]

Literacy outcomes of children with early childhood speech sound disorders: Impact of endophenotypes,

B. A. Lewis, A. A. Avrich, L. A. Freebairn, A. J. Hansen, L. E. Sucheston, I. Kuo, H. G. Taylor, S. K. Iyengar, and C. M. Stein, “Literacy outcomes of children with early childhood speech sound disorders: Impact of endophenotypes,” Journal of Speech, Lan- guage, and Hearing Research , vol. 54, no. 6, pp. 1628–1643, 2011

work page 2011

[19] [19]

Howard and A

S. Howard and A. Lohmander, Cleft palate speech: assessment and intervention. John Wiley & Sons, 2011

work page 2011

[20] D. Fabre, T. Hueber, F. Bocquelet, and P. Badin, “Tongue tracking in ultrasound images using eigentongue decomposition and artificial neural networks,” in Sixteenth Annual Conference of the International Speech Communication Association, 2015

[21] K. Xu, Y. Yang, M. Stone, A. Jaumard-Hakoun, C. Leboullenger, G. Dreyfus, P. Roussel, and B. Denby, “Robust contour tracking in ultrasound tongue image sequences,” Clinical linguistics & phonetics, vol. 30, no. 3-5, pp. 313–327, 2016

[22] D. Fabre, T. Hueber, L. Girin, X. Alameda-Pineda, and P. Badin, “Automatic animation of an articulatory tongue model from ultrasound images of the vocal tract,” Speech Communication, vol. 93, pp. 63–75, 2017

[23] D. Smith, A. Sneddon, L. Ward, A. Duenser, J. Freyne, D. Silvera-Tawil, and A. Morgan, “Improving child speech disorder assessment by incorporating out-of-domain adult speech,” in INTERSPEECH, Stockholm, Sweden, August 2017, pp. 2690–2694

[24] N. Zharkova, “An ultrasound study of lingual coarticulation in children and adults,” Dataset, 2009

[25] ——, “High speed ultrasound/acoustic database of lingual articulation in preadolescents and adults,” Dataset, 2011

[26] ——, “High speed ultrasound/acoustic database of lingual articulation in typically developing children between three and thirteen years old,” Dataset, 2016

[27] M. Russell and S. D’Arcy, “Challenges for computer recognition of children’s speech,” in Workshop on Speech and Language Technology in Education, 2007

[28] J. Fainberg, P. Bell, M. Lincoln, and S. Renals, “Improving children’s speech recognition through out-of-domain data augmentation,” in INTERSPEECH, 2016, pp. 1598–1602

[29] M. E. Beckman, A. R. Plummer, B. Munson, and P. F. Reidy, “Methods for eliciting, annotating, and analyzing databases for child speech development,” Computer Speech and Language, vol. 45, pp. 278–299, 2017

[30] H. Christensen, M. Aniol, P. Bell, P. D. Green, T. Hain, S. King, and P. Swietojanski, “Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech,” in INTERSPEECH, 2013, pp. 3642–3645

[31] F. E. Gibbon, “Undifferentiated lingual gestures in children with articulation/phonological disorders,” Journal of Speech, Language, and Hearing Research, vol. 42, no. 2, pp. 382–397, 1999

[32] S. King, J. Frankel, K. Livescu, E. McDermott, K. Richmond, and M. Wester, “Speech production knowledge in automatic speech recognition,” The Journal of the Acoustical Society of America, vol. 121, no. 2, pp. 723–742, 2007

[33] K. Richmond, Z.-H. Ling, and J. Yamagishi, “The use of articulatory movement data in speech synthesis applications: An overview - application of articulatory movements using machine learning algorithms,” Acoustical Science and Technology, vol. 36, no. 6, pp. 467–477, 2015

[34] B. Denby, T. Schultz, K. Honda, T. Hueber, J. M. Gilbert, and J. S. Brumberg, “Silent speech interfaces,” Speech Communication, vol. 52, no. 4, pp. 270–287, 2010

[35] A. Wrench, Articulate Assistant User Guide: Version 2.11, Articulate Instruments Ltd., QMU, Musselburgh, United Kingdom, 2010

[36] M. Stone, “A guide to analysing tongue motion from ultrasound images,” Clinical Linguistics and Phonetics, vol. 19, no. 6-7, pp. 455–501, 2005

[37] J. Cleland, J. Scobbie, S. Naki, and A. Wrench, “Helping children learn non-native articulations: the implications for ultrasound-based clinical intervention,” in Proceedings of the 18th International Congress of Phonetic Sciences. ICPhS 2015, 2015, pp. 1–5

[38] J. Cleland, J. M. Scobbie, and A. A. Wrench, “Using ultrasound visual biofeedback to treat persistent primary speech sound disorders,” Clinical Linguistics and Phonetics, vol. 29, no. 8-10, pp. 575–597, 2015

[39] J. Cleland, J. M. Scobbie, C. Heyde, Z. Roxburgh, and A. A. Wrench, “Covert contrast and covert errors in persistent velar fronting,” Clinical Linguistics and Phonetics, vol. 31, no. 1, pp. 35–55, 2017

[40] B. Dodd, H. Zhu, S. Crosbie, A. Holm, and A. Ozanne, Diagnostic evaluation of articulation and phonology (DEAP). Psychology Corporation, 2002

[41] J. McCann and A. A. Wrench, “A new EPG protocol for assessing DDK accuracy scores in children: a Down’s syndrome study,” in Proceedings of the 16th International Congress of the ICPhS, 2007, pp. 1985–1988

[42] K. Richmond, R. Clark, and S. Fitt, “On generating Combilex pronunciations via morphological analysis,” in INTERSPEECH, 2010, pp. 1974–1977

[43] K. Richmond, R. A. Clark, and S. Fitt, “Robust LTS rules with the Combilex speech technology lexicon,” in INTERSPEECH, 2009, pp. 1295–1298

[44] P. Boersma and D. Weenink, “Praat: doing phonetics by computer [computer program],” 2009

[45] D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz et al., “The Kaldi speech recognition toolkit,” in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, Dec. 2011

[46] T. Matsui and S. Furui, “Comparison of text-independent speaker recognition methods using VQ–distortion and discrete/continuous HMM’s,” IEEE Transactions on speech and audio processing, vol. 2, no. 3, pp. 456–459, 1994

[47] H. Bredin, “pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems,” in INTERSPEECH, 2017

[48] A. Batliner, M. Blomberg, S. D’Arcy, D. Elenius, D. Giuliani, M. Gerosa, C. Hacker, M. Russell, S. Steidl, and M. Wong, “The PF STAR children’s speech corpus,” in Ninth European Conference on Speech Communication and Technology, 2005

[49] J. Cleland, A. Wrench, S. Lloyd, and E. Sugden, “Ultrax2020: Ultrasound technology for optimising the treatment of speech disorders: Clinicians’ resource manual,” University of Strathclyde, Tech. Rep., 2018