On the Use/Misuse of the Term 'Phoneme'

Lucy Skidmore; Roger K. Moore

arXiv: 1907.11640 · v1 · pith:3AF5OC4Mnew · submitted 2019-07-26 · cs.CL · cs.SD · eess.AS

Pith reviewed 2026-05-24 15:39 UTC · model grok-4.3

keywords: phoneme, phonemic contrast, phonetic, speech science, terminology, INTERSPEECH, misuse, linguistics

The pith

Many speech researchers misuse 'phoneme' to label actual sounds instead of an abstract contrast unit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews the proper meaning of 'phoneme' as an abstract psychological unit defined by contrasts that distinguish meaning, distinct from the physical sounds described at the phonetic level. It investigates usage in accepted INTERSPEECH-2018 papers and finds that a significant share of authors appear unaware of this distinction or the role of phonemic contrast. As a result, the term is applied casually to refer to speech sounds rather than the defined abstract concept. If accurate, this pattern means parts of the community overlook opportunities to apply the implications of phonemic structure in their work. The authors outline the correct usage and propose steps to reduce the observed misuse.

Core claim

Review of the accepted papers at INTERSPEECH-2018 confirms that a significant proportion of the community may not be aware of the critical difference between phonetic and phonemic levels of description, may not fully understand the significance of phonemic contrast, and as a consequence consistently misuse the term 'phoneme'.

What carries the argument

The phonetic versus phonemic distinction, where phonetic describes concrete speech sounds and phonemic describes abstract units whose contrasts carry meaning distinctions.
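
As a rough illustration of this distinction, the sketch below treats phones as concrete transcribed sounds and phonemes as the abstract categories they realise. The tiny English inventory, the dictionary names and the `is_minimal_pair` helper are all assumptions made for the example; none of them come from the paper.

```python
# Toy illustration only: a hypothetical two-phoneme fragment of English.
# Square brackets mark phones (phonetic level); slashes mark phonemes
# (phonemic level).
ALLOPHONES = {
    "/p/": ["[pʰ]", "[p]"],  # aspirated as in "pin", unaspirated as in "spin"
    "/b/": ["[b]"],
}

# Invert the table: several phones can realise one phoneme.
PHONE_TO_PHONEME = {
    phone: phoneme
    for phoneme, phones in ALLOPHONES.items()
    for phone in phones
}


def is_minimal_pair(word_a, word_b):
    """Return True if two phonemic transcriptions differ in exactly one slot."""
    if len(word_a) != len(word_b):
        return False
    return sum(a != b for a, b in zip(word_a, word_b)) == 1


pin = ["/p/", "/ɪ/", "/n/"]
bin_ = ["/b/", "/ɪ/", "/n/"]

# Both phones belong to the same phoneme, so swapping them changes no word.
print(PHONE_TO_PHONEME["[pʰ]"] == PHONE_TO_PHONEME["[p]"])
# /p/ and /b/ contrast: "pin" and "bin" form a minimal pair.
print(is_minimal_pair(pin, bin_))
```

In this toy inventory [pʰ] and [p] never distinguish two English words, so both map to /p/, whereas /p/ and /b/ do (pin vs. bin), which is what makes them separate phonemes.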

If this is right

Sections of the community miss chances to understand and apply the implications of the phoneme as a psychological phenomenon.
Casual usage reduces precision in descriptions of speech data and models.
Clearer adherence to the distinction would support more effective exploitation of contrast-based structure in applications.
Mitigation steps such as targeted guidance could improve shared terminology across publications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The observed pattern may also appear in speech datasets or model training pipelines that label units without reference to contrast.
Similar terminology slippage could occur in related areas such as language acquisition studies or voice synthesis evaluation.
Explicit training modules on the phonetic-phonemic distinction could be tested for impact on paper clarity in future conferences.

Load-bearing premise

The linguistic definition of 'phoneme' is the standard the speech technology community should follow, and the authors' criteria for spotting misuse in the papers are objective and representative.

What would settle it

A re-analysis of the same INTERSPEECH-2018 papers with different criteria for proper usage that finds most instances already correct, or a direct survey of researchers that shows widespread grasp of phonemic contrast.

Figures

Figures reproduced from arXiv: 1907.11640 by Lucy Skidmore, Roger K. Moore.

**Figure 1.** Distribution of the occurrences of the term ‘phoneme’ in the INTERSPEECH-2018 and ICSLP-1998 accepted papers.

**Figure 2.** Distribution of the occurrences of the term ‘phoneme’ in the INTERSPEECH-2018 accepted papers in the speech science and speech technology categories. Returning to the discussion of use/misuse in Section 3.3, it turned out that, of the papers mentioning the term ‘phoneme’ in potentially inappropriate ways, 25% were categorised as ‘science’ and 75% as ‘technology’. However, since there were approximately t…
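
The distributions in these figures are per-paper counts of the term. A minimal sketch of how such a count could be produced is given below; the regular expression, the function names and the three sample texts are hypothetical, and whether the authors counted plurals or derived forms such as 'phonemic' is not stated here, so the matching rule is an assumption.

```python
import re
from collections import Counter

# Assumption: count 'phoneme' and 'phonemes' as whole words, ignoring case,
# and exclude derived forms such as 'phonemic'.
PHONEME_RE = re.compile(r"\bphonemes?\b", re.IGNORECASE)


def count_phoneme(text):
    """Number of occurrences of the term in one paper's text."""
    return len(PHONEME_RE.findall(text))


def occurrence_distribution(papers):
    """Map 'occurrences per paper' to 'number of papers with that count'."""
    return Counter(count_phoneme(text) for text in papers.values())


# Hypothetical stand-ins for paper texts, not real conference data.
papers = {
    "paper_a": "We train a phoneme recogniser. Each phoneme has its own HMM.",
    "paper_b": "Phonemic contrast distinguishes 'pin' from 'bin'.",
    "paper_c": "Phonemes were force-aligned to the audio.",
}

distribution = occurrence_distribution(papers)
for occurrences in sorted(distribution):
    print(occurrences, distribution[occurrences])
```

A count like this only locates the term; deciding whether each occurrence is a use or a misuse still needs a human judgement or an explicit coding scheme.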

Original abstract

The term 'phoneme' lies at the heart of speech science and technology, and yet it is not clear that the research community fully appreciates its meaning and implications. In particular, it is suspected that many researchers use the term in a casual sense to refer to the sounds of speech, rather than as a well defined abstract concept. If true, this means that some sections of the community may be missing an opportunity to understand and exploit the implications of this important psychological phenomenon. Here we review the correct meaning of the term 'phoneme' and report the results of an investigation into its use/misuse in the accepted papers at INTERSPEECH-2018. It is confirmed that a significant proportion of the community (i) may not be aware of the critical difference between 'phonetic' and 'phonemic' levels of description, (ii) may not fully understand the significance of 'phonemic contrast', and as a consequence, (iii) consistently misuse the term 'phoneme'. These findings are discussed, and recommendations are made as to how this situation might be mitigated.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper flags loose use of 'phoneme' in one conference but supplies no methods or counts to back the claim.

The main point to take away is that this paper says a significant share of INTERSPEECH-2018 papers treat 'phoneme' as just another word for speech sounds instead of an abstract contrastive unit, yet the investigation itself is described only in the abstract with no numbers, selection rules, or coding details attached. That leaves the central finding as an unverified assertion rather than a result you can check or build on.

What the paper does bring is a clear restatement of the standard linguistic definition and a reminder that mixing phonetic and phonemic levels can hide the role of contrast. That part is straightforward and useful for anyone who has seen the term tossed around in engineering contexts. The new element is the specific look at that one year's papers, though without prior surveys cited for comparison it is hard to know how much this differs from older complaints about terminology drift.

The soft spot is the missing empirical backbone. No sample size appears, no criteria for what counts as misuse are given, and there is no mention of how multiple readers would score the same paper. If those rules were applied after the fact or only to certain sentence patterns, the reported proportion does not follow. The assumption that the linguistics definition is the one the community must adopt is stated but not argued against the practical reasons people might use the word more loosely.

This piece would mainly interest readers already inside speech technology who care about precise wording in papers and reviews. Someone outside the subfield or looking for data that could shift practice would find little to act on. I would not send it to referees in its current form; the methods gap is too large for the claim being made. A short note or editorial might carry the reminder without needing the full apparatus.

Referee Report

2 major / 2 minor

Summary. The paper reviews the standard linguistic definition of the 'phoneme' as an abstract unit defined by contrast (distinct from phonetic realizations), then reports results from an investigation of its usage in accepted papers at INTERSPEECH-2018. It concludes that a significant proportion of the community misuses the term by employing it to refer to concrete speech sounds rather than the abstract contrastive concept, implying limited awareness of the phonetic/phonemic distinction and the significance of phonemic contrast. Recommendations for mitigation are offered.

Significance. If the empirical results hold under rigorous scrutiny, the work would usefully flag a potential terminology gap in speech technology that could hinder precise exploitation of linguistic concepts in applications such as ASR and synthesis. The paper correctly recalls core linguistic distinctions and supplies concrete recommendations, which are constructive. Its value is primarily diagnostic rather than theoretical; impact would be greatest if the investigation were shown to be reproducible and representative of the broader community.

major comments (2)

[Investigation section] The description of the investigation (the section following the review of the phoneme definition) supplies no sample size, selection criteria for the INTERSPEECH-2018 papers examined, explicit coding scheme for classifying 'misuse,' or inter-rater reliability statistics. These omissions render the central claim of a 'significant proportion' of misuse impossible to evaluate for objectivity or generalizability, directly undermining the empirical support for conclusions (i)–(iii).
[Investigation section] The criteria used to identify misuse appear to rest on the authors' chosen linguistic definition without demonstrated decision rules (e.g., whether any non-contrastive syntactic context counts as misuse or only specific patterns). This makes the classification potentially post-hoc and non-reproducible, which is load-bearing for the reported proportion.
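
One common inter-rater reliability statistic of the kind requested above is Cohen's kappa, which corrects the raw agreement between two coders for the agreement expected by chance. The sketch below is a plain-Python illustration; the label names and the two short annotation lists are invented for the example, and nothing here reflects how the authors actually coded the papers.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical judgements on four occurrences of the term 'phoneme'.
coder_1 = ["misuse", "misuse", "ok", "ok"]
coder_2 = ["misuse", "ok", "ok", "ok"]

print(cohens_kappa(coder_1, coder_2))
```

With these invented labels the coders agree on three of four items, chance agreement is one half, and kappa works out to 0.5.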

minor comments (2)

[Abstract] The abstract is unusually long and contains the main claims; consider shortening it to focus on the investigation's scope while moving detailed conclusions to the body.
[Results] No table or figure summarizes the quantitative findings (e.g., counts or percentages of misuse by category); adding one would improve clarity of the results.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for highlighting the need for greater methodological transparency in the investigation section. We agree that the current description is insufficient to allow full evaluation of the empirical claims and will revise the manuscript to address this. Our point-by-point responses follow.

Referee: [Investigation section] The description of the investigation (the section following the review of the phoneme definition) supplies no sample size, selection criteria for the INTERSPEECH-2018 papers examined, explicit coding scheme for classifying 'misuse,' or inter-rater reliability statistics. These omissions render the central claim of a 'significant proportion' of misuse impossible to evaluate for objectivity or generalizability, directly undermining the empirical support for conclusions (i)–(iii).

Authors: We acknowledge that the manuscript as submitted omits these methodological details. In the revision we will add: the total number of papers examined, the explicit selection criterion (all accepted INTERSPEECH-2018 papers), the full coding scheme with decision rules for classifying each occurrence, and inter-rater reliability statistics (or a statement that a single coder performed the classification). These additions will allow readers to assess objectivity and generalizability directly.

Revision: yes

Referee: [Investigation section] The criteria used to identify misuse appear to rest on the authors' chosen linguistic definition without demonstrated decision rules (e.g., whether any non-contrastive syntactic context counts as misuse or only specific patterns). This makes the classification potentially post-hoc and non-reproducible, which is load-bearing for the reported proportion.

Authors: We agree that the decision rules must be stated explicitly rather than left implicit. The revised manuscript will include a dedicated subsection detailing the operational criteria: which syntactic contexts trigger a 'misuse' label, how contrastive versus non-contrastive uses are distinguished, and any borderline cases with examples. This will make the classification reproducible from the published text.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical usage survey grounded in external linguistic definitions

The paper contains no equations, derivations, fitted parameters, or predictions. Its central claim is an empirical count of term usage in INTERSPEECH-2018 papers, using a definition of 'phoneme' drawn from standard linguistics rather than any self-referential construction. No self-citation chains, ansatzes, or renamings of known results are load-bearing. The analysis is self-contained against external benchmarks (linguistic literature and the sampled conference proceedings) and does not reduce any result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

This is a terminology survey paper with no mathematical derivations, fitted parameters, or postulated physical entities. The central claim rests on an assumed standard definition of 'phoneme' drawn from linguistics and on the authors' judgment of what counts as misuse.

axioms (1)

Domain assumption: There exists a single, well-defined correct meaning of 'phoneme' that researchers should adhere to.
Invoked in the opening paragraph when the authors state that the term 'lies at the heart' yet is used casually rather than as the 'well defined abstract concept'.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

[1] These findings are discussed, and recommendations are made as to how this situation might be mitigated

It is confirmed that a significant proportion of the community (i) may not be aware of the critical difference between ‘phonetic’ and ‘phonemic’ levels of description, (ii) may not fully understand the significance of ‘phonemic contrast’, and as a consequence, (iii) consistently misuse the term ‘phoneme’. These findings are discussed, and recommendation...

[2] the smallest unit of speech that distinguishes one word from another in a particular language

Introduction. The idea that speech is organised around a finite set of ‘fundamental’ sound units is an ancient one, and many languages have exploited this phenomenon in the development of their writing systems [1, 2]. Of course the study of such sound structures is the primary remit of the speech sciences, specifically the fields of ‘phonetics’ (concer...

[3] a family of uttered sounds (segmental elements of speech) in a particular language which count for practical purposes as if they were one and the same

The ‘Phoneme’. 2.1. Background. According to the pioneering phonetician Daniel Jones, the idea of the phoneme was recognised from the 1870s, but the term itself was not in general use until the beginning of the 20th century [6]. The need for such a term arose because early phoneticians had realised that acoustically distinct speech sounds were only perc...

[4] phoneme

The Study. In order to gauge the usage of the term ‘phoneme’ in the broad speech science and technology community, it was decided to analyse the texts of all papers accepted for publication at the most recent INTERSPEECH conference - INTERSPEECH-2018 - which took place in Hyderabad, India in August 2018. 791 papers comprising a total of over 3 million wo...

[5] This means that, in many cases, the term ‘phoneme’ could have been substituted by ‘phone’ with no loss of meaning

Discussion and Recommendations. The results of this investigation clearly demonstrate that, although the term ‘phoneme’ is used quite frequently by the speech science and technology community, it is often deployed in a casual informal manner without considering its deeper formal implications. This means that, in many cases, the term ‘phoneme’ could ...

[6] Researchers should avoid the term ‘phoneme’ unless they are certain of its meaning. In particular, the term ‘phone’ should be used to describe a generic speech sound, and the term ‘phoneme’ should be reserved to refer to the abstract family of sounds that serve to distinguish one word from another in a particular language

[7] Teachers/supervisors should ensure that newcomers to the field of speech science/technology are fully briefed on the critical difference between ‘phonetic’ and ‘phonemic’ levels of description, the significance of ‘phonemic contrast’, and the correct usage of the term ‘phoneme’ [31, pp. 206]

[8] Community associations (such as ISCA and IEEE) should take steps to ensure that their members are aware of the importance of using the term ‘phoneme’ correctly

[9] Three key recommendations are made that aim to mitigate the situation

Summary and Conclusion. The investigation reported in this paper has confirmed the hypothesis that a significant proportion of the community (i) may not be aware of the critical difference between ‘phonetic’ and ‘phonemic’ levels of description, (ii) may not fully understand the significance of ‘phonemic contrast’, and as a consequence, (iii) consisten...

[10] P. T. Daniels and W. Bright, Eds., The World’s Writing Systems. Oxford: Oxford University Press, 1996

[11] G. Sampson, “Writing Systems,” in The Routledge Handbook of Linguistics, K. Allan, Ed. Abingdon: Routledge, 2016, ch. 4, pp. 47–61

[12] P. Ladefoged, Elements of Acoustic Phonetics. London: University of Chicago Press, 1962

[13] J. D. O’Connor, Phonetics. Harmondsworth, UK: Penguin Books, 1974

[14] K. N. Stevens, Acoustic Phonetics. Cambridge, Mass.: MIT Press, 1998

[15] D. Jones, “The history and meaning of the term ‘phoneme’,” in Phonology: Selected Readings, E. C. Fudge, Ed. Harmondsworth, UK: Penguin Books, 1973, ch. 1, pp. 17–34

[16] J. Bybee, Phonology and Language Use. Cambridge: Cambridge University Press, 2001

[17] J. N. Holmes and W. Holmes, Speech Synthesis and Recognition. Taylor & Francis, 2002

[18] P. C. Woodland and D. Povey, “Large scale discriminative training of hidden Markov models for speech recognition,” Computer Speech and Language, vol. 16, no. 1, pp. 25–47, 2002

[19] M. Gales and S. J. Young, “The application of hidden Markov models in speech recognition,” Foundations and Trends in Signal Processing, vol. 1, no. 3, pp. 195–304, 2007

[20] P. Taylor, Text-to-Speech Synthesis. Cambridge: Cambridge University Press, 2009

[21] R. Pieraccini, The Voice in the Machine. MIT Press, Cambridge, MA, 2012

[22] “Phoneme,” in Merriam-Webster Dictionary. [Online]. Available: https://www.merriam-webster.com/dictionary/phoneme

[23] “Phoneme,” in Collins English Dictionary, Harper Collins Publishers. [Online]. Available: https://www.collinsdictionary.com/dictionary/english/phoneme

[24] “Phoneme,” in Encyclopedia Britannica. [Online]. Available: https://www.britannica.com/topic/phoneme

[25] “Phoneme,” in Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Phoneme

[26] “International Phonetic Association.” [Online]. Available: https://www.internationalphoneticassociation.org

[27] M. K. C. MacMahon, “The International Phonetic Association: the first 100 years,” Journal of the International Phonetic Association, vol. 16, pp. 30–38, 1986

[28] M. Ashby and J. Maidment, Introducing Phonetic Science. Cambridge University Press, 2005

[29] R. M. Warren, “Perceptual restoration of missing speech sounds,” Science, vol. 167, no. 3917, pp. 392–393, 1970

[30] S. Hawkins, “Roles and representations of systematic fine phonetic detail in speech understanding,” Journal of Phonetics, vol. 31, pp. 373–405, 2003

[31] E. Lombard, “Le signe de l’élévation de la voix,” Ann. Maladies Oreille, Larynx, Nez, Pharynx, vol. 37, pp. 101–119, 1911

[32] B. Lindblom, “Explaining phonetic variation: a sketch of the H&H theory,” in Speech Production and Speech Modelling, W. J. Hardcastle and A. Marchal, Eds. Kluwer Academic Publishers, 1990, pp. 403–439

[33] M. Ostendorf, “Moving Beyond the ‘Beads-On-A-String’ Model of Speech,” in IEEE ASRU Workshop. Keystone, USA: IEEE, 1999, pp. 79–84

[34] G. Sampson, “Is there a universal phonetic alphabet?” Language, vol. 50, no. 2, pp. 236–259, 1974

[35] R. F. Port and A. P. Leary, “Against formal phonology,” Language, vol. 81, pp. 927–964, 2005

[36] E. Shafaei-Bajestan and R. H. Baayen, “Wide learning for auditory comprehension,” in INTERSPEECH 2018. Hyderabad, India: ISCA, 2018, pp. 966–970

[37] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015

[38] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Neural Information Processing Systems (NIPS) Conference, Montreal, Canada, 2014

[39] D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, “End-to-end attention-based large vocabulary speech recognition,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 2016

[40] D. Gibbon, R. K. Moore, and R. Winski, Eds., Handbook of Standards and Resources for Spoken Language Systems. Berlin, New York: Mouton de Gruyter, 1997
