On the Use/Misuse of the Term 'Phoneme'
Pith reviewed 2026-05-24 15:39 UTC · model grok-4.3
The pith
Many speech researchers misuse 'phoneme' to label actual sounds instead of an abstract contrast unit.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Review of the accepted papers at INTERSPEECH-2018 confirms that a significant proportion of the community may not be aware of the critical difference between phonetic and phonemic levels of description, may not fully understand the significance of phonemic contrast, and as a consequence consistently misuse the term 'phoneme'.
What carries the argument
The phonetic versus phonemic distinction: the phonetic level describes concrete speech sounds, while the phonemic level describes abstract units whose contrasts distinguish one word from another in a particular language.
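A minimal sketch of what "contrast" means in practice may help here. It is not taken from the paper: the toy lexicon, the broad transcriptions, and the allophone table are illustrative assumptions. The idea is that two sounds belong to different phonemes when swapping one for the other yields a different word (a minimal pair), while distinct phones such as aspirated and unaspirated [p] in English can realise the same phoneme.

```python
from itertools import combinations

# Toy English lexicon with assumed broad phonemic transcriptions (illustrative only).
LEXICON = {
    "pin": ("p", "ɪ", "n"),
    "bin": ("b", "ɪ", "n"),
    "pan": ("p", "æ", "n"),
    "spin": ("s", "p", "ɪ", "n"),
}

# Assumed allophone table: several concrete phones can realise one abstract phoneme.
PHONE_TO_PHONEME = {"pʰ": "p", "p": "p", "b": "b"}


def minimal_pairs(lexicon):
    """Return (word_a, word_b, (seg_a, seg_b)) for words differing in exactly one segment."""
    pairs = []
    for (w1, t1), (w2, t2) in combinations(sorted(lexicon.items()), 2):
        if len(t1) != len(t2):
            continue  # only equal-length transcriptions can form a minimal pair here
        diffs = [(a, b) for a, b in zip(t1, t2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0]))
    return pairs


def same_phoneme(phone_a, phone_b):
    """True when two phones map to the same phoneme in the assumed table."""
    return PHONE_TO_PHONEME[phone_a] == PHONE_TO_PHONEME[phone_b]


for word_a, word_b, contrast in minimal_pairs(LEXICON):
    print(word_a, word_b, contrast)
```

In this toy setting, "bin"/"pin" shows that /b/ and /p/ contrast, whereas `same_phoneme("pʰ", "p")` holds because the two phones never distinguish English words.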
If this is right
- Sections of the community miss chances to understand and apply the implications of the phoneme as a psychological phenomenon.
- Casual usage reduces precision in descriptions of speech data and models.
- Clearer adherence to the distinction would support more effective exploitation of contrast-based structure in applications.
- Mitigation steps such as targeted guidance could improve shared terminology across publications.
Where Pith is reading between the lines
- The observed pattern may also appear in speech datasets or model training pipelines that label units without reference to contrast.
- Similar terminology slippage could occur in related areas such as language acquisition studies or voice synthesis evaluation.
- Explicit training modules on the phonetic-phonemic distinction could be tested for impact on paper clarity in future conferences.
Load-bearing premise
The linguistic definition of 'phoneme' is the standard that the speech technology community should follow, and the authors' criteria for spotting misuse in the papers are objective and representative.
What would settle it
A re-analysis of the same INTERSPEECH-2018 papers with different criteria for proper usage that finds most instances already correct, or a direct survey of researchers that shows widespread grasp of phonemic contrast.
Original abstract
The term 'phoneme' lies at the heart of speech science and technology, and yet it is not clear that the research community fully appreciates its meaning and implications. In particular, it is suspected that many researchers use the term in a casual sense to refer to the sounds of speech, rather than as a well defined abstract concept. If true, this means that some sections of the community may be missing an opportunity to understand and exploit the implications of this important psychological phenomenon. Here we review the correct meaning of the term 'phoneme' and report the results of an investigation into its use/misuse in the accepted papers at INTERSPEECH-2018. It is confirmed that a significant proportion of the community (i) may not be aware of the critical difference between 'phonetic' and 'phonemic' levels of description, (ii) may not fully understand the significance of 'phonemic contrast', and as a consequence, (iii) consistently misuse the term 'phoneme'. These findings are discussed, and recommendations are made as to how this situation might be mitigated.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reviews the standard linguistic definition of the 'phoneme' as an abstract unit defined by contrast (distinct from phonetic realizations), then reports results from an investigation of its usage in accepted papers at INTERSPEECH-2018. It concludes that a significant proportion of the community misuses the term by employing it to refer to concrete speech sounds rather than the abstract contrastive concept, implying limited awareness of the phonetic/phonemic distinction and the significance of phonemic contrast. Recommendations for mitigation are offered.
Significance. If the empirical results hold under rigorous scrutiny, the work would usefully flag a potential terminology gap in speech technology that could hinder precise exploitation of linguistic concepts in applications such as ASR and synthesis. The paper correctly recalls core linguistic distinctions and supplies concrete recommendations, which are constructive. Its value is primarily diagnostic rather than theoretical; impact would be greatest if the investigation were shown to be reproducible and representative of the broader community.
major comments (2)
- [Investigation section] The description of the investigation (the section following the review of the phoneme definition) supplies no sample size, selection criteria for the INTERSPEECH-2018 papers examined, explicit coding scheme for classifying 'misuse,' or inter-rater reliability statistics. These omissions render the central claim of a 'significant proportion' of misuse impossible to evaluate for objectivity or generalizability, directly undermining the empirical support for conclusions (i)–(iii).
- [Investigation section] The criteria used to identify misuse appear to rest on the authors' chosen linguistic definition without demonstrated decision rules (e.g., whether any non-contrastive syntactic context counts as misuse or only specific patterns). This makes the classification potentially post-hoc and non-reproducible, which is load-bearing for the reported proportion.
minor comments (2)
- [Abstract] The abstract is unusually long and contains the main claims; consider shortening it to focus on the investigation's scope while moving detailed conclusions to the body.
- [Results] No table or figure summarizes the quantitative findings (e.g., counts or percentages of misuse by category); adding one would improve clarity of the results.
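The counts and coding inputs requested in the comments above could be produced with a small keyword-in-context pass over the paper texts. The sketch below is one possible way to do this and is not the authors' procedure: the regular expression, the context window size, and the category labels are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Assumed search pattern: matches "phoneme", "phonemes", and "phonemic".
TERM = re.compile(r"\bphonem(?:e|es|ic)\b", re.IGNORECASE)


def keyword_in_context(text, window=40):
    """Return (matched_term, surrounding_context) for each occurrence, for manual coding."""
    hits = []
    for match in TERM.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        hits.append((match.group(0).lower(), text[start:end]))
    return hits


def tally(coded_hits):
    """Count occurrences per category, given (context, label) pairs from a human coder."""
    return Counter(label for _, label in coded_hits)


sample = "We predict phoneme posteriors. Phonemic contrast matters; phonemes differ."
for term, context in keyword_in_context(sample):
    print(term, "|", context)
```

A human coder would then attach a label to each extracted context, and `tally` would give the per-category counts that a summary table needs.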
Simulated Author's Rebuttal
We thank the referee for their constructive review and for highlighting the need for greater methodological transparency in the investigation section. We agree that the current description is insufficient to allow full evaluation of the empirical claims and will revise the manuscript to address this. Our point-by-point responses follow.
- Referee: [Investigation section] The description of the investigation (the section following the review of the phoneme definition) supplies no sample size, selection criteria for the INTERSPEECH-2018 papers examined, explicit coding scheme for classifying 'misuse,' or inter-rater reliability statistics. These omissions render the central claim of a 'significant proportion' of misuse impossible to evaluate for objectivity or generalizability, directly undermining the empirical support for conclusions (i)–(iii).
  Authors: We acknowledge that the manuscript as submitted omits these methodological details. In the revision we will add: the total number of papers examined, the explicit selection criterion (all accepted INTERSPEECH-2018 papers), the full coding scheme with decision rules for classifying each occurrence, and inter-rater reliability statistics (or a statement that a single coder performed the classification). These additions will allow readers to assess objectivity and generalizability directly.
  Revision: yes
- Referee: [Investigation section] The criteria used to identify misuse appear to rest on the authors' chosen linguistic definition without demonstrated decision rules (e.g., whether any non-contrastive syntactic context counts as misuse or only specific patterns). This makes the classification potentially post-hoc and non-reproducible, which is load-bearing for the reported proportion.
  Authors: We agree that the decision rules must be stated explicitly rather than left implicit. The revised manuscript will include a dedicated subsection detailing the operational criteria: which syntactic contexts trigger a 'misuse' label, how contrastive versus non-contrastive uses are distinguished, and any borderline cases with examples. This will make the classification reproducible from the published text.
  Revision: yes
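The inter-rater reliability statistic mentioned in the responses above is commonly reported as Cohen's kappa, $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance. The sketch below assumes two coders and hypothetical labels such as "misuse" and "correct"; neither the choice of statistic nor the labels come from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codings of four occurrences of the term by two coders.
coder_a = ["misuse", "misuse", "correct", "correct"]
coder_b = ["misuse", "correct", "correct", "correct"]
print(cohens_kappa(coder_a, coder_b))
```

If only a single coder classified the occurrences, no such statistic can be computed, which is why the response offers a statement to that effect as the alternative.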
Circularity Check
No circularity: empirical usage survey grounded in external linguistic definitions
The paper contains no equations, derivations, fitted parameters, or predictions. Its central claim is an empirical count of term usage in INTERSPEECH-2018 papers, using a definition of 'phoneme' drawn from standard linguistics rather than any self-referential construction. No self-citation chains, ansatzes, or renamings of known results are load-bearing. The analysis is self-contained against external benchmarks (linguistic literature and the sampled conference proceedings) and does not reduce any result to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: There exists a single, well-defined correct meaning of 'phoneme' that researchers should adhere to.
Reference graph
Works this paper leans on
- [1] It is confirmed that a significant proportion of the community (i) may not be aware of the critical difference between ‘phonetic’ and ‘phonemic’ levels of description, (ii) may not fully understand the significance of ‘phonemic contrast’, and as a consequence, (iii) consistently misuse the term ‘phoneme’. These findings are discussed, and recommendation...
- [2] the smallest unit of speech that distinguishes one word from another in a particular language
  Introduction: The idea that speech is organised around a finite set of ‘fundamental’ sound units is an ancient one, and many languages have exploited this phenomenon in the development of their writing systems [1, 2]. Of course the study of such sound structures is the primary remit of the speech sciences, specifically the fields of ‘phonetics’ (concer...
- [3] The ‘Phoneme’, 2.1. Background: According to the pioneering phonetician Daniel Jones, the idea of the phoneme was recognised from the 1870s, but the term itself was not in general use until the beginning of the 20th century [6]. The need for such a term arose because early phoneticians had realised that acoustically distinct speech sounds were only perc...
- [4] The Study: In order to gauge the usage of the term ‘phoneme’ in the broad speech science and technology community, it was decided to analyse the texts of all papers accepted for publication at the most recent INTERSPEECH conference - INTERSPEECH-2018 - which took place in Hyderabad, India in August 2018. 791 papers comprising a total of over 3 million wo...
- [5] Discussion and Recommendations: The results of this investigation clearly demonstrate that, although the term ‘phoneme’ is used quite frequently by the speech science and technology community, it is often deployed in a casual informal manner without considering its deeper formal implications. This means that, in many cases, the term ‘phoneme’ could ...
- [6] Researchers should avoid the term ‘phoneme’ unless they are certain of its meaning. In particular, the term ‘phone’ should be used to describe a generic speech sound, and the term ‘phoneme’ should be reserved to refer to the abstract family of sounds that serve to distinguish one word from another in a particular language
- [7] Teachers/supervisors should ensure that newcomers to the field of speech science/technology are fully briefed on the critical difference between ‘phonetic’ and ‘phonemic’ levels of description, the significance of ‘phonemic contrast’, and the correct usage of the term ‘phoneme’ [31, pp. 206]
- [8] Community associations (such as ISCA and IEEE) should take steps to ensure that their members are aware of the importance of using the term ‘phoneme’ correctly
- [9] Three key recommendations are made that aim to mitigate the situation
  Summary and Conclusion: The investigation reported in this paper has confirmed the hypothesis that a significant proportion of the community (i) may not be aware of the critical difference between ‘phonetic’ and ‘phonemic’ levels of description, (ii) may not fully understand the significance of ‘phonemic contrast’, and as a consequence, (iii) consisten...
- [10] P. T. Daniels and W. Bright, Eds., The World’s Writing Systems. Oxford: Oxford University Press, 1996
- [11] G. Sampson, “Writing Systems,” in The Routledge Handbook of Linguistics, K. Allan, Ed. Abingdon: Routledge, 2016, ch. 4, pp. 47–61
- [12] P. Ladefoged, Elements of Acoustic Phonetics. London: University of Chicago Press, 1962
- [13] J. D. O’Connor, Phonetics. Harmondsworth, UK: Penguin Books, 1974
- [14] K. N. Stevens, Acoustic Phonetics. Cambridge, Mass.: MIT Press, 1998
- [15] D. Jones, “The history and meaning of the term ‘phoneme’,” in Phonology: Selected Readings, E. C. Fudge, Ed. Harmondsworth, UK: Penguin Books, 1973, ch. 1, pp. 17–34
- [16] J. Bybee, Phonology and Language Use. Cambridge: Cambridge University Press, 2001
- [17] J. N. Holmes and W. Holmes, Speech Synthesis and Recognition. Taylor & Francis, 2002
- [18] P. C. Woodland and D. Povey, “Large scale discriminative training of hidden Markov models for speech recognition,” Computer Speech and Language, vol. 16, no. 1, pp. 25–47, 2002
- [19] M. Gales and S. J. Young, “The application of hidden Markov models in speech recognition,” Foundations and Trends in Signal Processing, vol. 1, no. 3, pp. 195–304, 2007
- [20] P. Taylor, Text-to-Speech Synthesis. Cambridge: Cambridge University Press, 2009
- [21] R. Pieraccini, The Voice in the Machine. MIT Press, Cambridge, MA, 2012
- [22]
- [23]
- [24]
- [25]
- [26] “International Phonetic Association.” [Online]. Available: https://www.internationalphoneticassociation.org
- [27] M. K. C. MacMahon, “The International Phonetic Association: the first 100 years,” Journal of the International Phonetic Association, vol. 16, pp. 30–38, 1986
- [28] M. Ashby and J. Maidment, Introducing Phonetic Science. Cambridge University Press, 2005
- [29] R. M. Warren, “Perceptual restoration of missing speech sounds,” Science, vol. 167, no. 3917, pp. 392–393, 1970
- [30] S. Hawkins, “Roles and representations of systematic fine phonetic detail in speech understanding,” Journal of Phonetics, vol. 31, pp. 373–405, 2003
- [31] E. Lombard, “Le signe de l’élévation de la voix,” Ann. Maladies Oreille, Larynx, Nez, Pharynx, vol. 37, pp. 101–119, 1911
- [32] B. Lindblom, “Explaining phonetic variation: a sketch of the H&H theory,” in Speech Production and Speech Modelling, W. J. Hardcastle and A. Marchal, Eds. Kluwer Academic Publishers, 1990, pp. 403–439
- [33] M. Ostendorf, “Moving Beyond the ‘Beads-On-A-String’ Model of Speech,” in IEEE ASRU Workshop. Keystone, USA: IEEE, 1999, pp. 79–84
- [34] G. Sampson, “Is there a universal phonetic alphabet?” Language, vol. 50, no. 2, pp. 236–259, 1974
- [35] R. F. Port and A. P. Leary, “Against formal phonology,” Language, vol. 81, pp. 927–964, 2005
- [36] E. Shafaei-Bajestan and R. H. Baayen, “Wide learning for auditory comprehension,” in INTERSPEECH 2018. Hyderabad, India: ISCA, 2018, pp. 966–970
- [37] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015
- [38] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Neural Information Processing Systems (NIPS) Conference, Montreal, Canada, 2014
- [39] D. Bahdanau, J. Chorowski, D. Serdyuk, P. Brakel, and Y. Bengio, “End-to-end attention-based large vocabulary speech recognition,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 2016
- [40]