
arxiv: 1907.11640 · v1 · pith:3AF5OC4Mnew · submitted 2019-07-26 · 💻 cs.CL · cs.SD · eess.AS

On the Use/Misuse of the Term 'Phoneme'

Pith reviewed 2026-05-24 15:39 UTC · model grok-4.3

classification 💻 cs.CL · cs.SD · eess.AS
keywords phoneme · phonemic contrast · phonetic · speech science · terminology · INTERSPEECH · misuse · linguistics

The pith

Many speech researchers misuse 'phoneme' to label actual sounds instead of an abstract contrast unit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews the proper meaning of 'phoneme' as an abstract psychological unit defined by contrasts that distinguish meaning, distinct from the physical sounds described at the phonetic level. It investigates usage in accepted INTERSPEECH-2018 papers and finds that a significant share of authors appear unaware of this distinction or the role of phonemic contrast. As a result, the term is applied casually to refer to speech sounds rather than the defined abstract concept. If accurate, this pattern means parts of the community overlook opportunities to apply the implications of phonemic structure in their work. The authors outline the correct usage and propose steps to reduce the observed misuse.

Core claim

Review of the accepted papers at INTERSPEECH-2018 confirms that a significant proportion of the community may not be aware of the critical difference between phonetic and phonemic levels of description, may not fully understand the significance of phonemic contrast, and as a consequence consistently misuse the term 'phoneme'.

What carries the argument

The phonetic versus phonemic distinction, where phonetic describes concrete speech sounds and phonemic describes abstract units whose contrasts carry meaning distinctions.
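The contrast criterion can be made concrete with minimal pairs: two words that differ in exactly one segment and in meaning show that those two segments belong to different phonemes. In English, by comparison, the aspirated [pʰ] of "pin" and the unaspirated [p] of "spin" are distinct phones that never distinguish words, so they count as one phoneme /p/. The sketch below is a minimal illustration, not the authors' method; the toy lexicon, its broad transcriptions, and the function name `minimal_pairs` are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical toy lexicon: words mapped to broad phonemic transcriptions,
# one symbol per segment. Illustrative entries, not data from the paper.
LEXICON = {
    "pat": ("p", "æ", "t"),
    "bat": ("b", "æ", "t"),
    "pit": ("p", "ɪ", "t"),
    "spin": ("s", "p", "ɪ", "n"),
}


def minimal_pairs(lexicon):
    """Return (word_a, word_b, segment_a, segment_b) for every pair of words
    with equal-length transcriptions that differ in exactly one segment."""
    pairs = []
    for (w1, t1), (w2, t2) in combinations(lexicon.items(), 2):
        if len(t1) != len(t2):
            continue
        diffs = [(a, b) for a, b in zip(t1, t2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0][0], diffs[0][1]))
    return pairs


for w1, w2, a, b in minimal_pairs(LEXICON):
    # Each minimal pair is evidence that the two segments contrast.
    print(f"{w1} / {w2}: /{a}/ contrasts with /{b}/")
```

On this lexicon the function reports pat/bat and pat/pit, which is evidence that /p/ and /b/, and /æ/ and /ɪ/, are separate phonemes in the toy language. Nothing here measures acoustic detail, which is the point of the distinction: the phonemic level records only which differences separate words.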

If this is right

  • Sections of the community miss chances to understand and apply the implications of the phoneme as a psychological phenomenon.
  • Casual usage reduces precision in descriptions of speech data and models.
  • Clearer adherence to the distinction would support more effective exploitation of contrast-based structure in applications.
  • Mitigation steps such as targeted guidance could improve shared terminology across publications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The observed pattern may also appear in speech datasets or model training pipelines that label units without reference to contrast.
  • Similar terminology slippage could occur in related areas such as language acquisition studies or voice synthesis evaluation.
  • Explicit training modules on the phonetic-phonemic distinction could be tested for impact on paper clarity in future conferences.

Load-bearing premise

The linguistic definition of 'phoneme' is the norm that the speech technology community should follow, and the authors' criteria for spotting misuse in the papers are objective and representative.

What would settle it

A re-analysis of the same INTERSPEECH-2018 papers with different criteria for proper usage that finds most instances already correct, or a direct survey of researchers that shows widespread grasp of phonemic contrast.
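Any such re-analysis would start from a count of how often the term appears in each paper before the instances are judged. The following is a rough sketch of that first step only, under stated assumptions: the paper texts are already available as plain strings, the dictionary keys and sample sentences are invented, and the regular expression is one possible choice rather than the authors' actual search procedure.

```python
import re
from collections import Counter

# Matches 'phoneme' and 'phonemes' as whole words (also inside hyphenated
# forms such as 'phoneme-level'), but not 'phonemic'. Assumed pattern.
PHONEME_RE = re.compile(r"\bphonemes?\b", re.IGNORECASE)


def count_term(text):
    """Number of occurrences of 'phoneme(s)' in one paper's text."""
    return len(PHONEME_RE.findall(text))


def occurrence_distribution(papers):
    """Map an occurrence count to the number of papers with that count."""
    return Counter(count_term(text) for text in papers.values())


# Hypothetical stand-ins for extracted paper texts.
papers = {
    "paper_a": "Each phoneme is mapped to a state. Phonemes are then decoded.",
    "paper_b": "We study phonemic contrast in vowels.",
    "paper_c": "The phoneme recogniser outputs one label per frame.",
}

print(occurrence_distribution(papers))
```

Counting alone says nothing about correct or incorrect usage. Each matched instance would still need to be classified by a human coder or an explicit rule set, which is where different criteria could lead to different conclusions.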

Figures

Figures reproduced from arXiv: 1907.11640 by Lucy Skidmore, Roger K. Moore.

Figure 1: Distribution of the occurrences of the term ‘phoneme’ in the INTERSPEECH-2018 and ICSLP-1998 accepted papers.
Figure 2: Distribution of the occurrences of the term ‘phoneme’ in the INTERSPEECH-2018 accepted papers in the speech science and speech technology categories.

Returning to the discussion of use/misuse in Section 3.3, it turned out that, of the papers mentioning the term ‘phoneme’ in potentially inappropriate ways, 25% were categorised as ‘science’ and 75% as ‘technology’. However, since there were approximately t…
Original abstract

The term 'phoneme' lies at the heart of speech science and technology, and yet it is not clear that the research community fully appreciates its meaning and implications. In particular, it is suspected that many researchers use the term in a casual sense to refer to the sounds of speech, rather than as a well defined abstract concept. If true, this means that some sections of the community may be missing an opportunity to understand and exploit the implications of this important psychological phenomenon. Here we review the correct meaning of the term 'phoneme' and report the results of an investigation into its use/misuse in the accepted papers at INTERSPEECH-2018. It is confirmed that a significant proportion of the community (i) may not be aware of the critical difference between 'phonetic' and 'phonemic' levels of description, (ii) may not fully understand the significance of 'phonemic contrast', and as a consequence, (iii) consistently misuse the term 'phoneme'. These findings are discussed, and recommendations are made as to how this situation might be mitigated.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper reviews the standard linguistic definition of the 'phoneme' as an abstract unit defined by contrast (distinct from phonetic realizations), then reports results from an investigation of its usage in accepted papers at INTERSPEECH-2018. It concludes that a significant proportion of the community misuses the term by employing it to refer to concrete speech sounds rather than the abstract contrastive concept, implying limited awareness of the phonetic/phonemic distinction and the significance of phonemic contrast. Recommendations for mitigation are offered.

Significance. If the empirical results hold under rigorous scrutiny, the work would usefully flag a potential terminology gap in speech technology that could hinder precise exploitation of linguistic concepts in applications such as ASR and synthesis. The paper correctly recalls core linguistic distinctions and supplies concrete recommendations, which are constructive. Its value is primarily diagnostic rather than theoretical; impact would be greatest if the investigation were shown to be reproducible and representative of the broader community.

major comments (2)
  1. [Investigation section] The description of the investigation (the section following the review of the phoneme definition) supplies no sample size, selection criteria for the INTERSPEECH-2018 papers examined, explicit coding scheme for classifying 'misuse,' or inter-rater reliability statistics. These omissions render the central claim of a 'significant proportion' of misuse impossible to evaluate for objectivity or generalizability, directly undermining the empirical support for conclusions (i)–(iii).
  2. [Investigation section] The criteria used to identify misuse appear to rest on the authors' chosen linguistic definition without demonstrated decision rules (e.g., whether any non-contrastive syntactic context counts as misuse or only specific patterns). This makes the classification potentially post-hoc and non-reproducible, which is load-bearing for the reported proportion.
minor comments (2)
  1. [Abstract] The abstract is unusually long and contains the main claims; consider shortening it to focus on the investigation's scope while moving detailed conclusions to the body.
  2. [Results] No table or figure summarizes the quantitative findings (e.g., counts or percentages of misuse by category); adding one would improve clarity of the results.
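The inter-rater reliability statistic requested in major comment 1 could take several forms; Cohen's kappa for two coders is a common choice. The sketch below shows how it is computed from two coders' labels. It is an illustration under assumptions: the labels `"misuse"` and `"ok"` and the six example judgements are invented, and nothing in the source indicates which statistic, if any, the authors would report.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each coder's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical judgements on six occurrences of the term.
coder_1 = ["misuse", "misuse", "ok", "ok", "misuse", "ok"]
coder_2 = ["misuse", "ok", "ok", "ok", "misuse", "ok"]
print(round(cohens_kappa(coder_1, coder_2), 3))
```

In this example the coders agree on five of six items (observed agreement 5/6) while chance agreement is 0.5, giving a kappa of 2/3. Reporting a chance-corrected figure alongside the coding scheme would let readers judge how far the 'misuse' label depends on the individual coder.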

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for highlighting the need for greater methodological transparency in the investigation section. We agree that the current description is insufficient to allow full evaluation of the empirical claims and will revise the manuscript to address this. Our point-by-point responses follow.

  1. Referee: [Investigation section] The description of the investigation (the section following the review of the phoneme definition) supplies no sample size, selection criteria for the INTERSPEECH-2018 papers examined, explicit coding scheme for classifying 'misuse,' or inter-rater reliability statistics. These omissions render the central claim of a 'significant proportion' of misuse impossible to evaluate for objectivity or generalizability, directly undermining the empirical support for conclusions (i)–(iii).

    Authors: We acknowledge that the manuscript as submitted omits these methodological details. In the revision we will add: the total number of papers examined, the explicit selection criterion (all accepted INTERSPEECH-2018 papers), the full coding scheme with decision rules for classifying each occurrence, and inter-rater reliability statistics (or a statement that a single coder performed the classification). These additions will allow readers to assess objectivity and generalizability directly.

    Revision: yes

  2. Referee: [Investigation section] The criteria used to identify misuse appear to rest on the authors' chosen linguistic definition without demonstrated decision rules (e.g., whether any non-contrastive syntactic context counts as misuse or only specific patterns). This makes the classification potentially post-hoc and non-reproducible, which is load-bearing for the reported proportion.

    Authors: We agree that the decision rules must be stated explicitly rather than left implicit. The revised manuscript will include a dedicated subsection detailing the operational criteria: which syntactic contexts trigger a 'misuse' label, how contrastive versus non-contrastive uses are distinguished, and any borderline cases with examples. This will make the classification reproducible from the published text.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical usage survey grounded in external linguistic definitions


The paper contains no equations, derivations, fitted parameters, or predictions. Its central claim is an empirical count of term usage in INTERSPEECH-2018 papers, using a definition of 'phoneme' drawn from standard linguistics rather than any self-referential construction. No self-citation chains, ansatzes, or renamings of known results are load-bearing. The analysis is self-contained against external benchmarks (linguistic literature and the sampled conference proceedings) and does not reduce any result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

This is a terminology survey paper with no mathematical derivations, fitted parameters, or postulated physical entities. The central claim rests on an assumed standard definition of 'phoneme' drawn from linguistics and on the authors' judgment of what counts as misuse.

axioms (1)
  • Domain assumption: There exists a single, well-defined correct meaning of 'phoneme' that researchers should adhere to.
    Invoked in the opening paragraph, where the authors state that the term 'lies at the heart' of the field yet is used casually rather than as the 'well defined abstract concept'.

