arxiv: 1907.11146 · v1 · pith:UQ74TTK7new · submitted 2019-07-25 · 💻 cs.HC

What's in an accent? The impact of accented synthetic speech on lexical choice in human-machine dialogue

Pith reviewed 2026-05-24 16:01 UTC · model grok-4.3

classification 💻 cs.HC
keywords accented speech · lexical choice · partner models · human-machine dialogue · synthetic speech · referential communication · voice design

The pith

Accented synthetic speech leads people to choose vocabulary that matches the accent's region, just as with human partners.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether accent in a partner's voice shapes the words a user selects during a referential task. Participants described objects to either a human or a machine, and the partner's accent was either US or Irish. Those with a US-accented partner used American English terms more often than those with an Irish-accented partner, and the pattern held for both human and synthetic voices. This suggests that voice design influences users' assumptions about what the partner knows about words. The result implies that synthesis choices are not neutral but actively guide how people speak to machines.

Core claim

In the referential communication experiment, participants were more likely to use American English lexical terms when the dialogue partner had a US accent than when the partner had an Irish accent. The same accent-based shift in lexical choice appeared in both the human-partner condition and the machine-partner condition that used synthetic speech. The authors conclude that accent cues shape partner models of lexical knowledge, which then direct users' word choices.

What carries the argument

Partner models: the assumptions users form about a dialogue partner's knowledge of words, here triggered by the accent heard in the partner's speech.

If this is right

  • Users adapt their lexical choices to match the accent region of both human and machine partners.
  • Accent in synthetic speech can signal differences in a machine's presumed lexical knowledge.
  • Design decisions in voice synthesis influence the partner models that guide user language production.
  • The effect of accent on lexical choice operates similarly whether the partner is human or machine.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Interface designers could select specific accents to encourage users to adopt certain terms in domain-specific applications such as travel or technical support.
  • Other auditory features like speaking rate or dialect might produce parallel shifts in user vocabulary if they also alter perceived partner knowledge.
  • In systems serving multiple regions, accent choice might improve immediate comprehension but could limit exposure to varied vocabulary over time.

Load-bearing premise

The accent manipulation mainly changes what users think the partner knows about vocabulary, without other uncontrolled differences in how competent or natural the partner sounds.

What would settle it

Running the same referential task and finding no difference in the rate of American English term use between US-accent and Irish-accent conditions would falsify the claim.
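
As a rough illustration of the comparison this test relies on, the sketch below tallies the proportion of American English terms per accent condition. The trial records, the field names, and the function `american_rate` are all hypothetical; none of the values come from the paper, and this is not the authors' analysis.

```python
from collections import defaultdict

# Hypothetical trial records (NOT data from the paper): each trial notes the
# partner's accent and whether the participant used the American English term.
trials = [
    {"accent": "US", "american_term": True},
    {"accent": "US", "american_term": True},
    {"accent": "US", "american_term": False},
    {"accent": "Irish", "american_term": True},
    {"accent": "Irish", "american_term": False},
    {"accent": "Irish", "american_term": False},
]


def american_rate(records):
    """Return the proportion of American English terms per accent condition."""
    counts = defaultdict(lambda: [0, 0])  # accent -> [american uses, total trials]
    for record in records:
        counts[record["accent"]][0] += int(record["american_term"])
        counts[record["accent"]][1] += 1
    return {accent: used / total for accent, (used, total) in counts.items()}


rates = american_rate(trials)
# Difference in American-term rate between the two accent conditions.
difference = rates["US"] - rates["Irish"]
```

A difference close to zero in a real sample would be the null pattern described above. A real analysis would also need an inferential test that accounts for repeated observations per participant and per item, which this sketch leaves out.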

Figures

Figures reproduced from arXiv: 1907.11146 by Ali Hayes-Brady, Benjamin R. Cowan, Diego Garaialde, Holly P. Branigan, João Cabral, Justin Edwards, Leigh Clark, Philip Doyle.

Figure 1: Example screenshots of matching turn (a-Top) and …
Figure 2: Total percentage of American-English names used …

Original abstract

The assumptions we make about a dialogue partner's knowledge and communicative ability (i.e. our partner models) can influence our language choices. Although similar processes may operate in human-machine dialogue, the role of design in shaping these models, and their subsequent effects on interaction are not clearly understood. Focusing on synthesis design, we conduct a referential communication experiment to identify the impact of accented speech on lexical choice. In particular, we focus on whether accented speech may encourage the use of lexical alternatives that are relevant to a partner's accent, and how this is may vary when in dialogue with a human or machine. We find that people are more likely to use American English terms when speaking with a US accented partner than an Irish accented partner in both human and machine conditions. This lends support to the proposal that synthesis design can influence partner perception of lexical knowledge, which in turn guide user's lexical choices. We discuss the findings with relation to the nature and dynamics of partner models in human machine dialogue.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript reports results from a referential communication experiment on the impact of accented speech (US vs. Irish) on lexical choice during dialogue with human or machine partners. The central claim is that participants are more likely to produce American English lexical terms with a US-accented partner than an Irish-accented partner in both conditions; this is taken as evidence that synthesis design shapes partner models of lexical knowledge, which in turn guide users' word choices.

Significance. If the result is robust, the work would add empirical support for partner-model accounts in human-machine interaction by isolating accent as a design factor that modulates lexical adaptation. It has applied relevance for voice interface design. The study is grounded in observed behavioral data rather than derivations or fitted parameters, which is a methodological strength.

major comments (2)
  1. [Methods] Methods: The accent manipulation in the synthetic-speech conditions is not accompanied by manipulation checks or matched ratings on perceived naturalness, intelligibility, or overall competence. This is load-bearing for the central claim because the partner-model interpretation (accent signals lexical knowledge) requires that the US/Irish contrast does not also differ on these global dimensions; without such controls the alternative explanation that participants adapt to perceived competence cannot be ruled out.
  2. [Results] Results (and Abstract): The directional finding is asserted without reported participant numbers, statistical tests, effect sizes, or details on how lexical alternatives were coded and classified. These elements are required to evaluate whether the observed behavior supports the stated claim.
minor comments (1)
  1. [Abstract] Abstract: 'how this is may vary' contains a typographical error and should read 'how this may vary'.
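
To make the second major comment concrete, here is a minimal sketch of what a coding scheme for lexical alternatives could look like. The item keys, the word pairs, and the function `code_response` are hypothetical illustrations chosen for this example; they are not the paper's stimuli or its coding procedure.

```python
import re

# Hypothetical lexicon (NOT the paper's item list): each pictured object maps
# to an American English name and an alternative commonly used in Ireland.
LEXICON = {
    "pavement_image": {"american": "sidewalk", "irish": "footpath"},
    "baby_item_image": {"american": "diaper", "irish": "nappy"},
}


def code_response(item, response):
    """Classify a transcribed response as 'american', 'irish', or 'other'."""
    # Lowercase and keep alphabetic tokens only, so punctuation is ignored.
    words = re.findall(r"[a-z]+", response.lower())
    names = LEXICON[item]
    for label in ("american", "irish"):
        if names[label] in words:
            return label
    return "other"
```

Reporting a scheme of this kind, along with how responses falling into the "other" category were handled, is the sort of detail the referee asks for.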

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below, indicating planned revisions where appropriate.

  1. Referee: [Methods] Methods: The accent manipulation in the synthetic-speech conditions is not accompanied by manipulation checks or matched ratings on perceived naturalness, intelligibility, or overall competence. This is load-bearing for the central claim because the partner-model interpretation (accent signals lexical knowledge) requires that the US/Irish contrast does not also differ on these global dimensions; without such controls the alternative explanation that participants adapt to perceived competence cannot be ruled out.

    Authors: We agree this is a substantive concern for isolating the accent-specific mechanism. The study used standard, commercially available TTS voices selected for comparable production quality, and the parallel lexical alignment effect observed with natural human voices (where competence is not in question) provides convergent support for the partner-model account. However, we did not collect participant ratings on naturalness or competence. In the revision we will add an explicit limitations paragraph discussing this alternative explanation and its implications for interpretation.

    Revision: partial

  2. Referee: [Results] Results (and Abstract): The directional finding is asserted without reported participant numbers, statistical tests, effect sizes, or details on how lexical alternatives were coded and classified. These elements are required to evaluate whether the observed behavior supports the stated claim.

    Authors: Participant numbers, statistical tests, effect sizes, and the lexical coding scheme are all reported in the Methods and Results sections of the full manuscript. The abstract presents only the directional summary due to length constraints. We will revise the abstract to include the key quantitative details (sample size, test statistics, and effect size) so that the evidential basis is immediately visible.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical experiment with behavioral data only

The paper reports results from a referential communication experiment measuring participants' lexical choices in response to accented speech conditions (human and synthetic). No equations, fitted parameters, predictions derived from inputs, or derivation chains are present. The central claim rests on observed behavioral data rather than any self-referential construction, self-citation load-bearing premise, or renamed known result. Self-citations to prior partner-model literature are standard and do not reduce the empirical finding to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a behavioral experiment reporting an observed effect. No free parameters, mathematical axioms, or new invented entities are introduced or required by the central claim.
