What's in an accent? The impact of accented synthetic speech on lexical choice in human-machine dialogue
Pith reviewed 2026-05-24 16:01 UTC · model grok-4.3
The pith
Accented synthetic speech leads people to choose vocabulary that matches the accent's region, just as with human partners.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In the referential communication experiment, participants were more likely to use American English lexical terms when the dialogue partner had a US accent than when the partner had an Irish accent. The same accent-based shift in lexical choice appeared in both the human-partner condition and the machine-partner condition that used synthetic speech. The authors conclude that accent cues shape partner models of lexical knowledge, which then direct users' word choices.
What carries the argument
Partner models, the assumptions users form about a dialogue partner's knowledge of words, triggered by the accent heard in the partner's speech.
If this is right
- Users adapt their lexical choices to match the accent region of both human and machine partners.
- Accent in synthetic speech can signal differences in a machine's presumed lexical knowledge.
- Design decisions in voice synthesis influence the partner models that guide user language production.
- The effect of accent on lexical choice operates similarly whether the partner is human or machine.
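The last point is a claim about an interaction: the accent effect should be about the same size with a human partner as with a machine partner. One common way to express this compares the two conditions directly. It is an assumption here, not necessarily the authors' analysis.

- Compute the US-vs-Irish accent effect as a log odds ratio within each partner condition.
- Compare the two log odds ratios.

The sketch below works on aggregated counts of American English term use. It adds a 0.5 Haldane-Anscombe correction to every cell and uses Woolf standard errors. It ignores clustering by participant and item, which a mixed-effects model of the kind the reference list points to (lme4; Barr et al.) would handle. The function names are hypothetical.

```python
import math


def log_odds_ratio(k1: int, n1: int, k2: int, n2: int, correction: float = 0.5) -> tuple[float, float]:
    """Log odds ratio of American-term use in group 1 vs. group 2, plus its
    Woolf standard error. `correction` is added to every cell
    (Haldane-Anscombe) so that zero counts do not produce infinite values."""
    a, b = k1 + correction, (n1 - k1) + correction  # group 1: American, other
    c, d = k2 + correction, (n2 - k2) + correction  # group 2: American, other
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se


def accent_effect_difference(human_us, human_ie, machine_us, machine_ie):
    """Each argument is a (k, n) pair: k American-term responses out of n trials.

    Returns (machine accent effect - human accent effect, z), where each
    accent effect is the US-vs-Irish log odds ratio within that partner
    condition. A z near zero is compatible with similar effects but does not
    by itself show that they are equal.
    """
    lor_h, se_h = log_odds_ratio(*human_us, *human_ie)
    lor_m, se_m = log_odds_ratio(*machine_us, *machine_ie)
    diff = lor_m - lor_h
    z = diff / math.sqrt(se_h ** 2 + se_m ** 2)
    return diff, z


# Toy counts (hypothetical, not from the paper): identical accent effect in both conditions.
print(accent_effect_difference((45, 60), (30, 60), (45, 60), (30, 60)))  # (0.0, 0.0)
```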
Where Pith is reading between the lines
- Interface designers could select specific accents to encourage users to adopt certain terms in domain-specific applications such as travel or technical support.
- Other auditory features like speaking rate or dialect might produce parallel shifts in user vocabulary if they also alter perceived partner knowledge.
- In systems serving multiple regions, accent choice might improve immediate comprehension but could limit exposure to varied vocabulary over time.
Load-bearing premise
The accent manipulation mainly changes what users think the partner knows about vocabulary, without other uncontrolled differences in how competent or natural the partner sounds.
What would settle it
Running the same referential task and finding no difference in the rate of American English term use between US-accent and Irish-accent conditions would falsify the claim.
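This criterion is stated as a difference in the rate of American English term use between the two accent conditions. A minimal sketch, assuming responses have already been coded as American or not and pooled across participants: the two rates can be compared with a pooled two-proportion z-test.

This is a simplification. The reference list includes lme4 and work on maximal random-effects structures, which suggests the authors' own analysis was a mixed-effects model; this sketch does not reproduce it. The function name and arguments are hypothetical.

```python
import math


def two_proportion_z(k_us: int, n_us: int, k_ie: int, n_ie: int) -> tuple[float, float, float]:
    """Pooled two-proportion z-test on the rate of American English terms.

    k_us / n_us: American-term responses / trials with the US-accented partner.
    k_ie / n_ie: the same with the Irish-accented partner.
    Returns (rate difference, z statistic, two-sided p-value). Treats every
    trial as independent, so it ignores participant and item clustering.
    """
    if n_us <= 0 or n_ie <= 0:
        raise ValueError("each condition needs at least one trial")
    p_us, p_ie = k_us / n_us, k_ie / n_ie
    pooled = (k_us + k_ie) / (n_us + n_ie)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_us + 1 / n_ie))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the z-test is undefined")
    z = (p_us - p_ie) / se
    # Two-sided p-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_us - p_ie, z, p_value
```

With toy counts that are purely illustrative and not the paper's data, `two_proportion_z(45, 60, 30, 60)` gives a rate difference of 0.25, z ≈ 2.83 and p ≈ 0.0047.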
Original abstract
The assumptions we make about a dialogue partner's knowledge and communicative ability (i.e. our partner models) can influence our language choices. Although similar processes may operate in human-machine dialogue, the role of design in shaping these models, and their subsequent effects on interaction are not clearly understood. Focusing on synthesis design, we conduct a referential communication experiment to identify the impact of accented speech on lexical choice. In particular, we focus on whether accented speech may encourage the use of lexical alternatives that are relevant to a partner's accent, and how this is may vary when in dialogue with a human or machine. We find that people are more likely to use American English terms when speaking with a US accented partner than an Irish accented partner in both human and machine conditions. This lends support to the proposal that synthesis design can influence partner perception of lexical knowledge, which in turn guide user's lexical choices. We discuss the findings with relation to the nature and dynamics of partner models in human machine dialogue.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports results from a referential communication experiment on the impact of accented speech (US vs. Irish) on lexical choice during dialogue with human or machine partners. The central claim is that participants are more likely to produce American English lexical terms with a US-accented partner than an Irish-accented partner in both conditions; this is taken as evidence that synthesis design shapes partner models of lexical knowledge, which in turn guide users' word choices.
Significance. If the result is robust, the work would add empirical support for partner-model accounts in human-machine interaction by isolating accent as a design factor that modulates lexical adaptation. It has applied relevance for voice interface design. The study is grounded in observed behavioral data rather than derivations or fitted parameters, which is a methodological strength.
major comments (2)
- [Methods] Methods: The accent manipulation in the synthetic-speech conditions is not accompanied by manipulation checks or matched ratings on perceived naturalness, intelligibility, or overall competence. This is load-bearing for the central claim because the partner-model interpretation (accent signals lexical knowledge) requires that the US/Irish contrast does not also differ on these global dimensions; without such controls the alternative explanation that participants adapt to perceived competence cannot be ruled out.
- [Results] Results (and Abstract): The directional finding is asserted without reported participant numbers, statistical tests, effect sizes, or details on how lexical alternatives were coded and classified. These elements are required to evaluate whether the observed behavior supports the stated claim.
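The Methods comment asks for matched ratings of the two synthetic voices on naturalness, intelligibility, or competence. As the authors' response notes, no such ratings were collected.

As a sketch of what such a check could compute, assume each voice is rated on a numeric scale by an independent group of participants. Welch's t statistic and Welch-Satterthwaite degrees of freedom then compare mean ratings without assuming equal variances. A small t does not show that the voices are matched; that would need an equivalence test, such as two one-sided tests (TOST) against a pre-set margin. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(ratings_us: list[float], ratings_ie: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for
    mean ratings of two voices given by independent groups of raters."""
    n1, n2 = len(ratings_us), len(ratings_ie)
    if n1 < 2 or n2 < 2:
        raise ValueError("each voice needs at least two ratings")
    # Sample variances (n - 1 denominator), each divided by its group size.
    v1 = variance(ratings_us) / n1
    v2 = variance(ratings_ie) / n2
    if v1 + v2 == 0:
        raise ValueError("both groups have zero variance; t is undefined")
    t = (mean(ratings_us) - mean(ratings_ie)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df
```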
minor comments (1)
- [Abstract] Abstract: 'how this is may vary' contains a typographical error and should read 'how this may vary'.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on our manuscript. We address each major comment below, indicating planned revisions where appropriate.
Point-by-point responses
- Referee: [Methods] Methods: The accent manipulation in the synthetic-speech conditions is not accompanied by manipulation checks or matched ratings on perceived naturalness, intelligibility, or overall competence. This is load-bearing for the central claim because the partner-model interpretation (accent signals lexical knowledge) requires that the US/Irish contrast does not also differ on these global dimensions; without such controls the alternative explanation that participants adapt to perceived competence cannot be ruled out.
Authors: We agree this is a substantive concern for isolating the accent-specific mechanism. The study used standard, commercially available TTS voices selected for comparable production quality, and the parallel lexical alignment effect observed with natural human voices (where competence is not in question) provides convergent support for the partner-model account. However, we did not collect participant ratings on naturalness or competence. In the revision we will add an explicit limitations paragraph discussing this alternative explanation and its implications for interpretation. revision: partial
- Referee: [Results] Results (and Abstract): The directional finding is asserted without reported participant numbers, statistical tests, effect sizes, or details on how lexical alternatives were coded and classified. These elements are required to evaluate whether the observed behavior supports the stated claim.
Authors: Participant numbers, statistical tests, effect sizes, and the lexical coding scheme are all reported in the Methods and Results sections of the full manuscript. The abstract presents only the directional summary due to length constraints. We will revise the abstract to include the key quantitative details (sample size, test statistics, and effect size) so that the evidential basis is immediately visible. revision: yes
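The lexical coding scheme is said to be reported in the full manuscript and is not reproduced on this page. Purely to illustrate the kind of procedure involved, the sketch below does two things:

- It codes each naming response as the American English term, a non-American alternative, or neither.
- It reports, for each accent × partner cell, the proportion of American terms among codable responses.

All of the following are hypothetical assumptions, not the paper's materials:

- the item lexicon and the word pairs in it;
- the (accent, partner, item, utterance) layout of a trial;
- the rule that a response naming both terms counts as American.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical lexicon: item id -> (American English term, non-American alternatives).
LEXICON = {
    "lift": ("elevator", ("lift",)),
    "jumper": ("sweater", ("jumper",)),
    "bin": ("trash can", ("bin", "rubbish bin")),
}


def _mentions(term: str, text: str) -> bool:
    # Whole-word match, so "bin" does not match "cabinet".
    return re.search(rf"\b{re.escape(term)}\b", text) is not None


def code_response(item: str, utterance: str) -> str:
    """Code one response as 'american', 'alternative', or 'other'."""
    american, alternatives = LEXICON[item]
    text = utterance.lower()
    if _mentions(american, text):  # assumption: the American term wins if both appear
        return "american"
    if any(_mentions(alt, text) for alt in alternatives):
        return "alternative"
    return "other"


def american_rate_by_cell(trials):
    """trials: iterable of (accent, partner, item, utterance) tuples.
    Returns {(accent, partner): proportion of American terms among codable
    responses}, or NaN for a cell with no codable responses."""
    counts = defaultdict(Counter)
    for accent, partner, item, utterance in trials:
        counts[(accent, partner)][code_response(item, utterance)] += 1
    rates = {}
    for cell, tally in counts.items():
        codable = tally["american"] + tally["alternative"]
        rates[cell] = tally["american"] / codable if codable else math.nan
    return rates
```

Responses coded "other" (for example, descriptions that use neither term) are excluded from the denominator. Whether to exclude or count them is a design choice that changes the reported rates.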
Circularity Check
No circularity: empirical experiment with behavioral data only
The paper reports results from a referential communication experiment measuring participants' lexical choices in response to accented speech conditions (human and synthetic). No equations, fitted parameters, predictions derived from inputs, or derivation chains are present. The central claim rests on observed behavioral data rather than any self-referential construction, self-citation load-bearing premise, or renamed known result. Self-citations to prior partner-model literature are standard and do not reduce the empirical finding to its own inputs.
Reference graph
Works this paper leans on
- [1] René Amalberti, Noëlle Carbonell, and Pierre Falzon. 1993. User representations of computer systems in human-computer speech interaction. International Journal of Man-Machine Studies 38, 4 (1993), 547–566.
- [2] Dale J Barr, Roger Levy, Christoph Scheepers, and Harry J Tily. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of memory and language 68, 3 (2013), 255–278.
- [3] Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. 2015. Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67, 1 (2015), 1–48. https://doi.org/10.18637/jss.v067.i01
- [4] Allan Bell. 1984. Language style as audience design. Language in society 13, 2 (1984), 145–204.
- [5] Linda Bell and Joakim Gustafson. 1999. Interaction with an animated agent in a spoken dialogue system. In Sixth European Conference on Speech Communication and Technology.
- [6] Ludovic Le Bigot, Patrice Terrier, Virginie Amiel, Gérard Poulain, Eric Jamet, and Jean-François Rouet. 2007. Effect of modality on collaboration with a dialogue system. International Journal of Human-Computer Studies 65, 12 (2007), 983–991. https://doi.org/10.1016/j.ijhcs.2007.07.002
- [7] Holly P Branigan, Martin J Pickering, Jamie Pearson, Janet F McLean, and Ash Brown. 2011. The role of beliefs in lexical alignment: Evidence from dialogs with humans and computers. Cognition 121, 1 (2011), 41–57.
- [8] Susan E Brennan. 1998. The grounding problem in conversations with and through computers. Social and cognitive approaches to interpersonal communication (1998), 201–225.
- [9] Susan E Brennan, Alexia Galati, and Anna K Kuhlen. 2010. Two minds, one dialog: Coordinating speaking and understanding. In Psychology of learning and motivation. Vol. 53. Elsevier, 301–344.
- [10] Rainer Bromme, Riklef Rambow, and Matthias Nückles. 2001. Expertise and estimating what other people know: The influence of professional experience and type of knowledge. Journal of experimental psychology: Applied 7, 4 (2001), 317.
- [11] Herbert H Clark. 1973. The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of verbal learning and verbal behavior 12, 4 (1973), 335–359.
- [12] Herbert H Clark. 1996. Using language. Cambridge University Press.
- [13]
- [14] Leigh Clark, Nadia Pantidi, Orla Cooney, Philip Doyle, Diego Garaialde, Justin Edwards, Brendan Spillane, Christine Murad, Cosmin Munteanu, Vincent Wade, et al. 2019. What Makes a Good Conversation? Challenges in Designing Truly Conversational Agents. arXiv preprint arXiv:1901.06525 (2019).
- [15] Benjamin R Cowan and Holly P Branigan. 2015. Does voice anthropomorphism affect lexical alignment in speech-based human-computer dialogue? In Sixteenth Annual Conference of the International Speech Communication Association.
- [16] Benjamin R Cowan, Holly P Branigan, Mateo Obregón, Enas Bugis, and Russell Beale. 2015. Voice anthropomorphism, interlocutor modelling and alignment effects on syntactic choices in human-computer dialogue. International Journal of Human-Computer Studies 83 (2015), 27–42.
- [17] Benjamin R Cowan, Nadia Pantidi, David Coyle, Kellie Morrissey, Peter Clarke, Sara Al-Shehri, David Earley, and Natasha Bandeira. 2017. What can i help you with?: infrequent users’ experiences of intelligent personal assistants. In Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services. ACM, 43.
- [18] Nils Dahlbäck, Seema Swamy, Clifford Nass, Fredrik Arvidsson, and Jörgen Skågeby. 2001. Spoken Interaction with Computers in a Native or Non-native Language-Same or Different. In Proceedings of INTERACT. 294–301.
- [19] Nils Dahlbäck, QianYing Wang, Clifford Nass, and Jenny Alwin. 2007. Similarity is more important than expertise: Accent effects in speech interfaces. In Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 1553–1556.
- [20] Rick Dale, Alexia Galati, Camila Alviar, Pablo Contreras Kallens, Adolfo G Ramirez-Aristizabal, Maryam Tabatabaeian, and David W Vinson. 2018. Interacting timescales in perspective-taking. Frontiers in psychology 9 (2018). CUI 2019, August 22–23, 2019, Dublin, Ireland B. R. Cowan et al
- [21] Nicholas Duran, Rick Dale, and Alexia Galati. 2016. Toward integrative dynamic models for adaptive perspective taking. Topics in cognitive science 8, 4 (2016), 761–779.
- [22] Jens Edlund, Joakim Gustafson, Mattias Heldner, and Anna Hjalmarsson. 2008. Towards human-like spoken dialogue systems. Speech Communication 50, 8 (2008), 630–645. https://doi.org/10.1016/j.specom.2008.04.002 Evaluating new methods and models for advanced speech-based interactive systems
- [23] Susan R Fussell and Robert M Krauss. 1989. Understanding friends and strangers: The effects of audience design on message comprehension. European Journal of Social Psychology 19, 6 (1989), 509–525.
- [24] Susan R Fussell and Robert M Krauss. 1992. Coordination of knowledge in communication: Effects of speakers’ assumptions about what others know. Journal of Personality and Social Psychology 62, 3 (1992), 378–391. https://doi.org/10.1037/0022-3514.62.3.378
- [25] Ayako Ikeno and John HL Hansen. 2007. The Effect of Listener Accent Background on Accent Perception and Comprehension. EURASIP Journal on Audio, Speech, and Music Processing 2007, 1 (15 Nov 2007), 076030. https://doi.org/10.1155/2007/76030
- [26] Alan Kennedy, Alan Wilkes, Leona Elder, and Wayne S. Murray. 1988. Dialogue with machines. Cognition 30, 1 (1988), 37–72. https://doi.org/10.1016/0010-0277(88)90003-0
- [27] Sara Kiesler, Aaron Powers, Susan R Fussell, and Cristen Torrey. 2008. Anthropomorphic interactions with a robot and robot-like agent. Social Cognition 26, 2 (2008), 169–181.
- [28] Lucian Leahu, Marisa Cohn, and Wendy March. 2013. How categories come to matter. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 3331–3334.
- [29] Sau-lai Lee, Ivy Yee-man Lau, Sara Kiesler, and Chi-Yue Chiu. 2005. Human mental models of humanoid robots. In Proceedings of the 2005 IEEE international conference on robotics and automation. IEEE, 2767–2772.
- [30] Ewa Luger and Abigail Sellen. 2016. Like having a really bad PA: the gulf between user expectation and experience of conversational agents. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, 5286–5297.
- [31] Elizabeth J Meddeb and Patricia Frenz-Belkin. 2010. What? I Didn’t Say THAT!: Linguistic strategies when speaking to write. Journal of Pragmatics 42, 9 (2010), 2415–2429.
- [32] Roger K Moore. 2017. Is spoken language all-or-nothing? Implications for future speech-based human-machine interaction. In Dialogues with Social Robots. Springer, 281–291.
- [33] Raymond S Nickerson. 1999. How we know—and sometimes misjudge—what others know: Imputing one’s own knowledge to others. Psychological bulletin 125, 6 (1999), 737.
- [34] Sharon Oviatt, Jon Bernard, and Gina-Anne Levow. 1998. Linguistic adaptations during spoken and multimodal error resolution. Language and speech 41, 3-4 (1998), 419–442.
- [35] Jonathan Peirce and Michael MacAskill. 2018. Building experiments in PsychoPy. SAGE, Los Angeles London New Delhi Singapore Washington DC Melbourne. OCLC: 1042943960.
- [36] R Core Team. 2018. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
- [37] Ellen Bouchard Ryan and Howard Giles. 1982. An Integrative Perspective for the Study of Attitudes Towards Language Variation. In Attitudes towards language variation: Social and applied contexts. Edward Arnold, London, 1–19.
- [38] Henrik Singmann and David Kellen. 2017. An introduction to mixed models for experimental psychology. In New Methods in Neuroscience and Cognitive Psychology. Psychology Press, Hove.
- [39] Joan G Snodgrass and Mary Vanderwart. 1980. A standardized set of 260 pictures: norms for name agreement, image agreement, familiarity, and visual complexity. Journal of experimental psychology: Human learning and memory 6, 2 (1980), 174.