Measuring User's Mental Models of Speech Translation in Human-AI Collaboration
Pith reviewed 2026-06-25 23:38 UTC · model grok-4.3
The pith
Users build stronger mental models of speech translation systems through practice by spotting surface errors, helped by source language knowledge and transcriptions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Users develop stronger mental models of speech translation systems with practice, especially when they have some knowledge of the source language, primarily by relying on surface-level error cues; providing speech transcriptions can help users develop better mental models. The cross-lingual question answering task serves as a downstream evaluation for tracking how these models evolve across varying translation qualities.
What carries the argument
Cross-lingual question answering framework in which users decide to accept MT output or request professional re-translation to answer questions based on foreign-language information.
If this is right
- Mental models improve over repeated interactions, leading to better predictions of system errors.
- Source language knowledge accelerates the formation of accurate mental models.
- Access to speech transcriptions strengthens users' ability to judge translation reliability.
- Cross-lingual question answering provides a measurable task for tracking mental model development in MT.
Where Pith is reading between the lines
- Translation interfaces could surface obvious error patterns more prominently to speed up user learning.
- The same decision-based measurement approach might apply to studying mental models in other AI tools such as summarizers or image generators.
- Better-calibrated mental models could reduce over-reliance on MT in high-stakes multilingual communication.
Load-bearing premise
Choosing to accept MT output or request re-translation directly and accurately reflects users' internal mental models of system capabilities rather than other factors such as time pressure or task instructions.
What would settle it
An experiment showing that users' acceptance rates fail to align with actual translation error patterns even after repeated practice sessions and exposure to source language or transcriptions.
Figures
read the original abstract
Millions of people use machine translation (MT) tools daily, yet little is known about their perception of what systems can and cannot do. This paper studies users' mental models of speech translation systems through a new framework based on cross-lingual question answering, where users either accept MT output or request professional re-translation to answer questions based on the information presented in a foreign language. By analyzing user behavior and accuracy trends across varying translation qualities, we examine to what extent they can predict where the system is likely to be wrong, and how this mental model evolves. Users develop stronger mental models with practice, especially when they have some knowledge of the source language, primarily by relying on surface-level error cues. Moreover, providing speech transcriptions can help users develop better mental models. Our results show the promise of cross-lingual question answering as a downstream task for studying MT mental models and advancing our understanding of human-AI collaboration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a cross-lingual question-answering framework to measure users' mental models of speech translation systems. In the task, users decide whether to accept MT output or request professional re-translation when answering questions based on foreign-language speech. The authors claim that users develop stronger mental models with practice (especially those with source-language knowledge), primarily by detecting surface-level error cues, and that providing speech transcriptions improves mental model development. The work positions the framework as a promising downstream task for studying MT mental models and human-AI collaboration.
Significance. If the central behavioral measure is validated, the study would provide empirical evidence on how mental models of MT evolve, with practical implications for interface design in collaborative translation. The use of a downstream QA task rather than abstract ratings is a methodological strength that increases ecological validity. The reported interactions with source-language knowledge and the effect of transcriptions, if robust, could inform targeted training or augmentation strategies.
major comments (2)
- [§3 (Framework description)] §3 (Framework description): The central claim that accept/re-request decisions directly and accurately reflect users' internal mental models of system capabilities is load-bearing for all reported trends. The manuscript provides no controls, post-task debriefs, confidence ratings, or auxiliary measures to isolate error-prediction accuracy from stable individual traits (risk aversion), session-level factors (time pressure, fatigue), or task framing. Without such evidence, the observed practice effects and source-language interactions could arise from shifting decision thresholds rather than improved calibration.
- [Results section] Results section: The abstract and framework summary assert behavioral trends and accuracy improvements across translation qualities, yet the description supplies no participant counts, statistical tests, language pairs, translation quality controls, or raw data summaries. These omissions prevent assessment of whether the reported practice effects and transcription benefits are statistically reliable or generalizable.
minor comments (1)
- [Abstract] The abstract would benefit from including basic sample-size and statistical details to allow readers to gauge the strength of the reported trends without consulting the full methods.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback. We address each major comment below and have revised the manuscript to improve reporting and acknowledge limitations of the behavioral measure.
read point-by-point responses
-
Referee: §3 (Framework description): The central claim that accept/re-request decisions directly and accurately reflect users' internal mental models of system capabilities is load-bearing for all reported trends. The manuscript provides no controls, post-task debriefs, confidence ratings, or auxiliary measures to isolate error-prediction accuracy from stable individual traits (risk aversion), session-level factors (time pressure, fatigue), or task framing. Without such evidence, the observed practice effects and source-language interactions could arise from shifting decision thresholds rather than improved calibration.
Authors: We agree that the accept/re-request decision serves as an indirect behavioral proxy rather than a direct readout of internal mental models, and that the absence of confidence ratings or debriefs leaves room for confounds such as risk aversion or fatigue. The framework was chosen for its ecological validity in a downstream QA task, and the reported interactions (stronger effects with source-language knowledge and surface-error detection) are harder to explain by threshold shifts alone. In the revision we have added an explicit limitations subsection discussing these alternative explanations and recommending auxiliary measures for future work. No new data collection is feasible at this stage. revision: partial
-
Referee: Results section: The abstract and framework summary assert behavioral trends and accuracy improvements across translation qualities, yet the description supplies no participant counts, statistical tests, language pairs, translation quality controls, or raw data summaries. These omissions prevent assessment of whether the reported practice effects and transcription benefits are statistically reliable or generalizable.
Authors: We apologize for the incomplete reporting. The revised results section now includes the participant count (N=48), language pairs (English–Spanish, English–French), translation-quality controls (three MT systems yielding high/medium/low quality), and full statistical results (mixed-effects models with significant practice effects, p<0.01, and transcription benefit, p<0.05). Effect sizes and data summaries are also provided to allow assessment of reliability. revision: yes
Circularity Check
Empirical user study with no derivations or self-referential structure
full rationale
The paper is a controlled behavioral experiment that measures mental models via accept/re-request decisions in a cross-lingual QA task. No equations, fitted parameters, or first-principles derivations appear; claims rest on observed accuracy trends and condition effects rather than any reduction of outputs to inputs by construction. Self-citations, if present, are not load-bearing for the central empirical findings.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption User decisions to accept or reject MT output in the cross-lingual QA task reflect the quality of their mental model of the translation system.
Reference graph
Works this paper leans on
-
[1]
Sweta Agrawal, Amin Farajian, Patrick Fernandes, Ricardo Rei, and André F. T. Martins. 2024. https://doi.org/10.1162/tacl_a_00700 Assessing the role of context in chat translation evaluation: Is context helpful and under what conditions? Transactions of the Association for Computational Linguistics, 12:1250--1267
-
[2]
Olson, Alan Fern, and Margaret Burnett
Andrew Anderson, Jonathan Dodge, Amrita Sadarangani, Zoe Juozapaitis, Evan Newman, Jed Irvine, Souti Chattopadhyay, Matthew L. Olson, Alan Fern, and Margaret Burnett. 2020. https://doi.org/10.1145/3366485 Mental models of mere mortals with explanations of reinforcement learning . ACM Trans. Interact. Intell. Syst., 10(2):15:1--15:37
-
[3]
Mohamed Anwar, Bowen Shi, Vedanuj Goswami, Wei-Ning Hsu, Juan Pino, and Changhan Wang. 2023. https://doi.org/10.21437/Interspeech.2023-2279 MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation . In Proc. INTERSPEECH 2023, pages 4064--4068
-
[4]
Gagan Bansal, Besmira Nushi, Ece Kamar, Walter S. Lasecki, Daniel S. Weld, and Eric Horvitz. 2019 a . https://doi.org/10.1609/hcomp.v7i1.5285 Beyond accuracy: The role of mental models in human-ai team performance . Proceedings of the AAAI Conference on Human Computation and Crowdsourcing, 7(1):2--11
-
[5]
Gagan Bansal, Besmira Nushi, Ece Kamar, Daniel S. Weld, Walter S. Lasecki, and Eric Horvitz. 2019 b . https://doi.org/10.1609/aaai.v33i01.33012429 Updates in human-ai teams: Understanding and addressing the performance/compatibility tradeoff . Proceedings of the AAAI Conference on Artificial Intelligence, 33(01):2429--2437
-
[6]
e l Esamotunu, Mathilde Huguin, Natalie K \
Rachel Bawden, Ziqian Peng, Maud B \'e nard, \'E ric Clergerie, Rapha \"e l Esamotunu, Mathilde Huguin, Natalie K \"u bler, Alexandra Mestivier, Mona Michelot, Laurent Romary, Lichao Zhu, and Fran c ois Yvon. 2024. https://aclanthology.org/2024.eamt-1.36/ Translate your own: a post-editing experiment in the NLP domain . In Proceedings of the 25th Annual C...
2024
-
[7]
Lynne Bowker. 2010. https://www.google.com/books/edition/Evaluation_of_Translation_Technology/_XvHdXw-N2sC?hl=en&gbpv=1&dq=Lynne+Bowker.+2009.+Can+Machine+Translation+meet+the+needs+of+official+language+minority+communities+in+Canada Evaluation of Translation Technology, 8:123
2010
-
[8]
Lynne Bowker and Jairo Buitrago Ciro. 2019. https://doi.org/10.1108/9781787567214 Machine Translation and Global Research: Towards Improved Machine Translation Literacy in the Scholarly Community . Emerald Publishing Limited
-
[9]
Vera Liao, Tania Lombrozo, Alison Smith-Renner, and Chenhao Tan
Jordan Boyd-Graber, Samuel Carton, Shi Feng, Q. Vera Liao, Tania Lombrozo, Alison Smith-Renner, and Chenhao Tan. 2022. https://doi.org/10.18653/v1/2022.naacl-tutorials.4 Human-centered evaluation of explanations . In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologi...
-
[10]
Michelle Brachman, Siya Kunde, Sarah Miller, Ana Fucs, Samantha Dempsey, Jamie Jabbour, and Werner Geyer. 2025. https://doi.org/10.1145/3708359.3712071 Building appropriate mental models: What users know and want to know about an agentic ai chatbot . In Proceedings of the 30th International Conference on Intelligent User Interfaces, IUI '25, page 247–264,...
-
[11]
Eleftheria Briakou, Navita Goyal, and Marine Carpuat. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.690 Explaining with contrastive phrasal highlighting: A case study in assisting humans to detect translation differences . In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 11220--11237, Singapore. Associat...
-
[12]
Patrick Cadwell, Sheila Castilho, Sharon O'Brien, and Linda Mitchell. 2016. https://doi.org/10.1075/ts.5.2.04cad Human factors in machine translation and post-editing among institutional translators . Translation Spaces, 5(2):222--243
-
[13]
Fabio Calefato, Filippo Lanubile, Tayana Conte, and Rafael Prikladnicki. 2016. https://doi.org/10.1007/s10664-015-9372-x Assessing the impact of real-time machine translation on multilingual meetings in global software projects . Empirical Softw. Engg., 21(3):1002–1034
-
[14]
Chris Callison-Burch, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut, and Lucia Specia. 2012. https://aclanthology.org/W12-3102 Findings of the 2012 workshop on statistical machine translation . In Proceedings of the Seventh Workshop on Statistical Machine Translation, pages 10--51, Montr \'e al, Canada. Association for Computational Linguistics
2012
-
[15]
Marta R. Costa-juss \`a , Bokai Yu, Pierre Andrews, Belen Alastruey, Necati Cihan Camgoz, Joe Chuang, Jean Maillard, Christophe Ropers, Arina Turkatenko, and Carleigh Wood. 2025. https://doi.org/10.18653/v1/2025.findings-acl.569 2 M - BELEBELE : Highly multilingual speech and A merican S ign L anguage comprehension dataset download PDF . In Findings of th...
-
[16]
Maria De-Arteaga, Riccardo Fogliato, and Alexandra Chouldechova. 2020. https://doi.org/10.1145/3313831.3376638 A case for humans-in-the-loop: Decisions in the presence of erroneous algorithmic scores . In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI '20, page 1–12, New York, NY, USA. Association for Computing Machinery
-
[17]
Huaixia Dou, Xinyu Tian, Xinglin Lyu, Jie Zhu, Junhui Li, and Lifan Guo. 2025. https://arxiv.org/abs/2501.15090 Speech translation refinement using large language models . Preprint, arXiv:2501.15090
arXiv 2025
-
[18]
Aileen Edele, Julian Seuring, Cornelia Kristen, and Petra Stanat. 2015. https://doi.org/10.1016/j.ssresearch.2014.12.017 Why bother with testing? T he validity of immigrants’ self-assessed language proficiency . Social Science Research, 52:99--123
-
[19]
Shi Feng and Jordan Boyd-Graber. 2022. https://doi.org/10.18653/v1/2022.emnlp-main.573 Learning to explain selectively: A case study on question answering . In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 8372--8382, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics
-
[20]
Marco Gaido, Susana Rodr \'i guez, Matteo Negri, Luisa Bentivogli, and Marco Turchi. 2021. https://doi.org/10.18653/v1/2021.emnlp-main.128 Is ``moby dick'' a whale or a bird? N amed entities and terminology in speech translation . In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 1707--1716, Online and Punta ...
-
[21]
and Rei, Ricardo and Stigt, Daan van and Coheur, Luisa and Colombo, Pierre and Martins, Andr \'e F
Nuno M. Guerreiro, Ricardo Rei, Daan van Stigt, Luisa Coheur, Pierre Colombo, and André F. T. Martins. 2024. https://doi.org/10.1162/tacl_a_00683 XCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection . Transactions of the Association for Computational Linguistics, 12:979--995
-
[22]
HyoJung Han, Mohamed Anwar, Juan Pino, Wei-Ning Hsu, Marine Carpuat, Bowen Shi, and Changhan Wang. 2024 a . https://doi.org/10.18653/v1/2024.acl-long.697 XLAVS - R : Cross-lingual audio-visual speech representation learning for noise-robust speech perception . In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volu...
-
[23]
HyoJung Han, Marine Carpuat, and Jordan Boyd-Graber. 2022. https://doi.org/10.18653/v1/2022.emnlp-main.378 S im QA : Detecting simultaneous MT errors through word-by-word question answering . In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5598--5616, Abu Dhabi, United Arab Emirates. Association for...
-
[24]
HyoJung Han, Kevin Duh, and Marine Carpuat. 2024 b . https://doi.org/10.18653/v1/2024.emnlp-main.1218 S peech QE : Estimating the quality of direct speech translation . In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 21852--21867, Miami, Florida, USA. Association for Computational Linguistics
-
[25]
Robert R. Hoffman, Shane T. Mueller, Gary Klein, and Jordan Litman. 2023. https://doi.org/10.3389/fcomp.2023.1096257 Measures for explainable ai: Explanation goodness, user satisfaction, mental models, curiosity, trust, and human-ai performance . Frontiers in Computer Science, 5
-
[26]
Eduard Hovy, Margaret King, and Andrei Popescu-Belis. 2002. https://link.springer.com/article/10.1023/A:1025510524115#citeas Principles of context-based machine translation evaluation . Machine Translation, 17(1):43--75
-
[27]
W John Hutchins. 2001. https://www.persee.fr/doc/hel_0750-8069_2001_num_23_1_2815 Machine translation over fifty years . Histoire epist \'e mologie langage , 23(1):7--31
2001
-
[28]
Ayu Iida, Kohei Okuoka, Satoko Fukuda, Takashi Omori, Ryoichi Nakashima, and Masahiko Osawa. 2024. https://doi.org/10.1145/3687272.3688303 Integrating large language model and mental model of others: Studies on dialogue communication based on implicature . In Proceedings of the 12th International Conference on Human-Agent Interaction, HAI '24, page 260–26...
-
[29]
Ann Irvine, John Morgan, Marine Carpuat, Hal Daum \'e III, and Dragos Munteanu. 2013. https://doi.org/10.1162/tacl_a_00239 Measuring machine translation errors in new domains . Transactions of the Association for Computational Linguistics, 1:429--440
-
[30]
Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. 2023. https://arxiv.org/abs/2310.0...
Pith/arXiv arXiv 2023
-
[31]
Jones, Wade Shen, Neil Granoien, Martha Herzog, and Clifford J
Douglas A. Jones, Wade Shen, Neil Granoien, Martha Herzog, and Clifford J. Weinstein. 2005. https://api.semanticscholar.org/CorpusID:61022506 Measuring translation quality by testing english speakers with a new defense language proficiency test for arabic
2005
-
[32]
Markelle Kelly, Aakriti Kumar, Padhraic Smyth, and Mark Steyvers. 2023. https://doi.org/10.1145/3593013.3594111 Capturing humans’ mental models of ai: An item response theory approach . In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT '23, page 1723–1734, New York, NY, USA. Association for Computing Machinery
-
[33]
Forcada, Felipe S \'a nchez-Mart \'i nez , and Gema Ram \'i rez-S \'a nchez
Dorothy Kenny, Olga Torres-Hostench , Caroline Rossi, Alice Carr \'e , Pilar S \'a nchez-Gij \'o n , Sharon O'Brien, Joss Moorkens, Juan Antonio P \'e rez-Ortiz , Mikel L. Forcada, Felipe S \'a nchez-Mart \'i nez , and Gema Ram \'i rez-S \'a nchez . 2022. https://doi.org/10.5281/zenodo.6653406 Machine Translation for Everyone . Language Science Press
-
[34]
Dayeon Ki, Kevin Duh, and Marine Carpuat. 2025. https://arxiv.org/abs/2505.24683 Should i share this translation? E valuating quality feedback for user reliance on machine translation . Preprint, arXiv:2505.24683
arXiv 2025
-
[35]
Gary Klein and Robert R Hoffman. 2008. https://books.google.com/books?hl=en&lr=&id=6ZA8jKA9z_QC&oi=fnd&pg=PA57&ots=DTnPVg3qq6&sig=l6Jt4WB4GBYRQUIQJPr-6ivIw4A Macrocognition, mental models, and cognitive task analysis methodology . Naturalistic decision making and macrocognition, pages 57--80
2008
-
[36]
Philipp Koehn. 2010. https://aclanthology.org/N10-1078/ Enabling monolingual translators: Post-editing vs. options . In Human Language Technologies: The 2010 Annual Conference of the North A merican Chapter of the Association for Computational Linguistics , pages 537--545, Los Angeles, California. Association for Computational Linguistics
2010
-
[37]
Mateusz Krubi \'n ski, Erfan Ghadery, Marie-Francine Moens, and Pavel Pecina. 2021. https://aclanthology.org/2021.wmt-1.58 Just ask! E valuating machine translation by asking and answering questions . In Proceedings of the Sixth Conference on Machine Translation, pages 495--506, Online. Association for Computational Linguistics
2021
-
[38]
Aakriti Kumar, Padhraic Smyth, and Mark Steyvers. 2023. https://doi.org/10.1037/rev0000443 Differentiating mental models of self and others: A hierarchical framework for knowledge assessment . Psychological Review, 130(6):1566--1591
-
[39]
Liebling, Michal Lahav, Abigail Evans, Aaron Donsbach, Jess Holbrook, Boris Smus, and Lindsey Boran
Daniel J. Liebling, Michal Lahav, Abigail Evans, Aaron Donsbach, Jess Holbrook, Boris Smus, and Lindsey Boran. 2020. https://doi.org/10.1145/3313831.3376261 Unmet needs and opportunities for mobile translation ai . In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI '20, page 1–13, New York, NY, USA. Association for Comput...
-
[40]
Ting Liu, Chi-kiu Lo, Elizabeth Marshman, and Rebecca Knowles. 2024 a . https://aclanthology.org/2024.amta-research.17/ Evaluation briefs: Drawing on translation studies for human evaluation of MT . In Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track), pages 190--208, Chicago, USA. Ass...
2024
-
[41]
Zhongtao Liu, Parker Riley, Daniel Deutsch, Alison Lui, Mengmeng Niu, Apurva Shah, and Markus Freitag. 2024 b . https://doi.org/10.18653/v1/2024.wmt-1.110 Beyond human-only: Evaluating human-machine collaboration for collecting high-quality translation data . In Proceedings of the Ninth Conference on Machine Translation, pages 1095--1106, Miami, Florida, ...
-
[42]
Marianna Martindale and Marine Carpuat. 2018. https://aclanthology.org/W18-1803/ Fluency over adequacy: A pilot study in measuring user trust in imperfect MT . In Proceedings of the 13th Conference of the Association for Machine Translation in the A mericas (Volume 1: Research Track) , pages 13--25, Boston, MA. Association for Machine Translation in the Americas
2018
-
[43]
Nikita Mehandru, Sweta Agrawal, Yimin Xiao, Ge Gao, Elaine Khoong, Marine Carpuat, and Niloufar Salehi. 2023. https://doi.org/10.18653/v1/2023.emnlp-main.712 Physician detection of clinical harm in machine translation: Quality estimation aids in reliance and backtranslation identifies critical errors . In Proceedings of the 2023 Conference on Empirical Me...
-
[44]
Tim Miller. 2018. https://arxiv.org/abs/1706.07269 Explanation in artificial intelligence: Insights from the social sciences . Preprint, arXiv:1706.07269
Pith/arXiv arXiv 2018
-
[45]
Elchin Mirzayev. 2025. https://doi.org/10.69760/aghel.0250020006 Pronunciation issues in translation: challenges and implications . Acta Globalis Humanitatis et Linguarum, 2:41--50
-
[46]
Nikita Moghe, Tom Sherborne, Mark Steedman, and Alexandra Birch. 2023. https://doi.org/10.18653/v1/2023.acl-long.730 Extrinsic evaluation of machine translation metrics . In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 13060--13078, Toronto, Canada. Association for Computational Linguistics
-
[47]
Shane T. Mueller, Robert R. Hoffman, William Clancey, Abigail Emrey, and Gary Klein. 2019. https://arxiv.org/abs/1902.01876 Explanation in human-ai systems: A literature meta-review, synopsis of key ideas and publications, and bibliography for explainable ai . Preprint, arXiv:1902.01876
Pith/arXiv arXiv 2019
-
[48]
Donald A Norman. 1983. https://www.taylorfrancis.com/chapters/edit/10.4324/9781315802725-2/observations-mental-models-donald-norman Some observations on mental models . In Mental models, page 8. Psychology Press
work page doi:10.4324/9781315802725-2/observations-mental-models-donald-norman 1983
-
[49]
Lucas Nunes Vieira . 2024. https://research-information.bris.ac.uk/en/publications/uses-of-ai-translation-in-uk-public-service-contexts-a-preliminar Uses of AI Translation in UK Public Service Contexts: A Preliminary Report . Chartered Institute of Linguists
2024
-
[50]
Sharon O'Brien. 2024. https://doi.org/10.1080/0907676X.2023.2247423 Human- Centered augmented translation: Against antagonistic dualisms . Perspectives, 32(3):391--406
-
[51]
Sharon O'Brien, Michel Simard, and Marie Jos \'e e Goulet. 2018. https://link.springer.com/chapter/10.1007/978-3-319-91241-7_11 Machine Translation and Self-post-editing for Academic Writing Support : Quality Explorations . In Translation Quality Assessment : From Principles to Practice , 2018, ISBN 978-3-030-08206-2, P \'a gs. 237-262 , pages 237--262. S...
-
[52]
Sharon O’Brien and Maureen Ehrensberger-Dow. 2020. https://doi.org/10.1075/tcb.00038.obr Mt literacy—a cognitive view . Translation, Cognition &; Behavior, 3(2):145--164
-
[53]
Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2022. https://doi.org/10.48550/ARXIV.2212.04356 Robust speech recognition via large-scale weak supervision . arXiv preprint
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2212.04356 2022
-
[54]
Heleen Rutjes, Martijn Willemsen, and Wijnand IJsselsteijn. 2019. https://research.tue.nl/en/publications/considerations-on-explainable-ai-and-users-mental-models Considerations on explainable ai and users mental models . In Where is the Human? Bridging the Gap Between AI and HCI, United States. Association for Computing Machinery, Inc. CHI 2019 Workshop ...
2019
-
[55]
Gabriele Sarti, Vilém Zouhar, Grzegorz Chrupała, Ana Guerberof-Arenas, Malvina Nissim, and Arianna Bisazza. 2025. https://arxiv.org/abs/2503.03044 Qe4pe: Word-level quality estimation for human post-editing . Preprint, arXiv:2503.03044
arXiv 2025
-
[56]
Beatrice Savoldi, Alan Ramponi, Matteo Negri, and Luisa Bentivogli. 2025. https://doi.org/10.48550/arXiv.2502.13780 Translation in the Hands of Many : Centering Lay Users in Machine Translation Interactions
-
[57]
Randy Scansani, Silvia Bernardini, Adriano Ferraresi, and Luisa Bentivogli. 2019. https://aclanthology.org/W19-6711/ Do translator trainees trust machine translation? an experiment on post-editing and revision . In Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks, pages 73--79, Dublin, Ireland. European Association for M...
2019
-
[58]
Judith Sieker, Simeon Junker, Ronja Utescher, Nazia Attari, Heiko Wersing, Hendrik Buschmeier, and Sina Zarrie . 2024. https://doi.org/10.18653/v1/2024.emnlp-main.1084 The illusion of competence: Evaluating the effect of explanations on users' mental models of visual question answering systems . In Proceedings of the 2024 Conference on Empirical Methods i...
-
[59]
Herv \'e Spechbach, Johanna Gerlach, Sanae Mazouri Karker, Nikos Tsourakis, Christophe Combescure, Pierrette Bouillon, and 1 others. 2019. https://pmc.ncbi.nlm.nih.gov/articles/PMC6528434/ A speech-enabled fixed-phrase translator for emergency settings: Crossover study . JMIR medical informatics, 7(2):e13167
2019
-
[60]
Lucia Specia, Dhwaj Raj, and Marco Turchi. 2010. https://doi.org/10.1007/s10590-010-9077-2 Machine translation evaluation versus quality estimation . Machine Translation, 24(1):39--50
-
[61]
Aaquib Tabrez, Matthew B. Luebbers, and Bradley Hayes. 2020. https://doi.org/10.1007/s43154-020-00019-0 A survey of mental modeling techniques in human–robot teaming . Current Robotics Reports
-
[62]
Masaru Tomita, Masako Shirai, Junya Tsutsumi, Miki Matsumura, and Yuki . 1993. https://aclanthology.org/1993.tmi-1.22 Evaluation of MT systems by TOEFL . In Proceedings of the Fifth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Kyoto, Japan
1993
-
[63]
Brendan Tomoschuk, Victor S Ferreira, and Tamar H Gollan. 2019. https://psycnet.apa.org/record/2018-28905-001 When a seven is not a seven: Self-ratings of bilingual language proficiency differ between and within language populations . Bilingualism: Language and Cognition, 22(3):516--536
2019
-
[64]
Vanshika Vats, Marzia Binta Nizam, Minghao Liu, Ziyuan Wang, Richard Ho, Mohnish Sai Prasad, Vincent Titterton, Sai Venkat Malreddy, Riya Aggarwal, Yanwen Xu, Lei Ding, Jay Mehta, Nathan Grinnell, Li Liu, Sijia Zhong, Devanathan Nallur Gandamani, Xinyi Tang, Rohan Ghosalkar, Celeste Shen, and 4 others. 2025. https://arxiv.org/abs/2403.04931 A survey on hu...
arXiv 2025
-
[65]
Lucas Nunes Vieira, Minako O'Hagan, and Carol O'Sullivan. 2021. https://doi.org/10.1080/1369118X.2020.1776370 Understanding the societal impacts of machine translation: A critical review of the literature on medical and legal use cases . Information, Communication & Society, 24(11):1515--1532
-
[66]
Alexander Wettig, Kyle Lo, Sewon Min, Hannaneh Hajishirzi, Danqi Chen, and Luca Soldaini. 2025. https://openreview.net/forum?id=boSqwdvJVC Organize the web: Constructing domains enhances pre-training data curation . In Forty-second International Conference on Machine Learning
2025
-
[67]
Yimin Xiao, Cartor Hancock, Sweta Agrawal, Nikita Mehandru, Niloufar Salehi, Marine Carpuat, and Ge Gao. 2025 a . https://doi.org/10.1145/3706598.3713626 Sustaining human agency, attending to its cost: An investigation into generative ai design for non-native speakers' language use . In Proceedings of the 2025 CHI Conference on Human Factors in Computing ...
-
[68]
Yimin Xiao, Yongle Zhang, Dayeon Ki, Calvin Bao, Marianna Martindale, Charlotte Vaughn, Ge Gao, and Marine Carpuat. 2025 b . Toward machine translation literacy: How lay users perceive and rely on imperfect translations. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics
2025
-
[69]
Kaige Xie, Sarah Wiegreffe, and Mark Riedl. 2022. https://doi.org/10.18653/v1/2022.findings-emnlp.209 Calibrating trust of multi-hop question answering systems with decompositional probes . In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2888--2902, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics
-
[70]
Hill, Fr \'e d \'e ric Blain, Marina Fomicheva, Lucia Specia, and Lisa Yankovskaya
Vil \'e m Zouhar, Michal Nov \'a k, Mat \'u s Z ilinec, Ond r ej Bojar, Mateo Obreg \'o n, Robin L. Hill, Fr \'e d \'e ric Blain, Marina Fomicheva, Lucia Specia, and Lisa Yankovskaya. 2021. https://doi.org/10.18653/v1/2021.naacl-main.14 Backtranslation feedback improves user confidence in MT , not quality . In Proceedings of the 2021 Conference of the Nor...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.