arxiv: 2607.00019 · v1 · pith:M6G6S3RUnew · submitted 2026-05-29 · 💻 cs.CY · cs.AI

LLMs in the Real World: Evaluating "AI" in Emergency Contexts

Pith reviewed 2026-07-02 22:44 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords LLM deployment · emergency services · machine translation · public communication · AI misconceptions · text-to-911 · real-world evaluation · best practices

The pith

Researchers should explain their findings on simple AI uses to the public to avoid misconceptions in emergency services.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper urges AI researchers to take more responsibility for communicating results beyond academic circles. It presents a case study of an LLM-based translation system for texting 911 operators across 55 languages to demonstrate how such tools are often misunderstood in practice. The authors highlight several common misconceptions and end with specific recommendations for everyone involved in building and rolling out these systems. They contend that attention tends to go to difficult technical challenges while everyday deployment issues receive less scrutiny.

Core claim

While scientific progress often centers on solving the hard problems, it is often the easy ones—problems for which the latest technology is often unnecessary—that are most overlooked, as shown by misconceptions surrounding the initial deployment of an LLM-based text-2-911 system in 55 languages.

What carries the argument

The case study of the initial deployment of an LLM-based machine translation application for a text-2-911 system, used to surface common misconceptions about such technologies in emergency contexts.

If this is right

  • Stakeholders across the development and deployment pipeline should adopt the recommended best practices to reduce risks in emergency AI applications.
  • Greater public articulation of research findings can correct misunderstandings before they affect real emergency responses.
  • Shifting focus to overlooked simple problems can improve the safety of AI tools in critical settings where advanced capabilities are not required.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar communication gaps may affect AI use in other high-stakes areas such as medical advice or legal aid for non-native speakers.
  • Developers could test whether basic rule-based translation suffices for many emergency phrases before introducing LLMs.
  • Public education efforts might reduce over-reliance on machine translation when human operators remain available.
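
The rule-based check suggested in the second point above could be sketched roughly as follows. This is a minimal illustration, not anything described in the paper: the phrasebook entries, the function names (`normalize`, `rule_based_translate`, `coverage`), and the sample messages are all hypothetical, and a real phrasebook would need entries vetted by professional translators.

```python
import re

# Hypothetical phrasebook: a few Spanish emergency phrases mapped to English.
# These entries are illustrative assumptions, not data from the paper.
PHRASEBOOK = {
    "ayuda": "help",
    "fuego": "fire",
    "necesito una ambulancia": "i need an ambulance",
}

# Hypothetical incoming messages used only to exercise the functions below.
SAMPLE_MESSAGES = ["¡Ayuda!", "Fuego", "Hay un hombre en mi casa"]


def normalize(text: str) -> str:
    """Lowercase the text, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def rule_based_translate(message: str):
    """Return the phrasebook translation, or None if there is no exact match."""
    return PHRASEBOOK.get(normalize(message))


def coverage(messages) -> float:
    """Fraction of messages that the phrasebook alone can translate."""
    if not messages:
        return 0.0
    hits = sum(1 for m in messages if rule_based_translate(m) is not None)
    return hits / len(messages)


if __name__ == "__main__":
    for msg in SAMPLE_MESSAGES:
        print(repr(msg), "->", rule_based_translate(msg))
    print("coverage:", coverage(SAMPLE_MESSAGES))
```

Messages the phrasebook cannot match return `None`, which makes the fallback decision (human interpreter, or some other system) explicit rather than hidden behind a fluent but unverified translation. The coverage figure is only meaningful when computed over a representative sample of real messages.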

Load-bearing premise

The case study of the text-2-911 deployment accurately identifies the misconceptions people hold about these technologies.

What would settle it

A survey or log analysis showing that the misconceptions described do not appear among actual users or operators of similar emergency translation systems would undermine the argument.

Original abstract

This paper offers a call to action. We urge our colleagues in the research community to play a greater role in the articulation of our findings to the public. To illustrate the stakes we present a case study on the initial stages of an LLM-based machine translation application's deployment in a real-world context: a text-2-911 system advertising capabilities in 55 languages for use in emergencies in which it may be difficult to call operators directly. We identify a number of common misconceptions about technologies such as these, concluding with a set of concrete recommendations and best practices for stakeholders at every stage of the development and deployment pipeline. While the advancement of scientific research often lies in solving the "hard" problems, we argue it is often the "easy" ones -- problems for which the latest technology is often unnecessary -- that are most overlooked.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. This position paper issues a call to action for the research community to better communicate findings to the public. It illustrates the stakes via a case study of the initial deployment of an LLM-based text-2-911 translation system advertised in 55 languages for emergency use, identifies common misconceptions about such technologies, and offers concrete recommendations and best practices. The authors contend that 'easy' problems (those not requiring the latest technology) are frequently overlooked compared to 'hard' problems.

Significance. If the case-study observations are representative, the paper usefully draws attention to risks of deploying machine translation in life-critical emergency contexts and the value of researcher involvement in public articulation of limitations. As a qualitative call to action rather than an empirical study, its contribution lies in framing and recommendations rather than new data or formal results.

major comments (1)
  1. [Abstract] Abstract and case-study description: the manuscript references a case study of an LLM text-2-911 deployment and states that the authors "identify a number of common misconceptions," yet the abstract provides no specific observations, error examples, deployment metrics, user reports, or methodological details to ground those misconceptions. This absence is load-bearing for the central claim that the case study illustrates overlooked 'easy' problems.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and for highlighting an opportunity to strengthen the abstract's grounding of the case study. We address the comment below.

point-by-point responses
  1. Referee: [Abstract] Abstract and case-study description: the manuscript references a case study of an LLM text-2-911 deployment and states that the authors "identify a number of common misconceptions," yet the abstract provides no specific observations, error examples, deployment metrics, user reports, or methodological details to ground those misconceptions. This absence is load-bearing for the central claim that the case study illustrates overlooked 'easy' problems.

    Authors: We agree that the abstract would be improved by briefly referencing one concrete observation from the case study to illustrate the misconceptions. In revision we will add a short clause noting, for example, the observed mismatch between advertised 55-language coverage and reliable performance on time-critical emergency queries. The body of the paper already contains the qualitative description of the deployment and the specific misconceptions encountered; the abstract change will make this connection explicit without converting the position paper into an empirical report.

    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; qualitative position paper with no derivations or fitted claims

full rationale

The paper is explicitly a call to action and qualitative position piece that presents opinions and a descriptive case study on misconceptions around an LLM text-2-911 deployment. It contains no equations, parameters, predictions, uniqueness theorems, or technical derivations that could reduce to their own inputs by construction. No self-citations are invoked as load-bearing premises for any formal result. The central argument is presented as interpretive framing rather than a derived claim, making the work self-contained against external benchmarks with no circular steps present.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper is a position piece without technical derivations, free parameters, or invented entities. It rests on domain assumptions about researcher responsibilities and the representativeness of the case study.

axioms (1)
  • domain assumption: Researchers in AI have an obligation to articulate their findings and limitations to the public.
    This underpins the entire call to action in the abstract.

pith-pipeline@v0.9.1-grok · 5666 in / 1197 out tokens · 40821 ms · 2026-07-02T22:44:26.653763+00:00 · methodology

