Empowering 9-1-1 Calltaking Training with Generative AI: Experiences and Lessons Learned

Meiyi Ma; Yilin Liu; Zirong Chen

arxiv: 2602.13241 · v2 · pith:L2WBPEUMnew · submitted 2026-01-30 · 💻 cs.CY · cs.AI · cs.HC

Zirong Chen, Yilin Liu, Meiyi Ma

Pith reviewed 2026-05-25 07:07 UTC · model grok-4.3

classification 💻 cs.CY · cs.AI · cs.HC

keywords generative AI · 911 call-taker training · emergency communications · public safety systems · AI deployment lessons · safety-critical training · real-world evaluation · organizational processes

The pith

A generative AI training system for 9-1-1 call-takers was deployed at scale in Nashville, producing four lessons on design and governance drawn from real operations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports on a partnership to create and run a generative AI system that trains emergency call-takers to handle 911 calls. Traditional methods require hundreds of hours of one-on-one time and cannot keep up with staffing shortages above 25 percent. The system reached 190 users across 1,120 sessions in six months, and logs of 98,429 interactions plus organizational records were examined to surface challenges in delivery, accuracy, reliability, and human fit that lab tests miss. Each challenge is paired with specific practices for building and overseeing such systems. Readers would care because the guidance is taken directly from live use in a safety-critical public service rather than from simulations.

Core claim

Deployment of the GenAI-powered call-taking training system under real-world constraints scaled from pilot to 190 operational users and 1,120 sessions; analysis of 98,429 user interactions, organizational processes, and stakeholder patterns distilled four key lessons on system delivery, rigor, resilience, and human factors, each paired with concrete design and governance practices for safety-critical public sector environments.
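To give a rough sense of scale, the headline figures can be turned into simple averages. The snippet below is only back-of-the-envelope arithmetic on the three numbers reported above. The variable names are illustrative, and the per-session average assumes every logged interaction belongs to one of the 1,120 sessions, which the source does not confirm.

```python
# Headline deployment figures as reported in the core claim.
users = 190
sessions = 1_120
interactions = 98_429

# Derived averages (assumption: all interactions occurred inside these sessions).
interactions_per_session = interactions / sessions  # about 87.9
sessions_per_user = sessions / users                # about 5.9
interactions_per_user = interactions / users        # about 518

print(f"{interactions_per_session:.1f} interactions per session")
print(f"{sessions_per_user:.1f} sessions per user")
print(f"{interactions_per_user:.0f} interactions per user")
```

These are averages only; the source does not describe how usage was distributed across users.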

What carries the argument

The six-month live deployment of the GenAI call-taking training system together with systematic review of its usage logs and organizational patterns to extract lessons.

If this is right

AI training systems can expand coverage and speed feedback without pulling experienced staff off active duty for every new hire.
Challenges around delivery, rigor, resilience, and human factors only become visible after systems move from controlled tests into daily operations.
Concrete design choices must be paired with governance rules to keep the training aligned with safety requirements.
Practitioners in other constrained public-sector settings can use the same log-analysis approach to surface their own hidden obstacles.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pattern of surfacing hidden challenges through live logs could be applied to AI training tools in fields such as healthcare dispatch or aviation ground operations.
Organizations could test whether adopting the four paired practices shortens the 720-hour training cycle or lowers the 25 percent staffing gap.
Repeating the deployment and analysis in centers with different call volumes or technology stacks would show how much the lessons depend on local conditions.
Long-term use may require ongoing monitoring of how the system interacts with existing shift schedules and performance metrics.

Load-bearing premise

The challenges seen in this single Metro Nashville deployment are representative of those that will appear in other public safety organizations that face similar staffing and training limits.

What would settle it

A second deployment in a different emergency communications center that encounters none of the four reported challenges or finds that the recommended design and governance practices produce no measurable improvement in training outcomes.

Figures

Figures reproduced from arXiv: 2602.13241 by Meiyi Ma, Yilin Liu, Zirong Chen.

**Figure 1.** Workstation view of the deployed training system at a municipal 9-1-1 communications center. We address this gap through a longitudinal deployment of a GenAI-powered training system within Metro Nashville Department of Emergency Communications (MNDEC), see photos captured in routine training in …

**Figure 2.** System workflow showing training assignment, cloud-based caller simulation, real-time …

**Figure 3.** Continuously iterative design-develop-deploy workflow. The three phases operate concur…

**Figure 4.** Dispute rates (noted as ‘phantom error’), i.e., how often users attribute their mistakes to the system, differ across experience and performance levels (114 trainees, 1,120 completed sessions). 3.3.1 Observations: When training systems perform evaluative functions under pressure, users experiencing difficulty may attribute failures to technology rather than their own performance as a psychological defense mechanism [29, 44]. Thi…

**Figure 5.** Task complexity versus performance and dispute rates (940 sessions with 12+ turns).
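Figures 4 and 5 both report dispute rates broken down by group, with Figure 5 restricted to sessions of 12 or more turns. The paper's log schema is not shown here, so the following is a minimal sketch under assumed field names (`experience`, `turns`, `disputed`) and with made-up toy records. It also assumes that a session counts as disputed when the trainee attributed at least one flagged mistake to the system.

```python
from collections import defaultdict

def dispute_rates(sessions, min_turns=0):
    """Fraction of disputed sessions per experience group.

    `sessions` is a list of dicts with hypothetical keys:
    'experience' (str), 'turns' (int), 'disputed' (bool).
    Sessions shorter than `min_turns` are skipped.
    """
    totals = defaultdict(int)
    disputed = defaultdict(int)
    for s in sessions:
        if s["turns"] < min_turns:
            continue
        totals[s["experience"]] += 1
        disputed[s["experience"]] += int(s["disputed"])
    return {group: disputed[group] / totals[group] for group in totals}

# Toy records, invented for illustration only (not data from the paper).
toy = [
    {"experience": "novice", "turns": 14, "disputed": True},
    {"experience": "novice", "turns": 9, "disputed": False},
    {"experience": "novice", "turns": 20, "disputed": False},
    {"experience": "experienced", "turns": 16, "disputed": False},
    {"experience": "experienced", "turns": 12, "disputed": True},
    {"experience": "experienced", "turns": 5, "disputed": True},
]

all_sessions = dispute_rates(toy)
long_sessions = dispute_rates(toy, min_turns=12)  # mirrors the 12+ turn filter
```

Passing `min_turns=12` reproduces the kind of filter named in the Figure 5 caption; how the paper defines a turn or a dispute may differ from this sketch.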

Original abstract

Emergency call-takers form the first operational link in public safety response, handling over 240 million calls annually while facing a sustained training crisis: staffing shortages exceed 25% in many centers, and preparing a single new hire can require up to 720 hours of one-on-one instruction that removes experienced personnel from active duty. Traditional training approaches struggle to scale under these constraints, limiting both coverage and feedback timeliness. In partnership with Metro Nashville Department of Emergency Communications (MNDEC), we designed, developed, and deployed a GenAI-powered call-taking training system under real-world constraints. Over six months, deployment scaled from initial pilot to 190 operational users across 1,120 training sessions, exposing systematic challenges around system delivery, rigor, resilience, and human factors that remain largely invisible in controlled or purely simulated evaluations. By analyzing deployment logs capturing 98,429 user interactions, organizational processes, and stakeholder engagement patterns, we distill four key lessons, each coupled with concrete design and governance practices. These lessons provide grounded guidance for researchers and practitioners seeking to deliver AI-driven training systems in safety-critical public sector environments where practical constraints fundamentally shape human-centric design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

A single-site deployment report that surfaces four practical lessons from live GenAI use in 911 training, with the usual limits on how far those lessons travel.

The main thing here is a six-month rollout of a GenAI call-taker trainer at one Nashville center that reached 190 users and logged 98k interactions. They turned the operational data plus stakeholder notes into four lessons on delivery, rigor, resilience, and human factors, each tied to concrete design or governance steps. That combination of scale numbers and on-the-ground adjustments is the part that stands out from most lab-style AI training papers. It shows what actually breaks when you move past pilots into shift schedules, union rules, and existing LMS systems. The work is honest about the constraints that shape safety-critical training and gives practitioners something they can try without starting from scratch.

The soft spot is exactly what the stress test flags: everything rests on Metro Nashville specifics. No second site, no cross-check on which lessons depend on local staffing ratios or call volume, and the abstract gives no method for how the four lessons were pulled from the logs or interviews. That makes the generalizability claim thin. The paper is a case study, not a controlled test, so readers will have to judge transfer themselves.

This is the kind of report that belongs in a reading group for people who build or evaluate AI systems inside public safety agencies. It is not a methods paper and it will not change theory, but the deployment metrics and the governance suggestions are worth seeing. I would send it to peer review. The practical detail is real even if the scope is narrow, and referees can push on the extraction process and transfer conditions without rejecting the core contribution.

Referee Report

2 major / 2 minor

Summary. The paper describes a six-month real-world deployment of a generative AI-powered training system for 9-1-1 call-takers at Metro Nashville Department of Emergency Communications (MNDEC). It reports scaling from pilot to 190 users and 1,120 sessions, with analysis of 98,429 user interactions, organizational processes, and stakeholder patterns used to identify four key lessons on challenges in system delivery, rigor, resilience, and human factors. Each lesson is paired with concrete design and governance practices intended to guide similar AI training deployments in safety-critical public-sector settings.

Significance. If the lessons hold and transfer, the work supplies rare empirical detail on operational frictions that arise only after deployment at scale in a high-stakes domain, complementing controlled studies by documenting how staffing shortages, existing LMS constraints, and union rules interact with GenAI training tools.

major comments (2)

[Abstract / Methods (lesson distillation)] Abstract and the section describing lesson extraction: the central claim that four lessons were distilled from the 98,429-interaction logs rests on an unstated analytical process; no description is given of coding procedures, inter-rater reliability, controls for selection bias in reported challenges, or validation steps against raw logs or stakeholder transcripts.
[Abstract] Abstract: the assertion that the lessons supply 'grounded guidance' for other safety-critical public-sector environments is load-bearing for the contribution, yet the manuscript contains no cross-site data, no explicit transferability conditions, and no sensitivity checks showing which lessons depend on MNDEC-specific factors such as local staffing ratios or call-volume profile.

minor comments (2)

[Abstract] The abstract states deployment scaled to 190 operational users but does not clarify whether this figure includes only active call-takers or also supervisors and trainers.
[Results / Discussion] No table or figure summarizes the four lessons alongside the supporting log-derived evidence or the paired design practices; such a summary would improve traceability.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address each major comment below, indicating where we will revise the paper and where limitations are inherent to the single-site study design.

Referee: [Abstract / Methods (lesson distillation)] Abstract and the section describing lesson extraction: the central claim that four lessons were distilled from the 98,429-interaction logs rests on an unstated analytical process; no description is given of coding procedures, inter-rater reliability, controls for selection bias in reported challenges, or validation steps against raw logs or stakeholder transcripts.

Authors: We agree that the analytical process used to distill the four lessons requires explicit description. The lessons emerged from iterative team review of deployment logs, organizational records, and stakeholder meeting notes, with challenges identified through pattern recognition across the 98,429 interactions and cross-referenced with operational constraints reported by MNDEC staff. No formal qualitative coding protocol with inter-rater reliability metrics was applied, as the process relied on consensus among the co-authors who had direct access to the deployment. In the revised manuscript we will add a dedicated subsection in Methods that documents the steps taken, including how raw logs were sampled, how selection of reported challenges was reviewed for bias, and the absence of formal IRR procedures. This addition will also note the limitations of the approach.
revision: yes
Referee: [Abstract] Abstract: the assertion that the lessons supply 'grounded guidance' for other safety-critical public-sector environments is load-bearing for the contribution, yet the manuscript contains no cross-site data, no explicit transferability conditions, and no sensitivity checks showing which lessons depend on MNDEC-specific factors such as local staffing ratios or call-volume profile.

Authors: The referee correctly notes that the study is a single-site deployment and therefore cannot supply cross-site empirical validation or sensitivity analyses across varying staffing ratios or call volumes. We will revise the abstract to moderate the phrasing around 'grounded guidance' and add a new Limitations and Transferability section that explicitly lists MNDEC-specific factors (staffing shortages exceeding 25%, existing LMS constraints, union rules) and states the conditions under which the lessons are most likely to apply (public-safety agencies facing comparable training bottlenecks and regulatory environments). No new data collection is possible, but the added section will provide readers with clearer boundaries for generalization.
revision: partial
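The first response notes that no formal inter-rater reliability (IRR) procedure was applied. For readers unfamiliar with what such a check involves, one common IRR statistic for two raters is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is agreement expected by chance. The sketch below is purely illustrative: the function name, the label set, and the toy codings are assumptions and do not come from the paper, which reports consensus review rather than independent coding.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codings of six log excerpts into the four lesson themes.
coder_1 = ["delivery", "rigor", "rigor", "human", "delivery", "resilience"]
coder_2 = ["delivery", "rigor", "human", "human", "delivery", "rigor"]
kappa = cohens_kappa(coder_1, coder_2)
```

A value near 1 indicates agreement well beyond chance, while a value near 0 indicates agreement no better than chance.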

standing simulated objections not resolved

We cannot supply cross-site data, multi-site validation, or quantitative sensitivity checks across different staffing or call-volume profiles, as the work reports a single six-month deployment at MNDEC.

Circularity Check

0 steps flagged

No circularity: observational case study with no derivations or fitted constructs

The paper is a six-month deployment study of a GenAI training system at one site (MNDEC). It reports 98,429 logged interactions and distills four lessons from logs, processes, and stakeholder patterns. No equations, parameters, predictions, or derivations appear; lessons are presented as direct outputs of the observed data rather than constructs that reduce to their own inputs by definition or self-citation. The single-site nature raises external-validity questions but does not create circularity in the reported analysis chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical deployment study with no mathematical model, free parameters, axioms, or postulated entities; relies entirely on observed logs and stakeholder input from one partnership.

pith-pipeline@v0.9.0 · 5742 in / 1298 out tokens · 53295 ms · 2026-05-25T07:07:14.494342+00:00 · methodology

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages

[1] Yasmeen Alufaisan, Layan Alzahrani, Jian Zhou, and Murat Kantarcioglu. A systematic review on fostering appropriate trust in human-AI interaction: Trends, opportunities and challenges. ACM Journal on Responsible Computing, 1(4):1–35, 2024.
[2] Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, and Lingpeng Kong. Why does the effective context length of LLMs fall short?, 2024. arXiv preprint.
[3] APCO International. Minimum training standards for public safety telecommunicators (APCO ANS 3.103.2-2015). American national standard, APCO International, Daytona Beach, FL,
[4] ANSI-approved standard
[5] APCO International and National Emergency Number Association (NENA). Standard for the establishment of a quality assurance and quality improvement program for public safety answering points (APCO/NENA ANS 1.107.1-2015). American national standard, APCO International; National Emergency Number Association (NENA), 2015.
[6] Iván Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, and Arthur Conmy. Chain-of-thought reasoning in the wild is not always faithful. ICLR 2025 Workshop on Reasoning and Planning for Large Language Models, 2025.
[7] Luke Baird and Samuel Coogan. Runtime assurance from signal temporal logic safety specifications. In 2023 American Control Conference (ACC), pages 3535–3540, San Diego, CA, USA, 2023. IEEE.
[8] Gordon Baxter and Ian Sommerville. Socio-technical systems: From design methods to systems engineering. Interacting with Computers, 23(1):4–17, 2011.
[9] Matthias Beham, Carmen Vlad, and Sonja Reuter. Artificial intelligence in emergency medicine: Viewpoint of current applications and foreseeable opportunities and challenges. Journal of Medical Internet Research, 25:e40031, 2023.
[10] Robert A. Bjork. Memory and metamemory considerations in the training of human beings. In Janet Metcalfe and Arthur P. Shimamura, editors, Metacognition: Knowing About Knowing, pages 185–205. MIT Press, Cambridge, MA, 1994.
[11] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901, 2020.
[12] Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z. Gajos. To trust or to think: Cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1):1–21, 2021.
[13] Averill Campion, Mila Gasco-Hernandez, Slava Jankin Mikhaylov, and Marc Esteve. Overcoming the challenges of collaboratively adopting artificial intelligence in the public sector. Social Science Computer Review, 40(2):462–477, 2022.
[14] Jianyu Chen and Xiang Ran. Deep learning with edge computing: A review. Proceedings of the IEEE, 107(8):1655–1674, 2019.
[15] Zirong Chen, Ziyan An, Jennifer Reynolds, Kristin Mullen, Stephen Maritini, and Meiyi Ma. Logidebrief: A signal-temporal logic based automated debriefing approach with large language models integration. In James Kwok, editor, Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25, pages 9582–9590. International...
[16] Zirong Chen, Elizabeth Chason, Noah Mladenovski, Erin Wilson, Kristin Mullen, Stephen Martini, and Meiyi Ma. Sim911: Towards effective and equitable 9-1-1 dispatcher training with an LLM-enabled simulation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 27896–27904, 2025.
[17] Zirong Chen, Isaac Li, Haoxiang Zhang, Sarah Preum, John A Stankovic, and Meiyi Ma. Cityspec: An intelligent assistant system for requirement specification in smart cities. In 2022 IEEE International Conference on Smart Computing (SMARTCOMP), pages 32–39. IEEE, 2022.
[18] Zirong Chen, Isaac Li, Haoxiang Zhang, Sarah Preum, John A Stankovic, and Meiyi Ma. Cityspec with shield: A secure intelligent assistant for requirement formalization. Pervasive and Mobile Computing, 92:101802, 2023.
[19] Zirong Chen, Xutong Sun, Yuanhe Li, and Meiyi Ma. Auto311: A confidence-guided automated system for non-emergency calls. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 21967–21975, 2024.
[20] Vedant Das Swain and Koustuv Saha. Teacher, trainer, counsel, spy: How generative AI can bridge or widen the gaps in worker-centric digital phenotyping of wellbeing. In Proceedings of the 3rd Annual Meeting of the Symposium on Human-Computer Interaction for Work, CHIWORK ’24, New York, NY, USA, 2024. ACM.
[21] Fernando Delgado, Stephen Yang, Michael Madaio, and Qian Yang. The participatory turn in AI design: Theoretical foundations and the current state of practice. In Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’23), pages 1–23, New York, NY, USA, 2023. ACM.
[22] Jyotirmoy V. Deshmukh, Alexandre Donzé, Shromona Ghosh, Xiaoqing Jin, Garvit Juniwal, and Sanjit A. Seshia. Robust online monitoring of signal temporal logic. In Runtime Verification, volume 9333 of Lecture Notes in Computer Science, pages 55–70. Springer, Cham, 2015.
[23] Zican Dong, Junyi Li, Xin Men, Xin Zhao, Bingning Wang, Zhen Tian, Ji-Rong Wen, et al. Exploring context window of large language models via decomposed positional vectors. Advances in Neural Information Processing Systems, 37:10320–10347, 2024.
[24] Upol Ehsan, Koustuv Saha, Munmun De Choudhury, and Mark O. Riedl. Charting the sociotechnical gap in explainable AI: A framework to address the gap in XAI. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1):1–32, 2023.
[25] Christo El Morr. Equity within AI systems: What can health leaders expect? Healthcare Management Forum, 36(2):119–124, 2023.
[26] Eury Hong, Sundes Kazmir, Benjamin Dylik, Rajaram Bellan, William Frey, Sina Ardestani, Ikhlas Al-Hosni, Anna Mattana, Emily Carlson, Margaret Kirk, Megan Fitzwater, Rebecca Goldstein, Dany Furness, Nicola Kydes, Kim Recker, Katie Dilger, Erik Doyle, Traci A. Walbrink, Arielle Shibi Rosen, and Isabel T. Gross. Exploring the use of a large language model...
[27] International Academy of Emergency Dispatch and National Association of State 911 Administrators. America’s 911 workforce is in crisis. Technical report, IAED & NASNA, 2023.
[28] Anna Jobin, Marcello Ienca, and Effy Vayena. The global landscape of AI ethics guidelines. Nature Machine Intelligence, 1(9):389–399, 2019.
[29] Manu Kapur. Productive failure. Cognition and Instruction, 26(3):379–424, 2008.
[30] Jennifer T. Kubota, Reihane Mojdehbakhsh, Clarissa I. Cortland, and Elizabeth A. Phelps. Stressing the person: Legal and everyday person attributions under stress. Biological Psychology, 103:117–124, 2014.
[31] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
[32] Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157–173, 2024.
[33] Google LLC. Google Cloud Platform. https://cloud.google.com/, 2025. Accessed: 2025-11-11.
[34] Shuai Ma, Ying Lei, Xinru Wang, Chengbo Zheng, Chuhan Shi, Ming Yin, and Xiaojuan Ma. Who should I trust: AI or myself? Leveraging human and AI correctness likelihood to promote appropriate trust in AI-assisted decision-making. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, pages 1–19, New York, NY, USA, 2023. ACM.
[35] Matthew D. Marraffino, Bradford L. Schroeder, Nicholas W. Fraulini, Wendi L. Van Buskirk, and Cheryl I. Johnson. Adapting training in real time: An empirical test of adaptive difficulty schedules. Military Psychology, 33(3):136–151, 2021.
[36] Hendrika Meischke, Ian Painter, Anne M. Turner, Marcia R. Weaver, Carol E. Fahrenbruch, Brooke R. Ike, and Scott Stangenes. Protocol: Simulation training to improve 9-1-1 dispatcher identification of cardiac arrest. BMC Emergency Medicine, 16(1):9, 2016.
[37] Ines Mergel, Helen Dickinson, Jari Stenvall, and Mila Gascó. Implementing AI in the public sector. Public Management Review, Special Issue editorial, 2024.
[38] Pooja Mittal, Mohamed Jalloh, and Traco Matthews. Harnessing AI’s potential to lift up underserved communities. California Health Care Foundation, April 2025.
[39] Enid Mumford. The story of socio-technical design: Reflections on its successes, failures and potential. Information Systems Journal, 16(4):317–342, 2006.
[40] National Institute of Standards and Technology. AI risk management framework (AI RMF 1.0). Technical report, U.S. Department of Commerce, 2023.
[41] Oliver Neumann, Kyrillos Guirguis, and Roland Steiner. Exploring artificial intelligence adoption in public organizations: A comparative case study. Public Management Review, 26(5):1–23, 2024.
[42] OpenAI. GPT-4o-audio-preview (version 2024-12-17). Preview model documentation, 2024. Supports audio-in/audio-out for text and audio modalities.
[43] William A. Pasmore, Stu Winby, Susan A. Mohrman, and Richard Vanasse. Reflections: Sociotechnical systems design and organization change. The Journal of Change Management, 19(2):67–85, 2019.
[44] Janne Riikonen, Pia Laukkanen-Nevala, Ilkka Virkkunen, Veronica Lindström, and Joonas Pappinen. Differences between the dispatch priority assessments of emergency medical dispatchers and emergency medical services: A prospective register-based study in Finland. Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine, 31(1):8, 2023.
[45] Lee Ross. The intuitive psychologist and his shortcomings: Distortions in the attribution process. In Leonard Berkowitz, editor, Advances in Experimental Social Psychology, volume 10, pages 173–220. Academic Press, New York, NY, 1977.
[46] Comilla Sasson, David J. Magid, Paul Chan, Elisabeth D. Root, Bryan F. McNally, Arthur L. Kellermann, and Jason S. Haukoos. Association of neighborhood characteristics with bystander-initiated CPR. New England Journal of Medicine, 367(17):1607–1615, 2012.
[47] Weisong Shi, Jie Cao, Quan Zhang, Youhuizi Li, and Lanyu Xu. Edge computing: Vision and challenges. IEEE Internet of Things Journal, 3(5):637–646, 2016.
[48] John Sweller. Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2):257–285, 1988.
[49] Michael J. Taylor, Christopher A. Zoda, Kyle L. Rasmussen, Anna M. Washington, Devansh Saxena, and Iheoma U. Ogbonnaya-Ogburu. Democratizing AI in public administration: Implementing U.S. federal AI guidelines with maximum feasible participation. AI & Society, 40(5):3653–3662, 2025.
[50] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. LLaMA: Open and efficient foundation language models, 2023. arXiv preprint, v1.
[51] Kiran Vodrahalli, Santiago Ontanon, Nilesh Tripuraneni, Kelvin Xu, Sanil Jain, Rakesh Shivanna, Jeffrey Hui, Nishanth Dikkala, Mehran Kazemi, Bahare Fatemi, et al. Michelangelo: Long context evaluations beyond haystacks via latent structure queries, 2024. arXiv preprint.
[52] Lev S. Vygotsky. Mind in Society: The Development of Higher Psychological Processes. Harvard University Press, Cambridge, MA, 1978.
[53] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.
[54] Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Kang Liu, and Jun Zhao. Mastering symbolic operations: Augmenting language models with compiled neural networks. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), Vienna, Austria, 2024. OpenReview.
[55] Magdalena Wischnewski, Nicole Krämer, and Emmanuel Müller. Measuring and understanding trust calibrations for automated systems: A survey of the state-of-the-art and future directions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, pages 1–23, New York, NY, USA, 2023. ACM.
[56] Weiqi Wu, Hongqiu Wu, Lai Jiang, Xingyuan Liu, Hai Zhao, and Min Zhang. From role-play to drama-interaction: An LLM solution. In Findings of the Association for Computational Linguistics: ACL 2024, pages 3271–3290, Bangkok, Thailand, 2024. Association for Computational Linguistics.
[57]

Edge intelligence: Paving the last mile of artificial intelligence with edge computing.Proceedings of the IEEE, 107(8):1738–1762, 2019

Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang. Edge intelligence: Paving the last mile of artificial intelligence with edge computing.Proceedings of the IEEE, 107(8):1738–1762, 2019. 13 Natural Language Rules Formalized Specifications Call-taker asks for the address in the firstτ 1 turns. DETECT ω[0,τ1] a ,‘ask address’ Caller provides...

work page 2019

[1] Yasmeen Alufaisan, Layan Alzahrani, Jian Zhou, and Murat Kantarcioglu. A systematic review on fostering appropriate trust in human-ai interaction: Trends, opportunities and challenges. ACM Journal on Responsible Computing, 1(4):1–35, 2024.

[2] Chenxin An, Jun Zhang, Ming Zhong, Lei Li, Shansan Gong, Yao Luo, Jingjing Xu, and Lingpeng Kong. Why does the effective context length of llms fall short?, 2024. arXiv preprint.

[3] APCO International. Minimum training standards for public safety telecommunicators (apco ans 3.103.2-2015). American national standard, APCO International, Daytona Beach, FL, 2015.

[4] ANSI-approved standard

[5] APCO International and National Emergency Number Association (NENA). Standard for the establishment of a quality assurance and quality improvement program for public safety answering points (apco/nena ans 1.107.1-2015). American national standard, APCO International; National Emergency Number Association (NENA), 2015.

[6] Iván Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, and Arthur Conmy. Chain-of-thought reasoning in the wild is not always faithful. ICLR 2025 Workshop on Reasoning and Planning for Large Language Models, 2025.

[7] Luke Baird and Samuel Coogan. Runtime assurance from signal temporal logic safety specifications. In 2023 American Control Conference (ACC), pages 3535–3540, San Diego, CA, USA, 2023. IEEE.

[8] Gordon Baxter and Ian Sommerville. Socio-technical systems: From design methods to systems engineering. Interacting with Computers, 23(1):4–17, 2011.

[9] Matthias Beham, Carmen Vlad, and Sonja Reuter. Artificial intelligence in emergency medicine: Viewpoint of current applications and foreseeable opportunities and challenges. Journal of Medical Internet Research, 25:e40031, 2023.

[10] Robert A. Bjork. Memory and metamemory considerations in the training of human beings. In Janet Metcalfe and Arthur P. Shimamura, editors, Metacognition: Knowing About Knowing, pages 185–205. MIT Press, Cambridge, MA, 1994.

[11] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.

[12] Zana Buçinca, Maja Barbara Malaya, and Krzysztof Z. Gajos. To trust or to think: Cognitive forcing functions can reduce overreliance on ai in ai-assisted decision-making. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1):1–21, 2021.

[13] Averill Campion, Mila Gasco-Hernandez, Slava Jankin Mikhaylov, and Marc Esteve. Overcoming the challenges of collaboratively adopting artificial intelligence in the public sector. Social Science Computer Review, 40(2):462–477, 2022.

[14] Jianyu Chen and Xiang Ran. Deep learning with edge computing: A review. Proceedings of the IEEE, 107(8):1655–1674, 2019.

[15] Zirong Chen, Ziyan An, Jennifer Reynolds, Kristin Mullen, Stephen Martini, and Meiyi Ma. Logidebrief: A signal-temporal logic based automated debriefing approach with large language models integration. In James Kwok, editor, Proceedings of the Thirty-Fourth International Joint Conference on Artificial Intelligence, IJCAI-25, pages 9582–9590. International...

[16] Zirong Chen, Elizabeth Chason, Noah Mladenovski, Erin Wilson, Kristin Mullen, Stephen Martini, and Meiyi Ma. Sim911: Towards effective and equitable 9-1-1 dispatcher training with an llm-enabled simulation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 27896–27904, 2025.

[17] Zirong Chen, Isaac Li, Haoxiang Zhang, Sarah Preum, John A Stankovic, and Meiyi Ma. Cityspec: An intelligent assistant system for requirement specification in smart cities. In 2022 IEEE International Conference on Smart Computing (SMARTCOMP), pages 32–39. IEEE, 2022.

[18] Zirong Chen, Isaac Li, Haoxiang Zhang, Sarah Preum, John A Stankovic, and Meiyi Ma. Cityspec with shield: A secure intelligent assistant for requirement formalization. Pervasive and Mobile Computing, 92:101802, 2023.

[19] Zirong Chen, Xutong Sun, Yuanhe Li, and Meiyi Ma. Auto311: A confidence-guided automated system for non-emergency calls. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 21967–21975, 2024.

[20] Vedant Das Swain and Koustuv Saha. Teacher, trainer, counsel, spy: How generative ai can bridge or widen the gaps in worker-centric digital phenotyping of wellbeing. In Proceedings of the 3rd Annual Meeting of the Symposium on Human-Computer Interaction for Work, CHIWORK ’24, New York, NY, USA, 2024. ACM.

[21] Fernando Delgado, Stephen Yang, Michael Madaio, and Qian Yang. The participatory turn in ai design: Theoretical foundations and the current state of practice. In Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO ’23), pages 1–23, New York, NY, USA, 2023. ACM.

[22] Jyotirmoy V. Deshmukh, Alexandre Donzé, Shromona Ghosh, Xiaoqing Jin, Garvit Juniwal, and Sanjit A. Seshia. Robust online monitoring of signal temporal logic. In Runtime Verification, volume 9333 of Lecture Notes in Computer Science, pages 55–70. Springer, Cham, 2015.

[23] Zican Dong, Junyi Li, Xin Men, Xin Zhao, Bingning Wang, Zhen Tian, Ji-Rong Wen, et al. Exploring context window of large language models via decomposed positional vectors. Advances in Neural Information Processing Systems, 37:10320–10347, 2024.

[24] Upol Ehsan, Koustuv Saha, Munmun De Choudhury, and Mark O. Riedl. Charting the sociotechnical gap in explainable ai: A framework to address the gap in xai. Proceedings of the ACM on Human-Computer Interaction, 7(CSCW1):1–32, 2023.

[25] Christo El Morr. Equity within ai systems: What can health leaders expect? Healthcare Management Forum, 36(2):119–124, 2023.

[26] Eury Hong, Sundes Kazmir, Benjamin Dylik, Rajaram Bellan, William Frey, Sina Ardestani, Ikhlas Al-Hosni, Anna Mattana, Emily Carlson, Margaret Kirk, Megan Fitzwater, Rebecca Goldstein, Dany Furness, Nicola Kydes, Kim Recker, Katie Dilger, Erik Doyle, Traci A. Walbrink, Arielle Shibi Rosen, and Isabel T. Gross. Exploring the use of a large language model...

[27] International Academy of Emergency Dispatch and National Association of State 911 Administrators. America’s 911 workforce is in crisis. Technical report, IAED & NASNA, 2023.

[28] Anna Jobin, Marcello Ienca, and Effy Vayena. The global landscape of ai ethics guidelines. Nature Machine Intelligence, 1(9):389–399, 2019.

[29] Manu Kapur. Productive failure. Cognition and Instruction, 26(3):379–424, 2008.

[30] Jennifer T. Kubota, Reihane Mojdehbakhsh, Clarissa I. Cortland, and Elizabeth A. Phelps. Stressing the person: Legal and everyday person attributions under stress. Biological Psychology, 103:117–124, 2014.

[31] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems, 33:9459–9474, 2020.

[32] Nelson F Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12:157–173, 2024.

[33] Google LLC. Google cloud platform. https://cloud.google.com/, 2025. Accessed: 2025-11-11.

[34] Shuai Ma, Ying Lei, Xinru Wang, Chengbo Zheng, Chuhan Shi, Ming Yin, and Xiaojuan Ma. Who should i trust: Ai or myself? leveraging human and ai correctness likelihood to promote appropriate trust in ai-assisted decision-making. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, pages 1–19, New York, NY, USA, 2023. ACM.

[35] Matthew D. Marraffino, Bradford L. Schroeder, Nicholas W. Fraulini, Wendi L. Van Buskirk, and Cheryl I. Johnson. Adapting training in real time: An empirical test of adaptive difficulty schedules. Military Psychology, 33(3):136–151, 2021.

[36] Hendrika Meischke, Ian Painter, Anne M. Turner, Marcia R. Weaver, Carol E. Fahrenbruch, Brooke R. Ike, and Scott Stangenes. Protocol: Simulation training to improve 9-1-1 dispatcher identification of cardiac arrest. BMC Emergency Medicine, 16(1):9, 2016.

[37] Ines Mergel, Helen Dickinson, Jari Stenvall, and Mila Gascó. Implementing AI in the public sector. Public Management Review, Special Issue editorial, 2024.

[38] Pooja Mittal, Mohamed Jalloh, and Traco Matthews. Harnessing ai’s potential to lift up underserved communities. California Health Care Foundation, April 2025.

[39] Enid Mumford. The story of socio-technical design: Reflections on its successes, failures and potential. Information Systems Journal, 16(4):317–342, 2006.

[40] National Institute of Standards and Technology. Ai risk management framework (ai rmf 1.0). Technical report, U.S. Department of Commerce, 2023.

[41] Oliver Neumann, Kyrillos Guirguis, and Roland Steiner. Exploring artificial intelligence adoption in public organizations: A comparative case study. Public Management Review, 26(5):1–23, 2024.

[42] OpenAI. Gpt-4o-audio-preview (version 2024-12-17). Preview model documentation, 2024. Supports audio-in/audio-out for text and audio modalities.

[43] William A. Pasmore, Stu Winby, Susan A. Mohrman, and Richard Vanasse. Reflections: Sociotechnical systems design and organization change. The Journal of Change Management, 19(2):67–85, 2019.

[44] Janne Riikonen, Pia Laukkanen-Nevala, Ilkka Virkkunen, Veronica Lindström, and Joonas Pappinen. Differences between the dispatch priority assessments of emergency medical dispatchers and emergency medical services: A prospective register-based study in finland. Scandinavian Journal of Trauma, Resuscitation and Emergency Medicine, 31(1):8, 2023.

[45] Lee Ross. The intuitive psychologist and his shortcomings: Distortions in the attribution process. In Leonard Berkowitz, editor, Advances in Experimental Social Psychology, volume 10, pages 173–220. Academic Press, New York, NY, 1977.

[46] Comilla Sasson, David J. Magid, Paul Chan, Elisabeth D. Root, Bryan F. McNally, Arthur L. Kellermann, and Jason S. Haukoos. Association of neighborhood characteristics with bystander-initiated cpr. New England Journal of Medicine, 367(17):1607–1615, 2012.

[47] Weisong Shi, Jie Cao, Quan Zhang, Youhuizi Li, and Lanyu Xu. Edge computing: Vision and challenges. IEEE Internet of Things Journal, 3(5):637–646, 2016.

[48] John Sweller. Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2):257–285, 1988.

[49] Michael J. Taylor, Christopher A. Zoda, Kyle L. Rasmussen, Anna M. Washington, Devansh Saxena, and Iheoma U. Ogbonnaya-Ogburu. Democratizing AI in public administration: Implementing U.S. federal AI guidelines with maximum feasible participation. AI & Society, 40(5):3653–3662, 2025.

[50] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models, 2023. arXiv preprint, v1.

[51] Kiran Vodrahalli, Santiago Ontanon, Nilesh Tripuraneni, Kelvin Xu, Sanil Jain, Rakesh Shivanna, Jeffrey Hui, Nishanth Dikkala, Mehran Kazemi, Bahare Fatemi, et al. Michelangelo: Long context evaluations beyond haystacks via latent structure queries, 2024. arXiv preprint.

[52] Lev S. Vygotsky. Mind in Society: The Development of Higher Psychological Processes. Harvard University Press, Cambridge, MA, 1978.

[53] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022.

[54] Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Kang Liu, and Jun Zhao. Mastering symbolic operations: Augmenting language models with compiled neural networks. In Proceedings of the Twelfth International Conference on Learning Representations (ICLR), Vienna, Austria, 2024. OpenReview.

[55] Magdalena Wischnewski, Nicole Krämer, and Emmanuel Müller. Measuring and understanding trust calibrations for automated systems: A survey of the state-of-the-art and future directions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, pages 1–23, New York, NY, USA, 2023. ACM.

[56] Weiqi Wu, Hongqiu Wu, Lai Jiang, Xingyuan Liu, Hai Zhao, and Min Zhang. From role-play to drama-interaction: An llm solution. In Findings of the Association for Computational Linguistics: ACL 2024, pages 3271–3290, Bangkok, Thailand, 2024. Association for Computational Linguistics.

[57] Zhi Zhou, Xu Chen, En Li, Liekang Zeng, Ke Luo, and Junshan Zhang. Edge intelligence: Paving the last mile of artificial intelligence with edge computing. Proceedings of the IEEE, 107(8):1738–1762, 2019.

Natural Language Rules Formalized Specifications Call-taker asks for the address in the first τ1 turns. DETECT ω[0,τ1] a, ‘ask address’ Caller provides...