MDwAIstScheduler: A Low-Cost, Voice-Activated Device for Hands-Free Clinical Scheduling

Diego Mardien; Frank Liu

arxiv: 2604.16352 · v1 · submitted 2026-03-17 · 💻 cs.HC

Pith reviewed 2026-05-15 10:42 UTC · model grok-4.3

classification 💻 cs.HC

keywords voice assistant · clinical scheduling · hands-free device · Raspberry Pi · EHR tasks · physician burnout · speech recognition

The pith

A belt-worn Raspberry Pi device uses cloud speech recognition and LLMs to turn spoken commands into automatic calendar events for clinicians.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Physicians spend nearly half their workday on administrative EHR tasks that contribute to burnout and cut time for direct patient care. The paper introduces MDwAIstScheduler, a low-cost belt-worn voice assistant hidden under a lab coat that processes natural-language scheduling requests without visible screens or wrist devices. Running on a Raspberry Pi, the system applies cloud speech recognition followed by LLM intent extraction to interpret commands and create calendar entries automatically. This setup aims to let clinicians handle follow-ups and appointments hands-free during patient encounters.

Core claim

The MDwAIstScheduler demonstrates an end-to-end pipeline on a Raspberry Pi that captures voice input, applies cloud-based speech recognition, uses an LLM to extract scheduling intent, and automatically generates the corresponding calendar event, all while remaining hidden to avoid disrupting clinician-patient eye contact.
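As a rough illustration of the intent-extraction stage in that pipeline, the sketch below turns a transcript into a structured event. It is not the authors' implementation: the paper uses an LLM for this step, and the regular expression here is only a stand-in so the example runs offline. The name `extract_intent`, the 30-minute default duration, the reading of "next Tuesday" as the first Tuesday strictly after today, and the rule that a bare hour from 1 to 7 means afternoon are all assumptions made for the example.

```python
import re
from datetime import date, datetime, time, timedelta
from typing import Optional

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Hypothetical pattern covering only commands shaped like the paper's example.
PATTERN = re.compile(
    r"schedule (?:an? )?(?P<kind>[\w-]+) with (?P<who>.+?) "
    r"next (?P<day>\w+) at (?P<hour>\d{1,2})",
    re.IGNORECASE,
)


def extract_intent(transcript: str, today: date) -> Optional[dict]:
    """Rule-based stand-in for the LLM intent-extraction step."""
    m = PATTERN.search(transcript)
    if m is None or m["day"].lower() not in WEEKDAYS:
        return None  # not a scheduling command this sketch understands

    # Days until the named weekday, strictly after `today` (1..7).
    target = WEEKDAYS.index(m["day"].lower())
    delta = (target - today.weekday() - 1) % 7 + 1

    hour = int(m["hour"])
    if not 0 <= hour <= 23:
        return None
    # Assumption: a bare "at 2" in a clinic means 14:00, not 02:00.
    if 1 <= hour <= 7:
        hour += 12

    start = datetime.combine(today + timedelta(days=delta), time(hour))
    return {
        "title": f"{m['kind'].capitalize()}: {m['who']}",
        "start": start,
        "duration_min": 30,  # assumed default, not from the paper
    }
```

The AM/PM rule is exactly the kind of silent guess an evaluation would need to probe: the spoken command does not say which "2" is meant, so any parser, rule-based or LLM, has to fill that gap somehow.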

What carries the argument

Belt-worn Raspberry Pi that integrates cloud speech recognition and LLM intent extraction to convert spoken commands into calendar actions.

If this is right

Clinicians can manage calendars by voice during encounters without breaking eye contact or using visible devices.
The hidden belt-worn design prevents the eye-contact disruptions typical of screens or wrist devices.
Low-cost hardware based on Raspberry Pi makes the system accessible for routine clinical use.
Automatic creation of events from natural speech reduces time spent on manual EHR administrative entry.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same voice-to-calendar pipeline could be adapted for other routine clinical administrative tasks.
Real-world use would require safeguards for patient data privacy when sending speech to cloud services.
Performance testing in varied noise levels would determine whether the system maintains accuracy across different clinic settings.

Load-bearing premise

Cloud-based speech recognition and LLM intent extraction will remain accurate in noisy clinical environments, without requiring visible interaction or introducing scheduling errors.

What would settle it

Multiple trials in a noisy clinical simulation where the device fails to correctly interpret or schedule from spoken commands such as 'Schedule a follow-up with Mr. Smith next Tuesday at 2'.
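One way such trials could be scored is sketched below, under stated assumptions: each trial records the calendar fields the utterance should have produced and the fields the device actually produced, and a trial counts as a success only on an exact match. The names `Trial` and `success_rate_by_condition` and the idea of labelling trials by acoustic condition are hypothetical; the paper reports no trials of any kind.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Trial:
    condition: str             # free-form label, e.g. "quiet" or "noisy"
    expected: dict             # calendar fields the utterance should yield
    produced: Optional[dict]   # fields the device produced; None if no event


def success_rate_by_condition(trials: List[Trial]) -> Dict[str, float]:
    """Fraction of trials per condition whose event matches exactly."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for t in trials:
        totals[t.condition] = totals.get(t.condition, 0) + 1
        # Exact match on every field: a wrong hour or name is a failure.
        if t.produced == t.expected:
            hits[t.condition] = hits.get(t.condition, 0) + 1
    return {c: hits.get(c, 0) / n for c, n in totals.items()}
```

Exact matching is intentionally strict. For scheduling, an event created at the wrong time or for the wrong patient is arguably worse than no event at all, so partial credit would flatter the system.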

Figures

Figures reproduced from arXiv: 2604.16352 by Diego Mardien, Frank Liu.

Original abstract

Physicians spend nearly half their workday on EHR tasks and administrative work, contributing to burnout and reducing time for direct patient care. We present MDwAIstScheduler, a low-cost, belt-worn voice assistant that allows hands-free calendar management during patient encounters. Hidden beneath a lab coat, the device avoids the eye-contact disruptions caused by visible screens or wrist-worn devices. Running on a Raspberry Pi with cloud-based speech recognition and LLM intent extraction, the system lets clinicians simply say 'Schedule a follow-up with Mr. Smith next Tuesday at 2' and automatically creates the calendar event. Our demo show-cases this end-to-end pipeline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

A private letter to a colleague.

A plain Raspberry Pi voice-scheduler paper that describes the hardware flow of its prototype but supplies no accuracy numbers or clinical tests.

The paper walks through a belt-worn Raspberry Pi setup that runs cloud speech recognition and an LLM to turn spoken sentences like 'Schedule a follow-up with Mr. Smith next Tuesday at 2' into calendar entries. The device is meant to stay hidden under a lab coat so clinicians avoid looking at screens during patient visits. That is the whole contribution: a specific low-cost assembly of existing parts aimed at one narrow administrative task.

It does a decent job laying out the parts list and the high-level pipeline, and it correctly flags how much time doctors lose to EHR work. The writing is clear and the motivation is realistic. Beyond that, nothing new appears in the algorithms or the underlying techniques.

The real problem is the complete absence of any numbers. The claim that the system 'automatically creates the calendar event' is presented as a working demo, yet there are no word-error rates, no intent-parsing success rates, no trials in noisy rooms, no tests with medical accents or overlapping speech, and no comparison to typing or existing voice tools. Without those data the central assertion stays unverified. The assumption that cloud ASR plus LLM will be reliable enough in a real clinic is left unexamined.

This kind of work is useful to someone who already wants to build a similar wearable prototype and needs a parts list and wiring diagram. It is not ready for a serious journal or conference track that expects evidence. I would not bring it to a reading group and I would not cite it. A serious editor should desk-reject unless the authors add at least basic quantitative evaluation and some user feedback from a clinical setting.

Referee Report

2 major / 2 minor

Summary. The manuscript presents MDwAIstScheduler, a low-cost belt-worn voice assistant built on a Raspberry Pi that uses cloud-based speech recognition and LLM-based intent extraction to enable hands-free clinical scheduling. Clinicians utter natural-language commands (e.g., 'Schedule a follow-up with Mr. Smith next Tuesday at 2') that are automatically turned into calendar events, with the device concealed under a lab coat to avoid visual disruption during patient encounters. The work is framed as a working demo of this end-to-end pipeline.

Significance. If the claimed functionality were shown to be reliable, the device could meaningfully reduce administrative burden and burnout for physicians by allowing seamless, eyes-free scheduling. The low-cost hardware approach and emphasis on non-disruptive form factor are practical strengths. However, the complete absence of any performance data prevents assessment of whether the system delivers on its core promise in realistic clinical conditions.

major comments (2)

[Abstract] The central claim that the system 'automatically creates the calendar event' after a spoken utterance is unsupported by evidence. No word-error rates, intent-extraction accuracy, end-to-end success rates, or comparisons to baseline methods are reported anywhere in the manuscript.
[System Description / Demo] No quantitative or qualitative evaluation is provided for performance in noisy clinical environments, with medical accents, overlapping speech, or under realistic patient-encounter conditions, which directly undermines the reliability asserted for the hands-free pipeline.
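For reference, word-error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) between the recognizer's output and a reference transcript, divided by the number of reference words. A minimal sketch of that computation follows; the function name `word_error_rate` is chosen here, and lowercasing plus whitespace splitting is a simplifying assumption about text normalization, not something the manuscript specifies.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Note that a low word-error rate does not by itself imply correct scheduling: hearing "Thursday" for "Tuesday" is a single substitution in a ten-word command, yet it yields a wrong appointment. That is why intent accuracy and end-to-end success need to be reported alongside it.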

minor comments (2)

[Abstract] 'show-cases' should be written as 'showcases'.
[Hardware] The manuscript would benefit from a brief cost breakdown of the Raspberry Pi hardware and cloud services to substantiate the 'low-cost' claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We agree that the manuscript, presented as a prototype demonstration, lacks quantitative performance data and that this limits the strength of claims about automatic functionality and reliability. We address each point below and will revise the manuscript accordingly.

Referee: [Abstract] The central claim that the system 'automatically creates the calendar event' after a spoken utterance is unsupported by evidence. No word-error rates, intent-extraction accuracy, end-to-end success rates, or comparisons to baseline methods are reported anywhere in the manuscript.

Authors: We agree that the abstract phrasing implies reliable end-to-end automation without supporting metrics. The manuscript describes a working prototype and demo of the pipeline rather than an evaluation study. We will revise the abstract to state that the system 'converts spoken commands into calendar events using cloud-based speech recognition and LLM-based intent extraction' and will add an explicit note that no quantitative metrics such as word-error rates or success rates are reported.

Revision: yes

Referee: [System Description / Demo] No quantitative or qualitative evaluation is provided for performance in noisy clinical environments, with medical accents, overlapping speech, or under realistic patient-encounter conditions, which directly undermines the reliability asserted for the hands-free pipeline.

Authors: The system description and demo section focus on the hardware integration, form factor, and end-to-end pipeline in a controlled demonstration setting. We do not provide evaluations under the listed clinical conditions because the work is scoped as a feasibility demonstration. We will revise the manuscript to include a limitations paragraph stating that performance in noisy environments, with medical accents or overlapping speech, has not been quantified and that such testing is reserved for future work.

Revision: yes

Circularity Check

0 steps flagged

No circularity: engineering prototype description with no derivations or self-referential reductions

The manuscript is a high-level description of a Raspberry Pi-based hardware prototype that uses off-the-shelf cloud ASR and LLM components to convert spoken sentences into calendar events. No equations, fitted parameters, uniqueness theorems, or derivation chains appear anywhere in the text. The central claim is presented as an end-to-end demo pipeline rather than a result derived from prior inputs or self-citations, so the work remains self-contained with no steps that reduce to their own assumptions by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No formal parameters, axioms, or invented entities are present; the paper is an applied prototype description relying on standard off-the-shelf components.
