H-AdminSim: A Multi-Agent Simulator for Realistic Hospital Administrative Workflows with FHIR Integration
Pith reviewed 2026-05-16 07:32 UTC · model grok-4.3
The pith
H-AdminSim provides a multi-agent simulator with FHIR integration as a standardized testbed for evaluating LLM automation of hospital administrative workflows.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
H-AdminSim combines realistic data generation with multi-agent-based simulation of hospital administrative workflows and FHIR integration to create a unified, interoperable environment for testing these workflows across heterogeneous hospital settings and assessing the feasibility and performance of LLM-driven administrative automation.
What carries the argument
H-AdminSim itself: a multi-agent simulator that generates realistic hospital data, models administrative workflows through agent interactions, and uses FHIR standards for interoperability, with rubric-based evaluation of LLMs.
If this is right
- LLMs can be compared systematically on complete, multi-step administrative processes instead of isolated subtasks.
- Testing becomes possible across varied hospital settings through a single FHIR-based interface.
- Quantitative rubric scores provide concrete metrics for judging automation feasibility at scale.
- Edge cases in high-volume daily request handling can be explored in a controlled environment before real deployment.
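To make the "single FHIR-based interface" point concrete, here is a minimal sketch of how a simulated booking might be expressed and sanity-checked as a FHIR R4 Appointment resource. The helper names `make_appointment` and `check_appointment` are hypothetical and are not taken from H-AdminSim; the field names and status values follow the public FHIR R4 Appointment definition, and the checks cover only a small subset of what a real FHIR validator would enforce.

```python
from datetime import datetime

# Allowed values from the FHIR R4 Appointment.status value set.
APPOINTMENT_STATUSES = {
    "proposed", "pending", "booked", "arrived", "fulfilled",
    "cancelled", "noshow", "entered-in-error", "checked-in", "waitlist",
}


def make_appointment(patient_id, practitioner_id, start, end):
    """Build a minimal Appointment as a plain dict (hypothetical helper)."""
    return {
        "resourceType": "Appointment",
        "status": "booked",
        "start": start,  # ISO 8601 timestamp with UTC offset
        "end": end,
        "participant": [
            {"actor": {"reference": f"Patient/{patient_id}"}, "status": "accepted"},
            {"actor": {"reference": f"Practitioner/{practitioner_id}"}, "status": "accepted"},
        ],
    }


def check_appointment(resource):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if resource.get("resourceType") != "Appointment":
        problems.append("resourceType must be 'Appointment'")
    if resource.get("status") not in APPOINTMENT_STATUSES:
        problems.append("status is missing or not in the R4 value set")
    if not resource.get("participant"):
        problems.append("at least one participant is required")
    start, end = resource.get("start"), resource.get("end")
    if start and end and datetime.fromisoformat(start) >= datetime.fromisoformat(end):
        problems.append("start must be before end")
    return problems


appt = make_appointment("p1", "d1", "2025-01-06T09:00:00+09:00", "2025-01-06T09:15:00+09:00")
print(check_appointment(appt))
```

Because the resource is plain JSON-compatible data, the same check could in principle be applied to output from any hospital configuration, which is the interoperability benefit the summary refers to.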
Where Pith is reading between the lines
- If the simulator proves faithful, it could serve as a low-risk sandbox for iterating on LLM tools before any live hospital integration.
- The framework might support future extensions that incorporate real-time hospital data feeds for ongoing validation.
- Standardized benchmarks for administrative AI could emerge, allowing consistent progress tracking across research groups.
Load-bearing premise
The combination of generated data and multi-agent interactions in H-AdminSim sufficiently captures the complexity, variability, and edge cases of actual hospital administrative workflows.
What would settle it
Direct comparison of LLM-generated workflow outcomes, error patterns, and decision sequences inside H-AdminSim against anonymized logs from real hospital administrative systems would show whether the simulation matches observed behavior.
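One simple way such a comparison could be quantified is the total variation distance between the distribution of outcome categories observed in the simulator and the one observed in real logs. The sketch below is illustrative only: the outcome labels are invented for the example, and the paper does not report this or any other fidelity metric.

```python
from collections import Counter


def outcome_distribution(outcomes):
    """Turn a list of outcome labels into a label -> relative frequency map."""
    if not outcomes:
        raise ValueError("at least one outcome is required")
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(sim_outcomes, real_outcomes):
    """Half the L1 distance between the two distributions: 0 = identical, 1 = disjoint."""
    p = outcome_distribution(sim_outcomes)
    q = outcome_distribution(real_outcomes)
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


# Hypothetical outcome labels, not drawn from the paper or from real hospital data.
simulated = ["ok", "ok", "wrong_department", "double_booking"]
observed = ["ok", "ok", "ok", "wrong_department"]
print(total_variation_distance(simulated, observed))
```

A distance near 0 would suggest the simulator reproduces the observed mix of outcomes; it says nothing about whether individual decision sequences match, which would need a separate sequence-level comparison.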
Original abstract
Hospital administration departments handle a wide range of operational tasks and, in large hospitals, process over 10,000 requests per day, driving growing interest in LLM-based automation. However, prior work has focused primarily on patient-physician interactions or isolated administrative subtasks, failing to capture the complexity of real administrative workflows. To address this gap, we propose H-AdminSim, a comprehensive simulation framework that combines realistic data generation with multi-agent-based simulation of hospital administrative workflows. These tasks are quantitatively evaluated using detailed rubrics, enabling systematic comparison of LLMs. Through FHIR integration, H-AdminSim provides a unified and interoperable environment for testing administrative workflows across heterogeneous hospital settings, serving as a standardized testbed for assessing the feasibility and performance of LLM-driven administrative automation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes H-AdminSim, a multi-agent simulation framework that generates realistic hospital administrative data and models workflows via agent interactions, with FHIR integration to enable interoperability across heterogeneous hospital settings; it further claims to support quantitative LLM evaluation through detailed rubrics and to serve as a standardized testbed for assessing LLM-driven administrative automation.
Significance. If the simulation's fidelity to real workflows can be established, H-AdminSim would address a clear gap in prior LLM healthcare work by providing an interoperable environment for testing complex, multi-step administrative tasks at scale, potentially enabling reproducible comparisons that isolated-subtask benchmarks cannot.
major comments (2)
- [Abstract and §3] Abstract and §3 (framework description): the central claim that generated data plus multi-agent interactions plus FHIR integration produce workflows whose complexity, variability, and edge cases match real hospital administration is load-bearing yet unsupported; no quantitative fidelity metrics, no comparison to anonymized hospital logs, and no expert validation study are reported.
- [§4] §4 (evaluation): the rubric-based quantitative assessment of LLMs is described but no ablation is shown isolating the contribution of the multi-agent or FHIR components versus synthetic data alone, leaving the interoperability and realism assertions untested.
minor comments (1)
- [§3] Notation for agent roles and FHIR resource mappings could be clarified with a small table or diagram in §3 to aid reproducibility.
Simulated Author's Rebuttal
We are grateful to the referee for the thoughtful and constructive review of our manuscript. The comments have helped us identify areas where the presentation of our contributions can be strengthened. We address each major comment below and describe the revisions we intend to make.
Point-by-point responses
- Referee: [Abstract and §3] Abstract and §3 (framework description): the central claim that generated data plus multi-agent interactions plus FHIR integration produce workflows whose complexity, variability, and edge cases match real hospital administration is load-bearing yet unsupported; no quantitative fidelity metrics, no comparison to anonymized hospital logs, and no expert validation study are reported.
Authors: We acknowledge that the central claims regarding workflow realism and complexity are currently grounded in the design of the data generation process and agent interaction rules, which draw from publicly documented hospital administrative procedures and FHIR resource specifications, rather than from quantitative fidelity metrics or direct comparisons to real hospital logs. No expert validation study is reported in the current manuscript. In the revised version we will expand §3 with a detailed rationale for the modeling choices, add qualitative examples of captured edge cases and variability, and insert an explicit limitations subsection that states the absence of quantitative validation metrics and outlines plans for future expert review and log-based comparisons. These changes will moderate the strength of the claims while preserving the framework as a proposed standardized testbed.
Revision: yes
- Referee: [§4] §4 (evaluation): the rubric-based quantitative assessment of LLMs is described but no ablation is shown isolating the contribution of the multi-agent or FHIR components versus synthetic data alone, leaving the interoperability and realism assertions untested.
Authors: We agree that ablation experiments would strengthen the evaluation by isolating the contributions of the multi-agent simulation and FHIR integration. The present §4 reports end-to-end rubric scores on the full workflows but does not include such controls. For the revision we will add ablation results comparing LLM performance on synthetic data alone versus the complete multi-agent environment, together with a discussion of how FHIR resources enable cross-setting interoperability. These additional experiments will be reported in the revised §4.
Revision: yes
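The ablation discussed in this exchange can be sketched as follows, assuming each workflow run is scored as a weighted fraction of rubric items passed and the two conditions are compared by mean score. The rubric item names, weights, and scores here are hypothetical placeholders; the manuscript's actual rubrics and results are not reproduced.

```python
from statistics import mean


def rubric_score(item_results, weights):
    """Weighted fraction of rubric items passed, in [0, 1]."""
    total = sum(weights.values())
    earned = sum(w for item, w in weights.items() if item_results.get(item, False))
    return earned / total


def ablation_gap(full_scores, ablated_scores):
    """Mean score in the full environment minus mean score in the ablated one."""
    return mean(full_scores) - mean(ablated_scores)


# Hypothetical rubric: item names and weights are illustrative only.
weights = {"correct_department": 2, "valid_time_slot": 1, "fhir_resource_valid": 1}
run = {"correct_department": True, "valid_time_slot": False, "fhir_resource_valid": True}
print(rubric_score(run, weights))

# Placeholder per-run scores for the two conditions named in the rebuttal.
full_environment = [0.75, 1.0]
synthetic_data_only = [0.5, 0.75]
print(ablation_gap(full_environment, synthetic_data_only))
```

A raw mean difference like this is only a starting point; with few runs per condition, a confidence interval or paired test over matched scenarios would be needed before attributing the gap to the multi-agent or FHIR components.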
Circularity Check
No circularity: framework proposal without derivations or fitted reductions
The manuscript proposes H-AdminSim as a simulation framework that combines synthetic data generation, multi-agent workflow modeling, and FHIR integration. No equations, parameter-fitting steps, or derivation chains appear in the abstract or described content. Claims of realism and interoperability are presented as design features of the proposed system rather than results obtained by reducing outputs to prior fitted inputs or self-citation chains. The work is therefore self-contained as a framework description and exhibits no circularity of the enumerated kinds.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: hospital administrative workflows can be realistically simulated using multi-agent systems and generated data.
invented entities (1)
- H-AdminSim simulator: no independent evidence