H-AdminSim: A Multi-Agent Simulator for Realistic Hospital Administrative Workflows with FHIR Integration


Hospital administration departments handle a wide range of operational tasks and, in large hospitals, process over 10,000 requests per day, driving growing interest in LLM-based automation. However, prior work has focused primarily on patient–physician interactions or isolated administrative subtasks, failing to capture the complexity of real administrative workflows. To address this gap, we propose H-AdminSim, a comprehensive end-to-end simulation framework that combines realistic data generation with multi-agent-based simulation of hospital administrative workflows. These tasks are quantitatively evaluated using detailed rubrics, enabling systematic comparison of LLMs. Through FHIR integration, H-AdminSim provides a unified and interoperable environment for testing administrative workflows across heterogeneous hospital settings, serving as a standardized testbed for assessing the feasibility and performance of LLM-driven administrative automation.


💡 Research Summary

The paper introduces H-AdminSim, a multi-agent simulation framework that emulates realistic hospital administrative workflows, with a particular focus on first-visit outpatient processes. Existing LLM-driven healthcare simulations have largely centered on patient-physician dialogues or isolated administrative subtasks. The authors aim to fill this gap by modeling the full chain of tasks that front-desk staff handle: patient intake, department assignment, and appointment scheduling (including rescheduling and cancellation).

Data Generation – H-AdminSim synthesizes hierarchical data at three levels: hospital, physician, and patient. Hospital configurations define operating hours, a time granularity τ (e.g., 15-minute slots), and a set of nine internal-medicine specialties. Physician profiles include demographic attributes, variable outpatient working days, and a capacity parameter (patients per hour) that determines how many consecutive τ-slots a visit occupies. Patient profiles are built from a curated list of 194 disease-symptom pairs drawn from the NHS health encyclopedia; each disease is labeled with its correct specialty, providing a gold standard for evaluating department assignment. Each patient is also assigned a diagnostic-history flag (with or without a prior diagnosis) and scheduling preferences drawn from three types (as soon as possible, after a specific date, or a preferred physician), which are combined into a primary and a secondary preference to reflect real-world variability.
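The relationship between physician capacity and slot occupancy can be illustrated with a short sketch. The summary does not give the exact formula, so the rule below (visit length = 60 minutes divided by patients per hour, rounded up to whole τ-slots) is an assumption, and the function name `slots_per_visit` is hypothetical.

```python
import math


def slots_per_visit(patients_per_hour: float, tau_minutes: int) -> int:
    """Number of consecutive tau-slots one visit occupies.

    Assumed rule (not confirmed by the paper): a physician who sees
    `patients_per_hour` patients spends 60 / patients_per_hour minutes
    per visit, rounded up to a whole number of tau-minute slots.
    """
    if patients_per_hour <= 0 or tau_minutes <= 0:
        raise ValueError("capacity and tau must be positive")
    visit_minutes = 60 / patients_per_hour
    return max(1, math.ceil(visit_minutes / tau_minutes))


# With tau = 15 minutes: 4 patients/hour -> 1 slot, 2 patients/hour -> 2 slots.
print(slots_per_visit(4, 15), slots_per_visit(2, 15))
```

Under this assumption, a lower-capacity physician consumes more of the shared timetable per patient, which is one way capacity could translate into the scheduling pressure the simulator models.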

FHIR Integration – After data synthesis, core FHIR resources (Practitioner, PractitionerRole, Schedule, Slot, Patient, Appointment) are instantiated and uploaded to a simulated Hospital Information System (HIS) via standard GET/POST API calls. Slots, segmented by τ, encode whether a time window is free or busy, forming a shared timetable that agents query and update in real time. This design ensures that any LLM‑driven agent interacts with the same data structures used by actual electronic health record systems, facilitating interoperability across heterogeneous institutions.
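As a rough illustration of the τ-segmented timetable, the sketch below builds FHIR Slot resources as plain Python dictionaries for one block of operating hours. The field names (`resourceType`, `schedule`, `status`, `start`, `end`) and the `free`/`busy` status codes come from the FHIR Slot resource; the helper name `build_slots`, the schedule ID, and the example hours are hypothetical, and no upload step is shown because the summary does not specify the server endpoint.

```python
from datetime import datetime, timedelta, timezone


def build_slots(schedule_id: str, start: datetime, end: datetime,
                tau_minutes: int) -> list:
    """Split [start, end) into tau-minute FHIR Slot resources, all free."""
    step = timedelta(minutes=tau_minutes)
    slots = []
    t = start
    while t + step <= end:
        slots.append({
            "resourceType": "Slot",
            "schedule": {"reference": f"Schedule/{schedule_id}"},
            "status": "free",  # switched to "busy" once an appointment is booked
            "start": t.isoformat(),
            "end": (t + step).isoformat(),
        })
        t += step
    return slots


# Hypothetical example: one hour of clinic time at tau = 15 minutes.
opening = datetime(2025, 1, 6, 9, 0, tzinfo=timezone.utc)
closing = datetime(2025, 1, 6, 10, 0, tzinfo=timezone.utc)
for slot in build_slots("sched-1", opening, closing, 15):
    print(slot["start"], slot["status"])
```

Each dictionary could then be serialized to JSON and sent to a FHIR server with a POST request, as the paragraph above describes.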

Multi‑Agent Workflow – The simulation runs two primary agents: an Administrative Agent that conducts a dialogue‑based intake, extracts demographics, symptoms, and referral information, then decides on the appropriate department using the gold‑label mapping; and a Scheduling Agent that negotiates patient preferences, checks physician capacity, and creates, reschedules, or cancels appointments accordingly. Both agents can invoke auxiliary tools (reasoning, retrieval, or external APIs) to mimic tool‑calling capabilities of modern LLMs. The environment advances a virtual clock, updating Slot states after each patient interaction, thereby reproducing time‑dependent constraints such as physician overload or limited slot availability.
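The capacity check performed during scheduling can be sketched as a search for consecutive free slots. This is a minimal first-fit illustration, not the authors' implementation: the function name `book_first_fit` is hypothetical, the timetable is reduced to a list of status strings for a single physician on a single day, and patient preferences are ignored.

```python
from typing import List, Optional


def book_first_fit(statuses: List[str], slots_needed: int) -> Optional[int]:
    """Book the earliest run of `slots_needed` consecutive free slots.

    Marks the chosen slots "busy" in place and returns the index of the
    first booked slot, or None when no sufficiently long run exists.
    """
    if slots_needed < 1:
        raise ValueError("slots_needed must be at least 1")
    run_start, run_len = 0, 0
    for i, status in enumerate(statuses):
        if status == "free":
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len == slots_needed:
                for j in range(run_start, run_start + slots_needed):
                    statuses[j] = "busy"
                return run_start
        else:
            run_len = 0  # a busy slot breaks the run
    return None


timetable = ["busy", "free", "busy", "free", "free", "free"]
print(book_first_fit(timetable, 2))  # index of the first booked slot
print(timetable)
```

Because the list is updated in place, later requests see the reduced availability, which mirrors how a shared timetable produces the time-dependent constraints described above.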

Evaluation Rubrics – The authors propose a detailed rubric covering four dimensions: (1) Accuracy (correct department assignment, correct appointment slot), (2) Completeness (whether all required patient information is captured), (3) Temporal Efficiency (elapsed virtual time measured in τ‑units), and (4) FHIR Compliance (schema adherence, API success rate). Scores are aggregated to compare different LLM back‑ends (e.g., GPT‑4, Claude, LLaMA) under identical workload conditions.
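One simple way to aggregate per-dimension scores is a weighted mean. The paper's actual weighting scheme is not given in this summary, so the equal default weights, the dimension keys, and the function name `aggregate_rubric` below are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical keys for the four rubric dimensions named in the text.
DIMENSIONS = ("accuracy", "completeness", "temporal_efficiency", "fhir_compliance")


def aggregate_rubric(scores: Dict[str, float],
                     weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted mean of per-dimension scores, each expected in [0, 1].

    Equal weights are an assumption; the real weighting may differ.
    """
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[d] * scores[d] for d in DIMENSIONS) / total_weight


example = {
    "accuracy": 1.0,
    "completeness": 0.5,
    "temporal_efficiency": 0.5,
    "fhir_compliance": 1.0,
}
print(aggregate_rubric(example))
```

Passing an explicit `weights` dictionary makes it possible to check how sensitive a model ranking is to the choice of weights.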

Experimental Findings – Simulations were conducted for three hospital scales (primary, secondary, tertiary) processing between 5,000 and 15,000 synthetic patient requests each. A GPT‑4‑based agent achieved 92 % department‑assignment accuracy and satisfied 85 % of patient scheduling preferences, with less than 1 % FHIR‑resource generation errors. In high‑capacity scenarios, the system reproduced realistic bottlenecks: rescheduling requests incurred an average delay of roughly two τ‑units, mirroring the queuing effects seen in real clinics.

Limitations and Future Work – The synthetic dataset, while extensive, lacks the noise of real clinical notes (typos, non‑standard abbreviations, insurance constraints). The rubric’s weighting scheme is not fully disclosed, which may affect reproducibility. The current scope is limited to internal‑medicine specialties; extending to surgery, radiology, and ancillary services will require additional disease‑label mappings and workflow extensions. Moreover, the paper does not explore practical concerns such as token cost, latency of tool‑calling, or multi‑LLM coordination in a production environment.

Conclusion – H-AdminSim offers a standardized, FHIR-compliant testbed for systematic evaluation of LLM-driven hospital administrative automation. By providing realistic, time-aware simulations and a quantitative evaluation framework, it supports safer, evidence-based deployment of conversational AI in the front office of healthcare institutions. Validation with real-world hospital data and broader specialty coverage will be needed to confirm its practical utility.

