Adoption and Use of LLMs at an Academic Medical Center
While large language models (LLMs) can support clinical documentation needs, standalone tools struggle with "workflow friction" from manual data entry. We developed ChatEHR, a system that enables LLMs to work with the entire patient timeline, spanning several years. ChatEHR supports automations (static combinations of prompts and data that perform a fixed task) and interactive use in the electronic health record (EHR) via a user interface (UI). The resulting ability to sift through patient medical records for diverse use cases, such as pre-visit chart review, screening for transfer eligibility, monitoring for surgical site infections, and chart abstraction, redefines LLM use as an institutional capability. The system, accessible after user training, enables continuous monitoring and evaluation of LLM use. In 1.5 years, we built 7 automations, and 1,075 users completed training to become routine users of the UI, engaging in 23,000 sessions in the first 3 months after launch. For automations, being model-agnostic and accessing multiple types of data was essential for matching specific clinical or administrative tasks with the most appropriate LLM. Benchmark-based evaluations proved insufficient for the UI, where tasks are not predefined, requiring new methods to monitor performance. Generation of summaries was the most frequent task in the UI, with an estimated 0.73 hallucinations and 1.60 inaccuracies per generation. The resulting mix of cost savings, time savings, and revenue growth required a value assessment framework both to prioritize work and to quantify the impact of using LLMs. Initial estimates are $6M in savings in the first year of use, without quantifying the benefit of the improved care offered. Such a "build-from-within" strategy provides an opportunity for health systems to maintain agency via a vendor-agnostic, internally governed LLM platform.
💡 Research Summary
This paper describes the design, deployment, and early evaluation of ChatEHR, a vendor‑agnostic platform that integrates large language models (LLMs) with the full longitudinal electronic health record (EHR) at an academic medical center. The system consists of three core capabilities—Data Orchestration, Context Management, and LLM Routing—that together enable real‑time extraction of patient data (both FHIR and non‑FHIR), token‑efficient packaging of that data with prompts, and dynamic selection or chunking of LLMs based on context‑window limits. ChatEHR supports two modes of use: “automations,” which are static prompt‑code pipelines that perform predefined tasks such as pre‑visit summaries or transfer‑eligibility checks, and an interactive ChatEHR UI embedded directly in the EHR for open‑ended queries. Over 1.5 years the team built seven automations and trained 1,075 users, who generated roughly 23,000 UI sessions in the first three months after launch.
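The LLM Routing capability described above can be illustrated with a minimal sketch. The model names, context-window sizes, and smallest-fit-first rule below are assumptions for illustration, not the paper's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    context_window: int  # max tokens the model accepts (hypothetical values)

# Hypothetical model pool; real deployments would list their available LLMs.
MODELS = [
    Model("small-fast-llm", 16_000),
    Model("large-context-llm", 128_000),
]

def route(prompt_tokens: int, record_tokens: int, models=MODELS):
    """Pick the smallest model whose context window fits the full request;
    if none fits, chunk the patient record for the largest model."""
    total = prompt_tokens + record_tokens
    for m in sorted(models, key=lambda m: m.context_window):
        if total <= m.context_window:
            return m, 1  # one call, no chunking needed
    biggest = max(models, key=lambda m: m.context_window)
    budget = biggest.context_window - prompt_tokens  # tokens left for data
    n_chunks = -(-record_tokens // budget)  # ceiling division
    return biggest, n_chunks
```

A short record routes to the small model in a single call, while a multi-year record that exceeds every window is split across chunked calls to the largest model, mirroring the dynamic selection-or-chunking behavior the paper describes.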
To evaluate performance, the authors created task‑specific gold‑standard datasets for each automation and used them for baseline benchmarking and ongoing monitoring whenever models were updated. For the UI, where tasks are not pre‑defined, they combined quantitative metrics (hallucination rates) with qualitative user feedback. A one‑month sample of UI‑generated summaries revealed an average of 0.73 hallucinations and 1.60 factual inaccuracies per summary, underscoring the need for clinician verification.
A structured value‑assessment framework quantified cost savings, time savings, and revenue growth. Conservative first‑year estimates projected $6 million in savings, driven by reduced manual chart review, faster documentation, and new revenue‑generating workflows. The platform currently offers access to 18 LLMs from providers such as OpenAI, Anthropic, Google, Meta, and DeepSeek, allowing the institution to remain independent of any single vendor. Continuous monitoring of system integrity (latency, errors, usage), performance (task distribution, error types), and impact (clinical benefit, API costs) is performed via telemetry and dashboards.
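The telemetry-and-dashboard monitoring described above could be sketched as a per-task roll-up of latency, error rate, and API cost. The event schema and field names here are illustrative assumptions, not ChatEHR's actual telemetry format:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical session telemetry events; field names are illustrative only.
events = [
    {"task": "summary", "latency_s": 4.2, "error": False, "api_cost_usd": 0.031},
    {"task": "summary", "latency_s": 5.1, "error": False, "api_cost_usd": 0.028},
    {"task": "qa",      "latency_s": 2.3, "error": True,  "api_cost_usd": 0.004},
]

def dashboard(events):
    """Aggregate per-task metrics (usage, latency, errors, cost) for a dashboard."""
    by_task = defaultdict(list)
    for e in events:
        by_task[e["task"]].append(e)
    return {
        task: {
            "sessions": len(rows),
            "mean_latency_s": round(mean(r["latency_s"] for r in rows), 2),
            "error_rate": sum(r["error"] for r in rows) / len(rows),
            "api_cost_usd": round(sum(r["api_cost_usd"] for r in rows), 3),
        }
        for task, rows in by_task.items()
    }
```

Feeding a stream of such events into a roll-up like this yields the task distribution, error types, and API-cost views the paper attributes to its monitoring dashboards.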
Overall, the study demonstrates that a “build‑from‑within” approach—combining responsible AI governance, real‑time data integration, flexible model routing, and rigorous post‑deployment monitoring—can deliver a scalable, safe, and financially beneficial LLM capability for health systems while preserving data sovereignty and institutional control.