Agentic AI in Healthcare & Medicine: A Seven-Dimensional Taxonomy for Empirical Evaluation of LLM-based Agents

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Large Language Model (LLM)-based agents that plan, use tools, and act have begun to shape healthcare and medicine. Reported studies demonstrate competence on tasks ranging from EHR analysis and differential diagnosis to treatment planning and research workflows. Yet the literature consists largely of either broad surveys or narrow dives into a single capability (e.g., memory, planning, reasoning), leaving healthcare work without a common frame. We address this by reviewing 49 studies through a seven-dimensional taxonomy: Cognitive Capabilities, Knowledge Management, Interaction Patterns, Adaptation & Learning, Safety & Ethics, Framework Typology, and Core Tasks & Subtasks, comprising 29 operational sub-dimensions. Using explicit inclusion and exclusion criteria and a labeling rubric (Fully Implemented, Partially Implemented, Not Implemented), we map each study to the taxonomy and report quantitative summaries of capability prevalence and co-occurrence patterns. Our empirical analysis surfaces clear asymmetries. For instance, the External Knowledge Integration sub-dimension under Knowledge Management is commonly realized (~76% Fully Implemented), whereas the Event-Triggered Activation sub-dimension under Interaction Patterns is largely absent (~92% Not Implemented) and the Drift Detection & Mitigation sub-dimension under Adaptation & Learning is rare (~98% Not Implemented). Architecturally, the Multi-Agent Design sub-dimension under Framework Typology is the dominant pattern (~82% Fully Implemented), while orchestration layers remain mostly partial. Across Core Tasks & Subtasks, information-centric capabilities lead (e.g., Medical Question Answering & Decision Support and Benchmarking & Simulation), while action- and discovery-oriented areas such as Treatment Planning & Prescription still show substantial gaps (~59% Not Implemented).


💡 Research Summary

The paper “Agentic AI in Healthcare & Medicine: A Seven-Dimensional Taxonomy for Empirical Evaluation of LLM‑based Agents” presents a systematic review of 49 recent studies (October 2023 – June 2025) that implement large‑language‑model (LLM) driven agents for clinical and biomedical tasks. Recognizing that existing surveys either provide broad overviews or focus on isolated capabilities (e.g., memory, planning), the authors introduce a comprehensive evaluation framework consisting of seven high‑level dimensions and 29 operational sub‑dimensions: Cognitive Capabilities, Knowledge Management, Interaction Patterns, Adaptation & Learning, Safety & Ethics, Framework Typology, and Core Tasks & Subtasks.

A three‑step PRISMA‑style selection process (initial search across Google Scholar, PubMed, DBLP, Scopus, arXiv; removal of duplicates, workshop abstracts, dataset‑only papers, and incomplete manuscripts; independent double‑screening by three reviewers) narrowed an initial pool of 137 papers to the final 49. Each study was then annotated using a three‑point rubric: Fully Implemented (✓), Partially Implemented (∆), and Not Implemented (✗). The rubric is explicitly defined: ✓ requires end‑to‑end implementation with demonstrable evaluation; ∆ covers cases where a capability is shown only in simulation, ablation, or limited context; ✗ denotes mere mention without concrete evidence.
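The three-point rubric lends itself to a simple machine-readable encoding. The sketch below is purely illustrative (the study names and sub-dimension keys are hypothetical, not the paper's released annotation data); it shows one way per-study labels could be recorded and queried:

```python
from enum import Enum

class Label(Enum):
    """Hypothetical encoding of the paper's three-point rubric."""
    FULL = "✓"      # end-to-end implementation with demonstrable evaluation
    PARTIAL = "∆"   # shown only in simulation, ablation, or limited context
    ABSENT = "✗"    # mentioned without concrete evidence

# One record per (study, sub-dimension) pair; entries are invented examples.
annotations = {
    ("Study A", "External Knowledge Integration"): Label.FULL,
    ("Study A", "Drift Detection & Mitigation"): Label.ABSENT,
    ("Study B", "Event-Triggered Activation"): Label.PARTIAL,
}

def labels_for(sub_dimension, annotations):
    """Collect all labels assigned to one sub-dimension across studies."""
    return [lab for (_, sd), lab in annotations.items() if sd == sub_dimension]
```

With 49 studies and 29 sub-dimensions, such a table has at most 49 × 29 = 1,421 cells, which is small enough to audit by hand after double-screening.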

Quantitative analysis reveals striking asymmetries across dimensions. External Knowledge Integration (a sub‑dimension of Knowledge Management) is realized in roughly 76% of the papers (✓), indicating that most agents successfully connect to external medical databases, guidelines, or literature APIs. In contrast, Event‑Triggered Activation (Interaction Patterns) is absent in about 92% of the studies (✗), suggesting that real‑time, event‑driven workflows (e.g., alerts for abnormal lab values) remain under‑explored. Drift Detection & Mitigation (Adaptation & Learning) is virtually nonexistent (≈98% ✗), highlighting a gap in continuous monitoring of model performance as clinical data distributions evolve.
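Prevalence figures like these follow mechanically once every study carries one label per sub-dimension: count each label and divide by the number of studies. A minimal sketch, using toy labels rather than the paper's actual data:

```python
from collections import Counter

def prevalence(labels):
    """Fraction of studies carrying each label for one sub-dimension."""
    counts = Counter(labels)
    total = len(labels)
    return {lab: counts[lab] / total for lab in counts}

# Toy labels for a hypothetical sub-dimension across 10 studies.
labels = ["✓"] * 7 + ["∆"] * 2 + ["✗"] * 1
shares = prevalence(labels)  # {'✓': 0.7, '∆': 0.2, '✗': 0.1}
```

The same per-study label table also supports the co-occurrence analysis the paper reports, e.g., by counting how often two sub-dimensions are both labeled ✓ in the same study.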

Architecturally, Multi‑Agent Design (Framework Typology) dominates, with 82% of the surveyed systems employing a collection of cooperating agents, reflecting a trend toward modular, distributed problem solving. However, orchestration layers that coordinate these agents are mostly only partially implemented, indicating that sophisticated scheduling, resource allocation, or conflict resolution mechanisms are still emerging.
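To make the architectural distinction concrete, the sketch below shows a bare-bones orchestration layer fanning a query out to cooperating specialist agents. Everything here (agent names, routing, the merge step) is a hypothetical illustration of the pattern, not any surveyed system's design:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Agent:
    """A specialist agent: a name plus a query handler (here, a stub)."""
    name: str
    handle: Callable[[str], str]

class Orchestrator:
    """Minimal orchestration layer: fans a query out and merges replies.

    A fully implemented layer would add the scheduling, resource
    allocation, and conflict-resolution logic the survey finds mostly
    absent; this sketch stops at simple fan-out.
    """
    def __init__(self, agents: List[Agent]):
        self.agents = agents

    def run(self, query: str) -> Dict[str, str]:
        return {a.name: a.handle(query) for a in self.agents}

# Hypothetical specialists standing in for LLM-backed agents.
agents = [
    Agent("triage", lambda q: f"triage view of: {q}"),
    Agent("pharmacy", lambda q: f"drug-interaction check for: {q}"),
]
result = Orchestrator(agents).run("chest pain, on warfarin")
```

Even this toy version makes the gap visible: the agents themselves are easy to assemble, while deciding which agent runs when, and whose answer wins on disagreement, is where the partial implementations stop.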

Task‑level findings show that information‑centric capabilities—Medical Question Answering & Decision Support and Benchmarking & Simulation—are the most mature, while action‑oriented functions such as Treatment Planning & Prescription lag behind (≈59 % ✗). This pattern underscores that current LLM agents excel at retrieval‑grounded advising but have not yet achieved reliable autonomous prescribing or care‑path generation.

Safety & Ethics, while acknowledged as essential, receives limited concrete treatment in the surveyed literature; most implementations fall into the partial category, implying that systematic risk assessment, regulatory compliance, and privacy safeguards are still nascent.

The authors argue that their taxonomy provides a transparent baseline for cross‑study comparison and a roadmap for future research. By pinpointing where capabilities are abundant (knowledge integration, multi‑agent orchestration) and where they are scarce (real‑time activation, drift mitigation, treatment execution, robust safety frameworks), the paper guides researchers, clinicians, and policymakers toward the most pressing development priorities.

In conclusion, the study demonstrates that LLM‑based agents have reached a level of maturity sufficient for augmenting clinical decision support and simulation, yet they require substantial advances in adaptive learning, real‑time interaction, and rigorous safety/ethics engineering before they can be trusted as dependable, autonomous components of healthcare delivery. The seven‑dimensional taxonomy introduced here offers a scalable, reproducible tool for tracking progress and ensuring that future innovations address the critical gaps identified.

