Mapis: A Knowledge-Graph Grounded Multi-Agent Framework for Evidence-Based PCOS Diagnosis

Polycystic Ovary Syndrome (PCOS) is a significant public health issue affecting 10% of reproductive-aged women, which makes effective diagnostic tools critically important. Previous machine learning and deep learning detection tools are constrained by their reliance on large-scale labeled data and a lack of interpretability. Although multi-agent systems have demonstrated robust capabilities, their potential for PCOS detection remains largely unexplored. Existing medical multi-agent frameworks are predominantly designed for general medical tasks and suffer from insufficient domain integration and a lack of domain-specific knowledge. To address these challenges, we propose Mapis, the first knowledge-grounded multi-agent framework explicitly designed for guideline-based PCOS diagnosis. Specifically, it builds the 2023 International Guideline into a structured collaborative workflow that simulates the clinical diagnostic process. It decouples complex diagnostic tasks across specialized agents: a gynecological endocrine agent and a radiology agent collaborate to verify the inclusion criteria, while an exclusion agent strictly rules out other causes. Furthermore, we construct a comprehensive PCOS knowledge graph to ensure verifiable, evidence-based decision-making. Extensive experiments on public benchmarks and specialized clinical datasets against nine diverse baselines demonstrate that Mapis significantly outperforms competitive methods. On the clinical dataset, it surpasses traditional machine learning models by 13.56%, single-agent methods by 6.55%, and previous medical multi-agent systems by 7.05% in accuracy.


💡 Research Summary

The paper introduces Mapis, a novel multi‑agent framework for diagnosing Polycystic Ovary Syndrome (PCOS) that is explicitly grounded in the 2023 International Guideline and a dedicated PCOS knowledge graph (KG). The authors argue that existing machine‑learning and deep‑learning approaches suffer from heavy reliance on large labeled datasets, limited interpretability, and an inability to encode the step‑wise, guideline‑driven reasoning required for PCOS diagnosis. To overcome these limitations, Mapis decomposes the diagnostic workflow into five cooperating modules:
(1) data preprocessing, which uses large language models (LLMs) to convert raw electronic health records (EHRs) into a standardized JSON representation;
(2) KG construction, which transforms guideline text and recent literature into a structured triple‑store through semantic chunking, entity‑relation extraction, and sub‑graph merging, linking concepts to UMLS/SNOMED CT for standardization;
(3) a three‑step inclusion‑criteria assessment orchestrated by a coordinator agent that dispatches tasks to a gynecological‑endocrine agent (evaluating irregular cycles and clinical or biochemical hyperandrogenism) and a radiology agent (assessing polycystic ovarian morphology on ultrasound);
(4) an exclusion agent that systematically rules out mimicking conditions (e.g., congenital adrenal hyperplasia, thyroid disorders, hyperprolactinemia) by cross‑referencing the KG; and
(5) a report‑generation agent that synthesizes all evidence into a clinician‑readable diagnostic report with recommendations.
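The KG construction and exclusion cross-referencing described above can be illustrated with a minimal sketch, assuming a simple in-memory (subject, relation, object) store rather than whatever graph backend the authors actually use. The class names, the relation labels `requires_exclusion_of` and `ruled_out_by`, and the `source` provenance strings are hypothetical, and UMLS/SNOMED CT concept linking is omitted. The screening tests listed (17-hydroxyprogesterone, TSH, prolactin) are commonly used checks for these conditions and are not taken from the paper's KG.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str
    source: str  # hypothetical provenance label, e.g. a guideline chunk id


class TripleStore:
    """Minimal in-memory triple store indexed by (subject, relation)."""

    def __init__(self) -> None:
        self._index: dict[tuple[str, str], list[Triple]] = defaultdict(list)

    def merge(self, subgraph: list[Triple]) -> None:
        # Sub-graph merging: skip triples that are already stored, so
        # extracting the same facts from overlapping chunks adds no duplicates.
        for t in subgraph:
            bucket = self._index[(t.subject, t.relation)]
            if t not in bucket:
                bucket.append(t)

    def query(self, subject: str, relation: str) -> list[Triple]:
        # .get() avoids creating empty buckets for unknown keys.
        return list(self._index.get((subject, relation), []))


def exclusion_checklist(store: TripleStore, disease: str) -> dict[str, list[str]]:
    """Map each condition that must be ruled out to the tests that rule it out."""
    return {
        t.obj: [x.obj for x in store.query(t.obj, "ruled_out_by")]
        for t in store.query(disease, "requires_exclusion_of")
    }


# Illustrative triples only; not the paper's knowledge graph.
SUBGRAPH = [
    Triple("PCOS", "requires_exclusion_of", "congenital adrenal hyperplasia", "chunk-1"),
    Triple("PCOS", "requires_exclusion_of", "thyroid disease", "chunk-1"),
    Triple("PCOS", "requires_exclusion_of", "hyperprolactinemia", "chunk-1"),
    Triple("congenital adrenal hyperplasia", "ruled_out_by", "17-hydroxyprogesterone", "chunk-2"),
    Triple("thyroid disease", "ruled_out_by", "TSH", "chunk-2"),
    Triple("hyperprolactinemia", "ruled_out_by", "prolactin", "chunk-2"),
]

store = TripleStore()
store.merge(SUBGRAPH)
store.merge(SUBGRAPH)  # merging the same sub-graph twice adds nothing
print(exclusion_checklist(store, "PCOS"))
```

Indexing on (subject, relation) keeps the exclusion lookup to two dictionary accesses per condition; a production KG would more likely sit in a dedicated graph database.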

The system’s architecture mirrors the Rotterdam criteria: at least two of the three cardinal features (ovulatory dysfunction, clinical or biochemical hyperandrogenism, and polycystic ovarian morphology) must be present, and other causes must be excluded. The coordinator maintains a state machine that advances only when the required evidence is present, ensuring strict adherence to the guideline. The KG serves as an external deterministic memory that grounds each agent’s reasoning and is intended to reduce the hallucination risk typical of LLM‑only solutions.
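The Rotterdam rule and the coordinator's evidence gating can be sketched as a small state machine. This is a hedged illustration, not the authors' implementation: the `Stage` names, the `CaseState` fields, and the convention that a finding left as `None` means "not yet reported" are assumptions. Each finding would be supplied by the responsible agent (gynecological-endocrine agent for ovulatory dysfunction and hyperandrogenism, radiology agent for polycystic ovarian morphology, exclusion agent for other causes).

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Stage(Enum):
    INCLUSION = auto()
    EXCLUSION = auto()
    REPORT = auto()
    DONE = auto()


@dataclass
class CaseState:
    # None means the responsible agent has not yet reported this finding.
    ovulatory_dysfunction: Optional[bool] = None   # gynecological-endocrine agent
    hyperandrogenism: Optional[bool] = None        # gynecological-endocrine agent
    polycystic_morphology: Optional[bool] = None   # radiology agent
    other_causes_excluded: Optional[bool] = None   # exclusion agent
    stage: Stage = Stage.INCLUSION
    conclusion: Optional[str] = None


def step(state: CaseState) -> CaseState:
    """Advance one stage if the evidence that stage needs is available."""
    if state.stage is Stage.INCLUSION:
        findings = (state.ovulatory_dysfunction, state.hyperandrogenism,
                    state.polycystic_morphology)
        positives = sum(f is True for f in findings)
        negatives = sum(f is False for f in findings)
        if positives >= 2:          # Rotterdam: at least two of three features
            state.stage = Stage.EXCLUSION
        elif negatives >= 2:        # two positives can no longer be reached
            state.conclusion = "inclusion criteria not met"
            state.stage = Stage.DONE
        # otherwise stay in INCLUSION and wait for more evidence
    elif state.stage is Stage.EXCLUSION:
        if state.other_causes_excluded is not None:
            state.conclusion = ("PCOS" if state.other_causes_excluded
                                else "features explained by another condition")
            state.stage = Stage.REPORT
    elif state.stage is Stage.REPORT:
        state.stage = Stage.DONE    # report generation would happen here
    return state


case = CaseState(ovulatory_dysfunction=True, polycystic_morphology=True)
step(case)                          # two positives: moves to EXCLUSION
case.other_causes_excluded = True
step(case)                          # exclusion confirmed: moves to REPORT
print(case.stage, case.conclusion)
```

Counting explicit `True` and `False` values separately lets the coordinator advance as soon as two criteria are met, or stop as soon as two are ruled out, without waiting for the third finding.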

Experimental evaluation was performed on two datasets: a public PCOS benchmark containing structured clinical data and ultrasound images, and a private clinical cohort of 1,200 patients collected from Shenzhen People’s Hospital. Mapis was compared against nine baselines, including traditional machine‑learning classifiers (SVM, Random Forest), single‑agent LLM prompts, and existing medical multi‑agent platforms such as MedAgents and MDAgents. On the private clinical set, Mapis achieved a 13.56 percentage‑point increase in accuracy over the best traditional ML model, a 6.55 pp gain over the best single‑agent LLM, and a 7.05 pp improvement over the strongest prior multi‑agent system. Precision, recall, and F1 scores also showed statistically significant gains, with the most notable reduction in false‑positive diagnoses arising from the dedicated exclusion stage.
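The abstract reports the gains with a "%" sign, while this summary reads them as percentage points (absolute differences in accuracy). The two readings give different numbers for the same pair of systems. The sketch below computes the standard confusion-matrix metrics and both kinds of gain. The counts are toy values chosen for illustration, not results from the paper, and the helper names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # PCOS cases correctly diagnosed
    fp: int  # non-PCOS cases wrongly diagnosed as PCOS
    fn: int  # PCOS cases missed
    tn: int  # non-PCOS cases correctly ruled out


def metrics(c: Confusion) -> dict[str, float]:
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (c.tp + c.tn) / total, "precision": precision,
            "recall": recall, "f1": f1}


def gain_pp(new: float, old: float) -> float:
    """Absolute gain in percentage points."""
    return 100 * (new - old)


def gain_relative(new: float, old: float) -> float:
    """Relative gain as a percentage of the baseline value."""
    return 100 * (new - old) / old


# Toy counts (100 cases each); the second system has fewer false positives.
baseline = metrics(Confusion(tp=40, fp=10, fn=10, tn=40))
improved = metrics(Confusion(tp=40, fp=2, fn=10, tn=48))
print(round(gain_pp(improved["accuracy"], baseline["accuracy"]), 2),
      round(gain_relative(improved["accuracy"], baseline["accuracy"]), 2))
```

With these toy counts, accuracy rises from 0.80 to 0.88: a gain of 8 percentage points, but 10% relative to the baseline. A bare "%" figure is therefore ambiguous unless the paper states which convention it uses. The example also shows how cutting false positives alone raises both precision and accuracy while recall stays unchanged.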

Beyond raw performance, the authors provide extensive traceability analyses: interaction logs between agents and KG query histories are recorded, enabling clinicians to audit the decision path and verify that each conclusion is backed by specific guideline citations. This addresses the critical need for explainability in clinical AI.
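The traceability described above amounts to an audit trail of agent actions. The following is a minimal sketch, assuming each agent message or KG query is logged as a record with optional guideline citations. The `AuditTrail` class, the action labels, and the citation strings are hypothetical. The check flags any conclusion that carries no citation, which is the property a clinician auditing the decision path would want to confirm.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class LogEntry:
    agent: str       # hypothetical agent label, e.g. "radiology"
    action: str      # "kg_query", "message", or "conclusion"
    content: str
    citations: list[str] = field(default_factory=list)


@dataclass
class AuditTrail:
    entries: list[LogEntry] = field(default_factory=list)

    def record(self, agent: str, action: str, content: str,
               citations: list[str] | None = None) -> None:
        # Only appends; the class offers no method to modify past entries.
        self.entries.append(LogEntry(agent, action, content, list(citations or [])))

    def uncited_conclusions(self) -> list[LogEntry]:
        """Conclusions with no supporting citation, i.e. audit failures."""
        return [e for e in self.entries
                if e.action == "conclusion" and not e.citations]

    def to_json(self) -> str:
        return json.dumps([asdict(e) for e in self.entries], indent=2)


trail = AuditTrail()
trail.record("radiology", "kg_query", "PCOM ultrasound threshold")
trail.record("radiology", "conclusion", "PCOM present", ["guideline-ref-A"])
trail.record("exclusion", "conclusion", "hyperprolactinemia excluded")
print(len(trail.uncited_conclusions()))
```

Serializing the trail to JSON keeps it readable alongside the standardized JSON patient records produced in preprocessing.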

Limitations acknowledged include the manual effort required to build and maintain the domain‑specific KG, sensitivity to the quality and completeness of input EHRs, and the current focus on PCOS, which necessitates re‑engineering of the workflow and KG for other diseases. Future work will explore automated KG updates, multimodal integration of raw imaging data, and scaling the approach to other endocrine or metabolic disorders that also rely on strict guideline‑based diagnostic algorithms.

In summary, Mapis demonstrates that a knowledge‑graph‑anchored, multi‑agent architecture can faithfully replicate complex clinical reasoning, achieve superior diagnostic accuracy without large labeled corpora, and provide transparent, evidence‑based outputs. This represents a significant step toward trustworthy AI assistance in the diagnosis of guideline‑driven conditions such as PCOS.

