Evaluation of OncoTimia: An LLM-based system for supporting tumour boards
Multidisciplinary tumour boards (MDTBs) play a central role in oncology decision-making but rely on manual processes to structure large volumes of heterogeneous clinical information, resulting in a substantial documentation burden. In this work, we present ONCOTIMIA, a modular and secure clinical tool designed to integrate generative artificial intelligence (GenAI) into oncology workflows, and evaluate its application to the automatic completion of lung cancer tumour board forms using large language models (LLMs). The system combines a multi-layer data lake, hybrid relational and vector storage, retrieval-augmented generation (RAG) and a rule-driven adaptive form model to transform unstructured clinical documentation into structured, standardised tumour board records. We assess the performance of six LLMs deployed through AWS Bedrock on ten lung cancer cases, measuring both form-completion accuracy and end-to-end latency. The results demonstrate strong performance across models, with the best-performing configuration achieving 80% correct field completion and clinically acceptable response times for most LLMs. Larger and more recent models achieve the best accuracies without incurring prohibitive latency. These findings provide empirical evidence that LLM-assisted form autocompletion is technically feasible and operationally viable in multidisciplinary lung cancer workflows, and support its potential to significantly reduce documentation burden while preserving data quality.
💡 Research Summary
The paper presents OncoTimia, a modular, secure clinical platform that integrates generative AI—specifically large language models (LLMs)—into multidisciplinary tumour board (MDTB) workflows for lung cancer. The authors identify the heavy documentation burden inherent in MDTBs, where heterogeneous data (electronic health records, radiology reports, pathology results, molecular assays, and treatment histories) are scattered across multiple systems and must be manually extracted, normalized, and entered into a standardized case‑review form. To address this, OncoTimia combines a three‑tier data lake (landing, staging, refined), hybrid relational‑vector storage, retrieval‑augmented generation (RAG), and a rule‑driven adaptive form model.
System Architecture
- Data Ingestion Layer: Accepts raw PDFs, DOCX, and text files, validates formats, and generates metadata.
- ETL Pipeline: Uses LangChain loaders (Docx2txtLoader, PyPDFLoader) to extract content, clean text, tokenize, lemmatize, and stem.
- Storage Subsystem:
  - Relational Store – PostgreSQL holds structured variables (demographics, ECOG, staging, treatment dates).
  - Vector Store – Qdrant stores semantic embeddings generated by the Nomic model, enabling similarity search for RAG.
- Backend Services (Micro‑services): Provide automatic summarisation, form autocompletion, and a clinical assistant for question answering.
- LLM Abstraction Layer: Translates clinical requests into model‑compliant prompts, enforces safety checks, and normalises outputs to a predefined JSON schema.
- Reverse Proxy: Handles routing, load‑balancing, rate‑limiting, and audit logging, ensuring secure exposure of APIs.
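The abstraction layer's output normalisation can be sketched as follows. This is a minimal stdlib-only illustration, not the paper's implementation: the field names, allowed values, and drop-on-violation policy are assumptions for the example.

```python
import json

# Hypothetical field schema for the lung-cancer MDTB form
# (field names and allowed values are illustrative, not from the paper).
FORM_SCHEMA = {
    "smoking_status": {"type": str, "allowed": {"never", "former", "current"}},
    "ecog": {"type": int, "allowed": set(range(0, 5))},
    "tnm_stage": {"type": str, "allowed": None},  # free-text, type-checked only
}

def normalise_llm_output(raw: str) -> dict:
    """Parse the model's JSON reply and keep only schema-compliant fields.

    Fields that are missing, of the wrong type, or outside the allowed
    value set are dropped rather than written to the relational store.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # unparseable reply: write nothing
    clean = {}
    for field, spec in FORM_SCHEMA.items():
        value = data.get(field)
        if not isinstance(value, spec["type"]):
            continue
        if spec["allowed"] is not None and value not in spec["allowed"]:
            continue
        clean[field] = value
    return clean

reply = '{"smoking_status": "former", "ecog": 1, "tnm_stage": "IIIA", "extra": "x"}'
print(normalise_llm_output(reply))
# {'smoking_status': 'former', 'ecog': 1, 'tnm_stage': 'IIIA'}
```

Dropping invalid fields (rather than guessing a correction) keeps the relational store conservative: a clinician reviewing the form sees a blank field instead of a silently coerced value.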
Adaptive Form Model
The lung‑cancer MDTB form is organized into seven logical blocks. Block 1 gathers core data (smoking status, comorbidities, ECOG, imaging summaries, histopathology, molecular biomarkers, staging). Based on values in Block 1, conditional rules activate subsequent blocks (e.g., prior neoplasms, treatment refusal, recurrence, re‑biopsy, radiotherapy, chemotherapy). This rule‑driven flow reduces unnecessary data entry and aligns the form with real‑world clinical decision pathways.
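The rule-driven activation described above can be expressed as predicates over the Block 1 answers. The block names and trigger conditions below are illustrative assumptions; the paper does not publish the exact rule set.

```python
# Conditional activation rules: each follow-up block is gated by a predicate
# over the Block 1 answers (block names and conditions are hypothetical).
BLOCK_RULES = {
    "prior_neoplasms": lambda b1: b1.get("history_of_neoplasm") is True,
    "re_biopsy":       lambda b1: b1.get("biopsy_conclusive") is False,
    "chemotherapy":    lambda b1: b1.get("tnm_stage", "").startswith(("III", "IV")),
}

def active_blocks(block1: dict) -> list:
    """Return the follow-up form blocks enabled by the Block 1 answers."""
    return [name for name, rule in BLOCK_RULES.items() if rule(block1)]

case = {"history_of_neoplasm": True, "biopsy_conclusive": True, "tnm_stage": "IIIA"}
print(active_blocks(case))  # ['prior_neoplasms', 'chemotherapy']
```

Because inactive blocks are never presented to the model, the LLM is only ever asked about fields that are clinically relevant for the case at hand.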
LLM Integration and RAG
Six state‑of‑the‑art LLMs available through AWS Bedrock (including Anthropic Claude‑2, Meta Llama 2‑70B, Mistral‑Large) were evaluated. The abstraction layer constructs prompts that embed retrieved context from the vector store (RAG) and appends the adaptive form schema. Model outputs are parsed, validated, and written back to the relational database as completed form fields.
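The retrieve-then-prompt flow can be sketched as below. This is a toy stand-in: word overlap replaces Qdrant's embedding similarity, and the prompt wording and function names are assumptions, not the system's actual templates.

```python
def retrieve(query: str, documents: list, k: int = 2) -> list:
    """Toy stand-in for the Qdrant similarity search: rank candidate
    chunks by word overlap with the query instead of embedding distance."""
    q = set(query.lower().split())
    scored = sorted(documents, key=lambda d: -len(q & set(d.lower().split())))
    return scored[:k]

def build_prompt(fields: list, retrieved_chunks: list, schema_json: str) -> str:
    """Assemble a form-completion prompt: retrieved clinical context first,
    then the adaptive form schema, then the completion instruction."""
    context = "\n---\n".join(retrieved_chunks)
    return (
        "You are assisting a lung-cancer tumour board.\n"
        f"Clinical context:\n{context}\n\n"
        "Answer ONLY with JSON matching this schema:\n"
        f"{schema_json}\n\n"
        f"Fields to complete: {', '.join(fields)}"
    )

docs = [
    "CT scan shows a 3 cm lesion in the right upper lobe",
    "ECOG performance status 1, former smoker",
    "Pathology: adenocarcinoma, PD-L1 60%",
]
prompt = build_prompt(["ecog"], retrieve("ECOG performance status", docs, k=1),
                      '{"ecog": "integer 0-4"}')
print(prompt)
```

Grounding the prompt in retrieved passages is what lets the model fill fields from the patient's own record rather than from its parametric knowledge.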
Evaluation Methodology
Ten synthetic but clinically realistic lung‑cancer cases were generated, each containing a full set of narrative notes and structured metadata. For each case, the system was run with each LLM, measuring:
- Form‑completion accuracy – proportion of fields where the model’s value matched the ground‑truth reference.
- End‑to‑end latency – total time from case ingestion to completed form, encompassing ETL, retrieval, model inference, and post‑processing.
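The two metrics above are straightforward to compute; a minimal sketch (exact-match scoring and per-stage timing are assumptions about how the paper operationalised them):

```python
def completion_accuracy(predicted: dict, reference: dict) -> float:
    """Proportion of reference fields the model filled with the exact
    ground-truth value; missing or mismatched fields count as errors."""
    correct = sum(1 for field, value in reference.items()
                  if predicted.get(field) == value)
    return correct / len(reference)

def end_to_end_latency(stage_seconds: dict) -> float:
    """Total pipeline time: ETL + retrieval + inference + post-processing."""
    return sum(stage_seconds.values())

reference = {"ecog": 1, "tnm_stage": "IIIA", "smoking_status": "former",
             "histology": "adenocarcinoma", "pdl1": ">=50%"}
predicted = {"ecog": 1, "tnm_stage": "IIIA", "smoking_status": "former",
             "histology": "adenocarcinoma"}  # one field left blank
print(completion_accuracy(predicted, reference))  # 0.8
```

Exact matching is a strict criterion: a semantically correct but differently formatted value (e.g. "Stage IIIA" vs "IIIA") would score as an error unless values are normalised first.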
Results
- Average accuracy across all models: 68%.
- Best‑performing configuration (Claude‑2) achieved 80% correct field completion.
- Latency was clinically acceptable: most models responded within 2–4 seconds, with the largest model staying under 6 seconds.
- Larger, newer models consistently outperformed smaller ones in accuracy, while latency penalties remained modest.
Discussion
The study demonstrates that a hybrid storage + RAG architecture can effectively supply contextual evidence to LLMs, improving their ability to generate structured, guideline‑aligned outputs. The rule‑driven adaptive form ensures that only relevant fields are presented to the model, mitigating hallucination risk and focusing inference on clinically pertinent information. However, the authors acknowledge residual risks: occasional hallucinations, misinterpretation of nuanced guideline language, and the necessity for human oversight before final submission. Security and compliance were addressed through encryption, role‑based access control, and comprehensive audit logs, yet full regulatory certification (e.g., HIPAA, GDPR) would require additional validation.
Limitations and Future Work
- Evaluation used synthetic cases; real‑world pilot studies are needed to assess robustness against noisy EHR data.
- Current LLMs were used off‑the‑shelf; domain‑specific fine‑tuning or prompt‑engineering could further boost accuracy.
- Expansion to other tumour types and integration with clinical decision support (e.g., guideline‑based treatment recommendations) are planned.
- Exploration of federated learning or on‑premise model deployment could address data‑privacy concerns in multi‑institution settings.
Conclusion
OncoTimia provides empirical evidence that LLM‑assisted autocompletion of MDTB forms is technically feasible and operationally viable. By delivering up to 80 % correct field completion with sub‑6‑second response times, the system promises a substantial reduction in documentation workload while preserving data quality—an important step toward more efficient, AI‑augmented oncology care.