RareCollab -- An Agentic System Diagnosing Mendelian Disorders with Integrated Phenotypic and Molecular Evidence

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Millions of children worldwide are affected by severe rare Mendelian disorders, yet exome and genome sequencing still fail to provide a definitive molecular diagnosis for a large fraction of patients, prolonging the diagnostic odyssey. Bridging this gap increasingly requires transitioning from DNA-only interpretation to multi-modal diagnostic reasoning that combines genomic data, transcriptomic sequencing (RNA-seq), and phenotype information; however, computational frameworks that coherently integrate these signals remain limited. Here we present RareCollab, an agentic diagnostic framework that pairs a stable quantitative Diagnostic Engine with Large Language Model (LLM)-based specialist modules that produce high-resolution, interpretable assessments from transcriptomic signals, phenotypes, variant databases, and the literature to prioritize potential diagnostic variants. In a rigorously curated benchmark of Undiagnosed Diseases Network (UDN) patients with paired genomic and transcriptomic data, RareCollab achieved 77% top-5 diagnostic accuracy and improved top-1 to top-5 accuracy by ~20% over widely used variant-prioritization approaches. RareCollab illustrates how modular artificial intelligence (AI) can operationalize multi-modal evidence for accurate, scalable rare disease diagnosis, offering a promising path toward reducing the diagnostic odyssey for affected families.

💡 Research Summary

RareCollab is an agentic diagnostic framework that integrates genomic, transcriptomic, and phenotypic data to prioritize pathogenic variants in Mendelian disorders. The system combines a stable, DNA‑centric “Diagnostic Engine” with four Large Language Model (LLM)‑based specialist modules—RNA Lab, Phenotype Lab, Database Lab, and Literature Lab—each of which extracts high‑resolution, interpretable evidence from its respective modality.

The Diagnostic Engine is a mixture‑of‑experts deep‑learning model trained on large whole‑exome and whole‑genome sequencing (WES/WGS) cohorts. It evaluates each candidate variant across eight evidence domains: population allele frequency, variant impact (coding, splice, frameshift), in‑silico functional scores, prior database annotations, gene constraint, evolutionary conservation, phenotype similarity (derived from HPO/OMIM), and inheritance pattern. Each domain is processed by a dedicated expert network, producing domain‑specific scores that are aggregated into an overall ranking. These scores serve as a quantitative backbone that is stable, reproducible, and directly comparable across patients.
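
The aggregation step described above can be illustrated with a minimal sketch. The summary does not state how the domain-specific scores are combined, so the softmax-weighted sum below is an assumption, and the domain keys, `gate_logits`, and function names are hypothetical rather than taken from the paper.

```python
import math

# Hypothetical keys for the eight evidence domains listed in the summary.
DOMAINS = [
    "allele_frequency", "variant_impact", "in_silico", "database_annotation",
    "gene_constraint", "conservation", "phenotype_similarity", "inheritance",
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(domain_scores, gate_logits):
    """Combine per-domain expert scores into one score.

    Assumption: a softmax-gated weighted sum; the actual model may differ.
    """
    weights = softmax([gate_logits[d] for d in DOMAINS])
    return sum(w * domain_scores[d] for w, d in zip(weights, DOMAINS))


def rank_variants(variants, gate_logits):
    """Return (variant_id, score) pairs sorted from highest to lowest score."""
    scored = [(vid, aggregate(s, gate_logits)) for vid, s in variants.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With uniform gate logits the aggregate reduces to the plain mean of the eight domain scores, which gives a simple sanity check for the sketch.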

The four LLM Labs operate in parallel to enrich the DNA‑only ranking with modality‑specific judgments. The RNA Lab receives quantitative outlier metrics (aberrant expression, splicing, allele‑specific expression) from upstream pipelines and returns a calibrated “RNA‑level support score.” The Phenotype Lab computes weighted similarity between the patient’s HPO terms and gene‑associated phenotypes, providing axis‑wise explanations that highlight which clinical features are well‑explained. The Database Lab harmonizes ClinVar and other variant repositories, taking into account submitter credibility, classification criteria, and recency to produce a quality‑aware variant assessment. The Literature Lab consists of a retrieval agent that gathers relevant publications and a synthesis agent that evaluates how closely reported human cases or model‑organism phenotypes match the patient’s presentation, thereby capturing emerging gene‑disease links not yet reflected in curated databases.
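
As a rough illustration of the weighted phenotype matching attributed to the Phenotype Lab, the sketch below scores the share of weighted patient terms that a gene's known phenotypes explain and reports which terms are explained. The real module is LLM-based, so this overlap formula is a simplified stand-in; the function name, the per-term `weights`, and the use of plain term names instead of HPO identifiers are all assumptions.

```python
def weighted_phenotype_match(patient_terms, gene_terms, weights):
    """Score how well a gene's phenotypes explain the patient's terms.

    patient_terms: list of patient phenotype labels (HPO IDs in practice).
    gene_terms: set of phenotype labels associated with the gene.
    weights: hypothetical per-term importance; missing terms default to 1.0.
    """
    explained = [t for t in patient_terms if t in gene_terms]
    unexplained = [t for t in patient_terms if t not in gene_terms]
    total = sum(weights.get(t, 1.0) for t in patient_terms)
    matched = sum(weights.get(t, 1.0) for t in explained)
    score = matched / total if total else 0.0
    # Returning both lists supports a feature-by-feature explanation.
    return {"score": score, "explained": explained, "unexplained": unexplained}
```

For example, a patient with seizure (weight 2.0), microcephaly, and hypotonia (weight 1.0 each), matched against a gene associated with seizure and microcephaly, would score 3.0 / 4.0 = 0.75, with hypotonia flagged as unexplained.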

An Integration Engine combines the DNA‑centric scores with the LLM Lab outputs. It assigns each candidate to one of four tiers based on two clinically motivated axes: phenotype concordance and strength of pathogenicity evidence. Within each tier, the original Diagnostic Engine ranking is preserved, ensuring a stable ordering while allowing flexible, human‑in‑the‑loop adjustments of tier thresholds or evidence‑combination rules. Finally, a Confidence Reviewer module examines the full evidence package and assigns calibrated confidence levels to each candidate diagnosis, mirroring the certainty categories used by the Undiagnosed Diseases Network (UDN) curators.
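
The tiering logic can be sketched as follows. The summary states only that two axes define four tiers and that the Diagnostic Engine order is kept within a tier; the threshold values, the relative order of the two middle tiers, and all names below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    variant_id: str
    engine_rank: int               # 1 = best rank from the Diagnostic Engine
    phenotype_concordance: float   # hypothetical 0..1 score
    pathogenicity_evidence: float  # hypothetical 0..1 score


def assign_tier(c, pheno_threshold=0.5, patho_threshold=0.5):
    """Map the two axes onto tiers 1 (strongest) to 4 (weakest).

    Assumption: pathogenicity-only support outranks phenotype-only support.
    """
    pheno_ok = c.phenotype_concordance >= pheno_threshold
    patho_ok = c.pathogenicity_evidence >= patho_threshold
    if pheno_ok and patho_ok:
        return 1
    if patho_ok:
        return 2
    if pheno_ok:
        return 3
    return 4


def integrate(candidates, **thresholds):
    """Order by tier first, then by the original engine rank within a tier."""
    return sorted(
        candidates,
        key=lambda c: (assign_tier(c, **thresholds), c.engine_rank),
    )
```

Because the sort key falls back to `engine_rank`, changing a threshold moves candidates between tiers but never reorders candidates that remain in the same tier, which is the stability property the text describes.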

Performance was evaluated on a rigorously curated benchmark of 131 UDN probands with paired WES/WGS, blood RNA‑seq (and fibroblast RNA‑seq for many), and detailed phenotypic descriptions. RareCollab achieved 46 % top‑1 recall and 77 % top‑5 recall, outperforming AI‑MARRVEL by 8–18 percentage points and Exomiser by 20–32 percentage points across the top‑1 to top‑5 ranks. When stratified by UDN certainty, the system ranked 54 % of “Certain” diagnoses at the top position and 82 % within the top five, while still recovering 75 % of “Highly Likely” cases within the top five—demonstrating robustness even for atypical or incompletely documented presentations.
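
For readers unfamiliar with the metric, top‑k recall here is the fraction of probands whose diagnostic variant appears at rank k or better. A minimal sketch, with a hypothetical function name and toy input rather than the benchmark data:

```python
def top_k_recall(diagnostic_ranks, k):
    """Fraction of cases whose diagnostic variant is ranked within the top k.

    diagnostic_ranks: one entry per proband, giving the 1-based rank of the
    true diagnostic variant, or None if it was not ranked at all.
    """
    if not diagnostic_ranks:
        raise ValueError("at least one case is required")
    hits = sum(1 for r in diagnostic_ranks if r is not None and r <= k)
    return hits / len(diagnostic_ranks)


# Toy example with five made-up cases (not data from the paper).
toy_ranks = [1, 3, None, 7, 2]
top1 = top_k_recall(toy_ranks, 1)  # 1 of 5 cases
top5 = top_k_recall(toy_ranks, 5)  # 3 of 5 cases
```

Counting unranked cases as misses keeps the denominator at the full cohort size, so the metric cannot be inflated by dropping hard cases.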

Ablation analyses showed that the LLM Labs contribute substantially: the RNA Lab provided transcriptomic support for 41 % of diagnostic variants; the Phenotype Lab confirmed phenotype match for 86 %; the Database Lab contributed to 65 %; and the Literature Lab rescued 13 variants lacking curated phenotype data. Overall, about 95 % of diagnostic variants received a coherent genotype‑phenotype explanation from the combined system. Moreover, the Confidence Reviewer’s assignments aligned closely with those of expert curators: over half of the cases matched exactly, and only a small fraction (≈10 %) were rated more confidently by the system than by human reviewers, indicating that the system does not over‑inflate certainty.
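
The comparison between the Confidence Reviewer and human curators can be expressed as a simple agreement breakdown over ordered certainty levels. The sketch below is illustrative only: the summary names the "Certain" and "Highly Likely" categories, while the third level, the ordering list, and the function name are assumptions.

```python
# Ordered from least to most confident. "Tentative" is an assumed label;
# only "Highly Likely" and "Certain" appear in the summary.
LEVELS = ["Tentative", "Highly Likely", "Certain"]


def compare_confidence(system_labels, curator_labels):
    """Return the fractions of exact, over-confident, and under-confident calls."""
    if len(system_labels) != len(curator_labels) or not system_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    counts = {"exact": 0, "over": 0, "under": 0}
    for sys_label, cur_label in zip(system_labels, curator_labels):
        diff = LEVELS.index(sys_label) - LEVELS.index(cur_label)
        if diff == 0:
            counts["exact"] += 1
        elif diff > 0:
            counts["over"] += 1   # system more confident than the curator
        else:
            counts["under"] += 1  # system less confident than the curator
    n = len(system_labels)
    return {name: count / n for name, count in counts.items()}
```

Reporting the over-confident fraction separately matters clinically, since an inflated certainty label is generally a costlier error than a cautious one.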

Limitations include the opacity of LLM reasoning (prompt design and model parameters are not fully disclosed), potential bias when RNA‑seq is limited to blood (missing tissue‑specific expression abnormalities), and the need for broader validation across diverse clinical settings. Future work should explore integration of additional omics layers (e.g., proteomics, epigenomics), improve LLM interpretability, and embed the framework directly into real‑time clinical workflows.

In summary, RareCollab introduces a novel “DNA‑centric + LLM‑augmented” paradigm for rare disease diagnosis. By modularly combining quantitative DNA evidence with high‑resolution, language‑model‑driven assessments of RNA, phenotype, databases, and literature, it achieves markedly higher diagnostic recall while preserving interpretability and allowing clinician‑driven customization. This approach represents a significant step toward reducing the diagnostic odyssey for families affected by rare Mendelian disorders.
