iSight: Towards expert-AI co-assessment for improved immunohistochemistry staining interpretation


Immunohistochemistry (IHC) provides information on protein expression in tissue sections and is commonly used to support pathology diagnosis and disease triage. While AI models for H&E-stained slides show promise, their applicability to IHC is limited due to domain-specific variations. Here we introduce HPA10M, a dataset that contains 10,495,672 IHC images from the Human Protein Atlas with comprehensive metadata included, and encompasses 45 normal tissue types and 20 major cancer types. Based on HPA10M, we trained iSight, a multi-task learning framework for automated IHC staining assessment. iSight combines visual features from whole-slide images with tissue metadata through a token-level attention mechanism, simultaneously predicting staining intensity, location, quantity, tissue type, and malignancy status. On held-out data, iSight achieved 85.5% accuracy for location, 76.6% for intensity, and 75.7% for quantity, outperforming fine-tuned foundation models (PLIP, CONCH) by 2.5–10.2%. In addition, iSight demonstrates well-calibrated predictions with expected calibration errors of 0.0150–0.0408. Furthermore, in a user study with eight pathologists evaluating 200 images from two datasets, iSight outperformed initial pathologist assessments on the held-out HPA dataset (79% vs 68% for location, 70% vs 57% for intensity, 68% vs 52% for quantity). Inter-pathologist agreement also improved after AI assistance in both held-out HPA (Cohen's κ increased from 0.63 to 0.70) and Stanford TMAD datasets (from 0.74 to 0.76), suggesting expert–AI co-assessment can improve IHC interpretation. This work establishes a foundation for AI systems that can improve IHC diagnostic accuracy and highlights the potential for integrating iSight into clinical workflows to enhance the consistency and reliability of IHC assessment.


💡 Research Summary

This paper addresses the pressing need for automated, reliable interpretation of immunohistochemistry (IHC) slides by introducing a large‑scale, richly annotated dataset (HPA10M) and a novel multi‑task deep learning framework called iSight. HPA10M comprises 10,495,672 IHC images harvested from the Human Protein Atlas, each paired with comprehensive metadata: 45 normal tissue types, 20 major cancer types, patient demographics, SNOMED‑CT diagnoses, UniProt/Ensembl protein identifiers, and four staining scores (intensity, location, quantity, and a “none” class). The authors built a custom web‑scraping pipeline, harmonized XML metadata using biomedical ontologies (Uberon, SNOMED, etc.), performed extensive quality control, removed duplicates via MD5 hashes, and provided tissue masks and bounding boxes for easy patch extraction. The dataset is split into a training set of 10,493,672 images and a held‑out test set of 2,000 images.
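For illustration, the snippet below is a minimal sketch, not the authors' pipeline, of the MD5-based duplicate removal step mentioned above; the directory layout and file pattern are hypothetical.

```python
# Minimal sketch of MD5-based deduplication (illustrative; not the HPA10M pipeline).
import hashlib
from pathlib import Path

def dedupe_by_md5(image_dir: str) -> list[Path]:
    """Return one representative file path per unique image content."""
    seen: dict[str, Path] = {}
    for path in sorted(Path(image_dir).glob("*.jpg")):        # hypothetical layout
        digest = hashlib.md5(path.read_bytes()).hexdigest()   # content hash
        seen.setdefault(digest, path)                         # keep first copy, drop later duplicates
    return list(seen.values())

# unique_images = dedupe_by_md5("hpa10m/raw_images")  # hypothetical directory name
```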

iSight is designed to jointly exploit visual information from whole‑slide images and structured metadata. Whole‑slide or tissue‑microarray images are tiled into non‑overlapping 336 × 336 patches. Each patch is processed by a CLIP‑ViT‑large‑patch‑14‑336 Vision Transformer, yielding 577 token embeddings (576 spatial tokens plus a CLS token). All tokens across patches are concatenated and fed into a gated‑attention multiple‑instance learning (MIL) module, which produces a weighted slide‑level representation. In parallel, the metadata (tissue name, SNOMED code, antibody gene, etc.) is encoded with the CLIP text encoder and one‑hot cell‑type vectors, then projected into the same latent space. A token‑level attention mechanism fuses visual and contextual embeddings, allowing the model to focus on diagnostically relevant regions while preserving global context. Five parallel classification heads predict staining intensity, subcellular location, staining quantity, tissue type, and malignancy status simultaneously, enabling shared representation learning across tasks.
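The following is a rough PyTorch sketch, not the authors' implementation, of the two mechanisms described above: gated-attention MIL pooling over the concatenated ViT tokens, and a cross-attention fusion with metadata embeddings standing in for the paper's token-level attention, followed by five parallel classification heads. The embedding dimension (1024 for CLIP ViT-L/14-336) and the per-head class counts are illustrative assumptions.

```python
# Illustrative sketch of gated-attention MIL + metadata fusion + multi-task heads.
import torch
import torch.nn as nn

class GatedAttentionMIL(nn.Module):
    """Gated-attention pooling (Ilse et al., 2018 style) over a bag of tokens."""
    def __init__(self, dim: int = 1024, hidden: int = 256):
        super().__init__()
        self.V = nn.Linear(dim, hidden)   # tanh branch
        self.U = nn.Linear(dim, hidden)   # sigmoid gating branch
        self.w = nn.Linear(hidden, 1)     # scalar attention score per token

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (num_tokens, dim) -- ViT tokens concatenated across all patches
        scores = self.w(torch.tanh(self.V(tokens)) * torch.sigmoid(self.U(tokens)))
        alpha = torch.softmax(scores, dim=0)        # attention weights, (num_tokens, 1)
        return (alpha * tokens).sum(dim=0)          # weighted slide-level embedding

class MultiTaskHeadsSketch(nn.Module):
    """Fuses the slide embedding with metadata tokens, then predicts five tasks."""
    def __init__(self, dim: int = 1024, n_intensity: int = 4, n_location: int = 4,
                 n_quantity: int = 4, n_tissue: int = 45, n_malignancy: int = 2):
        super().__init__()
        self.mil = GatedAttentionMIL(dim)
        self.fuse = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.heads = nn.ModuleDict({
            "intensity":  nn.Linear(dim, n_intensity),
            "location":   nn.Linear(dim, n_location),
            "quantity":   nn.Linear(dim, n_quantity),
            "tissue":     nn.Linear(dim, n_tissue),
            "malignancy": nn.Linear(dim, n_malignancy),
        })

    def forward(self, patch_tokens: torch.Tensor, meta_tokens: torch.Tensor):
        # patch_tokens: (num_tokens, dim); meta_tokens: (num_meta_tokens, dim)
        slide_vec = self.mil(patch_tokens).view(1, 1, -1)   # (1, 1, dim)
        meta = meta_tokens.unsqueeze(0)                     # (1, M, dim)
        fused, _ = self.fuse(slide_vec, meta, meta)         # cross-attention over metadata
        fused = (fused + slide_vec).reshape(-1)             # residual keeps the visual signal
        return {name: head(fused) for name, head in self.heads.items()}
```

In a full training loop the five heads would typically be optimized jointly, for example by summing a cross-entropy loss per task, which is the usual way multi-task classification of this kind is trained.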

On the held-out test set, iSight achieves 85.5% accuracy for subcellular location, 76.6% for intensity, and 75.7% for quantity, outperforming fine-tuned foundation models (PLIP, CONCH) by 2.5–10.2% across tasks. Calibration is strong, with expected calibration errors ranging from 0.0150 to 0.0408, indicating reliable probability estimates.
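For reference, this is a minimal sketch of expected calibration error under the common equal-width binning definition; the paper's exact binning scheme is not specified here.

```python
# Standard equal-width-bin Expected Calibration Error (illustrative implementation).
import numpy as np

def expected_calibration_error(probs: np.ndarray, labels: np.ndarray, n_bins: int = 15) -> float:
    # probs: (N, C) predicted class probabilities; labels: (N,) integer ground truth
    conf = probs.max(axis=1)                                  # model confidence per sample
    correct = (probs.argmax(axis=1) == labels).astype(float)  # 1 if prediction is right
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)                     # samples falling in this bin
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return float(ece)
```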

To evaluate clinical relevance, the authors conducted a user study in which eight board-certified pathologists assessed 200 images drawn from the HPA test set and the Stanford TMAD dataset. On the held-out HPA images, unassisted pathologists achieved 68% (location), 57% (intensity), and 52% (quantity) accuracy, while iSight's own predictions reached 79%, 70%, and 68%, respectively. Inter-observer agreement, measured by Cohen's κ, improved after AI assistance: from 0.63 to 0.70 on the HPA set and from 0.74 to 0.76 on TMAD, suggesting that AI support can harmonize interpretations among experts.
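For context on the agreement metric, the snippet below shows how pairwise Cohen's κ between two raters is typically computed with scikit-learn; the rating lists are made-up placeholders, not data from the study.

```python
# Pairwise inter-observer agreement via Cohen's kappa (placeholder ratings).
from sklearn.metrics import cohen_kappa_score

rater_a = ["strong", "moderate", "weak", "moderate", "negative"]      # hypothetical scores
rater_b = ["strong", "moderate", "moderate", "moderate", "negative"]  # hypothetical scores
print(f"Cohen's kappa: {cohen_kappa_score(rater_a, rater_b):.2f}")
```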

The paper’s contributions are threefold: (1) the creation and public release of HPA10M, the first truly large‑scale, structured IHC dataset suitable for foundation‑model training; (2) the development of iSight, a multimodal token‑level attention architecture that effectively integrates visual and metadata cues, addressing the unique challenges of IHC such as variable chromogen colors, subcellular localization, and quantitative scoring; (3) empirical evidence that expert‑AI co‑assessment improves both diagnostic accuracy and consistency.

Limitations include reliance on research‑grade images from the Human Protein Atlas, which may differ in staining protocols, scanner hardware, and artifact profiles from routine clinical slides. Real‑world generalization, domain adaptation, and validation on prospective clinical cohorts remain future work. Additionally, the current four‑class scoring scheme could be extended to continuous or more granular scoring systems, and the framework could be adapted for multiplexed IHC or combined H&E‑IHC analyses.

Overall, the study establishes a solid foundation for AI‑augmented IHC interpretation, offering a scalable tool that can serve as a second read, quality‑control assistant, and educational resource, particularly valuable in settings with limited pathology expertise. The modular design of iSight facilitates rapid adaptation to new biomarkers, scoring protocols, and clinical workflows, paving the way toward robust, deployable AI systems in diagnostic pathology.

