Beyond Occlusion: In Search for Near Real-Time Explainability of CNN-Based Prostate Cancer Classification
Deep neural networks are starting to show their worth in critical applications such as assisted cancer diagnosis. However, for their outputs to be accepted in practice, the results they provide should be explainable in a way easily understood by pathologists. A well-known and widely used explanation technique is occlusion, which, however, can take a long time to compute, slowing both development and interaction with pathologists. In this work, we set out to find a faster replacement for occlusion in a successful system for detecting prostate cancer. Since there is no established framework for comparing the performance of various explanation methods, we first identified suitable comparison criteria and selected corresponding metrics. Based on the results, we were able to choose a different explanation method, which cut the previously required explanation time by at least a factor of 10, without any negative impact on the quality of outputs. This speedup enables rapid iteration in model development and debugging and brings us closer to adopting AI-assisted prostate cancer detection in clinical settings. We propose that our approach to finding a replacement for occlusion can be used to evaluate candidate methods in other related applications.
💡 Research Summary
The paper addresses a critical bottleneck in AI‑assisted prostate cancer diagnosis: the slow computation of occlusion‑based saliency maps, which are currently used to explain the decisions of a VGG‑16‑based binary classifier that operates on 512 × 512 pixel tiles extracted from whole‑slide images (WSI). While occlusion provides intuitive, faithful explanations, it requires dozens of forward passes per tile, leading to 30–90 minutes per slide and making real‑time assistance infeasible.
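The cost described above comes from occlusion's exhaustive sweep: a patch is slid over the tile, and every occluded copy requires its own forward pass. A minimal sketch of this idea (illustrative only, not the authors' implementation; `predict`, the patch size, and the fill value are assumptions):

```python
import numpy as np

def occlusion_map(predict, tile, patch=64, stride=64, fill=0.0):
    """Occlusion saliency sketch.

    predict: callable mapping an HxWxC array to a scalar class score.
    Each cell of the returned heatmap holds the score drop caused by
    blanking out the corresponding patch -- one forward pass per patch,
    which is why occlusion is slow on large tiles.
    """
    h, w = tile.shape[:2]
    base = predict(tile)
    heat = np.zeros((h // stride, w // stride))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = tile.copy()
            occluded[y:y + patch, x:x + patch] = fill
            # Larger score drop => the occluded region mattered more.
            heat[i, j] = base - predict(occluded)
    return heat
```

For a 512 × 512 tile with a 64-pixel stride this already means 64 forward passes per tile, and a whole-slide image contains thousands of tiles, which is consistent with the 30–90 minutes per slide quoted above.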
To replace occlusion, the authors first define a comprehensive evaluation framework consisting of four criteria: computational efficiency (runtime per tile, total slide time, GPU memory), faithfulness (measured by the Remove and Debias (ROAD) method, which progressively removes the most salient pixels and observes the drop in model confidence), localization (quantified by the Weighting Game metric, which compares the saliency mass inside expert‑annotated bounding boxes to the total saliency mass), and usefulness (similarity of new saliency maps to the trusted occlusion maps, again using the Weighting Game).
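The Weighting Game score used for both localization and usefulness reduces to a simple ratio of saliency mass. A hedged sketch (function name, signature, and the positive-clipping choice are illustrative, not taken from the paper's code):

```python
import numpy as np

def weighting_game(saliency, mask):
    """Fraction of total non-negative saliency mass inside the mask.

    saliency: HxW saliency map.
    mask:     boolean HxW array, True inside the annotated region.
    Returns a value in [0, 1]; 1.0 means all positive evidence lies
    inside the annotation.
    """
    s = np.clip(saliency, 0, None)  # keep positive evidence only
    total = s.sum()
    return float(s[mask].sum() / total) if total > 0 else 0.0
```

For the usefulness criterion, the same ratio can be computed with the thresholded occlusion map playing the role of the annotation mask.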
Four single‑pass, widely cited explanation techniques are selected for comparison: (1) CAM, which leverages the global pooling layer present in the model; (2) Grad‑CAM++, a gradient‑based extension of Grad‑CAM that often yields sharper, more discriminative maps; (3) HiResCAM, a recent gradient‑based method theoretically guaranteed to highlight all regions that increase the class score; and (4) Composite LRP, a Layer‑wise Relevance Propagation variant that applies different propagation rules (including the α‑β rule) to different layers for sharper visual output.
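The practical difference between the gradient-based candidates lies in how activations and gradients of the last convolutional layer are aggregated. A minimal sketch of the two aggregation schemes (shapes and names are illustrative; this is not the paper's code or any particular library's API):

```python
import numpy as np

def grad_cam(A, G):
    """Grad-CAM-style map from activations A (K,H,W) and gradients G (K,H,W).

    One scalar weight per channel: the spatially averaged gradient.
    """
    w = G.mean(axis=(1, 2))
    return np.maximum((w[:, None, None] * A).sum(axis=0), 0)

def hires_cam(A, G):
    """HiResCAM-style map: elementwise product, no spatial averaging.

    Keeping per-location gradients is what yields HiResCAM's guarantee
    of highlighting every region that raises the class score.
    """
    return np.maximum((G * A).sum(axis=0), 0)
```

When the gradient is spatially constant the two maps coincide; they diverge exactly where gradient averaging washes out location-specific evidence.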
Experiments are conducted on the test set from the original study.