From Cold Start to Active Learning: Embedding-Based Scan Selection for Medical Image Segmentation
Accurate segmentation annotations are critical for disease monitoring, yet manual labeling remains a major bottleneck due to the time and expertise required. Active learning (AL) alleviates this burden by prioritizing informative samples for annotation, typically through a diversity-based cold-start phase followed by uncertainty-driven selection. We propose a novel cold-start sampling strategy that combines foundation-model embeddings with clustering, including automatic selection of the number of clusters and proportional sampling across clusters, to construct a diverse and representative initial training set. This is followed by an uncertainty-based AL framework that integrates spatial diversity to guide sample selection. The proposed method is intuitive and interpretable, enabling visualization of the feature-space distribution of candidate samples. We evaluate our approach on three datasets spanning X-ray and MRI modalities. On the CheXmask dataset, the cold-start strategy outperforms random selection, improving Dice from 0.918 to 0.929 and reducing the Hausdorff distance from 32.41 to 27.66 mm. In the AL setting, combined entropy and diversity selection improves Dice from 0.919 to 0.939 and reduces the Hausdorff distance from 30.10 to 19.16 mm. On the Montgomery dataset, cold-start gains are substantial, with Dice improving from 0.928 to 0.950 and Hausdorff distance decreasing from 14.22 to 9.38 mm. On the SynthStrip dataset, cold-start selection has only a modest effect on Dice but reduces the Hausdorff distance from 9.43 to 8.69 mm, while active learning improves Dice from 0.816 to 0.826 and reduces the Hausdorff distance from 7.76 to 6.38 mm. Overall, the proposed framework consistently outperforms baseline methods in low-data regimes, improving segmentation accuracy.
💡 Research Summary
The paper presents a two‑stage sampling framework designed to reduce annotation effort while improving segmentation performance in medical imaging. In the cold‑start stage, the authors extract image‑level embeddings from a large‑scale, medically‑pretrained vision encoder (ResNet‑50 trained on the RadImageNet collection). These high‑dimensional embeddings are projected into a 2‑D space using t‑SNE, which provides an intuitive visual layout and simplifies distance calculations. To determine the appropriate number of clusters, the method evaluates several candidate k values with k‑means clustering and selects the one that maximizes the average silhouette score, thereby automatically adapting to the intrinsic structure of the data. For each cluster, the medoid (the point minimizing total intra‑cluster distance) is chosen as an initial seed. The remaining annotation budget is allocated proportionally to cluster sizes, and within each cluster additional samples are selected by a greedy farthest‑point strategy that iteratively adds the point farthest from the already‑chosen set, maximizing intra‑cluster diversity.
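The cold-start pipeline described above can be sketched compactly with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: `embeddings` stands in for the (n, d) feature matrix from the pretrained encoder, and the candidate-k range, random seed, and budget handling are hypothetical choices.

```python
# Sketch of the cold-start stage: t-SNE projection, silhouette-based choice
# of k, medoid seeding, proportional budget allocation, and greedy
# farthest-point sampling within each cluster.
# Assumption: `embeddings` is an (n, d) array from the pretrained encoder;
# `k_candidates` and `budget` are hypothetical parameters.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cold_start_select(embeddings, budget, k_candidates=range(2, 9), seed=0):
    # Project high-dimensional embeddings to 2-D for visualization and
    # cheap distance computations.
    tsne = TSNE(n_components=2, random_state=seed,
                perplexity=min(30, len(embeddings) - 1))
    xy = tsne.fit_transform(np.asarray(embeddings, dtype=np.float64))

    # Choose k that maximizes the mean silhouette score under k-means.
    best_k, best_score, best_labels = None, -1.0, None
    for k in k_candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(xy)
        score = silhouette_score(xy, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels

    selected = []
    # Seed each cluster with its medoid: the member minimizing the total
    # intra-cluster distance.
    for c in range(best_k):
        idx = np.flatnonzero(best_labels == c)
        pairwise = np.linalg.norm(xy[idx, None] - xy[None, idx], axis=-1)
        selected.append(int(idx[pairwise.sum(axis=1).argmin()]))

    # Allocate the remaining budget proportionally to cluster sizes;
    # hand leftover slots to the largest clusters.
    remaining = budget - best_k
    sizes = np.bincount(best_labels, minlength=best_k)
    alloc = np.floor(remaining * sizes / sizes.sum()).astype(int)
    for c in np.argsort(-sizes)[: remaining - alloc.sum()]:
        alloc[c] += 1

    # Within each cluster, greedily add the point farthest from the
    # already-selected set (farthest-point sampling).
    for c in range(best_k):
        pool = [int(i) for i in np.flatnonzero(best_labels == c)
                if int(i) not in selected]
        for _ in range(alloc[c]):
            if not pool:
                break
            chosen = xy[selected]
            far = max(pool, key=lambda i: np.min(
                np.linalg.norm(chosen - xy[i], axis=1)))
            selected.append(far)
            pool.remove(far)
    return selected
```

The greedy farthest-point step trades optimality for speed: each addition maximizes the minimum distance to the current selection, which keeps the seed set spread out inside every cluster.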
After training an initial segmentation network on this diverse seed set, the active‑learning stage combines two complementary signals: (1) image‑level predictive entropy, computed as the mean pixel‑wise entropy of the softmax output and normalized across the unlabeled pool, and (2) spatial diversity, measured as the minimum Euclidean distance in the 2‑D embedding space between a candidate image and any already‑selected image. Both scores are normalized to a common [0, 1] range and combined into a single acquisition score that ranks the remaining unlabeled images for annotation.
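The acquisition score above can be written in a few lines of numpy. This is a hedged sketch, not the paper's code: `probs` is assumed to hold softmax maps of shape (n, C, H, W) for the unlabeled pool, `xy` their 2-D embedding coordinates, `labeled_xy` the coordinates of already-selected images, and the weighting `alpha` is a hypothetical parameter balancing the two signals.

```python
# Combined entropy + diversity acquisition score for one AL round.
# Assumptions: probs is (n, C, H, W) softmax output over the unlabeled pool,
# xy is (n, 2) t-SNE coordinates, labeled_xy is (m, 2) coordinates of
# already-selected images; alpha is a hypothetical mixing weight.
import numpy as np

def acquisition_scores(probs, xy, labeled_xy, alpha=0.5, eps=1e-12):
    # (1) image-level predictive entropy: mean pixel-wise entropy of the
    # softmax output, one scalar per image.
    ent = -(probs * np.log(probs + eps)).sum(axis=1).mean(axis=(1, 2))
    # (2) spatial diversity: minimum Euclidean distance in the 2-D
    # embedding space to any already-selected image.
    div = np.linalg.norm(
        xy[:, None, :] - labeled_xy[None, :, :], axis=-1).min(axis=1)

    def minmax(v):
        # Normalize each signal to [0, 1] across the unlabeled pool.
        return (v - v.min()) / (v.max() - v.min() + eps)

    return alpha * minmax(ent) + (1 - alpha) * minmax(div)
```

Ranking the pool by this score and annotating the top images favors candidates that are both uncertain under the current model and far, in embedding space, from everything already labeled.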