Transcending the Annotation Bottleneck: AI-Powered Discovery in Biology and Medicine
Authors: Soumick Chatterjee
Soumick Chatterjee 1,2 [0000-0001-7594-1188]
1 Human Technopole, Milan, Italy
2 Faculty of Computer Science, Otto von Guericke University Magdeburg, Magdeburg, Germany
contact@soumick.com

Abstract. The dependence on expert annotation has long constituted the primary rate-limiting step in the application of artificial intelligence to biomedicine. While supervised learning drove the initial wave of clinical algorithms, a paradigm shift towards unsupervised and self-supervised learning (SSL) is currently unlocking the latent potential of biobank-scale datasets. By learning directly from the intrinsic structure of data (whether pixels in a magnetic resonance image (MRI), voxels in a volumetric scan, or tokens in a genomic sequence), these methods facilitate the discovery of novel phenotypes, the linkage of morphology to genetics, and the detection of anomalies without human bias. This article synthesises seminal and recent advances in "learning without labels," highlighting how unsupervised frameworks can derive heritable cardiac traits, predict spatial gene expression in histology, and detect pathologies with performance that rivals or exceeds supervised counterparts.

Keywords: Unsupervised Learning · Medical Imaging · Phenotype Discovery · Anomaly Detection · Genomics

1 Introduction: The Annotation Bottleneck and the Unsupervised Solution

For the past decade, the standard workflow in biomedical data analysis has necessitated the curation of datasets, the manual annotation of regions of interest (e.g., tumours, lesions, or anatomical structures), and the training of supervised models to replicate these human labels.
While effective for specific, narrow tasks, this approach is fundamentally constrained by the scarcity of high-quality labels, the inherent bias of human knowledge, and the high cost of expert time. Furthermore, supervised approaches typically discard the vast majority of information contained within high-dimensional data, focusing only on features relevant to the pre-defined label.

To overcome these limitations, the field has increasingly turned to unsupervised and self-supervised learning. A common critique of these methods is that they sacrifice accuracy for flexibility. However, recent evidence suggests this trade-off is vanishing. In an investigation of voxel-wise segmentation for additive manufacturing, Iuso et al. [11] compared sophisticated supervised models (such as UNet++) against unsupervised VAE-based approaches for porosity detection. Remarkably, they found that unsupervised models, particularly when post-processed, could achieve performance metrics (Average Precision 0.830) that rivalled or even exceeded their supervised counterparts (Average Precision 0.751) in challenging testing scenarios. This finding challenges the orthodoxy that supervised learning is invariably superior, suggesting that for complex, highly variable targets, a model that comprehends the fundamental data distribution may be more robust than one trained to mimic a limited set of human labels.

These techniques learn robust representations by solving "pretext" tasks, such as contrasting similar views of an image or reconstructing masked portions of data, rather than predicting extrinsic labels.
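The contrastive pretext task just described can be made concrete with a minimal NumPy sketch of the NT-Xent objective popularised by SimCLR-style methods. The batch size, embedding width, and temperature of 0.5 below are arbitrary illustrative choices, not values taken from any cited work:

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent (normalised temperature-scaled cross-entropy) loss.

    z_a, z_b: (N, D) embeddings of two augmented views of the same N samples.
    The positive pair for sample i in one view is sample i in the other view;
    every other embedding in the combined batch acts as a negative.
    """
    z = np.concatenate([z_a, z_b], axis=0)             # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalise rows
    sim = z @ z.T / temperature                        # scaled cosine similarities
    n = len(z_a)
    np.fill_diagonal(sim, -np.inf)                     # a sample cannot match itself
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Row-wise log-softmax, then pick out the log-probability of each positive.
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), targets].mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = nt_xent_loss(z, z)                           # views agree: low loss
mismatched = nt_xent_loss(z, rng.normal(size=(8, 16)))  # unrelated views: higher loss
```

In practice the two views would be random augmentations of the same image batch, and the loss would be minimised by backpropagating through an encoder network; the sketch only shows that matched views score a lower loss than mismatched ones.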
Seminal works in computer vision established the efficacy of this approach: SimCLR [8] demonstrated that contrastive learning could produce visual representations comparable to supervised methods, while DINO [5] utilised self-distillation with Vision Transformers (ViT) to capture semantic segmentation properties without explicit supervision. In the medical domain, these principles were adapted to address data heterogeneity by Azizi et al. [1], paving the way for "foundation models" capable of discovering biological signals that may elude human observers.

2 Unsupervised Learning in Medical Imaging

The application of unsupervised learning in medical imaging has matured from simple dimensionality reduction to complex tasks involving phenotype discovery, anomaly detection, and image registration.

2.1 Phenotype Discovery and Genetic Linkage

A principal advantage of data-driven discovery is the capacity to define quantitative phenotypes that bridge the gap between macroscopic imaging and microscopic genetics. In the realm of multimodal learning, Taleb et al. [19] introduced the "ContIG" framework, demonstrating that self-supervised contrastive learning could effectively integrate medical imaging with genetic data to improve disease prediction.

Building upon this, Radhakrishnan et al. [17] utilised cross-modal autoencoders to learn holistic representations of the cardiovascular state. Expanding this frontier, Ometto et al. [16] recently developed a 3D diffusion autoencoder (3DDiffAE) to analyse temporal cardiac MRIs from the UK Biobank. Unlike traditional methods relying on fixed parameters such as ejection fraction, this unsupervised model learnt a "latent space" of 182 phenotypes describing complex cardiac wall motion and structure. Crucially, Ometto et al. demonstrated that these latent phenotypes shared a genetic architecture with established cardiac diseases, revealing 89 significant genomic loci.
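The latent-phenotype workflow can be illustrated end to end with synthetic data: compress high-dimensional "imaging" features into a few unsupervised latents, then scan each latent for association with a genotype. In this sketch, PCA stands in for a trained autoencoder, and all data (subjects, features, the single SNP, and the planted effect size) are simulated; it conveys the shape of the workflow, not the 3DDiffAE pipeline itself:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins: 500 "subjects", 64 imaging-derived features, one SNP (0/1/2).
n_subjects, n_features, n_latents = 500, 64, 8
genotype = rng.integers(0, 3, size=n_subjects).astype(float)
images = rng.normal(size=(n_subjects, n_features))
images[:, 0] += 2.0 * genotype   # plant a genetic effect in one feature

# Unsupervised compression: PCA as a linear stand-in for an autoencoder latent space.
centred = images - images.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
latents = centred @ vt[:n_latents].T          # (subjects, latent phenotypes)

def corr(x, y):
    """Pearson correlation between two 1-D arrays."""
    x, y = x - x.mean(), y - y.mean()
    return (x @ y) / np.sqrt((x @ x) * (y @ y))

# Association scan: correlate each latent phenotype with the genotype.
r = np.array([abs(corr(latents[:, k], genotype)) for k in range(n_latents)])
top = int(r.argmax())   # latent dimension most associated with the SNP
```

A real analysis would replace the correlation scan with a genome-wide association study over millions of variants, with covariate adjustment and multiple-testing correction, but the logic (unsupervised latents treated as quantitative traits) is the same.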
This principle extends to the microscopic scale. In computational pathology, Cisternino et al. [9] utilised self-supervised Vision Transformers (ViT) trained on over 1.7 million histology tiles from the Genotype-Tissue Expression (GTEx) project. Their model, RNAPath, utilises these self-supervised features to predict spatial RNA expression levels directly from H&E-stained slides, effectively bridging the gap between tissue morphology and transcriptomics without the need for expensive spatial transcriptomics assays.

2.2 Robust Anomaly Detection

One of the most immediate clinical applications of learning without labels is anomaly detection: identifying pathology as a deviation from the normative distribution. StRegA [7], a pipeline utilising a context-encoding Variational Autoencoder (VAE), addressed this in neuroimaging. By learning the distribution of healthy brain anatomy, StRegA identifies regions that the model cannot accurately reconstruct, successfully localising brain tumours and other anomalies without ever observing a labelled tumour during training.

Building upon these foundational VAE-based approaches, recent advancements have introduced more sophisticated generative models. Li et al. [13] introduced Scale-Aware Contrastive Reverse Distillation (SCAD), a novel framework that enhances anomaly detection by leveraging multi-scale feature representations and contrastive learning. SCAD addresses the challenge of scale variance in medical anomalies by distilling knowledge from a pre-trained teacher network to a student network in a reverse manner, effectively capturing anomalies across different resolutions. Furthermore, Bercea et al. [4] critically evaluated normative representation learning in generative AI for robust anomaly detection in brain imaging.
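The normative-modelling recipe behind these reconstruction-based detectors can be sketched in a few lines: fit a model on healthy data only, reconstruct an unseen sample, and flag locations with high reconstruction error. In the sketch below, PCA stands in for a trained VAE, the "scans" are simulated low-rank vectors, and the lesion and threshold factor are arbitrary; none of this reproduces the published pipelines:

```python
import numpy as np

rng = np.random.default_rng(7)

# "Healthy" training data: low-rank structure plus noise (stand-in for brain scans).
basis = rng.normal(size=(5, 100))                      # 5 latent modes, 100 "voxels"
healthy = rng.normal(size=(300, 5)) @ basis + 0.1 * rng.normal(size=(300, 100))

# Learn the normative subspace (PCA as a stand-in for a VAE trained on healthy data).
mean = healthy.mean(axis=0)
_, _, vt = np.linalg.svd(healthy - mean, full_matrices=False)
components = vt[:5]                                    # normative latent directions

def anomaly_map(x):
    """Per-voxel squared reconstruction error: large where x leaves the healthy modes."""
    recon = mean + (x - mean) @ components.T @ components
    return (x - recon) ** 2

# A "lesion": a healthy-like sample with an out-of-distribution bump in voxels 40-49.
sample = rng.normal(size=5) @ basis + 0.1 * rng.normal(size=100)
sample[40:50] += 4.0
errors = anomaly_map(sample)
detected = errors[40:50].mean() > 5 * errors[:40].mean()
```

Because the lesion does not lie in the subspace learnt from healthy data, the model cannot reconstruct it, and the reconstruction error localises it; this is the same intuition, in linear miniature, that VAE- and diffusion-based detectors exploit.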
Bercea et al.'s work highlights the importance of robust normative learning to ensure that generative models accurately capture the variability of healthy brain anatomy, thereby improving the sensitivity and specificity of anomaly detection.

Addressing the limitations of standard diffusion models in preserving fine details, Beizaee et al. [3] introduced MAD-AD, a masked diffusion framework for unsupervised brain anomaly detection. By incorporating masking strategies into the diffusion process, MAD-AD effectively mitigates the issue of noise accumulation during image reconstruction, leading to more precise anomaly localisation compared to traditional diffusion-based methods.

The field continues to diversify with the emergence of State Space Models (SSMs) like Mamba, which have led to new architectures such as MAAT (Mamba Adaptive Anomaly Transformer) [20]. This model efficiently captures long-range dependencies in physiological data, offering a computationally efficient alternative to traditional Transformers. Furthermore, foundation models are being adapted for this task; Seeböck et al. [18] recently utilised self-supervised learning to guide segmentation in retinal OCT scans, effectively using anomaly detection as a weak supervision signal.

2.3 Image Registration

Deformable image registration has traditionally been computationally expensive. Unsupervised deep learning methods like VoxelMorph [2] learn to predict deformation fields by optimising image similarity metrics, achieving state-of-the-art accuracy with significantly faster inference times. Building upon these foundations, MICDIR [6] introduced a multi-scale inverse-consistent framework incorporating a self-constructing graph latent.
By explicitly encoding global dependencies and enforcing cycle consistency, MICDIR demonstrated statistically significant improvements over VoxelMorph in both intramodal and intermodal brain MRI registration tasks.

3 Deciphering the Molecular Code

Beyond imaging, unsupervised learning is revolutionising genomics and molecular biology by treating biological sequences as a "language" of life.

3.1 Genomic Sequence Modelling

Just as large language models learn the structure of text, genomic models can learn the grammar of regulatory elements and gene expression without explicit labels. Seminal works like DNABERT [12] applied the BERT architecture to k-mer sequences of DNA, demonstrating that attention mechanisms could capture global and local genomic context. More recently, the Nucleotide Transformer [10] scaled this approach to billions of parameters, training on multispecies genomes to predict molecular phenotypes and variant effects.

3.2 Single-Cell Analysis

The advent of single-cell RNA sequencing (scRNA-seq) has provided high-resolution views of cellular heterogeneity, but the data is inherently high-dimensional, sparse, and noisy. Deep generative models like scVI (Single-cell Variational Inference) [15] use variational inference to approximate the underlying probability distributions of gene expression. By learning a low-dimensional latent representation of each cell, scVI can correct for batch effects, impute missing values, and cluster cell types without reliance on pre-defined markers.

4 Clinical and Therapeutic Frontiers

The utility of unsupervised learning extends into the translation of biological insights into clinical practice and therapeutic development.

4.1 Computational Phenotyping from Electronic Health Records

Electronic Health Records (EHR) contain rich, longitudinal data on patient health.
Unsupervised learning allows for "computational phenotyping": the discovery of clinical patterns without manual cohort definition. Inspired by natural language processing, models like BEHRT [14] treat patient medical histories as sequences of events and use Transformer architectures to learn robust patient representations. These self-supervised embeddings can predict future disease risks and stratify patients into novel subtypes, effectively enabling precision medicine at the population scale.

5 Concluding Remarks

The transition from supervised to unsupervised learning marks a decisive maturation in biomedical AI, effectively circumventing the "annotation bottleneck" that has long stifled progress. No longer compromising on accuracy, these self-supervised frameworks now rival supervised counterparts and drive genuine discovery, from defining novel cardiac phenotypes to decoding the genomic "language" of life. By leveraging the intrinsic structure of data, the field is moving towards a holistic view of biology where insights are derived from the data itself rather than human bias.

Future research must focus on the convergence of these modalities into unified "foundation models" capable of reasoning across imaging, genomics, and electronic health records simultaneously. Additionally, the exploration of computationally efficient architectures, such as State Space Models (e.g. Mamba), offers a promising avenue for modelling long-range biological dependencies that traditional Transformers struggle to capture. Ultimately, the priority remains bridging the gap between these high-dimensional latent representations and interpretable, clinically actionable biomarkers.

References

1. Azizi, S., Mustafa, B., Ryan, F., Beaver, Z., Freyberg, J., Deaton, J., Loh, A., Karthikesalingam, A., Kornblith, S., Chen, T., et al.: Big self-supervised models advance medical image classification.
In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 3478–3488 (2021)
2. Balakrishnan, G., Zhao, A., Sabuncu, M.R., Guttag, J., Dalca, A.V.: VoxelMorph: a learning framework for deformable medical image registration. IEEE Transactions on Medical Imaging 38(8), 1788–1800 (2019)
3. Beizaee, F., Lodygensky, G., Desrosiers, C., Dolz, J.: MAD-AD: masked diffusion for unsupervised brain anomaly detection. In: International Conference on Information Processing in Medical Imaging. pp. 139–153. Springer (2025)
4. Bercea, C.I., Wiestler, B., Rueckert, D., Schnabel, J.A.: Evaluating normative representation learning in generative AI for robust anomaly detection in brain imaging. Nature Communications 16(1), 1624 (2025)
5. Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9650–9660 (2021)
6. Chatterjee, S., Bajaj, H., Siddiquee, I.H., Subbarayappa, N.B., Simon, S., Shashidhar, S.B., Speck, O., Nürnberger, A.: MICDIR: Multi-scale inverse-consistent deformable image registration using UNetMSS with self-constructing graph latent. Computerized Medical Imaging and Graphics 108, 102267 (2023)
7. Chatterjee, S., Sciarra, A., Dünnwald, M., Tummala, P., Agrawal, S.K., Jauhari, A., Kalra, A., Oeltze-Jafra, S., Speck, O., Nürnberger, A.: StRegA: Unsupervised anomaly detection in brain MRIs using a compact context-encoding variational autoencoder. Computers in Biology and Medicine 149, 106093 (2022)
8. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning. pp. 1597–1607. PMLR (2020)
9. Cisternino, F., Ometto, S., Chatterjee, S., Giacopuzzi, E., Levine, A.P., Glastonbury, C.A.: Self-supervised learning for characterising histomorphological diversity and spatial RNA expression prediction across 23 human tissue types. Nature Communications 15(1), 5906 (2024)
10. Dalla-Torre, H., Gonzalez, L., Mendoza-Revilla, J., Lopez Carranza, N., Grzywaczewski, A.H., Oteri, F., Dallago, C., Trop, E., de Almeida, B.P., Sirelkhatim, H., et al.: Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nature Methods 22(2), 287–297 (2025)
11. Iuso, D., Chatterjee, S., Cornelissen, S., Verhees, D., Beenhouwer, J.D., Sijbers, J.: Voxel-wise segmentation for porosity investigation of additive manufactured parts with 3D unsupervised and (deeply) supervised neural networks. Applied Intelligence 54(24), 13160–13177 (2024)
12. Ji, Y., Zhou, Z., Liu, H., Davuluri, R.V.: DNABERT: pre-trained bidirectional encoder representations from Transformers model for DNA-language in genome. Bioinformatics 37(15), 2112–2120 (2021)
13. Li, C., Shi, Y., Hu, J., Zhu, X.X., Mou, L.: Scale-aware contrastive reverse distillation for unsupervised medical anomaly detection. In: The Thirteenth International Conference on Learning Representations (2025), https://openreview.net/forum?id=HNOo4UNPBF
14. Li, Y., Rao, S., Solares, J.R.A., Hassaine, A., Ramakrishnan, R., Canoy, D., Zhu, Y., Rahimi, K., Salimi-Khorshidi, G.: BEHRT: transformer for electronic health records. Scientific Reports 10(1), 7155 (2020)
15. Lopez, R., Regier, J., Cole, M.B., Jordan, M.I., Yosef, N.: Deep generative modeling for single-cell transcriptomics. Nature Methods 15(12), 1053–1058 (2018)
16. Ometto, S., Chatterjee, S., Vergani, A.M., Landini, A., Sharapov, S., Giacopuzzi, E., Visconti, A., Bianchi, E., Santonastaso, F., Soda, E.M., et al.: Hundreds of cardiac MRI traits derived using 3D diffusion autoencoders share a common genetic architecture. medRxiv pp. 2024–11 (2024)
17. Radhakrishnan, A., Friedman, S.F., Khurshid, S., Ng, K., Batra, P., Lubitz, S.A., Philippakis, A.A., Uhler, C.: Cross-modal autoencoder framework learns holistic representations of cardiovascular state. Nature Communications 14(1), 2436 (2023)
18. Seeböck, P., Orlando, J.I., Michl, M., Mai, J., Schmidt-Erfurth, U., Bogunović, H.: Anomaly guided segmentation: Introducing semantic context for lesion segmentation in retinal OCT using weak context supervision from anomaly detection. Medical Image Analysis 93, 103104 (2024)
19. Taleb, A., Kirchler, M., Monti, R., Lippert, C.: ContIG: Self-supervised multimodal contrastive learning for medical imaging with genetics. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 20908–20921 (2022)
20. Zakaria Sellam, A., Benaissa, I., Taleb-Ahmed, A., Patrono, L., Distante, C.: MAAT: Mamba adaptive anomaly transformer with association discrepancy for time series. arXiv e-prints pp. arXiv–2502 (2025)