Longitudinal Risk Prediction in Mammography with Privileged History Distillation

Longitudinal Risk Prediction in Mammograph y with Privileged History Distillation Banafsheh Karimian 1 , Alexis Guic hemerre 1 , Souﬁane Belharbi 1 , Natac ha Gillet 1 , Luk e McCaﬀrey 2 , Mohammadhadi Shateri 1 , Eric Granger 1 , 1 LIVIA, ILLS, Systems Engineering Dept., ETS Montreal, Canada 2 Go o dman Cancer Institute, Dept. of Oncology , McGill Univ ersity , Canada Abstract Breast cancer remains a leading cause of cancer-related mortality w orld- wide. Longitudinal mammograph y risk prediction mo dels improv e multi- y ear breast cancer risk prediction based on prior screening exams. Ho w- ev er, in real-w orld clinical practice, longitudinal histories are often in- complete, irregular, or unav ailable due to missed screenings, ﬁrst-time ex- aminations, heterogeneous acquisition schedules, or archiv al constrain ts. The absence of prior exams degrades the p erformance of longitudinal risk mo dels and limits their practical applicability . While substantial longi- tudinal history is a v ailable during training, prior exams are commonly absen t at test time. In this pap er, we address missing history at inference time and propose a longitudinal risk prediction metho d that uses mam- mograph y history as privileged information during training and distills its prognostic v alue into a studen t mo del that only requires the current exam at inference time. The key idea is a privileged multi-teac her dis- tillation sc heme with horizon-speciﬁc teac hers: eac h teac her is trained on the full longitudinal history to sp ecialize in one prediction horizon, while the student receives only a reconstructed history deriv ed from the curren t exam. This allows the studen t to inherit horizon-dependent longi- tudinal risk cues without requiring prior screening exams at deploymen t. Our new Privileged History Distillation (PHD) metho d is v alidated on a large longitudinal mammography dataset with m ulti-year cancer out- comes, CSA W-CC, comparing full-history and no-history baselines to their distilled coun terparts. Using time-dependent A UC across horizons, our privileged history distillation metho d markedly improv es the performance of long-horizon prediction ov er no-history models and is comparable to that of full-history mo dels, while using only the current exam at inference time. Co de: h ttps://github.com/BanafshehKarimian/PHD Keyw ords: Mammograph y , Risk of Cancer, Longitudinal Mo deling, Multi-T eacher Knowledge Distillation. 1 1 In tro duction Breast cancer is one of the most common cancers w orldwide and remains a ma jor cause of cancer-related mortality [1, 5, 19, 4]. Detecting breast cancer at earlier, asymptomatic stages through screening is strongly asso ciated with impro ved prognosis and reduced mortality [3]. Mammography remains the pri- mary imaging mo dality for breast cancer screening due to its relatively lo w cost, wide accessibility , and established clinical utility compared with other mo dalities suc h as MRI, ultrasound, or tissue-based diagnostic procedures (e.g., pathology examinations) [18, 10, 14]. Consequen tly , accurate prediction of future cancer risk from mammograms is an imp ortant comp onent of eﬀective early screen- ing strategies. Recen t deep learning approaches [9, 11, 13, 17] hav e improv ed p erformance in b oth immediate and long-term risk p rediction compared with traditional statistical metho ds [8]. Longitudinal imaging history pla ys an important role in clinical risk assess- men t, since changes observed across successiv e screening exams, including the app earance or progression of subtle ﬁndings, provide v aluable information ab out disease developmen t. Ho wev er, most state-of-the-art deep learning approaches for cancer risk prediction based on mammograms only analyze a single screening examination, and do not use longitudinal data during training [21]. More re- cen t longitudinal mo dels (LoMaR [6] and VMRA [16]) use prior mammograms during training and inference to capture temp oral evolution in breast tissue and ac hieve impro ved predictiv e performance compared with single-exam metho ds. Despite their promising performance, longitudinal mammograph y models t ypically require prior screening exams at inference time. In real-world clinical settings, ho w ever, longitudinal history is often incomplete [12] or una v ailable due to irregular screening attendance, ﬁrst-time examinations, or imaging p er- formed across multiple institutions. Consequen tly , a patient’s medical record ma y only contain a single recent mammogram, even though longitudinal mo d- els are trained assuming access to mammograms from m ultiple prior exams. The absence of historical context at test time can substantially degrade p erformance [6]. As sho wn in Fig. 1, b oth LoMaR and VMRA achiev e a high level of per- formance when a longer and consistent screening history is av ailable. Masking prior exams at test time leads to a clear degradation in long-horizon predictiv e accuracy , highligh ting their dependence on longitudinal con text and limiting applicabilit y when only the current mammogram is av ailable. This gap b etw een mo dels trained without longitudinal data and mo dels that require it at inference time motiv ates approac hes that can exploit historical information during training, while remaining accurate when prior exams are una v ailable at deplo yment. W e address this b y in tro ducing Privileged History Distillation (PHD), a longitudinal training metho d that transfers temp oral risk signals from full-history data to a stude n t model operating on reconstructed history . Speciﬁcally , w e use m ultiple horizon-speciﬁc teacher branches on com- plete screening sequences to learn sp ecialized long-term risk representations p er future year, and distill their predictions into a student mo del that receiv es only the curren t mammogram and its reconstructed history from the current exam. 2 As a result, the studen t learns horizon-aw are longitudinal risk patterns without requiring prior exams at test time, bridging the gap b etw een no-history inference and full-history training paradigms. 2 Related W ork Early deep learning approaches for mammography-based risk prediction typ- ically analyze a single screening examination. Mirai [21], predicts m ulti-year breast cancer risk directly from screening mammograms and ac hieves strong predictiv e p erformance. Ho w ever, such single-exam models do not use longitu- dinal imaging history , thus hav e limited p erformance in long-horizon risk pre- diction [6]. T o address this limitation, subsequen t w ork introduced longitudinal mammograph y risk prediction models that explicitly use prior screening exams. LoMaR (Longitudinal Mammogram Risk) [6] is designed to use an arbitrary n umber of previous screening visits and mo del temp oral progression in mam- mographic app earance. Each mammogram is pro cessed using a shared image enco der, and multi-view features are aggregated within eac h visit to form visit- lev el em b eddings. These em b eddings are then mo deled as a temporal sequence using a transformer-based visit aggregator to capture longitudinal ev olution across screening years, follo wed by an additive hazard mo dule for m ulti-y ear risk prediction. More recen t w ork has explored alternativ e temporal mo deling strategies. VMRA-MaR [16] in tro duces a recurrent longitudinal framew ork based on a Vision-Mam ba RNN to mo del temp oral dep endencies across screening exams. Instead of positional enco ding and global attention, VMRA up dates recurrent hidden states across visits to capture imaging progression o ver time. Despite their strong p erformance, existing longitudinal mo dels rely on the av ailability of prior exams at inference time. LoMaR has shown that predictive performance decreases as few er historical exams are av ailable, with no-history settings ap- proac hing the p erformance of single-exam models. This highlights the strong dep endence of longitudinal risk prediction on historical con text and the need Figure 1: Partial A UC at 10% FPR (pAUC@10%) for LoMaR and VMRA at 4- and 5-year horizons as a function of av ailable screening history . 3 Figure 2: Prop osed PHD metho d for longitudinal risk prediction in mammog- raph y . Visit em b eddings are extracted from each exam (mammogram), and missing historical embeddings are predicted from the current exam. The gener- ated sequence is aggregated b y a longitudinal mo del and passed to an additive hazard la yer for m ulti-y ear risk prediction, with a frozen true-history multi- teac her path wa y providing per-horizon distillation. for metho ds that remain eﬀectiv e when prior exams are missing. 3 Prop osed Privileged History Distillation Metho d Let us consider longitudinal mammography risk prediction under missing his- tory at inference. I T V , where V could be Righ t/Left CC/MLO views of the standard bilateral views of a mammogram and T ∈ { 0 , − 1 , . . . , − T h } denotes the screening y ear relativ e to the curren t exam. During training, a sub ject ma y ha ve access to a sequence of exams across multiple years. Standard longitudi- nal risk models aggregate this full sequence to predict multi-y ear cancer risk. Ho wev er, at inference time, only the curren t exam I 0 V ma y be a v ailable, thus w e assume prior exams are missing during inference. Our ob jective is to learn a mo del that uses the full longitudinal history during training while pro ducing accurate risk predictions from I 0 V alone at inference. The ov erall framework of the prop osed PHD metho d is illustrated in Fig- ure 2. F ollowing [6, 16], each screening exam is enco ded in to a visit em b edding, yielding a sequence corresp onding to the av ailable history . T o use prior exams only as privileged information during training, PHD learns a history predic- tion mo dule that distills eac h exam em b edding and predicts missing historical 4 em b eddings from the current one. The resulting sequence, composed of the cur- ren t embedding and predicted history , is aggregated by a longitudinal encoder to pro duce a temp oral represen tation for risk prediction. T o preserve temp oral risk information from full-history training data, PHD uses a shared-parameter teac her–student training scheme. T o preserve temp oral risk information from full-history training data, PHD adopts a horizon-sp eciﬁc multi-teac her distilla- tion scheme. F or each prediction horizon k , a dedicated teacher exp ert is trained using the true longitudinal history sequence and optimized only for that hori- zon, to b e used during student training. In parallel, the student branch op erates on the reconstructed history sequence derived from the curren t exam. The stu- den t is supervised b y the corresp onding horizon-sp eciﬁc teacher through logit distillation, enabling transfer of longitudinal information into missing-history setting. Image and Exam Representation Learning: T o obtain embeddings repre- sen ting each exam year, PHD uses the Mirai[21] image enco der, denoted as f ϕ ( · ) , and k eeps it frozen during training. Each mammographic view I T V , is pro cessed to obtain view-level features, z T V = f ϕ ( I T V ) . View-lev el features within each visit are aggregated using the Mirai view trans former to pro duce a compact visit em- b edding. This yields a sequence of yearly represen tations { x 0 , x − 1 , ..., x − T h } , where each x T summarizes all av ailable views for screening year T . Historical Em b edding Distillation: Giv en the visit em b eddings, PHD in- tro duces a distillation module that predicts historical representations from the curren t exam embedding. Speciﬁcally , it learns a mapping: f ψ : x 0 → ˆ x t , t ∈ {− 1 , ..., − T h } , which reconstructs embeddings, { ˆ x − 1 , ..., ˆ x − T h } , corresp onding to prior screening years. This mo dule helps the mo del to infer longitudinal con- text even when historical exams are missing. T o train the distillation mo dule, PHD uses a mean squared error (MSE) feature distillation loss: L feature KD = 1 T X t ∈{− 1 , − 2 , − 3 , − 4 }   x t − ˆ x t   2 2 . (1) Longitudinal Aggregation and Risk Prediction: The prop osed PHD framew ork is agnostic to the sp eciﬁc longitudinal aggregation arc hitecture and can b e com bined with an y backbone that pro duces a uniﬁed temporal rep- resen tation from a sequence of visit em beddings. Giv en the selected history sequence { x 0 , ˜ x − 1 , . . . , ˜ x − T h } , the longitudinal encoder A θ ( · ) will produce a compact history representation q T = A θ  { x 0 , ˜ x − 1 , . . . , ˜ x − T h }  . Risk prediction is performed using an additive hazard surviv al head, as commonly adopted in mammography risk mo deling. The aggregated representation is mapp ed to a baseline risk term and horizon-sp eciﬁc non-negativ e hazard increments H k ( q T ) = σ +  w ⊤ k q T + b k  , with activ ation function of σ + and cumulativ e risk 5 up to horizon k : P ( t cancer ≤ k | q T ) = B ( q T ) + k − 1 X i =0 H i ( q T ) . (2) T raining relies on a horizon-wise weigh ted binary cross-entrop y loss. Let y ∈ { 0 , 1 , − 1 } K denote lab els with unknown entries mask ed b y m k = ⊮ [ y k  = − 1] , and let w k b e the p ositive class w eight for horizon k . With predicted probabil- ities ˆ p k , the sup ervised Re-w eighted Cross En tropy (R CE) loss [7] is: L RCE = 1 P k m k X k m k [ − w k y k log ˆ p k − (1 − y k ) log(1 − ˆ p k )] . (3) P er-Horizon Logit Distillation: Multi-horizon risk prediction is t ypically form ulated as a multi-task problem, which can lead to optimization trade-oﬀs across horizons and reduced accuracy for individual time p oin ts [20]. PHD there- fore trains separate uni-task teac her exp erts, eac h optimized for a single hori- zon using full-history input, whic h empirically yields stronger horizon-sp eciﬁc p erformance. F or eac h prediction horizon k , the corresp onding frozen teacher exp ert pro cesses the true history sequence { x 0 , x − 1 , . . . , x − T h } using the same longitudinal aggregation and additive hazard arc hitecture to pro duce teacher logits z tea k . In parallel, the student pathw ay op erates on the generated (recon- structed) history sequence and pro duces logits z stu k . PHD applies a per-horizon logit distillation loss based on KL divergence: L logit KD = 1 P k ∈K m k X k ∈K m k KL  σ ( z tea k ) ∥ σ ( z stu k )  , (4) where K denotes the set of prediction years, m k is the lab el-v alidity mask, and σ ( · ) denotes the sigmoid function conv erting logits to risk probabilities. Eac h teac her expert sup ervises only their corresp onding horizon, providing stronger and more sp ecialized guidance than m ulti-task teachers. This horizon-aligned sup ervision enables accurate long-horizon risk estimation even when only cur- ren t or reconstructed history is av ailable at inference. The total loss is as follows: L total = L RCE + λ l L logit KD , (5) where λ l is a co eﬃcient, deﬁning the imp ortance of the distillation loss. 4 Exp erimen tal V alidation Exp erimen tal Methodology: The prop osed PHD metho d is ev aluated on the CSA W-CC (Karolinsk a case–con trol) dataset [15], the same longitudinal mammograph y cohort used in LoMaR and VMRA to ensure reproducibility and fair comparison. CSA W-CC is derived from the Cohort of Screen-Aged W omen (CSA W) [2] and con tains longitudinal screening mammograms with 6 exam dates. Eac h examination includes the standard four views. The dataset has 19,328 screening exams from 7,353 individuals, with 1,413 asso ciated with a future breast cancer diagnosis, enabling m ulti-year longitudinal risk modeling. Since CSA W-CC does not pro vide an oﬃcial test split, w e follow prior work and p erform patien t-level random splits to prev ent data leak age (80% for train- ing, 20% for testing, with 25% of the training for v alidation). T o obtain stable estimates, we rep eat the splitting procedure 10 times and rep ort the av erage p erformance. F or v alidation, the LoMaR [6] proto col was adopted. It samples a single exam p er patient in the test set to a void horizon bias. Sampling is rep eated 100 times, and the results are av eraged. Hyp erparameters for the longitudinal backbone follo w the settings rep orted in LoMaR and VMRA. PHD introduces an addi- tional hyperparameter, λ l , whic h is tuned on the v alidation set. Models are trained using Adam with an initial learning rate of 10 − 3 and cosine scheduling. T raining is performed for up to 30 epo chs with early stopping after 5 ep o chs without v alidation improv ement. F or the f ψ 0 → t ( · ) , 3 fully connected lay ers were used with drop out of 0.1 and ReLU activ ation function. Results and Discussion: T able 1 rep orts p erformance across history av ail- abilit y settings. As exp ected, longitudinal mo dels p erform b est with full history ( # H =4 ), while removing history ( # H =0 ) degrades 4–5 year prediction, show- ing the imp ortance of temp oral context. LoMaR+PHD and VMRA+PHD sub- stan tially mitigate this loss despite using no true history at inference. Relativ e to LoMaR # H =0 and VMRA # H =0 , the distilled mo dels consisten tly im- pro ve long-horizon F ull AUC and pAUC (5-year: LoMaR+PHD 0 . 829 → 0 . 853 / 0 . 711 → 0 . 752 ; VMRA+PHD 0 . 829 → 0 . 855 / 0 . 710 → 0 . 757 ), matc hing or exceed- ing full-history mo dels. Statistical tests show no signiﬁcant pA UC diﬀerences T able 1: Mean ± std AUC and pA UC@10% for 1–5 y ear risk prediction on CSA W-CC across folds. #H shows the num b er of prior exams at inference. Met. Mo del #H 1y 2y 3y 4y 5y F ull AUC LoMaR (MICCAI,24) 4 0.914 ± 0 . 023 0.865 ± 0 . 020 0.851 ± 0 . 017 0.841 ± 0 . 019 0.851 ± 0 . 016 VMRA (MICCAI,25) 4 0.920 ± 0 . 019 0.868 ± 0 . 020 0.851 ± 0 . 017 0.842 ± 0 . 017 0.851 ± 0 . 017 Mirai 0 0.924 ± 0 . 020 0.872 ± 0 . 016 0.853 ± 0 . 015 0.837 ± 0 . 014 0.829 ± 0 . 015 LoMaR (MICCAI,24) 0 0.922 ± 0 . 020 0.873 ± 0 . 016 0.853 ± 0 . 015 0.837 ± 0 . 014 0.829 ± 0 . 015 VMRA (MICCAI,25) 0 0.922 ± 0 . 020 0.872 ± 0 . 017 0.853 ± 0 . 016 0.836 ± 0 . 015 0.829 ± 0 . 015 LoMaR+PHD 0 0.913 ± 0 . 022 0.865 ± 0 . 019 0.852 ± 0 . 016 0.845 ± 0 . 015 0.853 ± 0 . 015 VMRA+PHD 0 0.920 ± 0 . 018 0.869 ± 0 . 018 0.852 ± 0 . 016 0.847 ± 0 . 015 0.855 ± 0 . 017 pAUC LoMaR (MICCAI,24) 4 0.817 ± 0 . 023 0.749 ± 0 . 020 0.738 ± 0 . 018 0.731 ± 0 . 018 0.740 ± 0 . 018 VMRA (MICCAI,25) 4 0.822 ± 0 . 020 0.752 ± 0 . 020 0.736 ± 0 . 018 0.728 ± 0 . 019 0.745 ± 0 . 021 Mirai 0 0.824 ± 0 . 023 0.753 ± 0 . 019 0.735 ± 0 . 018 0.715 ± 0 . 018 0.711 ± 0 . 021 LoMaR (MICCAI,24) 0 0.824 ± 0 . 023 0.754 ± 0 . 019 0.735 ± 0 . 018 0.715 ± 0 . 018 0.711 ± 0 . 020 VMRA (MICCAI,25) 0 0.823 ± 0 . 023 0.752 ± 0 . 019 0.734 ± 0 . 018 0.714 ± 0 . 019 0.710 ± 0 . 021 LoMaR+PHD 0 0.810 ± 0 . 031 0.744 ± 0 . 024 0.734 ± 0 . 015 0.735 ± 0 . 020 0.752 ± 0 . 019 VMRA+PHD 0 0.818 ± 0 . 020 0.749 ± 0 . 017 0.733 ± 0 . 017 0.734 ± 0 . 017 0.757 ± 0 . 018 7 Figure 3: a) Ablation studies showing that m ulti-teacher supervision (Student (5 teacher)) yields the strongest gains, particularly at the 5-year horizon, b) Comparing ROC curv es for VMRA and LoMaR under v arying history av ail- abilit y and the prop osed distilled-history mo del. Although VMRA and LoMaR p erformance increases with more prior exams, VMRA+PHD and LoMaR+PHD ac hieves the strongest sensitivity in the lo w-FPR despite op erating without his- tory , matching or exceeding the full-history model. at y ears 1–3 (VMRA p = 0 . 7 , 0 . 7 , 0 . 7 ; LoMaR p = 0 . 3 , 0 . 4 , 0 . 9 ), but signiﬁcant gains at longer horizons (VMRA p = 0 . 03 , 8 × 10 − 5 ; LoMaR p = 0 . 04 , 3 × 10 − 4 ). Figure 3 (b) compares VMRA and LoMaR under diﬀerent history a v ailabil- it y settings and highlights the low false-p ositive regime. Although VMRA and LoMaR with full history ( # H =4 ) is a strong baseline, the prop osed model (VMRA+PHD and LoMaR+PHD, # H =0+ ) op erates without any prior ex- ams at inference time and still achiev es the b est op erating characteristics where screening systems are t ypically used. In particular, across lo w false-positive rates, the VMRA+PHD and LoMaR+PHD curv e lies ab o ve all other v ari- an ts, including the full-history mo del, indicating higher sensitivity at the same FPR. This dominance in the low-FPR region is consistent with the pAUC im- pro vemen ts and suggests that the distilled-history pathw a y learns to inject risk- relev an t longitudinal con text into the single-exam setting. Ablation results (Figure 3 (a)) further clarify the con tribution of PHD. Re- mo ving distillation (Studen t (no KD)) yields only marginal impro vemen t o ver the base longitudinal mo del at b oth horizons. Using only a single full-horizon teac her pro vides moderate gains, while m ulti-teacher distillation (Studen t (5 teac her)) achiev es the strongest performance, particularly at the 5-y ear hori- zon. F or VMRA, pAUC increases from 0.745 (base) to 0.757 with ﬁve teachers, and for LoMaR from 0.740 to 0.752. Improv ements at 4 years are smaller but consisten t. These trends indicate that horizon-sp eciﬁc sup ervision contributes complemen tary temp oral information, and that aggregating multiple teachers is imp ortan t for transferring long-horizon risk structure in to the student. 5 Conclusion This pap er introduces a privileged-history longitudinal metho d for breast cancer risk prediction under missing prior exams. The prop osed PHD metho d transfers 8 temp oral information to reconstruct previous history , and then distills horizon- sp eciﬁc teachers into a student op erating on the curren t exam only , com bin- ing history reconstruction with horizon-aw are distillation. On the CSA W-CC dataset, PHD impro ves long-horizon prediction ov er no-history baselines and approac hes full-history p erformance, esp ecially in low false-p ositive regions. Re- sults indicate that longitudinal risk information can b e eﬀectively distilled into single-exam inference, enabling practical m ulti-year risk prediction when prior screening exams are una v ailable. 9 References [1] Bra y , F ., La versanne, M., Sung, H., F erla y , J., Siegel, R.L., So erjomataram, I., Jemal, A.: Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortalit y worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 74 (3), 229–263 (May 2024) [2] Dem brow er, K., Lindholm, P ., Strand, F.: A m ulti-million mammogra- ph y image dataset and p opulation-based screening cohort for the training and ev aluation of deep neural netw orks—the cohort of screen-aged w omen (csa w). Journal of Digital Imaging 33 (2), 408–413 (Sep 2019) [3] Ginsburg, O., Yip, C., Bro oks, A., Cabanes, A., Caleﬃ, M., Dunstan Y at- aco, J.A., Gya w ali, B., McCormack, V., McLaughlin de Anderson, M., Mehrotra, R., Mohar, A., Murillo, R., Pace, L.E., Pask ett, E.D., Romanoﬀ, A., Rositc h, A.F., Scheel, J.R., Schneidman, M., Unger-Saldaña, K., V an- derpuy e, V., W u, T., Y uma, S., Dv aladze, A., Duggan, C., Anderson, B.O.: Breast cancer early detection: A phased approach to implementation. Can- cer 126 (S10), 2379–2393 (Apr 2020) [4] Grasso, A., Altomare, V., Fiorini, G., Zompanti, A., Pennazza, G., Santon- ico, M.: Inno v ativ e methodologies for the early detection of breast cancer: A review categorized by target biological samples. Biosensors 15 (4), 257 (Apr 2025) [5] Hak ama, M., Coleman, M.P ., A lexe, D.M., Auvinen, A.: Cancer screening: evidence and practice in europ e 2008. Eur. J. Cancer 44 (10), 1404–1413 (Jul 2008) [6] Karaman, B.K., Do delzon, K., Ak ar, G.B., Sabuncu, M.R.: Longitudinal mammogram risk prediction. In: International Conference on Medical Im- age Computing and Computer-Assisted Interv ention. pp. 437–446. Springer (2024) [7] Karaman, B.K., Mormino, E.C., Sabuncu, M.R.: Mac hine learning based m ulti-mo dal prediction of future decline to ward alzheimer’s disease: An empirical study . PLOS ONE 17 (11), e0277322 (Nov 2022) [8] Kim, G., Bahl, M.: Assessing risk of breast cancer: A review of risk pre- diction mo dels. J. Breast Imaging 3 (2), 144–155 (Mar 2021) [9] Kim, H., Lim, J., Kim, H.G., Lim, Y., Seo, B.K., Bae, M.S.: Deep learning analysis of mammography for breast cancer risk prediction in asian w omen. Diagnostics (Basel) 13 (13), 2247 (Jul 2023) [10] Laub y-Secretan, B., Sco ccian ti, C., Lo omis, D., Benbrahim-T allaa, L., Bou- v ard, V., Bianchini, F., Straif, K.: Breast-cancer screening — viewpoint of the iarc working group. New England Journal of Medicine 372 (24), 2353–2358 (Jun 2015) 10 [11] Mendes, J., Oliveira, B., Araújo, C., Galrão, J., Mota, A.M., Garcia, N.C., Matela, N.: Deep learning in breast cancer risk prediction: a re- view of recent applications in full-ﬁeld digital mammography . F ront. Oncol. 15 (1656842), 1656842 (Sep 2025) [12] Reece, J.C., Neal, E.F.G., Nguy en, P ., McIn tosh, J.G., Emery , J.D.: De- la yed or failure to follo w-up abnormal breast cancer screening mammo- grams in primary care: a systematic review. BMC Cancer 21 (1) (Apr 2021) [13] San teramo, R., Damiani, C., W ei, J., Montana, G., Brentnall, A.R.: Are b etter AI algorithms for breast cancer detection also better at predicting risk? a paired case-control study . Breast Cancer Res. 26 (1), 25 (F eb 2024) [14] Smith, R.A.: Exp ert group: Iarc handbo oks of cancer preven tion. v ol.7: Breast cancer screening. lyon, france: Iarc; 2002. 248pp.: Isbn 92 832 3007 8. Breast Cancer Researc h 5 (4) (Aug 2003) [15] Strand, F.: Csaw-cc (mammography) – a dataset for ai research to improv e screening, diagnostics and prognostics of breast cancer (2022) [16] Sun, Z., Thrun, S., Kampﬀmeyer, M.: V mra-mar: An asymmetry-a ware temp oral framew ork for longitudinal breast cancer risk prediction. In: In ternational Conference on Medical Image Computing and Computer- Assisted Interv en tion. pp. 660–669. Springer (2025) [17] Thrun, S., Hansen, S., Sun, Z., Blum, N., Salahuddin, S.A., Wic kstrøm, K., W etzer, E., Jenssen, R., Stille, M., Kampﬀmey er, M.: Reconsidering explicit longitudinal mammograph y alignment for enhanced breast cancer risk prediction. In: International Conference on Medical Image Computing and Computer-Assisted Interv ention. pp. 495–505. Springer (2025) [18] Wilkinson, A.N., Mainprize, J.G., Y aﬀe, M.J., Robinson, J., Cordeiro, E., Lo ok Hong, N.J., Williams, P ., Moideen, N., Renaud, J., Seely , J.M., Rush- ton, M.: Cost-eﬀectiveness of breast cancer screening using digital mam- mograph y in canada. JAMA Netw. Op en 8 (1), e2452821 (Jan 2025) [19] Wilkinson, L., Gathani, T.: Understanding breast cancer as a global health concern. Br. J. Radiol. 95 (1130), 20211033 (F eb 2022) [20] Xin, D., Ghorbani, B., Gilmer, J., Garg, A., Firat, O.: Do curren t multi- task optimization metho ds in deep learning even help? Adv ances in neural information pro cessing systems 35 , 13597–13609 (2022) [21] Y ala, A., Mikhael, P .G., Strand, F., Lin, G., Smith, K., W an, Y.L., Lam b, L., Hughes, K., Lehman, C., Barzilay , R.: T o ward robust mammograph y- based mo dels for breast cancer risk. Sci. T ransl. Med. 13 (578), eaba4373 (Jan 2021) 11

Longitudinal Risk Prediction in Mammography with Privileged History Distillation

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment