Promptable segmentation with region exploration enables minimal-effort expert-level prostate cancer delineation

Junqing Yang¹, Natasha Thorley², Ahmed Nadeem Abbasi³, Shonit Punwani², Zion Tse⁴, Yipeng Hu¹, Shaheer U. Saeed¹,⁴*

¹ UCL Hawkes Institute; Department of Medical Physics and Biomedical Engineering, University College London, UK.
² Centre of Medical Imaging, University College London, UK.
³ Department of Oncology, Aga Khan University Hospital, Pakistan.
⁴ Centre for Bioengineering; School of Engineering and Materials Science, Queen Mary University of London, UK.
*Corresponding author(s). E-mail(s): shaheer.saeed@qmul.ac.uk

Abstract

Purpose: Accurate segmentation of prostate cancer on magnetic resonance (MR) images is crucial for planning image-guided interventions such as targeted biopsies, cryoablation, and radiotherapy. However, subtle and variable tumour appearances, differences in imaging protocols, and limited expert availability make consistent interpretation difficult. While automated methods aim to address this, they rely on large expertly-annotated datasets that are often inconsistent, whereas manual delineation remains labour-intensive. This work aims to bridge the gap between automated and manual segmentation through a framework driven by user-provided point prompts, enabling accurate segmentation with minimal annotation effort.

Methods: The framework combines reinforcement learning (RL) with a region-growing segmentation process guided by user prompts. Starting from an initial point prompt, region-growing generates a preliminary segmentation, which is iteratively refined through RL. At each step, the RL agent observes the image and current segmentation to predict a new point, from which region growing updates the mask. A reward, balancing segmentation accuracy and voxel-wise uncertainty, encourages exploration of ambiguous regions, allowing the agent to escape local optima and perform sample-specific optimisation. Despite requiring fully supervised training, the framework bridges manual and fully automated segmentation at inference by substantially reducing user effort while outperforming current fully automated methods.

Results: The framework was evaluated on two public prostate MR datasets (PROMIS and PICAI, with 566 and 1090 cases). It outperformed the previous best automated methods by 9.9% and 8.9%, respectively, with performance comparable to manual radiologist segmentation, reducing annotation time tenfold.

Conclusion: By combining prompting with RL-driven exploration, the framework achieves radiologist-level prostate cancer segmentation with a fraction of the annotation effort, highlighting the potential of RL to enable adaptive and efficient cancer delineation.

Code: github.com/JQ-Sakura/prostate-rl-segmentation

Keywords: Promptable Segmentation, Prostate Cancer, Reinforcement Learning, Deep Learning

1 Introduction

Magnetic resonance (MR) imaging plays a central role in interventional planning, for both diagnosis and treatment of prostate cancer [1]. However, accurate interpretation of prostate MR images is particularly challenging due to the variable and often subtle imaging appearances of cancerous tissue. This is echoed by the low reported sensitivity, even in expert readings [1].
The task is further complicated by differences in acquisition protocols, imaging equipment, and radiologist training [2]. Reliable assessment therefore requires specialist expertise, which is severely limited, especially in resource-constrained regions [3]. Even when expertise is available, substantial annotation burden and inter-observer variability persist, reflecting the inherent difficulty of the task [3]. This, combined with the significant manual effort required for such assessment, leads to inconsistent reporting, which undermines the potential of MR imaging to guide timely and effective interventional procedures [1].

Automation has been widely explored to reduce diagnostic variability and accelerate reporting workflows, particularly through automated tumour boundary delineation or segmentation [4]. However, fully automated systems typically depend on large, meticulously labelled datasets for supervised training [5]. Curating such datasets is hampered by the limited availability of experts, the significant time burden of delineating cancerous tissue, and high inter-annotator variability [5]. As a result, automated models often inherit biases and inconsistencies from their training data, leading to unreliable performance in clinical practice [4].

Semi-automated methods seek to mitigate some of the variability and burden of manual segmentation by incorporating user guidance into the automated segmentation process [6, 7]. Recently, promptable methods have shown promise for various vision tasks, where user inputs, such as point or bounding box prompts, guide segmentation [8]. This is distinct from interactive segmentation, where users may need to refine segmentation masks iteratively. In contrast, promptable methods require only one, or a few, prompts to initiate automated segmentation. Most promptable models are developed for general vision or anatomical segmentation tasks [7, 8], where image variability is relatively low compared to pathological cancer segmentation. These approaches are typically built on conventional deep learning frameworks, which primarily learn global dataset-level patterns, making them prone to converging toward local optima when datasets are limited or imperfect [6, 9]. For prostate cancer, where appearance variability is high and gathering sufficiently-sized datasets is challenging, such dataset-level strategies rarely achieve clinically acceptable performance [9, 10].

To overcome these limitations, we propose a reinforcement learning (RL)-based promptable segmentation framework tailored for prostate cancer on MR images. The method formulates segmentation as a dynamic sequential process, where an RL agent iteratively refines a segmentation mask guided by user-provided point prompts. Each prompt initiates a region-growing module [11] that produces an initial segmentation of the region of interest (ROI). The agent then observes both the image and the current segmentation to predict a new seed location expected to improve the result, from which region growing is re-applied to update the mask. Reinforcement learning enables this process to incorporate exploration, guided by voxel-wise entropy, allowing the agent to search challenging or uncertain regions and escape dataset-level local optima arising from limited or imperfect data [9, 12-14].
Through this exploration-exploitation balance, the model can efficiently search for and converge on optimal sample-specific segmentation solutions, even in challenging, high-variability cases. The framework is summarised in Fig. 1. At inference time, the proposed framework bridges the gap between manual and fully automated segmentation by approaching expert-level accuracy with only sparse user interaction, substantially reducing annotation effort compared to full manual delineation. Fully supervised annotations are still required during training; the primary benefit lies in reducing expert effort during deployment rather than dataset curation.

RL has previously been applied in medical imaging, for example for landmark localisation, where agents are trained to identify fixed anatomical points through sequential search [15, 16]. Segmentation, however, is fundamentally different, as it requires refinement of spatially extended region-of-interest boundaries rather than localisation of a single target. In this work, RL is introduced to explicitly incorporate exploration-exploitation into promptable segmentation, enabling the agent to escape dataset-level local optima by actively exploring uncertain regions on a per-sample basis during inference. This capability is particularly important for pathological segmentation, where tumour appearances are heterogeneous and training data are limited, making single-pass foundation and supervised models, which rely on dataset-level trends, prone to failure on atypical cases.

The contributions of this work are summarised as follows: 1) developing a novel promptable segmentation mechanism, using RL to allow sample-specific optimisation, for prostate cancer segmentation on MR images; 2) evaluating the proposed method for prostate cancer segmentation using two publicly available datasets of 566 and 1090 prostate cancer patient MR images; 3) demonstrating performance that substantially exceeds recent state-of-the-art methods such as nnUNet, UNeTr and Combiner; 4) demonstrating performance that approaches the level of expert radiologists while requiring minimal annotation effort at inference; 5) providing an open-source implementation, available at: github.com/JQ-Sakura/prostate-rl-segmentation.

2 Methods

Fig. 1: An overview of the proposed promptable segmentation using RL.

2.1 Definitions for the image, voxel, and segmentation

Let $x \in \mathcal{X} \subseteq \mathbb{R}^{H \times W \times D}$ denote a volumetric image with spatial dimensions $(H, W, D)$, where $\mathcal{X}$ is the space of images. Its corresponding segmentation mask is defined as $y \in \mathcal{Y} \subseteq \{0, 1\}^{H \times W \times D}$, where $\mathcal{Y}$ is the space of segmentation masks. Each voxel is indexed by $v \in \mathcal{V} \subseteq \mathbb{Z}^3$, where $\mathcal{V}$ is the space of voxel indices and $x(v)$ denotes the intensity at voxel $v$. The goal of the framework is to generate an optimal segmentation that delineates the prostate cancer region within the image.

2.2 Efficient exploration of segmentation using region growing

We define a region-growing operator that expands a segmentation from a single seed voxel. Let the seed be $v_s \in \mathcal{V}$. The region-growing mapping is written as $g(\cdot) : \mathcal{X} \times \mathcal{V} \to \mathcal{Y}$, where for a specific sample the segmentation from region-growing is given by $y = g(x, v_s)$. The region-growing is itself considered non-parametric in our formulation, as any development and parameter tuning happens prior to the promptable segmentation outlined in Sec. 2.3.
Surrogate segmentation network and entropy

A surrogate neural network $f(\cdot; \theta) : \mathcal{X} \to [0, 1]^{H \times W \times D}$, parameterised by weights $\theta$, is employed to provide voxel-wise probabilities that guide the region-growing process. For a given image $x \in \mathcal{X}$, the network produces a probability map $y_p = f(x; \theta)$, where $y_p \in [0, 1]^{H \times W \times D}$.

While $y_p$ encodes model confidence, it does not explicitly capture uncertainty. To quantify voxel-wise uncertainty, an entropy map $y_e$ is derived from $y_p$ as
$$y_e(v) = -\left[ y_p(v) \log y_p(v) + (1 - y_p(v)) \log(1 - y_p(v)) \right].$$
Low-entropy voxels correspond to confident predictions (probabilities near 0 or 1), while high-entropy voxels indicate ambiguity (probabilities near 0.5). The surrogate network $f$ always outputs a voxel-wise probability map $y_p$, from which an entropy map $y_e$ is deterministically derived. For brevity, this derivation is summarised simply as $y_e = f(x; \theta)$, which is a notational shorthand rather than a redefinition of the network output. The entropy at each voxel is then given by $y_e(v)$, which measures the uncertainty for that particular voxel.

The surrogate network is trained separately in a fully supervised manner using a set of annotated training samples $\{x_i, y_i\}_{i=1}^{N}$, where $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$. The training objective minimises a loss function $L : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}$, where the loss for a particular sample is given by $L(f(x_i; \theta), y_i)$; in our work this is the Dice loss. The optimal parameters $\theta^*$ are then obtained as:
$$\theta^* = \arg\min_\theta \frac{1}{N} \sum_{i=1}^{N} L(f(x_i; \theta), y_i) \quad (1)$$
After optimisation, the parameters $\theta^*$ remain fixed and $f(\cdot; \theta^*)$ is used to generate entropy maps $y_e = f(\cdot; \theta^*)$ for the region-growing process. The surrogate network is trained independently and remains fixed during RL training and inference. In future work, the use of pre-trained surrogate models could be explored to reduce computational overhead.

2.2.1 Region expansion for segmentation

In this subsection, $j$ denotes the region-growing iteration index. The segmentation $y$ is initialised using the seed voxel $v_s$:
$$y_{j=0}(v) = \begin{cases} 1, & v = v_s, \\ 0, & \text{otherwise.} \end{cases}$$
Each voxel with $y_j(v) = 1$ is considered included in the current ROI segmentation, where $v'$ denotes an included voxel. For each included voxel $v'$, the neighbourhood is defined as the set of voxels in a local window around $v'$, expressed as $v = v' + \Delta v$, where $\Delta v = (\Delta a, \Delta b, \Delta c)$ controls the neighbourhood radius (set as $(3, 3, 3)$ in our work). For each candidate voxel $v$ in the neighbourhood $v' + \Delta v$ that is not yet included (i.e., $y_j(v) = 0$), the inclusion condition is:
$$y_{j+1}(v) = \begin{cases} 1, & \text{if } \sigma_x(v) < \tau_\sigma \text{ and } y_e(v) < \tau_e, \\ y_j(v), & \text{otherwise,} \end{cases}$$
where $\sigma_x(v)$ is the neighbourhood intensity standard deviation computed over all voxels within $v \pm \Delta v$, and $y_e(v)$ is the voxel-wise entropy obtained from $f(x; \theta^*)$. Both $\tau_\sigma$ and $\tau_e$ are hyper-parameters, which are left unchanged from the defaults in [11].

This process repeats iteratively, where each iteration examines the neighbours of newly included voxels. Region growing terminates when no new voxels are added, i.e. $\|y_{j+1} - y_j\|_1 = 0$, or when a predefined maximum number of iterations $J$ is reached. The final segmentation after convergence is denoted as $y = y_J$, at final iteration index $J$. The entire region-growing operator can be summarised as $y = g(x, v_s)$, with $y$ being the final segmentation mask, $x$ the image, and $v_s$ the initial seed point.
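To make Sec. 2.2 concrete, below is a minimal NumPy/SciPy sketch of the entropy map and the seeded region-growing operator $y = g(x, v_s)$. Function names are illustrative rather than taken from the released code; the box-filtered local standard deviation and the frontier-based expansion are assumptions about implementation detail, while the thresholds follow the defaults in the supplementary materials ($\tau_\sigma = 0.3$ on normalised intensities, $\tau_e = 0.1$).

```python
import numpy as np
from itertools import product
from scipy.ndimage import uniform_filter

def entropy_map(y_p, eps=1e-8):
    """Voxel-wise binary entropy y_e derived from the probability map y_p."""
    y_p = np.clip(y_p, eps, 1.0 - eps)
    return -(y_p * np.log(y_p) + (1.0 - y_p) * np.log(1.0 - y_p))

def local_std(x, radius=3):
    """Neighbourhood intensity standard deviation sigma_x over v +/- delta_v."""
    mean = uniform_filter(x, size=2 * radius + 1)
    mean_sq = uniform_filter(x * x, size=2 * radius + 1)
    return np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))

def region_grow(x, y_e, seed, tau_sigma=0.3, tau_e=0.1, radius=3, max_iter=100):
    """y = g(x, v_s): expand from `seed`, gated by intensity std and entropy."""
    y = np.zeros(x.shape, dtype=np.uint8)
    y[seed] = 1                                   # y_0: only the seed voxel
    admissible = (local_std(x, radius) < tau_sigma) & (y_e < tau_e)
    offsets = [d for d in product(range(-radius, radius + 1), repeat=3) if any(d)]
    frontier = [seed]
    for _ in range(max_iter):                     # at most J iterations
        grown = []
        for v in frontier:                        # examine neighbours of new voxels
            for d in offsets:
                u = tuple(np.add(v, d))
                if all(0 <= u[k] < x.shape[k] for k in range(3)) \
                        and y[u] == 0 and admissible[u]:
                    y[u] = 1
                    grown.append(u)
        if not grown:                             # ||y_{j+1} - y_j||_1 = 0
            break
        frontier = grown
    return y
```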
2.3 Reinforcement learning for promptable segmentation

RL is used to model promptable segmentation as a sequential process, where an agent iteratively refines the segmentation by selecting new seed points for region growing. A neural network agent $h(\cdot; \phi) : \mathcal{S} \to \mathcal{A}$, parameterised by weights $\phi$, defines the policy that maps the observed state $s_t \in \mathcal{S}$ at time step $t$ to the next action $a_t \in \mathcal{A}$, where $\mathcal{S}$ and $\mathcal{A}$ are the state and action spaces.

State: The state observed by the agent at time step $t$ is defined as $s_t \in \mathcal{S}$, which consists of $x$, the MR image, and $y_t$, the current segmentation mask after the $t$-th step. This means that $s_t = (x, y_t)$ and that $\mathcal{S} = \mathcal{X} \times \mathcal{Y}$.

Action: At each time step $t$, the agent selects an action $a_t \in \mathcal{A}$, which corresponds to the next seed voxel location used to initialise region growing. Formally, the action is given by $a_t = v_{s,t} = h(s_t; \phi)$, which means that $\mathcal{A} = \mathcal{V}$.

State transition: After the agent selects an action $a_t = v_{s,t}$, the environment updates the segmentation by applying the region-growing operator. The new state is denoted as $s_{t+1} = (x, y_{t+1})$, where $y_{t+1} = g(x, v_{s,t})$ is obtained from the region-growing operator. This completes one transition in the Markov decision process, from the current state-action pair $(s_t, a_t)$ to the next state $s_{t+1}$.

Reward: The agent receives a scalar reward that quantifies the improvement in segmentation after each action. For a given time step $t$, the reward function is defined as $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$, and the reward at time step $t$ is given by:
$$R_t = r(s_t, a_t) = L(y_t, \hat{y}) - L(y_{t+1}, \hat{y}) + \beta \, \mathbb{E}_{v \in y_{t+1}}[y_e(v)], \quad (2)$$
where $L$ is the loss from Eq. (1) (see Sec. 2.2), $\hat{y}$ is the ground-truth segmentation mask, and $\mathbb{E}_{v \in y_{t+1}}[y_e(v)]$ denotes the expectation over the voxel-wise entropy values for all voxels included in $y_{t+1}$ from the region-growing operator. The first two terms measure the improvement in segmentation compared to ground truth between consecutive iterations and are collectively called the Dice reward, while the final term, called the entropy reward, encourages exploration of high-entropy (low-confidence) regions, where $y_e$ is the entropy map used by the region-growing operator. The entropy reward thus drives the agent toward areas where the surrogate network is uncertain (voxel classification probabilities close to 0.5, leading to high expected values of $y_e(v)$). The hyper-parameter $\beta$ controls the balance between exploitation and exploration, where a higher $\beta$ for exploration allows escaping local minima caused by dataset-level trends. Note that the first term rewards improved overlap (a decrease in Dice loss), while the entropy bonus encourages exploration of uncertain regions.

Optimisation: The agent network $h(\cdot; \phi)$ is trained to learn the optimal policy parameters $\phi^*$ that maximise the expected cumulative discounted reward over a finite time horizon $T$. The optimisation objective is defined as:
$$\phi^* = \arg\max_\phi \mathbb{E}\left[ \sum_{t=0}^{T-1} \gamma^t R_t \right], \quad (3)$$
where $\gamma \in [0, 1)$ is the discount factor controlling the trade-off between immediate and future rewards. The expectation is taken over the state and action trajectories induced by the policy $h(\cdot; \phi)$, lasting up to time step $T$. In our implementation, the optimisation is performed using a policy-gradient based approach [17], allowing the network to iteratively improve its seed-point selection strategy to maximise long-term segmentation performance. The training is summarised in Algorithm 1.
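Before turning to the full training loop, the reward of Eq. (2) is compact enough to state directly in code. A minimal NumPy sketch, assuming binary masks stored as arrays and $\beta = 0.8$ as in the supplementary materials:

```python
import numpy as np

def dice_loss(y, y_ref, eps=1e-8):
    """Dice loss L(y, y_ref), used in Eq. (1) and inside the reward."""
    inter = np.sum(y * y_ref)
    return 1.0 - (2.0 * inter + eps) / (np.sum(y) + np.sum(y_ref) + eps)

def reward(y_t, y_t1, y_gt, y_e, beta=0.8):
    """R_t = L(y_t, y_gt) - L(y_{t+1}, y_gt) + beta * E_{v in y_{t+1}}[y_e(v)]."""
    dice_term = dice_loss(y_t, y_gt) - dice_loss(y_t1, y_gt)
    inside = y_t1.astype(bool)
    entropy_term = float(y_e[inside].mean()) if inside.any() else 0.0
    return dice_term + beta * entropy_term
```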
Algorithm 1: RL training for promptable segmentation

Data: Image-label pairs $\{(x_i, \hat{y}_i)\}_{i=1}^N$, trained surrogate network $f(\cdot; \theta^*)$
Result: Trained agent $h(\cdot; \phi^*)$
while not converged do
    Sample $(x, \hat{y})$ from the training set;
    Obtain entropy map $y_e = f(x; \theta^*)$;
    For time step $t = 0$, initialise heuristic seed point $v_{s,0}$;
    Apply region-growing for segmentation $y_0 = g(x, v_{s,0})$;
    Set state $s_0 = (x, y_0)$;
    for $t \leftarrow 0$ to $T - 1$ do
        Select next action using agent $h(s_t; \phi) = v_{s,t} = a_t$;
        Apply region growing to get $y_{t+1} = g(x, v_{s,t})$;
        Set next state $s_{t+1} = (x, y_{t+1})$;
        Compute reward $R_t = L(y_t, \hat{y}) - L(y_{t+1}, \hat{y}) + \beta \, \mathbb{E}_{v \in y_{t+1}}[y_e(v)]$;
        Store transition $(s_t, a_t, R_t, s_{t+1})$ in buffer;
        if $\|y_{t+1} - y_t\|_1 = 0$ then break;
    end
    Update $\phi$ using $\arg\max_\phi \mathbb{E}\left[\sum_{t=0}^{T-1} \gamma^t R_t\right]$;
end

Inference to obtain the final segmentation: After training, the optimal policy $h(\cdot; \phi^*)$ is used to iteratively predict seed locations until convergence. After the initial user point prompt $v_{s,0}$, at each time step the agent selects the next seed voxel $v_{s,t}$, which is passed to the region-growing operator $g(\cdot)$ to update the segmentation. The process terminates when the segmentation mask stabilises, defined as no further change in $y_t$, or upon reaching a maximum number of iterations $T$. The final segmentation is obtained as $y_T = g(x, v_{s,T})$, where $v_{s,T}$ is the final seed voxel selected by the trained agent and $y_T$ is the corresponding segmentation.

At each decision step, the agent predicts a new seed voxel, from which region growing is re-initialised to generate a new segmentation mask. The newly generated mask replaces the previous segmentation rather than being accumulated, allowing the agent to correct prior over- or under-segmentation and preventing monotonic region growth. Although region growing uses fixed parameters, adaptability is achieved through iterative seed relocation guided by the RL policy and voxel-wise entropy, enabling sample-specific refinement.
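The inference procedure above can be sketched in a few lines. Here `agent` stands for the trained policy $h(\cdot; \phi^*)$ and `region_grow` for the operator $g$ from the earlier sketch; both names are illustrative and not taken from the released implementation.

```python
import numpy as np

def promptable_inference(x, y_e, agent, user_prompt, max_steps=20):
    """Refine the segmentation from a single user point prompt v_{s,0}."""
    y = region_grow(x, y_e, user_prompt)      # initial mask from the prompt
    for _ in range(max_steps):                # up to T policy steps
        seed = agent(x, y)                    # next seed voxel v_{s,t}
        y_next = region_grow(x, y_e, seed)    # new mask replaces the old one
        if np.array_equal(y_next, y):         # mask stabilised: converged
            return y_next
        y = y_next
    return y
```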
3 Experiments

3.1 Datasets

PROMIS [1] consists of multi-parametric MR images from 566 patients with suspected prostate cancer. T2-weighted (T2W), diffusion-weighted (DWI), and apparent diffusion coefficient (ADC) sequences form separate channels. Images were centre-cropped and resampled to 128 × 128 × 128 voxels, with intensities normalised. Voxel-level annotations for suspected cancerous regions, conducted by imaging researchers with agreed consensus, served as ground truth, with negative cases having fully-negative segmentation masks. During training, initial seed points were sampled randomly within the lesion. For inference, seed points were also randomly chosen within the lesion, to simulate inter-observer variability. The data was randomly split into development and holdout sets with ratio 80:20, where heldout samples were used to report performance.

PICAI [18] consists of multi-parametric MR images from 1090 patients from multiple international centres. These images had the same channels as PROMIS and were also resampled and normalised to the same ranges. Expert annotations for labelled cases served as ground truth, similar to PROMIS, with negative cases having fully-negative segmentation masks. Seed point locations were sampled in the same manner as for the PROMIS dataset. The data was randomly split into development and holdout sets with ratio 80:20. The heldout samples were used to report performance.

3.2 Network architectures and hyper-parameters

Our open-source implementation, with details and hyper-parameter settings, is available at: github.com/JQ-Sakura/prostate-rl-segmentation. The training was conducted on two Nvidia Tesla V100 GPUs, with surrogate network training and RL agent training lasting approximately 24 and 96 hours, respectively.

Surrogate network: Followed a 3D UNet architecture [19] with 4 downsampling and 4 upsampling blocks. Hyper-parameters are reported in the supplementary materials.

Agent: The agent had two parts, the actor and the critic, and used the proximal policy optimisation algorithm for training [17]. Both the actor and critic networks adopt a shared 3D convolutional encoder followed by three fully connected layers. Each layer is followed by group normalisation and LeakyReLU activation. The encoder receives four-channel volumetric input, comprising the three MR modalities (T2W, DWI, ADC) and the segmentation mask. Hyper-parameters are reported in the supplementary materials; a minimal architecture sketch is given at the end of this section.

3.3 Comparisons

We compare our method (RL-PromptSeg) with commonly used segmentation methods, including SAM [8], SAM2 [20], SAM3 [21], MedSAM [7], MedSAM2 [22], MedSAM3 [23], Combiner [10], T2-predictor [24], Swin-UNeTr [25], UniverSeg [26], UNet [19], nnUNet [27] and DinoV3 [28]. We report Dice scores (mean ± standard deviation) on holdout sets, with statistical significance assessed using paired t-tests. For reference, human expert performance from a second reader (3 years' experience) was measured on all 114 heldout samples in the PROMIS dataset, to estimate a human benchmark for performance. Promptable models (SAM, MedSAM, and UniverSeg) were fine-tuned using the development set. For SAM and MedSAM, only the segmentation head (mask decoder) was fine-tuned, with the backbone encoder kept frozen, using the Adam optimiser with an initial learning rate of 1e-5 and batch size of 64, following the recommended protocols in the original publications. UniverSeg was fine-tuned end-to-end using the Adam optimiser with an initial learning rate of 1e-6 and batch size of 128. All other compared methods were trained end-to-end following their original training protocols, initialising with pre-trained weights where applicable.

3.4 Ablations

Our ablation studies investigate the impact of individual components. We compare our RL-PromptSeg method to ablated versions: 1) omitting the initial seed point prompts by using region-growing without an initial seed (no prompting); 2) omitting the entropy-based reward (no entropy reward); and 3) using a single T2 MR sequence instead of the multi-parametric MR image (no multi-parametric input). The impact of the parameter β, which controls the exploration-exploitation trade-off, is also studied. Other hyper-parameters (reported in the supplementary materials) were set using a grid search, and had limited impact in preliminary evaluations.
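As a concrete illustration of the agent described in Sec. 3.2, the following PyTorch sketch implements a shared 3D convolutional encoder with actor and critic heads of three fully connected layers each. The channel widths, head dimensions and the continuous three-coordinate action parameterisation are assumptions; only the four-channel input (T2W, DWI, ADC, current mask), the shared encoder, the group normalisation and the LeakyReLU activations come from the paper.

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared 3D conv encoder with actor and critic heads (sketch of Sec. 3.2)."""

    def __init__(self, widths=(16, 32, 64), fc_dim=256):
        super().__init__()
        blocks, in_ch = [], 4          # T2W, DWI, ADC and the current mask
        for out_ch in widths:
            blocks += [
                nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
                nn.GroupNorm(num_groups=4, num_channels=out_ch),
                nn.LeakyReLU(),
            ]
            in_ch = out_ch
        self.encoder = nn.Sequential(*blocks, nn.AdaptiveAvgPool3d(1), nn.Flatten())

        def head(out_dim):             # three fully connected layers per head
            return nn.Sequential(
                nn.Linear(widths[-1], fc_dim), nn.LeakyReLU(),
                nn.Linear(fc_dim, fc_dim), nn.LeakyReLU(),
                nn.Linear(fc_dim, out_dim),
            )

        self.actor = head(3)           # next seed voxel as (h, w, d) coordinates
        self.critic = head(1)          # state-value estimate used by PPO

    def forward(self, s):              # s: (B, 4, H, W, D) state tensor
        z = self.encoder(s)
        return self.actor(z), self.critic(z)
```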
4 Results

4.1 Comparisons

Comparisons are presented in Tab. 1. Our proposed RL-PromptSeg method outperforms the previous state-of-the-art fully-automated method Swin-UNeTr by 9.9 and 8.9 percentage points on the PROMIS and PICAI datasets, respectively (p-values 0.010 and 0.004). It also outperforms the previous best promptable method UniverSeg by 21.9 and 21.5 percentage points on the two datasets, respectively (p-values 0.002 and 0.001). Compared to a human observer, no statistically significant difference was found (p-value 0.14). The prompt selection time was 131 s per case, averaged across 10 patient cases, compared to a full annotation time of 1093 s per case, allowing a 10× reduction in annotation time while maintaining performance comparable to a human observer and significantly exceeding fully-automated performance. The standard deviation with respect to the user-provided prompt point, with the point lying within the lesion, was 0.028. We also report the impact of off-target prompts in the supplementary materials.

It is interesting to note that the surrogate network alone performs substantially worse than the proposed RL-PromptSeg framework, highlighting that the reported performance gains arise from the iterative RL-driven prompt refinement rather than from the surrogate model itself. Further analysis of the contribution of individual components is provided in the ablation studies.

Table 1: Performance comparison.

Model             PROMIS (Dice)    PICAI (Dice)
SAM               0.236 ± 0.107    0.294 ± 0.132
SAM2              0.254 ± 0.124    0.301 ± 0.137
SAM3              0.312 ± 0.113    0.298 ± 0.142
MedSAM            0.267 ± 0.138    0.342 ± 0.142
MedSAM2           0.291 ± 0.129    0.363 ± 0.144
MedSAM3           0.323 ± 0.148    0.376 ± 0.134
Combiner          0.330 ± 0.180    0.469 ± 0.156
T2-predictor      0.339 ± 0.192    0.394 ± 0.141
UniverSeg         0.307 ± 0.216    0.351 ± 0.154
UNet              0.327 ± 0.198    0.426 ± 0.153
UNet (surrogate)  0.346 ± 0.174    0.453 ± 0.141
nnUNet            0.414 ± 0.201    0.461 ± 0.137
Swin-UNeTr        0.427 ± 0.185    0.477 ± 0.133
DinoV3            0.318 ± 0.163    0.438 ± 0.138
Human             0.538 ± 0.094    -
RL-PromptSeg      0.526 ± 0.112    0.566 ± 0.139

4.2 Ablations

The ablation studies are presented in Tab. 2. RL-PromptSeg outperformed the ablated variants across both datasets with margins of more than 5 percentage points (all p-values < 0.010). The largest improvement was observed relative to the no entropy reward variant, where the entropy reward was omitted; performance for this variant drops approximately to the level of other fully-automated methods. The impact of varying the exploration-exploitation trade-off hyper-parameter β is also shown in Tab. 2. The performance for the optimal setting of β was higher than for the other values, with statistical significance (all p-values < 0.020).

Table 2: Ablation studies.

(a) Impact of removing method components
Model                PROMIS (Dice)    PICAI (Dice)
No prompting         0.456 ± 0.122    0.464 ± 0.134
No entropy reward    0.425 ± 0.172    0.441 ± 0.141
No multi-parametric  0.461 ± 0.183    0.465 ± 0.156
All components       0.526 ± 0.112    0.566 ± 0.139

(b) Impact of β
β      PROMIS (Dice)
0.0    0.425 ± 0.172
0.2    0.447 ± 0.163
0.4    0.473 ± 0.179
0.6    0.498 ± 0.121
0.8    0.526 ± 0.112
1.0    0.472 ± 0.181

4.3 Qualitative results

Predicted segmentations from RL-PromptSeg are presented in Fig. 2, showing over-segmentation in some cases, with a detailed analysis in the supplementary materials.

Fig. 2: Samples from PROMIS segmented using our RL-PromptSeg approach.
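The numbers in Tabs. 1 and 2 are Dice scores (mean ± standard deviation) with paired t-tests over the same holdout cases. A minimal sketch of this evaluation protocol (function names are illustrative):

```python
import numpy as np
from scipy.stats import ttest_rel

def dice_score(y_pred, y_true, eps=1e-8):
    """Dice overlap between binary masks."""
    inter = np.sum(y_pred * y_true)
    return (2.0 * inter + eps) / (np.sum(y_pred) + np.sum(y_true) + eps)

def summarise(dice_a, dice_b):
    """Mean +/- SD per method and a paired t-test over matched holdout cases."""
    a, b = np.asarray(dice_a), np.asarray(dice_b)
    _, p = ttest_rel(a, b)
    return (a.mean(), a.std()), (b.mean(), b.std()), p
```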
5 Discussion

These results demonstrate that our proposed framework enables effective prostate cancer segmentation on MR images. The method achieved performance comparable to expert radiologists and substantially exceeded existing fully-automated and promptable approaches across two independent datasets. A key advantage of our approach lies in its ability to combine the flexibility of prompting with sample-specific optimisation through RL. Unlike conventional deep learning models that rely on dataset-level trends, RL allows exploration for each sample, guided by the surrogate network's voxel-wise entropy. The ablation experiments highlight the central role of the entropy-based reward in encouraging meaningful exploration: its removal led to a marked decline in accuracy, reducing performance to that of conventional automated methods. Increasing the exploration-exploitation weight improved performance up to β = 0.8; further increases may necessitate much longer training times to accommodate the added exploration. Our method significantly reduces the manual burden of prostate cancer segmentation, achieving a tenfold reduction in annotation time while needing only a single point prompt per case and maintaining radiologist-level accuracy. Despite its superior performance, we observed over-segmentation in some cases; however, this may be controlled by modifying region-growing hyper-parameters, or by exploring alternative denser prompting strategies and point-to-segment algorithms. A full evaluation of inter-/intra-operator variability and alignment with individual user intent would require prospective studies with multiple operators and interactive correction workflows, which we leave as important future work. Future work could also explore other problems where annotation variability causes poor performance.

6 Conclusion

We presented a reinforcement learning-based promptable segmentation framework for prostate cancer delineation on MR images. By modelling promptable segmentation as a sequential process, the method bridges the gap between manual and automated delineation, enabling sample-specific refinement guided by user prompts and voxel-wise entropy. The approach achieved radiologist-level performance while reducing annotation time tenfold, demonstrating its potential to accelerate clinical workflows and dataset curation. Beyond MR, the formulation offers a general framework for implementing semi-automated segmentation across diverse tasks.

Acknowledgements / Funding

This work is supported by the International Alliance for Cancer Early Detection, an alliance between Cancer Research UK [EDDAPA-2024/100014] & [C73666/A31378], Canary Center at Stanford University, the University of Cambridge, OHSU Knight Cancer Institute, University College London and the University of Manchester.
Declarations

Compliance with Ethical Standards

Competing interests: All authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethics approval: Our work uses open-source datasets where all original procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.

Informed consent: Informed consent was obtained from all individual participants included in the original studies.

References

[1] Ahmed, H.U., et al.: Diagnostic accuracy of multi-parametric MRI and TRUS biopsy in prostate cancer (PROMIS). The Lancet (2017)
[2] Penzkofer, T., et al.: Prostate cancer detection and diagnosis: the role of MR and its comparison with other diagnostic modalities. NMR in Biomedicine (2014)
[3] Dos-Santos-Silva, I., et al.: Global disparities in access to cancer care. Communications Medicine (2022)
[4] Sanders, J.W., et al.: Computer-aided segmentation on MRI for prostate radiotherapy. Radiotherapy and Oncology (2022)
[5] Sunoqrot, M.R.S., et al.: Artificial intelligence for prostate MRI: open datasets, available applications, and grand challenges. European Radiology Experimental (2022)
[6] Karam, L., et al.: Promptable cancer segmentation using minimal expert-curated data. In: MIUA (2025)
[7] Ma, J., et al.: Segment anything in medical images. Nature Communications (2024)
[8] Kirillov, A., et al.: Segment anything. arXiv:2304.02643 (2023)
[9] Aggarwal, R., et al.: Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis. npj Digital Medicine (2021)
[10] Yan, W., et al.: Combiner and hypercombiner networks: Rules to combine multimodality MR images for prostate cancer localisation. MedIA (2024)
[11] Adams, R., Bischof, L.: Seeded region growing. IEEE TPAMI (1994)
[12] Czolbe, S., et al.: Is segmentation uncertainty useful? In: IPMI (2021)
[13] Ahn, S., et al.: Mitigating dataset bias by using per-sample gradient. arXiv:2205.15704 (2022)
[14] Saeed, S.U., et al.: Competing for pixels: a self-play algorithm for weakly-supervised semantic segmentation. IEEE TPAMI (2024)
[15] Vlontzos, A., et al.: Multiple landmark detection using multi-agent reinforcement learning. In: MICCAI (2019)
[16] Alansary, A., et al.: Evaluating reinforcement learning agents for anatomical landmark detection. Medical Image Analysis 53, 156-164 (2019)
[17] Schulman, J., et al.: Proximal policy optimization algorithms. arXiv (2017)
[18] Saha, A., et al.: Artificial intelligence and radiologists in prostate cancer detection on MRI (PI-CAI). The Lancet Oncology (2024)
[19] Ronneberger, O., et al.: U-Net: Convolutional networks for biomedical image segmentation. In: MICCAI (2015)
[20] Ravi, N., et al.: SAM 2: Segment anything in images and videos. arXiv (2024)
[21] Carion, N., et al.: SAM 3: Segment anything with concepts. arXiv (2025)
[22] Ma, J., et al.: MedSAM2: Segment anything in 3D medical images and videos. arXiv:2504.03600 (2025)
[23] Liu, A., et al.: MedSAM3: Delving into segment anything with medical concepts. arXiv:2511.19046 (2025)
[24] Yi, W., et al.: T2-only prostate cancer prediction by meta-learning from bi-parametric MR imaging. In: IEEE ISBI (2025)
[25] Hatamizadeh, A., et al.: Swin transformers for semantic segmentation of brain tumors in MRI. arXiv:2201.01266 (2022)
[26] Butoi, V.I., et al.: Universal medical image segmentation. In: ICCV (2023)
[27] Isensee, F., et al.: nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods (2021)
[28] Siméoni, O., et al.: DINOv3. arXiv:2508.10104 (2025)

Supplementary Material - Promptable segmentation with region exploration enables minimal-effort expert-level prostate cancer delineation

Junqing Yang¹, Natasha Thorley², Ahmed Nadeem Abbasi³, Shonit Punwani², Zion Tse⁴, Yipeng Hu¹, Shaheer U. Saeed¹,⁴* (affiliations as in the main paper)

Robustness analysis

In clinical scenarios with limited annotation time, point prompts may be placed near lesions but outside their exact boundaries. To simulate this realistic scenario, we randomly sampled point prompt locations between 0-20 pixels (0-10 mm) away from lesion boundaries. For such perturbed points, the performance of RL-PromptSeg is compared with SAM [1], SAM2 [2], SAM3 [3] and MedSAM3 [4] (all other compared methods use no prompts, or bounding boxes, and are thus excluded from this experiment). The impact of perturbed points for the PROMIS dataset is summarised in Tab. 1. All methods showed poorer performance compared to points sampled within the lesion; however, RL-PromptSeg showed minimal performance reduction and superior performance compared to the other methods (p-values < 0.001). The standard deviation with respect to perturbed point locations was 0.052. This robustness of RL-PromptSeg is attributable to the proposed RL formulation, which enables iterative exploration and prompt relocation rather than reliance on a single initial prompt location. The robustness of RL-PromptSeg to perturbed prompts provides a proxy for inter-/intra-operator variability, demonstrating that the final segmentation is stable across different user point placements.

Table 1: Robustness analysis for PROMIS (perturbed point prompts outside lesion boundaries).

Model         Prompts within lesion (Dice)    Prompts outside lesion (Dice)
SAM           0.236 ± 0.107                   0.162 ± 0.097
SAM2          0.254 ± 0.124                   0.141 ± 0.127
SAM3          0.312 ± 0.113                   0.234 ± 0.115
MedSAM3       0.323 ± 0.148                   0.226 ± 0.134
RL-PromptSeg  0.526 ± 0.112                   0.473 ± 0.146
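The perturbation protocol above can be reproduced, as a sketch, by sampling background voxels within a fixed distance of the lesion boundary. The helper below is illustrative (not from the released code) and assumes a non-empty lesion mask.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def perturbed_prompt(lesion_mask, max_dist=20, rng=None):
    """Sample a point prompt outside the lesion but within `max_dist` voxels
    of its boundary (0-20 pixels, as in the robustness protocol above)."""
    rng = rng or np.random.default_rng()
    # distance of every background voxel to the nearest lesion voxel
    dist = distance_transform_edt(lesion_mask == 0)
    candidates = np.argwhere((dist > 0) & (dist <= max_dist))
    return tuple(candidates[rng.integers(len(candidates))])
```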
Generalisability analysis

To assess the generalisability of the RL-PromptSeg framework to other applications, we evaluate its performance on two other datasets: 1) LiTS: liver tumour segmentation [5]; and 2) KiTS: kidney tumour segmentation [6]. The two datasets, consisting of 131 and 489 CT image samples, are split into development and holdout sets with ratio 80:20, with performance reported on heldout samples. Only the best-performing general-purpose methods from the comparison on PROMIS and PICAI were included in this generalisability study, alongside the previously reported best-performing fully-automated methods CLIPSeg [7] and ASeg [8] (methods that used LiTS or KiTS for pre-training were excluded).

The results for the generalisability study across the tasks of liver and kidney tumour segmentation on CT images are reported in Tab. 2. We observed statistically significant performance improvements for RL-PromptSeg compared to all other tested methods (all p-values < 0.02). It is interesting to note that we also observed performance improvements compared to the best fully-automated methods for each application, albeit at the trade-off of increased inference-time compute for RL-PromptSeg, in addition to the required initial user-provided prompt. These results indicate that the proposed RL-based promptable formulation generalises across organs and imaging modalities, suggesting applicability beyond prostate MR.

Table 2: Performance comparison for liver and kidney tumour segmentation.

Model         LiTS (Dice)      KiTS (Dice)
SAM3          0.651 ± 0.101    0.613 ± 0.071
MedSAM3       0.703 ± 0.099    0.724 ± 0.074
UNet          0.742 ± 0.097    0.718 ± 0.064
nnUNet        0.748 ± 0.085    0.720 ± 0.059
Swin-UNeTr    0.758 ± 0.093    0.714 ± 0.072
DinoV3        0.745 ± 0.103    0.727 ± 0.068
CLIPSeg       0.794 ± 0.081    -
ASeg          -                0.764 ± 0.055
RL-PromptSeg  0.803 ± 0.105    0.772 ± 0.067

Over-segmentation analysis

Tab. 3 quantifies over-segmentation using the voxel-wise false positive rate (FPR), which directly measures excess background inclusion, alongside sensitivity, to assess the trade-off between lesion coverage and over-segmentation. Only the best-performing methods from the above analyses were included in the comparisons. Note that specificity and false negative rate are not reported, as they can be directly derived from the reported quantities. The results show that all automated methods exhibit higher over-segmentation than the human observer; however, RL-PromptSeg achieves substantially lower FPR than other automated approaches while maintaining sensitivity comparable to human performance.

Table 3: Quantitative analysis of over-segmentation on the PROMIS dataset.

Model         PROMIS (FPR)    PROMIS (Sensitivity)
SAM3          0.322           0.421
MedSAM3       0.306           0.412
Combiner      0.218           0.498
T2-predictor  0.213           0.507
UniverSeg     0.264           0.516
Swin-UNeTr    0.281           0.523
Human         0.133           0.571
RL-PromptSeg  0.144           0.569
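The quantities in Tab. 3 follow the standard voxel-wise definitions; an illustrative sketch:

```python
import numpy as np

def fpr_and_sensitivity(y_pred, y_true):
    """Voxel-wise false positive rate FP/(FP+TN) and sensitivity TP/(TP+FN)."""
    p, t = y_pred.astype(bool), y_true.astype(bool)
    fp, tn = np.sum(p & ~t), np.sum(~p & ~t)
    tp, fn = np.sum(p & t), np.sum(~p & t)
    return fp / max(fp + tn, 1), tp / max(tp + fn, 1)
```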
Handling negative cases

For negative cases, initial prompts are sampled within the prostate gland. A scan is classified as negative if the final segmentation does not exceed a minimum size threshold corresponding to the region-growing neighbourhood, suppressing small erroneous false-positive regions. Negative cases naturally yield minimal Dice improvement, discouraging region expansion through the reward formulation, without requiring an explicit 'no-lesion' action. In cases with multiple lesions, one prompt is used per lesion; such cases were infrequent in the evaluated datasets. The patient-level sensitivity and specificity (with 95% confidence intervals) for a human observer were reported as 0.88 (0.84-0.91) and 0.45 (0.39-0.51) in the PROMIS study [9]. This corresponds to classifying a single scan as positive or negative. For the same task on the holdout set in PROMIS, RL-PromptSeg achieved 0.84 accuracy, with sensitivity and specificity of 0.84 (0.81-0.89) and 0.48 (0.40-0.53), which is in line with the reported human performance.

Hyper-parameter settings

The values of relevant hyper-parameters are summarised in Tab. 4.

Table 4: Hyper-parameter values.

Hyper-parameter                                   Value
Surrogate learning rate                           1e-4
Surrogate learning rate annealing                 Cosine, down to 1e-6
Surrogate optimiser                               Adam
Surrogate batch size                              256
Surrogate epochs                                  200
β: reward scaling / exploration-exploitation      0.8
τ_e: entropy threshold                            0.1
τ_σ: region growing intensity St.D. threshold     0.3
RL learning rate                                  1e-4
RL γ                                              0.99
RL batch size                                     128
RL steps                                          1000 × 512
RL clip range epsilon                             0.2
RL advantage lambda                               0.95
RL entropy coefficient                            0.01

References

[1] Kirillov, A., et al.: Segment anything. arXiv:2304.02643 (2023)
[2] Ravi, N., et al.: SAM 2: Segment anything in images and videos. arXiv (2024)
[3] Carion, N., et al.: SAM 3: Segment anything with concepts. arXiv (2025)
[4] Liu, A., et al.: MedSAM3: Delving into segment anything with medical concepts. arXiv:2511.19046 (2025)
[5] Bilic, P., et al.: The liver tumor segmentation benchmark (LiTS). Medical Image Analysis 84, 102680 (2023)
[6] Heller, N., et al.: The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 challenge. Medical Image Analysis 67, 101821 (2021)
[7] Liu, J., et al.: CLIP-driven universal model for organ segmentation and tumor detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 21152-21164 (2023)
[8] Myronenko, A., et al.: Automated 3D segmentation of kidneys and tumors in MICCAI KiTS 2023 challenge. In: International Challenge on Kidney and Kidney Tumor Segmentation, pp. 1-7. Springer (2023)
[9] Ahmed, H.U., et al.: Diagnostic accuracy of multi-parametric MRI and TRUS biopsy in prostate cancer (PROMIS). The Lancet (2017)
