Learning and Naming Subgroups with Exceptional Survival Characteristics


Authors: Mhd Jawad Al Rahwanji, Sascha Xu, Nils Philipp Walter, Jilles Vreeken

Abstract

In many applications, it is important to identify subpopulations that survive longer or shorter than the rest of the population. In medicine, for example, it allows determining which patients benefit from treatment, and in predictive maintenance, which components are more likely to fail. Existing methods for discovering subgroups with exceptional survival characteristics require restrictive assumptions about the survival model (e.g. proportional hazards), pre-discretized features, and, as they compare average statistics, tend to overlook individual deviations. In this paper, we propose SySurv, a fully differentiable, non-parametric method that leverages random survival forests to learn individual survival curves, and automatically learns conditions and how to combine these into inherently interpretable rules, so as to select subgroups with exceptional survival characteristics. Empirical evaluation on a wide range of datasets and settings, including a case study on cancer data, shows that SySurv reveals insightful and actionable survival subgroups.

1. Introduction

Survival analysis traditionally focuses on estimating whether a given group of individuals has different survival characteristics than a reference population. This has applications in many fields, but most obviously in precision medicine, as it allows characterizing patients who benefit from a treatment. What if the subgroups are not yet known? Can we learn easily interpretable rules that select subgroups with exceptional survival characteristics? Can we learn these in a flexible, differentiable manner, without restrictive assumptions or pre-discretizing features, allowing even heavy censoring, while keeping individual deviations in mind? That is exactly the topic of this paper.
¹ CISPA Helmholtz Center for Information Security, Saarbrücken, Germany. Correspondence to: Jawad Al Rahwanji <jawad.alrahwanji@cispa.de>. Preprint. February 26, 2026.

Figure 1. Survival subgroups. SySurv finds and characterizes survival subgroups, i.e. subpopulations with exceptional survival characteristics compared to the overall population. (Left) SySurv finds patients suffering from a therapy-resistant tumor. (Right) SySurv finds people with exceptionally long (1) resp. short (2) durations until re-employment.

We consider time-to-event data, in which for each sample we have a vector x of descriptive features (covariates), a time t since we started observing the subject, and an indicator δ of whether the event of interest (e.g. death) has occurred since then. We are interested in learning conjunctive rules that define conditions on the covariates (e.g. x_1 ∈ [0.2, 1.0] and x_2 ∈ [0.8, 0.9]) and so select a subgroup for which the corresponding survival curves are exceptional with regard to the overall population.

For an example, consider Fig. 1. On the left-hand side, we show the result of our method, SySurv, on neck cancer treatment data. It learns that "patients whose tumors have mutations related to DNA repair" tend to respond very poorly to treatment, which is in line with medical understanding (Barker et al., 2015), as well as actionable, as we can now identify and treat these patients differently. On the right-hand side, we consider unemployment data.
SySurv finds that "people older than 58 with unemployment insurance" tend to have exceptionally long durations until re-employment (group 1), whereas "people who previously earned a low wage and without unemployment insurance" tend to find new jobs much more quickly (group 2).

To find these subgroups, we use a non-parametric measure of exceptionality which is based on individual survival curves. That is, unlike the state of the art, we do not assume the Cox proportional hazards model, as it would restrict us to subgroups that roughly follow the population trend, and neither do we rely on group averages, as this obscures individual deviations. We show how to learn rules that single out subgroups with exceptional survival characteristics in a differentiable manner, automatically finding relevant features and intervals in a way that allows easy optimization, yet crisp logical interpretation after learning.

Through an extensive set of experiments on synthetic and real-world data, we show that our method works well in practice. It outperforms state-of-the-art methods by a wide margin, both in terms of recovering the ground truth, and in terms of exceptionality of the found subgroups. Through a case study on treatment of locally advanced head and neck squamous cell carcinoma, we confirm that SySurv finds biomarkers that are known to be associated with good resp. poor response to treatment, while also identifying novel subgroups that warrant further investigation.

In a nutshell, our main contributions are as follows:

• We propose a differentiable, non-parametric measure of survival subgroup exceptionality based on individual survival estimates, and discuss how it relates to the well-known logrank model.
• We show how to differentiably learn rules that select subgroups with exceptional survival characteristics, and how to post hoc evaluate statistical significance.

• We empirically evaluate our method on a wide range of synthetic and real-world datasets, comparing to three state-of-the-art methods. We make all our code available online.¹

2. Preliminaries

We consider time-to-event (survival) data, where we have a dataset D = {(x^(i), t^(i), δ^(i))}_{i=1}^n consisting of n i.i.d. realizations, called subjects, from a joint distribution P(X, T, Δ). Here, x ∈ R^p is a vector of p covariates that describes the subject, t ∈ R≥0 measures time since the start of its observation, and δ indicates whether the event of interest (e.g. relapse) has occurred yet. If the event did occur (δ = 1), t is the time at which it was observed. If it did not occur, t is the latest time we observed the subject to still be fine, and so the outcome is said to be censored.

A survival function S(t) gives the probability of a subject surviving beyond time t, i.e. it is defined as S(t) := P(T > t), the complement of the cumulative distribution function. To ease notation, we denote the domains of x and t by X and T, respectively.

A subgroup Q is a set of subjects selected from D using a rule σ_Q : X → {0, 1} that indicates whether a subject belongs to Q or not, i.e. Q = {x ∈ D | σ_Q(x) = 1}. The survival function of subgroup Q is denoted by Ŝ_Q(t).

¹ https://eda.group/sysurv

3. Learning Survival Subgroups

We are interested in learning rules that select subgroups with survival trends that are exceptional compared to those of the general population. First, we introduce our population model, then define a measure for subgroup exceptionality, and then present SySurv for learning exceptional survival subgroups using gradient-based optimization.

3.1. Population Model

Precision medicine overcomes the traditional "one-size-fits-all" model by incorporating individual characteristics into treatment outcome predictions (Feuerriegel et al., 2024). This is particularly important in survival analysis, where subject heterogeneity can significantly impact outcomes. For us, this translates to estimating individual survival.

Individual survival functions Ŝ(t | X = x), or Ŝ(t | x) for short, can leverage the covariates x of a subject and so provide more accurate and more robust survival estimates (Cox, 1972) than those estimated marginally (Kaplan & Meier, 1958). Specifically important here is that individual survival functions provide fine-grained signal that we can use during optimization, i.e. to determine which subjects should (not) be part of the subgroup. Given individual estimates, the survival function for a subgroup Q selected by rule σ_Q is defined as Ŝ_Q(t) := E_{x∼X}[Ŝ(t | x) | σ_Q(x) = 1].

To obtain individual estimates, we fit a conditional estimator, or population model, M over the entire dataset. We do so using the random survival forest (RSF) (Ishwaran et al., 2014). This is a fully non-parametric continuous-time model that extends the random forest (Breiman, 2001) to time-to-event data. In contrast to well-known parametric models, it does not restrict us to specific survival distributions (e.g. Weibull) or structures (e.g. Cox), and allows us to find subgroups with exceptional survival characteristics in general.

Figure 2. Desired level of sensitivity. (Left) Individual survival functions Ŝ(t | x) are more informative for discovery than their group-level estimate Ŝ(t). (Right) Approaches that do not apply the absolute value can capture the difference between group and reference population survival (Pop.) when the two do not cross (Ŝ_1 and Ŝ_2), but completely underestimate it when they do (Ŝ_3).

3.2. Exceptionality Measure

In traditional survival analysis, the subgroup of interest is assumed to be given, and exceptionality versus the reference population is measured on average, at the group level. As we will detail below, this has drawbacks in terms of sensitivity to per-subject deviations when we aim to learn the subgroup.

First, we illustrate in Fig. 2 (left) how individual estimates can reveal differences from the reference survival that group-level estimates obscure. The dark green line shows the group-level average over four individuals (light green lines). The average curve obscures that the curves of subjects 1 and 2 are very different, i.e. delayed, compared to those of subjects 3 and 4, effectively underestimating the true exceptionality. In general, individual-level exceptionality measures offer increased sensitivity to deviations of individual survival functions, such as those crossing the reference, making them particularly suited for optimization of subgroup membership rules.

Independent of whether we measure exceptionality at the group or individual level, it is important to consider the absolute difference to the reference. We give an example in Fig. 2 (right). Whereas it is trivial to identify groups 1 and 2 as exceptional even when we measure relative to the reference, the large exceptionality of group 3 only becomes clear when we measure the difference in absolute terms. Next, we formalize these intuitions, and show that considering absolute differences of individual estimates to the reference gives us more signal than group averages.
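The cancellation effect behind these intuitions can be checked numerically: two subjects that deviate from the reference in opposite directions average out at the group level, while their absolute individual deviations do not. A toy check at a single time point (the survival values are illustrative, not from the paper):

```python
# Toy survival probabilities at one fixed time t.
s_ref = 0.5             # reference (population) survival at t
s_x1, s_x2 = 0.9, 0.1   # two subjects deviating in opposite directions

# Group-level view: average the subjects first, then compare to the reference.
s_group = (s_x1 + s_x2) / 2
group_level = abs(s_group - s_ref)      # deviations cancel out

# Individual-level view: compare each subject first, then average.
individual_level = (abs(s_x1 - s_ref) + abs(s_x2 - s_ref)) / 2

print(group_level, individual_level)
```

Here the group-level deviation is 0 although both subjects are far from the reference, whereas the individual-level deviation is 0.4; by the triangle inequality the individual-level quantity is never smaller.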
Proposition 3.1. Given two groups A and B, selectable by rules σ_A and σ_B, resp., for which the expected group-level survival at any time t is Ŝ_A(t) resp. Ŝ_B(t), and for which individual-level survival is denoted by Ŝ(t | x). The expected absolute difference in survival of the subjects selectable by σ_A is as, or more, sensitive than the absolute difference of Ŝ_A(t) from Ŝ_B(t),

E_{x∼X}[ℓ¹_t(s(x), s_B) | σ_A(x) = 1] ≥ ℓ¹_t(s_A, s_B),

where ℓ¹_t(·, ·) is an absolute difference measure, and for brevity we write s_◦ for Ŝ_◦(t), and s(x) for Ŝ(t | x).

We provide the proof in Appx. A. For our purposes, s_A would typically belong to a group of interest, e.g. a subgroup, and s_B to a reference group, e.g. the entire dataset. To measure the exceptionality between two survival estimates s_A, s_B ≥ 0 at some time t, we consider the L1 distance ℓ¹_t(s_A, s_B) := |s_A − s_B|.

Corollary 3.2. Under the assumptions of Prop. 3.1, if there exist x^(1), x^(2) ∈ X such that s(x^(1)) > s_B ∧ s(x^(2)) < s_B, then the inequality in Prop. 3.1 is strict, i.e.

E_{x∼X}[ℓ¹_t(s(x), s_B) | σ_A(x) = 1] > ℓ¹_t(s_A, s_B).

Proof. Proposition 3.1 follows from the triangle inequality property of the absolute value function. Equality holds iff all arguments are aligned, i.e. either s(x) ≥ s_B or s(x) ≤ s_B for all x selectable by σ_A. The existence of x^(1) and x^(2) contradicts this, hence the inequality is strict.

To capture the survival interplay between groups, we leverage individual estimates over the entire time domain T by extending the absolute difference measure ℓ¹_t(·, ·) for two survival functions S_A, S_B as ℓ¹_T(S_A, S_B) := ∫_{t∈T} |S_A(t) − S_B(t)| dt.
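On a discrete time grid, ℓ¹_T can be approximated with the trapezoidal rule, and Prop. 3.1 can be checked directly. A minimal sketch, where the exponential curves are illustrative assumptions rather than the paper's RSF estimates:

```python
import numpy as np

def l1_T(S_a, S_b, times):
    """Trapezoidal approximation of the integral of |S_a(t) - S_b(t)| over the grid."""
    diff = np.abs(S_a - S_b)
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(times)))

times = np.linspace(0.0, 10.0, 201)
S_ref = np.exp(-0.2 * times)    # reference (population) survival curve
S_1 = np.exp(-0.05 * times)     # subject outliving the reference
S_2 = np.exp(-0.8 * times)      # subject dying much earlier

# Expected individual-level deviation vs. deviation of the averaged curve.
# Per Prop. 3.1 the former is at least as large, and per Cor. 3.2 strictly
# larger here, since the two subjects lie on opposite sides of the reference.
indiv = 0.5 * (l1_T(S_1, S_ref, times) + l1_T(S_2, S_ref, times))
group = l1_T(0.5 * (S_1 + S_2), S_ref, times)
print(indiv, group)
```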
Finally, we define our exceptionality measure as the expected difference in survival of the subjects of a subgroup selectable by σ from the estimated survival in the population Ŝ_D(t) throughout T as

ϕ(σ, σ_D) := E_{x∼X}[ℓ¹_T(Ŝ(t | x), Ŝ_D(t)) | σ(x) = 1].   (1)

Next, we contrast our measure to the logrank statistic.

Relation to Logrank  The logrank statistic (Mantel, 1966) is widely used in survival analysis. It essentially compares the observed number of events d to the expected number of events e at the group level across all event times,

Logrank ∝ Σ_{evt. times} (d_t^grp − e_t^grp) ∝ Σ_{evt. times} (d_t^grp / r_t^grp − d_t^ref / r_t^ref),

where e_t^grp = d_t^ref · r_t^grp / r_t^ref, r refers to the number of individuals at risk, and "grp" and "ref" refer to the group and reference populations, respectively. Further, it assumes the Cox proportional hazards condition, which states that the rates at which subgroup and reference survival decay (i.e. hazards) are constantly proportional over time.

By relying on group-level differences, the logrank obscures individual variability. Perhaps more importantly, the proportional hazards assumption is often violated in practice, which means that the survival functions for groups of interest often cross one another. Since the logrank does not apply the absolute value before aggregating over time, this causes the effects to cancel out. In contrast, our measure does not have these drawbacks, and in our experiments, we will compare to methods that rely on the logrank.

3.3. Rule Induction

Armed with our exceptionality measure, we now present SySurv for learning subgroups with exceptional survival characteristics using gradient-based optimization.

Figure 3. Soft conditions and rules. The soft condition approaches a hard interval with decreasing temperature τ → 0 (a). Multiple soft conditions combine to form a soft rule, depicted as a hyperbox in the covariate space (b). Adapted from Xu et al. (2024).

Learnable Rules  Recall that subgroups are selected via rules, so we first formalize the language of these rules. Traditionally, a subgroup is defined by a hard rule σ : x ↦ ∧_{j=1}^p π(x_j; α_j, β_j), where each π is a Boolean condition evaluating to true (1) if a covariate x_j falls within the interval defined by lower and upper bounds α, β ∈ R, e.g. "18 < age < 32". To integrate rule induction into a gradient-based optimization pipeline, we employ a continuous relaxation of these logical expressions.

To facilitate differentiable subgroup discovery, we use soft rules σ̂ : X → [0, 1] consisting of soft conditions π̂ : R → [0, 1]. These soft conditions model the probability of a covariate being inside the specified interval via a composition of two opposing sigmoids located at the learnable bounds α and β. A temperature hyperparameter τ > 0 controls the strictness of the bounds; as τ → 0, the soft condition π̂ converges to a Boolean interval (Fig. 3a). Following Xu et al. (2024), we define the soft condition as

π̂(x_j; α_j, β_j, τ) := exp((2x_j − α_j)/τ) / (exp(x_j/τ) + exp((2x_j − α_j)/τ) + exp((3x_j − α_j − β_j)/τ)),

and the differentiable rule learner using a weighted harmonic mean of soft conditions as

σ̂(x; α, β, a, τ) = (Σ_{j=1}^p a_j) / (Σ_{j=1}^p a_j · π̂(x_j; α_j, β_j, τ)^{−1}),

where a ∈ R^p represents a vector of learnable weights. These weights allow the model to perform feature selection, i.e. a covariate x_j is actively included in the conjunction when a_j > 0 and effectively ignored as a_j → 0.
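The soft condition and the harmonic-mean rule can be implemented directly from their definitions. The sketch below is our own minimal rendering; the max-shift of the exponents is a numerical-stability detail we add, and does not change the value:

```python
import math

def soft_condition(x, alpha, beta, tau):
    """Soft interval membership via the three-exponential form; exponents are
    shifted by their maximum before exponentiating to avoid overflow."""
    exps = [x / tau, (2 * x - alpha) / tau, (3 * x - alpha - beta) / tau]
    m = max(exps)
    return math.exp(exps[1] - m) / sum(math.exp(e - m) for e in exps)

def soft_rule(xs, alphas, betas, weights, tau):
    """Weighted harmonic mean of soft conditions: a smooth logical AND."""
    num = sum(weights)
    den = sum(a / soft_condition(x, al, be, tau)
              for x, al, be, a in zip(xs, alphas, betas, weights))
    return num / den

# With a small temperature, the soft condition is nearly a hard check
# against the interval [0.3, 0.7], and equals 0.5 exactly at a bound:
print(soft_condition(0.5, 0.3, 0.7, tau=0.01))  # ~1, inside the interval
print(soft_condition(0.1, 0.3, 0.7, tau=0.01))  # ~0, outside the interval
# If any active condition fails, the whole rule tends toward zero:
print(soft_rule([0.5, 0.9], [0.3, 0.3], [0.7, 0.7], [1.0, 1.0], tau=0.01))
```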
The harmonic mean structure is chosen because it serves as a smooth approximation of the logical-and operator; if any single active condition π̂ approaches zero, the entire rule output σ̂ tends toward zero. Henceforth, the output σ̂(x) can be interpreted as the probability P(σ̂(X) = 1 | X = x) that a subject belongs to the subgroup (yellow box in Fig. 3b).

Objective  Armed with differentiable rules, the goal is to update those rules to select the subgroups that maximize our objective. For rule updates, we need to take the gradient of our objective w.r.t. the parameters of the soft rule σ̂. Hence, we rewrite the expectation in Eq. (1) to derive our objective where σ̂ explicitly appears as

ϕ(σ̂, σ̂_D) = ∫_{x∈X} ℓ¹_T(Ŝ(t | x), Ŝ_D(t)) · (P(X = x) / P(σ̂(X) = 1)) · σ̂(x) dx,   (2)

by using the definition of expected value, the marginalization rule, and Bayes' theorem. We estimate the integrals over X and T in Eq. (2) using the standard Monte Carlo and trapezoidal methods, respectively. Our objective is defined as

ϕ̂(σ̂, σ̂_D) := (1 / |σ̂|) Σ_{i=1}^n ℓ¹_T(Ŝ(t | x^(i)), Ŝ_D(t)) · σ̂(x^(i); θ),

where the subgroup size |σ̂| is estimated as (1/n) Σ_{i=1}^n σ̂(x^(i); θ), and θ stands for α, β, and a, collectively.

Rule Generality and Diversity  We introduce a size penalty |σ̂|^γ to our objective to avoid learning overly specific subgroups. We achieve this by controlling the trade-off between exceptionality and subgroup size via a hyperparameter γ ∈ [0, 1]. To learn non-redundant subgroups, we add a regularizing term to our objective that ensures that we get a diverse set of subgroups by contrasting each subgroup with the set of q predecessors. We define the full objective as

arg max_θ [ |σ̂|^γ · ϕ̂(σ̂, σ̂_D) + Σ_{g=1}^q |σ̂_g|^γ · ϕ̂(σ̂, σ̂_g) ],

where we can optionally anneal the value of γ to discover progressively smaller subgroups.
Optimization  Gradient-based optimization allows us to efficiently learn the parameters of our rules, and therewith scale SySurv to large datasets. For every subject, each feature is subjected to its corresponding soft condition according to the learned bounds, which are then weighted and combined into a soft rule that predicts a membership probability. We iteratively learn the weights and bounds using standard first-order gradient-based optimization techniques (Kingma & Ba, 2017) for a number of epochs, while annealing the temperature to arrive at crisp discretizations. For the pseudocode and the annealing schedule, see Appx. B.

Rule Sparsity  In high-dimensional correlated data, e.g. gene expressions, there often exist different ways (i.e. different conditions over different features) to select the same subgroup, possibly leading to rules with redundant terms. Rather than naively regularizing for rule length, adding a hyperparameter and complicating optimization, we propose to achieve rule sparsity in such cases by post hoc pruning. The idea is simple: we iteratively remove the condition from the rule that minimally changes which subjects are selected, and stop when a minimal Jaccard similarity is reached. We present the pseudocode in Appx. C.

3.4. Statistical Validation

In critical applications, e.g. healthcare, it is often important that results come with statistical guarantees. While it is challenging to make formal statements about the results of continuous optimization, we can determine p-values through a permutation test. In particular, following Duivesteijn & Knobbe (2011), we create independent copies of the input dataset D in which we break the dependency between the covariates x and the outcomes (t, δ) by random permutation. We run SySurv on each of these copies, so as to obtain a distribution of exceptionalities of false discoveries.
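The post hoc pruning idea for rule sparsity can be sketched in a few lines. This is our own minimal rendering under simplifying assumptions: a rule is represented as a mapping from feature index to interval, the data as tuples of feature values, and the Jaccard threshold is illustrative:

```python
def selected(rule, data):
    """Indices of subjects satisfying every condition in the rule.
    A rule maps feature index -> (lower, upper) interval."""
    return {i for i, x in enumerate(data)
            if all(lo <= x[j] <= hi for j, (lo, hi) in rule.items())}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def prune(rule, data, min_jaccard=0.95):
    """Iteratively drop the condition whose removal changes the selected
    subjects the least, while the pruned selection stays Jaccard-similar
    to the original one."""
    original = selected(rule, data)
    rule = dict(rule)
    while len(rule) > 1:
        best_sim, best_feat = -1.0, None
        for j in rule:  # try removing each condition in turn
            cand = {k: v for k, v in rule.items() if k != j}
            sim = jaccard(selected(cand, data), original)
            if sim > best_sim:
                best_sim, best_feat = sim, j
        if best_sim < min_jaccard:
            break
        del rule[best_feat]
    return rule

# Two redundant conditions on perfectly correlated features, plus one
# vacuous wide condition; pruning keeps a single informative condition.
data = [(0.1, 0.1, 0.5), (0.4, 0.4, 0.5), (0.8, 0.8, 0.5)]
rule = {0: (0.0, 0.5), 1: (0.0, 0.5), 2: (0.0, 1.0)}
print(prune(rule, data))
```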
Then, by applying the central limit theorem, we can determine the p-value of the exceptionality of the subgroup(s) we found on the actual data. When discovering multiple subgroups, we use Bonferroni correction (Dunn, 1961) to correct for multiple hypothesis testing. For details, see Appx. D.

4. Related Work

In this section, we discuss the most relevant related work from survival analysis, survival clustering, subgroup discovery, and survival subgroup discovery.

Survival Analysis  The majority of machine learning research for time-to-event data gravitates towards predicting the outcome of individual subjects. Some of the most potent methods for survival outcome prediction are ensemble methods (Barnwal et al., 2022), which include Random Survival Forests (Ishwaran et al., 2014). Alternatively, survival functions can be computed using neural networks, such as the multitask logistic regression framework (Yu et al., 2011; Fotso, 2018), or transformer-based approaches (Li et al., 2025), some of which can, in addition to subject covariates, also incorporate unstructured data, such as images or sequences (Cheerla & Gevaert, 2019; Vale-Silva & Rohr, 2021; Meng et al., 2022; Meng et al., 2023; Saeed et al., 2024; Farooq et al., 2025). SySurv does not compete with these methods, but rather needs one to obtain individual survival estimates.

Survival Clustering  Survival clustering has the goal of assigning subjects to groups of high intra-group but low inter-group survival similarity, and is therewith a related problem, but it does not provide descriptions or means to identify the groups. Many modern techniques leverage generative models to parameterize latent mixtures of Weibull distributions, such as those using variational autoencoders (Manduchi et al., 2022) or multilayer perceptrons (Hou et al., 2024). Other methods assume proportional hazards (Nagpal et al., 2021), use a multitask framework (Cui et al., 2024), or leverage the latent vector quantization framework (de Boer et al., 2024). Unlike SySurv, these typically make specific assumptions about the underlying distributions or structure, which are then interpreted as data-driven subgroups, a process yielding uninterpretable results compared to rule induction.

Subgroup Discovery  First introduced by Klösgen (1996), subgroup discovery is the task of finding and describing subpopulations that are exceptional in terms of a target property. Typically, subgroup discovery approaches leverage different exceptionality measures to accommodate the data type of a single target variable (Song et al., 2016; Kalofolias & Vreeken, 2022). Furthermore, some works make strong assumptions on the distribution of the target variable (Friedman & Fisher, 1999; Lavrač et al., 2004; Grosskreutz & Rüping, 2009). The exceptional model mining framework (Leman et al., 2008; Duivesteijn et al., 2016) generalizes subgroup discovery to arbitrary numbers of target variables by instead measuring the exceptionality between a model trained on the subgroup and one trained on the population. Whereas most methods for subgroup discovery rely on combinatorial search, Xu et al. (2024) recently showed that gradient-based optimization can efficiently learn subgroups in large datasets and removes the need for pre-discretization.

Survival Subgroup Discovery  This is subgroup discovery applied to survival data. Existing approaches alter the exceptionality measure, and are typically based on the logrank statistic. RuleKit (Gudyś et al., 2020) relies on heuristic search, FIBERS (Urbanowicz et al., 2023) is based on an evolutionary algorithm, and, more recently, EsmamDS (Vimieiro et al., 2025) was based on Ant Colony Optimization. Relator et al. (2018) explore significant survival subgroup discovery by correcting for multiple testing, albeit at great computational cost. In contrast, we propose a gradient-descent-optimizable objective that uses flexible survival functions without making assumptions about how subgroup and population survival relate. We also provide statistical guarantees for our subgroups.

5. Experiments

Next, we empirically evaluate SySurv on synthetic and real-world data. We compare it to the state-of-the-art methods RuleKit (Gudyś et al., 2020), FIBERS (Urbanowicz et al., 2023), and EsmamDS (Vimieiro et al., 2025). All optimize the logrank for exceptionality, but do so employing different search strategies. We use the implementations of the respective authors. We allow each method, including SySurv, up to 48 hours per experiment. We tune hyperparameters for all methods using grid search (see Appx. E). We make all our code available online,¹ and give the pseudocode for our data generator and additional results in Appx. G and H.

Figure 4. Synthetic setting. Comparison of SySurv and each of RuleKit, EsmamDS, and FIBERS in terms of F1-scores recovering planted subgroups with increasingly large datasets (Left), increasingly censored subjects (Center), and increasingly large planted subgroups (Right). EsmamDS is the closest competitor to SySurv, closely followed by RuleKit. Higher is better. The shaded areas show ±1 standard error over 10 runs.

Table 1. Real-world setting. Exceptionality of subgroups discovered by SySurv (Ours), RuleKit (RK), EsmamDS (ED), and FIBERS (FIB), measured using three different metrics (our objective, logrank, and mean-shift). Greater is better.

                    Our objective              Logrank                   Mean-shift
                 Ours    RK     ED    FIB   Ours    RK     ED    FIB   Ours    RK     ED    FIB
UnempDur          6.4    3.8    3.8    3.8   19.3  226.5  125.0  125.0    2.0    2.1    2.1    2.1
Nwtco          1011.6  681.0  681.0  208.6  374.2  281.3   31.4    0.0 1162.0  688.2  688.2    0.6
Rott2           559.2  326.2  487.6  487.6  267.0   93.0   71.7  169.3   31.5   10.0   25.2   25.2
Rdata           259.2  213.8  213.8  158.5  113.4  107.8   14.6   43.3 1116.6  933.5  933.5  705.5
Aids2            66.6   60.5   54.0   34.7    0.2    1.7    4.3    0.0   22.7   61.2   61.4    0.7
Dialysis          4.5    4.7    4.7    4.6    6.4   68.4   70.9  305.9    0.6    5.7    3.1    4.5
TRACE           261.8  272.5  332.1  261.6    0.0   94.6   93.2    0.0    0.0    0.6    1.7    0.0
Support2        132.8   81.4  104.9   54.8  251.0   59.9   46.8   11.9  405.7  227.6  405.5   71.0
DataDIVAT2      286.3  141.3  172.5  141.3   33.6   31.6   10.9   31.6    3.2    1.6    2.1    1.6
ProstateSurv.    20.1    7.5   14.1    5.4  425.7   64.7   82.5   82.5    6.1    3.7    3.4    1.0
Actg             34.6   24.2   18.8   11.3   60.0    1.1   12.3    0.0   54.1   27.3    6.0    0.4
Scania          172.5  147.4  147.4   84.7    4.7   14.7   13.7    9.9    6.5    2.4    2.4    0.1
Grace            38.6   16.5   21.0   13.1   59.3   33.3   28.7   13.8   64.0   16.0   21.8   11.1
Avg. rank        1.38   2.65   2.27   3.69   2.04   2.19   2.69   3.08   1.81   2.46   2.23   3.5

5.1. Synthetic Data

We start by evaluating on data with known ground truth. To this end, we generate data as follows. We sample p = 10 feature variables X_j from a uniform distribution U(0, 1), and the survival time T from a Weibull distribution, chosen for its configurability, Weibull(1.5, 5). Subject to 10% uniform censoring, we create a dataset of n = 10,000 subjects. We randomly choose k ≤ p features (here, k = 2) and sample corresponding bounds, creating a rule that covers 20% of the feature space.
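The generation steps so far (uniform features, Weibull event times, 10% uniform censoring) can be sketched as follows. This is our reading of Weibull(1.5, 5) as shape 1.5 and scale 5, realized with NumPy's unit-scale `weibull` sampler; variable names are our own:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, shape, scale = 10_000, 10, 1.5, 5.0

X = rng.uniform(0.0, 1.0, size=(n, p))       # covariates ~ U(0, 1)
T_true = scale * rng.weibull(shape, size=n)  # event times ~ Weibull(1.5, 5)

# 10% uniform censoring: censored subjects are observed at a uniformly
# sampled time before their true event time, with event indicator 0.
censored = rng.uniform(size=n) < 0.10
t_obs = np.where(censored, rng.uniform(size=n) * T_true, T_true)
delta = (~censored).astype(int)

print(X.shape, t_obs.shape, round(delta.mean(), 3))
```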
We randomly split the subjects into those to be included in the subgroup (20%) and those not (80%). We ensure that the feature values for the samples designated not to belong to the subgroup lie outside the rule bounds across conditional features. We then do the same for the subjects designated to belong to the subgroup, so they fulfill the rule, but also resample their outcomes from a Weibull distribution having the same shape of 1.5 as the overall population, but a different scale of 1. In other words, subgroup subjects experience their event 5× earlier in expectation. Now that we have our benchmark, we can vary the data or the subgroup generation parameters.

Scalability  First, we assess the performance of SySurv and its competitors on data where we vary the number of features. We report the average F1-scores over 10 runs in Fig. 4 (left). We see that all methods are stable in terms of F1-scores as the dimensionality increases, and that RuleKit does not finish in time for more than 500 features. We provide the runtimes in Fig. 7 (left) in Appx. H.

Censoring  Next, we assess robustness to censoring when increasing the percentage of censored subjects. We give the results in Fig. 4 (center); we see that the performance of SySurv drops when the percentage of censored subjects becomes extreme; this also depends on how uninformative the censoring is, since the censored time is sampled uniformly up to the true survival time. We check whether having the subgroup outsurvive the population, and vice versa, affects performance by varying the subgroup and fixing the population hazards, and give the results in Fig. 7 (center).

Subgroup Size  Lastly, we assess retrieval quality as the percentage of subjects that belong to the planted subgroup increases. We see in Fig. 4 (right) that all methods improve as the planted subgroup size increases, particularly EsmamDS. We attribute the improvement in the baselines to them generally preferring larger subgroups, according to our observations (Appx. H). In Fig. 7 (right), we also provide results on sample sufficiency by varying dataset sizes.

Overall, SySurv is very stable, with average F1-scores of approx. 0.8, up to very high rates of censoring (> 80%). FIBERS is highly unstable, with very large standard errors. EsmamDS is the closest competitor, hovering around 0.55, closely followed by RuleKit consistently at 0.45, while FIBERS lags behind in all experiments by a wide margin, with average F1-scores of approximately 0.35.

5.2. Real-World Data

Next, we evaluate how well SySurv performs on real-world data. To this end, we consider 13 time-to-event datasets from the SurvSet repository (Drysdale, 2022) that span different domains and have at least 1000 subjects and 7 covariates. As for these datasets the ground truth is unknown, we measure the quality of the found subgroups using 3 metrics: logrank, mean-shift, and our own objective. As before, we compare SySurv to RuleKit, EsmamDS, and FIBERS. For SySurv, we consider the first subgroup it returns. We give the competitors an advantage by considering the best-scoring subgroup out of the top-5 subgroups they return.

We report the results in Table 1. For mean-shift, we are only interested in the magnitude of the shift, and hence report absolute values. We provide the average subgroup sizes found by each method in Table 2 in Appx. H. We see that, despite them having an advantage, SySurv outperforms its competitors by a wide margin. For most datasets it achieves (much) higher scores, and overall the best average ranks, for each metric.
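Average ranks of this kind are conventionally computed by ranking the four methods within each dataset and metric, higher score first, and then averaging over datasets. A minimal sketch under the assumption that ties share the mean of their positions (fractional ranking); the scores are illustrative:

```python
def fractional_ranks(scores):
    """Rank scores with 1 = best (highest); tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            j += 1  # extend the block of tied scores
        mean_rank = (i + 1 + j) / 2  # mean of positions i+1 .. j
        for k in order[i:j]:
            ranks[k] = mean_rank
        i = j
    return ranks

# One method clearly best, the other three tied for positions 2-4:
print(fractional_ranks([6.4, 3.8, 3.8, 3.8]))  # [1.0, 3.0, 3.0, 3.0]
```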
For the Nwtco, Rott2, ProstateSurvival, Actg, and Grace datasets, it even scores between 14.7% and 1091.7% better on each metric. For some datasets, e.g. Unemployment Duration, we find that methods excel at their own metrics. To investigate this irrespective of these metrics, we consider the Kaplan-Meier survival curves (Kaplan & Meier, 1958) of the survival subgroups found by SYSURV and RULEKIT on Unemployment Duration in Fig. 5a. We see that both found subgroups with substantially longer time until re-employment than the overall population.

Figure 5. Real-world setting. Survival subgroups discovered in the unemployment (a) and heart attack (b) datasets using SYSURV and RULEKIT resp. ESMAMDS. SYSURV learns more exceptional subsets of the subgroups discovered by baselines. The shaded areas show 95% confidence intervals.

The rule RULEKIT found, "unemployment insured", is very succinct. The rule found by SYSURV, "age > 47.43 ∧ replacement rate ∈ [0.11, 1.95] ∧ disregard rate < 0.96 ∧ tenure > 6.14 ∧ log(wage) ∈ [4.20, 7.36]", is more detailed, as it describes that subjects who are relatively old and earned a high wage tend to take longer until re-employment. Both are informative.

For the heart attack dataset, TRACE, SYSURV finds a very large subgroup that does not differ much from the population. By adjusting the hyperparameter γ, we can steer it towards smaller subgroups. When we change its value to 0.025, it finds a subgroup that, at 470 subjects, covers more than half the subgroup of 896 subjects found by ESMAMDS. When we inspect the Kaplan-Meier estimates of the subgroups in Fig.
5b, we see that SYSURV in fact outperforms ESMAMDS by a margin. Inspecting the found rules themselves, "¬ clinical heart failure" for ESMAMDS, and "wall motion index > 1.37 ∧ female ∧ ¬ clinical heart failure" for SYSURV, we again see that SYSURV finds more detailed descriptions.

Overall, these experiments show that SYSURV performs well in a wide range of settings, and remarkably well compared on the same metrics that the competitors optimize for. It finds sensible and precisely described subgroups that are generally more exceptional than those found by the closest competitors, while allowing user-adjustable subgroup sizes.

5.3. Case Study: Neck Cancer

Last, we qualitatively evaluate SYSURV on a case study. We consider a high-dimensional dataset of patients with locally advanced head and neck squamous cell carcinoma (HNSCC) undergoing radiochemotherapy (Schmidt et al., 2020). The data consists of two cohorts. The first, or primary, cohort are patients who only received radiochemotherapy (n = 136).

Figure 6. Case study. SYSURV discovers, overall, sets of diverse subgroups in the primary cohort (a), and two subgroups of poor responders in the postoperative cohort (b) of HNSCC data. The shaded areas show 95% confidence intervals.

The second, or postoperative, cohort are patients who received radiochemotherapy after a tumor removal operation (n = 190). The covariates are tumor gene expressions (p = 158) and the outcome is the time until tumor recurrence. Censoring is 60% for the first and 85% for the second cohort.
From a clinical perspective, we are interested both in subgroups of subjects that respond better and in those that respond worse to treatment, as this allows more targeted (personalized) treatment. We run SYSURV on both datasets and discover four subgroups each. We show the survival curves in Fig. 6 and give the rules in Appx. H. For both cohorts, SYSURV finds two subgroups that respond better, and two that respond worse, to the respective treatments than the overall populations.

First, consider the primary cohort. We see in Fig. 6a that approximately half of the population are still alive after 80 months. Subgroups S1 (yellow, n=6) and S3 (orange, n=28) identify subjects that respond better, while subgroups S2 (red, n=12) and S4 (blue, n=25) respond worse. Although the effect sizes are not significant under our test, it is highly encouraging that the rules select on meaningful biomarkers. Schmidt et al. (2020) recently reported on the survival profile when stratifying primary radiochemotherapy patients based on hypoxia 15- and 26-gene expressions. Subgroup S4 not only shows a similar profile, but, like subgroup S3, also selects subjects based on hypoxia-related genes.

Next, we investigate the results on the postoperative cohort. Here, almost 80% of the patients survive for more than 80 months (Fig. 6b). We are specifically interested in subgroups S5 (green, n=12) and S7 (purple, n=12), as these identify subjects that respond very poorly. Both are significant under our statistical model. Like before, the corresponding rules make biological sense: they both select on genes that relate to cellular communication, metabolism, and DNA repair, which are related to highly aggressive tumor subtypes (Toustrup et al., 2011; Lendahl et al., 2009) for which conventional treatment not only fails to improve patient condition but even promotes recurrence (Barker et al., 2015).
The learned rules allow clinicians to pre-screen these subjects and offer them alternate treatment instead.

6. Conclusion

We propose SYSURV, an efficient method for survival subgroup discovery. SYSURV leverages non-parametric survival regression via Random Survival Forests to learn individual survival functions. In particular, we overcome the common drawbacks of existing approaches that make parametric assumptions about the underlying survival structure while also obscuring inter-subgroup variability by operating at the level of aggregate statistics. SYSURV exploits individual-level deviations in survival with respect to the population in the way it quantifies subgroup exceptionality, achieving a provably more sensitive measure compared to prior work. SYSURV employs a neural layer whose parameters can be learned to automatically select features and cutoffs along their domains to select members of a subgroup. By doing so, SYSURV effectively yields human-interpretable rules that describe subgroups with exceptional survival behavior, as it maximizes subgroup exceptionality using our differentiable objective via gradient-based optimization.

Extensive experiments on real-world datasets demonstrate that SYSURV consistently outperforms existing baselines in terms of subgroup exceptionality while also being more efficient. In a case study on neck cancer patients, SYSURV discovers biomarkers that are known to be associated with good resp. poor response to treatment, while also identifying novel subgroups that may warrant further investigation.

Limitations

In SYSURV, we consider tabular data; however, many applications involve other data modalities such as images or sequences. We intend to extend SYSURV to incorporate such structured data alongside tabular covariates in a meaningful way to discover survival subgroups characterized by both types of data.
This may involve anchoring the discovered subgroups in interpretable tabular features as well as emergent visual features or sequential patterns.

SYSURV provides a novel method to uncover possibly novel subgroups with exceptional survival characteristics. SYSURV is strictly associational, and while its results may indicate (unknown) causal mechanisms, it does not provide any guarantees in this regard. Before putting any of its results into (clinical) practice, randomized controlled trials, or other ways to verify causality, should be employed.

Acknowledgements

Jawad Al Rahwanji is supported by DFG GRK 2853/1 "Neuroexplicit Models of Language, Vision, and Action".

References

Barker, H. E., Paget, J. T., Khan, A. A., and Harrington, K. J. The tumour microenvironment after radiotherapy: mechanisms of resistance and recurrence. Nature Reviews Cancer, 15(7):409–425, 2015.

Barnwal, A., Cho, H., and Hocking, T. Survival Regression with Accelerated Failure Time Model in XGBoost. Journal of Computational and Graphical Statistics, 31(4):1292–1302, 2022.

Breiman, L. Random Forests. Machine Learning, 45(1):5–32, 2001.

Cheerla, A. and Gevaert, O. Deep learning with multimodal representation for pancancer prognosis prediction. Bioinformatics, 35(14):i446–i454, 2019.

Cox, D. R. Regression Models and Life-Tables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 34(2):187–202, 1972.

Cui, C., Tang, Y., and Zhang, W. Deep Survival Analysis With Latent Clustering and Contrastive Learning. IEEE Journal of Biomedical and Health Informatics, 28(5):3090–3101, 2024.

de Boer, J., Dedja, K., and Vens, C. SurvivalLVQ: Interpretable supervised clustering and prediction in survival analysis via Learning Vector Quantization. Pattern Recognition, 153:110497, 2024.

Drysdale, E.
SurvSet: An open-source time-to-event dataset repository, 2022. URL https://arxiv.org/abs/2203.03094.

Duivesteijn, W. and Knobbe, A. Exploiting False Discoveries – Statistical Validation of Patterns and Quality Measures in Subgroup Discovery. In 2011 IEEE 11th International Conference on Data Mining, pp. 151–160. IEEE, 2011.

Duivesteijn, W., Feelders, A. J., and Knobbe, A. Exceptional Model Mining: Supervised descriptive local pattern mining with complex target concepts. Data Mining and Knowledge Discovery, 30(1):47–98, 2016.

Dunn, O. J. Multiple Comparisons among Means. Journal of the American Statistical Association, 56(293):52–64, 1961.

Farooq, A., Mishra, D., and Chaudhury, S. Survival Prediction in Lung Cancer through Multi-Modal Representation Learning. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3907–3915. IEEE, 2025.

Feuerriegel, S., Frauen, D., Melnychuk, V., Schweisthal, J., Hess, K., Curth, A., Bauer, S., Kilbertus, N., Kohane, I. S., and van der Schaar, M. Causal machine learning for predicting treatment outcomes. Nature Medicine, 30(4):958–968, 2024.

Fotso, S. Deep Neural Networks for Survival Analysis Based on a Multi-Task Framework, 2018. URL https://arxiv.org/abs/1801.05512.

Friedman, J. H. and Fisher, N. I. Bump hunting in high-dimensional data. Statistics and Computing, 9(2):123–143, 1999.

Grosskreutz, H. and Rüping, S. On subgroup discovery in numerical domains. Data Mining and Knowledge Discovery, 19:210–226, 2009.

Gudyś, A., Sikora, M., and Wróbel, Ł. RuleKit: A comprehensive suite for rule-based learning. Knowledge-Based Systems, 194:105480, 2020.

Hou, B., Wen, Z., Bao, J., Zhang, R., Tong, B., Yang, S., Wen, J., Cui, Y., Moore, J. H., Saykin, A. J., Huang, H., Thompson, P. M., Ritchie, M. D., Davatzikos, C., and Shen, L. Interpretable deep clustering survival machines for Alzheimer's disease subtype discovery. Medical Image Analysis, 97:103231, 2024.
Ishwaran, H., Gerds, T. A., Kogalur, U. B., Moore, R. D., Gange, S. J., and Lau, B. M. Random survival forests for competing risks. Biostatistics, 15(4):757–773, 2014.

Kalofolias, J. and Vreeken, J. Naming the Most Anomalous Cluster in Hilbert Space for Structures with Attribute Information. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 4057–4064, 2022.

Kaplan, E. L. and Meier, P. Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53(282):457–481, 1958.

Kingma, D. P. and Ba, J. Adam: A Method for Stochastic Optimization, 2017. URL https://arxiv.org/abs/1412.6980.

Klösgen, W. Explora: a multipattern and multistrategy discovery assistant, pp. 249–271. American Association for Artificial Intelligence, 1996. ISBN 0262560976.

Lavrač, N., Kavšek, B., Flach, P., and Todorovski, L. Subgroup Discovery with CN2-SD. Journal of Machine Learning Research, 5(Feb):153–188, 2004.

Leman, D., Feelders, A., and Knobbe, A. Exceptional Model Mining. In Machine Learning and Knowledge Discovery in Databases, pp. 1–16. Springer, 2008.

Lendahl, U., Lee, K. L., Yang, H., and Poellinger, L. Generating specificity and diversity in the transcriptional response to hypoxia. Nature Reviews Genetics, 10(12):821–832, 2009.

Li, Y., Zhang, X., Hu, J., Xia, W., Liu, Z., Qin, X., Fei, B., and Zhou, J. SurvFormer: A Transformer Based Framework for Survival Analysis in Insurance Underwriting. In Companion Proceedings of the ACM on Web Conference 2025, pp. 325–333, 2025.

Manduchi, L., Marcinkevičs, R., Massi, M. C., Weikert, T., Sauter, A., Gotta, V., Müller, T., Vasella, F., Neidert, M. C., Pfister, M., Stieltjes, B., and Vogt, J. E. A Deep Variational Approach to Clustering Survival Data. In International Conference on Learning Representations, 2022.

Mantel, N.
Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports, 50(3):163–170, 1966.

Meng, M., Gu, B., Bi, L., Song, S., Feng, D. D., and Kim, J. DeepMTS: Deep Multi-Task Learning for Survival Prediction in Patients With Advanced Nasopharyngeal Carcinoma Using Pretreatment PET/CT. IEEE Journal of Biomedical and Health Informatics, 26(9):4497–4507, 2022.

Meng, M., Bi, L., Fulham, M., Feng, D., and Kim, J. Merging-Diverging Hybrid Transformer Networks for Survival Prediction in Head and Neck Cancer. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2023, pp. 400–410. Springer, 2023.

Nagpal, C., Yadlowsky, S., Rostamzadeh, N., and Heller, K. Deep Cox Mixtures for Survival Regression. In Machine Learning for Healthcare Conference, volume 149, pp. 674–708. PMLR, 2021.

Relator, R. T., Terada, A., and Sese, J. Identifying statistically significant combinatorial markers for survival analysis. BMC Medical Genomics, 11(Suppl 2):31, 2018.

Saeed, N., Ridzuan, M., Maani, F. A., Alasmawi, H., Nandakumar, K., and Yaqub, M. SurvRNC: Learning Ordered Representations for Survival Prediction Using Rank-N-Contrast. In Medical Image Computing and Computer Assisted Intervention – MICCAI 2024, pp. 659–669. Springer, 2024.

Schmidt, S., Linge, A., Grosser, M., Lohaus, F., Gudziol, V., Nowak, A., Tinhofer, I., Budach, V., Sak, A., Stuschke, M., Balermpas, P., Rödel, C., Schäfer, H., Grosu, A.-L., Abdollahi, A., Debus, J., Ganswindt, U., Belka, C., Pigorsch, S., Combs, S. E., Mönnich, D., Zips, D., Baretton, G. B., Buchholz, F., Baumann, M., Krause, M., and Löck, S. Comparison of GeneChip, nCounter, and Real-Time PCR–Based Gene Expressions Predicting Locoregional Tumor Control after Primary and Postoperative Radiochemotherapy in Head and Neck Squamous Cell Carcinoma. The Journal of Molecular Diagnostics, 22(6):801–810, 2020.
Song, H., Kull, M., Flach, P., and Kalogridis, G. Subgroup Discovery with Proper Scoring Rules. In Machine Learning and Knowledge Discovery in Databases, pp. 492–510. Springer, 2016.

Toustrup, K., Sørensen, B. S., Nordsmark, M., Busk, M., Wiuf, C., Alsner, J., and Overgaard, J. Development of a Hypoxia Gene Expression Classifier with Predictive Impact for Hypoxic Modification of Radiotherapy in Head and Neck Cancer. Cancer Research, 71(17):5923–5931, 2011.

Urbanowicz, R., Bandhey, H., Kamoun, M., Fogarty, N., and Hsieh, Y.-A. Scikit-FIBERS: An 'OR'-Rule Discovery Evolutionary Algorithm for Risk Stratification in Right-Censored Survival Analyses. In Proceedings of the Companion Conference on Genetic and Evolutionary Computation, pp. 1846–1854, 2023.

Vale-Silva, L. A. and Rohr, K. Long-term cancer survival prediction using multimodal deep learning. Scientific Reports, 11(1):13505, 2021.

Vimieiro, R., Mattos, J. B., and de Mattos Neto, P. S. EsmamDS: A more diverse exceptional survival model mining approach. Information Sciences, 690:121549, 2025.

Xu, S., Walter, N. P., Kalofolias, J., and Vreeken, J. Learning Exceptional Subgroups by End-to-End Maximizing KL-Divergence. In International Conference on Machine Learning, volume 235, pp. 55267–55285. PMLR, 2024.

Yu, C.-N., Greiner, R., Lin, H.-C., and Baracos, V. Learning Patient-Specific Cancer Survival Distributions as a Sequence of Dependent Regressors. Advances in Neural Information Processing Systems, 24, 2011.

A. Proof of Proposition 3.1

Proposition 3.1. Given two groups A and B, selectable by rules σ_A and σ_B, respectively, for which the expected group-level survival at any time t is Ŝ_A(t) and Ŝ_B(t), and for which individual-level survival is denoted by Ŝ(t | x).
The expected absolute difference in survival of the subjects selectable by σ_A is as, or more, sensitive than the absolute difference of Ŝ_A(t) from Ŝ_B(t),

E_{x∼X}[ℓ1_t(s(x), s_B) | σ_A(x) = 1] ≥ ℓ1_t(s_A, s_B),

where ℓ1_t(·, ·) is an absolute difference measure, and for brevity we write s_◦ for Ŝ_◦(t), and s(x) for Ŝ(t | x).

Proof. Let Z = s(X) − s_B be a random variable. The triangle inequality property of the absolute value function states that E[|Z|] ≥ |E[Z]|. It also holds when conditioning the expectations on the indicator σ_A(x) = 1, giving

E_{x∼X}[|Z| | σ_A(x) = 1] ≥ |E_{x∼X}[Z | σ_A(x) = 1]|.

We substitute the definition of Z into the inequality to get

E_{x∼X}[|s(x) − s_B| | σ_A(x) = 1] ≥ |E_{x∼X}[s(x) − s_B | σ_A(x) = 1]|.

We use the linearity of the expectation, E[Z + W] = E[Z] + E[W], and note that the expectation of the positive constant s_B is the constant itself, to get

E_{x∼X}[|s(x) − s_B| | σ_A(x) = 1] ≥ |E_{x∼X}[s(x) | σ_A(x) = 1] − s_B|.

Lastly, we substitute the definitions of ℓ1_t(·, ·) and s_A to arrive at

E_{x∼X}[ℓ1_t(s(x), s_B) | σ_A(x) = 1] ≥ ℓ1_t(s_A, s_B).

B. The SYSURV Algorithm

In this section, we provide the pseudocode for SYSURV in Alg. 1. Prior to executing SYSURV, we fit a global non-parametric population model to the entire dataset, RSF(X, T, ∆). This gives per-subject survival functions over the discrete domain of unique event times, Ŝ_D(t* | X = x^(i)), as the matrix M. The rule set R is initialized as ∅ and is populated with subsequently discovered subgroups, up to a user-defined limit. On the subject level x^(i), the following steps are applied.
First, each feature x_j^(i) is binned with learned cut points α_j and β_j using the soft condition to obtain π̂(x^(i); α, β, τ). Next, by means of the weights a_j, we combine the per-feature conditions π̂(x_j^(i)) into a conjunction σ̂(x^(i); α, β, a, τ). Then, we estimate the expected per-subject deviation in survival within the subgroup w.r.t. the overall population according to our objective. In a backwards pass, the soft rule parameters are updated to select a more exceptional subgroup for the remainder of the epochs. We anneal the temperature τ once halfway through the epochs and once three quarters of the way. Note that #events denotes the number of unique event times t* in D, where t* ∈ uniq({t_i ∼ T | δ_i = 1}). The MEASUREEXCEPTIONALITY algorithm, used in Alg. 1, computes exceptionality using the trapezoidal rule over the absolute differences in survival between each subject and the reference population survival.
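The soft condition and conjunction described above can be sketched as follows. This is a minimal illustration only: the sigmoid relaxation and the way the weights a_j gate features are our assumptions, not the exact parameterization used by SYSURV.

```python
import numpy as np

def soft_condition(X, alpha, beta, tau):
    """Soft version of the box condition alpha_j <= x_j <= beta_j per feature.
    As the temperature tau -> 0, memberships approach hard values in {0, 1}."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z / tau))
    return sigmoid(X - alpha) * sigmoid(beta - X)   # shape (n, p)

def soft_rule(X, alpha, beta, a, tau):
    """Soft conjunction of per-feature conditions. A weight a_j near 0
    effectively drops feature j from the rule (its factor tends to 1)."""
    pi = soft_condition(X, alpha, beta, tau)
    w = np.clip(a, 0.0, 1.0)                        # ReLU-style feature gate
    return np.prod(pi ** w, axis=1)                 # soft membership, shape (n,)
```

Because all operations are smooth in α, β, and a, subgroup membership can be pushed through the differentiable objective and optimized with gradient descent, which is the mechanism Alg. 1 relies on.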
Algorithm 1 LEARNSUBGROUP
1: Input: design matrix X, population model M, rules R, size penalty γ, initial temperature τ
2: Output: a rule σ̂ that selects an exceptional subgroup
3: α_j ← min X_j
4: β_j ← max X_j
5: a_j ← 1
6: Initialize soft rule σ̂(·; α, β, a, τ)
7: S_D(t*) ← (1/n) Σ_{i=1}^n S(t* | x^(i))
8: Exceptionality_D(x) ← MEASUREEXCEPTIONALITY(x, M, S_D(t*))
9: for e ← 1 to #epochs do
10:   Membership(x) ← σ̂(x; α, β, a, τ)
11:   Size ← (1/n) Σ_{i=1}^n Membership(x^(i))
12:   φ̂ ← Σ_{i=1}^n Membership(x^(i)) · Exceptionality_D(x^(i))
13:   Weighted_φ̂ ← φ̂ · Size^(γ−1)
14:   Regularizer ← 0
15:   for all σ̂_g ∈ R do
16:     Size_g ← (1/n) Σ_{i=1}^n σ̂_g(x^(i))
17:     S_g(t*) ← (1/Size_g) Σ_{i=1}^n S(t* | x^(i)) · σ̂_g(x^(i))
18:     Exceptionality_g(x) ← MEASUREEXCEPTIONALITY(x, M, S_g(t*))
19:     φ̂_g ← Σ_{i=1}^n Membership(x^(i)) · Exceptionality_g(x^(i))
20:     Regularizer ← Regularizer + φ̂_g
21:   end for
22:   Regularizer ← Regularizer · Size_g^(γ−1)
23:   loss ← −Weighted_φ̂ − Regularizer
24:   Update the parameters of rule σ̂ to minimize the loss
25:   if (e = #epochs / 2) ∨ (e = #epochs · 3/4) then
26:     τ ← τ / 2
27:   end if
28: end for
29: return σ̂

Algorithm 2 MEASUREEXCEPTIONALITY
1: Input: subject covariates x, population model M, reference survival at the event times S_ref(t*)
2: Output: subject survival exceptionality value w.r.t. the reference
3: abs_diff ← |S(t* | x) − S_ref(t*)|
4: return Σ_{u=0}^{#events−1} (t*_{u+1} − t*_u)/2 · (abs_diff[u] + abs_diff[u+1])

C. Rule Pruning

We provide the pseudocode for our post hoc pruning algorithm that simplifies rules in the presence of collinearity in Alg. 3.
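Algorithm 2 above amounts to a trapezoidal integral of the absolute survival difference over the grid of unique event times. A minimal numpy sketch (the function and variable names are ours):

```python
import numpy as np

def measure_exceptionality(surv_subject, surv_ref, event_times):
    """Trapezoidal rule over |S(t*|x) - S_ref(t*)|, mirroring Algorithm 2.
    All three arrays are evaluated on the same grid of unique event times."""
    abs_diff = np.abs(np.asarray(surv_subject) - np.asarray(surv_ref))
    dt = np.diff(np.asarray(event_times, dtype=float))
    return float(np.sum(dt / 2.0 * (abs_diff[:-1] + abs_diff[1:])))
```

A subject whose survival curve matches the reference everywhere gets exceptionality 0; the earlier and the farther the two curves separate, the larger the area between them, and hence the score.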
The PRUNERULE algorithm is a greedy procedure for reducing the complexity of a learned rule without significantly altering its subject memberships. It begins with the full rule, iteratively identifies the least influential condition, and attempts to remove it by setting its weight to zero. The algorithm evaluates each candidate rule by computing the Jaccard similarity between its membership indicators and those of the full rule. This process continues as long as the similarity remains above a specified threshold, ensuring the pruned rule still describes, more or less, the same subset of the population. It is to be (optionally) executed after learning a rule using Alg. 1.

Algorithm 3 PRUNERULE
1: Input: design matrix X, learned rule σ̂, Jaccard similarity threshold (default: 0.95)
2: Output: pruned rule σ̂_prun
3: σ̂_prun ← σ̂
4: mask ← σ̂(X) = 1
5: while Jaccard(mask, σ̂_prun(X) = 1) ≥ threshold do
6:   indices ← {j | ReLU(a_j^(σ̂_prun)) > 0.1}
7:   score* ← 0
8:   idx* ← null
9:   for each j ∈ indices do
10:     σ̂′ ← σ̂_prun
11:     a_j^(σ̂′) ← 0
12:     score ← Jaccard(mask, σ̂′(X) = 1)
13:     if score > score* then
14:       score* ← score
15:       idx* ← j
16:     end if
17:   end for
18:   if score* < threshold then
19:     break
20:   end if
21:   a_idx*^(σ̂_prun) ← 0
22: end while
23: return σ̂_prun

D. Statistical Validation

We provide the pseudocode for building the model of false discoveries (DFD) in Alg. 4. The BUILDDFD algorithm is a statistical validation procedure that quantifies the exceptionality that can be expected by chance. By repeatedly shuffling the survival outcomes relative to the design matrix, the algorithm creates a series of null datasets where no true relationship exists. In each iteration, it runs SYSURV while recording the exceptionality scores obtained from these null datasets.
It returns the mean and standard deviation that parametrize a normal distribution of false exceptionalities. This distribution serves as a baseline for judging the significance of discovered subgroups in terms of their exceptionalities, subject to multiple hypothesis testing. For that, we use the Z-test to obtain one-tailed p-values with a nominal alpha cutoff of 0.05. It is to be executed before running Alg. 1. #runs should be no less than 1000.

Algorithm 4 BUILDDFD
1: Input: design matrix X, time-to-event T, event indicator ∆, size penalty γ, initial temperature τ, number of independent samples #runs (default: 1000), number of desired subgroups m (default: 1)
2: Output: parameters of the Gaussian representing the distribution of false discoveries, mean µ and standard deviation η
3: scores ← ∅
4: indices ← {1, ..., n}
5: permutations ← {P_r | P_r is a random permutation of indices, r = 1 ... #runs}
6: for each P_r ∈ permutations do
7:   T′ ← T[P_r]
8:   ∆′ ← ∆[P_r]
9:   M ← RSF(X, T′, ∆′)
10:  R ← ∅
11:  for {1, ..., m} do
12:    R ← {R ∪ LEARNSUBGROUP(X, M, R, γ, τ)}
13:  end for
14:  scores ← {scores ∪ max({φ̂ | φ̂ is the exceptionality of each rule σ̂ ∈ R})}
15: end for
16: µ ← mean(scores)
17: η ← std(scores)
18: return µ, η

E. Hyperparameters of SYSURV and Baselines

In this section, we discuss the hyperparameter choices for SYSURV and the baselines used in our experiments.

SYSURV We have two core hyperparameters (γ and τ) in SYSURV, and some more dictated by the choice of the regression model. The most important parameter is γ. Tuning γ is crucial for steering the rules towards selecting subgroups of a desired size range. Values of γ range from 0.0 (a very small subgroup) to 1.0 (a very large subgroup). For our experiments, we optimized γ and found that γ ∈ [0, 0.
2] was almost always selected. As for τ, we found that τ = 0.2 worked well across settings and datasets. We have secondary parameters, namely the number of epochs and the learning rate used for stochastic optimization. Generally, 1000 epochs paired with a learning rate of 0.01 ensured stable convergence.

In this work, we instantiated SYSURV using Random Survival Forests, but in theory it can be instantiated using any non-parametric continuous-time regression model. We do not sample features for each tree, despite the computational cost, to avoid missing important features. However, we do sample subjects, especially when n is very large. To train the RSF, we used 100 trees (300 for the case study, to account for its rather small dataset sizes), a maximum depth equal to double the number of features, and a maximum of 2000 subjects per tree. A rule of thumb for obtaining reliable splits is to have at least 10–20 events in a cohort. As such, we set the minimum number of subjects to split to 40 and, correspondingly, the minimum number of subjects in a leaf to 20.

Baselines We use the implementations of the baselines as provided by their authors. We take the top-5 subgroups discovered by each baseline for evaluation. We preprocess the data in the same way for each of the baselines. This only involves the use of a k-bin discretizer for continuous features, using k = 5 and the quantile strategy.

F. Computational Complexity of SYSURV

Modern implementations of RSFs are efficient. Nevertheless, fitting the population model requires the most time. During training, we have a computational complexity of O(#trees · N · V · d / #cores), where #trees is fixed, N is the number of subjects assigned to a tree, V is the number of features considered for splitting, d is the tree depth, and #cores is also fixed.
After training, the computational complexity of predicting on the entire dataset is O(#trees · N · d / #cores); this happens only once, and the results are saved. During learning, the computational complexity is O(#epochs · N · V), where #epochs is fixed. We omit this in Alg. 1, but we save unchanging data structures for efficiency. All of the computations are done using highly optimized matrix operations through specialized libraries.

G. Survival Data Generation

In Alg. 5, we provide the pseudocode for MAKESURVIVALDATA, the routine we use for generating synthetic survival data. MAKESURVIVALDATA embeds a hidden subgroup within a larger population. It selects k features to define a hyper-box, or target, region in the feature space and partitions subjects based on whether their features fall within these bounds. It assigns subjects different Weibull-distributed event times based on their subgroup status, while also incorporating a linear covariate influence for variability. It also validates the resulting scales and adjusts them such that they are always positive and do not overlap between the subgroup and the population. Finally, it applies a censoring mechanism to a percentage of the subjects, where censored subjects have their event times drawn uniformly between 0 and their true event time.
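A simplified sketch of this kind of generator follows. It keeps the hyper-box subgroup, the two Weibull regimes, and the uniform censoring, but, for brevity, omits the covariate-dependent scale adjustment and anchors the box at the origin rather than at random offsets; all names are ours, and this is not the paper's exact routine.

```python
import numpy as np

def make_survival_data(n=1000, p=10, k=2, scale_sg=1.0, scale_nsg=5.0,
                       shape=1.5, ratio_target=0.2, ratio_cens=0.3, seed=0):
    """Plant a hyper-box subgroup with earlier Weibull event times.
    Simplified vs. Alg. 5: no scale shift, box anchored at 0."""
    rng = np.random.default_rng(seed)
    eps = ratio_target ** (1.0 / k)           # box side so the box has mass ratio_target
    X = rng.uniform(0.0, 1.0, size=(n, p))
    in_box = np.all(X[:, :k] <= eps, axis=1)  # subgroup: first k features inside the box
    # Weibull event times: same shape, different scales for the two groups
    scale = np.where(in_box, scale_sg, scale_nsg)
    Y = scale * rng.weibull(shape, size=n)
    # censor a fraction of subjects uniformly before their true event time
    delta = rng.uniform(size=n) > ratio_cens  # 1 = event observed, 0 = censored
    T = np.where(delta, Y, rng.uniform(0.0, Y))
    return X, T, delta.astype(int), in_box
```

With the default scales, subgroup subjects experience their event 5× earlier in expectation, matching the ratio used in the synthetic benchmark above.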
Algorithm 5 MAKESURVIVALDATA
1: Input: number of samples n, number of covariates p, number of true conditions k, Weibull scale of non-subgroup subjects scale_nsg, Weibull shape of non-subgroup subjects shape_nsg, Weibull scale of subgroup subjects scale_sg, Weibull shape of subgroup subjects shape_sg, percentage of subjects in the subgroup ratio_target, percentage of censored subject outcomes ratio_cens
2: Output: synthetic design matrix X, synthetic time-to-event T, synthetic event indicator ∆
3: V ← Sample(p, k, without replacement)
4: ϵ ← ratio_target^(1/k)
5: for j ← 1 to k do
6:   α_j ∼ Uniform(0, 1 − ϵ)
7: end for
8: X ∼ Uniform(0, 1)^(n×p)
9: for j ← 1 to k do
10:   X_sg[:, j] ∼ Uniform(α_j, α_j + ϵ)^n
11:   X_nsg[:, j] ∼ Uniform(0, 1)^n
12:   while ∃ x_nsg[j] | α_j ≤ x_nsg[j] ≤ α_j + ϵ do
13:     X_nsg[:, j] ∼ Uniform(0, 1)^n
14:   end while
15: end for
16: σ ∼ Bernoulli(ratio_target)^n
17: for j ← 1 to k do
18:   X[σ = 1, v_j] ← X_sg[σ = 1, j]
19:   X[σ = 0, v_j] ← X_nsg[σ = 0, j]
20: end for
21: N_0 ← Σ_{j=1}^k X[:, v_j] − k/2
22: ψ ← 1
23: scale_sg ← scale_sg + N_0 · ψ
24: scale_nsg ← scale_nsg + N_0 · ψ
25: while min(scale_sg) < 0 ∨ min(scale_nsg) < 0 ∨ |Supp(scale_nsg) ∩ Supp(scale_sg)| > 0 do
26:   ψ ← ψ · 0.9
27: end while
28: Y_sg ← scale_sg · Weibull(shape_sg, 1)
29: Y_nsg ← scale_nsg · Weibull(shape_nsg, 1)
30: Y[σ = 1] ← Y_sg[σ = 1]
31: Y[σ = 0] ← Y_nsg[σ = 0]
32: ∆ ∼ Bernoulli(1 − ratio_cens)
33: T[∆ = 0] ∼ Uniform(0, Y[∆ = 0])
34: T[∆ = 1] ← Y[∆ = 1]
35: return X, T, ∆
Figure 7. Synthetic setting. Comparison of SYSURV and each of RULEKIT, ESMAMDS, and FIBERS in terms of runtime when recovering planted subgroups with an increasing number of features (left); lower is better. Also, in terms of F1-score, comparisons between SYSURV and the baselines under varying population-subgroup hazard ratios (center), where the hazard ratio indicates the rate by which the population outsurvives the subgroup, or vice versa, and under increasingly large datasets (right); higher is better. The shaded areas show ±1 standard error over 10 runs. ESMAMDS is the closest competitor to SYSURV, closely followed by RULEKIT.

H. Additional Results

In this section, we provide additional experimental results on our synthetic benchmark, where we vary different aspects of the data and subgroup generation processes. We also provide additional results from our real-world data regarding the subgroup sizes found by each method. Lastly, we provide the full rules discovered in our case study along with their pruned versions.

Synthetic Data We assess the runtimes of SYSURV and each of the baselines from our scalability experiment in Fig. 4 (left). In Fig. 7 (left), we see that all of the methods have consistent runtimes with the exception of ESMAMDS, which varies quite a lot as the number of dimensions increases. Expectedly, ESMAMDS and RULEKIT require increasingly more time, unlike SYSURV and FIBERS. We attribute some of the increase in runtime of SYSURV to learning the population model, which varies with the choice of the regressor. Overall, SYSURV is the fastest method when factoring in retrieval performance. ESMAMDS is the closest competitor but needs around 4 times more time for 1000 features.

Next, we assess the retrieval performance of each method under varying population-subgroup hazard ratios in Fig. 7 (center).
The hazard ratio indicates the rate by which the population outsurvives the subgroup, or vice versa. We can see that SYSURV consistently outperforms all baselines across all hazard ratios. As the hazard ratio approaches 1.0, the retrieval performance of all methods degrades, since the survival distributions of the subgroup and the population become increasingly similar. Lastly, we assess the retrieval performance of each method under increasingly large datasets in Fig. 7 (right). We see that SYSURV consistently outperforms all baselines across all dataset sizes. As the dataset size increases, the stability of SYSURV improves, since more data is available to learn from.

Table 2. Average subgroup sizes per method across real-world datasets with n subjects, p features, and Cens.% censored outcomes.

Dataset        n      SYSURV   RULEKIT  ESMAMDS  FIBERS     p   Cens.%
UnempDur       3241   112.0    1620.5   1620.5   1585.6     7   38.7%
Nwtco          4028   168.0    2014.0   1490.0   4027.0    14   85.8%
Rott2          2982   320.0    1825.7   1007.2   1210.6    18   57.3%
Rdata          1040   140.0    520.0    268.8    503.2      8   47.4%
Aids2          2839   306.0    1342.6   503.7    2834.0    15   37.9%
Dialysis       6805   2665.0   4095.7   2636.8   2568.6    74   76.4%
TRACE          1878   1878.0   1113.5   657.3    1877.0    10   48.9%
Support2       9105   33.0     265.0    204.8    451.0     76   47.1%
DataDIVAT2     1837   63.0     918.5    720.7    870.2      7   68.3%
ProstateSurv.  14294  1840.0   7147.0   5666.5   11127.2    9   94.4%
Actg           1151   38.0     445.0    334.6    1149.0    24   91.6%
Scania         1931   122.0    729.6    431.2    1272.2    12   43.7%
Grace          1000   33.0     500.0    321.0    760.2      7   67.5%

Real-World Data. In Table 2, we present the average subgroup sizes discovered by each method across the datasets in our real-world experiments. We can see that SYSURV consistently discovers much smaller subgroups than all baselines. This is expected, since SYSURV directly optimizes for subgroup size via the size penalty γ.
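The F1-scores reported in the synthetic experiments compare the planted subgroup against the subgroup a method recovers, both viewed as boolean membership indicators over the same subjects. A minimal sketch of this computation (the function name is ours):

```python
import numpy as np

def subgroup_f1(planted, discovered):
    """F1-score between planted and discovered subgroups, both given
    as boolean membership indicators over the same subjects."""
    planted = np.asarray(planted, dtype=bool)
    discovered = np.asarray(discovered, dtype=bool)
    tp = np.sum(planted & discovered)    # correctly recovered members
    fp = np.sum(~planted & discovered)   # spuriously included subjects
    fn = np.sum(planted & ~discovered)   # missed members
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# perfect overlap yields 1.0; disjoint subgroups yield 0.0
assert subgroup_f1([1, 1, 0, 0], [1, 1, 0, 0]) == 1.0
```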
In contrast, the baselines do not have a direct mechanism for controlling subgroup size, leading to larger subgroups.

Case Study. We present in Tables 3 and 4 example rules for the primary and postoperative RCT cohorts, respectively, from our case study. We also present the corresponding pruned rules, which describe subgroups whose Jaccard similarities, in terms of membership indicators, to the original subgroups are no less than the threshold of 0.95. We also provide the resulting changes in subgroup size and exceptionality. For this, we use the post hoc pruning algorithm PRUNERULE, which we introduce in Alg. 3. We can see that the rules are reduced to a combination of a handful of conditions after pruning. In the very first rule, we see a drastic reduction in rule size. This goes to show how the problem of collinearity is amplified when n ≈ p, when in reality, rules with only a handful of conditions could suffice. Nevertheless, this post hoc pruning method cannot narrow down the predicates to those that, in the gene expression case, activate the rest of the genes in the same gene signature.

Table 3. Rules learned on the primary RCT cohort before and after pruning using a Jaccard similarity threshold of 0.95 for membership indicators, along with the respective changes in subgroup sizes and exceptionality.

Rule 1.
Original rule: "ACTN1 ∈ [−3.3, −0.9] ∧ AKT1 ∈ [−3.6, −1.9] ∧ ALDH3A1 ∈ [−8.9, −1.4] ∧ ANLN ∈ [−4.3, −2.2] ∧ BCL2L1 ∈ [−7.2, −5.3] ∧ BNIP3L ∈ [−3.0, −0.6] ∧ CAV1 ∈ [−4.4, −1.1] ∧ CDKN3 ∈ [−4.5, −2.5] ∧ CXCL12 ∈ [−8.2, −2.6] ∧ DCBLD1 ∈ [−6.9, −4.7] ∧ ERBB2 ∈ [−1.9, 0.7] ∧ ERBB3 ∈ [−7.0, −2.9] ∧ GNAI1 ∈ [−5.2, −0.0] ∧ IGF1R ∈ [−3.5, −0.9] ∧ LOX ∈ [−6.4, −3.3] ∧ MTOR ∈ [−3.2, −2.3] ∧ NOTCH1 ∈ [−4.9, −2.6] ∧ PLAU ∈ [−4.5, −1.2] ∧ RB1 ∈ [−3.8, −2.4] ∧ RELA ∈ [−3.2, −1.8] ∧ RFC4 ∈ [−4.6, −1.6] ∧ RPA2 ∈ [−4.0, −2.1] ∧ SNAI1 ∈ [−6.9, −3.3] ∧ TCF3 ∈ [−3.2, −1.3] ∧ TGFB1 ∈ [−2.6, −1.0] ∧ TPI1 ∈ [−7.2, −5.3] ∧ XPA ∈ [−4.7, −1.6] ∧ XRCC1 ∈ [−4.2, −2.6]"
Pruned rule: "CDKN3 ∈ [−4.5, −2.5] ∧ ERBB2 ∈ [−1.9, 0.7] ∧ LOX ∈ [−6.4, −3.3] ∧ RB1 ∈ [−3.8, −2.4] ∧ RPA2 ∈ [−4.0, −2.1] ∧ TGFB1 ∈ [−2.6, −1.0]"
Changes: subgroup size 6 → 6; exceptionality 19.1051 → 19.1037.

Rule 2.
Original rule: "AKT1 ∈ [−3.2, −1.2] ∧ ALDH1A1 ∈ [−8.1, −2.3] ∧ ASS1 ∈ [−4.4, −0.4] ∧ CENPK ∈ [−6.1, −4.4] ∧ CHEK2 ∈ [−5.5, −3.8] ∧ CXCR4 ∈ [−6.6, −2.8] ∧ CYP1B1 ∈ [−11.3, −4.5] ∧ ERBB3 ∈ [−5.5, −3.5] ∧ ERBB4 ∈ [−10.6, −7.6] ∧ FGFR3 ∈ [−7.1, −3.1] ∧ FOSL1 ∈ [−5.3, −0.7] ∧ GNAI1 ∈ [−3.2, −0.4] ∧ GPI ∈ [−2.0, 0.2] ∧ HIF1A ∈ [−6.8, −4.7] ∧ KDR ∈ [−5.5, −3.5] ∧ LDHA ∈ [−6.9, −4.3] ∧ LGALS1 ∈ [−2.6, 1.0] ∧ LIMD1 ∈ [−6.2, −4.7] ∧ LOX ∈ [−4.1, −0.9] ∧ MCM6 ∈ [−5.9, −3.7] ∧ MET ∈ [−5.3, −2.8] ∧ MMP7 ∈ [−9.2, −3.4] ∧ MYC ∈ [−2.8, 0.3] ∧ NBN ∈ [−3.7, −2.7] ∧ PDK1 ∈ [−4.9, −3.5] ∧ PFKFB3 ∈ [−3.9, −0.7] ∧ RASSF6 ∈ [−7.6, −5.0] ∧ RB1 ∈ [−4.4, −2.5] ∧ SYK ∈ [−6.0, −3.9] ∧ SYNGR3 ∈ [−8.9, −4.7] ∧ TPI1 ∈ [−7.2, −4.8] ∧ XRCC1 ∈ [−5.4, −3.4] ∧ CD44 ∈ [−0.1, 2.1]"
Pruned rule: "ALDH1A1 ∈ [−8.1, −2.3] ∧ ASS1 ∈ [−4.4, −0.4] ∧ CYP1B1 ∈ [−11.3, −4.5] ∧ ERBB3 ∈ [−5.5, −3.5] ∧ ERBB4 ∈ [−10.6, −7.6] ∧ FOSL1 ∈ [−5.3, −0.7] ∧ GNAI1 ∈ [−3.2, −0.4] ∧ GPI ∈ [−2.0, 0.2] ∧ HIF1A ∈ [−6.8, −4.7] ∧ KDR ∈ [−5.5, −3.5] ∧ LDHA ∈ [−6.9, −4.3] ∧ LGALS1 ∈ [−2.6, 1.0] ∧ LIMD1 ∈ [−6.2, −4.7] ∧ LOX ∈ [−4.1, −0.9] ∧ MCM6 ∈ [−5.9, −3.7] ∧ MET ∈ [−5.3, −2.8] ∧ MMP7 ∈ [−9.2, −3.4] ∧ MYC ∈ [−2.8, 0.3] ∧ NBN ∈ [−3.7, −2.7] ∧ PDK1 ∈ [−4.9, −3.5] ∧ PFKFB3 ∈ [−3.9, −0.7] ∧ RASSF6 ∈ [−7.6, −5.0] ∧ RB1 ∈ [−4.4, −2.5] ∧ SYK ∈ [−6.0, −3.9] ∧ SYNGR3 ∈ [−8.9, −4.7] ∧ TPI1 ∈ [−7.2, −4.8] ∧ XRCC1 ∈ [−5.4, −3.4] ∧ CD44 ∈ [−0.1, 2.1]"
Changes: subgroup size 12 → 12; exceptionality 16.6102 → 16.6079.

Rule 3.
Original rule: "ADM ∈ [−5.3, −0.8] ∧ AKT1 ∈ [−3.6, −1.7] ∧ ALDH3A1 ∈ [−8.9, −0.5] ∧ ANLN ∈ [−4.3, −2.8] ∧ ASS1 ∈ [−4.9, −0.0] ∧ ATP5G3 ∈ [−1.3, 1.4] ∧ ATR ∈ [−5.9, −3.8] ∧ BCL2L1 ∈ [−7.8, −4.9] ∧ BIRC5 ∈ [−4.7, −1.0] ∧ CAV1 ∈ [−4.4, 0.3] ∧ CDKN3 ∈ [−5.9, −2.5] ∧ CLDN4 ∈ [−5.0, −0.6] ∧ CXCL12 ∈ [−8.2, −3.2] ∧ ERBB3 ∈ [−6.3, −3.2] ∧ ERCC5 ∈ [−5.2, −3.6] ∧ FAM83B ∈ [−7.4, −1.7] ∧ FLT1 ∈ [−5.7, −3.1] ∧ GNAI1 ∈ [−5.2, −0.2] ∧ IGF1R ∈ [−3.5, −0.6] ∧ INHBA ∈ [−4.4, 1.9] ∧ KDR ∈ [−5.5, −2.8] ∧ LDHA ∈ [−7.5, −5.7] ∧ LGALS1 ∈ [−2.1, 1.0] ∧ LOX ∈ [−6.4, −1.4] ∧ MDM2 ∈ [−4.6, −2.2] ∧ MMP13 ∈ [−9.3, 1.3] ∧ MRE11A ∈ [−6.1, −3.6] ∧ MTOR ∈ [−3.7, −2.3] ∧ NOTCH1 ∈ [−5.9, −2.6] ∧ P4HA2 ∈ [−4.7, −1.4] ∧ PSMD9 ∈ [−4.2, −2.1] ∧ PTEN ∈ [−7.3, −5.6] ∧ RAD23B ∈ [−0.9, 0.8] ∧ RB1 ∈ [−3.8, −2.3] ∧ RPA2 ∈ [−4.9, −2.1] ∧ SERPINB2 ∈ [−6.6, −1.2] ∧ TCF3 ∈ [−3.5, −1.0] ∧ TGFB1 ∈ [−2.6, −0.3] ∧ TPI1 ∈ [−7.2, −5.0] ∧ XRCC1 ∈ [−5.3, −2.6] ∧ XRCC5 ∈ [−2.5, −0.8] ∧ CD44 ∈ [−0.8, 2.5]"
Pruned rule: "ANLN ∈ [−4.3, −2.8] ∧ ATR ∈ [−5.9, −3.8] ∧ CLDN4 ∈ [−5.0, −0.6] ∧ CXCL12 ∈ [−8.2, −3.2] ∧ ERBB3 ∈ [−6.3, −3.2] ∧ ERCC5 ∈ [−5.2, −3.6] ∧ LDHA ∈ [−7.5, −5.7] ∧ LGALS1 ∈ [−2.1, 1.0] ∧ MDM2 ∈ [−4.6, −2.2] ∧ MTOR ∈ [−3.7, −2.3] ∧ PTEN ∈ [−7.3, −5.6] ∧ RAD23B ∈ [−0.9, 0.8] ∧ RB1 ∈ [−3.8, −2.3] ∧ SERPINB2 ∈ [−6.6, −1.2] ∧ TCF3 ∈ [−3.5, −1.0] ∧ TGFB1 ∈ [−2.6, −0.3] ∧ TPI1 ∈ [−7.2, −5.0]"
Changes: subgroup size 28 → 29; exceptionality 16.0098 → 15.7934.

Rule 4.
Original rule: "AKT1 ∈ [−3.6, −1.5] ∧ ALDH3A1 ∈ [−8.9, −0.5] ∧ ANLN ∈ [−4.5, −2.6] ∧ ASS1 ∈ [−4.9, −0.0] ∧ ATP5G3 ∈ [−1.3, 1.7] ∧ ATR ∈ [−5.9, −3.7] ∧ BCL2L1 ∈ [−7.7, −5.0] ∧ BIRC5 ∈ [−5.3, −1.3] ∧ BNIP3L ∈ [−3.1, −0.6] ∧ CAV1 ∈ [−4.4, 0.4] ∧ CDKN3 ∈ [−5.8, −2.5] ∧ CLDN4 ∈ [−5.0, −0.7] ∧ CXCL12 ∈ [−8.2, −2.2] ∧ ERBB3 ∈ [−6.2, −3.2] ∧ ERBB4 ∈ [−10.6, −7.4] ∧ ERCC5 ∈ [−5.2, −3.6] ∧ FAM83B ∈ [−6.8, −1.6] ∧ GNAI1 ∈ [−5.2, −0.2] ∧ HIF1A ∈ [−7.7, −4.7] ∧ IGF1R ∈ [−3.5, −0.6] ∧ INHBA ∈ [−4.8, 1.8] ∧ KDR ∈ [−5.5, −3.6] ∧ LDHA ∈ [−7.5, −5.3] ∧ LGALS1 ∈ [−2.6, 1.0] ∧ LOX ∈ [−6.4, −1.7] ∧ MCM6 ∈ [−5.9, −3.0] ∧ MMP13 ∈ [−9.3, 1.3] ∧ MRGBP ∈ [−5.0, −2.8] ∧ MTOR ∈ [−3.8, −2.3] ∧ P4HA2 ∈ [−4.7, −1.4] ∧ PDK1 ∈ [−4.9, −2.3] ∧ PGAM1 ∈ [−2.6, −0.7] ∧ PSMD9 ∈ [−4.2, −2.0] ∧ PTEN ∈ [−7.3, −5.5] ∧ RAD23B ∈ [−0.8, 0.8] ∧ RB1 ∈ [−3.9, −2.3] ∧ TCF3 ∈ [−3.4, −1.4] ∧ TGFB1 ∈ [−2.6, −0.0] ∧ TP53 ∈ [−4.3, −1.0] ∧ TPI1 ∈ [−7.2, −4.9] ∧ CD44 ∈ [−0.4, 2.1]"
Pruned rule: "ALDH3A1 ∈ [−8.9, −0.5] ∧ CLDN4 ∈ [−5.0, −0.7] ∧ ERBB3 ∈ [−6.2, −3.2] ∧ ERBB4 ∈ [−10.6, −7.4] ∧ HIF1A ∈ [−7.7, −4.7] ∧ LDHA ∈ [−7.5, −5.3] ∧ MMP13 ∈ [−9.3, 1.3] ∧ MRGBP ∈ [−5.0, −2.8] ∧ P4HA2 ∈ [−4.7, −1.4] ∧ PDK1 ∈ [−4.9, −2.3] ∧ PGAM1 ∈ [−2.6, −0.7] ∧ PTEN ∈ [−7.3, −5.5] ∧ RAD23B ∈ [−0.8, 0.8] ∧ TP53 ∈ [−4.3, −1.0] ∧ TPI1 ∈ [−7.2, −4.9]"
Changes: subgroup size 25 → 26; exceptionality 15.8073 → 15.2069.

Table 4. Rules learned on the postoperative RCT cohort before and after pruning using a Jaccard similarity threshold of 0.95 for membership indicators, along with the respective changes in subgroup sizes and exceptionality.

Rule 1.
Original rule: "ALDH1A1 ∈ [−7.0, −3.2] ∧ BCL2L1 ∈ [−6.9, −5.5] ∧ BIRC5 ∈ [−4.6, −1.6] ∧ BNIP3L ∈ [−2.5, −1.1] ∧ CA9 ∈ [−9.0, −4.4] ∧ CBX4 ∈ [−5.3, −3.7] ∧ CLDN4 ∈ [−5.3, −1.3] ∧ CXCL12 ∈ [−5.2, −1.1] ∧ EGLN3 ∈ [−4.4, −0.9] ∧ ENO1 ∈ [−0.2, 1.2] ∧ ENO2 ∈ [−5.8, −2.4] ∧ EPHA1 ∈ [−7.2, −4.6] ∧ GPI ∈ [−1.7, −0.2] ∧ HSPB1 ∈ [−4.0, −2.0] ∧ ITGB2 ∈ [−4.7, −2.3] ∧ KCTD11 ∈ [−2.3, 0.1] ∧ MAP2K2 ∈ [−4.3, −3.3] ∧ MCM6 ∈ [−5.8, −4.0] ∧ MMP13 ∈ [−1.9, 1.7] ∧ MMP9 ∈ [−2.7, 0.6] ∧ MPRS17 ∈ [−5.0, −1.2] ∧ PFKFB3 ∈ [−2.7, −0.2] ∧ PTEN ∈ [−6.6, −5.1] ∧ RAD23B ∈ [−1.0, 0.9] ∧ RB1 ∈ [−5.0, −2.5] ∧ RELA ∈ [−2.9, −1.6] ∧ SERPINB2 ∈ [−7.6, −2.6] ∧ SLC5A1 ∈ [−9.9, −3.6] ∧ TCF3 ∈ [−3.8, −2.3] ∧ XRCC1 ∈ [−5.2, −3.2] ∧ XRCC4 ∈ [−6.1, −4.2] ∧ XRCC5 ∈ [−2.1, −0.6]"
Pruned rule: "EPHA1 ∈ [−7.2, −4.6] ∧ GPI ∈ [−1.7, −0.2] ∧ KCTD11 ∈ [−2.3, 0.1] ∧ MAP2K2 ∈ [−4.3, −3.3] ∧ MMP13 ∈ [−1.9, 1.7] ∧ MMP9 ∈ [−2.7, 0.6] ∧ XRCC5 ∈ [−2.1, −0.6]"
Changes: subgroup size 12 → 12; exceptionality 27.8189 → 27.7435.

Rule 2.
Original rule: "ANXA5 ∈ [−3.0, −0.9] ∧ ATM ∈ [−3.8, −1.6] ∧ ATP5G3 ∈ [−1.1, 1.0] ∧ BNIP3 ∈ [−6.1, −2.6] ∧ BSG ∈ [−9.4, −5.4] ∧ CD24 ∈ [−8.8, −4.5] ∧ CXCL12 ∈ [−6.2, −1.3] ∧ DCBLD1 ∈ [−7.5, −4.5] ∧ DKK3 ∈ [−5.3, −1.4] ∧ FANCA ∈ [−6.7, −3.1] ∧ FN1 ∈ [−4.6, 1.8] ∧ HIF1A ∈ [−7.1, −4.6] ∧ HK2 ∈ [−6.7, −2.6] ∧ HSPA4 ∈ [−3.9, −2.2] ∧ INHBA ∈ [−7.9, −1.3] ∧ ITGB1 ∈ [−2.5, −0.1] ∧ KRT17 ∈ [−3.2, 2.8] ∧ LGALS1 ∈ [−3.4, −0.2] ∧ LOXL2 ∈ [−6.4, −3.0] ∧ MCM6 ∈ [−5.3, −2.5] ∧ MDM2 ∈ [−4.3, −0.7] ∧ MMP10 ∈ [−9.7, −1.8] ∧ MMP13 ∈ [−10.4, −2.8] ∧ MMP2 ∈ [−5.0, −0.5] ∧ MTOR ∈ [−3.9, −2.3] ∧ MYC ∈ [−3.7, −0.6] ∧ MYNN ∈ [−5.1, −3.0] ∧ NR1D2 ∈ [−4.9, −2.3] ∧ P4HA2 ∈ [−5.1, −2.4] ∧ PSMD9 ∈ [−4.3, −2.2] ∧ RMI2 ∈ [−5.1, −2.9] ∧ RPA2 ∈ [−4.5, −2.1] ∧ SFN ∈ [−1.6, 3.1] ∧ SLC3A2 ∈ [−4.3, −2.3] ∧ SMDT1 ∈ [−3.9, −2.2] ∧ SNAI1 ∈ [−7.6, −3.8] ∧ SPP1 ∈ [−9.0, −3.9] ∧ TCF3 ∈ [−4.0, −2.1] ∧ TP53 ∈ [−3.8, −1.0] ∧ XRCC4 ∈ [−5.2, −3.5] ∧ XRCC5 ∈ [−2.8, −0.9] ∧ YAP1 ∈ [−3.6, −0.2]"
Pruned rule: "ATM ∈ [−3.8, −1.6] ∧ HIF1A ∈ [−7.1, −4.6] ∧ KRT17 ∈ [−3.2, 2.8] ∧ MMP13 ∈ [−10.4, −2.8] ∧ MMP2 ∈ [−5.0, −0.5] ∧ PSMD9 ∈ [−4.3, −2.2] ∧ RMI2 ∈ [−5.1, −2.9] ∧ SNAI1 ∈ [−7.6, −3.8] ∧ XRCC4 ∈ [−5.2, −3.5] ∧ YAP1 ∈ [−3.6, −0.2]"
Changes: subgroup size 47 → 45; exceptionality 12.0529 → 11.9264.

Rule 3.
Original rule: "ALDH1A1 ∈ [−7.0, −3.4] ∧ BSG ∈ [−8.9, −3.9] ∧ CA9 ∈ [−9.5, −4.5] ∧ CBX4 ∈ [−5.3, −3.7] ∧ CLDN4 ∈ [−4.9, −0.4] ∧ CXCL12 ∈ [−4.3, −1.5] ∧ ENO1 ∈ [−0.3, 1.2] ∧ ENO2 ∈ [−5.8, −2.4] ∧ EPHA1 ∈ [−7.2, −4.6] ∧ EPOR ∈ [−10.2, −6.6] ∧ ERCC5 ∈ [−5.2, −3.0] ∧ HIF1A ∈ [−5.7, −3.7] ∧ LOX ∈ [−2.9, −1.0] ∧ MMP9 ∈ [−2.8, 0.6] ∧ MYNN ∈ [−5.2, −2.8] ∧ PGK1 ∈ [−2.0, −0.2] ∧ PTEN ∈ [−6.5, −5.1] ∧ RAD23B ∈ [−1.0, 1.1] ∧ RAD50 ∈ [−3.6, −1.6] ∧ RB1 ∈ [−5.0, −2.4] ∧ SLC5A1 ∈ [−10.0, −3.3] ∧ TCF3 ∈ [−4.0, −2.1] ∧ XRCC1 ∈ [−5.2, −4.1] ∧ XRCC4 ∈ [−6.2, −4.1] ∧ XRCC5 ∈ [−2.2, −1.0]"
Pruned rule: "ALDH1A1 ∈ [−7.0, −3.4] ∧ CA9 ∈ [−9.5, −4.5] ∧ CBX4 ∈ [−5.3, −3.7] ∧ ENO1 ∈ [−0.3, 1.2] ∧ EPHA1 ∈ [−7.2, −4.6] ∧ LOX ∈ [−2.9, −1.0] ∧ MMP9 ∈ [−2.8, 0.6] ∧ MYNN ∈ [−5.2, −2.8] ∧ SLC5A1 ∈ [−10.0, −3.3] ∧ TCF3 ∈ [−4.0, −2.1] ∧ XRCC1 ∈ [−5.2, −4.1]"
Changes: subgroup size 12 → 12; exceptionality 27.4009 → 26.0718.

Rule 4.
Original rule: "ATM ∈ [−4.7, −1.6] ∧ BIRC5 ∈ [−4.2, −1.0] ∧ BNIP3L ∈ [−3.2, −1.2] ∧ BSG ∈ [−9.2, −3.9] ∧ CA9 ∈ [−10.7, −2.9] ∧ CBX4 ∈ [−5.3, −3.1] ∧ CD24 ∈ [−8.3, −4.5] ∧ CDKN3 ∈ [−6.1, −2.7] ∧ CXCL12 ∈ [−7.5, −2.0] ∧ DCBLD1 ∈ [−7.5, −4.5] ∧ ENO2 ∈ [−6.2, −2.4] ∧ ERCC5 ∈ [−5.7, −3.0] ∧ HK2 ∈ [−6.7, −2.2] ∧ HSPA4 ∈ [−4.0, −2.2] ∧ KDR ∈ [−6.0, −2.4] ∧ KRT17 ∈ [−3.0, 5.1] ∧ LOX ∈ [−7.1, −1.5] ∧ LOXL2 ∈ [−6.4, −1.8] ∧ MCM6 ∈ [−4.8, −2.5] ∧ MME ∈ [−9.6, −4.0] ∧ MRE11A ∈ [−6.1, −4.0] ∧ MTOR ∈ [−4.2, −2.3] ∧ MYC ∈ [−3.7, −0.5] ∧ MYNN ∈ [−5.0, −2.3] ∧ NBN ∈ [−3.9, −1.6] ∧ NOTCH1 ∈ [−6.3, −1.9] ∧ RAD23B ∈ [−1.4, 1.1] ∧ RAD50 ∈ [−4.0, −1.6] ∧ RASSF6 ∈ [−8.1, −1.6] ∧ RB1 ∈ [−5.0, −2.3] ∧ RELA ∈ [−3.5, −1.4] ∧ SDHA ∈ [−6.2, −2.9] ∧ SERPINB2 ∈ [−8.0, −0.6] ∧ TCF3 ∈ [−3.9, −2.1] ∧ TP53 ∈ [−5.0, −1.0] ∧ XPA ∈ [−5.3, −2.6] ∧ XPC ∈ [−5.7, −3.7] ∧ XRCC1 ∈ [−5.2, −2.5] ∧ XRCC4 ∈ [−5.8, −3.5] ∧ XRCC5 ∈ [−2.8, −0.8]"
Pruned rule: "BIRC5 ∈ [−4.2, −1.0] ∧ BSG ∈ [−9.2, −3.9] ∧ CA9 ∈ [−10.7, −2.9] ∧ CD24 ∈ [−8.3, −4.5] ∧ DCBLD1 ∈ [−7.5, −4.5] ∧ LOX ∈ [−7.1, −1.5] ∧ LOXL2 ∈ [−6.4, −1.8] ∧ MCM6 ∈ [−4.8, −2.5] ∧ MME ∈ [−9.6, −4.0] ∧ MTOR ∈ [−4.2, −2.3] ∧ MYC ∈ [−3.7, −0.5] ∧ MYNN ∈ [−5.0, −2.3] ∧ RB1 ∈ [−5.0, −2.3] ∧ TCF3 ∈ [−3.9, −2.1] ∧ XPC ∈ [−5.7, −3.7] ∧ XRCC1 ∈ [−5.2, −2.5]"
Changes: subgroup size 85 → 87; exceptionality 11.2782 → 11.1260.
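Conceptually, the pruning behind these tables keeps a condition only if dropping it would change the subgroup too much, as measured by the Jaccard similarity of membership indicators against the 0.95 threshold. The following is a greedy sketch in that spirit, not the paper's PRUNERULE algorithm itself; the function names and the dict-based rule encoding are ours.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard similarity of two boolean membership indicators."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.sum(a | b)
    return 1.0 if union == 0 else np.sum(a & b) / union

def prune_rule(conditions, X, threshold=0.95):
    """Greedily drop interval conditions while the pruned rule's subgroup
    stays Jaccard-similar to the original.

    conditions: dict {feature_index: (lo, hi)} read as a conjunction
    X: (n, p) design matrix
    """
    def members(conds):
        # a subject is a member iff it satisfies every interval condition
        m = np.ones(len(X), dtype=bool)
        for j, (lo, hi) in conds.items():
            m &= (X[:, j] >= lo) & (X[:, j] <= hi)
        return m

    original = members(conditions)
    kept = dict(conditions)
    for j in list(kept):
        trial = {f: iv for f, iv in kept.items() if f != j}
        if jaccard(members(trial), original) >= threshold:
            kept = trial        # condition j was (nearly) redundant
    return kept
```

For example, a vacuous condition whose interval covers all observed values is dropped, while a condition that actually carves out the subgroup is kept.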
