Surrogate-Assisted Genetic Programming with Rank-Based Phenotypic Characterisation for Dynamic Multi-Mode Project Scheduling

Yuan Tian, Yi Mei, Mengjie Zhang
Centre for Data Science and Artificial Intelligence & School of Engineering and Computer Science
Victoria University of Wellington, PO Box 600, Wellington 6140, New Zealand
{yuan.tian, yi.mei, mengjie.zhang}@ecs.vuw.ac.nz

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Abstract: The dynamic multi-mode resource-constrained project scheduling problem (DMRCPSP) is of practical importance, as it requires making real-time decisions under changing project states and resource availability. Genetic Programming (GP) has been shown to effectively evolve heuristic rules for such decision-making tasks; however, the evolutionary process typically relies on a large number of simulation-based fitness evaluations, resulting in high computational cost. Surrogate models offer a promising way to reduce evaluation cost, but their application to GP requires problem-specific phenotypic characterisation (PC) schemes of heuristic rules. There is currently a lack of suitable PC schemes for GP applied to DMRCPSP. This paper proposes a rank-based PC scheme derived from heuristic-driven ordering of eligible activity–mode pairs and activity groups in decision situations. The resulting PC vectors enable a surrogate model to estimate the fitness of unevaluated GP individuals. Based on this scheme, a surrogate-assisted GP algorithm is developed. Experimental results demonstrate that the proposed surrogate-assisted GP can identify high-quality heuristic rules consistently earlier than the state-of-the-art GP approach for DMRCPSP, while introducing only marginal computational overhead. Further analyses demonstrate that the surrogate model provides useful guidance for offspring selection, leading to improved evolutionary efficiency.

Index Terms: Project Scheduling, Hyper-heuristics, Genetic Programming, Surrogate model, Phenotypic characterisation

I. INTRODUCTION

Project scheduling is a fundamental component of project management, as high-quality scheduling decisions are essential for completing projects within budget and time constraints. The multi-mode resource-constrained project scheduling problem (MRCPSP) [1], [2] is a well-formalised problem setting, in which the start times and execution modes of activities must be determined under precedence and resource constraints to minimise the overall duration of the project. In the dynamic MRCPSP (DMRCPSP) considered in this study, the actual durations of activities are uncertain and only become known during execution. As a result, scheduling decisions (specifically, which activities should be executed and in which modes) must be made online as the project state updates.

Heuristic rules are well-suited to such dynamic decision-making scenarios due to their low computational cost. Genetic Programming (GP) [3], as a hyper-heuristic approach, has been widely used to automatically evolve scheduling heuristics based on problem-specific attributes and decision contexts. However, evolving effective heuristics requires a large number of expensive simulation-based fitness evaluations, which severely limits scalability and hinders practical applicability in dynamic decision-making environments.

Surrogate-assisted GP [4], [5] has shown promise in reducing evaluation costs for scheduling heuristics. However, its successful application critically depends on the availability of problem-specific phenotypic characterisation (PC) schemes that can meaningfully capture the behavioural differences among heuristics.
For DMRCPSP, no such scheme exists, and the schemes developed for related problems are not applicable, which prevents existing surrogate-assisted GP frameworks from being applied directly to this problem.

Motivated by the need to enable computationally efficient GP-based decision-making for DMRCPSP, this paper aims to bridge this gap by developing a suitable phenotypic characterisation and integrating it into a surrogate-assisted GP approach. Specifically, the research objectives of this work are as follows:

• To design a problem-specific phenotypic characterisation scheme for GP heuristics in DMRCPSP based on the ranking behaviour of heuristics over eligible activities.
• To develop a surrogate-assisted GP algorithm that incorporates the proposed PC scheme.
• To verify the effectiveness of the PC scheme and compare the performance of the surrogate-assisted GP with the state-of-the-art GP for DMRCPSP.

II. BACKGROUND

A. Problem Description

The dynamic multi-mode resource-constrained project scheduling problem considers the scheduling of a set of project activities subject to precedence and resource constraints. Each activity can only start after all of its predecessor activities have been completed. For each activity, multiple execution modes are available, each with a different expected duration and resource requirement. In a dynamic environment, the exact duration of an activity is not known in advance and is only revealed during execution. The actual duration varies within a predefined range determined by optimistic and pessimistic estimates around the expected duration. At any point in time, only a limited amount of renewable resources is available, and the total resource consumption of activities executed in parallel must not exceed the corresponding resource capacities.
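As a small illustration of the resource constraint described above, the following Python sketch checks whether a set of activity modes can be executed in parallel. The `Mode` structure, the resource names, and all numbers are hypothetical and are not taken from the paper; only the rule that total consumption must not exceed each capacity comes from the problem description.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Mode:
    """Hypothetical execution mode: expected duration and per-resource demand."""
    expected_duration: float
    demand: Dict[str, int]


def is_resource_feasible(modes: Iterable[Mode], capacity: Dict[str, int]) -> bool:
    """Return True if the modes, run in parallel, respect every resource capacity."""
    used = {resource: 0 for resource in capacity}
    for mode in modes:
        for resource, amount in mode.demand.items():
            used[resource] = used.get(resource, 0) + amount
    # A resource that has no declared capacity is treated as capacity 0.
    return all(used[r] <= capacity.get(r, 0) for r in used)


# Toy data (assumed): two renewable resources and two activities.
capacity = {"R1": 5, "R2": 3}
a_fast = Mode(expected_duration=2.0, demand={"R1": 4, "R2": 1})
a_slow = Mode(expected_duration=5.0, demand={"R1": 2, "R2": 1})
b_only = Mode(expected_duration=3.0, demand={"R1": 2, "R2": 2})

feasible_slow = is_resource_feasible([a_slow, b_only], capacity)  # R1: 4, R2: 3
feasible_fast = is_resource_feasible([a_fast, b_only], capacity)  # R1: 6 > 5
```

In this toy case the slower mode of the first activity can run alongside the second activity, whereas the faster mode cannot, which is the kind of mode trade-off a scheduling heuristic has to resolve.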
The objective of the DMRCPSP is to construct a feasible project schedule that respects both precedence and resource constraints while minimising the overall project makespan.

B. Genetic Programming for DMRCPSP

Genetic Programming (GP) has been widely applied as a hyper-heuristic method for solving complex scheduling problems, including standard RCPSP [6]–[8], job shop scheduling [3], and related domains [9], [10]. In GP-based scheduling, heuristic rules are evolved to guide decision-making during schedule construction or simulation.

For the dynamic multi-mode RCPSP, recent studies [11]–[13] have focused on improving decision-making mechanisms to utilise resources efficiently. A representative approach [13] constructs schedules using two heuristic rules. First, an activity ordering rule ranks eligible activity–mode pairs according to their priority values, and a promising subset of candidates is selected. Next, feasible activity groups are enumerated from the selected candidates, and an activity group selection rule determines which group should be executed. GP employs a multi-tree representation to evolve these two rules simultaneously.

This decision-making framework is effective for DMRCPSP, as it allows GP to jointly optimise activity ordering and group selection strategies. However, the computational cost of evaluating GP individuals remains a major bottleneck, since each individual must be embedded into a simulation to assess its scheduling performance.

C. Surrogate Models for Scheduling Heuristics

Surrogate models [14] are a class of techniques designed to accelerate evolutionary algorithms on problems with expensive fitness evaluations by approximating those evaluations with computationally cheaper alternatives. The basic idea is to replace a portion of full evaluations with inexpensive estimations based on previously evaluated solutions.
In the design of surrogate models for GP-based scheduling heuristics, two main approaches have been explored in the literature: simplified simulation models [15] and phenotypic characterisation-based methods [4]. The first approach relies on replacing the original, computationally expensive evaluation with a simplified simulation model, for example, by reducing the number of jobs, machines, or resources considered during evaluation. While such simplified models can reduce the computational cost to some extent, the time savings are often limited. Moreover, their effectiveness depends on carefully selecting which problem parameters can be reduced without significantly compromising evaluation fidelity, which itself requires additional empirical investigation.

The second approach adopts a more abstract and computationally efficient strategy. Instead of executing a full (or reduced) simulation, this approach extracts a set of representative decision situations from the original simulation environment. GP heuristics are then applied to these situations to make decisions, and their decision-making behaviours are recorded and transformed into numerical feature vectors.

As a representative example, reference [4] proposed a phenotypic characterisation scheme for the job shop scheduling problem based on a reference rule. In each decision situation, all candidate operations are ranked and indexed according to a predefined reference rule, and the operation selected by a GP heuristic is represented by its corresponding rank index. Consequently, the phenotypic characterisation (PC) vector of a GP individual consists of the indices of the selected operations across all decision situations. These PC vectors can then be used to measure behavioural similarity between GP individuals, and, when combined with known fitness values, to estimate the fitness of unevaluated individuals using surrogate models such as nearest-neighbour regression.
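The reference-rule scheme described above can be sketched in Python as follows. This is a minimal illustration and not the implementation of [4]: the candidate attributes (`proc_time`, `due`), the two example rules, and the convention that a smaller priority value is preferred are assumptions made for the example.

```python
from typing import Callable, Dict, List, Sequence

Candidate = Dict[str, float]
Rule = Callable[[Candidate], float]  # assumed convention: smaller value = preferred


def reference_rank_pc(
    situations: Sequence[Sequence[Candidate]],
    heuristic: Rule,
    reference: Rule,
) -> List[int]:
    """For each situation, record the reference-rule rank of the heuristic's choice."""
    pc = []
    for candidates in situations:
        indices = range(len(candidates))
        # Candidate indices ordered by the reference rule (position 0 = rank 1).
        ref_order = sorted(indices, key=lambda i: reference(candidates[i]))
        # The single candidate the evolved heuristic would select.
        chosen = min(indices, key=lambda i: heuristic(candidates[i]))
        pc.append(ref_order.index(chosen) + 1)
    return pc


# Hypothetical rules: shortest processing time as reference, earliest due date as heuristic.
def spt(op: Candidate) -> float:
    return op["proc_time"]


def edd(op: Candidate) -> float:
    return op["due"]


situations = [
    [{"proc_time": 3, "due": 10}, {"proc_time": 1, "due": 20}, {"proc_time": 2, "due": 5}],
    [{"proc_time": 4, "due": 7}, {"proc_time": 6, "due": 9}],
]

pc_edd = reference_rank_pc(situations, heuristic=edd, reference=spt)
pc_spt = reference_rank_pc(situations, heuristic=spt, reference=spt)
```

Each situation contributes exactly one integer, so the vector records only which candidate was chosen and says nothing about how the heuristic ordered the remaining candidates.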
A key challenge in applying PC-based surrogate methods lies in the design of an effective scheme that can faithfully map the behaviour of GP heuristics to a numerical vector representation. The PC scheme introduced in [4] has subsequently been applied to other scheduling domains, including flexible job shop scheduling [5] and multi-project RCPSP [16], but it does not apply to GP for DMRCPSP. The main limitation is that the scheme in [4] is designed for single-choice decision scenarios, whereas the heuristic rules in [13] are applied both to ranking-based decision situations and to activity group selection, which corresponds to subset selection.

While subset selection can be adapted by enumerating and indexing activity groups, enumerating activity orderings is impractical due to the factorial growth of permutations. Moreover, assigning indices to permutations destroys the semantic relationships between different orderings. In addition, the PC scheme in [4] records only the reference-rule rank of the highest-priority candidate, ignoring the relative ordering of the other candidates in the same decision situation. As a result, two heuristic rules that select the same highest-priority candidate but differ substantially in the ordering of the remaining candidates may be considered identical, leading to inaccurate distance measurements between GP individuals.

These limitations motivate the development of a new PC scheme that can effectively capture ranking-based and subset-selection decision behaviours in DMRCPSP, enabling the application of surrogate-assisted GP to this problem.

III. PROPOSED METHOD

A. Overall Framework

The proposed algorithm adopts the framework from [4], which represents a conventional surrogate-assisted GP workflow. An overview of the algorithm is illustrated in Fig. 1, where the differences introduced in this work are highlighted in blue. The algorithm begins with an initial population P.
For each individual in the population, a phenotypic characterisation (PC) vector is computed; the PC scheme is detailed in Section III-B. Individuals with duplicate PC vectors are then removed, and the remaining unique individuals are evaluated using the full fitness function.

Fig. 1. Flowchart of the surrogate-assisted GP algorithm. (Flowchart boxes: randomly initialise population; compute PC vectors; remove duplicates; full fitness evaluation; stopping check; build surrogate model; generate offspring; compute PC vectors of offspring; remove duplicates; estimate fitness using surrogate model; form the next population with the best individuals; return best individual.)

During the evolutionary loop, the PC vectors of the current population are used to construct a surrogate model. Subsequently, k × |P| intermediate offspring P_imd are generated, where k is referred to as the offspring multiplier. The PC vectors of the intermediate offspring are computed, only the offspring with unique PC vectors are kept, and the surrogate model is employed to estimate their fitness values (details of the surrogate estimation are provided in Section III-C). Finally, the top |P| offspring, as ranked by the surrogate, are selected for full fitness evaluation. The evolutionary loop repeats until the stopping criterion is met, and the best individual found is returned.

Since this work extends the approach in [13] by introducing a surrogate model, the GP individual representation, the full fitness evaluation procedure, and the crossover, mutation, and selection operators remain the same as in that work and are therefore omitted here. More details can be found in [13].

B. New Phenotypic Characterisation Scheme

A well-designed PC scheme should reflect the behaviour of GP individuals during full fitness evaluation.
In this work, each individual consists of two trees: an activity ordering rule and an activity group selection rule. The activity ordering rule is applied to compute ordering priority values for eligible activity–mode pairs at each decision point. Based on these priority values, a knee point selection strategy is employed to identify a promising subset of activity–mode pairs. From this subset, feasible activity groups are enumerated, and the activity group selection rule is then used to select the activity group with the highest priority for execution.

Motivated by the distinct roles of the two heuristic rules, we propose a decision-situation-based PC scheme, as illustrated in Fig. 2. Two types of decision situations are considered: activity ordering situations and activity group selection situations. In each decision situation, the eligible activities and the corresponding project state information are obtained from data sampled during the schedule simulation.

Fig. 2. Phenotypic characterisation scheme of a GP individual. (In the illustrated example, activity–mode pairs A1 and E2 with priority values 15 and 78 receive ranks 2 and 10 in an activity ordering situation, and activity–mode groups (A1, D2) and (B1, E1) with priority values 23 and 105 receive ranks 1 and 7 in an activity group selection situation. The length of each decision vector equals the number of eligible activity–mode pairs or groups in that situation.)

For a given GP individual, the corresponding heuristic rule is applied to each decision situation to compute priority values and rankings for all eligible activity–mode pairs or activity–mode groups.
A smaller priority value indicates a higher rank. The ranks of all candidates in each decision situation are extracted to form a decision vector. Finally, the decision vectors obtained from all decision situations are concatenated to produce the complete PC vector of a GP individual. This scheme captures the ranking behaviour of heuristic rules across representative decision contexts, providing a phenotypic description of GP individuals.

C. Surrogate-Based Fitness Estimation

The surrogate model is used to predict the relative quality of candidate offspring for selection purposes, rather than to replace the true fitness evaluation. In this work, a simple nearest-neighbour surrogate model is employed for fitness estimation. The surrogate takes as input the PC vectors of the current population along with their corresponding fitness values. To estimate the fitness of a new individual, the Manhattan distance is computed between its PC vector and the PC vectors of all individuals stored in the surrogate database. The fitness value of the nearest neighbour in the PC space is then assigned as the estimated fitness of the new individual.

It is worth noting that the training samples in the surrogate model are not accumulated across generations. Since the full fitness evaluation involves stochastic simulation with different random seeds in each generation, fitness values of individuals from different generations are not directly comparable. Therefore, the surrogate database contains only individuals from the current population.

An illustrative example of the fitness estimation process is shown in Fig. 3. In this example, the PC vector consists of one activity ordering situation and one activity group selection situation. The surrogate database contains the PC vectors and fitness values of four individuals. When predicting the fitness of a new individual, denoted as Ind*, the Manhattan distances between Ind* and the four PC vectors are computed.
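The rank-based PC construction of Section III-B and the nearest-neighbour estimation described in this subsection can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the function names, the example priority values, and the tie-breaking by candidate order are assumptions, while the PC vectors and fitness values of the database are those of the Fig. 3 example.

```python
from typing import List, Sequence, Tuple


def decision_vector(priorities: Sequence[float]) -> List[int]:
    """Rank all candidates of one decision situation (rank 1 = smallest priority value)."""
    order = sorted(range(len(priorities)), key=lambda i: priorities[i])
    ranks = [0] * len(priorities)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def pc_vector(situation_priorities: Sequence[Sequence[float]]) -> List[int]:
    """Concatenate the decision vectors of all sampled decision situations."""
    pc: List[int] = []
    for priorities in situation_priorities:
        pc.extend(decision_vector(priorities))
    return pc


def manhattan(a: Sequence[int], b: Sequence[int]) -> int:
    """Manhattan (L1) distance between two PC vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("PC vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def estimate_fitness(
    pc: Sequence[int], database: Sequence[Tuple[Sequence[int], float]]
) -> Tuple[float, int]:
    """Return (fitness of the nearest neighbour, its distance) for a new PC vector."""
    nearest_pc, nearest_fitness = min(database, key=lambda e: manhattan(pc, e[0]))
    return nearest_fitness, manhattan(pc, nearest_pc)


# Surrogate database from the Fig. 3 example: (PC vector, fitness).
database = [
    ([1, 2, 3, 4, 1, 2, 3, 4], 1446),
    ([4, 3, 2, 1, 4, 3, 2, 1], 3543),
    ([1, 4, 2, 3, 2, 3, 1, 4], 2756),
    ([2, 3, 1, 4, 3, 2, 1, 4], 2345),
]

# Assumed priority values that reproduce the ranks of Ind* in Fig. 3:
# one activity ordering situation followed by one group selection situation.
pc_ind = pc_vector([[0.2, 0.9, 0.5, 1.4], [10, 20, 40, 30]])
estimated, distance = estimate_fitness(pc_ind, database)
```

Because ranks rather than raw priority values are compared, two rules that order the candidates identically have distance zero regardless of the scale of their outputs.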
In this example, Ind* is closest to PC1, with a distance of 4, so the fitness value associated with PC1 is returned as the estimated fitness.

Fig. 3. Example of fitness estimation by the surrogate model.

             PC vector          Distance to Ind*   Fitness
  PC1        1 2 3 4 1 2 3 4    4                  1446
  PC2        4 3 2 1 4 3 2 1    14                 3543
  PC3        1 4 2 3 2 3 1 4    8                  2756
  PC4        2 3 1 4 3 2 1 4    8                  2345
  PC(Ind*)   1 3 2 4 1 2 4 3    -                  1446 (estimated: fitness of the nearest neighbour)

IV. EXPERIMENTAL STUDY

A. Experiment Design

To evaluate the effectiveness of the proposed approach, the algorithm in [13], namely KGGP, is adopted as the baseline for comparison. All algorithms use the same simulation environment and GP parameter settings as in [13] to ensure a fair comparison. In the simulation, each project consists of 200 activities, where each activity can be executed in three modes, and a total of 12 renewable resource types are considered. The precedence complexity among activities is characterised using the order strength [1] metric. Three levels of precedence complexity are examined, with order strength values of 0.75, 0.5, and 0.25, corresponding to high, medium, and low precedence density, respectively. The resulting test scenarios are denoted as 0.75/R12, 0.5/R12, and 0.25/R12. To obtain reliable performance estimates, each GP individual is evaluated on five project instances, and the average makespan relative to the corresponding lower bound is used as the fitness value.

The proposed algorithm is referred to as SKGGP, i.e., surrogate-assisted KGGP. For the PC scheme, ten decision situations are sampled from the simulation runtime for each type of decision situation. Each sampled situation contains more than ten eligible activity–mode pairs or activity–mode groups.
To investigate the impact of different offspring multipliers k on the evolutionary process, four values are considered: k = 1, 1.5, 2, and 4. The corresponding algorithms are denoted as SKGGP-1, SKGGP-1.5, SKGGP-2, and SKGGP-4, respectively. All GP algorithms are implemented using the Python DEAP framework (https://github.com/DEAP/deap/), with a population size of 1000 and 100 generations, resulting in up to 10^5 full fitness evaluations. Each algorithm is independently executed 30 times for each scenario. All experiments were conducted on identical computational nodes within a high-performance computing cluster, New Zealand eScience Infrastructure (NeSI, https://www.nesi.org.nz/).

B. Test Performance

Table I reports the mean and standard deviation of the test performance achieved by the heuristic rules trained using KGGP (baseline) and SKGGP with different offspring multipliers k across the three scenarios. Statistical significance between algorithms is examined using the Wilcoxon signed-rank test. In the table, each algorithm is compared with the ones above it under the same scenario, listed from top to bottom, and the results are indicated using the symbols "(↑)" (significantly better), "(↓)" (significantly worse), and "(≈)" (no significant difference). Fig. 4 illustrates the performance of the best individual from each generation, evaluated on the test set during the training process.

TABLE I
THE MEAN (STANDARD DEVIATION) OF THE OBJECTIVE VALUES OBTAINED BY FIVE ALGORITHMS FROM 30 INDEPENDENT RUNS.

  Algorithm    <0.75/R12>                    <0.5/R12>                     <0.25/R12>
  KGGP         1.724 ± 0.013                 1.691 ± 0.011                 1.710 ± 0.016
  SKGGP-1      1.711 ± 0.010 (↑)             1.686 ± 0.014 (≈)             1.702 ± 0.014 (≈)
  SKGGP-1.5    1.713 ± 0.013 (↑)(≈)          1.684 ± 0.016 (≈)(≈)          1.699 ± 0.014 (↑)(≈)
  SKGGP-2      1.708 ± 0.009 (↑)(≈)(≈)       1.678 ± 0.014 (↑)(↑)(≈)       1.700 ± 0.015 (↑)(≈)(≈)
  SKGGP-4      1.706 ± 0.010 (↑)(↑)(↑)(≈)    1.678 ± 0.014 (↑)(↑)(↑)(≈)    1.700 ± 0.021 (↑)(≈)(≈)(≈)

Fig. 4. Convergence curves of five algorithms over 30 independent runs. (Panels: <0.75/R12>, <0.5/R12>, <0.25/R12>, each with a zoomed view of generations 80–100; x-axis: generation; y-axis: average objective value on test instances; curves: KGGP, SKGGP-1, SKGGP-1.5, SKGGP-2, SKGGP-4.)

Overall, SKGGP demonstrates superior performance to KGGP when a larger number of intermediate offspring are generated. In particular, SKGGP-2 and SKGGP-4 outperform KGGP across all scenarios. The convergence curves further show that SKGGP-2 and SKGGP-4 converge substantially faster than KGGP. Although SKGGP-4 generates twice as many intermediate offspring per generation as SKGGP-2, the performance difference between the two variants is relatively small. The impact of different offspring multipliers on offspring selection is further analysed in Section IV-D.

For cases with fewer intermediate offspring, such as SKGGP-1.5, improved performance over KGGP is observed in two of the three scenarios. The above comparisons verify the effectiveness of the proposed PC scheme and the surrogate model design. Notably, SKGGP-1, which does not generate additional intermediate offspring, converges faster than KGGP in the 0.75/R12 scenario. This result indicates that the PC-based duplicate removal mechanism alone can also contribute to performance improvement to some extent.

C. Saved Budget Analysis

Fig. 5 provides an alternative perspective by illustrating the ratio of full fitness evaluations saved while still achieving the same performance as the baseline KGGP. The x-axis represents the number of full fitness evaluations already consumed by the baseline KGGP. For a given x value, the y-axis shows the budget saved ratio, which is computed by first identifying the best solution found by KGGP at that evaluation count, and then locating the earliest evaluation count x′ at which another algorithm discovers a solution of equal or better quality. The difference between these two evaluation counts, normalised by the number of evaluations used by KGGP, defines the budget saved ratio, i.e., (x − x′)/x. This analysis highlights the ability of surrogate-assisted KGGP (SKGGP) to identify heuristic rules of comparable quality using fewer full evaluations and at earlier stages of evolution.

Fig. 5. Budget saved to reach the performance of KGGP. (Panels: <0.75/R12>, <0.5/R12>, <0.25/R12>; x-axis: KGGP evaluation budget, from 0 to 100k; y-axis: budget saved ratio; curves: SKGGP-1, SKGGP-1.5, SKGGP-2, SKGGP-4.)

In the early phase of evolution, the ratio is negative, indicating that the heuristic rules discovered by SKGGP are inferior to those found by KGGP at the same number of evaluations. However, as evolution progresses, SKGGP begins to accelerate its convergence. Once heuristic rules of equivalent quality to those of KGGP are reached, SKGGP can save approximately 20–40% of full fitness evaluations. These results demonstrate that the surrogate model effectively pre-selects intermediate offspring, thereby improving the quality of individuals entering the next generation. As a consequence, high-quality heuristic rules can be discovered earlier, leading to a more evaluation-efficient evolutionary process.

D. Surrogate Performance and Impact of Offspring Multiplier on Offspring Quality

To further investigate the performance of the surrogate model and the impact of different offspring multipliers k on the evolutionary process, we conduct an additional analysis based on the population generated by the baseline KGGP at each generation. Specifically, for each generation, four times the population size of intermediate offspring are generated, and their true fitness values are obtained through full evaluation.
These true fitness values are then compared with the fitness values estimated by the surrogate model. This analysis aims to address the following two questions:

• Whether the surrogate model is capable of distinguishing high-quality individuals from low-quality ones.
• To what extent increasing the number of intermediate offspring improves offspring quality.

To answer the first question, we measure the precision of the surrogate model at each generation under different values of k, and the results are shown in Fig. 6. This metric reflects the extent to which the surrogate model can correctly select individuals that should be retained for the next generation. Here, a true positive is defined as an individual whose true fitness ranks within the top 1000 and whose estimated fitness also ranks within the top 1000. A false positive refers to an individual whose estimated fitness ranks within the top 1000, but whose true fitness rank is worse than 1000. The precision is the ratio of true positive instances over all positive instances.

The precision of the surrogate model decreases as the offspring multiplier k increases. When k = 1.5, the precision reaches approximately 80–90%. For k = 2, the precision drops to below 70%, while for k = 4, it further decreases to around 50–60%. This trend is intuitive, as selecting the top 1000 individuals becomes increasingly challenging as the number of intermediate offspring grows. One possible explanation for this phenomenon lies in the limited size of the surrogate database. Since the database only contains PC vectors from the current population, the surrogate model may lack sufficient representative samples to make accurate predictions when faced with a large number of candidate offspring.
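The precision measure defined above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: fitness is minimised, ties are broken by offspring index, and the helper names and the toy numbers are hypothetical rather than taken from the experiments (where the cut-off n is the population size of 1000).

```python
from typing import Sequence, Set


def top_n_indices(values: Sequence[float], n: int) -> Set[int]:
    """Indices of the n smallest values (smaller fitness = better, ties by index)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    return set(order[:n])


def surrogate_precision(
    true_fitness: Sequence[float], estimated_fitness: Sequence[float], n: int
) -> float:
    """Share of surrogate-selected offspring that truly belong to the top n."""
    if n <= 0:
        raise ValueError("n must be positive")
    selected = top_n_indices(estimated_fitness, n)   # all positives
    truly_top = top_n_indices(true_fitness, n)
    true_positives = len(selected & truly_top)
    return true_positives / len(selected)


# Toy example (assumed values): six intermediate offspring, keep the best three.
true_fitness = [1.70, 1.72, 1.69, 1.80, 1.75, 1.71]
estimated_fitness = [1.70, 1.75, 1.69, 1.80, 1.71, 1.72]

precision = surrogate_precision(true_fitness, estimated_fitness, n=3)
```

In the toy example the surrogate selects offspring 0, 2, and 4, of which 0 and 2 are truly among the best three, so two of the three positives are true positives.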
To answer the second question, the intermediate offspring are further divided into two groups: base offspring, which consist of the first 1000 generated offspring, and extra offspring, which include the offspring generated beyond the population size (i.e., the 1001st and subsequent offspring). We then examine how many individuals from the extra offspring are selected into the top 1000 based on estimated fitness. Among these selected extra offspring, individuals whose true fitness and estimated fitness both rank within the top 1000 are classified as correctly added, while those whose estimated fitness ranks within the top 1000 but whose true fitness ranks outside the top 1000 are classified as incorrectly added. The corresponding statistics are reported in Fig. 7.

When k = 1.5, approximately 300 individuals per generation are selected from the extra offspring (1001–1500), most of which are correctly added, with only about 50 individuals being incorrectly added. For k = 2, around 350 individuals are correctly added, while approximately 100 individuals are incorrectly added. This indicates that, for k = 2, about 35% of the extra offspring successfully replace the original offspring, enter the subsequent evolutionary process, and contribute to improving population quality. This observation further demonstrates the effectiveness of generating intermediate offspring followed by surrogate-based selection.

When k = 4, although the number of correctly added individuals slightly increases compared to k = 2, the number of incorrectly added individuals rises substantially to around 300. This result is consistent with the lower precision observed for k = 4 and partially explains why SKGGP-2 and SKGGP-4 achieve similar overall performance.
Although SKGGP-4 generates twice as many intermediate offspring per generation as SKGGP-2, the surrogate model is less capable of accurately distinguishing high-quality individuals from low-quality ones, leading to the introduction of both beneficial and detrimental individuals into the population.

Fig. 6. Mean precision by various offspring multipliers k across generations. (Panels: <0.75/R12>, <0.5/R12>, <0.25/R12>; x-axis: generation; y-axis: precision; curves: k=1.5, k=2, k=4.)

Fig. 7. Mean count of true positive and false positive by various offspring multipliers k across generations. (Panels: <0.75/R12>, <0.5/R12>, <0.25/R12>; x-axis: generation; y-axis: count; curves: correctly added and incorrectly added for k=1.5, k=2, and k=4.)

E. Surrogate Overhead

To examine whether the introduction of the surrogate model imposes a significant computational overhead, we report the average per-generation evaluation time of SKGGP across different scenarios in Table II. Full evaluation refers to the time required to evaluate |P| individuals (i.e., 1000) in the current generation using the simulation model. Surrogate estimation denotes the time needed to compute the PC vectors of k × |P| intermediate offspring and to estimate their fitness values using the surrogate model. Overall, the time spent on surrogate estimation is between roughly 1/13 and 1/75 of that required for full evaluation, and below 1/20 in 10 of the 12 settings in Table II. This indicates that the surrogate-related overhead is relatively small, demonstrating that the surrogate model can enhance the convergence of the algorithm with only a marginal increase in computational cost.

TABLE II
MEAN EVALUATION TIME OF FULL EVALUATION AND SURROGATE ESTIMATION IN SECONDS.

  Scenario    Algorithm    Full Evaluation    Surrogate Estimation
  0.75/R12    SKGGP-1      266.27             9.65
              SKGGP-1.5    275.01             10.94
              SKGGP-2      333.73             13.05
              SKGGP-4      362.33             20.26
  0.5/R12     SKGGP-1      556.13             14.00
              SKGGP-1.5    602.91             19.63
              SKGGP-2      541.95             22.89
              SKGGP-4      593.04             45.40
  0.25/R12    SKGGP-1      784.37             10.53
              SKGGP-1.5    967.90             13.83
              SKGGP-2      948.97             15.27
              SKGGP-4      881.96             23.98

V. CONCLUSIONS

This paper proposes a phenotypic characterisation scheme for genetic programming applied to the dynamic multi-mode resource-constrained project scheduling problem. By analysing the decision situations encountered during GP-based scheduling, a rank-based PC vector is designed to transform the behavioural characteristics of GP individuals into a numerical vector. This addresses the lack of suitable PC schemes for GP in DMRCPSP and enables behaviour-level comparison between individuals. The proposed PC scheme is integrated into a surrogate-assisted GP framework, where it is used to measure distances between GP individuals and to estimate the fitness of unevaluated intermediate offspring. Experimental results demonstrate that the SKGGP algorithm is able to evolve higher-quality heuristic rules than the baseline KGGP with significantly fewer expensive fitness evaluations. Further analyses reveal the surrogate model's ability to pre-select promising offspring and quantify the contribution of different numbers of intermediate offspring to population quality.

Despite these advantages, the precision of the current surrogate model decreases when selecting from a large number of intermediate offspring. Future work will therefore focus on improving surrogate accuracy by expanding the surrogate database, exploring more expressive PC schemes, and investigating alternative machine learning models for fitness estimation.

REFERENCES

[1] V. V. Peteghem and M. Vanhoucke, "An experimental investigation of metaheuristics for the multi-mode resource-constrained project scheduling problem on new dataset instances," European Journal of Operational Research, vol. 235, no. 1, pp. 62–72, 2014.
[2] S. Hartmann and D. Briskorn, "An updated survey of variants and extensions of the resource-constrained project scheduling problem," European Journal of Operational Research, vol. 297, no. 1, pp. 1–14, 2022.
[3] F. Zhang, Y. Mei, S. Nguyen, and M. Zhang, "Survey on genetic programming and machine learning techniques for heuristic design in job shop scheduling," IEEE Transactions on Evolutionary Computation, vol. 28, no. 1, pp. 147–167, 2024.
[4] T. Hildebrandt and J. Branke, "On Using Surrogates with Genetic Programming," Evolutionary Computation, vol. 23, no. 3, pp. 343–367, 2015.
[5] F. Zhang, Y. Mei, S. Nguyen, M. Zhang, and K. C. Tan, "Surrogate-Assisted Evolutionary Multitask Genetic Programming for Dynamic Flexible Job Shop Scheduling," IEEE Transactions on Evolutionary Computation, vol. 25, no. 4, pp. 651–665, 2021.
[6] M. Ðumić and D. Jakobović, "Ensembles of priority rules for resource constrained project scheduling problem," Applied Soft Computing, vol. 110, p. 107606, 2021.
[7] J. Luo, M. Vanhoucke, J. Coelho, and W. Guo, "An efficient genetic programming approach to design priority rules for resource-constrained project scheduling problem," Expert Systems with Applications, vol. 198, no. 1, p. 116753, 2022.
[8] M. Ðumić and D. Jakobović, "Using priority rules for resource-constrained project scheduling problem in static environment," Computers & Industrial Engineering, vol. 169, p. 108239, 2022.
[9] S. Wang, Explainable Genetic Programming for Evolving Routing Policies of Uncertain Capacitated Arc Routing Problems. PhD thesis, Te Herenga Waka-Victoria University of Wellington, 2023.
[10] T. Guo, Y. Mei, M. Zhang, R. Sun, Y. Zhu, and W. Du, "Genetic programming with multifidelity surrogates for large-scale dynamic air traffic flow management," IEEE Transactions on Evolutionary Computation, vol. 29, no. 6, pp. 2671–2685, 2025.
[11] Y. Tian, Y. Mei, and M. Zhang, "Learning Heuristics via Genetic Programming for Multi-mode Resource-constrained Project Scheduling," in 2024 IEEE Congress on Evolutionary Computation (CEC), pp. 01–08, June 2024.
[12] Y. Tian, Y. Mei, and M. Zhang, "Genetic Programming with Activity Group Selection for Dynamic Multi-mode Resource-Constrained Project Scheduling Problems," in 2025 IEEE Congress on Evolutionary Computation (CEC), pp. 1–4, IEEE, June 2025.
[13] Y. Tian, Y. Mei, and M. Zhang, "Scalable knee-point guided activity group selection in multi-tree genetic programming for dynamic multi-mode project scheduling," arXiv: 2601.14485, 2026.
[14] Y. Jin, H. Wang, and C. Sun, "Data-Driven Surrogate-Assisted Evolutionary Optimization," in Data-Driven Evolutionary Optimization: Integrating Evolutionary Computation, Machine Learning and Data Science, pp. 147–172, Cham: Springer International Publishing, 2021.
[15] S. Nguyen, M. Zhang, and K. C. Tan, "Surrogate-Assisted Genetic Programming With Simplified Models for Automated Design of Dispatching Rules," IEEE Transactions on Cybernetics, vol. 47, no. 9, pp. 2951–2965, 2017.
[16] H. Chen, X. Li, and L. Gao, "A surrogate-assisted dual-tree genetic programming framework for dynamic resource constrained multi-project scheduling problem," International Journal of Production Research, vol. 62, no. 16, pp. 5631–5653, 2024.

