Evolutionarily Stable Stackelberg Equilibrium



Sam Ganzfried
Ganzfried Research
sam.ganzfried@gmail.com

Abstract—We present a new solution concept called evolutionarily stable Stackelberg equilibrium (SESS). We study the Stackelberg evolutionary game setting in which there is a single leading player and a symmetric population of followers. The leader selects an optimal mixed strategy, anticipating that the follower population plays an evolutionarily stable strategy (ESS) in the induced subgame and may satisfy additional ecological conditions. We consider both leader-optimal and follower-optimal selection among ESSs, which arise as special cases of our framework. Prior approaches to Stackelberg evolutionary games either define the follower response via evolutionary dynamics or assume rational best-response behavior, without explicitly enforcing stability against invasion by mutations. We present algorithms for computing SESS in discrete and continuous games, and validate the latter empirically. Our model applies naturally to biological settings; for example, in cancer treatment the leader represents the physician and the followers correspond to competing cancer cell phenotypes.

I. INTRODUCTION

While Nash equilibrium has emerged as the standard solution concept in game theory, it is often criticized as being too weak: games often contain multiple Nash equilibria (sometimes even infinitely many), and we want to select one that satisfies other natural properties. For example, one popular concept that refines Nash equilibrium is the evolutionarily stable strategy (ESS). A mixed strategy in a two-player symmetric game is an evolutionarily stable strategy if it is robust to being overtaken by a mutation strategy.
Formally, x* is an ESS if for every mixed strategy x that differs from x*, there exists ϵ₀ = ϵ₀(x) > 0 such that, for all ϵ ∈ (0, ϵ₀),

(1 − ϵ)u₁(x, x*) + ϵu₁(x, x) < (1 − ϵ)u₁(x*, x*) + ϵu₁(x*, x).

From a biological perspective, we can interpret x* as a distribution among "normal" individuals within a population, and consider a mutation that makes use of strategy x, assuming that the proportion of the mutation in the population is ϵ. In an ESS, the expected payoff of the mutation is smaller than the expected payoff of a normal individual; hence the proportion of mutations will decrease and eventually disappear over time, with the composition of the population returning to being mostly x*. An ESS is therefore a mixed strategy that is immune to being overtaken by mutations. ESS was initially proposed by mathematical biologists motivated by applications such as population dynamics (e.g., maintaining robustness to mutations within a population of humans or animals) [21], [20]. A common example is the 2×2 game where strategies correspond to an "aggressive" Hawk or a "peaceful" Dove strategy. A recent paper has proposed a similar game in which an aggressive malignant cell competes with a passive normal cell for biological energy, which has applications to cancer eradication [6]. While Nash equilibrium is defined for general multiplayer games, ESS is traditionally defined specifically for two-player symmetric games. ESS is a refinement of Nash equilibrium: in particular, if x* is an ESS, then (x*, x*) (i.e., the strategy profile where both players play x*) is a (symmetric) Nash equilibrium [19]. Of course the converse is not necessarily true (not every symmetric Nash equilibrium is an ESS), or else ESS would be a trivial refinement.
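The ϵ-invasion condition above is easy to check numerically on small games. The following sketch assumes standard Hawk-Dove payoffs with V = 2 and C = 4 (our own illustrative choice) and the rock-paper-scissors payoffs from [19] with 2/3 for a tie; it verifies that the mixed Hawk-Dove strategy repels every mutant on a grid, while the uniform rock-paper-scissors strategy is invaded by a pure Rock mutant:

```python
import numpy as np

def u(a, b, B):
    """Expected payoff to mixed strategy a against an opponent playing b."""
    return float(a @ B @ b)

def invasion_gap(x_star, y, B, eps):
    """RHS minus LHS of the ESS inequality: positive means mutant y is repelled."""
    lhs = (1 - eps) * u(y, x_star, B) + eps * u(y, y, B)
    rhs = (1 - eps) * u(x_star, x_star, B) + eps * u(x_star, y, B)
    return rhs - lhs

# Hawk-Dove with V = 2, C = 4 (rows/columns ordered Hawk, Dove): the mixed
# strategy playing Hawk with probability V/C = 1/2 is an ESS.
B_hd = np.array([[-1.0, 2.0],
                 [0.0, 1.0]])
x_hd = np.array([0.5, 0.5])
eps = 0.01
for q in np.linspace(0.0, 1.0, 101):
    if abs(q - 0.5) < 1e-12:
        continue  # skip the candidate itself
    y = np.array([q, 1.0 - q])
    assert invasion_gap(x_hd, y, B_hd, eps) > 0  # every mutant is repelled

# Rock-paper-scissors with payoffs 1 (win), 0 (loss), 2/3 (tie): the uniform
# Nash equilibrium is NOT an ESS -- a pure Rock mutant ties against the
# population but beats itself, so the ESS inequality fails.
B_rps = np.array([[2/3, 0.0, 1.0],
                  [1.0, 2/3, 0.0],
                  [0.0, 1.0, 2/3]])
x_rps = np.ones(3) / 3
rock = np.array([1.0, 0.0, 0.0])
assert invasion_gap(x_rps, rock, B_rps, eps) < 0
```

The Hawk-Dove gap equals ϵ·2(1/2 − q)², which is strictly positive for every mutant q ≠ 1/2, matching the definition's second-order condition.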
In fact, ESS is not guaranteed to exist in games with more than two pure strategies per player (while Nash equilibrium is guaranteed to exist in all finite games). For example, while rock-paper-scissors has a mixed-strategy Nash equilibrium (which puts equal weight on all three actions), it has no ESS [19] (that work considers a version where payoffs are 1 for a victory, 0 for a loss, and 2/3 for a tie). There exists a polynomial-time algorithm for computing a Nash equilibrium (NE) in two-player zero-sum games, while for two-player non-zero-sum and multiplayer games computing an NE is PPAD-complete, and it is widely conjectured that no efficient (polynomial-time) algorithm exists. However, several algorithms have been devised that perform well in practice. The problem of deciding whether a game has an ESS was shown to be both NP-hard and coNP-hard, and also to be contained in Σ₂^P (the class of decision problems that can be solved in nondeterministic polynomial time given access to an NP oracle) [7]. Subsequently it was shown that this problem is in fact Σ₂^P-complete [5], and even more challenging for more than two players [2]. Note that this result is for determining whether an ESS exists (as discussed above, there exist games that have no ESS), not for the complexity of computing an ESS in games for which one exists. Thus, computing an ESS is significantly more difficult than computing an NE. Several approaches have been proposed for computing ESS in two-player games [14], [1], [4], [3], [23].

A normal-form game consists of a finite set of players N = {1, …, n}, a finite set of pure strategies S_i for each player i, and a real-valued utility for each player for each strategy vector (aka strategy profile), u_i : ×_i S_i → ℝ.
In a symmetric normal-form game, all strategy spaces S_i are equal and the utility functions satisfy the following symmetry condition: for every player i ∈ N, pure strategy profile (s_1, …, s_n) ∈ Sⁿ, and permutation π of the players,

u_i(s_1, …, s_n) = u_{π(i)}(s_{π(1)}, …, s_{π(n)}).

This allows us to remove the player index of the utility function and just write u(s_1, …, s_n), where it is implied that the utility is for player 1 (we can simply permute the players to obtain the utilities of the other players). We write u_i for notational convenience, but note that only a single utility function must be specified, which applies to all players. Let Σ_i denote the set of mixed strategies of player i (probability distributions over elements of S_i). If players follow mixed strategy profile x = (x(1), …, x(n)), where x(i) ∈ Σ_i, the expected payoff to player i is

u_i(x(1), …, x(n)) = Σ_{s_1,…,s_n ∈ S} x(1)_{s_1} ⋯ x(n)_{s_n} u_i(s_1, …, s_n).

We write u_i(x) = u_i(x(i), x(−i)), where x(−i) denotes the vector of strategies of all players except i. If all players follow the same mixed strategy x, then for all players i:

u_i(x) = u_i(x, …, x) = Σ_{s_1,…,s_n ∈ S} x_{s_1} ⋯ x_{s_n} u_i(s_1, …, s_n).

Definition 1: A mixed strategy profile x* is a Nash equilibrium if for each player i ∈ N and for each mixed strategy x(i) ∈ Σ_i: u_i(x*(i), x*(−i)) ≥ u_i(x(i), x*(−i)).

Definition 2: A mixed strategy profile x* in a symmetric normal-form game is a symmetric Nash equilibrium if it is a Nash equilibrium and x*(1) = x*(2) = … = x*(n).

Definition 3: A mixed strategy x* ∈ Σ_1 is evolutionarily stable in a symmetric normal-form game if for each mixed strategy x ≠ x*, exactly one of the following holds:
1) u_1(x*, x*, …, x*) > u_1(x, x*, …, x*);
2) u_1(x*, x*, …, x*) = u_1(x, x*, …, x*) and u_1(x*, x, x*, …, x*) > u_1(x, x, x*, …, x*).

It has been proven that every symmetric normal-form game has at least one symmetric Nash equilibrium [24]. It is clear from Definition 3 that every evolutionarily stable strategy in a symmetric normal-form game must be a symmetric Nash equilibrium (SNE). Thus, a natural approach for ESS computation is to first compute SNE and then perform subsequent procedures to determine whether they are ESS.

Now we present the Stackelberg evolutionary game setting. The leader's set of pure strategies is M = {1, …, m} and the follower phenotypes are P = {1, …, n}. The leader's utility function is given by A_L ∈ ℝ^{m×n}, where A_L(ℓ, i) gives the payoff to the leader when the leader plays pure strategy ℓ ∈ M and the follower population state is phenotype i ∈ P. The leader's expected utility when the leader plays mixed strategy σ and the follower plays mixed strategy x is U_L(σ, x) = Σ_{ℓ=1}^m Σ_{i=1}^n σ_ℓ x_i A_L(ℓ, i). The follower payoff tensor is A_F ∈ ℝ^{m×n×n}, where A_F(ℓ, i, j) is the payoff (fitness) to a follower of phenotype i ∈ P when the leader plays pure strategy ℓ ∈ M and the interacting follower has phenotype j ∈ P. For each fixed ℓ ∈ M, A_F(ℓ, ·, ·) is an n×n symmetric payoff matrix. After the leader commits to mixed strategy σ ∈ ∆(M), the induced follower payoff matrix is B_σ ∈ ℝ^{n×n} with B_σ = Σ_{ℓ=1}^m σ_ℓ A_F(ℓ, ·, ·). The follower population then plays a symmetric evolutionary game with payoff matrix B_σ. A population state x ∈ ∆(P) is admissible for σ iff it is an ESS of B_σ. Let E(σ) denote the set of ESSs of B_σ, and let g_σ : ∆(P) → ℝ denote the follower's ESS selection function. Finally, let G(σ) = arg max_{x ∈ E(σ)} g_σ(x). We now present our general definition for evolutionarily stable Stackelberg equilibrium (SESS) in Definition 4.
We also define two natural special cases. In the first case, the followers play the ESS that is most beneficial to the leader. We refer to this as an optimistic evolutionarily stable Stackelberg equilibrium (OSESS). In this case we have g_σ(x) = U_L(σ, x). We also consider the pessimistic evolutionarily stable Stackelberg equilibrium (PSESS), in which the followers play the ESS that is worst for the leader. While the followers are not rationally trying to harm the leader, we can view PSESS as an SESS that is robust to worst-case evolutionary stability. For PSESS we have g_σ(x) = −U_L(σ, x). OSESS and PSESS are defined below in Definitions 5–6. For OSESS we have G(σ) = arg max_{x ∈ E(σ)} U_L(σ, x), and for PSESS G(σ) = arg min_{x ∈ E(σ)} U_L(σ, x). SESS refines Stackelberg equilibrium by restricting the follower response to strategies that are evolutionarily stable in the induced population game.

Definition 4 (Evolutionarily Stable Stackelberg Equilibrium): Given a selection correspondence G(σ) ⊆ E(σ), (σ*, x*) is an evolutionarily stable Stackelberg equilibrium if σ* ∈ arg max_{σ ∈ ∆(M)} max_{x ∈ G(σ)} U_L(σ, x) and x* ∈ G(σ*).

Definition 5 (Optimistic SESS): A pair (σ*, x*) ∈ ∆(M) × ∆(P) is an optimistic evolutionarily stable Stackelberg equilibrium if (σ*, x*) ∈ arg max_{σ ∈ ∆(M), x ∈ E(σ)} U_L(σ, x).

Definition 6 (Pessimistic SESS): A pair (σ*, x*) ∈ ∆(M) × ∆(P) is a pessimistic evolutionarily stable Stackelberg equilibrium if σ* ∈ arg max_{σ ∈ ∆(M)} min_{x ∈ E(σ)} U_L(σ, x) and x* ∈ arg min_{x ∈ E(σ*)} U_L(σ*, x).

II. CONTINUOUS STACKELBERG–ESS

We now extend the Stackelberg–ESS framework to continuous Stackelberg evolutionary games, which arise naturally in biological and control applications such as cancer treatment.
In this setting, the leader selects a continuous decision variable, while the followers correspond to an evolving population whose response is described by coupled ecological and evolutionary dynamics. Let M ⊆ ℝ^d_+ denote the leader decision set (e.g., treatment doses or control parameters). For a given leader decision m ∈ M, the followers are described by a collection of continuous state variables, including population sizes and traits. We denote the follower outcome by z = (x, u) ∈ Z, where x ∈ ℝ^n_+ represents population abundances and u ∈ U ⊆ ℝ^n represents trait or strategy variables. As prior work has done, we will assume that U = [0, 1]^n. In the context of cancer treatment, d denotes the number of drugs and n denotes the number of cancer cell phenotypes. The leader objective is given by a function Q : M × Z → ℝ. For each m ∈ M, the leader induces an eco-evolutionary system governing the dynamics of (x, u). For a fixed leader decision m, we define the set of admissible follower responses E(m) ⊆ Z as the set of eco-evolutionarily stable outcomes under m. Informally, a follower outcome z* = (x*, u*) is admissible if it satisfies any required ecological equilibrium conditions and is stable against invasion by rare mutant phenotypes in the induced eco-evolutionary system. We assume throughout that E(m) is nonempty for the leader decisions under consideration. When multiple evolutionarily stable outcomes exist for the same leader decision, we model their selection via a correspondence G(m) ⊆ E(m). We now define continuous analogues of Stackelberg–ESS.

Definition 7 (Continuous Stackelberg–ESS): Let G(m) ⊆ E(m) be a selection correspondence. A pair (m*, z*) ∈ M × Z is a continuous Stackelberg–ESS if m* ∈ arg max_{m ∈ M} max_{z ∈ G(m)} Q(m, z) and z* ∈ G(m*).
Definition 8 (Optimistic continuous Stackelberg–ESS): A pair (m*, z*) ∈ M × Z is an optimistic continuous Stackelberg–ESS if (m*, z*) ∈ arg max_{m ∈ M, z ∈ E(m)} Q(m, z).

Definition 9 (Pessimistic continuous Stackelberg–ESS): A pair (m*, z*) ∈ M × Z is a pessimistic continuous Stackelberg–ESS if m* ∈ arg max_{m ∈ M} min_{z ∈ E(m)} Q(m, z) and z* ∈ arg min_{z ∈ E(m*)} Q(m*, z).

First we present the Stackelberg evolutionary game formulation of cancer treatment from prior work [18]. In this formulation (Equation 1), the objective Q corresponds to the quality-of-life function that is maximized by the leader. For each cell type i, there is a fitness function G_i(u_i, m, x) that the follower is trying to maximize. We assume that the dynamics of the population x are governed by ẋ = G(t)x. In order to ensure that we are in equilibrium of the ecological dynamics, we must have ẋ_i = 0 for all i. Note that in this formulation the follower is playing a best response by maximizing the fitness function. By contrast, in Stackelberg-ESS we assume that the behavior of the cancer cells is determined by evolutionary stability conditions as opposed to explicit utility maximization. An algorithm for solving this problem has been presented that is based on a nonconvex quadratic program formulation [8]. This algorithm has been demonstrated to quickly compute a globally optimal solution on a realistic example problem proposed by prior work [18].

max_{m*, u*, x*} Q(m*, u*, x*)
s.t. ẋ*_i = 0, i = 1, …, n,
     u*_i ∈ arg max_{u_i ∈ [0,1]} G_i(u_i, m*, x*), i = 1, …, n,
     m* ≥ 0, x* ≥ 0, 0 ≤ u* ≤ 1.        (1)

The formulation for optimistic continuous Stackelberg-ESS is given by Equation 2.
This formulation is similar to the previous one for standard Stackelberg equilibrium, except that now, instead of maximizing the fitness functions, the follower ensures that their strategy is an ESS in the subgame induced by the leader playing m*. We will focus our attention on the problem of computing OSESS under this formulation, since it is perhaps the easiest case of SESS to solve: both players are aligned in maximizing Q.

max_{m*, u*, x*} Q(m*, u*, x*)
s.t. ẋ*_i = 0, i = 1, …, n,
     (x*, u*) ∈ E(m*),
     m* ≥ 0, x* ≥ 0, 0 ≤ u* ≤ 1.        (2)

Equation 3 provides an equivalent reformulation of Equation 2. Note that we use the continuous-trait definition of ESS for our setting [16], [25], [11], as opposed to the discrete ESS definition presented previously. Given fixed m, a resident state (x*, u*) is an ESS iff x* is an ecological equilibrium (i.e., ẋ* = 0) and no rare mutant can grow, i.e.,

sup_{u_i} G_i(u_i, m, x*) ≤ 0 for all i.

Recall that ẋ = Gx, so the constraint ẋ*_i = 0 is equivalent to G_i(u*_i, m*, x*) x*_i = 0. Recall that (x*, u*) ∈ E(m) iff (x*, u*) is evolutionarily stable in the eco-evolutionary system induced by m. This is true iff for all i and u_i, a mutation to u_i does not lead to positive fitness (growth rate), i.e., G_i(u_i, m, x*) ≤ 0.

max_{m*, u*, x*} Q(m*, u*, x*)
s.t. x*_i G_i(u*_i, m*, x*) = 0, i = 1, …, n,
     G_i(u_i, m*, x*) ≤ 0 for all u_i ∈ [0, 1], i = 1, …, n,
     m* ≥ 0, x* ≥ 0, 0 ≤ u* ≤ 1.        (3)

III. ALGORITHMS

We first present an algorithm for computing OSESS in normal-form games, followed by an algorithm for continuous-trait games. Pseudocode for the main algorithm in the normal-form setting is given in Algorithm 1.
We compute an OSESS by enumerating candidate follower supports and, for each support, computing a Stackelberg equilibrium of the game where the follower is restricted to strategies with the given support. This is done using the subroutine given in Algorithm 2. Given the Stackelberg equilibrium (σ, x), we next check whether x is an ESS in the subgame induced by the leader's strategy σ. This is done in Algorithm 3. If the strategies do constitute an ESS, then we have found an evolutionarily stable Stackelberg equilibrium (SESS). We then calculate the objective σ⊤A_L x, and return the SESS with highest value as an OSESS. The algorithm can be easily adapted to output the first SESS found, all SESSs found, or an SESS that optimizes a different objective. The values used for all numerical tolerance parameters are given in Table I.

Algorithm 2 computes a Stackelberg equilibrium strategy with given support T by solving a quadratically-constrained quadratic program (QCQP). The constraints x_i ≥ ϵ_s ensure that the pure strategies in the support are played with positive probability. The objective ensures that the leader's payoff is maximized, and the final two sets of constraints ensure that the follower is best-responding to the leader's strategy. Our algorithm implicitly assumes that the game is nondegenerate and that there exists at least one Stackelberg equilibrium for each follower support. This is a common assumption in equilibrium-finding algorithms [22], [12], [15], and has been studied recently in the context of ESS computation [10], [9]. If the game is degenerate with an infinite continuum of equilibria, our algorithm will fail to find all of them, and therefore may fail to find all SESSs; however, it may still find one. Algorithm 3 also involves solving a QCQP. In practice we solve nonconvex QCQPs with Gurobi's solver, which can solve small problem instances quickly despite their computational complexity.
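The enumerate-and-certify structure of Algorithms 1–3 can be illustrated without a QCQP solver. The sketch below uses a made-up 2-action, 2-phenotype instance (all payoff values are our own illustrative choices): it grids over the leader's mixed strategy, forms the induced matrix B_σ, tests the pure phenotype states for evolutionary stability by checking Definition 3 against a grid of mutants, and keeps the stable pair with the highest leader payoff. A full implementation would enumerate all follower supports and solve the exact QCQPs instead of gridding:

```python
import numpy as np

def is_ess(x, B, tol=1e-9):
    """Check Definition 3 for candidate x against a grid of mutant strategies."""
    for q in np.linspace(0.0, 1.0, 201):
        y = np.array([q, 1.0 - q])
        if np.allclose(y, x, atol=1e-12):
            continue
        gap1 = x @ B @ x - y @ B @ x        # condition 1: x beats y against x
        if gap1 > tol:
            continue
        if gap1 < -tol:
            return False                    # some mutant does strictly better
        if x @ B @ y - y @ B @ y <= tol:    # condition 2 must break the tie
            return False
    return True

# Hypothetical follower tensor A_F[action][i][j] and leader payoffs A_L:
# action 0 makes phenotype 0 dominant, action 1 makes phenotype 1 dominant.
A_F = np.array([[[3.0, 3.0], [2.0, 2.0]],
                [[0.0, 0.0], [1.0, 1.0]]])
A_L = np.array([[5.0, 0.0],
                [3.0, 1.0]])

best_val, best = -np.inf, None
for t in np.linspace(0.0, 1.0, 101):            # leader mixes over two actions
    sigma = np.array([1.0 - t, t])
    B = sigma[0] * A_F[0] + sigma[1] * A_F[1]   # induced follower matrix B_sigma
    for x in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
        if is_ess(x, B):
            val = sigma @ A_L @ x
            if val > best_val:
                best_val, best = val, (sigma, x)
sigma_star, x_star = best
```

In this instance phenotype 0 is stable only while the leader's weight on action 1 is below 1/2, where the leader earns 5 − 2t, so the search settles on the pure leader strategy t = 0 with value 5.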
The QCQP checks whether any mutant strategy y can successfully invade a population following strategy x in the game induced by the leader's strategy σ. The parameter ϵ_p allows for a small amount of numerical imprecision, and the final constraint ensures that y is numerically distinct from x using the numerical separation parameter δ.

Algorithm 1 COMPUTEDISCRETEOSESS
Require: Follower payoff tensor A_F ∈ ℝ^{m×n×n}
Require: Leader payoff matrix A_L ∈ ℝ^{m×n}
Require: Minimum support mass ϵ_s, ESS tolerances ϵ_p, δ
Ensure: OSESS (σ*, x*)
1: v* ← −∞
2: (σ*, x*) ← (∅, ∅)
3: for all nonempty supports T ⊆ P do
4:   (status, σ, x) ← SOLVESESUPPORT(A_F, A_L, T, ϵ_s)
5:   if status = FEASIBLE then
6:     if ISESS(A_F, σ, x, ϵ_p, δ) then
7:       v ← σ⊤A_L x
8:       if v > v* then
9:         v* ← v
10:        (σ*, x*) ← (σ, x)
11:      end if
12:    end if
13:  end if
14: end for
15: return (σ*, x*)

Pseudocode for the main algorithm in the continuous setting is given in Algorithm 4. The algorithm follows a generate-and-certify structure: first a candidate solution satisfying the KKT conditions of the invasion maximization problems is computed, and then a global optimization check verifies that no mutant phenotype can invade. The main subroutine, Algorithm 5, generates a candidate OSESS by solving the optimization problem presented in Equation 3. The formulation finds a solution that satisfies the KKT optimality conditions, though it is not guaranteed to be a globally optimal solution. To test whether this candidate solution is globally optimal, we perform the ex-post certification procedure described in Algorithm 6, using a numerical tolerance of ϵ_inv. We set ϵ_inv = 10⁻³, which is biologically negligible but significantly larger than Gurobi's default tolerance of 10⁻⁶.
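The certification step can be mimicked in a few lines for a single phenotype. The sketch below uses hypothetical one-dimensional fitness functions (not the cancer model), with a dense grid standing in for the global optimizer of Algorithm 6: a resident whose fitness peaks at zero at its own trait passes the ϵ_inv check, while a resident that some mutant trait can improve upon fails it:

```python
import numpy as np

EPS_INV = 1e-3  # invasion tolerance, as in Algorithm 4

def certify(G, grid=10001):
    """Return the maximum fitness any mutant trait u in [0, 1] can attain.
    A dense grid stands in for the global optimization in Algorithm 6."""
    u = np.linspace(0.0, 1.0, grid)
    return float(np.max(G(u)))

# Hypothetical resident at u* = 0.3 whose fitness peaks at 0: certified stable.
G_stable = lambda u: -(u - 0.3) ** 2
assert certify(G_stable) <= EPS_INV

# Hypothetical resident that a mutant near u = 0.8 invades with fitness 0.2:
# certification fails and the algorithm would report FAILURE.
G_invadable = lambda u: 0.2 - (u - 0.8) ** 2
assert certify(G_invadable) > EPS_INV
```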
Algorithm 2 SOLVESESUPPORT
Require: Follower payoff tensor A_F ∈ ℝ^{m×n×n}
Require: Leader payoff matrix A_L ∈ ℝ^{m×n}
Require: Support T ⊆ P, minimum support mass ϵ_s
Ensure: (status, σ, x)
1: Solve the following optimization problem:
     max_{σ, x, v} σ⊤A_L x
     s.t. σ_ℓ ≥ 0 ∀ℓ ∈ M,   Σ_{ℓ∈M} σ_ℓ = 1,
          x_i ≥ ϵ_s ∀i ∈ T,   x_j = 0 ∀j ∉ T,   Σ_{i∈P} x_i = 1,
          Σ_{ℓ∈M} Σ_{k∈P} σ_ℓ A_F(ℓ, i, k) x_k = v ∀i ∈ T,
          Σ_{ℓ∈M} Σ_{k∈P} σ_ℓ A_F(ℓ, j, k) x_k ≤ v ∀j ∉ T.
2: if the problem is feasible then
3:   return (FEASIBLE, σ, x)
4: else
5:   return (INFEASIBLE, ∅, ∅)
6: end if

Algorithm 3 ISESS
Require: Follower payoff tensor A_F ∈ ℝ^{m×n×n}
Require: Leader mixed strategy σ ∈ ∆(M)
Require: Follower mixed strategy x ∈ ∆(P)
Require: Tolerances ϵ_p, δ
Ensure: TRUE if x is an ESS of the induced follower game; otherwise FALSE
1: Define the induced follower payoff matrix B ∈ ℝ^{n×n}: B_ij ← Σ_{ℓ∈M} σ_ℓ A_F(ℓ, i, j) ∀i, j ∈ P.
2: v ← x⊤Bx
3: Solve the following optimization problem:
     max_y y⊤By − x⊤By
     s.t. y_j ≥ 0 ∀j ∈ P,   Σ_{j∈P} y_j = 1,
          |y⊤Bx − v| ≤ ϵ_p,
          ∥y − x∥²₂ ≥ δ.
4: if the optimal objective value ≤ ϵ_p then
5:   return TRUE
6: else
7:   return FALSE
8: end if

Note that if the fitness functions G_i are concave in u_i, then the KKT conditions are also sufficient. In our experiments we will use previously-studied Q and G_i functions, and the algorithm will involve solving nonconvex QCQPs.
Algorithm 4 COMPUTECONTINUOUSOSESS
Require: Leader objective Q(m, u, x)
Require: Fitness functions {G_i(u_i, m, x)}_{i=1}^n
Require: Feasible set M, tolerance ϵ_inv
Ensure: OSESS (m*, u*, x*) if one is found
1: (m*, u*, x*) ← GENOSESS(Q, {G_i}, M)
2: for i = 1 to n do
3:   G_i^max ← CERTIFY(i, m*, x*)
4: end for
5: if max_i G_i^max ≤ ϵ_inv then
6:   return (m*, u*, x*)
7: else
8:   return FAILURE
9: end if

Algorithm 5 GENOSESS
Require: Leader objective Q(m, u, x)
Require: Fitness functions {G_i(u_i, m, x)}_{i=1}^n
Require: Feasible set M
Ensure: Candidate solution (m*, u*, x*)
1: Solve the following optimization problem:
     max_{m, u, x, u′, λ^L, λ^U} Q(m, u, x)
     s.t. x_i G_i(u_i, m, x) = 0, i = 1, …, n,
          m ∈ M, x ≥ 0, 0 ≤ u ≤ 1, 0 ≤ u′ ≤ 1,
          G_i(u′_i, m, x) ≤ 0, i = 1, …, n,
          ∂G_i/∂u_i(u′_i, m, x) + λ^L_i − λ^U_i = 0, i = 1, …, n,
          λ^L_i u′_i = 0, i = 1, …, n,
          λ^U_i (1 − u′_i) = 0, i = 1, …, n,
          λ^L_i ≥ 0, λ^U_i ≥ 0, i = 1, …, n.
2: return optimal solution (m*, u*, x*)

Algorithm 6 CERTIFY
Require: Phenotype index i ∈ {1, …, n}
Require: Fixed (m*, x*)
Ensure: G_i^max
1: Compute the globally optimal solution:
     max_u G_i(u, m*, x*)  s.t. 0 ≤ u ≤ 1
2: return optimal objective G_i^max

TABLE I: TOLERANCE PARAMETER VALUES FOR THE ALGORITHMS.
  ϵ_s    10⁻⁴
  ϵ_p    10⁻⁵
  δ      10⁻²
  ϵ_inv  10⁻³

IV. EXPERIMENTS

We experiment with our continuous algorithm on an evolutionary cancer game model previously studied [18]. Note
that prior work for this model has computed a Stackelberg equilibrium (SE), where the leader commits to a strategy that optimizes the quality-of-life function Q while the cancer cell phenotypes optimize their respective fitness functions G_i (subject to the equilibrium conditions of the ecological dynamics being satisfied) [18], [8]. In contrast, we compute an OSESS in this game, which guarantees that the follower cancer cells are following an ESS in the game induced by the leader's strategy and no mutant trait can have positive growth (while SE does not preclude the existence of mutants with positive growth). This is fundamentally a different solution concept, so the players' payoffs may increase or decrease. The model is instantiated by the following functional forms for the fitness functions G_i and quality-of-life function Q. Note that the model presentation differs slightly between the paper [18] and the implementation in the code repository [17]; we use the model from the code.

G_0 = r_max (1 − (α_00 x_0 + α_01 x_1 + α_02 x_2)/K) − d − m_1/k_1 − m_2/k_2
G_1 = r_max e^{−g_1 u_1} (1 − (α_10 x_0 + α_11 x_1 + α_12 x_2)/K) − d − m_1/(b_1 u_1 + k_1) − m_2/k_2
G_2 = r_max e^{−g_2 u_2} (1 − (α_20 x_0 + α_21 x_1 + α_22 x_2)/K) − d − m_1/k_1 − m_2/(b_2 u_2 + k_2)
Q = Q_max − c((x_0 + x_1 + x_2)/K)² − w_1 m_1² − w_2 m_2² − r_1 u_1² − r_2 u_2²

The model has several parameters, whose interpretations are summarized in Table II. Note that in the code additional parameters a_0, a_1, a_2, a_3 are defined, with

  [α_00 α_01 α_02]   [a_0 a_1 a_1]
  [α_10 α_11 α_12] = [a_2 a_0 a_3]
  [α_20 α_21 α_22]   [a_2 a_3 a_0]

We ran experiments comparing the algorithm for computing a Stackelberg equilibrium [8] to our new algorithm for computing OSESS on a problem instance using the same parameter values that have been previously used [17], which are provided in Table III.
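The fitness and quality-of-life functions above can be written out directly. The following sketch is our own transcription of the model (not the repository code), instantiated with the Table III parameter values, and it sanity-checks the drug-free, tumor-free baseline, where every G_i reduces to r_max − d = 0.44:

```python
import numpy as np

# Table III parameter values.
r_max, d, K = 0.45, 0.01, 10_000.0
g = np.array([0.0, 0.5, 0.5])          # g_0 unused; g_1 = g_2 = 0.5
k = np.array([5.0, 5.0])               # k_1, k_2
b = np.array([10.0, 10.0])             # b_1, b_2
a0, a1, a2, a3 = 1.0, 0.15, 0.9, 0.9
alpha = np.array([[a0, a1, a1],
                  [a2, a0, a3],
                  [a2, a3, a0]])
Q_max, c = 1.0, 0.5
w = np.array([0.5, 0.2])
r = np.array([0.4, 0.4])

def G(i, u_i, m, x):
    """Fitness of phenotype i (0 = sensitive; i = 1, 2 resists drug i via u_i)."""
    growth = r_max * np.exp(-g[i] * u_i) * (1.0 - alpha[i] @ x / K)
    # Drug j kills at rate m_j / (b_j u_i + k_j) for the resistant phenotype
    # i = j + 1, and at rate m_j / k_j for the other phenotypes.
    kill = sum(m[j] / ((b[j] * u_i if i == j + 1 else 0.0) + k[j])
               for j in range(2))
    return growth - d - kill

def Q(m, u, x):
    """Leader's quality-of-life objective."""
    return (Q_max - c * (x.sum() / K) ** 2
            - w @ (np.asarray(m) ** 2) - r @ (np.asarray(u) ** 2))

# Drug-free, tumor-free baseline: every phenotype grows at r_max - d = 0.44.
m0, x0 = np.zeros(2), np.zeros(3)
for i in range(3):
    assert abs(G(i, 0.0, m0, x0) - 0.44) < 1e-12
assert Q(m0, np.zeros(2), x0) == Q_max
```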
Both approaches involve solving QCQPs using Gurobi's nonconvex mixed-integer quadratically-constrained programming (MIQCP) solver, version 11.0.3 build v11.0.3rc0 [13], with Java version 14.0.2. Gurobi's nonconvex MIQCP solver guarantees global optimality (up to a numerical tolerance). We used Gurobi's default numerical tolerance value of 10⁻⁶. We used Gurobi's default values for all settings other than DualReductions, which we set to 0 since we observed improved performance by disabling aggressive presolve reductions. We used an Intel Core i7-1065G7 processor with a base clock speed of 1.30 GHz (and a maximum turbo boost speed of up to 3.9 GHz) with 16 GB of RAM under 64-bit Windows 11 (8 logical cores/threads).

TABLE II: INTERPRETATIONS OF MODEL PARAMETERS.
  r_max   Max cell growth rate
  g_i     Cost of resistance strategy (cell type) i
  α_ij    Interaction coefficient between cell types i and j
  K       Carrying capacity
  d       Natural death rate
  k_i     Innate resistance that may be present before drug exposure
  b_i     Benefit of the evolved resistance trait in reducing therapy efficacy
  Q_max   Quality of life of a healthy patient
  w_i     Toxicity of drug i
  r_i     Effect of resistance rate of cell type i
  c       Weight for impact of tumor burden vs. drug toxicity/treatment-induced resistance rate

TABLE III: PARAMETER VALUES USED IN EXPERIMENTS.
  r_max   0.45
  g_1     0.5
  g_2     0.5
  a_0     1
  a_1     0.15
  a_2     0.9
  a_3     0.9
  K       10,000
  d       0.01
  k_1     5
  k_2     5
  b_1     10
  b_2     10
  Q_max   1
  w_1     0.5
  w_2     0.2
  r_1     0.4
  r_2     0.4
  c       0.5

The main results from our experiments are depicted in Table IV. Note that the optimal physician strategies (m*) vary slightly between the two solutions, and the optimal quality of life Q* is slightly higher for the OSESS solution than the SE solution. The SE algorithm ran significantly faster than the OSESS algorithm (around 2 seconds versus 2.2 minutes).
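The reported Q* values in Table IV can be reproduced directly from the tabulated solutions and the quality-of-life function with the Table III parameters, which is a useful sanity check on the transcription:

```python
# Recompute Q* for both Table IV columns from the tabulated solution values.
Q_max, c, K = 1.0, 0.5, 10_000.0
w1, w2, r1, r2 = 0.5, 0.2, 0.4, 0.4

def Q(m1, m2, u1, u2, x0, x1, x2):
    return (Q_max - c * ((x0 + x1 + x2) / K) ** 2
            - w1 * m1**2 - w2 * m2**2 - r1 * u1**2 - r2 * u2**2)

Q_se    = Q(0.4105, 0.4680, 0.2139, 0.2856, 5731.0481, 0.0087, 950.7623)
Q_osess = Q(0.4003, 0.4571, 0.1827, 0.2828, 5823.7239, 9.5179, 946.4278)
assert abs(Q_se - 0.5978) < 1e-3     # matches the SE column of Table IV
assert abs(Q_osess - 0.6029) < 1e-3  # matches the OSESS column of Table IV
assert Q_osess > Q_se                # OSESS attains slightly higher quality of life
```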
Both algorithms are based on KKT necessary optimality conditions and require the application of ex-post certification procedures to ensure global optimality. For both algorithms we recompute the values of Q and G_i exactly by calculating them directly from the solutions output by Gurobi (since the values of Q and G_i in the optimizations themselves contain numerical error). For the OSESS algorithm, the certification procedure obtains G_1^max = 7.85 × 10⁻⁵ and G_2^max = 1.41 × 10⁻⁴, which are both significantly below ϵ_inv = 10⁻³, so we conclude that our solution is in fact an OSESS. As it turns out, running the same certification procedure on the Stackelberg equilibrium solution obtains G_1^max = 8.00 × 10⁻⁵ and G_2^max = 1.42 × 10⁻⁴, which are also both significantly below ϵ_inv = 10⁻³. So for these particular game parameters, the SE solution obtained is also an OSESS, though this will not be the case in general.

TABLE IV: EXPERIMENTAL RESULTS FOR EVOLUTIONARY CANCER GAME.
                     Stackelberg Equilibrium   OSESS
  m*_1               0.4105                    0.4003
  m*_2               0.4680                    0.4571
  u*_1               0.2139                    0.1827
  u*_2               0.2856                    0.2828
  x*_0               5731.0481                 5823.7239
  x*_1               0.0087                    9.5179
  x*_2               950.7623                  946.4278
  Q*                 0.5978                    0.6029
  Runtime (seconds)  2.2070                    134.6910

V. CONCLUSION

Stackelberg evolutionary games have emerged as a powerful model for studying many types of biological interactions, including cancer therapy, fishery management, and pest control [18]. In these games, a rational human leader first selects their strategy; then evolutionary followers select their strategies conditional on the leader's strategy (possibly subject to ecological population equilibrium constraints). Prior work has assumed that the follower players act to maximize fitness functions G_i. In the resulting Stackelberg equilibrium, the leader and followers are simultaneously maximizing their respective objective functions.
However, this solution does not guarantee that the follower strategy is resistant to invasion by rare mutants. To ensure evolutionary stability, we must impose the condition that the fitness functions are nonpositive (subject to numerical tolerance) for all possible mutations, so that their populations do not continue to grow. We introduced the new solution concept of evolutionarily stable Stackelberg equilibrium (SESS), in which the leader is a rational utility maximizer while the followers employ an evolutionarily stable strategy (ESS) that is resistant to invasion. In an SESS, the leader commits to a strategy that maximizes their objective assuming the followers will play an ESS in the subgame induced by the leader's strategy. This imposes a stricter stability requirement on the follower strategy than standard Stackelberg equilibrium, which only requires best-response behavior without ensuring evolutionary stability. We define SESS both for discrete normal-form games and for continuous-trait games, which arise naturally in biological and control applications such as cancer treatment. In our general formulation, the followers play any ESS in the subgame induced by the leader's strategy. We also consider the special cases where the followers play the ESS that is most beneficial to the leader (OSESS) and the ESS that is worst for the leader (PSESS). OSESS is perhaps the easiest case of SESS to compute since, in a sense, all players are aligned in maximizing the leader's payoff (subject to the followers playing an ESS), which simplifies the optimization to a single outer maximization problem. We present algorithms for computing OSESS in both the discrete and continuous settings. If we view the follower strategies as population frequencies, then our model is applicable to settings with a large number of follower players; e.g., there may be a large number of cancer cells where each cell has one of a small number of possible phenotypes.
We implemented our OSESS algorithm on an evolutionary cancer game model that has been previously studied and demonstrated that its runtime is reasonable in practice. It is not surprising that OSESS computation takes longer than SE computation, since OSESS is essentially SE with additional stability constraints imposed and therefore involves solving a more challenging optimization problem. As it turns out, the Stackelberg equilibrium solution for this game ended up being an OSESS as well, though this is not guaranteed in general. This suggests that for larger games a good approach may be to first compute an SE, which is more tractable than SESS, and test whether it also satisfies evolutionary stability; if not, we can continue on to the more computationally expensive SESS algorithm.

REFERENCES

[1] Andris Abakuks. Conditions for evolutionarily stable strategies. Journal of Applied Probability, 17(2):559–562, 1980.
[2] Manon Blanc and Kristoffer Arnsfelt Hansen. Computational complexity of multi-player evolutionarily stable strategies. In International Computer Science Symposium in Russia (CSR 2021), volume 12730 of Lecture Notes in Computer Science, pages 44–58. Springer, 2021.
[3] I. M. Bomze. Detecting all evolutionarily stable strategies. Journal of Optimization Theory and Applications, 75:313–329, 1992.
[4] Mark Broom and Jan Rychtář. Game-Theoretical Models in Biology. CRC Press, 2013.
[5] Vincent Conitzer. The exact computational complexity of evolutionarily stable strategies. In Conference on Web and Internet Economics (WINE-13), pages 96–108, 2013.
[6] David Dingli, Francisco A. C. C. Chalub, Francisco C. Santos, Sven Van Segbroeck, and Jorge M. Pacheco. Cancer phenotype as the outcome of an evolutionary game between normal and malignant cells. British Journal of Cancer, 101(7):1130–1136, 2009.
[7] Kousha Etessami and Andreas Lochbihler.
The computational com- plexity of evolutionarily stable strategies. International Journal of Game Theory , 37(1):93–113, 2008. [8] Sam Ganzfried. Computing Stackelberg equilibrium for cancer treat- ment. Games , 15(6), 2024. [9] Sam Ganzfried. Computing evolutionarily stable strategies in imperfect-information games, 2025. arXiv:2512.10279 [cs.GT]. [10] Sam Ganzfried. Computing ev olutionarily stable strategies in multi- player games, 2025. arXiv:2511.20859 [cs.GT]. [11] Stefan A. H. Geritz, ´ Eva Kisdi, G ´ abor Mesz ´ ena, and Johan A. J. Metz. Evolutionarily singular strategies and the adaptive growth and branching of the ev olutionary tree. Evolutionary Ecology , 12(1):35– 57, 1998. [12] Srihari Govindan and Robert W ilson. A global Newton method to compute Nash equilibria. Journal of Economic Theory , 110:65–86, 2003. [13] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2026. [14] John Haigh. Game theory and evolution. Advances in Applied Pr obability , 7(1):8–11, 1975. [15] P . Jean-Jacques Herings and Ronald Peeters. Homotopy methods to compute equilibria in game theory . Economic Theory , 42(1):119–156, 2010. [16] Josef Hofbauer and Karl Sigmund. Evolutionary Games and P opula- tion Dynamics . Cambridge University Press, Cambridge, UK, 1998. [17] Maria Kleshnina. https://github.com/kleshnina/ SEGopinion , 2023. [18] Maria Kleshnina, Sabrina Streipert, Joel Brown, and Kate ˇ rina Sta ˇ nkov ´ a. Game theory for managing ev olving systems: Challenges and opportunities of including vector-v alued strategies and life-history traits. Dynamic Games and Applications , 13:1–26, 2023. [19] Michael Maschler , Eilon Solan, and Shmuel Zamir . Game Theory . Cambridge Univ ersity Press, 2013. [20] John Maynard Smith. Evolution and the Theory of Games . Cambridge Univ ersity Press, Cambridge, UK, 1982. [21] John Maynard Smith and George R. Price. The logic of animal conflict. Natur e , 246:15–18, 1973. [22] Richard D. McKelve y and Andrew McLennan. 
Computation of equilibria in finite games. In H. Amann, D. Kendrick, and J. Rust, editors, Handbook of Computational Economics , v olume 1, pages 87– 142. Elsevier , 1996. [23] John McNamara, James N. W ebb, E. J. Collins, T am ´ as Sz ´ ekely , and Alasdair I. Houston. A general technique for computing evolutionarily stable strategies based on errors in decision-making. J ournal of Theor etical Biology , 189:211–225, 1997. [24] John Nash. Non-cooperative games. Annals of Mathematics , 54(2):286–295, 1951. [25] William H. Sandholm. P opulation Games and Evolutionary Dynamics . MIT Press, Cambridge, MA, 2010.
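The single-outer-maximization structure of OSESS can be illustrated on a toy continuous-trait model (hypothetical, for illustration only: the payoff forms and leader cost below are invented, not the paper's cancer model). The leader picks a treatment intensity x that raises the fight cost in a follower Hawk-Dove subgame; since that subgame has a unique ESS, maximizing the leader's utility over x against the induced ESS directly yields the OSESS.

```python
import numpy as np

V = 2.0  # contested resource value in the follower Hawk-Dove subgame

def follower_ess(x):
    """Unique ESS of Hawk-Dove with fight cost C(x) = 2 + 4x > V:
    the interior mixed strategy with Hawk fraction V / C(x)."""
    p_hawk = V / (2.0 + 4.0 * x)
    return np.array([p_hawk, 1.0 - p_hawk])

def leader_utility(x, p):
    """Assumed leader objective: dislikes Hawks, pays linear treatment cost."""
    return -p[0] - 0.5 * x

# OSESS here is a single outer maximization over the leader's commitment,
# with the follower response pinned down by the induced ESS.
grid = np.linspace(0.0, 1.0, 2001)
best_x = max(grid, key=lambda x: leader_utility(x, follower_ess(x)))
print(best_x, leader_utility(best_x, follower_ess(best_x)))
```

For this model the optimum is at x = 0.5 with leader utility -0.75 (the first-order condition 2/(1 + 2x)^2 = 0.5 gives x = 0.5). Since the induced subgame's equilibrium is always an ESS here, the SE and OSESS coincide, mirroring the empirical observation above; in games where they differ, the ESS check would fail at the SE and the full SESS computation would be needed.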
