Semi-Decentralized Coordinated Online Learning for Continuous Games with Coupled Constraints via Augmented Lagrangian
We consider a class of concave continuous games in which the corresponding admissible strategy profile of each player underlies affine coupling constraints. We propose a novel algorithm that leads the relevant population dynamic toward Nash equilibri…
Authors: Ezra Tampubolon, Holger Boche
Abstract: We consider a class of concave continuous games in which the admissible strategy profile of each player is subject to affine coupling constraints. We propose a novel algorithm that leads the corresponding population dynamic toward Nash equilibrium. The algorithm is based on mirror ascent, which fits the framework of no-regret online learning, and on the augmented Lagrangian method. The algorithm is decentralized in the sense that the iterate of each player requires only local information about how she contributes to the coupling constraints, together with the price vector broadcast by a central coordinator. Thus each player need not know the population action. Moreover, no specific control by the central coordinator is required. We give a condition on the step sizes and the degree of augmentation of the Lagrangian such that the proposed algorithm converges to a generalized Nash equilibrium.

I. INTRODUCTION

Competitive non-cooperative selfish agents appear as a model in a vast number of applications (see also [1]), such as smart grids [2]–[11], competitive markets [12], and congestion control for networks [13]. The well-known concept of non-cooperative continuous game theory is suited to analyzing such applications. The typical setting is that a set of agents repeatedly interact with each other, in the sense that at time $t$ the payoff/reward of an agent depends not only on his own action but also on the joint action of all other agents, which is not visible to him.
Given the uncertainty faced by an agent about the joint action of the others, he has to choose his action in an online manner, aiming to optimize his time-varying reward (for detailed discussions see [14] and references therein). A reasonable assumption on the agents' behavior is that they apply a no-regret policy (see, e.g., [14], [15]) known from the online learning literature. The canonical class of no-regret policies in the black-box environment, i.e., in the environment where no assumptions on the utility functions $(u_i)_i$ other than concavity are made, is the so-called online mirror ascent, which is a canonical extension of the well-known mirror ascent algorithm. At each time instance, mirror ascent consists of a gradient step in the dual space and a "mirror" step that maps the result back to the feasible primal region.

In applications, there is often a coupling between an individual agent's action set and the actual actions of the other agents. Examples are the TCP (congestion) control problem [16], where the transmit rates of the agents are subject to link capacity constraints; the problem of charging electric vehicles [8], where only a threshold of the agents' total power demand is available; and MIMO interference systems (see, e.g., [1] and references therein), where the transmission strategies of the secondary users (agents), in the form of power allocation vectors over the sub-carriers, are subject to sum constraints. Notice that in all these applications the coupling constraint is affine. In this work, we consider a novel mirror-ascent-based algorithm for concave games that can handle coupled affine constraints.

E. Tampubolon and H. Boche are with the Department of Electrical and Computer Engineering, Technical University of Munich, 80333 Munich, Germany, {ezra.tampubolon,boche}@tum.de

In each time step, each agent executes a mirror ascent update, which requires locally available first-order information about its utility function and the price vector broadcast by the central coordinator. The latter is updated by the central coordinator using an augmented-Lagrangian-based update. We give a sufficient condition on the (non-adaptive) step size sequences of the agents and of the central coordinator, and on the augmentation of the Lagrangian, such that the proposed algorithm converges to a (variationally stable) Nash equilibrium.

The proposed algorithm can also be used by a system designer to design agent control algorithms that generate a desirable collective behavior, provided the latter coincides with the Nash equilibrium of the considered game with coupled constraints. This is done by designing localized agent utility functions accordingly and by designing a related coordination strategy. We suggest that the system designer realize the latter through iterative communication with a central coordinator that can gather information from, and broadcast information to, the population. The motivation for this semi-decentralized approach arises from the privacy demands among the agents and from the computational intractability of a fully centralized solution.

Relation to Prior Works: [17] provides, among other things, a deterministic analysis of the online mirror ascent algorithm for games with continuous action sets. In contrast to that work, we consider games with continuous action sets that are additionally subject to coupling constraints, so that the admissible set of population strategy profiles does not necessarily have product structure.
We have to modify the decentralized algorithm given in [17] and make use of a central coordinator in order to handle such additional constraints. For this reason, our work is an extension of the deterministic result given in [17].

[18] gives the most recent semi-decentralized first-order algorithm for finding equilibria and handling coupled constraints. There, the authors mostly rely on the fixed-point method for finding the solution of a variational inequality (see, e.g., Chapter 12 in [19]), which results in a (Euclidean) projection-based algorithm. Our algorithm is based more generally on the mirror map, which constitutes a generalization of the Euclidean projection. For this reason, we are not able to use the usual fixed-point approach for variational inequalities. Moreover, the asymmetric algorithm proposed in [18] uses a constant step size, in contrast to our algorithm, which uses a variable step size.

We take the inspiration for the augmentation of the Lagrangian from [20]. There, the authors provide an algorithm for online optimization with a sub-linear regret bound that is able to handle constraints. Augmenting the Lagrangian helps to obtain a sub-linear bound on the violation of the constraints.

Basic Notations: In this work we always consider the Euclidean space $\mathbb{R}^D$. The projection onto a closed convex subset $A$ of $\mathbb{R}^D$ is denoted by $\Pi_A$. The dual norm of a norm $\|\cdot\|$ on $\mathbb{R}^D$ is denoted by $\|\cdot\|_*$. $F:\mathbb{R}^D\to\mathbb{R}^D$ is said to be Lipschitz continuous on a non-empty subset $Z\subset(\mathbb{R}^D,\|\cdot\|)$ with constant $L>0$ if $\|F(x)-F(z)\|_*\le L\|x-z\|$ for all $x,z\in Z$. $F$ is said to be monotone on $Z$ if $\langle x_1-x_2,F(x_1)-F(x_2)\rangle\le 0$ for all $x_1,x_2\in Z$. If strict inequality holds in the latter for $x_1\neq x_2$, then $F$ is said to be strictly monotone.

II.
MODEL DESCRIPTIONS

We consider a non-cooperative game (NG) $\Gamma$ played by a finite set of players $[N]=\{1,\dots,N\}$. During the game each player $i\in[N]$ can choose an action/strategy $x^{(i)}$ from a non-empty compact convex subset $X_i$ of a finite-dimensional normed space $(\mathbb{R}^{D_i},\|\cdot\|_i)$. A usual assumption on the action set is the following:

Assumptions 1: $X_i$ is a non-empty compact convex subset of a finite-dimensional space $V_i$.

The payoff/reward of player $i\in[N]$ is given by the function $u_i:X\to\mathbb{R}$, where $X=\prod_i X_i$, and the actual action/strategy profile is $x=(x^{(1)},\dots,x^{(N)})\in X$. If we work with the whole population, we consider the normed space $(\prod_{i=1}^N\mathbb{R}^{D_i},|||\cdot|||)$, where $|||x|||:=\sum_i\|x^{(i)}\|_i$. In order to highlight the action of player $i$ we often write $x=(x^{(i)},x^{(-i)})$, where $x^{(-i)}=(x^{(j)})_{j\neq i}$. Moreover, we mostly assume in this work the following regularity condition for the utility functions:

Assumptions 2: For all $i\in[N]$ and $x^{(-i)}\in X_{-i}$, $u_i(\cdot,x^{(-i)})$ is concave and $v_i(x):=\nabla_{x^{(i)}}u_i(x)$ is continuous.

In this work, we are specifically interested in NGs with coupled constraints (NGCC), i.e., in an NG $\Gamma$ that is in addition subject to coupled inequality constraints $C:=\{x\in\mathbb{R}^N: g_i(x)\le 0,\ i\in[M]\}$, or in vectorized form $C:=\{g(x)\le 0\}$. So the set of feasible strategy profiles is $Q:=C\cap X$, which we assume to be non-empty, and correspondingly the set of feasible strategies for player $i$ is $Q^{(i)}(x^{(-i)}):=\{x^{(i)}\in X_i: g(x)\le 0\}$. We denote an NGCC by $\Gamma=([N],u,X,C)$, where $u:=(u_1,\dots,u_N)$. For simplicity, we consider more specifically linear constraints, where $g(x)=Ax-b$ with $A=[A_{(:,1)},\dots,A_{(:,N)}]\in\mathbb{R}^{M\times\sum_{i=1}^N D_i}$, where $A_{(:,j)}\in\mathbb{R}^{M\times D_j}$, and where $b\in\mathbb{R}^M$.
In this case, each agent is assumed to know only its own contribution to the inequality constraints, which means that $A_{(:,i)}$ is visible only to agent $i$. The following regularity condition on $Q$ is useful for later purposes:

Assumptions 3 (Slater's condition): There exists $x^*\in\mathrm{relint}(X)$ s.t. $Ax^*<b$, where $\mathrm{relint}(X)$ denotes the relative interior of $X$.

Mirror Map and Fenchel Coupling: In general, the first-order evolution takes place in the dual space. So, in order to realize their actions, the agents need a mapping that projects the iterate back to their individual constraint sets. A canonical way to do this is by means of the following:

Definition 1 (Regularizer/penalty function and Mirror Map): Let $Z$ be a compact convex subset of a normed space $(E,\|\cdot\|)$, and $K>0$. We say that $\psi:Z\to\mathbb{R}$ is a $K$-strongly convex regularizer (or penalty function) on $Z$ if $\psi$ is continuous and $K$-strongly convex on $Z$, in the sense that for all $x,y\in Z$ and $\lambda\in[0,1]$:
$$\psi(\lambda x+(1-\lambda)y)\le\lambda\psi(x)+(1-\lambda)\psi(y)-\frac{K}{2}\lambda(1-\lambda)\|x-y\|^2.$$
The mirror map $\Phi:E^*\to Z$ induced by $\psi$ is defined by:
$$\Phi(y):=\arg\max_{x\in Z}\{\langle x,y\rangle-\psi(x)\}.$$

If the convex conjugate $\psi^*(y)=\max_{x\in Z}\{\langle y,x\rangle-\psi(x)\}$ of the $K$-strongly convex regularizer $\psi$ on $Z$ is known, one can compute the mirror map by $\Phi=\nabla\psi^*$. Moreover, it can be shown that $\Phi$ is $1/K$-Lipschitz continuous. For a proof of those facts, see, e.g., Theorem 23.5 in [21] and Theorem 12.60 (b) in [22].

The mirror map constitutes a generalization of the usual Euclidean projection operator. An interesting example of a mirror map is the so-called logit choice
$$\Phi(y)=\frac{\exp(y)}{\sum_{l=1}^D\exp(y_l)},$$
which is generated by the penalty function
$$\psi(x)=\sum_{k=1}^D x_k\log x_k,$$
known as the (negative) Gibbs entropy, on the simplex $\Delta\subset(\mathbb{R}^D,\|\cdot\|_1)$.
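To make the logit choice concrete, here is a minimal sketch in plain Python. It evaluates the mirror map and checks numerically that it attains the maximum of $\langle x,y\rangle-\psi(x)$ over the simplex, whose value for the entropic regularizer is the log-sum-exp of $y$. The function names and the test vector `y` are illustrative assumptions, not part of the paper.

```python
import math
from typing import List


def entropy_regularizer(x: List[float]) -> float:
    """psi(x) = sum_k x_k log x_k, using the convention 0 log 0 = 0."""
    return sum(xk * math.log(xk) for xk in x if xk > 0.0)


def logit_choice(y: List[float]) -> List[float]:
    """Mirror map Phi(y) = exp(y) / sum_l exp(y_l), shifted for numerical stability."""
    shift = max(y)
    weights = [math.exp(yl - shift) for yl in y]
    total = sum(weights)
    return [w / total for w in weights]


def objective(x: List[float], y: List[float]) -> float:
    """The function <x, y> - psi(x) that the mirror map maximizes over the simplex."""
    return sum(xk * yk for xk, yk in zip(x, y)) - entropy_regularizer(x)


y = [0.3, -1.2, 2.0]          # arbitrary dual vector (illustrative)
x_star = logit_choice(y)      # candidate maximizer on the simplex

# Convex conjugate of the entropic regularizer: psi*(y) = log sum_l exp(y_l).
log_sum_exp = max(y) + math.log(sum(math.exp(v - max(y)) for v in y))

# Two other points of the simplex, used only for comparison.
uniform = [1.0 / 3.0] * 3
skewed = [0.1, 0.1, 0.8]
```

In this sketch, `objective(x_star, y)` should agree with `log_sum_exp` up to rounding, and neither `uniform` nor `skewed` should achieve a larger objective value.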
As noticed in [23], a convex regularizer canonically induces the following notion of "distance":

Definition 2 (Fenchel Coupling [23]): Let $\psi:X\to\mathbb{R}$ be a penalty function on $X$. Then the Fenchel coupling induced by $\psi$ is defined as
$$F(p,y)=\psi(p)+\psi^*(y)-\langle p,y\rangle,\quad p\in X,\ y\in E^*.$$

Some useful properties of the Fenchel coupling are stated in the following (for a proof see [23]):

Proposition 1: Let $F$ be the Fenchel coupling induced by a $K$-strongly convex regularizer on $X$. For $p\in X$ and $y,y'\in V^*$, we have:
1) $F(p,y)\ge(K/2)\|\Phi(y)-p\|^2$
2) $F(p,y')\le F(p,y)+\langle y'-y,\Phi(y)-p\rangle+(1/2K)\|y'-y\|_*^2$

Throughout this work, we assume that each agent $i\in[N]$ possesses a $K_i$-strongly convex regularizer $\psi_i$, which induces the mirror map $\Phi_i$ and the Fenchel coupling $F_i$. In order to emphasize the action of the whole population, we sometimes use the operator $\Phi:\prod_i\mathbb{R}^{D_i}\to X$, $y\mapsto(\Phi_1(y^{(1)}),\dots,\Phi_N(y^{(N)}))$, and the "total" Fenchel coupling $F_N:X\times\prod_i\mathbb{R}^{D_i}\to\mathbb{R}$, $(x,y)\mapsto\sum_i F_i(x^{(i)},y^{(i)})$.

Algorithm: The evolution of the agents with which we are concerned in this work is given in the following:

Algorithm 1 Mirror Ascent with Augmented Lagrangian (MAAL)
Require: step size sequence $(\gamma_t)_t$, augmentation sequence $(\theta_t)_t$, initial dual actions $Y_0^{(i)}\in V_i^*$, and initial dual variable $\lambda_0$.
for $t=0,1,2,\dots$ do
  for every player $i\in[N]$ do
    Play $X_t^{(i)}\leftarrow\Phi_i(Y_t^{(i)})$
    Observe $v_i(X_t)$
    Update $Y_{t+1}^{(i)}\leftarrow Y_t^{(i)}+\gamma_t\,(v_i(X_t)-A_{(:,i)}^T\lambda_t)$
  end for
  Central operator updates: $\lambda_{t+1}\leftarrow\Pi_{\mathbb{R}^M_{\ge 0}}\big(\lambda_t+\gamma_t[(AX_t-b)-\theta_t\lambda_t]\big)$
  Central operator broadcasts $\lambda_{t+1}$ to all players.
end for

The difference from the usual online mirror ascent is that the gradient update of agent $i$ has an additional term involving his contribution to the constraint set ($A_{(:,i)}^T$) and the price vector $\lambda_t$ provided by the central coordinator. For this reason, we speak of a semi-decentralized update. The price vector is updated via projected gradient ascent for maximizing the augmented Lagrangian dual objective. The augmentation of the Lagrangian is reflected in the term $-\theta_t\lambda_t$; for this reason, $(\theta_t)_t$ is called the augmentation sequence.

III. VARIATIONAL DESCRIPTION OF EQUILIBRIUMS

A classical notion of equilibrium is the Nash equilibrium. It describes the state in which no agent can increase his payoff by unilaterally changing his strategy:

Definition 3 (Nash Equilibrium): $x^*\in C$ is a Nash equilibrium of the NGCC $\Gamma=([N],u,X,C)$ if for every $i\in[N]$:
$$u_i(x^*)\ge u_i(x^{(i)},x^{(-i),*}),\quad\forall x^{(i)}\in Q^{(i)}(x^{(-i),*}). \tag{1}$$

A. Variational Inequality and Nash Equilibrium

From the analytical point of view, it is advantageous to work with the concept of the so-called variational inequality (VI) rather than directly with the concept of Nash equilibrium:

Definition 4: Let $Z$ be a subset of a finite-dimensional normed space $(E,\|\cdot\|)$, and suppose that $F:Z\to E^*$. A point $\bar{x}\in Z$ is a solution of the variational inequality $\mathrm{VI}(Z,F)$ if
$$\langle x-\bar{x},F(\bar{x})\rangle\le 0,\quad\forall x\in Z.$$
The set of solutions of $\mathrm{VI}(Z,F)$ is denoted by $\mathrm{SOL}(Z,F)$.

The usual first-order optimality condition for convex optimization yields the following relation between the two concepts:

Proposition 2: If Assumption 2 holds, then $\mathrm{SOL}(Q,v)$ is a subset of the set of Nash equilibria.

In the case where no coupling constraint is present, i.e., $C=X$, the converse of the above proposition holds.
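As a hedged illustration of Algorithm 1 and of the variational viewpoint, the following minimal sketch runs MAAL on a hypothetical two-player toy game that is not taken from the paper. The assumptions are: $X_i=[0,1]$, $u_i(x)=-(x^{(i)}-1)^2/2$ (so $v_i(x)=1-x^{(i)}$), one coupling constraint $x^{(1)}+x^{(2)}\le 1$, the Euclidean regularizer $\psi_i(x)=x^2/2$ (whose mirror map is clipping to $[0,1]$), $\gamma_t=1/(t+1)$ and $\theta_t=\delta\gamma_t$ with $\delta=2$. For this toy game the KKT conditions give $x=(0.5,0.5)$ with price $\lambda=0.5$. All function names are illustrative.

```python
from typing import List, Tuple


def clip01(y: float) -> float:
    """Mirror map induced by psi(x) = x^2 / 2 on [0, 1]: Euclidean projection."""
    return min(1.0, max(0.0, y))


def run_maal(steps: int, delta: float = 2.0) -> Tuple[List[float], float]:
    """Run MAAL on the toy game; return the last played profile and the last price."""
    a = [1.0, 1.0]   # A_(:,i): each player's own column of the constraint matrix
    b = 1.0          # right-hand side of the coupling constraint x1 + x2 <= 1
    y = [0.0, 0.0]   # initial dual actions Y_0
    lam = 0.0        # initial price lambda_0
    x = [clip01(v) for v in y]
    for t in range(steps):
        gamma = 1.0 / (t + 1)                   # step size gamma_t
        theta = delta * gamma                   # augmentation theta_t = delta * gamma_t
        x = [clip01(v) for v in y]              # play X_t = Phi(Y_t)
        grad = [1.0 - xi for xi in x]           # observe v_i(X_t) = 1 - x_i
        # Player update: uses only the own column a[i] and the broadcast price.
        y = [y[i] + gamma * (grad[i] - a[i] * lam) for i in range(2)]
        # Coordinator update: projected ascent step with augmentation term.
        slack = sum(a[i] * x[i] for i in range(2)) - b
        lam = max(0.0, lam + gamma * (slack - theta * lam))
    return x, lam


def vi_residual(x_bar: List[float], x: List[float]) -> float:
    """<x - x_bar, v(x_bar)>; non-positive for all feasible x if x_bar solves VI(Q, v)."""
    return sum((x[i] - x_bar[i]) * (1.0 - x_bar[i]) for i in range(2))


x_final, lam_final = run_maal(20000)
```

Under these assumptions, `x_final` and `lam_final` should end up close to `(0.5, 0.5)` and `0.5`, and `vi_residual([0.5, 0.5], x)` is non-positive for every feasible `x`, as Definition 4 requires of a solution. The sketch only illustrates the update structure; it says nothing about convergence speed in general.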
Due to the coupling constraints, however, a Nash equilibrium need not be a solution of the variational inequality. Another convenient feature of VIs is that the existence of a solution can be established under mild conditions. For instance, it is known that if $Z\neq\emptyset$ is compact and convex and $F$ is continuous, then $\mathrm{VI}(Z,F)$ has at least one solution. Moreover, in the case $F=v$ and $Z=Q$, the latter and Proposition 2 imply the existence of a Nash equilibrium of $\Gamma$:

Proposition 3: Suppose that Assumption 2 holds. Then $\Gamma$ has a Nash equilibrium. If Assumption 2 holds in the strict manner, then $\Gamma$ has a unique Nash equilibrium.

B. Decoupling the Constraints by means of Lagrangian Method

As we have already seen, the equilibrium of the constrained game $\Gamma$ is related to the solution of the variational inequality $\mathrm{VI}(Q,v)$. In order to analyze $\mathrm{VI}(Q,v)$, it is convenient to extend the problem to $\mathrm{VI}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$, where $\tilde{v}$ is defined on $X\times\mathbb{R}^M_{\ge 0}$ by
$$\tilde{v}(x,\lambda):=\big[\,v(x)-A^T\lambda,\ Ax-b\,\big]^T.$$
The advantage of this method is the decoupling of the constraint set: we only have to deal with the constraint set $X\times\mathbb{R}^M_{\ge 0}$, which has product structure, rather than with $Q$. The following shows that nothing is lost in doing so:

Proposition 4: Suppose that Assumption 2 and Assumption 3 hold. The following statements are equivalent:
1) $\bar{x}\in Q$ is a solution of $\mathrm{VI}(Q,v)$.
2) There exists $\bar{\lambda}\in\mathbb{R}^M_{\ge 0}$ s.t. $(\bar{x},\bar{\lambda})$ is a solution of $\mathrm{VI}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$.

The proof is a standard KKT argument based on, e.g., Proposition 1.3.4 in [19] (see also Subsubsection 4.3.2.2 in [1]). So in order to solve $\mathrm{VI}(Q,v)$ it is sufficient to seek a solution of $\mathrm{VI}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$. By Proposition 5, it suffices for the latter to seek the variationally stable set for $(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$, assuming that it is non-empty.

C.
Variational Inequality and Variational Stability

If $G$ is a monotone operator, it holds that
$$\langle x-\bar{x},G(x)\rangle\le\langle x-\bar{x},G(\bar{x})\rangle\le 0,\quad\forall x\in Z,\ \bar{x}\in\mathrm{SOL}(Z,G).$$
This motivates the following notion:

Definition 5: We say that a closed set $\mathrm{VS}(Z,F)\subset Z$ is a variationally stable set for $(Z,F)$ if
$$\langle x-\bar{x},F(x)\rangle\le 0,\quad\forall x\in Z,\ \bar{x}\in\mathrm{VS}(Z,F), \tag{2}$$
with equality for a given $\bar{x}\in\mathrm{VS}(Z,F)$ if and only if $x\in\mathrm{VS}(Z,F)$.

As we will see later, it is algorithmically convenient to work with the concept of variational stability instead of the concept of variational inequality. The following shows that, under a mild condition, both concepts coincide:

Proposition 5: Suppose that $\mathrm{VS}(Z,F)\neq\emptyset$. Then $\mathrm{VS}(Z,F)=\mathrm{SOL}(Z,F)$.

Proof: Let $\bar{x}\in\mathrm{SOL}(Z,F)$ but $\bar{x}\notin\mathrm{VS}(Z,F)$. Since $\bar{x}\in\mathrm{SOL}(Z,F)$, we have $\langle x-\bar{x},F(\bar{x})\rangle\le 0$ for all $x\in Z$. So in particular, for an $x^*\in\mathrm{VS}(Z,F)$:
$$\langle x^*-\bar{x},F(\bar{x})\rangle\le 0. \tag{3}$$
Moreover, since $\bar{x}\notin\mathrm{VS}(Z,F)$ and $x^*\in\mathrm{VS}(Z,F)$, we have $\langle\bar{x}-x^*,F(\bar{x})\rangle<0$ and thus $\langle x^*-\bar{x},F(\bar{x})\rangle>0$. This contradicts (3), which yields the desired statement.

Remark 1: The assumption that $\mathrm{VS}(Z,F)$ is non-empty and closed may at first sight appear forced. However, if $Z$ is a non-empty compact convex set and $F=\nabla g$, where $g:Z\to\mathbb{R}$ is a concave function, then the assumption is true. Indeed, since $g$ is concave, $\nabla g$ is monotone. Therefore:
$$\langle x-\bar{x},\nabla g(x)-\nabla g(\bar{x})\rangle\le 0.$$
By the first-order optimality condition we have, for $x^*\in\arg\max g$ and $x\in Z$, $\langle\nabla g(x^*),x-x^*\rangle\le 0$, and thus by monotonicity:
$$\langle x-x^*,\nabla g(x)\rangle\le\langle x-x^*,\nabla g(x^*)\rangle\le 0.$$
Moreover, concavity asserts that $\langle\nabla g(x),x-x^*\rangle<0$ whenever $x$ is not a maximizer of $g$. Thus we have $\arg\max g=\mathrm{VS}(Z,\nabla g)$, and the fact that $\arg\max g\neq\emptyset$ implies that $\mathrm{VS}(Z,\nabla g)\neq\emptyset$. Moreover, it is easy to see that $\arg\max g$ is closed.

IV.
BOUND FOR PRIMAL-DUAL ITERATE VIA FENCHEL COUPLING

We begin by measuring the distance between the evolution of the agents and a strategy profile by means of the "total" Fenchel coupling $F_N$, which is crucial for providing a convergence theorem for MAAL. By using Proposition 1, inserting the iterate of the algorithm, and using the triangle inequality, we have for all $x\in X$:
$$F_N(x,Y_{t+1})-F_N(x,Y_t)\le\gamma_t\langle\langle X_t-x,\,v(X_t)-A^T\lambda_t\rangle\rangle+\frac{\gamma_t^2}{2K}\big(C_1^2+C_2^2\|\lambda_t\|_2^2\big),$$
where $C_1,C_2>0$ are constants fulfilling:
$$|||v(x)|||_*\le C_1,\qquad|||A^T\lambda|||_*\le C_2\|\lambda\|_2,\qquad\forall x\in X,\ \lambda\in\mathbb{R}^M_{\ge 0}. \tag{4}$$
By summing over all $t=0,\dots,T$ and subsequent telescoping, we obtain a bound for $E_T^{(1)}(x):=F_N(x,Y_T)-F_N(x,Y_0)$, namely:
$$E_T^{(1)}(x)\le\sum_{t=0}^T\gamma_t\langle\langle X_t-x,\,v(X_t)-A^T\lambda_t\rangle\rangle+\frac{C_1^2}{2K}\sum_{t=0}^T\gamma_t^2+\frac{C_2^2}{2K}\sum_{t=0}^T\gamma_t^2\|\lambda_t\|_2^2. \tag{5}$$

In order to eliminate the term in (5) involving the dual iterate $\lambda_t$, we now estimate the distance between the dual iterate and any dual point. We can bound
$$E_T^{(2)}(\lambda):=\big(\|\lambda-\lambda_T\|_2^2-\|\lambda-\lambda_0\|_2^2\big)/2$$
for any $\lambda\in\mathbb{R}^M_{\ge 0}$ by:
$$E_T^{(2)}(\lambda)\le\sum_{t=0}^T\gamma_t\langle\lambda_t-\lambda,\,AX_t-b\rangle-\sum_{t=0}^T\frac{\gamma_t\theta_t}{2}\big(\|\lambda_t\|_2^2-\|\lambda\|_2^2\big)+\sum_{t=0}^T\gamma_t^2\big(C_3^2+\theta_t^2\|\lambda_t\|_2^2\big), \tag{6}$$
where $C_3>0$ is a constant fulfilling:
$$\|Ax\|_2\le C_3. \tag{7}$$
(6) can be proven in a similar manner as (5). By combining (5) and (6) we immediately obtain the following estimate for the evolution of
$$\tilde{F}((x,\lambda),(Y_T,\lambda_T)):=F_N(x,Y_T)+\|\lambda_T-\lambda\|_2^2/2:$$

Theorem 6: Let $C_1,C_2,C_3>0$ be constants fulfilling (4) and (7).
It holds for
$$E_T(x,\lambda):=\tilde{F}((x,\lambda),(Y_T,\lambda_T))-\tilde{F}((x,\lambda),(Y_0,\lambda_0))$$
and for all $(x,\lambda)\in X\times\mathbb{R}^M_{\ge 0}$:
$$E_T(x,\lambda)\le\sum_{t=0}^T\gamma_t\langle\langle(X_t,\lambda_t)-(x,\lambda),\,\tilde{v}(X_t,\lambda_t)\rangle\rangle_\sim+\tilde{C}_1\sum_{t=0}^T\gamma_t^2+\sum_{t=0}^T\gamma_t\theta_t\frac{\|\lambda\|_2^2}{2}+\sum_{t=0}^T\gamma_t\|\lambda_t\|_2^2\Big[\gamma_t\big(2\theta_t^2+\tilde{C}_2\big)-\frac{\theta_t}{2}\Big],$$
where
$$\tilde{C}_1:=\frac{C_1^2}{2K}+2C_3^2,\qquad\tilde{C}_2:=\frac{C_2^2}{2K},$$
and for all $x,\tilde{x}\in\prod_i\mathbb{R}^{D_i}$ and $\lambda,\tilde{\lambda}\in\mathbb{R}^M$:
$$\langle\langle(x,\lambda),(\tilde{x},\tilde{\lambda})\rangle\rangle_\sim=\langle\langle x,\tilde{x}\rangle\rangle+\langle\lambda,\tilde{\lambda}\rangle.$$

V. CONVERGENCE ANALYSIS

In this section we investigate the convergence of MAAL to the variationally stable set $\mathrm{VS}(Q,v)$. As already discussed in Section III, this leads, in the case that $\mathrm{VS}(Q,v)\neq\emptyset$, to the convergence of MAAL to the solution set $\mathrm{SOL}(Q,v)$ of the variational inequality $\mathrm{VI}(Q,v)$ and hence to the corresponding subset of the Nash equilibria of $\Gamma$.

By 1. in Proposition 1, convergence with respect to $F_N$ implies convergence of the iterate w.r.t. the underlying norm $|||\cdot|||$. Therefore the bound for $F_N(x,Y_T)$ provided in the previous section helps us to establish the desired statement. For technical reasons, it is advantageous to have the converse property:

Assumptions 4: For any $p\in X$ and any sequence $(Y_n)_n$ in $V^*$, it holds that:
$$\Phi(Y_n)\to p\ \Rightarrow\ F_N(p,Y_n)\to 0.$$

Define for $C\subset X$ and $\tilde{C}\subset X\times\mathbb{R}^M_{\ge 0}$:
$$F_N(C,y):=\inf\{F_N(x,y):x\in C\},\qquad\tilde{F}(\tilde{C},z):=\inf\{\tilde{F}(w,z):w\in\tilde{C}\}.$$
Notice that the property given in Assumption 4 also holds if $p$ is replaced more generally by a closed set:

Proposition 7: Suppose that Assumption 4 holds. Let $C$ be a closed subset of $X$ and $\tilde{C}$ be a closed subset of $X\times\mathbb{R}^M_{\ge 0}$.
Then $\Phi(Y_t)\to C$ if and only if $F_N(C,Y_t)\to 0$, and $(\Phi(Y_t),\tilde{\lambda}_t)\to\tilde{C}$ if and only if $\tilde{F}(\tilde{C},(Y_t,\tilde{\lambda}_t))\to 0$.

We now state the convergence result for the iterates $(X_t,\lambda_t)$ of MAAL:

Theorem 8: Let $\tilde{C}_2>0$ be the constant given in Theorem 6. Suppose that Assumption 4 holds and that $(\gamma_t)_t$ satisfies:
$$\sum_{t=0}^\infty\gamma_t=\infty,\qquad\frac{\sum_{t=1}^T\gamma_t^2}{\sum_{t=1}^T\gamma_t}\to 0,\quad T\to\infty. \tag{8}$$
Let the augmentation sequence $(\theta_t)_t$ satisfy:
$$\frac{\sum_{t=1}^T\gamma_t\theta_t}{\sum_{t=1}^T\gamma_t}\to 0,\quad T\to\infty, \tag{9}$$
and:
$$\gamma_t\big(2\theta_t^2+\tilde{C}_2\big)-\frac{\theta_t}{2}\le 0\quad\text{for large } t\ge 0. \tag{10}$$
Then the following holds for the iterates of MAAL:
1) There exists a subsequence $(X_{t_k},\lambda_{t_k})_k$ of $(X_t,\lambda_t)_t$ s.t. $(X_{t_k},\lambda_{t_k})\to\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$ as $k\to\infty$.
2) $(X_t,\lambda_t)\to\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$ as $t\to\infty$.

Proof: To show the first statement of the theorem, notice that:
$$E_T(x^*,\lambda^*)\le\tau_T\left(\frac{\sum_{t=0}^T\gamma_t\xi_t(x^*,\lambda^*)}{\tau_T}+\tilde{C}_1\frac{\sum_{t=0}^T\gamma_t^2}{\tau_T}+\frac{\sum_{t=0}^T\gamma_t\psi_t}{\tau_T}\right), \tag{11}$$
where $\tau_T:=\sum_{t=0}^T\gamma_t$,
$$\xi_t(x^*,\lambda^*):=\langle\langle(X_t,\lambda_t)-(x^*,\lambda^*),\,\tilde{v}(X_t,\lambda_t)\rangle\rangle_\sim,$$
$$\psi_t:=\|\lambda_t\|_2^2\Big[\gamma_t\big(2\theta_t^2+\tilde{C}_2\big)-\frac{\theta_t}{2}\Big]+\theta_t\frac{\|\lambda^*\|_2^2}{2}\le\theta_t\frac{\|\lambda^*\|_2^2}{2}, \tag{12}$$
where the inequality in (12) follows from (10). Let $U$ be an arbitrary neighborhood (w.r.t. a norm, e.g., $\|\cdot\|_2$) of $\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$. Suppose that $(X_t,\lambda_t)\notin U$ for all sufficiently large $t\ge 0$. We may assume w.l.o.g. that $(X_t,\lambda_t)\notin U$ for all $t\ge 0$. So for all $(x^*,\lambda^*)\in\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$, we can find $c>0$ s.t. $\xi_t(x^*,\lambda^*)\le-c$ for all $t\ge 0$. This yields:
$$E_T(x^*,\lambda^*)\le\tau_T\left(-c+\tilde{C}_1\frac{\sum_{t=0}^T\gamma_t^2}{\tau_T}+\frac{\sum_{t=0}^T\gamma_t\psi_t}{\tau_T}\right). \tag{13}$$
(8) (resp. (12) and (9)) gives that the second (resp. third) summand in (13) converges to 0 as $T$ goes to infinity.
Finally, by the fact that $\tau_T\to\infty$ as $T\to\infty$, we have $E_T(x^*,\lambda^*)\to-\infty$ as $T\to\infty$. Since $\tilde{F}\ge 0$, we also have $E_T(x^*,\lambda^*)\ge-\tilde{F}((x^*,\lambda^*),(Y_0,\lambda_0))$, so this contradicts the fact that $\tilde{F}((x^*,\lambda^*),(Y_0,\lambda_0))$ is finite. Thus $(X_t,\lambda_t)\in U$ for infinitely many $t\ge 0$.

To show the convergence of $(X_t,\lambda_t)$, i.e., the second statement of the theorem, it is sufficient by 1. in Proposition 1 to show that for all $\epsilon>0$ and
$$U_\epsilon:=\big\{(x,\lambda):x=\Phi(y),\ \tilde{F}(\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v}),(y,\lambda))<\epsilon\big\},$$
we have $(X_t,\lambda_t)\in U_\epsilon$ for all but finitely many $t\in\mathbb{N}$. Toward this end, we show that for sufficiently large $t$, $(X_t,\lambda_t)\in U_\epsilon$ implies $(X_{t+1},\lambda_{t+1})\in U_\epsilon$. Combining this fact with the first statement of Theorem 8 finally yields the desired statement. We have, by 2. in Proposition 1 and arguing as in the derivation of Theorem 6:
$$\tilde{F}((x^*,\lambda^*),(Y_{t+1},\lambda_{t+1}))\le\tilde{F}((x^*,\lambda^*),(Y_t,\lambda_t))+\gamma_t\xi_t(x^*,\lambda^*)+\gamma_t\psi_t+\gamma_t^2\tilde{C}_1, \tag{14}$$
for the constant $\tilde{C}_1>0$. Suppose that $(X_t,\lambda_t)\in U_\epsilon$. By Assumption 4, $U_{\epsilon/2}$ contains a neighborhood of $\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$ (say w.r.t. $\|\cdot\|_2$). Otherwise, no neighborhood of $\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$ w.r.t. $|||\cdot|||$ is contained in $U_{\epsilon/2}$, and since the image of $\Phi$ coincides with the domain of the subdifferential of $\psi$ and $\psi$ is subdifferentiable on the interior of $X$, we can choose a sequence $(\tilde{Y}_t)_t$ in $V^*$ and a sequence $(\tilde{\lambda}_t)_t$ in $\mathbb{R}^M_{\ge 0}$ satisfying $(\Phi(\tilde{Y}_t),\tilde{\lambda}_t)\to\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$ but $(\Phi(\tilde{Y}_t),\tilde{\lambda}_t)\notin U_{\epsilon/2}$, i.e., $\tilde{F}(\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v}),(\tilde{Y}_t,\tilde{\lambda}_t))\ge\epsilon/2$. Since $\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$ is closed, the latter contradicts Proposition 7. Now, an implication of the fact that $U_{\epsilon/2}$ contains a neighborhood of $\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$ is that for all $(x^*,\lambda^*)\in\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$, there exists $c>0$ s.t.
$$\langle\langle\tilde{v}(x,\lambda),\,(x,\lambda)-(x^*,\lambda^*)\rangle\rangle_\sim\le-c,\quad\forall(x,\lambda)\in U_\epsilon\setminus U_{\epsilon/2}.$$
So, if $(X_t,\lambda_t)\in U_\epsilon\setminus U_{\epsilon/2}$, (14) yields:
$$\tilde{F}((x^*,\lambda^*),(Y_{t+1},\lambda_{t+1}))\le\tilde{F}((x^*,\lambda^*),(Y_t,\lambda_t))+\gamma_t\big(-c+\psi_t+\gamma_t\tilde{C}_1\big).$$
By (12), we have $|\psi_t|\to 0$ for $t\to\infty$, and thus for large enough $t\in\mathbb{N}$ there exists $\tilde{c}>0$ s.t.
$$\tilde{F}((x^*,\lambda^*),(Y_{t+1},\lambda_{t+1}))\le\tilde{F}((x^*,\lambda^*),(Y_t,\lambda_t))+\gamma_t\big(-\tilde{c}+\gamma_t\tilde{C}_1\big).$$
Consequently, for sufficiently large $t\in\mathbb{N}$ s.t. the previous inequality and $\gamma_t\le\tilde{c}/\tilde{C}_1$ hold, we have
$$\tilde{F}(\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v}),(Y_{t+1},\lambda_{t+1}))\le\tilde{F}(\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v}),(Y_t,\lambda_t))<\epsilon,$$
since by assumption $(X_t,\lambda_t)\in U_\epsilon$. If $(X_t,\lambda_t)\in U_{\epsilon/2}$, then it follows from (14) that:
$$\tilde{F}((x^*,\lambda^*),(Y_{t+1},\lambda_{t+1}))\le\tilde{F}((x^*,\lambda^*),(Y_t,\lambda_t))+\gamma_t\psi_t+\gamma_t^2\tilde{C}_1.$$
Thus for sufficiently large $t\in\mathbb{N}$ s.t. $\gamma_t\psi_t+\gamma_t^2\tilde{C}_1<\epsilon/2$, we have $\tilde{F}(\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v}),(Y_{t+1},\lambda_{t+1}))<\epsilon$. Combining all the observations completes the proof.

The convergence of $(X_t)_t$ is now immediate:

Corollary 9: Suppose that the assumptions of Theorem 8 hold and that $\mathrm{VS}(Q,v)\neq\emptyset$ and $\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})\neq\emptyset$. Then $X_t\to\mathrm{VS}(Q,v)$.

Proof: Theorem 8 asserts that $(X_t,\lambda_t)\to\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$. Proposition 5 and the assumption $\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})\neq\emptyset$ imply that $\mathrm{VS}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})=\mathrm{SOL}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$. Moreover, Proposition 4 asserts that $(x,\lambda)\in\mathrm{SOL}(X\times\mathbb{R}^M_{\ge 0},\tilde{v})$ implies $x\in\mathrm{SOL}(Q,v)$. Finally, since $\mathrm{VS}(Q,v)\neq\emptyset$, we have $\mathrm{VS}(Q,v)=\mathrm{SOL}(Q,v)$.

Finally, let us provide an example of sequences $(\gamma_t)$ and $(\theta_t)$ that fulfill the conditions given in Theorem 8:

Remark 2: Suppose that $\gamma_t=1/(t+1)$. Then condition (8) is fulfilled, since $\sum_{k=1}^\infty\gamma_k=\infty$ and $\sum_{k=1}^\infty\gamma_k^2<\infty$.
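The ratio in condition (8) can also be inspected numerically for $\gamma_t=1/(t+1)$. The following is a minimal sketch in plain Python; the helper name `ratio_sq_to_sum` and the chosen horizons are illustrative assumptions. It shows the trend only and is no substitute for the argument via the divergent and convergent series.

```python
def ratio_sq_to_sum(T: int) -> float:
    """Return (sum_{t=1}^T gamma_t^2) / (sum_{t=1}^T gamma_t) for gamma_t = 1/(t+1)."""
    total = 0.0      # running sum of gamma_t
    total_sq = 0.0   # running sum of gamma_t^2
    for t in range(1, T + 1):
        gamma = 1.0 / (t + 1)
        total += gamma
        total_sq += gamma * gamma
    return total_sq / total


# Ratios for growing horizons; they should shrink as T grows.
ratios = [ratio_sq_to_sum(T) for T in (10, 1000, 100000)]
```

The decay is slow (roughly like $1/\log T$), because the numerator stays bounded while the denominator grows only logarithmically.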
For $\theta_t=\delta\gamma_t$, we have:
$$\gamma_t\big(2\theta_t^2+\tilde{C}_2\big)-\frac{\theta_t}{2}=\gamma_t\Big[2\delta^2\gamma_t^2+\tilde{C}_2-\frac{\delta}{2}\Big].$$
In case $\delta>2\tilde{C}_2$, we can find $c>0$ s.t.:
$$\gamma_t\big(2\theta_t^2+\tilde{C}_2\big)-\frac{\theta_t}{2}\le\gamma_t\big[2\delta^2\gamma_t^2-c\big].$$
The latter is clearly negative for $t$ sufficiently large.

VI. DISCUSSIONS AND OUTLOOKS

In this work we have introduced a novel semi-decentralized algorithm for concave games with coupled constraints, based on mirror ascent and the augmented Lagrangian method. We provide a sufficient condition on the step size sequence and the degree of augmentation such that the algorithm converges to a variationally stable Nash equilibrium. Specific choices of the step size sequence and the augmentation sequence for that purpose are also provided. In particular, a step size of order $\gamma_t=O(1/t)$ leads to this desired state.

In future work we also plan to investigate the case where the step size and augmentation sequences are adaptive. Moreover, it is interesting to study the case where the step size sequences of the agents differ. Another interesting line of work is to investigate whether the algorithm is robust toward random disturbances, that is, how the algorithm performs if the feedback obtained by the agents is an unbiased martingale estimate of the gradient. Based on the algorithm given in this work, we also plan to derive an algorithm that ensures compliance with the constraints not only asymptotically but also in the non-asymptotic regime.

REFERENCES

[1] G. Scutari, D. P. Palomar, F. Facchinei, and J.-S. Pang, Monotone Games for Cognitive Radio Systems. London: Springer London, 2012, pp. 83–112.
[2] A. Mohsenian-Rad, V. W. S. Wong, J. Jatskevich, R. Schober, and A. Leon-Garcia, "Autonomous demand-side management based on game-theoretic energy consumption scheduling for the future smart grid," IEEE Trans. on Smart Grid, vol. 1, no. 3, pp. 320–331, Dec. 2010.
[3] W.
Saad, Z. Han, H. V. Poor, and T. Basar, "Game-theoretic methods for the smart grid: An overview of microgrid systems, demand-side management, and smart grid communications," IEEE Sig. Proc. Mag., vol. 29, no. 5, pp. 86–105, 2012.
[4] R. Deng, Z. Yang, J. Chen, N. R. Asr, and M. Chow, "Residential energy consumption scheduling: A coupled-constraint game approach," IEEE Trans. on Smart Grid, vol. 5, no. 3, pp. 1340–1350, May 2014.
[5] S. Li, W. Zhang, J. Lian, and K. Kalsi, "Market-based coordination of thermostatically controlled loads part I: A mechanism design formulation," IEEE Trans. on Pow. Sys., vol. 31, no. 2, pp. 1170–1178, March 2016.
[6] ——, "On reverse Stackelberg game and optimal mean field control for a large population of thermostatically controlled loads," in 2016 Am. Cont. Conf., July 2016, pp. 3545–3550.
[7] S. Grammatico, B. Gentile, F. Parise, and J. Lygeros, "A mean field control approach for demand side management of large populations of thermostatically controlled loads," in Proc. of the IEEE European Control Conference, 2015.
[8] Z. Ma, D. S. Callaway, and I. A. Hiskens, "Decentralized charging control of large populations of plug-in electric vehicles," IEEE Trans. on Cont. Sys. Tech., vol. 21, no. 1, pp. 67–78, Jan. 2013.
[9] F. Parise, M. Colombino, S. Grammatico, and J. Lygeros, "Mean field constrained charging policy for large populations of plug-in electric vehicles," in 53rd IEEE Conference on Decision and Control, Dec. 2014, pp. 5101–5106.
[10] Z. Ma, S. Zou, L. Ran, X. Shi, and I. A. Hiskens, "Efficient decentralized coordination of large-scale plug-in electric vehicle charging," Automatica, vol. 69, pp. 35–47, 2016.
[11] S. Grammatico, "Exponentially convergent decentralized charging control for large populations of plug-in electric vehicles," in 2016 IEEE 55th Conference on Decision and Control (CDC), Dec.
2016, pp. 5775–5780.
[12] N. Li, L. Chen, and M. A. Dahleh, "Demand response using linear supply function bidding," IEEE Trans. on Smart Grid, vol. 6, no. 4, pp. 1827–1838, July 2015.
[13] J. Barrera and A. Garcia, "Dynamic incentives for congestion control," IEEE Transactions on Automatic Control, vol. 60, no. 2, pp. 299–310, Feb. 2015.
[14] E. V. Belmega, P. Mertikopoulos, R. Negrel, and L. Sanguinetti, "Online convex optimization and no-regret learning: Algorithms, guarantees and applications," 2018.
[15] S. Shalev-Shwartz, "Online learning and online convex optimization," Foundations and Trends in Machine Learning, vol. 4, 2012.
[16] S. H. Low and D. E. Lapsley, "Optimization flow control, I: Basic algorithm and convergence," IEEE/ACM Transactions on Networking, 1999.
[17] P. Mertikopoulos and Z. Zhou, "Learning in games with continuous action sets and unknown payoff functions," Mathematical Programming, Mar. 2018.
[18] D. Paccagnan, B. G. G. Parise, M. Kamgarpour, and J. Lygeros, "Nash and Wardrop equilibria in aggregative games with coupling constraints," 2017, arXiv:1702.08789.
[19] F. Facchinei and J.-S. Pang, Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer-Verlag New York, 2003.
[20] M. Mahdavi, R. Jin, and T. Yang, "Trading regret for efficiency: Online convex optimization with long term constraints," J. Mach. Learn. Res., vol. 13, no. 1, pp. 2503–2528, Jan. 2012.
[21] R. T. Rockafellar, Convex Analysis. Princeton University Press, 1970.
[22] R. T. Rockafellar and R. J. B. Wets, Variational Analysis, ser. A Ser. of Comp. Stud. in Math. Springer-Verlag, 1998, vol. 317.
[23] P. Mertikopoulos and W. H. Sandholm, "Learning in games via reinforcement and regularization," Math. of Op. Res., vol. 14, no. 1, pp. 124–143, 2016.