The Strategic LQG System: A Dynamic Stochastic VCG Framework for Optimal Coordination
Authors: Ke Ma, P. R. Kumar
Abstract

The classic Vickrey-Clarke-Groves (VCG) mechanism ensures incentive compatibility, i.e., that truth-telling of all agents is a dominant strategy, for a static one-shot game. However, in a dynamic environment that unfolds over time, the agents' intertemporal payoffs depend on the expected future controls and payments, and a direct extension of the VCG mechanism is not sufficient to guarantee incentive compatibility. In fact, it does not appear to be feasible to construct mechanisms that ensure the dominance of dynamic truth-telling for agents comprised of general stochastic dynamic systems. The contribution of this paper is to show that such a dynamic stochastic extension does exist for the special case of Linear-Quadratic-Gaussian (LQG) agents, with a careful construction of a sequence of layered payments over time. We propose a layered version of a modified VCG mechanism for payments that decouples the intertemporal effect of current bids on future payoffs, and prove that truth-telling of dynamic states forms a dominant strategy if system parameters are known and agents are rational.

An important example of a problem needing such optimal dynamic coordination of stochastic agents arises in power systems, where an Independent System Operator (ISO) has to ensure balance of generation and consumption at all time instants, while ensuring social optimality (maximization of the sum of the utilities of all agents). Addressing strategic behavior is critical, as the price-taking assumption on market participants may not hold in an electricity market: agents can lie or otherwise game the bidding system. The challenge is to determine a bidding scheme between all agents and the ISO that maximizes social welfare, while taking into account the stochastic dynamic models of agents, since renewable energy resources such as solar/wind are stochastic and dynamic in nature, as are consumptions by loads, which are influenced by factors such as local temperatures and thermal inertias of facilities.

∗ This work is supported in part by NSF Contract ECCS-1760554, NSF Science & Technology Center Grant CCF-0939370, the Power Systems Engineering Research Center (PSERC), and NSF Contract IIS-1636772.
† Ke Ma and P. R. Kumar are with the Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX 77843, USA. ke.ma@tamu.edu, prk.tamu@gmail.com

1. Introduction

Mechanism design is the sub-field of game theory that considers how to implement socially optimal solutions to problems involving multiple self-interested agents, each with a private utility function. A typical approach in mechanism design is to provide financial incentives such as payments to promote truth-telling of utility function parameters by agents. Consider for example the Independent System Operator (ISO) problem of electric power systems, in which the ISO aims to maximize social welfare and maintain balance of generation and consumption while each generator/load has a private utility function. The classic Vickrey-Clarke-Groves (VCG) mechanism [1] has played a central role in classic mechanism design since it ensures incentive compatibility, i.e., truth-telling of utility functions of all agents forms a dominant strategy, and social efficiency, i.e., the sum of utilities of all agents is maximized.
Indeed, the outcome generated by the VCG mechanism is stronger than a Nash equilibrium in the sense that it is strategy-proof, meaning that truth-telling of utility functions is optimal irrespective of what others are bidding. In fact, Green, Laffont and Holmstrom [2] show that VCG mechanisms are the only mechanisms that are both efficient and strategy-proof if payoffs are quasi-linear.

While the VCG mechanism is applicable to a static one-shot game, it does not work for stochastic dynamic games. In a dynamic environment that unfolds over time, the agents' intertemporal payoffs depend on the expected future controls and payments, and a direct extension of the VCG mechanism is not sufficient to guarantee incentive compatibility. A fundamental difference between dynamic and static mechanism design is that in the former, an agent can bid an untruthful utility function conditional on its past bids (which need not be truthful) and past allocations (from which it can make an inference about other agents' utility functions). Here we should note that for dynamic deterministic systems, incentive compatibility is still assured by collecting the VCG payments as a lump sum of all the payments over the entire time horizon at the beginning. However, for a dynamic stochastic system, the states are private random variables, and there is no incentive for agents to bid their states truthfully if VCG payments are collected in the same way as for dynamic deterministic systems. In fact, it does not appear to be feasible to construct mechanisms that ensure the dominance of dynamic truth-telling for agents comprised of general stochastic dynamic systems.
Nevertheless, for the special case of Linear-Quadratic-Gaussian (LQG) agents, where agents have linear state equations, quadratic utility functions and additive white Gaussian noise, we show in this paper that a dynamic stochastic extension of the VCG mechanism does exist, based on a careful construction of a sequence of layered payments over time. For a set of LQG agents, we propose a modified layered version of the VCG mechanism for payments that decouples the intertemporal effect of current bids on future payoffs, and prove that truth-telling of dynamic states forms a dominant strategy if system parameters are known and agents are rational. "Rational" means that an agent will adopt a dominant strategy if it is the unique one, and that it will act on the basis that it and others will do so at future times.

An important example of a problem needing such optimal dynamic coordination of stochastic agents arises in the ISO problem of power systems. In general, agents may have different approaches to responding to the prices set by the ISO. If each agent acts as a price taker, i.e., it honestly discloses its energy consumption at the announced prices, a competitive equilibrium would be reached among agents. However, each agent may instead become a price anticipator, and it is then critical for the ISO to design a market mechanism that is strategy-proof (i.e., incentive compatible). The challenge for the ISO is to determine a bidding scheme between all agents (producers and consumers) and the ISO that maximizes social welfare, while taking into account the stochastic dynamic models of agents, since renewable energy resources such as solar/wind are stochastic and dynamic in nature, as are consumptions by loads, which are influenced by factors such as local temperatures and thermal inertias of facilities.
Currently, the ISO solicits bids from generators and Load Serving Entities (LSEs) and operates two markets: a day-ahead market and a real-time market. The day-ahead market lets market participants commit to buy or sell wholesale electricity one day before the operating day, to satisfy energy demand bids and to ensure adequate scheduling of resources to meet the next day's anticipated load. The real-time market lets market participants buy and sell wholesale electricity during the course of the operating day, to balance the differences between day-ahead commitments and the actual real-time demand and production [3]. Our layered VCG mechanism fits perfectly into the real-time market, as we will see in the sequel.

The rest of the paper is organized as follows. In Section 2, a survey of related works is presented. This is followed by a complete description of the classic VCG framework for the static and dynamic deterministic problem in Section 3. A layered VCG payment scheme is introduced for the dynamic stochastic problem in Section 4. Section 5 concludes the paper.

2. Related Works

In recent years, several papers have been written with the aim of exploring issues arising in dynamic mechanism design. In order to achieve ex post incentive compatibility, Bergemann and Valimaki [4] propose a generalization of the VCG mechanism based on the marginal contribution of each agent and show that ex post participation constraints are satisfied under some conditions. Athey and Segal [5] consider a similar model and focus on budget balance of the mechanism. Pavan et al. [6] derive first-order conditions under which incentive compatibility is guaranteed, by generalizing Mirrlees's [7] envelope formula of static mechanisms. Cavallo et al.
[8] consider a dynamic Markovian model and derive a sequence of Groves-like payments which achieves Markov perfect equilibrium. Bapna and Weber [9] solve a sequential allocation problem by formulating it as a multi-armed bandit problem. Parkes and Singh [10] and Friedman and Parkes [11] consider an environment with randomly arriving and departing agents and propose a "delayed" VCG mechanism to guarantee interim incentive compatibility. Besanko [12] and Battaglini and Lamba [13] characterize the optimal infinite-horizon mechanism for an agent modeled as a Markov process, with Besanko considering a linear AR(1) process over a continuum of states, and Battaglini focusing on a two-state Markov chain. Farhadi et al. [14] propose a dynamic mechanism that is incentive compatible, individually rational, ex-ante budget balanced and socially efficient, based on a set of inference signals. However, their notion of incentive compatibility is in a weaker Nash sense, i.e., given that other agents report truthfully, agent i's best reaction is to report truthfully. Our dynamic VCG mechanism, on the other hand, guarantees incentive compatibility in weakly dominant strategies, i.e., irrespective of what other agents are bidding, agent i's best strategy is to report truthfully. Bergemann and Pavan [15] have an excellent survey of recent research in dynamic mechanism design, and a more recent survey paper by Bergemann and Valimaki [16] further discusses the dynamic mechanism design problem with risk-averse agents and the relationship between dynamic mechanisms and optimal contracts.

To our knowledge, there does not appear to be any result that ensures dominance of dynamic truth-telling for agents comprised of LQG systems.

3. The Static and Dynamic Deterministic VCG

Let us begin by considering the simpler static deterministic case.
Suppose there are N agents, with each agent having a utility function F_i(u_i), where u_i is the amount of energy produced/consumed by agent i. F_i(u_i) depends only on its own consumption/generation u_i. However, for convenience of notation, we will occasionally abuse notation and write F_i(u), with the implicit understanding that it only depends on the i-th component u_i of u. Let u := (u_1, ..., u_N)^T, u_{-i} := (u_1, ..., u_{i-1}, u_{i+1}, ..., u_N)^T, and let F := (F_1, ..., F_N).

In the VCG mechanism, each agent is asked to bid its utility function F̂_i. The agent can lie, so F̂_i may not be equal to F_i. (As for F, we denote F̂ := (F̂_1, ..., F̂_N).) After obtaining the bids, the ISO calculates u*(F̂) as the optimal solution to the following problem:

$$\max_u \sum_i \hat F_i(u_i) \quad \text{subject to} \quad \sum_i u_i = 0.$$

The last equality ensures balance between generation and consumption. Each agent is then assigned to produce/consume u*_i(F̂), and is obliged to do so, accruing a utility F_i(u*_i(F̂)). Following the rules that it has announced a priori, before receiving the bids, the ISO then collects a payment p_i(F̂) from agent i, defined as follows:

$$p_i(\hat F) := \sum_{j \neq i} \hat F_j(u^{(i)}) - \sum_{j \neq i} \hat F_j(u^*),$$

where u^{(i)} is defined as the optimal solution to the following problem:

$$\max_{u_{-i}} \sum_{j \neq i} \hat F_j(u_j) \quad \text{subject to} \quad \sum_{j \neq i} u_j = 0.$$

We can see that p_i is the cost to the rest of the agents due to agent i's presence, which leads agents to internalize the social externality. In fact, the VCG mechanism is a special case of the Groves mechanism [17], where the payment p_i is defined as:

$$p_i(\hat F) = h_i(\hat F_{-i}) - \sum_{j \neq i} \hat F_j(u^*(\hat F)),$$

where h_i is an arbitrary function and F̂_{-i} := (F̂_1, ..., F̂_{i-1}, F̂_{i+1}, ..., F̂_N).
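As a concrete illustration (not from the paper), the static mechanism can be sketched for hypothetical quadratic utilities F_i(u_i) = −a_i u_i² + b_i u_i with a_i > 0, for which the balance-constrained welfare problem has a closed-form solution; all names and parameter values below are illustrative:

```python
import numpy as np

def allocate(a, b):
    # Maximize sum_i (-a_i u_i^2 + b_i u_i) subject to sum_i u_i = 0.
    # At the optimum all marginal utilities equal a common multiplier lam:
    # -2 a_i u_i + b_i = lam  =>  u_i = (b_i - lam) / (2 a_i).
    lam = np.sum(b / (2 * a)) / np.sum(1 / (2 * a))
    return (b - lam) / (2 * a)

def F(u, a, b):
    # Quadratic utility, elementwise.
    return -a * u**2 + b * u

def vcg(a, b):
    # Allocation u*(F_hat) and VCG payments
    # p_i = (others' welfare with agent i removed) - (others' welfare at u*).
    u_star = allocate(a, b)
    N = len(a)
    p = np.zeros(N)
    for i in range(N):
        others = np.arange(N) != i
        u_wo_i = allocate(a[others], b[others])   # problem without agent i
        p[i] = F(u_wo_i, a[others], b[others]).sum() \
             - F(u_star[others], a[others], b[others]).sum()
    return u_star, p
```

Under this payment, an agent's net utility F_i(u*_i) − p_i differs from the realized social welfare only by a term that does not depend on its own bid, which is why misreporting b_i cannot help.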
Truth-telling is a dominant strategy in the Groves mechanism [17]. That is, regardless of other agents' strategies, an agent cannot do better than truthfully declaring its utility function.

Theorem 1. [17] Truth-telling (F̂_i ≡ F_i) is the dominant strategy equilibrium in the Groves mechanism.

Proof. Suppose agent i announces the true utility function F_i. Let F̄ := (F̂_1, ..., F̂_{i-1}, F_i, F̂_{i+1}, ..., F̂_N) and F̄_{-i} := (F̂_1, ..., F̂_{i-1}, F̂_{i+1}, ..., F̂_N). Let F̄(u) := ∑_i F̄_i(u_i). Let ū*_i be what the ISO assigns, and p_i(F̄) be what the ISO charges, when F̄ is announced by the agents. Let u*_i be what the ISO assigns and p_i(F̂) be what the ISO charges when F̂ is announced by the agents.

Note that F̄_{-i} = F̂_{-i}, and so h_i(F̄_{-i}) = h_i(F̂_{-i}). Hence for agent i, the difference between the net utilities resulting from announcing F_i and F̂_i is

$$\left[ F_i(\bar u^*_i) - p_i(\bar F) \right] - \left[ F_i(u^*_i) - p_i(\hat F) \right]$$
$$= F_i(\bar u^*_i) - h_i(\bar F_{-i}) + \sum_{j \neq i} \hat F_j(\bar u^*) - F_i(u^*_i) + h_i(\hat F_{-i}) - \sum_{j \neq i} \hat F_j(u^*)$$
$$= \bar F(\bar u^*) - \bar F(u^*) \geq 0,$$

where the last inequality holds since ū* is the optimal solution to the social welfare problem with utility functions F̄.

The above VCG scheme can be extended to the important case of dynamic systems. We first consider the deterministic case. This is a straightforward extension of the static case, since one can consider the sequence of actions taken by an agent as a vector action. That is, one can simply view the problem as an open-loop control problem, where the entire decision on the sequence of controls to be employed is taken at the initial time, and so it is treatable as a static problem.

For agent i, let F_{i,t}(x_i(t), u_i(t)) be the one-step utility function at time t.
Suppose that the state of agent i evolves as:

$$x_i(t+1) = g_{i,t}(x_i(t), u_i(t)).$$

The ISO asks each agent i to bid its one-step utility functions {F̂_{i,t}(x_i(t), u_i(t)), t = 0, 1, ..., T−1}, state equations {ĝ_{i,t}, t = 0, 1, ..., T−1}, and initial condition x̂_{i,0}. The ISO then calculates (x*_i(t), u*_i(t)) as the optimal solution, assumed to be unique, to the following utility maximization problem:

$$\max \sum_{i=1}^{N} \sum_{t=0}^{T-1} \hat F_{i,t}(x_i(t), u_i(t))$$
$$\text{subject to} \quad x_i(t+1) = \hat g_{i,t}(x_i(t), u_i(t)), \;\forall i, \forall t,$$
$$\sum_{i=1}^{N} u_i(t) = 0, \;\forall t, \qquad x_i(0) = \hat x_{i,0}, \;\forall i.$$

We denote this problem as (F̂, ĝ, x̂_0).

We can extend the notion of the VCG payment p_i to the deterministic dynamic system as follows. Let

$$p_i := \sum_{j \neq i} \sum_{t=0}^{T-1} \hat F_{j,t}(x^{(i)}_j(t), u^{(i)}_j(t)) - \sum_{j \neq i} \sum_{t=0}^{T-1} \hat F_{j,t}(x^*_j(t), u^*_j(t)).$$

Here (x^{(i)}_j(t), u^{(i)}_j(t)) is the optimal solution to the following problem, which is assumed to be unique:

$$\max \sum_{j \neq i} \sum_{t=0}^{T-1} \hat F_{j,t}(x_j(t), u_j(t))$$
$$\text{subject to} \quad x_j(t+1) = \hat g_{j,t}(x_j(t), u_j(t)), \;\text{for } j \neq i \text{ and } \forall t,$$
$$\sum_{j \neq i} u_j(t) = 0, \;\forall t, \qquad x_j(0) = \hat x_{j,0}, \;\text{for } j \neq i.$$

More generally, we can consider a Groves payment p_i defined as:

$$p_i := h_i(\hat F_{-i}) - \sum_{j \neq i} \sum_{t=0}^{T-1} \hat F_{j,t}(x^*_j(t), u^*_j(t)),$$

where h_i is an arbitrary function.

We first show in the following theorem that truth-telling is still the dominant strategy equilibrium in the Groves mechanism.

Theorem 2. Truth-telling of utility functions, state dynamics and initial condition (F̂_i = F_i, ĝ_i = g_i and x̂_{i,0} = x_{i,0}) is a dominant strategy equilibrium under the Groves mechanism for a dynamic system.

Proof.
Let F̂ := (F̂_1, ..., F̂_i, ..., F̂_N), ĝ := (ĝ_1, ..., ĝ_i, ..., ĝ_N), and x̂_0 := (x̂_{1,0}, ..., x̂_{i,0}, ..., x̂_{N,0}). Suppose agent i announces the true one-step utility function F_i, true state dynamics g_i, and true initial condition x_{i,0}. Let F̄ := (F̂_1, ..., F̂_{i-1}, F_i, F̂_{i+1}, ..., F̂_N), ḡ := (ĝ_1, ..., ĝ_{i-1}, g_i, ĝ_{i+1}, ..., ĝ_N), and x̄_0 := (x̂_{1,0}, ..., x̂_{i-1,0}, x_{i,0}, x̂_{i+1,0}, ..., x̂_{N,0}). Let (x̄*_i(t), ū*_i(t)) be what the ISO assigns and p_i(F̄, ḡ, x̄_0) be what the ISO charges when (F̄, ḡ, x̄_0) is announced by the agents. Let (x*_i(t), u*_i(t)) be what the ISO assigns and p_i(F̂, ĝ, x̂_0) be what the ISO charges when (F̂, ĝ, x̂_0) is announced by the agents. Let F̄(x(t), u(t)) := ∑_i F̄_i(x_i(t), u_i(t)).

For agent i, the difference between the net utilities resulting from announcing (F_i, g_i, x_{i,0}) and (F̂_i, ĝ_i, x̂_{i,0}) is

$$\left[ \sum_t F_i(\bar x^*_i(t), \bar u^*_i(t)) - p_i(\bar F, \bar g, \bar x_0) \right] - \left[ \sum_t F_i(x^*_i(t), u^*_i(t)) - p_i(\hat F, \hat g, \hat x_0) \right]$$
$$= \sum_t F_i(\bar x^*_i(t), \bar u^*_i(t)) - h_i(\bar F_{-i}) + \sum_{j \neq i} \sum_t \hat F_j(\bar x^*_j(t), \bar u^*_j(t)) - \sum_t F_i(x^*_i(t), u^*_i(t)) + h_i(\hat F_{-i}) - \sum_{j \neq i} \sum_t \hat F_j(x^*_j(t), u^*_j(t))$$
$$= \sum_t \bar F(\bar x^*(t), \bar u^*(t)) - \sum_t \bar F(x^*(t), u^*(t)) \geq 0,$$

since (x̄*(t), ū*(t)) is the optimal solution to the problem (F̄, ḡ, x̄_0).

4. The Dynamic Stochastic VCG

In the above section, we have shown that the VCG mechanism can be naturally extended to dynamic deterministic systems by employing an open-loop solution. However, when agents are dynamic stochastic systems, we need to consider closed-loop solutions.
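For reference, the open-loop reduction used for the deterministic case can be sketched numerically. A minimal sketch (not from the paper), for scalar linear agents x_i(t+1) = a_i x_i(t) + b_i u_i(t) with quadratic one-step utilities q_i x_i(t)² + r_i u_i(t)², q_i ≤ 0 and r_i < 0: the T-step welfare problem is stacked into one static quadratic program and solved exactly through its KKT conditions; all names and values are illustrative.

```python
import numpy as np

def impulse_matrix(a, b, T):
    # S[t, tau] = a^(t-1-tau) * b for tau < t: effect of u(tau) on x(t).
    S = np.zeros((T, T))
    for t in range(T):
        for tau in range(t):
            S[t, tau] = a**(t - 1 - tau) * b
    return S

def open_loop_allocation(a, b, q, r, x0, T):
    # Maximize sum_i sum_t [q_i x_i(t)^2 + r_i u_i(t)^2]  (q_i <= 0, r_i < 0)
    # s.t. x_i(t+1) = a_i x_i(t) + b_i u_i(t) and sum_i u_i(t) = 0 for all t,
    # by stacking the controls u = (u_i(t)) into one static QP and solving
    # the KKT linear system exactly.
    N = len(a)
    H = np.zeros((N * T, N * T)); c = np.zeros(N * T)
    for i in range(N):
        S = impulse_matrix(a[i], b[i], T)
        m = np.array([a[i]**t * x0[i] for t in range(T)])  # free response
        sl = slice(i * T, (i + 1) * T)
        H[sl, sl] = q[i] * S.T @ S + r[i] * np.eye(T)      # negative definite
        c[sl] = 2 * q[i] * S.T @ m
    # Balance constraint C u = 0: sum_i u_i(t) = 0 at each t.
    C = np.zeros((T, N * T))
    for i in range(N):
        C[:, i * T:(i + 1) * T] = np.eye(T)
    # Stationarity 2Hu + c = C^T lam, together with C u = 0.
    KKT = np.block([[2 * H, -C.T], [C, np.zeros((T, T))]])
    rhs = np.concatenate([-c, np.zeros(T)])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:N * T].reshape(N, T)   # u[i, t]
```

The KKT system is nonsingular here because H is negative definite (q_i ≤ 0, r_i < 0) and the balance constraints have full row rank, so the open-loop optimum is unique, as the theorem assumes.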
Such closed-loop controls depend on the observations of the agents, which are generally private. So the states of the system are private random variables. Hence the problem becomes one of additionally ensuring that each agent reveals its "true" states at all times. This additional complication appears to prevent a solution for general systems. However, as we will see, in the case of LQG agents one can indeed ensure the dominance of truth-telling strategies that reveal the true states. However, it does not appear feasible to also then ensure that the agents reveal their true state equations and cost functions.

To obtain the correct payment structure, we will need to carefully redefine the VCG payments such that incentive compatibility is still assured for the special case of Linear-Quadratic-Gaussian (LQG) systems. As noted above, one cannot treat the system as an open-loop system as in the previous section. In particular, this necessitates collecting payments from agents at each step.

For agent i, let w_i ∼ N(0, σ_i) be the discrete-time additive Gaussian white noise process affecting the state x_i(t) via:

$$x_i(t+1) = a_i x_i(t) + b_i u_i(t) + w_i(t),$$

where x_i(0) ∼ N(0, ζ_i) and is independent of w_i. Each agent has a one-step utility function

$$F_i(x_i(t), u_i(t)) = q_i x_i^2(t) + r_i u_i^2(t).$$

Let X(t) = [x_1(t), ..., x_N(t)]^T, U(t) = [u_1(t), ..., u_N(t)]^T and W(t) = [w_1(t), ..., w_N(t)]^T. Let Q = diag(q_1, ..., q_N) ≤ 0, R = diag(r_1, ..., r_N) < 0, A = diag(a_1, ..., a_N), B = diag(b_1, ..., b_N), Σ = diag(σ_1, ..., σ_N) > 0 and Z = diag(ζ_1, ..., ζ_N) > 0. Let

$$RSW := \sum_{t=0}^{T-1} \left[ X^T(t) Q X(t) + U^T(t) R U(t) \right]$$

be the random variable denoting the social welfare of the agents (the quadratic forms already sum over all N agents), and let SW := E[RSW] denote the expected social welfare.
The random social welfare could also be called the "ex-post social welfare". The ISO aims to maximize the social welfare, leading to the following LQG problem:

$$\max \; E \sum_{t=0}^{T-1} \left[ X^T(t) Q X(t) + U^T(t) R U(t) \right]$$
$$\text{subject to} \quad X(t+1) = A X(t) + B U(t) + W(t),$$
$$\mathbf{1}^T U(t) = 0, \;\forall t, \qquad (1)$$
$$X(0) \sim N(0, Z), \quad W \sim N(0, \Sigma).$$

We will rewrite the random social welfare, and thereby the social welfare, in terms more convenient for us. We will decompose X(t) as:

$$X(t) := \sum_{s=0}^{t} X(s,t), \quad 0 \leq t \leq T-1, \qquad (2)$$

where X(s,s) := W(s−1) for s ≥ 1 and X(0,0) := X(0). Let

$$X(s,t) := A X(s,t-1) + B U(s,t-1), \quad 0 \leq s \leq t-1, \qquad (3)$$

with U(s,t) yet to be specified. We suppose that U(t) can also be decomposed as:

$$U(t) := \sum_{s=0}^{t} U(s,t), \quad 0 \leq t \leq T-1. \qquad (4)$$

Then, regardless of how the U(s,t)'s are chosen, as long as the U(s,t)'s for 0 ≤ s ≤ t are indeed a decomposition of U(t), i.e., (4) is satisfied, the random social welfare can be written in terms of the X(s,t)'s and U(s,t)'s as:

$$RSW = \sum_{s=0}^{T-1} L_s,$$

where L_s for s ≥ 1 is defined as:

$$L_s := \sum_{t=s}^{T-1} \left[ X^T(s,t) Q X(s,t) + U^T(s,t) R U(s,t) + 2 \left( \sum_{\tau=0}^{s-1} X(\tau,t) \right)^T Q X(s,t) + 2 \left( \sum_{\tau=0}^{s-1} U(\tau,t) \right)^T R U(s,t) \right], \qquad (5)$$

and L_0 is defined as:

$$L_0 := \sum_{t=0}^{T-1} \left[ X^T(0,t) Q X(0,t) + U^T(0,t) R U(0,t) \right].$$

Hence,

$$SW = E \sum_{s=0}^{T-1} L_s.$$

In the scheme to follow, the ISO will choose all the U(s,t)'s for different t's at time s, based on the information it has at time s. (Note that t ≥ s.) Hence X(s,t) is completely determined by W(s−1) and U(s,t) for s ≤ t ≤ T−1. Indeed, X(s,t) can be regarded as the contribution to X(t) of these variables. Here we assume that the ISO knows the true system parameters Q, R, A and B.
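The decomposition (2)-(5) is purely algebraic, so it can be checked numerically: with arbitrary layered controls U(s,t) satisfying (4), the layers reassemble to a trajectory obeying the state equation, and RSW equals the sum of the L_s terms. A minimal sketch (illustrative parameter values, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 3, 5
A = np.diag(rng.uniform(0.5, 1.0, size=N))
B = np.diag(rng.uniform(0.5, 1.5, size=N))
Q = -np.diag(rng.uniform(0.1, 1.0, size=N))   # Q <= 0
R = -np.diag(rng.uniform(0.1, 1.0, size=N))   # R < 0
X0 = rng.standard_normal(N)
W = rng.standard_normal((T, N))               # W[s] stands for W(s)

# Arbitrary layered controls U(s,t) for 0 <= s <= t <= T-1;
# any choice satisfying (4) works for the identity below.
Us = {(s, t): rng.standard_normal(N) for s in range(T) for t in range(s, T)}

# Layer states per (2)-(3): X(0,0) = X(0), X(s,s) = W(s-1),
# and X(s,t) = A X(s,t-1) + B U(s,t-1) for t > s.
Xs = {}
for s in range(T):
    Xs[(s, s)] = X0 if s == 0 else W[s - 1]
    for t in range(s + 1, T):
        Xs[(s, t)] = A @ Xs[(s, t - 1)] + B @ Us[(s, t - 1)]

# Aggregate trajectory from the layers, per (2) and (4).
U = {t: sum(Us[(s, t)] for s in range(t + 1)) for t in range(T)}
X = {t: sum(Xs[(s, t)] for s in range(t + 1)) for t in range(T)}

RSW = sum(X[t] @ Q @ X[t] + U[t] @ R @ U[t] for t in range(T))

def L(s):
    # L_s as in (5); the tau-sums are empty for s = 0, giving L_0.
    tot = 0.0
    for t in range(s, T):
        xp = sum((Xs[(tau, t)] for tau in range(s)), np.zeros(N))
        up = sum((Us[(tau, t)] for tau in range(s)), np.zeros(N))
        tot += (Xs[(s, t)] @ Q @ Xs[(s, t)] + Us[(s, t)] @ R @ Us[(s, t)]
                + 2 * xp @ Q @ Xs[(s, t)] + 2 * up @ R @ Us[(s, t)])
    return tot
```

The identity holds because, for each t, summing the L_s terms over s ≤ t completes the square (Q and R are symmetric), recovering X^T(t)QX(t) + U^T(t)RU(t).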
This assumption may hold if the ISO has previously run the VCG bidding scheme for a dynamic deterministic system, or equivalently a day-ahead market, and the system parameters remain unchanged when agents participate in the real-time stochastic market.

We will consider a scheme where, at each stage s, for 0 ≤ s ≤ T−1, the ISO asks the agents to bid their x_i(s,s) (which is equal to w_i(s−1)). Let x̂_i(s,s) be what the agents actually bid, since they may not tell the truth. Based on their bids x̂_i(s,s) for 1 ≤ i ≤ N, the ISO solves the following problem:

$$\max L_s$$
$$\text{subject to} \quad \mathbf{1}^T U(s,t) = 0, \;\text{for } s \leq t \leq T-1,$$
$$\hat X(s,s) = [\hat x_1(s,s), ..., \hat x_N(s,s)]^T.$$

The variables X̂(s,t) for t > s are defined as X̂(s,t) = A X̂(s,t−1) + B U(s,t−1), that is, with zero noise in the state variable updates starting from the "initial condition" X̂(s,s).

The interpretation is the following. Based on the bids X̂(s,s), which is supposedly a bid of W(s−1), the ISO calculates the trajectory of the linear systems from time s onward, assuming zero noise from that point on. It then allocates consumptions/generations U(s,t) for future periods t for the corresponding deterministic linear system, with balance of consumption and production (1) at each time t. These can be regarded as taking into account the consequences of the disturbance occurring at time s. More specifically, X(s,t) is the trajectory resulting from the disturbance W(s−1) at time s, and U(s,t) is the adjustment made at time s to the allocation at time t due to the disturbance at time s.

Next, the ISO collects a payment p_i(s) from agent i at time s as:

$$p_i(s) := h_i(\hat X_{-i}(s,s)) - \sum_{j \neq i} \sum_{t=s}^{T-1} \left[ q_j \hat x_j^2(s,t) + r_j u_j^{*2}(s,t) + 2 q_j \left( \sum_{\tau=0}^{s-1} \hat x_j(\tau,t) \right) \hat x_j(s,t) + 2 r_j \left( \sum_{\tau=0}^{s-1} u_j(\tau,t) \right) u^*_j(s,t) \right],$$

where X̂_{-i}(s,s) = [x̂_1(s,s), ..., x̂_{i-1}(s,s), x̂_{i+1}(s,s), ..., x̂_N(s,s)]^T, and h_i is an arbitrary function (as in the Groves mechanism).

Before we prove incentive compatibility, we need to define what is meant by the term "rational agents".

Definition 1 (Rational Agents). We say agent i is rational at time T−1 if it adopts a dominant strategy, when there is a unique dominant strategy. An agent i is rational at time t if it adopts a dominant strategy at time t under the assumption that all agents, including itself, are rational at times t+1, t+2, ..., T−1, when there is a unique such dominant strategy. Rationality is thus defined in a recursive fashion.

Theorem 3. Truth-telling of the states x̂_i(s,s) for 0 ≤ s ≤ T−1, i.e., bidding x̂_i(s,s) = w_i(s−1), is the unique dominant strategy for the stochastic ISO mechanism, if the system parameters Q ≤ 0, R < 0, A and B are truthfully known, and agents are rational.

Proof. We proceed by backward induction. When agent i is at the last stage T−1, it is easy to verify that truth-telling of the state (noise) is dominant, i.e., x̂_i(T−1, T−1) = x_i(T−1, T−1). We next employ induction, and so assume that truth-telling of states is a dominant strategy equilibrium at time k. If agents are rational, we can take the expectation over X(s,s), s ≥ k, and since the optimal feedback gain does not change with respect to time, the cross terms cancel and agent i's objective aligns with the ISO's. We conclude that truth-telling x̂_i(k−1, k−1) = x_i(k−1, k−1) is the dominant strategy for agent i at time k−1.

5.
Concluding Remarks

It remains an open problem how to construct a mechanism that ensures the dominance of dynamic truth-telling for agents comprised of general stochastic dynamic systems. For the special case of LQG agents, by careful construction of a sequence of layered VCG payments over time, the intertemporal effect of current bids on future payoffs can be decoupled, and truth-telling of dynamic states is guaranteed if system parameters are known and agents are rational. Our results can be generalized to LQG systems with partial state observation and time-varying cost and/or state dynamics.

Acknowledgment

The authors would like to thank Dr. Le Xie for his valuable comments and suggestions.

References

[1] W. Vickrey, "Counterspeculation, Auctions, and Competitive Sealed Tenders," The Journal of Finance, vol. 16, no. 1, pp. 8-37, 1961.
[2] J. R. Green and J.-J. Laffont, Incentives in Public Decision Making. Amsterdam: North-Holland, 1979.
[3] ISO New England, "Day-Ahead and Real-Time Energy Markets," http://www.iso-ne.com/markets-operations/markets/da-rt-energy-markets.
[4] D. Bergemann and J. Valimaki, "Dynamic Marginal Contribution Mechanism," Cowles Foundation for Research in Economics, Yale University, Cowles Foundation Discussion Papers 1616, Jul. 2007.
[5] S. Athey and I. Segal, "An Efficient Dynamic Mechanism," Econometrica, vol. 81, no. 6, pp. 2463-2485, 2013.
[6] A. Pavan, I. Segal, and J. Toikka, "Dynamic Mechanism Design: Incentive Compatibility, Profit Maximization and Information Disclosure," Evanston, Discussion Paper, Center for Mathematical Studies in Economics and Management Science 1501, 2009.
[7] J. A. Mirrlees, "An Exploration in the Theory of Optimum Income Taxation," The Review of Economic Studies, vol. 38, no. 2, pp. 175-208, 1971.
[8] R. Cavallo, D. C. Parkes, and S. P.
Singh, "Optimal Coordinated Planning Amongst Self-Interested Agents with Private State," CoRR, vol. abs/1206.6820, 2012.
[9] A. Bapna and T. A. Weber, "Efficient Dynamic Allocation with Uncertain Valuations?" 2005.
[10] D. C. Parkes and S. P. Singh, "An MDP-Based Approach to Online Mechanism Design," in Advances in Neural Information Processing Systems 16, S. Thrun, L. K. Saul, and B. Schölkopf, Eds. MIT Press, 2004, pp. 791-798.
[11] E. J. Friedman and D. C. Parkes, "Pricing WiFi at Starbucks: Issues in Online Mechanism Design," in Proceedings of the 4th ACM Conference on Electronic Commerce, ser. EC '03. New York, NY, USA: ACM, 2003, pp. 240-241.
[12] D. Besanko, "Multi-period Contracts Between Principal and Agent with Adverse Selection," Economics Letters, vol. 17, no. 1, pp. 33-37, 1985.
[13] M. Battaglini and R. Lamba, "Optimal Dynamic Contracting," Princeton University, Department of Economics, Econometric Research Program, Working Papers 1431, Oct. 2012.
[14] F. Farhadi, H. Tavafoghi, D. Teneketzis, and J. Golestani, "A dynamic incentive mechanism for security in networks of interdependent agents," in Game Theory for Networks, L. Duan, A. Sanjab, H. Li, X. Chen, D. Materassi, and R. Elazouzi, Eds. Cham: Springer International Publishing, 2017, pp. 86-96.
[15] D. Bergemann and A. Pavan, "Introduction to Symposium on Dynamic Contracts and Mechanism Design," Journal of Economic Theory, vol. 159, pp. 679-701, 2015, Symposium Issue on Dynamic Contracts and Mechanism Design.
[16] D. Bergemann and J. Valimaki, "Dynamic Mechanism Design: An Introduction," Cowles Foundation for Research in Economics, Yale University, Cowles Foundation Discussion Papers 2102, Aug. 2017.
[17] T. Groves, "Incentives in Teams," Econometrica, vol. 41, no. 4, pp. 617-631, 1973.