Synthesizing Systems with Optimal Average-Case Behavior for Ratio Objectives
Johannes Reich and Bernd Finkbeiner (Eds.): International Workshop on Interactions, Games and Protocols (iWIGP), EPTCS 50, 2011, pp. 17–32, doi:10.4204/EPTCS.50.2. © C. von Essen & B. Jobstmann. This work is licensed under the Creative Commons Attribution License.

Christian von Essen (VERIMAG, Grenoble, France, and EDMSTII, Université Joseph Fourier, Grenoble, France), christian.vonessen@imag.fr
Barbara Jobstmann (CNRS/VERIMAG, Grenoble, France), barbara.jobstmann@imag.fr

Abstract. We show how to automatically construct a system that satisfies a given logical specification and has an optimal average behavior with respect to a specification with ratio costs. When synthesizing a system from a logical specification, it is often the case that several different systems satisfy the specification. In this case, it is usually not easy for the user to state formally which system she prefers. Prior work proposed to rank the correct systems by adding a quantitative aspect to the specification. A desired preference relation can be expressed with (i) a quantitative language, which is a function assigning a value to every possible behavior of a system, and (ii) an environment model defining the desired optimization criteria of the system, e.g., worst-case or average-case optimal. In this paper, we show how to synthesize a system that is optimal for (i) a quantitative language given by an automaton with a ratio cost function, and (ii) an environment model given by a labeled Markov decision process. The objective of the system is to minimize the expected (ratio) costs. The solution is based on a reduction to Markov decision processes with ratio cost functions which do not require that the costs in the denominator are strictly positive. We find an optimal strategy for these using a fractional linear program.
1 Introduction

Quantitative analysis techniques are usually used to measure quantitative properties of systems, such as timing, performance, or reliability (cf. [7, 26, 8]). We use quantitative reasoning in the classically Boolean contexts of verification and synthesis because it allows us to distinguish systems with respect to "soft constraints" like robustness [11] or default behavior [10]. This is particularly helpful in synthesis, where a system is automatically derived from a specification, because quantitative specifications allow us to guide the synthesis tool towards a desired implementation.

In this paper we show how quantitative specifications based on ratio objectives can be used to guide the synthesis process. In particular, we present a technique to synthesize a system with an average-case behavior that satisfies a logical specification and optimizes a quantitative objective given by a ratio objective. The synthesis problem can be seen as a game between two players: the system and the environment (the context in which the system operates). The system has a fixed set of interface variables with a finite domain to interact with its environment. The variables are partitioned into a set of input and a set of output variables. The environment can modify the set of input variables. For instance, an input variable can indicate the arrival of some packet on a router on a given port, or the request of a client to use a shared resource. Each assignment to the input variables is a possible move of the environment in the synthesis game. The system reacts to the behavior of the environment by changing the value of the output variables. An assignment to the output variables is called an action of the system and describes a possible move of the system in the synthesis game.
E.g., the system can grant a shared resource to client C by setting a corresponding output variable. Environment and system change their variables in turns. In every step, first the system makes modifications to the output variables, then the environment changes the input variables. The sequence of variable evaluations built up by this interplay is evaluated with respect to a specification. A logical (or qualitative) specification maps every sequence to 1 or 0, indicating whether the sequence satisfies the specification or not. For example, a sequence of evaluations in which the system grants a shared resource to two clients at the same time is mapped to 0 if the specification requires mutually exclusive access to this resource. The aim of the system in the synthesis game is to satisfy the specification independent of the choices of the environment.

There might be several systems that can achieve this goal for a given specification. Therefore, Bloem et al. [10] proposed to add a quantitative specification in order to rank the correct systems. A quantitative specification maps every infinite sequence of variable evaluations to a value indicating how desirable this behavior is. In this paper, we study quantitative specifications resulting from ratio objectives. The idea is that a behavior of the system is mapped to two infinite sequences of values. The first sequence refers to events that were "good" for the system, while the second sequence refers to "bad" events within a behavior. For instance, consider a server processing requests from several clients. If the server receives a request, it can be seen as a bad event, since it requires the server to process the request. On the other hand, every handled request is clearly a good event. Intuitively, the ratio objective computes the long-run ratio between the sum of bad and the sum of good events.
This ratio is the value of a behavior. A system can be seen as a set of behaviors. We can assign a value to a system by taking, e.g., the worst or the average value over all its behaviors. Given a way to evaluate a system, we can ask for a system that optimizes this value, i.e., a system that achieves a value at least as good as that of any other system. Taking the worst value over the possible behaviors corresponds to assuming that the system is in an adversarial environment. The average value is computed with respect to a probabilistic model of the environment [15]. In the average-case synthesis game, the environment player is replaced by a probabilistic player that plays according to the probabilistic environment model.

In this paper, we present the first average-case synthesis algorithm for specifications that evaluate a behavior of the system with respect to the ratio of two cost functions [10]. This ratio objective allows us, e.g., to ask for a system that optimizes the ratio between requests and acknowledgments in a server-client system. For the average-case analysis, we present a new environment model, which is based on Markov decision processes and generalizes the one in [15]. We solve the average-case synthesis problem with ratio objectives by reduction to Markov decision processes with ratio cost functions. For unichain Markov decision processes with ratio cost functions, we present a solution based on linear programming.

Related Work. Researchers have considered a number of formalisms for quantitative specifications [5, 12, 13, 14, 2, 3, 20, 22, 28], but most of them (except for [11]) do not consider long-run ratio objectives. In [11], the environment is assumed to be adversarial, while we assume a probabilistic environment model.
Regarding the environment model, several notions of metrics for probabilistic systems and games have been proposed in the literature [4, 19]. These metrics measure the distance of two systems with respect to all temporal properties expressible in a logic, whereas we (like [15]) use the quantitative specification to compare systems with respect to the property of interest. In contrast to [15], we use ratio objectives and a more general environment model. Our environment model is the same as the one used for control and synthesis in the presence of uncertainty (cf. [6, 16, 9]). However, in this context usually only qualitative specifications are considered. MDPs with long-run average objectives are well studied. The books [23, 30] present a detailed analysis of this topic. Cyrus Derman [18] studied MDPs with a fractional objective. This work differs in two aspects from ours: first, Derman requires that the payoff of the cost function in the denominator is always strictly positive, and second, the objective function used in [18] is already given as the ratio of the expected cost of the first cost function to the expected cost of the second cost function, and not in terms of a single trace. De Alfaro [1] studies a model that is similar to ours but does not consider the synthesis problem. Finally, we would like to note that the two choices we have in a quantitative synthesis problem, namely the choice of the quantitative language and the choice of the environment model, are the same two choices that appear in weighted automata and max-plus algebras (cf. [21, 24, 17]).

2 Preliminaries

Words, qualitative and quantitative languages. Given a finite alphabet Σ, a word w = w_0 w_1 ... is a finite or infinite sequence of elements of Σ. We use w_i to denote the (i+1)-th element in the sequence.
If w is finite, then |w| denotes the length of w; otherwise |w| is infinity. We denote the empty word by ε, i.e., |ε| = 0. We use Σ* and Σ^ω to denote the set of finite and infinite words, respectively. Given a finite word w ∈ Σ* and a finite or infinite word v ∈ Σ* ∪ Σ^ω, we write wv for the concatenation of w and v. A qualitative language ϕ is a function ϕ : Σ^ω → B mapping every infinite word to 1 or 0. Intuitively, a qualitative language partitions the set of words into a set of good and a set of bad traces. A quantitative language [14] ψ is a function ψ : Σ^ω → R⁺ ∪ {∞} associating with each infinite word a value from the extended non-negative reals.

Specifications and automata with cost functions. An automaton is a tuple A = (Σ, Q, q_0, δ, F), where Σ is a finite alphabet, Q is a finite set of states, q_0 ∈ Q is an initial state, δ : Q × Σ → Q is the transition function, and F ⊆ Q is a set of safe states. We use δ* : Q × Σ* → Q to denote the closure of δ over finite words. Formally, given a word w = w_0 ... w_n ∈ Σ*, δ* is defined inductively as δ*(q, ε) = q and δ*(q, w) = δ(δ*(q, w_0 ... w_{n−1}), w_n). We use |A| to denote the size of the automaton. The run ρ of A on an infinite word w = w_0 w_1 w_2 ··· ∈ Σ^ω is the infinite sequence of states q_0 q_1 q_2 ... such that q_0 is the initial state of A and δ(q_i, w_i) = q_{i+1} holds for all i ≥ 0. The run ρ is called accepting if q_i ∈ F for all i ≥ 0. A word w is accepting if the corresponding run is accepting. The language of A, denoted by L_A, is the qualitative language L_A : Σ^ω → B mapping all accepting words to 1 and all non-accepting words to 0, i.e., L_A is the characteristic function of the set of all accepting words of A. We assume without loss of generality that Q \ F is closed under δ, i.e., ∀s ∈ Q \ F, ∀a ∈ Σ : δ(s, a) ∈ Q \ F.
Note that every automaton can be modified to meet this assumption by (i) adding a new state q⊥ with a self-loop for every letter and (ii) redirecting every transition starting from Q \ F to the new state q⊥. The modified automaton accepts the same language as the original automaton. Given an automaton A = (Σ, Q, q_0, δ, F), a cost function c : Q × Σ → N maps every transition of A to a non-negative integer. We use automata with cost functions and objective functions to define quantitative languages (or properties). Intuitively, the objective function tells us how to summarize the costs along a run. Given an automaton A and two cost functions c_1, c_2, the ratio objective [11] computes the ratio between the costs seen along the run of A on a word w = w_0 w_1 w_2 ··· ∈ Σ^ω:

  R(w) := lim_{m→∞} liminf_{l→∞} ( Σ_{i=m}^{l} c_1(δ*(q_0, w_0 ... w_i), w_{i+1}) ) / ( 1 + Σ_{i=m}^{l} c_2(δ*(q_0, w_0 ... w_i), w_{i+1}) )    (1)

The ratio objective is a generalization of the long-run average objective (also known as mean-payoff objective, cf. [33]). We use R^A_{c_1 c_2} to denote the quantitative language defined by A, c_1, c_2, and the ratio objective function. If A, c_1, or c_2 are clear from the context, we drop them. Intuitively, R computes the long-run ratio between the costs accumulated along a run. The first limit allows us to ignore a finite prefix of the run, which ensures that we only consider the long-run behavior. The 1 in the denominator avoids division by 0 if the accumulated costs are 0 and has no effect if the accumulated costs are infinite. We need the limit inferior here because the sequence inside the limit might not converge. Consider the sequence ρ = q^1 r^2 q^4 r^8 q^16 ..., where q^k means that the state q is visited k times.
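With the costs assigned to q and r in the example below (c_1(q) = 0, c_2(q) = 1, c_1(r) = c_2(r) = 1), the behavior of the prefix values can be checked numerically. The following is a small sketch written for this illustration (plain Python; the block-wise evaluation is an assumption of the sketch, not notation from the paper):

```python
# Prefix ratios sum(c1)/(1 + sum(c2)) of the sequence q^1 r^2 q^4 r^8 ...,
# evaluated after each block; block k has length 2^k.
# Costs as in the example: c1(q) = 0, c2(q) = 1, c1(r) = 1, c2(r) = 1.
C1 = {"q": 0, "r": 1}
C2 = {"q": 1, "r": 1}

def prefix_ratios(num_blocks):
    """Return the ratio sum(c1)/(1 + sum(c2)) after each complete block."""
    ratios, s1, s2 = [], 0, 0
    for k in range(num_blocks):
        state = "q" if k % 2 == 0 else "r"
        n = 2 ** k                 # length of the k-th block
        s1 += n * C1[state]
        s2 += n * C2[state]
        ratios.append(s1 / (1 + s2))
    return ratios

ratios = prefix_ratios(20)
# The value drops after every q-block and rises after every r-block, so the
# sequence of prefix values does not converge; the two subsequences approach
# different limits (about 1/3 after q-blocks and 2/3 after r-blocks).
print(round(ratios[-2], 4), round(ratios[-1], 4))  # → 0.3333 0.6667
```

The limit inferior picks out the lower of the two accumulation points, which is why the ratio objective is well defined even though the plain limit does not exist.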
Assume states q and r have the following costs: c_1(q) = 0, c_2(q) = 1, c_1(r) = 1, and c_2(r) = 1. Then the value of the prefix ρ_0 ... ρ_i keeps oscillating with increasing i: it drops towards 1/3 after each q-block and rises towards 2/3 after each r-block, so the sequence for i → ∞ does not converge. The limit inferior of this sequence is 1/3.

Finite-state systems and correctness. A finite-state system S = (S, L, s_0, A, δ, τ) consists of the automaton A = (L, S, s_0, δ, S)¹, an output (or action) alphabet A, and an output function τ : S → A assigning to each state of the system a letter from the output alphabet. The alphabet L of the automaton is called the input alphabet of the system. Given an input word w, the run of the system S on the word w is simply the run of A on the word w. For every word over the input alphabet, the system produces a word over the joint input/output alphabet. We use O_S to denote the function mapping input words to joint input/output words, i.e., given an input word w = w_0 w_1 ··· ∈ L^ω, O_S(w) is the sequence of tuples (l_0, a_0)(l_1, a_1) ··· ∈ (L × A)^ω such that (i) l_i = w_i for all i ≥ 0, (ii) a_0 = τ(s_0), and (iii) a_i = τ(δ*(s_0, w_0 ... w_{i−1})) holds for all i > 0. Given a system S with input alphabet L and output alphabet A, and an automaton A with alphabet Σ = L × A, we say that the system S satisfies the specification A, denoted S |= A, if for all input words the joint input/output word produced by the system S is accepted by the automaton A, i.e., ∀w ∈ L^ω : (L_A ∘ O_S)(w) = 1, where ∘ denotes the function composition operator.

Probability space. We use the standard definitions of probability spaces.
A probability space is given by a tuple P := (Ω, F, µ), where Ω is the set of outcomes or samples, F ⊆ 2^Ω is the σ-algebra defining the set of measurable events, and µ : F → [0, 1] is a probability measure assigning a probability to each event such that µ(Ω) = 1 and for each countable set E_1, E_2, ··· ∈ F of disjoint events we have µ(⋃_i E_i) = Σ_i µ(E_i). Recall that, since F is a σ-algebra, it satisfies the following three conditions: (i) ∅ ∈ F, (ii) E ∈ F implies Ω \ E ∈ F for any event E, and (iii) the union of any countable set of events E_1, E_2, ··· ∈ F is also in F, i.e., ⋃_i E_i ∈ F. Given a measurable function f : Ω → R ∪ {+∞, −∞}, we use E_P[f] to denote the expected value of f under µ, i.e.,

  E_P[f] = ∫_Ω f dµ    (2)

If P is clear from the context, we drop the subscript or replace it with the structure that defines P. The integral used here is the Lebesgue integral, which is commonly used to define the expected value of a random variable. Note that the expected value is always defined if the function f maps only to values in R⁺ ∪ {∞}.

Markov chains and Markov decision processes (MDPs). Let D(S) := {p : S → [0, 1] | Σ_{s∈S} p(s) = 1} be the set of probability distributions over a set S. A Markov decision process is a tuple M = (S, s_0, A, Ã, p), where S is a finite set of states, s_0 ∈ S is an initial state, A is the finite set of actions, Ã : S → 2^A is the enabled-action function defining for each state s the set of enabled actions in s, and p : S × A → D(S) is the probabilistic transition function. For technical convenience we assume that every state has at least one enabled action, i.e., ∀s ∈ S : |Ã(s)| ≥ 1.

¹ Note that the last element of this tuple is the set of safe states, i.e., every state is safe.
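The MDP definition above transcribes directly into a data structure. The following is a minimal sketch (plain Python; all names and the dict-based encoding are chosen for this illustration, not taken from the paper):

```python
from dataclasses import dataclass

# An MDP M = (S, s0, A, Ã, p): states, an initial state, actions,
# an enabled-action function, and a probabilistic transition function.
@dataclass
class MDP:
    states: list        # S
    initial: object     # s0
    actions: list       # A
    enabled: dict       # Ã: state -> set of enabled actions
    p: dict             # (state, action) -> {successor: probability}

    def check(self):
        """Sanity checks mirroring the definition in the text."""
        for s in self.states:
            # every state has at least one enabled action
            assert self.enabled[s], "every state needs an enabled action"
            for a in self.enabled[s]:
                dist = self.p[(s, a)]
                # p(s, a) must be a probability distribution over states
                assert abs(sum(dist.values()) - 1.0) < 1e-9

# A two-state example: in state 0, action "go" moves to state 1 with
# probability 0.9; state 1 has a single enabled action "stay".
m = MDP(
    states=[0, 1],
    initial=0,
    actions=["go", "stay"],
    enabled={0: {"go"}, 1: {"stay"}},
    p={(0, "go"): {1: 0.9, 0: 0.1}, (1, "stay"): {1: 1.0}},
)
m.check()
```

If every `enabled[s]` is a singleton, as in state 1 above, the MDP degenerates to a Markov chain, matching the remark that follows.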
If |Ã(s)| = 1 for all states s ∈ S, then M is called a Markov chain (MC). In this case, we omit A and Ã from the definition of M. Given a Markov chain M, we say that M is irreducible if every state can be reached from every other state. We say that it is unichain if it has at most one maximal set of states that can reach each other. We call an MDP unichain if every strategy induces a unichain MC. An L-labeled Markov decision process is a tuple M = (S, s_0, A, Ã, p, λ), where (S, s_0, A, Ã, p) is a Markov decision process and λ : S → L is a labeling function such that M is deterministic with respect to λ, i.e., for all states s, s′, s′′ and every action a such that s′ ≠ s′′, p(s, a)(s′) > 0 and p(s, a)(s′′) > 0, we have λ(s′) ≠ λ(s′′). Since we use L-labeled Markov decision processes to represent the behavior of the environment, we require that in every state all actions are enabled, i.e., ∀s ∈ S : Ã(s) = A.

Sample runs and strategies. A (sample) run ρ of M is an infinite sequence of tuples (s_0, a_0)(s_1, a_1) ··· ∈ (S × A)^ω of states and actions such that for all i ≥ 0, (i) a_i ∈ Ã(s_i) and (ii) p(s_i, a_i)(s_{i+1}) > 0. We use Ω to denote the set of all runs, and Ω_s for the set of runs starting at state s. A finite run of M is a prefix of some infinite run. To avoid confusion, we use v to refer to a finite run. Given a finite run v, the set γ(v) := {ρ ∈ Ω | ∃ρ′ : ρ = vρ′} of all possible infinite extensions of v is called the cone set of v. We use the usual extension of γ(·) to sets of finite runs. A strategy is a function π : (S × A)*S → D(A) that assigns a probability distribution to every finite sequence in (S × A)*S.
A strategy must refer only to enabled actions, i.e., for all sequences w ∈ (S × A)*, states s ∈ S, and actions a ∈ A, if π(ws)(a) > 0, then action a has to be enabled in s, i.e., a ∈ Ã(s). A strategy π is pure if for all finite sequences w ∈ (S × A)* and for all states s ∈ S, there is an action a ∈ A such that π(ws)(a) = 1. A memoryless strategy is independent of the history of the run, i.e., for all w, w′ ∈ (S × A)* and for all s ∈ S, π(ws) = π(w′s) holds. A memoryless strategy can be represented as a function π : S → D(A). A pure and memoryless strategy can be represented by a function π : S → A mapping states to actions. An MDP M = (S, s_0, A, Ã, p) together with a pure and memoryless strategy π : S → A defines the Markov chain M_π = (S, s_0, A, Ã_π, p), in which only the actions prescribed by the strategy π are enabled, i.e., Ã_π(s) = {π(s)}. Note that every finite-state system S with input alphabet S and output alphabet A that refers only to enabled actions can be viewed as a strategy for M. Vice versa, an MDP with a pure and memoryless strategy π defines a finite-state system S_{M_π} with input alphabet S and output alphabet A.

Induced probability space, objective function, and optimal strategies. An MDP M = (S, s_0, A, Ã, p) together with a strategy π and a state s ∈ S induces a probability space P^π_{M,s} = (Ω^π_{M,s}, F^π_{M,s}, µ^π_{M,s}) over the cone sets of the runs starting in s, i.e., Ω^π_{M,s} = Ω_s. The probability measure of a cone set is the probability that the MDP starts from state s and follows the common prefix under the strategy π. By convention, P^π_M := P^π_{M,s_0}. If M is a Markov chain, then π is fixed (since there is only one available action in every state), and we simply write P_M.
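The passage from an MDP with a pure memoryless strategy π to the induced Markov chain M_π only restricts the enabled actions to Ã_π(s) = {π(s)}. A small sketch (plain Python dicts; the encoding is an assumption of this illustration):

```python
# p maps (state, action) -> {successor: probability}; enabled maps each
# state to its set of enabled actions.  A pure memoryless strategy pi maps
# every state to a single enabled action.
p = {
    (0, "a"): {0: 0.5, 1: 0.5},
    (0, "b"): {1: 1.0},
    (1, "a"): {0: 1.0},
}
enabled = {0: {"a", "b"}, 1: {"a"}}
pi = {0: "b", 1: "a"}          # a pure, memoryless strategy

def induce_chain(p, enabled, pi):
    """Return the transition function of the induced Markov chain M_pi,
    in which the only enabled action in state s is pi(s)."""
    assert all(pi[s] in enabled[s] for s in pi), "pi must use enabled actions"
    return {s: p[(s, pi[s])] for s in pi}

chain = induce_chain(p, enabled, pi)
print(chain)   # {0: {1: 1.0}, 1: {0: 1.0}}
```

The resulting object has exactly one action per state, so it is a Markov chain in the sense defined above; this is the step used later when a synthesized strategy is turned back into a system.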
An objective function of M is a measurable function f : (S × A)^ω → R⁺ ∪ {∞} that maps runs of M to values in R⁺ ∪ {∞}. We use E^π_{M,s}[f] to denote the expected value of f with respect to the probability space induced by the MDP M, a strategy π, and a state s. We are interested in a strategy that has the least expected value for a given state. Given an MDP M and a state s, a strategy π is called optimal for objective f and state s if

  E^π_{M,s}[f] = min_{π′} E^{π′}_{M,s}[f],

where π′ ranges over all possible strategies. Given an MDP M = (S, s_0, A, Ã, p) and two cost functions c_1 : S × A → N and c_2 : S × A → N, the ratio payoff value is the function R : (S × A)^ω → R⁺ ∪ {∞} mapping every run ρ to a value in R⁺ ∪ {∞} as follows:

  R_{c_1 c_2}(ρ) := lim_{m→∞} liminf_{l→∞} ( Σ_{i=m}^{l} c_1(ρ_i) ) / ( 1 + Σ_{i=m}^{l} c_2(ρ_i) )    (3)

We drop the subscript c_1 c_2 if c_1 and c_2 are clear from the context.

3 Synthesis with Ratio Objectives in Probabilistic Environments

In this section, we first present a variant of the quantitative synthesis problem introduced in [10]. Then, we show how to solve the synthesis problem with safety and ratio specifications in a probabilistic environment described by an MDP. The quantitative synthesis problem with probabilistic environments asks to construct a finite-state system S that satisfies a qualitative specification and optimizes a quantitative specification under the given environment. The specifications are qualitative and quantitative languages over letters in (L × A), where L and A are the input and output alphabets of S, respectively. In order to compute the average behavior of a system, we assume a model of the environment.
In [15], the environment model is a probability space P = (L^ω, F, µ) over the input words L^ω of the system, defined by a finite L-labeled Markov chain. This model assumes that the behavior of the environment is independent of the behavior of the system, which restricts the modeling possibilities. For instance, a client-server system in which a client increases the probability of sending a request if it has not been served in the previous step cannot be modeled using this approach. Therefore, our environment model is a function f_e that maps every system function f_s : L* → A to a probability space P = (L^ω, F, µ) over the input words L^ω. Note that every finite-state system defines such a system function f_s, but not vice versa. To describe a particular environment model f_e, we use a finite L-labeled Markov decision process. Once we have an environment model, we can define what it means for a system to satisfy a specification under a given environment.

Definition 1 (Satisfaction). Given a finite-state system S with alphabets L and A, a qualitative specification ϕ over the alphabet L × A, and an environment model f_e, we say that S satisfies ϕ under f_e (written S |=_{f_e} ϕ²) if S satisfies ϕ with probability 1, i.e., E_{f_e(S)}[ϕ ∘ O_S] = 1.

Recall that O_S denotes the function that maps input words to joint input/output words, and that ϕ is a qualitative specification, which maps (input/output) words to 0 or 1. Hence, ϕ ∘ O_S denotes the function that maps an input word to 1 if the behavior of the system S for this input word satisfies the specification ϕ, and to 0 otherwise. The expression E_{f_e(S)}[f] for some measurable function f denotes the expected value of f under the probability distribution induced by the system S under the environment model f_e.
Hence, Definition 1 says that a system satisfies a specification under a probabilistic environment model if almost all behaviors of the system satisfy the specification, i.e., the probability that the system misbehaves is 0.

² Note that S |=_{f_e} ϕ and S |= ϕ coincide if (i) ϕ is prefix-closed (which is the case for the specifications we consider here), and (ii) f_e(S) assigns, for every finite word w ∈ L*, a positive probability to the set of infinite words wL^ω.

Figure 1: Specifications for the client-server example. (a) An automaton stating mutual exclusion; (b) an automaton with cost functions for client i.

Next, we define the value of a system with respect to a specification under an environment model and what it means for a system to optimize a specification. Then, we are ready to define the quantitative synthesis problem.

Definition 2 (Value of a system). Given a finite-state system S with alphabets L and A, a qualitative specification ϕ and a quantitative specification ψ over the alphabet L × A, and an environment model f_e, the value of S with respect to ϕ and ψ under f_e is defined as the expected value of the function ψ ∘ O_S in the probability space f_e(S) if S satisfies ϕ, and ∞ otherwise. Formally,

  Value^{f_e}_{ϕψ}(S) := E_{f_e(S)}[ψ ∘ O_S] if S |=_{f_e} ϕ, and ∞ otherwise.

If ϕ is the set of all words, then we write Value^{f_e}_ψ(S). Furthermore, we say S optimizes ψ with respect to f_e if Value^{f_e}_ψ(S) ≤ Value^{f_e}_ψ(S′) for all systems S′.

Definition 3 (Quantitative realizability and synthesis problem).
Given a qualitative specification ϕ and a quantitative specification ψ over the alphabet L × A and an environment model f_e, the realizability problem asks to decide if there exists a finite-state system S with alphabets L and A such that Value^{f_e}_{ϕψ}(S) ≠ ∞. The synthesis problem asks to construct a finite-state system S (if it exists) such that
1. Value^{f_e}_{ϕψ}(S) ≠ ∞ and
2. S optimizes ψ with respect to f_e.

In the following, we give an example of a quantitative synthesis problem.

Server-client example. Consider a server-client system with two clients and one server. Each server-client interface consists of two variables r_i (request) and a_i (acknowledge). Client i sends a request by setting r_i to 1. The server acknowledges the request by setting a_i to 1. We require that the server does not acknowledge both clients at the same time. Hence, our qualitative specification demands mutual exclusion. Figure 1(a) shows an automaton stating the mutual exclusion property for a_1 and a_2. Edges are labeled with sets of evaluations of a_1 and a_2, e.g., a label stating that a_1 has to be 0 while a_2 can have either value, 1 or 0. States drawn with a double circle are safe states. Among all systems satisfying the mutual exclusion property, we ask for a system that minimizes the average ratio between requests and useful acknowledgments. An acknowledgment is useful if it is sent as a response to a request. To express this property, we can give a quantitative language defined by an automaton with two cost functions (c_1, c_2).

Figure 2: Specifications and implementation for the client-server example. (a) The MDP modeling one client; (b) an implementation of a server for two clients.
The server in Figure 2(b) has the state labeling τ(m_0) = a_1 a_2 and τ(m_1) = a_1 a_2. The quantitative language combines the cost automaton with the ratio objective (Eqn. (1)). Figure 1(b) shows an automaton labeled with tuples representing the two cost functions c_1 and c_2 for one client. The first component of each tuple represents cost function c_1; the second component defines cost function c_2. The cost function c_1 is 1 whenever we see a request. The cost function c_2 is 1 when we see a "useful" acknowledgment, which is an acknowledgment that matches an unacknowledged request. E.g., every acknowledgment in state s_1 is useful, since the last request has not been acknowledged yet. In state s_0, only acknowledgments that answer a direct request are useful and get cost 1 (in the second component). This corresponds to a server with a buffer that can hold exactly one request, which gets outdated after two steps and has to be dropped. State s_1 says that there is a request in the buffer. If there is no acknowledgment while the machine is in this state, then the request is lost. This means that a request has to be acknowledged in the step it is received or in the step after that.

Assume we know the expected behavior of the clients. E.g., in every step, Client 1 is expected to send a request with probability 0.5, independent of the acknowledgments. Client 2 changes its behavior based on the acknowledgments. We can describe the behavior of Client 2 by the labeled MDP shown in Figure 2(a). In the beginning, the chance of getting a request from this client is 0.5. Once it has sent a request, i.e., it is in state r, the probability of sending a request again is very high until at least one acknowledgment is given. This is modeled by the action g at state r having a probability of 3/4 to get into state r again, and a probability of 1/4 to not send a request in the next step.
In this case, we move to the right r state. In this state, the probability of receiving a request from this client in the next step is even 7/8. This means that if this client does not receive an acknowledgment after having sent a request, then the probability of receiving another request from this client in the next two steps is 1 − 1/4 · 1/8 = 31/32.

Consider the finite-state system S shown in Figure 2(b). It is an implementation of a server for two clients. The system has two states m_0 and m_1, labeled with a_1 a_2 and a_1 a_2, respectively. We can compute the value of S using the following two lemmas (Lem. 1, Lem. 2).

Lemma 1. Given (i) a finite-state system S with alphabets L and A, (ii) an automaton A with alphabet L × A, and (iii) an L-labeled MDP M defining an environment model for S, there exist a Markov chain M_c and two cost functions c_1 and c_2 such that

  S |=_M L_A  ⟺ (Def. 1)  E_{P^S_M}[L_A ∘ O_S] = 1  ⟺  E_{M_c}[R_{c_1 c_2}] = 0.

Proof idea: The Markov chain M_c is constructed by taking the synchronous product of S, A, and M. In every state (s, q, m) ∈ (S_S × Q_A × S_M), we take the action a ∈ A given by the labeling function τ(s) of the system and move to a successor state for every input label l ∈ L such that there exists a state m′ in the MDP M with λ(m′) = l and p(m, a)(m′) > 0. The corresponding successor states of the system and automaton components are s′ = δ_S(s, l) and q′ = δ_A(q, (l, a)). The probability distribution of M_c is taken from the M-component. The two cost functions are defined as follows: for a state (s, q, m) and an action a, we set c_1((s, q, m), a) = 0 and c_2((s, q, m), a) = 1 if q is a safe state in A, and otherwise c_1((s, q, m), a) = 1 and c_2((s, q, m), a) = 0.
Intuitively, since the non-safe states of A are (by definition) closed under δ_A and all actions in this set have the same cost, they all have the same value, namely ∞; so does every state from which this set is reached with positive probability.[3]

Lemma 2. Given (i) a finite-state system S with alphabets L and A, (ii) an automaton A with alphabet L × A and two cost functions c1 and c2, and (iii) an L-labeled MDP M defining an environment model for S, there exist a Markov chain M_c and two cost functions d1 and d2 such that

Value_{R_{c1,c2}}^M(S)  = (Def. 2)  E_M^S[R_{c1,c2} ∘ O_S] = E_{M_c}[R_{d1,d2}].

Proof idea: The construction is the same as the one for Lem. 1 except for the cost functions. The cost functions are simply copied from the component referring to the automaton, e.g., given a state (s, q, m) ∈ S_S × Q_A × S_M and an action a ∈ A, d1((s, q, m), a) = c1(q) and d2((s, q, m), a) = c2(q).

In Section 4, we show how to compute an optimal value for MDPs with ratio objectives in polynomial time. Since Markov chains with ratio objectives are a special case of MDPs with ratio objectives, we can first use Lem. 1 to check whether S |=_M L_A. If the check succeeds, we then use Lem. 2 to compute the value Value_{R_{c1,c2}}^M(S). This algorithm leads to the following theorem.

Theorem 1 (System value). Given a finite-state system S with alphabets L and A, an automaton A with alphabet L × A defining a qualitative language, an automaton B with alphabet L × A and two cost functions c1 and c2 defining a quantitative language, and an L-labeled MDP M defining an environment model, we can compute the value of S with respect to L_A and R_{c1,c2} under P_M^S in time polynomial in the maximum of |S| · |A| · |M| and |S| · |B| · |M|.
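The synchronous product used in the proof ideas of Lemmas 1 and 2 can be sketched in code. The following Python fragment is an illustrative simplification, not the authors' Haskell implementation; the data-structure shapes and the toy instance at the bottom (states s0, q0, mr, mn, labels "r"/"n", action "ack") are our own assumptions. It builds the product of a system, a specification automaton, and an environment MDP, and attaches the safety costs of Lemma 1.

```python
# Sketch of the product Markov chain S x A x M from Lemma 1 (toy data).

def product_chain(sys_delta, sys_tau, aut_delta, safe, mdp_p, mdp_lab):
    """Return the product chain together with the safety costs c1, c2.

    sys_delta : (s, l) -> s'          system transition on input label l
    sys_tau   : s -> a                output (action) chosen by the system
    aut_delta : (q, (l, a)) -> q'     specification automaton transition
    safe      : set of safe automaton states
    mdp_p     : (m, a) -> {m': prob}  environment transition probabilities
    mdp_lab   : m -> l                label of each environment state
    """
    chain, c1, c2 = {}, {}, {}
    for s in sys_tau:
        for q in safe | set(aut_delta.values()):
            for m in mdp_lab:
                a = sys_tau[s]                 # action fixed by the system
                succ = {}
                for m2, p in mdp_p[(m, a)].items():
                    if p > 0:
                        l = mdp_lab[m2]        # input produced by the environment
                        s2 = sys_delta[(s, l)]
                        q2 = aut_delta[(q, (l, a))]
                        succ[(s2, q2, m2)] = succ.get((s2, q2, m2), 0.0) + p
                chain[(s, q, m)] = succ
                # Lemma 1's costs: punish leaving the safe region.
                if q in safe:
                    c1[(s, q, m)], c2[(s, q, m)] = 0, 1
                else:
                    c1[(s, q, m)], c2[(s, q, m)] = 1, 0
    return chain, c1, c2

# Toy instance: one system state, one safe automaton state, two
# environment states labelled "r" (request) and "n" (no request).
sys_delta = {("s0", "r"): "s0", ("s0", "n"): "s0"}
sys_tau = {"s0": "ack"}
aut_delta = {("q0", ("r", "ack")): "q0", ("q0", ("n", "ack")): "q0"}
mdp_p = {("mr", "ack"): {"mr": 0.5, "mn": 0.5},
         ("mn", "ack"): {"mr": 0.5, "mn": 0.5}}
mdp_lab = {"mr": "r", "mn": "n"}

chain, c1, c2 = product_chain(sys_delta, sys_tau, aut_delta, {"q0"},
                              mdp_p, mdp_lab)
```

Note that the product has no nondeterminism left: the system fixes the action, so the result is a Markov chain, exactly as the lemma states.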
In order to synthesize an optimal system, we construct an MDP from the environment model, the quantitative specification, and the qualitative specification, similar to the constructions in Lem. 1 and 2. Any optimal strategy for this MDP with a value different from ∞ corresponds to a system that satisfies the qualitative specification and optimizes the quantitative specification. In the next section, we show that MDPs with ratio objectives have pure memoryless optimal strategies. Therefore, we need to consider only strategies that are pure and memoryless. Given a pure and memoryless strategy, we build the corresponding system as follows: we reduce the set of enabled actions in each state to the single action specified by the strategy. In each state, the enabled action defines the output function of the system. Instead of choosing the next state probabilistically, the system moves from one state to the next depending on the chosen input value.

[3] Note that instead of an MDP with a ratio objective, we could also have set up a two-player safety game here.

In the next section, we show how to compute an optimal strategy for a given MDP in time polynomial in the number of states. This result, together with the construction above, leads to the following theorem.

Theorem 2 (Synthesis). Given an automaton A with alphabet L × A defining a qualitative language, an automaton B with alphabet L × A and two cost functions c1 and c2 defining a quantitative language, and an L-labeled MDP M defining an environment model, we can compute an optimal system S with respect to L_A and R_{c1,c2} in time polynomial in |A| · |B| · |M|.

4 Calculating the best strategy

In this section we first outline a proof showing that for every MDP there is a pure and memoryless optimal strategy for our payoff function.
To this end, we argue how the proof given by [25] can be adapted to our case. After that, we show how to calculate an optimal pure and memoryless strategy.

4.1 Pure and memoryless strategies suffice

In [25], Gimbert proved that in an MDP any payoff function mapping to R that is submixing and prefix-independent admits optimal pure and memoryless strategies. Since our payoff function R may also take the value ∞, we cannot apply the result immediately. However, since R maps only to non-negative values, and the set of measurable functions is closed under addition, multiplication, limit inferior and superior, and division (provided that the divisor is not equal to 0), the expected value of R is always defined, and the theory presented in [25] also applies in this case. Furthermore, to adapt the proof of [25] to minimizing the payoff function instead of maximizing it, one only needs to invert the inequalities used and replace max by min. What remains to show is that R fulfills the following two properties.

Lemma 3 (R is submixing and prefix-independent). Let M = (S, A, Ã, p) be an MDP and ρ be a run.
1. For every i ≥ 0, the prefix of ρ up to i does not matter, i.e., R(ρ) = R(ρ_i ρ_{i+1} …).
2. For every sequence of non-empty words u_0, v_0, u_1, v_1, … ∈ (A × S)^+ such that ρ = u_0 v_0 u_1 v_1 …, the payoff of ρ is greater than or equal to the minimal payoff of the sequences u_0 u_1 … and v_0 v_1 …, i.e., R(ρ) ≥ min{R(u_0 u_1 …), R(v_0 v_1 …)}.

Proof. The first property follows immediately from the first limit in the definition of R. For the second property, we partition N into U and V such that U contains the indexes of the parts of ρ that belong to some u_k, k ∈ N, and V contains the other indexes.
Formally, we define U := ⋃_{i∈N} U_i, where U_0 := {k ∈ N | 0 ≤ k < |u_0|} and U_i := {max(U_{i−1}) + |v_{i−1}| + k | 1 ≤ k ≤ |u_i|}. Let V := N \ U be the other indexes. Now we look at the payoff from m to l for some m ≤ l ∈ N, i.e., R_m^l := (∑_{i=m}^{l} c1(ρ_i)) / (1 + ∑_{i=m}^{l} c2(ρ_i)). We can divide the sums into the part belonging to U and the part belonging to V, and we get

R_m^l = ((∑_{i∈{m…l}∩U} c1(ρ_i)) + (∑_{i∈{m…l}∩V} c1(ρ_i))) / (1 + (∑_{i∈{m…l}∩U} c2(ρ_i)) + (∑_{i∈{m…l}∩V} c2(ρ_i))).

We now define the sub-sums between the parentheses as u_1 := ∑_{i∈{m…l}∩U} c1(ρ_i), u_2 := ∑_{i∈{m…l}∩U} c2(ρ_i), v_1 := ∑_{i∈{m…l}∩V} c1(ρ_i), and v_2 := ∑_{i∈{m…l}∩V} c2(ρ_i). Then we obtain

R_m^l = (u_1 + v_1) / (1 + u_2 + v_2).

We will now show that

R_m^l ≥ min( u_1 / (u_2 + 1), v_1 / (v_2 + 1) ).

Without loss of generality we can assume u_1/(u_2 + 1) ≥ v_1/(v_2 + 1); then we have to show that (u_1 + v_1)/(1 + u_2 + v_2) ≥ v_1/(v_2 + 1). This holds if and only if (u_1 + v_1)(1 + v_2) = u_1 + v_1 + u_1 v_2 + v_1 v_2 ≥ v_1(1 + u_2 + v_2) = v_1 + v_1 u_2 + v_1 v_2 holds. By subtracting v_1 and v_1 v_2 from both sides, we obtain u_1 + u_1 v_2 = u_1(1 + v_2) ≥ u_2 v_1. If u_2 is equal to 0, then this holds because u_1 and v_2 are greater than or equal to 0. Otherwise, it holds if and only if u_1/u_2 ≥ v_1/(1 + v_2) holds. In general, we have u_1/u_2 ≥ u_1/(u_2 + 1). From the assumption we have u_1/(u_2 + 1) ≥ v_1/(v_2 + 1), and hence u_1/u_2 ≥ v_1/(v_2 + 1). The original claim follows because we have shown this for every pair m and l.

Theorem 3 (There is always a pure and memoryless optimal strategy). For each MDP with the ratio payoff function, there is a pure and memoryless optimal strategy.

Proof.
See [25].

4.2 Reduction of MDP to a Linear Fractional Program

In this section, we show how to calculate a pure and memoryless optimal strategy for an MDP with a ratio objective by reducing the problem to a fractional linear programming problem. A fractional linear programming problem is similar to a linear programming problem, but the function that one wants to optimize is the fraction of two linear functions. A fractional linear programming problem can be reduced to a series of conventional linear programming problems to calculate the optimal value. We present the reduction only for unichain MDPs. The extension to general MDPs is based on end-components [1] and the fact that end-components have an optimal unichain strategy. Our reduction uses the fact that an MDP together with a pure and memoryless strategy induces a Markov chain, and that the runs of a Markov chain have a special property akin to the law of large numbers, which we can use to calculate the expected value.

Definition 4 (Random variables of MCs). Let p_n(s) be the probability of being in state s at step n, and let p*(s) := lim_{n→∞} (1/n) ∑_{i=0}^{n−1} p_i(s). This is called the Cesàro limit of p_n. Let further ν_s^n denote the number of visits to state s up to time n.

We have the following lemma describing the long-run behavior of unichain Markov chains [31, 29].

Lemma 4 (Expected number of visits of a state and well-behaved runs). For every infinite run of a unichain Markov chain, the fraction of visits to a specific state s equals p*(s) almost surely, i.e., P(lim_{l→∞} ν_s^l / l = p*(s)) = 1. We call the set of runs that have this property well-behaved.

When we calculate the expected payoff, we only need to consider well-behaved runs, as shown in the following lemma.

Lemma 5.
Let N denote the set of runs that are not well-behaved. Then

E_M[R] = ∫_{Ω_M \ N} R dμ_M.

Proof. The probability measure of the set of well-behaved runs is 1. Hence the probability measure of the complement of this set, i.e., N, has to be 0. Sets like these are called null sets. A classical result says that null sets need not be considered for the Lebesgue integral.

For a well-behaved run, i.e., for every run that we need to consider when calculating the expected value, we can calculate the payoff in the following way.

Lemma 6 (Calculating the payoff of a well-behaved run). Let ρ be a well-behaved run of a unichain Markov chain. Denote by π : S → A the only action available at a state. Then

R(ρ) = (∑_{s∈S} p*(s) c1(s, π(s))) / (lim_{l→∞}(1/l) + ∑_{s∈S} p*(s) c2(s, π(s))).

Proof. By definition of R we have

R(ρ) = lim_{m→∞} lim inf_{l→∞} (∑_{i=m}^{l} c1(ρ_i)) / (1 + ∑_{i=m}^{l} c2(ρ_i)).

We now assume that the Markov chain consists of one maximal recurrence class. We can do this because every non-recurrent state does not influence R(ρ), since ρ is well-behaved and since R is prefix-independent. Hence

R(ρ) = lim inf_{l→∞} (∑_{i=0}^{l} c1(ρ_i)) / (1 + ∑_{i=0}^{l} c2(ρ_i)).

We can calculate the sums in a different way: we take the sum over the states and count how often each state is visited, i.e.,

(∑_{i=0}^{l} c1(ρ_i)) / (1 + ∑_{i=0}^{l} c2(ρ_i)) = (∑_{s∈S} c1(s, π(s)) ν_s^l) / (1 + ∑_{s∈S} c2(s, π(s)) ν_s^l) = (∑_{s∈S} c1(s, π(s)) (ν_s^l / l)) / (1/l + ∑_{s∈S} c2(s, π(s)) (ν_s^l / l)).

Now we take lim instead of lim inf. We will see later that the sequence converges for l → ∞, and hence lim and lim inf have the same value.
Because both sides of the fraction are finite values, we can safely move the limit into the fraction, i.e.,

(†) lim_{l→∞} (∑_{s∈S} c1(s, π(s)) (ν_s^l / l)) / (1/l + ∑_{s∈S} c2(s, π(s)) (ν_s^l / l)) = (lim_{l→∞} ∑_{s∈S} c1(s, π(s)) (ν_s^l / l)) / (lim_{l→∞} (1/l + ∑_{s∈S} c2(s, π(s)) (ν_s^l / l))) = (∑_{s∈S} c1(s, π(s)) lim_{l→∞}(ν_s^l / l)) / (lim_{l→∞}(1/l) + ∑_{s∈S} c2(s, π(s)) lim_{l→∞}(ν_s^l / l)).

Finally, by the definition of well-behaved runs, we have lim_{l→∞} ν_s^l / l = p*(s). Hence

(∑_{s∈S} c1(s, π(s)) lim_{l→∞}(ν_s^l / l)) / (lim_{l→∞}(1/l) + ∑_{s∈S} c2(s, π(s)) lim_{l→∞}(ν_s^l / l)) = (∑_{s∈S} c1(s, π(s)) p*(s)) / (lim_{l→∞}(1/l) + ∑_{s∈S} c2(s, π(s)) p*(s)).

The limit diverges to ∞ if and only if the second costs are all equal to zero and at least one first cost is not. In this case the original definition of R diverges as well, and hence R and the last expression are the same. Otherwise, the last expression converges, hence (†) converges, and therefore lim inf and lim of this sequence are the same.

Note that the previous lemma implies that the value of a well-behaved run is independent of the actual run. In other words, on the set of well-behaved runs of a unichain Markov chain, the payoff function is constant.[4] Hence the expected value of such a Markov chain is equal to the payoff of any of its well-behaved runs.

Theorem 4 (Expected payoff of an MDP and a strategy). Let M be an MDP such that every pure and memoryless strategy induces a unichain MC. Let further p* denote the Cesàro limit of p_n of the induced Markov chain. Then for every pure and memoryless strategy π,

E_M^π[R] = (∑_{s∈S} c1(s, π(s)) p*(s)) / (lim_{l→∞}(1/l) + ∑_{s∈S} c2(s, π(s)) p*(s)).

Proof. This follows from the previous lemma and the fact that R is constant on any well-behaved run.
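The formula of Theorem 4 is easy to check numerically. The following Python sketch uses a hand-made two-state unichain Markov chain with our own toy costs (not the paper's client-server example): it approximates p* by power iteration and evaluates the ratio of the two cost expectations, where the lim 1/l term vanishes in the convergent case.

```python
# Toy two-state unichain Markov chain (our own numbers).
P = [[0.5, 0.5],     # transition matrix of the induced chain
     [0.25, 0.75]]
c1 = [1.0, 1.0]      # numerator costs  c1(s, pi(s))
c2 = [1.0, 0.0]      # denominator costs c2(s, pi(s))

def stationary(P, iters=2000):
    """Approximate the Cesaro limit p* by power iteration.

    For a unichain this converges to the unique stationary distribution.
    """
    n = len(P)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
    return p

p_star = stationary(P)          # analytically (1/3, 2/3) for this chain
value = sum(p * c for p, c in zip(p_star, c1)) / \
        sum(p * c for p, c in zip(p_star, c2))
```

For this chain, solving p* P = p* by hand gives p* = (1/3, 2/3), so the expected ratio payoff is 1 / (1/3) = 3: on average, three units of c1 cost accrue per unit of c2 cost.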
Note that this means that the expected value is ∞ if and only if the second cost of every action in the recurrence class of the Markov chain is 0 and at least one first cost is not.

Using this lemma, we are now able to transform the MDP into a fractional linear program. This is done in the same way as for the expected average payoff case (cf. [30]). We define variables x(s, a) for every state s ∈ S and every available action a ∈ Ã(s). Intuitively, this variable corresponds to the probability of being in state s and choosing action a at any time. Then we have, for example, p*(s) = ∑_{a∈Ã(s)} x(s, a). We need to restrict this set of variables. First of all, we always have to be in some state and choose some action, i.e., the sum over all x(s, a) has to be one. The second set of restrictions ensures that we have a stationary distribution, i.e., the sum of the probabilities of going out of (i.e., being in) a state is equal to the sum of the probabilities of moving into this state.

Definition 5 (Fractional linear program for an MDP). Let M be a unichain MDP such that every Markov chain induced by any strategy contains at least one non-zero second cost. Then we define the following fractional linear program for it:

Minimize (∑_{s∈S} ∑_{a∈Ã(s)} x(s, a) c1(s, a)) / (∑_{s∈S} ∑_{a∈Ã(s)} x(s, a) c2(s, a))   (4)

subject to

∑_{s∈S} ∑_{a∈Ã(s)} x(s, a) = 1   (5)

∑_{a∈Ã(s)} x(s, a) = ∑_{s′∈S} ∑_{a∈Ã(s′)} x(s′, a) p(s′, a)(s)   for all s ∈ S.   (6)

There is a correspondence between pure and memoryless strategies and basic feasible solutions to the linear program.[5] In particular, the linear program always has a solution, because every positional strategy corresponds to a solution. See [30] for a detailed analysis of this correspondence in the expected average reward case.
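To make Definition 5 concrete, here is a small Python check on a toy two-state MDP (our own numbers, not from the paper): the occupation measure x(s, a) induced by a pure memoryless strategy, i.e., x(s, π(s)) = p*(s) and zero elsewhere, satisfies constraints (5) and (6), and objective (4) evaluated on it equals the strategy's expected ratio payoff.

```python
# Toy MDP: two states, state s0 has two actions (our own numbers).
S = ["s0", "s1"]
A = {"s0": ["a", "b"], "s1": ["a"]}                 # enabled actions
p = {("s0", "a"): {"s0": 0.5, "s1": 0.5},           # p(s, a)(s')
     ("s0", "b"): {"s0": 1.0},
     ("s1", "a"): {"s0": 0.25, "s1": 0.75}}
c1 = {("s0", "a"): 1, ("s0", "b"): 1, ("s1", "a"): 1}
c2 = {("s0", "a"): 1, ("s0", "b"): 0, ("s1", "a"): 0}

# Occupation measure of the strategy pi(s0) = pi(s1) = a; its induced
# chain has stationary distribution (1/3, 2/3) (solve p* P = p* by hand).
x = {("s0", "a"): 1/3, ("s0", "b"): 0.0, ("s1", "a"): 2/3}

def feasible(x, eps=1e-9):
    # Eqn (5): total mass one.
    if abs(sum(x.values()) - 1.0) > eps:
        return False
    # Eqn (6): stationarity -- outflow of s equals inflow into s.
    for s in S:
        out = sum(x[(s, a)] for a in A[s])
        inn = sum(x[(s2, a)] * p[(s2, a)].get(s, 0)
                  for s2 in S for a in A[s2])
        if abs(out - inn) > eps:
            return False
    return True

# Objective (4) on this occupation measure.
ratio = sum(x[k] * c1[k] for k in x) / sum(x[k] * c2[k] for k in x)
```

Here ratio evaluates to 3, matching the expected ratio payoff of the strategy that always plays a.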
Once we have calculated a solution of the linear program, we can extract the strategy as follows.

Definition 6 (Strategy from a solution of the linear program). Let x(s, a) be a solution to the linear program. Then we define the strategy as follows:

π(s) = { arbitrary, if x(s, a) = 0 for every enabled action a;  a, if x(s, a) > 0 }.

Note that this is well defined: for each state s there is at most one action a with x(s, a) > 0, because of the bijection (modulo the actions of transient states) between basic feasible solutions and strategies, and because the optimal strategy is always pure and memoryless.

[4] Note that the fact that any prefix-independent payoff function is constant almost surely on each irreducible Markov chain has already been proved by [25].
[5] A feasible solution is one that fulfills the linear equations that every solution is subject to.

4.3 From LFP to LP

Since solvers for linear fractional programs are not common, while there are good free solvers for linear programs, we present a method for converting a linear fractional program into a sequence of linear programs that calculate the solution. This algorithm is due to [27]. Let f(x) denote the value of Eqn. 4 under variable assignment x.

Input: feasible solution x_0, MDP M
Output: variable assignment, optimal solution
n ← 0
repeat
    g ← f(x_n)
    n ← n + 1
    Solve: Minimize ∑_{s∈S} ∑_{a∈Ã(s)} x_n(s, a) c1(s, a) − g · ∑_{s∈S} ∑_{a∈Ã(s)} x_n(s, a) c2(s, a)
           subject to Eqn. 5 and Eqn. 6
until f(x_{n−1}) = f(x_n)
return x_n, f(x_n)

4.4 Preliminary Implementation

We have developed a tool that can handle (finite) unichain MDPs with ratio objectives based on the approach presented in this paper.
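The iteration above (due to Isbell and Marlow [27]) can be sketched in a few lines of Python. Wiring up a real LP solver is beyond a short example, so the inner minimization below simply enumerates the occupation measures of the pure memoryless strategies of a toy problem (these vertices and all numbers are our own assumptions); the authors' tool instead solves the LP of Eqns. 5–6 with GLPK at each step. The enumeration is sound for illustration because an optimal basic feasible solution corresponds to a pure memoryless strategy.

```python
# Vertices of the toy feasible region: one occupation measure per pure
# memoryless strategy, paired with its cost vectors (our own numbers).
# Each vertex is (x, c1, c2), all three dicts indexed alike.
V1 = ({"sa": 1/3, "ta": 2/3}, {"sa": 1, "ta": 1}, {"sa": 1, "ta": 0})
V2 = ({"sb": 1.0},            {"sb": 1},          {"sb": 2})
vertices = [V1, V2]

def f(v):
    """Ratio objective (Eqn. 4) at occupation measure v."""
    x, c1, c2 = v
    return sum(x[k] * c1[k] for k in x) / sum(x[k] * c2[k] for k in x)

def solve_lfp(vertices):
    v = vertices[0]                      # any feasible solution to start
    while True:
        g = f(v)
        # Inner step: minimize sum x * (c1 - g * c2); here by enumeration,
        # in the paper by an LP over Eqns. 5 and 6.
        v2 = min(vertices,
                 key=lambda w: sum(w[0][k] * (w[1][k] - g * w[2][k])
                                   for k in w[0]))
        if f(v2) == f(v):                # value no longer improves: optimal
            return v2, f(v2)
        v = v2

best, value = solve_lfp(vertices)
```

Starting from V1 (ratio 3), the linearized objective at g = 3 prefers V2, whose ratio is 1/2; a second round confirms V2 is optimal and the loop stops.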
Our tool is implemented in Haskell and uses the GNU Linear Programming Kit to solve the resulting linear programs. We made some initial experiments using the server-client example from Section 3. In the case of two clients, we have an MDP with 24 states and 288 edges. Building and solving this system takes less than 100 milliseconds on a laptop with an Intel Core 2 Duo P8600 clocked at 2.40 GHz. The resulting machine behaves as follows: if it receives only one request at the start, then it acknowledges this request immediately. Whenever Client 2, i.e., the complicated client, sends a request, it also receives the acknowledgment, with one exception: when Client 1 has an outstanding request, i.e., if its qualitative specification is in state s1, and Client 2 has no outstanding request, then Client 1 receives the acknowledgment. The expected value is roughly 1.2 = 12/10. This means that, out of 12 requests, 10 can be served, i.e., 83.3%.

5 Conclusions and Future Work

We have presented a technique to automatically synthesize systems that satisfy a qualitative specification and optimize a quantitative specification under a given environment model. Our technique can handle qualitative specifications given by an automaton with a set of safe states, and quantitative specifications defined by an automaton with a ratio objective.

Currently, we are working on a better representation of the input specifications. In particular, we are aiming for a symbolic representation that would allow us to use a combined symbolic and explicit approach, which has been shown to be very effective for MDPs with long-run average objectives [32]. Furthermore, we are extending the presented approach to qualitative specifications described by arbitrary ω-regular specifications.

References

[1] L.
de Alfaro (1997): Formal Verification of Probabilistic Systems. Ph.D. thesis, Stanford University.
[2] Luca de Alfaro (1998): Stochastic Transition Systems. In: Davide Sangiorgi & Robert de Simone, editors: CONCUR, Lecture Notes in Computer Science 1466, Springer, pp. 423–438. Available at http://link.springer.de/link/service/series/0558/bibs/1466/14660423.htm.
[3] Luca de Alfaro, Thomas A. Henzinger & Rupak Majumdar (2003): Discounting the Future in Systems Theory. In: Jos C. M. Baeten, Jan Karel Lenstra, Joachim Parrow & Gerhard J. Woeginger, editors: ICALP, Lecture Notes in Computer Science 2719, Springer, pp. 1022–1037. Available at http://link.springer.de/link/service/series/0558/bibs/2719/27191022.htm.
[4] Luca de Alfaro, Rupak Majumdar, Vishwanath Raman & Mariëlle Stoelinga (2007): Game Relations and Metrics. In: LICS, IEEE Computer Society, pp. 99–108. Available at http://doi.ieeecomputersociety.org/10.1109/LICS.2007.22.
[5] Rajeev Alur, Aldric Degorre, Oded Maler & Gera Weiss (2009): On Omega-Languages Defined by Mean-Payoff Conditions. In: Luca de Alfaro, editor: FOSSACS, Lecture Notes in Computer Science 5504, Springer, pp. 333–347. Available at http://dx.doi.org/10.1007/978-3-642-00596-1_24.
[6] C. Baier, M. Größer, M. Leucker, B. Bollig & F. Ciesinski (2004): Controller Synthesis for Probabilistic Systems. In: IFIP TCS, pp. 493–506.
[7] Christel Baier, Boudewijn R. Haverkort, Holger Hermanns & Joost-Pieter Katoen (2010): Performance evaluation and model checking join forces. Commun. ACM 53(9), pp. 76–85. Available at http://doi.acm.org/10.1145/1810891.1810912.
[8] G. Behrmann, J. Bengtsson, A. David, K. G. Larsen, P. Pettersson & W. Yi (2002): UPPAAL Implementation Secrets. In: Formal Techniques in Real-Time and Fault Tolerant Systems.
[9] Andrea Bianco & Luca de Alfaro (1995): Model Checking of Probabilistic and Nondeterministic Systems. In: P. S. Thiagarajan, editor: FSTTCS, Lecture Notes in Computer Science 1026, Springer, pp. 499–513. Available at http://dx.doi.org/10.1007/3-540-60692-0_70.
[10] Roderick Bloem, Krishnendu Chatterjee, Thomas A. Henzinger & Barbara Jobstmann (2009): Better Quality in Synthesis through Quantitative Objectives. In: Ahmed Bouajjani & Oded Maler, editors: CAV, Lecture Notes in Computer Science 5643, Springer, pp. 140–156. Available at http://dx.doi.org/10.1007/978-3-642-02658-4_14.
[11] Roderick Bloem, Karin Greimel, Thomas A. Henzinger & Barbara Jobstmann (2009): Synthesizing robust systems. In: FMCAD, IEEE, pp. 85–92. Available at http://dx.doi.org/10.1109/FMCAD.2009.5351139.
[12] Arindam Chakrabarti, Krishnendu Chatterjee, Thomas A. Henzinger, Orna Kupferman & Rupak Majumdar (2005): Verifying Quantitative Properties Using Bound Functions. In: Dominique Borrione & Wolfgang J. Paul, editors: CHARME, Lecture Notes in Computer Science 3725, Springer, pp. 50–64. Available at http://dx.doi.org/10.1007/11560548_7.
[13] Krishnendu Chatterjee, Luca de Alfaro, Marco Faella, Thomas A. Henzinger, Rupak Majumdar & Mariëlle Stoelinga (2006): Compositional Quantitative Reasoning. In: QEST, IEEE Computer Society, pp. 179–188. Available at http://doi.ieeecomputersociety.org/10.1109/QEST.2006.11.
[14] Krishnendu Chatterjee, Laurent Doyen & Thomas A. Henzinger (2008): Quantitative Languages. In: Michael Kaminski & Simone Martini, editors: CSL, Lecture Notes in Computer Science 5213, Springer, pp. 385–400. Available at http://dx.doi.org/10.1007/978-3-540-87531-4_28.
[15] Krishnendu Chatterjee, Thomas A.
Henzinger, Barbara Jobstmann & Rohit Singh (2010): Measuring and Synthesizing Systems in Probabilistic Environments. In: Tayssir Touili, Byron Cook & Paul Jackson, editors: CAV, Lecture Notes in Computer Science 6174, Springer, pp. 380–395. Available at http://dx.doi.org/10.1007/978-3-642-14295-6_34.
[16] Costas Courcoubetis & Mihalis Yannakakis (1990): Markov Decision Processes and Regular Events (Extended Abstract). In: Mike Paterson, editor: ICALP, Lecture Notes in Computer Science 443, Springer, pp. 336–349. Available at http://dx.doi.org/10.1007/BFb0032043.
[17] R. A. Cuninghame-Green (1979): Minimax algebra. In: Lecture Notes in Economics and Mathematical Systems 166, Springer-Verlag.
[18] C. Derman (1962): On Sequential Decisions and Markov Chains. Management Science 9(1), pp. 16–24.
[19] Josee Desharnais, Vineet Gupta, Radha Jagadeesan & Prakash Panangaden (2004): Metrics for labelled Markov processes. Theor. Comput. Sci. 318(3), pp. 323–354. Available at http://dx.doi.org/10.1016/j.tcs.2003.09.013.
[20] M. Droste & P. Gastin (2007): Weighted automata and weighted logics. Theoretical Computer Science 380, pp. 69–86. Available at http://dx.doi.org/10.1016/j.tcs.2007.02.055.
[21] M. Droste, W. Kuich & H. Vogler (2009): Handbook of Weighted Automata. Springer Publishing Company, Incorporated.
[22] Manfred Droste, Werner Kuich & George Rahonis (2008): Multi-Valued MSO Logics over Words and Trees. Fundam. Inform. 84(3-4), pp. 305–327. Available at http://iospress.metapress.com/content/j9652453g663425m/.
[23] J. Filar & K. Vrieze (1996): Competitive Markov Decision Processes. Springer-Verlag.
[24] Stephane Gaubert & Max Plus (1997): Methods and Applications of (MAX, +) Linear Algebra. In: Rüdiger Reischuk & Michel Morvan, editors: STACS, Lecture Notes in Computer Science 1200, Springer, pp. 261–282.
Available at http://dx.doi.org/10.1007/BFb0023465.
[25] Hugo Gimbert (2007): Pure Stationary Optimal Strategies in Markov Decision Processes. In: Wolfgang Thomas & Pascal Weil, editors: STACS, Lecture Notes in Computer Science 4393, Springer, pp. 200–211. Available at http://dx.doi.org/10.1007/978-3-540-70918-3_18.
[26] A. Hinton, M. Kwiatkowska, G. Norman & D. Parker (2006): PRISM: A Tool for Automatic Verification of Probabilistic Systems. In: TACAS.
[27] J. R. Isbell & W. H. Marlow (1956): Attrition games. Naval Research Logistics Quarterly 3, pp. 71–94.
[28] Orna Kupferman & Yoad Lustig (2007): Lattice Automata. In: Byron Cook & Andreas Podelski, editors: VMCAI, Lecture Notes in Computer Science 4349, Springer, pp. 199–213. Available at http://dx.doi.org/10.1007/978-3-540-69738-1_14.
[29] J. R. Norris (2003): Markov Chains. Cambridge University Press.
[30] M. L. Puterman (1994): Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience.
[31] H. C. Tijms (2003): A First Course in Stochastic Models. Chichester: Wiley.
[32] Ralf Wimmer, Bettina Braitling, Bernd Becker, Ernst Moritz Hahn, Pepijn Crouzen, Holger Hermanns, Catuscia Dhama & Oliver E. Theel (2010): Symblicit Calculation of Long-Run Averages for Concurrent Probabilistic Systems. In: QEST, IEEE Computer Society, pp. 27–36. Available at http://dx.doi.org/10.1109/QEST.2010.12.
[33] Uri Zwick & Mike Paterson (1996): The Complexity of Mean Payoff Games on Graphs. Theor. Comput. Sci. 158(1&2), pp. 343–359. Available at http://dx.doi.org/10.1016/0304-3975(95)00188-3.