Opportunistic Multiuser Scheduling in a Three State Markov-modeled Downlink
Authors: Sugumar Murugesan, Philip Schniter
The Ohio State University
Department of Electrical and Computer Engineering
2015 Neil Avenue, Columbus, Ohio 43210

August 29, 2008

Abstract

We consider the downlink of a cellular system and address the problem of multiuser scheduling with partial channel information. In our setting, the channel of each user is modeled by a three-state Markov chain. The scheduler indirectly estimates the channel via accumulated Automatic Repeat reQuest (ARQ) feedback from the scheduled users and uses this information in future scheduling decisions. Using a Partially Observable Markov Decision Process (POMDP) formulation, we pose a throughput maximization problem that extends our previous work, where the channels were modeled using two states. We recall the greedy policy that was shown to be optimal and easy to implement in the two-state case, and study the implementation structure of the greedy policy in the considered downlink. We classify the system into two types based on the channel statistics and obtain round robin structures for the greedy policy for each system type. We obtain performance bounds for the downlink system using these structures and study the conditions under which the greedy policy is optimal.

Index Terms: Markov channel, downlink, ARQ, multiuser scheduling, greedy policy.

1 Introduction

Opportunistic multiuser scheduling, introduced by Knopp and Humblet in [1] and defined as allocating the resources to the user experiencing the most favorable channel conditions, has gained immense popularity among network designers in the recent past.
Opportunistic multiuser scheduling essentially taps the multiuser diversity in the system and has motivated several researchers ([2, 3, 4, 5, 6]) to study the performance gains obtained by opportunistic scheduling under various scenarios. While the i.i.d. flat fading model is popularly used in modeling time-varying channels, it fails to capture the memory in the channel observed in realistic scenarios. This motivated researchers to consider the Gilbert-Elliott model [7], which represents the channel by a two-state Markov chain. Specifically, a user experiences error-free transmission when it observes a "good" channel, and unsuccessful transmission in a "bad" channel. Several works have studied opportunistic multiuser scheduling over this Markov-modeled channel [8, 9, 10, 11, 12]. The availability of channel state information at the scheduler is clearly crucial for the success of opportunistic scheduling schemes. Traditionally, when the scheduler has no channel information, pilot-based channel estimation is performed and the estimates are used for scheduling decisions ([2, 6, 13]). A newer line of work, [14, 15, 16, 17, 18], attempts to exploit Automatic Repeat reQuest (ARQ) feedback, traditionally used for error control at the data link layer, to estimate the state of two-state Markov-modeled channels. In [16] and [18], for a two-state Markov-modeled downlink (one-to-many communication) system, a greedy policy has been shown to be optimal from a sum throughput point of view. This greedy policy is amenable to an easy implementation: a simple round robin based algorithm that takes as input the ARQ feedback from the scheduled user.
Although modeling the channel by a two-state Markov chain is a welcome change from the traditional memoryless models, the scheduler can do better by discriminating the channel conditions at a finer level, i.e., if the channel is modeled by a Markov chain with more states. As a first step in this direction, in this report we model the channels by three-state Markov chains and study the properties of the greedy policy and the conditions under which it is optimal.

The report is organized as follows. The problem setup is described in Section 2, followed by a study of the implementation structure of the greedy policy in Section 3. A comparison of the original system with the genie-aided system, introduced in [18], is made in Section 4. In Section 5, upper and lower bounds on the system performance are derived. We study the conditions under which the greedy policy is optimal in Section 6. Conclusions are provided in Section 7.

2 Problem Setup

2.1 Channel Model - Probability Transition Matrix

We consider downlink transmissions with 2 users. The channel between the base station and each user is modeled by an i.i.d., first-order, three-state Markov chain. Time is slotted; the channel of each user remains fixed for a slot and evolves into another state in the next slot according to the Markov chain statistics. The time slots of all users are synchronized. The three-state Markov channel is characterized by a 3 × 3 probability transition matrix

P = [ p_11  p_12  p_13
      p_21  p_22  p_23
      p_31  p_32  p_33 ],   (1)

where p_ij is the probability of evolving from state i to state j in the next slot. The states represent the quantized strength of the channel, with state 1 representing the lower end of the channel strength spectrum and state 3 the higher end. We assume that the Markov chain is positively correlated in time; thus p_ii ≥ p_ji if j ≠ i.
Also, motivated by observation of realistic channels, we assume that the channel evolves in a smooth fashion across time; thus p_21 ≥ p_31 and p_23 ≥ p_13. Also, observing that state 3 represents a region of the channel strength spectrum that is not bounded from above, it is reasonable to assume p_32 ≤ p_12. To summarize, the transition matrix elements are related as below:

p_11 ≥ p_21 ≥ p_31
p_22 ≥ p_12 ≥ p_32
p_33 ≥ p_23 ≥ p_13    (2)

2.2 Existence of Steady State

Let p_ss denote the steady state probability vector of the Markov chain, with p_ss(i) representing the steady state probability of state i. We now rule out the instances of P matrix entries that either 1) obviously lead to a steady state p_ss(i) = 0 for some i ∈ {1, 2, 3}, or 2) eliminate the possibility of a steady state altogether. Both cases trivialize the scheduling problem we address in this report. From the inequalities governing the elements of the P matrix, we see that p_ii > 0; otherwise p_ji = 0, leading to p_ss(i) = 0. Also p_12 ≠ 0, since if p_12 = 0, then p_32 = 0 and thus p_ss(2) = 0 (once the channel enters state 1 or 3, it will never reach state 2 again). For similar reasons, p_21 ≠ 0 and p_23 ≠ 0. Thus the only elements that can be zero are p_13, p_31 and p_32. Among these, p_31 and p_32 cannot both be zero; otherwise p_33 = 1, making state 3 an absorbing state and leading to p_ss(1) = p_ss(2) = 0. Thus there can be at most one zero in any row and at most one zero in any column of P. This, along with the fact that all the entries are non-negative, renders P^2 a positive matrix (i.e., all its elements are positive). From [19] (p. 51), a nonnegative square matrix A is said to be regular iff there exists a natural number r such that A^r is a positive matrix. Thus P is a regular matrix with r ≥ 2. We now reproduce Theorem 4.2 from [19].

Theorem 1.
If A is a regular stochastic matrix, then A^n converges as n → ∞ to a positive stable stochastic matrix e Π', where Π = (π(i)), i ∈ state space, is a probability vector with non-null entries and e is a unit vector with dimension equal to the cardinality of the state space.

Thus the n-step transition probability matrices of the Markov channels in our problem converge to stable stochastic matrices. Since this is necessary and sufficient for the existence of a steady state, under the conditions established earlier, the Markov channels in our problem have a steady state, with the steady state probability vector given by p_ss.

2.3 Scheduling Problem

The base station is the central controller that controls the transmission to the users in each slot. In any time slot, the base station does not know the exact channel state of the users, and it must schedule the transmission of the head-of-line packet of exactly one user (a data queue is maintained for each user to collect the data meant for that user). Thus, TDMA-styled scheduling is performed here. The power spent in each transmission is fixed. At the beginning of each time slot, the head-of-line packet of the scheduled user is transmitted and dropped from the queue. The scheduled user, based on measurements of the signal strength of the received data packet, obtains information on the state of the channel and sends this back to the scheduler. We call this feedback F_i with i ∈ {1, 2, 3}. This channel state feedback is assumed to be transmitted over a dedicated error-free channel. This feedback information, along with the label of the slot in which it is acquired, is used in future scheduling decisions. The performance metric that the base station aims to maximize is the sum reward of the system; details are discussed in the next section.
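The regularity and steady-state claims of Section 2.2 (P^2 is positive, hence P is regular, and P^n converges to a matrix whose identical rows give p_ss) can be checked numerically. Below is a minimal sketch; the entries of P are illustrative assumptions chosen to satisfy the orderings in (2), not values from this report.

```python
# Sanity check of Section 2.2: regularity of P and convergence of P^n.
# The entries of P are illustrative assumptions satisfying the orderings in (2).

def mat_mul(A, B):
    """Product of two 3x3 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_pow(P, n):
    """n-th power of a 3x3 matrix by repeated multiplication."""
    R = [[float(i == j) for j in range(3)] for i in range(3)]  # identity
    for _ in range(n):
        R = mat_mul(R, P)
    return R

P = [[0.6, 0.3, 0.1],   # p11 >= p21 >= p31
     [0.2, 0.5, 0.3],   # p22 >= p12 >= p32
     [0.1, 0.2, 0.7]]   # p33 >= p23 >= p13

# At most one zero per row/column implies P^2 is positive, so P is regular.
P2 = mat_pow(P, 2)
assert all(x > 0 for row in P2 for x in row)

# Theorem 1: P^n converges to a stochastic matrix with all rows equal to pss.
Pn = mat_pow(P, 100)
pss = Pn[0]
assert all(abs(Pn[i][j] - pss[j]) < 1e-9 for i in range(3) for j in range(3))
```

For this particular P, the rows of P^100 agree to machine precision and give p_ss = [9/34, 11/34, 14/34] ≈ [0.265, 0.324, 0.412].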
2.4 Formal Problem Definition

Since the base station must make scheduling decisions based on only a partial observation of the underlying Markov chain, we employ techniques from partially observable Markov decision process (POMDP) theory [20, 21, 22, 23] in this work. We now proceed to introduce the terms/entities that we use in this work, many of which are borrowed from the POMDP literature.

Control interval k: Each time slot in our problem setup will henceforth be called a control interval. The "end" of the POMDP is fixed. A control interval is indexed by k if there are k − 1 more intervals until the end of the process.

Action a_k: Indicates the index of the user (1 or 2) scheduled in control interval k.

Belief vector of user i at the k-th control interval, π_{k,i}: Element π_{k,i}(j) denotes the probability that the channel of user i ∈ {1, 2} in the k-th control interval is in state j ∈ {1, 2, 3}, given all the past information about that channel. If F_j was received from user i, l + 1 control intervals earlier, with l ∈ {0, 1, 2, ...}, then the belief vector in the current interval k is given by π_{k,i} = [p_j1 p_j2 p_j3] P^l. We will henceforth denote the vector [p_j1 p_j2 p_j3] by p_j. If user i is not scheduled in control interval k, then the belief vector of this user evolves to the next interval as follows: π_{k−1,i} = π_{k,i} P. It has been proven in [20] that the belief vector π_{k,i} is a sufficient statistic for all past information about the channel of user i, in our case the past scheduling decisions and channel feedback. Thus the scheduling decision in any control interval can be based solely on the belief vectors for that interval, and not on the past channel feedback or schedule information.
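The belief update π_{k,i} = p_j P^l above is a simple vector-matrix recursion. A minimal sketch (the transition matrix below is an illustrative assumption, not from this report):

```python
# Belief update of Section 2.4: pi_{k,i} = p_j P^l when feedback F_j was
# received l+1 control intervals earlier. P is an illustrative assumption.

def vec_mat(v, M):
    """Row vector times 3x3 matrix."""
    return [sum(v[k] * M[k][j] for k in range(3)) for j in range(3)]

P = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.2, 0.7]]

def belief(j, l):
    """Belief of a user whose last feedback, l+1 intervals ago, was F_j:
    start from row p_j of P and apply P another l times."""
    pi = list(P[j - 1])
    for _ in range(l):
        pi = vec_mat(pi, P)
    return pi

# Feedback F_3 one interval ago (l = 0): the belief is simply row p_3.
assert belief(3, 0) == P[2]
# Each further unscheduled interval multiplies the belief by P once more.
assert belief(2, 1) == vec_mat(P[1], P)
```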
Scheduling Policy A_k: A scheduling policy A_k in control interval k is a mapping from the belief vectors and the control interval index to an action:

A_k : (π_{k,1}, π_{k,2}, k) → a_k,  ∀ k ≥ 1.

Note that the scheduling policy can, in general, be time-variant.

Reward Structure: In any control interval k, a reward of α_i is accrued when the scheduled user sends back F_i. Let state 1 be defined such that no reward is accrued when a user in state 1 is scheduled, i.e., α_1 = 0. This assumption can be satisfied by letting state 1 represent the channel strengths that do not allow any useful data transfer. Since state 3 represents channel strengths that are better than those represented by state 2, we have α_3 ≥ α_2. Throughout this report we assume α_3 = 1 without loss of generality.

Net Expected Reward in control interval m, V_m: With the belief vectors π_{m,1}, π_{m,2} and the scheduling policy {A_k}_{k≤m} fixed, the net expected reward V_m is the sum of the reward R_m(π_{m,a_m}, a_m) expected in the current control interval m and E[V_{m−1}], the net reward expected in the future control intervals conditioned on the belief vectors and the scheduling decision in the current control interval. Formally,

V_m(π_{m,1}, π_{m,2}, {A_k}_{k≤m}) = R_m(π_{m,a_m}, a_m) + E[V_{m−1}(π_{m−1,1}, π_{m−1,2}, {A_k}_{k≤m−1}) | π_{m,1}, π_{m,2}, a_m],

where the expectation is over the belief vectors π_{m−1,1}, π_{m−1,2}. With α = [α_1 α_2 α_3]^T (x^T denotes the transpose of vector x), the expected current reward can be written as R_m(π_{m,a_m}, a_m) = π_{m,a_m} α. Note that if user a_m was observed to be in state i in the previous interval, then π_{m,a_m} = p_i and R_m(π_{m,a_m}, a_m) = p_i α.
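Since the expected current reward is just the inner product π_{m,a_m} α, it can be computed in a few lines. A minimal sketch (the transition matrix and the value α_2 = 0.5 below are illustrative assumptions; α_1 = 0 and α_3 = 1 as in the text):

```python
# Expected current reward R_m = pi . alpha for a user whose belief is a row
# p_i of P (i.e., the user was observed in state i in the previous interval).
# P and alpha_2 = 0.5 are illustrative assumptions.

P = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.2, 0.7]]
alpha = [0.0, 0.5, 1.0]   # alpha_1 = 0, alpha_3 = 1 as in the text

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

rewards = [dot(P[i], alpha) for i in range(3)]  # p_1.alpha, p_2.alpha, p_3.alpha
# Under the orderings in (2), the expected reward is nondecreasing in the
# previously observed state.
assert rewards[0] <= rewards[1] <= rewards[2]
```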
Performance Metric - the Sum Reward, η_sum: For a given scheduling policy {A_k}_{k≥1}, the sum reward is given by

η_sum({A_k}_{k≥1}) = lim_{m→∞} V_m(p_ss, p_ss, {A_k}_{k≥1}) / m,   (3)

where p_ss is the steady state probability vector of the underlying Markov channels.

Optimal Scheduling Policy, {A*_k}_{k≥1}:

{A*_k}_{k≥1} = arg max_{ {A_k}_{k≥1} } η_sum({A_k}_{k≥1}).   (4)

3 Structure of the Greedy Policy

Consider the following policy:

Â_k : (π_{k,1}, π_{k,2}, k) → a_k = arg max_{a_k} R_k(π_{k,a_k}, a_k) = arg max_i π_{k,i} α,  ∀ k ≥ 1.

Since the above policy attempts to maximize the expected current reward, without any regard to the expected future reward, it follows an approach that is fundamentally greedy in nature. For this reason, we henceforth call {Â_k}_{k≥1} the Greedy Policy. We now proceed to derive the implementation structure of the greedy policy.

Lemma 2. For any k ≥ 0, the immediate reward expected by scheduling a user that was observed, k + 1 control intervals earlier, to be in state 2 lies between the rewards corresponding to states 3 and 1, i.e.,

p_1 P^k α ≤ p_2 P^k α ≤ p_3 P^k α,  ∀ k ∈ {0, 1, 2, ...}.   (5)

Lemma 3. The immediate reward expected by scheduling a user that was observed, k + 1 control intervals earlier, to be in state 3 monotonically decreases to p_ss α as k increases from 0 to ∞, i.e.,

p_3 P^{k+1} α ≤ p_3 P^k α,  ∀ k ∈ {0, 1, 2, ...},  and  p_3 lim_{k→∞} P^k α = p_ss α.   (6)

Note that p_ss α is the immediate reward expected when no past information about the user is available, i.e., when the belief vector of the user equals the steady state vector p_ss.

Lemma 4.
The immediate reward expected by scheduling a user that was observed, k + 1 control intervals earlier, to be in state 1 monotonically increases to p_ss α as k increases from 0 to ∞, i.e.,

p_1 P^{k+1} α ≥ p_1 P^k α,  ∀ k ∈ {0, 1, 2, ...},  and  p_1 lim_{k→∞} P^k α = p_ss α.   (7)

Note that, from the above lemmas, we have

p_2 lim_{k→∞} P^k α = p_ss α.   (8)

In all the above results, the immediate reward approaches p_ss α as the time since the last observation of the user increases. This is because, in the underlying first-order Markov chain, the dependency between the states in two control intervals (memory) diminishes as the time gap between the control intervals increases. These lemmas are instrumental in obtaining the algorithm for implementing the greedy policy, summarized soon. We first identify two types of system based on the properties of the P matrix and the reward values:

• Type I system: p_2 α ≥ p_ss α
• Type II system: p_2 α < p_ss α

The implementation algorithm for the greedy policy changes significantly depending on the type of the system.

Proposition 5. When the system is Type I, the greedy policy is implemented as follows:

• If feedback F_3 or F_2 was received from the user scheduled in the previous control interval (identified as user s), reschedule that user in the current control interval.
• If feedback F_1 was received, schedule the other user (identified as user u).

Proof. Referring to Fig. 1, when F_3 was received from user s, the expected reward if s is scheduled again is given by p_3 α. The expected reward if u is scheduled is a point on one of the three curves (for k > 0) in the figure. Note that p_3 α is greater than any point (in the y-dimension) on any of the curves, thus establishing the 'retain the schedule if F_3 is received' policy.
[Figure 1: Type I system. The expected immediate rewards p_1 P^k α, p_2 P^k α and p_3 P^k α plotted against k, each converging to p_ss α.]

This result essentially stems from the following facts: 1) a higher reward (α_3 = 1) is accrued when the scheduled user happens to be in state 3 than in the other states; 2) the Markov channel is positively correlated in time (p_ii ≥ p_ji if i ≠ j). Similarly, when F_1 was received from user s, the expected reward if s is scheduled again is given by p_1 α, which is less than any other point on the three curves, thus establishing the 'switch if F_1 is received' policy. When F_2 is received, assuming the greedy policy was implemented since the beginning of the scheduling process, the reward expected if u is scheduled lies on the lower curve p_1 P^k α for k > 0. This is because, the first time (since the beginning of the scheduling process) an F_2 is received (call this interval m_0), if the greedy policy was implemented so far, user u (the waiting user) would not have given F_3 when it was dropped, and since this is the first time F_2 is observed by the scheduler, u would not have sent F_2 either when it was dropped. Therefore u must have sent F_1 the last time it was scheduled (and hence dropped). Thus the reward expected if u is scheduled now (at m_0) falls on the bottom curve, leading to the retention of user s (since p_2 α ≥ p_1 P^k α for any k ≥ 0). At the next instance of F_2 reception, the same logic holds (as long as the greedy policy is implemented all along until that instance), and so on for subsequent instances of F_2. Note that the condition greedy must be implemented since the beginning until 'now' is quite natural given our interest in implementing the policy in the current interval; thus there is no loss of generality here. These arguments establish the proposition.

Proposition 6.
When the system is Type II, the greedy policy is implemented as follows:

• If feedback F_3 was received from the user scheduled in the previous control interval (call it user s), reschedule that user in the current control interval.
• If feedback F_1 was received, schedule the other user.
• If feedback F_2 was received, calculate the expected immediate reward if the other user (identified as user u) is scheduled in the current interval (identified as m) as π_{m,u} α, where π_{m,u} is the belief vector of user u in the current control interval m. Now, schedule user s if p_2 α ≥ π_{m,u} α; otherwise, schedule user u.

Proof. Refer to Fig. 2. The arguments for F_3 and F_1 are the same as in the previous case.

[Figure 2: Type II system. The expected immediate rewards p_1 P^k α, p_2 P^k α and p_3 P^k α plotted against k; here p_2 P^k α need not increase monotonically to p_ss α.]

When F_2 is received, as seen from Fig. 2, the waiting user u could have an expected reward greater than that of s if u had been dropped due to F_1 at least k_0 intervals earlier, or if p_2 P^k α does not monotonically increase to p_ss α (Fig. 2 shows such a situation). Thus it is necessary to explicitly calculate the expected reward of user u before making a greedy decision.

Note that the results in Lemmas 2-4, and hence the implementation structure of the greedy policy in Propositions 5-6, hold even when α_1 > 0, as long as α_1 ≤ α_2 ≤ α_3.

4 Comparison with the Genie-aided System

In the two-user, two-state case, if the user scheduled in the previous control interval (user s) was observed to be in the best state, the scheduler retains the schedule (and hence accrues the best possible reward), since there is nothing more to gain by scheduling the other user, while a loss is possible if the other user was in the worst state.
Similarly, if user s was observed to be in the worst state, the scheduler switches to the other user, since there is nothing more to lose by scheduling the other user (as compared to scheduling s again), while a gain is possible if the other user was in the best state. Thus the two-user, two-state system is equivalent, in performance, to a genie-aided system where the scheduler learns the states of both users at the end of every interval.

This equivalence does not hold in the three-state system. The nothing more to gain argument works when s was observed to be in state 3, and the nothing more to lose argument works when s was observed to be in state 1. However, when s was observed to be in state 2, i.e., when F_2 was received, by scheduling the other user (user u) the scheduler may either gain (if u was in state 3) or lose (if u was in state 1) as compared to when it schedules s again. Thus, with information about the state of the other user, there is definitely room for improvement, and so the three-state (in general, more than two-state) system is not equivalent to the genie-aided system. Note that the genie-aided system can be redefined as follows: the scheduler learns the states of both users iff s was observed in state 2. We see from the discussion so far that this modified definition does not impart any performance loss in the genie-aided system.

From the preceding discussion, it can be seen that the original three-state system approaches the genie-aided system under any of the following cases:

• p_2 α = p_3 α. Then, on receiving F_2 from user s, nothing more can be gained by scheduling the other user u (while a loss is possible on switching). Hence, s is rescheduled, and there is no need to learn the previous control interval state of u.
• p_2 α = p_1 α.
Then, on receiving F_2 from user s, nothing more can be lost by scheduling the other user u (while a gain is possible on switching). Hence, u is scheduled. Again, there is no need to learn the previous control interval state of u.

With mathematical analysis, it can be seen that case 1 is possible iff α_2 = α_3 and p_21 = p_31, while case 2 is possible iff α_2 = α_3 and p_11 = p_21. When the first set of conditions is satisfied, states 2 and 3 can be merged at a very generic level (not specific to the type of information used for scheduling), with the reduced transition matrix given below:

[ p_11   p_12 + p_13
  p_21   p_22 + p_23 ],   (9)

where row 1 and column 1 correspond to state 1, and row 2 and column 2 correspond to the merged state. Thus the channel is effectively modeled by a two-state Markov chain, explaining the equivalence with the genie-aided system. However, it is interesting to note that, when the second set of conditions is satisfied, such a merger is not possible between states 1 and 2, since we still have p_13 ≤ p_23, making them different in their relationship with state 3. Nevertheless, in the context of the ARQ-based scheduling problem (specifically case 2 in the preceding discussion), they are synonymous and render the original system equivalent to the genie-aided system.

5 Bounds on the System Sum Reward Capacity

Proposition 7. For the Type I system, a lower bound on the sum reward capacity, S_LB,I, is given by

S_LB,I = p_2 α − p_ss(1)^2 (p_2 α − p_1 α),   (10)

where p_ss(1) is the steady state probability that the state of the user is 1.

This bound is obtained by replacing the expected reward given F_3, i.e., p_3 α, with p_2 α in the sum reward evaluation of the greedy policy. Thus this is in fact a lower bound on the sum reward of the greedy policy.
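The Type I lower bound (10) is straightforward to evaluate once p_ss is known. A minimal numerical sketch follows; P and α are illustrative assumptions (not values from this report), with α chosen so that p_2 α ≥ p_ss α, i.e., the example is indeed Type I.

```python
# Evaluating the Type I lower bound (10):
#   S_LB,I = p_2.alpha - pss(1)^2 (p_2.alpha - p_1.alpha).
# P and alpha are illustrative assumptions.

P = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.1, 0.2, 0.7]]
alpha = [0.0, 1.0, 1.0]   # alpha_1 = 0, alpha_3 = 1; alpha_2 assumed

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def steady_state(P, n=200):
    """Any row of P^n for large n approximates pss (P is regular)."""
    row = list(P[0])
    for _ in range(n):
        row = [sum(row[k] * P[k][j] for k in range(3)) for j in range(3)]
    return row

pss = steady_state(P)
p1a, p2a = dot(P[0], alpha), dot(P[1], alpha)
assert p2a >= dot(pss, alpha)   # Type I condition: p_2.alpha >= pss.alpha

S_LB_I = p2a - pss[0] ** 2 * (p2a - p1a)
# The bound replaces the reward after F_3 by p_2.alpha, so it cannot exceed
# p_2.alpha, and it approaches p_2.alpha as p_1.alpha -> p_2.alpha.
assert p1a <= S_LB_I <= p2a
```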
Note that S_LB,I decreases as the steady state probability of the less rewarding state 1, p_ss(1), increases. Also notice that as p_1 α → p_2 α, S_LB,I → p_2 α. This is expected in light of the approach we used in obtaining S_LB,I, since the only reward we accrue in any control interval is now p_2 α. Also, the bound approaches the system sum reward capacity when states 2 and 3 become increasingly synonymous, which happens as α_2 → α_3 and p_31 → p_21. The last statement comes from our discussion in the previous section on the equivalence with the genie-aided system.

Proposition 8. For the Type II system, a lower bound on the sum reward capacity is given by

S_LB,II = (2 p_ss(3) − p_ss(3)^2) p_3 α + (1 − p_ss(3))^2 p_1 α.   (11)

The proof proceeds as follows. In any control interval, the expected immediate reward after a feedback F_2 is received in the previous interval is replaced by the reward that would be expected if the other user (not scheduled in the previous interval) were scheduled. Note that, by the implementation structure of the greedy policy, this latter reward is at most the reward corresponding to the greedy choice. (The replacement is only with respect to the accrued reward in the sum reward expression, while the actual schedule decision is always maintained as greedy, so as not to disturb the initial conditions of the problem for the future intervals.) Next we replace p_2 α with p_1 α, giving the sum reward capacity lower bound. Note that S_LB,II is the sum reward of a two-user system that accrues reward p_3 α if at least one of the users is in state 3, and reward p_1 α if neither of them is in state 3. This interpretation is strikingly similar to the one we made in the two-state, two-user problem in our preliminary research. However, note that the present interpretation does not extend to the case when the states of both users are available. For instance, if neither user is in state 3 and at least one of them is in state 2, then, ideally, if the states of both users were known, a reward of p_2 α would be accrued instead of p_1 α.
This demonstrates the loss in performance due to the lack of knowledge of both user states, thus differentiating the three-state system from the two-state system.

Proposition 9. An upper bound on the system sum reward capacity is given by

S_UB = (2 p_ss(3) − p_ss(3)^2) p_3 α + (2 p_ss(1) p_ss(2) + p_ss(2)^2) p_2 α + p_ss(1)^2 p_1 α.   (12)

The bound is in fact the sum reward capacity of the genie-aided system. Here, if at least one of the users was in state 3 in the previous interval, the greedy policy schedules that user and accrues a reward p_3 α. If neither user was in state 3 but at least one of them was in state 2, that user is scheduled and a reward of p_2 α is accrued. If both users were in state 1, a reward of p_1 α is accrued.

6 On the Optimality of the Greedy Policy

We proceed by introducing the following properties of the P matrix.

Lemma 10. When p_2 P [0 0 1]^T ≤ p_23 (condition (A)), then p_2 P^{k+1} [0 0 1]^T ≤ p_2 P^k [0 0 1]^T for all k ≥ 0. Also, the steady state element p_ss(3) ≤ p_23, and p_2 P^k [0 0 1]^T monotonically decreases to p_ss(3) as k → ∞. Condition (A) is also a necessary condition for the preceding statement to hold.

Lemma 11. Under condition (A) from the previous lemma, p_1 P^k [0 0 1]^T monotonically increases to p_ss(3) as k → ∞, i.e.,

p_1 P^{k+1} [0 0 1]^T ≥ p_1 P^k [0 0 1]^T,  ∀ k ∈ {0, 1, 2, ...},  and  p_1 lim_{k→∞} P^k [0 0 1]^T = p_ss(3) ≤ p_23.

This result can be obtained by replacing α in Lemma 4 by [0 0 1]^T and using p_ss(3) ≤ p_23 from Lemma 10.

Proposition 12. When p_12 = p_22 = p_32 and p_23 p_31 ≥ p_21 p_13, the greedy policy is optimal among the policies that retain the schedule when feedback F_3 is received.
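Condition (A) of Lemma 10, the conditions of Proposition 12, and the claimed monotone decrease of p_2 P^k [0 0 1]^T to p_ss(3) are easy to check numerically. A minimal sketch with an illustrative (assumed) P satisfying p_12 = p_22 = p_32 and p_23 p_31 ≥ p_21 p_13:

```python
# Checking condition (A) of Lemma 10, p_2 P [0 0 1]^T <= p_23, and the
# monotone decrease of p_2 P^k [0 0 1]^T to pss(3). The entries of P are
# illustrative assumptions satisfying p12 = p22 = p32 and p23 p31 >= p21 p13.

def vec_mat(v, M):
    """Row vector times 3x3 matrix."""
    return [sum(v[k] * M[k][j] for k in range(3)) for j in range(3)]

P = [[0.6, 0.3, 0.1],
     [0.3, 0.3, 0.4],
     [0.2, 0.3, 0.5]]

assert P[0][1] == P[1][1] == P[2][1]           # p12 = p22 = p32
assert P[1][2] * P[2][0] >= P[1][0] * P[0][2]  # p23 p31 >= p21 p13

# Condition (A): the state-3 mass of p_2 P does not exceed p23.
assert vec_mat(P[1], P)[2] <= P[1][2]

# p_2 P^k [0 0 1]^T should then decrease monotonically toward pss(3) <= p23.
v = list(P[1])
seq = []
for _ in range(30):
    seq.append(v[2])
    v = vec_mat(v, P)
assert all(seq[k] >= seq[k + 1] - 1e-12 for k in range(len(seq) - 1))
```

For this P, the sequence starts at p_23 = 0.4 and settles at p_ss(3) = 19/60 ≈ 0.317, below p_23 as Lemma 10 predicts.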
Conjecture 13. When p_12 = p_22 = p_32 and p_23 p_31 ≥ p_21 p_13, the greedy policy is globally optimal.

The premise behind our conjecture is that, in light of the positive correlation property of the Markov chain, there is no obvious reason why the globally optimal policy would reject a user that was in the best possible state in the previous control interval. Thus it appears that the globally optimal policy belongs to the class that retains the schedule when F_3 is received, suggesting that it is indeed the greedy policy itself.

7 Conclusion

We have considered the problem of scheduling under a partial channel state information assumption in a Markov-modeled two-user downlink system with a channel state feedback provision. We classified the system into two types based on the transition probability matrix of the Markov chains and the reward structure. For each type, we established the implementation structure of the greedy policy. For the Type I system, we showed that the greedy policy can be implemented via a simple round robin algorithm, as was seen in our earlier work for the two-state Markov model. We studied the conditions under which the original system simplifies to the genie-aided system and provided insights on these conditions. By appropriately bounding the immediate reward accrued in any control interval, we obtained bounds on the sum reward capacity of the system. Under some conditions on the P matrix, by restricting the search space to a specific class of schedulers, we showed that the greedy policy is 'constrained search space' optimal, and conjectured, with reasons, that the greedy policy is globally optimal as well.

A Proof of Lemma 2

Let β = [β_1 β_2 β_3]^T, with β_1 ≤ β_2 ≤ β_3. Consider the inequality p_3 β ≥ p_2 β.
This can be rewritten as

β_1 p_31 + β_2 p_32 + β_3 p_33 ≥ β_1 p_21 + β_2 p_22 + β_3 p_23
⇔ β_1 (p_31 − p_21) ≥ β_2 (p_22 − p_32) + β_3 (p_23 − p_33)
⇔ β_1 (p_21 − p_31) ≤ −β_2 (p_22 − p_32) + β_3 (p_33 − p_23).   (13)

Since β_2 ≥ β_1, it is now sufficient to prove β_2 (p_21 − p_31 + p_22 − p_32) ≤ β_3 (p_33 − p_23), i.e. (using the fact that each row of P sums to one, so p_21 − p_31 + p_22 − p_32 = p_33 − p_23), β_2 (p_33 − p_23) ≤ β_3 (p_33 − p_23), which is indeed true. Consider the inequality p_2 β ≥ p_1 β:

β_1 p_21 + β_2 p_22 + β_3 p_23 ≥ β_1 p_11 + β_2 p_12 + β_3 p_13
⇔ β_2 + p_23 (β_3 − β_2) − p_21 (β_2 − β_1) ≥ β_2 + p_13 (β_3 − β_2) − p_11 (β_2 − β_1).   (14)

The last inequality is indeed true, since p_23 ≥ p_13, p_21 ≤ p_11 and β_3 ≥ β_2 ≥ β_1. Thus, if β_1 ≤ β_2 ≤ β_3 and β = [β_1 β_2 β_3]^T,

p_3 β ≥ p_2 β ≥ p_1 β.   (15)

We can write, for i ∈ {1, 2, 3}, p_i P^{k+1} α = p_i [p_1 P^k α  p_2 P^k α  p_3 P^k α]^T. Thus, if p_1 P^k α ≤ p_2 P^k α ≤ p_3 P^k α, we have, using (15), p_1 P^{k+1} α ≤ p_2 P^{k+1} α ≤ p_3 P^{k+1} α. Since α_1 = 0 ≤ α_2 ≤ α_3 = 1, the lemma is established by induction.

B Proof of Lemma 3 and Lemma 4

Consider p_3 P^{k+1} α = p_31 p_1 P^k α + p_32 p_2 P^k α + p_33 p_3 P^k α. Since p_1 P^k α ≤ p_2 P^k α ≤ p_3 P^k α from Lemma 2, we have p_3 P^{k+1} α ≤ p_3 P^k α. Lemma 4 can be proved similarly.

C Proof of Lemma 10

Let p_2 P^k [0 0 1]^T ≤ p_2 P^{k−1} [0 0 1]^T. Multiplying both sides by p_22 and adding p_21 p_1 P^{k−1} [0 0 1]^T + p_23 p_3 P^{k−1} [0 0 1]^T to both sides,

p_21 p_1 P^{k−1} [0 0 1]^T + p_22 p_2 P^k [0 0 1]^T + p_23 p_3 P^{k−1} [0 0 1]^T ≤ p_2 P^k [0 0 1]^T.   (16)

If we show that p_21 p_1 P^k [0 0 1]^T + p_23 p_3 P^k [0 0 1]^T ≤ p_21 p_1 P^{k−1} [0 0 1]^T + p_23 p_3 P^{k−1} [0 0 1]^T, then using (16), p_21 p_1 P^k [0 0 1]^T + p_22 p_2 P^k [0 0 1]^T + p_23 p_3 P^k [0 0 1]^T ≤ p_2 P^k [0 0 1]^T, i.e., p_2 P^{k+1} [0 0 1]^T ≤ p_2 P^k [0 0 1]^T.
Consider the inequality

p21 p1 P^k [0 0 1]^T + p23 p3 P^k [0 0 1]^T ≤ p21 p1 P^{k−1} [0 0 1]^T + p23 p3 P^{k−1} [0 0 1]^T
⇔ p2 P^{k+1} [0 0 1]^T − p22 p2 P^k [0 0 1]^T ≤ p2 P^k [0 0 1]^T − p22 p2 P^{k−1} [0 0 1]^T
⇔ p2 (P^{k+1} [0 0 1]^T − P^k [0 0 1]^T) ≤ p22 (p2 P^k [0 0 1]^T − p2 P^{k−1} [0 0 1]^T)
⇒ p2 P^{k+1} [0 0 1]^T ≤ p2 P^k [0 0 1]^T,   (17)

where the last inequality follows from the initial assumption that p2 P^k [0 0 1]^T − p2 P^{k−1} [0 0 1]^T ≤ 0. With p2 P^1 [0 0 1]^T ≤ p2 P^0 [0 0 1]^T, i.e., p2 P [0 0 1]^T ≤ p23, using induction, we have p2 P^{k+1} [0 0 1]^T ≤ p2 P^k [0 0 1]^T for all k ≥ 0. Since the steady state exists, by the definition of steady state, every row of lim_{k→∞} P^k equals p_ss. Thus p2 lim_{k→∞} P^k [0 0 1]^T = p_ss(3), and p_ss(3) ≤ p23 by the monotonic decrease property of p2 P^k [0 0 1]^T. Also note that the direction of the inequalities throughout this proof can be reversed, proving that p2 P^k [0 0 1]^T monotonically increases to p_ss(3) as k → ∞ if p2 P [0 0 1]^T ≥ p23. This establishes that p2 P [0 0 1]^T ≤ p23 is a necessary condition for the first part of the lemma to hold.

D Proof of Proposition 12

Let the probability transition matrix satisfy the following conditions:

p12 = p22 = p32   (18)
p23 p31 ≥ p21 p13   (19)

The inequality (19) along with condition (18) is equivalent to condition (A) in Lemma 10. Thus under (18) and (19), both Lemma 10 and Lemma 11 hold true. From Lemma 10, p23 ≥ p_ss(3). From (18), p_ss(2) = p22. Thus p2 α − p_ss α = α2 p22 + p23 − α2 p_ss(2) − p_ss(3) = p23 − p_ss(3) ≥ 0. The system is thus type I. Consider a control interval m > 1 with belief vectors π_{m,1}, π_{m,2} and action a_m.
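Before turning to the induction over control intervals, the monotonicity facts just invoked, the reward ordering of Lemma 2 and the monotone decrease of Lemma 10, can be spot-checked numerically. The sketch below is an illustration, not part of the proof: it uses a hypothetical transition matrix P chosen to satisfy the positive correlation property (p13 ≤ p23 ≤ p33, p11 ≥ p21 ≥ p31), condition (18), and p2 P [0 0 1]^T ≤ p23, together with an illustrative reward vector α = [0, 0.5, 1].

```python
import numpy as np

# Hypothetical transition matrix satisfying the positive-correlation
# property (p13 <= p23 <= p33, p11 >= p21 >= p31), condition (18)
# (p12 = p22 = p32), and p2 P [0 0 1]^T = 0.35 <= p23 = 0.4.
P = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.3, 0.4],
              [0.2, 0.3, 0.5]])
alpha = np.array([0.0, 0.5, 1.0])   # reward vector: alpha1 = 0 <= alpha2 <= alpha3 = 1
e3 = np.array([0.0, 0.0, 1.0])

# Steady state p_ss: left eigenvector of P for eigenvalue 1, normalized.
w, v = np.linalg.eig(P.T)
pss = np.real(v[:, np.argmax(np.real(w))])
pss /= pss.sum()

rewards = []   # rewards[k][i] = p_{i+1} P^k alpha
tail2 = []     # tail2[k]      = p2 P^k [0 0 1]^T
Pk = np.eye(3)
for k in range(40):
    rewards.append(P @ Pk @ alpha)
    tail2.append(float(P[1] @ Pk @ e3))
    Pk = Pk @ P

# Lemma 2: p1 P^k alpha <= p2 P^k alpha <= p3 P^k alpha for every k.
assert all(r[0] <= r[1] + 1e-12 and r[1] <= r[2] + 1e-12 for r in rewards)
# Lemma 10: p2 P^k [0 0 1]^T decreases monotonically from p23 to p_ss(3).
assert all(a >= b - 1e-12 for a, b in zip(tail2, tail2[1:]))
assert abs(tail2[-1] - pss[2]) < 1e-8
print("Lemma 2 ordering and Lemma 10 monotonicity hold for this P")
```

Any P satisfying the stated hypotheses could be substituted; the two assertions encode exactly the monotonicity claims used in the sequel.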
If we can show, for any m, that the greedy policy is optimal in control interval m under the assumption that the greedy policy will be implemented in all future control intervals, then, using induction from interval 1, where greedy is indeed optimal, we can establish the long term optimality of the greedy policy. Let {A_k}_{k≤m−1} = {Â_k}_{k≤m−1} and let S_k be the state vector such that S_k(i) is the state of the channel of user i in interval k. We rewrite the net expected reward as follows:

V_m(π_{m,1}, π_{m,2}, {a_m, {Â_k}_{k≤m−1}}) = π_{m,a_m} α + Σ_{S_m} P(S_m | π_{m,1}, π_{m,2}) V̂_{m−1}(S_m, â_{m−1}),

where V̂_{m−1} is the expected future reward conditioned on the state vector in control interval m. The hat on this quantity emphasizes the use of the greedy policy in all intervals k ≤ m − 1. P(S_m | π_{m,1}, π_{m,2}) is the conditional probability of the current state vector S_m given the belief vectors π_{m,1}, π_{m,2}. The scheduling decision in the next control interval, â_{m−1}, is based on the greedy policy and is a function of the ARQ feedback received in the current control interval m, i.e., S_m(a_m). The decision logic was summarized in Proposition 5. We now proceed to compare the net expected reward when a_m = 1 and a_m = 2.
The net expected reward when a_m = 1 is written as follows:

V_m(π_{m,1}, π_{m,2}, {a_m = 1, {Â_k}_{k≤m−1}})
= π_{m,1} α
+ P(S_m = [1 1] | π_{m,1}, π_{m,2}) V̂_{m−1}(S_m = [1 1], â_{m−1} = 2)
+ P(S_m = [1 2] | π_{m,1}, π_{m,2}) V̂_{m−1}(S_m = [1 2], â_{m−1} = 2)
+ P(S_m = [1 3] | π_{m,1}, π_{m,2}) V̂_{m−1}(S_m = [1 3], â_{m−1} = 2)
+ P(S_m = [2 1] | π_{m,1}, π_{m,2}) V̂_{m−1}(S_m = [2 1], â_{m−1} = 1)
+ P(S_m = [2 2] | π_{m,1}, π_{m,2}) V̂_{m−1}(S_m = [2 2], â_{m−1} = 1)
+ P(S_m = [2 3] | π_{m,1}, π_{m,2}) V̂_{m−1}(S_m = [2 3], â_{m−1} = 1)
+ P(S_m = [3 1] | π_{m,1}, π_{m,2}) V̂_{m−1}(S_m = [3 1], â_{m−1} = 1)
+ P(S_m = [3 2] | π_{m,1}, π_{m,2}) V̂_{m−1}(S_m = [3 2], â_{m−1} = 1)
+ P(S_m = [3 3] | π_{m,1}, π_{m,2}) V̂_{m−1}(S_m = [3 3], â_{m−1} = 1).   (20)

Note that the scheduler uses the information on the state of the scheduled user (user 1) alone in the scheduling decisions, consistent with the problem setup. Also note that when S_m(1) = 2, the schedule is retained. This is consistent with the implementation structure of the greedy policy seen in Proposition 5, where the scheduler retains the scheduling choice even when F2 is received. As was discussed in the same proposition, this is a greedy decision only if a user was never dropped in the past for giving feedback F3. Since we are restricting attention to the class of schedulers that retain the schedule when F3 is received³, this is indeed a greedy decision. Since the Markov channel statistics are identical across the users, we have

V̂_k(S_{k+1} = [x y], â_k = 1) = V̂_k(S_{k+1} = [y x], â_k = 2).
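The nine-term expansion above is mechanical: conditioned on the beliefs, the two users' states are independent, so P(S_m = [s1 s2] | π_{m,1}, π_{m,2}) = π_{m,1}(s1) π_{m,2}(s2), and the greedy next action retains the schedule unless feedback F1 is received. The sketch below shows this bookkeeping; the belief vectors are hypothetical, and a placeholder function stands in for V̂_{m−1}, which the proof leaves abstract.

```python
from itertools import product

alpha = [0.0, 0.5, 1.0]                # reward vector, alpha = [0, alpha2, 1]

def V_expand(pi1, pi2, V_hat):
    """One step of (20) for a_m = 1: immediate reward pi1 . alpha plus the
    expected greedy continuation.  V_hat(s, a) is a placeholder for the
    future reward Vhat_{m-1}(S_m = s, a_{m-1} = a)."""
    total = sum(p * a for p, a in zip(pi1, alpha))
    for s1, s2 in product((1, 2, 3), repeat=2):
        prob = pi1[s1 - 1] * pi2[s2 - 1]    # states independent given beliefs
        a_next = 2 if s1 == 1 else 1        # greedy: switch only on feedback F1
        total += prob * V_hat((s1, s2), a_next)
    return total

# With a zero continuation the expression collapses to the immediate reward.
pi1, pi2 = [0.2, 0.3, 0.5], [0.4, 0.3, 0.3]   # hypothetical belief vectors
V = V_expand(pi1, pi2, lambda s, a: 0.0)
print(V)   # -> 0.65, the immediate reward pi1 . alpha
```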
Expanding the net expected reward when a_m = 2 along the lines of (20) and using the preceding symmetry property, we have

V_m(π_{m,1}, π_{m,2}, {a_m = 1, {Â_k}_{k≤m−1}}) − V_m(π_{m,1}, π_{m,2}, {a_m = 2, {Â_k}_{k≤m−1}})
= π_{m,1} α − π_{m,2} α
+ [V̂_{m−1}(S_m = [3 2], â_{m−1} = 1) − V̂_{m−1}(S_m = [2 3], â_{m−1} = 1)]
× [π_{m,1}(3) π_{m,2}(2) − π_{m,1}(2) π_{m,2}(3)].   (21)

Let â_m indicate the greedy choice among the users in the current control interval, i.e., â_m = arg max_{i∈{1,2}} R_m(π_{m,i}). Let ã_m indicate the other user. The net expected reward difference can now be rewritten as

V_m(π_{m,1}, π_{m,2}, {a_m = â_m, {Â_k}_{k≤m−1}}) − V_m(π_{m,1}, π_{m,2}, {a_m = ã_m, {Â_k}_{k≤m−1}})
= π_{m,â_m} α − π_{m,ã_m} α
+ [V̂_{m−1}(S_m = [3 2], â_{m−1} = 1) − V̂_{m−1}(S_m = [2 3], â_{m−1} = 1)]
× [π_{m,â_m}(3) π_{m,ã_m}(2) − π_{m,â_m}(2) π_{m,ã_m}(3)],   (22)

where, by definition, π_{m,â_m} α ≥ π_{m,ã_m} α. We now proceed to show that the quantity V̂_{m−1}(S_m = [3 2], â_{m−1} = 1) − V̂_{m−1}(S_m = [2 3], â_{m−1} = 1) is non-negative. With V̂_k(S_{k+1} = [x y]) := V̂_k(S_{k+1} = [x y], â_k = 1), and expanding V̂_{m−1}(S_m = [x y]) along the lines of (20) with π_{m−1,1} = p_x, π_{m−1,2} = p_y and a_{m−1} = 1, we have the following:

V̂_{m−1}(S_m = [3 2]) − V̂_{m−1}(S_m = [2 3])
= p3 α − p2 α + [V̂_{m−2}(S_{m−1} = [3 2]) − V̂_{m−2}(S_{m−1} = [2 3])] (p33 p22 − p23 p32).   (23)

By the property of the P matrix, p33 ≥ p23 and p22 ≥ p32. Also, we have seen in Lemma 3 that r3 ≥ r2 ≥ r1. Thus if V̂_{m−2}(S_{m−1} = [3 2]) − V̂_{m−2}(S_{m−1} = [2 3]) ≥ 0, then V̂_{m−1}(S_m = [3 2]) − V̂_{m−1}(S_m = [2 3]) ≥ 0.

³This is the only instance in the proof where we constrain the search space.
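The recursion (23) can also be checked numerically. In the sketch below, P and α are hypothetical, and the base case D_1 = r3 − r2 is taken to be α3 − α2, an assumption standing in for the terminal rewards defined earlier in the paper; the inline assertion encodes the induction step.

```python
# Iterating the recursion (23):
#   D_m = (p3 - p2).alpha + (p33*p22 - p23*p32) * D_{m-1},
# where D_m := Vhat_m(S = [3 2]) - Vhat_m(S = [2 3]).  Hypothetical P rows
# and alpha; base case D_1 = r3 - r2 is assumed equal to alpha3 - alpha2.
p2 = [0.3, 0.3, 0.4]
p3 = [0.2, 0.3, 0.5]
alpha = [0.0, 0.5, 1.0]

gap = sum((a - b) * c for a, b, c in zip(p3, p2, alpha))   # p3.alpha - p2.alpha
factor = p3[2] * p2[1] - p2[2] * p3[1]                     # p33*p22 - p23*p32

D = alpha[2] - alpha[1]          # base case D_1 = r3 - r2 >= 0 (assumed)
for m in range(2, 40):
    D = gap + factor * D
    assert D >= 0                # the induction step of the proof
print("D_m stays non-negative; gap =", gap, "factor =", factor)
```

Since p33 ≥ p23 and p22 ≥ p32, both `gap` and `factor` are non-negative here, so D_m can never turn negative, matching the induction in the text.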
Expanding V̂_{m−2}(S_{m−1} = [3 2]) − V̂_{m−2}(S_{m−1} = [2 3]) along the lines of (23) repeatedly and using V̂_1(S_2 = [3 2]) − V̂_1(S_2 = [2 3]) = r3 − r2 ≥ 0, by induction we can show that V̂_{m−2}(S_{m−1} = [3 2]) − V̂_{m−2}(S_{m−1} = [2 3]) ≥ 0. Thus V̂_{m−1}(S_m = [3 2]) − V̂_{m−1}(S_m = [2 3]) ≥ 0. Applying this inequality in (22), we see that the optimality of the greedy policy (in the specified class of policies) can be established if we show that the following condition (condition (S)) holds:

π_{m,â_m}(3) π_{m,ã_m}(2) ≥ π_{m,â_m}(2) π_{m,ã_m}(3).   (24)

At first glance, the preceding condition appears too generic to hold. However, by constraining the belief vectors to the set of values that will actually be encountered in the ARQ-based scheduling problem, we now show that (24) does hold. We first record the following result: from Lemma 10, p2 P^k [0 0 1]^T monotonically decreases to p_ss [0 0 1]^T = p_ss(3) as k increases. Since p2 P^k [0 1 0]^T = p22 = p_ss(2), the expected reward from a user whose channel was observed in state 2, k + 1 intervals earlier, given by p2 P^k α = α2 p_ss(2) + p2 P^k [0 0 1]^T, monotonically decreases to p_ss α.

We proceed to study the sufficient condition under the various belief vectors encountered in the ARQ-based scheduling problem. Assume the scheduling process began in a control interval earlier than m and is performed uninterrupted till the horizon, i.e., control interval 1 (assumption (A))⁴. The belief vectors of the greedy choice â_m and of the other user ã_m, for the type I system under consideration, fall under one of the following cases.

⁴Note that there is no loss of generality in this assumption, for the following reason: the problem setup and the optimality analysis of any policy implicitly assume uninterrupted scheduling until the horizon, in tune with the interval-to-interval evolution of the underlying Markov chains. Thus when the uninterrupted scheduling process begins at a control interval M, condition (A) is satisfied automatically for all m < M. In the control interval M itself, however, part of the condition, i.e., that the scheduling process began earlier, does not hold. But at the origin, i.e., the control interval M, the belief vectors of all the users take the steady state value p_ss. Thus, by symmetry, the question of what scheduling decision to make, and hence the question of the optimality of the greedy policy at M, becomes irrelevant.

• 1. User â_m was scheduled in the previous control interval, m + 1, and had given feedback F3. The belief vector is π_{m,â_m} = p3. The other user was either scheduled k + 1 control intervals earlier (with k ∈ {1, 2, ...}) with any of the three possible feedbacks, or was never scheduled in the past. Thus the belief vector of ã_m is of the form p_i P^k with i ∈ {1, 2, 3} and k ∈ {1, 2, ...}. Note that if ã_m was never scheduled in the past, then π_{m,ã_m} = p_ss, which still falls in the preceding form.

• 2. User ã_m was scheduled in the previous control interval and had given feedback F1. User â_m was either scheduled k + 1 control intervals earlier (with k ∈ {1, 2, ...}) with any of the three possible feedbacks, or was never scheduled in the past. The belief vectors are given by π_{m,ã_m} = p1 and π_{m,â_m} = p_i P^k with i ∈ {1, 2, 3} and k ∈ {1, 2, ...}.

• 3. User â_m was scheduled in the previous control interval and had given feedback F2. User ã_m was scheduled k + 1 control intervals earlier (with k ∈ {1, 2, ...}) with feedback F1, or was never scheduled in the past. The belief vectors are given by π_{m,â_m} = p2 and π_{m,ã_m} = p1 P^k with k ∈ {1, 2, ...}.

• 4. User â_m was scheduled in the previous control interval and had given feedback F2. User ã_m was scheduled k + 1 control intervals earlier (with k ∈ {1, 2, ...}) with feedback F2. The belief vectors are given by π_{m,â_m} = p2 and π_{m,ã_m} = p2 P^k with k ∈ {1, 2, ...}.

• 5. User â_m was scheduled in the previous control interval and had given feedback F2. User ã_m was scheduled L + 1 or more control intervals earlier with feedback F3. Here L is the number of coherence intervals such that the reward expected from a user observed to be in state 2 in the previous control interval exceeds the reward expected from a user observed in state 3, k + 1 control intervals earlier, if and only if k ≥ L. Mathematically, L is such that

p2 α ≥ p3 P^k α if k ≥ L,
p2 α < p3 P^k α if k < L.   (25)

Note that such an L exists since p2 α ≤ p3 α and both p2 P^k α and p3 P^k α monotonically decrease (with k) to p_ss α ≤ p2 α. The belief vectors are hence given as π_{m,â_m} = p2 and π_{m,ã_m} = p3 P^k with k ≥ L.

• 6. User ã_m was scheduled in the previous control interval and had given feedback F2. User â_m was scheduled k + 1 control intervals earlier with feedback F3, with k < L. The belief vectors are as follows: π_{m,â_m} = p3 P^k with k < L and π_{m,ã_m} = p2.

The above list is exhaustive. In fact, cases 5 and 6 will never appear, since we are considering the class of schedulers that never drop a user when it sends an F3. However, we will show that even for these cases the sufficient condition is satisfied. In all the above six cases, R_m(â_m) ≥ R_m(ã_m), as required by the definition of â_m. We now focus on the sufficient condition (S) for each of the above cases.

• 1.
The sufficient condition (S) is given as follows:

π_{m,â_m}(3) π_{m,ã_m}(2) ≥ π_{m,â_m}(2) π_{m,ã_m}(3), i.e.,
p33 p_i P^k [0 1 0]^T ≥ p32 p_i P^k [0 0 1]^T, for all i ∈ {1, 2, 3}, k ∈ {1, 2, ...}.   (26)

Since p12 = p22 = p32, we have

p_i P^k [0 1 0]^T = p12 = p22 = p32, for all i ∈ {1, 2, 3}, k ∈ {1, 2, ...}.   (27)

Also, p_i P^k [0 0 1]^T = p_i P^{k−1} P [0 0 1]^T = p_i P^{k−1} [p13 p23 p33]^T ≤ p33, since p33 ≥ p23 ≥ p13 by the property of the P matrix. Thus (S) holds for case 1.

• 2. (S) reads: p_i P^k [0 0 1]^T p12 ≥ p_i P^k [0 1 0]^T p13, for all i ∈ {1, 2, 3}, k ∈ {1, 2, ...}. From the symmetry property (27), p12 = p_i P^k [0 1 0]^T. Also, since p13 ≤ p23 ≤ p33, we can show p_i P^k [0 0 1]^T ≥ p13. Thus (S) is satisfied for case 2.

• 3. (S): p23 p1 P^k [0 1 0]^T ≥ p22 p1 P^k [0 0 1]^T. From Lemma 11, p1 P^k [0 0 1]^T monotonically increases to p_ss(3) as k increases through 0, 1, 2, .... Since p23 ≥ p_ss(3) (using Lemma 10), we have p23 ≥ p1 P^k [0 0 1]^T. Also, p1 P^k [0 1 0]^T = p22 from the symmetry property in (27). Thus (S) holds for case 3.

• 4. (S): p23 p2 P^k [0 1 0]^T ≥ p22 p2 P^k [0 0 1]^T. From Lemma 10, p2 P^k [0 0 1]^T monotonically decreases from p23 to p_ss(3) as k increases through 0, 1, 2, .... Thus p23 ≥ p2 P^k [0 0 1]^T. This inequality along with the symmetry property (27) establishes (S) for case 4.

• 5. (S): p23 p3 P^k [0 1 0]^T ≥ p22 p3 P^k [0 0 1]^T with k ≥ L. Note that, for all k ≥ L,

p2 α ≥ p3 P^k α ⇒ α2 p22 + p23 ≥ α2 p3 P^k [0 1 0]^T + p3 P^k [0 0 1]^T ⇒ p23 ≥ p3 P^k [0 0 1]^T,   (28)

where we have used the symmetry property p22 = p3 P^k [0 1 0]^T in obtaining the last inequality. (S) is established by using the symmetry property along with the preceding inequality.

• 6. (S): p3 P^k [0 0 1]^T p22 ≥ p3 P^k [0 1 0]^T p23 with k < L. For k < L, p2 α < p3 P^k α.
Expanding both sides along the lines of case 5 and using the symmetry property (27), (S) can be established for case 6. Thus the sufficient condition for the constrained search space optimality of the greedy policy is satisfied.

References

[1] R. Knopp and P. A. Humblet, "Information capacity and power control in single cell multiuser communications," Proc. IEEE International Conference on Communications, (Seattle, WA), pp. 331-335, June 1995.

[2] P. Viswanath, D. Tse, and R. Laroia, "Opportunistic beamforming using dumb antennas," IEEE Transactions on Information Theory, vol. 48, no. 6, pp. 1277-1294, Jun. 2002.

[3] R. W. Heath, M. Airy, and A. J. Paulraj, "Multiuser diversity for MIMO wireless systems with linear receivers," Proc. Asilomar Conf. Signals, Systems, and Computers, (Pacific Grove, CA), pp. 1194-1199, Nov. 2001.

[4] A. Gyasi-Agyei, "Multiuser diversity based opportunistic scheduling for wireless data networks," IEEE Communications Letters, vol. 9, no. 7, pp. 670-672, Jul. 2005.

[5] J. Chung, C. S. Hwang, K. Kim, and Y. K. Kim, "A random beamforming technique in MIMO systems exploiting multiuser diversity," IEEE Journal on Selected Areas in Communications, vol. 21, pp. 848-855, Jun. 2003.

[6] S. Murugesan, E. Uysal-Biyikoglu, and P. Schniter, "Optimization of Training and Scheduling in the Non-Coherent MIMO Multiple-Access Channel," IEEE Journal on Selected Areas in Communications, vol. 25, no. 7, pp. 1446-1456, Sep. 2007.

[7] E. Gilbert, "Capacity of a burst-noise channel," Bell System Technical Journal, vol. 39, pp. 1253-1266, 1960.

[8] S. Lu, V. Bharghavan, and R. Srikant, "Fair scheduling in wireless packet networks," IEEE/ACM Transactions on Networking, vol. 7, no. 4, pp. 473-489, Aug. 1999.

[9] T. Nandagopal, S. Lu, and V.
Bharghavan, "A unified architecture for the design and evaluation of wireless fair queueing algorithms," Proc. ACM Mobicom, Aug. 1999.

[10] T. Ng, I. Stoica, and H. Zhang, "Packet fair queueing algorithms for wireless networks with location-dependent errors," Proc. IEEE INFOCOM, (New York), vol. 3, 1998.

[11] S. Shakkottai and R. Srikant, "Scheduling real-time traffic with deadlines over a wireless channel," Proc. ACM Workshop on Wireless and Mobile Multimedia, (Seattle, WA), Aug. 1999.

[12] Y. Cao and V. Li, "Scheduling algorithms in broadband wireless networks," Proc. IEEE, vol. 89, no. 1, pp. 76-87, Jan. 2001.

[13] Y. C. Liang and R. Zhang, "Multiuser MIMO Systems with Random Transmit Beamforming," International Journal of Wireless Information Networks, vol. 12, no. 4, pp. 235-247, Dec. 2005.

[14] M. Zorzi and R. Rao, "Error control and energy consumption in communications for nomadic computing," IEEE Transactions on Computers, vol. 46, pp. 279-289, Mar. 1997.

[15] L. A. Johnston and V. Krishnamurthy, "Opportunistic File Transfer over a Fading Channel: A POMDP Search Theory Formulation with Optimal Threshold Policies," IEEE Transactions on Wireless Communications, vol. 5, no. 2, Feb. 2006.

[16] S. H. A. Ahmad, M. Liu, T. Javadi, Q. Zhao, and B. Krishnamachari, "Optimality of myopic sensing in multi-channel opportunistic access," IEEE Trans. on Information Theory, submitted May 2008. (http://www.ece.ucdavis.edu/~qzhao/Journal.html).

[17] Q. Zhao and B. Krishnamachari, "Optimality of myopic policy in opportunistic access with noisy observations," IEEE Trans. on Automatic Control, submitted Feb. 2008. (http://www.ece.ucdavis.edu/~qzhao/Journal.html).

[18] S. Murugesan, P. Schniter, and N. B. Shroff, "Multiuser Scheduling in a Markov-modeled Downlink Environment," Proc. Allerton Conf.
on Communication, Control, and Computing, (Monticello, IL), Sept. 2008.

[19] Marius Iosifescu, "Finite Markov Processes and Their Applications," John Wiley and Sons, 1980.

[20] R. D. Smallwood and E. J. Sondik, "The Optimal Control of Partially Observable Markov Processes Over a Finite Horizon," Operations Research, Sep. 1973.

[21] S. Christian Albright, "Structural Results for Partially Observable Markov Decision Processes," Operations Research, vol. 27, no. 5, pp. 1041-1053, Sep.-Oct. 1979.

[22] C. C. White and W. Scherer, "Solution procedures for partially observed Markov decision processes," Operations Research, pp. 791-797, 1985.

[23] G. E. Monahan, "A survey of partially observable Markov decision processes: Theory, Models, and Algorithms," Management Science, vol. 28, no. 1, pp. 1-16, Jan. 1982.