Optimal Medium Access Control in Cognitive Radios: A Sequential Design Approach
Authors: Lifeng Lai, Hesham El Gamal, Hai Jiang
Lifeng Lai¹, Hesham El Gamal², Hai Jiang³ and H. Vincent Poor¹
¹ Dept. of Elec. Eng., Princeton Univ., {llai, poor}@princeton.edu
² Dept. of Elec. and Comp. Eng., Ohio State Univ., helgamal@ece.osu.edu
³ Dept. of Elec. and Comp. Eng., Univ. of Alberta, hai.jiang@ece.ualberta.ca

ABSTRACT

The design of medium access control protocols for a cognitive user wishing to opportunistically exploit frequency bands within parts of the radio spectrum having multiple bands is considered. In the scenario under consideration, the availability probability of each channel is unknown a priori to the cognitive user. Hence efficient medium access strategies must strike a balance between exploring the availability of channels and exploiting the opportunities identified thus far. Using a sequential design approach, an optimal medium access strategy is derived. To avoid the prohibitive computational complexity of this optimal strategy, a low-complexity asymptotically optimal strategy is also developed. The proposed strategy does not require any prior statistical knowledge about the traffic pattern on the different channels.

Index Terms: Cognitive radio, bandit problem, medium access control.

1. INTRODUCTION

As a promising technique to increase the spectral efficiency of overcrowded parts of the radio spectrum, opportunistic spectrum access has been the focus of significant research activity [1]. The underlying idea is to allow unlicensed users (i.e., cognitive users) to access the available spectrum when the licensed users (i.e., primary users) are not active. The presence of high-priority primary users, and the requirement that the cognitive users not interfere with them, introduce new challenges for protocol design.
The overarching goal of the current work is to develop a unified framework for the design of efficient, low-complexity cognitive medium access protocols.

[This research was supported by the National Science Foundation under Grants ANI-03-38807 and CNS-06-25637.]

The spectral opportunities available to cognitive users are by their nature time-varying. To avoid interfering with the primary network, cognitive users must first probe to determine whether there is primary activity before transmitting. Under the assumption that each cognitive user cannot access all of the available channels simultaneously, the main task of the medium access protocol is to choose, in a distributed manner, which channels each cognitive user should attempt to use in different time slots, in order to fully (or maximally) utilize the spectral opportunities. This decision process can be enhanced by taking into account any available statistical information about the primary traffic. For example, with a single cognitive user capable of accessing (sensing) only one channel at a time, the problem becomes trivial if the probability that each channel is free is known a priori. In this case, the optimal rule is for the cognitive user to access the channel with the highest probability of being free in all time slots. However, such time-varying traffic information is typically not available to the cognitive users a priori. The need to learn this information on-line creates a fundamental tradeoff between exploitation and exploration. Exploitation refers to the short-term gain resulting from accessing the channel with the highest estimated probability of being free (based on the results of previous sensing decisions), whereas exploration is the process by which a cognitive user learns the statistical behavior of the primary traffic (by choosing possibly different channels to probe across time slots).
In the presence of multiple cognitive users, the medium access algorithm must also account for the competition between different users over the same channel.

In this paper, we develop a unified framework for the design and analysis of cognitive medium access protocols in the presence of a single cognitive user who can access a single channel in each time slot. As argued in the sequel, this framework allows for the construction of strategies that strike an optimal balance between exploration and exploitation. We derive an optimal sensing rule that maximizes the expected throughput obtained by the cognitive user. Compared with a genie-aided scheme, in which the cognitive user knows the primary network traffic information a priori, any medium access strategy suffers a throughput loss. We obtain a lower bound on this loss and further construct a linear complexity single index protocol whose loss grows at the same logarithmic rate as this lower bound (when the primary traffic behavior changes slowly). Similar approaches have been considered in [3] and [4], but with different emphases.

We have also extended our study to networks with multiple cognitive users and to networks with more capable cognitive users, and have developed optimal strategies for these scenarios. However, due to space limitations, we do not discuss these results here. We also omit the proofs of the results presented in this paper; interested readers can refer to [5] for details.

The rest of this paper is organized as follows. Our network model is detailed in Section 2. Section 3 develops and analyzes an optimal strategy for the single cognitive user scenario. Finally, Section 4 summarizes our conclusions.

2. NETWORK MODEL

Figure 1 shows the channel model of interest. We consider a primary network consisting of N non-overlapping channels, $\mathcal{N} = \{1, \ldots, N\}$, each with bandwidth B.
The users in the primary network operate in a synchronous, time-slotted fashion. We assume that at each time slot, channel i is free with probability $\theta_i$. Let $Z_i(j)$ be a random variable that equals 1 if channel i is free at time slot j and equals 0 otherwise. Hence, given $\theta_i$, $Z_i(j)$ is a Bernoulli random variable with distribution $h_{\theta_i}(z_i(j)) = \theta_i\,\delta(z_i(j) - 1) + (1 - \theta_i)\,\delta(z_i(j))$, where $\delta(\cdot)$ is the Dirac delta function. Furthermore, for a given $\theta = (\theta_1, \ldots, \theta_N)$, the $Z_i(j)$ are independent across i and j. We consider a block-varying model in which the value of $\theta$ is fixed for a block of T time slots and then changes randomly at the beginning of the next block according to a joint probability density function (pdf) $f(\theta)$.

[Fig. 1. Channel model: channels 1, 2, ..., N over time slots t = 1, ..., T; each slot is either occupied by the primary users or is a spectrum opportunity.]

In our model, the cognitive users attempt to exploit the availability of free channels in the primary network by sensing the activity at the beginning of each time slot. Our work seeks to characterize efficient strategies for choosing which channels to sense (access). The challenge here stems from the fact that the cognitive users are assumed to be unaware of $\theta$ a priori. We consider two cases, in which a cognitive user either has or does not have prior information about the pdf of $\theta$, i.e., $f(\theta)$.

In the scenario presented in this paper, at time slot j, a single cognitive user selects one channel $S(j) \in \mathcal{N}$ to access. If the sensing result shows that channel $S(j)$ is free, i.e., $Z_{S(j)}(j) = 1$, the cognitive user can send B bits over this channel; otherwise, the cognitive user waits until the next time slot and picks a possibly different channel to access. Therefore, the total number of bits that the cognitive user is able to send over one block (of T time slots) is
$$W = \sum_{j=1}^{T} B\, Z_{S(j)}(j).$$
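As an illustration of the block model, the following Python sketch draws $\theta$ for one block and computes W for an arbitrary sensing strategy. It is a minimal sketch, not taken from the paper: channels are indexed from 0, the helper names `draw_theta` and `simulate_block` are hypothetical, and the prior $f(\theta)$ is assumed, purely for illustration, to be independent Uniform(0, 1) on each channel.

```python
import random

def draw_theta(N, rng):
    """Draw the availability vector theta for a new block.

    ASSUMPTION (illustrative only): f(theta) is taken to be independent
    Uniform(0, 1) on every channel; the model itself allows any joint pdf.
    """
    return [rng.random() for _ in range(N)]

def simulate_block(theta, T, B, choose, rng):
    """Run one block of T slots with theta held fixed.

    choose(j, history) must return the channel S(j) in {0, ..., N-1}
    (0-based here); history is the list of (s(k), z_{s(k)}(k)) pairs for
    k < j, i.e. the information pattern Psi(j).
    Returns W = sum_j B * Z_{S(j)}(j), the number of bits sent in the block.
    """
    history = []
    W = 0
    for j in range(1, T + 1):
        s = choose(j, history)
        # Z_s(j) ~ Bernoulli(theta_s), independent across channels and slots
        z = 1 if rng.random() < theta[s] else 0
        W += B * z
        history.append((s, z))
    return W

if __name__ == "__main__":
    rng = random.Random(0)
    N, T, B = 4, 100, 1
    theta = draw_theta(N, rng)
    best = max(range(N), key=lambda i: theta[i])
    # Genie-aided benchmark: theta is known, so always sense argmax_i theta_i.
    W = simulate_block(theta, T, B, lambda j, hist: best, rng)
    print(f"W = {W}, E{{W}} under the genie = {T * B * theta[best]:.1f}")
```

Any of the strategies discussed in Section 3 can be plugged in as `choose`, since it sees exactly the causal information $\Psi(j)$ and nothing else.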
It is clear that W is a random variable that depends on the traffic in the primary network and, more importantly for us, on the medium access protocol employed by the cognitive user. Therefore, the overarching goal of this paper is to construct low-complexity medium access protocols that maximize $E\{W\}$.

Intuitively, the cognitive user would like to select the channel with the highest probability of being free in order to obtain more transmission opportunities. If $\theta$ is known, this problem is trivial: the cognitive user should sense the channel $i^* = \arg\max_{i \in \mathcal{N}} \theta_i$. The uncertainty in $\theta$ imposes a fundamental tradeoff between exploration, in order to learn $\theta$, and exploitation, by accessing the channel with the highest estimated free probability based on the currently available information, as detailed in the following section.

3. OPTIMAL MEDIUM ACCESS PROTOCOLS

We start by developing the optimal solution under the idealized assumption that $f(\theta)$ is known a priori by the cognitive user. As we will see, this optimal medium access algorithm suffers from a prohibitive computational complexity that grows exponentially with the block length T. This motivates the design of low-complexity asymptotically optimal approaches, which we also consider.

Our cognitive medium access problem belongs to the class of bandit problems. In this setting, the decision maker must sequentially choose one process to observe from $N \ge 2$ stochastic processes. These processes usually have parameters that are unknown to the decision maker, and each observation is associated with a utility. The objective of the decision maker is to maximize the sum or discounted sum of the utilities via a strategy that specifies which process to observe for every possible history of selections and observations.
A comprehensive treatment covering different variants of bandit problems can be found in [2].

We are now ready to formulate our problem rigorously. The cognitive user employs a medium access strategy $\Gamma$, which selects channel $S(j) \in \mathcal{N}$ to sense at time slot j for any possible causal information pattern obtained through the previous $j - 1$ observations,
$$\Psi(j) = \{s(1), z_{s(1)}(1), \ldots, s(j-1), z_{s(j-1)}(j-1)\}, \quad j \ge 2,$$
i.e., $s(j) = \Gamma(f, \Psi(j))$. Notice that $z_{s(j)}(j)$ is the sensing outcome of the j-th time slot, in which $s(j)$ is the channel being accessed. If $j = 1$, there is no accumulated information, and thus $\Psi(1) = \emptyset$ and $s(1) = \Gamma(f)$. The utility that the cognitive user obtains by making decision $S(j)$ at time slot j is the number of bits it can transmit at time slot j, which is $B Z_{S(j)}(j)$. We denote the expected payoff obtained by a cognitive user who uses strategy $\Gamma$ as
$$W_\Gamma = E_f\left\{\sum_{j=1}^{T} B\, Z_{S(j)}(j)\right\}. \qquad (1)$$
We further denote $V^*(f, T) = \sup_\Gamma W_\Gamma$, which is the largest throughput that the cognitive user could obtain when the spectral opportunities are governed by $f(\theta)$ and the exact value of each realization of $\theta$ is not known a priori by the user.

Each medium access decision made by the cognitive user has two effects. The first is the short-term gain, i.e., an immediate transmission opportunity if the chosen channel is found free. The second is the long-term gain, i.e., the updated statistical information about $f(\theta)$, which will help the cognitive user make better decisions in future stages. There is an interesting tradeoff between the short- and long-term gains. If we only want to maximize the short-term gain, we can sense the channel with the highest availability probability based on the current information.
This myopic strategy maximally exploits the existing information. On the other hand, by choosing other channels to sense, we gain statistical information about $f(\theta)$ that can effectively guide future decisions. As noted previously, this process is typically referred to as exploration.

More specifically, let $f_j(\theta)$ be the updated pdf after making $j - 1$ observations. We begin with $f_1(\theta) = f(\theta)$. After observing $z_{s(j)}(j)$, we update the pdf using Bayes' rule:
1. If $z_{s(j)}(j) = 1$, $$f_{j+1}(\theta) = \frac{\theta_{s(j)}\, f_j(\theta)}{\int \theta_{s(j)}\, f_j(\theta)\, d\theta};$$
2. If $z_{s(j)}(j) = 0$, $$f_{j+1}(\theta) = \frac{(1 - \theta_{s(j)})\, f_j(\theta)}{\int (1 - \theta_{s(j)})\, f_j(\theta)\, d\theta}.$$

The following result characterizes the optimal strategy that maximizes the average throughput the cognitive user obtains from the network.

Lemma 1. For any prior pdf f, there exists an optimal strategy $\Gamma^*$ for the channel selection problem (1), and $V^*(f, T)$ is achievable. Moreover, $V^*$ satisfies the following condition:
$$V^*(f, T) = \max_{s(1) \in \mathcal{N}} E_f\left\{B Z_{s(1)} + V^*\left(f_{Z_{s(1)}}, T - 1\right)\right\}, \qquad (2)$$
where $f_{Z_{s(1)}}$ is the conditional distribution updated using the Bayesian rule described above, as if the cognitive user chooses $s(1)$ and observes $Z_{s(1)}$. Also, $V^*(f_{Z_{s(1)}}, T - 1)$ is the value of a bandit problem with prior information $f_{Z_{s(1)}}$ and $T - 1$ sequential observations.

In principle, Lemma 1 provides the solution to problem (1). Effectively, it decouples the calculation at each stage and hence allows the use of dynamic programming to solve the problem. The idea is to solve the channel selection problem with a smaller dimension first and then use backward induction to obtain the optimal solution for a problem with a larger dimension. Starting with $T = 1$, the second term inside the expectation in (2) is 0, since $T - 1 = 0$.
Hence, the optimal solution is to choose the channel i having the largest $E_f\{B Z_i\}$, which can be calculated as $E_f\{B Z_i\} = B \int \theta_i f(\theta)\, d\theta$, and $V^*(f, 1) = \max_{i \in \mathcal{N}} E_f\{B Z_i\}$. With the solution for $T = 1$ at hand, we can now solve the $T = 2$ case using (2). First, for every possible choice of $s(1)$ and every possible observation $z_{s(1)}$, we calculate the updated distribution $f_{z_{s(1)}}$ using the Bayesian formula. Next, we calculate $V^*(f_{z_{s(1)}}, 1)$, which is equivalent to the $T = 1$ problem described above. Finally, applying (2), we obtain the following equation for the channel selection problem with $T = 2$:
$$V^*(f, 2) = \max_{i \in \mathcal{N}} \int \left[B\theta_i + \theta_i V^*(f_{z_i=1}, 1) + (1 - \theta_i) V^*(f_{z_i=0}, 1)\right] f(\theta)\, d\theta.$$
Hence, in the first step, the cognitive user should sense the channel $i^*(1)$ that attains this maximum, i.e., $i^*(1) = \arg\max_{i \in \mathcal{N}} \int [B\theta_i + \theta_i V^*(f_{z_i=1}, 1) + (1 - \theta_i) V^*(f_{z_i=0}, 1)] f(\theta)\, d\theta$. After observing $z_{i^*(1)}(1)$, the cognitive user has $\Psi(2) = \{i^*(1), z_{i^*(1)}(1)\}$, and it should then solve the $T = 1$ problem with the updated prior, i.e., choose $i^*(2) = \arg\max_{i \in \mathcal{N}} B \int \theta_i f_{z_{i^*(1)}}(\theta)\, d\theta$.

Similarly, after solving the $T = 2$ problem, one can proceed to solve the $T = 3$ case. Using this procedure recursively, we can solve the problem with $T - 1$ observations. Finally, our original problem with T observations is solved as follows:
$$V^*(f, T) = \max_{i \in \mathcal{N}} \int \left[B\theta_i + \theta_i V^*(f_{z_i=1}, T - 1) + (1 - \theta_i) V^*(f_{z_i=0}, T - 1)\right] f(\theta)\, d\theta.$$

The optimal solution developed above suffers from a prohibitive computational complexity. In particular, the size of the search space grows exponentially with the block length T: each stage branches over N channel choices and 2 sensing outcomes, giving $(2N)^T$ possible decision paths. Moreover, one can envision many practical scenarios in which it would be difficult for the cognitive user to obtain the prior information $f(\theta)$. This motivates our pursuit of low-complexity non-parametric protocols that maintain certain optimality properties and do not depend on $f(\theta)$ explicitly.
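To make the recursion (2) and its cost concrete, the following Python sketch computes $V^*(f, T)$ and the optimal first channel by backward induction. It assumes, for illustration only, that $f(\theta)$ factors into independent Beta($a_i$, $b_i$) priors, so that the Bayesian update above reduces to incrementing a count; `optimal_value` is a hypothetical helper, not an algorithm given in the paper.

```python
from functools import lru_cache

def optimal_value(prior, T, B=1.0):
    """Exact V*(f, T) and the optimal first channel via the recursion (2).

    ASSUMPTION (for illustration): f(theta) is a product of independent
    Beta(a_i, b_i) priors, one per channel, so Bayes' rule is conjugate:
    a free slot on channel i maps (a_i, b_i) -> (a_i + 1, b_i) and a busy
    slot maps it to (a_i, b_i + 1).  `prior` is a sequence of (a_i, b_i).
    Channels are 0-based.  Ties are broken in favour of the lowest index.
    """
    @lru_cache(maxsize=None)
    def V(state, t):
        if t == 0:                      # no slots left: value 0
            return 0.0, None
        best_val, best_ch = float("-inf"), None
        for i, (a, b) in enumerate(state):
            p = a / (a + b)             # E_f{theta_i}, so E_f{B Z_i} = B * p
            free = state[:i] + ((a + 1, b),) + state[i + 1:]   # f_{z_i = 1}
            busy = state[:i] + ((a, b + 1),) + state[i + 1:]   # f_{z_i = 0}
            val = p * (B + V(free, t - 1)[0]) + (1 - p) * V(busy, t - 1)[0]
            if val > best_val:
                best_val, best_ch = val, i
        return best_val, best_ch

    return V(tuple(tuple(ab) for ab in prior), T)

if __name__ == "__main__":
    # Two channels with uniform Beta(1, 1) priors and B = 1.
    for T in (1, 2, 6):
        print(T, optimal_value([(1, 1), (1, 1)], T))
```

Because the posterior-dependent value terms do not depend on $\theta$, the integral in (2) reduces to $B p_i + p_i V^*(f_{z_i=1}, T-1) + (1-p_i) V^*(f_{z_i=0}, T-1)$ with $p_i = E_f\{\theta_i\}$, which is what the loop computes. Removing the `lru_cache` decorator makes the recursion visit all $(2N)^T$ branches of the decision tree; the cache merges histories that lead to the same posterior (the same free/busy counts per channel), which keeps this toy example fast.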
In the following, we therefore aim to develop strategies that depend only on the information obtained through the observations $\Psi$. For a given strategy $\Gamma$, the expected number of bits the cognitive user is able to transmit through a block with given parameters $\theta$ is
$$E\left\{\sum_{j=1}^{T} B Z_{S(j)}(j)\right\} = \sum_{j=1}^{T} B \sum_{i=1}^{N} \theta_i \Pr\{\Gamma(\Psi(j)) = i\}.$$
Recall that $\Gamma(\Psi(j)) = i$ means that, following strategy $\Gamma$, the cognitive user chooses channel i in time slot j based on the available information $\Psi(j)$. Here $\Pr\{\Gamma(\Psi(j)) = i\}$ is the probability that the cognitive user will choose channel i at time slot j, following strategy $\Gamma$.

Compared with the idealistic case where the exact value of $\theta$ is known, in which the optimal strategy for the cognitive user is to always choose the channel with the largest availability probability, the loss incurred by $\Gamma$ is given by
$$L(\theta; \Gamma) = \sum_{j=1}^{T} B\theta_{i^*} - \sum_{j=1}^{T} B \sum_{i=1}^{N} \theta_i \Pr\{\Gamma(\Psi(j)) = i\},$$
where $\theta_{i^*} = \max\{\theta_1, \ldots, \theta_N\}$. We say that a strategy $\Gamma$ is consistent if, for any $\theta \in [0, 1]^N$, there exist $\beta < 1$ and $c > 0$ such that $L(\theta; \Gamma) \le c\, T^\beta$ for all sufficiently large T.

In the sequel, we use the following notation: 1) $g_1(N) = \omega(g_2(N))$ means that $\forall c > 0$, $\exists N_0$ such that $\forall N > N_0$, $g_2(N) < c\, g_1(N)$; 2) $g_1(N) = O(g_2(N))$ means that $\exists c_1, c_2 > 0$ and $N_0$ such that $\forall N > N_0$, $c_1 g_2(N) \le g_1(N) \le c_2 g_2(N)$, i.e., $g_1$ and $g_2$ are of the same order.

For example, consider a loyal scheme in which the cognitive user selects channel i at the beginning of a block and sticks to it. If $\theta_i$ is the largest among the entries of $\theta$, $L(\theta; \Gamma) = 0$. On the other hand, if $\theta_i$ is not the largest, $L(\theta; \Gamma) = BT(\theta_{i^*} - \theta_i) \sim O(T)$. Hence, this loyal scheme is not consistent. The following lemma characterizes the fundamental limits of any consistent scheme.
Lemma 2. For any $\theta$ and any consistent strategy $\Gamma$, we have
$$\liminf_{T \to \infty} \frac{L(\theta; \Gamma)}{\ln T} \ge B \sum_{i \in \mathcal{N} \setminus \{i^*\}} \frac{\theta_{i^*} - \theta_i}{D(\theta_i \,\|\, \theta_{i^*})}, \qquad (3)$$
where $D(\theta_i \| \theta_l)$ denotes the Kullback-Leibler divergence between two Bernoulli random variables with parameters $\theta_i$ and $\theta_l$, respectively:
$$D(\theta_i \| \theta_l) = \theta_i \ln\frac{\theta_i}{\theta_l} + (1 - \theta_i) \ln\frac{1 - \theta_i}{1 - \theta_l}.$$

Lemma 2 shows that the loss of any consistent strategy grows at least logarithmically in T: for large T it is no smaller than a $\theta$-dependent constant times $\ln T$. An intuitive explanation of this loss is that we need to spend on the order of $\ln T$ time slots sampling each of the channels with smaller $\theta_i$ in order to get a reasonably accurate estimate of $\theta$, and hence use it to determine the channel having the largest $\theta_i$ to sense. We say that a strategy $\Gamma$ is order optimal if $L(\theta; \Gamma) \sim O(\ln T)$.

Before proceeding to the proposed low-complexity order-optimal strategy, we first analyze the loss order of some heuristic strategies that may appear reasonable. The first simple rule is the random strategy $\Gamma_r$ where, at each time slot, the cognitive user chooses a channel uniformly at random from the N available channels. The fraction of time the cognitive user spends on each channel is therefore $1/N$, leading to the loss
$$L(\theta; \Gamma_r) = B \sum_{i=1}^{N} \frac{(\theta_{i^*} - \theta_i)\, T}{N} \sim O(T).$$
The second is the myopic rule $\Gamma_g$, in which the cognitive user keeps updating $f_j(\theta)$ and chooses the channel with the largest value of $\hat{\theta}_i = \int \theta_i f_j(\theta)\, d\theta$ at each stage. There are no convergence guarantees for the myopic rule: $\hat{\theta}$ may never converge to $\theta$ because some channels may not be sampled sufficiently often [6]. Hence the loss of this myopic strategy can be of order $O(T)$. The third protocol we consider is the stay-with-the-winner, switch-from-the-loser rule $\Gamma_{SW}$, where the cognitive user randomly chooses a channel in the first time slot.
In the succeeding time slots, 1) if the accessed channel was found to be free, it chooses the same channel to sense; 2) otherwise, it chooses one of the remaining channels based on a certain switching rule.

Lemma 3. No matter what the switching rule is, $L(\theta; \Gamma_{SW}) \sim O(T)$.

There are several strategies that have loss of order $O(\ln T)$. We adopt the following linear complexity strategy from [7].

Rule 1 (Order optimal single index strategy). The cognitive user maintains two vectors X and Y, where each $X_i$ records the number of time slots in which the cognitive user has sensed channel i to be free, and each $Y_i$ records the number of time slots in which the cognitive user has chosen channel i to sense. The strategy works as follows.
1. Initialization: at the beginning of each block, each channel is sensed once.
2. After the initialization period, the cognitive user obtains an estimate $\hat{\theta}$ at the beginning of time slot j, given by $\hat{\theta}_i(j) = X_i(j) / Y_i(j)$, and assigns an index $\Lambda_i(j) = \hat{\theta}_i(j) + \sqrt{2 \ln j / Y_i(j)}$ to the i-th channel. The cognitive user chooses the channel with the largest value of $\Lambda_i(j)$ to sense at time slot j. After each sensing, the cognitive user updates X and Y.

The intuition behind this strategy is that, as long as every $Y_i$ keeps growing with T, the estimate $\hat{\theta}_i$ converges to the true value of $\theta_i$ in probability, and the cognitive user eventually chooses the channel with the largest $\theta_i$ in most time slots. The $O(\ln T)$ loss comes from the time spent sampling the inferior channels in order to learn the value of $\theta$. This price, however, is inevitable, as established by the lower bound of Lemma 2.

Finally, we observe that the difference between the myopic rule and the order optimal single index rule is the additional term $\sqrt{2 \ln j / Y_i(j)}$ added to the current estimate $\hat{\theta}_i$.
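Rule 1 closely follows the UCB1 index policy of [7]. The following Python sketch simulates one block under Rule 1 and, for comparison, evaluates the constant in the Lemma 2 lower bound (3). The function names, the 0-based channel indices, and the single-run loss estimate are illustrative choices of ours, not part of the paper; the example assumes $0 < \theta_{i^*} < 1$ so that the Kullback-Leibler divergence is finite.

```python
import math
import random

def kl_bernoulli(p, q):
    """D(p || q) for Bernoulli(p) and Bernoulli(q), natural log; needs 0 < q < 1."""
    d = 0.0
    if p > 0:
        d += p * math.log(p / q)
    if p < 1:
        d += (1 - p) * math.log((1 - p) / (1 - q))
    return d

def lemma2_constant(theta, B=1.0):
    """Constant c(theta) such that Lemma 2 gives liminf L / ln T >= c(theta).

    Channels tied with the best one incur no loss and are skipped.
    ASSUMPTION: the best availability probability is strictly below 1.
    """
    best = max(theta)
    return B * sum((best - t) / kl_bernoulli(t, best) for t in theta if t < best)

def rule1_block(theta, T, B=1.0, rng=None):
    """Simulate one block under Rule 1; returns (bits sent, sensing counts Y)."""
    if rng is None:
        rng = random.Random()
    N = len(theta)
    X = [0] * N  # X_i: slots in which channel i was sensed free
    Y = [0] * N  # Y_i: slots in which channel i was sensed
    bits = 0
    for j in range(1, T + 1):
        if j <= N:
            s = j - 1  # initialization: sense every channel once
        else:
            # index Lambda_i(j) = X_i / Y_i + sqrt(2 ln j / Y_i)
            s = max(range(N), key=lambda i: X[i] / Y[i]
                    + math.sqrt(2 * math.log(j) / Y[i]))
        z = 1 if rng.random() < theta[s] else 0  # sensing outcome Z_s(j)
        X[s] += z
        Y[s] += 1
        bits += B * z
    return bits, Y

def loss_estimate(theta, Y, B=1.0):
    """Single-run estimate B * sum_i Y_i (theta_i* - theta_i); its expectation
    over runs equals L(theta; Gamma)."""
    best = max(theta)
    return B * sum(y * (best - t) for y, t in zip(Y, theta))

if __name__ == "__main__":
    theta = [0.3, 0.5, 0.8]
    for T in (1_000, 10_000, 50_000):
        _, Y = rule1_block(theta, T, rng=random.Random(T))
        print(T, Y, round(loss_estimate(theta, Y), 1),
              round(lemma2_constant(theta) * math.log(T), 1))
```

The printed loss estimate is random, but it should grow roughly like $\ln T$ and typically stay above the Lemma 2 reference value: (3) is a lower bound for every consistent strategy, while the guarantee proved in [7] for this index has the same $\ln T$ order with a larger constant.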
Roughly speaking, the additional term $\sqrt{2 \ln j / Y_i(j)}$ guarantees enough sampling time for each channel: if channel i is sampled too sparsely, $Y_i(j)$ will be small, which increases the probability that $\Lambda_i$ is the largest index. Once $Y_i(j)$ is large compared with $\ln j$, $\hat{\theta}_i$ becomes the dominant term in the index $\Lambda_i$, and hence the channel with the largest $\theta_i$ will be chosen much more frequently.

4. CONCLUSIONS

This work has developed a unified framework for the design and analysis of cognitive medium access based on the classical bandit problem. Our formulation highlights the tradeoff between exploration and exploitation in cognitive channel selection. A linear complexity cognitive medium access algorithm, which is asymptotically optimal as the number of time slots increases, has also been proposed.

5. REFERENCES

[1] J. Mitola, "Cognitive radio: Making software radios more personal," IEEE Personal Communications, vol. 6, pp. 13-18, Aug. 1999.
[2] D. A. Berry and B. Fristedt, Bandit Problems: Sequential Allocation of Experiments. London: Chapman and Hall, 1985.
[3] A. Motamedi and A. Bahai, "Dynamic channel selection for spectrum sharing in unlicensed bands," European Trans. on Telecommunications and Related Technologies, 2007. Submitted.
[4] Q. Zhao, L. Tong, A. Swami, and Y. Chen, "Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework," IEEE Journal on Selected Areas in Communications, vol. 25, pp. 589-600, Apr. 2007.
[5] L. Lai, H. El Gamal, H. Jiang, and H. V. Poor, "Cognitive medium access: exploration, exploitation and competition," IEEE/ACM Trans. on Networking, 2007. Submitted, available at www.princeton.edu/~llai.
[6] P. R. Kumar, "A survey of some results in stochastic adaptive control," SIAM Journal on Control and Optimization, vol. 23, pp. 329-380, May 1985.
[7] P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Machine Learning, vol. 47, pp. 235-256, 2002. Kluwer Academic Publishers.