Distributed Inference with M-ary Quantized Data in the Presence of Byzantine Attacks

V. Sriram Siddhardh (Sid) Nadendla, Student Member, IEEE, Yunghsiang S. Han, Fellow, IEEE, and Pramod K. Varshney, Fellow, IEEE

V. Sriram Siddhardh (Sid) Nadendla and Pramod K. Varshney are with the Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13201, USA. E-mail: {vnadendl, varshney}@syr.edu. Yunghsiang S. Han is with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan. E-mail: yshan@mail.ntust.edu.tw.

Abstract: The problem of distributed inference with M-ary quantized data at the sensors is investigated in the presence of Byzantine attacks. We assume that the attacker does not have knowledge about either the true state of the phenomenon of interest or the quantization thresholds used at the sensors. Therefore, the Byzantine nodes attack the inference network by modifying the symbol corresponding to the quantized data into another symbol of the M-ary quantization alphabet and transmitting the false symbol to the fusion center (FC). In this paper, we find the optimal Byzantine attack that blinds any distributed inference network. As the quantization alphabet size increases, a tremendous improvement in the security performance of the distributed inference network is observed. We also investigate the problem of distributed inference in the presence of resource-constrained Byzantine attacks. In particular, we focus our attention on two problems, distributed detection and distributed estimation, when the Byzantine attacker employs a highly-symmetric attack. For both problems, we find the optimal attack strategies employed by the attacker to maximally degrade the performance of the inference network. A reputation-based scheme for identifying malicious nodes is also presented as the network's strategy to mitigate the impact of Byzantine threats on the inference performance of the distributed sensor network.

Index Terms: Distributed Inference, Network Security, Sensor Networks, Byzantine Attacks, Kullback-Leibler Divergence, Fisher Information.

I. INTRODUCTION

Distributed inference in sensor networks has been widely studied over the past three decades (see [1]–[11] and references therein). The distributed inference framework comprises a group of spatially distributed sensors that acquire observations about a phenomenon of interest (POI) and send processed data to a fusion center (FC), where a global inference is made. Due to resource constraints in sensor networks, the data are processed at the sensors in such a way that the observations are mapped to symbols from an alphabet of size M prior to transmission to the FC. When M = 2, binary quantization is employed to generate the processed data. When M > 2, each sensor sends an M-ary symbol that is assumed to be generated via fine quantization. A sensor decision rule is assumed to be characterized by a set of quantization thresholds. In this paper, we use the phrases 'mapped to one of the M-ary symbols' and 'quantized to an M-ary symbol' interchangeably. Much of the past work has focused on the binary quantization case, i.e., M = 2. In this paper, we consider the more general case of arbitrary M, with M = 2 as a special case. The framework of distributed inference networks has been extensively studied for different types of inference problems, such as detection (e.g., [1], [3], [5]–[8], [12]), estimation (e.g., [3], [9], [10]), and tracking (e.g., [3], [11]), in the presence of both ideal and non-ideal channels.
In this paper, we focus our attention on two distributed inference problems, namely detection and estimation, where the sensors quantize their data to M-ary symbols. Although sensor networks have been a very active field of research, security problems in sensor networks have gained attention only in the last decade [13]–[15]. As security threats have evolved to target inference networks specifically, attempts have been made at the system level to either prevent these threats or mitigate their effect on network performance. While there are many types of security threats, in this paper we address one such attack, called the Byzantine attack, in the context of distributed inference networks (see the recent survey [16] by Vempaty et al.). Byzantine attacks (introduced by Lamport et al. in [17]) are, in general, arbitrary and may refer to many types of malicious behavior. In this paper, we focus only on the data-falsification aspect of the Byzantine attack, wherein one or more compromised nodes of the network send false information to the FC in order to deteriorate the inference performance of the network. A well-known example of this attack is the man-in-the-middle attack [18], where, on the one hand, the attacker collects data from the sensors whose authentication process has been compromised by the attacker impersonating the FC, while, on the other hand, the attacker sends false information to the FC using the compromised sensors' identities. In summary, if the $i$th sensor's authentication is compromised, the attacker remains invisible to the network, accepts the true decision $u_i$ from the $i$th sensor, and sends $v_i$ to the FC in order to deteriorate the inference performance.

Marano et al., in [19], analyzed the Byzantine attack on a network of sensors carrying out the task of distributed detection, where the attacker is assumed to have complete knowledge about the hypotheses. This represents the extreme case in which the Byzantine nodes have the extra power of knowing the true hypothesis. In their model, the sensors quantize their respective observations into M-ary symbols, which are later fused at the FC. The Byzantine nodes pick symbols using an optimal probability distribution conditioned on the true hypothesis and transmit them to the FC in order to maximally degrade the detection performance. Rawat et al., in [20], also considered the problem of distributed detection in the presence of Byzantine attacks, with binary quantizers at the sensors. Unlike the authors of [19], Rawat et al. did not assume complete knowledge of the true hypothesis at the Byzantine attacker. Instead, they assumed that the Byzantine nodes derive their knowledge about the true hypothesis from their own sensing observations. In other words, a Byzantine node potentially flips the local decision made at the node; it does not modify the thresholds of the sensor quantizers. Rawat et al. also analyzed the performance of the network in the presence of independent and collaborative Byzantine attacks and modeled the problem as a zero-sum game between the sensor network and the Byzantine attacker. In addition to analyzing distributed detection in the presence of Byzantine attacks, Rawat et al. proposed in [20] a reputation-based scheme that identifies the Byzantine nodes by accumulating the deviations between each sensor's decision and the FC's decision over a time window of duration T. If the accumulated number of deviations for a given node exceeds a prescribed threshold, the FC tags it as a Byzantine node.
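The deviation-counting rule of [20] described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of [20]: decisions are binary (0/1), the FC's global decisions over the window of T time steps are given, and both the threshold `eta` and the function name `tag_byzantine` are hypothetical.

```python
def tag_byzantine(sensor_decisions, fc_decisions, eta):
    """Return indices of sensors tagged as Byzantine (hypothetical helper).

    sensor_decisions: T lists, each holding the N sensors' binary decisions
                      at one time step of the window
    fc_decisions:     the FC's T global decisions over the same window
    eta:              assumed deviation threshold chosen by the designer
    """
    num_sensors = len(sensor_decisions[0])
    deviations = [0] * num_sensors
    # Accumulate, for each sensor, how often it disagreed with the FC.
    for decisions, global_decision in zip(sensor_decisions, fc_decisions):
        for i, u in enumerate(decisions):
            if u != global_decision:
                deviations[i] += 1
    # Tag a sensor when its accumulated deviations exceed the threshold.
    return [i for i, d in enumerate(deviations) if d > eta]


# Example: sensor 2 disagrees with the FC at every step of a window of T = 4.
window = [[1, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 1]]
print(tag_byzantine(window, [1, 0, 1, 0], eta=2))  # [2]
```

A strict inequality is used for "exceeds a prescribed threshold"; with `eta=4` in the example, no sensor would be tagged.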
In order to mitigate the attack, the FC removes the nodes tagged as Byzantine from the fusion rule. Another mitigation scheme was proposed by Vempaty et al. [21], where each sensor's behavior is learnt over time and compared to the known behavior of the honest nodes. Any node whose learnt behavior deviates significantly from the expected honest behavior is labelled Byzantine. Having learnt these parameters, the authors also proposed using this information to adapt the fusion rule so as to maximize the performance of the FC. In contrast to the parallel topology of sensor networks, Kailkhura et al. in [22] investigated the problem of Byzantine attacks on distributed detection in a hierarchical sensor network. They presented the optimal Byzantine strategy when the sensors communicate their decisions to the FC over multiple hops of a balanced tree. They assumed that the cost of compromising sensors varies across the levels of the tree, and found the optimal Byzantine strategy that minimizes the cost of attacking a given hierarchical network. Soltanmohammadi et al. in [23] investigated the problem of distributed detection in the presence of different types of Byzantine nodes. Each Byzantine node type corresponds to a different operating point and, therefore, the authors considered the problem of identifying the different Byzantine nodes along with their operating points. The problem of maximum-likelihood (ML) estimation of the operating points was formulated and solved using the expectation-maximization (EM) algorithm. Once the Byzantine nodes' operating points are estimated, this information is utilized at the FC to mitigate the malicious activity in the network and to improve the global detection performance. Distributed target localization in the presence of Byzantine attacks was addressed by Vempaty et al.
in [24], where the sensors quantize their observations into binary decisions, which are transmitted to the FC. Similar to Rawat et al.'s approach in [20], the authors of [24] investigated the problem of distributed target localization from both the network's and the Byzantine attacker's perspectives, first by identifying the optimal Byzantine attack and second by mitigating the impact of the attack with the use of non-identical quantizers at the sensors. In this paper, we extend the framework of Byzantine attacks to the case where the Byzantine nodes do not have complete knowledge about the true state of the phenomenon of interest (POI) and the sensors generate M-ary symbols instead of binary symbols. We also assume that the Byzantine attacker is ignorant of the quantization thresholds used at the sensors to generate the M-ary symbols.¹ Under these assumptions, we address two inference problems: binary hypothesis testing and parameter estimation. The main contributions of the paper are three-fold. First, we define a Byzantine attack model for a sensor network in which individual sensors quantize their observations into one of M symbols, when the attacker does not have complete knowledge about the true state of the POI or the thresholds employed by the sensors. We model the attack strategy as a flipping probability matrix, whose $(i, j)$th entry represents the probability with which the $i$th symbol is flipped into the $j$th symbol. Second, we show that quantization into M-ary symbols at the sensors, as opposed to binary quantization, simultaneously improves both the inference and the security performance. We also derive the optimal flipping matrix as a function of the fraction of Byzantine nodes in the network. Finally, we extend the mitigation scheme presented by Rawat et al. in [20] to the more general case where the sensors generate M-ary symbols.
We present simulation results to illustrate the performance of the reputation-based scheme for the identification of Byzantine nodes in the network. The remainder of the paper is organized as follows. In Section II, we describe our system model and present the Byzantine attack model for the case where the sensors generate M-ary symbols and the attacker has no knowledge about the true state of the phenomenon of interest or the quantization thresholds employed by the sensors. Next, in Section III, we determine the most powerful attack strategy that the Byzantine nodes can adopt. In the case of resource-constrained Byzantine attacks, where the attacker cannot compromise enough nodes in the network to blind it (as defined in Section III), we find the optimal Byzantine attack for a fixed fraction of Byzantine nodes in the context of distributed detection and estimation in Sections IV and V, respectively. From the network's perspective, we present a mitigation scheme in Section VI that identifies the Byzantine nodes using reputation tags. Finally, we present our concluding remarks in Section VII.

¹ The well-known attacker-in-the-middle is one such example.

II. SYSTEM MODEL

Consider an inference (sensor) network with N sensors, where a fraction $\alpha$ of the nodes in the network are assumed to be compromised (see Figure 1a). These compromised sensors transmit false data to the fusion center (FC) in order to deteriorate the inference performance of the network. We assume that the network is designed to infer about a particular phenomenon, regarding which the sensors acquire conditionally independent observations.

[Fig. 1: Distributed inference network in the presence of Byzantine attacks. (a) Sensor network model: honest and Byzantine sensors observe the phenomenon and report to the fusion center. (b) Byzantine attack model: a Byzantine sensor maps its quantized symbol $u_i \in \{1, \dots, M\}$ to a transmitted symbol $v_i \in \{1, \dots, M\}$; e.g., symbol 1 is kept with probability $1 - \sum_{j \neq 1} p_{1j}$ and flipped to symbol $j$ with probability $p_{1j}$ ($p_{12}, \dots, p_{1M}$).]
We denote the observation of the $i$th sensor as $r_i$. This observation $r_i$ is mapped to one of the M symbols, $u_i \in \{1, \dots, M\}$. In a compromised inference network, since the Byzantine sensors do not transmit their true quantized data, we denote the symbol transmitted by the $i$th sensor as $v_i$. If node $i$ is honest, then $v_i = u_i$. Otherwise, we assume that the Byzantine sensor modifies $u_i = l$ into $v_i = m$ with probability $p_{lm}$, as shown in Figure 1b. For compactness, we denote the transition probabilities depicted in Figure 1b by a row-stochastic matrix $\mathbf{P}$:

$$\mathbf{P} = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1M} \\ p_{21} & p_{22} & \cdots & p_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ p_{M1} & p_{M2} & \cdots & p_{MM} \end{bmatrix}. \tag{1}$$

Since the attacker has no knowledge of the quantization thresholds employed at each sensor, we assume that $\mathbf{P}$ is independent of the sensor observations. The messages $\mathbf{v} = \{v_1, v_2, \dots, v_N\}$ are transmitted to the FC, where a global inference is made about the phenomenon of interest based on $\mathbf{v}$.

In order to consider the general inference problem, we assume that $\theta \in \Theta$ is the parameter that denotes the phenomenon of interest in the received signal $r_i$ at the $i$th sensor. If we are considering a detection/classification problem, $\theta$ is discrete (finite or countably infinite). In the case of parameter estimation, $\Theta$ is a continuous set. Without loss of generality, we assume $\Theta = \{0, 1, \dots, K-1\}$ if the problem of interest is classification; detection is then the special case of classification with $K = 2$. In the case of estimation, we assume that $\Theta = \mathbb{R}$. Note that the performance of the FC is determined by the probability mass function $P(\mathbf{v} \mid \theta)$. Therefore, in Section III, we analyze the behavior of $P(\mathbf{v} \mid \theta)$ in the presence of different attacks and identify the one with the greatest impact on the network.
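To make the attack model concrete, the following minimal sketch simulates how a node's quantized symbol $u_i$ becomes the transmitted symbol $v_i$. The assumptions are ours, not the paper's: symbols are indexed $0, \dots, M-1$ to match Python list indexing, each node's type is drawn once as Byzantine with probability $\alpha$, and all helper names are hypothetical.

```python
import random


def is_row_stochastic(P, tol=1e-12):
    """Check that every row of P is a valid probability distribution."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) <= tol for row in P)


def transmit(u, P, byzantine, rng):
    """Map the quantized symbol u (0-based, an assumption) to the sent symbol v.

    Honest nodes send v = u. A Byzantine node draws v = m with
    probability P[u][m], i.e., from row u of the flipping matrix.
    """
    if not byzantine:
        return u
    return rng.choices(range(len(P[u])), weights=P[u], k=1)[0]


def simulate_network(u, P, alpha, rng):
    """Draw each node's type (Byzantine with probability alpha) and return v."""
    types = [rng.random() < alpha for _ in u]
    return [transmit(ui, P, b, rng) for ui, b in zip(u, types)]


rng = random.Random(1)
P = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]  # example 3x3 flipping matrix
assert is_row_stochastic(P)
v = simulate_network([0, 1, 2, 2, 1], P, alpha=0.4, rng=rng)
print(v)
```

Because $\mathbf{P}$ is applied without looking at $r_i$, the sketch reflects the assumption that the attacker is oblivious to the sensors' quantization thresholds.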
III. OPTIMAL BYZANTINE ATTACKS

Given the conditional distribution $p(r_i \mid \theta)$ of $r_i$ and the sensor quantization thresholds $\lambda_j$, $0 \le j \le M$ (with $\lambda_0 = -\infty$ and $\lambda_M = \infty$), the conditional distribution of $u_i$ is

$$P(u_i = m \mid \theta) = \int_{\lambda_{m-1}}^{\lambda_m} p(r_i \mid \theta)\, dr_i \tag{2}$$

for all $m = 1, 2, \dots, M$. If the true quantized symbol at the $i$th node is $u_i = l$, a compromised node modifies it into $v_i = m$ with probability $p_{lm}$, as depicted in Figure 1b, and transmits it to the FC. Since the FC is not aware of the type of the node (honest or Byzantine), it is natural to assume that node $i$ is compromised with probability $\alpha$, where $\alpha$ is the fraction of nodes in the network that are compromised. Therefore, the conditional distribution of $v_i$ at the FC is

$$\begin{aligned} P(v_i = m \mid \theta) &= \alpha\, P(v_i = m \mid i = \text{Byzantine}, \theta) + (1-\alpha)\, P(v_i = m \mid i = \text{Honest}, \theta) \\ &= \alpha \sum_{l=1}^{M} P(u_i = l \mid \theta)\, P(v_i = m \mid u_i = l, \theta) + (1-\alpha)\, P(u_i = m \mid \theta) \\ &= \alpha \sum_{l=1}^{M} p_{lm}\, P(u_i = l \mid \theta) + (1-\alpha)\, P(u_i = m \mid \theta) \\ &= \alpha \sum_{l \neq m} p_{lm}\, P(u_i = l \mid \theta) + \left[(1-\alpha) + \alpha p_{mm}\right] P(u_i = m \mid \theta) \\ &= \left[(1-\alpha) + \alpha p_{mm}\right] + \sum_{l \neq m} \left\{ \alpha p_{lm} - \left[(1-\alpha) + \alpha p_{mm}\right] \right\} P(u_i = l \mid \theta), \end{aligned} \tag{3}$$

where the last step uses $P(u_i = m \mid \theta) = 1 - \sum_{l \neq m} P(u_i = l \mid \theta)$.

The goal of a Byzantine attack is to blind the FC with the least amount of effort (minimum $\alpha$). Totally blinding the FC is equivalent to making $P(v_i = m \mid \theta) = 1/M$ for all $1 \le m \le M$. The right-hand side of Equation (3) consists of two terms. The first term does not depend on $\theta$, while the second term conveys information about $\theta$ through the observations. In order to blind the FC, the attacker should make the second term equal to zero. Since the attacker does not have any knowledge regarding $P(u_i = l \mid \theta)$, it can make the second term of Equation (3) equal to zero by setting

$$\alpha p_{lm} = (1-\alpha) + \alpha p_{mm}, \quad \forall\, l \neq m. \tag{4}$$

Then the conditional probability $P(v_i = m \mid \theta) = (1-\alpha) + \alpha p_{mm}$ becomes independent of $\theta$, and hence of the observations $r_i$ (or their quantized versions $u_i$), resulting in equiprobable symbols at the FC. In other words, the received vector $\mathbf{v} = \{v_1, v_2, \dots, v_N\}$ does not carry any information about $\theta$ and, therefore, results in the most degraded performance at the FC. The FC then has to depend solely on its prior information about $\theta$ in making an inference.

Having identified the condition in Equation (4) under which the Byzantine attack has the greatest impact on the performance of the network, we now identify the strategy that the attacker should employ to achieve this condition. Since we need $P(v_i = m \mid \theta) = (1-\alpha) + \alpha p_{mm} = 1/M$, we obtain

$$\alpha = \frac{M-1}{M(1 - p_{mm})}.$$

To minimize $\alpha$, one needs to set $p_{mm} = 0$. In this paper, we denote the value of $\alpha$ corresponding to this optimal strategy, which minimizes the resources the Byzantine attacker requires to blind the FC, as $\alpha_{blind}$. Hence, $\alpha_{blind} = \frac{M-1}{M}$. Rearranging Equation (4), we have

$$\frac{1}{\alpha} = 1 + (p_{lm} - p_{mm}) = 1 + p_{lm}, \quad \forall\, l \neq m. \tag{5}$$

By setting $\alpha = \alpha_{blind}$, we obtain $p_{lm} = 1/(M-1)$ for all $l \neq m$. That is, the transition probability matrix $\mathbf{P}$ is highly symmetric. We summarize this result in the following theorem.

Theorem 1. If the Byzantine attacker has no knowledge of the quantization thresholds employed at each sensor, then the optimal Byzantine attack is given by

$$p_{lm} = \begin{cases} \dfrac{1}{M-1}, & \text{if } l \neq m \\ 0, & \text{otherwise} \end{cases}, \qquad \alpha_{blind} = \frac{M-1}{M}. \tag{6}$$

We term Equation (6) the optimal Byzantine attack, since the FC does not get any information from the data $\mathbf{v}$ it receives from the sensors to perform an inference task. Therefore, the FC has to rely on prior information about the parameter $\theta$, if available. Theorem 1 can be extended to the case where the channels between the sensors (attackers) and the FC are not perfect. The result is given in Appendix A.
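The following sketch evaluates Equation (3) exactly with rational arithmetic and checks that the attack of Theorem 1 with $\alpha = \alpha_{blind}$ makes the received symbols uniform for an arbitrary input distribution $P(u_i \mid \theta)$. Symbols are indexed from 0, and the function names and the example distribution are illustrative assumptions.

```python
from fractions import Fraction


def optimal_attack_matrix(M):
    """Flipping matrix of Theorem 1: p_lm = 1/(M-1) if l != m, else 0."""
    return [[Fraction(0) if l == m else Fraction(1, M - 1) for m in range(M)]
            for l in range(M)]


def received_pmf(pu, P, alpha):
    """Equation (3): P(v=m|theta) = sum_l P(u=l|theta) * (alpha*p_lm + (1-alpha)*[l == m])."""
    M = len(pu)
    return [sum(pu[l] * (alpha * P[l][m] + (1 - alpha) * (l == m)) for l in range(M))
            for m in range(M)]


M = 4
alpha_blind = Fraction(M - 1, M)
pu = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]  # any P(u|theta)
print(received_pmf(pu, optimal_attack_matrix(M), alpha_blind))
```

With $\alpha = \alpha_{blind} = 3/4$, every entry of the result equals 1/4 regardless of `pu`; lowering $\alpha$ to 1/2 with the same matrix gives $P(v = 0 \mid \theta) = 1/3$, so the received symbols again carry information about $\theta$.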
In Figure 2, we show how $\alpha_{blind}$ scales with increasing quantization alphabet size M. Since the quantized symbols are encoded into bits, we also show how $\alpha_{blind}$ grows with the number of bits needed to encode the M symbols, i.e., $\log_2 M$. Because $\alpha_{blind} = 1 - 2^{-\log_2 M}$, it approaches 1 exponentially fast in the number of quantization bits. This is also shown in Table I.

TABLE I: Improvement in $\alpha_{blind}$ with increasing number of quantization bits, $\log_2 M$

| Quantization bits ($\log_2 M$) | $\alpha_{blind}$ |
|---|---|
| 1 | 0.5 |
| 2 | 0.75 |
| 3 | 0.875 |
| 4 | 0.9375 |
| 5 | 0.9688 |
| 6 | 0.9844 |
| 7 | 0.9922 |
| 8 | 0.9961 |

Note that if the sensors use one additional quantization bit (2-bit quantization) instead of 1-bit (binary) quantization, then $\alpha_{blind}$ increases from 0.5 to 0.75. This trend continues with an increasing number of quantization bits: when the sensors employ an 8-bit quantizer, the attacker needs to compromise at least 99.6% of the sensors in the network to blind the FC. Obviously, the improvement in security performance is not free, as the sensors incur a communication cost in terms of energy and bandwidth as the number of quantization bits increases. Therefore, in practice, the network designer faces a trade-off between the communication cost and the security guarantees. Also note that when M = 2 (1-bit quantization), our results coincide with those of Rawat et al. in [20], where the focus was on binary hypothesis testing in a distributed sensor network. On the other hand, our results are more general, as they address any inference problem (detection, estimation or classification) in a distributed sensor network. Another extreme case to note is $M \to \infty$, in which case $\alpha_{blind} \to 1$. This means that the Byzantine attacker cannot blind the FC unless all the sensors are compromised.
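Table I follows directly from Theorem 1, since $\alpha_{blind} = (M-1)/M = 1 - 2^{-b}$ for a $b$-bit quantizer ($M = 2^b$). A short sketch that regenerates the table (the helper name is illustrative):

```python
def alpha_blind(M):
    """Minimum fraction of Byzantine nodes needed to blind the FC (Theorem 1)."""
    return (M - 1) / M


# One row per quantizer resolution b, with M = 2**b symbols (cf. Table I).
for bits in range(1, 9):
    M = 2 ** bits
    print(f"{bits} bit(s): M = {M:3d}, alpha_blind = {alpha_blind(M):.4f}")
```

Each printed line matches the corresponding row of Table I to four decimal places.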
In the following sections, we consider distributed detection and estimation problems in sensor networks and analyze the impact of the optimal Byzantine attack on these systems. For the sake of tractability, we consider a noiseless channel ($\mathbf{Q} = \mathbf{I}$) at the FC in the framework of resource-constrained Byzantine attacks. Furthermore, motivated by Theorem 1, we restrict our attention to the set of highly symmetric matrices $\mathbf{P}$. In other words, we assume that

$$p_{lm} = \begin{cases} p, & \text{if } l \neq m \\ 1 - (M-1)p, & \text{otherwise}. \end{cases} \tag{7}$$

IV. DISTRIBUTED DETECTION IN THE PRESENCE OF RESOURCE-CONSTRAINED BYZANTINE ATTACKS

In this section, we consider a resource-constrained Byzantine attack on binary hypothesis testing in a distributed sensor network, where the phenomenon of interest is denoted as $\theta$ and is modeled as follows:

$$\theta = \begin{cases} 0, & \text{if } H_0 \\ 1, & \text{if } H_1. \end{cases} \tag{8}$$

[Fig. 2: Improvement in $\alpha_{blind}$ with increasing number of quantization levels. (a) $\alpha_{blind}$ vs. M, with markers for 1-bit to 5-bit quantizers; (b) $\alpha_{blind}$ vs. number of quantization bits ($\log_2 M$).]

In order to characterize the performance of the FC, we consider the Kullback-Leibler divergence (KLD) as the performance metric. KLD can be interpreted as the error exponent in the Neyman-Pearson detection framework [25]: under a constraint on the probability of false alarm, the probability of missed detection goes to zero exponentially with the number of sensors, at a rate equal to the KLD computed at the FC.
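To illustrate how the per-sensor KLD can be evaluated for quantized data under the symmetric attack of Equation (7), the sketch below assumes the Gaussian shift model of Equation (12) and the uniform quantizer of Equation (13), both used in the numerical results of this section ($\mu = 1$, $\sigma = 1$, $A = 2$). It relies on the identity $P(v = m \mid \theta) = \alpha p + (1 - M\alpha p)\, P(u = m \mid \theta)$, which follows from Equation (3) with Equation (7). Two further assumptions are ours: for M = 2, where the threshold formula of Equation (13) is undefined, a single threshold at 0 is used; and all function names are hypothetical.

```python
import math


def thresholds(M, A):
    """Thresholds lambda_1..lambda_{M-1} of the uniform quantizer in Eq. (13)."""
    if M == 2:
        return [0.0]  # assumption: Eq. (13) divides by M - 2, so use 0 for M = 2
    return [A * (2 * (i - 1) / (M - 2) - 1) for i in range(1, M)]


def quantized_pmf(mean, sigma, lams):
    """Eq. (2) for Gaussian observations: probability of each quantizer cell."""
    edges = [-math.inf] + lams + [math.inf]
    cdf = [0.5 * (1 + math.erf((e - mean) / (sigma * math.sqrt(2)))) for e in edges]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


def attacked_pmf(pu, alpha, p):
    """P(v=m) = alpha*p + (1 - M*alpha*p) * P(u=m) under the symmetric P of Eq. (7)."""
    M = len(pu)
    return [alpha * p + (1 - M * alpha * p) * pm for pm in pu]


def sensor_kld(M, alpha, p, mu=1.0, sigma=1.0, A=2.0):
    """Per-sensor KLD D_FC between the received-symbol pmfs under H0 and H1."""
    lams = thresholds(M, A)
    p0 = attacked_pmf(quantized_pmf(-mu, sigma, lams), alpha, p)
    p1 = attacked_pmf(quantized_pmf(mu, sigma, lams), alpha, p)
    return sum(a * math.log(a / b) for a, b in zip(p0, p1) if a > 0)


# Theorem 1 sanity check: the blinding attack drives the KLD to (numerically) zero.
print(sensor_kld(4, alpha=0.75, p=1 / 3))
# Grid search over feasible p for a fixed alpha < alpha_blind (cf. Theorem 2).
M, alpha, K = 8, 0.3, 50
grid = [k / (K * (M - 1)) for k in range(K + 1)]
best_p = min(grid, key=lambda p: sensor_kld(M, alpha, p))
print(best_p, 1 / (M - 1))
```

The grid search is only a numerical sanity check for one operating point, not a substitute for the proof of Theorem 2; in this example the minimizing grid point is the upper end of the feasible range, $p = 1/(M-1)$.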
We denote the KLD at the FC by $\mathcal{D}_{FC}$ and define it as follows:

$$\mathcal{D}_{FC} = \mathbb{E}_{H_0}\left[\log \frac{P(\mathbf{v} \mid H_0)}{P(\mathbf{v} \mid H_1)}\right] = \sum_{\mathbf{m} \in \{1, \dots, M\}^N} P(\mathbf{v} = \mathbf{m} \mid H_0) \log \frac{P(\mathbf{v} = \mathbf{m} \mid H_0)}{P(\mathbf{v} = \mathbf{m} \mid H_1)}. \tag{9}$$

Since we have assumed that the sensor observations are conditionally independent,² the KLD can be expressed as

$$\mathcal{D}_{FC} = N D_{FC}, \tag{10}$$

where

$$D_{FC} = \sum_{m=1}^{M} P(v = m \mid H_0) \log \frac{P(v = m \mid H_0)}{P(v = m \mid H_1)}.$$

² For notational convenience, the sensor index $i$ is dropped in the rest of the paper.

Note that the optimal Byzantine attack given in Equation (6) results in equiprobable symbols at the FC irrespective of the hypothesis. Therefore, $D_{FC} = 0$ under the optimal Byzantine attack, i.e., the FC is blinded. On the other hand, if the attacker does not have enough resources to compromise a fraction $\alpha_{blind}$ of the sensors in the network (i.e., $\alpha < \alpha_{blind}$), an optimal strategy for the Byzantine nodes is to use an appropriate matrix $\mathbf{P}$ that deteriorates the performance of the sensor network to the maximal extent. As mentioned in Section III, we restrict our search for the optimal $\mathbf{P}$ to the space of highly symmetric row-stochastic matrices given in Equation (7). Thus, we formulate the problem as follows.

Problem 1. Given $\alpha < \alpha_{blind}$, find the optimal $\mathbf{P}$ within the space of highly symmetric row-stochastic matrices given in Equation (7), such that

$$\begin{aligned} \underset{p}{\text{minimize}} \quad & D_{FC} \\ \text{subject to} \quad & 0 \le p \le \frac{1}{M-1}. \end{aligned}$$

Theorem 2 presents the optimal flipping probability that solves Problem 1. Note that this result is independent of the design of the sensor network and, therefore, can be employed when the Byzantine attacker has no knowledge about the network.

Theorem 2.
Given a fixed $\alpha < \frac{M-1}{M}$, the probability $p$ that optimizes $\mathbf{P}$ within the space of highly symmetric row-stochastic matrices given in Equation (7), such that $D_{FC}$ is minimized, is

$$p^* = \frac{1}{M-1}. \tag{11}$$

Proof: See Appendix B.

Note that this solution is of particular interest to the Byzantine attacker, since it does not require any knowledge about the sensor network design. Also, the attacker's strategy is very simple to implement.

Numerical Results

For illustration purposes, consider the following example, where the inference network is deployed to aid opportunistic spectrum access for a cognitive radio network (CRN). In other words, the cognitive radios (CRs) sense a licensed spectrum band to find vacant bands for the operation of the CRN. Let the observation model at the $i$th sensor be defined as follows:

$$r_i = s(\theta) + n_i, \tag{12}$$

where $\theta \in \{0, 1\}$, $s(\theta) = \mu \cdot (-1)^{1+\theta}$ is a BPSK-modulated symbol transmitted by the licensed (or primary) user transmitter, and the noise $n_i$ is AWGN at the $i$th sensor with distribution $\mathcal{N}(0, \sigma^2)$. Therefore, the conditional distributions of $r_i$ under $H_0$ and $H_1$ are $\mathcal{N}(-\mu, \sigma^2)$ and $\mathcal{N}(\mu, \sigma^2)$, respectively. The range of $r_i$ spans the entire real line ($\mathbb{R}$). However, we assume that the quantizer restricts the support by limiting its output values to a smaller range, say $[-A, A]$. The parameter A is called the overloading parameter [26], because the choice of A dictates the amount of overloading distortion caused by the quantizer. Within this restricted range of observations, we assume a uniform quantizer with step size (called the granularity parameter) $\Delta = \frac{2A}{M-2}$, which dictates the granularity distortion of the quantizer. In other words, the observation $r_i$ is quantized using the following quantizer:

$$u_i = \begin{cases} 0, & \text{if } -\infty < r_i \le \lambda_1 \\ 1, & \text{if } \lambda_1 < r_i \le \lambda_2 \\ \vdots & \\ M-1, & \text{if } \lambda_{M-1} < r_i < \infty, \end{cases} \tag{13}$$

where $\lambda_i = A \cdot \left(\frac{2(i-1)}{M-2} - 1\right)$. Note that $\lambda_1 = -A$ and $\lambda_{M-1} = A$ represent the restricted range of the quantizer, as discussed earlier. The $i$th sensor transmits a symbol $v_i$ to the FC, where $v_i = u_i$ if it is honest. If the $i$th sensor is a Byzantine node, the decision $u_i$ is modified into $v_i$ using the flipping probability matrix $\mathbf{P}$ given in Equation (6).

Although the performance of a given sensor network is quantified by the probability of error at the FC, we use a surrogate metric, the KLD at the FC (see Equation (9)), for the sake of tractability. In an asymptotic sense, Stein's Lemma [25] states that the KLD is the rate at which the probability of missed detection converges to zero under a constrained probability of false alarm. Therefore, in our numerical results, we present how the KLD at the FC varies with the fraction of Byzantine nodes, $\alpha$, in the network.

For the above sensor network, we assume that $\mu = 1$, $\sigma^2 = 1$ and $A = 2$. In Figure 3, we plot the contribution of each sensor to the KLD at the FC as a function of $\alpha$, for 1-bit, 2-bit, 3-bit and 4-bit quantization at the sensors, i.e., M = 2, 4, 8 and 16, respectively. As expected, we observe an improvement in both the detection performance (KLD) and the security performance ($\alpha_{blind}$) with finer quantization. Therefore, for a given $\alpha$, the Byzantine attack can be mitigated by employing finer quantization at the sensors.

[Fig. 3: Contribution of a sensor to the overall KLD at the fusion center, $D_{FC}$, as a function of $\alpha$, for different numbers of quantization levels (1-bit, 2-bit, 3-bit and 4-bit). The pentagrams on the x-axis correspond to $\alpha_{blind}$ for 1-bit, 2-bit, 3-bit and 4-bit quantization, respectively, from left to right.]

Of course, the best that the
designer can do is to let the sensors transmit unquantized data to the FC, whether in the form of observation samples or their sufficient statistic (likelihood ratio). In this case, $\alpha_{blind} = 1$, since $\lim_{M \to \infty} \frac{M-1}{M} = 1$.

V. DISTRIBUTED ESTIMATION IN THE PRESENCE OF RESOURCE-CONSTRAINED BYZANTINE ATTACKS

In this section, we consider the problem of estimating a scalar parameter of interest, denoted by $\theta \in \mathbb{R}$, in a distributed sensor network. As described in the system model, we assume that the $i$th sensor quantizes its observation $r_i$ into an M-ary symbol $u_i$ and transmits $v_i$ to the FC. If the $i$th node is honest, then $v_i = u_i$. Otherwise, the sensor is compromised and flips $u_i$ into $v_i$ using a flipping probability matrix $\mathbf{P}$. Under the assumption that the FC receives the symbols $\mathbf{v}$ over an ideal channel, the estimation performance at the FC depends on the probability mass function $P(\mathbf{v} \mid \theta)$.

The performance of a distributed estimation network can be expressed in terms of the mean-squared error, defined as $\mathbb{E}\left[(\hat{\theta} - \theta)^2\right]$. In the case of unbiased estimators, this mean-squared error is lower bounded by the Cramér-Rao lower bound (CRLB) [27], which provides a benchmark for the design of an estimator at the FC:

$$\mathbb{E}\left[(\hat{\theta}(\mathbf{v}) - \theta)^2\right] \ge \frac{1}{I_{FC}}, \tag{14}$$

where

$$I_{FC} = \mathbb{E}\left[\left(\frac{\partial \log P(\mathbf{v}, \theta)}{\partial \theta}\right)^2\right]. \tag{15}$$

The term $I_{FC}$ is well known as the Fisher information (FI) and serves as a performance metric that captures the performance of the optimal estimator at the FC.
As shown in Equation (16), $I_{FC}$ can be further decomposed into two parts: one corresponding to the prior knowledge about $\theta$ at the FC, and the other (denoted as $\mathcal{J}_{FC}$) representing the information about $\theta$ in the sensor transmissions $\mathbf{v}$:

$$I_{FC} = \mathcal{J}_{FC} + \mathbb{E}\left[\left(\frac{\partial \log p(\theta)}{\partial \theta}\right)^2\right], \tag{16}$$

where

$$\mathcal{J}_{FC} = \mathbb{E}\left[\left(\frac{\partial \log P(\mathbf{v} \mid \theta)}{\partial \theta}\right)^2\right]. \tag{17}$$

In most cases, a closed-form expression for the mean-squared error is intractable and, therefore, the conditional Fisher information is used as a surrogate metric to quantify the performance of a distributed estimation network. In this paper, we also use the conditional FI of the received data $\mathbf{v}$ as the performance metric. Since the sensor observations are conditionally independent, the received symbols are also conditionally independent, and the conditional FI can be written as

$$\mathcal{J}_{FC} = N J_{FC}, \tag{18}$$

where

$$J_{FC} = \mathbb{E}\left[\left(\frac{\partial}{\partial \theta} \log P(v \mid \theta)\right)^2\right] = -\mathbb{E}\left[\frac{\partial^2}{\partial \theta^2} \log P(v \mid \theta)\right]. \tag{19}$$

Following the same approach as in Section IV, we consider the problem of finding an optimal resource-constrained Byzantine attack when $\alpha < \alpha_{blind}$ by finding the symmetric transition matrix $\mathbf{P}$ that minimizes the conditional FI at the FC. This can be formulated as follows.

Problem 2. Given the value of $\alpha$, determine the optimal $\mathbf{P}$ within the space of highly symmetric row-stochastic matrices given in Equation (7), such that

$$\begin{aligned} \underset{p}{\text{minimize}} \quad & J_{FC} \\ \text{subject to} \quad & 0 \le p \le \frac{1}{M-1}. \end{aligned}$$

Theorem 3 presents the optimal flipping probability that solves Problem 2. Note that this result is independent of the design of the sensor network and, therefore, can be employed when the Byzantine attacker has no knowledge about the network.

Theorem 3. Given a fixed $\alpha < \frac{M-1}{M}$, the flipping probability $p$ that optimizes $\mathbf{P}$ over the space of highly symmetric row-stochastic matrices given in Equation (7), by minimizing $J_{FC}$, is given by

$$p^* = \frac{1}{M-1}.$$

Proof: See Appendix C.
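A companion sketch for the estimation problem evaluates the per-sensor conditional FI of Equation (19) for the Gaussian model $r_i = \theta + n_i$ used in the numerical results, via $J_{FC} = \sum_m \frac{(\partial P(v = m \mid \theta)/\partial \theta)^2}{P(v = m \mid \theta)}$ for a discrete $v$. Under the symmetric attack of Equation (7), $\partial P(v = m \mid \theta)/\partial \theta = (1 - M\alpha p)\, \partial P(u = m \mid \theta)/\partial \theta$. As before, the single threshold at 0 for M = 2, the choice $\theta = 0$ with the quantizer of Equation (13) centered at 0 (i.e., around the true value), and the function names are our assumptions; the grid search is a numerical sanity check of Theorem 3, not a proof.

```python
import math


def thresholds(M, A):
    """Thresholds of the uniform quantizer in Eq. (13); threshold 0 assumed for M = 2."""
    if M == 2:
        return [0.0]
    return [A * (2 * (i - 1) / (M - 2) - 1) for i in range(1, M)]


def sensor_fi(M, alpha, p, theta=0.0, sigma=1.0, A=2.0):
    """Per-sensor conditional FI J_FC of Eq. (19) for r = theta + n, n ~ N(0, sigma^2)."""
    edges = [-math.inf] + thresholds(M, A) + [math.inf]
    z = [(e - theta) / sigma for e in edges]
    cdf = [0.5 * (1 + math.erf(x / math.sqrt(2))) for x in z]
    pdf = [math.exp(-x * x / 2) / math.sqrt(2 * math.pi) for x in z]
    q = alpha * p
    fi = 0.0
    for k in range(M):
        pu = cdf[k + 1] - cdf[k]               # P(u = k | theta), Eq. (2)
        dpu = (pdf[k] - pdf[k + 1]) / sigma    # derivative of P(u = k | theta) in theta
        pv = q + (1 - M * q) * pu              # P(v = k | theta) under Eq. (7)
        if pv > 0:                             # skip cells with zero probability
            fi += ((1 - M * q) * dpu) ** 2 / pv
    return fi


# Without an attack, a 1-bit quantizer at the mean gives the classical 2/pi.
print(sensor_fi(2, alpha=0.0, p=0.0), 2 / math.pi)
# Grid search over p for a fixed alpha < alpha_blind (cf. Theorem 3).
M, alpha, K = 8, 0.3, 50
grid = [k / (K * (M - 1)) for k in range(K + 1)]
print(min(grid, key=lambda p: sensor_fi(M, alpha, p)), 1 / (M - 1))
```

The factor $(1 - M\alpha p)^2$ in the numerator makes clear why the blinding attack ($\alpha = \alpha_{blind}$, $p = 1/(M-1)$) drives $J_{FC}$ to zero.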
Numerical Results

As an illustrative example, we consider the problem of estimating $\theta = 1$ at the FC based on the messages transmitted by all the sensors. Let the observation model at the $i$th sensor be

$$ r_i = \theta + n_i, \tag{20} $$

where $n_i \sim \mathcal{N}(0, \sigma^2)$ is the AWGN at the $i$th sensor. The sensors employ the same quantizer as the one presented in Equation (13). The quantized symbol $u_i$ at the $i$th sensor is then modified into $v_i$ using the flipping probability matrix $\mathbf{P}$, as given in Equation (6).

Figure 4 plots the conditional FI corresponding to one sensor for different values of $\alpha$ and $M$, when the uniform quantizer is centered around the true value of $\theta$. Note that as the SNR increases ($\sigma \to 0$), it is better for the network to quantize as finely as possible to mitigate the Byzantine attackers. On the other hand, if the SNR is low, coarse quantization performs better for lower values of $\alpha$. This phenomenon can be attributed to the fact that, as the quantization gets coarser (decreasing $M$), more of the noise gets filtered out than the signal itself. In the high SNR case, by contrast, the signal level is high, so coarse quantization suppresses a significant part of the signal component, thereby degrading performance.

VI. MITIGATION OF BYZANTINE ATTACKS IN A BANDWIDTH-CONSTRAINED INFERENCE NETWORK

Given that the distributed inference network is under Byzantine attack, we showed that the performance of the network can be improved by increasing the quantization alphabet size of the sensors. However, in a bandwidth-constrained distributed inference network, the sensors can only transmit with a finite maximum alphabet size $M$.
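The data path described above (Gaussian observation, quantization, then a highly symmetric flip) can be simulated in a few lines. The following Monte Carlo sketch compares the empirical pmf of $v$ with $\alpha p + (1 - M\alpha p) z_m$, the form that the symbol probabilities take under the highly symmetric attack (as used in Appendices B and C). The quantizer thresholds are hypothetical (evenly spaced in $(-A, A)$, since Equation (13) is not reproduced here), and each report is treated as coming from a Byzantine node with probability $\alpha$, which is how the FC sees a network with an unknown fraction $\alpha$ of compromised nodes.

```python
import bisect
import math
import random


def flip_symmetric(u, M, p, rng):
    """Highly symmetric flip: keep u w.p. 1 - (M - 1) p, else move to each of the
    other M - 1 symbols w.p. p."""
    keep = 1.0 - (M - 1) * p
    x = rng.random()
    if x < keep:
        return u
    k = min(int((x - keep) / p), M - 2)  # index among the M - 1 other symbols
    return k if k < u else k + 1


def simulate_v_pmf(theta, sigma, M, A, alpha, p, n_samples, seed=0):
    """Empirical pmf of v: r = theta + n, quantize, then flip w.p. alpha."""
    rng = random.Random(seed)
    inner = [-A + 2.0 * A * m / M for m in range(1, M)]  # hypothetical thresholds
    counts = [0] * M
    for _ in range(n_samples):
        u = bisect.bisect_right(inner, theta + rng.gauss(0.0, sigma))
        if rng.random() < alpha:                         # report from a Byzantine node
            u = flip_symmetric(u, M, p, rng)
        counts[u] += 1
    return [c / n_samples for c in counts]


def gauss_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def predicted_v_pmf(theta, sigma, M, A, alpha, p):
    """alpha p + (1 - M alpha p) z_m, with z_m the Gaussian cell probabilities."""
    lam = [-math.inf] + [-A + 2.0 * A * m / M for m in range(1, M)] + [math.inf]
    z = [gauss_cdf((lam[m + 1] - theta) / sigma) - gauss_cdf((lam[m] - theta) / sigma)
         for m in range(M)]
    return [alpha * p + (1.0 - M * alpha * p) * zm for zm in z]


emp = simulate_v_pmf(0.0, 1.0, 4, 2.0, 0.3, 0.2, 100_000)
pred = predicted_v_pmf(0.0, 1.0, 4, 2.0, 0.3, 0.2)
print([round(e, 3) for e in emp], [round(q, 3) for q in pred])
```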
In this section, we assume that the network cannot increase the quantization alphabet size any further due to this bandwidth constraint. Therefore, in order to improve the inference performance of the network, we present a reputation-based Byzantine identification/mitigation scheme, which is an extension of the one proposed by Rawat et al. in [20].

A. Reputation-Tagging at the Sensors

As proposed by Rawat et al. in [20], the FC identifies the Byzantine nodes by iteratively updating a reputation-tag for each node as time progresses. We extend the scheme to fine quantization scenarios, i.e., $M > 2$, and analyze its performance through simulation results.

As mentioned earlier in the paper, the FC receives a vector $\mathbf{v}$ of symbols from the sensors and fuses them to yield a global inference, denoted as $\hat{\theta}$. We assume that the observation model is known to the network designer and is given as follows:

$$ r_i = f_i(\theta) + n_i, \tag{21} $$

where $f_i(\cdot)$ denotes the known observation model. We denote the quantization rule employed at the sensors as $\gamma$. Therefore, the quantized message at the $i$th sensor is given by $u_i = \gamma(r_i)$. As discussed earlier, if the $i$th sensor is compromised, it flips $u_i$ into $v_i$ using a flipping probability matrix $\mathbf{P}$.

[Fig. 4: Contribution of a sensor to the overall conditional FI at the FC ($J_{FC}$, vertical axis) as a function of $\alpha$ (horizontal axis), for 1-bit, 2-bit, 3-bit and 4-bit quantization, when $\theta = 0$ and $A = 2$. (a) Low SNR case: $\sigma = 1$. (b) High SNR case: $\sigma = 0.01$. The pentagrams on the x-axis correspond to $\alpha_{blind}$ for 1-bit, 2-bit, 3-bit and 4-bit quantization, respectively, from left to right.]
Since the FC makes a global inference $\hat{\theta}$, it can calculate the squared deviation $d_i$ of each sensor's message from the message that the sensor is nominally expected to transmit:

$$ d_i = \left( \gamma^{-1}(v_i) - f_i(\hat{\theta}) \right)^2, \tag{22} $$

where $\gamma^{-1}(\cdot)$ denotes the inverse of the sensor quantizer $\gamma(\cdot)$; since $\gamma$ is many-to-one, $\gamma^{-1}(v_i)$ is taken to be the centroid of the decision region of the quantizer that corresponds to the symbol $v_i$. Note that $v_i$ is the received symbol, which characterizes the behavior (honest or Byzantine) of the $i$th sensor, while $f_i(\hat{\theta})$ is the signal that the FC expects the sensor to observe. If the $i$th sensor is honest, we expect the mean of $d_i$ to be small. On the other hand, if the $i$th sensor is compromised, the mean of $d_i$ is expected to be large. Therefore, we accumulate the squared deviations $\mathbf{d}_i = \{d_i(1), \cdots, d_i(T)\}$ over $T$ time intervals and compute a reputation tag $\Lambda_i(\mathbf{d}_i)$ for the $i$th node as the time average

$$ \Lambda_i = \frac{1}{T} \sum_{t=1}^{T} d_i(t). \tag{23} $$

The $i$th sensor is declared honest or Byzantine using the following threshold-based tagging rule:

$$ \Lambda_i \underset{\text{Honest}}{\overset{\text{Byzantine}}{\gtrless}} \eta. \tag{24} $$

The performance of the above tagging rule depends strongly on the choice of $\eta$, which should be chosen based on two factors. Firstly, $\eta$ should be chosen such that the probability with which a malicious node is tagged Byzantine is high. The higher the value of $\eta$, the lower the chance of tagging a node as Byzantine, and vice versa. This results in a tradeoff between the probability of detecting a Byzantine node and the probability of falsely tagging an honest node as Byzantine. Secondly, the value of $M$ also plays a role in the choice of $\eta$ and, therefore, in the performance of the tagging rule. We illustrate this phenomenon in our simulation results.

B. Optimal Choice of the Tagging Threshold as $T \to \infty$

In this paper, we denote the true type of the $i$th node as $\mathcal{T}_i$, where $\mathcal{T}_i = H$ corresponds to honest behavior and $\mathcal{T}_i = B$ corresponds to Byzantine behavior, for all $i = 1, \cdots, N$. Equation (24) allows us to make inferences about the true type, and the performance of the Byzantine tagging scheme for the $i$th sensor is quantified by the conditional probabilities $P(\Lambda_i \geq \eta | \mathcal{T}_i = \mathcal{T})$, for both $\mathcal{T} = H, B$. In order to find the optimal choice of $\eta$ in Equation (24), we continue with the Neyman-Pearson (NP) framework in the context of Byzantine identification, where the goal is to maximize $P(\Lambda_i \geq \eta | \mathcal{T}_i = B)$ subject to the condition that $P(\Lambda_i \geq \eta | \mathcal{T}_i = H) \leq \xi$. To find these two conditional probabilities, we need closed-form expressions for the conditional distributions $P(\Lambda_i | \mathcal{T}_i = H)$ and $P(\Lambda_i | \mathcal{T}_i = B)$, respectively. In practice, where $T$ is finite, it is intractable to determine the conditional distribution of $\Lambda_i$, which is necessary to find the optimal choice of $\eta$. Therefore, in this paper, we assume that $T \to \infty$ and present an asymptotic choice of the tagging threshold $\eta$ used in Equation (24). Since $d_i(t)$ is independent (and identically distributed) across $t = 1, \cdots, T$, the central limit theorem implies that, as $T \to \infty$, $(\Lambda_i | \mathcal{T}_i = \mathcal{T})$ is approximately distributed as $\mathcal{N}(\mu_{i,\mathcal{T}}, \sigma_{i,\mathcal{T}}^2)$, where

$$ \mu_{i,\mathcal{T}} = E(\Lambda_i | \mathcal{T}_i = \mathcal{T}) = E\left[ \left( \gamma^{-1}(v_i(t)) - f_i(\hat{\theta}(t)) \right)^2 \middle| \mathcal{T}_i = \mathcal{T} \right] \tag{25} $$

and

$$ \sigma_{i,\mathcal{T}}^2 = \mathrm{Var}(\Lambda_i | \mathcal{T}_i = \mathcal{T}) = \frac{1}{T} \mathrm{Var}\left[ \left( \gamma^{-1}(v_i(t)) - f_i(\hat{\theta}(t)) \right)^2 \middle| \mathcal{T}_i = \mathcal{T} \right]. \tag{26} $$

In this paper, we do not present the final forms of $\mu_{i,\mathcal{T}}$ and $\sigma_{i,\mathcal{T}}$ in order to preserve generality.
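The tagging procedure of Equations (22)–(24) can be sketched as a toy simulation. The setup is entirely our own simplification and not the paper's simulation: the FC's inference is replaced by the true $\theta$ (an oracle), $f_i$ is the identity, $\theta \sim \mathcal{U}(0, 1)$ is redrawn at every time step, the hypothetical quantizer has $M$ equal cells on $[0, 1]$ with $\gamma^{-1}$ returning cell midpoints, and Byzantine nodes use the highly symmetric attack with $p = 1/(M-1)$. The threshold $\eta = 0.08$ is picked by hand for illustration, not via an NP design.

```python
import bisect
import random


def simulate_deviations(N=10, n_byz=2, M=4, sigma=0.1, T=100, seed=1):
    """d_i(t) of Equation (22) for a toy network; nodes 0..n_byz-1 are Byzantine."""
    rng = random.Random(seed)
    inner = [m / M for m in range(1, M)]           # hypothetical thresholds on [0, 1]
    centroids = [(m + 0.5) / M for m in range(M)]  # gamma^{-1}: midpoint of each cell
    d = [[0.0] * T for _ in range(N)]
    for t in range(T):
        theta = rng.random()                       # theta ~ U(0, 1), redrawn each step
        theta_hat = theta                          # assumption: oracle FC inference
        for i in range(N):
            u = bisect.bisect_right(inner, theta + rng.gauss(0.0, sigma))
            if i < n_byz:                          # highly symmetric flip, p = 1/(M - 1)
                k = rng.randrange(M - 1)
                u = k if k < u else k + 1
            d[i][t] = (centroids[u] - theta_hat) ** 2  # f_i is the identity here
    return d


def reputation_tags(d):
    """Equation (23): time-averaged squared deviation for every node."""
    return [sum(d_i) / len(d_i) for d_i in d]


def tag_nodes(tags, eta):
    """Equation (24): 'B' (Byzantine) if Lambda_i >= eta, else 'H' (honest)."""
    return ["B" if lam >= eta else "H" for lam in tags]


tags = reputation_tags(simulate_deviations())
print([round(x, 3) for x in tags])
print(tag_nodes(tags, eta=0.08))
```

Because $\Lambda_i$ is an average of $T$ terms, its spread shrinks like $1/\sqrt{T}$, which is what makes the Gaussian approximation above, and a clean separation between honest and Byzantine tags, plausible for large $T$.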
Assuming that $v_i(t)$ is independent across sensors as well as across time, the moments of $d_i$ can be computed for any given FC inference $\hat{\theta}(t)$ about a given phenomenon at time $t$. Although the final forms of $\mu_{i,\mathcal{T}}$ and $\sigma_{i,\mathcal{T}}$ are not presented, since $d_i(t)$ is a function of $\mathbf{v}$, we present in Equation (27) the conditional probability $P(v_j | \mathcal{T}_i = \mathcal{T})$, which is needed to compute $\mu_{i,\mathcal{T}}$ and $\sigma_{i,\mathcal{T}}$:

$$ P(v_j | \mathcal{T}_i = \mathcal{T}) = \int P(v_j | \theta, \mathcal{T}_i = \mathcal{T}) \, p(\theta) \, d\theta, \tag{27} $$

where $P(v_j | \theta, \mathcal{T}_i = \mathcal{T})$ can be calculated as follows:

$$ P(v_j = m | \theta, \mathcal{T}_i = H) = \begin{cases} P(u_j = m | \theta), & \text{if } j = i \\ (1 - \pi_{BH}) P(u_j = m | \theta) + \pi_{BH} \sum_{k=1}^{M} p_{km} P(u_j = k | \theta), & \text{if } j \neq i \end{cases} \tag{28} $$

and

$$ P(v_j = m | \theta, \mathcal{T}_i = B) = \begin{cases} \sum_{k=1}^{M} p_{km} P(u_j = k | \theta), & \text{if } j = i \\ (1 - \pi_{BB}) P(u_j = m | \theta) + \pi_{BB} \sum_{k=1}^{M} p_{km} P(u_j = k | \theta), & \text{if } j \neq i, \end{cases} \tag{29} $$

where $\pi_{BH} = P(\mathcal{T}_j = B | \mathcal{T}_i = H)$ and $\pi_{BB} = P(\mathcal{T}_j = B | \mathcal{T}_i = B)$ are the conditional probabilities of the $j$th node's type, given the type of the $i$th node. Since a fraction $\alpha$ of the $N$ nodes in the network are Byzantine, the remaining $N - 1$ nodes contain $N\alpha$ Byzantine nodes when the $i$th node is honest, and $N\alpha - 1$ when it is Byzantine. Therefore, $\pi_{BH} = \frac{N\alpha}{N-1}$ and $\pi_{BB} = \frac{N\alpha - 1}{N-1}$.

Given the conditional distributions $P(\Lambda_i | \mathcal{T}_i = H)$ and $P(\Lambda_i | \mathcal{T}_i = B)$, the performance of the Byzantine identification scheme is

$$ P(\Lambda_i \geq \eta | \mathcal{T}_i = H) = Q\left( \frac{\eta - \mu_{i,H}}{\sigma_{i,H}} \right), \qquad P(\Lambda_i \geq \eta | \mathcal{T}_i = B) = Q\left( \frac{\eta - \mu_{i,B}}{\sigma_{i,B}} \right). \tag{30} $$

Under the NP framework, when $\Lambda_i$ is normally distributed conditioned on the true type of a given node, the optimal $\eta$ is chosen by setting $P(\Lambda_i \geq \eta | \mathcal{T}_i = H) = \xi$. In other words,

$$ Q\left( \frac{\eta - \mu_{i,H}}{\sigma_{i,H}} \right) = \xi \tag{31} $$

or, equivalently,

$$ \eta_{optimal} = \mu_{i,H} + \sigma_{i,H} \, Q^{-1}(\xi). \tag{32} $$

Note that, since $P(v_j | \mathcal{T}_i = H)$ is a function of $\alpha$, both $\mu_{i,H}$ and $\sigma_{i,H}$ are functions of $\alpha$. Although we do not provide a closed-form expression for $\eta$ as a function of $\alpha$, we provide the following example to show how $\eta$ varies with $\alpha$.

1) Example: Variation of $\eta$ as a function of $\alpha$: Consider a distributed estimation network with $N = 5$ identical nodes. Let the prior distribution of the true phenomenon $\theta$ be the uniform distribution $\mathcal{U}(0, 1)$. We assume that the sensing channel is an AWGN channel, where the sensor observations are given by $r_i = \theta + n_i$. Therefore, conditioned on $\theta$, the sensor observations are distributed as $\mathcal{N}(\theta, \sigma^2)$. We assume that the sensors apply the quantizer rule shown in Equation (13) to their observations $r_i$. At the FC, we let $\gamma^{-1}(\cdot)$ be the centroid function that returns $c_m = \frac{\lambda_{m-1} + \lambda_m}{2}$ for the symbol $m$. Let

$$ \hat{\theta} = \frac{1}{M} \sum_{i=1}^{N} \gamma^{-1}(v_i(t)) $$

be the fusion rule employed at the FC to estimate $\theta$. Since the network comprises identical nodes, without any loss of generality, we henceforth focus our attention on the reputation-based identification rule at sensor-1. Substituting the above fusion rule in the squared deviation $d_1$ of sensor-1 in Equation (22), we have

$$ d_1 = \left( \gamma^{-1}(v_1) - \frac{1}{M} \sum_{i=1}^{5} \gamma^{-1}(v_i(t)) \right)^2 = \left( \frac{M-1}{M} \gamma^{-1}(v_1) - \frac{1}{M} \sum_{i=2}^{5} \gamma^{-1}(v_i(t)) \right)^2. \tag{33} $$

Let us denote

$$ \phi_{ij} = E\left\{ \left( \gamma^{-1}(v_i) \right)^j \middle| \mathcal{T}_1 = H \right\} = \sum_{v_i = 1}^{M} \left( \gamma^{-1}(v_i) \right)^j P(v_i | \mathcal{T}_1 = H), $$

for all $i = 1, \cdots, 5$ and $j = 1, 2, \cdots$. Here, $P(v_i | \mathcal{T}_1 = H)$ can be computed using Equation (28) as follows:

$$ P(v_i = m | \mathcal{T}_1 = H) = \int_{-\infty}^{\infty} P(v_i = m | \theta, \mathcal{T}_1 = H) \, p(\theta) \, d\theta = \int_0^1 P(v_i = m | \theta, \mathcal{T}_1 = H) \, d\theta = \begin{cases} a_{1,m}, & \text{if } i = 1 \\ \frac{N\alpha}{(N-1)(M-1)} + \left( 1 - \frac{MN\alpha}{(N-1)(M-1)} \right) a_{i,m}, & \text{otherwise,} \end{cases} \tag{34} $$

where $a_{i,m} = \int_0^1 P(u_i = m | \theta) \, d\theta$ for all $i = 1, \cdots, N$, and the second case follows from Equation (28) when the Byzantine nodes employ the highly symmetric flipping matrix with $p = \frac{1}{M-1}$. Note that, since all the nodes in the network are identical, $P(u_i | \theta)$ is independent of the node index $i$ and, therefore, $\phi_{ij} = \phi_{2j}$ for all $i \neq 1$. Thus, the conditional mean and variance, $\mu_{1H}$ and $\sigma_{1H}^2$, are given as follows for the special case of $N = 5$:

$$ \begin{aligned} \mu_{1H} &= E\left[ \left( \frac{M-1}{M} \gamma^{-1}(v_1) - \frac{1}{M} \sum_{i=2}^{5} \gamma^{-1}(v_i(t)) \right)^2 \middle| \mathcal{T}_1 = H \right] \\ &= \frac{1}{M^2} E\left[ \left( (M-1) \gamma^{-1}(v_1) - \sum_{i=2}^{5} \gamma^{-1}(v_i(t)) \right)^2 \middle| \mathcal{T}_1 = H \right] \\ &= \frac{1}{M^2} \left[ (M-1)^2 \phi_{12} + 4\phi_{22} + 12\phi_{21}^2 - 8(M-1)\phi_{11}\phi_{21} \right] \end{aligned} \tag{35} $$

and

$$ \sigma_{1H}^2 = \frac{1}{T} \mathrm{Var}\left[ \left( \gamma^{-1}(v_1(t)) - \hat{\theta}(t) \right)^2 \middle| \mathcal{T}_1 = H \right] = \frac{1}{T} \left( \Delta - \mu_{1H}^2 \right), \tag{36} $$

where

$$ \begin{aligned} \Delta &= E\left[ \left( \frac{M-1}{M} \gamma^{-1}(v_1) - \frac{1}{M} \sum_{i=2}^{5} \gamma^{-1}(v_i(t)) \right)^4 \middle| \mathcal{T}_1 = H \right] \\ &= \frac{1}{M^4} \Big[ (M-1)^4 \phi_{14} - 16(M-1)^3 \phi_{13}\phi_{21} + 6(M-1)^2 \phi_{12} \{ 4\phi_{22} + 12\phi_{21}^2 \} \\ &\quad - 4(M-1)\phi_{11} (4\phi_{23} + 36\phi_{22}\phi_{21} + 24\phi_{21}^3) + 4\phi_{24} + 12\phi_{23}\phi_{21} \\ &\quad + 36(\phi_{23}\phi_{21} + \phi_{22}^2 + 2\phi_{22}\phi_{21}^2) + 24(\phi_{21}^4 + 3\phi_{22}\phi_{21}^2) \Big]. \end{aligned} \tag{37} $$

Thus, for $\xi = 0.01$, we compute the tagging threshold $\eta$ numerically using Equation (32) and plot $\eta$ as a function of $\alpha$ in Figure 5. In our numerical results, we observe that the optimal choice of $\eta$ is a convex function of $\alpha$, whose curvature decreases with increasing $M$. This can be clearly seen in Figure 5(b), where we plot only the case of $M = 7$. We observe a similar behavior for all the other values of $M$ and, therefore, present the case of $M = 7$ to illustrate the convex behavior of $\eta$. This suggests that, for very large values of $M$, the choice of $\eta$ becomes independent of $\alpha$, for any fixed $\alpha \leq \alpha_{blind}$.

C. Simulation Results

In order to illustrate the performance of the proposed reputation-based scheme, we consider a sensor network with a total of 100 sensors, out of which 20 are Byzantine.
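The computation behind Figure 5 can be sketched as follows: the function evaluates Equations (34)–(37) and then Equation (32), with $Q^{-1}(\xi) = \Phi^{-1}(1 - \xi)$ obtained from Python's `statistics.NormalDist().inv_cdf`. The values $a_m$ and the cell midpoints are hypothetical inputs (in the paper they come from the quantizer in Equation (13) and the $\mathcal{U}(0, 1)$ prior), and, like the derivation above, the sketch treats $v_1, \cdots, v_5$ as independent given $\mathcal{T}_1 = H$. A brute-force enumeration over all $M^5$ symbol vectors cross-checks the closed forms for $\mu_{1H}$ and $\Delta$.

```python
import itertools
from statistics import NormalDist


def symbol_pmfs(a, alpha, N=5):
    """Equation (34): pmfs of v_1 and of v_i (i != 1) given T_1 = H, with p = 1/(M-1).
    a[m] stands for a_m = integral_0^1 P(u = m | theta) dtheta (identical nodes)."""
    M = len(a)
    c = N * alpha / ((N - 1) * (M - 1))
    return list(a), [c + (1.0 - M * c) * am for am in a]


def eta_optimal(a, cent, alpha, T, xi):
    """Equations (35)-(37) for N = 5, followed by Equation (32)."""
    M, K = len(a), len(a) - 1
    pmf1, pmf2 = symbol_pmfs(a, alpha)
    f1 = {j: sum(q * c ** j for q, c in zip(pmf1, cent)) for j in range(1, 5)}  # phi_1j
    f2 = {j: sum(q * c ** j for q, c in zip(pmf2, cent)) for j in range(1, 5)}  # phi_2j
    mu = (K**2 * f1[2] + 4 * f2[2] + 12 * f2[1]**2 - 8 * K * f1[1] * f2[1]) / M**2
    delta = (K**4 * f1[4] - 16 * K**3 * f1[3] * f2[1]
             + 6 * K**2 * f1[2] * (4 * f2[2] + 12 * f2[1]**2)
             - 4 * K * f1[1] * (4 * f2[3] + 36 * f2[2] * f2[1] + 24 * f2[1]**3)
             + 4 * f2[4] + 12 * f2[3] * f2[1]
             + 36 * (f2[3] * f2[1] + f2[2]**2 + 2 * f2[2] * f2[1]**2)
             + 24 * (f2[1]**4 + 3 * f2[2] * f2[1]**2)) / M**4
    sigma = ((delta - mu**2) / T) ** 0.5                  # Equation (36)
    eta = mu + sigma * NormalDist().inv_cdf(1.0 - xi)     # Q^{-1}(xi) = Phi^{-1}(1 - xi)
    return mu, delta, eta


def brute_force_moments(a, cent, alpha):
    """E[x^2] and E[x^4] for x from Equation (33), enumerating all M^5 symbol vectors."""
    M = len(a)
    pmf1, pmf2 = symbol_pmfs(a, alpha)
    mu = delta = 0.0
    for v in itertools.product(range(M), repeat=5):
        prob = pmf1[v[0]]
        for vi in v[1:]:
            prob *= pmf2[vi]
        x = ((M - 1) * cent[v[0]] - sum(cent[vi] for vi in v[1:])) / M
        mu += prob * x**2
        delta += prob * x**4
    return mu, delta


a = [0.2, 0.3, 0.3, 0.2]              # hypothetical a_m values
cent = [0.125, 0.375, 0.625, 0.875]   # hypothetical cell midpoints
mu, delta, eta = eta_optimal(a, cent, alpha=0.2, T=100, xi=0.01)
print(mu, delta, eta, brute_force_moments(a, cent, 0.2))
```

Sweeping `alpha` in this sketch and plotting the returned `eta` mirrors the procedure used for Figure 5, under the stated assumptions.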
Let the sensor quantizers be given by Equation (13), and let the fusion rule at the FC be the MAP rule:

$$ \sum_{i=1}^{N} \log \left( \frac{P(v_i | H_1)}{P(v_i | H_0)} \right) \underset{\hat{\theta} = 0}{\overset{\hat{\theta} = 1}{\gtrless}} \log \frac{p_0}{p_1}. \tag{38} $$

Figure 6 plots the rate of identification of the number of Byzantine nodes in the network for the proposed reputation-based scheme, for different sizes of the quantization alphabet. Note that the convergence rate deteriorates as $M$ increases. This is because the Byzantine nodes have an increasing number of symbols to flip to, so a greater number of time samples is needed to identify the malicious behavior. In addition, we simulate the evolution in time of the probability of mislabelling an honest node as a Byzantine node, and plot it in Figure 7. Just as the convergence deteriorates with increasing $M$, we observe a similar behavior in the evolution of the probability of mislabelling honest nodes. Another important observation in Figure 7 is that the probability of mislabelling an honest node always converges to zero in time. Similarly, in Figure 8, we plot the evolution in time of the probability of mislabelling a Byzantine node as an honest one. We observe a similar convergence of this probability to zero, at a rate that decreases with an increasing number of quantization levels $M$. Therefore, Figures 6, 7 and 8 demonstrate that, after a sufficient amount of time, the reputation-based scheme identifies the true behavior of every node within the network, with a negligible number of mislabels.

VII. CONCLUDING REMARKS

In summary, we modelled the problem of distributed inference with M-ary quantized data in the presence of Byzantine attacks, under the assumption that the attacker does not have knowledge about either the true hypotheses or the quantization thresholds at the sensors. We found the optimal Byzantine attack that blinds the FC in the case of any inference task for both noiseless and noisy FC channels. We also considered the problem of resource-constrained Byzantine attacks ($\alpha < \alpha_{blind}$) on distributed detection and estimation, for the special case of highly symmetric attack strategies in the presence of noiseless channels at the FC. From the inference network's perspective, we presented a mitigation scheme that identifies the Byzantine nodes through reputation-tagging. We also showed how the optimal tagging threshold can be found when the time window $T \to \infty$. Finally, we investigated the performance of our reputation-based scheme through simulations and showed that our scheme always converges to finding all the compromised nodes, given a sufficient amount of time.

[Fig. 5: Variation of the optimal tagging threshold $\eta$ (in the asymptotic sense, where $T \to \infty$) as a function of $\alpha$. (a) $M = 2, \cdots, 7$. (b) $M = 7$.]

[Fig. 6: Rate of identification of the number of Byzantine nodes in time (estimated number of Byzantines vs. time) for different numbers of quantization levels (1-bit to 4-bit).]
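The MAP fusion rule in Equation (38) amounts to summing per-symbol log-likelihood ratios and comparing the sum with $\log(p_0/p_1)$. A minimal Python sketch follows; the symbol pmfs under $H_0$ and $H_1$ are hypothetical, and the attacked pmfs use the highly symmetric form $\alpha p + (1 - M\alpha p) x_m$ of Equations (52) and (53) with $\alpha = 0.2$, matching the 20-out-of-100 Byzantine setup above. Ties are broken in favour of $\hat{\theta} = 1$, an arbitrary choice.

```python
import math


def attacked_pmf(pmf, alpha, p):
    """Symbol pmf after a highly symmetric attack: alpha p + (1 - M alpha p) x_m."""
    M = len(pmf)
    return [alpha * p + (1.0 - M * alpha * p) * x for x in pmf]


def map_fusion(v, pmf_h0, pmf_h1, p0=0.5, p1=0.5):
    """Equation (38): decide 1 if the summed log-likelihood ratio reaches log(p0/p1)."""
    llr = sum(math.log(pmf_h1[vi] / pmf_h0[vi]) for vi in v)
    return 1 if llr >= math.log(p0 / p1) else 0


x = [0.4, 0.3, 0.2, 0.1]      # hypothetical P(u = m | H0)
y = [0.1, 0.2, 0.3, 0.4]      # hypothetical P(u = m | H1)
alpha, p = 0.2, 1.0 / 3       # 20% Byzantines, p = 1/(M - 1)
x_t, y_t = attacked_pmf(x, alpha, p), attacked_pmf(y, alpha, p)
print(map_fusion([3, 3, 2, 3, 0], x_t, y_t))   # symbols that favour H1
print(map_fusion([0, 1, 0, 0, 2], x_t, y_t))   # symbols that favour H0
```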
In our future work, we will investigate the optimal resource-constrained Byzantine attack in the space of all row-stochastic flipping probability matrices and, if possible, find schemes that mitigate the Byzantine attack more effectively.

[Fig. 7: Evolution in time of the probability of mislabelling an honest node as a Byzantine, for different numbers of quantization levels (1-bit to 4-bit).]

APPENDIX A
OPTIMAL BYZANTINE ATTACK IN THE PRESENCE OF A DISCRETE NOISY CHANNEL AT THE FC

Given that the messages $\mathbf{v} = \{v_1, v_2, \cdots, v_N\}$ are transmitted to the fusion center (FC), we assume a discrete noisy channel $\mathbf{Q} = [q_{mn}]$ between the sensors and the FC, where $q_{mn}$ is the probability with which the symbol $v_i = m$ transmitted by the $i$th sensor is received at the FC as $z_i = n$. Based on the received vector $\mathbf{z}$, the FC makes a global inference about the phenomenon of interest. In this paper, for the sake of tractability, we assume that the row-stochastic channel matrix $\mathbf{Q}$ is invertible. The conditional distribution of $z_i = n$ under a given phenomenon $\theta$ is

$$ P(z_i = n | \theta) = \sum_{m=1}^{M} q_{mn} P(v_i = m | \theta). \tag{39} $$

Note that if $\mathbf{Q}$ is doubly stochastic, then $\sum_{m=1}^{M} q_{mn} = 1$ and, therefore, it is sufficient for the Byzantine attacker to ensure $P(v_i = m | \theta) = \frac{1}{M}$.

[Fig. 8: Evolution in time of the probability of mislabelling a Byzantine node as an honest node, for different numbers of quantization levels (1-bit to 4-bit).]
Thus, by Theorem 1, we have the following theorem when $\mathbf{Q}$ is a doubly stochastic matrix.

Theorem 4. If the channel matrix $\mathbf{Q}$ is doubly stochastic, and if the Byzantine attacker has no knowledge about the sensors' quantization thresholds, then the optimal Byzantine attack is given as

$$ p_{lm} = \begin{cases} \frac{1}{M-1}, & \text{if } l \neq m \\ 0, & \text{otherwise} \end{cases}, \qquad \alpha_{blind} = \frac{M-1}{M}. \tag{40} $$

Therefore, we focus our attention on a general row-stochastic channel matrix $\mathbf{Q}$, whose column sums $\sum_{m=1}^{M} q_{mn}$ need not equal unity for all $n = 1, \cdots, M$. In this case, the Byzantine attacker has to find an alternative strategy to blind the FC, i.e., to ensure $P(z_i = n | \theta) = \frac{1}{M}$. Substituting Equation (3) in Equation (39) and rearranging the terms, we have

$$ \begin{aligned} P(z_i = n | \theta) &= \sum_{m=1}^{M} q_{mn} P(v_i = m | \theta) \\ &= \sum_{m=1}^{M} q_{mn} \left[ (1-\alpha) + \alpha p_{mm} \right] + \sum_{m=1}^{M} q_{mn} \left\{ \sum_{l \neq m} \left\{ \alpha p_{lm} - \left[ (1-\alpha) + \alpha p_{mm} \right] \right\} P(u_i = l | \theta) \right\} \\ &= \sum_{m=1}^{M} q_{mn} \left[ (1-\alpha) + \alpha p_{mm} \right] + \sum_{l=1}^{M} \left[ \sum_{m \neq l} q_{mn} \left\{ \alpha p_{lm} - \left[ (1-\alpha) + \alpha p_{mm} \right] \right\} \right] P(u_i = l | \theta). \end{aligned} \tag{41} $$

The goal of a Byzantine attack is to blind the FC with the least amount of effort (minimum $\alpha$). Totally blinding the FC is equivalent to making $P(z_i = n | \theta) = 1/M$ for all $1 \leq n \leq M$. The RHS of Equation (41) consists of two terms: the first term does not depend on the observations, while the second term conveys information based on the observations. In order to blind the FC, the attacker should make the second term equal to zero. Since the attacker does not have any knowledge regarding $P(u_i = l | \theta)$, it can make the second term of Equation (41) equal to zero by setting

$$ \sum_{m \neq l} q_{mn} \left\{ \alpha p_{lm} - \left[ (1-\alpha) + \alpha p_{mm} \right] \right\} = 0 \quad \text{for all } 1 \leq n, l \leq M. \tag{42} $$

Then, the conditional probability $P(z_i = n | \theta) = \sum_{m=1}^{M} q_{mn} \left[ (1-\alpha) + \alpha p_{mm} \right]$ becomes independent of the observations $r_i$ (or their quantized versions $u_i$), and the symbols at the FC become equiprobable once this sum also equals $\frac{1}{M}$ for every $n$.
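Theorem 4 can be checked with exact arithmetic. The sketch below applies the attack in Equation (40) to an arbitrary sensor pmf, using the mixture form $P(v = m) = (1 - \alpha) P(u = m) + \alpha \sum_l p_{lm} P(u = l)$ that underlies Equation (41), and then passes the result through a doubly stochastic channel via Equation (39). The channel matrix and the pmf are hypothetical examples.

```python
from fractions import Fraction


def attacked_pmf(u_pmf, alpha, P):
    """P(v = m) = (1 - alpha) P(u = m) + alpha * sum_l p_lm P(u = l)."""
    M = len(u_pmf)
    return [(1 - alpha) * u_pmf[m] + alpha * sum(P[l][m] * u_pmf[l] for l in range(M))
            for m in range(M)]


def channel_output_pmf(v_pmf, Q):
    """Equation (39): P(z = n) = sum_m q_mn P(v = m)."""
    M = len(v_pmf)
    return [sum(Q[m][n] * v_pmf[m] for m in range(M)) for n in range(M)]


M = 3
alpha = Fraction(M - 1, M)                                    # alpha_blind from Eq. (40)
P = [[Fraction(0) if l == m else Fraction(1, M - 1) for m in range(M)] for l in range(M)]
Q = [[Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)],       # hypothetical doubly
     [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],       # stochastic channel
     [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]]
u_pmf = [Fraction(7, 10), Fraction(2, 10), Fraction(1, 10)]   # arbitrary sensor pmf
v_pmf = attacked_pmf(u_pmf, alpha, P)
z_pmf = channel_output_pmf(v_pmf, Q)
print(v_pmf, z_pmf)
```

Both pmfs come out uniform regardless of `u_pmf`, which is exactly the blinding condition.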
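In other words, the received vector $\mathbf{z} = \{z_1, z_2, \cdots, z_N\}$ does not carry any information about $\mathbf{u} = \{u_1, u_2, \cdots, u_N\}$, which makes the FC solely dependent on its prior information about $\theta$ when making an inference. In order to identify the strategy that the attacker should employ to achieve the condition in Equation (42), we need $P(z_i = n | \theta) = \frac{1}{M}$ for all $n = 1, \cdots, M$, or,

$$ \sum_{m=1}^{M} q_{mn} \left\{ (1-\alpha) + \alpha p_{mm} \right\} = \frac{1}{M}. \tag{43} $$

In matrix form, we can rewrite Equation (43) as $(1-\alpha) \mathbf{1}^T \mathbf{Q} + \alpha \mathbf{p}^T \mathbf{Q} = \frac{1}{M} \mathbf{1}^T$, where $\mathbf{1}$ is the all-ones column vector and $\mathbf{p} = [p_{11}, \cdots, p_{MM}]^T$ is the column vector of the diagonal elements of $\mathbf{P}$. In other words,

$$ \alpha (\mathbf{1} - \mathbf{p}) = \mathbf{1} - \frac{1}{M} \left( \mathbf{Q}^T \right)^{-1} \mathbf{1}. \tag{44} $$

Note that every element of the LHS of Equation (44) lies between 0 and 1. Therefore, the existence of the Byzantine's optimal strategy relies on the following condition, where the inequalities hold element-wise:

$$ \mathbf{0} \leq \left( \mathbf{Q}^T \right)^{-1} \mathbf{1} \leq M \mathbf{1}. \tag{45} $$

If (45) does not hold, no optimal (blinding) strategy of this form exists. Given that the condition in Equation (45) holds, the minimum $\alpha$ can be found as follows. Since $0 \leq p_{mm} \leq 1$, each element $\alpha(1 - p_{mm})$ of the LHS of Equation (44) is at most $\alpha$, so $\alpha$ must be at least as large as every element of the RHS:

$$ \alpha_{blind} = \max \left\{ \mathbf{1} - \frac{1}{M} \left( \mathbf{Q}^T \right)^{-1} \mathbf{1} \right\} = 1 - \frac{1}{M} \min \left\{ \left( \mathbf{Q}^T \right)^{-1} \mathbf{1} \right\}. \tag{46} $$

Therefore, $\mathbf{p}$ can be calculated as

$$ \mathbf{p} = \mathbf{1} - \frac{1}{\alpha_{blind}} \left( \mathbf{1} - \frac{1}{M} \left( \mathbf{Q}^T \right)^{-1} \mathbf{1} \right) = \frac{1}{\alpha_{blind} M} \left( \mathbf{Q}^T \right)^{-1} \mathbf{1} - \frac{1 - \alpha_{blind}}{\alpha_{blind}} \mathbf{1}. \tag{47} $$

Next, in order to find the rest of the matrix $\mathbf{P}$, let us consider Equation (42). Adding $q_{ln} \{ \alpha p_{ll} - [(1-\alpha) + \alpha p_{ll}] \}$ to both sides of Equation (42), we have

$$ \sum_{m=1}^{M} q_{mn} \left\{ \alpha p_{lm} - \left[ (1-\alpha) + \alpha p_{mm} \right] \right\} = -q_{ln} (1-\alpha) \quad \text{for all } 1 \leq n, l \leq M, $$

or, using Equation (43),

$$ \alpha \sum_{m=1}^{M} q_{mn} p_{lm} = \frac{1}{M} - q_{ln} (1-\alpha) \quad \text{for all } 1 \leq n, l \leq M. \tag{48} $$

In matrix form, we have

$$ \alpha \mathbf{P} \mathbf{Q} = \frac{1}{M} \mathbf{1}\mathbf{1}^T - (1-\alpha) \mathbf{Q}, \tag{49} $$

where $\mathbf{1}\mathbf{1}^T$ is the all-ones matrix. Equivalently, we have

$$ \mathbf{P} = \frac{1}{\alpha M} \mathbf{1}\mathbf{1}^T \mathbf{Q}^{-1} - \frac{1-\alpha}{\alpha} \mathbf{I}, \tag{50} $$

where $\mathbf{I}$ is the identity matrix.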
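The construction in Equations (44)–(50) can be sketched in exact arithmetic. The code solves $\mathbf{Q}^T \mathbf{w} = \mathbf{1}$ by Gauss-Jordan elimination, checks condition (45), takes $\alpha$ as the smallest value for which every diagonal entry from Equation (47) is non-negative, and builds $\mathbf{P}$ from Equation (50) using the fact that every row of $\mathbf{1}\mathbf{1}^T\mathbf{Q}^{-1}$ equals $\mathbf{w}^T$. The channel matrix and sensor pmf are hypothetical, and the helper names are ours.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)  # StopIteration if singular
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def blinding_attack(Q):
    """alpha and P from Equations (44)-(50) for an invertible row-stochastic Q."""
    M = len(Q)
    QT = [[Q[m][n] for m in range(M)] for n in range(M)]
    w = solve(QT, [1] * M)                        # w = (Q^T)^{-1} 1
    if any(x < 0 or x > M for x in w):
        raise ValueError("condition (45) fails: no blinding attack of this form")
    alpha = 1 - min(w) / M                        # smallest alpha keeping every p_mm >= 0
    # Each row of 1 1^T Q^{-1} is w^T, so p_lm = w_m / (alpha M) - [l == m] (1 - alpha) / alpha.
    P = [[w[m] / (alpha * M) - ((1 - alpha) / alpha if l == m else 0) for m in range(M)]
         for l in range(M)]
    return alpha, P


F = Fraction
Q = [[F(8, 10), F(1, 10), F(1, 10)],
     [F(1, 10), F(8, 10), F(1, 10)],
     [F(2, 10), F(2, 10), F(6, 10)]]              # hypothetical row- (not column-) stochastic Q
alpha, P = blinding_attack(Q)
u = [F(1, 2), F(3, 10), F(1, 5)]                  # arbitrary pmf of the quantized symbol u
v = [(1 - alpha) * u[m] + alpha * sum(P[l][m] * u[l] for l in range(3)) for m in range(3)]
z = [sum(Q[m][n] * v[m] for m in range(3)) for n in range(3)]
print(alpha, z)
```

For this channel the sketch gives $\alpha = 11/15$, above the noiseless value $2/3$. More generally, since $\mathbf{Q}\mathbf{1} = \mathbf{1}$ implies $\mathbf{1}^T (\mathbf{Q}^T)^{-1} \mathbf{1} = M$, the smallest entry of $\mathbf{w}$ is at most 1, so this $\alpha$ never falls below $\frac{M-1}{M}$.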
Note that the vector $\mathbf{p}$ (comprising the diagonal elements of $\mathbf{P}$) obtained from Equation (50) is verified to be the same as that obtained from Equation (47). In summary, we have the following theorem, which provides the optimal Byzantine strategy in the presence of noisy FC channels.

Theorem 5. Let the Byzantine attacker have no knowledge about the sensors' quantization thresholds, and let the FC's channel matrix be $\mathbf{Q}$. If $\mathbf{Q}$ is non-singular and $\mathbf{0} \leq (\mathbf{Q}^T)^{-1} \mathbf{1} \leq M\mathbf{1}$, then the optimal Byzantine attack is given as

$$ \alpha_{blind} = 1 - \frac{1}{M} \min \left\{ \left( \mathbf{Q}^T \right)^{-1} \mathbf{1} \right\}, \qquad \mathbf{P} = \frac{1}{\alpha_{blind} M} \mathbf{1}\mathbf{1}^T \mathbf{Q}^{-1} - \frac{1 - \alpha_{blind}}{\alpha_{blind}} \mathbf{I}. \tag{51} $$

Note that, if the channel matrix $\mathbf{Q}$ is doubly stochastic, we have $\mathbf{Q}\mathbf{1} = \mathbf{1}$ and $\mathbf{Q}^T\mathbf{1} = \mathbf{1}$. Substituting these conditions in Equation (51), Theorem 5 reduces to Theorem 4.

Having identified the optimal Byzantine attack, one can observe that the attacker needs to compromise a large fraction of the sensors ($\alpha_{blind} = 1 - \frac{1}{M} \min\{(\mathbf{Q}^T)^{-1}\mathbf{1}\}$) in the network to blind the FC. Therefore, a resource-constrained attacker compromises a fixed fraction of nodes $\alpha < \alpha_{blind}$ in such a way that the performance degradation at the FC is maximized. In our future work, we will investigate the problem of finding the optimal strategy for resource-constrained Byzantine attacks in the presence of noisy FC channels.

APPENDIX B
PROOF FOR THEOREM 2

For the sake of notational simplicity, let us denote $x_m = P(u = m | H_0)$ and $y_m = P(u = m | H_1)$. Similarly, $\tilde{x}_m = P(v = m | H_0)$ and $\tilde{y}_m = P(v = m | H_1)$. Rewriting Equation (3) in this notation, we have

$$ \tilde{x}_m = \alpha \sum_{l \neq m} p x_l + (1 - \alpha(M-1)p) x_m = \alpha p + (1 - M\alpha p) x_m \tag{52} $$

and

$$ \tilde{y}_m = \alpha \sum_{l \neq m} p y_l + (1 - \alpha(M-1)p) y_m = \alpha p + (1 - M\alpha p) y_m. \tag{53} $$

Therefore, the KLD at the FC can be rewritten as

$$ D_{FC} = \sum_{m=1}^{M} \tilde{x}_m \log \left( \frac{\tilde{x}_m}{\tilde{y}_m} \right). \tag{54} $$

On partially differentiating $D_{FC}$ with respect to $p$, and using $\partial \tilde{x}_m / \partial p = \alpha(1 - Mx_m)$ and $\partial \tilde{y}_m / \partial p = \alpha(1 - My_m)$, we have

$$ \begin{aligned} \frac{\partial D_{FC}}{\partial p} &= \frac{\partial}{\partial p} \sum_{m=1}^{M} \tilde{x}_m \log \left( \frac{\tilde{x}_m}{\tilde{y}_m} \right) = \alpha \sum_{m=1}^{M} \left[ (1 - Mx_m) \left( 1 + \log \frac{\tilde{x}_m}{\tilde{y}_m} \right) - (1 - My_m) \frac{\tilde{x}_m}{\tilde{y}_m} \right] \\ &= \alpha \sum_{m=1}^{M} (1 - Mx_m) + \alpha \sum_{m=1}^{M} (1 - Mx_m) \log \frac{\tilde{x}_m}{\tilde{y}_m} - \alpha \sum_{m=1}^{M} (1 - My_m) \frac{\tilde{x}_m}{\tilde{y}_m}. \end{aligned} \tag{55} $$

Consider the first term in the RHS of Equation (55). Since $\mathbf{x} = \{x_1, \cdots, x_M\}$ is a probability mass function, we have $\sum_{m=1}^{M} (1 - Mx_m) = M - M \sum_{m=1}^{M} x_m = M - M = 0$. Therefore, Equation (55) reduces to

$$ \frac{\partial D_{FC}}{\partial p} = \alpha \sum_{m=1}^{M} (1 - Mx_m) \log \frac{\tilde{x}_m}{\tilde{y}_m} - \alpha \sum_{m=1}^{M} (1 - My_m) \frac{\tilde{x}_m}{\tilde{y}_m}. \tag{56} $$

Rearranging the terms in Equation (56), we have

$$ \frac{\partial D_{FC}}{\partial p} = \alpha \sum_{m=1}^{M} \left( \log \frac{\tilde{x}_m}{\tilde{y}_m} - \frac{\tilde{x}_m}{\tilde{y}_m} \right) - \alpha M \sum_{m=1}^{M} x_m \log \frac{\tilde{x}_m}{\tilde{y}_m} + \alpha M \sum_{m=1}^{M} y_m \frac{\tilde{x}_m}{\tilde{y}_m}. \tag{57} $$

Let us denote the first term as $T_1$, i.e., $T_1 = \alpha \sum_{m=1}^{M} \left( \log \frac{\tilde{x}_m}{\tilde{y}_m} - \frac{\tilde{x}_m}{\tilde{y}_m} \right)$. Let us now focus our attention on the other terms in the RHS of Equation (57). Substituting $x_m = (\tilde{x}_m - \alpha p)/(1 - M\alpha p)$ and $y_m = (\tilde{y}_m - \alpha p)/(1 - M\alpha p)$ from Equations (52) and (53) in the second and third terms, we have

$$ \begin{aligned} \frac{\partial D_{FC}}{\partial p} &= T_1 - \frac{M\alpha}{1 - M\alpha p} \sum_{m=1}^{M} (\tilde{x}_m - \alpha p) \log \frac{\tilde{x}_m}{\tilde{y}_m} + \frac{M\alpha}{1 - M\alpha p} \sum_{m=1}^{M} (\tilde{y}_m - \alpha p) \frac{\tilde{x}_m}{\tilde{y}_m} \\ &= T_1 - \frac{M\alpha}{1 - M\alpha p} D(\tilde{\mathbf{x}} || \tilde{\mathbf{y}}) + \frac{M\alpha}{1 - M\alpha p} \left( \sum_{m=1}^{M} \alpha p \log \frac{\tilde{x}_m}{\tilde{y}_m} - \sum_{m=1}^{M} \alpha p \frac{\tilde{x}_m}{\tilde{y}_m} + \sum_{m=1}^{M} \tilde{x}_m \right), \end{aligned} \tag{58} $$

where $D(\tilde{\mathbf{x}} || \tilde{\mathbf{y}})$ is the KLD between $\tilde{\mathbf{x}}$ and $\tilde{\mathbf{y}}$ and is, therefore, non-negative. Also, note that in Equation (58), since $\tilde{\mathbf{x}} = \{\tilde{x}_1, \cdots, \tilde{x}_M\}$ is a probability mass function, $\sum_{m=1}^{M} \tilde{x}_m = 1$. Therefore, Equation (58) reduces to

$$ \frac{\partial D_{FC}}{\partial p} = T_1 - \frac{M\alpha}{1 - M\alpha p} D(\tilde{\mathbf{x}} || \tilde{\mathbf{y}}) + \frac{M\alpha}{1 - M\alpha p} + \frac{M\alpha^2 p}{1 - M\alpha p} \sum_{m=1}^{M} \left( \log \frac{\tilde{x}_m}{\tilde{y}_m} - \frac{\tilde{x}_m}{\tilde{y}_m} \right). \tag{59} $$

Note that the last term in the RHS of Equation (59) is

$$ \frac{M\alpha^2 p}{1 - M\alpha p} \sum_{m=1}^{M} \left( \log \frac{\tilde{x}_m}{\tilde{y}_m} - \frac{\tilde{x}_m}{\tilde{y}_m} \right) = \frac{M\alpha p}{1 - M\alpha p} T_1. $$
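The derivation above can be sanity-checked numerically. The sketch below evaluates $D_{FC}$ from Equations (52)–(54) and the closed-form derivative in Equation (61), and compares the latter with a central finite difference over the admissible range of $p$. The pmfs $\mathbf{x}$, $\mathbf{y}$ and the value of $\alpha$ are hypothetical.

```python
import math


def tilde(pmf, alpha, p):
    """Equations (52)-(53): alpha p + (1 - M alpha p) * pmf_m."""
    M = len(pmf)
    return [alpha * p + (1.0 - M * alpha * p) * q for q in pmf]


def kld_fc(x, y, alpha, p):
    """Equation (54): D_FC = sum_m x~_m log(x~_m / y~_m)."""
    xt, yt = tilde(x, alpha, p), tilde(y, alpha, p)
    return sum(a * math.log(a / b) for a, b in zip(xt, yt))


def dkld_dp(x, y, alpha, p):
    """Closed-form derivative from Equation (61)."""
    M = len(x)
    xt, yt = tilde(x, alpha, p), tilde(y, alpha, p)
    k = alpha / (1.0 - M * alpha * p)
    s = sum(math.log(a / b) - (a / b - 1.0) for a, b in zip(xt, yt))
    return -M * k * kld_fc(x, y, alpha, p) + k * s


x = [0.5, 0.3, 0.15, 0.05]   # hypothetical P(u = m | H0)
y = [0.1, 0.2, 0.3, 0.4]     # hypothetical P(u = m | H1)
alpha, M, h = 0.4, 4, 1e-6   # alpha < (M - 1) / M
for step in range(5):
    p = step / (4 * (M - 1))  # p from 0 to 1/(M - 1)
    fd = (kld_fc(x, y, alpha, p + h) - kld_fc(x, y, alpha, p - h)) / (2 * h)
    print(f"p={p:.3f}  D={kld_fc(x, y, alpha, p):.5f}  "
          f"closed form={dkld_dp(x, y, alpha, p):.5f}  finite difference={fd:.5f}")
```

Both columns agree and stay non-positive, consistent with Equation (62).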
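In other words,

$$ \frac{\partial D_{FC}}{\partial p} = \left( 1 + \frac{M\alpha p}{1 - M\alpha p} \right) T_1 - \frac{M\alpha}{1 - M\alpha p} D(\tilde{\mathbf{x}} || \tilde{\mathbf{y}}) + \frac{M\alpha}{1 - M\alpha p} = \frac{1}{1 - M\alpha p} T_1 - \frac{M\alpha}{1 - M\alpha p} D(\tilde{\mathbf{x}} || \tilde{\mathbf{y}}) + \frac{M\alpha}{1 - M\alpha p}. \tag{60} $$

Rearranging the terms in Equation (60) and expanding $T_1$, we have

$$ \begin{aligned} \frac{\partial D_{FC}}{\partial p} &= -\frac{M\alpha}{1 - M\alpha p} D(\tilde{\mathbf{x}} || \tilde{\mathbf{y}}) + \frac{M\alpha}{1 - M\alpha p} + \frac{\alpha}{1 - M\alpha p} \sum_{m=1}^{M} \left( \log \frac{\tilde{x}_m}{\tilde{y}_m} - \frac{\tilde{x}_m}{\tilde{y}_m} \right) \\ &= -\frac{M\alpha}{1 - M\alpha p} D(\tilde{\mathbf{x}} || \tilde{\mathbf{y}}) + \frac{\alpha}{1 - M\alpha p} \sum_{m=1}^{M} \left[ \log \frac{\tilde{x}_m}{\tilde{y}_m} - \left( \frac{\tilde{x}_m}{\tilde{y}_m} - 1 \right) \right]. \end{aligned} \tag{61} $$

Since $0 \leq p \leq \frac{1}{M-1}$ and $\alpha < \frac{M-1}{M}$, we have $M\alpha p < 1$, so the factor $\frac{\alpha}{1 - M\alpha p}$ is positive. The first term in the RHS of Equation (61) is therefore non-positive, because $D(\tilde{\mathbf{x}} || \tilde{\mathbf{y}}) \geq 0$. Since $\log x \leq x - 1$ for all $x > 0$, the second term in the RHS of Equation (61) is also non-positive. Therefore, we have

$$ \frac{\partial D_{FC}}{\partial p} \leq 0. \tag{62} $$

Since $D_{FC}$ is a non-increasing function of $p$, the optimal $p$, $p^*$, takes the maximum value $1/(M-1)$.

APPENDIX C
PROOF FOR THEOREM 3

For the sake of notational simplicity, we let $z_m = P(u = m | \theta)$. Similarly, $\tilde{z}_m = P(v = m | \theta) = \alpha p + (1 - M\alpha p) z_m$. Using this notation in Equation (19), the per-sensor conditional FI, denoted here as $\bar{J}_{FC}$, can be written as

$$ \bar{J}_{FC} = \sum_{m=1}^{M} P(v = m | \theta) \left( \frac{\partial \log P(v = m | \theta)}{\partial \theta} \right)^2 = \sum_{m=1}^{M} \tilde{z}_m \left( \frac{\partial \log \tilde{z}_m}{\partial \theta} \right)^2 = (1 - M\alpha p)^2 \sum_{m=1}^{M} \frac{1}{\tilde{z}_m} \left( \frac{\partial z_m}{\partial \theta} \right)^2. \tag{63} $$

Partially differentiating $\bar{J}_{FC}$ with respect to $p$, and using $\partial \tilde{z}_m / \partial p = \alpha - M\alpha z_m$, we have

$$ \begin{aligned} \frac{\partial \bar{J}_{FC}}{\partial p} &= 2(1 - M\alpha p)(-M\alpha) \sum_{m=1}^{M} \frac{1}{\tilde{z}_m} \left( \frac{\partial z_m}{\partial \theta} \right)^2 + (1 - M\alpha p)^2 \sum_{m=1}^{M} \left( -\frac{1}{\tilde{z}_m^2} \right) (\alpha - M\alpha z_m) \left( \frac{\partial z_m}{\partial \theta} \right)^2 \\ &= -(1 - M\alpha p) \left[ 2M\alpha \sum_{m=1}^{M} \tilde{z}_m \left( \frac{1}{\tilde{z}_m} \frac{\partial z_m}{\partial \theta} \right)^2 + (1 - M\alpha p) \sum_{m=1}^{M} \alpha \left( \frac{1}{\tilde{z}_m} \frac{\partial z_m}{\partial \theta} \right)^2 - (1 - M\alpha p) \sum_{m=1}^{M} M\alpha z_m \left( \frac{1}{\tilde{z}_m} \frac{\partial z_m}{\partial \theta} \right)^2 \right] \\ &= -(1 - M\alpha p) \left[ \alpha (1 + M\alpha p) \sum_{m=1}^{M} \left( \frac{1}{\tilde{z}_m} \frac{\partial z_m}{\partial \theta} \right)^2 + M\alpha (1 - M\alpha p) \sum_{m=1}^{M} z_m \left( \frac{1}{\tilde{z}_m} \frac{\partial z_m}{\partial \theta} \right)^2 \right], \end{aligned} \tag{64} $$

where the last step follows from substituting $\tilde{z}_m = \alpha p + (1 - M\alpha p) z_m$ in the first term inside the brackets. Since $M\alpha p < 1$, Equation (64) is the product of a non-positive factor, $-(1 - M\alpha p)$, and a non-negative bracketed term, and hence

$$ \frac{\partial \bar{J}_{FC}}{\partial p} \leq 0. \tag{65} $$

Since $\bar{J}_{FC}$ (and hence the total conditional FI, $N$ times the per-sensor value) is a non-increasing function of $p$, $p^* = \frac{1}{M-1}$, being the maximum value, is the optimal solution to Problem 2.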
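A quick numerical check of Appendix C: the sketch compares a central finite difference of Equation (63) with the closed form obtained by substituting $\tilde{z}_m = \alpha p + (1 - M\alpha p) z_m$ into the bracket of Equation (64), namely $-(1 - M\alpha p)\left[\alpha(1 + M\alpha p)\sum_m w_m + M\alpha(1 - M\alpha p)\sum_m z_m w_m\right]$ with $w_m = \left(\frac{1}{\tilde{z}_m}\frac{\partial z_m}{\partial \theta}\right)^2$. The values of $z_m$ and $\partial z_m / \partial \theta$ are hypothetical (the derivatives must sum to zero because the $z_m$ sum to one).

```python
def fi_per_sensor(z, dz, alpha, p):
    """Equation (63): (1 - M alpha p)^2 * sum_m dz_m^2 / z~_m."""
    M = len(z)
    c = M * alpha * p
    return (1.0 - c) ** 2 * sum(d * d / (alpha * p + (1.0 - c) * zm) for zm, d in zip(z, dz))


def dfi_dp(z, dz, alpha, p):
    """Closed-form derivative: last line of Equation (64)."""
    M = len(z)
    c = M * alpha * p
    w = [(d / (alpha * p + (1.0 - c) * zm)) ** 2 for zm, d in zip(z, dz)]
    return -(1.0 - c) * (alpha * (1.0 + c) * sum(w)
                         + M * alpha * (1.0 - c) * sum(zm * wm for zm, wm in zip(z, w)))


z = [0.1, 0.4, 0.35, 0.15]     # hypothetical P(u = m | theta)
dz = [-0.2, -0.3, 0.3, 0.2]    # hypothetical dz_m / dtheta (sums to zero)
alpha, h = 0.3, 1e-6
for p in [0.0, 0.1, 0.2, 1.0 / 3]:
    fd = (fi_per_sensor(z, dz, alpha, p + h) - fi_per_sensor(z, dz, alpha, p - h)) / (2 * h)
    print(f"p={p:.3f}  closed form={dfi_dp(z, dz, alpha, p):.6f}  finite difference={fd:.6f}")
```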
ACKNOWLEDGEMENT

This work was supported in part by AFOSR under Grants FA9550-10-1-0458, FA9550-10-1-0263, FA9550-10-C-0179 and by CASE at Syracuse University, and by the National Science Council of Taiwan under grants NSC 99-2221-E-011-158-MY3 and NSC 101-2221-E-011-069-MY3. Han's work was completed during his visit to Syracuse University from 2012 to 2013.

REFERENCES

[1] P. K. Varshney, Distributed Detection and Data Fusion. Springer, New York, 1997.
[2] I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, "Wireless sensor networks: a survey," Computer Networks, vol. 38, no. 4, pp. 393–422, 2002.
[3] V. V. Veeravalli and P. K. Varshney, "Distributed inference in wireless sensor networks," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 370, no. 1958, pp. 100–117, 2012.
[4] J. N. Tsitsiklis, "Decentralized detection," in Advances in Signal Processing, H. V. Poor and J. B. Thomas, Eds. JAI Press, 1993, vol. 2, pp. 297–344.
[5] R. Viswanathan and P. K. Varshney, "Distributed detection with multiple sensors I. Fundamentals," Proc. IEEE, vol. 85, no. 1, pp. 54–63, 1997.
[6] R. S. Blum, S. A. Kassam, and H. V. Poor, "Distributed detection with multiple sensors II. Advanced topics," Proc. IEEE, vol. 85, no. 1, pp. 64–79, 1997.
[7] B. Chen, L. Tong, and P. K. Varshney, "Channel-aware distributed detection in wireless sensor networks," IEEE Signal Processing Magazine, vol. 23, no. 4, pp. 16–26, 2006.
[8] J. Chamberland and V. Veeravalli, "Wireless sensors in distributed detection applications," IEEE Signal Processing Magazine, vol. 24, no. 3, pp. 16–25, 2007.
[9] L. Cheng, C. Wu, Y. Zhang, H. Wu, M. Li, and C. Maple, "A survey of localization in wireless sensor network," International Journal of Distributed Sensor Networks, vol. 2012, pp. 1–12, 2012.
[10] N. Patwari, J. Ash, S. Kyperountas, A. Hero, R. Moses, and N. Correal, "Locating the nodes: cooperative localization in wireless sensor networks," IEEE Signal Processing Magazine, vol. 22, no. 4, pp. 54–69, 2005.
[11] R. R. Brooks, P. Ramanathan, and A. M. Sayeed, "Distributed target classification and tracking in sensor networks," Proceedings of the IEEE, vol. 91, no. 8, pp. 1163–1171, 2003.
[12] J. Tsitsiklis, "Decentralized detection by a large number of sensors," Math. Control, Signals, Systems, vol. 1, no. 2, pp. 167–182, 1988.
[13] A. Perrig, R. Szewczyk, J. D. Tygar, V. Wen, and D. E. Culler, "SPINS: security protocols for sensor networks," Wirel. Netw., vol. 8, no. 5, pp. 521–534, Sep. 2002. [Online]. Available: http://dx.doi.org/10.1023/A:1016598314198
[14] A. Perrig, J. Stankovic, and D. Wagner, "Security in wireless sensor networks," Communications of the ACM, vol. 47, no. 6, pp. 53–57, 2004.
[15] C. Karlof, N. Sastry, and D. Wagner, "TinySec: a link layer security architecture for wireless sensor networks," in Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, ser. SenSys '04. New York, NY, USA: ACM, 2004, pp. 162–175. [Online]. Available: http://doi.acm.org/10.1145/1031495.1031515
[16] A. Vempaty, L. Tong, and P. Varshney, "Distributed inference with Byzantine data," to appear in IEEE Signal Process. Mag., Special Issue: Signal Processing for Cyber-security and Privacy.
[17] L. Lamport, R. Shostak, and M. Pease, "The Byzantine generals problem," ACM Trans. Program. Lang. Syst., vol. 4, no. 3, pp. 382–401, Jul. 1982. [Online]. Available: http://doi.acm.org/10.1145/357172.357176
[18] G. N. Nayak and S. Samaddar, "Different flavours of man-in-the-middle attack, consequences and feasible solutions," in Computer Science and Information Technology (ICCSIT), 2010 3rd IEEE International Conference on, vol. 5, 2010, pp. 491–495.
[19] S. Marano, V. Matta, and L. Tong, "Distributed detection in the presence of Byzantine attacks," IEEE Trans. Signal Process., vol. 57, no. 1, pp. 16–29, 2009.
[20] A. S. Rawat, P. Anand, H. Chen, and P. K. Varshney, "Collaborative spectrum sensing in the presence of Byzantine attacks in cognitive radio networks," IEEE Trans. Signal Process., vol. 59, no. 2, pp. 774–786, 2011.
[21] A. Vempaty, K. Agrawal, H. Chen, and P. Varshney, "Adaptive learning of Byzantines' behavior in cooperative spectrum sensing," in Proc. IEEE Wireless Communications and Networking Conf. (WCNC), 2011, pp. 1310–1315.
[22] B. Kailkhura, S. Brahma, and P. K. Varshney, "Optimal Byzantine attack on distributed detection in tree based topologies," in Proc. of International Conference on Computing, Networking and Communications Workshops (ICNC-CPS), San Diego, USA, January 2013.
[23] E. Soltanmohammadi, M. Orooji, and M. Naraghi-Pour, "Decentralized hypothesis testing in wireless sensor networks in the presence of misbehaving nodes," IEEE Transactions on Information Forensics and Security, vol. 8, no. 1, pp. 205–215, 2013.
[24] A. Vempaty, O. Ozdemir, K. Agrawal, H. Chen, and P. Varshney, "Localization in wireless sensor networks: Byzantines and mitigation techniques," IEEE Transactions on Signal Processing, vol. 61, no. 6, pp. 1495–1508, 2013.
[25] J.-F. Chamberland and V. V. Veeravalli, "Decentralized detection in sensor networks," IEEE Trans. Signal Process., vol. 51, no. 2, pp. 407–416, 2003.
[26] J. G. Proakis, Digital Signal Processing: Principles, Algorithms and Applications. Pearson Education India, 2001.
[27] A. Ribeiro and G. B. Giannakis, "Bandwidth-constrained distributed estimation for wireless sensor networks - Part I: Gaussian case," IEEE Transactions on Signal Processing, vol. 54, no. 3, pp. 1131–1143, 2006.