Distributed Bayesian Detection with Byzantine Data
In this paper, we consider the problem of distributed Bayesian detection in the presence of Byzantines in the network. It is assumed that a fraction of the nodes in the network are compromised and reprogrammed by an adversary to transmit false inform…
Authors: Bhavya Kailkhura, Yunghsiang S. Han, Swastik Brahma
Bhavya Kailkhura, Student Member, IEEE, Yunghsiang S. Han, Fellow, IEEE, Swastik Brahma, Member, IEEE, Pramod K. Varshney, Fellow, IEEE

B. Kailkhura, S. Brahma and P. K. Varshney are with the Department of EECS, Syracuse University, Syracuse, NY 13244 (email: bkailkhu@syr.edu; skbrahma@syr.edu; varshney@syr.edu). Y. S. Han is with the EE Department, National Taiwan University of Science and Technology, Taiwan, R.O.C. (email: yshan@mail.ntust.edu.tw).

DRAFT

Abstract

In this paper, we consider the problem of distributed Bayesian detection in the presence of Byzantines in the network. It is assumed that a fraction of the nodes in the network are compromised and reprogrammed by an adversary to transmit false information to the fusion center (FC) to degrade detection performance. The problem of distributed detection is formulated as a binary hypothesis test at the FC based on 1-bit data sent by the sensors. The expression for minimum attacking power required by the Byzantines to blind the FC is obtained. More specifically, we show that above a certain fraction of Byzantine attackers in the network, the detection scheme becomes completely incapable of utilizing the sensor data for detection. We analyze the problem under different attacking scenarios and derive results for different non-asymptotic cases. It is found that existing asymptotics-based results do not hold under several non-asymptotic scenarios. When the fraction of Byzantines is not sufficient to blind the FC, we also provide closed form expressions for the optimal attacking strategies for the Byzantines that most degrade the detection performance.

Index Terms

Bayesian detection, Data falsification, Byzantine Data, Probability of error, Distributed detection

I. INTRODUCTION

Distributed detection is a well studied topic in the detection theory literature [1]–[3]. In distributed detection systems, due to bandwidth and energy constraints, the nodes often make a 1-bit local decision regarding the presence of a phenomenon before sending it to the fusion center (FC). Based on the local decisions transmitted by the nodes, the FC makes a global decision about the presence of the phenomenon of interest. Distributed detection was originally motivated by its applications in military surveillance but is now being employed in a wide variety of applications such as distributed spectrum sensing (DSS) using cognitive radio networks (CRNs) and traffic and environment monitoring. In many applications, a large number of inexpensive and less reliable nodes that can provide dense coverage are used to provide a balance between cost and functionality. The performance of such systems strongly depends on the reliability of the nodes in the network.

The robustness of distributed detection systems against attacks is of utmost importance. The distributed nature of such systems makes them quite vulnerable to different types of attacks. In recent years, security issues of such distributed networks are increasingly being studied within the networking [4], signal processing [5] and information theory communities [6]. One typical attack on such networks is a Byzantine attack. While Byzantine attacks (originally proposed by [7]) may, in general, refer to many types of malicious behavior, our focus in this paper is on data-falsification attacks [8]–[15]. In this type of attack, an attacker may send false (erroneous) data to the FC to degrade detection performance. In this paper, we refer to such a data falsification attacker as a Byzantine and the data thus generated is referred to as Byzantine data.

We formulate the signal detection problem as a binary hypothesis testing problem with the two hypotheses $H_0$ (signal is absent) and $H_1$ (signal is present). We make the conditional i.i.d.
assumption under which observations at the nodes are conditionally independent and identically distributed given the hypothesis. We assume that the FC is not compromised, and is able to collect data from all the nodes in the network via error free communication channels.¹ We also assume that the FC does not know which node is Byzantine, but it knows the fraction of Byzantines in the network.² We consider the problem of distributed Bayesian detection with prior probabilities of hypotheses known to both the FC and the attacker. The FC aims to minimize the probability of error by choosing the optimal fusion rule.

¹ In this work, we do not consider how individual nodes deliver their data to the fusion center except that the Byzantines are not able to alter the transmissions of honest nodes.

² In practice, the fraction of Byzantines in the network can be learned by observing the data sent by the nodes at the FC over a time window; however, this study is beyond the scope of this work.

A. Related Work

Although distributed detection has been a very active field of research in the past, security problems in distributed detection networks gained attention only very recently. In [11], the authors considered the problem of distributed detection in the presence of Byzantines under the Neyman-Pearson (NP) setup and determined the optimal attacking strategy which minimizes the detection error exponent. This approach based on Kullback-Leibler divergence (KLD) is analytically tractable and yields approximate results in non-asymptotic cases. They also assumed that the Byzantines know the true hypothesis, which obviously is not satisfied in practice but does provide a bound. In [12], the authors analyzed the same problem in the context of collaborative spectrum sensing under Byzantine attacks.
They relaxed the assumption of perfect knowledge of the hypotheses by assuming that the Byzantines determine the knowledge about the true hypotheses from their own sensing observations. A variant of the above formulation was explored in [13], [16], where the authors addressed the problem of optimal Byzantine attacks (data falsification) on distributed detection for a tree-based topology and extended the results of [12] for tree topologies. By assuming that the cost of compromising nodes at different levels of the tree is different, they found the optimal Byzantine strategy that minimizes the cost of attacking a given tree. Schemes for Byzantine node identification have been proposed in [12], [15], [17], [18]. Our focus is considerably different from Byzantine node identification schemes in that we do not try to authenticate the data; we consider the most effective attacking strategies and distributed detection schemes that are robust against attacks.

B. Main Contributions

All the approaches discussed so far consider distributed detection under the Neyman-Pearson (NP) setup. In this paper, we consider the distributed Bayesian detection problem with known prior probabilities of hypotheses. We assume that the Byzantines do not have perfect knowledge about the true state of the phenomenon of interest. In addition, we also assume that the Byzantines have neither knowledge of nor control over the thresholds used to make local decisions at the nodes. Also, the probability of detection and the probability of false alarm of a node are assumed to be the same for every node irrespective of whether it is honest or Byzantine. In our earlier work [19] on this problem, we analyzed the problem in the asymptotic regime.
Adopting Chernoff information as our performance metric, we studied the performance of a distributed detection system with Byzantines in the asymptotic regime. We summarize our results in the following theorem.

Theorem 1 ([19]). Optimal attacking strategies, $(P^*_{1,0}, P^*_{0,1})$, which minimize the Chernoff information are

$$ (P^*_{1,0}, P^*_{0,1}) = \begin{cases} (p_{1,0}, p_{0,1}) & \text{if } \alpha \ge 0.5 \\ (1, 1) & \text{if } \alpha < 0.5, \end{cases} $$

where $(p_{1,0}, p_{0,1})$ satisfy $\alpha (p_{1,0} + p_{0,1}) = 1$.

TABLE I
DIFFERENT SCENARIOS BASED ON THE KNOWLEDGE OF THE OPPONENT'S STRATEGIES

| Cases | Attacker has the knowledge of the FC's strategies | FC has the knowledge of Attacker's strategies |
|---|---|---|
| Case 1 | No | No |
| Case 2 | Yes | No |
| Case 3 | Yes | Yes |
| Case 4 | No | Yes |

In our current work, we significantly extend our previous work and focus on a non-asymptotic analysis for the Byzantine attacks on distributed Bayesian detection. First, we show that above a certain fraction of Byzantines in the network, the data fusion scheme becomes completely incapable (blind) and it is not possible to design a decision rule at the FC that can perform better than the decision rule based just on prior information. We find the minimum fraction of Byzantines that can blind the FC and refer to it as the critical power. Next, we explore the optimal attacking strategies for the Byzantines under different scenarios. In practice, the FC and the Byzantines will optimize their utility by choosing their actions based on the knowledge of their opponent's behavior. This motivates us to address the question: what are the optimal attacking/defense strategies given the knowledge of the opponent's strategies? Study of these practically motivated questions requires non-asymptotic analysis, which is systematically studied in this work. By assuming the error probability to be our performance metric, we analyze the problem in the non-asymptotic regime.
Observe that the probability of error is a function of the fusion rule, which is under the control of the FC. This gives us an additional degree of freedom to analyze the Byzantine attack under different practical scenarios where the FC and the Byzantines may or may not have knowledge of their opponent's strategies (for a description of different scenarios see Table I). It is found that results based on asymptotics do not hold under several non-asymptotic scenarios. More specifically, when the FC does not have knowledge of the attacker's strategies, results for the non-asymptotic case are different from those for the asymptotic case. However, if the FC has complete knowledge of the attacker's strategies and uses the optimal fusion rule to make the global decision, results obtained for this case are the same as those for the asymptotic case. Knowledge of the behavior of the attacker in the non-asymptotic regime enables the analysis of many related questions, such as the design of the optimal detector (fusion rule) and effects of strategic interaction between the FC and the attacker. In the process of analyzing the scenario where the FC has complete knowledge of its opponent's strategies, we obtain a closed form expression of the optimal fusion rule.

To summarize, our main contributions are threefold.

• In contrast to previous works, we study the problem of distributed detection with Byzantine data in the Bayesian framework.
• We analyze the problem under different attacking scenarios and derive closed form expressions for optimal attacking strategies for different non-asymptotic cases.
• In the process of analyzing the scenario where the FC has complete knowledge of its opponent's strategies, we obtain a closed form expression for the optimal fusion rule.

The signal processing problem considered in this paper is closest to [12].
The approach in [12], based on Kullback-Leibler divergence (KLD), is analytically tractable and yields approximate results in non-asymptotic cases. Our results, however, are not a direct application of those of [12]. While as in [12] we are also interested in the optimal attack strategies, our objective function and, therefore, techniques of finding them are different. In contrast to [12], where only optimal strategies to blind the FC were obtained, we also provide closed form expressions for the optimal attacking strategies for the Byzantines that most degrade the detection performance when the fraction of Byzantines is not sufficient to blind the FC. In fact, finding the optimal Byzantine attacking strategies is only the first step toward designing a robust distributed detection system. Knowledge of these attacking strategies can be used to implement the optimal detector at the FC or to implement an efficient reputation based identification scheme [12], [20] (thresholds in these schemes are generally a function of attack strategies). Also, the optimal attacking distributions in certain cases have the minimax property and, therefore, the knowledge of these optimal attack strategies can be used to implement the robust detector.

Fig. 1. System Model (honest and Byzantine nodes send their decisions to the FC).

The rest of the paper is organized as follows. Section II introduces our system model, including the Byzantine attack model. In Section III, we provide the closed form expression for the critical power above which the FC becomes blind. Next, we discuss our results based on non-asymptotic analysis of the distributed Bayesian detection system with Byzantine data for different scenarios. In Section IV, we analyze the problem when Byzantines do not have any knowledge about the fusion rule used at the FC.
Section V discusses the scenario where Byzantines have the knowledge about the fusion rule used at the FC, but the FC does not know the attacker's strategies. Next in Section VI, we extend our analysis to the scenario where both the FC and the attacker have the knowledge of their opponent's strategies and act strategically to optimize their utilities. Finally, Section VII concludes the paper.

II. DISTRIBUTED DETECTION IN THE PRESENCE OF BYZANTINES

Consider two hypotheses $H_0$ (signal is absent) and $H_1$ (signal is present). Also, consider a parallel network (see Figure 1), comprised of a central entity (known as the Fusion Center (FC)) and a set of $N$ sensors (nodes), which faces the task of determining which of the two hypotheses is true. Prior probabilities of the two hypotheses $H_0$ and $H_1$ are denoted by $P_0$ and $P_1$, respectively. The sensors observe the phenomenon, carry out local computations to decide the presence or absence of the phenomenon, and then send their local decisions to the FC that yields a final decision after processing the local decisions. Observations at the nodes are assumed to be conditionally independent and identically distributed given the hypothesis. A Byzantine attack on such a system compromises some of the nodes which may then intentionally send falsified local decisions to the FC to make the final decision incorrect. We assume that a fraction $\alpha$ of the $N$ nodes which observe the phenomenon have been compromised by an attacker. We consider the communication channels to be error-free. Next, we describe the modus operandi of the sensors and the FC in detail.

A. Modus Operandi of the Nodes

Based on the observations, each node $i$ makes a one-bit local decision $v_i \in \{0, 1\}$ regarding the absence or presence of the phenomenon using the likelihood ratio test

$$ \frac{p^{(1)}_{Y_i}(y_i)}{p^{(0)}_{Y_i}(y_i)} \underset{v_i = 0}{\overset{v_i = 1}{\gtrless}} \lambda, \tag{1} $$

where $\lambda$ is the identical threshold³ used at all the sensors and $p^{(k)}_{Y_i}(y_i)$ is the conditional probability density function (PDF) of observation $y_i$ under the hypothesis $H_k$. Each node $i$, after making its one-bit local decision $v_i$, sends $u_i \in \{0, 1\}$ to the FC, where $u_i = v_i$ if $i$ is an uncompromised (honest) node, but for a compromised (Byzantine) node $i$, $u_i$ need not be equal to $v_i$. We denote the probabilities of detection and false alarm of each node $i$ in the network by $P_d = P(v_i = 1 \mid H_1)$ and $P_f = P(v_i = 1 \mid H_0)$, respectively, which hold for both uncompromised nodes as well as compromised nodes. In this paper, we assume that each Byzantine decides to attack independently relying on its own observation and decision regarding the presence of the phenomenon. Specifically, we define the following strategies $P^H_{j,1}$, $P^H_{j,0}$ and $P^B_{j,1}$, $P^B_{j,0}$ ($j \in \{0, 1\}$) for the honest and Byzantine nodes, respectively:

Honest nodes:

$$ P^H_{1,1} = 1 - P^H_{0,1} = P^H(x = 1 \mid y = 1) = 1 \tag{2} $$

$$ P^H_{1,0} = 1 - P^H_{0,0} = P^H(x = 1 \mid y = 0) = 0 \tag{3} $$

Byzantine nodes:

$$ P^B_{1,1} = 1 - P^B_{0,1} = P^B(x = 1 \mid y = 1) \tag{4} $$

$$ P^B_{1,0} = 1 - P^B_{0,0} = P^B(x = 1 \mid y = 0) \tag{5} $$

$P^H(x = a \mid y = b)$ ($P^B(x = a \mid y = b)$) is the probability that an honest (Byzantine) node sends $a$ to the FC when its actual local decision is $b$. From now onwards, we will refer to Byzantine flipping probabilities simply by $(P_{1,0}, P_{0,1})$. We also assume that the FC is not aware of the exact set of Byzantine nodes and considers each node $i$ to be Byzantine with a certain probability $\alpha$.

³ It has been shown that the use of identical thresholds is asymptotically optimal [21].

B. Binary Hypothesis Testing at the Fusion Center

We consider a Bayesian detection problem where the performance criterion at the FC is the probability of error. The FC receives the decision vector, $\mathbf{u} = [u_1, \cdots, u_N]$, from the nodes and makes the global decision about the phenomenon by considering the maximum a posteriori probability (MAP) rule which is given by

$$ P(H_1 \mid \mathbf{u}) \underset{H_0}{\overset{H_1}{\gtrless}} P(H_0 \mid \mathbf{u}) $$

or equivalently,

$$ \frac{P(\mathbf{u} \mid H_1)}{P(\mathbf{u} \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P_0}{P_1}. $$

Since the $u_i$'s are independent of each other, the MAP rule simplifies to a $K$-out-of-$N$ fusion rule [1]. The global false alarm probability $Q_F$ and detection probability $Q_D$ are then given by⁴

$$ Q_F = \sum_{i=K}^{N} \binom{N}{i} (\pi_{1,0})^i (1 - \pi_{1,0})^{N-i} \tag{6} $$

and

$$ Q_D = \sum_{i=K}^{N} \binom{N}{i} (\pi_{1,1})^i (1 - \pi_{1,1})^{N-i}, \tag{7} $$

where $\pi_{j,0}$ and $\pi_{j,1}$ are the conditional probabilities of $u_i = j$ given $H_0$ and $H_1$, respectively. Specifically, $\pi_{1,0}$ and $\pi_{1,1}$ can be calculated as

$$ \pi_{1,0} = \alpha \left( P_{1,0}(1 - P_f) + (1 - P_{0,1}) P_f \right) + (1 - \alpha) P_f \tag{8} $$

and

$$ \pi_{1,1} = \alpha \left( P_{1,0}(1 - P_d) + (1 - P_{0,1}) P_d \right) + (1 - \alpha) P_d, \tag{9} $$

where $\alpha$ is the fraction of Byzantine nodes.

⁴ These expressions are valid under the assumption that $\alpha < 0.5$. Later in Section VI, we will generalize our result for any arbitrary $\alpha$.

The local probability of error as seen by the FC is defined as

$$ P_e = P_0 \pi_{1,0} + P_1 (1 - \pi_{1,1}) \tag{10} $$

and the system wide probability of error at the FC is given by

$$ P_E = P_0 Q_F + P_1 (1 - Q_D). \tag{11} $$

Notice that the system wide probability of error $P_E$ is a function of the parameter $K$, which is under the control of the FC, and the parameters $(\alpha, P_{j,0}, P_{j,1})$ are under the control of the attacker.

The FC and the Byzantines may or may not have knowledge of their opponent's strategy. We will analyze the problem of detection with Byzantine data under several different scenarios in the following sections.
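As a numerical companion to (6)–(11), the following minimal sketch evaluates $\pi_{1,0}$, $\pi_{1,1}$, $Q_F$, $Q_D$, $P_e$ and $P_E$ for a $K$-out-of-$N$ rule. The function names and the parameter values in the usage lines are illustrative assumptions and are not taken from the paper.

```python
from math import comb


def received_bit_probs(alpha, p10, p01, pd, pf):
    """Return (pi_10, pi_11) as written in (8) and (9)."""
    pi_10 = alpha * (p10 * (1 - pf) + (1 - p01) * pf) + (1 - alpha) * pf
    pi_11 = alpha * (p10 * (1 - pd) + (1 - p01) * pd) + (1 - alpha) * pd
    return pi_10, pi_11


def tail_prob(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), i.e. the sums in (6) and (7)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def error_probs(n, k, prior0, alpha, p10, p01, pd, pf):
    """Return (P_e, P_E) from (10) and (11) for a K-out-of-N fusion rule."""
    prior1 = 1 - prior0
    pi_10, pi_11 = received_bit_probs(alpha, p10, p01, pd, pf)
    q_f = tail_prob(n, k, pi_10)  # global false alarm probability, (6)
    q_d = tail_prob(n, k, pi_11)  # global detection probability, (7)
    local = prior0 * pi_10 + prior1 * (1 - pi_11)
    system = prior0 * q_f + prior1 * (1 - q_d)
    return local, system


# Hypothetical operating point, chosen only for illustration.
local, system = error_probs(n=10, k=6, prior0=0.4, alpha=0.3,
                            p10=1.0, p01=1.0, pd=0.8, pf=0.1)
print(f"P_e = {local:.4f}, P_E = {system:.4f}")
```

With $N = 1$ and $K = 1$ the two error measures coincide, which gives a quick sanity check on the implementation.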
First, we will determine the minimum fraction of Byzantines needed to blind the decision fusion scheme.

III. CRITICAL POWER TO BLIND THE FUSION CENTER

In this section, we determine the minimum fraction of Byzantine nodes needed to make the FC "blind" and denote it by $\alpha_{blind}$. We say that the FC is blind if an adversary can make the data that the FC receives from the sensors such that no information is conveyed. In other words, the optimal detector at the FC cannot perform better than simply making the decision based on priors.

Lemma 1. In Bayesian distributed detection, the minimum fraction of Byzantines needed to make the FC blind is $\alpha_{blind} = 0.5$.

Proof: In the Bayesian framework, we say that the FC is "blind" if the received data $\mathbf{u}$ does not provide any information about the hypotheses to the FC. That is, the condition to make the FC blind can be stated as

$$ P(H_i \mid \mathbf{u}) = P(H_i) \quad \text{for } i = 0, 1. \tag{12} $$

It can be seen that (12) is equivalent to

$$ P(H_i \mid \mathbf{u}) = P(H_i) \;\Leftrightarrow\; \frac{P(H_i) P(\mathbf{u} \mid H_i)}{P(\mathbf{u})} = P(H_i) \;\Leftrightarrow\; P(\mathbf{u} \mid H_i) = P(\mathbf{u}). $$

Thus, the FC becomes blind if the probability of receiving a given vector $\mathbf{u}$ is independent of the hypothesis present. In such a scenario, the best that the FC can do is to make decisions solely based on the priors, resulting in the most degraded performance at the FC. Now, using the conditional i.i.d. assumption, under which observations at the nodes are conditionally independent and identically distributed given the hypothesis, condition (12) to make the FC blind becomes $\pi_{1,1} = \pi_{1,0}$. By (8) and (9), this is true only when

$$ \alpha \left[ P_{1,0}(P_f - P_d) + (1 - P_{0,1})(P_d - P_f) \right] + (1 - \alpha)(P_d - P_f) = 0. $$

Factoring out $(P_d - P_f)$, the left-hand side equals $(P_d - P_f)\left[ 1 - \alpha (P_{1,0} + P_{0,1}) \right]$. Hence, assuming $P_d \neq P_f$, the FC becomes blind if

$$ \alpha = \frac{1}{P_{1,0} + P_{0,1}}. \tag{13} $$

$\alpha$ in (13) is minimized when $P_{1,0}$ and $P_{0,1}$ both take their largest values, i.e., $P_{1,0} = P_{0,1} = 1$. Hence, $\alpha_{blind} = 0.5$.
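The blinding condition in Lemma 1 can be checked numerically. The sketch below, with hypothetical function names and illustrative values of $P_d$ and $P_f$, computes the fraction in (13) and confirms that the gap $\pi_{1,1} - \pi_{1,0}$ vanishes there.

```python
def received_bit_probs(alpha, p10, p01, pd, pf):
    """Return (pi_10, pi_11) as written in (8) and (9)."""
    pi_10 = alpha * (p10 * (1 - pf) + (1 - p01) * pf) + (1 - alpha) * pf
    pi_11 = alpha * (p10 * (1 - pd) + (1 - p01) * pd) + (1 - alpha) * pd
    return pi_10, pi_11


def blinding_fraction(p10, p01):
    """Fraction alpha from (13); requires p10 + p01 > 0."""
    return 1.0 / (p10 + p01)


def gap(alpha, p10, p01, pd, pf):
    """pi_11 - pi_10, which is zero exactly when the FC is blind."""
    pi_10, pi_11 = received_bit_probs(alpha, p10, p01, pd, pf)
    return pi_11 - pi_10


PD, PF = 0.8, 0.1  # illustrative sensor operating point (assumption)

for p10, p01 in [(1.0, 1.0), (0.8, 0.7)]:
    alpha = blinding_fraction(p10, p01)
    print(f"flips=({p10}, {p01}) alpha={alpha:.4f} "
          f"gap={gap(alpha, p10, p01, PD, PF):.2e}")

# Below the critical power the received bits remain informative.
print(f"gap at alpha=0.3 with full flipping: {gap(0.3, 1.0, 1.0, PD, PF):.4f}")
```

Note that a value from (13) is only meaningful as a fraction when it does not exceed 1, that is, when $P_{1,0} + P_{0,1} \ge 1$.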
Next, we investigate how the Byzantines can launch an attack optimally considering that the parameter ($K$) is under the control of the FC. By assuming error probability to be our performance metric, we analyze the non-asymptotic regime. Observe that the probability of error is dependent on the fusion rule. This gives us an additional degree of freedom to analyze the Byzantine attack under different scenarios where the FC and the Byzantines may or may not have knowledge of their opponent's strategies.

IV. OPTIMAL ATTACKING STRATEGIES WITHOUT THE KNOWLEDGE OF FUSION RULE

In practice, the Byzantine attacker may not have the knowledge about the fusion rule, i.e., the value of $K$, used by the FC. In such scenarios, we obtain the optimal attacking strategy for Byzantines by maximizing the local probability of error as seen by the FC, which is independent of the fusion rule $K$. We formally state the problem as

$$ \begin{aligned} \underset{P_{1,0},\, P_{0,1}}{\text{maximize}} \quad & P_0 \pi_{1,0} + P_1 (1 - \pi_{1,1}) \\ \text{subject to} \quad & 0 \le P_{1,0} \le 1 \\ & 0 \le P_{0,1} \le 1 \end{aligned} \tag{P1} $$

To solve the problem, we analyze the properties of the objective function, $P_e = P_0 \pi_{1,0} + P_1 (1 - \pi_{1,1})$, with respect to $(P_{1,0}, P_{0,1})$. Notice that

$$ \frac{dP_e}{dP_{1,0}} = P_0 \alpha (1 - P_f) - P_1 \alpha (1 - P_d) \tag{14} $$

and

$$ \frac{dP_e}{dP_{0,1}} = -P_0 \alpha P_f + P_1 \alpha P_d. \tag{15} $$

By utilizing monotonicity properties of the objective function with respect to $P_{1,0}$ and $P_{0,1}$ ((14) and (15)), we present the solution of the Problem P1 in Table II.

TABLE II
SOLUTION OF MAXIMIZING LOCAL ERROR $P_e$ PROBLEM

| $P_{1,0}$ | $P_{0,1}$ | Condition |
|---|---|---|
| 0 | 0 | $\frac{P_d}{P_f} < \frac{P_0}{P_1} < \frac{1 - P_d}{1 - P_f}$ |
| 0 | 1 | $\frac{P_d}{P_f} > \frac{P_0}{P_1} < \frac{1 - P_d}{1 - P_f}$ |
| 1 | 0 | $\frac{P_d}{P_f} < \frac{P_0}{P_1} > \frac{1 - P_d}{1 - P_f}$ |
| 1 | 1 | $\frac{P_d}{P_f} > \frac{P_0}{P_1} > \frac{1 - P_d}{1 - P_f}$ |

Notice that, when $\frac{P_d}{P_f} < \frac{P_0}{P_1} < \frac{1 - P_d}{1 - P_f}$, both (14) and (15) are less than zero. $P_e$ then becomes a strictly decreasing function of $P_{1,0}$ as well as $P_{0,1}$.
Hence, to maximize $P_e$, the attacker needs to choose $(P_{1,0}, P_{0,1}) = (0, 0)$. However, the condition $\frac{P_d}{P_f} < \frac{P_0}{P_1} < \frac{1 - P_d}{1 - P_f}$ can hold only if $P_d < P_f$ and, therefore, is not admissible. Similar arguments lead to the rest of the results given in Table II. Note that, if there is an equality in the conditions mentioned in Table II, then the solution will not be unique. For example, by (15), $\frac{dP_e}{dP_{0,1}} = 0 \Leftrightarrow \frac{P_0}{P_1} = \frac{P_d}{P_f}$ implies that $P_e$ is constant as a function of $P_{0,1}$. In other words, the attacker will be indifferent in choosing the parameter $P_{0,1}$ because any value of $P_{0,1}$ will result in the same probability of error.

Fig. 2. (a) $P_e$ as a function of $(P_{1,0}, P_{0,1})$ when $P_0 = P_1 = 0.5$. (b) $P_e$ as a function of $(P_{1,0}, P_{0,1})$ when $P_0 = 0.1$, $P_1 = 0.9$. (Axes: $P_{1,0}$, $P_{0,1}$, local probability of error $P_e$.)

Next, to gain insight into the solution, we present illustrative examples that corroborate our results.

A. Illustrative Examples

In Figure 2(a), we plot the local probability of error $P_e$ as a function of $(P_{1,0}, P_{0,1})$ when $P_0 = P_1 = 0.5$. We assume that the local probability of detection is $P_d = 0.8$ and the local probability of false alarm is $P_f = 0.1$ such that $\frac{P_d}{P_f} = 8$, $\frac{1 - P_d}{1 - P_f} = 0.2222$, and $\frac{P_0}{P_1} = 1$. Clearly, $\frac{P_d}{P_f} > \frac{P_0}{P_1} > \frac{1 - P_d}{1 - P_f}$ and it implies that the optimal attacking strategy is $(P_{1,0}, P_{0,1}) = (1, 1)$, which can be verified from Figure 2(a).

In Figure 2(b), we study the local probability of error $P_e$ as a function of the attacking strategy $(P_{1,0}, P_{0,1})$ when $P_0 = 0.1$ and $P_1 = 0.9$. We assume that the local probability of detection is $P_d = 0.8$ and the local probability of false alarm is $P_f = 0.1$ such that $\frac{P_d}{P_f} = 8$, $\frac{1 - P_d}{1 - P_f} = 0.2222$, and $\frac{P_0}{P_1} = 0.1111$. Clearly, $\frac{P_d}{P_f} > \frac{P_0}{P_1} < \frac{1 - P_d}{1 - P_f}$ implies that the optimal attacking strategy is $(P_{1,0}, P_{0,1}) = (0, 1)$, which can be verified from Figure 2(b). These results corroborate our theoretical results presented in Table II.

In the next section, we investigate the scenario where Byzantines are aware of the fusion rule $K$ used at the FC and can use this knowledge to provide false information in an optimal manner to blind the FC. However, the FC does not have knowledge of the Byzantines' attacking strategies $(\alpha, P_{j,0}, P_{j,1})$ and does not optimize against the Byzantines' behavior. Since the majority rule is a widely used fusion rule [14], [22], [23], we assume that the FC uses the majority rule to make the global decision.

V. OPTIMAL BYZANTINE ATTACKING STRATEGIES WITH KNOWLEDGE OF MAJORITY FUSION RULE

In this section, we investigate optimal Byzantine attacking strategies in a distributed detection system, with the attacker having knowledge about the fusion rule used at the FC. However, we assume that the FC is not strategic in nature, and uses a majority rule, without trying to optimize against the Byzantines' behavior. We consider both the FC and the Byzantine to be strategic in Section VI. The performance criterion at the FC is assumed to be the probability of error $P_E$.

For a fixed fusion rule ($K^*$), which, as mentioned before, is assumed to be the majority rule $K^* = \lceil \frac{N+1}{2} \rceil$, $P_E$ varies with the parameters $(\alpha, P_{j,0}, P_{j,1})$ which are under the control of the attacker. The Byzantine attack problem can be formally stated as follows:

$$ \begin{aligned} \underset{P_{j,0},\, P_{j,1}}{\text{maximize}} \quad & P_E(\alpha, P_{j,0}, P_{j,1}) \\ \text{subject to} \quad & 0 \le P_{j,0} \le 1 \\ & 0 \le P_{j,1} \le 1. \end{aligned} \tag{P2} $$

For a fixed fraction of Byzantines $\alpha$, the attacker wants to maximize the probability of error $P_E$ by choosing its attacking strategy $(P_{j,0}, P_{j,1})$ optimally.
We assume that the attacker is aware of the fact that the FC is using the majority rule for making the global decision. Before presenting our main results for Problem P2, we make an assumption that will be used in the theorem.

Assumption 1. We assume that $\alpha < \min\{(0.5 - P_f), (1 - (m/P_d))\}$,⁵ where $m = \frac{N}{2N - 2}$.

⁵ Condition $\alpha < \min\{(0.5 - P_f), (1 - (m/P_d))\}$, where $m = \frac{N}{2N - 2} > 0.5$, suggests that as $N$ tends to infinity, $m = \frac{N}{2N - 2}$ tends to $0.5$. When $P_d$ tends to 1 and $P_f$ tends to 0, the above condition becomes $\alpha < 0.5$.

A consequence of this assumption is $\pi_{1,1} > m$, which can be shown as follows. By (9), we have

$$ \begin{aligned} \pi_{1,1} &= \alpha \left( P_{1,0}(1 - P_d) + (1 - P_{0,1}) P_d \right) + (1 - \alpha) P_d \\ &= \alpha P_{1,0}(1 - P_d) - \alpha P_d P_{0,1} + P_d \\ &\ge -\alpha P_d P_{0,1} + P_d \ge P_d (1 - \alpha) > m. \end{aligned} \tag{16} $$

Eq. (16) is true because $\alpha < \min\{(0.5 - P_f), (1 - (m/P_d))\} \le (1 - (m/P_d))$. Another consequence of this assumption is $\pi_{1,0} < 0.5$, which can be shown as follows. From (8), we have

$$ \begin{aligned} \pi_{1,0} &= \alpha \left( P_{1,0}(1 - P_f) + (1 - P_{0,1}) P_f \right) + (1 - \alpha) P_f \\ &= \alpha P_{1,0} - \alpha P_f (P_{1,0} + P_{0,1}) + P_f \le \alpha + P_f < 0.5. \end{aligned} \tag{17} $$

Eq. (17) is true because $\alpha < \min\{(0.5 - P_f), (1 - (m/P_d))\} \le (0.5 - P_f)$. Next, we analyze the properties of $P_E$ with respect to $(P_{1,0}, P_{0,1})$ under our assumption that enable us to find the optimal attacking strategies.

Lemma 2. Assume that the FC employs the majority fusion rule $K^*$ and $\alpha < \min\{(0.5 - P_f), (1 - (m/P_d))\}$, where $m = \frac{N}{2N - 2}$. Then, for any fixed value of $P_{0,1}$, the error probability $P_E$ at the FC is a quasi-convex function of $P_{1,0}$.

Proof: A function $f(P_{1,0})$ is quasi-convex if, for some $P^*_{1,0}$, $f(P_{1,0})$ is non-increasing for $P_{1,0} \le P^*_{1,0}$ and $f(P_{1,0})$ is non-decreasing for $P_{1,0} \ge P^*_{1,0}$.
In other words, the lemma is proved if $\frac{dP_E}{dP_{1,0}} \le 0$ (or $\frac{dP_E}{dP_{1,0}} \ge 0$) for all $P_{1,0}$, or if for some $P^*_{1,0}$, $\frac{dP_E}{dP_{1,0}} \le 0$ when $P_{1,0} \le P^*_{1,0}$ and $\frac{dP_E}{dP_{1,0}} \ge 0$ when $P_{1,0} \ge P^*_{1,0}$. First, we calculate the partial derivative of $P_E$ with respect to $P_{1,0}$ for an arbitrary $K$ as follows:

$$ \frac{dP_E}{dP_{1,0}} = P_0 \frac{dQ_F}{dP_{1,0}} - P_1 \frac{dQ_D}{dP_{1,0}}. \tag{18} $$

The detailed derivation of $\frac{dP_E}{dP_{1,0}}$ is given in Appendix B and we present a summary of the main results below.

$$ \frac{dQ_F}{dP_{1,0}} = \alpha (1 - P_f) N \binom{N-1}{K-1} (\pi_{1,0})^{K-1} (1 - \pi_{1,0})^{N-K}, \tag{19} $$

$$ \frac{dQ_D}{dP_{1,0}} = \alpha (1 - P_d) N \binom{N-1}{K-1} (\pi_{1,1})^{K-1} (1 - \pi_{1,1})^{N-K}, \tag{20} $$

and

$$ \begin{aligned} \frac{dP_E}{dP_{1,0}} = {} & -P_1 \alpha (1 - P_d) N \binom{N-1}{K-1} (\pi_{1,1})^{K-1} (1 - \pi_{1,1})^{N-K} \\ & + P_0 \alpha (1 - P_f) N \binom{N-1}{K-1} (\pi_{1,0})^{K-1} (1 - \pi_{1,0})^{N-K}. \end{aligned} \tag{21} $$

$\frac{dP_E}{dP_{1,0}}$ given in (21) can be reformulated as follows:

$$ \frac{dP_E}{dP_{1,0}} = g(P_{1,0}, K, \alpha) \left( e^{r(P_{1,0}, K, \alpha)} - 1 \right), \tag{22} $$

where

$$ g(P_{1,0}, K, \alpha) = N \binom{N-1}{K-1} P_1 \alpha (1 - P_d) (\pi_{1,1})^{K-1} (1 - \pi_{1,1})^{N-K} \tag{23} $$

and

$$ \begin{aligned} r(P_{1,0}, K, \alpha) &= \ln \left( \frac{P_0}{P_1} \frac{1 - P_f}{1 - P_d} \left( \frac{\pi_{1,0}}{\pi_{1,1}} \right)^{K-1} \left( \frac{1 - \pi_{1,0}}{1 - \pi_{1,1}} \right)^{N-K} \right) \\ &= \ln \left( \frac{P_0}{P_1} \frac{1 - P_f}{1 - P_d} \right) + (K - 1) \ln \frac{\pi_{1,0}}{\pi_{1,1}} + (N - K) \ln \frac{1 - \pi_{1,0}}{1 - \pi_{1,1}}. \end{aligned} \tag{24} $$

It can be seen that $g(P_{1,0}, K, \alpha) \ge 0$ so that the sign of $\frac{dP_E}{dP_{1,0}}$ depends only on the value of $r(P_{1,0}, K, \alpha)$. To prove that $P_E$ is a quasi-convex function of $P_{1,0}$ when the majority rule $K^*$ is used at the FC, it is sufficient to show that $r(P_{1,0}, K^*, \alpha)$ is a non-decreasing function. Differentiating $r(P_{1,0}, K^*, \alpha)$ with respect to $P_{1,0}$, we get

$$ \begin{aligned} \frac{dr(P_{1,0}, K^*, \alpha)}{dP_{1,0}} &= (K^* - 1) \left( \frac{\alpha (1 - P_f)}{\pi_{1,0}} - \frac{\alpha (1 - P_d)}{\pi_{1,1}} \right) + (N - K^*) \left( \frac{\alpha (1 - P_d)}{1 - \pi_{1,1}} - \frac{\alpha (1 - P_f)}{1 - \pi_{1,0}} \right) \\ &= (K^* - 1) \alpha \left( \frac{1 - P_f}{\pi_{1,0}} - \frac{1 - P_d}{\pi_{1,1}} \right) - (N - K^*) \alpha \left( \frac{1 - P_f}{1 - \pi_{1,0}} - \frac{1 - P_d}{1 - \pi_{1,1}} \right). \end{aligned} \tag{25} $$

It can be shown that $\frac{dr(P_{1,0}, K^*, \alpha)}{dP_{1,0}} > 0$ (see Appendix A) and this completes the proof.

Quasi-convexity of $P_E$ over $P_{1,0}$ implies that the maximum of the function occurs on the corners, i.e., $P_{1,0} = 0$ or $1$ (may not be unique). Next, we analyze the properties of $P_E$ with respect to $P_{0,1}$.

Lemma 3. Assume that the FC employs the majority fusion rule $K^*$ and $\alpha < \min\{(0.5 - P_f), (1 - (m/P_d))\}$, where $m = \frac{N}{2N - 2}$. Then, the probability of error $P_E$ at the FC is a quasi-convex function of $P_{0,1}$ for a fixed $P_{1,0}$.

Proof: For a fixed $P_{1,0}$, we have

$$ (\pi_{1,0})' = d\pi_{1,0} / dP_{0,1} = \alpha (-P_f). \tag{26} $$

By a similar argument as given in Appendix B, for an arbitrary $K$ we have

$$ \begin{aligned} \frac{dP_E}{dP_{0,1}} = {} & P_1 \alpha P_d N \binom{N-1}{K-1} (\pi_{1,1})^{K-1} (1 - \pi_{1,1})^{N-K} \\ & - P_0 \alpha P_f N \binom{N-1}{K-1} (\pi_{1,0})^{K-1} (1 - \pi_{1,0})^{N-K}. \end{aligned} \tag{27} $$

$\frac{dP_E}{dP_{0,1}}$ given in (27) can be reformulated as follows:

$$ \frac{dP_E}{dP_{0,1}} = g(P_{0,1}, K, \alpha) \left( e^{r(P_{0,1}, K, \alpha)} - 1 \right), \tag{28} $$

where

$$ g(P_{0,1}, K, \alpha) = N \binom{N-1}{K-1} P_0 \alpha P_f (\pi_{1,0})^{K-1} (1 - \pi_{1,0})^{N-K} \tag{29} $$

and

$$ \begin{aligned} r(P_{0,1}, K, \alpha) &= \ln \left( \frac{P_1}{P_0} \frac{P_d}{P_f} \left( \frac{\pi_{1,1}}{\pi_{1,0}} \right)^{K-1} \left( \frac{1 - \pi_{1,1}}{1 - \pi_{1,0}} \right)^{N-K} \right) \\ &= \ln \left( \frac{P_1}{P_0} \frac{P_d}{P_f} \right) + (K - 1) \ln \frac{\pi_{1,1}}{\pi_{1,0}} + (N - K) \ln \frac{1 - \pi_{1,1}}{1 - \pi_{1,0}}. \end{aligned} \tag{30} $$

It can be seen that $g(P_{0,1}, K, \alpha) \ge 0$ such that the sign of $\frac{dP_E}{dP_{0,1}}$ depends on the value of $r(P_{0,1}, K, \alpha)$. To prove that $P_E$ is a quasi-convex function of $P_{0,1}$ when the majority rule $K^*$ is used at the FC, it is sufficient to show that $r(P_{0,1}, K^*, \alpha)$ is a non-decreasing function. Differentiating $r(P_{0,1}, K^*, \alpha)$ with respect to $P_{0,1}$, we get

$$ \frac{dr(P_{0,1}, K^*, \alpha)}{dP_{0,1}} = (K^* - 1) \left( \frac{\alpha P_f}{\pi_{1,0}} - \frac{\alpha P_d}{\pi_{1,1}} \right) + (N - K^*) \left( \frac{\alpha P_d}{1 - \pi_{1,1}} - \frac{\alpha P_f}{1 - \pi_{1,0}} \right) \tag{31} $$

$$ = (N - K^*) \alpha \left( \frac{P_d}{1 - \pi_{1,1}} - \frac{P_f}{1 - \pi_{1,0}} \right) - (K^* - 1) \alpha \left( \frac{P_d}{\pi_{1,1}} - \frac{P_f}{\pi_{1,0}} \right). \tag{32} $$

In the following, we show that

$$ \frac{dr(P_{0,1}, K^*, \alpha)}{dP_{0,1}} > 0, \tag{33} $$

i.e., $r(P_{0,1}, K^*, \alpha)$ is non-decreasing. It is sufficient to show that

$$ (N - K^*) \left( \frac{P_d}{1 - \pi_{1,1}} - \frac{P_f}{1 - \pi_{1,0}} \right) > (K^* - 1) \left( \frac{P_d}{\pi_{1,1}} - \frac{P_f}{\pi_{1,0}} \right). \tag{34} $$

First, we consider the case when there are an even number of nodes in the network and the majority fusion rule is given by $K^* = \frac{N}{2} + 1$. Since $0 \le \pi_{1,0} < \pi_{1,1} \le 1$ and $N \ge 2$, we have

$$ \begin{aligned} & \left( 1 - \frac{2}{N} \right) \frac{\pi_{1,1} \pi_{1,0}}{(1 - \pi_{1,1})(1 - \pi_{1,0})} > -1 \\ \Leftrightarrow\; & \left( 1 - \frac{2}{N} \right) \left( \frac{1}{1 - \pi_{1,1}} - \frac{1}{1 - \pi_{1,0}} \right) > \frac{1}{\pi_{1,1}} - \frac{1}{\pi_{1,0}} \\ \Leftrightarrow\; & \left( 1 - \frac{2}{N} \right) \frac{1}{1 - \pi_{1,1}} - \frac{1}{\pi_{1,1}} > \left( 1 - \frac{2}{N} \right) \frac{1}{1 - \pi_{1,0}} - \frac{1}{\pi_{1,0}}. \end{aligned} \tag{35} $$

Using the fact that $\frac{P_d}{P_f} > 1$, $\pi_{1,1} > \frac{N}{2N - 2}$, and $K^* = \frac{N}{2} + 1$, (35) becomes

$$ \begin{aligned} & \frac{P_d}{P_f} \left[ \left( 1 - \frac{2}{N} \right) \frac{1}{1 - \pi_{1,1}} - \frac{1}{\pi_{1,1}} \right] > \left( 1 - \frac{2}{N} \right) \frac{1}{1 - \pi_{1,0}} - \frac{1}{\pi_{1,0}} \\ \Leftrightarrow\; & \left( 1 - \frac{2}{N} \right) \frac{P_d}{1 - \pi_{1,1}} - \frac{P_d}{\pi_{1,1}} > \left( 1 - \frac{2}{N} \right) \frac{P_f}{1 - \pi_{1,0}} - \frac{P_f}{\pi_{1,0}} \\ \Leftrightarrow\; & (N - K^*) \left( \frac{P_d}{1 - \pi_{1,1}} - \frac{P_f}{1 - \pi_{1,0}} \right) > (K^* - 1) \left( \frac{P_d}{\pi_{1,1}} - \frac{P_f}{\pi_{1,0}} \right). \end{aligned} \tag{36} $$

Next, we consider the case when there are an odd number of nodes in the network and the majority fusion rule is given by $K^* = \frac{N + 1}{2}$. By using the fact that $\frac{\pi_{1,0}}{\pi_{1,1}} > \frac{P_f}{P_d}$, it can be seen that the right-hand side of (36) is nonnegative. Hence, from (36), we have

$$ \begin{aligned} & \left( \frac{N}{2} - 1 \right) \left( \frac{P_d}{1 - \pi_{1,1}} - \frac{P_f}{1 - \pi_{1,0}} \right) > \frac{N}{2} \left( \frac{P_d}{\pi_{1,1}} - \frac{P_f}{\pi_{1,0}} \right) \\ \Rightarrow\; & \frac{N - 1}{2} \left( \frac{P_d}{1 - \pi_{1,1}} - \frac{P_f}{1 - \pi_{1,0}} \right) > \frac{N - 1}{2} \left( \frac{P_d}{\pi_{1,1}} - \frac{P_f}{\pi_{1,0}} \right) \\ \Leftrightarrow\; & (N - K^*) \left( \frac{P_d}{1 - \pi_{1,1}} - \frac{P_f}{1 - \pi_{1,0}} \right) > (K^* - 1) \left( \frac{P_d}{\pi_{1,1}} - \frac{P_f}{\pi_{1,0}} \right). \end{aligned} $$

This completes our proof.

Theorem 2. $(1, 0)$, $(0, 1)$, or $(1, 1)$ are the optimal attacking strategies $(P_{1,0}, P_{0,1})$ that maximize the probability of error $P_E$, when the majority fusion rule is employed at the FC and $\alpha < \min\{(0.5 - P_f), (1 - (m/P_d))\}$, where $m = \frac{N}{2N - 2}$.

Proof: Lemma 2 and Lemma 3 suggest that one of the corners is the maximum of $P_E$ because of quasi-convexity.
Note that $(0,0)$ cannot be the solution of the maximization problem since the attacker does not flip any results. Hence, we end up with three possibilities: $(1,0)$, $(0,1)$, or $(1,1)$.

Next, to gain insights into Theorem 2, we present illustrative examples that corroborate our results.

A. Illustrative Examples

In Figure 3(a), we plot the probability of error $P_E$ as a function of the attacking strategy $(P_{1,0}, P_{0,1})$ for even number of nodes, $N = 10$, in the network. We assume that the probability of detection is $P_d = 0.8$, the probability of false alarm is $P_f = 0.1$, prior probabilities are $(P_0 = 0.4, P_1 = 0.6)$, and $\alpha = 0.37$. Since $\alpha < \min\{(0.5 - P_f),\ (1 - (m/P_d))\}$, where $m = \frac{N}{2N-2}$, quasi-convexity can be observed in Figure 3(a). Figure 3(b) shows the probability of error $P_E$ as a function of attacking strategy $(P_{1,0}, P_{0,1})$ for odd number of nodes, $N = 11$, in the network. Similarly, quasi-convexity can be observed in Figure 3(b).

Fig. 3. (a) $P_E$ as a function of $(P_{1,0}, P_{0,1})$ for $N = 10$. (b) $P_E$ as a function of $(P_{1,0}, P_{0,1})$ for $N = 11$. [Surface plots; horizontal axes $P_{1,0}$ and $P_{0,1}$, vertical axis: Probability of Error $P_E$.]

It is evident from Figures 3(a) and 3(b) that the optimal attacking strategy $(P_{1,0}, P_{0,1})$ is either of the following three possibilities: $(1,0)$, $(0,1)$, or $(1,1)$. These results corroborate our theoretical results presented in Theorem 2. Observe that the results obtained for this case are not the same as the results obtained for the asymptotic case (please see Theorem 1). This is because the asymptotic performance measure (i.e., Chernoff information) is the exponential decay rate of the error probability of the “optimal detector”.
In other words, while optimizing over Chernoff information, one implicitly assumed that the optimal fusion rule is used at the FC.

Next, we investigate the case where the FC has the knowledge of attacker's strategies and uses the optimal fusion rule $K^*$ to make the global decision. Here, the attacker tries to maximize its worst case probability of error $\min_K P_E$ by choosing $(P_{1,0}, P_{0,1})$ optimally.

VI. OPTIMAL BYZANTINE ATTACKING STRATEGIES WITH STRATEGY-AWARE FC

In this section, we analyze the scenario where the FC has the knowledge of attacker's strategies and uses the optimal fusion rule $K^*$ to make the global decision. The Byzantine attack problem can be formally stated as follows:

$$\begin{aligned}
\underset{P_{j,0},\, P_{j,1}}{\text{maximize}} \quad & P_E(K^*, \alpha, P_{j,0}, P_{j,1}) \\
\text{subject to} \quad & 0 \le P_{j,0} \le 1 \\
& 0 \le P_{j,1} \le 1,
\end{aligned} \tag{P3}$$

where $K^*$ is the optimal fusion rule. In other words, $K^*$ is the best response of the FC to the Byzantine attacking strategies. Next, we find the expression for the optimal fusion rule $K^*$ used at the FC.

A. Optimal Fusion Rule

First, we design the optimal fusion rule assuming that the local sensor threshold $\lambda$ and the Byzantine attacking strategy $(\alpha, P_{1,0}, P_{0,1})$ are fixed and known to the FC.

Lemma 4. For a fixed local sensor threshold $\lambda$ and $\alpha < \frac{1}{P_{0,1} + P_{1,0}}$, the optimal fusion rule is given by

$$K^* \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\ln \left[ (P_0/P_1) \{ (1-\pi_{1,0}) / (1-\pi_{1,1}) \}^N \right]}{\ln \left[ \{ \pi_{1,1}(1-\pi_{1,0}) \} / \{ \pi_{1,0}(1-\pi_{1,1}) \} \right]}. \tag{37}$$

Proof: Consider the maximum a posteriori probability (MAP) rule

$$\frac{P(\mathbf{u} \mid H_1)}{P(\mathbf{u} \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P_0}{P_1}.$$

Since the $u_i$'s are independent of each other, the MAP rule simplifies to

$$\prod_{i=1}^{N} \frac{P(u_i \mid H_1)}{P(u_i \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P_0}{P_1}.$$

Let us assume that $K^*$ out of $N$ nodes send $u_i = 1$. Now, the above equation can be written as

$$\frac{\pi_{1,1}^{K^*} (1-\pi_{1,1})^{N-K^*}}{\pi_{1,0}^{K^*} (1-\pi_{1,0})^{N-K^*}} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P_0}{P_1}.$$
Taking logarithms on both sides of the above equation, we have

$$\begin{aligned}
& K^* \ln \pi_{1,1} + (N-K^*) \ln(1-\pi_{1,1}) - K^* \ln \pi_{1,0} - (N-K^*) \ln(1-\pi_{1,0}) \underset{H_0}{\overset{H_1}{\gtrless}} \ln \frac{P_0}{P_1} \\
\Leftrightarrow\ & K^* \left[ \ln(\pi_{1,1}/\pi_{1,0}) + \ln((1-\pi_{1,0})/(1-\pi_{1,1})) \right] \underset{H_0}{\overset{H_1}{\gtrless}} \ln \frac{P_0}{P_1} + N \ln((1-\pi_{1,0})/(1-\pi_{1,1}))
\end{aligned}$$

$$\Leftrightarrow\ K^* \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\ln \frac{P_0}{P_1} + N \ln((1-\pi_{1,0})/(1-\pi_{1,1}))}{\left[ \ln(\pi_{1,1}/\pi_{1,0}) + \ln((1-\pi_{1,0})/(1-\pi_{1,1})) \right]} \tag{38}$$

$$\Leftrightarrow\ K^* \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\ln \left[ (P_0/P_1) \{ (1-\pi_{1,0}) / (1-\pi_{1,1}) \}^N \right]}{\ln \left[ \{ \pi_{1,1}(1-\pi_{1,0}) \} / \{ \pi_{1,0}(1-\pi_{1,1}) \} \right]},$$

where (38) follows from the fact that, for $\pi_{1,1} > \pi_{1,0}$ or equivalently $\alpha < \frac{1}{P_{0,1} + P_{1,0}}$, $\left[ \ln(\pi_{1,1}/\pi_{1,0}) + \ln((1-\pi_{1,0})/(1-\pi_{1,1})) \right] > 0$.

The probability of false alarm $Q_F$ and the probability of detection $Q_D$ for this case are as given in (6) and (7) with $K = \lceil K^* \rceil$. Next, we present our results for the case when the fraction of Byzantines $\alpha > \frac{1}{P_{0,1} + P_{1,0}}$.

Lemma 5. For a fixed local sensor threshold $\lambda$ and $\alpha > \frac{1}{P_{0,1} + P_{1,0}}$, the optimal fusion rule is given by

$$K^* \underset{H_1}{\overset{H_0}{\gtrless}} \frac{\ln \left[ (P_1/P_0) \{ (1-\pi_{1,1}) / (1-\pi_{1,0}) \}^N \right]}{\left[ \ln(\pi_{1,0}/\pi_{1,1}) + \ln((1-\pi_{1,1})/(1-\pi_{1,0})) \right]}. \tag{39}$$

Proof: This can be proved similarly to Lemma 4, using the fact that, for $\pi_{1,1} < \pi_{1,0}$ or equivalently $\alpha > \frac{1}{P_{0,1} + P_{1,0}}$, $\left[ \ln(\pi_{1,0}/\pi_{1,1}) + \ln((1-\pi_{1,1})/(1-\pi_{1,0})) \right] > 0$.

The probability of false alarm $Q_F$ and the probability of detection $Q_D$ for this case can be calculated to be

$$Q_F = \sum_{i=0}^{\lfloor K^* \rfloor} \binom{N}{i} (\pi_{1,0})^i (1-\pi_{1,0})^{N-i} \tag{40}$$

and

$$Q_D = \sum_{i=0}^{\lfloor K^* \rfloor} \binom{N}{i} (\pi_{1,1})^i (1-\pi_{1,1})^{N-i}. \tag{41}$$

Next, we analyze the property of $P_E$ with respect to the Byzantine attacking strategy $(P_{1,0}, P_{0,1})$ that enables us to find the optimal attacking strategies.

Lemma 6. For a fixed local sensor threshold $\lambda$, assume that the FC employs the optimal fusion rule $\lceil K^* \rceil$,$^{6}$ as given in (37). Then, for $\alpha \le 0.5$, the error probability $P_E$ at the FC is a monotonically increasing function of $P_{1,0}$ while $P_{0,1}$ remains fixed. Conversely, the error probability $P_E$ at the FC is a monotonically increasing function of $P_{0,1}$ while $P_{1,0}$ remains fixed.

Proof: Observe that, for a fixed $\lambda$, $P_E(\lceil K^* \rceil)$ is a continuous but not a differentiable function. However, the function is non-differentiable only at a finite (or countably infinite) number of points because of the nature of $\lceil K^* \rceil$. Now observe that, for a fixed fusion rule $K$, $P_E(K)$ is differentiable. Utilizing this fact, to show that the lemma is true, we first find the condition that a fusion rule $K$ should satisfy so that $P_E$ is a monotonically increasing function of $P_{1,0}$ while keeping $P_{0,1}$ fixed (and vice versa), and later show that $\lceil K^* \rceil$ satisfies this condition. From (22), finding those $K$ that satisfy $\frac{dP_E}{dP_{1,0}} > 0$$^{7}$ is equivalent to finding those values of $K$ that make

$$\begin{aligned}
& r(P_{1,0}, K, \alpha) > 0 \\
\Leftrightarrow\ & \ln \left( \frac{P_0}{P_1} \frac{1-P_f}{1-P_d} \right) + (K-1) \ln \frac{\pi_{1,0}}{\pi_{1,1}} + (N-K) \ln \frac{1-\pi_{1,0}}{1-\pi_{1,1}} > 0 \\
\Leftrightarrow\ & K < \frac{\ln \frac{P_0}{P_1} + N \ln \frac{1-\pi_{1,0}}{1-\pi_{1,1}} + \ln \frac{1-P_f}{1-P_d} - \ln \frac{\pi_{1,0}}{\pi_{1,1}}}{\ln \left[ \{ \pi_{1,1}(1-\pi_{1,0}) \} / \{ \pi_{1,0}(1-\pi_{1,1}) \} \right]}.
\end{aligned} \tag{42}$$

Similarly, we can find the condition that a fusion rule $K$ should satisfy so that $P_E$ is a monotonically increasing function of $P_{0,1}$ while keeping $P_{1,0}$ fixed. From (28), finding those $K$ that satisfy $\frac{dP_E}{dP_{0,1}} > 0$ is equivalent to finding those $K$ that make

$$\begin{aligned}
& r(P_{0,1}, K, \alpha) > 0 \\
\Leftrightarrow\ & \ln \left( \frac{P_1}{P_0} \frac{P_d}{P_f} \right) + (K-1) \ln \frac{\pi_{1,1}}{\pi_{1,0}} + (N-K) \ln \frac{1-\pi_{1,1}}{1-\pi_{1,0}} > 0 \\
\Leftrightarrow\ & K > \frac{\ln \frac{P_0}{P_1} + N \ln \frac{1-\pi_{1,0}}{1-\pi_{1,1}} + \ln \frac{P_f}{P_d} - \ln \frac{\pi_{1,0}}{\pi_{1,1}}}{\ln \left[ \{ \pi_{1,1}(1-\pi_{1,0}) \} / \{ \pi_{1,0}(1-\pi_{1,1}) \} \right]}.
\end{aligned} \tag{43}$$

From (42) and (43), we have

$$A = \frac{\ln \frac{P_0}{P_1} + N \ln \frac{1-\pi_{1,0}}{1-\pi_{1,1}} + \ln \frac{1-P_f}{1-P_d} - \ln \frac{\pi_{1,0}}{\pi_{1,1}}}{\ln \left[ \{ \pi_{1,1}(1-\pi_{1,0}) \} / \{ \pi_{1,0}(1-\pi_{1,1}) \} \right]} > K > \frac{\ln \frac{P_0}{P_1} + N \ln \frac{1-\pi_{1,0}}{1-\pi_{1,1}} + \ln \frac{P_f}{P_d} - \ln \frac{\pi_{1,0}}{\pi_{1,1}}}{\ln \left[ \{ \pi_{1,1}(1-\pi_{1,0}) \} / \{ \pi_{1,0}(1-\pi_{1,1}) \} \right]} = B. \tag{44}$$

Next, we show that the optimal fusion rule $\lceil K^* \rceil$ given in (37) lies between $B$ and $A$, i.e., $A > \lceil K^* \rceil > B$. First, we prove that $\lceil K^* \rceil > B$ by showing $K^* > B$. Comparing $K^*$ given in (37) with $B$, $K^* > B$ iff

$$0 > \ln \frac{P_f}{P_d} - \ln \frac{\pi_{1,0}}{\pi_{1,1}}. \tag{45}$$

Since $P_d > P_f$, to prove (45) we start from the inequality

$$\begin{aligned}
& \frac{1-P_d}{P_d} < \frac{1-P_f}{P_f} \\
\Leftrightarrow\ & \frac{\alpha P_{1,0}(1-P_d) + P_d(1 - P_{0,1}\alpha)}{P_d} < \frac{\alpha P_{1,0}(1-P_f) + P_f(1 - P_{0,1}\alpha)}{P_f} \\
\Leftrightarrow\ & \frac{\pi_{1,1}}{P_d} < \frac{\pi_{1,0}}{P_f} \\
\Leftrightarrow\ & 0 > \ln \frac{P_f}{P_d} - \ln \frac{\pi_{1,0}}{\pi_{1,1}}.
\end{aligned}$$

Now, we show that $A > \lceil K^* \rceil$. Observe that

$$A > \lceil K^* \rceil \ \Leftrightarrow\ \frac{\ln \frac{1-P_f}{1-P_d} - \ln \frac{\pi_{1,0}}{\pi_{1,1}}}{\ln \left[ \{ \pi_{1,1}(1-\pi_{1,0}) \} / \{ \pi_{1,0}(1-\pi_{1,1}) \} \right]} > \lceil K^* \rceil - K^*.$$

Hence, it is sufficient to show that

$$\frac{\ln \frac{1-P_f}{1-P_d} - \ln \frac{\pi_{1,0}}{\pi_{1,1}}}{\ln \left[ \{ \pi_{1,1}(1-\pi_{1,0}) \} / \{ \pi_{1,0}(1-\pi_{1,1}) \} \right]} > 1 > \lceil K^* \rceil - K^*.$$

$1 > \lceil K^* \rceil - K^*$ is true from the property of the ceiling function. By (56), we have

$$\begin{aligned}
& \frac{1-P_f}{1-P_d} > \frac{1-\pi_{1,0}}{1-\pi_{1,1}} \\
\Leftrightarrow\ & \ln \frac{1-P_f}{1-P_d} > \ln \frac{1-\pi_{1,0}}{1-\pi_{1,1}} \\
\Leftrightarrow\ & \ln \frac{1-P_f}{1-P_d} - \ln \frac{\pi_{1,0}}{\pi_{1,1}} > \ln \left[ \{ \pi_{1,1}(1-\pi_{1,0}) \} / \{ \pi_{1,0}(1-\pi_{1,1}) \} \right] \\
\Leftrightarrow\ & \frac{\ln \frac{1-P_f}{1-P_d} - \ln \frac{\pi_{1,0}}{\pi_{1,1}}}{\ln \left[ \{ \pi_{1,1}(1-\pi_{1,0}) \} / \{ \pi_{1,0}(1-\pi_{1,1}) \} \right]} > 1,
\end{aligned}$$

which completes the proof.

$^{6}$ Notice that $K^*$ might not be an integer.

$^{7}$ Observe that, for $\alpha < 0.5$, the function $g(P_{1,0}, K^*, \alpha) = 0$ (as given in (23)) only under extreme conditions (i.e., $P_1 = 0$ or $P_d = 0$ or $P_d = 1$). Ignoring these extreme conditions, we have $g(P_{1,0}, K^*, \alpha) > 0$.
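The monotonicity stated in Lemma 6 can be checked numerically. The following Python sketch is illustrative only and rests on several assumptions: the function names, the evaluation grid, and the parameter values ($N = 11$, $P_d = 0.6$, $P_f = 0.4$, $P_0 = 0.4$, $\alpha = 0.4$) are chosen here for illustration; $P_E$ is taken to be $P_0 Q_F + P_1 (1 - Q_D)$, consistent with (18); $\pi_{1,1}$ is taken to have the same form as $\pi_{1,0}$ in (61) with $P_f$ replaced by $P_d$; and clamping $\lceil K^* \rceil$ to the range $[0, N+1]$ is an implementation convenience, not part of the analysis.

```python
import math


def pi_values(alpha, p10, p01, pd, pf):
    """Probability that a node reports u = 1 under H0 (pi10) and under H1 (pi11)."""
    pi10 = alpha * (p10 * (1 - pf) + (1 - p01) * pf) + (1 - alpha) * pf
    pi11 = alpha * (p10 * (1 - pd) + (1 - p01) * pd) + (1 - alpha) * pd
    return pi10, pi11


def upper_tail(n, k, p):
    """P(Binomial(n, p) >= k); the sum is empty (zero) when k > n."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(max(k, 0), n + 1))


def error_prob(k, n, p0, pi10, pi11):
    """P_E = P0 * Q_F + P1 * (1 - Q_D) for the K-out-of-N rule with K = k."""
    q_f = upper_tail(n, k, pi10)
    q_d = upper_tail(n, k, pi11)
    return p0 * q_f + (1 - p0) * (1 - q_d)


def k_star(n, p0, pi10, pi11):
    """Real-valued threshold of (37); assumes pi11 > pi10."""
    num = math.log(p0 / (1 - p0)) + n * math.log((1 - pi10) / (1 - pi11))
    den = math.log(pi11 * (1 - pi10) / (pi10 * (1 - pi11)))
    return num / den


def optimal_error(alpha, p10, p01, n=11, pd=0.6, pf=0.4, p0=0.4):
    """P_E at the FC when it uses the threshold ceil(K*) from (37)."""
    pi10, pi11 = pi_values(alpha, p10, p01, pd, pf)
    k = math.ceil(k_star(n, p0, pi10, pi11))
    k = min(max(k, 0), n + 1)  # clamp: k = n + 1 means "always decide H0"
    return error_prob(k, n, p0, pi10, pi11)


if __name__ == "__main__":
    alpha = 0.4  # illustrative value with alpha < 0.5
    grid = [i / 4 for i in range(5)]
    # Each printed row fixes P_{1,0}; entries vary P_{0,1} from 0 to 1.
    for p10 in grid:
        print(p10, [round(optimal_error(alpha, p10, p01), 4) for p01 in grid])
```

Under these assumptions, each printed row and each column should be non-decreasing, with the largest value at $(P_{1,0}, P_{0,1}) = (1, 1)$. The sketch does not cover $\alpha (P_{1,0} + P_{0,1}) \ge 1$, where $\pi_{1,1} \le \pi_{1,0}$ and (37) no longer applies.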
Based on Lemma 6, we present the optimal attacking strategies for the case when the FC has the knowledge regarding the strategies used by the Byzantines.

Theorem 3. The optimal attacking strategies, $(P^*_{1,0}, P^*_{0,1})$, which maximize the probability of error, $P_E(\lceil K^* \rceil)$, are given by

$$(P^*_{1,0}, P^*_{0,1}) = \begin{cases} (p_{1,0}, p_{0,1}) & \text{if } \alpha > 0.5 \\ (1, 1) & \text{if } \alpha \le 0.5 \end{cases}$$

where $(p_{1,0}, p_{0,1})$ satisfies $\alpha (p_{1,0} + p_{0,1}) = 1$.

Proof: Note that the maximum probability of error occurs when the posterior probabilities are equal to the prior probabilities of the hypotheses. That is,

$$P(H_i \mid \mathbf{u}) = P(H_i) \quad \text{for } i = 0, 1. \tag{46}$$

Now using the result from (13), the condition can be simplified to

$$\alpha (P_{1,0} + P_{0,1}) = 1. \tag{47}$$

Eq. (47) suggests that when $\alpha \ge 0.5$, the attacker can find flipping probabilities that make $P_E = \min\{P_0, P_1\}$. When $\alpha = 0.5$, $P_{1,0} = P_{0,1} = 1$ is the optimal attacking strategy and when $\alpha > 0.5$, any pair which satisfies $P_{1,0} + P_{0,1} = \frac{1}{\alpha}$ is optimal. However, when $\alpha < 0.5$, (47) cannot be satisfied. In this case, by Lemma 6, for $\alpha < 0.5$, $(1,1)$ is an optimal attacking strategy, $(P_{1,0}, P_{0,1})$, which maximizes the probability of error, $P_E(\lceil K^* \rceil)$.

Fig. 4. Minimum probability of error ($\min_K P_E$) analysis. (a) $\min_K P_E$ as a function of $(P_{1,0}, P_{0,1})$ for $\alpha = 0.4$. (b) $\min_K P_E$ as a function of $(P_{1,0}, P_{0,1})$ for $\alpha = 0.8$. [Surface plots; horizontal axes $P_{1,0}$ and $P_{0,1}$, vertical axis $\min_K P_E$.]

Next, to gain insight into Theorem 3, we present illustrative examples that corroborate our results.

B. Illustrative Examples

In Figure 4, we plot the minimum probability of error as a function of attacker's strategy $(P_{1,0}, P_{0,1})$, where $P_E$ is minimized over all possible fusion rules $K$.
We consider an $N = 11$ node network, with the nodes' detection and false alarm probabilities being $0.6$ and $0.4$, respectively. Prior probabilities are assumed to be $P_0 = 0.4$ and $P_1 = 0.6$. Observe that the optimal fusion rule as given in (37) changes with the attacker's strategy $(P_{1,0}, P_{0,1})$. Thus, the minimum probability of error $\min_K P_E$ is a non-differentiable function. It is evident from Figure 4(a) that $(P_{1,0}, P_{0,1}) = (1,1)$ maximizes the probability of error, $P_E(\lceil K^* \rceil)$. This corroborates our theoretical results presented in Theorem 3, that for $\alpha < 0.5$, the optimal attacking strategy, $(P_{1,0}, P_{0,1})$, that maximizes the probability of error, $P_E(\lceil K^* \rceil)$, is $(1,1)$. In Figure 4(b) we consider the scenario where $\alpha = 0.8$ (i.e., $\alpha > 0.5$). It can be seen that the attacking strategy $(P_{1,0}, P_{0,1})$ that maximizes $\min_K P_E$ is not unique in this case. It can be verified that any attacking strategy which satisfies $P_{1,0} + P_{0,1} = \frac{1}{0.8}$ will make $\min_K P_E = \min\{P_0, P_1\} = 0.4$. This corroborates our theoretical results presented in Theorem 3.

Observe that the results obtained for this case are consistent with the results obtained for the asymptotic case. This is because the optimal fusion rule is used at the FC and the asymptotic performance measure (i.e., Chernoff information) is the exponential decay rate of the error probability of the “optimal detector”, and thus implicitly assumes that the optimal fusion rule is used at the FC.

When the attacker does not have the knowledge of the fusion rule $K$ used at the FC, from an attacker's perspective, maximizing its local probability of error $P_e$ is the optimal attacking strategy. The optimal attacking strategy in this case is either of the three possibilities: $(P_{1,0}, P_{0,1}) = (0,1)$ or $(1,0)$ or $(1,1)$ (see Table II).
However, the FC has knowledge of the attacking strategy $(\alpha, P_{1,0}, P_{0,1})$ and thus uses the optimal fusion rule as given in (37) and (39).

VII. CONCLUSION AND FUTURE WORK

We considered the problem of distributed Bayesian detection with Byzantine data, and characterized the power of attack analytically. For distributed detection for a binary hypothesis testing problem, the expression for the minimum attacking power above which the ability to detect is completely destroyed was obtained. We showed that when there are more than 50% of Byzantines in the network, the data fusion scheme becomes blind and no detector can achieve any performance gain over the one based just on priors. The optimal attacking strategies for Byzantines that degrade the performance at the FC were obtained. It was shown that the results obtained for the non-asymptotic case are consistent with the results obtained for the asymptotic case only when the FC has the knowledge of the attacker's strategies, and thus uses the optimal fusion rule. However, results obtained for the non-asymptotic case, when the FC does not have knowledge of the attacker's strategies, are not the same as the results obtained for the asymptotic case. There are still many interesting questions that remain to be explored in future work, such as an analysis of the scenario where Byzantines can also control sensor thresholds used for making local decisions. Other questions, such as the case where Byzantines collude in several groups (collaborate) to degrade the detection performance, can also be investigated.

ACKNOWLEDGMENT

This work was supported in part by ARO under Grant W911NF-14-1-0339, AFOSR under Grant FA9550-10-1-0458 and National Science Council of Taiwan, under grants NSC 99-2221-E-011-158-MY3, NSC 101-2221-E-011-069-MY3. Han's work was completed during his visit to Syracuse University from 2012 to 2013.
APPENDIX A
PROOF OF $\frac{dr(P_{1,0}, K^*, \alpha)}{dP_{1,0}} > 0$

Differentiating $r(P_{1,0}, K^*, \alpha)$ with respect to $P_{1,0}$, we get

$$\frac{dr(P_{1,0}, K^*, \alpha)}{dP_{1,0}} = (K^*-1)\, \alpha \left( \frac{1-P_f}{\pi_{1,0}} - \frac{1-P_d}{\pi_{1,1}} \right) - (N-K^*)\, \alpha \left( \frac{1-P_f}{1-\pi_{1,0}} - \frac{1-P_d}{1-\pi_{1,1}} \right).$$

In the following we show that

$$\frac{dr(P_{1,0}, K^*, \alpha)}{dP_{1,0}} > 0, \tag{48}$$

i.e., $r(P_{1,0}, K^*, \alpha)$ is non-decreasing. Observe that in the above equation,

$$\frac{1-P_f}{\pi_{1,0}} > \frac{1-P_d}{\pi_{1,1}}. \tag{49}$$

To show that the above condition is true, we start from the inequality

$$P_d > P_f \tag{50}$$

$$\Leftrightarrow\ \frac{P_d}{1-P_d} > \frac{P_f}{1-P_f} \tag{51}$$

$$\Leftrightarrow\ \alpha P_{1,0} + (1 - P_{0,1}\alpha) \frac{P_d}{1-P_d} > \alpha P_{1,0} + (1 - P_{0,1}\alpha) \frac{P_f}{1-P_f} \tag{52}$$

$$\Leftrightarrow\ \frac{\alpha P_{1,0}(1-P_d) + P_d(1 - P_{0,1}\alpha)}{1-P_d} > \frac{\alpha P_{1,0}(1-P_f) + P_f(1 - P_{0,1}\alpha)}{1-P_f} \tag{53}$$

$$\Leftrightarrow\ \frac{\pi_{1,1}}{1-P_d} > \frac{\pi_{1,0}}{1-P_f} \tag{54}$$

$$\Leftrightarrow\ \frac{1-P_f}{\pi_{1,0}} > \frac{1-P_d}{\pi_{1,1}}. \tag{55}$$

Similarly, it can be shown that

$$\frac{1-\pi_{1,1}}{1-P_d} > \frac{1-\pi_{1,0}}{1-P_f}. \tag{56}$$

Now from (49) and (56), showing that $\frac{dr(P_{1,0}, K^*, \alpha)}{dP_{1,0}} > 0$ is equivalent to showing that

$$(K^*-1) \left( \frac{1-P_f}{\pi_{1,0}} - \frac{1-P_d}{\pi_{1,1}} \right) > (N-K^*) \left( \frac{1-P_f}{1-\pi_{1,0}} - \frac{1-P_d}{1-\pi_{1,1}} \right). \tag{57}$$

Next, we consider two different cases: first, when there is an odd number of nodes in the network, and second, when there is an even number of nodes in the network.

Odd Number of Nodes: When there is an odd number of nodes in the network, the majority fusion rule is $K^* = (N+1)/2$. In this case, showing (57) is equivalent to showing that

$$\frac{N-1}{2} \left( \frac{1-P_f}{\pi_{1,0}} - \frac{1-P_d}{\pi_{1,1}} \right) > \frac{N-1}{2} \left( \frac{1-P_f}{1-\pi_{1,0}} - \frac{1-P_d}{1-\pi_{1,1}} \right). \tag{58}$$

To show that the above condition is true, we start from the following inequality:

$$\begin{aligned}
& \frac{(1-\pi_{1,0})(1-\pi_{1,1})}{\pi_{1,0}\, \pi_{1,1}} > -1 \\
\Leftrightarrow\ & \frac{1}{\pi_{1,0}} - \frac{1}{\pi_{1,1}} > \frac{1}{1-\pi_{1,0}} - \frac{1}{1-\pi_{1,1}} \\
\Leftrightarrow\ & \frac{1}{\pi_{1,0}} - \frac{1}{1-\pi_{1,0}} > \frac{1}{\pi_{1,1}} - \frac{1}{1-\pi_{1,1}}.
\end{aligned}$$

Since $\frac{1-P_f}{1-P_d} > 1$, $\pi_{1,0} < 0.5$ (a consequence of our assumption) and $N \ge 2$, the above condition implies

$$\begin{aligned}
& \frac{1-P_f}{1-P_d} \left[ \frac{1}{\pi_{1,0}} - \frac{1}{1-\pi_{1,0}} \right] > \frac{1}{\pi_{1,1}} - \frac{1}{1-\pi_{1,1}} \\
\Leftrightarrow\ & \frac{1-P_f}{\pi_{1,0}} - \frac{1-P_d}{\pi_{1,1}} > \frac{1-P_f}{1-\pi_{1,0}} - \frac{1-P_d}{1-\pi_{1,1}} \\
\Leftrightarrow\ & \frac{N-1}{2} \left( \frac{1-P_f}{\pi_{1,0}} - \frac{1-P_d}{\pi_{1,1}} \right) > \frac{N-1}{2} \left( \frac{1-P_f}{1-\pi_{1,0}} - \frac{1-P_d}{1-\pi_{1,1}} \right),
\end{aligned} \tag{59}$$

which implies that $\frac{dr(P_{1,0}, K^*, \alpha)}{dP_{1,0}} > 0$ for the odd number of nodes case. Next, we consider the even number of nodes case.

Even Number of Nodes: Now, we consider the case when there is an even number of nodes in the network and the majority fusion rule is given by $K^* = \frac{N}{2} + 1$. Showing condition (57) is equivalent to showing that

$$\frac{N}{2} \left( \frac{1-P_f}{\pi_{1,0}} - \frac{1-P_d}{\pi_{1,1}} \right) > \left( \frac{N}{2} - 1 \right) \left( \frac{1-P_f}{1-\pi_{1,0}} - \frac{1-P_d}{1-\pi_{1,1}} \right),$$

which follows from the fact that

$$\frac{N}{2} \left( \frac{1-P_f}{\pi_{1,0}} - \frac{1-P_d}{\pi_{1,1}} \right) > \left( \frac{N}{2} - 1 \right) \left( \frac{1-P_f}{\pi_{1,0}} - \frac{1-P_d}{\pi_{1,1}} \right)$$

and the result given in (58). This completes our proof.

APPENDIX B
CALCULATING PARTIAL DERIVATIVE OF $P_E$ W.R.T. $P_{1,0}$

First, we calculate the partial derivative of $Q_F$ with respect to $P_{1,0}$. Notice that

$$Q_F = \sum_{i=K^*}^{N} \binom{N}{i} (\pi_{1,0})^i (1-\pi_{1,0})^{N-i}, \tag{60}$$

where

$$\pi_{1,0} = \alpha \left( P_{1,0}(1-P_f) + (1-P_{0,1}) P_f \right) + (1-\alpha) P_f, \tag{61}$$

$$(\pi_{1,0})' = d\pi_{1,0}/dP_{1,0} = \alpha (1-P_f). \tag{62}$$

Differentiating both sides of (60) with respect to $P_{1,0}$, we get

$$\begin{aligned}
\frac{dQ_F}{dP_{1,0}} ={}& \binom{N}{K^*} \left( K^* (\pi_{1,0})^{K^*-1} (\pi_{1,0})' (1-\pi_{1,0})^{N-K^*} - (\pi_{1,0})^{K^*} (N-K^*) (1-\pi_{1,0})^{N-K^*-1} (\pi_{1,0})' \right) \\
& + \binom{N}{K^*+1} \left( (K^*+1) (\pi_{1,0})^{K^*} (\pi_{1,0})' (1-\pi_{1,0})^{N-K^*-1} - (\pi_{1,0})^{K^*+1} (N-K^*-1) (1-\pi_{1,0})^{N-K^*-2} (\pi_{1,0})' \right) \\
& + \cdots + \binom{N}{N} \left( N (\pi_{1,0})^{N-1} (\pi_{1,0})' - 0 \right) \\
={}& (\pi_{1,0})' (\pi_{1,0})^{K^*-1} (1-\pi_{1,0})^{N-K^*} \left[ \binom{N}{K^*} \left( K^* - \frac{\pi_{1,0}}{1-\pi_{1,0}} (N-K^*) \right) \right. \\
& \left. + \binom{N}{K^*+1} \left( (K^*+1) \frac{\pi_{1,0}}{1-\pi_{1,0}} - (N-K^*-1) \left( \frac{\pi_{1,0}}{1-\pi_{1,0}} \right)^2 \right) + \cdots \right] \\
={}& (\pi_{1,0})' (\pi_{1,0})^{K^*-1} (1-\pi_{1,0})^{N-K^*} \left[ \binom{N}{K^*} \left( K^* - \frac{\pi_{1,0}}{1-\pi_{1,0}} (N-K^*) \right) \right. \\
& \left. + \frac{\pi_{1,0}}{1-\pi_{1,0}} \binom{N}{K^*+1} \left( (K^*+1) - (N-K^*-1) \frac{\pi_{1,0}}{1-\pi_{1,0}} \right) + \cdots \right] \\
={}& (\pi_{1,0})' (\pi_{1,0})^{K^*-1} (1-\pi_{1,0})^{N-K^*} \left[ \binom{N}{K^*} K^* + \left[ -\frac{\pi_{1,0}}{1-\pi_{1,0}} \binom{N}{K^*} (N-K^*) + \frac{\pi_{1,0}}{1-\pi_{1,0}} \binom{N}{K^*+1} (K^*+1) \right] + \cdots \right].
\end{aligned}$$

Since $\binom{N}{K^*} \frac{K^*}{N} = \binom{N-1}{K^*-1}$, the above equation can be written as

$$\frac{dQ_F}{dP_{1,0}} = (\pi_{1,0})' (\pi_{1,0})^{K^*-1} (1-\pi_{1,0})^{N-K^*} \left[ \binom{N-1}{K^*-1} N + \frac{\pi_{1,0}}{1-\pi_{1,0}} \left( \binom{N}{K^*+1} (K^*+1) - \binom{N}{K^*} (N-K^*) \right) + \cdots \right]. \tag{63}$$

Notice that, for any positive integer $t$,

$$\left( \frac{\pi_{1,0}}{1-\pi_{1,0}} \right)^t \left[ \binom{N}{K^*+t} (K^*+t) - \binom{N}{K^*+t-1} (N-K^*-t+1) \right] = 0. \tag{64}$$

Using the result from (64), (63) can be written as

$$\begin{aligned}
& \frac{dQ_F}{dP_{1,0}} = (\pi_{1,0})' (\pi_{1,0})^{K^*-1} (1-\pi_{1,0})^{N-K^*} \left[ \binom{N-1}{K^*-1} N + \frac{\pi_{1,0}}{1-\pi_{1,0}} [0] + \cdots + [0] \right] \\
\Leftrightarrow\ & \frac{dQ_F}{dP_{1,0}} = \alpha (1-P_f)\, N \binom{N-1}{K^*-1} (\pi_{1,0})^{K^*-1} (1-\pi_{1,0})^{N-K^*}.
\end{aligned}$$

Similarly, the partial derivative of $Q_D$ w.r.t. $P_{1,0}$ can be calculated to be

$$\frac{dQ_D}{dP_{1,0}} = \alpha (1-P_d)\, N \binom{N-1}{K^*-1} (\pi_{1,1})^{K^*-1} (1-\pi_{1,1})^{N-K^*}.$$

REFERENCES

[1] P. K. Varshney, Distributed Detection and Data Fusion. New York: Springer-Verlag, 1997.
[2] R. Viswanathan and P. K. Varshney, “Distributed detection with multiple sensors: Part I - Fundamentals,” Proc. IEEE, vol. 85, no. 1, pp. 54–63, Jan. 1997.
[3] V. Veeravalli and P. K. Varshney, “Distributed inference in wireless sensor networks,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 370, pp. 100–117, 2012.
[4] B. Wu, J. Chen, J. Wu, and M. Cardei, “A survey of attacks and countermeasures in mobile ad hoc networks,” Wireless/Mobile Network Security, Springer, vol. 17, pp. 103–135, 2007.
[5] S. A. Kassam and H. V. Poor, “Robust techniques for signal processing: A survey,” Proc. IEEE, vol. 73, no. 3, pp. 433–481, 1985.
[6] S. Jaggi, M. Langberg, S. Katti, T. Ho, D. Katabi, and M. Medard, “Resilient network coding in the presence of byzantine adversaries,” in Proc. 26th IEEE Int. Conf. on Computer Commun., INFOCOM, (Anchorage, AK), 2007, pp. 616–624.
[7] L. Lamport, R. Shostak, and M. Pease, “The byzantine generals problem,” ACM Trans. Program. Lang. Syst., vol. 4, no. 3, pp. 382–401, Jul. 1982. [Online]. Available: http://doi.acm.org/10.1145/357172.357176
[8] A. Vempaty, L. Tong, and P. Varshney, “Distributed Inference with Byzantine Data: State-of-the-Art Review on Data Falsification Attacks,” Signal Processing Magazine, IEEE, vol. 30, no. 5, pp. 65–75, 2013.
[9] A. Fragkiadakis, E. Tragos, and I. Askoxylakis, “A survey on security threats and detection techniques in cognitive radio networks,” IEEE Communications Surveys Tutorials, vol. 15, no. 1, pp. 428–445, 2013.
[10] H. Rifà-Pous, M. J. Blasco, and C. Garrigues, “Review of robust cooperative spectrum sensing techniques for cognitive radio networks,” Wirel. Pers. Commun., vol. 67, no. 2, pp. 175–198, Nov. 2012. [Online]. Available: http://dx.doi.org/10.1007/s11277-011-0372-x
[11] S. Marano, V. Matta, and L. Tong, “Distributed detection in the presence of byzantine attacks,” IEEE Trans. Signal Process., vol. 57, no. 1, pp. 16–29, Jan. 2009.
[12] A. Rawat, P. Anand, H. Chen, and P. Varshney, “Collaborative spectrum sensing in the presence of byzantine attacks in cognitive radio networks,” IEEE Trans. Signal Process., vol. 59, no. 2, pp. 774–786, Feb. 2011.
[13] B. Kailkhura, S. Brahma, and P. K. Varshney, “Optimal byzantine attack on distributed detection in tree based topologies,” in Proc. International Conference on Computing, Networking and Communications Workshops (ICNC-2013), San Diego, CA, January 2013, pp. 227–231.
[14] B. Kailkhura, S. Brahma, Y. S. Han, and P. K. Varshney, “Optimal distributed detection in the presence of byzantines,” in Proc. The 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), Vancouver, Canada, May 2013.
[15] A. Vempaty, K. Agrawal, H. Chen, and P. K. Varshney, “Adaptive learning of byzantines' behavior in cooperative spectrum sensing,” in Proc. IEEE Wireless Comm. and Networking Conf. (WCNC), Mar. 2011, pp. 1310–1315.
[16] B. Kailkhura, S. Brahma, Y. S. Han, and P. K. Varshney, “Distributed Detection in Tree Topologies With Byzantines,” IEEE Trans. Signal Process., vol. 62, pp. 3208–3219, June 2014.
[17] R. Chen, J.-M. Park, and K. Bian, “Robust distributed spectrum sensing in cognitive radio networks,” in Proc. 27th Conf. Comput. Commun., Phoenix, AZ, 2008, pp. 1876–1884.
[18] E. Soltanmohammadi, M. Orooji, and M. Naraghi-Pour, “Decentralized hypothesis testing in wireless sensor networks in the presence of misbehaving nodes,” IEEE Trans. Inf. Forensics Security, vol. 8, no. 1, pp. 205–215, 2013.
[19] B. Kailkhura, Y. S. Han, S. Brahma, and P. K. Varshney, “Asymptotic Analysis of Distributed Bayesian Detection with Byzantine Data,” CoRR, vol. abs/1408.3434, 2014. [Online]. Available: http://arxiv.org/abs/1408.3434
[20] B. Kailkhura, Y. Han, S. Brahma, and P. Varshney, “On Covert Data Falsification Attacks on Distributed Detection Systems,” in Communications and Information Technologies (ISCIT), 2013 13th International Symposium on, Sept. 2013, pp. 412–417.
[21] J. N. Tsitsiklis, “Decentralized detection by a large number of sensors*,” Math. Control, Signals, and Systems, vol. 1, pp. 167–182, 1988.
[22] W. Shi, T. W. Sun, and R. D. Wesel, “Optimal binary distributed detection,” in Proc. The 33rd Asilomar Conference on Signals, Systems, and Computers, 1999, pp. 24–27.
[23] Q. Zhang, P. Varshney, and R. Wesel, “Optimal bi-level quantization of i.i.d. sensor observations for binary hypothesis testing,” IEEE Trans. Inf. Theory, vol. 48, no. 7, pp. 2105–2111, Jul. 2002.