Data Fusion Trees for Detection: Does Architecture Matter?

Wee Peng Tay, Student Member, IEEE, John N. Tsitsiklis, Fellow, IEEE, and Moe Z. Win, Fellow, IEEE

Abstract: We consider the problem of decentralized detection in a network consisting of a large number of nodes arranged as a tree of bounded height, under the assumption of conditionally independent, identically distributed observations. We characterize the optimal error exponent under a Neyman-Pearson formulation. We show that the Type II error probability decays exponentially fast with the number of nodes, and the optimal error exponent is often the same as that corresponding to a parallel configuration. We provide sufficient, as well as necessary, conditions for this to happen. For those networks satisfying the sufficient conditions, we propose a simple strategy that nearly achieves the optimal error exponent, and in which all non-leaf nodes need only send 1-bit messages.

Index Terms: Decentralized detection, error exponent, sensor networks.

I. INTRODUCTION

Most of the decentralized detection literature has been concerned with characterizing optimal detection strategies for particular sensor configurations; the comparison of the detection performance of different configurations is a rather unexplored area. We bridge this gap by considering

This research was supported, in part, by the National Science Foundation under contracts ECS-0426453 and ANI-0335256, the Charles Stark Draper Laboratory Robust Distributed Sensor Networks Program, and an Office of Naval Research Young Investigator Award N00014-03-1-0489. A preliminary version of this paper was presented at the 44th Annual Allerton Conference on Communication, Control, and Computing, Monticello, Illinois, September 2006. W.P. Tay, J.N. Tsitsiklis and M.Z. Win are with the Laboratory for Information and Decision Systems, MIT, Cambridge, MA, USA.
E-mail: {wptay, jnt, moewin}@mit.edu

SUBMITTED TO IEEE TRANS. INFORMATION THEORY

the asymptotic performance of bounded height tree networks. We analyze the dependence of the optimal error exponent on the network architecture, and characterize the optimal error exponent for a large class of tree networks.

The problem of optimal decentralized detection has attracted a lot of interest over the last twenty-five years. Tenney and Sandell [1] are the first to consider a decentralized detection system in which each of several sensors makes an observation and sends a summary (e.g., using a quantizer or other "transmission function") to a fusion center. Such a system is to be contrasted with a centralized one, where the raw observations are transmitted directly to the fusion center. The framework introduced in [1] involves a "star topology" or "parallel configuration": the fusion center is regarded as the root of a tree, while the sensors are the leaves, directly connected to the root. Several pieces of work follow, e.g., [2]-[12], all of which study the parallel configuration under a Neyman-Pearson or Bayesian criterion. A common goal of these references is to characterize the optimal transmission function, where optimality usually refers to the minimization of the probability of error or some other cost function at the fusion center. A typical result is that under the assumption of (conditionally) independent sensor observations, likelihood ratio quantizers are optimal; see [6] for a summary of such results.

The study of sensor networks other than the parallel configuration is initiated in [13], which considers a tandem configuration, as well as more general tree configurations, and characterizes optimal transmission strategies under a Bayesian formulation. Tree configurations are also discussed in [14]-[21], under various performance objectives.
In all but the simplest cases, the exact form of optimal strategies in tree configurations is difficult to derive. Most of these references focus on person-by-person (PBP) optimality and obtain necessary, but not sufficient, conditions for an optimal strategy. When the transmission functions are assumed to be finite-alphabet quantizers, typical results establish that under a conditional independence assumption, likelihood ratio quantizers are PBP optimal. However, finding the optimal quantizer thresholds requires the solution of a nonlinear system of equations, with as many equations as there are thresholds. As a consequence, computing the optimal thresholds or characterizing the overall performance is hard, even for networks of moderate size.

Because of these difficulties, the analysis and comparison of large sensor networks is apparently tractable only in an asymptotic regime that focuses on the rate of decay of the error probabilities as the number of sensors increases. For example, in the Neyman-Pearson framework, one can focus on minimizing the error exponent¹

g = limsup_{n→∞} (1/n) log β_n,

where β_n is the Type II error probability at the fusion center and n is the number of sensors, while keeping the Type I error probability less than some given threshold. Note our convention that error exponents are negative numbers. The magnitude of the error exponent, |g|, is commonly referred to as the rate of decay of the Type II error probability. A larger |g| would translate to a faster decay rate, hence a better detection performance. This problem has been studied in [22], for the case of a parallel configuration with a large number of sensors that receive independent, identically distributed (i.i.d.) observations.
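As a standalone numerical illustration of this convention (not from the paper), one can compute β_n exactly for a simple centralized test on n i.i.d. Bernoulli observations and watch (1/n) log β_n approach a negative constant. For the threshold test sketched below, Cramér's theorem gives the limit −D(Ber(t) ‖ Ber(p_1)); all parameter values are hypothetical.

```python
import math

def log_binom_pmf(n, k, p):
    # log of C(n,k) p^k (1-p)^(n-k), computed in log space to avoid underflow
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_type2_error(n, p1, t):
    # beta_n = P1(fraction of ones <= t): the test decides H0 when the count is small
    logs = [log_binom_pmf(n, k, p1) for k in range(int(t * n) + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(x - m) for x in logs))

def kl_bernoulli(a, b):
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

p1, t = 0.6, 0.35   # hypothetical H1 marginal and decision threshold
n = 5000
g = log_type2_error(n, p1, t) / n   # (1/n) log beta_n
rate = -kl_bernoulli(t, p1)         # Cramer-theorem prediction for the limit
print(g, rate)
```

For large n, the computed slope (1/n) log β_n is negative and close to the predicted exponent, matching the "larger |g| means faster decay" reading above.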
The asymptotic performance of another special configuration, involving n sensors arranged in tandem, has been studied in [23]-[25], under a Bayesian formulation. Necessary and sufficient conditions for the error probability to decrease to zero as n increases have been derived. However, even when the error probability decreases to zero, it apparently does so at a sub-exponential rate (see [26] for such a result for the Bayesian case). Accordingly, [25] argues that the tandem configuration is inefficient and suggests that as the number of sensors increases, the network "should expand more in a parallel than in [a] tandem" fashion.

Even though the error probabilities in a parallel configuration decrease exponentially, the energy consumption of having each sensor transmit directly to the fusion center can be too high. The energy consumption can be reduced by setting up a directed spanning in-tree, rooted at the fusion center. In a tree configuration, each non-leaf node combines its own observation (if any) with the messages it has received and forms a new message, which it transmits to another node. In this way, information from each node is propagated along a multi-hop path to the fusion center, but the information is "degraded" along the way. For the case where observations are obtained only at the leaves, it is not hard to see that the detection performance of such a tree cannot be better than that of a parallel configuration with the same number of leaves.

In this paper, we investigate the detection performance of a tree configuration under a Neyman-Pearson criterion. We restrict to trees with bounded height for two reasons. First, without a restriction on the height of the tree, performance can be poor (this is exemplified by tandem

¹ Throughout this paper, log stands for the natural logarithm.

October 27, 2021 DRAFT
networks in which, as remarked above, the error probability seems to decay at a sub-exponential rate). Second, bounded height translates to a bound on the delay until information reaches the fusion center.

As it is not apparent that the Type II error probability decays exponentially fast with the number of nodes in the network, we first show that under the bounded height assumption, exponential decay is possible. We then obtain the rather counterintuitive result that if leaves dominate (in the sense that asymptotically almost all nodes are leaves), then bounded height trees have the same asymptotic performance as the parallel configuration, even in non-trivial cases. (Such an equality is clear in some trivial cases, e.g., the configuration shown in Figure 1, but is unexpected in general.) This result has important ramifications: a system designer can reduce the energy consumption in a network (e.g., by employing an h-hop spanning tree that minimizes the overall energy consumption), without losing detection efficiency, under certain conditions.

Fig. 1. A tree network of height h, with n − h leaves. Its error probability is no larger than that of a parallel configuration with n − h leaves and a fusion center. If h is bounded while n increases, the optimal error exponent is the same as for a parallel configuration with n leaves.

We also provide a strategy in which each non-leaf node sends only a 1-bit message, and which nearly achieves the same performance as the parallel configuration. These results are counterintuitive for the following reasons: 1) messages are compressed to only one bit at each non-leaf node, so that "information" is lost along the way, whereas in the parallel configuration, no such compression occurs; 2) even though leaves dominate, there is no reason why the error exponent should be determined solely by the leaves.
For example, our discussion in Section V-E indicates that without the bounded height assumption, or if a Bayesian framework is assumed instead of the Neyman-Pearson formulation, then a generic tree network (of height greater than 1) performs strictly worse than a parallel configuration, even if leaves dominate.

Finally, under a mild additional assumption on the allowed transmission functions, we find that the sufficient conditions for achieving the same error exponent as a parallel configuration are also necessary.

The rest of this paper is organized as follows. In Section II, we present our model in detail. In Section III, we state the Neyman-Pearson problem, provide some motivating examples, and state the main results. In Section IV, we consider "relay trees," in which observations are only made at the leaves. In Section V, we prove the main results. Finally, in Section VI, we summarize and offer some concluding remarks.

II. PROBLEM FORMULATION

In this section, we introduce the model and the required notation. We consider a decentralized binary detection problem involving n − 1 sensors and a fusion center; we will be interested in the case where n increases to infinity. We are given two probability spaces (Ω, F, P_0) and (Ω, F, P_1), associated with two hypotheses H_0 and H_1. We use E_j to denote the expectation operator with respect to P_j. Each sensor v observes a random variable X_v taking values in some set X. Under either hypothesis H_j, j = 0, 1, the random variables X_v are i.i.d., with marginal distribution P_j^X.

A. Tree Networks

The configuration of the sensor network is represented by a directed tree T_n = (V_n, E_n). Here, V_n is the set of nodes, of cardinality n, and E_n is the set of directed arcs of the tree.
One of the nodes (the "root") represents the fusion center, and the remaining n − 1 nodes represent the remaining sensors. We will always use the special symbol f to denote the root of T_n. We assume that the arcs are oriented so that they all point towards the fusion center. In the sequel, whenever we use the term "tree," we mean a directed, rooted tree as described above. We will use the terminology "sensor" and "node" interchangeably. Moreover, the fusion center f will also be called a sensor, even though it plays the special role of fusing; whether the fusion center makes its own observation or not is irrelevant, since we are working in the large n regime, and we will assume it does not.

We say that node u is a predecessor of node v if there exists a directed path from u to v. In this case, we also say that v is a successor of u. An immediate predecessor of node v is a node u such that (u, v) ∈ E_n. An immediate successor is similarly defined. Let the set of immediate predecessors of v be C_n(v). If v is a leaf, C_n(v) is naturally defined to be empty. The length of a path is defined as the number of arcs in the path. The height of the tree T_n is the length of the longest path from a leaf to the root, and will be denoted by h_n.

Since we are interested in asymptotically large values of n, we will consider a sequence of trees (T_n)_{n≥1}. While we could think of the sequence as representing the evolution of the network as sensors are added, we do not require the sequence E_n to be an increasing sequence of sets; thus, the addition of a new sensor to T_n may result in some edges being deleted and some new edges being added. We define the height of a sequence of trees to be h = sup_{n≥1} h_n. We are interested in tree sequences of bounded height, i.e., h < ∞.
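The tree conventions above can be made concrete with a small standalone sketch (the arc list is a hypothetical example, not from the paper): arcs point toward the fusion center f, C_n(v) collects immediate predecessors, and the height is the longest leaf-to-root path length.

```python
# Hypothetical directed tree: f <- v1, v2; v1 <- a, b; v2 <- c; c <- d.
arcs = [("v1", "f"), ("v2", "f"), ("a", "v1"), ("b", "v1"), ("c", "v2"), ("d", "c")]

def C(v):
    # C_n(v): immediate predecessors of v (empty for a leaf)
    return [u for (u, w) in arcs if w == v]

def height(v):
    # length, in arcs, of the longest path from a leaf up to v
    return 0 if not C(v) else 1 + max(height(u) for u in C(v))

def predecessors(v):
    # all nodes with a directed path to v
    out = []
    for u in C(v):
        out += [u] + predecessors(u)
    return out

print(height("f"), sorted(predecessors("f")))
```

Here the tree has height 3, and every node other than f is a predecessor of f, matching the definitions in the text.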
Definition 1 (h-uniform tree): A tree T_n is said to be h-uniform if the length of every path from a leaf to the root is exactly h. A sequence of trees (T_n)_{n≥1} is said to be h-uniform if there exists some n_0 < ∞, so that for all n ≥ n_0, T_n is h-uniform.

For a tree with height h, we say that a node is at level k if it is connected to the fusion center via a path of length h − k. Hence the fusion center f is at level h, while in an h-uniform tree, all leaves are at level 0. Let l_n(v) be the number of leaves of the sub-tree rooted at the node v. (These are the leaves whose path to f goes through v.) Thus, l_n(f) is the total number of leaves. Let p_n(v) be the total number of predecessors of v, i.e., the total number of nodes in the sub-tree rooted at v, not counting v itself. Thus, p_n(f) = n − 1. We let A_n ⊂ V_n be the set of nodes whose immediate predecessors include leaves of the tree T_n. Finally, we let B_n ⊂ A_n be the set of nodes all of whose immediate predecessors are leaves; see Figure 2.

Fig. 2. Both nodes v and u belong to the set A_n, but only node u belongs to the set B_n.

B. Strategies

Given a tree T_n, consider a node v ≠ f. Node v receives messages Y_u from every u ∈ C_n(v) (i.e., from its immediate predecessors). Node v then uses a transmission function γ_v to encode and transmit a summary Y_v = γ_v(X_v, {Y_u : u ∈ C_n(v)}) of its own observation X_v, and of the received messages {Y_u : u ∈ C_n(v)}, to its immediate successor.² We constrain all messages to be symbols in a fixed alphabet T. Thus, if the in-degree of v is |C_n(v)| = d, then the transmission function γ_v maps X × T^d to T. Let Γ(d) be a given set of transmission functions that node v can choose from. In general, Γ(d) is a subset of the set of all possible mappings from X × T^d to T.
For example, Γ(d) is often assumed to be the set of quantizers whose outputs are the result of comparing likelihood ratios to some thresholds (cf. the definition of a Log-Likelihood Ratio Quantizer in Section III-B). For convenience, we denote the set of transmission functions for the leaves, Γ(0), by Γ. We assume that all transmissions are perfectly reliable.

Consider now the root f, and suppose that it has d immediate predecessors. It receives messages from its immediate predecessors, and based on this information, it decides between the two hypotheses H_0 and H_1, using a fusion rule γ_f : T^d → {0, 1}.³ Let Y_f be a binary-valued random variable indicating the decision of the fusion center.

We define a strategy for a tree T_n, with n − 1 nodes and a fusion center, as a collection of transmission functions, one for each node, and a fusion rule. In some cases, we will be considering strategies in which only the leaves make observations; every other node v simply fuses the messages it has received, and forwards a message Y_v = γ_v({Y_u : u ∈ C_n(v)}) to its immediate successor. A strategy of this type will be called a relay strategy. A tree network in which we restrict to relay strategies will be called a relay tree. If, in addition, the alphabet T is binary, we will use the terms 1-bit relay strategy and 1-bit relay tree. Finally, in a relay tree, nodes other than the root and the leaves will be called relay nodes.

² To simplify the notation, we suppress the dependence of X_v, Y_v, γ_v, etc. on n.

³ Recall that in centralized Neyman-Pearson detection, randomization can reduce the Type II error probability. Therefore, in general, the fusion center uses a randomized fusion rule to make its decision. Similarly, the transmission functions γ_v used by each node v can also be randomized.
We avoid any discussion of randomization to simplify the exposition, and because randomization is not required asymptotically, as will become apparent in Section V.

III. THE NEYMAN-PEARSON PROBLEM

In this section, we formulate the Neyman-Pearson decentralized detection problem in a tree network. We provide some motivating examples, and introduce our assumptions. Then, we give a summary of the main results.

Given a tree T_n, we require that the Type I error probability P_0(Y_f = 1) be no more than a given α ∈ (0, 1). A strategy is said to be admissible if it meets this constraint. We are interested in minimizing the Type II error probability P_1(Y_f = 0). Accordingly, we define β*(T_n) as the infimum of P_1(Y_f = 0), over all admissible strategies. Similarly, we define β*_R(T_n) as the infimum of P_1(Y_f = 0), over all admissible relay strategies. Typically, β*(T_n) or β*_R(T_n) will converge to zero as n → ∞. We are interested in the question of whether such convergence takes place exponentially fast, and in the exact value of the Type II error exponent, defined by

g* = limsup_{n→∞} (1/n) log β*(T_n),    g*_R = limsup_{n→∞} (1/l_n(f)) log β*_R(T_n).

Note that in the relay case, we use the total number of leaves l_n(f) instead of n in the definition of g*_R. This is because only the leaves make observations and therefore, g*_R measures the rate of error decay per observation.

We denote the Kullback-Leibler (KL) divergence of two probability measures, P and Q, as

D(P ‖ Q) = E_P[ log (dP/dQ) ],

where E_P is the expectation operator with respect to (w.r.t.) P. Suppose that X is a sensor observation. For any γ ∈ Γ, let the distribution of γ(X) be P_j^γ.
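For finite alphabets these divergences are finite sums. The standalone sketch below (hypothetical distributions, not from the paper) computes D(P_0^X ‖ P_1^X) together with the largest divergence D(P_0^γ ‖ P_1^γ) achievable by a deterministic 1-bit quantizer γ, illustrating that quantizing the observation can only reduce the KL divergence.

```python
import math
from itertools import product

def kl(p, q):
    # D(P || Q) = sum_x P(x) log(P(x)/Q(x)) for finite distributions
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Hypothetical observation distributions on a 4-letter alphabet.
p0 = [0.4, 0.3, 0.2, 0.1]
p1 = [0.1, 0.2, 0.3, 0.4]

def pushforward(p, labels):
    # distribution of the 1-bit message gamma(X) when gamma maps letter i to labels[i]
    return [sum(pi for pi, b in zip(p, labels) if b == y) for y in (0, 1)]

# Brute force over all non-constant deterministic binary quantizers gamma.
best = max(kl(pushforward(p0, lab), pushforward(p1, lab))
           for lab in product([0, 1], repeat=4) if len(set(lab)) == 2)

print(kl(p0, p1), best)   # the quantized divergence never exceeds the raw one
```

In this example the maximizing γ is a likelihood-ratio threshold rule, consistent with the quantizer-optimality results surveyed in [6].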
Note that −D(P_0^γ ‖ P_1^γ) ≤ 0 ≤ D(P_1^γ ‖ P_0^γ), with both inequalities being strict as long as the measures P_0^γ and P_1^γ are not indistinguishable. In the classical case of a parallel configuration, with n − 1 leaves directly connected to the fusion center, the optimal error exponent, denoted as g*_P, is given by [22]

g*_P = lim_{n→∞} (1/n) log β*(T_n) = − sup_{γ∈Γ} D(P_0^γ ‖ P_1^γ),    (1)

under Assumptions 1-2, stated in Section III-B below.

Our objective is to study g* and g*_R for different sequences of trees. In particular, we wish to obtain bounds on these quantities, develop conditions under which they are strictly negative (indicating exponential decay of error probabilities), and develop conditions under which they are equal to g*_P. At this point, under Assumptions 1-2, we can record two relations that are always true:

g*_P ≤ g*_R,    −D(P_0^X ‖ P_1^X) ≤ g* ≤ z g*_R,    (2)

where z = liminf_{n→∞} l_n(f)/n. The first inequality is true because all of the combining of messages that takes place in a relay network can be carried out internally, at the fusion center of a parallel network with the same number of leaves. The inequality −D(P_0^X ‖ P_1^X) ≤ g* follows from the fact that −D(P_0^X ‖ P_1^X) is the classical error exponent in a centralized system where all raw observations are transmitted directly to the fusion center. Finally, the inequality g* ≤ z g*_R follows because an optimal strategy is at least as good as an optimal relay strategy; the factor of z arises because we have normalized g*_R by l_n(f) instead of n.

For a sequence of trees of the form shown in Figure 1, it is easily seen that g* = g*_R = g*_P. In order to develop some insights into the problem, we now consider some less trivial examples.

A.
Motivating Examples

In the following examples, we restrict to relay strategies for simplicity, i.e., we are interested in characterizing the error exponent g*_R. However, most of our subsequent results hold without such a restriction, and similar statements can be made about the error exponent g* (cf. Theorem 1).

Example 1: Consider a 2-uniform sequence of trees, as shown in Figure 3, where each node v_i receives messages from m = (n − 3)/2 leaves (for simplicity, we assume that n is odd).

Fig. 3. A 2-uniform tree with two relay nodes.

Let us restrict to 1-bit relay strategies. Consider the fusion rule that declares H_0 iff both v_1 and v_2 send a 0. In order to keep the Type I error probability bounded by α, we view the message sent by each v_i as a local decision about the hypothesis, and require that its local Type I error probability be bounded by α/2. Furthermore, by viewing the sub-tree rooted at v_i as a parallel configuration, we can design strategies for each sub-tree so that

lim_{n→∞} (1/m) log P_1(Y_{v_i} = 0) = g*_P.    (3)

At the fusion center, the Type II error exponent is then given by

lim_{n→∞} (1/n) log β_n = lim_{n→∞} (1/n) log P_1(Y_{v_1} = 0, Y_{v_2} = 0)
= (1/2) lim_{n→∞} (1/m) log P_1(Y_{v_1} = 0) + (1/2) lim_{n→∞} (1/m) log P_1(Y_{v_2} = 0) = g*_P,

where the last equality follows from (3). This shows that the Type II error probability falls exponentially and, more surprisingly, that g*_R ≤ g*_P. In view of Eq. (2), we have g*_R = g*_P. It is not difficult to generalize this conclusion to all sequences of trees in which the number n − l_n(f) − 1 of relay nodes is bounded. For such sequences, we will also see that g* = g*_R (cf. Theorem 1(iii)).

Example 2: We now consider an example in which the number of relay nodes grows with n.
In Figure 4, we let both m and N be increasing functions of n (the total number of nodes), in a manner to be made explicit shortly.

Fig. 4. A 2-uniform tree with a large number of relay nodes.

Let us try to apply a similar argument as in Example 1, to see whether the optimal exponent of the parallel configuration can be achieved with a relay strategy, i.e., whether g*_R = g*_P. We let each node v_i use a local Neyman-Pearson test. We also let the fusion center declare H_0 iff it receives a 0 from all relay sensors. In order to have a hope of achieving the error exponent of the parallel configuration, we need to choose the local Neyman-Pearson test at each relay so that its local Type II error exponent is close to g*_P = − sup_{γ∈Γ} D(P_0^γ ‖ P_1^γ). However, the associated local Type I error cannot fall faster than exponentially, so we can assume it is bounded below by δ exp(−mε), for some δ, ε > 0, and for all m large enough. In that case, the overall Type I error probability (at the fusion center) is at least 1 − (1 − δ e^{−mε})^N. We then note that if N increases quickly with m (e.g., N = m^m), the Type I error probability approaches 1, and eventually exceeds α. Hence, we no longer have an admissible strategy. Thus, if there is a hope of achieving the optimal exponent g*_P of the parallel configuration, a more complicated fusion rule will have to be used.

Our subsequent results will establish that, similar to Example 1, the equalities g* = g*_R = g*_P also hold in Example 2. However, Example 2 shows that in order to achieve this optimal error exponent, we may need to employ nontrivial fusion rules at the fusion center (and for similar reasons at the relay nodes), and various thresholds will have to be properly tuned.
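The blow-up of the Type I error bound in Example 2 is easy to check numerically. The standalone sketch below (δ, ε, and m are hypothetical constants, not from the paper) evaluates the lower bound 1 − (1 − δe^{−mε})^N for a moderately growing N and for N = m^m:

```python
import math

# Hypothetical constants in the lower bound delta*exp(-m*eps) on the local
# Type I error of each relay's Neyman-Pearson test.
delta, eps = 0.1, 0.05

def overall_type1_lower_bound(m, N):
    # 1 - (1 - delta e^{-m eps})^N, via log1p/expm1 for numerical stability
    p_local = delta * math.exp(-m * eps)
    return -math.expm1(N * math.log1p(-p_local))

m = 30
slow = overall_type1_lower_bound(m, N=m)        # N growing like m
fast = overall_type1_lower_bound(m, N=m ** m)   # N = m^m
print(slow, fast)
```

With N = m^m the bound is numerically indistinguishable from 1, so the AND fusion rule is inadmissible for any α < 1, as argued in the text.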
The simplicity of the fusion rule in Example 1 is not representative. In our next example, the optimal error exponent is inferior (strictly larger) than that of a parallel configuration.

Example 3: Consider a sequence of 1-bit relay trees with the structure shown in Figure 5. Let the observations X_v at the leaves be i.i.d. Bernoulli random variables with parameter 1 − p under H_0, and parameter p under H_1, where 1/2 < p < 1.

Fig. 5. A 2-uniform tree, with m = l_n(f)/2.

Note that

g*_P = E_0[ log (dP_1^X / dP_0^X) ] = p log((1−p)/p) + (1−p) log(p/(1−p)).

We can identify this relay tree with a parallel configuration involving m nodes, with each node receiving an independent observation distributed as γ(X_1, X_2). Note that we can restrict the transmission function γ to be the same for all nodes v_1, ..., v_m [22], without loss of optimality. We have

lim_{n→∞} (1/m) log β*(T_n) = min_{γ∈Γ(2)} Σ_{j=0,1} P_0(γ(X_1, X_2) = j) log [ P_1(γ(X_1, X_2) = j) / P_0(γ(X_1, X_2) = j) ].    (4)

To minimize the right-hand side (R.H.S.) of (4), we only need to consider a small number of choices for γ. If γ(X_1, X_2) = X_1, we are effectively removing half of the original 2m nodes, and the resulting error exponent is g*_P/2, which is inferior to g*_P. Suppose now that γ is of the form γ(X_1, X_2) = 0 iff X_1 = X_2 = 0. Then, it is easy to see, after some calculations (omitted), that

lim_{n→∞} (1/m) log β*(T_n) = p² log[(1−p)²/p²] + (1 − p²) log[(1 − (1−p)²)/(1 − p²)] > 2 [ p log((1−p)/p) + (1−p) log(p/(1−p)) ],

and

lim_{n→∞} (1/l_n(f)) log β*(T_n) > p log((1−p)/p) + (1−p) log(p/(1−p)) = g*_P.

Finally, we need to consider γ of the form γ(X_1, X_2) = 1 iff X_1 = X_2 = 1. A similar calculation (omitted) shows that the resulting error exponent is again inferior.
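The comparisons in Example 3 are easy to verify numerically. The standalone sketch below evaluates the per-leaf exponents of the two "omitted calculation" quantizers for one particular p (any 1/2 < p < 1 would do):

```python
import math

def exponent(q0, q1):
    # sum_j P0(j) log(P1(j)/P0(j)) = -D(P0 || P1): the per-node exponent in (4)
    return sum(a * math.log(b / a) for a, b in zip(q0, q1) if a > 0)

p = 0.7   # leaves are Bernoulli(1-p) under H0 and Bernoulli(p) under H1

# Per-leaf exponent of the parallel configuration (raw observation forwarded).
g_P = exponent([p, 1 - p], [1 - p, p])

# gamma(X1, X2) = 0 iff X1 = X2 = 0: message distributions under H0 and H1.
g_and = exponent([p * p, 1 - p * p], [(1 - p) ** 2, 1 - (1 - p) ** 2])

# gamma(X1, X2) = 1 iff X1 = X2 = 1.
g_or = exponent([1 - (1 - p) ** 2, (1 - p) ** 2], [1 - p * p, p * p])

print(g_and / 2, g_or / 2, g_P)   # each relay summarizes 2 leaves
```

Both candidate 1-bit summaries yield a per-leaf exponent strictly larger (i.e., worse) than g*_P, matching the displayed inequality.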
We conclude that the relay network is strictly inferior to the parallel configuration, i.e., g*_P < g*_R. An explanation is provided by noting that this sequence of trees violates a necessary condition, developed in Section V-F, for the optimal error exponent to be the same as that of a parallel configuration; see Theorem 1(iv).

A comparison of the results for the previous examples suggests that we have g*_P = g*_R (respectively, g*_P < g*_R) whenever the degree of level 1 nodes increases (respectively, stays bounded) as n increases. That would still leave open the case of networks in which different level 1 nodes have different degrees, as in our next example.

Example 4: Consider a sequence of 2-uniform trees of the form shown in Figure 6. Each node v_i, i = 1, ..., m, has i + 1 leaves attached to it. We will see that the optimal error exponent is again the same as for a parallel configuration, i.e., g*_R = g* = g*_P (cf. Theorem 1(ii)).

Fig. 6. A 2-uniform tree, with l_n(v_i) = i + 1.

B. Assumptions

In this subsection, we list our assumptions. Assumptions 1 and 2 are similar to the assumptions made in the study of the parallel configuration (see [22]).

Assumption 1: The measures P_0^X and P_1^X are equivalent, i.e., they are absolutely continuous w.r.t. each other. Furthermore, there exists some γ ∈ Γ such that

−D(P_0^γ ‖ P_1^γ) < 0 < D(P_1^γ ‖ P_0^γ).

Assumption 2: E_0[ log² (dP_1^X / dP_0^X) ] < ∞.

Assumption 2 implies the following lemma; see [22] for a proof.

Lemma 1: There exists some a ∈ (0, ∞), such that for all γ ∈ Γ,

E_0[ log² (dP_1^γ / dP_0^γ) ] ≤ E_0[ log² (dP_1^X / dP_0^X) ] + 1 < a,    | E_0[ log (dP_1^γ / dP_0^γ) ] | < a.
Given an admissible strategy, and for each node v ∈ V_n, we consider the log-likelihood ratio of the distribution of Y_v (the message sent by v) under H_1, w.r.t. its distribution under H_0:

L_{v,n}(y) = log (dP^{(v)}_{1,n} / dP^{(v)}_{0,n})(y),

where dP^{(v)}_{1,n}/dP^{(v)}_{0,n} is the Radon-Nikodym derivative of the distribution of Y_v under H_1 w.r.t. that under H_0. If Y_v takes values in a discrete set, then this is just the log-likelihood ratio log [P_1(Y_v = y)/P_0(Y_v = y)]. For simplicity, we let L_{v,n} = L_{v,n}(Y_v) and define the log-likelihood ratio of the received messages at node v to be

S_n(v) = Σ_{u∈C_n(v)} L_{u,n}.

(Recall that C_n(v) is the set of immediate predecessors of v.) A (1-bit) Log-Likelihood Ratio Quantizer (LLRQ) with threshold t for a non-leaf node v, with |C_n(v)| = d, is a binary-valued function on T^d, defined by

LLRQ_{d,t}({y_u : u ∈ C_n(v)}) = 0 if x ≤ t, and 1 if x > t,

where

x = (1/l_n(v)) Σ_{u∈C_n(v)} L_{u,n}(y_u).    (5)

By definition, a node v that uses an LLRQ ignores its own observation X_v and acts as a relay. If all non-leaf nodes use an LLRQ, we have a special case of a relay strategy. We will assume that LLRQs are available choices of transmission functions for all non-leaf nodes.

Assumption 3: For all t ∈ R and d > 0, LLRQ_{d,t} ∈ Γ(d).

As already discussed (cf. Eq. (2)), the optimal performance of a relay tree is always dominated by that of a parallel configuration with the same number of leaves, i.e., g*_P ≤ g*_R. In Section V, we find sufficient conditions under which the equality g*_R = g*_P holds. Then, in Section V-F, we look into necessary conditions for this to be the case.
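A direct implementation of the 1-bit LLRQ is straightforward. The standalone sketch below (the message LLR values are hypothetical, not from the paper) follows Eq. (5): the received LLRs are summed, normalized by the number of leaves l_n(v) of the sub-tree, and compared to the threshold t.

```python
import math

def llrq(llrs, num_leaves, t):
    # 1-bit Log-Likelihood Ratio Quantizer, cf. Eq. (5): send 1 iff the
    # received LLRs, normalized by l_n(v), exceed the threshold t.
    x = sum(llrs) / num_leaves
    return 0 if x <= t else 1

# Toy usage: a level-1 node with 4 leaf predecessors, each sending one bit.
# Hypothetical per-message LLR log(P1(Y=y)/P0(Y=y)) for each bit value.
llr_of_bit = {0: math.log(0.3 / 0.7), 1: math.log(0.7 / 0.3)}
messages = [1, 1, 0, 1]
decision = llrq([llr_of_bit[y] for y in messages], num_leaves=4, t=0.0)
print(decision)
```

Since three of the four received bits favor H_1, the normalized LLR is positive and the node forwards a 1.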
It turns out that non-trivial necessary conditions for the equality g*_R = g*_P to hold are, in general, difficult to obtain, because they depend on the nature of the transmission functions available to the sensors. For example, if the sensors are allowed to simply forward undistorted all of the messages that they receive, then the equality g*_R = g*_P holds trivially. Hence, we need to impose some restrictions on the set of transmission functions available, as in the assumption that follows.

Assumption 4:
(a) There exists an n_0 ≥ 1 such that for all n ≥ n_0, we have l_n(v) > 1 for all v in the set B_n of nodes whose immediate predecessors are all leaves.
(b) Let X_1, X_2, ... be i.i.d. random variables under either hypothesis H_j, each with distribution P_j^X. For k > 1, γ_0 ∈ Γ(k), and γ_i ∈ Γ, i = 1, ..., k, let ξ = (γ_0, ..., γ_k). We also let ν^ξ_j be the distribution of γ_0(γ_1(X_1), ..., γ_k(X_k)) under hypothesis H_j. We assume that

g*_P < inf_{ξ ∈ Γ(k)×Γ^k} (1/k) E_0[ log (dν^ξ_1 / dν^ξ_0) ],    (6)

for all k > 1.

Assumption 4 holds in most cases of interest. Part (a) results in no loss of generality: if in a relay tree we have l_n(v) = 1 for some v ∈ B_n, we can remove the predecessor of v, and treat v as a leaf. Regarding part (b), it is easy to see that the left-hand side (L.H.S.) of (6) is always less than or equal to the R.H.S., hence we have only excluded those cases where (6) holds with equality. We are essentially assuming that when the messages γ_1(X_1), ..., γ_k(X_k) are summarized (or quantized) by γ_0, there is some loss of information, as measured by the associated KL divergences.

C. Main Results

In this section, we collect and summarize our main results.
The asymptotic proportion of nodes that are leaves, defined by $z = \liminf_{n\to\infty} l_n(f)/n$, plays a critical role.

Theorem 1: Consider a sequence of trees, $(T_n)_{n\ge 1}$, of bounded height. Suppose that Assumptions 1-3 hold. Then,
(i) $g^*_P \le g^*_R < 0$ and $-D(\mathbb{P}^X_0 \| \mathbb{P}^X_1) \le g^* \le z\, g^*_R < 0$.
(ii) If $z = 1$, then $g^*_P = g^* = g^*_R$.
(iii) If the number of non-leaf nodes is bounded, or if $\min_{v\in B_n} l_n(v) \to \infty$, then $g^*_P = g^* = g^*_R$.
(iv) If Assumption 4 also holds, we have $g^*_R = g^*_P$ iff $z = 1$.

Note that part (i) follows from (2), except for the strict negativity of the error exponents, which is established in Proposition 2. Part (ii) is proved in Proposition 3. Part (iii) is proved in Corollary 1. (Recall that $B_n$ is the set of non-leaf nodes all of whose immediate predecessors are leaves.) Part (iv) is proved in Proposition 5. One might also have expected a result asserting that $g^*_P \le g^*$. However, this is not true without additional assumptions, as will be discussed in Section V-F.

IV. ERROR BOUNDS FOR h-UNIFORM RELAY TREES

In this section, we consider a 1-bit $h$-uniform relay tree, in which all relay nodes at level $k$ use an LLRQ with a common threshold $t_k$. We wish to develop upper bounds for the error probabilities at the various nodes. We do this recursively, by moving along the levels of the tree, starting from the leaves. Given bounds on the error probabilities associated with the messages received by a node, we develop a bound on the log-moment generating function at that node (cf. Eq. (8)), and then use the standard Chernoff bound technique to develop a bound on the error probability for the message sent by that node (cf. Eq. (7)).

Let $t^{(k)} = (t_1, t_2, \ldots, t_k)$, for $k \ge 1$, and $t^{(0)} = \emptyset$. For $j = 0, 1$, $k \ge 1$, and $\lambda \in \mathbb{R}$, we define, recursively,
\[ \Lambda_{j,0}(\gamma; \lambda) = \Lambda_{j,0}(\gamma, \emptyset; \lambda) = \log \mathbb{E}_j\Big[ \Big( \frac{d\mathbb{P}^\gamma_1}{d\mathbb{P}^\gamma_0} \Big)^{\lambda} \Big], \]
\[ \Lambda^*_{j,k}(\gamma, t^{(k)}) = \sup_{\lambda \in \mathbb{R}} \big\{ \lambda t_k - \Lambda_{j,k-1}(\gamma, t^{(k-1)}; \lambda) \big\}, \tag{7} \]
\[ \Lambda_{j,k}(\gamma, t^{(k)}; \lambda) = \max\big\{ -\Lambda^*_{1,k}(\gamma, t^{(k)})(j + \lambda),\ \Lambda^*_{0,k}(\gamma, t^{(k)})(j - 1 + \lambda) \big\}. \tag{8} \]
The operation in (7) is known as the Fenchel-Legendre transform of $\Lambda_{j,k-1}(\gamma, t^{(k-1)}; \lambda)$ [27]. We will be interested in the case where
\[ -D(\mathbb{P}^\gamma_0 \| \mathbb{P}^\gamma_1) < 0 < D(\mathbb{P}^\gamma_1 \| \mathbb{P}^\gamma_0), \tag{9} \]
\[ t_1 \in \big( -D(\mathbb{P}^\gamma_0 \| \mathbb{P}^\gamma_1),\ D(\mathbb{P}^\gamma_1 \| \mathbb{P}^\gamma_0) \big), \tag{10} \]
\[ t_k \in \big( -\Lambda^*_{1,k-1}(\gamma, t^{(k-1)}),\ \Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) \big), \quad \text{for } 1 < k \le h. \tag{11} \]

We now provide an inductive argument to show that the above requirements on the thresholds $t_k$ are feasible. From Assumption 1, there exists a $\gamma \in \Gamma$ that satisfies (9); hence the constraint (10) is feasible. Furthermore, the $\Lambda^*_{j,1}(\gamma, t^{(1)})$ are large deviations rate functions and are therefore positive when $t_1$ satisfies (10) [27]. Suppose now that $k > 1$ and that $\Lambda^*_{j,k-1}(\gamma, t^{(k-1)}) > 0$. From (8), $\Lambda_{j,k-1}(\gamma, t^{(k-1)}; \lambda)$ is the maximum of two linear functions of $\lambda$ (see Figure 7). Taking the Fenchel-Legendre transform, and since $t_k$ satisfies (11), we obtain $\Lambda^*_{j,k}(\gamma, t^{(k)}) > 0$, which completes the induction.

Fig. 7. Typical plot of $\Lambda_{0,k-1}(\gamma, t^{(k-1)}; \lambda)$, $k \ge 2$: the maximum of two linear functions of $\lambda$, with slopes $-\Lambda^*_{1,k-1}(\gamma, t^{(k-1)})$ and $\Lambda^*_{0,k-1}(\gamma, t^{(k-1)})$. [Figure omitted.]

From the definitions of $\Lambda_{j,k}$ and $\Lambda^*_{j,k}$, the following relations can be established. The proof consists of straightforward algebraic manipulations and is omitted.

Lemma 2: Suppose that $\gamma \in \Gamma$ satisfies (9), and $t^{(h)}$ satisfies (10)-(11). For $k \ge 1$, we have
\[ \Lambda^*_{1,k}(\gamma, t^{(k)}) = \Lambda^*_{0,k}(\gamma, t^{(k)}) - t_k. \]
Furthermore, the supremum in (7) is achieved at some $\lambda \in (-1, 0)$ for $j = 1$, and at some $\lambda \in (0, 1)$ for $j = 0$. For $k \ge 2$, we have
\[ \Lambda^*_{1,k}(\gamma, t^{(k)}) = \frac{\Lambda^*_{1,k-1}(\gamma, t^{(k-1)}) \big( \Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) - t_k \big)}{\Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) + \Lambda^*_{1,k-1}(\gamma, t^{(k-1)})}, \]
\[ \Lambda^*_{0,k}(\gamma, t^{(k)}) = \frac{\Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) \big( \Lambda^*_{1,k-1}(\gamma, t^{(k-1)}) + t_k \big)}{\Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) + \Lambda^*_{1,k-1}(\gamma, t^{(k-1)})}. \]

Proposition 1 below, whose proof is provided in the Appendix, will be our main tool in obtaining upper bounds on error probabilities. It shows that the Type I and Type II error exponents are essentially upper bounded by $-\Lambda^*_{0,h}(\gamma, t^{(h)})$ and $-\Lambda^*_{1,h}(\gamma, t^{(h)})$, respectively. Recall that $p_n(v)$ is the total number of predecessors of $v$, $l_n(v)$ is the number of leaves in the subtree rooted at $v$, and $B_n$ is the set of nodes all of whose immediate predecessors are leaves.

Proposition 1: Fix some $h \ge 1$, and consider a sequence of trees $(T_n)_{n\ge 1}$ such that, for all $n \ge n_0$, $T_n$ is $h$-uniform. Suppose that Assumptions 1-2 hold. Suppose that, for every $n$, every leaf uses the same transmission function $\gamma \in \Gamma$, which satisfies (9), and that every level $k$ node ($k \ge 1$) uses an LLRQ with threshold $t_k$, satisfying (10)-(11).
(i) For all nodes $v$ of level $k \ge 1$ and for all $n \ge n_0$, we have
\[ \frac{1}{l_n(v)} \log \mathbb{P}_1\Big( \frac{S_n(v)}{l_n(v)} \le t_k \Big) \le -\Lambda^*_{1,k}(\gamma, t^{(k)}) + \frac{p_n(v)}{l_n(v)} - 1, \]
\[ \frac{1}{l_n(v)} \log \mathbb{P}_0\Big( \frac{S_n(v)}{l_n(v)} > t_k \Big) \le -\Lambda^*_{0,k}(\gamma, t^{(k)}) + \frac{p_n(v)}{l_n(v)} - 1. \]
(ii) Suppose that for all $n \ge n_0$ and all $v \in B_n$, we have $l_n(v) \ge N$. Then, for all $n \ge n_0$, we have
\[ \frac{1}{l_n(f)} \log \mathbb{P}_1\Big( \frac{S_n(f)}{l_n(f)} \le t_h \Big) \le -\Lambda^*_{1,h}(\gamma, t^{(h)}) + \frac{h}{N}, \]
\[ \frac{1}{l_n(f)} \log \mathbb{P}_0\Big( \frac{S_n(f)}{l_n(f)} > t_h \Big) \le -\Lambda^*_{0,h}(\gamma, t^{(h)}) + \frac{h}{N}. \]

V.
OPTIMAL ERROR EXPONENT

In this section, we show that the Type II error probability in a sequence of bounded-height trees falls exponentially fast with the number of nodes. We derive sufficient conditions for the error exponent to be the same as that of a parallel configuration. We show that if almost all of the nodes are leaves, i.e., $z = 1$, then $g^*_P = g^* = g^*_R$. The condition $z = 1$ is also equivalent to another condition, requiring that the proportion of leaves attached to bounded-degree nodes vanishes asymptotically. We also show that, under some additional mild assumptions, this sufficient condition is necessary. We start with some graph-theoretic preliminaries.

A. Properties of Trees

In this section, we define various quantities associated with a tree, and derive a few elementary relations that will be used later. Recall that $B_n$ is the set of non-leaf nodes all of whose predecessors are leaves. (For an $h$-uniform tree, $B_n$ is the set of all level 1 nodes.) For $N > 0$, let
\[ F_{N,n} = \{ v \in B_n : l_n(v) \le N \}, \qquad F^c_{N,n} = \{ v \in B_n : l_n(v) > N \}, \tag{12} \]
and
\[ q_{N,n} = \frac{1}{l_n(f)} \sum_{v \in F_{N,n}} l_n(v), \tag{13} \]
where the sum is taken to be zero if the set $F_{N,n}$ is empty. Let $q_N = \limsup_{n\to\infty} q_{N,n}$. For a sequence of $h$-uniform trees, this is the asymptotic proportion of leaves that belong to "small" subtrees in the network.

It turns out that it is easier to work with $h$-uniform trees. For this reason, we show how to transform any tree of height $h$ into an $h$-uniform tree.

Height Uniformization Procedure. Consider a tree $T_n = (V_n, E_n)$ of height $h$, and a node $v$ that has at least one leaf as an immediate predecessor ($v \in A_n$). Let $D_n$ be the set of leaves that are immediate predecessors of $v$, and whose paths to the fusion center $f$ are of length $k < h$. Add $h-k$ nodes, $\{u_j : j = 1, \ldots, h-k\}$, to $V_n$.
We then remove the edges $(u, v)$, for all $u \in D_n$; add the edges $(u_1, v)$ and $(u_{j+1}, u_j)$, for $j = 1, \ldots, h-k-1$; and add the edges $(u, u_{h-k})$, for all $u \in D_n$. This procedure is repeated for all $v \in A_n$. The resulting tree is $h$-uniform.

The height uniformization procedure essentially adds more nodes to the network, and re-attaches some leaves, so that the path from every leaf to $f$ has exactly $h$ hops. Let $(T'_n = (V'_n, E'_n))_{n\ge 1}$ be the new sequence of $h$-uniform trees obtained from $(T_n)_{n\ge 1}$, after applying the uniformization procedure. (We are abusing notation here, in that $T'_n$ typically does not have $n$ nodes, nor is the sequence $|V'_n|$ increasing.) Regarding notation, we adopt the convention that quantities marked with a prime are defined with respect to $T'_n$. Note that $l'_n(f) = l_n(f)$.

For the case of a relay network, it is seen that any function of the observations at the leaves that can be computed in $T'_n$ can also be computed in $T_n$. Thus, the detection performance of $T'_n$ is no better than that of $T_n$. Hence, we obtain
\[ g^*_R \le \limsup_{n\to\infty} \frac{1}{l'_n(f)} \log \beta^*(T'_n). \tag{14} \]
Therefore, any upper bound derived for $h$-uniform trees readily translates to an upper bound for general trees.

On the other hand, the coefficients $q_N$ for the $h$-uniform trees $T'_n$ (to be denoted by $q'_N$) are different from the coefficients $q_N$ for the original sequence $T_n$. They are related as follows. The proof is given in the Appendix.

Lemma 3: For any $N, M > 0$, we have $q'_N \le h\,(N q_M + N/M)$. In particular, if $q_N = 0$ for all $N > 0$, then $q'_N = 0$ for all $N > 0$.

It turns out that the condition $z = 1$ is equivalent to the condition $q_N = 0$ for all $N > 0$. The proof is provided in the Appendix.

Lemma 4: We have $z = 1$ iff $q_N = 0$ for all $N > 0$.

B.
An Upper Bound

In this section, we develop an upper bound on the Type II error probabilities, which takes into account some qualitative properties of the sequence of trees, as captured by $q_N$.

Lemma 5: Consider an $h$-uniform sequence of trees $(T_n)_{n\ge 1}$, and suppose that Assumptions 1-3 hold. For every $\epsilon > 0$, there exists some $N$ such that
\[ g^*_R \le (1 - q_N)(g^*_P + \epsilon). \]

Proof: If $g^*_P + \epsilon \ge 0$, there is nothing to prove, since $q_N \le 1$ and $g^*_R \le 0$. Suppose that $g^*_P + \epsilon < 0$. Choose $\gamma \in \Gamma$ such that
\[ -D(\mathbb{P}^\gamma_0 \| \mathbb{P}^\gamma_1) \le -\sup_{\gamma' \in \Gamma} D(\mathbb{P}^{\gamma'}_0 \| \mathbb{P}^{\gamma'}_1) + \frac{\epsilon}{2} = g^*_P + \frac{\epsilon}{2} < 0. \]
Let $t_k = t = -D(\mathbb{P}^\gamma_0 \| \mathbb{P}^\gamma_1) + \epsilon/2 \le g^*_P + \epsilon$, for $k = 1, \ldots, h$, and note that
\[ -D(\mathbb{P}^\gamma_0 \| \mathbb{P}^\gamma_1) < t < 0. \tag{15} \]
Because of (15), we have $\Lambda^*_{0,1}(\gamma, t^{(1)}) > 0$. Furthermore, using Lemma 2, $\Lambda^*_{1,1}(\gamma, t^{(1)}) = \Lambda^*_{0,1}(\gamma, t^{(1)}) - t > -t$. Now let $k \ge 2$, and suppose that $\Lambda^*_{1,k-1}(\gamma, t^{(k-1)}) > -t$ and $\Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) > 0$. From Lemma 2,
\[ \Lambda^*_{0,k}(\gamma, t^{(k)}) = \frac{\Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) \big( \Lambda^*_{1,k-1}(\gamma, t^{(k-1)}) + t \big)}{\Lambda^*_{0,k-1}(\gamma, t^{(k-1)}) + \Lambda^*_{1,k-1}(\gamma, t^{(k-1)})} > 0, \]
and
\[ \Lambda^*_{1,k}(\gamma, t^{(k)}) = \Lambda^*_{0,k}(\gamma, t^{(k)}) - t_k = \Lambda^*_{0,k}(\gamma, t^{(k)}) - t > -t. \]
Hence, by induction, $t_k$ satisfies (10)-(11), so that Proposition 1 can be applied.

Choose $N$ sufficiently large so that $h/N < \Lambda^*_{0,h}(\gamma, t^{(h)})$. If $q_N = 1$, the claimed result holds trivially. Hence, we assume that $q_N \in [0, 1)$. In this case, for $n$ sufficiently large, there exists at least one node $v \in B_n$ with $l_n(v) > N$. We remove all nodes $v \in B_n$ with $l_n(v) \le N$, together with their immediate predecessors. Then, we remove all level 2 nodes $v$ that no longer have any predecessors, and so on. In this way, we obtain an $h$-uniform subtree of $T_n$, to be denoted by $T''_n$.
(Quantities marked with double primes are defined w.r.t. $T''_n$.) We have $l''_n(v) > N$ for all $v \in B''_n$, and
\[ l''_n(f) = \sum_{v \in F^c_{N,n}} l_n(v) = l_n(f)(1 - q_{N,n}). \]
Consider the following relay strategy on the tree $T''_n$. (Since this is a subtree of $T_n$, it is also a relay strategy for the tree $T_n$, with some nodes remaining idle.) The leaves transmit with transmission function $\gamma$, and the other nodes use a 1-bit LLRQ with threshold $t$. (Note that in the definition (5) of the normalized log-likelihood ratio, the denominator $l_n(v)$ now becomes $l''_n(v)$.)

We first show that the strategy just described is admissible. We apply part (ii) of Proposition 1 to $T''_n$, to obtain
\begin{align*}
\limsup_{n\to\infty} \frac{1}{l_n(f)} \log \mathbb{P}_0(Y_f = 1)
&= \limsup_{n\to\infty} \frac{l''_n(f)}{l_n(f)} \cdot \frac{1}{l''_n(f)} \log \mathbb{P}_0(Y_f = 1) \\
&\le (1 - q_N) \limsup_{n\to\infty} \frac{1}{l''_n(f)} \log \mathbb{P}_0\Big( \frac{S_n(f)}{l''_n(f)} > t \Big) \\
&\le (1 - q_N)\Big( -\Lambda^*_{0,h}(\gamma, t^{(h)}) + \frac{h}{N} \Big) < 0,
\end{align*}
hence $\mathbb{P}_0(Y_f = 1) \le \alpha$ when $n$ is sufficiently large. To bound the Type II error probability, we use Proposition 1 and Lemma 2, to obtain
\begin{align*}
g^*_R &\le \limsup_{n\to\infty} \frac{1}{l_n(f)} \log \beta^*(T''_n)
\le (1 - q_N) \limsup_{n\to\infty} \frac{1}{l''_n(f)} \log \mathbb{P}_1\Big( \frac{S_n(f)}{l''_n(f)} \le t \Big) \\
&\le (1 - q_N)\Big( -\Lambda^*_{1,h}(\gamma, t^{(h)}) + \frac{h}{N} \Big)
= (1 - q_N)\Big( t - \Lambda^*_{0,h}(\gamma, t^{(h)}) + \frac{h}{N} \Big) \\
&\le (1 - q_N)\, t \le (1 - q_N)(g^*_P + \epsilon).
\end{align*}
This proves the lemma.

C. Exponential Decay of Error Probabilities

We now establish that Type II error probabilities decay exponentially. The bounded height assumption is crucial for this result. Indeed, for the case of a tandem configuration, the exponential decay property does not seem to hold.

Proposition 2: Consider a sequence of trees of height $h$, and let Assumptions 1-3 hold. Then, $-\infty < g^*_P \le g^*_R < 0$ and $-\infty < -D(\mathbb{P}^X_0 \| \mathbb{P}^X_1) \le g^* < 0$.
Proof: The lower bounds on $g^*_R$ and $g^*$ follow from (2). Note that $g^*_P$ cannot be equal to $-\infty$, because it cannot be better than the error exponent of a parallel configuration in which all the observations are provided uncompressed to the fusion center. The error exponent in the latter case is $-D(\mathbb{P}^X_0 \| \mathbb{P}^X_1)$, by Stein's Lemma, and is finite as a consequence of Assumption 2.

It remains to show that the optimal error exponents are negative. Every tree of height $h$ satisfies $n \le l_n(f) h + 1$. From (2), we obtain $g^* \le g^*_R / h$. Therefore, we only need to show that $g^*_R < 0$. As discussed in connection with (14), we can restrict attention to a sequence of $h$-uniform trees.

We use induction on $h$. If $h = 1$, we have a parallel configuration and the result follows from [22]. Suppose that the result is true for all sequences of $(h-1)$-uniform trees. Consider now a sequence of $h$-uniform trees. Let $\epsilon > 0$ be such that $g^*_P + \epsilon < 0$. From Lemma 5, there exists some $N$ such that $g^*_R \le (1 - q_N)(g^*_P + \epsilon)$. If $q_N < 1$, we readily obtain the inequality $g^*_R < 0$.

Suppose now that $q_N = 1$. We only need to consider a sequence $(n_k)_{k\ge 1}$ such that $\lim_{k\to\infty} q_{N,n_k} = 1$. Using the inequality (22), we have
\[ \frac{|F_{N,n_k}|}{l_{n_k}(f)} \ge \frac{q_{N,n_k}}{N}, \quad \text{and} \quad \liminf_{k\to\infty} \frac{|F_{N,n_k}|}{l_{n_k}(f)} \ge \frac{1}{N}. \tag{16} \]
For each node $v \in B_n$, we remove all of its immediate predecessors (leaves) except for one, call it $u$. The leaf $u$ transmits $\gamma(X_u)$ to its immediate successor $v$. Since node $v$ receives only a single message, it just forwards it to its immediate successor. The resulting performance is the same as if each node $v$ in $B_n$ were making a measurement $X_v$ and transmitting $\gamma(X_v)$ to its successor. This is equivalent to deleting all the leaves of $T_n$ to form a new tree, $T''_n$, which is $(h-1)$-uniform.
The above argument shows that $\beta^*(T_{n_k}) \le \beta^*(T''_{n_k})$. We have $l''_{n_k}(f) = |B_{n_k}|$ and, from (16),
\[ \liminf_{k\to\infty} \frac{|B_{n_k}|}{l_{n_k}(f)} \ge \liminf_{k\to\infty} \frac{|F_{N,n_k}|}{l_{n_k}(f)} \ge \frac{1}{N}. \]
Therefore,
\[ \limsup_{k\to\infty} \frac{1}{l_{n_k}(f)} \log \beta^*(T_{n_k}) \le \frac{1}{N} \limsup_{k\to\infty} \frac{1}{l''_{n_k}(f)} \log \beta^*(T''_{n_k}). \]
By the induction hypothesis, the right-hand side in the above inequality is negative, and the proof is complete.

D. Sufficient Conditions for Matching the Performance of the Parallel Configuration

We are now ready to prove the main result of this section. It shows that when $q_N = 0$ for all $N > 0$, or equivalently when $z = 1$ (cf. Lemma 4), bounded-height tree networks match the performance of the parallel configuration.

Proposition 3: Consider a sequence of trees of height $h$ in which $z = 1$, or equivalently $q_N = 0$ for all $N > 0$. Suppose that Assumptions 1-3 hold. Then, $g^*_P = g^* = g^*_R$. Furthermore, if the sequence of trees is $h$-uniform, the optimal error exponent does not change even if we restrict to relay strategies in which every leaf uses the same transmission function and all other nodes use a 1-bit LLRQ with the same threshold.

Proof: We have shown $g^*_P \le g^*_R$ in (2). We now prove that $g^*_R \le g^*_P$. As already explained, there is no loss of generality in assuming that the sequence of trees is $h$-uniform (by performing the height uniformization procedure, and using Lemma 3). For any $\epsilon > 0$, Lemma 5 yields $g^*_R \le g^*_P + \epsilon$. Letting $\epsilon \to 0$, we obtain $g^*_R \le g^*_P$, hence $g^*_R = g^*_P$.

From (2) with $z = 1$, we obtain $g^* \le g^*_R = g^*_P$. We now show that $g^* \ge g^*_P$. Consider a tree with $n$ nodes, $l_n(f)$ of which are leaves.
We will compare it with another sensor network, in which $l_n(f)$ nodes $v$ transmit a message $\gamma_v(X_v)$ to the fusion center, and $n - l_n(f) - 1$ nodes transmit their raw observations to the fusion center. The latter network can simulate the original network, and therefore its optimal error exponent is at least as good. By a standard argument (similar to the one in Proposition 4 below), the optimal error exponent in the latter network can be shown to be greater than or equal to
\[ \limsup_{n\to\infty} \frac{l_n(f)}{n}\, g^*_P + \limsup_{n\to\infty} \Big( -\frac{n - l_n(f) - 1}{n} \Big) D(\mathbb{P}^X_0 \| \mathbb{P}^X_1) = g^*_P, \]
hence concluding the proof.

Fix an $\epsilon \in (0, -g^*_P)$. For any tree sequence with $z = 1$, we can perform the height uniformization procedure to obtain an $h$-uniform sequence of trees. In practice, this height uniformization procedure may be performed virtually at each node, so that the tree sequence simulates an $h$-uniform tree sequence. A simple strategy on the height-uniformized tree sequence that $\epsilon$-achieves the optimal error exponent is a relay strategy in which: (i) all leaves transmit with the same transmission function $\gamma \in \Gamma$, where $\gamma$ is chosen such that $-D(\mathbb{P}^\gamma_0 \| \mathbb{P}^\gamma_1) \le g^*_P + \epsilon/2$; and (ii) all other nodes use 1-bit LLRQs with the same threshold $t = -D(\mathbb{P}^\gamma_0 \| \mathbb{P}^\gamma_1) + \epsilon/2$. Lemmas 3 and 4, and the proof of Lemma 5, show that this relay strategy $\epsilon$-achieves the optimal error exponent $g^*_R = g^* = g^*_P$. This also shows that there is no loss of optimality even if we restrict the relay nodes to use only 1-bit LLRQs. This may be useful in situations where the nodes are simple, low-cost devices.

Proposition 3 provides sufficient conditions for a sequence of trees to achieve the same error exponent as the parallel configuration. We note a few special cases in which these sufficient conditions are satisfied.
The first is the case where there is a finite bound on the number of nodes that are not leaves. In that case, $z$ is easily seen to be 1. This is consistent with the conclusion of Example 1, where a simpler argument was used. The second is the more general case where the nodes in $B_n$ are attached to a growing number of leaves, which implies that $q_N = 0$ for all $N > 0$.

Corollary 1: Suppose that Assumptions 1-3 hold. Suppose further that either of the following conditions holds:
(i) There is a finite bound on the number of nodes that are not leaves.
(ii) We have $\min_{v\in B_n} l_n(v) \to \infty$.
Then, $g^*_P = g^* = g^*_R$.

The above corollary can be applied to Example 2. In that example, every level 1 node has $m$ leaves attached to it, with $m$ growing large as $n$ increases. Therefore, the tree network satisfies condition (ii) in Corollary 1, and the optimal error exponent is $g^* = g^*_R = g^*_P$. In this case, even if the number $N$ of level 1 nodes grows much faster than $m$, we still achieve the same error exponent as the parallel configuration. The strategy proposed above, in which every leaf uses the same transmission function and every other node uses the same LLRQ, will nearly achieve the optimal performance.

We are now in a position to determine the optimal error exponent in Example 4.

Example 4, revisited: Recall that in Example 4, every $v_i \in B_n$ has $i + 1$ predecessors. It is easy to check that $z = 1$. From Proposition 3, the optimal error exponent is the same as that for the parallel configuration.

E. Discussion of the Sufficient Conditions

Proposition 3 is unexpected, as it establishes that the performance of a tree possessing certain qualitative properties is comparable to that of the parallel configuration. Furthermore, the optimal performance is obtained even if we restrict the non-leaf nodes to use 1-bit LLRQs.
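The behavior of the single-$\gamma$, single-threshold strategy can also be examined numerically. The sketch below assumes a hypothetical 1-bit transmission function $\gamma$ whose output is Bernoulli(0.3) under $H_0$ and Bernoulli(0.8) under $H_1$, sets the common threshold $t = -D(\mathbb{P}^\gamma_0 \| \mathbb{P}^\gamma_1) + \epsilon/2$ as in the proof of Lemma 5, and iterates the closed-form recursion of Lemma 2, checking the invariants $\Lambda^*_{0,k} > 0$ and $\Lambda^*_{1,k} > -t$ that drive the induction there:

```python
import math

def kl(p, q):
    """KL divergence D(Bernoulli(p) || Bernoulli(q))."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Hypothetical 1-bit transmission function gamma: its output is
# Bernoulli(p0) under H0 and Bernoulli(p1) under H1.
p0, p1 = 0.3, 0.8
eps = 0.4
t = -kl(p0, p1) + eps / 2         # common threshold, as in Lemma 5

def rate(j, s):
    """Level-1 rate function Lambda*_{j,1}: the Fenchel-Legendre
    transform of Lambda_{j,0}(gamma; lam) = log E_j[(dP1/dP0)^lam],
    approximated by a grid search over lam."""
    llr = [math.log((1 - p1) / (1 - p0)), math.log(p1 / p0)]  # LLR of outputs 0, 1
    pj = [1 - p1, p1] if j == 1 else [1 - p0, p0]
    def cgf(lam):
        return math.log(sum(p * math.exp(lam * v) for p, v in zip(pj, llr)))
    return max(lam * s - cgf(lam) for lam in
               (x / 1000.0 for x in range(-2000, 2001)))

L1, L0 = rate(1, t), rate(0, t)   # Lambda*_{1,1}, Lambda*_{0,1}

h = 4                             # height of the (hypothetical) relay tree
for k in range(2, h + 1):         # recursion of Lemma 2
    L1, L0 = (L1 * (L0 - t) / (L0 + L1),
              L0 * (L1 + t) / (L0 + L1))
    assert L0 > 0 and L1 > -t     # invariants from the proof of Lemma 5

print(t, L0, L1)
```

Note how $\Lambda^*_{0,k}$ shrinks rapidly with $k$ while $\Lambda^*_{1,k} = \Lambda^*_{0,k} - t$ stays close to $-t$: the achieved Type II exponent per leaf is essentially the threshold $t$, as exploited in the proof of Lemma 5.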
At first sight, it might appear intuitive that if the leaves dominate in a relay tree ($z = 1$), then the tree should always have the same performance as a parallel configuration. However, this intuition is misleading: it fails, for instance, under a Bayesian formulation, in which both the Type I and Type II error probabilities are required to decay at the same rate. To see this, consider the 2-uniform tree in Figure 3, where every node is constrained to sending 1-bit messages. Suppose we are given nonzero prior probabilities $\pi_0$ and $\pi_1$ for the hypotheses $H_0$ and $H_1$. Instead of the Neyman-Pearson criterion, suppose that we are interested in minimizing the error exponent
\[ \limsup_{n\to\infty} \frac{1}{l_n(f)} \log P^*_e, \]
where $P^*_e$ is the minimum of the error probability $\pi_0 \mathbb{P}_0(Y_f = 1) + \pi_1 \mathbb{P}_1(Y_f = 0)$, optimized over all strategies. It can be shown that to obtain the optimal error exponent, we only need to consider the following two fusion rules: (a) the fusion center declares $H_0$ iff both $v_1$ and $v_2$ send a 0; or (b) the fusion center declares $H_1$ iff both $v_1$ and $v_2$ send a 1. Then, using the results in [28], the optimal error exponent for this tree network is strictly worse than that for the parallel configuration. Similarly, if we constrain the Type I error probability in the Neyman-Pearson criterion to decay faster than a predetermined rate, it can be shown that the optimal Type II error exponent for a tree network can be strictly worse than that of a parallel configuration.

Note that the bounded height assumption is essential in proving $g^* = g^*_R = g^*_P$ when $z = 1$. Although our technique can be extended to cover tree sequences whose height grows very slowly compared to $n$ (on the order of $\log|\log(n/l_n(f) - 1)|$), we have not been able to find the optimal error exponent for the general case of unbounded height.
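The leaf-domination condition itself is easy to evaluate for concrete families. The sketch below uses a hypothetical 2-uniform family in the spirit of Example 4, where level 1 node $v_i$ has $i+1$ leaf predecessors, and computes the leaf fraction $l_n(f)/n$ together with the quantity $q_{N,n}$ of (13); as $m$ grows, the former tends to 1 while the latter tends to 0, matching the equivalence in Lemma 4:

```python
def leaf_fraction_and_q(m, N):
    """Hypothetical 2-uniform tree: level 1 nodes v_1, ..., v_m, where
    v_i has i + 1 leaf predecessors (cf. Example 4). Returns the pair
    (l_n(f)/n, q_{N,n}) from the definitions (12)-(13)."""
    subtree_leaves = [i + 1 for i in range(1, m + 1)]   # l_n(v_i)
    ln_f = sum(subtree_leaves)                          # total leaves l_n(f)
    n = ln_f + m + 1                                    # leaves + relays + fusion center
    small = sum(l for l in subtree_leaves if l <= N)    # leaves in "small" subtrees
    return ln_f / n, small / ln_f

for m in (10, 100, 1000):
    frac, q = leaf_fraction_and_q(m, N=5)
    print(m, round(frac, 4), round(q, 6))
```

For this family, both quantities converge at rate roughly $1/m$, since the relays number $m$ against $\Theta(m^2)$ leaves, while only the first few subtrees ever have at most $N$ leaves.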
As noted before, in a tandem network, the Bayesian error probability decays sub-exponentially [26]. The proof of Proposition 2 in [26] involves the construction of a tree network with unbounded height and in which $z = 1$. In that proof, it is also shown that such a network has a sub-exponential rate of error decay. We conjecture that this is also the case under the Neyman-Pearson formulation.

In summary, for a tree network to achieve the same Type II error exponent as a parallel configuration, we require that the tree sequence have bounded height, satisfy the condition $z = 1$, and that the error criterion be the Neyman-Pearson criterion. Without any one of these three conditions, our results no longer hold.

F. A Necessary Condition for Matching the Performance of the Parallel Configuration

In this section, we establish necessary conditions under which a sequence of relay trees with bounded height performs as well as a parallel configuration. As noted in Section III-B, any necessary conditions generally depend on the type of transmission functions available to the relay nodes. However, under an additional condition (Assumption 4), the sufficient condition for $g^*_R = g^*_P$ in Proposition 3 is also necessary.

Proposition 4: Suppose that Assumptions 1, 2 and 4 hold, and $h \ge 2$. If there exists some $N > 0$ such that $q_N > 0$ (equivalently, $z < 1$), then $g^*_P < g^*_R$.

Proof: Fix some $N > 0$ and suppose that $q_N > 0$. Given a tree $T_n$, we construct a new tree $T''_n$ as follows. We remove all nodes other than the leaves and the nodes in $F_{N,n}$. For all the leaves $u$ that are not immediate predecessors of some $v \in F_{N,n}$, we let $u$ transmit its message directly to the fusion center. We add new edges $(v, f)$, for each $v \in F_{N,n}$. This gives us a tree $T''_n$ of height 2, with $l''_n(f) = l_n(f)$ and $q''_N = q_N$.
The latter tree $T''_n$ can simulate the tree $T_n$; hence the optimal error exponent associated with the sequence $(T_n)_{n\ge 1}$ is bounded below by the optimal error exponent associated with the sequence $(T''_n)_{n\ge 1}$. Therefore, without loss of generality, we only need to prove the proposition for a sequence of trees of height 2 in which $F_{N,n} = B_n$, for some $N > 0$ such that $q_N > 0$; we henceforth assume that this is the case.

The rest of the argument is similar to the proof of Stein's Lemma in Lemma 3.4.7 of [27]. Suppose that a particular admissible relay strategy has been fixed, and let $\beta_n$ be the associated Type II error probability. Let $\lambda_n = \mathbb{E}_0[S_n(f)]/l_n(f)$. We show that $S_n(f)/l_n(f)$ is close to $\lambda_n$ in probability. Let $D_n$ be the set of leaves that transmit directly to the fusion center. The proof of the following lemma is in the Appendix.

Lemma 6: For all $\eta > 0$, $\mathbb{P}_0\big( |S_n(f)/l_n(f) - \lambda_n| > \eta \big) \to 0$, as $n \to \infty$.

We return to the proof of Proposition 4. Given the transmission functions at all other nodes, the fusion center will optimize performance by using an appropriate likelihood ratio test, with a (possibly randomized) threshold. We can therefore assume, without loss of generality, that this is the case. We let $\zeta_n$ be the threshold chosen, and note that it must satisfy
\[ \mathbb{P}_0\big( S_n(f)/l_n(f) \le \zeta_n \big) \ge 1 - \alpha. \tag{17} \]
From a change of measure argument (see Lemma 3.4.7 in [27]), we have, for $\eta > 0$,
\[ \frac{1}{l_n(f)} \log \beta^*(T_n) \ge \lambda_n - \eta + \frac{1}{l_n(f)} \log \mathbb{P}_0\Big( \lambda_n - \eta < \frac{S_n(f)}{l_n(f)} \le \zeta_n \Big). \]
Using (17) and Lemma 6, we see that the last term goes to 0 as $n \to \infty$. We also have
\[ \lambda_n = \frac{1}{l_n(f)} \Big( \sum_{v \in D_n} \mathbb{E}_0\Big[ \log \frac{d\mathbb{P}^{\gamma_v}_1}{d\mathbb{P}^{\gamma_v}_0} \Big] + \sum_{v \in F_{N,n}} \mathbb{E}_0[L_{v,n}] \Big) \ge (1 - q_{N,n})\, g^*_P + q_{N,n} K, \]
where, using the notation in Assumption 4,
\[ K = \min_{1 < k \le N}\ \inf_{\xi \in \Gamma(k) \times \Gamma^k} \frac{1}{k}\, \mathbb{E}_0\Big[ \log \frac{d\nu^\xi_1}{d\nu^\xi_0} \Big] > g^*_P, \]
where the strict inequality follows from (6), since the minimum is over finitely many values of $k$.
Then, letting $n \to \infty$, we have $g^*_R \ge (1 - q_N) g^*_P + q_N K - \eta$, for all $\eta > 0$. Taking $\eta \to 0$ completes the proof.

The condition that there exists a finite $N$ such that $l_n(v) \le N$ for a non-vanishing proportion of nodes, in the statement of Proposition 4, can be thought of as corresponding to a situation where relay nodes are of two different types: high-cost relays that can process a large number of received messages ($l_n(v) \to \infty$), and low-cost relays that can only process a limited number of received messages ($l_n(v) \le N$ for some small $N$). From this perspective, Proposition 4 states that a tree network of height greater than one, with a nontrivial proportion of low-cost relays, will always have performance worse than that of a parallel configuration.

Together with Proposition 3, we have shown the following.

Proposition 5: Suppose that Assumptions 1-4 hold. Then, $g^*_R = g^*_P$ iff $z = 1$ (or, equivalently, iff $q_N = 0$ for all $N > 0$).

We close with an example in which $z < 1$ and $g^* < g^*_P$. Since there are also easy examples where $z < 1$ and $g^*_P < g^*$, this suggests that one can combine them to construct examples where $z < 1$ and $g^* = g^*_P$. Thus, unlike the case of a relay tree, $z = 1$ is not a necessary condition for $g^* = g^*_P$.

Example 5: Consider the tree network shown in Figure 8, where every node makes a 3-bit observation. Each leaf then compresses its 3-bit observation to a 1-bit message, while each level 1 node is allowed to send a 4-bit message. (Recall that our framework allows for different transmission function sets $\Gamma(d)$ at the different levels.) We assume that Assumptions 1-3 hold. Moreover, we assume that this network satisfies Assumption 4.

Fig. 8. Every node makes a 3-bit observation.
Leaves are constrained to sending 1-bit messages, while level 1 nodes are constrained to sending 4-bit messages. [Figure omitted.]

Consider the following strategy: each level 1 node forwards the two 1-bit messages it receives from its two leaves to the fusion center, and compresses its own 3-bit observation into a 2-bit message before sending it to the fusion center. Using this strategy, the tree network is equivalent to a parallel configuration with $3m$ nodes, $2m$ of which are constrained to sending 1-bit messages, and $m$ of which are constrained to sending 2-bit messages. Clearly, this parallel configuration performs strictly better than one in which all $3m$ nodes are constrained to sending 1-bit messages; therefore, we have $g^* < g^*_P$.

Example 5 shows that, unlike the case of relay trees, a tree can outperform a parallel configuration. On the other hand, Example 5 is an artifact of our assumptions. For example, if we restrict every node in this example to sending only 1 bit, the situation is reversed and we have $g^*_P < g^*$. The question of whether a parallel configuration always performs at least as well as a tree network, i.e., whether $g^*_P \le g^*$ when every node can send the same number of bits, remains open.

VI. CONCLUSION

We have studied the asymptotic detection performance of tree networks with bounded height, under a Neyman-Pearson criterion. As in the parallel configuration, we have shown that the optimal Type II error probability decays exponentially fast with the number of nodes. In addition, we have shown that if the leaves dominate (i.e., $l_n(f)/n \to 1$), the network can achieve the same performance as if all nodes were transmitting directly to the fusion center.
We also provided a simple strategy, in which all leaves use the same transmission function and all other nodes act as 1-bit relays, which achieves the optimal error exponent to any desired accuracy. The sufficient conditions are easy to achieve in cases of practical interest, so a system designer can obtain the optimal performance while ensuring that the network is energy efficient. Once the sufficient conditions are satisfied, the architecture of the network no longer affects its detection error exponent. On the other hand, we also showed that, under some additional mild assumptions, this sufficient condition is also necessary. Thus, in a network where the leaves do not dominate, the error decay rate will be worse than that of a parallel configuration, and will actually depend on the particular network architecture.

Needless to say, our conclusions only hold for the particular setting and criterion we have employed. One issue that has not been touched upon is that, with a relay network, a significantly larger value of $n$ may be required before the asymptotic error exponent yields a good approximation. Moreover, in practice, it would be wasteful to have only the leaves make observations if $n$ is not large enough. Furthermore, under a Bayesian criterion, the same performance as the parallel configuration can no longer be achieved, although exponential decay is still possible [28]. Finally, the more realistic case where the i.i.d. assumption is violated remains unexplored, with existing work mainly limited to the parallel configuration [29]-[34].

Future work includes characterizing the asymptotically optimal performance of tree networks without the bounded height constraint. We would like to understand the rate at which the error probability decays, and its dependence on the rate at which the height of the tree increases.
Another intriguing question, which has been left unanswered, is whether the inequality $g_P^* \le g^*$ is always true under the bounded height assumption, when every node is constrained to sending the same number of bits.

VII. ACKNOWLEDGEMENTS

We wish to thank the anonymous reviewers for their careful reading of the manuscript, and their detailed comments that have improved the presentation.

APPENDIX

A. Proof of Proposition 1

We first show part (i). The proof proceeds by induction on $k$. Suppose that $k = 1$, which is equivalent to the well-studied case where all sensors transmit directly to a fusion center. In this case, $p_n(v) = l_n(v)$. Since $t_1 \in (-D(P_0^\gamma \| P_1^\gamma), D(P_1^\gamma \| P_0^\gamma))$, from (2.2.13) of [27], we obtain
$$\frac{1}{l_n(v)} \log P_1\!\left(\frac{S_n(v)}{l_n(v)} \le t_1\right) \le -\Lambda^*_{1,1}(\gamma, t_1).$$
The inequality for the Type I error probability follows from a similar argument.

Consider now the induction hypothesis that the result holds for some $k$. Given a $k$-uniform tree rooted at $v$, the induction hypothesis leads to bounds on the probabilities associated with the log-likelihood ratio $L_{v,n}$ of the message $Y_v$ computed at the node $v$. We use these bounds to obtain bounds on the log-moment generating function of $L_{v,n}$. Recall that $L_{v,n}$ equals $L_{v,n}(0)$ whenever $Y_v = 0$, which is the case if and only if $S_n(v)/l_n(v) \le t_k$. Fix some $\lambda \in [-1, 0]$. We have
$$\begin{aligned}
\frac{1}{l_n(v)} \log E_1\big[e^{\lambda L_{v,n}}\big]
&= \frac{1}{l_n(v)} \log\Big[ P_1(Y_v = 0)\, e^{\lambda L_{v,n}(0)} + P_1(Y_v = 1)\, e^{\lambda L_{v,n}(1)} \Big] \\
&= \frac{1}{l_n(v)} \log\Big[ P_1(Y_v = 0)^{1+\lambda}\, P_0(Y_v = 0)^{-\lambda} + P_1(Y_v = 1)^{1+\lambda}\, P_0(Y_v = 1)^{-\lambda} \Big] \\
&\le \frac{1}{l_n(v)} \log\Big[ P_1(Y_v = 0)^{1+\lambda} + P_0(Y_v = 1)^{-\lambda} \Big].
\end{aligned}$$
Using the inequality $\log(a+b) \le \max\{\log(2a), \log(2b)\}$, we obtain
$$\begin{aligned}
\frac{1}{l_n(v)} \log E_1\big[e^{\lambda L_{v,n}}\big]
&\le \max\left\{ \frac{1+\lambda}{l_n(v)} \log P_1(Y_v = 0),\; \frac{-\lambda}{l_n(v)} \log P_0(Y_v = 1) \right\} + \frac{\log 2}{l_n(v)} \\
&\le \max\left\{ -(1+\lambda)\Lambda^*_{1,k}(\gamma, t^{(k)}),\; \lambda \Lambda^*_{0,k}(\gamma, t^{(k)}) \right\} + \frac{p_n(v)}{l_n(v)} - 1 + \frac{\log 2}{l_n(v)} && (18) \\
&\le \Lambda_{1,k}(\gamma, t^{(k)}; \lambda) + \frac{p_n(v)}{l_n(v)} + \frac{1}{l_n(v)} - 1, && (19)
\end{aligned}$$
where (18) follows from the induction hypothesis.

Consider now a node $u$ at level $k+1$. The subtree rooted at $u$ is a $(k+1)$-uniform tree. Each level $k$ node $v \in C_n(u)$ can be viewed as the root of a $k$-uniform tree, and Eq. (19) can be applied to $L_{v,n}$. From the Markov inequality, and since $\lambda \in [-1, 0]$, we have
$$P_1\!\left(\frac{S_n(u)}{l_n(u)} \le t_{k+1}\right) \le e^{-\lambda l_n(u) t_{k+1}}\, E_1\big[e^{\lambda S_n(u)}\big],$$
so that
$$\begin{aligned}
\frac{1}{l_n(u)} \log P_1\!\left(\frac{S_n(u)}{l_n(u)} \le t_{k+1}\right)
&\le -\lambda t_{k+1} + \frac{1}{l_n(u)} \sum_{v \in C_n(u)} \log E_1\big[e^{\lambda L_{v,n}}\big] \\
&= -\lambda t_{k+1} + \sum_{v \in C_n(u)} \frac{l_n(v)}{l_n(u)} \cdot \frac{1}{l_n(v)} \log E_1\big[e^{\lambda L_{v,n}}\big] \\
&\le -\lambda t_{k+1} + \Lambda_{1,k}(\gamma, t^{(k)}; \lambda) + \sum_{v \in C_n(u)} \frac{p_n(v)}{l_n(u)} + \frac{|C_n(u)|}{l_n(u)} - 1 && (20) \\
&= -\lambda t_{k+1} + \Lambda_{1,k}(\gamma, t^{(k)}; \lambda) + \frac{p_n(u)}{l_n(u)} - 1, && (21)
\end{aligned}$$
where (20) follows from the induction hypothesis and (19). Taking the infimum over $\lambda \in [-1, 0]$ (cf. Lemma 2), and using (7), we obtain
$$\frac{1}{l_n(u)} \log P_1\!\left(\frac{S_n(u)}{l_n(u)} \le t_{k+1}\right) \le -\Lambda^*_{1,k+1}(\gamma, t^{(k+1)}) + \frac{p_n(u)}{l_n(u)} - 1.$$
A similar argument proves the result for the Type I error probability, and the proof of part (i) is complete.

For part (ii), suppose that for all $n \ge n_0$ and all $v \in B_n$, we have $l_n(v) \ge N$. Note that $l_n(f) \ge N|B_n|$. Furthermore, the number of nodes at each level $k \ge 1$ is bounded by $|B_n|$, which yields
$$\frac{p_n(f)}{l_n(f)} - 1 \le \frac{n}{l_n(f)} - 1 = \frac{n - l_n(f)}{l_n(f)} \le \frac{h|B_n|}{N|B_n|} = \frac{h}{N}.$$
Applying the results from part (i), with $k = h$, we obtain part (ii).

B. Proof of Lemma 3

We have $l_n'(f) = l_n(f)$. Furthermore, it can be shown that $|B_n'| \le h|B_n|$. Therefore,
$$q'_{N,n} = \frac{1}{l_n'(f)} \sum_{v \in F'_{N,n}} l_n'(v) \le \frac{N|B_n'|}{l_n(f)} \le \frac{Nh}{l_n(f)}\big(|F_{M,n}| + |F^c_{M,n}|\big) \le hN q_{M,n} + \frac{hN}{M},$$
where the last inequality follows from $|F_{M,n}| \le \sum_{v \in F_{M,n}} l_n(v)$ and $|F^c_{M,n}| \le l_n(f)/M$. Taking the limit superior as $n \to \infty$, we obtain $q'_N \le h(N q_M + N/M)$. Suppose that $q_M = 0$ for all $M > 0$. Then for all $N, M > 0$, we have $q'_N \le hN/M$. Taking $M \to \infty$, we obtain the desired result.

C. Proof of Lemma 4

Suppose that $q_N > 0$ for some $N > 0$. Using the inequality
$$q_{N,n} = \frac{1}{l_n(f)} \sum_{v \in F_{N,n}} l_n(v) \le \frac{N|F_{N,n}|}{l_n(f)}, \quad \text{or} \quad |F_{N,n}| \ge \frac{q_{N,n}}{N}\, l_n(f), \qquad (22)$$
we obtain
$$\frac{l_n(f)}{n} \le \frac{l_n(f)}{|F_{N,n}| + l_n(f)} \le \frac{l_n(f)}{q_{N,n} l_n(f)/N + l_n(f)} = \frac{N}{N + q_{N,n}}.$$
Letting $n \to \infty$, we obtain
$$z \le \frac{N}{N + q_N} < 1.$$

For the converse, suppose that $q_N = 0$ for all $N > 0$. It can be seen that each non-leaf node is on a path that connects some $v \in B_n$ to the fusion center. Therefore, the number of non-leaf nodes, $n - l_n(f)$, is bounded by $h|B_n|$. We have
$$\frac{n - l_n(f)}{l_n(f)} \le \frac{h|B_n|}{l_n(f)} = \frac{h\big(|F_{N,n}| + |F^c_{N,n}|\big)}{l_n(f)} \le h q_{N,n} + \frac{h}{N}.$$
Therefore,
$$\limsup_{n \to \infty} \frac{n - l_n(f)}{l_n(f)} \le \frac{h}{N}.$$
This is true for all $N > 0$, which implies that $\lim_{n \to \infty} l_n(f)/n = 1$.

D. Proof of Lemma 6

For each $v \in B_n$, we have $Y_v = \gamma_v(\{\gamma_u(X_u) : u \in C_n(v)\})$, for some $\gamma_v \in \Gamma(l_n(v))$.
Using the first and the second parts of Lemma 1, there exists some $a_1 \in (0, \infty)$ such that
$$E_0\big[L_{v,n}^2\big] \le E_0\Big[\Big(\sum_{u \in C_n(v)} \log \frac{dP_1^{\gamma_u}}{dP_0^{\gamma_u}}\Big)^2\Big] + 1 \le l_n(v)\, E_0\Big[\sum_{u \in C_n(v)} \log^2 \frac{dP_1^{\gamma_u}}{dP_0^{\gamma_u}}\Big] + 1 \le l_n^2(v)\, a_1 + 1 \le l_n^2(v)\, a, \qquad (23)$$
where $a = a_1 + 1$. To prove the lemma, we use Chebyshev's inequality, and the inequalities $l_n(v) \le N$ for $v \in F_{N,n}$ and $|D_n| \le l_n(f)$, to obtain
$$\begin{aligned}
P_0\!\left(\Big|\frac{S_n(f)}{l_n(f)} - \lambda_n\Big| > \eta\right)
&\le \frac{1}{\eta^2 l_n^2(f)} \Big( \sum_{v \in D_n} E_0\Big[\log^2 \frac{dP_1^{\gamma_v}}{dP_0^{\gamma_v}}\Big] + \sum_{v \in F_{N,n}} E_0\big[L_{v,n}^2\big] \Big) \\
&\le \frac{1}{\eta^2 l_n^2(f)} \Big( \sum_{v \in D_n} a + \sum_{v \in F_{N,n}} l_n^2(v)\, a \Big) && (24) \\
&\le \frac{a}{\eta^2 l_n(f)} + \frac{a}{\eta^2 l_n(f)} \sum_{v \in F_{N,n}} \frac{l_n(v)}{l_n(f)}\, N \le \frac{a(1+N)}{\eta^2 l_n(f)}, && (25)
\end{aligned}$$
where (24) follows from Lemma 1 and (23). The R.H.S. of (25) goes to zero as $n \to \infty$, and the proof is complete.

REFERENCES

[1] R. R. Tenney and N. R. Sandell, "Detection with distributed sensors," IEEE Trans. Aerosp. Electron. Syst., vol. 17, pp. 501–510, 1981.
[2] Z. Chair and P. K. Varshney, "Optimal data fusion in multiple sensor detection systems," IEEE Trans. Aerosp. Electron. Syst., vol. 22, pp. 98–101, 1986.
[3] G. Polychronopoulos and J. N. Tsitsiklis, "Explicit solutions for some simple decentralized detection problems," IEEE Trans. Aerosp. Electron. Syst., vol. 26, pp. 282–292, 1990.
[4] P. Willett and D. Warren, "The suboptimality of randomized tests in distributed and quantized detection systems," IEEE Trans. Inf. Theory, vol. 38, pp. 355–361, Mar. 1992.
[5] J. N. Tsitsiklis, "Extremal properties of likelihood-ratio quantizers," IEEE Trans. Commun., vol. 41, pp. 550–558, 1993.
[6] ——, "Decentralized detection," Advances in Statistical Signal Processing, vol. 2, pp. 297–344, 1993.
[7] W. W. Irving and J. N. Tsitsiklis, "Some properties of optimal thresholds in decentralized detection," IEEE Trans. Autom.
Control, vol. 39, pp. 835–838, 1994.
[8] R. Viswanathan and P. K. Varshney, "Distributed detection with multiple sensors: part I - fundamentals," Proc. IEEE, vol. 85, pp. 54–63, 1997.
[9] B. Chen and P. K. Varshney, "A Bayesian sampling approach to decision fusion using hierarchical models," IEEE Trans. Signal Process., vol. 50, no. 8, pp. 1809–1818, Aug. 2002.
[10] B. Chen and P. K. Willett, "On the optimality of the likelihood-ratio test for local sensor decision rules in the presence of nonideal channels," IEEE Trans. Inf. Theory, vol. 51, no. 2, pp. 693–699, Feb. 2005.
[11] A. Kashyap, "Comments on 'On the optimality of the likelihood-ratio test for local sensor decision rules in the presence of nonideal channels'," IEEE Trans. Inf. Theory, vol. 52, no. 3, pp. 1274–1275, Mar. 2006.
[12] B. Liu and B. Chen, "Channel-optimized quantizers for decentralized detection in sensor networks," IEEE Trans. Inf. Theory, vol. 52, no. 7, pp. 3349–3358, Jul. 2006.
[13] L. K. Ekchian and R. R. Tenney, "Detection networks," in Proc. IEEE Conference on Decision and Control, 1982, pp. 686–691.
[14] R. Viswanathan, S. C. A. Thomopoulos, and R. Tumuluri, "Optimal serial distributed decision fusion," IEEE Trans. Aerosp. Electron. Syst., vol. 24, no. 4, pp. 366–376, 1988.
[15] A. R. Reibman and L. W. Nolte, "Design and performance comparison of distributed detection networks," IEEE Trans. Aerosp. Electron. Syst., vol. 23, pp. 789–797, 1987.
[16] Z. B. Tang, K. R. Pattipati, and D. L. Kleinman, "Optimization of detection networks: part I - tandem structures," IEEE Trans. Syst., Man, Cybern., vol. 21, no. 5, pp. 1044–1059, 1991.
[17] ——, "Optimization of detection networks: part II - tree structures," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 1, pp. 211–221, 1993.
[18] J. D. Papastavrou and M. Athans, "On optimal distributed decision architectures in a hypothesis testing environment," IEEE Trans. Autom. Control, vol. 37, no. 8, pp. 1154–1169, 1992.
[19] A. Pete, K. Pattipati, and D. Kleinman, "Optimization of detection networks with multiple event structures," IEEE Trans. Autom. Control, vol. 39, no. 8, pp. 1702–1707, 1994.
[20] S. Alhakeem and P. K. Varshney, "A unified approach to the design of decentralized detection systems," IEEE Trans. Aerosp. Electron. Syst., vol. 31, no. 1, pp. 9–20, 1995.
[21] Y. Lin, B. Chen, and P. K. Varshney, "Decision fusion rules in multi-hop wireless sensor networks," IEEE Trans. Aerosp. Electron. Syst., vol. 41, no. 2, pp. 475–488, Apr. 2005.
[22] J. N. Tsitsiklis, "Decentralized detection by a large number of sensors," Math. Control, Signals, Syst., vol. 1, pp. 167–182, 1988.
[23] M. E. Hellman and T. M. Cover, "Learning with finite memory," Ann. of Math. Statist., vol. 41, no. 3, pp. 765–782, 1970.
[24] T. M. Cover, "Hypothesis testing with finite statistics," Ann. of Math. Statist., vol. 40, no. 3, pp. 828–835, 1969.
[25] J. D. Papastavrou and M. Athans, "Distributed detection by a large team of sensors in tandem," IEEE Trans. Aerosp. Electron. Syst., vol. 28, no. 3, pp. 639–653, 1992.
[26] W. P. Tay, J. N. Tsitsiklis, and M. Z. Win, "On the sub-exponential decay of detection probabilities in long tandems," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Honolulu, HI, Apr. 2007, pp. 837–840.
[27] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. New York, NY: Springer-Verlag, 1998.
[28] W. P. Tay, J. N. Tsitsiklis, and M. Z. Win, "Bayesian detection in bounded height tree networks," in Proc. Data Compression Conf., Snowbird, UT, Mar. 2007, pp. 243–252.
[29] E. Drakopoulos and C. C. Lee, "Optimum multisensor fusion of correlated local decisions," IEEE Trans. Aerosp. Electron. Syst., vol. 27, no. 4, pp. 593–606, Jul. 1991.
[30] M. Kam, Q. Zhu, and W. S. Gray, "Optimal data fusion of correlated local decisions in multiple sensor detection systems," IEEE Trans. Aerosp. Electron. Syst., vol. 28, no. 3, pp. 916–920, 1992.
[31] R. S. Blum and S. A. Kassam, "Optimum distributed detection of weak signals in dependent sensors," IEEE Trans. Inf. Theory, vol. 38, no. 3, pp. 1066–1079, May 1992.
[32] R. S. Blum, S. A. Kassam, and H. Poor, "Distributed detection with multiple sensors: part II - advanced topics," Proc. IEEE, vol. 85, no. 1, pp. 64–79, 1997.
[33] J.-F. Chamberland and V. V. Veeravalli, "How dense should a sensor network be for detection with correlated observations?" IEEE Trans. Inf. Theory, vol. 52, no. 11, pp. 5099–5106, Nov. 2006.
[34] W. Li and H. Dai, "Distributed detection in large-scale sensor networks with correlated sensor observations," in Proc. Allerton Conf. on Communication, Control, and Computing, Sep. 2005.