Particle Filtering for Large Dimensional State Spaces with Multimodal Observation Likelihoods


Authors: **Namrata Vaswani**

Abstract — We study efficient importance sampling techniques for particle filtering (PF) when either (a) the observation likelihood (OL) is frequently multimodal or heavy-tailed, or (b) the state space dimension is large, or both. When the OL is multimodal but the state transition pdf (STP) is narrow enough, the optimal importance density is usually unimodal. Under this assumption, many techniques have been proposed. But when the STP is broad, this assumption does not hold. We study how existing techniques can be generalized to situations where the optimal importance density is multimodal, but is unimodal conditioned on a part of the state vector. Sufficient conditions to test for the unimodality of this conditional posterior are derived. Our result is directly extendable to testing for unimodality of any posterior. The number of particles, N, needed to accurately track using a PF increases with state space dimension, thus making any regular PF impractical for large dimensional tracking problems. But in most such problems, most of the state change occurs in only a few dimensions, while the change in the rest of the dimensions is small. Using this property, we propose to replace importance sampling from a large part of the state space (whose conditional posterior is narrow enough) by just tracking the mode of the conditional posterior. This introduces some extra error, but it also greatly reduces the importance sampling dimension. The net effect is much smaller error for a given N, especially when the available N is small. An important class of large dimensional problems with multimodal OL is tracking spatially varying physical quantities, such as temperature or pressure, in a large area using a network of sensors which may be nonlinear and/or may have non-negligible failure probabilities.
Improved performance of our proposed algorithms over existing PFs is demonstrated for this problem.

I. INTRODUCTION

Tracking is the problem of causally estimating a hidden state sequence, {X_t}, from a sequence of noisy and possibly nonlinear observations, {Y_t}, that satisfy the Hidden Markov Model (HMM) assumption. A tracker recursively computes (or approximates) the "posterior" at time t, using the posterior at t−1 and the current observation Y_t. For nonlinear and/or non-Gaussian state space models, the posterior cannot be computed exactly. But it can be efficiently approximated using a sequential Monte Carlo method called particle filtering (PF) [3], [4], [5]. A PF outputs, at each time t, a cloud of N weighted particles whose empirical measure closely approximates the true posterior for large N. A generic PF is summarized in Algorithm 1. There are two main issues in PF design: (a) choice of an importance sampling density that reduces the variance of the particle weights and thus improves "effective particle size" [6], and (b) choice of resampling techniques that improve effective particle size while not significantly increasing "particle impoverishment" [4]. Some solutions for (b) are [5, Ch. 13], [7], [8]. Our focus is on designing efficient importance densities and analyzing the assumptions under which they work, when either or both of the following occur:
1) The observation likelihood (OL) is frequently multimodal or heavy-tailed (or, most generally, not strongly log-concave) as a function of the state, and the state transition prior (STP) is broad.
2) The state space dimension is large (typically more than 10 or 12).

(N. Vaswani is with the Dept. of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011; email: namrata@iastate.edu. A part of this paper is contained in [1]. Some initial ideas appeared in [2].)
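For concreteness, the three steps of the generic PF (Algorithm 1) can be sketched in code. The following is a minimal illustration, not the paper's implementation: it uses the STP as the importance density q (so the STP factor cancels in the weight), and a toy one-dimensional random-walk model whose noise values are assumed purely for the demo.

```python
import numpy as np

def generic_pf(y_seq, sample_prior, sample_stp, ol, N=500, seed=0):
    """Minimal sequential-importance-sampling PF with the STP as IS density.

    sample_prior(rng, N) -> (N, d) initial particles
    sample_stp(rng, X)   -> (N, d) particles sampled from p(X_t | X_{t-1})
    ol(y, X)             -> (N,) observation likelihoods p(y | X_t)
    """
    rng = np.random.default_rng(seed)
    X = sample_prior(rng, N)
    estimates = []
    for y in y_seq:
        X = sample_stp(rng, X)                  # 1) importance sample from the STP
        w = ol(y, X)                            # 2) weight (q = STP cancels out)
        w = w / w.sum()
        estimates.append((w[:, None] * X).sum(axis=0))
        idx = rng.choice(N, size=N, p=w)        # 3) resample in proportion to weights
        X = X[idx]
    return np.array(estimates)

# toy scalar random-walk model with linear Gaussian observations (assumed values)
rng0 = np.random.default_rng(1)
truth = np.cumsum(rng0.normal(0, 0.5, size=50))
ys = truth + rng0.normal(0, 0.3, size=50)
est = generic_pf(
    ys,
    sample_prior=lambda r, n: r.normal(0, 1, size=(n, 1)),
    sample_stp=lambda r, X: X + r.normal(0, 0.5, size=X.shape),
    ol=lambda y, X: np.exp(-0.5 * (y - X[:, 0]) ** 2 / 0.3**2),
)
```

Because the observations are informative here (σ_obs = 0.3 is smaller than the STP spread 0.5), even this simplest choice of q tracks the toy sequence; the paper's point is that this breaks down as the dimension grows or the OL becomes multimodal.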
It is well known [3], [9] that the number of particles for a given tracking accuracy increases with state space dimension. This makes any regular PF impractical for large dimensional state spaces (LDSS).

Definition 1 (Multimodal (or heavy-tailed) OL): refers to the OL, p(Y_t | X_t), having multiple local maxima (or a heavy tail) as a function of the state, X_t, for a given observation, Y_t.

An example is the observation model for the nonstationary growth model of [3]: Y_t = X_t² + w_t. Here, the OL is bimodal with modes at X_t = ±√Y_t whenever Y_t is significantly positive. Another example is the clutter model of [10]. Other examples are as follows. Consider tracking spatially varying temperature change using a network of sensors (see Example 1). Whenever one or more sensors fail (e.g., due to a large unmodeled disturbance or some other damage), the OL is often heavy-tailed or multimodal (see Fig. 1). The models of Example 1 are also similar to the commonly used clutter model in radar-based target tracking applications or in contour tracking applications, e.g., Condensation [10], and to outlier noise models used in other visual tracking problems [11] or in aircraft navigation problems [9]. Another reason for OL multimodality is having a sensor that measures a nonlinear (many-to-one) function of the actual temperature, e.g., the growth model of [3]. Another many-to-one example is when the observation is a product of functions of two subsets of states plus noise, e.g., bearings-only tracking [3] or illumination and motion tracking [12], [13]. Note that even though our work was motivated by tracking problems with frequently multimodal OL, it is equally well applicable to any problem where the posterior is often multimodal (e.g., due to nonlinearities in the system model), but is unimodal conditioned on a part of the state space.
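The bimodality of the growth-model OL is easy to verify numerically. The sketch below (noise variance and grid are assumed values, not from the paper) evaluates p(Y_t | X_t) for Y_t = X_t² + w_t on a grid and finds its local maxima near ±√Y_t:

```python
import numpy as np

def growth_model_ol(y, x_grid, sigma_w=1.0):
    """OL p(y | x) for the observation model y = x^2 + w, w ~ N(0, sigma_w^2)."""
    return np.exp(-0.5 * (y - x_grid**2) ** 2 / sigma_w**2) / np.sqrt(2 * np.pi * sigma_w**2)

y_obs = 9.0                      # a significantly positive observation
x = np.linspace(-6, 6, 2401)
ol = growth_model_ol(y_obs, x)

# strict interior local maxima of the OL over the grid
is_peak = (ol[1:-1] > ol[:-2]) & (ol[1:-1] > ol[2:])
modes = x[1:-1][is_peak]
print(modes)                     # two modes, near ±sqrt(y_obs) = ±3
```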
Large dimensional state spaces (LDSS) occur in tracking time-varying random fields, such as temperature or pressure, at a large number of nodes using a network of sensors [14], [15] (applications in environment monitoring and weather forecasting); in tracking AR parameters for noisy speech [16]; and in visual tracking problems such as tracking deforming contours [17], [18], [19], [11], tracking spatially varying illumination change [12], [13], or tracking sets of "landmark" points [20]. In all of the above problems, at any time, "most state change" occurs in a small number of dimensions, while the change in the rest of the state space is small.

[Fig. 1: Demonstrating the effect of multimodal or heavy-tailed OL and broad STP for an M = 1 dimensional version of Example 1 with temperature-independent failure. X_t is temperature. The STP is N(X_{t−1}, σ²_sys), i.e., Example 1 with a = 0. Each panel plots −log[OL], −log[STP], and −log[p*(X_t)] against X_t. Fig. 1(a): one out of J = 2 sensors fails (bimodal OL) but the STP is narrow enough (σ²_sys = 1), so p* is unimodal. Fig. 1(b): one out of J = 2 sensors fails (bimodal OL) and the STP is broad (σ²_sys = 5), so p* is bimodal. Fig. 1(c): estimating temperature with J = 1 sensor and a broad STP (σ²_sys = 5); when the sensor fails, the OL is heavy-tailed and peaks at the wrong mode, so p* is bimodal with the wrong mode being the strong one. Note that the correct mode is so weak it may get missed in numerical computations.]
We call this the "LDSS property." The LDSS property is related to, but different from, the assumption used by dimension reduction techniques such as Principal Components Analysis (PCA). If X_t is a stationary large dimensional time series, or if X_t projected along a large part of the state space is asymptotically stationary, PCA can be used for dimension reduction. Under a similar assumption, another PF has been recently proposed [21]. But if X_t follows a random walk model (the increments, X_t − X_{t−1}, are stationary) in all dimensions, one cannot simply eliminate the low variance directions of X_t − X_{t−1}, or use [21]. This is because the variance of X_t even along these directions will be significant as t increases.

A generic PF is summarized in Algorithm 1. The most commonly used importance sampling density is the STP [3]. This assumes nothing and is easiest to implement. But since it does not use knowledge of the observation, the weights' variance can be large (particularly when the STP is broad compared to the OL), resulting in lower effective particle sizes [4]. The "optimal" importance density [6], i.e., the one that minimizes the variance of the weights conditioned on past particles and observations until t, is the posterior conditioned on the previous state, denoted p*. When p* is unimodal (at least approximately), PF-Doucet [6] approximates it by a Gaussian about its mode (Laplace's approximation) and importance samples from the Gaussian. Laplace's approximation has also been used for approximating posteriors in different contexts earlier [22], [23], [24]. Other work in the PF literature that also implicitly assumes that p* is unimodal includes [4], [25], [26]. When the OL is multimodal, p* will be unimodal only if the STP is unimodal and narrow enough (see Fig. 1). In many situations, especially for LDSS problems, this does not hold.
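The claim that the variance of X_t grows even along the low-variance increment directions can be checked with a short Monte Carlo sketch (all numbers are assumed values for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_trials = 1000, 2000
sigma_inc = 0.1     # small increment std along a "low variance" direction

# random walk X_t = X_{t-1} + increment; the increments are stationary with
# small variance, but Var(X_t) = t * sigma_inc^2 still grows without bound
increments = rng.normal(0.0, sigma_inc, size=(n_trials, T))
X = np.cumsum(increments, axis=1)

print(np.var(X[:, 9]), np.var(X[:, 999]))   # variance grows roughly linearly in t
```

So a direction that looks negligible in the increment covariance cannot simply be dropped from the state, which is why mode tracking (rather than elimination) is proposed for these directions later in the paper.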
We develop the PF with Efficient IS (PF-EIS) algorithm to address such situations. PF-EIS assumes unimodality of p* conditioned on a few states which we call "multimodal states". Sufficient conditions to test for the unimodality of this conditional posterior are derived in Theorem 1. To the best of our knowledge, such a result has not been proved earlier. It is equally applicable to testing for unimodality of any posterior.

When, in addition to multimodality, the state space dimension is also large (typically more than 10 or 12), the number of particles required for reasonable accuracy is very large [3], [9], and this makes a regular PF impractical. One solution that partially addresses this issue is [5, Ch. 13] or [7], which propose to resample more than once within a time interval. But more resampling results in more particle impoverishment [4]. When the state space model is conditionally linear-Gaussian, or when many states can be vector quantized into a few discrete centers (the centers need to be known a priori), Rao-Blackwellization (RB-PF) [27], [9] can be used. In general, neither assumption may hold. But when the LDSS property holds, it is possible to split the state space in such a way that the conditional posterior of a part of it is quite narrow, besides being unimodal. If it is narrow enough, importance sampling (IS) from this part of the state space can be replaced by just tracking the mode of the conditional posterior (mode tracking (MT)). The resulting algorithm is called PF-EIS-MT. MT introduces some extra error, but it greatly reduces the IS dimension. The net effect is that a much smaller number of particles is required to achieve a given error, thus making PF practical for LDSS problems.
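The resampling step referred to above admits several variants; a common low-variance choice is systematic resampling. A minimal sketch (this particular variant is an assumed illustration; the paper itself simply cites [4] for resampling):

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling: one uniform offset, N evenly spaced pointers.
    It has lower resampling variance than multinomial sampling, which helps
    limit particle impoverishment."""
    N = len(weights)
    positions = (rng.random() + np.arange(N)) / N
    return np.searchsorted(np.cumsum(weights), positions)

rng = np.random.default_rng(0)
w = np.array([0.5, 0.3, 0.1, 0.1])      # normalized particle weights
idx = systematic_resample(w, rng)
# particle 0 (weight 0.5) is replicated roughly N * 0.5 = 2 times
```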
In summary, our contributions are (a) two efficient algorithms for multimodal and large dimensional problems (PF-EIS and PF-EIS-MT); and (b) a set of sufficient conditions to test for unimodality of the conditional posterior (Theorem 1), and heuristics based on it to split the state space in the most efficient way. PF-EIS and Theorem 1 are derived in Sec. II. A generic LDSS model is introduced in Sec. III. Practical ways of choosing the "multimodal states" are discussed in Sec. IV. PF-EIS-MT and PF-MT are introduced in Sec. V. Relation to existing work is described in Sec. VI. In Sec. VII, we give extensive simulation results comparing our methods with existing work for the temperature field tracking problem. Conclusions and open issues are presented in Sec. VIII.

II. PF-EIS: PF-EFFICIENT IMPORTANCE SAMPLING

We denote the probability density function (pdf) of a random vector X, f_X(X), using the notation p(X), and we denote the conditional pdf, f_{X|Y}(X|Y), by p(X|Y). Consider tracking a hidden sequence of states X_t from a sequence of observations Y_t which satisfy the HMM property.

Algorithm 1 (Generic PF): going from π^N_{t−1} to π^N_t(X_t) = Σ_{i=1}^N w^{(i)}_t δ(X_t − X^i_t) (δ(X − a) denotes a Dirac delta function at a). A PF starts by sampling N times from π_0 at t = 0 to approximate it by π^N_0(X_0). For each t > 0, it approximates the Bayes recursion for going from π^N_{t−1} to π^N_t using sequential importance sampling, which consists of the following three steps:
1) Importance Sample (IS): For i = 1, 2, ... N, sample X^i_t ∼ q(X^i_t). The IS density, q, can depend on X^i_{1:t−1}, Y_{1:t}.
2) Weight: For i = 1, 2, ... N, compute the weights: w^i_t = w̃^i_t / Σ_{j=1}^N w̃^{(j)}_t, where w̃^i_t = w^i_{t−1} p(Y_t | X^i_t) p(X^i_t | X^i_{t−1}) / q(X^i_t).
3) Resample: Replicate particles in proportion to their weights and reset w^i_t for all i [4]. Set t ← t + 1 and go to step 1.

Assumption 1 (HMM): For each t,
1) the dependence X_{t−1} → X_t is Markovian, with state transition pdf (STP) p(X_t | X_{t−1});
2) conditioned on X_t, Y_t is independent of past and future states and observations. The observation likelihood (OL) is p(Y_t | X_t).

A generic particle filter (PF) is summarized in Algorithm 1.

A. PF-EIS Algorithm

Consider designing a PF for a given state space model. The optimal importance sampling density [6] is p(X_t | X^i_{t−1}, Y_t) ≜ p*(X_t). In most cases, this cannot be computed analytically [6]. If p* is unimodal (at least approximately), [6] suggests approximating it by a Gaussian about its mode and sampling from it (Laplace's approximation [24]). But when the OL is multimodal, or heavy-tailed, or otherwise not strongly log-concave, p* will be unimodal only if the STP is unimodal and narrow enough and the predicted state particle is near enough to an OL mode (see Fig. 1). In many situations, this may not hold in all dimensions. But in most such situations, the STP is broad and/or multimodal in only a few directions of the state space, which we call the "multimodal" directions. It can be shown that if the STP is unimodal and narrow enough in the rest of the directions, p* will be unimodal conditioned on the "multimodal states" (Theorem 1). When this holds, we propose to split the state vector as X_t = [X_{t,s}; X_{t,r}] in such a way that X_{t,s} contains the minimum number of dimensions for which p* is unimodal conditioned on it, i.e.,

p**,i(X_{t,r}) ≜ p*(X_t | X^i_{t,s}) = p(X_{t,r} | X^i_{t−1}, X^i_{t,s}, Y_t)   (1)

is unimodal. We sample X_{t,s} from its STP (to sample the possibly multiple modes of p*), and use Laplace's approximation to approximate p**,i and sample X_{t,r} from it, i.e.,
sample X^i_{t,r} from N(m^i_t, Σ^i_{IS}), where

m^i_t = m^i_t(X^i_{t−1}, X^i_{t,s}, Y_t) ≜ arg min_{X_{t,r}} L^i(X_{t,r}),   Σ^i_{IS} ≜ [(∇² L^i)(m^i_t)]^{−1},
L^i(X_{t,r}) ≜ −log[p**,i(X_{t,r})] + const   (2)

∇² L^i denotes the Hessian of L^i. The weighting step also changes to satisfy the principle of importance sampling. The complete algorithm is given in Algorithm 2. We call it PF with Efficient Importance Sampling (PF-EIS). As we shall see later, it is very expensive to exactly verify the unimodality conditions of Theorem 1. But even if X_{t,s} is chosen so that p**,i is unimodal for most particles and at most times (i.e., is unimodal with high probability), the proposed algorithm works well. This can be seen from the simulation results of Sec. VII.

B. Conditional Posterior Unimodality

We derive sufficient conditions for unimodality of the conditional posterior, p**,i. Let dim(X_{t,s}) ≜ K, dim(X_{t,r}) ≜ M_r, dim(X_t) ≜ M = K + M_r. Because of the HMM structure,

p**,i(X_{t,r}) = ζ p(Y_t | X^i_{t,s}, X_{t,r}) p(X_{t,r} | X^i_{t−1}, X^i_{t,s})   (3)

where ζ is a proportionality constant.

Definition 2: We first define a few terms and symbols.
1) The notation A > 0 (A ≥ 0), where A is a square matrix, means that A is positive definite (positive semi-definite). Also, A > B (A ≥ B) means A − B > 0 (A − B ≥ 0).
2) The term "minimizer" refers to the unconstrained local minimizer of a function, i.e., a point x_0 s.t. f(x_0) ≤ f(x) for all x in its neighborhood. Similarly for "maximizer".
3) A twice differentiable function, f(x), is strongly convex in a region R if there exists an m > 0 s.t. at all points x ∈ R, the Hessian ∇²f(x) ≥ mI. If f is strongly convex in R, it has at most one minimizer in R and it lies in the interior of R. If f is strongly convex on R^M, then it has exactly one (finite) minimizer.
4) A function is strongly log-concave if its negative log is strongly convex. An example is a Gaussian pdf.
5) Since a pdf is an integrable function, it will always have at least one (finite) maximizer. Thus a pdf having at most one maximizer is equivalent to it being unimodal.
6) The symbol E[.] denotes expected value.
7) We denote the −log of the OL using the symbol E_{Y_t}, i.e.,

E_{Y_t}(X_t) ≜ −log p(Y_t | X_t) + const   (4)

8) We denote the −log of the STP of X_{t,r} as

D^i(X_{t,r}) ≜ −log p(X_{t,r} | X^i_{t−1}, X^i_{t,s}) + const   (5)

9) When the STP of X_{t,r} is strongly log-concave (assumed in Theorem 1), we denote its unique mode by

f^i_r ≜ f_r(X^i_{t−1}, X^i_{t,s}) = arg max_{X_{t,r}} p(X_{t,r} | X^i_{t−1}, X^i_{t,s})   (6)

10) [z]_p or z_p denotes the p-th coordinate of a vector z.
11) max_p is often used in place of max_{p=1,2,...M_r}.

Combining (3), (4) and (5), L^i(X_{t,r}) can be written as

L^i(X_{t,r}) = E_{Y_t}(X^i_{t,s}, X_{t,r}) + D^i(X_{t,r})   (7)

Now, p**,i(X_{t,r}) will be unimodal if and only if we can show that L^i has at most one minimizer. We derive a set of sufficient conditions on E_{Y_t}, D^i and f^i_r to ensure this.

Algorithm 2 (PF-EIS): going from π^N_{t−1} to π^N_t(X_t) = Σ_{i=1}^N w^{(i)}_t δ(X_t − X^i_t), with X^i_t = [X^i_{t,s}, X^i_{t,r}]:
1) Importance Sample X_{t,s}: for each i, sample X^i_{t,s} ∼ p(X^i_{t,s} | X^i_{t−1}).
2) Efficient Importance Sample X_{t,r}: for each i, sample X^i_{t,r} ∼ N(X^i_{t,r}; m^i_t, Σ^i_{IS}). Here m^i_t(X^i_{t−1}, X^i_{t,s}, Y_t) = arg min_{X_{t,r}} L^i(X_{t,r}), Σ^i_{IS} ≜ (∇² L^i(m^i_t))^{−1}, and L^i is defined in (7).
3) Weight: for each i, compute w^i_t = w̃^i_t / Σ_{j=1}^N w̃^j_t, where w̃^i_t = w^i_{t−1} p(Y_t | X^i_t) p(X^i_{t,r} | X^i_{t−1}, X^i_{t,s}) / N(X^i_{t,r}; m^i_t, Σ^i_{IS}), with X^i_t = [X^i_{t,s}, X^i_{t,r}].
4) Resample [4]. Set t ← t + 1 and go to step 1.

The main idea is as follows. We assume strong log-concavity (e.g.,
Gaussianity) of the STP of X_{t,r}. Thus D^i(X_{t,r}) will be strongly convex with a unique minimizer at f^i_r. But E_{Y_t}(X_t) (and so E_{Y_t} as a function of X_{t,r}) can have multiple minimizers since the OL can be multimodal. Assume that E_{Y_t}(X^i_{t,s}, X_{t,r}) is locally convex in the neighborhood of f^i_r (this will hold if f^i_r is close enough to any of its minimizers). Denote this region by R_LC. Thus, inside R_LC, L^i will be strongly convex and hence it will have at most one minimizer. We show that if max_p |[∇D]_p| is large enough outside R_LC (the spread of the STP of X_{t,r} is small enough), L^i will have no stationary points (and hence no minimizers) outside R_LC or on its boundary. This idea leads to Theorem 1 below. Its first condition ensures strong convexity of D^i everywhere. The second one ensures that R_LC exists. The third one ensures that there exists an ǫ_0 > 0 s.t. at all points in R^c_LC (the complement of R_LC), max_p |[∇L^i]_p| > ǫ_0 (i.e., L^i has no stationary points in R^c_LC).

Theorem 1: p**,i(X_{t,r}) is unimodal, with the unique mode lying inside R_LC, if Assumption 1 and the following hold:
1) The STP of X_{t,r}, p(X_{t,r} | X^i_{t−1}, X^i_{t,s}), is strongly log-concave. Its unique mode is denoted by f^i_r.
2) The −log of the OL given X^i_{t,s}, E_{Y_t}(X^i_{t,s}, X_{t,r}), is twice continuously differentiable almost everywhere and is locally convex in the neighborhood of f^i_r. Let R_LC ⊆ R^{M_r} denote the largest convex region in the neighborhood of f^i_r where ∇²_{X_{t,r}} E_{Y_t}(X^i_{t,s}, X_{t,r}) ≥ 0 (E_{Y_t} as a function of X_{t,r} is locally convex).
3) There exists an ǫ_0 > 0 such that

inf_{X_{t,r} ∈ ∩_{p=1}^{M_r} (A_p ∪ Z_p)}  max_{p=1,...M_r} [γ_p(X_{t,r})] > 1   (8)

where

γ_p(X_{t,r}) ≜ |[∇D^i]_p| / (ǫ_0 + |[∇E_{Y_t}]_p|),  if X_{t,r} ∈ A_p
γ_p(X_{t,r}) ≜ |[∇D^i]_p| / (ǫ_0 − |[∇E_{Y_t}]_p|),  if X_{t,r} ∈ Z_p   (9)

A_p ≜ { X_{t,r} ∈ R^c_LC : [∇D^i]_p ·
[∇E_{Y_t}]_p < 0 }
Z_p ≜ { X_{t,r} ∈ R^c_LC : [∇E_{Y_t}]_p · [∇D^i]_p ≥ 0 and |[∇E_{Y_t}]_p| < ǫ_0 }   (10)

∇E_{Y_t} ≜ ∇_{X_{t,r}} E_{Y_t}(X^i_{t,s}, X_{t,r}),   ∇D^i ≜ ∇_{X_{t,r}} D^i(X_{t,r})   (11)

Proof: In the proof, ∇ is used to denote ∇_{X_{t,r}}. Also, we remove the superscripts from L^i and D^i. p**,i(X_{t,r}) will be unimodal iff L, defined in (7), has at most one minimizer. We obtain sufficient conditions for this. Condition 1) ensures that D is strongly convex everywhere with a unique minimizer at f^i_r. Condition 2) ensures that R_LC exists. By definition of R_LC, E_{Y_t} is convex inside it. Thus the first two conditions ensure that L is strongly convex inside R_LC, so it has at most one minimizer inside R_LC. We now show that if condition 3) also holds, L will have no stationary points (and hence no minimizers) in R^c_LC or on its boundary. A sufficient condition for this is:

∃ ǫ_0 > 0 s.t. max_p |[∇L]_p| > ǫ_0, for all X_{t,r} ∈ R^c_LC   (12)

We show that condition 3) is sufficient to ensure (12). Note that ∇L = ∇E_{Y_t} + ∇D. In the regions where, for at least one p, [∇E_{Y_t}]_p · [∇D]_p ≥ 0 (same sign) and |[∇E_{Y_t}]_p| > ǫ_0, condition (12) will always hold. Thus we only need to worry about regions where, for all p, either [∇E_{Y_t}]_p · [∇D]_p < 0, or [∇E_{Y_t}]_p · [∇D]_p ≥ 0 but |[∇E_{Y_t}]_p| < ǫ_0. This is the region

∩_{p=1}^{M_r} (A_p ∪ Z_p) ≜ G,  with A_p, Z_p defined in (10)   (13)

Now, D has only one stationary point, which is f^i_r, and it lies inside R_LC (by definition of R_LC); it has none in R^c_LC. Thus ∇D ≠ 0 in R^c_LC and, in particular, inside G ⊂ R^c_LC. Thus if we can find a condition which ensures that, for all points in G, for at least one p, [∇L]_p "follows the sign of [∇D]_p" (i.e., [∇L]_p > ǫ_0 where [∇D]_p > 0 and [∇L]_p < −ǫ_0 where [∇D]_p < 0), we will be done.
We first find the required condition for a given p and a point X_{t,r} ∈ G. For any p, if X_{t,r} ∈ G, then it either belongs to A_p or belongs to Z_p. If X_{t,r} ∈ A_p, |[∇L]_p| > ǫ_0 if

|[∇D]_p| / (ǫ_0 + |[∇E_{Y_t}]_p|) > 1   (14)

This is obtained by combining the conditions for the case [∇D]_p > 0 and the case [∇D]_p < 0. Proceeding in a similar fashion, if X_{t,r} ∈ Z_p, |[∇L]_p| > ǫ_0 if

|[∇D]_p| / (ǫ_0 − |[∇E_{Y_t}]_p|) > 1   (15)

Inequalities (14) and (15) can be combined and rewritten as γ_p(X_{t,r}) − 1 > 0, where γ_p is defined in (9). For (12) to hold, we need |[∇L]_p| > ǫ_0 for at least one p, for all X_{t,r} ∈ G. This will happen if inf_{X_{t,r} ∈ G} max_p γ_p(X_{t,r}) > 1. But this is condition 3). Thus condition 3) implies that L has no minimizers in R^c_LC. Thus, if conditions 1), 2) and 3) of the theorem hold, L has at most one minimizer, which lies inside R_LC. Thus p**,i(X_{t,r}) has a unique mode which lies inside R_LC, i.e., it is unimodal. ∎

The most common example of a strongly log-concave pdf is a Gaussian. When the STP of X_{t,r} is Gaussian with mean (= mode) f^i_r, the above result can be further simplified to get an upper bound on the eigenvalues of its covariance matrix. First consider the case when the covariance is diagonal, denoted ∆_r. In this case, D^i(X_{t,r}) = Σ_p ([X_{t,r} − f^i_r]_p)² / (2∆_{r,p}), and so [∇D^i]_p = [X_{t,r} − f^i_r]_p / ∆_{r,p}. By substituting this in condition 3), it is easy to see that we get the following simplified condition:

inf_{X_{t,r} ∈ ∩_{p=1}^{M_r} (A_p ∪ Z_p)}  max_p [γ^num_p(X_{t,r}) − ∆_{r,p}] > 0   (16)

γ^num_p(X_{t,r}) ≜ |[X_{t,r} − f^i_r]_p| / (ǫ_0 + |[∇E_{Y_t}]_p|),  if X_{t,r} ∈ A_p
γ^num_p(X_{t,r}) ≜ |[X_{t,r} − f^i_r]_p| / (ǫ_0 − |[∇E_{Y_t}]_p|),  if X_{t,r} ∈ Z_p   (17)

A_p ≜ { X_{t,r} ∈ R^c_LC : [X_{t,r} − f^i_r]_p · [∇E_{Y_t}]_p < 0 }
Z_p ≜ { X_{t,r} ∈ R^c_LC : [∇E_{Y_t}]_p ·
[X_{t,r} − f^i_r]_p ≥ 0 and |[∇E_{Y_t}]_p| < ǫ_0 }   (18)

Also, since max_p [g_1(p) − g_2(p)] ≥ max_p g_1(p) − max_p g_2(p) for any two functions g_1, g_2, a sufficient condition for (16) is

max_p ∆_{r,p} < inf_{X_{t,r} ∈ ∩_{p=1}^{M_r} (A_p ∪ Z_p)} max_p [γ^num_p(X_{t,r})] ≜ ∆*   (19)

Thus, we have the following corollary.

Corollary 1: When the STP of X_{t,r} is Gaussian with mean f^i_r and diagonal covariance ∆_r, p**,i(X_{t,r}) is unimodal if (a) condition 2) of Theorem 1 holds and (b) there exists an ǫ_0 > 0 s.t. (16) holds, with γ^num_p defined in (17) and A_p, Z_p defined in (18). A sufficient condition for (16) is (19).

Now consider the case when the STP of X_{t,r} is Gaussian with non-diagonal covariance, Σ_r = U ∆_r U^T. Define X̃_{t,r} = U^T X_{t,r}. Since X̃_{t,r} is a one-to-one and linear function of X_{t,r}, it is easy to see that p**,i(X_{t,r}) is unimodal iff p**,i(X̃_{t,r}) ≜ p(X̃_{t,r} | X^i_{t−1}, X^i_{t,s}, Y_t) is unimodal. The STP of X̃_{t,r} is N(U^T f^i_r, ∆_r). Also, its OL is p(Y_t | X^i_{t,s}, U X̃_{t,r}). Define Ẽ_{Y_t}(X̃_{t,r}) ≜ E_{Y_t}(U X̃_{t,r}).

Corollary 2: When the STP of X_{t,r} is Gaussian with mean f^i_r and non-diagonal covariance Σ_r = U ∆_r U^T, p**,i(X_{t,r}) is unimodal if the conditions of Corollary 1 hold with E_{Y_t} replaced by Ẽ_{Y_t}, f^i_r replaced by U^T f^i_r, and X_{t,r} replaced by X̃_{t,r} everywhere.

To summarize the above discussion, p**,i is unimodal if:
1) the STP of X_{t,r} is strongly log-concave (e.g., Gaussian);
2) the mode of the STP of X_{t,r} is "close enough" to a mode of [OL given X^i_{t,s}], so that condition 2) of Theorem 1 holds; denote this mode by X*_r;
3) the maximum spread of the STP of X_{t,r} is "small enough" to ensure that condition 3) of Theorem 1 holds. In the Gaussian STP case, this translates to the maximum eigenvalue of its covariance being smaller than ∆*, defined in (19).
∆* itself is directly proportional to the distance of X*_r to the next nearest mode of [OL given X^i_{t,s}], and inversely proportional to its strength. The last two conditions above automatically hold if [OL given X^i_{t,s}] is strongly log-concave (R^c_LC is empty and so ∆* = ∞).

III. A GENERIC STATE SPACE MODEL FOR LDSS

For many problems, and, in particular, for many large dimensional state space (LDSS) problems, the state space model can be expressed as follows, with X_t = [C_t, v_t] (a generalization of the constant velocity motion model):

Y_t = h_{C,w}(C_t, w_t),  w_t ∼ p_w(.)
C_t = C_{t−1} + g_{C_{t−1}}(B v_t),  B ≜ B(C_{t−1})
v_t = f_v(v_{t−1}) + ν_t,  ν_t ∼ N(0, ∆_ν),  ∆_ν diagonal   (20)

The noises ν_t, w_t are independent of each other and over time. If h_{C,w} is one-to-one as a function of w_t, and its inverse is denoted by g(C_t, Y_t), the OL can be written as

p(Y_t | C_t) = p_w(g(C_t, Y_t))   (21)

Then its −log is E_{Y_t}(C_t) = −log p_w(g(C_t, Y_t)). In certain problems, it is easier to directly specify p(Y_t | C_t) = β exp[−E_{Y_t}(C_t)].

In the above model, C_t denotes the LDSS quantity of interest; for example, it may denote the M contour point locations, or it may denote temperature (or any other physical quantity) at M sensor nodes. The quantity V_t ≜ B v_t often denotes the time "derivative" of C_t and is assumed to follow a first order Markov model. If C_t belongs to a smooth manifold S, then V_t belongs to the tangent space to S at C_t. g_C(V) denotes the mapping from the tangent space at C to S, while if S is a vector space, then g_C(V) ≡ V. In this work, we only study the vector space case. We develop the same ideas for the space of contours (a smooth manifold) in [11]. Related work on defining AR models for smooth manifolds is [28].
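A minimal simulation sketch of the vector-space case of model (20): the spatial correlation, AR coefficient a, and sizes below are assumed values chosen only to illustrate the structure C_t = C_{t−1} + B v_t, v_t = a v_{t−1} + ν_t, with B the eigenvector matrix of the increment covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
M, T, a = 5, 200, 0.5                 # nodes, time steps, AR coefficient (assumed)

# spatially correlated increment covariance Sigma_n = B diag(Delta_nu) B^T
dist = np.abs(np.subtract.outer(np.arange(M), np.arange(M)))
Sigma_n = np.exp(-dist / 2.0)         # assumed exponential spatial correlation
Delta_nu, B = np.linalg.eigh(Sigma_n) # eigendecomposition gives diagonal Delta_nu

C = np.zeros(M)                       # temperature at the M nodes
v = np.zeros(M)                       # rotated "velocity" coefficients
for t in range(T):
    nu = rng.normal(0.0, np.sqrt(Delta_nu))
    v = a * v + nu                    # v_t = a v_{t-1} + nu_t,  nu_t ~ N(0, Delta_nu)
    C = C + B @ v                     # C_t = C_{t-1} + B v_t

# LDSS property in miniature: most of the increment variance (sum of Delta_nu)
# is concentrated in a few eigen-directions of Sigma_n
print(Delta_nu / Delta_nu.sum())
```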
Note that in the above model, the system noise dimension (and hence the importance sampling dimension) is M = dim(ν_t) = dim(v_t), and not 2M, and this is what governs the number of particles required for a given accuracy. We discuss some LDSS examples below.

Example 1 (Temperature Tracking): Consider tracking temperature at M locations using a network of sensors. Here S is a vector space and so g_C(V) ≡ V. Let C_{t,p} denote the temperature at location p, p = 1, ... M, and V_{t,p} denote the first derivative of temperature at node p. V_t is assumed to be zero mean and its dynamics can be modeled by a linear Gauss-Markov model (as also in [14]), i.e.,

C_t = C_{t−1} + V_t,  V_t = A_V V_{t−1} + n_t,  n_t ∼ N(0, Σ_n)   (22)

Since V_t is usually spatially correlated, Σ_n may not be diagonal. Let the eigenvalue decomposition of Σ_n be Σ_n = B ∆_ν B^T. Define v_t ≜ B^T V_t, ν_t ≜ B^T n_t, f_v(v) ≡ B^T A_V B v, and g_C(V) ≡ V. For simplicity, we use A_V = aI, and so f_v(v) ≡ B^T A_V B v = av. With f_v(v) = av, B is also the eigenvector matrix of the covariance of V_t. Then (22) can be rewritten in the form (20) as

C_t = C_{t−1} + B v_t
v_t = a v_{t−1} + ν_t,  ν_t ∼ N(0, ∆_ν)   (23)

Temperature at each node, p, is measured using J (J = 1 or 2) sensors that have failure probabilities α^{(j)}_p, j = 1, 2. Note that there may actually be two sensors at a node, or two nearby sensors can be combined and treated as one "node" for tracking purposes. Failures of the JM sensors are assumed to be independent of each other and over time. If a sensor is working, the observation, Y^{(j)}_{t,p}, is the actual temperature, C_{t,p}, or some function of it, h_p(C_{t,p}), plus Gaussian noise with small variance, σ²_{obs,p} (independent of the noise in other sensors and at other times). If the sensor fails, Y^{(j)}_{t,p} is either independent of, or weakly dependent on, C_{t,p} (e.g.,
a large variance Gaussian about C_{t,p}). An alternative failure model is Y^{(j)}_{t,p} being some different function, h^f_p, of C_{t,p}, plus noise. In all the above cases, the OL can be written as

p(Y_t | C_t) = Π_{p=1}^M Π_{j=1}^J p(Y^{(j)}_{t,p} | C_{t,p}), where
p(Y^{(j)}_{t,p} | C_{t,p}) = (1 − α^{(j)}_p) N(Y^{(j)}_{t,p}; h_p(C_{t,p}), σ²_{obs,p}) + α^{(j)}_p p_f(Y^{(j)}_{t,p} | C_{t,p})   (24)

We simulated two types of sensors: h_p(C_{t,p}) = C_{t,p} (linear) and h_p(C_{t,p}) = C²_{t,p} (squared). Note that a squared sensor is an extreme example of possible sensing nonlinearities.

First consider J = 1 (one sensor per node), h_p(C_{t,p}) = C_{t,p} for all p (all linear sensors), and p_f(Y^{(j)}_{t,p} | C_{t,p}) = p_f(Y^{(j)}_{t,p}) (when the sensor fails, the observation is independent of the true temperature). In this case, each OL term is a raised Gaussian (heavy-tailed) as a function of C_{t,p}, and so it is not strongly log-concave. For a given p, p*(C_{t,p}) will be multimodal when Y^{(1)}_{t,p} is "far" from the predicted temperature at this node and the STP is not narrow enough. This happens with high probability (w.h.p.) whenever the sensor fails. See Fig. 1(c). A similar model is also used in modeling clutter [10], [20].

Now consider J = 2, all linear sensors, and p_f(Y^{(j)}_{t,p} | C_{t,p}) = p_f(Y^{(j)}_{t,p}). Whenever one or both sensors at a node p_0 fail, the observations Y^{(1)}_{t,p_0}, Y^{(2)}_{t,p_0} will be "far" compared to σ²_obs w.h.p. In this case, the OL will be bimodal as a function of C_{t,p_0}, since p(Y_{t,p} | C_{t,p}) can be written as a sum of four terms: a product-of-Gaussians term (which is negligible), plus K_1 + K_2 N(Y^{(1)}_{t,p_0}; C_{t,p_0}, σ²_obs) + K_3 N(Y^{(2)}_{t,p_0}; C_{t,p_0}, σ²_obs), where K_1, K_2, K_3 are constants w.r.t. C_t. This is bimodal since the modes of the two Gaussians, Y^{(1)}_{t,p_0}, Y^{(2)}_{t,p_0}, are "far". See Fig. 1(b).
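The mixture OL (24) and its bimodality under a sensor failure can be evaluated directly. In the sketch below (the failure probability, noise variance, and a uniform failure density p_f are all assumed values), two linear sensors at one node report "far" readings, and the resulting OL has one mode near each reading:

```python
import numpy as np

def gauss(y, m, var):
    return np.exp(-0.5 * (y - m) ** 2 / var) / np.sqrt(2 * np.pi * var)

def sensor_ol(y_list, c_grid, alpha=0.1, var_obs=0.25, pf_unif=1.0 / 100.0):
    """OL (24) for one node with J linear sensors: each factor is a mixture of a
    working-sensor Gaussian and a failure density p_f (assumed uniform here)."""
    ol = np.ones_like(c_grid)
    for y in y_list:
        ol *= (1 - alpha) * gauss(y, c_grid, var_obs) + alpha * pf_unif
    return ol

c = np.linspace(-10, 20, 6001)
ol = sensor_ol([2.0, 12.0], c)      # J = 2 sensors with "far" readings
peaks = c[1:-1][(ol[1:-1] > ol[:-2]) & (ol[1:-1] > ol[2:])]
print(peaks)                        # bimodal: one mode near each reading (2 and 12)
```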
If no sensor at a node fails, both observations will be "close" w.h.p. In this case all four terms have roughly the same mode, and thus the sum is unimodal. When p_f(Y_{t,p}^{(j)} | C_{t,p}) is weakly dependent on C_{t,p} (e.g. a large-variance Gaussian), K_1, K_2, K_3 are not constants but slowly varying functions of C_t; a similar argument applies there as well. A squared sensor results in a bimodal OL whenever Y_{t,p}^{(j)} is significantly positive. The squared sensor is one example of a many-to-one measurement function. Other examples include bearings-only tracking [3] and illumination tracking [12], [13].

Example 2 (Illumination/Motion Tracking): The illumination and motion tracking model of [12], [13] can be rewritten in the form (20). In this case, the OL is often multimodal, since the observation (image intensity) is a many-to-one function of the state (illumination, motion); but conditioned on motion, it is often unimodal. The STP of illumination is quite narrow.

Example 3 (Contour Tracking, Landmark Shape Tracking): Two non-Euclidean space examples of the LDSS model (20) are (a) the contour tracking problems given in [11], [29], [17] and (b) the landmark shape tracking problem of [20], [10].

IV. CHOOSING THE "MULTIMODAL STATES" FOR LDSS

In Sec. IV-A below, we apply Theorem 1 to the generic LDSS model, (20), and show an example of verifying its conditions. Practical ways to select X_{t,s} are given in Sec. IV-B.

A. Unimodality Result for the LDSS Model

Consider a model of the form (20) with g_C(V) ≡ V. Assume that v_t can be partitioned into v_t = [v_{t,s}; v_{t,r}], where v_{t,s} denotes the temperature change coefficients along the "multimodal" directions of the state space and v_{t,r} denotes the rest. Thus, X_{t,s} = v_{t,s} and X_{t,r} = [v_{t,r}, C_t].
Similarly partition B = [B_s, B_r], Δ_ν = diag(Δ_{ν,s}, Δ_{ν,r}) and ν_t = [ν_{t,s}; ν_{t,r}]. We discuss how to choose v_{t,s} and v_{t,r} in Sec. IV-B. The "multimodal" dimension is K = dim(v_{t,s}), and M_r = M − K. Denote

C̃_t^i ≜ C_{t-1}^i + B_s v_{t,s}^i,  f_r^i ≜ f_{v,r}(v_{t-1}^i)

Then we have

p^{**,i}(v_{t,r}, C_t) = p(v_{t,r}, C_t | v_{t-1}^i, C_{t-1}^i, v_{t,s}^i, Y_t)
= ζ N(v_{t,r}; f_r^i, Δ_{ν,r}) δ(C_t − [C̃_t^i + B_r v_{t,r}]) p(Y_t | C_t)
= ζ N(v_{t,r}; f_r^i, Δ_{ν,r}) p(Y_t | C̃_t^i + B_r v_{t,r}) δ(C_t − [C̃_t^i + B_r v_{t,r}])
≜ p^{**,i}(v_{t,r}) δ(C_t − [C̃_t^i + B_r v_{t,r}])   (25)

where δ denotes the Dirac delta function and ζ is a proportionality constant. Since C_t is a deterministic function of C_{t-1}^i, v_{t,s}^i, v_{t,r}, its pdf is a Dirac delta function (which is trivially unimodal at C̃_t^i + B_r v_{t,r}). Thus, for the purpose of importance sampling, X_{t,r} = v_{t,r} only, and we need conditions to ensure that p^{**,i}(v_{t,r}) is unimodal. In this case, L^i(v_{t,r}) ≜ −log p^{**,i}(v_{t,r}) + const becomes

L^i(v_{t,r}) = E_{Y_t}(C̃_t^i + B_r v_{t,r}) + Σ_{p=1}^{M−K} ([v_{t,r} − f_r^i]_p)² / (2 Δ_{ν,r,p})   (26)

Applying Corollary 1, we get:

Corollary 3: Consider model (20) with g_C(V) ≡ V. Corollary 1 applies with the following substitutions: X_{t,s} ≡ v_{t,s}, X_{t,r} ≡ v_{t,r}, M_r ≡ M − K, Δ_{r,p} ≡ Δ_{ν,r,p}, f_r^i ≡ f_{v,r}(v_{t-1}^i), ∇E_{Y_t} ≡ B_r^T ∇_C E(C̃_t^i + B_r v_{t,r}), and R_LC ⊆ R^{M−K} is the largest convex region in the neighborhood of f_r^i where E_{Y_t}(C̃_t^i + B_r v_{t,r}) is convex as a function of v_{t,r}.

We demonstrate how to verify the conditions of Corollary 3 using a temperature tracking example. We use numerical (finite difference) computations of gradients. Here ε_0 needs to be chosen carefully, depending on the resolution of the discretization grid of v_{t,r}.
It should be just large enough¹ so that one does not miss any stationary point of E_{Y_t}.

¹ If ε_0 is too small, [∇E_{Y_t}]_p may transition from a value smaller than −ε_0 to a value larger than +ε_0 (or vice versa) over one grid point, and this region will not be included in Z_p (even if [∇D]_p has the same sign as [∇E_{Y_t}]_p), thus giving a wrong (too large) value of Δ*. If ε_0 is larger than required, the region Z_p may be bigger than needed, thus giving a smaller value of Δ* than what can actually be allowed.

[Fig. 2: nine panels plotted over the (v_{t,r,1}, v_{t,r,2}) plane; see the caption below for what each panel shows.]

Fig. 2. Computing Δ* for Example 4. We used α^{(1)} = [0.1, 0.1, 0.1], α^{(2)} = [0.4, 0.4, 0.4], p_f(Y_{t,p}^{(j)}) = Unif(−10, 10), j = 1, 2, ∀p, σ²_obs = [1, 1, 1], Δ_{ν,1} = 5.4, B = [−0.27, 0.96, −0.02]'; [0.33, 0.11, 0.94]'; [0.90, 0.24, −0.35]' (we use MATLAB notation). Also, C_{t-1}^i = [0, 0, 0]', v_{t-1,r}^i = [0, 0]', v_{t-1,s}^i = 0, Y_{t,1}^{(1,2)} = [5.36, 0.
59], Y_{t,2}^{(1,2)} = [−2.25, −1.60], Y_{t,3}^{(1,2)} = [−0.68, 0.35] and v_{t,s}^i = −3.2 (simulated from N(0, Δ_{ν,1})). Fig. 2(a): region R_LC, and the point f_r^i = v_{t-1,r}^i which lies inside it. Figs. 2(b), 2(c), 2(d), 2(e): the regions A_1 ∩ A_2, Z_1 ∩ A_2, Z_2 ∩ A_1 and Z_1 ∩ Z_2, along with the computed minimum value of max_p γ_p(v_{t,r}) in the four regions (1.79, 1642.6, 403.7, 4771.4). The final value of Δ* is the minimum of these four values, i.e. Δ* = 1.79. Fig. 2(f): mesh plot of E_{Y_t} as a function of v_{t,r}. Note the two dominant modes. Fig. 2(g): contours of [∇E_{Y_t}]_1 = 0 and [∇E_{Y_t}]_2 = 0 (obtained using the contour command to find the zero level set of [∇E_{Y_t}]_j, j = 1, 2). The contours have many points of intersection (points where ∇E_{Y_t} = 0), i.e. many stationary points. Fig. 2(h): contours of [∇L]_1 = 0 and [∇L]_2 = 0 for L computed with Δ_{ν,2} = Δ_{ν,3} = 0.9Δ*. The contours have only one point of intersection, which is a minimum. Fig. 2(i): contours of [∇L]_j = 0, j = 1, 2 for Δ_{ν,2} = Δ_{ν,3} = 1.1Δ*. There are three intersection points (two are minima).

Example 4: Consider Example 1. Assume that M = 3 and the OL follows (24) with h_p(C_{t,p}) = C_{t,p} (linear sensors) and p_f(Y_{t,p}^{(j)} | C_{t,p}) = p_f(Y_{t,p}^{(j)}). Also, let a = 1. In Fig. 2, we demonstrate how to verify the conditions of Corollary 3. Let K = 1, i.e. M_r = 2. Assume that X_{t,s} = v_{t,s} = v_{t,1} and v_{t,r} = v_{t,2:3}. Assume a given value of C_{t-1}^i, f_r^i and of Y_t (given in the figure caption). Note that Y_{t,1}^{(1)} = 5.36 and Y_{t,1}^{(2)} = 0.59 are "far" compared to σ_{obs,1} = 1, and hence the OL is multimodal. Fig. 2(f) plots E_{Y_t}(v_{t,r}). Fig. 2(g) plots the contours of [∇E_{Y_t}]_p = 0, p = 1, 2 (the points where the red and blue contours intersect are the stationary points of E_{Y_t}). Verification of condition 2 is shown in Fig. 2(a).
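The contour-intersection check above can also be mimicked with a direct grid search. The sketch below is our own illustration (the test functions and grid limits are made up, not from the paper); it counts strict interior local minima of a candidate negative log-posterior L^i(v_{t,r}) on a 2-D grid, in the spirit of Figs. 2(h) and 2(i):

```python
import numpy as np

def count_local_minima(L, xs, ys):
    """Count strict interior local minima of L over a 2-D grid.
    A count of 1 suggests the conditional posterior p**,i(v_{t,r}) is unimodal."""
    Z = np.array([[L(x, y) for y in ys] for x in xs])
    count = 0
    for i in range(1, len(xs) - 1):
        for j in range(1, len(ys) - 1):
            patch = Z[i - 1:i + 2, j - 1:j + 2]
            if (patch > Z[i, j]).sum() == 8:  # strictly smallest among 8 neighbors
                count += 1
    return count

grid = np.linspace(-4.0, 4.0, 81)
unimodal = count_local_minima(lambda x, y: x**2 + y**2, grid, grid)          # one minimum
bimodal = count_local_minima(lambda x, y: (x**2 - 4)**2 + y**2, grid, grid)  # two minima
```

This is only a sanity check (its reliability depends on the grid resolution, exactly the ε_0 issue discussed in the footnote above), not a substitute for the Δ* computation.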
Next, we show the steps for computing Δ*. For M_r = 2, G = ∩_{p=1}^{2} (A_p ∪ Z_p) is a subset of R² and is a union of the four regions A_1 ∩ A_2, Z_1 ∩ A_2, A_1 ∩ Z_2 and Z_1 ∩ Z_2, shown in Figs. 2(b), 2(c), 2(d), 2(e). The computed value of the minimum of max_p γ_p^num(v_{t,r}) in each region is also given in the titles. The final Δ* = 1.79 is the minimum of these four values. Contours of [∇L^i]_1 = 0 and of [∇L^i]_2 = 0, computed for Δ_{ν,2} = Δ_{ν,3} = 0.9Δ* and 1.1Δ*, are shown in Figs. 2(h), 2(i). Notice that when Δ_{ν,2} = Δ_{ν,3} = 0.9Δ*, they intersect at only one point, i.e. ∇L^i = 0 at only one point (one stationary point). When Δ_{ν,2} = Δ_{ν,3} = 1.1Δ*, there are three stationary points (and two are minima).

B. Choosing the "Multimodal" States, X_{t,s}

Corollary 3 gives a unimodality condition that needs to be verified separately for each particle and each Y_t at each t. An exact algorithm would be to begin by checking at each t, for each i, whether Theorem 1 holds with K = 0, and to keep increasing K until a K is found for which Corollary 3 holds conditioned on X_{t,1:K}^i. This can be done efficiently only if Δ* can be computed analytically or using some efficient numerical technique; that will be the focus of future research. But, as discussed earlier, PF-EIS works even if unimodality of p^{**,i}(X_{t,r}) holds only for most particles at most times, i.e. it holds w.h.p. We use the temperature tracking problem of Example 1 to explain how to choose X_{t,s}. For a given K, we would like to choose X_{t,s} = v_{t,s} to make it most likely for p^{**,i}(v_{t,r}) to be unimodal. Given X_{t-1}^i, v_{t,s}^i, v_{t,r} is a linear function of C_t. If v_{t,r} were also a one-to-one function of C_t, then one could equivalently find conditions for unimodality of p^{**,i}(C_t), which is easier to analyze.
For an approximate analysis, we make it a one-to-one function of C_t by adding noise with very small variance (compared to that of any ν_{t,p}), n_{t,s}, along B_s, i.e. given X_{t-1}^i, v_{t,s}^i, set C_t = C_{t-1}^i + B_s v_{t,s}^i + B_r v_{t,r} + B_s n_{t,s}. Now C_t is a one-to-one and linear function of [v_{t,r}, n_{t,s}]. This also makes p^{**,i}(C_t) a non-degenerate pdf. First consider the case where, w.h.p., the OL can be multimodal as a function of temperature at only one node p_0, e.g., h_p(C_{t,p}) = C_{t,p}, ∀p ≠ p_0, α_p^j = 0, ∀p ≠ p_0, and either α_{p_0}^j > 0 or h_{p_0}(C_{t,p_0}) is many-to-one. Then,

p^{**,i}(C_t) = ζ p(Y_{t,p_0} | C_{t,p_0}) p(C_{t,p_0} | X_{t-1}^i, v_{t,s}^i) × [∏_{p≠p_0} p(Y_{t,p} | C_{t,p})] p(C_t | C_{t,p_0}, X_{t-1}^i, v_{t,s}^i)   (27)

where the first two factors form p^{**,i}(C_{t,p_0}), and the last two terms are Gaussian (and hence strongly log-concave) as functions of C_{t,p}, p ≠ p_0. If p^{**,i}(C_{t,p_0}) is also strongly log-concave, then p^{**,i}(C_t) (and hence p^{**,i}(v_{t,r})) will be strongly log-concave, and hence unimodal. Now, p^{**,i}(C_{t,p_0}) will be strongly log-concave if ∃ ε_0 > 0 such that

Δ_{C,p_0} ≜ Var[p(C_{t,p_0} | X_{t-1}^i, v_{t,s}^i)] < inf_{{C_t : ∇²_{C_{t,p_0}} E_{Y_t}(C_t) < 0}} 1 / |∇²_{C_{t,p_0}} E_{Y_t}(C_t)| + ε_0.

This bound can only be computed on the fly. A priori, p^{**,i}(C_{t,p_0}) will be most likely to be log-concave if v_{t,s} is chosen to make Δ_{C,p_0} smallest. Let v_{t,s} = v_{t,k_0}, where the set k_0 contains K elements out of [1, ..., M] and K is fixed. Then Δ_{C,p_0} = Σ_{k∉k_0} B²_{p_0,k} Δ_{ν,k}. This ignores the variance of n_{t,s} (valid since that variance is assumed very small compared to all Δ_{ν,p}'s). Thus, Δ_{C,p_0} will be smallest if v_{t,s} is chosen as

v_{t,s} = v_{t,k_s},  k_s ≜ arg min_{k_0} Σ_{k∉k_0} B²_{p_0,k} Δ_{ν,k}   (28)

When K = 1, this is equivalent to choosing k_s = arg max_k B²_{p_0,k} Δ_{ν,k}.
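A brute-force version of (28) is straightforward to code. The sketch below is our own illustration (not from the paper); it enumerates all K-subsets, which is only sensible for small M, and uses B and Δ_ν values matching the worked example of this subsection:

```python
import itertools
import numpy as np

def choose_multimodal_states(B, delta_nu, p0, K):
    """Implement (28): pick the K indices k_s minimizing the residual prior
    variance at node p0, sum over k not in k0 of B[p0, k]^2 * delta_nu[k]."""
    M = len(delta_nu)
    def residual_var(k0):
        return sum(B[p0, k] ** 2 * delta_nu[k] for k in range(M) if k not in k0)
    return list(min(itertools.combinations(range(M), K), key=residual_var))

B = np.array([[0.95, 0.21, 0.21],
              [-0.21, 0.98, -0.05],
              [-0.22, 0.0, 0.98]])
k_s = choose_multimodal_states(B, np.array([10.0, 5.0, 5.0]), p0=0, K=1)
```

With these values, the single retained direction is the first coefficient (zero-based index 0), i.e. v_{t,s} = v_{t,1}, since B²_{1,1} Δ_{ν,1} = 9.025 dominates.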
Based on the above discussion, we have the following heuristics.

Heuristic 1: If the OL can be multimodal as a function of temperature at only a single node, p_0, and is unimodal as a function of temperature at other nodes, select v_{t,s} using (28).

Heuristic 2: If the OL is much more likely to be multimodal as a function of C_{t,p_0} than of temperature at any other node (e.g. if a sensor at p_0 is old, so that its failure probability is much larger than the rest), apply Heuristic 1 to that p_0.

Heuristic 3: When p_0 is a set (not a single index), Heuristic 1 can be extended to select k_s to minimize the spectral radius (maximum eigenvalue) of the matrix Σ_{k∉k_0} B_{p_0,k} B_{p_0,k}^T Δ_{ν,k}.

Heuristic 4: If the OL is equally likely to be multimodal as a function of any C_{t,p} (e.g. if all sensors have equal failure probability), then p_0 = [1, ..., M]. Applying Heuristic 3, one would select the K largest variance directions of the STP as v_{t,s}.

Heuristic 5: If the probability of the OL being multimodal is itself very small, then K = 0 can be used. In Example 1 with all linear sensors, this probability is roughly 1 − ∏_{p,j} (1 − α_p^j).

Heuristic 6: For J = 2 and all linear sensors, p_0 may be chosen on the fly as arg max_p [(Y_{t,p}^{(1)} − Y_{t,p}^{(2)})² / σ²_{obs,p}] (the larger the difference, the more likely the OL is to be multimodal at that p). If the maximum itself is small, set K = 0.

We show an example now. Consider Example 4 with α^{(1)} = α^{(2)} = [0.4, 0.01, 0.01], Σ_ν = diag([10, 5, 5]), B = [0.95, 0.21, 0.21]'; [−0.21, 0.98, −0.05]'; [−0.22, 0, 0.98]' (using MATLAB notation). By Heuristic 5, the probability of the OL being multimodal is about 0.65, which is not small. So we choose K > 0 (K = 1).
By Heuristic 2, we choose p_0 = 1, since the OL is multimodal as a function of C_{t,1} with probability 0.64, while that for C_{t,2} or C_{t,3} together is 0.02 (much smaller). Applying (28) for p_0 = 1, we get v_{t,s} = v_{t,1}.

V. PF-EIS-MT: PF-EIS WITH MODE TRACKER

For any PF (including efficient PFs such as PF-EIS or PF-Doucet), the effective particle size [4], [6] reduces with increasing dimension, i.e. the N required for a given tracking accuracy increases with dimension. This makes all PFs impractically expensive for LDSS problems. We discuss one possible solution to this problem here.

A. PF-EIS-MT and PF-MT Algorithms

Consider the LDSS model (20). To apply PF-EIS, we split the state X_t into [X_{t,s}, X_{t,r}] such that p* is unimodal w.h.p. conditioned on X_{t,s}. As explained earlier, this is ensured if the eigenvalues of Σ_r are small enough to satisfy (19). Now, because of the LDSS property, X_{t,r} can further be split into [X_{t,r,s}, X_{t,r,r}] so that the maximum eigenvalue of the covariance of the STP of X_{t,r,r} is small enough to ensure that there is little error in approximating the conditional posterior of X_{t,r,r} by a Dirac delta function at its mode. We call this the Mode Tracking (MT) approximation of importance sampling (IS), or IS-MT. We refer to X̃_{t,s} ≜ [X_{t,s}, X_{t,r,s}] as the "effective" state and to X̃_{t,r} ≜ X_{t,r,r} as the "residual" state. We explain IS-MT in detail below.

In PF-EIS, we importance sample X_{t,s}^i from its STP, and we efficient-importance sample X_{t,r}^i from N(m_t^i, Σ_IS^i), where m_t^i, Σ_IS^i are defined in (2). Let

m_t^i = [m_{t,s}^i; m_{t,r}^i]  and  Σ_IS^i = [Σ_{IS,s}, Σ_{IS,s,r}; Σ_{IS,r,s}, Σ_{IS,r}].
This is equivalent to first sampling X_{t,r,s}^i ~ N(m_{t,s}^i, Σ_{IS,s}^i) and then sampling X_{t,r,r}^i ~ N(m_{t,r}^{*i}, Σ_{IS,r}^{*i}), where

m_{t,r}^{*i} ≜ m_{t,r}^i + Σ_{IS,r,s}^i (Σ_{IS,s}^i)^{-1} (X_{t,r,s}^i − m_{t,s}^i),
Σ_{IS,r}^{*i} ≜ Σ_{IS,r}^i − Σ_{IS,r,s}^i (Σ_{IS,s}^i)^{-1} (Σ_{IS,r,s}^i)^T   (29)

Now, from (29), Σ_{IS,r}^{*i} ≤ Σ_{IS,r}^i. Also, since m_t^i lies in a locally convex region of E_{Y_t}(X_{t,s}^i, X_{t,r}), i.e. ∇²E_{Y_t}(X_{t,s}^i, m_t^i) ≥ 0 (by Theorem 1), Σ_IS^i ≤ Δ_r. This implies that Δ_{r,r} − Σ_{IS,r}^i, which is a square sub-matrix of Δ_r − Σ_IS^i, is also non-negative definite. Thus,

Σ_{IS,r}^{*i} ≤ Σ_{IS,r}^i ≤ Δ_{r,r}   (30)

If the maximum eigenvalue of Δ_{r,r} is small enough, any sample from N(m_{t,r}^{*i}, Σ_{IS,r}^{*i}) will be close to m_{t,r}^{*i} w.h.p. So we can set X_{t,r,r}^i = m_{t,r}^{*i} with little extra error (quantified in the next subsection). The resulting algorithm is called PF-EIS-MT; it is summarized in Algorithm 3. A more accurate, but also more expensive, modification (it needs to be implemented on the fly) would be to do MT on the low-eigenvalue directions of Σ_IS^i. A simpler, but sometimes less accurate, modification is PF-MT (summarized in Algorithm 4). In PF-MT, we combine X_{t,r,s} with X_{t,s} and importance sample the combined state X̃_{t,s} = [X_{t,s}, X_{t,r,s}] from its STP (in some cases X_{t,r,s} is empty), while performing mode tracking (MT) on X̃_{t,r} = X_{t,r,r}.

The IS-MT approximation introduces some error in the estimate of X_{t,r,r} (the error decreases with decreasing spread of p^{**,i}(X_{t,r,r})). But it also reduces the sampling dimension from dim(X_t) to dim([X_{t,s}; X_{t,r,s}]) (a significant reduction for large dimensional problems), thus improving the effective particle size. For a carefully chosen dimension of X_{t,r,r}, this results in smaller total error, especially when the available number of particles, N, is small.
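The split in (29) is the standard conditioning of a joint Gaussian on its first block. A minimal self-contained sketch (our own; it assumes the 's' block of the covariance is invertible):

```python
import numpy as np

def condition_gaussian(m, Sigma, n_s, x_s):
    """Given [x_s; x_r] ~ N(m, Sigma) with the first n_s coordinates fixed at x_s,
    return the conditional mean m_r* and covariance Sigma_r* of x_r, as in (29)."""
    m_s, m_r = m[:n_s], m[n_s:]
    S_ss = Sigma[:n_s, :n_s]
    S_rs = Sigma[n_s:, :n_s]
    S_rr = Sigma[n_s:, n_s:]
    W = S_rs @ np.linalg.inv(S_ss)          # regression coefficient of x_r on x_s
    return m_r + W @ (x_s - m_s), S_rr - W @ S_rs.T

m_star, S_star = condition_gaussian(
    np.zeros(2), np.array([[2.0, 1.0], [1.0, 2.0]]), 1, np.array([1.0]))
```

In this toy 2-D case, m_star = 0.5 and S_star = 1.5 ≤ Σ_{IS,r} = 2, illustrating the variance reduction used in (30).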
This is observed experimentally, but proving it theoretically is an open problem. We say that the IS-MT approximation is "valid" for a given choice of X_{t,r,r} if it results in smaller total error than if it were not used.

Algorithm 3 (PF-EIS-MT). Going from π_{t-1}^N to π_t^N(X_t) = Σ_{i=1}^N w_t^{(i)} δ(X_t − X_t^i), where X_t^i = [X_{t,s}^i, X_{t,r}^i] and X_{t,r}^i = [X_{t,r,s}^i, X_{t,r,r}^i]:
1) Importance sample X_{t,s}: ∀i, sample X_{t,s}^i ~ p(X_{t,s}^i | X_{t-1}^i).
2) Efficient importance sample X_{t,r,s}: ∀i,
   a) Compute m_t^i(X_{t-1}^i, X_{t,s}^i, Y_t) = arg min_{X_{t,r}} L^i(X_{t,r}) and Σ_IS^i ≜ (∇²L^i(m_t^i))^{-1}, where L^i is defined in (7). Let m_t^i = [m_{t,s}^i; m_{t,r}^i] and Σ_IS^i = [Σ_{IS,s}, Σ_{IS,s,r}; Σ_{IS,r,s}, Σ_{IS,r}].
   b) Sample X_{t,r,s}^i ~ N(m_{t,s}^i, Σ_{IS,s}^i).
3) Mode track X_{t,r,r}: ∀i,
   a) Compute m_{t,r}^{*i} using (29).
   b) Set X_{t,r,r}^i = m_{t,r}^{*i}.
4) Weight: ∀i, compute w_t^i = w̃_t^i / Σ_{j=1}^N w̃_t^j, where w̃_t^i = w_{t-1}^i p(Y_t | X_t^i) p(X_{t,r}^i | X_{t-1}^i, X_{t,s}^i) / N(X_{t,r}^i; m_t^i, Σ_IS^i) and X_{t,r}^i = [X_{t,r,s}^i, X_{t,r,r}^i].
5) Resample. Set t ← t + 1 and go to step 1.

Algorithm 4 (PF-MT). Going from π_{t-1}^N to π_t^N(X_t) = Σ_{i=1}^N w_t^{(i)} δ(X_t − X_t^i), where X_t^i = [X̃_{t,s}^i, X̃_{t,r}^i]:
1) Importance sample X̃_{t,s}: ∀i, sample X̃_{t,s}^i ~ p(X̃_{t,s}^i | X_{t-1}^i).
2) Mode track X̃_{t,r}: ∀i, set X̃_{t,r}^i = m_t^i, where m_t^i(X_{t-1}^i, X̃_{t,s}^i, Y_t) = arg min_{X̃_{t,r}} L^i(X̃_{t,r}) and L^i is defined in (7).
3) Weight: ∀i, compute w_t^i = w̃_t^i / Σ_{j=1}^N w̃_t^j, where w̃_t^i = w_{t-1}^i p(Y_t | X_t^i) p(X̃_{t,r}^i | X_{t-1}^i, X̃_{t,s}^i) and X_t^i = [X̃_{t,s}^i, X̃_{t,r}^i].
4) Resample. Set t ← t + 1 and go to step 1.

B. The IS-MT Approximation

We quantify the error due to IS-MT. If we did not use the MT approximation, X_{t,r,r}^i ~ N(m_{t,r}^{*i}, Σ_{IS,r}^{*i}). But using MT, we set X_{t,r,r}^i = m_{t,r}^{*i}.
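Before quantifying the IS-MT error, here is a compact generic sketch of one PF-MT step (Algorithm 4). This is our own illustration: the callbacks are placeholders, and in particular `track_mode` stands in for the argmin of L^i in step 2, while `unnorm_weight` must return p(Y_t | X^i) p(X̃_{t,r}^i | X_{t-1}^i, X̃_{t,s}^i) up to a constant:

```python
import numpy as np

def pf_mt_step(parts_s, parts_r, w, rng, sample_stp, track_mode, unnorm_weight):
    """One PF-MT step (Algorithm 4 sketch), for scalar effective/residual states.
    sample_stp(xs, xr, rng): draw the effective state from its STP.
    track_mode(xs_new, xs_old, xr_old): mode of the conditional posterior of X_r.
    unnorm_weight(xs, xr): p(Y_t | X) * p(X_r | X_{t-1}, X_s), up to a constant."""
    N = len(w)
    new_s = np.array([sample_stp(parts_s[i], parts_r[i], rng) for i in range(N)])
    new_r = np.array([track_mode(new_s[i], parts_s[i], parts_r[i]) for i in range(N)])
    wt = w * np.array([unnorm_weight(new_s[i], new_r[i]) for i in range(N)])
    wt /= wt.sum()
    idx = rng.choice(N, size=N, p=wt)          # resample
    return new_s[idx], new_r[idx], np.full(N, 1.0 / N)
```

Only the effective state is sampled; the residual state is deterministic given it, which is exactly where the reduction in importance sampling dimension comes from.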
Let the eigenvalue decomposition of Σ_{IS,r}^{*i} be Σ_{IS,r}^{*i} = U Λ_{IS,r}^{*i} U^T and let λ_p ≜ (Λ_{IS,r}^{*i})_{p,p} be its p-th eigenvalue. Let d ≜ X_{t,r,r}^i − m_{t,r}^{*i}. For an ε > 0, we bound the probability of ||d|| = ||U^T d|| > ε using Chernoff bounding: for any s > 0,

Pr(||d|| > ε) = Pr(||U^T d|| > ε) = Pr(e^{s Σ_p (U^T d)_p²} > e^{sε²})
≤ ∏_p [(1 − 2λ_p s)^{-1/2} e^{-sε²/M_{r,r}}]
≤ [(1 − 2λ_m s)^{-1/2} e^{-sε²/M_{r,r}}]^{M_{r,r}}   (31)
≤ [(1 − 2Δ_m s)^{-1/2} e^{-sε²/M_{r,r}}]^{M_{r,r}}   (32)

where λ_m ≜ max_p λ_p and Δ_m ≜ max_p Δ_{r,r,p}. The first inequality follows by applying the Markov inequality, the second follows because λ_p ≤ λ_m, ∀p, and (32) follows because λ_m ≤ Δ_m, which follows from (30). Now, (32) holds for any s > 0 and thus

Pr(||d|| > ε) ≤ [min_{s>0} {(1 − 2Δ_m s)^{-1/2} e^{-sε²/M_{r,r}}}]^{M_{r,r}}
= [(M_{r,r} Δ_m / ε²)^{-1} e^{-(ε²/(M_{r,r} Δ_m) − 1)}]^{M_{r,r}/2} ≜ B(Δ_m, ε)   (33)

Rewriting [B(Δ_m, ε)]^{2/M_{r,r}} = (M_{r,r} Δ_m / ε²)^{-1} e^{-(ε²/(M_{r,r} Δ_m) − 1)} and applying L'Hôpital's rule, we get lim_{Δ_m → 0} B(Δ_m, ε) = 0. Note that if, instead of (32), we applied min_{s>0} to (31), we would get Pr(||d|| > ε) ≤ B(λ_m, ε). Thus:

Theorem 2: Consider any HMM model (satisfying Assumption 1) and assume that the conditions of Theorem 1 hold. Let X_{t,r,r}^i ~ N(m_{t,r}^{*i}, Σ_{IS,r}^{*i}). Then lim_{Δ_m → 0} Pr(||X_{t,r,r}^i − m_{t,r}^{*i}|| > ε) = 0 and also lim_{λ_m → 0} Pr(||X_{t,r,r}^i − m_{t,r}^{*i}|| > ε) = 0, i.e. X_{t,r,r}^i converges in probability to m_{t,r}^{*i} in the Euclidean norm as Δ_m ≜ max_p Δ_{r,r,p} → 0 and also as λ_m ≜ max_p (Λ_{IS,r}^{*i})_{p,p} → 0.

Remark 1: Even if the conditions of Theorem 1 do not hold (so that inequality (30) does not hold), we can still prove Theorem 2 if we assume that Σ_{IS,r}^{*i} = Covar[p^{**,i}(X_{t,r,r})] (actually Σ_IS^i is only an approximation to Covar[p^{**,i}(X_{t,r,r})]).
The result then follows by using the conditional variance identity [30, Theorem 4.4.7] to show that E_{Y_t}[Σ_{IS,r}^{*i}] ≤ Δ_{r,r}.

In summary, PF-EIS-MT can be used if p^{**,i}(X_{t,r}) is unimodal w.h.p. and the largest eigenvalue of Σ_{IS,r}^{*i} is small enough to ensure the validity of IS-MT. A sufficient condition is that the largest eigenvalue of Δ_{r,r} be small enough. The choice of ε is governed by the tradeoff between the increase in error due to IS-MT and the decrease due to the reduced IS dimension. This will be studied in future work.

C. Choosing the MT-Residual States, X_{t,r,r}

We first choose X_{t,s}, X_{t,r} for the EIS step using the unimodality heuristics discussed earlier in Sec. IV-B. Then we split X_{t,r} into X_{t,r,s} and X_{t,r,r} so that IS-MT is valid for X_{t,r,r}. PF-EIS-MT can then be implemented with the chosen X_{t,s}, X_{t,r,s}, X_{t,r,r}. Alternatively, one can implement PF-MT (faster) with X̃_{t,s} = [X_{t,s}; X_{t,r,s}], X̃_{t,r} = X_{t,r,r}. For given values of ε, ε_2, two approaches can be used to choose X_{t,r,r}. The first is offline and finds the largest Δ_m so that B(Δ_m, ε) < ε_2. The second is online, i.e. at each t, for each particle i, it finds λ_m so that B(λ_m, ε) < ε_2.

Heuristic 7: Begin with M_{r,r} = M_r and keep reducing its value. For each value of M_{r,r}, choose the states with the M_{r,r} smallest values of Δ_{ν,r,p} (so that max_p Δ_{ν,r,r,p} is smallest) as X_{t,r,r}. With this choice, compute B(Δ_m, ε) and check whether it is smaller than ε_2. If it is, stop; else reduce M_{r,r} by 1 and repeat. A second approach is to do the same thing on the fly, using B(λ_m, ε).

D. Connection with the Rao-Blackwellized PF (RB-PF)

We first discuss the connection of PF-MT to the RB-PF. PF-MT can be interpreted as an approximation of the RB-PF of [9].
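Returning to the bound B(·, ε) of (33) and the offline search of Heuristic 7, both are easy to sketch numerically. The code below is our own illustration (the ε, ε_2 values in the test are arbitrary); note that the Chernoff optimization behind (33) is only useful when ε²/(M_{r,r} Δ_m) > 1, so we return the vacuous bound 1 otherwise:

```python
import numpy as np

def B_bound(delta_m, eps, M_rr):
    """B(delta_m, eps) from (33); vacuous unless eps^2 / (M_rr * delta_m) > 1."""
    r = eps ** 2 / (M_rr * delta_m)
    return 1.0 if r <= 1.0 else (r * np.exp(-(r - 1.0))) ** (M_rr / 2.0)

def choose_mt_residual(delta_nu_r, eps, eps2):
    """Heuristic 7 sketch: shrink M_rr, always keeping the M_rr smallest-variance
    directions as X_{t,r,r}, until B(max variance, eps) < eps2."""
    order = sorted(range(len(delta_nu_r)), key=lambda p: delta_nu_r[p])
    for M_rr in range(len(order), 0, -1):
        idx = order[:M_rr]
        if B_bound(max(delta_nu_r[p] for p in idx), eps, M_rr) < eps2:
            return idx                 # indices of X_{t,r} to mode-track
    return []                          # nothing narrow enough: no MT, plain PF-EIS
```

As (33) predicts, the bound decreases monotonically to zero as Δ_m → 0, so narrow directions are always eventually accepted.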
The RB-PF of [9] is applicable when the state vector can be split as X_t = [X_{t,nl}, X_{t,l}] with the following property: X_{t,nl} has any general nonlinear or non-Gaussian state space model, but conditioned on X_{1:t,nl}, X_{t,l} has a linear Gaussian state space model. Thus the RB-PF of [9] importance samples X_{t,nl} from its STP, but applies the Kalman recursion to compute the conditional prediction and posterior densities (both Gaussian) of X_{t,l} conditioned on each particle X_{1:t,nl}^i. The OL of each particle X_{1:t,nl}^i is computed by marginalizing over the prediction density of X_{t,l}. PF-MT can be understood as an approximation to the RB-PF in the following sense: replace the "nonlinear" part of the state space by X̃_{t,s}, i.e. X_{t,nl} ≡ X̃_{t,s}, and the "linear" part by X̃_{t,r}, i.e. X_{t,l} ≡ X̃_{t,r}. In PF-MT, the conditional prediction and posterior densities of X̃_{t,r} (conditioned on X̃_{1:t,s}^i) are assumed to be unimodal (not necessarily Gaussian), but narrow. In general, it is not possible to marginalize over an arbitrary unimodal density. But if the product of the STP of X̃_{t,r} and the OL given X̃_{t,s}^i is narrow enough to be approximated by its maximum value times a Dirac delta function at its unique maximizer, PF-MT can be interpreted as an RB-PF. In that case, the conditional posterior of X̃_{t,r} is also approximated by a Dirac delta function.
Thus:

Theorem 3: PF-MT (Algorithm 4) is the RB-PF (Algorithm 1 of [9]) with the following approximation at each t:

p(Y_t | X̃_{t,s}^i, X̃_{t,r}) p(X̃_{t,r} | X_{t-1}^i, X̃_{t,s}^i) = p(Y_t | X̃_{t,s}^i, X̃_{t,r}^i) p(X̃_{t,r}^i | X_{t-1}^i, X̃_{t,s}^i) δ(X̃_{t,r} − X̃_{t,r}^i),
X̃_{t,r}^i = m_t^i = arg max_{X̃_{t,r}} [p(Y_t | X̃_{t,s}^i, X̃_{t,r}) p(X̃_{t,r} | X_{t-1}^i, X̃_{t,s}^i)]   (34)

With the above approximation, the following also holds:

p^{**,i}(X̃_{t,r}) ≜ p(X̃_{t,r} | X_{t-1}^i, X̃_{t,s}^i, Y_t) = δ(X̃_{t,r} − m_t^i)   (35)

The proof is a simple exercise of simplifying the RB-PF expressions using (34) and is hence omitted. For PF-EIS-MT, replace X̃_{t,r} by X_{t,r,r} and X̃_{t,s} by [X_{t,s}; X_{t,r,s}] in the above discussion. Also, importance sampling from the STP in the case of the RB-PF is replaced by EIS.

VI. RELATION TO EXISTING WORK

We discuss here the relation of our algorithms to existing work. The problem of estimating temperature at a large number of locations in a room using a network of sensors is also studied in [14], [15]. Their focus is on modeling the spatio-temporal temperature variation using an RC circuit, estimating its parameters, and using the model to predict temperature at unknown nodes. They assume zero sensor failure probability and observation noise (usually valid when sensors are new) and hence do not require tracking. In a practical system, one can use [14] when sensors are new and reliable, but track the temperature using PF-EIS-MT (and the models estimated using [14]) when sensors grow older and unreliable. For multimodal OL or STP, if there are only a few modes at known mode locations, the Gaussian Sum PFs (GSPF-I or GSPF-II) of [31] can be used. All examples shown in [31] have a one-dimensional process noise, and thus effectively a one-dimensional state.
As the dimension increases, the number of mixands that need to be maintained by GSPF-I increases significantly. We compare PF-EIS with GSPF-I in Fig. 3. GSPF-II defines a mixand about each possible mode of the OL or of the STP, followed by resampling to prune insignificant modes. The possible number of OL modes increases with dimension, even though, for a given observation, it is highly unlikely that all modes appear. For example, when tracking temperature at 50 nodes with 2 sensors per node, each with nonzero failure probability, the maximum number of possible OL modes at any time is 2^50. Another work that also approximates a multimodal pdf by a mixture density is [32].

The Independent Partition PF (IPPF) of [33] and the IPPF-JMPD of [34] propose efficient PFs for multiple target tracking. There, the motion models of different targets are independent, while the OL is coupled when the targets are nearby (because of correspondence ambiguity between observations and targets). The main idea of IPPF is to resample independently for each target when the targets are significantly far apart (their OLs are roughly independent). In our work, and also in other LDSS problems, this cannot be done, since the temperature (or other state) dynamics of different nodes are coupled (temperature change is spatially correlated).

The main idea of MT was first introduced by us in [29] and first generalized in [2], [35], [1]. The work of [36], which proposes a "PF using gradient proposal", is related to [29]. The MT step can also be understood as Rao-Blackwellization [27], [9] if the approximation of Theorem 3 holds. Another recent PF that also performs approximate marginalization, but only on the stable directions of the state space, is [21]. This can be made more efficient by using the EIS idea on the unstable directions.
Many existing algorithms may be interpreted as special cases of PF-EIS-MT; e.g., PF-Original is PF-EIS-MT with X_{t,s} = X_t, PF-Doucet is PF-EIS-MT with X_{t,r,s} = X_t, and the approximate "posterior mode tracker" of [18] is approximately PF-EIS-MT with X_{t,r,r} = X_t.

There is a fundamental difference between MT and the commonly used idea of replacing the PF estimate of the posterior by a Dirac delta function at the highest-weight particle (or at the mode of the PF posterior estimate), as in [17], or doing this for a subset of states, as in [37]. The latter can be understood as an extreme type of resampling, which will automatically occur in any PF if the largest-weight particle has much higher weight than any other particle. It still requires IS on the entire state space to first get the PF estimate of the posterior.

VII. SIMULATION RESULTS

We used the Root Mean Squared Error (RMSE) of the PF approximation of the MMSE state estimate (from its true value) and the percentage of out-of-track realizations to compare the performance of PF-EIS with that of PF-Original (PF-EIS with K = M) [3] and PF-Doucet (PF-EIS with K = 0) [6] in Fig. 3. The number of particles (N) was kept fixed for all PFs in a given comparison. We also show the RMSE plot of GSPF-I [31] with the total number of particles (number

[Fig. 3(a): RMSE and out-of-track % vs. time, N = 100 particles, for K = 0 (PF-Doucet), K = 1 (PF-EIS), K = M (PF-Original), and GSPF-I with N = 8×14 = 112; sensor failure (temperature independent).]
[Fig. 3(b): RMSE and out-of-track % vs. time, N = 50 particles, for K = 0 (PF-Doucet), K = 1 (PF-EIS), K = M (PF-Original), and GSPF-I with N = 8×7 = 56 and N = 27×7 = 189; sensor failure (weakly temperature dependent). Fig. 3(c): RMSE and out-of-track % vs. time, N = 100 particles, for PF-Doucet, PF-EIS, PF-Original; squared sensor at node 1.]

Fig. 3(d). Table of parameters:
- Fig. 3(a): h(C_t) = C_t; p_f = N(0, 100); α^{(1)} = [.9, .1, .1], α^{(2)} = [.4, .01, .01]; σ²_obs = [10, 1, 1]; Δ_ν = diag(10, 5, 5); B = [.99, .1, .1]'; [−.10, .99, −.01]'; [−.10, 0, .99]'; C_0 = [0, 0, 0]'; a = 1; N = 100.
- Fig. 3(b): h(C_t) = C_t; p_f = N(.2 C_{t,p}, 100); α^{(1)} = α^{(2)} = [.4, .01, .01]; σ²_obs = [1, 1, 1]; Δ_ν = diag(10, 5, 5); B = [.95, .21, .21]'; [−.21, .98, −.05]'; [−.22, 0, .98]'; C_0 = [0, 0, 0]'; a = 1; N = 50.
- Fig. 3(c): h_1(C_t) = C²_{t,1}, h_p(C_t) = C²_{t,p}, p > 1; α^{(1)} = [0, 0, 0]; σ²_obs = [3, 1, 1]; Δ_ν = diag(10, 5, 5); B = [.95, .21, .21]'; [−.21, .98, −.05]'; [−.22, 0, .98]'; C_0 = [5, 5, 5]'; a = .7; N = 50.

Fig. 3. Comparing RMSE, out-of-track % and N_eff of PF-EIS (black-△) with that of PF-Doucet (red-*), PF-Original (magenta-o) and GSPF-I (magenta-+). RMSE at time t is the square root of the mean of the squared error between the true C_t and the tracked one (the N-particle PF estimate of E[C_t | Y_{1:t}]). Out-of-track % is the percentage of realizations for which the norm of the squared error exceeds an in-track threshold (2-4 times the total observation noise variance). The in-track threshold for Fig. 3(a) was 48, for Fig. 3(c) was 20 and for Fig. 3(b) was 12.
W e av eraged ov er 90 Monte Carlo simulations in Figs. 3(b) and 3(c) and ov er 40 in Fig. 3(a). Note C 0 refers to the starting value of C t . of mixtures times nu mber of particles per mixtu re) ro ughly equal to N . In Fig . 4, we show superior p erforman ce of PF- MT and PF-EIS-MT over PF-EIS, PF-Douc et, PF-Original and PF-Orig- K - dim (d imension redu ced original PF , i.e. original PF ru n on on ly the first K dimen sions). Note that for multimod al p osteriors, the RMSE at the current time does no t tell us if a ll significant m odes have been tracked or not. But, if a significant mode is missed, it will often result in larger errors in fu ture state estimates, i.e. the erro r due to the missed mode will be cap tured in f uture RMSEs. In many pr oblems, the g oal o f tracking is only to get an MMSE state estimate, and not n ecessarily view all the modes, and in these cases RMSE is still the co rrect performance measure. If a missed p osterior mo de d oes no t result in larger fu ture RMSEs, it does not affect p erform ance in a ny way 2 . Of course, the increase in error du e to a missed mod e may occu r at different time instants fo r d ifferent realizations and h ence the a verage may not always truly reflec t th e loss in track ing perf ormance. 2 The true posterior is unkno wn. T he only other way to e val uate if a PF is tracking all the m odes at all times, is to run another PF with a very large number of particles and use its posterior estimate as the true one. Evaluatin g PF- EIS: W e first explain a typical situatio n where PF-Dou cet fails but PF-E IS does no t. This occur s wh en the STP is broad a nd the OL is bimo dal ( or in general, multimoda l) with mod es th at lie close to eac h other in itially , but s lowly drift apart. PF-Doucet uses gradient descent starting at C i t − 1 to find the mode. When p ∗ is mu ltimodal, it app rox- imates p ∗ by a Gaussian abo ut the m ode in whose ba sin- of-attractio n the previous par ticle (i.e . 
C^i_{t-1}) lies. At t = 0, particles of v_t are generated from the initial state distribution, and so there are some particles in the basin-of-attraction of both modes. But due to resampling, within a few time instants, often all particles cluster around one mode. If this happens to be the wrong mode, it results in loss of track. In contrast, PF-EIS samples v_{t,s} from its STP, i.e., it generates new particles near both OL modes at each t, and so does not lose track.

All plots of Fig. 3 simulated Example 1 with M = 3. The model parameters used for each subfigure are given in the table in Fig. 3(d). The example of Fig. 3(a) is a special case of Example 4. It has M = 3 sensor nodes; J = 2 sensors per node; all linear sensors and "temperature-independent failure", i.e., p_f(Y^{(j)}_{t,p} | C_{t,p}) = p_f(Y^{(j)}_{t,p}) = N(Y^{(j)}_{t,p}; 0, 100).

[Figure 4: RMSE and out-of-track % plots. Panels: (a) Sensor failure (temperature independent), N = 100 particles; (b) Robustness to model error, N = 50 particles. (c) Table of model parameters for each subfigure; the notation b_k denotes a row vector of b's of length k, e.g., 1_9.]

Fig. 4. Comparing PF-MT (blue-□) in 4(a) and PF-EIS-MT (blue-+) in 4(b) with PF-Doucet (red-*), PF-EIS (black-△), PF-Orig (magenta-o) and PF-Orig-K-dim (magenta-x). In Fig. 4(a), M = 10 was used. X_{t,s} = v_{t,1} was used for both PF-EIS and PF-MT. Averaged over 50 simulations. PF-MT has the best performance. In Fig. 4(b), we test the robustness to error in the failure probability parameter. M = 5 was used. We used X_{t,s} = v_{t,1}, X_{t,r,s} = v_{t,2} for PF-EIS-MT; X_{t,s} = v_{t,1} was used for PF-EIS. Averaged over 100 simulations. PF-EIS-MT is the most robust when N = 50 particles are used (the available N is small). If N = 100 particles are used, PF-EIS is the most robust (not shown).

Temperature change followed a random walk model, i.e., a = 1. By Heuristic 2, we choose p_0 = 1, since the OL is multimodal as a function of C_{t,1} with much higher probability than at other nodes (we simulate an extreme case). Applying (28) for p_0 = 1, we get v_{t,s} = v_{t,1}. This was used for PF-EIS. As can be seen, the RMSE for PF-EIS was smaller than for PF-Doucet, and so was the number of "out-of-track" realizations. GSPF-I [31] with G = 8 mixtures and N_g = 7 particles per mixture (a total of 56 particles) and PF-Original had much worse performance, for reasons explained earlier (they use inefficient importance densities).

In Fig. 3(b), we simulated "weakly temperature-dependent sensor failure", i.e., p_f(Y^{(j)}_{t,p} | C_{t,p}) = N(Y^{(j)}_{t,p}; 0.2 C_{t,p}, 100 σ²_{obs,p}). Also, the sensor failure probability at node 1 was lower than in Fig. 3(a). Thus the performance of all algorithms is better. Fig. 3(c) used J = 1 sensor per node and a squared sensor at node 1, i.e., h(C_t) = [C²_{t,1}; C_{t,2}; C_{t,3}]. All sensors had zero failure probability, i.e., α^{(1)}_p = 0, ∀p. Temperature change followed a first-order autoregressive model³ with a = 0.7.
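To make the multimodality concrete, the following is a minimal numpy sketch (the function names and specific numbers are our own illustration, not the paper's code) of the mixture observation likelihood for one node: with probability α the sensor fails and returns a draw from a broad failure pdf, otherwise it observes h(C) plus Gaussian noise. For the squared sensor, h(C) = C², a significantly positive observation makes the OL bimodal in C.

```python
import numpy as np

def gauss(x, mu, var):
    """Gaussian pdf, evaluated elementwise."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def obs_likelihood(y, c, alpha, sigma2_obs, h, fail_var=100.0):
    """Mixture OL for one node: with probability alpha the sensor fails
    (broad failure pdf centered at 0), otherwise it observes h(C) plus
    Gaussian noise of variance sigma2_obs."""
    return (1.0 - alpha) * gauss(y, h(c), sigma2_obs) + alpha * gauss(y, 0.0, fail_var)

# Squared sensor, h(C) = C^2: for a significantly positive observation y,
# the OL is bimodal in C, with modes near +/- sqrt(y).
c_grid = np.linspace(-6.0, 6.0, 601)
ol = obs_likelihood(9.0, c_grid, alpha=0.0, sigma2_obs=3.0, h=lambda c: c ** 2)
top_two = np.sort(c_grid[np.argsort(ol)[-2:]])  # two grid points of highest OL
```

Here `top_two` lands near [-3, 3]: the likelihood cannot distinguish the sign of C from a squared observation, which is exactly the two-equal-mode situation discussed below.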
In this case the OL is bimodal as a function of C_{t,1} whenever Y_{t,1} is significantly positive. This happens w.h.p. when temperatures are greater than √(3σ_obs,1) = 2.3 (or less than -2.3), which itself happens very often. Also, often, the modes are initially nearby and slowly drift apart as the magnitude of Y_{t,1} increases. As explained earlier, this is just the situation that results in failure of PF-Doucet. The performance of PF-Doucet is significantly worse than that of PF-EIS (which used v_{t,s} = v_{t,1}, obtained by applying (28) for p_0 = 1). Note that we initiated tracking with an initial known temperature of 5, so that there was a bias towards positive temperature values and it was indeed possible to correctly track the temperature and its sign.

³This example is a difficult one because the OL is almost always bimodal with two equal modes. With a random walk model on v_t, even N = 100 particles were not enough for accurate tracking using any PF.

Following an anonymous reviewer's suggestion, we also plot the effective particle size, N_eff, for all the above examples in Fig. 5. N_eff is equal to the inverse of the variance of the normalized particle weights [4]. Because of resampling at each t, N_eff only measures the effectiveness of the current particles, and not how they influence the future posterior estimates. N_eff will be high even when most particles cluster around an OL mode which in the future turns out to be the wrong one, resulting in larger future RMSEs. This is why PF-Doucet, which samples from the Laplace approximation to the "optimal" importance density (optimal in the sense of minimizing the conditional weights' variance), has the highest N_eff, but not the smallest RMSE. This issue is most obvious for the squared-sensor case.

Time Comparison: We used the MATLAB profiler to compare the times taken by the different PFs for tracking for 20 time steps.
GSPF-I took 1 second, PF-Original took 2 seconds, PF-EIS took 60.2 seconds, and PF-Doucet took 111.2 seconds. GSPF-I and PF-Original took significantly less time since they do not use gradient descent at all. Note also that the gradient descent algorithm we used was a very basic and slow implementation based on the fminunc function in MATLAB, thus making PF-EIS and PF-Doucet slower than they would otherwise be. PF-Doucet takes more time than PF-EIS because (a) it finds the mode on an M-dimensional space, while PF-EIS finds the mode only on an (M - K)-dimensional space, and (b) p* is very likely to be multimodal (many times the initial guess particle may not lie in the basin-of-attraction of any mode, and so many more descent iterations are required).

Evaluating PF-MT and PF-EIS-MT: In Fig. 4, we compare the performance of PF-MT and PF-EIS-MT with the other PFs. The model of Fig. 4(a) was similar to that of Fig. 3(a), but with M = 10. We used X_{t,s} = v_{t,1}, X_{t,r,s} = empty and X_{t,r,r} = v_{t,2:10}, i.e., this was a PF-MT with X̃_{t,s} = v_{t,1}

[Figure 5: Average N_eff plots. Panels: (a) N_eff for Fig. 3(a); (b) N_eff for Fig. 3(b); (c) N_eff for Fig. 3(c); (d) N_eff for Fig. 4(a).]

Fig. 5. Effective particle sizes (N_eff). Because of resampling at each t, N_eff only measures the effectiveness of the current particles, and not how they influence future posterior estimates.
It is high even when most particles cluster around an OL mode which in the future turns out to be the wrong one, resulting in larger future RMSEs. PF-Doucet has the highest N_eff, but not the lowest RMSE or out-of-track % (see Fig. 3).

and X̃_{t,r} = v_{t,2:10}. As can be seen from the figure, PF-MT outperforms all other algorithms. It outperforms PF-EIS because it importance samples only on a K = 1 dimensional space, but performs MT on the other 9 dimensions (which have a narrow enough conditional posterior), and so its effective particle size is much higher (see Fig. 5(d)). This is particularly important when the available N is small. PF-MT outperforms PF-Doucet primarily because of the EIS step (approximated by MT). It is much better than PF-Original again because of better effective particle size (a result of using EIS instead of IS from the STP). Finally, it is significantly better than PF-K-dim because PF-K-dim performs dimension reduction on 9 states (all of which are nonstationary), which results in very large error, while PF-MT tracks the posterior mode on all these dimensions. Note that, because of resampling, N_eff may also be very high when a PF is completely out of track (all particles have very low but roughly equal weights). This is true for PF-K-dim (Fig. 5(d)).

In Fig. 4(b), we evaluate robustness to modeling error in the sensor failure probability. The tracker assumed failure probability α^{(1)}_1 = 0.2. The observations were simulated using α^{(1)}_1 = 0.95. This simulates the situation where a sensor begins to fail much more often due to some sudden damage to it. For this problem, M = 5. We used X_{t,s} = v_{t,1}, X_{t,r,s} = v_{t,2} and X_{t,r,r} = v_{t,3:5}, i.e., we implemented PF-EIS-MT. PF-EIS-MT has the best performance when N = 50 (the available number of particles is small), while PF-EIS has the best performance when a larger N, N = 100, is used (not shown).
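The effective particle size plotted in Fig. 5 is commonly estimated as N_eff = 1/Σ_i (w̃^i_t)², where w̃^i_t are the normalized weights [4]. A minimal sketch (the function name is ours):

```python
import numpy as np

def n_eff(weights):
    """Effective particle size: 1 / (sum of squared normalized weights).
    Equals N when all N weights are equal, and approaches 1 when a single
    particle carries nearly all the weight."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()            # normalize the weights
    return 1.0 / np.sum(w ** 2)
```

For example, `n_eff(np.ones(100))` is 100 (all particles effective), while `n_eff([1.0] + [1e-12] * 99)` is close to 1 (weight degeneracy). As noted above, a high N_eff at the current time does not guarantee that the particles sit near the mode that will matter in the future.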
Note that M = 5 or 10 is a large enough state space dimension if reasonable accuracy is desired with as few as N = 50 or 100 particles. In practical scenarios (for which it is difficult to run multiple Monte Carlo runs), such as contour tracking [29], [11] or tracking temperature in a wide area with a large number of sensors, the state dimension can be as large as 100 or 200, while one cannot use enough particles to importance sample on all dimensions. The IS-MT approximation will be really useful for such problems.

VIII. DISCUSSION AND FUTURE WORK

We have studied efficient importance sampling techniques for PF when the observation likelihood (OL) is frequently multimodal or heavy-tailed and the state transition pdf (STP) is broad and/or multimodal. The proposed PF-EIS algorithm generalizes Doucet's idea of sampling from a Gaussian approximation to the optimal importance density, p*, when p* is unimodal, to the case of multimodal p*. Sufficient conditions to ensure unimodality of p* conditioned on the "multimodal states", X_{t,s}, are derived in Theorem 1. Theorem 1 can be extended to test for unimodality of any posterior. Specifically, it can also be extended to problems involving static posterior importance sampling. In its current form, it is very expensive to verify the conditions of Theorem 1. But, based on it, multiple heuristics have been proposed to choose X_{t,s} so that p* conditioned on X_{t,s} is most likely to be unimodal. An unsolved research issue is to either find efficient numerical techniques to verify the conditions of Theorem 1 on-the-fly, or to find ways to modify the result so that the selection can be done a priori. We have shown through extensive simulations that PF-EIS outperforms PF-Doucet (PF-EIS with K = 0) whenever p* is frequently multimodal. But, in other cases, PF-Doucet has lower error.
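The PF-EIS proposal just summarized, sample the multimodal states from the STP, then sample the remaining states from a Gaussian about the unique mode of their conditional posterior, can be sketched for one particle in a toy scalar model (the model, names and step sizes below are our own illustration; the paper's implementation used MATLAB's fminunc for the mode finding):

```python
import numpy as np

rng = np.random.default_rng(0)

def eis_sample_one_particle(prev_s, y, var_stp_s=4.0, step=0.1, iters=200):
    """One PF-EIS proposal draw for a toy model y = v_s + v_r + noise,
    with unit-variance STP and observation noise on the residual state v_r."""
    # (i) importance sample the 'multimodal' state v_s from its (broad) STP
    v_s = prev_s + np.sqrt(var_stp_s) * rng.standard_normal()
    # (ii) gradient descent to the unique mode of -log p*(v_r | v_s), where
    #      E(v_r) = 0.5 v_r^2 + 0.5 (y - v_s - v_r)^2, so E'(v_r) = 2 v_r - (y - v_s)
    v_r = 0.0
    for _ in range(iters):
        v_r -= step * (2.0 * v_r - (y - v_s))
    mode = v_r                      # converges to (y - v_s) / 2
    # (iii) sample v_r from the Laplace (Gaussian) approximation about the mode;
    #       the curvature E''(v_r) = 2 gives approximating variance 1/2
    v_r_draw = mode + np.sqrt(0.5) * rng.standard_normal()
    return v_s, mode, v_r_draw
```

With K = 0 this degenerates to PF-Doucet (everything sampled from the Laplace approximation); with K = M it degenerates to PF-Original (everything sampled from the STP).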
An efficient algorithm (in terms of the required N) would be to choose the dimension and direction of X_{t,s} on-the-fly using Heuristic 6. Increasing N for any PF increases its computational cost. Once X_{t,s} is large enough to satisfy unimodality w.h.p., the N required for a given error increases as the dimension of X_{t,s} is increased further (e.g., PF-Original had much higher RMSE than PF-EIS for a given N). But the computational cost per particle always reduces as the dimension of X_{t,s} is increased (e.g., PF-Original took much less time than PF-EIS, which took less time than PF-Doucet). For a given tracking performance, if one had to choose X_{t,s} to ensure minimal computational complexity, then the optimal choice would be a higher-dimensional X_{t,s} than what is required to just satisfy unimodality. Finding a systematic way to do this is an open problem. On the other hand, if the goal is to find a PF with minimal storage complexity, or a PF that uses the smallest number of parallel hardware units (in the case of a parallel implementation), the complexity is proportional to N. In this case, PF-EIS (or PF-EIS-MT) with the smallest possible "multimodal state" dimension would be the best technique.

As the state space dimension increases, the effective particle size reduces (the variance of the weights increases), thus making any regular PF impractical for large dimensional tracking problems. The posterior Mode Tracking (MT) approximation to importance sampling (IS) for the states whose conditional posterior is narrow enough is one way to tackle this issue. The IS-MT approximation introduces some error in the estimation of these states, but at the same time it also reduces the sampling dimension by a large amount, thus improving the effective particle size. For carefully chosen IS-MT directions, the net effect is smaller total error, especially when the available N is small.
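The IS-MT approximation can be caricatured in a few lines: importance sample only a K-dimensional subset of the state, and deterministically move each remaining coordinate to the mode of its narrow conditional posterior instead of sampling it. The toy linear-Gaussian sketch below (our own model and names, not the paper's exact algorithm) uses the closed-form posterior mode for an identity sensor:

```python
import numpy as np

rng = np.random.default_rng(1)

def pf_mt_step(particles, y, K, var_stp=1.0, var_obs=0.01):
    """One PF-MT time step for a toy model Y_t = X_t + noise with
    independent coordinates. particles: (N, M) array.
    Importance sampling on dims 0:K; mode tracking on dims K:M."""
    N, M = particles.shape
    new = particles.copy()
    # IS on the first K ('effectively multimodal') dims: draw from the STP
    new[:, :K] += np.sqrt(var_stp) * rng.standard_normal((N, K))
    # MT on the rest: the conditional posterior of coordinate j is the product
    # of the Gaussian prior N(prev_j, var_stp) and likelihood N(y_j, var_obs);
    # its mode is a precision-weighted average, so move there deterministically
    gain = var_stp / (var_stp + var_obs)
    new[:, K:] = new[:, K:] + gain * (y[K:] - new[:, K:])
    return new
```

Only K coordinates per particle are randomized, so the weight variance behaves like that of a K-dimensional PF, at the cost of the (small) deterministic-move error in the remaining M - K coordinates.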
An open issue is to find rigorous techniques to select the IS-MT directions so as to ensure maximum reduction in error. A related issue is to study the stability of PF-MT or PF-EIS-MT, i.e., to show that the increase in PF error due to the IS-MT approximation at a certain time t_0 goes to zero with t fast enough, so that the net error due to IS-MT over all times is bounded. A related work is [38], which analyzes the RB-PF. An interesting open question is whether Compressed Sensing [39] can be used to select the IS-MT directions, and when.

ACKNOWLEDGEMENTS

The author would like to thank the anonymous reviewers and the associate editor for their excellent comments and suggestions. The author would also like to thank Aditya Ramamoorthy for useful comments on Theorem 1.

REFERENCES

[1] N. Vaswani, "PF-EIS & PF-MT: New particle filtering algorithms for multimodal observation likelihoods and large dimensional state spaces," in IEEE Intl. Conf. Acoustics, Speech, Sig. Proc. (ICASSP), 2007.
[2] N. Vaswani, A. Yezzi, Y. Rathi, and A. Tannenbaum, "Particle filters for infinite (or large) dimensional state spaces - part 1," in IEEE Intl. Conf. Acoustics, Speech, Sig. Proc. (ICASSP), 2006.
[3] N. J. Gordon, D. J. Salmond, and A. F. M. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEE Proceedings-F (Radar and Signal Processing), vol. 140, no. 2, pp. 107-113, 1993.
[4] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking," IEEE Trans. Sig. Proc., vol. 50, no. 2, pp. 174-188, Feb. 2002.
[5] A. Doucet, N. de Freitas, and N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer, 2001.
[6] A. Doucet, "On sequential Monte Carlo sampling methods for Bayesian filtering," Technical Report CUED/F-INFENG/TR. 310, Cambridge University Department of Engineering, 1998.
[7] J. P. MacCormick and A. Blake, "A probabilistic contour discriminant for object localisation," in IEEE Intl. Conf. on Computer Vision (ICCV), Mumbai, India, June 1998.
[8] M. Pitt and N. Shephard, "Filtering via simulation: auxiliary particle filters," J. Amer. Stat. Assoc., vol. 94, pp. 590-599, 1999.
[9] T. Schön, F. Gustafsson, and P. J. Nordlund, "Marginalized particle filters for nonlinear state-space models," IEEE Trans. Sig. Proc., 2005.
[10] M. Isard and A. Blake, "Condensation: Conditional density propagation for visual tracking," Intl. Journal of Comp. Vision, pp. 5-28, 1998.
[11] N. Vaswani, Y. Rathi, A. Yezzi, and A. Tannenbaum, "PF-MT with an interpolation effective basis for tracking local contour deformations," IEEE Trans. Image Proc., accepted (2008).
[12] A. Kale and C. Jaynes, "A joint illumination and shape model for visual tracking," in IEEE Conf. on Comp. Vis. and Pat. Recog. (CVPR), 2006, pp. 602-609.
[13] A. Kale, N. Vaswani, and C. Jaynes, "Particle filter with mode tracker (PF-MT) for visual tracking across illumination change," in IEEE Intl. Conf. Acoustics, Speech, Sig. Proc. (ICASSP), 2007.
[14] H. Zhang, B. Krogh, J. M. F. Moura, and W. Zhang, "Estimation in virtual sensor-actuator arrays using reduced-order physical models," in IEEE Conf. Decision and Control (CDC), 2004.
[15] H. Zhang, J. M. F. Moura, and B. Krogh, "Estimation in sensor networks: A graph approach," in Information Processing in Sensor Networks (IPSN), 2005.
[16] W. Fong and S. Godsill, "Sequential Monte Carlo simulation of dynamical models with slowly varying parameters: Application to audio," in IEEE Intl. Conf. Acoustics, Speech, Sig. Proc. (ICASSP), 2002.
[17] W. Sun, M. Cetin, R. Chan, V. Reddy, G. Holmvang, V. Chandar, and A. Willsky, "Segmenting and tracking the left ventricle by learning the dynamics in cardiac images," MIT Technical Report 2642, Feb. 2005.
[18] J. Jackson, A. Yezzi, and S. Soatto, "Tracking deformable moving objects under severe occlusions," in IEEE Conf. Decision and Control (CDC), 2004.
[19] O. Juan, R. Keriven, and G. Postelnicu, "Stochastic mean curvature motion in computer vision: Stochastic active contours," in VLSM, 2004.
[20] N. Vaswani and S. Das, "Particle filter with efficient importance sampling and mode tracking (PF-EIS-MT) and its application to landmark shape tracking," in Asilomar Conf. on Sig. Sys. Comp., 2007.
[21] A. J. Chorin and P. Krause, "Dimensional reduction for a Bayesian filter," Proc. Nat. Acad. of Sciences, USA, vol. 101, no. 42, pp. 15013-15017, 2004.
[22] F. Mosteller and D. L. Wallace, Inference and Disputed Authorship: The Federalist, Addison-Wesley, Reading, MA, 1964.
[23] D. V. Lindley, "Approximate Bayesian methods," in Bayesian Statistics, University Press, Valencia, Spain, J. M. Bernado, M. H. Degroot, D. V. Lindley, and A. F. M. Smith, Eds., 1980.
[24] L. Tierney and J. B. Kadane, "Accurate approximations for posterior moments and marginal densities," J. Amer. Stat. Assoc., vol. 81, no. 393, pp. 82-86, March 1986.
[25] R. van der Merwe, N. de Freitas, A. Doucet, and E. Wan, "The unscented particle filter," in Advances in Neural Information Processing Systems 13, Nov. 2001.
[26] V. Cevher and J. H. McClellan, "Proposal strategies for joint state-space tracking with particle filters," in IEEE Intl. Conf. Acoustics, Speech, Sig. Proc. (ICASSP), 2005.
[27] R. Chen and J. S. Liu, "Mixture Kalman filters," Journal of the Royal Statistical Society, vol. 62, no. 3, pp. 493-508, 2000.
[28] J. Xavier and J. H. Manton, "On the generalization of AR processes to Riemannian manifolds," in IEEE Intl. Conf. Acoustics, Speech, Sig. Proc. (ICASSP), 2006.
[29] Y. Rathi, N. Vaswani, A. Tannenbaum, and A.
Yezzi, "Tracking deforming objects using particle filtering for geometric active contours," IEEE Trans. Pattern Anal. Machine Intell., 2007, to appear.
[30] G. Casella and R. Berger, Statistical Inference, Duxbury Thomson Learning, second edition, 2002.
[31] J. H. Kotecha and P. M. Djuric, "Gaussian sum particle filtering," IEEE Trans. Sig. Proc., pp. 2602-2612, Oct. 2003.
[32] M. S. Oh and J. O. Berger, "Integration of multimodal functions by Monte Carlo importance sampling," J. Amer. Stat. Assoc., vol. 88, no. 22, pp. 450-456, 1993.
[33] M. Orton and W. Fitzgerald, "A Bayesian approach to tracking multiple targets using sensor arrays and particle filters," IEEE Trans. Sig. Proc., pp. 216-223, February 2002.
[34] C. Kreucher, K. Kastella, and A. O. Hero III, "Multitarget tracking using the joint multitarget probability density," IEEE Trans. Aer. Elec. Sys., pp. 1396-1414, October 2005.
[35] N. Vaswani, "Particle filters for infinite (or large) dimensional state spaces - part 2," in IEEE Intl. Conf. Acoustics, Speech, Sig. Proc. (ICASSP), 2006.
[36] Z. Chen, T. Kirubarajan, and M. R. Morelande, "Improved particle filtering schemes for target tracking," in IEEE Intl. Conf. Acoustics, Speech, Sig. Proc. (ICASSP), 2005.
[37] S. Zhou and R. Chellappa, "Probabilistic human recognition from video," in Eur. Conf. on Comp. Vis. (ECCV), 2002.
[38] N. Chopin, "Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference," Annals of Statistics, vol. 32, no. 6, pp. 2385-2411, 2004.
[39] E. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. on Info. Theory, vol. 52, no. 2, pp. 489-509, February 2006.
