Decision Based Uncertainty Propagation Using Adaptive Gaussian Mixtures
Authors: Gabriel Terejanu, Puneet Singla, Tarunraj Singh
Gabriel Terejanu (a), Puneet Singla (b), Tarunraj Singh (b), Peter D. Scott (a)
terejanu@buffalo.edu, psingla@buffalo.edu, tsingh@buffalo.edu, peter@buffalo.edu
(a) Department of Computer Science & Engineering, (b) Department of Mechanical & Aerospace Engineering, University at Buffalo, Buffalo, NY 14260.

Abstract - Given a decision process based on the approximate probability density function returned by a data assimilation algorithm, an interaction level between the decision-making level and the data assimilation level is designed to incorporate the information held by the decision maker into the data assimilation process. Here the information held by the decision maker is a loss function at a decision time, which maps the state space onto real numbers representing the threat associated with different possible outcomes or states. The new probability density function obtained will address the region of interest, the area in the state space with the highest threat, and will provide overall a better approximation to the true conditional probability density function within it. The approximation used for the probability density function is a Gaussian mixture, and a numerical example is presented to illustrate the concept.

Keywords: Adaptive Gaussian Sum, Decision Making, Uncertainty Propagation, Expected Loss.

1 Introduction

Chemical, Biological, Radiological, and Nuclear (CBRN) incidents are rare but very consequential events, which mandates extensive research and operational efforts in mitigating their outcomes. For such critical applications, the accuracy of predicting the future evolution of toxic plumes in a timely fashion represents an important part of the Decision Maker (DM) toolbox. Based on these forecasts, decisions can be made on evacuating cities, sheltering, or deploying medical gear.
Such decisions are taken based on a loss function or region of interest, such as the population density in a given area. Many research projects try to model the atmospheric transport and diffusion of toxic plumes. While inherently stochastic and highly nonlinear, these mathematical models are able to capture just a part of the dynamics of the real phenomenon, and the forward integration yields an uncertain prediction. The decision maker takes actions based on the expected loss computed using both the predicted uncertainty and the loss function, which here maps a region of interest in the state space into a threat level, such as the population density in a city. Thus the ability to propagate uncertainty and errors through the dynamic system is of great importance, and the evolution of the probability density function (pdf) has received much attention recently.

Due to the uncertainty accumulation in integrating the model, the forecasts of the system become less and less useful for the decision maker. Data Assimilation (DA) offers a way to reduce the uncertainty by combining measurements provided by sensors with the model prediction in a Bayesian way [14]. This gives an improved situation assessment for the hindcast and nowcast. Unfortunately the forecast, used to evaluate the impact assessment, is still affected by the accuracy of the probability density function evolution. Even in the hindcast and nowcast cases, if the sensors provide ambiguous measurements, such as a quadratic measurement model, the improvement brought by DA may be marginal.

For nonlinear systems, the exact description of the transition pdf is provided by a linear partial differential equation (PDE) known as the Fokker-Planck-Kolmogorov Equation (FPKE) [12]. Analytical solutions exist only for the stationary pdf and are restricted to a limited class of dynamical systems [12].
Thus researchers are actively looking at numerical approximations to solve the FPKE [8, 10], generally using the variational formulation of the problem. For discrete-time dynamical systems, solving for the exact solution, which is given by the Chapman-Kolmogorov Equation (CKE), yields the same problems as in the continuous-time case. Several other techniques exist in the literature to approximate the pdf evolution, the most popular being Monte Carlo (MC) methods [5], Gaussian closure [7] (or higher-order moment closure), Equivalent Linearization [13], Stochastic Averaging [9], and Gaussian mixture approximations [1, 6, 16]. Furthermore, all these approaches provide only an approximate description of the uncertainty propagation problem by restricting the solution to a small number of parameters.

All the assumptions employed make the problem tractable and computationally efficient, which satisfies the requirement of minimizing decision latency. But the approximation given may be of little use when computing the expected loss, since the method is not sensitive to the region of interest. Such an example may be given using Monte Carlo approximations or Gaussian Sum approximations, when the propagated uncertainty offers no or very little probabilistic support in the region of interest. In other words, no particles or Gaussian components are centered in the region of interest, and even though the probabilistic support may be infinite, the expected loss computed might be underestimated.

Methods to deal with such situations have been developed, from risk-sensitive filters [3] to risk-sensitive particle filters [17]. The risk-sensitive filters minimize the expected exponential of the estimation error, controlling this way how much to weigh the higher-order moments versus the lower-order moments.
While weighting the higher-order moments more, these methods are not designed to be selectively sensitive to particular regions of interest in the state space. The risk-sensitive particle filter is able to generate more samples in the region of interest, but at the expense of biasing the proposal distribution; thus the particles obtained are biased towards the region of interest. While this method is appropriate for fault detection, it provides a limited output for the decision maker, who is interested in querying the probability density function for different numerical quantities used in the decision process, such as the expected loss, the mean, or the mode of the pdf.

The present paper is concerned with providing a better approximation to the probability density function by incorporating contextual loss information held by the decision maker into the DA process. In this work we use a Gaussian mixture approximation to the probability density function. We propose a "non-intrusive" way of computing an approximate pdf that addresses the region of interest and is closer to the true pdf. Non-intrusive refers here to the fact that we do not require a new DA method when incorporating the loss function into the derivation.

A progressive selection method is designed to add new Gaussian components to the initial Gaussian mixture, ensuring that probabilistic support reaches the region of interest at the decision time. The initial weights of the new Gaussian components are set to zero, and they are modified when propagated through the nonlinear dynamical system to minimize the error in the FPKE [16]. Therefore, if there is any probability density mass in the region of interest, it will be represented by the non-zero weights of the new Gaussian components at the decision time.
The problem is stated in Section 2, and the Gaussian Sum approximation to the conditional pdf is presented in Section 3. The progressive selection of Gaussian components is derived in Section 4. An example to illustrate the concept is given in Section 5, and the conclusions and future work are discussed in Section 6.

2 Problem Statement

Consider a general n-dimensional continuous-time noise-driven nonlinear dynamic system with uncertain initial conditions and a discrete measurement model, given by the equations:

\dot{x}(t) = f(t, x(t)) + g(t, x(t)) \Gamma(t)   (1)
z_k = h(t_k, x_k) + v_k   (2)

and a set of k observations, Z_k = \{ z_i \mid i = 1 \ldots k \}. We denote x_k = x(t_k); \Gamma(t) represents a Gaussian white noise process with correlation function Q \delta(t_{k+1} - t_k), and the initial state uncertainty is captured by the pdf p(t_0, x_0). The random vector v_k denotes the measurement noise, which is a temporally uncorrelated, zero-mean random sequence with known covariance R_k. The process noise and the measurement noise are uncorrelated with each other and with the initial condition.

We are interested in finding the conditional probability density function p(t, x(t) | Z_k). For t > t_k we get the forecast pdf by integrating only Eq. (1) forward; if t = t_k we are interested in the nowcast or filtered pdf; and for t < t_k we obtain the hindcast or smoothed pdf.

Given a state-space region of interest at a particular decision time t_d, which may be represented by the decision maker as a loss function L(x_d, a_d), the expected loss of an action a_d is given by:

L(a_d) = \int L(x_d, a_d) \, p(t_d, x_d | Z_k) \, dx_d   (3)

Here we will consider only the cases where t_d > t_k.
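For a Gaussian loss function of the form used later in Eq. (16) and a Gaussian-mixture pdf, the integral in Eq. (3) has a closed form, since the product of two Gaussian densities integrates to a Gaussian in the means. A minimal one-dimensional sketch (the mixture and loss parameters are invented for illustration):

```python
import numpy as np

def gauss(x, mu, var):
    """Univariate normal density N(x | mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def expected_loss(w, mu, P, mu_L, var_L):
    """Eq. (3) for L(x) = N(x | mu_L, var_L) and a Gaussian-mixture pdf:
    int N(x|a,A) N(x|b,B) dx = N(a | b, A + B), summed over components."""
    return sum(wi * gauss(mu_L, mi, Pi + var_L) for wi, mi, Pi in zip(w, mu, P))

# Illustrative two-component forecast pdf and a loss centered at x = 1.5
w, mu, P = [0.7, 0.3], [0.0, 2.0], [0.5, 0.8]
closed = expected_loss(w, mu, P, mu_L=1.5, var_L=0.1)

# Numerical check of Eq. (3) on a dense grid (trapezoidal rule)
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
pdf = w[0] * gauss(x, mu[0], P[0]) + w[1] * gauss(x, mu[1], P[1])
f = gauss(x, 1.5, 0.1) * pdf
numeric = np.sum(0.5 * (f[:-1] + f[1:])) * dx
```

The same Gaussian-product identity reappears later in Eqs. (23) and (34), which is why the Gaussian loss form keeps the whole pipeline in closed form.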
Given approximate computational methods for the conditional pdf, \hat{p}(t_d, x_d | Z_k), we are able to obtain an estimate of the expected loss and also to find the optimal Bayesian decision, if a set of decisions exists:

\hat{L}(a_d) = \int L(x_d, a_d) \, \hat{p}(t_d, x_d | Z_k) \, dx_d   (4)
\hat{a}_d = \arg\min_{a_d} \int L(x_d, a_d) \, \hat{p}(t_d, x_d | Z_k) \, dx_d   (5)

The decision-making process in the data assimilation framework is presented in Fig. 1 (left). Obviously, if we have a good approximation for the conditional pdf in the region of interest, the same can be said for the expected loss. The situation becomes more dramatic when a large deviation exists between the actual and the estimated conditional pdf in the region of interest. In the case of evaluation of a single decision, the algorithm can underestimate the actual expected loss, \hat{L}(a_d) \ll L(a_d), misguiding the decision maker with respect to the magnitude of the situation. In the case when an optimal decision has to be chosen, the large difference between conditional pdfs may result in picking not only a suboptimal decision but a very consequential one.

While one can derive a new method to approximate the conditional pdf by including the loss function in the derivation, and so reduce the difference in the region of interest to better approximate the expected loss, it will accomplish this at the expense of worsening the approximation of the conditional pdf in the rest of the state space. This will affect estimates other than the expected loss that may be required in guiding the decision process, like the mean of the pdf, the modes of the pdf, etc. These will be biased towards the region of interest, thus misleading the decision maker.

Figure 1: The left figure represents the classic approach to decision making in the data assimilation context. The right figure shows the proposed model.
In other words, if we call the computation of the expected loss of a given action impact assessment and the computation of the moments and other quantities based on the conditional pdf situation assessment, one requires both to be as accurate as possible. In the limit, if we can compute the conditional pdf exactly, we obtain both impact assessment and situation assessment, since we can quantify exactly the probability of all the outcomes.

Since the decision maker holds important information regarding the use of the conditional pdf obtained from the data assimilation method, we can incorporate this information into the data assimilation process in a non-intrusive manner (we do not have to derive a new method), by supplementing the inputs into the data assimilation module. The proposed method is shown in Fig. 1 (right), where a new interaction level is introduced between the decision maker and the data assimilation, which uses the contextual information provided by the decision maker to supplement the inputs of DA / change the environment in which DA is running.

Therefore we want to find an approximation to the conditional pdf, \hat{p}^*(t_d, x_d | Z_k), that addresses the interest held by the decision maker and provides both a better impact and situation assessment than \hat{p}(t_d, x_d | Z_k). These objectives can be captured by the following two relations:

\int \left[ p(t_d, x_d | Z_k) - \hat{p}^*(t_d, x_d | Z_k) \right]^2 dx_d \le \int \left[ p(t_d, x_d | Z_k) - \hat{p}(t_d, x_d | Z_k) \right]^2 dx_d   (6)

\left| \hat{L}^*(a_d) - L(a_d) \right| \le \left| \hat{L}(a_d) - L(a_d) \right|   (7)

In the present paper, we will design an interaction level between the decision maker and the data assimilation module that approximates the conditional pdf using a Gaussian mixture.
The interaction level adds new Gaussian components to the initial uncertainty, such that they will be positioned near the region of interest at the decision time. Their initial weights are set to zero, so the initial uncertainty is not changed, but the evolution of the weights is dictated by the error in the Fokker-Planck equation. Thus, if any probability density mass is moving naturally towards the region of interest, the weights of the new Gaussian components will become greater than zero. Therefore the method just makes sure that if there is any probability density mass in the region of interest, it will be found by the data assimilation method.

In this paper we consider only the forecast of the conditional pdf, when no measurements are available between the current time and the decision time. A suggestion on how this can be used in the case when we have observations to assimilate between the current time and the decision time is given in Section 4.

3 Approximation of the Conditional Probability Density Function

The nonlinear filtering problem has been extensively studied, and various methods are provided in the literature. The Extended Kalman Filter (EKF) is historically the first, and still the most widely adopted, approach to solve the nonlinear state estimation problem. It is based on the assumption that the nonlinear system dynamics can be accurately modeled by a first-order Taylor series expansion [4]. Since the EKF provides us only with a rough approximation to the a posteriori pdf, and solving for the exact solution of the conditional pdf is very expensive, researchers have been looking for mathematically convenient approximations. In Ref. [1], a weighted sum of Gaussian density functions has been proposed to approximate the conditional pdf.
The probability density function of the initial condition is given by the following Gaussian sum,

p(t_0, x_0) = \sum_{i=1}^{N} w_0^i \, \mathcal{N}(x_0 | \mu_0^i, P_0^i)   (8)

\mathcal{N}(x | \mu, P) = |2\pi P|^{-1/2} \exp\left[ -\tfrac{1}{2} (x - \mu)^T P^{-1} (x - \mu) \right]

Let us assume that the underlying conditional pdf can be approximated by a finite sum of Gaussian pdfs,

\hat{p}(t, x(t) | Z_k) = \sum_{i=1}^{N} w_{t|k}^i \, \underbrace{\mathcal{N}(x(t) | \mu_{t|k}^i, P_{t|k}^i)}_{p_{g_i}}   (9)

where \mu_{t|k}^i and P_{t|k}^i represent the conditional mean and covariance of the i-th component of the Gaussian pdf with respect to the k measurements, and w_{t|k}^i denotes the amplitude of the i-th Gaussian in the mixture. The positivity and normalization constraints on the mixture pdf \hat{p}(t, x | Z_k) lead to the following constraints on the amplitude vector:

\sum_{i=1}^{N} w_{t|k}^i = 1, \qquad w_{t|k}^i \ge 0, \; \forall t   (10)

A Gaussian Sum Filter [1] may be used to propagate and update the conditional pdf. Since all the components of the mixture pdf (9) are Gaussian, only estimates of their mean and covariance need to be propagated between t_k and t_{k+1}, using the conventional Extended Kalman Filter time update equations:

\dot{\mu}_{t|k}^i = f(t, \mu_{t|k}^i)   (11)
\dot{P}_{t|k}^i = A_{t|k}^i P_{t|k}^i + P_{t|k}^i (A_{t|k}^i)^T + g(t, \mu_{t|k}^i) Q g^T(t, \mu_{t|k}^i)   (12)
A_{t|k}^i = \left. \frac{\partial f(t, x(t))}{\partial x(t)} \right|_{x(t) = \mu_{t|k}^i}   (13)

In Ref. [16], an update method for adapting the weights of the different Gaussian components during propagation is given, based on minimizing the error in the Fokker-Planck equation for continuous dynamical systems, and on minimizing the integral square difference between the true forecast pdf, given by the Chapman-Kolmogorov equation, and its Gaussian sum approximation for discrete-time dynamical systems.
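The time update of Eqs. (11)-(13) is just a pair of ODEs per mixture component. A minimal sketch using simple Euler integration and the scalar drift f(x) = sin(x), g = 1 of the example in Section 5 (the step size is our own choice, not prescribed by the paper):

```python
import math

def propagate_component(mu, P, t_final, Q=1.0, dt=1e-3):
    """Euler integration of the EKF time-update ODEs, Eqs. (11)-(12),
    for the scalar drift f(x) = sin(x), g = 1 (the system of Section 5).
    The Jacobian of Eq. (13) is A = cos(mu)."""
    steps = int(round(t_final / dt))
    for _ in range(steps):
        A = math.cos(mu)  # Eq. (13), evaluated at the current mean
        # Eqs. (11)-(12); scalar case, so A P + P A^T + g Q g^T = 2 A P + Q
        mu, P = mu + dt * math.sin(mu), P + dt * (2.0 * A * P + Q)
    return mu, P

# Initial component of the Section 5 example, propagated to t_d = 8 sec
mu_d, P_d = propagate_component(-0.3, 0.3 ** 2, 8.0)
```

Since sin(x) has a stable equilibrium at x = -pi, the mean drifts there and the variance settles near the balance point of the diffusion and the contracting linearization.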
The new weights are given by the solution of the following quadratic programming problem:

w_{t|k} = \arg\min_{w_{t|k}} \; \tfrac{1}{2} w_{t|k}^T (L + I) w_{t|k} - w_{t|k}^T w_{k|k}   (14)
subject to \; 1_{N\times1}^T w_{t|k} = 1, \quad w_{t|k} \ge 0_{N\times1}

where w_{t|k} \in \mathbb{R}^{N\times1} is the vector of Gaussian weights, and the elements of L \in \mathbb{R}^{N\times N} are given by:

L_{ij} = \int_V L_i L_j \, dx   (15)

L_i(t, x) = \left( \frac{\partial p_{g_i}}{\partial \mu_{t|k}^i} \right)^T f(t, \mu_{t|k}^i) + \sum_{j=1}^{n} \sum_{k=1}^{n} \frac{\partial p_{g_i}}{\partial P_{t|k}^{i,jk}} \dot{P}_{t|k}^{i,jk} + \sum_{j=1}^{n} \left[ f_j(t,x) \frac{\partial p_{g_i}}{\partial x_j} + p_{g_i} \frac{\partial f_j(t,x)}{\partial x_j} + \frac{1}{2} \frac{\partial \left( d_j^{(1)}(t,x) \, p_{g_i} \right)}{\partial x_j} - \frac{1}{2} \sum_{k=1}^{n} \frac{\partial^2 \left( d_{jk}^{(2)}(t,x) \, p_{g_i} \right)}{\partial x_j \, \partial x_k} \right]

d^{(1)}(t,x) = \frac{1}{2} \frac{\partial g(t,x)}{\partial x} Q g(t,x), \qquad d^{(2)}(t,x) = \frac{1}{2} g(t,x) Q g^T(t,x)

Notice that to carry out this minimization we need to evaluate integrals involving Gaussian pdfs over the volume V, which can be computed exactly for polynomial nonlinearity and in general can be approximated by the Gaussian quadrature method.

The measurement update is done using Bayes' rule, where the state and the covariance matrix are updated using the Extended Kalman Filter measurement update equations, and the weights are updated as shown in Ref. [2]. The equations can be found in Ref. [15]. By updating the forecast weights, not only can we obtain a more accurate estimate but also a better approximation to the conditional probability density function [15]. This weight update during uncertainty propagation is very useful when the measurement model offers limited or no information for updating the states of the system.

The estimated conditional pdf is used to compute the expected loss. We require that the loss function provided is positive, finite everywhere, and able to distinguish the important states from the unimportant ones.
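Eq. (14) is a small quadratic program over the probability simplex. The paper does not prescribe a solver; the sketch below uses projected gradient descent with the standard Euclidean projection onto the simplex, on an invented two-component instance:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, 1^T w = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = idx[u - (css - 1.0) / idx > 0][-1]
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)

def solve_weight_qp(Lmat, w_post, iters=5000):
    """Projected gradient for Eq. (14):
    min 0.5 w^T (L + I) w - w^T w_post, with w on the simplex."""
    H = Lmat + np.eye(w_post.size)
    step = 1.0 / np.linalg.norm(H, 2)  # below 2/L_max, so iterates converge
    w = np.full(w_post.size, 1.0 / w_post.size)
    for _ in range(iters):
        w = project_simplex(w - step * (H @ w - w_post))
    return w

# Invented 2x2 instance: a small symmetric L and previous (posterior) weights
Lmat = np.array([[0.2, 0.05], [0.05, 0.3]])
w_post = np.array([0.9, 0.1])
w = solve_weight_qp(Lmat, w_post)
```

Any off-the-shelf QP solver would do equally well here; the point is only that the constraint set of Eq. (14) is the simplex, so the result stays a valid weight vector.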
For simplicity, the loss function used in this work has the following form:

L(x_d, a_d) = \mathcal{N}(x_d | \mu_L, \Sigma_L)   (16)

Observe that even with a better approximation of the weights of the different Gaussian components, these components may drift away from the loss function due to the first-order Taylor series uncertainty propagation and the limited information in the measurement update, a situation which might be avoided if the conditional pdf could be found in an exact way.

Due to the approximations used in propagating the conditional pdf, it may happen that no or very little probability density mass exists in the region of interest at the decision time, depicted here by the loss function. Thus the expected loss will be significantly underestimated, misguiding in this way the decision maker.

4 Decision Maker - Data Assimilation Interaction Level

The iterative method proposed here adds a set of Gaussian components to the initial pdf that are sensitive to the loss function at the decision time. After propagation, these Gaussian components will be located near the center of support of the loss function at the decision time. Initially the weights of these components are set to zero, and they will be updated in the propagation step if any probability density mass is moving in their direction. The weights at the decision time will give their relative contributions in computing the expected loss with respect to the entire conditional pdf.

An algorithm that bears similarity to simulated annealing and to the progressive correction used in particle filters [11] is proposed for selecting the initial Gaussian components.
The main idea is to select a set of Gaussian components initially, propagate each one of them using the time update equations of the Extended Kalman Filter until the decision time is reached, and, based on their contributions to the expected loss, select their means and variances in the initial distribution such that after propagation the expected loss is maximized.

Let the initial pdf be given by p(t_0, x_0) as a Gaussian sum, Eq. (8). Compute the mean and the covariance of this mixture:

\mu_0 = \sum_{i=1}^{N} w_0^i \mu_0^i   (17)
P_0 = \sum_{i=1}^{N} w_0^i \left[ P_0^i + (\mu_0^i - \mu_0)(\mu_0^i - \mu_0)^T \right]   (18)

Assume that we want to add another M new Gaussian components to the initial pdf, with zero weights and sensitive to the loss function. We sample the means of these Gaussian components from the initial distribution such that their equally weighted sum gives the mean in Eq. (17):

\mu^i \sim p(t_0, x_0) \quad \text{for } i = 1 \ldots M-1   (19)
\mu^M = M \mu_0 - \sum_{i=1}^{M-1} \mu^i   (20)

The default covariance of the Gaussian components is D. We want to find the new covariance D^* such that the covariance of the new Gaussian components matches the covariance of the initial pdf. Let D^* = \gamma D. Thus we want to find \gamma such that we minimize the following expression:

J_\gamma = \mathrm{Tr}\left[ P_0 - \frac{1}{M} \sum_{i=1}^{M} \left( \gamma D + (\mu^i - \mu_0)(\mu^i - \mu_0)^T \right) \right]   (21)

\gamma = \frac{1}{\mathrm{Tr}\, D} \, \mathrm{Tr}\left[ P_0 - \frac{1}{M} \sum_{i=1}^{M} (\mu^i - \mu_0)(\mu^i - \mu_0)^T \right]   (22)

Only solutions \gamma > 0 are accepted; otherwise we repeat the sampling of the means, Eq. (19). Once we have the initial Gaussian sum components, we propagate them using the time update equations of the Extended Kalman Filter until we reach the decision time. Let \mu_{t_d}^i and P_{t_d}^i be their means and covariances. The Gaussian components will then be weighted based on their contribution to the expected loss. A larger contribution means a component more sensitive to the loss function, and thus a larger weight.
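The sampling step of Eqs. (19)-(20) and the scaling of Eq. (22) can be sketched for a scalar state, where the traces reduce to scalars; the mixture, component count, and default variance below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_means_and_gamma(w0, mu0_i, P0_i, M, D, max_tries=100):
    """Draw M candidate means via Eqs. (19)-(20), then gamma via Eq. (22).
    Scalar state, so the traces of Eqs. (21)-(22) reduce to scalars.
    Retries until gamma > 0, as required by the selection step."""
    mu0 = np.dot(w0, mu0_i)                                 # Eq. (17)
    P0 = np.dot(w0, P0_i + (mu0_i - mu0) ** 2)              # Eq. (18)
    for _ in range(max_tries):
        comp = rng.choice(w0.size, size=M - 1, p=w0)        # pick mixture components
        mus = rng.normal(mu0_i[comp], np.sqrt(P0_i[comp]))  # Eq. (19)
        mus = np.append(mus, M * mu0 - mus.sum())           # Eq. (20): match the mean
        gamma = (P0 - np.mean((mus - mu0) ** 2)) / D        # Eq. (22)
        if gamma > 0:
            return mus, gamma
    raise RuntimeError("no positive gamma found")

# Invented two-component initial mixture, M = 5 extra components
w0 = np.array([0.6, 0.4])
mu0_i = np.array([-0.5, 0.2])
P0_i = np.array([0.09, 0.04])
D = 0.05
mus, gamma = sample_means_and_gamma(w0, mu0_i, P0_i, M=5, D=D)
```

By construction the M sampled means average exactly to the mixture mean, and gamma*D plus the spread of the means reproduces the mixture covariance of Eq. (18).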
To be able to compute the weights of the Gaussian components, to make sure that all of them are fairly weighted and we do not run into numerical problems, and also to create an indicator that marks the end of the algorithm, we compute an inflation coefficient for the loss function. Let \Sigma_L^* = \alpha \Sigma_L be the inflated covariance of the loss function. The inflation coefficient \alpha is found such that the expected loss computed using the Gaussian component most distant from the loss function is maximized. Let the mean and the covariance of the most distant component be denoted by \mu_{t_d}^{\max} and P_{t_d}^{\max} respectively.

J_{\max} = \int \mathcal{N}(x_d | \mu_L, \alpha \Sigma_L) \, \mathcal{N}(x_d | \mu_{t_d}^{\max}, P_{t_d}^{\max}) \, dx_d = \mathcal{N}(\mu_L | \mu_{t_d}^{\max}, \alpha \Sigma_L + P_{t_d}^{\max})   (23)

An equivalent way to seek \alpha is by minimizing the negative log of the above expectation:

J_{\min} = \log\left[ \det(\alpha \Sigma_L + P_{t_d}^{\max}) \right] + (\mu_L - \mu_{t_d}^{\max})^T \left( \alpha \Sigma_L + P_{t_d}^{\max} \right)^{-1} (\mu_L - \mu_{t_d}^{\max})   (24)

Let us denote K = \alpha \Sigma_L + P_{t_d}^{\max} and U = (\mu_L - \mu_{t_d}^{\max})(\mu_L - \mu_{t_d}^{\max})^T. We seek \alpha > 0 such that

\frac{\partial J_{\min}}{\partial \alpha} = 0   (25)

\mathrm{Tr}\left[ K^{-1} \Sigma_L - K^{-1} U K^{-1} \Sigma_L \right] = 0   (26)

After a few mathematical manipulations, Eq. (26) can be written in the following format:

\mathrm{Tr}\left[ K^{-1} \Sigma_L \left( \alpha I + P_{t_d}^{\max} \Sigma_L^{-1} - U \Sigma_L^{-1} \right) K^{-1} \Sigma_L \right] = 0   (27)

Denote A = K^{-1} \Sigma_L and B = \alpha I + P_{t_d}^{\max} \Sigma_L^{-1} - U \Sigma_L^{-1}. Observe that for \alpha > 0 the matrix A is symmetric and positive definite.

Lemma: If \mathrm{Tr}[ABA] = 0 and A is symmetric and positive definite, then \mathrm{Tr}[B] = 0.

Proof: Let A = V S V^T be a singular value decomposition of the matrix A, where V is a unitary matrix and S is a diagonal matrix. The trace can now be written as \mathrm{Tr}[ABA] = \mathrm{Tr}[V S V^T B V S V^T] = \mathrm{Tr}[S^2 B]. If \mathrm{Tr}[S^2 B] = 0 then S^2 B is a commutator. Thus there are X and Y such that S^2 B = XY - YX. But B = S^{-2} X Y - S^{-2} Y X = X^* Y - Y X^*, where X^* = S^{-2} X. Therefore B is also a commutator, hence \mathrm{Tr}[B] = 0.
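In one dimension, Eq. (24) can be minimized directly over a grid of alpha values, and the grid minimizer matches the closed-form stationary point (U - P)/Sigma_L that Eq. (29) gives for n = 1; the component values below are invented for illustration:

```python
import numpy as np

# Invented 1-D instance: loss N(mu_L, var_L) and most distant component (mu_c, P_c)
mu_L, var_L = np.pi / 2, 0.1 ** 2
mu_c, P_c = -0.4, 0.25
U = (mu_L - mu_c) ** 2  # scalar version of U = (mu_L - mu_max)(mu_L - mu_max)^T

def J_min(alpha):
    """Eq. (24) in one dimension: det and inverse reduce to scalars."""
    K = alpha * var_L + P_c
    return np.log(K) + U / K

alpha_closed = (U - P_c) / var_L  # stationary point; Eq. (29) with n = 1

# Brute-force check: minimize Eq. (24) on a fine grid of alpha values
grid = np.linspace(1e-3, 2.0 * alpha_closed, 400001)
alpha_grid = grid[np.argmin(J_min(grid))]
```

The stationary condition in 1-D is simply K = U, i.e. the inflated loss covariance plus the component covariance equals the squared distance between their means, which is what makes alpha a natural stopping indicator: once the most distant component is close enough, alpha drops to 1 or below.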
Applying the previous lemma to Eq. (27), we get

\mathrm{Tr}\left[ \alpha I + P_{t_d}^{\max} \Sigma_L^{-1} - U \Sigma_L^{-1} \right] = 0   (28)

Therefore we accept solutions \alpha > 1 that satisfy the following relation:

\alpha = \frac{1}{n} \mathrm{Tr}\left[ (U - P_{t_d}^{\max}) \Sigma_L^{-1} \right]   (29)

For \alpha \le 1 we stop the algorithm. Otherwise we continue, obtaining the weights of the Gaussian components by solving the following optimization problem:

w = \arg\min_w \; \tfrac{1}{2} w^T M w - w^T N   (30)
subject to \; 1_{M\times1}^T w = 1   (31)
w \ge 0_{M\times1}   (32)

where w \in \mathbb{R}^{M\times1}, M \in \mathbb{R}^{M\times M} and N \in \mathbb{R}^{M\times1}, and their entries are given by:

m_{ij} = \mathcal{N}\left( \mu_{t_d|0}^j \,\middle|\, \mu_{t_d|0}^i, \; P_{t_d|0}^i + P_{t_d|0}^j \right)   (33)
n_i = \mathcal{N}\left( \mu_L \,\middle|\, \mu_{t_d|0}^i, \; P_{t_d|0}^i + \Sigma_L^* \right)   (34)

The new pdf used to sample the new means is given by:

p_{new}(t_0, x_0) = \sum_{i=1}^{M} w^i \, \mathcal{N}(x_0 | \mu^i, \beta D^*)   (35)

Algorithm 1 Progressive Selection of Gaussian Components
Require: t_d - decision time; p(t_0, x_0) - initial probability density function; M - number of extra Gaussian components; D - Gaussian component covariance; w_tol - add only Gaussian components with weights greater than this threshold; L(x) - loss function \mathcal{N}(x | \mu_L, \Sigma_L)
1: \hat{p}_0 = p(t_0, x_0), \alpha = \infty, \gamma = -1
2: while (\alpha > 1) and maxiter not reached do
3:   Compute the mean and the covariance of the initial pdf:
     \mu_0 = \sum_{i=1}^{N} w_0^i \mu_0^i
     P_0 = \sum_{i=1}^{N} w_0^i [ P_0^i + (\mu_0^i - \mu_0)(\mu_0^i - \mu_0)^T ]
4:   while (\gamma < 0) do
5:     Get the means of the Gaussian components:
       Draw \mu^i \sim p(t_0, x_0) for i = 1 \ldots
M-1
       Set \mu^M = M \mu_0 - \sum_{i=1}^{M-1} \mu^i
6:     \gamma = \frac{1}{\mathrm{Tr}\, D} \mathrm{Tr}[ P_0 - \frac{1}{M} \sum_{i=1}^{M} (\mu^i - \mu_0)(\mu^i - \mu_0)^T ]
7:   end while
8:   Set the covariance of the Gaussian components: P_0^i = \gamma D
9:   Propagate the moments from t = 0 to t = t_d:
     \dot{\mu}_{t|0}^i = f(t, \mu_{t|0}^i)
     \dot{P}_{t|0}^i = A_{t|0}^i P_{t|0}^i + P_{t|0}^i (A_{t|0}^i)^T + g Q g^T
10:  Get the most distant component by computing the Mahalanobis distance:
     d_i = (\mu_L - \mu_{t_d|0}^i)^T [ P_{t_d|0}^i + \Sigma_L ]^{-1} (\mu_L - \mu_{t_d|0}^i)
     (\mu_{t_d|0}^{\max}, P_{t_d|0}^{\max}) = \arg\max_i (d_i)
11:  Compute the optimal \alpha and the inflated matrix \Sigma_L^*:
     \alpha = \frac{1}{n} \mathrm{Tr}[ ( (\mu_{t_d|0}^{\max} - \mu_L)(\mu_{t_d|0}^{\max} - \mu_L)^T - P_{t_d|0}^{\max} ) \Sigma_L^{-1} ]
12:  if \alpha < 1 then \alpha = 1 end if; \Sigma_L^* = \alpha \Sigma_L
13:  Compute the elements of M \in \mathbb{R}^{M\times M} and N \in \mathbb{R}^{M\times1}:
     m_{ij} = \mathcal{N}( \mu_{t_d|0}^j | \mu_{t_d|0}^i, P_{t_d|0}^i + P_{t_d|0}^j )
     n_i = \mathcal{N}( \mu_L | \mu_{t_d|0}^i, P_{t_d|0}^i + \Sigma_L^* )
14:  Compute the weights:
     w = \arg\min_w \tfrac{1}{2} w^T M w - w^T N subject to 1_{M\times1}^T w = 1, w \ge 0_{M\times1}
15:  Set \hat{p}_0 = \sum_{j=1}^{M} w^j \mathcal{N}(x | \mu_0^j, \beta P_0^j)
16: end while
17: Set p_{NEW}(t_0, x_0) = p(t_0, x_0) + \sum_{j=1, \, w^j \ge w_{tol}}^{M} 0 \times \mathcal{N}(x | \mu_0^j, P_0^j)
18: return p_{NEW}(t_0, x_0)

Here \beta \le 1 is a coefficient that controls the decrease of the initial variance. If \alpha has decreased from the previous iteration, this means that the Gaussian components are getting closer to the loss function, and therefore we can decrease the variance of the initial distribution to finely tune the positions of the Gaussian components; otherwise \beta = 1. We continue to sample new means from the new pdf until \alpha < 1 or the maximum number of iterations has been reached. The entire algorithm is presented in Algorithm 1.

While not the scope of this paper, the above method can also be applied when measurements are available between the current time and the decision time. The progressive selection algorithm would be applied every time a measurement has been assimilated and the a posteriori pdf has been found.
The drawback of this procedure is that the number of Gaussian components will increase linearly with the number of measurements. Better ways to deal with the measurement updates are left as future work.

In the case of multiple loss functions, the algorithm is run once for each one of the loss functions, creating sets of initial Gaussian components sensitive to their loss function.

5 Numerical Results

To illustrate the concept of incorporating contextual information into the uncertainty propagation algorithm, we consider the following continuous-time dynamical system with uncertain initial condition, given by (36):

\dot{x} = \sin(x) + \Gamma(t), \quad Q = 1   (36)
x_0 \sim \mathcal{N}(-0.3, 0.3^2)

The state-space region of interest is depicted by the following loss function, and the decision time is t_d = 8 sec:

L(x) = \mathcal{N}(x | \pi/2, 0.1^2)   (37)

First we compute an accurate numerical solution based on the FPKE, and this stands as the true probability density function. The performance measures for this method are labeled TRUTH. The evolution of the pdf using this method can be seen in Fig. 2a.

Three other approximations for the pdf are provided, including the method presented in this paper. The first approximation propagates the initial uncertainty using the first-order Taylor expansion, Eq. (11) and Eq. (12), also known as the Extended Kalman Filter time update equations, and is labeled EKF. The evolution of the pdf for this method is presented in Fig. 2b.

For the next approximation method, we add another 5 Gaussian components to the initial one, creating this way a Gaussian mixture with 6 components. The means of the new components are just the result of back-propagation (from t_d = 8 sec to t_c = 0 sec) of 5 equidistant samples taken in the 3-sigma bound of the loss function support. The variance of the new components is set to 10^{-10} and their initial weights are set to zero. The label used for this method is GS_BCK, and the evolution of the pdf is shown in Fig. 2c. While all the means of the new Gaussian components are positioned in the loss function support region, their variances are very small, making it difficult to see the probability density mass in that region.

Figure 2: The evolution of the forecast pdf and the sampling pdf. (a) TRUTH: numerical approximation of the FPKE; (b) EKF: first-order Taylor expansion approximation; (c) GS_BCK: back-propagated means; (d) GS_DEC: progressive selection of Gaussian components; (e) probability density function at t_k = 8 sec; (f) the evolution of the pdf used to sample the means of the Gaussian components.

We apply the method presented in this paper to generate at most 5 new Gaussian components to be added to the initial condition. Their means and variances are returned by the progressive selection algorithm, Alg. 1. The initial weights of the new Gaussian components have also been set to zero. The default value for the \beta coefficient is 0.9, and Gaussian components are included only if their weights are greater than w_tol = 10^{-3}. The label used for this method is GS_DEC, and its corresponding pdf is presented in Fig. 2d.

The evolution of the Gaussian components for the last two methods is also achieved using the first-order Taylor expansion, but it is interrupted every \Delta t = 0.
5 sec to adju st the weights o f d ifferent Gaussian com ponen ts using the o pti- mization in Eq.(1 4 ). The fo llowing perf ormanc e measure s have b een com - puted for the methods used in the experiment: ˆ L d = Z L ( x ) ˆ p ( t d , x d )d x d (38) ˆ R err = 1 L d L d − ˆ L d (39) I S D = Z p ( t d , x d ) − ˆ p ( t d , x d ) 2 d x d (40) W I S D = Z L ( x ) p ( t d , x d ) − ˆ p ( t d , x d ) 2 d x d (41) In Fig.2e it is plotted the forecast pdf at time t d for all the methods. Ou r metho d, GS DEC, is able to better e stimate the probab ility density mass in the region of interest. In T able I, we present the p erforma nce m easures afte r 500 Mon te Carlo run s. The expected loss given b y the GS DEC method is co nsistently better approx imated over all the Monte Carlo r uns th an the EKF and the GS BCK method. W e also are able to co nsistently give a better ap- proxim ation to the pdf overall and also in the region o f in- terest than the EKF method, which justify the u se of this method. Compared with th e GS BCK we d o a b etter jo b in av erage in ap proxim ating the p df which su ggests that th ere is a trad e o ff in selecting th e Gaussian co mpon ents regar ding their means and variances. In Fig. 2f it is plotted the evolution of the pdf u sed to sample th e me ans o f the n ew Gaussian co mpone nts for o ne particular run. The pdf used in the first iteration is our initial uncertainty and we see h ow it conv erges, as the numb er of iterations increa ses, to a particular region in th e state sp ace that is sensiti ve to the loss function at the decision time. 
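The four performance measures, Eqs. (38)-(41), can be evaluated on a grid by simple quadrature. The sketch below (our own function names; a plain Riemann sum stands in for whatever quadrature the paper used) computes them from sampled values of the true and approximate densities:

```python
import numpy as np

def performance_measures(x, p_true, p_hat, loss):
    """Grid-based evaluation of Eqs. (38)-(41) on a uniform 1-D grid.

    x      : uniform grid over the state space
    p_true : true density p(t_d, x) on the grid (e.g. the FPKE solution)
    p_hat  : approximate density on the grid
    loss   : loss-function values L(x) on the grid
    """
    dx = x[1] - x[0]                                  # uniform grid spacing
    L_d_true = np.sum(loss * p_true) * dx             # true expected loss
    L_d_hat = np.sum(loss * p_hat) * dx               # Eq. (38)
    R_err = abs(L_d_true - L_d_hat) / L_d_true        # Eq. (39), relative error
    ISD = np.sum((p_true - p_hat) ** 2) * dx          # Eq. (40)
    WISD = np.sum(loss * (p_true - p_hat) ** 2) * dx  # Eq. (41)
    return L_d_hat, R_err, ISD, WISD

# toy check: identical densities must give zero error in every measure
x = np.linspace(-10.0, 10.0, 2001)
p = np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)        # standard normal density
L = np.exp(-0.5 * ((x - np.pi / 2) / 0.1) ** 2) / (0.1 * np.sqrt(2 * np.pi))
L_hat, R_err, ISD, WISD = performance_measures(x, p, p, L)
```

WISD is simply the ISD weighted by the loss, so it penalizes density-approximation error only where the decision maker actually cares, which is why GS_DEC's advantage shows up most clearly in that column.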
Table 1: Performance measures - 500 Monte Carlo runs

                  L̂_d       R̂_err    ISD      WISD
  TRUTH           0.0332    N/A      N/A      N/A
  EKF             4.93E-09  1.0000   0.1840   0.0015
  GS_BCK          0.0001    0.9968   0.0536   0.0015
  GS_DEC (mean)   0.0256    0.2300   0.0470   0.0004

  GS_DEC: Percentile Table - 500 Observations

  Percent   L̂_d      R̂_err    ISD      WISD
  0.0%      0.0010   0.0151   0.0368   0.0002
  5.0%      0.0142   0.0230   0.0378   0.0003
  10.0%     0.0177   0.0271   0.0380   0.0003
  25.0%     0.0229   0.0566   0.0387   0.0003
  50.0%     0.0257   0.2270   0.0491   0.0003
  75.0%     0.0313   0.3090   0.0514   0.0004
  90.0%     0.0323   0.4670   0.0574   0.0006
  95.0%     0.0324   0.5710   0.0601   0.0007
  100.0%    0.0327   0.9700   0.0705   0.0014

6 Conclusion

An interaction level between the decision maker and the data assimilation module has been designed, such that we can incorporate contextual information held by the decision maker into the data assimilation process to better approximate the conditional probability density function.

The progressive selection algorithm is run once at the beginning of the simulation to supplement the initial uncertainty with new Gaussian components that are sensitive to the loss function at the decision time. The weights of all the Gaussian components are then updated during the propagation based on the Fokker-Planck equation. This way we obtain not only a better approximation of the probability density function in the region of interest but also a better approximation overall. The cost of this overall improvement is an increase in the number of Gaussian components. The principal benefit is not the modest increase in accuracy overall, but the significantly enhanced accuracy within the decision maker's region of interest.

Acknowledgment: This work was supported under Contract No. HM1582-08-1-0012 from ONR.
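Because the loss (37) is itself Gaussian and the forecast density is a Gaussian mixture, the expected loss in Eq. (38) admits a closed form via the standard Gaussian product identity ∫ N(x | μ_L, σ_L²) N(x | m_i, P_i) dx = N(m_i | μ_L, σ_L² + P_i). A minimal sketch (helper names are our own, not from the paper):

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Scalar Gaussian density N(x | mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def mixture_expected_loss(w, m, P, mu_L=np.pi / 2, var_L=0.1 ** 2):
    """Closed-form E[L(x)] for a Gaussian mixture under the Gaussian loss (37):
    sum_i w_i * N(m_i | mu_L, var_L + P_i)."""
    w, m, P = map(np.asarray, (w, m, P))
    return float(np.sum(w * normal_pdf(m, mu_L, var_L + P)))

# a single component centered exactly on the loss scores the maximum possible
el_centered = mixture_expected_loss([1.0], [np.pi / 2], [0.1 ** 2])
# a component near the initial mean, far from the region of interest, scores ~0
el_far = mixture_expected_loss([1.0], [-0.3], [0.3 ** 2])
```

This closed form makes clear why components with means inside the loss support but vanishing weight (as in GS_BCK) contribute almost nothing to the estimated expected loss until the weight-update step reassigns mass to them.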