Data Driven Computing by the Morphing Fast Fourier Transform Ensemble Kalman Filter in Epidemic Spread Simulations

The FFT EnKF data assimilation method is proposed and applied to a stochastic cell simulation of an epidemic, based on the S-I-R spread model. The FFT EnKF combines spatial statistics and ensemble filtering methodologies into a localized and computat…

Authors: Jan M, el, Jonathan D. Beezley

Data Driv en Computing b y the Morphing F ast F ourier T ransform Ensem ble Kalman Filter in Epidemic Spread Sim ulations Jan Mandel Jonathan D. Beezley Loren Cobb Ashok Krishnam urthy Departmen t of Mathematical and Statistical Sciences, Univ ersity of Colorado Denv er, Den ver, CO 80217-3364, USA Abstract The FFT EnKF data assimilation method is prop osed and applied to a stochastic cell simulation of an epidemic, based on the S-I-R spread mo del. The FFT EnKF com bines spatial statistics and ensemble filtering metho dologies into a lo calized and computationally inexp ensiv e version of EnKF with a very small ensemble, and it is further combined with the morphing EnKF to assimilate c hanges in the p osition of the epidemic. 1 In tro duction Starting a mo del from initial conditions and then waiting for the result is rarely satisfactory . The mo del is generally incorrect, data is burdened with errors, and new data comes in that needs to b e accounted for. This is a w ell-known problem in weather forecasting, and tec hniques to incorp orate new data b y sequential statistical estimation are known as data assimilation [1]. The ensemble Kalman filter (EnKF) [2] is a p opular data assimilation metho d, whic h is easy to implement without any c hange in the mo del. The EnKF evolv es an ensem ble of simulations, and the mo del only needs to b e capable of exp orting its state and restarting from the state mo dified by the EnKF. Ho wev er, the ensemble size required can be large (easily in the hundreds), the amoun t of computations in the EnKF can b e significan t, sp ecial lo calization tec hniques need to b e employ ed to suppress spurious long- range correlations in the ensem ble co v ariance matrix, and the EnKF do es not w ork w ell for problems with sharp coherent features, suc h as the tra velling w av es found in epidemics and wildfires. W e propose a v arian t of EnKF based on the F ast F ourier transform (FFT), whic h reduces significan tly the amoun t of computations required by the EnKF, as well as the ensem ble size. The use of FFT is inspired b y spatial statistic: FFT EnKF assumes that the state 1 appro ximately a stationary random field, that is, the co v ariance b etw een t wo p oin ts is mainly a function of their distance vector. Then the multiplication of the cov ariance matrix and a v ector is a conv olution. In addition, the morphing transform [3] is used here so that changes of the state b oth in p osition and in amplitude are p ossible. The FFT EnKF with morphing is illustrated here for trac king a sim ulated epidemic w av e. The use of data assimilation techniques can increase the accuracy and reliability of epidemic trac king by using the data as so on as they are a v ailable, and some applications of data assimilation in epidemiology already exist [4, 5]. The FFT EnKF with morphing has the p oten tial to reduce complicated simulations and accurate real-time use of data to a laptop or a smartphone in the field. F or FFT EnKF in a wildfire sim ulation, see [6]. The F ourier domain Kalman filter (FDKF) [7] consists of the Kalman filter used in each F ourier mo de separately . The cov ariance of a stationary random field can b e estimated from a single realization b y the cov ariogram [8], which can b e computed efficiently by the FFT [9]. W e prop ose to use the cov ariogram for an EnKF with an ensemble of one , whic h will b e further dev elop ed elsewhere. 2 FFT EnKF The EnKF approximates the probability distribution of the model state u b y an ensemble of sim ulations u 1 , . . . , u N . Eac h mem b er is adv anced b y the simulation in time indep endently . When new data d arriv es, it is giv en as data lik eliho o d d ∼ N ( H u, R ), where H is the observation op er ator and R is the data err or c ovarianc e matrix . No w the for e c ast ensemble [ u k ] is com bined with the data by the EnKF analysis [10] u a k = u k + C N H T  H C N H T + R  − 1  d + e k − H u f k  , k = 1 , . . . , N , (1) to yield the analysis ensemble [ u a k ]. Here, C N is an appro ximation of the cov ariance C of the model state, tak en to b e the co v ariance of the ensemble, and e k is sampled from N (0 , R ). The analysis ensemble is then adv anced b y the simulations in time again. In [11], it was pro ved that the ensemble con v erges for large N to a sample from the Kalman filtering distribution when all probability distributions are Gaussian. Of course, the EnKF is used for more general cases as well. When C N is the ensemble co v ariance, the EnKF formulation (1) do es not take adv an tage of any sp ecial structure of the mo del. This allows a simple and efficien t implemen tation [12], but large ensembles, often ov er 100, are needed [2]. In an application, v ariables in the state are r andom fields , and the cov ariance deca ys with spatial distance [8]. T ap ering is the multiplication of sample co v ariance term-by-term with a fixed deca y function that drops off with the distance. T ap ering improv es the accuracy of the approximate cov ariance for small ensem bles [13], but it makes the implementation of (1) more exp ensiv e: the sample co v ariance matrix can no longer be efficien tly represen ted as the pro duct of t wo muc h smaller dense matrices, but it needs to b e manipulated as a large, alb eit sparse, matrix. Random 2 fields in geostatistics are often assumed to be stationary , that is, the cov ariance betw een tw o p oin ts dep ends on their spatial distance vector only . The FFT EnKF discussed here uses a v ery small ensemble, but larger than one. W e explain the FFT EnKF in the 1D case; higher-dimensional cases are exactly the same. Consider first the case when the mo del state consists of one blo ck only . Denote b y u ( x i ), i = 1 , . . . , n the entry of vector u corresp onding to node x i . If the random field is stationary , the co v ariance matrix satisfies C ( x i , x j ) = c ( x i − x j ) for some cov ariance function c , and m ultiplication by C is the con v olution v ( x i ) = n X j =1 C ( x i , x j ) u ( x j ) = n X j =1 u ( x j ) c ( x i − x j ) , i = 1 , . . . , n. After FFT, con v olution b ecomes en try-by-en try m ultiplication of v ectors, that is, m ultipli- cation b y a diagonal matrix. W e assume that the random field is approximately stationary , so we neglect the off- diagonal terms of the cov ariance matrix in the frequency domain, which leads to the the follo wing FFT EnKF metho d. First apply FFT to eac h member, b u k = F u k . Next, approx- imate the forecast cov ariance matrix in the frequency domain by the diagonal matrix with the diagonal en tries giv en b y b c i = 1 N − 1 N X k =1    b u ik − b u i    2 , where b u i = 1 N N X k =1 b u ik . (2) Then define approximate cov ariance matrix C N b y term-by-term multiplication · in the F ourier domain u = C N v ⇐ ⇒ b u = F u, b v = b c • b u, v = F − 1 b v, ( b c • b u ) i = b c i b u i . When H = I and R = r I , the ev aluation of (1) reduces to b u a k = b u k + b c • ( b c + r ) − 1 •  b d + b e k − b u f k  . (3) In general, the state has more than one v ariable, and u , C , and H ha ve the block form u =    u (1) . . . u ( n )    , C =    C (11) · · · C (1 M ) . . . . . . . . . C ( M 1) · · · C ( M M )    , H =  H (1) · · · H ( M )  . (4) Here, the first v ariable is observ ed, so H (1) = I , H (2) = 0,. . . , H ( M ) = 0, and (1) b ecomes u ( j ) ,a k = u ( j ) k + C ( j 1) N  C (11) N + R  − 1  d + e k − u (1) k  , j = 1 , . . . , M , (5) and in the frequency domain b u ( j ) ,a k = b u ( j ) k + b c ( j 1) •  b c (11) + r  − 1 •  b d + b e k − b u k  . (6) 3 The cross-cov ariance b et ween field j and field 1 is approximated b y neglecting the off-diagonal terms of the sample cov ariance in the frequency domain as w ell, b c ( j 1) i = 1 N − 1 N X k =1  b u ( j ) ik − b u ( j ) i   b u (1) ik − b u (1) i  , where b u ( ` ) i = 1 N N X k =1 b u ( ` ) ik , ` = 1 , j. (7) In the computations reported here, w e hav e used the real sine transform, so all n umbers in (7) are real. Also, the use of the sine transform naturally imp oses no change of the state on the b oundary . 3 Morphing EnKF Giv en an initial state u , the initial ensemble in the morphing EnKF [3, 12] is giv en by u ( i ) k =  u ( i ) N +1 + r ( i ) k  ◦ ( I + T k ) , k = 1 , . . . , N , (8) with an additional member u N +1 = u , called the r efer enc e memb er . In (8), r ( i ) k are random smo oth functions on Ω, T k are random smooth mappings T k : Ω → Ω, and ◦ denotes comp osition. Th us, the initial ensemble v aries b oth in amplitude and in p osition, and the c hange p osition is the same in all blo c ks. The random smo oth functions and mapping are generated by FFT as F ourier series with random co efficients with zero mean and v ariance that deca ys quic kly with frequency . The data d is an observ ation of u (1) . The first blo c ks of all members u 1 , . . . , u N and d are then registered against the first blo ck of u N +1 as u (1) k ≈ u (1) N +1 ◦ ( I + T k ) , T k ≈ 0 , ∇ T k ≈ 0 , k = 0 , . . . , N , u (1) 0 = d and T k : Ω → Ω, k = 0 , . . . , N are called r e gistr ation mappings . The registration mapping is found b y m ultilevel optimization. The morphing tr ansform maps eac h ensemble mem b er u k in to the extended state v ector, the morphing r epr esentation, u k 7→ e u k = M u N +1 ( u k ) =  T k , r (1) k , . . . , r ( M ) k  , (9) where r ( j ) k = u ( j ) k ◦ ( I + T k ) − 1 − u ( j ) N +1 , k = 0 , . . . , N , are r e gistr ation r esiduals . Likewise, the extended data vector is defined by d 7→ e d =  T 0 , r (1) 0  and the the observ ation op erator is  T , r (1) , . . . , r ( M )  7→  T , r (1)  . W e then apply the FFT EnKF metho d (6) is applied to the transformed ensemble e u 1 , . . . , e u N . The co v ariance C (11) in (5) consists of three diagonal matrices and w e neglect the off-diagonal blo cks, so the fast form ula (6) can b e used. The analysis ensem ble u 1 , . . . , u N +1 is obtained b y the inverse morphing tr ansform u a, ( i ) k = M − 1 u N +1 ( e u a k ) =  u ( i ) N +1 + r a, ( i ) k  ◦ ( I + T a k ) , k = 1 , . . . , N + 1 , (10) 4 where the new transformed reference member is given b y e u a N +1 = 1 N N X k =1 e u a k . (11) 4 Epidemic mo del The epidemic mo del that we used for this study is a spatial version of the common S-I-R dynamic epidemic mo del. A p erson is susc eptible or infe ctious in this context if he or she can contract or transmit the disease, resp ectively . The r emove d state includes those who ha ve either died, ha ve b een quarantined, or ha ve recov ered from the disease and b ecome imm une. The state v ariables are the susceptible ( S ), the infectious ( I ), and the remov ed ( R ) p opulation densities.The core ideas for this mo del date back to the 1957 spatial formulation b y Bailey [14], but the sp ecific version that we hav e employ ed here is due to Hopp enstaedt [15, p. 64]. The p opulation is considered to b e disp ersed o v er a planar domain Ω ⊂ R 2 , and it is lab elled according to its p osition with resp ect to the spatial co ordinates x and y . The (deterministic) ev olution of the state ( S ( t ) , I ( t ) , R ( t )) is giv en b y ∂ S ( x,y ,t ) ∂ t = − S ( x, y , t ) R R Ω w ( x, y , u, v ) I ( u, v , t ) dudv , ∂ I ( x,y,t ) ∂ t = S ( x, y , t ) R R Ω w ( x, y , u, v ) I ( u, v , t ) dudv − q ( x, y , t ) I ( x, y , t ) , ∂ R ( x,y ,t ) ∂ t = q i ( x, y , t ) I ( x, y , t ) .          (12) The function q ( x, y , t ) gives the rate of remo v al of infectiv es due to death, quaran tine, or recov ery . The weigh t function w ( x, y , u, v ) measures the influence of infectives at spatial p osition ( u, v ) on the exp osure of susceptibles at p osition ( x, y ); in this simulation we used the function w ( x, y , u, v ) = α exp  − (( x − u ) 2 + ( y − v ) 2 ) 1 / 2 /λ  , whic h expresses the idea that the influence of nearby infectives decays as an exp onen tial function of Euclidean distance, with constan t λ , c haracteristic of the distance at which the disease spreads. More mobile so cieties will hav e larger v alues of λ . The parameter α measures the infectiousness of the disease. A stochastic cell mo del is created b y treating the quan tities on the righ t-hand-side of (12) as the intensities of a P oisson pro cess and b y piecewise constan t integration ov er the cells. The domain Ω is decomp osed into nonov erlapping cells Ω i with cen ters ( x i , y i ) and areas A (Ω i ), i = 1 , . . . , K . The state in the cell Ω i is the random elemen t ( S i , I i , R i ), adv anced in time o ver the in terv al [ t, t + ∆ t ] by S i ( t + ∆ t ) = S i ( t ) − ∆ S i , I i ( t + ∆ t ) = I i ( t ) + ∆ S i − ∆ R i , R i ( t + ∆ t ) = R i ( t ) + ∆ R i , where the random incremen ts ∆ S i and ∆ R i are sampled from ∆ S i ∼ P ois  S i ( t ) P K j =1 w ( x i , y i , x j , y j ) I j ( t ) A (Ω i ) ∆ t  , (13) ∆ R i ∼ P ois ( q i ( t ) I i ( t ) A (Ω i ) ∆ t ) , 5 and q i ( t ) is the given remov al rate in the cell Ω i . The summation in (13) is done only ov er the cells Ω j near Ω i ; for far a wa y cells, the weigh ts w ( x i , y i , x j , y j ) are negligible. It is not necessary to compute a P oisson-distributed transmission rate from eac h source cell to a given target cell, b ecause a finite sum of indep endent Poisson-distributed random v ariables, eac h with its own intensit y parameter, is itself Poisson-distributed with an in tensit y parameter equal to the sum of the individual intensities. 5 Computational results W e ha v e c hosen to mo del an epidemic disease that first emerges in Congo. The computational domain is a square p ortion of cen tral Africa. In Figure 1 (a), we see the epidemic w av e 120 mo del time steps after the emergence of the disease. The b eha vior of the mo del is suc h that an y spurious infection will tend to grow in to a secondary infection wa v e. This is problematic for data assimilation b ecause the o ccurrence of spurious features is virtually guaran teed. W e attempt to reduce the o ccurrence and magnitude of these features using the morphing transformation and FFT EnKF; ho w ever, some amount of residual artifacts will remain. W e ha ve found that by pro cessing the mo del state in the follo wing manner, w e can further reduce these artifacts. W e b egin b y scaling the absolute quan tities contained in the mo del v ariables to a p ercen tage of the lo cal p opulation b efore p erforming the data assimilation. After data assimilation, w e truncate the v ariables to the range [0 , 1], and w e apply a threshold so that any infection rate b elow 1% is set to 0. Finally , we rescale the output in absolute units ensuring that the num b er of p eople at eac h grid cell is preserv ed. W e ha ve applied the FFT EnKF to the epidemic mo del describ ed in Section 4 with an ensem ble of size 5. Eac h ensem ble simulation was started with the same initial conditions, but with differen t random seeds, and adv anced in time b y 100 model time units, then p erturb ed randomly to obtain the initial ensemble. The analysis ensemble and data were adv anced in time an additional 20 mo del time steps for further assimilation cycles. In total, 3 assimilation cycles w ere p erformed in this manner. W e hav e p erturb ed each member of the initial ensem ble randomly in space by applying (10) to the eac h v ariable of the morphing represen tation of the mo del. The mappings T k for this p erturbation were generated from a space of smo oth functions that are zero at the b oundary . While the residuals r k are customarily initialized to smo oth random fields as w ell, w e hav e c hosen to set r k = 0 to av oid spurious infections. W e instead multiply eac h field after the inv erse morphing transform b y 1 + s k , where s k is another smo oth random field. This ensures that an initial infection rate of 0 is unchanged b y the p erturbation. A part of a t ypical ensem ble with spatial as w ell as amplitude v ariabilit y is shown Figure 1 (b). The output of the observ ation function used in this example consists of the Infe cte d field of the model. In this case, the data is a spatial “image” of the num b er of infected persons in eac h grid cell. The data were generated syn thetically from a mo del simulation, which was initialized in the same manner as the ensemble. F our v ariants of the EnKF were then applied: the standard EnKF and FFT EnKF and morphing EnKF and morphing FFT EnKF. The same initial ensemble and the same data 6 (a) (b) Figure 1: (a) The n um b er of p eople per kilometer squared infected, susceptible, and re mo ved after 120 time steps in a simulation of an epidemic disease spreading through cen tral Africa. These images corresp ond to v ariables I , S , and R in Equation (12). (b) Num b er of p eople infected p er kilometer squared in three forecast ensemble mem b ers. 7 w ere used for each metho d. The deviation of the initial ensem ble and the mo del error w ere c hosen so that the analysis should b e ab out half w a y b et ween the forecast and the data. In the morphing v ariants, the data deviation in the amplitude was taken very large, so that the filter up dates essentially only the p osition. Ensemble of size 5 was used. The result in the first assimilation cycle for each metho d is sho wn in Figures 2 and 3. The first image in each column is the forecast mean. In the morphing v arian ts, the mean is taken o v er all ensem ble members in all fields of the morphing representation (9) and it pla ys the role of the comparison state for registration. Thus, in the morphing v arian ts, b oth the amplitude and the p osition of the infection wa v e in the ensem ble members are av eraged. The second image in eac h column is the data, which is a mo del tra jectory started from the same initial state for eac h metho d. Because the mo del is itself sto chastic, the data images are slightly differen t. The third image in eac h column is the analysis mean, whic h is tak en in the morphing represen tation (11) for tw o morphing filters, so that b oth the amplitude and the lo cation are av eraged. W e see that both standard EnKF and FFT EnKF filters cannot mo ve the state tow ards the data; a muc h larger ensemble would b e needed. The morphing EnKF do es mov e the state tow ards the data, but there are strong artifacts due to the p o or approximation of the co v ariance b y the co v ariance of the small ensem ble. Finally , the morphing FFT-EnKF is capable of mo ving the state tow ards the data b etter. 6 Conclusion W e ha v e in tro duced morphing FFT EnKF and presen ted preliminary results on data assim- ilation for an epidemic simulation. Morphing was essential to mov e the state to wards the data, but it resulted in artifacts for the small ensemble size used, y et small ensem ble size is imp ortan t to p erform simulatio ns with data assimilation on general computing devices instead of sup ercomputers. W e hav e observed that the estimation of the cov ariance matrix in the frequency domain results in b etter forecast cov ariance in the algorithm, whic h has the p oten tial to reduce the artifacts due to small ensemble size. 8 (a) (b) Figure 2: The n um b er of p eople infected p er kilometer squared in analysis cycle 1 using the standard EnKF and FFT EnKF, each with ensem ble size of 5. Both approac hes are unable to mo ve the lo cation of the infection in the simulation state. 9 (a) (b) Figure 3: The n um b er of p eople infected p er kilometer squared in analysis cycle 1 using the morphing EnKF and morphing FFT EnKF, each with ensem ble size of 5. Both approaches are able to mo v e the state spatially and p erform similarly . Ho wev er, EnKF suffers from stronger artifacts due to low accuracy and low rank of the ensem ble cov ariance than the morphing FFT EnKF. 10 7 Ac kno wledgemen ts This w ork was partially supported by NIH grant 1 R C1 LM01641-01 and NSF gran ts CNS- 0719641 and A TM-0835579. References [1] E. Kalna y , A tmospheric Mo deling, Data Assimilation and Predictabilit y , Cam bridge Univ ersity Press, 2003. [2] G. Evensen, Data assimilation: The ensem ble Kalman filter, 2nd Edition, Springer V erlag, 2009. [3] J. D. Beezley , J. Mandel, Morphing ensem ble Kalman filters, T ellus 60A (2008) 131–140. [4] L. Bettencourt, R. Rib eiro, G. Chow ell, T. Lan t, C. Castillo-Chav ez, T ow ards real time epidemiology: data assimilation, mo deling and anomaly detection of health surveillance data streams, in: In telligence and Security Informatics: Biosurv eillance, V ol. 4506 of Lecture Notes in Computer Science, Springer, 2007, pp. 79–90. [5] C. Rho des, T. Hollingsw orth, V ariational data assimilation with epidemic models, Jour- nal of Theoretical Biology 258 (4) (2009) 591–602. [6] J. Mandel, J. D. Beezley , V. Y. Kondratenk o, F ast Fourier transform ensemble Kalman filter with application to a coupled atmosphere-wildland fire mo del, mS2010, submitted (2010). [7] E. Castrono v o, J. Harlim, A. J. Ma jda, Mathematical test criteria for filtering complex systems: plentiful observ ations, J. Comput. Ph ys. 227 (7) (2008) 3678–3714. [8] N. A. C. Cressie, Statistics for spatial data, John Wiley & Sons Inc., New Y ork, 1993. [9] D. Marcotte, F ast v ariogram computation with FFT, Computers & Geosciences 22 (10) (1996) 1175–1186. [10] G. Burgers, P . J. v an Leeuw en, G. Evensen, Analysis sc heme in the ensem ble Kalman filter, Mon thly W eather Review 126 (1998) 1719–1724. [11] J. Mandel, L. Cobb, J. D. Beezley , On the conv ergence of the ensem ble Kalman filter, arXiv:0901.2951, Applications of Mathematics, to app ear (January 2009). [12] J. Mandel, J. D. Beezley , J. L. Coen, M. Kim, Data assimilation for wildland fires: Ensem ble Kalman filters in coupled atmosphere-surface mo dels, IEEE Control Systems Magazine 29 (2009) 47–65. [13] R. F urrer, T. Bengtsson, Estimation of high-dimensional prior and p osterior co v ariance matrices in Kalman filter v arian ts, J. Multiv ariate Anal. 98 (2) (2007) 227–255. 11 [14] N. Bailey , Mathematical Theory of Epidemics, Griffin, 1957. [15] F. Hopp enstaedt, Mathematical Theories of P opulations, Demographics, and Epidemics, CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, 1975. 12

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment