Diffusion Maps Kalman Filter for a Class of Systems with Gradient Flows

1 Dif fusion Maps Kalman Filter for a Class of Systems with Gradient Flo ws T al Shnitzer , Student Member , IEEE , Ronen T almon, Member , IEEE , Jean-Jacques Slotine Abstract —In this paper , we propose a non-parametric method for state estimation of high-dimensional nonlinear stochastic dynamical systems, which evolv e according to gradient ﬂows with isotropic diffusion. W e combine diffusion maps, a manifold learning technique, with a linear Kalman ﬁlter and with concepts from K oopman operator theory . More concr etely , using diffusion maps, we construct data-dri ven virtual state coordinates, which linearize the system model. Based on these coordinates, we devise a data-driven framework for state estimation using the Kalman ﬁlter . W e demonstrate the strengths of our method with respect to both parametric and non-parametric algorithms in three tracking pr oblems. In particular , applying the approach to actual recordings of hippocampal neural activity in rodents directly yields a representation of the position of the animals. W e show that the pr oposed method outperforms competing non-parametric algorithms in the examined stochastic problem formulations. Additionally , we obtain results comparable to classical parametric algorithms, which, in contrast to our method, are equipped with model kno wledge. Index T erms —Intrinsic modeling, manifold learning, Kalman ﬁlter , diffusion maps, non-parametric ﬁltering. I . I N T R O D U C T I O N I N many real applications, the system model is not accessi- ble and some estimation of it is required. State estimation and characterization of stochastic, possibly nonlinear , dynam- ical systems are therefore widely studied problems. Tradition- ally , such problems are addressed using classical algorithms, which rely on predeﬁned parametric models. On the one hand, parametric models need to be sufﬁciently simple to allow accurate parameter estimation from measurements. On the other hand, too simple models often fail to accommodate the complexity of real systems. This facilitates the dev elopment of non-parametric methods [1]–[5]. Particularly in this paper, we take a non-parametric approach and propose a new method to deriv e the system model in a data-dri ven manner . T o demonstrate the primary idea, consider a classical nonlin- ear Simultaneous Localization and Mapping (SLAM) problem, where the goal is to track the 2D position x = ( x 1 , x 2 ) T of a moving object. T ypically , in the bearing-only version of this problem, the accessible system measurements are gi ven by the azimuth of an object, i.e. by φ = arc tan ( x 1 /x 2 ) . This nonlinear model, relating the state (position) of the system x to the measurements φ , poses a challenge for processing and analysis, since traditional linear methods cannot be applied. T al Shnitzer and Ronen T almon are with the Department of Electrical Engi- neering, T echnion – Israel Institute of T echnology , T echnion City , Haifa, Israel 32000 (e-mail: shnitzer@campus.technion.ac.il; ronen@ee.technion.ac.il). Jean-Jacques E. Slotine is with the Nonlinear Systems Laboratory , Mas- sachusetts Institute of T echnology , Cambridge, Massachusetts 02139, U.S.A. (e-mail: jjs@mit.edu). In [6], this nonlinearity is resolved by constructing virtual measurements, y , and measurement mapping, H , based on the knowledge of the system properties and model, which allow for the formulation of an equiv alent linear problem and the application of a linear time-varying Kalman ﬁlter [7]. Brieﬂy , since the system state is gi ven by x = (sin φ, cos φ ) , linearization is achie ved by deﬁning the virtual measurements y = H x + v , where v is noise and: H = h cos φ − sin φ i (1) such that H x = 0 . The resulting measurement equation is linear and can be constructed from the given measurements φ . In this work, analogously to the virtual measurements that linearize the system, we propose a computational method to construct the data-dri ven non-parametric counterparts of a virtual state, linearizing the system dynamics and measurement model. Howe ver , in contrast to [6], our construction of this virtual state is data-dri ven and does not require any explicit knowledge of the system properties or measurement modal- ity , e.g. the knowledge that the measurements represent the azimuth in a 2D tracking problem. One way to devise such a computational method, which has recently drawn signiﬁcant research attention, is to address the problem of data-driv en system analysis and state estimation from an operator-theoretic point of view . In this approach, the dynamical system is described by two dual operators, the Perron-Frobenius operator , which represents probability density e volution, and the K oopman Operator , which is deﬁned on some linear functional space of inﬁnite dimension and describes the time-e volution of observables [8], [9]. In the context of empirical dynamical systems analysis, the main challenge is to approximate these operators from the system measurements. Se veral methods for estimating the Koopman Operator have been proposed in recent years [9]–[14]. For example, the Extended Dynamic Mode De- composition (EDMD) [9], [10] approximates the Koopman eigenfunctions and modes based on two sets of points, related through system dynamics, and a set of dictionary elements. Howe ver , the optimal choice of the dictionary in EDMD depends on the data [10]. This frame work for estimating the K oopman eigenfunctions and modes was later employed in [3] as part of a non-parametric Kalman ﬁltering framew ork, where a linear Kalman ﬁlter is constructed based on the approximation of the Koopman Operator , along with its eigen- vectors, eigen values and K oopman modes using EDMD. The Kalman ﬁlter propagates the system in the space spanned by the K oopman eigen vectors and then the resulting estimates are 2 projected back to the state space. Other work on the analysis of nonlinear stochastic dynamical systems based on Koopman operator theory includes [11], [12], [15]. The work in [15] offers a formal deﬁnition and rigorous mathematical analysis for the generalization of the K oopman operator to stochastic dynamical systems. In addition, a new framework for approx- imating the eigenfunctions and eigen values of this stochastic K oopman operator is presented. A different approach is pro- posed in [11] and [12], where the authors characterize the long term behavior of a system (asymptotic dynamics) based on time-a verages of functions. In [11], in variant-measures are deﬁned and it is sho wn that these measures coincide with the eigenfunctions of the Koopman Operator and can be simply calculated by the Fourier transform. In [12], this framework is extended to address dynamical systems which are not measure- preserving using Laplace av erages. In both [11] and [12], sev eral trajectories of the system are needed for the analysis. Different related non-parametric frame works for state es- timation in stochastic dynamical systems based on geometry and manifold learning were presented in [5], [16]–[18]. In [5] and [16] a probabilistic approach is proposed, in which the problem is projected onto coordinates constructed by diffusion maps [19], a manifold learning technique. In these diffusion maps coordinates, the probability density function of the system state can be propagated in time without prior knowledge of the system dynamics, yet, the underlying system state is assumed to be accessible. The work in [17] proposes to construct an ensamble Kalman ﬁlter based on delay embed- ding coordinates (T akens embedding [20]), used for dynamics estimation. The propagation in time is estimated based on nearest neighbors of the current time-lag frame. In [18], a framew ork for estimating the eigenfunctions of the Koopman Operator generator based on diffusion maps is presented. This work discusses the relationship between diffusion maps and the K oopman operator and proposes to estimate the Koop- man generator eigenfunctions using the eigenfunctions of the Laplace-Beltrami operator approximated by diffusion maps. The relationship between diffusion maps and the Koopman Operator is further discussed in [8]. There, it is sho wn that EDMD can also provide data-driven dimensionality reduction. Moreov er , for systems in which the system state is described by a Markov process, the eigenfunctions of the backward Fokker -Planck operator can be approximated using EDMD, similarly to dif fusion maps. The main beneﬁt of such a manifold learning approach using EDMD is that both the dynamics and the geometry of the underlying state are taken into account. In this paper , we present a framework for state estimation of stochastic dynamical systems, with state dynamics of gradient ﬂows and isotropic diffusion, which re veals the system model with minimal prior assumptions, using dif fusion maps [19] and the Kalman ﬁlter . W e assume that we are giv en a set of noisy measurements from some unkno wn nonlinear function of a stochastic underlying state and sho w that a linear model de- scribing the system can be rev ealed, e ven for highly nonlinear systems. This is obtained in a completely data-dri ven manner, based on virtual state coordinates constructed by dif fusion maps and their inherent dynamics [21]. A Kalman ﬁlter is then formulated based on the recovered system model and uti- lized for non-parametric state estimation. By constructing the Kalman ﬁlter based on the recovered model, we incorporate system dynamics into diffusion maps, combining geometry and dynamics, as in [8], from a new manifold learning standpoint. W e further show that our method uncovers an operator describing the system dynamics, which is analogous to the Stochastic K oopman Operator [8], an extension of the K oopman Operator for stochastic systems. Moreo ver , our devised method is well suited for high-dimensional systems due to the nonlinear dimensionality reduction obtained by diffusion maps. The framework in this paper is tightly related to [22], in which a linear observer was constructed by exploiting similar properties of the diffusion maps coordinates and their dynam- ics. W e sho w that the Kalman ﬁlter framew ork introduced in the present work outperforms the observer framework both by reducing the amount of hyper -parameters and by signiﬁcantly improving the results in certain scenarios. W ith respect to previous work, our method encompasses sev eral key differences. First, it does not rely on accessibility to the state of the system (as assumed in [5], [16]). Second, it does not require predeﬁned dictionary elements (as required in [10]). Third, our method addresses stochastic dynamical systems in contrast to the deterministic systems considered in [3], [17]. Finally , it does not require a training set of samples with known states, in contrast to common non-parametric ﬁltering algorithms [1], [2]. The remainder of the paper is organized as follows. In Section II, the general problem setting is presented. In Section III, we ﬁrst overvie w the key-points of our method and then describe the deriv ation of the Diffusion Maps Kalman ﬁlter frame work in detail. In Section IV, we demonstrate the strengths of our method on two object tracking problems. W e compare our method with both parametric and non-parametric methods. Section V concludes the paper with a short summary . I I . P R O B L E M F O R M U L A T I O N Consider an ergodic stochastic dynamical system with some nonlinear generator S : M → M , where M is a compact Riemannian manifold of dimension d . The system is deﬁned by: ˙ θ t = S ( θ t , u t ) (2) z t = g ( θ t ) + v t (3) where θ t ∈ M is the system state, ˙ θ t is its time deriv ativ e, u t ∈ R d is the process noise, z t ∈ R m are the system measurements through some unkno wn nonlinear function g and v t ∈ R m is the measurement noise. The ev olution in time of such a dynamical system can be described by the Stochastic K oopman Operator [8], which is deﬁned by ( U st f ) ( θ t ) = E [ f ◦ T ( θ t , u t )] (4) where T is the ﬂow induced by S , ◦ is the composition operator , f : M → R are some observ ables from an inﬁnite dimensional vector space, closed under composition with T , and E denotes expectation. 3 One of the notable properties of the K oopman Operator, which has increased its usage in a line of recent work, is that it is linear in the space of observ ables, ev en for highly nonlinear dynamical systems: U st ( α 1 f 1 + α 2 f 2 ) = E [( α 1 f 1 + α 2 f 2 ) ◦ T ] = α 1 E [ f 1 ◦ T ] + α 2 E [ f 2 ◦ T ] = α 1 U st f 1 + α 2 U st f 2 . Howe ver , the tradeoff is that ev en for ﬁnite dimensional dynamical systems, the Koopman Operator acts on an inﬁnite dimensional space of observ ables. In this work, we focus on the case in which the dynamics equation (2) tak es the form of a Langevin equation: ˙ θ t = −∇ U ( θ t ) + r 2 β ˙ u t (5) where U ( θ t ) is a smooth and bounded potential, q 2 β is a constant dif fusion coef ﬁcient, u t is Bro wnian motion and ˙ u t is its time deri v ativ e. The goal in this work is to build a ne w coordinate sys- tem representing the hidden system state θ t and devise a ﬁltering framework in these new coordinates, giv en only the measurements z t , without any prior kno wledge on the system equations (2) and (3). I I I . D I FF U S I O N - B A S E D K A L M A N F I LT E R In this section we lay the foundation for our proposed framew ork and present the theoretical results which allo w for the construction of a data-driv en linear Kalman ﬁlter . A. Overview W e present a method based on diffusion maps that discovers a new coordinate system describing a model of the state of the system with linear drift in a completely data-driv en manner . Based on this linear drift, our method constructs a linear operator , analogous to the Stochastic Koopman Operator , from measurements, without prior model knowledge. By exploiting the linearity of this operator , we will formulate a linear Kalman ﬁlter framework, which allows for estimation of trajectories of the underlying system state based on noisy nonlinear measurements. Now we will brieﬂy ov ervie w the key points of our method, which is described in detail in the following subsections. Giv en noisy measurements z t ∈ R m , we use diffusion maps to represent the system state θ t by a new set of k coordinates, denoted by Φ t , where k < m . W e will sho w that in this new coordinate system, the dynamical system can be described by the following linear equations: ˙ Φ t = F Φ t + Q 1 / 2 t ˙ ω t (6) z t = H Φ t + R 1 / 2 t v t (7) where F is a linear operator describing the linear drift of the dynamics of the new coordinates Φ t , H is a linear lift operator from the ne w coordinates Φ t to the measurements z t , ˙ ω t is a standard normally distributed noise process, v t is measurement noise, and Q t and R t are matrices determining the covariance of the dri ving and measurement noise processes, respectively . W e will further sho w that the linear operators F and H , in (6) and (7) respectiv ely , can be constructed using diffusion maps in a data-driv en manner , providing a coarse approximation of the system. Y et, this construction can be further improv ed since diffusion maps ignore the inherent time-dependencies between consecutiv e samples. Therefore, we formulate a linear Kalman ﬁlter using the constructed coordinates Φ t and the recovered system operators F and H , thereby impro ving the state estimate by incorporating the dynamics into the diffusion maps coordinates. This leads to a data-driv en linear ﬁltering framework, which can be applied to nonlinear systems with an unknown model, rev ealing a new representation of the system b Φ t , which is tightly related to the system state θ t as will be described in Subsection III-B. In addition, our framework allows for the estimation of speciﬁc realizations of system trajectories based on measurements, in contrast to most existing work on the Stochastic K oopman Operator , which only represent the a verage time-ev olution. The remainder of this section is described as follo ws. In Subsection III-B and Subsection III-C, we reiterate the deriv ations presented in [22] for state and model recovery using dif fusion maps [19]. In Subsection III-D, we present our proposed Kalman ﬁlter framew ork. In Subsection III-E, we discuss the observability and detectability of the proposed framew ork and in Subsection III-F, we elaborate on the relation between our frame work and the Stochastic Koopman Operator . B. Recovering the State Giv en noiseless measurements, z t = g ( θ t ) , of the hidden system state, θ t , the follo wing k ernel is deﬁned k  ( s, t ) = exp  − d 2 ( z s , z t )  2  (8) where  > 0 is a k ernel scale, traditionally set to the median of the distances between the measurements, and d ( z s , z t ) is a distance function between z s and z t . In our case, we cal- culated this distance using a modiﬁed Mahalanobis distance, ﬁrst presented in [23]: d ( z s , z t ) = r 1 2 ( z s − z t )  C † s + C † t  ( z s − z t ) T (9) where C † s and C † t are the psuedo-in verse of the measurement cov ariance matrices at times s and t , respectiv ely . In [23], it was shown that this modiﬁed Mahalanobis distance between the measurements approximates the Euclidean distance be- tween the hidden states. The constructed k ernel is then normalized according to p  ( s, t ) = k  ( s, t ) d  ( s ) (10) where d  ( s ) = ´ k  ( s, t ) p eq ( θ t ) d θ t and p eq ( θ t ) = e − U ( θ t ) is the equilibrium density of the hidden state θ t . W e can then deﬁne the operator P  by ( P  f ) ( θ s ) = ˆ p  ( s, t ) f ( θ t ) p eq ( θ t ) d θ t where f is a real function deﬁned on the hidden state θ t . Based on P  we deﬁne L  = P  − I  (11) 4 where I is the identity operator . In [23] it was shown that for hidden states θ t with dynamics as in (5), the operator L  con verges to the backward F okker -Planck operator L deﬁned on the manifold M , as  → 0 : L q = 1 β ∆ q − ∇ q · ∇ U (12) where q denotes functions in a subspace of observables deﬁned on the system state, which describe averages of functions: q ( θ t ) = E [ h ( θ t ) | θ 0 = a 0 ] , where a 0 is some initial state, and h is smooth. The operator L  has a discrete set of decreasing eigen values, {− λ ( ` ) } ` ∈ N , 0 = − λ (0) > − λ (1) ≥ − λ (2) ≥ ... , and eigenfunctions, { φ ( ` ) } ` ∈ N [19]. In many stochastic systems, these eigenv alues exhibit a spectral gap, implying on only a few dominant eigen values (which are close to 0 ) [21]. In such systems, we can represent the hidden state using only the principal eigenfunctions, corresponding to the largest (non- trivial ` 6 = 0 ) eigen values. The diffusion maps coordinates are then obtained by calculating the eigen value decomposition of L  and using the k eigenfunctions corresponding to the k largest eigen values: z t 7→ Φ ( θ t ) = h φ (1) ( θ t ) , φ (2) ( θ t ) , ..., φ ( k ) ( θ t ) i (13) It was shown in [23] that when using a kernel based on the modiﬁed Mahalanobis distance (9), the eigenfunctions of L  con verge to the eigenfunctions of the backward Fokker -Planck operator L , deﬁned on the hidden state θ t , as  → 0 . The diffusion maps coordinates in (13) are tightly related to the system state. Consider the adjoint of L , which is known as the forward Fokk er-Planck operator . The forward Fokk er- Planck operator exhibits two important properties. First, it describes the ev olution of the transition probability density , p ( θ , t | θ 0 ) . Second, it is Hermitian, and therefore, its eigen- functions form a basis for the space of real functions of the system state (with the equilibrium density as a measure p eq ( θ ) = exp − U ( θ ) ). In addition, since the backward and forward operators are adjoint, their eigenfunctions can be normalized to be bi-orthonormal, and the eigenfunctions of the backward operator can be used as a bi-orthonormal basis. Due to this connection between the eigenfunctions of the backward and forward operators and since the eigenfunctions of the forward operator are tightly related to the system state, the eigenfunctions of the backward Fokker -Planck operator can be used to represent the system state as well. Therefore, from this point on, the diffusion maps coordinates, denoted by φ ( ` ) , which approximate these eigenfunctions, can be considered as a new set of coordinates representing the hidden system state. Note that the above results are obtained only when we have access to clean measurements z t = g ( θ t ) . W e will address this issue in Subsection III-D. C. Recovering the Model W e will no w show that the representation of the system state using the diffusion maps coordinates (13) can be used to construct a data-dri ven representation of the system model. The Dynamics of the Diffusion Maps Coor dinates: Consider the state dynamics in (5) measured through z t = g ( θ t ) . For such systems, based on Itô’ s Lemma, the eigenfunctions of the operator L  , obtained using the diffusion maps algorithm, ev olve according to a stochastic differential equation (SDE) of known form [21]: ˙ φ ( ` ) ( θ ) = − λ ( ` ) φ ( ` ) ( θ ) + √ 2 k∇ θ φ ( ` ) ( θ ) k 2 ˙ ω ( ` ) (14) where φ ( ` ) ( θ ) is the ` th eigenfunction of the operator L  , − λ ( ` ) is the corresponding eigen value, ω ( ` ) is the ` th coordi- nate of a multidimensional Bro wnian motion process, where each coordinate is some linear combination of the Brownian motion coordinates of the underlying process, u ( ` ) , and k · k 2 is the L 2 norm. Note that we omitted the time notation of the state ( θ t ) to indicate that the eigenfunctions are dependent only on the state. This SDE depicts that the obtained diffusion maps coor- dinates ev olve according to a linear drift, − λ ( ` ) φ ( ` ) ( θ ) , and an additional diffusion component. W e note that the linear drift component is fully known since we have, using diffusion maps, both the eigenfunctions and the corresponding eigen- values of the operator L  , which approximates the backw ard Fokker -Planck operator . This linear drift will be used as the new linear state dynamics. Constructing the Lift Function: As stated in Subsection III-B, the eigenfunctions of the backward F okker-Planck op- erator , approximated by diffusion maps, form a basis for the space of real functions deﬁned on the system state. Therefore, ev ery real function of the system state can be written as an expansion in these eigenfunctions. Speciﬁcally , we can represent the measurement function in the follo wing manner: z ( i ) t = g ( i ) ( θ t ) = ∞ X ` =0 α i,` φ ( ` ) ( θ t ) (15) where α i,` =  z ( i ) , φ ( ` )  p eq = ´ M z ( i ) t φ ( ` ) ( θ t ) p eq ( θ t ) d θ t . When the eigenv alues of the backward Fokker -Planck operator exhibit a spectral gap, most of the ener gy is captured by the k eigenfunctions, corresponding to the k largest eigen values. In such cases we can approximate the mapping in (15) using only these k eigenfunctions: z ( i ) t ≈ k X ` =0 α i,` φ ( ` ) ( θ t ) (16) W e can no w write expression (16) in matrix form: z t ≈ α Φ ( θ t ) (17) where Φ ( θ t ) is deﬁned in (13) and α is an m × k matrix in which ( α ) i,` = α i,` . These results imply that through the representation of the measurement function using the diffusion maps eigenfunc- tions, we obtain a linear mapping between the eigenfunctions, Φ ( θ t ) , and the system measurements, z t . Thus, we set the linear lift function simply to be α . T o conclude this subsection, we note that all of the theoret- ical deriv ations above are true for state equations of the form (5). Ho wev er , it is not essential that speciﬁcally the state will 5 exhibit such dynamics, but rather that the dynamics of some underlying system parameter are governed by the Lange vin equation. In such systems, the diffusion maps coordinates, constructed using the modiﬁed Mahalanobis distance (9), exhibit useful properties and can be used as a foundation for the proposed time-series ﬁltering framework, as described in the remainder of this paper . Since many natural phenomena are gov erned by dynamics that can be modeled using Langevin equation (5), our frame work is applicable to a wide range of problems. D. Diffusion Maps Kalman F ilter Based on the Euler-Maruyama method, the dynamics of the diffusion maps coordinates can be discretized to φ ( ` ) ( θ n +1 ) =  1 − λ ( ` ) ∆ t  φ ( ` ) ( θ n ) + √ 2 k∇ θ φ ( ` ) ( θ n ) k 2 ∆ ω ( ` ) n (18) where ∆ t is the time step, ∆ ω ( ` ) t is a normally distributed noise process and ∇ θ denotes the gradient according to the system state θ n . T o approximate the linear drift in (18) from discrete measurements, discrete approximations of φ ( ` ) ( θ n ) and λ ( ` ) are required. These approximations can be obtained using the discrete counterpart of diffusion maps as follo ws. Giv en discrete-time measurements { z n } N n =1 , the N × N kernel matrix is calculated similarly to (8) by: K ( i, j ) = exp  − d 2 ( z i , z j )  2  (19) where  > 0 is the kernel scale and d ( · , · ) is the modiﬁed Mahalanobis distance (9). The kernel matrix is then normalized to be row-stochastic P ( i, j ) = K ( i, j ) D ( i ) (20) where D ( i ) = P N j =1 K ( i, j ) . From the eigen value-decomposition of P we obtain a set of eigen vectors and eigen values, { ψ ( ` ) } k ` =0 , { µ ( ` ) } k ` =0 . It was shown in [19], that in this discrete setting, the dif fusion maps eigen vectors, { ψ ( ` ) } k ` =0 , approximate the continuous diffusion maps eigenfunctions, { φ ( ` ) } k ` =0 , discussed in the previous subsections. Moreo ver , it was shown in [23], that the eigen values of the discrete diffusion maps algorithm can be used to approximate the eigenv alues of the continuous operator according to − λ ( ` ) = 2  log µ ( ` ) . W e rewrite the state and measurement equations in the discrete setting using the diffusion maps coordinates: Ψ n +1 = ( I + Λ) Ψ n + Q 1 / 2 n ∆ ω n (21) z n = α Ψ n + R 1 / 2 n v n (22) where Ψ n = [ ψ (1) n , ψ (2) n , . . . , ψ ( k ) n ] , Λ is a diagonal ma- trix with {− λ ( ` ) ∆ t } k ` =1 as its diagonal elements, I is the identity matrix, Q 1 / 2 n is a matrix containing the coefﬁcients of the second term in (18), ∆ ω n is a vector of standard normally distributed noise processes, R n is the cov ariance of the measurement noise and α is the lift function from the diffusion maps coordinates to the measurements. For simplicity of notation, we denote Ψ n = Ψ ( θ n ) , omitting the dependence on θ n . Importantly , note that these system equations are approximately linear , e ven for highly nonlinear systems. Therefore, using the diffusion maps coordinates, we obtain a virtual system state which linearizes the problem, in an entirely data-dri ven manner, giv en only the measurements. In the discrete setting, the lift function is approximated by α i,` = P N n =1 z ( i ) n ψ ( ` ) ( θ n ) [22]. The term α Ψ n in (22) approximates the nonlinear mea- surement function g ( θ n ) . Therefore, the noise term in (22), denoted by R 1 / 2 n v n , is approximately the measurement noise from (3). Ho wever , since the dif fusion maps coordinates are constructed based on the noisy measurements, z n , the approximation of the measurement function may be inaccurate, leading to non-Gaussian components in R 1 / 2 n v n . In re gard to this issue, we note that in many cases the leading dif fusion maps eigenv ectors (corresponding to the largest eigen v alues) represent the slo w components in the data, and therefore, are only mildly affected by the noise. This property facilitates the use of our devised Kalman model in man y applications, as demonstrated in the experimental study in Section IV, ev en though the model is inaccurate due to the measurement noise. W e further note that the noise term in (22) can be used to represent deviations from the true model as described in [24]. Howe ver , signiﬁcant measurement noise may still lead to some deterioration in the performance of the proposed method, as demonstrated in Subsection IV -A. Due to the linearity of the derived system in (21) and (22) (except for elements in Q n as discussed in a subsequent paragraph), we can construct a linear Kalman ﬁlter based on the diffusion maps coordinates. Using the Kalman ﬁlter frame- work we harness the linear dynamics of the diffusion maps eigenfunctions to improve the approximation of the eigenv ec- tors of the discrete diffusion maps algorithm. Therefore, our framew ork incorporates the inherent time-dependencies of the system samples into a manifold learning technique. Our proposed framework is highly related to the method presented in [22], where a linear observer was constructed based on the system equations described in (21) and (22), by exploiting the linearity of the data-dri ven representation in a similar manner . The Kalman ﬁltering framework improv es the linear observer frame work in [22] in two main aspects. First, the constructed observer scheme is a deterministic frame work which discards the stochastic term of the dynamics, whereas the Kalman ﬁlter takes it naturally into account. Second, the Kalman ﬁlter provides an adaptive optimal update of the ﬁxed model parameters in [22]. W e empirically show in Section IV, that the adapti ve parameter update and the stochastic frame- work signiﬁcantly improv e the state recovery and robustness to noise, and outperforms the observ er framew ork as well as other competing methods. W e note that Q n is a nonlinear function of the state and induces dependencies between different time-samples of the system noise process, which does not ﬁt the Kalman ﬁlter framew ork. Howe ver , in man y applications, the leading diffu- sion maps eigen vectors, which are used as a low-dimensional representation of the system, are slowly varying functions of 6 the system state. Therefore, the gradient of ψ ( ` ) according to θ n in (18) may be sufﬁciently small (or approximately constant), allowing for the proper use of the Kalman ﬁlter . These properties are demonstrated in simulations in Section IV. An alternativ e approach to address the dependence of the driving noise on the state would be using a particle ﬁlter instead of the Kalman ﬁlter . P article ﬁlters are designed to support a wider range of driving noise distrib utions [25], [26]. Howev er, our empirical study showed that the Kalman ﬁlter signiﬁcantly outperformed the particle ﬁlter , deﬁned in a corresponding manner using the derived model, since the particle ﬁlter is sensitive to errors in the estimation of the gradient of ψ ( ` ) . Improving our methods by appropriately approximating the driving noise remains a subject for future work. The implemented Kalman ﬁlter frame work is described as follows: b Ψ n = F n b Ψ n − 1 + κ n  z n − H n F n b Ψ n − 1  P n = ( I − κ n H n )  F n P n − 1 F T n + Q n  (23) κ n =  F n P n − 1 F T n + Q n  H T n  H n F n P n − 1 F T n H T n + H n Q n H T n + R n  − 1 where b Ψ n is the state estimate at time n , z n is the mea- surement at time n , I is the identity matrix, F n = I + Λ represents the dyn amics of the state, H n = α is the lift function between the measurements and the state calculated in a data driven manner as described after equation (22), Q n is the co variance matrix of the state driving noise, and R n is the co variance matrix of the measurement noise. W e refer to the proposed Kalman ﬁlter framework as the Dif fusion Maps Kalman (DMK). In the e xperiments presented in Section IV, the covariance matrices, R n and Q n , were estimated from the data according to R n ( k , k ) = v ar  z ( k )  and Q n ( k , k ) = v ar  λ ( k ) ψ ( k )  , respectiv ely , where ψ ( k ) is the k th dif fusion maps eigen vector corresponding to eigenv alue µ ( k ) and − λ ( k ) = 2 / log µ ( k ) . This empirical choice led to good results in all applications. Howe ver , it is possibly an underestimation of the cov ariance and could be further improved. For example the adaptiv e estimation for R n and Q n described in [27] could be used for unknown system dynamics and measurement function. In addition, this issue will be addressed in future work, where a method for properly estimating the variance of the eigenfunctions in (14) will be devised. E. Observability and Detectability of the New System Equa- tions In this subsection, we discuss the conditions under which the proposed discrete linear system in (21) and (22) is observ- able and detectable. T o address the observability and detectability conditions, we ﬁrst recall that in the proposed system, the state transition matrix is giv en by F = I + Λ , which is a full rank diagonal matrix, and the observation matrix is given by H = α , where α is deﬁned in Subsection III-C, after equation (15). Pr oposition 1: The system in (21) and (22) is observ able if ∀ ` = 1 , . . . , k , ∃ p = 1 , . . . , m such that D z ( p ) , ψ ( ` ) E 6 = 0 , where D z ( p ) , ψ ( ` ) E = P N n =1 z ( p ) n ψ ( ` ) ( θ n ) . In other words, for observability it is required that the k diffusion maps eigen vectors, used in the construction of the system, are not orthogonal to all measurement coordinates. Pr oof: In the discrete and linear Kalman ﬁlter setting, a system is observable if there are no vectors v ( ` ) 6 = 0 such that F v ( ` ) = γ ( ` ) v ( ` ) and H v ( ` ) = 0 , where v ( ` ) and γ ( ` ) denote the ` th eigen vector and corresponding eigenv alue of F , respectiv ely [28]. Since F is a diagonal matrix in (21), its eigenv ectors, v ( ` ) , contain 1 at index ` and zeros else where. Therefore, H v ( ` ) is the ` th column of H , which based on Subsection III-C, corresponds to the follo wing vector: H v ( ` ) =      D z (1) , ψ ( ` ) E . . . D z ( m ) , ψ ( ` ) E      (24) where z ( i ) denotes the i th measurement coordinate, ψ ( ` ) , ` = 1 , . . . , k , is the ` th diffusion maps eigen vector and h· , ·i denotes the inner product between two vectors. From this deriv ation, for observ ability we require that H v ( ` ) 6 = 0 , ∀ ` = 1 , . . . , k , i.e. that the diffusion maps coordinates are not orthogonal to all measurement coordinates. Pr oposition 2: If ∃ ` = 1 , . . . , k , such that D z ( p ) , ψ ( ` ) E = 0 , ∀ p = 1 , . . . , m , then the system is detectable if 1 − λ ( ` ) ∆ t < 0 , where − λ ( ` ) is the ` th eigenv alue of the continuous operator , approximated using dif fusion maps, ψ ( ` ) is the corresponding eigen vector and ∆ t denotes the time step. Pr oof: A discrete linear system is detectable, if there is no v ( ` ) 6 = 0 and γ ( ` ) such that F v ( ` ) = γ ( ` ) v ( ` ) , H v ( ` ) = 0 with γ ( ` ) + ( γ ( ` ) ) ∗ ≥ 0 , where () ∗ denotes the complex conjugate [28]. In the constructed system, the eigen values, γ ( ` ) , of F are real and equal to the diagonal elements of F , which take the form of 1 − λ ( ` ) ∆ t . Therefore, the constructed system is detectable if 1 − λ ( ` ) ∆ t < 0 , when D z ( p ) , ψ ( ` ) E = 0 , ∀ p = 1 , . . . , m , i.e. γ ( ` ) < 0 for ψ ( ` ) , for which H v ( ` ) = 0 . Note that by using only diffusion maps coordinates that are not orthogonal to all system measurements in the construction of the system, both propositions hold. Therefore, an informed choice of the diffusion maps coordinates leads to an observable and detectable system. According to common practice, we take the leading k diffusion maps eigen vectors, corresponding to the largest eigen values, since these coordinates typically represent the system and are only mildly affected by the noise. Moreov er , the choice of the ne w system state dimensionality , k , can be motiv ated by the two propositions, since we expect that eigen vectors which ar e orthogonal to the measurements, e.g. due to noise, will correspond to smaller eigen values, i.e. will have larger indexes ` . Empirically , following this procedure led to observable and detectable systems in the experimental study in Section IV. 7 F . K oopman Operator Our presented framew ork is tightly related to the K oopman Operator . Speciﬁcally , we show the analogy of the revealed dynamics in the present work to the Stochastic Koopman Operator [8], ( U st f ) ( θ n ) = E [ f ◦ T ( θ n , ω n )] . Gi ven mea- surements from some stochastic nonlinear system with state dynamics of the form of (5), we project the problem onto the eigen vector space obtained by diffusion maps. By taking these eigen vectors as observables we obtain a space in which the e volution of the observables is represented by a known linear operator:  ˜ U st ψ ( ` )  ( θ n ) = E h ψ ( ` ) ◦ T ( θ n , u n ) | Ψ n i = E "  1 − λ ( ` ) ∆ t  ψ ( ` ) ( θ n ) + √ 2 k∇ θ ψ ( ` ) ( θ n ) k 2 ∆ ω ( ` ) n | Ψ n # =  1 − λ ( ` ) ∆ t  ψ ( ` ) ( θ n ) (25) where ˜ U st is analogous to the Stochastic K oopman Operator . Note that in contrast to the standard Stochastic Koopman Operator , ˜ U st is conditioned on Ψ n . The use of diffusion maps for the approximation of the K oopman Operator is discussed in detail in [18]. In particular a Galerkin method for approximating the eigenfunctions of the K oopman generator using the diffusion maps eigenfunctions and eigen v alues is presented. Furthermore, it was shown that for ergodic systems with pure-point spectra, the eigenfunctions of the Koopman generator can be robustly estimated from ﬁnite data using diffusion maps. Importantly , our method is completely different than the method presented in [18], since we approximate the Stochastic K oopman Operator , whereas in [18] the generator is approximated. In our proposed frame work, we combine the constructed linear operator and observables, obtained by diffusion maps, with a Kalman ﬁlter . This leads to two main beneﬁts. First, instead of representing the a verage time e volution of the observables in stochastic systems, we obtain an estimation of individual realizations of trajectories from the measurements. In addition, the Kalman ﬁlter compensates for the noise and deviations from the measurements. Second, due to the use of diffusion maps, we obtain a data-driven dimensionality reduction and approximate the Koopman Operator based on a ﬁnite set of orthonormal functions, spanning the state space of the system [21]. A related work, combining the Koopman Operator and a Kalman ﬁlter is presented in [3]. There, the authors deﬁne the K oopman Observer Form (KOF) for noiseless systems and the K oopman Kalman Filter (KKF) for systems with measurement noise. They construct a set of linear update equations based on the eigen vectors and modes of the K oopman Operator which provides a linear ﬁltering framework for nonlinear systems, where the K oopman Operator of a giv en data-set is approximated using EDMD [10]. The EDMD algorithm requires a dictionary of basis functions which af fects the resulting estimations [10]. Con versely , in the proposed w ork, we obtain the linear dynamics and observ ables based on the data, from the diffusion maps algorithm without a pre- deﬁned dictionary . Another difference is the problem setting, which includes in the present work stochastic system dynamics rather than measurement noise only . W e compare our proposed framew ork to the one presented in [3] in Subsection IV -A. Our framework can also be partially related to the work presented in [4]. There, linear update equations are learned from the data for chaotic systems using concepts from K oop- man theory . In order to represent the chaotic dynamics using a ﬁnite approximation, a nonlinear forcing term representing the deviation from linearity is added. In our work, we rely on the dynamics of the diffusion maps eigenfunctions, which can be expressed as a sum a linear drift component and a nonlinear stochastic component. The stochastic component represents deviations from the simple linear dynamics and can be used for error analysis as well. In the proposed Kalman ﬁlter framew ork we consider the stochastic component as the system noise. I V . E X P E R I M E N TA L R E S U LT S In this section we present two simulated examples of object tracking and a real tracking application based on neuronal spiking acti vity . All three tracking problems are nonlinear with unknown system dynamics and measurement functions, where each example depicts a different measurement modality . W e compare our proposed Diffusion Maps Kalman (DMK) to sev eral competing algorithms, which are detailed in the fol- lowing. W e show that our DMK framew ork leads to improved state estimates compared with non-parametric algorithms . In addition, it obtains results which are comparable with parametric methods , which, in contrast to DMK, are provided with the system model. A. Nonlinear Object T racking W e ﬁrst present a model with Gaussian noise, where the location of an object in a 2-dimensional space is measured through its radius and azimuth angle. The underlying process, describing the Cartesian position of the object at each time point is given by the follo wing discrete time Langevin equa- tions: ∆ θ (1) n +1 = − 1 2  θ (1) n − 1  3 +  θ (1) n − 1  + √ 2 u (1) n ∆ θ (2) n +1 = − 1 2  θ (2) n − 6  3 +  θ (2) n − 6  + √ 2 u (2) n (26) where u (1) n and u (2) n denote standard Gaussian noises and the drift terms in these equations describe double-well potentials. An example of the resulting 2-dimensional process is pre- sented in Figure 1 The object location is measured in polar coordinates, through the azimuth and radius: φ n = arc tan θ (1) n θ (2) n ! r n = r  θ (1) n  2 +  θ (2) n  2 (27) 8 -2 -1 0 1 2 3 4 4 5 6 7 8 Fig. 1: Example of the underlying 2-dimensional process. and the system measurements are created by adding Gaussian noise, z n = h φ n + v ( φ ) n , r n + v ( r ) n i , where v ( φ ) n and v ( r ) n are Gaussian noise processes with variance σ 2 φ and σ 2 r , respec- tiv ely . W e created trajectories of 1000 samples with a time- step of ∆ t = 0 . 01 and different Signal-to-Noise (SNR) ratios, by varying σ 2 φ and σ 2 r . W e applied the diffusion maps algorithm to the mea- surements, z n , with  set as the median of the distances. The cov ariance matrices used in the modiﬁed Mahalanobis distance (9) were estimated similarly to previous work [29], by calculating the empirical cov ariance of ov erlapping win- dows of 30 time frames, centered at each measurement, i.e. C n = co v( z n − N : n + N − 1 ) where N = 15 . Since the covariance matrices are not necessarily full rank, their pseudoinv erse was calculated using singular value decomposition (SVD). W e then constructed the Dif fusion Maps Kalman (DMK) based on the ﬁrst two largest eigenv alues and corresponding eigen vectors obtained by dif fusion maps. The dimensionality of the diffu- sion maps coordinates was determined based on the existence of a signiﬁcant spectral gap after the second coordinate. The cov ariance matrices in the Kalman ﬁlter update equations, R n and Q n in (23), were estimated from the data, according to R n ( k , k ) = v ar  z ( k )  and Q n ( k , k ) = v ar  λ ( k ) ψ ( k )  , where − λ ( k ) = 2 / log µ ( k ) and µ ( k ) and ψ k are the k th diffusion maps eigen value and eigen vector , respectively . Using this setting, we ev aluated our DMK algorithm, which requires access only to the noisy measurements, z n , and compared it to a parametric algorithm, the particle ﬁlter (PF). For ev aluation purposes, the particle ﬁlter w as pro vided with the true system model , which is considered unknown in our setting. In addition, we compared our results to three non-parametric algorithms: Gaussian Process ﬁltering (GP) [1], the Kalman ﬁlter based algorithm described in [3] (KKF) and the observer framew ork presented in [22]. Note that there is a fundamental difference between our method and two of the competing non- parametric methods, GP and KKF . Both GP and KKF require a subset of data pairs, { θ n , z n } N n =1 , in their construction. In our e xperiments, we used a subset of N = 100 and N = 21 samples, covering the entire state space, for algorithms GP and KKF , respecti vely . Moreover , the KKF algorithm requires 0.4 0.6 0.8 1 SNR 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Normalized RMSE (log scale) 0.4 0.6 0.8 1 SNR 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 Normalized RMSE (log scale) Meas. DMK Observer PF KKF GP Fig. 2: A verage and standard deviation of the nRMSE of the clean measurement estimates. The nRMSE values were av eraged ov er 50 realizations of process and noise trajectories. a choice of a kernel function. Here, we used the same k ernel function as in [3]. In contrast, our DMK frame work can provide a ﬁltered version of the measurements and a ne w set of coordinates representing the system characteristics without any information on the underlying state or a speciﬁc choice of a kernel function. Howe ver , note that if an estimate of a speciﬁc state r epresentation is required, some alignment between the DMK coordinates and the underlying state may be needed. Here, we used the mapping deﬁned in (17), α , and ob- tained an estimate of the clean system measurement, g ( θ n ) . Therefore, the comparison between the different algorithms was performed in the measurement domain. Figure 2 presents plots of the normalized root mean square error (nRMSE) values (in log scale) of the clean measurement estimates, φ n and r n , obtained by the compared algorithms and the measurement error (denoted by Meas.) as an upper bound. The nRMSE values are presented as a function of the measurement noise level (SNR), where the a verage and standard deviation of the nRMSE were calculated over 50 different process realizations, for each noise lev el separately . W e note that the extended Kalman ﬁlter (EKF), with the true system model, was considered as well. Howe ver , the EKF led to results which were similar to the particle ﬁlter and was omitted for brevity therefor . Figure 2 depicts that the DMK outperforms all three non-parametric algorithms in all noise le vels. In addition, for high SNR, the DMK errors are close to the errors obtained by the parametric particle ﬁlter , which has access to the true system equations . Figure 3 further demonstrates this and presents an example for the clean measurement estimation obtained by DMK (in blue) and by the particle ﬁlter (in gray), compared with the true clean measurements (in dotted black) and the noisy measurements (gray ’x’), with SNR=1. Plot (a) presents coordinate φ n and plot (b) presents coordinate r n . This ﬁgure depicts that the DMK estimation closely follows the particle ﬁlter estimation. Note that for lo wer SNR values the DMK estimation de- 9 2 2.5 3 3.5 4 4.5 5 t [sec] -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 (a) 2 2.5 3 3.5 4 4.5 5 t [sec] 2 4 6 8 10 Noisy meas. Clean meas. PF estimation DMK estimation (b) Fig. 3: Example of a trajectory of the ﬁltered measurement coordinates using the DMK algorithm (in blue) compared with the PF estimation (in gray), the true clean measurements (in dotted black) and the noisy measurements (gray ’x’). teriorates in comparison with the particle ﬁlter . This is due to errors in the inferred model, caused by the high noise. As described in Subsection III-B and Subsection III-C, the new system model, which is deri ved based on dif fusion maps, is accurate only for the noiseless case. The noise affects the modiﬁed Mahalanobis distance and, as a result, the obtained diffusion maps coordinates and eigenv alues contain errors. Since the eigen values are used in the model dynamics, these errors af fect the estimation quality of our frame work. Ne ver - theless, DMK works quite well in noisy situations, especially in comparison with the non-parametric frameworks. In order to demonstrate the extent of the estimation de- terioration due to noise, we compared the DMK algorithm, which w as constructed based on the noisy measurements, and a modiﬁed DMK algorithm, where the dynamics (diffusion maps eigen values) were obtained by applying dif fusion maps to the clean measurements and calculating the corresponding “clean dynamics”. The resulting nRMSE for different SNR values are presented in Figure 4, where the average and standard deviation of the nRMSE were calculated based on 50 process and noise realizations for each SNR value. This ﬁgure presents the DMK based on the noisy measurements in blue, the DMK with the “clean dynamics” ( λ ) in cyan and the particle ﬁlter , as a baseline, in gray . As expected, for high SNR, both the noisy dynamics and the clean dynamics lead to a similar result, whereas for low SNR, the clean dynamics improv e the result. This indicates that our method is indeed af fected by the measurement noise. Howe ver , note that ev en with these model errors, our method still outperforms the competing non- 0.2 0.4 0.6 SNR 0.4 0.5 0.6 0.7 0.8 Normalized RMSE (log scale) 0.2 0.4 0.6 SNR 0.4 0.5 0.6 0.7 Normalized RMSE (log scale) DMK (clean ) DMK (noisy ) PF Fig. 4: A verage and standard deviation of the nRMSE of the clean measurement estimates, over 50 process and noise realizations. A comparison between the DMK algorithm con- structed based on the noisy measurements (in blue) the DMK with the “clean dynamics” (in cyan) and the PF (in gray). parametric algorithms, as depicted in Figure 2. T o complete the analysis of this example, we compare the con vergence rates of DMK based on the noisy measurements, DMK with the “clean dynamics” and the con ver gence rates of the parametric and non-parametric algorithms. Figure 5 and Figure 6 present the av erage asymptotic RMSE (aRMSE), i.e. r 1 M P M k =1    ˆ z ( k ) n − ζ ( k ) n    2 2 , where M is the number of process and noise realizations, M = 50 in both ﬁgures, ˆ z ( k ) n denotes an estimation of the measurements (ﬁltered) and ζ ( k ) n denotes the true clean measurements at time n in realization k . Figure 5 presents a comparison between the DMK algorithm based on the noisy measurements (in blue), the DMK algorithm with the “clean dynamics” (in cyan) and the particle ﬁlter (in gray), for two SNR values. Plot (a) presents the aRMSE v alues for measurement coordinate φ n and plot (b) presents the aRMSE values for measurement coordinate r n . This ﬁgure depicts that the con vergence rate of DMK is not affected by the noise. In addition, the con ver gence rate of DMK is either comparable (in plot (a)) or faster (in plot (b)) compared with the con ver gence rate of the particle ﬁlter , for both SNR v alues. Figure 6 presents a comparison of the asymptotic RMSE values of the cleaned measurements obtained by the DMK algorithm (in blue), by the two competing non-parametric algorithms, GP and KKF (in light gray and in black, respec- tiv ely) and by the particle ﬁlter (in dark gray), for two different SNR values. Plots (a) and (c) were created using an SNR of 0 . 67 and plots (b) and (d) were created using an SNR of 0 . 18 . This ﬁgure depicts that the con ver gence rates of all the non-parametric algorithms are similar in both measurement coordinates and for both SNR values. B. Non-Gaussian Nonlinear Object T rac king W e no w present an example which is based on the setting giv en in [22], [29]. In this setting, the location of a radi- ating object moving on a 3D sphere is estimated based on 10 0 10 20 30 40 50 60 0.1 0.2 0.3 0.4 0.5 0.6 DMK (clean ),SNR =0.18 DMK (clean ),SNR =0.67 DMK (noisy ),SNR =0.18 DMK (noisy ),SNR =0.67 PF,SNR =0.18 PF,SNR =0.67 (a) 0 10 20 30 40 50 60 1 2 3 4 5 (b) Fig. 5: Asymptotic RMSE, averaged ov er 50 realizations, of the measurements estimation using the DMK algorithm constructed based on the noisy measurements, colored in blue, the DMK algorithm with the “clean dynamics”, colored in cyan, and the particle ﬁlter , colored in gray . measurements from three sensors, s (1) , s (2) , s (3) , which are modeled as “Geiger Counters”. The movement of the object is deﬁned by two underlying Langevin processes, describing the elev ation and azimuth angles as follo ws ∆ θ (1) n +1 =  π 2 · c − c · θ (1) n  + u (1) n (28) ∆ θ (2) n +1 =  π 5 · c − c · θ (2) n  + u (2) n (29) where b is the diffusion coefﬁcient and c is the drift rate parameter . In our experiments, b was set to 0 . 01 and c was varied between 0 . 1 and 1 to simulate dif ferent trajectory properties. The 3D location of the object at the n th time step is computed by: x (1) n = cos  θ (2) n  sin  θ (1) n  x (2) n = sin  θ (2) n  sin  θ (1) n  (30) x (3) n = cos  θ (1) n  W e mark the 3D position of the object by x n = h x (1) n , x (2) n , x (3) n i . The system measurements are gi ven by three Poisson pro- cesses, with a rate parameter that is based on the 3D location of the object. y ( j ) n ∼ P ois  r ( j ) n  j = 1 , 2 , 3 (31) where r ( j ) n = exp  −   s ( j ) − x n    . Finally , a Poisson noise process with a ﬁxed rate parameter, v ( j ) n ∼ P ois ( λ v ) , is added to each sensor z ( j ) n = y ( j ) n + v ( j ) n (32) where z ( j ) n are the accessible system measurements. Note that the presented setting is non-Gaussian and there- fore the Kalman ﬁlter assumptions are not held. Howe ver , we sho w that DMK still pro vides good results, due to the incoporation of time dependencies, and is either better or comparable to the observer frame work presented in [22]. In order to obtain an estimated representation of the sys- tem state (the azimuth and elev ation angles) from the noisy measurements, { z n } N n =1 , we apply DMK. W e simulated 300 , 000 time samples of the two underlying angles θ (1) n , θ (2) n with ∆ t = 0 . 5 and constructed the measure- ments according to (30) and (32). After obtaining the system measurements z n , we ﬁrst performed a pre-processing stage, similarly to [22], [29]. This includes constructing histograms for overlapping frames of 60 time-samples of z n and then calculating the modiﬁed Mahalanobis distance (9) between pairs of histograms. The cov ariance matrices for the mod- iﬁed Mahalanobis distance between the measurements were calculated similarly to Subsection IV -A, based on the empir- ical cov ariance of overlapping windows of 20 time frames, centered at each measurement. At this point we are left with 300 , 000 / 60 = 5000 system measurements. The dif fusion maps algorithm, described in Subsection III-B, is applied to the measurements using the calculated Mahalanobis distance, with an empirical choice of  as the median of the Euclidean distances. W e obtain a set of eigen- vectors and eigen values representing intrinsic properties of the system. Howe ver , these eigenv ectors do not necessarily correspond to the true system state, θ n , and can represent some linear combination of the state coordinates [30]. Therefore, for e valuation purposes, we perform a linear regression on 3 eigen vectors, corresponding to the largest eigen values, and the true system state, based on 100 samples. Note that in this speciﬁc example, the linear regression provides a good representation of the underlying state using only a few dif- fusion maps coordinates. This is due to the choice of the underlying state equation and the lift function which places the data on a sphere. This system was speciﬁcally chosen in this application, since the measurements (spike trains) cannot be easily i nterpretable, in contrast to the example in Subsection IV -A. In Subsection IV -C, a more complicated system is presented, where real neuronal spiking activity is analyzed. For such data, the underlying state equation and system model are completely unknown, and indeed, there, a larger number of coordinates is required to obtain a good representation of the true underlying state (the animal position). Based on the resulting eigenv ectors and eigen values we construct the Kalman ﬁlter, described in Subsection III-D. The cov ariance matrices of the measurement noise, R n , and the state noise, Q n , were estimated from the data (according to the v ariance of the histogram measurements and the covariance of the obtained eigen vectors and eigen values) as described in Subsection IV -A. 11 0 10 20 30 40 50 60 0.1 0.2 0.3 0.4 0.5 0.6 DMK PF KKF GP (a) 0 10 20 30 40 50 60 0.1 0.2 0.3 0.4 0.5 0.6 DMK PF KKF GP (b) 0 10 20 30 40 50 60 1 2 3 4 5 (c) 0 10 20 30 40 50 60 1 2 3 4 5 (d) Fig. 6: Asymptotic RMSE, av eraged over 50 realizations, of the measurements estimation using the DMK (in blue), the PF (in dark gray), KKF , (in black) and GP (in light gray). The SNR was set to 0.67 in (a) and (c) and to 0.18 in (b) and (d). Plots (a) and (b) present the asymptotic RMSE v alues for measurement coordinate ψ n and plots (c) and (d) present the asymptotic RMSE values for measurement coordinate r n . W e compare our results to the observer framework, de- scribed in [22], with a choice of γ = 0 . 01 (which led to the best results in this case). In Figure 7, a comparison between the DMK, the observer framew ork and the diffusion maps coordinates (without additional analysis) is presented. This ﬁgure contains 6 identical scatter plots, each presenting the true underlying angles, θ (1) n , θ (2) n . Each plot is colored accord- ing to a different coordinate, plots (a) and (d) are colored according to the ﬁrst and second estimated coordinates of the observer framework, plots (b) and (e) are colored according to the DMK estimation and plots (c) and (f) are colored according to the diffusion maps coordinates (DM). The color gradients in Figure 7 depict that the DMK signiﬁcantly improves the estimation of the two underlying angles, compared with the diffusion maps coordinates and the observer frame work. Moreov er , the coordinates obtained by the observer framew ork suffer from inaccuracies at the boundaries of the data. This is visible for example, in plot (a), when θ (1) n < 0 . 5 and is due to the inaccuracy of the linear lift function at the boundaries. These inaccuracies are not apparent in the DMK coordinates which recover the true underlying angles more accurately ev en at the boundaries of the data. Figure 8 presents the correlation between the DMK coor - dinates (after linear regression based on 100 samples) and the true underlying states, θ (1) and θ (2) , colored in blue. The average and standard deviation over 50 realizations are presented. For comparison, we present the correlations of the observer coordinates, in green, and the diffusion maps coordinates, in black. These plots depict that e ven though the Kalman model is inaccurate in this example, the DMK obtains result which are either better or comparable to the observer , in different drift-diffusion rate ratios. C. Location T rac king based on Rat Hippocampal Neur onal Activity T o demonstrate our framew ork on real data, we consider a publicly av ailable data set [31]. This data set contains simultaneous recordings of hippocampal regions C A 1 and C A 3 or regions C A 1 and M E C in 9 rats during a spatial alternation task in a W -shaped maze. W e sho w that when applying DMK to the neuronal spiking acti vity , it re veals a meaningful representation which is related to the 2D position of the rat. A related work [32] addressed this task using a parametric framew ork that recovers a latent state governing the spike rates. Their framew ork is based on a Poisson process that generates the spiking data and two Gaussian processes, which model the latent state dynamics and the mapping between the latent state and the ﬁring rate. This parametric method assumes a speciﬁc model and is especially suited for neuronal spike train data, whereas our method is non-parametric and can be applied to a lar ger class of problems. From the available data set [31], we focus on e xperiments in which the animals were activ ely moving in the maze. In addition, since some recordings include only a small number of activ e neurons, we restrict our analysis to e xperiments which contain more than 20 activ e neurons and animals which hav e more than 3 such experiments (for e v aluation of error 12 0.5 1.5 2.5 -2 -1 0 1 2 0 1 2 3 4 (a) 0.5 1.5 2.5 -2 -1 0 1 2 0 1 2 3 4 (b) 0.5 1.5 2.5 -2 -1 0 1 2 0 1 2 3 4 (c) 0.5 1.5 2.5 -2 -1 0 1 2 -2 -1 0 1 2 (d) 0.5 1.5 2.5 -2 -1 0 1 2 -2 -1 0 1 2 (e) 0.5 1.5 2.5 -2 -1 0 1 2 -2 -1 0 1 2 (f) Fig. 7: Scatter plots of the azimuth and ele vation angles, colored by the coordinate estimates. The plots are colored according to the the ﬁrst and second state estimates of the observer framwork (in (a) and (d)), the ﬁrst and second state estimates of our suggested DMK ﬁlter (in (b) and (e)) and the ﬁrst and second coordinates obtained by diffusion maps (in (c) and (f)). 10 20 30 40 0 0.2 0.4 0.6 0.8 1 DM Observer DMK (a) 10 20 30 40 0 0.2 0.4 0.6 0.8 1 (b) Fig. 8: Correlation between the DMK coordinates and the true underlying system state coordinates, in blue, compared with the observer coordinates, in green, and the diffusion maps coordinates, in black. statistics). The remaining data include six animals (abbreviated Bon, T en, Cor , Eig, Fra and Mil) with 10 to 24 e xperiments each. The length of each experiment ranged between 7 to 15 minutes. W e apply the proposed method to the recorded neuronal spiking data and construct a new data-driv en representation. W e sho w that this ne w representation is highly related to the true position of the animal. For this purpose, we ﬁrst bin the data to obtain a histogram for each neuron depicting the spike rate over time. The histogram bin sizes were chosen empirically , such that at least one neuron is acti ve at any time frame (bin). The bin sizes ranged between 1 − 2 . 6 seconds (chosen differently in different animals). The bin sizes, number of experiments per animal and number of activ e neurons used in our analysis are summarized in T able I. The diffusion maps algorithm was applied to the obtained histograms, which are treated as measurements in the construc- tion of the afﬁnity matrix described in (19). The cov ariance matrices for the modiﬁed Mahalanobis distance were estimated based on overlapping windows of 15 time frames (histogram bins), and the kernel scale,  , was empirically set to be 3 times the median of the distances, to avoid outliers in the dif- fusion maps coordinates. The af ﬁnity matrix was normalized according to (20) and its ﬁrst 20 (non-tri vial) eigenv ectors, corresponding to the largest eigen values, were computed. The DMK was then applied to these eigen vectors, resulting in a new representation of the data, ˆ Ψ n = [ ψ (1) n , . . . , ψ (20) n ] . The DMK covariance matrices were estimated based on the data and the calculated eigen vectors, similarly to Subsection IV -A and Subsection IV -B, where Q n ( k , k ) = v ar( λ k ψ k ) and R n was set to be a diagonal matrix with the variance of the histograms (measurements) on its diagonal. Each experiment in each animal was analyzed and ev aluated separately in order to av oid batch effects. An example for the resulting representation of one ex- periment is presented in Figure 9. This ﬁgure presents 6 identical plots containing the true x-y positions of the animal (marked by ﬁlled circles), where each plot is colored by a different DMK coordinate, according to ln | ˆ ψ i | . W e note that in this example, only 6 eigen vectors were used as input to the DMK algorithm, since these 6 coordinates provided a good representation for the position of the animal. In other experiments, 6 eigen vectors were not always sufﬁcient and 13 Bon T en Cor Eig Fra Mil Histogram bin size (sec) 1 1 2 . 6 1 . 6 1 1 Number of experiments 24 14 10 12 18 12 Number of neurons per exp. 33 − 67 42 − 60 20 − 26 29 − 59 40 − 69 25 − 38 T ABLE I: Hipocampal neuronal acti vity - analysis parameters therefore, for consistency , 20 eigen vectors were used in all experiments in the follo wing analysis. Figure 9 depicts that the DMK algorithm provides a mean- ingful representation for the animal location, since different 2D locations are highlighted in different coordinates (colored in yello w). By combining information from all coordinates, the 2D location can be inferred. For example, the middle arm in the W -shaped maze, ( x, y ) ≈ (160 , 110) , is represented by high values (mostly) in coordinates ˆ ψ 2 and ˆ ψ 3 , whereas the lower arm, ( x, y ) ≈ (160 , 80) , is represented by high values (mostly) in coordinates ˆ ψ 2 and ˆ ψ 5 . In addition, Figure 9 demonstrates that the obtained representation covers dif ferent regions in the 2D space in a relati vely smooth manner . W e emphasize that the DMK coordinates presented in Figure 9 were obtained in a completely data-driv en manner and with minimal model assumptions. In order to quantitativ ely assess the quality of the new representation obtained by the DMK, we divided the data in each experiment into a train set, consisting of 80% of the data, and a test set, consisting of 20% of the data and performed cross-validation. W e performed linear regression between the DMK coordinates representing the data in the train set and the true position of the animal and then used the regression parameters to estimate the animal’ s position based on the DMK representation of the test set. An e xample for the resulting position estimation based on the test set of one e xperiment is presented in Figure 10. This ﬁgure presents the estimated x position (top plot) and estimated y position (bottom plot) based on the DMK coordinates (in blue) compared with the true position (in black). Figure 10 depicts that after the linear regression, the estimated position based on the DMK closely follows the true position of the animal. The corresponding 2D position of the animal in this exper - iment is presented in Figure 11, where the position estimation based on the DMK coordinates is marked by blue ’x’ and the true 2D position is marked by black circles. This plot depicts that after linear regression, the DMK coordinates are highly related to the 2D position of the animal in most locations. Note that the edge of the lower arm of the W -shaped maze is not represented properly by the DMK coordinates. This is consistent with the result in Figure 9, where none of the coordinates captures this speciﬁc location properly . For comparison, we applied linear regression to two addi- tional data representations: (1) diffusion maps with no addi- tional processing and (2) principal component analysis (PCA) applied to the spike rate histograms. W e calculated the correlation between the estimated position and the true position for each experiment in all animals. Figure 12 presents the correlation v alues of the x and y positions, for each animal separately , based on the train set and on the test set. The av erage and standard deviation of the correlation values were calculated ov er 5-fold cross-validation, where the data was di vided into 5 consecuti ve segments. This ﬁgure depicts that the DMK coordinates provide a meaningful representation that relates to the true position of the animal, since simply by linear re gression, the DMK coordinates gi ve rise to a good estimation of the true position. In addition, this representation is signiﬁcantly better than the representations obtained by the dif fusion maps coordinates and by PCA. Note that the linear regression here was performed solely for a quantitati ve ev aluation of the constructed coordinates. Our main result here is that the data driv en DMK coordinates contain meaningful information re garding the location of the animal. W e conclude by noting that in this example, the true system model is completely unkno wn and is not necessarily compatible with the assumed process form in (5). Howe ver , we demonstrated that our method can still be used to obtain a meaningful representation in this application. V . C O N C L U S I O N In this work we addressed the analysis of high-dimensional nonlinear stochastic dynamical systems with measurement noise, and presented a non-parametric ﬁltering framework in which a data-driv en linear Kalman ﬁlter is constructed based on diffusion maps coordinates, utilizing their inherent dynam- ics and properties. W e showed that the presented framework recov ers new coordinates that capture meaningful properties of the system gi ven only a set of noisy measurements and with no further knowledge on the system properties. These coordinates may not be directly related to the underlying system state, yet they can be employed for ﬁltering the measurements. T o obtain an estimate of a speciﬁc underlying state, some alignment is required based on a training set, for example, using linear regression. The theoretical justiﬁcation of the proposed framework required fe w assumptions. First, the deri v ation of the re- cov ered system dynamics was based on the assumption that the underlying system dynamics e volv e according to gradient ﬂows with constant diffusion. Second, for compatibility of the reco vered system with the Kalman ﬁltering frame work, we assumed that the leading dif fusion maps coordinates are slowly e volving functions of the system state and are only mildly affected by the measurement noise. Several studies, e.g. [21], [33] and [34], have shown that based on properties of the diffusion maps coordinates, this assumption commonly holds. Third, we speciﬁed the conditions under which the 14 120 160 200 70 110 150 y -8 -7 -6 -5 -4 120 160 200 70 110 150 -8 -7 -6 -5 -4 120 160 200 70 110 150 -8 -7 -6 -5 -4 120 160 200 x 70 110 150 y -8 -7 -6 -5 -4 120 160 200 x 70 110 150 -8 -7 -6 -5 -4 120 160 200 x 70 110 150 -8 -7 -6 -5 -4 Fig. 9: True x-y positions of the animal in one experiment (ﬁlled circles), colored according to 6 DMK coordinates ( l n | ˆ ψ i | ) calculated based on the spike rate histograms 20 40 60 80 100 120 140 160 180 200 t [sec] 100 150 200 250 x 20 40 60 80 100 120 140 160 180 200 t [sec] 50 100 150 y True position DMK Fig. 10: Example of the estimated x-y positions of the animal in one experiment. The estimations are presented for the test- set, after applying linear regression to the DMK coordinates. 100 120 140 160 180 200 220 240 40 60 80 100 120 140 160 180 DMK True position Fig. 11: Estimated 2D position in one experiment. The es- timation is presented for the test-setm after applying linear regression to the DMK coordinates Bon Ten Cor Eig Fra Mil Animal 0.2 0.4 0.6 0.8 1 Correlation of x Train set Bon Ten Cor Eig Fra Mil Animal 0.2 0.4 0.6 0.8 1 Correlation of y Train set Bon Ten Cor Eig Fra Mil Animal 0.2 0.4 0.6 0.8 1 Correlation of x Test set Bon Ten Cor Eig Fra Mil Animal 0.2 0.4 0.6 0.8 1 Correlation of y Test set DMK DM PCA Fig. 12: Correlations between the estimated coordinates (after linear regression) and the true x-y positions, on the train-set and the test-set, for a 5-fold cross-v alidation. devised system is observable and detectable, and sho wed that these conditions hold for an informed choice of the diffusion maps coordinates. Although these assumptions are required for the theoretical deri vations, the experimental results depict that our framew ork performs well in comparison to other non-parametric algorithms, ev en when these assumptions are not fully met. W e sho wed that the proposed method obtains an improv ed representation of dynamical systems compared with the diffusion maps coordinates and other non-parametric methods. In the future, we plan to extend our method and address multi-modal data-sets arising from stochastic dynamical sys- tems. W e will de vise methods for rev ealing the underlying 15 common dynamics based on measurements from dif ferent sensors. V I . A C K N O W L E D G E M E N T S W e would like to thank the Associate Editor and the anony- mous revie wers for their helpful comments and suggestions. R E F E R E N C E S [1] M. P . Deisenroth, R. D. T urner, M. F . Huber , U. D. Hanebeck, and C. E. Rasmussen, “Robust ﬁltering and smoothing with gaussian processes, ” IEEE T ransactions on A utomatic Control , vol. 57, no. 7, pp. 1865–1871, 2012. [2] Y . W ang, M. A. Brubaker , B. Chaib-draa, and R. Urtasun, “Bayesian ﬁltering with online gaussian process latent variable models., ” in UAI , 2014, pp. 849–857. [3] A. Surana and A. Banaszuk, “Linear observer synthesis for nonlinear systems using koopman operator framework, ” IF AC-P apersOnLine , vol. 49, no. 18, pp. 716–723, 2016. [4] S. L. Brunton, B. W . Brunton, J. L. Proctor, E. Kaiser, and J. N. Kutz, “Chaos as an intermittently forced linear system, ” Nature Communications , vol. 8, 2017. [5] T . Berry , D. Giannakis, and J. Harlim, “Nonparametric forecasting of low-dimensional dynamical systems, ” Physical Review E , vol. 91, no. 3, pp. 032915, 2015. [6] F . T an, W . Lohmiller , and J. J. Slotine, “ Analytical slam without linearization, ” in Robotics Researc h , pp. 89–105. Springer, 2018. [7] R.E. Kalman, “ A new approach to linear ﬁltering and prediction problems, ” T rans. ASME J. Basic Eng. , vol. 82, pp. 34–45, 1960. [8] I. Mezi ´ c, “Spectral properties of dynamical systems, model reduction and decompositions, ” Nonlinear Dynamics , vol. 41, no. 1, pp. 309–325, 2005. [9] M. Budiši ´ c, R. Mohr, and I. Mezi ´ c, “ Applied koopmanism a, ” Chaos: An Inter disciplinary Journal of Nonlinear Science , vol. 22, no. 4, pp. 047510, 2012. [10] M. O. W illiams, I. G. Ke vrekidis, and C. W . Rowle y , “ A data-driv en approximation of the koopman operator: Extending dynamic mode decomposition, ” Journal of Nonlinear Science , vol. 25, no. 6, pp. 1307– 1346, December 2015. [11] I. Mezi ´ c and A. Banaszuk, “Comparison of systems with complex behavior , ” Physica D: Nonlinear Phenomena , vol. 197, no. 1, pp. 101– 133, 2004. [12] R. Mohr and I. Mezi ´ c, “Construction of eigenfunctions for scalar- type operators via laplace averages with connections to the koopman operator , ” arXiv preprint , 2014. [13] S. M. Ulam, “Problems of modern mathematics. science editions, new york, 1964. originally published as: A collection of mathematical problems, ” 1960. [14] S. Klus, P . Koltai, and C. Schütte, “On the numerical approxima- tion of the perron-frobenius and koopman operator , ” arXiv pr eprint arXiv:1512.05997 , 2015. [15] N. ˇ Crnjari ´ c-Žic, S. Ma ´ ceši ´ c, and I. Mezi ´ c, “K oopman operator spectrum for random dynamical system, ” arXiv preprint , 2017. [16] T . Berry and J. Harlim, “Nonparametric uncertainty quantiﬁcation for stochastic gradient ﬂows, ” SIAM/ASA Journal on Uncertainty Quantiﬁcation , vol. 3, no. 1, pp. 484–508, 2015. [17] F . Hamilton, T . Berry , and T . Sauer , “Ensemble kalman ﬁltering without a model, ” Physical Review X , vol. 6, no. 1, pp. 011021, 2016. [18] D. Giannakis, “Data-driv en spectral decomposition and forecasting of ergodic dynamical systems, ” Applied and Computational Harmonic Analysis, in press , 2017. [19] R. Coifman and S. Lafon, “Diffusion maps, ” Appl. Comput. Harmon. Anal. , vol. 21, pp. 5–30, Jul. 2006. [20] F . T akens, “Dynamical systems and turbulence (detecting strange attractors in ﬂuid turbulence), ” 1981. [21] R.R. Coifman, I.G. Kevrekidis, S. Lafon, M. Maggioni, and B. Nadler, “Diffusion maps, reduction coordinates, and low dimensional represen- tation of stochastic systems, ” Multiscale Modeling & Simulation , vol. 7, no. 2, pp. 842–864, 2008. [22] T . Shnitzer , R. T almon, and J. J. E. Slotine, “Manifold learning with contracting observers for data-driv en time-series analysis, ” IEEE T ransactions on Signal Processing , vol. 65, no. 4, pp. 904–918, 2017. [23] A. Singer and R. Coifman, “Non-linear independent component analysis with diffusion maps, ” Appl. Comput. Harmon. Anal. , vol. 25, pp. 226– 239, 2008. [24] Y . Bulut and O. Bayat, “Kalman ﬁltering with model uncertainties, ” in T opics in Modal Analysis I, V olume 5 , pp. 447–455. Springer, 2012. [25] M.S. Arulampalam, S. Maskell, N. Gordon, and T . Clapp, “ A tutorial on particle ﬁlters for online nonlinear/non-Gaussian Bayesian tracking, ” IEEE T rans. Signal Process. , vol. 50, pp. 174–188, 2003. [26] Arnaud Doucet, Simon Godsill, and Christophe Andrieu, “On sequential monte carlo sampling methods for bayesian ﬁltering, ” Statistics and computing , vol. 10, no. 3, pp. 197–208, 2000. [27] T . Berry and T . Sauer , “ Adaptive ensemble kalman ﬁltering of non-linear systems, ” T ellus A: Dynamic Meteor ology and Oceanography , vol. 65, no. 1, pp. 20331, 2013. [28] Eduardo D Sontag, Mathematical control theory: deterministic ﬁnite dimensional systems , vol. 6, Springer Science & Business Media, 2013. [29] R. T almon and R.R. Coifman, “Empirical intrinsic geometry for nonlinear modeling and time series ﬁltering, ” Pr oc. Nat. Acad. Sci. , vol. 110, no. 31, pp. 12535–12540, 2013. [30] C.J. Dsilva, R. T almon, R.R. Coifman, and I.G. Kevrekidis, “Parsimo- nious representation of nonlinear dynamical systems through manifold learning: A chemotaxis case study , ” Applied and Computational Har- monic Analysis , 2015, In Press. [31] M Karlsson, M Carr , and LM Frank, “Simultaneous extracellular recordings from hippocampal areas ca1 and ca3 (or mec and ca1) from rats performing an alternation task in two w-shapped tracks that are geometrically identically but visually distinct. crcns. org, ” 2015. [32] Anqi W u, Nicholas G Roy , Stephen Keeley , and Jonathan W Pillo w , “Gaussian process based nonlinear latent structure discovery in multi- variate spike train data, ” in Advances in Neural Information Pr ocessing Systems , 2017, pp. 3496–3505. [33] A. Singer, R. Erban, I. G. Ke vrekidis, and R. Coifman, “Detecting intrinsic slow variables in stochastic dynamical systems by anisotropic diffusion maps, ” PNAS , vol. 106, no. 38, pp. 16090–1605, 2009. [34] C.J. Dsilva, R. T almon, C.W . Gear , R.R. Coifman, and I.G. Ke vrekidis, “Data-driv en reduction for a class of multiscale fast-slow stochastic dynamical systems, ” SIAM Journal on Applied Dynamical Systems , vol. 15, no. 3, pp. 1327–1351, 2016.

Diffusion Maps Kalman Filter for a Class of Systems with Gradient Flows

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment