Generalized chi-squared detector for LTI systems with non-Gaussian noise

Previously, we derived exact relationships between the properties of a linear time-invariant control system and properties of an anomaly detector that quantified the impact an attacker can have on the system if that attacker aims to remain stealthy t…

Authors: Navid Hashemi, Justin Ruths

Generalized chi-squared detector for LTI systems with non-Gaussian noise
Generalized chi-squar ed detector f or L TI systems with non-Gaussian noise Navid Hashemi 1 and Justin Ruths 1 Abstract — Pre viously , we derived exact relationships between the properties of a linear time-in v ariant control system and properties of an anomaly detector that quantified the impact an attacker can hav e on the system if that attacker aims to remain stealthy to the detector . A necessary first step in this process is to be able to precisely tune the detector to a desired level of performance (false alarm rate) under normal operation, typically through the selection of a thr eshold parameter . T o- date efforts have only considered Gaussian noises. Here we generalize the approach to tune a chi-squared anomaly detec- tor for noises with non-Gaussian distributions. Our method leverages a Gaussian Mixture Model to repr esent the arbitrary noise distributions, which preser ves analytic tractability and pro vides an inf ormative interpr etation in terms of a collection of chi-squared detectors and multiple Gaussian disturbances. I . I N T RO D U C T I O N Model-based anomaly detection uses a predictor to fore- cast the ev olution of a dynamical system. This prediction is compared with the actual measured v alue to test for an appreciable discrepancy , which may indicate the presence of anomaly in the system. If the predictor is perfect, then this task is easy - any discrepancy is enough to cause concern. The task is made more challenging because the predictor has some uncertainty in its forecast, which implies that relatively small discrepancies may be due to this uncertainty rather than anomalous behavior . One of the most fundamental sources of uncertainty in con ventional control systems is captured in the terms that quantify system and measurement noise. The ov erwhelming majority of work on stochastic model-based detection assumes system and measurement noises that are Gaussian distributed, see e.g., [1], [2], [3]. In the context of linear time-in variant systems especially , normal distrib utions preserve the capability of finding analytic solutions. In this work, we do not assume any structure of the noise proba- bility density functions; howe ver , we manage to retain some analytic tractability by employing a Gaussian Mixture Model (GMM) representation of these arbitrary distributions. In a stochastic context, detectors necessarily trade off sensitivity (rate of true positives - true alarms) for fall-out (rate of false positives - false alarms). The machinery we present to tune detectors is a necessary step to (a) understand the shape of the Receiver Operating Characteristic (ROC) curve, (b) quantify how that shape depends on system and design parameters, and (c) select a point on the ROC that has a desired lev el of performance - as measured by a 1 These authors are with the Departments of Mechanical and Systems Engineering at the Univ ersity of T exas at Dallas, Richardson, T exas, USA navid.hashemi@utdallas.edu, jruths@utdallas.edu tolerable true/false positiv e balance. In past work we have provided methods to tune sev eral popular detectors when the uncertainty is dri ven by Gaussian system and measurement noises [4], [5]. When noise distributions deviate significantly from this assumption, then ne w tools are required to produce an effecti ve means of tuning detectors. Here we provide a generalized chi-squared detector that can be tuned to any desired rate of false alarms through the appropriate selection of the detector sensiti vity threshold. The process of tuning model-based detectors is to propa- gate the uncertainty , here the noise distributions, through the system and into the detector . By quantifying the distributions the detector expects to see under normal operation (with no anomalies), we can quantify the trade-of f between the true and false positive rates based on the sensitivity threshold we select. W e begin by re vie wing the tools for the Gaussian case and conclude by demonstrating the backwards compatibility of the results we develop with the Gaussian case and by showing an example. I I . M O D E L - BA S E D A N O M A L Y D E T E C T I O N Consider a general discrete-time linear time-in v ariant (L TI) system, x k +1 = F x k + Gu k + v k y k = C x k + η k (1) with time index k ∈ N , state x k ∈ R n , output y k ∈ R p , input u k ∈ R m , matrices F , G , and C of appropriate dimensions, and iid multiv ariate noises v k ∈ R n and η k ∈ R p with cov ariance matrices R 1 ∈ R n × n , R 1 ≥ 0 and R 2 ∈ R p × p , R 2 ≥ 0 respectively . The random processes v k and η k are mutually independent. W e assume that ( F , G ) is stabilizable and ( F, C ) is detectable such that there are no unstable unobservable or uncontrollable modes. The main idea behind fault detection theory is to use an estimator to forecast the evolution of the system in the absence of fault [6]. In this analysis we consider a model- based estimator of Luenberger form, ˆ x k +1 = F ˆ x k + Gu k + L ( y k − C ˆ x k ) , (2) where the estimator has perfect model kno wledge, L is the observer gain matrix, and ˆ x k +1 is the predicted state. This prediction is compared to the actual measurement recei ved from the sensors. If the difference between the measurement and the predicted output is large, there may be an anomaly in the system. This dif ference is typically called the residual, r k = y k − C ˆ x k , (3) where y k is the actual measurement and C ˆ x k is the estimated output. Different model-based detectors use the residual in different ways to quantify deviation away from the estimate. One of the simplest and most widely used approach con- structs the distance measure z k from a quadratic form of the residual, z k = ( r k − µ ) T Σ − 1 ( r k − µ ) , z k > α − → alarm (4) where µ ∈ R p and Σ ∈ R p × p are the mean and cov ariance of the residual random vector , r k , under normal operation (no faults or attacks). This detector raises alarms if z k exceeds an assigned threshold, i.e., z k > α , α ∈ R > 0 . When the system and sensor noises are zero-mean and Gaussian distrib uted, then the residual is also a zero- mean Gaussian random variable, i.e., r k ∼ N (0 , Σ) , with cov ariance Σ = C P C T + R 2 , where P represents the asymptotic co variance of the estimation error , lim k →∞ P k = lim k →∞ E [ e k e T k ] = P , e k = x k − ˆ x k , and is the solution of the L yapunov equation [7], ( F − LC ) P ( F − LC ) T − P + R 1 + LR 2 L T = 0 . (5) The distance measure z k is then a sum of squared Gaussian variables making it chi-squared distrib uted. For this reason, the detector (4) is con ventionally called the Chi-Squar ed Detector . This name underscores the pre v ailing assumption that system and sensor noises are Gaussian distributed - the simplifying assumption we lift in this work. In past work, we characterized the relationship to select the threshold α in (4) to yield a desired false alarm rate. Effecti vely capturing the receiver operator curve (ROC) of the detector, this characterization is critical for comparisons across detectors because all detectors must be tuned for comparable performance. The following Lemma states the relationship for a chi-squared detector with zero-mean Gaus- sian system and measurement noises. Lemma 1 ([5]): Assume that there are no anomalies present in an L TI system (1) driv en by zero-mean Gaussian system and measurement noises such that the residual r k ∼ N (0 , Σ) and consider the chi-squared detector (4) with threshold α ∈ R > 0 . T o achieve a desired false alarm rate A ∗ set α = α ∗ := 2 P − 1 (1 − A ∗ , p 2 ) , where P − 1 ( · , · ) denotes the inv erse regularized lo wer incomplete gamma function. I I I . N O N - G AU S S I A N N O I S E S Although assuming noises fall according to Gaussian distributions is standard for the theoretical treatment of control systems, practical implementations for tuning chi- squared detectors on real systems can fall short when using Lemma 1 directly . System noise is typically used to aggre gate model uncertainty . F or many control concerns, increasing the cov ariance of Gaussian noise is an effecti ve, conserv ati ve approach for capturing model imperfections. Howe ver , when we aim to use this noise to predict the expected behavior of the system it is important that the noise distribution is an accurate representation of the model discrepancy . In addition, injecting conserv atism into the system noise directly reduces the sensitivity of the detector . For sensor noise, many sensors - 3 - 2 - 1 0123 0 0.25 0.5 0.75 1 Fig. 1: Giv en the probability density function (black) a Gaus- sian Mixture Model (GMM) expression can be constructed. Here the GMM is composed of three Gaussian modes. exhibit nonlinear beha vior in the distribution of their reported values or e vidence strong quantization effects; both of which can have a dramatic effect on the accuracy of Lemma 1. W e revisit the chi-squared detector defined in (4) with system and sensor noises that are arbitrary distrib uted. These distributions can be expressed as con vex mixtures of Gaus- sian distributions [8]. W e construct the so-called Gaussian mixtur e model (GMM) e xpansion of the distrib utions for system noise ( v k ) and measurement noise ( η k ), respecti vely , f η = m 1 X j =1 p η j N ( x | µ η j , K η j ) , (6) f v = m 2 X j =1 p v j N ( x | µ v j , K v j ) , (7) where m 1 is the number of Gaussian modes for the mea- surement noise and m 2 is the number of Gaussian modes for the system noise, µ η j ∈ R p and µ v j ∈ R n are the mean value of each Gaussian mode, K η j ∈ R p × p and K v j ∈ R n × n are their corresponding co v ariances. It is important that the scalar values p η j and p v j satisfy m 1 X j =1 p η j = 1 , and m 2 X j =1 p v j = 1 , (8) which follows from interpreting (6) (resp., (7)) as a total law of probability between Gaussian conditional probabilities with probability p η j (resp., p v j ). The EM algorithm supplies a reliable mechanism for computing GMMs from data with guarantees on the con ver gence of these models [9]. See Fig. 1 for an example of a GMM. The challenge, now , for anomaly detection is to accurately characterize the distribution of the residual and distance mea- sure, ideally maintaining an analytic relationship so that the role of different design variables (e.g., observer gain) can be understood. This paper employs a generalized version of the chi-squared detector and in using the GMM representation of the noise distributions allows us to dev elop the GMM representation of the residual distribution and then compute statistics about the distance measure distribution. A. Residual Distribution W e first characterize the residual distribution. Lemma 2: Given the L TI system (1) driven by system and sensor noises whose probability density functions can be expressed as the Gaussian mixture models in (6) and (7), the probability density function of the residual r k at time k ∈ N can be written as a Gaussian mixture model of m k = m k 1 m k − 1 2 Gaussian modes f r k ( x ) = m k X j =1 τ j N ( x | β j , Θ j ) , (9) where τ j represents the mixture probabilities of the Gaussian modes, β j are the means, and Θ j are the cov ariances, β j = k X κ =1 A T κ µ η n j κ + k − 1 X κ =1 B T κ µ v n j κ + k Θ j = k X κ =1 A κ K η n j κ A T κ + k − 1 X κ =1 B κ K v n j κ + k B T κ τ j = k Y κ =1 p η n j κ × k − 1 Y κ =1 p v n j κ + k A κ =  I κ = 1 − C ( F − LC ) κ − 2 L 2 ≤ κ ≤ k B κ = C ( F − LC ) κ − 1 1 ≤ κ ≤ k − 1 where n j κ captures all the possible permutations of combining terms from the system and measurement noise GMM modes, following the rules: leftmargin=1.3cm Init: n 1 κ = 1 for κ = 1 , . . . , 2 k − 1 Add: n j +1 1 = n j 1 + 1 Wrap: if n j +1 κ >  m 1 for κ = 1 , 2 , . . . , k m 2 for κ = k + 1 , . . . , 2 k − 1  then n j +1 κ +1 = n j +1 κ +1 + 1 and n j +1 κ = 1 . Remark 1: Since any probability density function can be expressed as a Gaussian mixture model, the key insight from Lemma 2 is not that the residual can be e xpressed as a GMM, but rather that the residual GMM can be found analytically from the GMMs of the system and measurement noises. The notation above is unavoidably complicated due to a large number of terms. Their definitions are straightforward by looking at the proof below - namely the expansion of the product in (16). In Section V we will discuss an effecti ve means of reducing the number of GMM terms needed to represent the density function. Pr oof: In the absence of anomalies, from the definition of the residual (3) and the system equations (1), we can recursiv ely solve for the expression of r k as a function of only the system and sensor noises (alternativ ely use the z- transform), r k = η k − k − 1 X κ =1 C ( F − LC ) κ − 1 ( Lη k − κ − v k − κ ) . (10) Thus the residual is a linear combination of sequential iid samples of noise such that its distribution can be e xpressed as a con volution integral in terms of the constituent noise distributions [10], i.e., f r k = f A k η 1 ∗ · · · ∗ f A 1 η k ∗ f B k − 1 v 1 ∗ · · · ∗ f B 1 v k − 1 (11) where A κ = − C ( F − LC ) κ − 2 L, 2 ≤ κ ≤ k , A 1 = I and B κ = C ( F − LC ) κ − 1 , 1 ≤ κ ≤ k − 1 . Since the noise samples are iid, we can drop the time dependence to yield, f r k = f A 1 η ∗ · · · ∗ f A k η ∗ f B 1 v ∗ · · · ∗ f B k − 1 v . (12) The characteristic function of a random v ariable X is defined ϕ X ( ω ) = E [ e iω T X ] , where E [ · ] denotes expectation. Using the characteristic function to represent the distributions in- volv ed in (12), changes the con volution from an integral to a product. The characteristic functions of the residual and noises are, ϕ r k ( ω ) = Z ∞ −∞ f r k ( x ) e iω T x dx, (13) ϕ η ( ω ) = Z ∞ −∞ f η ( x ) e iω T x dx, (14) ϕ v ( ω ) = Z ∞ −∞ f v ( x ) e iω T x dx. (15) Note that the characteristic function of an affine transforma- tion of a random variable Y = QX + R is ϕ Y ( ω ) = e ( iω T R ) ϕ X ( Q T ω ) , thus ϕ A κ η ( ω ) = ϕ η ( A T κ ω ) and ϕ B κ v ( ω ) = ϕ v ( B T κ ω ) . Therefore, ϕ r k ( ω ) is a product of characteristic functions for transformed system and measurement noises, ϕ r k ( ω ) = k Y κ =1 ϕ η  A T κ ω  × k − 1 Y κ =1 ϕ v  B T κ ω  . (16) Using the Gaussian mixture model with the characteristic function representation makes it possible to expand the sys- tem noise and measurement noise, e.g., for the measurement noise ϕ η ( ω ) = m 1 X j =1 p η j Z ∞ −∞ N ( x | µ η j , K η j ) e iω T x dx, (17) = m 1 X j =1 p η j e ( iω T µ η j − 1 2 ω T K η j ω ) , (18) where the simplification from (17) to (18) comes from identi- fying that the integral in (17) is the characteristic function of a Gaussian distribution which has a closed form expression [10]. If we replace the variable ω with its transformed A T κ ω and perform the same process for the system noise, ϕ η ( A T κ ω ) = m 1 X j =1 p η j e ( iω T A κ µ η j − 1 2 ω T A κ K η j A T κ ω ) , (19) ϕ v ( B T κ ω ) = m 2 X j =1 p v j e ( iω T B κ µ v j − 1 2 ω T B κ K v j B T κ ω ) . (20) Substituting these expressions into (16) rev eals that ϕ r k ( ω ) is a linear combination of m k = m k 1 m k − 1 2 exponential terms ϕ r k ( ω ) = m k X j =1 τ j e ( iω T β j − 1 2 ω T Θ j ω ) (21) where the expressions for β j , Θ j , and τ j in the statement of Lemma 2 can be deri ved by substituting (19)-(20) and expanding the product in (16). Con veniently , (21) is in the form of the characteristic function of a linear combination of Gaussian functions, thus (21) demonstrates that the residual distribution at time step k can be expressed as f r k ( x ) = m k X j =1 τ j N ( x | β j , Θ j ) , (22) where τ j represents the mixture probabilities of the m k Gaussian modes, β j are the means of the modes, and Θ j are the cov ariances of the modes. In the absence of anomalies, the time dependence of the residual is gov erned by the conv ergence of the estimator , since the system is time-inv ariant and the noises are iid. In most practical situations the estimator is designed to con v erge relati vely quickly and so it is reasonable to assume for the rest of this work that sufficient con v ergence of the estimation has already been achiev ed. Thus we seek the steady state distribution of the residual, which permits some simplification by removing the dependence on time. Lemma 3: If the system is stable, then the distrib ution of the residual con verges to f r ∞ as k → ∞ , i.e., giv en some error tolerance  ∈ R > 0 there exists a k ∗ ∈ N such that k f r ∞ − f r k ∗ k < , (23) implying that f r := f r k ∗ provides an arbitrarily close approximation of the steady state distribution. Pr oof: T o not belabor an intuiti ve result, we provide a sketch of the proof without explicitly writing all details regarding the conv er gence. Because the system is stable ρ ( F − LC ) < 1 , which means lim κ →∞ A κ = O p × p , and lim κ →∞ B κ = O n × p , where O is the zero matrix of the designated size. Therefore, in steady state the characteristic functions characterizing the system and measurement noise become lim κ →∞ ϕ η ( A T κ ω ) = lim κ →∞ m 1 X j =1 p η j e ( iω T A κ µ η j − 1 2 ω T A κ K η j A T κ ω ) = m 1 X j =1 p η j = 1 , (24) lim κ →∞ ϕ v ( B T κ ω ) = lim κ →∞ m 2 X j =1 p v j e ( iω T B κ µ v j − 1 2 ω T B κ K v j B T κ ω ) = m 2 X j =1 p v j = 1 . (25) The con v ergence of these characteristic functions imply the con v ergence of the characteristic function of the residual to ϕ r ∞ ( ω ) such that for a gi ven ˜  ∈ R > 0 there exists a k ∗ such that | ϕ r ∞ ( ω ) − ϕ r k ∗ ( ω ) | < ˜ , (26) where ϕ r k is defined as in (16). This provides an approxi- mation for the steady state that is made arbitrarily accurate by selecting an arbitrarily small ˜  (and hence large k ∗ ), ϕ r ∞ ( ω ) = ∞ Y κ =1 ϕ η ( A T κ ω ) × ∞ Y κ =1 ϕ v ( B T κ ω ) (27) ≈ k ∗ Y κ =1 ϕ η ( A T κ ω ) × k ∗ − 1 Y κ =1 ϕ v ( B T κ ω ) . (28) As mentioned earlier , k ∗ can be interpreted as the settling time of the control system and estimator . As before, the char- acteristic function of the residual corresponds to a probability density function that is composed of the sum of multiple Gaussian modes - a GMM. This distribution can also be made arbitrarily accurate by selecting a smaller ˜  which corresponds to a larger k ∗ and a smaller  in (23), f r ∞ ( x ) ≈ f r ( x ) := f r k ∗ ( x ) = m X j =1 π j N ( x | µ j , K j ) , (29) where m = m k ∗ 1 m k ∗ − 1 2 is the number of Gaussian modes used to represent the steady state residual distribution. As before, the values of π j , µ j , and K j are the mixture probabilities, means, and cov ariances, respectively , of the m Gaussian modes and are computed by substituting (19)-(20) and expanding the product in (28). B. Distance Measure & F alse Alarm Rate T uning Most detectors construct a scalar-valued distance measure from the residual to quantify how dif ferent the measurement is from what is expected. In this paper we use the generalized chi-squared detector in (4) where µ is the overall mean v alue and Σ is the ov erall cov ariance of the steady state residual. From (29), µ = m X j =1 π j µ j and Σ = m X j =1 π j K j + γ j γ T j (30) where γ j = µ − µ j is the difference between the individual GMM (mode) means and overall mean. Our aim in this section, and ultimately of this paper , is to characterize the expected rate of false alarms giv en a chosen threshold v alue, α , of the detector . Recall that alarms are generated if z k > α for any k ∈ N . Thus the probability of drawing a distance measure value from its distribution that leads to an alarm is P ( z k > α ) . Figure 2 plots the multi variate, scalar-v alued distance measure function z k ov er the domain of the residual r k , in the two dimensional ( p = 2 ) case. Due to the quadratic form of the distance measure in (4), the surface is a paraboloid. The region D is the area contained by the projection of the lev el set z k = α onto the r k plane. Theorem 1 is the generalized version of Lemma 1 for tuning the M j = det C p (2 π ) p det K j Z 2 π 0 Z π 0 · · · Z π 0 Z √ α 0 e ( C T ρ k + γ j ) T K − 1 j ( C T ρ k + γ j ) d ρ d ρ = | ρ k | p − 1 sin ( φ 1 ) p − 2 sin ( φ 2 ) p − 3 · · · sin ( φ p − 2 ) d | ρ k | dφ 1 dφ 2 ... dφ p − 1 ( ? ) Fig. 2: The 3D surface of the residual GMM probability density function (shaded to rev eal height mapping) constructed from m = 3 Gaussian modes with means µ j denoted by blue dots. The quadratic form of the distance measure z k forms a paraboloid over the domain of the residual r k centered at µ (the mean of the residual distribution). The cumulative probability P ( z k ≤ α ) corresponds to the amount of probability in the residual distrib ution within the region D (33) (green area), the area corresponding to the domain formed by projecting the z k = α lev el set of the paraboloid (red line). The probability P ( z k ≤ α ) = P j π j M j , where M j is the volume of probability contributed by the Gaussian mode j . In Section IV, we show that we can interpret the generalized detector as the probabilistic combination of m chi-squared detectors, where the j -th detector is tuned such that the false alarm rate is A j = 1 − M j , which means the threshold α j is selected such that the volume of probability contained within the region D j (38) (the projection of the α j -lev el set of the paraboloid centered at µ j ) is equal to M j . detector for a desired level of performance (desired false alarm rate) in the case that the system and measurement noises are no longer Gaussian. Theor em 1: Assume that there are no anomalies present in an L TI system (1) driv en by arbitrary system and mea- surement noises such that the residual r k ∼ f r (29), i.e., an m -th order Gaussian mixture model, and consider the generalized chi-squared detector (4) with threshold α ∈ R > 0 . The expected false alarm rate is A = 1 − P ( z k ≤ α ) = 1 − m X j =1 π j M j , (31) where M j is given by ( ? ), in the box above. Pr oof: The false alarm rate is the probability that z k exceeds the threshold α , A = P ( z k > α ) = 1 − P ( z k ≤ α ) . The cumulati v e probability of the nonnegati ve distance measure is giv en by P ( z k ≤ α ) = Z α 0 f z k ( z ) dz = Z Z D f r ( r ) d r , (32) where d r is the differential area element over the region D = { r k | z k = ( r k − µ ) T Σ − 1 ( r k − µ ) ≤ α } , (33) which is in general a p -dimensional ellipsoid and whose boundary is defined by the projection of the lev el set z k = α onto the r k plane, and threshold α is the assigned threshold of the generalized chi-squared detector . Effecti vely , (32) expresses that the probability of having z k ≤ α is equal to the volume under the f r distribution, restricted to the r k values that generate a z k value less than or equal to α . Since f r is composed of a mixture of Gaussian modes, Fig. 2 depicts how this formulation sums the volume of probability contained under the various Gaussian modes and within the region D . Replacing f r in equation (32) e xplicitly expressing the m GMM modes leads to P ( z k ≤ α ) = 1 p (2 π ) p m X j =1 π j p det K j Z Z D e ( r k − µ j ) T K − 1 j ( r k − µ j ) d r . In order to write this equation in terms of the assigned threshold on the detector , we write it in normalized spherical form, changing the v olume element d r to the volume element d ρ which is a volume element ov er an p -sphere characterized with radius √ α ; hence, r k − µ = C T ρ k → r k − µ j = C T ρ k + γ j , and d ρ = d r det C , where ρ k is a vector v arying inside the p -sphere and is an affine transformation of the steady state residual ρ k = C − T ( r k − µ ) and the matrix C is the Cholesky decomposition of co v ariance Σ . This transformation simplifies the new region of integration to be a p -sphere. Making this substi- tution and expressing the limits of integration in spherical form yield the final from in (31) and ( ? ), where ρ k in spherical form is a function of its norm | ρ k | and angles φ i , i = 1 . . . p − 1 , where p is the dimension of the measurement. Remark 2: Now that we have the relationship (31) and ( ? ) between assigned threshold of the detector α and the corresponding CDF for the distance measure in the no fault/attack case P ( z k < α ) , we can use this result to find the threshold α that provides a desired false alarm rate. There is a mapping between the threshold α and the false alarm rate A . Armed with Theorem 1, it is possible to use, for example, a bisection method to find the threshold value to yield a desired false alarm rate. I V . I N T E R P R E T A T I O N Using an approach that leverages the Gaussian mixture model representation of arbitrary noise distrib utions not only recov ers analytic tractability , it also pro vides an intuitive and instructiv e backwards compatibility with the standard chi- square detector driv en by Gaussian noise. Suppose that in using the tools presented in this paper , we find that the GMM of the residual distribution is the combination of three distinct Gaussian modes (e.g., see Fig. 1), f r ( x ) = 3 X j =1 π j N ( x | µ j , K j ) , (34) where µ j are the mean v alues of the GMM modes, K j are the corresponding covariances, and π j are the mixing probabilities. A way to interpret the GMM residual probability distri- bution (34) is that at each time k the residual r k is drawn from the first GMM mode with probability π 1 , drawn from the second GMM mode with probability π 2 , and drawn from the third GMM mode with probability π 3 . For each of these modes of the residual GMM, we can rev erse engineer a hypothetical Gaussian measurement noise that if applied to the system in isolation would generate a Gaussian residual equal to that mode of the residual GMM. This hypothetical Gaussian noise would have mean a j and covariance C j , a j = E − T µ j and C j = E − 1 K j E − T , (35) for j ∈ { 1 , 2 , 3 } and the matrix E is defined as E = ∞ X κ =1 A κ . If the measurement noise was Gaussian with mean a j and cov ariance C j , j ∈ { 1 , 2 , 3 } , then we could tune a conv en- tional chi-squared detector using Lemma 1 to determine a threshold α j to yield a false alarm rate A j . If further , we selected A j = 1 − M j , where M j is defined in ( ? ), then (31) becomes A = 1 − π 1 (1 − A 1 ) − π 2 (1 − A 2 ) − π 3 (1 − A 3 ) , (36) = π 1 A 1 + π 2 A 2 + π 3 A 3 , (37) 0 0.05 0.1 0.15 -15 -10 -5 0 5 10 15 0 0.05 0.1 0.15 Empirical GMM Fig. 3: The measurement noise distribution (top) is approxi- mated by a Gaussian mixture model with six modes and leads to a corresponding complex residual distribution (bottom). The GMM approximations agree well with the empirical distributions from Monte-Carlo simulations. since π 1 + π 2 + π 3 = 1 . In this context, then M j can also be interpreted the probability under the j -th Gaussian mode distribution (characterized by µ j and K j ) ov er the inte gration region D j := { r k | z k = ( r k − µ j ) T K − 1 j ( r k − µ j ) ≤ α j } , (38) which is defined by the lev el set z k = α j of the paraboloid characterized by µ j and K j . Thus M j and α j are related according to Lemma 1 α j = 2 P − 1  M j 2 , p 2  . (39) This interpretation is depicted in Fig. 2. V . E X A M P L E Consider a single output system and estimator character- ized by the following matrices and driven by measurement noise (no system noise) distributed according to the proba- bility density function sho wn in Fig. 3, F =  0 . 8 0 . 2 − 0 . 25 0 . 1  , C =  0 . 5 0 . 5  , L =  0 . 3 − 0 . 3  . W e select a threshold for the detector α = 0 . 75 and use Theorem 1 to calculate the expected false alarm rate for this threshold. Figure 3 sho ws the fit of the GMM of the measurement noise compared to the empirical distribution attained from a Monte-Carlo simulation with 5 × 10 6 samples. Here we select a mixture of six ( m 1 = 6 ) Gaussian modes and T able I presents the means and cov ariances of each mode. One of the challenges of our method is the possibility of having a large number of terms in the residual distribution 1 2 3 4 5 6 p η 0.0847 0.2012 0.1184 0.3200 0.1889 0.0869 µ η -7.0877 -4.4709 -2.0082 1.2318 4.5240 7.0504 K η 2.1997 0.4471 0.2062 1.0392 0.3858 2.2329 T ABLE I: The coefficients, means, and cov ariances of the GMM modes ( m 1 = 6 ) of the measurement noise in Fig. 3. 0123456 0 0.25 0.5 0.75 1 0.522 GMM Chi-squared Empirical Fig. 4: The cumulati ve distrib ution function of the distance measure corresponding to the measurement noise in Fig. 3 evidencing noticeable de viations away from a comparable chi-squared distribution (which would correspond to Gaus- sian measurement noise). The GMM distribution agrees very well with the empirical distribution found through Monte- Carlo simulation. For a threshold α = 0 . 75 , the GMM distribution, using Theorem 1, predicts a false alarm rate of A = 1 − 0 . 522 = 0 . 478 . The empirical distribution has a false alarm rate of A = 1 − 0 . 516 = 0 . 484 . GMM: selecting the settling time k ∗ = 10 , m = m k ∗ 1 = 6 10 since we have no system noise. In practice, howe ver , many of these terms have mean and cov ariance v alues that are extremely similar . W e can then greatly simplify the GMM expression by aggregating terms that are roughly the same. T o do this we define a threshold on the normed difference between mean values and cov ariance v alues, d µ and d K . If the normed dif ference between mean values and cov ariance values for any pair of modes is less than these thresholds, k µ i − µ j k ≤ d µ and k K i − K j k ≤ d K , for i 6 = j ∈ { 1 , . . . , m 1 } , we consider those terms the same and their coefficients are added together . This procedure pro- vides an arbitrarily accurate approximation with significantly fewer terms. Here we select d µ = 0 . 0747 and d K = 0 . 0917 using a heuristic based on the spread of the distribution and these choices lead to e m = 282 terms that remain, signifi- cantly fewer than the original m = 6 10 . Using this reduced number of modes, the GMM expression of the distribution of residual is sho wn in Fig. 3 and compares fa vorably with the empirical residual distrib ution computed (through Monte- Carlo simulation) from the true noise distribution. By using Theorem 1, we calculate the expected false alarm rate A = 1 − P ( z k ≤ 0 . 75) = 0 . 478 This expected false-alarm rate is shown in Fig. 4 and the value of A compares well with the value attained through Monte-Carlo simulation ( 0 . 484 ). V I . C O N C L U S I O N In this paper , for discrete-time L TI systems subject to arbitrary sensor and measurement noise, we provide tools to tune model-based detectors and characterize the trade-off between true and false positi ves. W e ha ve generalized one of the most widely used fault detector , the chi-squared detector , for use with general noise distributions. Our approach uses a Gaussian mixture model expression of the disturbances which preserves some of the appealing analytic tractability of working with Gaussian noises on an L TI system. R E F E R E N C E S [1] Z. Guo, D. Shi, K. H. Johansson, and L. Shi, “Optimal Linear Cyber- Attack on Remote State Estimation, ” IEEE T ransactions on Control of Network Systems , vol. PP , no. 99, pp. 1–10, 2016. [2] Y . Mo and B. Sinopoli, “On the performance degradation of cyber - physical systems under stealthy integrity attacks, ” IEEE Tr ansactions on A utomatic Contr ol , vol. 61, pp. 2618–2624, 2016. [3] C. Z. Bai, F . Pasqualetti, and V . Gupta, “Security in stochastic control systems: Fundamental limitations and performance bounds, ” in American Control Conference (ACC) , 2015, pp. 195–200. [4] C. Murguia and J. Ruths, “Characterization of a cusum model-based sensor attack detector, ” in proceedings of the 55th IEEE Conference on Decision and Contr ol (CDC) , 2016. [5] ——, “Cusum and chi-squared attack detection of compromised sensors, ” in pr oceedings of the IEEE Multi-Confer ence on Systems and Contr ol (MSC) , 2016. [6] J. Chen and R. J. Patton, Robust Model-based F ault Dia gnosis for Dynamic Systems . Norwell, MA, USA: Kluwer Academic Publishers, 1999. [7] K. Astrom and B. Wittenmark, Computer -contr olled Systems (3rd Ed.) . Prentice-Hall, Inc., 1997. [8] A. G. Bacharoglou, “ Approximation of probability distributions by complex mixtures of gaussian measures, ” Proceedings of the American Mathematical Society , vol. 138, no. 7, pp. 2619–2628, jul 2010. [9] L. Xu and M. I. Jordan, “On conv ergence properties of the em algorithm for gaussian mixtures, ” Neural Computation , vol. 8, no. 1, pp. 129–151, 1996. [10] H.Stark and J.W .W oods, Probability , Statistics, and Random Processes for Engineers, 4th Edition . Pearson Education,Inc., 2014.

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment