Detection of Cyber-Physical Faults and Intrusions from Physical Correlations

Detection of Cyber -Ph ysical F aults and Intrusions fr om Ph ysical Correlations Andrey Y . Lokhov Center f or Nonlinear Studies and Theoretical Division T -4 Los Alamos National Laboratory Los Alamos, NM 87545 lokhov@lanl.go v Nathan Lemons Theoretical Division T -5 Los Alamos National Laboratory Los Alamos, NM 87545 nlemons@lanl.gov Thomas C. McAndrew Depar tment of Mathematics and Statistics and V ermont Comple x Systems Center University of V ermont Burlington, V er mont 05405 thomas.mcandre w@uvm.edu Aric Hagberg Theoretical Division T -5 Los Alamos National Laboratory Los Alamos, NM 87545 hagberg@lanl.gov Scott Backhaus Materials Physics and Applications Division Los Alamos National Laboratory Los Alamos, NM 87545 backhaus@lanl.go v ABSTRA CT Cyber-physical systems are critical infrastructures that are crucial b oth to the reliable deliv ery of resources such as en- ergy , and to the stable functioning of automatic and con- trol arc hitectures. These systems are composed of in terde- pendent physical, control and communic ations netw orks de- scribed by disparate mathematical models creating scien tiﬁc c hallenges that go well b ey ond the modeling and analysis of the individual netw orks. A k ey c hallenge in cyber-physical defense is a fast online detection and lo calization of faults and in trusions without prior kno wledge of the failure t yp e. W e describe a set of tec hniques for the eﬃcient identiﬁca- tion of faults from correlations in physical signals, assuming only a minimal amoun t of a v ailable syst em in formation. The performance of our detection method is illustrated on data collected from a large building automation system. CCS Concepts • Information systems → Data stream mining; • Securit y and priv acy → Intrusion dete ction systems; K eywords Cyber-physical systems; critical infrastruct ures; outlier de- tection; in trusion localization Permission to make digital or hard copies of all or part of this w ork for personal or classroom use is granted without fee provided that copies are not made or distributed for proﬁt or commercial advantage and that copies bear this notice and the full cita- tion on the ﬁrst page. Copyrights for components of this work owned by others than A CM must be honored. Abstracting with credit is permitted. T o copy otherwise, or re- publish, to post on servers or to redistribute to lists, requires prior speciﬁc permission and /or a fee. Request permissions from permissions@acm.org. ACM SIGKDD W orkshop on Outlier Deﬁnition, Detection and Description on Demand (ODD 4.0) August 13-17, 2016, San Fr ancisco, California, USA c  2016 A CM. ISBN 978-1-4503-2138-9. DOI: 10.1145/1235 1. INTR ODUCTION Cyber-physical systems are physical net work s, gov erned b y the laws of ph ysics, but regulated by a con trol system coupled to computer netw orks that transmit the informa- tion required to optimize and con trol the ph ysical net works for reliabilit y and eﬃciency [30, 34]. Examples include, but are not limited to, smart grids, gas pip elines, civil infras- tructures, autonomous automotive systems, automatic pilot a vionics and process con trol systems. The interdependence of the cyb er and ph ysical netw orks mak es the combined sys- tem more vulnerable to attacks; manipulation of the com- puter control netw ork can leverage cyb er-ph ysical capabil- ities to cause damage or signiﬁcantly degrade the p erfor- mance of the critical infrastructure [6, 21]. The ability to detect and lo calize failures or attacks rep- resen ts an imp ortant step tow ards the design of resilient cyber-physical net w orks and strategies for implemen tation of certiﬁcates for prop ortional response. It is natural to ex- pect that indications of in trusion or misb eha vior in the cyb er subsystem are present as anomalies in the physical net work. This fact can b e used for searching for outliers in the data streams collected by the sensors monitoring the state of the ph ysical system – a well-studied problem in a wide range of application domains [19]. Although anomalous changes in individual signals can be an indication of a ma jor failure or a crude attack, they do not capture more sophisticated sce- narios of coordinated in trusions. Therefore, it is important to take into accoun t information from the spatiotemporal correlations of anomalies of individual signals. Moreo ver, exploiting these correlations migh t enable probabilis tic lo- calization of the intruder or failure within the netw ork, and hence serv e as a basis for building a prop er resp onse. W e study the problem of detection and localization of dis- turbances based on the analysis of spatiotemporal correla- tions b et w een physical data streams. Our goal is to develop eﬃcien t methods for the detection and lo calization of fail- ures within the cyber-physical system without reference to a predeﬁned attack vector. F ailure ev en ts can be v ery di- v erse, while attacks become more and more creative and so- phisticated, so the detection metho dologies cannot b e based on scripted scenarios. In addition, detection methodologies whic h do not exploit prior knowledge of the top ology of the ph ysical net work will ha ve a broader range of application. Therefore, w e delib erately do not incorporate an y sp eciﬁc aspects of the physical system arc hitecture in the algorithm design. F urther desired requiremen ts for detection and local- ization algorithms include scalability (the num b er of signals and time measuremen ts can p oten tially b e very large), gen- eralit y (w e assume that the signals are heterogeneous and of div erse nature), robustness (the signals can b e noisy and incomplete) and lo w computa tional complexit y (to allo w de- plo yment of the algorithm in a fast online fashion). Cyber-physical intrusion detection and resp onse method- ologies will improv e at m uch faster rates when the devel- opmen t and reﬁnement is closely coupled with real-world experimentation that v alidates strengths and rev eals w eak- nesses. The simplicit y and generality of the detection algo- rithms are very important since they will allow for deploy- men t in diﬀerent cyber-physical systems. In this pap er, w e test our tec hniques on sp eciﬁc real-w orld data from an au- tomated HV A C system in a large building at Los Alamos National Lab oratory (LANL). W e are planning to deploy and exp erimentall y v alidate these methods on sev eral other cyber-physical systems of imp ortance to LANL. W e presen t a general proto col for detection and localiza- tion of disturbance which meet most of the aforemen tioned requiremen ts. First, w e develop a simple pro cedure for con- structing a sp ecial correlation matrix out of detrended het- erogeneous signals, making some assumptions on the anomaly signature we would like to be able to capture. Then, w e use the correlation matrix to solve three crucial tasks: i) detec- tion of the anomaly using sp ectral metho ds; ii) lo calization of a subset of anomalous nodes within the system using low- rank appro ximations and biclustering metho ds; iii) ﬁnally , iden tiﬁcation of the functional role of the inferred anomaly based on the sensor labels. W e v alidate our framework on experimental real-world data collected from a building au- tomation system at LANL. 2. TIME SERIES AN AL YSIS AND CORRE- LA TION MA TRIX CONSTRUCTION W e consider the problem inv olving data from N physical sensors indexed by V . F or each sensor i ∈ V w e are given a time series X i ( t ) collected at times t ∈ T . The data X i ( t ) can b e heterogeneous real or integer v alued signals and pro- vides a (partial) description of a system. W e assume that the spatial and temporal relationships b et w een the sensors are unknown, but that w e do hav e access to sensor lab els. W e also assume that the ﬂuctuations of each time series in the system around their mean behavior during normal op- erations are essentially indep endent . F ormally , we say that during normal operations the ob- serv ations X i ( t ) can be mo deled as X i ( t ) = Y i ( t ) + N i ( t ) + S i ( t ) , (1) where the N i ( t ) represent the uncorrelated random noise, S i ( t ) is a p oten tial signal of attack or failure (correlated b e- t ween sensors) whic h is absent during normal op erations, and Y i ( t ), which w e call the trace, describ es the idealized operation of the system without noise. When the system is attack ed or exp eriences a fault the aﬀected parts of the system U ⊂ V are expected to mo v e aw a y from the trace, S i ( t ) 6 = 0 for i ∈ U . W e are interested in those cases when the signal is nonzero for a signiﬁcant subset of sensors. It ma y o ccur that for each individual sensor the failure sig- nal is not directly observ able, but that it can b e detected and becomes statistically signiﬁcant when the subset of af- fected sensors are tak en into accoun t collectively . In these cases, the diﬀerences b etw een the trace and the corresp ond- ing observ ations will b ecome related. In other w ords, since the S i ( t ) v alues corresponding to a particular disturbance ev ent are likely to b e correlated, we exp ect that the correla- tion relations will b ecome apparent in the detrended signals X i ( t ) − Y i ( t ) if the signal (e.g. attac k or failure) o ccurs at t = τ and lasts for T time steps. Our goal is to construct a suitable correlation matrix out of these time series which will enable the detection and lo calization of the undesirable c hanges in system state. 2.1 Detrending the Signals Unfortunately , the traces Y i ( t ) are a priori unkno wn. In some cases they can be learned from an ensemble of rep eat- ing op erations under normal behavior, but here w e assume that this data migh t b e una v ailable. Thus we approximate the traces with a running mean, ¯ X i ( t ) := 1 τ av t + τ av / 2 X t 0 = t − τ av / 2 X i ( t 0 ) , (2) cen tered at t . This is a reasonable assumption if the traces Y i are fairly smooth: in this case, ˆ X i ( t ) are smo othed using the p oin ts X i ( t 0 ) for t 0 = t − τ av / 2 to t + τ av / 2. This will not b e a go od assumption if the system c hanges mo des of operation or otherwise undergo es rapid changes within the in terv al [ t − τ av / 2 , t + τ av / 2]. Note that although the use of the cen tered running mean requires the knowledg e of the signal in the future, it pro- duces b etter results with resp ect to the approach where the trailing m ean is emplo yed. (A cen tered roll ing mean appro x- imates the trace with a linear function, while the trailing mean approximates the trace with a constant function.) At the same time, an online detection algorithm based on the cen tered mean will hav e a time-lag of τ av / 2. There is hence a trade oﬀ b etw een the quality of approximation and the speed of detection. It seems in tuitive that the choic e of smaller τ av w ould in tro duce a smaller time-lag, and thus would lead to better results. On the other hand, τ av should b e large enough to a verage out the small ﬂuctuations caused by the terms N i ( t ). A similar argumen t implies that τ av should b e chosen to be close in size to the exp ected duration of an attac k or fault signal one w ould like to b e able to detect: if τ av is m uch larger than this scale, the signal will be lik ely to be a veraged out. In practice, there is often a range of reasonable c hoices for the length τ av of the sliding window ; one should c ho ose the one which satisﬁes the requirements on a desired maxim um time-lag of detection. 2.2 Construction of the Correlation Matrix W e calculate correlation matrices from the residuals (an example is depicted in Figure 1) of the detrended data streams R i ( t ) := X i ( t ) − ¯ X i ( t ) . (3) A t this point, one more parameter, the time in terv al τ corr o ver which correlations are calculated, must b e chosen. Ide- ally , this time window should b e at least as large as the duration of the even t we w ould like to detect. This time length, in general, is application dep enden t; t ypically , we are interested in the time scales which are a lo w multiple of τ av . Th us if the correlation windo w is determined to b e of length τ corr , w e calculate the Pearson correlation coeﬃcient for eac h pair, ξ ij ( t ) := P ( R i ( t 0 ) − µ i,t ) ( R j ( t 0 ) − µ j,t ) q P ( R i ( t 0 ) − µ i,t ) 2 q P ( R j ( t 0 ) − µ j,t ) 2 , (4) where eac h sum is tak en from t 0 = t − τ corr to t and µ i,t := 1 τ corr t X t 0 = t − τ corr R i ( t ) . (5) This gives us the desired correlation matrix M ij ( t ) = ξ ij ( t ) at eac h time instance. W e are not interested in detecting the self-correlations whic h are trivially equal to one, so we put b y deﬁnition ξ ii ( t ) = 0 ∀ i ∈ V . 10:00 11:00 12:00 10:30 11:30 t − 0 . 1 0 . 0 0 . 1 R (a) 10:00 11:00 12:00 10:30 11:30 t − 0 . 1 0 . 0 0 . 1 R (b) 10:00 11:00 12:00 10:30 11:30 t − 0 . 1 0 . 0 0 . 1 R (c) Figure 1: Three residuals from a typical signal stream. Signal (a) has S ( t ) = 0 while signals (b) and (c) hav e correlated S ( t ) 6 = 0 due to an attack or failure. The attack starts at appro ximatly 11:00 and some correlation can b e observ ed b et w een (b) and (c). The goal is to ﬁnd and iden tify suc h correlated signals among the man y recorded signals. With our setup under normal operations, when the data streams can be mo deled as in Equation (1) with S i ( t ) = 0, w e exp ect the detrended data streams to be uncorrelated, ∀ i 6 = j, E [ ξ ij ( t )] = 0 . (6) Ho wev er, during an attac k or failure we exp ect there to be a set of sensors U ⊂ V such that S i ( t ) 6 = 0 for i ∈ U , and hence ∀ r 6 = s, r, s ∈ U, E [ ξ rs ( t )] = σ rs > 0 , (7) since the non-zero signals S i ( t ) , i ∈ U of the attac k are supposed to hav e a similar b ehavior. 3. DETECTION AND LOCALIZA TION OF ANOMALOUS SUBMA TRIX In this section, we presen t a proto col for detecting and lo- calizing a group of anomalously b eha ving devices within the ph ysical netw ork. F ormulating the problem in the frame- w ork of submatrix lo calization, the detection step is done b y monitoring the sp ectral gap in the correlation matrix spectrum. This metho d is universal and does not require an y prior assumptions on the form of the noise and on par- ticular normalization of the correlation matrix. W e explore three approac hes to the localization of the anomalous n o des: sparse PCA based on a lo w-rank appro ximation, and t w o bi- clustering metho ds for ﬁnding a submatrix with an elev ated mean v alue. 3.1 Detection of Anomalous Submatrix Under normal conditions and low noise, the correlation matrix of the ph ysical system migh t con tain some structu ral information ab out the topology of the system. F or instance, w e can exp ect comm unities represen ting common functional roles or spatial lo cations of devices to hav e strong corre- lations. All other matrix elemen ts should app ear as noisy and uncorrelated v alues ﬂuctuating around zero. When an anomaly occurs under the assumptions of Section 2 with a strong enough signal, one should witness the emergence of one single submatrix with a higher mean v alue. As in the problem of detecting a single comm unit y in a graph [15], the c hange in the correlation matrix induced by the anomalous signal should b e also visible in the sp ectrum of the corre- lation matrix. In the ideal case, if the comm unit y is large enough, there is a sp ectral gap b etw een the ﬁrst and the second largest eigen v alues, and in addition, the principle eigen vector contains information ab out the lo cation of the comm unity . W e use the idealized case to gain intuition ab out the behavior of the real world system. This intuition for the correlation matrices constructed from the real signals comes from rigorous analysis for ideal noise, whic h also illustrates the concept of a “suﬃciently strong sig- nal” used abov e. As an example, consider a rank-1 matrix with eigenv alue θ , P = θ uu T , and suppose that w e observ e this matrix corrupted by a noise taking the form of a nor- malized N × N Gaussian Wigner matrix W , with zero-mean elemen ts and v ariance of the oﬀ-diagonal elements equal to 1 / N 2 . It is well kno wn that the sp ectrum of W con v erges to the semi-circle law with support [ − 2 , 2]. Let us denote the largest eigen v alue asso ciated with the measurement matrix P + W as λ 1 , and the corresponding eigen vector as u 1 . De- pending on the “signal strength” θ , the v alues of the largest eigen v alue and eigen vector of P + W undergo a phase tran- sition [2]. If θ > 1, then in the large N limit λ 1 → 1 + 1 /θ is clearly separated from the bulk, and |h u, u 1 i| → 1 − 1 /θ 2 . In the opposite case θ ≤ 1, λ 1 → 2 and the associated eigen vec- tor do es not carry any useful information, b eing completely degraded b y the noise, with |h u, u 1 i| → 0. Similar results hold for the case of multiplicativ e noise. In a t ypical real-world situation, the sp ectrum of the cor- relation matrix in the presence of an anomalously correlated group of devices has a form presented in the main part of Figure 2. There is a clear gap, separating tw o largest eigen- v ectors λ 1 and λ 2 , and the nonzero v alues of eigen v alues λ i for i ≥ 2, sorted b y the ord er of magnitude, is en tirely due to the noise. In the case of a weak signal, how ev er, the picture can be similar to the inset of Figure 2, where the presence of the sp ectral gap ∆ 1 = λ 1 − λ 2 does not seem to b e so ob vious. The imp ortan t question is ho w to decide whether the gap is statistically signiﬁcan t. The c hallenge here is that we do not assume any prior information on the statistics of the trace and on the noise distribution; this setting has not been well studied in the literature so far. T o address this question, w e suggest the follo wing detection criterion. Let ∆ i = λ i − λ i +1 be the collection of spacings betw een suc- cessiv e eigen v alues of the correlation matrix. F ollowing the assumption that the nonzero v alues of all eigenv alues but the largest one are en tirely due to a random noise, we can empirically estimate the corresp onding characteristic noise scale as δ = s 1 N − 2 X 1 ∆ 2 + δ. (9) W e count the opposite case as an absence of detection. The v alidit y of this detection criterion will b e c heck ed in the Section 4 inv olving real data examples. n λ n 0 Δ 1 δ Δ 2 0 Δ 1 Δ 2 δ st r ong si gnal weak s igna l Figure 2: A represen tation of a typical spectrum of a real-w orld correlation matrix in the presence of an anomaly (main ﬁgure) and with a weak anomalous signal (inset). In the ﬁrst case, the condition (9) is satisﬁed, and hence w e consider the outcome of the detection te st as positive. In the case of w eak signal, the level of noise do es not allow us to conclude that an anomalous comm unity of devices is present. 3.2 Localization Using the Low-rank Appr ox- imation Once the detection certiﬁcate presented in Subsection 3.1 yields a positive result, the next step is to localize the anoma- lously correlated elements of the system. The K communi- ties detection problem is often addressed using the low rank appro ximation [10]. In our case, a signiﬁcant sp ectral gap ∆ 1 indicates that the hidden matrix can b e lo calized by looking at the b est rank 1 approximation c M of the initial matrix M , c M = arg min c M k M − c M k F s.t. rank( c M ) = 1 , (10) where k · k F is the F robenius norm. The solution to this problem is well-kno wn and is given by the singular v alue decomposition (SVD) of the matrix M , from whic h we re- tain only the leading singular v alue σ and the corresp onding singular v ector q [14]: c M = σ q q T . (11) Unfortunately , in general the resulting v ector q is not sparse, whic h does not allow us to iden tify the location of the anoma- lous no des. Ideally , for detecting a group containing k anoma- lous no des, we would like to obtain a vector with only k nonzero components, indicating their p ositions; this prob- lem is often referred to as sparse PCA [11]. While under a general lo w-rank assumption this problem is NP-hard, for the special case of rank 1 it can be solv ed analytically simply b y sorting the elemen ts of q , and retaining only k larg est ele- men ts [28, 37], resulting in a k -sparse vector that we denote as q k . The constant in the expression for c M is then simply giv en by σ k = q T k M q k . Another diﬃcult y comes from the fact that a priori we do not know the size of the anomalous mo dule. Sometimes, in order to ﬁnd the optimal v alue of k , the so-called elb ow method can b e used [35]. The idea is fairly simple; ﬁnd the minimal k such that the quality of approximation ε k ≡ k M − σ k q k q T k k F is not increased “to o muc h” when we make a step from k to k + 1. More precisely , the optimal k is given b y the minimal k such that ε k − ε k +1 < , (12) where  is some small constant, and the only parameter of the algorithm. The total complexity of the method is dom- inated by the complexity of the SVD-decomp osition and is O ( N 3 ) in the most general case. W e exp ect the nonzero v alues of q k for the optimal k to indicate the lo cation of the no des pro ducing anomalous cor- relations. Ho wev er, in the examples in volving real data, the cusp on the elbow diagram migh t be not very pronounced in hard cases (see Figure 3 for an example), therefore, in prac- tice it can b e unclear how to select an appropriate  and hence how to apply the condition (12). At the same time it should b e noted that at the end of the day w e are not necessarily in terested in inferring the whole set of anoma- lous no des, but rather in understanding the cause of the anomaly . In this sense, one can choose to infer only a subset of anomalous sensors, but requiring a high level of conﬁ- dence for this localization task; then the idea is to search for a subset of k ∗ strongly correlated no des. How ever k ∗ can not be arbitrary small. Indeed, even in the idealized case there exist a practically achie v able low er b ound on the size of detectable comm unit y [20, 12] k & √ N . That is why the ﬁnal suggested strategy consists in searc hing for a subset of most correlated sensors of size k ∗ = √ N , and then in ana- lyzing the corresponding group of devices using the tag data for determining the cause of the anomaly . This approac h will b e used in our experimental tests in Section 4, where an empirical evidence for the algorithmic failure in detection of comm unities of v ery small size will be presented. k ε k ide al el bow r e al da ta el bow k* Figure 3: An example of an ideal and real-world el- b o w diagram. In the case of relativ ely weak signals, the elbow plot produced from the real data does not ha ve a pronounced cusp, whic h mak es the iden tiﬁ- cation of the optimal size of the group hard. 3.3 Localization via Biclustering Methods In this part we discuss tw o eﬃcient algorithms for lo- calization of the anomalous subgraph of the physica l net- w ork, whic h do not explicitly use the rank 1 assumption, but instead attempt to ﬁnd a k × k submatrix with an ele- v ated mean. The ﬁrst one, called Large Av erage Submatrix ( LAS ), has b een introduced in [31] and analyzed in Ref. [3], and consists in consecutiv e up dates of k rows and k columns, starting from a random k × k submatrix and repeating the updates un til a gua ranteed con v ergence to a lo cal ma ximum, meaning that the resulting submatrix can not b e improv ed b y c hanging only its column or ro w set. A recen tly intro- duced improv ed version of this algorithm, analysed in [16] and named Iterativ e Greedy Procedure ( I G P ) follo ws a sim- ple greedy sc heme: starting by one randomly chosen ro w, w e add the b est columns and rows sequentially until a k × k submatrix is recov ered. This algorithm outputs a prov ably better results, at least in the case of large Gaussian ran- dom matrices. In what follo ws, w e test the p erformance of these algorithms on a real data set as a part of the lo caliza- tion pro cedure for ﬁnding the anomalously b ehavin g group of nodes. In order to get the best resulting submatrix, we use a m ulti-start pro cedure, initializing both algorithms L times for given k , and retain the most signiﬁcan t submatrix. As before, the size of the hidden subgraph k is unkno wn. In this case, again, we use k ∗ = √ N in order to ﬁnd a smaller sub- matrix, represen ting the no des whic h b elong to the anoma- lous group of devices. The proposed metho d is summarized in Algorithm 1. The complexity of the Algorithm 1 is domi- nated by the complexit y o f the localization step, and is equal to O ( N 3 ) for the lo w-rank algorithm, to O ( I LN ln N ) for LAS and to O (2 k ∗ LN ln N ) for I G P , where I is the num- ber of iterations needed for conv ergence of the LAS scheme ( I . 1000 for practical cases described here), and L & 10 3 is the num ber of w arm starts that we use in biclustering algorithms to ac hieve a desired precision of the best lo cal maxim um. Algorithm 1 Detection and localiza tion of f aul ts Input: N time series { X i } i ∈ V , recorded in real time Correlation matrix: compute { R i ( t ) } i ∈ V and M = { ξ ij ( t ) } as described in Section 2. Detection: chec k for the condition (9) ∆ 1 > ∆ 2 + δ . if p ositiv e detection then Lo calization: apply lo w-rank or biclustering algo- rithms on M , and infer a subset of k ∗ anomalous no des Iden tiﬁcation: using the label data, infer the common cause of the failure end if If the tag data (sensor lab els) and/or additional top olog- ical information is a v ailable, one should be able to infer a possible cause of the failure by lo oking at the common fac- tor uniting the selected nodes. In most cases, the selected basic devices are coupled to a single functional mo del or to a particular con troller which migh t b e at the origin of the fault and requires additional insp ection. 3.4 T ests with synthetic data Prior to running tests on a real-w orld platform (next sec- tion), we examine the detection procedure on artiﬁcially- generated signals consisting of a mixture of correlated and uncorrelated one-dimensional random walks. In this ide- alized situation, we generate N = 900 artiﬁcial signals as one-dimensional random w alks starting from zero. W e select k 0 = 50 of them to be correlated and to represen t an anoma- lous subgroup w e would like to detect and identify . Uncor- related random walk s are lazy . With probabilit y p 0 = 0 . 9, the position at time X i ( t + 1) remains unchanged with re- spect to the previous time step X i ( t ), and with probabilit y p ± = 0 . 05 tw o p ositions separated by one time step sat- isfy X i ( t + 1) = X i ( t ) ± 1. Co rrelated random walks are constructed as follo ws: they are related to one of the ran- dom walks (called the master random w alk), at each time step indep enden tly rep eating the step of the master random w alk with probability ρ = 0 . 5, and otherwise b eha ving as an uncorrelated random walk. Let us now sho w the performance of Algorithm 1 on this artiﬁcial signal ensemble. First, we detrend the data and construct the correlation matrix M in the w ay describ ed in Section 2; w e c ho ose τ corr = 200, and the running mean is tak en o v er the windo w τ av = 10 time steps. The spectrum of M is presented in Figure 4 and triggers a p ositive detection according to the criterion (9). Next, we run the localization algorithms presen ted in Sec- tions 3.2 and 3.3. W e ﬁnd that for k ∗ = √ N = 30, all algorithms p erfectly identify a subgroup of 30 correlated signals. If w e c ho ose to searc h the correlated group with the (unknown) ground truth size k 0 = 50, then the lo w- rank appro ximation approach misidentiﬁes 5 signals, cor- rectly counting the other 45 as correlated. Both bicluster- ing metho ds make only one mistake in this case; ho wev er, it requires a rather large num b er of warm starts ( L ' 3 · 10 4 ) in order to con verge to the b est solution, whic h makes the algorithm slightly slo wer compared to the SVD-based one. As we will see in the next section, the speed of conv ergence is a very important prop ert y for online deploymen t of the 10 0 10 1 10 2 10 3 −10 0 10 20 30 40 50 Δ 1 n Figure 4: The sp ectrum (on a semi-log scale) of the correlation matrix M constructed from the total of N = 900 artiﬁcially-generated signals, including k 0 = 50 correlated w alks. The correlated group of signals pro duces an identiﬁable gap ∆ 1 in the eigen v alue sp ectrum. algorithm. 4. EXPERIMENTS WITH REAL D A T A 4.1 System Description Large commercial air conditioning (AC) systems repre- sen t an attractiv e cyber-physical test case for fault detection and localization algorithms b ecause they contain relativ ely sophisticated ph ysical, control and communications archi- tectures, and the av ailable tag data can serv e as a ground truth for discov ered groups and mo dules. W e collected and analyzed the data streams from the AC system in a 30 000 m 2 oﬃce building, with ab out 900 sensors located in the conditioned spaces. These sensors record local tempera- ture, airﬂow and v alve opening p ositions. See Figure 5 for a schematic representation of the system used in this study , whic h shares a common structure with a large num b er of commercial AC systems. A more in depth discussion of this A C lay out is pro vided in the references [1, 18]. Altogether this constitutes a system of approximately 1000 data het- erogeneous data streams, sampled once p er minute. The v ariable air volume (V A V) units represent the air in- lets to the co oled spaces, containing v alv es that regulate the c hilled air ﬂowing to the conditioned space. Diﬀerent V A Vs spatially close to each other are connected to a common air handling unit (AHU). A pressure sensor at the fans output pro vides an input to to a local con trol lo op that regulates the electrical fan p o wer to ﬁx the fan pressure output. A net work representation of a part of the physical system in- cluding conditioned spaces, fans and controllers is drawn in Figure 6; this data has b een extracted from the tag data accompan ying the recorded signals. This ﬁgure tak es in to accoun t the spatial lay out of conditioned ro oms, and giv es an idea of physical and communication links in the system. Due to a co nﬂict of local con trol loops, one of the fans (F an 6 in Figure 6) in this building is b eha ving anomalously: at certain times of the da y it pro duces mild uncon trolled os- cillations. Although this action is not a result of a cyb er 4 A HU uni ts 300 V A V units Figure 5: A schematic representation of air condi- tioning (A C) system used in this w ork. The AC sys- tem includes tw o sets of lo ops: a w ater lo op circu- lating w ater betw een the chiller and the air-to-water heat exc hangers, and the air loops, where the fans in the air handling units (AHU) force the w arm return air through the heat exc hangers, and the cooled air is then delive red to the v ariable air volume (V A V) units. Thermostats (T) throughout the system pro- vide input to the con trollers that regulate the air ﬂo ws supplied to the V A Vs. The recorded tem- p erature, airﬂo w and v alve op ening position signals from all the sensors and fans are used as input data streams to our fault detection and localization algo- rithm. attac k, it represents a p erfect initial test for the proto col aiming at detection and lo calization of failures: we expect that these oscillations should lea v e a signature in the cor- relations of related physical signals, while the signal is to o w eak to be visible and identiﬁed as an outlier in individual recorded signals. This anomalous b ehavio r in the system is a pro xy for attac ks of the con trol arc hitecture that can o c- cur due to vulnerabilities of the cyb er part of the netw ork. First, we demonstrate the p erformance of our detection cer- tiﬁcate, using the describ ed F an 6 oscillations as a failure ev ent that w e would like to detect and identify . At a sec- ond stage, we perform controlled experiments mimic king a simple intrusion on a smaller subset of devices in order to test the p erformance limits of the detection and lo calization algorithms as a function of the size of the anomalous set. 4.2 Detection Algorithm Perf ormance In Figure 7, we sho w examples of our data stream. The left plot of Figure 7 shows an anomalous b eha vior of F an 6, and three examples of temp erature measuremen ts in three conditioned spaces, tw o of whic h are serviced by F an 6, and one being unrelated. The righ t plot sho ws examples of other signals of diﬀeren t types (airﬂo w and v alv e p ositions) that w e use for tests. The analysis of individual signals do not allow us to detect an anomalous b eha vior and to relate it to the malfunctioning F an 6, and therefore w e follo w the procedure physi ca l s ens or s cont r ol l inks B CU 1 B CU 2 B CU 3 B CU 4 F A N 4 F A N 5 F A N 6 F A N 7 F1 F2 F3 F4 F5 F6 F7 C OMM UNIC A TION NE TWORK PHY SIC A L NE T W OR K Figure 6: Netw ork represen tation of a part of the cyb er-ph ysical system considered in this w ork. The net work reﬂects the spatial organization of the con- ditioned spaces, and includes a part of both physical and con trol links. F an 6 is the anomalously behaving unit of the system. described in Section 2, constructing the correlation matrix and attempting to detect the anomaly from correlations of ph ysical signals. Let us ﬁrst demonstrate the p erformance of the detection algorithm, described in Section 3.1. In Figure 8, we sho w the sp ectra of the correlation matrices M in four diﬀerent situations: i) F an 6 oscillating, and all signals included; ii) F an 6 oscillating, and signals serviced by F an 6 remov ed from the data; iii) F an 6 not oscillating, all signals included; iv) F an 6 oscillating smo othly with a large perio d (on the order a half a da y). It is clear that only case i) should trigger a p ositiv e detection outcome. Indeed, we notice that only the sp ectrum in this case satisﬁes the condition (9), while all other situations yield a negative detection result. The matrix M in each case has been constructed using the parameters τ av = 30 min and τ corr = 200 min. 4.3 Localization Algorithm Perf ormance Once the presence of anomaly is detected, we compare the performance of lo calization algorithms. Is it possible to correctly identify the group of no des related to the anoma- lous fan, and hence to infer the reason of misb eha vior? T a- bles 1 and 2 demonstrate lo calization results for tw o v alues of group sizes. The ground truth k 0 = 209, whic h is in gen- eral unknown , and for k ∗ = 30 strongest signals. W e follow the strategy outlined in Sections 3.2 and 3.3 and use diﬀer- en t combinations of the smo othing windo w time τ av and the correlation time window τ corr . As discussed in Section 2, little relev an t information is captured with small τ av , and indeed we ﬁnd that τ av = 10 do es not lead to a p ositiv e detection, see T able 1. The b est results are obtained for larger v alues of τ av , where more data is incorporated in the correlation matrix. One of the ma jor requirements for the algorithms is the abilit y to p erform online detection and lo calization. New data p oints arrive every minute, so we would like the lo- calization algorithms to conv erge in several seconds. The lo w-rank algorithm is v ery fast, and do es not need any ad- justmen ts. As discussed in the previous section, in order 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 T ime 0 10 20 30 40 50 60 % Ou t pu t F an 6 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 T ime 22 23 23 24 ◦ C T emperature 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00 T ime 100 200 300 f t 3 / s Air Flo w 09:00:00 10:00:00 11:00:00 12:00:00 13:00:00 14:00:00 15:00:00 16:00:00 17:00:00 t 25 35 45 55 d e gr ees V alv e Opening Figure 7: F an 6 oscillations create anomalous data measuremen ts in rooms that are serviced b y that fan. Changes in output of F an 6 can b e seen in the temp erature, air ﬂow, and v alv e op ening p ositions in Ro om 1 (red) and Ro om 2 (green) V A V measure- men t data but not in Ro om 3 (blue) data. Ro om 1 and 2 are serviced by F an 6 but Ro om 3 is not. 1 0 0 1 0 1 1 0 2 1 0 3 i 0 10 20 30 40 50 60 λ i ∆ 1 ∆ 2 (a) 1 0 0 1 0 1 1 0 2 1 0 3 i 0 5 10 15 20 25 30 ∆ 1 ∆ 2 (b) 1 0 0 1 0 1 1 0 2 1 0 3 i 0 10 20 30 40 ∆ 1 ∆ 2 (c) 1 0 0 1 0 1 1 0 2 1 0 3 i 0 10 20 30 40 50 60 ∆ 1 ∆ 2 (d) Figure 8: Sp ectra (in the semi-log scale) of the corre- lation matrix M for diﬀeren t scenarios. Oscillations of F an 6 o ccur: (a) related signals included, (b) re- lated signals excluded. All signals included when (c) F an 6 do es not oscillate and (d) F an 6 oscillates, but with smo othly with a large p eriod. Only the sp ec- trum (a) satisﬁes the detection condition (9) , as it should b e. to meet the computation complexity requiremen t for the bi- clustering algorithm we are forced to limit the n umber of w arm starts to 1000 for the size k 0 = 209 and to 10000 for k ∗ = 30 since the con vergen ce time of bic lustering procedure gro ws with k . Another imp ortan t property of the bicluster- ing metho ds is that unlike in the lo w-rank approximation, the iden tities of the disco v ered columns do not alw ays matc h the iden tity of the discov ered ro ws; we use only one of the subsets to compute the num ber of mismatches. With these restrictions, the three algorithms produce sim- τ av Detection Num b er of false p ositiv es Lo w-rank LAS I G P k ∗ k 0 k ∗ k 0 k ∗ k 0 10 7 27 169 26 144 25 149 30 3 0 123 0 112 0 115 50 3 0 106 0 107 0 108 T able 1: Performance of diﬀerent localization algo- rithms as a function of τ av in the presence of F an 6 activit y . There are k 0 = 209 heterogeneous streams serviced by F an 6, out of N = 974 total signals. The table demonstrates the n um b er of mismatc hes (false detections) iden tiﬁed b y the algorithms in the case of searche d groups of sizes k ∗ and k 0 , with k ∗ = 30 . F or all cases, τ corr = 120 min is kept ﬁxed. τ corr Detection Num b er of false p ositiv es Lo w-rank LAS I G P k ∗ k 0 k ∗ k 0 k ∗ k 0 90 3 2 128 2 120 2 122 120 3 0 123 0 112 0 115 160 3 0 112 0 110 0 109 200 3 0 106 0 103 0 104 T able 2: Comparison of the lo calization algorithms under the same conditions as the ones describ ed in T able 1, as a function of τ corr . In this table, τ av = 30 min is k ept ﬁxed. ilar results with a comparable sp eed (under 3 seconds for lo w-rank algorithm and within 20 − 30 seconds for bicluster- ing in the present case). While only half of the true nodes are disco v ered when searc hing for all of the k 0 anomalous signals, v ery few false p ositiv es o ccur when only searching for the k ∗ strongest signals. The discov ered k ∗ signals in al- most all cases b elong to a subgroup of a true group related to the anomalous fan. This v alue is suﬃcient to determine the common functional role of nodes inside this group, whic h corresponds to their relation to the anomalous F an 6 in this case study . Therefore, all algorithms satisfy the require- men ts of performance, simplicit y and scalabilit y , whic h mak e them appropriate for deplo ymen t in real cyb er-physi cal sys- tems. In the next section, we discuss con trolled experiments whic h would allow us to in v estigate the eﬀect of the size of the anomalous communit y . 4.4 Identiﬁcation limits from controlled exper - iments Previously , we hav e tested the p erformance of the scheme on detecting the fault y b eha vior of F an 6 already present in the system. In this section, w e report results from con trolled experiments on particular sensors of the oﬃce automation system. In their simplest form, these exp erimen ts consisted in a manipulation of temp erature set points, mimicking lo- calized in trusions of small amplitude. The trials w ere con- ducted on the con trollers related to a small n um b er of sensor units on F an 5 (a non-oscillating fan, see Figure 6), while all sensors related to the anomalous F an 6 ha v e b een excluded to a void an undesired in terference. The experiments that w e rep ort here to ok the following form: the temp erature set p oin ts for 10 chosen V A V w ere raised 0 . 5 ◦ F for 30 minutes and then low ered 1 ◦ F for the next 30 minutes. Each V A V contains three sensors measur- ing temp erature, airﬂo w, and v alv e op ening p osition. The experimental intrusions p oten tially aﬀected a total of 30 data streams. Among these 30 data streams of in terest, only 16 sho wed a signiﬁcan t lev el of correlation. There are sev- eral reasons for this behavior, but the most important one consists in the observ ation that the airﬂo w and v alv e open- ing positions ha v e a muc h faster resp onse to the set-point c hange compared to the temp erature measurements which rise or fall on a muc h longer time scale. In the following w e assume that these k 0 = 16 sensors constitute the ground truth for an anomalous group of no des. Using the collected data, we v alidate the c hoice of k ∗ = √ N put forward in Sections 3.2 and 3.3, and used through- out the study of the anomalous sensors related to the F an 6. In particular, w e v erify that if the size of the group k 0 represen ts a suﬃcien tly small fraction of the total n um ber of signals, then it can not b e correctly lo calized. In order to perform this study , we hav e considered 1000 selections of N randomly chosen signals but alw ays containing the k 0 = 16 anomalous no des. W e applied our detection and lo caliza- tion proto col in each case for a range of N . The lo w-rank algorithm was used for localization as w e hav e seen that at these scales it gives the same results with the fastest com- putation time; other localization methods show equiv alen t results. Note that the lo calization procedure w as triggered only when the detection condition (9) w as satisﬁed. 0 . 00 0 . 25 0 . 50 0 . 75 1 . 00 Fraction correct 100% identiﬁcation 100 200 300 400 500 600 Number of signals 0 . 00 0 . 25 0 . 50 0 . 75 1 . 00 Fraction correct 50% identiﬁcation detection localization Figure 9: Empirical probabilit y of successful detec- tion and localization of a group of k 0 = 16 anomalous devices as a function of the total n um b er of signals N . Lo calization is considered as succesfull if all k 0 no des are correctly identiﬁe d (top) and if at least 50% of no des are recov ered (bottom). Eac h p oint is a verage d ov er 1000 random selections of N signals. The results are presented in the Figure 9 with the em- pirical probability of successful detection and localization sho wn as a function of the total num ber of signals N . Two deﬁnitions of success are examined; a full and correct 100% iden tiﬁcation of the ground truth, and a successful lo caliza- tion of at least 50% of the k 0 nodes, i.e. correctly iden tifying at least 8 devices out of 16. F or the 100% identiﬁcaiton case w e ﬁnd a phase transition-t yp e b ehavior as a function of N . The lo calization algorithm starts to fail at some p oin t near N = k 2 0 . This b ehav ior is v ery close to the theoretical bounds derived in the idealized situations of Gaussian and Bernoulli distributions; in particular, it justiﬁes our c hoice for k ∗ in the case where the optimal comm unit y size is un- kno wn. The second case of 50% identiﬁcaiton illustrates that if we allo w for some mistak es in the identiﬁcation of anomalous sensors, then a successful localization o ccurs ev- ery time the detection procedure yields a p ositive result. This pro cedure might b e appropriate if the lab eled netw ork is suﬃcien tly sparse and the common cause of the anomaly can b e easily iden tiﬁed using the sensor lab els even in the case where not all the no des are correctly lo calized. 5. RELA TED WORK Defense of cyb er-ph ysical systems: Metho ds for de- tecting and lo calizing cyber-physical failures and attac ks ha ve attracted signiﬁcant attention [30, 34, 33, 26]. Ma- jor h urdles stem from a high degree of inﬂuence of sensor data from seasonal changes, proximit y correlations and op- erational switches, and from the fact that infrastructure op- erators do not alw ays ha v e an accurate model of the ph ysical net work (the assumption we make in this w ork), or the ex- isting mo dels are not integrated into uniﬁed cyber-physical system mo del [33]. Another imp ortan t factor is an increas- ing size and complexity of the systems under considerations [32]. Some of the previous w orks develop detection tech - niques based on an accurate system mo deling and on ac- coun ting for diﬀerent attac k scenarios [26], which represen ts a completely diﬀeren t approach to the problem compared to the presen t study . Signal detrending: Aiming at general applications, we ha ve used a simple running-me an signal detrending pro ce- dure in Section 2. The goal of detrending an y time series [ X ( t )] τ + T t = τ is to decomp ose the signal into a sup erposition of simpler pieces. There are a wide arra y of detrending meth- ods [5, 4, 13, 8, 17, 22] , and each hav e asso ciated strengths and w eaknesses. These detrending methods assume the time series is stationary whic h is most often ac hieved with a regression-line ﬁt to the observ ed time series. After remo v- ing this trend the res idual time series is ev aluated for station- arit y (i.e. E X ( t ) = E X ( t + τ ) ; τ ∈ N ) using a Dick ey-F uller test [5, 4, 13]. A stationary signal can be further decom- posed by assuming it follo ws a linear auto-regressiv e pro cess [5, 4]. An auto-regressive process is one that supposes the signal at time t is a linear addition of the signal sampled at past time p oints X ( t ) = P i = t − 1 a i X ( i ). Other data-driven approac hes considered for detrending a times series are exp onential -smo othing, for example, the Holt-Win ters metho dology [8]. Exponential and Holt-Win ters smoothing detrend the time series b y assuming the signal at time t is made up of past observ ations weigh ted by a geometrically decreasing parameter α ∈ (0 , 1) suc h that X ( t ) = αX ( t − 1) + (1 − α ) s t − 1 where s t − 1 is the cum u- lativ e sum of past w eighted observ ations [17, 22]. Outliers detection: Anomaly detection is an imp ortant ﬁeld with application to a wide num b er of domains (see [7] for a general survey). A large n umber of metho ds hav e b een suggested, including netw ork [36] and time series [19] spe- ciﬁc tec hniques. A general formulation of the anomaly de- tection problem often takes form of hypothesis testing by considering H 0 (absence of anomaly) versus H 1 (presence of anomaly). In the present w ork, the hypothesis H 1 has been formulated as follows: if the correlation matrix is con- structed and normalized in such a wa y that the normally behaving correlations ﬂuctuate around zero, then there ex- ist a submatrix with elements having a deviating mean [25]. This task is directly related to the problem of ﬁnding hidden cliques and communit y detection in graphs [15]. Optimal denoi sing: Real-w orld c orrelation matrices are noisy , and in general it is not suﬃcien t to w ork directly with the observ ed data. One should dev elop tec hniques for ex- tracting a useful signal from the signal-plus-noise matrix, the procedure also kno wn as denoising whic h app ears in man y mac hine learning [23], signal pro cessing [29] and classiﬁca- tion applications [24]. Moreo ver, in realit y the signal matrix migh t ha ve no special structure, while the form of the noise term is in general unkno wn. Sev eral studies ha v e explored the problem of the eﬀective rank estimation of the signal matrix by optimal thresholding of singular v alues [27, 9]. In this work, we encoun tered a diﬀerent problem of estimat- ing the size of the anomalous submatrix under the rank 1 assumption. 6. CONCLUSIONS In this work we explored a set of metho ds for detection and lo calization of failures in cyb er-ph ysical systems, based on the analysis of correlations b etw een physical time series. The established protocol enables the identiﬁca tion of a group of anomalous sensors and pro vides insight for the lo caliza- tion of the failure source. The dev elop ed detection proce- dure achiev es a num ber of imp ortan t requirements, including lo w computational complexit y and simplicit y of implemen- tation. Our capability to access the cyb er-ph ysical demon- stration system, describ ed in the article, to collect and an- alyze data from this system, and to deploy the presented detection algorithm opens a path forward for future w ork. W e plan to con tinue real-world exp erimen ts whic h will con- sist of manipulating the building control system in a known manner using div erse attac k strategies; this will allo w us to further v alidate the presented methods. Another direction that we intend to explore consists of combining more con- trol comm unication net work data in order to minimize the possibility of false detections and to enhance the quality of failure source localization. These developmen ts are essen tial for conception of algorithms for proportional response and for designing resilient cyb er-physica l netw orks. 7. A CKNO WLEDGMENTS The authors ackno wledge Arth ur Barnes, Gary Go ddard and Hari Khalsa for their help with data collection, and Charles Bordenav e, Mic hael Chertko v, David Gamarnik, Earl La wrence, Sidhant Misra and N. Ra j Rao for fruitful discus- sions. This w ork was funded b y the Department of Energy at Los Alamos National Lab oratory under con tract DE-AC52- 06NA25396 through the Laboratory-Directed Research and Dev elopment Program. 8. REFERENCES [1] I. Beil, I. Hiskens, and S. Backhaus. Round-trip eﬃciency of fast demand resp onse in a large commercial air conditioner. Ener gy and Buildings , 97:47–55, 2015. [2] F. Benayc h-Georges and R. R. Nadakuditi. The eigen v alues and eigenv ectors of ﬁnite, low rank perturbations of large random matrices. A dvanc es in Mathematics , 227(1):494–521, 2011. [3] S. Bhamidi, P . S. Dey , and A. B. Nobel. Energy landscape for large av erage submatrix detection problems in Gaussian random matrices. arXiv pr eprint arXiv:1211.2284 , 2012. [4] G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung. Time series analysis: for e c asting and c ontr ol . John Wiley & Sons, 2015. [5] D. R. Brillinger. Time series: data analysis and the ory , volume 36. Siam, 2001. [6] A. A. C´ ardenas, S. Amin, and S. Sastry . Research c hallenges for the securit y of con trol systems. In HotSe c , 2008. [7] V. Chandola, A. Banerjee, and V. Kumar. Anomaly detection: A survey . ACM Computing Surveys (CSUR) , 41(3):15, 2009. [8] C. Chatﬁeld. The Holt-Winters forecasting pro cedure. Applie d Statistics , pages 264–279, 1978. [9] S. Chatterjee et al. Matrix estimation b y univers al singular v alue thresholding. The A nnals of Statistics , 43(1):177–214, 2015. [10] A. Co ja-Oghlan. Graph partitioning via adaptive spectral techniques. Combinatorics, Pr ob ability and Computing , 19(02):227–284, 2010. [11] A. d’Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. Lanc kriet. A direct formulation for sparse PCA using semideﬁnite programming. SIAM R eview , 49(3):434–448, 2007. [12] Y. Deshpande and A. Montan ari. Finding hidden cliques of size p N/e in nearly linear time. F oundations of Computational Mathematics , 15(4):1069–1128, 2015. [13] D. A. Dick ey and W. A. F uller. Likelihoo d ratio statistics for autoregressive time series with a unit root. Ec onometric a: Journal of the Ec onometric So ciety , pages 1057–1072, 1981. [14] C. Eck art and G. Y oung. The approxima tion of one matrix b y another of lo wer rank. Psychometrika , 1(3):211–218, 1936. [15] S. F ortunato. Communit y detection in graphs. Physics R ep orts , 486(3-5):75–174, 2010. [16] D. Gamarnik and Q. Li. Finding a large submatrix of a Gaussian random matrix. arXiv pr eprint arXiv:1602.08529 , 2016. [17] E. S. Gardner. Exp onen tial smoothing: The state of the art. Journal of F or e c asting , 4(1):1–28, 1985. [18] G. Go ddard, J. Klose, and S. Backhaus. Mo del dev elopment and identiﬁ cation for fast demand response in commercial HV AC systems. Smart Grid, IEEE T r ansactions on , 5(4):2084–2092, 2014. [19] M. Gupta, J. Gao, C. Aggarwa l, and J. Han. Outlier detection for temp oral data: A survey . Know le dge and Data Engine ering, IEEE T r ansactions on , 26(9):2250–2267, Sept 2014. [20] B. Ha jek, Y. W u, and J. Xu. Information limits for reco vering a hidden comm unity . arXiv pr eprint arXiv:1509.07859 , 2015. [21] Y.-L. Huang, A. A. C´ ardenas, S. Amin, Z.-S. Lin, H.-Y. Tsai, and S. Sastry . Understanding the ph ysical and economic consequences of attacks on control systems. International Journal of Critic al Infr astructur e Pr ote ction , 2(3):73–83, 2009. [22] R. Hyndman, A. B. Ko ehler, J. K. Ord, and R. D. Sn yder. F or e c asting with exp onential smoothi ng: the state sp ac e appr o ach . Springer Science & Business Media, 2008. [23] R. Kannan and S. V empala. Sp e ctral A lgorithms . Norw ell, MA, USA: No w Publishers Inc., 2009. [24] V. C. Klema and A. J. Laub. The singular v alue decomposition: Its computation and some applications. Automatic Contr ol, IEEE T r ansactions on , 25(2):164–176, 1980. [25] Z. Ma, Y. W u, et al. Computational barriers in minimax submatrix detection. The A nnals of Statistics , 43(3):1089–1116, 2015. [26] R. Mitchell and I. R. Chen. Mo deling and analysis of attac ks and coun ter defense mec hanisms for cyber ph ysical systems. IEEE T r ansactions on R eliability , 65(1):350–358, March 2016. [27] R. R. Nadakuditi. Optshrink: An algorithm for impro ved low-rank signal matrix denoising b y optimal, data-driv en singular v alue shrink age. Information The ory, IEEE T rans actions on , 60(5):3002–3018, 2014. [28] D. S. Papailiopoulos, A. G. Dimakis, and S. Korokythakis. Sparse PCA through low-rank appro ximations. JMLR: Workshop and Confer enc e Pr o c e e dings , 28(3):747–755, 2013. [29] L. L. Scharf. The svd and reduced rank signal processing. Signal pr o c essing , 25(2):113–133, 1991. [30] L. Sha, S. Gopalakrishnan, X. Liu, and Q. W ang. Cyber-physical systems: A new fron tier. In Machine L e arning in Cyb er T rust , pages 3–13. Springer, 2009. [31] A. A. Shabalin, V. J. W eigman, C. M. P erou, and A. B. Nobel. Finding large av erage submatrices in high dimensional data. The A nnals of Applie d Statistics , pages 985–1012, 2009. [32] I. Shafer, K. Ren, V. N. Bo ddeti, Y. Ab e, G. R. Ganger, and C. F aloutsos. Rainmon: an integrated approac h to mining bursty timeseries monitoring data. In Pr o c e e dings of the 18th ACM SIGKDD international c onfer enc e on Know le dge disc overy and data mining , pages 1158–1166. A CM, 2012. [33] A. B. Sharma, F. Iv an ˇ ci´ c, A. Niculescu-Mizil, H. Chen, and G. Jiang. Mo deling and analytics for cyber-physical systems in the age of big data. A CM SIGMETRICS Performanc e Evaluation R eview , 41(4):74–77, 2014. [34] J. Shi, J. W an, H. Y an, and H. Suo. A surv ey of cyber-physical systems. In Wir eless Communic ations and Signal Pr o c essing (WCSP), 2011 International Confer enc e on , pages 1–6. IEEE, 2011. [35] R. L. Thorndike. Who b elongs in the family? Psychometrika , 18(4):267–276, 1953. [36] Y. Zhang, N. Meratnia, and P . Ha vinga. Outlier detection techniques for wireless sensor netw orks: A surv ey . Communic ations Surveys & T utorials, IEEE , 12(2):159–170, 2010. [37] Z. Zhang, H. Zha, and H. Simon. Lo w-rank appro ximations with sparse factors I: Basic algorithms and error analysis. SIAM Journal on Matrix Analysis and Applic ations , 23(3):706–727, 2002.

Detection of Cyber-Physical Faults and Intrusions from Physical Correlations

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment