Discovering Graphical Granger Causality Using the Truncating Lasso Penalty

Components of biological systems interact with each other in order to carry out vital cell functions. Such information can be used to improve estimation and inference, and to obtain better insights into the underlying cellular mechanisms. Discovering…

Authors: Ali Shojaie, George Michailidis

Discovering Graphical Granger Causality Using the Truncating Lasso   Penalty
Disco v ering Graphical Granger Causalit y Using the T runcating Lasso P enalt y Ali Sho jaie and George Mic hailidis Departmen t of Statistics, Universit y of Mic higan Abstract Comp onen ts of biological systems interact with eac h other in order to carry out vital cell functions. Suc h information can b e used to impro v e estimation and inference, and to obtain b etter insights into the underlying cellular mec hanisms. Disco vering regulatory interactions among genes is therefore an im- p ortan t problem in systems biology . Whole-genome expression data ov er time pro vides an opportunity to determine how the expression levels of genes are affected b y changes in transcription levels of other genes, and can therefore b e used to discov er regulatory interactions among genes. In this pap er, we prop ose a no vel penalization metho d, called trunc ating lasso , for estimation of causal relationships from time-course gene expression data. The proposed penalty can correctly determine the order of the un- derlying time series, and impro ves the p erformance of the lasso-type estimators. Moreov er, the resulting estimate pro vides information on the time lag betw een activ ation of transcription factors and their effects on regulated genes. W e provide an efficient algorithm for estimation of model parameters, and show that the proposed method can consisten tly discov er causal relationships in the large p , small n setting. The p erformance of the prop osed mo del is ev aluated fa vorably in simulated, as w ell as real, data examples. The proposed truncating lasso method is implemented in the R-pack age grangerTlasso and is a v ailable at www.stat.lsa.umich.edu/ ∼ sho jaie . 1 In tro duction A critical problem in systems biology is to disco ver causal relationships among comp onents of biological sys- tems. Gene regulatory netw orks, metabolic net w orks and cell signalling net works capture causal relationships in cells. Discov ery of causal relationships ma y be only p ossible through carefully designed exp erimen ts, which can b e challenging. Ho w ever, gene regulation is carried out b y binding of protein pro ducts of transcription factors to cis -regulatory elemen ts of genes. Suc h regulatory mechanisms are evident if the expression lev els of gene X is affected by changes in expression levels of gene Y . Therefore, time course gene expression data can be used to disco ver causal relationships among genes and construct the gene regulatory net work. Differen t metho ds hav e b een developed to infer causal relationships from time series data, including dynamic Bay esian Netw orks ( Murphy , 2002 ) and Granger causalit y ( Granger , 1969 ). In dynamic Ba yesian Net works (DBNs) the state space of Bay esian Netw orks is expanded by replicating the set of v ariables in the netw ork b y the num b er of time p oints. Cyclic net works are then transformed to directed acyclic graphs (D AGs) b y breaking down cycles into in teractions b etw een v ariables at t wo differen t time p oints. Ong et al. ( 2002 ) and P errin et al. ( 2003 ) among others ha ve applied DBNs to infer causal relationships among comp onen ts of biological systems. On the other hand, the concept of Granger causality states that gene X is Granger-causal for gene Y if the autoregressive mo del of Y based on past v alues of both genes is significan tly more accurate than the mo del based on Y alone. This implies that c hanges in expression levels of genes could b e explained by expression lev els of their transcription factors. Therefore, statistical metho ds can b e applied to time-course gene expression observ ations to estimate Granger causality among genes. Exploring Granger causality is closely related to analysis of vector autoregressiv e (V AR) mo dels, whic h are widely used in econometrics. Y amaguchi et al. ( 2007 ) and Opgen-Rhein and Strimmer ( 2007 ) employ ed 1 V AR mo dels to learn gene regulatory netw orks, while F ujita et al. ( 2007 ) prop osed a sparse V AR mo del for b etter performance in cases when the num b er of genes, p is large compared to the sample size, n . Similar sparse models hav e also b een considered b y Mukhopadhy a y and Chatterjee ( 2007 ). Zou and F eng ( 2009 ) compared the p erformance of DBNs and Granger causality metho ds for estimation of causal relationships and concluded that the p erformance of the tw o approaches dep end on the length of the time series as w ell as the sample size. The findings of Zou and F eng ( 2009 ) emphasizes the need for sparse mo dels in cases where the sample size is small. In particular, when p  n , p enalized methods often provide b etter prediction accuracy . Arnold et al. ( 2007 ) applied the lasso (or ` 1 ) p enalty to discov er the structure of graphical mo dels based on the concept of Granger causality and studied the relationship b etw een differen t k ey p erformance indicators in analysis of sto c k prices. Asymptotic and empirical p erformances of the lasso p enalt y for disco very of graphical mo dels ha ve b een studied b y many researchers and a num b er of extensions of the original p enalty hav e b een prop osed (we refer to these v arian ts of the lasso p enalty as “lasso-type” penalties). In particular, to reduce the bias in the lasso estimates, Zou ( 2006 ) proposed the adaptiv e lasso penalty , and show ed that for fixed p , if appropriate w eights are used, the adaptive lasso p enalty can achiev e v ariable selection consistency even if the so-called irr epr esentability assumption is violated. In fact, it can also b e shown that if initial weigh ts are derived from regular lasso estimates, the adaptive lasso penalty is also consisten t for v ariable selection in high dimensional sparse settings ( Sho jaie and Mic hailidis , 2010b ). The lasso estimate of the graphical Granger mo del may result in a mo del in which X is considered to influence Y in a num b er of different time lags. Suc h a mo del is hard to interpret and inclusion of additional co v ariates in the mo del may result in po or mo del selection performance. Lozano et al. ( 2009 ) ha ve recently prop osed to use a group lasso p enalt y in order to obtain a simpler Granger graphical mo del. The group lasso p enalt y takes the a v erage effect of X on Y o ver differen t time lags and considers X to be Granger-causal for Y if the av erage effect is significan t. How ever, this results in significant loss of information, as the time difference b et w een activ ation of X and its effect on Y is ignored. Moreov er, due to the av eraging effect, the sign of effects of the v ariables on eac h other can not b e determined from the group lasso estimate. Hence, whether X is an activ ator or a suppressor for Y and/or the magnitudes of its effect remain unknown. In this pap er, we prop ose a no v el trunc ating lasso p enalty for estimation of graphical Granger models. The prop osed penalty has tw o main features: (i) it automatically determines the order of the V AR model, i.e. the n umber of effective time lags and (ii) it p erforms mo del simplification by reducing the num b er of co v ariates in the mo del. W e prop ose an efficient iterative algorithm for estimation of model parameters, pro vide an error-based c hoice for the tuning parameter and prov e the consistency of the resulting estimate, b oth in terms of sign of the effects, as well as, v ariable selection prop erties. The prop osed metho d is applied to simulated and real data examples, and is shown to pro vide b etter estimates than alternativ e penalization metho ds. The remainder of the pap er is organized as follows. Section 2 , starts with a discussion of the use of lasso- t yp e p enalties for estimation of DA Gs as well as a review of the concept of graphical Granger causality . The prop osed truncating lasso p enalty and asymptotic prop erties of the estimator are discussed in section 2.3 , while the optimization algorithm is presented in section 2.5 . Results of sim ulation studies are presen ted in section 3.1 and applications of the prop osed mo del to time course gene expression data on E-coli and human cancer cell line (HeLa cells) are illustrated in sections 3.2 and 3.3 , resp ectively . A summary of findings and directions for future research are discussed in section 4 . 2 Mo del and Metho ds 2.1 Graphical Mo dels and Penalized Estimates of D A Gs Consider a graph G = ( V , E ), where V corresp onds to the set of no des with p elemen ts and E ⊂ V × V is the edge set. The no des of the graph represent random v ariables X 1 , . . . , X p and the edges capture associations amongst them. An edge is called directed if ( i, j ) ∈ E ⇒ ( j, i ) / ∈ E and undirected if ( i, j ) ∈ E ⇐ ⇒ ( j, i ) ∈ E . W e represent E through the adjacency matrix A of the graph, a p × p matrix whose ( j, i ) − th entry indicates 2 whether there is an edge (and its weigh t) betw een nodes j and i . Causal relationships among v ariables are represen ted b y directed graphs where E consists of only directed edges. Let pa i denote the set of p ar ents of no de i and for j ∈ pa i , write j → i . The causal effect of random v ariables in a DA G can b e explained using structur al e quation mo dels ( Pearl , 2000 ), where each v ariable is mo deled as a (nonlinear) function of its parents. The general form of these mo dels is given by: X i = f i (pa i , Z i ) , i = 1 , . . . , p (2.1) The random v ariables Z i are the laten t v ariables representing the unexplained v ariation in eac h node. T o mo del the asso ciation among nodes of a DA G, w e consider a simplification of ( 2.1 ) where f i is linear. More sp ecifically , let ρ ij represen t the effe ct of gene j on i for j ∈ pa i , then X i = X j ∈ pa i ρ ij X j + Z i , i = 1 , . . . , p (2.2) In the sp ecial case, where the random v ariables on the graph are Gaussian, equations ( 2.1 ) and ( 2.2 ) are equiv alen t in the sense that ρ ij are the coefficients of the linear regression mo del of X i on X j , j ∈ pa i . It is kno wn in the normal case that ρ ij = 0 , j / ∈ pa i . F or the case of DA Gs, it can be sho wn that when the v ariables inherit a natural ordering, the likelihoo d function can be directly written in terms of the adjacency matrix of the D AG. It then follo ws that the p enalized estimate of the adjacency matrix can b e found by solving p − 1 p enalized regression problems. T o see this, let X b e the n × p matrix of observ ations and S = n − 1 X T X b e the empirical co v ariance matrix. Then, the estimate of the adjacency matrix of DA Gs under the general w eighted lasso (or ` 1 ) p enalty , is found b y solving the following ` 1 -regularized least squares problems for i = 2 , . . . , p ˆ A i, 1: i − 1 = argmin θ ∈ R i − 1    n − 1 kX i − X 1: i − 1 θ k 2 2 + λ i i − 1 X j =1 | θ j | w ij    (2.3) where A i, 1: i − 1 denotes the first i − 1 elemen t of the i th ro w of the adjacency matrix and w ij represen ts the w eights. F or the lasso p enalty w ij = 1 and in case of adaptive lasso w ij = 1 ∨ | ˜ A ij | − 1 where ˜ A are the initial estimates obtained with the regular lasso p enalty . 2.2 Graphical Granger Causality Let X 1: T = { X } T t =1 and Y 1: T = { Y } T t =1 , b e tra jectories of tw o sto chastic pro cesses X and Y up to time T and consider the following tw o regression mo dels: Y T = AY 1: T − 1 + B X 1: T − 1 + ε T (2.4) Y T = AY 1: T − 1 + ε T (2.5) Then X is said to be Granger-causal for Y if and only if the model 2.4 results in significan t improv ements o ver mo del 2.5 . Graphical Granger models extend the notion of Granger causalit y among tw o v ariables to p v ariables. In general, let X 1 , . . . , X p b e p sto chastic processes and denote b y X the rearrangement of these sto c hastic pro cesses in to a v ector time series, i.e. X t = ( X t 1 , . . . , X t p ) T . W e consider mo dels of the form X T = A 1 X T − 1 + . . . A T − 1 X 1 + ε T . (2.6) In the graphical Granger mo del, X t j is said to b e Granger-causal for X T i if the corresp onding coefficient, A t i,j is statistically significant. In that case, there exists an edge X t j → X T i in the graphical mo del with T × p no des. Suc h a mo del corresponds to a DA G with T × p v ariables, in which the ordering of the set of p -v ariate v ectors X 1 , . . . , X T is determined by the temp oral index and the ordering among the elemen ts of each v ector is arbitrary . Lasso-t yp e estimates of DA Gs can therefore be used in the con text of graphical Granger mo dels in order to estimate the effects of v ariables on each other. The mo del in ( 2.6 ) is also equiv ale n t to V ector Auto-Regressiv e (V AR) models ( L¨ utk ep ohl , 2005 , Chapter 2), whic h ha ve b een used for estimation of graphical Granger causality by a n umber of researc hers, including Arnold et al. ( 2007 ). 3 2.3 T runcating Lasso for Graphical Granger Mo dels Consider a graphical mo del with p v ariables, observed ov er T time p oints, and let d b e the order of the V AR mo del or the effectiv e num b er of time lags (in ( 2.6 ) d = T − 1). As in section 2.1 , let X t denote the design matrix corresponding to t -th time p oint, and X t i b e its i -th column. The truncating l asso estimate of the graphical Granger model is found b y solving the following estimation problem for i = 1 , . . . , p : argmin θ t ∈ R p n − 1 kX T i − d X t =1 X T − t θ t k 2 2 + λ d X t =1 Ψ t p X j =1 | θ t j | w t j (2.7) Ψ 1 = 1 , Ψ t = M I {k A ( t − 1) k 0

0 , p = p ( n ) = O ( n a ) and | pa i | = O(n b ) , wher e sn 2 b − 1 log n = o (1) as n → ∞ . Mor e over, supp ose that ther e exists ν > 0 such that for al l n ∈ N and al l i ∈ V , V ar  X T i | X T − d : T − 1 1: p  ≥ ν and ther e exists δ > 0 and some ξ > b such that for every i ∈ V and for every j ∈ pa i , | π ij | ≥ δ n − (1 − ξ ) / 2 , wher e π ij is the p artial c orr elation b etwe en X i and X j after r emoving the effe ct of the r emaining variables. Assume that λ  dn − (1 − ζ ) / 2 for some b < ζ < ξ and d > 0 , and the initial weights ar e found using lasso estimates with a p enalty p ar ameter λ 0 that satisfies λ 0 = O ( p log p/n ) . Also, for some lar ge p ositive numb er g , let Ψ t = g exp ( nI {k A ( t − 1) k 0 < p 2 β / ( T − t ) } ) (i.e. M = g e n ). Then if true c ausal effe cts diminish over time, 10 (i) With pr ob ability c onver ging to 1, no additional Gr anger-c ausal effe cts ar e include d in the mo del and the signs of such effe cts ar e c orr e ctly estimate d. (ii) With pr ob ability asymptotic al ly lar ger than 1 − β , true Gr anger-c ausal effe cts and the or der of the V AR mo del ar e c orr e ctly determine d. Pr o of. If β = 0, inclusion of the true causal effect, exclusion of incorrect effects and consistency of signs of effects follow from Theorem 3 of Sho jaie and Michailidis ( 2010b ). Since β has no effect on the probability of false p ositive, this prov es (i). F or any giv en β > 0, suppose t 0 is the smallest t for whic h k A ( t − 1) k 0 < p 2 β / ( T − t ). Then for t < t 0 Ψ t = 1 and has no effect on the estimate. Let t ≥ t 0 . Then using the KKT conditions, a co efficient is included in the w eighted lasso estimate only if | 2 n − 1 ( X t j ) T ( X T i − X t θ t ) | > Ψ t λw t j . How ev er, ( X t j ) T ( X T i − X t θ t ) is sto c hastically smaller than ( X t j ) T X T i , which is in turn a p olynomial function of n . On the other hand, λ and w t j are also p olynomial functions of n , whereas Ψ t increases exp onentially as n → ∞ . Hence, for all j = 1 , . . . , p and t ≥ t 0 , there exists an n such that | 2 n − 1 ( X t j ) T ( X T i − X t θ t ) | < Ψ t λw t j and therefore, A t = 0 , t ≥ t 0 . How ever, since the num ber of true causal effects diminish ov er time, the total num b er of true edges in time lags t ≥ t 0 is less than β . This prov es the first part of (ii). Finally , to prov e that the order of V AR is correctly estimated, i.e. d = t 0 − 1, we consider tw o com- plemen tary even ts: d < t 0 − 1 and d > t 0 − 1. Prior to t 0 , false p ositives o ccur with exp onentially small probabilit y , hence, the probabilit y that d < t 0 − 1, is negligible. On the other hand, d > t 0 − 1 only if true edges are not included in ˆ A t 0 and as a result k ˆ A ( t 0 − 1) k 0 < p 2 β / ( T − t 0 ). But false negativ es o ccur if true edges v anish in the adaptive lasso estimate. Ho wev er, adaptive lasso finds the true edges with exponentially large probabilit y , hence, P ( d < t 0 − 1) ≥ 1 − β − O (exp( − cn d )) for constants c and d . This completes the pro of. References Arnold, A., Liu, Y., and Ab e, N. (2007). T emporal causal mo deling with graphical granger metho ds. In Pr o c e e dings of the 13th ACM SIGKDD international c onfer enc e on Know le dge disc overy and data mining , pages 66–75. ACM New Y ork, NY, USA. de Leeu w, J. (1994). Blo ck-relaxation algorithms in statistics. In Information System and Data Analysis , pages 308–325. F riedman, J., Hastie, T., and Tibshirani, R. (2008). Regularization Paths for Generalized Linear Mo dels via Co ordinate Descen t. Dep artment of Statistics, Stanfor d University, T e ch. R ep . F ujita, A., Sato, J., Gara y-Malpartida, H., Y amaguchi, R., Miyano, S., Sogay ar, M., and F erreira, C. (2007). Mo deling gene expression regulatory netw orks with the sparse vector autoregressiv e mo del. BMC Systems Biolo gy , 1 (1), 39. Granger, C. (1969). Inv estigating causal relations b y econometric mo dels and cross-spectral methods. Ec ono- metric a , page 424. Kao, K., Y ang, Y., Boscolo, R., Sabatti, C., Royc ho wdhury , V., and Liao, J. (2004). T ranscriptome-based determination of multiple transcription regulator activities in Esc herichia coli b y using net work comp onent analysis. Pr o c e e dings of the National A c ademy of Scienc es , 101 (2), 641–646. Lozano, A., Abe, N., Liu, Y., and Rosset, S. (2009). Grouped graphical Granger modeling for gene expression regulatory net works discov ery. Bioinformatics , 25 (12), i110. L ¨ utk ep ohl, H. (2005). New intr o duction to multiple time series analysis . Springer. Mukhopadh ya y , N. and Chatterjee, S. (2007). Causality and path wa y searc h in microarray time series exp erimen t. Bioinformatics , 23 (4), 442. 11 Murph y , K. (2002). Dynamic Bayesian networks: r epr esentation, infer enc e and le arning . Ph.D. thesis, Univ ersity Of California. Ong, I., Glasner, J., P age, D., et al. (2002). Mo delling regulatory pathw ays in E. coli from time series expression profiles. Bioinformatics , 18 (Suppl 1), S241–S248. Opgen-Rhein, R. and Strimmer, K. (2007). Learning causal net works from systems biology time course data: an effective model selection pro cedure for the v ector autoregressiv e pro cess. BMC bioinformatics , 8 (Suppl 2), S3. P earl, J. (2000). Causality: Mo dels, R e asoning, and Infer enc e . Cambridge Univ Press. P errin, B., Ralaiv ola, L., Mazurie, A., Bottani, S., Mallet, J., and d’Alche Buc, F. (2003). Gene net w orks inference using dynamic Bay esian net works. Bioinformatics , 19 (90002), 138–148. Sam b o, F., Di Camillo, B., and T offolo, G. (2008). CNET: an algorithm for rev erse engineering of causal gene net works. In NETT AB2008. V ar enna, Italy . Sho jaie, A. and Michailidis, G. (2009). Analysis of Gene Sets Based on the Underlying Regulatory Netw ork. Journal of Computational Biolo gy , 16 (3), 407–426. Sho jaie, A. and Michailidis, G. (2010a). Netw ork Enrichmen t Analysis in Complex Experiments. Stat. App. in Genetics and Mol. Biolo gy , 9 (1), Article 22. Sho jaie, A. and Michailidis, G. (2010b). Penalized Likelihoo d Metho ds for Estimation of sparse high dimen- sional directed acyclic graphs. Biometrika (in press). Tseng, P . (2001). Conv ergence of a block co ordinate descent metho d for nondifferentiable minimization. Journal of optimization the ory and applic ations , 109 (3), 475–494. Whitfield, M., Sherlock, G., Saldanha, A., Murray , J., Ball, C., Alexander, K., Matese, J., Perou, C., Hurt, M., Brown , P ., et al. (2002). Iden tification of genes p eriodically expressed in the human cell cycle and their expression in tumors. Mole cular Biolo gy of the Cel l , 13 (6), 1977. Y amaguchi, R., Y oshida, R., Imoto, S., Higuchi, T., and Miyano, S. (2007). Finding mo dule-based gene net works with state-space mo dels-Mining high-dimensional and short time-course gene expression data. IEEE Signal Pr o c essing Magazine , 24 (1), 37–46. Zou, C. and F eng, J. (2009). Granger causality vs. dynamic Ba yesian netw ork inference: a comparative study. BMC bioinformatics , 10 (1), 122. Zou, H. (2006). The Adaptive Lasso and Its Oracle Prop erties. J of the A meric an Statistic al Asso ciation , 101 (476), 1418–1429. 12


Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment