Multivariate Granger Causality and Generalized Variance
Authors: Adam B. Barrett, Lionel Barnett, Anil K. Seth
Adam B. Barrett¹, Lionel Barnett², and Anil K. Seth¹

¹ Sackler Centre for Consciousness Science, School of Informatics, University of Sussex, Brighton BN1 9QJ, UK
² Centre for Computational Neuroscience and Robotics, School of Informatics, University of Sussex, Brighton BN1 9QJ, UK

October 24, 2018

Abstract

Granger causality analysis is a popular method for inference on directed interactions in complex systems of many variables. A shortcoming of the standard framework for Granger causality is that it only allows for examination of interactions between single (univariate) variables within a system, perhaps conditioned on other variables. However, interactions do not necessarily take place between single variables, but may occur among groups, or "ensembles", of variables. In this study we establish a principled framework for Granger causality in the context of causal interactions among two or more multivariate sets of variables. Building on Geweke's seminal 1982 work, we offer new justifications for one particular form of multivariate Granger causality based on the generalized variances of residual errors. Taken together, our results support a comprehensive and theoretically consistent extension of Granger causality to the multivariate case. Treated individually, they highlight several specific advantages of the generalized variance measure, which we illustrate using applications in neuroscience as an example. We further show how the measure can be used to define "partial" Granger causality in the multivariate context, and we also motivate reformulations of "causal density" and "Granger autonomy". Our results are directly applicable to experimental data and promise to reveal new types of functional relations in complex systems, neural and otherwise.
Author emails: Adam.Barrett@sussex.ac.uk; L.C.Barnett@sussex.ac.uk; A.K.Seth@sussex.ac.uk

PACS numbers: 87.19.L-, 87.10.Mn, 89.75.-k, 87.19.lj
Keywords: Granger causality, causal inference, multivariate statistics, generalized variance

1 Introduction

A key challenge across many domains of science and engineering is to understand the behavior of complex systems in terms of dynamical interactions among their component parts. A common way to address this challenge is by analysis of time series data acquired simultaneously from multiple system components. Increasingly, such analysis aims to draw inferences about causal interactions among system variables [1, 2, 3], as a complement to standard assessments of undirected functional connectivity as revealed by coherence, correlation, and the like.

A first step in any dynamical analysis is to identify target variables. Typically, subsequent analysis then assumes that functional (causal) interactions take place among these variables. However, in the general case it may be that explanatorily relevant causal interactions take place among groups, or "ensembles", of variables [4, 5]. It is important to account for this possibility for at least two reasons. First, identification of target variables is usually based on a priori system knowledge or technical constraints, which may be incomplete or arbitrary, respectively. Second, even given appropriate target variables, it is possible that relevant interactions may operate at multiple scales within a system, with larger scales involving groups of variables. Consider an example from functional neuroimaging.
In a typical fMRI (functional magnetic resonance imaging) study, the researcher may identify a priori several "regions-of-interest" (ROIs) in the brain, each represented in the fMRI dataset by multiple voxels, where each voxel is a variable comprising a single time series reflecting changes in the underlying metabolic signal. Assuming that the objective of the study is to assess the causal connectivity among the ROIs, a standard approach is to derive a single time series for each ROI either by averaging or by extracting a principal component [6]; alternatively, repeated pairwise analysis can be performed on each pair of voxels. A more appropriate approach, however, may be to consider causal interactions among the multivariate groups of voxels comprising each ROI. Similar scenarios could be concocted in a very wide range of application areas, including economics, biology, and climate science, among others.

In this paper, we describe a principled approach to assessing causal interactions among multivariate groups of variables. Our approach is based on the concept of Granger causality (G-causality) [7, 8], a statistical notion of causality which originated in econometrics but which has since found widespread application in many fields, with a particular concentration in the neurosciences [1, 9]. G-causality is an example of time series inference on stochastic processes and is usually implemented via autoregressive modeling of multivariate time series. The basic idea is simple: one variable (or time series) can be called "causal" to another if the ability to predict the second variable is improved by incorporating information about the first. More precisely, given inter-dependent variables X and Y, it is said that "Y G-causes X" if, in a statistically suitable manner, Y assists in predicting the future of X beyond the degree to which X already predicts its own future.
It is straightforward to extend G-causality to the conditional case [5], where Y is said to G-cause X, conditional on Z, if Y assists in predicting the future of X beyond the degree to which X and Z together already predict the future of X. Importantly, conditional G-causality is orthogonal to the notion of inferring causality among groups of variables, which is the focus of the present paper and which we term multivariate G-causality. In the multivariate case, the above description of G-causality is generalized to interactions among sets of interdependent variables X, Y, and (in the conditional multivariate case) Z. The generalization we propose was originally introduced in the field of econometrics by Geweke in 1982 [5], but has since been almost totally overlooked. Indeed a different measure has recently appeared [4]. In the following, we derive several justifications for preferring Geweke's measure, some of which we examine numerically. We go on to explore a series of implications for the analysis of complex systems in general, with a particular focus on applications in neuroscience.

After laying out our conventions in Section 2, in Section 3 we introduce two alternative measures of multivariate G-causality. The formulations differ according to their treatment of the covariance matrices of residuals in the underlying autoregressive models: Geweke's measure uses the determinant of this matrix (the generalized variance), while the other uses the trace (the total variance). Section 4 explores several advantageous properties of the determinant formulation as compared to the trace formulation.
In brief, the determinant formulation is fully equivalent to transfer entropy [3] under Gaussian assumptions, is invariant under a wider range of variable transformations, is expandable as a sum of standard univariate G-causalities, and admits a satisfactory spectral decomposition. Numerically, we show that Geweke's measure is just as stable as the alternative measure based on the total variance. Section 5 extends the determinant formulation to the important case of "partial" G-causality, which provides some measure of control with respect to unmeasured latent or exogenous variables. Section 6 extends a previously defined measure of "causal density" [10], which reflects the overall dynamical complexity of causal interactions sustained by a system. In Section 7 we show how multivariate G-causality can enhance a measure of "autonomy" (or "self-causation") based on G-causality [11], and Section 8 carries the discussion towards the identification of macroscopic variables via the notion of causal independence. Section 9 provides a general discussion and summary of contributions.

2 Notational conventions and preliminaries

We use a mathematical vector/matrix notation in which bold type generally denotes vector quantities and upper-case type denotes matrices or random variables, according to context. All vectors are considered to be column vectors. '⊕' denotes vertical concatenation of vectors, so that for x = (x_1, ..., x_n)⊺ and y = (y_1, ..., y_m)⊺, x ⊕ y is the vector (x_1, ..., x_n, y_1, ..., y_m)⊺ of dimension n + m, where the symbol '⊺' denotes the transpose operator. We also write |·| for the determinant and tr(·) for the trace of a square matrix.

Given jointly distributed multivariate random variables (i.e. random vectors) X and Y, we denote by Σ(X) the n × n matrix of covariances cov(X_i, X_j) and by Σ(X, Y) the n × m matrix of cross-covariances cov(X_i, Y_α). We then use Σ(X|Y) to denote the n × n matrix

    Σ(X|Y) ≡ Σ(X) − Σ(X, Y) Σ(Y)⁻¹ Σ(X, Y)⊺ ,   (1)

defined when Σ(Y) is invertible. Σ(X|Y) appears as the covariance matrix of the residuals of a linear regression of X on Y (cf. Eq. (6) below); thus, by analogy with partial correlation [12], we term Σ(X|Y) the partial covariance² of X given Y. Similarly, given another jointly distributed variable Z, we define the partial cross-covariance

    Σ(X, Y|Z) ≡ Σ(X, Y) − Σ(X, Z) Σ(Z)⁻¹ Σ(Y, Z)⊺ .   (2)

The following identity [13] will be useful for deriving certain properties of multivariate G-causality:

    |Σ(X|Y)| = |Σ(X ⊕ Y)| / |Σ(Y)| .   (3)

Suppose we have a multivariate stochastic process X_t in discrete time³ (i.e. the random variables X_{it} are jointly distributed). We use the notation X^(p)_t ≡ X_t ⊕ X_{t−1} ⊕ ... ⊕ X_{t−p+1} to denote X itself, along with p − 1 lags, so that for each t, X^(p)_t is a random vector of dimension pn. Given the lag p, we also often use the shorthand notation X⁻_t ≡ X^(p)_{t−1} for the lagged variable.

² This is to be distinguished from the conditional covariance, which will in general be a random variable, though later we note that for Gaussian variables the notions coincide.

3 Multivariate Granger causality

G-causality analysis is concerned with the comparison of different linear regression models of data. Thus, let us consider the (multivariate) linear regression of one random vector X, the predictee, on another random vector Y, the predictor:⁴

    X = A · Y + ε ,   (4)

where the n × m matrix A contains the regression coefficients and the random vector ε = (ε_1, ..., ε_n)⊺ comprises the residuals.
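These definitions are easy to check numerically. The following NumPy sketch (our own illustration; variable names are ours, not the paper's) samples a jointly Gaussian pair (X, Y), verifies that the empirical residual covariance of a linear regression of X on Y approaches the partial covariance of Eq. (1), and confirms the determinant identity (3):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random jointly Gaussian pair: X is 3-dimensional, Y is 2-dimensional.
n, m, T = 3, 2, 200_000
M = rng.standard_normal((n + m, n + m))
cov_full = M @ M.T                                  # covariance of X ⊕ Y (positive definite)
joint = rng.multivariate_normal(np.zeros(n + m), cov_full, size=T)
X, Y = joint[:, :n], joint[:, n:]

Sxx, Sxy, Syy = cov_full[:n, :n], cov_full[:n, n:], cov_full[n:, n:]

# Eq. (1): partial covariance Σ(X|Y) = Σ(X) − Σ(X,Y) Σ(Y)⁻¹ Σ(X,Y)ᵀ.
S_partial = Sxx - Sxy @ np.linalg.solve(Syy, Sxy.T)

# Regress X on Y by least squares (zero-mean data); the empirical residual
# covariance should approach Σ(X|Y) (cf. Eq. (6)).
A, *_ = np.linalg.lstsq(Y, X, rcond=None)
resid = X - Y @ A
resid_cov = resid.T @ resid / T

# Eq. (3): |Σ(X|Y)| = |Σ(X ⊕ Y)| / |Σ(Y)| (the Schur-complement determinant identity).
lhs = np.linalg.det(S_partial)
rhs = np.linalg.det(cov_full) / np.linalg.det(Syy)

print(np.allclose(resid_cov, S_partial, atol=0.1))   # True, up to sampling error
print(np.isclose(lhs, rhs))                          # True
```

Note that the identity (3) holds exactly for any positive-definite covariance, while the regression check holds only up to sampling error.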
The coefficients of this model are uniquely specified by imposing zero correlation between the residuals ε and the regressors (predictors) Y. Via the Yule-Walker procedure [1, 13] one obtains

    A = Σ(X, Y) Σ(Y)⁻¹   (5)

and finds the covariance matrix of the residuals to be given by

    Σ(ε) = Σ(X|Y) ,   (6)

with Σ(X|Y) defined as in (1).

Suppose now we have three jointly distributed, stationary⁵ multivariate stochastic processes X_t, Y_t, Z_t. Then to measure the G-causality from Y to X given Z, one wants to compare the following two multivariate autoregressive (MVAR) models for the processes [8]:

    X_t = A · (X^(p)_{t−1} ⊕ Z^(r)_{t−1}) + ε_t ,
    X_t = A′ · (X^(p)_{t−1} ⊕ Y^(q)_{t−1} ⊕ Z^(r)_{t−1}) + ε′_t .   (7)

Thus the predictee variable X is regressed firstly on the previous p lags of itself plus r lags of the conditioning variable Z, and secondly, in addition, on q lags of the predictor variable Y (in theory, if not in practice, p, q and r could be infinite).⁶ The standard measure of G-causality used in the literature is defined only for univariate predictor and predictee variables Y and X, and is given by the log of the ratio of the residual variances for the regressions (7).

³ While our analysis may be extended to continuous time, we focus here on the discrete-time case.
⁴ Here and in the remainder of this paper we assume, without loss of generality, that all random vectors and random processes have zero mean; thus constant terms are omitted in all linear regressions.
⁵ The analysis carries through for the non-stationary case, but for simplicity we assume here that all processes are stationary.
⁶ This might be more familiar as conditional G-causality, with Z the conditioning variable. In practice it is the more useful form; for the non-conditional version, Z may simply be omitted.
In our notation,⁷

    F_{Y→X|Z} ≡ ln( var(ε_t) / var(ε′_t) ) = ln( Σ(ε_t) / Σ(ε′_t) ) = ln( Σ(X | X⁻ ⊕ Z⁻) / Σ(X | X⁻ ⊕ Y⁻ ⊕ Z⁻) ) ,   (8)

where the last equality follows from the general formula (6). By stationarity this expression does not depend on the time t. Note that the residual variance of the first regression will always be larger than or equal to that of the second, so that F_{Y→X|Z} ≥ 0 always. As regards statistical inference, it is known that the corresponding maximum likelihood estimator⁸ F̂_{Y→X|Z} will have (asymptotically for large samples) a χ²-distribution under the null hypothesis F_{Y→X|Z} = 0 [14, 15], and a non-central χ²-distribution under the alternative hypothesis F_{Y→X|Z} > 0 [5, 16].

We now consider the case where predictee and predictor variables are no longer constrained to be univariate, i.e. multivariate G-causality. For a multivariate predictor, Eq. (8) above (with the univariate Y replaced by the multivariate Y) is a valid and consistent formula for G-causality. However, for the case of a multivariate predictee there is not yet a standard definition for G-causality. One possibility is to simply use the multivariate mean square error (i.e. total variance, or expected squared length of the multivariate residual), leading to

    F^tr_{Y→X|Z} ≡ ln( tr(Σ(ε_t)) / tr(Σ(ε′_t)) ) = ln( tr(Σ(X | X⁻ ⊕ Z⁻)) / tr(Σ(X | X⁻ ⊕ Y⁻ ⊕ Z⁻)) ) .   (9)

We call this the trace version of multivariate G-causality (trvMVGC). As recently noted by Ladroue and colleagues [4], trvMVGC appears to be a natural extension of G-causality to the multivariate case because total variance is a common choice for a measure of goodness-of-fit or prediction error for a multivariate regression.
Moreover, the measure is always non-negative, reduces to (8) when the predictee variable is univariate, and the regression matrix coefficients that render the residuals uncorrelated with the regressors also minimize the total variance (this is just the "ordinary least squares" procedure, minimizing mean square error).

Nonetheless, an alternative originally proposed by Geweke [5] uses instead the generalized variance |Σ(ε_t)|, which quantifies the volume in which the residuals lie. This leads to the measure

    F_{Y→X|Z} ≡ ln( |Σ(ε_t)| / |Σ(ε′_t)| ) = ln( |Σ(X | X⁻ ⊕ Z⁻)| / |Σ(X | X⁻ ⊕ Y⁻ ⊕ Z⁻)| ) .   (10)

Like trvMVGC, this measure is always non-negative, reduces to (8) when the predictee variable is univariate, and is consistent with the autoregressive approach inasmuch as the Yule-Walker regression matrix coefficients minimize the generalized variance |Σ(ε)| as well as the total variance (see Appendix A for a proof). Geweke [5] lists a number of motivations for taking F_{Y→X|Z} as given in Eq. (10) as the natural extension of G-causality to the multivariate case. These include: (i) that the generalized variance version (10) is invariant under (linear) transformation of variables (see Section 4.2); and (ii) that the maximum likelihood estimator of this quantity, F̂_{Y→X|Z}, is asymptotically χ²-distributed for large samples. In the following section we further justify this choice.

⁷ Note that even though X and Y are univariate, the lagged variables X⁻ and Y⁻ will generally be multivariate (at least if p, q > 1); hence they are written in bold type.
⁸ We remark that for significance testing of G-causality it is quite common to use the appropriate F-statistic for the regressions (7) rather than F_{Y→X|Z} itself [8, 17]; the quantities are in any case related by a monotonic transformation.
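To make the two definitions concrete, here is a minimal NumPy sketch (our own illustration, not the authors' implementation) that estimates both F_{Y→X|Z} (Eq. (10)) and F^tr_{Y→X|Z} (Eq. (9)) by ordinary least squares on lagged data from a toy VAR in which Y drives X:

```python
import numpy as np

rng = np.random.default_rng(1)

def lagged(data, p):
    """Design matrix of p lags: row for time t holds data[t-1], ..., data[t-p]."""
    T = data.shape[0]
    return np.hstack([data[p - k - 1 : T - k - 1] for k in range(p)])

def residual_cov(target, predictors):
    """Residual covariance of an ordinary least-squares regression (zero-mean data)."""
    B, *_ = np.linalg.lstsq(predictors, target, rcond=None)
    r = target - predictors @ B
    return r.T @ r / len(target)

def mvgc(X, Y, Z, p=2):
    """F_{Y→X|Z} by Eq. (10) (determinant) and Eq. (9) (trace), via the regressions (7)."""
    Xl, Yl, Zl = lagged(X, p), lagged(Y, p), lagged(Z, p)
    target = X[p:]
    S_restricted = residual_cov(target, np.hstack([Xl, Zl]))
    S_full = residual_cov(target, np.hstack([Xl, Yl, Zl]))
    F_det = np.log(np.linalg.det(S_restricted) / np.linalg.det(S_full))
    F_tr = np.log(np.trace(S_restricted) / np.trace(S_full))
    return F_det, F_tr

# Toy system: a bivariate Y drives a bivariate X at lag 1; Z is an independent driver.
T = 50_000
Y = rng.standard_normal((T, 2))
Z = rng.standard_normal((T, 1))
X = np.zeros((T, 2))
for t in range(1, T):
    X[t] = 0.4 * X[t - 1] + 0.5 * Y[t - 1] + 0.3 * Z[t - 1, 0] + 0.1 * rng.standard_normal(2)

F_det, F_tr = mvgc(X, Y, Z)
print(F_det > 0.0, F_tr > 0.0)   # both clearly positive: Y G-causes X given Z
```

For a univariate predictee the two measures coincide exactly, since the determinant and the trace of a 1 × 1 matrix are both its single entry.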
Since we advocate the use of Geweke's measure (10) of multivariate G-causality, we abbreviate it simply as MVGC henceforth. As remarked previously, the expression (10) defines conditional MVGC. Geweke [18] gives the following intuitively appealing expression for F_{Y→X|Z} in terms of unconditional MVGCs:

    F_{Y→X|Z} ≡ F_{Y⊕Z→X} − F_{Z→X} ;   (11)

that is, the extent to which Y and Z together cause X, less the extent to which Z on its own causes X. Note that this identity also holds for trvMVGC.

4 Properties of Multivariate Granger causality

In the following subsections we discuss some properties of MVGC and further motivate Geweke's definition of this measure.

4.1 Gaussian Equivalence with Transfer Entropy

When all variables are Gaussian distributed, the MVGC F_{Y→X|Z} is fully equivalent to the transfer entropy T_{Y→X|Z}, an information-theoretic notion of causality [13], with a simple factor of 2 relating the two quantities:

    F_{Y→X|Z} = 2 T_{Y→X|Z} .   (12)

Transfer entropy [3, 19] is defined by the difference in entropies

    T_{Y→X|Z} ≡ H(X | X⁻ ⊕ Z⁻) − H(X | X⁻ ⊕ Y⁻ ⊕ Z⁻) ,   (13)

and quantifies the degree to which knowledge of the past of Y reduces uncertainty in the future of X. The equivalence (12) stems from the entropy of a Gaussian distribution being directly proportional to the logarithm of the determinant of its covariance matrix; and, furthermore, from any conditional entropy involving Gaussian variables being directly proportional to the logarithm of the determinant of the appropriate corresponding partial covariance matrix (see [13] for details). Because the use of the determinant is crucial for this relationship, for trvMVGC the equivalence holds only in the more restricted situation when the predictee variable is univariate.
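Both the decomposition (11) and the Gaussian equivalence (12) can be checked numerically. The sketch below (our own toy VAR; all names are ours) computes each MVGC from empirical second moments via the identity (3), so the telescoping in (11) holds exactly, and it evaluates the transfer entropy from the Gaussian conditional-entropy formula, whose (n/2) ln(2πe) constants cancel in (13):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stationary VAR: columns of D are X1, X2 (the bivariate predictee X),
# Y (univariate predictor) and Z (univariate conditioning variable).
T = 100_000
W = 0.1 * rng.standard_normal((T, 4))
D = np.zeros((T, 4))
for t in range(1, T):
    D[t, 0] = 0.3 * D[t-1, 0] + 0.4 * D[t-1, 2] + 0.2 * D[t-1, 3] + W[t, 0]
    D[t, 1] = 0.3 * D[t-1, 1] + 0.3 * D[t-1, 2] + W[t, 1]
    D[t, 2] = 0.5 * D[t-1, 2] + W[t, 2]
    D[t, 3] = 0.5 * D[t-1, 3] + W[t, 3]

X = D[1:, :2]                                        # present of the predictee
Xl, Yl, Zl = D[:-1, :2], D[:-1, 2:3], D[:-1, 3:4]    # one lag of X, Y and Z

logdet = lambda S: np.linalg.slogdet(S)[1]

def ln_det_partial(A, B):
    """ln |Σ(A|B)| via the identity (3), from empirical (zero-mean) second moments."""
    AB = np.hstack([A, B])
    return logdet(AB.T @ AB / len(AB)) - logdet(B.T @ B / len(B))

F_YZ_to_X = ln_det_partial(X, Xl) - ln_det_partial(X, np.hstack([Xl, Yl, Zl]))
F_Z_to_X = ln_det_partial(X, Xl) - ln_det_partial(X, np.hstack([Xl, Zl]))
F_Y_to_X_given_Z = (ln_det_partial(X, np.hstack([Xl, Zl]))
                    - ln_det_partial(X, np.hstack([Xl, Yl, Zl])))

# Eq. (11): conditional MVGC equals the difference of two unconditional MVGCs.
print(np.isclose(F_Y_to_X_given_Z, F_YZ_to_X - F_Z_to_X))   # True (exact telescoping)

# Eq. (12): the Gaussian transfer entropy (13) is a difference of conditional entropies,
# each equal to (n/2) ln(2πe) + (1/2) ln |Σ(partial)|; the constants cancel, leaving F/2.
T_YX_Z = (0.5 * ln_det_partial(X, np.hstack([Xl, Zl]))
          - 0.5 * ln_det_partial(X, np.hstack([Xl, Yl, Zl])))
print(np.isclose(F_Y_to_X_given_Z, 2 * T_YX_Z))             # True
```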
In addition to motivating MVGC over trvMVGC, the equivalence (12) also provides a justification for the use of linear regression models in measuring causality. Transfer entropy is naturally sensitive to nonlinearities in the data, a property which is rightly seen as desirable for measures of causality and which has motivated the development of several nonlinear extensions to standard G-causality [20, 21]. However, when data are Gaussian, the two linear regressions capture all of the entropy difference that defines transfer entropy, which implies that nonlinear extensions to G-causality are of no additional utility. Indeed for two multivariate Gaussian variables X and Y, the partial covariance Σ(X|Y), which is the same quantity as the residual covariance under linear regression, can be simply thought of as the conditional covariance of X given Y, because cov(X | Y = y) = Σ(X|Y) for all y. Hence, for Gaussian data, linear regression accounts for all the dependence of the regressee on the regressor.

To demonstrate formally that a stationary Gaussian AR process must be linear, consider a general stationary multivariate Gaussian process X_t satisfying

    X_t = f(X^(p)_{t−1}) + ε_t ,   (14)

where f(·) is some sufficiently well-behaved, possibly nonlinear function and the ε_t are independent of X_{t−s} for s = 1, 2, .... For any t then, ε_t = X_t − f(X^(p)_{t−1}) is independent of X^(p)_{t−1}, so that, in particular, for any value ξ taken by X^(p)_{t−1}, the conditional expectation

    E[ε_t | X^(p)_{t−1} = ξ] = E[X_t | X^(p)_{t−1} = ξ] − f(ξ)   (15)

does not depend on ξ, and nor, by stationarity, on t.
But since by assumption X_t and X^(p)_{t−1} are jointly multivariate Gaussian, by a well-known result E[X_t | X^(p)_{t−1} = ξ] depends linearly on ξ, and from (15) it follows that f(ξ) must be a linear function of ξ.

4.2 Invariance under transformation of variables

The partial covariance Σ(X|Y) transforms in a simple way under linear transformation of variables. If T and U are respective matrices for linear transformations on X and Y, then we have that

    Σ(T·X | U·Y) ≡ T Σ(X|Y) T⊺ .   (16)

Using this formula, and the properties of the determinant and trace operators, we can find the respective groups of linear transformations under which MVGC and trvMVGC are invariant. For MVGC, we find that the most general transformation under which F_{Y→X|Z} is invariant is given by

    X → T_xx · X ,
    Y → T_yx · X + T_yy · Y + T_yz · Z ,
    Z → T_zx · X + T_zz · Z ,   (17)

where the matrices T_xx, T_yy and T_zz on the diagonal are non-singular. All these symmetries are desirable properties for a causality measure. There ought to be invariance under redefinition of the individual variables within each of X, Y and Z (i.e. under the diagonal components T_xx, T_yy and T_zz of Eq. (17)), because MVGC is designed to measure causality between unified wholes rather than between arbitrarily defined constituent elements. The "off-diagonal" components T_yx, T_yz and T_zx are also intuitive. Adding components of Z or X to the predictor Y should not change the value of MVGC, because MVGC is designed to measure the ability of Y to predict X over and above Z and X. Similarly, adding components of X onto Z should not make a difference, because the predictee X could already be thought of as a conditional variable before transformation.
trvMVGC has an invariance under a similar group of transformations, but with one significant restriction, namely that the matrix T_xx must be conformal (angle-preserving); that is, T_xx must satisfy T_xx T_xx⊺ = cI for some constant c. This difference can have practical consequences. The broader invariance of MVGC (under all linear transformations T_xx) means that this measure, but not trvMVGC, is insensitive to certain common inaccuracies of data collection, namely those in which variables within a given set X are contaminated by contributions from other variables (see Discussion). To put this point another way, if one wishes to infer MVGC between hidden variables by analyzing MVGC between observed variables, these two quantities are actually the same if the relationship between hidden and observed variables is linear and can be written in the form given in Eq. (17). One may also wish to measure the MVGC from the independent components of the predictor to the independent components of the predictee. Again, the invariance properties of MVGC mean that one does not need to explicitly find these independent components; one can simply compute MVGC between observed components. These observations indicate that MVGC takes into account correlation between variables in a principled way. We see this explicitly in Section 4.3. The restriction T_xx T_xx⊺ = cI for trvMVGC further implies that an uneven rescaling of the components of the predictee variable may change the value of F^tr_{Y→X|Z}.
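The contrast between the two invariance groups is easy to demonstrate: applying a non-singular but non-conformal T_xx to the predictee leaves Geweke's measure unchanged while shifting the trace version. A NumPy sketch (our own toy system; one lag and no conditioning variable, for brevity):

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy system: a bivariate Y drives a bivariate X whose two components have
# very different noise scales (mimicking differently amplified channels).
T = 60_000
Y = rng.standard_normal((T, 2))
X = np.zeros((T, 2))
noise_scale = np.array([0.1, 1.0])
for t in range(1, T):
    X[t] = 0.3 * X[t - 1] + 0.6 * Y[t - 1] + noise_scale * rng.standard_normal(2)

def gc_pair(X, Y):
    """Unconditional F_{Y→X} by Eq. (10) (determinant) and Eq. (9) (trace), one lag."""
    target, Xl, Yl = X[1:], X[:-1], Y[:-1]
    def rcov(preds):
        B, *_ = np.linalg.lstsq(preds, target, rcond=None)
        r = target - preds @ B
        return r.T @ r / len(target)
    S_r, S_f = rcov(Xl), rcov(np.hstack([Xl, Yl]))
    return (np.log(np.linalg.det(S_r) / np.linalg.det(S_f)),
            np.log(np.trace(S_r) / np.trace(S_f)))

# A non-singular but non-conformal transformation of the predictee:
# it unevenly rescales and mixes the components of X.
Txx = np.array([[3.0, 0.5],
                [0.0, 0.2]])

F_det0, F_tr0 = gc_pair(X, Y)
F_det1, F_tr1 = gc_pair(X @ Txx.T, Y)

print(np.isclose(F_det0, F_det1))   # True: MVGC is invariant under any non-singular Txx
print(abs(F_tr0 - F_tr1) > 0.1)     # True: trvMVGC is not
```

The invariance of the determinant version is exact (up to floating point), since the transformed residual covariances are T_xx Σ T_xx⊺ and the factors |T_xx| cancel in the ratio.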
This sensitivity to rescaling also has practical implications, namely that trvMVGC, but not MVGC, can be affected by magnitude differences in the components of X, perhaps resulting from these components reflecting underlying mechanisms that are differently amplified or differentially accessible to the measuring equipment, a common situation in many neuroscience contexts (see Discussion). This sensitivity is undesirable because causal connectivity should be based on the information content of signals (cf. Section 4.1), and not on their respective magnitudes. It is worth noting that for transfer entropy the symmetry group can be extended to include all non-singular (not necessarily linear) transformations of the predictee variable, since the entropies are invariant under such transformations.⁹ Since G-causality is essentially a linear version of transfer entropy, the former should at least be invariant under the linear subgroup of transformations.

4.3 Expansion of Multivariate Granger Causality

MVGC is expandable as a sum of G-causalities over all combinations of univariate predictor and predictee variables contained within the multivariate composites. The existence of this expansion depends on the fact that determinants are decomposable into products, and that logarithms of products are decomposable into sums of logarithms. No such decomposition exists for the logarithm of a trace, and so there is no obvious way of expanding trvMVGC into combinations of univariate components.

The expansion of MVGC is not entirely straightforward, because different terms in the sum involve conditioning on the past and present of different subsets of variables. However, each predictor/predictee combination appears precisely once in the sum, and each term can be explained intuitively. The general formula may be written as

    F_{Y→X|Z} = Σ_{i=1}^{n} Σ_{α=1}^{m} F_{Y_α → X_i | Z ⊕ X ⊕ Y_1 ⊕ ... ⊕ Y_{α−1} ⊕ X⁰_1 ⊕ ... ⊕ X⁰_{i−1}} ,   (18)

where the superscript '0' indicates conditioning on the present (in addition to the past) of the corresponding variables. Thus, in the term for causality from Y_α to X_i, one conditions on (i) the past of the entire multivariate conditional variable Z, (ii) the past of the entire multivariate predictee variable X, (iii) the past of all predictor variables Y_β with β < α, and (iv) the present of all predictee variables X_j with j < i. The derivation of the expansion (18) is given in Appendix B.

For the case of a multivariate predictor and a univariate predictee we have

    F_{Y→X} = F_{Y_1→X} + F_{Y_2→X|Y_1} + F_{Y_3→X|Y_1⊕Y_2} + ··· + F_{Y_m→X|Y_1⊕Y_2⊕...⊕Y_{m−1}} .   (19)

This formula is consistent with the intuitive idea that the total degree to which the multivariate Y helps predict the univariate X is: the degree to which Y_1 predicts X, plus the degree to which Y_2 helps predict X over and above the information already present in Y_1, and so on. For the case of a multivariate predictee and a univariate predictor we have

    F_{Y→X} = F_{Y→X_1|X} + F_{Y→X_2|X⊕X⁰_1} + F_{Y→X_3|X⊕X⁰_1⊕X⁰_2} + ··· + F_{Y→X_n|X⊕X⁰_1⊕X⁰_2⊕...⊕X⁰_{n−1}} .   (20)

This formula supports the intuition that the total degree to which the univariate Y helps predict the multivariate X is: the degree to which the past of Y helps predict the current value of X_1 over and above the degree to which the past of the whole of X predicts the current value of X_1, plus the degree to which the past of Y helps predict the current value of X_2 over and above the degree to which the past of the whole of X and the current value of X_1 predict the current value of X_2, and so on.

We remark on two implications of the expansion of MVGC. First, Ladroue and colleagues [4] suggested that use of generalized residual variance for causal inference on high-dimensional data might suffer from problems of numerical stability. However, the expansion of MVGC into low-dimensional, univariate G-causalities suggests that there should be no problem (see Section 4.3.1 for numerical evidence of this). Second, the expansion (18) indicates that MVGC controls, to some extent, for the influence of unmeasured latent/exogenous variables (see also Section 5). By conditioning on the present of certain appropriate predictee variables for each term of the expansion, only the effects of each predictor on independent components of the predictees enter the equation. This property stems from the fact that the determinant of the residual covariance matrix reflects not just residual variances, but also the extent to which these residual variances are independent of each other. This is another advantage of the MVGC measure over trvMVGC, which does not depend on residual correlations.

⁹ If the predictee variable has a continuous (multivariate) distribution, we note that the Jacobian determinants in the standard change-of-variables formula for entropy calculation cancel out.

4.3.1 Stability of Multivariate Granger Causality

We tested numerically our claim (Section 4.3) that MVGC should not be less stable than trvMVGC.
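Before turning to the simulations, the simplest instance of the expansion (19) (bivariate predictor, univariate predictee, m = 2) can be checked directly; because each term below is computed from the same empirical second moments via the identity (3), the telescoping is exact. A NumPy sketch (the toy system and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)

# Univariate predictee x driven by both components of a bivariate predictor Y.
T = 40_000
Y = rng.standard_normal((T, 2))
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.4 * x[t - 1] + 0.5 * Y[t - 1, 0] + 0.3 * Y[t - 1, 1] + 0.2 * rng.standard_normal()

X = x[1:, None]                        # present of X
Xl = x[:-1, None]                      # one lag of X
Y1, Y2 = Y[:-1, :1], Y[:-1, 1:]        # one lag of Y1 and of Y2

logdet = lambda S: np.linalg.slogdet(S)[1]

def ln_det_partial(A, B):
    """ln |Σ(A|B)| via the identity (3), from empirical (zero-mean) second moments."""
    AB = np.hstack([A, B])
    return logdet(AB.T @ AB / len(AB)) - logdet(B.T @ B / len(B))

F_Y_to_X = ln_det_partial(X, Xl) - ln_det_partial(X, np.hstack([Xl, Y1, Y2]))
F_Y1_to_X = ln_det_partial(X, Xl) - ln_det_partial(X, np.hstack([Xl, Y1]))
F_Y2_given_Y1 = (ln_det_partial(X, np.hstack([Xl, Y1]))
                 - ln_det_partial(X, np.hstack([Xl, Y1, Y2])))

# Eq. (19) with m = 2: the multivariate G-causality telescopes into univariate terms.
print(np.isclose(F_Y_to_X, F_Y1_to_X + F_Y2_given_Y1))   # True (exact)
```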
We studied MVAR(1) processes whose dynamics are given by

    X_t = A · X_{t−1} + ε_t ,   (21)

where X contains 8 variables, the sum of each row of A (i.e. the total afferent to each element) is 0.5, all components in a given row of A are equal and positive, and each component of ε_t is an independent Gaussian random variable of mean 0 and variance 1. We generated 30 random "connectivity" matrices (or systems) A_i (i = 1, ..., 30), each with an average of 2 non-zero components per row. For each A_i we obtained 10 sets of 3000 (post-equilibrium) data points via Eq. (21). For each set, we computed the MVGC across each bipartition of the system corresponding to A_i. We then calculated, for each bipartition, the standard deviation of the MVGC across the 10 data sets and (excluding bipartitions with standard deviation less than 0.01) the corresponding coefficient of variation (CoV, standard deviation divided by mean). This procedure allowed us to obtain, for each A_i, a maximum CoV. Figure 1(a) shows that the maximum CoV is generally very small and never large, confirming the stability of MVGC.

To compare the stability of MVGC with that of trvMVGC, for each A_i and for each bipartition we divided the CoV for MVGC by the CoV for trvMVGC. Figure 1(b) shows the distribution of the average of this ratio across all bipartitions. The clustering of this distribution at ≈ 1, with no outliers, confirms that MVGC and trvMVGC have similar stability properties, at least in the systems we have simulated.

To generalize these results, we next used a genetic algorithm (GA) [22, 23] to see if we could find a network for which MVGC becomes unstable. The GA was initialized using a population composed of the 30 random systems A_i described above. We ran the GA for 130 generations.
In each generation, we computed the fitness of each system as the maximum CoV of MVGC. Systems were selected to proceed to subsequent generations using stochastic rank-based selection. Mutations enabled the adding of new non-zero components to A_i, the removal of existing non-zero components, or the swapping of components, followed by renormalization of each row to sum to 0.5 again; two mutations were applied per system. After 130 generations (sufficient for fitness to asymptote), the average fitness (i.e. maximum CoV) in the population was ≈ 0.25, and the maximum was 0.39, which is still a low value. For the A_i that gave this highest value, we compared the CoV obtained using MVGC with that obtained using trvMVGC, following the procedure described above. The average ratio (across all bipartitions) was ≈ 1.00 (maximum value 1.12), indicating that MVGC and trvMVGC had similar stability properties even for systems optimized to be unstable with respect to MVGC. Further, we examined some A_i for which the sums of the rows differed (i.e. having heterogeneous afferent connectivity); these systems had similar stability properties to those described above. Finally, stability properties were unaffected when computations were based on 1000 (rather than 3000) data points. Taken together, these simulation results confirm that MVGC is numerically stable, and is not appreciably different from trvMVGC in terms of stability properties.

4.4 Spectral decomposition

In this section we review the spectral decomposition of G-causality [5, 1]. For simplicity we limit ourselves to the unconditional case, although the procedure may be readily extended to the conditional case (as described in e.g. Refs. [18, 1, 24]).
We assume multivariate predictor and predictee variables, and show that MVGC, but not trvMVGC, has a satisfactory spectral decomposition. Consider the stationary MVAR

$$X_t = A \cdot X^{(p)}_{t-1} + \varepsilon_t = \sum_{k=1}^{p} A_k \cdot X_{t-k} + \varepsilon_t. \qquad (22)$$

We may write this as

$$A(L) \cdot X_t = \varepsilon_t, \qquad (23)$$

[Figure 1: Stability of MVGC. (a) Histogram of the maximum CoV of MVGC, observed over 10 trials of 3000 time-steps, for each of 30 different systems, as described in Section 4.3.1. (b) Histogram of the average ratio between the CoV of MVGC and the CoV of trvMVGC, for each of the 30 systems. MVGC is numerically stable (a) and is not appreciably different from trvMVGC in terms of stability properties (b).]

where $L$ denotes the (single time step) lag operator, and

$$A(L) \equiv I - \sum_{k=1}^{p} A_k L^k. \qquad (24)$$

Eq. (23) may be solved as

$$X_t = H(L) \cdot \varepsilon_t, \qquad (25)$$

where $H(L) \equiv A(L)^{-1}$. Transforming into the frequency domain via the discrete-time Fourier transform $X(\lambda) = \sum_{t=-\infty}^{\infty} X_t e^{-i\lambda t}$ yields $A(\lambda) \cdot X(\lambda) = \varepsilon(\lambda)$ (replace $L$ by $e^{-i\lambda}$), so that

$$X(\lambda) = H(\lambda) \cdot \varepsilon(\lambda), \qquad (26)$$

where $H(\lambda) \equiv A(\lambda)^{-1}$ is the transfer matrix. The (power) spectral density of $X$ is then given by

$$S(\lambda) = H(\lambda)\, \Sigma(\varepsilon)\, H^*(\lambda). \qquad (27)$$

From a standard result [25], since $H(L)$ is a square matrix lag operator with the identity matrix as leading term, we have

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \ln |H(\lambda) H^*(\lambda)|\, d\lambda = 0, \qquad (28)$$

provided that all roots of the characteristic polynomial $|A(L)|$ lie outside the unit circle, which is a necessary condition for the existence of the stationary process (22). From (27) we may then derive the relation [26]

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \ln |S(\lambda)|\, d\lambda = \ln |\Sigma(\varepsilon)|.$$
(29)

Consider now the stationary MVAR

$$X_t \oplus Y_t = A \cdot \left( X^{(p)}_{t-1} \oplus Y^{(q)}_{t-1} \right) + \varepsilon_{x,t} \oplus \varepsilon_{y,t} \qquad (30)$$

with coefficients matrix

$$A \equiv \begin{pmatrix} A_{xx} & A_{xy} \\ A_{yx} & A_{yy} \end{pmatrix} \qquad (31)$$

and residuals covariance matrix

$$\Sigma(\varepsilon_x \oplus \varepsilon_y) \equiv \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix}. \qquad (32)$$

Let us split the corresponding transfer matrix $H(\lambda)$ as

$$H(\lambda) \equiv A(\lambda)^{-1} = \begin{pmatrix} H_{xx}(\lambda) & H_{xy}(\lambda) \\ H_{yx}(\lambda) & H_{yy}(\lambda) \end{pmatrix} \qquad (33)$$

and the spectral density as

$$S(\lambda) = \begin{pmatrix} S_{xx}(\lambda) & S_{xy}(\lambda) \\ S_{yx}(\lambda) & S_{yy}(\lambda) \end{pmatrix}. \qquad (34)$$

Then $S_{xx}(\lambda)$ is just the spectral density of $X$, which from (27) is given by

$$S_{xx}(\lambda) = H_{xx}(\lambda)\Sigma_{xx}H^*_{xx}(\lambda) + 2\,\mathrm{Re}\!\left\{ H_{xx}(\lambda)\Sigma_{xy}H^*_{xy}(\lambda) \right\} + H_{xy}(\lambda)\Sigma_{yy}H^*_{xy}(\lambda). \qquad (35)$$

The idea is that we wish to decompose this expression into a part reflecting the effect of $X$ itself and a part reflecting the causal influence of $Y$. The problem is that, due to the presence of the "cross" term, $S_{xx}(\lambda)$ does not split cleanly into an $X$ and a $Y$ part. Geweke [5] addresses this issue by introducing the transformation

$$X \oplus Y \to U \cdot (X \oplus Y), \qquad (36)$$

where

$$U \equiv \begin{pmatrix} I & 0 \\ -\Sigma_{yx}\Sigma_{xx}^{-1} & I \end{pmatrix}. \qquad (37)$$

Note that this transformation leaves the G-causality $F_{Y \to X}$ invariant (c.f. Section 4.2) and, for the transformed regression, we have $\Sigma_{xy} \equiv 0$; that is, the residuals $\varepsilon_x, \varepsilon_y$ are uncorrelated. Thus, assuming the transformation (37) has been pre-applied, Eq. (35) becomes

$$S_{xx}(\lambda) = H_{xx}(\lambda)\Sigma_{xx}H^*_{xx}(\lambda) + H_{xy}(\lambda)\Sigma_{yy}H^*_{xy}(\lambda), \qquad (38)$$

whereby the spectral density of $X$ splits into an "intrinsic" part and a "causal" part. The spectral G-causality of $Y \to X$ at frequency $\lambda$ is now defined to be

$$f_{Y \to X}(\lambda) \equiv \ln \frac{|S_{xx}(\lambda)|}{|H_{xx}(\lambda)\Sigma_{xx}H^*_{xx}(\lambda)|} \qquad (39)$$

or, in terms of the untransformed variables,

$$f_{Y \to X}(\lambda) \equiv \ln\!\left( \frac{|S_{xx}(\lambda)|}{|S_{xx}(\lambda) - H_{xy}(\lambda)\Sigma_{y|x}H^*_{xy}(\lambda)|} \right), \qquad (40)$$

with $S_{xx}(\lambda)$ as in (35) and $\Sigma_{y|x} \equiv \Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}$. Geweke (Ref.
[5], Theorem 2) then establishes the fundamental motivating relationship between frequency and time domain G-causality:

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} f_{Y \to X}(\lambda)\, d\lambda = F_{Y \to X}, \qquad (41)$$

provided that all roots of $|A_{yy}(L)|$ lie outside the unit circle.^10 The proof of this relation relies crucially on the result (28) which, we note, involves the determinant of the transfer matrix. Thus if the trace, rather than the determinant, were used in the definition (39) of $f_{Y \to X}(\lambda)$, then we could not expect to obtain a relation corresponding to (41), since (i) the trace of the spectral density in Eq. (27) does not factorize, (ii) there is no trace analogue of Eq. (28), and thus (iii) no analogue of Eq. (29). This would seem to preclude a satisfactory spectral decomposition for the trace version of G-causality. Similar remarks apply to conditional G-causality in the spectral domain. In Ref. [4], however, it is conjectured that a trace analogue of Eq. (41) does indeed hold. To test this conjecture we performed the following experiment: we simulated 1000 MVAR(1) processes of the form

$$X_t \oplus Y_t = A \cdot (X_{t-1} \oplus Y_{t-1}) + \varepsilon_{x,t} \oplus \varepsilon_{y,t}, \qquad (42)$$

where $X$ has dimension 2 and $Y$ dimension 1. Residuals $\varepsilon_{x,t}, \varepsilon_{y,t}$ were completely uncorrelated, with unit variance (i.e. $\Sigma(\varepsilon_{x,t} \oplus \varepsilon_{y,t})$ was the $3 \times 3$ identity matrix) so that, in particular, the Geweke transformation (37) was unnecessary. For each trial the $3 \times 3$ coefficients matrix $A$ was chosen at random with elements uniform on $[-\tfrac{1}{2}, \tfrac{1}{2}]$, and the process (42) was simulated for $10^6$ stationary time steps (the occasional unstable process was rejected). Time domain causalities $F_{Y \to X}$, $F^{\mathrm{tr}}_{Y \to X}$ and frequency domain causalities $f_{Y \to X}(\lambda)$, $f^{\mathrm{tr}}_{Y \to X}(\lambda)$ were calculated in sample using $p = 10$ lags.
(As noted previously,^10 equality in (41) is only assured in the limit of infinite lags; 10 lags was found empirically to achieve good accuracy without overfitting the data.) Relative errors of integrated spectral MVGC with respect to time-domain MVGC, expressed as a percentage, were defined as

$$E_\% \equiv 100 \times \frac{\frac{1}{2\pi}\int_{-\pi}^{\pi} f_{Y \to X}(\lambda)\, d\lambda - F_{Y \to X}}{F_{Y \to X}}, \qquad E^{\mathrm{tr}}_\% \equiv 100 \times \frac{\frac{1}{2\pi}\int_{-\pi}^{\pi} f^{\mathrm{tr}}_{Y \to X}(\lambda)\, d\lambda - F^{\mathrm{tr}}_{Y \to X}}{F^{\mathrm{tr}}_{Y \to X}}, \qquad (43)$$

for MVGC and trvMVGC respectively. (The integrals were computed by standard numerical quadrature.) Results, displayed in Table 1, confirm to good accuracy the theoretical prediction of Eq. (41) for MVGC (the small negative bias on $E_\%$ is due to the finite number of lags), while for trvMVGC relative errors are several orders of magnitude larger and, furthermore, were not decreased by choosing longer stationary sequences and/or more lags. The full distribution of relative errors is also displayed as a histogram in Fig. 2.

^10 A subtlety to note is that even if the MVAR (30) has a finite number of lags $p, q < \infty$, the exact restricted regression of $X$ on its own past will generally require an infinite number of lags [5]. Thus in theory, for exact equality in (41), an infinite number of lags is required to calculate the term $\Sigma(X|X^-)$ which appears in $F_{Y \to X}$ (using a finite number of lags will generally result in an overestimate of $F_{Y \to X}$, since residual errors will be larger than for the exact regression). As applied to empirical data, it is in any case good practice to choose "sufficient" lags for all regressions so as to model the data adequately without overfitting [27, 28].

    error    mean      std. dev.   abs. mean
    E%       -0.0004    0.0005      0.0005
    E_tr%    -0.0488   10.5995      8.1799

Table 1: Comparison of relative errors of integrated spectral MVGC and trvMVGC with respect to time domain MVGC and trvMVGC, for a random sample of MVAR(1) processes. Top row shows MVGC, bottom row shows trvMVGC. See text for details. Figures in the "abs. mean" column are the means of the absolute values $|E_\%|$ and $|E^{\mathrm{tr}}_\%|$.

[Figure 2: Distribution of relative errors of integrated spectral multivariate G-causality with respect to the time domain for (a) MVGC, (b) trvMVGC, for a random sample of MVAR(1) processes.]

[Figure 3: Comparison of MVGC and trvMVGC in the frequency domain: spectral MVGC and trvMVGC plotted against frequency for (a) a typical MVAR(3) process with dim(X) = 2, dim(Y) = 1 and (b) a typical MVAR(5) process with dim(X) = 3, dim(Y) = 2.]

We also repeated the experiment with higher order MVAR(p) processes, higher dimensional predictee and predictor variables, and correlated residuals $\varepsilon_x$. In all cases, results confirmed the accuracy of (41) for MVGC and yielded large relative errors for trvMVGC. We remark that qualitative differences (i.e. aside from differences of scale) between spectral MVGC and trvMVGC could be substantial (Fig. 3). These differences, furthermore, appeared in general to be exaggerated by the presence of residual correlations; this is consonant with the sensitivity of MVGC, as contrasted with the lack of sensitivity of trvMVGC, to residual correlations (see Sections 4.3 and 5).
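The fundamental relation (41) can also be checked numerically without sampling error by working with a model's exact quantities rather than simulated data. The sketch below is our own construction, not the paper's experiment: the example coefficient matrix is an illustrative assumption, $\Sigma = I$ so the Geweke transformation (37) is unnecessary, the exact infinite-lag restricted covariance $\Sigma(X|X^-)$ is obtained from the steady-state Kalman (Riccati) recursion, and the spectral measure $f_{Y \to X}(\lambda)$ of Eq. (39) is integrated by quadrature.

```python
import numpy as np

# assumed example system: MVAR(1) with dim(X) = 2, dim(Y) = 1, Sigma = I
A = np.array([[0.3, -0.2, 0.4],
              [0.1,  0.2, 0.3],
              [0.0,  0.0, 0.4]])
n, xd = 3, 2
C = np.eye(n)[:xd]          # observe the X-components only

# Sigma(X|X-): steady-state one-step prediction error covariance of X given
# its own past, via the Kalman filter Riccati recursion (exact restricted model)
P = np.eye(n)
for _ in range(1000):
    G = P @ C.T @ np.linalg.inv(C @ P @ C.T)
    P = A @ (P - G @ C @ P) @ A.T + np.eye(n)
Sig_r = C @ P @ C.T

# time domain: full-model residual covariance of X is I_2, so
# F = ln(|Sig_r| / |I_2|) = ln |Sig_r|
F = np.log(np.linalg.det(Sig_r))

# frequency domain: f_{Y->X}(lambda) of Eq. (39)
lams = np.linspace(-np.pi, np.pi, 8001)[:-1]
f = []
for lam in lams:
    H = np.linalg.inv(np.eye(n) - A * np.exp(-1j * lam))   # transfer matrix
    Sxx = H[:xd] @ H[:xd].conj().T                  # S_xx(lambda), Sigma = I
    intrinsic = H[:xd, :xd] @ H[:xd, :xd].conj().T  # H_xx Sigma_xx H_xx^*
    f.append(np.log(np.linalg.det(Sxx).real / np.linalg.det(intrinsic).real))
F_spec = np.mean(f)         # (1/2 pi) integral over [-pi, pi): LHS of Eq. (41)
```

With both sides computed exactly (up to Riccati convergence and quadrature error, which are negligible here), F_spec and F should agree to high precision, in line with Eq. (41).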
It is straightforward to show that $f_{Y \to X}(\lambda)$ is invariant under the same group of linear transformations (17) as $F_{Y \to X}$; again, $f^{\mathrm{tr}}_{Y \to X}(\lambda)$ will in general be invariant only under the restricted group with $T_{xx}$ conformal; this extends to the conditional case.

5 Multivariate partial Granger causality

Recently, a partial G-causality measure has been introduced [29] which exploits a parallel with the concept of partial coherence [30] in order to control for latent/exogenous influences on standard G-causality. Partial G-causality modifies the standard G-causality measure by including terms based on residual correlations between the predictee variable and the conditional variables. Consider, in addition to the regressions (7), the following regressions of the conditioning variable $Z_t$:

$$Z_t = B \cdot \left( X^{(p)}_{t-1} \oplus Z^{(r)}_{t-1} \right) + \eta_t, \qquad Z_t = B' \cdot \left( X^{(p)}_{t-1} \oplus Y^{(q)}_{t-1} \oplus Z^{(r)}_{t-1} \right) + \eta'_t. \qquad (44)$$

Here the roles of the predictee and conditioning variables are reversed. Then for univariate predictor and predictee the partial G-causality of $Y$ on $X$ given $Z$ is defined by conditioning the respective residual covariances for the regressions of $X$ on the corresponding residuals for the regressions of $Z$:

$$F^{P}_{Y \to X | Z} \equiv \ln \frac{\Sigma(\varepsilon_t | \eta_t)}{\Sigma(\varepsilon'_t | \eta'_t)}. \qquad (45)$$

This extends naturally to the fully multivariate case (c.f. Eq. (10)), and we define partial MVGC (pMVGC) as

$$F^{P}_{Y \to X | Z} \equiv \ln \frac{|\Sigma(\varepsilon_t | \eta_t)|}{|\Sigma(\varepsilon'_t | \eta'_t)|} \qquad (46)$$
$$= \ln \frac{|\Sigma(X | X^- \oplus Z^- \oplus Z)|}{|\Sigma(X | X^- \oplus Y^- \oplus Z^- \oplus Z)|}, \qquad (47)$$

where the RHS (47) follows from the identity (64) derived in Appendix C (with $W \equiv X^- \oplus Z^-$ and $W \equiv X^- \oplus Y^- \oplus Z^-$ for the numerator and denominator terms respectively). Comparing with (10) we thus see that pMVGC differs from MVGC in the inclusion of the present value of the conditioning variable $Z$ in the respective regressions.
Seen in this form, it is clear that, as is the case for MVGC, pMVGC is always non-negative.^11 One could alternatively express pMVGC as (non-partial) MVGC conditioned on a "forward lagged" version of $Z$: defining $\tilde{Z}_t \equiv Z_{t+1}$ we have $Z_t \oplus Z^{(r)}_{t-1} \equiv \tilde{Z}^{(r+1)}_{t-1}$, or $\tilde{Z}^- = Z \oplus Z^-$ (note the additional lag on $\tilde{Z}^-$), so that, from Eq. (47),

$$F^{P}_{Y \to X | Z} = F_{Y \to X | \tilde{Z}}. \qquad (48)$$

As noted in Section 4.3, (non-partial) MVGC to some extent already controls for the influence of latent/exogenous variables because the generalized variance is sensitive to residual correlations. However, pMVGC takes into account even more correlations, with the explicit aim of controlling for latent/exogenous influences. pMVGC may therefore be preferable when such influences are expected to be (a) strong and (b) relatively uniform in their influence on the measured system. Indeed, pMVGC (and the original measure of partial G-causality) can only be effective in compensating for latent/exogenous variables that affect all modeled variables (i.e. predictee, predictor and conditioning) to a roughly equal degree [29]. It is interesting to note that pMVGC may be expressed in terms of non-partial MVGCs as

$$F^{P}_{Y \to X | Z} = F_{Y \to Z \oplus X} - F_{Y \to Z | X} \qquad (49)$$

by straightforward application of Eq. (3). As expected, (49) includes a term with a mandatory multivariate predictee, since it is only in this case that residual correlation can make a difference. It is interesting that $Z$ appears as a predictee variable; this might be understood as pMVGC using the conditioning variable $Z$ as a "proxy" by which to assess the influence of latent or exogenous variables. A "trace" version of pMVGC may be defined analogously to (46). Again, by Eq. (64) of Appendix C, the identity corresponding to (47) will hold, as will the trace analogue of (48).
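The decomposition (49) is an exact algebraic identity at the level of partial covariance matrices, so it can be verified directly on sample covariances. The sketch below is our own illustration (the toy system, lag-1 regressions, and Schur-complement estimator are assumptions, not the paper's pipeline): every term of (47) and (49) is computed as a Schur complement of one joint covariance matrix of present and lagged variables.

```python
import numpy as np

rng = np.random.default_rng(1)
# assumed toy system: stable VAR(1) with X = variables {0,1}, Y = {2}, Z = {3}
A = np.array([[0.2, -0.1, 0.2, 0.1],
              [0.1,  0.2, 0.1, 0.1],
              [0.0,  0.1, 0.3, 0.1],
              [0.1,  0.0, 0.0, 0.3]])
n, T = 4, 5000
D = np.zeros((T, n))
for t in range(1, T):
    D[t] = A @ D[t - 1] + rng.standard_normal(n)

# sample covariance of [X_t, Y_t, Z_t, X_{t-1}, Y_{t-1}, Z_{t-1}]
cols = np.hstack([D[1:], D[:-1]])
S = np.cov(cols.T)
X, Y, Z = [0, 1], [2], [3]
Xl, Yl, Zl = [4, 5], [6], [7]

def pcov(a, b):
    """Partial covariance Sigma(a|b) as a Schur complement of S."""
    Sab = S[np.ix_(a, b)]
    return S[np.ix_(a, a)] - Sab @ np.linalg.solve(S[np.ix_(b, b)], Sab.T)

ld = lambda M: np.log(np.linalg.det(M))   # log generalized variance

# partial MVGC in the form of Eq. (47)
F_P = ld(pcov(X, Xl + Zl + Z)) - ld(pcov(X, Xl + Yl + Zl + Z))
# RHS of the decomposition, Eq. (49)
F_a = ld(pcov(Z + X, Xl + Zl)) - ld(pcov(Z + X, Xl + Yl + Zl))   # F_{Y -> Z+X}
F_b = ld(pcov(Z, Zl + Xl)) - ld(pcov(Z, Zl + Xl + Yl))           # F_{Y -> Z|X}
```

Because every term is a Schur complement of the same covariance matrix, F_P equals F_a − F_b to machine precision, not merely in expectation; non-negativity of F_P likewise follows from the monotonicity of Schur complements under enlargement of the conditioning set.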
However, the analogue of (49) will not hold in general, since the traces of the partial covariance matrices will in general not factorize appropriately.^12 From (48) it is straightforward to derive a spectral decomposition $f^{P}_{Y \to X | Z}(\lambda)$ for pMVGC, which will integrate correctly to the time-domain pMVGC $F^{P}_{Y \to X | Z}$. Again, a spectral decomposition for the corresponding trace version is likely to be problematic, insofar as it will fail in general to integrate correctly to the time-domain value (c.f. Section 4.4).

^11 In [29] it is stated that partial G-causality may in some circumstances be negative; the justification for this is unclear.

6 Causal density

A straightforward application of MVGC is to measures of causal density, the overall level of causal interactivity sustained by a multivariate system $X$. A previous measure of causal density [22] has been defined as the average of all pairwise (and hence univariate) G-causalities between system elements, conditioned on the remaining system elements:^13

$$\mathrm{cd}(X) \equiv \frac{1}{n(n-1)} \sum_{i \neq j} F_{X_i \to X_j | X_{[ij]}}, \qquad (50)$$

where $X_{[ij]}$ denotes the subsystem of $X$ with variables $X_i$ and $X_j$ omitted, and $n$ is the total number of variables. Causal density provides a useful measure of the dynamical "complexity" of a system inasmuch as elements that are completely independent will have zero causal density, as will elements that are completely integrated in their dynamics. Exemplifying standard intuitions about complexity [31], high causal density will only be achieved when elements behave somewhat differently from each other, in order to contribute novel potential predictive information, and at the same time are globally integrated, so that the potential predictive information is in fact useful [32, 33].
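The pairwise conditional measure of Eq. (50) can be sketched compactly by again reading each conditional G-causality off a single joint covariance matrix of present and lagged variables. This is an illustrative lag-1 reconstruction under our own assumptions (the ring-shaped toy system and the Schur-complement estimator are not from the paper):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(7)
n, T = 4, 4000
# assumed toy system: stable VAR(1) ring, each element driven by the next
A = 0.3 * np.eye(n) + 0.2 * np.roll(np.eye(n), 1, axis=1)
D = np.zeros((T, n))
for t in range(1, T):
    D[t] = A @ D[t - 1] + rng.standard_normal(n)

cols = np.hstack([D[1:], D[:-1]])   # present vars 0..n-1, lagged vars n..2n-1
S = np.cov(cols.T)

def pvar(j, cond):
    """Partial variance of present variable j given columns cond (Schur complement)."""
    Sb = S[np.ix_(cond, cond)]
    Sab = S[np.ix_([j], cond)]
    return S[j, j] - (Sab @ np.linalg.solve(Sb, Sab.T)).item()

def cond_gc(i, j):
    """Univariate conditional G-causality F_{X_i -> X_j | X_[ij]} at lag 1."""
    cond = [c + n for c in range(n) if c not in (i, j)]   # lagged conditioning set
    restricted = [j + n] + cond                           # predictee's own past + cond
    return np.log(pvar(j, restricted) / pvar(j, restricted + [i + n]))

# causal density, Eq. (50): average over all ordered pairs (i, j)
cd = np.mean([cond_gc(i, j) for i, j in permutations(range(n), 2)])
```

Each summand is non-negative (the full conditioning set nests the restricted one), so cd is non-negative; for the ring system it is strictly positive, since each element Granger-causes its neighbour.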
Using MVGC, various extensions to (50) can be suggested, based on the various possible interactions between multivariate predictors, predictees and conditional variables. These extensions may provide a more principled measure of complexity by analyzing a target system at multiple scales. First we define the causal density from size $k$ to size $r$, $\mathrm{cd}_{k \to r}(X)$, as the average MVGC from a subset of size $k$ to a subset of size $r$, conditioned on the rest of the system:

$$\mathrm{cd}_{k \to r}(X) = \frac{1}{n_{k,r}} \sum_{i=1}^{n_{k,r}} F_{V^k_i \to U^r_i | W^{n-k-r}_i}, \qquad (51)$$

where $X = V^k_i \cup U^r_i \cup W^{n-k-r}_i$ denotes the $i$th of the $n_{k,r} \equiv \binom{n}{k}\binom{n-k}{r}$ distinct tripartitions of $X$ into disjoint sub-systems of respective sizes $k$, $r$ and $(n-k-r)$. Then, using this, one could define the bipartition causal density (bcd) as the average of $\mathrm{cd}_{k \to (n-k)}(X)$ over predictor size $k$:

$$\mathrm{bcd}(X) = \frac{1}{n-1} \sum_{k=1}^{n-1} \mathrm{cd}_{k \to (n-k)}(X). \qquad (52)$$

^12 In [4], under the section headed "Partial Complex Granger causality", the quantity developed appears to be (the trace version of) what is conventionally referred to as conditional G-causality, rather than partial G-causality as introduced in [29] and referenced in this section.

^13 This is the "weighted" version of causal density. An unweighted and [0,1]-bounded alternative can be defined as the fraction of all pairwise conditional causalities that are statistically significant at a given significance level.

Interestingly, this quantity is closely related to the popular Tononi-Sporns-Edelman "neural complexity" measure [34], which averages (contemporaneous) mutual information across bipartitions (we are currently exploring this in work in preparation). It could also be interesting to compare causal density at different scales of predictor plus predictee size; thus we define

$$\mathrm{cd}_s(X) \equiv \frac{1}{s-1} \sum_{k=1}^{s-1} \mathrm{cd}_{k \to (s-k)}(X).$$
(53)

Then the original causal density measure of Eq. (50) is just $\mathrm{cd}_2$ and bcd is $\mathrm{cd}_n$. The average of this over all scales can be used to define a complete tripartition causal density (tcd):

$$\mathrm{tcd}(X) \equiv \frac{1}{n-1} \sum_{s=2}^{n} \mathrm{cd}_s(X). \qquad (54)$$

A comparison of the properties of all versions of causal density, as well as related complexity measures, is in progress. We remark that it is straightforward to define spectral versions of these causal density measures.

7 Autonomy in complex systems

G-causality has recently been adapted to provide an operational measure of "autonomy" in complex systems [11]. A variable $X$ can be said to be "G-autonomous" with respect to a (multivariate) set of external variables $Z$ if its own past states help predict its future states over and above predictions based on $Z$. This definition rests on the intuition of autonomy as "self-determination" or "self-causation". We can formalize this notion along the lines of MVGC as follows. Consider the regressions

$$X_t = A \cdot Z^{(r)}_{t-1} + \varepsilon_t, \qquad X_t = A' \cdot \left( X^{(p)}_{t-1} \oplus Z^{(r)}_{t-1} \right) + \varepsilon'_t, \qquad (55)$$

which differ from Eqs. (7) primarily in that the predictee variable $X$ is not regressed on its own past in the first equation. The G-autonomy of $X$ is then given by

$$A_{X|Z} = \ln \frac{|\Sigma(\varepsilon_t)|}{|\Sigma(\varepsilon'_t)|}. \qquad (56)$$

The extension of G-autonomy to the multivariate case is important because it accommodates situations in which groups of elements may be jointly autonomous (self-determining, self-causing), even though the activity of individual elements within the group may be adequately predicted by combinations of activities of other elements in the group. Univariate formulations of G-autonomy [11] would fail in these cases. Consider as a trivial example an element $X_1$ which is G-autonomous with respect to a background $Z$.
If $X_1$ is now duplicated by the element $X_2$, it will no longer appear as G-autonomous within the multivariate system $X_1 \oplus X_2 \oplus Z$. However, the multivariate variable $X_1 \oplus X_2$ will be (jointly) G-autonomous with respect to $Z$.

As discussed in [11], G-autonomy also provides the basis for a notion of "G-emergence" as applied to the relation between macroscopic variables "emerging" from the activity of microscopic constituents. G-emergence operationalizes the intuition that a macro-level variable is emergent to the extent that it is simultaneously autonomous from and dependent upon its micro-level constituents [11, 35]. Extension of G-emergence to the multivariate case using MVGC is straightforward, allowing consideration of multivariate micro- and macro-variables.

8 Macroscopic variables and causal independence

Given the ability to assess multivariate causal interactions, a second challenge arises: the identification of relevant groupings of variables into multivariate ensembles. One approach to this challenge adopts the perspective of statistical mechanics on the emergence of novel macroscopic variables, given a microscopic description of a system [36, 37]. Here, we suggest that MVGC may furnish a useful method for macro-variable identification in this context. Let us assume that $Z_t$ represents a set of microscopic variables defining a complex (possibly stochastic) dynamical system, and $X_t \equiv f(Z_t)$ a set of macroscopic variables functionally (possibly deterministically) dependent on the microscopic variables. There is then a sense in which $X$ represents a "parsimonious" high-level description of the system, to the extent that it predicts its own dynamical evolution without recourse to the low-level description of the system represented by $Z$; that is, to the extent that $X$ exhibits strong causal independence with respect to $Z$.
In this view, $F_{Z \to X}$ furnishes a natural measure of the lack of this causal independence, which might then be used to identify parsimonious macroscopic variables by minimizing $F_{Z \to f(Z)}$ over candidate functions $f(\cdot)$. The multivariate formulation MVGC would appear to be significant in this context for reasons similar to the G-autonomy case. Specifically, it may be that a set of macroscopic variables $X$ jointly has high causal independence with respect to the microscopic variables $Z$, while the component variables $X_i$ individually have lower causal independence.

The notions of G-autonomy, G-emergence, and causal independence are distinct but related. In short, G-autonomy measures "self-causation", causal independence measures the absence of useful predictive information between microscopic and macroscopic descriptions of a system, and G-emergence measures a combination of macro-level autonomy and micro-to-macro causal dependence. It is possible, and is left as an objective of future work, that all three measures could be applied usefully to systems that admit multiple levels of description: (i) to identify relevant groupings of observables at each level, (ii) to decompose causal interactions within each level, and finally (iii) to quantitatively characterize inter-level relationships.

9 Discussion

We have described and motivated a measure of multivariate causal interaction that is a natural extension of the standard G-causality measure. The measure, originally introduced by Geweke [5] but almost totally overlooked since, uses the generalized variance (the determinant of the residual covariance matrix), and we have termed it multivariate G-causality (MVGC).
It contrasts with another recent proposal [4] for addressing the same problem, which uses instead the total variance (the trace of the residual covariance matrix). In this paper, we have presented several theoretical justifications, augmented by numerical modeling, for preferring MVGC over the trace version, which we summarize below. We have also extended MVGC to address novel challenges in the analysis of complex dynamical systems, including quantitative characterization of "causal density", "autonomy", and the identification of novel macroscopic variables via causal independence.

9.1 Importance of multivariate causal analysis

In many analyses of complex systems, particularly in neuroscience and biology, there may be no simple or principled relationship between observed variables and explanatorily relevant collections, or ensembles, of these variables. In the Introduction we already remarked on fMRI, where explanatorily relevant ROIs are each composed of multiple observables (voxels) which are arbitrarily demarcated with respect to underlying neural mechanisms. Other non-invasive neuroimaging methods share similar varieties of arbitrariness: both electroencephalography (EEG) and magnetoencephalography (MEG) provide signals which are complex convolutions of underlying neural sources. In these and similar cases, multivariate causal analysis, and MVGC in particular, can be used to aggregate univariate observables into meaningful multivariate (ensemble) variables. It bears emphasizing that MVGC is fundamentally different from conditional G-causality [38], which assesses the causal connectivity between two univariate variables, conditioned on a set of other variables. Even when it is possible to measure directly the activity of variables of interest, it is still important to consider multivariate interactions.
Continuing with the neuroscience example, it may be that multiple ROIs act jointly to influence other ROIs, or cognitive and/or behavioral outputs. In single-cell recordings this point is even more pressing: since Hebb [39] it has been increasingly appreciated that neurons act as ensembles, rather than singly, in the adaptive function of the brain [40]. MVGC is well suited to disclosing causal relationships among these ensembles as a window onto underlying principles of brain operation. Of course, the application of MVGC is not limited to neuroscience. Multivariate interactions are likely to be important in a very broad range of application areas. For example, genetic, metabolic, and transcriptional regulatory networks may be usefully decomposed into multivariate ensembles influencing other such ensembles [4]. Indeed, multivariate interactions may be important in any system, natural or artificial, which can be described in terms of multiple simultaneously acquired time series.

9.2 Generalized variance vs total variance

A different approach to multivariate causal analysis was recently proposed by Ladroue and colleagues [4]. This involved a measure (which we call trvMVGC) based on the trace of the residual covariance matrix (the total variance), rather than the determinant (the generalized variance). Geweke [5] provided the original justifications for the determinant form, but did not explicitly discuss the trace form. As noted in Section 3 of Ref. [5], Geweke's motivations included (i) MVGC is invariant under (linear) transformations of variables, and (ii) the maximum likelihood estimator of MVGC is asymptotically $\chi^2$-distributed for large samples (there is no standard test statistic for trvMVGC). In this paper we have substantially enhanced this list, in each case comparing MVGC explicitly with trvMVGC.
In summary: (iii) MVGC is fully equivalent to transfer entropy under Gaussian assumptions, whereas for trvMVGC this equivalence only holds in the univariate case; (iv) MVGC is invariant under all (non-singular) linear transformations of the predictee variable, while trvMVGC is invariant only under conformal linear transformations (see below); (v) only MVGC is expandable as a sum of univariate G-causalities; (vi) MVGC, but not trvMVGC, admits a satisfactory spectral decomposition, inasmuch as it guarantees a consistent relationship with the corresponding time-domain formulation; (vii) only MVGC depends on residual correlations, and through these accommodates in a natural way the influence of exogenous or latent variables; and (viii) the partial version of MVGC, pMVGC, is decomposable in terms of non-partial MVGCs, but this is not true in general for trvMVGC.

All the above factors suggest that MVGC should be preferred to trvMVGC. Taken individually they may differ in their significance, but taken together they emphasize that MVGC, but not trvMVGC, provides a comprehensive and theoretically consistent extension of standard G-causality to the multivariate case. While this consistency is the most important reason to prefer MVGC to trvMVGC, let us consider further three of the individual properties. First, the equivalence with transfer entropy is important because it justifies the use of linear modeling for multivariate causal analysis, at least where Gaussian assumptions are reasonable. Second, the broader range of invariance is important because it means that MVGC is robust to a wider range of common inaccuracies during data collection, in particular those in which univariate variables are contaminated by contributions from other variables and in which different components of multivariate ensembles are differently scaled by measurement constraints.
It is likely that this additional robustness will have significant practical importance in many experimental applications, for example in EEG and MEG, where individual sensors detect signals from multiple neural sources and may differentially amplify these sources according to their distance from the sensors and their alignment with the cortical surface. Finally, the lack of a satisfactory spectral version of trvMVGC, which we establish both theoretically and numerically (Section 4.4 and Figures 2 and 3), implies that frequency-domain results obtained using trvMVGC are unreliable, both in their magnitude and in their spectral profile. Ladroue et al. [4] note Geweke's form (i.e. MVGC) and suggest trvMVGC is preferable in view of possible numerical instabilities attending the computation of determinants for high-dimensional data. However, the existence of an expansion of MVGC in terms of univariate G-causalities (18) would seem to counter this claim, since the univariate causalities would not be expected to be unstable. Numerical simulations (Section 4.3.1 and Figure 1) confirm our view.

9.3 Quantities derived from MVGC

In the second part of the paper we used MVGC to derive several novel measures that have the potential to shed substantial new light on complex system dynamics. First, MVGC leads immediately to a series of redefinitions of our previous "causal density" measure [22], which aims to capture the dynamical complexity of a system's dynamics in terms of coexisting integration and differentiation. Extension to the multivariate case allows causal density to be evaluated at multiple levels of description, thus furnishing a more principled measure of dynamical complexity. Causal density has been suggested as a measure of neural dynamics that captures certain aspects of consciousness [32].
It has been shown^14 to increase in response to perceived stimuli as compared to non-perceived stimuli in a visual masking task [41], and it captures the complex dynamics of small-world networks more effectively than does a prominent competing measure, neural complexity [33]. Multivariate causal density has the potential to further strengthen and generalize these contributions.

^14 In approximation.

Second, MVGC can be used to generalize the concept of G-autonomy, which operationalizes the notion of autonomy as "self-causation" [11]. Multivariate G-autonomy is a significant enhancement because it deals with the case in which a group of variables may be jointly autonomous even though, individually, no variable is autonomous. Our results therefore pave the way to informative application of this measure to complex systems.

Third, MVGC can be helpful in considering relations between microscopic and macroscopic levels of description of a system. One approach is to consider how causally independent a macroscopic variable is with respect to its set of constituent micro-variables. We have suggested that this notion can be used to identify parsimonious macro-variables by maximizing causal independence over a space of functions relating micro- and macro-variables. Alternatively, the concept of G-emergence operationalizes the idea that an emergent macro-variable is both autonomous from and causally dependent on its underlying micro-level constituents. Unlike the "causal independence" view, G-emergence may be better suited to characterizing the degree of emergence as opposed to identifying prospective macro-variables; G-emergence also explicitly measures micro-to-macro causal dependence rather than assuming that it is present.
Finally, the concepts of redundancy and synergy amongst variables have recently been introduced, via the use of a variant of the trvMVGC measure [42]. These quantities aim at detecting functionally relevant partitions of a system by grouping variables according to their summed causal influences. Because of the advantages of MVGC over trvMVGC, we suggest it may be useful to redefine redundancy and synergy in terms of MVGC.

9.4 Summary

Models of complex systems typically contain large numbers of variables. Having a measure for directed interactions between groups of variables, as opposed to just single variables, provides a useful tool for the analysis of such systems. We have demonstrated that MVGC is such a measure, and we have provided a series of justifications, theoretical and numerical, to prefer it over a related measure, trvMVGC. Like all measures of directed interaction based on G-causality, MVGC can be estimated from freely collected data, without perturbing or providing inputs to the system. Finally, in contrast to alternative approaches such as structural equation modeling [43] or dynamic causal modeling [2], MVGC can be applied with very little prior knowledge of the system under consideration.

Acknowledgements

AKS is supported by EPSRC Leadership Fellowship EP/G007543/1, which also supports the work of ABB. Support is also gratefully acknowledged from the Dr. Mortimer and Theresa Sackler Foundation.

Appendix A Minimizing the determinant of the residuals covariance matrix

We wish to show that minimizing the determinant $|\Sigma(\varepsilon)|$, where $\varepsilon = X - A \cdot Y$ as specified in (4), leads to the same values (5) for the regression coefficients $A$. We thus solve for $A$ in the simultaneous equations
\[
\frac{\partial|\Sigma(\varepsilon)|}{\partial A_{i\alpha}} = 0, \tag{57}
\]
where $i$ runs from $1,\dots,n$, $\alpha$ from $1,\dots,m$, and $\Sigma(\varepsilon)$ is given by
\[
\Sigma(\varepsilon) = \Sigma(X) - \Sigma(X,Y)A^{\top} - A\,\Sigma(X,Y)^{\top} + A\,\Sigma(Y)A^{\top}. \tag{58}
\]
We use the formula, valid for an invertible square matrix $B$,
\[
\frac{\partial|B|}{\partial B_{jk}} = |B|\,\bigl(B^{-1}\bigr)_{kj}. \tag{59}
\]
Assuming $\Sigma(\varepsilon)$ invertible and setting $W \equiv |\Sigma(\varepsilon)|\,\Sigma(\varepsilon)^{-1}$, we have
\begin{align*}
\frac{\partial|\Sigma(\varepsilon)|}{\partial A_{i\alpha}}
&= \sum_{j,k} \frac{\partial|\Sigma(\varepsilon)|}{\partial\Sigma(\varepsilon)_{jk}}\,
   \frac{\partial\Sigma(\varepsilon)_{jk}}{\partial A_{i\alpha}} \\
&= \sum_{j,k} W_{kj}\,\frac{\partial\Sigma(\varepsilon)_{jk}}{\partial A_{i\alpha}}
   \quad\text{from (59)} \\
&= \sum_{j,k} W_{kj}\,\frac{\partial}{\partial A_{i\alpha}}
   \bigl[\Sigma(X) - \Sigma(X,Y)A^{\top} - A\,\Sigma(X,Y)^{\top} + A\,\Sigma(Y)A^{\top}\bigr]_{jk}
   \quad\text{from (58)} \\
&= \sum_{j,k} W_{kj}\,\frac{\partial}{\partial A_{i\alpha}}
   \Bigl[-\sum_{\beta}\Sigma(X,Y)_{j\beta}A_{k\beta}
         -\sum_{\beta}\Sigma(X,Y)_{k\beta}A_{j\beta}
         +\sum_{\beta,\gamma}\Sigma(Y)_{\beta\gamma}A_{j\beta}A_{k\gamma}\Bigr] \\
&= \sum_{j,k} W_{kj}
   \Bigl[-\sum_{\beta}\Sigma(X,Y)_{j\beta}\,\delta_{ik}\delta_{\alpha\beta}
         -\sum_{\beta}\Sigma(X,Y)_{k\beta}\,\delta_{ij}\delta_{\alpha\beta}
         +\sum_{\beta,\gamma}\Sigma(Y)_{\beta\gamma}
          \bigl(A_{j\beta}\,\delta_{ik}\delta_{\alpha\gamma}
               +A_{k\gamma}\,\delta_{ij}\delta_{\alpha\beta}\bigr)\Bigr] \\
&= -\sum_{j} W_{ij}\,\Sigma(X,Y)_{j\alpha}
   -\sum_{k} W_{ki}\,\Sigma(X,Y)_{k\alpha}
   +\sum_{\beta,j} W_{ij}\,\Sigma(Y)_{\beta\alpha}A_{j\beta}
   +\sum_{\gamma,k} W_{ki}\,\Sigma(Y)_{\alpha\gamma}A_{k\gamma} \\
&= 2\,\bigl\{W\bigl[A\,\Sigma(Y) - \Sigma(X,Y)\bigr]\bigr\}_{i\alpha}
\end{align*}
after gathering terms and simplifying (using the symmetry of $W$ and $\Sigma(Y)$), and Eq. (5) follows.

B Proof of expansion of multivariate Granger causality

Here we prove Eq. (18). We consider the case of there being no conditional third variable, since the extension to the conditional case is trivial. We first expand in terms of predictor variables according to
\begin{align*}
F_{Y\to X}
&= \log\frac{|\Sigma(X\,|\,X^-)|}{|\Sigma(X\,|\,X^-\oplus Y^-)|} \\
&= \log\!\left(\frac{|\Sigma(X\,|\,X^-)|\cdot|\Sigma(X\,|\,X^-\oplus Y_1^-)|\cdot|\Sigma(X\,|\,X^-\oplus Y_1^-\oplus Y_2^-)|\cdots|\Sigma(X\,|\,X^-\oplus Y_1^-\oplus\cdots\oplus Y_{m-1}^-)|}{|\Sigma(X\,|\,X^-\oplus Y_1^-)|\cdot|\Sigma(X\,|\,X^-\oplus Y_1^-\oplus Y_2^-)|\cdots|\Sigma(X\,|\,X^-\oplus Y_1^-\oplus\cdots\oplus Y_m^-)|}\right) \\
&= \log\frac{|\Sigma(X\,|\,X^-)|}{|\Sigma(X\,|\,X^-\oplus Y_1^-)|}
 + \log\frac{|\Sigma(X\,|\,X^-\oplus Y_1^-)|}{|\Sigma(X\,|\,X^-\oplus Y_1^-\oplus Y_2^-)|}
 + \cdots
 + \log\frac{|\Sigma(X\,|\,X^-\oplus Y_1^-\oplus\cdots\oplus Y_{m-1}^-)|}{|\Sigma(X\,|\,X^-\oplus Y_1^-\oplus\cdots\oplus Y_m^-)|} \\
&= F_{Y_1\to X} + F_{Y_2\to X|Y_1} + F_{Y_3\to X|Y_1\oplus Y_2} + \cdots + F_{Y_m\to X|Y_1\oplus Y_2\oplus\cdots\oplus Y_{m-1}}. \tag{60}
\end{align*}
To expand in terms of predictees we use the expansion
\[
|\Sigma(X\,|\,W)| = \Sigma(X_1\,|\,W)\,\Sigma(X_2\,|\,W\oplus X_1)\,\Sigma(X_3\,|\,W\oplus X_1\oplus X_2)\cdots\Sigma(X_n\,|\,W\oplus X_1\oplus\cdots\oplus X_{n-1}), \tag{61}
\]
which follows from repeated application of Eq.
(3). We obtain
\begin{align*}
F_{Y_1\to X}
&= \log\frac{|\Sigma(X\,|\,X^-)|}{|\Sigma(X\,|\,X^-\oplus Y_1^-)|} \\
&= \log\frac{\Sigma(X_1\,|\,X^-)\,\Sigma(X_2\,|\,X^-\oplus X_1)\cdots\Sigma(X_n\,|\,X^-\oplus X_1\oplus X_2\oplus\cdots\oplus X_{n-1})}{\Sigma(X_1\,|\,X^-\oplus Y_1^-)\,\Sigma(X_2\,|\,X^-\oplus Y_1^-\oplus X_1)\cdots\Sigma(X_n\,|\,X^-\oplus Y_1^-\oplus X_1\oplus X_2\oplus\cdots\oplus X_{n-1})} \\
&= F_{Y_1\to X_1|X} + F_{Y_1\to X_2|X\oplus X_1^0} + F_{Y_1\to X_3|X\oplus X_1^0\oplus X_2^0} + \cdots + F_{Y_1\to X_n|X\oplus X_1^0\oplus X_2^0\oplus\cdots\oplus X_{n-1}^0}, \tag{62}
\end{align*}
and similarly for the other components of the sum in Eq. (60), from which the result follows.

C Partial covariance of residuals for two variables jointly dependent on a third

Given the regressions
\[
X = A\cdot W + \varepsilon, \qquad Z = B\cdot W + \eta, \tag{63}
\]
where the regression coefficients $A, B$ are derived from an ordinary least squares, Yule–Walker or equivalent procedure, we show that
\[
\Sigma(\varepsilon\,|\,\eta) = \Sigma(X\,|\,Z\oplus W), \tag{64}
\]
assuming that all (partial) covariance matrices which appear below are invertible. We have
\[
\Sigma(\varepsilon) = \Sigma(X\,|\,W), \qquad \Sigma(\eta) = \Sigma(Z\,|\,W), \qquad \Sigma(\varepsilon,\eta) = \Sigma(X,Z\,|\,W). \tag{65}
\]
Thus we may calculate that
\[
\Sigma(\varepsilon\,|\,\eta) = \Sigma(X\,|\,W) - \Sigma(X,Z\,|\,W)\,\Sigma(Z\,|\,W)^{-1}\,\Sigma(Z,X\,|\,W). \tag{66}
\]
Using the block matrix inversion formula for $\Sigma(Z\oplus W)$, we may also calculate that
\[
\Sigma(X\,|\,Z\oplus W) = \Sigma(X) - \Sigma(X,Z\,|\,W)\,\Sigma(Z\,|\,W)^{-1}\,\Sigma(Z,X) - \Sigma(X,W\,|\,Z)\,\Sigma(W\,|\,Z)^{-1}\,\Sigma(W,X). \tag{67}
\]
Now expanding the $\Sigma(X\,|\,W) \equiv \Sigma(X) - \Sigma(X,W)\,\Sigma(W)^{-1}\,\Sigma(W,X)$ term in (66), we find using (67) that (64) is equivalent to
\[
\Sigma(X,W)\Sigma(W)^{-1}\Sigma(W,X) + \Sigma(X,Z|W)\Sigma(Z|W)^{-1}\Sigma(Z,X|W)
= \Sigma(X,Z|W)\Sigma(Z|W)^{-1}\Sigma(Z,X) + \Sigma(X,W|Z)\Sigma(W|Z)^{-1}\Sigma(W,X).
\]
Or, rearranging and factorizing,
\[
\bigl[\Sigma(X,W)\Sigma(W)^{-1} - \Sigma(X,W|Z)\Sigma(W|Z)^{-1}\bigr]\Sigma(W,X)
= \Sigma(X,Z|W)\Sigma(Z|W)^{-1}\bigl[\Sigma(Z,X) - \Sigma(Z,X|W)\bigr]. \tag{68}
\]
Now the term in square brackets on the RHS of (68) simplifies to $\Sigma(Z,W)\Sigma(W)^{-1}\Sigma(W,X)$, so that, factoring out $\Sigma(W,X)$, (68) is equivalent to
\[
\bigl[\Sigma(X,W)\Sigma(W)^{-1} - \Sigma(X,W|Z)\Sigma(W|Z)^{-1} - \Sigma(X,Z|W)\Sigma(Z|W)^{-1}\Sigma(Z,W)\Sigma(W)^{-1}\bigr]\,\Sigma(W,X) = 0. \tag{69}
\]
We now show that the term in square brackets in (69) is zero; i.e. that
\[
\Sigma(X,W)\Sigma(W)^{-1} - \Sigma(X,W|Z)\Sigma(W|Z)^{-1} - \Sigma(X,Z|W)\Sigma(Z|W)^{-1}\Sigma(Z,W)\Sigma(W)^{-1} = 0, \tag{70}
\]
thus proving (64). Rearranging and factoring out $\Sigma(W)^{-1}$, (70) becomes
\[
\bigl[\Sigma(X,W) - \Sigma(X,Z|W)\Sigma(Z|W)^{-1}\Sigma(Z,W)\bigr]\Sigma(W)^{-1} = \Sigma(X,W|Z)\Sigma(W|Z)^{-1},
\]
or, multiplying through on the right by $\Sigma(W|Z)$,
\[
\bigl[\Sigma(X,W) - \Sigma(X,Z|W)\Sigma(Z|W)^{-1}\Sigma(Z,W)\bigr]\Sigma(W)^{-1}\Sigma(W|Z) = \Sigma(X,W|Z).
\]
Expanding $\Sigma(W|Z)$, factorizing and rearranging again, we get
\[
\bigl[\Sigma(X,Z) - \Sigma(X,W)\Sigma(W)^{-1}\Sigma(W,Z)\bigr]\Sigma(Z)^{-1}\Sigma(Z,W)
= \Sigma(X,Z|W)\Sigma(Z|W)^{-1}\Sigma(Z,W)\Sigma(W)^{-1}\Sigma(W|Z),
\]
or, since the term in square brackets on the LHS is just $\Sigma(X,Z|W)$,
\[
\Sigma(X,Z|W)\bigl[\Sigma(Z)^{-1}\Sigma(Z,W) - \Sigma(Z|W)^{-1}\Sigma(Z,W)\Sigma(W)^{-1}\Sigma(W|Z)\bigr] = 0.
\]
We now show that, again, the term in square brackets is zero; i.e. that
\[
\Sigma(Z)^{-1}\Sigma(Z,W) = \Sigma(Z|W)^{-1}\Sigma(Z,W)\Sigma(W)^{-1}\Sigma(W|Z). \tag{71}
\]
Multiplying through on the left by $\Sigma(Z|W)$, (71) is equivalent to
\[
\Sigma(Z|W)\Sigma(Z)^{-1}\Sigma(Z,W) = \Sigma(Z,W)\Sigma(W)^{-1}\Sigma(W|Z),
\]
which follows immediately on expanding $\Sigma(Z|W)$ and $\Sigma(W|Z)$, thus establishing (64).

References

[1] M. Ding, Y. Chen and S. Bressler. Granger causality: Basic theory and application to neuroscience. In B. Schelter, M. Winterhalder, and J. Timmer, editors, Handbook of Time Series Analysis, 438–460. Wiley, Weinheim, 2006.
[2] K. Friston, L. Harrison and W. Penny. Dynamic causal modeling. Neuroimage, 19(4):1273–1302, 2003.
[3] T. Schreiber. Measuring information transfer. Phys Rev Lett, 85(2):461–464, 2000.
[4] C. Ladroue, S. Guo, K. Kendrick and J. Feng. Beyond element-wise interactions: Identifying complex interactions in biological processes. PLoS One, 4:e6899, 2009.
[5] J. Geweke. Measurement of linear dependence and feedback between multiple time series. J Am Stat Assoc, 77(378):304–313, 1982.
[6] Z. Zhou, Y. Chen, M. Ding, P. Wright, Z. Lu, and Y. Liu. Analyzing brain networks with PCA and conditional Granger causality. Hum Brain Mapp, 30:2197–2206, 2009.
[7] N. Wiener. The theory of prediction. In E. F. Beckenbach, editor, Modern Mathematics for Engineers. McGraw-Hill, New York, NY, 1956.
[8] C. W. J. Granger. Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37:424–438, 1969.
[9] S. L. Bressler and A. K. Seth. Wiener-Granger causality: A well established methodology. Neuroimage, x:xx–xx, 2010.
[10] A. K. Seth. Explanatory correlates of consciousness: Theoretical and computational challenges. Cognitive Computation, 1(1):50–63, 2009.
[11] A. K. Seth. Measuring autonomy and emergence via Granger causality. Artificial Life, 16(2), 2009.
[12] M. G. Kendall and A. Stuart. The Advanced Theory of Statistics, Volume 2: Inference and Relationship. Griffin, London, 1979.
[13] L. Barnett, A. B. Barrett and A. K. Seth. Granger causality and transfer entropy are equivalent for Gaussian variables. Phys Rev Lett, 103:238701, 2009.
[14] C. W. J. Granger. Economic processes involving feedback. Inform Control, 6:28–48, 1963.
[15] P. Whittle. The analysis of multiple stationary time series. J Royal Stat Soc B, 15(1):125–139, 1953.
[16] A. Wald. Tests of statistical hypotheses concerning several parameters when the number of observations is large. T Am Math Soc, 54(3):426–482, 1943.
[17] A. K. Seth. A MATLAB toolbox for Granger causal connectivity analysis. J Neurosci Meth, 186:262–273, 2010.
[18] J. Geweke. Measures of conditional linear dependence and feedback between time series. J Am Stat Assoc, 79(388):907–915, 1984.
[19] A. Kaiser and T. Schreiber. Information transfer in continuous processes. Physica D, 166:43–62, 2002.
[20] Y. Chen, G. Rangarajan, J. Feng and M. Ding. Analyzing multiple nonlinear time series with extended Granger causality. Phys Lett A, 324:26–35, 2004.
[21] D. Marinazzo, M. Pellicoro, and S. Stramaglia. Kernel method for nonlinear Granger causality. Phys Rev Lett, 100(14):144103, 2008.
[22] A. K. Seth. Causal connectivity of evolved neural networks during behavior. Network: Computation in Neural Systems, 16:35–54, 2005.
[23] O. Sporns, G. Tononi and G. M. Edelman. Theoretical neuroanatomy: Relating anatomical and functional connectivity in graphs and cortical connection matrices. Cerebral Cortex, 10:127–141, 2000.
[24] Y. Chen, S. L. Bressler and M. Ding. Frequency decomposition of conditional Granger causality and application to multivariate neural field potential data. J Neurosci Meth, 150:228–237, 2006.
[25] Y. A. Rozanov. Stationary Random Processes. Holden-Day, San Francisco, CA, 1967.
[26] J. Doob. Stochastic Processes. John Wiley, New York, NY, 1953.
[27] H. Akaike. A new look at the statistical model identification. IEEE Trans Autom Control, 19:716–723, 1974.
[28] G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464, 1978.
[29] S. Guo, A. K. Seth, K. Kendrick, C. Zhou and J. Feng. Partial Granger causality: Eliminating exogenous inputs and latent variables. J Neurosci Meth, 172:79–93, 2008.
[30] L. A. Baccalá and K. Sameshima. Partial directed coherence: a new concept in neural structure determination. Biol Cybern, 84:463–474, 2001.
[31] O. Sporns. Complexity. Scholarpedia, 2(10):1623, 2007.
[32] A. K. Seth, E. Izhikevich, G. N. Reeke and G. M. Edelman. Theories and measures of consciousness: An extended framework. P Natl Acad Sci USA, 103(28):10799–10804, 2006.
[33] M. Shanahan. Dynamical complexity in small-world networks of spiking neurons. Phys Rev E Stat Nonlin Soft Matter Phys, 78:041924, 2008.
[34] G. Tononi, O. Sporns and G. M. Edelman. A measure for brain complexity: Relating functional segregation and integration in the nervous system. P Natl Acad Sci USA, 91:5033–5037, 1994.
[35] M. A. Bedau. Weak emergence. Philosophical Perspectives, 11:375–399, 1997.
[36] C. R. Shalizi and C. Moore. What is a macrostate: Subjective observations and objective dynamics. http://arxiv.org/abs/cond-mat/0303625, 2003.
[37] C. R. Shalizi, R. Haslinger, J.-B. Rouquier, K. L. Klinkner and C. Moore. Automatic filters for the detection of coherent structure in spatiotemporal systems. Phys Rev E Stat Nonlin Soft Matter Phys, 73:036104, 2006.
[38] A. K. Seth. Granger causality. Scholarpedia, 2(7):1667, 2007.
[39] D. O. Hebb. The Organization of Behavior. Wiley, New York, NY, 1949.
[40] K. D. Harris. Neural signatures of cell assembly organization. Nat Rev Neurosci, 6:399–407, 2005.
[41] R. Gaillard, S. Dehaene, C. Adam, S. Clémenceau, D. Hasboun, M. Baulac, L. Cohen and L. Naccache. Converging intracranial markers of conscious access. PLoS Biol, 7:e61, 2009.
[42] L. Angelini, M. de Tommaso, D. Marinazzo, L. Nitti, M. Pellicoro and S. Stramaglia. Redundant variables and Granger causality. Phys Rev E, in press, 2010.
[43] R. B. Kline. Principles and Practice of Structural Equation Modeling. Guilford Press, New York, NY, 2005.