Analysis of multiview legislative networks with structured matrix factorization: Does Twitter influence translate to the real world?
The rise of social media platforms has fundamentally altered the public discourse by providing easy to use and ubiquitous forums for the exchange of ideas and opinions. Elected officials often use such platforms for communication with the broader pub…
Authors: Shawn Mankad, George Michailidis
The Annals of Applied Statistics 2015, Vol. 9, No. 4, 1950–1972. DOI: 10.1214/15-AOAS858. © Institute of Mathematical Statistics, 2015.

ANALYSIS OF MULTIVIEW LEGISLATIVE NETWORKS WITH STRUCTURED MATRIX FACTORIZATION: DOES TWITTER INFLUENCE TRANSLATE TO THE REAL WORLD?

By Shawn Mankad and George Michailidis, Cornell University and University of Michigan

The rise of social media platforms has fundamentally altered the public discourse by providing easy to use and ubiquitous forums for the exchange of ideas and opinions. Elected officials often use such platforms for communication with the broader public to disseminate information and engage with their constituencies and other public officials. In this work, we investigate whether Twitter conversations between legislators reveal their real-world position and influence by analyzing multiple Twitter networks that feature different types of link relations between the Members of Parliament (MPs) in the United Kingdom and an identical data set for politicians within Ireland. We develop and apply a matrix factorization technique that allows the analyst to emphasize nodes with contextual local network structures by specifying network statistics that guide the factorization solution. Leveraging only link relation data, we find that important politicians in Twitter networks are associated with real-world leadership positions, and that rankings from the proposed method are correlated with the number of future media headlines.

1. Introduction. There is a growing literature that attempts to understand and exploit social networking platforms for resource optimization and marketing, as it is a major interest for private enterprises and political campaigns attempting to propagate particular opinions or products [NYTimes (2011, 2012, 2013)].
An important problem is the identification of influential individuals that facilitate communication over the network. In this paper, we develop a modeling approach that captures influence from multiple networks that feature different link relations between the same set of nodes (e.g., Twitter accounts). Such multiview data are increasingly common due to the complex structure of many networking platforms. Specifically, we analyze three different types of networks that are commonly derived from Twitter data, each composed of either weighted or binary links.

Received October 2014; revised July 2015. Key words and phrases. Matrix factorization, networks, influence, Twitter. This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Applied Statistics, 2015, Vol. 9, No. 4, 1950–1972. This reprint differs from the original in pagination and typographic detail.

Twitter is a popular platform with over 270 million active accounts each month as of September 2014 [Twitter (2014)]. Twitter allows accounts to post short messages of 140 characters or less, commonly referred to as “tweets,” that can be read by any visitor. A tweet that is a copy of another account's tweet is called a “retweet.” Within a tweet, an account can mention another account by referring to its account name with the @ symbol as a prefix. Accounts also declare the other accounts they are interested in “following,” which means the follower receives a notification whenever a new tweet is posted by the followed account. These three directed actions define the political Twitter networks that we analyze in this work.
The first network is a retweet network, where links are directed and weighted to denote the log-number of retweets from one account to another over an interval of time. The second network is also composed of directed and weighted links that denote the log-number of mentions one account gives another. The third network is constructed with directed binary links that denote the follower and followed relationships between accounts.

These three networks, each featuring 416 Members of Parliament (MPs) in the United Kingdom, are drawn in the top panel of Figure 1, where accounts are registered to 172 Conservative MPs, 185 Labour, 43 Liberal Democrats, 5 MPs representing the Scottish National Party (SNP), and 11 MPs belonging to other parties. There are 650 MPs forming the House of Commons, the lower house in the bicameral legislative body for the United Kingdom. Each MP is democratically elected to represent a constituency for a five-year term, though often elections are held more frequently when Parliament is dissolved.

The second set of political Twitter networks that we analyze are drawn in the bottom panel of Figure 1. Each network is composed of 348 nodes that represent the accounts of Irish politicians and political organizations at all levels of government, including the President of the Republic of Ireland, members of the local and national government, and elected representatives for the European Union.

The raw data for both data sets, collected and processed by Greene and Cunningham (2013), consist of approximately 500,000 tweets and 40,000 follower links from late 2012. An empirical pattern observed in these data and also in previous studies [Huberman, Romero and Wu (2008)] is that the follower network is very dense in contrast to the retweet and mentions networks.
Almost all politicians interact via retweeting or mentioning with a smaller number of other accounts, relative to their follower declarations. Moreover, users with many followers post updates less often than those with fewer followers [Huberman, Romero and Wu (2008)]. Such empirical findings suggest that not all links are created equally, and usually the follower network is discarded because it does not accurately capture patterns of conversation. However, each network, including the follower network, contains meaningful information, especially since we only consider the population of politicians in a specific legislative body instead of a broad set of users or even the entire Twitter userbase.

Fig. 1. Panels: (a) Retweet network, (b) Mentions network, (c) Follows network. The top panel shows networks of UK Members of Parliament and the bottom panel shows networks of Irish politicians and political organizations. Node color and vertex shapes denote party affiliation. The average degree for the UK Retweet, Mentions and Follows network is 9.13, 25.51 and 65.25, respectively. The average degree for the Irish Retweet, Mentions and Follows network shown in the bottom row is 5.81, 15.28 and 48.44, respectively.

Previous research has found that Twitter and other social networking platforms help facilitate communication between politicians, government agencies and the broader public. Golbeck, Grimes and Rogers (2010) find by text mining tweets that members of the United States Congress employ Twitter for primarily two purposes: information dissemination and self promotion. Tumasjan et al.
(2010) find that the number of tweets from the general public mentioning a political party or politician is a valid indicator of political sentiment and a good predictor of federal election results in Germany. More recently, similar results have been found for federal elections in Australia and the U.S. House of Representatives [Unankard et al. (2014), McKelvey, DiGrazia and Rojas (2014)]. In contrast to these previous works, we rely only on the link relations, so-called “meta-data,” among politicians to measure influence and identify conversation flows with network analysis. Approaches that utilize content analysis can face significant challenges associated with text and image analysis (accounts can post a photo within a tweet), such as language differences, tone and sentiment characterization, and so on.

There has been extensive work on ranking nodes on a network by their importance, primarily motivated by search on the World Wide Web. We find our proposed method compares favorably for ranking politicians against two seminal works called PageRank [Page et al. (1999)] and HITS [Hyperlink-Induced Topic Search; Kleinberg (1999)]. The idea behind PageRank is to use as a measure of importance an estimate of the probability of reaching a given node by randomly following edges. HITS utilizes the so-called authority and hub scores, which are computed from the leading eigenvectors of $A^T A$ and $A A^T$, respectively, where $A$ is an adjacency matrix.

Our main goal of identifying influential politicians is also closely related to role identification, which aims to assign roles based on local connectivity patterns.
Typically, role analysis methods rely on analyzing ego networks (the union of a node and its neighbors), network statistics or graph-coloring techniques [Salter-Townshend and Murphy (2015)]. Also note that while there have been many recent advances in community detection, including the stochastic block model, latent position cluster models and others [see Fienberg (2012), Salter-Townshend et al. (2012) for survey articles], the task in this article is different from typical community detection, which aims to extract groups of nodes that feature relatively dense within-group connectivity and sparser between-group connectivity.

That said, community detection could help guide a search for influential politicians. For instance, an analyst may examine each network separately by first discovering communities, if unknown, then searching for interesting network statistic profiles within each group. There are in principle many ways to combine community detection with network statistics for the identification of influential nodes (e.g., politicians), but it remains unclear which is the preferred method. In this paper, we integrate both steps together to address this issue. The proposed factorization model is also able to emphasize nodes with interesting path-related properties by incorporating node-level statistics that capture these nonlinear relationships, thus leading to more interpretable measures of influence and substructure.

The main idea is to guide the mapping of the multiview networks to lower-dimensional spaces using structured matrix factorization. Nonnegativity constraints are also imposed on the lower-dimensional spaces to improve data representation and structural discovery.
Such constraints have been popularized with the nonnegative matrix factorization (NMF) and Semi-NMF, where one or all matrix factors are composed of only nonnegative entries and have been shown to be advantageous for data representation [Lee and Seung (1999), Ding, Li and Jordan (2010)]. As validation, we find that important politicians identified using our modeling approach are associated with real-world leadership positions, and that rankings from the proposed method are significantly correlated with future media headlines. The consistent findings between both data sets suggest the model can be a relatively straightforward technique for identifying influential individuals with political Twitter networks from other countries that feature different government structures, and that it can complement the potentially more involved content analysis for related tasks.

The next section introduces the matrix factorization model, followed by estimation details in Section 3. Section 4 summarizes and compares results of the proposed model against alternative methodologies with UK MPs and Irish politicians. This article closes with a brief discussion in Section 5.

2. Structured semi-NMF for influence discovery. The use of low-rank approximations to network related matrices follows a long line of previous work. In classical spectral layout, the coordinates of each node are given by the Singular Value Decomposition (SVD) of the Laplacian matrix [Koren (2005), Brandes, Fleischer and Puppe (2006)]. Recently, there has been extensive interest in spectral clustering [Rohe and Yu (2012), Rohe, Chatterjee and Yu (2011)], which discovers community structure in the eigenvectors of the Laplacian matrix.
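As a small illustration of the spectral ideas just described, the following sketch embeds the nodes of a toy undirected graph using eigenvectors of the unnormalized Laplacian $L = D - A$. It is only a sketch under stated assumptions: the function name `spectral_coordinates`, the toy graph and the choice of the unnormalized Laplacian are ours, not the paper's. For a symmetric positive semidefinite Laplacian, the eigendecomposition used here coincides with the SVD.

```python
import numpy as np

def spectral_coordinates(adj, dim=1):
    """Hypothetical helper: coordinates from the eigenvectors of L = D - A
    that belong to the smallest nonzero eigenvalues (connected, undirected graph)."""
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    # Column 0 is the constant eigenvector (eigenvalue 0), so skip it.
    return eigvecs[:, 1:dim + 1]

# Toy graph (an assumption for illustration): two triangles joined by one edge.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

coords = spectral_coordinates(adj, dim=1)
# The sign of the first coordinate separates the two triangles.
community = np.sign(coords[:, 0])
```

In this toy case the sign pattern of the first coordinate recovers the two triangles, which is the kind of community structure that spectral clustering looks for.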
Low-rank approximations satisfying different constraints other than orthonormality are also popular. For instance, NMF has been proposed for overlapping community detection on static [Psorakis et al. (2011), Wang et al. (2011)] and dynamic [Lin et al. (2008)] networks. When overlaps among communities exist, an advantage of NMF over spectral clustering is that NMF can still find basis vectors for each community, while orthogonality of SVD makes it unlikely that the singular vectors will correspond to each of the communities [Xu, Liu and Gong (2003)].

The basic framework for NMF in network analysis is $A \approx UV^T$, where $A$ is an adjacency matrix and $U, V \in \mathbb{R}^{n \times K}_{\geq 0}$. Written in element form,

$$A_{ij} \approx U_{i1}V_{j1} + \cdots + U_{iK}V_{jK},$$

one can easily see that each edge of the given network is approximated with a nonnegative sum. Consequently, each term in the sum, $U_{ik}V_{jk}$, represents the contribution of the $k$th latent structure (often capturing community structure, especially when decomposing sparse adjacency matrices [Mankad and Michailidis (2013b)]) to the edge from $i$ to $j$. Edge decompositions can be aggregated by node, or one can use the rows of $V$ to directly determine node community membership. The factors are found by solving

$$\min_{U \geq 0,\, V \geq 0} \|A - UV^T\|_F^2,$$

where $\|\cdot\|_F$ denotes the Frobenius norm. The optimization can be performed using gradient-descent algorithms for penalized optimization. Given that the proposed model in this article utilizes nonnegativity, we follow a similar algorithmic approach to the NMF literature.

Enforcing nonnegativity on a single matrix factor was first proposed in Ding, Li and Jordan (2010) with the so-called Semi-NMF to improve interpretability of the resultant factorizations with data of mixed signs. We utilize the flexibility of Semi-NMF and extend it to the network setting with a structured approach that incorporates graph geometry into the factorization through user-specified matrices. In particular, we aim to utilize the many node-level statistics that have been proposed in the network literature to guide the factorization solution. Next we introduce the model for singleview networks, then extend to multiview networks, followed by estimation procedures in the next section.

2.1. Singleview networks. Let $A$ denote the adjacency matrix from a single, given network with $n$ nodes. We start with the following graph Structured Semi-NMF model of Mankad and Michailidis (2013a):

$$\min_{\Lambda,\, \Theta \geq 0} \|A - S\Lambda\Theta^T\|_F^2, \tag{1}$$

where $S \in \mathbb{R}^{n \times D}$, $\Lambda \in \mathbb{R}^{D \times K}$, and $\Theta \in \mathbb{R}^{n \times K}_{\geq 0}$. Note that $\Theta$ is nonnegatively constrained, but $\Lambda$ is not, which is why the model fits into the Semi-NMF framework. Each factor in the product $\Lambda\Theta^T$ is estimated from the data and provides coefficients for each node that represent the given adjacency matrix in terms of $S$.

The $S$ matrix is composed of $D$ node-level statistics that are specified by the analyst before performing the factorization to emphasize nodes that drive influence. There is an extensive literature in network analysis providing potential node-level statistics [Newman (2010)]. In our analysis, the $S$ matrix is constructed using $D = 4$ network statistics and has form

$$S_i = [\text{clustering coefficient}_i,\ \text{betweenness}_i,\ \text{closeness}_i,\ \text{degree}_i],$$

where $i = 1, \ldots, n$. The clustering coefficient for a given node quantifies how close its neighbors are to forming a complete graph [Newman (2010)]. A higher clustering coefficient will emphasize politicians that “create buzz.” Betweenness [Freeman (1979)] and closeness [Newman (2010)] rely on shortest path statistics and capture important links from hub nodes.
Degree, the number of connections a node has obtained, ensures that active politicians within communities are emphasized in the factorization.

If there are no node-specific values that are obvious to use for $S$, one can start with many candidate node-level statistics and search for subsets that fit the data well while maintaining interpretability. This strategy will be discussed further below to also show robustness and assess the specification of $S$ in our application. Instead of searching over node-specific statistics, one could also be tempted to set $S = I_{n \times n}$ to be the identity matrix. In this case, the factorization is essentially the standard Semi-NMF factorization. Our results show that the Semi-NMF model performs similarly to classical importance measures, like PageRank and HITS, which should be preferred due to their more efficient implementations.

The proposed model implies certain connectivity dynamics that can be seen when equation (1) is written in element form

$$A_{ij} \approx (S\Lambda)_{i1}\Theta_{j1} + \cdots + (S\Lambda)_{iK}\Theta_{jK}, \qquad (S\Lambda)_{ik} = S_{i1}\Lambda_{1k} + \cdots + S_{iD}\Lambda_{Dk}.$$

For any node $i$, outgoing edges are controlled by its local topological characteristics, as measured in $S$, and how communities load onto the statistics in $S$, given in the columns of $\Lambda$. When multiplied together, $S\Lambda$ forms centroids in a $K$-dimensional space that capture the outgoing node influence from each of the communities. The receiving node $j$ in an edge is determined by the $j$th row of $\Theta$, where larger values mean the node is more likely to have incoming connections and, hence, greater influence.
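To make the construction of $S$ and the element form above concrete, here is a minimal sketch. It assumes a simple undirected, unweighted graph, whereas the paper's networks are directed and partly weighted, so it should be read as an illustration only. The function name `node_statistics`, the toy graph and the random factors `Lam` and `Theta` are hypothetical. Betweenness is computed with Brandes-style breadth-first accumulation, and closeness as the number of reachable nodes divided by the sum of their distances.

```python
from collections import deque
import numpy as np

def node_statistics(adj):
    """Return an n x 4 matrix [clustering, betweenness, closeness, degree]
    for a simple undirected, unweighted graph given as a 0/1 adjacency matrix."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    nbrs = [np.flatnonzero(adj[i]).tolist() for i in range(n)]
    degree = np.array([len(nb) for nb in nbrs], dtype=float)

    # Local clustering: fraction of neighbor pairs that are themselves linked.
    clustering = np.zeros(n)
    for i, nb in enumerate(nbrs):
        k = len(nb)
        if k >= 2:
            links = adj[np.ix_(nb, nb)].sum() / 2.0
            clustering[i] = links / (k * (k - 1) / 2.0)

    closeness = np.zeros(n)
    betweenness = np.zeros(n)
    for s in range(n):
        # BFS from s: distances, shortest-path counts and predecessors.
        dist = [-1] * n
        sigma = [0.0] * n
        preds = [[] for _ in range(n)]
        dist[s], sigma[s] = 0, 1.0
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in nbrs[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        reached = [d for d in dist if d > 0]
        if reached:
            closeness[s] = len(reached) / sum(reached)
        # Back-propagate pair dependencies in reverse BFS order.
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    betweenness /= 2.0  # each undirected pair was visited from both endpoints
    return np.column_stack([clustering, betweenness, closeness, degree])

# Toy graph (an assumption): two triangles joined by the edge (2, 3).
adj = np.zeros((6, 6), dtype=int)
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[a, b] = adj[b, a] = 1
S = node_statistics(adj)                      # n x D with D = 4

# Element form of the model with arbitrary (not estimated) factors.
rng = np.random.default_rng(0)
K = 2
Lam = rng.normal(size=(S.shape[1], K))        # unconstrained D x K factor
Theta = rng.uniform(size=(S.shape[0], K))     # nonnegative n x K factor
A_hat = S @ Lam @ Theta.T
i, j = 2, 3
edge_terms = (S @ Lam)[i] * Theta[j]          # contribution of each latent structure k
```

Summing `edge_terms` reproduces `A_hat[i, j]`, which is the element-form identity written out for a single edge.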
Due to nonnegativity and the fact that $\Theta$ modulates incoming connections, we accomplish our ultimate goal of measuring overall influence for the $i$th node by taking its cumulative sum of importance to each community

$$I_i = \sum_{k=1}^{K} \Theta_{ik}. \tag{2}$$

As illustrated in the supplemental article [Mankad and Michailidis (2015)] on a toy example, the $S$ matrix plays a pivotal role in the factorization, and causes $I$ to be an effective importance measure even with its relatively simple definition. Next we propose an extension of this model to the multiview setting found in political Twitter networks.

2.2. Multiview networks. Let $A_m$ denote the adjacency matrix from the corresponding Twitter network, where $m \in \{\text{retweet}, \text{mentions}, \text{follows}\}$. We extend the singleview model with

$$\min_{\Lambda_m,\, \Theta \geq 0,\, V_m \geq 0} \sum_m \|A_m - S_m\Lambda_m(\Theta + V_m)^T\|_F^2, \tag{3}$$

where $S_m \in \mathbb{R}^{n \times D}$, $\Lambda_m \in \mathbb{R}^{D \times K}$, and $\Theta, V_m \in \mathbb{R}^{n \times K}_{\geq 0}$. $\Theta$ is common to all $m$ networks to capture general structure and makes the objective function nonseparable, whereas $V_m$ reveals network-specific structure and also implicitly weights each network according to its importance in the factorization.

The $S_m$ matrices are defined similarly to the singleview case, using node-level network statistics. We define $S_m$ using the same four network statistics for each network view. Weighted versions of the clustering coefficient and degree are utilized for the Retweet and Mention networks in order to take into account the frequency of interaction between politicians, since the frequency should help measure the strength of a relationship [Barrat et al. (2004)]. For instance, a weighted network statistic will distinguish between a politician that is retweeted by the same account hundreds of times versus retweeted once.
The model does allow for different statistics to be defined with each network view, which may be advantageous in other contexts.

The final importance measure $I$ can also be calculated similarly using equation (2). Since $\Theta$ is common to all networks, the importance measure is a result of integrating multiple network views in addition to structured discovery.

3. Algorithms. The estimation algorithm we present is an iterative one that cycles between optimizing with respect to $\Theta$, $V_m$ and $\Lambda_m$ with the following updates:

$$\Theta = \sum_m A_m^T S_m \Lambda_m (\Lambda_m^T S_m^T S_m \Lambda_m)^{-1},$$

$$V_m = A_m^T S_m \Lambda_m (\Lambda_m^T S_m^T S_m \Lambda_m)^{-1},$$

$$\Lambda_m = (S_m^T S_m)^{-1} S_m^T A_m (\Theta + V_m)\bigl((\Theta + V_m)^T(\Theta + V_m)\bigr)^{-1}.$$

The updates are based on alternating least squares (ALS) and derived through standard arguments [Kroonenberg and de Leeuw (1980)], which are shown in the supplemental article [Mankad and Michailidis (2015)].

Technically, both $\Theta$ and $V_m$ require solving nonnegatively constrained least squares problems, which result in high iteration costs. So, instead of exactly solving the constrained least squares problem, we follow a heuristic that solves for an unconstrained solution, then sets any entry less than a user-specified constant to that constant. Projecting to a small constant instead of zero follows the discussion in Gillis and Glineur (2008) and Katayama, Takahashi and Takeuchi (2013) to overcome numerical instabilities that occur when too many elements are exactly zero. Theoretical properties are difficult to obtain due to the projection step. Yet this approximation is computationally efficient, easy to implement, and has been shown to achieve high quality solutions [Berry et al. (2007)].

The algorithm easily scales to networks with tens of thousands of nodes.
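The projected ALS heuristic can be sketched for the singleview model in equation (1), where there is no $V_m$ term and the two updates reduce to ordinary least squares solves. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `structured_semi_nmf`, the random toy inputs and the use of pseudo-inverses in place of explicit inverses are ours. The initialization of $\Lambda$ with unit mean and variance, the relative-change stopping rule and the projection threshold of $10^{-4}$ follow Section 3.1.

```python
import numpy as np

def structured_semi_nmf(A, S, K, eps=1e-4, tol=1e-4, max_iter=200, seed=0):
    """Projected ALS for min ||A - S Lam Theta^T||_F^2 with Theta >= eps."""
    rng = np.random.default_rng(seed)
    D = S.shape[1]
    # Only Lam needs initializing: normal entries with unit mean and variance.
    Lam = rng.normal(loc=1.0, scale=1.0, size=(D, K))
    # Equals (S^T S)^{-1} S^T when S has full column rank.
    S_pinv = np.linalg.pinv(S)
    prev_obj = None
    for _ in range(max_iter):
        B = S @ Lam                                     # n x K
        # Unconstrained least squares solution for Theta ...
        Theta = A.T @ B @ np.linalg.pinv(B.T @ B)
        # ... followed by projection onto entries >= eps.
        Theta = np.maximum(Theta, eps)
        # Least squares update for the unconstrained factor Lam.
        Lam = S_pinv @ A @ Theta @ np.linalg.pinv(Theta.T @ Theta)
        obj = np.linalg.norm(A - S @ Lam @ Theta.T, "fro") ** 2
        # Stop when the relative change in the objective is at most tol.
        if prev_obj is not None and prev_obj > 0:
            if abs(obj - prev_obj) / prev_obj <= tol:
                break
        prev_obj = obj
    return Lam, Theta, obj

# Toy inputs (assumptions): a sparse random count matrix and placeholder statistics.
rng = np.random.default_rng(0)
n, D, K = 30, 4, 3
A = rng.poisson(0.3, size=(n, n)).astype(float)
S = rng.uniform(size=(n, D))

Lam, Theta, obj = structured_semi_nmf(A, S, K)
importance = Theta.sum(axis=1)        # equation (2): I_i = sum_k Theta_ik
ranking = np.argsort(-importance)     # most important node first
```

Because of the projection step the objective is not guaranteed to decrease at every iteration, which is consistent with the remark above that theoretical properties are difficult to obtain.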
For even larger networks on the order of millions of nodes, low-rank factorizations should be found using recent algorithmic advances that exploit parallel computing architecture [Gemulla et al. (2011), Recht and Ré (2013)]. For our data, we find that the alternating least squares algorithm is straightforward to implement and able to recover meaningful factorizations in a timely fashion.

In the supplemental article [Mankad and Michailidis (2015)], we also discuss an alternative updating approach for $\Theta$ and $V_m$ that is similar to the popular “multiplicative updating” for NMF. While this approach is also very easy to implement, we find the ALS algorithm more numerically stable in higher dimensions.

3.1. Initialization and convergence criteria. An advantage of the ALS algorithm is that only $\Lambda_m$ needs to be initialized if the order of the updates is $\Theta$, $V_m$, $\Lambda_m$. Moreover, recall that $\Lambda_m$ is unconstrained, thus bypassing the difficulties of initializing the nonnegative factors, which have received extensive focus in the NMF literature. We find stable results by initializing $\Lambda_m$ with normally distributed entries having unit mean and variance.

Another important issue is specifying the rank of the matrices $\Theta$ and $V_m$. Ideally, the rank should be equal to the number of underlying communities and can be ascertained by examining the accuracy of the reconstruction as a function of rank. In principle, one could also apply cross-validation procedures for matrix factorization [Owen and Perry (2009)], though this may become cumbersome with sparse or extremely large-sized networks. We follow a strategy similar to using a scree plot to choose the number of components to retain in Principal Component Analysis [Jolliffe (1986)].
To our knowledge, this rank selection approach has not been previously pursued in the context of NMF or Semi-NMF. As shown in Figure 2, we find that ranks greater than six (roughly the number of underlying political parties) yield little marginal explanatory power. Each subfigure is constructed by plotting the best fitting factorization over all possible network statistic subsets of size two through four. The appropriate rank of the matrices $\Theta$ and $V_m$ is stable across the $S_m$ subsets, though there appears to be significant improvement when $S_m$ is defined with at least three of the network statistics. We keep all four network statistics when defining $S_m$ for our analysis.

Last, we discuss convergence criteria used for the ALS algorithm. Let $\mathcal{O}^{(i)}$ denote the value of the objective function at iteration $i$. Then the algorithm stops when

$$\frac{|\mathcal{O}^{(i)} - \mathcal{O}^{(i-1)}|}{\mathcal{O}^{(i-1)}} \leq \varepsilon = 10^{-4}.$$

We find in all our investigations that the algorithm converges within 50 iterations. $\varepsilon = 10^{-4}$ is also used for the projection threshold.

4. Analysis of the political multiview Twitter networks.

4.1. Does Twitter influence translate to the real world? Using the best rank six factorization with $S_m$ defined with all four network statistics, we rank MPs according to the estimated $\Theta$ and the importance measure defined in equation (2).

Fig. 2. The percentage of variance explained [$100 \cdot (1 - \sum_m \|A_m - \hat{A}_m\|_F^2 / \|A_m - \hat{\mu}_m\|_F^2)$, where $\hat{\mu}_m$ is a matrix filled with the average value of $A_m$] for the Structured Semi-NMF with different constructions of $S_m$. Plotted is the most accurate model over thirty trials with random initializations for $\Lambda_m$ at each possible specification. We use the best rank six model with four network statistics composing $S_m$ for the final analysis.
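The scree-style rank selection of Section 3.1 relies on the percentage of variance explained reported in Figure 2. The sketch below shows one way to compute it, under assumptions that should be read carefully: the function name `percent_variance_explained` is hypothetical, the residual and total sums of squares are pooled over views before taking the ratio (one reading of the caption's formula), and a truncated SVD is used purely as a stand-in reconstruction rather than the Structured Semi-NMF fit.

```python
import numpy as np

def percent_variance_explained(views, reconstructions):
    """100 * (1 - pooled residual sum of squares / pooled total sum of squares)."""
    resid, total = 0.0, 0.0
    for A, A_hat in zip(views, reconstructions):
        A = np.asarray(A, dtype=float)
        resid += np.sum((A - A_hat) ** 2)
        # Baseline: a matrix filled with the average value of A.
        total += np.sum((A - A.mean()) ** 2)
    return 100.0 * (1.0 - resid / total)

def truncated_svd(A, rank):
    """Stand-in low-rank reconstruction (not the paper's factorization)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# Toy views (assumptions): three random sparse count matrices.
rng = np.random.default_rng(0)
views = [rng.poisson(0.4, size=(20, 20)).astype(float) for _ in range(3)]

# Scree-style curve: variance explained as a function of rank.
curve = [
    percent_variance_explained(views, [truncated_svd(A, r) for A in views])
    for r in range(1, 9)
]
```

One would then look for the rank beyond which the curve flattens, as the text does when it settles on rank six.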
Figure 3 shows the importance scores from the Structured Semi-NMF, Semi-NMF, PageRank and HITS. PageRank and HITS are computed using the retweet network, which has been shown to capture conversation dynamics better than the other network types [Cha et al. (2010)]. Not surprisingly, the different importance measures are all positively correlated.

Fig. 3. Importance scores based on Structured Semi-NMF, Semi-NMF ($S_m = I_{n \times n}$), PageRank and HITS (Authority Scores). PageRank and HITS are both calculated using the Retweet network, while the other measures utilize all three networks. The radius of the circle indicates the count of future newspaper headlines as measured with Lexis–Nexis. The top ten MPs for the methods in each scatterplot are labeled. David Cameron, who is Prime Minister and in boldface, was not in the top ten for any method.

Accordingly, as shown in Table 1, there is general agreement between Structured Semi-NMF, Semi-NMF and HITS in the top ten important MPs. Many of these MPs held leadership positions in the coalition or Opposition cabinets. For instance, Ed Miliband, leader of the Labour Party and of the Opposition at the time of writing, is prominently emphasized in all rankings. Tom Watson was the Deputy Chair of the Labour Party, and Chuka Umunna is the Shadow Secretary of State for Business, Innovation and Skills. The exceptions are Rachel Reeves, who became the Shadow Secretary of State for Work and Pensions for the Opposition after the data was collected, and David Miliband, who held several important positions in previous terms prior to data collection.

Another commonality is that, with the exception of PageRank, every MP in the top ten is from the Labour Party.
Labour MPs tend to be estimated as most important, followed by Conservative, and then Liberal Democrat MPs. The relative ranking among parties is consistent with the data, where Labour MPs tend to be the most active users. Of the top fifty Twitter accounts in terms of number of retweets or mentions, only four are affiliated with another party: the Conservatives. The Liberal Democrats are even less active, ranked in the hundreds in terms of number of retweets or mentions. For instance, Nick Clegg, leader of the Liberal Democrats and Deputy Prime Minister at the time of writing, is typically the top-ranked member of his party at forty-nine with Structured Semi-NMF, forty with PageRank, and outside the top hundred with both Semi-NMF and HITS.

Activity in the data set is likely associated with longevity on Twitter. For instance, David Cameron, Prime Minister and leader of the Conservatives, is ranked twenty-nine with Structured Semi-NMF, sixty-eight with Semi-NMF, sixteen with PageRank, and two hundred and forty-two with HITS. Cameron joined Twitter just as the data was collected in October 2012, and, thus, may have artificially low levels of activity when compared against more recent data. In spite of these challenges, PageRank and Structured Semi-NMF with use of the $S_m$ matrix are able to boost these key MPs' importance, even though they interact via Twitter with their MP colleagues relatively infrequently.

Table 1. MP rankings and in parentheses the party and frequency that the MP appears in future headlines for Structured Semi-NMF, Semi-NMF ($S_m = I_{n \times n}$), PageRank and HITS (Authority Scores). L denotes Labour, C denotes Conservative.

Rank | Structured Semi-NMF | Semi-NMF | PageRank | HITS
1 | Ed Miliband (L, 2478) | Ed Miliband (L, 2478) | Ian Austin (L, 3) | Michael Dugher (L, 120)
2 | Ed Balls (L, 580) | Ed Balls (L, 580) | William Hague (C, 771) | Ed Miliband (L, 2478)
3 | Tom Watson (L, 253) | Michael Dugher (L, 120) | Hugo Swire (C, 57) | Ed Balls (L, 580)
4 | Michael Dugher (L, 120) | Tom Watson (L, 253) | Tom Watson (L, 253) | Chuka Umunna (L, 203)
5 | Chuka Umunna (L, 203) | Chuka Umunna (L, 203) | Ed Balls (L, 580) | Andy Burnham (L, 125)
6 | Rachel Reeves (L, 54) | Rachel Reeves (L, 54) | Michael Dugher (L, 120) | Tom Watson (L, 253)
7 | Stella Creasy (L, 178) | Chris Bryant (L, 164) | Pat McFadden (L, 1) | Rachel Reeves (L, 54)
8 | Chris Bryant (L, 164) | Stella Creasy (L, 178) | Ed Miliband (L, 2478) | Chris Bryant (L, 164)
9 | Tom Harris (L, 113) | Luciana Berger (L, 133) | Stella Creasy (L, 178) | Diana Johnson (L, 105)
10 | David Miliband (L, 489) | Andy Burnham (L, 125) | Matthew Hancock (C, 32) | Tom Harris (L, 113)

We have so far seen anecdotal evidence that many MPs in leadership positions are emphasized by the different techniques. Next, we test in a regression setting whether these different measures of Twitter importance predict media coverage, which is measured using Lexis–Nexis (www.lexisnexis.com) searches of the number of times an MP's name appears in headlines from January 1, 2013, to October 17, 2013. This interval of time is strictly after the Twitter data was collected to avoid endogeneity issues. Because the headline counts were overdispersed, we use a quasi-Poisson regression.
The mean and variance of the regression have the form

$$\mathrm{E}(\mathrm{HeadlineCount}_i) = \exp(\alpha + \beta I_i + \gamma\,\mathrm{Controls}_i), \tag{4}$$

$$\mathrm{Var}(\mathrm{HeadlineCount}_i) = \rho\,\mathrm{E}(\mathrm{HeadlineCount}_i), \tag{5}$$

where $\rho \geq 1$ is estimated from the data. HeadlineCount is the headline occurrence frequency, $I$ is derived using the different importance measurement techniques, and Controls contain the variables Age, Gender, Constituency Size, Political Party and an indicator variable denoting whether each MP represents a constituency within the city of London. Age is an important control variable, since we expect younger MPs to be more savvy with social media, which could affect their headline coverage. Similarly, we expect MPs with larger constituencies, certain political affiliations or London-based MPs to receive more media attention.

Additional discussion in the supplemental article [Mankad and Michailidis (2015)] shows the Poisson distributional assumption appears more valid when compared to other distributions for overdispersion, like negative binomial. Moreover, the quasi-Poisson results featured the smallest root mean squared error (RMSE) for all specifications that we discuss next.

In Figure 4, we examine the RMSE of the model when using only control variables, as well as control variables with each influence measure separately. We find that the model using the proposed factorization features the lowest RMSE, especially after removing an outlier, David Cameron, who received many more future headlines than predicted. As mentioned above, David Cameron joined Twitter just as the original data set was collected, potentially creating an artificially low presence on Twitter.
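To make equations (4) and (5) concrete, the following is a minimal sketch of a quasi-Poisson fit, not the authors' implementation. It assumes a log-link Poisson mean fitted by iteratively reweighted least squares and a dispersion $\rho$ estimated as the Pearson statistic divided by the residual degrees of freedom, which is one common estimator. The function name `fit_quasi_poisson`, the synthetic data and the single stand-in control variable are all hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_quasi_poisson(X, y, n_iter=100, tol=1e-10):
    """Fit E[y] = exp(X @ beta) by IRLS and estimate the dispersion rho.

    Assumes column 0 of X is an intercept column of ones.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta = np.zeros(p)
    beta[0] = np.log(y.mean())            # start at the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu           # working response for the log link
        XtW = X.T * mu                    # X^T W with working weights W = diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    rho = np.sum((y - mu) ** 2 / mu) / (n - p)   # Pearson dispersion, as in (5)
    rmse = np.sqrt(np.mean((y - mu) ** 2))       # in-sample RMSE
    return beta, rho, rmse

# Synthetic, purely illustrative data: one importance score and one control.
rng = np.random.default_rng(0)
n = 500
importance = rng.normal(size=n)
control = rng.normal(size=n)
true_rho = 3.0
mu_true = np.exp(1.0 + 0.6 * importance + 0.2 * control)
# Negative binomial draws with mean mu_true and variance true_rho * mu_true.
y = rng.negative_binomial(mu_true / (true_rho - 1.0), 1.0 / true_rho)

ones = np.ones(n)
X_full = np.column_stack([ones, importance, control])
X_ctrl = np.column_stack([ones, control])

beta_full, rho_full, rmse_full = fit_quasi_poisson(X_full, y)
beta_ctrl, rho_ctrl, rmse_ctrl = fit_quasi_poisson(X_ctrl, y)
print(beta_full, rho_full, rmse_full, rmse_ctrl)
```

Comparing `rmse_full` with `rmse_ctrl` mirrors the comparison between the "None" specification and a specification that adds one influence measure, although the paper's figures come from the real headline counts rather than simulated ones.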
Table 1 in the supplemental article [Mankad and Michailidis (2015)] shows the full results for the estimated model with Structured Semi-NMF, where the corresponding coefficient is statistically significant and positive as expected. Specifying $S_m$ leads to an importance measure that is associated with future media headlines even when controlling for other influence measures and demographic information, thus illustrating the importance of guiding the factorization solution.

Fig. 4. Root mean squared errors for the predicted number of headlines using different specifications of the regression model in equations (4) and (5). "None" refers to including only control variables. "PageRank" refers to the control variables plus the PageRank influence measure, "HITS" refers to the control variables plus the HITS influence measure, and so on.

4.2. Identifying important conversation flows. Another advantage of the proposed factorization is that it can also be used to extract potentially important conversation flows. We construct subgraphs by keeping nodes in the top $q$th percentile of $\sum_k (\Theta + V_m)_{ik}$ to recover structure specific to each network view.

The Structured Semi-NMF does not incorporate party affiliation for the factorization. Yet it results in more interpretable subgraphs than the alternative approach in Figure 5 of looking at high degree nodes within each party. As shown in Figure 6, there are denser within and between party connections, and fewer isolated nodes. Moreover, with the exception of a handful of MPs, each node can reach every other node on the graphs. Thus, these networks help explain the influence rankings from the previous section by identifying paths through which interesting content flowed.
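The subgraph construction described in this section can be sketched as follows. This is an illustrative sketch only: it assumes that $\Theta$ and $V_m$ are available as $n \times K$ arrays whose rows correspond to nodes, and that the network view is stored as an $n \times n$ adjacency matrix. The function name `top_percentile_subgraph` and the toy values are hypothetical, and the toy call uses $q = 50$ so that the result is easy to check by hand, whereas the figures in the paper use $q = 95$.

```python
import numpy as np

def top_percentile_subgraph(A, theta, V_m, q=95):
    """Keep nodes whose score sum_k (theta + V_m)[i, k] is in the top q-th percentile.

    Returns the kept node indices and the induced adjacency submatrix.
    """
    A = np.asarray(A)
    scores = (np.asarray(theta, dtype=float) + np.asarray(V_m, dtype=float)).sum(axis=1)
    cutoff = np.percentile(scores, q)
    keep = np.flatnonzero(scores >= cutoff)
    return keep, A[np.ix_(keep, keep)]

# Toy factor matrices for n = 6 nodes and K = 2 factors (made-up values).
theta = np.array([[0.9, 0.1], [0.2, 0.1], [0.5, 0.5],
                  [0.0, 0.1], [0.3, 0.0], [0.7, 0.6]])
V_m = np.array([[0.1, 0.0], [0.0, 0.1], [0.2, 0.1],
                [0.1, 0.0], [0.0, 0.0], [0.1, 0.2]])

# Toy directed network view, e.g. "i retweets j" (made-up edges).
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 2), (2, 5), (5, 0), (1, 3), (4, 0), (2, 1)]:
    A[i, j] = 1

keep, sub = top_percentile_subgraph(A, theta, V_m, q=50)
print(keep)
print(sub)
```

Because the cutoff is computed separately from each view-specific $V_m$, repeating the call with the factors for the retweet, mentions and follows networks would give one subgraph per view.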
Tracing the flow of conversations in the 95th percentile subgraphs in Figure 7, we see that the Labour politicians tend to retweet each other often. Many of the Labour MPs, including Stella Creasy, Ed Miliband, Chuka Umunna, Rachel Reeves, Tom Watson and others, were universally ranked as important in the previous section. Ed Balls from Labour interacts directly with Greg Hands of the Conservative party, who in turn forms a much smaller retweet clique with fellow Conservatives Matthew Hancock and Mike Fabricant. Since retweeting can amount to an endorsement, while mentioning allows the author to control the content and sentiment, there are a greater number of cross-party mentions edges. For instance, David Cameron is mentioned often and followed by Labour MPs, elevating his importance on those specific networks, but is never retweeted. This illustrates the value of utilizing all three types of networks for measuring importance.

Fig. 5. Subnetworks of UK Members of Parliament chosen by taking the highest degree MPs in each party, with color and vertex shapes denoting party affiliation. MPs are drawn in the same position as in Figure 1. Panels: (a) Retweet network, (b) Mentions network, (c) Follows network.

Fig. 6. Networks of UK Members of Parliament, with color and vertex shapes denoting party affiliation. MPs in the top $q$th percentile of $\sum_k (\Theta + V_m)_{ik}$ are kept and drawn in the same position as in Figure 1.

Fig. 7. Subgraphs constructed for the UK MPs (top panel) and Irish politicians (bottom panel), whose nodes are in the top $q = 95$ percentile of $\sum_k (\Theta + V_m)_{ik}$. Graphs are redrawn to optimize vertex labels. Panels: (a) Retweet network, (b) Mentions network, (c) Follows network.

4.3. Analysis of Twitter networks from the Irish political sphere. We produce comparable, though less pronounced results with similar Twitter network data from the Irish political scene from late 2012. We organize the raw data, again provided in Greene and Cunningham (2013), into the same three Twitter networks, each containing 348 nodes that represent the accounts of Irish politicians and political organizations. The data contains politicians from all levels of government, including the President of the Republic of Ireland, members of the local and national government, and elected representatives for the European Union.

A majority of accounts belong to members of the Irish national parliament, which is also a bicameral legislative body with elections held at least once every five years using a system [Coakley and Gallagher (2005)]. The lower house (Dáil Éireann) is the principal house in the Irish system and contains 166 elected members, while the senate (Seanad Éireann) contains a mixture of 60 appointed and elected members. There are multiple political parties in the data: 33 Fianna Fáil, 127 Fine Gael, 6 Green, 20 Independent, 68 Labour, 22 Sinn Féin and 8 Others. Approximately 60 Twitter accounts are registered to political parties, for example, "Fine Gael Official," "Labour Women," etc.

After specifying $S_m$ as before and setting $K = 7$ (chosen in a similar fashion), we plot the importance scores in Figure 8 and list the top ten accounts in Table 2 from the Structured Semi-NMF, Semi-NMF, PageRank and HITS.
In contrast to the British MP dynamics, political organizations seem to play a much more important role in online conversations within the Irish political sphere, as there is broad agreement among the different importance measures that party organization accounts are highly ranked, such as Fine Gael Official, Young Fine Gael, and The Labour Party. Some politicians are also universally ranked as important. Michael D Higgins, the President at the time of writing, is ranked eleventh under the Structured Semi-NMF, thirteenth under PageRank and in the top ten for all other methods. Jillian van Turnhout is an appointed member of the Seanad Éireann and is consistently ranked highly by the different influence measures. Likewise, Jerry Buttimer is a member of the Dáil Éireann and formerly of the Seanad Éireann, and Simon Harris was elected to the Dáil Éireann in 2011 as its youngest member.

Fig. 8. Importance scores based on Structured Semi-NMF, Semi-NMF ($S_m = I_{n \times n}$), PageRank and HITS (Authority Scores) are both calculated using the Retweet network. The radius of the circle indicates count of future newspaper headlines as measured with Lexis–Nexis. The top ten Irish politicians for the methods in each scatterplot are labeled. Michael Higgins, President, is boldfaced.

Table 2. Irish politician rankings and in parentheses the party and frequency that the politician appears in future headlines for Structured Semi-NMF, Semi-NMF ($S_m = I_{n \times n}$), PageRank and HITS (Authority Scores). L denotes Labour, FG denotes Fine Gael, Ind denotes Independent and SF denotes Sinn Féin. There are no parenthetical headline counts or party names for political organizations.

| Rank | Structured Semi-NMF | Semi-NMF | PageRank | HITS |
|---|---|---|---|---|
| 1 | Fine Gael Official | The Labour Party | Fine Gael Official | Fine Gael Official |
| 2 | Young Fine Gael | Aodhán Ó Ríordáin (L, 1) | Fianna Fáil | Young Fine Gael |
| 3 | Enda Kenny (FG, 166) | Fine Gael Official | The Labour Party | The Labour Party |
| 4 | Lucinda Creighton (FG, 20) | Jillian van Turnhout (Ind, 0) | Sinn Féin | Simon Harris (FG, 4) |
| 5 | Jillian van Turnhout (Ind, 0) | Michael D Higgins (L, 25) | Jillian van Turnhout (Ind, 0) | Aodhán Ó Ríordáin (L, 1) |
| 6 | The Labour Party | Ciara Conway (L, 0) | Aodhán Ó Ríordáin (L, 1) | Jillian van Turnhout (Ind, 0) |
| 7 | Jerry Buttimer (FG, 2) | Simon Harris (FG, 4) | Young Fine Gael | Frances Fitzgerald (FG, 7) |
| 8 | Simon Harris (FG, 4) | John Gilroy (L, 3) | Dermot Looney (Ind, 0) | Michael D Higgins (L, 25) |
| 9 | Simon Coveney (FG, 10) | Dermot Looney (Ind, 0) | Simon Harris (FG, 4) | Jerry Buttimer (FG, 2) |
| 10 | Paschal Donohoe (FG, 4) | Jerry Buttimer (FG, 2) | Matt Carthy (SF, 0) | Dermot Looney (Ind, 0) |

There are key differences, however, among the various importance measures. Dermot Looney is ranked in the top ten for Semi-NMF, PageRank and HITS, but nineteenth under Structured Semi-NMF. He seems to be ranked higher than one may expect, since Looney was part of a local government and served as mayor of the South Dublin County Council.
Lucinda Creighton is ranked fourth for the Structured Semi-NMF, but is not in the top ten for other importance measures. At the time of data collection, Creighton served as Minister for European Affairs representing Ireland in negotiations on Ireland's EU/IMF bailout and the hosting of Ireland's presidency of the European Union. We also see that Enda Kenny, an Irish Fine Gael politician who has been the Taoiseach (prime minister) since March 2011, is ranked in the top ten only under the Structured Semi-NMF approach. He is ranked fortieth with Semi-NMF, thirty-fourth with PageRank and seventy-second with HITS.

The larger differences between the Structured Semi-NMF and other importance measures when compared to the UK MP results can be explained by the sparser input networks, as shown in Figure 9, which increase the effect of the $S_m$ matrices. Figure 7 shows the conversation dynamics that help explain why certain accounts are ranked highly with the structured approach. For instance, we see that Jillian van Turnhout, an Independent, tends to be retweeted or mentioned by Fianna Fáil organizations in addition to Fine Gael, Labour and other Independent politicians. Accounts within the Labour party also form their own clique, centered around Michael D Higgins and the official Labour party account.

Finally, we test whether these different measures of Twitter importance predict media coverage with the same quasi-Poisson model as in equations (4) and (5). Headline occurrence frequency from January 1, 2013, to October 17, 2013, is again measured using Lexis–Nexis searches, $I$ is derived using the different importance measurement techniques, and Controls contains the variables Age, Gender, Politician Type (local, presidential, Dáil Éireann, Seanad Éireann, European Union), Constituency and Political Party.
Since the data contains politicians in local government, where, for example, exact constituency size is not easily defined for council members, we include a fixed effect for every unique electoral district or area. The 134 unique areas are identified using a number of online sources, including official party and candidate websites, newspaper articles and election results posted on https://electionsireland.org/. Party organization accounts are removed when estimating the regression model.

Fig. 9. Networks of Irish politicians, with color and vertex shapes denoting party affiliation. Politicians in the top $q$th percentile of $\sum_k (\Theta + V_m)_{ik}$ are kept and drawn in the same position as in Figure 1.

Table 2 in the supplemental article [Mankad and Michailidis (2015)] shows the Structured Semi-NMF measure is again a statistically significant predictor for headline coverage rate, after controlling for all other variables, and Figure 4 shows again that the proposed approach results in an influence measure that improves forecasting accuracy relative to alternative model specifications.

5. Conclusion. The Structured Semi-NMF performs best in both data sets, though the improvement was only slight in the Irish context. The overall results were driven by utilizing all three types of networks for measuring importance and specifying the $S_m$ matrices to boost important politicians with particular types of linkages.

One potential issue with the analysis is that Lexis–Nexis coverage of non-US media and, in particular, the Irish media appears to be imperfect. However, even with poor coverage, as long as it is representative of the overall media landscape, then the reported results will be meaningful. We are also unaware of other tools that can be used for such searches.
Another issue is that politicians may appear in headlines that reference their office, for example, "the president." A more comprehensive newspaper headline count is difficult to ascertain, but could in future work provide further validation of the results presented here.

Given that both data sets are exclusively link meta-data, our findings support the notion that the significant challenges associated with content analysis can often be complemented or avoided with network analysis tools for tasks like identifying individuals influential within social networking platforms. We believe this is partly explained by the restriction of the population to politicians and closely related organizations, which ensures to some extent that the unobserved content is both homogeneous and relevant.

A related problem of identifying emergence of key individuals, communities or trends based on network data requires data collected over time. Smoothing strategies, such as in Mankad and Michailidis (2013b), should be useful to extend the given model for network time-series. We believe the proposed model can be useful for applications in marketing and e-commerce, where data is collected on ecosystems that are close to a steady state. Otherwise, as we saw with David Cameron, the model can mischaracterize the importance of key individuals. Specific questions relating to path properties, such as information diffusion [Romero, Meeder and Kleinberg (2011)] or the spread of epidemics [Chew and Eysenbach (2010)], likely require additional methods and techniques specific to those subtopics.

There also has been recent work on a related problem when node features are measured along with network data [Fosdick and Hoff (2013, 2014), Yang, McAuley and Leskovec (2013)].
For instance, one may have access to demographic information or topics and themes of each account's tweets as in Greene, O'Callaghan and Cunningham (2012). While it appears the proposed model could be useful in this setting, using external covariates on the nodes to construct $S_m$ likely raises additional issues that require care, such as variables being available for some, but not all nodes. In this work, the node-level statistics are "internally" calculated directly from the network and, thus, will always cover the full network.

A strength of the Structured Semi-NMF model is that it encompasses different types of links (weighted and binary), integrates information from multiple networks and allows the analyst to utilize contextual knowledge about the given networked system. The method depends upon the analyst choosing appropriate, context-specific node-level statistics. As such, the alternating least squares algorithm provides opportunities for additional regularization in situations where the $S_m$ matrices are high dimensional or when there are no node-specific values that are obvious to use.

SUPPLEMENTARY MATERIAL

Supplement to "Analysis of multiview legislative networks with structured matrix factorization: Does Twitter influence translate to the real world?" (DOI: 10.1214/15-AOAS858SUPP; .pdf). We provide additional simulation results, details and derivations for estimation algorithms, and detailed Poisson regression results.

REFERENCES

Barrat, A., Barthélemy, M., Pastor-Satorras, R. and Vespignani, A. (2004). The architecture of complex weighted networks. Proc. Natl. Acad. Sci. USA 101 3747–3752.

Berry, M. W., Browne, M., Langville, A. N., Pauca, V. P. and Plemmons, R. J. (2007). Algorithms and applications for approximate nonnegative matrix factorization. Comput. Statist. Data Anal. 52 155–173. MR2409971

Brandes, U., Fleischer, D. and Puppe, T. (2006). Dynamic spectral layout of small worlds. In Graph Drawing (P. Healy and N. Nikolov, eds.). Lecture Notes in Computer Science 3843 25–36. Springer, Berlin. MR2244497

Cha, M., Haddadi, H., Benevenuto, F. and Gummadi, P. K. (2010). Measuring user influence in Twitter: The million follower fallacy. ICWSM 10 10–17.

Chew, C. and Eysenbach, G. (2010). Pandemics in the age of Twitter: Content analysis of tweets during the 2009 H1N1 outbreak. PLoS ONE 5 e14118.

Coakley, J. and Gallagher, M. (2005). Politics in the Republic of Ireland. Psychology Press, New York.

Ding, C., Li, T. and Jordan, M. I. (2010). Convex and semi-nonnegative matrix factorizations. IEEE Trans. Pattern Anal. Mach. Intell. 32 45–55.

Fienberg, S. E. (2012). A brief history of statistical models for network analysis and open challenges. J. Comput. Graph. Statist. 21 825–839. MR3005799

Fosdick, B. K. and Hoff, P. D. (2013). Testing and modeling dependencies between a network and nodal attributes. Available at arXiv:1306.4708.

Fosdick, B. K. and Hoff, P. D. (2014). Separable factor analysis with applications to mortality data. Ann. Appl. Stat. 8 120–147. MR3191985

Freeman, L. C. (1979). Centrality in social networks conceptual clarification. Social Networks 1 215–239.

Gemulla, R., Nijkamp, E., Haas, P. J. and Sismanis, Y. (2011). Large-scale matrix factorization with distributed stochastic gradient descent. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 69–77. ACM, New York.

Gillis, N. and Glineur, F. (2008). Nonnegative factorization and the maximum edge biclique problem. Available at arXiv:0810.4225.

Golbeck, J., Grimes, J. M. and Rogers, A. (2010). Twitter use by the U.S. Congress. J. Am. Soc. Inf. Sci. Technol. 61 1612–1621.

Greene, D. and Cunningham, P. (2013). Producing a unified graph representation from multiple social network views. Available at arXiv:1301.5809.

Greene, D., O'Callaghan, D. and Cunningham, P. (2012). Identifying topical Twitter communities via user list aggregation. In 2nd International Workshop on Mining Communities and People Recommenders (COMMPER 2012) at ECML 2012. Bristol, UK.

Huberman, B. A., Romero, D. M. and Wu, F. (2008). Social networks that matter: Twitter under the microscope. CoRR abs/0812.1045.

Jolliffe, I. T. (1986). Principal Component Analysis. Springer, New York. MR0841268

Katayama, J., Takahashi, N. and Takeuchi, J. (2013). Boundedness of modified multiplicative updates for nonnegative matrix factorization. In IEEE 5th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) 252–255. St. Martin.

Kleinberg, J. M. (1999). Authoritative sources in a hyperlinked environment. J. ACM 46 604–632. MR1747649

Koren, Y. (2005). Drawing graphs by eigenvectors: Theory and practice. Comput. Math. Appl. 49 1867–1888. MR2154691

Kroonenberg, P. M. and de Leeuw, J. (1980). Principal component analysis of three-mode data by means of alternating least squares algorithms. Psychometrika 45 69–97. MR0570771

Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401 788–791.

Lin, Y.-R., Chi, Y., Zhu, S., Sundaram, H. and Tseng, B. L. (2008). Facetnet: A framework for analyzing communities and their evolutions in dynamic networks. In Proceedings of the 17th International Conference on World Wide Web 685–694. ACM, New York.

Mankad, S. and Michailidis, G. (2013a). Discovery of path-important nodes using structured semi-nonnegative matrix factorization. In IEEE 5th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) 288–291. St. Martin.

Mankad, S. and Michailidis, G. (2013b). Structural and functional discovery in dynamic networks with non-negative matrix factorization. Phys. Rev. E 88 042812.

Mankad, S. and Michailidis, G. (2015). Supplement to "Analysis of multiview legislative networks with structured matrix factorization: Does Twitter influence translate to the real world?" DOI: 10.1214/15-AOAS858SUPP.

McKelvey, K., DiGrazia, J. and Rojas, F. (2014). Twitter publics: How online political communities signaled electoral outcomes in the 2010 US house election. Information, Communication & Society 17 436–450.

Newman, M. E. J. (2010). Networks. Oxford Univ. Press, Oxford. MR2676073

Owen, A. B. and Perry, P. O. (2009). Bi-cross-validation of the SVD and the nonnegative matrix factorization. Ann. Appl. Stat. 3 564–594. MR2750673

Page, L., Brin, S., Motwani, R. and Winograd, T. (1999). The PageRank citation ranking: Bringing order to the web. Stanford InfoLab, Stanford, CA. Available at: http://ilpubs.stanford.edu:8090/422/.

Psorakis, I., Roberts, S., Ebden, M. and Sheldon, B. (2011). Overlapping community detection using Bayesian non-negative matrix factorization. Phys. Rev. E 83 066114.

Recht, B. and Ré, C. (2013). Parallel stochastic gradient algorithms for large-scale matrix completion. Math. Program. Comput. 5 201–226. MR3069879

Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878–1915. MR2893856

Rohe, K. and Yu, B. (2012). Co-clustering for directed graphs; the stochastic co-blockmodel and a spectral algorithm. Available at arXiv:1204.2296.

Romero, D. M., Meeder, B. and Kleinberg, J. (2011). Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on Twitter. In Proceedings of the 20th International Conference on World Wide Web 695–704. ACM, New York.

Salter-Townshend, M. and Murphy, T. B. (2015). Role analysis in networks using mixtures of exponential random graph models. J. Comput. Graph. Statist. 24 520–538.

Salter-Townshend, M., White, A., Gollini, I. and Murphy, T. B. (2012). Review of statistical network analysis: Models, algorithms, and software. Stat. Anal. Data Min. 5 260–264. MR2958152

The New York Times Blogs (2011). Twitter Starts Selling Political Ads. Available at http://thecaucus.blogs.nytimes.com/2011/09/21/twitter-starts-selling-political-ads/. Accessed: 2013-11-13.

The New York Times Blogs (2012). Pepsi and Twitter Announce Partnership on Ad Campaign. Available at http://mediadecoder.blogs.nytimes.com/2012/05/30/pepsi-and-twitter-announce-partnership-on-ad-campaign. Accessed: 2013-11-13.

The New York Times (2013). Using Twitter to Move the Markets. http://www.nytimes.com/2013/10/07/business/media/using-twitter-to-move-the-markets.html. Accessed: 2013-11-13.

Tumasjan, A., Sprenger, T. O., Sandner, P. G. and Welpe, I. M. (2010). Predicting elections with Twitter: What 140 characters reveal about political sentiment. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media 178–185. Washington, DC.

Twitter, Inc. (2014). About Twitter, Inc. Available at https://about.twitter.com/company. Accessed: 2014-09-19.

Unankard, S., Li, X., Sharaf, M., Zhong, J. and Li, X. (2014). Predicting elections from social networks based on sub-event detection and sentiment analysis. In Web Information Systems Engineering – WISE 2014 (B. Benatallah, A. Bestavros, Y. Manolopoulos, A. Vakali and Y. Zhang, eds.). Lecture Notes in Computer Science 8787 1–16. Springer, Berlin.

Wang, F., Li, T., Wang, X., Zhu, S. and Ding, C. (2011). Community discovery using nonnegative matrix factorization. Data Min. Knowl. Discov. 22 493–521. MR2785131

Xu, W., Liu, X. and Gong, Y. (2003). Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval 267–273. ACM, New York.

Yang, J., McAuley, J. and Leskovec, J. (2013). Community detection in networks with node attributes. In IEEE 13th International Conference on Data Mining (ICDM) 1151–1156. IEEE, New York.

Operations, Technology and Information Management
Cornell University
Ithaca, New York 14850
USA
E-mail: smankad@cornell.edu

Department of Statistics
University of Michigan
Ann Arbor, Michigan 48109
USA
E-mail: gmichail@umich.edu