Dynamic Network Cartography

IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 1 Dynamic Network Cartography † Gonzalo Mateos (contact aut hor) and K etan Rajawat ∗ Abstract Communica tion networks have e volved from specialized, resear ch and tactical tran smission systems to large-scale and highly complex interco nnection s of intelligent devices, in creasingly becoming m ore commercia l, consum er-oriented, and heterog eneou s. Propelled by emergent social networking serv ices and high-d eﬁnition streaming platforms, n etwork trafﬁc has gr own explosively than ks to the a dvances in pro- cessing speed and storage capacity of state-o f-the-ar t commun ication technolog ies. As “n etizens” deman d a seamless ne tworking exper ience that e ntails no t on ly high er speeds, but also re silience and robustness to failures a nd m alicious cyber-attacks, ample o ppor tunities for signal proc essing (SP) resear ch arise. Th e vision is for u biquitou s smar t network de vices to enable d ata-driven statistical learn ing algor ithms for distributed, robust, and online network operation and man agement, ad aptable to the dyn amically-evolving network landscape with minimal need for h uman interventio n. Th e present paper aims at delineating the analytical b ackgro und and the relevance of SP tools to dynamic network mon itoring, introd ucing the SP reader ship to the con cept of dynamic n etwork cartography – a fram ew ork to construct map s of the dynamic network state in an efﬁcient and scalable man ner tailored to large-scale heterog eneous networks. I . I N T RO D U C T I O N Emergence of multimedia-enriched s ocial n etworking services and Internet-friendly p ortable devices is multiplying network trafﬁc volume da y b y da y [53]. W ireless connec ti vity un der the en visioned d ynamic spectrum p aradigm [29] relies on mo bile n etworks of div erse nodes, which are nev ertheles s u nited by unparalleled co gnition c apabilities, a daptability , a nd d ecision-making a ttrib utes. Moreover , the advent of networks of intelligent devices such as those de ployed to monitor the smart power grid, transportation networks, med ical information networks, and cogn iti ve radio (CR) networks, will transform the commu- nication infrastructure to an ev en more co mplex an d heterogene ous o ne. Thus, ensu ring compliance to service-level agreeme nts an d qua lity-of-service (QoS) guarantees ne cessitates b reakthrough ma nagemen t and mo nitoring tools providing operators with a c omprehens i ve vie w of the network lan dscape . Situational awareness provided by suc h tools will be the key e nabler for e ff ective information dissemination, routing and co ngestion c ontrol, network hea lth manag ement, risk analysis, and security a ssuranc e. But this great promise come s wit h great challenges . Acquiring network-wide performance and utilization metrics for lar ge ne tworks is no ea sy tas k. Sup pose for instance that traf ﬁc v olumes are of interest, not only for ga uging instantane ous network health, but a lso for more complex network ma nagemen t tasks such as intrusion detection, capa city provisioning, and n etwork planning [56]. While traf ﬁc volumes on † W ork in this paper was supported by the NS F-ECCS grant no. 1202135 . The authors would like to t hank Prof. G. B. Giannakis (U. of Minnesota), for his in valuab le help as P hD advisor . ∗ The authors are with the Dept. of Electrical and Computer Engineering and the Digital T echnology Center, U. of Minnesota, 200 Union St reet S E, Minneapolis, MN 55455. T el/fax: ( 612)626 -7781/625-4583 ; Emails: { mate0058,keta n } @umn.edu IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 2 links (als o called link c ounts) are readily acquired u sing off-the-shelf tools such a s the simple network manageme nt protocol (SNMP), missing link-count measurements may s till skew the network operator’ s perspec ti ve. SNMP packets may be dropp ed for insta nce, if s ome links become c ongeste d, rendering link- count information for those links more important, as well as le ss av ailable [47], [49]. Class ical a pproach es relying e ither on s imple time-series interpolation or on regularized least-squa res (LS) formulations for predicting the missing link counts [50 ], have not b een able to fully c apture the comp lexity o f the Internet traf ﬁc. This is evidence d by the recen t upsurge of effort s tow ard advanced network tomography [13], an d spatio-temporal trafﬁc estimation algorithms for network monitoring [26], [49], [56]. Similarly , path metrics such a s end-to-end delays are of gre at interest to se rvice providers be cause they directly af fect the end-user e xperienc e. The challenge here is that the number of paths gro ws very f a st as the number of nodes inc reases. Prob ing exhaus ti vely all origin-destination pa irs is impractical a nd wasteful of resource s even for mo derate-size networks [17], [48]. Accurate pred iction of missing d elays bas ed on the inhe rent e. g., topology-induced c orrelation or smoothness traits among link and path quantities is therefore crucial for statistical ana lysis and monitoring tas ks [32]. While the p rev a iling operational paradigm adopted in current n etworks en tails nodes c ontinuously communicating their link mea surements to a cen tral mon itoring s tation, in-network distributed coo peration through local interactions is preferred for s calability an d robustness considerations [38]. Con ventional n etwork monitoring tools entail a cou ple of additional limitations. First, they are typically resource heavy and tend to overload network ope rators with crude, u nreﬁned data, without enoug h process ing to s eparate the “ data wheat from the chaff ”; see e.g., [19 ] and refe rences therein. It is thu s of paramoun t importanc e to construct parsimonious descriptors of the network state , for the purpo se of modeling, monitoring, and management of complex interconnected systems. Due to the di versity of modern networks, the netw ork state can incorporate typical quantiti es such as t rafﬁc volumes and end-to-end delays, as we ll as latent s ocial me trics s uch as h ierarchy , repu tation, an d vu lnerability . Se cond, ma licious activiti es intended to undermine network functionality or co mpromise secrecy of da ta h av e grown in s ophistication, thus rendering traditional signature-base d intrusion detection schemes increasingly obsolete. Intrusion attempts and ma licious attacks manifest themselves as abrup t cha nges in network states [5], a nd such anomalous patterns a re o ftentimes h idden within the raw high-dimen sional n etwork data [55]. For thes e reasons , un veiling network anomalies in a reliable a nd computationally-efﬁcient manne r is a challenging yet e ssential goal [33], [38], [55]. All in a ll, accurate ne twork d iagnosis and statistical a nalysis tools are instrumental for maintaining seamless e nd-user experienc e in d ynamic e n v ironments, a s well a s for en suring ne twork sec urity and stability . In this direction, this tutorial advocates the co ncept of dy namic network c artography as a tool for statistical mo deling, monitoring, a nd mana gement of complex networks. Focus will be placed on two complementary as pects of network cartography , namely , online con struction of globa l network state maps using o nly a fe w measureme nts, and un veiling o f ne twork a nomalies a cross network ﬂows an d time. The surveyed cartography algorithms leverage recent advances in mach ine learning a nd statistical signal IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 3 process ing (SP) methods, includ ing s parsity-cogniza nt lea rning, kri ged Kalman ﬁltering of d ynamical process es over networks, n uclear n orm minimization for low-rank matrix completion, se mi-supervised dictionary learning, a nd in-network optimization via the alternating-directions metho d of multipliers. Through a unifying treatment t hat rev olves around network c artography , this pape r de monstrates how beneﬁts from foundationa l SP methods c an pe rmeate to dynamic n etwork monitoring, and collectiv ely enable inferen ce of g lobal n etwork health, thus leading to enhanc ed network robustness a nd Qo S. I I . G L O B A L P E R F O R M A N C E P R E D I C T I O N V I A DY N A M I C A L N E T W O R K C A RT O G R A P H Y This section deals with the problem of mapping the network state fr om incomplete sets of measuremen ts, and touch es upo n two a pplication domains . A d ictionary learning algorithm is introduc ed ﬁ rst to efﬁciently impute missing link traf ﬁc v olumes, using measureme nts from a wide c lass of (possibly non-stationary) traf ﬁc pa tterns [26]. Su bseque ntly , the problem of tracking and predicting e nd-to-end network delay is considered , a nd the dyn amic n etwork kriging approa ch of [45] is desc ribed. A. Semi-supervised dictionary learn ing for trafﬁc maps Consider an Internet protocol (IP) n etwork comprising N nod es and L links, carrying the trafﬁc of F origin-destination ﬂows (network c onnections ). Let x l,t denote the trafﬁc volume (in bytes or pac kets) passing through link l ∈ { 1 , . . . , L } over a ﬁxed interval of time ( t, t + ∆ t ) . Link counts across the e ntire network a re collecte d in the vector x t ∈ R L , e.g. , using the ubiquitous SNMP protocol. Sinc e measu red link counts are both unreliable and incomp lete due to hardware or software malfunctioning, jitter , an d communication e rrors [47], [56 ], they a re expressed as noisy versions of a subs et of S < L links y t = S t x t + ǫ t , t = 1 , 2 , . . . (1) where S t is an S × L se lection matrix with 0-1 entries who se rows correspond to rows of the identity matrix of s ize L , an d ǫ t is a n S × 1 zero-mea n noise term with c onstant variance ac counting for measu rement and s ynchroniza tion errors. Given y t the aim is to form an estimate ˆ x t of the full vector of link counts x t , which in this c ase de ﬁnes the network state. A simple approach implemented in measu rement-process ing software such as RR Dtool [43], i s to ignore the no ise term an d rely on one -dimensional interpolation for the time series { x l,t } per link l . The applicability and ac curacy of this sc heme is howev er limited, s ince it tacitly a ssumes that the entries of x t are unc orrelated; missing e ntries x l,t are few and do no t occu r in bursts; a nd the time s eries { x t } is stationa ry . Nevertheless, n one of these as sumptions h olds true in real ne tworks [47]. The relianc e on stationarity and av ailability of measurements from contiguous time intervals can b e for gone if e stimation of x t is pe rformed for each t indivi dua lly . In principle, ˆ x t can b e obtaine d if the volumes o f o rigin-destination (OD) trafﬁc ﬂ ows z t ∈ R F are av ailable, sinc e they are rela ted through x t = Rz t (2) IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 4 where the so-termed routing matrix R := [ r l,f ] ∈ { 0 , 1 } L × F is su ch that r l,f = 1 if link l carries the ﬂow f , a nd zero o therwise. Howe ver , measu ring z t is even more difﬁcult and in prac tice z t is itself e stimated from { x t } through tomographic traf ﬁc infer ence [13], [32], wh ere given R and noisy link counts, the goal is to e stimate the OD ﬂows a s the so lution of a linea r in verse p roblem. Since the in verse problem is h ighly und er- determined  F = O ( N 2 ) ≫ L = O ( N )  , e arly approach es relied on p rior knowledge in the form of statistical models for the OD ﬂ ows (su ch as the Poisso n, Gauss ian, logit-choice , or gra vity models), tha t ultimately s erve as comp lexity-controlling (that is regularization) mecha nisms [32, Ch. 9]. Among thes e, the state-of-the-art traf ﬁc matrix e stimation a lgorithm u ses an entropy-based regularizer , and ha s b een sh own to be fast, acc urate, robust, and ﬂexible [54]. T ime-series an alysis-base d approach es (such as the Kalman ﬁlter in [50]) have also be en prop osed for sce narios where link-coun t meas urements are available over co ntiguous time slots. Recently , a li nk-co unt prediction algorit hm w as put forth in [26], wh ere missing entries o f x t are estimated from historical mea surements in T S := { y t } T t =1 by leveraging the structural regularity of R through a se mi-supervised dictionary lea rning (DL) app roach. Un der the DL framework, data-dr iven dictionaries for s parse signa l representation are ad opted as a versatile mean s of capturing parsimonious signal structures ; see e .g., [52] for a tutorial treatment. Propelled by the s ucces s of compress i ve sa mpling (CS) [23], s parse signa l modeling h as led to major advances in se veral ma chine learning, audio and image proc essing tasks [51], [52]. Motiv a ted by these idea s, it i s postulated in [26 ] that link counts can b e re presented as a linear comb ination x t = Bw t of a few ( ≪ Q ) co lumns of an over -complete dictionary (basis) matrix B := [ b 1 , . . . , b Q ] ∈ R L × Q , where w t ∈ R Q is a sparse vector of expansion coefﬁcients. Many signals includ ing speec h and n atural images admit sparse repres entations even u nder generic predeﬁn ed dictionaries, suc h as tho se based on the Fourier an d the wavelet bases, respectively [52]. Like audio and natural imag es, link c ounts can exhibit strong correlations as evidence d from the s tructure of R [cf. (2)]. For ins tance, the traf ﬁc volumes on links i a nd j a re highly correlated if they both carry common ﬂows. DL scheme s are a ttracti ve due to their ﬂexibility , sinc e they u tilize training data to learn an appropriate over-complete basis customized for the da ta a t ha nd. Howe ver , the u se o f DL for modeling network data is well moti vated but s o far relati vely u nexplored. Prediction o f link counts. Su ppose for now that e ither a learnt, or , a suitable pre-sp eciﬁed dictionary B is avail able, and conside r predicting the missing link counts. Data-driven learning of dictionaries from historical data will be a ddresse d in the ensuing subse ction. Gi ven R and the link count measu rements y t , contemporary tools developed in the area of CS and semi-supervised learning can be used to form ˆ x t , which includes estimates for the missing L − S link counts [8], [23], [51]. The spatial regularity of the link c ounts is cap tured through the auxiliary weigh ted graph G with L vertices, o ne for eac h link in the network. The e dge weights for all edges in G are subs umed by the off-diagonal entries of the Gram matrix G = [ g i,j ] := RR ′ ∈ R L × L , wh ere ( · ) ′ denotes trans position. The off- diagon al entries g i,j count the numbe r of OD ﬂows tha t are common to both links i a nd j . Main diagonal entries of G count the number of O D ﬂows that us e the corresp onding links. IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 5 Gi ven a s napsh ot of inco mplete link c ounts y t during the operational pha se (where a suitable ba sis B is avail able), the sparse basis expansion coefﬁcient vector w t is es timated as ˆ w t := arg min w t k y t − S t Bw t k 2 2 + λ w k w t k 1 + λ g w ′ t B ′ LBw t (3) where L := diag ( G1 L ) − G den otes the Laplac ian matrix of G ; λ w , λ g > 0 are tunab le regularization parameters; and 1 L is the L × 1 vector of all on es. The criterion in (3) co nsists o f a LS error between the obs erved a nd postulated link co unts, along with two regularizers. The ℓ 1 -norm k w t k 1 encourag es sparsity in the coefﬁcient vector ˆ w t [23], [51]. W ith x t := [ x 1 ,t , . . . , x L,t ] ′ giv en by x t = Bw t , the Laplacian regularization can be explicitly written as w ′ t B ′ LBw t = (1 / 2) P L i =1 P L j =1 g i,j ( x i,t − x j,t ) 2 . It is thus app arent that w ′ t B ′ LBw t encourag es the link counts to be close if their correspon ding vertices are c onnec ted in G . Each su mmand is we ighted according to the numb er of OD ﬂ ows common to links i and j . T ypically a dopted for semi-supervise d learning, such a regularization term e ncourage s Bw t to lie on a smooth ma nifold a pproximated b y G , which c onstrains h ow the measured link counts relate to x t [8], [44 ]. It is also c ommon to use normalized variants of the Laplacian instead of L [32, p. 4 6]. The cost in (3) is con vex but n on-smooth, and cus tomized s olvers developed for ℓ 1 -norm regularized optimization can be emp loyed h ere as well, e.g ., [27]. Onc e ˆ w t is available, an e stimate of the full vector of link co unts is read ily obtained as ˆ x t := B ˆ w t . It is appa rent that the qua lity of the imputation de pends on the cho sen B , and DL from historical network da ta in T S is described next. Data-driven d ictionary lea rning. In its canon ical form, DL seek s a (typica lly fat) dictiona ry B so that training data T L := { x t } T t =1 are well approx imated as x t ≈ Bw t , t = 1 , . . . , T , for some spa rse vectors w t of expa nsion co efﬁcients [52]. Standard DL algorithms c annot, howev er , be directly app lied to learn B since they rely on the entire vector x t . T o learn the dictionary in the training pha se us ing incomple te link counts T S instead of T L , the idea is to capitalize on the structure in x t , of which G is an abstraction [26]. T o this en d, one can adop t a simil ar cost function as i n the operational ph ase [cf. (3)], yielding the data-driv en ba sis and the corres ponding spa rse represen tation { ˆ W , ˆ B } := arg min W , B : {k b q k 2 ≤ 1 } Q q =1 T X t =1  k y t − S t Bw t k 2 2 + λ w k w t k 1 + λ g w ′ t B ′ LBw t  (4) where ˆ W := [ ˆ w 1 , . . . , ˆ w T ] ∈ R Q × T . T he c onstraints {k b q k 2 ≤ 1 } Q q =1 remove the s caling ambigu ity in the products Bw t , a nd prevent the entries in B from growing unboun ded. Again, the combined regularization terms in (4) promote both spa rsity in w t through the ℓ 1 -norm, and smoothne ss a cross the entries o f Bw t via the Lap lacian L . The regularization parameters λ w and λ g are typically cross -validated [27], [51]. Although (4) is non-conv ex, a block c oordinate-des cent (BCD) solver still gu arantees c on ver gence to a stationary point [9]. The BCD updates in volv e s olving for B and W in a n alternating fashion, both do able efﬁciently via con vex programming [26]. Alternativ ely , the online DL algorithm in [36] offers enhanc ed scalability b y sequ entially proc essing the d ata in T S . The training a nd operational (prediction) ph ases are summarized in Fig. 1 , whe re C t ( B , w ) d enotes the t -th s ummand from the co st in (4). The explicit need for La placian regularization is appa rent from (4). Inde ed, if measureme nts from a certain link are not presen t in T S , the c orresponding row of B may still be estimated with reasona ble IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 6 min w t C t ( B [ k ] , w t ) min k b q k≤ 1 T X t =1 C t ( B , w t [ k ]) min w t C t ( ˆ B , w t ) ˆ x t = ˆ B ˆ w t W t [ k ] ˆ w t ˆ B B [ k + 1] y t , t > T { y t } T t =1 ˆ x t T R A I N I N G P H A S E O P E R A T I O NA L P H A S E Fig. 1. T raining and operational phases of the semi-supervised D L approach for link-trafﬁc cartography in [26]. accuracy b ecaus e o f the third term in C t ( B , w ) . O n top of that, it is b ecaus e o f Laplac ian regularization that the prediction p erformance degrades gracefully as the number of miss ing entries in y t increases ; see also Fig. 2. It is worth stressing tha t the time se ries { y t } ne ed not be s tationary o r even contiguous in time. The link-trafﬁc cartograph y a pproach described so far c an also be adapte d to accommo date time-v arying network topologies or routing matrices, us ing a time-depe ndent Laplac ian L t . A word of caution is d ue howe ver , since drastic chan ges in e ither L t or in the statistical prope rties of the un derlying OD ﬂows z t , will necess itate re-training B to attain s atisfactory performance. Finally , n ote that DL techniqu es incu r a co mplexity at lea st cub ic in the size of the network, and a re better suited for monitoring of ba ckbon e wide-area networks which are typically not very large. Next, a numerical tes t on link cou nt data from the Internet2 mea surement archiv e [1] is outlined. The data consists of l ink counts, sampled at 5 minute interv als, collected ov er se veral we eks. For the purpo ses of comparison, the training phase consisted of 2000 t ime slots, with a random subset of 50 links measured (out of L = 54 p er time slot. The performance of the learned dictionary i s then assess ed over the next T 0 = 2000 time slots. Each tes t vector y t is co nstructed by ran domly sele cting S entries of the full link cou nt vector x t . Th e tuning pa rameters are ch osen via cross-validation ( λ s = 0 . 1 a nd λ g = 10 − 5 ). Fig. 2 shows the normalized reconstruction error (NRE), ev aluated as ( LT 0 ) − 1 P T 0 t =1 k y t − ˆ x t k 2 for dif ferent values of Q and S . For c omparison, the prediction performance with a ﬁxed diffusion wa velet matrix [18] (instead of the data-trained d ictionary), as well as that o f the e ntropy-penalized LS method [54] is also shown. The latter approa ch solves a LS problem aug mented with a spec iﬁc e ntropy-based regularizer , that enco urages the trafﬁc volumes at the source/de stination pairs to be stochas tically independ ent. The DL-ba sed method markedly outpe rforms the comp eting ap proache s, e specia lly for low values of S . Furthermore, no te how performance degrade s gra cefully as S d ecrease s. Re markably , the p redictions a re close to the actual trafﬁc ev en when using only 30 link coun ts during the prediction p hase. IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 7 30 32 34 36 38 40 42 44 46 48 50 10 −1 10 0 S Av erage NRE Q = 100 Q = 80 Q = 60 Q = 54 DW Entr opy- p enali zed LS Fig. 2. Link-trafﬁc cartography of Internet2 data [1]. Comparison of NRE for different va lues of S [ 26]. B. Delay car tography via dyn amic network kriging Instead of link cou nts, cons ider now the problem of monitoring delays d p,t on a se t o f mu ltihop paths p ∈ P , that con nect P := |P | s ource-des tination p airs in an IP network. Path de lays a re important me trics required by network ope rators for a ssess ment, planning, and f ault diagno sis [17], [32 ], [45]. Howe ver , monitoring path metrics is ch allenging primarily b ecaus e P generally grows a s the s quare of the numbe r of node s in the n etwork. Therefore, at any time t delays can only b e meas ured on a su bset of paths S t ⊂ P , collected in the vec tor d s t . Base d on the pa rtial current and past meas urements H t := { d s τ } t τ = 1 , delay ca rtography amounts to pred icting the remaining path delays d ¯ s t := { d p,t } p ∈P \S . A promising approac h in this con text has be en the a pplication o f kr iging , a tool for spatial prediction popular in geos tatistics and en vironmental scien ces [21]. A network kriging sche me was developed in [17], which a dvocates prediction of network-wide path delays using meas urements on a ﬁxed subset of paths. The cla ss of linear predictors introduced therein leverages network topo logy information to model the covari anc e among p ath de lays. Building o n thes e ideas, a d ynamic network kriging approac h c apable of real-time spa tio-temporal delay p redictions was put forth in [45]. Speciﬁcally , a kriged Kalman ﬁlter is employed to explicitly c apture tempo ral variations due to queuing d elays, while retaining the top ology- based sp atial kriging pre dictor . T he per-path delay d p,t comprises several indep endent compo nents due to contributi ons from each intermediate link and router , an d is mod eled in [45] as d p,t = χ p,t + ν p,t + ǫ p,t . (5) The queuing delay χ p,t (collected in χ t ∈ R P ) depends on the trafﬁc, and exhibits spatio-temporal correlation, p eriodic be havior as well a s o ccasion al bursts, prompting the follo wing rando m walk mo del χ t = χ t − 1 + η t (6) where the d ri ving noise η t has zero mean and covari anc e ma trix C η . The seco nd term in (5), collected in the vector ν t , co mbines the proc essing, transmiss ion, a nd propa gation delays , and is temporally white but spatially co rrelated, owing to the overlap be tween paths. Similar to [17], the correlation be tween two IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 8 paths is modeled a s being proportional to the n umber of links they share, so that the c ov ariance matrix C ν = α UU ′ , where, u p,l = 1 if path p c ontains link l , a nd u p,l = 0 otherwise. Finally , the n oise term ǫ p,t is ze ro mea n i.i.d. with known variance σ 2 . De ﬁning the S × P pa th se lection matrix as in Sec . II-A, the mea surement e quation can be written as (introduce ν s t := S t ν t and like wise ǫ s t ) d s t = S t χ t + ν s t + ǫ s t . (7) In the abse nce of S t , the spa tio-temporal mode l in (6)-( 7) is widely employed in geostatistics, where χ t is ge nerally referred to as trend, an d ν t captures the random ﬂuctua tions around χ t ; se e e.g. [40]. Similar mo dels have been employed in [30] to des cribe the dynamics of wireless p ropagation chann els, and in [20] for s patio-temporal random ﬁeld e stimation. For a static se lection matrix, i.e. , S t := S for all t , the n etwork kriging ap proach [17] entails the followi ng two-step procedu re: (s1) treat ν s t as noise, and estimate χ t using the gen eralized LS criterion; and (s2) use the afores aid estimate to ﬁnd the linea r minimum me an-squa re error (LMMSE) es timator (deno ted b y E ∗ ) for ν s t , na mely E ∗ [ ν s t | χ t ] = SC ν S ′  SC ν S ′ + σ 2 I S  − 1 [ d s t − S t χ t ] . (8) Recently , a CS -based ap proach has also been reported for predic ting ne twork-wide performance met- rics [18]. For instanc e, diffusion wavelets were u tilized in [18] to o btain a compress ible represen tation of the de lays, and a ccoun t for spatial and temporal correlations. Although this allows for enh anced prediction accuracy relati ve to [17], it requires batch proc essing of measu rements which do es not scale we ll to lar ge networks for real-time ope ration. Pictorially , the performance of dif ferent a lgorithms ca n be assessed through the de lay map s s hown in Fig. 3. The s patio-temporal mo del se t forth ea rlier c an provide a better es timate o f χ t by efﬁciently p rocessing both prese nt and past mea surements jointly . T ow ards this end, a Ka lman ﬁlter is employed in [45], which at time t y ields the following upda te e quations ˆ χ t := E ∗ [ χ t |H t ] = ˆ χ t − 1 + K t ( d s t − S t ˆ χ t − 1 ) M t := E  ( χ t − ˆ χ t )( χ t − ˆ χ t ) ′  = ( I P − K t S t )( M t − 1 + C ν ) where K t := ( M t − 1 + C ν ) S ′ t  S t ( C ν + C η + M t − 1 ) S ′ t + σ 2 I S  − 1 is the so-termed Kalman gain. The ﬁnal p redictor , referred also as the kriged Ka lman ﬁlter (KKF), is given b y ˆ d ¯ s t := ¯ S t ˆ χ t + ¯ S t C ν S ′ t  S t C ν S ′ t + σ 2 I S  − 1 [ d s t − S t ˆ χ t ] and the pred iction error c ov ariance matrix is M ¯ s t := E   d ¯ s t − ˆ d ¯ s t   d ¯ s t − ˆ d ¯ s t  ′  = σ 2 I S + ¯ S t  ( M t − 1 + C ν + C η ) − 1 + 1 σ 2 S ′ t S t  − 1 ¯ S ′ t . The KKF framework for dynamic ne twork delay c artography ha s several a ttracti ve features. First, the KKF y ields the LMMSE estimate even for non-Gauss ian distrib uted noise . The Kalman ﬁlter step a lso allows for a τ -step pred iction g i ven by ˆ d t + τ = ˆ χ t , wh ich can be useful for preempti ve routing and conges tion c ontrol algorithms, as well as for extr apo lating miss ing measuremen ts. Sec ond, the KKF IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 9 Fig. 3. T rue and predicted delay map for 62 paths in t he Internet-2 dataset [1] ov er an interval of 100 minutes. ( T op-left) True delays; (T op-right) network kriging [17]; (Bottom-left) difussion wav elets [18]; and (Bottom-right) KK F [45]. Delays of sev eral paths change slightly around t = 80 , but t his change is only discernible from the delay predictions offe red by KKF . Delay maps summarize the network state, and are useful tools aiding operational decision in network monitoring and control stations [45]. framew ork provides a metric, namely the error covariance ma trix M ¯ s t , for c hoosing the paths to be measured at each t , which deﬁne the s election ma trix S t . In the presen t setting, it turns ou t that the D-optimal design metric log d et M ¯ s t is monotonic and supermodu lar with res pect to the set S [45]. Thus, a simple gree dy algo rithm with complexit y O ( P S 3 ) can be employed to ﬁn d the set of paths that a re a t least 63% optimal [42 ]; s ee Fig. 4. Conseq uently , the technique c an be rea dily applied to large- scale networks since the complexity increase s only linearly with P . The framework also admits related problem formulations s uch as se lecting the bes t se t of monitors (nodes) cap able of me asuring d elay on all its outgo ing paths . This represents a signiﬁcant de parture from state-of-the-art de lay prediction/tracking methods [17], [18 ], where path selec tion is heuristic. Note tha t training is required to estimate the mode l parameters C η and α . T o this en d, emp irical estimation techn iques similar to those in [41] c an be adap ted to the pres ent c ase. I I I . D Y N A M I C A N O M A L O G R A P H Y This section switches gea rs to anomalography , the problem of un veiling and mapping-out network traf ﬁc anomalies ac ross ﬂows and time gi ven link-le vel trafﬁc meas urements. This is a c rucial monitoring task tow ards enginee ring network trafﬁc, since ano malies can res ult in conge stion and limit QoS provisioning. IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 10 Fig. 4. Delay cartography using the NZ -AMP dataset [2], which includes path delays collected ov er a month for an IP network where P = 186 and N = 30 [45]. Normalized mean-square prediction error (NMSPE) as a function of S . (Left) Random path selection; and (Right) “Optimal” path selection, that is, using heuristic or approximate al gorithms speciﬁed for each algorithm. Observe further that the performance of the KKF improve s as the length of the training interval t T increases. A. T raf ﬁc mode ling Consider a back bone IP network where N and L den ote the se ts of no des (routers) and phy sical links of cardinality |N | = N and |L| = L , resp ectiv ely . T he op erational goal o f the network is to transport a s et of OD traf ﬁc ﬂows F (with |F | = F ) asso ciated with spe ciﬁc OD (ingress -egress router) pairs. Single- path routing is ado pted here, meaning a giv en ﬂow’ s trafﬁc is carried through multiple links connec ting the correspond ing source-de stination pair along a single path. According ly , over a discre te time horizon t ∈ [1 , T ] the mea sured link c ounts X := [ x l,t ] ∈ R L × T and (un observable) OD ﬂow trafﬁc matrix Z := [ z f ,t ] ∈ R F × T , a re thus rela ted throu gh X = RZ [cf. (2)]. Unless otherwise stated , the rou ting matrix R is ass umed given, since it c an be otherwise estimated using traceroute or topology inference algorithms [24]. It is also fat, as for backbo ne ne tworks the numbe r of OD ﬂows is much larger than the nu mber of ph ysical links ( F ≫ L ) . A cardina l property of the trafﬁc ma trix is no tew orthy . Common temporal pa tterns ac ross OD traf ﬁc ﬂows in addition to their almos t periodic behavior , re nder mo st ro ws (respectiv ely columns) o f the traf ﬁ c matrix linearly depe ndent, and thu s Z typically has low rank . This intuiti ve property ha s been extensively validated with real n etwork d ata; se e Fig. 5 and e.g. , [33]. It is not un common for some of the OD ﬂ ow rates to experienc e unexpec ted abrup t change s. Th ese so- termed trafﬁc volume anomalies a re typically due to (uninten tional) n etwork equipme nt misconﬁgu ration or outright failure, unforese en behaviors follo wing routing p olicy modiﬁcations, o r , cyb erattacks (e.g., DoS a ttacks) which aim at compromising the se rvices offered b y the n etwork [33], [55]. Let a f ,t denote the unk nown amo unt of anoma lous traf ﬁc in ﬂow f at time t . Explicitly accou nting for the pres ence of anomalous ﬂows, the mea sured trafﬁc carried by link l is then giv en by y l,t = P f ∈F r l,f ( z f ,t + a f ,t ) + ǫ l,t , t = 1 , ..., T , where the noise variables ǫ l,t capture me asuremen t errors a nd unmo deled dyna mics. T rafﬁc volume anomalies are (un signed) sudd en ch anges in OD ﬂow’ s traf ﬁc, a nd as su ch their effect can span mu ltiple links in the network. A key difﬁculty in unv eiling a nomalies from link-level meas urements only is that oftentimes, clearly discernible anomalous spikes in the ﬂow traf ﬁc can be masked through “destructive interference” of the superimpose d OD ﬂows [33]. An a dditional challenge stems from mis sing IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 11 Fig. 5. V olumes of 6 representativ e (out of 121 total) OD ﬂows, t aken from the operation of Internet-2 during a se ven-day period [1]. T emporal periodicities and correlations across ﬂ o ws are apparent. As expected, in t his case Z can be well approximated by a low-rank matrix, since it s normalized singular v alues decay rapidly to zero. link-le vel measureme nts y l,t , an u nav oidable operationa l reality affecting most trafﬁc engineering tasks that rely on (indirect) measureme nt o f traf ﬁc matrices [47], [56]. T o mod el miss ing link measureme nts, collect the tup les ( l, t ) associated with the available obse rvati ons y l,t in the set Ω ⊆ [1 , 2 , ..., L ] × [1 , 2 , ..., T ] . Introducing the matrices Y := [ y l,t ] , E := [ ǫ l,t ] ∈ R L × T , and A := [ a f ,t ] ∈ R F × T , the (pos sibly incomplete) s et of link-trafﬁc me asuremen ts c an b e expres sed in compa ct matrix form as P Ω ( Y ) = P Ω ( X + RA + E ) (9) where the samp ling operator P Ω ( . ) s ets the entries of its ma trix argument not in Ω to z ero, and keeps the rest uncha nged. Since the objective here is not to estimate the OD ﬂ ow trafﬁc matrix Z , (9) is expressed in terms of the nominal (anomaly-free) link-level trafﬁc rates X , which inherits the low-rank p roperty of Z . Ano malies in A a re expe cted to occur sp oradically over time, a nd last for a short time relativ e to the (possibly long) me asuremen t interval [1 , T ] . In addition, only a s mall fraction of the ﬂows is supp osed to be anoma lous at a any giv en time instant. This rend ers the a nomaly trafﬁc matrix A sparse acros s both rows (ﬂows) an d c olumns (time). B. Un veiling anomalies via sparsity an d low rank Gi ven link-level traf ﬁc mea surements P Ω ( Y ) adhe ring to (9), dyna mic anom alography is a critical network monitoring task that aims a t accu rately estimating the anomaly matrix A . As argued next, capitalizing on the sparsity of A an d the low-rank property of X wi ll be instrumental in achieving this ambitious g oal. From a network ca rtography vantage point, the resultant e stimated map ˆ A offers a depiction of the network’ s “he alth state” along both the ﬂow and time dimension s. If | ˆ a f ,t | > 0 , the f -th IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 12 ﬂow at time t is dee med anomalous , o therwise it is he althy . This joint e stimation-detection task not only allows o ne to identify the time of the anoma ly in addition to the affected ﬂows, but also to estimate its magnitude w hich hints to the importance of the anomaly event. By examining R the network o perator can immediately d etermine the links carrying the anomalous ﬂows. Subse quently , plann ed contingen cy measures in volving trafﬁc-engineering algo rithms c an be impleme nted to a ddress n etwork cong estion. The lo w-rank prope rty o f the traf ﬁc matrix Z (and X ) is at the heart of the seminal network anomaly detection approa ch in [33]. In the abs ence o f missing data, the method the rein a dopts principal compone nt analysis (PCA) to decompo se the link trafﬁc Y = [ y 1 , . . . , y T ] into n ominal an d anoma lous comp onents (also known as mo deled and residual traf ﬁc). For instan ce, if mos t of the variance in Y is captured by r ≪ min( L, T ) domina nt principal c omponen ts, then by cons truction the n ominal sub space S n is s panned by the r d ominant right singular vec tors of Y ′ (cf. the low rank assu mption). Na turally , the anomalous subspa ce S a correspond s to the orthogonal c omplement, i.e., S a := S ⊥ n . In the operational ph ase, an anomaly is declare d at time t when k P S a y t k 2 2 exceeds a gi ven threshold, whe re P S a is an orthogon al projection matrix onto S a . Subsequ ently , a s ingle anomalou s ﬂo w is iden tiﬁed after runn ing a gree dy algorithm, and an e stimate o f the amo unt of anoma lous trafﬁc is obtained a s a byproduc t. Likewise, the spatial approach within the n etwork anomography framew ork [55] forms the matrix P S a Y of link anomalies, thus exploiti ng the correlation between traf ﬁc ac ross dif ferent links. T emporal ap proaches obtain link a nomalies a s YT instead, wh ere T is a linear operator which judiciously ﬁlters the trafﬁc time series per link (implementing an “anomaly-pa ss” ﬁlter). Several c hoices for T are proposed to this end, based on dif ferent forms of temporal analysis including autoregressive integrated moving av erage (ARIMA), wa velets, a nd fast Fourier transform (FFT). Different from [33], the inference algorithm in [55] capitalizes on the spa rsity of A to es timate the anomaly map by e.g., solving in the spatial c ase ˆ A := arg min A k A k 1 , s. t. P S a Y = RA . Network ano mography algo rithms ca n be extended to a ccommoda te rou ting chang es acros s time; see [55] for further details an d c omprehens i ve performanc e tests. Recently , a natural estimator le veraging the low rank property of X and the sparsity of A was put forth in [38], which can be found at the crossroads of CS [23] and timely low- rank plus sparse matrix decompo sitions [10], [14]. The idea is to ﬁt the incomplete data P Ω ( Y ) to the mod el X + RA [cf. (9)] in the LS error s ense, as well as minimize the rank of X , and the n umber of non zero en tries of A meas ured b y its ℓ 0 -(pseudo) n orm. Unfortunately , albeit natural b oth rank and ℓ 0 -norm c riteria are in general NP-hard to o ptimize. T ypically , the nu clear norm k X k ∗ := P k σ k ( X ) ( σ k ( X ) denotes the k -th singular value of X ) and the ℓ 1 -norm k A k 1 are adopted as surrogate s [11], [25], since they are the clos est con vex approximants to ran k ( X ) and k A k 0 , respectively . Accordingly , one solves min { X , A } kP Ω ( Y − X − RA ) k 2 F + λ ∗ k X k ∗ + λ 1 k A k 1 (10) where λ ∗ , λ 1 ≥ 0 are rank- and spa rsity-controlling parameters. While a non-smo oth optimization problem, being con vex (10) is appea ling. An e f ﬁcien t accelerated proximal gradient algorithm with quan tiﬁable IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 13 iteration co mplexity was developed to un veil network anomalies [39 ]. Interestingly , (10) also off ers a cleanse d e stimate of the link-level traf ﬁc ˆ X , that cou ld be subse quently utilized for network tomo graphy tasks. In addition, (10) jointly exploits the spatio-temporal co rrelations in the link trafﬁc as w ell as the sparsity of the ano malies, throug h a n o ptimal single-shot estimation-detection proce dure that ha s b een shown to o utperform the algorithms in [33] and [55] (that decoup le the e stimation and detection ste ps). 0 200 400 0 50 100 0 2 4 6 Time index (t) Flow index (f) Anomaly amplitude Fig. 6. Un veiling anomalies from Internet-2 data [1]. (Left) R OC curve comparison between (10) and the PCA methods in [33], [55], for different values of r := dim ( S n ) . Lev eraging sparsity and low rank jointly leads to improve d performance. (Right) In red, the estimated anomaly map ˆ A obtained via (10) superimposed to the “true” anomalies shown in blue [37]. Before moving on to distrib uted implementations, it is ins tructi ve to elab orate on the generality of (10). When the re is no miss ing data and X = 0 L × T , one is left with an un der-determined sparse signal recovery problem typically encou ntered with CS; see e.g ., [23]. The de composition Y = X + A correspon ds to principal compo nent pursuit (PCP), also referred to as robust PCA [10 ], [14]. For the ide alized no ise-free setting ( E = 0 L × T ), sufﬁcient con ditions for exa ct recovery of the unknowns are avail able for both of the aforementioned spec ial case s [10 ], [11], [14]. Howe ver , the superpos ition o f a low-r ank plus a compressed sparse matrix in (9) further challenge s identiﬁability of { X , A } ; see [39] for early resu lts. Go ing back to the C S p aradigm, even wh en X is non zero one c ould envision a variant wh ere the measu rements are corrupted with correlated (lo w-rank) noise [15 ]. Last but no t leas t, when A = 0 F × T and Y is noisy , the recovery of X su bject to a rank cons traint is nothing but PCA – arguably , the workhorse of high- dimensional data ana lytics. Th is same formulation is adopted for low-rank ma trix co mpletion, to impute the miss ing en tries of a low-rank ma trix obs erved in n oise, i.e., P Ω ( Y ) = P Ω ( X + E ) [12]. C. In-network distributed p r o cessing Implementing (10) presu mes that network nodes con tinuously communicate their link traf ﬁc mea sure- ments to a cen tral mon itoring station, which u ses the ir aggregation in P Ω ( Y ) to unv eil anomalies. While for the most part this is the p rev a iling operational pa radigm ad opted in current n etworks, it is fair to say there are limitations as sociated with this architecture. For instance, fusing all this information may en tail excessive p rotocol overheads . Moreov er , minimizing the exchan ges of raw me asuremen ts may b e de sirable IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 14 to reduce unavoidable communica tion errors that translate to missing data. Solving (10) centrally raise s robustness c oncerns as well, s ince the ce ntral monitoring station repres ents an isolated point of failure. These reas ons motiv a te well devising fully-distributed iterati ve algorithms for dynamic ano malography , embedding the n etwork an omaly de tection func tionality to the routers. In a n utshell, pe r iteration n odes n ∈ N carry o ut simple c omputational tasks locally , relying on their own link count meas urements (a submatrix Y n within Y = [ Y ′ 1 , . . . , Y ′ N ] ′ correspond ing to router n ’ s links). Subseq uently , local estimates are reﬁn ed after exc hanging message s only with directly c onnec ted neighbors, which facilitates percolation of local information to the whole network. The end goal is for n etwork nodes to cons ent on a global map o f network anomalies ˆ A , an d attain (or at lea st come close to) the es timation performanc e of the centralized counterpart (10) w hich h as all d ata P Ω ( Y ) av ailable. Problem (10) is not amenab le for distrib uted implementation due to the non-se parable n uclear norm present in the cost fun ction. If an upper b ound rank ( ˆ X ) ≤ ρ is a priori av ailable [recall ˆ X is the estimated link-le vel trafﬁc ob tained via (10)], (10)’ s search spa ce is ef fectiv ely reduc ed and one can factorize the decision v ariable as X = PQ ′ , where P and Q are L × ρ and T × ρ matrices, respec ti vely . Again, it is p ossible to interpret the c olumns of X (viewed a s points in R L ) as belong ing to a low-rank nominal subspa ce S n , sp anned by the columns of P . Th e rows of Q are thus the projec tions of the columns of X onto S n . Next, cons ider the following alternati ve c haracterization o f the n uclear norm (see e.g . [46]) k X k ∗ := min { P , Q } 1 2  k P k 2 F + k Q k 2 F  , s. t. X = PQ ′ (11) where the optimization is over a ll po ssible bilinear factorizations of X , s o that the n umber of columns ρ of P and Q is also a variable. Lev eraging (11 ), the following reformulation o f (10) provides an important ﬁrst step towards obtaining a distrib uted an omalography algo rithm min { P , Q , A } N X n =1  kP Ω n ( Y n − P n Q ′ − R n A ) k 2 F + λ ∗ 2 N  N k P n k 2 F + k Q k 2 F  + λ 1 N k A k 1  (12) which is no n-con vex d ue to the bilinear terms P n Q ′ , and wh ere R := [ R ′ 1 , . . . , R ′ N ] ′ is partitioned into local routing tables av ailable per router n . Ad opting the sep arable F robenius-norm regularization in (12 ) comes with n o loss of optimality relative to (10), provided rank ( ˆ X ) ≤ ρ . By ﬁn ding the g lobal minimum of (12) [which co uld have cons iderably less variables than (10)], one c an recover the optimal solution of (10). Bu t since (12) is n on-con vex, it may have stationary p oints which need not be globa lly op timum. As asserted in [38, Prop. 1] h owe ver , if a s tationary point { ¯ P , ¯ Q , ¯ A } of (12) satisﬁes kP Ω ( Y − ¯ P ¯ Q ′ − ¯ A ) k < λ ∗ , then { ˆ X := ¯ P ¯ Q ′ , ˆ A := ¯ A } is the globally optimal so lution of (10). Note that for sufﬁciently small ρ the residual kP Ω ( Y − ¯ P ¯ Q ′ − ¯ A ) k bec omes lar ge, and the qualiﬁca tion inequality is violated [unles s λ ∗ is large eno ugh, in which case a sufﬁciently low-rank solution to (10) is expec ted]. Th e cond ition o n the residual implicitly e nforces rank ( ˆ X ) ≤ ρ , which is neces sary for the equiv alence betwe en (10) and (12). T o decomp ose the co st in (12 ), in which summa nds inside the sq uare brackets are coupled throu gh the global variables { Q , A } , introduce aux iliary c opies { Q n , A n } N n =1 representing loca l estimates of { Q , A } , IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 15 one pe r n ode n . Th ese loc al co pies along w ith c onsens us co nstraints y ield the distributed estimator min { P n , Q n , A n } N X n =1  kP Ω n ( Y n − P n Q ′ n − R n A n ) k 2 F + λ ∗ 2 N  N k P n k 2 F + k Q n k 2 F  + λ 1 N k A n k 1  (13) s. t. Q n = Q m , A n = A m m linked with n ∈ N which is eq uiv a lent to (12) provided the ne twork topo logy graph is conne cted. Even though co nsens us is a fortiori impos ed within neigh borhoods, it extends to the who le (con nected) network a nd local e stimates agree on the glob al solution of (12). E xploiting the separab le structure of (13), a g eneral framework for in-network spa rsity-regularized rank minimization was pu t forth in [38]. Speciﬁca lly , distrib uted iterations were obtaine d after adopting the alternating-direction method of multipliers (ADMM), an iterati ve La- grangian method well-suited for pa rallel proce ssing [9]. In a nutshe ll, local tas ks pe r iteration k = 1 , 2 , . . . entail so lving small unc onstrained qua dratic programs to reﬁne the no rmal sub space P n [ k ] , in a ddition to soft-thresholding ope rations to upda te the anoma ly maps A n [ k ] p er router . Ea ch iteration, rou ters excha nge their estimates { Q n [ k ] , A n [ k ] } only with directly connected neighb ors. This way the communication overhead remains affordable, and indepe ndent of the network size N . When employed to solve non-conv ex problems s uch as (13), so far ADMM o f fers n o conv ergence gua r - antees. Howev er , there is a mple experimental evidence in the literature that suppo rts empirical con ver gen ce of ADMM, es pecially whe n the non -con vex problem a t hand exhibits “ fa vorable” structure. For instan ce, (13) is a linearly cons trained bi-con vex problem with po tentially goo d con ver genc e prop erties – extensiv e numerical tests in [38] demo nstrate that this is ind eed the case. Wh ile es tablishing con vergence remains an open problem, one c an still p rove tha t upon conv ergence the distributed iterations attain c onsen sus and global o ptimality , offering the de sirable cen tralized pe rformance guara ntees [38]. D. Real-time anoma ly trackers Monitoring of large-scale IP networks n ecess itates massive recollection of data which far o utweigh the ability of mo dern computers to store and analyze them in rea l time. In ad dition, no nstationarities due to routing changes and miss ing data further challenge identiﬁcation of ano malies. In dynamic networks routing tables are constantly readjusted to effect trafﬁc load b alancing and av oid c onges tion ca used by e.g., trafﬁc anomalies. T o acco unt for slowly time-v aring routing tab les, let R t ∈ R L × F denote t he routing ma trix a t time t . In this dy namic setting, the partially observed link counts at time t adhere to P Ω t ( y t ) = P Ω t ( x t + R t a t + ǫ t ) , t = 1 , 2 , . . . , wh ere the link-level trafﬁc x t := R t z t . In ge neral, routing change s may alter a link load con siderably by e .g., routing trafﬁc completely away from a speciﬁc link. Therefore, ev en tho ugh the OD ﬂow vectors { z t } live in a low-dimensional su bspac e, the sa me ma y no t be true for the { x t } when the routing upda tes are major and frequent. In ba ckbone n etworks howev er , routing change s are sporadic relati ve to the time-sca le o f data acquisition used for n etwork monitoring tasks. For example, data collected from the operation of Internet-2 network reveals that only a few rows of R t change per wee k [1]. It is thus s afe to ass ume that { x t } still lies in a low-dim ens ional subsp ace, and exploit the spatio-temporal c orrelations of the ob servations to identify the ano malies in real-time. IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 16 On top of the previous arguments, in practice link measu rements are a cquired sequen tially in time, which mo ti vates upd ating previously obtained estimates rather than re-computing new on es from sc ratch each time a new datum beco mes av ailable. The goal is then to rec ursiv ely estimate { ˆ x t , ˆ a t } at time t from historical observations {P Ω τ ( y τ ) } t τ = 1 , naturally placing mo re importance o n rece nt meas urements. T o this end, one pos sible ad aptiv e cou nterpart to (12) is the expo nentially-weighted LS estimator fou nd by minimizing the e mpirical cos t [37 ] min { P , Q , A } t X τ = 1 β t − τ " kP Ω τ ( y τ − Pq τ − R τ a τ ) k 2 2 + λ ∗ 2 P t u =1 β t − u k P k 2 F + λ ∗ 2 k q τ k 2 2 + λ 1 k a τ k 1 # (14) in which 0 < β ≤ 1 is the so-termed for getting factor . When β < 1 da ta in the distant past are expon entially downweighted, which facilitates tracking network a nomalies in n onstationary en vironments. For static routing ( R t = R ) a nd inﬁnite me mory ( β = 1) , the formulation (14) coincides with the batch e stimator (12). A provably con vergent online algorithm for d ynamic a nomalography is developed in [37], ba sed on alternating minimization of (14); se e Fig. 7. Each time a new datum is acquired, anoma ly e stimates are formed via the Lasso [51 ], and the lo w-rank nominal trafﬁc su bspace is reﬁned u sing recursive L S. For situations we re reducing comp utational complexity is critical, an online stocha stic gradient a lgorithm based on Ne sterov’ s acc eleration techniqu e is developed a s we ll [37]. Fig. 7. Unv eiling anomalies in r eal time from Internet-2 data [ 1]. (Left) Measured link trafﬁc and cleansed estimates for three representati ve links; and (Right) three ro ws of the estimated anomaly map ˆ A corresponding to three anomalous ﬂ ows [37]. Algorithms in [37] a re clos ely related to timely robust s ubspa ce trackers, which aim at es timating a low- rank subsp ace P from grossly corrupted a nd possibly incomplete da ta, namely P Ω t ( y t ) = P Ω t ( Pq t + a t + ǫ t ) , t = 1 , 2 , . . . . In the a bsenc e of sp arse “outliers” { a t } ∞ t =1 , a n on line algorithm based on increme ntal gradient de scent on the Grassmannian ma nifold o f subspa ces was put forth in [4]. The s econd-orde r RL S- type algorithm in [16] extends the se minal projection approximation subs pace tracking (P AST ) a lgorithm to ha ndle missing data. When ou tliers are p resent, robust cou nterparts ca n be found in [15], [28]. Relative to all a forementioned works, the estimation problem (14 ) is more cha llenging due to the presence of the (compression) routing ma trix R t ; s ee [39] for funda mental ide ntiﬁability iss ues related to the model (9). IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 17 I V . B RO A D E N I N G T H E N E T W O R K A T L A S Additional c artography instances are outlined in this sec tion, including anoma lography from ﬂow measureme nts and n etwork distance prediction. T o exemplify the d ev elopme nt of s ensing infrastructure for situational awareness at the physical layer of wireles s CR networks, the notion of RF ca rtography is intro- duced as we ll. All thes e problems c an be ta ckled through SP methods sub sumed by (10), namely PC P [14], low- rank matrix completion [12 ], the Lasso [51], a nd non-parametric versions o f basis pursuit [7]. A. Un veiling anomalies fr o m ﬂo w data Since some networks nowadays collect OD ﬂow (not link-level) measuremen ts z f ,t + a f ,t for a t least part of t heir network ( using e.g., the Ne tﬂow protoco l), anomalies can b e detected using t empo ral decompo sition and s tandard chan ge-detection ap proaches per ﬂ ow . L ev eraging the low-r ank prop erty of the trafﬁc matrix and the sparsity of anomalies, anomalog raphy from OD ﬂ ow measureme nts was formulated as the PCP matrix deco mposition problem a nd solved c entrally in [3]; s ee also [38] for a distributed implementation of the PCP e stimator aimed at sc alable monitoring of networks. B. Network distance prediction End-to-end n etwork distance information is critical towards enhan cing QoS in Internet a pplications such as c ontent distributi on an d peer-to-peer ﬁle sharing sy stems. Clients n aturally prefer to establish connec tions with “clos er” n etwork res ources or servers that are likely to respond faster . The re are diff erent metrics to qu antify the distanc e b etween a pa ir of network nodes . The mo st common choices are deﬁn ed in terms of latency (one -way de lay a nd the s o-termed round-trip time) or router hop-counts. Un fortunately , either p robing or pass iv ely me asuring a ll pairwise distance s bec omes infeasible in large-scale ne tworks. Gi ven those fe w affordable distance measuremen ts, the problem of ne twork distance prediction is to impute (that is interpola te) the missing entries in a highly-incomplete matrix of end-to-end distances. If o ne collects the end-to-end latencies d i,j of s ource-sink pa irs ( i, j ) in a delay matrix D := [ d i,j ] ∈ R N × N , strong dep endenc ies amo ng path delays ren der D low rank; see e .g., [35] for an experimental validation with multiple datas ets. Intuiti vely , correlations among rows an d c olumns of D emerge bec ause nearby n odes (e.g., those belong ing to a c ommon subne twork) are con nected to every other node through paths with signiﬁca nt overlap, poss ibly sharing common bo ttleneck links. The low-rank property of D along with the distrib uted-proces sing requirements of lar ge -scale networks, moti vated decentralized matri x- factorization [35] and nuclear -norm minimi zation [38] algorithms for netw ork distance prediction. Di fferent from sc hemes based on Euclidean emb edding via multi-dimensional s caling [22], low-r ank mode ling does not require distances in D to be s ymmetric and sa tisfy the tr iangle inequality – prope rties t hat are o ftentimes violated b y n etwork-related distance s [34]. T o av oid the excess i ve overhead of active probing mechan isms, one can leverage network monitors that pas siv ely observe router hop-coun ts from trafﬁc traversing those monitored links; s ee e.g., [24] and references therein. Co llect these ho p-count mea surements in the matrix H := [ h m,n ] ∈ N M × N , whe re M is the nu mber of monitors, a nd N ( ≫ M ) the total h osts observed. Be cause mo nitor m o nly observes IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 18 a fraction of the total network trafﬁc, H will be dep leted with missing entries. Des pite typically having rank ( H ) = M , H consists of low-rank column b locks, eac h correspon ding to a sub network w ith ac cess to the Internet core through a sing le bo rder router . Reco gnizing this structure, a high -rank ma trix comp letion algorithm that performs s ubspa ce clustering o f incomple te hop -count data was put forth in [24], and shown to attain goo d pe rformance both in theo ry and practice . Dif ferent from the dy namic n etwork delay cartography problem considered in Sec. I I-B, network distance prediction ap proaches do not a ccoun t for the tempo ral variations in the delays, an d typically re ly on ba tch imputation o f the distance matrix of interest. The tec hniques us ed in Sec. II-B do not apply in this context e ither , since so me path d elays are never obse rved, and thus it is imposs ible to estimate the spatial covari anc e matrices (suc h a s C η and C ν ) c ompletely . C. RF ca rtography In the d omain of s pectrum s ensing for CR networks, RF c artography amou nts to cons tructing in a distrib uted fashion: m1) global p ower spec tral density (PSD) maps c apturing the distributi on of radiated power across s pace, time, and frequ ency; a nd m2) loc al ch annel g ain (CG) ma ps offering the propa gation medium per frequency from eac h node to any point in s pace. These maps enab le identiﬁcation of oppor- tunistically available spe ctrum b ands for re-use and h andoff operation; as well a s localization, transmit- power estimation, an d track ing of p rimary user activities. While the focus he re is on the construction of PSD ma ps, the interested reade r is referred to [29] for a tutorial treatment on CG c artography . A c ooperative approac h to RF cartograph y was introduced in [6], tha t builds on a ba sis expan sion model of the PSD map Φ( x , f ) across s pace x ∈ R 2 , and frequency f . Spatially-distributed CRs collec t smoothed pe riodogram sa mples o f the receiv ed sign al a t given sampling frequenc ies, base d on which they want to determine t he unkn own expansion coef ﬁcients. I ntroducing a virtual spatial grid of candidate source locations, the es timation task ca n be cas t a s a linea r LS p roblem with an augmented vector of un known parameters. Stil l, the problem complexity (or ef fecti ve de grees of freedom) can be controlled by capitalizing on two forms of sp arsity: the ﬁrst o ne introduced by the narrow-band nature of trans mit-PSDs relati ve to the b road swaths of usa ble spe ctrum; and the sec ond on e emerging from sparsely located active radios in the operationa l sp ace (due to the grid artifact). Nonz ero entries in the parameter vector s ought correspo nd to spatial loc ation-frequency ba nd pairs c orresponding to activ e transmiss ions. All in all, e stimating the PSD ma p an d locating the a ctiv e transmitters as a byproduct b oils down to a variable s election problem. This motiv ates well employment of the Lasso for distributed s parse linea r regression [38], an estimator also su bsumed by (10) when X = 0 L × T , T = 1 , and the regression matrix R ha s a s peciﬁc structure tha t depend s o n the c hosen ba ses a nd p ath-loss propagation model. Sparse total LS variants are also available to cope with unc ertainty in the regression matrix, arising due to inaccurate channe l es timation and grid-mismatch ef fects [29]. No nparametric spline-based PSD map estimators [7] hav e b een a lso shown e f fectiv e in capturing general propagation c haracteristics including both s hadowing an d fading; see also Fig. 8 for a n a ctual PS D atlas s panning 14 frequ ency sub -bands. IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 19 Fig. 8. Spline-based RF cartograp hy using the dataset [31]. (Left) Detailed ﬂoor plan schematic including the location of N = 166 sensing radios; (Right-bottom) original measurements spanning 14 fr equenc y sub-bands; (Ri ght-center) estimated maps ov er the surveye d area; and (Right-top) extrapo lated maps. The proposed estimator is capable of recov ering the 9 (out of 14 total) center frequencies that are being utilized for transmission. It accurately recov ers the po wer levels in the survey ed area with a smooth extrapolation to zones were there are no measurements, and suggests possible locations for the transmitters [7]. V . C O N C L U D I N G R E M A R K S In this tutorial, the conce pt of dy namic network cartography is introduc ed as a framework to c onstruct maps of the dynamica lly ev olving n etwork state, in an ef ﬁcien t a nd scalable ma nner even for large-scale heterogene ous n etworks. Focus is placed o n key task s geared to obtaining full ye t succinc t representation of network state metrics suc h as link trafﬁc and pa th delays , a s well as prompt a nd accu rate ide ntiﬁcation of network anoma lies from poss ibly partial and corrupted measuremen t data. Looking forward, the unc easing de mand for co ntinuous situational awareness c alls for innovati ve and lar ge-sc ale distributed SP algorithms, compleme nted by collabo rati ve and a daptive monitoring platforms to accomplish the objec ti ves o f n etwork manage ment and control. A venues where signiﬁcan t impact ca n be mad e include: i) judicious de sign o f critical cognition infrastructure to sense , learn, and ada pt to the en vironment where ne tworks operate; ii) development of sca lable tools for distilling, summarizing, a nd tracking the network s tate for the purpo se of network manag ement; iii) ensu ring robustness in the face of missing and gros sly-corrupted ne twork da ta, in addition to p ossibly ma licious a ttacks; and iv) developing eff ective network adaptation tec hniques bas ed on g lobal network inference, further impacting protocol designs, ne twork taxon omy , and categorization. R E F E R E N C E S [1] [Online]. A v ailable: http://www .internet2.edu [2] [Online]. A v ailable: http://erg.cs.waikato.ac.nz /amp/matrix.php/ipv4/latency /NZ [3] A. Abdelkeﬁ, Y . Jiang, W . W ang, A. Aslebo, and O. Kvittem, “Robust trafﬁc anomaly detection with principal component pursuit, ” in Proc. of t he ACM CoNEXT Student W orkshop , Philadelphia, P A, Nov . 2010. [4] L. Balzano, R. Nowak, and B. Recht, “Online identiﬁcation and tracking of subspaces from highly incomplete information, ” in Pr oc. of Allerton Conf. on Communication, Contr ol, and Computing , Monticello, IL, 2010. IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 20 [5] V . W . Bandara and A. P . Jayasumana, “Extracting baseline patterns in internet trafﬁc using robust principal components, ” in Pr oc. IE EE Intl. Conf. on Local Computer Netw . , Bonn, Germany , 2011. [6] J. A. Bazerque and G. B. Giannakis, “Distri buted spectrum sensing for cognitive radio networks by exploiting sparsity , ” IEEE T rans. Signal Pr ocess. , vol. 58, pp. 1847–1862, Mar . 2010. [7] J. A. Bazerque, G. Mateos, and G. B. Giannakis, “Group L asso on splines for spectrum cartography , ” IEEE Tr ans. Signal Pr ocess. , vol. 59, pp. 4648–4663 , Oct. 2011. [8] M. B elkin, P . Niyogi, and V . Sindhwani, “Manifold regularization: A geometric framework for learning from labeled and unlabeled examples, ” J. Mach. Learn. Res. , vol. 7, pp. 2399–24 34, D ec. 2006. [9] D. P . Bertsekas and J. N. Tsit siklis, P arallel and Distributed Computation: Numerical Methods . Athena-Scientiﬁc, 1999. [10] E. J. Candes, X. L i, Y . Ma, and J. Wright, “Robust principal component analysis?” Journ al of the ACM , vol. 58, no. 1, pp. 1–37, 2011. [11] E. J. Candes and T . T ao, “Decoding by linear programming, ” IEEE Tr ans. Info. Theory , vol. 51, no. 12, pp. 4203–42 15, 2005. [12] E. Candes and Y . Plan, “Matrix completion with noise, ” Pr oc. of the IEEE , vol. 98, pp. 925–936 , 2009. [13] R. Castro, M. Coates, G. L iang, R. Now ak, and B. Y u , “Network tomography : Recent de velopments, ” Statist. Sci. , vol. 19, no. 3, pp. 499–51 7, 2004. [14] V . Chandrasekaran, S. Sanghav i, P . R. Parrilo, and A. S . Willsky , “Rank-sparsity incoherence for matrix decomposition, ” SIAM J. Optim. , vol. 21, no. 2, pp. 572–596, 2011. [15] Q. Chenlu and N. V aswani, “Recursiv e sparse recov ery in large b ut correlated noise, ” in Proc. of Allerton Conf. on Communication, Contr ol, and Computing , Monticello, IL , 2011. [16] Y . Chi, Y . C. E ldar , and R. Calderbank, “Petrels: Subspace estimation and tracking from partial observations, ” in Proc. of IEEE International Confer ence on Acoustics, Speec h and Signal Pr ocessing , Kyoto, Japan, Mar . 2012. [17] D. Chua, E. Kolaczy k, and M. Cr ov ella, “Network kriging, ” IEEE J. Sel. Areas C ommun. , vol. 24, 2006. [18] M. Coates, Y . P ointurier , and M. Rabbat, “Compressed network monitoring for IP and all-optical networks, ” i n Proc. ACM Internet Measur ement Conf. , San Di ego, CA, Oct. 2007. [19] G. Conti, Security Data V isualization: Graphical T ec hniques for Network Analysis . No Starch P ress, 2007. [20] J. Cort ´ es, “Distributed Kriged Kalman ﬁl ter for spatial estimation, ” vol. 54, no. 12, pp. 2816–2827, Dec. 2009. [21] N. Cressie, “The origins of kriging, ” Mathematical Geology , vol. 22, no. 3, pp. 239–252, 1990. [22] F . Dabek, R. Cox, F . Kaashoek, and R. Morris, “V iv aldi: A decentralized network coordinate system, ” in Pr oc. of ACM SIGCOMM , Portland, OR, Aug. 2004. [23] D. L. Donoho, “Compressed sensing, ” IEEE Tr ans. Info. Theory , vol. 52, no. 4, pp. 1289 –1306, Apr . 2006. [24] B. E riksson, L. Balzano, and R. No wak, “High-rank matrix completion, ” in Pr oc. of Intl. Conf. on Artiﬁcial Intell . and Stat. , La Palma, Canary Islands, Apr . 2012. [25] M. Fazel, “Matrix rank minimization with applications, ” Ph.D. dissertation, El ectrical Eng. Dept., Stanford Univ ersity , 2002. [26] P . A. Forero, K. Rajawat, and G. B. Giannakis, “Semi-supervised dicti onary learning f or network-wide link l oad prediction, ” in Pr oc. Cognitive Information Pr ocessing W orkshop , Baiona, S pain, May 2012. [27] J. F riedman, T . Hastie, H. H ¨ oﬂing, and R. T ibshirani, “Pathwise coordinate optimization, ” The A nnals of A pplied Statistics , vol. 1, no. 2, pp. 302–332 , 2007. [28] J. He, L. Balzano, an d A. Szlam, “Incremental gradient on th e Grassmannian for on line foregrou nd and backgroun d separation in subsampled video, ” in Pro c. of IEEE Confer ence on Computer V isi on and P attern Recognition , Providence, Rhode Isl and, Jun. 2012. [29] S.-J. Kim, E . Dal l’Anese, J. A. Bazerque, K. Rajawat, and G. B. Giannakis, “ Adv ances in spectrum sensing and cross-layer design for cognitiv e radio networks, ” Elsevier , E-Refer ence Signal Pr ocessing , 2012. [30] S.-J. Kim, E. Dall’Anese, and G. B. Giannakis, “Cooperativ e spectrum sensing for cognitiv e radios using Kriged Kalman ﬁltering, ” IEEE Jrnl. Sel. T opics in Signal P r ocess. , vol. 5, no. 1, pp. 24–36, Feb . 2011. [31] T . Ki ng, S. Kop f, T . Haenselmann, C. Lubberg er, and W . Effelsber g, “CRA WD AD data set mannheim/compass (v . 2008- 04-11), ” Download ed from http://crawdad.cs.dartmouth.edu/mann heim/compass, Apr . 2008. IEEE SIGNAL PR OCESSING MAGAZINE (TO APPEAR) 21 [32] E. D. K olaczyk, Statistical Analysis of N etwork Data: Methods and Models . Springer , 2009. [33] A. Lakhina, M. Crovella, and C. D iot, “Diagnosing network-wide trafﬁc anomalies, ” in Pro c. of ACM SIGCOMM , Portland, OR, Aug. 2004. [34] S. Lee, Z . Zhang, S. Sahu, and D. Saha, “On suitability of Euclidean embedding Internet hosts, ” in SIGMET RICS , Saint Malo, France, Jun. 2006. [35] Y . L iao, P . Geurts, and G. Leduc, “Network distance prediction based on decentralized matrix factorization, ” in Proc . of IFIP Networking Conf. , Chennai, India, May 2010. [36] J. Mairal, J. Bach, J. Ponce, and G. Sapiro, “Online learning f or matrix factorization and sparse coding, ” Jrnl. of Machine Learning Resear ch , vol. 11, pp. 19–60, Jan. 2010. [37] M. Mardani, G. Mateos, and G. B . Giannakis, “Dynamic anomalography: Tracking netwo rk anomalies via sparsity and low rank, ” IEEE Jrnl. Sel. T opics in Signal Pr ocess. , 2012, see also arXiv:1208.40 43v1 [cs.NI]. [38] ——, “In-network sparsity-regularized rank minimization: Applications and algorithms, ” IEEE T rans. Signal Proces s. , 2012, see also arXiv:1203.150 7v1 [cs.MA]. [39] ——, “Recov ery of low-rank plus compressed sparse matrices with application to unv eiling trafﬁc anomalies, ” IEE E Tr ans. Info. Theory , 2012, see also arXiv:1204.6537 v1 [cs.IT ]. [40] K. V . Mardia, C. Goodall, E. J. Redfern, and F . J. Alonso, “The Kriged Kalman ﬁlter, ” T est , vol. 7, no. 2, pp. 217–285, Dec. 1998. [41] K. Myers and B. T apley , “ Adapti ve sequential estimation with unknown noise statistics, ” IEEE Tr ans. Automat. Contr . , vol. 21, no. 4, pp. 520–52 3, Aug. 1976. [42] G. L . Nemhauser , L. A. W olsey , and M. L. Fisher , “ An analysis of approximations for maximizing submodular set functions - I, ” Mathematical Pro gramming , no. 1, pp. 265–294, Dec. 1978. [43] T . O etiker . About rrdtool. [Online]. A vailable: http://people.ee.ethz.ch/ oetiker/webtools/rrdtool [44] R. Raina, A. Battle, H. Lee, B. Packer , and A. Y . Ng, “Self-taught learning: transfer learning from unlabeled data, ” in Pr oceedings of the 24th Intl. Conf. on Mach ine learning , ser . I CML ’07, 2007, pp. 759–766. [45] K. Rajawat, E. Dall’Anese, and G. B . Giannakis, “Dynamic network kriging, ” in Pr oc. IEEE Statisti cal Signal Pr ocessing W o rkshop , Ann Arbor , MI, Aug. 2012, see also arXiv :1204.5507v1 [cs.NI ]. [46] B. Recht and C. Re, “Parallel stochastic gradient algorithms for l arge-scale matrix completion, ” 2011, (submitted). [47] M. Roughan, “ A case study of the accuracy of SNMP measurements, ” Jo urnal of Electrical and Computer Engineering , vol. 2010, 2010, article ID 812979. [48] Y . Shavitt, X. Sun, A. W ool, and B. Y ener , “Computing the unmeasured: An algebraic approach to internet mapping, ” in Pr oc. IEEE Intl. Conf. on Computer Commun. , Anchorage, Alaska, Apr . 2001. [49] A. S oule, A. Lakhina, N. T aft, K. Papagiannaki, K. Salamatian, A. Nucci, M. Crov ella, and C. Diot, “Traf ﬁc matrices: Balancing measurements, inference and modeling, ” in Proc . ACM SIGMETRICS , Banff, AB, Jun. 2005. [50] A. Soule, K. S alamatian, A. Nucci, and N. T aft, “T rafﬁc matrix tracking using kalman ﬁlters, ” SIGMETRICS P erform. Eval. Rev . , vol. 33, no. 3, pp. 24–31, Dec. 2005. [51] R. Tibshiran i, “Regression shrinkage and selection via the Lasso, ” J ournal of the Royal Statistical Society . Series B (Methodolo gical) , vol. 58, no. 1, pp. 267–288, 1996. [52] I. T o ˇ si ´ c and P . Fr ossard, “Dictionary learning, ” IE EE Signal Pr ocess. Mag . , vol. 28, pp. 27–38, Mar . 2010. [53] X. Wu, K. Y u, and X. W ang, “On the growth of internet application ﬂ o ws: A complex network perspectiv e, ” in Proc. IEEE Intl. Conf. on Computer Commun. , Shangai, China, Jun. 2011. [54] Y . Zhang, M. Roughan, C. Lund, and D. L. Donoho, “Estimating point-to-point and point-to-multipoint t rafﬁc matrices: an information-theoretic approach, ” IEEE/ACM T ransa ctions on Networking , vol. 13, no. 5, pp. 947 – 960, Oct. 2005. [55] Y . Zhang, Z. Ge, A. Greenberg, and M. Roughan, “Network anomography , ” in Proc. ACM SIGCOM Conf. on Interen t Measur ements , Berkeley , CA, Oct. 2005. [56] Y . Zhang, M. R oughan, W . Willinger , and L. Q iu, “Spatio-temporal compressiv e sensing and internet trafﬁc matrices, ” in Pr oc. of ACM SIGCOM Conf. on Data C ommun. , New Y ork, USA, Oct. 2009.

Dynamic Network Cartography

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment