Rejoinder: Monitoring Networked Applications With Incremental Quantile Estimation

Statistical Science 2006, Vol. 21, No. 4, 485–486. DOI: 10.1214/088342306000000592. Main article DOI: 10.1214/088342306000000583. In the Public Domain.

John M. Chambers, David A. James, Diane Lambert and Scott Vander Wiel

1. DIVERSITY OF MONITORING GOALS AND CONSTRAINTS

There are many kinds of networks, each with many types of variables and monitoring goals. Our paper addressed only one of the countless possible combinations of network and monitoring goals. We are grateful to the discussants for expanding our paper by providing insights into other network monitoring problems that present different challenges to statisticians.

Denby, Landwehr and Meloche (DLM) describe three network monitoring problems, each with different requirements for detection speed, communication constraints and scalability. The Voice over Internet protocol (VoIP) application, for example, requires good scalability, low overhead and quick responses to problems that manifest in a variety of quality-of-service (QoS) metrics. Monitoring service-level agreements, on the other hand, needs a prompt signal when path transit times become too long, a more focused goal than the VoIP problem. Our monitoring problem is most similar to DLM's third example, monitoring call centers through flexible reporting of historical reliability and performance. These problems typically have a wide variety of analytic goals, some of which are not determined until an analyst begins to drill through high-level summaries into data slices that show unusual behavior.
Whereas DLM concentrate on full-path QoS for VoIP, Lawrence, Michailidis and Nair (LMN) describe a QoS problem in which path measurements are used to estimate link-level characteristics, presumably for the purpose of managing the network, perhaps by modifying routing tables, adding key links or upgrading hardware at nodes.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2006, Vol. 21, No. 4, 485–486. This reprint differs from the original in pagination and typographic detail.

To the list of monitoring problems that we and the discussants have described, we would add detection of worm outbreaks (Bu, Chen, Vander Wiel and Woo, 2006), dynamic thresholding of error counts (Lambert and Liu, 2006), fraud detection (Cahill, Lambert, Pinheiro and Sun, 2002) and call blocking events (Becker, Clark and Lambert, 1998). And there are certainly others that we are overlooking.

The variety of applications raised by the reviewers and our own experience demonstrate that there is no canonical statistical problem in the domain of monitoring networks for performance and reliability. In our application, the software architects imposed a hard constraint that the summary records had to have a fixed length and would be transmitted at regular intervals. Also, the requirement for a very small footprint stemmed from the need for the agent software to run on personal computers that may be old and slow and may be connected to the network by a low bandwidth link. While the quantile estimates must be reasonably accurate, the growth plan for the business placed much more emphasis on ease of implementation for new features and upgraded architecture to improve scalability.
Therefore, improvements to quantile accuracy had to be made with relatively low development (software coding) cost. The simplicity of Incremental Quantiles (IQ) was obviously attractive.

2. DATA COMPRESSION

DLM, LMN and Yu all discuss connections that the IQ algorithm has to methods for compressing and sketching data streams. Although compression was not likely to be used in our application, it is critical for sensor networks, for example, where data transmission is much more costly. We hope that Yu and others will pursue statistical compression methods that allow updating summaries without decompression.

3. SMOOTHING AND DETECTION PERFORMANCE

LMN advocate that, for monitoring purposes, “the procedure should be devised to estimate the current scenario” and then outline how exponentially weighted moving averages (EWMAs) could be formed using either quantiles or cumulative distribution functions (CDFs).

We like the idea of extending IQ to compute EWMAs of CDFs and, in fact, we proposed this possibility to the product managers of the monitoring software. However, they were not prepared to modify the meaning of the basic summaries computed by agents. One reason for their reluctance is that temporal changes in performance characteristics represent just one type of anomaly that analysts want to uncover. Other anomalies are topographically defined. For example, an outage might affect only a small group of users over an extended period of time. Furthermore, appropriate EWMA weight parameters will differ according to the goals of the analyst, and these goals could vary widely. Therefore EWMA calculations would need to be done in real time at the server in our application and not by the agents.
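
As an illustration only, and not a description of the monitoring software discussed here, a server-side EWMA of CDFs might be sketched as follows. The sketch assumes that each reporting interval's CDF is evaluated on a fixed grid of cutpoints and that the analyst supplies the weight. The function names, the grid and the data are all hypothetical, and for brevity the sketch starts from raw interval samples rather than from fixed-length quantile summaries.

```python
from bisect import bisect_right


def empirical_cdf(sample, grid):
    """Fraction of the sample at or below each grid point."""
    ordered = sorted(sample)
    n = len(ordered)
    return [bisect_right(ordered, x) / n for x in grid]


def ewma_update(prev_cdf, new_cdf, weight):
    """Blend the newest interval's CDF into the running average."""
    if prev_cdf is None:  # first interval: nothing to smooth yet
        return list(new_cdf)
    return [(1 - weight) * old + weight * new
            for old, new in zip(prev_cdf, new_cdf)]


def quantile_from_cdf(cdf, grid, p):
    """Smallest grid point whose smoothed CDF value reaches p."""
    for x, f in zip(grid, cdf):
        if f >= p:
            return x
    return grid[-1]


# Hypothetical cutpoints and made-up data, one list per reporting interval.
grid = [10, 20, 50, 100]
intervals = [[5, 15, 15, 40], [30, 60, 60, 90]]

smoothed = None
for sample in intervals:
    smoothed = ewma_update(smoothed, empirical_cdf(sample, grid), weight=0.5)

print(smoothed)                                 # [0.125, 0.375, 0.625, 1.0]
print(quantile_from_cdf(smoothed, grid, 0.5))   # 50
```

Because the weight is an argument to the update rather than part of the summary record, different analysts could smooth the same stored summaries with different weights, which is consistent with doing the EWMA calculation at the server.
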
Yu outlines a scheme that would track the current CDF using a moving window of data, processed in blocks that are small enough for within-block stationarity to be a reasonable assumption. A moving window of blocks would not be difficult to implement, although EWMAs would achieve much the same goal with less complexity because an EWMA scheme would use only the previous quantile estimates and the new data in $D$ and would have the same level of complexity as the nominal IQ algorithm.

DLM, LMN and Yu all were dissatisfied that we did not explore performance of the monitoring scheme in terms of false alarm rates and detection times. Although we agree that good detection performance is, in general, an important design goal, the portion of the software suite that uses IQ does not attempt to produce real-time alarms of anomalous events; that aspect of monitoring is handled by a companion system that analyzes network event data. Nevertheless, the procedure that DLM sketch in which an agent emits a summary record when triggered by a low $p$-value for testing the hypothesis of a change in distribution is a reasonable approach to the online detection problem if changes are large enough to be detected by individual agents. The problem is more difficult, however, if the signal for a problem is buried in noisy data and distributed over many agents. In this case, two-way communication between the agents and the server could be valuable. Furthermore, if the goal is dynamic response to an emerging problem, then the information being shared will need to extend beyond evidence of a change and include the character of the change as well.

4. ACCURACY AND EFFICIENCY

LMN explain that the computational cost of IQ is $O(N \log(N))$ or even up to $O(N^2)$.
It is important to clarify that $N$ is the fixed length of the $D$-buffer and therefore the sorting operation represents a fixed amount of overhead for each round of the IQ algorithm. IQ is linear in terms of the total number of data elements that are processed through the algorithm. The computational complexity of sorting comes into play when considering the price of improving the accuracy by growing $D$, but in practice modern sorting algorithms are extremely efficient even for large, but memory-resident, blocks of data.

LMN discuss $\varepsilon$-approximate algorithms that appear in the computer science literature. These guarantee that an estimate is within $\varepsilon$ of the correct quantile level; for example, $\varepsilon = 0.01$ assures that the $p = 0.98$ quantile estimate lies between the actual 0.97 and 0.99 sample quantiles. Accuracy that is uniform in $p$ is appropriate for constructing approximate equidepth histograms but tail quantiles need high $p$-resolution that seems difficult to achieve with $\varepsilon$-approximate algorithms. We would like to see the $\varepsilon$-approximate algorithms extended to provide accuracy that improves in the tails. For example, if an algorithm reports the $q$th sample quantile as an estimate of the $p$th sample quantile, then we would like a guarantee that the logit values of $p$ and $q$ differ by less than $\varepsilon$. IQ has no such guarantee, but neither does any other algorithm, as far as we are aware.

All the discussants have raised problems that remain to be addressed. We thank them and the Editor for helping to raise awareness of the many statistical issues that remain to be resolved in the context of network monitoring.

REFERENCES

Becker, R., Clark, L. and Lambert, D. (1998). Events defined by duration and severity, with an application to network reliability (with discussion). Technometrics 40 177–194.

Bu, T., Chen, A., Vander Wiel, S. and Woo, T. (2006). Design and evaluation of a fast and robust worm detection algorithm. In Proc. IEEE INFOCOM 2006. IEEE Press, Piscataway, NJ.

Cahill, M., Lambert, D., Pinheiro, J. and Sun, D. (2002). Detecting fraud in the real world. In Handbook of Massive Datasets 911–929. Kluwer, Dordrecht.

Lambert, D. and Liu, C. (2006). Adaptive thresholds: Monitoring streams of network counts. J. Amer. Statist. Assoc. 101 78–88. MR2252435
