A Learning-Based Approach to Reactive Security


Adam Barth (1), Benjamin I. P. Rubinstein (1), Mukund Sundararajan (3), John C. Mitchell (4), Dawn Song (1), and Peter L. Bartlett (1,2)

(1) Computer Science Division, UC Berkeley; (2) Department of Statistics, UC Berkeley; (3) Google Inc., Mountain View, CA; (4) Department of Computer Science, Stanford University

Abstract. Despite the conventional wisdom that proactive security is superior to reactive security, we show that reactive security can be competitive with proactive security as long as the reactive defender learns from past attacks instead of myopically overreacting to the last attack. Our game-theoretic model follows common practice in the security literature by making worst-case assumptions about the attacker: we grant the attacker complete knowledge of the defender's strategy and do not require the attacker to act rationally. In this model, we bound the competitive ratio between a reactive defense algorithm (which is inspired by online learning theory) and the best fixed proactive defense. Additionally, we show that, unlike proactive defenses, this reactive strategy is robust to a lack of information about the attacker's incentives and knowledge.

1 Introduction

Many enterprises employ a Chief Information Security Officer (CISO) to manage the enterprise's information security risks. Typically, an enterprise has many more security vulnerabilities than it can realistically repair. Instead of declaring the enterprise insecure until every last vulnerability is plugged, CISOs typically perform a cost-benefit analysis to identify which risks to address. But what constitutes an effective CISO strategy?
The conventional wisdom [28,21] is that CISOs ought to adopt a forward-looking proactive approach to mitigating security risk by examining the enterprise for vulnerabilities that might be exploited in the future. Advocates of proactive security often equate reactive security with myopic bug-chasing and consider it ineffective. We establish sufficient conditions for when reacting strategically to attacks is as effective in discouraging attackers.

We study the efficacy of reactive strategies in an economic model of the CISO's security cost-benefit trade-offs. Unlike previously proposed economic models of security (see Section 7), we do not assume the attacker acts according to a fixed probability distribution. Instead, we consider a game-theoretic model with a strategic attacker who responds to the defender's strategy. As is standard in the security literature, we make worst-case assumptions about the attacker. For example, we grant the attacker complete knowledge of the defender's strategy and do not require the attacker to act rationally. Further, we make conservative assumptions about the reactive defender's knowledge and do not assume the defender knows all the vulnerabilities in the system or the attacker's incentives. However, we do assume that the defender can observe the attacker's past actions, for example via an intrusion detection system or user metrics [3].

In our model, we find that two properties are sufficient for a reactive strategy to perform as well as the best proactive strategies. First, no single attack is catastrophic, meaning the defender can survive a number of attacks. This is consistent with situations where intrusions (that, say, steal credit card numbers) are regrettable but not business-ending.
Second, the defender's budget is liquid, meaning the defender can re-allocate resources without penalty. For example, a CISO can reassign members of the security team from managing firewall rules to improving database access controls at relatively low switching costs.

Because our model abstracts many vulnerabilities into a single graph edge, we view the act of defense as increasing the attacker's cost for mounting an attack instead of preventing the attack (e.g., by patching a single bug). By making this assumption, we choose not to study the tactical patch-by-patch interaction of the attacker and defender. Instead, we model enterprise security at a more abstract level appropriate for the CISO. For example, the CISO might allocate a portion of his or her budget to engage a consultancy, such as WhiteHat or iSEC Partners, to find and fix cross-site scripting in a particular web application, or to require that employees use SecurID tokens during authentication. We make the technical assumption that attacker costs are linearly dependent on defense investments locally. This assumption does not reflect patch-by-patch interaction, which would be better represented by a step function (with the step placed at the cost to deploy the patch). Instead, this assumption reflects the CISO's higher-level viewpoint, where the staircase of summed step functions fades into a slope.

We evaluate the defender's strategy by measuring the attacker's cumulative return-on-investment, the return-on-attack (ROA), which has been proposed previously [8]. By studying this metric, we focus on defenders who seek to "cut off the attacker's oxygen," that is, to reduce the attacker's incentives for attacking the enterprise. We do not distinguish between successful and unsuccessful attacks.
Instead, we compare the payoff the attacker receives from his or her nefarious deeds with the cost of performing said deeds. We imagine that sufficiently disincentivized attackers will seek alternatives, such as attacking a different organization or starting a legitimate business.

In our main result, we show sufficient conditions for a learning-based reactive strategy to be competitive with the best fixed proactive defense in the sense that the competitive ratio between the reactive ROA and the proactive ROA is at most 1 + ε, for all ε > 0, provided the game lasts sufficiently many rounds (at least Ω(1/ε)). To prove our theorems, we draw on techniques from the online learning literature. We extend these techniques to the case where the learner does not know all the game matrix rows a priori, letting us analyze situations where the defender does not know all the vulnerabilities in advance. Although our main results are in a graph-based model with a single attacker, our results generalize to a model based on Horn clauses with multiple attackers. Our results are also robust to switching from ROA to attacker profit and to allowing the proactive defender to revise the defense allocation a fixed number of times.

Fig. 1.1. An attack graph representing an enterprise data center.

Although myopic bug chasing is most likely an ineffective reactive strategy, we find that in some situations a strategic reactive strategy is as effective as the optimal fixed proactive defense. In fact, we find that the natural strategy of gradually reinforcing attacked edges by shifting budget from unattacked edges learns the attacker's incentives and constructs an effective defense.
Such a strategic reactive strategy is both easier to implement than a proactive strategy, because it does not presume that the defender knows the attacker's intent and capabilities, and less wasteful than a proactive strategy, because the defender does not expend budget on attacks that do not actually occur. Based on our results, we encourage CISOs to question the assumption that proactive risk management is inherently superior to reactive risk management.

Organization. Section 2 formalizes our model. Section 3 shows that perimeter defense and defense-in-depth arise naturally in our model. Section 4 presents our main results bounding the competitive ratio of reactive versus proactive defense strategies. Section 5 outlines scenarios in which reactive security out-performs proactive security. Section 6 generalizes our results to Horn clauses and multiple attackers. Section 7 discusses related work. Section 8 concludes.

2 Formal Model

In this section, we present a game-theoretic model of attack and defense. Unlike traditional bug-level attack graphs, our model is meant to capture a managerial perspective on enterprise security. The model is somewhat general in the sense that attack graphs can represent a number of concrete situations, including a network (see Figure 1.1), components in a complex software system [9], or an Internet Fraud Battlefield [13].

System. We model a system using a directed graph (V, E), which defines the game between an attacker and a defender. Each vertex v ∈ V in the graph represents a state of the system. Each edge e ∈ E represents a state transition the attacker can induce. For example, a vertex might represent whether a particular machine in a network has been compromised by an attacker.
An edge from one machine to another might represent that an attacker who has compromised the first machine might be able to compromise the second machine because the two are connected by a network. Alternatively, the vertices might represent different components in a software system. An edge might represent that an attacker sending input to the first component can send input to the second.

In attacking the system, the attacker selects a path in the graph that begins with a designated start vertex s. Our results hold in more general models (e.g., based on Horn clauses), but we defer discussing such generalizations until Section 6. We think of the attack as driving the system through the series of state transitions indicated by the edges included in the path. In the networking example in Figure 1.1, an attacker might first compromise a front-end server and then leverage the server's connectivity to the back-end database server to steal credit card numbers from the database.

Incentives and Rewards. Attackers respond to incentives. For example, attackers compromise machines and form botnets because they make money from spam [20] or rent the botnet to others [32]. Other attackers steal credit card numbers because credit card numbers have monetary value [10]. We model the attacker's incentives by attaching a non-negative reward to each vertex. These rewards are the utility the attacker derives from driving the system into the state represented by the vertex. For example, compromising the database server might have a sizable reward because the database server contains easily monetizable credit card numbers. We assume the start vertex has zero reward, forcing the attacker to undertake some action before earning utility.
Whenever the attacker mounts an attack, the attacker receives a payoff equal to the sum of the rewards of the vertices visited in the attack path:

    payoff(a) = Σ_{v ∈ a} reward(v).

In the example from Figure 1.1, if an attacker compromises both a front-end server and the database server, the attacker receives both rewards.

Attack Surface and Cost. The defender has a fixed defense budget B > 0, which the defender can divide among the edges in the graph according to a defense allocation d: for all e ∈ E, d(e) ≥ 0, and Σ_{e ∈ E} d(e) ≤ B. The defender's allocation of budget to various edges corresponds to the decisions made by the Chief Information Security Officer (CISO) about where to allocate the enterprise's security resources. For example, the CISO might allocate organizational headcount to fuzzing enterprise web applications for XSS vulnerabilities. These kinds of investments are continuous in the sense that the CISO can allocate 1/4 of a full-time employee to worrying about XSS. We denote the set of feasible allocations of budget B on edge set E by D_{B,E}.

By defending an edge, the defender makes it more difficult for the attacker to use that edge in an attack. Each unit of budget the defender allocates to an edge raises the cost that the attacker must pay to use that edge in an attack. Each edge has an attack surface [19] w that represents the difficulty in defending against that state transition. For example, a server that runs both Apache and Sendmail has a larger attack surface than one that runs only Apache, because defending the first server is more difficult than defending the second. Formally, the attacker must pay the following cost to traverse the edges of an attack:

    cost(a, d) = Σ_{e ∈ a} d(e)/w(e).

Allocating defense budget to an edge does not reduce an edge's attack surface.
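The payoff and cost computations above can be sketched in a few lines. This is our own illustrative rendering: the vertex names, rewards, surfaces, and allocation below are hypothetical (loosely patterned after the data-center example), not values from the paper.

```python
# Hypothetical instance: a start vertex, a front-end, and a database.
reward = {"s": 0, "frontend": 1, "database": 9}
w = {("s", "frontend"): 1.0, ("frontend", "database"): 1 / 9}  # attack surfaces
d = {("s", "frontend"): 1.0, ("frontend", "database"): 1.0}    # defense allocation

def payoff(path_vertices):
    """payoff(a) = sum of rewards of the vertices visited by the attack."""
    return sum(reward[v] for v in path_vertices)

def cost(path_edges, d):
    """cost(a, d) = sum over attack edges of d(e) / w(e)."""
    return sum(d[e] / w[e] for e in path_edges)

attack_vertices = ["s", "frontend", "database"]
attack_edges = [("s", "frontend"), ("frontend", "database")]
print(payoff(attack_vertices))                 # 0 + 1 + 9 = 10
print(cost(attack_edges, d))                   # 1/1 + 1/(1/9) = 10, up to rounding
print(payoff(attack_vertices) / cost(attack_edges, d))  # ROA, approximately 1
```

Note that the edge with the small surface (1/9) is expensive for the attacker once any budget sits on it, which is the sense in which small surfaces are easy to defend.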
For example, consider defending a hallway with bricks. The wider the hallway (the larger the attack surface), the more bricks (budget allocation) required to build a wall of a certain height (the cost to the attacker). In this formulation, the function mapping the defender's budget allocation to attacker cost is linear, preventing the defender from ever fully defending an edge. Our use of a linear function reflects a level of abstraction more appropriate to a CISO who can never fully defend assets, which we justify by observing that the rate of vulnerability discovery in a particular piece of software is roughly constant [29]. At a lower level of detail, we might replace this function with a step function, indicating that the defender can patch a vulnerability by allocating a threshold amount of budget.

Objective. To evaluate defense strategies, we measure the attacker's incentive for attacking using the return-on-attack (ROA) [8], which we define as follows:

    ROA(a, d) = payoff(a) / cost(a, d).

We use this metric for evaluating defense strategy because we believe that if the defender lowers the ROA sufficiently, the attacker will be discouraged from attacking the system and will find other uses for his or her capital or industry. For example, the attacker might decide to attack another system. Analogous results hold if we quantify the attacker's incentives in terms of profit (e.g., with profit(a, d) = payoff(a) − cost(a, d)), but we focus on ROA for simplicity. A purely rational attacker will mount attacks that maximize ROA. However, a real attacker might not maximize ROA. For example, the attacker might not have complete knowledge of the system or its defense. We strengthen our results by considering all attacks, not just those that maximize ROA.

Proactive Security.
We evaluate our learning-based reactive approach by comparing it against a proactive approach to risk management in which the defender carefully examines the system and constructs a defense in order to fend off future attacks. We strengthen this benchmark by providing the proactive defender complete knowledge about the system, but we require that the defender commit to a fixed strategy. To strengthen our results, we state our main result in terms of all such proactive defenders. In particular, this class of defenders includes the rational proactive defender who employs a defense allocation that minimizes the maximum ROA the attacker can extract from the system: argmin_d max_a ROA(a, d).

3 Case Studies

In this section, we describe instances of our model to build the reader's intuition. These examples illustrate that some familiar security concepts, including perimeter defense and defense in depth, arise naturally as optimal defenses in our model. These defenses can be constructed either by rational proactive defenders or converged to by a learning-based reactive defense.

Fig. 3.1. Attack graph representing a simplified data center network (edges s = Internet → Front End with w:1 and Front End → Database with w:1/9; rewards 1 and 9).

Perimeter Defense. Consider a system in which the attacker's reward is non-zero at exactly one vertex, t. For example, in a medical system, the attacker's reward for obtaining electronic medical records might well dominate the value of other attack targets such as employees' vacation calendars. In such a system, a rational attacker will select the minimum-cost path from the start vertex s to the valuable vertex t. The optimal defense limits the attacker's ROA by maximizing the cost of the minimum s-t path.
The algorithm for constructing this defense is straightforward [7]:

1. Let C be the minimum-weight s-t cut in (V, E, w).
2. Select the following defense: d(e) = B w(e)/Z if e ∈ C, and d(e) = 0 otherwise, where Z = Σ_{e ∈ C} w(e).

Notice that this algorithm constructs a perimeter defense: the defender allocates the entire defense budget to a single cut in the graph. Essentially, the defender spreads the defense budget over the attack surface of the cut. By choosing the minimum-weight cut, the defender is choosing to defend the smallest attack surface that separates the start vertex from the target vertex. Real defenders use similar perimeter defenses, for example, when they install a firewall at the boundary between their organization and the Internet, because the network's perimeter is much smaller than its interior.

Defense in Depth. Many experts in security practice recommend that defenders employ defense in depth. Defense in depth arises naturally in our model as an optimal defense for some systems. Consider, for example, the system depicted in Figure 3.1. This attack graph is a simplified version of the data center network depicted in Figure 1.1. Although the attacker receives the largest reward for compromising the back-end database server, the attacker also receives some reward for compromising the front-end web server. Moreover, the front-end web server has a larger attack surface than the back-end database server because the front-end server exposes a more complex interface (an entire enterprise web application), whereas the database server exposes only a simple SQL interface. Allocating defense budget to the left-most edge represents trying to protect sensitive database information with a complex web application firewall instead of database access control lists (i.e., possible, but economically inefficient).
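The two-step perimeter-defense construction in the Perimeter Defense discussion above can be sketched end-to-end. The max-flow routine (Edmonds-Karp on the surface-weighted graph) and the graph encoding are our own illustrative choices, not prescribed by the paper; the sketch assumes no antiparallel edge pairs.

```python
from collections import deque, defaultdict

def min_cut_defense(edges, s, t, B):
    """Perimeter defense: find a minimum-weight s-t cut C (via Edmonds-Karp
    max-flow with capacities w(e)), then set d(e) = B*w(e)/Z on cut edges.

    edges: dict mapping (u, v) -> attack surface w(e).
    Returns a dict mapping each edge to its defense allocation d(e).
    """
    cap = defaultdict(float)
    adj = defaultdict(set)
    for (u, v), wt in edges.items():
        cap[(u, v)] += wt
        adj[u].add(v)
        adj[v].add(u)  # residual arcs

    def bfs_parents():
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 1e-12:
                    parent[v] = u
                    if v == t:
                        return parent
                    q.append(v)
        return None

    while (parent := bfs_parents()) is not None:
        path, v = [], t                 # reconstruct the augmenting path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        f = min(cap[e] for e in path)   # bottleneck capacity
        for (u, v) in path:
            cap[(u, v)] -= f
            cap[(v, u)] += f

    reach = {s}                         # residual-reachable side of the cut
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in reach and cap[(u, v)] > 1e-12:
                reach.add(v)
                q.append(v)

    cut = [e for e in edges if e[0] in reach and e[1] not in reach]
    Z = sum(edges[e] for e in cut)
    return {e: (B * edges[e] / Z if e in cut else 0.0) for e in edges}
```

On a diamond-shaped example with surfaces 3, 1, 1, 2, the minimum-weight cut is the pair of unit-surface edges, and the budget is split over them in proportion to their (equal) surfaces.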
The optimal defense against a rational attacker is to allocate half of the defense budget to the left-most edge and half of the budget to the right-most edge, limiting the attacker to an ROA of unity. Shifting the entire budget to the right-most edge (i.e., defending only the database) is disastrous because the attacker will simply attack the front-end at zero cost, achieving an unbounded ROA. Shifting the entire budget to the left-most edge is also problematic because the attacker will attack the database (achieving an ROA of 5).

4 Reactive Security

To analyze reactive security, we model the attacker and defender as playing an iterative game, alternating moves. First, the defender selects a defense, and then the attacker selects an attack. We present a learning-based reactive defense strategy that is oblivious to vertex rewards and to edges that have not yet been used in attacks. We prove a theorem bounding the competitive ratio between this reactive strategy and the best proactive defense via a series of reductions to results from the online learning theory literature. Other applications of this literature include managing stock portfolios [26], playing zero-sum games [12], and boosting other machine learning heuristics [11]. Although we provide a few technical extensions, our main contribution comes from applying results from online learning to risk management.

Repeated Game. We formalize the repeated game between the defender and the attacker as follows. In each round t from 1 to T:

1. The defender chooses defense allocation d_t(e) over the edges e ∈ E.
2. The attacker chooses an attack path a_t in G.
3. The path a_t and attack surfaces {w(e) : e ∈ a_t} are revealed to the defender.
4. The attacker pays cost(a_t, d_t) and gains payoff(a_t).
In each round, we let the attacker choose the attack path after the defender commits to the defense allocation because the defender's budget allocation is not a secret (in the sense of a cryptographic key). Following the "no security through obscurity" principle, we make the conservative assumption that the attacker can accurately determine the defender's budget allocation.

Defender Knowledge. Unlike proactive defenders, reactive defenders do not know all of the vulnerabilities that exist in the system in advance. (If defenders had complete knowledge of vulnerabilities, conferences such as Black Hat Briefings would serve little purpose.) Instead, we reveal an edge (and its attack surface) to the defender after the attacker uses the edge in an attack. For example, the defender might monitor the system and learn how the attacker attacked the system by doing a post-mortem analysis of intrusion logs. Formally, we define a reactive defense strategy to be a function from attack sequences {a_i} and the subsystem induced by the edges contained in ∪_i a_i to defense allocations such that d(e) = 0 if edge e ∉ ∪_i a_i. Notice that this requires the defender's strategy to be oblivious to the system beyond the edges used by the attacker.

Algorithm 1. A reactive defense strategy for hidden edges.

- Initialize E_0 = ∅.
- For each round t ∈ {2, ..., T}:
  - Let E_{t−1} = E_{t−2} ∪ E(a_{t−1}).
  - For each e ∈ E_{t−1}, let

        S_{t−1}(e) = S_{t−2}(e) + M(e, a_{t−1})  if e ∈ E_{t−2}, and M(e, a_{t−1}) otherwise;
        P̃_t(e) = β_{t−1}^{S_{t−1}(e)};
        P_t(e) = P̃_t(e) / Σ_{e′ ∈ E_{t−1}} P̃_t(e′),

where M(e, a) = −1[e ∈ a]/w(e) is a matrix with |E| rows and a column for each attack.
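The multiplicative update above can be sketched as executable code. This is our own rendering: the edge identifiers and attack sequences are illustrative, and the β schedule follows the parameter sequence stated with the theorems in this section.

```python
import math

def reactive_allocations(attacks, w, B):
    """Sketch of the multiplicative-update reactive defense (Algorithm 1).

    attacks: attack paths, one list of edge ids per round, revealed online.
    w: dict mapping edge id -> attack surface w(e).
    B: total defense budget.
    Returns the defense allocation d_t used in each round.
    """
    S = {}             # cumulative losses S_{t-1}(e) over revealed edges
    allocations = []
    for t, a in enumerate(attacks, start=1):
        if S:
            # beta_s = (1 + sqrt(2 log|E_s| / (s+1)))^-1, as in the theorems
            beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(S)) / (t + 1)))
            weights = {e: beta ** S[e] for e in S}
            total = sum(weights.values())
            allocations.append({e: B * wt / total for e, wt in weights.items()})
        else:
            allocations.append({})   # nothing revealed yet: no budget deployed
        for e in a:                  # observe the attack; reveal edges, update
            S[e] = S.get(e, 0.0) - 1.0 / w[e]   # M(e, a) = -1[e in a] / w(e)
    return allocations
```

Because S(e) becomes more negative each time e is attacked and β < 1, the weight β^{S(e)} grows with repeated attacks, so budget flows toward recently attacked, small-surface edges.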
Algorithm 1 is a reactive defense strategy based on the multiplicative update learning algorithm [5,12]. The algorithm reinforces edges on the attack path multiplicatively, taking the attack surface into account by allocating more budget to easier-to-defend edges. When new edges are revealed, the algorithm re-allocates budget uniformly from the already-revealed edges to the newly revealed edges. We state the algorithm in terms of a normalized defense allocation P_t(e) = d_t(e)/B. Notice that this algorithm is oblivious to unattacked edges and to the attacker's reward for visiting each vertex. An appropriate setting for the algorithm parameters β_t ∈ [0, 1) will be described below.

The algorithm begins without any knowledge of the graph whatsoever, and so allocates no defense budget to the system. Upon the t-th attack on the system, the algorithm updates E_t to be the set of edges revealed up to this point, and updates S_t(e) to be a weighted count of the number of times e has been used in an attack thus far. For each edge that has ever been revealed, the defense allocation P_{t+1}(e) is chosen to be β_t^{S_t(e)}, normalized to sum to unity over all edges e ∈ E_t. In this way, any edge attacked in round t will have its defense allocation reinforced. The parameter β controls how aggressively the defender reallocates defense budget to recently attacked edges. If β is infinitesimal, the defender will move the entire defense budget to the edge on the most recent attack path with the smallest attack surface. If β is large (close to one), the defender will not be very agile and will, instead, leave the defense budget near the initial allocation. For an appropriate value of β, the algorithm converges to the optimal defense strategy: for instance, the min cut in the example from Section 3.

Theorems.
To compare this reactive defense strategy to all proactive defense strategies, we use the notion of regret from online learning theory. The following is an additive regret bound relating the attacker's profit under reactive and proactive defense strategies.

Theorem 1. The average attacker profit against Algorithm 1 converges to the average attacker profit against the best proactive defense. Formally, if defense allocations {d_t}_{t=1}^T are output by Algorithm 1 with parameter sequence β_s = (1 + sqrt(2 log|E_s|/(s+1)))^{−1} on any system (V, E, w, reward, s) revealed online and any attack sequence {a_t}_{t=1}^T, then

    (1/T) Σ_{t=1}^T profit(a_t, d_t) − (1/T) Σ_{t=1}^T profit(a_t, d*) ≤ B sqrt(log|E| / (2T)) + B (log|E| + avg_e[w(e)^{−1}]) / T,

for all proactive defense strategies d* ∈ D_{B,E}, where avg_e[w(e)^{−1}] = |E|^{−1} Σ_{e ∈ E} w(e)^{−1} is the mean of the surface reciprocals.

Remark 2. We can interpret Theorem 1 as establishing sufficient conditions under which a reactive defense strategy is within an additive constant of the best proactive defense strategy. Instead of carefully analyzing the system to construct the best proactive defense, the defender need only react to attacks in a principled manner to achieve almost the same quality of defense in terms of attacker profit.

Reactive defense strategies can also be competitive with proactive defense strategies when we consider an attacker motivated by return on attack (ROA). The ROA formulation is appealing because (unlike with profit) the objective function does not require measuring attacker cost and defender budget in the same units. The next result considers the competitive ratio between the ROA for a reactive defense strategy and the ROA for the best proactive defense strategy.
Theorem 3. The ROA against Algorithm 1 converges to the ROA against the best proactive defense. Formally, consider the cumulative ROA:

    ROA({a_t}_{t=1}^T, {d_t}_{t=1}^T) = Σ_{t=1}^T payoff(a_t) / Σ_{t=1}^T cost(a_t, d_t).

(We abuse notation slightly and use singleton arguments to represent the corresponding constant sequence.) If defense allocations {d_t}_{t=1}^T are output by Algorithm 1 with parameters β_s = (1 + sqrt(2 log|E_s|/(s+1)))^{−1} on any system (V, E, w, reward, s) revealed online, such that |E| > 1, and any attack sequence {a_t}_{t=1}^T, then for all α > 0 and proactive defense strategies d* ∈ D_{B,E},

    ROA({a_t}_{t=1}^T, {d_t}_{t=1}^T) / ROA({a_t}_{t=1}^T, d*) ≤ 1 + α,

provided T is sufficiently large.¹

Remark 4. Notice that the reactive defender can use the same algorithm regardless of whether the attacker is motivated by profit or by ROA. As discussed in Section 5, the optimal proactive defense is not similarly robust.

¹ To wit: T ≥ (13 sqrt(2) (sqrt(1 + α) − 1)^{−1} Σ_{e ∈ inc(s)} w(e))² log|E|.

We present proofs of these theorems in Appendix A. We first prove the theorems in the simpler setting where the defender knows the entire graph. Second, we remove the hypothesis that the defender knows the edges in advance.

Lower Bounds. In Appendix A, we use a two-vertex, two-edge graph to establish a lower bound on the competitive ratio of the ROA for all reactive strategies. The lower bound shows that the analysis of Algorithm 1 is tight and that Algorithm 1 is optimal given the information available to the algorithm. The proof gives an example where the best proactive defense (slightly) out-performs every reactive strategy, suggesting the benchmark is not unreasonably weak.
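To build intuition for Theorem 3, here is a self-contained toy simulation of our own construction (not an experiment from the paper): two parallel unit-surface edges lead from s to a single target of reward 1, an adaptive attacker always takes the currently cheapest edge, and the defender runs the multiplicative update. The best proactive defense here is an even split, giving the attacker ROA 2/B, and the reactive defender's cumulative ROA ratio against it stays modest and shrinks as T grows.

```python
import math

def simulate(T, B=1.0):
    """Two parallel unit-surface edges to a reward-1 target; adaptive attacker."""
    w = {"x": 1.0, "y": 1.0}
    S, total_cost, total_payoff = {}, 0.0, 0.0
    for t in range(1, T + 1):
        if S:
            beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(S)) / (t + 1)))
            wts = {e: beta ** S[e] for e in S}
            z = sum(wts.values())
            d = {e: B * v / z for e, v in wts.items()}
        else:
            d = {}                       # no edges revealed yet
        # adaptive attacker: single-edge attack with the smallest cost
        a = min(w, key=lambda e: d.get(e, 0.0) / w[e])
        total_cost += d.get(a, 0.0) / w[a]
        total_payoff += 1.0
        S[a] = S.get(a, 0.0) - 1.0 / w[a]  # multiplicative-update bookkeeping
    reactive_roa = total_payoff / total_cost
    proactive_roa = 1.0 / (B / 2.0)      # even split is minimax on this graph
    return reactive_roa / proactive_roa

print(simulate(2000))   # a little above 1, and it shrinks as T grows
```

The per-round attacker cost can never exceed B/2 on this graph, so the ratio is always at least 1; the simulation shows how quickly it approaches 1.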
5 Advantages of Reactivity

In this section, we examine some situations in which a reactive defender out-performs a proactive defender. Proactive defenses hinge on the defender's model of the attacker's incentives. If the defender's model is inaccurate, the defender will construct a proactive defense that is far from optimal. By contrast, a reactive defender need not reason about the attacker's incentives directly. Instead, the reactive defender learns these incentives by observing the attacker in action.

Learning Rewards. One way to model inaccuracies in the defender's estimates of the attacker's incentives is to hide the attacker's rewards from the defender. Without knowledge of the payoffs, a proactive defender has difficulty limiting the attacker's ROA. Consider, for example, the star system whose edges have equal attack surfaces, as depicted in Figure 5.1. Without knowledge of the attacker's rewards, a proactive defender has little choice but to allocate the defense budget equally to each edge (because the edges are indistinguishable). However, if the attacker's reward is concentrated at a single vertex, the competitive ratio for the attacker's ROA (compared to the rational proactive defense) is the number of leaf vertices. (We can, of course, make the ratio worse by adding more vertices.) By contrast, the reactive algorithm we analyze in Section 4 is competitive with the rational proactive defense because the reactive algorithm effectively learns the rewards by observing which attacks the attacker chooses.

Robustness to Objective. Another way to model inaccuracies in the defender's estimates of the attacker's incentives is to assume the defender mistakes which of profit and ROA actually matters to the attacker.
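The star-graph gap described under Learning Rewards above can be checked numerically. The helper names and budget value below are ours; the eight-leaf, reward-8 instance mirrors Figure 5.1, and the gap equals the number of leaves regardless of the budget.

```python
def roa(payoff_value, attack_cost):
    """Return-on-attack: payoff divided by attack cost."""
    return payoff_value / attack_cost

def star_gap(k, B=1.0, r=8.0):
    """Star with k equal-unit-surface edges and reward r hidden at one leaf.

    Returns the ratio between the attacker's ROA against the uninformed
    uniform defense and against the informed (rational proactive) defense.
    """
    uniform_cost = B / k   # uninformed defender spreads B over k edges
    informed_cost = B      # informed defender puts all of B on the valuable edge
    return roa(r, uniform_cost) / roa(r, informed_cost)

print(star_gap(8))   # 8.0: the ratio equals the number of leaf vertices
```

Adding leaves makes the gap arbitrarily large, matching the parenthetical remark above.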
The defense constructed by a rational proactive defender depends crucially on whether the attacker's actual incentives are based on profit or based on ROA, whereas the reactive algorithm we analyze in Section 4 is robust to this variation. In particular, consider the system depicted in Figure 5.2, and assume the defender has a budget of 9. If the defender believes the attacker is motivated by profit, the rational proactive defense is to allocate the entire defense budget to the right-most edge (making the profit 1 on both edges). However, this defense is disastrous when viewed in terms of ROA because the ROA for the left edge is infinite (as opposed to near unity when the proactive defender optimizes for ROA).

Fig. 5.1. Star-shaped attack graph with rewards concentrated in an unknown vertex (eight unit-surface edges from s; one leaf has reward 8, the others reward 0).

Fig. 5.2. An attack graph that separates the minimax strategies optimizing ROA and attacker profit (start vertex s at the Internet; a unit-surface edge to a Satellite Office with reward 1 and a unit-surface edge to Headquarters with reward 10).

Catachresis. The defense constructed by the rational proactive defender is optimized for a rational attacker. If the attacker is not perfectly rational, there is room for out-performing the rational proactive defense. There are a number of situations in which the attacker might not mount optimal attacks:

- The attacker might not have complete knowledge of the attack graph. Consider, for example, a software vendor who discovers five equally severe vulnerabilities in one of their products via fuzzing. According to proactive security, the defender ought to dedicate equal resources to repairing these five vulnerabilities.
However, a reactive defender might dedicate more resources to fixing a vulnerability actually exploited by attackers in the wild. We can model these situations by making the attacker oblivious to some edges.
- The attacker might not have complete knowledge of the defense allocation. For example, an attacker attempting to invade a corporate network might target computers in human resources without realizing that attacking the customer relationship management database in sales has a higher return-on-attack because the database is lightly defended. By observing attacks, the reactive strategy learns a defense tuned for the actual attacker, causing the attacker to receive a lower ROA.

6 Generalizations

Horn Clauses. Thus far, we have presented our results using a graph-based system model. Our results extend, however, to a more general system model based on Horn clauses. Datalog programs, which are based on Horn clauses, have been used in previous work to represent vulnerability-level attack graphs [27]. A Horn clause is a statement in propositional logic of the form $p_1 \wedge p_2 \wedge \cdots \wedge p_n \rightarrow q$. The propositions $p_1, p_2, \ldots, p_n$ are called the antecedents, and $q$ is called the consequent. The set of antecedents might be empty, in which case the clause simply asserts the consequent. Notice that Horn clauses are negation-free. In some sense, a Horn clause represents an edge in a hypergraph where multiple pre-conditions are required before taking a certain state transition. In the Horn model, a system consists of a set of Horn clauses, an attack surface for each clause, and a reward for each proposition. The defender allocates defense budget among the Horn clauses.
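To make the Horn-clause model concrete, here is a minimal sketch (ours, not the paper's; all names are illustrative) of a system as data, together with the validity check for an attacker's proof, i.e., an ordered list of clauses in which every antecedent was established by an earlier clause:

```python
# Illustrative sketch of the Horn-clause system model: each clause has
# antecedents, a consequent, and an attack surface w(c); propositions
# carry rewards. Clause and proposition names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Clause:
    antecedents: frozenset   # propositions required before the clause fires
    consequent: str          # proposition established by the clause
    surface: float           # attack surface w(c)

def is_valid_proof(proof):
    """A valid proof is an ordered list of clauses in which every
    antecedent appears as the consequent of an earlier clause."""
    proved = set()
    for clause in proof:
        if not clause.antecedents <= proved:
            return False
        proved.add(clause.consequent)
    return True

# An antecedent-free clause asserts its consequent outright.
root = Clause(frozenset(), "foothold", surface=1.0)
pivot = Clause(frozenset({"foothold"}), "database", surface=2.0)

assert is_valid_proof([root, pivot])       # foothold proved before use
assert not is_valid_proof([pivot, root])   # antecedent not yet proved
```

The set of propositions proved by a valid proof is simply the set of consequents it establishes, which is what the payoff sums over.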
To mount an attack, the attacker selects a valid proof: an ordered list of clauses such that each antecedent appears as a consequent of a clause earlier in the list. For a given proof $\Pi$,

$$\mathrm{cost}(\Pi, d) = \sum_{c \in \Pi} d(c)/w(c), \qquad \mathrm{payoff}(\Pi) = \sum_{p \in [\![\Pi]\!]} \mathrm{reward}(p),$$

where $[\![\Pi]\!]$ is the set of propositions proved by $\Pi$ (i.e., those propositions that appear as consequents in $\Pi$). Profit and ROA are computed as before. Our results generalize to this model directly. Essentially, we need only replace each instance of the word "edge" with "Horn clause" and "path" with "valid proof." For example, the rows of the matrix $M$ used throughout the proof become the Horn clauses, and the columns become the valid proofs (which are numerous, but no matter). The entries of the matrix become $M(c, \Pi) = 1/w(c)$, analogous to the graph case. The one non-obvious substitution is $\mathrm{inc}(s)$, which becomes the set of clauses that lack antecedents.

Multiple Attackers. We have focused on a security game between a single attacker and a defender. In practice, a security system might be attacked by several uncoordinated attackers, each with different information and different objectives. Fortunately, we can show that a model with multiple attackers is mathematically equivalent to a model with a single attacker with a randomized strategy: use the set of attacks, one per attacker, to define a distribution over edges where the probability of an edge is linearly proportional to the number of attacks which use the edge. This precludes the interpretation of an attack as an $s$-rooted path, but our proofs do not rely upon this interpretation, and our results hold in such a model with appropriate modifications.

Adaptive Proactive Defenders.
A simple application of an online learning result [18], omitted due to space constraints, modifies our regret bounds for a proactive defender who re-allocates budget a fixed number of times. In this model, our results remain qualitatively the same.

7 Related Work

Anderson [1] and Varian [31] informally discuss (via anecdotes) how the design of information security must take incentives into account. August and Tunca [2] compare various ways to incentivize users to patch their systems in a setting where the users are more susceptible to attacks if their neighbors do not patch.

Gordon and Loeb [15] and Hausken [17] analyze the costs and benefits of security in an economic model (with non-strategic attackers) where the probability of a successful exploit is a function of the defense investment. They use this model to compute the optimal level of investment. Varian [30] studies various (single-shot) security games and identifies how much agents invest in security at equilibrium. Grossklags [16] extends this model by letting agents self-insure. Miura et al. [24] study externalities that appear due to users having the same password across various websites and discuss Pareto-improving security investments. Miura and Bambos [25] rank vulnerabilities according to a random-attacker model. Skybox and RedSeal offer practical systems that help enterprises prioritize vulnerabilities based on a random-attacker model. Kumar et al. [22] investigate optimal security architectures for a multi-division enterprise, taking into account losses due to lack of availability and confidentiality. None of the above papers explicitly model a truly adversarial attacker.

Fultz [14] generalizes [16] by modeling attackers explicitly. Cavusoglu et al. [4] highlight the importance of using a game-theoretic model over a decision-theoretic model due to the presence of adversarial attackers. However, these models look at idealized settings that are not generically applicable. Lye and Wing [23] study the Nash equilibrium of a single-shot game between an attacker and a defender that models a particular enterprise security scenario. Arguably, this model is most similar to ours in terms of abstraction level. However, calculating the Nash equilibrium requires detailed knowledge of the adversary's incentives, which, as discussed in the introduction, might not be readily available to the defender. Moreover, their game contains multiple equilibria, weakening their prescriptions.

8 Conclusions

Many security experts equate reactive security with myopic bug-chasing and ignore principled reactive strategies when they recommend adopting a proactive approach to risk management. In this paper, we establish sufficient conditions for a learning-based reactive strategy to be competitive with the best fixed proactive defense. Additionally, we show that reactive defenders can out-perform proactive defenders when the proactive defender defends against attacks that never actually occur. Although our model is an abstraction of the complex interplay between attackers and defenders, our results support the following practical advice for CISOs making security investments:

- Employ monitoring tools that let you detect and analyze attacks against your enterprise. These tools help focus your efforts on thwarting real attacks.
- Make your security organization more agile. For example, build a rigorous testing lab that lets you roll out security patches quickly once you detect that attackers are exploiting these vulnerabilities.
- When determining how to expend your security budget, avoid overreacting to the most recent attack. Instead, consider all previous attacks, but discount the importance of past attacks exponentially.

In some situations, proactive security can out-perform reactive security. For example, reactive approaches are ill-suited for defending against catastrophic attacks because there is no next round in which the defender can use information learned from the attack. We hope our results will lead to a productive discussion of the limitations of our model and the validity of our conclusions. Instead of assuming that proactive security is always superior to reactive security, we invite the reader to consider when a reactive approach might be appropriate. For the parts of an enterprise where the defender's budget is liquid and there are no catastrophic losses, a carefully constructed reactive strategy can be as effective as the best proactive defense in the worst case and significantly better in the best case.

Acknowledgments. We would like to thank Elie Bursztein, Eu-Jin Goh, and Matt Finifter for their thoughtful comments and helpful feedback. We gratefully acknowledge the support of the NSF through the TRUST Science and Technology Center and grants DMS-0707060, CCF-0424422, 0311808, 0448452, and 0627511, and the support of the AFOSR through the MURI Program, and the support of the Siebel Scholars Foundation.

References

1. Anderson, R.: Why information security is hard: an economic perspective. In: 17th Annual Computer Security Applications Conference, pp. 358-365 (2001)
2. August, T., Tunca, T.I.: Network software security and user incentives. Management Science 52(11), 1703-1720 (2006)
3.
Beard, C.: Introducing Test Pilot (March 2008), http://labs.mozilla.com/2008/03/introducing-test-pilot/
4. Cavusoglu, H., Raghunathan, S., Yue, W.: Decision-theoretic and game-theoretic approaches to IT security investment. Journal of Management Information Systems 25(2), 281-304 (2008)
5. Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D.P., Schapire, R.E., Warmuth, M.K.: How to use expert advice. Journal of the Association for Computing Machinery 44(3), 427-485 (May 1997)
6. Cesa-Bianchi, N., Freund, Y., Helmbold, D.P., Haussler, D., Schapire, R.E., Warmuth, M.K.: How to use expert advice. In: Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pp. 382-391. ACM, New York, NY, USA (1993)
7. Chakrabarty, D., Mehta, A., Vazirani, V.V.: Design is as easy as optimization. In: 33rd International Colloquium on Automata, Languages and Programming (ICALP), LNCS 4051, vol. Part I, pp. 477-488 (2006)
8. Cremonini, M.: Evaluating information security investments from attackers perspective: the return-on-attack (ROA). In: Fourth Workshop on the Economics of Information Security (2005)
9. Fisher, D.: Multi-process architecture (July 2008), http://dev.chromium.org/developers/design-documents/multi-process-architecture
10. Franklin, J., Paxson, V., Perrig, A., Savage, S.: An inquiry into the nature and causes of the wealth of internet miscreants. In: Proceedings of the 2007 ACM Conference on Computer and Communications Security, pp. 375-388. ACM, New York, NY, USA (2007)
11. Freund, Y., Schapire, R.: A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence 14(5), 771-780 (1999)
12. Freund, Y., Schapire, R.E.: Adaptive game playing using multiplicative weights.
Games and Economic Behavior 29, 79-103 (1999)
13. Friedberg, J.: Internet fraud battlefield (April 2007), http://www.ftc.gov/bcp/workshops/proofpositive/Battlefield_Overview.pdf
14. Fultz, N., Grossklags, J.: Blue versus Red: Towards a model of distributed security attacks. In: Proceedings of the Thirteenth International Conference on Financial Cryptography and Data Security (February 2009)
15. Gordon, L.A., Loeb, M.P.: The economics of information security investment. ACM Transactions on Information and System Security 5(4), 438-457 (2002)
16. Grossklags, J., Christin, N., Chuang, J.: Secure or insure? A game-theoretic analysis of information security games. In: Proceedings of the 17th International Conference on World Wide Web, pp. 209-218. ACM, New York, NY, USA (2008)
17. Hausken, K.: Returns to information security investment: The effect of alternative information security breach functions on optimal investment and sensitivity to vulnerability. Information Systems Frontiers 8(5), 338-349 (2006)
18. Herbster, M., Warmuth, M.K.: Tracking the best expert. Machine Learning 32(2), 151-178 (1998)
19. Howard, M.: Attack surface: Mitigate security risks by minimizing the code you expose to untrusted users. MSDN Magazine (November 2004), http://msdn.microsoft.com/en-us/magazine/cc163882.aspx
20. Kanich, C., Kreibich, C., Levchenko, K., Enright, B., Voelker, G.M., Paxson, V., Savage, S.: Spamalytics: An empirical analysis of spam marketing conversion. In: Proceedings of the 2008 ACM Conference on Computer and Communications Security, pp. 3-14. ACM, New York, NY, USA (2008)
21. Kark, K., Penn, J., Dill, A.: 2008 CISO priorities: The right objectives but the wrong focus. Le Magazine de la Sécurité Informatique (April 2009)
22. Kumar, V., Telang, R., Mukhopadhyay, T.: Optimal information security architecture for the enterprise, http://ssrn.com/abstract=1086690
23. Lye, K.W., Wing, J.M.: Game strategies in network security. In: Proceedings of the Foundations of Computer Security Workshop, pp. 13-22 (2002)
24. Miura-Ko, R.A., Yolken, B., Mitchell, J., Bambos, N.: Security decision-making among interdependent organizations. In: Proceedings of the 21st IEEE Computer Security Foundations Symposium, pp. 66-80. IEEE Computer Society, Washington, DC, USA (2008)
25. Miura-Ko, R., Bambos, N.: SecureRank: A risk-based vulnerability management scheme for computing infrastructures. In: Proceedings of the IEEE International Conference on Communications, pp. 1455-1460 (June 2007)
26. Ordentlich, E., Cover, T.M.: The cost of achieving the best portfolio in hindsight. Mathematics of Operations Research 23(4), 960-982 (1998)
27. Ou, X., Boyer, W.F., McQueen, M.A.: A scalable approach to attack graph generation. In: Proceedings of the 13th ACM Conference on Computer and Communications Security, pp. 336-345 (2006)
28. Pironti, J.P.: Key elements of an information security program. Information Systems Control Journal 1 (2005)
29. Rescorla, E.: Is finding security holes a good idea? IEEE Security and Privacy 3(1), 14-19 (2005)
30. Varian, H.: System reliability and free riding (2001)
31. Varian, H.R.: Managing online security risks. New York Times, June 1, 2000
32. Warner, B.: Home PCs rented out in sabotage-for-hire racket. Reuters (July 2004)

Algorithm 2. Reactive defense strategy for known edges using the multiplicative update algorithm.
- For each $e \in E$, initialize $P_1(e) = 1/|E|$.
- For each round $t \in \{2, \ldots, T\}$ and $e \in E$, let
$$P_t(e) = P_{t-1}(e) \cdot \beta^{M(e, a_{t-1})} / Z_t \quad \text{where} \quad Z_t = \sum_{e' \in E} P_{t-1}(e')\, \beta^{M(e', a_{t-1})}.$$

A Proofs

We now describe a series of reductions that establish the main results. First, we prove Theorem 1 in the simpler setting where the defender knows the entire graph. Second, we remove the hypothesis that the defender knows the edges in advance. Finally, we extend our results to ROA.

Profit (Known Edges). Suppose that the reactive defender is granted full knowledge of the system $(V, E, w, \mathrm{reward}, s)$ from the outset. Specifically, the graph, attack surfaces, and rewards are all revealed to the defender prior to the first round. Algorithm 2 is a reactive defense strategy that makes use of this additional knowledge.

Lemma 5. If defense allocations $\{d_t\}_{t=1}^T$ are output by Algorithm 2 with parameter $\beta = \left(1 + \sqrt{2 \log |E| / T}\right)^{-1}$ on any system $(V, E, w, \mathrm{reward}, s)$ and attack sequence $\{a_t\}_{t=1}^T$, then
$$\frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d_t) - \frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d^\star) \le B \sqrt{\frac{\log |E|}{2T}} + \frac{B \log |E|}{T},$$
for all proactive defense strategies $d^\star \in \mathcal{D}_{B,E}$.

The lemma's proof is a reduction to the following regret bound from online learning [12, Corollary 4].

Theorem 6. If the multiplicative update algorithm (Algorithm 2) is run with any game matrix $M$ with elements in $[0,1]$ and parameter $\beta = \left(1 + \sqrt{2 \log |E| / T}\right)^{-1}$, then
$$\frac{1}{T} \sum_{t=1}^T M(P_t, a_t) - \min_{P^\star \ge 0 :\, \sum_{e \in E} P^\star(e) = 1} \left\{ \frac{1}{T} \sum_{t=1}^T M(P^\star, a_t) \right\} \le \sqrt{\frac{\log |E|}{2T}} + \frac{\log |E|}{T}.$$

Proof (of Lemma 5). Due to the normalization by $Z_t$, the sequence of defense allocations $\{P_t\}_{t=1}^T$ output by Algorithm 2 is invariant to adding a constant to all elements of matrix $M$.
Let $M'$ be the matrix obtained by adding constant $C$ to all entries of an arbitrary game matrix $M$, and let sequences $\{P_t\}_{t=1}^T$ and $\{P'_t\}_{t=1}^T$ be obtained by running multiplicative update with matrices $M$ and $M'$, respectively. Then, for all $e \in E$ and $t \in [T-1]$,
$$P'_{t+1}(e) = \frac{P_1(e)\, \beta^{\sum_{i=1}^t M'(e, a_i)}}{\sum_{e' \in E} P_1(e')\, \beta^{\sum_{i=1}^t M'(e', a_i)}} = \frac{P_1(e)\, \beta^{\left(\sum_{i=1}^t M(e, a_i)\right) + tC}}{\sum_{e' \in E} P_1(e')\, \beta^{\left(\sum_{i=1}^t M(e', a_i)\right) + tC}} = \frac{P_1(e)\, \beta^{\sum_{i=1}^t M(e, a_i)}}{\sum_{e' \in E} P_1(e')\, \beta^{\sum_{i=1}^t M(e', a_i)}} = P_{t+1}(e).$$

In particular, Algorithm 2 produces the same defense allocation sequence as if the game matrix elements were increased by one to
$$M'(e, a) = \begin{cases} 1 - 1/w(e) & \text{if } e \in a \\ 1 & \text{otherwise.} \end{cases}$$

Because this new matrix has entries in $[0,1]$, we can apply Theorem 6 to prove for the original matrix $M$ that
$$\frac{1}{T} \sum_{t=1}^T M(P_t, a_t) - \min_{P^\star \in \mathcal{D}_{1,E}} \left\{ \frac{1}{T} \sum_{t=1}^T M(P^\star, a_t) \right\} \le \sqrt{\frac{\log |E|}{2T}} + \frac{\log |E|}{T}. \tag{A.1}$$

Now, by definition of the original game matrix,
$$M(P_t, a_t) = \sum_{e \in E} -\left(P_t(e)/w(e)\right) \cdot \mathbf{1}[e \in a_t] = -\sum_{e \in a_t} P_t(e)/w(e) = -B^{-1} \sum_{e \in a_t} d_t(e)/w(e) = -B^{-1}\, \mathrm{cost}(a_t, d_t).$$

Thus Inequality (A.1) is equivalent to
$$-\frac{1}{T} \sum_{t=1}^T B^{-1}\, \mathrm{cost}(a_t, d_t) - \min_{d^\star \in \mathcal{D}_{1,E}} \left\{ -\frac{1}{T} \sum_{t=1}^T B^{-1}\, \mathrm{cost}(a_t, d^\star) \right\} \le \sqrt{\frac{\log |E|}{2T}} + \frac{\log |E|}{T}.$$

Simple algebraic manipulation yields
$$\frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d_t) - \min_{d^\star \in \mathcal{D}_{B,E}} \left\{ \frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d^\star) \right\} = \frac{1}{T} \sum_{t=1}^T \left(\mathrm{payoff}(a_t) - \mathrm{cost}(a_t, d_t)\right) - \min_{d^\star \in \mathcal{D}_{B,E}} \left\{ \frac{1}{T} \sum_{t=1}^T \left(\mathrm{payoff}(a_t) - \mathrm{cost}(a_t, d^\star)\right) \right\} = \frac{1}{T} \sum_{t=1}^T \left(-\mathrm{cost}(a_t, d_t)\right) - \min_{d^\star \in \mathcal{D}_{B,E}} \left\{ \frac{1}{T} \sum_{t=1}^T \left(-\mathrm{cost}(a_t, d^\star)\right) \right\} \le B \sqrt{\frac{\log |E|}{2T}} + \frac{B \log |E|}{T}.$$

Profit (Hidden Edges). The standard algorithms in online learning assume that the rows of the matrix are known in advance.
Here, the edges are not known in advance, and we must relax this assumption using a simulation argument, which is perhaps the least obvious part of the reduction. The defense allocation chosen by Algorithm 1 at time $t$ is precisely the same as the defense allocation that would have been chosen by Algorithm 2 had the defender run Algorithm 2 on the currently visible subgraph. The following lemma formalizes this equivalence. Note that Algorithm 1's parameter is reactive: it corresponds to Algorithm 2's parameter, but for the subgraph induced by the edges revealed so far. That is, $\beta_t$ depends only on edges visible to the defender in round $t$, letting the defender actually run the algorithm.

Lemma 7. Consider an arbitrary round $t \in [T]$. If Algorithms 1 and 2 are run with parameters $\beta_s = \left(1 + \sqrt{2 \log |E_s| / (s+1)}\right)^{-1}$ for $s \in [t]$ and parameter $\beta = \left(1 + \sqrt{2 \log |E_t| / (t+1)}\right)^{-1}$, respectively, with the latter run on the subgraph induced by $E_t$, then the defense allocations $P_{t+1}(e)$ output by the algorithms are identical for all $e \in E_t$.

Proof. If $e \in E_t$ then $\tilde{P}_{t+1}(e) = \beta^{\sum_{i=1}^t M(e, a_i)}$ because $\beta_t = \beta$, and the round-$(t+1)$ defense allocation $P_{t+1}$ of Algorithm 1 is simply $\tilde{P}_{t+1}$ normalized to sum to unity over edge set $E_t$, which is exactly the defense allocation output by Algorithm 2.

Armed with this correspondence, we show that Algorithm 1 is almost as effective as Algorithm 2. In other words, hiding unattacked edges from the defender does not cause much harm to the reactive defender's ability to disincentivize the attacker.
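As a concrete illustration of the known-edges strategy, the following sketch (ours; the toy system and variable names are illustrative) implements Algorithm 2 using the shifted game matrix with entries in $[0,1]$ from the proof of Lemma 5, which by shift invariance yields the same allocations:

```python
# Sketch of Algorithm 2 (multiplicative update over known edges).
# Uses the shifted loss M'(e, a) = 1 - 1/w(e) if e in a, else 1,
# which is equivalent to the original matrix by shift invariance.
import math

def reactive_defense(edges, surfaces, attacks, budget):
    """edges: list of edge ids; surfaces: dict edge -> w(e);
    attacks: list of attacks (each a set of edges); budget: B.
    Returns the allocation d_t = B * P_t played in each round."""
    T, E = len(attacks), len(edges)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(E) / T))
    P = {e: 1.0 / E for e in edges}              # P_1(e) = 1/|E|
    allocations = []
    for attack in attacks:
        allocations.append({e: budget * P[e] for e in edges})
        # P_{t+1}(e) = P_t(e) * beta^{M'(e, a_t)} / Z_{t+1}
        W = {e: P[e] * beta ** ((1 - 1 / surfaces[e]) if e in attack else 1)
             for e in edges}
        Z = sum(W.values())
        P = {e: W[e] / Z for e in edges}
    return allocations

# Toy run: edge "a" is attacked every round, so (with beta < 1) its
# share of the defense budget grows over time.
allocs = reactive_defense(["a", "b"], {"a": 1.0, "b": 1.0},
                          [{"a"}] * 20, budget=1.0)
assert allocs[-1]["a"] > allocs[0]["a"]
```

Because $\beta < 1$, edges that appear in attacks receive a smaller exponent and hence a relatively larger weight, so budget migrates toward the edges the attacker actually uses.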
Lemma 8. If defense allocations $\{d_{1,t}\}_{t=1}^T$ and $\{d_{2,t}\}_{t=1}^T$ are output by Algorithms 1 and 2 with parameters $\beta_t = \left(1 + \sqrt{2 \log |E_t| / (t+1)}\right)^{-1}$ for $t \in [T-1]$ and $\beta = \left(1 + \sqrt{2 \log |E| / T}\right)^{-1}$, respectively, on a system $(V, E, w, \mathrm{reward}, s)$ and attack sequence $\{a_t\}_{t=1}^T$, then
$$\frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d_{1,t}) - \frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d_{2,t}) \le \frac{B}{T}\, w^{-1}.$$

Proof. Consider attack $a_t$ from a round $t \in [T]$ and consider an edge $e \in a_t$. If $e \in a_s$ for some $s < t$, then the defense budget allocated to $e$ at time $t$ by Algorithm 2 cannot be greater than the budget allocated by Algorithm 1. Thus, the instantaneous cost paid by the attacker on $e$ when Algorithm 1 defends is at least the cost paid when Algorithm 2 defends: $d_{1,t}(e)/w(e) \ge d_{2,t}(e)/w(e)$. If $e \notin \bigcup_{s=1}^{t-1} a_s$, then $d_{1,s}(e) = 0$ for all $s \in [t]$, by definition. The sequence $\{d_{2,s}(e)\}_{s=1}^{t-1}$ is decreasing and positive, so the discrepancy in instantaneous cost on such an edge is at most $d_{2,1}(e)/w(e)$; each edge contributes such a discrepancy only in the first round in which it is attacked, so the total discrepancy over all $T$ rounds is at most $\sum_{e \in E} d_{2,1}(e)/w(e) \le w^{-1} \sum_{e \in E} d_{2,1}(e) = B w^{-1}$, and averaging over the $T$ rounds yields the stated bound.

Lemma 9. For any system $(V, E, w, \mathrm{reward}, s)$, the value $V = \max_{d \in \mathcal{D}_{B,E}} \min_a \mathrm{cost}(a, d)$ of the cost game satisfies $V = B \big/ \sum_{e \in \mathrm{inc}(s)} w(e) > 0$, where $\mathrm{inc}(v) \subseteq E$ denotes the edges incident to vertex $v$.

Proof. Let $d^\star = \mathrm{argmax}_{d \in \mathcal{D}_{B,E}} \min_a \mathrm{cost}(a, d)$ witness the game's value $V$; then $\max_{d \in \mathcal{D}_{B,E}} \sum_{t=1}^T \mathrm{cost}(a_t, d) \ge \sum_{t=1}^T \mathrm{cost}(a_t, d^\star) \ge TV$. Consider the defensive allocation for each $e \in E$: if $e \in \mathrm{inc}(s)$, let $\tilde{d}(e) = B w(e) \big/ \sum_{e' \in \mathrm{inc}(s)} w(e') > 0$, and otherwise $\tilde{d}(e) = 0$. This allocation is feasible because
$$\sum_{e \in E} \tilde{d}(e) = B\, \frac{\sum_{e \in \mathrm{inc}(s)} w(e)}{\sum_{e \in \mathrm{inc}(s)} w(e)} = B.$$
By definition, $\tilde{d}(e)/w(e) = B \big/ \sum_{e' \in \mathrm{inc}(s)} w(e')$ for each edge $e$ incident to $s$. Therefore, $\mathrm{cost}(a, \tilde{d}) \ge B \big/ \sum_{e \in \mathrm{inc}(s)} w(e)$ for any non-trivial attack $a$, which necessarily includes at least one $s$-incident edge. Finally, $V \ge \min_a \mathrm{cost}(a, \tilde{d})$ proves
$$V \ge B \Big/ \sum_{e \in \mathrm{inc}(s)} w(e). \tag{A.2}$$
Now, consider a defense allocation $d$ and fix an attack $a$ that minimizes the total attacker cost under $d$.
At most one edge $e \in a$ can have $d(e) > 0$, for otherwise the cost under $d$ could be reduced by removing an edge from $a$. Moreover, any attack $a \in \mathrm{argmin}_{e \in \mathrm{inc}(s)} d(e)/w(e)$ minimizes attacker cost under $d$. Thus the maximin $V$ is witnessed by defense allocations that maximize $\min_{e \in \mathrm{inc}(s)} d(e)/w(e)$. This maximization is achieved by allocation $\tilde{d}$, and so Inequality (A.2) is an equality.

We are now ready to prove the main ROA theorem.

Proof (of Theorem 3). First, observe that for all $B > 0$ and all $A, C \in \mathbb{R}$,
$$\frac{A}{B} \le C \iff A - B \le (C - 1) B. \tag{A.3}$$
We will use this equivalence to convert the regret bound on profit to the desired bound on ROA. Together, Theorem 1 and Lemma 9 imply
$$\alpha \sum_{t=1}^T \mathrm{cost}(a_t, d_t) \ge \alpha \max_{d^\star \in \mathcal{D}_{B,E}} \sum_{t=1}^T \mathrm{cost}(a_t, d^\star) - \frac{\alpha B}{2} \sqrt{T \log |E|} - \alpha B \left(\log |E| + w^{-1}\right) \ge \alpha V T - \frac{\alpha B}{2} \sqrt{T \log |E|} - \alpha B \left(\log |E| + w^{-1}\right),$$
where $V = \max_{d \in \mathcal{D}_{B,E}} \min_a \mathrm{cost}(a, d) > 0$. If
$$\sqrt{T} \ge 13 \sqrt{2}\, \left(1 + \alpha^{-1}\right) \sqrt{\log |E|} \sum_{e \in \mathrm{inc}(s)} w(e),$$
we can use the inequalities $V = B \big/ \sum_{e \in \mathrm{inc}(s)} w(e)$, $w^{-1} \le 2 \log |E|$ (since $|E| > 1$), and $\left(\sum_{e \in \mathrm{inc}(s)} w(e)\right)^{-1} \le 1$ to show
$$\sqrt{T} \ge \left( (1 + \alpha) B + \sqrt{\left[(1 + \alpha) B + 24 \alpha V\right] (1 + \alpha) B} \right) \left(2 \sqrt{2}\, \alpha V\right)^{-1} \sqrt{\log |E|},$$
which combines with Theorem 1 and Inequality (A.4) to imply
$$\alpha \sum_{t=1}^T \mathrm{cost}(a_t, d_t) \ge \alpha V T - \frac{\alpha B}{2} \sqrt{T \log |E|} - \alpha B \left(\log |E| + w^{-1}\right) \ge \frac{B}{2} \sqrt{T \log |E|} + B \left(\log |E| + w^{-1}\right) \ge \sum_{t=1}^T \mathrm{profit}(a_t, d_t) - \min_{d^\star \in \mathcal{D}_{B,E}} \sum_{t=1}^T \mathrm{profit}(a_t, d^\star) = \sum_{t=1}^T \left(-\mathrm{cost}(a_t, d_t)\right) - \min_{d^\star \in \mathcal{D}_{B,E}} \sum_{t=1}^T \left(-\mathrm{cost}(a_t, d^\star)\right) = \max_{d^\star \in \mathcal{D}_{B,E}} \sum_{t=1}^T \mathrm{cost}(a_t, d^\star) - \sum_{t=1}^T \mathrm{cost}(a_t, d_t).$$
Finally, combining this with Equivalence (A.3) yields the result
$$\frac{\mathrm{ROA}\left(\{a_t\}_{t=1}^T, \{d_t\}_{t=1}^T\right)}{\min_{d^\star \in \mathcal{D}_{B,E}} \mathrm{ROA}\left(\{a_t\}_{t=1}^T, d^\star\right)} = \frac{\sum_{t=1}^T \mathrm{payoff}(a_t)}{\sum_{t=1}^T \mathrm{cost}(a_t, d_t)} \cdot \max_{d^\star \in \mathcal{D}_{B,E}} \frac{\sum_{t=1}^T \mathrm{cost}(a_t, d^\star)}{\sum_{t=1}^T \mathrm{payoff}(a_t)} = \max_{d^\star \in \mathcal{D}_{B,E}} \frac{\sum_{t=1}^T \mathrm{cost}(a_t, d^\star)}{\sum_{t=1}^T \mathrm{cost}(a_t, d_t)} \le 1 + \alpha.$$

B Lower Bounds

We briefly argue the optimality of Algorithm 1 for a particular graph, i.e., we show that Algorithm 1 has optimal convergence time for small enough $\alpha$, up to constants. (For very large $\alpha$, Algorithm 1 converges in constant time and therefore is optimal up to constants, vacuously.) The argument considers an attacker who randomly selects an attack path, rendering knowledge of past attacks useless. Consider a two-vertex graph where the start vertex $s$ is connected to a vertex $r$ (with reward 1) by two parallel edges $e_1$ and $e_2$, each with an attack surface of 1. Further suppose that the defense budget $B = 1$. We first show a lower bound on all reactive algorithms.

Lemma 10. For all reactive algorithms $A$, the competitive ratio $C$ is at least $(x + \Omega(\sqrt{T}))/x$, i.e., at least $(T + \Omega(\sqrt{T}))/T$ because $x \le T$.

Proof. Consider the following random attack sequence: for each round, select an attack path uniformly i.i.d. from the set $\{e_1, e_2\}$. A reactive strategy must commit to a defense in every round without knowledge of the attack, and therefore every strategy that expends the entire budget of 1 inflicts an expected cost of $1/2$ in every round. Thus, every reactive strategy inflicts a total expected cost of (at most) $T/2$, where the expectation is over the coin tosses of the random attack process. Given an attack sequence, however, there exists a proactive defense allocation with better performance. We can think of the proactive defender as being prescient as to which edge ($e_1$ or $e_2$) will be attacked most frequently and allocating the entire defense budget to that edge.
It is well known (for instance, via an analysis of a one-dimensional random walk) that in such a random process, one of the edges will occur $\Omega(\sqrt{T})$ more often than the other, in expectation. By the probabilistic method, a property that is true in expectation must hold existentially, and, therefore, for every reactive strategy $A$, there exists an attack sequence such that $A$ inflicts a cost $x$, whereas the best proactive strategy (in retrospect) inflicts a cost $x + \Omega(\sqrt{T})$. Because the payoff of each attack is 1, the total reward in either case is $T$. The prescient proactive defender, therefore, yields an ROA of $T/(x + \Omega(\sqrt{T}))$, but the reactive algorithm yields an ROA of $T/x$, establishing the lemma.

Given this lemma, we show that Algorithm 1 is optimal given the information available. In this case, $n = 2$ and, ignoring constants from Theorem 3, we are trying to match a convergence time $T$ of at most $(1 + \alpha^{-1})^2$, which is approximately $\alpha^{-2}$ for small $\alpha$. For large enough $T$, there exists a constant $c$ such that $C \ge (T + c\sqrt{T})/T$. By easy algebra, $(T + c\sqrt{T})/T \ge 1 + \alpha$ whenever $T \le c^2/\alpha^2$, concluding the argument. We can generalize the above optimality argument to $n > 2$ using the combinatorial Lemma 3.2.1 from [6]. Specifically, we can show that for every $n$, there is an $n$-edge graph for which Algorithm 1 is optimal up to constants for small enough $\alpha$.
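The $\Omega(\sqrt{T})$ gap in this argument can be checked empirically. The sketch below (ours, purely illustrative) estimates the prescient proactive defender's advantage over the $T/2$ expected cost of any reactive strategy on the two-edge graph:

```python
# Illustrative simulation of the Lemma 10 lower bound: for T fair coin
# flips, estimate E[max(#e1, #e2)] - T/2, the extra cost the prescient
# proactive defender (whole budget on the majority edge) inflicts beyond
# the T/2 expected cost of any full-budget reactive strategy.
import random

def expected_gap(T, trials=2000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        e1 = sum(rng.random() < 0.5 for _ in range(T))  # rounds attacking e1
        total += max(e1, T - e1)                        # majority-edge count
    return total / trials - T / 2

# Consistent with an Omega(sqrt(T)) gap: the advantage keeps growing with T.
g100, g400 = expected_gap(100), expected_gap(400)
assert g400 > g100 > 0
```

For a one-dimensional random walk this gap is $\mathrm{E}[|S_T|]/2 = \Theta(\sqrt{T})$, so the estimate roughly doubles when $T$ quadruples.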
