A Learning-Based Approach to Reactive Security


Adam Barth (1), Benjamin I. P. Rubinstein (1), Mukund Sundararajan (3), John C. Mitchell (4), Dawn Song (1), and Peter L. Bartlett (1,2)

(1) Computer Science Division, UC Berkeley; (2) Department of Statistics, UC Berkeley; (3) Google Inc., Mountain View, CA; (4) Department of Computer Science, Stanford University

Abstract. Despite the conventional wisdom that proactive security is superior to reactive security, we show that reactive security can be competitive with proactive security as long as the reactive defender learns from past attacks instead of myopically overreacting to the last attack. Our game-theoretic model follows common practice in the security literature by making worst-case assumptions about the attacker: we grant the attacker complete knowledge of the defender's strategy and do not require the attacker to act rationally. In this model, we bound the competitive ratio between a reactive defense algorithm (which is inspired by online learning theory) and the best fixed proactive defense. Additionally, we show that, unlike proactive defenses, this reactive strategy is robust to a lack of information about the attacker's incentives and knowledge.

1 Introduction

Many enterprises employ a Chief Information Security Officer (CISO) to manage the enterprise's information security risks. Typically, an enterprise has many more security vulnerabilities than it can realistically repair. Instead of declaring the enterprise insecure until every last vulnerability is plugged, CISOs typically perform a cost-benefit analysis to identify which risks to address. But what constitutes an effective CISO strategy?
The conventional wisdom [28,21] is that CISOs ought to adopt a forward-looking proactive approach to mitigating security risk by examining the enterprise for vulnerabilities that might be exploited in the future. Advocates of proactive security often equate reactive security with myopic bug-chasing and consider it ineffective. We establish sufficient conditions for when reacting strategically to attacks is as effective in discouraging attackers.

We study the efficacy of reactive strategies in an economic model of the CISO's security cost-benefit trade-offs. Unlike previously proposed economic models of security (see Section 7), we do not assume the attacker acts according to a fixed probability distribution. Instead, we consider a game-theoretic model with a strategic attacker who responds to the defender's strategy. As is standard in the security literature, we make worst-case assumptions about the attacker. For example, we grant the attacker complete knowledge of the defender's strategy and do not require the attacker to act rationally. Further, we make conservative assumptions about the reactive defender's knowledge and do not assume the defender knows all the vulnerabilities in the system or the attacker's incentives. However, we do assume that the defender can observe the attacker's past actions, for example via an intrusion detection system or user metrics [3].

In our model, we find that two properties are sufficient for a reactive strategy to perform as well as the best proactive strategies. First, no single attack is catastrophic, meaning the defender can survive a number of attacks. This is consistent with situations where intrusions (that, say, steal credit card numbers) are regrettable but not business-ending.
Second, the defender's budget is liquid, meaning the defender can re-allocate resources without penalty. For example, a CISO can reassign members of the security team from managing firewall rules to improving database access controls at relatively low switching costs.

Because our model abstracts many vulnerabilities into a single graph edge, we view the act of defense as increasing the attacker's cost for mounting an attack instead of preventing the attack (e.g., by patching a single bug). By making this assumption, we choose not to study the tactical patch-by-patch interaction of the attacker and defender. Instead, we model enterprise security at a more abstract level appropriate for the CISO. For example, the CISO might allocate a portion of his or her budget to engage a consultancy, such as WhiteHat or iSEC Partners, to find and fix cross-site scripting in a particular web application, or to require that employees use SecurID tokens during authentication. We make the technical assumption that attacker costs are linearly dependent on defense investments locally. This assumption does not reflect patch-by-patch interaction, which would be better represented by a step function (with the step placed at the cost to deploy the patch). Instead, this assumption reflects the CISO's higher-level viewpoint, where the staircase of summed step functions fades into a slope.

We evaluate the defender's strategy by measuring the attacker's cumulative return-on-investment, the return-on-attack (ROA), which has been proposed previously [8]. By studying this metric, we focus on defenders who seek to "cut off the attacker's oxygen," that is, to reduce the attacker's incentives for attacking the enterprise. We do not distinguish between successful and unsuccessful attacks.
Instead, we compare the payoff the attacker receives from his or her nefarious deeds with the cost of performing said deeds. We imagine that sufficiently disincentivized attackers will seek alternatives, such as attacking a different organization or starting a legitimate business.

In our main result, we show sufficient conditions for a learning-based reactive strategy to be competitive with the best fixed proactive defense in the sense that the competitive ratio between the reactive ROA and the proactive ROA is at most 1 + ε, for all ε > 0, provided the game lasts sufficiently many rounds (at least Ω(1/ε)). To prove our theorems, we draw on techniques from the online learning literature. We extend these techniques to the case where the learner does not know all the game matrix rows a priori, letting us analyze situations where the defender does not know all the vulnerabilities in advance. Although our main results are in a graph-based model with a single attacker, our results generalize to a model based on Horn clauses with multiple attackers. Our results are also robust to switching from ROA to attacker profit and to allowing the proactive defender to revise the defense allocation a fixed number of times.

Fig. 1.1. An attack graph representing an enterprise data center.

Although myopic bug chasing is most likely an ineffective reactive strategy, we find that in some situations a strategic reactive strategy is as effective as the optimal fixed proactive defense. In fact, we find that the natural strategy of gradually reinforcing attacked edges by shifting budget from unattacked edges learns the attacker's incentives and constructs an effective defense.
Such a strategic reactive strategy is both easier to implement than a proactive strategy, because it does not presume that the defender knows the attacker's intent and capabilities, and less wasteful than a proactive strategy, because the defender does not expend budget on attacks that do not actually occur. Based on our results, we encourage CISOs to question the assumption that proactive risk management is inherently superior to reactive risk management.

Organization. Section 2 formalizes our model. Section 3 shows that perimeter defense and defense-in-depth arise naturally in our model. Section 4 presents our main results bounding the competitive ratio of reactive versus proactive defense strategies. Section 5 outlines scenarios in which reactive security out-performs proactive security. Section 6 generalizes our results to Horn clauses and multiple attackers. Section 7 discusses related work. Section 8 concludes.

2 Formal Model

In this section, we present a game-theoretic model of attack and defense. Unlike traditional bug-level attack graphs, our model is meant to capture a managerial perspective on enterprise security. The model is somewhat general in the sense that attack graphs can represent a number of concrete situations, including a network (see Figure 1.1), components in a complex software system [9], or an Internet Fraud Battlefield [13].

System. We model a system using a directed graph (V, E), which defines the game between an attacker and a defender. Each vertex v ∈ V in the graph represents a state of the system. Each edge e ∈ E represents a state transition the attacker can induce. For example, a vertex might represent whether a particular machine in a network has been compromised by an attacker.
An edge from one machine to another might represent that an attacker who has compromised the first machine might be able to compromise the second machine because the two are connected by a network. Alternatively, the vertices might represent different components in a software system. An edge might represent that an attacker sending input to the first component can send input to the second.

In attacking the system, the attacker selects a path in the graph that begins with a designated start vertex s. Our results hold in more general models (e.g., based on Horn clauses), but we defer discussing such generalizations until Section 6. We think of the attack as driving the system through the series of state transitions indicated by the edges included in the path. In the networking example in Figure 1.1, an attacker might first compromise a front-end server and then leverage the server's connectivity to the back-end database server to steal credit card numbers from the database.

Incentives and Rewards. Attackers respond to incentives. For example, attackers compromise machines and form botnets because they make money from spam [20] or rent the botnet to others [32]. Other attackers steal credit card numbers because credit card numbers have monetary value [10]. We model the attacker's incentives by attaching a non-negative reward to each vertex. These rewards are the utility the attacker derives from driving the system into the state represented by the vertex. For example, compromising the database server might have a sizable reward because the database server contains easily monetizable credit card numbers. We assume the start vertex has zero reward, forcing the attacker to undertake some action before earning utility.
Whenever the attacker mounts an attack, the attacker receives a payoff equal to the sum of the rewards of the vertices visited in the attack path:

    payoff(a) = Σ_{v ∈ a} reward(v).

In the example from Figure 1.1, if an attacker compromises both a front-end server and the database server, the attacker receives both rewards.

Attack Surface and Cost. The defender has a fixed defense budget B > 0, which the defender can divide among the edges in the graph according to a defense allocation d: for all e ∈ E, d(e) ≥ 0, and Σ_{e ∈ E} d(e) ≤ B. The defender's allocation of budget to various edges corresponds to the decisions made by the Chief Information Security Officer (CISO) about where to allocate the enterprise's security resources. For example, the CISO might allocate organizational headcount to fuzzing enterprise web applications for XSS vulnerabilities. These kinds of investments are continuous in the sense that the CISO can allocate 1/4 of a full-time employee to worrying about XSS. We denote the set of feasible allocations of budget B on edge set E by D_{B,E}.

By defending an edge, the defender makes it more difficult for the attacker to use that edge in an attack. Each unit of budget the defender allocates to an edge raises the cost that the attacker must pay to use that edge in an attack. Each edge has an attack surface [19] w that represents the difficulty in defending against that state transition. For example, a server that runs both Apache and Sendmail has a larger attack surface than one that runs only Apache, because defending the first server is more difficult than defending the second. Formally, the attacker must pay the following cost to traverse the edges of an attack:

    cost(a, d) = Σ_{e ∈ a} d(e)/w(e).

Allocating defense budget to an edge does not reduce an edge's attack surface.
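The payoff and cost computations above can be sketched in a few lines. This is our own illustrative rendering: the vertex names, rewards, surfaces, and allocation below are hypothetical (loosely patterned after the data-center example), not values from the paper.

```python
# Hypothetical instance: a start vertex, a front-end, and a database.
reward = {"s": 0, "frontend": 1, "database": 9}
w = {("s", "frontend"): 1.0, ("frontend", "database"): 1 / 9}  # attack surfaces
d = {("s", "frontend"): 1.0, ("frontend", "database"): 1.0}    # defense allocation

def payoff(path_vertices):
    """payoff(a) = sum of rewards of the vertices visited by the attack."""
    return sum(reward[v] for v in path_vertices)

def cost(path_edges, d):
    """cost(a, d) = sum over attack edges of d(e) / w(e)."""
    return sum(d[e] / w[e] for e in path_edges)

attack_vertices = ["s", "frontend", "database"]
attack_edges = [("s", "frontend"), ("frontend", "database")]
print(payoff(attack_vertices))                 # 0 + 1 + 9 = 10
print(cost(attack_edges, d))                   # 1/1 + 1/(1/9) = 10, up to rounding
print(payoff(attack_vertices) / cost(attack_edges, d))  # ROA, approximately 1
```

Note that the edge with the small surface (1/9) is expensive for the attacker once any budget sits on it, which is the sense in which small surfaces are easy to defend.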
For example, consider defending a hallway with bricks. The wider the hallway (the larger the attack surface), the more bricks (budget allocation) required to build a wall of a certain height (the cost to the attacker). In this formulation, the function mapping the defender's budget allocation to attacker cost is linear, preventing the defender from ever fully defending an edge. Our use of a linear function reflects a level of abstraction more appropriate to a CISO who can never fully defend assets, which we justify by observing that the rate of vulnerability discovery in a particular piece of software is roughly constant [29]. At a lower level of detail, we might replace this function with a step function, indicating that the defender can patch a vulnerability by allocating a threshold amount of budget.

Objective. To evaluate defense strategies, we measure the attacker's incentive for attacking using the return-on-attack (ROA) [8], which we define as follows:

    ROA(a, d) = payoff(a) / cost(a, d).

We use this metric for evaluating defense strategy because we believe that if the defender lowers the ROA sufficiently, the attacker will be discouraged from attacking the system and will find other uses for his or her capital or industry. For example, the attacker might decide to attack another system. Analogous results hold if we quantify the attacker's incentives in terms of profit (e.g., with profit(a, d) = payoff(a) − cost(a, d)), but we focus on ROA for simplicity. A purely rational attacker will mount attacks that maximize ROA. However, a real attacker might not maximize ROA. For example, the attacker might not have complete knowledge of the system or its defense. We strengthen our results by considering all attacks, not just those that maximize ROA.

Proactive Security.
We evaluate our learning-based reactive approach by comparing it against a proactive approach to risk management in which the defender carefully examines the system and constructs a defense in order to fend off future attacks. We strengthen this benchmark by providing the proactive defender complete knowledge about the system, but we require that the defender commit to a fixed strategy. To strengthen our results, we state our main result in terms of all such proactive defenders. In particular, this class of defenders includes the rational proactive defender who employs a defense allocation that minimizes the maximum ROA the attacker can extract from the system: argmin_d max_a ROA(a, d).

3 Case Studies

In this section, we describe instances of our model to build the reader's intuition. These examples illustrate that some familiar security concepts, including perimeter defense and defense in depth, arise naturally as optimal defenses in our model. These defenses can be constructed either by rational proactive defenders or converged to by a learning-based reactive defense.

Fig. 3.1. Attack graph representing a simplified data center network (edges s = Internet → Front End with w:1 and Front End → Database with w:1/9; rewards 1 and 9).

Perimeter Defense. Consider a system in which the attacker's reward is non-zero at exactly one vertex, t. For example, in a medical system, the attacker's reward for obtaining electronic medical records might well dominate the value of other attack targets such as employees' vacation calendars. In such a system, a rational attacker will select the minimum-cost path from the start vertex s to the valuable vertex t. The optimal defense limits the attacker's ROA by maximizing the cost of the minimum s-t path.
The algorithm for constructing this defense is straightforward [7]:

1. Let C be the minimum-weight s-t cut in (V, E, w).
2. Select the following defense: d(e) = B w(e)/Z if e ∈ C, and d(e) = 0 otherwise, where Z = Σ_{e ∈ C} w(e).

Notice that this algorithm constructs a perimeter defense: the defender allocates the entire defense budget to a single cut in the graph. Essentially, the defender spreads the defense budget over the attack surface of the cut. By choosing the minimum-weight cut, the defender is choosing to defend the smallest attack surface that separates the start vertex from the target vertex. Real defenders use similar perimeter defenses, for example, when they install a firewall at the boundary between their organization and the Internet, because the network's perimeter is much smaller than its interior.

Defense in Depth. Many experts in security practice recommend that defenders employ defense in depth. Defense in depth arises naturally in our model as an optimal defense for some systems. Consider, for example, the system depicted in Figure 3.1. This attack graph is a simplified version of the data center network depicted in Figure 1.1. Although the attacker receives the largest reward for compromising the back-end database server, the attacker also receives some reward for compromising the front-end web server. Moreover, the front-end web server has a larger attack surface than the back-end database server because the front-end server exposes a more complex interface (an entire enterprise web application), whereas the database server exposes only a simple SQL interface. Allocating defense budget to the left-most edge represents trying to protect sensitive database information with a complex web application firewall instead of database access control lists (i.e., possible, but economically inefficient).
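The two-step perimeter-defense construction in the Perimeter Defense discussion above can be sketched end-to-end. The max-flow routine (Edmonds-Karp on the surface-weighted graph) and the graph encoding are our own illustrative choices, not prescribed by the paper; the sketch assumes no antiparallel edge pairs.

```python
from collections import deque, defaultdict

def min_cut_defense(edges, s, t, B):
    """Perimeter defense: find a minimum-weight s-t cut C (via Edmonds-Karp
    max-flow with capacities w(e)), then set d(e) = B*w(e)/Z on cut edges.

    edges: dict mapping (u, v) -> attack surface w(e).
    Returns a dict mapping each edge to its defense allocation d(e).
    """
    cap = defaultdict(float)
    adj = defaultdict(set)
    for (u, v), wt in edges.items():
        cap[(u, v)] += wt
        adj[u].add(v)
        adj[v].add(u)  # residual arcs

    def bfs_parents():
        parent = {s: None}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 1e-12:
                    parent[v] = u
                    if v == t:
                        return parent
                    q.append(v)
        return None

    while (parent := bfs_parents()) is not None:
        path, v = [], t                 # reconstruct the augmenting path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        f = min(cap[e] for e in path)   # bottleneck capacity
        for (u, v) in path:
            cap[(u, v)] -= f
            cap[(v, u)] += f

    reach = {s}                         # residual-reachable side of the cut
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in reach and cap[(u, v)] > 1e-12:
                reach.add(v)
                q.append(v)

    cut = [e for e in edges if e[0] in reach and e[1] not in reach]
    Z = sum(edges[e] for e in cut)
    return {e: (B * edges[e] / Z if e in cut else 0.0) for e in edges}
```

On a diamond-shaped example with surfaces 3, 1, 1, 2, the minimum-weight cut is the pair of unit-surface edges, and the budget is split over them in proportion to their (equal) surfaces.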
The optimal defense against a rational attacker is to allocate half of the defense budget to the left-most edge and half of the budget to the right-most edge, limiting the attacker to an ROA of unity. Shifting the entire budget to the right-most edge (i.e., defending only the database) is disastrous because the attacker will simply attack the front-end at zero cost, achieving an unbounded ROA. Shifting the entire budget to the left-most edge is also problematic because the attacker will attack the database (achieving an ROA of 5).

4 Reactive Security

To analyze reactive security, we model the attacker and defender as playing an iterative game, alternating moves. First, the defender selects a defense, and then the attacker selects an attack. We present a learning-based reactive defense strategy that is oblivious to vertex rewards and to edges that have not yet been used in attacks. We prove a theorem bounding the competitive ratio between this reactive strategy and the best proactive defense via a series of reductions to results from the online learning theory literature. Other applications of this literature include managing stock portfolios [26], playing zero-sum games [12], and boosting other machine learning heuristics [11]. Although we provide a few technical extensions, our main contribution comes from applying results from online learning to risk management.

Repeated Game. We formalize the repeated game between the defender and the attacker as follows. In each round t from 1 to T:

1. The defender chooses defense allocation d_t(e) over the edges e ∈ E.
2. The attacker chooses an attack path a_t in G.
3. The path a_t and attack surfaces {w(e) : e ∈ a_t} are revealed to the defender.
4. The attacker pays cost(a_t, d_t) and gains payoff(a_t).
In each round, we let the attacker choose the attack path after the defender commits to the defense allocation because the defender's budget allocation is not a secret (in the sense of a cryptographic key). Following the "no security through obscurity" principle, we make the conservative assumption that the attacker can accurately determine the defender's budget allocation.

Defender Knowledge. Unlike proactive defenders, reactive defenders do not know all of the vulnerabilities that exist in the system in advance. (If defenders had complete knowledge of vulnerabilities, conferences such as Black Hat Briefings would serve little purpose.) Instead, we reveal an edge (and its attack surface) to the defender after the attacker uses the edge in an attack. For example, the defender might monitor the system and learn how the attacker attacked the system by doing a post-mortem analysis of intrusion logs. Formally, we define a reactive defense strategy to be a function from attack sequences {a_i} and the subsystem induced by the edges contained in ∪_i a_i to defense allocations such that d(e) = 0 if edge e ∉ ∪_i a_i. Notice that this requires the defender's strategy to be oblivious to the system beyond the edges used by the attacker.

Algorithm 1. A reactive defense strategy for hidden edges.

- Initialize E_0 = ∅.
- For each round t ∈ {2, ..., T}:
  - Let E_{t−1} = E_{t−2} ∪ E(a_{t−1}).
  - For each e ∈ E_{t−1}, let

        S_{t−1}(e) = S_{t−2}(e) + M(e, a_{t−1})  if e ∈ E_{t−2}, and M(e, a_{t−1}) otherwise;
        P̃_t(e) = β_{t−1}^{S_{t−1}(e)};
        P_t(e) = P̃_t(e) / Σ_{e′ ∈ E_{t−1}} P̃_t(e′),

where M(e, a) = −1[e ∈ a]/w(e) is a matrix with |E| rows and a column for each attack.
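The multiplicative update above can be sketched as executable code. This is our own rendering: the edge identifiers and attack sequences are illustrative, and the β schedule follows the parameter sequence stated with the theorems in this section.

```python
import math

def reactive_allocations(attacks, w, B):
    """Sketch of the multiplicative-update reactive defense (Algorithm 1).

    attacks: attack paths, one list of edge ids per round, revealed online.
    w: dict mapping edge id -> attack surface w(e).
    B: total defense budget.
    Returns the defense allocation d_t used in each round.
    """
    S = {}             # cumulative losses S_{t-1}(e) over revealed edges
    allocations = []
    for t, a in enumerate(attacks, start=1):
        if S:
            # beta_s = (1 + sqrt(2 log|E_s| / (s+1)))^-1, as in the theorems
            beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(S)) / (t + 1)))
            weights = {e: beta ** S[e] for e in S}
            total = sum(weights.values())
            allocations.append({e: B * wt / total for e, wt in weights.items()})
        else:
            allocations.append({})   # nothing revealed yet: no budget deployed
        for e in a:                  # observe the attack; reveal edges, update
            S[e] = S.get(e, 0.0) - 1.0 / w[e]   # M(e, a) = -1[e in a] / w(e)
    return allocations
```

Because S(e) becomes more negative each time e is attacked and β < 1, the weight β^{S(e)} grows with repeated attacks, so budget flows toward recently attacked, small-surface edges.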
Algorithm 1 is a reactive defense strategy based on the multiplicative update learning algorithm [5,12]. The algorithm reinforces edges on the attack path multiplicatively, taking the attack surface into account by allocating more budget to easier-to-defend edges. When new edges are revealed, the algorithm re-allocates budget uniformly from the already-revealed edges to the newly revealed edges. We state the algorithm in terms of a normalized defense allocation P_t(e) = d_t(e)/B. Notice that this algorithm is oblivious to unattacked edges and to the attacker's reward for visiting each vertex. An appropriate setting for the algorithm parameters β_t ∈ [0, 1) will be described below.

The algorithm begins without any knowledge of the graph whatsoever, and so allocates no defense budget to the system. Upon the t-th attack on the system, the algorithm updates E_t to be the set of edges revealed up to this point, and updates S_t(e) to be a weighted count of the number of times e has been used in an attack thus far. For each edge that has ever been revealed, the defense allocation P_{t+1}(e) is chosen to be β_t^{S_t(e)}, normalized to sum to unity over all edges e ∈ E_t. In this way, any edge attacked in round t will have its defense allocation reinforced. The parameter β controls how aggressively the defender reallocates defense budget to recently attacked edges. If β is infinitesimal, the defender will move the entire defense budget to the edge on the most recent attack path with the smallest attack surface. If β is large (close to one), the defender will not be very agile and will, instead, leave the defense budget near the initial allocation. For an appropriate value of β, the algorithm converges to the optimal defense strategy: for instance, the min cut in the example from Section 3.

Theorems.
To compare this reactive defense strategy to all proactive defense strategies, we use the notion of regret from online learning theory. The following is an additive regret bound relating the attacker's profit under reactive and proactive defense strategies.

Theorem 1. The average attacker profit against Algorithm 1 converges to the average attacker profit against the best proactive defense. Formally, if defense allocations {d_t}_{t=1}^T are output by Algorithm 1 with parameter sequence β_s = (1 + sqrt(2 log|E_s|/(s+1)))^{−1} on any system (V, E, w, reward, s) revealed online and any attack sequence {a_t}_{t=1}^T, then

    (1/T) Σ_{t=1}^T profit(a_t, d_t) − (1/T) Σ_{t=1}^T profit(a_t, d*) ≤ B sqrt(log|E| / (2T)) + B (log|E| + avg_e[w(e)^{−1}]) / T,

for all proactive defense strategies d* ∈ D_{B,E}, where avg_e[w(e)^{−1}] = |E|^{−1} Σ_{e ∈ E} w(e)^{−1} is the mean of the surface reciprocals.

Remark 2. We can interpret Theorem 1 as establishing sufficient conditions under which a reactive defense strategy is within an additive constant of the best proactive defense strategy. Instead of carefully analyzing the system to construct the best proactive defense, the defender need only react to attacks in a principled manner to achieve almost the same quality of defense in terms of attacker profit.

Reactive defense strategies can also be competitive with proactive defense strategies when we consider an attacker motivated by return on attack (ROA). The ROA formulation is appealing because (unlike with profit) the objective function does not require measuring attacker cost and defender budget in the same units. The next result considers the competitive ratio between the ROA for a reactive defense strategy and the ROA for the best proactive defense strategy.
Theorem 3. The ROA against Algorithm 1 converges to the ROA against the best proactive defense. Formally, consider the cumulative ROA:

    ROA({a_t}_{t=1}^T, {d_t}_{t=1}^T) = Σ_{t=1}^T payoff(a_t) / Σ_{t=1}^T cost(a_t, d_t).

(We abuse notation slightly and use singleton arguments to represent the corresponding constant sequence.) If defense allocations {d_t}_{t=1}^T are output by Algorithm 1 with parameters β_s = (1 + sqrt(2 log|E_s|/(s+1)))^{−1} on any system (V, E, w, reward, s) revealed online, such that |E| > 1, and any attack sequence {a_t}_{t=1}^T, then for all α > 0 and proactive defense strategies d* ∈ D_{B,E},

    ROA({a_t}_{t=1}^T, {d_t}_{t=1}^T) / ROA({a_t}_{t=1}^T, d*) ≤ 1 + α,

provided T is sufficiently large.¹

Remark 4. Notice that the reactive defender can use the same algorithm regardless of whether the attacker is motivated by profit or by ROA. As discussed in Section 5, the optimal proactive defense is not similarly robust.

¹ To wit: T ≥ (13 sqrt(2) (sqrt(1 + α) − 1)^{−1} Σ_{e ∈ inc(s)} w(e))² log|E|.

We present proofs of these theorems in Appendix A. We first prove the theorems in the simpler setting where the defender knows the entire graph. Second, we remove the hypothesis that the defender knows the edges in advance.

Lower Bounds. In Appendix A, we use a two-vertex, two-edge graph to establish a lower bound on the competitive ratio of the ROA for all reactive strategies. The lower bound shows that the analysis of Algorithm 1 is tight and that Algorithm 1 is optimal given the information available to the algorithm. The proof gives an example where the best proactive defense (slightly) out-performs every reactive strategy, suggesting the benchmark is not unreasonably weak.
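To build intuition for Theorem 3, here is a self-contained toy simulation of our own construction (not an experiment from the paper): two parallel unit-surface edges lead from s to a single target of reward 1, an adaptive attacker always takes the currently cheapest edge, and the defender runs the multiplicative update. The best proactive defense here is an even split, giving the attacker ROA 2/B, and the reactive defender's cumulative ROA ratio against it stays modest and shrinks as T grows.

```python
import math

def simulate(T, B=1.0):
    """Two parallel unit-surface edges to a reward-1 target; adaptive attacker."""
    w = {"x": 1.0, "y": 1.0}
    S, total_cost, total_payoff = {}, 0.0, 0.0
    for t in range(1, T + 1):
        if S:
            beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(S)) / (t + 1)))
            wts = {e: beta ** S[e] for e in S}
            z = sum(wts.values())
            d = {e: B * v / z for e, v in wts.items()}
        else:
            d = {}                       # no edges revealed yet
        # adaptive attacker: single-edge attack with the smallest cost
        a = min(w, key=lambda e: d.get(e, 0.0) / w[e])
        total_cost += d.get(a, 0.0) / w[a]
        total_payoff += 1.0
        S[a] = S.get(a, 0.0) - 1.0 / w[a]  # multiplicative-update bookkeeping
    reactive_roa = total_payoff / total_cost
    proactive_roa = 1.0 / (B / 2.0)      # even split is minimax on this graph
    return reactive_roa / proactive_roa

print(simulate(2000))   # a little above 1, and it shrinks as T grows
```

The per-round attacker cost can never exceed B/2 on this graph, so the ratio is always at least 1; the simulation shows how quickly it approaches 1.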
5 Advantages of Reactivity

In this section, we examine some situations in which a reactive defender out-performs a proactive defender. Proactive defenses hinge on the defender's model of the attacker's incentives. If the defender's model is inaccurate, the defender will construct a proactive defense that is far from optimal. By contrast, a reactive defender need not reason about the attacker's incentives directly. Instead, the reactive defender learns these incentives by observing the attacker in action.

Learning Rewards. One way to model inaccuracies in the defender's estimates of the attacker's incentives is to hide the attacker's rewards from the defender. Without knowledge of the payoffs, a proactive defender has difficulty limiting the attacker's ROA. Consider, for example, the star system whose edges have equal attack surfaces, as depicted in Figure 5.1. Without knowledge of the attacker's rewards, a proactive defender has little choice but to allocate the defense budget equally to each edge (because the edges are indistinguishable). However, if the attacker's reward is concentrated at a single vertex, the competitive ratio for the attacker's ROA (compared to the rational proactive defense) is the number of leaf vertices. (We can, of course, make the ratio worse by adding more vertices.) By contrast, the reactive algorithm we analyze in Section 4 is competitive with the rational proactive defense because the reactive algorithm effectively learns the rewards by observing which attacks the attacker chooses.

Robustness to Objective. Another way to model inaccuracies in the defender's estimates of the attacker's incentives is to assume the defender mistakes which of profit and ROA actually matters to the attacker.
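The star-graph gap described under Learning Rewards above can be checked numerically. The helper names and budget value below are ours; the eight-leaf, reward-8 instance mirrors Figure 5.1, and the gap equals the number of leaves regardless of the budget.

```python
def roa(payoff_value, attack_cost):
    """Return-on-attack: payoff divided by attack cost."""
    return payoff_value / attack_cost

def star_gap(k, B=1.0, r=8.0):
    """Star with k equal-unit-surface edges and reward r hidden at one leaf.

    Returns the ratio between the attacker's ROA against the uninformed
    uniform defense and against the informed (rational proactive) defense.
    """
    uniform_cost = B / k   # uninformed defender spreads B over k edges
    informed_cost = B      # informed defender puts all of B on the valuable edge
    return roa(r, uniform_cost) / roa(r, informed_cost)

print(star_gap(8))   # 8.0: the ratio equals the number of leaf vertices
```

Adding leaves makes the gap arbitrarily large, matching the parenthetical remark above.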
The defense constructed by a rational proactive defender depends crucially on whether the attacker's actual incentives are based on profit or based on ROA, whereas the reactive algorithm we analyze in Section 4 is robust to this variation. In particular, consider the system depicted in Figure 5.2, and assume the defender has a budget of 9. If the defender believes the attacker is motivated by profit, the rational proactive defense is to allocate the entire defense budget to the right-most edge (making the profit 1 on both edges). However, this defense is disastrous when viewed in terms of ROA because the ROA for the left edge is infinite (as opposed to near unity when the proactive defender optimizes for ROA).

Fig. 5.1. Star-shaped attack graph with rewards concentrated in an unknown vertex (eight unit-surface edges from s; one leaf has reward 8, the others reward 0).

Fig. 5.2. An attack graph that separates the minimax strategies optimizing ROA and attacker profit (start vertex s at the Internet; a unit-surface edge to a Satellite Office with reward 1 and a unit-surface edge to Headquarters with reward 10).

Catachresis. The defense constructed by the rational proactive defender is optimized for a rational attacker. If the attacker is not perfectly rational, there is room for out-performing the rational proactive defense. There are a number of situations in which the attacker might not mount optimal attacks:

- The attacker might not have complete knowledge of the attack graph. Consider, for example, a software vendor who discovers five equally severe vulnerabilities in one of their products via fuzzing. According to proactive security, the defender ought to dedicate equal resources to repairing these five vulnerabilities.
However, a reactive defender might dedicate more resources to fixing a vulnerability actually exploited by attackers in the wild. We can model these situations by making the attacker oblivious to some edges.
- The attacker might not have complete knowledge of the defense allocation. For example, an attacker attempting to invade a corporate network might target computers in human resources without realizing that attacking the customer relationship management database in sales has a higher return-on-attack because the database is lightly defended. By observing attacks, the reactive strategy learns a defense tuned for the actual attacker, causing the attacker to receive a lower ROA.

6 Generalizations

Horn Clauses. Thus far, we have presented our results using a graph-based system model. Our results extend, however, to a more general system model based on Horn clauses. Datalog programs, which are based on Horn clauses, have been used in previous work to represent vulnerability-level attack graphs [27]. A Horn clause is a statement in propositional logic of the form $p_1 \wedge p_2 \wedge \cdots \wedge p_n \rightarrow q$. The propositions $p_1, p_2, \ldots, p_n$ are called the antecedents, and $q$ is called the consequent. The set of antecedents might be empty, in which case the clause simply asserts the consequent. Notice that Horn clauses are negation-free. In some sense, a Horn clause represents an edge in a hypergraph where multiple pre-conditions are required before taking a certain state transition. In the Horn model, a system consists of a set of Horn clauses, an attack surface for each clause, and a reward for each proposition. The defender allocates defense budget among the Horn clauses.
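To make the Horn-clause model concrete, here is a minimal sketch (ours, not the paper's; all names are illustrative) of a system as data, together with the validity check for an attacker's proof, i.e., an ordered list of clauses in which every antecedent was established by an earlier clause:

```python
# Illustrative sketch of the Horn-clause system model: each clause has
# antecedents, a consequent, and an attack surface w(c); propositions
# carry rewards. Clause and proposition names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Clause:
    antecedents: frozenset   # propositions required before the clause fires
    consequent: str          # proposition established by the clause
    surface: float           # attack surface w(c)

def is_valid_proof(proof):
    """A valid proof is an ordered list of clauses in which every
    antecedent appears as the consequent of an earlier clause."""
    proved = set()
    for clause in proof:
        if not clause.antecedents <= proved:
            return False
        proved.add(clause.consequent)
    return True

# An antecedent-free clause asserts its consequent outright.
root = Clause(frozenset(), "foothold", surface=1.0)
pivot = Clause(frozenset({"foothold"}), "database", surface=2.0)

assert is_valid_proof([root, pivot])       # foothold proved before use
assert not is_valid_proof([pivot, root])   # antecedent not yet proved
```

The set of propositions proved by a valid proof is simply the set of consequents it establishes, which is what the payoff sums over.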
To mount an attack, the attacker selects a valid proof: an ordered list of clauses such that each antecedent appears as a consequent of a clause earlier in the list. For a given proof $\Pi$,

$$\mathrm{cost}(\Pi, d) = \sum_{c \in \Pi} d(c)/w(c), \qquad \mathrm{payoff}(\Pi) = \sum_{p \in [\![\Pi]\!]} \mathrm{reward}(p),$$

where $[\![\Pi]\!]$ is the set of propositions proved by $\Pi$ (i.e., those propositions that appear as consequents in $\Pi$). Profit and ROA are computed as before. Our results generalize to this model directly. Essentially, we need only replace each instance of the word "edge" with "Horn clause" and "path" with "valid proof." For example, the rows of the matrix $M$ used throughout the proof become the Horn clauses, and the columns become the valid proofs (which are numerous, but no matter). The entries of the matrix become $M(c, \Pi) = 1/w(c)$, analogous to the graph case. The one non-obvious substitution is $\mathrm{inc}(s)$, which becomes the set of clauses that lack antecedents.

Multiple Attackers. We have focused on a security game between a single attacker and a defender. In practice, a security system might be attacked by several uncoordinated attackers, each with different information and different objectives. Fortunately, we can show that a model with multiple attackers is mathematically equivalent to a model with a single attacker with a randomized strategy: use the set of attacks, one per attacker, to define a distribution over edges where the probability of an edge is linearly proportional to the number of attacks which use the edge. This precludes the interpretation of an attack as an $s$-rooted path, but our proofs do not rely upon this interpretation, and our results hold in such a model with appropriate modifications.

Adaptive Proactive Defenders.
A simple application of an online learning result [18], omitted due to space constraints, modifies our regret bounds for a proactive defender who re-allocates budget a fixed number of times. In this model, our results remain qualitatively the same.

7 Related Work

Anderson [1] and Varian [31] informally discuss (via anecdotes) how the design of information security must take incentives into account. August and Tunca [2] compare various ways to incentivize users to patch their systems in a setting where the users are more susceptible to attacks if their neighbors do not patch.

Gordon and Loeb [15] and Hausken [17] analyze the costs and benefits of security in an economic model (with non-strategic attackers) where the probability of a successful exploit is a function of the defense investment. They use this model to compute the optimal level of investment. Varian [30] studies various (single-shot) security games and identifies how much agents invest in security at equilibrium. Grossklags [16] extends this model by letting agents self-insure. Miura et al. [24] study externalities that appear due to users having the same password across various websites and discuss Pareto-improving security investments. Miura and Bambos [25] rank vulnerabilities according to a random-attacker model. Skybox and RedSeal offer practical systems that help enterprises prioritize vulnerabilities based on a random-attacker model. Kumar et al. [22] investigate optimal security architectures for a multi-division enterprise, taking into account losses due to lack of availability and confidentiality. None of the above papers explicitly model a truly adversarial attacker.

Fultz [14] generalizes [16] by modeling attackers explicitly. Cavusoglu et al. [4] highlight the importance of using a game-theoretic model over a decision-theoretic model due to the presence of adversarial attackers. However, these models look at idealized settings that are not generically applicable. Lye and Wing [23] study the Nash equilibrium of a single-shot game between an attacker and a defender that models a particular enterprise security scenario. Arguably, this model is most similar to ours in terms of abstraction level. However, calculating the Nash equilibrium requires detailed knowledge of the adversary's incentives, which, as discussed in the introduction, might not be readily available to the defender. Moreover, their game contains multiple equilibria, weakening their prescriptions.

8 Conclusions

Many security experts equate reactive security with myopic bug-chasing and ignore principled reactive strategies when they recommend adopting a proactive approach to risk management. In this paper, we establish sufficient conditions for a learning-based reactive strategy to be competitive with the best fixed proactive defense. Additionally, we show that reactive defenders can out-perform proactive defenders when the proactive defender defends against attacks that never actually occur. Although our model is an abstraction of the complex interplay between attackers and defenders, our results support the following practical advice for CISOs making security investments:

- Employ monitoring tools that let you detect and analyze attacks against your enterprise. These tools help focus your efforts on thwarting real attacks.
- Make your security organization more agile. For example, build a rigorous testing lab that lets you roll out security patches quickly once you detect that attackers are exploiting these vulnerabilities.
- When determining how to expend your security budget, avoid overreacting to the most recent attack. Instead, consider all previous attacks, but discount the importance of past attacks exponentially.

In some situations, proactive security can out-perform reactive security. For example, reactive approaches are ill-suited for defending against catastrophic attacks because there is no next round in which the defender can use information learned from the attack. We hope our results will lead to a productive discussion of the limitations of our model and the validity of our conclusions. Instead of assuming that proactive security is always superior to reactive security, we invite the reader to consider when a reactive approach might be appropriate. For the parts of an enterprise where the defender's budget is liquid and there are no catastrophic losses, a carefully constructed reactive strategy can be as effective as the best proactive defense in the worst case and significantly better in the best case.

Acknowledgments. We would like to thank Elie Bursztein, Eu-Jin Goh, and Matt Finifter for their thoughtful comments and helpful feedback. We gratefully acknowledge the support of the NSF through the TRUST Science and Technology Center and grants DMS-0707060, CCF-0424422, 0311808, 0448452, and 0627511, and the support of the AFOSR through the MURI Program, and the support of the Siebel Scholars Foundation.

References

1. Anderson, R.: Why information security is hard: an economic perspective. In: 17th Annual Computer Security Applications Conference, pp. 358-365 (2001)
2. August, T., Tunca, T.I.: Network software security and user incentives. Management Science 52(11), 1703-1720 (2006)
3.
Beard, C.: Introducing Test Pilot (March 2008), http://labs.mozilla.com/2008/03/introducing-test-pilot/
4. Cavusoglu, H., Raghunathan, S., Yue, W.: Decision-theoretic and game-theoretic approaches to IT security investment. Journal of Management Information Systems 25(2), 281-304 (2008)
5. Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D.P., Schapire, R.E., Warmuth, M.K.: How to use expert advice. Journal of the Association for Computing Machinery 44(3), 427-485 (May 1997)
6. Cesa-Bianchi, N., Freund, Y., Helmbold, D.P., Haussler, D., Schapire, R.E., Warmuth, M.K.: How to use expert advice. In: Proceedings of the Twenty-Fifth Annual ACM Symposium on Theory of Computing, pp. 382-391. ACM, New York, NY, USA (1993)
7. Chakrabarty, D., Mehta, A., Vazirani, V.V.: Design is as easy as optimization. In: 33rd International Colloquium on Automata, Languages and Programming (ICALP), LNCS 4051, vol. Part I, pp. 477-488 (2006)
8. Cremonini, M.: Evaluating information security investments from attackers perspective: the return-on-attack (ROA). In: Fourth Workshop on the Economics of Information Security (2005)
9. Fisher, D.: Multi-process architecture (July 2008), http://dev.chromium.org/developers/design-documents/multi-process-architecture
10. Franklin, J., Paxson, V., Perrig, A., Savage, S.: An inquiry into the nature and causes of the wealth of internet miscreants. In: Proceedings of the 2007 ACM Conference on Computer and Communications Security, pp. 375-388. ACM, New York, NY, USA (2007)
11. Freund, Y., Schapire, R.: A short introduction to boosting. Journal of the Japanese Society for Artificial Intelligence 14(5), 771-780 (1999)
12. Freund, Y., Schapire, R.E.: Adaptive game playing using multiplicative weights.
Games and Economic Behavior 29, 79-103 (1999)
13. Friedberg, J.: Internet fraud battlefield (April 2007), http://www.ftc.gov/bcp/workshops/proofpositive/Battlefield_Overview.pdf
14. Fultz, N., Grossklags, J.: Blue versus Red: Towards a model of distributed security attacks. In: Proceedings of the Thirteenth International Conference on Financial Cryptography and Data Security (February 2009)
15. Gordon, L.A., Loeb, M.P.: The economics of information security investment. ACM Transactions on Information and System Security 5(4), 438-457 (2002)
16. Grossklags, J., Christin, N., Chuang, J.: Secure or insure? A game-theoretic analysis of information security games. In: Proceedings of the 17th International Conference on World Wide Web, pp. 209-218. ACM, New York, NY, USA (2008)
17. Hausken, K.: Returns to information security investment: The effect of alternative information security breach functions on optimal investment and sensitivity to vulnerability. Information Systems Frontiers 8(5), 338-349 (2006)
18. Herbster, M., Warmuth, M.K.: Tracking the best expert. Machine Learning 32(2), 151-178 (1998)
19. Howard, M.: Attack surface: Mitigate security risks by minimizing the code you expose to untrusted users. MSDN Magazine (November 2004), http://msdn.microsoft.com/en-us/magazine/cc163882.aspx
20. Kanich, C., Kreibich, C., Levchenko, K., Enright, B., Voelker, G.M., Paxson, V., Savage, S.: Spamalytics: An empirical analysis of spam marketing conversion. In: Proceedings of the 2008 ACM Conference on Computer and Communications Security, pp. 3-14. ACM, New York, NY, USA (2008)
21. Kark, K., Penn, J., Dill, A.: 2008 CISO priorities: The right objectives but the wrong focus. Le Magazine de la Sécurité Informatique (April 2009)
22. Kumar, V., Telang, R., Mukhopadhyay, T.: Optimal information security architecture for the enterprise, http://ssrn.com/abstract=1086690
23. Lye, K.W., Wing, J.M.: Game strategies in network security. In: Proceedings of the Foundations of Computer Security Workshop, pp. 13-22 (2002)
24. Miura-Ko, R.A., Yolken, B., Mitchell, J., Bambos, N.: Security decision-making among interdependent organizations. In: Proceedings of the 21st IEEE Computer Security Foundations Symposium, pp. 66-80. IEEE Computer Society, Washington, DC, USA (2008)
25. Miura-Ko, R., Bambos, N.: SecureRank: A risk-based vulnerability management scheme for computing infrastructures. In: Proceedings of the IEEE International Conference on Communications, pp. 1455-1460 (June 2007)
26. Ordentlich, E., Cover, T.M.: The cost of achieving the best portfolio in hindsight. Mathematics of Operations Research 23(4), 960-982 (1998)
27. Ou, X., Boyer, W.F., McQueen, M.A.: A scalable approach to attack graph generation. In: Proceedings of the 13th ACM Conference on Computer and Communications Security, pp. 336-345 (2006)
28. Pironti, J.P.: Key elements of an information security program. Information Systems Control Journal 1 (2005)
29. Rescorla, E.: Is finding security holes a good idea? IEEE Security and Privacy 3(1), 14-19 (2005)
30. Varian, H.: System reliability and free riding (2001)
31. Varian, H.R.: Managing online security risks. New York Times, June 1, 2000
32. Warner, B.: Home PCs rented out in sabotage-for-hire racket. Reuters (July 2004)

Algorithm 2. Reactive defense strategy for known edges using the multiplicative update algorithm.
- For each $e \in E$, initialize $P_1(e) = 1/|E|$.
- For each round $t \in \{2, \ldots, T\}$ and $e \in E$, let
$$P_t(e) = P_{t-1}(e) \cdot \beta^{M(e, a_{t-1})} / Z_t \quad \text{where} \quad Z_t = \sum_{e' \in E} P_{t-1}(e')\, \beta^{M(e', a_{t-1})}.$$

A Proofs

We now describe a series of reductions that establish the main results. First, we prove Theorem 1 in the simpler setting where the defender knows the entire graph. Second, we remove the hypothesis that the defender knows the edges in advance. Finally, we extend our results to ROA.

Profit (Known Edges). Suppose that the reactive defender is granted full knowledge of the system $(V, E, w, \mathrm{reward}, s)$ from the outset. Specifically, the graph, attack surfaces, and rewards are all revealed to the defender prior to the first round. Algorithm 2 is a reactive defense strategy that makes use of this additional knowledge.

Lemma 5. If defense allocations $\{d_t\}_{t=1}^T$ are output by Algorithm 2 with parameter $\beta = \left(1 + \sqrt{2 \log |E| / T}\right)^{-1}$ on any system $(V, E, w, \mathrm{reward}, s)$ and attack sequence $\{a_t\}_{t=1}^T$, then
$$\frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d_t) - \frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d^\star) \le B \sqrt{\frac{\log |E|}{2T}} + \frac{B \log |E|}{T},$$
for all proactive defense strategies $d^\star \in \mathcal{D}_{B,E}$.

The lemma's proof is a reduction to the following regret bound from online learning [12, Corollary 4].

Theorem 6. If the multiplicative update algorithm (Algorithm 2) is run with any game matrix $M$ with elements in $[0,1]$ and parameter $\beta = \left(1 + \sqrt{2 \log |E| / T}\right)^{-1}$, then
$$\frac{1}{T} \sum_{t=1}^T M(P_t, a_t) - \min_{P^\star \ge 0 :\, \sum_{e \in E} P^\star(e) = 1} \left\{ \frac{1}{T} \sum_{t=1}^T M(P^\star, a_t) \right\} \le \sqrt{\frac{\log |E|}{2T}} + \frac{\log |E|}{T}.$$

Proof (of Lemma 5). Due to the normalization by $Z_t$, the sequence of defense allocations $\{P_t\}_{t=1}^T$ output by Algorithm 2 is invariant to adding a constant to all elements of matrix $M$.
Let $M'$ be the matrix obtained by adding constant $C$ to all entries of an arbitrary game matrix $M$, and let sequences $\{P_t\}_{t=1}^T$ and $\{P'_t\}_{t=1}^T$ be obtained by running multiplicative update with matrices $M$ and $M'$, respectively. Then, for all $e \in E$ and $t \in [T-1]$,
$$P'_{t+1}(e) = \frac{P_1(e)\, \beta^{\sum_{i=1}^t M'(e, a_i)}}{\sum_{e' \in E} P_1(e')\, \beta^{\sum_{i=1}^t M'(e', a_i)}} = \frac{P_1(e)\, \beta^{\left(\sum_{i=1}^t M(e, a_i)\right) + tC}}{\sum_{e' \in E} P_1(e')\, \beta^{\left(\sum_{i=1}^t M(e', a_i)\right) + tC}} = \frac{P_1(e)\, \beta^{\sum_{i=1}^t M(e, a_i)}}{\sum_{e' \in E} P_1(e')\, \beta^{\sum_{i=1}^t M(e', a_i)}} = P_{t+1}(e).$$

In particular, Algorithm 2 produces the same defense allocation sequence as if the game matrix elements were increased by one to
$$M'(e, a) = \begin{cases} 1 - 1/w(e) & \text{if } e \in a \\ 1 & \text{otherwise.} \end{cases}$$

Because this new matrix has entries in $[0,1]$, we can apply Theorem 6 to prove for the original matrix $M$ that
$$\frac{1}{T} \sum_{t=1}^T M(P_t, a_t) - \min_{P^\star \in \mathcal{D}_{1,E}} \left\{ \frac{1}{T} \sum_{t=1}^T M(P^\star, a_t) \right\} \le \sqrt{\frac{\log |E|}{2T}} + \frac{\log |E|}{T}. \tag{A.1}$$

Now, by definition of the original game matrix,
$$M(P_t, a_t) = \sum_{e \in E} -\left(P_t(e)/w(e)\right) \cdot \mathbf{1}[e \in a_t] = -\sum_{e \in a_t} P_t(e)/w(e) = -B^{-1} \sum_{e \in a_t} d_t(e)/w(e) = -B^{-1}\, \mathrm{cost}(a_t, d_t).$$

Thus Inequality (A.1) is equivalent to
$$-\frac{1}{T} \sum_{t=1}^T B^{-1}\, \mathrm{cost}(a_t, d_t) - \min_{d^\star \in \mathcal{D}_{1,E}} \left\{ -\frac{1}{T} \sum_{t=1}^T B^{-1}\, \mathrm{cost}(a_t, d^\star) \right\} \le \sqrt{\frac{\log |E|}{2T}} + \frac{\log |E|}{T}.$$

Simple algebraic manipulation yields
$$\frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d_t) - \min_{d^\star \in \mathcal{D}_{B,E}} \left\{ \frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d^\star) \right\} = \frac{1}{T} \sum_{t=1}^T \left(\mathrm{payoff}(a_t) - \mathrm{cost}(a_t, d_t)\right) - \min_{d^\star \in \mathcal{D}_{B,E}} \left\{ \frac{1}{T} \sum_{t=1}^T \left(\mathrm{payoff}(a_t) - \mathrm{cost}(a_t, d^\star)\right) \right\} = \frac{1}{T} \sum_{t=1}^T \left(-\mathrm{cost}(a_t, d_t)\right) - \min_{d^\star \in \mathcal{D}_{B,E}} \left\{ \frac{1}{T} \sum_{t=1}^T \left(-\mathrm{cost}(a_t, d^\star)\right) \right\} \le B \sqrt{\frac{\log |E|}{2T}} + \frac{B \log |E|}{T}.$$

Profit (Hidden Edges). The standard algorithms in online learning assume that the rows of the matrix are known in advance.
Here, the edges are not known in advance, and we must relax this assumption using a simulation argument, which is perhaps the least obvious part of the reduction. The defense allocation chosen by Algorithm 1 at time $t$ is precisely the same as the defense allocation that would have been chosen by Algorithm 2 had the defender run Algorithm 2 on the currently visible subgraph. The following lemma formalizes this equivalence. Note that Algorithm 1's parameter is reactive: it corresponds to Algorithm 2's parameter, but for the subgraph induced by the edges revealed so far. That is, $\beta_t$ depends only on edges visible to the defender in round $t$, letting the defender actually run the algorithm.

Lemma 7. Consider an arbitrary round $t \in [T]$. If Algorithms 1 and 2 are run with parameters $\beta_s = \left(1 + \sqrt{2 \log |E_s| / (s+1)}\right)^{-1}$ for $s \in [t]$ and parameter $\beta = \left(1 + \sqrt{2 \log |E_t| / (t+1)}\right)^{-1}$, respectively, with the latter run on the subgraph induced by $E_t$, then the defense allocations $P_{t+1}(e)$ output by the algorithms are identical for all $e \in E_t$.

Proof. If $e \in E_t$ then $\tilde{P}_{t+1}(e) = \beta^{\sum_{i=1}^t M(e, a_i)}$ because $\beta_t = \beta$, and the round-$(t+1)$ defense allocation $P_{t+1}$ of Algorithm 1 is simply $\tilde{P}_{t+1}$ normalized to sum to unity over edge set $E_t$, which is exactly the defense allocation output by Algorithm 2.

Armed with this correspondence, we show that Algorithm 1 is almost as effective as Algorithm 2. In other words, hiding unattacked edges from the defender does not cause much harm to the reactive defender's ability to disincentivize the attacker.
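As a concrete illustration of the known-edges strategy, the following sketch (ours; the toy system and variable names are illustrative) implements Algorithm 2 using the shifted game matrix with entries in $[0,1]$ from the proof of Lemma 5, which by shift invariance yields the same allocations:

```python
# Sketch of Algorithm 2 (multiplicative update over known edges).
# Uses the shifted loss M'(e, a) = 1 - 1/w(e) if e in a, else 1,
# which is equivalent to the original matrix by shift invariance.
import math

def reactive_defense(edges, surfaces, attacks, budget):
    """edges: list of edge ids; surfaces: dict edge -> w(e);
    attacks: list of attacks (each a set of edges); budget: B.
    Returns the allocation d_t = B * P_t played in each round."""
    T, E = len(attacks), len(edges)
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(E) / T))
    P = {e: 1.0 / E for e in edges}              # P_1(e) = 1/|E|
    allocations = []
    for attack in attacks:
        allocations.append({e: budget * P[e] for e in edges})
        # P_{t+1}(e) = P_t(e) * beta^{M'(e, a_t)} / Z_{t+1}
        W = {e: P[e] * beta ** ((1 - 1 / surfaces[e]) if e in attack else 1)
             for e in edges}
        Z = sum(W.values())
        P = {e: W[e] / Z for e in edges}
    return allocations

# Toy run: edge "a" is attacked every round, so (with beta < 1) its
# share of the defense budget grows over time.
allocs = reactive_defense(["a", "b"], {"a": 1.0, "b": 1.0},
                          [{"a"}] * 20, budget=1.0)
assert allocs[-1]["a"] > allocs[0]["a"]
```

Because $\beta < 1$, edges that appear in attacks receive a smaller exponent and hence a relatively larger weight, so budget migrates toward the edges the attacker actually uses.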
Lemma 8. If defense allocations $\{d_{1,t}\}_{t=1}^T$ and $\{d_{2,t}\}_{t=1}^T$ are output by Algorithms 1 and 2 with parameters $\beta_t = \left(1 + \sqrt{2 \log |E_t| / (t+1)}\right)^{-1}$ for $t \in [T-1]$ and $\beta = \left(1 + \sqrt{2 \log |E| / T}\right)^{-1}$, respectively, on a system $(V, E, w, \mathrm{reward}, s)$ and attack sequence $\{a_t\}_{t=1}^T$, then
$$\frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d_{1,t}) - \frac{1}{T} \sum_{t=1}^T \mathrm{profit}(a_t, d_{2,t}) \le \frac{B}{T}\, w^{-1}.$$

Proof. Consider attack $a_t$ from a round $t \in [T]$ and consider an edge $e \in a_t$. If $e \in a_s$ for some $s < t$, then the defense budget allocated to $e$ at time $t$ by Algorithm 2 cannot be greater than the budget allocated by Algorithm 1. Thus, the instantaneous cost paid by the attacker on $e$ when Algorithm 1 defends is at least the cost paid when Algorithm 2 defends: $d_{1,t}(e)/w(e) \ge d_{2,t}(e)/w(e)$. If $e \notin \bigcup_{s=1}^{t-1} a_s$, then $d_{1,s}(e) = 0$ for all $s \in [t]$, by definition. The sequence $\{d_{2,s}(e)\}_{s=1}^{t-1}$ is decreasing and positive, so the discrepancy in instantaneous cost on such an edge is at most $d_{2,1}(e)/w(e)$; each edge contributes such a discrepancy only in the first round in which it is attacked, so the total discrepancy over all $T$ rounds is at most $\sum_{e \in E} d_{2,1}(e)/w(e) \le w^{-1} \sum_{e \in E} d_{2,1}(e) = B w^{-1}$, and averaging over the $T$ rounds yields the stated bound.

Lemma 9. For any system $(V, E, w, \mathrm{reward}, s)$, the value $V = \max_{d \in \mathcal{D}_{B,E}} \min_a \mathrm{cost}(a, d)$ of the cost game satisfies $V = B \big/ \sum_{e \in \mathrm{inc}(s)} w(e) > 0$, where $\mathrm{inc}(v) \subseteq E$ denotes the edges incident to vertex $v$.

Proof. Let $d^\star = \mathrm{argmax}_{d \in \mathcal{D}_{B,E}} \min_a \mathrm{cost}(a, d)$ witness the game's value $V$; then $\max_{d \in \mathcal{D}_{B,E}} \sum_{t=1}^T \mathrm{cost}(a_t, d) \ge \sum_{t=1}^T \mathrm{cost}(a_t, d^\star) \ge TV$. Consider the defensive allocation for each $e \in E$: if $e \in \mathrm{inc}(s)$, let $\tilde{d}(e) = B w(e) \big/ \sum_{e' \in \mathrm{inc}(s)} w(e') > 0$, and otherwise $\tilde{d}(e) = 0$. This allocation is feasible because
$$\sum_{e \in E} \tilde{d}(e) = B\, \frac{\sum_{e \in \mathrm{inc}(s)} w(e)}{\sum_{e \in \mathrm{inc}(s)} w(e)} = B.$$
By definition, $\tilde{d}(e)/w(e) = B \big/ \sum_{e' \in \mathrm{inc}(s)} w(e')$ for each edge $e$ incident to $s$. Therefore, $\mathrm{cost}(a, \tilde{d}) \ge B \big/ \sum_{e \in \mathrm{inc}(s)} w(e)$ for any non-trivial attack $a$, which necessarily includes at least one $s$-incident edge. Finally, $V \ge \min_a \mathrm{cost}(a, \tilde{d})$ proves
$$V \ge B \Big/ \sum_{e \in \mathrm{inc}(s)} w(e). \tag{A.2}$$
Now, consider a defense allocation $d$ and fix an attack $a$ that minimizes the total attacker cost under $d$.
At most one edge $e \in a$ can have $d(e) > 0$, for otherwise the cost under $d$ could be reduced by removing an edge from $a$. Moreover, any attack $a \in \mathrm{argmin}_{e \in \mathrm{inc}(s)} d(e)/w(e)$ minimizes attacker cost under $d$. Thus the maximin $V$ is witnessed by defense allocations that maximize $\min_{e \in \mathrm{inc}(s)} d(e)/w(e)$. This maximization is achieved by allocation $\tilde{d}$, and so Inequality (A.2) is an equality.

We are now ready to prove the main ROA theorem.

Proof (of Theorem 3). First, observe that for all $B > 0$ and all $A, C \in \mathbb{R}$,
$$\frac{A}{B} \le C \iff A - B \le (C - 1) B. \tag{A.3}$$
We will use this equivalence to convert the regret bound on profit to the desired bound on ROA. Together, Theorem 1 and Lemma 9 imply
$$\alpha \sum_{t=1}^T \mathrm{cost}(a_t, d_t) \ge \alpha \max_{d^\star \in \mathcal{D}_{B,E}} \sum_{t=1}^T \mathrm{cost}(a_t, d^\star) - \frac{\alpha B}{2} \sqrt{T \log |E|} - \alpha B \left(\log |E| + w^{-1}\right) \ge \alpha V T - \frac{\alpha B}{2} \sqrt{T \log |E|} - \alpha B \left(\log |E| + w^{-1}\right),$$
where $V = \max_{d \in \mathcal{D}_{B,E}} \min_a \mathrm{cost}(a, d) > 0$. If
$$\sqrt{T} \ge 13 \sqrt{2}\, \left(1 + \alpha^{-1}\right) \sqrt{\log |E|} \sum_{e \in \mathrm{inc}(s)} w(e),$$
we can use the inequalities $V = B \big/ \sum_{e \in \mathrm{inc}(s)} w(e)$, $w^{-1} \le 2 \log |E|$ (since $|E| > 1$), and $\left(\sum_{e \in \mathrm{inc}(s)} w(e)\right)^{-1} \le 1$ to show
$$\sqrt{T} \ge \left( (1 + \alpha) B + \sqrt{\left[(1 + \alpha) B + 24 \alpha V\right] (1 + \alpha) B} \right) \left(2 \sqrt{2}\, \alpha V\right)^{-1} \sqrt{\log |E|},$$
which combines with Theorem 1 and Inequality (A.4) to imply
$$\alpha \sum_{t=1}^T \mathrm{cost}(a_t, d_t) \ge \alpha V T - \frac{\alpha B}{2} \sqrt{T \log |E|} - \alpha B \left(\log |E| + w^{-1}\right) \ge \frac{B}{2} \sqrt{T \log |E|} + B \left(\log |E| + w^{-1}\right) \ge \sum_{t=1}^T \mathrm{profit}(a_t, d_t) - \min_{d^\star \in \mathcal{D}_{B,E}} \sum_{t=1}^T \mathrm{profit}(a_t, d^\star) = \sum_{t=1}^T \left(-\mathrm{cost}(a_t, d_t)\right) - \min_{d^\star \in \mathcal{D}_{B,E}} \sum_{t=1}^T \left(-\mathrm{cost}(a_t, d^\star)\right) = \max_{d^\star \in \mathcal{D}_{B,E}} \sum_{t=1}^T \mathrm{cost}(a_t, d^\star) - \sum_{t=1}^T \mathrm{cost}(a_t, d_t).$$
Finally, combining this with Equivalence (A.3) yields the result
$$\frac{\mathrm{ROA}\left(\{a_t\}_{t=1}^T, \{d_t\}_{t=1}^T\right)}{\min_{d^\star \in \mathcal{D}_{B,E}} \mathrm{ROA}\left(\{a_t\}_{t=1}^T, d^\star\right)} = \frac{\sum_{t=1}^T \mathrm{payoff}(a_t)}{\sum_{t=1}^T \mathrm{cost}(a_t, d_t)} \cdot \max_{d^\star \in \mathcal{D}_{B,E}} \frac{\sum_{t=1}^T \mathrm{cost}(a_t, d^\star)}{\sum_{t=1}^T \mathrm{payoff}(a_t)} = \max_{d^\star \in \mathcal{D}_{B,E}} \frac{\sum_{t=1}^T \mathrm{cost}(a_t, d^\star)}{\sum_{t=1}^T \mathrm{cost}(a_t, d_t)} \le 1 + \alpha.$$

B Lower Bounds

We briefly argue the optimality of Algorithm 1 for a particular graph, i.e., we show that Algorithm 1 has optimal convergence time for small enough $\alpha$, up to constants. (For very large $\alpha$, Algorithm 1 converges in constant time and therefore is optimal up to constants, vacuously.) The argument considers an attacker who randomly selects an attack path, rendering knowledge of past attacks useless. Consider a two-vertex graph where the start vertex $s$ is connected to a vertex $r$ (with reward 1) by two parallel edges $e_1$ and $e_2$, each with an attack surface of 1. Further suppose that the defense budget $B = 1$. We first show a lower bound on all reactive algorithms.

Lemma 10. For all reactive algorithms $A$, the competitive ratio $C$ is at least $(x + \Omega(\sqrt{T}))/x$, i.e., at least $(T + \Omega(\sqrt{T}))/T$ because $x \le T$.

Proof. Consider the following random attack sequence: for each round, select an attack path uniformly i.i.d. from the set $\{e_1, e_2\}$. A reactive strategy must commit to a defense in every round without knowledge of the attack, and therefore every strategy that expends the entire budget of 1 inflicts an expected cost of $1/2$ in every round. Thus, every reactive strategy inflicts a total expected cost of (at most) $T/2$, where the expectation is over the coin tosses of the random attack process. Given an attack sequence, however, there exists a proactive defense allocation with better performance. We can think of the proactive defender as being prescient as to which edge ($e_1$ or $e_2$) will be attacked most frequently and allocating the entire defense budget to that edge.
It is well known (for instance, via an analysis of a one-dimensional random walk) that in such a random process, one of the edges will occur $\Omega(\sqrt{T})$ more often than the other, in expectation. By the probabilistic method, a property that is true in expectation must hold existentially, and, therefore, for every reactive strategy $A$, there exists an attack sequence such that $A$ inflicts a cost $x$, whereas the best proactive strategy (in retrospect) inflicts a cost $x + \Omega(\sqrt{T})$. Because the payoff of each attack is 1, the total reward in either case is $T$. The prescient proactive defender, therefore, yields an ROA of $T/(x + \Omega(\sqrt{T}))$, but the reactive algorithm yields an ROA of $T/x$, establishing the lemma.

Given this lemma, we show that Algorithm 1 is optimal given the information available. In this case, $n = 2$ and, ignoring constants from Theorem 3, we are trying to match a convergence time $T$ of at most $(1 + \alpha^{-1})^2$, which is approximately $\alpha^{-2}$ for small $\alpha$. For large enough $T$, there exists a constant $c$ such that $C \ge (T + c\sqrt{T})/T$. By easy algebra, $(T + c\sqrt{T})/T \ge 1 + \alpha$ whenever $T \le c^2/\alpha^2$, concluding the argument. We can generalize the above optimality argument to $n > 2$ using the combinatorial Lemma 3.2.1 from [6]. Specifically, we can show that for every $n$, there is an $n$-edge graph for which Algorithm 1 is optimal up to constants for small enough $\alpha$.
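The $\Omega(\sqrt{T})$ gap in this argument can be checked empirically. The sketch below (ours, purely illustrative) estimates the prescient proactive defender's advantage over the $T/2$ expected cost of any reactive strategy on the two-edge graph:

```python
# Illustrative simulation of the Lemma 10 lower bound: for T fair coin
# flips, estimate E[max(#e1, #e2)] - T/2, the extra cost the prescient
# proactive defender (whole budget on the majority edge) inflicts beyond
# the T/2 expected cost of any full-budget reactive strategy.
import random

def expected_gap(T, trials=2000, seed=0):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        e1 = sum(rng.random() < 0.5 for _ in range(T))  # rounds attacking e1
        total += max(e1, T - e1)                        # majority-edge count
    return total / trials - T / 2

# Consistent with an Omega(sqrt(T)) gap: the advantage keeps growing with T.
g100, g400 = expected_gap(100), expected_gap(400)
assert g400 > g100 > 0
```

For a one-dimensional random walk this gap is $\mathrm{E}[|S_T|]/2 = \Theta(\sqrt{T})$, so the estimate roughly doubles when $T$ quadruples.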
