On the Semantics of Purpose Requirements in Privacy Policies
Michael Carl Tschantz, Anupam Datta, Jeannette M. Wing
February 18, 2011
CMU-CS-11-102
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213

Abstract

Privacy policies often place requirements on the purposes for which a governed entity may use personal information. For example, regulations, such as HIPAA, require that hospital employees use medical information for only certain purposes, such as treatment. Thus, using formal or automated methods for enforcing privacy policies requires a semantics of purpose requirements to determine whether an action is for a purpose or not. We provide such a semantics using a formalism based on planning. We model planning using a modified version of Markov Decision Processes, which exclude redundant actions for a formal definition of redundant. We use the model to formalize when a sequence of actions is only for or not for a purpose. This semantics enables us to provide an algorithm for automating auditing, and to describe formally and compare rigorously previous enforcement methods.

This research was supported by the US Army Research Office under grant numbers W911NF0910273 and DAAD190210389. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity. This manuscript was submitted to the 24th IEEE Computer Security Foundations Symposium.

Keywords: Privacy, Formal Methods

1 Introduction

Purpose is a key concept for privacy policies. For example, the European Union requires that [The95]:

Member States shall provide that personal data must be [...] collected for specified, explicit and legitimate purposes and not further processed in a way incompatible with those purposes.

The United States also has laws placing purpose requirements on information in some domains such as HIPAA [Off03] for medical information and the Gramm-Leach-Bliley Act [Uni10] for financial records. These laws and best practices motivate organizations to discuss in their privacy policies the purposes for which they will use information. Some privacy policies warn users that the policy provider may use certain information for certain purposes. For example, the privacy policy of a medical provider states, "We may disclose your [protected health information] for public health activities and purposes [...]" [Was03]. Such warnings do not constrain the behavior of the policy provider. Other policies that prohibit using certain information for a purpose do constrain the behavior of the policy provider. Examples include the privacy policy of Yahoo! Email, which states that "Yahoo!'s practice is not to use the content of messages stored in your Yahoo! Mail account for marketing purposes" [Yah10b, emphasis added]. Some policies even limit the use of certain information to an explicit list of purposes. The privacy policy of The Bank of America states, "Employees are authorized to access Customer Information for business purposes only." [Ban05, emphasis added].
The HIPAA Privacy Rule [Off03] requires that covered entities (e.g., health care providers and business partners) only use or disclose protected health information about a patient with that patient's written authorization or:

[...] for the following purposes or situations: (1) To the Individual [...]; (2) Treatment, Payment, and Health Care Operations; (3) Opportunity to Agree or Object; (4) Incident to an otherwise permitted use and disclosure; (5) Public Interest and Benefit Activities; and (6) Limited Data Set for the purposes of research, public health or health care operations.

These examples show that verifying that an organization obeys a privacy policy requires a semantics of purpose requirements. In particular, enforcement requires the ability to determine that the organization under scrutiny obeys at least two classes of purpose requirements. As shown in the example rule from Yahoo!, the first requirement is that the organization does not use certain sensitive information for a given purpose. The second, as the example rule from HIPAA shows, is that the organization uses certain sensitive information only for a given list of purposes. We call the first class of requirements prohibitive (not-for) and the second class restrictive (only-for). Each class requires determining whether the organization's behavior is for a purpose or not, but they differ in whether this indicates a violation or compliance, respectively.

For example, consider a physician accessing a medical record. Under the HIPAA Privacy Rule, the physician may access the record only for certain purposes such as treatment, research, and billing. Thus, for an auditor (either internal or external) to determine whether the physician has obeyed the Privacy Rule requires the auditor to determine the purposes for which the physician accessed the record. The auditor's ability to determine the purposes behind actions is limited since the auditor can only observe the behavior of the physician. As a physician may perform the exact same actions for different purposes, the auditor can never be sure of the purposes behind an action. However, if the auditor determines that the record access could not have possibly been for any of the purposes allowed under the Privacy Rule, then the auditor knows that the physician violated the policy.

Manual enforcement of these privacy policies is labor intensive and error prone. Thus, to reduce costs and make their operations more trustworthy, organizations would like to automate the enforcement of the privacy policies governing their operations; tool support for this activity is beginning to emerge in the market. For example, FairWarning offers automated services for the detection of privacy breaches in a hospital setting [Fai]. Meanwhile, previous research has proposed formal methods to enforce purpose requirements [AKSX02, BBL05, HA05, AF07, BL08, PGY08, JSNS09, NBL+10, EKWB11]. However, each of these endeavors starts by assuming that actions or sequences of actions are labeled with the purposes they are for. They avoid analyzing the meaning of purpose and provide no method of performing this labeling other than through intuition alone.
The absence of a formal semantics to guide this determination has hampered the development of methods for ensuring policy compliance. Such a definition would provide insights into how to develop tools that identify suspicious accesses in need of detailed auditing and algorithms for determining which purposes an action could possibly be for. Such a definition would also show which enforcement approaches are most accurate. More fundamentally, such a definition could frame the scientific basis of a societal and legal understanding of purpose and of privacy policies that use the notion of purpose. Such a foundation can, for example, guide implementers as they codify in software an organization's interpretation of internal and government-imposed privacy policies.

1.1 Solution Approach

The goal of this work is to study the meaning of purpose in the context of enforcing privacy policies and propose formal definitions suitable for automating the enforcement of purpose requirements. Since post-hoc auditing provides the perspective often required to determine the purpose of an action, we focus on automated auditing. However, we believe our semantics is applicable to other formal methods and may also clarify informal reasoning.

We find that planning is central to the meaning of purpose. We see the role of planning in the definition of the sense of the word "purpose" most relevant to our work [OED89]:

The object for which anything is done or made, or for which it exists; the result or effect intended or sought; end, aim.

Similarly, work on cognitive psychology calls purpose "the central determinant of behavior" [DKP96, p19]. If our auditors are concerned with rational auditees (the person or organization being audited), then we may assume the auditee uses a plan to determine what actions it will perform in its attempt to achieve its purposes. We (as have philosophers [Tay66]) conclude that if an auditee selects to perform an action a while planning to achieve the purpose p, then the auditee's action a is for the purpose p. In this paper, we make these notions formal.

1.2 Overview of Contributions

We first present an example that illustrates key factors in determining whether an action is for a purpose or not. We find that the auditor should model the auditee as an agent that interacts with an environment model. The environment model shows how the actions the auditee can perform affect the state of the environment. It also models how well each state satisfies each purpose that the modeled auditee might possibly find motivating. Limiting consideration to one purpose, the environment model becomes a Markov Decision Process (MDP) where the degree of satisfaction of that purpose is the reward function of the MDP. If the auditee is motivated to act by only that purpose, then the auditee's actions must correspond to an optimal plan for this MDP and these actions are for that purpose. Additionally, we use a stricter definition of optimal than standard MDPs to reject redundant actions that neither decrease nor increase the total reward. We formalize this model in Section 3.
For example, consider a physician ordering a medical test and an auditor attempting to determine whether the physician could have ordered this test for the purpose of treatment (and is therefore in compliance with the HIPAA Privacy Rule). The auditor would examine an MDP modeling the physician's environment with the quality of treatment as the reward function to be optimized. If no optimal plans for this MDP involve ordering the test, then the auditor can conclude definitively that the physician did not order the test for treatment.

We make this auditing process formal in Section 4, where we discuss the ramifications of the auditor only observing the behaviors of the auditee and not the underlying planning process of the auditee that resulted in these behaviors. We show that in some circumstances, the auditor can still acquire enough information to determine that the auditee violated the privacy policy. To do so, the auditor must first use our MDP model to construct all the possible behaviors that the privacy policy allows and then compare it with all the behaviors of the auditee that could have resulted in the observed auditing log. Section 5 presents an algorithm for auditing based on our formal definitions, illustrating the relevance of our work.

The semantics discussed thus far is sufficient to put the previous work on enforcing privacy policies on firm semantic ground. In Section 6, we do so and discuss the strengths and weaknesses of each such approach. In particular, we find that each approach may be viewed as a method of enforcing the policy given the set of all possible allowed behaviors, an intermediate result of our analysis. We compare the previous auditing approaches, which differ in their trade-offs between auditing complexity and accuracy of representing this set of behaviors.

Most auditees are actually interested in multiple purposes and select plans that simultaneously satisfy as many of the desired purposes as possible. Handling the interactions between purposes complicates our semantics. In particular, actions selected by a single plan may be for different purposes. In Section 7, we present examples showing when our semantics can extend to handle multiple purposes and when difficulties arise in determining which purposes an action is for when an auditee is attempting to satisfy various purposes at once. Currently, the state of the art in the understanding of human planning limits our ability to improve upon our semantics. However, as this understanding improves, one may replace our MDP-like formalism with more detailed ones while retaining our general framework of defining purpose in terms of planning. We end by discussing other related work, future work, and conclusions.

Our contributions include:

• The first semantic formalism of when a sequence of actions is for a purpose,
• An auditing algorithm for this formalism,
• The resituating of previous policy enforcement methods in our formalism and a comparative study of their expressiveness, and
• The first attempt to formally consider the effects on auditing caused by interactions among multiple purposes.
Although motivated by our goal to formalize the notions of use and purpose prevalently found in privacy policies, our work is more generally applicable to a broad range of policies, such as fiscal policies governing travel reimbursement.

2 Motivation of Our Approach

We start with an informal example that suggests that an action is for a purpose if the action is part of a plan for achieving that purpose. Consider a physician working at a hospital who, as a specialist, also owns a private practice that tests for bone damage using a novel technique for extracting information from X-ray images. After seeing a patient and taking an X-ray, the physician forwards the patient's medical record including the X-ray to his private practice to apply this new technology. As this action entails the transmission of protected health information, the physician will have violated HIPAA if this transmission is not for one of the purposes HIPAA allows. The physician would also run afoul of the hospital's own policies governing when outside consultations are permissible unless this action was for a legitimate purpose. Finally, the patient's insurance will only reimburse the costs associated with this consultation if a medical reason (purpose) exists for them.

The physician claims that this consultation was for reaching a diagnosis. As such, it is for the purpose of treatment and, therefore, allowed under each of these policies. The hospital auditor, however, has selected this action for investigation since the physician's making a referral to his own private practice makes the alternate motivation of profit possible. Whether or not the physician violated these policies depends upon details not presented in the above description. For example, we would expect the auditor to ask questions such as: (1) Was the test relevant to the patient's condition? (2) Did the patient benefit medically from having the test? (3) Was this the best option for the patient? We will introduce these details as we introduce each of the factors relevant to the purposes behind the physician's actions.

States and Actions. Sometimes the purposes for which an action is taken depend upon the previous actions and the state of the system. In the above example, whether or not the test is relevant depends upon the condition of the patient, that is, the state that the patient is in. While an auditor could model the act of transmitting the record as two (or more) different actions based upon the state of the patient, modeling two concepts with one formalism could introduce errors. A better approach is to model the state of the system. The state captures the context in which the physician takes an action and allows for the purposes of an action to depend upon the actions that precede it.

The physician's own actions also affect the state of the system and, thus, the purposes for which his actions are. For example, had the physician transmitted the patient's medical record before taking the X-ray, then the transmission could not have been for treatment since the physician's private practice only operates on X-rays and would have no use for the record without the X-ray.
The above example illustrates that when an action is for a purpose, the action is part of a sequence of actions that can lead to a state in which some goal associated with the purpose is achieved. In the example, the goal is reaching a diagnosis. Only when the X-ray is first added to the record is this goal reached.

Non-redundancy. Some actions, however, may be part of such a sequence without actually being for the purpose. For example, suppose that the patient's X-ray clearly shows the patient's problem. Then, the physician can reach a diagnosis without sending the record to the private practice. Thus, while both taking the X-ray and sending the medical record might be part of a sequence of actions that leads to achieving a diagnosis, the transmission does not actually contribute to achieving the diagnosis: the physician could omit it and the diagnosis could still be reached.

From this example, it may be tempting to conclude that an action is for a purpose only if that action is necessary to achieve that purpose. However, consider a physician who has a choice between two specialists to whom to send the medical record and must do so to reach a diagnosis. In this scenario, the physician's sending the record to the first specialist is not necessary since he could send it to the second. Likewise, sending the record to the second specialist is not necessary. Yet, the physician must send the record to one or the other specialist and that transmission will be for the purpose of diagnosis. Thus, an action may be for a purpose without being necessary for achieving the purpose.

Rather than necessity, we use the weaker notion of non-redundancy found in work on the semantics of causation (e.g., [Mac74]). Given a sequence of actions that achieves a goal, an action in it is redundant if that sequence with that action removed (and otherwise unchanged) also achieves the goal. An action is non-redundant if removing that action from the sequence would result in the goal no longer being achieved. Thus, non-redundancy may be viewed as necessity under an otherwise fixed sequence of actions. For example, suppose the physician decides to send the medical record to the first specialist. Then, the sequence of actions modified by removing this action would not lead to a state in which a diagnosis is reached. Thus, the transmission of the medical record to the first specialist is non-redundant. However, had the X-ray revealed to the physician the diagnosis without needing to send it to a specialist, the sequence of actions that results from removing the transmission from the original sequence would still result in a diagnosis. Thus, the transmission would be redundant.

Quantitative Purposes. Above we implicitly presumed that the diagnosis from each specialist had equal quality. This need not be the case. Indeed, many purposes are actually fulfilled to varying degrees. For example, the purpose of marketing is never completely achieved since there is always more marketing to do. Thus, we model a purpose by assigning to each state-action pair a number that describes how well that action fulfills that purpose when performed in that state. We require that the physician selects the test that maximizes the quality of the diagnosis as determined by total purpose score accumulated over all his actions.
Probabilistic Systems. The success of many medical tests and procedures is probabilistic. For example, with some probability the physician's test may fail to reach a diagnosis. The physician would still have transmitted the medical record for the purpose of diagnosis even if the test failed to reach one. This possibility affects our semantics of purpose: now an action may be for a purpose even if that purpose is never achieved. To account for such probabilistic events, we model the environment in which the physician operates as probabilistic. For an action to be for a purpose, we require that there be a non-zero probability of the purpose being achieved and that the physician attempts to maximize the expected reward. In essence, we require that the physician attempts to achieve a diagnosis. Thus, the auditee's plan determines the purposes behind his actions rather than just the actions themselves.

3 Planning for a Purpose

In this section, we present a formalism for planning that accounts for quantitative purposes, probabilistic systems and non-redundancy. We start by modeling the environment in which the auditee operates as a Markov Decision Process (MDP), a natural model for probabilistic systems. The reward function of the MDP quantifies the degree of satisfaction of a purpose upon taking an action from a state. If the auditee is motivated to action by only that purpose, then the auditee's actions must correspond to an optimal plan for this MDP and these actions are for that purpose. We develop a stricter definition of optimal than standard MDPs, which we call NMDPs, for Non-redundant MDPs, to reject redundant actions that neither decrease nor increase the total reward. We end with an example illustrating the use of an NMDP to model an audited environment.

3.1 Markov Decision Processes

An MDP may be thought of as a probabilistic automaton where transitions are labeled with a reward in addition to an action. Rather than having accepting or goal states, the "goal" of an MDP is maximizing the total reward over time. An MDP is a tuple m = ⟨Q, A, t, r, γ⟩ where Q is a set of states, A is a set of actions, t : Q × A → D(Q) a transition function from a state and an action to a distribution over states (represented as D(Q)), r : Q × A → ℝ a reward function, and γ a discount factor such that 0 < γ < 1.

For each state q in Q, the agent using the MDP to plan selects an action a from A to perform. Upon performing the action a in the state q, the agent receives the reward r(q, a). The environment then transitions to a new state q′ with probability µ(q′) where µ is the distribution provided by t(q, a). The goal of the agent is to select actions to maximize its expected total discounted reward E[Σ_{i=0}^∞ γ^i ρ_i] where i ∈ ℕ (the set of natural numbers) ranges over time modeled as discrete steps, ρ_i is the reward at time i, and the expectation is taken over the probabilistic transitions.

We formalize the agent's plan as a stationary strategy (commonly called a "policy", but we reserve that word for privacy policies). A stationary strategy is a function σ from the state space Q to the set A of actions (i.e., σ : Q → A) such that at a state q in Q, the agent always selects to perform the action σ(q).
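For concreteness, we accompany the definitions of this section with small illustrative sketches in Python. They are not part of the formalism: the class name MDP, the dictionary-based encoding of t and r, and the assumption of a finite, explicitly enumerated model are our own choices.

from dataclasses import dataclass
from typing import Dict, Hashable, Tuple

State = Hashable
Action = Hashable

@dataclass
class MDP:
    """An environment model m = <Q, A, t, r, gamma>.

    t maps a (state, action) pair to a distribution over next states,
    given as a dict from state to probability; r maps a (state, action)
    pair to its reward; gamma is the discount factor, 0 < gamma < 1.
    """
    states: set
    actions: set
    t: Dict[Tuple[State, Action], Dict[State, float]]
    r: Dict[Tuple[State, Action], float]
    gamma: float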
Given a strategy σ for an MDP m, its expected total discounted reward is

V_m(σ, q) = r(q, σ(q)) + γ Σ_{q′ ∈ Q} t(q, σ(q))(q′) · V_m(σ, q′)

The agent selects one of the strategies that optimizes this equation. We denote this set of optimal strategies as opt(⟨Q, A, t, r, γ⟩), or when the transition system is clear from context, as opt(r). Such strategies are sufficient to maximize the agent's expected total discounted reward despite only depending upon the MDP's current state.

Given the strategy σ and the actual results of the probabilistic transitions yielded by t, the agent exhibits an execution. We represent this execution as an infinite sequence e = [q_1, a_1, q_2, a_2, ...] of alternating states and actions starting with a state, where q_i is the ith state that the agent was in and a_i is the ith action the agent took, for all i in ℕ. We say an execution e is consistent with a strategy σ iff a_i = σ(q_i) for all i in ℕ where a_i is the ith action in e and q_i is the ith state in e. We call a finite prefix of an execution a behavior. A behavior is consistent with a strategy if it can be extended to an execution consistent with that strategy.

Under this formalism, the auditee plays the role of the agent optimizing the MDP to plan. We presume that each purpose may be modeled as a reward function. That is, we assume the degree to which a purpose is satisfied may be captured by a function from states and actions to a real number. The higher the number, the higher the degree to which that purpose is satisfied. When the auditee wants to plan for a purpose p, it uses a reward function, r_p, such that r_p(q, a) is the degree to which taking the action a from state q aids the purpose p. We also assume that the expected total discounted reward can capture the degree to which a purpose is satisfied over time. We say that the auditee plans for the purpose p when the auditee adopts a strategy σ that is optimal for the MDP ⟨Q, A, t, r_p, γ⟩. The appendix provides additional background information on MDPs.
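Continuing the sketch above, the optimal value V*_m(q) = max_σ V_m(σ, q) and the actions that strategies in opt(r) may take at each state can be approximated with standard value iteration. This is only one way to compute them; the development below does not depend on it, and the function names and tolerances are our own.

def value_iteration(m: MDP, tol: float = 1e-9) -> Dict[State, float]:
    """Approximate V*_m(q) = max_sigma V_m(sigma, q) by value iteration.
    Assumes every state has at least one action defined in m.t."""
    v = {q: 0.0 for q in m.states}
    while True:
        delta = 0.0
        for q in m.states:
            best = max(
                m.r[(q, a)] + m.gamma * sum(p * v[q2] for q2, p in m.t[(q, a)].items())
                for a in m.actions if (q, a) in m.t)
            delta = max(delta, abs(best - v[q]))
            v[q] = best
        if delta < tol:
            return v

def optimal_actions(m: MDP, v: Dict[State, float]) -> Dict[State, set]:
    """For each state, the actions taken there by some strategy in opt(r):
    exactly the actions whose one-step lookahead value attains V*."""
    eps = 1e-6  # numerical slack, since v only approximates V*
    out = {}
    for q in m.states:
        qvals = {a: m.r[(q, a)] + m.gamma * sum(p * v[q2] for q2, p in m.t[(q, a)].items())
                 for a in m.actions if (q, a) in m.t}
        best = max(qvals.values())
        out[q] = {a for a, val in qvals.items() if val >= best - eps}
    return out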
3.2 Non-redundancy

MDPs do not require that strategies be non-redundant. Even given that the auditee had an execution e from using a strategy σ in opt(r_p), some actions in e might not be for the purpose p. The reason is that some actions may be redundant despite being costless. The MDP optimization criterion behind opt prevents redundant actions from delaying the achievement of a goal as the reward associated with that goal would be further discounted, making such redundant actions sub-optimal. However, the optimization criterion is not affected by redundant actions when they appear after all actions that provide non-zero rewards. Intuitively, the hypothetical agent planning only for the purpose in question would not perform such unneeded actions even if they have zero reward. Thus, to create our formalism of non-redundant MDPs (NMDPs), we replace opt with a new optimization criterion opt* that prevents these redundant actions while maintaining the same transition structure as a standard MDP.

To account for redundant actions, we must first contrast them with doing nothing. Thus, we introduce a distinguished action N that stands for doing nothing. For all states q, N labels a transition with zero reward (i.e., r(q, N) = 0) that is a self-loop (i.e., t(q, N)(q) = 1). (We could put N on only the subset of states that represent possible stopping points by slightly complicating our formalism.) Since we only allow deterministic stationary strategies and N only labels self-loops, this decision is irrevocable: once nothing is done, it is done forever. As selecting to do nothing results in only zero rewards henceforth, it may be viewed as stopping with the previously acquired total discounted reward.

Given an execution e, let active(e) denote the prefix of e before the first instance of the nothing action. active(e) will be equal to e in the case where e does not contain the nothing action. We use the idea of nothing to make formal when one execution intuitively contains more actions than another despite both being of infinite length. An execution e_1 is a proper sub-execution of an execution e_2 if and only if active(e_1) is a proper subsequence of active(e_2) using the standard notion of subsequence. Note that if e_1 does not contain the nothing action, it cannot be a proper sub-execution of any execution.

To compare strategies, we construct all the executions they could produce. To do so, let a contingency κ be a function from Q × A × ℕ to Q such that κ(q, a, i) is the state that results from taking the action a in the state q the ith time. We say that a contingency κ is consistent with an MDP iff κ only picks states to which the transition function t of the MDP assigns a non-zero probability (i.e., for all q in Q, a in A, and i in ℕ, t(q, a)(κ(q, a, i)) > 0). Given an MDP m, let m(q, κ) be the possibly infinite state model that results from having κ resolve all the probabilistic choices in m and having the model start in state q. Let m(q, κ, σ) denote the execution that results from using the strategy σ and state q in the non-probabilistic model m(q, κ). Henceforth, we only consider contingencies consistent with the model under discussion.

Given two strategies σ and σ′, we write σ′ ≺ σ if and only if for all contingencies κ and states q, m(q, κ, σ′) is a proper sub-execution of or equal to m(q, κ, σ), and for at least one contingency κ′ and state q′, m(q′, κ′, σ′) is a proper sub-execution of m(q′, κ′, σ). Intuitively, σ′ proves that σ produces a redundant execution under κ′ and q′. We define opt*(r) to be the subset of opt(r) holding only strategies σ such that for no σ′ ∈ opt(r) does σ′ ≺ σ. The following theorem, proved in the appendix, shows that non-redundant optimal strategies always exist.

Theorem 1. For all environment models m, opt*(m) is not empty.
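The definitions of active(e) and proper sub-execution operate on sequences and are easy to state operationally. The sketch below is our own illustration; it assumes execution prefixes are given as lists of alternating states and actions, and it decides the relation only for prefixes that already contain the nothing action, so that their active parts are fully determined.

N = "N"  # the distinguished nothing action

def active_prefix(prefix):
    """Return active(e) for an execution prefix [q1, a1, q2, a2, ...]:
    everything strictly before the first occurrence of the nothing action."""
    for i in range(1, len(prefix), 2):  # actions sit at the odd positions
        if prefix[i] == N:
            return prefix[:i]
    return prefix

def is_subsequence(xs, ys):
    """True iff xs occurs within ys in order (not necessarily contiguously)."""
    it = iter(ys)
    return all(any(x == y for y in it) for x in xs)

def is_proper_subexecution(e1, e2):
    """e1 is a proper sub-execution of e2: active(e1) is a proper subsequence
    of active(e2). Decidable from prefixes once both contain the nothing action."""
    a1, a2 = active_prefix(e1), active_prefix(e2)
    return a1 != a2 and is_subsequence(a1, a2)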
3.3 Example

Suppose an auditor is inspecting a hospital and comes across a physician referring a medical record to his own private practice for analysis of an X-ray as described in Section 2. As physicians may only make such referrals for the purpose of treatment (treat), the auditor may find the physician's behavior suspicious. To investigate, the auditor may formally model the hospital using our formalism. The auditor would construct the NMDP m_ex = ⟨Q_ex, A_ex, t_ex, r^treat_ex, γ_ex⟩ shown in Figure 1.

Figure 1: The environment model m_ex that the physician used. Circles represent states, block arrows denote possible actions, and squiggly arrows denote probabilistic outcomes. Self-loops of zero reward under all actions, including the special action N, are not shown.

The figure conveys all components of the NMDP except γ_ex. For instance, the block arrow from the state 1 labeled take and the squiggly arrows leaving it denote that after the agent performs the action take from state 1, the environment will transition to the state 2 with probability 0.9 and to state 4 with probability 0.1 (i.e., t_ex(1, take)(2) = 0.9 and t_ex(1, take)(4) = 0.1). The number over the block arrow further indicates the degree to which the action satisfies the purpose of treat. In this instance, it shows that r^treat_ex(1, take) = 0. This transition models the physician taking an X-ray. With probability 0.9, he is able to make a diagnosis right away (from state 2); with probability 0.1, he must send the X-ray to his practice to make a diagnosis. Similarly, the transition from state 4 models that his practice's test has a 0.8 success rate of making a diagnosis; with probability 0.2, no diagnosis is ever reached.

Using the model, the auditor computes opt(r^treat_ex), which consists of those strategies that maximize the expected total discounted degree of satisfaction of the purpose of treatment where the expectation is over the probabilistic transitions of the model. opt(r^treat_ex) includes the appropriate strategy σ_1 where σ_1(1) = take, σ_1(4) = send, σ_1(2) = σ_1(3) = σ_1(5) = diagnose, and σ_1(6) = N. Furthermore, opt(r^treat_ex) excludes the redundant strategy σ_2 that performs a redundant send where σ_2 is the same as σ_1 except for σ_2(2) = send. Performing the extra action send delays the reward of 12 for achieving a diagnosis, resulting in its discounted reward being γ_ex^2 · 12 instead of γ_ex · 12 and, thus, the strategy is not optimal.

However, opt(r^treat_ex) does include the redundant strategy σ_3 that is the same as σ_1 except for σ_3(6) = send. opt(r^treat_ex) includes this strategy despite the send actions from state 6 being redundant since no positive rewards follow the send actions. Fortunately, opt*(r^treat_ex) does not include σ_3 since σ_1 is both in opt(r^treat_ex) and σ_1 ≺ σ_3. To see that σ_1 ≺ σ_3, note that for every contingency κ and state q, m_ex(q, κ, σ_1) has the form b followed by an infinite sequence of nothing actions (interleaved with the state 6) for some finite prefix b. For the same κ, m_ex(q, κ, σ_3) has the form b followed by an infinite sequence of send actions (interleaved with the state 6) for the same b. Thus, m_ex(q, κ, σ_1) is a proper sub-execution of m_ex(q, κ, σ_3).
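As a concrete check of this example, the following sketch encodes one reading of Figure 1 using the MDP class and value-iteration sketches from Section 3.1. The transition targets are inferred from the prose description of the figure and γ_ex is set arbitrarily to 0.9; the output illustrates why σ_2 is excluded from opt(r^treat_ex) while the redundant σ_3 survives opt and is only removed by opt*.

# One reading of Figure 1: states 1-6; take from state 1, send from states 2
# and 4, diagnose from states 2, 3, and 5; every other (state, action) pair,
# including the nothing action N, is a zero-reward self-loop (not drawn).
N = "N"
states = {1, 2, 3, 4, 5, 6}
actions = {"take", "send", "diagnose", N}

t = {(q, a): {q: 1.0} for q in states for a in actions}  # default self-loops
r = {(q, a): 0.0 for q in states for a in actions}       # of zero reward
t[(1, "take")] = {2: 0.9, 4: 0.1}
t[(2, "send")] = {3: 1.0}
t[(4, "send")] = {5: 0.8, 6: 0.2}
for q in (2, 3, 5):
    t[(q, "diagnose")] = {6: 1.0}
    r[(q, "diagnose")] = 12.0

m_ex = MDP(states, actions, t, r, gamma=0.9)  # gamma_ex chosen arbitrarily
print(optimal_actions(m_ex, value_iteration(m_ex)))
# Expected: take at state 1, send at state 4, diagnose at states 2, 3 and 5
# (so sigma_2, which sends from state 2, is not optimal).  At state 6 every
# action ties at value 0, which is why the redundant sigma_3 is in opt and
# is only removed by opt*.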
4 Auditing

In the above example, the auditor constructed a model of the environment in which the auditee operates. The auditor must use the model to determine if the auditee obeyed the policy. We first discuss this process for auditing restrictive policy rules and revisit the above example. Then, we discuss the process for prohibitive policy rules. In the next section, we provide an auditing algorithm that automates comparing the auditee's behavior, as recorded in a log, to the set of allowed behaviors.

4.1 Auditing Restrictive Rules

Suppose that an auditor would like to determine whether an auditee performed some logged actions only for the purpose p. The auditor can compare the logged behavior to the behavior that a hypothetical agent would perform when planning for the purpose p. In particular, the hypothetical agent selects a strategy from opt*(⟨Q, A, t, r_p, γ⟩) where Q, A, and t model the environment of the auditee; r_p is a reward function modeling the degree to which the purpose p is satisfied; and γ is an appropriately selected discount factor. If the logged behavior of the auditee would never have been performed by the hypothetical agent, then the auditor knows that the auditee violated the policy.

In particular, the auditor must consider all the possible behaviors the hypothetical agent could have performed. For a model m, let behv*(r_p) represent this set where a finite prefix b of an execution is in behv*(r_p) if and only if there exists a strategy σ in opt*(r_p), a contingency κ, and a state q such that b is a subsequence of m(q, κ, σ).

The auditor must compare behv*(r_p) to the set of all behaviors that could have caused the auditor to observe the log that he did. We presume that the log ℓ was created by a process log that records features of the current behavior. That is, log : B → L where B is the set of behaviors and L the set of logs, and ℓ = log(b) where b is the prefix of the actual execution of the environment available at the time of auditing. The auditor must consider all the behaviors in log⁻¹(ℓ) as possible where log⁻¹ is the inverse of the logging function. In the best case for the auditor, the log records the whole prefix b of the execution that transpired until the time of auditing, in which case log⁻¹(ℓ) = {ℓ}.

If log⁻¹(ℓ) ∩ behv*(r_p) is empty, then the auditor may conclude that the auditee did not plan for the purpose p, and, thus, violated the rule that the auditee must only perform the actions recorded in ℓ for the purpose p; otherwise, the auditor must consider it possible that the auditee planned for the purpose p. If log⁻¹(ℓ) ⊆ behv*(r_p), the auditor might be tempted to conclude that the auditee surely obeyed the policy rule. However, as illustrated in the second example below, this is not necessarily true. The problem is that log⁻¹(ℓ) might have a non-empty intersection with behv*(r_{p′}) for some other purpose p′. In this case, the auditee might actually have been planning for the purpose p′ instead of p. Indeed, given the likelihood of such other purposes for non-trivial scenarios, we consider proving compliance practically impossible. However, this incapability is of little consequence: log⁻¹(ℓ) ⊆ behv*(r_p) does imply that the auditee is behaving as though he is obeying the policy. That is, in the worst case, the auditee is still doing the right things even if for the wrong reasons.
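Once behv*(r_p) and log⁻¹(ℓ) are in hand, the checks of this section and of Section 4.3 below are set comparisons. The following sketch is our own illustration; it assumes both sets are given explicitly as finite sets of behaviors, which is feasible only for small models.

def audit_restrictive(possible_behaviors, behv_star_p):
    """Restrictive rule ("only for p").  possible_behaviors is log^{-1}(l) and
    behv_star_p is behv*(r_p), both given as finite sets of behaviors
    (tuples of alternating states and actions).  A violation is proven exactly
    when no behavior that could explain the log could have been exhibited by
    an agent planning for p."""
    if possible_behaviors.isdisjoint(behv_star_p):
        return "violation"
    return "inconclusive"  # the auditee may have planned for p, or merely appears to

def audit_prohibitive(possible_behaviors, behv_star_p):
    """Prohibitive rule ("not for p"), discussed in Section 4.3 below:
    compliance is proven exactly when the same intersection is empty."""
    if possible_behaviors.isdisjoint(behv_star_p):
        return "compliant"
    return "inconclusive"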
4.2 Example

Below we revisit the example of Section 3.3. We consider two cases. In the first, the auditor shows that the physician violated the policy. In the second, auditing is inconclusive.

Violation Found. Suppose after constructing the model as above in Section 3.3, the auditor maps the actions recorded in the access log ℓ_1 to the actions of the model m_ex, and finds that log⁻¹(ℓ_1) holds only a single behavior: b_1 = [1, take, 2, send, 3, diagnose, 6, N, 6]. Next, using opt*(r^treat_ex), as computed above, the auditor constructs the set behv*(r^treat_ex) of all behaviors an agent planning for treatment might exhibit. The auditor would find that b_1 is not in behv*(r^treat_ex). To see this, note that every execution e_1 that has b_1 as a prefix is generated from a strategy σ such that σ(2) = send. The strategy σ_2 from Section 3.3 is one such strategy. None of these strategies are members of opt(r^treat_ex) for the same reason as σ_2 is not a member. Thus, b_1 cannot be in behv*(r^treat_ex). As log⁻¹(ℓ_1) ∩ behv*(r^treat_ex) is empty, the audit reveals that the physician violated the policy.

Inconclusive. Now suppose that the auditor sees a different log ℓ_2 such that log⁻¹(ℓ_2) = {b_2} where b_2 = [1, take, 4, send, 5, diagnose, 6, N, 6]. In this case, our formalism would not find a violation since b_2 is in behv*(r^treat_ex). In particular, the strategy σ_1 from above produces the behavior b_2 under the contingency that selects the bottom probabilistic transition from state 1 to state 4 under the action take. Nevertheless, the auditor cannot be sure that the physician obeyed the policy. For example, consider the NMDP m′_ex that is m_ex altered to use the reward function r^profit_ex instead of r^treat_ex. r^profit_ex assigns a reward of zero to all transitions except for the send actions from states 2 and 4, to which it assigns a reward of 9. σ_1 is in opt*(r^profit_ex), meaning that not only the same actions (those in b_2), but even the exact same strategy can be either for the allowed purpose treat or the disallowed purpose profit. Thus, if the physician did refer the record to his practice for profit, he cannot be caught as he has tenable deniability of his ulterior motive of profit.

4.3 Auditing Prohibitive Rules

In the above example, the auditor was enforcing the rule that the physician's actions be only for treatment. Now, consider auditing to enforce the rule that the physician's actions are not for personal profit. After seeing the log ℓ, the auditor could check whether log⁻¹(ℓ) ∩ behv*(r^profit_ex) is empty. If so, then the auditor knows that the policy was obeyed. If not, then the auditor can neither prove nor disprove a violation. In the above example, just as the auditor is unsure whether the actions were for the required purpose of treatment, the auditor is unsure whether the actions are not for the prohibited purpose of profit.

An auditor might decide to investigate some of the cases where log⁻¹(ℓ) ∩ behv*(r^profit_ex) is not empty. In this case, the auditor could limit his attention to only those possible violations of a prohibitive rule that cannot be explained away by some allowed purpose. For example, in the inconclusive example above, the physician's actions can be explained with the allowed purpose of treatment. As the physician has tenable deniability, it is unlikely that investigating his actions would be a productive use of the auditor's time.
Thus, the auditor should limit his attention to those logs ℓ such that both log⁻¹(ℓ) ∩ behv*(r^profit_ex) is non-empty and log⁻¹(ℓ) ∩ behv*(r^treat_ex) is empty.

A similar additional check using disallowed purposes could be applied to enforcing restrictive rules. However, for restrictive rules, this check would identify cases where the auditee's behavior could have been either for the allowed purpose or a disallowed purpose. Thus, it would serve to find additional cases to investigate and increase the auditor's workload rather than reduce it. Furthermore, the auditee would have tenable deniability for these possible ulterior motives, making these investigations a poor use of the auditor's time.

5 Auditing Algorithm

We would like to automate the auditing process described above. To this end, we present in Figure 2 an algorithm Audit that aids the auditor in comparing the log to the set of allowed behaviors. As we are not interested in the details of the logging process and would like to focus on the planning aspects of our semantics, we limit our attention to the case where log(b) = b. As proved below (Theorem 2), Audit(m, b) returns true if and only if log⁻¹(b) ∩ behv*(m) is empty. In the case of a restrictive rule, the auditor may conclude that the policy was violated when Audit returns true. In the case of a prohibitive rule, the auditor may conclude the policy was obeyed when Audit returns true.

Audit operates in two steps. The first checks to make sure that the behavior b is not inherently redundant (lines 01–05). If it is, then log⁻¹(b) ∩ behv*(m) will be empty and the algorithm returns true. Audit checks b by comparing the actions taken in each state to doing nothing. If the expected total discounted reward for doing nothing in a state q is higher than that for doing the action a in q, then a introduces redundancy into any strategy σ such that σ(q) = a. Thus, if b = [..., q, a, ...], we may conclude that log⁻¹(b) ∩ behv*(m) is empty.

The second step compares the optimal values of two MDPs. One of them is the NMDP m treated as an MDP, which is already optimized during the first step. The other, m′, is constructed from m (lines 07–17) so that only the actions in the log b are selected during optimization. If the expected total discounted reward of each of these MDPs is unequal, then log⁻¹(b) ∩ behv*(m) is empty.

Below we formalize these ideas. Lemma 1 justifies our two-step approach while Lemmas 2 and 3 justify how we perform the first and second step, respectively. They allow us to conclude the correctness of our algorithm in Theorem 2. We defer proofs and additional propositions to the appendix.
Audit(⟨Q, A, t, r, γ⟩, [q_0, a_1, q_1, ..., a_n, q_n]):
01  V*_m := solveMDP(⟨Q, A, t, r, γ⟩)
02  for(i := 0; i < n; i++):
03    if(a_{i+1} ≠ N):
04      if(r[q_i][a_{i+1}] + γ Σ_{j=0}^{|Q|} t[q_i][a_{i+1}][j] · V*_m[j] ≤ 0):
05        return true
06  r* := 0
07  for(j := 0; j < |Q|; j++):
08    for(k := 0; k < |A|; k++):
09      r′[j][k] := r[j][k]
10      if(r* < absoluteValue(r[j][k])):
11        r* := absoluteValue(r[j][k])
12  ω := 2 · r* / (1 − γ) + 1
13  for(i := 0; i < n; i++):
14    for(k := 0; k < |A|; k++):
15      if(k ≠ a_{i+1}):
16        r′[q_i][k] := −ω
17  m′ := ⟨Q, A, t, r′, γ⟩
18  V*_{m′} := solveMDP(⟨Q, A, t, r′, γ⟩)
19  for(j := 0; j < |Q|; j++):
20    if(V*_m[j] = V*_{m′}[j]):
21      return false
22  return true

Figure 2: The algorithm Audit. solveMDP may be any MDP solving algorithm. The algorithm assumes functions are represented as arrays and states and actions are represented as indexes into these arrays.

5.1 Useless States and the Two Steps

We say an action is useless at a state if taking it would always lead to redundancy. Formally, let the set U_m be the subset of Q × A such that ⟨q, a⟩ is in U_m if and only if a ≠ N and for all strategies σ, Q_m(σ, q, a) ≤ 0 where Q_m(σ, q, a) = r(q, a) + γ Σ_{q′} t(q, a)(q′) · V_m(σ, q′). We call ⟨q, a⟩ in the set U_m useless since any strategy σ such that σ(q) = a could be replaced by a strategy σ′ that is the same as σ except for having σ′(q) = N without lowering the expected total discounted reward. To make this formal, let U_m(σ) be a strategy such that U_m(σ)(q) = N if ⟨q, σ(q)⟩ ∈ U_m and U_m(σ)(q) = σ(q) otherwise. The following justifies calling these pairs useless: for all σ and q, V_m(σ, q) ≤ V_m(U_m(σ), q) (Proposition 1).

We are also interested in the set strg(b) of strategies that could have resulted in the behavior b: strg(b) = {σ ∈ Q → A | ∀i < n. a_{i+1} = σ(q_i)} where b = [q_0, a_1, q_1, a_2, ..., a_n, q_n].

Lemma 1. For all environment models m and all behaviors b = [q_0, a_1, q_1, ..., a_n, q_n], log⁻¹(b) ∩ behv*(m) is empty if and only if (1) there exists i such that 0 ≤ i < n and ⟨q_i, a_{i+1}⟩ ∈ U_m or (2) strg(b) ∩ opt(m) is empty.

Thus, checking whether log⁻¹(b) ∩ behv*(m) is empty has been reduced to checking the two conditions (1) and (2). We explain how to check each of these in the next two sections.

5.2 Step 1: Inherent Redundancy

Rather than construct U_m explicitly, we use the following lemma to check condition (1). The lemma uses the definition Q*_m(q, a) = r(q, a) + γ Σ_{q′} t(q, a)(q′) · V*_m(q′) where V*_m(q) = max_σ V_m(σ, q).

Lemma 2. For all environment models m, states q, and actions a, ⟨q, a⟩ is in U_m if and only if a ≠ N and Q*_m(q, a) ≤ 0.

5.3 Step 2: Checking Optimality

To check (2), we construct a model m′ from m that limits the optimization to selecting a strategy that can cause the observed behavior b. To do so, we adjust the reward function of m so that the actions taken in b are always taken by the optimal strategies of m′.
That is, if b = [q_0, a_1, q_1, ..., a_n, q_n], then for each q_i and a_{i+1}, we replace the reward for taking an action a′ other than a_{i+1} from the state q_i with a negative reward −ω that is so low as to assure that the action a′ would not be used by any optimal strategy. We use ω > 2r*/(1 − γ) where r* is the reward with the largest magnitude appearing in m since the total discounted reward is bounded from below by −r*/(1 − γ) and from above by r*/(1 − γ) (recall that Σ_{i=0}^∞ γ^i r* = r*/(1 − γ)).

We formally define m′ to be fix(m, b) where fix(m, []) = m and fix(⟨Q, A, t, r, γ⟩, [q_0, a_1, q_1, ..., a_n, q_n]) = fix(⟨Q, A, t, r′, γ⟩, [q_1, ..., a_n, q_n]) where r′(q_0, a) = −ω for all a ≠ a_1 and r′(q_0, a_1) = r(q_0, a_1).

The construction fix has the following useful property: strg(b) ∩ opt(m) is empty if and only if opt(fix(m, b)) ∩ opt(m) is empty (Proposition 11). This property is useful since testing whether opt(m) ∩ opt(fix(m, b)) is empty may be reduced to simply comparing their optimal values: opt(m) ∩ opt(fix(m, b)) is empty if and only if for all states q, max_σ V_{fix(m,b)}(σ, q) ≠ max_σ V_m(σ, q) (Proposition 12). Fortunately, algorithms exist for finding the optimal value of MDPs (see, e.g., [RN03]). These two propositions combine to yield the next lemma, which justifies how we conduct testing for the second condition of Lemma 1 in the second step of Audit.

Lemma 3. For all environment models m and behaviors b, strg(b) ∩ opt(m) is empty if and only if for all q, max_σ V_{fix(m,b)}(σ, q) ≠ max_σ V_m(σ, q).

These lemmas combine with reasoning about the actual code of the program to yield its correctness.

Theorem 2. For all environment models m and behaviors b, Audit(m, b) returns true if and only if log⁻¹(b) ∩ behv*(m) is empty.

The running time of the algorithm is dominated by the two MDP optimizations. These may be done exactly by reducing the optimization to a linear program [d'E63]. Such programs may be solved in polynomial time [Kha79, Kar84]. However, in practice, large systems are often difficult to solve. Fortunately, a large number of algorithms for making iterative approximations exist whose run time depends on the quality of the approximation. (See [LDK95] for a discussion.)
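For illustration, the sketch below transcribes the pseudocode of Figure 2 into Python over the encoding used in the earlier sketches. It reuses the MDP class, value_iteration, and the nothing action N from Section 3's sketches; value_iteration stands in for solveMDP, so equality of optimal values is tested only up to a numerical tolerance rather than exactly.

def audit(m: MDP, b) -> bool:
    """Return true iff log^{-1}(b) and behv*(m) are disjoint, for log(b) = b.
    b is a behavior [q0, a1, q1, ..., an, qn]; follows Figure 2 line by line."""
    v_star = value_iteration(m)                      # line 01: solveMDP(m)
    states, acts = list(b[0::2]), list(b[1::2])      # q0..qn and a1..an

    # Step 1 (lines 02-05): reject behaviors containing an inherently redundant action.
    for q, a in zip(states, acts):                   # pairs (q_i, a_{i+1})
        if a != N:
            q_val = m.r[(q, a)] + m.gamma * sum(p * v_star[q2]
                                                for q2, p in m.t[(q, a)].items())
            if q_val <= 0:
                return True

    # Step 2 (lines 06-22): force the logged actions and compare optimal values.
    r_star = max(abs(x) for x in m.r.values())
    omega = 2 * r_star / (1 - m.gamma) + 1
    r_fixed = dict(m.r)
    for q, a in zip(states, acts):
        for other in m.actions:
            if other != a and (q, other) in r_fixed:
                r_fixed[(q, other)] = -omega
    v_fixed = value_iteration(MDP(m.states, m.actions, m.t, r_fixed, m.gamma))
    # Lines 19-22: true only if the optimal values differ at every state
    # (compared with a tolerance because value_iteration is approximate).
    return all(abs(v_star[q] - v_fixed[q]) > 1e-6 for q in m.states)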
6 Applying our Formalism to Past Methods

Past methods of enforcing purpose requirements have not provided methods of assigning purposes to sequences of actions. Rather, they presume that the auditor (or someone else) already has a method of determining which behaviors are for a purpose. In essence, these methods presuppose that the auditor already has the set of allowed behaviors behv*(r_p) for the purpose p that he is enforcing. These methods differ in their intensional representations of the set behv*(r_p). Thus, some may represent a given set exactly while others may only be able to approximate it. These differences mainly arise from the different mechanisms they use to ensure that the auditee only exhibits behaviors from behv*(r_p). We use our semantics to study how reasonable these approximations are.

Byun et al. use role-based access control [San96] to consider purposes [BBL05, BL08, NBL+10]. They associate purposes with sensitive resources and with roles, and their method only grants the user access to the resource when the purpose of the user's role matches the resource's purpose. The method does not, however, explain how to determine which purposes to associate with which roles. Furthermore, a user in a role can perform actions that do not fit the purposes associated with his role, allowing him to use the resource for a purpose other than the intended one. Thus, their method is only capable of enforcing policies when there exists some subset Â of the set of actions A such that behv*(r_p) is equal to the set of all interleavings of Â with Q of finite but unbounded length (i.e., behv*(r_p) = (Q × Â)* : Q where : is append raised to work over sets in the standard pairwise manner). The subset Â corresponds to those actions that use a resource with the same purpose as the auditee's role. Despite these limitations, their method can implement the run-time enforcement used at some organizations, such as a hospital that allows physicians access to any record to avoid denying access in time-critical emergencies. However, it does not allow for the fine-grain distinctions used during post-hoc auditing done at some hospitals to ensure that physicians do not abuse their privileges.

Al-Fedaghi uses the work of Byun et al. as a starting point but concludes that rather than associating purposes with roles, one should associate purposes with sequences of actions [AF07]. Influenced by Al-Fedaghi, Jafari et al. adopt a similar position, calling these sequences workflows [JSNS09]. The set of workflows allowed for a purpose p corresponds to behv*(r_p). They do not provide a formal method of determining which workflows belong in the allowed set. They do not consider probabilistic transitions, and the intuition they supply suggests that they would only include workflows that successfully achieve or improve the purpose. Thus, our approach appears more lenient by including some behaviors that fail to improve the purpose.

Others have adopted a hybrid approach allowing for the roles of an auditee to change based on the state of the system [PGY08, EKWB11]. These changes effectively allow role-based access control to simulate the workflow methods and be just as expressive while introducing a level of indirection inhabited by dynamic roles.

Agrawal et al. use a query intrusion model to enforce purpose requirements that operates in a manner similar to intrusion detection [AKSX02]. Their method flags a request for access as a possible violation if the request claims to be for a purpose despite being dissimilar to previous requests for the same purpose. To avoid false positives, the set of allowed behaviors behv*(r_p) would have to be small or have a pattern that the query intrusion model could recognize.

Jif is a language extension to Java designed to enforce requirements on the flows of information in a program [CMVZ09]. Hayati and Abadi explain how to reduce purpose requirements to information flow properties that Jif can enforce [HA05]. Their method requires that inputs are labeled with the purposes for which the policy allows the program to use them and that each unit of code be labeled with the purposes for which that code operates.
If information can flow from an input statement labeled with one purpose to code labeled for a different purpose, their method produces a compile-time type error. (For simplicity, we ignore their use of sub-typing to model sub-purposes.) In essence, their method enforces the rule: if information i flows to code c, then i and c must be labeled with the same purpose. The interesting case is when the code c uses the information i to perform some observable action a_{c,i}, such as producing output. Under our semantics, we treat the program as the auditee and view the policy as limiting these actions. By directly labeling code, their method does not consider the contexts in which these actions occur. Rather, the action a_{c,i} is always either allowed or not based on the purpose labels of c and i. By not considering context, their method is subject to the same limitations as the method of Byun et al. with the subset Â being equal to the set of all actions a_{c,i} such that c and i have the same label. However, using more advanced type systems (e.g., typestate [SY86]), they might be able to extend their method to consider the context in which code is executed and increase the method's expressiveness.

7 Multiple Purposes

So far, our formalism allows our hypothetical agent to consider only a single purpose. However, auditees may perform an action for more than one purpose. In many cases, the auditor may simply ignore any action that is not governed by the privacy policy and not relevant to the plans the auditee is employing that use governed actions. In the physician example above, the physician already implicitly considered many other purposes before even seeing this current patient. For example, the physician presumably performed many actions not mentioned in the model in between taking the X-ray, sending it, and making a diagnosis, such as going on a coffee break. As these actions are not governed by the privacy policy and neither improve nor degrade the diagnosis even indirectly, the auditor may safely ignore them. Thus, our semantics can handle multiple purposes in this limited fashion.

However, in other cases, the interactions between purposes become important. Below we discuss two complementary ways that an auditee can consider multiple purposes that produce interactions. In the first, the auditee considers one purpose after another. In the second, the auditee attempts to optimize for multiple purposes simultaneously. We find that our semantics may easily be extended to handle the first, but difficulties arise for the second. We end the section by considering what features a formalism would need to handle simultaneous consideration of purposes and the challenges they raise for auditing.

7.1 Sequential Consideration

Yahoo!'s privacy policy states that they will not contact children for the purpose of marketing [Yah10a]. Suppose Yahoo! decides to change the name of games.yahoo.com to fun.yahoo.com because they believe the new name will be easier to market. They notify users of games.yahoo.com, including children, of the upcoming change so that they may update their bookmarks. In this example, the decision to change names, made for marketing, causes Yahoo! to contact children.
However, we do not feel this is a violation of Yahoo!'s privacy policy. A decision made for marketing altered the expected future of Yahoo! in such a way that customer service would suffer if Yahoo! did nothing. Thus, to maintain good customer service, Yahoo! made the decision to notify users without further consideration of marketing. Since Yahoo! did not consider the purpose of marketing while making this decision, contacting the children was not for marketing, despite Yahoo! considering the implications of changing the name for marketing while making its decision to contact children.

Bratman describes such planning in his work formalizing intentions [Bra87]. He views it as a sequence of planning steps in which the intention to act (e.g., to change the name) at one step may affect the plans formed at later steps. In particular, each step of planning starts with a model of the environment that is refined by the intentions formed by each of the previous planning steps. The step then creates a plan for a purpose that further refines the model with new intentions resulting from this plan. Thus, a purpose of a previous step may affect the plan formed in a later step for a different purpose by constraining the choices available at the later step of planning. We adopt the stance that an action selected at a step is for the purpose optimized at that step but not for other, previous purposes affecting the step.
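The sketch below illustrates this sequential style of planning on a toy model. The two purposes, the decisions, and the reward numbers are hypothetical and loosely echo the example above; the point is only to show how the intentions committed at one step refine the model available to the next step.

# Toy illustration of sequential consideration of purposes (hypothetical
# model and names).  Each planning step receives the intentions committed
# by earlier steps and optimizes its own purpose over the decisions that
# remain open; it does not revisit earlier commitments.

def plan_step(model, intentions, purpose):
    """Commit, for each still-open decision that this purpose cares about,
    to the option maximizing that purpose's reward."""
    plan = {}
    for decision, options in model.items():
        if decision in intentions:
            continue  # already refined away by an earlier planning step
        rewards = [o["rewards"].get(purpose, 0) for o in options]
        if len(set(rewards)) == 1:
            continue  # this purpose is indifferent; leave the decision open
        plan[decision] = max(options, key=lambda o: o["rewards"].get(purpose, 0))
    return plan

if __name__ == "__main__":
    # Step 1 plans for marketing (rename the site); step 2 plans for
    # customer service (notify users), without reconsidering marketing.
    model = {
        "site_name": [
            {"choice": "keep games.yahoo.com", "rewards": {"marketing": 0}},
            {"choice": "rename to fun.yahoo.com", "rewards": {"marketing": 2}},
        ],
        "notify_users": [
            {"choice": "do nothing", "rewards": {"customer_service": 0}},
            {"choice": "notify all users", "rewards": {"customer_service": 1}},
        ],
    }
    intentions = {}
    intentions.update(plan_step(model, intentions, "marketing"))
    intentions.update(plan_step(model, intentions, "customer_service"))
    for decision, option in intentions.items():
        print(decision, "->", option["choice"])

Under the stance adopted above, the notification selected at the second step is for customer service and not for marketing, even though the first step's marketing decision created the need for it.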
7.2 Simultaneous Consideration

At other times, an auditee might consider more than one purpose in the same step. For example, the physician may have to both provide quality treatment and respect the patient's financial concerns. In this case, the physician may not be able to simultaneously provide the highest quality care at the lowest price. The two competing concerns must be balanced, and the result may not maximize the satisfaction of either of them.

The traditional way of modeling the simultaneous optimization of multiple rewards is to combine them into a single reward using a weighted average over the rewards. Each reward would be weighted by how important it is to the auditee performing the optimization. This amalgamation of the various purpose rewards makes it difficult to determine for which purpose various actions are selected.

One possibility is to analyze the situation using counterfactual reasoning (see, e.g., [Mac74]). For example, given that the auditee performed an action a while optimizing a combination of purposes p1 and p2, the auditor could ask if the auditee would have still performed the action a even if the auditee had not considered the purpose p1 and had only optimized the purpose p2. If not, then the auditor could determine that the action was for p1. However, as the next example shows, such reasoning is not sufficient to determine the purposes of the actions.

To show the generality of purposes, we consider an example involving travel reimbursement. Consider a Philadelphian who needs to go to New York City for a business meeting with his employer and is invited to give a lecture at a conference in Washington, D.C., with his travel expenses reimbursed by the conference. He could drive to either New York or Washington (modeled as the actions driveNY and driveDC, respectively). However, due to time constraints he cannot drive to both of them. To attend both events, he needs to fly to both (modeled as actions flyNY and flyDC). As flying is more expensive, both driving actions receive a higher reward than flying (2 instead of 1), but flying is better than not going (0). Figure 3 models the traveler's environment.

Figure 3: Model of a traveler deciding whether to fly or drive. Since every transition is deterministic, we represent each as a single arrow. Each arrow is labeled with the action name, the reward for business, and the reward for lecturing, in that order. Self-loops with zero reward, including all those labeled with the nothing action N, are not shown.

Given these constraints, he decides to fly to both only to find auditors at both events scrutinizing his decision. For example, an auditor working for the conference could find that his flight to Washington was not for the lecture since the traveler would have driven had it not been for work. If the conference's policy requires that reimbursed flights are only for the lecture, the auditor might deny reimbursement. However, the employer seems even less likely to reimburse the traveler for his flight to Washington since the flight is redundant for getting to New York. However, under the semantics discussed above, each flight would be for both purposes since only when the traveler considers both does he decide to take either flight. While having the conference reimburse the traveler for his flight to Washington seems reasonable, the idea that they should also reimburse him for his flight to New York appears counterintuitive.
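The following enumeration recasts the traveler's situation from Figure 3 as a set of feasible travel plans with their (business, lecture) rewards. The plan set and the equal weighting of the two purposes are our own simplification of the model, introduced only to show how the counterfactual comparison behaves: optimizing either purpose alone selects a driving plan, so neither flight survives the counterfactual test, yet flying to both cities is among the optima only when both purposes are weighed together.

# Feasible plans for the traveler of Figure 3, with rewards
# (business, lecture) read off the example: driving to a city earns 2
# for that city's purpose, flying earns 1, and not going earns 0.
# Driving to one city leaves no time to attend the other event.
PLANS = {
    "drive to NY only":        (2, 0),
    "drive to DC only":        (0, 2),
    "fly to NY and fly to DC": (1, 1),
    "fly to NY only":          (1, 0),
    "fly to DC only":          (0, 1),
    "stay home":               (0, 0),
}

def optimal_plans(weight_business, weight_lecture):
    """Plans maximizing the weighted combination of the two purpose rewards."""
    def value(plan):
        b, l = PLANS[plan]
        return weight_business * b + weight_lecture * l
    best = max(value(p) for p in PLANS)
    return {p for p in PLANS if value(p) == best}

print("business only :", optimal_plans(1, 0))  # {'drive to NY only'}
print("lecture only  :", optimal_plans(0, 1))  # {'drive to DC only'}
print("both (equal)  :", optimal_plans(1, 1))  # flying to both now appears among the optima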
Our approach of sequential planning also cannot explain this example. To plan sequentially, the traveler must consider one of the two events first. If, for example, he considers New York first, he will decide to drive to New York and then decline the invitation to Washington. Only by considering both events at once does he decide to fly.

We believe resolving this conflict requires extending our semantics to consider requirements that an action be for a purpose (as opposed to not for or only for). Furthermore, we believe that the optimization of combinations of purposes does not accurately model human planning with multiple purposes. Intuitively, the traveler selects flyDC not for work but also not only for the conference. Rather, flyDC seems to be for the conference under the constraint that it must not prevent the traveler from attending the meeting. In the next section, we consider the possibility of modeling human planning more accurately.

7.3 Modeling Human Planning

While MDPs are useful for automated planning, they are not specialized for modeling planning by humans, leading to the search for more tailored models [Sim55, GS02]. Simon proposed to model humans as having bounded rationality to account for their limitations and their lack of information [Sim55]. Work on formalizing bounded rationality has resulted in a variety of planning principles ranging from the systematic (e.g., Simon's satisficing) to the heuristic (e.g., [Gig02]). However, "[a] comprehensive, coherent theory of bounded rationality is not available" [Sel02, p. 14] and there still is "a significant amount of unpredictability in how an animal or a human being will undertake to solve a problem" such as planning [DKP96, p. 40].

We view creating semantics more closely tied to human planning as interesting future work. However, modeling human planning may prove complex enough to justify accepting the imperfections of semantics such as ours, or even heuristic-based approaches for finding violations such as the query intrusion model discussed above [AKSX02].

Despite these difficulties, one could look for discrepancies between a semantics of purpose requirements and experimental results on planning. In this manner one could judge how closely a semantics approximates human planning in the ways relevant to purpose requirements. In particular, our semantics appears to hold human auditees to too high a standard: they are unlikely to always be able to pick the optimal strategy for a purpose. When enforcing a restrictive rule, this strictness could result in the auditor investigating some auditees who honestly planned for the only allowed purpose but failed to find the optimal policy. While such investigations would be false positives, they do have the pleasing side effect of highlighting areas in which an auditee could improve his planning. In the case of enforcing prohibitive rules, this strictness could cause the auditor to miss some violations that do not optimize the prohibited purpose but, nevertheless, are for the purpose. The additional checks proposed at the end of Section 4.3 could be useful for detecting these violations: if the auditee's actions are not consistent with a strategy that optimizes any of the allowed purposes but do improve to some degree the prohibited purpose, the actions may warrant extra scrutiny.

While our semantics is limited by our understanding of human planning, it still reveals concepts crucial to the meaning of purpose. Ideas such as planning and non-redundancy will guide future investigations on the topic.

8 Related Work

We have already covered the most closely related work in Section 6. Below we discuss work on related problems and work on purpose from other fields.

Minimal Disclosure. The works most similar to ours in approach have been on minimal disclosure, which requires that the amount of information used in granting a request for access be as little as possible while still achieving the purpose behind the request. Massacci, Mylopoulos, and Zannone define minimal disclosure for Hippocratic databases [MMZ06]. Barth, Mitchell, Datta, and Sundaram study minimal disclosure in the context of workflows [BMDS07]. They model a workflow as meeting a utility goal if it satisfies a temporal logic formula. Minimizing the amount of information disclosed is similar to an agent maximizing his reward and thereby not performing actions that have costs but no benefits. However, in addition to having different research goals, we consider several factors that these works do not, including quantitative purposes that are satisfied to varying degrees and probabilistic behavior resulting in actions being for a purpose despite the purpose not being achieved.
Expressing Privacy Policies with Purpose. Work on understanding the components of privacy policies has shown that purpose is a common component of privacy rules (e.g., [BA05, BA08]). Some languages for specifying access-control policies allow the purpose of an action to partially determine whether access is granted [PS03, Cra02, BKKF05, BKK06]. However, these languages do not give a formal semantics to the purposes. Instead, they rely upon the system using the policy to determine whether an action is for a purpose or not.

Philosophical Foundations. Taylor provides a detailed explanation of the importance of planning to the meaning of purpose, but does not provide any formalism [Tay66]. The sense in which the word "purpose" is used in privacy policies is also related to the ideas of desire, motivation, and intention discussed in works of philosophy (e.g., [Ans57]). The most closely related to our work is Bratman's work on intentions, from which we get our model of sequential planning [Bra87]. In his work, an intention is an action an agent plans to take, where the plan is formed while attempting to maximize the satisfaction of the agent's desires; Bratman's desires correspond to our purposes. Roy formalized Bratman's work using logics and game theory [Roy08]. However, these works are concerned with when an action is rational rather than with determining the purposes behind the action.

We borrow the notion of non-redundancy from Mackie's work on formalizing causality using counterfactual reasoning [Mac74]. In particular, Mackie defines a cause to be a non-redundant part of a sufficient explanation of an effect. Roughly speaking, we replace the causes with actions and the effect with a purpose. The extension to our semantics proposed in Section 7.2 may be seen as another instance of non-redundancy. This time, we replace the causes with purposes and the effect with an action. This suggests that for an action to be for a purpose, we expect both that the action was non-redundant for improving that purpose and that the purpose was non-redundant in motivating the action. That is, we expect planning to be parsimonious.

Planning. Psychological studies have produced models of human thought (e.g., [ABB+04]). However, these are too low-level and incomplete for our needs [DKP96]. The GOMS formalism provides a higher-level model, but is limited to selecting behavior using simple planning approaches [JK96]. Simon's approach of bounded rationality [Sim55] and related heuristic-based approaches [GS02] model more complex planning, but with less precise predictions.

9 Conclusions and Future Work

We use planning to present the first formal semantics for determining when a sequence of actions is for a purpose. In particular, our semantics uses an MDP-like model for planning, which allows us to automate auditing for both restrictive and prohibitive purpose requirements. Furthermore, our semantics highlights that an action can be for a purpose even if that purpose is never achieved, a point present in philosophical works on the subject (e.g., [Tay66]), but whose ramifications on policy enforcement had been unexplored. Lastly, our framework allows us to explain and compare previous methods of policy enforcement in terms of a formal semantics.
However, we recognize the limitations of this model: it imperfectly models human planning and only captures some forms of planning for multiple purposes. Nevertheless, we believe the essence of our work is correct: an action is for a purpose if the actor selects to perform that action while planning for the purpose. Future work will instantiate our semantic framework with more complete models of human planning.

Fundamentally, our work shows the difficulties of enforcement due to issues such as the tenable deniability of ulterior motives. These difficulties justify policies prohibiting conflicts of interest and requiring the separation of duties despite possibly causing inefficiencies. For example, many hospitals would err on the side of caution and disallow referral from a physician to his own private practice, or require a second opinion to do so, thereby restraining the ulterior motive of profit.

Indeed, despite the maxim that privacy is security with a purpose, due to these difficulties, purpose possibly plays the role of guidance in crafting more operational internal policies that organizations enforce, rather than the role of a direct input to the formal auditing process itself. In light of this possibility, one may view our work as a way to judge the quality of these operational policies relative to the intent of the purpose requirements found in the actual privacy policy. We further believe that our formalism may aid organizations in designing their processes to avoid the possibility of, or to increase the detectability of, policy violations. For example, the organization can decrease violations by aligning employee incentives with the allowed purposes.

Acknowledgments. We appreciate the discussions we have had with Lorrie Faith Cranor and Joseph Y. Halpern on this work. We thank Dilsun Kaynar and Divya Sharma for many helpful comments on this paper.

References

[ABB+04] John R. Anderson, Daniel Bothell, Michael D. Byrne, Scott Douglass, Christian Lebiere, and Yulin Qin. An integrated theory of the mind. Psychological Review, 111:1036–1060, 2004.

[AF07] Sabah S. Al-Fedaghi. Beyond purpose-based privacy access control. In ADC '07: Proceedings of the Eighteenth Conference on Australasian Database, pages 23–32, Darlinghurst, Australia, 2007. Australian Computer Society, Inc.

[AKSX02] Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu. Hippocratic databases. In VLDB '02: Proceedings of the 28th International Conference on Very Large Data Bases, pages 143–154. VLDB Endowment, 2002.

[Ans57] G. E. M. Anscombe. Intention. Harvard University Press, 1957.

[BA05] Travis D. Breaux and Annie I. Antón. Analyzing goal semantics for rights, permissions, and obligations. In RE '05: Proceedings of the 13th IEEE International Conference on Requirements Engineering, pages 177–188, Washington, DC, USA, 2005. IEEE Computer Society.

[BA08] Travis D. Breaux and Annie I. Antón. Analyzing regulatory rules for privacy and security requirements. IEEE Trans. Softw. Eng., 34(1):5–20, 2008.

[Ban05] Bank of America Corporation. Bank of America privacy policy for consumers, September 2005. Accessed Feb. 4, 2011. Available from: http://www.bankofamerica.com/privacy/pdf/eng-boa.pdf.
[BBL05] Ji-Won Byun, Elisa Bertino, and Ninghui Li. Purpose based access control of complex data for privacy protection. In SACMAT '05: Proceedings of the Tenth ACM Symposium on Access Control Models and Technologies, pages 102–110, New York, NY, USA, 2005. ACM.

[BKK06] Carolyn A. Brodie, Clare-Marie Karat, and John Karat. An empirical study of natural language parsing of privacy policy rules using the SPARCLE policy workbench. In SOUPS '06: Proceedings of the Second Symposium on Usable Privacy and Security, pages 8–19, New York, NY, USA, 2006. ACM.

[BKKF05] Carolyn Brodie, Clare-Marie Karat, John Karat, and Jinjuan Feng. Usable security and privacy: a case study of developing privacy management tools. In SOUPS '05: Proceedings of the 2005 Symposium on Usable Privacy and Security, pages 35–43, New York, NY, USA, 2005. ACM.

[BL08] Ji-Won Byun and Ninghui Li. Purpose based access control for privacy protection in relational database systems. The VLDB Journal, 17(4):603–619, 2008.

[BMDS07] Adam Barth, John Mitchell, Anupam Datta, and Sharada Sundaram. Privacy and utility in business processes. In CSF '07: Proceedings of the 20th IEEE Computer Security Foundations Symposium, pages 279–294, Washington, DC, USA, 2007. IEEE Computer Society.

[Bra87] Michael E. Bratman. Intention, Plans, and Practical Reason. Harvard University Press, Cambridge, Mass., 1987.

[CMVZ09] Stephen Chong, Andrew C. Myers, K. Vikram, and Lantian Zheng. Jif Reference Manual, February 2009. Available from: http://www.cs.cornell.edu/jif.

[Cra02] Lorrie Faith Cranor. Web Privacy with P3P. O'Reilly, 2002.

[d'E63] F. d'Epenoux. A probabilistic production and inventory problem. Management Science, 10(1):98–108, October 1963.

[DKP96] Jagannath Prasad Das, Binod C. Kar, and Rauno K. Parrila. Cognitive Planning: The Psychological Basis of Intelligent Behavior. Sage, 1996.

[EKWB11] Md. Enamul Kabir, Hua Wang, and Elisa Bertino. A conditional purpose-based access control model with dynamic roles. Expert Syst. Appl., 38:1482–1489, March 2011. Available from: http://dx.doi.org/10.1016/j.eswa.2010.07.057.

[Fai] FairWarning. FairWarning: Privacy breach detection for healthcare. Accessed Feb. 7, 2011. Available from: http://fairwarningaudit.com/.

[Gig02] Gerd Gigerenzer. The adaptive toolbox. In Gerd Gigerenzer and Reinhard Selten, editors, Bounded Rationality: The Adaptive Toolbox, Dahlem Workshop Reports, pages 37–50. MIT Press, 2002.

[GS02] Gerd Gigerenzer and Reinhard Selten, editors. Bounded Rationality: The Adaptive Toolbox. Dahlem Workshop Reports. MIT Press, 2002.

[HA05] Katia Hayati and Martín Abadi. Language-based enforcement of privacy policies. In PET 2004: Workshop on Privacy Enhancing Technologies, pages 302–313. Springer-Verlag, 2005.

[JK96] Bonnie E. John and David E. Kieras. The GOMS family of user interface analysis techniques: comparison and contrast. ACM Trans. Comput.-Hum. Interact., 3:320–351, December 1996. Available from: http://doi.acm.org/10.1145/235833.236054.

[JSNS09] Mohammad Jafari, Reihaneh Safavi-Naini, and Nicholas Paul Sheppard. Enforcing purpose of use via workflows. In WPES '09: Proceedings of the 8th ACM Workshop on Privacy in the Electronic Society, pages 113–116, New York, NY, USA, 2009. ACM. Available from: http://doi.acm.org/10.1145/1655188.1655206.
[Kar84] N. Karmarkar. A new polynomial-time algorithm for linear programming. In STOC '84: Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, pages 302–311, New York, NY, USA, 1984. ACM. Available from: http://doi.acm.org/10.1145/800057.808695.

[Kha79] L. G. Khachian. A polynomial algorithm in linear programming. Dokl. Akad. Nauk SSSR, 244:1093–1096, 1979. English translation in Soviet Math. Dokl. 20, 191–194, 1979.

[LDK95] Michael L. Littman, Thomas L. Dean, and Leslie P. Kaelbling. On the complexity of solving Markov decision problems. In Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence (UAI-95), pages 394–402, Montréal, Québec, Canada, 1995.

[Mac74] John L. Mackie. The Cement of the Universe: A Study of Causation. Oxford University Press, 1974.

[MMZ06] Fabio Massacci, John Mylopoulos, and Nicola Zannone. Hierarchical Hippocratic databases with minimal disclosure for virtual organizations. The VLDB Journal, 15(4):370–387, 2006.

[NBL+10] Qun Ni, Elisa Bertino, Jorge Lobo, Carolyn Brodie, Clare-Marie Karat, John Karat, and Alberto Trombetta. Privacy-aware role-based access control. ACM Trans. Inf. Syst. Secur., 13:24:1–24:31, July 2010. Available from: http://doi.acm.org/10.1145/1805974.1805980.

[OED89] purpose, n. In The Oxford English Dictionary. Oxford University Press, 2nd edition, 1989.

[Off03] Office for Civil Rights, U.S. Department of Health and Human Services. Summary of the HIPAA privacy rule. OCR Privacy Brief, 2003.

[PGY08] Huanchun Peng, Jun Gu, and Xiaojun Ye. Dynamic purpose-based access control. In International Symposium on Parallel and Distributed Processing with Applications, pages 695–700, Los Alamitos, CA, USA, 2008. IEEE Computer Society.

[PS03] Calvin Powers and Matthias Schunter. Enterprise privacy authorization language (EPAL 1.2). W3C Member Submission, November 2003.

[RN03] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson Education, 2nd edition, 2003.

[Roy08] Olivier Roy. Thinking before Acting: Intentions, Logic, Rational Choice. PhD thesis, Institute for Logic, Language and Computation, Universiteit van Amsterdam, 2008.

[San96] Ravi S. Sandhu. Role hierarchies and constraints for lattice-based access controls. In ESORICS '96: Proceedings of the 4th European Symposium on Research in Computer Security, pages 65–79, London, UK, 1996. Springer-Verlag.

[Sel02] Reinhard Selten. What is bounded rationality? In Gerd Gigerenzer and Reinhard Selten, editors, Bounded Rationality: The Adaptive Toolbox, Dahlem Workshop Reports, pages 13–36. MIT Press, 2002.

[Sim55] Herbert A. Simon. A behavioral model of rational choice. Quarterly Journal of Economics, 69:99–118, 1955.

[SY86] R. E. Strom and S. Yemini. Typestate: A programming language concept for enhancing software reliability. IEEE Trans. Softw. Eng., 12:157–171, January 1986. Available from: http://portal.acm.org/citation.cfm?id=10677.10693.

[Tay66] Richard Taylor. Action and Purpose. Prentice-Hall, 1966.

[The95] The European Parliament and the Council of the European Union. Directive 95/46/EC. Official Journal of the European Union, L 281:31–50, November 1995.
[Uni10] United States Congress. Financial Services Modernization Act of 1999. Title 15, United States Code, Section 6802, February 2010. Accessed Feb. 4, 2011. Available from: http://www.law.cornell.edu/uscode/15/usc_sec_15_00006802----000-.html.

[Was03] Washington Radiology Associates, P.C. Notice of privacy practices, April 2003. Accessed Feb. 4, 2011. Available from: http://www.washingtonradiology.com/office-guide/privacy.asp.

[Yah10a] Yahoo! Privacy policy: Information collection and use, 2010. Available from: http://info.yahoo.com/privacy/us/yahoo/details.html#2.

[Yah10b] Yahoo! Privacy policy: Yahoo! Mail, 2010. Available from: http://info.yahoo.com/privacy/us/yahoo/mail/details.html.

A Details of MDPs

One may find a discussion of MDPs in most introductions to artificial intelligence (e.g., [RN03]). For an MDP m = ⟨Q, A, t, r, γ⟩, the discount factor γ accounts for the preference of people for receiving rewards sooner rather than later. It may be thought of as similar to inflation. We require that γ < 1 to ensure that the expected total discounted reward is bounded. The value of a state q under a strategy σ is
\[ V_m(\sigma, q) = \mathbb{E}\left[ \sum_{i=0}^{\infty} \gamma^i\, r(q_i, \sigma(q_i)) \right] \]
The Bellman equation shows that
\[ V_m(\sigma, q) = r(q, \sigma(q)) + \gamma \sum_{q' \in Q} t(q, \sigma(q))(q') \cdot V_m(\sigma, q') \]
A strategy σ* is optimal if and only if for all states q, V_m(σ*, q) = max_σ V_m(σ, q). At least one optimal policy always exists. Furthermore, if σ* is optimal, then
\[ \sigma^*(q) = \operatorname*{arg\,max}_{a \in A}\; \left[ r(q, a) + \gamma \sum_{q' \in Q} t(q, a)(q') \cdot V_m(\sigma^*, q') \right] \]
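To make the quantities above concrete, the short sketch below evaluates V_m(σ, q) by iterating the Bellman equation and recovers an optimal strategy by value iteration, as in standard textbook treatments (e.g., [RN03]). The tiny two-state model at the bottom is invented purely for illustration; the sketch is not the auditing algorithm of the body of the paper.

# Illustration of the Appendix A quantities on a tiny invented MDP.
# States and actions are strings; t[(q, a)] is a dict q' -> probability;
# r[(q, a)] is the immediate reward; gamma is the discount factor.

def evaluate(states, t, r, gamma, sigma, iters=1000):
    """Iterate the Bellman equation to approximate V_m(sigma, q)."""
    V = {q: 0.0 for q in states}
    for _ in range(iters):
        V = {q: r[(q, sigma[q])]
                + gamma * sum(p * V[q2] for q2, p in t[(q, sigma[q])].items())
             for q in states}
    return V

def value_iteration(states, actions, t, r, gamma, iters=1000):
    """Compute V*(q) = max_sigma V_m(sigma, q) and one optimal strategy."""
    V = {q: 0.0 for q in states}
    for _ in range(iters):
        V = {q: max(r[(q, a)]
                    + gamma * sum(p * V[q2] for q2, p in t[(q, a)].items())
                    for a in actions)
             for q in states}
    sigma = {q: max(actions,
                    key=lambda a: r[(q, a)]
                    + gamma * sum(p * V[q2] for q2, p in t[(q, a)].items()))
             for q in states}
    return V, sigma

if __name__ == "__main__":
    states, actions, gamma = ["q0", "q1"], ["N", "work"], 0.9
    t = {("q0", "N"): {"q0": 1.0}, ("q0", "work"): {"q1": 1.0},
         ("q1", "N"): {"q1": 1.0}, ("q1", "work"): {"q1": 1.0}}
    r = {("q0", "N"): 0.0, ("q0", "work"): 1.0,
         ("q1", "N"): 0.0, ("q1", "work"): 0.0}
    V_star, sigma_star = value_iteration(states, actions, t, r, gamma)
    print(V_star, sigma_star)                       # optimal strategy takes `work` in q0
    print(evaluate(states, t, r, gamma, sigma_star))  # matches V_star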
B Proof of Theorem 1

The proper sub-execution relation is a strict partial order. This follows directly from the proper-subsequence relation ⊏ being a strict partial order. We write ⊳ for proper sub-execution and ⊴ for proper sub-execution or equal. Now, we show that ≺ is also a strict partial ordering.

• Irreflexivity: for no σ is σ ≺ σ. For σ ≺ σ to be true, there would have to exist at least one contingency κ′ and state q′ such that m(q′, κ′, σ) is a proper sub-execution of itself. However, this is impossible since the sub-execution relation is a strict partial order.

• Asymmetry: for all σ_1 and σ_2, if σ_1 ≺ σ_2, then it is not the case that σ_2 ≺ σ_1. To show a contradiction, suppose σ_1 ≺ σ_2 and σ_2 ≺ σ_1 are both true. It would have to be the case that for all contingencies κ and states q, m(q, κ, σ_1) ⊴ m(q, κ, σ_2) and m(q, κ, σ_2) ⊴ m(q, κ, σ_1). Since ⊳ is a strict partial order, this implies that for all q and κ, m(q, κ, σ_1) = m(q, κ, σ_2). Thus, there cannot exist a contingency κ′ and state q′ such that m(q′, κ′, σ_2) ⊳ m(q′, κ′, σ_1). Then σ_2 ≺ σ_1 cannot be true, a contradiction.

• Transitivity: for all σ_1, σ_2, and σ_3, if σ_1 ≺ σ_2 and σ_2 ≺ σ_3, then σ_1 ≺ σ_3. Suppose σ_1 ≺ σ_2 and σ_2 ≺ σ_3. Then for all contingencies κ and states q, m(q, κ, σ_1) ⊴ m(q, κ, σ_2) and m(q, κ, σ_2) ⊴ m(q, κ, σ_3). Since ⊴ is transitive, this implies that m(q, κ, σ_1) ⊴ m(q, κ, σ_3) for all κ and q. Furthermore, it must be the case that there exists a contingency κ′ and state q′ such that m(q′, κ′, σ_1) ⊳ m(q′, κ′, σ_2). From above, m(q′, κ′, σ_2) ⊴ m(q′, κ′, σ_3). Thus, by the transitivity of ⊴, m(q′, κ′, σ_1) ⊳ m(q′, κ′, σ_3) as needed.

Since ≺ is a strict partial ordering and Q → A is finite, Q → A is well-founded under ≺. Q → A being finite also means that opt(m) is finite. It is also known to be non-empty [RN03]. Suppose opt*(m) were empty. This would mean that for every σ of opt(m), there exists σ′ in opt(m) such that σ′ ≺ σ. Since opt(m) is finite but non-empty, this could only happen if ≺ contained cycles. However, this is a contradiction since ≺ is a strict partial order and Q → A is well-founded under it. Thus, opt*(m) is not empty.

C Proofs about Useless States

Proposition 1. For all environment models m, sets U such that U ⊆ U_m, strategies σ, and states q, V_m(σ, q) ≤ V_m(U(σ), q).

Proof. Let exec(b) be all the executions with the behavior b as a prefix. Let B_U be the set of all behaviors b such that for some j, b = [q_0, a_1, q_1, ..., q_j, a_{j+1}, q_{j+1}] where ⟨q_j, a_{j+1}⟩ is in U but, for all i < j, ⟨q_i, a_{i+1}⟩ is not in U. We may use B_U and exec(b) to partition the space of executions E. Thus,
\[ V_m(\sigma, q) = \mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i r(q_i, \sigma(q_i))\right] = \sum_{e \in E} \Pr[e \mid \sigma] \left[\sum_{i=0}^{\infty} \gamma^i r(q_i, \sigma(q_i))\right] = \sum_{b \in B_U} \sum_{e \in exec(b)} \Pr[e \mid \sigma] \left[\sum_{i=0}^{\infty} \gamma^i r(q_i, \sigma(q_i))\right] \qquad (1) \]
(Note: as E is uncountable, taking a summation over it is ill advised. We could take an integral instead. Alternatively, one could take the sum over executions of bounded length. This will introduce an error term. However, as the bound increases, the magnitude of this term will drop exponentially fast due to the factor γ. In essence, this is how most practical algorithms for solving MDPs operate. See [RN03].)

For any b in B_U, consider e ∈ exec(b). Since e is in exec(b), it must have the form [q_0, a_1, q_1, ..., q_j, a_{j+1}, q_{j+1}, ...] where ⟨q_j, a_{j+1}⟩ ∈ U but ⟨q_i, a_{i+1}⟩ ∉ U for i < j, and where b = [q_0, a_1, q_1, ..., q_j, a_{j+1}, q_{j+1}].

For σ ∈ strg(b), we reason as follows:
\[
\begin{aligned}
\sum_{e \in exec(b)} \Pr[e \mid \sigma] \left[\sum_{i=0}^{\infty} \gamma^i r(q_i, \sigma(q_i))\right]
&= \sum_{e \in exec(b)} \Pr[e \mid \sigma] \left[\sum_{i=0}^{j-1} \gamma^i r(q_i, \sigma(q_i)) + \sum_{i=j}^{\infty} \gamma^i r(q_i, \sigma(q_i))\right]\\
&= \sum_{e \in exec(b)} \Pr[e \mid \sigma] \sum_{i=0}^{j-1} \gamma^i r(q_i, \sigma(q_i)) + \gamma^j \sum_{e \in exec(b)} \Pr[e \mid \sigma] \sum_{i=j}^{\infty} \gamma^{i-j} r(q_i, \sigma(q_i))\\
&= \sum_{e \in exec(b)} \Pr[e \mid \sigma] \sum_{i=0}^{j-1} \gamma^i r(q_i, \sigma(q_i)) + \gamma^j\, \mathbb{E}\left[\sum_{i=j}^{\infty} \gamma^{i-j} r(q_i, \sigma(q_i))\right]\\
&= \sum_{e \in exec(b)} \Pr[e \mid \sigma] \sum_{i=0}^{j-1} \gamma^i r(q_i, \sigma(q_i)) + \gamma^j V_m(\sigma, q_j) \qquad (2)
\end{aligned}
\]
Furthermore,
\[ \sum_{e \in exec(b)} \Pr[e \mid \sigma] \sum_{i=0}^{j-1} \gamma^i r(q_i, \sigma(q_i)) = \Pr[b \mid \sigma] \sum_{i=0}^{j-1} \gamma^i r(q_i, \sigma(q_i)) \]
Thus, the left term is equal under σ and U(σ):
\[
\begin{aligned}
\sum_{e \in exec(b)} \Pr[e \mid \sigma] \sum_{i=0}^{j-1} \gamma^i r(q_i, \sigma(q_i))
&= \Pr[b \mid \sigma] \sum_{i=0}^{j-1} \gamma^i r(q_i, \sigma(q_i)) \qquad (3)\\
&= \Pr[b \mid \sigma] \sum_{i=0}^{j-1} \gamma^i r(q_i, U(\sigma)(q_i)) \qquad (4)\\
&= \sum_{e \in exec(b)} \Pr[e \mid U(\sigma)] \sum_{i=0}^{j-1} \gamma^i r(q_i, U(\sigma)(q_i)) \qquad (5)
\end{aligned}
\]
where line 4 follows since σ(q_i) = U(σ)(q_i) for ⟨q_i, a_{i+1}⟩ ∉ U.

Since ⟨q_j, a_{j+1}⟩ ∈ U, we know that Q_m(σ, q_j, a_{j+1}) ≤ 0. Furthermore, since σ ∈ strg(b), it is the case that σ(q_j) = a_{j+1}. Thus, V_m(σ, q_j) = Q_m(σ, q_j, σ(q_j)) ≤ 0. Furthermore, since ⟨q_j, a_{j+1}⟩ ∈ U, V_m(U(σ), q_j) = Q_m(σ, q_j, N) = 0.
Thus, we may conclude
\[
\begin{aligned}
\sum_{e \in exec(b)} \Pr[e \mid \sigma] \left[\sum_{i=0}^{\infty} \gamma^i r(q_i, \sigma(q_i))\right]
&= \sum_{e \in exec(b)} \Pr[e \mid \sigma] \sum_{i=0}^{j-1} \gamma^i r(q_i, \sigma(q_i)) + \gamma^j V_m(\sigma, q_j) \qquad (6)\\
&= \sum_{e \in exec(b)} \Pr[e \mid U(\sigma)] \sum_{i=0}^{j-1} \gamma^i r(q_i, U(\sigma)(q_i)) + \gamma^j V_m(\sigma, q_j) \qquad (7)\\
&\le \sum_{e \in exec(b)} \Pr[e \mid U(\sigma)] \sum_{i=0}^{j-1} \gamma^i r(q_i, U(\sigma)(q_i)) + \gamma^j V_m(U(\sigma), q_j) \qquad (8)\\
&= \sum_{e \in exec(b)} \Pr[e \mid U(\sigma)] \left[\sum_{i=0}^{\infty} \gamma^i r(q_i, U(\sigma)(q_i))\right] \qquad (10)
\end{aligned}
\]
where lines 6 and 10 come from the reasoning leading to line 2, and line 7 comes from the reasoning leading to line 5.

Note that the above also trivially holds when σ ∉ strg(b) since Pr[e | σ] = 0 and Pr[e | U(σ)] = 0 for all e ∈ exec(b). Thus, for all σ, we have
\[ \sum_{e \in exec(b)} \Pr[e \mid \sigma] \left[\sum_{i=0}^{\infty} \gamma^i r(q_i, \sigma(q_i))\right] \le \sum_{e \in exec(b)} \Pr[e \mid U(\sigma)] \left[\sum_{i=0}^{\infty} \gamma^i r(q_i, U(\sigma)(q_i))\right] \qquad (11) \]
Thus,
\[
\begin{aligned}
V_m(\sigma, q) &= \sum_{b \in B_U} \sum_{e \in exec(b)} \Pr[e \mid \sigma] \left[\sum_{i=0}^{\infty} \gamma^i r(q_i, \sigma(q_i))\right] \qquad (12)\\
&\le \sum_{b \in B_U} \sum_{e \in exec(b)} \Pr[e \mid U(\sigma)] \left[\sum_{i=0}^{\infty} \gamma^i r(q_i, U(\sigma)(q_i))\right] \qquad (13)\\
&= V_m(U(\sigma), q) \qquad (14)
\end{aligned}
\]
where lines 12 and 14 come from the reasoning of line 1, and line 13 comes from equation 11.

D Proof of Lemma 1

First we prove that the log^{−1}(b) ∩ behv*(m) in the lemma may be replaced with strg_m(b) ∩ opt*(m). Then, we prove the modified statement with two propositions, one corresponding to the if direction and one to the only if direction.

Proposition 2. For environment models m, if for all observable behaviors b, log(b) = b, then strg(b) ∩ opt*(m) is empty if and only if log^{−1}(b) ∩ behv*(m) is empty.

Proof. Since log^{−1}(b) = {b}, log^{−1}(b) ∩ behv*(m) is empty if and only if b ∉ behv*(m). b is in behv*(m) if and only if there exists a strategy σ in opt*(m), a contingency κ, and a state q such that b is a subsequence of m(q, κ, σ). For all σ in opt*(m), ∃κ, q. b ⊏ m(q, κ, σ) is equivalent to ∀i ∈ [0, n). σ(q_i) = a_{i+1} where b = [q_0, a_1, q_1, a_2, ..., a_n, q_n]. To see this, note that b was observed and, thus, it must have been produced by a contingency consistent with m. ∀i ∈ [0, n). σ(q_i) = a_{i+1} is equivalent to σ ∈ strg(b). Thus, b is in behv*(m) if and only if there exists a strategy σ in opt*(m) such that σ is in strg(b). Thus, log^{−1}(b) ∩ behv*(m) is not empty if and only if strg(b) ∩ opt*(m) is not empty.

Proposition 3. For all environment models m and behaviors b = [q_0, a_1, q_1, ..., a_n, q_n], strg(b) ∩ opt*(m) is not empty if (1) for all i such that 0 ≤ i < n, ⟨q_i, a_{i+1}⟩ ∉ U_m and (2) strg(b) ∩ opt(m) is not empty.

Proof. Suppose conditions (1) and (2) are true. Since strg(b) ∩ opt(m) is not empty, there exists some σ_1 in both of them. Since σ_1 is in strg(b), for all 0 ≤ i < n, σ_1(q_i) = a_{i+1}. Thus, by condition (1), ⟨q_i, σ_1(q_i)⟩ ∉ U_m. This further implies that a_{i+1} is not N. Let σ_2 = U_m(σ_1). σ_2 is in strg(b) because for all 0 ≤ i < n, σ_1(q_i) = σ_2(q_i) since ⟨q_i, σ_1(q_i)⟩ ∉ U_m. Furthermore, by Proposition 1, for all q, V_m(σ_1, q) ≤ V_m(σ_2, q). Thus, σ_2 is in opt(m) as well.

To show that σ_2 is also in opt*(m), suppose it were not.
Since σ_2 is in opt(m), this implies that there exists σ′ in opt(m) such that σ′ ≺ σ_2. For this to be true, there must exist a contingency κ′ and state q′ such that active(m(q′, κ′, σ′)) ⊏ active(m(q′, κ′, σ_2)). Thus, for some i, m(q′, κ′, σ_2) must have the form [q_0, a_1, q_1, ..., q_{i−1}, a_i, q_i, a_{i+1}, q_{i+1}, ...], and m(q′, κ′, σ′) must have the form [q_0, a_1, q_1, ..., q_{i−1}, a_i, q_i, N, q_i, ...] where a_{i+1} is not N. Since σ_2(q_i) = a_{i+1}, by the construction of σ_2, ⟨q_i, a_{i+1}⟩ is not in U_m. Thus, there exists some σ_3 such that Q_m(σ_3, q_i, a_{i+1}) > 0. Since σ_2 is in opt(m), Q_m(σ_2, q_i, a_{i+1}) ≥ Q_m(σ_3, q_i, a_{i+1}) > 0. Thus, we have V_m(σ_2, q_i) = Q_m(σ_2, q_i, a_{i+1}) > 0. However, V_m(σ′, q_i) = 0, meaning that σ′ is not in opt(m), a contradiction.

Proposition 4. For all environment models m and behaviors b = [q_0, a_1, q_1, ..., a_n, q_n], if strg(b) ∩ opt*(m) is not empty, then (1) for all i such that 0 ≤ i < n, ⟨q_i, a_{i+1}⟩ ∉ U_m and (2) strg(b) ∩ opt(m) is not empty.

Proof. Condition (2) follows from the fact that opt*(m) ⊆ opt(m). To prove condition (1), suppose strg(b) ∩ opt*(m) is not empty but condition (1) does not hold. Then there exists σ_1 in strg(b) ∩ opt*(m). Furthermore, there exists some i′ such that ⟨q_{i′}, a_{i′+1}⟩ ∈ U_m. Since σ_1 ∈ strg(b), it must be the case that for all i < n, a_{i+1} = σ_1(q_i). Thus, σ_1(q_{i′}) = a_{i′+1}. By Proposition 1, for all q, V_m(σ_1, q) ≤ V_m(U_m(σ_1), q). Furthermore, U_m(σ_1) ≺ σ_1. To see this, recall that U_m is not empty. Thus, for any contingency κ′ that results in state q_{i′}, m(q_0, κ′, U_m(σ_1)) ⊏ m(q_0, κ′, σ_1) since only U_m(σ_1) does nothing at q_{i′}. For contingencies κ that do not lead to q_{i′}, the two executions will be the same. Since U_m(σ_1) ≺ σ_1 and U_m(σ_1) is in opt(m), σ_1 cannot be in opt*(m), a contradiction.

E Proof of Lemma 2

If ⟨q, a⟩ is in U_m, then a ≠ N and for all strategies σ, Q_m(σ, q, a) ≤ 0. Thus, the lemma is true if the following is true: Q*(q, a) ≤ 0 iff ∀σ. Q_m(σ, q, a) ≤ 0. To show this, note that ∀σ. Q_m(σ, q, a) ≤ 0 iff max_σ Q_m(σ, q, a) ≤ 0. Furthermore,
\[
\begin{aligned}
\max_\sigma Q_m(\sigma, q, a) &= \max_\sigma \left[ r(q, a) + \gamma \sum_{q'} t(q, a)(q') \cdot V_m(\sigma, q') \right]\\
&= r(q, a) + \gamma \sum_{q'} t(q, a)(q') \cdot \max_\sigma V_m(\sigma, q')\\
&= r(q, a) + \gamma \sum_{q'} t(q, a)(q') \cdot V^*(q')\\
&= Q^*_m(q, a)
\end{aligned}
\]
Thus, ∀σ. Q_m(σ, q, a) ≤ 0 iff Q*_m(q, a) ≤ 0.
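To illustrate the characterization in Lemma 2, the sketch below computes Q*_m from V* (obtained here by the same value-iteration routine as in the Appendix A sketch) and collects the useless state–action pairs, i.e., the pairs ⟨q, a⟩ with a ≠ N and Q*_m(q, a) ≤ 0. The small model is again invented for illustration only.

# Sketch: compute Q*(q, a) and the useless pairs U_m of Lemma 2 on a toy
# MDP.  N is the distinguished "do nothing" action of the formalism.

N = "N"

def v_star(states, actions, t, r, gamma, iters=1000):
    V = {q: 0.0 for q in states}
    for _ in range(iters):
        V = {q: max(r[(q, a)] + gamma * sum(p * V[q2]
                                            for q2, p in t[(q, a)].items())
                    for a in actions)
             for q in states}
    return V

def useless_pairs(states, actions, t, r, gamma):
    """Pairs <q, a> with a != N and Q*(q, a) <= 0 (Lemma 2)."""
    V = v_star(states, actions, t, r, gamma)
    def q_star(q, a):
        return r[(q, a)] + gamma * sum(p * V[q2] for q2, p in t[(q, a)].items())
    return {(q, a) for q in states for a in actions
            if a != N and q_star(q, a) <= 0}

if __name__ == "__main__":
    states, actions, gamma = ["q0", "q1"], [N, "send", "waste"], 0.9
    t = {(q, a): ({"q1": 1.0} if a == "send" else {q: 1.0})
         for q in states for a in actions}
    r = {(q, a): 0.0 for q in states for a in actions}
    r[("q0", "send")] = 1.0    # sending from q0 improves the purpose
    r[("q0", "waste")] = -1.0  # wasting never helps
    print(useless_pairs(states, actions, t, r, gamma))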
F Properties of fix

Proposition 5. For all environment models m, strategies σ, and states q, V_{fix(m,b)}(σ, q) ≤ V_m(σ, q).

Proof. Let m = ⟨Q, A, t, r, γ⟩ and fix(m, b) = ⟨Q, A, t, r′, γ⟩.
\[
V_{fix(m,b)}(\sigma, q) = \mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i r'(q_i, \sigma(q_i))\right] \;(15)\; \le \mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i r(q_i, \sigma(q_i))\right] \;(16)\; = V_m(\sigma, q) \;(17)
\]
where line 16 follows from the fact that for all q and a, r′(q, a) ≤ r(q, a).

Proposition 6. For all environment models m, behaviors b, σ ∈ strg(b), and states q, V_{fix(m,b)}(σ, q) = V_m(σ, q).

Proof. Let m = ⟨Q, A, t, r, γ⟩ and fix(m, b) = ⟨Q, A, t, r′, γ⟩. Let b = [q_0, a_1, q_1, ..., a_n, q_n]. Since σ is in strg(b), for all i such that 0 ≤ i < n, σ(q_i) = a_{i+1}. Thus, r′(q_i, a_{i+1}) = r(q_i, a_{i+1}). For all q that are not equal to q_i for any i, r′(q, a) = r(q, a) for all a. Thus, for all q, r′(q, σ(q)) = r(q, σ(q)). This implies
\[ V_m(\sigma, q) = \mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i r(q_i, \sigma(q_i))\right] = \mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i r'(q_i, \sigma(q_i))\right] = V_{fix(m,b)}(\sigma, q) \]

Proposition 7. For all environment models m, behaviors b, and σ_1 ∉ strg(b), there exists a σ_2 ∈ strg(b) such that for all states q, V_{fix(m,b)}(σ_1, q) ≤ V_{fix(m,b)}(σ_2, q).

Proof. Let fix(m, b) = ⟨Q, A, t, r′, γ⟩. Let b = [q_0, a_1, q_1, ..., a_n, q_n]. Since σ_1 is not in strg(b), there must exist some i such that σ_1(q_i) ≠ a_{i+1}. Let the set I hold all such indexes i: I = { i ∈ [0, n) | σ_1(q_i) ≠ a_{i+1} }. Let σ_2 be the strategy such that σ_2(q) = a_{i+1} if q = q_i for some i ∈ I and σ_2(q) = σ_1(q) otherwise. By construction, σ_2 is in strg(b). By the construction of fix(m, b), for all i ∈ I, r′(q_i, σ_1(q_i)) = −ω ≤ r′(q_i, a_{i+1}) = r′(q_i, σ_2(q_i)). Thus, for all q, r′(q, σ_1(q)) ≤ r′(q, σ_2(q)). Thus, for all states q, V_{fix(m,b)}(σ_1, q) ≤ V_{fix(m,b)}(σ_2, q).

Proposition 8. For all environment models m, behaviors b, σ_1 ∉ strg(b), and σ_2 ∈ strg(b), there exists a state q such that V_{fix(m,b)}(σ_1, q) < V_{fix(m,b)}(σ_2, q).

Proof. Let b = [q_0, a_1, q_1, ..., a_n, q_n]. Since σ_1 is not in strg(b), there must exist some i such that σ_1(q_i) ≠ a_{i+1}. By the construction of fix(m, b), r′(q_i, σ_1(q_i)) = −ω. Recall that ω > 2r*/(1 − γ) where r* is the reward with the largest magnitude. Thus,
\[
\begin{aligned}
V_{fix(m,b)}(\sigma_1, q_i) &= r'(q_i, \sigma_1(q_i)) + \gamma \sum_{q'} t(q_i, \sigma_1(q_i))(q') \cdot V_{fix(m,b)}(\sigma_1, q') \qquad (18)\\
&= -\omega + \gamma \sum_{q'} t(q_i, \sigma_1(q_i))(q') \cdot V_{fix(m,b)}(\sigma_1, q') \qquad (19)\\
&\le -\omega + \gamma \sum_{q'} t(q_i, \sigma_1(q_i))(q') \cdot r^*/(1-\gamma) \qquad (20)\\
&= -\omega + \gamma \cdot r^*/(1-\gamma) \qquad (21)\\
&\le -\omega + r^*/(1-\gamma) \qquad (22)\\
&< -[2r^*/(1-\gamma)] + r^*/(1-\gamma) \qquad (23)\\
&= -r^*/(1-\gamma) \qquad (24)\\
&\le V_m(\sigma_2, q_i) \qquad (25)\\
&= V_{fix(m,b)}(\sigma_2, q_i) \qquad (26)
\end{aligned}
\]
where line 21 follows from t(q_i, σ_1(q_i)) being a probability distribution over states, line 25 follows from the definition of r* and known bounds (e.g., [RN03]), and line 26 follows from Proposition 6.

Proposition 9. For all environment models m and behaviors b, opt(fix(m, b)) is a subset of strg(b).

Proof. Suppose σ_1 were not in strg(b). By Proposition 8, for all σ_2 ∈ strg(b), there exists a state q such that V_{fix(m,b)}(σ_1, q) < V_{fix(m,b)}(σ_2, q). Thus, σ_1 is not in opt(fix(m, b)).

Proposition 10. For all environment models m, behaviors b, and strategies σ in opt(fix(m, b)), V_{fix(m,b)}(σ, q) = V_m(σ, q).

Proof. Let σ be in opt(fix(m, b)). σ must be in strg(b) by Proposition 9. Thus, V_{fix(m,b)}(σ, q) = V_m(σ, q) by Proposition 6.
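The following sketch shows one way to realize the fix(m, b) construction used above: it copies the reward function and, at each state visited by the behavior b, replaces the reward of every action other than the observed one with −ω, where ω exceeds 2r*/(1 − γ). The data layout mirrors the earlier sketches and is our own choice, not a prescribed implementation.

# Sketch of the fix(m, b) construction: off-behavior actions at observed
# states are made prohibitively costly (-omega), so every optimal strategy
# of the fixed model must agree with the observed behavior b.

def fix(actions, r, gamma, behavior):
    """behavior = [q0, a1, q1, ..., an, qn]; returns the reward r' of fix(m, b)."""
    r_star = max(abs(v) for v in r.values())  # reward of largest magnitude
    omega = 2 * r_star / (1 - gamma) + 1       # any omega > 2 r*/(1 - gamma)
    r_fixed = dict(r)
    for q_i, a_next in zip(behavior[0::2], behavior[1::2]):  # pairs (q_i, a_{i+1})
        for a in actions:
            if a != a_next:
                r_fixed[(q_i, a)] = -omega
    return r_fixed

if __name__ == "__main__":
    actions, gamma = ["N", "send"], 0.9
    r = {(q, a): 0.0 for q in ["q0", "q1"] for a in actions}
    r[("q0", "send")] = 1.0
    print(fix(actions, r, gamma, ["q0", "send", "q1"]))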
G Proof of Lemma 3

This lemma follows directly from Propositions 11 and 12 below.

Proposition 11. For all environment models m and behaviors b, strg(b) ∩ opt(m) = opt(fix(m, b)) ∩ opt(m).

Proof. Consider the set strg-opt(m, b) = strg(b) − opt(fix(m, b)). For all σ in strg-opt(m, b), σ is in strg(b) but not in opt(fix(m, b)). By being in strg(b), V_{fix(m,b)}(σ, q) = V_m(σ, q) by Proposition 6. Thus, since σ is not in opt(fix(m, b)), σ is not in opt(m) either, by Proposition 5. This means that strg-opt(m, b) ∩ opt(m) is empty. Furthermore, opt(fix(m, b)) ⊆ strg(b) by Proposition 9. Thus, strg(b) = opt(fix(m, b)) ∪ (strg(b) − opt(fix(m, b))) = opt(fix(m, b)) ∪ strg-opt(m, b). Thus,
\[
\begin{aligned}
strg(b) \cap opt(m) &= (opt(fix(m, b)) \cup strg\text{-}opt(m, b)) \cap opt(m)\\
&= (opt(fix(m, b)) \cap opt(m)) \cup (strg\text{-}opt(m, b) \cap opt(m))\\
&= opt(fix(m, b)) \cap opt(m)
\end{aligned}
\]

Proposition 12. For all environment models m and behaviors b, opt(m) ∩ opt(fix(m, b)) is empty if and only if for all q, max_σ V_{fix(m,b)}(σ, q) ≠ max_σ V_m(σ, q).

Proof. Suppose that opt(m) ∩ opt(fix(m, b)) is not empty. Then there exists σ* in both of them. Thus,
\[ \max_\sigma V_{fix(m,b)}(\sigma, q) = V_{fix(m,b)}(\sigma^*, q) \;(27)\; = V_m(\sigma^*, q) \;(28)\; = \max_\sigma V_m(\sigma, q) \;(29) \]
where line 28 follows from Proposition 10 and lines 27 and 29 follow from σ* being in both opt(m) and opt(fix(m, b)).

Suppose that for all q, max_σ V_m(σ, q) = max_σ V_{fix(m,b)}(σ, q). Let σ* be in opt(fix(m, b)). For all q,
\[ V_m(\sigma^*, q) = V_{fix(m,b)}(\sigma^*, q) \;(30)\; = \max_\sigma V_{fix(m,b)}(\sigma, q) \;(31)\; = \max_\sigma V_m(\sigma, q) \;(32) \]
where line 30 follows from Proposition 10 and line 31 from σ* ∈ opt(fix(m, b)). Thus, σ* is in opt(m) and opt(m) ∩ opt(fix(m, b)) is not empty.

H Proof of Theorem 2

Line 05 will return true if there exists an i such that a_{i+1} ≠ N and Q*_m(q_i, a_{i+1}) ≤ 0. By Lemma 2, this implies that ⟨q_i, a_{i+1}⟩ is in U_m. By Lemma 1, this implies that log^{−1}(b) ∩ behv*(m) is empty under condition (1).

Lines 06–16 construct m′ = fix(m, b). They construct r′ from r by first setting r′ = r. On lines 13–16, they then set r′(q_i, k) to −ω for all k such that k ≠ a_{i+1}. Thus, r′(q_i, a_{i+1}) will be left as r(q_i, a_{i+1}), as needed.

If Line 05 does not return, Line 21 will return false if there exists a j such that V*_m(q_j) = V*_{m′}(q_j). In this case, it cannot be that for all q, max_σ V_m(σ, q) ≠ max_σ V_{fix(m,b)}(σ, q). Thus, by Lemma 3, strg(b) ∩ opt(m) is not empty and condition (2) of Lemma 1 is false. Since the function would have returned already at Line 05 if condition (1) were true, we know it is false. Thus, by Lemma 1, log^{−1}(b) ∩ behv*(m) is not empty.

If Line 22 is reached, true is returned. This can only happen if condition (2) is true. This implies that log^{−1}(b) ∩ behv*(m) is empty by Lemma 1.

Thus, the algorithm is correct whether it returns true or false.
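The Theorem 2 argument refers to the auditing procedure only by line number. The sketch below is our reconstruction, for illustration, of the logic that the proof steps through: the Line 05 uselessness check, the fix(m, b) construction of Lines 06–16, and the value comparison of Lines 21–22. It reuses the style of the earlier sketches, compares values numerically with a small tolerance (an exact linear-programming solution [d'E63, Kha79, Kar84] would avoid this), and is not the paper's own pseudocode.

# Reconstruction (for illustration) of the auditing logic discussed in the
# Theorem 2 proof: return True when log^{-1}(b) intersected with behv*(m)
# is judged empty (the observed behavior is inconsistent with planning for
# the purpose), and False otherwise.

def v_star(states, actions, t, r, gamma, iters=1000):
    V = {q: 0.0 for q in states}
    for _ in range(iters):
        V = {q: max(r[(q, a)] + gamma * sum(p * V[q2]
                                            for q2, p in t[(q, a)].items())
                    for a in actions)
             for q in states}
    return V

def audit(states, actions, t, r, gamma, behavior, N="N"):
    qs, acts = behavior[0::2], behavior[1::2]  # q_0..q_n and a_1..a_n
    V = v_star(states, actions, t, r, gamma)
    # "Line 05": an observed action other than N with Q*(q_i, a_{i+1}) <= 0.
    for q_i, a_next in zip(qs, acts):
        q_val = r[(q_i, a_next)] + gamma * sum(p * V[q2]
                                               for q2, p in t[(q_i, a_next)].items())
        if a_next != N and q_val <= 0:
            return True
    # "Lines 06-16": build fix(m, b) by penalizing off-behavior actions.
    r_star = max(abs(v) for v in r.values())
    omega = 2 * r_star / (1 - gamma) + 1
    r_fixed = dict(r)
    for q_i, a_next in zip(qs, acts):
        for a in actions:
            if a != a_next:
                r_fixed[(q_i, a)] = -omega
    V_fixed = v_star(states, actions, t, r_fixed, gamma)
    # "Lines 21-22": compare the optimal values of m and fix(m, b).
    if any(abs(V[q] - V_fixed[q]) < 1e-9 for q in states):
        return False  # some state keeps its optimal value: behavior consistent
    return True

if __name__ == "__main__":
    states, actions, gamma = ["q0", "q1"], ["N", "work"], 0.9
    t = {("q0", "N"): {"q0": 1.0}, ("q0", "work"): {"q1": 1.0},
         ("q1", "N"): {"q1": 1.0}, ("q1", "work"): {"q1": 1.0}}
    r = {("q0", "N"): 0.0, ("q0", "work"): 1.0,
         ("q1", "N"): 0.0, ("q1", "work"): 0.0}
    print(audit(states, actions, t, r, gamma, ["q0", "work", "q1"]))  # False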