Random Differential Privacy


Authors: Rob Hall, Alessandro Rinaldo, Larry Wasserman

Random Differential Privacy
Rob Hall, Alessandro Rinaldo, Larry Wasserman
November 27, 2024

Abstract

We propose a relaxed privacy definition called random differential privacy (RDP). Differential privacy requires that adding any new observation to a database will have small effect on the output of the data-release procedure. Random differential privacy requires that adding a randomly drawn new observation to a database will have small effect on the output. We show an analog of the composition property of differentially private procedures which applies to our new definition. We show how to release an RDP histogram, and we show that RDP histograms are much more accurate than histograms obtained using ordinary differential privacy. We finally show an analog of the global sensitivity framework for the release of functions under our privacy definition.

1 Introduction

Differential privacy (DP) ([8]) is a type of privacy guarantee that has become quite popular in the computer science literature. The advantage of differential privacy is that it gives a strong and mathematically rigorous guarantee. The disadvantage is that the strong privacy guarantee often comes at the expense of the statistical utility of the released information. We propose a weaker notion of privacy, called "random differential privacy" (RDP), under which it is possible to achieve better accuracy. The privacy guarantee provided by RDP represents a radical weakening of ordinary differential privacy. This could be a cause for concern for those who want very strong privacy guarantees. Indeed, we are not suggesting that RDP should replace ordinary differential privacy. However, as we shall show in this paper (and as has been observed many times in the past), differential privacy can lead to large information losses in some cases (see e.g., [9]).
Thus, we feel there is great value in exploring weakened versions of differential privacy. In other words, we are proposing a new privacy definition as a way of exploring the privacy/accuracy tradeoff.

We begin by introducing ordinary differential privacy and setting up some notation. We then explore the lower limits for accuracy of differentially private techniques in the context of histograms. We introduce a concept which parallels minimaxity in statistics, and identify the minimax risk for a differentially private histogram. We describe an important subset of these minimax differentially private histograms which we show to have risk which is uniformly lower bounded at a rate which is linear in the dimension of the histogram. We then introduce our proposed relaxation of differential privacy, under which our technique enjoys the same minimax risk, but with a lower bound which depends only on the size of the support of the histogram (namely, the number of nonzero cells). Thus we show that in the context of sparse histograms, the relaxation allows for a strictly better data release. We also demonstrate some important properties of our relaxation, such as an analog of the composition lemma.

2 Differential Privacy (DP)

2.1 Definition

Let X = (X_1, ..., X_n) ∈ X^n be an input database with n observations, where X_i ∈ X. The goal is to produce some output Z ∈ Z. For example, the inputs may consist of database rows in which each column is a measurement of an individual, and the output is the number of individuals having some property. Let Q_n(· | X) be a conditional distribution for Z given X. Write X ∼ X' if X, X' ∈ X^n and X and X' differ in one coordinate; we say that X and X' are neighboring databases.¹ We say Q_n satisfies α-differential privacy if, for all measurable B ⊆ Z and all X ∼ X' ∈ X^n,

    e^{−α} ≤ Q_n(Z ∈ B | X) / Q_n(Z ∈ B | X') ≤ e^{α}.
(1)

The intuition is that, for small α > 0, the value of one individual's data has small effect on the output. We consider any DP algorithm to be a family of distributions Q_n over the output space Z; we index the family by n to show the size of the dataset. It has been shown by researchers in privacy that differential privacy provides a very strong guarantee. Essentially, it means that whether or not one particular individual is entered in the database has negligible effect on the output. The research on differential privacy is vast; a few key references are [8], [7], [2], [5], [3] and references therein.

2.2 Noninteractive Privacy and Histograms

Much research on differential privacy focuses on the case where Z is a response to some query such as "what is the mean of the data?" A simple way to achieve differential privacy in that case is to add noise having a Laplace distribution to the mean of X. The user may send a sequence of such queries; this is called interactive privacy. We instead focus on noninteractive privacy, where the goal is to output a whole database (or a "synthetic dataset") Z = (Z_1, ..., Z_N). Then the user is not restricted to a small number of queries.

One way to release a private database is to first release a privatized histogram. We can then draw an arbitrarily large sample Z = (Z_1, ..., Z_N) from the histogram. It is easy to show that if the histogram satisfies DP then Z also satisfies DP. Hence, in the rest of the paper, we focus on constructing a private histogram. We consider privatization mechanisms which are permutation invariant with respect to their inputs (i.e., those distributions which treat the values x_i as a set rather than a vector); in the context of histograms this appears to be a very mild restriction. We partition the sample space X into k cells (or bins) {B_j}_{j=1}^{k}.²
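As a concrete illustration of the interactive mechanism mentioned above, Laplace noise calibrated to the sensitivity of a mean query yields α-DP. This sketch is ours, not the paper's; the function name and the assumption that the data lie in a known interval [lo, hi] are assumptions of the example:

```python
import numpy as np

def laplace_mean(x, alpha, lo=0.0, hi=1.0, rng=None):
    """Release the mean of data lying in [lo, hi] under alpha-DP.

    Changing one of the n entries moves the mean by at most
    (hi - lo) / n, so Laplace noise with scale (hi - lo) / (n * alpha)
    makes the ratio of output densities for neighboring databases
    lie in [e^{-alpha}, e^{alpha}], as required by (1).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    sensitivity = (hi - lo) / len(x)
    return x.mean() + rng.laplace(scale=sensitivity / alpha)
```

For large n the noise scale is O(1/(nα)), so the released mean is accurate; the cost of DP appears once many such queries must be answered, which motivates the noninteractive approach below.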
We consider the input to be a lattice point in the k-simplex, via the function θ_n(x_1, ..., x_n) = (θ_1, ..., θ_k), where

    θ_j = (1/n) Σ_{i=1}^{n} 1{x_i ∈ B_j}.

The image of this mapping, Θ = θ_n(X^n), is the set of lattice points in the simplex which correspond to histograms of n observations in k bins. Note that this is in essence a "normalized histogram," since the elements sum to one. This set depends on k, although we suppress this in the notation. For the remainder of this paper we take the output space Z to be the same as the input space (i.e., a normalized histogram).

Now we give a concrete example of a Q_n which achieves differential privacy. Define z_j = θ_j + 2L_j/(nα), where L_1, ..., L_k are independent draws from a Laplace distribution with mean zero and rate one. Then (z_1, ..., z_k) satisfies DP (see e.g., [8]). However, the z_j themselves do not represent a histogram, because they can be negative and they do not necessarily sum to one. Hence we may take, for example,

    δ(z) = argmin_{θ ∈ Θ} ||z − θ||_1,   (2)

where ||x||_1 = Σ_j |x_j| is the ℓ1 norm. This procedure results in a valid histogram. Note that δ(z) satisfies differential privacy, since each subset of values it may take clearly corresponds to a measurable subset of R^k; since differential privacy held for the real vector, it also holds for the projection (see e.g., [16]). We will refer to this as the histogram perturbation method (see e.g., [16]). There are other methods for generating differentially private histograms, and our results below hold over a large subset of all the available techniques (to be made precise after Proposition 3.2); hence our results apply to more than the above concrete scheme.

¹ In some papers, the definition is changed so that one sample is a strict subset of the other, having exactly one less element.
Although this definition is perhaps slightly stronger, we do not use it, and remark that the approaches we present below may all be fit into this framework if so desired.

² In this paper, k is taken as a given integer. The problem of choosing an optimal k in a private manner is the subject of future work.

3 Lower Bounds for Accuracy with Differential Privacy

To motivate the need for relaxed versions of differential privacy, we consider here the accuracy of differentially private histograms. We evaluate a differentially private procedure in terms of its "risk," which is a natural measure of accuracy taken from statistics. We consider the ℓ1 loss function and the associated risk

    R(θ, Q_n) = ∫_Θ ||θ̂ − θ||_1 dQ_n(θ̂ | θ),   (3)

where θ̂ is the output of the differentially private algorithm, θ is the input histogram, and Q_n is the distribution induced by the randomized algorithm. Typically this risk will be a non-constant function of the parameter θ and of the distribution Q_n. Therefore we consider the "minimax risk," which is the smallest achievable worst-case risk, and gives a measure of the hardness of the problem which does not depend on a particular choice of procedure:

    R* = inf_{Q_n} sup_{θ ∈ Θ} R(θ, Q_n).   (4)

We next describe the minimax risk of the best fully differentially private mechanism Q_n.

Proposition 3.1. R* ≥ c_0 (k − 1)/(αn).

Proof. The proof uses a standard method for deriving minimax lower bounds in statistical estimation. Consider the (k − 1)-dimensional hypercube

    { (σ_1 τ/n, ..., σ_{k−1} τ/n, (n − τ Σ_{i=1}^{k−1} σ_i)/n) : σ_i ∈ {0, 1} }.

Take θ, θ' to be neighboring corners of this hypercube (namely, two elements which differ in exactly one coordinate σ_i).
Take the KL divergence between the conditional distributions at these corners:

    KL( Q_n(·|θ) || Q_n(·|θ') ) = ∫_Θ log [ Q_n(θ̂|θ) / Q_n(θ̂|θ') ] dQ_n(θ̂|θ).

By considering a sequence of points corresponding to neighboring inputs, we find the ratio of densities to have the upper bound

    Q_n(θ̂|θ) / Q_n(θ̂|θ') ≤ e^{ατ},

since τ elements of the input have to change to move from θ to θ', and the ratio at each step is bounded by e^{α}. Therefore the KL divergence obeys KL( Q_n(·|θ) || Q_n(·|θ') ) ≤ ατ. The "affinity" between the two distributions is

    || Q_n(·|θ) ∧ Q_n(·|θ') || = ∫_Θ min{ Q_n(θ̂|θ), Q_n(θ̂|θ') } dθ̂.

The Kullback-Csiszar-Kemperman inequality [17] yields a lower bound on the affinity between these distributions:

    || Q_n(·|θ) ∧ Q_n(·|θ') || ≥ 1 − √(ατ/2).

Assouad's lemma (see [17] again) thus gives the lower bound

    R* ≥ [(k − 1)τ / (2n)] (1 − √(ατ/2)).

Taking τ = t/α gives

    R* ≥ [(k − 1)t / (2αn)] (1 − √(t/2)).

For α < 1 we may take t < 1, which makes the parenthetical expression positive.

Remark 1. The previous result demonstrates that the minimax risk of the differentially private histogram is of the order O(k/(αn)).

Remark 2. Hardt and Talwar [10] have a similar result, although their setting is somewhat different; in particular, they do not restrict to the space of histograms based on n observations.

The above result demonstrates that for every differentially private scheme there is at least one input for which the risk grows at the order shown (in fact, at least one point in every hypercube of side length τ/n). However, the prospect exists that at many other inputs the risk is much lower. We now demonstrate that this is not the case when k = 2, by presenting a uniform lower bound for the risk among all minimax schemes.
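As a quick numerical check (ours, not the paper's), the O(k/(αn)) rate of Remark 1 is visible in simulation. For the Laplace perturbation z_j = θ_j + 2L_j/(nα), the expected ℓ1 error is exactly 2k/(nα), since E|Laplace(b)| = b; the values of k, n, and α below are arbitrary:

```python
import numpy as np

# Monte Carlo check that E||z - theta||_1 = 2k/(n*alpha) for the
# Laplace-perturbed histogram (ignoring the projection step (2),
# which changes the error by at most a constant factor).
rng = np.random.default_rng(0)
k, n, alpha, reps = 20, 500, 1.0, 20000
noise = rng.laplace(scale=2.0 / (n * alpha), size=(reps, k))
risk = np.abs(noise).sum(axis=1).mean()   # estimate of E||z - theta||_1
print(risk)   # approximately 2*k/(n*alpha) = 0.08
```

The error grows linearly in k regardless of how many cells of θ are actually occupied, which is the phenomenon the propositions in this section formalize.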
In the case of k = 2, the output may be regarded as a single number a/n, where a ∈ {0, ..., n}, which gives the proportion of the data points in the first bin. Our result will show that the minimax differential privacy schemes are similar to "equalizer rules," in the sense that the risk is on the same order for every input.

Proposition 3.2. For k = 2, for any Q_n which achieves sup_θ R(θ, Q_n) ≤ c_0/(αn), we have inf_θ R(θ, Q_n) ≥ c_1/(αn).

Proof. Note that for any θ_1 and c > c_0, due to the uniform upper bound on the risk, Markov's inequality gives

    ∫_Z 1{ |θ̂ − θ_1| ≤ c/(αn) } dQ_n(θ̂|θ_1) ≥ 1 − c_0/c.

Therefore, due to the constraint of differential privacy, we have, for any θ_0,

    ∫_Z 1{ |θ̂ − θ_1| ≤ c/(αn) } dQ_n(θ̂|θ_0) ≥ (1 − c_0/c) exp{ −(αn/2) ||θ_0 − θ_1||_1 },

since (n/2)||θ_0 − θ_1||_1 elements of the input change to move from θ_0 to θ_1. Therefore, taking θ_1 so that ||θ_0 − θ_1||_1 = 2c/(αn) gives

    R(θ_0, Q_n) ≥ [c/(αn)] (1 − c_0/c) e^{−c} = c_1/(αn).

As θ_0 is arbitrary, this gives a uniform lower bound under the conditions above.

For the relaxation of differential privacy given in Definition 2.2 of [10], the above result remains intact for large enough n. The relaxation is

    Q_n(z|X) ≤ Q_n(z|X') e^{α} + η(n),

where η(n) is negligible (i.e., tending to zero faster than any inverse polynomial in n). Thus, via the same technique as above, we have

    R(θ_0, Q_n) ≥ [c/(αn)] ( (1 − c_0/c) e^{−c} − c_2 η(n) ) = (c_1 − η(n))/(αn).

For large enough n this latter term is bounded from below by c_3/(αn). This indicates that the above relaxation of differential privacy will not be useful in achieving higher accuracy.

For k > 2, we may write

    R(θ, Q_n) = Σ_{i=1}^{k} R_i(θ, Q_n), with R_i(θ, Q_n) := ∫_Z |θ̂_i − θ_i| dQ_n(θ̂|θ),

where the subscript denotes the ith coordinate.
Thus, whenever R_i ≤ c_0/(αn) uniformly over i, we have R(θ, Q_n) ≥ c_1(k − 1)/(αn). Therefore the only opportunity to improve upon the rate k/(αn) is when some θ have some coordinate i at which the risk upper bound does not apply.

We conclude by remarking that we have demonstrated that, for a certain class of differentially private algorithms which achieve the "minimax rate," the risk is uniformly lower bounded at the same rate. The rate in question is linear in k, which is problematic when k is large relative to n. It remains an open question whether there are different techniques which achieve the minimax rate yet do not have this property. Such a technique would have to lose the uniform upper bound on the coordinate-wise risk. Below, we present a weakening of differential privacy which admits release mechanisms that both keep the uniform upper bound on the coordinate-wise risk and have a minimax risk which grows only in the support of the histogram (namely, the number of cells which contain observations).

4 Random Differential Privacy

In random differential privacy (RDP) we view the data X = (X_1, ..., X_n) as random draws from an unknown distribution P. This is certainly the case in statistical sampling, and of course it is the usual assumption in most learning theory. Let us denote the observed values of the random variables X = (X_1, ..., X_n) by x = (x_1, ..., x_n). Recall that under DP, Q(Z ∈ B | x_1, ..., x_n) is not strongly affected if we replace some value x_i with another value x'_i. We continue to restrict to the case in which Q(Z ∈ B | x_1, ..., x_n) is invariant to permutations of (x_1, ..., x_n). Thus we may restate DP by saying that Q(Z ∈ B | x_1, ..., x_n) is not strongly affected if we replace x_n by some other arbitrary value x'_n.
In RDP, we require instead that the distribution Q_n(· | x_1, ..., x_n) is not strongly affected if we replace x_n by some new x'_n which is also randomly drawn from P.

Definition 1 ((α, γ)-Random Differential Privacy). We say that a randomized algorithm Q_n is (α, γ)-randomly differentially private when

    P( ∀ B ⊆ Z : e^{−α} ≤ Q_n(Z ∈ B | X) / Q_n(Z ∈ B | X') ≤ e^{α} ) ≥ 1 − γ,

where X = (X_1, ..., X_{n−1}, X_n), X' = (X_1, ..., X_{n−1}, X_{n+1}) (i.e., X ∼ X'), and the probability is with respect to the (n + 1)-fold product measure P^{n+1} on the space X^{n+1}; that is, X_1, ..., X_{n+1} iid ∼ P.

We also give the "random" analog of (α, η)-differential privacy:

Definition 2 ((α, η, γ)-Random Differential Privacy). We say that a randomized algorithm Q_n is (α, η, γ)-randomly differentially private when

    P( ∀ B ⊆ Z : Q_n(Z ∈ B | X) ≤ e^{α} Q_n(Z ∈ B | X') + η(n) ) ≥ 1 − γ,

where η is negligible (i.e., decreasing faster than any inverse polynomial).

We note that [12] also consider a probabilistic relaxation of DP. However, their relaxation is quite different from the one considered here: it bounds the probability that the differential privacy criterion is not met, but the probability is taken with respect to the randomized algorithm itself. Our relaxation takes the probability with respect to the generation of the data itself. The following result is clear from the definition of random differential privacy.

Proposition 4.1. (α, γ)-RDP is a strict relaxation of α-DP. That is, if Q_n is DP then it is also RDP; however, there are RDP procedures that are not DP.

Remark 3. Although an α-DP procedure fulfils the requirement of (α, 0)-RDP, the converse is not true.
The reason is that the latter requires that the condition (that the ratio of densities be bounded) holds almost everywhere with respect to the unknown measure, whereas DP requires that this condition holds uniformly everywhere in the space.

We next show an important property of the definition: namely, that RDP algorithms may be composed to give other RDP algorithms with different constants. The analogous composition property for DP is considered important because it allows rapid development of techniques which release multiple statistics, as well as techniques which allow interactive access to the data.

Proposition 4.2 (Composition). Suppose Q, Q' are distributions over Z, Z' which are (α, γ)-RDP and (α', γ')-RDP respectively. The following distribution C over Z × Z' is (α + α', γ + γ')-RDP:

    C(Z, Z' | X) = Q(Z | X) · Q'(Z' | X).

This result is simply an application of the union bound combined with the standard composition property of differential privacy. As an example, suppose it is required to release k different statistics of some data sample. If each one is released via an (α/k, γ/k)-RDP procedure, then the overall release of all k statistics together achieves (α, γ)-RDP. A similar result holds for the composition of (α, η, γ)-RDP releases.

5 RDP Sparse Histograms

We first give a technique for the release of a histogram which works well in the case of a sparse histogram, and which satisfies (α, γ)-random differential privacy. We then compare the accuracy of this method to a lower bound on the accuracy of an α-differentially private approach. The basic idea is to add no noise to cells with low counts. This partitions the space into two blocks: we release a noise-free histogram in one block and a differentially private histogram in the other. The partition will depend on the data itself.
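The two-block release just described can be sketched as follows. This is our simplified rendering of the mechanism defined in the next paragraphs (the function and variable names are ours): empty cells are published exactly, while occupied cells receive the usual Laplace perturbation.

```python
import numpy as np

def rdp_sparse_histogram(theta, n, alpha, gamma, rng=None):
    """Release a normalized histogram theta = (theta_1, ..., theta_k)
    under (alpha, gamma)-RDP.

    If 2k <= gamma*n, empty cells (theta_j = 0) are released without
    noise, and occupied cells get Laplace noise of scale 2/(n*alpha);
    otherwise every cell is perturbed, as in the plain DP method.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    k = len(theta)
    if 2 * k <= gamma * n:
        noisy = theta > 0               # only cells containing observations
    else:
        noisy = np.ones(k, dtype=bool)  # fall back to noising every cell
    z = theta.copy()
    z[noisy] += rng.laplace(scale=2.0 / (n * alpha), size=int(noisy.sum()))
    return z  # may still be projected onto the simplex as in (2)
```

The noise added to the occupied block is exactly the DP perturbation; the privacy cost of leaving the empty block exact is paid in the γ parameter, as the proposition below makes precise.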
For a sample x_1, ..., x_n, let

    S = S(x_1, ..., x_n) = { j : θ_j = 0 }.

Then we consider the release mechanism

    z_j = θ_j                  if j ∈ S and 2k ≤ γn,
    z_j = θ_j + (2/(nα)) L_j   otherwise.   (5)

Proposition 5.1. The random vector Z = (z_1, ..., z_k) as defined in (5) satisfies (α, γ)-RDP.

In demonstrating RDP, we take the sample x_1, ..., x_n, x_{n+1} and denote S = S(x_1, ..., x_n) and S' = S(x_1, ..., x_{n−1}, x_{n+1}). We consider the output distribution of our method when applied to each of the neighboring samples. The event that the ratio of densities fails to meet the requisite bound is a subset of the event that either x_{n+1} ∈ S or x_n ∈ S', when 2k ≤ γn. On the complement of this event the partitions are the same, and the differing samples both fall within the block which receives the Laplace noise, so the DP condition is achieved. To demonstrate RDP, we simply bound the probability of the aforementioned event, conditional on the order statistics.

Proof of Proposition 5.1. In the interest of space, let the vector of order statistics be denoted T = (x_{(1)}, ..., x_{(n+1)}). Let

    S*(x_1, ..., x_n, x_{n+1}) = { j : Σ_{i=1}^{n+1} 1{x_i ∈ B_j} ≤ 1 }.

We have S, S' ⊆ S*, and thus

    P(x_n ∈ S' or x_{n+1} ∈ S | T) ≤ P(x_n ∈ S* or x_{n+1} ∈ S* | T).

The latter probability is just the fraction of ways in which the order statistics may be rearranged so that x_n or x_{n+1} falls within S*. Due to the condition 2k ≤ γn, we have |S*| ≤ k ≤ γn/2. Therefore the fraction of rearrangements having at least one of x_n or x_{n+1} in S* is bounded above:

    P(x_n ∈ S* or x_{n+1} ∈ S* | T) ≤ 2|S*| / (n + 1) < γ.

Therefore

    P(x_n ∈ S' or x_{n+1} ∈ S) ≤ ∫ P(x_n ∈ S' or x_{n+1} ∈ S | T) dP(T)
                               ≤ ∫ P(x_n ∈ S* or x_{n+1} ∈ S* | T) dP(T)
                               < γ.
Finally,

    P( ∀ B ⊆ Z : e^{−α} ≤ Q_n(Z ∈ B | X) / Q_n(Z ∈ B | X') ≤ e^{α} ) = 1 − P(x_n ∈ S' or x_{n+1} ∈ S) > 1 − γ.

5.1 Accuracy

Here we show that δ(z) from (2) is close to θ even when the histogram is sparse.

Theorem 5.2. Suppose that 2k ≤ γn. Let θ_n(x_1, ..., x_n) = (θ_1, ..., θ_r, 0, ..., 0) for some 1 ≤ r < k. Then

    ||θ − δ(z)||_1 = O_P(r/(αn)).

Proof. Let L_1, ..., L_r ∼ Laplace. Let E be the event that L_j > −(nα/2) θ_j for all 1 ≤ j ≤ r. Then E holds except on a set of exponentially small probability. Suppose E holds. Let W = Σ_{j=1}^{r} |L_j| = O_P(r). For 1 ≤ j ≤ r, z_j = θ_j + 2L_j/(nα); for j > r, z_j = θ_j = 0. Hence ||z − θ||_1 = O_P(r/(αn)). Furthermore,

    ||δ(z) − z||_1 ≤ r/n ≤ r/(αn).

Hence, via the triangle inequality, ||δ(z) − θ||_1 = O_P(r/(αn)).

We thus have a technique for which the risk is uniformly bounded above by O(k/(αn)), as with the DP technique, and which also enjoys the coordinate-wise upper bound on the risk. However, in this regime the risk is no longer uniformly lower bounded at a rate linear in k, since the upper bound is linear in r in the case of sparse vectors.

6 RDP via Sensitivity Analysis

We next demonstrate that RDP admits schemes for the release of other kinds of statistics (besides histograms). A common technique used to establish a differentially private method is to add Laplace noise with variance proportional to the "global sensitivity" of the function [6]. We show that there is an analog of this technique for RDP.

We next demonstrate a method for the RDP release of an arbitrary function g_n(x_1, ..., x_n) ∈ R. We consider the algorithm which samples from the distribution

    Q_n(z | x_1, ..., x_n) ∝ exp( −α |z − g_n(x_1, ..., x_n)| / s_n(x_1, ..., x_n) ).   (6)

It is well known that when s_n is the constant function which gives an upper bound on the global sensitivity [6] of g_n, this method enjoys α-DP. As we allow s_n to depend on the data, we may make use of the local sensitivity framework of [14]. There it is demonstrated that whenever

    ∀ X ∼ X': s_n(X) ≤ e^{β} s_n(X')   (7)

and

    ∀ X: sup_{X' ∼ X} |g_n(X) − g_n(X')| ≤ s_n(X),   (8)

then (6) gives (2α, η)-DP with

    η = e^{−α/(2β)}   (9)

(see [14], Definition 2.1, Lemma 2.5, and Example 3). In moving from DP to RDP we may now require that conditions (7) and (8) hold only with the requisite probability 1 − γ; then (6) will achieve (2α, η, γ)-RDP.

We consider a special subset of functions for which

    sup_{X ∼ X'} |g_n(X) − g_n(X')| = n^{−1} sup_{x, x'} h(x, x').

Examples of functions satisfying this property are, e.g., statistical point estimators [15] and regularized logistic regression estimates [4]. In particular, in these cases it is assumed that X is some compact subset of R^d, and then, e.g., h(x, x') = ||x − x'||_2, whose supremum gives the diameter of this set. We replace conditions (7) and (8) with

    P( s_n(X) ≤ e^{β} s_n(X') ) ≥ 1 − γ_1   (10)

and

    P( n^{−1} h(x, x') ≤ min{ s_n(X), s_n(X') } ) ≥ 1 − γ_2,   (11)

where x, x' are random draws from P which are independent of the random vectors X, X'. The first condition simply requires (7) to hold except on a set of measure γ_1. The second condition implies that both s_n(X) and s_n(X') give upper bounds on the local sensitivity, except on a set of measure γ_2. Putting these together, along with the above considerations, yields a (2α, η, γ_1 + γ_2)-RDP method. We note that we are essentially asking that s_n(X) and s_n(X') both give valid quantiles for the random variable h(x, x'), and that they give similar values with high probability.
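The quantile-based choice of s_n described above can be sketched as follows. This is our illustration, not the paper's code: d_δ(X) is computed as an empirical (1 − δ)-quantile of h over disjoint pairs of sample points, anticipating the empirical process defined next, and the choice h(x, x') = |x − x'| in the example is an assumption:

```python
import numpy as np

def quantile_sensitivity(x, h, delta):
    """Estimate s_n(X) = d_delta(X) / n, where d_delta(X) is the
    empirical (1 - delta)-quantile of h evaluated on the n/2 disjoint
    pairs (x_i, x_{i + n/2}) of the sample."""
    x = np.asarray(x, dtype=float)
    m = len(x) // 2
    pairs = np.array([h(x[i], x[i + m]) for i in range(m)])
    return np.quantile(pairs, 1.0 - delta) / len(x)

def rdp_release(g_value, s_n, alpha, rng=None):
    """Sample from Q_n(z | X) proportional to exp(-alpha |z - g_n(X)| / s_n(X)),
    i.e. Laplace noise of scale s_n / alpha around the statistic g_n(X)."""
    rng = np.random.default_rng() if rng is None else rng
    return g_value + rng.laplace(scale=s_n / alpha)

# Example: releasing a sample mean, for which h(x, x') = |x - x'|.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
s = quantile_sensitivity(x, lambda a, b: abs(a - b), delta=0.05)
z = rdp_release(x.mean(), s, alpha=1.0, rng=rng)
```

The sensitivity scale here is a data-dependent quantile rather than a worst-case bound, which is exactly what trades the uniform DP guarantee for the probabilistic RDP one.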
We consider the empirical process based on h and the data sample X given by

    D(X, t) = (2/n) Σ_{i=1}^{n/2} 1{ h(x_i, x_{i+n/2}) ≤ t }.

This is exactly an empirical CDF for the distribution of h(x, x'), based on n/2 independent samples of h(x, x'). We may anticipate that sample quantiles of this empirical CDF will be close to the quantiles of the true CDF, which we denote by H(t) = P(h ≤ t). This is made precise by the DKW inequality (see e.g., [13]), which in this case yields

    P( sup_t |H(t) − D(X, t)| ≥ ε ) ≤ 2 e^{−nε²}.   (12)

Thus, taking d_δ(X) to be the smallest d with D(X, d) = 1 − δ, and h_{δ'} to give the 1 − δ' quantile of h, with δ < δ', we have

    P( h(x, x') > d_δ(X) ) ≤ δ' + P( d_δ(X) < h_{δ'} ) ≤ δ' + 2 e^{−(δ' − δ)² n}.

The second inequality comes from applying the monotone function D(X, ·) to both sides of the inequality inside the probability and rearranging, to yield P( D(X, h_{δ'}) − H(h_{δ'}) > δ' − δ ), which is bounded via the DKW inequality (12). Thus, for an appropriate choice of δ, δ', we may take s_n(X) = n^{−1} d_δ(X) and achieve (11).

Now, to achieve (10), we turn to the Bahadur-Kiefer representation of sample quantiles (see [11]). We have

    d_δ(X) − h_δ = [ D(X, h_δ) − H(h_δ) ] / H'(h_δ) + O_p(n^{−3/4}),

where H' is the derivative of H (namely, the density); hence we concentrate on the case in which h is a continuous random variable. We find the ratio to be bounded in probability:

    d_δ(X) / d_δ(X') ≤ 1 + |d_δ(X) − d_δ(X')| / d_δ(X') = 1 + O_p(n^{−1/2}) / ( h_δ + O_p(n^{−1/2}) ),

where the final equality stems from using DKW to bound D(X, h_δ) − H(h_δ), along with the triangle inequality to bound |D(X, h_δ) − D(X', h_δ)|.
This therefore demonstrates that

    d_δ(X) / d_δ(X') ≤ 1 + O_p(n^{−1/2}) = e^{O_p(n^{−1/2})}.

This means that for large enough n, with probability 1 − γ_1, the ratio is bounded by e^{β}, where β is polynomial in n^{−1/2}. Examining (9), we find η to be negligible for such a choice of β. Therefore the use of s_n(X) = n^{−1} d_δ(X) achieves the RDP as required.

We note that in principle this same approach would work were we to replace D(X, t) with the U-statistic process

    U(X, t) = (1 / C(n, 2)) Σ_{i > j} 1{ h(x_i, x_j) ≤ t }.

Though this is essentially another empirical CDF, it is based on non-independent samples, since each x_i participates in n − 1 of the evaluations of h. Nevertheless, an analog of the DKW inequality still applies to this process, and we still have the same behavior of the quantiles (see e.g., [1]).

7 Privacy Concerns

As stated above, we mainly use random differential privacy as a vehicle for a theoretical exploration of the boundaries of differential privacy. Although it is a conceptually reasonable weakening of differential privacy, whether it is appropriate to use in practice requires more attention. For example, if the hypothesized adversary (of, e.g., [16], Theorem 2.4) really had access to a subset of n − 1 of the data, and the one remaining element was the only inhabitant of its histogram cell, then this would be immediately revealed to the adversary. Whether this is a critical problem depends on the application.

8 Example

We present two examples in which the RDP and DP techniques are compared on synthetic histogram data. In the first example the histogram has k = 25 bins, all but two of which are empty, and the n = 500 points fall into the other two. Figure 1(a) shows the original data as well as the sanitized data due to differential privacy and RDP. Figure 1(b) shows the distribution of the ℓ1 loss over 100 simulations of both approaches.
We see that the risk of the RDP histogram is typically much lower than that of the DP histogram, which occasionally has risk in excess of 0.5 (recall that the maximum possible loss is 2, in the case that the original and sanitized histograms have completely disjoint support).

[Figure 1: A one dimensional example. (a) Original and synthetic data for DP (top) and RDP (bottom); (b) empirical error distributions for DP (top) and RDP (bottom).]

We present an analogous two dimensional example in Figure 2. Here the histogram has k = 400 bins, of which all but 16 are empty. In this example we see that the RDP technique has uniformly better loss than the DP technique.

9 Conclusion

We have introduced a relaxed version of differential privacy, random differential privacy, shown how to apply it to histograms, and examined the accuracy of the resulting method. We also demonstrated some properties of our definition, and explained a basic construction for the release of arbitrary functions of the data. As we mentioned in the introduction, we are not suggesting that differential privacy should be abandoned and replaced by random differential privacy. However, we do think it is fruitful to consider various relaxations of differential privacy to gain a deeper understanding of the tradeoffs between the strength of the privacy guarantee and the accuracy of the data release mechanism. In ongoing work we are extending this work to allow for data-dependent choices of the number of bins and to allow for other density estimators besides histograms. We are also considering other relaxations of differential privacy; we will report on these results in future work.

References

[1] Miguel A.
Arcones. The Bahadur-Kiefer representation for U-quantiles. The Annals of Statistics, 24(3):1400–1422, 1996.

[2] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 273–282, 2007.

[Figure 2: Empirical error distributions for a two dimensional histogram, displayed in the top left.]

[3] A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 128–138, 2005.

[4] Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. NIPS 2008, 2008.

[5] C. Dwork and J. Lei. Differential privacy and robust statistics. Proceedings of the 41st ACM Symposium on Theory of Computing, pages 371–380, May–June 2009.

[6] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. Proceedings of the 3rd Theory of Cryptography Conference, pages 265–284, 2006.

[7] C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of LP decoding. In Proceedings of the Symposium on the Theory of Computing, 2007.

[8] Cynthia Dwork. Differential privacy. 33rd International Colloquium on Automata, Languages and Programming, pages 1–12, 2006.

[9] Stephen E. Fienberg, Alessandro Rinaldo, and Xiaolin Yang. Differential privacy and the risk-utility tradeoff for multi-dimensional contingency tables. Privacy in Statistical Databases, pages 197–199, 2010.

[10] Moritz Hardt and Kunal Talwar.
On the geometry of differential privacy. STOC '10: Proceedings of the 42nd ACM Symposium on Theory of Computing, pages 705–714, 2010.

[11] J. Kiefer. On Bahadur's representation of sample quantiles. The Annals of Mathematical Statistics, 38(5):1323–1342, 1967.

[12] A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber. Privacy: theory meets practice on the map. Proceedings of the 24th International Conference on Data Engineering, pages 277–286, 2008.

[13] P. Massart. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. The Annals of Probability, 18(3), 1990.

[14] K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. Proceedings of the 39th Annual ACM Symposium on Theory of Computing, pages 75–84, 2007.

[15] Adam Smith. Efficient, differentially private point estimators. 2008.

[16] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. The Journal of the American Statistical Association, 105:375–389, 2010.

[17] Bin Yu. Assouad, Fano, and Le Cam. In D. Pollard, E. Torgersen, and G. Yang, editors, Festschrift for Lucien Le Cam, pages 423–435. Springer, 1997.
