Random Differential Privacy


Authors: Rob Hall, Alessandro Rinaldo, Larry Wasserman

Random Differential Privacy
Rob Hall, Alessandro Rinaldo, Larry Wasserman
November 27, 2024

Abstract

We propose a relaxed privacy definition called random differential privacy (RDP). Differential privacy requires that adding any new observation to a database will have small effect on the output of the data-release procedure. Random differential privacy requires that adding a randomly drawn new observation to a database will have small effect on the output. We show an analog of the composition property of differentially private procedures which applies to our new definition. We show how to release an RDP histogram, and we show that RDP histograms are much more accurate than histograms obtained using ordinary differential privacy. We finally show an analog of the global sensitivity framework for the release of functions under our privacy definition.

1 Introduction

Differential privacy (DP) ([8]) is a type of privacy guarantee that has become quite popular in the computer science literature. The advantage of differential privacy is that it gives a strong and mathematically rigorous guarantee. The disadvantage is that the strong privacy guarantee often comes at the expense of the statistical utility of the released information. We propose a weaker notion of privacy, called "random differential privacy" (RDP), under which it is possible to achieve better accuracy. The privacy guarantee provided by RDP represents a radical weakening of ordinary differential privacy. This could be a cause for concern for those who want very strong privacy guarantees. Indeed, we are not suggesting that RDP should replace ordinary differential privacy. However, as we shall show in this paper (and as has been observed many times in the past), differential privacy can lead to large information losses in some cases (see e.g., [9]).
Thus, we feel there is great value in exploring weakened versions of differential privacy. In other words, we are proposing a new privacy definition as a way of exploring the privacy/accuracy tradeoff.

We begin by introducing ordinary differential privacy and setting up some notation. We then explore the lower limits for accuracy of differentially private techniques in the context of histograms. We introduce a concept which parallels minimaxity in statistics, and identify the minimax risk for a differentially private histogram. We describe an important subset of these minimax differentially private histograms which we show to have risk which is uniformly lower bounded at a rate which is linear in the dimension of the histogram. We then introduce our proposed relaxation of differential privacy, under which our technique enjoys the same minimax risk, but with a lower bound which depends only on the size of the support of the histogram (namely, the number of nonzero cells). Thus we show that in the context of sparse histograms, the relaxation allows for a strictly better data release. We also demonstrate some important properties of our relaxation, such as an analog of the composition lemma.

2 Differential Privacy (DP)

2.1 Definition

Let X = (X_1, ..., X_n) ∈ X^n be an input database with n observations, where X_i ∈ X. The goal is to produce some output Z ∈ Z. For example, the inputs may consist of database rows in which each column is a measurement of an individual, and the output is the number of individuals having some property. Let Q_n(· | X) be a conditional distribution for Z given X. Write X ∼ X' if X, X' ∈ X^n and X and X' differ in one coordinate; we say that X and X' are neighboring databases.¹ We say Q_n satisfies α-differential privacy if, for all measurable B ⊆ Z and all X ∼ X' ∈ X^n,

    e^{−α} ≤ Q_n(Z ∈ B | X) / Q_n(Z ∈ B | X') ≤ e^{α}.
(1)

The intuition is that, for small α > 0, the value of one individual's data has small effect on the output. We consider any DP algorithm to be a family of distributions Q_n over the output space Z; we index the family by n to show the size of the dataset. It has been shown by researchers in privacy that differential privacy provides a very strong guarantee. Essentially, it means that whether or not one particular individual is entered in the database has negligible effect on the output. The research on differential privacy is vast; a few key references are [8], [7], [2], [5], [3] and references therein.

2.2 Noninteractive Privacy and Histograms

Much research on differential privacy focuses on the case where Z is a response to some query such as "what is the mean of the data?" A simple way to achieve differential privacy in that case is to add noise having a Laplace distribution to the mean of X. The user may send a sequence of such queries; this is called interactive privacy. We instead focus on noninteractive privacy, where the goal is to output a whole database (or a "synthetic dataset") Z = (Z_1, ..., Z_N). Then the user is not restricted to a small number of queries.

One way to release a private database is to first release a privatized histogram. We can then draw an arbitrarily large sample Z = (Z_1, ..., Z_N) from the histogram. It is easy to show that if the histogram satisfies DP then Z also satisfies DP. Hence, in the rest of the paper, we focus on constructing a private histogram. We consider privatization mechanisms which are permutation invariant with respect to their inputs (i.e., those distributions which treat the values x_i as a set rather than a vector); in the context of histograms this appears to be a very mild restriction. We partition the sample space X into k cells (or bins) {B_j}_{j=1}^{k}.²
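As a concrete illustration of the interactive mechanism mentioned above, Laplace noise calibrated to the sensitivity of a mean query yields α-DP. This sketch is ours, not the paper's; the function name and the assumption that the data lie in a known interval [lo, hi] are assumptions of the example:

```python
import numpy as np

def laplace_mean(x, alpha, lo=0.0, hi=1.0, rng=None):
    """Release the mean of data lying in [lo, hi] under alpha-DP.

    Changing one of the n entries moves the mean by at most
    (hi - lo) / n, so Laplace noise with scale (hi - lo) / (n * alpha)
    makes the ratio of output densities for neighboring databases
    lie in [e^{-alpha}, e^{alpha}], as required by (1).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    sensitivity = (hi - lo) / len(x)
    return x.mean() + rng.laplace(scale=sensitivity / alpha)
```

For large n the noise scale is O(1/(nα)), so the released mean is accurate; the cost of DP appears once many such queries must be answered, which motivates the noninteractive approach below.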
We consider the input to be a lattice point in the k-simplex, via the function θ_n(x_1, ..., x_n) = (θ_1, ..., θ_k), where

    θ_j = (1/n) Σ_{i=1}^{n} 1{x_i ∈ B_j}.

The image of this mapping, Θ = θ_n(X^n), is the set of lattice points in the simplex which correspond to histograms of n observations in k bins. Note that this is in essence a "normalized histogram," since the elements sum to one. This set depends on k, although we suppress this in the notation. For the remainder of this paper we take the output space Z to be the same as the input space (i.e., a normalized histogram).

Now we give a concrete example of a Q_n which achieves differential privacy. Define z_j = θ_j + 2L_j/(nα), where L_1, ..., L_k are independent draws from a Laplace distribution with mean zero and rate one. Then (z_1, ..., z_k) satisfies DP (see e.g., [8]). However, the z_j themselves do not represent a histogram, because they can be negative and they do not necessarily sum to one. Hence we may take, for example,

    δ(z) = argmin_{θ ∈ Θ} ||z − θ||_1,   (2)

where ||x||_1 = Σ_j |x_j| is the ℓ1 norm. This procedure results in a valid histogram. Note that δ(z) satisfies differential privacy, since each subset of values it may take clearly corresponds to a measurable subset of R^k; since differential privacy held for the real vector, it also holds for the projection (see e.g., [16]). We will refer to this as the histogram perturbation method (see e.g., [16]). There are other methods for generating differentially private histograms, and our results below hold over a large subset of all the available techniques (to be made precise after Proposition 3.2); hence our results apply to more than the above concrete scheme.

¹ In some papers, the definition is changed so that one sample is a strict subset of the other, having exactly one less element.
Although this definition is perhaps slightly stronger, we do not use it, and remark that the approaches we present below may all be fit into this framework if so desired.

² In this paper, k is taken as a given integer. The problem of choosing an optimal k in a private manner is the subject of future work.

3 Lower Bounds for Accuracy with Differential Privacy

To motivate the need for relaxed versions of differential privacy, we consider here the accuracy of differentially private histograms. We evaluate a differentially private procedure in terms of its "risk," which is a natural measure of accuracy taken from statistics. We consider the ℓ1 loss function and the associated risk

    R(θ, Q_n) = ∫_Θ ||θ̂ − θ||_1 dQ_n(θ̂ | θ),   (3)

where θ̂ is the output of the differentially private algorithm, θ is the input histogram, and Q_n is the distribution induced by the randomized algorithm. Typically this risk will be a non-constant function of the parameter θ and of the distribution Q_n. Therefore we consider the "minimax risk," which is the smallest achievable worst-case risk, and gives a measure of the hardness of the problem which does not depend on a particular choice of procedure:

    R* = inf_{Q_n} sup_{θ ∈ Θ} R(θ, Q_n).   (4)

We next describe the minimax risk of the best fully differentially private mechanism Q_n.

Proposition 3.1. R* ≥ c_0 (k − 1)/(αn).

Proof. The proof uses a standard method for deriving minimax lower bounds in statistical estimation. Consider the (k − 1)-dimensional hypercube

    { (σ_1 τ/n, ..., σ_{k−1} τ/n, (n − τ Σ_{i=1}^{k−1} σ_i)/n) : σ_i ∈ {0, 1} }.

Take θ, θ' to be neighboring corners of this hypercube (namely, two elements which differ in exactly one coordinate σ_i).
Take the KL divergence between the conditional distributions at these corners:

    KL( Q_n(·|θ) || Q_n(·|θ') ) = ∫_Θ log [ Q_n(θ̂|θ) / Q_n(θ̂|θ') ] dQ_n(θ̂|θ).

By considering a sequence of points corresponding to neighboring inputs, we find the ratio of densities to have the upper bound

    Q_n(θ̂|θ) / Q_n(θ̂|θ') ≤ e^{ατ},

since τ elements of the input have to change to move from θ to θ', and the ratio at each step is bounded by e^{α}. Therefore the KL divergence obeys KL( Q_n(·|θ) || Q_n(·|θ') ) ≤ ατ. The "affinity" between the two distributions is

    || Q_n(·|θ) ∧ Q_n(·|θ') || = ∫_Θ min{ Q_n(θ̂|θ), Q_n(θ̂|θ') } dθ̂.

The Kullback-Csiszar-Kemperman inequality [17] yields a lower bound on the affinity between these distributions:

    || Q_n(·|θ) ∧ Q_n(·|θ') || ≥ 1 − √(ατ/2).

Assouad's lemma (see [17] again) thus gives the lower bound

    R* ≥ [(k − 1)τ / (2n)] (1 − √(ατ/2)).

Taking τ = t/α gives

    R* ≥ [(k − 1)t / (2αn)] (1 − √(t/2)).

For α < 1 we may take t < 1, which makes the parenthetical expression positive.

Remark 1. The previous result demonstrates that the minimax risk of the differentially private histogram is of the order O(k/(αn)).

Remark 2. Hardt and Talwar [10] have a similar result, although their setting is somewhat different; in particular, they do not restrict to the space of histograms based on n observations.

The above result demonstrates that for every differentially private scheme there is at least one input for which the risk grows at the order shown (in fact, at least one point in every hypercube of side length τ/n). However, the prospect exists that at many other inputs the risk is much lower. We now demonstrate that this is not the case when k = 2, by presenting a uniform lower bound for the risk among all minimax schemes.
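As a quick numerical check (ours, not the paper's), the O(k/(αn)) rate of Remark 1 is visible in simulation. For the Laplace perturbation z_j = θ_j + 2L_j/(nα), the expected ℓ1 error is exactly 2k/(nα), since E|Laplace(b)| = b; the values of k, n, and α below are arbitrary:

```python
import numpy as np

# Monte Carlo check that E||z - theta||_1 = 2k/(n*alpha) for the
# Laplace-perturbed histogram (ignoring the projection step (2),
# which changes the error by at most a constant factor).
rng = np.random.default_rng(0)
k, n, alpha, reps = 20, 500, 1.0, 20000
noise = rng.laplace(scale=2.0 / (n * alpha), size=(reps, k))
risk = np.abs(noise).sum(axis=1).mean()   # estimate of E||z - theta||_1
print(risk)   # approximately 2*k/(n*alpha) = 0.08
```

The error grows linearly in k regardless of how many cells of θ are actually occupied, which is the phenomenon the propositions in this section formalize.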
In the case of k = 2, the output may be regarded as a single number a/n, where a ∈ {0, ..., n}, which gives the proportion of the data points in the first bin. Our result will show that the minimax differential privacy schemes are similar to "equalizer rules," in the sense that the risk is on the same order for every input.

Proposition 3.2. For k = 2, for any Q_n which achieves sup_θ R(θ, Q_n) ≤ c_0/(αn), we have inf_θ R(θ, Q_n) ≥ c_1/(αn).

Proof. Note that for any θ_1 and c > c_0, due to the uniform upper bound on the risk, Markov's inequality gives

    ∫_Z 1{ |θ̂ − θ_1| ≤ c/(αn) } dQ_n(θ̂|θ_1) ≥ 1 − c_0/c.

Therefore, due to the constraint of differential privacy, we have, for any θ_0,

    ∫_Z 1{ |θ̂ − θ_1| ≤ c/(αn) } dQ_n(θ̂|θ_0) ≥ (1 − c_0/c) exp{ −(αn/2) ||θ_0 − θ_1||_1 },

since (n/2)||θ_0 − θ_1||_1 elements of the input change to move from θ_0 to θ_1. Therefore, taking θ_1 so that ||θ_0 − θ_1||_1 = 2c/(αn) gives

    R(θ_0, Q_n) ≥ [c/(αn)] (1 − c_0/c) e^{−c} = c_1/(αn).

As θ_0 is arbitrary, this gives a uniform lower bound under the conditions above.

For the relaxation of differential privacy given in Definition 2.2 of [10], the above result remains intact for large enough n. The relaxation is

    Q_n(z|X) ≤ Q_n(z|X') e^{α} + η(n),

where η(n) is negligible (i.e., tending to zero faster than any inverse polynomial in n). Thus, via the same technique as above, we have

    R(θ_0, Q_n) ≥ [c/(αn)] ( (1 − c_0/c) e^{−c} − c_2 η(n) ) = (c_1 − η(n))/(αn).

For large enough n this latter term is bounded from below by c_3/(αn). This indicates that the above relaxation of differential privacy will not be useful in achieving higher accuracy.

For k > 2, we may write

    R(θ, Q_n) = Σ_{i=1}^{k} R_i(θ, Q_n), with R_i(θ, Q_n) := ∫_Z |θ̂_i − θ_i| dQ_n(θ̂|θ),

where the subscript denotes the ith coordinate.
Thus, whenever R_i ≤ c_0/(αn) uniformly over i, we have R(θ, Q_n) ≥ c_1(k − 1)/(αn). Therefore the only opportunity to improve upon the rate k/(αn) is when some θ have some coordinate i at which the risk upper bound does not apply.

We conclude by remarking that we have demonstrated that, for a certain class of differentially private algorithms which achieve the "minimax rate," the risk is uniformly lower bounded at the same rate. The rate in question is linear in k, which is problematic when k is large relative to n. It remains an open question whether there are different techniques which achieve the minimax rate yet do not have this property. Such a technique would have to lose the uniform upper bound on the coordinate-wise risk. Below, we present a weakening of differential privacy which admits release mechanisms that both keep the uniform upper bound on the coordinate-wise risk and have a minimax risk which grows only in the support of the histogram (namely, the number of cells which contain observations).

4 Random Differential Privacy

In random differential privacy (RDP) we view the data X = (X_1, ..., X_n) as random draws from an unknown distribution P. This is certainly the case in statistical sampling, and of course it is the usual assumption in most learning theory. Let us denote the observed values of the random variables X = (X_1, ..., X_n) by x = (x_1, ..., x_n). Recall that under DP, Q(Z ∈ B | x_1, ..., x_n) is not strongly affected if we replace some value x_i with another value x'_i. We continue to restrict to the case in which Q(Z ∈ B | x_1, ..., x_n) is invariant to permutations of (x_1, ..., x_n). Thus we may restate DP by saying that Q(Z ∈ B | x_1, ..., x_n) is not strongly affected if we replace x_n by some other arbitrary value x'_n.
In RDP, we require instead that the distribution Q_n(· | x_1, ..., x_n) is not strongly affected if we replace x_n by some new x'_n which is also randomly drawn from P.

Definition 1 ((α, γ)-Random Differential Privacy). We say that a randomized algorithm Q_n is (α, γ)-randomly differentially private when

    P( ∀ B ⊆ Z : e^{−α} ≤ Q_n(Z ∈ B | X) / Q_n(Z ∈ B | X') ≤ e^{α} ) ≥ 1 − γ,

where X = (X_1, ..., X_{n−1}, X_n), X' = (X_1, ..., X_{n−1}, X_{n+1}) (i.e., X ∼ X'), and the probability is with respect to the (n + 1)-fold product measure P^{n+1} on the space X^{n+1}; that is, X_1, ..., X_{n+1} iid ∼ P.

We also give the "random" analog of (α, η)-differential privacy:

Definition 2 ((α, η, γ)-Random Differential Privacy). We say that a randomized algorithm Q_n is (α, η, γ)-randomly differentially private when

    P( ∀ B ⊆ Z : Q_n(Z ∈ B | X) ≤ e^{α} Q_n(Z ∈ B | X') + η(n) ) ≥ 1 − γ,

where η is negligible (i.e., decreasing faster than any inverse polynomial).

We note that [12] also consider a probabilistic relaxation of DP. However, their relaxation is quite different from the one considered here: it bounds the probability that the differential privacy criterion is not met, but the probability is taken with respect to the randomized algorithm itself. Our relaxation takes the probability with respect to the generation of the data itself. The following result is clear from the definition of random differential privacy.

Proposition 4.1. (α, γ)-RDP is a strict relaxation of α-DP. That is, if Q_n is DP then it is also RDP; however, there are RDP procedures that are not DP.

Remark 3. Although an α-DP procedure fulfils the requirement of (α, 0)-RDP, the converse is not true.
The reason is that the latter requires that the condition (that the ratio of densities be bounded) holds almost everywhere with respect to the unknown measure, whereas DP requires that this condition holds uniformly everywhere in the space.

We next show an important property of the definition: namely, that RDP algorithms may be composed to give other RDP algorithms with different constants. The analogous composition property for DP is considered important because it allows rapid development of techniques which release multiple statistics, as well as techniques which allow interactive access to the data.

Proposition 4.2 (Composition). Suppose Q, Q' are distributions over Z, Z' which are (α, γ)-RDP and (α', γ')-RDP respectively. The following distribution C over Z × Z' is (α + α', γ + γ')-RDP:

    C(Z, Z' | X) = Q(Z | X) · Q'(Z' | X).

This result is simply an application of the union bound combined with the standard composition property of differential privacy. As an example, suppose it is required to release k different statistics of some data sample. If each one is released via an (α/k, γ/k)-RDP procedure, then the overall release of all k statistics together achieves (α, γ)-RDP. A similar result holds for the composition of (α, η, γ)-RDP releases.

5 RDP Sparse Histograms

We first give a technique for the release of a histogram which works well in the case of a sparse histogram, and which satisfies (α, γ)-random differential privacy. We then compare the accuracy of this method to a lower bound on the accuracy of an α-differentially private approach. The basic idea is to add no noise to cells with low counts. This partitions the space into two blocks: we release a noise-free histogram in one block and a differentially private histogram in the other. The partition will depend on the data itself.
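The two-block release just described can be sketched as follows. This is our simplified rendering of the mechanism defined in the next paragraphs (the function and variable names are ours): empty cells are published exactly, while occupied cells receive the usual Laplace perturbation.

```python
import numpy as np

def rdp_sparse_histogram(theta, n, alpha, gamma, rng=None):
    """Release a normalized histogram theta = (theta_1, ..., theta_k)
    under (alpha, gamma)-RDP.

    If 2k <= gamma*n, empty cells (theta_j = 0) are released without
    noise, and occupied cells get Laplace noise of scale 2/(n*alpha);
    otherwise every cell is perturbed, as in the plain DP method.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    k = len(theta)
    if 2 * k <= gamma * n:
        noisy = theta > 0               # only cells containing observations
    else:
        noisy = np.ones(k, dtype=bool)  # fall back to noising every cell
    z = theta.copy()
    z[noisy] += rng.laplace(scale=2.0 / (n * alpha), size=int(noisy.sum()))
    return z  # may still be projected onto the simplex as in (2)
```

The noise added to the occupied block is exactly the DP perturbation; the privacy cost of leaving the empty block exact is paid in the γ parameter, as the proposition below makes precise.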
For a sample x_1, ..., x_n, let

    S = S(x_1, ..., x_n) = { j : θ_j = 0 }.

Then we consider the release mechanism

    z_j = θ_j                  if j ∈ S and 2k ≤ γn,
    z_j = θ_j + (2/(nα)) L_j   otherwise.   (5)

Proposition 5.1. The random vector Z = (z_1, ..., z_k) as defined in (5) satisfies (α, γ)-RDP.

In demonstrating RDP, we take the sample x_1, ..., x_n, x_{n+1} and denote S = S(x_1, ..., x_n) and S' = S(x_1, ..., x_{n−1}, x_{n+1}). We consider the output distribution of our method when applied to each of the neighboring samples. The event that the ratio of densities fails to meet the requisite bound is a subset of the event that either x_{n+1} ∈ S or x_n ∈ S', when 2k ≤ γn. On the complement of this event the partitions are the same, and the differing samples both fall within the block which receives the Laplace noise, so the DP condition is achieved. To demonstrate RDP, we simply bound the probability of the aforementioned event, conditional on the order statistics.

Proof of Proposition 5.1. In the interest of space, let the vector of order statistics be denoted T = (x_{(1)}, ..., x_{(n+1)}). Let

    S*(x_1, ..., x_n, x_{n+1}) = { j : Σ_{i=1}^{n+1} 1{x_i ∈ B_j} ≤ 1 }.

We have S, S' ⊆ S*, and thus

    P(x_n ∈ S' or x_{n+1} ∈ S | T) ≤ P(x_n ∈ S* or x_{n+1} ∈ S* | T).

The latter probability is just the fraction of ways in which the order statistics may be rearranged so that x_n or x_{n+1} falls within S*. Due to the condition 2k ≤ γn, we have |S*| ≤ k ≤ γn/2. Therefore the fraction of rearrangements having at least one of x_n or x_{n+1} in S* is bounded above:

    P(x_n ∈ S* or x_{n+1} ∈ S* | T) ≤ 2|S*| / (n + 1) < γ.

Therefore

    P(x_n ∈ S' or x_{n+1} ∈ S) ≤ ∫ P(x_n ∈ S' or x_{n+1} ∈ S | T) dP(T)
                               ≤ ∫ P(x_n ∈ S* or x_{n+1} ∈ S* | T) dP(T)
                               < γ.
Finally,

    P( ∀ B ⊆ Z : e^{−α} ≤ Q_n(Z ∈ B | X) / Q_n(Z ∈ B | X') ≤ e^{α} ) = 1 − P(x_n ∈ S' or x_{n+1} ∈ S) > 1 − γ.

5.1 Accuracy

Here we show that δ(z) from (2) is close to θ even when the histogram is sparse.

Theorem 5.2. Suppose that 2k ≤ γn. Let θ_n(x_1, ..., x_n) = (θ_1, ..., θ_r, 0, ..., 0) for some 1 ≤ r < k. Then

    ||θ − δ(z)||_1 = O_P(r/(αn)).

Proof. Let L_1, ..., L_r ∼ Laplace. Let E be the event that L_j > −(nα/2) θ_j for all 1 ≤ j ≤ r. Then E holds except on a set of exponentially small probability. Suppose E holds. Let W = Σ_{j=1}^{r} |L_j| = O_P(r). For 1 ≤ j ≤ r, z_j = θ_j + 2L_j/(nα); for j > r, z_j = θ_j = 0. Hence ||z − θ||_1 = O_P(r/(αn)). Furthermore,

    ||δ(z) − z||_1 ≤ r/n ≤ r/(αn).

Hence, via the triangle inequality, ||δ(z) − θ||_1 = O_P(r/(αn)).

We thus have a technique for which the risk is uniformly bounded above by O(k/(αn)), as with the DP technique, and which also enjoys the coordinate-wise upper bound on the risk. However, in this regime the risk is no longer uniformly lower bounded at a rate linear in k, since the upper bound is linear in r in the case of sparse vectors.

6 RDP via Sensitivity Analysis

We next demonstrate that RDP admits schemes for the release of other kinds of statistics (besides histograms). A common technique used to establish a differentially private method is to add Laplace noise with variance proportional to the "global sensitivity" of the function [6]. We show that there is an analog of this technique for RDP.

We next demonstrate a method for the RDP release of an arbitrary function g_n(x_1, ..., x_n) ∈ R. We consider the algorithm which samples from the distribution

    Q_n(z | x_1, ..., x_n) ∝ exp( −α |z − g_n(x_1, ..., x_n)| / s_n(x_1, ..., x_n) ).   (6)

It is well known that when s_n is the constant function which gives an upper bound on the global sensitivity [6] of g_n, this method enjoys α-DP. As we allow s_n to depend on the data, we may make use of the local sensitivity framework of [14]. There it is demonstrated that whenever

    ∀ X ∼ X': s_n(X) ≤ e^{β} s_n(X')   (7)

and

    ∀ X: sup_{X' ∼ X} |g_n(X) − g_n(X')| ≤ s_n(X),   (8)

then (6) gives (2α, η)-DP with

    η = e^{−α/(2β)}   (9)

(see [14], Definition 2.1, Lemma 2.5, and Example 3). In moving from DP to RDP we may now require that conditions (7) and (8) hold only with the requisite probability 1 − γ; then (6) will achieve (2α, η, γ)-RDP.

We consider a special subset of functions for which

    sup_{X ∼ X'} |g_n(X) − g_n(X')| = n^{−1} sup_{x, x'} h(x, x').

Examples of functions satisfying this property are, e.g., statistical point estimators [15] and regularized logistic regression estimates [4]. In particular, in these cases it is assumed that X is some compact subset of R^d, and then, e.g., h(x, x') = ||x − x'||_2, whose supremum gives the diameter of this set. We replace conditions (7) and (8) with

    P( s_n(X) ≤ e^{β} s_n(X') ) ≥ 1 − γ_1   (10)

and

    P( n^{−1} h(x, x') ≤ min{ s_n(X), s_n(X') } ) ≥ 1 − γ_2,   (11)

where x, x' are random draws from P which are independent of the random vectors X, X'. The first condition simply requires (7) to hold except on a set of measure γ_1. The second condition implies that both s_n(X) and s_n(X') give upper bounds on the local sensitivity, except on a set of measure γ_2. Putting these together, along with the above considerations, yields a (2α, η, γ_1 + γ_2)-RDP method. We note that we are essentially asking that s_n(X) and s_n(X') both give valid quantiles for the random variable h(x, x'), and that they give similar values with high probability.
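The quantile-based choice of s_n described above can be sketched as follows. This is our illustration, not the paper's code: d_δ(X) is computed as an empirical (1 − δ)-quantile of h over disjoint pairs of sample points, anticipating the empirical process defined next, and the choice h(x, x') = |x − x'| in the example is an assumption:

```python
import numpy as np

def quantile_sensitivity(x, h, delta):
    """Estimate s_n(X) = d_delta(X) / n, where d_delta(X) is the
    empirical (1 - delta)-quantile of h evaluated on the n/2 disjoint
    pairs (x_i, x_{i + n/2}) of the sample."""
    x = np.asarray(x, dtype=float)
    m = len(x) // 2
    pairs = np.array([h(x[i], x[i + m]) for i in range(m)])
    return np.quantile(pairs, 1.0 - delta) / len(x)

def rdp_release(g_value, s_n, alpha, rng=None):
    """Sample from Q_n(z | X) proportional to exp(-alpha |z - g_n(X)| / s_n(X)),
    i.e. Laplace noise of scale s_n / alpha around the statistic g_n(X)."""
    rng = np.random.default_rng() if rng is None else rng
    return g_value + rng.laplace(scale=s_n / alpha)

# Example: releasing a sample mean, for which h(x, x') = |x - x'|.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
s = quantile_sensitivity(x, lambda a, b: abs(a - b), delta=0.05)
z = rdp_release(x.mean(), s, alpha=1.0, rng=rng)
```

The sensitivity scale here is a data-dependent quantile rather than a worst-case bound, which is exactly what trades the uniform DP guarantee for the probabilistic RDP one.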
We consider the empirical process based on h and the data sample X given by

    D(X, t) = (2/n) Σ_{i=1}^{n/2} 1{ h(x_i, x_{i+n/2}) ≤ t }.

This is exactly an empirical CDF for the distribution of h(x, x'), based on n/2 independent samples of h(x, x'). We may anticipate that sample quantiles of this empirical CDF will be close to the quantiles of the true CDF, which we denote by H(t) = P(h ≤ t). This is made precise by the DKW inequality (see e.g., [13]), which in this case yields

    P( sup_t |H(t) − D(X, t)| ≥ ε ) ≤ 2 e^{−nε²}.   (12)

Thus, taking d_δ(X) to be the smallest d with D(X, d) = 1 − δ, and h_{δ'} to give the 1 − δ' quantile of h, with δ < δ', we have

    P( h(x, x') > d_δ(X) ) ≤ δ' + P( d_δ(X) < h_{δ'} ) ≤ δ' + 2 e^{−(δ' − δ)² n}.

The second inequality comes from applying the monotone function D(X, ·) to both sides of the inequality inside the probability and rearranging, to yield P( D(X, h_{δ'}) − H(h_{δ'}) > δ' − δ ), which is bounded via the DKW inequality (12). Thus, for an appropriate choice of δ, δ', we may take s_n(X) = n^{−1} d_δ(X) and achieve (11).

Now, to achieve (10), we turn to the Bahadur-Kiefer representation of sample quantiles (see [11]). We have

    d_δ(X) − h_δ = [ D(X, h_δ) − H(h_δ) ] / H'(h_δ) + O_p(n^{−3/4}),

where H' is the derivative of H (namely, the density); hence we concentrate on the case in which h is a continuous random variable. We find the ratio to be bounded in probability:

    d_δ(X) / d_δ(X') ≤ 1 + |d_δ(X) − d_δ(X')| / d_δ(X') = 1 + O_p(n^{−1/2}) / ( h_δ + O_p(n^{−1/2}) ),

where the final equality stems from using DKW to bound D(X, h_δ) − H(h_δ), along with the triangle inequality to bound |D(X, h_δ) − D(X', h_δ)|.
This therefore demonstrates that

    d_δ(X) / d_δ(X') ≤ 1 + O_p(n^{−1/2}) = e^{O_p(n^{−1/2})}.

This means that for large enough n, with probability 1 − γ_1, the ratio is bounded by e^{β}, where β is polynomial in n^{−1/2}. Examining (9), we find η to be negligible for such a choice of β. Therefore the use of s_n(X) = n^{−1} d_δ(X) achieves the RDP as required.

We note that in principle this same approach would work were we to replace D(X, t) with the U-statistic process

    U(X, t) = (1 / C(n, 2)) Σ_{i > j} 1{ h(x_i, x_j) ≤ t }.

Though this is essentially another empirical CDF, it is based on non-independent samples, since each x_i participates in n − 1 of the evaluations of h. Nevertheless, an analog of the DKW inequality still applies to this process, and we still have the same behavior of the quantiles (see e.g., [1]).

7 Privacy Concerns

As stated above, we mainly use random differential privacy as a vehicle for a theoretical exploration of the boundaries of differential privacy. Although it is a conceptually reasonable weakening of differential privacy, whether it is appropriate to use in practice requires more attention. For example, if the hypothesized adversary (of, e.g., [16], Theorem 2.4) really had access to a subset of n − 1 of the data, and the one remaining element was the only inhabitant of its histogram cell, then this would be immediately revealed to the adversary. Whether this is a critical problem depends on the application.

8 Example

We present two examples in which the RDP and DP techniques are compared on synthetic histogram data. In the first example the histogram has k = 25 bins, all but two of which are empty, and the n = 500 points fall into the other two. Figure 1(a) shows the original data as well as the sanitized data due to differential privacy and RDP. Figure 1(b) shows the distribution of the ℓ1 loss over 100 simulations of both approaches.
We see that the risk of the RDP histogram is typically much lower than that of the DP histogram, which occasionally has risk in excess of 0.5 (recall that the maximum possible loss is 2, in the case that the original and sanitized histograms have completely disjoint support).

[Figure 1: A one dimensional example. (a) Original and synthetic data for DP (top) and RDP (bottom); (b) empirical error distributions for DP (top) and RDP (bottom).]

We present an analogous two dimensional example in Figure 2. Here the histogram has k = 400 bins, of which all but 16 are empty. In this example we see that the RDP technique has uniformly better loss than the DP technique.

9 Conclusion

We have introduced a relaxed version of differential privacy, random differential privacy, shown how to apply it to histograms, and examined the accuracy of the resulting method. We also demonstrated some properties of our definition, and explained a basic construction for the release of arbitrary functions of the data. As we mentioned in the introduction, we are not suggesting that differential privacy should be abandoned and replaced by random differential privacy. However, we do think it is fruitful to consider various relaxations of differential privacy to gain a deeper understanding of the tradeoffs between the strength of the privacy guarantee and the accuracy of the data release mechanism. In ongoing work we are extending this work to allow for data-dependent choices of the number of bins and to allow for other density estimators besides histograms. We are also considering other relaxations of differential privacy; we will report on these results in future work.

References

[1] Miguel A.
Arcones. The Bahadur-Kiefer representation for U-quantiles. The Annals of Statistics, 24(3):1400–1422, 1996.

[2] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: a holistic solution to contingency table release. Proceedings of the Twenty-Sixth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 273–282, 2007.

[Figure 2: Empirical error distributions for a two dimensional histogram, displayed in the top left.]

[3] A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. Proceedings of the Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 128–138, 2005.

[4] Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. NIPS 2008, 2008.

[5] C. Dwork and J. Lei. Differential privacy and robust statistics. Proceedings of the 41st ACM Symposium on Theory of Computing, pages 371–380, May–June 2009.

[6] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. Proceedings of the 3rd Theory of Cryptography Conference, pages 265–284, 2006.

[7] C. Dwork, F. McSherry, and K. Talwar. The price of privacy and the limits of LP decoding. In Proceedings of the Symposium on the Theory of Computing, 2007.

[8] Cynthia Dwork. Differential privacy. 33rd International Colloquium on Automata, Languages and Programming, pages 1–12, 2006.

[9] Stephen E. Fienberg, Alessandro Rinaldo, and Xiaolin Yang. Differential privacy and the risk-utility tradeoff for multi-dimensional contingency tables. Privacy in Statistical Databases, pages 197–199, 2010.

[10] Moritz Hardt and Kunal Talwar.
On the geometry of differential privacy. STOC '10: Proceedings of the 42nd ACM Symposium on Theory of Computing, pages 705–714, 2010.

[11] J. Kiefer. On Bahadur's representation of sample quantiles. The Annals of Mathematical Statistics, 38(5):1323–1342, 1967.

[12] A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vilhuber. Privacy: theory meets practice on the map. Proceedings of the 24th International Conference on Data Engineering, pages 277–286, 2008.

[13] P. Massart. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. The Annals of Probability, 18(3), 1990.

[14] K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. Proceedings of the 39th Annual ACM Symposium on Theory of Computing, pages 75–84, 2007.

[15] Adam Smith. Efficient, differentially private point estimators. 2008.

[16] Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy. The Journal of the American Statistical Association, 105:375–389, 2010.

[17] Bin Yu. Assouad, Fano, and Le Cam. In D. Pollard, E. Torgersen, and G. Yang, editors, Festschrift for Lucien Le Cam, pages 423–435. Springer, 1997.
