A New Class of Private Chi-Square Tests


Authors: Daniel Kifer, Ryan Rogers

October 26, 2016

Abstract

In this paper, we develop new test statistics for private hypothesis testing. These statistics are designed specifically so that their asymptotic distributions, after accounting for the noise added for privacy concerns, match the asymptotics of the classical (non-private) chi-square tests for testing whether the multinomial data parameters lie in lower-dimensional manifolds (examples include goodness of fit and independence testing). Empirically, these new test statistics outperform prior work, which focused on noisy versions of existing statistics.

∗ Department of Computer Science and Engineering, The Pennsylvania State University. Email: dkifer@cse.psu.edu. Supported in part by NSF grant CNS-1228669.
† Department of Applied Mathematics and Computational Science, University of Pennsylvania. Email: ryrogers@sas.upenn.edu.

Contents

1 Introduction
2 Related Work
3 Privacy Preliminaries
4 General Chi-Square Tests
  4.1 Private Asymptotics
  4.2 Minimum Chi-Square Theory
5 Private Goodness of Fit Tests
  5.1 Unprojected Private Test Statistic
  5.2 Projected Private Test Statistic
  5.3 Comparison of Statistics
  5.4 Power Analysis
  5.5 Experiments for Goodness of Fit Testing
6 General Chi-Square Private Tests
  6.1 Application - Independence Test
  6.2 Application - GWAS Testing
7 General Chi-Square Tests with Arbitrary Noise Distributions
  7.1 Application - Goodness of Fit Testing
  7.2 Application - Independence Testing
8 Conclusions
References
A Proofs for Section 4.2

1 Introduction

In 2008, Homer et al. [13] published a proof-of-concept attack showing that the participation of individuals in scientific studies can be inferred from aggregate data typically published in genome-wide association studies (GWAS). Since then, there has been renewed interest in protecting the confidentiality of participants in scientific data [15, 22, 27, 20] using privacy definitions such as differential privacy and its variations [7, 6, 3, 5].

An important tool in statistical inference is hypothesis testing, a general framework for determining whether a given model of a population (called the null hypothesis $H_0$) should be rejected based on a sample from the population. One of the main benefits of hypothesis testing is that it gives a way to control the probability of false discovery, or Type I error: falsely concluding that a model should be rejected when it is indeed true. Type II error is the probability of failing to reject $H_0$ when it is false. Typically, scientists want a test that guarantees a pre-specified Type I error (say, 0.05) and has high power (the complement of Type II error).

The standard approach to hypothesis testing is to (1) estimate the model parameters from the data, (2) compute a test statistic $T$ (a function of the data and the model parameters), (3) determine the (asymptotic) distribution of $T$ under the assumption that the model generated the data, and (4) compute the p-value (Type I error) as the probability of $T$ being more extreme than the realized value computed from the data.¹
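The four steps above can be sketched in a few lines. The following is an illustrative sketch (not code from the paper; the function names are ours) that carries out steps (2)-(4) for a multinomial goodness of fit test, using Monte Carlo simulation of the null distribution in place of an asymptotic formula:

```python
import numpy as np

rng = np.random.default_rng(0)

def chi_square_stat(hist, expected):
    # Step (2): a test statistic measuring data/model discrepancy.
    return np.sum((hist - expected) ** 2 / expected)

def gof_p_value(hist, p0, n_sims=20000, rng=rng):
    # Steps (3)-(4): simulate the null distribution of T and report the
    # fraction of simulated values at least as extreme as the observed one.
    n = int(np.sum(hist))
    expected = n * np.asarray(p0, float)
    t_obs = chi_square_stat(hist, expected)
    sims = rng.multinomial(n, p0, size=n_sims)
    t_null = np.sum((sims - expected) ** 2 / expected, axis=1)
    return np.mean(t_null >= t_obs)

p0 = np.array([0.25, 0.25, 0.25, 0.25])
# Data actually drawn from H0: the p-value should look uniform-ish.
pval_null = gof_p_value(rng.multinomial(1000, p0), p0)
# Data drawn from a very different distribution: the p-value should be tiny.
pval_bad = gof_p_value(rng.multinomial(1000, [0.7, 0.1, 0.1, 0.1]), p0)
```

The rest of the paper replaces the Monte Carlo step with carefully designed statistics whose asymptotic distribution is known even after privacy noise is added.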
Our main contribution is a general template for creating test statistics involving categorical data. Empirically, they improve on the power of previous work on differentially private hypothesis testing [12, 24], while maintaining at most a given Type I error. Our approach is to select certain properties of non-private hypothesis tests (e.g., their asymptotic distributions) and then build new test statistics that match these properties when Gaussian noise is added (e.g., to achieve concentrated differential privacy [5, 3] or (approximate) differential privacy [6]). Although the test statistics are designed with Gaussian noise in mind, other noise distributions, e.g., Laplace, can be applied.²

We point out that the implications of this work extend beyond simply alleviating privacy concerns. In adaptive data analysis, data may be reused for multiple analyses, each of which may depend on previous outcomes, thus potentially overfitting. This problem was recently studied in the computer science literature by Dwork et al. [8], who show that differential privacy can help prevent overfitting despite reusing data. There have been several follow-up works [9, 4, 1] that improve and extend the connection between differential privacy and generalization guarantees in adaptive data analysis. Specifically, [18] deals with post-selection hypothesis testing, where a bound on Type I error can be ensured even for several adaptively chosen tests, as long as each test is differentially private.

We discuss related work in Section 2, provide background information about privacy in Section 3, present our extension of minimum chi-square theory in Section 4, and show how it can be applied to goodness of fit (Section 5) and independence testing (Section 6). Experiments appear in these latter two sections. We evaluate our test statistics with non-Gaussian noise in Section 7.
We present conclusions in Section 8.

¹ For one-sided tests, the p-value is the probability of seeing the computed statistic or anything larger under $H_0$.
² If we use Laplace noise instead, we cannot match properties like the asymptotic distribution of the non-private statistics, but the new test statistics still empirically improve the power of the tests.

2 Related Work

One of the first works to study the asymptotic distributions of statistics that use differentially private data came from Wasserman and Zhou [25]. Smith [21] then showed that, for a large family of statistics, there is a corresponding differentially private statistic that shares the same asymptotic distribution as the original statistic. However, these results do not ensure that statistically valid conclusions are made for finite samples. It is then the goal of a recent line of work to develop statistical inference tools that give valid conclusions even for reasonably sized datasets.

The previous work on private statistical inference for categorical data can be roughly grouped into two main approaches (with most works primarily dealing with GWAS-specific applications). The first group adds appropriately scaled noise to the sampled data (or a histogram of the data) to ensure differential privacy and uses existing classical hypothesis tests, disregarding the additional noise distribution [15]. This approach is based on the argument that the impact of the noise becomes small as the sample size grows large. Along these lines, [23] studies how many more samples would be needed before the test with additional noise recovers the same level of power as the original test on the actual data.
However, as pointed out in [11, 16, 17, 12], even for moderately sized datasets the impact of privacy noise is non-negligible, and such an approach can therefore lead to misleading and statistically invalid results, specifically with much higher Type I error than the prescribed amount.

The second group of work consists of tests that focus on adjusting step (3) in the standard approach to hypothesis testing given in the introduction. That is, these tests use the same statistic as the classical hypothesis tests (without noise) and, after making the statistic differentially private, they determine the resulting modified asymptotic distribution of the private statistic [22, 27, 24, 12]. Unfortunately, the resulting asymptotic distribution cannot be written analytically, so Monte Carlo (MC) simulations or numerical approximations are commonly used to determine at what point to reject the null hypothesis.

We focus on a different technique from these two approaches, namely modifying step (2) in our outline of hypothesis testing. Thus, we consider transforming the test statistic itself so that the resulting distribution is close to the original asymptotic distribution when additional Gaussian noise is used. If the noise is non-Gaussian, then this is followed by another step that appropriately adjusts the asymptotic distribution. The idea of modifying the test statistic for regression coefficients to obtain a t-statistic in ordinary least squares has also been considered in [19].

3 Privacy Preliminaries

Formal privacy definitions can be used to protect scientific data with the careful injection of noise. Hypothesis testing must then properly account for this noise to avoid generating false conclusions.
Since our primary focus is on the noise added by the privacy definitions (rather than their specific privacy semantics), we briefly discuss the privacy definitions and then elaborate on how to add noise to satisfy those definitions.

Let $\mathcal{X}$ be an arbitrary domain for records. We define two datasets $x = (x_1, \cdots, x_n), x' = (x'_1, \cdots, x'_n) \in \mathcal{X}^n$ to be neighboring if they differ in at most one entry, i.e., there is some $i \in [n]$ where $x_i \neq x'_i$, but $x_j = x'_j$ for all $j \neq i$. We now define differential privacy (DP) [7, 6].

Definition 3.1 (Differential Privacy). A randomized algorithm $M : \mathcal{X}^n \to \mathcal{O}$ is $(\epsilon, \delta)$-DP if for all neighboring datasets $x, x'$ and each subset of outcomes $S \subseteq \mathcal{O}$,
$$\Pr[M(x) \in S] \leq e^{\epsilon} \Pr[M(x') \in S] + \delta.$$
If $\delta = 0$, we simply say $M$ is $\epsilon$-DP.

In this work, we focus on a recent variation of differential privacy called zero concentrated differential privacy (zCDP) [3] (see also [5] for the definition of concentrated differential privacy, which [3] modified). In order to define zCDP, we first define the Rényi divergence between two probability distributions.

Definition 3.2 (Rényi Divergence). Let $P_1$ and $P_2$ be probability distributions on a space $\Omega$. For $\alpha \in (1, \infty)$, we define the Rényi divergence of order $\alpha$ of $P_1$ from $P_2$ as
$$D_\alpha(P_1 \| P_2) = \frac{1}{\alpha - 1} \log\left( \mathbb{E}_{x \sim P_1}\left[ \left( \frac{P_1(x)}{P_2(x)} \right)^{\alpha - 1} \right] \right).$$

Remark 3.3. As $\alpha \to 1$ we recover KL divergence, and as $\alpha \to \infty$ we recover max divergence.

We are now ready to define zCDP.

Definition 3.4 (zCDP). A mechanism $M : \mathcal{X}^n \to \mathcal{O}$ is $\rho$-zero concentrated differentially private ($\rho$-zCDP) if for all neighboring datasets $x, x' \in \mathcal{X}^n$ and all $\alpha \in (1, \infty)$ we have
$$D_\alpha\left( M(x) \,\|\, M(x') \right) \leq \rho \alpha.$$

The following result shows that zCDP lies between pure DP, where $\delta = 0$, and approximate DP, where $\delta$ may be positive.

Theorem 3.5 ([3]). If $M$ is $\epsilon$-DP, then $M$ is $\frac{\epsilon^2}{2}$-zCDP.
Further, if $M$ is $\rho$-zCDP, then $M$ is $\left(\rho + 2\sqrt{\rho \ln(1/\delta)}, \delta\right)$-DP for every $\delta > 0$.

In order to compute some statistic $f : \mathcal{X}^n \to \mathbb{R}^d$ on the data, a differentially private mechanism may simply add symmetric noise to $f(x)$ with standard deviation that depends on the global sensitivity of $f$, which we define as
$$\Delta_p(f) = \max_{\text{neighboring } x, x' \in \mathcal{X}^n} \left\{ \| f(x) - f(x') \|_p \right\}.$$

In statistical hypothesis tests, it is typical to use the central limit theorem to form statistics of the data that are asymptotically normally distributed. We can then determine whether to reject the given model by computing the corresponding p-values based on the asymptotic distribution of the statistic, which works well in practice. Because Gaussian random variables have nice composition guarantees, e.g., the sum of two Gaussian random variables is again Gaussian (a property that is not shared by Laplace random variables), it is desirable to use a privacy definition that is well suited to Gaussian perturbations. We then define the Gaussian mechanism $M_{\text{Gauss}} : \mathcal{X}^n \to \mathbb{R}^d$ for a statistic $f : \mathcal{X}^n \to \mathbb{R}^d$, where $\sigma = \frac{\Delta_2(f)}{\sqrt{2\rho}}$, as
$$M_{\text{Gauss}}(x) \sim N\left( f(x), \sigma^2 I_d \right). \qquad (1)$$

Theorem 3.6. For a statistic $f : \mathcal{X}^n \to \mathbb{R}^d$, the Gaussian mechanism $M_{\text{Gauss}}$ is $\rho$-zCDP.

We now state several of the nice properties that zCDP shares with DP.

Theorem 3.7 (Post Processing [3]). Let $M : \mathcal{X}^n \to \mathcal{O}$ and $g : \mathcal{O} \to \mathcal{O}'$ be randomized algorithms. If $M$ is $\rho$-zCDP, then $M' : \mathcal{X}^n \to \mathcal{O}'$ where $M'(x) = g(M(x))$ is $\rho$-zCDP.

Theorem 3.8 (Composition [3]). Let $M_1 : \mathcal{X}^n \to \mathcal{O}$ and $M_2 : \mathcal{X}^n \to \mathcal{O}'$ be randomized algorithms where $M_1$ is $\rho_1$-zCDP and $M_2$ is $\rho_2$-zCDP. Then the composition $M : \mathcal{X}^n \to \mathcal{O} \times \mathcal{O}'$ where $M(x) = (M_1(x), M_2(x))$ is $(\rho_1 + \rho_2)$-zCDP.

For this work, we will be considering categorical data.
That is, we assume the domain $\mathcal{X}$ has been partitioned into $d$ buckets or outcomes, and the function $f : \mathcal{X}^n \to \mathbb{R}^d$ returns a histogram counting how many records are in each bucket. Our test statistics will only depend on this histogram. Since neighboring datasets $x, x'$ of size $n$ differ in only one entry, their corresponding histograms differ by $\pm 1$ in exactly two buckets. Hence, we will say that two histograms are neighboring if they differ in at most two entries by at most 1. In this case, $\Delta_1(f) = 2$ and $\Delta_2(f) = \sqrt{2}$.

To preserve privacy, we will add noise to the histogram $X = (X_1, \cdots, X_d)$ of our original dataset to get $\tilde{X} = (\tilde{X}_1, \cdots, \tilde{X}_d)$. We perform hypothesis testing on this noisy histogram $\tilde{X}$. By Theorem 3.7, we know that each of our hypothesis tests will be $\rho$-zCDP as long as we add Gaussian noise with variance $1/\rho$ to each count in $X$. Similarly, if we add Laplace noise with scale $2/\epsilon$ to each count, we will achieve $\epsilon$-DP (this is just an instance of the Laplace mechanism [7]).

4 General Chi-Square Tests

In the non-private setting, a chi-square test involves a histogram $X$ and a model $H_0$ that produces expected counts $\bar{X}$ over the $d$ buckets. In general, $H_0$ will have $k < d$ parameters and will estimate the parameters from $X$. The chi-square test statistic is defined as
$$T_{\text{chi}} = \sum_{i=1}^d (X_i - \bar{X}_i)^2 / \bar{X}_i.$$
If the data were generated from $H_0$ and $k$ parameters had to be estimated, then the asymptotic distribution of $T_{\text{chi}}$ is $\chi^2_{d-k-1}$, a chi-square random variable with $d - k - 1$ degrees of freedom. This is the property we want our statistics to have when they are computed from the noisy histogram $\tilde{X}$ instead of $X$. Note that in the classical chi-square tests (e.g., the Pearson independence test), the statistic $T_{\text{chi}}$ is computed and, if it is larger than the $1 - \alpha$ percentile of $\chi^2_{d-k-1}$, the model is rejected.
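The quantities defined so far can be made concrete in a few lines. The sketch below is illustrative (the function and variable names are ours, not from the paper): it checks the neighboring-histogram sensitivities $\Delta_1 = 2$ and $\Delta_2 = \sqrt{2}$, releases a $\rho$-zCDP histogram via the Gaussian mechanism (1) with per-count variance $1/\rho$, converts $\rho$-zCDP to $(\epsilon, \delta)$-DP via Theorem 3.5, and computes the classical statistic $T_{\text{chi}}$:

```python
import numpy as np

def zcdp_histogram(hist, rho, rng):
    # Gaussian mechanism (1) with sigma = Delta_2(f)/sqrt(2 rho); for histograms
    # Delta_2 = sqrt(2), so each count gets N(0, 1/rho) noise (rho-zCDP by Thm 3.6).
    sigma = np.sqrt(2.0) / np.sqrt(2.0 * rho)   # = sqrt(1/rho)
    return np.asarray(hist, float) + rng.normal(scale=sigma, size=len(hist))

def zcdp_to_dp_epsilon(rho, delta):
    # Theorem 3.5: rho-zCDP implies (rho + 2 sqrt(rho ln(1/delta)), delta)-DP.
    return rho + 2.0 * np.sqrt(rho * np.log(1.0 / delta))

def chi_square_stat(hist, expected):
    # Classical T_chi = sum_i (X_i - Xbar_i)^2 / Xbar_i.
    hist, expected = np.asarray(hist, float), np.asarray(expected, float)
    return float(np.sum((hist - expected) ** 2 / expected))

# Neighboring datasets move one record between buckets, so their histograms
# differ by +1 and -1 in exactly two entries:
h, h_neighbor = np.array([5.0, 7.0, 8.0]), np.array([6.0, 6.0, 8.0])
l1 = np.sum(np.abs(h - h_neighbor))       # Delta_1(f) = 2
l2 = np.linalg.norm(h - h_neighbor)       # Delta_2(f) = sqrt(2)

noisy = zcdp_histogram([10, 20, 30], rho=0.5, rng=np.random.default_rng(0))
eps = zcdp_to_dp_epsilon(0.5, 1e-6)       # about 5.76
t = chi_square_stat([10, 20, 30], [20, 20, 20])   # (100 + 0 + 100)/20 = 10
```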
The above facts are part of a more general minimum chi-square asymptotic theory [10], which we overview in Section 4.2. However, we first explain the differences between private and non-private asymptotics [24, 12].

4.1 Private Asymptotics

In non-private statistics, a function of $n$ data records is considered a random variable, and non-private asymptotics considers its distribution as $n \to \infty$. In private asymptotics, there is another quantity $\sigma_n^2$, the variance of the added noise.

In the classical private regime, one studies what happens as $n/\sigma_n^2 \to \infty$, i.e., when the variance due to privacy is insignificant compared to the sampling variance in the data (which is $O(n)$). In practice, asymptotic distributions derived under this regime result in unreliable hypothesis tests because the privacy noise is significant [22].

In the variance-aware private regime, one studies what happens as $n/\sigma_n^2 \to$ constant as $n \to \infty$, that is, when the variance due to privacy is proportional to the sampling variance. In practice, asymptotic distributions derived under this regime result in hypothesis tests with reliable Type I error (i.e., the p-values they generate are accurate) [12, 24]. From now on, we will be using the variance-aware privacy regime.³

4.2 Minimum Chi-Square Theory

In this section, we present important results about minimum chi-square theory. The discussion is based largely on [10] (Chapter 23). Our work relies on this theory to construct new private test statistics in Sections 5 and 6 whose asymptotic behavior matches the non-private asymptotic behavior of the classical chi-square test.

We consider a sequence of $d$-dimensional random vectors $V^{(n)}$ for $n \geq 1$ (e.g., the data histogram). The parameter space $\Theta$ is a non-empty open subset of $\mathbb{R}^k$, where $k \leq d$.
The model $A$ maps a $k$-dimensional parameter $\theta \in \Theta$ to a $d$-dimensional vector (e.g., the expected value of $V^{(n)}$); hence it maps $\Theta$ to a subset of a $k$-dimensional manifold in $d$-dimensional space. In this abstract setting, the null hypothesis is that there exists a $\theta_0 \in \Theta$ such that:⁴
$$\sqrt{n}\left( V^{(n)} - A(\theta_0) \right) \xrightarrow{D} N(0, C(\theta_0)) \qquad (2)$$
where $C(\theta) \in \mathbb{R}^{d \times d}$ is a covariance matrix. Intuitively, Equation (2) says that the central limit theorem can be applied for $\theta_0$.

We measure the distance between $V^{(n)}$ and $A(\theta)$ with a test statistic given by the following quadratic form:
$$D^{(n)}(\theta) = n \left( V^{(n)} - A(\theta) \right)^{\intercal} M(\theta) \left( V^{(n)} - A(\theta) \right) \qquad (3)$$
where $M(\theta) \in \mathbb{R}^{d \times d}$ is a symmetric positive-semidefinite matrix; different choices of $M$ will result in different test statistics. We make the following standard assumptions about $A(\theta)$ and $M(\theta)$.

Assumption 4.1. For all $\theta \in \Theta$, we have:
• $A(\theta)$ is bicontinuous,⁵
• $A(\theta)$ has continuous first partial derivatives, denoted $\dot{A}(\theta)$, with full rank $k$,
• $M(\theta)$ is continuous in $\theta$, and there exists an $\eta > 0$ such that $M(\theta) - \eta I_d$ is positive definite in an open neighborhood of $\theta_0$.

The following theorem will be useful in determining the distribution of the quadratic form $D^{(n)}(\theta)$.

³ Note that taking $n$ and $\sigma_n^2$ to infinity is just a mathematical tool for simplifying expressions while mathematically keeping the privacy noise variance proportional to the data variance; it does not mean that the amount of actual noise added to the data depends on the data size.
⁴ Here $\xrightarrow{D}$ means convergence in distribution, as in the central limit theorem [10].
⁵ I.e., $\theta_j \to \theta \Leftrightarrow A(\theta_j) \to A(\theta)$.

Theorem 4.2 ([10]). Let $W \sim N(0, \Lambda)$. Then $W^{\intercal} W \sim \chi^2_r$ (the chi-square distribution with $r$ degrees of freedom) if and only if $\Lambda$ is a projection of rank $r$. If $\Lambda$ is invertible, $W^{\intercal} \Lambda^{-1} W \sim \chi^2_d$.
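The first part of Theorem 4.2 is easy to illustrate by simulation. The sketch below (ours, purely illustrative) builds a rank-$r$ projection $\Lambda$, samples $W \sim N(0, \Lambda)$, and checks that $W^{\intercal} W$ has the mean $r$ and variance $2r$ of a $\chi^2_r$ random variable:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 5, 3

# Build a rank-r projection Lambda = V V^T from an orthonormal basis (via QR).
Vfull, _ = np.linalg.qr(rng.standard_normal((d, d)))
V = Vfull[:, :r]
Lam = V @ V.T                                   # idempotent, rank r

# Sample W ~ N(0, Lambda): each row of G below is V g with g ~ N(0, I_r),
# so its covariance is V V^T = Lambda, and W^T W = g^T g ~ chi^2_r exactly.
G = rng.standard_normal((200000, r)) @ V.T
wtw = np.sum(G ** 2, axis=1)

mean_hat, var_hat = wtw.mean(), wtw.var()       # chi^2_r: mean r, variance 2r
```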
If $\theta_0$ is known, setting $M(\theta) = C(\theta)^{-1}$ in (3) and applying Theorem 4.2 shows that $D^{(n)}(\theta_0)$ converges in distribution to $\chi^2_d$. However, as we show in Section 5, this can be a sub-optimal choice of $M$.

When $\theta_0$ is not known, we need to estimate a good parameter $\hat{\theta}^{(n)}$ to plug into (3). One approach is to set $\hat{\theta}^{(n)} = \arg\min_{\theta \in \Theta} D^{(n)}(\theta)$. However, this can be a difficult optimization. If there is a rough estimate of $\theta_0$ based on the data, call it $\phi(V^{(n)})$, and if it converges in probability to $\theta_0$ (i.e., $\phi(V^{(n)}) \xrightarrow{P} \theta_0$ as $n \to \infty$), then we can plug it into the middle matrix to get:
$$\hat{D}^{(n)}(\theta) = n \left( V^{(n)} - A(\theta) \right)^{\intercal} M\!\left(\phi(V^{(n)})\right) \left( V^{(n)} - A(\theta) \right) \qquad (4)$$
and then set our estimator $\hat{\theta}^{(n)} = \arg\min_{\theta \in \Theta} \hat{D}^{(n)}(\theta)$. The test statistic becomes $\hat{D}^{(n)}(\hat{\theta}^{(n)})$, and the following theorems describe its asymptotic properties under the null hypothesis. We use the shorthand $A = A(\theta_0)$, $M = M(\theta_0)$, and $C = C(\theta_0)$. See the appendix for the full proof, which follows a similar argument as in [10].

Theorem 4.3. Let $\hat{\theta}^{(n)} = \arg\min_{\theta \in \Theta} \hat{D}^{(n)}(\theta)$. Given Assumption 4.1 and (2), we have
$$\sqrt{n}\left( \hat{\theta}^{(n)} - \theta_0 \right) \xrightarrow{D} N(0, \Psi)$$
where $\theta_0$ is the true parameter and
$$\Psi = \left( \dot{A}^{\intercal} M \dot{A} \right)^{-1} \dot{A}^{\intercal} M C M \dot{A} \left( \dot{A}^{\intercal} M \dot{A} \right)^{-1}.$$

We then state the following result using a slight modification of Theorem 24 in [10], which we prove in the appendix.

Theorem 4.4. Let $\nu$ be the rank of $C(\theta_0)$.
If Assumption 4.1 and (2) hold and, for all $\theta \in \Theta$,
$$C(\theta) M(\theta) C(\theta) = C(\theta) \quad \text{and} \quad C(\theta) M(\theta) \dot{A}(\theta) = \dot{A}(\theta),$$
then for $\hat{\theta}^{(n)}$ given in Theorem 4.3 and $\hat{D}^{(n)}(\theta)$ given in (4) we have:
$$\hat{D}^{(n)}\left( \hat{\theta}^{(n)} \right) \xrightarrow{D} \chi^2_{\nu - k}.$$

5 Private Goodness of Fit Tests

As a warm-up, we first cover goodness of fit testing, where the null hypothesis simply tests whether the underlying unknown parameter is equal to a particular value. We consider categorical data $X^{(n)} = \left( X^{(n)}_1, \cdots, X^{(n)}_d \right) \sim \text{Multinomial}(n, p)$, where $p = (p_1, \cdots, p_d)$ is some probability vector over the $d$ outcomes. We want to test the null hypothesis $H_0: p = p^0$, where each component of $p^0$ is positive, but we want to do so in a private way. We then have the following classical result [2].

Lemma 5.1. Under the null hypothesis $H_0: p = p^0$, $X^{(n)}/n$ is asymptotically normal:
$$\sqrt{n}\left( \frac{X^{(n)}}{n} - p^0 \right) \xrightarrow{D} N(0, \Sigma)$$
where $\Sigma$ has rank $d - 1$ and can be written as
$$\Sigma \stackrel{\text{defn}}{=} \text{Diag}(p^0) - p^0 (p^0)^{\intercal}. \qquad (5)$$

5.1 Unprojected Private Test Statistic

To preserve $\rho$-zCDP, we add appropriately scaled Gaussian noise to each component of the histogram $X^{(n)}$. We then define the zCDP statistic $U^{(n)}_\rho = \left( U^{(n)}_{\rho,1}, \cdots, U^{(n)}_{\rho,d} \right)$, where $Z \sim N(0, 1/\rho \cdot I_d)$ and
$$U^{(n)}_\rho \stackrel{\text{defn}}{=} \sqrt{n}\left( \frac{X^{(n)} + Z}{n} - p^0 \right). \qquad (6)$$

We next derive the asymptotic distribution of $U^{(n)}_\rho$ under both private asymptotic regimes from Section 4.1 (note that $\sigma^2 = 1/\rho$).

Lemma 5.2. The random vector $U^{(n)}_{\rho_n}$ from (6) under the null hypothesis $H_0: p = p^0$ has the following asymptotic distribution. If $n\rho_n \to \infty$, then $U^{(n)}_{\rho_n} \xrightarrow{D} N(0, \Sigma)$. Further, if $n\rho_n \to \rho > 0$, then $U^{(n)}_{\rho_n} \xrightarrow{D} N(0, \Sigma_\rho)$, where $\Sigma_\rho$ has full rank and
$$\Sigma_\rho \stackrel{\text{defn}}{=} \Sigma + 1/\rho \cdot I_d. \qquad (7)$$

Proof.
We know from the central limit theorem that $U^{(n)}_\rho$ converges in distribution to a multivariate normal with covariance matrix given in (7). We now show that $\Sigma_\rho$ is full rank. From (5), we know that $\Sigma$ is positive-semidefinite because it is a covariance matrix; hence all of its eigenvalues are nonnegative. We then consider the eigenvalues of $\Sigma_\rho$. Let $x \in \mathbb{R}^d$ be an eigenvector of $\Sigma_\rho$ with eigenvalue $\lambda \in \mathbb{R}$, i.e.,
$$\Sigma_\rho x = \lambda x \implies \Sigma x = (\lambda - 1/\rho) x.$$
Then $x$ must also be an eigenvector of $\Sigma$. Because $\Sigma$ is positive-semidefinite, we have
$$\lambda - 1/\rho \geq 0 \implies \lambda \geq 1/\rho > 0.$$
Thus, all the eigenvalues of $\Sigma_\rho$ are positive, which means $\Sigma_\rho$ is nonsingular.

Because $\Sigma_\rho$ is invertible when the privacy parameter $\rho > 0$, we can create a new statistic based on $U^{(n)}_\rho$ that has a chi-square asymptotic distribution under variance-aware private asymptotics.

Theorem 5.3. Let $U^{(n)}_{\rho_n}$ be given in (6) with $n\rho_n \to \rho > 0$. If the null hypothesis $H_0: p = p^0$ holds, then for $\Sigma_{n\rho_n}$ given in (7) we have
$$Q^{(n)}_{\rho_n} \stackrel{\text{defn}}{=} \left( U^{(n)}_{\rho_n} \right)^{\intercal} \Sigma^{-1}_{n\rho_n} U^{(n)}_{\rho_n} \xrightarrow{D} \chi^2_d. \qquad (8)$$

Proof. We directly apply Theorem 4.2 with $W^{(n)} = \Sigma^{-1/2}_{n\rho_n} U^{(n)}_{\rho_n}$, which is asymptotically multivariate normal with mean zero and covariance $\Sigma^{-1/2}_\rho \Sigma_\rho \Sigma^{-1/2}_\rho = I_d$.

By computing the inverse of $\Sigma_{n\rho_n}$, we can simplify the statistic $Q^{(n)}_{\rho_n}$.

Lemma 5.4. We can rewrite the statistic in (8) as
$$Q^{(n)}_\rho = \sum_{i=1}^d \frac{\left( U^{(n)}_{\rho,i} \right)^2}{p^0_i + \frac{1}{n\rho}} + \frac{n\rho}{\sum_{\ell=1}^d \frac{p^0_\ell}{p^0_\ell + \frac{1}{n\rho}}} \left( \sum_{j=1}^d \frac{p^0_j}{p^0_j + \frac{1}{n\rho}} \cdot U^{(n)}_{\rho,j} \right)^2. \qquad (9)$$

Proof.
We begin by writing the inverse of the covariance matrix $\Sigma_\rho$ from (7) by applying Woodbury's formula [26], which gives the inverse of a rank-one modification of a matrix:
$$\Sigma^{-1}_\rho = \text{Diag}\left( p^0 + 1/\rho \cdot \mathbb{1} \right)^{-1} + \frac{1}{1 - p^0 \cdot \omega(\rho)} \, \omega(\rho) \omega(\rho)^{\intercal} \qquad (10)$$
where $\omega(\rho) = \left( \frac{p^0_1}{p^0_1 + 1/\rho}, \cdots, \frac{p^0_d}{p^0_d + 1/\rho} \right)^{\intercal}$. We note that the vector $\mathbb{1}$ is an eigenvector of $\Sigma_\rho$ and $\Sigma^{-1}_\rho$, with eigenvalues $1/\rho$ and $\rho$, respectively.

Letting $\tilde{X}^{(n)}_i = X^{(n)}_i + Z_i$ be the perturbed version of $X^{(n)}_i$ leads to the test statistic
$$\left( U^{(n)}_\rho \right)^{\intercal} \Sigma^{-1}_{n\rho} U^{(n)}_\rho = \sum_{i=1}^d \frac{\left( \tilde{X}^{(n)}_i - np^0_i \right)^2}{np^0_i + 1/\rho} + \frac{1}{1 - \sum_i \frac{(p^0_i)^2}{p^0_i + \frac{1}{n\rho}}} \left( \sum_{i=1}^d \frac{\tilde{X}^{(n)}_i - np^0_i}{\sqrt{n}} \cdot \frac{p^0_i}{p^0_i + \frac{1}{n\rho}} \right)^2.$$
We can then rewrite the term in the denominator:
$$1 - \sum_{i=1}^d \frac{(p^0_i)^2}{p^0_i + \frac{1}{n\rho}} = \sum_{i=1}^d \left( \frac{p^0_i \left( p^0_i + \frac{1}{n\rho} \right)}{p^0_i + \frac{1}{n\rho}} - \frac{(p^0_i)^2}{p^0_i + \frac{1}{n\rho}} \right) = \frac{1}{n\rho} \sum_{i=1}^d \frac{p^0_i}{p^0_i + \frac{1}{n\rho}}.$$
Recalling the form of $U^{(n)}_\rho$ from (6) concludes the proof.

Note that the coefficient on the second term of (9) grows large as $n\rho \to \infty$, so this test statistic does not approach the non-private test for fixed $\rho$. This is not surprising, since $\Sigma_{n\rho}$ must converge to a singular matrix as $n\rho \to \infty$. Further, the additional noise adds a degree of freedom to the asymptotic distribution of the original statistic. This additional degree of freedom increases the point at which we reject the null hypothesis, i.e., the critical value. Thus, rejecting an incorrect model becomes harder as we increase the degrees of freedom, which decreases power.
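Lemma 5.4 is easy to sanity-check numerically: the closed form (9) should agree with the matrix form (8) for any draw of the data. A small illustrative sketch (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, rho = 4, 1000, 0.05
p0 = np.array([0.1, 0.2, 0.3, 0.4])

X = rng.multinomial(n, p0)                              # histogram
Z = rng.normal(scale=np.sqrt(1.0 / rho), size=d)        # privacy noise, variance 1/rho
U = np.sqrt(n) * ((X + Z) / n - p0)                     # U^(n)_rho from (6)

# Matrix form (8): U^T Sigma_{n rho}^{-1} U, with Sigma_{n rho} = Sigma + I_d/(n rho).
Sigma = np.diag(p0) - np.outer(p0, p0)
Q_matrix = U @ np.linalg.inv(Sigma + np.eye(d) / (n * rho)) @ U

# Closed form (9).
w = p0 / (p0 + 1.0 / (n * rho))
Q_closed = np.sum(U ** 2 / (p0 + 1.0 / (n * rho))) + (n * rho / np.sum(w)) * np.sum(w * U) ** 2
```

The two values coincide up to floating-point error, which is exactly the content of the lemma.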
5.2 Projected Private Test Statistic

Given that the test statistic in the previous section depends on a nearly singular matrix, we now derive a new test statistic for the private goodness of fit test. It has the remarkable property that its asymptotic distribution is $\chi^2_{d-1}$ under both private asymptotics.

We start with the following observation. In the classical chi-square test, the random variables $\left( \frac{X^{(n)}_i - np^0_i}{\sqrt{np^0_i}} \right)_{i=1}^d$ have covariance matrix $I_d - \sqrt{p^0} \sqrt{p^0}^{\intercal}$ under the null hypothesis $H_0: p = p^0$. The classical test essentially uncorrelates these random variables and projects them onto the subspace orthogonal to $\sqrt{p^0}$.

We use a similar intuition for the privacy-preserving random vector $U^{(n)}_\rho$. The matrix $\Sigma_\rho$ in (7) has eigenvector $\mathbb{1}$ with eigenvalue $1/\rho$, regardless of the true parameters of the data-generating distribution. Hence we think of this direction as pure noise. We therefore project $U^{(n)}_\rho$ onto the space orthogonal to $\mathbb{1}$ (i.e., we enforce the constraint that the entries of $U^{(n)}_\rho$ add up to 0, as they would in the noiseless case). Writing the projection matrix as $P \stackrel{\text{defn}}{=} I_d - \frac{1}{d} \mathbb{1}\mathbb{1}^{\intercal}$, we define the projected statistic $\mathbf{Q}^{(n)}_\rho$ as
$$\mathbf{Q}^{(n)}_\rho \stackrel{\text{defn}}{=} \left( U^{(n)}_\rho \right)^{\intercal} P \, \Sigma^{-1}_{n\rho} \, P \, U^{(n)}_\rho. \qquad (11)$$

It will be useful to write out the middle matrix in $\mathbf{Q}^{(n)}_{\rho_n}$ for analyzing its asymptotic distribution, which we prove in the supplementary file.

Lemma 5.5. For the covariance matrix $\Sigma_{n\rho_n}$ given in (7), we have the following when $n\rho_n \to \rho > 0$:
$$P \, \Sigma^{-1}_{n\rho_n} \, P \to \Sigma^{-1}_\rho - \frac{\rho}{d} \mathbb{1}\mathbb{1}^{\intercal}. \qquad (12)$$
Further, when $n\rho_n \to \infty$, we have
$$P \, \Sigma^{-1}_{n\rho_n} \, P \to P \, \text{Diag}\left( p^0 \right)^{-1} P. \qquad (13)$$

Proof. To prove (12), we use the fact that $\Sigma^{-1}_\rho$ has eigenvalue $\rho$ with eigenvector $\mathbb{1}$. We then focus on proving (13).
We use the identity for $\Sigma^{-1}_{n\rho_n}$ from (10):
$$P \, \Sigma^{-1}_{n\rho_n} \, P = P \, \text{Diag}\left( p^0 + \tfrac{1}{n\rho_n} \mathbb{1} \right)^{-1} P + \frac{n\rho_n}{\lambda_n} \left( P v_n \right) \left( P v_n \right)^{\intercal}$$
where $v_n$ has entries $\frac{p^0_i}{p^0_i + \frac{1}{n\rho_n}}$ and $\lambda_n = \sum_{i=1}^d \frac{p^0_i}{p^0_i + \frac{1}{n\rho_n}}$. Since $P v_n = v_n - \frac{\lambda_n}{d} \mathbb{1}$, entry $(i,j)$ of the second term is
$$\frac{n\rho_n}{\lambda_n} \left( \frac{p^0_i}{p^0_i + \frac{1}{n\rho_n}} - \frac{\lambda_n}{d} \right) \left( \frac{p^0_j}{p^0_j + \frac{1}{n\rho_n}} - \frac{\lambda_n}{d} \right).$$
Each factor in parentheses is $O\!\left( \frac{1}{n\rho_n} \right)$: indeed, $\frac{p^0_i}{p^0_i + \frac{1}{n\rho_n}} = 1 - \frac{1/(n\rho_n)}{p^0_i + 1/(n\rho_n)}$ and $\frac{\lambda_n}{d} = 1 - \frac{1}{d} \sum_\ell \frac{1/(n\rho_n)}{p^0_\ell + 1/(n\rho_n)}$, so their difference is $O\!\left( \frac{1}{n\rho_n} \right)$. Hence each entry of the second term is $\frac{n\rho_n}{\lambda_n} \cdot O\!\left( \frac{1}{(n\rho_n)^2} \right) \to 0$ as $n\rho_n \to \infty$. Since the first term converges to $P \, \text{Diag}(p^0)^{-1} P$, we have shown that for $n\rho_n \to \infty$, $P \, \Sigma^{-1}_{n\rho_n} \, P \to P \, \text{Diag}(p^0)^{-1} P$.

We now show that the projected statistic is asymptotically chi-square distributed in both private asymptotic regimes.

Theorem 5.6. Let $U^{(n)}_\rho$ be given in (6). For the null hypothesis $H_0: p = p^0$, we can write the projected statistic $\mathbf{Q}^{(n)}_\rho$ in the following way for $\tilde{n} = \sum_{i=1}^d \left( X^{(n)}_i + Z_i \right)$:
$$\mathbf{Q}^{(n)}_\rho = \sum_{i=1}^d \frac{\left( U^{(n)}_{\rho,i} \right)^2}{p^0_i + \frac{1}{n\rho}} - \frac{\rho}{d} (\tilde{n} - n)^2 + \frac{n\rho}{\sum_{\ell=1}^d \frac{p^0_\ell}{p^0_\ell + \frac{1}{n\rho}}} \left( \sum_{j=1}^d \frac{p^0_j}{p^0_j + \frac{1}{n\rho}} \cdot U^{(n)}_{\rho,j} \right)^2.$$
(14)

Further, for $n\rho_n \to \rho > 0$, if the null hypothesis holds, then $\mathbf{Q}^{(n)}_{\rho_n} \xrightarrow{D} \chi^2_{d-1}$.

Proof. We first show that the projected statistic in (11) can be written in the proposed way. Using (12), we can write the projected statistic in terms of the unprojected statistic in (9), which gives the expression in (14):
$$\mathbf{Q}^{(n)}_{\rho_n} = \left( U^{(n)}_{\rho_n} \right)^{\intercal} \left( \Sigma^{-1}_{n\rho_n} - \frac{n\rho_n}{d} \mathbb{1}\mathbb{1}^{\intercal} \right) U^{(n)}_{\rho_n} = Q^{(n)}_{\rho_n} - \frac{n\rho_n}{d} \left( U^{(n)}_{\rho_n} \right)^{\intercal} \mathbb{1}\mathbb{1}^{\intercal} U^{(n)}_{\rho_n}.$$

We then turn to determining the asymptotic distribution of the projected statistic when $n\rho_n \to \rho > 0$. Recall that $\mathbb{1}$ is an eigenvector of $\Sigma_\rho$. Note that $\Sigma_\rho$ is diagonalizable, i.e., $\Sigma_\rho = B D B^{\intercal}$, where $D$ is a diagonal matrix and $B$ is an orthogonal matrix with one column equal to $\frac{1}{\sqrt{d}} \mathbb{1}$. The matrix
$$\Lambda = \Sigma^{-1/2}_\rho \, P \, B D B^{\intercal} \, P \, \Sigma^{-1/2}_\rho$$
is a $d \times d$ identity matrix except that one of the entries on the diagonal is zero. Thus, $\Lambda$ is idempotent and has rank $d - 1$. Defining $W \sim N(0, I_{d-1})$, we know that $\mathbf{Q}^{(n)}_{\rho_n}$ has the same asymptotic distribution as $W^{\intercal} W$, and so we can apply Theorem 4.2.

Theorem 5.7. For histogram data $X^{(n)}$, the projected statistic $\mathbf{Q}^{(n)}_{\rho_n}$ in Theorem 5.6 converges in distribution to $\chi^2_{d-1}$ when $H_0: p = p^0$ holds and $n\rho_n \to \infty$. In fact, as $n\rho_n \to \infty$, the difference between $\mathbf{Q}^{(n)}_{\rho_n}$ and the classical chi-square statistic $\sum_{i=1}^d \frac{\left( X^{(n)}_i - np^0_i \right)^2}{np^0_i}$ converges in probability to 0.

Proof. Although $\Sigma^{-1}_{n\rho_n}$ does not exist in the limit $n\rho_n \to \infty$, we can still write the asymptotic projected statistic. The middle matrix in the projected statistic as $n\rho_n \to \infty$ is then $P \, \text{Diag}(p^0)^{-1} P$. When $n\rho_n \to \infty$, we also have $U^{(n)}_{\rho_n} \xrightarrow{D} N(0, \Sigma)$ from Lemma 5.2.
We then analyze the asymptotic distribution of the projected statistic: write $U\sim N(0,\Sigma)$ and study the distribution of $U^\top\,\boldsymbol{P}\,\mathrm{Diag}(p^0)^{-1}\boldsymbol{P}\,U$. Since $\Sigma\mathbf{1} = 0$, we have $U^\top\mathbf{1} = 0$ almost surely, which simplifies the asymptotic distribution of the projected statistic:

$$U^\top\,\boldsymbol{P}\,\mathrm{Diag}(p^0)^{-1}\boldsymbol{P}\,U = \sum_{i=1}^d \frac{U_i^2}{p^0_i}.$$

Note that this final form is exactly the original chi-square statistic used in the classical test, which is known to converge to $\chi^2_{d-1}$. ∎

5.3 Comparison of Statistics

We now compare the two private chi-square statistics in (8) and (11) to see which may lead to larger power (i.e. smaller Type II error). The following theorem shows that we can write the unprojected statistic (8) as the sum of the projected statistic (11) and squared independent Gaussian noise.

Theorem 5.8. Consider histogram data $X^{(n)}$ that has Gaussian noise $Z\sim N(0, \frac{1}{\rho}\cdot I_d)$ added to it. For the statistics $Q^{(n)}_\rho$ and $\boldsymbol{Q}^{(n)}_\rho$ based on the noisy counts, given in (8) and (11) respectively, we have

$$Q^{(n)}_\rho = \boldsymbol{Q}^{(n)}_\rho + \frac{\rho}{d}\left(\sum_{i=1}^d Z_i\right)^2.$$

Further, for any fixed data $X^{(n)}$, $\boldsymbol{Q}^{(n)}_\rho$ is independent of $\left(\sum_{i=1}^d Z_i\right)^2$.

To prove this we use the noncentral version of Craig's theorem.

Theorem 5.9 (Craig's Theorem [14]). Let $Y\sim N(\mu, V)$. Then the quadratic forms $Y^\top AY$ and $Y^\top BY$ are independent if $AVB = 0$.

We are now ready to prove our theorem.

Proof of Theorem 5.8. We first show that $Q^{(n)}_\rho - \boldsymbol{Q}^{(n)}_\rho = \frac{\rho}{d}\left(\sum_{i=1}^d Z_i\right)^2$. Note that $\left(U^{(n)}_\rho\right)^\top\mathbf{1} = \sum_{i=1}^d Z_i/\sqrt n$ and that $\mathbf{1}$ is an eigenvector of $\Sigma^{-1}_{n\rho}$ with eigenvalue $n\rho$.
We then have

$$Q^{(n)}_\rho = \left(U^{(n)}_\rho\right)^\top \Sigma^{-1}_{n\rho}\, U^{(n)}_\rho = \left(U^{(n)}_\rho\right)^\top\left(\boldsymbol{P} + \tfrac1d\mathbf{1}\mathbf{1}^\top\right)\Sigma^{-1}_{n\rho}\left(\boldsymbol{P} + \tfrac1d\mathbf{1}\mathbf{1}^\top\right)U^{(n)}_\rho$$
$$= \boldsymbol{Q}^{(n)}_\rho + \frac2d\left(U^{(n)}_\rho\right)^\top\mathbf{1}\,\mathbf{1}^\top\Sigma^{-1}_{n\rho}\,\boldsymbol{P}\,U^{(n)}_\rho + \frac{1}{d^2}\left(U^{(n)}_\rho\right)^\top\mathbf{1}\,\mathbf{1}^\top\Sigma^{-1}_{n\rho}\mathbf{1}\,\mathbf{1}^\top U^{(n)}_\rho$$
$$= \boldsymbol{Q}^{(n)}_\rho + \frac{2n\rho}{d}\left(U^{(n)}_\rho\right)^\top\mathbf{1}\,\mathbf{1}^\top\boldsymbol{P}\,U^{(n)}_\rho + \frac{n\rho}{d}\left(\mathbf{1}^\top U^{(n)}_\rho\right)^2 = \boldsymbol{Q}^{(n)}_\rho + \frac{n\rho}{d}\left(\sum_{i=1}^d \frac{Z_i}{\sqrt n}\right)^2 = \boldsymbol{Q}^{(n)}_\rho + \frac{\rho}{d}\left(\sum_{i=1}^d Z_i\right)^2,$$

where we used $\mathbf{1}^\top\Sigma^{-1}_{n\rho} = n\rho\,\mathbf{1}^\top$, $\mathbf{1}^\top\boldsymbol{P} = 0$, and $\mathbf{1}^\top U^{(n)}_\rho = \sum_i Z_i/\sqrt n$.

We now apply Craig's theorem to show that, for a fixed histogram $X^{(n)}$, $\boldsymbol{Q}^{(n)}_\rho$ is independent of $\left(\sum_{i=1}^d Z_i\right)^2$. When $X^{(n)}$ is fixed, the random vector $Y = U^{(n)}_\rho = \mu + Z/\sqrt n$ satisfies $Y\sim N\left(\mu, \frac{1}{n\rho} I_d\right)$, where $\mu = \left(X^{(n)} - np^0\right)/\sqrt n$. If we set $A = \boldsymbol{P}\Sigma^{-1}_{n\rho}\boldsymbol{P}$, then our projected statistic can be rewritten as $Y^\top AY$. Further, if we define $B = \mathbf{1}\mathbf{1}^\top$, then, since $\mathbf{1}^\top\mu = 0$,

$$Y^\top BY = \left(\mathbf{1}^\top Y\right)^2 = \frac1n\left(\sum_{i=1}^d Z_i\right)^2.$$

We then have $A\cdot\frac{1}{n\rho}I_d\cdot B = \frac{1}{n\rho}\,\boldsymbol{P}\Sigma^{-1}_{n\rho}\boldsymbol{P}\,\mathbf{1}\mathbf{1}^\top = 0$ because $\boldsymbol{P}\mathbf{1} = 0$, so by Craig's theorem the projected statistic is independent of $Y^\top BY$, and hence of $\left(\sum_{i=1}^d Z_i\right)^2$. ∎

Algorithm 1 (zCDP-GOF) shows how to perform goodness of fit testing with either of these two test statistics, i.e. unprojected (8) or projected (11). Our test is zCDP for neighboring histogram datasets, since it is an application of the Gaussian mechanism together with Theorem 3.7. Hence:

Theorem 5.10. zCDP-GOF$(\cdot\,;\rho,\alpha,p^0)$ is $\rho$-zCDP.

5.4 Power Analysis

From Theorem 5.8 we see that the difference between $Q^{(n)}_\rho$ and $\boldsymbol{Q}^{(n)}_\rho$ is the addition of squared independent noise.
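The identity in Theorem 5.8 can be checked numerically. The following numpy sketch (the values of $n$, $\rho$, and $p^0$ are illustrative choices, not the paper's experimental settings) builds both statistics directly from their definitions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not the paper's experimental settings).
n, rho = 5000, 0.001
p0 = np.array([0.5, 1/6, 1/6, 1/6])
d = len(p0)

# Histogram data and Gaussian privacy noise Z ~ N(0, (1/rho) I_d).
X = rng.multinomial(n, p0)
Z = rng.normal(0.0, np.sqrt(1.0 / rho), size=d)

U = np.sqrt(n) * ((X + Z) / n - p0)                  # the vector U_rho^{(n)} from (6)
Sigma = np.diag(p0) - np.outer(p0, p0) + np.eye(d) / (n * rho)
Sigma_inv = np.linalg.inv(Sigma)
P = np.eye(d) - np.ones((d, d)) / d                  # projection orthogonal to the all-ones vector

Q_unproj = U @ Sigma_inv @ U                         # unprojected statistic (8)
Q_proj = U @ P @ Sigma_inv @ P @ U                   # projected statistic (11)

# Theorem 5.8: the two statistics differ by (rho/d) * (sum_i Z_i)^2.
gap = (rho / d) * Z.sum() ** 2
assert np.isclose(Q_unproj, Q_proj + gap)
```

The equality holds up to floating-point error for every realization, since $\mathbf{1}^\top U^{(n)}_\rho = \sum_i Z_i/\sqrt n$ holds deterministically.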
This additional noise can only hurt power: for the same data, the statistic $Q^{(n)}_\rho$ has larger variance than $\boldsymbol{Q}^{(n)}_\rho$, and the extra noise term does not depend on the underlying data.

Algorithm 1 zCDP Chi-Square Goodness of Fit Test
procedure zCDP-GOF($X^{(n)}$; $\rho$, $\alpha$, $H_0: p = p^0$)
  Set $\tilde X^{(n)} \leftarrow X^{(n)} + Z$ where $Z\sim N(0, \frac1\rho\cdot I_d)$.
  For the unprojected statistic:
    $T \leftarrow \frac1n\left(\tilde X^{(n)} - np^0\right)^\top \Sigma^{-1}_{n\rho}\left(\tilde X^{(n)} - np^0\right)$
    $t \leftarrow (1-\alpha)$ quantile of $\chi^2_d$
  For the projected statistic:
    $T \leftarrow \frac1n\left(\tilde X^{(n)} - np^0\right)^\top \boldsymbol{P}\,\Sigma^{-1}_{n\rho}\,\boldsymbol{P}\left(\tilde X^{(n)} - np^0\right)$
    $t \leftarrow (1-\alpha)$ quantile of $\chi^2_{d-1}$
  if $T > t$ then Reject

If we fix an alternate hypothesis, we can obtain asymptotic distributions for our two test statistics.

Theorem 5.11. Consider the null hypothesis $H_0: p = p^0$ and the alternate hypothesis $H_1: p = p^0 + \frac{\boldsymbol\Delta}{\sqrt n}$, where $\sum_{i=1}^d\Delta_i = 0$. Assuming the data $X^{(n)}$ comes from the alternate $H_1$, the two statistics $\boldsymbol{Q}^{(n)}_{\rho_n}$ and $Q^{(n)}_{\rho_n}$ have noncentral chi-square distributions when $n\rho_n\to\rho>0$, i.e.

$$Q^{(n)}_{\rho_n}\overset{D}{\to}\chi^2_d\left(\boldsymbol\Delta^\top\Sigma^{-1}_\rho\boldsymbol\Delta\right) \quad\text{and}\quad \boldsymbol{Q}^{(n)}_{\rho_n}\overset{D}{\to}\chi^2_{d-1}\left(\boldsymbol\Delta^\top\Sigma^{-1}_\rho\boldsymbol\Delta\right).$$

Further, if $n\rho_n\to\infty$ then

$$\boldsymbol{Q}^{(n)}_{\rho_n}\overset{D}{\to}\chi^2_{d-1}\left(\sum_i \frac{\Delta_i^2}{p^0_i}\right).$$

We point out that in the case where $n\rho_n\to\infty$, the projected statistic has the same asymptotic distribution as the classical (nonprivate) chi-square test under the same alternate hypothesis. We use the following result to prove this theorem.

Lemma 5.12 ([10]). Suppose $Y\sim N(\boldsymbol\delta, V)$. If $V$ is a projection of rank $\nu$ and $V\boldsymbol\delta = \boldsymbol\delta$, then $Y^\top Y\sim\chi^2_\nu\left(\boldsymbol\delta^\top\boldsymbol\delta\right)$.

Proof of Theorem 5.11. In this case the random vector $U^{(n)}_{\rho_n}$ from (6) converges in distribution to $N(\boldsymbol\Delta, \Sigma_\rho)$ if $n\rho_n\to\rho>0$, or to $N(\boldsymbol\Delta, \Sigma)$ if $n\rho_n\to\infty$, by Lemma 5.2. We first consider the case $n\rho_n\to\rho>0$. Consider $U\sim N(\boldsymbol\Delta, \Sigma_\rho)$ and $Y = \Sigma^{-1/2}_\rho U\sim N\left(\Sigma^{-1/2}_\rho\boldsymbol\Delta,\, I_d\right)$.
We then know that $Y^\top Y$ and the unprojected statistic $Q^{(n)}_{\rho_n}$ have the same asymptotic distribution. In order to use Lemma 5.12, we need to verify that

$$\Sigma^{-1/2}_\rho\,\Sigma_\rho\,\Sigma^{-1/2}_\rho\left(\Sigma^{-1/2}_\rho\boldsymbol\Delta\right) = \Sigma^{-1/2}_\rho\boldsymbol\Delta,$$

which indeed holds.

We then consider the projected statistic $\boldsymbol{Q}^{(n)}_{\rho_n}$ when $n\rho_n\to\rho>0$. Similar to the proof of Theorem 5.6, we diagonalize $\Sigma_\rho = BDB^\top$, where $B$ is an orthogonal matrix with one column equal to $\frac{1}{\sqrt d}\mathbf{1}$ and $D$ is a diagonal matrix. We then let $U\sim N(\boldsymbol\Delta,\Sigma_\rho)$ and $Y = \Sigma^{-1/2}_\rho\boldsymbol{P}\,U$. Then $Y^\top Y$ and $\boldsymbol{Q}^{(n)}_{\rho_n}$ have the same asymptotic distribution. Recall that $\Lambda = \Sigma^{-1/2}_\rho\,\boldsymbol{P}\,\Sigma_\rho\,\boldsymbol{P}\,\Sigma^{-1/2}_\rho$ is idempotent with rank $d-1$. Lastly, to apply Lemma 5.12 we need to show that

$$\Lambda\left(\Sigma^{-1/2}_\rho\boldsymbol{P}\boldsymbol\Delta\right) = \Sigma^{-1/2}_\rho\boldsymbol{P}\boldsymbol\Delta.$$

Let $\hat B\in\mathbb R^{d\times(d-1)}$ be the matrix $B$ with the column corresponding to $\frac{1}{\sqrt d}\mathbf{1}$ removed, which we assume to be the last column of $B$. Further, we define $\hat D\in\mathbb R^{(d-1)\times(d-1)}$ to be $D$ without its last row and column. We can then write $\boldsymbol{P}\Sigma_\rho\boldsymbol{P} = \hat B\hat D\hat B^\top$ and $\Sigma^{-1}_\rho\boldsymbol{P} = \hat B\hat D^{-1}\hat B^\top$. We can then simplify the left-hand side:

$$\Sigma^{-1/2}_\rho\,\boldsymbol{P}\Sigma_\rho\boldsymbol{P}\,\Sigma^{-1}_\rho\boldsymbol{P} = \Sigma^{-1/2}_\rho\,\hat B\hat D\hat B^\top\,\hat B\hat D^{-1}\hat B^\top = \Sigma^{-1/2}_\rho\,\hat B\hat B^\top = \hat B\hat D^{-1/2}\hat B^\top = \Sigma^{-1/2}_\rho\,\boldsymbol{P},$$

using $\hat B^\top\hat B = I_{d-1}$ and $\Sigma^{-1/2}_\rho\hat B = \hat B\hat D^{-1/2}$. The noncentrality parameter is then $\boldsymbol\Delta^\top\boldsymbol{P}\Sigma^{-1}_\rho\boldsymbol{P}\boldsymbol\Delta$, which equals $\boldsymbol\Delta^\top\Sigma^{-1}_\rho\boldsymbol\Delta$ because $\sum_i\Delta_i = 0$.

We now consider the case $n\rho_n\to\infty$. From (13), we have $\boldsymbol{P}\,\Sigma^{-1}_{n\rho_n}\,\boldsymbol{P}\to M_\infty = \boldsymbol{P}\,\mathrm{Diag}(p^0)^{-1}\boldsymbol{P}$, which can be diagonalized. As we showed in Theorem 5.7, for any $U$ with $\mathbf{1}^\top U = 0$ we have

$$U^\top M_\infty\, U = U^\top\,\mathrm{Diag}(p^0)^{-1}\,U.$$

From Lemma 5.2, we know that $U^{(n)}_{\rho_n}\overset{D}{\to}N(\boldsymbol\Delta,\Sigma)$.
We then write $U\sim N(\boldsymbol\Delta,\Sigma)$, so that our projected chi-square statistic has the same asymptotic distribution as $U^\top\mathrm{Diag}(p^0)^{-1}U$, which has a $\chi^2_{d-1}\left(\boldsymbol\Delta^\top\mathrm{Diag}(p^0)^{-1}\boldsymbol\Delta\right)$ distribution. ∎

Note that the noncentrality parameters in the previous theorem are the same for both statistics; only the degrees of freedom differ.

5.5 Experiments for Goodness of Fit Testing

Throughout all of our experiments, we fix $\alpha = 0.05$ and privacy parameter $\rho = 0.001$. All of our tests are designed to achieve Type I error at most $\alpha$, as we empirically show for different null hypotheses $p^0$ and sample sizes $n$ in Figure 1. We include 1.96 times the standard error of our 100,000 independent trials (giving a 95% confidence interval) for each sample size and each null hypothesis.

[Figure 1 here: two panels, "Type I Error for p0 = (1/4,1/4,1/4,1/4)" and "Type I Error for p0 = (1/2,1/6,1/6,1/6)", plotting Type I error against ln(n) for ProjGOF and UnProjGOF.]

Figure 1: Empirical Type I error for our new goodness of fit tests in zCDP-GOF, with error bars corresponding to 1.96 times the standard error over 100,000 trials. We set $\rho = 0.001$, which corresponds to a variance of 1,000 for the noise added to the counts for privacy. The horizontal line corresponds to the target $\alpha = 0.05$ Type I error that we permit.

We then empirically check the power of our new tests in zCDP-GOF for both the projected and unprojected statistics. Subject to the constraint that our tests achieve Type I error at most $\alpha$, we seek to maximize power, i.e. the probability of rejecting the null hypothesis when a distribution $p^1\neq p^0$, called the alternate hypothesis, is true. We expect the projected statistic to achieve higher power than the unprojected statistic due to Theorem 5.8.
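These Type I error experiments can be approximated with a short simulation. The sketch below (numpy-only; the $\chi^2_{d-1}$ critical value is hardcoded to avoid a scipy dependency, and the trial count is reduced from the paper's 100,000 for speed) runs the projected test of Algorithm 1 on data drawn under the null:

```python
import numpy as np

rng = np.random.default_rng(1)

n, rho, alpha = 10_000, 0.001, 0.05
p0 = np.array([0.25, 0.25, 0.25, 0.25])
d = len(p0)
CRIT_CHI2_3 = 7.8147279  # 0.95 quantile of chi^2_{d-1}, hardcoded

Sigma = np.diag(p0) - np.outer(p0, p0) + np.eye(d) / (n * rho)
P = np.eye(d) - np.ones((d, d)) / d
middle = P @ np.linalg.inv(Sigma) @ P          # middle matrix of the projected statistic

trials = 2000                                   # the paper uses 100,000 trials
rejects = 0
for _ in range(trials):
    X = rng.multinomial(n, p0)                  # data drawn under H0
    Z = rng.normal(0.0, np.sqrt(1.0 / rho), size=d)
    U = np.sqrt(n) * ((X + Z) / n - p0)
    rejects += U @ middle @ U > CRIT_CHI2_3

type1 = rejects / trials                        # should be near the target alpha = 0.05
```

With these settings the empirical rejection rate sits near the target $\alpha = 0.05$, in line with Figure 1.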
Further, the critical value we use for the projected statistic is smaller than the critical value for the unprojected statistic, which might also lead to the projected statistic having higher power. Here we present a typical experimental scenario. For our experiments, we set the null hypothesis $p^0 = (1/2, 1/6, 1/6, 1/6)$ and the alternate hypothesis $p^1 = p^0 + 0.01\cdot(1, -1/3, -1/3, -1/3)$ for various sample sizes (we empirically found this to be a tough alternative hypothesis for our statistics). For each sample size $n$, we sample 5,000 independent datasets from the alternate hypothesis and test $H_0: p = p^0$ in zCDP-GOF. We present the resulting power plots in Figure 2a for zCDP-GOF from Algorithm 1. We label "NonPrivate" as the classical chi-square goodness of fit test run on the actual data (and thus not private). Further, "ProjGOF" is the test from zCDP-GOF with the projected statistic, whereas "UnProjGOF" uses the unprojected statistic. In our results, the projected statistic clearly outperforms the unprojected statistic.

We then compare the projected and unprojected statistics in zCDP-GOF to prior work in Figure 2b. Since the projected statistic outperforms the other tests, we plot the difference in power between the projected statistic and the other tests. We label "GLRV_MCGOF_GAUSS" as the Monte Carlo (MC) based test with Gaussian noise from [12],⁶ and "GLRV_GOF_Asympt" as the hypothesis test based on the asymptotic distribution with Gaussian noise from [12, 24]. The error bars show 1.96 times the standard error in the difference of proportions from 100,000 trials, giving a 95% confidence interval.

6 General Chi-Square Private Tests

We now consider the case where the null hypothesis contains many distributions, so that the best fitting distribution must be estimated and used in the test statistics.
The data is multinomial, $X^{(n)}\sim\mathrm{Multinomial}(n, p(\theta^0))$, where $p$ is a function that converts parameters into a $d$-dimensional multinomial probability vector. The null hypothesis is $H_0:\theta^0\in\Theta$; i.e. $p(\theta^0)$ belongs to a subset of a lower-dimensional manifold. We again use Gaussian noise $Z\sim N(0,\frac1\rho\cdot I_d)$ to ensure $\rho$-zCDP, and we define

$$U^{(n)}_\rho(\theta) \overset{\mathrm{defn}}{=} \sqrt n\left(\frac{X^{(n)}+Z}{n} - p(\theta)\right). \qquad (15)$$

With $\theta^0$ being the unknown true parameter, we are now ready to define our two test statistics in terms of some function $\phi$ mapping the noisy counts to the parameter space, such that $\phi\left(X^{(n)}+Z\right)\overset{P}{\to}\theta^0$ (recall from Section 4.2 that $\phi$ is a simple but possibly suboptimal estimate of the true parameter $\theta^0$ based on the noisy data), and the covariance matrix

$$\Sigma_\rho(\theta)\overset{\mathrm{defn}}{=}\mathrm{Diag}(p(\theta)) - p(\theta)p(\theta)^\top + \tfrac1\rho\cdot I_d.$$

We define the unprojected statistic $R^{(n)}_\rho(\theta)$ as follows:

$$\hat M\overset{\mathrm{defn}}{=}\left(\Sigma_{n\rho}\left(\phi\left(X^{(n)}+Z\right)\right)\right)^{-1}$$

[Footnote 6: We set the number of MC trials $m = 59$ in these experiments, which guarantees at most 5% Type I error.]

[Figure 2 here: (a) A comparison of power for goodness of fit testing between the projected and unprojected statistics in zCDP-GOF and the classical nonprivate test, for various n with 5,000 trials each; curves NonPrivate, ProjGOF, UnProjGOF. (b) The empirical loss in power from using other private goodness of fit tests instead of the projected statistic in zCDP-GOF, with error bars corresponding to 1.96 times the standard error of each difference over 100,000 trials; curves GLRV_GOF_Asympt, GLRV_MCGOF_GAUSS, UnProjGOF.]

Figure 2: Empirical power results for our new goodness of fit tests in zCDP-GOF with $\alpha = 0.05$, and comparisons to previous private tests in [12]. We use $\rho = 0.001$, which corresponds to a variance of 1,000 for the noise added to the counts.

$$R^{(n)}_\rho(\theta)\overset{\mathrm{defn}}{=} U^{(n)}_\rho(\theta)^\top\,\hat M\,U^{(n)}_\rho(\theta). \qquad (16)$$

This is a specialization of (4) in Section 4.2 with the following substitutions: $V^{(n)} = \frac{X^{(n)}+Z}{n}$, $A(\theta) = p(\theta)$, and $M(\theta) = \left(\Sigma_{n\rho}(\theta)\right)^{-1}$. For the projected statistic $\boldsymbol{R}^{(n)}_\rho(\theta)$, the corresponding substitutions are $\boldsymbol{P} = I_d - \frac1d\mathbf{1}\mathbf{1}^\top$, $V^{(n)} = \boldsymbol{P}\cdot\frac{X^{(n)}+Z}{n}$, $A(\theta) = \boldsymbol{P}\cdot p(\theta)$, and again $M(\theta) = \left(\Sigma_{n\rho}(\theta)\right)^{-1}$, giving:

$$\boldsymbol{R}^{(n)}_\rho(\theta)\overset{\mathrm{defn}}{=}U^{(n)}_\rho(\theta)^\top\cdot\boldsymbol{P}\,\hat M\,\boldsymbol{P}\cdot U^{(n)}_\rho(\theta). \qquad (17)$$

We then assume that Assumption 4.1 holds for both the projected and unprojected statistics, using their respective vectors $V^{(n)}$, $A(\theta)$, and matrix $M(\theta)$. We now present the asymptotic distribution of both statistics, which is proved using the result in Theorem 4.4.

Theorem 6.1. Under $H_0:\theta^0\in\Theta$, the following are true as $n\to\infty$. Setting $\hat\theta^{(n)} = \arg\min_{\theta\in\Theta} R^{(n)}_{\rho_n}(\theta)$, we have $R^{(n)}_{\rho_n}\left(\hat\theta^{(n)}\right)\overset{D}{\to}\chi^2_{d-k}$ if $n\rho_n\to\rho>0$. Furthermore, setting $\hat\theta^{(n)} = \arg\min_{\theta\in\Theta}\boldsymbol{R}^{(n)}_{\rho_n}(\theta)$, we have $\boldsymbol{R}^{(n)}_{\rho_n}\left(\hat\theta^{(n)}\right)\overset{D}{\to}\chi^2_{d-k-1}$ if $n\rho_n\to\rho>0$ or $n\rho_n\to\infty$.

Proof. To prove this result, we appeal to Theorem 4.4. For the unprojected statistic $R^{(n)}_\rho(\cdot)$ we have $C(\theta) = \Sigma_\rho(\theta)$, and the middle matrix $M(\theta)$ is simply its inverse, which satisfies the hypotheses of Theorem 4.4. For the projected statistic $\boldsymbol{R}^{(n)}_{\rho_n}(\cdot)$, we write $C(\theta) = \boldsymbol{P}\cdot\Sigma_{n\rho_n}(\theta)\cdot\boldsymbol{P}$, $M(\theta) = \Sigma^{-1}_{n\rho_n}(\theta)$, and $\dot A(\theta) = \boldsymbol{P}\cdot\nabla p(\theta)\in\mathbb R^{d\times k}$. Note that $C(\theta)$ has rank $d-1$ for all $\theta\in\Theta$ in a neighborhood of $\theta^0$ and all $n$. We now show that these matrices satisfy the hypotheses of Theorem 4.4, i.e.
we show that the following two equalities hold for all $\theta\in\Theta$:

$$C(\theta)\,M(\theta)\,C(\theta) = C(\theta) \quad\text{and}\quad C(\theta)\,M(\theta)\,\dot A(\theta) = \dot A(\theta).$$

We first focus on proving the first equality, $C(\theta)M(\theta)C(\theta) = C(\theta)$. From (12), we can simplify the left-hand side significantly by rewriting it as

$$\boldsymbol{P}\,\Sigma_{n\rho_n}(\theta)\,\boldsymbol{P} - \frac{n\rho_n}{d}\,\boldsymbol{P}\,\Sigma_{n\rho_n}(\theta)\,\mathbf{1}\mathbf{1}^\top\,\Sigma_{n\rho_n}(\theta)\,\boldsymbol{P}.$$

We now show that $\boldsymbol{P}\,\Sigma_{n\rho_n}(\theta)\,\mathbf{1}\mathbf{1}^\top\,\Sigma_{n\rho_n}(\theta)\,\boldsymbol{P} = 0$ for all $n$, which proves the equality. Note that $\Sigma_{n\rho_n}(\theta)$ is symmetric and has eigenvector $\mathbf{1}$ with eigenvalue $\frac{1}{n\rho_n}$. Thus,

$$\boldsymbol{P}\,\Sigma_{n\rho_n}(\theta)\,\mathbf{1}\mathbf{1}^\top\,\Sigma_{n\rho_n}(\theta)\,\boldsymbol{P} = \frac{1}{n^2\rho_n^2}\,\boldsymbol{P}\,\mathbf{1}\mathbf{1}^\top\,\boldsymbol{P} = 0 \quad\forall n.$$

We now prove the second equality, $C(\theta)M(\theta)\dot A(\theta) = \dot A(\theta)$. We again use (12) to simplify the left-hand side:

$$\boldsymbol{P}\,\Sigma_{n\rho_n}(\theta)\left[\Sigma^{-1}_{n\rho_n}(\theta) - \frac{n\rho_n}{d}\cdot\mathbf{1}\mathbf{1}^\top\right]\nabla p(\theta) = \boldsymbol{P}\left[I_d - \frac{n\rho_n}{d}\cdot\Sigma_{n\rho_n}(\theta)\,\mathbf{1}\mathbf{1}^\top\right]\nabla p(\theta) = \boldsymbol{P}\,\boldsymbol{P}\,\nabla p(\theta) = \boldsymbol{P}\,\nabla p(\theta).$$

This completes the proof for both cases $n\rho_n\to\rho>0$ and $n\rho_n\to\infty$. ∎

Again, the projected statistic has the same distribution under both private asymptotic regimes and matches the nonprivate chi-square test asymptotics. We present our more general test zCDP-Min-$\chi^2$ in Algorithm 2. The quick-and-dirty estimator $\phi(\cdot)$ is application-specific (Section 6.1 gives independence testing as an example).⁷ Further, for neighboring histogram data, we have the following privacy guarantee.

Theorem 6.2. zCDP-Min-$\chi^2(\cdot\,;\rho,\alpha,\phi,\Theta)$ is $\rho$-zCDP.

Algorithm 2 zCDP General Chi-Square Test
procedure zCDP-Min-$\chi^2$($X^{(n)}$; $\rho$, $\alpha$, $\phi$, $H_0:\theta^0\in\Theta$)
  Set $\tilde X^{(n)}\leftarrow X^{(n)} + Z$ where $Z\sim N(0,\frac1\rho\cdot I_d)$.
  Set $\hat M = \left(\Sigma_{n\rho}\left(\phi(\tilde X^{(n)})\right)\right)^{-1}$
  For the unprojected statistic:
    $T(\theta) = \frac1n\left(\tilde X^{(n)} - np(\theta)\right)^\top\hat M\left(\tilde X^{(n)} - np(\theta)\right)$
    Set $\hat\theta^{(n)} = \arg\min_{\theta\in\Theta} T(\theta)$
    $t\leftarrow(1-\alpha)$ quantile of $\chi^2_{d-k}$
  For the projected statistic:
    $T(\theta) = \frac1n\left(\tilde X^{(n)} - np(\theta)\right)^\top\boldsymbol{P}\,\hat M\,\boldsymbol{P}\left(\tilde X^{(n)} - np(\theta)\right)$
    Set $\hat\theta^{(n)} = \arg\min_{\theta\in\Theta} T(\theta)$
    $t\leftarrow(1-\alpha)$ quantile of $\chi^2_{d-k-1}$
  if $T\left(\hat\theta^{(n)}\right) > t$ then Reject

6.1 Application - Independence Test

We showcase our general chi-square test zCDP-Min-$\chi^2$ by giving results for independence testing. Conceptually, it is convenient to think of the data histogram as an $r\times c$ table, with $p_{i,j}$ being the probability that a person falls in the bucket in row $i$ and column $j$. We then consider two multinomial random variables, $Y\sim\mathrm{Multinomial}(1,\pi^{(1)})$ for $\pi^{(1)}\in\mathbb R^r$ (the marginal row probability vector) and $Y'\sim\mathrm{Multinomial}(1,\pi^{(2)})$ for $\pi^{(2)}\in\mathbb R^c$ (the marginal column probability vector). Under the null hypothesis of independence between $Y$ and $Y'$, we have $p_{i,j} = \pi^{(1)}_i\pi^{(2)}_j$. Generally, we write the probabilities as $p\left(\pi^{(1)},\pi^{(2)}\right) = \pi^{(1)}\left(\pi^{(2)}\right)^\top$, so that $X^{(n)}\sim\mathrm{Multinomial}\left(n, p\left(\pi^{(1)},\pi^{(2)}\right)\right)$. Thus the underlying parameter vector is $\theta^0 = \left(\pi^{(1)}_1,\dots,\pi^{(1)}_{r-1},\pi^{(2)}_1,\dots,\pi^{(2)}_{c-1}\right)$; we do not need the last component of $\pi^{(1)}$ or $\pi^{(2)}$ because we know that each must sum to 1. Also, in this case $d = rc$ and $k = (r-1)+(c-1)$.

[Footnote 7: For goodness-of-fit testing, $\phi$ always returns $p^0$ and $k = 0$, so zCDP-Min-$\chi^2$ is a generalization of zCDP-GOF.]

We want to test whether $Y$ is independent of $Y'$. For our data, we are given a collection of $n$ independent trials of $Y$ and $Y'$. We then count the number of joint outcomes in a contingency table, given in Table 1. Each cell in the contingency table contains the element $X^{(n)}_{i,j}$ that gives the number of occurrences of $Y_i = 1$ and $Y'_j = 1$.
Since our test statistics notationally treat the data as a vector, we convert $X^{(n)}$, when needed, to a vector that goes from left to right along each row of the table.

Table 1: Contingency Table.

  $Y \backslash Y'$ | 1 | 2 | $\cdots$ | $c$ | Marginals
  1 | $X^{(n)}_{1,1}$ | $X^{(n)}_{1,2}$ | $\cdots$ | $X^{(n)}_{1,c}$ | $X^{(n)}_{1,\cdot}$
  2 | $X^{(n)}_{2,1}$ | $X^{(n)}_{2,2}$ | $\cdots$ | $X^{(n)}_{2,c}$ | $X^{(n)}_{2,\cdot}$
  $\vdots$ | $\vdots$ | $\vdots$ | $\ddots$ | $\vdots$ | $\vdots$
  $r$ | $X^{(n)}_{r,1}$ | $X^{(n)}_{r,2}$ | $\cdots$ | $X^{(n)}_{r,c}$ | $X^{(n)}_{r,\cdot}$
  Marginals | $X^{(n)}_{\cdot,1}$ | $X^{(n)}_{\cdot,2}$ | $\cdots$ | $X^{(n)}_{\cdot,c}$ | $n$

In order to compute the statistic $R^{(n)}_\rho\left(\hat\theta^{(n)}\right)$ or $\boldsymbol{R}^{(n)}_\rho\left(\hat\theta^{(n)}\right)$ in zCDP-Min-$\chi^2$, we need to find a quick-and-dirty estimator $\phi\left(X^{(n)}+Z\right)$ that converges in probability to $p\left(\pi^{(1)},\pi^{(2)}\right)$ as $n\to\infty$. We use the estimator for the unknown probability vector based on the marginals of the table with noisy counts: with $\tilde n = n + \sum_{i,j}Z_{i,j}$, the naïve estimates are

$$\tilde\pi^{(1)}_i = \frac{X^{(n)}_{i,\cdot} + Z_{i,\cdot}}{\tilde n}, \qquad \tilde\pi^{(2)}_j = \frac{X^{(n)}_{\cdot,j} + Z_{\cdot,j}}{\tilde n},$$

and we set⁸

$$\phi\left(X^{(n)}+Z\right) = \left(\tilde\pi^{(1)}_1,\dots,\tilde\pi^{(1)}_{r-1},\tilde\pi^{(2)}_1,\dots,\tilde\pi^{(2)}_{c-1}\right). \qquad (18)$$

Note that as $n\to\infty$, the noisy marginals converge in probability to the true probabilities even for $Z\sim N\left(0,\frac{1}{\rho_n}\cdot I_{rc}\right)$ with $\rho_n = \omega(1/n^2)$; i.e. we have $\tilde\pi^{(1)}_i\overset{P}{\to}\pi^{(1)}_i$ and $\tilde\pi^{(2)}_j\overset{P}{\to}\pi^{(2)}_j$ for all $i\in[r]$ and $j\in[c]$. Recall that in Theorem 6.1, in order to guarantee the correct asymptotic distribution we require $n\rho_n\to\rho>0$ for the unprojected statistic, whereas for the projected statistic we only need $\rho_n = \Omega(1/n)$. Thus, Theorem 6.1 imposes more restrictive settings of $\rho_n$ for the unprojected statistic than what we need for the naïve estimate to converge to the true underlying probability. For the projected statistic, $\rho_n = \Omega(1/n)$ suffices both to satisfy the conditions of Theorem 6.1 and to have $\phi\left(X^{(n)}+Z\right)\overset{P}{\to}p\left(\pi^{(1)},\pi^{(2)}\right)$.
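The naïve estimator in (18) is nothing more than the noisy marginals. A minimal sketch for a 2×2 table (variable names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)

r, c, n, rho = 2, 2, 50_000, 0.001
pi1 = np.array([2/3, 1/3])                 # true row marginals
pi2 = np.array([0.5, 0.5])                 # true column marginals
p = np.outer(pi1, pi2)                     # p(pi1, pi2) under independence

X = rng.multinomial(n, p.ravel()).reshape(r, c)
Z = rng.normal(0.0, np.sqrt(1.0 / rho), size=(r, c))
noisy = X + Z

n_tilde = noisy.sum()                      # tilde n = n + sum of all noise terms
pi1_hat = noisy.sum(axis=1) / n_tilde      # noisy row marginals
pi2_hat = noisy.sum(axis=0) / n_tilde      # noisy column marginals

# phi keeps the first r-1 and c-1 coordinates; the last ones are determined
# because each estimated marginal vector sums to 1 by construction.
phi = np.concatenate([pi1_hat[:-1], pi2_hat[:-1]])
```

At this sample size the noise (standard deviation about 31.6 per cell) barely moves the marginals, illustrating the consistency claim above.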
We then use this estimator $\phi\left(X^{(n)}+Z\right)$ in the unprojected and projected statistics of zCDP-Min-$\chi^2$ to obtain a $\rho$-zCDP hypothesis test for independence between two categorical variables. Note that in this setting, the projected statistic has a $\chi^2_{(r-1)(c-1)}$ asymptotic distribution, which is exactly the asymptotic distribution used in the classical Pearson chi-square independence test.

For our results we again fix $\alpha = 0.05$ and $\rho = 0.001$. In Figure 3 we give the empirical Type I error of our independence tests in zCDP-Min-$\chi^2$, for both the projected and unprojected statistics, over 100,000 trials for various $n$ and data distributions. For small sample sizes we achieve much smaller Type I error than the target $\alpha$: the noise sometimes forces small expected counts (below 5 in some cell) in the contingency table based on the noisy counts, in which case our tests are inconclusive and fail to reject $H_0$.⁸

[Footnote 8: In the case of small sample sizes, we follow a common rule of thumb: if any of the expected cell counts is less than 5, i.e. if $n\,\tilde\pi^{(1)}_i\tilde\pi^{(2)}_j < 5$ for any $(i,j)\in[r]\times[c]$, then we do not make any conclusion.]

[Figure 3 here: two panels, "Type I Error for pi1 = (1/2,1/2), pi2 = (1/2,1/2)" and "Type I Error for pi1 = (2/3,1/3), pi2 = (1/2,1/2)", plotting Type I error against ln(n) for ProjIND and UnProjIND.]

Figure 3: Empirical Type I error for our new independence tests in zCDP-Min-$\chi^2$, with 1.96 times the standard error over 100,000 trials. We set $\rho = 0.001$, which corresponds to variance 1,000 for the noise in each cell count. We desire Type I error at most $\alpha = 0.05$, shown as the horizontal line.
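The small-count rule of thumb from footnote 8 is easy to state as a helper; a sketch (the function name and threshold argument are our own choices):

```python
import numpy as np

def inconclusive(noisy_table, threshold=5.0):
    """Return True when any expected cell count, computed from the noisy
    marginals, falls below the threshold; in that case the test makes
    no conclusion and fails to reject H0."""
    n_tilde = noisy_table.sum()
    pi1 = noisy_table.sum(axis=1) / n_tilde   # noisy row marginals
    pi2 = noisy_table.sum(axis=0) / n_tilde   # noisy column marginals
    expected = n_tilde * np.outer(pi1, pi2)
    return bool((expected < threshold).any())

# Large counts: every expected cell is near 250, so the test can proceed.
big = np.array([[250.0, 260.0], [240.0, 250.0]])
# Tiny counts: expected cells fall below 5, so no conclusion is drawn.
small = np.array([[3.0, 2.0], [1.0, 4.0]])
```

Here `inconclusive(big)` returns `False` and `inconclusive(small)` returns `True`.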
[Figure 4 here: (a) A comparison of power for independence testing between the projected and unprojected statistics in zCDP-Min-$\chi^2$ and the classical nonprivate test; we set $\alpha = 0.05$, $\rho = 0.001$, and 5,000 trials; curves NonPrivate, ProjIND, UnProjIND. (b) The empirical power loss from using other private independence tests instead of the projected statistic in zCDP-Min-$\chi^2$, with error bars corresponding to 1.96 times the standard error of each difference over 50,000 trials; we set $\alpha = 0.05$ and $\rho = 0.001$; curves GLRV_IND_Asympt, GLRV_MCIND_GAUSS, UnProjIND.]

Figure 4: Empirical power results for our new independence tests in zCDP-Min-$\chi^2$ and comparisons to previous private tests in [12].

We then compare the power zCDP-Min-$\chi^2$ achieves with each of our test statistics. As a sample of our experiments, we set $r = c = 2$ and $\pi^{(1)} = (2/3, 1/3)$, $\pi^{(2)} = (1/2, 1/2)$. We then sample our contingency table $X^{(n)}$ from $\mathrm{Multinomial}\left(n, p\left(\pi^{(1)},\pi^{(2)}\right) + \boldsymbol\Delta\right)$, where $\boldsymbol\Delta = 0.01\cdot(1, 0, -1, 0)$, so that the null hypothesis is indeed false and should be rejected. We give the empirical power of zCDP-Min-$\chi^2$ in Figure 4a, using both the unprojected statistic $R^{(n)}_\rho\left(\hat\theta^{(n)}\right)$ from (16) and the projected statistic $\boldsymbol{R}^{(n)}_\rho\left(\hat\theta^{(n)}\right)$ from (17), for 5,000 independent trials and various sample sizes $n$. Note that again we pick $\hat\theta^{(n)}$ from Theorem 4.3 relative to the statistic we use. We label "NonPrivate" as the classical Pearson chi-square test run on the actual data, and "ProjIND" as the test from zCDP-Min-$\chi^2$ with the projected statistic, whereas "UnProjIND" uses the unprojected statistic.
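Putting the pieces together for a 2×2 table, a simplified version of the projected independence test can be sketched as follows. To keep the sketch short, the quick-and-dirty estimate $\phi$ is plugged in directly instead of carrying out the minimization over $\theta$ that zCDP-Min-$\chi^2$ prescribes, and the $\chi^2_1$ critical value is hardcoded:

```python
import numpy as np

rng = np.random.default_rng(3)

n, rho, d = 50_000, 0.001, 4
CRIT_CHI2_1 = 3.8414588    # 0.95 quantile of chi^2_{(r-1)(c-1)} for r = c = 2

# Data drawn from a dependent distribution, so H0 (independence) is false.
p_alt = np.outer([2/3, 1/3], [0.5, 0.5]).ravel() + 0.01 * np.array([1, 0, -1, 0])
X = rng.multinomial(n, p_alt)
Z = rng.normal(0.0, np.sqrt(1.0 / rho), size=d)
noisy = (X + Z).reshape(2, 2)

# Quick-and-dirty estimate: noisy marginals, used here in place of the
# full minimization over theta (a simplification for this sketch).
n_tilde = noisy.sum()
p_hat = np.outer(noisy.sum(axis=1), noisy.sum(axis=0)).ravel() / n_tilde ** 2

Sigma = np.diag(p_hat) - np.outer(p_hat, p_hat) + np.eye(d) / (n * rho)
P = np.eye(d) - np.ones((d, d)) / d
U = np.sqrt(n) * ((X + Z) / n - p_hat)
T_proj = U @ P @ np.linalg.inv(Sigma) @ P @ U

reject = bool(T_proj > CRIT_CHI2_1)
```

At this sample size the test rejects with high probability, consistent with the power curves in Figure 4a.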
The projected statistic again outperforms prior work, so in Figure 4b we plot the difference in power between the projected statistic in zCDP-Min-$\chi^2$ and the competitors (the unprojected statistic and the independence tests from [12]) over 50,000 trials. Note that we label "GLRV_MCIND_GAUSS" as the MC based independence test with Gaussian noise and "GLRV_IND_Asympt" as the hypothesis test based on the asymptotic distribution, both from [12].

6.2 Application - GWAS Testing

We next turn to demonstrating that our new general class of private hypothesis tests for categorical data significantly improves on existing private hypothesis tests even when extra structure is assumed about the dataset. Specifically, we are interested in GWAS data, which was the primary motivation for making independence hypothesis tests private [13]. We then assume that $r = 3$ and $c = 2$ and that the data is evenly split between the two columns, as is the case with a case group and a control group. For such tables, we can directly compute the sensitivity of the classical chi-square statistic $q(\cdot)$:

$$q\left(X^{(n)}\right) = \sum_{i=1}^3\sum_{j=1}^2 \frac{n\cdot\left(X^{(n)}_{i,j} - \frac{X^{(n)}_{\cdot,j}\cdot X^{(n)}_{i,\cdot}}{n}\right)^2}{X^{(n)}_{\cdot,j}\cdot X^{(n)}_{i,\cdot}}.$$

Lemma 6.3 ([22, 27]). The $\ell_1$ and $\ell_2$ global sensitivity of the chi-square statistic $q(\cdot)$ based on a 3×2 contingency table with positive margins and $n/2$ cases and $n/2$ controls is $\Delta(q) = 4n/(n+2)$.

Hence, a different approach for a private independence test is to add Gaussian noise with variance $\sigma^2 = \frac{\Delta^2(q)}{2\rho}$ to the statistic $q(\cdot)$ itself, which we call output perturbation. Our statistic is then simply the Gaussian mechanism $M_{\mathrm{Gauss}}$ applied to the statistic $q$. We then compare the private statistic value with the distribution of

$$T_{\mathrm{Gauss}}(n,\rho) = \chi^2_2 + N\left(0,\sigma^2\right),$$

where the degrees of freedom is 2 because we have $(r-1)\cdot(c-1) = 2$.
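The output-perturbation baseline can be sketched directly from these formulas. Its critical value has no closed form, so the sketch below estimates it by Monte Carlo; note the sampled table only approximates the even case/control split assumed by Lemma 6.3:

```python
import numpy as np

rng = np.random.default_rng(4)

n, rho, alpha = 5000, 0.001, 0.05
sens = 4 * n / (n + 2)              # Delta(q) from Lemma 6.3
sigma2 = sens ** 2 / (2 * rho)      # Gaussian-mechanism variance for rho-zCDP

def chi_sq_stat(table):
    """Classical chi-square statistic q for an r x c contingency table."""
    n_tot = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n_tot
    return ((table - expected) ** 2 / expected).sum()

# Monte Carlo estimate of the critical value: the (1 - alpha) quantile of
# T_Gauss(n, rho) = chi^2_2 + N(0, sigma^2).
draws = rng.chisquare(2, size=200_000) + rng.normal(0.0, np.sqrt(sigma2), size=200_000)
tau = np.quantile(draws, 1 - alpha)

# A 3 x 2 table sampled under independence (case/control split only
# approximately even here); the noisy statistic is compared to tau.
table = rng.multinomial(n, np.outer([1/3, 1/3, 1/3], [0.5, 0.5]).ravel()).reshape(3, 2)
private_q = chi_sq_stat(table) + rng.normal(0.0, np.sqrt(sigma2))
reject = bool(private_q > tau)
```

Because the added noise has a large standard deviation here (about 89), the Monte Carlo critical value sits far above the nonprivate $\chi^2_2$ quantile of about 5.99, which is one way to see why output perturbation loses so much power.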
Thus, given a Type I error of at most $\alpha$, we set our critical value $\tau_{\mathrm{Gauss}}(\alpha; n,\rho)$ so that

$$\Pr\left[T_{\mathrm{Gauss}}(n,\rho) > \tau_{\mathrm{Gauss}}(\alpha; n,\rho)\right] = \alpha.$$

Hence, if $M_{\mathrm{Gauss}}\left(X^{(n)}\right)$ for the statistic $q$ is larger than $\tau_{\mathrm{Gauss}}(\alpha; n,\rho)$, then we reject the null hypothesis.

For our experiments, we again set $\rho = 0.001$ and $\alpha = 0.05$. We fix the probability vector $(1/3, 1/3, 1/3)$ over the 3 rows in the first column, whereas in the second column we set $(1/2, 1/4, 1/4)$; therefore the case and control groups do not produce the same outcomes. In Figure 5, we show a comparison of the power between our test with the projected statistic, which assumes no structure on the data, and the output perturbation test, which crucially relies on the fact that the data is evenly split between the case and control groups. We label "ProjIND" and "UnProjIND" as the tests from zCDP-Min-$\chi^2$ with the projected and unprojected statistic, respectively. Further, we label "YFSU_Gauss" as the output perturbation test with Gaussian noise proposed in [27]. Note that our new proposed test does significantly better than the output perturbation test, which sometimes requires 5 times more samples to achieve the same level of power as our projected statistic test.

7 General Chi-Square Tests with Arbitrary Noise Distributions

We next show that we can apply our testing framework in Algorithm 2 with any type of noise distribution we want to use for privacy. For example, we consider adding Laplace noise rather than Gaussian noise when our privacy benchmark is (pure) differential privacy (DP).
In this case, we add Laplace noise with variance $8/\epsilon^2$ when computing the two statistics $R^{(n)}_{\epsilon^2/8}\left(\hat\theta^{(n)}\right)$ from (16) and $\boldsymbol{R}^{(n)}_{\epsilon^2/8}\left(\hat\theta^{(n)}\right)$ from (17), so that the resulting tests are $\epsilon$-DP, and hence $\frac{\epsilon^2}{2}$-zCDP by Theorem 3.5.

[Figure 5 here: power curves for ProjIND, UnProjIND, and YFSU_Gauss.]

Figure 5: A comparison of power between different hypothesis tests for independence testing on GWAS-type datasets, where the data is publicly known to be evenly split between the two columns and there are three rows in the contingency table.

Note that the resulting asymptotic distribution will not be chi-square when we use noise other than Gaussian. We therefore rely on Monte Carlo (MC) sampling to find the critical value at which to reject the null hypothesis. We give the MC based test, which adds independent Laplace noise with variance $8/\epsilon^2$ and is thus $\epsilon$-DP, in Algorithm 3; any noise distribution can be used, however, if we replace the parameter $1/\rho$ in the two statistics with the variance of the noise added to each count. In fact, Gaussian noise can also be used in this framework, although in that case the asymptotic distribution already seems to do well in practice, even for small sample sizes.

7.1 Application - Goodness of Fit Testing

We first show that we can use the general chi-square test DP-MC-MIN with $\epsilon$-DP Laplace noise in Algorithm 3 for goodness of fit testing $H_0: p = p^0$. In this case we select $p\left(\hat\theta^{(n)}\right) = p^0$ and $\phi\left(X^{(n)}+Z\right) = p^0$ in both the unprojected and projected statistics. From the way we select the critical value $\tau(\alpha,\epsilon)$ in Algorithm 3, we have the following result on Type I error, which follows directly from Theorem 5.3 in [12].

Theorem 7.1.
When the number of independent samples $m$ that we choose for our MC sampling is larger than $1/\alpha$, testing $H_0: p = p^0$ in Algorithm 3 guarantees Type I error at most $\alpha$.

We then focus on empirically checking the power of DP-MC-MIN with $\alpha = 0.05$ for the different statistics. As in the previous experiments, we set the null hypothesis $p^0 = (1/2, 1/6, 1/6, 1/6)$ and the alternate hypothesis $p^1 = p^0 + 0.01\cdot(1, -1/3, -1/3, -1/3)$ for various sample sizes. We set the privacy parameter $\epsilon = \sqrt{2\cdot0.001}\approx 0.045$, which implies ($\rho = 0.001$)-zCDP by Theorem 3.5.

Algorithm 3 DP Minimum Chi-Square Test using MC
procedure DP-MC-MIN(Histogram data $X^{(n)} = \left(X^{(n)}_1,\dots,X^{(n)}_d\right)$; $\epsilon$, $\alpha$, $H_0:\theta^0\in\Theta$, $m$ trials)
  Set $\tilde X^{(n)}\leftarrow X^{(n)} + Z$, where $Z = (Z_1,\dots,Z_d)$ with $Z_i\sim\mathrm{Lap}(2/\epsilon)$.
  Set $\hat M = \left(\Sigma_{n\rho}\left(\phi(\tilde X^{(n)})\right)\right)^{-1}$
  For the unprojected statistic:
    $T(\theta) = \frac1n\left(\tilde X^{(n)} - np(\theta)\right)^\top\hat M\left(\tilde X^{(n)} - np(\theta)\right)$
    Set $\hat\theta^{(n)} = \arg\min_{\theta\in\Theta} T(\theta)$
    Sample $\{r_1,\dots,r_m\}$ as $m$ samples from the distribution of $T\left(\hat\theta^{(n)}\right)$.
    Set $\tau(\alpha,\epsilon)$ to be the $\lceil(m+1)(1-\alpha)\rceil$-th order statistic of $\{r_1,\dots,r_m\}$.
  For the projected statistic:
    $T(\theta) = \frac1n\left(\tilde X^{(n)} - np(\theta)\right)^\top\boldsymbol{P}\,\hat M\,\boldsymbol{P}\left(\tilde X^{(n)} - np(\theta)\right)$
    Set $\hat\theta^{(n)} = \arg\min_{\theta\in\Theta} T(\theta)$
    Sample $\{r_1,\dots,r_m\}$ as $m$ samples from the distribution of $T\left(\hat\theta^{(n)}\right)$.
    Set $\tau(\alpha,\epsilon)$ to be the $\lceil(m+1)(1-\alpha)\rceil$-th order statistic of $\{r_1,\dots,r_m\}$.
  if $T\left(\hat\theta^{(n)}\right) > \tau(\alpha,\epsilon)$ then Reject

We set the number of independent samples we draw from the distribution of the statistic under the null hypothesis to $m = 59$. In Figure 6a, we compare the power of the projected and unprojected statistics in DP-MC-GOF, labeled "ProjGOF_LAP" and "UnProjGOF_LAP" respectively, with the classical nonprivate chi-square test for various $n$, each with 5,000 trials.
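The Monte Carlo calibration in Algorithm 3, specialized to goodness of fit testing (where $p\left(\hat\theta^{(n)}\right) = p^0$, so the statistic's null distribution can be sampled exactly), can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(5)

n, alpha, m = 10_000, 0.05, 59
eps = np.sqrt(2 * 0.001)            # eps-DP, hence (rho = 0.001)-zCDP by Theorem 3.5
b = 2.0 / eps                       # Laplace scale: Lap(2/eps) has variance 8/eps^2
noise_var = 2 * b ** 2
p0 = np.array([0.5, 1/6, 1/6, 1/6])
d = len(p0)

# The 1/rho in the covariance is replaced by the variance of the added noise.
Sigma = np.diag(p0) - np.outer(p0, p0) + (noise_var / n) * np.eye(d)
P = np.eye(d) - np.ones((d, d)) / d
middle = P @ np.linalg.inv(Sigma) @ P

def proj_stat(X, Z):
    U = np.sqrt(n) * ((X + Z) / n - p0)
    return U @ middle @ U

# m Monte Carlo samples of the statistic under H0 (fresh data, fresh noise).
r = sorted(proj_stat(rng.multinomial(n, p0), rng.laplace(0.0, b, size=d))
           for _ in range(m))
k = int(np.ceil((m + 1) * (1 - alpha)))     # k = 57 for m = 59, alpha = 0.05
tau = r[k - 1]                              # the k-th order statistic

# Run the test once on data drawn from the null.
T = proj_stat(rng.multinomial(n, p0), rng.laplace(0.0, b, size=d))
reject = bool(T > tau)
```

Since the data statistic and the $m$ calibration draws are exchangeable under $H_0$, the probability of rejection is at most $\alpha$, which is the content of Theorem 7.1.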
Note that there is drastically larger power when we use the projected statistic as opposed to the unprojected statistic. We then show that the projected statistic using Laplace noise achieves significantly higher power than the unprojected test statistic as well as previous DP hypothesis tests with Laplace noise from [12]. We label "GLRV_MCGOF_LAP" as the MC based test with Laplace noise from [12] and plot in Figure 6b the power loss that the other DP goodness of fit tests suffer when compared to the power that DP-MC-GOF achieves with the projected statistic. Note that the error bars in the figure show 1.96 times the standard error in the difference of proportions from 100,000 trials.

Figure 6(a): A comparison of power for goodness of fit testing between the projected and unprojected statistics in DP-MC-MIN with the classical non-private test for various $n$ with 5,000 trials each.

7.2 Application - Independence Testing

We then apply our general framework to independence testing as in Section 6.1. Unlike our goodness of fit testing, we are not guaranteed to have Type I error at most $\alpha$ when we have composite tests, e.g. independence testing, in DP-MC-MIN because we are not sampling from the exact data distribution. We therefore empirically show that the Type I error is at most the desired level $\alpha = 0.05$. We again fix $\epsilon = \sqrt{2 \cdot 0.001} \approx 0.045$, which ensures $\epsilon$-DP as well as $(\rho = 0.001)$-zCDP due to Theorem 3.5. We will use $m = 59$ samples in all of our MC testing. We then give the empirical Type I error for
various $n$ and data distributions in Figure 7.

Figure 6(b): The empirical loss in power from using other private goodness of fit tests instead of the projected statistic in DP-MC-GOF. The error bars correspond to 1.96 times the standard error of each difference for 100,000 trials.

Figure 6: Empirical power results for our new DP goodness of fit tests in DP-MC-MIN and a comparison to a previous private test in [12] that uses Laplace noise with variance 4,000 added to each cell count.

Figure 7: Empirical Type I error for the new DP independence tests in DP-MC-MIN with 1.96 times the standard error in 10,000 trials, for $\pi^{(1)} = (1/2, 1/2), \pi^{(2)} = (1/2, 1/2)$ and for $\pi^{(1)} = (2/3, 1/3), \pi^{(2)} = (1/2, 1/2)$. We set $\epsilon = \sqrt{2 \cdot 0.001} \approx 0.045$, which corresponds to variance 1,000 due to noise for each cell count. It is desired to have Type I error at most $\alpha = 0.05$, which is given as the horizontal line.

Note that we use the same rule of thumb as before: if our naïve estimate for the probability distribution produces expected cell counts smaller than 5, then our test is inconclusive and fails to reject. This is why, in our experiments, the Type I error is close to zero for small sample sizes. We also consider the power of our tests in DP-MC-MIN. As we did before, we set the data distribution with $\pi^{(1)} = (2/3, 1/3)$, $\pi^{(2)} = (1/2, 1/2)$. We then sample our contingency table $X^{(n)}$ from $\mathrm{Multinomial}(n, p(\pi^{(1)}, \pi^{(2)}) + \Delta)$ where $\Delta = 0.01 \cdot (1, 0, -1, 0)$ for various sample sizes.
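The sampling setup and the small-cell rule of thumb described above can be sketched as follows. The names `sample_table` and `passes_rule_of_thumb` are illustrative inventions of ours, and estimating the expected counts from the table's empirical marginals under independence is our reading of the naïve estimate, not code from the paper.

```python
import numpy as np

def sample_table(n, pi1, pi2, delta, rng):
    """Draw a flattened 2x2 contingency table from Multinomial(n, p(pi1, pi2) + delta)."""
    p = np.outer(pi1, pi2).ravel() + np.asarray(delta)  # independence probs plus shift
    return rng.multinomial(n, p)

def passes_rule_of_thumb(x, shape=(2, 2)):
    """Return False (test inconclusive; fail to reject) if any expected cell count,
    computed from the empirical marginals under independence, falls below 5."""
    t = x.reshape(shape).astype(float)
    n = t.sum()
    expected = n * np.outer(t.sum(axis=1) / n, t.sum(axis=0) / n)
    return bool(expected.min() >= 5)
```

For the alternative above ($\pi^{(1)} = (2/3, 1/3)$, $\pi^{(2)} = (1/2, 1/2)$, $\Delta = 0.01 \cdot (1, 0, -1, 0)$), moderate $n$ easily clears the threshold, while very small tables fail it, which is consistent with the near-zero Type I error reported at small sample sizes.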
In Figure 8a, we compare the power of the projected and unprojected statistics in DP-MC-MIN, labeled "ProjIND_LAP" and "UnProjIND_LAP" respectively, with the classical non-private chi-square test for 1,000 trials. We then label "GLRV_MCIND_LAP" as the MC based test with Laplace noise from [12]. Note that the error bars show 1.96 times the standard error in the difference of proportions from 10,000 trials, giving a 95% confidence interval.

Figure 8(a): A comparison of power for independence testing between the projected and unprojected statistics in DP-MC-MIN with the classical non-private test.

Figure 8(b): The empirical loss in power from using other private independence tests instead of the projected statistic in DP-MC-MIN, with error bars corresponding to 1.96 times the standard error of each difference for 10,000 trials.

Figure 8: Empirical power results for our new DP independence tests in DP-MC-MIN and a comparison to a previous private test in [12] that uses Laplace noise.

8 Conclusions

We have demonstrated a new broad class of private hypothesis tests zCDP-Min-$\chi^2$ for categorical data based on minimum chi-square theory. We gave two statistics (unprojected and projected) that converge to a chi-square distribution when we use Gaussian noise and thus lead to zCDP hypothesis tests. Unlike prior work, these statistics have the same asymptotic distributions in the private asymptotic regime as the classical chi-square tests have in the classical asymptotic regime.
Our simulations show that with either the unprojected or projected statistic our tests achieve at most $\alpha$ Type I error. We then empirically showed that our tests using the projected statistic significantly improve the Type II error when compared to the unprojected statistic and previous private hypothesis tests from [12]. Further, our new tests give comparable power to the classical (non-private) chi-square tests. We then gave further applications of our new statistics to GWAS data and showed how we can incorporate other noise distributions (e.g. Laplace) using an MC sampling approach.

References

[1] R. Bassily, K. Nissim, A. D. Smith, T. Steinke, U. Stemmer, and J. Ullman. Algorithmic stability for adaptive data analysis. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, pages 1046-1059, 2016.

[2] Y. M. M. Bishop, S. E. Fienberg, and P. W. Holland. Discrete multivariate analysis: Theory and practice, 1975.

[3] M. Bun and T. Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. ArXiv e-prints, May 2016.

[4] R. Cummings, K. Ligett, K. Nissim, A. Roth, and Z. S. Wu. Adaptive learning with robust generalization guarantees. In Proceedings of the 29th Conference on Learning Theory, COLT 2016, New York, USA, June 23-26, 2016, pages 772-814, 2016.

[5] C. Dwork and G. N. Rothblum. Concentrated differential privacy. CoRR, abs/1603.01887, 2016.

[6] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In Proceedings of the 24th Annual International Conference on the Theory and Applications of Cryptographic Techniques, EUROCRYPT '06, pages 486-503, Berlin, Heidelberg, 2006. Springer-Verlag.

[7] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis.
In TCC '06, pages 265-284, 2006.

[8] C. Dwork, V. Feldman, M. Hardt, T. Pitassi, O. Reingold, and A. Roth. Preserving statistical validity in adaptive data analysis. In STOC, 2015.

[9] C. Dwork, V. Feldman, M. Hardt, T. Pitassi, O. Reingold, and A. Roth. Generalization in adaptive data analysis and holdout reuse. In Advances in Neural Information Processing Systems, 2015.

[10] T. Ferguson. A Course in Large Sample Theory. Chapman & Hall Texts in Statistical Science Series. Taylor & Francis, 1996. ISBN 9780412043710.

[11] S. E. Fienberg, A. Rinaldo, and X. Yang. Differential privacy and the risk-utility tradeoff for multi-dimensional contingency tables. In Proceedings of the 2010 International Conference on Privacy in Statistical Databases, PSD '10, pages 187-199, Berlin, Heidelberg, 2010. Springer-Verlag.

[12] M. Gaboardi, H. Lim, R. M. Rogers, and S. P. Vadhan. Differentially private chi-squared hypothesis testing: Goodness of fit and independence testing. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, pages 2111-2120, 2016.

[13] N. Homer, S. Szelinger, M. Redman, D. Duggan, W. Tembe, J. Muehling, J. V. Pearson, D. A. Stephan, S. F. Nelson, and D. W. Craig. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet, 4(8), 08 2008.

[14] J. G. Reid and M. F. Driscoll. An accessible proof of Craig's theorem in the noncentral case. The American Statistician, 42(2):139-142, 1988. ISSN 00031305. URL http://www.jstor.org/stable/2684489.

[15] A. Johnson and V. Shmatikov. Privacy-preserving data exploration in genome-wide association studies. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '13, pages 1079-1087, New York, NY, USA, 2013. ACM.

[16] V. Karwa and A.
Slavković. Differentially private graphical degree sequences and synthetic graphs. In J. Domingo-Ferrer and I. Tinnirello, editors, Privacy in Statistical Databases, volume 7556 of Lecture Notes in Computer Science, pages 273-285. Springer Berlin Heidelberg, 2012.

[17] V. Karwa and A. Slavković. Inference using noisy degrees: Differentially private β-model and synthetic graphs. Ann. Statist., 44(1):87-112, 02 2016.

[18] R. Rogers, A. Roth, A. Smith, and O. Thakkar. Max-information, differential privacy, and post-selection hypothesis testing. In Proceedings of the 57th Annual IEEE Symposium on Foundations of Computer Science, New Brunswick, NJ, USA, October 9-11, pages 487-494, 2016.

[19] O. Sheffet. Differentially private least squares: Estimation, confidence and rejecting the null hypothesis. arXiv preprint arXiv:1507.02482, 2015.

[20] S. Simmons, C. Sahinalp, and B. Berger. Enabling privacy-preserving GWASs in heterogeneous human populations. Cell Systems, 3(1):54-61, 2016.

[21] A. Smith. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, STOC '11, pages 813-822, New York, NY, USA, 2011. ACM.

[22] C. Uhler, A. Slavkovic, and S. E. Fienberg. Privacy-preserving data sharing for genome-wide association studies. Journal of Privacy and Confidentiality, 5(1), 2013.

[23] D. Vu and A. Slavković. Differential privacy for clinical trial data: Preliminary evaluations. In Proceedings of the 2009 IEEE International Conference on Data Mining Workshops, ICDMW '09, pages 138-143, Washington, DC, USA, 2009. IEEE Computer Society.

[24] Y. Wang, J. Lee, and D. Kifer. Differentially private hypothesis testing, revisited. CoRR, abs/1511.03376, 2015.

[25] L. Wasserman and S. Zhou. A statistical framework for differential privacy.
Journal of the American Statistical Association, 105(489):375-389, 2010.

[26] M. A. Woodbury. Inverting Modified Matrices. Number 42 in Statistical Research Group Memorandum Reports. Princeton University, Princeton, NJ, 1950.

[27] F. Yu, S. E. Fienberg, A. B. Slavković, and C. Uhler. Scalable privacy-preserving data sharing methodology for genome-wide association studies. Journal of Biomedical Informatics, 50:133-141, 2014.

A Proofs for Section 4.2

Proof of Theorem 4.3. Since $\phi(V^{(n)})$ converges in probability to $\theta_0$ and $M(\cdot)$ is a continuous mapping, for any $b > 0$, $c > 0$ there exists an $n_0$ such that when $n \geq n_0$, $M(\phi(V^{(n)}))$ is within a distance $b$ from $M(\theta_0)$ with probability at least $1 - c$, which makes $M(\phi(V^{(n)}))$ positive definite with high probability for sufficiently large $n$. Furthermore, for any $d > 0$, we can choose $n$ large enough so that the smallest eigenvalue of $M(\phi(V^{(n)}))$ is at least $\gamma - d$. Since the parameter space is compact, we know a minimizer exists for $R^{(n)}(\theta)$. Together, this implies that for sufficiently large $n$ and with high probability $\widehat{D}^{(n)}(\hat{\theta}^{(n)}) \geq 0$. Also, $\widehat{D}^{(n)}(\hat{\theta}^{(n)}) \leq \widehat{D}^{(n)}(\theta_0)$, but $\widehat{D}^{(n)}(\theta_0)/n \stackrel{P}{\to} 0$ since $M(\phi(V^{(n)})) \stackrel{P}{\to} M$ and $V^{(n)} \stackrel{P}{\to} A$. Thus $\widehat{D}^{(n)}(\hat{\theta}^{(n)})/n \stackrel{P}{\to} 0$, which means $V^{(n)} - A(\hat{\theta}^{(n)}) \stackrel{P}{\to} 0$ (since $M(\phi(V^{(n)}))$ is positive definite with high probability and uniformly bounded away from 0 in a neighborhood of $\theta_0$). This implies that $A(\hat{\theta}^{(n)}) \stackrel{P}{\to} A$ and so $\hat{\theta}^{(n)} \stackrel{P}{\to} \theta_0$ since $A(\theta)$ is bicontinuous by assumption. Thus, with high probability (e.g., $\geq 1 - c$ for large enough $n$), $\hat{\theta}^{(n)}$ satisfies the first order optimality condition $\nabla \widehat{D}^{(n)}(\hat{\theta}^{(n)}) = 0$. This is the same as

$$\dot{A}(\hat{\theta}^{(n)})^{\intercal} M(\phi(V^{(n)})) \left( V^{(n)} - A(\hat{\theta}^{(n)}) \right) = 0 \tag{19}$$

Expanding $A(\hat{\theta}^{(n)})$ around $\theta_0$,
$$A(\hat{\theta}^{(n)}) = A(\theta_0) + \underbrace{\int_0^1 \dot{A}\left(\theta_0 + t(\hat{\theta}^{(n)} - \theta_0)\right) dt}_{\equiv B(\hat{\theta}^{(n)})} \, (\hat{\theta}^{(n)} - \theta_0) \tag{20}$$

Substituting (20) into (19), we get:

$$\dot{A}(\hat{\theta}^{(n)})^{\intercal} M(\phi(V^{(n)})) \left( V^{(n)} - A(\theta_0) - B(\hat{\theta}^{(n)})(\hat{\theta}^{(n)} - \theta_0) \right) = 0 \tag{21}$$

$$\dot{A}(\hat{\theta}^{(n)})^{\intercal} M(\phi(V^{(n)})) B(\hat{\theta}^{(n)}) \sqrt{n} (\hat{\theta}^{(n)} - \theta_0) = \dot{A}(\hat{\theta}^{(n)})^{\intercal} M(\phi(V^{(n)})) \sqrt{n} \left( V^{(n)} - A(\theta_0) \right) \tag{22}$$

Now, by the continuity of $\dot{A}(\cdot)$, the definition of $B$, and the convergence in probability of $\hat{\theta}^{(n)}$ to $\theta_0$, we have $B(\hat{\theta}^{(n)}) \stackrel{P}{\to} \dot{A}(\theta_0)$. Since $\dot{A}(\theta)$ has full rank by assumption, for sufficiently large $n$, $B(\hat{\theta}^{(n)})$ has full rank with high probability. This leads to the following expression, with high probability for sufficiently large $n$:

$$\sqrt{n}(\hat{\theta}^{(n)} - \theta_0) = \left( \dot{A}(\hat{\theta}^{(n)})^{\intercal} M(\phi(V^{(n)})) B(\hat{\theta}^{(n)}) \right)^{-1} \dot{A}(\hat{\theta}^{(n)})^{\intercal} M(\phi(V^{(n)})) \sqrt{n} (V^{(n)} - A) \tag{23}$$

Since $M(\phi(V^{(n)}))$ has smallest eigenvalue at least $\gamma - d > 0$ with high probability for $n$ large enough, and since $\phi(V^{(n)}) \stackrel{P}{\to} \theta_0$, $\hat{\theta}^{(n)} \stackrel{P}{\to} \theta_0$, and $B(\hat{\theta}^{(n)}) \stackrel{P}{\to} \dot{A}(\theta_0)$, using continuity of all of the above functions, the assumption that $\sqrt{n}(V^{(n)} - A) \to N(0, C)$ in distribution, and Slutsky's theorem, we get:

$$\sqrt{n}(\hat{\theta}^{(n)} - \theta_0) \stackrel{D}{\to} N(0, \Psi) \quad \text{as } n \to \infty. \tag{24}$$

Proof of Theorem 4.4. Note that Theorem 24 in [10] shows that if the hypotheses hold, then

$$n \left( V^{(n)} - A(\hat{\theta}^{(n)}) \right)^{\intercal} M(\hat{\theta}^{(n)}) \left( V^{(n)} - A(\hat{\theta}^{(n)}) \right) \stackrel{D}{\to} \chi^2_{\nu - k}.$$

Note that we have $\phi(V^{(n)}) \stackrel{P}{\to} \theta_0$ and $\hat{\theta}^{(n)} \stackrel{P}{\to} \theta_0$ for the true parameter $\theta_0 \in \Theta$. We can then apply Slutsky's Theorem, due to $M(\cdot)$ being continuous, to obtain the result for $\widehat{D}^{(n)}(\hat{\theta}^{(n)})$.
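As a numerical sanity check on the chi-square limits above, consider the goodness-of-fit setting with a fully specified $p_0$ (no parameters estimated). With Gaussian noise of variance $\sigma^2$ added to each count, the exact covariance of $\sqrt{n}(\hat{p} - p_0)$ is $\mathrm{Diag}(p_0) - p_0 p_0^{\intercal} + \frac{\sigma^2}{n} I$, and the quadratic form built from its inverse has mean exactly $d$, the mean of a $\chi^2_d$ variable. The decomposition and all parameter choices below are our own illustration, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(7)
p0 = np.array([0.5, 1 / 6, 1 / 6, 1 / 6])
n, sigma2 = 5000, 1000.0
d = len(p0)

# Exact covariance of sqrt(n) * (p_hat - p0): multinomial part plus Gaussian noise part.
cov = np.diag(p0) - np.outer(p0, p0) + (sigma2 / n) * np.eye(d)
cov_inv = np.linalg.inv(cov)

stats = []
for _ in range(2000):
    x = rng.multinomial(n, p0) + rng.normal(scale=np.sqrt(sigma2), size=d)
    diff = x / n - p0
    stats.append(n * diff @ cov_inv @ diff)

# E[T] = trace(cov_inv @ cov) = d, so the sample mean should be close to 4 here.
print(float(np.mean(stats)))
```

Note the noise term $\frac{\sigma^2}{n} I$ is exactly what makes the covariance invertible: the multinomial part alone has rank $d - 1$, which is why the non-private statistic needs a projection (or a generalized inverse) while the noisy one does not.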
