Local Privacy, Data Processing Inequalities, and Statistical Minimax Rates

Working under a model of privacy in which data remains private even from the statistician, we study the tradeoff between privacy guarantees and the utility of the resulting statistical estimators. We prove bounds on information-theoretic quantities, …

Authors: John C. Duchi, Michael I. Jordan, Martin J. Wainwright

John C. Duchi† (jduchi@stanford.edu), Michael I. Jordan∗ (jordan@stat.berkeley.edu), Martin J. Wainwright∗ (wainwrig@stat.berkeley.edu)
†Stanford University, Stanford, CA 94305; ∗University of California, Berkeley, Berkeley, CA 94720

Abstract

Working under a model of privacy in which data remains private even from the statistician, we study the tradeoff between privacy guarantees and the utility of the resulting statistical estimators. We prove bounds on information-theoretic quantities, including mutual information and Kullback-Leibler divergence, that depend on the privacy guarantees. When combined with standard minimax techniques, including the Le Cam, Fano, and Assouad methods, these inequalities allow for a precise characterization of statistical rates under local privacy constraints. We provide a treatment of several canonical families of problems: mean estimation, parameter estimation in fixed-design regression, multinomial probability estimation, and nonparametric density estimation. For all of these families, we provide lower and upper bounds that match up to constant factors, and exhibit new (optimal) privacy-preserving mechanisms and computationally efficient estimators that achieve the bounds.

1 Introduction

A major challenge in statistical inference is that of characterizing and balancing statistical utility with the privacy of individuals from whom data is obtained [20, 21, 28]. Such a characterization requires a formal definition of privacy, and differential privacy has been put forth as one such formalization [e.g., 24, 10, 25, 34, 35].
In the database and cryptography literatures from which differential privacy arose, early research was mainly algorithmic in focus, and researchers have used differential privacy to evaluate privacy-retaining mechanisms for transporting, indexing, and querying data. More recent work aims to link differential privacy to statistical concerns [22, 51, 33, 48, 16, 46]; in particular, researchers have developed algorithms for private robust statistical estimators, point and histogram estimation, and principal components analysis. Guarantees of optimality in this line of work have often been non-inferential, aiming to approximate a class of statistics under privacy-respecting transformations of the data at hand and not with respect to an underlying population. There has also been recent work within the context of classification problems and the "probably approximately correct" framework of statistical learning theory [e.g., 37, 8] that treats the data as random and aims to recover aspects of the underlying population; we discuss this work in Section 6.

In this paper, we take a fully inferential point of view on privacy, bringing differential privacy into contact with statistical decision theory. Our focus is on the fundamental limits of differentially private estimation. By treating differential privacy as an abstract constraint on estimators, we obtain independence from specific estimation procedures and privacy-preserving mechanisms. Within this framework, we derive both lower bounds and matching upper bounds on minimax risk. We obtain our lower bounds by integrating differential privacy into the classical

[Figure 1. Left: graphical structure of private Z_i and non-private data X_i in the interactive case. Right: graphical structure of the channel in the non-interactive case.]
paradigms for bounding minimax risk via the inequalities of Le Cam, Fano, and Assouad, while we obtain matching upper bounds by proposing and analyzing specific private procedures.

We study the setting of local privacy, in which providers do not even trust the statistician collecting the data. Although local privacy is a relatively stringent requirement, we view this setting as a natural step in identifying minimax risk bounds under privacy constraints. Indeed, local privacy is one of the oldest forms of privacy: its essential form dates to Warner [50], who proposed it as a remedy for what he termed "evasive answer bias" in survey sampling. We hope that we can leverage deeper understanding of this classical setting to treat other privacy-preserving approaches to data analysis.

More formally, let X_1, ..., X_n ∈ 𝒳 be observations drawn according to a distribution P, and let θ = θ(P) be a parameter of this unknown distribution. We wish to estimate θ based on access to obscured views Z_1, ..., Z_n ∈ 𝒵 of the original data. The original random variables {X_i}_{i=1}^n and the privatized observations {Z_i}_{i=1}^n are linked via a family of conditional distributions Q_i(Z_i | X_i = x, Z_{1:i−1} = z_{1:i−1}). To simplify notation, we typically omit the subscript in Q_i. We refer to Q as a channel distribution, as it acts as a conduit from the original to the privatized data, and we assume it is sequentially interactive, meaning the channel has the conditional independence structure

\[
\{X_i, Z_1, \ldots, Z_{i-1}\} \to Z_i \quad \text{and} \quad Z_i \perp X_j \mid \{X_i, Z_1, \ldots, Z_{i-1}\} \ \text{for } j \neq i,
\]

illustrated on the left of Figure 1. A special case of such a channel is the non-interactive case, in which each Z_i depends only on X_i (Figure 1, right). Our work is based on the following definition of privacy.
For a given privacy parameter α ≥ 0, we say that Z_i is an α-differentially locally private view of X_i if for all z_1, ..., z_{i−1} and x, x′ ∈ 𝒳 we have

\[
\sup_{S \in \sigma(\mathcal{Z})} \frac{Q_i(Z_i \in S \mid X_i = x, Z_1 = z_1, \ldots, Z_{i-1} = z_{i-1})}{Q_i(Z_i \in S \mid X_i = x', Z_1 = z_1, \ldots, Z_{i-1} = z_{i-1})} \le \exp(\alpha), \tag{1}
\]

where σ(𝒵) denotes an appropriate σ-field on 𝒵. Definition (1) does not constrain Z_i to be a release of data based exclusively on X_i: the channel Q_i may be interactive [24], changing based on prior private observations Z_j. We also consider the non-interactive case [50, 27] where Z_i depends only on X_i (see the right side of Figure 1); here the bound (1) reduces to

\[
\sup_{S \in \sigma(\mathcal{Z})} \sup_{x, x' \in \mathcal{X}} \frac{Q(Z_i \in S \mid X_i = x)}{Q(Z_i \in S \mid X_i = x')} \le \exp(\alpha). \tag{2}
\]

These definitions capture a type of plausible deniability: no matter what data Z is released, it is nearly equally as likely to have come from one point x ∈ 𝒳 as any other. It is also possible to interpret differential privacy within a hypothesis testing framework, where α controls the error rate in tests for the presence or absence of individual data points in a dataset [51]. Such guarantees against discovery, together with the treatment of issues of side information or adversarial strength that are problematic for other formalisms, have been used to make the case for differential privacy within the computer science literature; see, for example, the papers [27, 24, 6, 30].

Although differential privacy provides an elegant formalism for limiting disclosure and protecting against many forms of privacy breach, it is a stringent measure of privacy, and it is conceivably overly stringent for statistical practice. Indeed, Fienberg et al.
[29] criticize the use of differential privacy in releasing contingency tables, arguing that known mechanisms for differentially private data release can give unacceptably poor performance. As a consequence, they advocate, in some cases, recourse to weaker privacy guarantees to maintain the utility and usability of released data. There are results that are more favorable for differential privacy; for example, Smith [48] shows that the non-local form of differential privacy [24] can be satisfied while yielding asymptotically optimal parametric rates of convergence for some point estimators. Resolving such differing perspectives requires investigating whether particular methods have optimality properties that would allow a general criticism of the framework, and characterizing the trade-offs between privacy and statistical efficiency. Such are the goals of the current paper.

1.1 Our contributions

The main contribution of this work is to provide general techniques for deriving minimax bounds under local privacy constraints and to illustrate these techniques by computing minimax rates for several canonical problems: (a) mean estimation; (b) parameter estimation in fixed-design regression; (c) multinomial probability estimation; and (d) density estimation. We now outline our main contributions. (Because a deeper comparison of the current work with prior research requires a formal definition of our minimax framework and presentation of our main results, we defer a full discussion of related work to Section 6.
We note here, however, that our minimax rates are for estimation of population quantities, in accordance with our connections to statistical decision theory; by way of comparison, most prior work in the privacy literature focuses on accurate approximation of statistics in a conditional analysis in which the data are treated as fixed.)

Many methods for obtaining minimax bounds involve information-theoretic quantities relating data-generating distributions [53, 52, 49]. In particular, let P_1 and P_2 denote two distributions on the observations X_i, and for ν ∈ {1, 2}, define the marginal distribution M^n_ν on 𝒵^n by

\[
M_\nu^n(S) := \int Q^n(S \mid x_1, \ldots, x_n)\, dP_\nu(x_1, \ldots, x_n) \quad \text{for } S \in \sigma(\mathcal{Z}^n). \tag{3}
\]

Here Q^n(· | x_1, ..., x_n) denotes the joint distribution on 𝒵^n of the private sample Z_{1:n}, conditioned on X_{1:n} = x_{1:n}. The mutual information of samples drawn according to distributions of the form (3) and the KL divergence between such distributions are key objects in statistical discriminability and minimax rates [36, 9, 53, 52, 49], where they are often applied in one of three lower-bounding techniques: Le Cam's, Fano's, and Assouad's methods. Keeping in mind the centrality of these information-theoretic quantities, we summarize our main results at a high level as follows. Theorem 1 bounds the KL divergence between distributions M^n_1 and M^n_2, as defined by the marginal (3), by a quantity depending on the differential privacy parameter α and the total variation distance between P_1 and P_2. The essence of Theorem 1 is that

\[
D_{\mathrm{kl}}(M_1^n \,\|\, M_2^n) \lesssim \alpha^2 n \,\|P_1 - P_2\|_{\mathrm{TV}}^2,
\]

where ≲ denotes inequality up to numerical constants.
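As a toy numerical check of this contraction (our own illustration, not an experiment from the paper): take the binary randomized response channel, which releases the true bit with probability e^α/(1 + e^α) and is therefore α-locally private, push two Bernoulli distributions through it, and compare the symmetrized KL divergence of the induced marginals with the quantity 4(e^α − 1)²‖P_1 − P_2‖²_TV, which is of order α²‖P_1 − P_2‖²_TV for small α:

```python
import math

def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

alpha = 0.5
keep = math.exp(alpha) / (1 + math.exp(alpha))  # randomized response: P(Z = X)
# The channel's worst-case likelihood ratio is keep/(1 - keep) = exp(alpha),
# so it meets the alpha-local privacy definition exactly.
assert abs(keep / (1 - keep) - math.exp(alpha)) < 1e-12

p1, p2 = 0.2, 0.7                       # two Bernoulli data distributions
m1 = p1 * keep + (1 - p1) * (1 - keep)  # induced marginal P(Z = 1) under P_1
m2 = p2 * keep + (1 - p2) * (1 - keep)

sym_kl = bernoulli_kl(m1, m2) + bernoulli_kl(m2, m1)
bound = 4 * (math.exp(alpha) - 1) ** 2 * abs(p1 - p2) ** 2  # ~ alpha^2 * TV^2
assert sym_kl <= bound  # the marginals contract, as the text describes
```

Here the marginals (M_1, M_2) are far closer together than (P_1, P_2): privatization shrinks the statistical distance between data distributions.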
When α² < 1, which is the usual region of interest, this result shows that for statistical procedures whose minimax rate of convergence can be determined by classical information-theoretic methods, the additional requirement of α-local differential privacy causes the effective sample size of any statistical procedure to be reduced from n to at most α²n. Section 3.1 contains the formal statement of this theorem, while Section 3.2 provides corollaries showing its application to minimax risk bounds. We follow this in Section 3.3 with applications of these results to estimation of one-dimensional means and fixed-design regression problems, providing corresponding upper bounds on the minimax risk. In addition to our general analysis, we exhibit some striking difficulties of locally private estimation in non-compact spaces: if we wish to estimate the mean of a random variable X satisfying Var(X) ≤ 1, the minimax rate of estimation of E[X] decreases from the parametric 1/n rate to 1/√(nα²).

Theorem 1 is appropriate for many one-dimensional problems, but it does not address difficulties inherent in higher-dimensional problems. With this motivation, our next two main results (Theorems 2 and 3) generalize Theorem 1 and incorporate dimensionality in an essential way: each provides bounds on information-theoretic quantities by dimension-dependent analogues of total variation. More specifically, Theorem 2 provides bounds on mutual information quantities essential in information-theoretic techniques such as Fano's method [53, 52], while Theorem 3 provides analogous bounds on summed pairs of KL divergences useful in applications of Assouad's method [5, 53, 4].
As a consequence of Theorems 2 and 3, we obtain that for many d-dimensional estimation problems the effective sample size is reduced from n to nα²/d; as our examples illustrate, this dimension-dependent reduction in sample size can have dramatic consequences. We provide the main statement and consequences of Theorem 2 in Section 4, showing its application to obtaining minimax rates for mean estimation in both classical and high-dimensional settings. In Section 5, we present Theorem 3, showing how it provides (sharp) minimax lower bounds for multinomial and probability density estimation. Our results enable us to derive (often new) optimal mechanisms for these problems. One interesting consequence of our results is that Warner's randomized response procedure [50] from the 1960s is an optimal mechanism for multinomial estimation.

Notation: For distributions P and Q defined on a space 𝒳, each absolutely continuous with respect to a distribution μ (with corresponding densities p and q), the KL divergence between P and Q is

\[
D_{\mathrm{kl}}(P \,\|\, Q) := \int_{\mathcal{X}} dP \log \frac{dP}{dQ} = \int_{\mathcal{X}} p \log \frac{p}{q}\, d\mu.
\]

Letting σ(𝒳) denote an appropriate σ-field on 𝒳, the total variation distance between two distributions P and Q is

\[
\|P - Q\|_{\mathrm{TV}} := \sup_{S \in \sigma(\mathcal{X})} |P(S) - Q(S)| = \frac{1}{2} \int_{\mathcal{X}} |p(x) - q(x)|\, d\mu(x).
\]

Let P and P_Y denote the marginal distributions of random vectors X and Y, and let P_Y(· | X) denote the distribution of Y conditional on X. The mutual information between X and Y is

\[
I(X; Y) = \mathbb{E}_P\big[D_{\mathrm{kl}}\big(P_Y(\cdot \mid X) \,\|\, P_Y(\cdot)\big)\big] = \int D_{\mathrm{kl}}\big(P_Y(\cdot \mid X = x) \,\|\, P_Y(\cdot)\big)\, dP(x).
\]

A random variable Y has the Laplace(α) distribution if its density is p_Y(y) = (α/2) exp(−α|y|). For matrices A, B ∈ R^{d×d}, the notation A ⪯ B means that B − A is positive semidefinite. For real sequences {a_n} and {b_n}, we use a_n ≲
b_n to mean that there is a universal constant C < ∞ such that a_n ≤ C b_n for all n, and a_n ≍ b_n to denote that a_n ≲ b_n and b_n ≲ a_n.

2 Background and problem formulation

We first establish the minimax framework we use throughout this paper; see references [52, 53, 49] for further background. Let 𝒫 denote a class of distributions on the sample space 𝒳, and let θ(P) ∈ Θ denote a function defined on 𝒫. The space Θ in which the parameter θ(P) takes values depends on the underlying statistical model (for univariate mean estimation, it is a subset of the real line). Let ρ denote a semi-metric on the space Θ, which we use to measure the error of an estimator for the parameter θ, and let Φ : R_+ → R_+ be a non-decreasing function with Φ(0) = 0 (for example, Φ(t) = t²).

In the classical setting, the statistician is given direct access to i.i.d. observations X_i drawn according to some P ∈ 𝒫. The local privacy setting involves an additional ingredient, namely, a conditional distribution Q that transforms the sample {X_i}_{i=1}^n into the private sample {Z_i}_{i=1}^n taking values in 𝒵. Based on these Z_i, our goal is to estimate the unknown parameter θ(P) ∈ Θ. An estimator θ̂ is a measurable function θ̂ : 𝒵^n → Θ, and we assess the quality of the estimate θ̂(Z_1, ..., Z_n) in terms of the risk

\[
\mathbb{E}_{P,Q}\Big[\Phi\big(\rho(\hat{\theta}(Z_1, \ldots, Z_n), \theta(P))\big)\Big].
\]

For instance, for a univariate mean problem with ρ(θ, θ′) = |θ − θ′| and Φ(t) = t², this risk is the mean-squared error. For any fixed conditional distribution Q, the minimax rate is

\[
\mathfrak{M}_n(\theta(\mathcal{P}), \Phi \circ \rho, Q) := \inf_{\hat{\theta}} \sup_{P \in \mathcal{P}} \mathbb{E}_{P,Q}\Big[\Phi\big(\rho(\hat{\theta}(Z_1, \ldots, Z_n), \theta(P))\big)\Big], \tag{4}
\]

where we take the supremum over distributions P ∈ 𝒫, and the infimum is taken over all estimators θ̂.
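For discrete distributions, the divergences in the notation above reduce to finite sums. A minimal self-contained helper (our own illustration, not from the paper), which also checks Pinsker's inequality ‖P − Q‖²_TV ≤ ½ D_kl(P ‖ Q), invoked when comparing the Le Cam bounds in Section 3.2:

```python
import math

def kl_divergence(p, q):
    """D_kl(P || Q) for discrete distributions given as probability vectors."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tv_distance(p, q):
    """Total variation distance: half the L1 distance between the densities."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
# Pinsker's inequality relates the two quantities: TV^2 <= (1/2) * KL.
assert tv_distance(p, q) ** 2 <= 0.5 * kl_divergence(p, q)
```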
For α > 0, let 𝒬_α denote the set of all conditional distributions guaranteeing α-local privacy (1). By minimizing the minimax risk (4) over all Q ∈ 𝒬_α, we obtain the central object of study for this paper, a functional which characterizes the optimal rate of estimation in terms of the privacy parameter α.

Definition 1. Given a family of distributions θ(𝒫) and a privacy parameter α > 0, the α-minimax rate in the metric ρ is

\[
\mathfrak{M}_n(\theta(\mathcal{P}), \Phi \circ \rho, \alpha) := \inf_{Q \in \mathcal{Q}_\alpha} \inf_{\hat{\theta}} \sup_{P \in \mathcal{P}} \mathbb{E}_{P,Q}\Big[\Phi\big(\rho(\hat{\theta}(Z_1, \ldots, Z_n), \theta(P))\big)\Big]. \tag{5}
\]

From estimation to testing: A standard first step in proving minimax bounds is to reduce the estimation problem to a testing problem [53, 52, 49]. We use two types of testing problems: one a multiple hypothesis test, the second based on multiple binary hypothesis tests. We begin with the first of the two.

Given an index set 𝒱 of finite cardinality, consider a family of distributions {P_ν, ν ∈ 𝒱} contained within 𝒫. This family induces a collection of parameters {θ(P_ν), ν ∈ 𝒱}; it is a 2δ-packing in the ρ-semimetric if

\[
\rho(\theta(P_\nu), \theta(P_{\nu'})) \ge 2\delta \quad \text{for all } \nu \neq \nu'. \tag{6}
\]

We use this family to define the canonical hypothesis testing problem:

• first, nature chooses V according to the uniform distribution over 𝒱;
• second, conditioned on the choice V = ν, the random sample X = (X_1, ..., X_n) is drawn from the n-fold product distribution P^n_ν.

In the classical setting, the statistician directly observes the sample X, while the local privacy constraint means that a new random sample Z = (Z_1, ..., Z_n) is generated by sampling Z_i from the distribution Q(· | X_{1:n}). By construction, conditioned on the choice V = ν, the private sample Z is distributed according to the marginal measure M^n_ν defined in equation (3).
Given the observed vector Z, the goal is to determine the value of the underlying index ν. We refer to any measurable mapping ψ : 𝒵^n → 𝒱 as a test function. Its associated error probability is P(ψ(Z_1, ..., Z_n) ≠ V), where P denotes the joint distribution over the random index V and Z. The classical reduction from estimation to testing [e.g., 49, Section 2.2] guarantees that the minimax error (4) has lower bound

\[
\mathfrak{M}_n(\Theta, \Phi \circ \rho, Q) \ge \Phi(\delta) \inf_\psi \mathbb{P}(\psi(Z_1, \ldots, Z_n) \neq V). \tag{7}
\]

The remaining challenge is to lower bound the probability of error in the underlying multi-way hypothesis testing problem. There are a variety of techniques for this, and we focus on bounds on the probability of error (7) due to Le Cam and Fano. The simplest form of Le Cam's inequality [e.g., 53, Lemma 1] is applicable when there are two values ν, ν′ in 𝒱. In this case,

\[
\inf_\psi \mathbb{P}(\psi(Z_1, \ldots, Z_n) \neq V) = \frac{1}{2} - \frac{1}{2}\big\|M_\nu^n - M_{\nu'}^n\big\|_{\mathrm{TV}}, \tag{8}
\]

where the marginal M is defined as in expression (3). More generally, Fano's inequality [52, 32, Lemma 4.2.1] holds when nature chooses uniformly at random from a set 𝒱 of cardinality larger than two, and takes the form

\[
\inf_\psi \mathbb{P}(\psi(Z_1, \ldots, Z_n) \neq V) \ge 1 - \frac{I(Z_1, \ldots, Z_n; V) + \log 2}{\log |\mathcal{V}|}. \tag{9}
\]

The second reduction we consider, which transforms estimation problems into multiple binary hypothesis testing problems, uses the structure of the hypercube in an essential way. For some d ∈ N, we set 𝒱 = {−1, 1}^d. We say that the family {P_ν} induces a 2δ-Hamming separation for Φ ∘ ρ if there exists a function v : θ(𝒫) → {−1, 1}^d satisfying

\[
\Phi\big(\rho(\theta, \theta(P_\nu))\big) \ge 2\delta \sum_{j=1}^d \mathbf{1}\{[v(\theta)]_j \neq \nu_j\}. \tag{10}
\]

Letting P_{±j} denote the joint distribution over the random index V and Z conditional on the jth coordinate V_j = ±1, we are able to establish the following sharpening of Assouad's lemma [5, 4] (see Appendix F.1 for a proof).

Lemma 1. Under the conditions of the preceding paragraph, we have

\[
\mathfrak{M}_n(\theta(\mathcal{P}), \Phi \circ \rho, Q) \ge \delta \sum_{j=1}^d \inf_\psi \big[\mathbb{P}_{+j}(\psi(Z_{1:n}) \neq +1) + \mathbb{P}_{-j}(\psi(Z_{1:n}) \neq -1)\big].
\]

With the definition of the marginals M^n_{±j} = 2^{−d+1} Σ_{ν : ν_j = ±1} M^n_ν, expression (8) shows that Lemma 1 is equivalent to the lower bound

\[
\mathfrak{M}_n(\theta(\mathcal{P}), \Phi \circ \rho, Q) \ge \delta \sum_{j=1}^d \Big[1 - \big\|M_{+j}^n - M_{-j}^n\big\|_{\mathrm{TV}}\Big]. \tag{11}
\]

As a consequence of the preceding reductions to testing and the error bounds (8), (9), and (11), we obtain bounds on the private minimax rate (5) by controlling variation distances of the form ‖M^n_1 − M^n_2‖_TV or the mutual information between the random parameter index V and the sequence of random variables Z_1, ..., Z_n. We devote the following sections to these tasks.

3 Pairwise bounds under privacy: Le Cam and local Fano methods

We begin with results that upper bound the symmetrized Kullback-Leibler divergence under a privacy constraint, developing consequences of this result for both Le Cam's method and a local form of Fano's method. Using these methods, we derive sharp minimax rates under local privacy for estimating one-dimensional means and for d-dimensional fixed-design regression.

3.1 Pairwise upper bounds on Kullback-Leibler divergences

Many statistical problems depend on comparisons between a pair of distributions P_1 and P_2 defined on a common space 𝒳. Any conditional distribution Q transforms such a pair of distributions into a new pair (M_1, M_2) via the marginalization (3); that is, M_j(S) = ∫_𝒳 Q(S | x) dP_j(x) for j = 1, 2.
Our first main result bounds the symmetrized Kullback-Leibler (KL) divergence between these induced marginals as a function of the privacy parameter α > 0 associated with the conditional distribution Q and the total variation distance between P_1 and P_2.

Theorem 1. For any α ≥ 0, let Q be a conditional distribution that guarantees α-differential privacy. Then for any pair of distributions P_1 and P_2, the induced marginals M_1 and M_2 satisfy the bound

\[
D_{\mathrm{kl}}(M_1 \,\|\, M_2) + D_{\mathrm{kl}}(M_2 \,\|\, M_1) \le \min\{4, e^{2\alpha}\}\, (e^\alpha - 1)^2\, \|P_1 - P_2\|_{\mathrm{TV}}^2. \tag{12}
\]

Remarks: Theorem 1 is a type of strong data processing inequality [3], providing a quantitative relationship from the divergence ‖P_1 − P_2‖_TV to the KL divergence D_kl(M_1 ‖ M_2) that arises after applying the channel Q. The result of Theorem 1 is similar to a result due to Dwork et al. [25, Lemma III.2], who show that D_kl(Q(· | x) ‖ Q(· | x′)) ≤ α(e^α − 1) for any x, x′ ∈ 𝒳, which implies D_kl(M_1 ‖ M_2) ≤ α(e^α − 1) by convexity. This upper bound is weaker than Theorem 1 since it lacks the term ‖P_1 − P_2‖²_TV. This total variation term is essential to our minimax lower bounds: more than providing a bound on KL divergence, Theorem 1 shows that differential privacy acts as a contraction on the space of probability measures. This contractivity holds in a strong sense: indeed, the bound (12) shows that even if we start with a pair of distributions P_1 and P_2 whose KL divergence is infinite, the induced marginals M_1 and M_2 always have finite KL divergence.

We provide the proof of Theorem 1 in Section 7. Here we develop a corollary that has useful consequences for minimax theory under local privacy constraints. Suppose that, conditionally on V = ν, we draw a sample X_1, ...
, X_n from the product measure ∏_{i=1}^n P_{ν,i}, and that we draw the α-locally private sample Z_1, ..., Z_n according to the channel Q(· | X_{1:n}). Conditioned on V = ν, the private sample is distributed according to the measure M^n_ν defined previously (3). Because we allow interactive protocols, the distribution M^n_ν need not be a product distribution in general. Given this setup, we have the following:

Corollary 1. For any α-locally differentially private (1) conditional distribution Q and any paired sequences of distributions {P_{ν,i}} and {P_{ν′,i}},

\[
D_{\mathrm{kl}}(M_\nu^n \,\|\, M_{\nu'}^n) + D_{\mathrm{kl}}(M_{\nu'}^n \,\|\, M_\nu^n) \le 4(e^\alpha - 1)^2 \sum_{i=1}^n \big\|P_{\nu,i} - P_{\nu',i}\big\|_{\mathrm{TV}}^2. \tag{13}
\]

See Section 7.2 for the proof, which requires a few intermediate steps to obtain the additive inequality. Inequality (13) also immediately implies a mutual information bound, which may be useful in applications of Fano's inequality. In particular, if we define the mean distribution M̄^n = (1/|𝒱|) Σ_{ν∈𝒱} M^n_ν, then by the definition of mutual information, we have

\[
I(Z_1, \ldots, Z_n; V) = \frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} D_{\mathrm{kl}}\big(M_\nu^n \,\|\, \bar{M}^n\big)
\le \frac{1}{|\mathcal{V}|^2} \sum_{\nu, \nu'} D_{\mathrm{kl}}(M_\nu^n \,\|\, M_{\nu'}^n)
\le 4(e^\alpha - 1)^2 \sum_{i=1}^n \frac{1}{|\mathcal{V}|^2} \sum_{\nu, \nu' \in \mathcal{V}} \big\|P_{\nu,i} - P_{\nu',i}\big\|_{\mathrm{TV}}^2, \tag{14}
\]

the first inequality following from the joint convexity of the KL divergence and the final inequality from Corollary 1.

Remarks: Mutual information bounds under local privacy have appeared previously. McGregor et al. [43] study relationships between communication complexity and differential privacy, showing that differentially private schemes allow low communication. They provide a result [43, Prop. 7] guaranteeing I(X_{1:n}; Z_{1:n}) ≤ 3αn; they strengthen this bound to I(X_{1:n}; Z_{1:n}) ≤ (3/2)α²n when the X_i are i.i.d. uniform Bernoulli variables.
Since the total variation distance is at most 1, our result also implies this scaling (for arbitrary X_i), but it is stronger since it involves the total variation terms ‖P_{ν,i} − P_{ν′,i}‖_TV, which are essential in our minimax results. In addition, Corollary 1 allows for any (sequentially) interactive channel Q; each Z_i may depend on the private answers Z_{1:i−1} of other data providers.

3.2 Consequences for minimax theory under local privacy constraints

We now turn to some consequences of Theorem 1 for minimax theory under local privacy constraints. For ease of presentation, we analyze the case of independent and identically distributed (i.i.d.) samples, meaning that P_{ν,i} ≡ P_ν for i = 1, ..., n. We show that in both Le Cam's inequality and the local version of Fano's method, the constraint of α-local differential privacy reduces the effective sample size (at least) from n to 4α²n.

Consequence for Le Cam's method: The classical non-private version of Le Cam's method bounds the usual minimax risk

\[
\mathfrak{M}_n(\theta(\mathcal{P}), \Phi \circ \rho) := \inf_{\hat{\theta}} \sup_{P \in \mathcal{P}} \mathbb{E}_P\Big[\Phi\big(\rho(\hat{\theta}(X_1, \ldots, X_n), \theta(P))\big)\Big],
\]

for estimators θ̂ : 𝒳^n → Θ by a binary hypothesis test. One version of Le Cam's lemma (8) asserts that, for any pair of distributions {P_1, P_2} such that ρ(θ(P_1), θ(P_2)) ≥ 2δ, we have

\[
\mathfrak{M}_n(\theta(\mathcal{P}), \Phi \circ \rho) \ge \Phi(\delta)\Big\{\frac{1}{2} - \frac{1}{2\sqrt{2}}\sqrt{n D_{\mathrm{kl}}(P_1 \,\|\, P_2)}\Big\}. \tag{15}
\]

Returning to the α-locally private setting, in which the estimator θ̂ depends only on the private variables (Z_1, ..., Z_n), we measure the α-private minimax risk (5). By applying Le Cam's method to the pair (M_1, M_2) along with Corollary 1 in the form of inequality (13), we find:

Corollary 2 (Private form of Le Cam bound).
Given observations from an α-locally differentially private channel for some α ∈ [0, 22/35], the α-private minimax risk is lower bounded as

\[
\mathfrak{M}_n(\theta(\mathcal{P}), \Phi \circ \rho, \alpha) \ge \Phi(\delta)\Big\{\frac{1}{2} - \frac{1}{2\sqrt{2}}\sqrt{8 n \alpha^2 \|P_1 - P_2\|_{\mathrm{TV}}^2}\Big\}. \tag{16}
\]

Using the fact that ‖P_1 − P_2‖²_TV ≤ ½ D_kl(P_1 ‖ P_2), comparison with the original Le Cam bound (15) shows that for α ∈ [0, 22/35], the effect of α-local differential privacy is to reduce the effective sample size from n to 4α²n. We illustrate use of this private version of Le Cam's bound in our analysis of the one-dimensional mean problem to follow.

Consequences for local Fano's method: We now turn to consequences for the so-called local form of Fano's method. This method is based on constructing a family of distributions {P_ν, ν ∈ 𝒱} that defines a 2δ-packing, meaning ρ(θ(P_ν), θ(P_{ν′})) ≥ 2δ for all ν ≠ ν′, and that satisfies

\[
D_{\mathrm{kl}}(P_\nu \,\|\, P_{\nu'}) \le \kappa^2 \delta^2 \quad \text{for some fixed } \kappa > 0. \tag{17}
\]

We refer to any such construction as a (δ, κ) local packing. Recalling Fano's inequality (9), the pairwise upper bounds (17) imply I(X_1, ..., X_n; V) ≤ n κ² δ² by a convexity argument. We thus obtain the local Fano lower bound [36, 9] on the classical minimax risk:

\[
\mathfrak{M}_n(\theta(\mathcal{P}), \Phi \circ \rho) \ge \Phi(\delta)\Big\{1 - \frac{n \kappa^2 \delta^2 + \log 2}{\log |\mathcal{V}|}\Big\}. \tag{18}
\]

We now state the extension of this bound to the α-locally private setting.

Corollary 3 (Private form of local Fano inequality). Consider observations from an α-locally differentially private channel for some α ∈ [0, 22/35]. Given any (δ, κ) local packing, the α-private minimax risk has lower bound

\[
\mathfrak{M}_n(\Theta, \Phi \circ \rho, \alpha) \ge \Phi(\delta)\Big\{1 - \frac{4 n \alpha^2 \kappa^2 \delta^2 + \log 2}{\log |\mathcal{V}|}\Big\}. \tag{19}
\]

Once again, by comparison to the classical version (18), we see that, for all α ∈ [0, 22/35], the price for privacy is a reduction in the effective sample size from n to 4α²n.
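The private bound (19) is a closed-form expression, so it is straightforward to evaluate numerically; a small calculator with hypothetical parameter values of our own choosing, illustrating that taking δ² of order 1/(nα²) keeps the Fano term bounded away from zero:

```python
import math

def private_local_fano_bound(n, alpha, kappa, delta, packing_size,
                             Phi=lambda t: t * t):
    """Evaluate the lower bound (19), clamped at zero (a negative value is
    a vacuous lower bound)."""
    fano_term = 1 - (4 * n * alpha ** 2 * kappa ** 2 * delta ** 2
                     + math.log(2)) / math.log(packing_size)
    return Phi(delta) * max(0.0, fano_term)

n, alpha, kappa, packing_size = 10_000, 0.5, 1.0, 16
# Choose delta^2 proportional to log|V| / (n * alpha^2): here the bracketed
# Fano term equals 1/2, so the bound scales as delta^2 ~ 1/(n * alpha^2).
delta = math.sqrt(math.log(packing_size) / (16 * n * alpha ** 2 * kappa ** 2))
bound = private_local_fano_bound(n, alpha, kappa, delta, packing_size)
assert 0 < bound <= delta ** 2
```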
The proof is again straightforward using Theorem 1. By Pinsker's inequality, the pairwise bound (17) implies that ‖P_ν − P_{ν′}‖²_TV ≤ ½ κ²δ² for all ν ≠ ν′. We find that I(Z_1, ..., Z_n; V) ≤ 4nα²κ²δ² for all α ∈ [0, 22/35] by combining this inequality with the upper bound (14) from Corollary 1. The claim (19) follows by combining this upper bound with the usual local Fano bound (18).

3.3 Some applications of Theorem 1

In this section, we illustrate the use of the α-private versions of Le Cam's and Fano's inequalities, established in the previous section as Corollaries 2 and 3 of Theorem 1. First, we study the problem of one-dimensional mean estimation. In addition to demonstrating how the minimax rate changes as a function of α, we also reveal some interesting (and perhaps disturbing) effects of enforcing α-local differential privacy: the effective sample size may be even polynomially smaller than α²n. Our second example studies fixed-design linear regression, where we again see the reduction in effective sample size from n to α²n. We state each of our bounds assuming α ∈ [0, 1]; the bounds hold (with different numerical constants) whenever α ∈ [0, C] for some universal constant C.

3.3.1 One-dimensional mean estimation

For some k > 1, consider the family

\[
\mathcal{P}_k := \big\{\text{distributions } P \text{ such that } \mathbb{E}_P[X] \in [-1, 1] \text{ and } \mathbb{E}_P[|X|^k] \le 1\big\},
\]

and suppose that our goal is to estimate the mean θ(P) = E_P[X]. The next proposition characterizes the α-private minimax risk in squared ℓ₂-error:

\[
\mathfrak{M}_n\big(\theta(\mathcal{P}_k), (\cdot)^2, \alpha\big) := \inf_{Q \in \mathcal{Q}_\alpha} \inf_{\hat{\theta}} \sup_{P \in \mathcal{P}_k} \mathbb{E}\Big[\big(\hat{\theta}(Z_1, \ldots, Z_n) - \theta(P)\big)^2\Big].
\]

Proposition 1.
There exist universal constants $0 < c_\ell \leq c_u < \infty$ such that for all $k > 1$ and $\alpha \in [0, 1]$, the minimax error $\mathfrak{M}_n(\theta(\mathcal{P}_k), (\cdot)^2, \alpha)$ is bounded as

$$c_\ell \min\Big\{ 1, \big( n\alpha^2 \big)^{-\frac{k-1}{k}} \Big\} \;\leq\; \mathfrak{M}_n\big(\theta(\mathcal{P}_k), (\cdot)^2, \alpha\big) \;\leq\; c_u \min\Big\{ 1, u_k \big( n\alpha^2 \big)^{-\frac{k-1}{k}} \Big\}, \tag{20}$$

where $u_k = \max\{1, (k-1)^{-2}\}$.

We prove this result using the $\alpha$-private version (16) of Le Cam's inequality, as stated in Corollary 2. See Section 7.3 for the details.

To understand the bounds (20), it is worthwhile to consider some special cases, beginning with the usual setting of random variables with finite variance ($k = 2$). In the non-private setting in which the original sample $(X_1, \ldots, X_n)$ is observed, the sample mean $\widehat{\theta} = \frac{1}{n}\sum_{i=1}^n X_i$ has mean-squared error at most $1/n$. When we require $\alpha$-local differential privacy, Proposition 1 shows that the minimax rate worsens to $1/\sqrt{n\alpha^2}$. More generally, for any $k > 1$, the minimax rate scales as $\mathfrak{M}_n(\theta(\mathcal{P}_k), (\cdot)^2, \alpha) \asymp (n\alpha^2)^{-\frac{k-1}{k}}$, ignoring $k$-dependent pre-factors. As $k \uparrow \infty$, the moment condition $E[|X|^k] \leq 1$ becomes equivalent to the boundedness constraint $|X| \leq 1$ a.s., and we obtain the more standard parametric rate $(n\alpha^2)^{-1}$, where there is no reduction in the exponent.

More generally, the behavior of the $\alpha$-private minimax rates (20) helps demarcate situations in which local differential privacy may or may not be acceptable. In particular, for bounded domains (where we may take $k \uparrow \infty$), local differential privacy may be quite reasonable. However, in situations in which the sample takes values in an unbounded space, local differential privacy imposes much stricter constraints. Indeed, in Appendix G, we discuss an example that illustrates the pathological consequences of providing (local) differential privacy on non-compact spaces.
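To make the exponent in (20) concrete, the following small helper (our naming, not the paper's) evaluates the rate $\min\{1, (n\alpha^2)^{-(k-1)/k}\}$ up to the $k$-dependent constants:

```python
def private_mean_rate(n, alpha, k):
    """Minimax rate (20) for one-dimensional mean estimation over P_k,
    up to constants: min{1, (n * alpha^2)^(-(k-1)/k)}."""
    effective = n * alpha ** 2
    return min(1.0, effective ** (-(k - 1) / k))
```

For $k = 2$ this gives the slow rate $(n\alpha^2)^{-1/2}$, while as $k$ grows the rate approaches the parametric $(n\alpha^2)^{-1}$, matching the discussion above.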
3.3.2 Linear regression with fixed design

We turn now to the problem of linear regression. Concretely, for a given design matrix $X \in \mathbb{R}^{n \times d}$, consider the standard linear model

$$Y = X\theta^* + \varepsilon, \tag{21}$$

where $\varepsilon \in \mathbb{R}^n$ is a vector of independent, zero-mean random variables. By rescaling as needed, we may assume that $\theta^* \in \Theta = \mathbb{B}_2(1)$, the Euclidean ball of radius one. Moreover, we assume that there is a scaling constant $\sigma < \infty$ such that the noise sequence satisfies $|\varepsilon_i| \leq \sigma$ for all $i$. Given the challenges of non-compactness exhibited by the location family estimation problems (cf. Proposition 1), this type of assumption is required for non-trivial results. We also assume that $X$ has rank $d$; otherwise, the design matrix $X$ has a non-trivial nullspace and $\theta^*$ cannot be estimated even when $\sigma = 0$.

With the model (21) in place, let us consider estimation of $\theta^*$ in the squared $\ell_2$-error, where we provide $\alpha$-locally differentially private views of the response $Y = \{Y_i\}_{i=1}^n$. By following the outline established in Section 3.2, we provide a sharp characterization of the $\alpha$-private minimax rate. In stating the result, we let $\rho_j(A)$ denote the $j$th singular value of a matrix $A$. (See Section 7.4 for the proof.)

Proposition 2. In the fixed-design regression model where the variables $\{Y_i\}$ are $\alpha$-locally differentially private for some $\alpha \in [0, 1]$,

$$\min\Big\{ 1, \frac{\sigma^2 d}{n\alpha^2 \rho_{\max}^2(X/\sqrt{n})} \Big\} \;\lesssim\; \mathfrak{M}_n\big(\Theta, \|\cdot\|_2^2, \alpha\big) \;\lesssim\; \min\Big\{ 1, \frac{\sigma^2 d}{\alpha^2 n \, \rho_{\min}^2(X/\sqrt{n})} \Big\}. \tag{22}$$

To interpret the bounds (22), it is helpful to consider some special cases. First consider the case of an orthonormal design, meaning that $\frac{1}{n} X^\top X = I_{d \times d}$. The bounds (22) imply that $\mathfrak{M}_n(\Theta, \|\cdot\|_2^2, \alpha) \asymp \sigma^2 d / (n\alpha^2)$, so that the $\alpha$-private minimax rate is fully determined (up to constant pre-factors).
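The upper bound in Proposition 2 is achieved by perturbing the responses with Laplace noise and solving the resulting normal equations. A minimal stdlib sketch follows; the function names are ours, and the noise scale is left as a parameter, since the paper's proof calibrates it to $\alpha$ and the range of the $Y_i$:

```python
import random

def solve_normal_equations(X, z):
    """Least squares via (X^T X) theta = X^T z, solved by Gaussian
    elimination with partial pivoting (fine for small d)."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    b = [sum(row[i] * zi for row, zi in zip(X, z)) for i in range(d)]
    for i in range(d):                       # forward elimination
        p = max(range(i, d), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, d):
            m = A[r][i] / A[i][i]
            for c in range(i, d):
                A[r][c] -= m * A[i][c]
            b[r] -= m * b[i]
    theta = [0.0] * d
    for i in reversed(range(d)):             # back substitution
        theta[i] = (b[i] - sum(A[i][c] * theta[c]
                               for c in range(i + 1, d))) / A[i][i]
    return theta

def private_regression(X, y, noise_scale, rng=random):
    """Laplace mechanism for fixed-design regression: privatize each
    response with Laplace noise (difference of two exponentials), then
    solve the normal equations on the privatized responses."""
    z = [yi + rng.expovariate(1 / noise_scale) - rng.expovariate(1 / noise_scale)
         for yi in y]
    return solve_normal_equations(X, z)
```

This is only a sketch of the mechanism's shape, not a privacy-calibrated implementation: choosing `noise_scale` to guarantee $\alpha$-local differential privacy requires the sensitivity analysis carried out in the paper's proof.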
Standard minimax rates for linear regression problems scale as $\sigma^2 d / n$; thus, by comparison, we see that requiring differential privacy indeed causes an effective sample size reduction from $n$ to $n\alpha^2$. More generally, up to the difference between the maximum and minimum singular values of the design $X$, Proposition 2 provides a sharp characterization of the $\alpha$-private rate for fixed-design linear regression. As the proof makes clear, the upper bounds are attained by adding Laplacian noise to the response variables $Y_i$ and solving the resulting normal equations, as in standard linear regression. In this case, the standard Laplacian mechanism [24] is optimal.

4 Mutual information under local privacy: Fano's method

As we have previously noted, Theorem 1 provides indirect upper bounds on the mutual information. However, since the resulting bounds involve pairwise distances only, as in Corollary 1, they must be used with local packings. Exploiting Fano's inequality in its full generality requires a more sophisticated upper bound on the mutual information under local privacy, which is the main topic of this section. We illustrate this more powerful technique by deriving lower bounds for mean estimation problems in both classical as well as high-dimensional settings under the non-interactive privacy model (2).

4.1 Variational bounds on mutual information

We begin by introducing some definitions needed to state the result. Let $V$ be a discrete random variable uniformly distributed over some finite set $\mathcal{V}$. Given a family of distributions $\{P_\nu, \nu \in \mathcal{V}\}$, we define the mixture distribution

$$\overline{P} := \frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} P_\nu.$$

A sample $X \sim \overline{P}$ can be obtained by first drawing $V$ from the uniform distribution over $\mathcal{V}$, and then, conditionally on $V = \nu$, drawing $X$ from the distribution $P_\nu$.
By definition, the mutual information between the random index $V$ and the sample $X$ is

$$I(X; V) = \frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} D_{\rm kl}\big( P_\nu \,\|\, \overline{P} \big),$$

a representation that plays an important role in our theory. As in the definition (3), any conditional distribution $Q$ induces the family of marginal distributions $\{M_\nu, \nu \in \mathcal{V}\}$ and the associated mixture $\overline{M} := \frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} M_\nu$. Our goal is to upper bound the mutual information $I(Z_1, \ldots, Z_n; V)$, where, conditioned on $V = \nu$, the random variables $Z_i$ are drawn according to $M_\nu$.

Our upper bound is variational in nature: it involves optimization over a subset of the space

$$L^\infty(\mathcal{X}) := \big\{ f : \mathcal{X} \to \mathbb{R} \;\big|\; \|f\|_\infty < \infty \big\}$$

of uniformly bounded functions, equipped with the usual norm $\|f\|_\infty = \sup_{x \in \mathcal{X}} |f(x)|$. We define the 1-ball of the supremum norm

$$\mathbb{B}_\infty(\mathcal{X}) := \big\{ \gamma \in L^\infty(\mathcal{X}) \;\big|\; \|\gamma\|_\infty \leq 1 \big\}. \tag{23}$$

We show that this set describes the maximal amount of perturbation allowed in the conditional $Q$. Since the set $\mathcal{X}$ is generally clear from context, we typically omit this dependence. For each $\nu \in \mathcal{V}$, we define the linear functional $\varphi_\nu : L^\infty(\mathcal{X}) \to \mathbb{R}$ by

$$\varphi_\nu(\gamma) = \int_{\mathcal{X}} \gamma(x) \big( dP_\nu(x) - d\overline{P}(x) \big).$$

With these definitions, we have the following result:

Theorem 2. Let $\{P_\nu\}_{\nu \in \mathcal{V}}$ be an arbitrary collection of probability measures on $\mathcal{X}$, and let $\{M_\nu\}_{\nu \in \mathcal{V}}$ be the set of marginal distributions induced by an $\alpha$-differentially private distribution $Q$. Then

$$\frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} \Big[ D_{\rm kl}\big( M_\nu \,\|\, \overline{M} \big) + D_{\rm kl}\big( \overline{M} \,\|\, M_\nu \big) \Big] \;\leq\; \frac{(e^\alpha - 1)^2}{|\mathcal{V}|} \sup_{\gamma \in \mathbb{B}_\infty(\mathcal{X})} \sum_{\nu \in \mathcal{V}} \big( \varphi_\nu(\gamma) \big)^2. \tag{24}$$

It is important to note that, at least up to constant factors, Theorem 2 is never weaker than the results provided by Theorem 1, including the bounds of Corollary 1.
By definition of the linear functional $\varphi_\nu$, we have

$$\sup_{\gamma \in \mathbb{B}_\infty(\mathcal{X})} \sum_{\nu \in \mathcal{V}} \big( \varphi_\nu(\gamma) \big)^2 \;\overset{(i)}{\leq}\; \sum_{\nu \in \mathcal{V}} \sup_{\gamma \in \mathbb{B}_\infty(\mathcal{X})} \big( \varphi_\nu(\gamma) \big)^2 = 4 \sum_{\nu \in \mathcal{V}} \big\| P_\nu - \overline{P} \big\|_{\rm TV}^2,$$

where inequality (i) follows by interchanging the summation and supremum. Overall, we have

$$I(Z; V) \leq 4 (e^\alpha - 1)^2 \frac{1}{|\mathcal{V}|^2} \sum_{\nu, \nu' \in \mathcal{V}} \| P_\nu - P_{\nu'} \|_{\rm TV}^2.$$

The strength of Theorem 2 arises from the fact that inequality (i), the interchange of the order of supremum and summation, may be quite loose.

We now present a corollary that extends Theorem 2 to the setting of repeated sampling, providing a tensorization inequality analogous to Corollary 1. Let $V$ be distributed uniformly at random in $\mathcal{V}$, and assume that, given $V = \nu$, the observations $X_i$ are sampled independently according to the distribution $P_\nu$ for $i = 1, \ldots, n$. For this corollary, we require the non-interactive setting (2) of local privacy, in which each private variable $Z_i$ depends only on $X_i$.

Corollary 4. Suppose that the distributions $\{Q_i\}_{i=1}^n$ are $\alpha$-locally differentially private in the non-interactive setting (2). Then

$$I(Z_1, \ldots, Z_n; V) \;\leq\; n (e^\alpha - 1)^2 \frac{1}{|\mathcal{V}|} \sup_{\gamma \in \mathbb{B}_\infty} \sum_{\nu \in \mathcal{V}} \big( \varphi_\nu(\gamma) \big)^2. \tag{25}$$

We provide the proof of Corollary 4 in Section 8.2. We conjecture that the bound (25) also holds in the fully interactive setting, but given well-known difficulties of characterizing multiple channel capacities with feedback [17, Chapter 15], it may be challenging to verify this conjecture.

Theorem 2 and Corollary 4 relate the amount of mutual information between the randomly perturbed views $Z$ of the data to geometric or variational properties of the underlying packing $\mathcal{V}$ of the parameter space $\Theta$.
In particular, Theorem 2 and Corollary 4 show that if we can find a packing set $\mathcal{V}$ that yields linear functionals $\varphi_\nu$ whose sum has good "spectral" properties (meaning a small operator norm when taking suprema over $L^\infty$-type spaces), we can provide sharper results.

4.2 Applications of Theorem 2 to mean estimation

In this section, we show how Theorem 2, coupled with Corollary 4, leads to sharp characterizations of the $\alpha$-private minimax rates for classical and high-dimensional mean estimation problems. Our results show that for $d$-dimensional mean-estimation problems, the requirement of $\alpha$-local differential privacy causes a reduction in effective sample size from $n$ to $n\alpha^2/d$. Throughout this section, we assume that the channel $Q$ is non-interactive, meaning that each random variable $Z_i$ depends only on $X_i$, so that local privacy takes the simpler form (2). We also state each of our results for privacy parameter $\alpha \in [0, 1]$, but note that all of our bounds hold for any constant $\alpha$, with appropriate changes in the numerical pre-factors.

Before proceeding, we describe two sampling mechanisms for enforcing $\alpha$-local differential privacy. Our methods for achieving the upper bounds in minimax rates are based on unbiased estimators. Let us assume that we wish to construct an $\alpha$-private unbiased estimate $Z$ of the vector $v \in \mathbb{R}^d$. The following sampling strategies are based on a radius $r > 0$ and a bound $B > 0$ specified for each problem, and they require the Bernoulli random variable $T \sim \text{Bernoulli}(\pi_\alpha)$, where $\pi_\alpha := e^\alpha / (e^\alpha + 1)$.

Figure 2. Private sampling strategies. (a) Strategy (26a) for the $\ell_2$-ball: the outer boundary of the highlighted region is sampled uniformly with probability $e^\alpha/(e^\alpha + 1)$. (b) Strategy (26b) for the $\ell_\infty$-ball: the circled point set is sampled uniformly with probability $e^\alpha/(e^\alpha + 1)$.
Strategy A: Given a vector $v$ with $\|v\|_2 \leq r$, set $\tilde{v} = r v / \|v\|_2$ with probability $\frac{1}{2} + \frac{\|v\|_2}{2r}$ and $\tilde{v} = -r v / \|v\|_2$ with probability $\frac{1}{2} - \frac{\|v\|_2}{2r}$. Then sample $T \sim \text{Bernoulli}(\pi_\alpha)$ and set

$$Z \sim \begin{cases} \text{Uniform}\big( z \in \mathbb{R}^d : \langle z, \tilde{v} \rangle > 0, \; \|z\|_2 = B \big) & \text{if } T = 1 \\ \text{Uniform}\big( z \in \mathbb{R}^d : \langle z, \tilde{v} \rangle \leq 0, \; \|z\|_2 = B \big) & \text{if } T = 0. \end{cases} \tag{26a}$$

Strategy B: Given a vector $v$ with $\|v\|_\infty \leq r$, construct $\tilde{v} \in \mathbb{R}^d$ with coordinates $\tilde{v}_j$ sampled independently from $\{-r, r\}$ with probabilities $\frac{1}{2} - \frac{v_j}{2r}$ and $\frac{1}{2} + \frac{v_j}{2r}$, respectively. Then sample $T \sim \text{Bernoulli}(\pi_\alpha)$ and set

$$Z \sim \begin{cases} \text{Uniform}\big( z \in \{-B, B\}^d : \langle z, \tilde{v} \rangle > 0 \big) & \text{if } T = 1 \\ \text{Uniform}\big( z \in \{-B, B\}^d : \langle z, \tilde{v} \rangle \leq 0 \big) & \text{if } T = 0. \end{cases} \tag{26b}$$

See Figure 2 for visualizations of these sampling strategies. By inspection, each is $\alpha$-differentially private for any vector satisfying $\|v\|_2 \leq r$ or $\|v\|_\infty \leq r$ for Strategy A or B, respectively. Moreover, each strategy is efficiently implementable: Strategy A by normalizing a sample from the $N(0, I_{d \times d})$ distribution, and Strategy B by rejection sampling over the scaled hypercube $\{-B, B\}^d$.

Given these sampling strategies, we study the $d$-dimensional problem of estimating the mean $\theta(P) := E_P[X]$ of a random vector. We consider a few different metrics for the error of an estimator of the mean to flesh out the testing reduction in Section 2. Due to the difficulties associated with differential privacy on non-compact spaces (recall Section 3.3.1), we focus on distributions with compact support. We defer all proofs to Appendix A; they use a combination of Theorem 2 with Fano's method.

4.2.1 Minimax rates

We begin by bounding the minimax rate in the squared $\ell_2$-metric. For a parameter $p \in [1, 2]$ and radius $r < \infty$, consider the family

$$\mathcal{P}_{p,r} := \big\{ \text{distributions } P \text{ supported on } \mathbb{B}_p(r) \subset \mathbb{R}^d \big\}, \tag{27}$$

where $\mathbb{B}_p(r) = \{ x \in \mathbb{R}^d \mid \|x\|_p \leq r \}$ is the $\ell_p$-ball of radius $r$.
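Before turning to the minimax rates, here is a minimal sketch of Strategy A (26a). The function name is ours; the bound $B$ is left as a parameter, since it is specified per problem (an explicit unbiasedness-inducing choice appears in Section 4.2.2). We implement the uniform hemisphere draw by rejection sampling from normalized Gaussians, one simple realization of the normalization approach described above:

```python
import math
import random

def sample_strategy_a(v, r, B, alpha, rng=random):
    """One draw of the l2-ball mechanism (26a): returns Z uniform on the
    hemisphere of the radius-B sphere selected by the rounded direction of v."""
    d = len(v)
    norm_v = math.sqrt(sum(x * x for x in v))
    # Round v to +/- r * v/||v||_2; for v = 0, any unit direction works.
    u = [x / norm_v for x in v] if norm_v > 0 else [1.0] + [0.0] * (d - 1)
    sign = 1.0 if rng.random() < 0.5 + norm_v / (2 * r) else -1.0
    v_tilde = [sign * x for x in u]
    # Biased coin T ~ Bernoulli(pi_alpha), pi_alpha = e^a / (e^a + 1).
    t = rng.random() < math.exp(alpha) / (math.exp(alpha) + 1)
    # Draw uniform directions on the sphere (normalized Gaussians) until
    # the half-space constraint <z, v_tilde> > 0 matches the coin T.
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm_g = math.sqrt(sum(x * x for x in g))
        z = [B * x / norm_g for x in g]
        inner = sum(zi * vi for zi, vi in zip(z, v_tilde))
        if (inner > 0) == t:
            return z
```

Each rejection round accepts with probability one half, so the loop terminates after two draws in expectation; a reflection of the rejected sample would avoid the loop entirely, at the cost of slightly more care.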
Proposition 3. For the mean estimation problem, for all $p \in [1, 2]$ and privacy levels $\alpha \in [0, 1]$,

$$r^2 \min\Big\{ 1, \frac{1}{\sqrt{n\alpha^2}}, \frac{d}{n\alpha^2} \Big\} \;\lesssim\; \mathfrak{M}_n\big( \theta(\mathcal{P}_{p,r}), \|\cdot\|_2^2, \alpha \big) \;\lesssim\; r^2 \min\Big\{ \frac{d}{n\alpha^2}, 1 \Big\}.$$

This bound does not depend on the norm bounding $X$ so long as $p \in [1, 2]$, which is consistent with the classical mean estimation problem. Proposition 3 demonstrates the substantial difference between $d$-dimensional mean estimation in private and non-private settings: more precisely, the privacy constraint leads to a multiplicative penalty of $d/\alpha^2$ in terms of mean-squared error. Indeed, in the non-private setting, the standard mean estimator $\widehat{\theta} = \frac{1}{n}\sum_{i=1}^n X_i$ has mean-squared error at most $r^2/n$, since $\|X\|_2 \leq \|X\|_p \leq r$ by assumption. Thus, Proposition 3 exhibits an effective sample size reduction of $n \mapsto n\alpha^2/d$.

To show the applicability of the general metric construction in Section 2, we now consider estimation in the $\ell_\infty$-norm; estimation in this metric is natural in scenarios where one wishes only to guarantee that the maximum error of any particular component of the vector $\theta$ is small. We focus in this scenario on the family $\mathcal{P}_{\infty,r}$ of distributions $P$ supported on $\mathbb{B}_\infty(r) \subset \mathbb{R}^d$.

Proposition 4. For the mean estimation problem, for all $\alpha \in [0, 1]$,

$$\min\Big\{ r, \; \frac{r\sqrt{d \log(2d)}}{\sqrt{n\alpha^2}} \Big\} \;\lesssim\; \mathfrak{M}_n\big( \theta(\mathcal{P}_{\infty,r}), \|\cdot\|_\infty, \alpha \big) \;\lesssim\; \min\Big\{ r, \; \frac{r\sqrt{d \log(2d)}}{\sqrt{n\alpha^2}} \Big\}.$$

Proposition 4 provides a similar message to Proposition 3 on the loss of statistical efficiency. This is clearest from an example: let the $X_i$ be random vectors bounded by one in $\ell_\infty$-norm. Then classical results on sub-Gaussian random variables [e.g., 12] immediately imply that the standard non-private mean $\widehat{\theta} = \frac{1}{n}\sum_{i=1}^n X_i$ satisfies $E[\|\widehat{\theta} - E[X]\|_\infty] \leq \sqrt{\log(2d)/n}$. Comparing this result to the rate $\sqrt{d \log(2d)/(n\alpha^2)}$ of Proposition 4, we again see the effective sample size reduction $n \mapsto n\alpha^2/d$.
Recently, there has been substantial interest in high-dimensional problems, in which the dimension $d$ is larger than the sample size $n$, but a low-dimensional latent structure makes inference possible. (See the paper by Negahban et al. [44] for a general overview.) Accordingly, let us consider an idealized version of the high-dimensional mean estimation problem, in which we assume that $\theta(P) = E[X] \in \mathbb{R}^d$ has (at most) one non-zero entry, so that $\|E[X]\|_0 \leq 1$. In the non-private case, estimation of such an $s$-sparse predictor in the squared $\ell_2$-norm is possible at the rate $E[\|\widehat{\theta} - \theta\|_2^2] \leq s \log(d/s)/n$, so that the dimension $d$ can be exponentially larger than the sample size $n$. With this context, the next result shows that local privacy can have a dramatic impact in the high-dimensional setting. Consider the family

$$\mathcal{P}_{\infty,r}^s := \big\{ \text{distributions } P \text{ supported on } \mathbb{B}_\infty(r) \subset \mathbb{R}^d \text{ with } \|E_P[X]\|_0 \leq s \big\}.$$

Proposition 5. For the 1-sparse means problem, for all $\alpha \in [0, 1]$,

$$\min\Big\{ r^2, \; \frac{r^2 d \log(2d)}{n\alpha^2} \Big\} \;\lesssim\; \mathfrak{M}_n\big( \theta(\mathcal{P}_{\infty,r}^1), \|\cdot\|_2^2, \alpha \big) \;\lesssim\; \min\Big\{ r^2, \; \frac{r^2 d \log(2d)}{n\alpha^2} \Big\}.$$

See Section A.3 for a proof. From Proposition 5, it becomes clear that in locally private but non-interactive (2) settings, high-dimensional estimation is effectively impossible.

4.2.2 Optimal mechanisms: attainability for mean estimation

In this section, we describe how to achieve matching upper bounds in Propositions 3 and 4 using simple and practical algorithms, namely the "right" type of stochastic perturbation of the observations $X_i$ coupled with a standard mean estimator. We show the optimality of privatizing via the sampling strategies (26a) and (26b); interestingly, we also show that privatizing via Laplace perturbation is strictly sub-optimal.
To give a private mechanism, we must specify the conditional distribution $Q$, satisfying $\alpha$-local differential privacy, that is used to construct $Z$. In this case, given an observation $X_i$, we construct $Z_i$ by perturbing $X_i$ in such a way that $E[Z_i \mid X_i = x] = x$. Each of the strategies (26a) and (26b) also requires a constant $B$, and we show how to choose $B$ for each strategy to satisfy the unbiasedness condition $E[Z \mid X = x] = x$.

We begin with the mean estimation problem for the distributions $\mathcal{P}_{p,r}$ of Proposition 3, for which we use the sampling scheme (26a). That is, let $X = x \in \mathbb{R}^d$ satisfy $\|x\|_2 \leq \|x\|_p \leq r$. Then we construct the random vector $Z$ according to strategy (26a), where we set the initial vector $v = x$ in the sampling scheme. To achieve the unbiasedness condition $E[Z \mid x] = x$, we define the bound

$$B = r \, \frac{e^\alpha + 1}{e^\alpha - 1} \, \frac{d \sqrt{\pi} \, \Gamma\big(\frac{d-1}{2} + 1\big)}{\Gamma\big(\frac{d}{2} + 1\big)}. \tag{28}$$

(See Appendix F.2 for a proof that $E[Z \mid x] = x$ with this choice of $B$.) Notably, the choice (28) implies $B \leq c r \sqrt{d}/\alpha$ for a universal constant $c < \infty$, since $d \, \Gamma(\frac{d-1}{2} + 1)/\Gamma(\frac{d}{2} + 1) \lesssim \sqrt{d}$ and $e^\alpha - 1 = \alpha + O(\alpha^2)$. As a consequence, generating each $Z_i$ by this perturbation strategy and using the mean estimator $\widehat{\theta} = \frac{1}{n}\sum_{i=1}^n Z_i$, the estimator $\widehat{\theta}$ is unbiased for $E[X]$ and satisfies

$$E\Big[ \|\widehat{\theta} - E[X]\|_2^2 \Big] = \frac{1}{n^2} \sum_{i=1}^n \text{Var}(Z_i) \leq \frac{B^2}{n} \leq c \, \frac{r^2 d}{n\alpha^2}$$

for a universal constant $c$.

In Proposition 4, we consider the family $\mathcal{P}_{\infty,r}$ of distributions supported on the $\ell_\infty$-ball of radius $r$. In our mechanism for attaining the upper bound, we use the sampling scheme (26b) to generate the private $Z_i$, so that for an observation $X = x \in \mathbb{R}^d$ with $\|x\|_\infty \leq r$, we resample $Z$ (from the initial vector $v = x$) according to strategy (26b). Again, we would like to guarantee the unbiasedness condition $E[Z \mid X = x] = x$, for which we use a result of Duchi et al. [19].
That paper shows that taking

$$B = c \, \frac{r \sqrt{d}}{\alpha} \tag{29}$$

for a (particular) universal constant $c$ yields the desired unbiasedness [19, Corollary 3]. Since the random variable $Z$ is bounded in $\ell_\infty$-norm with probability 1, each coordinate $[Z]_j$ of $Z$ is sub-Gaussian. As a consequence, we obtain via standard bounds [12] that

$$E\big[ \|\widehat{\theta} - \theta\|_\infty^2 \big] \leq \frac{B^2 \log(2d)}{n} = c^2 \, \frac{r^2 d \log(2d)}{n\alpha^2}$$

for a universal constant $c$, proving the upper bound in Proposition 4.

To conclude this section, we note that the strategy of adding Laplacian noise to the vectors $X$ is sub-optimal. Indeed, consider the family $\mathcal{P}_{2,1}$ of distributions supported on $\mathbb{B}_2(1) \subset \mathbb{R}^d$, as in Proposition 3. To guarantee $\alpha$-differential privacy using independent Laplace noise vectors for $x \in \mathbb{B}_2(1)$, we take $Z = x + W$, where $W \in \mathbb{R}^d$ has components $W_j$ that are independent and distributed as $\text{Laplace}(\alpha/\sqrt{d})$. We have the following information-theoretic result: if the $Z_i$ are constructed via this Laplace noise mechanism, then

$$\inf_{\widehat{\theta}} \sup_{P \in \mathcal{P}} E_P\Big[ \|\widehat{\theta}(Z_1, \ldots, Z_n) - E_P[X]\|_2^2 \Big] \;\gtrsim\; \min\Big\{ \frac{d^2}{n\alpha^2}, 1 \Big\}. \tag{30}$$

See Appendix A.4 for the proof of this claim. The poorer dimension dependence exhibited by the Laplace mechanism (30), in comparison to Proposition 3, demonstrates that sampling mechanisms must be chosen carefully, as in the strategies (26a)–(26b), in order to obtain statistically optimal rates.

5 Bounds on multiple pairwise divergences: Assouad's method

Thus far, we have seen how Le Cam's method and Fano's method, in the form of Theorem 2 and Corollary 4, can give sharp minimax rates for various problems. However, their application appears to be limited to problems whose minimax rates can be controlled via reductions to binary hypothesis tests (Le Cam's method) or to non-interactive channels satisfying the simpler definition (2) of local privacy (Fano's method).
In this section, we show that a privatized form of Assouad's method (in the form of Lemma 1) can be used to obtain sharp minimax rates in interactive settings. In particular, it can be applied when the loss is sufficiently "decomposable," so that the coordinate-wise nature of the Assouad construction can be brought to bear. Concretely, we show that an upper bound on a sum of paired KL-divergences, when combined with Assouad's method, provides sharp lower bounds for several problems, including multinomial probability estimation and nonparametric density estimation. Each of these problems can be characterized in terms of an effective dimension $d$, and our results (paralleling those of Section 4) show that the requirement of $\alpha$-local differential privacy causes a reduction in effective sample size from $n$ to $n\alpha^2/d$.

5.1 Variational bounds on paired divergences

For a fixed $d \in \mathbb{N}$, we consider collections of distributions indexed using the Boolean hypercube $\mathcal{V} = \{-1, 1\}^d$. For each $i \in [n]$ and $\nu \in \mathcal{V}$, we let the distribution $P_{\nu,i}$ be supported on the fixed set $\mathcal{X}$, and we define the product distribution $P_\nu^n = \prod_{i=1}^n P_{\nu,i}$. Then for $j \in [d]$ we define the paired mixtures

$$P_{+j}^n = \frac{1}{2^{d-1}} \sum_{\nu : \nu_j = 1} P_\nu^n, \qquad P_{-j}^n = \frac{1}{2^{d-1}} \sum_{\nu : \nu_j = -1} P_\nu^n, \qquad P_{\pm j, i} = \frac{1}{2^{d-1}} \sum_{\nu : \nu_j = \pm 1} P_{\nu,i}. \tag{31}$$

(Note that $P_{+j}^n$ is not necessarily a product distribution.) Recalling the marginal channel (3), we may then define the marginal mixtures

$$M_{+j}^n(S) := \frac{1}{2^{d-1}} \sum_{\nu : \nu_j = 1} M_\nu^n(S) = \int Q^n(S \mid x_{1:n}) \, dP_{+j}^n(x_{1:n}) \qquad \text{for } j = 1, \ldots, d,$$

with the distributions $M_{-j}^n$ defined analogously. For a given pair of distributions $(M, M')$, we let $D_{\rm kl}^{\rm sy}(M \| M') = D_{\rm kl}(M \| M') + D_{\rm kl}(M' \| M)$ denote the symmetrized KL-divergence.
Recalling the 1-ball $\mathbb{B}_\infty(\mathcal{X})$ of the supremum norm (23), with these definitions we have the following theorem:

Theorem 3. Under the conditions of the previous paragraph, for any $\alpha$-locally differentially private (1) channel $Q$, we have

$$\sum_{j=1}^d D_{\rm kl}^{\rm sy}\big( M_{+j}^n \,\|\, M_{-j}^n \big) \;\leq\; 2 (e^\alpha - 1)^2 \sum_{i=1}^n \sup_{\gamma \in \mathbb{B}_\infty(\mathcal{X})} \sum_{j=1}^d \Big( \int_{\mathcal{X}} \gamma(x) \big( dP_{+j,i}(x) - dP_{-j,i}(x) \big) \Big)^2.$$

Theorem 3 generalizes Theorem 1, which corresponds to the special case $d = 1$, though it also has parallels with Theorem 2, as taking the supremum outside the summation is essential to obtain sharp results. We provide the proof of Theorem 3 in Section 9.

Theorem 3 allows us to prove sharper lower bounds on the minimax risk. A combination of Pinsker's inequality and the Cauchy–Schwarz inequality implies

$$\sum_{j=1}^d \big\| M_{+j}^n - M_{-j}^n \big\|_{\rm TV} \;\leq\; \frac{\sqrt{d}}{2} \Big[ \sum_{j=1}^d D_{\rm kl}\big( M_{+j}^n \,\|\, M_{-j}^n \big) + D_{\rm kl}\big( M_{-j}^n \,\|\, M_{+j}^n \big) \Big]^{\frac{1}{2}}.$$

Thus, in combination with the sharper Assouad inequality (11), whenever $\{P_\nu\}$ induces a $2\delta$-Hamming separation for $\Phi \circ \rho$, we have

$$\mathfrak{M}_n\big(\theta(\mathcal{P}), \Phi \circ \rho\big) \;\geq\; d\delta \Bigg[ 1 - \Big( \frac{1}{4d} \sum_{j=1}^d D_{\rm kl}^{\rm sy}\big( M_{+j}^n \,\|\, M_{-j}^n \big) \Big)^{\frac{1}{2}} \Bigg]. \tag{32}$$

The combination of inequality (32) with Theorem 3 is the foundation for the remainder of this section.

5.2 Multinomial estimation under local privacy

For our first application of Theorem 3, we return to the original motivation for local privacy [50]: avoiding survey answer bias. Consider the probability simplex

$$\Delta_d := \Big\{ \theta \in \mathbb{R}^d \;\Big|\; \theta \geq 0 \text{ and } \sum_{j=1}^d \theta_j = 1 \Big\}.$$

Any vector $\theta \in \Delta_d$ specifies a multinomial random variable taking $d$ states, in particular with probabilities $P_\theta(X = j) = \theta_j$ for $j \in \{1, \ldots, d\}$. Given a sample from this distribution, our goal is to estimate the probability vector $\theta$.
Warner [50] studied the Bernoulli variant of this problem (corresponding to $d = 2$), proposing a mechanism known as randomized response: for a given survey question, respondents answer truthfully with probability $p > 1/2$ and lie with probability $1 - p$. Here we show that an extension of this mechanism is optimal for $\alpha$-locally differentially private multinomial estimation.

5.2.1 Minimax rates of convergence for multinomial estimation

Our first result provides bounds on the minimax error measured in either the squared $\ell_2$-norm or the $\ell_1$-norm for (sequentially) interactive channels. The $\ell_1$-norm is sometimes more appropriate for probability estimation due to its connections with total variation distance and testing.

Proposition 6. For the multinomial estimation problem, for any $\alpha$-locally differentially private channel (1), there exist universal constants $0 < c_\ell \leq c_u < 5$ such that for all $\alpha \in [0, 1]$,

$$c_\ell \min\Big\{ 1, \frac{1}{\sqrt{n\alpha^2}}, \frac{d}{n\alpha^2} \Big\} \;\leq\; \mathfrak{M}_n\big( \Delta_d, \|\cdot\|_2^2, \alpha \big) \;\leq\; c_u \min\Big\{ 1, \frac{d}{n\alpha^2} \Big\}, \tag{33}$$

and

$$c_\ell \min\Big\{ 1, \frac{d}{\sqrt{n\alpha^2}} \Big\} \;\leq\; \mathfrak{M}_n\big( \Delta_d, \|\cdot\|_1, \alpha \big) \;\leq\; c_u \min\Big\{ 1, \frac{d}{\sqrt{n\alpha^2}} \Big\}. \tag{34}$$

See Appendix B for the proofs of the lower bounds. We provide simple estimation strategies achieving the upper bounds in the next section.

As in the previous section, let us compare the private rates to the classical rate in which there is no privacy. The maximum likelihood estimate sets $\widehat{\theta}_j$ equal to the proportion of samples taking value $j$; it has mean-squared error

$$E\Big[ \|\widehat{\theta} - \theta\|_2^2 \Big] = \sum_{j=1}^d E\Big[ (\widehat{\theta}_j - \theta_j)^2 \Big] = \frac{1}{n} \sum_{j=1}^d \theta_j (1 - \theta_j) \leq \frac{1}{n}\Big( 1 - \frac{1}{d} \Big) < \frac{1}{n}.$$

An analogous calculation for the $\ell_1$-norm yields

$$E\big[ \|\widehat{\theta} - \theta\|_1 \big] \leq \sum_{j=1}^d E\big[ |\widehat{\theta}_j - \theta_j| \big] \leq \sum_{j=1}^d \sqrt{\text{Var}(\widehat{\theta}_j)} \leq \frac{1}{\sqrt{n}} \sum_{j=1}^d \sqrt{\theta_j (1 - \theta_j)} < \frac{\sqrt{d}}{\sqrt{n}}.$$
Consequently, for estimation in the $\ell_1$- or $\ell_2$-norm, the effect of providing $\alpha$-differential privacy is to cause the effective sample size to decrease as $n \mapsto n\alpha^2/d$.

5.2.2 Optimal mechanisms: attainability for multinomial estimation

An interesting consequence of the lower bound (33) is the following: a minor variant of Warner's randomized response strategy is an optimal mechanism. There are also other relatively simple estimation strategies that achieve the convergence rate $d/(n\alpha^2)$; the Laplace perturbation approach [24] is another. Nonetheless, its ease of use, coupled with our optimality results, provides support for randomized response as a desirable probability estimation method.

Let us demonstrate that these strategies attain the optimal rate of convergence. Since there is a bijection between multinomial observations $x \in \{1, \ldots, d\}$ and the $d$ standard basis vectors $e_1, \ldots, e_d \in \mathbb{R}^d$, we abuse notation and represent observations in either form when designing estimation strategies. In randomized response, we construct the private vector $Z \in \{0, 1\}^d$ from a multinomial observation $x \in \{e_1, \ldots, e_d\}$ by sampling the $d$ coordinates independently via the procedure

$$[Z]_j = \begin{cases} x_j & \text{with probability } \frac{\exp(\alpha/2)}{1 + \exp(\alpha/2)} \\ 1 - x_j & \text{with probability } \frac{1}{1 + \exp(\alpha/2)}. \end{cases} \tag{35}$$

The distribution (35) is $\alpha$-differentially private: indeed, for $x, x' \in \Delta_d$ and any $z \in \{0, 1\}^d$, we have

$$\frac{Q(Z = z \mid x)}{Q(Z = z \mid x')} = \exp\Big( \frac{\alpha}{2} \big( \|z - x\|_1 - \|z - x'\|_1 \big) \Big) \in \big[ \exp(-\alpha), \exp(\alpha) \big],$$

where the triangle inequality guarantees $\big| \|z - x\|_1 - \|z - x'\|_1 \big| \leq 2$.

We now compute the expected value and variance of the random variables $Z$. Using the definition (35), we have

$$E[Z \mid x] = \frac{e^{\alpha/2}}{1 + e^{\alpha/2}} \, x + \frac{1}{1 + e^{\alpha/2}} (\mathbf{1} - x) = \frac{e^{\alpha/2} - 1}{e^{\alpha/2} + 1} \, x + \frac{1}{1 + e^{\alpha/2}} \mathbf{1}.$$
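The mechanism (35), together with the affine map that inverts the conditional expectation just computed, can be sketched as follows (function names are ours):

```python
import math
import random

def randomized_response(x_index, d, alpha, rng=random):
    """Mechanism (35): privatize a multinomial observation x in {e_1,...,e_d}.
    Each coordinate of e_{x_index} is kept with probability
    e^{a/2}/(1 + e^{a/2}) and flipped otherwise."""
    p = math.exp(alpha / 2) / (1 + math.exp(alpha / 2))
    x = [1.0 if j == x_index else 0.0 for j in range(d)]
    return [xj if rng.random() < p else 1.0 - xj for xj in x]

def debias(z_mean, alpha):
    """Invert the affine map E[Z | x] = ((e^{a/2}-1) x + 1) / (e^{a/2}+1):
    applied to an average of privatized vectors, this recovers an unbiased
    estimate of x (or of theta, given many observations)."""
    e = math.exp(alpha / 2)
    return [(zj - 1.0 / (1 + e)) * (e + 1) / (e - 1) for zj in z_mean]
```

Applying `debias` to the sample average of the $Z_i$ yields exactly the unprojected estimator appearing next; the debiasing map sends the conditional mean displayed above back to $x$.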
Since the random variables $Z$ are Bernoulli, we have the variance bound $E[\|Z\|_2^2] \leq d$. Letting $\Pi_{\Delta_d}$ denote the projection operator onto the simplex, we arrive at the natural estimator

$$\widehat{\theta}_{\rm part} := \frac{1}{n} \sum_{i=1}^n \Big( Z_i - \frac{1}{1 + e^{\alpha/2}} \mathbf{1} \Big) \frac{e^{\alpha/2} + 1}{e^{\alpha/2} - 1} \qquad \text{and} \qquad \widehat{\theta} := \Pi_{\Delta_d}\big( \widehat{\theta}_{\rm part} \big). \tag{36}$$

The projection of $\widehat{\theta}_{\rm part}$ onto the probability simplex can be done in time linear in the dimension $d$ of the problem [11], so the estimator (36) is efficiently computable. Since projections onto convex sets are non-expansive, any pair of vectors in the simplex is at most $\ell_2$-distance $\sqrt{2}$ apart, and $E_\theta[\widehat{\theta}_{\rm part}] = \theta$ by construction, we have

$$E\Big[ \|\widehat{\theta} - \theta\|_2^2 \Big] \leq \min\Big\{ 2, \; E\big[ \|\widehat{\theta}_{\rm part} - \theta\|_2^2 \big] \Big\} \leq \min\Bigg\{ 2, \; \frac{d}{n} \Big( \frac{e^{\alpha/2} + 1}{e^{\alpha/2} - 1} \Big)^2 \Bigg\} \lesssim \min\Big\{ 1, \frac{d}{n\alpha^2} \Big\}.$$

Similar results hold for the $\ell_1$-norm: using the same estimator, since Euclidean projections onto the simplex are non-expansive for the $\ell_1$-distance,

$$E\Big[ \|\widehat{\theta} - \theta\|_1 \Big] \leq \min\Big\{ 1, \; \sum_{j=1}^d E\big[ |\widehat{\theta}_{{\rm part},j} - \theta_j| \big] \Big\} \lesssim \min\Big\{ 1, \frac{d}{\sqrt{n\alpha^2}} \Big\}.$$

5.3 Density estimation under local privacy

In this section, we show that the effects of local differential privacy are more severe for nonparametric density estimation: instead of just a multiplicative loss in the effective sample size, as in previous sections, imposing local differential privacy leads to a different convergence rate. This result holds even though we solve a problem in which both the function being estimated and the observations themselves belong to compact spaces.

Definition 2 (Elliptical Sobolev space). For a given orthonormal basis $\{\varphi_j\}$ of $L^2([0,1])$, smoothness parameter $\beta > 1/2$, and radius $C$, the Sobolev class of order $\beta$ is given by

$$\mathcal{F}_\beta[C] := \Big\{ f \in L^2([0,1]) \;\Big|\; f = \sum_{j=1}^\infty \theta_j \varphi_j \text{ such that } \sum_{j=1}^\infty j^{2\beta} \theta_j^2 \leq C^2 \Big\}.$$
20 If w e c ho ose the trignometric basis as our orth on orm a l b a sis, mem b ership in the class F β [ C ] corresp onds to smo o thness constraint s on the deriv ativ es of f . More precisely , for j ∈ N , consider the orthonormal basis for L 2 ([0 , 1]) of trigo nometric functions: ϕ 0 ( t ) = 1 , ϕ 2 j ( t ) = √ 2 cos(2 π j t ) , ϕ 2 j +1 ( t ) = √ 2 sin(2 π j t ) . ( 37) Let f b e a β -times almost ev erywhere differentia ble function for whic h | f ( β ) ( x ) | ≤ C for almost ev ery x ∈ [0 , 1] satisfying f ( k ) (0) = f ( k ) (1) for k ≤ β − 1. Then, uniformly ov er all suc h f , there is a univ ersal co nstan t c ≤ 2 suc h that that f ∈ F β [ cC ] (see, for instance, [49, Lemma A.3]). Supp ose our goal is to estimate a density fun ct ion f ∈ F β [ C ] and that qu ality is measured in terms of the squ ared error (squared L 2 [0 , 1]-norm) k b f − f k 2 2 := Z 1 0 ( b f ( x ) − f ( x )) 2 dx. The w ell-kno wn [53, 52, 49] (non-priv ate) minimax squ ared risk scales as M n  F β , k·k 2 2 , ∞  ≍ n − 2 β 2 β +1 . (38) The goal of this section is to u nderstand ho w this minimax rate change s when we add an α -priv acy constrain t to the problem. Ou r main result is to demonstrate t hat the classica l rate (38) is no longer attai nable wh e n w e require α -lo ca l differentia l priv acy . 5.3.1 Lo wer b ounds on density estimation W e b egin b y giving our main lo w er b ound on the minimax rate of estimation o f densities when observ ations from the density are differen tially p riv ate. W e provide the pro of of the follo wing prop osition in Section C.1. Prop o sition 7. Consider the class of densities F β define d using the trigonometric b asis (37) . Ther e exi sts a c onstant c β > 0 such that for any α -lo c al ly differ ential ly priva te channel (1) with α ∈ [0 , 1] , the private minimax risk has lower b ound M n  F β [1] , k ·k 2 2 , α  ≥ c β  nα 2  − 2 β 2 β +2 . 
The most important feature of the lower bound (39) is that it involves a different polynomial exponent than the classical minimax rate (38). Whereas the exponent in the classical case (38) is $2\beta/(2\beta+1)$, it reduces to $2\beta/(2\beta+2)$ in the locally private setting. For example, when we estimate Lipschitz densities ($\beta = 1$), the rate degrades from $n^{-2/3}$ to $n^{-1/2}$.

Interestingly, no estimator based on Laplace (or exponential) perturbation of the observations $X_i$ themselves can attain the rate of convergence (39). This fact follows from results of Carroll and Hall [13] on nonparametric deconvolution. They show that if observations $X_i$ are perturbed by additive noise $W$, where the characteristic function $\phi_W$ of the additive noise has tails behaving as $|\phi_W(t)| = O(|t|^{-a})$ for some $a > 0$, then no estimator can deconvolve $X + W$ and attain a rate of convergence better than $n^{-2\beta/(2\beta+2a+1)}$. Since the characteristic function of the Laplace distribution has tails decaying as $t^{-2}$, no estimator based on the Laplace mechanism (applied directly to the observations) can attain a rate of convergence better than $n^{-2\beta/(2\beta+5)}$. In order to attain the lower bound (39), we must thus study alternative privacy mechanisms.

5.3.2 Achievability by histogram estimators

We now turn to the mean-squared errors achieved by specific practical schemes, beginning with the special case of Lipschitz density functions ($\beta = 1$). In this special case, it suffices to consider a private version of a classical histogram estimate. For a fixed positive integer $k \in \mathbb{N}$, let $\{\mathcal{X}_j\}_{j=1}^k$ denote the partition of $\mathcal{X} = [0,1]$ into the intervals $\mathcal{X}_j = [(j-1)/k, j/k)$ for $j = 1, 2, \ldots, k-1$, and $\mathcal{X}_k = [(k-1)/k, 1]$.
Any histogram estimate of the density based on these $k$ bins can be specified by a vector $\theta \in k\Delta_k$, where we recall that $\Delta_k \subset \mathbb{R}_+^k$ is the probability simplex. Letting $\mathbf{1}_E$ denote the characteristic (indicator) function of the set $E$, any such vector $\theta \in \mathbb{R}^k$ defines a density estimate via the sum
$$f_\theta := \sum_{j=1}^k \theta_j \mathbf{1}_{\mathcal{X}_j}.$$
Let us now describe a mechanism that guarantees $\alpha$-local differential privacy. Given a sample $\{X_1, \ldots, X_n\}$ from the distribution $f$, consider the vectors
$$Z_i := e_k(X_i) + W_i, \quad \text{for } i = 1, 2, \ldots, n, \tag{40}$$
where $e_k(X_i) \in \Delta_k$ is the $k$-vector with $j$th entry equal to one if $X_i \in \mathcal{X}_j$ and zeros in all other entries, and $W_i$ is a random vector with i.i.d. Laplace$(\alpha/2)$ entries. The variables $\{Z_i\}_{i=1}^n$ defined in this way are $\alpha$-locally differentially private for $\{X_i\}_{i=1}^n$.

Using these private variables, we form the density estimate $\hat f := f_{\hat\theta} = \sum_{j=1}^k \hat\theta_j \mathbf{1}_{\mathcal{X}_j}$ based on the vector
$$\hat\theta := \Pi_k\Bigl(\frac{k}{n}\sum_{i=1}^n Z_i\Bigr),$$
where $\Pi_k$ denotes the Euclidean projection operator onto the set $k\Delta_k$. By construction, we have $\hat f \ge 0$ and $\int_0^1 \hat f(x)\,dx = 1$, so $\hat f$ is a valid density estimate. The following result characterizes its mean-squared estimation error:

Proposition 8. Consider the estimate $\hat f$ based on $k = (n\alpha^2)^{1/4}$ bins in the histogram. For any 1-Lipschitz density $f : [0,1] \to \mathbb{R}_+$, the MSE is upper bounded as
$$\mathbb{E}_f\bigl[\|\hat f - f\|_2^2\bigr] \le 5\,(\alpha^2 n)^{-\frac{1}{2}} + \sqrt{\alpha}\,n^{-3/4}. \tag{41}$$

For any fixed $\alpha > 0$, the first term in the bound (41) dominates, and the $O((\alpha^2 n)^{-\frac{1}{2}})$ rate matches the minimax lower bound (39) in the case $\beta = 1$. Consequently, the privatized histogram estimator is minimax-optimal for Lipschitz densities, providing a private analog of the classical result that histogram estimators are minimax-optimal for Lipschitz densities. See Section C.2 for a proof of Proposition 8.
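As an illustration, here is a minimal simulation sketch of the mechanism (40) and the projected histogram estimator. It assumes that Laplace$(\alpha/2)$ denotes Laplace noise with density proportional to $e^{-(\alpha/2)|w|}$ (scale $2/\alpha$), computes the projection onto $k\Delta_k$ by rescaling a standard simplex projection, and uses the hypothetical 1-Lipschitz test density $f(x) = x + 1/2$, none of which are prescribed by the paper.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / idx > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - tau, 0.0)

def private_histogram(X, alpha, k, rng):
    """Mechanism (40): one-hot bin vectors plus Laplace noise, then projection
    of (k/n) * sum_i Z_i onto the scaled simplex k * Delta_k."""
    n = len(X)
    bins = np.minimum((X * k).astype(int), k - 1)        # bin index of e_k(X_i)
    E = np.zeros((n, k))
    E[np.arange(n), bins] = 1.0
    Z = E + rng.laplace(scale=2.0 / alpha, size=(n, k))  # private releases (40)
    v = (k / n) * Z.sum(axis=0)
    return k * project_simplex(v / k)                    # projection onto k * Delta_k

rng = np.random.default_rng(1)
n, alpha = 50_000, 1.0
k = max(1, int((n * alpha ** 2) ** 0.25))                # bin choice from Proposition 8
# draw from the 1-Lipschitz density f(x) = x + 1/2 by inverting its CDF x^2/2 + x/2
u = rng.uniform(size=n)
X = (-1.0 + np.sqrt(1.0 + 8.0 * u)) / 2.0
theta_hat = private_histogram(X, alpha, k, rng)

# crude squared-L2 error via bin-midpoint values of f
grid = (np.arange(k) + 0.5) / k
mse = float(np.mean((theta_hat - (grid + 0.5)) ** 2))
```

The per-bin noise level scales as $k/\sqrt{n\alpha^2}$, so with $k = (n\alpha^2)^{1/4}$ the simulated error sits at the $(n\alpha^2)^{-1/2}$ scale of Proposition 8.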
We remark that a randomized response scheme parallel to that of Section 5.2.2 achieves the same rate of convergence, showing that this classical mechanism is again an optimal scheme.

5.3.3 Achievability by orthogonal projection estimators

For higher degrees of smoothness ($\beta > 1$), standard histogram estimators no longer achieve optimal rates in the classical setting [47]. Accordingly, we now turn to developing estimators based on orthogonal series expansions, and show that even in the setting of local privacy, they can achieve the lower bound (39) for all orders of smoothness $\beta \ge 1$.

Recall the elliptical Sobolev space (Definition 2), in which a function $f$ is represented in terms of its basis expansion $f = \sum_{j=1}^\infty \theta_j \varphi_j$. This representation underlies the orthonormal series estimator, as follows. Given a sample $X_{1:n}$ drawn i.i.d. according to a density $f \in L^2([0,1])$, compute the empirical basis coefficients
$$\hat\theta_j = \frac{1}{n}\sum_{i=1}^n \varphi_j(X_i) \quad \text{for } j \in \{1, \ldots, k\}, \tag{42}$$
where the value $k \in \mathbb{N}$ is chosen either a priori, based on known properties of the estimation problem, or adaptively, for example using cross-validation [26, 49]. Using these empirical coefficients, the density estimate is $\hat f = \sum_{j=1}^k \hat\theta_j \varphi_j$.

In our local privacy setting, we consider a mechanism that, instead of releasing the vector of coefficients $(\varphi_1(X_i), \ldots, \varphi_k(X_i))$ for each data point, employs a random vector $Z_i = (Z_{i,1}, \ldots, Z_{i,k})$ satisfying $\mathbb{E}[Z_{i,j} \mid X_i] = \varphi_j(X_i)$ for each $j \in [k]$. We assume the basis functions are $B_0$-uniformly bounded, that is, $\sup_j \sup_x |\varphi_j(x)| \le B_0 < \infty$. This boundedness condition holds for many standard bases, including the trigonometric basis (37) that underlies the classical Sobolev classes, and the Walsh basis.
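To illustrate the non-private series estimator (42) with the trigonometric basis (37), consider the following sketch. The test density $f = \varphi_0 + \tfrac{1}{2}\varphi_2$ and the rejection-sampling step are hypothetical choices for illustration, not part of the paper's construction; the private mechanism replaces the raw coefficient vector with an unbiased randomized release.

```python
import numpy as np

def phi(j, t):
    """Trigonometric basis (37): phi_0 = 1, phi_{2m}(t) = sqrt(2) cos(2 pi m t),
    phi_{2m+1}(t) = sqrt(2) sin(2 pi m t)."""
    if j == 0:
        return np.ones_like(t)
    m = j // 2
    if j % 2 == 0:
        return np.sqrt(2.0) * np.cos(2 * np.pi * m * t)
    return np.sqrt(2.0) * np.sin(2 * np.pi * m * t)

rng = np.random.default_rng(2)
# hypothetical test density f = phi_0 + 0.5 * phi_2, i.e. f(t) = 1 + (sqrt(2)/2) cos(2 pi t)
f = lambda t: 1.0 + 0.5 * np.sqrt(2.0) * np.cos(2 * np.pi * t)

# rejection sampling from f on [0, 1] (envelope constant 1.8 > max f ~ 1.707)
samples = []
while len(samples) < 20_000:
    t = rng.uniform(size=4096)
    keep = rng.uniform(size=4096) * 1.8 <= f(t)
    samples.extend(t[keep])
X = np.array(samples[:20_000])

# empirical basis coefficients (42); true values are 1, 0.5, 0, 0 respectively
theta_hat = {j: float(phi(j, X).mean()) for j in (0, 2, 3, 4)}
```

Each $\hat\theta_j$ is an average of bounded terms, so it concentrates around $\theta_j$ at rate $n^{-1/2}$; the private mechanism described next degrades exactly this coefficient-release step.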
We generate the random variables from the vector $v \in \mathbb{R}^k$ defined by $v_j = \varphi_j(X)$ in the hypercube-based sampling scheme (26b), where we assume that the outer bound $B > B_0$. With this sampling strategy, iterated expectation yields
$$\mathbb{E}\bigl[[Z]_j \mid X = x\bigr] = c_k \frac{B}{B_0\sqrt{k}}\Bigl(\frac{e^\alpha}{e^\alpha+1} - \frac{1}{e^\alpha+1}\Bigr)\varphi_j(x), \tag{43}$$
where $c_k > 0$ is a constant (which is bounded independently of $k$). Consequently, it suffices to take $B = O(B_0\sqrt{k}/\alpha)$ to guarantee the unbiasedness condition $\mathbb{E}[[Z_i]_j \mid X_i] = \varphi_j(X_i)$.

Overall, the privacy mechanism and estimator perform the following steps:

• given a data point $X_i$, set the vector $v = [\varphi_j(X_i)]_{j=1}^k$;
• sample $Z_i$ according to the strategy (26b), starting from the vector $v$ and using the bound $B = B_0\sqrt{k}(e^\alpha+1)/c_k(e^\alpha-1)$;
• compute the density estimate
$$\hat f := \frac{1}{n}\sum_{i=1}^n \sum_{j=1}^k Z_{i,j}\varphi_j. \tag{44}$$

The resulting estimate enjoys the following guarantee, which (along with Proposition 8) makes clear that the private minimax lower bound (39) is sharp, providing a variant of the classical rates with a polynomially worse sample complexity. (See Section C.3 for a proof.)

Proposition 9. Let $\{\varphi_j\}$ be a $B_0$-uniformly bounded orthonormal basis for $L^2([0,1])$. There exists a constant $c$ (depending only on $C$ and $B_0$) such that, for any $f$ in the Sobolev space $\mathcal{F}_\beta[C]$, the estimator (44) with $k = (n\alpha^2)^{1/(2\beta+2)}$ has an MSE that is upper bounded as follows:
$$\mathbb{E}_f\bigl[\|f - \hat f\|_2^2\bigr] \le c\bigl(n\alpha^2\bigr)^{-\frac{2\beta}{2\beta+2}}. \tag{45}$$

Before concluding our exposition, we make a few remarks on other potential density estimators. Our orthogonal series estimator (44) and sampling scheme (43), while similar in spirit to the estimator proposed by Wasserman and Zhou [51, Sec.
6], differ in that they are locally private and require a different noise strategy to obtain both $\alpha$-local privacy and the optimal convergence rate. Lastly, similarly to our remarks on the insufficiency of standard Laplace noise addition for mean estimation, it is worth noting that density estimators based on orthogonal series and Laplace perturbation are sub-optimal: they achieve (at best) rates of $(n\alpha^2)^{-\frac{2\beta}{2\beta+3}}$. This rate is polynomially worse than the sharp result provided by Proposition 9. Again, we see that appropriately chosen noise mechanisms are crucial for obtaining optimal results.

6 Comparison to related work

There has been a substantial amount of work on developing differentially private mechanisms, both in local and non-local settings, and a number of authors have attempted to characterize optimal mechanisms. For example, Kasiviswanathan et al. [37], working within a local differential privacy setting, study Probably-Approximately-Correct (PAC) learning problems and show that the statistical query model [38] and local learning are equivalent up to polynomial changes in the sample size. In our work, we are concerned with a finer-grained assessment of inferential procedures: rates of convergence and their optimality. In the remainder of this section, we discuss further connections of our work to previous research on optimality, global (non-local) differential privacy, and errors-in-variables models.

6.1 Sample versus population estimation

The standard definition of differential privacy, due to Dwork et al. [24], is somewhat less restrictive than the local privacy formulation considered here.
In particular, a conditional distribution $Q$ with output space $\mathcal{Z}$ is $\alpha$-differentially private if
$$\sup\Bigl\{\frac{Q(S \mid x_{1:n})}{Q(S \mid x'_{1:n})} \;\Big|\; x_i, x'_i \in \mathcal{X},\; S \in \sigma(\mathcal{Z}),\; d_{\text{ham}}(x_{1:n}, x'_{1:n}) \le 1\Bigr\} \le \exp(\alpha), \tag{46}$$
where $d_{\text{ham}}$ denotes the Hamming distance between the samples. Several researchers have considered quantities similar to our minimax criteria under local (2) or non-local (46) differential privacy [7, 35, 33, 18]. However, their objective has often been quite different from ours: instead of bounding errors based on population quantities, they provide bounds in which the data are assumed to be held fixed. More precisely, let $\theta : \mathcal{X}^n \to \Theta$ denote an estimator, and let $\theta(x_{1:n})$ be a sample quantity based on $x_{1:n}$. Prior work is based on conditional minimax risks of the form
$$\mathfrak{M}^{\text{cond}}_n\bigl(\theta(\mathcal{X}), \Phi \circ \rho, \alpha\bigr) := \inf_Q \sup_{x_{1:n} \in \mathcal{X}^n} \mathbb{E}_Q\Bigl[\Phi\bigl(\rho\bigl(\theta(x_{1:n}), \hat\theta\bigr)\bigr) \;\Big|\; X_{1:n} = x_{1:n}\Bigr], \tag{47}$$
where $\hat\theta$ is drawn according to $Q(\cdot \mid x_{1:n})$, the infimum is taken over all $\alpha$-differentially private channels $Q$, and the supremum is taken over all possible samples of size $n$. The only randomness in this conditional minimax risk is provided by the channel; the data are held fixed, so there is no randomness from an underlying population distribution. A partial list of papers that use definitions of this type includes Beimel et al. [7, Section 2.4], Hardt and Talwar [35, Definition 2.4], Hall et al. [33, Section 3], and De [18].

The conditional (47) and population minimax risks (5) can differ substantially, and such differences are critical to address within a statistical approach to privacy-constrained inference. The goal of inference is to draw conclusions about the population-based quantity $\theta(P)$ based on the sample.
Moreover, lower bounds on the conditional minimax risk (47) do not imply bounds on the rate of estimation for the population quantity $\theta(P)$. In fact, since the conditional minimax risk (47) involves a supremum over all possible samples $x_{1:n} \in \mathcal{X}^n$, the opposite is usually true: population risks provide lower bounds on the conditional minimax risk, as we show presently.

An illustrative example is useful to understand the differences. Consider estimation of the mean of a normal distribution with known variance $\sigma^2$, in which the mean $\theta = \mathbb{E}[X] \in [-1,1]$ is assumed to belong to the unit interval. As our Proposition 1 shows, it is possible to estimate the mean of a normally distributed random variable even under $\alpha$-local differential privacy (1). In sharp contrast, the following result shows that the conditional minimax risk is infinite for this problem:

Lemma 2. Consider the normal location family $\{N(\theta, \sigma^2) \mid \theta \in [-1,1]\}$ under $\alpha$-differential privacy (46). The conditional minimax risk of the mean statistic is $\mathfrak{M}^{\text{cond}}_n(\theta(\mathbb{R}), (\cdot)^2, \alpha) = \infty$.

Proof. Assume for the sake of contradiction that $\delta > 0$ satisfies $Q(|\hat\theta - \theta(x_{1:n})| > \delta \mid x_{1:n}) \le \frac{1}{2}$ for all samples $x_{1:n} \in \mathbb{R}^n$. Fix $N(\delta) \in \mathbb{N}$ and choose $2\delta$-separated points $\theta_\nu$, $\nu \in [N(\delta)]$, that is, points with $|\theta_\nu - \theta_{\nu'}| \ge 2\delta$ for $\nu \neq \nu'$. Then the sets $\{\theta \in \mathbb{R} \mid |\theta - \theta_\nu| \le \delta\}$ are all disjoint, so for any pair of samples $x_{1:n}$ and $x^\nu_{1:n}$ with $d_{\text{ham}}(x_{1:n}, x^\nu_{1:n}) \le 1$,
$$Q\bigl(\exists\,\nu \in [N(\delta)] \text{ s.t. } |\hat\theta - \theta_\nu| \le \delta \mid x_{1:n}\bigr) = \sum_{\nu=1}^{N(\delta)} Q\bigl(|\hat\theta - \theta_\nu| \le \delta \mid x_{1:n}\bigr) \ge e^{-\alpha}\sum_{\nu=1}^{N(\delta)} Q\bigl(|\hat\theta - \theta_\nu| \le \delta \mid x^\nu_{1:n}\bigr).$$
We may take each sample $x^\nu_{1:n}$ such that $\theta(x^\nu_{1:n}) = \frac{1}{n}\sum_{i=1}^n x^\nu_i = \theta_\nu$ (for example, for each $\nu \in [N(\delta)]$ set $x^\nu_1 = n\theta_\nu - \sum_{i=2}^n x_i$), and by assumption,
$$1 \ge Q\bigl(\exists\,\nu \in [N(\delta)] \text{ s.t. } |\hat\theta - \theta_\nu| \le \delta \mid x_{1:n}\bigr) \ge e^{-\alpha} N(\delta)\,\frac{1}{2}.$$
Taking $N(\delta) > 2e^\alpha$ yields a contradiction. The argument applies to an arbitrary $\delta > 0$, so the claim follows.

There are variations on this result. For instance, even if the output of the mean estimator is restricted to $[-1,1]$, the conditional minimax risk remains constant. Similar arguments apply to weakenings of differential privacy (e.g., $\delta$-approximate $\alpha$-differential privacy [23]). Conditional and population risks are thus very different quantities.

More generally, the population minimax risk usually lower bounds the conditional minimax risk. Suppose we measure minimax risks in some given metric $\rho$ (so the loss is $\Phi(t) = t$). Let $\tilde\theta$ be any estimator based on the original sample $X_{1:n}$, and let $\hat\theta$ be any estimator based on the privatized sample. We then have the following series of inequalities:
$$\mathbb{E}_{Q,P}[\rho(\theta(P), \hat\theta)] \le \mathbb{E}_{Q,P}[\rho(\theta(P), \tilde\theta)] + \mathbb{E}_{Q,P}[\rho(\tilde\theta, \hat\theta)] \le \mathbb{E}_P[\rho(\theta(P), \tilde\theta)] + \sup_{x_{1:n} \in \mathcal{X}^n} \mathbb{E}_{Q,P}\bigl[\rho(\tilde\theta(x_{1:n}), \hat\theta) \mid X_{1:n} = x_{1:n}\bigr]. \tag{48}$$
The population minimax risk (5) thus lower bounds the conditional minimax risk (47) via
$$\mathfrak{M}^{\text{cond}}_n(\tilde\theta(\mathcal{X}), \rho, \alpha) \ge \mathfrak{M}_n(\theta(P), \rho, \alpha) - \mathbb{E}_P[\rho(\theta(P), \tilde\theta)].$$
In particular, if there exists an estimator $\tilde\theta$ based on the original (non-private) data such that $\mathbb{E}_P[\rho(\theta(P), \tilde\theta)] \le \frac{1}{2}\mathfrak{M}_n(\theta(P), \rho, \alpha)$, we are guaranteed that
$$\mathfrak{M}^{\text{cond}}_n(\tilde\theta(\mathcal{X}), \rho, \alpha) \ge \frac{1}{2}\mathfrak{M}_n(\theta(P), \rho, \alpha),$$
so the conditional minimax risk is lower bounded by a constant multiple of the population minimax risk. This lower bound holds for each of the examples in Sections 3–5; lower bounds on the $\alpha$-private population minimax risk (5) are thus stronger than lower bounds on the conditional minimax risk.

To illustrate one application of the lower bound (48), consider estimation of the sample mean of a data set $x_{1:n} \in \{0,1\}^n$ under $\alpha$-local privacy.
This problem has been considered before; for instance, Beimel et al. [7] study distributed protocols for this problem. In Theorem 2 of their work, they show that if a protocol has $\ell$ rounds of communication, the squared error in estimating the sample mean $(1/n)\sum_{i=1}^n x_i$ is $\Omega(1/(n\alpha^2\ell^2))$. The standard mean estimator $\tilde\theta(x_{1:n}) = (1/n)\sum_{i=1}^n x_i$ has error $\mathbb{E}[|\tilde\theta(x_{1:n}) - \theta|] \le n^{-\frac{1}{2}}$. Consequently, the lower bound (48) combined with Proposition 1 implies
$$c\,\frac{1}{\sqrt{n\alpha^2}} - \frac{1}{\sqrt{n}} \le \mathfrak{M}_n(\theta(P), |\cdot|, \alpha) - \sup_{\theta \in [-1,1]} \mathbb{E}\bigl[|\tilde\theta(x_{1:n}) - \theta|\bigr] \le \mathfrak{M}^{\text{cond}}_n\bigl(\theta(\{0,1\}^n), |\cdot|, \alpha\bigr),$$
for some numerical constant $c > 0$. A corollary of our results is thus an $\Omega(1/(n\alpha^2))$ lower bound on the conditional minimax risk for mean estimation, allowing for sequential interactivity but not multiple "rounds." An inspection of Beimel et al.'s proof technique [7, Section 4.2] shows that their lower bound also implies a lower bound of $1/(n\alpha^2)$ for estimation of the population mean $\mathbb{E}[X]$ in one dimension in non-interactive (2) settings; it is, however, unclear how to extend their technique to other settings.

6.2 Local versus non-local privacy

It is also worthwhile to make some comparisons to work on non-local forms of differential privacy, mainly to understand the differences between local and global forms of privacy. Chaudhuri and Hsu [15] provide lower bounds for the estimation of certain one-dimensional statistics based on a two-point family of problems. Their techniques differ from those of the current paper, and they do not appear to provide bounds on the statistic being estimated, but rather on one that is near to it. Beimel et al. [8] provide some bounds on sample complexity in the "probably approximately correct" (PAC) framework of learning theory, though extensions to other inferential tasks are unclear.
Other work on non-local privacy [e.g., 33, 16, 48] shows that for various types of estimation problems, adding Laplace noise degrades convergence rates in at most lower-order terms. In contrast, our work shows that the Laplace mechanism may be highly sub-optimal under local privacy.

To understand convergence rates for non-local privacy, let us return to the estimation of a multinomial distribution in $\Delta_d$, based on observations $X_i \in \{e_j\}_{j=1}^d$. In this case, adding a noise vector $W \in \mathbb{R}^d$ with i.i.d. entries distributed as Laplace$(\alpha n)$ provides differential privacy [23]; the associated mean-squared error is at most
$$\mathbb{E}_\theta\Biggl[\Bigl\|\frac{1}{n}\sum_{i=1}^n X_i + W - \theta\Bigr\|_2^2\Biggr] = \mathbb{E}\Biggl[\Bigl\|\frac{1}{n}\sum_{i=1}^n X_i - \theta\Bigr\|_2^2\Biggr] + \mathbb{E}\bigl[\|W\|_2^2\bigr] \le \frac{1}{n} + \frac{d}{n^2\alpha^2}.$$
In particular, in the asymptotic regime $n \gg d$, there is no penalty from providing differential privacy except in higher-order terms. Similar results hold for histogram estimation [33], classification problems [16], and classical point estimation problems [48]; in this sense, local and global forms of differential privacy can be rather different.

6.3 Errors-in-variables models

As a final remark on related work, we touch briefly on errors-in-variables models [14, 31], which have been the subject of extensive study. In such problems, one observes a corrupted version $Z_i$ of the true covariate $X_i$. Privacy analysis is one of the few settings in which it is possible to know the conditional distribution $Q(\cdot \mid X_i)$ precisely. However, the mechanisms that are optimal in our analysis—in particular, those in strategies (26a) and (26b)—are more complicated than adding noise directly to the covariates, which leads to complications.
Known (statistically) efficient errors-in-variables estimation procedures often require either solving certain integral or estimating equations, or solving non-convex optimization problems [e.g., 39, 41]. Some recent work [40] shows that certain types of non-convex programs arising from errors-in-variables can be solved efficiently. In density estimation (as noted in Section 5.3.1), corrupted observations lead to nonparametric deconvolution problems that appear harder than estimation under privacy constraints. Further investigation of computationally efficient procedures for nonlinear errors-in-variables models for privacy preservation is an interesting direction for future research.

7 Proof of Theorem 1 and related results

We now turn to the proofs of our results, beginning with Theorem 1 and related results. In all cases, we defer the proofs of the more technical lemmas to the appendices.

7.1 Proof of Theorem 1

Observe that $M_1$ and $M_2$ are absolutely continuous with respect to one another, and there is a measure $\mu$ with respect to which they have densities $m_1$ and $m_2$, respectively. The channel probabilities $Q(\cdot \mid x)$ and $Q(\cdot \mid x')$ are likewise absolutely continuous, so we may assume they have densities $q(\cdot \mid x)$ and write $m_i(z) = \int q(z \mid x)\,dP_i(x)$. In terms of these densities, we have
$$D_{\text{kl}}(M_1 \| M_2) + D_{\text{kl}}(M_2 \| M_1) = \int m_1(z)\log\frac{m_1(z)}{m_2(z)}\,d\mu(z) + \int m_2(z)\log\frac{m_2(z)}{m_1(z)}\,d\mu(z) = \int \bigl(m_1(z) - m_2(z)\bigr)\log\frac{m_1(z)}{m_2(z)}\,d\mu(z).$$
Consequently, we must bound both the difference $m_1 - m_2$ and the log ratio of the marginal densities. The following two auxiliary lemmas are useful:

Lemma 3. For any $\alpha$-locally differentially private conditional distribution, we have
$$|m_1(z) - m_2(z)| \le c_\alpha \inf_x q(z \mid x)\,(e^\alpha - 1)\,\|P_1 - P_2\|_{\text{TV}}, \tag{49}$$
where $c_\alpha = \min\{2, e^\alpha\}$.

Lemma 4. Let $a, b \in \mathbb{R}_+$.
Then   log a b   ≤ | a − b | min { a,b } . W e pro ve these tw o results at the end of this section. With the lemmas in hand, let u s no w complete the pro of of the theorem. F rom Lemma 4, the log ratio is b ound e d as     log m 1 ( z ) m 2 ( z )     ≤ | m 1 ( z ) − m 2 ( z ) | min { m 1 ( z ) , m 2 ( z ) } . Applying Lemma 3 to the numerator yields     log m 1 ( z ) m 2 ( z )     ≤ c α ( e α − 1) k P 1 − P 2 k TV inf x q ( z | x ) min { m 1 ( z ) , m 2 ( z ) } ≤ c α ( e α − 1) k P 1 − P 2 k TV inf x q ( z | x ) inf x q ( z | x ) , where the final step uses the in equali t y min { m 1 ( z ) , m 2 ( z ) } ≥ in f x q ( z | x ). Pu tt ing together the pieces leads to the b oun d     log m 1 ( z ) m 2 ( z )     ≤ c α ( e α − 1) k P 1 − P 2 k TV . Com b ining with inequalit y (49) yields D kl ( M 1 k M 2 ) + D kl ( M 2 k M 1 ) ≤ c 2 α ( e α − 1) 2 k P 1 − P 2 k 2 TV Z inf x q ( z | x ) dµ ( z ) . The final in tegral is at most one, w hic h completes the pro of of the theorem. It remains to pro v e Lemmas 3 and 4. W e b egi n with the form e r. F or an y z ∈ Z , w e ha v e m 1 ( z ) − m 2 ( z ) = Z X q ( z | x ) [ dP 1 ( x ) − dP 2 ( x )] = Z X q ( z | x ) [ dP 1 ( x ) − dP 2 ( x )] + + Z X q ( z | x ) [ dP 1 ( x ) − dP 2 ( x )] − ≤ sup x ∈X q ( z | x ) Z X [ dP 1 ( x ) − dP 2 ( x )] + + inf x ∈X q ( z | x ) Z X [ dP 1 ( x ) − dP 2 ( x )] − =  sup x ∈X q ( z | x ) − inf x ∈X q ( z | x )  Z X [ dP 1 ( x ) − dP 2 ( x )] + . By definition of th e total v ariation norm, w e ha v e R [ dP 1 − dP 2 ] + = k P 1 − P 2 k TV , and hence | m 1 ( z ) − m 2 ( z ) | ≤ sup x,x ′   q ( z | x ) − q ( z | x ′ )   k P 1 − P 2 k TV . 
For any $\hat x \in \mathcal{X}$, we may add and subtract $q(z \mid \hat x)$ from the quantity inside the supremum, which implies that
$$\sup_{x, x'}\bigl|q(z \mid x) - q(z \mid x')\bigr| = \inf_{\hat x}\sup_{x, x'}\bigl|q(z \mid x) - q(z \mid \hat x) + q(z \mid \hat x) - q(z \mid x')\bigr| \le 2\inf_{\hat x}\sup_x |q(z \mid x) - q(z \mid \hat x)| = 2\inf_{\hat x}\, q(z \mid \hat x)\sup_x\Bigl|\frac{q(z \mid x)}{q(z \mid \hat x)} - 1\Bigr|.$$
Similarly, for any $x, x'$ we have
$$|q(z \mid x) - q(z \mid x')| = q(z \mid x')\Bigl|\frac{q(z \mid x)}{q(z \mid x')} - 1\Bigr| \le e^\alpha \inf_{\hat x}\, q(z \mid \hat x)\Bigl|\frac{q(z \mid x)}{q(z \mid x')} - 1\Bigr|.$$
Since for any choice of $x, \hat x$ we have $q(z \mid x)/q(z \mid \hat x) \in [e^{-\alpha}, e^\alpha]$, we find that (since $e^\alpha - 1 \ge 1 - e^{-\alpha}$)
$$\sup_{x, x'}\bigl|q(z \mid x) - q(z \mid x')\bigr| \le \min\{2, e^\alpha\}\inf_x q(z \mid x)\,(e^\alpha - 1).$$
Combining with the earlier inequality (50) yields the claim (49).

To see Lemma 4, note that for any $x > 0$, the concavity of the logarithm implies that $\log(x) \le x - 1$. Setting alternately $x = a/b$ and $x = b/a$, we obtain the inequalities
$$\log\frac{a}{b} \le \frac{a}{b} - 1 = \frac{a-b}{b} \quad \text{and} \quad \log\frac{b}{a} \le \frac{b}{a} - 1 = \frac{b-a}{a}.$$
Using the first inequality for $a \ge b$ and the second for $a < b$ completes the proof.

7.2 Proof of Corollary 1

Let us recall the definition of the induced marginal distribution (3), given by
$$M_\nu(S) = \int_{\mathcal{X}^n} Q(S \mid x_{1:n})\,dP^n_\nu(x_{1:n}) \quad \text{for } S \in \sigma(\mathcal{Z}^n).$$
For each $i = 2, \ldots, n$, we let $M_{\nu,i}(\cdot \mid Z_1 = z_1, \ldots, Z_{i-1} = z_{i-1}) = M_{\nu,i}(\cdot \mid z_{1:i-1})$ denote the (marginal over $X_i$) distribution of the variable $Z_i$ conditioned on $Z_1 = z_1, \ldots, Z_{i-1} = z_{i-1}$. In addition, we use the shorthand notation
$$D_{\text{kl}}\bigl(M_{\nu,i} \| M_{\nu',i}\bigr) := \int_{\mathcal{Z}^{i-1}} D_{\text{kl}}\bigl(M_{\nu,i}(\cdot \mid z_{1:i-1}) \| M_{\nu',i}(\cdot \mid z_{1:i-1})\bigr)\,dM^{i-1}_\nu(z_1, \ldots, z_{i-1})$$
to denote the integrated KL divergence of the conditional distributions of $Z_i$.
By the chain rule for KL divergences [32, Chapter 5.3], we obtain
$$D_{\text{kl}}(M^n_\nu \| M^n_{\nu'}) = \sum_{i=1}^n D_{\text{kl}}\bigl(M_{\nu,i} \| M_{\nu',i}\bigr).$$
By assumption (1), the distribution $Q_i(\cdot \mid X_i, Z_{1:i-1})$ on $Z_i$ is $\alpha$-differentially private for the sample $X_i$. As a consequence, if we let $P_{\nu,i}(\cdot \mid Z_1 = z_1, \ldots, Z_{i-1} = z_{i-1})$ denote the conditional distribution of $X_i$ given the first $i-1$ values $Z_1, \ldots, Z_{i-1}$ and the packing index $V = \nu$, then from the chain rule and Theorem 1 we obtain
$$D_{\text{kl}}(M^n_\nu \| M^n_{\nu'}) = \sum_{i=1}^n \int_{\mathcal{Z}^{i-1}} D_{\text{kl}}\bigl(M_{\nu,i}(\cdot \mid z_{1:i-1}) \| M_{\nu',i}(\cdot \mid z_{1:i-1})\bigr)\,dM^{i-1}_\nu(z_{1:i-1}) \le \sum_{i=1}^n 4(e^\alpha - 1)^2 \int_{\mathcal{Z}^{i-1}} \bigl\|P_{\nu,i}(\cdot \mid z_{1:i-1}) - P_{\nu',i}(\cdot \mid z_{1:i-1})\bigr\|_{\text{TV}}^2\,dM^{i-1}_\nu(z_1, \ldots, z_{i-1}).$$
By the construction of our sampling scheme, the random variables $X_i$ are conditionally independent given $V = \nu$; thus $P_{\nu,i}(\cdot \mid z_{1:i-1}) = P_{\nu,i}$, where $P_{\nu,i}$ denotes the distribution of $X_i$ conditioned on $V = \nu$. Consequently, we have
$$\bigl\|P_{\nu,i}(\cdot \mid z_{1:i-1}) - P_{\nu',i}(\cdot \mid z_{1:i-1})\bigr\|_{\text{TV}} = \bigl\|P_{\nu,i} - P_{\nu',i}\bigr\|_{\text{TV}},$$
which gives the claimed result.

7.3 Proof of Proposition 1

The minimax rate characterized by equation (20) involves both a lower and an upper bound, and we divide our proof accordingly. We provide the proof for $\alpha \in (0,1]$, but note that a similar result (modulo different constants) holds for any finite value of $\alpha$.

Lower bound: We use Le Cam's method to prove the lower bound in equation (20). Fix a constant $\delta \in (0,1]$, with a precise value to be specified later. For $\nu \in \mathcal{V} = \{-1, 1\}$, define the distribution $P_\nu$ with support $\{-\delta^{-1/k}, 0, \delta^{-1/k}\}$ by
$$P_\nu(X = \delta^{-1/k}) = \frac{\delta(1+\nu)}{2}, \quad P_\nu(X = 0) = 1 - \delta, \quad \text{and} \quad P_\nu(X = -\delta^{-1/k}) = \frac{\delta(1-\nu)}{2}.$$
By construction, we have $\mathbb{E}[|X|^k] = \delta(\delta^{-1/k})^k = 1$ and $\theta_\nu = \mathbb{E}_\nu[X] = \delta^{\frac{k-1}{k}}\nu$, whence the mean difference is given by $\theta_1 - \theta_{-1} = 2\delta^{\frac{k-1}{k}}$. Applying Le Cam's method (8) and the minimax bound (7) yields
$$\mathfrak{M}_n(\Theta, (\cdot)^2, Q) \ge \bigl(\delta^{\frac{k-1}{k}}\bigr)^2\Bigl(\frac{1}{2} - \frac{1}{2}\bigl\|M^n_1 - M^n_{-1}\bigr\|_{\text{TV}}\Bigr),$$
where $M^n_\nu$ denotes the marginal distribution of the samples $Z_1, \ldots, Z_n$ conditioned on $\theta = \theta_\nu$. Now Pinsker's inequality implies that $\|M^n_1 - M^n_{-1}\|_{\text{TV}}^2 \le \frac{1}{2}D_{\text{kl}}(M^n_1 \| M^n_{-1})$, and Corollary 1 yields
$$D_{\text{kl}}\bigl(M^n_1 \| M^n_{-1}\bigr) \le 4(e^\alpha - 1)^2 n\,\|P_1 - P_{-1}\|_{\text{TV}}^2 = 4(e^\alpha - 1)^2 n\delta^2.$$
Putting together the pieces yields $\|M^n_1 - M^n_{-1}\|_{\text{TV}} \le (e^\alpha - 1)\delta\sqrt{2n}$. For $\alpha \in (0,1]$, we have $e^\alpha - 1 \le 2\alpha$, and thus our earlier application of Le Cam's method implies
$$\mathfrak{M}_n(\Theta, (\cdot)^2, \alpha) \ge \bigl(\delta^{\frac{k-1}{k}}\bigr)^2\Bigl(\frac{1}{2} - \alpha\delta\sqrt{2n}\Bigr).$$
Substituting $\delta = \min\{1, 1/\sqrt{32n\alpha^2}\}$ yields the claim (20).

Upper bound: We must demonstrate an $\alpha$-locally private conditional distribution $Q$ and an estimator that achieves the upper bound in equation (20). We do so via a combination of truncation and addition of Laplace noise. Define the truncation function $[\cdot]_T : \mathbb{R} \to [-T, T]$ by
$$[x]_T := \max\{-T, \min\{x, T\}\},$$
where the truncation level $T$ is to be chosen. Let $W_i$ be independent Laplace$(\alpha/(2T))$ random variables, and for each index $i = 1, \ldots, n$, define $Z_i := [X_i]_T + W_i$. By construction, the random variable $Z_i$ is $\alpha$-differentially private for $X_i$. For the mean estimator $\hat\theta := \frac{1}{n}\sum_{i=1}^n Z_i$, we have
$$\mathbb{E}\bigl[(\hat\theta - \theta)^2\bigr] = \operatorname{Var}(\hat\theta) + \bigl(\mathbb{E}[\hat\theta] - \theta\bigr)^2 = \frac{4T^2}{n\alpha^2} + \frac{1}{n}\operatorname{Var}([X_1]_T) + \bigl(\mathbb{E}[Z_1] - \theta\bigr)^2. \tag{51}$$
We claim that
$$\mathbb{E}[Z] = \mathbb{E}\bigl[[X]_T\bigr] \in \Bigl[\mathbb{E}[X] - \frac{1}{(k-1)T^{k-1}},\; \mathbb{E}[X] + \frac{1}{(k-1)T^{k-1}}\Bigr]. \tag{52}$$
Indeed, by the assumption that $\mathbb{E}[|X|^k] \le 1$, we have by a change of variables that
$$\int_T^\infty (x - T)\,dP(x) = \int_T^\infty P(X \ge x)\,dx \le \int_T^\infty \frac{1}{x^k}\,dx = \frac{1}{(k-1)T^{k-1}}.$$
Thus
$$\mathbb{E}\bigl[[X]_T\bigr] \ge \mathbb{E}[\min\{X, T\}] = \mathbb{E}\bigl[\min\{X, T\} + [X - T]_+ - [X - T]_+\bigr] = \mathbb{E}[X] - \int_T^\infty (x - T)\,dP(x) \ge \mathbb{E}[X] - \frac{1}{(k-1)T^{k-1}}.$$
A similar argument yields the upper bound in equation (52). From the bound (51), the inclusion $[X]_T \in [-T, T]$, and the inequality $\alpha^2 \le 1$, we have
$$\mathbb{E}\bigl[(\hat\theta - \theta)^2\bigr] \le \frac{5T^2}{n\alpha^2} + \frac{1}{(k-1)^2 T^{2k-2}},$$
valid for any $T > 0$. Choosing $T = (5(k-1))^{-\frac{1}{2k}}(n\alpha^2)^{1/(2k)}$ yields
$$\mathbb{E}\bigl[(\hat\theta - \theta)^2\bigr] \le \frac{5(5(k-1))^{-\frac{1}{k}}(n\alpha^2)^{\frac{1}{k}}}{n\alpha^2} + \frac{(5(k-1))^{1-\frac{1}{k}}}{(k-1)^2(n\alpha^2)^{1-\frac{1}{k}}} = 5^{1-\frac{1}{k}}\Bigl(1 + \frac{1}{k-1}\Bigr)\frac{1}{(k-1)^{\frac{1}{k}}(n\alpha^2)^{1-\frac{1}{k}}}.$$
Since $(1 + (k-1)^{-1})(k-1)^{-\frac{1}{k}} < (k-1)^{-1} + (k-1)^{-2}$ for $k \in (1,2)$ and is bounded by $1 + (k-1)^{-1} \le 2$ for $k \in [2, \infty]$, the upper bound (20) follows.

7.4 Proof of Proposition 2

We now turn to the proof of minimax rates for fixed-design linear regression.

Lower bound: We use a slight generalization of the $\alpha$-private form (19) of the local Fano inequality from Corollary 3. For concreteness, we assume throughout that $\alpha \in [0, \frac{23}{35}]$, but analogous arguments hold for any bounded $\alpha$ with changes only in the constant pre-factors.

Consider an instance of the linear regression model (21) in which the noise variables $\{\varepsilon_i\}_{i=1}^n$ are drawn i.i.d. from the uniform distribution on $[-\sigma, +\sigma]$. Our first step is to construct a suitable packing of the unit sphere $\mathbb{S}^{d-1} = \{u \in \mathbb{R}^d : \|u\|_2 = 1\}$ in $\ell_2$-norm:

Lemma 5. There exists a 1-packing $\mathcal{V} = \{\nu_1, \ldots, \nu_N\}$ of the unit sphere $\mathbb{S}^{d-1}$ with $N \ge \exp(d/8)$.

See Appendix D.1 for the proof of this claim. For a fixed $\delta \in (0,1]$ to be chosen shortly, define the family of vectors $\{\theta_\nu,\ \nu \in \mathcal{V}\}$ with $\theta_\nu := \delta\nu$.
Since $\|\nu\|_2 \le 1$, we have $\|\theta_\nu - \theta_{\nu'}\|_2 \le 2\delta$. Let $P_{\nu,i}$ denote the distribution of $Y_i$ conditioned on $\theta^* = \theta_\nu$. By the form of the linear regression model (21) and our assumption on the noise variable $\varepsilon_i$, the distribution $P_{\nu,i}$ is uniform on the interval $[\langle\theta_\nu, x_i\rangle - \sigma, \langle\theta_\nu, x_i\rangle + \sigma]$. Consequently, for $\nu \neq \nu' \in \mathcal{V}$, we have
$$\bigl\|P_{\nu,i} - P_{\nu',i}\bigr\|_{\text{TV}} = \frac{1}{2}\int |p_{\nu,i}(y) - p_{\nu',i}(y)|\,dy \le \frac{1}{2}\Bigl(\frac{1}{2\sigma}|\langle\theta_\nu, x_i\rangle - \langle\theta_{\nu'}, x_i\rangle| + \frac{1}{2\sigma}|\langle\theta_\nu, x_i\rangle - \langle\theta_{\nu'}, x_i\rangle|\Bigr) = \frac{1}{2\sigma}\bigl|\langle\theta_\nu - \theta_{\nu'}, x_i\rangle\bigr|.$$
Letting $V$ denote a random sample from the uniform distribution on $\mathcal{V}$, Corollary 1 implies that the mutual information is upper bounded as
$$I(Z_1, \ldots, Z_n; V) \le 4(e^\alpha - 1)^2\sum_{i=1}^n \frac{1}{|\mathcal{V}|^2}\sum_{\nu,\nu' \in \mathcal{V}}\bigl\|P_{\nu,i} - P_{\nu',i}\bigr\|_{\text{TV}}^2 \le \frac{(e^\alpha - 1)^2}{\sigma^2}\sum_{i=1}^n \frac{1}{|\mathcal{V}|^2}\sum_{\nu,\nu' \in \mathcal{V}}\bigl(\langle\theta_\nu - \theta_{\nu'}, x_i\rangle\bigr)^2 = \frac{(e^\alpha - 1)^2}{\sigma^2}\,\frac{1}{|\mathcal{V}|^2}\sum_{\nu,\nu' \in \mathcal{V}}(\theta_\nu - \theta_{\nu'})^\top X^\top X(\theta_\nu - \theta_{\nu'}).$$
Since $\theta_\nu = \delta\nu$, we have by definition of the maximum singular value that
$$(\theta_\nu - \theta_{\nu'})^\top X^\top X(\theta_\nu - \theta_{\nu'}) \le \delta^2\bigl\|\nu - \nu'\bigr\|_2^2\,\rho_{\max}(X^\top X) \le 4\delta^2\rho_{\max}^2(X) = 4n\delta^2\rho_{\max}^2(X/\sqrt{n}).$$
Putting together the pieces, we find that
$$I(Z_1, \ldots, Z_n; V) \le \frac{4n\delta^2(e^\alpha - 1)^2}{\sigma^2}\rho_{\max}^2(X/\sqrt{n}) \le \frac{8n\alpha^2\delta^2}{\sigma^2}\rho_{\max}^2(X/\sqrt{n}),$$
where the second inequality is valid for $\alpha \in [0, \frac{23}{35}]$. Consequently, Fano's inequality combined with the packing set $\mathcal{V}$ from Lemma 5 implies that
$$\mathfrak{M}_n\bigl(\Theta, \|\cdot\|_2^2, \alpha\bigr) \ge \frac{\delta^2}{4}\Bigl(1 - \frac{8n\delta^2\alpha^2\rho_{\max}^2(X/\sqrt{n})/\sigma^2 + \log 2}{d/8}\Bigr).$$
We split the remainder of the analysis into cases.

Case 1: First suppose that $d \ge 16$. Then setting $\delta^2 = \min\bigl\{1, \frac{d\sigma^2}{128n\rho_{\max}^2(X/\sqrt{n})}\bigr\}$ implies that
$$\frac{8n\delta^2\alpha^2\rho_{\max}^2(X/\sqrt{n})/\sigma^2 + \log 2}{d/8} \le \frac{8\log 2}{d} + \frac{64}{128} < \frac{7}{8}.$$
As a consequence, we have the lower bound
$$\mathfrak{M}_n\bigl(\Theta, \|\cdot\|_2^2, \alpha\bigr) \ge \frac{1}{4}\min\Bigl\{1, \frac{d\sigma^2}{128n\rho_{\max}^2(X/\sqrt{n})}\Bigr\}\cdot\frac{1}{8},$$
which yields the claim for $d \ge 16$.
Case 2: Otherwise, we may assume that $d < 16$. In this case, a lower bound for the case $d = 1$ is sufficient, since apart from constant factors, the same bound holds for all $d < 16$. We use the Le Cam method based on a two-point comparison. Indeed, let $\theta_1 = \delta$ and $\theta_2 = -\delta$, so that the total variation distance is upper bounded as $\|P_{1,i} - P_{2,i}\|_{\rm TV} \le \frac{\delta}{\sigma}|x_i|$. By Corollary 2, we have
\[
\mathfrak{M}_n\big(\Theta, (\cdot)^2, \alpha\big) \ge \delta^2\Bigg(\frac{1}{2} - \frac{\delta(e^\alpha - 1)}{\sigma}\Big(\sum_{i=1}^n x_i^2\Big)^{\frac{1}{2}}\Bigg).
\]
Letting $x = (x_1, \ldots, x_n)$ and setting $\delta^2 = \min\{1, \sigma^2/(16(e^\alpha - 1)^2\|x\|_2^2)\}$ gives the desired result.

Upper bound: We now turn to the upper bound, for which we need to specify a private conditional $Q$ and an estimator $\widehat{\theta}$ that achieves the stated upper bound on the mean-squared error. Let $W_i$ be independent ${\rm Laplace}(\alpha/(2\sigma))$ random variables. Then the additively perturbed random variable $Z_i = Y_i + W_i$ is $\alpha$-differentially private for $Y_i$, since by assumption the response $Y_i \in [\langle\theta, x_i\rangle - \sigma, \langle\theta, x_i\rangle + \sigma]$. We now claim that the standard least-squares estimator of $\theta^*$, applied to the privatized responses $Z = (Z_1, \ldots, Z_n)$, achieves the stated upper bound. Indeed, the least-squares estimate is given by
\[
\widehat{\theta} = (X^\top X)^{-1} X^\top Z = (X^\top X)^{-1} X^\top (X\theta^* + \varepsilon + W).
\]
Moreover, from the independence of $W$ and $\varepsilon$, we have
\[
\mathbb{E}\big[\|\widehat{\theta} - \theta^*\|_2^2\big] = \mathbb{E}\big[\|(X^\top X)^{-1}X^\top(\varepsilon + W)\|_2^2\big]
= \mathbb{E}\big[\|(X^\top X)^{-1}X^\top\varepsilon\|_2^2\big] + \mathbb{E}\big[\|(X^\top X)^{-1}X^\top W\|_2^2\big].
\]
Since $\varepsilon \in [-\sigma, \sigma]^n$, we know that $\mathbb{E}[\varepsilon\varepsilon^\top] \preceq \sigma^2 I_{n\times n}$, and for the given choice of $W$, we have $\mathbb{E}[WW^\top] = (4\sigma^2/\alpha^2) I_{n\times n}$. Since $\alpha \le 1$, we thus find
\[
\mathbb{E}\big[\|\widehat{\theta} - \theta^*\|_2^2\big] \le \frac{5\sigma^2}{\alpha^2}\,{\rm tr}\big(X(X^\top X)^{-2}X^\top\big) = \frac{5\sigma^2}{\alpha^2}\,{\rm tr}\big((X^\top X)^{-1}\big).
\]
Noting that ${\rm tr}((X^\top X)^{-1}) \le d/\rho_{\min}^2(X) = d/(n\rho_{\min}^2(X/\sqrt{n}))$ gives the claimed upper bound.
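A minimal simulation of this upper-bound mechanism is sketched below for a one-dimensional fixed design (all names and parameter values are ours, for illustration only). Each response is perturbed with Laplace noise of scale $2\sigma/\alpha$, our reading of ${\rm Laplace}(\alpha/(2\sigma))$, which is the standard Laplace-mechanism scale for a quantity ranging over an interval of length $2\sigma$; ordinary least squares is then run on the privatized responses:

```python
import math
import random

random.seed(0)

def laplace(scale):
    # Inverse-CDF sampling of a zero-mean Laplace variate with the given scale.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

# One-dimensional fixed-design regression: Y_i = theta* x_i + eps_i, eps_i ~ U[-sigma, sigma].
n, sigma, alpha, theta_star = 2000, 1.0, 1.0, 1.7
x = [2.0 * i / (n - 1) - 1.0 for i in range(n)]  # fixed design spread over [-1, 1]
z = [theta_star * xi + random.uniform(-sigma, sigma) + laplace(2.0 * sigma / alpha)
     for xi in x]

# Ordinary least squares on the privatized responses Z.
theta_hat = sum(xi * zi for xi, zi in zip(x, z)) / sum(xi * xi for xi in x)
assert abs(theta_hat - theta_star) < 0.5  # loose check; error ~ sigma/(alpha sqrt(n))
```

The noise added for privacy inflates the variance of $\widehat{\theta}$ by a factor on the order of $1/\alpha^2$, mirroring the $\sigma^2/\alpha^2$ factor in the bound above.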
8 Proof of Theorem 2 and related results

In this section, we collect together the proof of Theorem 2 and related corollaries.

8.1 Proof of Theorem 2

Let $\mathcal{Z}$ denote the domain of the random variable $Z$. We begin by reducing the problem to the case when $\mathcal{Z} = \{1, 2, \ldots, k\}$ for an arbitrary positive integer $k$. Indeed, in the general setting, we let $\mathcal{K} = \{K_i\}_{i=1}^k$ be any (measurable) finite partition of $\mathcal{Z}$, where for $z \in \mathcal{Z}$ we let $[z]_{\mathcal{K}} = K_i$ for the $K_i$ such that $z \in K_i$. The KL divergence $D_{\rm kl}(M_\nu \,\|\, \bar{M})$ can be defined as the supremum of the (discrete) KL divergences between the random variables $[Z]_{\mathcal{K}}$ sampled according to $M_\nu$ and $\bar{M}$, taken over all partitions $\mathcal{K}$ of $\mathcal{Z}$; for instance, see Gray [32, Chapter 5]. Consequently, we can prove the claim for $\mathcal{Z} = \{1, 2, \ldots, k\}$, and then take the supremum over $k$ to recover the general case. Accordingly, we can work with the probability mass functions $m(z \mid \nu) = M_\nu(Z = z)$ and $\bar{m}(z) = \bar{M}(Z = z)$, and we may write
\[
D_{\rm kl}\big(M_\nu \,\|\, \bar{M}\big) + D_{\rm kl}\big(\bar{M} \,\|\, M_\nu\big) = \sum_{z=1}^k \big(m(z \mid \nu) - \bar{m}(z)\big)\log\frac{m(z \mid \nu)}{\bar{m}(z)}. \tag{53}
\]
Throughout, we will also use (without loss of generality) the probability mass function $q(z \mid x) = Q(Z = z \mid X = x)$, where we note that $m(z \mid \nu) = \int q(z \mid x)\, dP_\nu(x)$.

Now we use Lemma 4 from the proof of Theorem 1 to complete the proof of Theorem 2. Starting with equality (53), we have
\[
\frac{1}{|\mathcal{V}|}\sum_{\nu \in \mathcal{V}}\Big(D_{\rm kl}\big(M_\nu \,\|\, \bar{M}\big) + D_{\rm kl}\big(\bar{M} \,\|\, M_\nu\big)\Big)
\le \sum_{\nu \in \mathcal{V}}\frac{1}{|\mathcal{V}|}\sum_{z=1}^k |m(z \mid \nu) - \bar{m}(z)|\,\Big|\log\frac{m(z \mid \nu)}{\bar{m}(z)}\Big|
\le \sum_{\nu \in \mathcal{V}}\frac{1}{|\mathcal{V}|}\sum_{z=1}^k |m(z \mid \nu) - \bar{m}(z)|\,\frac{|m(z \mid \nu) - \bar{m}(z)|}{\min\{\bar{m}(z), m(z \mid \nu)\}}.
\]
Now, we define the measure $m_0$ on $\mathcal{Z} = \{1, \ldots, k\}$ by $m_0(z) := \inf_{x \in \mathcal{X}} q(z \mid x)$.
It is clear that $\min\{\bar{m}(z), m(z \mid \nu)\} \ge m_0(z)$, whence we find
\[
\frac{1}{|\mathcal{V}|}\sum_{\nu \in \mathcal{V}}\Big(D_{\rm kl}\big(M_\nu \,\|\, \bar{M}\big) + D_{\rm kl}\big(\bar{M} \,\|\, M_\nu\big)\Big)
\le \sum_{\nu \in \mathcal{V}}\frac{1}{|\mathcal{V}|}\sum_{z=1}^k \frac{(m(z \mid \nu) - \bar{m}(z))^2}{m_0(z)}.
\]
It remains to bound the final sum. For any constant $c \in \mathbb{R}$, we have
\[
m(z \mid \nu) - \bar{m}(z) = \int_{\mathcal{X}} (q(z \mid x) - c)\big(dP_\nu(x) - d\bar{P}(x)\big).
\]
We define a set of functions $f : \mathcal{Z} \times \mathcal{X} \to \mathbb{R}$ (depending implicitly on $q$) by
\[
\mathcal{F}_\alpha := \big\{f \mid f(z, x) \in [1, e^\alpha]\, m_0(z) \text{ for all } z \in \mathcal{Z} \text{ and } x \in \mathcal{X}\big\}.
\]
By the definition of differential privacy, when viewed as a joint mapping from $\mathcal{Z} \times \mathcal{X}$ to $\mathbb{R}$, the conditional p.m.f. $q$ satisfies $\{(z, x) \mapsto q(z \mid x)\} \in \mathcal{F}_\alpha$. Since constant (with respect to $x$) shifts do not change the above integral, we can modify the range of functions in $\mathcal{F}_\alpha$ by subtracting $m_0(z)$ from each, yielding the set
\[
\mathcal{F}'_\alpha := \big\{f \mid f(z, x) \in [0, e^\alpha - 1]\, m_0(z) \text{ for all } z \in \mathcal{Z} \text{ and } x \in \mathcal{X}\big\}.
\]
As a consequence, we find that
\[
\sum_{\nu \in \mathcal{V}} \big(m(z \mid \nu) - \bar{m}(z)\big)^2
\le \sup_{f \in \mathcal{F}_\alpha}\Bigg\{\sum_{\nu \in \mathcal{V}}\Big(\int_{\mathcal{X}} f(z, x)\big(dP_\nu(x) - d\bar{P}(x)\big)\Big)^2\Bigg\}
= \sup_{f \in \mathcal{F}'_\alpha}\Bigg\{\sum_{\nu \in \mathcal{V}}\Big(\int_{\mathcal{X}} f(z, x)\big(dP_\nu(x) - d\bar{P}(x)\big)\Big)^2\Bigg\}.
\]
By inspection, when we divide by $m_0(z)$ and recall the definition of the set $\mathcal{B}_\infty \subset L^\infty(\mathcal{X})$ in the statement of Theorem 2, we obtain
\[
\sum_{\nu \in \mathcal{V}} \big(m(z \mid \nu) - \bar{m}(z)\big)^2 \le \big(m_0(z)\big)^2 (e^\alpha - 1)^2 \sup_{\gamma \in \mathcal{B}_\infty}\sum_{\nu \in \mathcal{V}}\Big(\int_{\mathcal{X}} \gamma(x)\big(dP_\nu(x) - d\bar{P}(x)\big)\Big)^2.
\]
Putting together our bounds, we have
\[
\frac{1}{|\mathcal{V}|}\sum_{\nu \in \mathcal{V}}\Big(D_{\rm kl}\big(M_\nu \,\|\, \bar{M}\big) + D_{\rm kl}\big(\bar{M} \,\|\, M_\nu\big)\Big)
\le (e^\alpha - 1)^2 \sum_{z=1}^k \frac{(m_0(z))^2}{|\mathcal{V}|\, m_0(z)} \sup_{\gamma \in \mathcal{B}_\infty}\sum_{\nu \in \mathcal{V}}\Big(\int_{\mathcal{X}} \gamma(x)\big(dP_\nu(x) - d\bar{P}(x)\big)\Big)^2
\le \frac{(e^\alpha - 1)^2}{|\mathcal{V}|} \sup_{\gamma \in \mathcal{B}_\infty}\sum_{\nu \in \mathcal{V}}\Big(\int_{\mathcal{X}} \gamma(x)\big(dP_\nu(x) - d\bar{P}(x)\big)\Big)^2,
\]
since $\sum_z m_0(z) \le 1$, which is the statement of the theorem.

8.2 Proof of Corollary 4

In the non-interactive setting (2), the marginal distribution $M_\nu^n$ is a product measure, and $Z_i$ is conditionally independent of $Z_{1:i-1}$ given $V$.
Thus, by the chain rule for mutual information [32, Chapter 5] and the fact (as in the proof of Theorem 2) that we may assume w.l.o.g. that $Z$ has finite range,
\[
I(Z_1, \ldots, Z_n; V) = \sum_{i=1}^n I(Z_i; V \mid Z_{1:i-1}) = \sum_{i=1}^n \big[H(Z_i \mid Z_{1:i-1}) - H(Z_i \mid V, Z_{1:i-1})\big].
\]
Since conditioning reduces entropy and $Z_{1:i-1}$ is conditionally independent of $Z_i$ given $V$, we have $H(Z_i \mid Z_{1:i-1}) \le H(Z_i)$ and $H(Z_i \mid V, Z_{1:i-1}) = H(Z_i \mid V)$. In particular, we have
\[
I(Z_1, \ldots, Z_n; V) \le \sum_{i=1}^n I(Z_i; V) = \sum_{i=1}^n \frac{1}{|\mathcal{V}|}\sum_{\nu \in \mathcal{V}} D_{\rm kl}\big(M_{\nu,i} \,\|\, \bar{M}_i\big).
\]
Applying Theorem 2 completes the proof.

9 Proof of Theorem 3

The proof of this theorem combines the techniques we used in the proofs of Theorems 1 and 2: the first handles interactivity, while the techniques used to derive the variational bounds are reminiscent of those used in Theorem 2. Our first step is to note a consequence of the independence structure in Fig. 1 that is essential to our tensorization steps. More precisely, we claim that for any set $S \in \sigma(\mathcal{Z})$,
\[
M_{\pm j}(Z_i \in S \mid z_{1:i-1}) = \int Q(Z_i \in S \mid Z_{1:i-1} = z_{1:i-1}, X_i = x)\, dP_{\pm j, i}(x). \tag{54}
\]
We postpone the proof of this intermediate claim to the end of this section.

Now consider the summed KL divergences. Let $M_{\pm j, i}(\cdot \mid z_{1:i-1})$ denote the conditional distribution of $Z_i$ under $P_{\pm j}$, conditional on $Z_{1:i-1} = z_{1:i-1}$. As in the proof of Corollary 1, the chain rule for KL divergences [e.g., 32, Chapter 5] implies
\[
D_{\rm kl}\big(M^n_{+j} \,\|\, M^n_{-j}\big) = \sum_{i=1}^n \int_{\mathcal{Z}^{i-1}} D_{\rm kl}\big(M_{+j}(\cdot \mid z_{1:i-1}) \,\|\, M_{-j}(\cdot \mid z_{1:i-1})\big)\, dM^{i-1}_{+j}(z_{1:i-1}).
\]
For notational convenience in the remainder of the proof, let us define the symmetrized KL divergence between measures $M$ and $M'$ as $D^{\rm sy}_{\rm kl}(M \,\|\, M') := D_{\rm kl}(M \,\|\, M') + D_{\rm kl}(M' \,\|\, M)$.
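Two devices used repeatedly above, working with a finite range for $Z$ and with symmetrized divergences, rest on the fact that merging cells of a partition never increases KL divergence, so the divergence over a general space is the supremum of its discretizations. A tiny check of this monotonicity (the distributions are made up for illustration):

```python
import math

def kl(p, q):
    # Discrete KL divergence D(p || q); assumes q(z) > 0 wherever p(z) > 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def sym_kl(p, q):
    # The symmetrized divergence D_sy(p || q) = D(p || q) + D(q || p).
    return kl(p, q) + kl(q, p)

# Coarsening the partition (merging the last two cells) can only decrease
# the divergence, by the log-sum inequality.
p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]
p2 = [p[0], p[1], p[2] + p[3]]
q2 = [q[0], q[1], q[2] + q[3]]
assert kl(p2, q2) <= kl(p, q) + 1e-12
assert sym_kl(p2, q2) <= sym_kl(p, q) + 1e-12
```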
Defining $\bar{P} := 2^{-d}\sum_{\nu \in \mathcal{V}} P^n_\nu$, we have $2\bar{P} = P_{+j} + P_{-j}$ for each $j$ simultaneously. We also introduce $\bar{M}(S) = \int Q(S \mid x_{1:n})\, d\bar{P}(x_{1:n})$, and let $\mathbb{E}_{\pm j}$ denote the expectation taken under the marginals $M_{\pm j}$. We then have
\[
D_{\rm kl}\big(M^n_{+j} \,\|\, M^n_{-j}\big) + D_{\rm kl}\big(M^n_{-j} \,\|\, M^n_{+j}\big)
= \sum_{i=1}^n \Big(\mathbb{E}_{+j}\big[D_{\rm kl}(M_{+j,i}(\cdot \mid Z_{1:i-1}) \,\|\, M_{-j,i}(\cdot \mid Z_{1:i-1}))\big] + \mathbb{E}_{-j}\big[D_{\rm kl}(M_{-j,i}(\cdot \mid Z_{1:i-1}) \,\|\, M_{+j,i}(\cdot \mid Z_{1:i-1}))\big]\Big)
\]
\[
\le \sum_{i=1}^n \Big(\mathbb{E}_{+j}\big[D^{\rm sy}_{\rm kl}(M_{+j,i}(\cdot \mid Z_{1:i-1}) \,\|\, M_{-j,i}(\cdot \mid Z_{1:i-1}))\big] + \mathbb{E}_{-j}\big[D^{\rm sy}_{\rm kl}(M_{+j,i}(\cdot \mid Z_{1:i-1}) \,\|\, M_{-j,i}(\cdot \mid Z_{1:i-1}))\big]\Big)
= 2\sum_{i=1}^n \int_{\mathcal{Z}^{i-1}} D^{\rm sy}_{\rm kl}\big(M_{+j,i}(\cdot \mid z_{1:i-1}) \,\|\, M_{-j,i}(\cdot \mid z_{1:i-1})\big)\, d\bar{M}^{i-1}(z_{1:i-1}),
\]
where we have used the definition of $\bar{M}$ and the fact that $2\bar{P} = P_{+j} + P_{-j}$ for all $j$. Summing over $j \in [d]$ yields
\[
\sum_{j=1}^d D^{\rm sy}_{\rm kl}\big(M^n_{+j} \,\|\, M^n_{-j}\big)
\le 2\sum_{i=1}^n \int_{\mathcal{Z}^{i-1}} \sum_{j=1}^d \underbrace{D^{\rm sy}_{\rm kl}\big(M_{+j,i}(\cdot \mid z_{1:i-1}) \,\|\, M_{-j,i}(\cdot \mid z_{1:i-1})\big)}_{=: T_{j,i}}\, d\bar{M}^{i-1}(z_{1:i-1}). \tag{55}
\]
We bound the terms $T_{j,i}$ appearing in inequality (55). Without loss of generality (as in the proof of Theorem 2), we may assume $\mathcal{Z}$ is finite, and that $\mathcal{Z} = \{1, 2, \ldots, k\}$ for some positive integer $k$. Using the probability mass functions $m_{\pm j, i}$ and omitting the index $i$ when it is clear from context, Lemma 4 implies
\[
T_{j,i} = \sum_{z=1}^k \big(m_{+j}(z \mid z_{1:i-1}) - m_{-j}(z \mid z_{1:i-1})\big)\log\frac{m_{+j}(z \mid z_{1:i-1})}{m_{-j}(z \mid z_{1:i-1})}
\le \sum_{z=1}^k \frac{\big(m_{+j}(z \mid z_{1:i-1}) - m_{-j}(z \mid z_{1:i-1})\big)^2}{\min\{m_{+j}(z \mid z_{1:i-1}),\, m_{-j}(z \mid z_{1:i-1})\}}.
\]
For each fixed $z_{1:i-1}$, define the infimal measure $m_0(z \mid z_{1:i-1}) := \inf_{x \in \mathcal{X}} q(z \mid X_i = x, z_{1:i-1})$.
By construction, we have $\min\{m_{+j}(z \mid z_{1:i-1}),\, m_{-j}(z \mid z_{1:i-1})\} \ge m_0(z \mid z_{1:i-1})$, and hence
\[
T_{j,i} \le \sum_{z=1}^k \frac{\big(m_{+j}(z \mid z_{1:i-1}) - m_{-j}(z \mid z_{1:i-1})\big)^2}{m_0(z \mid z_{1:i-1})}.
\]
Recalling equality (54), we have
\[
m_{+j}(z \mid z_{1:i-1}) - m_{-j}(z \mid z_{1:i-1}) = \int_{\mathcal{X}} q(z \mid x, z_{1:i-1})\big(dP_{+j,i}(x) - dP_{-j,i}(x)\big)
= m_0(z \mid z_{1:i-1}) \int_{\mathcal{X}} \Big(\frac{q(z \mid x, z_{1:i-1})}{m_0(z \mid z_{1:i-1})} - 1\Big)\big(dP_{+j,i}(x) - dP_{-j,i}(x)\big).
\]
From this point, the proof is similar to that of Theorem 2. Define the collection of functions
\[
\mathcal{F}_\alpha := \big\{f : \mathcal{X} \times \mathcal{Z}^i \to [0, e^\alpha - 1]\big\}.
\]
Using the definition of differential privacy, we have $\frac{q(z \mid x, z_{1:i-1})}{m_0(z \mid z_{1:i-1})} \in [1, e^\alpha]$, so there exists $f \in \mathcal{F}_\alpha$ such that
\[
\sum_{j=1}^d T_{j,i} \le \sum_{j=1}^d \sum_{z=1}^k \frac{\big(m_0(z \mid z_{1:i-1})\big)^2}{m_0(z \mid z_{1:i-1})}\Big(\int_{\mathcal{X}} f(x, z, z_{1:i-1})\big(dP_{+j,i}(x) - dP_{-j,i}(x)\big)\Big)^2
= \sum_{z=1}^k m_0(z \mid z_{1:i-1}) \sum_{j=1}^d \Big(\int_{\mathcal{X}} f(x, z, z_{1:i-1})\big(dP_{+j,i}(x) - dP_{-j,i}(x)\big)\Big)^2.
\]
Taking a supremum over $\mathcal{F}_\alpha$, we find the further upper bound
\[
\sum_{j=1}^d T_{j,i} \le \sum_{z=1}^k m_0(z \mid z_{1:i-1}) \sup_{f \in \mathcal{F}_\alpha} \sum_{j=1}^d \Big(\int_{\mathcal{X}} f(x, z, z_{1:i-1})\big(dP_{+j,i}(x) - dP_{-j,i}(x)\big)\Big)^2.
\]
The inner supremum may be taken independently of $z$ and $z_{1:i-1}$, so we rescale by $(e^\alpha - 1)$ to obtain our penultimate inequality
\[
\sum_{j=1}^d D^{\rm sy}_{\rm kl}\big(M_{+j,i}(\cdot \mid z_{1:i-1}) \,\|\, M_{-j,i}(\cdot \mid z_{1:i-1})\big)
\le (e^\alpha - 1)^2 \sum_{z=1}^k m_0(z \mid z_{1:i-1}) \sup_{\gamma \in \mathcal{B}_\infty(\mathcal{X})} \sum_{j=1}^d \Big(\int_{\mathcal{X}} \gamma(x)\big(dP_{+j,i}(x) - dP_{-j,i}(x)\big)\Big)^2.
\]
Noting that $m_0$ sums to a quantity at most 1 and substituting the preceding expression into inequality (55) completes the proof. Finally, we return to prove our intermediate marginalization claim (54).
We have
\[
M_{\pm j}(Z_i \in S \mid z_{1:i-1}) = \int Q(Z_i \in S \mid z_{1:i-1}, x_{1:n})\, dP_{\pm j}(x_{1:n} \mid z_{1:i-1})
\stackrel{(i)}{=} \int Q(Z_i \in S \mid z_{1:i-1}, x_i)\, dP_{\pm j}(x_{1:n} \mid z_{1:i-1})
\stackrel{(ii)}{=} \int Q(Z_i \in S \mid Z_{1:i-1} = z_{1:i-1}, X_i = x)\, dP_{\pm j, i}(x),
\]
where equality (i) follows from the assumed conditional independence structure of $Q$ (recall Figure 1), and equality (ii) is a consequence of the independence of $X_i$ and $Z_{1:i-1}$ under $P_{\pm j}$. That is, we have $P_{+j}(X_i \in S \mid Z_{1:i-1} = z_{1:i-1}) = P_{+j,i}(S)$ by the definition of $P^n_\nu$ as a product measure and the fact that the $P_{\pm j}$ are mixtures of the products $P^n_\nu$.

10 Conclusions

We have linked minimax analysis from statistical decision theory with differential privacy, bringing some of their respective foundational principles into close contact. Our main technique, in the form of the divergence inequalities in Theorems 1 and 2 and their Corollaries 1–4, shows that applying differentially private sampling schemes essentially acts as a contraction on distributions. These contractive inequalities allow us to give sharp minimax rates for estimation in locally private settings, and we think such results may be more generally applicable. With our examples in Sections 4.2, 5.2, and 5.3, we have developed a framework showing that, roughly, if one can construct a family of distributions $\{P_\nu\}$ on the sample space $\mathcal{X}$ that is not well "correlated" with any function $f \in L^\infty(\mathcal{X})$ for which $f(x) \in \{-1, 1\}$, then providing privacy is costly: the contraction provided by Theorems 2 and 3 is strong. By providing sharp convergence rates for many standard statistical estimation procedures under local differential privacy, we have developed and explored some tools that may be used to better understand privacy-preserving statistical inference.
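The contraction phenomenon can be seen concretely for the simplest locally private channel, binary randomized response (report the truth with probability $e^\alpha/(1+e^\alpha)$). The sketch below (distributions chosen by us for illustration) checks that the symmetrized KL divergence between the privatized marginals falls under the $4(e^\alpha - 1)^2\|P_1 - P_2\|_{\rm TV}^2$ form in which Corollary 1 is applied in Section 7.4, and that it is strictly smaller than the divergence between the raw distributions:

```python
import math

def kl_bern(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q).
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

alpha = 0.5
keep = math.exp(alpha) / (1 + math.exp(alpha))  # randomized response: tell truth w.p. keep

def marginal(p):
    # P(Z = 1) when X ~ Bernoulli(p) passes through the randomized-response channel.
    return p * keep + (1 - p) * (1 - keep)

p1, p2 = 0.7, 0.3
tv = abs(p1 - p2)                                   # TV distance between the Bernoullis
m1, m2 = marginal(p1), marginal(p2)
sym_kl_out = kl_bern(m1, m2) + kl_bern(m2, m1)      # divergence after privatization
sym_kl_in = kl_bern(p1, p2) + kl_bern(p2, p1)       # divergence before privatization
bound = 4 * (math.exp(alpha) - 1) ** 2 * tv ** 2
assert sym_kl_out <= bound + 1e-12                  # the contraction bound holds
assert sym_kl_out < sym_kl_in                       # privatization strictly contracts here
```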
We have identified a fundamental continuum along which privacy may be traded for utility in the form of accurate statistical estimates, providing a way to adjust statistical procedures to meet the privacy or utility needs of the statistician and the population being sampled. There are a number of open questions raised by our work. It is natural to wonder whether it is possible to obtain tensorized inequalities of the form of Corollary 4 even for interactive mechanisms. Another important question is whether the results we have provided can be extended to settings in which standard (non-local) differential privacy holds. Such extensions could yield insights into optimal mechanisms for differentially private procedures.

Acknowledgments

We are very thankful to Shuheng Zhou for pointing out errors in Corollaries 1 and 4 in an earlier version of this manuscript. We also thank Guy Rothblum for helpful discussions. JCD was partially supported by a Facebook Graduate Fellowship and an NDSEG fellowship. Our work was supported in part by the U.S. Army Research Office under grant number W911NF-11-1-0391, and Office of Naval Research MURI grant N00014-11-1-0688.

A Proofs of multi-dimensional mean-estimation results

At a high level, our proofs of these results consist of three steps, the first of which is relatively standard, while the second two exploit specific aspects of the local privacy setting. We outline them here:

(1) The first step is a standard reduction, based on inequalities (7)–(9) in Section 2, from an estimation problem to a multi-way testing problem that involves discriminating between indices $\nu$ contained within some subset $\mathcal{V}$ of $\mathbb{R}^d$.
(2) The second step is an appropriate construction of a maximal $\delta$-packing, meaning a set $\mathcal{V} \subset \mathbb{R}^d$ such that each pair of elements is $\delta$-separated and the resulting set is as large as possible. In addition, our arguments require that, for a random variable $V$ uniformly distributed over $\mathcal{V}$, the covariance ${\rm Cov}(V)$ has relatively small operator norm.

(3) The final step is to apply Theorem 2 in order to control the mutual information associated with the testing problem. Doing so requires bounding the supremum in Corollary 4 via the operator norm of ${\rm Cov}(V)$.

The estimation-to-testing reduction of Step 1 was previously described in Section 2. Accordingly, the proofs to follow are devoted to the second and third steps in each case.

A.1 Proof of Proposition 3

We provide a proof of the lower bound, as we provided the argument for the upper bound in Section 4.2.2.

Constructing a good packing: Let $k$ be an arbitrary integer in $\{1, 2, \ldots, d\}$. The following auxiliary result provides a building block for the packing set underlying our proof:

Lemma 6. For each integer $k$, there exists a packing $\mathcal{V}_k$ of the $k$-dimensional hypercube $\{-1, 1\}^k$ with $\|\nu - \nu'\|_1 \ge k/2$ for each $\nu, \nu' \in \mathcal{V}_k$ with $\nu \ne \nu'$, such that $|\mathcal{V}_k| \ge \lceil\exp(k/16)\rceil$ and
\[
\frac{1}{|\mathcal{V}_k|}\sum_{\nu \in \mathcal{V}_k} \nu\nu^\top \preceq 25\, I_{k\times k}.
\]

See Appendix D.2 for the proof. For a given $k \le d$, we extend the set $\mathcal{V}_k \subseteq \mathbb{R}^k$ to a subset of $\mathbb{R}^d$ by setting $\mathcal{V} = \mathcal{V}_k \times \{0\}^{d-k}$. For a parameter $\delta \in (0, 1/2]$ to be chosen, we define a family of probability distributions $\{P_\nu\}_{\nu \in \mathcal{V}}$ constructively. In particular, the random vector $X \sim P_\nu$ (a single observation) is formed by the following procedure: choose an index $j \in \{1, \ldots, k\}$ uniformly at random and set
\[
X = \begin{cases} \phantom{-}r e_j & \text{with probability } \frac{1 + \delta\nu_j}{2} \\ -r e_j & \text{with probability } \frac{1 - \delta\nu_j}{2}. \end{cases} \tag{56}
\]
By construction, these distributions have mean vectors
\[
\theta_\nu := \mathbb{E}_{P_\nu}[X] = \frac{\delta r}{k}\,\nu.
\]
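The mean computation for the scheme (56) can be verified by exhaustively enumerating its $2k$ outcomes. The helper below (names are ours) uses exact rational arithmetic so the check is not subject to floating-point error:

```python
from fractions import Fraction as F

def mean_of_scheme(nu, r, delta, k):
    # Enumerate the 2k outcomes of the sampling scheme (56) and return E[X] exactly:
    # index j is uniform on {1, ..., k}; X = +/- r e_j with probabilities (1 +/- delta nu_j)/2.
    mean = [F(0)] * len(nu)
    for j in range(k):
        p_plus = F(1, k) * (1 + delta * nu[j]) / 2    # X = +r e_j
        p_minus = F(1, k) * (1 - delta * nu[j]) / 2   # X = -r e_j
        mean[j] += r * (p_plus - p_minus)
    return mean

k, r, delta = 4, 1, F(1, 3)
nu = [1, -1, 1, 1] + [0, 0]                 # a vertex of V_k, padded to dimension d = 6
expected = [delta * r * F(v, k) for v in nu]  # the claimed mean (delta r / k) nu
assert mean_of_scheme(nu, r, delta, k) == expected
```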
Consequently, given the properties of the packing $\mathcal{V}$, we have $X \in \mathbb{B}_1(r)$ with probability 1 and $\|\theta_\nu - \theta_{\nu'}\|_2^2 \ge r^2\delta^2/k$. Thus the mean vectors $\{\theta_\nu\}_{\nu \in \mathcal{V}}$ provide us with an $r\delta/\sqrt{k}$-packing of the ball.

Upper bounding the mutual information: Our next step is to bound the mutual information $I(Z_1, \ldots, Z_n; V)$ when the observations $X$ come from the distribution (56) and $V$ is uniform on the set $\mathcal{V}$. We have the following lemma, which applies so long as the channel $Q$ is non-interactive and $\alpha$-locally private (2). See Appendix E.1 for the proof.

Lemma 7. Fix $k \in \{1, \ldots, d\}$. Let $Z_i$ be $\alpha$-locally differentially private for $X_i$, and let $X$ be sampled according to the distribution (56) conditional on $V = \nu$. Then
\[
I(Z_1, \ldots, Z_n; V) \le n\,\frac{25 e^\alpha}{16}\,\frac{\delta^2}{k}\,\big(e^\alpha - e^{-\alpha}\big)^2.
\]

Applying testing inequalities: We now show how a combination of the hypercube packing specified by Lemma 6 and the sampling scheme (56) gives us our desired lower bound. Fix $k \le d$ and let $\mathcal{V} = \mathcal{V}_k \times \{0\}^{d-k}$ be the packing of $\{-1, 1\}^k \times \{0\}^{d-k}$ defined following Lemma 6. Combining Lemma 7 and the fact that the vectors $\theta_\nu$ provide an $r\delta/\sqrt{k}$-packing of cardinality at least $\exp(k/16)$, Fano's inequality implies that for any $k \in \{1, \ldots, d\}$,
\[
\mathfrak{M}_n\big(\theta(\mathcal{P}), \|\cdot\|_2^2, \alpha\big) \ge \frac{r^2\delta^2}{4k}\Bigg(1 - \frac{25\, n e^\alpha \delta^2 (e^\alpha - e^{-\alpha})^2/(16k) + \log 2}{k/16}\Bigg).
\]
Because of the one-dimensional mean-estimation lower bounds provided in Section 3.3.1, we may assume w.l.o.g. that $k \ge 32$. Setting $\delta^2_{n,\alpha,k} = \min\{1, k^2/(50\, n e^\alpha (e^\alpha - e^{-\alpha})^2)\}$, we obtain
\[
\mathfrak{M}_n\big(\theta(\mathcal{P}), \|\cdot\|_2^2, \alpha\big) \ge \frac{r^2\delta^2_{n,\alpha,k}}{4k}\Big(1 - \frac{1}{2} - \frac{\log 2}{2}\Big)
\ge c\, r^2 \min\Big\{\frac{1}{k}, \frac{k}{n e^\alpha (e^\alpha - e^{-\alpha})^2}\Big\}
\]
for a universal (numerical) constant $c$.
Since $e^\alpha(e^\alpha - e^{-\alpha})^2 < 16\alpha^2$ for $\alpha \in [0, 1]$, we obtain the lower bound
\[
\mathfrak{M}_n\big(\theta(\mathcal{P}), \|\cdot\|_2^2, \alpha\big) \ge c\, r^2 \max_{k \in [d]}\Big\{\min\Big\{\frac{1}{k}, \frac{k}{n\alpha^2}\Big\}\Big\}
\]
for $\alpha \in [0, 1]$ and a universal constant $c > 0$. Setting $k$ in the preceding display to be the integer in $\{1, \ldots, d\}$ nearest $\sqrt{n\alpha^2}$ gives the result of the proposition.

A.2 Proof of Proposition 4

Since the upper bound was established in Section 4.2.2, we focus on the lower bound.

Constructing a good packing: In this case, the packing set is very simple: set $\mathcal{V} = \{\pm e_j\}_{j=1}^d$, so that $|\mathcal{V}| = 2d$. Fix some $\delta \in [0, 1]$, and for $\nu \in \mathcal{V}$, define a distribution $P_\nu$ supported on $\mathcal{X} = \{-r, r\}^d$ via
\[
P_\nu(X = x) = \frac{1 + \delta\,\nu^\top x/r}{2^d}.
\]
In words, for $\nu = \pm e_j$, the coordinates of $X$ are independent and uniform on $\{-r, r\}$, except for coordinate $j$, for which $X_j = r$ with probability $1/2 + \delta\nu_j/2$ and $X_j = -r$ with probability $1/2 - \delta\nu_j/2$. With this scheme, we have $\theta(P_\nu) = r\delta\nu$, and since $\|\delta r\nu - \delta r\nu'\|_\infty \ge \delta r$ for $\nu \ne \nu'$, we have constructed a $\delta r$-packing in $\ell_\infty$-norm.

Upper bounding the mutual information: Let $V$ be drawn uniformly from the packing set $\mathcal{V} = \{\pm e_j\}_{j=1}^d$. With the sampling scheme of the previous paragraph, we may provide the following upper bound on the mutual information $I(Z_1, \ldots, Z_n; V)$ for any non-interactive private distribution (2):

Lemma 8. For any non-interactive $\alpha$-differentially private distribution $Q$, we have
\[
I(Z_1, \ldots, Z_n; V) \le n\,\frac{e^\alpha}{4d}\,\big(e^\alpha - e^{-\alpha}\big)^2\,\delta^2.
\]

See Appendix E.2 for a proof.

Applying testing inequalities: Finally, we turn to the application of the testing inequalities. Lemma 8, in conjunction with the standard testing reduction and Fano's inequality (9), implies that
\[
\mathfrak{M}_n\big(\theta(\mathcal{P}), \|\cdot\|_\infty, \alpha\big) \ge \frac{r\delta}{2}\Bigg(1 - \frac{e^\alpha\delta^2 n (e^\alpha - e^{-\alpha})^2/(4d) + \log 2}{\log(2d)}\Bigg).
\]
There is no loss of generality in assuming that $d \ge 2$, in which case the choice
\[
\delta^2 = \min\Big\{1, \frac{d\log(2d)}{e^\alpha(e^\alpha - e^{-\alpha})^2\, n}\Big\}
\]
yields the proposition.

A.3 Proof of Proposition 5

For this proposition, the construction of the packing and the lower bound used in the proof of Proposition 4 also apply. Under these packing and sampling procedures, note that the separation of the points $\theta(P_\nu) = r\delta\nu$ in $\ell_2$-norm is $r\delta$. It thus remains to prove the upper bound. In this case, we use the sampling strategy (26b), as in Proposition 4 and Section 4.2.2, noting that we may take the bound $B$ on $\|Z\|_\infty$ to be $B = c\sqrt{d}\,r/\alpha$ for a constant $c$. Let $\theta^*$ denote the true mean, assumed to be $s$-sparse. Now consider estimating $\theta^*$ by the $\ell_1$-regularized optimization problem
\[
\widehat{\theta} := \mathop{\rm argmin}_{\theta \in \mathbb{R}^d}\Bigg\{\frac{1}{2}\Big\|\frac{1}{n}\sum_{i=1}^n Z_i - \theta\Big\|_2^2 + \lambda\|\theta\|_1\Bigg\}.
\]
Defining the error vector $W = \theta^* - \frac{1}{n}\sum_{i=1}^n Z_i$, we claim that $\lambda \ge 2\|W\|_\infty$ implies that
\[
\|\widehat{\theta} - \theta^*\|_2 \le 3\lambda\sqrt{s}. \tag{57}
\]
This result is a consequence of standard results on sparse estimation (e.g., Negahban et al. [44, Theorem 1 and Corollary 1]). Now we note that if $W_i = \theta^* - Z_i$, then $W = \frac{1}{n}\sum_{i=1}^n W_i$, and by construction of the sampling mechanism (26b) we have $\|W_i\|_\infty \le c\sqrt{d}\,r/\alpha$ for a constant $c$. By Hoeffding's inequality and a union bound, we thus have for some (different) universal constant $c$ that
\[
\mathbb{P}\big(\|W\|_\infty \ge t\big) \le 2d\exp\Big(-\frac{c\, n\alpha^2 t^2}{r^2 d}\Big) \quad \text{for } t \ge 0.
\]
By taking $t^2 = r^2 d(\log(2d) + \epsilon^2)/(c\, n\alpha^2)$, we find that $\|W\|_\infty^2 \le r^2 d(\log(2d) + \epsilon^2)/(c\, n\alpha^2)$ with probability at least $1 - \exp(-\epsilon^2)$, which gives the claimed minimax upper bound with the choice $\lambda = c\sqrt{d\log d/(n\alpha^2)}$ in inequality (57).

A.4 Proof of inequality (30)

We prove the bound by an argument using the private form of Fano's inequality from Corollary 3.
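Reading the quadratic term of the $\ell_1$-regularized problem in Appendix A.3 as $\frac{1}{2}\|\frac{1}{n}\sum_i Z_i - \theta\|_2^2$ (the reading consistent with the condition $\lambda \ge 2\|W\|_\infty$), the problem decouples coordinatewise, and each coordinate is solved by soft-thresholding the noisy mean. A minimal sketch (function names ours), cross-checked against a brute-force grid search on one coordinate:

```python
def soft_threshold(v, lam):
    # Closed form of argmin_t 0.5*(v - t)^2 + lam*|t|.
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0

v, lam = 1.5, 1.0
obj = lambda t: 0.5 * (v - t) ** 2 + lam * abs(t)
grid = [i / 1000.0 - 2.0 for i in range(4001)]  # t ranging over [-2, 2]
brute = min(grid, key=obj)
assert abs(soft_threshold(v, lam) - brute) < 1e-2
assert soft_threshold(-0.3, 1.0) == 0.0         # inside the threshold -> exactly zero
```

The thresholding explains why the estimator is exactly sparse: coordinates of the noisy mean smaller than $\lambda$ in magnitude are set to zero.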
The proof makes use of the classical Varshamov–Gilbert bound (e.g., [53, Lemma 4]):

Lemma 9 (Varshamov–Gilbert). There is a packing $\mathcal{V}$ of the $d$-dimensional hypercube $\{-1, 1\}^d$ of size $|\mathcal{V}| \ge \exp(d/8)$ such that $\|\nu - \nu'\|_1 \ge d/2$ for all distinct pairs $\nu, \nu' \in \mathcal{V}$.

Now, let $\delta \in [0, 1]$, and let the distribution $P_\nu$ be a point mass at $\delta\nu/\sqrt{d}$. Then $\theta(P_\nu) = \delta\nu/\sqrt{d}$ and $\|\theta(P_\nu) - \theta(P_{\nu'})\|_2^2 \ge \delta^2$. In addition, a calculation implies that if $M_1$ and $M_2$ are $d$-dimensional ${\rm Laplace}(\kappa)$ distributions with means $\theta_1$ and $\theta_2$, respectively, then
\[
D_{\rm kl}\big(M_1 \,\|\, M_2\big) = \sum_{j=1}^d \big(\exp(-\kappa|\theta_{1,j} - \theta_{2,j}|) + \kappa|\theta_{1,j} - \theta_{2,j}| - 1\big) \le \frac{\kappa^2}{2}\|\theta_1 - \theta_2\|_2^2.
\]
As a consequence, under our Laplacian sampling scheme for the $Z_i$ and with $V$ chosen uniformly from $\mathcal{V}$, we have
\[
I(Z_1, \ldots, Z_n; V) \le \frac{n}{|\mathcal{V}|^2}\sum_{\nu,\nu' \in \mathcal{V}} D_{\rm kl}\big(M_\nu \,\|\, M_{\nu'}\big)
\le \frac{n\alpha^2}{2d}\,\frac{1}{|\mathcal{V}|^2}\sum_{\nu,\nu' \in \mathcal{V}} \big\|(\delta/\sqrt{d})(\nu - \nu')\big\|_2^2 \le \frac{2n\alpha^2\delta^2}{d}.
\]
Now, applying Fano's inequality (9) in the context of the testing inequality (7), we find that
\[
\inf_{\widehat{\theta}}\sup_{\nu \in \mathcal{V}} \mathbb{E}_{P_\nu}\Big[\|\widehat{\theta}(Z_1, \ldots, Z_n) - \theta(P_\nu)\|_2^2\Big] \ge \frac{\delta^2}{4}\Bigg(1 - \frac{2n\alpha^2\delta^2/d + \log 2}{d/8}\Bigg).
\]
Based on our one-dimensional results in Proposition 1, we may assume w.l.o.g. that $d \ge 10$. Taking $\delta^2 = d^2/(48\, n\alpha^2)$ then implies the result (30).

B Proofs of multinomial estimation results

In this section, we prove the lower bounds in Proposition 6. Before proving the bounds, however, we outline our technique, which borrows from that in Section A, and which we also use to prove the lower bounds on density estimation. The outline is as follows:

(1) As in step (1) of Section A, our first step is a standard reduction using the sharper version of Assouad's method (Lemma 1) from estimation to a multiple binary hypothesis testing problem. Specifically, we perform an (essentially standard) reduction of the form (10).
(2) Having constructed appropriately separated binary hypothesis tests, we apply Theorem 3 via inequality (32) to control the testing error in the binary testing problem. Applying the theorem requires bounding certain suprema related to the covariance structure of randomly selected elements of $\mathcal{V} = \{-1, 1\}^d$, as in the arguments in Section A. In this case, though, the symmetry of the binary hypothesis testing problems eliminates the need for the carefully constructed packings of step A(2).

With this outline in mind, we turn to the proofs of inequalities (33) and (34). As we proved the upper bounds in Section 5.2.2, this section focuses on the argument for the lower bound. We provide the full proof for the mean-squared Euclidean error, after which we show how the result for the $\ell_1$-error follows.

Our first step is to provide a lower bound of the form (10), giving a Hamming separation for the squared error. To that end, fix $\delta \in [0, 1]$, and for simplicity, let us assume that $d$ is even. In this case, we set $\mathcal{V} = \{-1, 1\}^{d/2}$, and for $\nu \in \mathcal{V}$ let $P_\nu$ be the multinomial distribution with parameter
\[
\theta_\nu := \frac{1}{d}\mathbf{1} + \frac{\delta}{d}\begin{bmatrix} \nu \\ -\nu \end{bmatrix} \in \Delta_d.
\]
For any estimator $\widehat{\theta}$, by defining $\widehat{\nu}_j = {\rm sign}(\widehat{\theta}_j - 1/d)$ for $j \in [d/2]$, we have the lower bound
\[
\|\widehat{\theta} - \theta_\nu\|_2^2 \ge \frac{\delta^2}{d^2}\sum_{j=1}^{d/2} \mathbf{1}\{\widehat{\nu}_j \ne \nu_j\},
\]
so that by the sharper variant (32) of Assouad's Lemma, we obtain
\[
\max_{\nu \in \mathcal{V}} \mathbb{E}_{P_\nu}\big[\|\widehat{\theta} - \theta_\nu\|_2^2\big]
\ge \frac{\delta^2}{4d}\Bigg[1 - \Bigg(\frac{1}{2d}\sum_{j=1}^{d/2}\Big(D_{\rm kl}\big(M^n_{+j} \,\|\, M^n_{-j}\big) + D_{\rm kl}\big(M^n_{-j} \,\|\, M^n_{+j}\big)\Big)\Bigg)^{\frac{1}{2}}\Bigg]. \tag{58}
\]
Now we apply Theorem 3, which requires bounding sums of integrals $\int \gamma\,(dP_{+j} - dP_{-j})$, where $P_{+j}$ is defined in expression (31). We claim the following inequality:
\[
\sup_{\gamma \in \mathcal{B}_\infty(\mathcal{X})}\sum_{j=1}^{d/2}\Big(\int_{\mathcal{X}} \gamma(x)\big(dP_{+j}(x) - dP_{-j}(x)\big)\Big)^2 \le \frac{8\delta^2}{d}. \tag{59}
\]
Indeed, by construction $P_{+j}$ is the multinomial with parameter $\frac{1}{d}\mathbf{1} + \frac{\delta}{d}\big[e_j^\top\ \ {-e_j^\top}\big]^\top \in \Delta_d$, and similarly for $P_{-j}$, where $e_j \in \{0, 1\}^{d/2}$ denotes the $j$th standard basis vector. Abusing notation and identifying $\gamma$ with vectors $\gamma \in [-1, 1]^d$, we have
\[
\int_{\mathcal{X}} \gamma(x)\big(dP_{+j}(x) - dP_{-j}(x)\big) = \frac{2\delta}{d}\,\gamma^\top\begin{bmatrix} e_j \\ -e_j \end{bmatrix},
\]
whence we find
\[
\sum_{j=1}^{d/2}\Big(\int_{\mathcal{X}} \gamma(x)\big(dP_{+j}(x) - dP_{-j}(x)\big)\Big)^2
= \frac{4\delta^2}{d^2}\,\gamma^\top\sum_{j=1}^{d/2}\begin{bmatrix} e_j \\ -e_j \end{bmatrix}\begin{bmatrix} e_j \\ -e_j \end{bmatrix}^\top\gamma
= \frac{4\delta^2}{d^2}\,\gamma^\top\begin{bmatrix} I & -I \\ -I & I \end{bmatrix}\gamma \le \frac{8\delta^2}{d},
\]
because the operator norm of the matrix is bounded by 2. This gives the claim (59).

Substituting the bound (59) into the bound (58) via Theorem 3, we obtain
\[
\max_{\nu \in \mathcal{V}} \mathbb{E}_{P_\nu}\big[\|\widehat{\theta} - \theta_\nu\|_2^2\big] \ge \frac{\delta^2}{4d}\Big[1 - \big(4n(e^\alpha - 1)^2\delta^2/d^2\big)^{\frac{1}{2}}\Big].
\]
Choosing $\delta^2 = \min\{1, d^2/(16n(e^\alpha - 1)^2)\}$ gives the lower bound
\[
\mathfrak{M}_n\big(\Delta_d, \|\cdot\|_2^2, \alpha\big) \ge \min\Big\{\frac{1}{4d}, \frac{d}{64n(e^\alpha - 1)^2}\Big\}.
\]
To complete the proof, we note that we can prove the preceding lower bound for any even $d_0 \in \{2, \ldots, d\}$; this requires choosing $\nu \in \mathcal{V} = \{-1, 1\}^{d_0/2}$ and constructing the multinomial vectors
\[
\theta_\nu = \frac{1}{d_0}\begin{bmatrix} \mathbf{1}_{d_0} \\ 0_{d-d_0} \end{bmatrix} + \frac{\delta}{d_0}\begin{bmatrix} \nu \\ -\nu \\ 0_{d-d_0} \end{bmatrix} \in \Delta_d,
\]
where $\mathbf{1}_{d_0} = [1\ 1\ \cdots\ 1]^\top \in \mathbb{R}^{d_0}$. Repeating the proof mutatis mutandis gives the bound
\[
\mathfrak{M}_n\big(\Delta_d, \|\cdot\|_2^2, \alpha\big) \ge \max_{d_0 \in \{2, 4, \ldots, 2\lfloor d/2\rfloor\}}\min\Big\{\frac{1}{4d_0}, \frac{d_0}{64n(e^\alpha - 1)^2}\Big\}.
\]
Choosing $d_0$ to be the even integer closest to $\sqrt{n\alpha^2}$ in $\{1, \ldots, d\}$ and noting that $(e^\alpha - 1)^2 \le 3\alpha^2$ for $\alpha \in [0, 1]$ gives the claimed result (33).

In the case of measuring error in the $\ell_1$-norm, the proof is completely identical, except that we have the separation $\|\widehat{\theta} - \theta_\nu\|_1 \ge (\delta/d)\sum_{j=1}^{d/2}\mathbf{1}\{\widehat{\nu}_j \ne \nu_j\}$, and thus inequality (58) holds with the initial multiplier $\delta^2/(4d)$ replaced by $\delta/(4d)$.
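The operator-norm claim for the block matrix $M = \big[\begin{smallmatrix} I & -I \\ -I & I \end{smallmatrix}\big]$ follows from the identity $M^2 = 2M$, which forces every eigenvalue of the symmetric matrix $M$ into $\{0, 2\}$. A small check of the identity (the block size $k$ is illustrative):

```python
def block_matrix(k):
    # The 2k x 2k matrix [[I, -I], [-I, I]] appearing in the display above.
    n = 2 * k
    M = [[0] * n for _ in range(n)]
    for i in range(k):
        M[i][i] = 1
        M[k + i][k + i] = 1
        M[i][k + i] = -1
        M[k + i][i] = -1
    return M

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

k = 3
M = block_matrix(k)
# M^2 = 2M, so eigenvalues are 0 or 2 and the operator norm is exactly 2.
assert matmul(M, M) == [[2 * e for e in row] for row in M]
```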
Parallel reasoning to the $\ell_2^2$ case then gives the minimax lower bound
\[
\mathfrak{M}_n\big(\Delta_d, \|\cdot\|_1, \alpha\big) \ge \frac{\delta}{4d_0}\Big[1 - \big(4n(e^\alpha - 1)^2\delta^2/d_0^2\big)^{\frac{1}{2}}\Big]
\]
for any even $d_0 \in \{2, \ldots, d\}$. Choosing $\delta^2 = \min\{1, d_0^2/(16n(e^\alpha - 1)^2)\}$ gives the claim (34).

C Proofs of density estimation results

In this section, we provide the proofs of the results stated in Section 5.3 on density estimation. We defer the proofs of more technical results to later appendices. Throughout all proofs, we use $c$ to denote a universal constant whose value may change from line to line.

Figure 3. Panel (a): illustration of the 1-Lipschitz continuous bump function $g_1$ used to pack $\mathcal{F}_\beta$ when $\beta = 1$. Panel (b): bump function $g_2$ with $|g_2''(x)| \le 1$ used to pack $\mathcal{F}_\beta$ when $\beta = 2$.

C.1 Proof of Proposition 7

As with our proof for multinomial estimation, the argument follows the general outline described at the beginning of Section B. We remark that our proof is based on an explicit construction of densities identified with corners of the hypercube, a more classical approach than the global metric entropy approach of Yang and Barron [52] (cf. [53]). We use the local packing approach since it is better suited to the privacy constraints and information contractions that we have developed. In comparison with our proofs of previous propositions, the construction of a suitable packing of $\mathcal{F}_\beta$ is somewhat more challenging: the identification of densities with finite-dimensional vectors, which we require for our application of Theorem 3, is not immediately obvious. In all cases, we guarantee that our density functions $f$ belong to the trigonometric Sobolev space, so we may work directly with smooth density functions $f$.
Constructing well-separated densities: We begin by describing a standard framework for defining local packings of density functions. Let $g_\beta : [0, 1] \to \mathbb{R}$ be a function satisfying the following properties:

(a) The function $g_\beta$ is $\beta$-times differentiable with $0 = g_\beta^{(i)}(0) = g_\beta^{(i)}(1/2) = g_\beta^{(i)}(1)$ for all $i < \beta$.

(b) The function $g_\beta$ is centered with $\int_0^1 g_\beta(x)\, dx = 0$, and there exist constants $c, c_{1/2} > 0$ such that
\[
\int_0^{1/2} g_\beta(x)\, dx = -\int_{1/2}^1 g_\beta(x)\, dx = c_{1/2} \quad \text{and} \quad \int_0^1 \big(g_\beta^{(i)}(x)\big)^2\, dx \ge c \quad \text{for all } i < \beta.
\]

(c) The function $g_\beta$ is non-negative on $[0, 1/2]$ and non-positive on $[1/2, 1]$, and Lebesgue measure is absolutely continuous with respect to the measures $G_j$, $j = 1, 2$, given by
\[
G_1(A) = \int_{A \cap [0, 1/2]} g_\beta(x)\, dx \quad \text{and} \quad G_2(A) = -\int_{A \cap [1/2, 1]} g_\beta(x)\, dx. \tag{60}
\]

(d) Lastly, for almost every $x \in [0, 1]$, we have $|g_\beta^{(\beta)}(x)| \le 1$ and $|g_\beta(x)| \le 1$.

As illustrated in Figure 3, the functions $g_\beta$ are smooth "bump" functions.

Fix a positive integer $k$ (to be specified in the sequel). Our first step is to construct a family of "well-separated" densities for which we can reduce the density estimation problem to one of identifying corners of a hypercube, which allows application of Lemma 1. Specifically, we must exhibit a condition similar to the separation condition (10). For each $j \in \{1, \ldots, k\}$, define the function
\[
g_{\beta,j}(x) := \frac{1}{k^\beta}\, g_\beta\Big(k\Big(x - \frac{j-1}{k}\Big)\Big)\,\mathbf{1}\Big\{x \in \Big[\frac{j-1}{k}, \frac{j}{k}\Big]\Big\}.
\]
Based on this definition, we define the family of densities
\[
\Big\{f_\nu := 1 + \sum_{j=1}^k \nu_j g_{\beta,j} \ \text{ for } \nu \in \mathcal{V}\Big\} \subseteq \mathcal{F}_\beta. \tag{61}
\]
It is a standard fact [53, 49] that for any $\nu \in \mathcal{V}$, the function $f_\nu$ is $\beta$-times differentiable and satisfies $|f_\nu^{(\beta)}(x)| \le 1$ for all $x$.
Now, based on some density $f \in \mathcal{F}_\beta$, let us define the sign vector $v(f) \in \{-1,1\}^k$ to have entries
\[
v_j(f) := \mathop{\rm argmin}_{s \in \{-1,1\}} \int_{[\frac{j-1}{k}, \frac{j}{k}]} \big(f(x) - s\, g_{\beta,j}(x)\big)^2\,dx.
\]
Then by construction of the $g_\beta$ and $v$, we have for a numerical constant $c$ (whose value may depend on $\beta$) that
\[
\|f - f_\nu\|_2^2 \ge c \sum_{j=1}^k 1\{v_j(f) \neq \nu_j\} \int_{[\frac{j-1}{k}, \frac{j}{k}]} (g_{\beta,j}(x))^2\,dx
= \frac{c}{k^{2\beta+1}} \sum_{j=1}^k 1\{v_j(f) \neq \nu_j\}.
\]
By inspection, this is the Hamming separation required in inequality (10), whence the sharper version (32) of Assouad's Lemma 1 gives the result
\[
\mathcal{M}_n\big(\mathcal{F}_\beta[1], \|\cdot\|_2^2, \alpha\big) \ge \frac{c}{k^{2\beta}}
\Bigg[1 - \Bigg(\frac{1}{4k} \sum_{j=1}^k \Big(D_{\rm kl}\big(M^n_{+j} \,\|\, M^n_{-j}\big) + D_{\rm kl}\big(M^n_{-j} \,\|\, M^n_{+j}\big)\Big)\Bigg)^{1/2}\Bigg], \tag{62}
\]
where we have defined $P_{\pm j}$ to be the probability distribution associated with the averaged densities $f_{\pm j} = 2^{1-k} \sum_{\nu : \nu_j = \pm 1} f_\nu$.

Applying divergence inequalities: Now we must control the summed KL divergences. To do so, we note that by the construction (61), symmetry implies that
\[
f_{+j} = 1 + g_{\beta,j} \quad \mbox{and} \quad f_{-j} = 1 - g_{\beta,j} \quad \mbox{for each } j \in [k]. \tag{63}
\]
We then obtain the following result, which bounds the averaged KL divergences.

Lemma 10. For any $\alpha$-locally private conditional distribution $Q$, the summed KL divergences are bounded as
\[
\sum_{j=1}^k \Big(D_{\rm kl}\big(M^n_{+j} \,\|\, M^n_{-j}\big) + D_{\rm kl}\big(M^n_{-j} \,\|\, M^n_{+j}\big)\Big) \le \frac{4 c_{1/2}^2\, n (e^\alpha - 1)^2}{k^{2\beta+1}}.
\]

The proof of this lemma is fairly involved, so we defer it to Appendix E.3. We note that, for $\alpha \le 1$, we have $(e^\alpha - 1)^2 \le 3\alpha^2$, so we may replace the bound in Lemma 10 with the quantity $c n \alpha^2 / k^{2\beta+1}$ for a constant $c$. We remark that standard divergence bounds using Assouad's lemma [53, 49] provide a bound of roughly $n/k^{2\beta}$; our bound is thus essentially a factor of the "dimension" $k$ tighter. The remainder of the proof is an application of inequality (62).
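The remark that $(e^\alpha - 1)^2 \le 3\alpha^2$ for $\alpha \le 1$ is easy to sanity-check numerically. The snippet below is a minimal grid check of our own (not a substitute for the calculus argument); since $(e^a - 1)/a$ is increasing in $a$, the worst case is $\alpha = 1$, where the ratio equals $(e - 1)^2 \approx 2.95 < 3$.

```python
import math

# Check (e^a - 1)^2 <= 3 a^2 on a fine grid of alpha in (0, 1].
# The ratio (e^a - 1)^2 / a^2 is increasing in a, so the maximum
# over the grid sits at alpha = 1, where it equals (e - 1)^2.
worst_ratio = 0.0
for i in range(1, 10_001):
    a = i / 10_000
    worst_ratio = max(worst_ratio, (math.exp(a) - 1) ** 2 / a ** 2)
```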
In particular, by applying Lemma 10, we find that for any $\alpha$-locally private channel $Q$, there are constants $c_0, c_1$ (whose values may depend on $\beta$) such that
\[
\mathcal{M}_n\big(\mathcal{F}_\beta, \|\cdot\|_2^2, Q\big) \ge \frac{c_0}{k^{2\beta}}
\left[1 - \left(\frac{c_1 n \alpha^2}{k^{2\beta+2}}\right)^{1/2}\right].
\]
Choosing $k_{n,\alpha,\beta} = \big(4 c_1 n \alpha^2\big)^{\frac{1}{2\beta+2}}$ ensures that the quantity inside the parentheses is at least $1/2$. Substituting for $k$ in the preceding display proves the proposition.

C.2 Proof of Proposition 8

Note that the operator $\Pi_k$ performs a Euclidean projection of the vector $(k/n) \sum_{i=1}^n Z_i$ onto the scaled probability simplex, thus projecting $\hat{f}$ onto the set of probability densities. Given the non-expansivity of Euclidean projection, this operation can only decrease the error $\|\hat{f} - f\|_2^2$. Consequently, it suffices to bound the error of the unprojected estimator; to reduce notational overhead we retain our previous notation of $\hat{\theta}$ for the unprojected version. Using this notation, we have
\[
\mathbb{E}\big[\|\hat{f} - f\|_2^2\big] \le \sum_{j=1}^k \mathbb{E}_f\Bigg[\int_{\frac{j-1}{k}}^{\frac{j}{k}} (f(x) - \hat{\theta}_j)^2\,dx\Bigg].
\]
By expanding this expression and noting that the independent noise variables $W_{ij} \sim \mathop{\rm Laplace}(\alpha/2)$ have zero mean, we obtain
\[
\mathbb{E}\big[\|\hat{f} - f\|_2^2\big]
\le \sum_{j=1}^k \mathbb{E}_f\Bigg[\int_{\frac{j-1}{k}}^{\frac{j}{k}} \Big(f(x) - \frac{k}{n}\sum_{i=1}^n [e_k(X_i)]_j\Big)^2 dx\Bigg]
+ \sum_{j=1}^k \int_{\frac{j-1}{k}}^{\frac{j}{k}} \mathbb{E}\Bigg[\Big(\frac{k}{n}\sum_{i=1}^n W_{ij}\Big)^2\Bigg] dx
= \sum_{j=1}^k \int_{\frac{j-1}{k}}^{\frac{j}{k}} \mathbb{E}_f\Bigg[\Big(f(x) - \frac{k}{n}\sum_{i=1}^n [e_k(X_i)]_j\Big)^2\Bigg] dx + \frac{4k^2}{n\alpha^2}. \tag{64}
\]
Next we bound the error term inside the expectation (64). Defining $p_j := P_f(X \in \mathcal{X}_j) = \int_{\mathcal{X}_j} f(x)\,dx$, we have
\[
k\,\mathbb{E}_f\big[[e_k(X)]_j\big] = k p_j = k \int_{\mathcal{X}_j} f(x)\,dx \in \Big[f(x) - \frac{1}{k},\, f(x) + \frac{1}{k}\Big] \quad \mbox{for any } x \in \mathcal{X}_j,
\]
by the Lipschitz continuity of $f$. Thus, expanding the bias and variance of the integrated expectation above, we find that
\[
\mathbb{E}_f\Bigg[\Big(f(x) - \frac{k}{n}\sum_{i=1}^n [e_k(X_i)]_j\Big)^2\Bigg]
\le \frac{1}{k^2} + \mathop{\rm Var}\Big(\frac{k}{n}\sum_{i=1}^n [e_k(X_i)]_j\Big).
\]
Expanding the variance term,
\[
\frac{1}{k^2} + \mathop{\rm Var}\Big(\frac{k}{n}\sum_{i=1}^n [e_k(X_i)]_j\Big)
= \frac{1}{k^2} + \frac{k^2}{n} \mathop{\rm Var}\big([e_k(X)]_j\big)
= \frac{1}{k^2} + \frac{k^2}{n}\, p_j (1 - p_j).
\]
Recalling the inequality (64), we obtain
\[
\mathbb{E}_f\big[\|\hat{f} - f\|_2^2\big]
\le \sum_{j=1}^k \int_{\frac{j-1}{k}}^{\frac{j}{k}} \Big(\frac{1}{k^2} + \frac{k^2}{n} p_j(1 - p_j)\Big) dx + \frac{4k^2}{n\alpha^2}
= \frac{1}{k^2} + \frac{4k^2}{n\alpha^2} + \frac{k}{n} \sum_{j=1}^k p_j(1 - p_j).
\]
Since $\sum_{j=1}^k p_j = 1$, we find that
\[
\mathbb{E}_f\big[\|\hat{f} - f\|_2^2\big] \le \frac{1}{k^2} + \frac{4k^2}{n\alpha^2} + \frac{k}{n},
\]
and choosing $k = (n\alpha^2)^{1/4}$ yields the claim.

C.3 Proof of Proposition 9

We begin by fixing $k \in \mathbb{N}$; we will optimize the choice of $k$ shortly. Recall that, since $f \in \mathcal{F}_\beta[C]$, we have $f = \sum_{j=1}^\infty \theta_j \varphi_j$ for $\theta_j = \int f \varphi_j$. Thus we may define $\bar{Z}_j = \frac{1}{n} \sum_{i=1}^n Z_{i,j}$ for each $j \in \{1, \ldots, k\}$, and we have
\[
\|\hat{f} - f\|_2^2 = \sum_{j=1}^k (\theta_j - \bar{Z}_j)^2 + \sum_{j=k+1}^\infty \theta_j^2.
\]
Since $f \in \mathcal{F}_\beta[C]$, we are guaranteed that $\sum_{j=1}^\infty j^{2\beta} \theta_j^2 \le C^2$, and hence
\[
\sum_{j > k} \theta_j^2 = \sum_{j > k} \frac{j^{2\beta} \theta_j^2}{j^{2\beta}} \le \frac{1}{k^{2\beta}} \sum_{j > k} j^{2\beta} \theta_j^2 \le \frac{C^2}{k^{2\beta}}.
\]
For the indices $j \le k$, we note that by assumption, $\mathbb{E}[Z_{i,j}] = \int \varphi_j f = \theta_j$, and since $|Z_{i,j}| \le B$, we have
\[
\mathbb{E}\big[(\theta_j - \bar{Z}_j)^2\big] = \frac{1}{n} \mathop{\rm Var}(Z_{1,j}) \le \frac{B^2}{n} = \frac{B_0^2 c_k k}{n} \Big(\frac{e^\alpha + 1}{e^\alpha - 1}\Big)^2,
\]
where $c_k = \Omega(1)$ is the constant in expression (43). Putting together the pieces, the mean-squared $L^2$-error is upper bounded as
\[
\mathbb{E}_f\big[\|\hat{f} - f\|_2^2\big] \le c \Big(\frac{k^2}{n\alpha^2} + \frac{1}{k^{2\beta}}\Big),
\]
where $c$ is a constant depending on $B_0$, $c_k$, and $C$. Choose $k = (n\alpha^2)^{1/(2\beta+2)}$ to complete the proof.

C.4 Insufficiency of Laplace noise for density estimation

Finally, we consider the insufficiency of standard Laplace noise addition for estimation in the setting of this section. Consider the vector $[\varphi_j(X_i)]_{j=1}^k \in [-B_0, B_0]^k$. To make this vector $\alpha$-differentially private by adding an independent Laplace noise vector $W \in \mathbb{R}^k$, we must take $W_j \sim \mathop{\rm Laplace}(\alpha/(B_0 k))$.
The natural orthogonal series estimator [e.g., 51] is to take $Z_i = [\varphi_j(X_i)]_{j=1}^k + W_i$, where the $W_i \in \mathbb{R}^k$ are independent Laplace noise vectors. We then use the density estimator (44), except that we use the Laplace-perturbed $Z_i$. However, this estimator suffers the following drawback:

Observation 1. Let $\hat{f} = \frac{1}{n} \sum_{i=1}^n \sum_{j=1}^k Z_{i,j} \varphi_j$, where the $Z_i$ are the Laplace-perturbed vectors of the previous paragraph. Assume the orthonormal basis $\{\varphi_j\}$ of $L^2([0,1])$ contains the constant function. Then there is a constant $c$ such that for any $k \in \mathbb{N}$, there is an $f \in \mathcal{F}_\beta[2]$ such that
\[
\mathbb{E}_f\big[\|f - \hat{f}\|_2^2\big] \ge c\, (n\alpha^2)^{-\frac{2\beta}{2\beta+3}}.
\]

Proof. We begin by noting that for $f = \sum_j \theta_j \varphi_j$, by definition of $\hat{f} = \sum_j \hat{\theta}_j \varphi_j$ we have
\[
\mathbb{E}\big[\|f - \hat{f}\|_2^2\big] = \sum_{j=1}^k \mathbb{E}\big[(\theta_j - \hat{\theta}_j)^2\big] + \sum_{j \ge k+1} \theta_j^2
= \sum_{j=1}^k \frac{B_0^2 k^2}{n\alpha^2} + \sum_{j \ge k+1} \theta_j^2
= \frac{B_0^2 k^3}{n\alpha^2} + \sum_{j \ge k+1} \theta_j^2.
\]
Without loss of generality, let us assume $\varphi_1 \equiv 1$ is the constant function. Then $\int \varphi_j = 0$ for all $j > 1$, and by defining the true function $f = \varphi_1 + (k+1)^{-\beta} \varphi_{k+1}$, we have $f \in \mathcal{F}_\beta[2]$ and $\int f = 1$, and moreover,
\[
\mathbb{E}\big[\|f - \hat{f}\|_2^2\big] \ge \frac{B_0^2 k^3}{n\alpha^2} + (k+1)^{-2\beta} \ge C_{\beta, B_0}\, (n\alpha^2)^{-\frac{2\beta}{2\beta+3}},
\]
where $C_{\beta,B_0}$ is a constant depending on $\beta$ and $B_0$. This final lower bound follows by minimizing over all $k$. (If $(k+1)^{-\beta} B_0 > 1$, we can rescale $\varphi_{k+1}$ by $B_0$ to achieve the same result and guarantee that $f \ge 0$.)

This lower bound shows that standard estimators based on adding Laplace noise to appropriate basis expansions of the data fail: there is a degradation in rate from $n^{-\frac{2\beta}{2\beta+2}}$ to $n^{-\frac{2\beta}{2\beta+3}}$. While this is not a formal proof that no approach based on Laplace perturbation can provide optimal convergence rates in our setting, it does suggest that finding such an estimator is non-trivial.
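In contrast with the Laplace-perturbation failure just described, the bin-indicator mechanism of Proposition 8 (Section C.2) is simple enough to simulate directly. The sketch below uses our own illustrative choices (uniform true density, $n = 20{,}000$, $\alpha = 1$), reads $\mathop{\rm Laplace}(\alpha/2)$ as noise with density proportional to $\exp(-\alpha|w|/2)$, i.e. scale $2/\alpha$, and omits the simplex projection $\Pi_k$, which can only decrease the error.

```python
import numpy as np

def private_histogram(samples, k, alpha, rng):
    """Locally private histogram density estimate on [0, 1]: each sample
    is encoded as a one-hot bin indicator e_k(X_i), perturbed with
    independent Laplace noise of scale 2/alpha, then averaged and
    scaled by k to give a piecewise-constant density estimate.
    (Simplex projection is skipped; it only reduces the L2 error.)"""
    n = len(samples)
    bins = np.clip((samples * k).astype(int), 0, k - 1)
    one_hot = np.zeros((n, k))
    one_hot[np.arange(n), bins] = 1.0
    noisy = one_hot + rng.laplace(scale=2.0 / alpha, size=(n, k))
    return k * noisy.mean(axis=0)

rng = np.random.default_rng(0)
n, alpha = 20_000, 1.0
k = max(1, int(round((n * alpha ** 2) ** 0.25)))  # bin choice from C.2
X = rng.uniform(0.0, 1.0, size=n)                 # true density f = 1
theta = private_histogram(X, k, alpha, rng)
# Squared L2 error of the piecewise-constant estimate against f = 1:
sq_err = float(np.sum((theta - 1.0) ** 2) / k)
```

With the bin choice $k \approx (n\alpha^2)^{1/4}$ from the proof, the dominant error term is the privacy noise contribution of order $k^2/(n\alpha^2)$, which is small at this sample size.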
D Packing set constructions

In this appendix, we collect proofs of the constructions of our packing sets.

D.1 Proof of Lemma 5

By the Varshamov-Gilbert bound [e.g., 53, Lemma 4], there is a packing $\mathcal{H}_d$ of the $d$-dimensional hypercube $\{-1,1\}^d$ of size $|\mathcal{H}_d| \ge \exp(d/8)$ satisfying $\|u - v\|_1 \ge d/2$ for all $u, v \in \mathcal{H}_d$ with $u \neq v$. For each $u \in \mathcal{H}_d$, set $\nu_u = u/\sqrt{d}$, so that $\|\nu_u\|_2 = 1$ and $\|\nu_u - \nu_v\|_2^2 \ge d/d = 1$ for $u \neq v \in \mathcal{H}_d$. Setting $\mathcal{V} = \{\nu_u \mid u \in \mathcal{H}_d\}$ gives the desired result.

D.2 Proof of Lemma 6

We use the probabilistic method [2], showing that for random draws from the Boolean hypercube, a collection of vectors as claimed in the lemma exists with positive probability. Consider a set of $N$ vectors $\nu^i \in \{-1,1\}^k$ sampled uniformly at random from the Boolean hypercube, and for a fixed $t > 0$, define the two "bad" events
\[
B_1 := \Big\{\exists\, i \neq j \ \Big|\ \|\nu^i - \nu^j\|_1 < k/2\Big\}
\quad \mbox{and} \quad
B_2(t) := \Bigg\{\frac{1}{N} \sum_{i=1}^N \nu^i (\nu^i)^\top \not\preceq (t+1) I_{k \times k}\Bigg\}.
\]
We begin by analyzing $B_1$. Letting $\{W_\ell\}_{\ell=1}^k$ denote a sequence of i.i.d. Bernoulli $\{0,1\}$ variables, for any $i \neq j$, the event $\{\|\nu^i - \nu^j\|_1 < k/2\}$ is equivalent to the event $\{\sum_{\ell=1}^k W_\ell < k/4\}$. Consequently, by combining the union bound with the Hoeffding bound, we find
\[
P(B_1) \le \binom{N}{2} P\big(\|\nu^i - \nu^j\|_1 < k/2\big) \le \binom{N}{2} \exp(-k/8). \tag{65}
\]
Turning to the event $B_2(t)$, we have $\frac{1}{N} \sum_{i=1}^N \nu^i(\nu^i)^\top \not\preceq (t+1) I_{k \times k}$ if and only if the maximum eigenvalue $\lambda_{\max}\big(\frac{1}{N}\sum_{i=1}^N \nu^i(\nu^i)^\top - I_{k \times k}\big)$ is larger than $t$. Using sharp versions of the Ahlswede-Winter inequalities [1] (see Corollary 4.2 in the paper [42]), we obtain
\[
P(B_2(t)) \le k \exp\Big(-\frac{N t^2}{k^2}\Big). \tag{66}
\]
Finally, combining the union bound with inequalities (65) and (66), we find that
\[
P\big(B_1 \cup B_2(t)\big) \le \frac{N(N-1)}{2} \exp(-k/8) + k \exp\Big(-\frac{N t^2}{k^2}\Big).
\]
By inspection, if we choose $t = 24$ and $N = \lceil \exp(k/16) \rceil$, the above bound is strictly less than 1, so a packing satisfying the constraints must exist.

E Information bounds

In this appendix, we collect the proofs of lemmas providing mutual information and KL-divergence bounds.

E.1 Proof of Lemma 7

Our strategy is to apply Theorem 2 to bound the mutual information. Without loss of generality, we may assume that $r = 1$, so the set $\mathcal{X} = \{\pm e_j\}_{j=1}^k$, where $e_j \in \mathbb{R}^d$. Thus, under the notation of Theorem 2, we may identify vectors $\gamma \in L^\infty(\mathcal{X})$ with vectors $\gamma \in \mathbb{R}^{2k}$. If we define $\bar{\nu} = \frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} \nu$ to be the mean element of the packing set, the linear functional $\varphi_\nu$ defined in Theorem 2 is
\[
\varphi_\nu(\gamma)
= \frac{1}{2k} \Bigg[\sum_{j=1}^k \gamma(e_j) \frac{1 + \nu_j \delta}{2} + \sum_{j=1}^k \gamma(-e_j) \frac{1 - \nu_j \delta}{2}\Bigg]
- \frac{1}{2k} \Bigg[\sum_{j=1}^k \gamma(e_j) \frac{1 + \bar{\nu}_j \delta}{2} + \sum_{j=1}^k \gamma(-e_j) \frac{1 - \bar{\nu}_j \delta}{2}\Bigg]
= \frac{1}{2k} \sum_{j=1}^k \Big[\frac{\delta}{2} \gamma(e_j)(\nu_j - \bar{\nu}_j) - \frac{\delta}{2} \gamma(-e_j)(\nu_j - \bar{\nu}_j)\Big]
= \frac{\delta}{4k}\, \gamma^\top \begin{bmatrix} I_{k \times k} & 0_{k \times (d-k)} \\ -I_{k \times k} & 0_{k \times (d-k)} \end{bmatrix} (\nu - \bar{\nu}).
\]
Define the matrix
\[
A := \begin{bmatrix} I_{k \times k} & 0_{k \times (d-k)} \\ -I_{k \times k} & 0_{k \times (d-k)} \end{bmatrix} \in \{-1, 0, 1\}^{2k \times d}.
\]
Then we have that
\[
\frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} \varphi_\nu(\gamma)^2
= \frac{\delta^2}{(4k)^2}\, \gamma^\top A \Bigg(\frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} (\nu - \bar{\nu})(\nu - \bar{\nu})^\top\Bigg) A^\top \gamma
= \frac{\delta^2}{(4k)^2}\, \gamma^\top A \Bigg(\frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} \nu\nu^\top - \bar{\nu}\bar{\nu}^\top\Bigg) A^\top \gamma
\le \frac{\delta^2}{(4k)^2}\, \gamma^\top A \Bigg(\frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} \nu\nu^\top\Bigg) A^\top \gamma
\le \frac{25\,\delta^2}{16 k^2}\, \gamma^\top A A^\top \gamma
= \Big(\frac{5\delta}{4k}\Big)^2 \gamma^\top \begin{bmatrix} I_{k\times k} & -I_{k\times k} \\ -I_{k\times k} & I_{k\times k} \end{bmatrix} \gamma. \tag{67}
\]
Here the final inequality used our assumption on the sum of outer products in $\mathcal{V}$.

We complete our proof using the bound (67). The operator norm of the matrix specified in (67) is 2. As a consequence, since we have the containment
\[
B_\infty = \big\{\gamma \in \mathbb{R}^{2k} : \|\gamma\|_\infty \le 1\big\} \subset \big\{\gamma \in \mathbb{R}^{2k} : \|\gamma\|_2^2 \le 2k\big\},
\]
we have the inequality
\[
\sup_{\gamma \in B_\infty} \frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} \varphi_\nu(\gamma)^2
\le \frac{25\,\delta^2}{16 k^2} \cdot 2 \cdot 2k = \frac{25\,\delta^2}{4k}.
\]
Applying Theorem 2 completes the proof.
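Returning briefly to the packing construction of Appendix D: the probabilistic argument in the proof of Lemma 6 is non-constructive, but it suggests a direct computational search, resampling until neither bad event occurs. The sketch below does this for an illustrative small $k$ (our own choice), with $t = 24$ and $N = \lceil \exp(k/16) \rceil$ as in the proof; the retry cap is also our assumption.

```python
import numpy as np

def find_packing(k, t=24, max_tries=1000, seed=0):
    """Search for N = ceil(exp(k/16)) sign vectors in {-1, 1}^k whose
    pairwise ell_1 distances are at least k/2 and whose empirical
    second moment (1/N) sum nu nu^T has largest eigenvalue at most
    t + 1, mirroring the bad events B_1 and B_2(t) of Lemma 6."""
    rng = np.random.default_rng(seed)
    N = int(np.ceil(np.exp(k / 16)))
    for _ in range(max_tries):
        V = rng.choice([-1.0, 1.0], size=(N, k))
        dists = np.abs(V[:, None, :] - V[None, :, :]).sum(axis=2)
        np.fill_diagonal(dists, np.inf)
        if dists.min() < k / 2:
            continue  # bad event B_1 occurred; resample
        if np.linalg.eigvalsh(V.T @ V / N).max() > t + 1:
            continue  # bad event B_2(t) occurred; resample
        return V
    raise RuntimeError("no packing found within max_tries")

V = find_packing(k=32)  # N = ceil(exp(2)) = 8 vectors in {-1, 1}^32
```

For moderate $k$ the bad events are rare, so the search typically succeeds on the first draw; the point of the lemma's careful union bound is that success probability stays positive even as $N$ grows exponentially in $k$.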
E.2 Proof of Lemma 8

It is no loss of generality to assume the radius $r = 1$. We use the notation of Theorem 2, recalling the linear functionals $\varphi_\nu : L^\infty(\mathcal{X}) \to \mathbb{R}$. Because the set $\mathcal{X} = \{-1,1\}^d$, we can identify vectors $\gamma \in L^\infty(\mathcal{X})$ with vectors $\gamma \in \mathbb{R}^{2^d}$. Moreover, we have (by construction) that
\[
\varphi_\nu(\gamma) = \sum_{x \in \{-1,1\}^d} \gamma(x) p_\nu(x) - \sum_{x \in \{-1,1\}^d} \gamma(x) p(x)
= \frac{1}{2^d} \sum_{x \in \mathcal{X}} \gamma(x) \big(1 + \delta \nu^\top x - 1\big)
= \frac{\delta}{2^d} \sum_{x \in \mathcal{X}} \gamma(x)\, \nu^\top x.
\]
For each $\nu \in \mathcal{V}$, we may construct a vector $u_\nu \in \{-1,1\}^{2^d}$, indexed by $x \in \{-1,1\}^d$, with
\[
u_\nu(x) = \nu^\top x = \begin{cases} 1 & \mbox{if } \nu = \pm e_j \mbox{ and } \mathop{\rm sign}(\nu_j) = \mathop{\rm sign}(x_j) \\ -1 & \mbox{if } \nu = \pm e_j \mbox{ and } \mathop{\rm sign}(\nu_j) \neq \mathop{\rm sign}(x_j). \end{cases}
\]
For $\nu = e_j$, we see that $u_{e_1}, \ldots, u_{e_d}$ are the first $d$ columns of the standard Hadamard transform matrix (and the $u_{-e_j}$ are their negatives). Then we have $\sum_{x \in \mathcal{X}} \gamma(x)\, \nu^\top x = \gamma^\top u_\nu$, so that $\varphi_\nu(\gamma) = \frac{\delta}{2^d} \gamma^\top u_\nu$ and $\varphi_\nu(\gamma)^2 = \frac{\delta^2}{4^d} \gamma^\top u_\nu u_\nu^\top \gamma$. Note also that $u_\nu u_\nu^\top = u_{-\nu} u_{-\nu}^\top$, and as a consequence we have
\[
\sum_{\nu \in \mathcal{V}} \varphi_\nu(\gamma)^2 = \frac{\delta^2}{4^d}\, \gamma^\top \sum_{\nu \in \mathcal{V}} u_\nu u_\nu^\top\, \gamma
= \frac{2\delta^2}{4^d}\, \gamma^\top \sum_{j=1}^d u_{e_j} u_{e_j}^\top\, \gamma. \tag{68}
\]
But now, studying the quadratic form (68), we note that the vectors $u_{e_j}$ are orthogonal. As a consequence, the vectors $u_{e_j}$ (up to scaling) are the only eigenvectors corresponding to positive eigenvalues of the positive semidefinite matrix $\sum_{j=1}^d u_{e_j} u_{e_j}^\top$. Thus, since the set
\[
B_\infty = \big\{\gamma \in \mathbb{R}^{2^d} : \|\gamma\|_\infty \le 1\big\} \subset \big\{\gamma \in \mathbb{R}^{2^d} : \|\gamma\|_2^2 \le 2^d\big\},
\]
we have via an eigenvalue calculation that
\[
\sup_{\gamma \in B_\infty} \sum_{\nu \in \mathcal{V}} \varphi_\nu(\gamma)^2
\le \frac{2\delta^2}{4^d} \sup_{\gamma : \|\gamma\|_2^2 \le 2^d} \gamma^\top \sum_{j=1}^d u_{e_j} u_{e_j}^\top\, \gamma
= \frac{2\delta^2}{4^d}\, \|u_{e_1}\|_2^4 = 2\delta^2,
\]
since $\|u_{e_j}\|_2^2 = 2^d$ for each $j$. Applying Theorem 2 and Corollary 4 completes the proof.

E.3 Proof of Lemma 10

This result relies on Theorem 3, along with a careful argument to understand the extreme points of $\gamma \in L^\infty([0,1])$ that we use when applying the result.
First, we take the packing $\mathcal{V} = \{-1,1\}^k$ and densities $f_\nu$ for $\nu \in \mathcal{V}$ as in the construction (61). Overall, our first step is to show that, for the purposes of applying Theorem 3, it is no loss of generality to identify $\gamma \in L^\infty([0,1])$ with vectors $\gamma \in \mathbb{R}^{2k}$, where $\gamma$ is constant on intervals of the form $[i/2k, (i+1)/2k]$. With this identification complete, we can then provide a bound on the correlation of any $\gamma \in B_\infty$ with the densities $f_{\pm j}$ defined in (63), which completes the proof.

With this outline in mind, let the sets $D_i$, $i \in \{1, 2, \ldots, 2k\}$, be defined as $D_i = [(i-1)/2k, i/2k)$, except that $D_{2k} = [(2k-1)/2k, 1]$, so the collection $\{D_i\}_{i=1}^{2k}$ forms a partition of the unit interval $[0,1]$. By construction of the densities $f_\nu$, the sign of $f_\nu - 1$ remains constant on each $D_i$. Let us define (for shorthand) the linear functionals $\varphi_j : L^\infty([0,1]) \to \mathbb{R}$ for each $j \in \{1, \ldots, k\}$ via
\[
\varphi_j(\gamma) := \int \gamma\, (dP_{+j} - dP_{-j})
= \sum_{i=1}^{2k} \int_{D_i} \gamma(x) \big(f_{+j}(x) - f_{-j}(x)\big)\,dx
= 2 \int_{D_{2j-1} \cup D_{2j}} \gamma(x)\, g_{\beta,j}(x)\,dx,
\]
where we recall the definitions (63) of the mixture densities $f_{\pm j} = 1 \pm g_{\beta,j}$.

Since the set $B_\infty$ from Theorem 3 is compact, convex, and Hausdorff, the Krein-Milman theorem [45, Proposition 1.2] guarantees that it is equal to the convex hull of its extreme points; moreover, since the functionals $\gamma \mapsto \varphi_j^2(\gamma)$ are convex, the supremum in Theorem 3 must be attained at the extreme points of $B_\infty([0,1])$. As a consequence, when applying the divergence bound
\[
\sum_{j=1}^k \Big(D_{\rm kl}\big(M^n_{+j} \,\|\, M^n_{-j}\big) + D_{\rm kl}\big(M^n_{-j} \,\|\, M^n_{+j}\big)\Big)
\le 2 n (e^\alpha - 1)^2 \sup_{\gamma \in B_\infty} \sum_{j=1}^k \varphi_j^2(\gamma), \tag{69}
\]
we can restrict our attention to $\gamma \in B_\infty$ for which $\gamma(x) \in \{-1,1\}$. Now we argue that it is no loss of generality to assume that $\gamma$, when restricted to $D_i$, is constant (apart from a measure zero set).
Fix $i \in [2k]$, and assume for the sake of contradiction that there exist sets $B_i, C_i \subset D_i$ such that $\gamma(B_i) = \{1\}$ and $\gamma(C_i) = \{-1\}$, while $\mu(B_i) > 0$ and $\mu(C_i) > 0$, where $\mu$ denotes Lebesgue measure.¹ We will construct vectors $\gamma^1, \gamma^2 \in B_\infty$ and a value $\lambda \in (0,1)$ such that
\[
\int_{D_i} \gamma(x)\, g_{\beta,j}(x)\,dx
= \lambda \int_{D_i} \gamma^1(x)\, g_{\beta,j}(x)\,dx + (1 - \lambda) \int_{D_i} \gamma^2(x)\, g_{\beta,j}(x)\,dx
\]
simultaneously for all $j \in [k]$, while on $D_i^c = [0,1] \setminus D_i$, we will have the equivalence $\gamma^1|_{D_i^c} \equiv \gamma^2|_{D_i^c} \equiv \gamma|_{D_i^c}$. Indeed, set $\gamma^1(D_i) = \{1\}$ and $\gamma^2(D_i) = \{-1\}$, otherwise setting $\gamma^1(x) = \gamma^2(x) = \gamma(x)$ for $x \notin D_i$. For the unique index $j \in [k]$ such that $[(j-1)/k, j/k] \supset D_i$, we define
\[
\lambda := \frac{\int_{B_i} g_{\beta,j}(x)\,dx}{\int_{D_i} g_{\beta,j}(x)\,dx}
\quad \mbox{so that} \quad
1 - \lambda = \frac{\int_{C_i} g_{\beta,j}(x)\,dx}{\int_{D_i} g_{\beta,j}(x)\,dx}.
\]
By the construction of the function $g_\beta$, the functions $g_{\beta,j}$ do not change sign on $D_i$, and the absolute continuity conditions on $g_\beta$ specified in equation (60) guarantee $0 < \lambda < 1$, since $\mu(B_i) > 0$ and $\mu(C_i) > 0$. We thus find that for any $j \in [k]$,
\[
\int_{D_i} \gamma(x)\, g_{\beta,j}(x)\,dx
= \int_{B_i} \gamma^1(x)\, g_{\beta,j}(x)\,dx + \int_{C_i} \gamma^2(x)\, g_{\beta,j}(x)\,dx
= \int_{B_i} g_{\beta,j}(x)\,dx - \int_{C_i} g_{\beta,j}(x)\,dx
= \lambda \int_{D_i} g_{\beta,j}(x)\,dx - (1 - \lambda) \int_{D_i} g_{\beta,j}(x)\,dx
= \lambda \int \gamma^1(x)\, g_{\beta,j}(x)\,dx + (1 - \lambda) \int \gamma^2(x)\, g_{\beta,j}(x)\,dx.
\]
(Notably, for $j$ such that $g_{\beta,j}$ is identically 0 on $D_i$, this equality is trivial.) By linearity and the strict convexity of the function $x \mapsto x^2$, we then find that, for the sets $E_j := D_{2j-1} \cup D_{2j}$,
\[
\sum_{j=1}^k \varphi_j^2(\gamma)
= \sum_{j=1}^k \Bigg(\int_{E_j} \gamma(x)\, g_{\beta,j}(x)\,dx\Bigg)^2
< \lambda \sum_{j=1}^k \Bigg(\int_{E_j} \gamma^1(x)\, g_{\beta,j}(x)\,dx\Bigg)^2
+ (1 - \lambda) \sum_{j=1}^k \Bigg(\int_{E_j} \gamma^2(x)\, g_{\beta,j}(x)\,dx\Bigg)^2.
\]
Thus one of $\gamma^1$ or $\gamma^2$ must have a larger objective value than $\gamma$.
This is our desired contradiction, which shows that (up to measure zero sets) any $\gamma$ attaining the supremum in the information bound (69) must be constant on each of the $D_i$.

¹For a function $f$ and set $A$, the notation $f(A)$ denotes the image $f(A) = \{f(x) \mid x \in A\}$.

Having shown that $\gamma$ is constant on each of the intervals $D_i$, we conclude that the supremum (69) can be reduced to a finite-dimensional problem over the subset
\[
\mathcal{B}_{1,2k} := \big\{u \in \mathbb{R}^{2k} \mid \|u\|_\infty \le 1\big\}
\]
of $\mathbb{R}^{2k}$. In terms of this subset, the supremum (69) can be rewritten as the upper bound
\[
\sup_{\gamma \in B_\infty} \sum_{j=1}^k \varphi_j(\gamma)^2
\le \sup_{\gamma \in \mathcal{B}_{1,2k}} \sum_{j=1}^k \Bigg(\gamma_{2j-1} \int_{D_{2j-1}} g_{\beta,j}(x)\,dx + \gamma_{2j} \int_{D_{2j}} g_{\beta,j}(x)\,dx\Bigg)^2.
\]
By construction of the function $g_\beta$, we have the equality
\[
\int_{D_{2j-1}} g_{\beta,j}(x)\,dx = -\int_{D_{2j}} g_{\beta,j}(x)\,dx
= \int_0^{\frac{1}{2k}} g_{\beta,1}(x)\,dx
= \int_0^{\frac{1}{2k}} \frac{1}{k^\beta}\, g_\beta(kx)\,dx
= \frac{c_{1/2}}{k^{\beta+1}}.
\]
This implies that
\[
\frac{1}{2 n (e^\alpha - 1)^2} \sum_{j=1}^k \Big(D_{\rm kl}\big(M^n_{+j}\,\|\,M^n_{-j}\big) + D_{\rm kl}\big(M^n_{-j}\,\|\,M^n_{+j}\big)\Big)
\le \sup_{\gamma \in B_\infty} \sum_{j=1}^k \varphi_j(\gamma)^2
\le \sup_{\gamma \in \mathcal{B}_{1,2k}} \sum_{j=1}^k \Big(\frac{c_{1/2}}{k^{\beta+1}}\, \gamma^\top (e_{2j-1} - e_{2j})\Big)^2
= \frac{c_{1/2}^2}{k^{2\beta+2}} \sup_{\gamma \in \mathcal{B}_{1,2k}} \gamma^\top \sum_{j=1}^k (e_{2j-1} - e_{2j})(e_{2j-1} - e_{2j})^\top\, \gamma, \tag{70}
\]
where $e_j \in \mathbb{R}^{2k}$ denotes the $j$th standard basis vector. Rewriting this using the Kronecker product $\otimes$, we have
\[
\sum_{j=1}^k (e_{2j-1} - e_{2j})(e_{2j-1} - e_{2j})^\top = I_{k \times k} \otimes \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \preceq 2 I_{2k \times 2k}.
\]
Combining this bound with our inequality (70), we obtain
\[
\sum_{j=1}^k \Big(D_{\rm kl}\big(M^n_{+j}\,\|\,M^n_{-j}\big) + D_{\rm kl}\big(M^n_{-j}\,\|\,M^n_{+j}\big)\Big)
\le 4 n (e^\alpha - 1)^2\, \frac{c_{1/2}^2}{k^{2\beta+2}} \sup_{\gamma \in \mathcal{B}_{1,2k}} \|\gamma\|_2^2
= \frac{4 c_{1/2}^2\, n (e^\alpha - 1)^2}{k^{2\beta+1}}.
\]

F Technical arguments

In this appendix, we collect proofs of technical lemmas and results needed for completeness.

F.1 Proof of Lemma 1

Fix an (arbitrary) estimator $\hat{\theta}$.
By assumption (10), we have
\[
\Phi\big(\rho(\hat{\theta}, \theta(P_\nu))\big) \ge 2\delta \sum_{j=1}^d 1\big\{[v(\hat{\theta})]_j \neq \nu_j\big\}.
\]
Taking expectations, we see that
\[
\sup_{P \in \mathcal{P}} \mathbb{E}_P\Big[\Phi\big(\rho(\hat{\theta}(Z_1, \ldots, Z_n), \theta(P))\big)\Big]
\ge \max_{\nu \in \mathcal{V}} \mathbb{E}_{P_\nu}\Big[\Phi\big(\rho(\hat{\theta}(Z_1, \ldots, Z_n), \theta_\nu)\big)\Big]
\ge \frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} \mathbb{E}_{P_\nu}\Big[\Phi\big(\rho(\hat{\theta}(Z_1, \ldots, Z_n), \theta_\nu)\big)\Big]
\ge \frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} 2\delta \sum_{j=1}^d \mathbb{E}_{P_\nu}\Big[1\big\{[v(\hat{\theta})]_j \neq \nu_j\big\}\Big],
\]
since the average is smaller than the maximum of a set, and using the separation assumption (10). Recalling the definition (31) of the mixtures $P_{\pm j}$, we swap the summation orders to see that
\[
\frac{1}{|\mathcal{V}|} \sum_{\nu \in \mathcal{V}} P_\nu\big([v(\hat{\theta})]_j \neq \nu_j\big)
= \frac{1}{|\mathcal{V}|} \sum_{\nu : \nu_j = 1} P_\nu\big([v(\hat{\theta})]_j \neq \nu_j\big)
+ \frac{1}{|\mathcal{V}|} \sum_{\nu : \nu_j = -1} P_\nu\big([v(\hat{\theta})]_j \neq \nu_j\big)
= \frac{1}{2} P_{+j}\big([v(\hat{\theta})]_j \neq \nu_j\big) + \frac{1}{2} P_{-j}\big([v(\hat{\theta})]_j \neq \nu_j\big).
\]
This gives the statement claimed in the lemma, while taking an infimum over all testing procedures $\psi : \mathcal{Z}^n \to \{-1, +1\}$ gives the claim (11).

F.2 Proof of unbiasedness for sampling strategy (26a)

We compute the expectation of a random variable $Z$ sampled according to the strategy (26a); that is, we compute $\mathbb{E}[Z \mid v]$ for a vector $v \in \mathbb{R}^d$. By scaling, it is no loss of generality to assume that $\|v\|_2 = 1$, and using the rotational symmetry of the $\ell_2$-ball, we see it is no loss of generality to assume that $v = e_1$, the first standard basis vector.

Let the function $s_d$ denote the surface area of the sphere in $\mathbb{R}^d$, so that
\[
s_d(r) = \frac{d\, \pi^{d/2}}{\Gamma(d/2 + 1)}\, r^{d-1}
\]
is the surface area of the sphere of radius $r$. (We use $s_d$ as a shorthand for $s_d(1)$ when convenient.) Then for a random variable $W$ sampled uniformly from the half of the $\ell_2$-ball with first coordinate $W_1 \ge 0$, symmetry implies that, by integrating over the radii of the ball,
\[
\mathbb{E}[W] = e_1\, \frac{2}{s_d} \int_0^1 s_{d-1}\big(\sqrt{1 - r^2}\big)\, r\, dr.
\]
Making the change of variables to spherical coordinates (we use $\phi$ as the angle), we have
\[
\frac{2}{s_d} \int_0^1 s_{d-1}\big(\sqrt{1 - r^2}\big)\, r\, dr
= \frac{2}{s_d} \int_0^{\pi/2} s_{d-1}(\cos\phi) \sin\phi\, d\phi
= \frac{2 s_{d-1}}{s_d} \int_0^{\pi/2} \cos^{d-2}(\phi) \sin(\phi)\, d\phi.
\]
Noting that $\frac{d}{d\phi} \cos^{d-1}(\phi) = -(d-1)\cos^{d-2}(\phi)\sin(\phi)$, we obtain
\[
\int_0^{\pi/2} \cos^{d-2}(\phi)\sin(\phi)\, d\phi
= \Big[-\frac{\cos^{d-1}(\phi)}{d-1}\Big]_0^{\pi/2} = \frac{1}{d-1},
\]
or that
\[
\mathbb{E}[W] = e_1\, \frac{(d-1)\pi^{\frac{d-1}{2}}\, \Gamma(\frac{d}{2}+1)}{d\, \pi^{\frac{d}{2}}\, \Gamma(\frac{d-1}{2}+1)} \cdot \frac{1}{d-1}
= e_1 \underbrace{\frac{\Gamma(\frac{d}{2}+1)}{\sqrt{\pi}\, d\, \Gamma(\frac{d-1}{2}+1)}}_{=:\, c_d}, \tag{71}
\]
where we define the constant $c_d$ to be the final ratio. Allowing again $\|v\|_2 \le r$, with the expression (71), we see that for our sampling strategy for $Z$, we have
\[
\mathbb{E}[Z \mid v] = \frac{v}{r}\, B\, c_d \Big(\frac{e^\alpha}{e^\alpha + 1} - \frac{1}{e^\alpha + 1}\Big)
= \frac{B\, c_d}{r} \cdot \frac{e^\alpha - 1}{e^\alpha + 1}\, v.
\]
Consequently, the choice
\[
B = \frac{e^\alpha + 1}{e^\alpha - 1} \cdot \frac{r}{c_d}
= \frac{e^\alpha + 1}{e^\alpha - 1} \cdot \frac{r\, \sqrt{\pi}\, d\, \Gamma(\frac{d-1}{2}+1)}{\Gamma(\frac{d}{2}+1)}
\]
yields $\mathbb{E}[Z \mid v] = v$. Moreover, we have
\[
\|Z\|_2 = B \le r\, \frac{e^\alpha + 1}{e^\alpha - 1} \cdot \frac{3\sqrt{\pi}\sqrt{d}}{2}
\]
by Stirling's approximation to the $\Gamma$-function. By noting that $(e^\alpha + 1)/(e^\alpha - 1) \le 3/\alpha$ for $\alpha \le 1$, we see that $\|Z\|_2 \le 8 r \sqrt{d}/\alpha$.

G Effects of differential privacy in non-compact spaces

In this appendix, we present a somewhat pathological example that demonstrates the effects of differential privacy in non-compact spaces. Let us assume only that $\theta \in \mathbb{R}$ and $\alpha < \infty$, and let $\mathcal{P}_\theta$ denote the collection of probability measures with variance 1 having $\theta$ as a mean. In contrast to the non-private case, where the risk of the sample mean scales as $1/n$, we obtain
\[
\mathcal{M}_n\big(\mathbb{R}, (\cdot)^2, \alpha\big) = \infty \tag{72}
\]
for all $n \in \mathbb{N}$. To see this, consider the Fano inequality version (9). Fix $\delta > 0$ and choose $\{\theta_1 = 0, \theta_2 = 2\delta, \ldots, \theta_N = 2N\delta\}$, where
\[
N = N(\delta, n) = \max\big\{\lceil \exp(64 n (e^\alpha - 1)^2) \rceil,\ 2^4\big\}.
\]
Then by applying Corollary 1, we have for $\mathcal{V} = [N]$ that
\[
\mathcal{M}_n\big(\mathbb{R}, (\cdot)^2, \alpha\big)
\ge \delta^2 \Bigg(1 - \frac{4 n (e^\alpha - 1)^2 \sum_{\nu, \nu' \in \mathcal{V}} \|P_\nu - P_{\nu'}\|_{\rm TV}^2 / |\mathcal{V}|^2 + \log 2}{\log N(\delta, n)}\Bigg).
\]
We have $\|P_\nu - P_{\nu'}\|_{\rm TV} \le 1$ for any two distributions $P_\nu$ and $P_{\nu'}$, which implies
\[
\mathcal{M}_n\big(\mathbb{R}, (\cdot)^2, \alpha\big)
\ge \delta^2 \Bigg(1 - \frac{16 n (e^\alpha - 1)^2 + \log 2}{\log N(\delta, n)}\Bigg)
\ge \delta^2 \Big(1 - \frac{1}{2}\Big) = \frac{\delta^2}{2}.
\]
Since $\delta > 0$ was arbitrary, this proves the infinite minimax risk bound (72). The construction achieving (72) is somewhat contrived, but it suggests that care is needed when designing differentially private inference procedures, and it shows that even in cases when it is possible to attain a parametric rate of convergence, there may be no (locally) differentially private inference procedure.

References

[1] R. Ahlswede and A. Winter. Strong converse for identification via quantum channels. IEEE Transactions on Information Theory, 48(3):569–579, March 2002.
[2] N. Alon and J. H. Spencer. The Probabilistic Method. Wiley-Interscience, second edition, 2000.
[3] V. Anantharam, A. Gohari, S. Kamath, and C. Nair. On maximal correlation, hypercontractivity, and the data processing inequality studied by Erkip and Cover. arXiv:1304.6133 [cs.IT], 2013. URL http://arxiv.org/abs/1304.6133.
[4] E. Arias-Castro, E. Candès, and M. Davenport. On the fundamental limits of adaptive sensing. IEEE Transactions on Information Theory, 59(1):472–481, 2013.
[5] P. Assouad. Deux remarques sur l'estimation. C. R. Académie Scientifique Paris Séries I Mathématiques, 296(23):1021–1024, 1983.
[6] B. Barak, K. Chaudhuri, C. Dwork, S. Kale, F. McSherry, and K. Talwar. Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In Proceedings of the 26th ACM Symposium on Principles of Database Systems, 2007.
[7] A. Beimel, K. Nissim, and E. Omri.
Distributed private data analysis: Simultaneously solving how and what. In Advances in Cryptology, volume 5157 of Lecture Notes in Computer Science, pages 451–468. Springer, 2008.
[8] A. Beimel, S. P. Kasiviswanathan, and K. Nissim. Bounds on the sample complexity for private learning and private data release. In Proceedings of the 7th Theory of Cryptography Conference, pages 437–454, 2010.
[9] L. Birgé. Approximation dans les espaces métriques et théorie de l'estimation. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 65:181–238, 1983.
[10] A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In Proceedings of the Fortieth Annual ACM Symposium on the Theory of Computing, 2008.
[11] P. Brucker. An O(n) algorithm for quadratic knapsack problems. Operations Research Letters, 3(3):163–166, 1984.
[12] V. Buldygin and Y. Kozachenko. Metric Characterization of Random Variables and Random Processes, volume 188 of Translations of Mathematical Monographs. American Mathematical Society, 2000.
[13] R. Carroll and P. Hall. Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Association, 83(404):1184–1186, 1988.
[14] R. Carroll, D. Ruppert, L. Stefanski, and C. Crainiceanu. Measurement Error in Nonlinear Models: A Modern Perspective. Chapman and Hall, second edition, 2006.
[15] K. Chaudhuri and D. Hsu. Convergence rates for differentially private statistical estimation. In Proceedings of the 29th International Conference on Machine Learning, 2012.
[16] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.
[17] T. M. Cover and J. A. Thomas. Elements of Information Theory, Second Edition.
Wiley, 2006.
[18] A. De. Lower bounds in differential privacy. In Proceedings of the Ninth Theory of Cryptography Conference, 2012. URL http://arxiv.org/abs/1107.2183.
[19] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Privacy aware learning. arXiv:1210.2085 [stat.ML], 2012. URL http://arxiv.org/abs/1210.2085.
[20] G. T. Duncan and D. Lambert. Disclosure-limited data dissemination. Journal of the American Statistical Association, 81(393):10–18, 1986.
[21] G. T. Duncan and D. Lambert. The risk of disclosure for microdata. Journal of Business and Economic Statistics, 7(2):207–217, 1989.
[22] C. Dwork and J. Lei. Differential privacy and robust statistics. In Proceedings of the Forty-First Annual ACM Symposium on the Theory of Computing, 2009.
[23] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In Advances in Cryptology (EUROCRYPT 2006), 2006.
[24] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference, pages 265–284, 2006.
[25] C. Dwork, G. N. Rothblum, and S. P. Vadhan. Boosting and differential privacy. In 51st Annual Symposium on Foundations of Computer Science, pages 51–60, 2010.
[26] S. Efromovich. Nonparametric Curve Estimation: Methods, Theory, and Applications. Springer-Verlag, 1999.
[27] A. V. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the Twenty-Second Symposium on Principles of Database Systems, pages 211–222, 2003.
[28] S. E. Fienberg, U. E. Makov, and R. J. Steele. Disclosure limitation using perturbation and related methods for categorical data. Journal of Official Statistics, 14(4):485–502, 1998.
[29] S. E. Fienberg, A.
Rinaldo, and X. Yang. Differential privacy and the risk-utility tradeoff for multi-dimensional contingency tables. In The International Conference on Privacy in Statistical Databases, 2010.
[30] S. R. Ganta, S. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. In Proceedings of the 14th ACM SIGKDD Conference on Knowledge and Data Discovery (KDD), 2008.
[31] L. J. Gleser. Estimation in a multivariate "errors in variables" regression model: large sample results. Annals of Statistics, 9(1):24–44, 1981.
[32] R. M. Gray. Entropy and Information Theory. Springer, 1990.
[33] R. Hall, A. Rinaldo, and L. Wasserman. Random differential privacy. arXiv:1112.2680 [stat.ME], 2011. URL http://arxiv.org/abs/1112.2680.
[34] M. Hardt and G. N. Rothblum. A multiplicative weights mechanism for privacy-preserving data analysis. In 51st Annual Symposium on Foundations of Computer Science, 2010.
[35] M. Hardt and K. Talwar. On the geometry of differential privacy. In Proceedings of the Forty-Second Annual ACM Symposium on the Theory of Computing, pages 705–714, 2010. URL http://arxiv.org/abs/0907.3754.
[36] R. Z. Has'minskii. A lower bound on the risks of nonparametric estimates of densities in the uniform metric. Theory of Probability and Applications, 23:794–798, 1978.
[37] S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.
[38] M. Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the Association for Computing Machinery, 45(6):983–1006, 1998.
[39] H. Ling and R. Li. Variable selection for partially linear models with measurement errors. Journal of the American Statistical Association, 104(485):234–248, 2009.
[40] P.-L. Loh and M. J. Wainwright.
High-dimensional regression with noisy and missing data: provable guarantees with nonconvexity. Annals of Statistics, 40(3):1637–1664, 2012.
[41] Y. Ma and R. Li. Variable selection in measurement error models. Bernoulli, 16(1):274–300, 2010.
[42] L. W. Mackey, M. I. Jordan, R. Y. Chen, B. Farrell, and J. A. Tropp. Matrix concentration inequalities via the method of exchangeable pairs. arXiv:1201.6002 [math.PR], 2012. URL http://arxiv.org/abs/1201.6002.
[43] A. McGregor, I. Mironov, T. Pitassi, O. Reingold, K. Talwar, and S. Vadhan. The limits of two-party differential privacy. In 51st Annual Symposium on Foundations of Computer Science, 2010.
[44] S. Negahban, P. Ravikumar, M. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27(4):538–557, 2012.
[45] R. R. Phelps. Lectures on Choquet's Theorem, Second Edition. Springer, 2001.
[46] B. I. P. Rubinstein, P. L. Bartlett, L. Huang, and N. Taft. Learning in a large function space: privacy-preserving mechanisms for SVM learning. Journal of Privacy and Confidentiality, 4(1):65–100, 2012.
[47] D. Scott. On optimal and data-based histograms. Biometrika, 66(3):605–610, 1979.
[48] A. Smith. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the Forty-Third Annual ACM Symposium on the Theory of Computing, 2011.
[49] A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer, 2009.
[50] S. Warner. Randomized response: a survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.
[51] L. Wasserman and S. Zhou. A statistical framework for differential privacy. Journal of the American Statistical Association, 105(489):375–389, 2010.
[52] Y. Yang and A. Barron.
Information-theoretic determination of minimax rates of convergence. Annals of Statistics, 27(5):1564–1599, 1999.
[53] B. Yu. Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam, pages 423–435. Springer-Verlag, 1997.
