Faster Algorithms for Testing under Conditional Sampling

Moein Falahatgar, Ashkan Jafarpour, Alon Orlitsky
mfalahat@ucsd.edu, ashkan@ucsd.edu, alon@ucsd.edu
Venkatadheeraj Pichapati, Ananda Theertha Suresh
dheerajpv7@gmail.com, asuresh@ucsd.edu
University of California, San Diego

September 26, 2018

Abstract

There has been considerable recent interest in distribution tests whose run-time and sample requirements are sublinear in the domain size k. We study two of the most important tests under the conditional-sampling model, where each query specifies a subset S of the domain, and the response is a sample drawn from S according to the underlying distribution. For identity testing, which asks whether the underlying distribution equals a specific given distribution or ǫ-differs from it, we reduce the known time and sample complexities from Õ(ǫ^{-4}) to Õ(ǫ^{-2}), thereby matching the information-theoretic lower bound. For closeness testing, which asks whether two distributions underlying observed data sets are equal or different, we reduce the existing complexity from Õ(ǫ^{-4} log^5 k) to an even sub-logarithmic Õ(ǫ^{-5} log log k), thus providing a better bound for an open problem posed at the Bertinoro Workshop on Sublinear Algorithms [Fisher, 2014].

Keywords: Property testing, conditional sampling, sublinear algorithms

1 Introduction

1.1 Background

The question of whether two probability distributions are the same or substantially different arises in many important applications. We consider two variations of this problem: identity testing, where one distribution is known while the other is revealed only via its samples, and closeness testing, where both distributions are revealed only via their samples. As its name suggests, identity testing arises when an identity needs to be verified.
For example, testing whether a given person generated an observed fingerprint, whether a specific author wrote an unattributed document, or whether a certain disease caused the symptoms experienced by a patient. In all these cases we may have sufficient information to accurately infer the true identity's underlying distribution, and we ask whether this distribution also generated newly observed samples. For example, multiple original high-quality fingerprints can be used to infer the fingerprint structure, which can then be used to decide whether it generated newly observed fingerprints.

Closeness testing arises when we try to discern whether the same entity generated two different data sets. For example, whether two fingerprints were generated by the same individual, two documents were written by the same author, or two patients suffer from the same disease. In these cases, we do not know the distribution underlying each data set, but would still like to determine whether they were generated by the same distribution or by two different ones.

Both problems have been studied extensively. In the hypothesis-testing framework, researchers studied the asymptotic test error as the number of samples tends to infinity [see Ziv, 1988, Unnikrishnan, 2012, and references therein]. We will follow a more recent, non-asymptotic approach. Two distributions p and q are ǫ-far if ||p − q||_1 ≥ ǫ. An identity test for a given distribution p considers independent samples from an unknown distribution q and declares either q = p or that they are ǫ-far. The test's error probability is the highest probability that it errs, maximized over q = p and over every q that is ǫ-far from p. Note that if p and q are neither the same nor ǫ-far, namely if 0 < ||q − p||_1 < ǫ, neither answer constitutes an error.
Let N_id(k, ǫ, δ) be the smallest number of samples needed to identity-test every k-element distribution with error probability ≤ δ. It can be shown that the sample complexity depends on δ only mildly: N_id(k, ǫ, δ) ≤ O(N_id(k, ǫ, 0.1)) · log(1/δ). Hence we focus on N_id(k, ǫ, 0.1), denoting it by N_id(k, ǫ).

This formulation was introduced by Goldreich and Ron [2000] who, motivated by testing graph expansion, considered identity testing of uniform distributions. Paninski [2008] showed that the sample complexity of identity testing for the uniform distribution is Θ(ǫ^{-2} √k). General identity testing was studied by Batu et al. [2001], who showed that N_id(k, ǫ) ≤ Õ(ǫ^{-2} √k), and recently Valiant and Valiant [2013] proved a matching lower bound, implying that N_id(k, ǫ) = Θ̃(ǫ^{-2} √k), where Õ, and later Θ̃ and Ω̃, hide multiplicative logarithmic factors.

Similarly, a closeness test takes independent samples from p and q and declares them either to be the same or ǫ-far. The test's error probability is the highest probability that it errs, maximized over q = p and over every pair p, q that is ǫ-far. Let N_cl(k, ǫ, δ) be the smallest number of samples that suffice to closeness-test every two k-element distributions with error probability ≤ δ. Here too it suffices to consider N_cl(k, ǫ) := N_cl(k, ǫ, 0.1). Closeness testing was first studied by Batu et al. [2000], who showed that N_cl(k, ǫ) ≤ Õ(ǫ^{-4} k^{2/3}). Recently, Valiant [2011] and Chan et al. [2014b] showed that N_cl(k, ǫ) = Θ(max(ǫ^{-4/3} k^{2/3}, ǫ^{-2} √k)).

1.2 Alternative models

The problem's elegance, intrinsic interest, and potential applications have led several researchers to consider scenarios where fewer samples may suffice.
Monotone, log-concave, and m-modal distributions were considered in Rubinfeld and Servedio [2009], Daskalakis et al. [2013], Diakonikolas et al. [2015], Chan et al. [2014a], and their sample complexity was shown to decline from a polynomial in k to a polynomial in log k. For example, identity testing of monotone distributions over k elements requires O(ǫ^{-5/2} √(log k)) samples, and identity testing of log-concave distributions over k elements requires Õ(ǫ^{-9/4}) samples, independent of the support size k.

A competitive framework that analyzes the optimality for every pair of distributions was considered in Acharya et al. [2012], Valiant and Valiant [2013]. Other related scenarios include classification [Acharya et al., 2012], outlier detection [Acharya et al., 2014b], testing collections of distributions [Levi et al., 2013], testing for the class of monotone distributions [Batu et al., 2004], testing for the class of Poisson Binomial distributions [Acharya and Daskalakis, 2015], and testing under different distance measures [Guha et al., 2009, Waggoner, 2015].

Another direction lowered the sample complexity for all distributions by considering more powerful queries. Perhaps the most natural is the conditional-sampling model, introduced independently in Chakraborty et al. [2013] and Canonne et al. [2014], where instead of obtaining samples from the entire support, each query specifies a query set S ⊆ [k] and the samples are then selected from S in proportion to their original probabilities; namely, element i is selected with probability

p_S(i) = p(i)/p(S) if i ∈ S, and 0 otherwise,

where p(S) is the probability of the set S under p. Conditional sampling is a natural extension of sampling, and Chakraborty et al. [2013] describe several scenarios where it may arise.
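As a concrete illustration of the model, here is a minimal Python sketch of a conditional-sampling oracle (the dictionary-based interface and the example distribution are our own hypothetical choices, not from the paper): it restricts p to the query set S and renormalizes.

```python
import random

def conditional_sample(p, S, rng=random):
    """Draw one sample from p conditioned on the query set S:
    element i in S is returned with probability p(i)/p(S)."""
    support = [i for i in S if p.get(i, 0) > 0]
    weights = [p[i] for i in support]
    p_S = sum(weights)  # probability of the set S under p
    if p_S == 0:
        raise ValueError("query set has zero probability under p")
    return rng.choices(support, weights=weights, k=1)[0]

# Hypothetical example: a skewed distribution over {0, 1, 2, 3}.
p = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
# Conditioned on S = {2, 3}, elements 2 and 3 are equally likely,
# even though both are rare under p itself.
```

The oracle reveals only relative probabilities within S, which is exactly what the tests in this paper exploit.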
Note that unlike other works in distribution testing, conditional-sampling algorithms can be adaptive, i.e., each query set can depend on previous queries and observed samples. This is similar in spirit to machine learning's popular active testing paradigm, where additional information is interactively requested for specific domain elements. Balcan et al. [2012] showed that various problems, such as testing unions of intervals and testing linear separators, benefit significantly from the active testing model.

Let N*_id(k, ǫ) and N*_cl(k, ǫ) be the number of samples required for identity and closeness testing under the conditional-sampling model. For identity testing, Canonne et al. [2014] showed that conditional sampling eliminates the dependence on k:

Ω(ǫ^{-2}) ≤ N*_id(k, ǫ) ≤ Õ(ǫ^{-4}).

For closeness testing, the same paper showed that

N*_cl(k, ǫ) ≤ Õ(ǫ^{-4} log^5 k).

Chakraborty et al. [2013] showed that N*_id(k, ǫ) ≤ poly(log* k, ǫ^{-1}) and designed a poly(log k, ǫ^{-1}) algorithm for testing any label-invariant property. They also derived an Ω(√(log log k)) lower bound for testing any label-invariant property. An open problem posed by Fisher [2014] asked for the sample complexity of closeness testing under conditional sampling; it was partly answered by Acharya et al. [2014a], who showed

N*_cl(k, 1/4) ≥ Ω(√(log log k)).

1.3 New results

Our first result resolves the sample complexity of identity testing with conditional sampling. For identity testing we show that

N*_id(k, ǫ) ≤ Õ(ǫ^{-2}).

Along with the information-theoretic lower bound above, this yields

N*_id(k, ǫ) = Θ̃(ǫ^{-2}).

For closeness testing, we address the open problem of Fisher [2014] by reducing the upper bound from log^5 k to log log k. We show that

N*_cl(k, ǫ) ≤ Õ(ǫ^{-5} log log k).
This very mild, double-logarithmic dependence on the alphabet size may be the first sub-poly-logarithmic growth rate of any non-constant-complexity property, and together with the lower bound in Acharya et al. [2014a] it shows that the dependence on k is indeed poly-double-logarithmic.

The rest of the paper is organized as follows. We first study identity testing in Section 2. In Section 3 we propose an algorithm for closeness testing. All the proofs are given in the Appendix.

2 Identity testing

In the following, p is a distribution over [k] := {1, ..., k}, p(i) is the probability of i ∈ [k], |S| is the cardinality of S ⊆ [k], p_S is the conditional distribution of p when S is queried, and n is the number of samples. For an element i, n(i) denotes the number of occurrences of i.

This section is organized as follows. We first motivate our identity test using restricted uniformity testing, a special case of identity testing. We then highlight two important aspects of our identity test: finding a distinguishing element i and finding a distinguishing set S. We then provide a simple algorithm for finding a distinguishing element. As we show, finding distinguishing sets is easy for testing near-uniform distributions, and we give an algorithm for testing near-uniform distributions. We later use the near-uniform case as a subroutine for testing any general distribution.

2.1 Example: restricted uniformity testing

Consider the class of distributions Q, where each q ∈ Q has k/2 elements with probability (1 + ǫ)/k and k/2 elements with probability (1 − ǫ)/k. Let p be the uniform distribution, namely p(i) = 1/k for all 1 ≤ i ≤ k. Hence for every q ∈ Q, ||p − q||_1 = ǫ.
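The class Q and its distance from uniform can be checked numerically. A small Python sketch (k and ǫ here are hypothetical values of our own choosing):

```python
k, eps = 10, 0.2

# Uniform p, and one member q of Q: half the elements raised to
# (1 + eps)/k, the other half lowered to (1 - eps)/k.
p = [1 / k] * k
q = [(1 + eps) / k] * (k // 2) + [(1 - eps) / k] * (k // 2)

# Each element contributes |p(i) - q(i)| = eps/k, so the l1 distance
# summed over all k elements is exactly eps.
l1 = sum(abs(pi - qi) for pi, qi in zip(p, q))
```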
We now motivate our test via the simpler restricted uniformity testing, a special case of identity testing where one determines whether a distribution is p or whether it belongs to the class Q. If we know two elements i, j such that q(i) = (1 + ǫ)/k > 1/k = p(i) and q(j) = (1 − ǫ)/k < 1/k = p(j), it suffices to consider the set S = {i, j}. For this set,

p_S(i) = p(i)/(p(i) + p(j)) = p_S(j) = p(j)/(p(i) + p(j)) = (1/k)/(2/k) = 1/2,

while

q_S(i) = q(i)/(q(i) + q(j)) = ((1 + ǫ)/k)/((1 + ǫ)/k + (1 − ǫ)/k) = (1 + ǫ)/2,

and similarly q_S(j) = (1 − ǫ)/2. Thus differentiating between p_S and q_S is the same as differentiating between B(1/2) and B((1 + ǫ)/2), for which a simple application of the Chernoff bound shows that O(ǫ^{-2}) samples suffice. Thus the sample complexity is O(ǫ^{-2}) if we know such a set S.

Next consider the same class of distributions Q, but without knowledge of the elements i and j. We can pick two elements uniformly at random from all possible (k choose 2) pairs. With probability ≥ 1/2, the two elements will have different probabilities as above, and again we could determine whether the distribution is uniform. Our success probability is half the success probability when S is known, but it can be increased by repeating the experiment several times and declaring the distribution to be non-uniform if one of the choices of i and j indicates non-uniformity. While the above example illustrates tests for the uniform distribution, for non-uniform distributions finding the elements i, j can be difficult.
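Returning to the known-(i, j) case above: the two-element test amounts to distinguishing B(1/2) from B((1 + ǫ)/2) using O(ǫ^{-2}) conditional samples. A Python sketch of this step (the sampler interface, the constant 128, and the midpoint threshold are our own illustrative choices, not the paper's):

```python
import random

def pair_uniformity_test(sample_is_i, eps, rng=random):
    """Declare 'uniform' or 'non-uniform' from conditional samples on
    S = {i, j}. sample_is_i(rng) returns 1 when the drawn element is i,
    so it is Bernoulli(1/2) under uniform p and Bernoulli((1+eps)/2)
    under q. We take n = 128/eps^2 samples and threshold the empirical
    frequency of i at the midpoint (1 + eps/2)/2."""
    n = int(128 / eps ** 2)
    freq = sum(sample_is_i(rng) for _ in range(n)) / n
    return "uniform" if freq < (1 + eps / 2) / 2 else "non-uniform"
```

By a Chernoff bound, at this sample size the empirical frequency concentrates well within ǫ/4 of its mean, so both cases are classified correctly with high probability.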
Instead of finding pairs of elements, we find a distinguishing element i and a distinguishing set S such that q(i) < p(i) ≈ p(S) < q(S). Thus, when conditional samples from S ∪ {i} are observed, the number of times i appears differs significantly between the two cases, and one can use Chernoff-type arguments to differentiate between same and diff. While previous authors have used similar methods, our main contribution is to design an information-theoretically near-optimal identity test.

Before we proceed to identity testing, we quantify the Chernoff-type arguments formally using Test-equal. It takes samples from two unknown binary distributions p, q (without loss of generality over {0, 1}), an error probability δ, and a parameter ǫ, and it tests whether p = q or

(p − q)² / ((p + q)(2 − p − q)) ≥ ǫ.

We use the chi-squared distance (p − q)²/((p + q)(2 − p − q)) as the measure of distance instead of ℓ1 since it captures the dependence on sample complexity more accurately. For example, consider two scenarios: p, q = B(1/2), B(1/2 + ǫ/2) or p, q = B(0), B(ǫ/2). In both cases ||p − q||_1 = ǫ, but the number of samples required to distinguish p and q in the first case is O(ǫ^{-2}), while in the second case O(ǫ^{-1}) suffice. The chi-squared distance correctly captures the sample complexity, as in the first case it is O(ǫ²) and in the second case it is O(ǫ). While several other simple hypothesis tests exist, the algorithm below has near-optimal sample complexity in terms of ǫ, δ.

Algorithm Test-equal
Input: chi-squared bound ǫ, error δ, distributions B(p) and B(q).
Parameters: n = O(1/ǫ).
Repeat 18 log(1/δ) times and output the majority:
1. Let n′ = poi(n) and n′′ = poi(n) be two independent Poisson variables with mean n.
2. Draw samples x_1, x_2, ..., x_{n′} from the first distribution and y_1, y_2, ...,
y_{n′′} from the second one.
3. Let n_1 = Σ_{i=1}^{n′} x_i and n_2 = Σ_{i=1}^{n′′} y_i.
4. If

((n_1 − n_2)² − n_1 − n_2)/(n_1 + n_2 − 1) + ((n_1 − n_2)² − n_1 − n_2)/(n′ + n′′ − n_1 − n_2 − 1) ≤ nǫ/2,

then output same, else diff.

Lemma 1 (Appendix B.1). If p = q, then Test-equal outputs same with probability ≥ 1 − δ. If (p − q)²/((p + q)(2 − p − q)) ≥ ǫ, it outputs diff with probability ≥ 1 − δ. Furthermore, the algorithm uses O((1/ǫ) · log(1/δ)) samples.

2.2 Finding a distinguishing element i

We now give an algorithm to find an element i such that p(i) > q(i). In the above-mentioned example, we could find such an element with probability ≥ 1/2 by randomly selecting i out of all elements. However, for some distributions this probability is much lower. For example, consider the following distributions p and q: p(1) = ǫ/2, p(2) = 0, and p(i) = (1 − ǫ/2)/(k − 2) for i ≥ 3; q(1) = 0, q(2) = ǫ/2, and q(i) = (1 − ǫ/2)/(k − 2) for i ≥ 3. Again note that ||p − q||_1 = ǫ. If we pick i at random, the chance that p(i) > q(i) is 1/k, very small for our purpose. A better way of selecting i is to sample according to p itself. For example, the probability of finding an element i such that p(i) > q(i) when sampling from p is ǫ/2 ≫ 1/k.

We quantify the above idea next using the following simple algorithm that picks elements such that p(i) > q(i). We first need the following definition. Without loss of generality, assume that the elements are ordered such that p(1) ≥ p(2) ≥ p(3) ≥ ... ≥ p(k).

Definition 2. For a distribution p, element i is α-heavy if Σ_{i′ ≥ i} p(i′) ≥ α.

As we show in the proofs, symbols that are heavy (α large) can easily be used as distinguishing symbols, and hence our goal is to choose symbols such that p(i) > q(i) and i is α-heavy for a large value of α.
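Definition 2 is straightforward to check when p is known. A small Python sketch (the example distribution is hypothetical; elements are assumed pre-sorted in non-increasing probability, as above):

```python
def is_alpha_heavy(p, i, alpha):
    """Check Definition 2: with p sorted non-increasingly
    (p[0] >= p[1] >= ...), element i is alpha-heavy iff the tail mass
    p[i] + p[i+1] + ... is at least alpha. Uses 0-based indexing for
    the 1-based ordering in the text."""
    return sum(p[i:]) >= alpha

# Hypothetical sorted distribution over 5 elements.
p = [0.4, 0.3, 0.15, 0.1, 0.05]
# The tail from element 2 (0-based) has mass 0.30, so that element is
# 0.3-heavy but not 0.4-heavy; earlier elements have larger tails.
```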
To this end, first consider an auxiliary result showing that if for some non-negative values a_i, Σ_i p(i) a_i > 0, then the following sampling algorithm picks an element x_i such that x_i is α_i-heavy and a_{x_i} ≥ β_i. While several other algorithms have similar properties, the following algorithm achieves a good trade-off between α and β (one of the tuples satisfies αβ = Ω̃(1)), and hence is useful in achieving near-optimal sample complexity.

Algorithm Find-element
Input: parameter ǫ, distribution p.
Parameters: m = 16/ǫ, β_j = jǫ/8, α_j = 1/(4j log(16/ǫ)).
1. Draw m independent samples x_1, x_2, ..., x_m from p.
2. Output the tuples (x_1, β_1, α_1), (x_2, β_2, α_2), ..., (x_m, β_m, α_m).

Lemma 3 (Appendix B.2). For 1 ≤ i ≤ k, let a_i be such that 0 ≤ a_i ≤ 2. If Σ_{i=1}^k p(i) a_i ≥ ǫ/4, then with probability ≥ 1/5 at least one tuple (x, β, α) returned by Find-element(ǫ, p) satisfies the property that x is α-heavy and a_x ≥ β. Furthermore, it uses 16/ǫ samples.

We now use the above lemma to pick elements such that p(i) > q(i). Since ||p − q||_1 ≥ ǫ,

Σ_{i : p(i) ≥ q(i)} (p(i) − q(i)) ≥ ǫ/2.

Hence

Σ_i p(i) · max(0, (p(i) − q(i))/p(i)) ≥ ǫ/2.

Applying Lemma 3 with a_i = max(0, (p(i) − q(i))/p(i)) yields

Lemma 4. If ||p − q||_1 ≥ ǫ, then with probability ≥ 1/5 at least one of the tuples (i, β, α) returned by Find-element(ǫ, p) satisfies p(i) − q(i) ≥ β p(i) and i is α-heavy. Furthermore, Find-element uses 16/ǫ samples.

Note that even though the above algorithm does not use the distribution q, it finds i such that p(i) − q(i) ≥ β p(i) just by the properties of the ℓ1 distance.
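Find-element translates directly into code. A Python sketch (the sampler interface sample_p is our own choice; the constants follow the listing above):

```python
import math
import random

def find_element(eps, sample_p, rng=random):
    """Find-element: draw m = 16/eps samples from p and pair the j-th
    sample with beta_j = j*eps/8 and alpha_j = 1/(4*j*log(16/eps)).
    sample_p(rng) draws one element from the (unknown) distribution p.
    Returns the list of tuples (x_j, beta_j, alpha_j)."""
    m = math.ceil(16 / eps)
    out = []
    for j in range(1, m + 1):
        x = sample_p(rng)
        out.append((x, j * eps / 8, 1 / (4 * j * math.log(16 / eps))))
    return out
```

Note that every tuple satisfies β_j · α_j = ǫ/(32 log(16/ǫ)), the product promised by Lemma 3's trade-off.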
Furthermore, β_j increases with j and α_j decreases with j; thus the above lemma states that the algorithm finds an element i such that either (p(i) − q(i))/p(i) is large but i may not be heavy, or (p(i) − q(i))/p(i) is small yet i belongs to one of the higher probabilities. This precise trade-off becomes important in bounding the sample complexity.

2.3 Testing for near-uniform distributions

We define a distribution p to be near-uniform if max_i p(i) ≤ 2 min_i p(i). Recall that we need to find a distinguishing element and a distinguishing set. As we show, for near-uniform distributions there are singleton distinguishing sets, and hence they are easy to find. Using Find-element, we first define a meta-algorithm to test near-uniform distributions. The inputs to the algorithm are a parameter ǫ, an error δ, distributions p, q, and an element y such that p(y) ≥ q(y). Since we use Near-uniform-identity-test as a subroutine later, y is given by the main algorithm. However, if we want to use Near-uniform-identity-test by itself, we can find a y using Find-element(ǫ, p). The algorithm uses Find-element to find an element x such that q(x) − p(x) ≥ β q(x). Since p(y) ≥ q(y) and q(x) − p(x) ≥ β q(x), running Test-equal on the set {x, y} yields an algorithm for identity testing. The precise bounds in Lemmas 1 and 3 help us obtain the optimal sample complexity. In particular,

Lemma 5 (Appendix B.3). If p = q, then Near-uniform-identity-test returns same with probability ≥ 1 − δ. If p is near-uniform and ||p − q||_1 ≥ ǫ, then Near-uniform-identity-test returns diff with probability ≥ 1/5 − δ. The algorithm uses O((1/ǫ²) · log(1/(δǫ))) samples.

Algorithm Near-uniform-identity-test
Input: distance ǫ, error δ, distributions p, q, an element y such that p(y) ≥ q(y).
1.
Run Find-element(ǫ, q) to obtain tuples (x_j, β_j, α_j) for 1 ≤ j ≤ 16/ǫ.
2. For every tuple (x_j, β_j, α_j), run Test-equal(β_j²/144, 6δ/(π² j²), p_{{x_j, y}}, q_{{x_j, y}}).
3. Output same if Test-equal returns same in the previous step for all tuples; otherwise output diff.

2.4 Finding a distinguishing set for general distributions

We now extend Near-uniform-identity-test to general distributions. Recall that we need to find a distinguishing element and a distinguishing set. Once we have an element i such that p(i) > q(i), our objective is to find a distinguishing set S such that p(S) < q(S) and p(S) ≈ p(i). Natural candidates for such sets are combinations of elements whose probabilities are ≤ p(i). Since p is known, we can select such sets easily. Let G_i = {j : j ≥ i}. Consider the sets H_1, H_2, ... formed by combining elements in G_i such that

p(i) ≤ p(H_j) ≤ 2 p(i) for all j.

We would ideally like to use one of these H_j's as S; however, depending on the values of p(H_j), three possible scenarios arise, and these constitute the main algorithm. We need one more definition to describe the main identity test. For any distribution p and a partition 𝒮 = {S_1, S_2, ...} of a set S into disjoint subsets, the induced distribution p_S^𝒮 is a distribution over S_1, S_2, ... such that for all i,

p_S^𝒮(S_i) = p(S_i)/p(S).

2.5 Proposed identity test

The algorithm is a combination of tests for each possible scenario. First it finds a set of tuples (i, β, α) such that one tuple satisfies (p(i) − q(i))/p(i) ≥ β and i is α-heavy. Then it divides G_i into H_1, H_2, ... such that p(i) ≤ p(H_j) ≤ 2 p(i) for all j. If ||p − q||_1 ≥ ǫ, there are three possible cases.

1. p(H_j)(1 − β/2) ≤ q(H_j) for most j's.
We can randomly pick a set H_j, sample from H_j ∪ {i}, and test whether ||p − q||_1 ≥ ǫ using n(i), the number of occurrences of i when sampling from H_j ∪ {i}.

2. p(H_j)(1 − β/2) ≥ q(H_j) for most j's. Since for most j's p(H_j)(1 − β/2) ≥ q(H_j), we have p(G_i)(1 − β/2) ≥ q(G_i), and since p(G_i) ≥ α, we can sample from the entire distribution and use n(G_i) to test whether ||p − q||_1 ≥ ǫ.

3. For some j, p(H_j)(1 − β/2) ≥ q(H_j), and for some j, p(H_j)(1 − β/2) ≤ q(H_j). It can be shown that this condition implies that the elements in G_i can be grouped into H_1, H_2, ... such that the induced distribution on the groups is near-uniform, and yet the ℓ1 distance between the induced distributions is large. We use Near-uniform-identity-test for this scenario.

The algorithm has a step corresponding to each of the above three scenarios. If p = q, then all three steps output same with high probability; otherwise one of the steps outputs diff. The main result of this section bounds the sample complexity of Identity-test.

Theorem 6 (Appendix B.4). If p = q, then Identity-test returns same with probability ≥ 1 − δ, and if ||p − q||_1 ≥ ǫ, then Identity-test returns diff with probability ≥ 1/30. The algorithm uses at most

N*_id(k, ǫ) ≤ O((1/ǫ²) · log²(1/ǫ) · log(1/(ǫδ)))

samples.

The proposed identity test has different error probabilities when p = q and when ||p − q||_1 ≥ ǫ. In particular, if p = q, the algorithm returns same with probability ≥ 1 − δ, and if ||p − q||_1 ≥ ǫ, it outputs diff with probability ≥ 1/30. While the probability of success for ||p − q||_1 ≥ ǫ is small, it can be boosted arbitrarily close to 1 by repeating the algorithm O(log(1/δ)) times and testing whether the algorithm outputs diff more than a 1/60 fraction of the time.
By a simple Chernoff-type argument, it can be shown that in both cases, p = q and ||p − q||_1 ≥ ǫ, the error probability of the boosted algorithm is ≤ δ. Furthermore, throughout the paper we have calculated all the constants, except for the sample complexities, which we have left in O notation.

Algorithm Identity-test
Input: error δ, distance ǫ, an unknown distribution q, and a known distribution p.
1. Run Find-element(ǫ, p) to obtain tuples (x, β, α).
2. For every tuple (x, β, α):
(a) Let G_x = {y : y ≥ x}.
(b) Partition G_x into groups H = H_1, H_2, ... s.t. for each group H_j, p(x) ≤ p(H_j) ≤ 2 p(x).
(c) Take a random sample y from p_{G_x}^H and run Test-equal(β²/1800, ǫδ/48, p_{{x,y}}, q_{{x,y}}).
(d) Run Test-equal(αβ/52, ǫδ/48, p_{{G_x, G_x^c}}, q_{{G_x, G_x^c}}).
(e) Run Near-uniform-identity-test(β/5, ǫδ/48, p_{G_x}^H, q_{G_x}^H).
3. Output diff if any of the above tests returns diff for any tuple; otherwise output same.

3 Closeness testing

Recall that in closeness testing, both p and q are unknown, and we test whether p = q or ||p − q||_1 ≥ ǫ using samples. First we relate identity testing to closeness testing.

Identity testing had two parts: finding a distinguishing element i and a distinguishing set S. The algorithm we used to generate i did not use any a priori knowledge of the distribution, so it carries over to closeness testing easily. The main difficulty in extending identity testing to closeness testing is finding a distinguishing set. Recall that in identity testing we ordered the elements so that their probabilities are decreasing and considered the set G_i = {j : j ≥ i} to find a distinguishing set. G_i was known in identity testing; in closeness testing, however, it is unknown and difficult to find.
The rest of the section is organized as follows. We first outline a method of identifying a distinguishing set by sampling at a certain frequency (which is unknown). We then formalize finding a distinguishing element, and then show how one can use a binary search to find the sampling frequency and a distinguishing set. We finally describe our main closeness test, which requires a few additional techniques to handle some special cases.

3.1 Outline for finding a distinguishing set

Recall that in identity testing, we ordered the elements so that their probabilities are decreasing and considered G_i = {j : j ≥ i}. We then used a subset S ⊂ G_i such that p(S) ≈ p(i) as the distinguishing set. However, in closeness testing this is not possible, as the set G_i is unknown. We now outline a method of finding such a set S using random sampling, without knowledge of G_i.

Without loss of generality, assume that the elements are ordered such that p(1) + q(1) ≥ p(2) + q(2) ≥ ... ≥ p(k) + q(k). The algorithm does not use this fact; the assumption is only for ease of proof notation. Let G_i = {j : j ≥ i} under this ordering (G_i serves the same purpose as in identity testing, but it is symmetric with respect to p, q and hence easier to handle). Furthermore, for simplicity, in the rest of the section assume that p(i) > q(i) and p(G_i) ≤ q(G_i). Suppose we come up with a scheme that finds a subset S of G_i such that p(S) ≈ p(i) and p(S) < q(S); then, as in Identity-test, we can use that scheme together with Test-equal on S ∪ {i} to differentiate between p = q and ||p − q||_1 ≥ ǫ. The main challenge of the algorithm is to find a distinguishing subset of G_i.

Let r = (p + q)/2, i.e., r(j) = (p(j) + q(j))/2 for all 1 ≤ j ≤ k. Suppose we know r_0 = r(i)/r(G_i).
Consider a set S formed by including each element j independently with probability r_0. The probability of such a set can be written as

p(S) = Σ_{j=1}^k I_{j∈S} p(j),

where I_{j∈S} is the indicator random variable for j ∈ S. In any such set S, there might be elements that are not from G_i. We can prune these elements (call them j′) by sampling from the distribution p_{{j,j′}} and testing whether j′ appeared more often than j. Precise probabilistic arguments are given later. Suppose we remove all elements in S that are not in G_i. Then

p(S) = Σ_{j∈G_i} I_{j∈S} p(j).

Since Pr(I_{j∈S} = 1) = r_0,

E[p(S)] = Σ_{j∈G_i} E[I_{j∈S}] p(j) = r_0 Σ_{j∈G_i} p(j) = (r(i)/r(G_i)) · p(G_i).

Similarly, one can show that E[q(S)] = (r(i)/r(G_i)) · q(G_i). Thus E[p(S)] < E[q(S)] and E[p(S)] + E[q(S)] = p(i) + q(i). Note that to use Test-equal efficiently, we not only need p(i) > q(i) and E[p(S)] < E[q(S)], but the chi-squared distance also needs to be large. It can be shown that this amounts to requiring p(S) + q(S) ≈ p(i) + q(i), and hence E[p(S)] + E[q(S)] = p(i) + q(i) is useful. Thus, in expectation, S is a good candidate for a distinguishing set. Hence, if we take samples from S ∪ {i} and compare p(S), p(i) and q(S), q(i), we can test whether p = q or ||p − q||_1 ≥ ǫ. We therefore have to find an i such that p(i) > q(i) and p(G_i) < q(G_i), estimate r(i)/r(G_i), and convert the above expectation argument into a probabilistic one. While the calculations and analysis in expectation seem natural, judiciously analyzing the success probability of these events takes a fair amount of effort.
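The expectation computation can be sanity-checked numerically. A Python sketch (the distributions, the index i, and the trial count are hypothetical choices of our own, with elements pre-sorted so that p + q is non-increasing):

```python
import random

# Hypothetical distributions over {0, ..., k-1}, ordered so that p + q
# is non-increasing; G_i = {i, i+1, ..., k-1}.
k = 8
p = [0.30, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.05]
q = [0.25, 0.25, 0.15, 0.15, 0.05, 0.05, 0.05, 0.05]
r = [(pi + qi) / 2 for pi, qi in zip(p, q)]

i = 4
G_i = range(i, k)
r0 = r[i] / sum(r[j] for j in G_i)  # r(i)/r(G_i)

def p_mass_of_random_subset(rng):
    """p-mass of a random S formed by keeping each j in G_i w.p. r0."""
    return sum(p[j] for j in G_i if rng.random() < r0)

rng = random.Random(0)
trials = 20000
avg = sum(p_mass_of_random_subset(rng) for _ in range(trials)) / trials
# By linearity of expectation, E[p(S)] = r0 * p(G_i).
expected = r0 * sum(p[j] for j in G_i)
```

With these numbers r0 = 1/3, so the empirical average of p(S) concentrates around r0 · p(G_i), matching the derivation above.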
Furthermore, note that given conditional-sampling access to p and q, one can generate a conditional sample from r by selecting p or q independently with probability 1/2 and then obtaining a conditional sample from the selected distribution.

3.2 Finding a distinguishing element i

We now show that using an algorithm similar to Find-element, we can find an i such that either (p(i) > q(i) and p(G_i) ≤ q(G_i)) or (p(i) < q(i) and p(G_i) > q(G_i)). To quantify this statement we need the following definition of β-approximability.

Definition 7. For a pair of distributions p and q, element i is β-approximable if

|(p(i) − q(i))/(p(i) + q(i)) − (p(G_i) − q(G_i))/(p(G_i) + q(G_i))| ≥ β.

As we show later, it suffices to consider β-approximable elements instead of elements with p(i) > q(i) and p(G_i) ≤ q(G_i). Thus the first step of our algorithm is to find β-approximable elements. To this end, we show:

Lemma 8 (Appendix C.1). If ||p − q||_1 ≥ ǫ, then

Σ_i ((p(i) + q(i))/2) · |(p(i) − q(i))/(p(i) + q(i)) − (p(G_i) − q(G_i))/(p(G_i) + q(G_i))| ≥ ǫ/4.

Hence, if we run Find-element on the distribution r = (p + q)/2, then one of the returned tuples is β_j-approximable for some β_j. Note that with a_i = |(p(i) − q(i))/(p(i) + q(i)) − (p(G_i) − q(G_i))/(p(G_i) + q(G_i))|, we have 0 ≤ a_i ≤ 2 and Σ_{i=1}^k r(i) a_i ≥ ǫ/4. By Lemma 3, Find-element outputs a tuple (i, β, α) such that i is α-heavy and β-approximable. Note that although we obtain i and guarantees on G_i, the algorithm does not find G_i itself.

Lemma 9. With probability ≥ 1/5, among the tuples returned by Find-element(ǫ, r) there exists at least one tuple that is both α-heavy and β-approximable.

3.3 Approximating r(i)/r(G_i) via binary search

Our next goal is to estimate r_0 = r(i)/r(G_i) using samples.
It can easily be shown that it suffices to know $\frac{r(i)}{r(G_i)}$ up to a multiplicative factor, say $\gamma$ (we later choose $\gamma = \Theta(\log\log\log k)$). Furthermore, by the definition of $G_i$, $r(G_i) \geq r(i)$ and $r(G_i) = \sum_{j \geq i} r(j) \leq \sum_{j \geq i} r(i) \leq k\, r(i)$. Therefore,
$$\frac{1}{k} \leq \frac{r(i)}{r(G_i)} \leq 1, \quad\text{and}\quad 0 \leq -\log \frac{r(i)}{r(G_i)} \leq \log k.$$
Approximating $\frac{r(i)}{r(G_i)}$ up to a multiplicative factor $\gamma$ is the same as approximating $\log \frac{r(i)}{r(G_i)}$ up to an additive factor $\log \gamma$. We can thus run our algorithm with the value of $\frac{r(i)}{r(G_i)}$ corresponding to each value in $\{0, \log\gamma, 2\log\gamma, 3\log\gamma, \ldots, \log k\}$, and if $\|p - q\|_1 \geq \epsilon$, for at least one value of $\frac{r(i)}{r(G_i)}$ we output diff. Using carefully chosen thresholds we can also ensure that if $p = q$, the algorithm always outputs same. The sample complexity of this approach is $\frac{\log k}{\log\gamma} \approx \tilde{\Theta}(\log k)$ times the complexity of the case when we know $\frac{r(i)}{r(G_i)}$.

We improve the sample complexity by using a better search algorithm over $\{0, \log\gamma, 2\log\gamma, 3\log\gamma, \ldots, \log k\}$. We develop a comparator (step 4 in Binary-search) with the following property: if our guess value $r_{\text{guess}} \geq \gamma \cdot \frac{r(i)}{r(G_i)}$ it outputs heavy, and if $r_{\text{guess}} \leq \frac{1}{\gamma} \cdot \frac{r(i)}{r(G_i)}$ it outputs light. Using such a comparator, we do a binary search and find the right value faster. Recall that binary search over $m$ elements uses $\log m$ queries. For our problem $m = \log k$, and thus our sample complexity is approximately $\log\log k$ times the sample complexity of the case when we know $\frac{r(i)}{r(G_i)}$. However, our comparator cannot identify whether we have a good guess, i.e., whether $\frac{1}{\gamma} \cdot \frac{r(i)}{r(G_i)} \leq r_{\text{guess}} \leq \gamma \cdot \frac{r(i)}{r(G_i)}$. Thus, instead of outputting the value of $\frac{r(i)}{r(G_i)}$ up to some approximation factor $\gamma$, our binary search finds a set of candidates $r^1_{\text{guess}}, r^2_{\text{guess}}, \ldots$
such that at least one of the $r^j_{\text{guess}}$ satisfies $\frac{1}{\gamma}\, \frac{r(i)}{r(G_i)} \leq r^j_{\text{guess}} \leq \gamma\, \frac{r(i)}{r(G_i)}$. Hence, for each value of $r_{\text{guess}}$ we assume that $r_{\text{guess}} \approx r(i)/r(G_i)$ and run the closeness test. For at least one value of $r_{\text{guess}}$ we would be correct. The algorithm is given in Binary-search.

The algorithm Prune-set removes all elements of probability $\geq 4r(i)$, yet does not remove any element of probability $\leq r(i)$. Since after pruning $S$ only contains elements of probability $\leq 4r(i)$, we show that at some point of the $\log\log k$ steps, the algorithm encounters $r_{\text{guess}} \approx \frac{r(i)}{r(G_i)}$.

Algorithm Prune-set
Input: $S$, $\epsilon$, $i$, $\alpha$, $m$, and $\gamma$.
Parameters: $\delta' = \frac{\delta}{40 m \log\log k}$, $n_1 = O\!\left(\log\frac{\gamma}{\delta'\alpha\beta}\left(\frac{\gamma}{\alpha\beta}\log\frac{\gamma}{\alpha\beta} + \log\frac{1}{\delta'}\log\log\frac{1}{\delta'}\right)\right)$, $n_2 = O\!\left(\log\log\log k + \log\frac{1}{\epsilon\delta}\right)$.
Repeat $n_1$ times: obtain a sample $j$ from $r_S$ and sample $n_2$ times from $r_{\{j,i\}}$. If $n(j) \geq 3n_2/4$, remove $j$ from the set $S$.

Algorithm Binary-search
Input: tuple $(i, \beta, \alpha)$.
Parameters: $\gamma = 1000 \log\frac{\log\log k}{\delta\epsilon}$, $n_3 = O\!\left(\gamma^2 \log\frac{\log\log k}{\delta}\right)$.
Initialize $\log r_{\text{guess}} = -\log\sqrt{k}$. Set low $= -\log k$ and high $= 0$.
Do $\log\log k$ times:
1. Create a set $S$ by independently keeping each element of $\{1, 2, \ldots, k\} \setminus \{i\}$ with probability $r_{\text{guess}}$.
2. Prune $S$ using Prune-set$(S, \epsilon, i, \alpha, 1, \gamma)$.
3. Run Assisted-closeness-test$(r_{\text{guess}}, (i, \beta, \alpha), \gamma, \epsilon, \delta)$.
4. Obtain $n_3$ samples from $S \cup \{i\}$. If $n(i) < 5n_3/\gamma$, then output heavy, else output light.
   (a) If the output is heavy, update high $= \log r_{\text{guess}}$ and $\log r_{\text{guess}} = (\log r_{\text{guess}} + \text{low})/2$.
   (b) If the output is light, update low $= \log r_{\text{guess}}$ and $\log r_{\text{guess}} = (\log r_{\text{guess}} + \text{high})/2$.
5. If any of the Assisted-closeness-tests return diff, then output diff.

Lemma 10 (Appendix C.2).
If $i$ is $\alpha$-heavy and $\beta$-approximable, then the algorithm Binary-search, with probability $\geq 1 - \delta$, reaches an $r_{\text{guess}}$ such that
$$\frac{r(i)}{\gamma} = \frac{r(G_i)}{\gamma} \cdot \frac{r(i)}{r(G_i)} \leq r_{\text{guess}} \leq \frac{\gamma}{\beta} \cdot \frac{r(i)}{r(G_i)}.$$

Note that for technical reasons we get an additional $1/\beta$ factor in the upper bound and a factor of $r(G_i)$ in the lower bound.

3.4 Assisted closeness test

We now discuss the proposed test, which uses the above value of $r_{\text{guess}}$. As stated before, in expectation it would be sufficient to keep elements in the set $S$ with probability $r_{\text{guess}}$ and use the resulting set $S$ to test for closeness. However, there are two caveats. Firstly, Prune-set can only remove elements whose probability is larger than $4r(i)$; while we can reduce the factor $4$ to any number $> 1$, we can never reduce it to $1$: an element with probability $(1 + \delta')\,r(i)$ for sufficiently small $\delta'$ is almost indistinguishable from an element with probability $(1 - \delta')\,r(i)$. Thus we need a way of ensuring that elements with probability $> r(i)$ and $\leq 4r(i)$ do not affect the concentration inequalities. Secondly, since we only have an approximate value of $r(i)/r(G_i)$, the probability that the required quantities concentrate is small, and we have to repeat the test many times to obtain a higher probability of success. Our algorithm addresses both these issues and is given below.

The algorithm picks $m$ sets and prunes them to ensure that none of the elements has probability $\geq 4r(i)$, and considers two possibilities: either there exist many elements $j$ such that $j \notin G_i$ and
$$\frac{p(i) - q(i)}{r(i)} - \frac{p(j) - q(j)}{r(j)} \geq \beta''$$
($\beta''$ determined later), or the number of such elements is small. In the first case, the algorithm finds such an element $j$ and performs Test-equal over the set $\{i, j\}$.
Otherwise, we show that $r(S) \approx r(i)$, that it concentrates, and that with high probability
$$\frac{p(i) - q(i)}{r(i)} - \frac{p(S) - q(S)}{r(S)} \geq \beta''$$
($\beta''$ determined later), and thus one can sample from $S \cup \{i\}$ and use $n(i)$ to test closeness.

To conclude, the proposed Closeness-test uses Find-element to find a distinguishing element $i$. It then runs Binary-search to approximate $r(i)/r(G_i)$. However, since the search does not identify whether it has found a good estimate of $r(i)/r(G_i)$, for each estimate it runs Assisted-closeness-test, which uses the distinguishing element $i$ and the estimate of $r(i)/r(G_i)$. The main result in this section is the sample complexity of our proposed Closeness-test.

Theorem 11 (Appendix C.3). If $p = q$, then Closeness-test returns same with probability $\geq 1 - \delta$, and if $\|p - q\|_1 \geq \epsilon$, then Closeness-test returns diff with probability $\geq 1/30$. The algorithm uses
$$N^*_{\mathrm{cl}}(k, \epsilon) \leq \tilde{O}\!\left(\frac{\log\log k}{\epsilon^5}\right)$$
samples.

As stated in the previous section, by repeating and taking a majority, the success probability can be boosted arbitrarily close to 1. Note that none of the constants or the error probabilities have been optimized. Constants for all the parameters except the sample complexities $n_1$, $n_2$, $n_3$, and $n_4$ have been given.

Algorithm Closeness-test
Input: $\epsilon$, oracles $p, q$.
1. Generate a set of tuples using Find-element$(\epsilon, r)$.
2. For every tuple $(i, \alpha, \beta)$, run Binary-search$(i, \beta, \alpha)$.
3. If any of the Binary-search runs returned diff, output diff; otherwise output same.

Algorithm Assisted-closeness-test
Input: $r_{\text{guess}}$, tuple $(i, \beta, \alpha)$, $\gamma$, $\epsilon$, and $\delta$.
Parameters: $\beta'' = \frac{\alpha\beta}{128\gamma \log\frac{128\gamma}{\beta^2}}$, $m = \frac{4096\gamma}{\alpha\beta^2}$, $n_4 = O(\gamma/(\alpha\beta))$, and $\delta' = \frac{\epsilon\delta}{32 m (n_4 + 1)\log\log k}$.
1. Create $S_1, S_2, \ldots, S_m$ independently by keeping each element of $\{1, 2, \ldots, k\} \setminus \{i\}$ with probability $r_{\text{guess}}$.
2.
Run Prune-set$(S_\ell, \epsilon, i, \alpha, m, \gamma)$ for $1 \leq \ell \leq m$.
3. For each set $S$ do:
   (a) Take $n_4$ samples from $r_{S \cup \{i\}}$ and for every seen element $j$, run Test-equal$\left((\beta'')^2/25, \delta', p_{\{i,j\}}, q_{\{i,j\}}\right)$.
   (b) Let $\mathcal{S} = \{\{i\}, S\}$. Run Test-equal$\left(\frac{\alpha^3\beta^3}{2^{23}\gamma^2 \log^3\frac{128\gamma}{\beta^2}}, \delta', p^{\mathcal{S}}_{S \cup \{i\}}, q^{\mathcal{S}}_{S \cup \{i\}}\right)$.
4. If any of the above tests return diff, output diff.

4 Acknowledgements

We thank Jayadev Acharya, Clément Canonne, Sreechakra Goparaju, and Himanshu Tyagi for useful suggestions and discussions.

References

J. Acharya and C. Daskalakis. Testing Poisson binomial distributions. In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 1829–1840, 2015.

J. Acharya, H. Das, A. Jafarpour, A. Orlitsky, S. Pan, and A. T. Suresh. Competitive classification and closeness testing. In Proceedings of the 25th Annual Conference on Learning Theory (COLT), pages 22.1–22.18, 2012.

J. Acharya, C. L. Canonne, and G. Kamath. A chasm between identity and equivalence testing with conditional queries. CoRR, abs/1411.7346, 2014a.

J. Acharya, A. Jafarpour, A. Orlitsky, and A. T. Suresh. Sublinear algorithms for outlier detection and generalized closeness testing. In Proceedings of the 2014 IEEE International Symposium on Information Theory (ISIT), 2014b.

M. Balcan, E. Blais, A. Blum, and L. Yang. Active property testing. In 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, New Brunswick, NJ, USA, October 20-23, 2012, pages 21–30, 2012.

T. Batu, L. Fortnow, R. Rubinfeld, W. D. Smith, and P. White. Testing that distributions are close. In Annual Symposium on Foundations of Computer Science (FOCS), pages 259–269, 2000.

T. Batu, L. Fortnow, E. Fischer, R. Kumar, R. Rubinfeld, and P. White.
Testing random variables for independence and identity. In 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, 14-17 October 2001, Las Vegas, Nevada, USA, pages 442–451, 2001.

T. Batu, R. Kumar, and R. Rubinfeld. Sublinear algorithms for testing monotone and unimodal distributions. In Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13-16, 2004, pages 381–390, 2004.

C. L. Canonne, D. Ron, and R. A. Servedio. Testing equivalence between distributions using conditional samples. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2014, Portland, Oregon, USA, January 5-7, 2014, pages 1174–1192, 2014.

S. Chakraborty, E. Fischer, Y. Goldhirsh, and A. Matsliah. On the power of conditional samples in distribution testing. In Innovations in Theoretical Computer Science, ITCS '13, Berkeley, CA, USA, January 9-12, 2013, pages 561–580, 2013.

S. O. Chan, I. Diakonikolas, R. A. Servedio, and X. Sun. Efficient density estimation via piecewise polynomial approximation. In Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31 - June 03, 2014, pages 604–613, 2014a.

S. O. Chan, I. Diakonikolas, P. Valiant, and G. Valiant. Optimal algorithms for testing closeness of discrete distributions. In Symposium on Discrete Algorithms (SODA), 2014b.

C. Daskalakis, I. Diakonikolas, R. A. Servedio, G. Valiant, and P. Valiant. Testing k-modal distributions: Optimal algorithms via reductions. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013, New Orleans, Louisiana, USA, January 6-8, 2013, pages 1833–1852, 2013.

I. Diakonikolas, D. M. Kane, and V. Nikishkin. Testing identity of structured distributions.
In Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, San Diego, CA, USA, January 4-6, 2015, pages 1841–1854, 2015.

E. Fisher. Distinguishing distributions with conditional samples. Bertinoro 2014, 2014. URL http://sublinear.info/66.

O. Goldreich and D. Ron. On testing expansion in bounded-degree graphs. Electronic Colloquium on Computational Complexity (ECCC), 7(20), 2000.

S. Guha, A. McGregor, and S. Venkatasubramanian. Sublinear estimation of entropy and information distances. ACM Transactions on Algorithms, 5(4), 2009.

R. Levi, D. Ron, and R. Rubinfeld. Testing properties of collections of distributions. Theory of Computing, 9:295–347, 2013.

L. Paninski. A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Transactions on Information Theory, 54(10):4750–4755, 2008.

R. Rubinfeld and R. A. Servedio. Testing monotone high-dimensional distributions. Random Structures & Algorithms, 34(1):24–44, 2009.

J. Unnikrishnan. On optimal two sample homogeneity tests for finite alphabets. In Proceedings of the 2012 IEEE International Symposium on Information Theory (ISIT), pages 2027–2031, 2012.

G. Valiant and P. Valiant. Instance-by-instance optimal identity testing. Electronic Colloquium on Computational Complexity (ECCC), 20:111, 2013.

P. Valiant. Testing symmetric properties of distributions. SIAM Journal on Computing, 40(6):1927–1968, December 2011. ISSN 0097-5397.

B. Waggoner. Lp testing and learning of discrete distributions. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, ITCS 2015, Rehovot, Israel, January 11-13, 2015, pages 347–356, 2015.

J. Ziv. On classification with empirically observed statistics and universal data compression.
IEEE Transactions on Information Theory, 34(2):278–286, 1988.

A Tools

We use the following variation of the Chernoff bound.

Lemma 12 (Chernoff bound). If $X_1, X_2, \ldots, X_n$ are distributed according to Bernoulli$(p)$, then
$$\Pr\left(\frac{\sum_{i=1}^n X_i}{n} - p > \delta\right) \leq e^{-2n\delta^2}, \qquad \Pr\left(\frac{\sum_{i=1}^n X_i}{n} - p < -\delta\right) \leq e^{-2n\delta^2}.$$

The following lemma follows from a bound in Acharya et al. [2012] and the fact that $y^5 e^{-y}$ is bounded for all non-negative values of $y$.

Lemma 13 (Acharya et al. [2012]). For two independent Poisson random variables $\mu$ and $\mu'$ with means $\lambda$ and $\lambda'$ respectively,
$$\mathbb{E}\left[\frac{(\mu - \mu')^2 - \mu - \mu'}{\mu + \mu' - 1}\right] = \frac{(\lambda - \lambda')^2}{\lambda + \lambda'}\left(1 - e^{-\lambda - \lambda'}\right),$$
$$\mathrm{Var}\left[\frac{(\mu - \mu')^2 - \mu - \mu'}{\mu + \mu' - 1}\right] \leq 4\,\frac{(\lambda - \lambda')^2}{\lambda + \lambda'} + c^2,$$
where $c$ is a universal constant.

B Identity testing proofs

B.1 Proof of Lemma 1

Let
$$t = \frac{(n_1 - n_2)^2 - n_1 - n_2}{n_1 + n_2 - 1} + \frac{\left((n' - n_1) - (n'' - n_2)\right)^2 - (n' - n_1) - (n'' - n_2)}{n' + n'' - n_1 - n_2 - 1}.$$
Since we are using poi$(n)$ samples, $n_1$, $n_2$, $n' - n_1$, $n'' - n_2$ are all independent Poisson random variables with means $np$, $nq$, $n(1-p)$, and $n(1-q)$ respectively. Suppose the underlying hypothesis is $p = q$. By Lemma 13, $\mathbb{E}[t] = 0$, and since $n_1$, $n_2$, $n' - n_1$, $n'' - n_2$ are all independent Poisson random variables, the variance of $t$ is the sum of the variances of the two terms, and hence $\mathrm{Var}(t) \leq 2c^2$ for some universal constant $c$. Thus by Chebyshev's inequality,
$$\Pr(t \geq n\epsilon/2) \leq \frac{8c^2}{(n\epsilon)^2} \leq \frac{1}{3}.$$
Hence by the Chernoff bound (Lemma 12), after $18 \log\frac{1}{\delta}$ repetitions the probability that the majority of outputs is same is $\geq 1 - \delta$. Suppose the underlying hypothesis is
$$\frac{(p - q)^2}{(p + q)(2 - p - q)} > \epsilon.$$
Then by Lemma 13,
$$\mathbb{E}[t] = \frac{n(p - q)^2}{p + q}\left(1 - e^{-n(p + q)}\right) + \frac{n(p - q)^2}{2 - p - q}\left(1 - e^{-n(2 - p - q)}\right) \overset{(a)}{\geq} \frac{n(p - q)^2}{p + q}\left(1 - e^{-n\epsilon}\right) + \frac{n(p - q)^2}{2 - p - q}\left(1 - e^{-n\epsilon}\right) \geq \frac{2n(p - q)^2}{(p + q)(2 - p - q)}\left(1 - e^{-n\epsilon}\right) \overset{(b)}{\geq} \frac{n(p - q)^2}{(p + q)(2 - p - q)}.$$
$(a)$ follows from the fact that $p + q \geq (p + q)\,\frac{(p - q)^2}{(p + q)^2} = \frac{(p - q)^2}{p + q} \geq \epsilon$, and similarly $2 - p - q \geq \epsilon$. $(b)$ follows from the fact that $n\epsilon \geq 10$. Similarly, the variance is
$$\mathrm{Var}(t) \leq c^2 + \frac{4n(p - q)^2}{p + q} + c^2 + \frac{4n(p - q)^2}{2 - p - q} = 2c^2 + \frac{8n(p - q)^2}{(p + q)(2 - p - q)}.$$
Thus again by Chebyshev's inequality,
$$\Pr(t \leq n\epsilon/2) \leq \frac{2c^2 + \frac{8n(p - q)^2}{(p + q)(2 - p - q)}}{\left(\frac{n(p - q)^2}{(p + q)(2 - p - q)} - \frac{n\epsilon}{2}\right)^2} \leq \frac{32c^2}{(n\epsilon)^2} + \frac{32}{n\epsilon} \leq \frac{1}{3}.$$
The last inequality holds when $n \geq \frac{\max(192,\, 20c)}{\epsilon}$. The lemma follows by the Chernoff bound argument as before.

B.2 Proof of Lemma 3

Let $A_j$ be the event that $a_{x_j} \geq \beta_j$ and $x_j$ is $\alpha_j$-heavy. Since we choose each tuple $j$ independently at time $j$, the events $A_j$ are independent. Hence,
$$\Pr(\cup A_j) = 1 - \Pr(\cap A_j^c) = 1 - \prod_{j=1}^m \Pr(A_j^c) = 1 - \prod_{j=1}^m \left(1 - \Pr(A_j)\right) \geq 1 - e^{-\sum_{j=1}^m \Pr(A_j)}.$$
Let $B_j = \{i : a_i \geq \beta_j\}$. Since all elements in $B_j$ count towards $A_j$ except the last $\alpha_j$ part, $\Pr(A_j) \geq p(B_j) - \alpha_j$. Thus
$$\sum_{j=1}^m \Pr(A_j) \geq \sum_{j=1}^m p(B_j) - \sum_{j=1}^m \alpha_j \geq \sum_{j=1}^m p(B_j) - \frac{\log m}{4\log(16/\epsilon)} \geq \sum_{j=1}^m p(B_j) - \frac{1}{4}.$$
We now show that $\sum_{j=1}^m p(B_j) \geq 1/2$, thus proving that $\sum_{j=1}^m \Pr(A_j) \geq 1/4$ and $\Pr(\cup_j A_j) \geq 1/5$. Since $\sum_{i=1}^k p(i)\, a_i \geq \epsilon/4$,
$$\sum_{i=1}^k p(i)\, a_i = \sum_{i : a_i \geq \epsilon/8} p(i)\, a_i + \sum_{i : a_i < \epsilon/8} p(i)\, a_i \leq \sum_{i : a_i \geq \epsilon/8} p(i)\, a_i + \frac{\epsilon}{8}.$$
Thus
$$\sum_{i : a_i \geq \epsilon/8} p(i)\, a_i \geq \frac{\epsilon}{8}, \quad\text{and}\quad \sum_{i : a_i \geq \epsilon/8} p(i)\,\frac{8 a_i}{\epsilon} \geq \frac{1}{2}.$$
In $\sum_{j=1}^m p(B_j)$, each $p(i)$ is counted exactly $\frac{8 a_i}{\epsilon}$ times, thus
$$\sum_{j=1}^m p(B_j) \geq \frac{1}{2}.$$
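The Poissonized statistic of Lemma 13, which the proof of Lemma 1 builds on, can be sanity-checked by simulation: for equal means its expectation is $0$. Below is a minimal sketch; the Poisson sampler and the parameters are illustrative and not part of the paper's algorithms:

```python
import math
import random

def chi2_term(n1, n2):
    """The statistic ((n1-n2)^2 - n1 - n2)/(n1+n2-1) from Lemma 13,
    taken as 0 when n1 + n2 <= 1 (its numerator also vanishes there)."""
    if n1 + n2 <= 1:
        return 0.0
    return ((n1 - n2) ** 2 - n1 - n2) / (n1 + n2 - 1)

def poisson(lam, rng):
    """Knuth's Poisson sampler; adequate for small means."""
    threshold = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

rng = random.Random(1)
lam = 3.0  # equal means lambda = lambda', so Lemma 13 gives expectation 0
trials = 40000
avg = sum(chi2_term(poisson(lam, rng), poisson(lam, rng))
          for _ in range(trials)) / trials
# avg should be close to 0; its fluctuations reflect the bounded variance.
```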
B.3 Proof of Lemma 5

The proof uses the following auxiliary lemma. Let $p_{i,j} \overset{\text{def}}{=} p_{\{i,j\}}(i) = \frac{p(i)}{p(i) + p(j)}$ denote the probability of $i$ under conditional sampling.

Lemma 14. If
$$\frac{p(i) - q(i)}{p(i) + q(i)} - \frac{p(j) - q(j)}{p(j) + q(j)} \geq \epsilon, \tag{1}$$
then
$$\frac{(p_{i,j} - q_{i,j})^2}{(p_{i,j} + q_{i,j})(2 - p_{i,j} - q_{i,j})} \geq \frac{\epsilon^2 (p(i) + q(i))^2 (p(j) + q(j))^2}{4[p(i)(q(i) + q(j)) + q(i)(p(i) + p(j))][p(j)(q(i) + q(j)) + q(j)(p(i) + p(j))]} \geq \frac{\epsilon^2 (p(i) + q(i))(p(j) + q(j))}{4(p(i) + q(i) + p(j) + q(j))^2}.$$

Proof. Let $s(i) = p(i) + q(i)$ and $s(j) = p(j) + q(j)$. Upon expanding,
$$\frac{p(i) - q(i)}{p(i) + q(i)} - \frac{p(j) - q(j)}{p(j) + q(j)} = 2\,\frac{p(i)\, q(j) - p(j)\, q(i)}{s(i)\, s(j)}. \tag{2}$$
Furthermore, $p_{i,j} = \frac{p(i)}{p(i) + p(j)}$ and similarly $q_{i,j} = \frac{q(i)}{q(i) + q(j)}$. Hence,
$$\frac{(p_{i,j} - q_{i,j})^2}{(p_{i,j} + q_{i,j})(2 - p_{i,j} - q_{i,j})} = \frac{(p(i)\, q(j) - q(i)\, p(j))^2}{[p(i)(q(i) + q(j)) + q(i)(p(i) + p(j))][p(j)(q(i) + q(j)) + q(j)(p(i) + p(j))]} \overset{(a)}{\geq} \frac{\epsilon^2 s^2(i)\, s^2(j)}{4[p(i)(q(i) + q(j)) + q(i)(p(i) + p(j))][p(j)(q(i) + q(j)) + q(j)(p(i) + p(j))]} \overset{(b)}{\geq} \frac{\epsilon^2 s^2(i)\, s^2(j)}{4(s(i) + s(j))^2\, s(i)\, s(j)} = \frac{\epsilon^2 s(i)\, s(j)}{4(s(i) + s(j))^2}.$$
$(a)$ follows by Equations (1) and (2). $(b)$ follows from $\max(p(i), q(i)) \leq s(i)$ and $\max(p(j), q(j)) \leq s(j)$.

Proof (Lemma 5). We first show that if $p = q$, then Near-uniform-identity-test returns same with probability $\geq 1 - \delta$. By Lemma 1, Test-equal is run with error probability $\delta_j = 6\delta/(\pi^2 j^2)$ for the $j$-th tuple, and hence by the union bound, the overall error is $\leq \sum_j \delta_j \leq \delta$.
If $p \neq q$, with probability $\geq 1/5$, Find-element returns an element $x$ that is $\alpha$-heavy and satisfies $q(x) - p(x) \geq \beta\, q(x)$. For this $x, y$, since $p(y) \geq q(y)$,
$$\frac{q(x) - p(x)}{p(x) + q(x)} - \frac{q(y) - p(y)}{p(y) + q(y)} \geq \frac{\beta\, q(x)}{p(x) + q(x)}.$$
By Lemma 14, the chi-squared distance between $p_{\{x,y\}}$ and $q_{\{x,y\}}$ is lower bounded by
$$\left(\frac{\beta\, q(x)}{p(x) + q(x)}\right)^2 \frac{(p(x) + q(x))^2 (p(y) + q(y))^2}{4[p(x)(q(x) + q(y)) + q(x)(p(x) + p(y))][p(y)(q(x) + q(y)) + q(y)(p(x) + p(y))]}$$
$$\overset{(a)}{\geq} \frac{\beta^2 q^2(x)\, p^2(y)}{4[p(x)(q(x) + p(y)) + q(x)(p(x) + p(y))][p(y)(q(x) + p(y)) + p(y)(p(x) + p(y))]}$$
$$\overset{(b)}{\geq} \frac{\beta^2 q^2(x)\, p^2(y)}{4[2p(y)\, q(x) + p(x)\, p(y) + q(x)(2p(y) + p(y))][p(y)(q(x) + 2p(x)) + p(y)(p(x) + 2p(x))]}$$
$$\overset{(c)}{\geq} \frac{\beta^2 q^2(x)\, p^2(y)}{4[2p(y)\, q(x) + q(x)\, p(y) + q(x)(2p(y) + p(y))][p(y)(q(x) + 2q(x)) + p(y)(q(x) + 2q(x))]}$$
$$\overset{(d)}{\geq} \frac{\beta^2}{144}.$$
$(a)$ follows from the fact that $p(y) \geq q(y)$; $p(x) \leq 2p(y)$ and $p(y) \leq 2p(x)$, hence $(b)$; $p(x) \leq q(x)$ implies $(c)$; and $(d)$ follows by numerical simplification. Thus by Lemma 1 the algorithm returns diff with probability $\geq 1 - \delta$. By the union bound, the total error probability is $\leq \frac{4}{5} + \delta$. The number of samples used is $16/\epsilon$ for the first step and $\tilde{O}\!\left(\frac{1}{\beta_j^2}\log\frac{1}{\delta_j}\right)$ for tuple $j$. Hence the total number of samples used is
$$\frac{16}{\epsilon} + \sum_{j=1}^{16/\epsilon} O\!\left(\frac{1}{\beta_j^2}\log\frac{1}{\delta_j}\right) = O\!\left(\frac{1}{\epsilon^2}\log\frac{1}{\delta\epsilon}\right).$$

B.4 Proof of Theorem 6

We restate the theorem for better readability: If $p = q$, then Identity-test returns same with probability $\geq 1 - \delta$, and if $\|p - q\|_1 \geq \epsilon$, then Identity-test returns diff with probability $\geq 1/30$. Recall that there are $16/\epsilon$ tuples.
Also observe that all three tests inside Identity-test are called with error parameter $\frac{\epsilon\delta}{48}$. As a result, if $p = q$, Identity-test outputs same with probability $\geq 1 - \frac{\epsilon\delta}{48} \cdot 3 \cdot \frac{16}{\epsilon} = 1 - \delta$.

We now show that if $\|p - q\|_1 \geq \epsilon$, then the algorithm outputs diff with probability $\geq 1/30$. By Lemma 4, with probability $\geq 1/5$, Find-element returns an element $x$ that is $\alpha$-heavy and satisfies $p(x) - q(x) \geq \beta\, p(x)$. Partition $G_x$ into groups $\mathcal{H} = H_1, H_2, \ldots$ such that for each group $H_j$, $p(x) \leq p(H_j) \leq 2p(x)$, and let $p^{\mathcal{H}}_{G_x}$ and $q^{\mathcal{H}}_{G_x}$ be the corresponding induced distributions. There are three possible cases. We show that for any $q$, at least one of the sub-routines in Identity-test will output diff with high probability.

1. $|p(G_x) - q(G_x)| \geq \frac{\alpha\beta}{5}$.
2. $|p(G_x) - q(G_x)| < \frac{\alpha\beta}{5}$ and $\left\|p^{\mathcal{H}}_{G_x} - q^{\mathcal{H}}_{G_x}\right\|_1 \geq \frac{\beta}{5}$.
3. $|p(G_x) - q(G_x)| < \frac{\alpha\beta}{5}$ and $\left\|p^{\mathcal{H}}_{G_x} - q^{\mathcal{H}}_{G_x}\right\|_1 < \frac{\beta}{5}$.

If $|p(G_x) - q(G_x)| \geq \frac{\alpha\beta}{5}$, then the chi-squared distance between $p_{\{G_x, G_x^c\}}$ and $q_{\{G_x, G_x^c\}}$ is $\geq \left(\frac{\alpha\beta}{5}\right)^2$, and hence Test-equal$\left((\alpha\beta/5)^2, \frac{\epsilon\delta}{48}, p_{\{G_x, G_x^c\}}, q_{\{G_x, G_x^c\}}\right)$ (step 2c) outputs diff with probability $> 1 - \frac{\epsilon\delta}{48}$.

If $|p(G_x) - q(G_x)| < \frac{\alpha\beta}{5}$ and $\left\|p^{\mathcal{H}}_{G_x} - q^{\mathcal{H}}_{G_x}\right\|_1 \geq \frac{\beta}{5}$, then by Lemma 5, Near-uniform-identity-test$\left(\frac{\beta}{5}, \frac{\epsilon\delta}{48}, p^{\mathcal{H}}_{G_x}, q^{\mathcal{H}}_{G_x}\right)$ outputs diff with probability $> \frac{1}{5} - \frac{\epsilon\delta}{48} > \frac{1}{6}$.

If $|p(G_x) - q(G_x)| < \frac{\alpha\beta}{5}$ and $\left\|p^{\mathcal{H}}_{G_x} - q^{\mathcal{H}}_{G_x}\right\|_1 < \frac{\beta}{5}$, then
$$\sum_{y \in \mathcal{H}} p^{\mathcal{H}}_{G_x}(y)\, \mathbb{1}\!\left[\frac{p(y) - q(y)}{p(y)} > \frac{4}{5}\beta\right] \leq \frac{1}{p(G_x)} \sum_{y \in \mathcal{H}} p(y)\, \mathbb{1}\!\left[p(y) - q(y) > \frac{4}{5}\beta\, p(y)\right] \leq \frac{5}{4\beta\, p(G_x)} \sum_{y \in \mathcal{H}} |p(y) - q(y)| = \frac{5}{4\beta} \sum_{y \in \mathcal{H}} \left|\frac{p(y)}{p(G_x)} - \frac{q(y)}{q(G_x)} + \frac{q(y)}{q(G_x)} - \frac{q(y)}{p(G_x)}\right| \overset{(a)}{\leq} \frac{5}{4\beta} \left[\sum_{y \in \mathcal{H}} \left|\frac{p(y)}{p(G_x)} - \frac{q(y)}{q(G_x)}\right| + \sum_{y \in \mathcal{H}} q(y) \left|\frac{1}{p(G_x)} - \frac{1}{q(G_x)}\right|\right] \overset{(b)}{\leq} \frac{5}{4\beta} \left[\frac{\beta}{5} + q(G_x)\, \frac{|p(G_x) - q(G_x)|}{p(G_x)\, q(G_x)}\right] \overset{(c)}{\leq} \frac{5}{4\beta} \left[\frac{\beta}{5} + \frac{\beta}{5}\right] \leq \frac{1}{2}.$$
$(a)$ follows from the triangle inequality. $(b)$ follows from the fact that $\left\|p^{\mathcal{H}}_{G_x} - q^{\mathcal{H}}_{G_x}\right\|_1 \leq \frac{\beta}{5}$. $p(G_x) \geq \alpha$ and $|p(G_x) - q(G_x)| \leq \frac{\alpha\beta}{5}$, hence $(c)$. Therefore, for a random sample $y$ from $p^{\mathcal{H}}_{G_x}$, with probability $\geq 1/2$,
$$\frac{p(y) - q(y)}{p(y)} \leq \frac{4\beta}{5}.$$
Let $\frac{q(y)}{p(y)} = \beta' \geq 1 - \frac{4\beta}{5}$, and furthermore $\frac{q(x)}{p(x)} = \beta'' \leq 1 - \beta$. Hence $\beta' - \beta'' \geq \frac{\beta}{5}$. Thus, similar to the proof of Lemma 14, the chi-squared distance between $p_{\{x,y\}}$ and $q_{\{x,y\}}$ can be lower bounded by
$$\frac{(p(x)\, q(y) - q(x)\, p(y))^2}{[p(x)(q(x) + q(y)) + q(x)(p(x) + p(y))][p(y)(q(x) + q(y)) + q(y)(p(x) + p(y))]}$$
$$\overset{(a_1)}{\geq} \frac{(\beta' - \beta'')^2\, p^2(x)\, p^2(y)}{[p(x)(q(x) + q(y)) + q(x)(p(x) + p(y))][p(y)(q(x) + q(y)) + q(y)(p(x) + p(y))]}$$
$$\overset{(a_2)}{\geq} \frac{(\beta' - \beta'')^2}{\max^2(1, \beta', \beta'')} \cdot \frac{p(x)\, p(y)}{4(p(x) + p(y))^2} \overset{(b)}{\geq} \frac{(\beta' - \beta'')^2}{18 \max^2(1, \beta', \beta'')} \overset{(c)}{\geq} \frac{\beta^2}{1800}.$$
$(a_1)$ and $(a_2)$ follow by substituting $q(x) = \beta'' p(x)$ and $q(y) = \beta' p(y)$. $(b)$ follows from $p(x) \leq 2p(y)$ and $p(y) \leq 2p(x)$. $\beta'' \leq 1$ and $\beta' - \beta'' \geq \frac{\beta}{5}$, and hence the RHS in $(b)$ is minimized by $\beta' = 1 + \frac{\beta}{5}$ and $\beta'' = 1$; for these values of $\beta', \beta''$, $\max(1, \beta', \beta'') \leq 2$, and hence $(c)$. Thus Test-equal outputs diff with probability $> 1 - \frac{\epsilon\delta}{48}$.

If $\|p - q\|_1 \geq \epsilon$, then by Lemma 4, step 1 picks a tuple $(x, \beta, \alpha)$ such that $p(x) - q(x) \geq p(x)\,\beta$ with probability at least $\frac{1}{5}$. Conditioned on this event, for the three cases discussed above the minimum probability of outputting diff is $\frac{1}{6}$, and every $p, q$ falls into one of the three categories. Hence with probability $> \frac{1}{30}$, Identity-test outputs diff.

We now compute the sample complexity of Identity-test. Step 1 of the algorithm uses $16/\epsilon$ samples.
For every tuple $(x, \beta, \alpha)$, step 2(c) of the algorithm uses $O\!\left(\frac{1}{\beta^2}\log\frac{1}{\delta\epsilon}\right)$ samples. Summing over all tuples yields a sample complexity of
$$\sum_{j=1}^{16/\epsilon} O\!\left(\frac{1}{\beta_j^2}\log\frac{1}{\delta_j\epsilon}\right) = O\!\left(\frac{1}{\epsilon^2}\log\frac{1}{\delta\epsilon}\right).$$
For the different tuples, Test-equal$\left((\alpha\beta/5)^2, \frac{\epsilon\delta}{30}, p_{\{G_x, G_x^c\}}, q_{\{G_x, G_x^c\}}\right)$ can reuse samples, and as $\alpha\beta = \Omega(\epsilon/\log(1/\epsilon))$, it uses a total of
$$\Theta\!\left(\frac{1}{\epsilon^2}\log^2\frac{1}{\epsilon}\log\frac{1}{\delta\epsilon}\right)$$
samples. Furthermore, Near-uniform-identity-test uses $O\!\left(\frac{1}{\beta^2}\log\frac{1}{\epsilon\delta}\right)$ samples; summing over all tuples, the sample complexity is $O\!\left(\frac{1}{\epsilon^2}\log\frac{1}{\delta\epsilon}\right)$. Summing over all three cases, the sample complexity of the algorithm is
$$O\!\left(\frac{1}{\epsilon^2}\log^2\frac{1}{\epsilon}\log\frac{1}{\delta\epsilon}\right).$$

C Closeness testing proofs

C.1 Proof of Lemma 8

Recall that $G_i = \{j : j \geq i\}$. Let $r_i = \frac{p(i) + q(i)}{2}$ and $s_i = \frac{p(i) - q(i)}{2}$. We will use the following properties: $\sum_{i=1}^k r_i = 1$, $\sum_{i=1}^k s_i = 0$, and $\sum_{i=1}^k |s_i| \geq \frac{\epsilon}{2}$. We will show that
$$\sum_{i=1}^k r_i \left|\frac{s_i}{r_i} - \frac{\sum_{j=i}^k s_j}{\sum_{j=i}^k r_j}\right| \geq \frac{\epsilon}{4}.$$
We show that
$$\sum_{i=1}^k r_i \left|\frac{s_i}{r_i} - \frac{\sum_{j=i}^k s_j}{\sum_{j=i}^k r_j}\right| \geq \frac{|s_1| + |s_2| - |s_1 + s_2|}{2} + (r_1 + r_2) \left|\frac{s_1 + s_2}{r_1 + r_2} - \frac{\sum_{j=1}^k s_j}{\sum_{j=1}^k r_j}\right| + \sum_{i=3}^k r_i \left|\frac{s_i}{r_i} - \frac{\sum_{j=i}^k s_j}{\sum_{j=i}^k r_j}\right|,$$
thus reducing the problem from $k$ indices to $k - 1$ indices, with $s_1, s_2, \ldots, s_k$ going to $s_1 + s_2, s_3, \ldots, s_k$ and $r_1, r_2, \ldots, r_k$ going to $r_1 + r_2, r_3, r_4, \ldots, r_k$. Continuing similarly, we can reduce the $k - 1$ indices to $k - 2$ indices with terms $s_1 + s_2 + s_3, s_4, \ldots, s_k$ and $r_1 + r_2 + r_3, r_4, \ldots, r_k$, and so on. Telescopically adding the sums,
$$\sum_{i=1}^k r_i \left|\frac{s_i}{r_i} - \frac{\sum_{j=i}^k s_j}{\sum_{j=i}^k r_j}\right| \geq \frac{|s_1| + |s_2| - |s_1 + s_2|}{2} + \frac{|s_1 + s_2| + |s_3| - |s_1 + s_2 + s_3|}{2} + \cdots = \frac{\sum_{i=1}^k |s_i|}{2} \geq \frac{\epsilon}{4},$$
where the last equality follows from the fact that $\sum_{i=1}^k s_i = 0$.
To prove the required inductive step, it suffices to show
$$\sum_{i=1}^2 r_i \left|\frac{s_i}{r_i} - \frac{\sum_{j=i}^k s_j}{\sum_{j=i}^k r_j}\right| \geq \frac{|s_1| + |s_2| - |s_1 + s_2|}{2} + (r_1 + r_2) \left|\frac{s_1 + s_2}{r_1 + r_2} - \frac{\sum_{j=1}^k s_j}{\sum_{j=1}^k r_j}\right| = \frac{|s_1| + |s_2| + |s_1 + s_2|}{2},$$
where the last equality follows from the fact that $\sum_{i=1}^k s_i = 0$. Rewriting the left hand side using the fact that $\sum_{i=1}^k s_i = 0$, it equals
$$|s_1| + r_2 \left|\frac{s_2}{r_2} + \frac{s_1}{r_2 + r_3'}\right|, \quad\text{where } r_3' = \sum_{j=3}^k r_j.$$
Thus it suffices to show
$$|s_1| + r_2 \left|\frac{s_2}{r_2} + \frac{s_1}{r_2 + r_3'}\right| \geq \frac{|s_1| + |s_2| + |s_1 + s_2|}{2}.$$
We prove this by considering three sub-cases: $s_1, s_2$ have the same sign; $s_1, s_2$ have different signs and $|s_1| \geq |s_2|$; and $s_1, s_2$ have different signs and $|s_1| < |s_2|$. If $s_1$ and $s_2$ have the same sign, then
$$|s_1| + r_2 \left|\frac{s_2}{r_2} + \frac{s_1}{r_2 + r_3'}\right| \geq |s_1| + r_2 \left|\frac{s_2}{r_2}\right| = |s_1| + |s_2| = \frac{|s_1| + |s_2| + |s_1 + s_2|}{2}.$$
If $s_1$ and $s_2$ have different signs and $|s_1| \geq |s_2|$, then
$$|s_1| + r_2 \left|\frac{s_2}{r_2} + \frac{s_1}{r_2 + r_3'}\right| \geq |s_1| = \frac{|s_1| + |s_1|}{2} = \frac{|s_1| + |s_2| + |s_1 + s_2|}{2}.$$
If $s_1$ and $s_2$ have different signs and $|s_1| < |s_2|$, then
$$|s_1| + r_2 \left|\frac{s_2}{r_2} + \frac{s_1}{r_2 + r_3'}\right| \geq |s_1| + r_2 \left|\frac{s_2}{r_2} + \frac{s_1}{r_2}\right| = |s_1| + |s_2 + s_1| = \frac{|s_1| + |s_2| + |s_1 + s_2|}{2}.$$

C.2 Proof of Lemma 10

We prove this lemma using several smaller sub-results. We first state a concentration result, which follows from Bernstein's inequality.

Lemma 15. Consider a set $G$ such that $\max_{j \in G} r(j) \leq r_{\max}$. Consider the set $S$ formed by selecting each element of $G$ independently with probability $r_0$. Then $\mathbb{E}[r(S)] = r_0\, r(G)$, and with probability $\geq 1 - 2\delta$,
$$|r(S) - \mathbb{E}[r(S)]| \leq \sqrt{2 r_0\, r_{\max}\, r(G) \log\frac{1}{\delta}} + r_{\max} \log\frac{1}{\delta}.$$
Furthermore, $\mathbb{E}[|S|] = r_0 |G|$, and with probability $\geq 1 - 2\delta$,
$$\big||S| - r_0 |G|\big| \leq \sqrt{2 r_0 |G| \log\frac{1}{\delta}} + \log\frac{1}{\delta}.$$
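Lemma 15 can be illustrated empirically: draw many random subsets and count how often $r(S)$ stays within the stated deviation. Below is a minimal sketch with toy numbers; the group $G$, the masses $r(j)$, and the parameters are made up for illustration:

```python
import math
import random

def r_of_random_subset(r, G, r0, rng):
    """r(S) for S formed by keeping each element of G independently w.p. r0."""
    return sum(r[j] for j in G if rng.random() < r0)

rng = random.Random(2)
# Toy group: 200 elements of equal mass, so r(G) = 1 and r_max = 1/200.
G = range(200)
r = {j: 1 / 200 for j in G}
r0, delta = 0.3, 0.05
r_max, r_G = 1 / 200, 1.0

# Deviation promised by Lemma 15 with probability >= 1 - 2*delta.
bound = (math.sqrt(2 * r0 * r_max * r_G * math.log(1 / delta))
         + r_max * math.log(1 / delta))
trials = 2000
inside = sum(abs(r_of_random_subset(r, G, r0, rng) - r0 * r_G) <= bound
             for _ in range(trials))
fraction_inside = inside / trials  # should be at least 1 - 2*delta = 0.9
```

On this toy instance the empirical deviation of $r(S)$ is well inside the Bernstein-style bound, as the lemma predicts.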
C.2.1 Results on Prune-set

We now show that with high probability Prune-set never removes an element with probability $\leq 2r(i)$.

Lemma 16. The probability that the algorithm removes an element $j$ from the set $S$ such that $r(j) \leq 2r(i)$ during step 2 of Binary-search is $\leq \delta/5$.

Proof. If $r(j) < 2r(i)$, then $\frac{r(j)}{r(j) + r(i)} \leq \frac{2}{3}$. Applying the Chernoff bound,
$$\Pr\left(n(j) \geq \frac{3n_2}{4}\right) \leq e^{-n_2/72}.$$
Since the algorithm uses this step no more than $O(n_1 \log\log k)$ times, the total error probability is less than $O(n_1 \log\log k \cdot e^{-n_2/72})$. Since $n_1$ is $\mathrm{poly}(\log\log\log k, \epsilon^{-1}, \log\delta^{-1})$ and $n_2 = O(\log\log\log k + \log\frac{1}{\epsilon\delta})$, the error probability is $\leq \delta/5$.

We now show that Prune-set removes all elements with probability $\geq 4r(i)$ with high probability. Recall that $\delta' = \frac{\delta}{40 m \log\log k}$.

Lemma 17. If element $i$ is $\alpha$-heavy, $\beta$-approximable, and $r_{\text{guess}} \leq \frac{\gamma}{\beta}\, \frac{r(i)}{r(G_i)}$, then Prune-set removes all elements $j$ such that $r(j) > 4r(i)$ during all calls of step 2 of Binary-search, with probability $\geq 1 - \frac{\delta}{5}$.

Proof. Let $A = \{j : r(j) \leq 4r(i)\}$ and $S' = S \cap A$. By Lemma 15, with probability $\geq 1 - 2\delta'$,
$$r(S') \leq r_{\text{guess}}\, r(A) + \sqrt{8 r_{\text{guess}}\, r(i)\, r(A) \log\frac{1}{\delta'}} + 4 r(i) \log\frac{1}{\delta'} \leq 2 r_{\text{guess}} + 8 r(i) \log\frac{1}{\delta'},$$
where the last inequality follows from the identity $\sqrt{2ab} \leq a + b$. Observe that $|A^c| \leq \frac{1}{4r(i)}$. Let $S'' = S \setminus S'$. By Lemma 15, with probability $\geq 1 - 2\delta'$,
$$\nu \overset{\text{def}}{=} |S''| \leq \frac{r_{\text{guess}}}{4 r(i)} + \sqrt{2\, \frac{r_{\text{guess}}}{4 r(i)} \log\frac{1}{\delta'}} + \log\frac{1}{\delta'} \leq \frac{r_{\text{guess}}}{2 r(i)} + 2 \log\frac{1}{\delta'}.$$
$S$ has $\nu$ elements with probability $> 4r(i)$. Suppose we have observed $j$ of these elements and removed them from $S$, so $\nu - j$ of them are left in $S$. After taking another $\eta$ samples from $S$, the probability of not observing a $(j+1)$-st heavy element is $< \left(\frac{r(S')}{r(S') + 4r(i)(\nu - j)}\right)^{\eta}$.
Therefore,
$$\eta_j \overset{\text{def}}{=} \log\frac{\nu}{\delta'} \cdot \left(1 + \frac{r(S')}{4 r(i)(\nu - j)}\right) \geq \frac{\log\frac{\nu}{\delta'}}{\log\left(1 + \frac{4 r(i)(\nu - j)}{r(S')}\right)}$$
samples suffice to observe an element from $S''$ with probability $> 1 - \frac{\delta'}{\nu}$. After observing such a sample (call it $j$), similar to the proof of Lemma 16 it can be shown that with probability $\geq 1 - \delta'$, for samples from $r_{\{j,i\}}$, $n(j) \geq 3n_2/4$ and hence $j$ will be removed from $S$. Thus to remove all $\nu$ elements of probability $> 4r(i)$, we need to repeat this step
$$n_1 = \sum_{j=1}^{\nu} \eta_j = \log\frac{\nu}{\delta'} \cdot \sum_{j=1}^{\nu} \left(1 + \frac{r(S')}{4 r(i)\, j}\right) \leq \log\frac{\nu}{\delta'} \cdot \left(\nu + \frac{r(S')}{4 r(i)} \log\nu\right)$$
times. Substituting $r(S')$ and $\nu$ in the RHS and simplifying, we have
$$n_1 \leq 4 \log\frac{\gamma^2}{\delta'\alpha\beta} \left(\frac{\gamma^2}{\alpha\beta} \log\frac{\gamma^2}{\alpha\beta} + 2 \log\frac{1}{\delta'} \log\log\frac{1}{\delta'}\right).$$
By the union bound, the total error probability is $\leq \delta'$. Since the number of calls to Prune-set during step 2 of the algorithm is at most $\log\log k$, the error is at most $\log\log k \cdot 2\delta' \leq \delta/5$, and the lemma follows from the union bound.

C.2.2 Proof of Lemma 10

The proof of Lemma 10 follows from the following two sub-lemmas. In Lemma 18, we show that if $r_{\text{guess}} \geq \gamma\, \frac{r(i)}{r(G_i)}$ then step 4 will return heavy, and in Lemma 19 that if $r_{\text{guess}} \leq \frac{1}{\gamma}\, \frac{r(i)}{r(G_i)}$ the algorithm outputs light, with high probability. Since we have $\log\log k$ iterations and $\frac{1}{k} \leq \frac{r(i)}{r(G_i)} \leq 1$, we reach
$$\frac{r(i)}{\gamma\, r(G_i)} \leq r_{\text{guess}} \leq \frac{\gamma\, r(i)}{\beta\, r(G_i)}$$
at some point of the algorithm.

Lemma 18. If $r_{\text{guess}} > \frac{\gamma}{\beta}\, \frac{r(i)}{r(G_i)}$, $i$ is $\alpha$-heavy and $\beta$-approximable, and Prune-set has removed none of the elements with probability $\leq 2r(i)$, then with probability $\geq 1 - 4\delta'$, step 4 outputs heavy.

Proof. Let $G_i' = G_i \setminus \{i\}$. Since $i$ is $\alpha$-heavy and $\beta$-approximable, by convexity
$$\frac{r(G_i')}{r(G_i)} \left|\frac{p(i) - q(i)}{p(i) + q(i)} - \frac{p(G_i') - q(G_i')}{p(G_i') + q(G_i')}\right| \geq \left|\frac{p(i) - q(i)}{p(i) + q(i)} - \frac{p(G_i) - q(G_i)}{p(G_i) + q(G_i)}\right| \geq \beta.$$
Hence $r(G'_i) \ge \beta r(G_i)/2$. By assumption, all the elements with probability $\le 2r(i)$ in set $S$ remain after pruning. Thus all the elements in $S$ from $G'_i$ remain after pruning. Let $S' = G'_i \cap S$. By Lemma 15 with $G = G'_i$, $r_{\max} = r(i)$, and $r_0 = r_{\mathrm{guess}}$,
$$\Pr\left( r(S') \le r_{\mathrm{guess}} r(G'_i) - \sqrt{2 r_{\mathrm{guess}} r(i) r(G'_i) \log\tfrac{1}{\delta'}} - r(i)\log\tfrac{1}{\delta'} \right) < 2\delta'. \qquad (3)$$
Taking derivatives, it can be shown that the slope (in $r_{\mathrm{guess}}$) of the term inside the parentheses is
$$r(G'_i) - \sqrt{\frac{r(i)\, r(G'_i) \log\frac{1}{\delta'}}{2\, r_{\mathrm{guess}}}},$$
which is positive for $r_{\mathrm{guess}} \ge \frac{\gamma\, r(i)}{\beta\, r(G_i)}$. Thus the value is minimized at $r_{\mathrm{guess}} = \frac{\gamma\, r(i)}{\beta\, r(G_i)}$ in the range $\left[ \frac{\gamma\, r(i)}{\beta\, r(G_i)}, \infty \right)$, and simplifying this lower bound using the values of $\gamma$ and $\beta$, we get
$$r_{\mathrm{guess}} r(G'_i) - \sqrt{2 r_{\mathrm{guess}} r(i) r(G'_i) \log\tfrac{1}{\delta'}} - r(i)\log\tfrac{1}{\delta'} \ge \frac{\gamma\, r(i)}{4}.$$
Since $\Pr(X < b) \le \Pr(X < b+t)$ for $t \ge 0$, we have
$$\Pr\left( r(S') \le \frac{\gamma\, r(i)}{4} \right) < 2\delta'.$$
Hence with probability $\ge 1 - 2\delta'$, $\frac{r(i)}{r(S)+r(i)} \le \frac{r(i)}{r(S')} \le \frac{4}{\gamma}$. By the Chernoff bound,
$$\Pr\left( \frac{n(i)}{n_3} > \frac{5}{\gamma} \right) \le \Pr\left( \frac{n(i)}{n_3} > \frac{r(i)}{r(S)+r(i)} + \frac{1}{\gamma} \right) \le e^{-2n_3/\gamma^2}.$$
Therefore, for $n_3 \ge O\left( \gamma^2 \log\frac{\log\log k}{\delta} \right)$, step 4 outputs heavy with probability $\ge 1 - 2\delta'$. By the union bound the total error probability is $\le 4\delta'$.

Lemma 19. If $r_{\mathrm{guess}} < \frac{r(i)}{\gamma}$ and Prune-set has removed all elements with probability $\ge 4r(i)$ and none of the elements with probability $\le 2r(i)$, then with probability $\ge 1 - 4\delta'$, step 3 outputs light.

Proof. The proof is similar to that of Lemma 18. By assumption all the elements have probability $\le 4r(i)$. By Lemma 15,
$$\Pr\left( r(S) > r(i)\left[ \sqrt{8\,\frac{r_{\mathrm{guess}}}{r(i)} \log\tfrac{1}{\delta'}} + \frac{r_{\mathrm{guess}}}{r(i)} + 4\log\tfrac{1}{\delta'} \right] \right) \le 2\delta'.$$
Similar to the analysis after Equation (3), taking derivatives it can be shown that the term inside the parentheses is maximized at $r_{\mathrm{guess}} = \frac{r(i)}{\gamma}$ over the range $\left[0, \frac{r(i)}{\gamma}\right]$. Thus simplifying the above expression with this value of $r_{\mathrm{guess}}$ and the value of $\gamma$, with probability $\ge 1 - 2\delta'$, $r(S) \le \gamma\, r(i)/10$. Thus with probability $\ge 1 - 2\delta'$,
$$\frac{r(i)}{r(S) + r(i)} \ge \frac{1}{1 + \gamma/10} \ge \frac{6}{\gamma}.$$
By the Chernoff bound,
$$\Pr\left( \frac{n_1(i)}{n_3} \le \frac{5}{\gamma} \right) \le \Pr\left( \frac{n_1(i)}{n_3} \le \frac{r(i)}{r(S)+r(i)} - \frac{1}{\gamma} \right) \le e^{-2n_3/\gamma^2}.$$
The lemma follows from the bound on $n_3$, and by the union bound the total error probability is $\le 4\delta'$.

Note that the conditions in Lemmas 18 and 19 hold with probability $\ge 1 - \frac{2\delta}{5}$ by Lemmas 17 and 16. Furthermore, since we use all the steps at most $\log\log k$ times, by the union bound the conclusion in Lemma 10 fails with probability $\le \frac{2\delta}{5} + \log\log k \cdot 8\delta' \le \delta$.

C.3 Proof of Theorem 11

For ease of readability, we divide the proof into several sub-cases. We first show that if $p = q$, then the algorithm returns same with high probability. Recall that for notational simplicity we redefine $\delta' = \frac{\epsilon\delta}{32\, m\, (n_4+1) \log\log k}$.

Lemma 20. If $p = q$, Closeness-test outputs same with error probability $\le \delta$.

Proof. Note that the algorithm returns diff only if any of the Test-Equals returns diff. We call Test-Equal at most $\frac{16}{\epsilon} \cdot \log\log k \cdot m \cdot (n_4+1)$ times. The probability that any one call returns an error is $\le \delta'$. Thus by the union bound the total error probability is $\le \delta' \cdot \frac{16 \log\log k \cdot m\, (n_4+1)}{\epsilon} \le \delta$.

We now prove the result when $\|p - q\|_1 \ge \epsilon$. We first state a lemma showing that Prune-set ensures that the set $S$ does not have any elements of probability $\ge 4r(i)$. The proof is similar to those of Lemmas 17 and 16 and hence omitted.

Lemma 21.
If $i$ is $\alpha$-heavy and $\beta$-approximable, then at any call of step 2 of Assisted-closeness-test, with probability $\ge 1 - \frac{2\delta}{5}$: if $r_{\mathrm{guess}} \le \frac{\gamma}{\beta}\frac{r(i)}{r(G_i)}$, then Prune-set never removes an element with probability $\le 2r(i)$ and removes all elements with probability $\ge 4r(i)$.

The proof when $\|p - q\|_1 \ge \epsilon$ is divided into two parts based on the probability of certain events. Let
$$\beta' = \frac{p(i)-q(i)}{p(i)+q(i)}, \qquad \beta'' = \frac{\alpha\beta}{128\gamma \log\frac{128\gamma}{\alpha\beta^2}}.$$
Let $D$ denote the event that an element $j$ from $G_i^c$ with $\left| \frac{p(j)-q(j)}{p(j)+q(j)} - \beta' \right| \ge \beta''$ and $r(j) \le 4r(i)$ gets included in $S$. We divide the proof into two cases: $\Pr(D) \ge \frac{\alpha\beta^2}{128\gamma}$ and $\Pr(D) < \frac{\alpha\beta^2}{128\gamma}$.

Lemma 22. Suppose $\|p-q\|_1 \ge \epsilon$. If $i$ is $\alpha$-heavy and $\beta$-approximable, $\frac{r(i)}{\gamma} \le r_{\mathrm{guess}} \le \frac{\gamma}{\beta}\frac{r(i)}{r(G_i)}$, the conclusions in Lemma 21 hold, and $\Pr(D) \ge \frac{\alpha\beta^2}{128\gamma}$, then step 3(a) of Assisted-closeness-test returns diff with probability $\ge 1/5$.

Proof. We show that the following four events happen with high probability for at least one set $S \in \{S_1, S_2, \ldots, S_m\}$:

• $S$ includes a $j$ such that $\left| \frac{p(j)-q(j)}{p(j)+q(j)} - \beta' \right| \ge \beta''$, $r(j) \le 4r(i)$, $j \notin G_i$.
• $r(S) \le r_{\mathrm{guess}} + 8\sqrt{r(i)\, r_{\mathrm{guess}}}$.
• $j$ appears when $S$ is sampled $n_4$ times.
• Test-Equal returns diff.

Clearly, if the above four events happen then the algorithm outputs diff. Thus to bound the error probability, we bound the error probability of each of the four events and use the union bound. The probability that at least one of the sets contains an element $j$ such that $\left| \frac{p(j)-q(j)}{p(j)+q(j)} - \beta' \right| \ge \beta''$, $r(j) \le 4r(i)$, $j \notin G_i$ is
$$1 - (1 - \Pr(D))^m \ge 1 - e^{-\Pr(D)\, m} \ge \frac{5}{6}.$$
Let $S' = \{j \in S : r(j) \le 4r(i)\}$. Observe that before pruning, $\mathbb{E}[r(S')] \le r_{\mathrm{guess}}$ and $\mathrm{Var}(r(S')) \le 4\, r(i)\, r_{\mathrm{guess}}$.
Hence by the Chebyshev bound, with probability $\ge 1 - 1/16$,
$$r(S') \le r_{\mathrm{guess}} + 8\sqrt{r(i)\, r_{\mathrm{guess}}}.$$
After pruning, $S$ contains only elements from $S'$. Hence with probability $\ge 1 - 1/16$, $r(S) \le r_{\mathrm{guess}} + 8\sqrt{r(i)\, r_{\mathrm{guess}}}$. The probability that the element $j$ appears when $S$ is sampled $n_4$ times is
$$1 - \left(1 - \frac{r(j)}{r(S)}\right)^{n_4} \ge 1 - \left(1 - \frac{r(i)}{r_{\mathrm{guess}}\left(1 + 8\sqrt{r(i)/r_{\mathrm{guess}}}\right)}\right)^{n_4} \ge 1 - \left(1 - \frac{\alpha\beta}{9\gamma}\right)^{n_4} \ge \frac{5}{6}.$$
Since $\left| \frac{p(j)-q(j)}{p(j)+q(j)} - \beta' \right| \ge \beta''$ and $r(i) \le r(j) \le 4r(i)$, by Lemma 14 the chi-squared distance is
$$\ge \frac{(\beta'')^2\, r(j)\, r(i)}{4(r(i)+r(j))^2} \ge \frac{(\beta'')^2}{25}.$$
Thus by Lemma 1, the algorithm outputs diff with probability $\ge 1 - \delta'$. By the union bound the total error probability is $\le 1/6 + 1/16 + 1/6 + \delta' \le 4/5$.

Lemma 23. Suppose $\|p-q\|_1 \ge \epsilon$. If $i$ is $\alpha$-heavy and $\beta$-approximable, $\frac{r(i)}{\gamma} \le r_{\mathrm{guess}} \le \frac{\gamma}{\beta}\frac{r(i)}{r(G_i)}$, the conclusions in Lemma 21 hold, and $\Pr(D) < \frac{\alpha\beta^2}{128\gamma}$, then step 3(b) of Assisted-closeness-test returns diff with probability $\ge 1/5$.

Proof. We show that the following four events happen with high probability for at least some set $S \in \{S_1, S_2, \ldots, S_m\}$. Let $S' = S \cap G_i$ and $G'_i = G_i \setminus \{i\}$. Let
$$Z = r(S')\left| \beta' - \frac{p(S')-q(S')}{p(S')+q(S')} \right| = \frac{\left| \beta'(p(S')+q(S')) - (p(S')-q(S')) \right|}{2}.$$

• $Z \ge r_{\mathrm{guess}} \left| \beta'(p(G'_i)+q(G'_i)) - (p(G'_i)-q(G'_i)) \right|/4$.
• $r(S) \le 8(r(i)+r_{\mathrm{guess}}) \log\frac{128\gamma}{\alpha\beta^2}$.
• Event $D$ does not happen.
• Test-Equal outputs diff.

Clearly if all of the above events happen, then the test outputs diff. We now bound the error probability of each of the events and use the union bound. Since none of the elements in $S'$ undergo pruning, the value of $Z$ is unchanged before and after pruning. Thus any concentration inequality for $Z$ remains valid after pruning.
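The constant $1/25$ in the chi-squared bound in the proof of Lemma 22 comes from minimizing $\frac{r(j)\,r(i)}{4(r(i)+r(j))^2}$ over $r(i) \le r(j) \le 4r(i)$; writing $r(j) = t\, r(i)$ makes the factor scale-free. A small numeric confirmation, illustrative only:

```python
def chi2_factor(t):
    """r(j)*r(i) / (4*(r(i)+r(j))^2) with r(j) = t*r(i); r(i) cancels."""
    return t / (4 * (1 + t) ** 2)

# The factor t/(4*(1+t)^2) is decreasing for t >= 1, so over 1 <= t <= 4
# the minimum is attained at t = 4, where it equals 4/100 = 1/25.
vals = [chi2_factor(1 + 3 * k / 1000) for k in range(1001)]  # grid on [1, 4]
min_val = min(vals)
```

The grid minimum lands at the endpoint $t = 4$, matching the stated constant.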
We now compute the expectation and variance of $Z$ and use the Paley–Zygmund inequality. The expectation satisfies
$$\mathbb{E}[Z] = \mathbb{E}\left[ \left| \beta'(p(S')+q(S')) - (p(S')-q(S')) \right| \right]/2 \ge \left| \mathbb{E}\left[ \beta'(p(S')+q(S')) - (p(S')-q(S')) \right] \right|/2 = r_{\mathrm{guess}} \left| \beta'(p(G'_i)+q(G'_i)) - (p(G'_i)-q(G'_i)) \right|/2,$$
where the inequality follows from convexity of the $|\cdot|$ function. Let $\mathbf{1}(j, S')$ denote the event that $j \in S'$. The variance is upper bounded as
$$\begin{aligned}
\mathrm{Var}(Z) &= \mathbb{E}[Z^2] - (\mathbb{E}[Z])^2 \\
&= \mathbb{E}\left[ \left(\beta'(p(S')+q(S')) - (p(S')-q(S'))\right)^2 \right]/4 - \mathbb{E}^2\left[ \left| \beta'(p(S')+q(S')) - (p(S')-q(S')) \right| \right]/4 \\
&\stackrel{(a)}{\le} \mathbb{E}\left[ \left(\beta'(p(S')+q(S')) - (p(S')-q(S'))\right)^2 \right]/4 - \mathbb{E}^2\left[ \beta'(p(S')+q(S')) - (p(S')-q(S')) \right]/4 \\
&= \mathrm{Var}\left( \beta'(p(S')+q(S')) - (p(S')-q(S')) \right)/4 \\
&\stackrel{(b)}{=} \sum_{j \in G'_i} \mathrm{Var}\left( \mathbf{1}(j,S')\left(\beta'(p(j)+q(j)) - (p(j)-q(j))\right) \right)/4 \\
&\le \sum_{j \in G'_i} \mathbb{E}[\mathbf{1}(j,S')]\left(\beta'(p(j)+q(j)) - (p(j)-q(j))\right)^2/4 \\
&= \sum_{j \in G'_i} r_{\mathrm{guess}} \left(\beta'(p(j)+q(j)) - (p(j)-q(j))\right)^2/4 \\
&\le \max_{j' \in G'_i} \left| \beta'(p(j')+q(j')) - (p(j')-q(j')) \right| \cdot r_{\mathrm{guess}} \sum_{j \in G'_i} \left| \beta'(p(j)+q(j)) - (p(j)-q(j)) \right|/4 \\
&\stackrel{(c)}{\le} 4\, r(i) \cdot r_{\mathrm{guess}}\, r(G'_i).
\end{aligned}$$
Here $(a)$ follows from the bound on the expectation, $(b)$ follows from the independence of the events $\mathbf{1}(j, S')$, and $(c)$ follows from the facts that $p(j)+q(j) = 2r(j) \le 2r(i)$, $|\beta'| \le 1$, and $\sum_{j} r(j) \le r(G'_i)$. Hence by the Paley–Zygmund inequality,
$$\Pr\left( Z \ge r_{\mathrm{guess}} \left| \beta'(p(G'_i)+q(G'_i)) - (p(G'_i)-q(G'_i)) \right|/4 \right) \ge \Pr\left( Z \ge \mathbb{E}[Z]/2 \right) \ge \frac{1}{4}\,\frac{\mathbb{E}^2[Z]}{\mathrm{Var}(Z) + \mathbb{E}^2[Z]} \ge \frac{1}{4}\,\frac{\mathbb{E}^2[Z]}{4\, r(G'_i)\, r(i)\, r_{\mathrm{guess}} + \mathbb{E}^2[Z]}.$$
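The Paley–Zygmund step above, in the form $\Pr(Z \ge \mathbb{E}[Z]/2) \ge \frac{1}{4}\frac{\mathbb{E}^2[Z]}{\mathrm{Var}(Z)+\mathbb{E}^2[Z]}$, can be verified exactly on small discrete nonnegative random variables; a sketch with arbitrary example distributions:

```python
def pz_check(values, probs):
    """Check Pr[Z >= E[Z]/2] >= (1/4) * E[Z]^2 / (Var(Z) + E[Z]^2)
    for a finite nonnegative random variable given by (values, probs)."""
    ez = sum(v * p for v, p in zip(values, probs))
    ez2 = sum(v * v * p for v, p in zip(values, probs))
    var = ez2 - ez * ez
    lhs = sum(p for v, p in zip(values, probs) if v >= ez / 2)
    rhs = 0.25 * ez * ez / (var + ez * ez)
    return lhs >= rhs

# Two arbitrary nonnegative distributions (illustrative values only).
ok = pz_check([0, 1, 4], [0.5, 0.3, 0.2]) and pz_check([0, 10], [0.9, 0.1])
```

Note that $\mathrm{Var}(Z) + \mathbb{E}^2[Z] = \mathbb{E}[Z^2]$, so this is the standard second-moment form with threshold $\theta = 1/2$.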
Since $i$ is $\beta$-approximable, by convexity
$$\frac{r(G'_i)}{r(G_i)} \left| \frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(G'_i)-q(G'_i)}{p(G'_i)+q(G'_i)} \right| \ge \left| \frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(G_i)-q(G_i)}{p(G_i)+q(G_i)} \right| \ge \beta.$$
Hence,
$$r(G'_i) \left| \frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(G'_i)-q(G'_i)}{p(G'_i)+q(G'_i)} \right| \ge r(G_i)\,\beta.$$
Thus $\mathbb{E}[Z] \ge r_{\mathrm{guess}}\, r(G_i)\, \beta$ and
$$\begin{aligned}
\Pr\left( Z \ge r_{\mathrm{guess}} \left| \beta'(p(G'_i)+q(G'_i)) - (p(G'_i)-q(G'_i)) \right|/4 \right)
&\ge \frac{1}{4}\,\frac{(r_{\mathrm{guess}}\, r(G_i)\,\beta)^2}{4\, r(i)\, r_{\mathrm{guess}}\, r(G'_i) + (r_{\mathrm{guess}}\, r(G_i)\,\beta)^2} \\
&\ge \frac{1}{4}\,\frac{(r_{\mathrm{guess}}\, r(G_i)\,\beta)^2}{2\max\left(4\, r(i)\, r_{\mathrm{guess}}\, r(G'_i),\ (r_{\mathrm{guess}}\, r(G_i)\,\beta)^2\right)} \\
&= \frac{1}{8}\min\left(1,\ \frac{(r_{\mathrm{guess}}\, r(G_i)\,\beta)^2}{4\, r(i)\, r_{\mathrm{guess}}\, r(G'_i)}\right) \\
&\ge \frac{r(G_i)\,\beta^2}{32\gamma} \ge \frac{\alpha\beta^2}{32\gamma}. \qquad (4)
\end{aligned}$$
By Lemma 15, with probability $\ge 1 - \frac{\alpha\beta^2}{128\gamma}$,
$$r(S) \le r_{\mathrm{guess}} + \sqrt{8\, r_{\mathrm{guess}}\, r(i) \log\frac{128\gamma}{\alpha\beta^2}} + 4\, r(i) \log\frac{128\gamma}{\alpha\beta^2} \le 8(r(i)+r_{\mathrm{guess}}) \log\frac{128\gamma}{\alpha\beta^2}. \qquad (5)$$
Let $S'' = S \setminus S'$. If event $D$ has not happened, then for all elements $j \in S''$, $\left| \beta' - \frac{p(j)-q(j)}{p(j)+q(j)} \right| \le \beta''$ and hence
$$\left| \frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(S'')-q(S'')}{p(S'')+q(S'')} \right| \le \beta''. \qquad (6)$$
Combining the above set of equations,
$$\begin{aligned}
\left| \frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(S)-q(S)}{p(S)+q(S)} \right|
&\stackrel{(a)}{\ge} \frac{r(S')}{r(S)}\left| \frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(S')-q(S')}{p(S')+q(S')} \right| - \frac{r(S'')}{r(S)}\left| \frac{p(i)-q(i)}{p(i)+q(i)} - \frac{p(S'')-q(S'')}{p(S'')+q(S'')} \right| \\
&\stackrel{(b)}{\ge} \frac{Z}{r(S)} - \beta'' \\
&\stackrel{(c)}{\ge} \frac{2\, r(G'_i)\, r_{\mathrm{guess}}}{4\, r(S)}\left| \beta' - \frac{p(G'_i)-q(G'_i)}{p(G'_i)+q(G'_i)} \right| - \beta'' \ge \frac{r(G_i)\, r_{\mathrm{guess}}\, \beta}{2\, r(S)} - \beta'' \\
&\stackrel{(d)}{\ge} \frac{r(G_i)\, r_{\mathrm{guess}}\, \beta}{8\, r(S)}.
\end{aligned}$$
Here $(a)$ follows from convexity and the fact that $|a+b| \ge |a| - |b|$, $(b)$ follows from Equation (6), and $(c)$ follows from Equation (4).
$(d)$ follows from the fact that
$$\beta'' = \frac{\alpha\beta}{128\gamma \log\frac{128\gamma}{\alpha\beta^2}} \le \frac{\beta}{128 \log\frac{128\gamma}{\alpha\beta^2}} \min\left( \frac{\alpha}{\gamma},\ \alpha \right) \le \frac{\beta}{128 \log\frac{128\gamma}{\alpha\beta^2}} \min\left( \frac{r(G_i)\, r_{\mathrm{guess}}}{r(i)},\ r(G_i) \right) = \frac{r(G_i)\, r_{\mathrm{guess}}\, \beta}{64 \log\frac{128\gamma}{\alpha\beta^2} \cdot 2\max(r(i), r_{\mathrm{guess}})} \le \frac{r(G_i)\, r_{\mathrm{guess}}\, \beta}{64 \log\frac{128\gamma}{\alpha\beta^2}\, (r(i)+r_{\mathrm{guess}})} \le \frac{r(G_i)\, r_{\mathrm{guess}}\, \beta}{8\, r(S)}.$$
Thus by Lemma 14, the chi-squared distance is lower bounded by
$$\begin{aligned}
\left( \frac{r(G_i)\, r_{\mathrm{guess}}\, \beta}{8\, r(S)} \right)^2 \frac{r(i)\, r(S)}{4(r(i)+r(S))^2}
&= \frac{r_{\mathrm{guess}}^2\, \beta^2\, r(i)\, r^2(G_i)}{2^8\, r(S)(r(i)+r(S))^2} \\
&\stackrel{(a)}{\ge} \frac{r_{\mathrm{guess}}^2\, \beta^2\, r(i)\, r^2(G_i)}{2^{20} (r(i)+r_{\mathrm{guess}})^3 \log^3\frac{128\gamma}{\alpha\beta^2}} \\
&\ge \frac{r_{\mathrm{guess}}^2\, \beta^2\, r(i)\, r^2(G_i)}{2^{20} \cdot 8\max(r^3(i),\ r^3_{\mathrm{guess}}) \log^3\frac{128\gamma}{\alpha\beta^2}} \\
&= \frac{r^2(G_i)\,\beta^2}{2^{23} \log^3\frac{128\gamma}{\alpha\beta^2}} \min\left( \frac{r(i)}{r_{\mathrm{guess}}},\ \frac{r^2_{\mathrm{guess}}}{r^2(i)} \right) \\
&\stackrel{(b)}{\ge} \frac{r^2(G_i)\,\beta^2}{2^{23} \log^3\frac{128\gamma}{\alpha\beta^2}} \min\left( \frac{\beta\, r(G_i)}{\gamma},\ \frac{1}{\gamma^2} \right) \\
&\ge \frac{\alpha^3\beta^3}{2^{23}\gamma^2 \log^3\frac{128\gamma}{\alpha\beta^2}}.
\end{aligned}$$
$(a)$ follows from Equation (5) and $(b)$ follows from the bounds on $r_{\mathrm{guess}}$. Thus, by Lemma 1, Test-Equal outputs diff with probability $\ge 1 - \delta'$. By the union bound, the error probability for a given $S \in \{S_1, S_2, \ldots, S_m\}$ is
$$\le \left(1 - \frac{\alpha\beta^2}{32\gamma}\right) + \frac{\alpha\beta^2}{128\gamma} + \frac{\alpha\beta^2}{128\gamma} + \delta' \le 1 - \frac{\alpha\beta^2}{128\gamma}.$$
Since we repeat this for $m$ sets, the probability that the test outputs diff is
$$\ge 1 - \left(1 - \frac{\alpha\beta^2}{128\gamma}\right)^m \ge 1 - e^{-\alpha\beta^2 m/(128\gamma)} \ge \frac{1}{5}.$$
Theorem 11 follows from Lemma 20 for the case $p = q$. If $\|p-q\|_1 \ge \epsilon$, then it follows from Lemmas 9 (finds a good tuple), 10 (finds a good approximation of $r_{\mathrm{guess}}$), 21 (pruning), 22 ($\Pr(D)$ is large), and 23 ($\Pr(D)$ is small). By Lemma 20 the success probability when $p = q$ is $\ge 1 - \delta$.
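The final amplification over the $m$ candidate sets uses $1 - x \le e^{-x}$, giving $1-(1-p)^m \ge 1-e^{-pm}$; a quick numeric check of this inequality with illustrative values:

```python
from math import exp

def amplified(p, m):
    """Probability that at least one of m independent trials, each
    succeeding with probability p, succeeds."""
    return 1 - (1 - p) ** m

# Since 1 - p <= exp(-p), we have (1-p)^m <= exp(-p*m) and hence
# amplified(p, m) >= 1 - exp(-p*m), for every p in [0, 1] and m >= 1.
ok = all(amplified(p, m) >= 1 - exp(-p * m) - 1e-15
         for p in (0.01, 0.1, 0.5, 0.9)
         for m in (1, 5, 50))
```

Here $p$ plays the role of the per-set success probability $\frac{\alpha\beta^2}{128\gamma}$.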
The success probability when $\|p-q\|_1 \ge \epsilon$ is at least the probability that we pick a good tuple $(i, \beta, \alpha)$ times the success probability once a good tuple is picked (accounting for the sum of the errors in Lemmas 10 and 21 plus the maximum of the errors in Lemmas 22 and 23), which can be shown to be
$$\ge \frac{1}{5}\left(\frac{1}{5} - \delta - \frac{2\delta}{5}\right) \ge \frac{1}{30}.$$

We now analyze the number of samples our algorithm uses. We first calculate the number of samples used by Assisted-closeness-test. Step 2 calls Prune-set $m$ times and each time Prune-set uses $n_1 n_2$ samples; hence step 2 uses $m n_1 n_2$ samples. Step 3(a) uses $m n_4 \cdot \tilde{O}(\beta''^{-2})$ samples and step 3(b) uses $m \cdot \tilde{O}(\epsilon^{-3})$ samples. Hence, the total number of samples used by Assisted-closeness-test is
$$m n_1 n_2 + m n_4 \cdot \tilde{O}(\beta''^{-2}) + m \cdot \tilde{O}(\epsilon^{-3}) = \tilde{O}\left( \alpha^{-1}\beta^{-2}\epsilon^{-1} + \alpha^{-1}\beta^{-2}\epsilon^{-1}\epsilon^{-2} + \alpha^{-1}\beta^{-2}\epsilon^{-3} \right) = \tilde{O}\left( \beta^{-1}\epsilon^{-4} \right).$$
Thus each Assisted-closeness-test uses $\tilde{O}(\epsilon^{-4}\beta^{-1})$ samples. Hence, the number of samples used by Binary-search is
$$\le \tilde{O}\left( \log\log k \left( n_1 n_2 + \epsilon^{-4}\beta^{-1} + n_3 \right) \right) = \tilde{O}\left( \frac{\log\log k}{\epsilon^4 \beta} \right).$$
Closeness-test calls Binary-search for $16/\epsilon$ different tuples. Hence, the sample complexity of Closeness-test is
$$\frac{16}{\epsilon} + \sum_{j=1}^{16/\epsilon} \tilde{O}\left( \frac{\log\log k}{\epsilon^4 \beta_j} \right) = \tilde{O}\left( \frac{\log\log k}{\epsilon^5} \right).$$
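The arithmetic behind the $1/30$ success bound above can be checked directly; note the admissible range of $\delta$ below is our inference from solving $\frac{1}{5}(\frac{1}{5} - \frac{7\delta}{5}) \ge \frac{1}{30}$, not a value stated explicitly in the text:

```python
def success_lower_bound(delta):
    """(1/5) * (1/5 - delta - 2*delta/5), the product bound quoted above."""
    return (1 / 5) * (1 / 5 - delta - 2 * delta / 5)

# Solving (1/5)*(1/5 - 7*delta/5) >= 1/30 gives delta <= 1/42, so the
# stated bound of 1/30 holds for all delta up to 1/42 (our inference).
ok = all(success_lower_bound(d) >= 1 / 30 - 1e-12 for d in (0.0, 0.01, 1 / 42))
```

At $\delta = 1/42$ the bound is exactly $1/30$, so the inequality is tight there.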