Data-driven efficient score tests for deconvolution problems

Langovoy Mikhail∗

Mikhail Langovoy, Institute for Mathematical Stochastics, Georg-August-University at Göttingen, Maschmuehlenweg 8/10, 37077 Goettingen, Germany
e-mail: langovoy@math.uni-goettingen.de

Abstract: We consider testing statistical hypotheses about densities of signals in deconvolution models. A new approach to this problem is proposed. We construct score tests for deconvolution with a known noise density and efficient score tests for the case of an unknown noise density. The tests are combined with model selection rules that choose reasonable model dimensions automatically from the data. Consistency of the tests is proved.

AMS 2000 subject classifications: Primary 62H15; secondary 62P30, 62E20.

Keywords and phrases: Hypothesis testing, statistical inverse problems, deconvolution, efficient score test, model selection, data-driven test.

1. Introduction

Constructing good tests for statistical hypotheses is an essential problem of statistics. There are two main approaches to constructing test statistics. In the first approach, roughly speaking, some measure of distance between the theoretical and the corresponding empirical distributions is proposed as the test statistic. Classical examples of this approach are the Cramér-von Mises and the Kolmogorov-Smirnov statistics. Although these tests work and can give very good results, each of them is asymptotically optimal only in a finite number of directions of alternatives to a null hypothesis (see Nikitin (1995)).

∗ Financial support of the Deutsche Forschungsgemeinschaft GK 1023 "Identifikation in Mathematischen Modellen" is gratefully acknowledged.

imsart-generic ver. 2007/04/13 file: Deconvolution_Arxiv.tex date: July 13, 2021
M. Langovoy/Score tests for deconvolution.
Nowadays, there is increasing interest in the second approach to constructing test statistics. The idea of this approach is to construct tests in such a way that they are asymptotically optimal. Test statistics constructed following this approach are often called (efficient) score test statistics. The pioneer of this approach was Neyman (1937), and many other works followed: Neyman (1959), Cox and Hinkley (1974), Bickel and Ritov (1992), Le Cam (1956), Ledwina (1994). This approach is also closely related to the theory of efficient (adaptive) estimation (Bickel et al. (1993), Ibragimov and Has'minskiĭ (1981)). Score tests are asymptotically optimal in the sense of intermediate efficiency in an infinite number of directions of alternatives (see Inglot and Ledwina (1996)) and show good overall performance in practice (Kallenberg and Ledwina (1995), Kallenberg and Ledwina (1997)).

So far we have described the situation in classical hypothesis testing, i.e., testing hypotheses about random variables $X_1, \dots, X_n$ whose values are directly observable. From a practical point of view, however, it is important to be able to construct tests for situations where $X_1, \dots, X_n$ are corrupted or can only be observed with an additional noise term. Problems of this kind are termed statistical inverse problems. The best-known example is the deconvolution problem. This problem appears whenever one has noisy signals or measurements: in physics, seismology, optics and imaging, and engineering. It is a building block for many complicated statistical inverse problems.

Due to the importance of the deconvolution problem, testing statistical hypotheses related to this problem has been widely studied in the literature.
To our knowledge, however, all the proposed tests were based on some kind of distance (usually an $L_2$-type distance) between the theoretical density function and the empirical estimate of the density (see, for example, Bickel and Rosenblatt (1973), Delaigle and Gijbels (2002), Holzmann et al. (2007)). Thus, only the first approach described above has been implemented for the deconvolution problem.

In this paper, we treat the deconvolution problem with the second approach. We construct efficient score tests for the problem. It is known from classical hypothesis testing that, for applications of efficient score tests, it is important to select the right number of components in the test statistic (see Bickel and Ritov (1992), Eubank et al. (1993), Kallenberg and Ledwina (1995), Fan (1996)). Thus, we provide a corresponding refinement of our tests. Following the solution proposed in Kallenberg (2002), we make our tests data-driven, i.e., the tests are able to choose a reasonable number of components in the test statistics automatically from the data.

In Section 2, we formulate the simple deconvolution problem. In Section 3, we construct the score tests for the parametric deconvolution hypothesis. In Section 5, we prove consistency of our tests against nonparametric alternatives. In Section 6, we turn to deconvolution with an unknown error density. We derive the efficient scores for the composite parametric deconvolution hypothesis in Section 7. In Section 8, we construct the efficient score tests for this case. In Section 9, we make our tests data-driven. In Section 10, we prove consistency of the tests against nonparametric alternatives.
Additionally, in Sections 5 and 10 we explicitly characterize the class of nonparametric alternatives against which our tests are inconsistent; the tests therefore should not be used for testing against alternatives from this class. Some simple examples of applications of the theory are also presented in this paper.

2. Notation and basic assumptions

The problem of testing whether i.i.d. real-valued random variables $X_1, \dots, X_n$ are distributed according to a given density $f$ is classical in statistics. We consider a more difficult problem, namely the case when $X_i$ can only be observed with an additional noise term, i.e., instead of $X_i$ one observes $Y_i$, where
\[
Y_i = X_i + \varepsilon_i ,
\]
and the $\varepsilon_i$ are i.i.d. with a known density $h$ with respect to the Lebesgue measure $\lambda$; also, $X_i$ and $\varepsilon_i$ are independent for each $i$, and $E\varepsilon_i = 0$, $0 < E\varepsilon_i^2 < \infty$. For brevity of notation, say that $X_i$, $Y_i$, $\varepsilon_i$ have the same distributions as random variables $X$, $Y$, $\varepsilon$, respectively. Assume that $X$ has a density with respect to $\lambda$.

Our null hypothesis $H_0$ is the simple hypothesis that $X$ has a known density $f_0$ with respect to $\lambda$. The most general possible nonparametric alternative hypothesis $H_A$ is that $f \neq f_0$. Since this class of alternatives is too broad, we first consider a special class of submodels of the model described above. In this paper we will at first assume that all possible alternatives from $H_A$ belong to some parametric family. Then we will propose a test that is expected to be asymptotically optimal (in some sense) against the alternatives from this parametric family. However, we will prove that our test is also consistent against other alternatives, even if they do not belong to the initial parametric family.
The test is therefore applicable in many nonparametric problems. Moreover, the test is expected to be asymptotically optimal (in some sense) for testing against an infinite number of directions of nonparametric alternatives (see Inglot and Ledwina (1996)). This is the general plan for our construction.

3. Score test for simple deconvolution

Suppose that all possible densities of $X$ belong to some parametric family $\{f_\theta\}$, where $\theta$ is a $k$-dimensional Euclidean parameter and $\Theta \subseteq \mathbb{R}^k$ is a parameter set. Then all the possible densities $q(y;\theta)$ of $Y$ have in such a model the form
\[
q(y;\theta) = \int_{\mathbb{R}} f_\theta(s)\, h(y-s)\, ds . \tag{1}
\]
The score function $\dot l$ is defined as
\[
\dot l(y;\theta) = \frac{\bigl(q(\theta)\bigr)'_\theta}{q(\theta)}\, 1_{[q(\theta) > 0]} , \tag{2}
\]
where $q(\theta) := q(y;\theta)$ and $\dot l(\theta) := \dot l(y;\theta)$ for brevity. The Fisher information matrix of the parameter $\theta$ is defined as
\[
I(\theta) = \int_{\mathbb{R}} \dot l(y;\theta)\, \dot l^{\,T}(y;\theta)\, dQ_\theta(y) . \tag{3}
\]

Definition 1. Call our problem a regular deconvolution problem if
⟨B1⟩ for all $\theta \in \Theta$, $q(y;\theta)$ is continuously differentiable in $\theta$ for $\lambda$-almost all $y$, with gradient $\dot q(\theta)$;
⟨B2⟩ $\bigl|\dot l(\theta)\bigr| \in L_2(\mathbb{R}, Q_\theta)$ for all $\theta \in \Theta$;
⟨B3⟩ $I(\theta)$ is nonsingular for all $\theta \in \Theta$ and continuous in $\theta$.

If $\theta$ is a true parameter value, call such a model $GM_k(\theta)$ and denote by $Q_\theta$ the probability distribution function and by $E_\theta$ the expectation corresponding to the density $q(\cdot\,;\theta)$.

If conditions ⟨B1⟩-⟨B3⟩ hold, then by Proposition 1, p.13 of Bickel et al. (1993) we calculate for all $y \in \operatorname{supp} q(\cdot\,;\theta)$
\[
\dot l(\theta) = \dot l(y;\theta) = \frac{\bigl(q(y;\theta)\bigr)'_\theta}{q(y;\theta)}
= \frac{\frac{\partial}{\partial\theta} \int_{\mathbb{R}} f_\theta(s)\, h(y-s)\, ds}{\int_{\mathbb{R}} f_\theta(s)\, h(y-s)\, ds} . \tag{4}
\]
Then for $y \in \operatorname{supp} q(\cdot\,;\theta)$ the efficient score vector for testing $H_0 : \theta = 0$ is
\[
l^*(y) := \dot l(y;0) = \frac{\frac{\partial}{\partial\theta} \Bigl( \int_{\mathbb{R}} f_\theta(s)\, h(y-s)\, ds \Bigr) \Big|_{\theta=0}}{\int_{\mathbb{R}} f_0(s)\, h(y-s)\, ds} . \tag{5}
\]
Set
\[
L = \bigl\{ E_0\, [l^*(Y)]^T\, l^*(Y) \bigr\}^{-1} \tag{6}
\]
and
\[
U_k = \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n l^*(Y_j) \Bigr\}\, L\, \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n l^*(Y_j) \Bigr\}^T . \tag{7}
\]

Theorem 1. For the regular deconvolution problem the efficient score vector $l^*$ for testing $\theta = 0$ in $GM_k(\theta)$ is given for all $y \in \mathbb{R}$ by (5). Moreover, under $H_0 : \theta = 0$ we have
\[
U_k \to_d \chi^2_k \quad \text{as } n \to \infty .
\]

We construct the test based on the test statistic $U_k$ as follows: the null hypothesis $H_0$ is rejected if the value of $U_k$ exceeds the standard critical points of the $\chi^2_k$-distribution. Note that we do not need to estimate the scores $l^*$.

Corollary 2. If the deconvolution problem is regular and $f_\theta(\cdot)$ is differentiable in $\theta$ for all $\theta \in \Theta$, then the conclusions of Theorem 1 are valid and the efficient score vector for testing $H_0 : \theta = 0$ can be calculated by the formula
\[
l^*(y) = \frac{\int_{\mathbb{R}} \bigl( \frac{\partial}{\partial\theta} f_\theta(s) \bigr) \big|_{\theta=0}\, h(y-s)\, ds}{\int_{\mathbb{R}} f_0(s)\, h(y-s)\, ds} . \tag{8}
\]

Example 1. Consider one important special case. Assume that each submodel of interest is given by the following restriction: all possible densities $f$ of $X$ belong to a parametric exponential family, i.e., $f = f_\theta$ for some $\theta$, where
\[
f_\theta(x) = f_0(x)\, b(\theta) \exp\bigl(\theta \circ u(x)\bigr) , \tag{9}
\]
where the symbol $\circ$ denotes the inner product in $\mathbb{R}^k$, $u(x) = (u_1(x), \dots, u_k(x))$ is a vector of known Lebesgue measurable functions, $b(\theta)$ is the normalizing factor and $\theta \in \Theta \subseteq \mathbb{R}^k$. We assume that the standard regularity assumptions on exponential families (see Barndorff-Nielsen (1978)) are satisfied.
All the possible densities $q(y;\theta)$ of $Y$ have in such a model the form
\[
q(y;\theta) = \int_{\mathbb{R}} f_0(s)\, b(\theta) \exp\bigl(\theta \circ u(s)\bigr)\, h(y-s)\, ds . \tag{10}
\]
These densities no longer need to form an exponential family. If we assume, for example, that $h > 0$ $\lambda$-almost everywhere on $\mathbb{R}$, that the functions $f_0, h, u_1, \dots, u_k$ are bounded and $\lambda$-measurable, and that there exists an open subset $\Theta_1 \subseteq \Theta$ such that $\bigl|\dot l(y;\theta)\bigr| \in L_2(Q_\theta)$ and the Fisher information matrix $I(\theta)$ is nonsingular and continuous in $\theta$, then conditions ⟨B1⟩-⟨B3⟩ are satisfied for this problem and the previous results are applicable. The score vector for the problem is
\[
l^*(y) = \frac{\int_{\mathbb{R}} u(s)\, f_0(s)\, h(y-s)\, ds}{\int_{\mathbb{R}} f_0(s)\, h(y-s)\, ds} - \int_{\mathbb{R}} u(s)\, f_0(s)\, ds . \tag{11}
\]
In other words, if we denote by $\ast$ the standard convolution of functions,
\[
l^*(y) = \frac{(u f_0) \ast h}{f_0 \ast h}(y) - E_0\, u(X) . \tag{12}
\]
Let $L$ be defined by (6) and
\[
V_k = \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n l^*(Y_j) \Bigr\}\, L\, \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n l^*(Y_j) \Bigr\}^T . \tag{13}
\]
This is the score test statistic designed to be asymptotically optimal for testing $H_0$ against the alternatives from the exponential family (9). Its asymptotic distribution under the null hypothesis $H_0$ is given by Theorem 1.

4. Selection rule

For the use of score tests in classical hypothesis testing it was shown (see the Introduction) that it is important to select the right dimension $k$ of the space of possible alternatives. An incorrect choice of the model dimension can substantially decrease the power of a test. In Section 5 we give a theoretical explanation of this fact for the case of deconvolution.
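The score vector (11)-(12) of Example 1 and the statistic (13) can be sketched numerically as follows. This is only an illustration under assumptions that are not part of the paper: $k = 1$, $u(x) = x$, $f_0$ standard normal, Gaussian noise with a hypothetical standard deviation `noise_sd`, and integrals approximated by a trapezoidal rule on a truncated interval. All function names are hypothetical.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def integrate(func, lo, hi, steps=4000):
    """Composite trapezoidal rule on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * width)
    return total * width


def score(y, f0, h, u, lo=-12.0, hi=12.0):
    """Score l*(y) of (11)-(12) for scalar u (k = 1).

    The real line is truncated to [lo, hi]; this is an assumption that is
    adequate only when f0 and h are negligible outside that interval.
    """
    numerator = integrate(lambda s: u(s) * f0(s) * h(y - s), lo, hi)
    denominator = integrate(lambda s: f0(s) * h(y - s), lo, hi)
    mean_u = integrate(lambda s: u(s) * f0(s), lo, hi)  # E_0 u(X)
    return numerator / denominator - mean_u


def score_statistic(sample, score_fn, second_moment):
    """V_1 of (13) for k = 1; second_moment is E_0 l*(Y)^2, so L = 1 / second_moment."""
    n = len(sample)
    normalized_sum = sum(score_fn(y) for y in sample) / math.sqrt(n)
    return normalized_sum ** 2 / second_moment


# Illustrative setup (assumed, not from the paper).
noise_sd = 0.5
f0 = lambda s: normal_pdf(s)                 # null density of X
h = lambda e: normal_pdf(e, 0.0, noise_sd)   # known noise density
u = lambda s: s                              # single direction of alternatives

# In this Gaussian case the score has the closed form y / (1 + noise_sd**2),
# which gives a check on the numerical integration.
print(score(1.3, f0, h, u), 1.3 / (1.0 + noise_sd ** 2))
```

In the Gaussian setup above, $E_0\, l^*(Y)^2 = 1/(1 + \sigma^2)$ with $\sigma$ the noise standard deviation, so `second_moment` can be supplied in closed form; for other choices of $f_0$, $h$ and $u$ it would have to be computed by a further numerical integration.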
A possible solution to the problem of choosing the dimension is to combine the test statistic of interest with a procedure (called a selection rule) that chooses a reasonable dimension of the model automatically from the data. See Kallenberg (2002) for an extensive discussion and practical examples. In this section we implement this idea for testing the deconvolution hypothesis. First we give a definition of a selection rule, generalizing ideas from Inglot and Ledwina (2006).

Denote by $M_k(\theta)$ the model described in Section 3 such that the true parameter $\theta$ belongs to the parameter set, say $\Theta_k$, and $\dim \Theta_k = k$. By a nested family of submodels $M_k(\theta)$ for $k = 1, 2, \dots$ we mean a sequence of these models such that for their parameter sets it holds that $\Theta_1 \subset \Theta_2 \subset \dots$.

Definition 2. Consider a nested family of submodels $M_k(\theta)$ for $k = 1, \dots, d$, where $d$ is fixed but otherwise arbitrary. Choose a function $\pi(\cdot, \cdot) : \mathbb{N} \times \mathbb{N} \to \mathbb{R}$, where $\mathbb{N}$ is the set of natural numbers. Assume that $\pi(1,n) < \pi(2,n) < \dots < \pi(d,n)$ for all $n$ and $\pi(j,n) - \pi(1,n) \to \infty$ as $n \to \infty$ for every $j = 2, \dots, d$. Call $\pi(j,n)$ a penalty attributed to the $j$-th model $M_j(\theta)$ and the sample size $n$. Then a selection rule $S$ for the test statistic $U_k$ is an integer-valued random variable satisfying the condition
\[
S = \min \bigl\{ k : 1 \le k \le d ;\ U_k - \pi(k,n) \ge U_j - \pi(j,n),\ j = 1, \dots, d \bigr\} . \tag{14}
\]
We call $U_S$ a data-driven efficient score test statistic for testing validity of the initial model.

From Theorem 3 below it follows that for our problem (as well as in the classical case, see Kallenberg (2002)) many possible penalties lead to consistent tests. So the choice of the penalty should be dictated by external practical considerations.
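The selection rule (14) can be sketched in a few lines. The sketch assumes that the statistics $U_1, \dots, U_d$ have already been computed and are passed in as a list; the function names are hypothetical, and the penalty $\pi(j,n) = j \log n$ is used only as one admissible illustration.

```python
import math


def schwarz_penalty(j, n):
    """Penalty pi(j, n) = j * log(n), used here as an illustrative choice."""
    return j * math.log(n)


def select_dimension(statistics, n, penalty=schwarz_penalty):
    """Selection rule S of (14).

    statistics[k - 1] must hold U_k for k = 1, ..., d (d >= 1).
    Returns the smallest k whose penalized statistic U_k - pi(k, n)
    is at least as large as every other penalized statistic.
    """
    penalized = [u_k - penalty(k, n) for k, u_k in enumerate(statistics, start=1)]
    best = max(penalized)
    for k, value in enumerate(penalized, start=1):
        if value >= best:
            return k


def data_driven_statistic(statistics, n, penalty=schwarz_penalty):
    """Return the pair (S, U_S)."""
    s = select_dimension(statistics, n, penalty)
    return s, statistics[s - 1]


# Hypothetical values of U_1, ..., U_4 for a sample of size n = 100.
print(data_driven_statistic([1.2, 3.0, 25.0, 26.0], n=100))
```

When all statistics are small relative to the penalty increments, the rule returns $S = 1$; a large jump at some dimension pulls $S$ up to that dimension, while the strictly increasing penalty discourages moving further.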
Our simulation study is not extensive enough to recommend the most practically suitable penalty for the deconvolution problem. Possible choices are, for example, Schwarz's penalty $\pi(j,n) = j \log n$, or Akaike's penalty $\pi(j,n) = j$.

Denote by $P_0^n$ the probability measure corresponding to the case when $X_1, \dots, X_n$ all have the density $f_0$. For simplicity of notation we will sometimes omit the index "n" and write simply $P_0$. The main result about the asymptotic null distribution of $U_S$ is the following.

Theorem 3. Suppose that assumptions ⟨B1⟩-⟨B3⟩ hold. Then under the null hypothesis $H_0$ it holds that $P_0^n(S > 1) \to 0$ and $U_S \to_d \chi^2_1$ as $n \to \infty$.

Remark 4. The selection rule $S$ can be modified so that it chooses not only among models of dimension less than some fixed $d$ but allows arbitrarily large dimensions of $M_k(\theta)$ as $n$ grows to infinity. In this case an analogue of Theorem 3 still holds, but the proof becomes more technical and one has to take care of the possible rates of growth of the model dimension. One can argue, though, that even $d = 10$ is often enough for practical purposes (see Kallenberg and Ledwina (1995)).

5. Consistency of tests

Let $F$ be the true distribution function of $X$. Here $F$ is not necessarily parametric and possibly does not have a density with respect to $\lambda$. Let us choose for every $k \le d$ an auxiliary parametric family $\{f_\theta\}$, $\theta \in \Theta \subseteq \mathbb{R}^k$, such that $f_0$ from this family coincides with $f_0$ from the null hypothesis $H_0$. Suppose that the chosen family $\{f_\theta\}$ gives us a regular deconvolution problem in the sense of Definition 1. Then one is able to construct the score test statistic $U_k$ defined by (7) despite the fact that the true $F$ possibly has no relation to the chosen $\{f_\theta\}$.
One can use the exponential family from Example 1 as $\{f_\theta\}$, or any other parametric family that is convenient. Our goal in this section is to determine under what conditions the statistic $U_k$ built in this way is consistent for testing against $F$.

Suppose that the following condition holds:

⟨D1⟩ there exists an integer $K \ge 1$ such that $K \le d$ and
\[
E_F\, l^*_1 = 0, \ \dots, \ E_F\, l^*_{K-1} = 0, \quad E_F\, l^*_K = C_K \neq 0 ,
\]
where $l^*_i$ is the $i$-th coordinate function of $l^*$, $l^*$ is defined by (5), $d$ is the maximal possible dimension of our model as in Definition 2 of Section 4, and $E_F$ denotes the mathematical expectation with respect to $F \ast h$.

Condition ⟨D1⟩ is a weak analog of nondegeneracy: if ⟨D1⟩ fails for all $k$, then $F$ is orthogonal to the whole system $\{l^*_i\}_{i=1}^\infty$, and if this system is complete, then $F$ is degenerate. Condition ⟨D1⟩ is also related to the identifiability of the model (see the beginning of Section 10 for more details).

We start with an investigation of the consistency of $U_k$, where $k$ is some fixed number, $1 \le k \le d$. The following result shows why it is important to choose the right dimension of the model.

Proposition 5. Let ⟨D1⟩ hold. Then for all $1 \le k \le K-1$, if $F$ is the true distribution function of $X$, then $U_k \to_d \chi^2_k$ as $n \to \infty$.

This result and Theorem 1 show that if the dimension of the model is too small, then the test does not work, since it does not distinguish between $F$ and $f_0$.

Proposition 6. Let ⟨D1⟩ hold. Then for $k \ge K$, if $F$ is the true distribution function of $X$, then $U_k \to \infty$ in probability as $n \to \infty$.

Now we turn to the data-driven statistic $U_S$. Suppose that the selection rule $S$ is defined as in Section 4. Assume that

⟨S1⟩ for every fixed $k \ge 1$ it holds that $\pi(k,n) = o(n)$ as $n \to \infty$.
Denote by $P_F$ the probability measure corresponding to the case when $X_1, \dots, X_n$ all have the distribution $F$. Consider consistency of the "adaptive" test based on $U_S$.

Proposition 7. Let ⟨D1⟩ and ⟨S1⟩ hold. If $F$ is the true distribution function of $X$, then $P_F(S \ge K) \to 1$ and $U_S \to \infty$ as $n \to \infty$.

The main result of this section is the following.

Theorem 8.
1. The test based on $U_k$ is consistent for testing against all alternative distributions $F$ such that ⟨D1⟩ is satisfied with $K \le k$.
2. The test based on $U_k$ is inconsistent for testing against all alternative distributions $F$ such that ⟨D1⟩ is satisfied with $K > k$.
3. If the selection rule $S$ satisfies ⟨S1⟩, then the test based on $U_S$ is consistent against all alternative distributions $F$ such that ⟨D1⟩ is satisfied with some $K$.

6. Composite deconvolution

In the previous sections, we treated the simplest case of the deconvolution problem. The next sections are devoted to the more realistic case of an unknown error density. Our main ideas and constructions will be similar to those for the simple case. Our goal is to modify the techniques and constructions from the simple hypothesis case in order to apply them in the new situation. In order to do this we will have to impose additional regularity assumptions of uniformity on our new model. These assumptions are quite standard in statistics. They are the price we pay for keeping simple and general constructions in the more complicated problem.

We will have to modify the scores we used in the simple case. The modification we will use is called efficient scores. Despite all the changes, we will still be able to build a selection rule for the new problem. We will need a new, modified definition of the selection rule.
A large part of the new uniformity assumptions on the model will be needed not to build an efficient score test, but to make such a test data-driven (see Section 9).

Consider the situation described in the first paragraph of Section 2, but with the following complication introduced. Suppose from now on that the density $h$ of $\varepsilon$ is unknown. Then the most general possible null hypothesis $H_0$ in this setup is that $f = f_0$ and the error $\varepsilon$ has expectation 0 and finite variance. The most general alternative hypothesis $H_A$ is that $f \neq f_0$.

Since both $H_0$ and $H_A$ are in this case too broad, we first consider a special class of submodels of the model described above. At first we assume that all possible densities $f$ of $X$ belong to some specific and preassigned parametric family $\{f_\theta\}$, i.e., $f = f_\theta$ for some $\theta$, where $\theta$ is a $k$-dimensional Euclidean parameter and $\Theta \subseteq \mathbb{R}^k$ is a parameter set for $\theta$. Our starting assumption about the density of the error $\varepsilon$ will be that $h$ belongs to some specific parametric family $\{h_\eta\}$, where $\eta \in \Lambda$ and $\Lambda \subseteq \mathbb{R}^m$ is a parameter set. Thus, $\eta$ is a nuisance parameter. The null hypothesis $H_0$ is the following composite hypothesis: $X$ has a particular density $f_0$ with respect to $\lambda$.

Then we will propose a test that is expected to be asymptotically optimal (in some sense) for testing in this parametric situation. After that we will prove that our test is also consistent against a wide class of nonparametric alternatives. Moreover, the test is expected to be asymptotically optimal (in some sense) for testing against an infinite number of directions of nonparametric alternatives. This is essentially the same plan as for the simple case.

If $(\theta, \eta)$ is a true parameter value, we call such a submodel $M_{k,m}(\theta, \eta)$. Denote in this case the density of $Y$ by $g(\cdot\,; (\theta, \eta))$ and the corresponding expectation by $E_{(\theta,\eta)}$.
Let the null hypothesis $H_0$ be $\theta = \theta_0$, where it is assumed that $\theta_0 \in \Theta$. Then the alternative hypothesis $\theta \neq \theta_0$ is a parametric subset of the original general nonparametric alternative hypothesis $H_A$.

7. Efficient scores

All possible densities $g(y; (\theta, \eta))$ of $Y$ have in our model the form
\[
g(y; (\theta, \eta)) = \int_{\mathbb{R}} f_\theta(s)\, h_\eta(y-s)\, ds . \tag{15}
\]
It is not always possible to identify $\theta$ and/or $\eta$ in this model. Since we are concerned with testing hypotheses and not with estimation of parameters, it is not necessary for us to impose a restrictive assumption of identifiability on the model. We will need only a (weaker) consistency condition to build a sensible test (see Section 10).

The score function for $(\theta, \eta)$ at $(\theta_0, \eta_0)$ is defined as (see Bickel et al. (1993), p.28):
\[
\dot l_{\theta_0, \eta_0}(y) = \bigl( \dot l_{\theta_0}(y),\ \dot l_{\eta_0}(y) \bigr) , \tag{16}
\]
where $\dot l_{\theta_0}$ is the score function for $\theta$ at $\theta_0$ and $\dot l_{\eta_0}$ is the score function for $\eta$ at $\eta_0$, i.e.
\[
\begin{aligned}
\dot l_{\theta_0}(y)
&= \frac{\frac{\partial}{\partial\theta} \bigl( g(y; (\theta, \eta_0)) \bigr) \big|_{\theta=\theta_0}}{g(y; (\theta_0, \eta_0))}\, 1_{[y :\, g(y; (\theta_0, \eta_0)) > 0]} \\
&= \frac{\frac{\partial}{\partial\theta} \Bigl( \int_{\mathbb{R}} f_\theta(s)\, h_{\eta_0}(y-s)\, ds \Bigr) \Big|_{\theta=\theta_0}}{\int_{\mathbb{R}} f_{\theta_0}(s)\, h_{\eta_0}(y-s)\, ds}\, 1_{[y :\, g(y; (\theta_0, \eta_0)) > 0]} ,
\end{aligned} \tag{17}
\]
\[
\begin{aligned}
\dot l_{\eta_0}(y)
&= \frac{\frac{\partial}{\partial\eta} \bigl( g(y; (\theta_0, \eta)) \bigr) \big|_{\eta=\eta_0}}{g(y; (\theta_0, \eta_0))}\, 1_{[y :\, g(y; (\theta_0, \eta_0)) > 0]} \\
&= \frac{\frac{\partial}{\partial\eta} \Bigl( \int_{\mathbb{R}} f_{\theta_0}(s)\, h_\eta(y-s)\, ds \Bigr) \Big|_{\eta=\eta_0}}{\int_{\mathbb{R}} f_{\theta_0}(s)\, h_{\eta_0}(y-s)\, ds}\, 1_{[y :\, g(y; (\theta_0, \eta_0)) > 0]} .
\end{aligned} \tag{18}
\]
The Fisher information matrix of the parameter $(\theta, \eta)$ is defined as
\[
I(\theta, \eta) = \int_{\mathbb{R}} \dot l^{\,T}_{\theta,\eta}(y)\, \dot l_{\theta,\eta}(y)\, dG_{\theta,\eta}(y) , \tag{19}
\]
where $G_{\theta,\eta}(y)$ is the probability measure corresponding to the density $g(y; (\theta, \eta))$. The symbol 'T' denotes transposition, and all vectors are assumed to be row vectors. We assume that $M_{k,m}(\theta, \eta)$ is a regular parametric model in the sense of the following definition.

Definition 3. Call our problem a regular deconvolution problem if
⟨A1⟩ for all $(\theta, \eta) \in \Theta \times \Lambda$, $g(y; (\theta, \eta))$ is continuously differentiable in $(\theta, \eta)$ for $\lambda$-almost all $y$;
⟨A2⟩ $\bigl|\dot l(\theta, \eta)\bigr| \in L_2(\mathbb{R}, G_{\theta,\eta})$ for all $(\theta, \eta) \in \Theta \times \Lambda$;
⟨A3⟩ $I(\theta, \eta)$ is nonsingular for all $(\theta, \eta) \in \Theta \times \Lambda$ and continuous in $(\theta, \eta)$.

This is a joint regularity condition, and it is stronger than the assumption that the model is regular in $\theta$ and $\eta$ separately. Let us write $I(\theta_0, \eta_0)$ in the block matrix form:
\[
I(\theta_0, \eta_0) =
\begin{pmatrix}
I_{11}(\theta_0, \eta_0) & I_{12}(\theta_0, \eta_0) \\
I_{21}(\theta_0, \eta_0) & I_{22}(\theta_0, \eta_0)
\end{pmatrix} , \tag{20}
\]
where $I_{11}(\theta_0, \eta_0)$ is $k \times k$, $I_{12}(\theta_0, \eta_0)$ is $k \times m$, $I_{21}(\theta_0, \eta_0)$ is $m \times k$, and $I_{22}(\theta_0, \eta_0)$ is $m \times m$. Thus, denoting for simplicity of formulas $\Omega := [y : g(y; (\theta_0, \eta_0)) > 0]$, we can write explicitly
\[
\begin{aligned}
I_{11}(\theta_0, \eta_0)
&= E_{\theta_0, \eta_0}\, \dot l^{\,T}_{\theta_0} \dot l_{\theta_0}
= \int_{\mathbb{R}} \dot l^{\,T}_{\theta_0}(y)\, \dot l_{\theta_0}(y)\, dG_{\theta_0, \eta_0}(y) \\
&= \int_\Omega \frac{ \frac{\partial}{\partial\theta} \Bigl( \int_{\mathbb{R}} f_\theta(s)\, h_{\eta_0}(y-s)\, ds \Bigr)^T \Big|_{\theta=\theta_0}\ \frac{\partial}{\partial\theta} \Bigl( \int_{\mathbb{R}} f_\theta(s)\, h_{\eta_0}(y-s)\, ds \Bigr) \Big|_{\theta=\theta_0} }{ \int_{\mathbb{R}} f_{\theta_0}(s)\, h_{\eta_0}(y-s)\, ds }\, dy ,
\end{aligned} \tag{21}
\]
\[
\begin{aligned}
I_{12}(\theta_0, \eta_0)
&= E_{\theta_0, \eta_0}\, \dot l^{\,T}_{\theta_0} \dot l_{\eta_0}
= \int_{\mathbb{R}} \dot l^{\,T}_{\theta_0}(y)\, \dot l_{\eta_0}(y)\, dG_{\theta_0, \eta_0}(y) \\
&= \int_\Omega \frac{ \frac{\partial}{\partial\theta} \Bigl( \int_{\mathbb{R}} f_\theta(s)\, h_{\eta_0}(y-s)\, ds \Bigr)^T \Big|_{\theta=\theta_0}\ \frac{\partial}{\partial\eta} \Bigl( \int_{\mathbb{R}} f_{\theta_0}(s)\, h_\eta(y-s)\, ds \Bigr) \Big|_{\eta=\eta_0} }{ \int_{\mathbb{R}} f_{\theta_0}(s)\, h_{\eta_0}(y-s)\, ds }\, dy ,
\end{aligned} \tag{22}
\]
and analogously for $I_{21}(\theta_0, \eta_0)$ and $I_{22}(\theta_0, \eta_0)$.
The efficient score function for $\theta$ in $M_{k,m}(\theta, \eta)$ is defined as (see Bickel et al. (1993), p.28):
\[
l^*_{\theta_0}(y) = \dot l_{\theta_0}(y) - I_{12}(\theta_0, \eta_0)\, I_{22}^{-1}(\theta_0, \eta_0)\, \dot l_{\eta_0}(y) , \tag{23}
\]
and the efficient Fisher information matrix for $\theta$ in $M_{k,m}(\theta, \eta)$ is defined as
\[
I^*_{\theta_0} = E_{\theta_0, \eta_0}\, l^{*T}_{\theta_0}\, l^*_{\theta_0} = \int_{\mathbb{R}} l^*_{\theta_0}(y)^T\, l^*_{\theta_0}(y)\, dG_{\theta_0, \eta_0}(y) . \tag{24}
\]
Before closing this section we consider two simple examples.

Example 2. Suppose $\theta \in \mathbb{R}$, $\eta \in \mathbb{R}_+$ and, moreover, $\{f_\theta\}$ is the family $\{N(\theta, 1)\}$ of normal densities with mean $\theta$ and variance 1, and $\{h_\eta\}$ is the family $\{N(0, \eta^2)\}$. Then $g(\theta, \eta) = f_\theta \ast h_\eta \sim N(\theta, \eta^2 + 1)$. Let $\theta$ be the parameter of interest and $\eta$ the nuisance one. Let $H_0$ be $\theta = \theta_0$. By (17) and (18), for all $y$
\[
\dot l_{\theta_0}(y) = \frac{y - \theta_0}{\eta_0^2 + 1} , \qquad
\dot l_{\eta_0}(y) = \frac{(y - \theta_0)^2\, \eta_0}{(\eta_0^2 + 1)^2} - \frac{\eta_0}{\eta_0^2 + 1} . \tag{25}
\]
By (22),
\[
I_{12}(\theta, \eta) = \int_{\mathbb{R}} \frac{y - \theta}{\eta^2 + 1} \Bigl( \frac{(y - \theta)^2\, \eta}{(\eta^2 + 1)^2} - \frac{\eta}{\eta^2 + 1} \Bigr)\, dN(\theta, \eta^2 + 1)(y) = 0
\]
for all $\theta$, $\eta$. This means that adaptive estimation of $\theta$ is possible in this model, i.e., we can estimate $\theta$ equally well whether we know the true $\eta_0$ or not. However, we will not be concerned with estimation here. From (21) we get
\[
I^*_\theta = \int_{\mathbb{R}} \frac{(y - \theta)^2}{(\eta^2 + 1)^2}\, dN(\theta, \eta^2 + 1)(y) = \frac{1}{\eta^2 + 1} , \qquad \text{so that} \quad (I^*_\theta)^{-1} = \eta^2 + 1 . \tag{26}
\]

Example 3. Suppose now that we are interested in the parameter $\eta$ in the situation of Example 2 and the null hypothesis is $H_0 : \eta = \eta_0$. There is a sort of symmetry between signal and noise: "what is a signal for one person is a noise for the other" (see also Remark 9). From Example 2 we know that the score function $\dot l_{\eta_0}$ for $\eta$ at $\eta_0$ is given by (25). Since we proved for this example that $I_{12} = I_{21} = 0$, the efficient score function $l^*_{\eta_0}$ for $\eta$ at $\eta_0$ is given by (25) as well.
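The formulas of Example 2 lend themselves to a quick numerical check. The sketch below compares the closed-form scores (25) with central finite differences of the $N(\theta, \eta^2 + 1)$ log-density and approximates the blocks of the information matrix (20) by a trapezoidal rule. The helper names, the truncation of the real line at twelve standard deviations and the step counts are assumptions made only for this illustration.

```python
import math


def log_density(y, theta, eta):
    """log g(y; (theta, eta)) for Y ~ N(theta, eta^2 + 1)."""
    variance = eta * eta + 1.0
    return -0.5 * math.log(2.0 * math.pi * variance) - (y - theta) ** 2 / (2.0 * variance)


def score_theta(y, theta, eta):
    """First formula in (25)."""
    return (y - theta) / (eta * eta + 1.0)


def score_eta(y, theta, eta):
    """Second formula in (25)."""
    variance = eta * eta + 1.0
    return (y - theta) ** 2 * eta / variance ** 2 - eta / variance


def expectation(func, theta, eta, half_width=12.0, steps=4000):
    """E func(Y) under N(theta, eta^2 + 1), trapezoidal rule on a truncated range."""
    sd = math.sqrt(eta * eta + 1.0)
    lo, hi = theta - half_width * sd, theta + half_width * sd
    width = (hi - lo) / steps

    def integrand(y):
        return func(y) * math.exp(log_density(y, theta, eta))

    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, steps):
        total += integrand(lo + i * width)
    return total * width


def information_blocks(theta, eta):
    """Scalar blocks I_11, I_12, I_22 of (20) for k = m = 1."""
    i11 = expectation(lambda y: score_theta(y, theta, eta) ** 2, theta, eta)
    i12 = expectation(lambda y: score_theta(y, theta, eta) * score_eta(y, theta, eta), theta, eta)
    i22 = expectation(lambda y: score_eta(y, theta, eta) ** 2, theta, eta)
    return i11, i12, i22


def efficient_score_theta(y, theta, eta):
    """Efficient score (23) for theta when k = m = 1."""
    _, i12, i22 = information_blocks(theta, eta)
    return score_theta(y, theta, eta) - i12 / i22 * score_eta(y, theta, eta)


theta, eta = 0.3, 0.7  # arbitrary illustrative parameter values
print(information_blocks(theta, eta))
```

Because the off-diagonal block vanishes up to rounding error, `efficient_score_theta` coincides numerically with `score_theta`, which is the adaptivity property noted in Example 2.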
We now calculate
\[
I^*_{\eta_0} = \int_{\mathbb{R}} \Bigl( \frac{(y - \theta)^2\, \eta_0}{(\eta_0^2 + 1)^2} - \frac{\eta_0}{\eta_0^2 + 1} \Bigr)^2\, dN(\theta, \eta_0^2 + 1)(y) =: \frac{1}{C(\eta_0)} , \qquad \text{so that} \quad (I^*_{\eta_0})^{-1} = C(\eta_0) . \tag{27}
\]
The constant $C(\eta_0)$ in (27) can be expressed explicitly in terms of $\eta_0$, but this is not the point of this example. By the symmetry of $\theta$ and $\eta$ we have
\[
l^*_{\eta_0}(y) = \dot l_{\eta_0}(y) - I_{21}(\theta, \eta_0)\, I_{11}^{-1}(\theta, \eta_0)\, \dot l_{\theta_0}(y) = \dot l_{\eta_0}(y) .
\]

Remark 9. Note that the problem is symmetric in $\theta$ and $\eta$ in the sense that it is possible to consider estimation and testing for each parameter, $\theta$ or $\eta$. Physically this means that from the noisy signal one can recover some "information" not only about the pure signal but also about the noise. This is actually natural, since a noise is in fact also a signal. We are observing two signals at once. The price for this possibility is that, except for some trivial cases, one cannot recover full information about both the signal of interest and the noise.

8. Efficient score test

Let $l^*_{\theta_0}$ be defined by (23) and $I^*_{\theta_0}$ by (24). Note that both $l^*_{\theta_0}$ and $I^*_{\theta_0}$ depend (at least in principle) on the unknown nuisance parameter $\eta_0$. Let $l^*_j$ and $\widehat L$ be some estimators of $l^*_{\theta_0}(Y_j)$ and $(I^*_{\theta_0})^{-1}$, respectively. These estimators are supposed to depend only on the observable $Y_1, \dots, Y_n$, but not on $X_1, \dots, X_n$.

Definition 4. We say that $l^*_j$ is a sufficiently good estimator of $l^*_{\theta_0}(Y_j)$ if for each $(\theta_0, \eta_0) \in \Theta \times \Lambda$ it holds that for every $\varepsilon > 0$
\[
G^n_{\theta_0, \eta_0} \Bigl( \frac{1}{\sqrt n} \Bigl\| \sum_{j=1}^n \bigl( l^*_j - l^*_{\theta_0}(Y_j) \bigr) \Bigr\| \ge \varepsilon \Bigr) \to 0 \quad \text{as } n \to \infty , \tag{28}
\]
where $\| \cdot \|$ denotes the Euclidean norm of a given vector.

In other words, condition (28) means that the average $\frac{1}{n} \sum_{j=1}^n l^*_{\theta_0}(Y_j) \approx E_{\theta_0, \eta_0}\, l^*_{\theta_0}$ is $\sqrt n$-consistently estimated.
We illustrate this definition by some examples.

Example 2 (continued). We have (denoting the variance of $Y$ by $\sigma^2(Y)$):
\[
l^*_{\theta_0}(Y_j) = \frac{Y_j - \theta_0}{\sigma^2(Y)} .
\]
Define
\[
l^*_j := \frac{Y_j - \theta_0}{\widehat\sigma^2_n} ,
\]
where $\widehat\sigma^2_n$ is any $\sqrt n$-consistent estimator of the variance of $Y$. One can take, for example, the sample variance $s^2_n = s^2_n(Y_1, \dots, Y_n)$ as such an estimate. Then, since $\sigma^2(Y) > 0$ by the model assumptions, the $l^*_j$ constructed in this way satisfies Definition 4. See the Appendix for the proof. □

Example 3 (continued). We have in this case
\[
l^*_{\eta_0}(Y_j) = \frac{\eta_0}{(\eta_0^2 + 1)^2}\, (Y_j - \theta_0)^2 - \frac{\eta_0}{\eta_0^2 + 1} .
\]
For simplicity of notation we write $l^*_{\eta_0}(Y_j) = C_1(\eta_0)\, (Y_j - \theta_0)^2 - C_2(\eta_0)$. Let $\widehat\theta_n$ be any $\sqrt n$-consistent estimate of $\theta_0$ and put
\[
l^*_j := C_1(\eta_0)\, (Y_j - \widehat\theta_n)^2 - C_2(\eta_0) .
\]
Then Definition 4 is satisfied in this example also. This is proved in the Appendix. □

Definition 4 reflects the basic idea of the method of estimated scores. This method is widely used in statistics (see Bickel et al. (1993), Schick (1986), Ibragimov and Has'minskiĭ (1981), Inglot and Ledwina (2006) and others). These authors show that for different problems it is possible to construct nontrivial parametric, semiparametric and nonparametric estimators of scores that satisfy (28).

Definition 5. Define
\[
W_k = \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n l^*_j \Bigr\}\, \widehat L\, \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n l^*_j \Bigr\}^T , \tag{29}
\]
where $\widehat L$ is an estimate of $(I^*_{\theta_0})^{-1}$ depending only on $Y_1, \dots, Y_n$. Note that $l^*_j$ is a $k$-dimensional vector and $\widehat L$ is a $k \times k$ matrix. We call $W_k$ the efficient score test statistic for testing $H_0 : \theta = \theta_0$ in $M_{k,m}(\theta, \eta)$. It is assumed that the null hypothesis is rejected for large values of $W_k$.
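For the Gaussian setting of Example 2, the statistic (29) with $k = 1$ can be sketched as follows. The sketch assumes the estimated scores $l^*_j = (Y_j - \theta_0)/\widehat\sigma^2_n$ from Example 2 (continued), takes the sample variance as $\widehat\sigma^2_n$, and uses $\widehat L = \widehat\sigma^2_n$ as the estimate of $(I^*_{\theta_0})^{-1}$; the function name and the toy data are hypothetical.

```python
import math
import statistics


def efficient_score_statistic(sample, theta0):
    """W_1 of (29) in the Gaussian setting of Example 2.

    Estimated scores: l*_j = (Y_j - theta0) / variance_hat.
    Estimate of the inverse efficient information: L_hat = variance_hat.
    Requires at least two observations (sample variance with n - 1 denominator).
    """
    n = len(sample)
    variance_hat = statistics.variance(sample)
    scores = [(y - theta0) / variance_hat for y in sample]
    normalized_sum = sum(scores) / math.sqrt(n)
    return normalized_sum * variance_hat * normalized_sum


# Toy data, chosen only to make the arithmetic easy to follow.
print(efficient_score_statistic([1.0, 2.0, 3.0, 4.0], theta0=2.0))
```

Algebraically the result equals $\bigl(n^{-1/2} \sum_j (Y_j - \theta_0)\bigr)^2 / \widehat\sigma^2_n$, so only the observable $Y_j$ enter the statistic and no knowledge of the nuisance parameter $\eta$ is needed.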
Normally it should be possible to construct reasonably good estimators $\widehat\eta_n$ of $\eta$ by standard methods, since at this point our construction is parametric. After that it would be enough to plug these estimates into (23) and get the desired $l^*_j$'s satisfying (28).

Example 2 (continued). Let $\widehat\sigma^2_n(Y)$ be any $\sqrt n$-consistent estimate of $\eta^2 + 1$ such that this estimate is based on $Y_1, \ldots, Y_n$. Then by (26), (25) and definition (29) the efficient score test statistic for testing $H_0 : \theta = \theta_0$ (in the model $M_{1,1}(\theta, \eta)$) is
\[
W_1 = \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n \frac{Y_j - \theta_0}{\widehat\sigma^2_n(Y)} \Bigr\}^2 \widehat\sigma^2_n(Y) = \frac{1}{\widehat\sigma^2_n(Y)} \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n (Y_j - \theta_0) \Bigr\}^2. \tag{30}
\]

Example 3 (continued). Using any $\sqrt n$-consistent estimate $\widehat\theta_n$ of $\theta$, we get the efficient score test statistic
\[
W_1 = \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n \Bigl( (Y_j - \widehat\theta_n)^2 \frac{\eta_0}{(\eta_0^2+1)^2} - \frac{\eta_0}{\eta_0^2+1} \Bigr) \Bigr\}^2 C(\eta_0)
= \Bigl\{ \frac{1}{\sqrt n} \frac{\eta_0}{(\eta_0^2+1)^2} \sum_{j=1}^n (Y_j - \widehat\theta_n)^2 - \sqrt n \, \frac{\eta_0}{\eta_0^2+1} \Bigr\}^2 C(\eta_0). \tag{31}
\]

Remark 10. We make the following remark to avoid possible confusion. For the simple deconvolution we had the score test statistics and now we have the efficient score test statistics. This does not mean that the statistics for simple deconvolution are "inefficient". Here the word "efficient" has a strictly technical meaning. Because of the presence of the nuisance parameter we have to extract information about the parameter of interest. We want to do this efficiently in some sense. This is the explanation of the terminology.

The following theorem describes the asymptotic behavior of $W_k$ under the null hypothesis.

Theorem 11. Assume the null hypothesis $H_0 : \theta = \theta_0$ holds true, $\langle A1 \rangle$-$\langle A3 \rangle$ are fulfilled, (28) is satisfied, and $\widehat L$ is any consistent estimate of $(I^*_{\theta_0})^{-1}$. Then
\[
W_k \to_d \chi^2_k \quad \text{as } n \to \infty,
\]
where $\chi^2_k$ denotes a random variable with central chi-square distribution with $k$ degrees of freedom.

9. Selection rule

In this section we extend the construction of Section 4 to the case of composite hypotheses. First we give a general definition of a selection rule.

Denote by $M_{k,m}(\theta, \eta)$ the model described in Section 6 and such that the true parameter $(\theta, \eta)$ belongs to a parameter set, say $\Theta_k \times \Lambda$, and $\dim \Theta_k = k$. By a nested family of submodels $M_{k,m}(\theta, \eta)$ for $k = 1, \ldots$ we mean a sequence of these models such that for their parameter sets it holds that $\Theta_1 \times \Lambda \subset \Theta_2 \times \Lambda \subset \ldots$.

Definition 6. Consider a nested family of submodels $M_{k,m}(\theta, \eta)$ for $k = 1, \ldots, d$, where $d$ is fixed but otherwise arbitrary, and $m$ is fixed. Choose a function $\pi(\cdot, \cdot) : \mathbb{N} \times \mathbb{N} \to \mathbb{R}$, where $\mathbb{N}$ is the set of natural numbers. Assume that $\pi(1, n) < \pi(2, n) < \ldots < \pi(d, n)$ for all $n$ and $\pi(j, n) - \pi(1, n) \to \infty$ as $n \to \infty$ for every $j = 2, \ldots, d$. Call $\pi(j, n)$ a penalty attributed to the $j$-th model $M_j(\theta)$ and the sample size $n$. Then a selection rule $S(l^*)$ for the test statistic $W_k$ is an integer-valued random variable satisfying the condition
\[
S(l^*) = \min \bigl\{ k : 1 \le k \le d ; \; W_k - \pi(k, n) \ge W_j - \pi(j, n), \; j = 1, \ldots, d \bigr\}. \tag{32}
\]
We call the random variable $W_S$ a data-driven efficient score test statistic for testing validity of the initial model.

In this paper we also assume that the following condition holds.

$\langle S1 \rangle$ for every fixed $k \ge 1$ it holds that $\pi(k, n) = o(n)$ as $n \to \infty$.

Unlike the case of the simple null hypothesis, in the case of composite hypotheses the selection rule depends on the estimator $l^*_j$ of the unknown values $l^*_{\theta_0}(Y_j)$ of the efficient score function.
This means that we need to estimate the nuisance parameter $\eta$, or the corresponding scores, or their sum. A surprising result follows from Theorem 12 below: for our problem many possible penalties and, moreover, essentially all sensible estimators plugged into $W_k$ give consistent selection rules. Possible choices of penalties are, for instance, Schwarz's penalty $\pi(j, n) = j \log n$, or Akaike's penalty $\pi(j, n) = j$.

Denote by $P^n_{\theta_0,\eta_0}$ the probability measure corresponding to the case when $X_1, \ldots, X_n$ all have the density $f(\theta_0, \eta_0)$. The main result about the asymptotic null distribution of $W_S$ is the following theorem (it is proved analogously to Theorem 3).

Theorem 12. Under the conditions of Theorem 11, as $n \to \infty$ it holds that $P^n_{\theta_0,\eta_0}(S(l^*) > 1) \to 0$ and $W_S \to_d \chi^2_1$.

Condition (28) is what makes this direct reference to the case of the simple hypothesis possible. Estimation of the efficient score function $l^*_{\theta_0}$ can be done in different ways. The first way is to estimate the whole expression on the right side of (23). For this method of estimation condition (28) is natural. The second and probably more convenient method of estimating $l^*_{\theta_0}$ is via estimation of the nuisance parameter $\eta$ by some estimator $\widehat\eta$. But for this approach condition (28) becomes something that has to be proved for each particular estimator. We hope that this inconvenience is excused by the fact that we are only introducing the new test here. It is possible to reformulate condition (28) explicitly in terms of conditions on $\widehat\eta$, $\{f_\theta\}$, and $\{h_\eta\}$ (see an analogue in Inglot et al. (1997)).

Remark 13. The selection rule $S(l^*)$ can be modified in order to make it possible to choose not only models of dimension less than some fixed $d$, but to allow arbitrarily large dimensions of $M_{k,m}(\theta, \eta)$ as the number of observations grows. See Remark 4.

Remark 14.
It is possible to modify the definition of the selection rule so that both dimensions $k$ and $m$ would be selected by the test from the data. A corresponding test statistic will be of the form $W_S$, where this time $S = (S_1, S_2)$. Proofs of the asymptotic properties for this statistic are analogous to those presented in this paper. Possibly this statistic could be useful, since the situation with noise of an unknown dimension often seems to be more realistic. On the other hand, this statistic will also have some disadvantages. One will have to impose stricter assumptions on both signal and noise (including an analogue of the double-identifiability assumption). Also, the final result will be weaker than the result of this section. This is the price for an attempt to extract information about a larger number of parameters from the same number of observations $Y_1, \ldots, Y_n$.

10. Consistency of tests

Let $F$ be the true distribution function of $X$ and $H$ the true distribution function of $\varepsilon$. Here $F$ and $H$ are not necessarily parametric, and possibly these distribution functions do not have densities with respect to the Lebesgue measure $\lambda$. Let us choose for every $k \le d$ an auxiliary parametric family $\{f_\theta\}$, $\theta \in \Theta \subseteq \mathbb{R}^k$, such that $f_0$ from this family coincides with $f_0$ from the null hypothesis $H_0$. Correspondingly, let us fix an integer $m$ and choose an auxiliary parametric family $\{h_\eta\}$, $\eta \in \Lambda \subseteq \mathbb{R}^m$. Suppose that the chosen families $\{f_\theta\}$ and $\{h_\eta\}$ give us the regular deconvolution problem in the sense of Definition 3. Then one is able to construct the score test statistic $W_k$ defined by (29), despite the fact that the true $F$ and $H$ possibly do not have any relation to the chosen $\{f_\theta\}$ and $\{h_\eta\}$.
Our goal in this section is to determine under what conditions the statistic $W_k$ built in this way is consistent for testing against $H_A$.

Suppose that the following condition holds.

$\langle C1 \rangle$ there exists an integer $K \ge 1$ such that $K \le d$ and
\[
E_{F*H}\, l^{*(1)}_{\theta_0} = 0, \; \ldots, \; E_{F*H}\, l^{*(K-1)}_{\theta_0} = 0, \quad E_{F*H}\, l^{*(K)}_{\theta_0} = C_K \ne 0,
\]
where $l^{*(i)}_{\theta_0}$ is the $i$-th coordinate function of $l^*_{\theta_0}$ and $l^*_{\theta_0}$ is defined by (23), $d$ is the maximal possible dimension of our model as in Definition 3 of Section 9, and $E_{F*H}$ denotes the mathematical expectation with respect to $F * H$.

Condition $\langle C1 \rangle$ is a weak analog of nondegeneracy: if $\langle C1 \rangle$ fails for all $k$, then $F$ is orthogonal to the whole system $\{ l^{*(i)}_{\theta_0} \}_{i=1}^{\infty}$, and if this system is complete, then $F * H$ is degenerate. Also, $\langle C1 \rangle$ is related to the identifiability of the model: if the model is not identifiable, then $F * H = F_0 * H$ can happen and $\langle C1 \rangle$ fails. Establishing identifiability for the parametric deconvolution is not trivial (see Sclove and Van Ryzin (1969), e.g.). It is important to note also that although $\langle C1 \rangle$ has something in common with both nondegeneracy and identifiability, it is in general pretty far from both these notions.

The main result of this section is the following.

Theorem 15. If (28) is satisfied and $\widehat L$ is any consistent estimate of $(I^*_{\theta_0})^{-1}$, then
1. the test based on $W_k$ is consistent for testing against all alternative distributions $F, H$ such that $\langle C1 \rangle$ is satisfied with $K \le k$;
2. the test based on $W_k$ is inconsistent for testing against alternative distributions $F, H$ such that $\langle C1 \rangle$ is satisfied with $K > k$;
3. if the selection rule $S(l^*)$ satisfies $\langle S1 \rangle$, then the test based on $W_S$ is consistent against all alternative distributions $F * H$ such that $\langle C1 \rangle$ is satisfied with some $K$.
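The dichotomy between parts 1 and 2 of Theorem 15 can be illustrated by a small simulation for the one-dimensional statistic (30). This is only a sketch under our own assumptions: the two alternatives, the sample sizes and the seed are hypothetical choices, and the helper `w1_location` is not taken from the paper. In the first alternative the mean of $Y$ differs from $\theta_0$, so the first coordinate of the score has nonzero expectation; in the second the signal is non-Gaussian but still has mean $\theta_0$.

```python
import numpy as np

def w1_location(y, theta0):
    """Statistic (30): squared normalized sum of (Y_j - theta0) over the sample variance."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return float((y - theta0).sum() ** 2 / (n * y.var(ddof=1)))

rng = np.random.default_rng(1)
theta0 = 0.0

def sample_shifted(n):
    # Signal with mean theta0 + 0.5 plus standard normal noise: E(Y) != theta0.
    return rng.normal(theta0 + 0.5, 1.0, size=n) + rng.normal(0.0, 1.0, size=n)

def sample_same_mean(n):
    # Centered exponential signal (mean theta0, not Gaussian) plus standard normal noise.
    return theta0 + rng.exponential(1.0, size=n) - 1.0 + rng.normal(0.0, 1.0, size=n)

w_shift_small = w1_location(sample_shifted(200), theta0)
w_shift_large = w1_location(sample_shifted(2000), theta0)
w_same_mean = w1_location(sample_same_mean(2000), theta0)
```

Under the shifted alternative the statistic grows roughly linearly in $n$, so the test rejects with probability tending to one. Under the second alternative it remains of the order of a $\chi^2_1$ variable however large $n$ is, which is why a larger model dimension, or the data-driven choice $W_S$, is needed to detect such departures.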
Part 2 of Theorem 15 shows why it is important to choose a suitable model dimension. Now we give two specific examples.

Example 2 (continued). By Theorem 15 the test based on $W_1$ is consistent if and only if for the true $F$ and $H$ it holds that
\[
\frac{1}{\eta^2 + 1} E_{F*H}(Y - \theta_0) \ne 0, \quad \text{i.e.} \quad E_{F*H}(Y) \ne \theta_0. \tag{33}
\]
For example, $W_1$ does not work when the true $H$ is symmetric about 0 and the true $F \ne F_0$ has mean equal to $\theta_0$.

Example 3 (continued). By Theorem 15 $W_1$ is consistent if and only if for the true $F$ and $H$ it holds that
\[
E_{F*H} \Bigl( (y - \theta)^2 \frac{\eta_0}{(\eta_0^2+1)^2} - \frac{\eta_0}{\eta_0^2+1} \Bigr) \ne 0, \quad \text{i.e.} \quad E_{F*H}(y - \theta)^2 \ne \eta_0^2 + 1,
\]
or equivalently
\[
\mathrm{Var}_{F*H}\, Y \ne \mathrm{Var}_{F*H_0}\, Y. \tag{34}
\]

Note that condition (33) can be interpreted as "$W_1$ is consistent for testing the hypothesis about the mean in this model iff the expectation of $Y$ under the alternative is different from the expectation under the null hypothesis" and (34) as "$W_1$ is consistent for testing the hypothesis about the variance in this model iff the variance of $Y$ under the alternative is different from the variance under the null hypothesis". One cannot expect more from such a simple test as $W_1$. On the contrary, the data-driven test statistic $W_S$ provides a consistent testing procedure.

Acknowledgments

The author would like to thank Axel Munk for suggesting this topic of research and Fadoua Balabdaoui, Ta-Chao Kao, Wilbert Kallenberg and Janis Valeinis for helpful discussions.

References

O. Barndorff-Nielsen. Information and exponential families in statistical theory. John Wiley & Sons Ltd., Chichester, 1978. ISBN 0-471-99545-2. Wiley Series in Probability and Mathematical Statistics.

P. J. Bickel, C. A. J. Klaassen, Y. Ritov, and J. A. Wellner.
Efficient and adaptive estimation for semiparametric models. Johns Hopkins Series in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, 1993. ISBN 0-8018-4541-6.

P. J. Bickel and Y. Ritov. Testing for goodness of fit: a new approach. In Nonparametric statistics and related topics (Ottawa, ON, 1991), pages 51–57. North-Holland, Amsterdam, 1992.

P. J. Bickel and M. Rosenblatt. On some global measures of the deviations of density function estimates. Ann. Statist., 1:1071–1095, 1973. ISSN 0090-5364.

D. R. Cox and D. V. Hinkley. Theoretical statistics. Chapman and Hall, London, 1974.

A. Delaigle and I. Gijbels. Estimation of integrated squared density derivatives from a contaminated sample. J. R. Stat. Soc. Ser. B Stat. Methodol., 64(4):869–886, 2002. ISSN 1369-7412.

R. L. Eubank, J. D. Hart, and V. N. LaRiccia. Testing goodness of fit via nonparametric function estimation techniques. Comm. Statist. Theory Methods, 22(12):3327–3354, 1993. ISSN 0361-0926.

J. Fan. Test of significance based on wavelet thresholding and Neyman's truncation. J. Amer. Statist. Assoc., 91(434):674–688, 1996. ISSN 0162-1459.

H. Holzmann, N. Bissantz, and A. Munk. Density testing in a contaminated sample. J. Multivariate Anal., 98(1):57–75, 2007.

I. A. Ibragimov and R. Z. Has'minskiĭ. Statistical estimation, volume 16 of Applications of Mathematics. Springer-Verlag, New York, 1981. ISBN 0-387-90523-5. Asymptotic theory, Translated from the Russian by Samuel Kotz.

T. Inglot, W. C. M. Kallenberg, and T. Ledwina. Data driven smooth tests for composite hypotheses. Ann. Statist., 25(3):1222–1250, 1997. ISSN 0090-5364.

T. Inglot and T. Ledwina.
Asymptotic optimality of data-driven Neyman's tests for uniformity. Ann. Statist., 24(5):1982–2019, 1996. ISSN 0090-5364.

T. Inglot and T. Ledwina. Asymptotic optimality of new adaptive test in regression model. Ann. Inst. H. Poincaré Probab. Statist., 42(5):579–590, 2006. ISSN 0246-0203.

W. C. M. Kallenberg. The penalty in data driven Neyman's tests. Math. Methods Statist., 11(3):323–340 (2003), 2002. ISSN 1066-5307.

W. C. M. Kallenberg and T. Ledwina. Consistency and Monte Carlo simulation of a data driven version of smooth goodness-of-fit tests. Ann. Statist., 23(5):1594–1608, 1995. ISSN 0090-5364.

W. C. M. Kallenberg and T. Ledwina. Data driven smooth tests for composite hypotheses: comparison of powers. J. Statist. Comput. Simulation, 59(2):101–121, 1997. ISSN 0094-9655.

L. Le Cam. On the asymptotic theory of estimation and testing hypotheses. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954–1955, vol. I, pages 129–156, Berkeley and Los Angeles, 1956. University of California Press.

T. Ledwina. Data-driven version of Neyman's smooth test of fit. J. Amer. Statist. Assoc., 89(427):1000–1005, 1994. ISSN 0162-1459.

J. Neyman. Smooth test for goodness of fit. Skand. Aktuarietidskr., 20:150–199, 1937.

J. Neyman. Optimal asymptotic tests of composite statistical hypotheses. In Probability and statistics: The Harald Cramér volume (edited by Ulf Grenander), pages 213–234. Almqvist & Wiksell, Stockholm, 1959.

Y. Nikitin. Asymptotic efficiency of nonparametric tests. Cambridge University Press, Cambridge, 1995. ISBN 0-521-47029-3.

A. Schick. On asymptotically efficient estimation in semiparametric models. Ann. Statist., 14(3):1139–1151, 1986.
ISSN 0090-5364.

S. L. Sclove and J. Van Ryzin. Estimating the parameters of a convolution. J. Roy. Statist. Soc. Ser. B, 31:181–191, 1969. ISSN 0035-9246.

Appendix.

Proof. (Theorem 1). We calculated the efficient score vector in (4)-(5). By Proposition 1, p. 13 of Bickel et al. (1993) and our regularity assumptions, the matrix $L$ exists and is positive definite and nondegenerate of rank $k$. Under $\langle B1 \rangle$-$\langle B3 \rangle$ we have $E_0\, l^*(y) = 0$ (see Bickel et al. (1993), p. 15) and our statement follows.

Proof. (Proposition 5). Follows by the multivariate Central Limit Theorem.

Proof. (Theorem 3). Denote $\Delta(k, n) := \pi(k, n) - \pi(1, n)$. For any $k = 2, \ldots, d$
\[
P^n_0(S = k) \le P^n_0 \bigl( U_k - \pi(k, n) \ge U_1 - \pi(1, n) \bigr) \le P^n_0 \bigl( U_k \ge \pi(k, n) - \pi(1, n) \bigr) = P^n_0 \bigl( U_k \ge \Delta(k, n) \bigr).
\]
By Theorem 1 $U_k \to_d \chi^2_k$ as $n \to \infty$, thus for $\Delta(k, n) \uparrow \infty$ as $n \to \infty$ we have
\[
P^n_0 \bigl( U_k \ge \Delta(k, n) \bigr) \to 0 \quad \text{as } n \to \infty,
\]
so for any $k = 2, \ldots, d$ we have $P^n_0(S = k) \to 0$ as $n \to \infty$. This proves that
\[
P^n_0(S \ge 2) = \sum_{k=2}^{d} P^n_0(S = k) \to 0, \quad n \to \infty,
\]
and so $P^n_0(S = 1) \to 1$. Now write for arbitrary real $t > 0$
\[
P^n_0(|U_S - U_1| \ge t) = P^n_0(|U_1 - U_1| \ge t ; S = 1) + \sum_{m=2}^{d} P^n_0(|U_m - U_1| \ge t ; S = m) = \sum_{m=2}^{d} P^n_0(|U_m - U_1| \ge t ; S = m). \tag{35}
\]
For $m = 2, \ldots, d$ we have $P^n_0(S = m) \to 0$, so
\[
0 \le \sum_{m=2}^{d} P^n_0(|U_m - U_1| \ge t ; S = m) \le \sum_{m=2}^{d} P^n_0(S = m) \to 0 \quad \text{as } n \to \infty,
\]
and thus by (35) it follows that $U_S$ tends to $U_1$ in probability as $n \to \infty$. But $U_1 \to_d \chi^2_1$ by Theorem 1, so $U_S \to_d \chi^2_1$ as $n \to \infty$.

We shall use the following standard lemma from linear algebra.

Lemma 16.
Let $x \in \mathbb{R}^k$, and let $A$ be a $k \times k$ positive definite matrix; if for some real number $\delta > 0$ we have $A > \delta$ (in the sense that the matrix $(A - \delta I_{k \times k})$ is positive definite, where $I_{k \times k}$ is the $k \times k$ identity matrix), then for all $x \in \mathbb{R}^k$ it holds that $x A x^T > \delta \|x\|^2$.

Proof. (Proposition 6). From $\langle D1 \rangle$ by the law of large numbers we get
\[
\frac{1}{n} \sum_{j=1}^n l^*_i(Y_j) \to_P 0 \quad \text{for } 1 \le i \le K - 1, \tag{36}
\]
\[
\frac{1}{n} \sum_{j=1}^n l^*_K(Y_j) \to_P C_K \ne 0. \tag{37}
\]
We apply Lemma 16 to the matrix $L$ defined in (6); since all the eigenvalues of $L$ are positive, we can choose $\delta$ to be any fixed positive number less than the smallest eigenvalue of $L$. We obtain the following inequality:
\[
U_k = \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n l^*(Y_j) \Bigr\} L \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n l^*(Y_j) \Bigr\}^T > \delta \Bigl\| \frac{1}{\sqrt n} \sum_{j=1}^n l^*(Y_j) \Bigr\|^2 = \delta n \sum_{i=1}^k \Bigl( \frac{1}{n} \sum_{j=1}^n l^*_i(Y_j) \Bigr)^2 \ge \delta n \Bigl( \frac{1}{n} \sum_{j=1}^n l^*_K(Y_j) \Bigr)^2. \tag{38}
\]
Now by (36) and (37) we get for all $s \in \mathbb{R}$
\[
P(U_k \le s) \le P \Bigl( \delta n \Bigl( \frac{1}{n} \sum_{j=1}^n l^*_K(Y_j) \Bigr)^2 \le s \Bigr) = P \Bigl( \Bigl( \frac{1}{n} \sum_{j=1}^n l^*_K(Y_j) \Bigr)^2 \le \frac{s}{\delta n} \Bigr) = P \Bigl( \Bigl| \frac{1}{n} \sum_{j=1}^n l^*_K(Y_j) \Bigr| \le \sqrt{\frac{s}{\delta n}} \Bigr) \to 0
\]
as $n \to \infty$, and this proves the Proposition.

Proof. (Proposition 7). Let $\pi(k, n)$ and $\Delta(k, n)$ be defined as in Section 4. For any $i = 1, \ldots, K - 1$ we have
\[
P_F(S = i) \le P_F \bigl( U_i - \pi(i, n) \ge U_K - \pi(K, n) \bigr) = P_F \bigl( U_i \ge U_K - (\pi(K, n) - \pi(i, n)) \bigr). \tag{39}
\]
By (37) and (38) we get
\[
P_F \bigl( U_K \ge \delta C_K^2 n \bigr) \to 1 \quad \text{as } n \to \infty. \tag{40}
\]
Note that
\[
P_F \bigl( U_i \ge U_K - (\pi(K, n) - \pi(i, n)) \bigr) \le P_F \bigl( U_i \ge \delta C_K^2 n - (\pi(K, n) - \pi(i, n)) ; \; U_K \ge \delta C_K^2 n \bigr) + P_F \bigl( U_K \le \delta C_K^2 n \bigr). \tag{41}
\]
Since by $\langle S1 \rangle$ it holds that $\pi(K, n) - \pi(i, n) = o(n)$, we get
\[
P_F \bigl( U_i \ge \delta C_K^2 n - (\pi(K, n) - \pi(i, n)) ; \; U_K \ge \delta C_K^2 n \bigr) \le P_F \bigl( U_i \ge \delta C_K^2 n - (\pi(K, n) - \pi(i, n)) \bigr) \le P_F \bigl( U_i \ge \delta C_K^2 n \bigr) \to 0 \tag{42}
\]
as $n \to \infty$ by Chebyshev's inequality, since by Proposition 5 we have $U_i \to_d \chi^2_i$ as $n \to \infty$ for all $i = 1, \ldots, K - 1$. Substituting (40) and (42) into (41) we get $P_F(S = i) \to 0$ as $n \to \infty$ for all $i = 1, \ldots, K - 1$. This means that $P_F(S \ge K) \to 1$ as $n \to \infty$.

Now write for $t \in \mathbb{R}$
\[
P_F(U_S \le t) = P_F(U_S \le t ; S \le K - 1) + P_F(U_S \le t ; S \ge K) =: R_1 + R_2.
\]
But $R_1 \to 0$ since $P_F(S = i) \to 0$ for $i = 1, \ldots, K - 1$ and $K \le d < \infty$. Since $U_{l_1} \ge U_{l_2}$ for $l_1 \ge l_2$, we get
\[
R_2 \le \sum_{l=K}^{d} P_F(U_K \le t) \to 0 \quad \text{as } n \to \infty
\]
by Proposition 6. Thus $P_F(U_S \le t) \to 0$ as $n \to \infty$ for all $t \in \mathbb{R}$.

Proof. (Theorem 8). Part 1 follows from Theorem 1 and Proposition 6, part 2 from Theorem 1 and Proposition 5, part 3 from Theorem 3 and Proposition 7.

Proof. (The statement about $l^*_j$ from Example 2). Indeed,
\[
\frac{1}{\sqrt n} \Bigl| \sum_{j=1}^n \bigl( l^*_j - l^*_{\theta_0}(Y_j) \bigr) \Bigr| = \frac{1}{\sqrt n} \Bigl| \sum_{j=1}^n (Y_j - \theta_0) \Bigl( \frac{1}{\sigma^2(Y)} - \frac{1}{\widehat\sigma^2_n} \Bigr) \Bigr| = \sqrt n \, \Bigl| \frac{1}{\sigma^2(Y)} - \frac{1}{\widehat\sigma^2_n} \Bigr| \cdot \frac{1}{n} \Bigl| \sum_{j=1}^n (Y_j - \theta_0) \Bigr|.
\]
But
\[
\frac{1}{n} \Bigl| \sum_{j=1}^n (Y_j - \theta_0) \Bigr| = \bigl| \overline{Y} - \theta_0 \bigr| = \bigl| \overline{Y} - E\,Y \bigr| \to 0
\]
in $G_{\theta_0,\eta_0}$-probability, therefore Definition 4 is satisfied if $\sqrt n \, \bigl| \frac{1}{\sigma^2(Y)} - \frac{1}{\widehat\sigma^2_n} \bigr|$ is bounded in $G_{\theta_0,\eta_0}$-probability, and this holds if $\widehat\sigma^2_n$ is a $\sqrt n$-consistent estimate of $\sigma^2(Y)$. Here $\overline{Y}$ denotes the sample mean $\overline{Y} = \frac{1}{n} \sum_{j=1}^n Y_j$.

Proof. (The statement about $l^*_j$ from Example 3).
\[
\frac{1}{\sqrt n} \Bigl| \sum_{j=1}^n \bigl( l^*_j - l^*_{\eta_0}(Y_j) \bigr) \Bigr| = \frac{1}{\sqrt n} \bigl| C_1(\eta_0) \bigr| \, \Bigl| \sum_{j=1}^n \bigl( (Y_j - \widehat\theta_n)^2 - (Y_j - \theta_0)^2 \bigr) \Bigr|
\]
\[
= \frac{1}{\sqrt n} \bigl| C_1(\eta_0) \bigr| \, \Bigl| \sum_{j=1}^n (\widehat\theta_n - \theta_0)(-2 Y_j + \widehat\theta_n + \theta_0) \Bigr| = \bigl| C_1(\eta_0) \bigr| \sqrt n \, \bigl| \widehat\theta_n - \theta_0 \bigr| \, \frac{1}{n} \Bigl| \sum_{j=1}^n (Y_j - \widehat\theta_n) + \sum_{j=1}^n (Y_j - \theta_0) \Bigr|
\]
\[
= \bigl| C_1(\eta_0) \bigr| \sqrt n \, \bigl| \widehat\theta_n - \theta_0 \bigr| \, \bigl| (\overline{Y} - \widehat\theta_n) + (\overline{Y} - \theta_0) \bigr| \le \bigl| C_1(\eta_0) \bigr| \sqrt n \, \bigl| \widehat\theta_n - \theta_0 \bigr| \bigl( \bigl| \overline{Y} - \widehat\theta_n \bigr| + \bigl| \overline{Y} - \theta_0 \bigr| \bigr) \to 0
\]
in $G_{\theta_0,\eta_0}$-probability, since for $n \to \infty$ it holds that $| \overline{Y} - \widehat\theta_n | \to 0$ and $| \overline{Y} - \theta_0 | \to 0$, both in $G_{\theta_0,\eta_0}$-probability, and $\sqrt n \, | \widehat\theta_n - \theta_0 |$ is bounded in $G_{\theta_0,\eta_0}$-probability.

Proof. (Theorem 11). Put
\[
V_k = \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n l^*_{\theta_0}(Y_j) \Bigr\} (I^*_{\theta_0})^{-1} \Bigl\{ \frac{1}{\sqrt n} \sum_{j=1}^n l^*_{\theta_0}(Y_j) \Bigr\}^T, \tag{43}
\]
where $l^*_{\theta_0}$ is defined by (23) and $I^*_{\theta_0}$ by (24). Of course, $V_k$ is not a statistic since it depends on the unknown $\eta_0$. But if the true $\eta_0$ is known, then because of $\langle B1 \rangle$-$\langle B3 \rangle$ we can apply the multivariate Central Limit Theorem and obtain
\[
V_k \to_d \chi^2_k \quad \text{as } n \to \infty.
\]
Condition (28) implies that
\[
\frac{1}{\sqrt n} \sum_{j=1}^n l^*_j \to \frac{1}{\sqrt n} \sum_{j=1}^n l^*_{\theta_0}(Y_j)
\]
in $G_{\theta_0,\eta_0}$-probability as $n \to \infty$, and by consistency of $\widehat L$ we get the statement of the theorem by Slutsky's Lemma.

Proof. (Theorem 15). Because of condition (28) the proof is analogous to the proof of Theorem 8. Indeed, after an obvious change of notation, Propositions 5, 6, and 7 are true for $W_k$, $W_{S(l^*)}$, $S(l^*)$ instead of $U_k$, $U_S$, $S$. Proofs of the new versions of the propositions are analogous to the proofs of the previous versions. The only difference is that the proof of the key inequality analogous to (38) requires the use of the following lemma.

Lemma 17.
Let $A$ be a $k \times k$ positive definite matrix and let $\{A_n\}_{n=1}^{\infty}$ be a sequence of $k \times k$ matrices such that $A_n \to A$ in the Euclidean matrix norm. Suppose that for some real number $\delta > 0$ we have $A > \delta$ in the sense that the matrix $(A - \delta I_{k \times k})$ is positive definite, where $I_{k \times k}$ is the $k \times k$ identity matrix. Then for all sufficiently large $n$ it holds that $A_n > \delta$.
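Lemma 17 can be checked numerically on a small example. The sketch below is only an illustration under our own assumptions: the matrices `A` and `E`, the value of `delta`, and the helper `exceeds` are hypothetical choices, and the matrices are taken symmetric so that positive definiteness can be read off the eigenvalues. The sequence is $A_n = A + E/n$, which converges to $A$.

```python
import numpy as np

def exceeds(M, delta):
    """True if the symmetric matrix M - delta * I is positive definite."""
    shifted = M - delta * np.eye(M.shape[0])
    return bool(np.linalg.eigvalsh(shifted).min() > 0.0)

# Symmetric positive definite A with smallest eigenvalue (3 - sqrt(2)) / 2, about 0.79.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
delta = 0.5

# Symmetric perturbation with eigenvalues -2 and -4, so A_n = A + E / n -> A.
E = np.array([[-3.0, 1.0],
              [1.0, -3.0]])

holds_for_A = exceeds(A, delta)
holds_at_n1 = exceeds(A + E / 1, delta)
holds_for_large_n = all(exceeds(A + E / n, delta) for n in range(14, 200))
```

For $n = 1$ the perturbed matrix is not even positive definite, so the conclusion of the lemma is genuinely asymptotic. By Weyl's eigenvalue inequality the smallest eigenvalue of $A_n$ is at least $0.79 - 4/n$, which exceeds $\delta = 0.5$ once $n \ge 14$.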
