Safe hypotheses testing with application to order restricted inference

Hypothesis tests under order restrictions arise in a wide range of scientific applications. By exploiting inequality constraints, such tests can achieve substantial gains in power and interpretability. However, these gains come at a cost: when the im…

Authors: Ori Davidov

Department of Statistics, University of Haifa, Mount Carmel, Haifa 3498838 Israel
E-mail: davidov@stat.haifa.ac.il

Abstract

Hypothesis tests under order restrictions arise in a wide range of scientific applications. By exploiting inequality constraints, such tests can achieve substantial gains in power and interpretability. However, these gains come at a cost: when the imposed constraints are misspecified, the resulting inferences may be misleading or even invalid, and Type III errors may occur, i.e., the null hypothesis may be rejected when neither the null nor the alternative is true. To address this problem, this paper introduces safe tests. Heuristically, a safe test is a testing procedure that is asymptotically free of Type III errors. The proposed test is accompanied by a certificate of validity, a pre-test that assesses whether the original hypotheses are consistent with the data, thereby ensuring that the null hypothesis is rejected only when warranted, enabling principled inference without risk of systematic error. Although the development in this paper focuses on testing problems in order-restricted inference, the underlying ideas are more broadly applicable. The proposed methodology is evaluated through simulation studies and the analysis of well-known illustrative data examples, demonstrating strong protection against Type III errors while maintaining power comparable to standard procedures.

Key-Words: Certificate of Validity, Constrained Inference, Distance Test, Large Sample Theory, Safe Tests, Type III Errors.

1 Introduction

Hypothesis testing has been studied extensively within the framework of order-restricted inference (ORI); see the monographs of Barlow et al. (1972), Robertson et al. (1988), and Silvapulle and Sen (2005).
Silvapulle and Sen (2005) classified a large subset of the testing problems arising in ORI into Type A or Type B Problems. Type A Problems are formulated as

$$H_0: \boldsymbol{\theta} \in \mathcal{L} \quad \text{versus} \quad H_1: \boldsymbol{\theta} \in \mathcal{C} \setminus \mathcal{L} \tag{1}$$

where $\mathcal{L}$ is a linear subspace and $\mathcal{C}$ is a closed convex cone with $\mathcal{L} \subset \mathcal{C}$. A classic example of such testing problems is $H_0: \boldsymbol{\theta} = \boldsymbol{0}$ versus $H_1: \boldsymbol{\theta} \in \mathbb{R}^m_+ \setminus \{\boldsymbol{0}\}$, where $\mathbb{R}^m_+$ is the positive orthant. Type A problems are common in applications and are often referred to as testing for an order. Type B Problems are formulated as

$$H_0: \boldsymbol{\theta} \in \mathcal{C} \quad \text{versus} \quad H_1: \boldsymbol{\theta} \notin \mathcal{C}. \tag{2}$$

A canonical Type B Problem is $H_0: \boldsymbol{\theta} \in \mathbb{R}^m_+$ versus $H_1: \boldsymbol{\theta} \notin \mathbb{R}^m_+$. This class of tests is referred to as testing against an order.

It is well known that accounting for constraints, as specified in (1) or (2), improves the power of the resulting tests (e.g., Praestgaard 2012) as well as the accuracy of the associated estimators (e.g., Hwang and Peddada 1994, Silvapulle and Sen 2005, Rosen and Davidov 2017). These improvements are often substantial (cf. Singh et al. 2021). A case in point is ANOVA type problems, where the superior performance of ORI has been well known for over fifty years, cf. Barlow et al. (1972). Singh and Davidov (2019) recently showed that striking gains are possible when experiments are both designed and analyzed using methods that properly account for the underlying constraints. However, to date, few scientific studies have capitalized on these findings, and the methods of ORI remain vastly underutilized. In our view, barriers to the broad adoption of the methods of ORI are both practical and principled. Practically, ORI requires constrained estimation and nonstandard asymptotic theory, making it more complex to understand and implement. Moreover, standard tools such as the bootstrap may fail when parameters lie on the boundary of the parameter space (Andrews, 2000), and user-friendly software remains limited.
Principled objections concern the behavior of tests and estimators when the assumed constraints, e.g., $\boldsymbol{\theta} \in \mathcal{C}$, are misspecified. For example, one may ask how a test for (1) behaves when $\boldsymbol{\theta} \notin \mathcal{C}$. Additionally, several authors, including Silvapulle (1997) and Cohen and Sackrowitz (2004), have discussed methodological concerns and potential deficiencies of the likelihood ratio test (LRT) in ORI. See also Perlman and Wu (1999) and the references therein. Such concerns have motivated the development of alternative procedures, including cone-order monotone tests as advocated by Cohen and Sackrowitz (1998). This communication addresses the aforementioned principled concerns, thereby resolving many of the issues raised in the literature.

Despite the well-known possibility of misspecifying ordered restrictions and the widely recognized risk of Type III errors, there is a clear gap in the literature concerning their formal treatment. Addressing this gap, the paper introduces and studies a novel, easy-to-apply safe test, a testing procedure that is asymptotically free of Type III errors. Safe tests constitute a first step toward adaptive ORI, methodologies for estimation, prediction, and related tasks, in which order constraints are imposed only when supported by the data.

The paper is organized in the following way. In Section 2 the geometry of the distance test is studied. Section 3 introduces and studies a novel safe test. Simulation results and illustrative examples, including the reanalysis of some well-known case studies from the literature, are provided in Section 4. We conclude in Section 5 with a brief summary and a discussion. All proofs are collected in Appendix A.
2 The distance test and Type III errors

Suppose that there exists a statistic $\boldsymbol{S}_n$ which estimates a parameter $\boldsymbol{\theta} \in \Theta \subseteq \mathbb{R}^m$ and satisfies

$$\sqrt{n}\,(\boldsymbol{S}_n - \boldsymbol{\theta}) \Rightarrow N_m(\boldsymbol{0}, \boldsymbol{\Sigma}) \tag{3}$$

as $n \to \infty$, where $\Rightarrow$ denotes convergence in distribution. We further assume that $\boldsymbol{\Sigma}_n$, a consistent estimator for $\boldsymbol{\Sigma}$, exists. Numerous tests for (1) and (2) assuming (3) have been proposed in the literature (Silvapulle and Sen, 2005). The most common, in both applications and theoretical studies, is the distance test (DT), which is of the form

$$T_n = T_n(\Theta_0, \Theta_1) = n\left\{ \|\boldsymbol{S}_n - \Pi_{\boldsymbol{\Sigma}_n}(\boldsymbol{S}_n \mid \Theta_0)\|^2_{\boldsymbol{\Sigma}_n} - \|\boldsymbol{S}_n - \Pi_{\boldsymbol{\Sigma}_n}(\boldsymbol{S}_n \mid \Theta_1)\|^2_{\boldsymbol{\Sigma}_n} \right\}, \tag{4}$$

where $(\Theta_0, \Theta_1) = (\mathcal{L}, \mathcal{C})$ for Type A Problems and $(\Theta_0, \Theta_1) = (\mathcal{C}, \mathbb{R}^m)$ for Type B Problems. Here $\Pi_{\boldsymbol{\Sigma}_n}(\boldsymbol{S}_n \mid \Theta_i)$ is the $\boldsymbol{\Sigma}_n$-projection of $\boldsymbol{S}_n$ onto $\Theta_i$, where $i \in \{0, 1\}$, and $\|\cdot\|_{\boldsymbol{\Sigma}_n}$ is the corresponding norm. The DT is the large sample version of the LRT under normality, i.e., if (3) holds exactly and $\boldsymbol{\Sigma}$ is known up to a constant multiple, then (4) is the LRT. The null is rejected in favor of the alternative at the level $\alpha$ if $T_n \ge c_\alpha$, where $c_\alpha$ is the $\alpha$-level critical value. We say that the DT is consistent at $\boldsymbol{\theta} \in \mathbb{R}^m$ if $P_{\boldsymbol{\theta}}(T_n \ge c_\alpha) \to 1$ as $n \to \infty$.

Understanding the geometry of the DT requires additional notation. First, for any cone $\mathcal{C}$ let $\mathcal{C}^\circ_{\boldsymbol{\Sigma}}$ denote its polar cone with respect to the inner product $\langle \boldsymbol{u}, \boldsymbol{v} \rangle_{\boldsymbol{\Sigma}} = \boldsymbol{u}^T \boldsymbol{\Sigma}^{-1} \boldsymbol{v}$, i.e.,

$$\mathcal{C}^\circ_{\boldsymbol{\Sigma}} = \{ \boldsymbol{u} \in \mathbb{R}^m : \boldsymbol{u}^T \boldsymbol{\Sigma}^{-1} \boldsymbol{v} \le 0, \ \forall \boldsymbol{v} \in \mathcal{C} \}.$$

For convenience we shall write $\mathcal{C}^\circ$ instead of $\mathcal{C}^\circ_{\boldsymbol{\Sigma}}$ whenever no ambiguity arises.
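To make (4) concrete, the following is a minimal Python sketch of the two distance tests for the bivariate case $\Theta_0 = \{\boldsymbol{0}\}$, $\mathcal{C} = \mathbb{R}^2_+$. It is an illustration under stated assumptions rather than a general implementation: the function names are hypothetical, the covariance is treated as known, and the $\boldsymbol{\Sigma}$-projection onto the quadrant is found by enumerating the faces of the cone (the origin, the two axes, and the interior), which is adequate only in two dimensions.

```python
def quad(u, v, a):
    """Bilinear form u^T A v for 2-vectors u, v and a 2x2 matrix A."""
    return (u[0] * (a[0][0] * v[0] + a[0][1] * v[1])
            + u[1] * (a[1][0] * v[0] + a[1][1] * v[1]))


def inv2(s):
    """Inverse of a 2x2 matrix."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]


def project_quadrant(s, sigma):
    """Sigma-projection of s onto the nonnegative quadrant.

    The projection lies in the relative interior of some face, where it
    coincides with the projection onto that face's linear span, so it is
    the closest feasible candidate among the four face-wise projections.
    """
    a = inv2(sigma)
    candidates = [(0.0, 0.0)]                      # the vertex
    if s[0] >= 0 and s[1] >= 0:
        candidates.append((s[0], s[1]))            # s is already in the cone
    for e in ((1.0, 0.0), (0.0, 1.0)):             # the two axes
        t = quad(s, e, a) / quad(e, e, a)
        if t >= 0:
            candidates.append((t * e[0], t * e[1]))

    def dist2(c):
        d = (s[0] - c[0], s[1] - c[1])
        return quad(d, d, a)

    return min(candidates, key=dist2)


def distance_tests(s, sigma, n):
    """Return (T_n, T'_n): Type A test of {0} vs the quadrant, and the
    Type B test of the quadrant vs R^2, both following the form of (4)."""
    a = inv2(sigma)
    p = project_quadrant(s, sigma)
    d = (s[0] - p[0], s[1] - p[1])
    t_type_a = n * (quad(s, s, a) - quad(d, d, a))
    t_type_b = n * quad(d, d, a)
    return t_type_a, t_type_b


# Illustrative inputs: n = 5, S_n = (-3, -2)^T, unit variances, correlation 0.9.
t_a, t_b = distance_tests((-3.0, -2.0), [[1.0, 0.9], [0.9, 1.0]], 5)
print(round(t_a, 2), round(t_b, 2))
```

With these inputs the projection is $(0, 0.7)^T$, so the Type A statistic is positive even though both coordinates of $\boldsymbol{S}_n$ are negative, while the Type B statistic is large.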
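Next, note that the continuity of projections onto convex sets and the continuous mapping theorem imply that $T_n = n\Delta + O_p(\sqrt{n})$, where

$$\Delta = \|\boldsymbol{\theta} - \Pi_{\boldsymbol{\Sigma}}(\boldsymbol{\theta} \mid \Theta_0)\|^2_{\boldsymbol{\Sigma}} - \|\boldsymbol{\theta} - \Pi_{\boldsymbol{\Sigma}}(\boldsymbol{\theta} \mid \Theta_1)\|^2_{\boldsymbol{\Sigma}}, \tag{5}$$

so $T_n \to \infty$ if and only if $\Delta > 0$, in which case $P_{\boldsymbol{\theta}}(T_n \ge c_\alpha) \to 1$. Using (5) we have:

Theorem 2.1. In Type A Problems, the DT is consistent provided

$$\boldsymbol{\theta} \notin (\mathcal{C} \cap \mathcal{L}^\perp)^\circ. \tag{6}$$

In Type B Problems the DT is consistent for all $\boldsymbol{\theta} \notin \mathcal{C}$.

Remark 2.1. Theorem 2.1 has been informally stated, but not proved, in Silvapulle and Sen (2005).

An important special case of (6) arises when the cone $\mathcal{C}$ is defined by a finite set of linear inequalities, in which case it is referred to as a polyhedral cone, i.e., $\mathcal{L} = \{\boldsymbol{\theta} \in \mathbb{R}^m : \boldsymbol{R\theta} = \boldsymbol{0}\}$ and $\mathcal{C} = \{\boldsymbol{\theta} \in \mathbb{R}^m : \boldsymbol{R\theta} \ge \boldsymbol{0}\}$ for some $p \times m$ restriction matrix $\boldsymbol{R}$. Set $\boldsymbol{\eta} = \boldsymbol{R\theta}$ and rewrite (1) as $H_0: \boldsymbol{\eta} \in \mathcal{M}$ versus $H_1: \boldsymbol{\eta} \in \mathcal{Q} \setminus \mathcal{M}$, where $\mathcal{M} = \{\boldsymbol{0}\}$ and $\mathcal{Q} = \mathbb{R}^p_+ \equiv \{\boldsymbol{\eta} \in \mathbb{R}^p : \boldsymbol{\eta} \ge \boldsymbol{0}\}$. Let $\boldsymbol{W}_n = \boldsymbol{R S}_n$ be an estimator of $\boldsymbol{\eta}$. It follows from (3) that $\sqrt{n}\,(\boldsymbol{W}_n - \boldsymbol{\eta}) \Rightarrow N_p(\boldsymbol{0}, \boldsymbol{R\Sigma R}^T)$. Next note that $\mathcal{M}^\perp = \mathbb{R}^p$, so $\mathcal{Q} \cap \mathcal{M}^\perp$ is nothing but $\mathcal{Q}$. By Proposition 3.12.8 in Silvapulle and Sen (2005) and a bit of algebra it can be shown that the polar cone of $\mathcal{Q}$ with respect to $\boldsymbol{R\Sigma R}^T$ is given by

$$\mathcal{Q}^\circ = \{\boldsymbol{\eta} \in \mathbb{R}^p : \boldsymbol{\eta}^T (\boldsymbol{R\Sigma R}^T)^{-1} \le \boldsymbol{0}\}.$$

Substituting $\boldsymbol{\eta} = \boldsymbol{R\theta}$ we conclude that the DT is not consistent provided $\boldsymbol{\theta}$ satisfies $\boldsymbol{\theta}^T \boldsymbol{R}^T (\boldsymbol{R\Sigma R}^T)^{-1} \le \boldsymbol{0}$, a condition that is easy to check.

Example 2.1. Consider testing

$$H_0: \boldsymbol{\theta} = \boldsymbol{0} \quad \text{versus} \quad H_1: \boldsymbol{\theta} \in \mathbb{R}^2_+ \setminus \{\boldsymbol{0}\} \tag{7}$$

based on a sample $\boldsymbol{X}_1, \boldsymbol{X}_2, \ldots$ from a $N_2(\boldsymbol{\theta}, \boldsymbol{I})$ distribution.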
Applying Theorem 2.1 shows that the DT, which coincides with the LRT, is consistent whenever $\boldsymbol{\theta} \in \mathbb{R}^2 \setminus \mathbb{R}^2_-$, where $\mathbb{R}^2_-$ denotes the negative quadrant. Therefore, if $\boldsymbol{\theta} \in \{\boldsymbol{\theta} : \theta_1 > 0, \theta_2 \le 0\} \cup \{\boldsymbol{\theta} : \theta_1 \le 0, \theta_2 > 0\}$, i.e., $\boldsymbol{\theta}$ belongs to the second or fourth quadrants, then the DT is consistent although $\boldsymbol{\theta}$ does not belong to the alternative. For such $\boldsymbol{\theta}$ and all $n \in \mathbb{N}$ there is a possibility of a Type III error.

Let $\mathcal{A}_n(\Theta_0, \Theta_1, \alpha)$ and $\mathcal{R}_n(\Theta_0, \Theta_1, \alpha)$ denote the acceptance and rejection regions, respectively, for testing $H_0: \boldsymbol{\theta} \in \Theta_0$ versus $H_1: \boldsymbol{\theta} \in \Theta_1 \setminus \Theta_0$ using (4) at the level $\alpha$. The following theorem describes the DT-based acceptance regions for Type A and B Problems. In what follows the symbol $\oplus$ denotes the Minkowski sum, i.e., for any sets $\mathcal{U}$ and $\mathcal{V}$ define $\mathcal{U} \oplus \mathcal{V} = \{\boldsymbol{u} + \boldsymbol{v} : \boldsymbol{u} \in \mathcal{U}, \boldsymbol{v} \in \mathcal{V}\}$, and $\mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{0}, c) = \{\boldsymbol{x} : \|\boldsymbol{x}\|^2_{\boldsymbol{\Sigma}} < c^2\}$ is the open ball with radius $c$ with respect to the norm $\|\cdot\|_{\boldsymbol{\Sigma}}$.

Theorem 2.2. For Type A Problems we have:

$$\mathcal{A}_n(\mathcal{L}, \mathcal{C}, \alpha) = (\mathcal{C} \cap \mathcal{L}^\perp)^\circ_{\boldsymbol{\Sigma}_n} \oplus \mathrm{Ball}_{\boldsymbol{\Sigma}_n}\!\left(\boldsymbol{0}, \sqrt{c_\alpha / n}\right), \tag{8}$$

whereas for Type B Problems we have:

$$\mathcal{A}_n(\mathcal{C}, \mathbb{R}^m, \alpha) = \mathcal{C} \oplus \mathrm{Ball}_{\boldsymbol{\Sigma}_n}\!\left(\boldsymbol{0}, \sqrt{c_\alpha / n}\right). \tag{9}$$

Theorems 2.1 and 2.2 are closely related. In fact, (8) can be obtained from (6) by taking the Minkowski sum of the set in (6) with a small ball. This procedure is sometimes referred to as $\varepsilon$-fattening. One difference between Theorems 2.1 and 2.2 is that the acceptance regions are computed with respect to the estimated variance $\boldsymbol{\Sigma}_n$, whereas consistency is calculated with respect to the true variance $\boldsymbol{\Sigma}$. The difference between the two is small, as $\mathcal{A}_n(\mathcal{L}, \mathcal{C}, \alpha) \to (\mathcal{C} \cap \mathcal{L}^\perp)^\circ$ as $n \to \infty$.

In certain cases, one can obtain more explicit and intuitive characterizations than those provided by Theorems 2.1 (and 2.2).
For example, consider the simplest possible ANOVA model

$$Y_{ij} = \theta_i + \varepsilon_{ij}, \tag{10}$$

where $i = 1, \ldots, K$, $j = 1, \ldots, n_i$ and the $\varepsilon_{ij}$ are IID with mean $0$ and variance $\sigma^2$. Typically, it is assumed that under the null all means are equal, i.e., $\boldsymbol{\theta} \in \mathcal{L}$ where $\mathcal{L} = \{\boldsymbol{\theta} \in \mathbb{R}^K : \theta_1 = \cdots = \theta_K\}$. The most common ordered alternatives are the simple, tree and umbrella orders specified by the cones $\mathcal{C}_s = \{\boldsymbol{\theta} \in \mathbb{R}^K : \theta_1 \le \cdots \le \theta_K\}$, $\mathcal{C}_t = \{\boldsymbol{\theta} \in \mathbb{R}^K : \theta_1 \le \theta_2, \ldots, \theta_1 \le \theta_K\}$ and $\mathcal{C}_u = \{\boldsymbol{\theta} \in \mathbb{R}^K : \theta_1 \le \cdots \le \theta_p \ge \cdots \ge \theta_K\}$, respectively. All three alternative hypotheses arise frequently in practice and were instrumental in motivating the development of ORI; see Barlow et al. (1972) and van Eeden (2006) for surveys of early work in the area.

Theorem 2.3. The DT for $H_0: \boldsymbol{\theta} \in \mathcal{L}$ versus $H_1: \boldsymbol{\theta} \in \mathcal{C}_s \setminus \mathcal{L}$ is consistent provided that for some $1 \le i \le K - 1$ we have

$$\max_{1 \le s \le i} \mathrm{Av}(\boldsymbol{\theta}, s, i) < \min_{i+1 \le t \le K} \mathrm{Av}(\boldsymbol{\theta}, i+1, t) \tag{11}$$

where $\mathrm{Av}(\boldsymbol{\theta}, u, v) = \sum_{j=u}^{v} w_j \theta_j \big/ \sum_{j=u}^{v} w_j$ and $w_j = \lim n_j / n \in (0, 1)$, where $n = \sum_{j=1}^{K} n_j$ and $j = 1, \ldots, K$.

Equation (11) states that the DT is consistent if there exists an index $i$ that partitions the means into at least two level-sets. If so, the means are, in a weak sense, increasing on average. Equivalently, since $\Pi_{\boldsymbol{\Sigma}}(\boldsymbol{\theta} \mid \mathcal{L}) = \mathrm{Av}(\boldsymbol{\theta}, 1, K)\,\boldsymbol{1}_K$, where $\boldsymbol{1}_K = (1, \ldots, 1)^T$, must differ from $\Pi_{\boldsymbol{\Sigma}}(\boldsymbol{\theta} \mid \mathcal{C}_s)$ when the test is consistent, it follows that the DT has no power on $(\mathcal{C}_s \cap \mathcal{L}^\perp)^\circ = \{\boldsymbol{\theta} \in \mathbb{R}^K : \Pi_{\boldsymbol{\Sigma}}(\boldsymbol{\theta} \mid \mathcal{C}_s) \in \mathrm{span}\{\boldsymbol{1}_K\}\}$.

Example 2.2. Specifically, suppose that $K = 3$ and $n_1 = n_2 = n_3$. It is easy to verify that Equation (11) holds for $i = 1$ provided $\theta_1 < \min\{\theta_2, (\theta_2 + \theta_3)/2\}$ and therefore, either: (a) $\theta_1 < \theta_2 \le \theta_3$; or (b) $\theta_1 < \theta_2 > \theta_3$ and $\theta_1 < (\theta_2 + \theta_3)/2$.
Similarly, when $i = 2$ Equation (11) holds whenever $\max\{(\theta_1 + \theta_2)/2, \theta_2\} < \theta_3$ and therefore either (c) $\theta_1 \le \theta_2 < \theta_3$; or (d) $\theta_1 > \theta_2 < \theta_3$ and $(\theta_1 + \theta_2)/2 < \theta_3$. Clearly, if $\boldsymbol{\theta}$ satisfies (a) or (c) then $\boldsymbol{\theta} \in \mathcal{C}_s$. However, if $\boldsymbol{\theta}$ satisfies (b) or (d), then $\boldsymbol{\theta} \notin \mathcal{C}_s$ and a Type III error will occur with probability tending to unity as $n \to \infty$.

With slight modifications, the proof of Theorem 2.3 can be adapted to deal with the tree and umbrella orders. In particular, the DT for the tree order is consistent provided $\theta_1 < \max\{\theta_2, \ldots, \theta_K\}$, i.e., if $\theta_1 < \theta_i$ for some $i \in \{2, \ldots, K\}$. If so, for large $n$ the constrained estimator of $\theta_1$ satisfies

$$\theta^*_{1,n} = \frac{\sum_{j \in J} w_j \theta_j}{\sum_{j \in J} w_j} + o_p(1)$$

where $J = \{1 \le j \le K : \theta_j \le \theta_1\}$, whereas the constrained estimator of $\theta_i$ for $i \notin J$ satisfies $\theta^*_{i,n} = \theta_i + o_p(1)$. Consequently, for any $i \notin J$ we have $P(\theta^*_{1,n} < \theta^*_{i,n}) \to 1$ and it follows from the arguments in the proof of Theorem 2.3 that the DT is consistent. The situation for the umbrella order is a bit more complicated. However, it can be demonstrated that the DT is consistent if either: (i) the up-branch, i.e., the subvector $(\theta_1, \ldots, \theta_p)$, satisfies the conditions of Theorem 2.3, i.e., for some $1 \le i < p$ we have

$$\max_{1 \le s \le i} \mathrm{Av}(\boldsymbol{\theta}, s, i) < \min_{i+1 \le t \le p} \mathrm{Av}(\boldsymbol{\theta}, i+1, t);$$

or if (ii) the down-branch, i.e., the subvector $(\theta_p, \ldots, \theta_K)$, satisfies the reverse condition, i.e., for some $p \le i \le K - 1$ we have

$$\min_{p \le s \le i} \mathrm{Av}(\boldsymbol{\theta}, s, i) > \max_{i+1 \le t \le K} \mathrm{Av}(\boldsymbol{\theta}, i+1, t).$$

Studying the geometry of the DT enables an explicit characterization of the set on which the DT is consistent and, therefore, also the set on which Type III errors will occur with probability increasing to one as $n \to \infty$.
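Condition (11) of Theorem 2.3 is straightforward to check numerically. The sketch below is one possible Python rendering for the simple order; the function names are hypothetical, indices are 1-based inside `av` to mirror the notation $\mathrm{Av}(\boldsymbol{\theta}, u, v)$, and the weights are assumed to be the limiting proportions $w_j$. It only checks the sufficient condition (11) and does not compute the test itself.

```python
def av(theta, w, u, v):
    """Weighted average Av(theta, u, v) over the 1-based index range u..v."""
    idx = range(u - 1, v)
    return sum(w[j] * theta[j] for j in idx) / sum(w[j] for j in idx)


def satisfies_condition_11(theta, w):
    """True if some split index i satisfies condition (11)."""
    k = len(theta)
    for i in range(1, k):
        left = max(av(theta, w, s, i) for s in range(1, i + 1))
        right = min(av(theta, w, i + 1, t) for t in range(i + 1, k + 1))
        if left < right:
            return True
    return False


def in_simple_order(theta):
    """True if theta_1 <= ... <= theta_K."""
    return all(a <= b for a, b in zip(theta, theta[1:]))


w = [1 / 3, 1 / 3, 1 / 3]                 # balanced design, as in Example 2.2
for theta in ([0, 1, 2], [0, 3, 1], [3, 2, 1], [1, 1, 1]):
    print(theta, satisfies_condition_11(theta, w), in_simple_order(theta))
```

The vector $(0, 3, 1)$ is an instance of case (b) of Example 2.2: it satisfies (11) although it violates the simple order, which is exactly the configuration in which a Type III error arises.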
Examples 2.1 and 2.2 demonstrate how Type III errors arise in common scenarios, emphasizing that Type III errors are not unfortunate pathological accidents or bizarre special cases, but the norm. In other words, Type III errors are ubiquitous in Type A Problems. Although the possibility of Type III errors in ORI is well known among researchers in the field, we are unaware of any formal investigation thereof. Clearly, Type III errors do not arise in Type B Problems, where $\Theta_0 \cup \Theta_1 = \mathbb{R}^m$.

Although Type III errors are much less familiar than Type I and II errors, their effect on the validity of our inferences is as severe. For example, a naive application of the DT may lead to the erroneous conclusion that a given treatment improves all outcomes under study when, in fact, there is an improvement in a single outcome accompanied by deterioration in the remaining $m - 1$ outcomes. Such errors inevitably result in poor decision making. For obvious reasons, Type III errors are particularly prevalent and dangerous in high-dimensional settings. In some applications, Type III errors are referred to as directional errors (Lehmann and Romano, 2005). For further discussion and additional perspectives, see Kaiser (1960), Shaffer (1972, 1990, 2002), Finner (1999), Oleckno (2008), Guo et al. (2010), Salkind (2010), Mayo and Spanos (2011), Grandhi et al. (2016), and Lin and Peddada (2024).

3 Safe tests for Type A Problems

We start by defining safety in testing.

Definition 3.1. An $\alpha$-level test $T_n$ with rejection region $\mathcal{R}$ is said to be safe if for all $\alpha \in (0, 1)$ and each fixed $\boldsymbol{\theta} \notin \Theta_1$ we have

$$\lim_{n} P_{\boldsymbol{\theta}}(T_n \in \mathcal{R}) = 0.$$

Thus, a test is safe if the probability that it commits Type III errors decreases to $0$ as $n \to \infty$.

Theorem 3.1. The DT (4) is not safe for testing (1) in Type A Problems.

Theorem 3.1 shows that the DT is not safe.
Hence, it validates the principled objections to the use of ORI-based methods, as discussed in the Introduction and further demonstrated in Examples 2.1 and 2.2. In the following, a safe test that alleviates these concerns is introduced and studied.

3.1 Formulation

We introduce safe tests for Type A Problems; more general safe tests are briefly discussed in Section 5. We start with some notation. Let $t_n$ denote the realized value of $T_n$, the DT for testing (1). Denote the associated p-value by $\alpha^* = P_{\boldsymbol{0}}(T_n \ge t_n)$. Next, consider an auxiliary system of hypotheses; in general these are of the form

$$H'_0: \boldsymbol{\theta} \in \Theta_0 \cup \Theta_1 \quad \text{versus} \quad H'_1: \boldsymbol{\theta} \notin \Theta_0 \cup \Theta_1.$$

Note that when $(\Theta_0, \Theta_1) = (\mathcal{L}, \mathcal{C})$, i.e., in Type A problems, the auxiliary hypotheses reduce to (2), a Type B testing problem. Let $T'_n$ denote the corresponding DT and let $t'_n$ be its realized value. It is well known that for any $\gamma \in (0, 1)$ the critical value $c'_\gamma$ for testing (2) solves

$$\gamma = \sup_{\boldsymbol{\theta} \in \mathcal{C}} P_{\boldsymbol{\theta}}(T'_n \ge c'_\gamma) = P_{\boldsymbol{0}}(T'_n \ge c'_\gamma);$$

correspondingly, the associated p-value is $\gamma^* = P_{\boldsymbol{0}}(T'_n \ge t'_n)$. The pair $(\gamma^*, \alpha^*)$ reports on the outcome of the two tests and summarizes the evidence in the data. Consider the mapping $(\gamma^*, \alpha^*) \mapsto (D_1, D_2)$ where $D_1 = \mathbb{I}\{\gamma^* \ge \gamma\}$ and $D_2 = \mathbb{I}\{\alpha^* \le \alpha\}$ for some specified values of $\gamma$ and $\alpha$. In particular, $D_1 = 1$ indicates that the auxiliary null is not rejected at the level $\gamma$. Equivalently, a certificate of validity of level $\gamma$ is issued. The event $D_2 = 1$ indicates that the original null is rejected at the level $\alpha$. See Table 1 for the relevant possibilities.

Table 1: The decision space

| Certificate ($D_1$) | Original Test ($D_2$) | Conclusion |
|---|---|---|
| 1 | 1 | Safely, reject the Null. |
| 1 | 0 | Do not reject the Null. |
| 0 | 1 | A likely Type III error. Revisit assumptions. |
| 0 | 0 | Do not reject the Null. Revisit assumptions. |
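The mapping $(\gamma^*, \alpha^*) \mapsto (D_1, D_2)$ and the conclusions in Table 1 can be written as a small lookup. The following Python sketch is illustrative only; the function name `safe_decision` and the p-values passed to it are hypothetical, while the decision rule and the wording of the conclusions follow Table 1.

```python
# Conclusions of Table 1, keyed by (D1, D2).
CONCLUSIONS = {
    (1, 1): "Safely, reject the Null.",
    (1, 0): "Do not reject the Null.",
    (0, 1): "A likely Type III error. Revisit assumptions.",
    (0, 0): "Do not reject the Null. Revisit assumptions.",
}


def safe_decision(gamma_star, alpha_star, gamma, alpha):
    """Map the pair of p-values to (D1, D2) and the Table 1 conclusion.

    D1 = 1 when the certificate of validity is issued (auxiliary null not
    rejected at level gamma); D2 = 1 when the original null is rejected
    at level alpha.
    """
    d1 = int(gamma_star >= gamma)
    d2 = int(alpha_star <= alpha)
    return d1, d2, CONCLUSIONS[(d1, d2)]


# Hypothetical p-values, with levels gamma = 0.10 and alpha = 0.05.
print(safe_decision(gamma_star=0.40, alpha_star=0.01, gamma=0.10, alpha=0.05))
print(safe_decision(gamma_star=1e-6, alpha_star=0.001, gamma=0.10, alpha=0.05))
```

The first call issues a certificate and rejects; the second rejects the original null without a certificate, which Table 1 flags as a likely Type III error.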
Henceforth we shall refer to the tests $T_n$ and $T'_n$ as the base-tests. In the literature, e.g., Raubertas et al. (1986), these tests are sometimes denoted by $T_{n,01}$ and $T_{n,12}$. We shall combine the base-tests to derive a composite safe test. For convenience we will denote the acceptance regions of the base tests by $\mathcal{A}_n = \mathcal{A}_n(\mathcal{L}, \mathcal{C}, \alpha)$ and $\mathcal{A}'_n = \mathcal{A}_n(\mathcal{C}, \mathbb{R}^m, \gamma)$, respectively. The corresponding rejection regions are accordingly labeled.

Definition 3.2. Fix $\alpha$ and $\gamma$. Let $T_n^{\mathrm{SAFE}}$ be a test for (1) with rejection region

$$\mathcal{R}_n^{\mathrm{SAFE}} = \mathcal{R}_n^{\mathrm{SAFE}}(\alpha, \gamma) = \mathcal{A}'_n \cap \mathcal{R}_n = \{\boldsymbol{S}_n \in \mathbb{R}^m : T'_n < c'_\gamma, \ T_n \ge c_\alpha\}. \tag{12}$$

It immediately follows that $\mathcal{R}_n^{\mathrm{SAFE}} = \{\boldsymbol{S}_n \in \mathbb{R}^m : T_n^{\mathrm{SAFE}} \ge c_\alpha\}$ where $T_n^{\mathrm{SAFE}} = T_n\,\mathbb{I}\{T'_n < c'_\gamma\}$. […] $\cdots\, 0\} + (X_{1,n}^2 + X_{2,n}^2)\,\mathbb{I}\{X_{1,n} \ge 0, X_{2,n} \ge 0\}\}$. The value of $c_\alpha^{\mathrm{SAFE}}$ is obtained by solving the equation $\tilde{\alpha} = P(\chi_1^2 \ge c)/2 + P(\chi_2^2 \ge c)/4$ for $c$, where $\tilde{\alpha}$ solves (13) for a specified value of $\alpha^{\mathrm{SAFE}}$, guaranteeing that $P_{\boldsymbol{0}}(T_n^{\mathrm{SAFE}} \ge c_\alpha^{\mathrm{SAFE}}) = \alpha$. Moreover, since $\alpha^{\mathrm{SAFE}} \le \alpha$, we have $c_\alpha^{\mathrm{SAFE}} \le c_\alpha$.

Given an explicit form of $T_n^{\mathrm{SAFE}}$, see the display above, its distribution can be studied for any $\boldsymbol{\theta} \in \mathbb{R}^2$. For example, it is immediate that for any $\boldsymbol{\theta} \in \mathcal{C}$ we have $P_{\boldsymbol{\theta}}(T_n^{\mathrm{SAFE}} \ne T_n) \to 0$ for any fixed $n$ as $\gamma \to 0$, and similarly for any fixed $\gamma$ as $n \to \infty$. Hence $T_n$ and $T_n^{\mathrm{SAFE}}$ are expected to perform similarly on the alternative. However, for any $\boldsymbol{\theta}$ in the second or fourth quadrant it is easy to verify that $T_n^{\mathrm{SAFE}} = 0$ for all $\gamma \in (0, 1)$ if $n$ is large enough. Therefore, $T_n^{\mathrm{SAFE}}$, above, is indeed a safe test.

3.3 The general case

We conclude this section with a general theorem addressing safe tests for Type A Problems.

Theorem 3.2. Consider testing (1) using $T_n^{\mathrm{SAFE}}$ for some fixed levels $\alpha$ and $\gamma$. Then

$$\alpha^{\mathrm{SAFE}} = \sup_{\boldsymbol{\theta} \in \mathcal{L}} P_{\boldsymbol{\theta}}(T_n^{\mathrm{SAFE}} \ge c_\alpha) \le \alpha. \tag{14}$$

Moreover, the test $T_n^{\mathrm{SAFE}}$ is safe.
Furthermore, for all $\boldsymbol{\theta} \in \mathrm{int}(\mathcal{C} \setminus \mathcal{L})$ we have

$$P_{\boldsymbol{\theta}}(T_n \ge c_\alpha) \le P_{\boldsymbol{\theta}}(T_n^{\mathrm{SAFE}} \ge c_\alpha^{\mathrm{SAFE}}) + o(1) \tag{15}$$

where $c_\alpha^{\mathrm{SAFE}}$ is the $\alpha$-level critical value of $T_n^{\mathrm{SAFE}}$.

Theorem 3.2 shows that the test $T_n^{\mathrm{SAFE}}$ with rejection region (12) is a safe test. Moreover, for large samples the test based on $T_n^{\mathrm{SAFE}}$ is at least as powerful, up to an $o(1)$ term, as the test based on $T_n$ in the interior of $\mathcal{C}$. However, in finite samples the safe test may incur some power loss near the boundary of $\mathcal{C}$. This loss typically depends on $\gamma$ and the sample size $n$. It is also clear that for finite samples the possibility of Type III errors while using $T_n^{\mathrm{SAFE}}$ persists, but to a lesser degree and in a smaller subset of the parameter space compared to $T_n$.

Next, suppose that $\mathcal{C}$ is a polyhedral cone of the form $\mathcal{C} = \{\boldsymbol{\theta} \in \mathbb{R}^m : \boldsymbol{R\theta} \ge \boldsymbol{0}\}$ for some $p \times m$ restriction matrix $\boldsymbol{R}$. It follows that testing (1) using $\boldsymbol{S}_n$ is equivalent to testing $H_0: \boldsymbol{\eta} = \boldsymbol{0}$ versus $H_1: \boldsymbol{\eta} \in \mathbb{R}^p_+ \setminus \{\boldsymbol{0}\}$ using $\boldsymbol{W}_n$, where $\sqrt{n}\,(\boldsymbol{W}_n - \boldsymbol{\eta}) \Rightarrow N(\boldsymbol{0}, \boldsymbol{\Psi})$, $\boldsymbol{\eta} = \boldsymbol{R\theta}$ and $\boldsymbol{\Psi} = \boldsymbol{R\Sigma R}^T$. The corresponding auxiliary hypotheses are $H'_0: \boldsymbol{\eta} \in \mathbb{R}^p_+$ versus $H'_1: \boldsymbol{\eta} \notin \mathbb{R}^p_+$. The DTs for these systems are

$$T_n = \|\Pi_{\boldsymbol{\Psi}}(\boldsymbol{W} \mid \mathbb{R}^p_+)\|^2_{\boldsymbol{\Psi}} + o_p(1), \qquad T'_n = \|\Pi_{\boldsymbol{\Psi}}(\boldsymbol{W} \mid (\mathbb{R}^p_+)^\circ_{\boldsymbol{\Psi}})\|^2_{\boldsymbol{\Psi}} + o_p(1),$$

where $\boldsymbol{W}$ is a $N(\boldsymbol{0}, \boldsymbol{\Psi})$ RV. Now, by Lemma 3.13.6 in Silvapulle and Sen (2005), see also Raubertas et al. (1986), we can deduce that for any $c_1, c_2 \ge 0$ we have

$$P_{\boldsymbol{0}}(T_n \ge c_1, T'_n < c_2) = \sum_{j=0}^{p} w_j(p, \boldsymbol{\Psi}, \mathbb{R}^p_+)\, P(\chi_j^2 \ge c_1)\, P(\chi_{p-j}^2 < c_2) + o_p(1), \tag{16}$$

where $w_j = w_j(p, \boldsymbol{\Psi}, \mathbb{R}^p_+)$, $j = 0, \ldots, p$, are nonnegative weights that sum to unity.
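As a rough illustration of how (16) can be turned into critical values, the Python sketch below treats the special case $p = 2$ with $\boldsymbol{\Psi} = \boldsymbol{I}$, for which the weights are $(w_0, w_1, w_2) = (1/4, 1/2, 1/4)$. These exact weights stand in for weights that would otherwise have to be approximated, the $o_p(1)$ term is ignored, and all function names are hypothetical. The chi-square tail probabilities for 0, 1 and 2 degrees of freedom are available in closed form, so only the standard library is needed; the equations are solved by bisection.

```python
import math


def sf_chi2(c, df):
    """P(chi2_df >= c) for df in {0, 1, 2}; chi2_0 is a point mass at 0."""
    if df == 0:
        return 1.0 if c <= 0 else 0.0
    if df == 1:
        return math.erfc(math.sqrt(c / 2.0))
    if df == 2:
        return math.exp(-c / 2.0)
    raise ValueError("only df in {0, 1, 2} is supported in this sketch")


def joint_prob(c1, c2, weights):
    """Leading term of (16): sum_j w_j P(chi2_j >= c1) P(chi2_{p-j} < c2)."""
    p = len(weights) - 1
    return sum(w * sf_chi2(c1, j) * (1.0 - sf_chi2(c2, p - j))
               for j, w in enumerate(weights))


def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


weights = (0.25, 0.5, 0.25)   # exact weights for p = 2 and Psi = I
alpha, gamma = 0.05, 0.10

# Certificate: set c1 = 0 in (16) and solve 1 - gamma = P(T' < c).
c_gamma = bisect(lambda c: joint_prob(0.0, c, weights) - (1 - gamma), 1e-12, 50.0)
# Safe test: solve alpha = P(T >= c, T' < c_gamma).
c_safe = bisect(lambda c: joint_prob(c, c_gamma, weights) - alpha, 1e-12, 50.0)
# Ordinary DT: solve alpha = P(T >= c), i.e., let c2 tend to infinity.
c_alpha = bisect(lambda c: joint_prob(c, math.inf, weights) - alpha, 1e-12, 50.0)

print(round(c_gamma, 3), round(c_safe, 3), round(c_alpha, 3))
```

Because the joint probability in (16) never exceeds the marginal tail of $T_n$, the safe critical value obtained this way cannot exceed the ordinary one.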
Furthermore, (16) implies that $T_n^{\mathrm{SAFE}} \Rightarrow T^{\mathrm{SAFE}}$, where for any $t^{\mathrm{SAFE}} \ge 0$ and fixed $c'_\gamma$ the tail of $T^{\mathrm{SAFE}}$ is given by

$$P_{\boldsymbol{0}}(T^{\mathrm{SAFE}} \ge t^{\mathrm{SAFE}}) = P_{\boldsymbol{0}}(T \ge t^{\mathrm{SAFE}}, \ T' < c'_\gamma)$$

where $T$ and $T'$ are, respectively, the distributional limits of $T_n$ and $T'_n$.

Evaluating p-values, finding critical values and related inferential tasks require accurately approximating the unknown quantities in (16). It is well known, see Lemma 3.13.7 in Silvapulle and Sen (2005), that $w_j$, where $j = 0, \ldots, p$, is the probability that the RV $\boldsymbol{W}$ is projected onto a face of dimension $j$ of the cone $\mathbb{R}^p_+$. It follows that by generating a large sample $\hat{\boldsymbol{W}}_1, \ldots, \hat{\boldsymbol{W}}_N$ from $N(\boldsymbol{0}, \boldsymbol{\Psi}_n)$, where $\boldsymbol{\Psi}_n = \boldsymbol{R\Sigma}_n\boldsymbol{R}^T$, we can estimate $w_j$ by

$$\hat{w}_j = N^{-1} \sum_{k=1}^{N} \mathbb{I}\{\Pi(\hat{\boldsymbol{W}}_k \mid \mathbb{R}^p_+) \in \mathcal{F}_j\},$$

i.e., the proportion of times the projection is in $\mathcal{F}_j$, the collection of faces of dimension $j$. Clearly, $\hat{w}_j \xrightarrow{p} w_j$ for all $j$ as $n \to \infty$ and $N \to \infty$. Given the simulated weights $\hat{w}_0, \ldots, \hat{w}_p$ we can find the rejection region of $T_n^{\mathrm{SAFE}}$. First, fix $\gamma$ and set $c_1 = 0$ in (16). Solve the resulting equation, i.e.,

$$1 - \gamma = \sum_{j=0}^{p} \hat{w}_j\, P(\chi_{p-j}^2 \le c'_\gamma),$$

and denote the solution by $\hat{c}'_\gamma$. Next, for any fixed $\alpha$ solve the equation

$$\alpha = \sum_{j=0}^{p} \hat{w}_j\, P(\chi_j^2 > c_\alpha^{\mathrm{SAFE}})\, P(\chi_{p-j}^2 \le \hat{c}'_\gamma)$$

and denote the solution by $\hat{c}_\alpha^{\mathrm{SAFE}}$. Note that both equations can be readily solved by the bisection method. Plugging $\hat{c}_\alpha^{\mathrm{SAFE}}$ and $\hat{c}'_\gamma$ into (12) we obtain an approximate $\alpha$-level rejection region for $T_n^{\mathrm{SAFE}}$. We have just shown that, given $\gamma$, it is always possible to adjust the level of $T_n$ so that $T_n^{\mathrm{SAFE}}$ has any prechosen level in $(0, 1)$. Using a similar procedure it is always possible to adjust $\gamma$ so that $T_n^{\mathrm{SAFE}}$ has level $\alpha^{\mathrm{SAFE}} \in (0, \alpha)$.

4 Numerical results

In this section the performance of the safe test is compared with that of the DT in several experimental settings.
In addition we provide an analysis of two well-known examples from the literature.

4.1 Simulations

We conducted a simulation study to evaluate the performance of the proposed safe test. Specifically, we test (7) by generating samples of size $n$ from $N_2(\boldsymbol{\theta}, \boldsymbol{I})$. Seven possible experimental settings for the value of $\boldsymbol{\theta}$, listed in Table 2, were considered. These include a null value $\boldsymbol{\theta}_0 = (0, 0)^T$ along with six non-null values, all located on a circle with radius $\|\boldsymbol{\theta}_i\|_2 = 3/4$ for $i = 1, \ldots, 6$. For each $i$, we report $\measuredangle(\boldsymbol{e}_1, \boldsymbol{\theta}_i)$, the angle between $\boldsymbol{\theta}_i$ and $\boldsymbol{e}_1 = (1, 0)$, the positive horizontal axis. Observe that $\boldsymbol{\theta}_i \in \mathcal{C}$ for $i \in \{1, 2, 3\}$ and $\boldsymbol{\theta}_i \notin \mathcal{C}$ for $i \in \{4, 5, 6\}$.

Table 2: Experimental settings for the mean value $\boldsymbol{\theta}$ used in the simulation study.

| Mean | $\boldsymbol{\theta}_0$ | $\boldsymbol{\theta}_1$ | $\boldsymbol{\theta}_2$ | $\boldsymbol{\theta}_3$ | $\boldsymbol{\theta}_4$ | $\boldsymbol{\theta}_5$ | $\boldsymbol{\theta}_6$ |
|---|---|---|---|---|---|---|---|
| Angle | − | 45° | 15° | 0° | −15° | −45° | −60° |
| Region | $\mathcal{L}$ | $\mathcal{C}$ | $\mathcal{C}$ | $\mathcal{C}$ | $\mathcal{C}^c$ | $\mathcal{C}^c$ | $\mathcal{C}^c$ |

Table 3 reports on the power of $T_n$ and $T_n^{\mathrm{SAFE}}$ in the aforementioned experimental settings assuming $\alpha = \alpha^{\mathrm{SAFE}} = 0.05$, $n \in \{10, 20, 50\}$ and $\gamma \in \{0.1, 0.05, 0.01\}$. The test $T_n$ is applied as usual, while in order to use $T_n^{\mathrm{SAFE}}$ we first find $c_\alpha^{\mathrm{SAFE}}$ as explained earlier. The power is calculated using $10^6$ simulation runs and rounded to the third significant digit. The first block of results is obtained at the null $\boldsymbol{\theta}_0 = (0, 0)^T$; these show that the actual level of $T_n^{\mathrm{SAFE}}$ agrees with its nominal level for all values of $\gamma$. At $\boldsymbol{\theta}_1$, located in the interior of $\mathbb{R}^2_+$, there is no difference between the power of $T_n$ and $T_n^{\mathrm{SAFE}}$, whereas at $\boldsymbol{\theta}_2$ the powers of $T_n$ and $T_n^{\mathrm{SAFE}}$ are very similar. At $\boldsymbol{\theta}_3$, which lies on the boundary of $\mathbb{R}^2_+$, the power of $T_n$ is slightly higher than the power of $T_n^{\mathrm{SAFE}}$; this difference is insignificant when $\gamma$ is small.
When $i \in \{4, 5, 6\}$ we have $\boldsymbol{\theta}_i \notin \mathcal{C}$ and the DT commits a Type III error with high probability. In fact, as $n$ grows, so does the likelihood of a Type III error. On the other hand, the probability of a Type III error drops precipitously when $T_n^{\mathrm{SAFE}}$ is used, especially for large $n$ and values of $\boldsymbol{\theta}$ which are far from the alternative.

Table 3: Comparing the power of the DT and Safe test

| Mean | $\gamma$ | $T_n$ ($n = 10$) | $T_n^{\mathrm{SAFE}}$ ($n = 10$) | $T_n$ ($n = 20$) | $T_n^{\mathrm{SAFE}}$ ($n = 20$) | $T_n$ ($n = 50$) | $T_n^{\mathrm{SAFE}}$ ($n = 50$) |
|---|---|---|---|---|---|---|---|
| $\boldsymbol{\theta}_0$ | 0.1 | 0.050 | 0.048 | 0.050 | 0.049 | 0.050 | 0.048 |
| | 0.05 | 0.050 | 0.049 | 0.050 | 0.049 | 0.050 | 0.049 |
| | 0.01 | 0.050 | 0.050 | 0.050 | 0.050 | 0.050 | 0.050 |
| $\boldsymbol{\theta}_1$ | 0.1 | 0.705 | 0.705 | 0.931 | 0.931 | 1.000 | 1.000 |
| | 0.05 | 0.706 | 0.706 | 0.932 | 0.932 | 1.000 | 1.000 |
| | 0.01 | 0.706 | 0.706 | 0.932 | 0.932 | 1.000 | 1.000 |
| $\boldsymbol{\theta}_2$ | 0.1 | 0.693 | 0.687 | 0.928 | 0.924 | 1.000 | 0.999 |
| | 0.05 | 0.693 | 0.691 | 0.928 | 0.927 | 1.000 | 0.999 |
| | 0.01 | 0.693 | 0.693 | 0.928 | 0.928 | 1.000 | 1.000 |
| $\boldsymbol{\theta}_3$ | 0.1 | 0.666 | 0.640 | 0.917 | 0.879 | 1.000 | 0.957 |
| | 0.05 | 0.665 | 0.652 | 0.917 | 0.899 | 1.000 | 0.980 |
| | 0.01 | 0.666 | 0.664 | 0.917 | 0.914 | 1.000 | 0.996 |
| $\boldsymbol{\theta}_4$ | 0.1 | 0.608 | 0.528 | 0.885 | 0.711 | 0.999 | 0.635 |
| | 0.05 | 0.609 | 0.565 | 0.885 | 0.782 | 0.999 | 0.752 |
| | 0.01 | 0.609 | 0.598 | 0.885 | 0.856 | 0.999 | 0.907 |
| $\boldsymbol{\theta}_5$ | 0.1 | 0.353 | 0.182 | 0.624 | 0.160 | 0.955 | 0.020 |
| | 0.05 | 0.354 | 0.230 | 0.623 | 0.234 | 0.955 | 0.043 |
| | 0.01 | 0.354 | 0.299 | 0.624 | 0.392 | 0.955 | 0.140 |
| $\boldsymbol{\theta}_6$ | 0.1 | 0.193 | 0.071 | 0.352 | 0.041 | 0.724 | 0.002 |
| | 0.05 | 0.192 | 0.096 | 0.352 | 0.070 | 0.724 | 0.004 |
| | 0.01 | 0.192 | 0.143 | 0.352 | 0.147 | 0.724 | 0.021 |

These results corroborate our theoretical findings, demonstrating that the safe test is effective in limiting the occurrence of Type III errors even with small sample sizes, while simultaneously maintaining power comparable to the standard DT, even when the true value of the parameter is close to the boundary.

4.2 Illustrative examples

In this section two well-known examples are examined from the perspective of safe testing.

4.2.1 Testing for positivity

We start by demonstrating that the region in the parameter space on which Type III errors occur depends on the variance matrix $\boldsymbol{\Sigma}$. Recall that the well-known Hotelling $T^2$ test (Bilodeau and Brenner, 1999) rejects the null $H_0: \boldsymbol{\theta} = \boldsymbol{0}$ when $n \boldsymbol{S}_n^T \boldsymbol{\Sigma}_n^{-1} \boldsymbol{S}_n$ is large, i.e., whenever $\|\boldsymbol{S}_n\|^2_{\boldsymbol{\Sigma}_n}$ is large. The latter norm also plays a role in the DT (4). We demonstrate the subtle and surprising effect of $\boldsymbol{\Sigma}$ (or $\boldsymbol{\Sigma}_n$) on the DT. Recall that the DT for testing (7) when $m = 2$ is consistent provided

$$\boldsymbol{\theta} \notin (\mathbb{R}^2_+)^\circ_{\boldsymbol{\Sigma}} = \{\boldsymbol{\theta} \in \mathbb{R}^2 : \boldsymbol{\theta}^T \boldsymbol{\Sigma}^{-1} \boldsymbol{v} \le 0, \ \forall \boldsymbol{v} \in \mathbb{R}^2_+\} = \{\boldsymbol{\theta} \in \mathbb{R}^2 : \boldsymbol{\theta}^T \boldsymbol{\Sigma}^{-1} \boldsymbol{v} \le 0, \ \boldsymbol{v} \in \{\boldsymbol{e}_1, \boldsymbol{e}_2\}\}$$

where $\boldsymbol{e}_1 = (1, 0)^T$ and $\boldsymbol{e}_2 = (0, 1)^T$. Suppose that $\boldsymbol{\Sigma}$ has an intraclass correlation structure of the form

$$\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \tag{17}$$

for some $|\rho| < 1$. It is easily verified that $(\mathbb{R}^2_+)^\circ_{\boldsymbol{\Sigma}} = \{\boldsymbol{\theta} \in \mathbb{R}^2 : \theta_1 - \rho\theta_2 \le 0, \ \theta_2 - \rho\theta_1 \le 0\}$. Figure 2 plots $(\mathbb{R}^2_+)^\circ_{\boldsymbol{\Sigma}}$ for $\rho \in \{+1/4, -1/4\}$.

Figure 2: Plots of the cones $(\mathbb{R}^2_+)^\circ_{\boldsymbol{\Sigma}}$, i.e., the regions in which the DT is not consistent, in the $(\theta_1, \theta_2)$ plane: (a) $\rho = +1/4$; (b) $\rho = -1/4$. The cones, in blue, are bounded by their extreme rays, which extend indefinitely.
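The two inequalities describing $(\mathbb{R}^2_+)^\circ_{\boldsymbol{\Sigma}}$ under the correlation structure (17) are easy to evaluate directly. The short Python sketch below (the function name is hypothetical) checks whether a given $\boldsymbol{\theta}$ lies in this polar cone, i.e., in the region where the DT for (7) is not consistent; the point $(-1, -0.1)^T$ is an arbitrary illustrative choice.

```python
def in_polar_cone(theta, rho):
    """Membership in the polar cone of the nonnegative quadrant when Sigma
    has unit variances and correlation rho, as in (17):
    theta_1 - rho * theta_2 <= 0 and theta_2 - rho * theta_1 <= 0."""
    t1, t2 = theta
    return (t1 - rho * t2 <= 0) and (t2 - rho * t1 <= 0)


theta = (-1.0, -0.1)          # both coordinates negative
for rho in (0.25, 0.0, -0.25):
    print(rho, in_polar_cone(theta, rho))
```

For this point the membership changes with the sign of $\rho$: with $\rho = +1/4$ the point lies outside the polar cone, so the DT is consistent there although both coordinates are negative, whereas with $\rho = 0$ or $\rho = -1/4$ it lies inside.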
It is clear that if $\rho \ge 0$ then $(\mathbb{R}^2_+)^\circ_{\boldsymbol{\Sigma}} \subseteq \mathbb{R}^2_-$, and $(\mathbb{R}^2_+)^\circ_{\boldsymbol{\Sigma}} \supseteq \mathbb{R}^2_-$ when $\rho \le 0$; if $\rho = 0$ then $(\mathbb{R}^2_+)^\circ_{\boldsymbol{\Sigma}} = \mathbb{R}^2_-$. Thus, the rejection region depends on the value of the correlation coefficient. Further observe that a Type III error may occur whenever $\boldsymbol{\theta} \in (Q_2 \cup Q_3 \cup Q_4) \setminus (\mathbb{R}^2_+)^\circ_{\boldsymbol{\Sigma}}$, where $Q_i$ is the $i$th quadrant of $\mathbb{R}^2$. It follows that the possibility of Type III errors increases when $\rho = +1/4$ compared to when $\rho = -1/4$, as the dependence structure alters the geometry of the rejection region. It is also clear from Figure 2 that when $\rho = +1/4$, it is possible to reject the null even when both components of $\bar{\boldsymbol{X}}_n$ are negative; this is not possible when $\rho = -1/4$. Thus, the correlation structure strongly influences the consistency of tests, and therefore also the possibility of Type III errors. Moreover, the power function, as a function of $\rho$, exhibits sharp phase transitions from regions where the test is consistent to regions in which the test does not have any power; a phenomenon not observed with unconstrained tests and which has not been fully recognized to date. It is also clear that the essence of this example holds in higher dimensions and for more complexly structured variance matrices.

The preceding analysis is the key to understanding Silvapulle's (1997) paper entitled "A curious example involving the LRT for one sided hypotheses", which sparked a lively debate, cf. Perlman and Wu (1999). In that paper the hypotheses (7) were tested assuming $\boldsymbol{X}_1, \ldots, \boldsymbol{X}_5$ are IID $N_2(\boldsymbol{\theta}, \boldsymbol{\Sigma})$, where $\bar{\boldsymbol{X}}_5 = (-3, -2)^T$ and $\boldsymbol{\Sigma}$ is of the form (17) with $\rho = 0.9$. He observed that the LRT rejects the null hypothesis, whereas the individual one-sided tests, i.e., $H_0^{(i)}: \theta_i = 0$ versus
Silvapulle argued that these two diametrically opposed conclusions, reached using different testing procedures, may appear counterintuitive, but are not logically inconsistent. Nevertheless, he emphasized that this example serves as a caution against the indiscriminate application of one-sided tests. We reanalyze this example from the perspective of safe testing. In fact, in this example $T_n = 12.89$ and $c_\alpha$ solves

$$\alpha = \tfrac{1}{2} P(\chi^2_1 \ge c_\alpha) + \tfrac{1}{2}\left(1 - \cos^{-1}(\rho)/\pi\right) P(\chi^2_2 \ge c_\alpha),$$

which for $\alpha = 0.05$ equals $4.915$. It follows that $T_n > c_\alpha$ and the associated p-value $\alpha^*$ is smaller than $0.001$. It is also easy to check that $T'_n > c'_\gamma$ for $\gamma \in \{0.1, 0.05, 0.01\}$. In fact, the p-value associated with the auxiliary hypotheses is highly significant, i.e., $\gamma^* < 10^{-6}$. Thus, using the safe test, $T^{\mathrm{SAFE}}_n$, would not lead to the rejection of the null hypothesis but rather to a reevaluation of the original assumptions on the possible values of the parameter $\boldsymbol{\theta}$. Further note that since $\rho = 9/10 > 0$ it is possible, as stated earlier, cf. Figure 2, to reject the null using $T^{\mathrm{SAFE}}_n$ even when both entries of $\bar{\boldsymbol{X}}_5$ are negative. However, the distance of $\bar{\boldsymbol{X}}_5$ from the null also plays an important role. In particular, $\bar{\boldsymbol{X}}_5$ is sufficiently separated from the null, i.e., $(-3, -2)^T \notin \mathcal{A}_5(\boldsymbol{0}, \mathbb{R}^2_+, \gamma)$ for all $\gamma \ge 10^{-5}$. We conclude that the inconsistency pointed out by Silvapulle's paper is easily understood and resolved by using the framework of safe testing.

4.2.2 Testing for stochastic order

Cohen and Sackrowitz (1998) presented two tables comparing trinomial distributions with fixed marginals. Their data, appearing in Tables 5 and 6 in their paper, are displayed in Table 4 below.

Table 4: Tables 5 and 6 of Cohen and Sackrowitz (1998). Both sub-tables compare two trinomial distributions, one for the control group, the other for the treatment group.
Table 5 of C & S:
             Worse   Same   Better   Total
Control        5      11       1       17
Treatment      3       8       4       15
Total          8      19       5

Table 6 of C & S:
             Worse   Same   Better   Total
Control        0      16       1       17
Treatment      8       3       4       15
Total          8      19       5

The objective was to test whether the outcome distributions are ordered by treatment. Let $P = (p_1, p_2, p_3)$ and $Q = (q_1, q_2, q_3)$ denote the trinomial distributions of the control and treatment groups, respectively. The stochastic order $P \preceq_{st} Q$, see Shaked and Shanthikumar (2007), holds provided $p_1 \ge q_1$ and $p_1 + p_2 \ge q_1 + q_2$. It follows that testing

$$H_0: P =_{st} Q \quad \text{versus} \quad H_1: P \prec_{st} Q \qquad (18)$$

is equivalent to testing $H_0: \boldsymbol{\eta} = \boldsymbol{0}$ versus $H_1: \boldsymbol{\eta} \in \mathbb{R}^2_+ \setminus \{\boldsymbol{0}\}$ where $\boldsymbol{\eta} = \boldsymbol{R\theta}$ with $\boldsymbol{\theta} = (p_1, p_1 + p_2, q_1, q_1 + q_2)^T$ and

$$\boldsymbol{R} = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix}.$$

Let $\boldsymbol{S}_n$ denote the MLE of $\boldsymbol{\theta}$. By the central limit theorem $\sqrt{n}(\boldsymbol{S}_n - \boldsymbol{\theta}) \Rightarrow N_4(\boldsymbol{0}, \boldsymbol{\Sigma})$ where $\boldsymbol{\Sigma} = \mathrm{BlockDiag}(\rho_1 \boldsymbol{\Sigma}_1, \rho_2 \boldsymbol{\Sigma}_2)$ is a block diagonal matrix and $\rho_i = \lim(n/n_i)$ for $i = 1, 2$. Under the null $P = Q$, so $\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2$ and therefore $\boldsymbol{\Sigma}$ can be estimated by $\boldsymbol{\Sigma}_n = \mathrm{BlockDiag}\big((n/n_1)\widehat{\boldsymbol{\Sigma}}_0, (n/n_2)\widehat{\boldsymbol{\Sigma}}_0\big)$ where

$$\widehat{\boldsymbol{\Sigma}}_0 = \begin{pmatrix} \frac{n_{11}+n_{21}}{n_1+n_2}\left(1-\frac{n_{11}+n_{21}}{n_1+n_2}\right) & \frac{n_{11}+n_{21}}{n_1+n_2}\,\frac{n_{13}+n_{23}}{n_1+n_2} \\ \frac{n_{11}+n_{21}}{n_1+n_2}\,\frac{n_{13}+n_{23}}{n_1+n_2} & \frac{n_{13}+n_{23}}{n_1+n_2}\left(1-\frac{n_{13}+n_{23}}{n_1+n_2}\right) \end{pmatrix}.$$

Now $\boldsymbol{\eta} = \boldsymbol{R\theta}$ can be estimated by $\boldsymbol{W}_n = \boldsymbol{R S}_n$ and we find that

$$\boldsymbol{W}_n^{[\mathrm{T5}]} = \begin{pmatrix} 0.09 \\ 0.21 \end{pmatrix}, \quad \boldsymbol{W}_n^{[\mathrm{T6}]} = \begin{pmatrix} -0.53 \\ 0.21 \end{pmatrix} \quad \text{and} \quad \boldsymbol{V}_n = \boldsymbol{R}\boldsymbol{\Sigma}_n\boldsymbol{R}^T = \begin{pmatrix} 0.75 & 0.16 \\ 0.16 & 0.53 \end{pmatrix}.$$

Here $\boldsymbol{W}_n^{[\mathrm{T5}]}$ and $\boldsymbol{W}_n^{[\mathrm{T6}]}$ are the values of $\boldsymbol{W}_n$ calculated from their Tables 5 and 6, respectively. Since the margins of both tables are equal, the estimated variance under the null, i.e., $\boldsymbol{V}_n$, is the same in both settings. The p-values associated with their Table 5 are $(\alpha^*, \gamma^*) = (0.12, 0.96)$.
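The contrasts and their estimated variance can be recomputed directly from the cell counts in Table 4. The following is a minimal sketch, assuming the definitions of θ, R and the pooled variance estimate given above; the function name `contrast_and_variance` is hypothetical, and the sketch stops short of the p-values, whose exact computation is not spelled out here.

```python
def contrast_and_variance(control, treatment):
    """Return (W_n, V_n) for two trinomial samples given as counts
    (worse, same, better)."""
    n1, n2 = sum(control), sum(treatment)
    n = n1 + n2

    # MLEs of (p1, p1 + p2) and (q1, q1 + q2).
    p1, p12 = control[0] / n1, (control[0] + control[1]) / n1
    q1, q12 = treatment[0] / n2, (treatment[0] + treatment[1]) / n2
    w = (p1 - q1, p12 - q12)  # W_n = R S_n

    # Pooled estimates of the first and third cell probabilities under the null.
    pi1 = (control[0] + treatment[0]) / n
    pi3 = (control[2] + treatment[2]) / n
    sigma0 = [[pi1 * (1 - pi1), pi1 * pi3],
              [pi1 * pi3, pi3 * (1 - pi3)]]

    # With R = [I, -I] and a block diagonal Sigma_n,
    # V_n = R Sigma_n R^T = (n/n1 + n/n2) * Sigma0_hat.
    scale = n / n1 + n / n2
    v = [[scale * entry for entry in row] for row in sigma0]
    return w, v


w5, v5 = contrast_and_variance((5, 11, 1), (3, 8, 4))   # Table 5 of C & S
w6, v6 = contrast_and_variance((0, 16, 1), (8, 3, 4))   # Table 6 of C & S

print("W_n [T5]:", [round(x, 2) for x in w5])
print("W_n [T6]:", [round(x, 2) for x in w6])
print("V_n:", [[round(x, 2) for x in row] for row in v5])
```

Because both tables share the same margins, the two calls return the same variance matrix, in line with the remark above.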
This finding indicates that the null hypothesis in (18) cannot be rejected in favor of the alternative in (18). A p-value of $0.12$ does indicate that there is some evidence, albeit weak, that $P \prec_{st} Q$. The p-value associated with the auxiliary hypothesis, i.e., with $H_0: P \preceq_{st} Q$ versus $H_1: P \nprec_{st} Q$, is $0.96$, indicating that the test is safe. Thus, it seems that their Table 5 suggests an ordering but is under-powered. In fact, if we (artificially) double the number of observations in each cell in their Table 5 then we obtain the p-values $(\alpha^*, \gamma^*) = (0.03, 0.96)$, indicating that the original null can be safely rejected. The p-values associated with their Table 6 are $(\alpha^*, \gamma^*) = (0.01, 0.001)$. These p-values indicate that both the original null and the auxiliary null are rejected. In other words, using $T_n$ would lead to the rejection of the original null, and very likely to a Type III error, whereas $T^{\mathrm{SAFE}}_n$ would protect against such a potentially erroneous conclusion. It is worth mentioning that Cohen and Sackrowitz (1998) write that "some statisticians would be more inclined to assert stochastic order for Table 5 than for Table 6", a statement with which we fully agree and for which we now have a well-founded rigorous explanation.

5 Summary and discussion

This paper develops a methodology for constructing safe tests with a focus on Type A Problems in ORI. We believe that by alleviating the problem of Type III errors, the proposed methodology addresses some of the principled objections to the use of ORI. We hope that this advance will allow statisticians and researchers working in a variety of application areas to capitalize on the benefits of ORI without the fear of systematic errors. The new testing procedure combines two base-tests: $T_n$ for the original system of hypotheses, i.e., (1), and $T'_n$ for the auxiliary hypotheses (2).
The proposed approach can also be described as a two-step procedure, as indicated in Table 1. In Step One, the null hypothesis in (2) is tested at a predetermined level $\gamma$. If the null is rejected, the procedure terminates, and one concludes that neither the original null nor the alternative hypotheses are supported by the data and that both are therefore implausible. In such cases, researchers are advised to reevaluate the available evidence, reconsider their modeling assumptions, and collect additional data. If the auxiliary null is not rejected, a certificate of validity is issued, and it is appropriate to proceed to Step Two, in which the hypotheses in (1) are tested at the significance level $\alpha$, controlling the overall probability of a Type I error at the level $\alpha_{\mathrm{SAFE}}$.

The level of the composite safe test, denoted by $\alpha_{\mathrm{SAFE}}$, is a function of the levels of the base tests, $\alpha$ and $\gamma$. For any fixed value of $\gamma$ one can easily adjust $\alpha$ so that $\alpha_{\mathrm{SAFE}}$ achieves any prescribed value in $(0, 1)$. The value of $\gamma$ is immaterial in large samples provided $c'_\gamma/\sqrt{n} \to 0$ when $n \to \infty$. In finite samples, the value of $\gamma$ modulates the relationship between the power of the test and its Type III error rate, especially for values of $\boldsymbol{\theta}$ near the boundary of the alternative. Since in Type A Problems $\mathcal{L} \subset \mathcal{C}$, it seems that choosing $\gamma < \alpha$ is coherent, i.e., from a purely logical perspective rejecting the auxiliary hypotheses should require stronger evidence than rejecting the original null. Nevertheless, we have also experimented with values $\gamma > \alpha$. These are sensible in situations where Type III errors may have a profound negative impact or the experimenter may have low confidence in the original formulation of the testing problem. We believe that an optimal choice of $\gamma$ can be made by formally balancing the power with the possibility of Type III errors. This is an open problem.
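The two-step procedure described in this section can be sketched in terms of p-values. This is only an illustration: the function name `safe_test_decision` is hypothetical, and the default levels α = γ = 0.05 are assumptions made for the example, not recommendations from the paper.

```python
def safe_test_decision(alpha_star, gamma_star, alpha=0.05, gamma=0.05):
    """Two-step safe test expressed through p-values.

    alpha_star: p-value of the base test for the original hypotheses.
    gamma_star: p-value of the base test for the auxiliary hypotheses.
    """
    # Step One: pre-test of the auxiliary null at level gamma.
    if gamma_star <= gamma:
        return "no certificate: reassess the original hypotheses"
    # Step Two: certificate issued, test the original null at level alpha.
    if alpha_star <= alpha:
        return "certificate issued: reject the original null"
    return "certificate issued: do not reject the original null"


# p-values reported in Section 4.2.2.
print(safe_test_decision(0.12, 0.96))    # Table 5 of Cohen and Sackrowitz
print(safe_test_decision(0.03, 0.96))    # Table 5 with doubled cell counts
print(safe_test_decision(0.01, 0.001))   # Table 6 of Cohen and Sackrowitz
```

With these illustrative levels, the Table 6 data stop at Step One, which is the protection against a Type III error discussed above.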
The developments in this paper can be generalized in many directions. For example, safe tests can be developed for infinite dimensional hypotheses of the form $H_0: F(x) = G(x)$ versus $H_1: F(x) \gneqq G(x)$, where the symbol $\gneqq$ indicates that a weak inequality holds for all $x \in \mathbb{R}$ and a strict inequality holds for some $x \in \mathbb{R}$. These hypotheses can be tested by employing the ordinal dominance curve (Davidov and Herman, 2012), or other reasonable alternatives. The auxiliary hypotheses are $H'_0: F(x) \ge G(x)$ versus $H'_1: \exists x \in \mathbb{R}$ such that $G(x) > F(x)$, for which an ordinal dominance curve based test can also be developed. It is clear that a safe test is possible; its structure and properties require further study. The framework developed here can be extended to construct safe tests for general hypotheses of the form $H_0: \boldsymbol{\theta} \in \Theta_0$ versus $H_1: \boldsymbol{\theta} \in \Theta_1$ where $\Theta_0$ and $\Theta_1$ are arbitrary subsets of $\Theta$. An interesting direction involves applying safe tests to Neyman-Pearson two-point hypotheses. Initial results suggest that the resulting procedures exhibit several appealing properties.

In summary, while order restrictions can substantially improve statistical efficiency and interpretability, these gains hinge on the validity of the assumed order. If the constraints are incorrect, inference may be seriously distorted. Consequently, there is a clear need for procedures that adaptively determine whether the data support an ordering and impose constraints only when warranted. This manuscript constitutes an initial contribution toward the development of such an adaptive methodology in ORI. Here hypothesis testing is addressed; extensions to constrained estimation, classification, and prediction will form the basis of future work.

Acknowledgments

The research of Ori Davidov was supported in part by the Israeli Science Foundation Grant No. 2200/22.

References

[1] Andrews DW (2000).
Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica, 1: 399–405.
[2] Barlow RE, Bartholomew DJ, Bremner JM, Brunk HD (1972). Statistical Inference Under Order Restrictions: The Theory and Application of Isotonic Regression. Wiley.
[3] Bilodeau M, Brenner D (1999). Theory of Multivariate Statistics. Springer.
[4] Casella G, Berger R (2024). Statistical Inference. CRC Press.
[5] Cohen A, Sackrowitz HB (1998). Directional tests for one-sided alternatives in multivariate models. The Annals of Statistics, 26: 2321–2338.
[6] Cohen A, Sackrowitz HB (2004). A discussion of some inference issues in order restricted models. Canadian Journal of Statistics, 32: 199–205.
[7] Davidov O, Herman A (2012). Ordinal dominance curve based inference for stochastically ordered distributions. Journal of the Royal Statistical Society, Series B, 74: 825–847.
[8] Finner H (1999). Stepwise multiple test procedures and control of directional errors. The Annals of Statistics, 27: 274–289.
[9] Ghosh S, Davidov O (2024). Some novel limiting distributions arising in order restricted inference. Statistics and Applications, 22: 45–66.
[10] Grandhi A, Guo W, Peddada SD (2016). A multiple testing procedure for multi-dimensional pairwise comparisons with application to gene expression studies. BMC Bioinformatics, 17: 1–12.
[11] Guo W, Sarkar SK, Peddada SD (2010). Controlling false discoveries in multidimensional directional decisions, with applications to gene expression data on ordered categories. Biometrics, 66: 485–492.
[12] Hwang JG, Peddada SD (1994). Confidence interval estimation subject to order restrictions. The Annals of Statistics, 22: 67–93.
[13] Kaiser HF (1960). Directional statistical decisions. Psychological Review, 67: 160.
[14] Lehmann EL, Romano JP (2005). Testing Statistical Hypotheses. Springer: New York.
[15] Lin H, Peddada SD (2024).
Multigroup analysis of compositions of microbiomes with covariate adjustments and repeated measures. Nature Methods, 21: 83–91.
[16] Mayo DG, Spanos A (2011). Error statistics. In Philosophy of Statistics (pp. 153–198). North-Holland.
[17] Oleckno WA (2008). Epidemiology: Concepts and Methods. Waveland Press.
[18] Perlman MD, Wu L (1999). The emperor's new tests. Statistical Science, 14: 355–369.
[19] Praestgaard J (2012). A note on the power superiority of the restricted likelihood ratio test. Journal of Multivariate Analysis, 104: 1–15.
[20] Raubertas RF, Lee CIC, Nordheim EV (1986). Hypothesis tests for normal means constrained by linear inequalities. Communications in Statistics-Theory and Methods, 15: 2809–2833.
[21] Robertson T, Wright FT, Dykstra R (1988). Order Restricted Statistical Inference. Wiley & Sons.
[22] Rosen S, Davidov O (2017). Ordered regressions. Scandinavian Journal of Statistics, 44: 817–842.
[23] Salkind NJ (2010). Encyclopedia of Research Design. Sage.
[24] Shaked M, Shanthikumar JG (2007). Stochastic Orders. Springer. New York.
[25] Silvapulle MJ (1997). A curious example involving the likelihood ratio test against one-sided hypotheses. The American Statistician, 51: 178–180.
[26] Silvapulle MJ, Sen PK (2005). Constrained Statistical Inference. John Wiley & Sons.
[27] Singh SP, Davidov O (2019). On the design of experiment with ordered treatments. Journal of the Royal Statistical Society, Series B, 81: 881–900.
[28] Singh SP, Peddada SD, Davidov O (2021). A cost effective approach to the design and analysis of multi-group experiments. Statistics and Applications, 19: 1–10.
[29] Shaffer JP (1972). Directional statistical hypotheses and comparisons among means. Psychological Bulletin, 77: 195.
[30] Shaffer JP (1990). Control of directional errors with stagewise multiple test procedures. The Annals of Statistics, 8: 1342–1347.
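[31] Shaffer JP (2002). Multiplicity, directional (Type III) errors, and the Null Hypothesis. Psychological Methods, 7: 356.
[32] Shaked M, Shanthikumar JG (2007). Stochastic Orders. New York, NY: Springer New York.
[33] Van Eeden C (2006). Restricted parameter space estimation problems. Springer.

6 Appendix A: Proofs

Proof of Theorem 2.1:

Proof. In Type A Problems

$$\Delta = \|\boldsymbol{\theta} - \Pi_{\boldsymbol{\Sigma}}(\boldsymbol{\theta} \mid \mathcal{L})\|^2_{\boldsymbol{\Sigma}} - \|\boldsymbol{\theta} - \Pi_{\boldsymbol{\Sigma}}(\boldsymbol{\theta} \mid \mathcal{C})\|^2_{\boldsymbol{\Sigma}},$$

which by part (h) of Proposition 3.12.6 in Silvapulle and Sen (2005) reduces to

$$\Delta = \|\Pi_{\boldsymbol{\Sigma}}(\boldsymbol{\theta} \mid \mathcal{L}) - \Pi_{\boldsymbol{\Sigma}}(\boldsymbol{\theta} \mid \mathcal{C})\|^2_{\boldsymbol{\Sigma}}.$$

In addition, by part (i) of Proposition 3.12.6 of Silvapulle and Sen (2005) we may reexpress $\Delta$, given in the display above, as

$$\Delta = \|\Pi_{\boldsymbol{\Sigma}}(\boldsymbol{\theta} \mid \mathcal{L}^{\perp} \cap \mathcal{C})\|^2_{\boldsymbol{\Sigma}}.$$

It follows from Moreau's Theorem (Proposition 3.12.4 in Silvapulle and Sen, 2005) that $\Pi_{\boldsymbol{\Sigma}}(\boldsymbol{\theta} \mid \mathcal{L}^{\perp} \cap \mathcal{C}) = \boldsymbol{0}$ if and only if $\boldsymbol{\theta} \in (\mathcal{L}^{\perp} \cap \mathcal{C})^{\circ}$. Thus we conclude that in Type A Problems the DT is consistent provided $\boldsymbol{\theta} \notin (\mathcal{L}^{\perp} \cap \mathcal{C})^{\circ}$, establishing (6). Next, consider Type B Problems where $\Theta_0 = \mathcal{C}$ and $\Theta_1 = \mathbb{R}^m$, in which case $\Delta$ reduces to $\|\boldsymbol{\theta} - \Pi_{\boldsymbol{\Sigma}}(\boldsymbol{\theta} \mid \mathcal{C})\|^2_{\boldsymbol{\Sigma}}$, which is strictly positive if and only if $\boldsymbol{\theta} \notin \mathcal{C}$. This completes the proof.

Proof of Theorem 2.2:

Proof. Consider Type A Problems. By Theorem 3.7.1 in Silvapulle and Sen (2005) testing $H_0: \boldsymbol{\theta} \in \mathcal{L}$ versus $H_1: \boldsymbol{\theta} \in \mathcal{C} \setminus \mathcal{L}$ is equivalent to testing $H_0: \boldsymbol{\theta} = \boldsymbol{0}$ versus $H_1: \boldsymbol{\theta} \in \mathcal{C} \cap \mathcal{L}^{\perp}$. The DT for the latter system of hypotheses simplifies to

$$T_n = n\left\{\|\boldsymbol{S}_n\|^2_{\boldsymbol{\Sigma}_n} - \|\boldsymbol{S}_n - \Pi_{\boldsymbol{\Sigma}_n}(\boldsymbol{S}_n \mid \mathcal{C} \cap \mathcal{L}^{\perp})\|^2_{\boldsymbol{\Sigma}_n}\right\} = n\,\|\Pi_{\boldsymbol{\Sigma}_n}(\boldsymbol{S}_n \mid \mathcal{C} \cap \mathcal{L}^{\perp})\|^2_{\boldsymbol{\Sigma}_n},$$

where the second equality in the display above follows from Moreau's Theorem. Since, again by Moreau's Theorem, $\|\Pi_{\boldsymbol{\Sigma}_n}(\boldsymbol{S}_n \mid \mathcal{C} \cap \mathcal{L}^{\perp})\|_{\boldsymbol{\Sigma}_n}$ equals the distance of $\boldsymbol{S}_n$ from the polar cone $(\mathcal{C} \cap \mathcal{L}^{\perp})^{\circ}_{\boldsymbol{\Sigma}_n}$, the null is not rejected if $T_n < c_\alpha$, i.e., when $\|\boldsymbol{S}_n - (\mathcal{C} \cap \mathcal{L}^{\perp})^{\circ}_{\boldsymbol{\Sigma}_n}\|^2_{\boldsymbol{\Sigma}_n} < c_\alpha/n$.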
It is easy to verify that the latter is satisfied if and only if $\boldsymbol{S}_n \in \mathcal{A}_n(\mathcal{L}, \mathcal{C}, \alpha)$ where

$$\mathcal{A}_n(\mathcal{L}, \mathcal{C}, \alpha) = (\mathcal{C} \cap \mathcal{L}^{\perp})^{\circ}_{\boldsymbol{\Sigma}_n} \oplus \mathrm{Ball}_{\boldsymbol{\Sigma}_n}\big(\boldsymbol{0}, \sqrt{c_\alpha/n}\big) \qquad (19)$$

establishing (8) as required. Next we consider a Type B Problem in which the hypotheses $H_0: \boldsymbol{\theta} \in \mathcal{C}$ versus $H_1: \boldsymbol{\theta} \in \mathbb{R}^m \setminus \mathcal{C}$ are tested. Clearly, the DT reduces to

$$T_n = n\,\|\boldsymbol{S}_n - \Pi_{\boldsymbol{\Sigma}_n}(\boldsymbol{S}_n \mid \mathcal{C})\|^2_{\boldsymbol{\Sigma}_n} = n\,\|\Pi_{\boldsymbol{\Sigma}_n}(\boldsymbol{S}_n \mid \mathcal{C}^{\circ}_{\boldsymbol{\Sigma}_n})\|^2_{\boldsymbol{\Sigma}_n}.$$

Therefore $T_n < c_\alpha$ if and only if $\|\boldsymbol{S}_n - \mathcal{C}\|^2_{\boldsymbol{\Sigma}_n} < c_\alpha/n$. Repeating the calculations above, we find that $\mathcal{A}_n(\mathcal{C}, \mathbb{R}^m, \alpha) = \mathcal{C} \oplus \mathrm{Ball}_{\boldsymbol{\Sigma}_n}\big(\boldsymbol{0}, \sqrt{c_\alpha/n}\big)$, concluding the proof.

Proof of Theorem 2.3:

Proof. It is well known (e.g., Barlow et al. 1972) that the DT for testing $H_0: \boldsymbol{\theta} \in \mathcal{L}$ versus $H_1: \boldsymbol{\theta} \in \mathcal{C} \setminus \mathcal{L}$, where $\mathcal{L} = \{\boldsymbol{\theta} : \theta_1 = \cdots = \theta_K\}$ and $\mathcal{C}$ is any closed convex cone, is of the form

$$T_n = \frac{n}{\widehat{\sigma}^2_n} \sum_{i=1}^{K} w_{i,n} (\widetilde{\theta}_{i,n} - \bar{Y}_n)^2,$$

where $\bar{Y}_n = n^{-1} \sum_{i=1}^{K} \sum_{j=1}^{n_i} Y_{ij}$ is the overall mean, $(w_{1,n}, \ldots, w_{K,n}) = (n_1/n, \ldots, n_K/n)$ is a vector of weights determined by the sample sizes, and $\widetilde{\theta}_{1,n}, \ldots, \widetilde{\theta}_{K,n}$ are the elements of $\widetilde{\boldsymbol{\theta}}_n = \Pi_{\boldsymbol{\Sigma}_n}(\boldsymbol{S}_n \mid \mathcal{C})$, the constrained estimator of $\boldsymbol{\theta}$. Here $\boldsymbol{S}_n = (\bar{Y}_{1,n}, \ldots, \bar{Y}_{K,n})^T$ with $\bar{Y}_{i,n} = n_i^{-1} \sum_{j=1}^{n_i} Y_{ij}$ for $i = 1, \ldots, K$ and $\boldsymbol{\Sigma}_n = \widehat{\sigma}^2_n \mathrm{Diag}(1/w_{1,n}, \ldots, 1/w_{K,n})$ where $\widehat{\sigma}^2_n$ is any consistent estimator for $\sigma^2$. The resulting estimators are known as the isotonic regression estimators. In particular, if $\mathcal{C} = \mathcal{C}_s$ the $i$th element of $\widetilde{\boldsymbol{\theta}}_n$ is given by the min-max formula

$$\widetilde{\theta}_{i,n} = \min_{t \ge i} \max_{s \le i} \mathrm{Av}(\boldsymbol{S}_n, s, t). \qquad (20)$$

Since $\sqrt{n}(\boldsymbol{S}_n - \boldsymbol{\theta}) \Rightarrow N_K(\boldsymbol{0}, \boldsymbol{\Sigma})$ and $\boldsymbol{\Sigma}_n \xrightarrow{p} \boldsymbol{\Sigma} = \sigma^2 \mathrm{Diag}(1/w_1, \ldots, 1/w_K)$ as $n \to \infty$, it is evident that for large $n$ we have

$$\frac{1}{\widehat{\sigma}^2_n} \sum_{i=1}^{K} w_{i,n} (\widetilde{\theta}_{i,n} - \bar{Y}_n)^2 = \frac{1}{\sigma^2} \Big\{ \sum_{i=1}^{K} w_i (\theta^*_i - \bar{\theta})^2 \Big\} + o_p(1)$$

where $\bar{\theta} = \sum_{i=1}^{K} w_i \theta_i$ is the almost sure limit of $\bar{Y}_n$ and $\theta^*_i$, $i = 1, \ldots, K$, are the limiting values of $\widetilde{\theta}_{i,n}$ obtained by applying the law of large numbers and the continuous mapping theorem to (20), i.e., $\theta^*_i = \min_{t \ge i} \max_{s \le i} \mathrm{Av}(\boldsymbol{\theta}, s, t)$. It follows that the DT is consistent if and only if

$$\sum_{i=1}^{K} w_i (\theta^*_i - \bar{\theta})^2 > 0. \qquad (21)$$

It is well known that $\bar{Y}_n = \sum_{i=1}^{K} w_{i,n} \widetilde{\theta}_{i,n}$ so $\bar{\theta} = \sum_{i=1}^{K} w_i \theta_i = \sum_{i=1}^{K} w_i \theta^*_i$. Now by construction $\theta^*_1 \le \cdots \le \theta^*_K$ and $\bar{\theta} = \sum_{i=1}^{K} w_i \theta^*_i$, and therefore (21) holds if and only if for some $i$ we have $\theta^*_i < \theta^*_{i+1}$. Recall that the min-max formulas are equivalent to the celebrated pool-adjacent-violators algorithm (PAVA) as described in van Eeden (2006). By the loop-invariant property of PAVA we can first apply PAVA separately to the vectors $\boldsymbol{S}_n(1, i) = (\bar{Y}_{1,n}, \ldots, \bar{Y}_{i,n})^T$ and $\boldsymbol{S}_n(i+1, K) = (\bar{Y}_{i+1,n}, \ldots, \bar{Y}_{K,n})^T$, obtain the estimators $\widetilde{\boldsymbol{\theta}}_n(1, i)$ and $\widetilde{\boldsymbol{\theta}}_n(i+1, K)$, and then apply PAVA again to the full vector $(\widetilde{\boldsymbol{\theta}}_n(1, i)^T, \widetilde{\boldsymbol{\theta}}_n(i+1, K)^T)$ to obtain $\widetilde{\boldsymbol{\theta}}_n$. Further note that by the min-max formulas applied separately to the vectors $\boldsymbol{S}_n(1, i)$ and $\boldsymbol{S}_n(i+1, K)$ we find that

$$\widetilde{\theta}_{i,n}(1, i) = \max_{1 \le s \le i} \mathrm{Av}(\boldsymbol{S}_n(1, i), s, i), \qquad \widetilde{\theta}_{i+1,n}(i+1, K) = \min_{i+1 \le t \le K} \mathrm{Av}(\boldsymbol{S}_n(i+1, K), i+1, t).$$

Here $\widetilde{\theta}_{i,n}(1, i)$ is the $i$th and largest element of $\widetilde{\boldsymbol{\theta}}_n(1, i)$ whereas $\widetilde{\theta}_{i+1,n}(i+1, K)$ is the first and smallest element of $\widetilde{\boldsymbol{\theta}}_n(i+1, K)$.
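As a numerical aside, the min-max formula (20) and PAVA can be compared on small examples, and the left-hand side of the consistency condition (21) can be evaluated at a given parameter vector. The sketch below is illustrative only: it assumes the simple order cone, takes Av to be the weighted average over a block of consecutive indices, and uses hypothetical function names and inputs.

```python
def av(y, w, s, t):
    """Weighted average of y[s..t] (0-based indices, inclusive)."""
    num = sum(w[k] * y[k] for k in range(s, t + 1))
    den = sum(w[k] for k in range(s, t + 1))
    return num / den


def minmax_isotonic(y, w):
    """Isotonic regression under the simple order via the min-max formula."""
    k = len(y)
    return [min(max(av(y, w, s, t) for s in range(i + 1)) for t in range(i, k))
            for i in range(k)]


def pava(y, w):
    """Isotonic regression under the simple order via pool-adjacent-violators."""
    blocks = []  # each block is [mean, total weight, number of elements]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool the last two blocks while they violate the ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def consistency_gap(theta, w):
    """Weighted sum of squares of theta* around the weighted mean, cf. (21)."""
    total = sum(w)
    star = minmax_isotonic(theta, w)
    mean = sum(wi * ti for wi, ti in zip(w, theta)) / total
    return sum((wi / total) * (si - mean) ** 2 for wi, si in zip(w, star))


weights = [1.0, 1.0, 1.0, 1.0]
print(minmax_isotonic([1.0, 3.0, 2.0, 4.0], weights))
print(pava([1.0, 3.0, 2.0, 4.0], weights))
print(consistency_gap([1.0, 3.0, 2.0, 4.0], weights))    # positive
print(consistency_gap([3.0, 1.0, 2.0], [1.0, 1.0, 1.0])) # essentially zero
```

In the last call the parameter vector is not constant, yet its isotonic projection is, so the quantity in (21) vanishes and the DT has no power there.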
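By the law of large numbers and the continuous mapping theorem $\widetilde{\theta}_{i,n}(1, i) \xrightarrow{p} \theta^*_i(1, i)$ and $\widetilde{\theta}_{i+1,n}(i+1, K) \xrightarrow{p} \theta^*_{i+1}(i+1, K)$ where

$$\theta^*_i(1, i) = \max_{1 \le s \le i} \mathrm{Av}(\boldsymbol{\theta}(1, i), s, i), \qquad \theta^*_{i+1}(i+1, K) = \min_{i+1 \le t \le K} \mathrm{Av}(\boldsymbol{\theta}(i+1, K), i+1, t).$$

Now if

$$\theta^*_i(1, i) < \theta^*_{i+1}(i+1, K) \qquad (22)$$

then $\boldsymbol{\theta}^* = (\boldsymbol{\theta}^*(1, i)^T, \boldsymbol{\theta}^*(i+1, K)^T)^T$, i.e., applying PAVA (or min-max) to $\boldsymbol{S}_n$ is equivalent to separately applying it to $\boldsymbol{S}_n(1, i)$ and $\boldsymbol{S}_n(i+1, K)$ and then combining the results. Thus, if (22) holds for some $i \in \{1, \ldots, K\}$ then $\theta^*_i < \theta^*_{i+1}$ and consequently (21) holds as well, so the DT is consistent. However, Equation (22) is nothing but Equation (11). This completes the proof.

Proof of Theorem 3.1:

Proof. Consider Type A Problems. Clearly for any $\boldsymbol{\theta} \in \mathcal{C}$ we have $\Delta > 0$, so the DT is consistent on $\mathcal{C}$. By Equation (6) the DT is not consistent on $(\mathcal{C} \cap \mathcal{L}^{\perp})^{\circ}$. First note that $(\mathcal{C} \cup (\mathcal{C} \cap \mathcal{L}^{\perp})^{\circ}) \subseteq \mathcal{C} \cup \mathcal{C}^{\circ}$. Next note that $\mathbb{R}^m \setminus (\mathcal{C} \cup \mathcal{C}^{\circ}) = \{\boldsymbol{u} \in \mathbb{R}^m : \boldsymbol{u}^T \boldsymbol{\Sigma}^{-1} \boldsymbol{v} > 0, \ \forall \boldsymbol{v} \in \mathcal{C}\} \ne \emptyset$, i.e., $\mathcal{C} \cup \mathcal{C}^{\circ}$ is a strict subset of $\mathbb{R}^m$. It follows that the DT is consistent on $\boldsymbol{\theta} \in \mathbb{R}^m \setminus (\mathcal{C} \cup (\mathcal{C} \cap \mathcal{L}^{\perp})^{\circ})$. In other words, there are $\boldsymbol{\theta} \notin \mathcal{C}$ for which $\lim_n P_{\boldsymbol{\theta}}(T_n \in \mathcal{R}) = 1$. Thus the DT is not safe.

Proof of Theorem 3.2:

Proof. Fix $\alpha$ and let $c_\alpha$ satisfy $\alpha = \sup_{\boldsymbol{\theta} \in \mathcal{L}} P_{\boldsymbol{\theta}}(T_n \ge c_\alpha)$. Now by construction $T^{\mathrm{SAFE}}_n \le T_n$ and therefore $\{T^{\mathrm{SAFE}}_n \ge c_\alpha\} \subseteq \{T_n \ge c_\alpha\}$. It immediately follows that

$$\alpha_{\mathrm{SAFE}} = \sup_{\boldsymbol{\theta} \in \mathcal{L}} P_{\boldsymbol{\theta}}(T^{\mathrm{SAFE}}_n \ge c_\alpha) \le \sup_{\boldsymbol{\theta} \in \mathcal{L}} P_{\boldsymbol{\theta}}(T_n \ge c_\alpha) = \alpha,$$

establishing (14). Next let $\mathcal{E}$ denote the set of values of $\boldsymbol{\theta}$ on which the test $T_n$ commits a Type III error with probability one as $n \to \infty$, i.e., $\mathcal{E} = \mathbb{R}^p \setminus (\mathcal{C} \cup \mathcal{C}^{\circ})$, cf. Theorem 3.1.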
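Now for any $\boldsymbol{\theta} \in \mathcal{E}$ and $\varepsilon > 0$ we have

$$\begin{aligned}
P_{\boldsymbol{\theta}}(T^{\mathrm{SAFE}}_n \ge c_\alpha) &= P_{\boldsymbol{\theta}}(T_n \ge c_\alpha, \ T'_n < c'_\gamma) \le P_{\boldsymbol{\theta}}(T'_n \le c'_\gamma) = P_{\boldsymbol{\theta}}(\boldsymbol{S}_n \in \mathcal{A}_n(\mathcal{C}, \mathbb{R}^p, \gamma)) \\
&= P_{\boldsymbol{\theta}}\big(\boldsymbol{S}_n \in \mathcal{C} \oplus \mathrm{Ball}_{\boldsymbol{\Sigma}_n}(\boldsymbol{0}, \sqrt{c'_\gamma/n})\big) \\
&= P_{\boldsymbol{\theta}}\big(\boldsymbol{S}_n \in \mathcal{C} \oplus \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{0}, \sqrt{c'_\gamma/n})\big) + o_p(1) \\
&= P_{\boldsymbol{\theta}}\big(\boldsymbol{S}_n \in \mathcal{C} \oplus \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{0}, \sqrt{c'_\gamma/n}), \ \boldsymbol{S}_n \in \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{\theta}, \varepsilon)\big) \\
&\quad + P_{\boldsymbol{\theta}}\big(\boldsymbol{S}_n \in \mathcal{C} \oplus \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{0}, \sqrt{c'_\gamma/n}), \ \boldsymbol{S}_n \notin \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{\theta}, \varepsilon)\big) + o_p(1) \\
&\le P_{\boldsymbol{\theta}}\big(\boldsymbol{S}_n \in (\mathcal{C} \oplus \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{0}, \sqrt{c'_\gamma/n})) \cap \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{\theta}, \varepsilon)\big) + P_{\boldsymbol{\theta}}\big(\boldsymbol{S}_n \notin \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{\theta}, \varepsilon)\big).
\end{aligned}$$

Since $\sqrt{n}(\boldsymbol{S}_n - \boldsymbol{\theta}) \Rightarrow N_p(\boldsymbol{0}, \boldsymbol{\Sigma})$ we have $P_{\boldsymbol{\theta}}(\boldsymbol{S}_n \notin \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{\theta}, \varepsilon)) \to 0$ as $n \to \infty$. Now $\boldsymbol{\theta} \in \mathcal{E}$, so for all small enough $\varepsilon$ we have $\mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{\theta}, \varepsilon) \cap \mathcal{C}^{p}_{+} = \emptyset$. Consequently, for large enough $n$ we have $(\mathcal{C} \oplus \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{0}, \sqrt{c'_\gamma/n})) \cap \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{\theta}, \varepsilon) = \emptyset$. Hence,

$$P_{\boldsymbol{\theta}}\big(\boldsymbol{S}_n \in (\mathcal{C} \oplus \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{0}, \sqrt{c'_\gamma/n})) \cap \mathrm{Ball}_{\boldsymbol{\Sigma}}(\boldsymbol{\theta}, \varepsilon)\big) = 0$$

for all large $n$. Combining the displays above we conclude that $P_{\boldsymbol{\theta}}(T^{\mathrm{SAFE}}_n \ge c_\alpha) \to 0$ and therefore $T^{\mathrm{SAFE}}_n$ is a safe test as claimed. Finally, for any $\alpha \in (0, 1)$ we have $\alpha_{\mathrm{SAFE}} < \alpha$, i.e., $P(T^{\mathrm{SAFE}}_n \ge c_\alpha) < \alpha$. Additionally, there must exist a value $c^{\mathrm{SAFE}}_\alpha$ for which $\sup_{\boldsymbol{\theta} \in \mathcal{L}} P(T^{\mathrm{SAFE}}_n \ge c^{\mathrm{SAFE}}_\alpha) = \alpha$. Thus, $c^{\mathrm{SAFE}}_\alpha < c_\alpha$. Let $\boldsymbol{\theta} \in \mathrm{int}(\mathcal{C} \setminus \mathcal{L})$. For any such $\boldsymbol{\theta}$ we have $P_{\boldsymbol{\theta}}(T'_n < c'_\gamma) \to 1$ as $n \to \infty$ and since T SAFE n = T n I { T ′ n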
