Addressing parity blindness of data-driven Sobolev tests on the hypersphere

Marcio Reverbel 1,∗
Université Libre de Bruxelles, Brussels, Belgium

Abstract

We study the asymptotic behavior of the data-driven Sobolev test for testing uniformity on the (hyper)sphere. We show that it can be blind to certain contiguous alternatives and propose a simple modification of the test statistic. This adapted test retains consistency under fixed alternatives and achieves non-trivial asymptotic power against contiguous alternatives for which the original test fails. Simulation results support our theoretical findings.

Keywords: hypothesis testing, directional statistics, local asymptotics, contiguity

1. Introduction

Directional data, i.e., observations which lie on the surface of a (hyper)sphere, appear in applications where only the directions are relevant, such that the magnitude of each observation can be discarded. Such applications are found in many different fields, including astronomy, medicine, genetics, bioinformatics, image analysis, machine learning and text mining, among others (Pewsey and García-Portugués, 2021; García-Portugués and Verdebout, 2018).

Testing for uniformity based on an independent and identically distributed (iid) sample of unit vectors on the unit sphere $S^{d-1} := \{x \in \mathbb{R}^d : \|x\|^2 = x^\top x = 1\}$ is a problem that has been widely discussed in the literature. It arises when one has directional data and needs to test whether these directions appear "at random" (uniform) or show some preferred orientation (anisotropy). For example, in 1767, John Michell, Fellow of the Royal Society of London, was interested in how stars were distributed in the night sky and argued that there were far more pairs or groups of stars clustered together than would be plausible in a uniformly distributed set of stars (American Physical Society, 2009).
The class of Sobolev tests presented by Giné (1975) is the most extensive class of tests devised to tackle this testing problem (García-Portugués and Verdebout, 2018). It includes tests that preceded it historically, such as the Rayleigh (Rayleigh, 1919) and Bingham tests (Bingham, 1974). The data-driven Sobolev test proposed by Jupp (2008), while consistent against all fixed alternatives, is blind to contiguous sequences of alternatives characterized by angular functions whose $k$th derivatives vanish at zero for $k$ odd. We propose a modification by placing a lower bound on $\hat{k}$, and show that this results in a test with non-trivial asymptotic power against the aforementioned contiguous alternatives. Simulations confirm our theoretical findings.

This work is organized as follows: Section 2 provides a theoretical introduction to Sobolev tests. Section 3 discusses the asymptotic properties of the data-driven Sobolev test, Section 4 presents the adapted test, and Section 5 compares the original and adapted data-driven tests with simulations. Section 6 offers a brief conclusion with directions for further research.

∗ Corresponding author. Email address: marcio.reverbel@kuleuven.be (Marcio Reverbel)
1 Present address: Department of Mathematics, KU Leuven, Celestijnenlaan 200, B-02.29, 3001 Heverlee, Belgium.

2. Preliminaries

Sobolev tests of uniformity on $S^{d-1} = \{x \in \mathbb{R}^d : \|x\| = 1\}$ are constructed from an infinite-dimensional orthonormal basis of $L^2(S^{d-1}, \nu_d)$, where $\nu_d$ is the uniform probability measure on $S^{d-1}$. Given $n$ iid observations $X^{(n)} = (x_1, \dots, x_n)$ of a random vector $X \in S^{d-1}$, we would like to test the hypothesis that the observations are drawn from a uniform distribution:

$$H_0 : X \sim \mathrm{Unif}(S^{d-1}).$$
Sobolev tests (Giné, 1975) reject the null hypothesis for large values of the test statistic

$$S_{v_k} := \frac{1}{n} \sum_{i,j=1}^{n} \sum_{k=1}^{\infty} v_k^2\, h_k(x_i^\top x_j), \quad \text{with} \quad h_k(t) := \begin{cases} 2\cos\big(k \arccos(t)\big), & \text{if } d = 2, \\ \left(1 + \dfrac{2k}{d-2}\right) \displaystyle\sum_{j=0}^{\lfloor k/2 \rfloor} (-1)^j\, c^{(d-2)/2}_{k,j}\, t^{k-2j}, & \text{if } d > 2, \end{cases} \tag{1}$$

where $(v_k)_{k \ge 1}$ is a real sequence satisfying $\sum_{k=1}^{\infty} v_k^2 d_k < \infty$,

$$d_k = \binom{d+k-3}{d-2} + \binom{d+k-2}{d-2}, \quad \text{and} \quad c^{\lambda}_{k,j} := \frac{2^{k-2j}\, \Gamma(k-j+\lambda)}{\Gamma(\lambda)\, j!\, (k-2j)!}.$$

Under the null hypothesis, we have

$$S_{v_k} \xrightarrow[H_0]{D} \sum_{k=1}^{\infty} v_k^2\, \chi^2_{d_k}.$$

The Sobolev test as defined in Equation (1) is a class of tests which depends on the chosen sequence $v_k$, and includes the Rayleigh (Rayleigh, 1919) and Bingham tests (Bingham, 1974), as well as the score test of uniformity in the exponential model. While a purely theoretical Sobolev test in which $v_k \neq 0$ for all $k > 0$ is consistent against all alternatives, practical implementations of this test, with a finite number of non-zero elements in the sequence $v_k$, will be blind to specific sets of alternatives (Mardia and Jupp, 2009; Jupp and Spurr, 1983; Giné, 1975).

The Rayleigh test of uniformity, $R_n$, is a Sobolev test with $v_k = \mathbb{I}[k = 1]$. We also have

$$R_n = d\, n\, \|\bar{x}\|^2, \tag{2}$$

where $\bar{x}$ denotes the sample average. The Rayleigh test coincides with the likelihood ratio test against von Mises alternatives, and is the locally most powerful invariant test against these alternatives (Mardia and Jupp, 2009). However, since it is based on the sample average, which is insensitive to identical perturbations on opposing sides of the unit sphere, the Rayleigh test is blind to alternatives with any kind of antipodal symmetry.

The Bingham test, $B_n$, is a Sobolev test with $v_k = \mathbb{I}[k = 2]$.
It takes the form

$$B_n = \frac{n\, d (d+2)}{2} \left[ \frac{1}{n^2} \sum_{i,j=1}^{n} (x_i^\top x_j)^2 - \frac{1}{d} \right] = \frac{n\, d (d+2)}{2} \left[ \operatorname{tr}[\hat{\Sigma}^2] + 2\, \bar{x}^\top \hat{\Sigma}\, \bar{x} + \|\bar{x}\|^4 - \frac{1}{d} \right], \tag{3}$$

where $\hat{\Sigma}$ denotes the sample covariance matrix. It is blind to alternatives for which the expected value of the sample scatter matrix is indistinguishable from its expected value under the null, $E_{H_0}[X X^\top] = d^{-1} I_d$.

The following version of the Sobolev test was introduced by Beran (1979). Its test statistic, $S_K$, takes the form of a Sobolev test with $v_k = \mathbb{I}[k \le K]$ for some $K < \infty$:

$$S_K := \frac{1}{n} \sum_{i,j=1}^{n} \sum_{k=1}^{K} h_k(x_i^\top x_j). \tag{4}$$

This test plays an important role in the data-driven Sobolev test introduced by Jupp (2008).

3. Data-driven Sobolev tests

One of the most difficult aspects of Sobolev tests lies in the choice of the sequence $v_k$, as this determines the kind of alternatives the test cannot detect. In the case of the quadratic score test (4), this translates into choosing $K$. Jupp (2008) proposes a variant of the score test in which this choice is made by the data, via a penalized score statistic. Define

$$B_S(K) = S_K - p_K \log(n), \tag{5}$$

where $S_K$ is the quadratic score statistic (4) and $p_K$ is the dimension penalty built from the $d_k$ of Section 2 (in Jupp (2008), the number of basis functions of order at most $K$, i.e. $p_K = \sum_{k=1}^{K} d_k$). The data-driven Sobolev test statistic, $S_{\hat{k}}$, is given by

$$S_{\hat{k}} := \frac{1}{n} \sum_{i,j=1}^{n} \sum_{k=1}^{\hat{k}} h_k(x_i^\top x_j), \qquad \hat{k} = \inf \Big\{ K \in \mathbb{N} : B_S(K) = \sup_{m \in \mathbb{N}} B_S(m) \Big\}. \tag{6}$$

The test rejects uniformity for large values of $S_{\hat{k}}$. In practice, one would not calculate infinitely many $B_S$ statistics to choose $\hat{k}$, but rather place an upper limit $M$ on $m$.

Jupp (2008) derived important asymptotic properties for the data-driven Sobolev test: the convergence in probability of $\hat{k}$ to 1 under the null, the asymptotic distribution of $S_{\hat{k}}$ under the null, and the consistency of the test against all alternatives to uniformity.
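The selection rule in Equations (5) and (6) can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used by Jupp (2008) or in this paper. It assumes $d = 3$, where $h_k(t) = (2k+1) P_k(t)$ with $P_k$ the Legendre polynomial and $d_k = 2k + 1$; it assumes the penalty $p_K = \sum_{k \le K} d_k = K(K+2)$; and it truncates the search at a hypothetical upper limit `M`. The function names `sobolev_score` and `select_order` are ours.

```python
import numpy as np
from numpy.polynomial import legendre


def sobolev_score(x, K):
    """S_K of Equation (4) for an (n, 3) array of unit vectors (case d = 3)."""
    n = x.shape[0]
    t = np.clip(x @ x.T, -1.0, 1.0)          # all inner products x_i^T x_j
    coef = np.zeros(K + 1)
    coef[1:] = 2 * np.arange(1, K + 1) + 1   # weights (2k + 1); no k = 0 term
    # legval evaluates sum_k coef[k] * P_k(t) elementwise
    return legendre.legval(t, coef).sum() / n


def select_order(x, M=10):
    """Smallest maximiser of B_S(K) = S_K - p_K log(n) over K = 1, ..., M."""
    n = x.shape[0]
    orders = np.arange(1, M + 1)
    # Assumption: p_K = sum_{k <= K} (2k + 1) = K (K + 2) when d = 3.
    penalties = orders * (orders + 2)
    scores = np.array([sobolev_score(x, K) for K in orders])
    b = scores - penalties * np.log(n)
    return int(orders[np.argmax(b)])         # argmax returns the first maximiser


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.standard_normal((200, 3))
    x = z / np.linalg.norm(z, axis=1, keepdims=True)   # uniform sample on S^2
    k_hat = select_order(x)
    print("selected order:", k_hat, "statistic:", sobolev_score(x, k_hat))
```

With `K = 1` the function reduces to the Rayleigh statistic (2), and the difference between `K = 2` and `K = 1` is the Bingham statistic (3), which gives a quick sanity check of the kernel weights.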
Together, these theorems tell us that, for large samples, $\hat{k}$ tends to be close to 1 under uniformity, which leads to a simple test statistic, and that $\hat{k}$ tends to rise sufficiently to cause rejection of the null hypothesis under any (fixed) alternative distribution. See Theorems 3.1, 3.2 and 3.3 in Jupp (2008) for more details.

The asymptotic behavior of Sobolev tests under the null hypothesis has been well explored and follows directly from Theorem 4.1 in Giné (1975). However, research into the asymptotic behavior of Sobolev tests under non-null distributions is much more recent, with García-Portugués et al. (2025) presenting important results under some of the most commonly considered alternative distributions. These distributions have densities of the form

$$x \mapsto c_{d,\kappa,g}\, g(\kappa \mu^\top x), \tag{7}$$

where $\kappa > 0$ is a concentration parameter, $\mu \in S^{d-1}$ is a location parameter, $g : \mathbb{R} \to \mathbb{R}^+$ is an angular function, i.e., a function which depends on $x$ only through the angle between $x$ and $\mu$, and $c_{d,\kappa,g}$ is a normalizing constant. Distributions that fall under this category include the von Mises-Fisher distribution, with $g(s) = \exp(s)$; the Watson distribution, with $g(s) = \exp(s^2)$; other exponential distributions of the form $g(s) = \exp(s^b)$, with $b > 0$; and the directional Cauchy distribution, with $g(s) = (1 + 2s)^{-1}$ and $s = \kappa(1 - \mu^\top x)$ instead. The angular terminology is used here to emphasize the fact that these distributions are rotationally symmetric with respect to the location parameter, which makes them of particular interest.

Let $P_{\kappa_n, g}$ denote a local version of the distribution in Equation (7), such that $\kappa = \kappa_n$ converges to zero as $n$ increases. Finally, we have the following result from García-Portugués et al. (2025):

Corollary 3.1 (Parity-blindness). For $d \ge 2$, consider a sequence $v_k$ with only finitely many non-zero terms such that $v_k = 0$ for each even (resp. odd) $k$.
Let $g$ be an angular function, $q$ ($\ge \max\{k : v_k \neq 0\}$) times differentiable at zero, with $g^{(k)}(0) = 0$ for each odd (resp. even) $k \in \{k_v, \dots, q\}$. Then, under $P_{\kappa_n, g}$ with $\kappa_n = n^{-1/(2q)} \tau$,

$$S^{(n)}_{v_k} \xrightarrow{D} \sum_{k=1}^{K_v} v_k^2\, \chi^2_{d_k} \quad \text{as } n \to \infty.$$

This corollary states that if a Sobolev test only has non-zero $v_k$ for $k$ of one parity, and the $k$th derivatives of $g$ vanish at zero for all $k$ of that same parity, then the test is blind, in the sense that its asymptotic distribution under the local alternative is no different from its distribution under the null.

4. An adapted data-driven Sobolev test

Based on the results of Section 3, we investigate the non-null asymptotic behavior of the data-driven Sobolev test in the Le Cam sense. Jupp (2008) demonstrated the test's consistency in a fixed setting. However, the test's behavior under sequences of alternatives that approach the null distribution as $n$ grows remains unexplored. Specifically, we will see that the data-driven test is blind to contiguous alternatives whose angular functions have odd-order derivatives that all vanish at zero.

First, recall the definition of contiguity: for two sequences of probability measures, $Q_n$ and $P_n$, we say that $Q_n$ is contiguous with respect to $P_n$ (notation: $Q_n \triangleleft P_n$) if, for every sequence of measurable sets $A_n$, $P_n(A_n) \to 0$ as $n \to \infty$ implies $Q_n(A_n) \to 0$ as $n \to \infty$. We can now present the following lemma:

Lemma 4.1. Let $S_{\hat{k}}$ be the data-driven Sobolev test as defined in Equation (6), and let $P_{\kappa_n}$ be a sequence of distributions. If $P_{\kappa_n} \triangleleft P_0$, then

$$P_{\kappa_n}[\hat{k} = 1] \xrightarrow{n \to \infty} 1. \tag{8}$$

Proof. This lemma is a direct application of the definition of contiguity. From Theorem 3.1 in Jupp (2008), and contiguity of $P_{\kappa_n}$:

$$\underbrace{P_0[\hat{k} = 1] \xrightarrow{n \to \infty} 1}_{\text{Theorem 3.1 in Jupp (2008)}} \implies P_0[\hat{k} > 1] \xrightarrow{n \to \infty} 0 \implies \underbrace{P_{\kappa_n}[\hat{k} > 1] \xrightarrow{n \to \infty} 0}_{P_{\kappa_n} \triangleleft P_0} \implies P_{\kappa_n}[\hat{k} = 1] \xrightarrow{n \to \infty} 1.$$
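As a worked illustration of how Lemma 4.1 combines with Corollary 3.1, consider the Watson angular function. The derivation below is a sketch that uses only the definitions above; the choice $q = 2$ and the identity $d_1 = d$ (obtained from the formula for $d_k$ in Section 2) are our own bookkeeping.

```latex
\begin{gather*}
g(s) = \exp(s^2): \qquad
g'(s) = 2s\, e^{s^2}, \quad g''(s) = (2 + 4s^2)\, e^{s^2}
\;\Longrightarrow\; g'(0) = 0, \quad g''(0) = 2 \neq 0, \\
\hat{k} \xrightarrow{P} 1 \text{ (Lemma 4.1)}
\;\Longrightarrow\; S_{\hat{k}} \text{ behaves like } S_1 = R_n,
\text{ i.e. } v_k = \mathbb{I}[k = 1], \\
d_1 = \binom{d-2}{d-2} + \binom{d-1}{d-2} = 1 + (d - 1) = d, \\
\text{Corollary 3.1 with } q = 2,\ \kappa_n = n^{-1/4}\tau: \qquad
S_1 \xrightarrow{D} \chi^2_{d_1} = \chi^2_{d}
\quad \text{(the same limit as under } H_0\text{)}.
\end{gather*}
```

Since $g$ is an even function, every odd-order derivative vanishes at zero, so the only non-zero weight ($k = 1$, odd) meets only vanishing derivatives, and the limiting distribution carries no dependence on $\tau$.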
From Corollary 3.1, this test will be blind to contiguous alternatives if the $k$th derivatives of their angular functions vanish at zero for every odd $k$. This is, of course, not desirable, as there exist other Sobolev tests with non-trivial asymptotic power against these alternatives (such as the Bingham test), and a data-driven test should ideally select one of these instead of the blind test.

We propose a simple solution to this problem, which is to include at least one $k$ of each parity in the Sobolev test. Thus, the adapted data-driven Sobolev test takes the following form:

$$S_{\hat{k}^*} := \frac{1}{n} \sum_{i,j=1}^{n} \sum_{k=1}^{\hat{k}^*} h_k(x_i^\top x_j), \qquad \hat{k}^* = \max\{\hat{k}, 2\}, \tag{9}$$

where $\hat{k}$ is as defined in Equation (6). The adapted data-driven Sobolev test $S_{\hat{k}^*}$ is identical to the original data-driven test, except that the smallest value $\hat{k}^*$ can take is 2. The test rejects uniformity for large values of $S_{\hat{k}^*}$, and it is easy to see that it is still consistent against all alternatives to uniformity in a fixed setting. However, it is no longer blind to contiguous alternatives whose odd-order derivatives vanish at zero. The price to pay is a decrease in power when testing against alternatives that could be easily rejected at $\hat{k} = 1$. In fact, we have the following results:

Theorem 4.1. Let $\hat{k}^*$ be defined as in Equation (9). Then $\hat{k}^* \xrightarrow[H_0]{P} 2$ as $n \to \infty$.

Proof. From Theorem 3.1 in Jupp (2008), $\hat{k}$ converges in probability to 1 under $H_0$. Therefore, $\hat{k}^* = \max\{\hat{k}, 2\}$ converges in probability to $\max\{1, 2\} = 2$ under $H_0$.

Theorem 4.2. Let $S_{\hat{k}^*}$ be defined as in Equation (9). Then $S_{\hat{k}^*} \xrightarrow[H_0]{D} \chi^2_{d_1 + d_2}$ as $n \to \infty$.

Theorem 4.3.
The test $\varphi^*(x^{(n)})$ which rejects uniformity for large values of $S_{\hat{k}^*}$ is consistent against all (fixed) alternatives to uniformity, i.e.,

$$E_{H_1}[\varphi^*(x^{(n)})] \xrightarrow{n \to \infty} 1.$$

The proofs are straightforward analogues of those of Theorems 3.2 and 3.3 in Jupp (2008). In addition, the following result shows the existence of non-trivial asymptotic power for the adapted data-driven Sobolev test against any contiguous sequence of alternatives of the form in Equation (7) for which $g^{(k)}(0) \neq 0$ for some $k$:

Corollary 4.1. Let $S^{(n)}_{\hat{k}^*}$ be the adapted data-driven Sobolev test defined in (9). Let $g$ be an angular function that is $q$ ($\ge \max\{k : v_k \neq 0\}$) times differentiable at zero such that $g^{(k)}(0) \neq 0$ for at least one $k \le q$. Let $k^*$ be the smallest $k$ for which $g^{(k)}(0) \neq 0$. Set the non-centrality parameter

$$\xi_{k,k^*}(\tau) = \frac{1}{(k^*!)^2}\, w_k^2 \left( g^{(k^*)}(0) \right)^2 \tau^{2k^*} \left[ \sum_{j=0}^{\lfloor k/2 \rfloor} (-1)^j\, c^{(d-2)/2}_{k,j}\, a_{k + k^* - 2j} \right]^2,$$

where

$$w_k := \begin{cases} \sqrt{2}, & \text{if } d = 2, \\ \dfrac{1 + 2k/(d-2)}{\sqrt{d_k}}, & \text{if } d \ge 3, \end{cases} \qquad a_m = \mathbb{I}\!\left[ \tfrac{m}{2} \in \mathbb{N} \right] \prod_{r=0}^{m/2 - 1} \frac{1 + 2r}{d + 2r},$$

with the convention for $a_m$ that an empty product is equal to one. Then, under $P_{\kappa_n, g} \triangleleft P_0$ with $\kappa_n = n^{-1/(2k^*)} \tau$,

$$S^{(n)}_{\hat{k}^*} \xrightarrow{D} \sum_{k=1}^{2} \chi^2_{d_k}\!\left( \mathbb{I}[k \sim k^*]\, \xi_{k,k^*}(\tau) \right),$$

where the relation $a \sim b$ is satisfied when $a$ and $b$ share the same parity.

Proof. We have $v_k = \mathbb{I}[k \in \{1, 2\}]$ for the adapted data-driven Sobolev test under contiguous alternatives, and $k^*$ always exists under any of the alternative distributions considered. Direct application of Theorem 5.2 from García-Portugués et al. (2025) yields the final result.

5. Simulations

In the following, we provide some simulation results comparing the original data-driven Sobolev test (Jupp, 2008) with our adapted version. For every combination of $n \in \{200, 500, 1500\}$, $\ell \in \{2, 4, 6\}$ and $\tau \in \{0, 0.5, \dots, 6\}$, we generated $M = 5{,}000$ independent random samples in $\mathbb{R}^3$ from von Mises-Fisher ($g(s) = \exp(s)$) and Watson ($g(s) = \exp(s^2)$) distributions with $\kappa_n = n^{-1/\ell} \tau$ (see Equation (7)). For each of these samples, we computed the data-driven Sobolev test ($S_{\hat{k}}$) and the adapted data-driven Sobolev test ($S_{\hat{k}^*}$). The results are presented in Figures 1a and 1b for the von Mises-Fisher alternatives, and in Figures 1c and 1d for the Watson alternatives.

[Figure 1 panels, each plotting rejection frequency against $\tau \in [0, 6]$: (a) Original test (vMF); (b) Adapted test (vMF); (c) Original test (Watson); (d) Adapted test (Watson). Curves are distinguished by $\ell \in \{2, 4, 6\}$ and $n \in \{200, 500, 1500\}$.]

Figure 1: Rejection frequencies at asymptotic level $\alpha = 5\%$ for 5000 samples under (top) von Mises-Fisher and (bottom) Watson alternatives; left: data-driven Sobolev test, right: adapted version. Their respective angular functions are of the form $g(\kappa_n x^\top \mu)$, where $\kappa_n = n^{-1/\ell} \tau$ and $\mu = e_1$, from the standard basis. In the bottom figures, we included in gray the asymptotic power of the Bingham test, for comparison.

The Rayleigh test is the locally most powerful invariant test against von Mises-Fisher alternatives. Since these alternatives are contiguous to the null distribution when $\ell = 2$, we have $\hat{k} \to 1$ and $\hat{k}^* \to 2$ in probability. Therefore, Jupp's data-driven Sobolev test is asymptotically the Rayleigh test in this scenario, while the adapted test is a combination of the Rayleigh and Bingham tests. The Rayleigh test being the most powerful test suggests that Jupp's test should have higher asymptotic power than our adapted version. This is clearly supported by Figures 1a and 1b.
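The simulation design described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce Figure 1: it works in $\mathbb{R}^3$ with $\mu = e_1$; it samples from density (7) by simple rejection from the uniform distribution, which is only sensible for the small $\kappa_n$ of local alternatives; it assumes the penalty $p_K = K(K+2)$ and a hypothetical search limit `M`; and it uses the 95% quantiles of $\chi^2_3$ and $\chi^2_8$ (approximately 7.815 and 15.507) as asymptotic critical values, since $d_1 = 3$ and $d_1 + d_2 = 8$ when $d = 3$. All function names are ours.

```python
import numpy as np
from numpy.polynomial import legendre

CRIT_ORIGINAL = 7.815   # 0.95 quantile of chi-square, d_1 = 3 degrees of freedom
CRIT_ADAPTED = 15.507   # 0.95 quantile of chi-square, d_1 + d_2 = 8 degrees of freedom


def sample_local(n, kappa, b, rng):
    """Draw n points on S^2 with density proportional to exp((kappa * x_1)^b).

    b = 1 gives von Mises-Fisher, b = 2 gives Watson (integer b only).
    """
    out = np.empty((0, 3))
    while out.shape[0] < n:
        z = rng.standard_normal((2 * n, 3))
        z /= np.linalg.norm(z, axis=1, keepdims=True)       # uniform proposals
        log_accept = (kappa * z[:, 0]) ** b - kappa ** b    # log(g / max g) <= 0
        keep = np.log(rng.random(2 * n)) < log_accept
        out = np.vstack([out, z[keep]])
    return out[:n]


def score(x, K):
    """S_K of Equation (4) for d = 3, where h_k(t) = (2k + 1) P_k(t)."""
    t = np.clip(x @ x.T, -1.0, 1.0)
    coef = np.concatenate(([0.0], 2.0 * np.arange(1, K + 1) + 1.0))
    return legendre.legval(t, coef).sum() / x.shape[0]


def both_tests(x, M=5):
    """Return (S at k_hat, S at max(k_hat, 2), k_hat), searching K = 1, ..., M."""
    n = x.shape[0]
    orders = np.arange(1, M + 1)
    s = np.array([score(x, K) for K in orders])
    # Assumed penalty p_K = K (K + 2); argmax picks the smallest maximiser.
    k_hat = int(orders[np.argmax(s - orders * (orders + 2) * np.log(n))])
    k_star = max(k_hat, 2)                                   # Equation (9)
    return s[k_hat - 1], s[k_star - 1], k_hat


def rejection_rates(n, ell, tau, b, reps, rng):
    """Empirical rejection frequencies (original, adapted) at kappa_n = tau n^(-1/ell)."""
    kappa = tau * n ** (-1.0 / ell)
    hits = np.zeros(2)
    for _ in range(reps):
        s_orig, s_adapt, _ = both_tests(sample_local(n, kappa, b, rng))
        hits += [s_orig > CRIT_ORIGINAL, s_adapt > CRIT_ADAPTED]
    return hits / reps


if __name__ == "__main__":
    rng = np.random.default_rng(2025)
    # Watson alternatives at the contiguous rate (ell = 4), reduced replication count
    print(rejection_rates(n=200, ell=4, tau=3.0, b=2, reps=200, rng=rng))
```

Rejection from the uniform proposal keeps the sampler short and exact for any bounded $g$, at the cost of efficiency when $\kappa_n$ is large; a dedicated von Mises-Fisher or Watson sampler would be preferable for a full-scale replication.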
The brown curves, representing the empirical rejection frequencies for $\ell = 2$, approximate the non-trivial asymptotic powers against contiguous alternatives for different values of $\tau$. When testing against these kinds of alternatives, the proposed test performs worse. However, it remains consistent against von Mises-Fisher alternatives for which the convergence rate of $\kappa_n$ is slower than in the contiguous case ($\ell > 2$).

When testing against the Watson distribution, our theoretical results showed that Jupp's test should be blind against the contiguous alternatives, while our adapted version should show non-trivial asymptotic power. Once again, this is clearly supported by Figures 1c and 1d. It is clear from Figure 1c that Jupp's test is blind to the contiguous alternative ($\ell = 4$), since the rejection frequency for fixed $\tau$ decreases as the sample size increases. The adapted test (Figure 1d) achieves non-trivial asymptotic power when $\ell = 4$. For comparison, we include in gray the asymptotic power of the Bingham test ($v_k = \mathbb{I}[k = 2]$) against the contiguous Watson alternative. This indicates the cost that comes from using the data-driven tests instead of the Bingham test against this specific alternative.

6. Conclusion

In this work, we analyzed the data-driven Sobolev test under contiguous sequences of alternatives. This led to an adapted version of the test, correcting for the fact that the original data-driven test is blind under some contiguous alternatives. This comes at the cost of lower power against other alternatives. Simulations confirm our theoretical results.

Directions for future research include exploring the asymptotic behavior of Sobolev tests under other groups of alternatives, such as multi-spiked alternatives like the Bingham distribution (García-Portugués et al., 2025) and mixtures of distributions, as well as exploring other versions of the data-driven test.
Acknowledgements

The author gratefully acknowledges Thomas Verdebout (Université Libre de Bruxelles) for his guidance during the author's master's thesis, which originated this work, and Johan Segers (KU Leuven) for valuable comments and careful proofreading of the manuscript.

Bibliography

American Physical Society (2009). This month in physics history. https://www.aps.org/archives/publications/apsnews/200911/physicshistory.cfm. APS News Online; accessed 21 May 2025.

Beran, R. J. (1979). Exponential models for directional data. Ann. Stat., 7(6):1162–1178.

Bingham, C. (1974). An antipodally symmetric distribution on the sphere. Ann. Stat., 2(6):1201–1225.

García-Portugués, E., Paindaveine, D., and Verdebout, T. (2025). On a class of Sobolev tests for symmetry, their detection thresholds, and asymptotic powers. J. Amer. Statist. Assoc. To appear.

García-Portugués, E. and Verdebout, T. (2018). An overview of uniformity tests on the hypersphere.

Giné, E. (1975). Invariant tests for uniformity on compact Riemannian manifolds based on Sobolev norms. Ann. Stat., 3(6):1243–1266.

Jupp, P. E. (2008). Data-driven Sobolev tests of uniformity on compact Riemannian manifolds. Ann. Stat., 36(3):1246–1260.

Jupp, P. E. and Spurr, B. D. (1983). Sobolev tests for symmetry of directional data. Ann. Stat., 11(4):1225–1231.

Mardia, K. and Jupp, P. (2009). Directional Statistics. Wiley Series in Probability and Statistics. Wiley.

Pewsey, A. and García-Portugués, E. (2021). Recent advances in directional statistics. Test, 30(1):1–58.

Rayleigh, Lord. (1919). On the problem of random vibrations, and of random flights in one, two, or three dimensions. Lond. Edinb. Dublin Philos. Mag. J. Sci., 37(220):321–347.
