
A practical procedure to find matching priors for frequentist inference

By JUAN ZHANG and JOHN E. KOLASSA

Department of Statistics, Rutgers University, Piscataway, New Jersey 08854, U.S.A.

janezh@stat.rutgers.edu kolassa@stat.rutgers.edu

Summary

We present a practical way to find the matching priors proposed by Welch & Peers (1963) and Peers (1965). We investigate the use of saddlepoint approximations combined with matching priors to obtain p-values for the test of an interest parameter in the presence of nuisance parameters. The advantage of our procedure is the flexibility of choosing different initial conditions, so that one can adjust the performance of the test. Two examples have been studied, with coverage verified via Monte Carlo simulation: one concerns the ratio of two exponential means, and the other a logistic regression model. We are particularly interested in small-sample settings.

Some key words: Bayes; Conditional inference; Matching prior; Modified signed root likelihood ratio statistic; Partial differential equation; Saddlepoint approximation.

1. Introduction

We consider inference on a single scalar parameter in the presence of nuisance parameters. In the frequentist setting, conditional inference can be complicated. Bayesian methods can simplify frequentist elimination of nuisance parameters, and the frequentist and Bayesian approaches can be connected by matching priors. Matching priors were first proposed by Welch & Peers (1963) and Peers (1965). Determining a matching prior is equivalent to finding a solution of a first-order partial differential equation. Only in simple circumstances, such as when the parameters are orthogonal, can the partial differential equation be solved analytically.
Levine & Casella (2003) note that "Unfortunately, except for these cases, the solution of the resulting partial differential equations becomes quite a hurdle; our only hope is to find numerical solutions to these partial differential equations." We present a practical way to solve for the matching priors, without the back transformation described by Levine & Casella (2003). This procedure is easy to understand, can be implemented in R (R Development Core Team, 2007), and is suitable for all kinds of initial conditions.

Our implementation of matching priors for the approximations proposed by DiCiccio & Martin (1993) is less complicated than other frequentist methods. DiCiccio and Martin's approximations are saddlepoint approximations that make use of Bayesian–frequentist parallels. Our proposed implementation requires less computational effort than the iterative Metropolis–Hastings algorithm described by Levine & Casella (2003).

We end the introduction with a brief outline of this paper. In § 2, we review the concept of matching priors and discuss the case in which orthogonal parameters are present; existing analytical and numerical solutions are reviewed. In § 3, we present the procedure for solving for matching priors, both analytically and numerically; the specification of initial conditions is discussed, and information on the R implementation of the solving procedure is provided. In § 4, the approximations of DiCiccio & Martin (1993) are reviewed. The application of matching priors in conjunction with DiCiccio and Martin's approximations is illustrated through examples in § 5, where different initial conditions are specified to obtain various matching priors. Finally, § 6 contains the conclusion.

2. Matching priors

We consider parametric models with random variables $X_1, \ldots, X_n$ having a joint density function that depends on an unknown parameter vector $\omega$. Suppose $\omega$ is of length $d$ and $\omega = (\omega_1, \omega_2, \ldots, \omega_d) = (\psi, \lambda)$, with $\psi = \omega_1$ the parameter of interest and $\lambda = (\omega_2, \ldots, \omega_d)$ the nuisance parameter. Matching priors were proposed by Welch & Peers (1963) and Peers (1965). In the following, denote the matching prior by $\pi(\cdot)$, and let $\mathrm{pr}_\pi(\cdot \mid X)$ be the posterior probability measure for $\psi$ under the prior $\pi(\cdot)$. The upper $(1-\alpha)$ posterior quantile constructed on the basis of a prior density function $\pi(\psi)$ has the property that it is also the frequentist limit, in the sense that
$$\mathrm{pr}_\pi\{\psi \le \psi^{(1-\alpha)}(\pi, X) \mid X\} = \mathrm{pr}_\psi\{\psi \le \psi^{(1-\alpha)}(\pi, X)\} = 1 - \alpha + O(n^{-1}).$$

When there are no nuisance parameters, Welch & Peers (1963) showed that the appropriate choice of $\pi(\omega)$ is $\pi(\omega) \propto \{i(\omega)\}^{1/2}$, where $i(\omega) = E\{-d^2 l(\omega)/d\omega^2\}$ and $l(\cdot)$ is the log-likelihood function. In this case, matching priors are easily obtained.

In the presence of nuisance parameters, Peers (1965) showed that $\pi(\omega)$ must be chosen to satisfy the partial differential equation
$$\sum_{j=1}^{d} i^{1j} (i^{11})^{-1/2} \frac{\partial}{\partial \omega_j} (\log \pi) + \sum_{j=1}^{d} \frac{\partial}{\partial \omega_j} \{ i^{1j} (i^{11})^{-1/2} \} = 0, \qquad (1)$$
where $i_{jk}(\omega) = E\{-\partial^2 l(\omega)/\partial \omega_j \partial \omega_k\}$ and $(i^{jk})$ is the $d \times d$ inverse matrix of $(i_{jk})$.

If the parameter of interest and the nuisance parameter vector are orthogonal, solving the partial differential equation (1) is relatively easy. We follow the definition of parameter orthogonality of Cox & Reid (1987): orthogonality is defined with respect to the expected Fisher information matrix. The most direct statistical interpretation of parameter orthogonality is that the relevant components of the original statistic are uncorrelated.
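For intuition, here is a small worked instance of the Welch–Peers prescription in the no-nuisance-parameter case; this example is ours, not taken from the paper. Let $X_1, \ldots, X_n$ be exponential with mean $\mu$. Then

```latex
l(\mu) = -n\log\mu - n\bar{x}/\mu, \qquad
i(\mu) = E\{-l''(\mu)\}
       = E\!\left\{-\frac{n}{\mu^2} + \frac{2n\bar{X}}{\mu^3}\right\}
       = \frac{n}{\mu^2},
```

so $\pi(\mu) \propto \{i(\mu)\}^{1/2} \propto 1/\mu$, the Jeffreys prior, and one-sided posterior quantiles under this prior match frequentist limits to the stated order.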
In general, it is possible to obtain orthogonality of a scalar parameter of interest to a set of nuisance parameters. When the parameter of interest $\psi$ is orthogonal to a set of nuisance parameters, equation (1) reduces to
$$(i_{\psi\psi})^{-1/2} \frac{\partial}{\partial \psi} (\log \pi) + \frac{\partial}{\partial \psi} (i_{\psi\psi})^{-1/2} = 0. \qquad (2)$$
Tibshirani (1989) showed that the solutions are of the form $\pi(\psi, \lambda) \propto \{i_{\psi\psi}(\psi, \lambda)\}^{1/2} g(\lambda)$, where $g(\lambda)$ is arbitrary and the suggestive notation $i_{\psi\psi}(\psi, \lambda)$ is used in place of $i_{11}(\psi, \lambda)$.

However, choosing a parametrization that achieves parameter orthogonality is not always easy, and can be hard in some cases. It is as hard to obtain the orthogonalization as to solve the partial differential equation (1) directly, since the orthogonalization procedure also requires solutions to partial differential equations of a form similar to (1). Staicu & Reid (2007) studied the use of matching priors with the approximation of DiCiccio & Martin (1993) under orthogonal parametrization, and showed that the Peers–Tibshirani class of matching priors is essentially unique. One can modify the arguments in this paper to solve the partial differential equation that defines the orthogonalizing transformation, and attempt, using orthogonality, to narrow down the class of matching priors.

Levine & Casella (2003) proposed a general procedure to solve the partial differential equation (1) numerically in models with a single nuisance parameter. They first transform the parameters into another parameter space, solve the equation there, and then transform back to the original parameter space. The numerical application of this procedure is not necessarily easy, and the transformation between the two parameter spaces is usually nontrivial. Levine & Casella (2003) implemented their procedure using Mathematica.
They did not give instructions on initial-condition specification, which is a necessary component for obtaining a specific solution of the partial differential equation. Sweeting (2005) introduced data-dependent priors that locally approximate the matching priors, and his procedure can deal with vector nuisance parameters.

3. Solving for the matching priors

In this section, we introduce a procedure to solve the partial differential equation (1) in a general parametrization. For simplicity, we take the dimension of the parameter space to be 2. We first give the analytical form of the solutions; practical notes are presented later in this section.

When $d = 2$, equation (1) reduces to
$$a(\psi, \lambda) z_\psi + b(\psi, \lambda) z_\lambda = d(\psi, \lambda), \qquad (3)$$
where $z(\psi, \lambda) = \log\{\pi(\psi, \lambda)\}$, $a(\psi, \lambda) = \{i^{11}(\psi, \lambda)\}^{1/2}$, $b(\psi, \lambda) = i^{12}(\psi, \lambda)\{i^{11}(\psi, \lambda)\}^{-1/2}$, and
$$d(\psi, \lambda) = -\left[ \frac{\partial}{\partial \psi} \{i^{11}(\psi, \lambda)\}^{1/2} + \frac{\partial}{\partial \lambda} \left\{ i^{12}(\psi, \lambda) \{i^{11}(\psi, \lambda)\}^{-1/2} \right\} \right].$$
The coefficient $a(\psi, \lambda)$ is the square root of a diagonal element of the inverse matrix of $(i_{jk})$, so $a(\psi, \lambda)$ cannot be zero. Dividing both sides of (3) by $a(\psi, \lambda)$, we have
$$z_\psi + \frac{b(\psi, \lambda)}{a(\psi, \lambda)} z_\lambda = \frac{d(\psi, \lambda)}{a(\psi, \lambda)}.$$
This forces the coefficient of $z_\psi$ to be 1, which simplifies the procedure of finding a solution. To solve equation (1), it suffices to solve the following system of ordinary differential equations:
$$\frac{d\psi}{ds} = 1, \qquad \frac{d\lambda}{ds} = \frac{b(\psi, \lambda)}{a(\psi, \lambda)}, \qquad \frac{dz}{ds} = \frac{d(\psi, \lambda)}{a(\psi, \lambda)}. \qquad (4)$$

To be more specific about the solution, consider initial conditions prescribed along an initial curve $I$. Suppose $I$ is given parametrically, in terms of a parameter $\xi$, as $\psi = \Psi(\xi)$, $\lambda = \Lambda(\xi)$. Then evaluating $z(\psi, \lambda)$ at a point on $I$ is equivalent to expressing $z$ as a function of $\xi$,
$$z = Z(\xi) = z\{\Psi(\xi), \Lambda(\xi)\}. \qquad (5)$$
Clearly, $I$ cannot be tangent to the direction
$$\left( 1, \; \frac{b\{\Psi(\xi), \Lambda(\xi)\}}{a\{\Psi(\xi), \Lambda(\xi)\}} \right).$$
We then obtain $\psi = \psi(s, \xi)$, $\lambda = \lambda(s, \xi)$ by simultaneously integrating the two equations
$$\frac{d\psi}{ds} = 1, \qquad \psi(s_0, \xi) = \Psi(\xi), \qquad (6)$$
$$\frac{d\lambda}{ds} = \frac{b(\psi, \lambda)}{a(\psi, \lambda)}, \qquad \lambda(s_0, \xi) = \Lambda(\xi). \qquad (7)$$
For the third equation in (4), the initial condition is given by (5); we then have
$$\frac{dz}{ds} = \frac{d(\psi, \lambda)}{a(\psi, \lambda)}, \qquad z(s_0, \xi) = Z(\xi). \qquad (8)$$
Equation (8) can be integrated by quadrature once equations (6) and (7) have been solved:
$$z(s, \xi) = Z(\xi) + \int_{s_0}^{s} \frac{d\{\psi(s', \xi), \lambda(s', \xi)\}}{a\{\psi(s', \xi), \lambda(s', \xi)\}} \, ds'. \qquad (9)$$
These equations generate a surface in three dimensions, $Z(\psi, \lambda)$, that satisfies both equation (3) and the initial condition. When there are no closed-form solutions for equations (6), (7) and (8), numerical solutions can be obtained. Rhee et al. (1986) present more mathematical details.

In obtaining the solution formula (9) for $z(s, \xi)$, we avoid the back transformation described by Levine & Casella (2003). If we want to specify the value of a matching prior at a certain point, say $(\psi^*, \lambda^*)$, we can directly take $s = \psi^*$ and $\xi = \lambda^*$ in formula (9), and the matching prior evaluated at $(\psi^*, \lambda^*)$ is obtained. Without loss of generality, set the initial condition $\{\Psi(\xi), \Lambda(\xi), Z(\xi)\} = (0, \xi, -1)$. With $\Psi(\xi) = 0$, we have $\psi = s$. Equations (7) and (8) simplify to
$$\frac{d\lambda}{ds} = \frac{b(s, \lambda)}{a(s, \lambda)}, \qquad \lambda(s_0, \xi) = \Lambda(\xi), \qquad (10)$$
$$\frac{dz}{ds} = \frac{d(s, \lambda)}{a(s, \lambda)}, \qquad z(s_0, \xi) = Z(\xi).$$
We used the R package odesolve (Setzer, 2007) to solve equation (10) and obtain a numerical expression for $\lambda(\cdot)$ in $s$.
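To make the characteristic-curve procedure concrete, here is a minimal self-contained sketch in Python rather than the R tools used in the paper, with a hand-coded Runge–Kutta step for (10) and Simpson quadrature for (9). The coefficients are those of the orthogonal exponential-means example of § 5.1, for which the expected information works out (our calculation) to $i_{\psi\psi} = n/(2\psi^2)$ and $i_{\psi\lambda} = 0$, so the known analytic solution $\pi \propto 1/\psi$ serves as a check.

```python
import math

n = 10  # sample size, as in the paper's first simulation

# Coefficients of (3) for the example of section 5.1, where
# i_11 = n/(2*psi**2) and i_12 = 0, hence i^11 = 2*psi**2/n:
def a(psi, lam): return math.sqrt(2.0 / n) * psi   # (i^11)^{1/2}
def b(psi, lam): return 0.0                        # i^12 (i^11)^{-1/2}
def d(psi, lam): return -math.sqrt(2.0 / n)        # -d/dpsi (i^11)^{1/2}

def solve_z(psi_star, lam_star, s0=1.0, Z0=-1.0, steps=400):
    """Integrate the characteristic ODEs from s0 to psi_star: RK4 for
    d(lambda)/ds = b/a, Simpson quadrature on each step for dz/ds = d/a."""
    h = (psi_star - s0) / steps
    s, lam, z = s0, lam_star, Z0
    f = lambda ss, ll: b(ss, ll) / a(ss, ll)
    for _ in range(steps):
        k1 = f(s, lam)
        k2 = f(s + h / 2, lam + h * k1 / 2)
        k3 = f(s + h / 2, lam + h * k2 / 2)
        k4 = f(s + h, lam + h * k3)
        lam_new = lam + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        lam_mid = 0.5 * (lam + lam_new)
        # Simpson's rule on [s, s+h] for d/a
        g0 = d(s, lam) / a(s, lam)
        gm = d(s + h / 2, lam_mid) / a(s + h / 2, lam_mid)
        g1 = d(s + h, lam_new) / a(s + h, lam_new)
        z += h * (g0 + 4 * gm + g1) / 6
        s, lam = s + h, lam_new
    return z

# With initial condition (0, xi, -1) and psi > 0, start from s0 > 0.
# The recovered prior exp(z) should be proportional to 1/psi:
ratio = math.exp(solve_z(2.0, 1.0)) / math.exp(solve_z(1.0, 1.0))
print(round(ratio, 6))  # analytic value of pi(2)/pi(1) is 1/2
```

The same skeleton applies to non-orthogonal coefficients; only `a`, `b` and `d` change, and `lsoda`-style adaptive stepping can replace the fixed-step RK4.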
The command lsoda() in the odesolve package is designed to solve initial-value problems for stiff or non-stiff systems of first-order ordinary differential equations. It provides an interface to the Fortran ordinary differential equation solver of the same name, written by Hindmarsh (1983) and Petzold (1983). For (9), we performed numerical integration using Simpson's rule, via the R function sintegral() in the Bolstad package (Curran, 2005). Suppose $z$ is to be evaluated at $(\psi^*, \lambda^*)$. Since $\Lambda(\xi) = \xi$, choose the starting value as $\lambda^*$ when solving (10), and then choose the upper integration limit as $\psi^*$ in (9). The procedure is easy to perform with any ordinary differential equation solver, not only the one provided by the R package odesolve.

From the ordinary differential equation (6), $\psi = s + \Psi(\xi)$, i.e. $s = \psi - \Psi(\xi)$. So $s_0$ must be chosen with the range of $\psi$ in mind. If we choose $\Psi(\xi) = 0$, then $\psi = s$. For the first example in § 5, the parameter $\psi$ is the ratio of two exponential means, and hence $\psi > 0$; therefore $s_0$ should be chosen as a positive value.

Above we chose the initial values $\{\Psi(\xi), \Lambda(\xi), Z(\xi)\} = (0, \xi, -1)$. We now show that the numerical solving procedure is suitable for any initial values.

• Suppose the initial condition for the ordinary differential equation (7) is $\lambda(s_0, \xi) = \Lambda(\xi)$, with $\Lambda(\xi)$ an arbitrary known function rather than $\Lambda(\xi) = \xi$ as above. The solution formula for $z$ is the same as stated in (9). When solving (7), the initial value should be chosen as $\Lambda(\lambda^*)$, no longer $\lambda^*$, if $z$ is evaluated at $(\psi^*, \lambda^*)$.

• If the initial condition of (6) is $\psi(s_0, \xi) = \Psi(\xi)$, then the solution of equation (6) is $\psi = s + \Psi(\xi)$. Therefore, equation (7) becomes
$$\frac{d\lambda}{ds} = \frac{b\{s + \Psi(\xi), \lambda\}}{a\{s + \Psi(\xi), \lambda\}}.$$
Let $\tilde{s} = s + \Psi(\xi)$.
By a simple change of variables, (7) becomes
$$\frac{d\lambda}{d\tilde{s}} = \frac{b(\tilde{s}, \lambda)}{a(\tilde{s}, \lambda)}.$$
Equation (8) becomes
$$\frac{dz}{d\tilde{s}} = \frac{d[\psi\{\tilde{s} - \Psi(\xi), \xi\}, \lambda\{\tilde{s} - \Psi(\xi), \xi\}]}{a[\psi\{\tilde{s} - \Psi(\xi), \xi\}, \lambda\{\tilde{s} - \Psi(\xi), \xi\}]}$$
with $z\{\tilde{s}_0 - \Psi(\xi), \xi\} = Z(\xi)$, noting that $\tilde{s}_0 = s_0 + \Psi(\xi)$. The solution for $z$ is then given by
$$z(\tilde{s}, \xi) = Z(\xi) + \int_{s_0 - \Psi(\xi)}^{\tilde{s} - \Psi(\xi)} \frac{d\{\psi(s', \xi), \lambda(s', \xi)\}}{a\{\psi(s', \xi), \lambda(s', \xi)\}} \, ds'. \qquad (11)$$
That is, the value of the prior at a given point under the initial condition $\psi(s_0, \xi) = \Psi(\xi)$ is obtained by translating the interval of integration used when $\Psi(\xi) = 0$ by $\Psi(\xi)$.

• Suppose the initial condition for (8) is $z(s_0, \xi) = Z(\xi)$, with $Z(\cdot)$ a known function. This case is even simpler: one only needs to plug the value of $Z(\xi)$ into (9).

Therefore, the suggested numerical solving procedure is suitable for any initial values.

In the above, both the parameter of interest and the nuisance parameter are scalars. With dimension 2, it is relatively easy to understand the first-order partial differential equation solving procedure from a geometric point of view, since one can draw the initial conditions and the solution surface in three-dimensional space. In Zhang (2008), the solving procedure was extended to multiple nuisance parameters, while keeping the parameter of interest a scalar. The procedure in higher dimensions is similar to that for two-dimensional model parameters. However, when $d > 2$, implementing the procedure can be computationally intensive. Also, if there are no explicit expressions for the coefficients in the original first-order partial differential equation, numerical implementation may be more difficult.

4. DiCiccio and Martin's approximations

The likelihood ratio test is widely used in statistical inference.
The signed root of the likelihood ratio statistic is
$$R = \mathrm{sgn}(\hat{\psi} - \psi_0)[2\{l(\hat{\omega}) - l(\psi_0, \hat{\lambda}_0)\}]^{1/2},$$
where $l(\omega)$ is the log-likelihood function for the unknown parameter vector $\omega$ and $\hat{\lambda}_0$ is shorthand for $\hat{\omega}_{\psi_0}$, the constrained maximum likelihood estimator of $\omega$. The standard normal approximation to the distribution of $R$ typically has error of order $O(n^{-1/2})$, and $R$ can be used to construct approximate confidence limits for $\psi$ having coverage error of order $O(n^{-1/2})$.

Using matching priors, DiCiccio & Martin (1993) proposed tail probability approximations of order $O(n^{-1})$. The approximations are saddlepoint approximations that involve Bayesian methods. The approximations of DiCiccio & Martin (1993) can be expressed in the Barndorff-Nielsen (1980) format
$$\Phi\{R + R^{-1}\log(T/R)\}, \qquad (12)$$
and the Lugannani & Rice (1980) format
$$\Phi(R) + \phi(R)(R^{-1} - T^{-1}), \qquad (13)$$
where $\Phi$ is the standard normal distribution function and $T$ is defined as
$$T = \frac{l_\psi(\psi_0, \hat{\lambda}_0)\, |{-l_{\lambda\lambda}(\psi_0, \hat{\lambda}_0)}|^{1/2}\, \pi(\hat{\omega})}{|{-l_{\omega\omega}(\hat{\omega})}|^{1/2}\, \pi(\psi_0, \hat{\lambda}_0)}. \qquad (14)$$
Here $l_\psi(\omega) = \partial l(\omega)/\partial\psi$; $l_{\omega\omega}$ is the matrix of second-order partial derivatives of $l(\omega)$ with respect to $\omega$; $l_{\lambda\lambda}(\omega)$ is the submatrix of $l_{\omega\omega}(\omega)$ corresponding to $\lambda$; and $\pi(\omega)$ is a matching prior density for $\omega = (\psi, \lambda)$ satisfying equation (1). The resulting approximations are
$$\mathrm{pr}(\psi \ge \psi_0 \mid X) \doteq \Phi\{R + R^{-1}\log(T/R)\}$$
and
$$\mathrm{pr}(\psi \ge \psi_0 \mid X) \doteq \Phi(R) + \phi(R)(R^{-1} - T^{-1}),$$
both of which have relative error of order $O(n^{-1})$. Approximate confidence limits for $\psi$ can be constructed using either (12) or (13); these confidence limits have coverage errors of order $O(n^{-1})$. To relative error of order $O_p(n^{-1})$, the variable $T$ is parametrization invariant under transformations $\omega \mapsto \{\psi, \tau(\omega)\}$.
The approximations of DiCiccio & Martin (1993) have the advantage of requiring less computational effort than the Metropolis–Hastings procedure used by Levine & Casella (2003). To calculate $T$ in (14), the matching prior needs to be evaluated at two points, $(\psi_0, \hat{\lambda}_{\psi_0})$ and $\hat{\omega}$. The initial curve can be chosen to pass through $(\psi_0, \hat{\lambda}_{\psi_0})$; that is, the solution needs to be determined only at the one point $\hat{\omega}$.

5. Examples

5.1. Ratio of two exponential means

Let $X$ and $Y$ be exponential random variables with means $\mu$ and $\nu$ respectively; the ratio of the means, $\nu/\mu$, is the parameter of interest. The parameter transformation $(\mu \to \lambda\psi^{-1/2}, \; \nu \to \lambda\psi^{1/2})$ makes the two new parameters $\psi$ and $\lambda$ orthogonal. Then $X$ and $Y$ have expectations $\lambda\psi^{-1/2}$ and $\lambda\psi^{1/2}$, respectively. Suppose we have $n$ independent replications of $(X, Y)$, and denote $\omega = (\psi, \lambda)$. The log-likelihood function is
$$l(\omega) = -n\{(\psi\bar{x} + \bar{y})/(\lambda\sqrt{\psi}) + 2\log\lambda\}.$$

Both the Barndorff-Nielsen format (12) and the Lugannani and Rice format (13) are considered. Based on these approximations, p-values can be calculated. Approximations based on the different prior density functions mentioned previously may be used to generate an approximate one-sided p-value by approximating $\mathrm{pr}(R \ge r)$, for $r$ the observed value of $R$. Approximate two-sided p-values may be calculated by approximating $2\min\{\mathrm{pr}(R \ge r), \mathrm{pr}(R < r)\}$. One- and two-sided hypothesis tests of size $\alpha$ may be constructed by rejecting the null hypothesis when the p-value is less than $\alpha$. Table 1 reports type I error probabilities from 1,000,000 rounds of simulation with $n = 10$.

In this example, the parameters $\psi$ and $\lambda$ are orthogonal. Using the simplified partial differential equation (2), $\pi(\psi, \lambda) = 1/\psi$ is an explicit solution; $\pi(\psi, \lambda) = 1/(\psi\lambda)$ is another explicit solution.
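As a check (a derivation of ours, not spelled out in the paper), the expected information for $\psi$ in this parametrization follows from $E(\bar{X}) = \lambda\psi^{-1/2}$ and $E(\bar{Y}) = \lambda\psi^{1/2}$:

```latex
i_{\psi\psi}(\psi,\lambda)
 = E\!\left\{-\frac{\partial^2 l(\omega)}{\partial\psi^2}\right\}
 = \frac{n}{2\psi^2},
\qquad i_{\psi\lambda}(\psi,\lambda) = 0,
```

so equation (2) becomes $\sqrt{2/n}\,\psi\,\partial_\psi(\log\pi) + \sqrt{2/n} = 0$, i.e. $\partial_\psi \log\pi = -1/\psi$, whose solutions are $\pi(\psi,\lambda) \propto g(\lambda)/\psi$ for arbitrary $g$. Taking $g \equiv 1$ gives $1/\psi$, and $g(\lambda) = 1/\lambda$ gives $1/(\psi\lambda)$.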
Numerical solutions were also calculated. One of the initial conditions is $\{\Psi(\xi), \Lambda(\xi), Z(\xi)\} = (0, \xi, -1)$; the resulting matching prior corresponds to the analytic solution $1/\psi$. Another numerically solved matching prior is based on the initial condition $(0, \xi, -\log\xi)$, which corresponds to the analytic solution $1/(\psi\lambda)$. From Table 1, one can see that the numerical and analytic solutions give almost identical simulation results, which confirms the validity of our numerical solution process.

Approximations (12) and (13) have a removable singularity at $R = 0$. Consequently, these and similar formulae require care when evaluated near $R = 0$. In these cases, for all but the most extreme conditioning events, the resulting conditional p-value is large enough not to imply rejection of the null hypothesis, and so these simulated data sets are treated as not implying rejection of the null hypothesis.

5.2. Logistic regression

We consider a logistic regression model with a binary response $Y$ and a single explanatory variable $X$. Let $\omega_1$ denote the unknown intercept and $\omega_2$ the unknown effect of the explanatory variable. Suppose $\omega_2$ is the parameter of interest and $\omega_1$ is the nuisance parameter. We solve for matching priors and apply DiCiccio and Martin's approximations to perform inference about $\omega_2$. Levine & Casella (2003) considered a similar example.

Let $Y_i$ be the response variable, taking binary values with success probability $p_i$, and let $X_i$ be the explanatory variable, following the uniform distribution $U(0,1)$. Suppose there are $n$ independent replications of $(X_i, Y_i)$. Fit the model
$$\log\{p_i/(1 - p_i)\} = v_i'\omega = \omega_1 + \omega_2 x_i,$$
where $v_i = (1, x_i)'$ and $\omega_2$ is the parameter of interest. Inverting the equation, we have $p_i = (1 + e^{-v_i'\omega})^{-1}$.
The log-likelihood function is
$$l(\omega; x) = \sum_{i=1}^{n} y_i \log\{p_i/(1 - p_i)\} + \sum_{i=1}^{n} \log(1 - p_i).$$
The first derivative of the log-likelihood function is $V'(y - p)$, where $V$ is the design matrix with $v_i'$ in row $i$. The second derivative of the log-likelihood function is $-V'WV$, where $W$ is a diagonal matrix with diagonal elements $p_i(1 - p_i)$, $i = 1, \ldots, n$.

Using sample size $n = 30$, we generate data satisfying the logistic regression model with $\omega_1 = -1$, $\omega_2 = 0.5$, and the explanatory variable $X$ following the uniform distribution $U(0,1)$. In this case, the parameters $\omega_1$ and $\omega_2$ are in general not orthogonal. We use the numerical procedure described in § 3 and study the performance of different initial conditions. Table 2 contains type I error probabilities for both one-sided and two-sided tests, for approximations in both the Barndorff-Nielsen and the Lugannani and Rice formats, based on 10,000 rounds of simulation. As mentioned previously, approximations (12) and (13) have a removable singularity at $R = 0$; we deal with this singularity in the same way as in § 5.1.

In the following, we give some guidance on how to change the initial condition and how to choose favorable initial conditions. The initial condition $(0, \xi, -1)$ gives type I error probabilities larger than the nominal level 0.05; that is, it tends to underestimate tail probabilities and to reject the null hypothesis too often. We want to choose initial conditions that yield a test whose type I error rate is closer to the nominal level. We adjust the initial condition when solving the partial differential equation (1), and use the Barndorff-Nielsen format of the approximation. The quantity $T$ in (14) is the only part of the approximation that involves the matching prior. For a one-sided test, when the probability is small and close to 0, $R$ and $T$ are negative.
Making $\Phi\{R + R^{-1}\log(T/R)\}$ larger is equivalent to making $T$ bigger. Note also that $Z(\xi)$ is used only in equation (9). Suppose the initial condition is $\{\Psi(\xi), \Lambda(\xi), Z(\xi)\}$. Keep the first two components, $\Psi(\xi)$ and $\Lambda(\xi)$, unchanged, and modify only the third term, $Z(\xi)$. By doing so, the integral part of equation (9) is kept unchanged and $z$ varies only with $Z(\xi)$. By changing $Z(\xi)$, we aim to adjust $T$ to be bigger. Because $T$ is negative when rejecting a hypothesis, and the matching prior appears in $T$ as a ratio, one can construct a $Z(\cdot)$ such that the ratio $\exp\{Z(\hat{\psi}, \hat{\lambda})\}/\exp\{Z(\psi_0, \hat{\lambda}_0)\}$ is smaller than 1; recall that 1 is the value of this ratio when $Z(\xi) = -1$.

Based on the above arguments, the $Z(\cdot)$ function is constructed as $Z(\xi) = -\log\{(\xi + 1)^q + 1\}$, where $q$ is a tuning parameter; for even $q$, $Z(\cdot)$ is symmetric about $\xi = -1$ and achieves its maximum there, $-1$ being the true value of the nuisance parameter in the simulation. We have thus constructed priors using knowledge of the true value of the nuisance parameter. Of course, in practice this knowledge is unavailable; one might instead use an estimator of the nuisance parameter in place of the true value. When $Z(\xi)$ increases quickly, as with $q = 2$ in Table 2, the type I error probability deviates far from the nominal level in the other direction. If a more slowly increasing function is used, the type I error performance may be better.

Unfortunately, with some choices of initial conditions, such as the last three listed in Table 2, the Lugannani and Rice format approximation may fall outside the range 0 to 1 in some cases. For example, the initial condition $[0, \xi, -\log\{(\xi + 1)^2 + 1\}]$ yielded 5 such probabilities out of 10,000 data sets.
We convert those values to 0 or 1 by $\min\{\max(p, 0), 1\}$, where $p$ is the p-value that falls outside $[0, 1]$.

For the parameter of interest $\omega_2$, we calculate credible intervals using DiCiccio and Martin's approximation in the Barndorff-Nielsen format. With initial condition $(0, \xi, -1)$, out of 1,000 generated data sets, 938 credible intervals covered the true value 0.5. With initial condition $[0, \xi, -\log\{(\xi + 1)^2/5 + 1\}]$, 954 credible intervals covered the true value 0.5.

We apply the above procedure to a real data set from Hosmer & Lemeshow (2000, Table 1.1). The response variable is a coronary heart disease indicator, $y$, and the explanatory variable is age, $x$. One hundred subjects were included in the study, i.e. $n = 100$. We fit the logistic regression model following the same definitions as above, with $\omega_1$ the unknown intercept and $\omega_2$ the effect of age on heart disease status. Using initial condition $(0, \xi, -1)$ and the Barndorff-Nielsen format approximation, the two-sided test p-value is $5.532326 \times 10^{-8}$, and the fifth and ninety-fifth posterior percentiles are 0.07 and 0.15, respectively.

6. Conclusion

Matching priors were first proposed by Welch & Peers (1963) and Peers (1965). In a general parametrization, if the parameter of interest and the nuisance parameters are not orthogonal, solving for the prior from a first-order partial differential equation is nontrivial. This paper presents a practical way to solve for the matching priors, and the procedure is suitable for all kinds of initial conditions. Matching priors can be used with the approximations of DiCiccio & Martin (1993). By choosing different initial conditions, one is able to improve the performance of DiCiccio and Martin's approximations.

References

Barndorff-Nielsen, O. E. (1980). Conditionality resolutions. Biometrika 67, 293–310.

Cox, D. R. & Reid, N. (1987). Parameter orthogonality and approximate conditional inference. J. R. Statist. Soc. Ser. B 49, 1–39.

Curran, J. (2005). Bolstad: Bolstad functions. R package version 0.2-12.

DiCiccio, T. J. & Martin, M. A. (1993). Simple modifications for signed roots of likelihood ratio statistics. J. R. Statist. Soc. Ser. B 55, 305–316.

Hindmarsh, A. C. (1983). Scientific Computing. Amsterdam: North-Holland. Ed. Stepleman, R. W. et al.; Vol. 1 of IMACS Transactions on Scientific Computation.

Hosmer, D. W. & Lemeshow, S. (2000). Applied Logistic Regression. New York: Wiley, 2nd ed.

Levine, R. A. & Casella, G. (2003). Implementing matching priors for frequentist inference. Biometrika 90, 127–137.

Lugannani, R. & Rice, S. (1980). Saddlepoint approximation for the distribution of the sum of independent random variables. Adv. Appl. Prob. 12, 475–490.

Peers, H. W. (1965). On confidence points and Bayesian probability points in the case of several parameters. J. R. Statist. Soc. Ser. B 27, 9–16.

Petzold, L. R. (1983). Automatic selection of methods for solving stiff and nonstiff systems of ordinary differential equations. SIAM J. Sci. Statist. Comp. 4, 136–48.

R Development Core Team (2007). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Rhee, H., Aris, R. & Amundson, N. (1986). First-order Partial Differential Equations: Theory and Application of Single Equations, vol. 1. Englewood Cliffs, New Jersey: Prentice-Hall.

Setzer, R. W. (2007). odesolve: Solvers for Ordinary Differential Equations. R package version 0.5-17.

Sweeting, T. J. (2005). On the implementation of local probability priors for interest parameters. Biometrika 92, 47–57.

Tibshirani, R. (1989). Noninformative priors for one parameter of many. Biometrika 76, 604–608.

Welch, B. L. & Peers, H. W. (1963). On formulae for confidence points based on integrals of weighted likelihoods. J. R. Statist. Soc. Ser. B 25, 318–329.

Zhang, J. (2008). Higher order conditional inference using parallels with approximate Bayesian techniques. Ph.D. Dissertation, Rutgers University, 40–44.

Table 1: Ratio of two exponential means: type I error probability

                                   BN Format           LR Format
Tests                            1-sided  2-sided    1-sided  2-sided
Likelihood ratio test            0.0520   0.0526     0.0520   0.0526
I.C. (0, ξ, −1)                  0.0456   0.0441     0.0456   0.0441
Analytic solution: 1/ψ           0.0456   0.0441     0.0456   0.0441
I.C. (0, ξ, −log ξ)              0.0499   0.0498     0.0499   0.0498
Analytic solution: 1/(ψλ)        0.0499   0.0498     0.0499   0.0498

* I.C. stands for initial condition.
† Results are based on 1,000,000 rounds of simulation with n = 10.
‡ Tests are of nominal type I error 0.05.

Table 2: Logistic regression: type I error probability

                                        BN Format           LR Format
Test                                  1-sided  2-sided    1-sided  2-sided
Likelihood ratio test                 0.054    0.060      0.054    0.060
I.C. (0, ξ, −1)                       0.052    0.057      0.052    0.057
I.C. [0, ξ, −log{(ξ+1)² + 1}]         0.028    0.019      0.031    0.020
I.C. [0, ξ, −log{(ξ+1)²/5 + 1}]       0.041    0.041      0.044    0.046
I.C. [0, ξ, −log{(ξ+1)²/11 + 1}]      0.045    0.048      0.046    0.050

* I.C. stands for initial condition.
† Results are based on 10,000 rounds of simulation with n = 30.
‡ Tests are of nominal type I error 0.05.
