Introducing the b-value: combining unbiased and biased estimators from a sensitivity analysis perspective
Authors: Zhexiao Lin, Peter J. Bickel, and Peng Ding
Zhexiao Lin∗, Peter J. Bickel†, and Peng Ding‡

February 19, 2026

Abstract

In empirical research, when we have multiple estimators for the same parameter of interest, a central question arises: how do we combine unbiased but less precise estimators with biased but more precise ones to improve the inference? Under this setting, the point estimation problem has attracted considerable attention. In this paper, we focus on a less studied inference question: how can we conduct valid statistical inference in such settings with unknown bias? We propose a strategy to combine unbiased and biased estimators from a sensitivity analysis perspective. We derive a sequence of confidence intervals indexed by the magnitude of the bias, which enable researchers to assess how conclusions vary with the bias levels. Importantly, we introduce the notion of the b-value, a critical value of the unknown maximum relative bias at which combining estimators does not yield a significant result. We apply this strategy to three canonical combined estimators: the precision-weighted estimator, the pretest estimator, and the soft-thresholding estimator. For each estimator, we characterize the sequence of confidence intervals and determine the bias threshold at which the conclusion changes. Based on the theory, we recommend reporting the b-value based on the soft-thresholding estimator and its associated confidence intervals, which are robust to unknown bias and achieve the lowest worst-case risk among the alternatives.

Keywords: data fusion, data integration, pretest, shrinkage estimator, soft-thresholding.

1 Introduction

In empirical research, it is common for researchers to employ different methods to estimate the same parameter of interest.
These differences may arise from the use of distinct datasets or from imposing different model assumptions on the same dataset. We motivate our paper with the following two examples of combining estimators.

Example 1.1. Randomized controlled trials (RCTs) are the gold standard for estimating treatment effects due to their ability to eliminate unmeasured confounding. However, RCTs often suffer from limited sample sizes, as large-scale experiments can be costly or infeasible. In contrast, observational data are more readily available from the target population of interest. However, estimates using observational data may be biased in estimating treatment effects due to unmeasured confounding, raising concerns about internal validity. See Brantner et al. (2023) and Colnet et al. (2024) for recent reviews on motivations and methods for combining RCTs and observational studies.

∗ Department of Statistics, University of California, Berkeley, CA 94720, USA; e-mail: zhexiaolin@berkeley.edu
† Department of Statistics, University of California, Berkeley, CA 94720, USA; e-mail: bickel@stat.berkeley.edu
‡ Department of Statistics, University of California, Berkeley, CA 94720, USA; e-mail: pengdingpku@berkeley.edu

Example 1.2. The ordinary least squares (OLS) estimator is biased in estimating the unknown parameters under the linear model when the error term is correlated with the regressor. In contrast, the instrumental variables (IV) estimator can provide unbiased estimates for the parameters of interest with a valid instrumental variable that is uncorrelated with the error term but correlated with the regressor. However, the IV estimator is usually much less precise than the OLS estimator, especially when the IV is weakly correlated with the regressor (Bound et al., 1995). In empirical studies, e.g., Angrist and Krueger (1991), researchers often report results from both OLS and IV estimators.
How to combine OLS and IV estimators is gaining increasing interest (Armstrong et al., 2025).

When the estimators are from different datasets, e.g., Example 1.1, the estimators are independent as long as the datasets are independent. When the estimators are from the same dataset but with different model assumptions, e.g., Example 1.2, the estimators are dependent in general. Given access to multiple potentially dependent estimators, some unbiased but less precise and others biased but more precise, a natural question is: How can we combine the unbiased and potentially biased estimators to improve the inference with unknown bias? From the point estimation perspective, this problem has been extensively studied (Bickel, 1984; Green and Strawderman, 1991; Giles and Giles, 1993; Chen et al., 2015; Athey et al., 2020; de Chaisemartin and D'Haultfœuille, 2020; Rosenman et al., 2023a; Gao and Yang, 2023; Yang et al., 2025). Many methods have been proposed for constructing combined estimators that perform well when the bias is small and have bounded risks when the bias is large.

From the statistical inference perspective, the problem has received considerably less attention. In this paper, we answer the following question: how can we conduct valid statistical inference after combining the estimators? The primary difficulty lies in the impossibility of characterizing the distribution of the combined estimator with unknown bias (Armstrong et al., 2025). Once information about the bias is introduced, e.g., an upper bound on its magnitude, confidence intervals for the parameter of interest become possible. In the absence of such information, we focus on the following question: How can we construct a sequence of confidence intervals for the parameter of interest across bias levels?
The sequence of confidence intervals provides a way to quantify how the level of bias impacts the uncertainty in point estimation. One important application of this sequence is in hypothesis testing. Suppose the null hypothesis is not rejected based on the unbiased but less precise estimator. Now consider a scenario where the statistical test based on the biased but more precise estimator rejects the null hypothesis. If we have prior knowledge suggesting the bias is small, incorporating the biased estimator may yield a more precise estimator that rejects the null hypothesis. In such cases, the sequence of confidence intervals enables us to address the following question: How large must the bias be to change the conclusion of a hypothesis test from rejection to non-rejection?

The idea of constructing the sequence of confidence intervals and examining how conclusions change as the assumed level of bias varies is related to sensitivity analysis in causal inference with unmeasured confounding, e.g., Cornfield et al. (1959); Rosenbaum and Rubin (1983); VanderWeele and Ding (2017). In observational studies, sensitivity analysis assesses how the causal conclusions change with respect to different degrees of unmeasured confounding by varying the sensitivity parameter (Rosenbaum, 2002; Ding and VanderWeele, 2016). Our proposed framework has a similar flavor: by indexing inference results over a continuum of bias levels, we can assess the robustness of statistical inference.

We formalize two statistical inference questions, confidence intervals and hypothesis testing, in the context of combining unbiased and biased estimators. Under regularity conditions, the estimators satisfy a joint central limit theorem. Consequently, we present our formulation in a finite-sample Gaussian setting, assuming exact normality for both the unbiased and biased estimators.
This reduction to a Gaussian model is motivated by Le Cam's classical asymptotic argument (Le Cam, 1956); for this reason, our Gaussian formulation should be viewed as an asymptotic idealization rather than a restrictive finite-sample assumption. We develop a general framework that applies to any point estimator formed by combining such estimators. Within this framework, we construct a sequence of confidence intervals indexed by the bias level, and, importantly, we introduce the notion of the b-value, a critical value of the unknown maximum relative bias at which combining estimators does not yield a significant result. We examine three canonical combined estimators: the precision-weighted estimator, the pretest estimator, and the soft-thresholding estimator. For each estimator, we derive, either analytically or numerically, the sequence of confidence intervals and the b-value. Among the three, we advocate for the soft-thresholding estimator, as it offers robustness to unknown bias compared with the precision-weighted estimator, and exhibits lower worst-case risk and more desirable properties for confidence interval construction than the pretest estimator. We provide a Python package for the proposed methods, available at https://github.com/zhexiaolin/b-value.

Notation. For a vector $a = (a_1, \ldots, a_d)^\top \in \mathbb{R}^d$, let $\|a\|_1 = \sum_{i=1}^d |a_i|$, $\|a\|_2 = (\sum_{i=1}^d a_i^2)^{1/2}$, and $|a| = (|a_1|, \ldots, |a_d|)^\top$. For vectors $a = (a_1, \ldots, a_d)^\top$ and $b = (b_1, \ldots, b_d)^\top$, let $a \odot b = (a_1 b_1, \ldots, a_d b_d)^\top$ be the Hadamard (element-wise) product, and let $a \le b$ denote $a_i \le b_i$ for all $i = 1, 2, \ldots, d$. For a scalar $b \in \mathbb{R}$ and a set $A \subset \mathbb{R}$, we write $b - A = \{b - a : a \in A\}$, which extends naturally to vectors and sets in $\mathbb{R}^d$.
We use $\Phi(\cdot)$ and $\phi(\cdot)$ to denote the cumulative distribution function and density function of the standard normal distribution, respectively. We use $\phi_{\mu,\Sigma}(\cdot)$ to denote the density function of the multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. We use $\Psi_d(\cdot\,; \lambda)$ to denote the cumulative distribution function of the noncentral chi-squared distribution with noncentrality parameter $\lambda$ and $d$ degrees of freedom. We use $c_\alpha$ to denote the upper $\alpha$ quantile of the standard normal distribution, i.e., its $(1-\alpha)$ quantile $\Phi^{-1}(1-\alpha)$.

2 Problem setup and a review of point estimation

2.1 Problem Setup

We consider the following setting:

Assumption 2.1. Suppose we observe two independent random variables: one unbiased estimator $\widehat{\tau}_0 \sim N(\tau, \sigma_0^2)$ and one biased estimator $\widehat{\tau}_1 \sim N(\tau + \Delta, \sigma_1^2)$. Here, $\tau \in \mathbb{R}$ is the unknown parameter of interest. We assume that $\sigma_0^2$ and $\sigma_1^2$ are known, whereas $\Delta$ is unknown.

In practice, both the unbiased and biased estimators for the same parameter of interest are constructed from data. Under regularity conditions, these estimators are jointly asymptotically normal with an unknown covariance matrix. As long as this covariance matrix can be consistently estimated, the problem of combining estimators reduces to an exact normality framework under Le Cam's asymptotic framework (Le Cam, 1956). Le Cam (1956) showed that a wide class of estimators and test statistics can be approximated in large samples by Gaussian experiments with known covariances. Within this framework, we treat the estimators as arising from Gaussian experiments in which the variances are replaced by their consistent estimates, and the validity of the inference procedures is preserved asymptotically. For this reason, we present our analysis in the exact normality setting.
Nevertheless, our results represent the asymptotic limits of a broad class of more general and practically relevant inference problems. The analysis can be generalized to the dependent (Section 3.5), multivariate (Section 4), and multiple-estimator (Section 5) cases.

Let $\gamma = \sigma_0^2/\sigma_1^2$ be the variance ratio between $\widehat{\tau}_0$ and $\widehat{\tau}_1$. In general, the unbiased estimator is less precise and the biased estimator is more precise. Therefore, we focus on the regime in which $\sigma_0^2$ is large and $\sigma_1^2$ is small, so that $\gamma$ is large.

Under Assumption 2.1, we address the following central question: How can we construct a sequence of two-sided confidence intervals for $\tau$ across different levels of bias ($\Delta$)?

We focus on constructing two-sided symmetric confidence intervals centered at prespecified point estimators. In other words, we do not let the point estimator itself depend on the bias level $\Delta$. An alternative approach would be to use a bias-dependent point estimator, where the estimator incorporates $\Delta$. We discuss the advantages of using prespecified point estimators in Appendix A.3.

Since $\widehat{\tau}_0$ is an unbiased estimator of $\tau$, one natural approach is to construct a confidence interval based solely on $\widehat{\tau}_0$, which is invariant to the bias $\Delta$. Given a significance level $\zeta \in (0,1)$, we can construct a standard two-sided confidence interval
$$[\widehat{\tau}_0 - \sigma_0 c_{\zeta/2},\ \widehat{\tau}_0 + \sigma_0 c_{\zeta/2}],$$
where $c_{\zeta/2}$ is the $(1-\zeta/2)$ quantile of the standard normal distribution. While valid regardless of the bias $\Delta$, this confidence interval may be too wide when $\widehat{\tau}_0$ is imprecise, since its length $2\sigma_0 c_{\zeta/2}$ scales proportionally with $\sigma_0$. As we have access to the additional biased but more precise estimator $\widehat{\tau}_1$, a natural question arises: can we shorten the confidence interval by incorporating information from $\widehat{\tau}_1$?
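A minimal sketch of this baseline interval, using only the Python standard library (`statistics.NormalDist` supplies $c_{\zeta/2}$). The function name `unbiased_ci` is hypothetical and is not taken from the authors' package.

```python
from statistics import NormalDist


def unbiased_ci(tau0_hat, sigma0, zeta=0.05):
    """Two-sided (1 - zeta) interval [tau0_hat -/+ sigma0 * c_{zeta/2}].

    Valid for any bias Delta because it ignores the biased estimator."""
    c = NormalDist().inv_cdf(1 - zeta / 2)  # c_{zeta/2}: the (1 - zeta/2) quantile
    return tau0_hat - sigma0 * c, tau0_hat + sigma0 * c


lo, hi = unbiased_ci(tau0_hat=1.0, sigma0=2.0)
print(round(hi - lo, 2))  # 7.84, i.e. 2 * sigma0 * c_{0.025}
```

Doubling $\sigma_0$ doubles the length; this is the inefficiency that borrowing information from $\widehat{\tau}_1$ aims to reduce.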
The possibility of shortening the interval motivates us to combine the two estimators $\widehat{\tau}_0$ and $\widehat{\tau}_1$ to construct a two-sided confidence interval with shorter length. Consider a generic combined estimator $\widehat{\tau} = \widehat{\tau}(\widehat{\tau}_0, \widehat{\tau}_1, \sigma_0^2, \sigma_1^2)$ of $\tau$, which depends only on the observed data $(\widehat{\tau}_0, \widehat{\tau}_1)$ and the known variances $(\sigma_0^2, \sigma_1^2)$, but not on the unknown bias $\Delta$. If $\Delta$ were known, constructing an exact two-sided confidence interval would be straightforward, as the distributions of both estimators are known, which leads to a known distribution of $\widehat{\tau}$. However, since $\Delta$ is unknown, the exact confidence interval depends on the magnitude of $\Delta$.

To analyze the problem, we impose bounds on the bias $\Delta$ and construct two-sided confidence intervals for different levels of bias. We assume that $|\Delta/\sigma_0| \le b$ for some $b \ge 0$, and study how the confidence interval based on $\widehat{\tau}$ changes as a function of the bias bound $b$. Here we focus on the relative bias $\Delta/\sigma_0$ instead of the absolute bias $\Delta$ to ensure that the parameterization is invariant to the scale of the estimators. For a given $b$, we aim to construct a two-sided confidence interval that achieves correct coverage uniformly over all $\Delta$ satisfying $|\Delta/\sigma_0| \le b$. We thus define the confidence interval below.

Definition 2.1. Given a significance level $\zeta \in (0,1)$ and a maximum relative bias $b \ge 0$, we want to construct an interval $I(b,\zeta) = I(b, \zeta, \widehat{\tau}_0, \widehat{\tau}_1, \sigma_0^2, \sigma_1^2)$ such that
$$\inf_{\Delta: |\Delta/\sigma_0| \le b} P_\Delta\big(\tau \in \widehat{\tau} - I(b,\zeta)\big) = \inf_{\Delta: |\Delta/\sigma_0| \le b} P_\Delta\big(\widehat{\tau} - \tau \in I(b,\zeta)\big) \ge 1 - \zeta, \tag{2.1}$$
where $P_\Delta$ explicitly denotes the dependence of the distribution of $\widehat{\tau}$ on $\Delta$. The confidence interval for $\tau$ based on $\widehat{\tau}$ is then given by $\widehat{\tau} - I(b,\zeta)$.

We next impose two natural monotonicity conditions on the interval $I(b,\zeta)$.

Assumption 2.2. We assume:
1. For fixed $(\widehat{\tau}_0, \widehat{\tau}_1, \sigma_0^2, \sigma_1^2)$ and $\zeta$, we have $I(b,\zeta) \subset I(b',\zeta)$ whenever $b \le b'$.
2. For fixed $(\widehat{\tau}_0, \widehat{\tau}_1, \sigma_0^2, \sigma_1^2)$ and $b$, we have $I(b,\zeta) \subset I(b,\zeta')$ whenever $\zeta \ge \zeta'$.

The first condition in Assumption 2.2 requires that as we allow greater bias in $\widehat{\tau}_1$, the confidence interval becomes wider and contains the previous intervals. The second condition requires that, to guarantee a higher coverage rate, the confidence interval must widen. A common class of intervals satisfying both conditions is the class of symmetric fixed-length centered intervals, whose length does not depend on $(\widehat{\tau}_0, \widehat{\tau}_1)$, i.e., $I(b,\zeta) = [-c(b,\zeta,\sigma_0^2,\sigma_1^2),\ c(b,\zeta,\sigma_0^2,\sigma_1^2)]$ for some $c(b,\zeta,\sigma_0^2,\sigma_1^2) \ge 0$ depending only on $b$, $\zeta$, $\sigma_0^2$, and $\sigma_1^2$. In this case, the confidence interval $\widehat{\tau} - I(b,\zeta)$ in Definition 2.1 is given by
$$[\widehat{\tau} - c(b,\zeta,\sigma_0^2,\sigma_1^2),\ \widehat{\tau} + c(b,\zeta,\sigma_0^2,\sigma_1^2)].$$

Therefore, for given $(\widehat{\tau}_0, \widehat{\tau}_1, \sigma_0^2, \sigma_1^2)$ and $\zeta$, we can regard the confidence interval $\widehat{\tau} - I(b,\zeta)$ in Definition 2.1 as a function of the bias bound $b$. Thus, to address the central question of constructing a sequence of two-sided confidence intervals for $\tau$ across different levels of bias, we compute this confidence interval for a range of values of $b$: $\{\widehat{\tau} - I(b,\zeta)\}_{b \ge 0}$.

Once we construct the sequence of confidence intervals, a natural application is hypothesis testing about the parameter $\tau$. We focus on the two-sided test:
$$H_0: \tau = 0 \quad \text{versus} \quad H_1: \tau \ne 0. \tag{2.2}$$
We generalize the discussion to one-sided tests, such as testing $\tau = 0$ versus $\tau > 0$ or testing $\tau \le 0$ versus $\tau > 0$, in Section A.2. It is straightforward to extend the two-sided test to hypotheses of the form $\tau = \tau^*$ versus $\tau \ne \tau^*$ for any given $\tau^* \in \mathbb{R}$. A common practice for the two-sided test involves constructing confidence intervals for $\tau$ and checking whether the null value 0 lies within these intervals.
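Condition (2.1) can be checked by simulation for any combined estimator and fixed half-width. The sketch below is an illustration only: it draws $(\widehat{\tau}_0, \widehat{\tau}_1)$ as in Assumption 2.1 and approximates the infimum over $|\Delta/\sigma_0| \le b$ by a minimum over a finite grid of $\Delta$ values, a discretization introduced here rather than part of the method. The name `worst_case_coverage` and its arguments are hypothetical.

```python
import random


def worst_case_coverage(estimator, half_width, b, sigma0, sigma1,
                        tau=0.0, n_grid=11, n_sims=20000, seed=0):
    """Monte Carlo approximation of the left side of (2.1) for the fixed-length
    interval [tau_hat - half_width, tau_hat + half_width].

    The infimum over |Delta / sigma0| <= b is approximated by a minimum over
    n_grid (>= 2) equally spaced values of Delta in [-b * sigma0, b * sigma0]."""
    rng = random.Random(seed)
    worst = 1.0
    for k in range(n_grid):
        delta = sigma0 * b * (2 * k / (n_grid - 1) - 1)
        hits = 0
        for _ in range(n_sims):
            tau0_hat = rng.gauss(tau, sigma0)          # unbiased: N(tau, sigma0^2)
            tau1_hat = rng.gauss(tau + delta, sigma1)  # biased: N(tau + Delta, sigma1^2)
            tau_hat = estimator(tau0_hat, tau1_hat, sigma0, sigma1)
            hits += abs(tau_hat - tau) <= half_width
        worst = min(worst, hits / n_sims)
    return worst


# Interval from the unbiased estimator alone: coverage stays near 0.95 for every b.
only_unbiased = lambda t0, t1, s0, s1: t0
print(worst_case_coverage(only_unbiased, 1.96, b=2.0, sigma0=1.0, sigma1=0.5))
```

Using $\widehat{\tau}_1$ alone with half-width $1.96\sigma_1$ instead (e.g., `lambda t0, t1, s0, s1: t1`) makes the worst-case coverage collapse once $b > 0$, which is why the half-width must grow with $b$.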
Under Assumption 2.2, the width of the confidence interval increases with the bias bound $b$. This leads to a central question when using the combined estimator $\widehat{\tau}$ with the confidence interval $\widehat{\tau} - I(b,\zeta)$: How large must the bias bound $b$ be to change the conclusion of the hypothesis test (2.2)?

By the monotonicity of $I(b,\zeta)$ in $b$ (holding $(\widehat{\tau}_0, \widehat{\tau}_1)$ and $\zeta$ fixed) from Assumption 2.2, we define the limiting interval as $I(\infty,\zeta) = \lim_{b\to\infty} I(b,\zeta)$. Then there may exist a critical value of $b$ at which the confidence interval $\widehat{\tau} - I(b,\zeta)$ starts to contain the null value tested in (2.2). We focus on cases where the critical value exists. Specifically, we consider the case when $0 \notin \widehat{\tau} - I(0,\zeta)$ but $0 \in \widehat{\tau} - I(\infty,\zeta)$. In this scenario, we define the b-value below.

Definition 2.2. Define the b-value as the critical value $b^*$ of testing $\tau = 0$ versus $\tau \ne 0$:
$$b^*(\zeta) = b^*(\zeta, \widehat{\tau}_0, \widehat{\tau}_1, \sigma_0^2, \sigma_1^2) = \inf\{b \ge 0 : 0 \in \widehat{\tau} - I(b,\zeta)\}. \tag{2.3}$$

By the monotonicity conditions in Assumption 2.2, we have $0 \notin \widehat{\tau} - I(b,\zeta)$ for $0 \le b < b^*$ and $0 \in \widehat{\tau} - I(b,\zeta)$ for $b > b^*$. Thus, we reject the null hypothesis $\tau = 0$ when $b < b^*$ and fail to reject it when $b > b^*$. The b-value $b^*$ thus represents the maximum relative bias beyond which the null hypothesis can no longer be rejected. We propose to report $b^*$ defined in (2.3) for given estimators $(\widehat{\tau}_0, \widehat{\tau}_1)$ and significance level $\zeta$.

Compared with the sensitivity analysis literature, the bias bound $b$ in our framework plays a role analogous to the sensitivity parameter. The corresponding confidence interval $\widehat{\tau} - I(b,\zeta)$ serves as a sensitivity curve, illustrating how inference changes as the bias varies.
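For the symmetric fixed-length intervals of Section 2.1, $0 \in \widehat{\tau} - I(b,\zeta)$ exactly when $|\widehat{\tau}| \le c(b,\zeta,\sigma_0^2,\sigma_1^2)$, so under Assumption 2.2 the b-value in (2.3) can be located by bisection on $b$. A minimal sketch, assuming the half-width is supplied as a nondecreasing Python function of $b$ and that a finite `b_max` stands in for $b \to \infty$ (both assumptions of this illustration); `b_value` is a hypothetical name.

```python
import math


def b_value(tau_hat, half_width, b_max=100.0, tol=1e-8):
    """b* = inf{b >= 0 : |tau_hat| <= half_width(b)}, as in (2.3), for the interval
    [tau_hat - half_width(b), tau_hat + half_width(b)].

    Assumes half_width is nondecreasing in b (Assumption 2.2). Returns 0.0 if 0 is
    already covered at b = 0 and math.inf if 0 is still excluded at b = b_max
    (the two scenarios set aside in Remark 2.1)."""
    target = abs(tau_hat)
    if half_width(0.0) >= target:
        return 0.0
    if half_width(b_max) < target:
        return math.inf
    lo, hi = 0.0, b_max            # invariant: 0 excluded at lo, covered at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if half_width(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Toy half-width c(b) = 1 + b (illustrative only): 0 enters the interval at b* = 2.
print(b_value(3.0, lambda b: 1.0 + b))
```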
In the sensitivity analysis context, the b-value $b^*$ serves a role similar to key robustness metrics in prior work: it parallels the design sensitivity by Rosenbaum (2004), the E-value by VanderWeele and Ding (2017), and the robustness value by Cinelli and Hazlett (2020).

The b-value $b^*$ provides a criterion for comparing competing strategies of confidence interval construction. Different choices of the combined estimator $\widehat{\tau}$ and different formulations of $I(b,\zeta)$ can lead to different values of $b^*$. We prefer procedures that yield a larger b-value $b^*$, since this indicates that the resulting confidence interval is more robust to potential bias in the biased estimator. This role of $b^*$ is analogous to the use of design sensitivity in Rosenbaum (2004), where it serves as a criterion for comparing different test statistics and matched designs in observational studies.

Remark 2.1. Besides the scenario of primary interest, $0 \notin \widehat{\tau} - I(0,\zeta)$ and $0 \in \widehat{\tau} - I(\infty,\zeta)$, two other less interesting scenarios may occur. First, if $0 \in \widehat{\tau} - I(0,\zeta)$, then $0 \in \widehat{\tau} - I(b,\zeta)$ for all $b \ge 0$, and we always fail to reject the null hypothesis regardless of the bias magnitude. Second, if $0 \notin \widehat{\tau} - I(\infty,\zeta)$, then $0 \notin \widehat{\tau} - I(b,\zeta)$ for all $b \ge 0$, and we always reject the null hypothesis regardless of the bias magnitude. In these two scenarios, combining the estimators or not does not change the statistical result qualitatively. Therefore, we ignore them in our discussion. Depending on the choice of $\widehat{\tau}$ and the construction of $I(b,\zeta)$, not all three scenarios may arise. We focus on the case when the b-value $b^*$ defined in (2.3) lies in the interval $(0,\infty)$.

2.2 Point estimation: a review

Before diving into the details of the method for constructing confidence intervals and obtaining the b-value, we first review the existing approaches to point estimation.
Specifically, we review three point estimators: the precision-weighted estimator, the pretest estimator, and the soft-thresholding estimator.

First, we recall the precision-weighted estimator:
$$\widehat{\tau}_{\mathrm{PW}} := \frac{\sigma_1^2}{\sigma_0^2+\sigma_1^2}\widehat{\tau}_0 + \frac{\sigma_0^2}{\sigma_0^2+\sigma_1^2}\widehat{\tau}_1 = \widehat{\tau}_0 + \frac{\sigma_0^2}{\sigma_0^2+\sigma_1^2}(\widehat{\tau}_1 - \widehat{\tau}_0) = \widehat{\tau}_0 + \frac{\gamma}{1+\gamma}(\widehat{\tau}_1 - \widehat{\tau}_0).$$
When $\Delta$ is known to be 0, the precision-weighted estimator is the maximum likelihood estimator and the best linear unbiased estimator of $\tau$. Moreover, by the classical Le Cam asymptotic decision theory (Le Cam, 1956) and classical results for the Gaussian shift model, it is asymptotically admissible and minimax under the $L_2$ risk $E[(\widehat{\tau} - \tau)^2]$ among all regular estimators under standard regularity conditions. However, $\widehat{\tau}_{\mathrm{PW}}$ is not robust to bias: its risk is large when $|\Delta|$ is large. This motivates us to consider a combined estimator that performs nearly as well as $\widehat{\tau}_{\mathrm{PW}}$ when the bias is small and is robust to unknown bias $\Delta$, ensuring that the worst-case risk $\sup_{\Delta \in \mathbb{R}} E_\Delta[(\widehat{\tau} - \tau)^2]$ remains bounded. This motivates the following two estimators.

Second, we recall the pretest estimator, which incorporates a pretest for $\Delta = 0$ versus $\Delta \ne 0$, a procedure that is commonly used (Bancroft, 1944; Wallace, 1977; Bancroft and Han, 1977; Giles and Giles, 1993). Under $\Delta = 0$, given the independence between $\widehat{\tau}_1$ and $\widehat{\tau}_0$, their difference follows
$$\widehat{\tau}_1 - \widehat{\tau}_0 \sim N(0, \sigma^2), \quad \text{where } \sigma^2 = \sigma_0^2 + \sigma_1^2.$$
For a fixed significance level $\alpha \in (0,1)$, we consider the test statistic $(\widehat{\tau}_1 - \widehat{\tau}_0)/\sigma$. Let $A = \{|\widehat{\tau}_1 - \widehat{\tau}_0| \le \sigma c_{\alpha/2}\}$ denote the event that the pretest fails to reject the null hypothesis ($\Delta = 0$). If the pretest fails to reject the null hypothesis, i.e., $|\widehat{\tau}_1 - \widehat{\tau}_0| \le \sigma c_{\alpha/2}$, the pretest estimator uses the precision-weighted estimator:
$$\widehat{\tau}_0 + \frac{\gamma}{1+\gamma}(\widehat{\tau}_1 - \widehat{\tau}_0).$$
If the pretest rejects the null hypothesis, i.e., $|\widehat{\tau}_1 - \widehat{\tau}_0| > \sigma c_{\alpha/2}$, the pretest estimator employs hard-thresholding by reverting to the unbiased estimator $\widehat{\tau}_0$. Combining the two cases, the pretest estimator is:
$$\widehat{\tau}_{\mathrm{PT}} = \widehat{\tau}_0 + \frac{\gamma}{1+\gamma}(\widehat{\tau}_1 - \widehat{\tau}_0)\mathbf{1}(A).$$

Third, we recall the soft-thresholding estimator, which ensures continuity at the pretest boundary ($|\widehat{\tau}_1 - \widehat{\tau}_0| = \sigma c_{\alpha/2}$). If the pretest fails to reject the null hypothesis, the soft-thresholding estimator also uses the precision-weighted estimator. If the pretest rejects the null hypothesis, the soft-thresholding estimator employs soft-thresholding by:
$$\widehat{\tau}_0 + \frac{\gamma}{1+\gamma}\sigma c_{\alpha/2}\,\mathrm{sign}(\widehat{\tau}_1 - \widehat{\tau}_0).$$
Combining the two cases, the soft-thresholding estimator is:
$$\widehat{\tau}_{\mathrm{ST}} = \widehat{\tau}_0 + \frac{\gamma}{1+\gamma}(\widehat{\tau}_1 - \widehat{\tau}_0)\mathbf{1}(A) + \frac{\gamma}{1+\gamma}\sigma c_{\alpha/2}\,\mathrm{sign}(\widehat{\tau}_1 - \widehat{\tau}_0)\mathbf{1}(A^c).$$

While $\widehat{\tau}_{\mathrm{PT}}$ reduces to $\widehat{\tau}_0$ with probability nearly one when the bias is large, the risk of $\widehat{\tau}_{\mathrm{PT}}$ is much higher than that of $\widehat{\tau}_{\mathrm{ST}}$ when the bias is moderate. See Bickel (1983) and Armstrong et al. (2025) for a comparison of the worst-case risks of $\widehat{\tau}_{\mathrm{PT}}$ and $\widehat{\tau}_{\mathrm{ST}}$ and the superior performance of $\widehat{\tau}_{\mathrm{ST}}$.

In both $\widehat{\tau}_{\mathrm{PT}}$ and $\widehat{\tau}_{\mathrm{ST}}$, the role of $\alpha$ differs from its usual interpretation in hypothesis testing. Rather than a Type I error rate, $\alpha$ here acts as a tuning parameter that balances the bias and variance of the point estimator. In this sense, $\alpha$ simply indexes a family of estimators, much like the penalty level in penalized regression methods (e.g., the Lasso). Choosing $\alpha$ optimally depends on $\Delta$, which is unknown in practice. In our empirical studies, we set $\alpha = 0.05$ as a default, following the standard convention, and examine how the estimator's performance varies across different bias levels.
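The three estimators translate directly into code. A sketch with the standard library only, assuming independent estimators as in Assumption 2.1; `combined_estimators` is a hypothetical helper. The value of $\mathrm{sign}(0)$ never matters, because the soft-thresholding branch is only reached when $|\widehat{\tau}_1 - \widehat{\tau}_0| > \sigma c_{\alpha/2} > 0$.

```python
import math
from statistics import NormalDist


def combined_estimators(tau0_hat, tau1_hat, sigma0, sigma1, alpha=0.05):
    """Return (tau_PW, tau_PT, tau_ST); alpha is the pretest tuning parameter."""
    gamma = sigma0**2 / sigma1**2              # variance ratio
    w = gamma / (1 + gamma)                    # weight on tau1_hat - tau0_hat
    sigma = math.sqrt(sigma0**2 + sigma1**2)   # sd of tau1_hat - tau0_hat
    c = NormalDist().inv_cdf(1 - alpha / 2)    # c_{alpha/2}
    d = tau1_hat - tau0_hat
    accept = abs(d) <= sigma * c               # event A: pretest does not reject Delta = 0
    tau_pw = tau0_hat + w * d
    tau_pt = tau_pw if accept else tau0_hat    # hard-thresholding: fall back to tau0_hat
    # Soft-thresholding: shift tau0_hat by the boundary value w * sigma * c * sign(d).
    tau_st = tau_pw if accept else tau0_hat + w * sigma * c * math.copysign(1.0, d)
    return tau_pw, tau_pt, tau_st


print(combined_estimators(0.0, 0.1, sigma0=1.0, sigma1=0.5))  # pretest accepts: all agree
print(combined_estimators(0.0, 5.0, sigma0=1.0, sigma1=0.5))  # pretest rejects
```

Evaluating just inside and just outside the boundary $|\widehat{\tau}_1 - \widehat{\tau}_0| = \sigma c_{\alpha/2}$ shows the design difference: $\widehat{\tau}_{\mathrm{ST}}$ changes continuously, while $\widehat{\tau}_{\mathrm{PT}}$ jumps from $\widehat{\tau}_{\mathrm{PW}}$ back to $\widehat{\tau}_0$.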
3 Confidence intervals, hypothesis testing, and the b-value

Given the canonical point estimators introduced in Section 2.2, we now discuss the problem of constructing confidence intervals, hypothesis testing, and the b-value.

3.1 Confidence interval based on the precision-weighted estimator

As a warm-up, we use the precision-weighted estimator $\widehat{\tau}_{\mathrm{PW}}$ to construct the sequence of confidence intervals and to illustrate how the confidence interval changes with respect to $b$. Based on the theory below, we do not recommend using the precision-weighted estimator and its corresponding confidence intervals in practice. Under Assumption 2.1, we have
$$\widehat{\tau}_{\mathrm{PW}} \sim N\big(\tau + (1+\gamma)^{-1}\gamma\Delta,\ (1+\gamma)^{-1}\sigma_0^2\big).$$
The following theorem provides the sequence of confidence intervals based on $\widehat{\tau}_{\mathrm{PW}}$ depending on $b$.

Theorem 3.1. Let $\widehat{L}_{\mathrm{PW}} = \widehat{L}_{\mathrm{PW}}(b,\zeta,\gamma) \ge 0$ denote the solution to the following equation of $L$:
$$\Phi\Big(L - \frac{\gamma}{\sqrt{1+\gamma}}b\Big) - \Phi\Big(-L - \frac{\gamma}{\sqrt{1+\gamma}}b\Big) = 1 - \zeta.$$
The solution $\widehat{L}_{\mathrm{PW}}$ always exists and is unique. The shortest symmetric centered confidence interval based on $\widehat{\tau}_{\mathrm{PW}}$ for $\tau$ satisfying (2.1) is given by
$$[\widehat{\tau}_{\mathrm{PW}} - \widehat{L}_{\mathrm{PW}}(1+\gamma)^{-1/2}\sigma_0,\ \widehat{\tau}_{\mathrm{PW}} + \widehat{L}_{\mathrm{PW}}(1+\gamma)^{-1/2}\sigma_0].$$

The quantity $\widehat{L}_{\mathrm{PW}}$ in Theorem 3.1 corresponds to the $(1-\zeta)$ quantile of the folded normal distribution $|N(\gamma b/\sqrt{1+\gamma}, 1)|$. This distribution also arises in the regression literature, where worst-case bias is incorporated into the construction of confidence intervals (Armstrong et al., 2020, 2022). In the special case when $b = 0$, we have $\widehat{L}_{\mathrm{PW}}(0,\zeta,\gamma) = c_{\zeta/2}$, which yields a confidence interval based on $\widehat{\tau}_{\mathrm{PW}}$ of length $2c_{\zeta/2}(1+\gamma)^{-1/2}\sigma_0$. For comparison, recall that the length of the confidence interval based solely on $\widehat{\tau}_0$ is $2c_{\zeta/2}\sigma_0$.
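Since the coverage $\Phi(L - a) - \Phi(-L - a)$ with $a = \gamma b/\sqrt{1+\gamma}$ is continuous and strictly increasing in $L \ge 0$, $\widehat{L}_{\mathrm{PW}}$ can be found by bisection. A sketch using only the standard library; the names `L_pw` and `ci_pw` are hypothetical, and a library root finder such as SciPy's `brentq` could be used instead of the hand-written loop.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def L_pw(b, zeta, gamma, tol=1e-10):
    """Solve Phi(L - a) - Phi(-L - a) = 1 - zeta for L >= 0 with a = gamma*b/sqrt(1+gamma):
    the (1 - zeta) quantile of the folded normal |N(a, 1)| (Theorem 3.1)."""
    a = gamma * b / math.sqrt(1 + gamma)
    coverage = lambda L: PHI(L - a) - PHI(-L - a)     # strictly increasing in L
    lo, hi = 0.0, a + NormalDist().inv_cdf(1 - zeta / 2) + 1.0  # coverage(hi) > 1 - zeta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid) >= 1 - zeta:
            hi = mid
        else:
            lo = mid
    return hi


def ci_pw(tau0_hat, tau1_hat, sigma0, sigma1, b, zeta=0.05):
    """Interval tau_PW -/+ L_pw(b) * sigma0 / sqrt(1 + gamma)."""
    gamma = sigma0**2 / sigma1**2
    tau_pw = tau0_hat + gamma / (1 + gamma) * (tau1_hat - tau0_hat)
    half = L_pw(b, zeta, gamma) * sigma0 / math.sqrt(1 + gamma)
    return tau_pw - half, tau_pw + half


for b in (0.0, 0.5, 1.0, 2.0):
    print(b, ci_pw(0.2, 1.1, sigma0=1.0, sigma1=0.5, b=b))
```

At $b = 0$ the solution equals $c_{\zeta/2}$, matching the special case noted above, and the printed intervals widen without bound as $b$ grows.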
Comparing the two lengths, the confidence interval at $b = 0$ shrinks by a factor of $(1+\gamma)^{-1/2}$ when using $\widehat{\tau}_{\mathrm{PW}}$ instead of $\widehat{\tau}_0$; this factor is small when $\gamma$ is large, i.e., when $\sigma_0^2$ is much larger than $\sigma_1^2$. This indicates that by using a more precise but potentially biased estimator, if we have prior knowledge that the bias is small, then we can achieve a much shorter confidence interval.

Given $\zeta$ and $\gamma$, $\widehat{L}_{\mathrm{PW}}(b,\zeta,\gamma)$ in Theorem 3.1 increases as $b$ increases, and goes to infinity as $b \to \infty$. Therefore, the precision-weighted estimator is not robust to unknown bias, since the confidence interval is not bounded as the bias diverges. One interesting observation is that $\widehat{L}_{\mathrm{PW}}$ depends on $\sigma_0^2$ and $\sigma_1^2$ only through the variance ratio $\gamma$. This explains why the bias $\Delta$ is scaled by $\sigma_0$ in the definition of $b$, and why the confidence interval is represented as $[\widehat{\tau}_{\mathrm{PW}} - \widehat{L}_{\mathrm{PW}}(1+\gamma)^{-1/2}\sigma_0,\ \widehat{\tau}_{\mathrm{PW}} + \widehat{L}_{\mathrm{PW}}(1+\gamma)^{-1/2}\sigma_0]$.

3.2 Confidence interval based on the pretest estimator

We now construct the confidence interval for the pretest estimator $\widehat{\tau}_{\mathrm{PT}}$. In the following lemma, we present the distribution of $\widehat{\tau}_{\mathrm{PT}} - \tau$.

Lemma 3.1. Let $Z_1, Z_2$ be two independent standard normal random variables. Then $\widehat{\tau}_{\mathrm{PT}} - \tau$ is distributed as
$$\frac{\sigma_0}{\sqrt{1+\gamma}}Z_1 + \frac{\gamma}{1+\gamma}\Delta\,\mathbf{1}\Big(\Big|Z_2 + \sqrt{\tfrac{\gamma}{1+\gamma}}\,\tfrac{\Delta}{\sigma_0}\Big| \le c_{\alpha/2}\Big) - \sqrt{\tfrac{\gamma}{1+\gamma}}\,\sigma_0 Z_2\,\mathbf{1}\Big(\Big|Z_2 + \sqrt{\tfrac{\gamma}{1+\gamma}}\,\tfrac{\Delta}{\sigma_0}\Big| > c_{\alpha/2}\Big).$$

By Lemma 3.1, the distribution of $\widehat{\tau}_{\mathrm{PT}}$ is a mixture of a normal distribution and a truncated normal distribution. We define $\widehat{L}_{\mathrm{PT}} = \widehat{L}_{\mathrm{PT}}(b,\zeta,\sigma_0^2,\sigma_1^2,\alpha)$ as the smallest length such that the confidence interval $[\widehat{\tau}_{\mathrm{PT}} - \widehat{L}_{\mathrm{PT}}(1+\gamma)^{-1/2}\sigma_0,\ \widehat{\tau}_{\mathrm{PT}} + \widehat{L}_{\mathrm{PT}}(1+\gamma)^{-1/2}\sigma_0]$ achieves correct coverage for all $\Delta$ satisfying $|\Delta/\sigma_0| \le b$, where we choose the same scale as in Theorem 3.1 for comparison.
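Lemma 3.1 can be read as a change of variables. A short calculation (ours, not quoted from the paper) gives $\mathrm{Cov}(\widehat{\tau}_{\mathrm{PW}}, \widehat{\tau}_1 - \widehat{\tau}_0) = -\sigma_0^2 + \frac{\gamma}{1+\gamma}(\sigma_0^2 + \sigma_1^2) = -\sigma_0^2 + \gamma\sigma_1^2 = 0$, so under Assumption 2.1 the standardized $\widehat{\tau}_{\mathrm{PW}}$ and the standardized difference $\widehat{\tau}_1 - \widehat{\tau}_0$ are independent standard normals and can play the roles of $Z_1$ and $Z_2$. The sketch below checks, for single simulated draws, that the lemma's expression then reproduces $\widehat{\tau}_{\mathrm{PT}} - \tau$ exactly; `check_lemma_3_1` is a hypothetical helper.

```python
import math
import random
from statistics import NormalDist


def check_lemma_3_1(tau, delta, sigma0, sigma1, alpha=0.05, seed=0):
    """Return (direct, via_lemma): tau_PT - tau computed from one simulated draw
    and from the (Z1, Z2) representation in Lemma 3.1."""
    rng = random.Random(seed)
    t0, t1 = rng.gauss(tau, sigma0), rng.gauss(tau + delta, sigma1)
    gamma = sigma0**2 / sigma1**2
    w = gamma / (1 + gamma)
    sigma = math.sqrt(sigma0**2 + sigma1**2)
    c = NormalDist().inv_cdf(1 - alpha / 2)
    d = t1 - t0
    pw = t0 + w * d
    direct = (pw if abs(d) <= sigma * c else t0) - tau   # pretest estimator
    # Change of variables assumed in this sketch:
    z2 = (d - delta) / sigma                                      # standardized difference
    z1 = (pw - tau - w * delta) * math.sqrt(1 + gamma) / sigma0   # standardized tau_PW
    r = math.sqrt(gamma / (1 + gamma))
    if abs(z2 + r * delta / sigma0) <= c:      # same event as |d| <= sigma * c
        via = sigma0 * z1 / math.sqrt(1 + gamma) + w * delta
    else:
        via = sigma0 * z1 / math.sqrt(1 + gamma) - r * sigma0 * z2
    return direct, via


print(check_lemma_3_1(tau=0.3, delta=3.0, sigma0=1.0, sigma1=0.5))
```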
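By Definition 2.1, we can formulate $\widehat{L}_{\mathrm{PT}}$ as the optimization problem
$$\widehat{L}_{\mathrm{PT}} = \widehat{L}_{\mathrm{PT}}(b,\zeta,\sigma_0^2,\sigma_1^2,\alpha) = \inf\Big\{L \ge 0 : \inf_{\Delta: |\Delta/\sigma_0| \le b} P_\Delta\big(|\widehat{\tau}_{\mathrm{PT}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0\big) \ge 1 - \zeta\Big\}. \tag{3.1}$$
However, $\widehat{L}_{\mathrm{PT}}$ generally does not admit a closed-form expression. Moreover, direct computation of $\widehat{L}_{\mathrm{PT}}$ based on (3.1) is computationally challenging since, for each $L \ge 0$, the optimization problem (3.1) involves finding the infimum of $P_\Delta(|\widehat{\tau}_{\mathrm{PT}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0)$ over all $\Delta$ satisfying $|\Delta/\sigma_0| \le b$. Nevertheless, we show that computing $\widehat{L}_{\mathrm{PT}}$ is tractable due to the following theorem:

Theorem 3.2. For any $L > 0$, $P_\Delta(|\widehat{\tau}_{\mathrm{PT}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0)$ as a function of $\Delta$ is symmetric about $\Delta = 0$. Then $\widehat{L}_{\mathrm{PT}} = \widehat{L}_{\mathrm{PT}}(b,\zeta,\gamma,\alpha)$ in (3.1) is the solution to the following equation of $L$:
$$\min_{0 \le t \le b} P_{\Delta/\sigma_0 = t}\big(|\widehat{\tau}_{\mathrm{PT}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0\big) = 1 - \zeta,$$
where
$$\begin{aligned}
P_{\Delta/\sigma_0 = t}\big(|\widehat{\tau}_{\mathrm{PT}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0\big) &= \Big[\Phi\Big(c_{\alpha/2} - \sqrt{\tfrac{\gamma}{1+\gamma}}\,t\Big) - \Phi\Big(-c_{\alpha/2} - \sqrt{\tfrac{\gamma}{1+\gamma}}\,t\Big)\Big]\Big[\Phi\Big(L - \tfrac{\gamma}{\sqrt{1+\gamma}}\,t\Big) - \Phi\Big(-L - \tfrac{\gamma}{\sqrt{1+\gamma}}\,t\Big)\Big] \\
&\quad + \int_{-\infty}^{-c_{\alpha/2} - \sqrt{\gamma/(1+\gamma)}\,t} \big[\Phi(L + \sqrt{\gamma}\,u) - \Phi(-L + \sqrt{\gamma}\,u)\big]\phi(u)\,du \\
&\quad + \int_{c_{\alpha/2} - \sqrt{\gamma/(1+\gamma)}\,t}^{\infty} \big[\Phi(L + \sqrt{\gamma}\,u) - \Phi(-L + \sqrt{\gamma}\,u)\big]\phi(u)\,du.
\end{aligned} \tag{3.2}$$
The interval $[\widehat{\tau}_{\mathrm{PT}} - \widehat{L}_{\mathrm{PT}}(1+\gamma)^{-1/2}\sigma_0,\ \widehat{\tau}_{\mathrm{PT}} + \widehat{L}_{\mathrm{PT}}(1+\gamma)^{-1/2}\sigma_0]$ is the shortest symmetric centered confidence interval based on $\widehat{\tau}_{\mathrm{PT}}$ for $\tau$ satisfying (2.1).

We have two main observations from Theorem 3.2. First, for any $L > 0$, the coverage probability $P_\Delta(|\widehat{\tau}_{\mathrm{PT}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0)$ is symmetric about $\Delta = 0$. As a result, instead of minimizing over all $\Delta$ satisfying $|\Delta/\sigma_0| \le b$ as in (3.1), we can restrict attention to the interval $0 \le \Delta/\sigma_0 \le b$. Second, Theorem 3.2 provides an explicit expression for the coverage probability $P_{\Delta/\sigma_0 = t}(|\widehat{\tau}_{\mathrm{PT}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0)$ as a function of $t$ and $L$.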
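Formula (3.2) and the search for $\widehat{L}_{\mathrm{PT}}$ can be sketched with the standard library alone. Assumptions of this illustration, not of the paper: the infinite integrals are truncated to $[-10, 10]$ and evaluated with composite Simpson quadrature, and the minimum over $0 \le t \le b$ is replaced by a minimum over an equally spaced grid, so a minimizer lying between grid points can be missed (a finer grid or a local optimizer would refine it). The helper names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()
PHI, phi = _STD.cdf, _STD.pdf


def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with n (even) panels; empty interval -> 0."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return total * h / 3


def coverage_pt(t, L, gamma, alpha=0.05, u_max=10.0):
    """Right-hand side of (3.2); the infinite integrals are truncated to [-u_max, u_max]."""
    c = _STD.inv_cdf(1 - alpha / 2)
    r, s = math.sqrt(gamma / (1 + gamma)), math.sqrt(gamma)
    m = gamma * t / math.sqrt(1 + gamma)
    accept = (PHI(c - r * t) - PHI(-c - r * t)) * (PHI(L - m) - PHI(-L - m))
    g = lambda u: (PHI(L + s * u) - PHI(-L + s * u)) * phi(u)
    reject = (simpson(g, -u_max, min(-c - r * t, u_max))
              + simpson(g, max(c - r * t, -u_max), u_max))
    return accept + reject


def L_pt(b, zeta, gamma, alpha=0.05, n_grid=21, tol=1e-6):
    """Smallest L whose worst coverage over a grid of t in [0, b] is >= 1 - zeta."""
    ts = [b * k / (n_grid - 1) for k in range(n_grid)]
    worst = lambda L: min(coverage_pt(t, L, gamma, alpha) for t in ts)
    lo, hi = 0.0, 1.0
    while worst(hi) < 1 - zeta:      # coverage increases in L: double until feasible
        lo, hi = hi, 2 * hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if worst(mid) >= 1 - zeta:
            hi = mid
        else:
            lo = mid
    return hi


print(L_pt(1.0, 0.05, gamma=4.0))
```

Even at $b = 0$ this returns a value above $c_{\zeta/2}$, because the pretest still rejects with probability $\alpha$ when $\Delta = 0$ and then falls back to the noisier $\widehat{\tau}_0$.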
The first term in (3.2) corresponds to the event that the pretest fails to reject the null hypothesis ($\Delta = 0$), in which case $\widehat{\tau}_{\mathrm{PT}}$ reduces to the precision-weighted estimator. The second and third terms in (3.2) integrate over the regions where the pretest rejects the null hypothesis, in which case $\widehat{\tau}_{\mathrm{PT}}$ reduces to the unbiased estimator. Although the probability $P_{\Delta/\sigma_0 = t}(|\widehat{\tau}_{\mathrm{PT}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0)$ admits an explicit expression, involving only one-dimensional integrals, for any $L > 0$, its minimum over $0 \le t \le b$ is not available in closed form and thus must be computed numerically. Since this probability is monotonically increasing in $L$ for $L \ge 0$, $\widehat{L}_{\mathrm{PT}}$ can be computed numerically using, for example, the bisection method, combined with numerical minimization of $P_{\Delta/\sigma_0 = t}(|\widehat{\tau}_{\mathrm{PT}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0)$ with respect to $t$ for each candidate $L$.

3.3 Confidence interval based on the soft-thresholding estimator

We now construct the confidence interval for the soft-thresholding estimator $\widehat{\tau}_{\mathrm{ST}}$. In the following lemma, we present the distribution of $\widehat{\tau}_{\mathrm{ST}} - \tau$.

Lemma 3.2. Let $Z_1, Z_2$ be two independent standard normal random variables. Then $\widehat{\tau}_{\mathrm{ST}} - \tau$ is distributed as
$$\frac{\sigma_0}{\sqrt{1+\gamma}}Z_1 + \frac{\gamma}{1+\gamma}\Delta\,\mathbf{1}\Big(\Big|Z_2 + \sqrt{\tfrac{\gamma}{1+\gamma}}\,\tfrac{\Delta}{\sigma_0}\Big| \le c_{\alpha/2}\Big) - \sqrt{\tfrac{\gamma}{1+\gamma}}\,\sigma_0\Big\{Z_2 - c_{\alpha/2}\,\mathrm{sign}\Big(Z_2 + \sqrt{\tfrac{\gamma}{1+\gamma}}\,\tfrac{\Delta}{\sigma_0}\Big)\Big\}\mathbf{1}\Big(\Big|Z_2 + \sqrt{\tfrac{\gamma}{1+\gamma}}\,\tfrac{\Delta}{\sigma_0}\Big| > c_{\alpha/2}\Big).$$

By Lemma 3.2, the distribution of $\widehat{\tau}_{\mathrm{ST}}$ is a mixture of a normal distribution and a truncated normal distribution. Suppose $|\Delta/\sigma_0| \le b$ for some $b > 0$. We seek the shortest length $\widehat{L}_{\mathrm{ST}} = \widehat{L}_{\mathrm{ST}}(b,\zeta,\sigma_0^2,\sigma_1^2,\alpha)$ such that the confidence interval $[\widehat{\tau}_{\mathrm{ST}} - \widehat{L}_{\mathrm{ST}}(1+\gamma)^{-1/2}\sigma_0,\ \widehat{\tau}_{\mathrm{ST}} + \widehat{L}_{\mathrm{ST}}(1+\gamma)^{-1/2}\sigma_0]$ achieves correct coverage uniformly over all $\Delta$ with $|\Delta/\sigma_0| \le b$.
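Lemma 3.2 can likewise be checked draw by draw. Under the substitution of the standardized $\widehat{\tau}_{\mathrm{PW}}$ for $Z_1$ and the standardized $\widehat{\tau}_1 - \widehat{\tau}_0$ for $Z_2$ (our reading of the lemma, not a quoted proof), the identity $\frac{\gamma}{1+\gamma}\sigma = \sqrt{\frac{\gamma}{1+\gamma}}\,\sigma_0$ makes the constant shift in $\widehat{\tau}_{\mathrm{ST}}$ match the $c_{\alpha/2}$ term, so the lemma's expression should equal $\widehat{\tau}_{\mathrm{ST}} - \tau$ exactly. The helper `check_lemma_3_2` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def check_lemma_3_2(tau, delta, sigma0, sigma1, alpha=0.05, seed=0):
    """Return (direct, via_lemma): tau_ST - tau computed from one simulated draw
    and from the (Z1, Z2) representation in Lemma 3.2."""
    rng = random.Random(seed)
    t0, t1 = rng.gauss(tau, sigma0), rng.gauss(tau + delta, sigma1)
    gamma = sigma0**2 / sigma1**2
    w = gamma / (1 + gamma)
    sigma = math.sqrt(sigma0**2 + sigma1**2)
    c = NormalDist().inv_cdf(1 - alpha / 2)
    d = t1 - t0
    pw = t0 + w * d
    # Direct soft-thresholding estimator.
    st = pw if abs(d) <= sigma * c else t0 + w * sigma * c * math.copysign(1.0, d)
    # Change of variables assumed in this sketch.
    z2 = (d - delta) / sigma
    z1 = (pw - tau - w * delta) * math.sqrt(1 + gamma) / sigma0
    r = math.sqrt(gamma / (1 + gamma))
    shifted = z2 + r * delta / sigma0          # equals d / sigma
    if abs(shifted) <= c:
        via = sigma0 * z1 / math.sqrt(1 + gamma) + w * delta
    else:
        via = sigma0 * z1 / math.sqrt(1 + gamma) - r * sigma0 * (
            z2 - c * math.copysign(1.0, shifted))
    return st - tau, via


print(check_lemma_3_2(tau=0.3, delta=3.0, sigma0=1.0, sigma1=0.5))
```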
By Definition 2.1, we can formulate $\hat L_{\mathrm{ST}}$ as the optimization problem
$$\hat L_{\mathrm{ST}} = \inf\Big\{L \ge 0 : \inf_{\Delta:\,|\Delta/\sigma_0| \le b} P_\Delta\big(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0\big) \ge 1-\zeta\Big\}. \tag{3.3}$$
However, $\hat L_{\mathrm{ST}}$ generally does not admit a closed-form expression. Moreover, computing $\hat L_{\mathrm{ST}}$ directly from (3.3) is challenging, since for each $L \ge 0$ it requires the infimum of $P_\Delta(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0)$ over all $\Delta$ satisfying $|\Delta/\sigma_0| \le b$. Nevertheless, the following theorem shows that $\hat L_{\mathrm{ST}}$ can be computed efficiently.

Theorem 3.3. For any $L > 0$, $P_\Delta(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0)$, as a function of $\Delta$, is symmetric about $\Delta = 0$ and monotonically decreasing in $|\Delta|$. Consequently, $\hat L_{\mathrm{ST}} = \hat L_{\mathrm{ST}}(b,\zeta,\gamma,\alpha)$ in (3.3) is the solution in $L$ of
$$P_{\Delta/\sigma_0 = b}\big(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0\big) = 1-\zeta,$$
where
$$\begin{aligned}
P_{\Delta/\sigma_0 = t}\big(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0\big)
={}& \Big[\Phi\Big(c_{\alpha/2} - \sqrt{\tfrac{\gamma}{1+\gamma}}\,t\Big) - \Phi\Big(-c_{\alpha/2} - \sqrt{\tfrac{\gamma}{1+\gamma}}\,t\Big)\Big]\Big[\Phi\Big(L - \tfrac{\gamma}{\sqrt{1+\gamma}}\,t\Big) - \Phi\Big(-L - \tfrac{\gamma}{\sqrt{1+\gamma}}\,t\Big)\Big] \\
&+ \int_{-\infty}^{-c_{\alpha/2} - \sqrt{\gamma/(1+\gamma)}\,t} \big[\Phi\big(L + \sqrt{\gamma}(u + c_{\alpha/2})\big) - \Phi\big(-L + \sqrt{\gamma}(u + c_{\alpha/2})\big)\big]\phi(u)\,du \\
&+ \int_{c_{\alpha/2} - \sqrt{\gamma/(1+\gamma)}\,t}^{\infty} \big[\Phi\big(L + \sqrt{\gamma}(u - c_{\alpha/2})\big) - \Phi\big(-L + \sqrt{\gamma}(u - c_{\alpha/2})\big)\big]\phi(u)\,du.
\end{aligned} \tag{3.4}$$
The interval $[\hat\tau_{\mathrm{ST}} - \hat L_{\mathrm{ST}}(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{ST}} + \hat L_{\mathrm{ST}}(1+\gamma)^{-1/2}\sigma_0]$ is the shortest symmetric centered confidence interval based on $\hat\tau_{\mathrm{ST}}$ for $\tau$ satisfying (2.1).

We have two main observations from Theorem 3.3. First, for any $L > 0$, the coverage probability $P_\Delta(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0)$ is symmetric about $\Delta = 0$ and, unlike for the pretest estimator, monotonically decreasing in $|\Delta|$. This monotonicity implies that the worst-case coverage over all $\Delta$ satisfying $|\Delta/\sigma_0| \le b$ in (3.3) is always attained at the boundary $\Delta/\sigma_0 = b$ (equivalently, by symmetry, $\Delta/\sigma_0 = -b$).
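Because the worst case is attained at $t = b$, computing $\hat L_{\mathrm{ST}}$ reduces to a single monotone root-finding problem in $L$. The following Python sketch evaluates (3.4) by composite Simpson quadrature truncated at $|u| \le 10$ and solves $P_{\Delta/\sigma_0 = b}(\cdot) = 1 - \zeta$ by bisection; the function names and numerical settings are illustrative assumptions rather than the paper's code.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()
U_MAX = 10.0  # truncation point for the u-integrals


def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return total * h / 3


def st_coverage(t, L, gamma, alpha):
    """Eq. (3.4): P_{Delta/sigma0 = t}(|tau_ST - tau| <= L (1 + gamma)^{-1/2} sigma0)."""
    c = N01.inv_cdf(1 - alpha / 2)
    s = sqrt(gamma / (1 + gamma)) * t
    shift = gamma / sqrt(1 + gamma) * t
    term1 = (N01.cdf(c - s) - N01.cdf(-c - s)) * (N01.cdf(L - shift) - N01.cdf(-L - shift))

    def band(v):  # P(|Z - sqrt(gamma) v| <= L) for Z ~ N(0, 1)
        return N01.cdf(L + sqrt(gamma) * v) - N01.cdf(-L + sqrt(gamma) * v)

    term2 = simpson(lambda u: band(u + c) * N01.pdf(u), -U_MAX, -c - s)
    term3 = simpson(lambda u: band(u - c) * N01.pdf(u), c - s, U_MAX)
    return term1 + term2 + term3


def L_st(b, zeta, gamma, alpha, tol=1e-8):
    """Theorem 3.3: the worst case is t = b, so solve st_coverage(b, L) = 1 - zeta in L."""
    lo, hi = 0.0, 1.0
    while st_coverage(b, hi, gamma, alpha) < 1 - zeta:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if st_coverage(b, mid, gamma, alpha) < 1 - zeta:
            lo = mid
        else:
            hi = mid
    return hi
```

For example, with $\gamma = 10$ and $\alpha = \zeta = 0.05$, `L_st(b, 0.05, 10, 0.05)` levels off at about 11.65 for large $b$, a half-width of about $3.5\sigma_0$. This limit follows from Lemma 3.2: for large $\Delta > 0$ the pretest almost always rejects, and $\hat\tau_{\mathrm{ST}} - \tau$ is approximately normal with mean $\sqrt{\gamma/(1+\gamma)}\,c_{\alpha/2}\sigma_0$ and variance $\sigma_0^2$.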
The monotonicity of the coverage probability $P_\Delta(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0)$ makes computing $\hat L_{\mathrm{ST}}$ more efficient than computing $\hat L_{\mathrm{PT}}$. Second, Theorem 3.3 provides an explicit expression for the coverage probability $P_{\Delta/\sigma_0 = t}(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0)$ as a function of $t$ and $L$. The first term in (3.4) corresponds to the event that the pretest fails to reject the null hypothesis ($\Delta = 0$), in which case $\hat\tau_{\mathrm{ST}}$ reduces to the precision-weighted estimator. The second and third terms in (3.4) integrate over the regions where the pretest rejects the null hypothesis, in which case $\hat\tau_{\mathrm{ST}}$ reduces to the unbiased estimator with a constant shift that ensures continuity at the pretest boundary. Since $P_{\Delta/\sigma_0 = b}(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1+\gamma)^{-1/2}\sigma_0)$ is monotonically increasing in $L$, $\hat L_{\mathrm{ST}}$ can be computed efficiently by, for example, the bisection method.

3.4 Comparison of the point estimators and confidence intervals

Compared with $\hat\tau_{\mathrm{PW}}$, $\hat\tau_{\mathrm{ST}}$ has bounded worst-case risk: the mean squared error of $\hat\tau_{\mathrm{ST}}$ remains bounded regardless of the magnitude of the bias, whereas the mean squared error of $\hat\tau_{\mathrm{PW}}$ grows without bound as the bias increases. Thus, $\hat\tau_{\mathrm{ST}}$ offers a more balanced compromise between efficiency and robustness: it retains efficiency comparable to $\hat\tau_{\mathrm{PW}}$ when the bias is small, yet performs comparably to the unbiased estimator when the bias is large.

Compared with $\hat\tau_{\mathrm{PT}}$, $\hat\tau_{\mathrm{ST}}$ enjoys a desirable monotonicity property: for any $L > 0$, the probability $P_\Delta(|\hat\tau_{\mathrm{ST}} - \tau| \le L)$ decreases monotonically in $|\Delta|$ (Theorem 3.3). This behavior parallels the result in Bickel (1983), which shows that the mean squared error of $\hat\tau_{\mathrm{ST}}$ increases monotonically in $|\Delta|$. This monotonicity allows us to compute the confidence interval based on $\hat\tau_{\mathrm{ST}}$ more efficiently than that based on $\hat\tau_{\mathrm{PT}}$, as shown in Theorem 3.3.
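The contrast in worst-case risk can be illustrated with a small simulation. The Python sketch below is a minimal illustration under assumed settings ($\tau = 0$ without loss of generality, a fixed seed, and hypothetical function names). It compares the exact mean squared error of $\hat\tau_{\mathrm{PW}}$, namely $\sigma_0^2/(1+\gamma) + (\gamma\Delta/(1+\gamma))^2$, with a Monte Carlo estimate for $\hat\tau_{\mathrm{ST}}$, written as $\hat\tau_0$ plus $\gamma/(1+\gamma)$ times the difference $\hat\tau_1 - \hat\tau_0$ clipped to $\pm c_{\alpha/2}(\sigma_0^2 + \sigma_1^2)^{1/2}$, which is the $K = 1$ case of the estimator in Section 5.2.

```python
import random
from math import copysign, sqrt
from statistics import NormalDist


def mse_pw(delta, sigma0, gamma):
    """Exact MSE of tau_PW: variance sigma0^2/(1+gamma) plus squared bias."""
    return sigma0 ** 2 / (1 + gamma) + (gamma / (1 + gamma) * delta) ** 2


def mse_st_mc(delta, sigma0, gamma, alpha, n=20_000, seed=0):
    """Monte Carlo MSE of the univariate soft-thresholding estimator (tau = 0)."""
    rng = random.Random(seed)
    sigma1 = sigma0 / sqrt(gamma)
    c = NormalDist().inv_cdf(1 - alpha / 2)
    bound = c * sqrt(sigma0 ** 2 + sigma1 ** 2)   # pretest threshold on |tau1 - tau0|
    w = gamma / (1 + gamma)
    total = 0.0
    for _ in range(n):
        tau0 = rng.gauss(0.0, sigma0)
        tau1 = rng.gauss(delta, sigma1)
        diff = tau1 - tau0
        clipped = diff if abs(diff) <= bound else bound * copysign(1.0, diff)
        total += (tau0 + w * clipped) ** 2      # squared error, since tau = 0
    return total / n
```

For $\gamma = 10$ and $\Delta = 10\sigma_0$, the exact MSE of $\hat\tau_{\mathrm{PW}}$ is about $82.7\sigma_0^2$, while the simulated MSE of $\hat\tau_{\mathrm{ST}}$ stays near its large-bias limit $\sigma_0^2\{1 + \gamma c_{\alpha/2}^2/(1+\gamma)\} \approx 4.49\sigma_0^2$ (a short calculation from Lemma 3.2); smaller values of $c_{\alpha/2}$ lower this limit.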
Figure 1 compares the confidence intervals based on $\hat\tau_{\mathrm{ST}}$, $\hat\tau_{\mathrm{PT}}$, $\hat\tau_{\mathrm{PW}}$, and the unbiased estimator $\hat\tau_0$ as the bias bound $b$ varies. We fix $\sigma_0^2 = 1$ and $\hat\tau_0 = 1$, and let $\zeta = 0.05$ and $\alpha = 0.05$. We set $\hat\tau_1 = 2$ and examine two scenarios: $\gamma = 10$ and $\gamma = 100$. When the bias bound $b$ is small, the confidence intervals based on $\hat\tau_{\mathrm{ST}}$, $\hat\tau_{\mathrm{PT}}$, and $\hat\tau_{\mathrm{PW}}$ are all shorter than that based on $\hat\tau_0$. Importantly, for small $b$ the interval based on $\hat\tau_{\mathrm{ST}}$ is shorter than that based on $\hat\tau_{\mathrm{PT}}$ and comparable to that based on $\hat\tau_{\mathrm{PW}}$. Furthermore, the length of the interval based on $\hat\tau_{\mathrm{ST}}$ remains bounded, whereas that based on $\hat\tau_{\mathrm{PW}}$ grows without bound as $b$ increases. Therefore, $\hat\tau_{\mathrm{ST}}$ combines the efficiency gains of precision weighting when the bias is small with robustness to large biases, making it a superior choice for inference in practice.

[Figure 1: two panels, $\gamma = 10$ (left) and $\gamma = 100$ (right), plotting the CI endpoints (PW CI, PT CI, ST CI, Unbiased CI) against the bias bound $b \in [0, 5]$.]
Figure 1: Confidence intervals against the maximum relative bias $|\Delta/\sigma_0| \le b$.

Remark 3.1 (Computing the b-value). After plotting the estimators and their confidence intervals as in Figure 1, we can read off the b-value for each estimator at the significance level $\zeta$. Since the sequence of confidence intervals differs across estimators, so do the b-values. Alternatively, we can compute the b-value directly. A naive approach applies the bisection method, exploiting the monotonicity of the confidence interval in $b$: compute the confidence interval at each candidate bias level $b$ and check whether it contains the null value. However, this procedure is computationally heavy, since for $\hat\tau_{\mathrm{PT}}$ and $\hat\tau_{\mathrm{ST}}$ computing the confidence interval at a given $b$ itself involves the bisection method.
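For $\hat\tau_{\mathrm{PW}}$, the naive nested bisection described in this remark is cheap, because each interval requires only a one-dimensional root-find. The sketch below assumes that the precision-weighted interval of Theorem 3.1 has the form $\hat\tau_{\mathrm{PW}} \pm \hat L_{\mathrm{PW}}(1+\gamma)^{-1/2}\sigma_0$, with $\hat L_{\mathrm{PW}}$ solving $\Phi(L - \gamma b/\sqrt{1+\gamma}) - \Phi(-L - \gamma b/\sqrt{1+\gamma}) = 1 - \zeta$, i.e., the $K = 1$ case of the equation in Theorem 5.1. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()


def bisect_increasing(f, lo, hi, tol=1e-10):
    """Root of an increasing function f on [lo, hi] with f(lo) < 0 <= f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi


def L_pw(b, zeta, gamma):
    """Solve Phi(L - d) - Phi(-L - d) = 1 - zeta with d = gamma b / sqrt(1 + gamma)."""
    d = gamma * b / sqrt(1 + gamma)
    return bisect_increasing(lambda L: N01.cdf(L - d) - N01.cdf(-L - d) - (1 - zeta), 0.0, d + 10.0)


def b_value_pw(tau0, tau1, sigma0, sigma1, zeta):
    """Naive b-value for tau_PW: the smallest b at which the interval first contains 0."""
    gamma = sigma0 ** 2 / sigma1 ** 2
    tau_pw = tau0 + gamma / (1 + gamma) * (tau1 - tau0)

    def gap(b):  # half-width minus |tau_PW|; increasing in b
        return L_pw(b, zeta, gamma) * sigma0 / sqrt(1 + gamma) - abs(tau_pw)

    if gap(0.0) >= 0:
        return 0.0  # not significant even when no bias is allowed
    hi = 1.0
    while gap(hi) < 0:
        hi *= 2
    return bisect_increasing(gap, 0.0, hi, tol=1e-8)
```

With the inputs of Figure 1 ($\sigma_0^2 = 1$, $\hat\tau_0 = 1$, $\hat\tau_1 = 2$, $\gamma = 10$, $\zeta = 0.05$), `b_value_pw(1.0, 2.0, 1.0, sqrt(0.1), 0.05)` returns approximately 1.554; this is our own computation under the assumed form of Theorem 3.1, not a value reported in the paper.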
In Appendix A.1, we propose a method to compute the b-value efficiently.

3.5 Generalization to the dependent case

So far we have assumed independent unbiased and biased estimators. In this section, we generalize the above discussion to the case where $\hat\tau_0$ and $\hat\tau_1$ are jointly normal with known correlation $\rho$. We assume $\rho\sigma_1 \ne \sigma_0$, which trivially holds when $\sigma_0^2 > \sigma_1^2$, or equivalently $\gamma > 1$, since $|\rho| \le 1$. The key is to construct a biased estimator that is independent of $\hat\tau_0$ without losing the information in $\hat\tau_1$. To achieve this, we define the reparametrization
$$\hat\tau_1' = \frac{\hat\tau_1 - (\rho\sigma_1/\sigma_0)\hat\tau_0}{1 - \rho\sigma_1/\sigma_0}. \tag{3.5}$$
Then $\hat\tau_0$ and $\hat\tau_1'$ are independent, with $\hat\tau_1' \sim N(\tau + \Delta', \sigma_1'^2)$, where
$$\Delta' = \frac{\Delta}{1 - \rho\sigma_1/\sigma_0}, \qquad \sigma_1'^2 = \frac{(1-\rho^2)\sigma_1^2}{(1 - \rho\sigma_1/\sigma_0)^2}.$$
Thus, the problem reduces to the independent case considered above. For the reparametrization in (3.5), we need to know how the transformation rescales the bias and how to interpret the relative bias. We can compute the confidence intervals and the b-value in the reparametrized problem with the independent unbiased estimator $\hat\tau_0$ and the transformed biased estimator $\hat\tau_1'$. Let the confidence intervals and the b-value in the transformed problem be $\hat\tau - I'(b,\zeta)$ and $b^{*\prime}$, respectively. Then the confidence intervals for the original problem are $\hat\tau - I(b,\zeta) = \hat\tau - I'(b/|1 - \rho\sigma_1/\sigma_0|, \zeta)$, and the b-value is $b^* = |1 - \rho\sigma_1/\sigma_0|\, b^{*\prime}$. We relegate the technical details to Appendix A.4.

4 Generalization to the multivariate case

In this section, we extend our framework to the multivariate case. Many applications involve vector-valued parameters rather than scalars. Consider two leading examples in causal inference. First, when a treatment has multiple levels, the parameter of interest is a vector of treatment effects across those levels.
This setting also includes factorial designs, where researchers aim to estimate factorial effects jointly. Second, when the population is partitioned into subgroups, the focus is often on subgroup treatment effects, yielding a vector of conditional average treatment effects indexed by the subgroup variable (Schwartz et al., 2026). Beyond causal inference, similar issues arise in regression settings. For instance, when combining OLS and IV estimators, the target parameter can be a vector of regression coefficients. Therefore, generalization to the multivariate case is essential for applications where researchers must make inferences about multiple parameters.

4.1 Setup

We consider the following multivariate setting:

Assumption 4.1. Suppose we observe two independent random vectors: one unbiased estimator $\hat{\boldsymbol\tau}_0 \sim N(\boldsymbol\tau, \Sigma_0)$ and one biased estimator $\hat{\boldsymbol\tau}_1 \sim N(\boldsymbol\tau + \boldsymbol\Delta, \Sigma_1)$. Here $\boldsymbol\tau \in \mathbb R^d$ is the unknown parameter of interest. We assume that $\Sigma_0$ and $\Sigma_1$ are known but $\boldsymbol\Delta$ is unknown.

As in the univariate case, we present our analysis in the exact normality setting, which asymptotically captures the essential structure of inference as in Wald (1943) and Le Cam (1956). Our goal is to construct confidence regions for $\boldsymbol\tau$ at different bias levels. Although we assume independent unbiased and biased estimators, the extension to the dependent case is straightforward; see Section 3.5.

Since $\hat{\boldsymbol\tau}_0$ is unbiased for $\boldsymbol\tau$, given a significance level $\zeta \in (0,1)$, a natural confidence region based solely on $\hat{\boldsymbol\tau}_0$ is the ellipsoid
$$\{\boldsymbol\tau \in \mathbb R^d : (\hat{\boldsymbol\tau}_0 - \boldsymbol\tau)^\top \Sigma_0^{-1} (\hat{\boldsymbol\tau}_0 - \boldsymbol\tau) \le \chi^2_{d,1-\zeta}\}, \tag{4.1}$$
where $\chi^2_{d,1-\zeta}$ is the $(1-\zeta)$ quantile of the chi-squared distribution with $d$ degrees of freedom. We then consider combining the two estimators $\hat{\boldsymbol\tau}_0$ and $\hat{\boldsymbol\tau}_1$ through a generic combined estimator $\hat{\boldsymbol\tau} = \hat{\boldsymbol\tau}(\hat{\boldsymbol\tau}_0, \hat{\boldsymbol\tau}_1, \Sigma_0, \Sigma_1)$.
We assume that $|[\Sigma^{-1/2}\boldsymbol\Delta]_j| \le b_j$ for some $b_j \ge 0$ for all $j = 1, 2, \dots, d$, and study how the confidence region changes with the maximum relative bias vector $\boldsymbol b = (b_1, b_2, \dots, b_d)^\top$. Here $\Sigma = \Sigma(\Sigma_0, \Sigma_1) \in \mathbb R^{d \times d}$ can be any fixed positive definite scaling matrix that depends on $\Sigma_0$ and $\Sigma_1$; it plays the role of $\sigma_0^2$ in the univariate case. For a given bias vector $\boldsymbol b$, different choices of $\Sigma$ correspond to different regions for the bias $\boldsymbol\Delta$. In this paper, we do not discuss how to choose the scaling matrix $\Sigma$ optimally. For simplicity, one may take $\Sigma = I_d$, where $I_d$ is the $d \times d$ identity matrix, or set $\Sigma = \Sigma_0$ as in the univariate case. Analogous to Definition 2.1 for the univariate case, the confidence region is defined below.

Definition 4.1. Given a significance level $\zeta \in (0,1)$ and the maximum relative bias vector $\boldsymbol b$ with $b_j \ge 0$ for all $j = 1, 2, \dots, d$, we seek a region $I(\boldsymbol b, \zeta) = I(\boldsymbol b, \zeta, \hat{\boldsymbol\tau}_0, \hat{\boldsymbol\tau}_1, \Sigma_0, \Sigma_1)$ such that
$$\inf_{\boldsymbol\Delta:\,|\Sigma^{-1/2}\boldsymbol\Delta| \le \boldsymbol b} P_{\boldsymbol\Delta}\big(\boldsymbol\tau \in \hat{\boldsymbol\tau} - I(\boldsymbol b, \zeta)\big) = \inf_{\boldsymbol\Delta:\,|\Sigma^{-1/2}\boldsymbol\Delta| \le \boldsymbol b} P_{\boldsymbol\Delta}\big(\hat{\boldsymbol\tau} - \boldsymbol\tau \in I(\boldsymbol b, \zeta)\big) \ge 1-\zeta, \tag{4.2}$$
where the absolute value and the inequality are understood componentwise. The confidence region based on $\hat{\boldsymbol\tau}$ is then given by $\hat{\boldsymbol\tau} - I(\boldsymbol b, \zeta)$.

We then introduce monotonicity conditions on the region $I(\boldsymbol b, \zeta)$, which generalize Assumption 2.2 to the multivariate case.

Assumption 4.2. We assume:
1. For fixed $(\hat{\boldsymbol\tau}_0, \hat{\boldsymbol\tau}_1, \Sigma_0, \Sigma_1)$ and $\zeta$, we have $I(\boldsymbol b, \zeta) \subset I(\boldsymbol b', \zeta)$ whenever $\boldsymbol b \le \boldsymbol b'$, i.e., $b_j \le b_j'$ for all $j = 1, 2, \dots, d$.
2. For fixed $(\hat{\boldsymbol\tau}_0, \hat{\boldsymbol\tau}_1, \Sigma_0, \Sigma_1)$ and $\boldsymbol b$, we have $I(\boldsymbol b, \zeta) \subset I(\boldsymbol b, \zeta')$ whenever $\zeta \ge \zeta'$.

Assume $I(\boldsymbol b, \zeta)$ satisfies the monotonicity conditions of Assumption 4.2. A common family of such regions is fixed-length centered ellipsoids: $I(\boldsymbol b, \zeta) = \{\boldsymbol h \in \mathbb R^d : \boldsymbol h^\top A^{-1} \boldsymbol h \le c(\boldsymbol b, \zeta, \Sigma_0, \Sigma_1)\}$ with some constant $c(\boldsymbol b, \zeta, \Sigma_0, \Sigma_1) \ge 0$ and a positive definite matrix $A$.
In this case, the confidence region $\hat{\boldsymbol\tau} - I(\boldsymbol b, \zeta)$ in Definition 4.1 is given by $\{\boldsymbol\tau \in \mathbb R^d : (\hat{\boldsymbol\tau} - \boldsymbol\tau)^\top A^{-1} (\hat{\boldsymbol\tau} - \boldsymbol\tau) \le c(\boldsymbol b, \zeta, \Sigma_0, \Sigma_1)\}$.

Remark 4.1. Unlike in the univariate case, constructing confidence regions in the multivariate setting is more complicated. First, in one dimension the only convex confidence set is an interval, whereas in higher dimensions there is a wide variety of admissible convex confidence regions. Although ellipsoidal regions enjoy optimality properties under specific conditions (Stein, 1962; Wald, 1949), the appropriate geometry depends on the family of contrasts and the norm used to measure uncertainty. Ellipsoidal regions based on the Mahalanobis distance (Hotelling, 1931) arise naturally under the Gaussian model, but alternative geometries have been studied in the literature (Tukey, 1949; Scheffé, 1953; Šidák, 1967). Second, the optimal choice of the center of the confidence region is not straightforward. For $d \ge 3$, Stein's phenomenon implies that recentering at shrinkage estimators such as the James–Stein estimator can yield confidence regions with smaller volume while maintaining nominal coverage (Stein, 1962; Hwang and Casella, 1982; Berger, 1985). In this section we follow one specific track: constructing ellipsoidal confidence regions under the Mahalanobis distance, centered at prespecified estimators, in parallel with our discussion of the univariate case in Section 3. We do not address optimality among all possible shapes or centers, but rather focus on this formulation for its analytical tractability and interpretability within our framework.

We then define the multivariate b-value.

Definition 4.2. Define the b-value as the critical boundary $\boldsymbol b^*$ of testing $\boldsymbol\tau = \boldsymbol 0$ versus $\boldsymbol\tau \ne \boldsymbol 0$ as
$$\boldsymbol b^*(\zeta) = \boldsymbol b^*(\zeta, \hat{\boldsymbol\tau}_0, \hat{\boldsymbol\tau}_1, \Sigma_0, \Sigma_1) = \partial\{\boldsymbol b \ge \boldsymbol 0 : \boldsymbol 0 \in \hat{\boldsymbol\tau} - I(\boldsymbol b, \zeta)\}. \tag{4.3}$$

Definition 4.2 extends the univariate notion of the b-value (Definition 2.2) to the multivariate setting. In higher dimensions $d > 1$, the set $\{\boldsymbol b \ge \boldsymbol 0 : \boldsymbol 0 \in \hat{\boldsymbol\tau} - I(\boldsymbol b, \zeta)\}$ is a convex region in the nonnegative orthant (the first quadrant when $d = 2$). The b-value is defined as the boundary of this region, $\partial\{\boldsymbol b \ge \boldsymbol 0 : \boldsymbol 0 \in \hat{\boldsymbol\tau} - I(\boldsymbol b, \zeta)\}$, which is a $(d-1)$-dimensional surface in the nonnegative orthant, e.g., a curve when $d = 2$. When $d = 1$, this boundary reduces to the single left endpoint $\inf\{b \ge 0 : 0 \in \hat\tau - I(b, \zeta)\}$, which is exactly Definition 2.2. By the monotonicity conditions in Assumption 4.2, the geometry of this boundary yields a natural decision rule: we reject the null hypothesis $\boldsymbol\tau = \boldsymbol 0$ if the bias bound vector $\boldsymbol b$ lies to the left of (below) the b-value surface, and fail to reject it if $\boldsymbol b$ lies on or to the right of the b-value surface. To compute and visualize the multivariate b-value, we can plot the estimators and their confidence regions and then read off the b-value for each estimator at the significance level $\zeta$.

4.2 Point estimation

First, we recall the precision-weighted estimator
$$\hat{\boldsymbol\tau}_{\mathrm{PW}} = (\Sigma_0^{-1} + \Sigma_1^{-1})^{-1}(\Sigma_0^{-1}\hat{\boldsymbol\tau}_0 + \Sigma_1^{-1}\hat{\boldsymbol\tau}_1) = \hat{\boldsymbol\tau}_0 + (\Sigma_0^{-1} + \Sigma_1^{-1})^{-1}\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0).$$
The precision-weighted estimator is the maximum likelihood estimator of $\boldsymbol\tau$ if $\boldsymbol\Delta$ is known to be zero. However, its risk is large when $\|\boldsymbol\Delta\|_2$ is large. Thus, we seek a combined estimator $\hat{\boldsymbol\tau}$ that performs nearly as well as $\hat{\boldsymbol\tau}_{\mathrm{PW}}$ when the bias is small and is robust to the unknown bias $\boldsymbol\Delta$, in the sense that its maximum risk $\sup_{\boldsymbol\Delta \in \mathbb R^d} E_{\boldsymbol\Delta}[\|\hat{\boldsymbol\tau} - \boldsymbol\tau\|_2^2]$ remains bounded. To develop estimators that perform well under zero bias yet remain robust to unknown bias, various estimators have been proposed in the empirical Bayes and shrinkage estimation literature (Berger, 1981; Bickel, 1984; Green and Strawderman, 1991; Green et al., 2005; Rosenman et al., 2023a, b).
In this paper, we focus on the generic pretest estimator and the generic soft-thresholding estimator, as discussed below.

Second, we recall the pretest estimator, which incorporates a pretest of $\boldsymbol\Delta = \boldsymbol 0$ versus $\boldsymbol\Delta \ne \boldsymbol 0$. Under $\boldsymbol\Delta = \boldsymbol 0$, by the independence of $\hat{\boldsymbol\tau}_1$ and $\hat{\boldsymbol\tau}_0$, their difference satisfies $\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0 \sim N(\boldsymbol 0, \Sigma_0 + \Sigma_1)$. We consider the test statistic $\|(\Sigma_0 + \Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0)\|_2^2$ and the critical value $q \ge 0$. The critical value $q$ here plays a role similar to that of the significance level $\alpha$ in Section 2.2. Let $A = \{\|(\Sigma_0 + \Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0)\|_2^2 \le q\}$ denote the event that the pretest fails to reject the null hypothesis ($\boldsymbol\Delta = \boldsymbol 0$). If the pretest fails to reject the null hypothesis, i.e., $\|(\Sigma_0 + \Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0)\|_2^2 \le q$, the pretest estimator uses the precision-weighted estimator $\hat{\boldsymbol\tau}_0 + (\Sigma_0^{-1} + \Sigma_1^{-1})^{-1}\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0)$. If the pretest rejects the null hypothesis, i.e., $\|(\Sigma_0 + \Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0)\|_2^2 > q$, the pretest estimator employs hard-thresholding by reverting to the unbiased estimator $\hat{\boldsymbol\tau}_0$. Combining the two cases, the pretest estimator is
$$\hat{\boldsymbol\tau}_{\mathrm{PT}} = \hat{\boldsymbol\tau}_0 + (\Sigma_0^{-1} + \Sigma_1^{-1})^{-1}\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0)\,\mathbf 1(A).$$

Third, we recall the soft-thresholding estimator, which ensures continuity at the pretest boundary $\|(\Sigma_0 + \Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0)\|_2^2 = q$. If the pretest fails to reject the null hypothesis, the soft-thresholding estimator also uses the precision-weighted estimator. If the pretest rejects the null hypothesis, the soft-thresholding estimator employs soft-thresholding:
$$\hat{\boldsymbol\tau}_0 + h_q^*\big(\|(\Sigma_0 + \Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0)\|_2^2\big)(\Sigma_0^{-1} + \Sigma_1^{-1})^{-1}\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0),$$
where $h_q^*(\cdot) : [q, \infty) \to [0, 1]$ is a non-increasing function with $h_q^*(q) = 1$.
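The non-rejection and rejection rules above can be computed in a few lines when both covariance matrices are diagonal, since every matrix product then acts coordinatewise. The sketch below makes that simplifying assumption (general $\Sigma_0, \Sigma_1$ would require matrix inverses and square roots), and its default $h_q^*(r) = (q/r)^{1/2}$ is a hypothetical, illustrative choice: it is non-increasing with $h_q^*(q) = 1$ and, for $d = 1$ and $q = c_{\alpha/2}^2$, reproduces the univariate clipping rule, but it is not necessarily the choice studied in Berger (1981) or Bickel (1984). The function name is also hypothetical.

```python
from math import sqrt


def mv_combined(tau0, tau1, s0, s1, q, h_star=None):
    """PW, PT and ST estimates for DIAGONAL Sigma_0 = diag(s0), Sigma_1 = diag(s1).

    h_star is a non-increasing h*_q on [q, inf) with h*_q(q) = 1; the default
    sqrt(q / r) is an illustrative choice, not one prescribed in the text.
    """
    if h_star is None:
        h_star = lambda r: sqrt(q / r)
    diff = [b - a for a, b in zip(tau0, tau1)]
    # Pretest statistic ||(Sigma_0 + Sigma_1)^{-1/2} (tau1 - tau0)||_2^2.
    stat = sum(dj ** 2 / (a + b) for dj, a, b in zip(diff, s0, s1))
    # (Sigma_0^{-1} + Sigma_1^{-1})^{-1} Sigma_1^{-1} = diag(s0 / (s0 + s1)).
    step = [a / (a + b) * dj for dj, a, b in zip(diff, s0, s1)]
    h = 1.0 if stat <= q else h_star(stat)          # h_q(stat)
    pw = [t + st for t, st in zip(tau0, step)]
    pt = pw if stat <= q else list(tau0)            # hard-thresholding
    st_est = [t + h * st for t, st in zip(tau0, step)]  # soft-thresholding
    return pw, pt, st_est
```

Just past the boundary $\|\cdot\|_2^2 = q$, the soft-thresholding output stays next to $\hat{\boldsymbol\tau}_{\mathrm{PW}}$, whereas the pretest output jumps to $\hat{\boldsymbol\tau}_0$.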
Combining the non-rejection and rejection cases, the soft-thresholding estimator is
$$\hat{\boldsymbol\tau}_{\mathrm{ST}} = \hat{\boldsymbol\tau}_0 + h_q\big(\|(\Sigma_0 + \Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0)\|_2^2\big)(\Sigma_0^{-1} + \Sigma_1^{-1})^{-1}\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1 - \hat{\boldsymbol\tau}_0),$$
where $h_q(\cdot) : [0, \infty) \to [0, 1]$ is defined as $h_q(r) = \mathbf 1(0 \le r \le q) + h_q^*(r)\,\mathbf 1(r > q)$. Here both $q$ and $h_q^*(\cdot)$ can depend on $\Sigma_0$ and $\Sigma_1$. The generic soft-thresholding estimator $\hat{\boldsymbol\tau}_{\mathrm{ST}}$ contains many classical estimators as special cases, such as the estimator studied in Berger (1981, Theorem 3) and Bickel (1984, Section 4), which generalizes the univariate soft-thresholding estimator $\hat\tau_{\mathrm{ST}}$ to the multivariate setting. We provide a detailed discussion of the relationship between the generic soft-thresholding estimator and the estimator in Berger (1981) and Bickel (1984) in Appendix A.5. In this paper, we focus on general $\Sigma_0$, $\Sigma_1$, and $h_q^*(\cdot)$.

4.3 Confidence intervals

First, we construct the confidence regions based on the precision-weighted estimator $\hat{\boldsymbol\tau}_{\mathrm{PW}}$. Under Assumption 4.1, we have $\hat{\boldsymbol\tau}_{\mathrm{PW}} \sim N\big(\boldsymbol\tau + (\Sigma_0^{-1} + \Sigma_1^{-1})^{-1}\Sigma_1^{-1}\boldsymbol\Delta,\ (\Sigma_0^{-1} + \Sigma_1^{-1})^{-1}\big)$. Using the ellipsoidal confidence region under the Mahalanobis distance induced by the covariance matrix of $\hat{\boldsymbol\tau}_{\mathrm{PW}}$, we consider confidence regions based on $\hat{\boldsymbol\tau}_{\mathrm{PW}}$ of the form
$$\{\boldsymbol\tau \in \mathbb R^d : (\hat{\boldsymbol\tau}_{\mathrm{PW}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PW}} - \boldsymbol\tau) \le M\}$$
for some $M \ge 0$. The following theorem explicitly characterizes the confidence region based on $\hat{\boldsymbol\tau}_{\mathrm{PW}}$ as a function of the bias bound $\boldsymbol b$.

Theorem 4.1. Let $\hat M_{\mathrm{PW}} = \hat M_{\mathrm{PW}}(\boldsymbol b, \zeta, \Sigma_0, \Sigma_1, \Sigma) \ge 0$ be the $(1-\zeta)$ quantile of the noncentral chi-squared distribution with $d$ degrees of freedom and noncentrality parameter
$$\sup_{\boldsymbol s \in \{\pm 1\}^d} \big\|(\Sigma_0^{-1} + \Sigma_1^{-1})^{-1/2}\Sigma_1^{-1}\Sigma^{1/2}(\boldsymbol b \odot \boldsymbol s)\big\|_2^2.$$
The confidence region for $\boldsymbol\tau$ satisfying (4.2) is given by $\{\boldsymbol\tau \in \mathbb R^d : (\hat{\boldsymbol\tau}_{\mathrm{PW}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PW}} - \boldsymbol\tau) \le \hat M_{\mathrm{PW}}\}$.
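The supremum in Theorem 4.1 runs over the $2^d$ sign vectors $\boldsymbol s$, so for small $d$ it can be found by enumeration. The following pure-Python sketch (hypothetical function names; small dense matrices stored as lists of rows; the square root $\Sigma^{1/2}$ of the scaling matrix supplied by the user, e.g. the identity when $\Sigma = I_d$) computes the noncentrality parameter. If SciPy is available, $\hat M_{\mathrm{PW}}$ can then be obtained as `scipy.stats.ncx2.ppf(1 - zeta, d, lam)`, or as `scipy.stats.chi2.ppf(1 - zeta, d)` when `lam` is 0.

```python
from itertools import product


def mat_inv(a):
    """Inverse of a small square matrix (list of rows) by Gauss-Jordan elimination."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [x / p for x in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mat_vec(a, v):
    """Matrix-vector product."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def pw_noncentrality(b, sigma0, sigma1, sigma_half):
    """Noncentrality parameter of Theorem 4.1, maximized over the 2^d vertices.

    Uses ||P^{-1/2} y||_2^2 = y^T P^{-1} y with P = Sigma_0^{-1} + Sigma_1^{-1},
    so no square root of P is needed; sigma_half is a square root of Sigma.
    """
    s1_inv = mat_inv(sigma1)
    s0_inv = mat_inv(sigma0)
    p_inv = mat_inv([[x + y for x, y in zip(r0, r1)] for r0, r1 in zip(s0_inv, s1_inv)])
    best = 0.0
    for s in product((-1.0, 1.0), repeat=len(b)):
        y = mat_vec(s1_inv, mat_vec(sigma_half, [bj * sj for bj, sj in zip(b, s)]))
        best = max(best, sum(yi * zi for yi, zi in zip(y, mat_vec(p_inv, y))))
    return best
```

In the univariate case with $\Sigma = \sigma_0^2$, the result equals $\gamma^2 b^2/(1+\gamma)$, the squared standardized bias of $\hat\tau_{\mathrm{PW}}$. Enumerating $2^d$ vertices is only practical for small $d$; this brute-force search is our illustration, not a method proposed in the paper.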
Theorem 4.1 extends the univariate result in Theorem 3.1 to the multivariate setting. In the univariate case, taking $\Sigma = \sigma_0^2$ and $\boldsymbol b = b$ reduces the confidence region in Theorem 4.1 exactly to the confidence interval in Theorem 3.1. In the univariate case, the bias constraint $\{\Delta : |\Delta/\sigma_0| \le b\}$ has only two boundary points, and the absolute bias of $\hat\tau_{\mathrm{PW}}$ is the same at both endpoints. By contrast, the multivariate bias constraint $\{\boldsymbol\Delta : |\Sigma^{-1/2}\boldsymbol\Delta| \le \boldsymbol b\}$ is a convex hyperrectangle (in the coordinates $\Sigma^{-1/2}\boldsymbol\Delta$) with $2^d$ vertices, and the maximum standardized $L_2$ norm of the bias of $\hat{\boldsymbol\tau}_{\mathrm{PW}}$ may occur at any of these vertices. Theorem 4.1 therefore characterizes the maximum possible bias of $\hat{\boldsymbol\tau}_{\mathrm{PW}}$ over the entire bias constraint. In the special case $\boldsymbol b = \boldsymbol 0$, we have $\hat M_{\mathrm{PW}} = \chi^2_{d,1-\zeta}$. Compared with the ellipsoidal confidence region based on $\hat{\boldsymbol\tau}_0$, the scaling matrix in the definition of the ellipsoid changes from $\Sigma_0^{-1}$ to $\Sigma_0^{-1} + \Sigma_1^{-1} = (\Sigma_1^{-1}\Sigma_0 + I_d)\Sigma_0^{-1}$, analogous to the efficiency gain in the univariate case, where $\Sigma_1^{-1}\Sigma_0$ plays the role of the variance ratio $\gamma$. As $\boldsymbol b$ increases, $\hat M_{\mathrm{PW}}$ increases, and it goes to infinity as $\boldsymbol b \to \infty$.

Second, we construct the confidence regions based on the generic pretest estimator $\hat{\boldsymbol\tau}_{\mathrm{PT}}$. As for the precision-weighted estimator, we consider confidence regions based on $\hat{\boldsymbol\tau}_{\mathrm{PT}}$ of the form $\{\boldsymbol\tau \in \mathbb R^d : (\hat{\boldsymbol\tau}_{\mathrm{PT}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}} - \boldsymbol\tau) \le M\}$ for some $M \ge 0$. Here, we focus on the ellipsoidal confidence region defined with the same scaling matrix, $\Sigma_0^{-1} + \Sigma_1^{-1}$, as that used for the precision-weighted estimator. This choice allows a direct comparison with the confidence region based on the precision-weighted estimator through the upper bound $M$.
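To ensure the confidence regions satisfy Definition 4.1, we need to find the minimal value $\hat M_{\mathrm{PT}} = \hat M_{\mathrm{PT}}(\boldsymbol b, \zeta, \Sigma_0, \Sigma_1, \Sigma)$ such that the region $\{\boldsymbol\tau \in \mathbb R^d : (\hat{\boldsymbol\tau}_{\mathrm{PT}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}} - \boldsymbol\tau) \le \hat M_{\mathrm{PT}}\}$ achieves correct coverage for all $\boldsymbol\Delta$ satisfying $|[\Sigma^{-1/2}\boldsymbol\Delta]_j| \le b_j$ for all $j = 1, 2, \dots, d$. We can formulate $\hat M_{\mathrm{PT}}$ as the optimization problem
$$\hat M_{\mathrm{PT}} = \inf\Big\{M \ge 0 : \inf_{\boldsymbol\Delta:\,|\Sigma^{-1/2}\boldsymbol\Delta| \le \boldsymbol b} P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{PT}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}} - \boldsymbol\tau) \le M\big) \ge 1-\zeta\Big\}.$$
We show how to compute $\hat M_{\mathrm{PT}}$ in the following theorem.

Theorem 4.2. $\hat M_{\mathrm{PT}}$ is the solution in $M$ of
$$\inf_{\boldsymbol\Delta:\,|\Sigma^{-1/2}\boldsymbol\Delta| \le \boldsymbol b} P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{PT}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}} - \boldsymbol\tau) \le M\big) = 1-\zeta,$$
with an explicit form given by
$$\begin{aligned}
&P_{\Sigma^{-1/2}\boldsymbol\Delta = \boldsymbol t}\big((\hat{\boldsymbol\tau}_{\mathrm{PT}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}} - \boldsymbol\tau) \le M\big) \\
&= \Psi_d\Big(M;\ \big\|(\Sigma_0^{-1} + \Sigma_1^{-1})^{-1/2}\Sigma_1^{-1}\Sigma^{1/2}\boldsymbol t\big\|_2^2\Big)\,\Psi_d\Big(q;\ \big\|(\Sigma_0 + \Sigma_1)^{-1/2}\Sigma^{1/2}\boldsymbol t\big\|_2^2\Big) \\
&\quad + \int_{\|\boldsymbol u\|_2^2 > q} \Psi_d\Big(M;\ \big\|(\Sigma_0^{-1} + \Sigma_1^{-1})^{-1/2}\Sigma_1^{-1}\big[\Sigma^{1/2}\boldsymbol t - (\Sigma_0 + \Sigma_1)^{1/2}\boldsymbol u\big]\big\|_2^2\Big)\,\phi_{(\Sigma_0 + \Sigma_1)^{-1/2}\Sigma^{1/2}\boldsymbol t,\ I_d}(\boldsymbol u)\,d\boldsymbol u.
\end{aligned} \tag{4.4}$$

Here $\Psi_d(\cdot\,; \lambda)$ is the CDF of the noncentral chi-squared distribution with $d$ degrees of freedom and noncentrality parameter $\lambda$, and $\phi_{\boldsymbol\mu, V}$ is the density of $N(\boldsymbol\mu, V)$. Theorem 4.2 extends the univariate result in Theorem 3.2 to the multivariate setting and provides an explicit expression for the coverage probability $P_{\Sigma^{-1/2}\boldsymbol\Delta = \boldsymbol t}\big((\hat{\boldsymbol\tau}_{\mathrm{PT}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}} - \boldsymbol\tau) \le M\big)$ as a function of $M$ and $\boldsymbol t$. The first term in (4.4) corresponds to the event that the pretest fails to reject the null hypothesis ($\boldsymbol\Delta = \boldsymbol 0$), in which case $\hat{\boldsymbol\tau}_{\mathrm{PT}}$ reduces to the precision-weighted estimator. The second term in (4.4) integrates over the region where the pretest rejects the null hypothesis, in which case $\hat{\boldsymbol\tau}_{\mathrm{PT}}$ reduces to the unbiased estimator.

Third, we construct the confidence regions based on the generic soft-thresholding estimator $\hat{\boldsymbol\tau}_{\mathrm{ST}}$.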
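Similar to the confidence regions based on the precision-weighted estimator, we consider confidence regions based on $\hat{\boldsymbol\tau}_{\mathrm{ST}}$ of the form $\{\boldsymbol\tau \in \mathbb R^d : (\hat{\boldsymbol\tau}_{\mathrm{ST}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}} - \boldsymbol\tau) \le M\}$ for some $M \ge 0$. To ensure the confidence regions satisfy Definition 4.1, we need to find the minimal value $\hat M_{\mathrm{ST}} = \hat M_{\mathrm{ST}}(\boldsymbol b, \zeta, \Sigma_0, \Sigma_1, \Sigma)$ such that the region $\{\boldsymbol\tau \in \mathbb R^d : (\hat{\boldsymbol\tau}_{\mathrm{ST}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}} - \boldsymbol\tau) \le \hat M_{\mathrm{ST}}\}$ achieves correct coverage for all $\boldsymbol\Delta$ satisfying $|[\Sigma^{-1/2}\boldsymbol\Delta]_j| \le b_j$ for all $j = 1, 2, \dots, d$. We can formulate $\hat M_{\mathrm{ST}}$ as the optimization problem
$$\hat M_{\mathrm{ST}} = \inf\Big\{M \ge 0 : \inf_{\boldsymbol\Delta:\,|\Sigma^{-1/2}\boldsymbol\Delta| \le \boldsymbol b} P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{ST}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}} - \boldsymbol\tau) \le M\big) \ge 1-\zeta\Big\}.$$
We show that $\hat M_{\mathrm{ST}}$ can be computed efficiently in the following theorem.

Theorem 4.3. $\hat M_{\mathrm{ST}}$ is the solution in $M$ of
$$\inf_{\boldsymbol\Delta:\,\Sigma^{-1/2}\boldsymbol\Delta = \boldsymbol b \odot \boldsymbol s,\ \boldsymbol s \in \{-1, 1\}^d} P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{ST}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}} - \boldsymbol\tau) \le M\big) = 1-\zeta,$$
with an explicit form given by
$$\begin{aligned}
&P_{\Sigma^{-1/2}\boldsymbol\Delta = \boldsymbol t}\big((\hat{\boldsymbol\tau}_{\mathrm{ST}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}} - \boldsymbol\tau) \le M\big) \\
&= \Psi_d\Big(M;\ \big\|(\Sigma_0^{-1} + \Sigma_1^{-1})^{-1/2}\Sigma_1^{-1}\Sigma^{1/2}\boldsymbol t\big\|_2^2\Big)\,\Psi_d\Big(q;\ \big\|(\Sigma_0 + \Sigma_1)^{-1/2}\Sigma^{1/2}\boldsymbol t\big\|_2^2\Big) \\
&\quad + \int_{\|\boldsymbol u\|_2^2 > q} \Psi_d\Big(M;\ \big\|(\Sigma_0^{-1} + \Sigma_1^{-1})^{-1/2}\Sigma_1^{-1}\big[\Sigma^{1/2}\boldsymbol t - \big(1 - h_q^*(\|\boldsymbol u\|_2^2)\big)(\Sigma_0 + \Sigma_1)^{1/2}\boldsymbol u\big]\big\|_2^2\Big)\,\phi_{(\Sigma_0 + \Sigma_1)^{-1/2}\Sigma^{1/2}\boldsymbol t,\ I_d}(\boldsymbol u)\,d\boldsymbol u.
\end{aligned} \tag{4.5}$$

Theorem 4.3 extends the univariate result in Theorem 3.3 to the multivariate setting and provides an explicit expression for the coverage probability $P_{\Sigma^{-1/2}\boldsymbol\Delta = \boldsymbol t}\big((\hat{\boldsymbol\tau}_{\mathrm{ST}} - \boldsymbol\tau)^\top(\Sigma_0^{-1} + \Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}} - \boldsymbol\tau) \le M\big)$ as a function of $M$ and $\boldsymbol t$. The first term in (4.5) corresponds to the event that the pretest fails to reject the null hypothesis ($\boldsymbol\Delta = \boldsymbol 0$), in which case $\hat{\boldsymbol\tau}_{\mathrm{ST}}$ reduces to the precision-weighted estimator.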
The second term in (4.5) integrates over the region where the pretest rejects the null hypothesis, in which case $\hat{\boldsymbol\tau}_{\mathrm{ST}}$ reduces to the unbiased estimator with a nonconstant shift that ensures continuity at the pretest boundary. As in the univariate case, the monotonicity of the coverage probability makes the computation of $\hat M_{\mathrm{ST}}$ more efficient than that of $\hat M_{\mathrm{PT}}$: for any $M > 0$, the coverage probability is minimized at the boundary of the bias region, namely at one of the vertices $\boldsymbol\Delta$ satisfying $\Sigma^{-1/2}\boldsymbol\Delta = \boldsymbol b \odot \boldsymbol s$ for some $\boldsymbol s \in \{-1, 1\}^d$. Using Theorems 4.1–4.3, we can construct confidence regions based on $\hat{\boldsymbol\tau}_{\mathrm{PW}}$, $\hat{\boldsymbol\tau}_{\mathrm{PT}}$, and $\hat{\boldsymbol\tau}_{\mathrm{ST}}$ for various bias bounds $\boldsymbol b$ and significance levels $\zeta$.

5 Generalization to multiple estimators

In this section, we extend our framework to the setting with multiple estimators. In empirical research, analysts often face the challenge of integrating evidence from datasets of heterogeneous quality. A leading example in causal inference is combining an RCT with multiple observational studies. While the RCT provides unbiased but often noisy estimates, observational studies can offer much larger sample sizes but are subject to hidden bias due to unmeasured confounding. Beyond causal inference, similar problems occur when synthesizing results from multiple registries, surveys, or administrative databases, where no single source is sufficient on its own. We present our theory below.

5.1 Setup

We consider the following setting with one unbiased estimator and $K$ potentially biased estimators. The framework also covers multiple unbiased estimators of the same parameter: we can first combine all the unbiased estimators into a single unbiased estimator with smaller variance using the precision-weighted estimator, and then apply our framework to combine this aggregated unbiased estimator with the multiple biased estimators.
Assumption 5.1. Suppose we observe $K + 1$ independent random variables: one unbiased estimator $\hat\tau_0 \sim N(\tau, \sigma_0^2)$ and $K$ potentially biased estimators $\hat\tau_j \sim N(\tau + \Delta_j, \sigma_j^2)$ for $j = 1, \dots, K$. Here $\tau \in \mathbb R$ is the unknown parameter of interest. We assume that $\sigma_0^2, \sigma_1^2, \dots, \sigma_K^2$ are known, whereas $\Delta_1, \dots, \Delta_K$ are unknown.

Let $\gamma_j = \sigma_0^2/\sigma_j^2$ be the variance ratio between the unbiased estimator and the $j$-th biased estimator, and let $\boldsymbol\gamma = (\gamma_1, \dots, \gamma_K)^\top$ be the vector of variance ratios. We consider a generic combined estimator $\hat\tau = \hat\tau(\hat\tau_0, \hat\tau_1, \dots, \hat\tau_K, \sigma_0^2, \sigma_1^2, \dots, \sigma_K^2)$. We assume that $|\Delta_j/\sigma_0| \le b_j$ for some $b_j \ge 0$ for all $j = 1, \dots, K$, and study how the confidence interval changes with the maximum relative bias vector $\boldsymbol b = (b_1, \dots, b_K)^\top$. Let $\boldsymbol\Delta = (\Delta_1, \dots, \Delta_K)^\top$ be the vector of unknown biases. Analogous to Definition 2.1 for the univariate case, the confidence interval is defined below.

Definition 5.1. Given a significance level $\zeta \in (0,1)$ and the maximum relative bias vector $\boldsymbol b$ with $b_j \ge 0$ for all $j = 1, \dots, K$, we want to construct an interval $I(\boldsymbol b, \zeta) = I(\boldsymbol b, \zeta, \hat\tau_0, \hat\tau_1, \dots, \hat\tau_K, \sigma_0^2, \sigma_1^2, \dots, \sigma_K^2)$ such that
$$\inf_{\boldsymbol\Delta:\,|\boldsymbol\Delta/\sigma_0| \le \boldsymbol b} P_{\boldsymbol\Delta}\big(\tau \in \hat\tau - I(\boldsymbol b, \zeta)\big) = \inf_{\boldsymbol\Delta:\,|\boldsymbol\Delta/\sigma_0| \le \boldsymbol b} P_{\boldsymbol\Delta}\big(\hat\tau - \tau \in I(\boldsymbol b, \zeta)\big) \ge 1-\zeta. \tag{5.1}$$
The confidence interval for $\tau$ based on $\hat\tau$ is then given by $\hat\tau - I(\boldsymbol b, \zeta)$.

We then introduce monotonicity conditions on the interval $I(\boldsymbol b, \zeta)$, which generalize Assumption 2.2 to the multiple estimators setting.

Assumption 5.2. We assume:
1. For fixed $(\hat\tau_0, \hat\tau_1, \dots, \hat\tau_K, \sigma_0^2, \sigma_1^2, \dots, \sigma_K^2)$ and $\zeta$, we have $I(\boldsymbol b, \zeta) \subset I(\boldsymbol b', \zeta)$ whenever $\boldsymbol b \le \boldsymbol b'$, i.e., $b_j \le b_j'$ for all $j = 1, \dots, K$.
2. For fixed $(\hat\tau_0, \hat\tau_1, \dots, \hat\tau_K, \sigma_0^2, \sigma_1^2, \dots, \sigma_K^2)$ and $\boldsymbol b$, we have $I(\boldsymbol b, \zeta) \subset I(\boldsymbol b, \zeta')$ whenever $\zeta \ge \zeta'$.

Assume $I(\boldsymbol b, \zeta)$ satisfies the monotonicity conditions of Assumption 5.2. A common family of such intervals is fixed-length centered intervals: $I(\boldsymbol b, \zeta) = [-c(\boldsymbol b, \zeta, \sigma_0^2, \sigma_1^2, \dots, \sigma_K^2),\ c(\boldsymbol b, \zeta, \sigma_0^2, \sigma_1^2, \dots, \sigma_K^2)]$ with some constant $c(\boldsymbol b, \zeta, \sigma_0^2, \sigma_1^2, \dots, \sigma_K^2) \ge 0$. In this case, the confidence interval $\hat\tau - I(\boldsymbol b, \zeta)$ in Definition 5.1 is given by $[\hat\tau - c(\boldsymbol b, \zeta, \sigma_0^2, \dots, \sigma_K^2),\ \hat\tau + c(\boldsymbol b, \zeta, \sigma_0^2, \dots, \sigma_K^2)]$.

We then define the b-value with multiple estimators.

Definition 5.2. Define the b-value as the critical boundary $\boldsymbol b^*$ of testing $\tau = 0$ versus $\tau \ne 0$ as
$$\boldsymbol b^*(\zeta) = \boldsymbol b^*(\zeta, \hat\tau_0, \hat\tau_1, \dots, \hat\tau_K, \sigma_0^2, \sigma_1^2, \dots, \sigma_K^2) = \partial\{\boldsymbol b \ge \boldsymbol 0 : 0 \in \hat\tau - I(\boldsymbol b, \zeta)\}. \tag{5.2}$$

Definition 5.2 extends the univariate notion of the b-value (Definition 2.2) to the multiple estimators setting. As with the multivariate b-value in Definition 4.2, when $K > 1$ the set $\{\boldsymbol b \ge \boldsymbol 0 : 0 \in \hat\tau - I(\boldsymbol b, \zeta)\}$ is a convex region in the nonnegative orthant, and the b-value is the boundary of this region, which is a $(K-1)$-dimensional surface. See Definition 4.2 and the subsequent discussion for more details.

5.2 Point estimation

First, we recall the precision-weighted estimator
$$\hat\tau_{\mathrm{PW}} = \frac{\sigma_0^{-2}}{\sigma_0^{-2} + \sum_{\ell=1}^K \sigma_\ell^{-2}}\,\hat\tau_0 + \sum_{j=1}^K \frac{\sigma_j^{-2}}{\sigma_0^{-2} + \sum_{\ell=1}^K \sigma_\ell^{-2}}\,\hat\tau_j = \hat\tau_0 + \sum_{j=1}^K \frac{\gamma_j}{1 + \|\boldsymbol\gamma\|_1}(\hat\tau_j - \hat\tau_0).$$
If $\Delta_j = 0$ for all $j = 1, \dots, K$, the precision-weighted estimator is the maximum likelihood estimator and the best linear unbiased estimator of $\tau$.
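The two expressions for $\hat\tau_{\mathrm{PW}}$ above are algebraically identical, which is easy to confirm numerically. In the sketch below (function name hypothetical), `tau` lists $\hat\tau_0, \dots, \hat\tau_K$ and `sig2` lists $\sigma_0^2, \dots, \sigma_K^2$; the function also returns the variance $(1 + \|\boldsymbol\gamma\|_1)^{-1}\sigma_0^2$, which equals the reciprocal of the total precision.

```python
def pw_multi(tau, sig2):
    """Precision-weighted estimator with K biased estimators, computed two ways.

    tau = [tau_0, tau_1, ..., tau_K]; sig2 = [sigma_0^2, sigma_1^2, ..., sigma_K^2].
    """
    prec = [1.0 / v for v in sig2]
    direct = sum(p * t for p, t in zip(prec, tau)) / sum(prec)   # inverse-variance weighting
    gamma = [sig2[0] / v for v in sig2[1:]]                       # gamma_j = sigma_0^2 / sigma_j^2
    g1 = sum(gamma)                                               # ||gamma||_1 (all gamma_j > 0)
    anchored = tau[0] + sum(g / (1 + g1) * (t - tau[0]) for g, t in zip(gamma, tau[1:]))
    variance = sig2[0] / (1 + g1)                                 # = 1 / sum(prec)
    return direct, anchored, variance
```

For example, `pw_multi([1.0, 2.0, 0.5], [1.0, 0.1, 0.25])` gives $23/15$ for both forms and variance $1/15$.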
Moreover, by classical Wald–Le Cam asymptotic decision theory (Wald, 1949; Le Cam, 1956), the precision-weighted estimator is asymptotically admissible and minimax under the $L_2$ risk $E[(\hat\tau - \tau)^2]$ among all regular estimators under standard regularity conditions.

Second, we introduce a pretest estimator. We pretest $\Delta_j = 0$ separately for each $j$ using $\hat\tau_j$ and the unbiased estimator $\hat\tau_0$, and combine those $\hat\tau_j$'s for which the pretest fails to reject the null. Here we assume for simplicity that the pretests share the same significance level $\alpha$; the generalization to different significance levels is straightforward. Since $\hat\tau_j - \hat\tau_0 \sim N(\Delta_j, \sigma_0^2 + \sigma_j^2)$ and $\sigma_0^2 + \sigma_j^2 = (1 + \gamma_j^{-1})\sigma_0^2$, let $A_j = \{|\hat\tau_j - \hat\tau_0| \le (1 + \gamma_j^{-1})^{1/2}\sigma_0 c_{\alpha/2}\}$ denote the event that the pretest fails to reject the null hypothesis ($\Delta_j = 0$). The pretest estimator is
$$\hat\tau_{\mathrm{PT}} = \hat\tau_0 + \sum_{j=1}^K \frac{\gamma_j}{1 + \|\boldsymbol\gamma\|_1}(\hat\tau_j - \hat\tau_0)\,\mathbf 1(A_j).$$

Third, we introduce a soft-thresholding estimator, which ensures continuity at the pretest boundary. Instead of dropping a biased estimator entirely when its pretest rejects, soft-thresholding replaces its contribution with a constant shift, equal to its value at the pretest boundary, which makes the estimator continuous there. The soft-thresholding estimator is
$$\hat\tau_{\mathrm{ST}} = \hat\tau_0 + \sum_{j=1}^K \frac{\gamma_j}{1 + \|\boldsymbol\gamma\|_1}\Big[(\hat\tau_j - \hat\tau_0)\,\mathbf 1(A_j) + (1 + \gamma_j^{-1})^{1/2}\sigma_0 c_{\alpha/2}\,\mathrm{sign}(\hat\tau_j - \hat\tau_0)\,\mathbf 1(A_j^c)\Big].$$
By the definitions of the pretest estimator and the soft-thresholding estimator, when the pretests fail to reject for all $j = 1, \dots, K$, both estimators reduce to the precision-weighted estimator. When $K = 1$, i.e., there is only one biased estimator, both reduce to the estimators introduced in Section 2.2. As discussed in Section 2.2, $\alpha$ is a tuning parameter.
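Both component-wise estimators are simple to compute. The Python sketch below (hypothetical function name; inputs as in the definitions above, with a common pretest level $\alpha$) returns $\hat\tau_{\mathrm{PT}}$ and $\hat\tau_{\mathrm{ST}}$; when every pretest fails to reject, both coincide with $\hat\tau_{\mathrm{PW}}$.

```python
from math import copysign, sqrt
from statistics import NormalDist


def pt_st_multi(tau, sig2, alpha):
    """Pretest and soft-thresholding estimators with K biased estimators.

    tau = [tau_0, tau_1, ..., tau_K]; sig2 = [sigma_0^2, ..., sigma_K^2].
    Each tau_j is pretested against tau_0 at the common level alpha.
    """
    c = NormalDist().inv_cdf(1 - alpha / 2)       # c_{alpha/2}
    sigma0 = sqrt(sig2[0])
    gamma = [sig2[0] / v for v in sig2[1:]]
    g1 = sum(gamma)
    pt = st = tau[0]
    for g, t in zip(gamma, tau[1:]):
        diff = t - tau[0]
        thr = sqrt(1 + 1 / g) * sigma0 * c       # (1 + gamma_j^{-1})^{1/2} sigma_0 c_{alpha/2}
        w = g / (1 + g1)
        if abs(diff) <= thr:                      # event A_j: keep tau_j in both estimators
            pt += w * diff
            st += w * diff
        else:                                     # A_j^c: PT drops tau_j, ST adds a constant shift
            st += w * thr * copysign(1.0, diff)
    return pt, st
```

With a single biased estimator far from $\hat\tau_0$, for instance `pt_st_multi([0.0, 10.0], [1.0, 0.1], 0.05)`, the pretest estimator returns $\hat\tau_0 = 0$, while the soft-thresholding estimator returns the boundary value $c_{\alpha/2}/\sqrt{1.1} \approx 1.87$.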
The pretest and soft-thresholding estimators above are not the only ways to combine $\hat\tau_0$ and $\hat\tau_1, \ldots, \hat\tau_K$. For example, one could apply shrinkage or soft-thresholding directly between the precision-weighted estimator $\hat\tau_{\mathrm{PW}}$ and the unbiased estimator $\hat\tau_0$, rather than at the level of individual biased components. We focus on thresholding at the component level because it admits a transparent interpretation in terms of testing and controlling each bias component $\Delta_j$ separately, which aligns naturally with our sensitivity analysis perspective. In principle, our framework can accommodate other shrinkage schemes, but we do not pursue them here.

5.3 Confidence intervals

First, we construct the confidence intervals based on the precision-weighted estimator $\hat\tau_{\mathrm{PW}}$. Under Assumption 5.1,
$$\hat\tau_{\mathrm{PW}} \sim \mathrm{N}\big(\tau + (1 + \|\boldsymbol\gamma\|_1)^{-1}\langle\boldsymbol\gamma, \boldsymbol\Delta\rangle,\ (1 + \|\boldsymbol\gamma\|_1)^{-1}\sigma_0^2\big).$$
The following theorem gives the confidence intervals based on $\hat\tau_{\mathrm{PW}}$ as a function of the bias bound $\mathbf{b}$.

Theorem 5.1. Let $\hat L_{\mathrm{PW}} = \hat L_{\mathrm{PW}}(\mathbf{b}, \zeta, \boldsymbol\gamma) \ge 0$ denote the solution in $L$ to
$$\Phi\Big(L - \frac{\langle\boldsymbol\gamma, \mathbf{b}\rangle}{\sqrt{1 + \|\boldsymbol\gamma\|_1}}\Big) - \Phi\Big(-L - \frac{\langle\boldsymbol\gamma, \mathbf{b}\rangle}{\sqrt{1 + \|\boldsymbol\gamma\|_1}}\Big) = 1 - \zeta.$$
The solution $\hat L_{\mathrm{PW}}$ always exists and is unique. The shortest symmetric centered confidence interval based on $\hat\tau_{\mathrm{PW}}$ for $\tau$ satisfying (5.1) is
$$[\hat\tau_{\mathrm{PW}} - \hat L_{\mathrm{PW}}(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PW}} + \hat L_{\mathrm{PW}}(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0].$$

Theorem 5.1 extends the single estimator result in Theorem 3.1 to the multiple estimators setting. In the single estimator case, the bias constraint $\{\Delta : |\Delta/\sigma_0| \le b\}$ has only two boundary points, and the absolute bias of $\hat\tau_{\mathrm{PW}}$ is the same at both. By contrast, with multiple estimators the bias constraint $\{\boldsymbol\Delta : |\boldsymbol\Delta/\sigma_0| \le \mathbf{b}\}$ (componentwise) is a hyperrectangle with $2^K$ vertices, and the maximum absolute bias of $\hat\tau_{\mathrm{PW}}$ is attained only at the two vertices $\boldsymbol\Delta/\sigma_0 = \pm\mathbf{b}$.
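The equation in Theorem 5.1 depends on $\mathbf{b}$ and $\boldsymbol\gamma$ only through the scalar $m = \langle\boldsymbol\gamma, \mathbf{b}\rangle/\sqrt{1 + \|\boldsymbol\gamma\|_1}$, and its left-hand side, the probability that $|\mathrm{N}(m, 1)| \le L$, is increasing in $L$, so bisection suffices. A minimal standard-library sketch; `l_pw` and `ci_pw` are hypothetical names, and the bracket $[0, m + 10]$ assumes $\zeta$ is not astronomically small.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def l_pw(b, gamma, zeta=0.05, iters=200):
    """Solve Phi(L - m) - Phi(-L - m) = 1 - zeta for L >= 0 (Theorem 5.1)."""
    m = sum(g * bj for g, bj in zip(gamma, b)) / math.sqrt(1.0 + sum(gamma))
    lo, hi = 0.0, m + 10.0  # coverage is 0 at L = 0 and essentially 1 at L = m + 10
    for _ in range(iters):  # fixed iteration count avoids float-precision stalls
        mid = 0.5 * (lo + hi)
        if PHI(mid - m) - PHI(-mid - m) < 1.0 - zeta:
            lo = mid
        else:
            hi = mid
    return hi


def ci_pw(tau_pw, sigma0, b, gamma, zeta=0.05):
    """Interval tau_pw -/+ L_PW (1 + ||gamma||_1)^{-1/2} sigma_0."""
    half_width = l_pw(b, gamma, zeta) * sigma0 / math.sqrt(1.0 + sum(gamma))
    return tau_pw - half_width, tau_pw + half_width


print(l_pw([0.0, 0.0], [4.0, 4.0]))  # no bias allowed: the usual two-sided normal critical value
```

With $\mathbf{b} = \mathbf{0}$ the solution is the two-sided normal critical value, and for large $m$ it approaches $m$ plus the one-sided critical value.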
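Theorem 5.1 therefore characterizes the maximum possible bias of $\hat\tau_{\mathrm{PW}}$ over the entire bias constraint.

Second, we construct the confidence intervals based on the pretest estimator $\hat\tau_{\mathrm{PT}}$. We seek the shortest length $\hat L_{\mathrm{PT}} = \hat L_{\mathrm{PT}}(\mathbf{b}, \zeta, \boldsymbol\gamma, \alpha)$ such that the confidence interval $[\hat\tau_{\mathrm{PT}} - \hat L_{\mathrm{PT}}(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PT}} + \hat L_{\mathrm{PT}}(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0]$ achieves correct coverage uniformly over all $\boldsymbol\Delta$ with $|\Delta_j/\sigma_0| \le b_j$ for all $j = 1, \ldots, K$. We can formulate $\hat L_{\mathrm{PT}}$ as the optimization problem
$$\hat L_{\mathrm{PT}} = \inf\Big\{L \ge 0 : \inf_{\boldsymbol\Delta : |\boldsymbol\Delta/\sigma_0| \le \mathbf{b}} \mathrm{P}_{\boldsymbol\Delta}\big(|\hat\tau_{\mathrm{PT}} - \tau| \le L(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\big) \ge 1 - \zeta\Big\}. \tag{5.3}$$
The following theorem shows how to compute $\hat L_{\mathrm{PT}}$.

Theorem 5.2. $\hat L_{\mathrm{PT}} = \hat L_{\mathrm{PT}}(\mathbf{b}, \zeta, \boldsymbol\gamma, \alpha)$ in (5.3) is the solution in $L$ to
$$\inf_{\boldsymbol\Delta : |\boldsymbol\Delta/\sigma_0| \le \mathbf{b}} \mathrm{P}_{\boldsymbol\Delta}\big(|\hat\tau_{\mathrm{PT}} - \tau| \le L(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\big) = 1 - \zeta,$$
where the coverage probability has the explicit form
$$\mathrm{P}_{\boldsymbol\Delta/\sigma_0 = \mathbf{t}}\big(|\hat\tau_{\mathrm{PT}} - \tau| \le L(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\big) = \int_{\mathbb{R}^K}\Big[\Phi\Big(L - \frac{\langle\boldsymbol\gamma, \mathbf{t} - \mathbf{u}'\rangle}{\sqrt{1 + \|\boldsymbol\gamma\|_1}}\Big) - \Phi\Big(-L - \frac{\langle\boldsymbol\gamma, \mathbf{t} - \mathbf{u}'\rangle}{\sqrt{1 + \|\boldsymbol\gamma\|_1}}\Big)\Big]\phi_{\mathbf{t}, \mathbf{V}}(\mathbf{u})\,\mathrm{d}\mathbf{u},$$
where $\phi_{\mathbf{t}, \mathbf{V}}$ is the density of $\mathrm{N}(\mathbf{t}, \mathbf{V})$; for any $\mathbf{u} \in \mathbb{R}^K$, $\mathbf{u}' \in \mathbb{R}^K$ is defined by $u'_j = u_j\,\mathbb{1}\big(|u_j| > (1 + \gamma_j^{-1})^{1/2}c_{\alpha/2}\big)$ for all $j = 1, \ldots, K$; and $V_{ij} = 1 + \gamma_i^{-1}\mathbb{1}(i = j)$ for $i, j = 1, \ldots, K$. Then $[\hat\tau_{\mathrm{PT}} - \hat L_{\mathrm{PT}}(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PT}} + \hat L_{\mathrm{PT}}(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0]$ is the shortest symmetric centered confidence interval based on $\hat\tau_{\mathrm{PT}}$ for $\tau$ satisfying (5.1).

Theorem 5.2 extends the single estimator result in Theorem 3.2 to the multiple estimators setting. It gives an explicit expression for the coverage probability $\mathrm{P}_{\boldsymbol\Delta/\sigma_0 = \mathbf{t}}(|\hat\tau_{\mathrm{PT}} - \tau| \le L(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0)$ as a function of $\mathbf{t}$ and $L$.

Third, we construct the confidence intervals based on the soft-thresholding estimator $\hat\tau_{\mathrm{ST}}$.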
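We seek the shortest length $\hat L_{\mathrm{ST}} = \hat L_{\mathrm{ST}}(\mathbf{b}, \zeta, \boldsymbol\gamma, \alpha)$ such that the confidence interval $[\hat\tau_{\mathrm{ST}} - \hat L_{\mathrm{ST}}(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{ST}} + \hat L_{\mathrm{ST}}(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0]$ achieves correct coverage uniformly over all $\boldsymbol\Delta$ with $|\Delta_j/\sigma_0| \le b_j$ for all $j = 1, \ldots, K$. We can formulate $\hat L_{\mathrm{ST}}$ as the optimization problem
$$\hat L_{\mathrm{ST}} = \inf\Big\{L \ge 0 : \inf_{\boldsymbol\Delta : |\boldsymbol\Delta/\sigma_0| \le \mathbf{b}} \mathrm{P}_{\boldsymbol\Delta}\big(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\big) \ge 1 - \zeta\Big\}. \tag{5.4}$$
The following theorem shows that $\hat L_{\mathrm{ST}}$ can be computed efficiently.

Theorem 5.3. For any $L > 0$, $\mathrm{P}_{\boldsymbol\Delta}(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0)$ is symmetric about $\boldsymbol\Delta = \mathbf{0}$ and monotonically decreasing in $|\Delta_j|$ for all $j = 1, \ldots, K$. Then $\hat L_{\mathrm{ST}} = \hat L_{\mathrm{ST}}(\mathbf{b}, \zeta, \boldsymbol\gamma, \alpha)$ in (5.4) is the solution in $L$ to
$$\mathrm{P}_{\boldsymbol\Delta/\sigma_0 = \mathbf{b}}\big(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\big) = 1 - \zeta,$$
where the coverage probability has the explicit form
$$\mathrm{P}_{\boldsymbol\Delta/\sigma_0 = \mathbf{t}}\big(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\big) = \int_{\mathbb{R}^K}\Big[\Phi\Big(L - \frac{\langle\boldsymbol\gamma, \mathbf{t} - \mathbf{u}'\rangle}{\sqrt{1 + \|\boldsymbol\gamma\|_1}}\Big) - \Phi\Big(-L - \frac{\langle\boldsymbol\gamma, \mathbf{t} - \mathbf{u}'\rangle}{\sqrt{1 + \|\boldsymbol\gamma\|_1}}\Big)\Big]\phi_{\mathbf{t}, \mathbf{V}}(\mathbf{u})\,\mathrm{d}\mathbf{u},$$
where, for any $\mathbf{u} \in \mathbb{R}^K$, $\mathbf{u}' \in \mathbb{R}^K$ is defined by $u'_j = \big[u_j - (1 + \gamma_j^{-1})^{1/2}c_{\alpha/2}\,\mathrm{sign}(u_j)\big]\,\mathbb{1}\big(|u_j| > (1 + \gamma_j^{-1})^{1/2}c_{\alpha/2}\big)$ for all $j = 1, \ldots, K$, and $V_{ij} = 1 + \gamma_i^{-1}\mathbb{1}(i = j)$ for $i, j = 1, \ldots, K$. Then $[\hat\tau_{\mathrm{ST}} - \hat L_{\mathrm{ST}}(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{ST}} + \hat L_{\mathrm{ST}}(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0]$ is the shortest symmetric centered confidence interval based on $\hat\tau_{\mathrm{ST}}$ for $\tau$ satisfying (5.1).

Theorem 5.3 extends the single estimator result in Theorem 3.3 to the multiple estimators setting and gives an explicit expression for the coverage probability $\mathrm{P}_{\boldsymbol\Delta/\sigma_0 = \mathbf{t}}(|\hat\tau_{\mathrm{ST}} - \tau| \le L(1 + \|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0)$ as a function of $\mathbf{t}$ and $L$. As in the single estimator case, the monotonicity makes computing $\hat L_{\mathrm{ST}}$ more efficient than computing $\hat L_{\mathrm{PT}}$.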
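The $K$-dimensional integrals in Theorems 5.2 and 5.3 are expectations over $\mathbf{u} \sim \mathrm{N}(\mathbf{t}, \mathbf{V})$, so one rough way to evaluate them is Monte Carlo. The sketch below is our illustration, not the authors' implementation. It samples $\mathbf{u}$ as a shared standard normal plus independent $\mathrm{N}(0, \gamma_j^{-1})$ terms, which reproduces $\mathbf{V}$; it applies the PT or ST map $\mathbf{u} \mapsto \mathbf{u}'$; and, using the worst-case vertex $\mathbf{t} = \mathbf{b}$ from Theorem 5.3, it bisects for $\hat L_{\mathrm{ST}}$. Reusing the same draws for every $L$ keeps the estimated coverage monotone in $L$. For $\hat L_{\mathrm{PT}}$ one would additionally have to minimize over the bias box, which this sketch does not do. Function names, sample sizes, and seeds are hypothetical choices.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf


def base_draws(K, n, seed=0):
    """n common-random-number draws of (z_0, z_1, ..., z_K), i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(K + 1)] for _ in range(n)]


def coverage(L, t, gamma, draws, rule, alpha=0.05):
    """Monte Carlo estimate of the integral in Theorem 5.2 (rule="pt") or 5.3 (rule="st")."""
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    thr = [math.sqrt(1.0 + 1.0 / g) * c for g in gamma]  # (1 + 1/gamma_j)^{1/2} c_{alpha/2}
    scale = math.sqrt(1.0 + sum(gamma))
    total = 0.0
    for z in draws:
        s = 0.0
        for j, (t_j, g) in enumerate(zip(t, gamma)):
            # u ~ N(t, V), V = 1 1^T + diag(1/gamma): shared z_0 plus independent z_j
            u = t_j + z[0] + z[j + 1] / math.sqrt(g)
            if abs(u) > thr[j]:
                u_prime = u if rule == "pt" else u - math.copysign(thr[j], u)
            else:
                u_prime = 0.0
            s += g * (t_j - u_prime)
        s /= scale  # <gamma, t - u'> / sqrt(1 + ||gamma||_1)
        total += PHI(L - s) - PHI(-L - s)
    return total / len(draws)


def l_st(b, gamma, draws, zeta=0.05, alpha=0.05, iters=40):
    """Solve coverage(L, t=b) = 1 - zeta by bisection, using the vertex t = b."""
    lo, hi = 0.0, 1.0
    while coverage(hi, b, gamma, draws, "st", alpha) < 1.0 - zeta:
        hi *= 2.0  # expand the bracket until the target coverage is reached
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if coverage(mid, b, gamma, draws, "st", alpha) < 1.0 - zeta:
            lo = mid
        else:
            hi = mid
    return hi


draws = base_draws(K=2, n=5000, seed=1)
print(l_st([1.0, 1.0], [4.0, 4.0], draws))  # hypothetical gamma and bias bound
```

As a sanity check, with a very small $\alpha$ the pretests essentially never reject, and both rules reproduce the precision-weighted coverage from Theorem 5.1.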
For any $L > 0$, the coverage probability is minimized at the vertices $\boldsymbol\Delta$ satisfying $\boldsymbol\Delta/\sigma_0 = \pm\mathbf{b}$.

6 Empirical studies

In this section, we use the example of Angrist and Krueger (1991), who studied the effect of years of schooling on earnings, to demonstrate how our framework works in practice. Angrist and Krueger (1991) used quarter of birth as an instrument to obtain the IV estimate. They also reported the OLS estimate, which would be biased in the presence of endogeneity. We consider both the OLS estimator and the IV estimator. The OLS estimator of the return to schooling is potentially biased because schooling decisions are endogenous: unobserved factors may influence both schooling and earnings. By contrast, the IV estimator uses quarter of birth as a source of exogenous variation, yielding a consistent estimate of the causal effect under standard IV assumptions. We take the IV estimator as the unbiased estimator $\hat\tau_0$ and the OLS estimator as the biased estimator $\hat\tau_1$. Using these two estimators, we construct the three combined estimators: the precision-weighted estimator $\hat\tau_{\mathrm{PW}}$, the pretest estimator $\hat\tau_{\mathrm{PT}}$, and the soft-thresholding estimator $\hat\tau_{\mathrm{ST}}$. Figure 2 plots the confidence intervals of the IV, OLS, and combined estimators against the maximum relative bias $|\Delta/\sigma_0| \le b$.

[Figure 2 placeholder. Two panels, "All Men" and "Black Men"; horizontal axis: bias bound b (0 to 4); vertical axis: CI endpoints; legend: Unbiased CI (IV only), PW CI, PT CI, ST CI.]

Figure 2: Confidence intervals against the maximum relative bias $|\Delta/\sigma_0| \le b$.

In both the full sample (black and white men) and the subsample (black men), the standard error of the OLS estimator is much smaller than that of the IV estimator. Moreover, although the OLS and IV estimators are computed from the same sample, their correlation is very low.
Therefore, the new biased estimator constructed following Section 3.5 is nearly identical to the original biased estimator (OLS). The combined estimators behave as expected. When the bias bound is small, $\hat\tau_{\mathrm{PW}}$, $\hat\tau_{\mathrm{PT}}$, and $\hat\tau_{\mathrm{ST}}$ all yield confidence intervals substantially shorter than that of the unbiased estimator (IV), reflecting the efficiency gains from incorporating the more precise OLS estimator. Notably, when the bias bound is small, the confidence interval of $\hat\tau_{\mathrm{ST}}$ is nearly identical to that of $\hat\tau_{\mathrm{PW}}$ and much shorter than that of $\hat\tau_{\mathrm{PT}}$. When the bias bound becomes large, the confidence intervals of $\hat\tau_{\mathrm{ST}}$ and $\hat\tau_{\mathrm{PT}}$ have nearly the same length. These behaviors arise because the OLS estimator is far more precise than the IV estimator.

However, the full sample (black and white men) and the subsample (black men) differ. In the full sample, the sample size is very large, and the IV estimator is precise enough that the null hypothesis of zero returns to schooling can be rejected using IV alone. In contrast, the subsample of black men is much smaller, so the IV estimator has a much larger standard error. As a result, the IV-based confidence interval is wide enough that the null hypothesis cannot be rejected at the 5% significance level. This contrast illustrates a common empirical challenge: when the unbiased estimator is noisy, inference based solely on it can be inconclusive, even when the overall dataset is large. In such settings, the combined estimators, particularly the precision-weighted estimator $\hat\tau_{\mathrm{PW}}$ and the soft-thresholding estimator $\hat\tau_{\mathrm{ST}}$, yield much tighter confidence intervals under small biases and provide sufficient evidence to reject the null hypothesis of zero returns to schooling, while still allowing researchers to explicitly assess sensitivity to potential bias.
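For the two-estimator setting of this section, the b-value of $\hat\tau_{\mathrm{PW}}$ is the smallest $b$ at which the half-width from Theorem 5.1 (with $K = 1$) reaches $|\hat\tau_{\mathrm{PW}}|$; the resulting equation is the one stated in Theorem A.1 of the supplement. The sketch below solves it by bisection. It assumes independent, normally distributed estimators with known standard errors (after the Section 3.5 adjustment if needed). The inputs are hypothetical placeholders, not the Angrist and Krueger (1991) estimates, and `b_value_pw` is a hypothetical name.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf


def b_value_pw(tau0, se0, tau1, se1, zeta=0.05, iters=200):
    """b-value of the precision-weighted estimator with one biased estimator (K = 1).

    Assumes tau0 (unbiased) and tau1 (biased) are independent normals with known SEs.
    """
    gamma = se0 ** 2 / se1 ** 2
    tau_pw = tau0 + gamma / (1.0 + gamma) * (tau1 - tau0)
    x = abs(tau_pw) * math.sqrt(1.0 + gamma) / se0  # |tau_pw| in units of (1+gamma)^{-1/2} sigma_0
    k = gamma / math.sqrt(1.0 + gamma)               # standardized bias per unit of b

    def f(b):
        # Coverage of a half-width |tau_pw| interval when the relative bias equals b
        return PHI(x - k * b) - PHI(-x - k * b)

    if f(0.0) < 1.0 - zeta:
        return 0.0                 # the interval already contains 0 at b = 0
    lo, hi = 0.0, (x + 10.0) / k   # f(lo) >= 1 - zeta > f(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) >= 1.0 - zeta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical placeholder inputs (not the Angrist and Krueger estimates):
print(b_value_pw(tau0=0.10, se0=0.04, tau1=0.07, se1=0.005))
```

A b-value of 0 means the combined interval covers zero even without allowing any bias; larger values mean the conclusion survives larger relative biases.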
7 Discussion

This paper develops a strategy to combine unbiased and biased estimators from a sensitivity analysis perspective. In particular, we construct a sequence of confidence intervals indexed by the magnitude of the bias and propose the b-value, the critical level of the maximum relative bias at which combining estimators no longer yields a significant result.

Acknowledgment

We thank Avi Feller and Liyang Sun for helpful comments. Lin was partially supported by the Two Sigma PhD Fellowship. Ding was supported by the U.S. National Science Foundation (1945136, 2514234).

References

Angrist, J. D. and Krueger, A. B. (1991). Does compulsory school attendance affect schooling and earnings? The Quarterly Journal of Economics, 106(4):979–1014.
Armstrong, T. B., Kline, P., and Sun, L. (2025). Adapting to misspecification. Econometrica, 93(6):1981–2005.
Armstrong, T. B. and Kolesár, M. (2020). Simple and honest confidence intervals in nonparametric regression. Quantitative Economics, 11(1):1–39.
Armstrong, T. B., Kolesár, M., and Kwon, S. (2020). Bias-aware inference in regularized regression models. arXiv preprint arXiv:2012.14823.
Armstrong, T. B., Weidner, M., and Zeleneev, A. (2022). Robust estimation and inference in panels with interactive fixed effects. arXiv preprint arXiv:2210.06639.
Athey, S., Chetty, R., and Imbens, G. (2020). Combining experimental and observational data to estimate treatment effects on long term outcomes. arXiv preprint arXiv:2006.09676.
Bancroft, T. and Han, C.-P. (1977). Inference based on conditional specification: a note and a bibliography. International Statistical Review/Revue Internationale de Statistique, pages 117–127.
Bancroft, T. A. (1944). On biases in estimation due to the use of preliminary tests of significance. The Annals of Mathematical Statistics, 15(2):190–204.
Berger, J. (1981). Estimation in continuous exponential families: Bayesian estimation subject to risk restrictions and inadmissibility results. Purdue University, Department of Statistics.
Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer Science & Business Media.
Bickel, P. (1983). Minimax estimation of the mean of a normal distribution subject to doing well at a point. In Recent Advances in Statistics, pages 511–528. Elsevier.
Bickel, P. (1984). Parametric robustness: small biases can be worthwhile. The Annals of Statistics, 12(3):864–879.
Bound, J., Jaeger, D. A., and Baker, R. M. (1995). Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. Journal of the American Statistical Association, 90(430):443–450.
Brantner, C. L., Chang, T.-H., Nguyen, T. Q., Hong, H., Di Stefano, L., and Stuart, E. A. (2023). Methods for integrating trials and non-experimental data to examine treatment effect heterogeneity. Statistical Science, 38(4):640–654.
Chen, A., Owen, A. B., and Shi, M. (2015). Data enriched linear regression. Electronic Journal of Statistics, 9:1078–1112.
Cinelli, C. and Hazlett, C. (2020). Making sense of sensitivity: Extending omitted variable bias. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(1):39–67.
Colnet, B., Mayer, I., Chen, G., Dieng, A., Li, R., Varoquaux, G., Vert, J.-P., Josse, J., and Yang, S. (2024). Causal inference methods for combining randomized trials and observational studies: a review. Statistical Science, 39(1):165–191.
Cornfield, J., Haenszel, W., Hammond, E. C., Lilienfeld, A. M., Shimkin, M. B., and Wynder, E. L. (1959). Smoking and lung cancer: recent evidence and a discussion of some questions. Journal of the National Cancer Institute, 22(1):173–203.
de Chaisemartin, C. and D'Haultfœuille, X. (2020). Empirical MSE minimization to estimate a scalar parameter. arXiv preprint arXiv:2006.14667.
Ding, P. and VanderWeele, T. J. (2016). Sensitivity analysis without assumptions. Epidemiology, 27(3):368–377.
Gao, C. and Yang, S. (2023). Pretest estimation in combining probability and non-probability samples. Electronic Journal of Statistics, 17(1):1492–1546.
Giles, J. A. and Giles, D. E. (1993). Pre-test estimation and testing in econometrics: recent developments. Journal of Economic Surveys, 7(2):145–197.
Green, E. J. and Strawderman, W. E. (1991). A James–Stein type estimator for combining unbiased and possibly biased estimators. Journal of the American Statistical Association, 86(416):1001–1006.
Green, E. J., Strawderman, W. E., Amateis, R. L., and Reams, G. A. (2005). Improved estimation for multiple means with heterogeneous variances. Forest Science, 51(1):1–6.
Hotelling, H. (1931). The generalization of Student's ratio. The Annals of Mathematical Statistics, 2(3):360–378.
Hwang, J. T. and Casella, G. (1982). Minimax confidence sets for the mean of a multivariate normal distribution. The Annals of Statistics, 10(3):868–881.
Le Cam, L. (1956). On the asymptotic theory of estimation and testing hypotheses. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Contributions to the Theory of Statistics, volume 3, pages 129–157. University of California Press.
Rosenbaum, P. and Rubin, D. (1983). Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. Journal of the Royal Statistical Society Series B: Statistical Methodology, 45(2):212–218.
Rosenbaum, P. R. (2002). Observational Studies. Springer.
Rosenbaum, P. R. (2004). Design sensitivity in observational studies. Biometrika, 91(1):153–164.
Rosenman, E. T., Basse, G., Owen, A. B., and Baiocchi, M. (2023a). Combining observational and experimental datasets using shrinkage estimators. Biometrics, 79(4):2961–2973.
Rosenman, E. T., Dominici, F., and Miratrix, L. (2023b). Empirical Bayes double shrinkage for combining biased and unbiased causal estimates. arXiv preprint arXiv:2309.06727.
Scheffé, H. (1953). A method for judging all contrasts in the analysis of variance. Biometrika, 40(1–2):87–110.
Schwartz, D., Saha, R., Ventz, S., and Trippa, L. (2026). Harmonized estimation of subgroup-specific treatment effects in randomized trials: The use of external control data. Journal of the Royal Statistical Society Series B: Statistical Methodology, 88(1):143–170.
Šidák, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318):626–633.
Stein, C. M. (1962). Confidence sets for the mean of a multivariate normal distribution. Journal of the Royal Statistical Society Series B: Statistical Methodology, 24(2):265–285.
Tukey, J. W. (1949). Comparing individual means in the analysis of variance. Biometrics, 5(2):99–114.
VanderWeele, T. J. and Ding, P. (2017). Sensitivity analysis in observational research: Introducing the E-value. Annals of Internal Medicine, 167(4):268.
Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Transactions of the American Mathematical Society, 54(3):426–482.
Wald, A. (1949). Statistical decision functions. The Annals of Mathematical Statistics, 20(2):165–205.
Wallace, T. D. (1977). Pretest estimation in regression: A survey. American Journal of Agricultural Economics, 59(3):431–443.
Yang, X., Lin, L., Athey, S., Jordan, M. I., and Imbens, G. W. (2025). Cross-validated causal inference: a modern method to combine experimental and observational data. arXiv preprint arXiv:2511.00727.
Supplementary materials for "Introducing the b-value: combining unbiased and biased estimators from a sensitivity analysis perspective"

Appendix A contains several additional discussions that complement the main text. Section A.1 discusses how to compute the b-value efficiently. Section A.2 extends our framework to one-sided confidence bounds for $\tau$. Section A.3 discusses the advantages of using prespecified point estimators instead of bias-dependent point estimators. Section A.4 provides further details on the generalization to the dependent case, complementing the discussion in Section 3.5. Section A.5 discusses the relationship between the generic soft-thresholding estimator and the estimator in Berger (1981) and Bickel (1984), complementing the discussion in Section 4.2. Appendix B contains the proofs of the results in the main paper, and Appendix C contains the proofs of the results in the appendix.

A More discussions

A.1 Computing the b-value efficiently

In this section, we discuss how to compute the b-value efficiently. Consider a general combined estimator $\hat\tau$ with symmetric fixed-length centered confidence interval $[\hat\tau - c(b, \zeta, \sigma_0^2, \sigma_1^2),\ \hat\tau + c(b, \zeta, \sigma_0^2, \sigma_1^2)]$. By Definition 2.2, the b-value $b^*(\zeta) = b^*(\zeta, \hat\tau, \sigma_0^2, \sigma_1^2) \ge 0$ is the smallest bias level at which the null value 0 is contained in the confidence interval, that is,
$$b^*(\zeta) = \inf\big\{b \ge 0 : 0 \in [\hat\tau - c(b, \zeta, \sigma_0^2, \sigma_1^2),\ \hat\tau + c(b, \zeta, \sigma_0^2, \sigma_1^2)]\big\} = \inf\big\{b \ge 0 : c(b, \zeta, \sigma_0^2, \sigma_1^2) \ge |\hat\tau|\big\}.$$
Therefore, $b^*(\zeta)$ is 0 if $c(0, \zeta, \sigma_0^2, \sigma_1^2) > |\hat\tau|$, and is $\infty$ if $c(\infty, \zeta, \sigma_0^2, \sigma_1^2) < |\hat\tau|$.
Otherwise, if we further assume that $c(b, \zeta, \sigma_0^2, \sigma_1^2)$ is continuous and strictly increasing in $b$ for given $\zeta, \sigma_0^2, \sigma_1^2$, then $b^*(\zeta)$ is the unique solution in $b$ to $c(b, \zeta, \sigma_0^2, \sigma_1^2) = |\hat\tau|$. In this case, we can compute $b^*(\zeta)$ efficiently using standard one-dimensional root-finding methods such as bisection. In many settings, $c(b, \zeta, \sigma_0^2, \sigma_1^2)$ has no closed form; instead, it is obtained by solving an equation $g(L; b, \zeta, \sigma_0^2, \sigma_1^2) = 0$ in $L$ for some function $g$ depending on $b, \zeta, \sigma_0^2, \sigma_1^2$. In such cases, we can compute the b-value by solving $g(|\hat\tau|; b, \zeta, \sigma_0^2, \sigma_1^2) = 0$ as an equation in $b$. This reduces the computation of the b-value to a single one-dimensional root-finding problem.

The above discussion provides a general method for computing the b-value of any combined estimator with a symmetric fixed-length centered confidence interval. We now apply it to the precision-weighted estimator $\hat\tau_{\mathrm{PW}}$, the pretest estimator $\hat\tau_{\mathrm{PT}}$, and the soft-thresholding estimator $\hat\tau_{\mathrm{ST}}$. First, the following theorem gives the b-value $b^*_{\mathrm{PW}}(\zeta)$ for the precision-weighted estimator $\hat\tau_{\mathrm{PW}}$ with confidence interval $[\hat\tau_{\mathrm{PW}} - \hat L_{\mathrm{PW}}(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PW}} + \hat L_{\mathrm{PW}}(1+\gamma)^{-1/2}\sigma_0]$.

Theorem A.1. The b-value $b^*_{\mathrm{PW}}(\zeta)$ is 0 if
$$\Phi\Big(\frac{|\hat\tau_{\mathrm{PW}}|}{(1+\gamma)^{-1/2}\sigma_0}\Big) - \Phi\Big(-\frac{|\hat\tau_{\mathrm{PW}}|}{(1+\gamma)^{-1/2}\sigma_0}\Big) < 1 - \zeta.$$
Otherwise, $b^*_{\mathrm{PW}}(\zeta)$ can be equivalently written as the solution in $b$ to
$$\Phi\Big(\frac{|\hat\tau_{\mathrm{PW}}|}{(1+\gamma)^{-1/2}\sigma_0} - \frac{\gamma}{\sqrt{1+\gamma}}\,b\Big) - \Phi\Big(-\frac{|\hat\tau_{\mathrm{PW}}|}{(1+\gamma)^{-1/2}\sigma_0} - \frac{\gamma}{\sqrt{1+\gamma}}\,b\Big) = 1 - \zeta.$$

Second, the following theorem gives the b-value $b^*_{\mathrm{PT}}(\zeta)$ for the pretest estimator $\hat\tau_{\mathrm{PT}}$ with confidence interval $[\hat\tau_{\mathrm{PT}} - \hat L_{\mathrm{PT}}(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PT}} + \hat L_{\mathrm{PT}}(1+\gamma)^{-1/2}\sigma_0]$.

Theorem A.2.
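The b-value $b^*_{\mathrm{PT}}(\zeta)$ is 0 if $\mathrm{P}_{\Delta/\sigma_0 = 0}(|\tilde\tau_{\mathrm{PT}} - \tau| \le |\hat\tau_{\mathrm{PT}}| \mid \hat\tau_{\mathrm{PT}}) < 1 - \zeta$, and is $\infty$ if $\min_{t \ge 0}\mathrm{P}_{\Delta/\sigma_0 = t}(|\tilde\tau_{\mathrm{PT}} - \tau| \le |\hat\tau_{\mathrm{PT}}| \mid \hat\tau_{\mathrm{PT}}) > 1 - \zeta$. Otherwise, $b^*_{\mathrm{PT}}(\zeta)$ can be equivalently written as the solution in $b$ to
$$\min_{0 \le t \le b}\mathrm{P}_{\Delta/\sigma_0 = t}(|\tilde\tau_{\mathrm{PT}} - \tau| \le |\hat\tau_{\mathrm{PT}}| \mid \hat\tau_{\mathrm{PT}}) = 1 - \zeta,$$
where $\tilde\tau_{\mathrm{PT}}$ is independent of and identically distributed as $\hat\tau_{\mathrm{PT}}$, and
$$\begin{aligned}
\mathrm{P}_{\Delta/\sigma_0 = t}(|\tilde\tau_{\mathrm{PT}} - \tau| \le |\hat\tau_{\mathrm{PT}}| \mid \hat\tau_{\mathrm{PT}}) ={}& \Big[\Phi\Big(c_{\alpha/2} - \sqrt{\tfrac{\gamma}{1+\gamma}}\,t\Big) - \Phi\Big(-c_{\alpha/2} - \sqrt{\tfrac{\gamma}{1+\gamma}}\,t\Big)\Big] \\
&\times \Big[\Phi\Big(\tfrac{|\hat\tau_{\mathrm{PT}}|}{(1+\gamma)^{-1/2}\sigma_0} - \tfrac{\gamma}{\sqrt{1+\gamma}}\,t\Big) - \Phi\Big(-\tfrac{|\hat\tau_{\mathrm{PT}}|}{(1+\gamma)^{-1/2}\sigma_0} - \tfrac{\gamma}{\sqrt{1+\gamma}}\,t\Big)\Big] \\
&+ \int_{-\infty}^{-c_{\alpha/2} - \sqrt{\gamma/(1+\gamma)}\,t}\Big[\Phi\Big(\tfrac{|\hat\tau_{\mathrm{PT}}|}{(1+\gamma)^{-1/2}\sigma_0} + \sqrt{\gamma}\,u\Big) - \Phi\Big(-\tfrac{|\hat\tau_{\mathrm{PT}}|}{(1+\gamma)^{-1/2}\sigma_0} + \sqrt{\gamma}\,u\Big)\Big]\phi(u)\,\mathrm{d}u \\
&+ \int_{c_{\alpha/2} - \sqrt{\gamma/(1+\gamma)}\,t}^{\infty}\Big[\Phi\Big(\tfrac{|\hat\tau_{\mathrm{PT}}|}{(1+\gamma)^{-1/2}\sigma_0} + \sqrt{\gamma}\,u\Big) - \Phi\Big(-\tfrac{|\hat\tau_{\mathrm{PT}}|}{(1+\gamma)^{-1/2}\sigma_0} + \sqrt{\gamma}\,u\Big)\Big]\phi(u)\,\mathrm{d}u.
\end{aligned}$$

Third, the following theorem gives the b-value $b^*_{\mathrm{ST}}(\zeta)$ for the soft-thresholding estimator $\hat\tau_{\mathrm{ST}}$ with confidence interval $[\hat\tau_{\mathrm{ST}} - \hat L_{\mathrm{ST}}(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{ST}} + \hat L_{\mathrm{ST}}(1+\gamma)^{-1/2}\sigma_0]$.

Theorem A.3. The b-value $b^*_{\mathrm{ST}}(\zeta)$ is 0 if $\mathrm{P}_{\Delta/\sigma_0 = 0}(|\tilde\tau_{\mathrm{ST}} - \tau| \le |\hat\tau_{\mathrm{ST}}| \mid \hat\tau_{\mathrm{ST}}) < 1 - \zeta$, and is $\infty$ if $\mathrm{P}_{\Delta/\sigma_0 = \infty}(|\tilde\tau_{\mathrm{ST}} - \tau| \le |\hat\tau_{\mathrm{ST}}| \mid \hat\tau_{\mathrm{ST}}) > 1 - \zeta$.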
Otherwise, $b^*_{\mathrm{ST}}(\zeta)$ can be equivalently written as the solution in $b$ to
$$\mathrm{P}_{\Delta/\sigma_0 = b}(|\tilde\tau_{\mathrm{ST}} - \tau| \le |\hat\tau_{\mathrm{ST}}| \mid \hat\tau_{\mathrm{ST}}) = 1 - \zeta,$$
where $\tilde\tau_{\mathrm{ST}}$ is independent of and identically distributed as $\hat\tau_{\mathrm{ST}}$, and
$$\begin{aligned}
\mathrm{P}_{\Delta/\sigma_0 = t}(|\tilde\tau_{\mathrm{ST}} - \tau| \le |\hat\tau_{\mathrm{ST}}| \mid \hat\tau_{\mathrm{ST}}) ={}& \Big[\Phi\Big(c_{\alpha/2} - \sqrt{\tfrac{\gamma}{1+\gamma}}\,t\Big) - \Phi\Big(-c_{\alpha/2} - \sqrt{\tfrac{\gamma}{1+\gamma}}\,t\Big)\Big] \\
&\times \Big[\Phi\Big(\tfrac{|\hat\tau_{\mathrm{ST}}|}{(1+\gamma)^{-1/2}\sigma_0} - \tfrac{\gamma}{\sqrt{1+\gamma}}\,t\Big) - \Phi\Big(-\tfrac{|\hat\tau_{\mathrm{ST}}|}{(1+\gamma)^{-1/2}\sigma_0} - \tfrac{\gamma}{\sqrt{1+\gamma}}\,t\Big)\Big] \\
&+ \int_{-\infty}^{-c_{\alpha/2} - \sqrt{\gamma/(1+\gamma)}\,t}\Big[\Phi\Big(\tfrac{|\hat\tau_{\mathrm{ST}}|}{(1+\gamma)^{-1/2}\sigma_0} + \sqrt{\gamma}\,(u + c_{\alpha/2})\Big) - \Phi\Big(-\tfrac{|\hat\tau_{\mathrm{ST}}|}{(1+\gamma)^{-1/2}\sigma_0} + \sqrt{\gamma}\,(u + c_{\alpha/2})\Big)\Big]\phi(u)\,\mathrm{d}u \\
&+ \int_{c_{\alpha/2} - \sqrt{\gamma/(1+\gamma)}\,t}^{\infty}\Big[\Phi\Big(\tfrac{|\hat\tau_{\mathrm{ST}}|}{(1+\gamma)^{-1/2}\sigma_0} + \sqrt{\gamma}\,(u - c_{\alpha/2})\Big) - \Phi\Big(-\tfrac{|\hat\tau_{\mathrm{ST}}|}{(1+\gamma)^{-1/2}\sigma_0} + \sqrt{\gamma}\,(u - c_{\alpha/2})\Big)\Big]\phi(u)\,\mathrm{d}u.
\end{aligned}$$

A.2 One-sided confidence bounds

In this section, we discuss how to construct one-sided confidence bounds within our framework. We focus on the lower confidence bound; the upper confidence bound can be constructed analogously. Lower confidence bounds allow one-sided hypothesis tests, such as testing $\tau = 0$ versus $\tau > 0$ or testing $\tau \le 0$ versus $\tau > 0$. The b-value in the one-sided setting is computed by the same logic as in the two-sided case, so here we focus on constructing the one-sided confidence bounds.

First, the following theorem gives the sequence of lower confidence bounds based on $\hat\tau_{\mathrm{PW}}$, analogous to Theorem 3.1.

Theorem A.4. Let $\hat L'_{\mathrm{PW}} = \hat L'_{\mathrm{PW}}(b, \zeta, \gamma) = c_\zeta + \frac{\gamma}{\sqrt{1+\gamma}}\,b$. The shortest-length lower confidence bound based on $\hat\tau_{\mathrm{PW}}$ for $\tau$ satisfying (2.1) is given by $[\hat\tau_{\mathrm{PW}} - \hat L'_{\mathrm{PW}}(1+\gamma)^{-1/2}\sigma_0,\ \infty)$.

Second, the following theorem gives the sequence of lower confidence bounds based on $\hat\tau_{\mathrm{PT}}$, analogous to Theorem 3.2.

Theorem A.5.
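$\hat L'_{\mathrm{PT}} = \hat L'_{\mathrm{PT}}(b, \zeta, \gamma, \alpha)$ is the solution in $L$ to
$$\min_{0 \le t \le b}\mathrm{P}_{\Delta/\sigma_0 = t}\big(\hat\tau_{\mathrm{PT}} - \tau \le L(1+\gamma)^{-1/2}\sigma_0\big) = 1 - \zeta,$$
where
$$\begin{aligned}
\mathrm{P}_{\Delta/\sigma_0 = t}\big(\hat\tau_{\mathrm{PT}} - \tau \le L(1+\gamma)^{-1/2}\sigma_0\big) ={}& \Big[\Phi\Big(c_{\alpha/2} - \sqrt{\tfrac{\gamma}{1+\gamma}}\,t\Big) - \Phi\Big(-c_{\alpha/2} - \sqrt{\tfrac{\gamma}{1+\gamma}}\,t\Big)\Big]\Phi\Big(L - \tfrac{\gamma}{\sqrt{1+\gamma}}\,t\Big) \\
&+ \int_{-\infty}^{-c_{\alpha/2} - \sqrt{\gamma/(1+\gamma)}\,t}\Phi\big(L + \sqrt{\gamma}\,u\big)\,\phi(u)\,\mathrm{d}u + \int_{c_{\alpha/2} - \sqrt{\gamma/(1+\gamma)}\,t}^{\infty}\Phi\big(L + \sqrt{\gamma}\,u\big)\,\phi(u)\,\mathrm{d}u.
\end{aligned}$$
The shortest-length lower confidence bound based on $\hat\tau_{\mathrm{PT}}$ for $\tau$ satisfying (2.1) is given by $[\hat\tau_{\mathrm{PT}} - \hat L'_{\mathrm{PT}}(1+\gamma)^{-1/2}\sigma_0,\ \infty)$.

Third, the following theorem gives the sequence of lower confidence bounds based on $\hat\tau_{\mathrm{ST}}$, analogous to Theorem 3.3.

Theorem A.6. For any $L > 0$, $\mathrm{P}_\Delta(\hat\tau_{\mathrm{ST}} - \tau \le L(1+\gamma)^{-1/2}\sigma_0)$ is monotonically decreasing in $\Delta$. Then $\hat L'_{\mathrm{ST}} = \hat L'_{\mathrm{ST}}(b, \zeta, \gamma, \alpha)$ is the solution in $L$ to
$$\mathrm{P}_{\Delta/\sigma_0 = b}\big(\hat\tau_{\mathrm{ST}} - \tau \le L(1+\gamma)^{-1/2}\sigma_0\big) = 1 - \zeta,$$
where
$$\begin{aligned}
\mathrm{P}_{\Delta/\sigma_0 = t}\big(\hat\tau_{\mathrm{ST}} - \tau \le L(1+\gamma)^{-1/2}\sigma_0\big) ={}& \Big[\Phi\Big(c_{\alpha/2} - \sqrt{\tfrac{\gamma}{1+\gamma}}\,t\Big) - \Phi\Big(-c_{\alpha/2} - \sqrt{\tfrac{\gamma}{1+\gamma}}\,t\Big)\Big]\Phi\Big(L - \tfrac{\gamma}{\sqrt{1+\gamma}}\,t\Big) \\
&+ \int_{-\infty}^{-c_{\alpha/2} - \sqrt{\gamma/(1+\gamma)}\,t}\Phi\big(L + \sqrt{\gamma}\,(u + c_{\alpha/2})\big)\,\phi(u)\,\mathrm{d}u + \int_{c_{\alpha/2} - \sqrt{\gamma/(1+\gamma)}\,t}^{\infty}\Phi\big(L + \sqrt{\gamma}\,(u - c_{\alpha/2})\big)\,\phi(u)\,\mathrm{d}u.
\end{aligned}$$
The shortest-length lower confidence bound based on $\hat\tau_{\mathrm{ST}}$ for $\tau$ satisfying (2.1) is given by $[\hat\tau_{\mathrm{ST}} - \hat L'_{\mathrm{ST}}(1+\gamma)^{-1/2}\sigma_0,\ \infty)$.

A.3 Bias-dependent point estimator

A natural question arises: the combined estimator $\hat\tau$ itself does not depend on the unknown true bias $\Delta$, so could we instead construct confidence intervals using a bias-dependent point estimator, i.e., one that explicitly depends on $\Delta$? More generally, suppose we have an estimator whose expectation is $\tau + g(\Delta)$, where $g(\cdot)$ is a known function of the bias. There are two natural approaches to constructing a confidence interval for $\tau$ in this setting. The first approach is to estimate the bias and subtract it from the point estimator.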
Specifically, we can construct an adaptive estimator of the form $\hat\tau - \widehat{g(\Delta)}$, where $\widehat{g(\Delta)}$ is an estimate of $g(\Delta)$. The second approach is to construct the point estimator $\hat\tau - g(\Delta)$ and its corresponding confidence interval for each possible value of $\Delta$ within the bias bound, and then take the union of all such intervals. However, this union is conservative, since the biased estimator cannot exhibit multiple bias values within the bound simultaneously. See also Armstrong and Kolesár (2020) for a related argument: they show that using critical values to account for potential bias is more efficient than subtracting an estimate of the bias from the point estimator. We now illustrate why constructing confidence intervals from a prespecified point estimator is more efficient than from a bias-dependent one.

For the first approach, a natural estimate of the bias is $\hat\Delta = \hat\tau_1 - \hat\tau_0$. Take the precision-weighted estimator as an example, for which $g(\Delta) = \frac{\gamma}{1+\gamma}\Delta$. Subtracting the estimated bias gives
$$\hat\tau_0 + \frac{\gamma}{1+\gamma}(\hat\tau_1 - \hat\tau_0) - \frac{\gamma}{1+\gamma}\hat\Delta = \hat\tau_0.$$
Thus, after bias correction, the estimator collapses to the unbiased estimator $\hat\tau_0$; in other words, this approach discards the more precise estimator. The issue is that the confidence interval must account for the randomness in $\hat\Delta$, which widens it. As a result, this approach provides no efficiency gain over the unbiased estimator.

For the second approach, suppose we know the true bias $\Delta$. Then the bias-corrected combined estimator is $\hat\tau_\Delta = \hat\tau - g(\Delta)$. Suppose the shortest symmetric centered confidence interval based on $\hat\tau_\Delta$ for $\tau$ is $[\hat\tau_\Delta - \hat L_\zeta,\ \hat\tau_\Delta + \hat L_\zeta]$, where $\hat L_\zeta$ does not depend on $\Delta$. When we only know that $|\Delta| \le b$, we take the union of these confidence intervals over all possible $\Delta$ with $|\Delta| \le b$.
The resulting confidence interval is
$$\bigcup_{\Delta:|\Delta|\le b}[\hat\tau_\Delta - \hat L_\zeta,\ \hat\tau_\Delta + \hat L_\zeta] = \Big[\hat\tau - \max_{\Delta:|\Delta|\le b} g(\Delta) - \hat L_\zeta,\ \hat\tau + \max_{\Delta:|\Delta|\le b} g(\Delta) + \hat L_\zeta\Big], \tag{A.1}$$
where, as in the precision-weighted example, $g$ is taken to be odd, so that $\min_{\Delta:|\Delta|\le b} g(\Delta) = -\max_{\Delta:|\Delta|\le b} g(\Delta)$. By Definition 2.1, the shortest symmetric centered confidence interval for $\tau$ based on $\hat\tau$ is $[\hat\tau - \hat L_\zeta(b),\ \hat\tau + \hat L_\zeta(b)]$, where
$$\hat L_\zeta(b) = \inf\Big\{L \ge 0 : \inf_{\Delta:|\Delta|\le b}\mathrm{P}_\Delta(|\hat\tau - \tau| \le L) \ge 1 - \zeta\Big\}. \tag{A.2}$$
Comparing the confidence interval (A.1) based on the bias-dependent point estimator with the confidence interval (A.2) based on the prespecified point estimator, we note that
$$\max_{\Delta:|\Delta|\le b} g(\Delta) + \hat L_\zeta \in \Big\{L \ge 0 : \inf_{\Delta:|\Delta|\le b}\mathrm{P}_\Delta(|\hat\tau - \tau| \le L) \ge 1 - \zeta\Big\}.$$
Indeed, for any $\Delta'$ with $|\Delta'| \le b$,
$$\mathrm{P}_{\Delta'}\Big(|\hat\tau - \tau| \le \max_{\Delta:|\Delta|\le b} g(\Delta) + \hat L_\zeta\Big) \ge \mathrm{P}_{\Delta'}\big(|\hat\tau - \tau| \le |g(\Delta')| + \hat L_\zeta\big) \ge \mathrm{P}_{\Delta'}\big(|\hat\tau - g(\Delta') - \tau| \le \hat L_\zeta\big) \ge 1 - \zeta,$$
where the second inequality follows from the triangle inequality and the last from the coverage of $[\hat\tau_{\Delta'} - \hat L_\zeta,\ \hat\tau_{\Delta'} + \hat L_\zeta]$. Therefore $\hat L_\zeta(b) \le \max_{\Delta:|\Delta|\le b} g(\Delta) + \hat L_\zeta$, with equality only when $b = 0$. Hence, the confidence interval based on the prespecified point estimator is strictly narrower when the bias bound is nonzero. This comparison highlights a key principle of our framework: instead of accounting for the bias in the point estimator by subtracting it, it is more efficient to account for the bias in the confidence interval through critical values.

A.4 More details for the dependence case

In this section, we provide more details on the generalization to the dependence case in Section 3.5 of the main paper. In the reparametrization in (3.5), one may be concerned about how scaling the bias works and how to interpret the relative bias. We compute the confidence intervals and the b-value in the following three steps.

(a) Reparametrization: We apply the transformation in (3.5) to obtain the transformed biased estimator $\hat\tau'_1$. Under this transformation, $\hat\tau_0$ remains an unbiased estimator of $\tau$, and $\hat\tau'_1$ becomes a biased estimator that is independent of $\hat\tau_0$.
We then construct the combined estimator $\hat\tau$ from these two independent components.

(b) Compute the confidence intervals and the b-value in the transformed problem: Let the relative bias in the transformed parametrization be $\Delta'/\sigma_0$, with bound $b'$. Using Definition 2.1, we construct the confidence interval $\hat\tau - I(b', \zeta)$ such that
$$\inf_{\Delta':|\Delta'/\sigma_0|\le b'}\mathrm{P}_{\Delta'}\big(\tau \in \hat\tau - I(b', \zeta)\big) \ge 1 - \zeta,$$
and compute the corresponding b-value $b^{*\prime}$ in the transformed problem.

(c) Transform the confidence intervals and the b-value back to the original problem: Under the transformation in (3.5), the relative biases in the two parametrizations are related by
$$|\Delta'/\sigma_0| \le b' \iff |\Delta/\sigma_0| \le |1 - \rho\sigma_1/\sigma_0|\,b'.$$
Thus, a bias bound $b'$ in the transformed parametrization corresponds to the bias bound $b = |1 - \rho\sigma_1/\sigma_0|\,b'$ in the original parametrization. Using this relationship, the confidence interval $\hat\tau - I(b, \zeta)$ satisfies Definition 2.1 in the original parametrization,
$$\inf_{\Delta:|\Delta/\sigma_0|\le b}\mathrm{P}_\Delta\big(\tau \in \hat\tau - I(b, \zeta)\big) \ge 1 - \zeta,$$
and the b-value in the original parametrization is $b^* = |1 - \rho\sigma_1/\sigma_0|\,b^{*\prime}$.

A.5 Relationship between the generic soft-thresholding estimator and the estimator in Berger (1981); Bickel (1984)

In this section, we discuss the relationship between the generic soft-thresholding estimator in Section 4.2 of the main paper and the estimator in Berger (1981) and Bickel (1984). We consider the special setting studied in Berger (1981) and Bickel (1984), where the covariance matrices are proportional and isotropic. Specifically, assume $\boldsymbol\Sigma_0 = \sigma_0^2\mathbf{I}_d$ and $\boldsymbol\Sigma_0 = \gamma\boldsymbol\Sigma_1$ for some $\gamma > 0$. Here $\gamma$ generalizes the variance ratio $\sigma_0^2/\sigma_1^2$ in the univariate setting.
Let $C\ge0$ be a user-chosen constant, and let $\rho_C(r)$ denote the ratio of Bessel functions whose explicit form is given in Lemma 3 of Berger (1981). The constant $C$ plays the same role as the critical value $c_{\alpha/2}$ in the univariate setting. The estimator in Berger (1981) and Bickel (1984) is obtained by choosing the shrinkage function $h_q^*(r)=\rho_{C/\sigma_0^2}(r)$. Here the threshold $q=q(C)$ is defined as the solution to

$$h_q^*(q)=1\iff\rho_{C/\sigma_0^2}(q)=1.$$

Suppose we choose $h_q^*(\cdot)$ as in Berger (1981) and Bickel (1984), that is, $h_q^*(r)=\rho_{C/\sigma_0^2}(r)$ with $q$ given above. In the univariate case $d=1$, the estimator $\hat{\boldsymbol\tau}_{\mathrm{ST}}$ reduces to the univariate soft-thresholding estimator $\hat\tau_{\mathrm{ST}}$. In this case, the user-chosen constant $C$ in $\hat{\boldsymbol\tau}_{\mathrm{ST}}$ corresponds one-to-one to the significance level $\alpha$, or equivalently to the critical value $c_{\alpha/2}$ in $\hat\tau_{\mathrm{ST}}$.

Consider the special case $C=0$. When $d\le2$, the estimator $\hat{\boldsymbol\tau}_{\mathrm{ST}}$ reduces to the unbiased estimator $\hat{\boldsymbol\tau}_0$. When $d\ge3$, however, its behavior is different. For $C=0$ and $d\ge3$, we have $\rho_{C/\sigma_0^2}(r)=\rho_0(r)=2(d-2)/r$, and therefore the threshold is $q=2(d-2)$.

Suppose first that

$$\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\le q,\quad\text{or equivalently,}\quad\|\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0\|_2^2\le2(d-2)(\sigma_0^2+\sigma_1^2).$$

Then the estimator reduces to the precision-weighted estimator $\hat{\boldsymbol\tau}_0+\frac{\gamma}{1+\gamma}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)$. Suppose instead that $\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q$, or equivalently, $\|\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0\|_2^2>2(d-2)(\sigma_0^2+\sigma_1^2)$. Then the estimator takes the James–Stein shrinkage form (Green and Strawderman, 1991):

$$\hat{\boldsymbol\tau}_0+h_q^*\big(\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\big)(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\hat{\boldsymbol\tau}_0+\frac{2(d-2)\sigma_0^2}{\|\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0\|_2^2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0).$$

B Proofs of the main results

B.1 Proof of Theorem 3.1

Proof of Theorem 3.1.
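By the definition of $\hat\tau_{\mathrm{PW}}$, under Assumption 2.1 we have

$$\hat\tau_{\mathrm{PW}}\sim N\Big(\tau+\frac{\gamma}{1+\gamma}\Delta,\ \frac{1}{1+\gamma}\sigma_0^2\Big).$$

Equivalently, after standardization,

$$(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PW}}-\tau)\sim N\Big(\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0},\ 1\Big).$$

We now evaluate the worst-case coverage probability of the confidence interval $[\hat\tau_{\mathrm{PW}}-L(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PW}}+L(1+\gamma)^{-1/2}\sigma_0]$. We have

$$\begin{aligned}
&\inf_{\Delta:|\Delta/\sigma_0|\le b}P_\Delta\big(\tau\in[\hat\tau_{\mathrm{PW}}-L(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PW}}+L(1+\gamma)^{-1/2}\sigma_0]\big)\\
&=\inf_{\Delta:|\Delta/\sigma_0|\le b}P_\Delta\big(|\hat\tau_{\mathrm{PW}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0\big)\\
&=\inf_{\Delta:|\Delta/\sigma_0|\le b}P_\Delta\big(|(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PW}}-\tau)|\le L\big)\\
&=\inf_{\Delta:|\Delta/\sigma_0|\le b}P\Big(\Big|N\Big(\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0},1\Big)\Big|\le L\Big)\\
&=P\Big(\Big|N\Big(\sup_{\Delta:|\Delta/\sigma_0|\le b}\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0},1\Big)\Big|\le L\Big)\\
&=P\Big(\Big|N\Big(\frac{\gamma}{\sqrt{1+\gamma}}b,1\Big)\Big|\le L\Big)\\
&=\Phi\Big(L-\frac{\gamma}{\sqrt{1+\gamma}}b\Big)-\Phi\Big(-L-\frac{\gamma}{\sqrt{1+\gamma}}b\Big).
\end{aligned}$$

The fourth equality uses the fact that $\mu\mapsto P(|N(\mu,1)|\le L)$ is even and strictly decreasing in $|\mu|$. Hence the infimum is attained at the largest admissible $|\mu|$. The proof is complete since $\hat L_{\mathrm{PW}}$ is the solution to

$$\Phi\Big(L-\frac{\gamma}{\sqrt{1+\gamma}}b\Big)-\Phi\Big(-L-\frac{\gamma}{\sqrt{1+\gamma}}b\Big)=1-\zeta.$$

B.2 Proof of Theorem 3.2

Proof of Theorem 3.2. Recall that the pretest estimator is defined as

$$\hat\tau_{\mathrm{PT}}=\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)\,\mathbb 1\{|\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\}.$$

To characterize the distribution of $\hat\tau_{\mathrm{PT}}-\tau$, note that

$$\begin{pmatrix}\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)-\tau\\ \hat\tau_1-\hat\tau_0\end{pmatrix}\sim N\left(\begin{pmatrix}\frac{\gamma}{1+\gamma}\Delta\\ \Delta\end{pmatrix},\ \begin{pmatrix}\frac{1}{1+\gamma}\sigma_0^2&0\\0&\sigma_0^2+\sigma_1^2\end{pmatrix}\right).$$

We first consider the event that the pretest fails to reject, $|\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}$. Conditional on this event, $\hat\tau_{\mathrm{PT}}=\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)$. Because the two components above are independent, conditioning on the event leaves this distribution unchanged, and therefore

$$(\hat\tau_{\mathrm{PT}}-\tau)\mid\{|\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\}\sim N\Big(\frac{\gamma}{1+\gamma}\Delta,\ \frac{1}{1+\gamma}\sigma_0^2\Big).$$

Equivalently, after standardization,

$$(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PT}}-\tau)\mid\{|\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\}\sim N\Big(\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0},\ 1\Big).$$

We next consider the event that the pretest rejects, $|\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}$. Recall that $\sigma^2=\sigma_0^2+\sigma_1^2=\sigma_0^2(1+\gamma)/\gamma$. Conditional on $\sigma^{-1}(\hat\tau_1-\hat\tau_0)=u$, we therefore have

$$\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)=\sqrt{\frac{\gamma}{1+\gamma}}\,\sigma_0u.$$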
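Since $\hat\tau_{\mathrm{PT}}=\hat\tau_0$ on this event, we have

$$\begin{aligned}
&(\hat\tau_{\mathrm{PT}}-\tau)\mid\{\sigma^{-1}(\hat\tau_1-\hat\tau_0)=u,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}\\
&=(\hat\tau_0-\tau)\mid\{\sigma^{-1}(\hat\tau_1-\hat\tau_0)=u,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}\\
&=\Big(\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)-\tau\Big)\mid\{\sigma^{-1}(\hat\tau_1-\hat\tau_0)=u,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}-\sqrt{\frac{\gamma}{1+\gamma}}\,\sigma_0u\\
&\sim N\Big(\frac{\gamma}{1+\gamma}\Delta-\sqrt{\frac{\gamma}{1+\gamma}}\,\sigma_0u,\ \frac{1}{1+\gamma}\sigma_0^2\Big),
\end{aligned}$$

which implies

$$(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PT}}-\tau)\mid\{\sigma^{-1}(\hat\tau_1-\hat\tau_0)=u,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}\sim N\Big(\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0}-\sqrt\gamma\,u,\ 1\Big).$$

We now evaluate the worst-case coverage probability of the confidence interval $[\hat\tau_{\mathrm{PT}}-L(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PT}}+L(1+\gamma)^{-1/2}\sigma_0]$. We have

$$\inf_{\Delta:|\Delta/\sigma_0|\le b}P_\Delta\big(\tau\in[\hat\tau_{\mathrm{PT}}-L(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PT}}+L(1+\gamma)^{-1/2}\sigma_0]\big)=\inf_{\Delta:|\Delta/\sigma_0|\le b}P_\Delta\big(|\hat\tau_{\mathrm{PT}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0\big).$$

Fix $\Delta/\sigma_0=t$ and write $U=\sigma^{-1}(\hat\tau_1-\hat\tau_0)\sim N(\sqrt{\gamma/(1+\gamma)}\,t,1)$. We decompose the probability according to whether the pretest accepts or rejects:

$$\begin{aligned}
&P_{\Delta/\sigma_0=t}\big(|\hat\tau_{\mathrm{PT}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0\big)\\
&=P_{\Delta/\sigma_0=t}\big(|(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PT}}-\tau)|\le L\big)\\
&=P_{\Delta/\sigma_0=t}\big(|(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PT}}-\tau)|\le L,\ |\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\big)+P_{\Delta/\sigma_0=t}\big(|(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PT}}-\tau)|\le L,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\big)\\
&=P\Big(\Big|N\Big(\tfrac{\gamma}{\sqrt{1+\gamma}}t,1\Big)\Big|\le L\Big)P\Big(\Big|N\Big(\sqrt{\tfrac{\gamma}{1+\gamma}}t,1\Big)\Big|\le c_{\alpha/2}\Big)+E_{U\sim N(\sqrt{\gamma/(1+\gamma)}\,t,1)}\Big[P\Big(\Big|N\Big(\tfrac{\gamma}{\sqrt{1+\gamma}}t-\sqrt\gamma\,U,1\Big)\Big|\le L\Big)\mathbb 1\{|U|>c_{\alpha/2}\}\Big]\\
&=\Big[\Phi\Big(c_{\alpha/2}-\sqrt{\tfrac{\gamma}{1+\gamma}}t\Big)-\Phi\Big(-c_{\alpha/2}-\sqrt{\tfrac{\gamma}{1+\gamma}}t\Big)\Big]\Big[\Phi\Big(L-\tfrac{\gamma}{\sqrt{1+\gamma}}t\Big)-\Phi\Big(-L-\tfrac{\gamma}{\sqrt{1+\gamma}}t\Big)\Big]\\
&\quad+\int_{-\infty}^{-c_{\alpha/2}-\sqrt{\gamma/(1+\gamma)}\,t}\big[\Phi(L+\sqrt\gamma\,u)-\Phi(-L+\sqrt\gamma\,u)\big]\phi(u)\,du+\int_{c_{\alpha/2}-\sqrt{\gamma/(1+\gamma)}\,t}^{\infty}\big[\Phi(L+\sqrt\gamma\,u)-\Phi(-L+\sqrt\gamma\,u)\big]\phi(u)\,du.
\end{aligned}$$

The last step substitutes $U=u+\sqrt{\gamma/(1+\gamma)}\,t$, and it yields exactly the expression in (3.2). Since the coverage probability is symmetric in $\Delta$, the worst case over $|\Delta/\sigma_0|\le b$ is attained at some $t\in[0,b]$. Choosing $\hat L_{\mathrm{PT}}$ to satisfy

$$\min_{0\le t\le b}P_{\Delta/\sigma_0=t}\big(|\hat\tau_{\mathrm{PT}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0\big)=1-\zeta$$

completes the proof.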
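The univariate formulas above can be evaluated numerically. The sketch below assumes NumPy and SciPy are available; all function names are hypothetical. It does four things:

- Solves the equation defining $\hat L_{\mathrm{PW}}$ by root finding.
- Evaluates the pretest coverage (3.2) at a fixed $t$ by quadrature.
- Compares that coverage with a Monte Carlo simulation of $\hat\tau_{\mathrm{PT}}$, assuming $\tau=0$, $\sigma_0=1$, $\sigma_1^2=\sigma_0^2/\gamma$, and $\sigma^2=\sigma_0^2+\sigma_1^2$.
- Computes the b-value of the precision-weighted estimator from the boundary condition characterized in Theorem A.1.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norm


def coverage_pw(L, b, gamma):
    """Worst-case PW coverage Phi(L - k b) - Phi(-L - k b) with k = gamma / sqrt(1 + gamma)."""
    shift = gamma / np.sqrt(1 + gamma) * b
    return norm.cdf(L - shift) - norm.cdf(-L - shift)


def half_width_pw(b, gamma, zeta=0.05):
    """Standardized half-width L_PW solving coverage_pw(L, b, gamma) = 1 - zeta."""
    upper = gamma / np.sqrt(1 + gamma) * b + 10.0  # coverage is essentially 1 here
    return brentq(lambda L: coverage_pw(L, b, gamma) - (1 - zeta), 0.0, upper)


def coverage_pt(L, t, gamma, c):
    """Pretest coverage at Delta / sigma0 = t, following (3.2)."""
    a = np.sqrt(gamma / (1 + gamma)) * t  # mean of U = (tau1 - tau0) / sigma
    m = gamma / np.sqrt(1 + gamma) * t    # standardized bias when the pretest accepts
    accept = (norm.cdf(c - a) - norm.cdf(-c - a)) * (norm.cdf(L - m) - norm.cdf(-L - m))

    def integrand(u):
        return (norm.cdf(L + np.sqrt(gamma) * u) - norm.cdf(-L + np.sqrt(gamma) * u)) * norm.pdf(u)

    reject = quad(integrand, -np.inf, -c - a)[0] + quad(integrand, c - a, np.inf)[0]
    return accept + reject


def mc_coverage_pt(L, t, gamma, c, n=400_000, seed=0):
    """Monte Carlo coverage of the pretest interval with tau = 0 and sigma0 = 1."""
    rng = np.random.default_rng(seed)
    sigma0, sigma1 = 1.0, 1.0 / np.sqrt(gamma)
    sigma = np.sqrt(sigma0**2 + sigma1**2)
    tau0 = rng.normal(0.0, sigma0, n)
    tau1 = rng.normal(t * sigma0, sigma1, n)  # bias Delta = t * sigma0
    diff = tau1 - tau0
    tau_pt = tau0 + gamma / (1 + gamma) * diff * (np.abs(diff) <= sigma * c)
    return np.mean(np.abs(tau_pt) <= L * sigma0 / np.sqrt(1 + gamma))


def b_value_pw(tau_pw, sigma0, gamma, zeta=0.05):
    """b-value of the PW estimator: the b at which L_PW(b) reaches the standardized |estimate|."""
    x = abs(tau_pw) / (sigma0 / np.sqrt(1 + gamma))

    def gap(b):
        return coverage_pw(x, b, gamma) - (1 - zeta)

    if gap(0.0) < 0:
        return 0.0  # the interval already contains 0 when b = 0
    upper = 1.0
    while gap(upper) > 0:  # coverage_pw(x, b) decreases to 0 as b grows
        upper *= 2.0
    return brentq(gap, 0.0, upper)


c = norm.ppf(0.975)
print(half_width_pw(0.0, 1.0), c)
print(coverage_pt(2.0, 1.5, 1.0, c), mc_coverage_pt(2.0, 1.5, 1.0, c))
print(b_value_pw(3.0, 1.0, 1.0))
```

The quadrature value and the simulated coverage should agree up to Monte Carlo error. The outer minimum over $t\in[0,b]$ that defines $\hat L_{\mathrm{PT}}$ could be approximated by evaluating `coverage_pt` on a grid of $t$. That step is omitted here because it is slow with nested quadrature.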
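B.3 Proof of Theorem 3.3

Proof of Theorem 3.3. Recall that the soft-thresholding estimator is defined as

$$\hat\tau_{\mathrm{ST}}=\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)\,\mathbb 1\{|\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\}+\frac{\gamma}{1+\gamma}\sigma c_{\alpha/2}\,\mathrm{sign}(\hat\tau_1-\hat\tau_0)\,\mathbb 1\{|\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}.$$

As before, note that

$$\begin{pmatrix}\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)-\tau\\ \hat\tau_1-\hat\tau_0\end{pmatrix}\sim N\left(\begin{pmatrix}\frac{\gamma}{1+\gamma}\Delta\\ \Delta\end{pmatrix},\ \begin{pmatrix}\frac{1}{1+\gamma}\sigma_0^2&0\\0&\sigma_0^2+\sigma_1^2\end{pmatrix}\right).$$

We first consider the event that the pretest accepts, $|\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}$. Conditional on this event, $\hat\tau_{\mathrm{ST}}=\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)$, and therefore

$$(\hat\tau_{\mathrm{ST}}-\tau)\mid\{|\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\}\sim N\Big(\frac{\gamma}{1+\gamma}\Delta,\ \frac{1}{1+\gamma}\sigma_0^2\Big).$$

Equivalently, after standardization,

$$(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{ST}}-\tau)\mid\{|\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\}\sim N\Big(\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0},\ 1\Big).$$

We next consider the event that the pretest rejects, $|\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}$. Conditional on $\sigma^{-1}(\hat\tau_1-\hat\tau_0)=u$, we have

$$\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)=\sqrt{\frac{\gamma}{1+\gamma}}\,\sigma_0u.$$

Since $\hat\tau_{\mathrm{ST}}=\hat\tau_0+\frac{\gamma}{1+\gamma}\sigma c_{\alpha/2}\,\mathrm{sign}(\hat\tau_1-\hat\tau_0)$ on this event, we have

$$\begin{aligned}
&(\hat\tau_{\mathrm{ST}}-\tau)\mid\{\sigma^{-1}(\hat\tau_1-\hat\tau_0)=u,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}\\
&=\Big(\hat\tau_0-\tau+\frac{\gamma}{1+\gamma}\sigma c_{\alpha/2}\,\mathrm{sign}(\hat\tau_1-\hat\tau_0)\Big)\mid\{\sigma^{-1}(\hat\tau_1-\hat\tau_0)=u,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}\\
&=\Big(\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)-\tau\Big)\mid\{\sigma^{-1}(\hat\tau_1-\hat\tau_0)=u,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}-\sqrt{\frac{\gamma}{1+\gamma}}\,\sigma_0\big[u-c_{\alpha/2}\,\mathrm{sign}(u)\big]\\
&\sim N\Big(\frac{\gamma}{1+\gamma}\Delta-\sqrt{\frac{\gamma}{1+\gamma}}\,\sigma_0\big[u-c_{\alpha/2}\,\mathrm{sign}(u)\big],\ \frac{1}{1+\gamma}\sigma_0^2\Big),
\end{aligned}$$

which implies

$$(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{ST}}-\tau)\mid\{\sigma^{-1}(\hat\tau_1-\hat\tau_0)=u,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}\sim N\Big(\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0}-\sqrt\gamma\big[u-c_{\alpha/2}\,\mathrm{sign}(u)\big],\ 1\Big).$$

We now evaluate the worst-case coverage probability of the confidence interval $[\hat\tau_{\mathrm{ST}}-L(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{ST}}+L(1+\gamma)^{-1/2}\sigma_0]$.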
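We have

$$\inf_{\Delta:|\Delta/\sigma_0|\le b}P_\Delta\big(\tau\in[\hat\tau_{\mathrm{ST}}-L(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{ST}}+L(1+\gamma)^{-1/2}\sigma_0]\big)=\inf_{\Delta:|\Delta/\sigma_0|\le b}P_\Delta\big(|\hat\tau_{\mathrm{ST}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0\big).$$

Fixing $\Delta/\sigma_0=t$, we decompose the probability according to whether the pretest accepts or rejects:

$$\begin{aligned}
&P_{\Delta/\sigma_0=t}\big(|\hat\tau_{\mathrm{ST}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0\big)\\
&=P_{\Delta/\sigma_0=t}\big(|(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{ST}}-\tau)|\le L\big)\\
&=P_{\Delta/\sigma_0=t}\big(|(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{ST}}-\tau)|\le L,\ |\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\big)+P_{\Delta/\sigma_0=t}\big(|(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{ST}}-\tau)|\le L,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\big)\\
&=P\Big(\Big|N\Big(\tfrac{\gamma}{\sqrt{1+\gamma}}t,1\Big)\Big|\le L\Big)P\Big(\Big|N\Big(\sqrt{\tfrac{\gamma}{1+\gamma}}t,1\Big)\Big|\le c_{\alpha/2}\Big)\\
&\quad+E_{U\sim N(\sqrt{\gamma/(1+\gamma)}\,t,1)}\Big[P\Big(\Big|N\Big(\tfrac{\gamma}{\sqrt{1+\gamma}}t-\sqrt\gamma\big[U-c_{\alpha/2}\,\mathrm{sign}(U)\big],1\Big)\Big|\le L\Big)\mathbb 1\{|U|>c_{\alpha/2}\}\Big]\\
&=\Big[\Phi\Big(c_{\alpha/2}-\sqrt{\tfrac{\gamma}{1+\gamma}}t\Big)-\Phi\Big(-c_{\alpha/2}-\sqrt{\tfrac{\gamma}{1+\gamma}}t\Big)\Big]\Big[\Phi\Big(L-\tfrac{\gamma}{\sqrt{1+\gamma}}t\Big)-\Phi\Big(-L-\tfrac{\gamma}{\sqrt{1+\gamma}}t\Big)\Big]\\
&\quad+\int_{-\infty}^{-c_{\alpha/2}-\sqrt{\gamma/(1+\gamma)}\,t}\Big[\Phi\big(L+\sqrt\gamma(u+c_{\alpha/2})\big)-\Phi\big(-L+\sqrt\gamma(u+c_{\alpha/2})\big)\Big]\phi(u)\,du\\
&\quad+\int_{c_{\alpha/2}-\sqrt{\gamma/(1+\gamma)}\,t}^{\infty}\Big[\Phi\big(L+\sqrt\gamma(u-c_{\alpha/2})\big)-\Phi\big(-L+\sqrt\gamma(u-c_{\alpha/2})\big)\Big]\phi(u)\,du,
\end{aligned}$$

which yields exactly the expression in (3.4).

To establish the monotonicity of the coverage probability in $|\Delta|$, fix $L>0$ and write $t=\Delta/\sigma_0$. Since $P_\Delta(|\hat\tau_{\mathrm{ST}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0)$ is symmetric about $\Delta=0$, it suffices to consider $t\ge0$. By Lemma 3.2, we have

$$(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{ST}}-\tau)\overset{d}{=}Z+\frac{\gamma}{\sqrt{1+\gamma}}t-\sqrt\gamma\,S(U).$$

Here $Z\sim N(0,1)$ is independent of $U\sim N(\sqrt{\gamma/(1+\gamma)}\,t,1)$, and $S(u)=\mathrm{sign}(u)(|u|-c_{\alpha/2})_+$. Consequently, we can write the coverage probability as

$$P_{\Delta/\sigma_0=t}\big(|\hat\tau_{\mathrm{ST}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0\big)=E\big[\Phi(L-\mu_t(U))-\Phi(-L-\mu_t(U))\big],$$

with $\mu_t(U)=\frac{\gamma}{\sqrt{1+\gamma}}t-\sqrt\gamma\,S(U)$. The function $\mu\mapsto\Phi(L-\mu)-\Phi(-L-\mu)$ is even and strictly decreasing in $|\mu|$. Moreover, increasing $t$ shifts the distribution of $U$ away from zero, which increases $|\mu_t(U)|$ in the sense of stochastic order.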
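Therefore, the coverage probability is nonincreasing in $t\ge0$, or equivalently, nonincreasing in $|\Delta|$. Since the coverage probability is symmetric in $\Delta$ and nonincreasing in $|\Delta|$, the worst case over $|\Delta/\sigma_0|\le b$ is attained at $\Delta/\sigma_0=b$. Choosing $\hat L_{\mathrm{ST}}$ to satisfy

$$P_{\Delta/\sigma_0=b}\big(|\hat\tau_{\mathrm{ST}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0\big)=1-\zeta$$

completes the proof.

B.4 Proof of Theorem 4.1

Proof of Theorem 4.1. By the definition of $\hat{\boldsymbol\tau}_{\mathrm{PW}}$, under Assumption 4.1 we have

$$\hat{\boldsymbol\tau}_{\mathrm{PW}}\sim N\big(\boldsymbol\tau+(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta,\ (\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\big).$$

Equivalently, after standardization,

$$(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{1/2}(\hat{\boldsymbol\tau}_{\mathrm{PW}}-\boldsymbol\tau)\sim N\big((\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta,\ \boldsymbol I_d\big),$$

which implies

$$(\hat{\boldsymbol\tau}_{\mathrm{PW}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PW}}-\boldsymbol\tau)\sim\chi^2_d\Big(\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta\big\|_2^2\Big).$$

Here $\chi^2_d(\cdot)$ denotes the noncentral chi-squared distribution with $d$ degrees of freedom and the indicated noncentrality parameter.

We now evaluate the worst-case coverage probability of the quadratic confidence region $\{\boldsymbol\tau\in\mathbb R^d:(\hat{\boldsymbol\tau}_{\mathrm{PW}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PW}}-\boldsymbol\tau)\le M\}$. Because the noncentral chi-squared distribution is stochastically increasing in its noncentrality parameter, we have

$$\begin{aligned}
&\inf_{\boldsymbol\Delta:|[\boldsymbol\Sigma^{-1/2}\boldsymbol\Delta]_j|\le b_j,\ \forall j=1,\ldots,d}P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{PW}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PW}}-\boldsymbol\tau)\le M\big)\\
&=\inf_{\boldsymbol\Delta:|[\boldsymbol\Sigma^{-1/2}\boldsymbol\Delta]_j|\le b_j,\ \forall j=1,\ldots,d}P\Big(\chi^2_d\Big(\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta\big\|_2^2\Big)\le M\Big)\\
&=P\Big(\chi^2_d\Big(\sup_{\boldsymbol\Delta:|[\boldsymbol\Sigma^{-1/2}\boldsymbol\Delta]_j|\le b_j,\ \forall j=1,\ldots,d}\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta\big\|_2^2\Big)\le M\Big).
\end{aligned}$$

To evaluate the supremum in the noncentrality parameter, note that

$$\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta\big\|_2^2=\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Sigma^{1/2}(\boldsymbol\Sigma^{-1/2}\boldsymbol\Delta)\big\|_2^2.$$

This is a convex quadratic function of $\boldsymbol u=\boldsymbol\Sigma^{-1/2}\boldsymbol\Delta$. Since the constraint $|u_j|\le b_j$ defines a hyperrectangle, the maximum of this convex quadratic over the constraint set is attained at a vertex.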
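The vertex argument and the resulting critical value can be made concrete numerically. The sketch below assumes NumPy and SciPy and proceeds in three steps:

- It enumerates all $2^d$ sign vectors to find the worst-case noncentrality.
- It takes $\widehat M_{\mathrm{PW}}$ as the $(1-\zeta)$ quantile of the noncentral chi-squared distribution via `scipy.stats.ncx2.ppf`.
- It checks by simulation that the quadratic region has coverage close to $1-\zeta$ at the worst vertex.

The bias-scaling matrix $\boldsymbol\Sigma$ is defined in the main text. Here it is treated as an arbitrary positive-definite input, and the covariance matrices are randomly generated for illustration only. The helper names are hypothetical.

```python
import itertools

import numpy as np
from scipy.stats import chi2, ncx2


def sym_power(S, p):
    """S**p for a symmetric positive-definite matrix S."""
    w, V = np.linalg.eigh(S)
    return (V * w**p) @ V.T


def worst_case_pw(Sigma0, Sigma1, Sigma, b, zeta=0.05):
    """Worst-case noncentrality over |[Sigma^{-1/2} Delta]_j| <= b_j and the critical value M_PW."""
    P = np.linalg.inv(Sigma0) + np.linalg.inv(Sigma1)  # precision of the PW estimator
    A = sym_power(P, -0.5) @ np.linalg.inv(Sigma1) @ sym_power(Sigma, 0.5)
    best_lam, best_s = -np.inf, None
    for s in itertools.product([-1.0, 1.0], repeat=len(b)):  # all 2^d vertices of the box
        v = A @ (b * np.array(s))
        if v @ v > best_lam:
            best_lam, best_s = float(v @ v), np.array(s)
    M = ncx2.ppf(1 - zeta, df=len(b), nc=best_lam)
    return best_lam, best_s, M, A, P


rng = np.random.default_rng(1)
d = 3
B0, B1 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Sigma0 = B0 @ B0.T + d * np.eye(d)
Sigma1 = B1 @ B1.T + np.eye(d)
Sigma = Sigma1  # illustrative choice of the bias-scaling matrix
b = np.array([0.5, 1.0, 0.25])
lam, s_star, M, A, P = worst_case_pw(Sigma0, Sigma1, Sigma, b)

# Random points inside the box never exceed the best vertex.
U = rng.uniform(-1.0, 1.0, size=(20_000, d)) * b
interior = np.sum((U @ A.T) ** 2, axis=1)

# Simulated coverage of the quadratic region at the worst vertex (tau = 0).
cov = np.linalg.inv(P)
cov = (cov + cov.T) / 2  # symmetrize against round-off
delta = sym_power(Sigma, 0.5) @ (b * s_star)
mean = cov @ np.linalg.inv(Sigma1) @ delta  # bias of the PW estimator
draws = rng.multivariate_normal(mean, cov, size=200_000)
coverage = np.mean(np.einsum("ij,jk,ik->i", draws, P, draws) <= M)
print(lam, interior.max(), M, chi2.ppf(0.95, d), coverage)
```

Brute-force enumeration is exponential in $d$. It is meant only to illustrate the vertex claim for small $d$, not as an efficient way to compute the supremum.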
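Therefore,

$$\sup_{\boldsymbol\Delta:|[\boldsymbol\Sigma^{-1/2}\boldsymbol\Delta]_j|\le b_j,\ \forall j=1,\ldots,d}\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta\big\|_2^2=\sup_{\boldsymbol s\in\{\pm1\}^d}\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Sigma^{1/2}(\boldsymbol b\odot\boldsymbol s)\big\|_2^2.$$

Finally, we choose $\widehat M_{\mathrm{PW}}$ as the $(1-\zeta)$ quantile of the noncentral chi-squared distribution with $d$ degrees of freedom and the above noncentrality parameter. This ensures that the confidence region attains the desired coverage level, which completes the proof.

B.5 Proof of Theorem 4.2

Proof of Theorem 4.2. Recall that the pretest estimator is defined as

$$\hat{\boldsymbol\tau}_{\mathrm{PT}}=\hat{\boldsymbol\tau}_0+(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\,\mathbb 1\big\{\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\le q\big\}.$$

A direct calculation shows that

$$\hat{\boldsymbol\tau}_0+(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\ \perp\!\!\!\perp\ \hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0.$$

Moreover, $\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0\sim N(\boldsymbol\Delta,\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)$ and

$$\hat{\boldsymbol\tau}_0+(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\sim N\big(\boldsymbol\tau+(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta,\ (\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\big).$$

We first consider the event that the pretest accepts, $\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\le q$. Conditional on this event, $\hat{\boldsymbol\tau}_{\mathrm{PT}}=\hat{\boldsymbol\tau}_0+(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)$, and therefore

$$(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)\mid\{\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\le q\}\sim N\big((\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta,\ (\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\big).$$

After standardization,

$$(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{1/2}(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)\mid\{\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\le q\}\sim N\big((\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta,\ \boldsymbol I_d\big),$$

which implies

$$(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)\mid\{\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\le q\}\sim\chi^2_d\Big(\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta\big\|_2^2\Big).$$

We next consider the event that the pretest rejects, $\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q$.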
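Conditional on $(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\boldsymbol u$, we have

$$(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{1/2}\boldsymbol u.$$

Since $\hat{\boldsymbol\tau}_{\mathrm{PT}}=\hat{\boldsymbol\tau}_0$ on this event, we have

$$\begin{aligned}
&(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)\mid\big\{(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\boldsymbol u,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q\big\}\\
&=(\hat{\boldsymbol\tau}_0-\boldsymbol\tau)\mid\big\{(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\boldsymbol u,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q\big\}\\
&=\big(\hat{\boldsymbol\tau}_0+(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)-\boldsymbol\tau\big)\mid\big\{(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\boldsymbol u,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q\big\}\\
&\quad-(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{1/2}\boldsymbol u\\
&\sim N\big((\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}[\boldsymbol\Delta-(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{1/2}\boldsymbol u],\ (\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\big).
\end{aligned}$$

After standardization,

$$(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{1/2}(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)\mid\big\{(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\boldsymbol u,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q\big\}\sim N\big((\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}[\boldsymbol\Delta-(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{1/2}\boldsymbol u],\ \boldsymbol I_d\big),$$

which implies

$$(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)\mid\big\{(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\boldsymbol u,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q\big\}\sim\chi^2_d\Big(\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}[\boldsymbol\Delta-(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{1/2}\boldsymbol u]\big\|_2^2\Big).$$

We now evaluate the worst-case coverage probability of the confidence region $\{\boldsymbol\tau\in\mathbb R^d:(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)\le M\}$.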
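We decompose the coverage probability according to whether the pretest accepts or rejects:

$$\begin{aligned}
&P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)\le M\big)\\
&=P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)\le M,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\le q\big)\\
&\quad+P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)\le M,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q\big)\\
&=\Psi\Big(M;\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta\big\|_2^2\Big)\,\Psi\Big(q;\big\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}\boldsymbol\Delta\big\|_2^2\Big)\\
&\quad+\int_{\|\boldsymbol u\|_2^2>q}\Psi\Big(M;\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}[\boldsymbol\Delta-(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{1/2}\boldsymbol u]\big\|_2^2\Big)\,\phi\big(\boldsymbol u;(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}\boldsymbol\Delta\big)\,d\boldsymbol u.
\end{aligned}$$

Here $\Psi(\cdot;\lambda)$ is the distribution function of $\chi^2_d(\lambda)$, and $\phi(\cdot;\boldsymbol\mu)$ is the density of $N(\boldsymbol\mu,\boldsymbol I_d)$. Setting $\boldsymbol\Sigma^{-1/2}\boldsymbol\Delta=\boldsymbol t$ yields exactly the expression in (4.4). Finally, choosing $\widehat M_{\mathrm{PT}}$ to satisfy

$$\inf_{\boldsymbol\Delta:|[\boldsymbol\Sigma^{-1/2}\boldsymbol\Delta]_j|\le b_j,\ \forall j}P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{PT}}-\boldsymbol\tau)\le M\big)=1-\zeta$$

completes the proof.

B.6 Proof of Theorem 4.3

Proof of Theorem 4.3. Recall that the soft-thresholding estimator is defined as

$$\hat{\boldsymbol\tau}_{\mathrm{ST}}=\hat{\boldsymbol\tau}_0+h_q\big(\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\big)(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0).$$

A direct calculation shows that

$$\hat{\boldsymbol\tau}_0+(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\ \perp\!\!\!\perp\ \hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0.$$

Moreover, $\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0\sim N(\boldsymbol\Delta,\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)$ and

$$\hat{\boldsymbol\tau}_0+(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\sim N\big(\boldsymbol\tau+(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta,\ (\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\big).$$

We first consider the event that the pretest accepts, $\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\le q$. Conditional on this event, $\hat{\boldsymbol\tau}_{\mathrm{ST}}=\hat{\boldsymbol\tau}_0+(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)$, and therefore

$$(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)\mid\{\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\le q\}\sim N\big((\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta,\ (\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\big).$$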
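After standardization,

$$(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{1/2}(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)\mid\{\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\le q\}\sim N\big((\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta,\ \boldsymbol I_d\big),$$

which implies

$$(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)\mid\{\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\le q\}\sim\chi^2_d\Big(\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta\big\|_2^2\Big).$$

We next consider the event that the pretest rejects, $\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q$. Conditional on $(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\boldsymbol u$, we have

$$(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{1/2}\boldsymbol u.$$

On this event, $\hat{\boldsymbol\tau}_{\mathrm{ST}}=\hat{\boldsymbol\tau}_0+h_q^*\big(\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\big)(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)$. Hence

$$\begin{aligned}
&(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)\mid\big\{(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\boldsymbol u,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q\big\}\\
&=\big(\hat{\boldsymbol\tau}_0+h_q^*(\|\boldsymbol u\|_2^2)(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)-\boldsymbol\tau\big)\mid\big\{(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\boldsymbol u,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q\big\}\\
&=\big(\hat{\boldsymbol\tau}_0+(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)-\boldsymbol\tau\big)\mid\big\{(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\boldsymbol u,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q\big\}\\
&\quad-\big(1-h_q^*(\|\boldsymbol u\|_2^2)\big)(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{1/2}\boldsymbol u\\
&\sim N\Big((\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\boldsymbol\Sigma_1^{-1}\big[\boldsymbol\Delta-\big(1-h_q^*(\|\boldsymbol u\|_2^2)\big)(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{1/2}\boldsymbol u\big],\ (\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1}\Big).
\end{aligned}$$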
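After standardization,

$$(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{1/2}(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)\mid\big\{(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\boldsymbol u,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q\big\}\sim N\Big((\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\big[\boldsymbol\Delta-\big(1-h_q^*(\|\boldsymbol u\|_2^2)\big)(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{1/2}\boldsymbol u\big],\ \boldsymbol I_d\Big),$$

which implies

$$(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)\mid\big\{(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)=\boldsymbol u,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q\big\}\sim\chi^2_d\Big(\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\big[\boldsymbol\Delta-\big(1-h_q^*(\|\boldsymbol u\|_2^2)\big)(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{1/2}\boldsymbol u\big]\big\|_2^2\Big).$$

We now evaluate the worst-case coverage probability of the confidence region $\{\boldsymbol\tau\in\mathbb R^d:(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)\le M\}$. We decompose the coverage probability according to whether the pretest accepts or rejects:

$$\begin{aligned}
&P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)\le M\big)\\
&=P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)\le M,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2\le q\big)\\
&\quad+P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)\le M,\ \|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}(\hat{\boldsymbol\tau}_1-\hat{\boldsymbol\tau}_0)\|_2^2>q\big)\\
&=\Psi\Big(M;\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\boldsymbol\Delta\big\|_2^2\Big)\,\Psi\Big(q;\big\|(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}\boldsymbol\Delta\big\|_2^2\Big)\\
&\quad+\int_{\|\boldsymbol u\|_2^2>q}\Psi\Big(M;\big\|(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})^{-1/2}\boldsymbol\Sigma_1^{-1}\big[\boldsymbol\Delta-\big(1-h_q^*(\|\boldsymbol u\|_2^2)\big)(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{1/2}\boldsymbol u\big]\big\|_2^2\Big)\,\phi\big(\boldsymbol u;(\boldsymbol\Sigma_0+\boldsymbol\Sigma_1)^{-1/2}\boldsymbol\Delta\big)\,d\boldsymbol u.
\end{aligned}$$

Setting $\boldsymbol\Sigma^{-1/2}\boldsymbol\Delta=\boldsymbol t$ yields exactly the expression in (4.5).

Let $\boldsymbol t=\boldsymbol\Sigma^{-1/2}\boldsymbol\Delta$ and denote

$$p(\boldsymbol t):=P_{\boldsymbol\Sigma^{-1/2}\boldsymbol\Delta=\boldsymbol t}\big((\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)\le M\big).$$

By construction of the soft-thresholding rule, $p(\boldsymbol t)$ is invariant under coordinate-wise sign flips, that is, $p(\boldsymbol t)=p(\boldsymbol t\odot\boldsymbol s)$ for all $\boldsymbol s\in\{\pm1\}^d$. Moreover, fix any $j\in\{1,\ldots,d\}$ and any values of the remaining coordinates. Then the map $u\mapsto p(t_1,\ldots,t_{j-1},u,t_{j+1},\ldots,t_d)$ is nonincreasing in $|u|$.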
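Therefore,

$$\inf_{\boldsymbol t:|t_j|\le b_j}p(\boldsymbol t)=\inf_{\boldsymbol s\in\{\pm1\}^d}p(\boldsymbol b\odot\boldsymbol s),$$

that is, the worst-case coverage probability over the hyperrectangle is attained at a vertex. Finally, choosing $\widehat M_{\mathrm{ST}}$ to satisfy

$$\inf_{\boldsymbol\Delta:\boldsymbol\Sigma^{-1/2}\boldsymbol\Delta=\boldsymbol b\odot\boldsymbol s,\ \boldsymbol s\in\{-1,1\}^d}P_{\boldsymbol\Delta}\big((\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)^\top(\boldsymbol\Sigma_0^{-1}+\boldsymbol\Sigma_1^{-1})(\hat{\boldsymbol\tau}_{\mathrm{ST}}-\boldsymbol\tau)\le M\big)=1-\zeta$$

completes the proof.

B.7 Proof of Theorem 5.1

Proof of Theorem 5.1. By the definition of $\hat\tau_{\mathrm{PW}}$, under Assumption 5.1 we have

$$\hat\tau_{\mathrm{PW}}\sim N\Big(\tau+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}\Delta_j,\ \frac{1}{1+\|\boldsymbol\gamma\|_1}\sigma_0^2\Big).$$

Equivalently, after standardization,

$$(1+\|\boldsymbol\gamma\|_1)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PW}}-\tau)\sim N\Big(\sum_{j=1}^K\frac{\gamma_j}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\frac{\Delta_j}{\sigma_0},\ 1\Big).$$

We now evaluate the worst-case coverage probability of the confidence interval $[\hat\tau_{\mathrm{PW}}-L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PW}}+L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0]$. We have

$$\begin{aligned}
&\inf_{\boldsymbol\Delta:|\Delta_j/\sigma_0|\le b_j,\ \forall j=1,\ldots,K}P_{\boldsymbol\Delta}\big(|\hat\tau_{\mathrm{PW}}-\tau|\le L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\big)\\
&=\inf_{\boldsymbol\Delta:|\Delta_j/\sigma_0|\le b_j,\ \forall j=1,\ldots,K}P\Big(\Big|N\Big(\sum_{j=1}^K\frac{\gamma_j}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\frac{\Delta_j}{\sigma_0},1\Big)\Big|\le L\Big)\\
&=P\Big(\Big|N\Big(\sup_{\boldsymbol\Delta:|\Delta_j/\sigma_0|\le b_j,\ \forall j=1,\ldots,K}\sum_{j=1}^K\frac{\gamma_j}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\frac{\Delta_j}{\sigma_0},1\Big)\Big|\le L\Big)\\
&=P\Big(\Big|N\Big(\frac{\langle\boldsymbol\gamma,\boldsymbol b\rangle}{\sqrt{1+\|\boldsymbol\gamma\|_1}},1\Big)\Big|\le L\Big)\\
&=\Phi\Big(L-\frac{\langle\boldsymbol\gamma,\boldsymbol b\rangle}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\Big)-\Phi\Big(-L-\frac{\langle\boldsymbol\gamma,\boldsymbol b\rangle}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\Big).
\end{aligned}$$

Therefore, the proof is complete since $\hat L_{\mathrm{PW}}$ is the solution to

$$\Phi\Big(L-\frac{\langle\boldsymbol\gamma,\boldsymbol b\rangle}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\Big)-\Phi\Big(-L-\frac{\langle\boldsymbol\gamma,\boldsymbol b\rangle}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\Big)=1-\zeta.$$

B.8 Proof of Theorem 5.2

Proof of Theorem 5.2. Recall that the pretest estimator is defined as

$$\hat\tau_{\mathrm{PT}}=\hat\tau_0+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}(\hat\tau_j-\hat\tau_0)\,\mathbb 1\big\{|\hat\tau_j-\hat\tau_0|\le(1+\gamma_j^{-1})^{1/2}\sigma_0c_{\alpha/2}\big\}.$$

Then

$$(\hat\tau_1-\hat\tau_0,\ldots,\hat\tau_K-\hat\tau_0)^\top/\sigma_0\sim N(\boldsymbol\Delta/\sigma_0,\boldsymbol V),\qquad V_{ij}=1+\gamma_i^{-1}\mathbb 1(i=j),$$

and moreover

$$\hat\tau_0+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}(\hat\tau_j-\hat\tau_0)\ \perp\!\!\!\perp\ (\hat\tau_1-\hat\tau_0,\ldots,\hat\tau_K-\hat\tau_0)^\top.$$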
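Both facts just stated, the form of $\boldsymbol V$ and the independence of the combined estimator from the differences, can be checked exactly. This works because every quantity involved is linear in $(\hat\tau_0,\hat\tau_1,\ldots,\hat\tau_K)$.

The sketch below assumes that the $K+1$ estimators are independent with variances $\sigma_0^2,\sigma_1^2,\ldots,\sigma_K^2$ and $\gamma_j=\sigma_0^2/\sigma_j^2$, which is consistent with the stated form of $\boldsymbol V$. The function name is hypothetical.

```python
import numpy as np


def combined_and_differences_cov(sigma0, sigmas):
    """Exact covariance of (combined estimator, (tau_j - tau_0)/sigma0 for j = 1..K).

    Assumes independent estimators tau_0, ..., tau_K with standard deviations
    sigma0, sigmas[0], ..., sigmas[K-1], and gamma_j = sigma0^2 / sigma_j^2.
    """
    sigmas = np.asarray(sigmas, dtype=float)
    K = len(sigmas)
    gamma = sigma0**2 / sigmas**2
    w = gamma / (1.0 + gamma.sum())  # weights gamma_j / (1 + ||gamma||_1)
    T = np.zeros((K + 1, K + 1))     # linear map applied to (tau_0, ..., tau_K)
    T[0, 0] = 1.0 - w.sum()          # combined = (1 - sum_j w_j) tau_0 + sum_j w_j tau_j
    T[0, 1:] = w
    T[1:, 0] = -1.0 / sigma0         # rows j = 1..K give (tau_j - tau_0) / sigma0
    T[1:, 1:] = np.eye(K) / sigma0
    D = np.diag(np.concatenate(([sigma0**2], sigmas**2)))
    return T @ D @ T.T, gamma


C, gamma = combined_and_differences_cov(1.2, [0.8, 1.5, 2.0])
print(C[0, 0], 1.2**2 / (1 + gamma.sum()))                  # variance of the combined estimator
print(np.abs(C[0, 1:]).max())                                # covariance with the differences
print(np.allclose(C[1:, 1:], 1.0 + np.diag(1.0 / gamma)))    # matches V
```

Under joint normality, the zero cross-covariance is equivalent to the independence used in the proof. The lower-right block reproduces $V_{ij}=1+\gamma_i^{-1}\mathbb 1(i=j)$.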
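Rewrite $\hat\tau_{\mathrm{PT}}$ as

$$\begin{aligned}
\hat\tau_{\mathrm{PT}}&=\hat\tau_0+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}(\hat\tau_j-\hat\tau_0)\,\mathbb 1\big\{|\hat\tau_j-\hat\tau_0|\le(1+\gamma_j^{-1})^{1/2}\sigma_0c_{\alpha/2}\big\}\\
&=\hat\tau_0+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}(\hat\tau_j-\hat\tau_0)-\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}(\hat\tau_j-\hat\tau_0)\,\mathbb 1\big\{|\hat\tau_j-\hat\tau_0|>(1+\gamma_j^{-1})^{1/2}\sigma_0c_{\alpha/2}\big\}.
\end{aligned}$$

Consequently, conditioning on $(\hat\tau_1-\hat\tau_0,\ldots,\hat\tau_K-\hat\tau_0)^\top/\sigma_0=\boldsymbol u$ yields

$$\begin{aligned}
\hat\tau_{\mathrm{PT}}\mid\{(\hat\tau_1-\hat\tau_0,\ldots,\hat\tau_K-\hat\tau_0)^\top/\sigma_0=\boldsymbol u\}&=\hat\tau_0+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}(\hat\tau_j-\hat\tau_0)-\sum_{j:|u_j|>(1+\gamma_j^{-1})^{1/2}c_{\alpha/2}}\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}\sigma_0u_j\\
&\sim N\Big(\tau+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}\Delta_j-\sum_{j:|u_j|>(1+\gamma_j^{-1})^{1/2}c_{\alpha/2}}\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}\sigma_0u_j,\ \frac{1}{1+\|\boldsymbol\gamma\|_1}\sigma_0^2\Big).
\end{aligned}$$

Equivalently,

$$(1+\|\boldsymbol\gamma\|_1)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PT}}-\tau)\mid\{(\hat\tau_1-\hat\tau_0,\ldots,\hat\tau_K-\hat\tau_0)^\top/\sigma_0=\boldsymbol u\}\sim N\Big(\sum_{j=1}^K\frac{\gamma_j}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\frac{\Delta_j}{\sigma_0}-\sum_{j:|u_j|>(1+\gamma_j^{-1})^{1/2}c_{\alpha/2}}\frac{\gamma_j}{\sqrt{1+\|\boldsymbol\gamma\|_1}}u_j,\ 1\Big).$$

We now evaluate the worst-case coverage probability of the confidence interval $[\hat\tau_{\mathrm{PT}}-L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PT}}+L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0]$. Define $\boldsymbol u'\in\mathbb R^K$ coordinate-wise by $u_j'=u_j\,\mathbb 1\big(|u_j|>(1+\gamma_j^{-1})^{1/2}c_{\alpha/2}\big)$ for $j=1,\ldots,K$. Let $\phi_{\boldsymbol t,\boldsymbol V}$ denote the density of $N(\boldsymbol t,\boldsymbol V)$. We have

$$\begin{aligned}
&P_{\boldsymbol\Delta/\sigma_0=\boldsymbol t}\big(|\hat\tau_{\mathrm{PT}}-\tau|\le L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\big)\\
&=\int_{\mathbb R^K}P_{\boldsymbol\Delta/\sigma_0=\boldsymbol t}\Big(|\hat\tau_{\mathrm{PT}}-\tau|\le L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\ \Big|\ (\hat\tau_1-\hat\tau_0,\ldots,\hat\tau_K-\hat\tau_0)^\top/\sigma_0=\boldsymbol u\Big)\phi_{\boldsymbol t,\boldsymbol V}(\boldsymbol u)\,d\boldsymbol u\\
&=\int_{\mathbb R^K}P\Big(\Big|N\Big(\frac{\langle\boldsymbol\gamma,\boldsymbol t-\boldsymbol u'\rangle}{\sqrt{1+\|\boldsymbol\gamma\|_1}},1\Big)\Big|\le L\Big)\phi_{\boldsymbol t,\boldsymbol V}(\boldsymbol u)\,d\boldsymbol u\\
&=\int_{\mathbb R^K}\Big[\Phi\Big(L-\frac{\langle\boldsymbol\gamma,\boldsymbol t-\boldsymbol u'\rangle}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\Big)-\Phi\Big(-L-\frac{\langle\boldsymbol\gamma,\boldsymbol t-\boldsymbol u'\rangle}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\Big)\Big]\phi_{\boldsymbol t,\boldsymbol V}(\boldsymbol u)\,d\boldsymbol u,
\end{aligned}$$

which yields exactly the expression in the theorem. Finally, choosing $\hat L_{\mathrm{PT}}$ to satisfy

$$\inf_{\boldsymbol\Delta:|\boldsymbol\Delta/\sigma_0|\le\boldsymbol b}P_{\boldsymbol\Delta}\big(|\hat\tau_{\mathrm{PT}}-\tau|\le L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\big)=1-\zeta$$

completes the proof.

B.9 Proof of Theorem 5.3

Proof of Theorem 5.3.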
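Recall that the soft-thresholding estimator is defined as

$$\hat\tau_{\mathrm{ST}}=\hat\tau_0+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}\Big[(\hat\tau_j-\hat\tau_0)\,\mathbb 1\big\{|\hat\tau_j-\hat\tau_0|\le(1+\gamma_j^{-1})^{1/2}\sigma_0c_{\alpha/2}\big\}+(1+\gamma_j^{-1})^{1/2}\sigma_0c_{\alpha/2}\,\mathrm{sign}(\hat\tau_j-\hat\tau_0)\,\mathbb 1\big\{|\hat\tau_j-\hat\tau_0|>(1+\gamma_j^{-1})^{1/2}\sigma_0c_{\alpha/2}\big\}\Big].$$

Then

$$(\hat\tau_1-\hat\tau_0,\ldots,\hat\tau_K-\hat\tau_0)^\top/\sigma_0\sim N(\boldsymbol\Delta/\sigma_0,\boldsymbol V),\qquad V_{ij}=1+\gamma_i^{-1}\mathbb 1(i=j),$$

and moreover

$$\hat\tau_0+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}(\hat\tau_j-\hat\tau_0)\ \perp\!\!\!\perp\ (\hat\tau_1-\hat\tau_0,\ldots,\hat\tau_K-\hat\tau_0)^\top.$$

Rewrite $\hat\tau_{\mathrm{ST}}$ as

$$\begin{aligned}
\hat\tau_{\mathrm{ST}}&=\hat\tau_0+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}\Big[(\hat\tau_j-\hat\tau_0)\,\mathbb 1\big\{|\hat\tau_j-\hat\tau_0|\le(1+\gamma_j^{-1})^{1/2}\sigma_0c_{\alpha/2}\big\}+(1+\gamma_j^{-1})^{1/2}\sigma_0c_{\alpha/2}\,\mathrm{sign}(\hat\tau_j-\hat\tau_0)\,\mathbb 1\big\{|\hat\tau_j-\hat\tau_0|>(1+\gamma_j^{-1})^{1/2}\sigma_0c_{\alpha/2}\big\}\Big]\\
&=\hat\tau_0+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}(\hat\tau_j-\hat\tau_0)-\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}\Big[(\hat\tau_j-\hat\tau_0)-(1+\gamma_j^{-1})^{1/2}\sigma_0c_{\alpha/2}\,\mathrm{sign}(\hat\tau_j-\hat\tau_0)\Big]\mathbb 1\big\{|\hat\tau_j-\hat\tau_0|>(1+\gamma_j^{-1})^{1/2}\sigma_0c_{\alpha/2}\big\}.
\end{aligned}$$

Consequently, conditioning on $(\hat\tau_1-\hat\tau_0,\ldots,\hat\tau_K-\hat\tau_0)^\top/\sigma_0=\boldsymbol u$ yields

$$\begin{aligned}
&\hat\tau_{\mathrm{ST}}\mid\{(\hat\tau_1-\hat\tau_0,\ldots,\hat\tau_K-\hat\tau_0)^\top/\sigma_0=\boldsymbol u\}\\
&=\hat\tau_0+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}(\hat\tau_j-\hat\tau_0)-\sum_{j:|u_j|>(1+\gamma_j^{-1})^{1/2}c_{\alpha/2}}\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}\sigma_0\Big[u_j-(1+\gamma_j^{-1})^{1/2}c_{\alpha/2}\,\mathrm{sign}(u_j)\Big]\\
&\sim N\Big(\tau+\sum_{j=1}^K\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}\Delta_j-\sum_{j:|u_j|>(1+\gamma_j^{-1})^{1/2}c_{\alpha/2}}\frac{\gamma_j}{1+\|\boldsymbol\gamma\|_1}\sigma_0\Big[u_j-(1+\gamma_j^{-1})^{1/2}c_{\alpha/2}\,\mathrm{sign}(u_j)\Big],\ \frac{1}{1+\|\boldsymbol\gamma\|_1}\sigma_0^2\Big).
\end{aligned}$$

Equivalently,

$$\frac{\hat\tau_{\mathrm{ST}}-\tau}{(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0}\ \Big|\ \{(\hat\tau_1-\hat\tau_0,\ldots,\hat\tau_K-\hat\tau_0)^\top/\sigma_0=\boldsymbol u\}\sim N\Big(\sum_{j=1}^K\frac{\gamma_j}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\frac{\Delta_j}{\sigma_0}-\sum_{j:|u_j|>(1+\gamma_j^{-1})^{1/2}c_{\alpha/2}}\frac{\gamma_j}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\Big[u_j-(1+\gamma_j^{-1})^{1/2}c_{\alpha/2}\,\mathrm{sign}(u_j)\Big],\ 1\Big).$$

We now evaluate the worst-case coverage probability of the confidence interval $[\hat\tau_{\mathrm{ST}}-L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{ST}}+L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0]$.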
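Define $\boldsymbol u'\in\mathbb R^K$ coordinate-wise by $u_j'=\big[u_j-(1+\gamma_j^{-1})^{1/2}c_{\alpha/2}\,\mathrm{sign}(u_j)\big]\mathbb 1\big(|u_j|>(1+\gamma_j^{-1})^{1/2}c_{\alpha/2}\big)$ for $j=1,\ldots,K$. We have

$$\begin{aligned}
&P_{\boldsymbol\Delta/\sigma_0=\boldsymbol t}\big(|\hat\tau_{\mathrm{ST}}-\tau|\le L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\big)\\
&=\int_{\mathbb R^K}P_{\boldsymbol\Delta/\sigma_0=\boldsymbol t}\Big(|\hat\tau_{\mathrm{ST}}-\tau|\le L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\ \Big|\ (\hat\tau_1-\hat\tau_0,\ldots,\hat\tau_K-\hat\tau_0)^\top/\sigma_0=\boldsymbol u\Big)\phi_{\boldsymbol t,\boldsymbol V}(\boldsymbol u)\,d\boldsymbol u\\
&=\int_{\mathbb R^K}P\Big(\Big|N\Big(\frac{\langle\boldsymbol\gamma,\boldsymbol t-\boldsymbol u'\rangle}{\sqrt{1+\|\boldsymbol\gamma\|_1}},1\Big)\Big|\le L\Big)\phi_{\boldsymbol t,\boldsymbol V}(\boldsymbol u)\,d\boldsymbol u\\
&=\int_{\mathbb R^K}\Big[\Phi\Big(L-\frac{\langle\boldsymbol\gamma,\boldsymbol t-\boldsymbol u'\rangle}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\Big)-\Phi\Big(-L-\frac{\langle\boldsymbol\gamma,\boldsymbol t-\boldsymbol u'\rangle}{\sqrt{1+\|\boldsymbol\gamma\|_1}}\Big)\Big]\phi_{\boldsymbol t,\boldsymbol V}(\boldsymbol u)\,d\boldsymbol u,
\end{aligned}$$

which yields exactly the expression in the theorem.

Let $\boldsymbol t=\boldsymbol\Delta/\sigma_0$ and denote $p(\boldsymbol t):=P_{\boldsymbol\Delta/\sigma_0=\boldsymbol t}\big(|\hat\tau_{\mathrm{ST}}-\tau|\le L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\big)$. Analogous to the proof of Theorem 3.3, we can show that $p(\boldsymbol t)$ is symmetric about $\boldsymbol t=\boldsymbol 0$ and monotonically decreasing in $|t_j|$ for all $j=1,\ldots,K$. Therefore, the worst-case coverage probability is attained at $\boldsymbol t=\boldsymbol b$. Finally, choosing $\hat L_{\mathrm{ST}}$ to satisfy

$$P_{\boldsymbol\Delta/\sigma_0=\boldsymbol b}\big(|\hat\tau_{\mathrm{ST}}-\tau|\le L(1+\|\boldsymbol\gamma\|_1)^{-1/2}\sigma_0\big)=1-\zeta$$

completes the proof.

C Proofs of additional results

C.1 Proof of Lemma 3.1 and Lemma 3.2

Proof of Lemma 3.1 and Lemma 3.2. By Assumption 2.1,

$$\begin{pmatrix}\hat\tau_0-\tau\\ \hat\tau_1-\tau\end{pmatrix}\sim N\left(\begin{pmatrix}0\\ \Delta\end{pmatrix},\ \begin{pmatrix}\sigma_0^2&0\\0&\sigma_1^2\end{pmatrix}\right).$$

Hence, by linearity of multivariate normals,

$$\begin{pmatrix}\hat\tau_0-\tau\\ \hat\tau_1-\hat\tau_0\end{pmatrix}\sim N\left(\begin{pmatrix}0\\ \Delta\end{pmatrix},\ \begin{pmatrix}\sigma_0^2&-\sigma_0^2\\-\sigma_0^2&\sigma_0^2+\sigma_1^2\end{pmatrix}\right).$$

Another linear transformation yields

$$\begin{pmatrix}\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)-\tau\\ \hat\tau_1-\hat\tau_0\end{pmatrix}\sim N\left(\begin{pmatrix}\frac{\gamma}{1+\gamma}\Delta\\ \Delta\end{pmatrix},\ \begin{pmatrix}\frac{1}{1+\gamma}\sigma_0^2&0\\0&\sigma_0^2+\sigma_1^2\end{pmatrix}\right).$$

Therefore, there exist independent standard normals $Z_1,Z_2$ such that

$$\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)-\tau\overset{d}{=}\frac{\gamma}{1+\gamma}\Delta+\frac{1}{\sqrt{1+\gamma}}\sigma_0Z_1,\qquad\hat\tau_1-\hat\tau_0\overset{d}{=}\Delta+\sigma Z_2.$$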
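The two linear transformations above can be verified by propagating the mean $(0,\Delta)$ and covariance $\mathrm{diag}(\sigma_0^2,\sigma_1^2)$ through the corresponding $2\times2$ matrices. The following is a minimal NumPy sketch. The values of $\sigma_0$, $\sigma_1$, and $\Delta$ are illustrative only, and $\gamma=\sigma_0^2/\sigma_1^2$.

```python
import numpy as np

sigma0, sigma1, Delta = 1.3, 0.7, 0.4  # illustrative values
gamma = sigma0**2 / sigma1**2
mean0 = np.array([0.0, Delta])         # mean of (tau0 - tau, tau1 - tau)
cov0 = np.diag([sigma0**2, sigma1**2])

# Step 1: (tau0 - tau, tau1 - tau) -> (tau0 - tau, tau1 - tau0)
T1 = np.array([[1.0, 0.0], [-1.0, 1.0]])
mean1, cov1 = T1 @ mean0, T1 @ cov0 @ T1.T

# Step 2: (tau0 - tau, tau1 - tau0) -> (tau0 + gamma/(1+gamma)(tau1 - tau0) - tau, tau1 - tau0)
T2 = np.array([[1.0, gamma / (1 + gamma)], [0.0, 1.0]])
mean2, cov2 = T2 @ mean1, T2 @ cov1 @ T2.T

print(cov1)
print(mean2, cov2)
```

The zero off-diagonal entries of `cov2` are what allow the two components to be written with independent standard normals $Z_1$ and $Z_2$.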
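Next, by the definitions of $\hat\tau_{\mathrm{PT}}$ and $\hat\tau_{\mathrm{ST}}$,

$$\hat\tau_{\mathrm{PT}}=\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)\,\mathbb 1\{|\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\}=\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)-\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)\,\mathbb 1\{|\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\},$$

and

$$\begin{aligned}
\hat\tau_{\mathrm{ST}}&=\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)\,\mathbb 1\{|\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\}+\frac{\gamma}{1+\gamma}\sigma c_{\alpha/2}\,\mathrm{sign}(\hat\tau_1-\hat\tau_0)\,\mathbb 1\{|\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}\\
&=\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)-\frac{\gamma}{1+\gamma}\big[(\hat\tau_1-\hat\tau_0)-\sigma c_{\alpha/2}\,\mathrm{sign}(\hat\tau_1-\hat\tau_0)\big]\mathbb 1\{|\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}.
\end{aligned}$$

For the pretest estimator,

$$\begin{aligned}
\hat\tau_{\mathrm{PT}}-\tau&=\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)-\tau-\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)\,\mathbb 1\{|\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}\\
&\overset{d}{=}\frac{\gamma}{1+\gamma}\Delta+\frac{1}{\sqrt{1+\gamma}}\sigma_0Z_1-\frac{\gamma}{1+\gamma}(\Delta+\sigma Z_2)\,\mathbb 1\Big\{\Big|Z_2+\frac{\Delta}{\sigma}\Big|>c_{\alpha/2}\Big\}\\
&=\frac{1}{\sqrt{1+\gamma}}\sigma_0Z_1+\frac{\gamma}{1+\gamma}\Delta\,\mathbb 1\Big\{\Big|Z_2+\sqrt{\tfrac{\gamma}{1+\gamma}}\tfrac{\Delta}{\sigma_0}\Big|\le c_{\alpha/2}\Big\}-\sqrt{\frac{\gamma}{1+\gamma}}\,\sigma_0Z_2\,\mathbb 1\Big\{\Big|Z_2+\sqrt{\tfrac{\gamma}{1+\gamma}}\tfrac{\Delta}{\sigma_0}\Big|>c_{\alpha/2}\Big\},
\end{aligned}$$

which proves Lemma 3.1. Similarly, for the soft-thresholding estimator,

$$\begin{aligned}
\hat\tau_{\mathrm{ST}}-\tau&=\hat\tau_0+\frac{\gamma}{1+\gamma}(\hat\tau_1-\hat\tau_0)-\tau-\frac{\gamma}{1+\gamma}\big[(\hat\tau_1-\hat\tau_0)-\sigma c_{\alpha/2}\,\mathrm{sign}(\hat\tau_1-\hat\tau_0)\big]\mathbb 1\{|\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}\\
&\overset{d}{=}\frac{\gamma}{1+\gamma}\Delta+\frac{1}{\sqrt{1+\gamma}}\sigma_0Z_1-\frac{\gamma}{1+\gamma}\big[(\Delta+\sigma Z_2)-\sigma c_{\alpha/2}\,\mathrm{sign}(\Delta+\sigma Z_2)\big]\mathbb 1\Big\{\Big|Z_2+\frac{\Delta}{\sigma}\Big|>c_{\alpha/2}\Big\}\\
&=\frac{1}{\sqrt{1+\gamma}}\sigma_0Z_1+\frac{\gamma}{1+\gamma}\Delta\,\mathbb 1\Big\{\Big|Z_2+\sqrt{\tfrac{\gamma}{1+\gamma}}\tfrac{\Delta}{\sigma_0}\Big|\le c_{\alpha/2}\Big\}\\
&\quad-\sqrt{\frac{\gamma}{1+\gamma}}\,\sigma_0\Big[Z_2-c_{\alpha/2}\,\mathrm{sign}\Big(Z_2+\sqrt{\tfrac{\gamma}{1+\gamma}}\tfrac{\Delta}{\sigma_0}\Big)\Big]\mathbb 1\Big\{\Big|Z_2+\sqrt{\tfrac{\gamma}{1+\gamma}}\tfrac{\Delta}{\sigma_0}\Big|>c_{\alpha/2}\Big\},
\end{aligned}$$

which proves Lemma 3.2.

C.2 Proof of Theorem A.1

Proof of Theorem A.1. By Theorem 3.1, the shortest symmetric centered confidence interval based on $\hat\tau_{\mathrm{PW}}$ is $[\hat\tau_{\mathrm{PW}}-\hat L_{\mathrm{PW}}(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PW}}+\hat L_{\mathrm{PW}}(1+\gamma)^{-1/2}\sigma_0]$. Here $\hat L_{\mathrm{PW}}$ is the unique solution to

$$\Phi\Big(L-\frac{\gamma}{\sqrt{1+\gamma}}b\Big)-\Phi\Big(-L-\frac{\gamma}{\sqrt{1+\gamma}}b\Big)=1-\zeta.$$

By the definition of the b-value,

$$b^*_{\mathrm{PW}}(\zeta)=\inf\Big\{b\ge0:\ 0\in\big[\hat\tau_{\mathrm{PW}}-\hat L_{\mathrm{PW}}(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PW}}+\hat L_{\mathrm{PW}}(1+\gamma)^{-1/2}\sigma_0\big]\Big\}=\inf\Big\{b\ge0:\ \hat L_{\mathrm{PW}}\ge\frac{|\hat\tau_{\mathrm{PW}}|}{(1+\gamma)^{-1/2}\sigma_0}\Big\}.$$

Now evaluate the confidence interval at $b=0$. In this case, $\hat L_{\mathrm{PW}}(0)$ solves $\Phi(L)-\Phi(-L)=1-\zeta$.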
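If

$$\Phi\Big(\frac{|\hat\tau_{\mathrm{PW}}|}{(1+\gamma)^{-1/2}\sigma_0}\Big)-\Phi\Big(-\frac{|\hat\tau_{\mathrm{PW}}|}{(1+\gamma)^{-1/2}\sigma_0}\Big)<1-\zeta,$$

then $\hat L_{\mathrm{PW}}(0)>|\hat\tau_{\mathrm{PW}}|/((1+\gamma)^{-1/2}\sigma_0)$, and therefore $b^*_{\mathrm{PW}}(\zeta)=0$. Otherwise,

$$\Phi\Big(\frac{|\hat\tau_{\mathrm{PW}}|}{(1+\gamma)^{-1/2}\sigma_0}\Big)-\Phi\Big(-\frac{|\hat\tau_{\mathrm{PW}}|}{(1+\gamma)^{-1/2}\sigma_0}\Big)\ge1-\zeta,$$

so $b^*_{\mathrm{PW}}(\zeta)\in(0,\infty)$. By the definition of the infimum, at $b=b^*_{\mathrm{PW}}(\zeta)$ we must have the boundary condition

$$\hat L_{\mathrm{PW}}\big(b^*_{\mathrm{PW}}(\zeta)\big)=\frac{|\hat\tau_{\mathrm{PW}}|}{(1+\gamma)^{-1/2}\sigma_0}.$$

Plugging $L=|\hat\tau_{\mathrm{PW}}|/((1+\gamma)^{-1/2}\sigma_0)$ into the defining equation of $\hat L_{\mathrm{PW}}$ shows that $b=b^*_{\mathrm{PW}}(\zeta)$ solves

$$\Phi\Big(\frac{|\hat\tau_{\mathrm{PW}}|}{(1+\gamma)^{-1/2}\sigma_0}-\frac{\gamma}{\sqrt{1+\gamma}}b\Big)-\Phi\Big(-\frac{|\hat\tau_{\mathrm{PW}}|}{(1+\gamma)^{-1/2}\sigma_0}-\frac{\gamma}{\sqrt{1+\gamma}}b\Big)=1-\zeta,$$

which is the desired characterization.

C.3 Proof of Theorem A.2

Proof of Theorem A.2. By Theorem 3.2, the shortest symmetric centered confidence interval based on $\hat\tau_{\mathrm{PT}}$ is $[\hat\tau_{\mathrm{PT}}-\hat L_{\mathrm{PT}}(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PT}}+\hat L_{\mathrm{PT}}(1+\gamma)^{-1/2}\sigma_0]$. Here $\hat L_{\mathrm{PT}}$ is the solution to

$$\min_{0\le t\le b}P_{\Delta/\sigma_0=t}\big(|\hat\tau_{\mathrm{PT}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0\big)=1-\zeta.$$

By the definition of the b-value,

$$b^*_{\mathrm{PT}}(\zeta)=\inf\Big\{b\ge0:\ 0\in\big[\hat\tau_{\mathrm{PT}}-\hat L_{\mathrm{PT}}(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{PT}}+\hat L_{\mathrm{PT}}(1+\gamma)^{-1/2}\sigma_0\big]\Big\}=\inf\Big\{b\ge0:\ \hat L_{\mathrm{PT}}\ge\frac{|\hat\tau_{\mathrm{PT}}|}{(1+\gamma)^{-1/2}\sigma_0}\Big\}.$$

At $b=0$, the defining equation for $\hat L_{\mathrm{PT}}$ becomes $P_{\Delta/\sigma_0=0}(|\hat\tau_{\mathrm{PT}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0)=1-\zeta$. If $P_{\Delta/\sigma_0=0}(|\tilde\tau_{\mathrm{PT}}-\tau|\le|\hat\tau_{\mathrm{PT}}|\mid\hat\tau_{\mathrm{PT}})<1-\zeta$, then $\hat L_{\mathrm{PT}}(0)>|\hat\tau_{\mathrm{PT}}|/((1+\gamma)^{-1/2}\sigma_0)$, and therefore $b^*_{\mathrm{PT}}(\zeta)=0$.

At $b=\infty$, the defining equation for $\hat L_{\mathrm{PT}}$ becomes $\min_{t\ge0}P_{\Delta/\sigma_0=t}(|\hat\tau_{\mathrm{PT}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0)=1-\zeta$. If $\min_{t\ge0}P_{\Delta/\sigma_0=t}(|\tilde\tau_{\mathrm{PT}}-\tau|\le|\hat\tau_{\mathrm{PT}}|\mid\hat\tau_{\mathrm{PT}})>1-\zeta$, then $\hat L_{\mathrm{PT}}(\infty)<|\hat\tau_{\mathrm{PT}}|/((1+\gamma)^{-1/2}\sigma_0)$, and therefore $b^*_{\mathrm{PT}}(\zeta)=\infty$.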
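Otherwise, $\min_{t\ge0}P_{\Delta/\sigma_0=t}(|\tilde\tau_{\mathrm{PT}}-\tau|\le|\hat\tau_{\mathrm{PT}}|\mid\hat\tau_{\mathrm{PT}})\le1-\zeta$ and $P_{\Delta/\sigma_0=0}(|\tilde\tau_{\mathrm{PT}}-\tau|\le|\hat\tau_{\mathrm{PT}}|\mid\hat\tau_{\mathrm{PT}})\ge1-\zeta$, so $b^*_{\mathrm{PT}}(\zeta)\in(0,\infty)$. By the definition of the infimum, at $b=b^*_{\mathrm{PT}}(\zeta)$ we must have the boundary condition

$$\hat L_{\mathrm{PT}}\big(b^*_{\mathrm{PT}}(\zeta)\big)=\frac{|\hat\tau_{\mathrm{PT}}|}{(1+\gamma)^{-1/2}\sigma_0}.$$

Plugging $L=|\hat\tau_{\mathrm{PT}}|/((1+\gamma)^{-1/2}\sigma_0)$ into the defining equation of $\hat L_{\mathrm{PT}}$ shows that $b=b^*_{\mathrm{PT}}(\zeta)$ solves

$$\min_{0\le t\le b}P_{\Delta/\sigma_0=t}\big(|\tilde\tau_{\mathrm{PT}}-\tau|\le|\hat\tau_{\mathrm{PT}}|\mid\hat\tau_{\mathrm{PT}}\big)=1-\zeta,$$

which is the desired characterization. The explicit form of $P_{\Delta/\sigma_0=t}(|\tilde\tau_{\mathrm{PT}}-\tau|\le|\hat\tau_{\mathrm{PT}}|\mid\hat\tau_{\mathrm{PT}})$ is given directly by Theorem 3.2.

C.4 Proof of Theorem A.3

Proof of Theorem A.3. By Theorem 3.3, the shortest symmetric centered confidence interval based on $\hat\tau_{\mathrm{ST}}$ is $[\hat\tau_{\mathrm{ST}}-\hat L_{\mathrm{ST}}(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{ST}}+\hat L_{\mathrm{ST}}(1+\gamma)^{-1/2}\sigma_0]$. Here $\hat L_{\mathrm{ST}}$ is the solution to

$$P_{\Delta/\sigma_0=b}\big(|\hat\tau_{\mathrm{ST}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0\big)=1-\zeta.$$

By the definition of the b-value,

$$b^*_{\mathrm{ST}}(\zeta)=\inf\Big\{b\ge0:\ 0\in\big[\hat\tau_{\mathrm{ST}}-\hat L_{\mathrm{ST}}(1+\gamma)^{-1/2}\sigma_0,\ \hat\tau_{\mathrm{ST}}+\hat L_{\mathrm{ST}}(1+\gamma)^{-1/2}\sigma_0\big]\Big\}=\inf\Big\{b\ge0:\ \hat L_{\mathrm{ST}}\ge\frac{|\hat\tau_{\mathrm{ST}}|}{(1+\gamma)^{-1/2}\sigma_0}\Big\}.$$

At $b=0$, the defining equation for $\hat L_{\mathrm{ST}}$ becomes $P_{\Delta/\sigma_0=0}(|\hat\tau_{\mathrm{ST}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0)=1-\zeta$. If $P_{\Delta/\sigma_0=0}(|\tilde\tau_{\mathrm{ST}}-\tau|\le|\hat\tau_{\mathrm{ST}}|\mid\hat\tau_{\mathrm{ST}})<1-\zeta$, then $\hat L_{\mathrm{ST}}(0)>|\hat\tau_{\mathrm{ST}}|/((1+\gamma)^{-1/2}\sigma_0)$, and therefore $b^*_{\mathrm{ST}}(\zeta)=0$.

At $b=\infty$, the defining equation for $\hat L_{\mathrm{ST}}$ becomes $P_{\Delta/\sigma_0=\infty}(|\hat\tau_{\mathrm{ST}}-\tau|\le L(1+\gamma)^{-1/2}\sigma_0)=1-\zeta$, where $P_{\Delta/\sigma_0=\infty}$ is understood as the limit as $t\to\infty$. If $P_{\Delta/\sigma_0=\infty}(|\tilde\tau_{\mathrm{ST}}-\tau|\le|\hat\tau_{\mathrm{ST}}|\mid\hat\tau_{\mathrm{ST}})>1-\zeta$, then $\hat L_{\mathrm{ST}}(\infty)<|\hat\tau_{\mathrm{ST}}|/((1+\gamma)^{-1/2}\sigma_0)$, and therefore $b^*_{\mathrm{ST}}(\zeta)=\infty$.

Otherwise, $P_{\Delta/\sigma_0=\infty}(|\tilde\tau_{\mathrm{ST}}-\tau|\le|\hat\tau_{\mathrm{ST}}|\mid\hat\tau_{\mathrm{ST}})\le1-\zeta$ and $P_{\Delta/\sigma_0=0}(|\tilde\tau_{\mathrm{ST}}-\tau|\le|\hat\tau_{\mathrm{ST}}|\mid\hat\tau_{\mathrm{ST}})\ge1-\zeta$, so $b^*_{\mathrm{ST}}(\zeta)\in(0,\infty)$.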
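By the definition of the infimum, at $b=b^*_{\mathrm{ST}}(\zeta)$ we must have the boundary condition
$$\hat L_{\mathrm{ST}}\big(b^*_{\mathrm{ST}}(\zeta)\big)=\frac{|\hat\tau_{\mathrm{ST}}|}{(1+\gamma)^{-1/2}\sigma_0}.$$
Plugging $L=|\hat\tau_{\mathrm{ST}}|/\{(1+\gamma)^{-1/2}\sigma_0\}$ into the defining equation of $\hat L_{\mathrm{ST}}$ shows that $b=b^*_{\mathrm{ST}}(\zeta)$ solves
$$\mathbb{P}_{\Delta/\sigma_0=b}\left(|\tilde\tau_{\mathrm{ST}}-\tau|\le|\hat\tau_{\mathrm{ST}}|\mid\hat\tau_{\mathrm{ST}}\right)=1-\zeta,$$
which is the desired characterization. The explicit form of $\mathbb{P}_{\Delta/\sigma_0=t}\left(|\tilde\tau_{\mathrm{ST}}-\tau|\le|\hat\tau_{\mathrm{ST}}|\mid\hat\tau_{\mathrm{ST}}\right)$ is given directly by Theorem 3.3.

C.5 Proof of Theorem A.4

Proof of Theorem A.4. We follow the argument in the proof of Theorem 3.1. From that proof, we know that
$$(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PW}}-\tau)\sim N\left(\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0},\,1\right).$$
Consider the lower confidence bound $[\hat\tau_{\mathrm{PW}}-L(1+\gamma)^{-1/2}\sigma_0,\ \infty)$. Its worst-case coverage probability is
$$\begin{aligned}
\inf_{\Delta:|\Delta/\sigma_0|\le b}\mathbb{P}_\Delta\left(\tau\in[\hat\tau_{\mathrm{PW}}-L(1+\gamma)^{-1/2}\sigma_0,\ \infty)\right)
&=\inf_{\Delta:|\Delta/\sigma_0|\le b}\mathbb{P}_\Delta\left(\hat\tau_{\mathrm{PW}}-\tau\le L(1+\gamma)^{-1/2}\sigma_0\right)\\
&=\inf_{\Delta:|\Delta/\sigma_0|\le b}\mathbb{P}_\Delta\left((1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PW}}-\tau)\le L\right)\\
&=\inf_{\Delta:|\Delta/\sigma_0|\le b}\mathbb{P}\left\{N\left(\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0},\,1\right)\le L\right\}\\
&=\mathbb{P}\left\{N\left(\sup_{\Delta:|\Delta/\sigma_0|\le b}\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0},\,1\right)\le L\right\}\\
&=\mathbb{P}\left\{N\left(\frac{\gamma}{\sqrt{1+\gamma}}\,b,\,1\right)\le L\right\}\\
&=\Phi\left(L-\frac{\gamma}{\sqrt{1+\gamma}}\,b\right).
\end{aligned}$$
Therefore, the proof is complete, since $\hat L'_{\mathrm{PW}}=c_\zeta+\frac{\gamma}{\sqrt{1+\gamma}}\,b$, where $c_\zeta=\Phi^{-1}(1-\zeta)$, is the solution to
$$\Phi\left(L-\frac{\gamma}{\sqrt{1+\gamma}}\,b\right)=1-\zeta.$$

C.6 Proof of Theorem A.5

Proof of Theorem A.5. We follow the argument in the proof of Theorem 3.2. From that proof, we know that
$$(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PT}}-\tau)\ \Big|\ \{|\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\}\sim N\left(\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0},\,1\right)$$
and
$$(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PT}}-\tau)\ \Big|\ \{\sigma^{-1}(\hat\tau_1-\hat\tau_0)=u,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}\sim N\left(\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0}-\sqrt{\gamma}\,u,\,1\right).$$
Consider the lower confidence bound $[\hat\tau_{\mathrm{PT}}-L(1+\gamma)^{-1/2}\sigma_0,\ \infty)$. Its worst-case coverage probability is
$$\inf_{\Delta:|\Delta/\sigma_0|\le b}\mathbb{P}_\Delta\left(\tau\in[\hat\tau_{\mathrm{PT}}-L(1+\gamma)^{-1/2}\sigma_0,\ \infty)\right)=\inf_{\Delta:|\Delta/\sigma_0|\le b}\mathbb{P}_\Delta\left(\hat\tau_{\mathrm{PT}}-\tau\le L(1+\gamma)^{-1/2}\sigma_0\right).$$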
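The two precision-weighted results in this appendix reduce to one-dimensional normal computations: the b-value $b^*_{\mathrm{PW}}(\zeta)$ solves the boundary equation at the end of the proof of Theorem A.1, and the one-sided multiplier of Theorem A.4 has the closed form $\hat L'_{\mathrm{PW}}=c_\zeta+\gamma b/\sqrt{1+\gamma}$ with $c_\zeta=\Phi^{-1}(1-\zeta)$. The following is a minimal standard-library Python sketch, not the authors' code; the function names `lower_bound_multiplier_pw` and `b_value_pw`, the bisection solver, and the assumption $\gamma>0$ are choices made here for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Phi = STD_NORMAL.cdf, Phi^{-1} = STD_NORMAL.inv_cdf


def lower_bound_multiplier_pw(b: float, gamma: float, zeta: float) -> float:
    """Theorem A.4: L'_PW = c_zeta + gamma / sqrt(1 + gamma) * b, c_zeta = Phi^{-1}(1 - zeta)."""
    c_zeta = STD_NORMAL.inv_cdf(1.0 - zeta)
    return c_zeta + gamma / math.sqrt(1.0 + gamma) * b


def b_value_pw(tau_pw: float, sigma0: float, gamma: float, zeta: float,
               tol: float = 1e-10) -> float:
    """Hypothetical helper: b*_PW(zeta) from the proof of Theorem A.1, via bisection.

    Returns 0.0 when the b = 0 interval already contains 0. Assumes gamma > 0,
    so the coverage at L = T decreases to 0 as b grows.
    """
    T = abs(tau_pw) / ((1.0 + gamma) ** -0.5 * sigma0)  # |tau_PW| on the standardized scale
    a = gamma / math.sqrt(1.0 + gamma)                  # mean shift per unit of b

    def gap(b: float) -> float:
        # Phi(T - a b) - Phi(-T - a b) - (1 - zeta); non-increasing in b >= 0
        return STD_NORMAL.cdf(T - a * b) - STD_NORMAL.cdf(-T - a * b) - (1.0 - zeta)

    if gap(0.0) <= 0.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while gap(hi) > 0.0:      # double hi until the sign change is bracketed
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:      # invariant: gap(lo) > 0 >= gap(hi)
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage with made-up inputs (not results from the paper):
print(lower_bound_multiplier_pw(b=1.0, gamma=2.0, zeta=0.05))
print(b_value_pw(tau_pw=0.5, sigma0=0.2, gamma=1.0, zeta=0.05))
```

Bisection is used because the left-hand side of the b-value equation is monotone in $b$ for $b\ge0$, so the root is unique and bracketing is straightforward.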
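For a fixed $t=\Delta/\sigma_0$, we decompose the probability according to whether the pretest accepts or rejects:
$$\begin{aligned}
&\mathbb{P}_{\Delta/\sigma_0=t}\left(\hat\tau_{\mathrm{PT}}-\tau\le L(1+\gamma)^{-1/2}\sigma_0\right)\\
&=\mathbb{P}_{\Delta/\sigma_0=t}\left((1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PT}}-\tau)\le L\right)\\
&=\mathbb{P}_{\Delta/\sigma_0=t}\left((1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PT}}-\tau)\le L,\ |\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\right)+\mathbb{P}_{\Delta/\sigma_0=t}\left((1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{PT}}-\tau)\le L,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\right)\\
&=\mathbb{P}\left\{N\left(\frac{\gamma}{\sqrt{1+\gamma}}\,t,\,1\right)\le L\right\}\mathbb{P}\left\{\left|N\left(\sqrt{\frac{\gamma}{1+\gamma}}\,t,\,1\right)\right|\le c_{\alpha/2}\right\}+\mathbb{E}_{U\sim N\left(\sqrt{\gamma/(1+\gamma)}\,t,\,1\right)}\left[\mathbb{P}\left\{N\left(\frac{\gamma}{\sqrt{1+\gamma}}\,t-\sqrt{\gamma}\,U,\,1\right)\le L\right\}\mathbb{1}\{|U|>c_{\alpha/2}\}\right]\\
&=\left[\Phi\left(c_{\alpha/2}-\sqrt{\frac{\gamma}{1+\gamma}}\,t\right)-\Phi\left(-c_{\alpha/2}-\sqrt{\frac{\gamma}{1+\gamma}}\,t\right)\right]\Phi\left(L-\frac{\gamma}{\sqrt{1+\gamma}}\,t\right)\\
&\quad+\int_{-\infty}^{-c_{\alpha/2}-\sqrt{\gamma/(1+\gamma)}\,t}\Phi\left(L+\sqrt{\gamma}\,u\right)\phi(u)\,du+\int_{c_{\alpha/2}-\sqrt{\gamma/(1+\gamma)}\,t}^{\infty}\Phi\left(L+\sqrt{\gamma}\,u\right)\phi(u)\,du,
\end{aligned}$$
where the last step writes $U=\sqrt{\gamma/(1+\gamma)}\,t+u$ with $u\sim N(0,1)$, so that $\frac{\gamma}{\sqrt{1+\gamma}}\,t-\sqrt{\gamma}\,U=-\sqrt{\gamma}\,u$. Finally, the worst-case coverage probability over $|\Delta/\sigma_0|\le b$ is attained at some $t\in[0,b]$. Choosing $\hat L'_{\mathrm{PT}}$ to satisfy
$$\min_{0\le t\le b}\mathbb{P}_{\Delta/\sigma_0=t}\left(\hat\tau_{\mathrm{PT}}-\tau\le L(1+\gamma)^{-1/2}\sigma_0\right)=1-\zeta$$
completes the proof.

C.7 Proof of Theorem A.6

Proof of Theorem A.6. We follow the argument in the proof of Theorem 3.3. From that proof, we know that
$$(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{ST}}-\tau)\ \Big|\ \{|\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\}\sim N\left(\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0},\,1\right)$$
and
$$(1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{ST}}-\tau)\ \Big|\ \{\sigma^{-1}(\hat\tau_1-\hat\tau_0)=u,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\}\sim N\left(\frac{\gamma}{\sqrt{1+\gamma}}\frac{\Delta}{\sigma_0}-\sqrt{\gamma}\,[u-c_{\alpha/2}\,\mathrm{sign}(u)],\,1\right).$$
Consider the lower confidence bound $[\hat\tau_{\mathrm{ST}}-L(1+\gamma)^{-1/2}\sigma_0,\ \infty)$. Its worst-case coverage probability is
$$\inf_{\Delta:|\Delta/\sigma_0|\le b}\mathbb{P}_\Delta\left(\tau\in[\hat\tau_{\mathrm{ST}}-L(1+\gamma)^{-1/2}\sigma_0,\ \infty)\right)=\inf_{\Delta:|\Delta/\sigma_0|\le b}\mathbb{P}_\Delta\left(\hat\tau_{\mathrm{ST}}-\tau\le L(1+\gamma)^{-1/2}\sigma_0\right).$$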
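The accept/reject decomposition in the proof of Theorem A.5 gives a coverage function for the pretest lower bound that can be evaluated numerically. The sketch below is a minimal standard-library Python illustration, not the authors' implementation; the helper names are hypothetical, the infinite integration limits are truncated at $\pm10$, the integrals use a composite Simpson rule, and the minimum over $t\in[0,b]$ is approximated on an equally spaced grid. All of these are assumptions of this sketch.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)
INV_SQRT_2PI = 1.0 / math.sqrt(2.0 * math.pi)


def Phi(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / SQRT2))


def phi(x: float) -> float:
    return INV_SQRT_2PI * math.exp(-0.5 * x * x)


def simpson(f, lo: float, hi: float, n: int = 400) -> float:
    """Composite Simpson rule with an even number n of panels; 0 if hi <= lo."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def coverage_lower_pt(L: float, t: float, gamma: float, alpha: float,
                      u_max: float = 10.0) -> float:
    """P_{Delta/sigma_0 = t}(tau_PT - tau <= L (1+gamma)^{-1/2} sigma_0) via the
    decomposition in the proof of Theorem A.5, with tails truncated at +-u_max."""
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # c_{alpha/2}
    k = math.sqrt(gamma / (1.0 + gamma))         # U = sigma^{-1}(tau_1 - tau_0) has mean k t
    a = gamma / math.sqrt(1.0 + gamma)           # standardized tau_PW error has mean a t
    rg = math.sqrt(gamma)
    accept = (Phi(c - k * t) - Phi(-c - k * t)) * Phi(L - a * t)

    def g(u: float) -> float:
        return Phi(L + rg * u) * phi(u)

    reject = simpson(g, -u_max, -c - k * t) + simpson(g, c - k * t, u_max)
    return accept + reject


def lower_bound_multiplier_pt(b: float, gamma: float, alpha: float, zeta: float,
                              n_grid: int = 41, tol: float = 1e-7) -> float:
    """L'_PT solving min_{0<=t<=b} coverage = 1 - zeta; the min is taken on a t-grid."""
    ts = [b * i / (n_grid - 1) for i in range(n_grid)]

    def gap(L: float) -> float:
        # increasing in L, since every term of the coverage is increasing in L
        return min(coverage_lower_pt(L, t, gamma, alpha) for t in ts) - (1.0 - zeta)

    lo, hi = -1.0, 1.0
    while gap(lo) > 0.0:
        lo *= 2.0
    while gap(hi) < 0.0:
        hi *= 2.0
    while hi - lo > tol:  # bisection with gap(lo) <= 0 <= gap(hi)
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage with made-up inputs (not results from the paper):
print(lower_bound_multiplier_pt(b=1.0, gamma=2.0, alpha=0.05, zeta=0.05))
```

Two sanity checks follow from the formula: at $t=0$ and $L=0$ the coverage is exactly $1/2$, and as $\alpha\to0$ the pretest almost never rejects, so the coverage approaches the precision-weighted value $\Phi(L-\gamma t/\sqrt{1+\gamma})$.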
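For a fixed $t=\Delta/\sigma_0$, we decompose the probability according to whether the pretest accepts or rejects:
$$\begin{aligned}
&\mathbb{P}_{\Delta/\sigma_0=t}\left(\hat\tau_{\mathrm{ST}}-\tau\le L(1+\gamma)^{-1/2}\sigma_0\right)\\
&=\mathbb{P}_{\Delta/\sigma_0=t}\left((1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{ST}}-\tau)\le L\right)\\
&=\mathbb{P}_{\Delta/\sigma_0=t}\left((1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{ST}}-\tau)\le L,\ |\hat\tau_1-\hat\tau_0|\le\sigma c_{\alpha/2}\right)+\mathbb{P}_{\Delta/\sigma_0=t}\left((1+\gamma)^{1/2}\sigma_0^{-1}(\hat\tau_{\mathrm{ST}}-\tau)\le L,\ |\hat\tau_1-\hat\tau_0|>\sigma c_{\alpha/2}\right)\\
&=\mathbb{P}\left\{N\left(\frac{\gamma}{\sqrt{1+\gamma}}\,t,\,1\right)\le L\right\}\mathbb{P}\left\{\left|N\left(\sqrt{\frac{\gamma}{1+\gamma}}\,t,\,1\right)\right|\le c_{\alpha/2}\right\}\\
&\quad+\mathbb{E}_{U\sim N\left(\sqrt{\gamma/(1+\gamma)}\,t,\,1\right)}\left[\mathbb{P}\left\{N\left(\frac{\gamma}{\sqrt{1+\gamma}}\,t-\sqrt{\gamma}\,[U-c_{\alpha/2}\,\mathrm{sign}(U)],\,1\right)\le L\right\}\mathbb{1}\{|U|>c_{\alpha/2}\}\right]\\
&=\left[\Phi\left(c_{\alpha/2}-\sqrt{\frac{\gamma}{1+\gamma}}\,t\right)-\Phi\left(-c_{\alpha/2}-\sqrt{\frac{\gamma}{1+\gamma}}\,t\right)\right]\Phi\left(L-\frac{\gamma}{\sqrt{1+\gamma}}\,t\right)\\
&\quad+\int_{-\infty}^{-c_{\alpha/2}-\sqrt{\gamma/(1+\gamma)}\,t}\Phi\left(L+\sqrt{\gamma}\,(u+c_{\alpha/2})\right)\phi(u)\,du+\int_{c_{\alpha/2}-\sqrt{\gamma/(1+\gamma)}\,t}^{\infty}\Phi\left(L+\sqrt{\gamma}\,(u-c_{\alpha/2})\right)\phi(u)\,du,
\end{aligned}$$
where, as in the proof of Theorem A.5, the last step writes $U=\sqrt{\gamma/(1+\gamma)}\,t+u$ with $u\sim N(0,1)$. Finally, since the coverage probability is monotonically decreasing in $\Delta$, the worst case over $|\Delta/\sigma_0|\le b$ is attained at $\Delta/\sigma_0=b$. Choosing $\hat L'_{\mathrm{ST}}$ to satisfy
$$\mathbb{P}_{\Delta/\sigma_0=b}\left(\hat\tau_{\mathrm{ST}}-\tau\le L(1+\gamma)^{-1/2}\sigma_0\right)=1-\zeta$$
completes the proof.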
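Because the soft-thresholding coverage is decreasing in $\Delta$, computing $\hat L'_{\mathrm{ST}}$ only requires solving one equation in $L$ at $t=b$, unlike the pretest case, where the proof of Theorem A.5 minimizes over $t\in[0,b]$. The sketch below evaluates the displayed formula from the proof of Theorem A.6 with the standard library only. It is an illustration, not the authors' code: the helper names are hypothetical, and truncating the integrals at $\pm10$ and using a composite Simpson rule are assumptions of this sketch.

```python
import math
from statistics import NormalDist

SQRT2 = math.sqrt(2.0)
INV_SQRT_2PI = 1.0 / math.sqrt(2.0 * math.pi)


def Phi(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / SQRT2))


def phi(x: float) -> float:
    return INV_SQRT_2PI * math.exp(-0.5 * x * x)


def simpson(f, lo: float, hi: float, n: int = 400) -> float:
    """Composite Simpson rule with an even number n of panels; 0 if hi <= lo."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def coverage_lower_st(L: float, t: float, gamma: float, alpha: float,
                      u_max: float = 10.0) -> float:
    """P_{Delta/sigma_0 = t}(tau_ST - tau <= L (1+gamma)^{-1/2} sigma_0) via the
    decomposition in the proof of Theorem A.6, with tails truncated at +-u_max."""
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # c_{alpha/2}
    k = math.sqrt(gamma / (1.0 + gamma))
    a = gamma / math.sqrt(1.0 + gamma)
    rg = math.sqrt(gamma)
    accept = (Phi(c - k * t) - Phi(-c - k * t)) * Phi(L - a * t)
    # U < -c: soft-thresholding adds c back, giving Phi(L + sqrt(gamma) (u + c))
    left = simpson(lambda u: Phi(L + rg * (u + c)) * phi(u), -u_max, -c - k * t)
    # U > c: soft-thresholding subtracts c, giving Phi(L + sqrt(gamma) (u - c))
    right = simpson(lambda u: Phi(L + rg * (u - c)) * phi(u), c - k * t, u_max)
    return accept + left + right


def lower_bound_multiplier_st(b: float, gamma: float, alpha: float, zeta: float,
                              tol: float = 1e-9) -> float:
    """L'_ST solving coverage_lower_st(L, b) = 1 - zeta (worst case at t = b)."""
    def gap(L: float) -> float:
        return coverage_lower_st(L, b, gamma, alpha) - (1.0 - zeta)

    lo, hi = -1.0, 1.0
    while gap(lo) > 0.0:
        lo *= 2.0
    while gap(hi) < 0.0:
        hi *= 2.0
    while hi - lo > tol:  # bisection; gap is increasing in L
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage with made-up inputs (not results from the paper):
print(lower_bound_multiplier_st(b=1.0, gamma=2.0, alpha=0.05, zeta=0.05))
```

A useful check on the formula: as $c_{\alpha/2}\to0$, the accept term vanishes and the two integrals combine into $\int\Phi(L+\sqrt{\gamma}u)\phi(u)\,du=\Phi(L/\sqrt{1+\gamma})$ for every $t$, the coverage of the unbiased estimator alone.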