Covariate Adjustment for Wilcoxon Two Sample Statistic and Test
We apply covariate adjustment to the Wilcoxon two sample statistic and Wilcoxon-Mann-Whitney test in comparing two treatments. The covariate adjustment through calibration not only improves efficiency in estimation/inference but also widens the appli…
Authors: Zhilan Lou, Jun Shao, Ting Ye, Tuo Wang, Yanyao Yi, and Yu Du
Covariate Adjustment for Wilcoxon Two Sample Statistic and Test

Zhilan Lou^1, Jun Shao^2, Ting Ye^3, Tuo Wang^4, Yanyao Yi^4, and Yu Du^4

1 School of Data Sciences, Zhejiang University of Finance and Economics, Hangzhou, Zhejiang, China
2 Department of Statistics, University of Wisconsin, Madison, Wisconsin, U.S.A.
3 Department of Biostatistics, University of Washington, Seattle, Washington, U.S.A.
4 Global Statistical Science, Eli Lilly and Company, Indianapolis, Indiana, U.S.A.

February 19, 2026

Abstract
We apply covariate adjustment to the Wilcoxon two sample statistic and Wilcoxon-Mann-Whitney test in comparing two treatments. The covariate adjustment through calibration not only improves efficiency in estimation/inference but also widens the application scope of the Wilcoxon two sample statistic and Wilcoxon-Mann-Whitney test to situations where covariate-adaptive randomization is used. We motivate how to adjust covariates to reduce variance, establish the asymptotic distribution of the adjusted Wilcoxon two sample statistic, and provide explicitly the guaranteed efficiency gain. The asymptotic distribution of the adjusted Wilcoxon two sample statistic is invariant to all commonly used covariate-adaptive randomization schemes so that a unified formula can be used in inference regardless of which covariate-adaptive randomization is applied.

Keywords: Covariate calibration, Covariate-adaptive randomization, Confidence intervals, Invariance of asymptotic distribution, Wilcoxon-Mann-Whitney test

1 Introduction

Consider a random sample of $n$ units, each of which is assigned to one and only one treatment $j$ and results in an outcome $Y_j$ distributed with an unknown continuous distribution $F_j$, $j = 1, ..., J$, where $J \ge 2$ is the number of treatments. An example is a clinical trial to study effects of $J$ medical products, in which units are typically patients.
The simplest way of assigning treatments is simple randomization, which assigns $n$ units completely at random with a pre-determined probability $\pi_j > 0$ to treatment $j$, $\sum_{j=1}^J \pi_j = 1$.

∗ Corresponding author: du_yu@lilly.com

Let $A \in \{1, ..., J\}$ be the treatment assignment and $A_i$ be the treatment assignment for unit $i$ in the random sample, $i = 1, ..., n$. Details of generating the $A_i$'s are given in Section 2. If $A_i = j$, then unit $i$ is assigned to treatment $j$ and the observed outcome from unit $i$ is denoted by $Y_{iA_i} = Y_{ij} \sim$ (distributed as) $Y_j$. Estimation and inference on unknown characteristics in $F_1, ..., F_J$ can be carried out based on outcomes $Y_{iA_i}$, $i = 1, ..., n$.

For comparing two fixed treatments $j$ and $k$, the well-known Wilcoxon-Mann-Whitney rank-sum test (Lehmann, 1975, pages 5-9) is a valuable nonparametric alternative to the two sample t-test (based on sample means of outcomes from two treatment groups), which is criticized when the outcome $Y_j$ or $Y_k$ is not normally distributed and/or has large variance. The Wilcoxon-Mann-Whitney test statistic is given by

$$ n_j n_k U_{jk} + \frac{n_j(n_j+1)}{2}, \qquad U_{jk} = \frac{1}{n_j n_k} \sum_{i: A_i = j} \sum_{i': A_{i'} = k} I(Y_{ij} \le Y_{i'k}), \tag{1} $$

focusing on the number of outcome pairs $Y_{ij}$ and $Y_{i'k}$ with $Y_{ij} \le Y_{i'k}$, where $I(B)$ is the indicator of event $B$, $n_t$ is the number of $i$'s with $A_i = t$, and $t = j, k$. The $U_{jk}$ in (1) is called the Wilcoxon two sample statistic (Serfling, 1980, page 175), a special case of the two sample U-statistic for estimating the treatment effect $\theta_{jk} = E(U_{jk}) = P(Y_j \le Y_k)$. Under the null hypothesis $H_0: F_j = F_k$ (i.e., there is no difference in populations under treatments $j$ and $k$), $\theta_{jk} = 1/2$ and, thus, the Wilcoxon-Mann-Whitney test rejects $H_0$ when $U_{jk}$ is far away from $1/2$. More details are given in Section 3.1.
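As a concrete illustration of (1), the following minimal Python sketch counts the pairs with $Y_{ij} \le Y_{i'k}$ directly. The function name `wilcoxon_two_sample` and the toy data are illustrative assumptions, not part of the paper.

```python
def wilcoxon_two_sample(y_j, y_k):
    """Wilcoxon two sample statistic U_jk in (1).

    y_j, y_k: outcomes observed in treatment groups j and k.
    Returns the proportion of pairs (i, i') with y_j[i] <= y_k[i'].
    """
    n_j, n_k = len(y_j), len(y_k)
    if n_j == 0 or n_k == 0:
        raise ValueError("both treatment groups must be non-empty")
    # Brute-force double sum over all n_j * n_k pairs.
    count = sum(1 for a in y_j for b in y_k if a <= b)
    return count / (n_j * n_k)


# Toy data (hypothetical): 5 of the 6 pairs satisfy y_j <= y_k.
u_jk = wilcoxon_two_sample([1.0, 2.0, 3.0], [2.5, 3.5])
u_kj = wilcoxon_two_sample([2.5, 3.5], [1.0, 2.0, 3.0])
```

With continuous outcomes (no ties), `u_jk + u_kj` equals 1, which is why values of $U_{jk}$ far from $1/2$ in either direction count as evidence against $H_0$.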
In many studies there exists covariate information related to the outcomes, useful for gaining estimation efficiency. In clinical trials, for example, there are baseline covariates not affected by treatments, such as patient's age, sex, geographical location, occupation, education level, disease stage, etc. Utilizing covariates to improve efficiency of estimation and inference is referred to as covariate adjustment. The Wilcoxon two sample statistic $U_{jk}$ in (1) does not make use of any covariate and, thus, it may be improved by covariate adjustment. If a correct model between outcomes and covariates can be specified, then $U_{jk}$ can be improved through model fitting with covariates. However, such a model-based approach relies heavily on the model correctness. Because a correct model may not be easily specified in applications, model-free approaches for covariate adjustment have caught on recently. Regulatory agencies for clinical trials, for example, particularly recommend utilizing covariates "under approximately the same minimal statistical assumptions that would be needed for unadjusted estimation" (ICH E9, 1998; EMA, 2015; FDA, 2021). Note that the consistency and asymptotic normality of the Wilcoxon two sample statistic $U_{jk}$ in (1) are established under no assumption other than simple randomization and $n \to \infty$ (Jiang, 2010).

The purpose of this paper is to apply covariate adjustment to $U_{jk}$ in (1) and the related Wilcoxon-Mann-Whitney test, through model-free covariate calibration considered as early as in Cassel et al. (1976) for survey problems and well summarized in Särndal et al. (2003). Covariate calibration has been shown to be effective in gaining efficiency for functions of sample means or estimators from generalized estimation equations (Yang and Tsiatis, 2001; Freedman, 2008; Zhang et al., 2008; Moore and van der Laan, 2009; Lin, 2013; Vermeulen et al., 2015; Wang et al., 2019; Liu and Yang, 2020; Benkeser et al., 2021; Zhang and Zhang, 2021; Cohen and Fogarty, 2023; Wang et al., 2023; Ye et al., 2023; Bannick et al., 2025, among others). But our covariate adjustment for $U_{jk}$ in (1) is created by directly using the covariance between $U_{jk}$ and adjusted covariates without any assumption. It guarantees an asymptotic efficiency gain over the unadjusted $U_{jk}$ and provides an invariant inference formula for all commonly used covariate-adaptive randomization schemes (Section 2.1), including simple randomization. It also gains additional efficiency in comparing two treatments when there are more than two treatments ($J > 2$).

After introducing notation and randomization for treatment assignments, in Section 2.2 we derive covariate adjustment/calibration for the Wilcoxon two sample statistic $U_{jk}$ in (1), motivated by why this adjustment guarantees efficiency gain. In Section 2.3 we establish that the proposed covariate adjusted statistic is asymptotically normal under all commonly used covariate-adaptive randomization schemes, without any assumption. The covariate adjusted statistic is guaranteed to be more efficient than the unadjusted $U_{jk}$, with an explicitly given efficiency gain. Our asymptotic result for the adjusted Wilcoxon two sample statistic is invariant to randomization schemes for treatment assignments if the covariates used in randomization are included in calibration. This is an advantage of our adjustment, since practitioners can use a unified formula for inference regardless of which covariate-adaptive randomization is used. This unified formula property does not hold for the unadjusted $U_{jk}$ or for the sample means used in t-tests, since their asymptotic distributions under covariate-adaptive randomization are different from those under simple randomization.
Based on the asymptotic theory and variance estimation, in Section 3 we propose a covariate adjusted Wilcoxon-Mann-Whitney test for $H_0: F_j = F_k$, which is more powerful (in terms of Pitman's asymptotic relative efficiency) than the two sample t-test and the unadjusted Wilcoxon-Mann-Whitney test when covariates are useful, regardless of the type of outcome distribution. Even for normally distributed outcomes, under which the two sample t-test is more powerful than the unadjusted Wilcoxon-Mann-Whitney test, the adjusted Wilcoxon-Mann-Whitney test may gain an additional improvement and be better than the two sample t-test. Furthermore, the Wilcoxon statistic in (1) and its adjustment are invariant to any monotone transformation of the outcome, as it is a function of outcome ranks, unlike the two sample t-test for which a serious effort may be needed to find a suitable transformation such as the Box-Cox transformation when $Y_j$ appears to be non-normal (see Section 5 for a real data example). Section 4 contains some simulation results to examine finite sample performance and complement the asymptotic theory. In Section 5, a real data example is considered for illustration.

2 Calibrating Wilcoxon Two Sample Statistic

Following the notation in Section 1, observed outcomes are $Y_{iA_i}$, $i = 1, ..., n$, the number of outcomes under treatment $j$ is $n_j$, $\sum_{j=1}^J n_j = n$, and the Wilcoxon two sample statistic to compare treatments $j$ and $k$ is $U_{jk}$ in (1), which is unbiased for $\theta_{jk}$, $j = 1, ..., J$, $k = 1, ..., J$, $j \ne k$.

2.1 Covariate-adaptive randomization of treatment assignments

Covariate adjustment can be started at the stage of treatment assignment, i.e., generating treatment assignments $A_1, ..., A_n$, prior to obtaining outcomes. Simple randomization assigns treatments completely at random, assigning unit $i$ to treatment $j$ with probability $P(A_i = j) = \pi_j$ for pre-specified $\pi_j > 0$, $\sum_{j=1}^J \pi_j = 1$.
Simple randomization does not make use of covariates and may yield treatment proportions that substantially deviate from the target $\pi_j$ across levels of some baseline prognostic factors, especially when units arrive sequentially. To balance the number of units in each treatment group across baseline prognostic factors, covariate-adaptive randomization has become the new norm. From 1989 to 2008, covariate-adaptive randomization was used in more than 500 clinical trials (Taves, 2010); among nearly 300 trials published in two years, 2009 and 2014, 237 of them applied covariate-adaptive randomization (Ciolino et al., 2019). The two most popular covariate-adaptive randomization schemes are the stratified permuted block (Zelen, 1974) and Pocock-Simon's minimization (Taves, 1974; Pocock and Simon, 1975). Other schemes and details of each can be found in two reviews, Schulz and Grimes (2002) and Shao (2021).

Because treatment assignments are more balanced under covariate-adaptive randomization, the resulting estimators are more efficient than those under simple randomization. However, covariate-adaptive randomization generates a dependent sequence of treatment assignments $A_1, ..., A_n$ and, thus, the conventional method under simple randomization to derive asymptotic properties of estimators is not applicable (EMA, 2015; FDA, 2021). For example, under the stratified permuted block randomization, the Wilcoxon two sample statistic $U_{jk}$ in (1) has an asymptotic distribution different from that under simple randomization. Thus, to use the Wilcoxon-Mann-Whitney test under the stratified permuted block randomization, one has to derive its asymptotic distribution. This appears to be what is needed, but we instead remedy the issue by covariate adjustment/calibration.

2.2 Covariate calibration

Let $X$ be the vector of covariates used for adjustment and $X_i$ be its value from unit $i$.
We assume that all baseline covariates used in covariate-adaptive randomization are included in $X$, which is important as we explain later in Section 3. We assume that $X$ is chosen so that $\Sigma = \mathrm{Var}(X)$ is finite and non-singular. Also, $\Sigma$ does not depend on $j$ since covariates are not affected by treatments, which is the reason why we can gain efficiency over $U_{jk}$ in (1) by covariate adjustment.

The idea of covariate calibration is to add "estimators" of zero to $U_{jk}$ in (1) to form

$$ U_{jk}(\bar X_j, \bar X_k, \bar X) = U_{jk} + (\bar X_j - \bar X)^\top \beta_j - (\bar X_k - \bar X)^\top \beta_k, \tag{2} $$

where $\bar X_t$ is the sample mean of $X_i$'s in treatment group $t = j$ or $k$, $\bar X$ is the sample mean of all $X_i$'s, $a^\top$ is the transpose of $a$, and $\beta_j$ and $\beta_k$ are nonrandom vectors chosen to reduce the variance of $U_{jk}$ and are estimated later if they depend on unknown quantities. Covariate calibration refers to calibrating $\bar X_t$ by $\bar X$, which uses covariate data not in treatment group $t$.

To find out which $\beta_j$ and $\beta_k$ reduce $\mathrm{Var}(U_{jk})$, we calculate the variance of $U_{jk}(\bar X_j, \bar X_k, \bar X)$ in (2) under simple randomization. By the independence and exchangeability of data and the independence of data in different treatment groups under simple randomization,

$$
\begin{aligned}
\mathrm{Cov}(U_{jk}, \bar X_j) &= (\pi_j n)^{-1}\, \mathrm{Cov}\{ I(Y_{ij} \le Y_{i'k}), X_{ij} \} + o(n^{-1}) \\
&= (\pi_j n)^{-1}\, \mathrm{Cov}\{ E[ I(Y_{ij} \le Y_{i'k}) \mid Y_{ij}, X_{ij} ], X_{ij} \} + o(n^{-1}) \\
&= (\pi_j n)^{-1}\, \mathrm{Cov}\{ 1 - F_k(Y_{ij}), X_{ij} \} + o(n^{-1}) \\
&= -(\pi_j n)^{-1} C_{jk} + o(n^{-1}),
\end{aligned}
$$

where $C_{jk} = \mathrm{Cov}\{ F_k(Y_j), X_j \}$, $X_{ij}$ (or $X_j$) is $X$ associated with $Y_{ij}$ (or $Y_j$), $o(n^{-1}) = n^{-1} o(1)$, and $o(1)$ denotes a term $\to 0$ as $n \to \infty$. Similarly, $\mathrm{Cov}(U_{jk}, \bar X_k) = (\pi_k n)^{-1} C_{kj} + o(n^{-1})$, where $C_{kj} = \mathrm{Cov}\{ F_j(Y_k), X_k \}$.
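By the independence of data in different treatment groups and these results about covariances,

$$
\begin{aligned}
\mathrm{Var}\{ U_{jk}(\bar X_j, \bar X_k, \bar X) \}
&= \mathrm{Var}\{ U_{jk} + (\bar X_j - \bar X)^\top \beta_j - (\bar X_k - \bar X)^\top \beta_k \} \\
&= \mathrm{Var}(U_{jk}) + \mathrm{Var}\{ (\bar X_j - \bar X)^\top \beta_j \} + \mathrm{Var}\{ (\bar X_k - \bar X)^\top \beta_k \} \\
&\quad + 2\,\mathrm{Cov}\{ U_{jk}, (\bar X_j - \bar X)^\top \beta_j \} - 2\,\mathrm{Cov}\{ U_{jk}, (\bar X_k - \bar X)^\top \beta_k \} \\
&\quad - 2\,\mathrm{Cov}\{ (\bar X_j - \bar X)^\top \beta_j, (\bar X_k - \bar X)^\top \beta_k \} \\
&= \mathrm{Var}(U_{jk}) + \frac{1 - \pi_j}{\pi_j n} \beta_j^\top \Sigma \beta_j + \frac{1 - \pi_k}{\pi_k n} \beta_k^\top \Sigma \beta_k \\
&\quad - \frac{2(1 - \pi_j)}{\pi_j n} \beta_j^\top C_{jk} - \frac{2}{n} \beta_j^\top C_{kj} - \frac{2(1 - \pi_k)}{\pi_k n} \beta_k^\top C_{kj} - \frac{2}{n} \beta_k^\top C_{jk} \\
&\quad + \frac{2}{n} \beta_j^\top \Sigma \beta_k + o(n^{-1}),
\end{aligned}
$$

where we used

$$ \bar X_j - \bar X = \frac{n - n_j}{n} \bar X_j - \frac{n_k}{n} \bar X_k - \frac{n - n_j - n_k}{n} \bar X_{-jk}, $$

with $\bar X_{-jk}$ denoting the sample mean of $X_i$'s not in treatment groups $j$ and $k$, and

$$ \mathrm{Cov}\{ (\bar X_j - \bar X)^\top \beta_j, (\bar X_k - \bar X)^\top \beta_k \} = \beta_j^\top \{ \mathrm{Var}(\bar X) - \mathrm{Cov}(\bar X_j, \bar X) - \mathrm{Cov}(\bar X_k, \bar X) \} \beta_k = -n^{-1} \beta_j^\top \Sigma \beta_k. $$

It turns out that if we choose

$$ \beta_j = \Sigma^{-1} C_{jk} \quad \text{and} \quad \beta_k = \Sigma^{-1} C_{kj}, \tag{3} $$

then

$$
\begin{aligned}
\mathrm{Var}\{ U_{jk}(\bar X_j, \bar X_k, \bar X) \}
&= \mathrm{Var}(U_{jk}) - \frac{1 - \pi_j}{\pi_j n} \beta_j^\top \Sigma \beta_j - \frac{1 - \pi_k}{\pi_k n} \beta_k^\top \Sigma \beta_k - \frac{2}{n} \beta_j^\top \Sigma \beta_k + o(n^{-1}) \\
&= \mathrm{Var}(U_{jk}) - \frac{(\pi_j \beta_k + \pi_k \beta_j)^\top \Sigma (\pi_j \beta_k + \pi_k \beta_j)}{\pi_j \pi_k (\pi_j + \pi_k) n} - \frac{(1 - \pi_j - \pi_k)(\beta_j - \beta_k)^\top \Sigma (\beta_j - \beta_k)}{(\pi_j + \pi_k) n} + o(n^{-1}) \\
&\le \mathrm{Var}(U_{jk}) + o(n^{-1}),
\end{aligned}
$$

with equality if and only if $C_{jk} = C_{kj} = 0$ (i.e., $\bar X_j$ and $\bar X_k$ are uncorrelated with $U_{jk}$ so that $U_{jk}$ cannot be improved by using $\bar X_j$ and $\bar X_k$ in the adjustment).

When $J > 2$, although we compare just two treatments $j$ and $k$, the proposed adjustment still uses covariates from all $J$ treatments, instead of those just from treatments $j$ and $k$, because utilizing all $X_i$'s results in more efficient adjusted estimators.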
Specifically, if we only use covariates in treatment groups $j$ and $k$ for adjustment, then adjustment (2) should be changed to

$$ U_{jk}(\bar X_j, \bar X_k, \bar X_{jk}) = U_{jk} + (\bar X_j - \bar X_{jk})^\top \beta_j - (\bar X_k - \bar X_{jk})^\top \beta_k, $$

where $\bar X_{jk}$ is the sample mean of $X_i$'s in treatment groups $j$ and $k$, and the same calculation leads to

$$
\begin{aligned}
\mathrm{Var}\{ U_{jk}(\bar X_j, \bar X_k, \bar X_{jk}) \}
&= \mathrm{Var}(U_{jk}) - \frac{\pi_k}{\pi_j (\pi_j + \pi_k) n} \beta_j^\top \Sigma \beta_j - \frac{\pi_j}{\pi_k (\pi_j + \pi_k) n} \beta_k^\top \Sigma \beta_k - \frac{2}{n} \beta_j^\top \Sigma \beta_k + o(n^{-1}) \\
&> \mathrm{Var}(U_{jk}) - \frac{1 - \pi_j}{\pi_j n} \beta_j^\top \Sigma \beta_j - \frac{1 - \pi_k}{\pi_k n} \beta_k^\top \Sigma \beta_k - \frac{2}{n} \beta_j^\top \Sigma \beta_k + o(n^{-1}) \\
&= \mathrm{Var}\{ U_{jk}(\bar X_j, \bar X_k, \bar X) \} + o(n^{-1}),
\end{aligned}
$$

where the strict inequality follows from $\pi_k / (\pi_j + \pi_k) < 1 - \pi_j$ for any $j$ and $k$ when $J > 2$ (i.e., $j$ and $k$ are not the only treatments) and at least one of $C_{jk}$ and $C_{kj}$ is nonzero (otherwise $>$ should be replaced by $=$).

To finish our proposal for covariate adjustment, we just need to substitute $\beta_j$ and $\beta_k$ in (3) by estimators. From (3), $\beta_j$ and $\beta_k$ involve unknown $\Sigma$, $C_{jk}$, and $C_{kj}$. The matrix $\Sigma$ can be consistently estimated by $\hat\Sigma$ = the sample covariance matrix of all $X_i$'s. From how $C_{jk}$ and $C_{kj}$ are derived, they can be consistently estimated respectively by

$$ \hat C_{jk} = \frac{1}{n_j n_k} \sum_{i: A_i = j} \sum_{i': A_{i'} = k} I(Y_{i'k} \le Y_{ij}) (X_{ij} - \bar X_j) $$

and

$$ \hat C_{kj} = \frac{1}{n_j n_k} \sum_{i: A_i = j} \sum_{i': A_{i'} = k} I(Y_{ij} \le Y_{i'k}) (X_{i'k} - \bar X_k). $$

Thus, consistent estimators of $\beta_j$ and $\beta_k$ are $\hat\beta_j = \hat\Sigma^{-1} \hat C_{jk}$ and $\hat\beta_k = \hat\Sigma^{-1} \hat C_{kj}$, respectively. Our proposed adjusted Wilcoxon two sample statistic is then

$$ U^C_{jk} = U_{jk} + (\bar X_j - \bar X)^\top \hat\beta_j - (\bar X_k - \bar X)^\top \hat\beta_k. \tag{4} $$

By the consistency of $\hat\beta_j$ and $\hat\beta_k$ and the previous discussion, we conclude that $U^C_{jk}$ has a variance no larger than that of $U_{jk}$ when $n$ is large, under simple randomization.
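The estimators $\hat\Sigma$, $\hat C_{jk}$, $\hat C_{kj}$ and the adjusted statistic in (4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name `adjusted_wilcoxon` and its argument layout are assumptions made here.

```python
import numpy as np


def adjusted_wilcoxon(y, x, a, j, k):
    """Sketch of the covariate adjusted statistic U^C_jk in (4).

    y: outcomes (length n); x: covariates (n,) or (n, p);
    a: treatment labels (length n); j, k: the two labels to compare.
    Returns (U_jk, U^C_jk, beta_j_hat, beta_k_hat, Sigma_hat).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    a = np.asarray(a)
    yj, yk = y[a == j], y[a == k]
    xj, xk = x[a == j], x[a == k]

    # ind[i, i'] = I(Y_ij <= Y_i'k); its overall mean is U_jk in (1).
    ind = (yj[:, None] <= yk[None, :]).astype(float)
    u = ind.mean()
    # ind_rev[i, i'] = I(Y_i'k <= Y_ij), used in C_jk_hat.
    ind_rev = (yk[None, :] <= yj[:, None]).astype(float)

    xbar = x.mean(axis=0)          # mean over ALL units, all J arms
    xbar_j, xbar_k = xj.mean(axis=0), xk.mean(axis=0)
    sigma_hat = np.atleast_2d(np.cov(x, rowvar=False))

    # Double sums written as averages of row/column means.
    c_jk = (ind_rev.mean(axis=1)[:, None] * (xj - xbar_j)).mean(axis=0)
    c_kj = (ind.mean(axis=0)[:, None] * (xk - xbar_k)).mean(axis=0)

    beta_j = np.linalg.solve(sigma_hat, c_jk)
    beta_k = np.linalg.solve(sigma_hat, c_kj)
    u_c = u + (xbar_j - xbar) @ beta_j - (xbar_k - xbar) @ beta_k
    return u, u_c, beta_j, beta_k, sigma_hat
```

Note that `xbar` and `sigma_hat` are computed from every unit in the sample, including units in arms other than $j$ and $k$, which is what distinguishes adjustment (2) from the version that uses $\bar X_{jk}$ only.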
The proposed covariate adjusted $U^C_{jk}$ in (4) also works when covariate-adaptive randomization is used. The asymptotic ($n \to \infty$) distribution of $U^C_{jk}$ and the variance reduction property of $U^C_{jk}$ are shown next, under covariate-adaptive randomization.

2.3 Asymptotic Theory

In the following we derive the asymptotic distribution of the adjusted $U^C_{jk}$ in (4) and show that $U^C_{jk}$ is asymptotically more efficient than $U_{jk}$, or equivalent to $U_{jk}$ when $C_{jk} = C_{kj} = 0$, for all commonly used covariate-adaptive randomization schemes satisfying the following minimum condition.

(D) Covariate-adaptive randomization is carried out using a discrete baseline covariate $Z$ with finitely many joint levels; conditioned on $(Z_1, ..., Z_n)$, treatment assignments, outcomes, and covariates are independent, where $Z_i$ is the value of $Z$ for the $i$th unit; $P(A_i = j \mid Z_1, ..., Z_n) = \pi_j$ for all $i$; and for every level $z$ of $Z$, $n_{zj} / n_z \to \pi_j$ in probability as $n \to \infty$, where $n_{zj}$ is the number of units with $Z_i = z$ and $A_i = j$, and $n_z = \sum_{j=1}^J n_{zj}$.

Condition (D) is satisfied for most popular covariate-adaptive randomization schemes, including the stratified permuted block and Pocock-Simon's minimization (Baldi Antognini and Zagoraiou, 2015), as well as simple randomization (with $Z$ = constant) that is not covariate-adaptive.

Theorem 1. Assume (D) for randomization to generate $A_1, ..., A_n$ and assume that $Z$ used in covariate-adaptive randomization is included in $X$ for adjustment. Then, as $n \to \infty$, $\sqrt{n}\,(U^C_{jk} - \theta_{jk})$ converges in distribution to the normal distribution with mean 0 and variance $\tau_{jk} + \tau_{kj} - \phi_{jk}$, where

$$ \tau_{jk} = \frac{\mathrm{Var}\{ F_k(Y_j) \}}{\pi_j}, \qquad \tau_{kj} = \frac{\mathrm{Var}\{ F_j(Y_k) \}}{\pi_k}, $$

and

$$ \phi_{jk} = \frac{(\pi_j \beta_k + \pi_k \beta_j)^\top \Sigma (\pi_j \beta_k + \pi_k \beta_j)}{\pi_j \pi_k (\pi_j + \pi_k)} + \frac{(1 - \pi_j - \pi_k)(\beta_j - \beta_k)^\top \Sigma (\beta_j - \beta_k)}{\pi_j + \pi_k}. $$

The proof of Theorem 1 is in the Appendix.
Here we elaborate the result in Theorem 1 in three aspects.

(i) The result in Theorem 1 is not only valid for all covariate-adaptive randomization schemes satisfying (D), but also invariant in the sense that the limiting variance $\tau_{jk} + \tau_{kj} - \phi_{jk}$ in Theorem 1 is the same for all covariate-adaptive randomization schemes satisfying (D). This means a unified procedure can be used in inference, which is desirable for practitioners as no formula tailored to each randomization scheme is needed. To achieve this, we must ensure that $Z$ used in covariate-adaptive randomization is included in $X$. Without this condition, the asymptotic distribution of $U^C_{jk}$ varies with the randomization scheme and is different from that in Theorem 1.

(ii) Under simple randomization, for the unadjusted $U_{jk}$, $\sqrt{n}\,(U_{jk} - \theta_{jk})$ is asymptotically normal with mean 0 and variance $\tau_{jk} + \tau_{kj}$ (Jiang, 2010, pages 380-382). This immediately shows that the adjusted $U^C_{jk}$ is asymptotically more efficient than the unadjusted $U_{jk}$ under simple randomization, unless $\phi_{jk} = 0$, i.e., $\bar X_j$ and $\bar X_k$ are uncorrelated with $U_{jk}$, in which case $U^C_{jk}$ and $U_{jk}$ have the same asymptotic distribution. When covariate-adaptive randomization is applied, however, the asymptotic distribution of the unadjusted $U_{jk}$ varies with the randomization scheme and is not available from the literature for some covariate-adaptive randomization schemes such as Pocock-Simon's minimization.

(iii) When $J > 2$, although we compare just two treatments $j$ and $k$, the proposed adjustment is asymptotically more efficient than covariate adjustment using only covariates from treatment groups $j$ and $k$, as discussed in Section 2.2.

3 Inference

The asymptotic distribution of $U^C_{jk}$ in Theorem 1 is explicit and useful for statistical inference on the unknown $\theta_{jk}$ considered in this section.

3.1 Testing

Consider testing the null hypothesis $H_0: F_j = F_k$.
Under $H_0$, both $F_k(Y_{ij})$ and $F_j(Y_{ik})$ are uniform on the interval $[0, 1]$ since $F_j$ and $F_k$ are continuous and, hence, $\theta_{jk} = \frac{1}{2}$ and $\tau_{jk} + \tau_{kj} = \frac{1}{12} \left( \frac{1}{\pi_j} + \frac{1}{\pi_k} \right)$. With the unadjusted $U_{jk}$, the Wilcoxon-Mann-Whitney test rejects $H_0$ when

$$ \sqrt{n}\, \left| U_{jk} - \tfrac{1}{2} \right| > z_{\alpha/2} \sqrt{ \frac{1}{12} \left( \frac{1}{\pi_j} + \frac{1}{\pi_k} \right) }, \tag{5} $$

where $z_{\alpha/2}$ is the $1 - \alpha/2$ quantile of $N(0, 1)$ and $\alpha \in (0, 0.5)$ is a given level of significance. Under simple randomization, the unadjusted Wilcoxon-Mann-Whitney test given by (5) has asymptotic significance level $\alpha$, as we discussed in elaboration (ii) of Section 2.3. The R function wilcox.test() uses (5) with a slight modification, i.e., $\frac{1}{\pi_j} + \frac{1}{\pi_k}$ is replaced by $\frac{n}{n_j} + \frac{n}{n_k} + \frac{n}{n_j n_k}$, where $\frac{n}{n_j n_k}$ is a continuity correction. When covariate-adaptive randomization is applied, however, the asymptotic significance level of the unadjusted Wilcoxon-Mann-Whitney test in (5) varies with the randomization scheme and is not available for Pocock-Simon's minimization.

From Theorem 1, the asymptotic variance of $U^C_{jk}$ is $\tau_{jk} + \tau_{kj} - \phi_{jk}$. Under $H_0: F_j = F_k$, $\tau_{jk} + \tau_{kj} = \frac{1}{12} \left( \frac{1}{\pi_j} + \frac{1}{\pi_k} \right)$ and $\phi_{jk} = \beta^\top \Sigma \beta \left( \frac{1}{\pi_j} + \frac{1}{\pi_k} \right)$, since $\beta_j = \beta_k = \beta$ not depending on $j$ and $k$. Let $\hat\beta = (\pi_j \hat\beta_j + \pi_k \hat\beta_k) / (\pi_j + \pi_k)$, where $\hat\beta_j$ and $\hat\beta_k$ are given in (4). Then, based on the covariate adjusted $U^C_{jk}$, we propose the adjusted Wilcoxon-Mann-Whitney test that rejects $H_0$ when

$$ \sqrt{n}\, \left| U^C_{jk} - \tfrac{1}{2} \right| > z_{\alpha/2} \sqrt{ \left( \frac{1}{12} - \hat\beta^\top \hat\Sigma \hat\beta \right) \left( \frac{1}{\pi_j} + \frac{1}{\pi_k} \right) }, \tag{6} $$

which has asymptotic significance level $\alpha$ regardless of which covariate-adaptive randomization scheme satisfying (D) is used. Our covariate adjustment widens the application scope of the Wilcoxon-Mann-Whitney test (to situations where covariate-adaptive randomization is applied).
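The two rejection rules (5) and (6) differ only in the null variance, so they can be sketched with a single function. The names `pooled_quadratic_form` and `wmw_reject` are hypothetical helpers introduced here for illustration; the inputs $U_{jk}$ or $U^C_{jk}$, $\hat\beta_j$, $\hat\beta_k$ and $\hat\Sigma$ are assumed to have been computed beforehand.

```python
from math import sqrt
from statistics import NormalDist


def pooled_quadratic_form(beta_j, beta_k, sigma, pi_j, pi_k):
    """beta_hat' Sigma_hat beta_hat with
    beta_hat = (pi_j * beta_j + pi_k * beta_k) / (pi_j + pi_k)."""
    b = [(pi_j * bj + pi_k * bk) / (pi_j + pi_k)
         for bj, bk in zip(beta_j, beta_k)]
    p = len(b)
    return sum(b[r] * sigma[r][c] * b[c] for r in range(p) for c in range(p))


def wmw_reject(u, n, pi_j, pi_k, alpha=0.05, beta_quad=0.0):
    """Rejection rule (5) when beta_quad == 0 (u is the unadjusted U_jk),
    and rule (6) when u is U^C_jk and beta_quad = beta_hat' Sigma_hat beta_hat."""
    null_var = (1.0 / 12.0 - beta_quad) * (1.0 / pi_j + 1.0 / pi_k)
    if null_var <= 0:
        raise ValueError("estimated null variance must be positive")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return sqrt(n) * abs(u - 0.5) > z * sqrt(null_var)


# Hypothetical numbers: n = 400, two arms with pi_j = pi_k = 0.5.
q = pooled_quadratic_form([0.2], [0.2], [[1.0]], 0.5, 0.5)
unadjusted = wmw_reject(0.55, 400, 0.5, 0.5)
adjusted = wmw_reject(0.55, 400, 0.5, 0.5, beta_quad=q)
```

In this toy setting the same observed value 0.55 is not rejected by (5) but is rejected by (6), because the adjusted null variance is smaller; in practice the adjusted statistic $U^C_{jk}$ would of course differ from $U_{jk}$.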
3.2 Comparison of tests by Pitman's ARE

The following result shows that, under simple randomization, the adjusted Wilcoxon-Mann-Whitney test (6) is more efficient than the unadjusted Wilcoxon-Mann-Whitney test (5), in terms of Pitman's asymptotic relative efficiency (ARE) (Serfling, 1980, pages 316-318). The proof is in the Appendix.

Theorem 2. Consider the null hypothesis $H_0: F_j = F_k$. Under simple randomization and the contiguous alternative hypothesis with $F_j$ having variance $\sigma_j^2$, continuous density $f_j$, and $F_k(y) = F_j(y - \gamma n^{-1/2})$ for a constant $\gamma \ne 0$,

(i) the ARE of the unadjusted Wilcoxon-Mann-Whitney test (5) relative to the two sample t-test is $12 \sigma_j^2 \{ \int f_j^2(y)\, dy \}^2$, as $n \to \infty$;

(ii) the ARE of the adjusted Wilcoxon-Mann-Whitney test (6) relative to the unadjusted Wilcoxon-Mann-Whitney test (5) is $1 / (1 - 12 \beta^\top \Sigma \beta) \ge 1$, with equality if and only if $\beta = 0$, where $\beta$ is the limit of $\beta_j$ and $\beta_k$ as $n \to \infty$;

(iii) the ARE of the adjusted Wilcoxon-Mann-Whitney test (6) relative to the two sample t-test is $12 \sigma_j^2 \{ \int f_j^2(y)\, dy \}^2 / (1 - 12 \beta^\top \Sigma \beta)$.

Note that we do not make the ARE comparison under covariate-adaptive randomization, because the unadjusted Wilcoxon-Mann-Whitney test (5) or the two sample t-test may not have asymptotic level $\alpha$, as we discussed in Sections 2.3 and 3.1. Our simulation results in Section 4 show that the unadjusted Wilcoxon-Mann-Whitney test (5) and the two sample t-test are conservative under covariate-adaptive randomization and have power lower than that of the adjusted Wilcoxon-Mann-Whitney test (6).

The efficiency of the unadjusted Wilcoxon-Mann-Whitney test compared with the two sample t-test is quite high, since ARE = 0.955 when $F_j$ is normal (in which case the use of the t-test is justified), ARE = 1 when $F_j$ is uniform on the interval $[0, 1]$, ARE = 1.5 when $F_j$ is double-exponential, and ARE is bounded below by 0.864 for any $F_j$ (Hodges and Lehmann, 1956).
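The three ARE values quoted above follow from the formula $12 \sigma_j^2 \{ \int f_j^2(y)\, dy \}^2$ in Theorem 2(i), and can be checked numerically. The sketch below uses a plain trapezoid rule; the helper names and the integration ranges are choices made here for illustration, and the densities are taken in standard form (unit-variance normal, uniform on $[0, 1]$, double-exponential with scale 1 and variance 2).

```python
from math import exp, pi, sqrt


def integral_f_squared(f, lo, hi, steps=100_000):
    """Trapezoid-rule approximation of the integral of f(y)^2 over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) ** 2 + f(hi) ** 2)
    for s in range(1, steps):
        total += f(lo + s * h) ** 2
    return total * h


def are_wmw_vs_t(f, variance, lo, hi):
    """12 * sigma^2 * (integral of f^2)^2, as in Theorem 2(i)."""
    return 12.0 * variance * integral_f_squared(f, lo, hi) ** 2


def normal_pdf(y):
    return exp(-0.5 * y * y) / sqrt(2.0 * pi)      # variance 1


def uniform_pdf(y):
    return 1.0 if 0.0 <= y <= 1.0 else 0.0         # variance 1/12


def double_exp_pdf(y):
    return 0.5 * exp(-abs(y))                      # variance 2


are_normal = are_wmw_vs_t(normal_pdf, 1.0, -10.0, 10.0)       # about 3/pi
are_uniform = are_wmw_vs_t(uniform_pdf, 1.0 / 12.0, 0.0, 1.0)
are_double_exp = are_wmw_vs_t(double_exp_pdf, 2.0, -40.0, 40.0)
```

The ARE does not depend on the location or scale of $F_j$, so using standard-form densities loses no generality.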
It follows from the lower bound for $12 \sigma_j^2 \{ \int f_j^2(y)\, dy \}^2$ and Theorem 2(iii) that, if $1 - 12 \beta^\top \Sigma \beta < 0.864$, then the adjusted Wilcoxon-Mann-Whitney test (6) is more powerful than the two sample t-test (without covariate adjustment) for any $F_j$ (including the case where $F_j$ is normal). These results are confirmed in our simulation presented in Section 4.

3.3 Confidence interval

The parameter $\theta_{jk}$ measures the difference between $F_j$ and $F_k$ when $H_0$ is rejected. We may assess the treatment effect by setting a confidence interval for $\theta_{jk}$. This requires a consistent estimator of $\tau_{jk} + \tau_{kj} - \phi_{jk}$ in Theorem 1 regardless of whether $H_0$ holds or not. While $\phi_{jk}$ in Theorem 1 can be consistently estimated by $\hat\phi_{jk}$, which is $\phi_{jk}$ with $\Sigma$, $\beta_j$ and $\beta_k$ substituted by $\hat\Sigma$, $\hat\beta_j$ and $\hat\beta_k$, respectively, regardless of whether $H_0$ holds and which covariate-adaptive randomization scheme is used, the remaining term $\tau_{jk} + \tau_{kj}$, which is not equal to $\frac{1}{12} \left( \frac{1}{\pi_j} + \frac{1}{\pi_k} \right)$ when $H_0$ does not hold, has to be estimated. Since

$$ \mathrm{Var}\{ F_j(Y_k) \} = E\{ F_j^2(Y_k) \} - [ E\{ F_j(Y_k) \} ]^2 = E\{ F_j^2(Y_k) \} - \theta_{jk}^2, $$

we can consistently estimate $\tau_{kj}$ by

$$ \hat\tau_{kj} = \frac{1}{\pi_k} \left[ \frac{1}{n_k} \sum_{i': A_{i'} = k} \left\{ \frac{1}{n_j} \sum_{i: A_i = j} I(Y_{ij} \le Y_{i'k}) \right\}^2 - U_{jk}^2 \right]. $$

Similarly, a consistent estimator of $\tau_{jk}$ is

$$ \hat\tau_{jk} = \frac{1}{\pi_j} \left[ \frac{1}{n_j} \sum_{i: A_i = j} \left\{ \frac{1}{n_k} \sum_{i': A_{i'} = k} I(Y_{ij} \le Y_{i'k}) \right\}^2 - U_{jk}^2 \right]. $$

Hence, we estimate $\tau_{jk} + \tau_{kj} - \phi_{jk}$ in Theorem 1 by $\hat\tau_{jk} + \hat\tau_{kj} - \hat\phi_{jk}$, which is consistent regardless of whether $H_0$ holds or not and which covariate-adaptive randomization is used. A large sample level $1 - \alpha$ confidence interval for $\theta_{jk}$ based on the calibrated $U^C_{jk}$ and Theorem 1 has two end points

$$ U^C_{jk} \pm z_{\alpha/2} \sqrt{ (\hat\tau_{jk} + \hat\tau_{kj} - \hat\phi_{jk}) / n }. $$
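The variance pieces $\hat\tau_{jk}$, $\hat\tau_{kj}$ and $\hat\phi_{jk}$ and the resulting interval can be sketched as follows. The function names are hypothetical, and $\hat\beta_j$, $\hat\beta_k$, $\hat\Sigma$ are assumed to come from the calibration step in Section 2.2.

```python
import numpy as np
from statistics import NormalDist


def tau_hats(y_j, y_k, pi_j, pi_k):
    """Return (U_jk, tau_jk_hat, tau_kj_hat) from the two outcome samples."""
    y_j = np.asarray(y_j, dtype=float)
    y_k = np.asarray(y_k, dtype=float)
    ind = (y_j[:, None] <= y_k[None, :]).astype(float)  # I(Y_ij <= Y_i'k)
    u = ind.mean()
    # Row means average over group k for each unit in group j, and vice versa.
    tau_jk = (np.mean(ind.mean(axis=1) ** 2) - u ** 2) / pi_j
    tau_kj = (np.mean(ind.mean(axis=0) ** 2) - u ** 2) / pi_k
    return u, tau_jk, tau_kj


def phi_hat(beta_j, beta_k, sigma, pi_j, pi_k):
    """Plug-in version of phi_jk in Theorem 1."""
    bj = np.asarray(beta_j, dtype=float)
    bk = np.asarray(beta_k, dtype=float)
    s = np.atleast_2d(np.asarray(sigma, dtype=float))
    m = pi_j * bk + pi_k * bj
    d = bj - bk
    first = (m @ s @ m) / (pi_j * pi_k * (pi_j + pi_k))
    second = (1.0 - pi_j - pi_k) * (d @ s @ d) / (pi_j + pi_k)
    return first + second


def confidence_interval(center, var_hat, n, alpha=0.05):
    """center +/- z_{alpha/2} * sqrt(var_hat / n)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * np.sqrt(var_hat / n)
    return center - half, center + half
```

A typical call would pass `u_c` as `center` and `tau_jk + tau_kj - phi_hat(...)` as `var_hat`; passing `u` and `tau_jk + tau_kj` gives the unadjusted interval.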
Without covariate adjustment, one can use a large sample level $1 - \alpha$ confidence interval with two end points $U_{jk} \pm z_{\alpha/2} \sqrt{ (\hat\tau_{jk} + \hat\tau_{kj}) / n }$, valid only under simple randomization.

4 Simulation

We consider a simulation study to check the finite sample performance of the unadjusted $U_{jk}$ in (1), the covariate adjusted $U^C_{jk}$ in (4), $\bar Y_j - \bar Y_k$ related to the two sample t-test, and related tests for $H_0: F_j = F_k$ with $\alpha = 0.05$, where $\bar Y_j$ and $\bar Y_k$ are respectively the sample means for outcomes in treatment groups $j$ and $k$. The two sample t-test does not adjust for covariates; it is included in our comparison to show that the better way to improve the unadjusted Wilcoxon-Mann-Whitney test is using covariate adjustment rather than the t-test. A thorough discussion of improving t-tests using covariate adjustment is given in Ye et al. (2023).

We generate a random sample of outcomes and covariates. The 2-dimensional covariate $X$ follows the bivariate normal distribution with zero means, unit variances, and correlation coefficient 0.3. There are a total of $J = 4$ treatments with $\pi_j = 0.25$ for all $j$, but we only focus on treatments $j = 1$ and $k = 2$ for estimation and testing. To generate treatment assignments $A_i$'s, we employ simple randomization or covariate-adaptive randomization with stratified permuted block of size 8, in which strata are four categories of the first component of $X$ discretized with equal probabilities.

Given treatment $A$, the outcome $Y_A \mid X$, i.e., $Y_A$ conditioned on covariate $X$, is generated in the following two different cases.

1. Normal outcome: $Y_A \mid X \sim$ the normal distribution with mean $a(A - 1) + (0.3, 0.3)^\top X$ and variance 0.25, where $a = 0, 0.1, 0.2$, or $0.3$.

2. Double-exponential outcome: $Y_A \mid X \sim$ the double-exponential distribution with mean $a(A - 1) + (0.3, 0.3)^\top X$ and variance 0.5, where $a = 0, 0.1, 0.2$, or $0.3$.
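The data-generating design above (normal outcome case) can be sketched as follows. This is an illustrative reading of the description, not the authors' simulation code: the function names are hypothetical, strata are formed from the theoretical $N(0, 1)$ quartiles of the first covariate, and each block is assumed to contain every treatment equally often (here twice in a block of size 8).

```python
import numpy as np
from statistics import NormalDist


def stratified_permuted_block(strata, n_treatments=4, block_size=8, rng=None):
    """Assign treatments 1..n_treatments by permuted blocks within each stratum.

    Assumes block_size is a multiple of n_treatments (equal allocation).
    Units are processed in index order, standing in for arrival order.
    """
    rng = np.random.default_rng() if rng is None else rng
    strata = np.asarray(strata)
    reps = block_size // n_treatments
    template = np.repeat(np.arange(1, n_treatments + 1), reps)
    assign = np.empty(len(strata), dtype=int)
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        for start in range(0, len(idx), block_size):
            chunk = idx[start:start + block_size]
            block = rng.permutation(template)
            assign[chunk] = block[:len(chunk)]   # last block may be incomplete
    return assign


def simulate_normal_case(n, a, rng):
    """One simulated data set for the normal outcome case."""
    cov = np.array([[1.0, 0.3], [0.3, 1.0]])
    x = rng.multivariate_normal(np.zeros(2), cov, size=n)
    # Four equal-probability categories of the first covariate component.
    cuts = [NormalDist().inv_cdf(q) for q in (0.25, 0.5, 0.75)]
    strata = np.digitize(x[:, 0], cuts)
    trt = stratified_permuted_block(strata, rng=rng)
    # Mean a(A - 1) + (0.3, 0.3)'X, variance 0.25 (standard deviation 0.5).
    y = a * (trt - 1) + x @ np.array([0.3, 0.3]) + rng.normal(0.0, 0.5, size=n)
    return y, x, trt, strata
```

For the double-exponential case, the final noise term could be replaced by `rng.laplace(0.0, 0.5, size=n)`, since a Laplace distribution with scale 0.5 has variance 0.5.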
In any case, the null hypothesis $H_0$ holds when $a = 0$ and the alternative $H_1$ holds when $a \ne 0$. We consider sample sizes $n = 200$, 400, and 600. All observed covariates are used for covariate adjustment.

The simulation results based on 5,000 replications are given in Table 1 for normal outcome and Table 2 for double-exponential outcome. The results include the average bias (AB), standard deviation (SD), average of estimated standard deviation (SE), coverage probability (CP) of the 95% asymptotic confidence interval, type I error probability (P when $a = 0$), and power (P when $a \ne 0$).

The following is a summary of the results in Tables 1 and 2.

1. The type I errors (P when $a = 0$) are close to 5% for all tests under simple randomization, but are much smaller than 5% for the t-test and the unadjusted Wilcoxon-Mann-Whitney test under covariate-adaptive randomization, showing that they are conservative as we discussed in Section 3. This conservativeness also affects the coverage probability of confidence intervals using sample means and the unadjusted $U_{jk}$.

2. In terms of SD, the adjusted $U^C_{jk}$ in (4) is much better than the unadjusted $U_{jk}$ in (1).

3. In terms of power (P when $a \ne 0$), the two sample t-test compared with the unadjusted Wilcoxon-Mann-Whitney test (5) is slightly better for normal outcome (Table 1), but worse for double-exponential outcome (Table 2). The covariate adjusted Wilcoxon-Mann-Whitney test (6) is much more powerful than the two sample t-test and the unadjusted Wilcoxon-Mann-Whitney test (5) (both of which have no covariate adjustment), regardless of whether the outcome is normal or double-exponential or whether covariate-adaptive randomization is applied. This confirms our asymptotic theory in Section 3.3.

4. The proposed variance estimator (SE) works well and the coverage probabilities are close to the targeted 95%.
5 Example

We consider a dataset from a randomized phase 3 clinical trial sponsored by Eli Lilly and Company, for adults with type 2 diabetes who remained inadequately controlled on standard oral glucose-lowering therapy. Eligible participants are randomly assigned in a 1:1:1:3 ratio to one of three treatment dose arms and an active control group, using stratified permuted block randomization with block size 6 to ensure balance across important baseline characteristics.

To illustrate the proposed method, we focus on a pre-specified exploratory outcome, the change in total cholesterol from baseline to week 52. We restrict the analysis to the subgroup of $n = 491$ participants with baseline body mass index (BMI) greater than 35 kg/m$^2$, $n_1 = 76$ randomized to treatment dose 1, $n_2 = 83$ to treatment dose 2, $n_3 = 88$ to treatment dose 3, and $n_4 = 244$ to the control group. In this subgroup, the mean age is 62 years, 45% are female, and the median baseline total cholesterol is 154 mg/dL with an interquartile range of 130 to 183 mg/dL.

For pairwise comparisons between the control group and each treatment arm, we test three null hypotheses of interest: $H_0: F_j = F_4$, where $F_4$ is the outcome distribution under the control group and $F_j$ is the outcome distribution under treatment with dose $j = 1, 2, 3$. We consider four testing methods.

(i) The two sample t-test applied to raw changes in total cholesterol. Note that the two sample t-test is not for $H_0: F_j = F_4$ unless raw changes in total cholesterol are normally distributed with the same variance across treatment arms.

(ii) The two sample t-test applied to the transformed outcome, log(total cholesterol at week 52) − log(total cholesterol at baseline). We consider this transformation because lipid data often exhibit a skewed distribution and raw changes in total cholesterol do not appear to be normally distributed and have large variability.
(iii) The unadjusted Wilcoxon-Mann-Whitney test applied to raw changes in total cholesterol.

(iv) The covariate adjusted Wilcoxon-Mann-Whitney test applied to raw changes in total cholesterol, with adjustment using five baseline covariates: age, systolic blood pressure, hemoglobin A1c, BMI, and baseline total cholesterol.

For each pairwise comparison, Table 3 provides the two-sided p-value for each of the four testing methods, the standard error (SE) of the difference of sample means from treatment $j$ and 4 when the two sample t-test (raw data or transformed data) is used or the SE of $U_{j4}$ or $U^C_{j4}$ when the Wilcoxon-Mann-Whitney test (unadjusted or adjusted) is used, and the confidence interval (CI) related to each testing method.

Table 1: Simulation results based on 5,000 replications for normal outcome: AB = average of bias, SD = standard deviation, SE = average of estimated SD, CP = coverage probability of 95% asymptotic confidence interval, P = type I error probability when $a = 0$ and P = power when $a \neq 0$. Columns labeled "Simple" are under simple randomization; columns labeled "Stratified" are under stratified permuted block randomization.

| $a$ | $n$ | Estimator | Simple: AB | Simple: SD | Simple: SE | Simple: CP | Simple: P | Stratified: AB | Stratified: SD | Stratified: SE | Stratified: CP | Stratified: P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200 | $\bar{Y}_j - \bar{Y}_k$ | -0.002 | 0.141 | 0.142 | 0.948 | 0.052 | 0.002 | 0.120 | 0.141 | 0.974 | 0.026 |
| 0 | 200 | $U_{jk}$ | -0.001 | 0.059 | 0.058 | 0.938 | 0.057 | 0.001 | 0.050 | 0.058 | 0.971 | 0.027 |
| 0 | 200 | $U^C_{jk}$ | 0.000 | 0.046 | 0.045 | 0.938 | 0.052 | 0.001 | 0.044 | 0.044 | 0.945 | 0.048 |
| 0 | 400 | $\bar{Y}_j - \bar{Y}_k$ | -0.003 | 0.101 | 0.099 | 0.944 | 0.056 | 0.002 | 0.085 | 0.099 | 0.975 | 0.025 |
| 0 | 400 | $U_{jk}$ | -0.001 | 0.042 | 0.041 | 0.944 | 0.053 | 0.001 | 0.036 | 0.041 | 0.973 | 0.025 |
| 0 | 400 | $U^C_{jk}$ | 0.000 | 0.031 | 0.031 | 0.942 | 0.055 | 0.001 | 0.031 | 0.031 | 0.948 | 0.050 |
| 0 | 600 | $\bar{Y}_j - \bar{Y}_k$ | 0.000 | 0.080 | 0.081 | 0.954 | 0.046 | -0.001 | 0.069 | 0.081 | 0.975 | 0.025 |
| 0 | 600 | $U_{jk}$ | 0.000 | 0.033 | 0.033 | 0.951 | 0.047 | 0.000 | 0.029 | 0.033 | 0.974 | 0.025 |
| 0 | 600 | $U^C_{jk}$ | 0.000 | 0.025 | 0.025 | 0.947 | 0.051 | 0.000 | 0.025 | 0.024 | 0.952 | 0.048 |
| 0.1 | 200 | $\bar{Y}_j - \bar{Y}_k$ | -0.002 | 0.141 | 0.142 | 0.948 | 0.112 | 0.002 | 0.120 | 0.141 | 0.974 | 0.076 |
| 0.1 | 200 | $U_{jk}$ | 0.000 | 0.059 | 0.058 | 0.938 | 0.112 | 0.002 | 0.050 | 0.058 | 0.969 | 0.078 |
| 0.1 | 200 | $U^C_{jk}$ | 0.000 | 0.046 | 0.044 | 0.936 | 0.155 | 0.002 | 0.044 | 0.044 | 0.946 | 0.152 |
| 0.1 | 400 | $\bar{Y}_j - \bar{Y}_k$ | -0.003 | 0.101 | 0.099 | 0.944 | 0.175 | 0.002 | 0.085 | 0.099 | 0.975 | 0.141 |
| 0.1 | 400 | $U_{jk}$ | -0.001 | 0.042 | 0.041 | 0.944 | 0.168 | 0.002 | 0.035 | 0.041 | 0.974 | 0.138 |
| 0.1 | 400 | $U^C_{jk}$ | 0.000 | 0.031 | 0.030 | 0.942 | 0.254 | 0.001 | 0.031 | 0.030 | 0.948 | 0.265 |
| 0.1 | 600 | $\bar{Y}_j - \bar{Y}_k$ | 0.000 | 0.080 | 0.081 | 0.954 | 0.236 | -0.001 | 0.069 | 0.081 | 0.975 | 0.193 |
| 0.1 | 600 | $U_{jk}$ | 0.000 | 0.033 | 0.033 | 0.951 | 0.229 | 0.000 | 0.029 | 0.033 | 0.975 | 0.184 |
| 0.1 | 600 | $U^C_{jk}$ | 0.000 | 0.025 | 0.024 | 0.946 | 0.374 | 0.000 | 0.025 | 0.024 | 0.949 | 0.367 |
| 0.2 | 200 | $\bar{Y}_j - \bar{Y}_k$ | -0.002 | 0.141 | 0.142 | 0.948 | 0.286 | 0.002 | 0.120 | 0.141 | 0.974 | 0.266 |
| 0.2 | 200 | $U_{jk}$ | 0.000 | 0.058 | 0.057 | 0.938 | 0.281 | 0.001 | 0.049 | 0.057 | 0.968 | 0.260 |
| 0.2 | 200 | $U^C_{jk}$ | 0.001 | 0.045 | 0.043 | 0.936 | 0.438 | 0.002 | 0.043 | 0.043 | 0.945 | 0.448 |
| 0.2 | 400 | $\bar{Y}_j - \bar{Y}_k$ | -0.003 | 0.101 | 0.099 | 0.944 | 0.497 | 0.002 | 0.085 | 0.099 | 0.975 | 0.537 |
| 0.2 | 400 | $U_{jk}$ | 0.000 | 0.041 | 0.040 | 0.943 | 0.485 | 0.001 | 0.035 | 0.040 | 0.975 | 0.516 |
| 0.2 | 400 | $U^C_{jk}$ | 0.001 | 0.031 | 0.030 | 0.942 | 0.735 | 0.001 | 0.030 | 0.030 | 0.949 | 0.759 |
| 0.2 | 600 | $\bar{Y}_j - \bar{Y}_k$ | 0.000 | 0.080 | 0.081 | 0.954 | 0.700 | -0.001 | 0.069 | 0.081 | 0.975 | 0.726 |
| 0.2 | 600 | $U_{jk}$ | 0.000 | 0.032 | 0.033 | 0.949 | 0.679 | 0.000 | 0.028 | 0.033 | 0.973 | 0.699 |
| 0.2 | 600 | $U^C_{jk}$ | 0.000 | 0.024 | 0.024 | 0.947 | 0.904 | 0.000 | 0.024 | 0.024 | 0.950 | 0.906 |
| 0.3 | 200 | $\bar{Y}_j - \bar{Y}_k$ | -0.002 | 0.141 | 0.142 | 0.948 | 0.556 | 0.002 | 0.120 | 0.141 | 0.974 | 0.585 |
| 0.3 | 200 | $U_{jk}$ | -0.001 | 0.057 | 0.056 | 0.940 | 0.538 | 0.001 | 0.048 | 0.056 | 0.968 | 0.564 |
| 0.3 | 200 | $U^C_{jk}$ | 0.001 | 0.044 | 0.043 | 0.934 | 0.757 | 0.002 | 0.042 | 0.042 | 0.946 | 0.791 |
| 0.3 | 400 | $\bar{Y}_j - \bar{Y}_k$ | -0.003 | 0.101 | 0.099 | 0.944 | 0.842 | 0.002 | 0.085 | 0.099 | 0.975 | 0.901 |
| 0.3 | 400 | $U_{jk}$ | 0.000 | 0.041 | 0.039 | 0.944 | 0.827 | 0.001 | 0.034 | 0.039 | 0.973 | 0.887 |
| 0.3 | 400 | $U^C_{jk}$ | 0.001 | 0.030 | 0.030 | 0.943 | 0.975 | 0.001 | 0.030 | 0.030 | 0.945 | 0.976 |
| 0.3 | 600 | $\bar{Y}_j - \bar{Y}_k$ | 0.000 | 0.080 | 0.081 | 0.954 | 0.960 | -0.001 | 0.069 | 0.081 | 0.975 | 0.977 |
| 0.3 | 600 | $U_{jk}$ | 0.001 | 0.032 | 0.032 | 0.952 | 0.956 | 0.000 | 0.028 | 0.032 | 0.973 | 0.973 |
| 0.3 | 600 | $U^C_{jk}$ | 0.001 | 0.024 | 0.024 | 0.946 | 0.998 | 0.000 | 0.024 | 0.024 | 0.949 | 0.999 |

Table 2: Simulation results based on 5,000 replications for double-exponential outcome: AB = average of bias, SD = standard deviation, SE = average of estimated SD, CP = coverage probability of 95% asymptotic confidence interval, P = type I error probability when $a = 0$ and P = power when $a \neq 0$. Columns labeled "Simple" are under simple randomization; columns labeled "Stratified" are under stratified permuted block randomization.

| $a$ | $n$ | Estimator | Simple: AB | Simple: SD | Simple: SE | Simple: CP | Simple: P | Stratified: AB | Stratified: SD | Stratified: SE | Stratified: CP | Stratified: P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200 | $\bar{Y}_j - \bar{Y}_k$ | -0.003 | 0.173 | 0.174 | 0.956 | 0.044 | -0.002 | 0.155 | 0.173 | 0.973 | 0.027 |
| 0 | 200 | $U_{jk}$ | -0.001 | 0.058 | 0.058 | 0.947 | 0.048 | -0.001 | 0.052 | 0.058 | 0.971 | 0.026 |
| 0 | 200 | $U^C_{jk}$ | -0.001 | 0.048 | 0.048 | 0.944 | 0.051 | -0.001 | 0.048 | 0.047 | 0.942 | 0.052 |
| 0 | 400 | $\bar{Y}_j - \bar{Y}_k$ | 0.001 | 0.122 | 0.122 | 0.954 | 0.046 | -0.001 | 0.111 | 0.122 | 0.965 | 0.035 |
| 0 | 400 | $U_{jk}$ | 0.000 | 0.041 | 0.041 | 0.947 | 0.051 | 0.000 | 0.037 | 0.041 | 0.968 | 0.030 |
| 0 | 400 | $U^C_{jk}$ | 0.000 | 0.034 | 0.033 | 0.942 | 0.054 | 0.000 | 0.033 | 0.033 | 0.952 | 0.046 |
| 0 | 600 | $\bar{Y}_j - \bar{Y}_k$ | -0.001 | 0.098 | 0.099 | 0.949 | 0.051 | 0.000 | 0.090 | 0.099 | 0.972 | 0.028 |
| 0 | 600 | $U_{jk}$ | 0.000 | 0.033 | 0.033 | 0.946 | 0.052 | 0.000 | 0.030 | 0.033 | 0.970 | 0.029 |
| 0 | 600 | $U^C_{jk}$ | 0.000 | 0.027 | 0.027 | 0.947 | 0.052 | 0.000 | 0.027 | 0.027 | 0.945 | 0.053 |
| 0.1 | 200 | $\bar{Y}_j - \bar{Y}_k$ | -0.003 | 0.173 | 0.174 | 0.955 | 0.079 | -0.001 | 0.155 | 0.173 | 0.973 | 0.064 |
| 0.1 | 200 | $U_{jk}$ | 0.000 | 0.058 | 0.058 | 0.949 | 0.090 | -0.001 | 0.052 | 0.058 | 0.970 | 0.065 |
| 0.1 | 200 | $U^C_{jk}$ | 0.000 | 0.048 | 0.047 | 0.943 | 0.112 | -0.001 | 0.048 | 0.047 | 0.941 | 0.113 |
| 0.1 | 400 | $\bar{Y}_j - \bar{Y}_k$ | 0.001 | 0.122 | 0.122 | 0.954 | 0.135 | -0.001 | 0.111 | 0.122 | 0.965 | 0.103 |
| 0.1 | 400 | $U_{jk}$ | 0.000 | 0.041 | 0.041 | 0.946 | 0.142 | 0.000 | 0.037 | 0.041 | 0.967 | 0.111 |
| 0.1 | 400 | $U^C_{jk}$ | 0.001 | 0.034 | 0.033 | 0.941 | 0.192 | 0.001 | 0.033 | 0.033 | 0.951 | 0.194 |
| 0.1 | 600 | $\bar{Y}_j - \bar{Y}_k$ | -0.003 | 0.098 | 0.099 | 0.949 | 0.168 | 0.001 | 0.090 | 0.099 | 0.972 | 0.148 |
| 0.1 | 600 | $U_{jk}$ | -0.001 | 0.033 | 0.033 | 0.947 | 0.181 | -0.001 | 0.030 | 0.033 | 0.970 | 0.156 |
| 0.1 | 600 | $U^C_{jk}$ | -0.001 | 0.027 | 0.027 | 0.947 | 0.259 | -0.001 | 0.027 | 0.027 | 0.943 | 0.264 |
| 0.2 | 200 | $\bar{Y}_j - \bar{Y}_k$ | -0.004 | 0.173 | 0.174 | 0.955 | 0.215 | -0.005 | 0.155 | 0.173 | 0.972 | 0.185 |
| 0.2 | 200 | $U_{jk}$ | -0.001 | 0.057 | 0.057 | 0.949 | 0.231 | -0.002 | 0.051 | 0.057 | 0.970 | 0.202 |
| 0.2 | 200 | $U^C_{jk}$ | 0.000 | 0.048 | 0.047 | 0.947 | 0.309 | -0.001 | 0.048 | 0.047 | 0.938 | 0.313 |
| 0.2 | 400 | $\bar{Y}_j - \bar{Y}_k$ | 0.002 | 0.122 | 0.122 | 0.954 | 0.378 | 0.000 | 0.111 | 0.122 | 0.965 | 0.363 |
| 0.2 | 400 | $U_{jk}$ | 0.000 | 0.041 | 0.040 | 0.947 | 0.409 | 0.000 | 0.036 | 0.040 | 0.966 | 0.399 |
| 0.2 | 400 | $U^C_{jk}$ | 0.000 | 0.034 | 0.033 | 0.941 | 0.571 | 0.001 | 0.033 | 0.033 | 0.951 | 0.575 |
| 0.2 | 600 | $\bar{Y}_j - \bar{Y}_k$ | -0.002 | 0.098 | 0.099 | 0.949 | 0.510 | 0.001 | 0.090 | 0.099 | 0.972 | 0.523 |
| 0.2 | 600 | $U_{jk}$ | -0.001 | 0.033 | 0.033 | 0.948 | 0.555 | 0.000 | 0.029 | 0.033 | 0.971 | 0.575 |
| 0.2 | 600 | $U^C_{jk}$ | 0.000 | 0.027 | 0.027 | 0.947 | 0.747 | 0.000 | 0.027 | 0.026 | 0.944 | 0.743 |
| 0.3 | 200 | $\bar{Y}_j - \bar{Y}_k$ | -0.002 | 0.173 | 0.174 | 0.956 | 0.415 | -0.002 | 0.155 | 0.173 | 0.973 | 0.402 |
| 0.3 | 200 | $U_{jk}$ | 0.000 | 0.057 | 0.057 | 0.950 | 0.442 | -0.001 | 0.051 | 0.056 | 0.968 | 0.445 |
| 0.3 | 200 | $U^C_{jk}$ | 0.001 | 0.047 | 0.046 | 0.945 | 0.592 | 0.000 | 0.047 | 0.046 | 0.941 | 0.593 |
| 0.3 | 400 | $\bar{Y}_j - \bar{Y}_k$ | 0.001 | 0.122 | 0.122 | 0.954 | 0.697 | 0.000 | 0.111 | 0.122 | 0.965 | 0.708 |
| 0.3 | 400 | $U_{jk}$ | 0.000 | 0.040 | 0.040 | 0.949 | 0.739 | 0.000 | 0.036 | 0.040 | 0.967 | 0.755 |
| 0.3 | 400 | $U^C_{jk}$ | 0.001 | 0.033 | 0.033 | 0.942 | 0.881 | 0.001 | 0.033 | 0.032 | 0.951 | 0.888 |
| 0.3 | 600 | $\bar{Y}_j - \bar{Y}_k$ | -0.003 | 0.098 | 0.099 | 0.949 | 0.854 | 0.001 | 0.090 | 0.099 | 0.972 | 0.876 |
| 0.3 | 600 | $U_{jk}$ | -0.001 | 0.032 | 0.033 | 0.949 | 0.882 | 0.000 | 0.029 | 0.032 | 0.971 | 0.916 |
| 0.3 | 600 | $U^C_{jk}$ | 0.000 | 0.027 | 0.026 | 0.947 | 0.974 | 0.000 | 0.026 | 0.026 | 0.943 | 0.974 |

Table 3: Comparison of change in total cholesterol under different treatment doses versus the active control from a completed phase 3 trial for the treatment of diabetes.

| Comparison | Quantity | Two sample t-test, raw data | Two sample t-test, transformed data | Wilcoxon-Mann-Whitney test, unadjusted | Wilcoxon-Mann-Whitney test, covariate adjusted |
|---|---|---|---|---|---|
| treatment dose 1 vs control | p-value | 0.122 | 0.051 | 0.023 | 0.044 |
| treatment dose 1 vs control | SE | 4.746 | 0.028 | 0.038 | 0.034 |
| treatment dose 1 vs control | CI | (-16.812, 2.002) | (-0.112, 0.000) | (0.339, 0.489) | (0.365, 0.498) |
| treatment dose 2 vs control | p-value | 0.089 | 0.081 | 0.063 | 0.022 |
| treatment dose 2 vs control | SE | 4.486 | 0.029 | 0.038 | 0.031 |
| treatment dose 2 vs control | CI | (-16.557, 1.201) | (-0.106, 0.006) | (0.358, 0.506) | (0.367, 0.486) |
| treatment dose 3 vs control | p-value | 0.112 | 0.039 | 0.079 | 0.045 |
| treatment dose 3 vs control | SE | 4.391 | 0.027 | 0.037 | 0.034 |
| treatment dose 3 vs control | CI | (-15.719, 1.651) | (-0.110, -0.003) | (0.364, 0.509) | (0.368, 0.501) |

SE: the standard error of the difference of sample means (raw data or transformed) under the two sample t-test.
SE: the standard error of the Wilcoxon statistic (unadjusted or adjusted) under the Wilcoxon-Mann-Whitney test.

The empirical results in Table 3 demonstrate that the covariate adjusted $U^C_{j4}$ has smaller SE than the unadjusted $U_{j4}$, which is consistent with our theory and simulation results. At the 5% significance level, the covariate adjusted Wilcoxon-Mann-Whitney test rejects all three null hypotheses in pairwise comparisons, whereas the unadjusted Wilcoxon-Mann-Whitney test cannot reject when $j = 2$ and $j = 3$, indicating its inefficiency. The two sample t-test applied to raw data in this example is apparently not appropriate, since it cannot reject any null hypothesis and its related confidence interval is too wide to be useful.

Although the two sample t-test applied to log-transformed outcomes is better than that based on raw data, its performance is still not as good as the Wilcoxon-Mann-Whitney test with covariate adjustment.
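To make the unadjusted comparison concrete, the following sketch computes a Wilcoxon two sample statistic and a normal-approximation test on synthetic data. It rests on assumptions that should be read as such: the statistic is taken to be the proportion of pairs with $Y_j < Y_k$, which matches the expression $\theta_{jk} = \int \{1 - F_k(y)\}\,dF_j(y)$ used in Appendix B but is not copied from definition (1); the standard error is the classical null value $\sqrt{(n_j + n_k + 1)/(12 n_j n_k)}$ for continuous outcomes, which may differ from the variance estimator used for the confidence intervals in Table 3; and the data are simulated normal draws, not trial data. The covariate adjusted statistic $U^C_{jk}$ is not implemented here. All function names are hypothetical.

```python
import math
import random
import statistics


def wilcoxon_two_sample(y_j, y_k):
    """Proportion of pairs with y_j < y_k (ties count one half)."""
    wins = 0.0
    for a in y_j:
        for b in y_k:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(y_j) * len(y_k))


def null_standard_error(n_j, n_k):
    """SD of the statistic when both samples share one continuous distribution."""
    return math.sqrt((n_j + n_k + 1) / (12.0 * n_j * n_k))


def two_sided_p_value(u, n_j, n_k):
    """Normal-approximation p-value for the null value theta = 1/2."""
    z = (u - 0.5) / null_standard_error(n_j, n_k)
    return 2.0 * (1.0 - statistics.NormalDist().cdf(abs(z)))


rng = random.Random(7)
# Synthetic stand-ins for outcome changes; the group sizes mimic dose 1 vs control.
treated = [rng.gauss(-8.0, 30.0) for _ in range(76)]
control = [rng.gauss(0.0, 30.0) for _ in range(244)]
u = wilcoxon_two_sample(treated, control)
print(round(u, 3), round(null_standard_error(76, 244), 3),
      round(two_sided_p_value(u, 76, 244), 3))
```

For group sizes 76 and 244 the classical null standard error is about 0.038. The double loop costs $n_j n_k$ comparisons, which is negligible at these sample sizes; a rank-based formula would be the usual choice for much larger samples.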
Furthermore, the interpretation of confidence intervals based on transformed outcomes is not straightforward, since the confidence interval on the log scale does not translate directly back to the original outcome scale and may lead to incorrect interpretations, due to the fact that $E(\log Y) \neq \log E(Y)$ (geometric mean versus arithmetic mean). By contrast, the confidence intervals based on Wilcoxon two sample statistics, adjusted or unadjusted, are on the original scale without any transformation needed to meet distributional requirements.

References

Baldi Antognini, A. and Zagoraiou, M. (2015). On the almost sure convergence of adaptive allocation procedures. Bernoulli Journal, 21(2):881–908.
Bannick, M., Shao, J., Liu, J., Du, Y., Yi, Y., and Ye, T. (2025). A general form of covariate adjustment in clinical trials under covariate-adaptive randomization. Biometrika, 112:in press.
Benkeser, D., Diaz, I., Luedtke, A., Segal, J., Scharfstein, D., and Rosenblum, M. (2021). Improving precision and power in randomized trials for COVID-19 treatments using covariate adjustment, for binary, ordinal, and time-to-event outcomes. Biometrics, 77:1467–1481.
Cassel, C. M., Särndal, C. E., and Wretman, J. H. (1976). Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika, 63(3):615–620.
Ciolino, J. D., Palac, H. L., Yang, A., Vaca, M., and Belli, H. M. (2019). Ideal vs. real: a systematic review on handling covariates in randomized controlled trials. BMC Medical Research Methodology, 19(1):136.
Cohen, P. L. and Fogarty, C. B. (2023). No-harm calibration for generalized Oaxaca–Blinder estimators. Biometrika, 111(1):331–338.
EMA (2015). Guideline on adjustment for baseline covariates in clinical trials. Committee for Medicinal Products for Human Use, European Medicines Agency (EMA).
FDA (2021). Adjusting for covariates in randomized clinical trials for drugs and biological products. Draft Guidance for Industry. Center for Drug Evaluation and Research and Center for Biologics Evaluation and Research, Food and Drug Administration (FDA), U.S. Department of Health and Human Services. May 2021.
Freedman, D. A. (2008). Randomization does not justify logistic regression. Statistical Science, 23(2):237–249.
Hodges, J. and Lehmann, E. (1956). The efficiency of some nonparametric competitors of the t-test. Annals of Mathematical Statistics, 27:324–335.
ICH E9 (1998). Statistical principles for clinical trials E9. International Council for Harmonisation (ICH).
Jiang, J. (2010). Large Sample Techniques for Statistics. Springer.
Lehmann, E. (1975). Nonparametrics. Holden-Day, Inc.
Lin, W. (2013). Agnostic notes on regression adjustments to experimental data: Reexamining Freedman's critique. Annals of Applied Statistics, 7(1):295–318.
Liu, H. and Yang, Y. (2020). Regression-adjusted average treatment effect estimates in stratified randomized experiments. Biometrika, 107(4):935–948.
Moore, K. L. and van der Laan, M. J. (2009). Covariate adjustment in randomized trials with binary outcomes: targeted maximum likelihood estimation. Statistics in Medicine, 28(1):39–64.
Pocock, S. J. and Simon, R. (1975). Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics, 31(1):103–115.
Särndal, C.-E., Swensson, B., and Wretman, J. (2003). Model Assisted Survey Sampling. Springer Science & Business Media.
Schulz, K. F. and Grimes, D. A. (2002). Generation of allocation sequences in randomised trials: chance, not choice. The Lancet, 359(9305):515–519.
Serfling, R. (1980). Approximation Theorems of Mathematical Statistics. John Wiley and Sons, New York.
Shao, J. (2021). Inference for covariate-adaptive randomization: Aspects of methodology and theory (with discussions). Statistical Theory and Related Fields, 5:172–186.
Taves, D. R. (1974). Minimization: A new method of assigning patients to treatment and control groups. Clinical Pharmacology and Therapeutics, 15(5):443–453.
Taves, D. R. (2010). The use of minimization in clinical trials. Contemporary Clinical Trials, 31(2):180–184.
Vermeulen, K., Thas, O., and Vansteelandt, S. (2015). Increasing the power of the Mann-Whitney test in randomized experiments through flexible covariate adjustment. Statistics in Medicine, 34:1012–1030.
Wang, B., Ogburn, E. L., and Rosenblum, M. (2019). Analysis of covariance in randomized trials: More precision and valid confidence intervals, without model assumptions. Biometrics, 75(4):1391–1400.
Wang, B., Susukida, R., Mojtabai, R., Amin-Esmaeili, M., and Rosenblum, M. (2023). Model-robust inference for clinical trials that improve precision by stratified randomization and covariate adjustment. Journal of the American Statistical Association, 118:1152–1163.
Yang, L. and Tsiatis, A. A. (2001). Efficiency study of estimators for a treatment effect in a pretest–posttest trial. The American Statistician, 55(4):314–321.
Ye, T., Shao, J., Yi, Y., and Zhao, Q. (2023). Toward better practice of covariate adjustment in analyzing randomized clinical trials. Journal of the American Statistical Association, 118:2370–2382.
Zelen, M. (1974). The randomization and stratification of patients to clinical trials. Journal of Chronic Diseases, 27(7):365–375.
Zhang, M., Tsiatis, A. A., and Davidian, M. (2008). Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics, 64(3):707–715.
Zhang, M. and Zhang, B. (2021).
Discussion of “Improving precision and power in randomized trials for COVID-19 treatments using covariate adjustment, for binary, ordinal, and time-to-event outcomes”. Biometrics, 77:1485–1488.

A Proof of Theorem 1

From page 381 of Jiang (2010),

$$\sqrt{n}\,(U_{jk} - \theta_{jk}) = \sqrt{n}\,(\bar{W}_{jk} + \bar{W}_{kj}) + o_p(1),$$

where

$$\bar{W}_{jk} = \frac{1}{n_j} \sum_{i: A_i = j} \{1 - F_k(Y_{ij}) - \theta_{jk}\}, \qquad \bar{W}_{kj} = \frac{1}{n_k} \sum_{i: A_i = k} \{F_j(Y_{ik}) - \theta_{jk}\},$$

and $o_p(1)$ denotes a term that converges to 0 in probability. Consequently,

$$\begin{aligned}
\sqrt{n}\,(U^C_{jk} - \theta_{jk}) &= \sqrt{n}\,(\bar{W}_{jk} + \bar{W}_{kj}) + (\bar{X}_j - \bar{X})^\top \hat{\beta}_j - (\bar{X}_k - \bar{X})^\top \hat{\beta}_k + o_p(1) \\
&= \sqrt{n}\,(\bar{W}_{jk} + \bar{W}_{kj}) + (\bar{X}_j - \bar{X})^\top \beta_j - (\bar{X}_k - \bar{X})^\top \beta_k + o_p(1).
\end{aligned}$$

The rest of the proof follows the same argument as in the proofs of Corollary 1 and Theorem 2 in Ye et al. (2023), since $\bar{W}_{jk}$ and $\bar{W}_{kj}$ are types of sample means with outcomes in treatment groups $j$ and $k$, respectively.

B Proof of Theorem 2

(i) Under the contiguous alternative hypothesis with $F_j$ having mean 0, variance $\sigma_j^2$, and continuous density $f_j$, and $F_k(y) = F_j(y - \gamma n^{-1/2})$ with a constant $\gamma \neq 0$, $\sqrt{n}\,(U_{jk} - \theta_{jk})$ converges in distribution to the normal distribution with mean 0 and variance $\frac{1}{12}\left(\frac{1}{\pi_j} + \frac{1}{\pi_k}\right)$. Let $\mu = \gamma n^{-1/2}$. Then

$$\theta_{jk} = E(U_{jk}) = \int \{1 - F_k(y)\}\, dF_j(y) = \int \{1 - F_j(y - \mu)\}\, f_j(y)\, dy$$

and, under the assumed conditions,

$$\frac{d\theta_{jk}}{d\mu} = \int f_j(y - \mu)\, f_j(y)\, dy \to \int f_j^2(y)\, dy.$$

The two sample t-test is based on the fact that $\sqrt{n}\,(\bar{Y}_k - \bar{Y}_j - \mu)$ converges in distribution to the normal distribution with mean 0 and variance $\sigma_j^2\left(\frac{1}{\pi_j} + \frac{1}{\pi_k}\right)$. Hence, the ARE of the unadjusted Wilcoxon-Mann-Whitney test (5) relative to the two sample t-test is equal to (Serfling, 1980, pages 316-318)

$$\mathrm{ARE} = \lim_{n \to \infty} \frac{\sigma_j^2\left(\frac{1}{\pi_j} + \frac{1}{\pi_k}\right)}{\left(\frac{d\mu}{d\mu}\right)^2} \cdot \frac{\left(\frac{d\theta_{jk}}{d\mu}\right)^2}{\frac{1}{12}\left(\frac{1}{\pi_j} + \frac{1}{\pi_k}\right)} = 12\, \sigma_j^2 \left\{\int f_j^2(y)\, dy\right\}^2.$$

(ii) The ARE of the adjusted Wilcoxon-Mann-Whitney test (6) relative to the unadjusted Wilcoxon-Mann-Whitney test (5) can be similarly obtained as

$$\mathrm{ARE} = \lim_{n \to \infty} \frac{\frac{1}{12}\left(\frac{1}{\pi_j} + \frac{1}{\pi_k}\right)}{\left(\frac{d\theta_{jk}}{d\mu}\right)^2} \cdot \frac{\left(\frac{d\theta_{jk}}{d\mu}\right)^2}{\frac{1}{12}\left(\frac{1}{\pi_j} + \frac{1}{\pi_k}\right) - \phi_{jk}} = \frac{1}{1 - 12\, \beta^\top \Sigma \beta}.$$

(iii) The ARE in (iii) is equal to the product of the ARE in (i) and the ARE in (ii).
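As a numerical check of the ARE expressions in (i) and (ii), the sketch below evaluates $12\sigma_j^2\{\int f_j^2(y)\,dy\}^2$ for a standard normal density and for a double-exponential density with scale 1, using a plain trapezoid rule. The choice of these two densities mirrors the simulation settings, but the scale parameters, the integration range, the function names, and the value 0.03 plugged in for $\beta^\top \Sigma \beta$ are illustrative assumptions, not quantities taken from the paper.

```python
import math


def integral_of_squared_density(density, lower=-40.0, upper=40.0, steps=160_000):
    """Trapezoid-rule approximation of the integral of density(y)**2."""
    h = (upper - lower) / steps
    total = 0.5 * (density(lower) ** 2 + density(upper) ** 2)
    for i in range(1, steps):
        total += density(lower + i * h) ** 2
    return total * h


def are_wilcoxon_vs_t(density, variance):
    """ARE in part (i): 12 * sigma^2 * (integral of f^2)^2."""
    return 12.0 * variance * integral_of_squared_density(density) ** 2


def are_adjusted_vs_unadjusted(quadratic_form):
    """ARE in part (ii): 1 / (1 - 12 * beta' Sigma beta)."""
    return 1.0 / (1.0 - 12.0 * quadratic_form)


def standard_normal(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def standard_laplace(y):
    # double-exponential density with scale 1, so its variance is 2
    return 0.5 * math.exp(-abs(y))


are_normal = are_wilcoxon_vs_t(standard_normal, variance=1.0)
are_laplace = are_wilcoxon_vs_t(standard_laplace, variance=2.0)
print(round(are_normal, 3), round(are_laplace, 3))  # 0.955 1.5

# Hypothetical value of beta' Sigma beta, used only to illustrate part (iii).
are_total = are_normal * are_adjusted_vs_unadjusted(0.03)
print(round(are_total, 3))
```

The two values in part (i), $3/\pi \approx 0.955$ for the normal density and 1.5 for the double-exponential density, agree with the simulation pattern in which the t-test is slightly more powerful than the unadjusted Wilcoxon-Mann-Whitney test for normal outcomes and less powerful for double-exponential outcomes.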