A General Framework for Modeling Gaussian Process with Qualitative and Quantitative Factors∗

Linsui Deng and C. F. Jeff Wu#

School of Data Science, The Chinese University of Hong Kong, Shenzhen, China

February 19, 2026

Abstract

Computer experiments involving both qualitative and quantitative (QQ) factors have attracted increasing attention. Gaussian process (GP) models have proven effective in this context by choosing specialized covariance functions for QQ factors. In this work, we extend the latent variable-based GP approach, which maps qualitative factors into a continuous latent space, by establishing a general framework to apply standard kernel functions to continuous latent variables. This approach provides a novel perspective for interpreting some existing GP models for QQ factors and introduces new covariance structures in some situations. The ordinal structure can be incorporated naturally and seamlessly in this framework. Furthermore, the Bayesian information criterion and leave-one-out cross-validation are employed for model selection and model averaging. The performance of the proposed method is comprehensively studied on several examples.

Keywords: Computer experiment; Ordinal variables; Uncertainty quantification; Latent variable; Bayesian information criterion; Leave-one-out cross validation

# Corresponding author. Email: jeffwu@cuhk.edu.cn
∗ This version is accepted for publication in Technometrics.

1 Introduction

Computer experiments have attracted increasing attention in science, engineering, and business due to their ability to model complex systems. However, the high computational cost of running these simulations often necessitates the use of surrogate models or emulators.
Among these, Gaussian process (GP) modeling has emerged as a powerful approach because it can approximate the behavior of simulations accurately and efficiently (Santner et al., 2003). Recent developments have extended GP modeling to support a variety of input types, such as probability distributions (Bachoc et al., 2017) and functions (Li and Tan, 2022).

In many applications, the inputs of a computer experiment involve both quantitative and qualitative factors, commonly referred to as QQ inputs. For instance, in embankment system design, the inputs include one quantitative variable (shoulder distance from the centerline) and three qualitative variables (construction rate, Young's modulus of columns, and reinforcement stiffness) (Liu and Rowe, 2015; Deng et al., 2017). Similarly, modeling the thermal dynamics of a data center requires consideration of qualitative factors such as diffuser location, return air vent location, and rack heat load nonuniformity, along with quantitative factors like rack temperature rise, rack heat load, and total diffuser flow rate (Schmidt et al., 2005; Qian et al., 2008). These examples show the importance of developing GP models that work well with QQ inputs.

An essential step of GP modeling is the construction of the covariance function. In recent years, a number of covariance structures have been proposed and investigated to improve prediction accuracy for computer experiments with QQ inputs (Qian et al., 2008; Zhou et al., 2011; Deng et al., 2017; Zhang et al., 2020; Roustant et al., 2020; Garrido-Merchán and Hernández-Lobato, 2020; Tao et al., 2021; Xiao et al., 2021; Lin et al., 2024), as reviewed in Section 2.2. Nevertheless, a general framework that ties these approaches together is still lacking.
In this paper, we propose a general latent-variable-based framework for GP modeling with QQ inputs, built upon the latent variable approach proposed by Zhang et al. (2020). We show that this framework can include many existing covariance structures and allows a systematic development of new ones. This is achieved by applying different kernel functions to the latent variables, such as Gaussian, exponential, and linear kernels. We study identifiability conditions of the latent parameterization for these kernel settings. We further demonstrate how ordinal information can be integrated by imposing constraints on the latent variables. Given a variety of available kernel choices, we employ both leave-one-out cross-validation and the Bayesian information criterion (BIC) for model selection. Finally, we introduce a BIC-based model averaging strategy to robustly combine predictions from models employing different kernels.

The remainder of this paper is structured as follows. Section 2 introduces the general framework, establishes its connection to existing approaches, and provides the identifiability condition for the latent parameterization. Section 3 describes how to incorporate ordinal information within the framework. Estimation and prediction procedures are presented in Section 4. In Section 5, we describe the model selection and model averaging strategies. Section 6 provides comprehensive numerical comparisons. Section 7 summarizes the findings and offers further discussion.

2 Methodology

2.1 Framework

Consider a problem in which the response $Y(U, V)$ has two types of inputs: $U = (u_1, \cdots, u_I)^\top$ are quantitative factors and $V = (v_1, \cdots, v_J)^\top$ are qualitative factors, with each $v_j$ possessing $a_j$ levels, i.e., $v_j \in \{1, 2, \cdots, a_j\}$.
We model the response using a GP, expressed as
$$Y(U, V) = \mu + G(U, V),$$
where $\mu$ represents the constant mean term and $G(U, V)$ is a zero-mean GP. The primary goal is to model the covariance between responses $Y(U, V)$ and $Y(U', V')$ corresponding to two distinct inputs $(U, V)$ and $(U', V')$. By utilizing the covariance kernel, we can make predictions for new inputs (Santner et al., 2003). Existing approaches focus on choosing or proposing covariance structures with some of the following attributes: intuitive, interpretable, or computationally efficient (McMillan et al., 1999; Qian et al., 2008; Zhou et al., 2011; Deng et al., 2017).

Motivated by the idea that qualitative variables can be represented by some underlying numerical values, Zhang et al. (2020) proposed that the $j$-th qualitative factor $v_j$ corresponds to a latent vector $z^{(j)}_{v_j} \in \mathbb{R}^{l_j}$, where $1 \le l_j \le a_j$. Following this formulation, we define the concatenated latent vector $Z_V \in \mathbb{R}^{\sum_{j=1}^J l_j}$ as
$$Z_V^\top = \Big( \big(z^{(1)}_{v_1}\big)^\top, \big(z^{(2)}_{v_2}\big)^\top, \cdots, \big(z^{(J)}_{v_J}\big)^\top \Big).$$
Using this framework, we can state that the response $Y(U, V)$ for input $(U, V)$ follows the same distribution as the response $Y$ for input $(U, Z_V)$, i.e.,
$$Y(U, V) \stackrel{d}{=} Y(U, Z_V).$$
Given the GP assumption, this distributional equivalence holds if and only if their covariance functions are identical.
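The latent mapping can be illustrated concretely (a minimal sketch with made-up latent values, not the authors' code): each level of each qualitative factor is looked up in a table of latent vectors, and the results are concatenated into $Z_V$.

```python
import numpy as np

# Hypothetical latent tables: factor j maps its levels to vectors in R^{l_j}.
latent = {
    0: np.array([[0.0], [0.7], [1.2]]),     # factor 1: a_1 = 3 levels, l_1 = 1
    1: np.array([[0.0, 0.0], [0.5, 0.9]]),  # factor 2: a_2 = 2 levels, l_2 = 2
}

def concat_latent(V, latent):
    """Build Z_V by stacking the latent vectors of the selected levels.

    V contains the 1-based level indices (v_1, ..., v_J)."""
    return np.concatenate([latent[j][v - 1] for j, v in enumerate(V)])

Z = concat_latent((2, 1), latent)   # levels v_1 = 2, v_2 = 1
# Z lives in R^{l_1 + l_2} = R^3
```

A standard kernel for continuous inputs can then be applied directly to $Z$, which is the core idea of the framework.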
Therefore, we propose to model the covariance function of $Y(U, V)$, the original process of interest, using that of the continuous input $(U, Z_V)$ as follows:
$$\mathrm{Cov}\{Y(U, V), Y(U', V')\} = \mathrm{Cov}\{Y(U, Z_V), Y(U', Z_{V'})\} = \sigma^2\,\mathrm{Corr}\{Y(U, Z_V), Y(U', Z_{V'})\} = \sigma^2 K_U(U, U')\,K_Z(Z_V, Z_{V'}), \quad (1)$$
where $\sigma^2$ denotes the variance, and $K_U(\cdot,\cdot)$ and $K_Z(\cdot,\cdot)$ are kernel functions for the quantitative factors and for the latent vectors associated with the qualitative factors, respectively. The last identity in (1) is based on the assumption that the effects of $U$ and $V$ on $Y$ can be factorized. Both kernels satisfy the normalization condition $K_U(U, U) = 1$ and $K_Z(Z, Z) = 1$ for all $U \in \mathbb{R}^I$ and $Z \in \mathbb{R}^{\sum_{j=1}^J l_j}$.

This framework is flexible because it can accommodate various kernel functions to capture diverse patterns. While the assumption of latent vectors may initially appear restrictive, we show that this modeling framework integrates numerous established approaches (Qian et al., 2008; Deng et al., 2017; Zhang et al., 2020; Tao et al., 2021) as special cases. Furthermore, its inherent generality offers potential for further methodological advancements and applications.

2.2 Connection with existing approaches

In the following, we provide a detailed discussion to establish connections between the framework and some existing methods in the literature.

Multiplicative Linear Kernel. By imposing a multiplicative structure among qualitative variables and adopting the linear kernel for continuous variables (Rojo-Álvarez et al., 2018), the correlation defined by the latent vectors is
$$K_Z(Z_V, Z_{V'}) = \prod_{j=1}^J K_j\big(z^{(j)}_{v_j}, z^{(j)}_{v'_j}\big) = \prod_{j=1}^J \big(z^{(j)}_{v_j}\big)^\top z^{(j)}_{v'_j}. \quad (2)$$

• Case I ($l_j = a_j$). The covariance structure proposed by Qian et al.
(2008) is defined as
$$\mathrm{Cov}\{Y(U, V), Y(U', V')\} = \sigma^2 \Big( \prod_{j=1}^J \tau^{(j)}_{v_j, v'_j} \Big) K_U(U, U'), \quad (3)$$
where $\big(\tau^{(j)}_{v,v'}\big)_{a_j \times a_j}$ are $J$ positive semidefinite matrices with unit diagonal elements (SPDUDE). Here, $\tau^{(j)}_{v,v'}$ represents the correlation between levels $v$ and $v'$ of the $j$th qualitative factor. By using the Cholesky decomposition (Pinheiro and Bates, 1996), we can represent $\tau^{(j)}_{v_j, v'_j}$ as the product of two column vectors, i.e.,
$$\tau^{(j)}_{v_j, v'_j} = \big(z^{(j)}_{v_j}\big)^\top z^{(j)}_{v'_j}. \quad (4)$$
In this way, the qualitative part in (3) can be equivalently expressed by (2). In addition, Zhou et al. (2011) suggested using a hyperspherical parameterization to simplify the computations.

• Case II ($l_j < a_j$). Roustant et al. (2020) and Tao et al. (2021) explored the case where the length of the latent vector, $l_j$, is shorter than the number of levels, $a_j$, to impose a low-rank structure. This approach significantly reduces the number of hyperparameters and hence alleviates the estimation burden. Building on this, Tao et al. (2021) further introduced hyperspherical expressions to address computational challenges. However, the issue of identifiability was not thoroughly discussed (see Section 2.3).

• Case III (restricted correlation matrix). To reduce the complexity of the correlation matrix, one may assume specific structures, such as the equal correlation considered in Qian et al. (2008). In our framework, this assumption can be transformed into restrictions on the latent variables. Specifically, it corresponds to a special case where independent noise is permitted and one-dimensional latent vectors with equal elements are assigned.

Additive Linear Kernel. Deng et al.
(2017) proposed to model the covariance by imposing an additive structure for the qualitative factors, which is then multiplied by the correlation attributed to the quantitative factors. Our framework has a direct connection with theirs. Suppose $K_Z(\cdot,\cdot)$ is the kernel function modified from a first-order additive GP for continuous variables (Plate, 1999; Duvenaud et al., 2011), whose individual components employ a linear kernel. To establish this connection, we construct latent vectors satisfying (4). The covariance in (1) becomes
$$\sigma^2 \sum_{j=1}^J \psi_j K_j\big(z^{(j)}_{v_j}, z^{(j)}_{v'_j}\big) K_U(U, U') = \sum_{j=1}^J \psi_j \sigma^2 \big(z^{(j)}_{v_j}\big)^\top z^{(j)}_{v'_j} K_U(U, U') = \sum_{j=1}^J \sigma_j^2 \tau^{(j)}_{v_j, v'_j} K_U(U, U'), \quad (5)$$
where $\psi_j$ denotes the weight satisfying $\sum_{j=1}^J \psi_j = 1$ and $\sigma_j^2 = \psi_j \sigma^2$ represents the variance associated with the $j$th qualitative factor. The covariance in equation (5) is a special case of the covariance structure proposed by Deng et al. (2017), which has the form
$$\sum_{j=1}^J \sigma_j^2 \tau^{(j)}_{v_j, v'_j} K_{U,j}(U, U'). \quad (6)$$
In their formulation, the kernel function $K_{U,j}(U, U')$ varies with the qualitative factor $j$. In contrast, our approach uses a fixed kernel function for all quantitative factors.

Multiplicative Gaussian Kernel. Zhang et al. (2020) proposed and evaluated kernel functions with a multiplicative structure and Gaussian kernel, defined as
$$K_Z(Z_V, Z_{V'}) = \prod_{j=1}^J \exp\Big\{ -\big\| z^{(j)}_{v_j} - z^{(j)}_{v'_j} \big\|_2^2 \Big\}.$$
Although they recognized the potential to enhance generality using other kernels, such as the power exponential, Matérn, and lifted Brownian kernels, they did not develop the details for these various choices of kernels.
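The multiplicative structures above can be sketched in a few lines; the Gaussian branch follows Zhang et al.'s (2020) form $\exp\{-\|z - z'\|_2^2\}$, the linear branch is equation (2), and the latent values are illustrative only.

```python
import numpy as np

def kz_multiplicative(ZV, ZVp, kind="gaussian"):
    """Correlation K_Z under a multiplicative structure: a product over
    factors j of K_j(z_j, z_j').

    ZV, ZVp: lists of per-factor latent vectors z^{(j)}_{v_j}."""
    out = 1.0
    for z, zp in zip(ZV, ZVp):
        if kind == "gaussian":       # Zhang et al. (2020): exp(-||z - z'||^2)
            out *= np.exp(-np.sum((z - zp) ** 2))
        elif kind == "linear":       # equation (2): z^T z' (unit-norm vectors)
            out *= float(z @ zp)
    return out

# Two inputs differing only in the first factor's level (unit-norm vectors
# so that the linear kernel satisfies K(z, z) = 1).
z1 = [np.array([1.0, 0.0]), np.array([1.0])]
z2 = [np.array([0.6, 0.8]), np.array([1.0])]
kz_multiplicative(z1, z2, "linear")    # 0.6 * 1.0 = 0.6
```

Note that both branches return 1 when the two inputs share the same levels, matching the normalization condition $K_Z(Z, Z) = 1$.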
To apply other kernels, comprehensive investigation and empirical validation are essential to ensure their applicability in practice. For instance, the linear kernel has distinct identifiability conditions and requires additional constraints compared to the Gaussian kernel, as will be presented in Section 2.3.

Pre-specified Latent Variable. A popular approach for modeling GPs with QQ factors is to encode qualitative variables into numerical vectors. For instance, Garrido-Merchán and Hernández-Lobato (2020) used one-hot encoding vectors of length $a_j$, where the $v$th element is $\mathbb{1}(v_j = v)$. In another instance, Luo et al. (2024) adopted similarity encoding (Cerda et al., 2018), which constructs feature vectors from pairwise similarities between qualitative variable levels. Once encoded, standard kernels for continuous variables can then be applied. Therefore, these two methods can be viewed as special cases of the general framework.

Non-separable Kernel. As shown in (6), Deng et al. (2017) suggested that the smoothness parameters of quantitative factors may vary across different qualitative factors. In addition, Xiao et al. (2021) assumed that these parameters vary across different levels within the same qualitative factor. Lin et al. (2024) partitioned the input space into several non-overlapping regions via a qualitative-factor-based tree and fitted a separate GP for each region. These methods could, in principle, be accommodated in our framework by relaxing the factorization assumption in (1) and capturing interactions between QQ factors using non-separable kernels, such as higher-order additive GPs (Duvenaud et al., 2011) or treed GPs (Gramacy and Lee, 2008).
While such extensions are feasible, the need for careful kernel specification and the substantially greater modeling complexity would likely limit their practical benefits.

To summarize, the relevant methodologies fall into three categories: (i) data-driven latent variables, summarized in Table 1; (ii) prespecified latent variables, such as one-hot and similarity encoding; and (iii) approaches that model interactions between QQ factors. The first two categories can be readily interpreted within the proposed framework, whereas the third cannot be fully explained due to the presence of interactions.

[Table 1] Table 1: Comparison of methods explicitly incorporating specific kernel functions and structures. The columns are kernel functions (linear with full-dimensional latent vectors, equal correlation, and low-dimensional Gaussian, exponential, and linear kernels); the rows are the multiplicative and additive structures between qualitative variables. Different symbols represent different methods: ● indicates our framework; ▲ corresponds to the method proposed by Qian et al. (2008); # refers to the method by Zhou et al. (2011); △ represents the approach by Deng et al. (2017); ■ and □ denote latent variable-based approaches studied in Zhang et al. (2020) and Tao et al. (2021), respectively.

Finally, another line of research adopts a matrix-first perspective by structurally parameterizing the correlation matrix of the qualitative factors $\tau^{(j)}_{v,v'}$, which directly extends Qian et al. (2008). Roustant et al. (2020) employed block-structured correlation matrices to impose group-level structure on the correlations among levels of each qualitative input. Saves et al.
(2023) constructed correlation matrices through generalized continuous exponential kernels, which include the continuous relaxation and Gower distance approaches as special cases. Our approach adopts a different perspective: it uses latent representations for the qualitative factors and induces correlations implicitly through standard kernels defined on the latent space.

2.3 Identifiability

Since the relationship between any inputs can be fully characterized by (1), it is crucial to examine the conditions on latent vectors and kernel functions that guarantee uniqueness. To address this, we formalize the concept of parameterization equivalence.

Definition 1 (Parameterization Equivalence). We say the latent parameterization $\{Z_V\}$ under the kernel $K_Z(\cdot,\cdot)$ and the latent parameterization $\{W_V\}$ under the kernel $K_W(\cdot,\cdot)$ are equivalent if $K_Z(Z_V, Z_{V'}) = K_W(W_V, W_{V'})$ for all $V, V' \in \times_{j=1}^J \{1, 2, \cdots, a_j\}$.

To ensure the covariance in (1) is well defined, $K_Z(\cdot,\cdot)$ is required to be a Mercer kernel (Mercer, 1909; Bach and Jordan, 2002). Specifically, $K_Z(\cdot,\cdot)$ is a function from $\mathbb{R}^{\sum_{j=1}^J l_j} \times \mathbb{R}^{\sum_{j=1}^J l_j}$ to $\mathbb{R}$, and for any $n$ inputs $Z_1, \cdots, Z_n$, the matrix $\big(K_Z(Z_i, Z_j)\big)_{n \times n}$ must be positive semidefinite. Common examples of Mercer kernels include the Gaussian, exponential, and Matérn kernels. Furthermore, when the kernel exhibits certain separability properties, the linear kernel is very flexible and capable of representing a wide range of kernels.

Theorem 1. Suppose the structure between different qualitative variables is either multiplicative,
$$K_Z(Z_V, Z_{V'}) = \prod_{j=1}^J K_{Z_j}\big(z^{(j)}_{v_j}, z^{(j)}_{v'_j}\big), \quad (7)$$
or additive,
$$K_Z(Z_V, Z_{V'}) = \sum_{j=1}^J \psi_j K_{Z_j}\big(z^{(j)}_{v_j}, z^{(j)}_{v'_j}\big), \quad (8)$$
with $\sum_{j=1}^J \psi_j = 1$.
There always exists a latent parameterization $\{W_V\}$ with $w^{(j)}_{v_j} \in \mathbb{R}^{a_j}$ under the kernel $K_W(\cdot,\cdot)$ with $K_{W_j}(\cdot,\cdot)$ being the linear kernel, that is equivalent to the latent parameterization $\{Z_V\}$ with $z^{(j)}_{v_j} \in \mathbb{R}^{l_j}$ under the kernel $K_Z(\cdot,\cdot)$ with $K_{Z_j}(\cdot,\cdot)$ being any Mercer kernel.

Remark 1. When the kernel is non-separable, all qualitative variables are integrated into a single variable with $\prod_{j=1}^J a_j$ levels, where each level corresponds to a unique combination of the original qualitative variables. In this way, non-separable kernels can be represented by some latent parameterization under linear kernels by applying Theorem 1.

Theorem 1 and Remark 1 hold because we use a linear kernel for the latent representation $\{W_V\}$. The proof of Theorem 1 is provided in the Appendix. The reverse of Theorem 1 does not hold. For instance, some latent parameterizations under the linear kernel cannot be represented by the Gaussian kernel, because the linear kernel can accommodate negative correlations, whereas the Gaussian kernel is restricted to modeling positive correlations only. In other words, when $l_j = a_j$, the linear kernel exhibits greater flexibility compared to other kernels. Remark 1 addresses the case where kernels among qualitative variables are non-separable. A similar approach was proposed in Oune and Bostanabad (2021) using the Gaussian kernel. However, the increased number of levels results in more parameters, which makes parameter estimation more difficult. Although such a reparameterization always exists, if the true underlying kernel structure follows or approximately follows a Gaussian kernel with $l_j < a_j$, it may be possible to recover the structure using fewer parameters, which can reduce redundancy and variability while improving computational efficiency.
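Theorem 1 can be checked constructively for a single factor: form the Gram matrix of any Mercer kernel over the $a_j$ levels and factor it, which yields vectors $w_v \in \mathbb{R}^{a_j}$ whose inner products reproduce the kernel exactly. A minimal sketch (assuming a one-dimensional Gaussian-kernel parameterization with illustrative latent values):

```python
import numpy as np

# One factor with a_j = 4 levels and 1-D latent values under a Gaussian kernel.
z = np.array([0.0, 0.4, 1.0, 1.5])
G = np.exp(-np.subtract.outer(z, z) ** 2)   # Gram matrix K_{Z_j}(z_v, z_v')

# The Cholesky factor's rows give w_v in R^{a_j} with w_v^T w_{v'} = G[v, v'],
# i.e. an equivalent parameterization under the *linear* kernel (Theorem 1).
W = np.linalg.cholesky(G)

np.allclose(W @ W.T, G)                     # the linear kernel reproduces G
```

The same construction applies to any Mercer kernel, since the Gram matrix is positive semidefinite by definition; this is exactly the Cholesky-based link between (2) and (4).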
With a fixed kernel function $K_Z(\cdot,\cdot)$, different latent vectors can produce identical covariance structures. Hence, it is important to examine the uniqueness of latent vectors and to identify the essential components that define the covariance. This identifiability issue depends on the specific kernel function employed.

To begin with, understanding the parameterization problem from a geometric perspective provides valuable insights. For the linear kernel, equivalence under orthogonal transformations corresponds to isometries in the inner product space, where angles and distances are preserved: rotations about the origin and reflections across any plane passing through the origin maintain the relationships between vectors. For Gaussian kernels (later extended to isotropic kernels), equivalence corresponds to isometries in the distance space, where distances remain invariant: in addition to rotations and reflections, translations also preserve the relative distances between vectors.

Below, we provide an identifiability condition for the linear and isotropic kernels when $I = 0$ and $J = 1$, where $I$ is the total number of quantitative factors and $J$ is the total number of qualitative factors. An isotropic kernel is a kernel function $K(x, x')$ that depends only on the Euclidean distance between the two inputs, i.e., $K(x, x') = K(d)$, where $d = \|x - x'\|_2$ and the kernel generating function $K(\cdot)$ is nonnegative and monotonically decreasing. Here, we take $I = 0$ (i.e., there is no quantitative factor) because the identifiability issue arises only from the latent representations associated with qualitative factors. For the case $J > 1$, under the multiplicative structure in (7) or the additive structure in (8) considered in Theorem 1, identifiability can be established by examining each qualitative factor individually.
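The geometric picture can be verified numerically: an orthogonal transformation leaves the linear-kernel Gram matrix unchanged, while a translation leaves the Gaussian-kernel Gram matrix unchanged but not the linear one. A small sketch with random latent vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 2))               # five latent vectors in R^2

# A rotation (orthogonal transform) and a translation.
t = np.pi / 5
Q = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
shift = np.array([2.0, -1.0])

lin  = Z @ Z.T                                            # linear-kernel Gram
gaus = np.exp(-((Z[:, None] - Z[None]) ** 2).sum(-1))     # Gaussian-kernel Gram

ZQ = Z @ Q.T                                  # rotation preserves inner products
np.allclose(ZQ @ ZQ.T, lin)                   # True

Zt = Z + shift                                # translation preserves distances
gaus_t = np.exp(-((Zt[:, None] - Zt[None]) ** 2).sum(-1))
np.allclose(gaus_t, gaus)                     # True
```

This is why the identifiability conditions below pin down a canonical representative within each orbit of these transformations.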
Proposition 1 (Identifiability for linear kernel). Consider the latent parameterization $\{Z_V\}$ such that $\big(z^{(1)}_1, \cdots, z^{(1)}_{l_1}\big)$ is full rank and $\|z^{(1)}_v\|_2 = 1$ for all $1 \le v \le a_1$.
(a) There always exists a unique parameterization $\{W_V\}$ satisfying $w^{(1)}_{v,l} = 0$ for all $l > v$ and $w^{(1)}_{v,v} > 0$ for $1 \le v \le l_1$ that is equivalent to $\{Z_V\}$ under the linear kernel.
(b) $\{W_V\}$ can be mapped to hyperspherical coordinates through
$$w^{(1)}_{v,l} = \cos\theta^{(1)}_{v,l} \prod_{\iota=1}^{l-1} \sin\theta^{(1)}_{v,\iota} \ \text{ for } 1 \le l \le l_1 - 1, \qquad w^{(1)}_{v,l_1} = \prod_{\iota=1}^{l_1-1} \sin\theta^{(1)}_{v,\iota},$$
with the constraint $0 \le \theta^{(1)}_{v,l} \le \pi$ for $1 \le l < \min(v, l_1)$, $0 \le \theta^{(1)}_{v,l_1-1} \le 2\pi$ for $v \ge l_1$, and $\theta^{(1)}_{v,l} = 0$ for $v \le l$, where $v \in \{1, \cdots, a_1\}$.

Proposition 2 (Identifiability for isotropic kernel). Consider the latent parameterization $\{Z_V\}$ such that $\big(z^{(1)}_2 - z^{(1)}_1, \cdots, z^{(1)}_{l_1+1} - z^{(1)}_1\big)$ is full rank. There always exists a unique parameterization $\{W_V\}$ satisfying $w^{(1)}_{v,l} = 0$ for all $l \ge v$ and $w^{(1)}_{v,v-1} > 0$ for $2 \le v \le l_1 + 1$ that is equivalent to $\{Z_V\}$ under the isotropic kernel.

The proofs of Propositions 1 and 2 are provided in the Appendix. For the linear kernel, the hyperspherical coordinate transformation simplifies the optimization subject to the norm constraint on the latent vectors (see Lemma S1). Compared to Zhou et al. (2011), we allow $l_j \le a_j$ whereas they restrict $l_j = a_j$. Consequently, additional attention is paid to determining the range of the angles. Specifically, their range of angles is $[0, \pi]$, whereas we require the additional condition $0 \le \theta^{(1)}_{v,l_1-1} \le 2\pi$ for $v \ge l_1$. Note that Zhang et al. (2020) and Yerramilli et al. (2023) claimed that the Gaussian kernel is invariant under translation and rotation.
However, as stated in Proposition 2, ensuring uniqueness also requires accounting for translation invariance. For instance, when $l_j = 2$ and $a_j > 3$, we require $w^{(1)}_{3,2} > 0$. Finally, our approach extends the scope to encompass any isotropic kernel.

3 Ordinal variables

For ordinal variables, adjacent levels are generally expected to exhibit closer relationships. Luo et al. (2024) treats ordinal variables as nominal when the number of levels is small, and as continuous variables constrained to integer values when it is large. This empirical rule is straightforward to apply, but it may either overlook the underlying ordinal structure or be applicable only when the ordinal variable takes integer values. Qian et al. (2008) proposed two approaches for modeling ordinal correlations: one applies constraints to the correlation matrix, while the other transforms the ordinal scale into a continuous variable and defines correlations based on the transformed values. Although the two approaches provide valuable conceptual schemes, they do not offer specific algorithms for practical implementation. Roustant et al. (2020) developed upon the second approach of Qian et al. (2008) by applying a cosine kernel to the distances between transformed values. However, their method handles nominal and ordinal variables separately and regards them as two distinct types.

Our framework is directly applicable to ordinal variables by imposing order constraints on the latent variables. Specifically, for the isotropic kernel, we consider a one-dimensional latent vector $z^{(j)}$ (or scalar latent variable; we use the same notation for simplicity) and require that $0 = z^{(j)}_{1,1} \le z^{(j)}_{2,1} \le \cdots \le z^{(j)}_{a_j,1}$. This constraint ensures that the relative distances between the latent variables preserve the ordinal information.
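A quick numerical check of this constraint (with illustrative latent values): once the one-dimensional latent values are nondecreasing, an isotropic kernel such as the Gaussian yields correlations that decay as levels move apart in the ordering.

```python
import numpy as np

# Ordinal factor with 4 levels: 1-D latent values fixed at 0 for level 1 and
# nondecreasing afterwards (hypothetical values for illustration).
z = np.array([0.0, 0.3, 0.9, 1.0])
assert z[0] == 0.0 and np.all(np.diff(z) >= 0)

tau = np.exp(-np.subtract.outer(z, z) ** 2)   # Gaussian-kernel correlations

# Adjacent levels are at least as correlated as more distant ones:
tau[0, 1] >= tau[0, 2] >= tau[0, 3]           # True
```

Any monotonically decreasing kernel generating function $K(d)$ produces the same ordering of correlations, which is the sense in which the constraint preserves ordinal information.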
As discussed in Section 2.3, under the linear kernel the correlation between two latent vectors is determined by the angle between them. This observation motivates our use of angles in hyperspherical coordinates to encode ordinal information. Specifically, we define $z^{(j)}_{v,1} = \cos\theta^{(j)}_{v,1}$ and $z^{(j)}_{v,2} = \sin\theta^{(j)}_{v,1}$ with the constraint $0 = \theta^{(j)}_{1,1} \le \theta^{(j)}_{2,1} \le \cdots \le \theta^{(j)}_{a_j,1} \le \pi$, where $\theta^{(j)}_{v,1}$ denotes the angle associated with the $v$th level of the $j$th qualitative variable. This transformation effectively captures one-dimensional ordinal structure within a two-dimensional latent space.

To simplify optimization under ordinal constraints, we reparameterize the ordinal structure using non-negative increments. Specifically, for the isotropic kernel, we define $z^{(j)}_{v,1} = \sum_{\iota=1}^{v} \Delta^{(z,j)}_\iota$, where $\Delta^{(z,j)}_1 = 0$ and $\Delta^{(z,j)}_\iota \ge 0$ for $2 \le \iota \le a_j$. In this case, the ordinal constraint is transformed into a box-constrained optimization problem (Carpenter et al., 2017), which can be efficiently solved by the L-BFGS-B algorithm (Byrd et al., 1995). For notational consistency, we represent $\theta^{(j)}_{v,1} = \sum_{\iota=1}^{v} \Delta^{(\theta,j)}_\iota$ for the linear kernel, where $\Delta^{(\theta,j)}_1 = 0$, $\Delta^{(\theta,j)}_\iota \ge 0$ for $2 \le \iota \le a_j$, and $\theta^{(j)}_{a_j,1} = \sum_{\iota=1}^{a_j} \Delta^{(\theta,j)}_\iota < \pi$. These parameters are optimized using an adaptive barrier algorithm (Lange, 1999, Chapter 16.3). A summary of the reparameterizations, along with the identifiability conditions and the number of parameters, is shown in Table 2. As shown in Table 2, the latent representations $\{Z_V\}$ are uniquely determined by these reparameterized parameters under the identifiability conditions.
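The increment reparameterization can be sketched as a box-constrained fit. The toy least-squares objective below is illustrative (not the paper's likelihood), but the bounds $\Delta_\iota \ge 0$ passed to SciPy's L-BFGS-B implement the ordinal constraint exactly as described.

```python
import numpy as np
from scipy.optimize import minimize

# Target Gaussian-kernel correlations for a 3-level ordinal factor
# (made-up values for illustration).
target = np.array([[1.0, 0.9, 0.6],
                   [0.9, 1.0, 0.8],
                   [0.6, 0.8, 1.0]])

def loss(delta):
    # z_1 = 0 and z_v = cumulative sum of nonnegative increments Delta_2, ...
    z = np.concatenate(([0.0], np.cumsum(delta)))
    tau = np.exp(-np.subtract.outer(z, z) ** 2)
    return np.sum((tau - target) ** 2)

# Box constraints Delta_v >= 0 turn the ordinal restriction into simple bounds.
res = minimize(loss, x0=np.array([0.1, 0.1]),
               method="L-BFGS-B", bounds=[(0.0, None)] * 2)
res.x   # fitted increments (Delta_2, Delta_3), both nonnegative
```

In the paper's setting the objective would be the negative profile log-likelihood of Section 4 rather than this toy criterion, but the constraint handling is the same.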
For convenience, we collectively denote these reparameterized parameters as $\{\Omega_V\}$, with specific forms based on the kernel and variable type: for a linear kernel, $\Omega_V = \{\theta^{(j)}_{v,\iota}\}$ (nominal) or $\Omega_V = \{\Delta^{(\theta,j)}_\iota\}$ (ordinal); for an isotropic kernel, $\Omega_V = Z_V$ (nominal) or $\Omega_V = \{\Delta^{(z,j)}_\iota\}$ (ordinal). Imposing the identifiability conditions is achieved by restricting $\{\Omega_V\}$ to a specific region $\mathcal{M}_\Omega$.

Table 2: Summary of reparameterizations $\{\Omega_V\}$ and ranges $\mathcal{M}_\Omega$ for different qualitative variable types under different kernel structures.

Kernel | Variable type | $K_j\big(z^{(j)}_v, z^{(j)}_{v'}\big)$ | Reparameterization of $z^{(j)}_{v,l}$ | Identifiability condition $\mathcal{M}_\Omega$ | Num. of para. $P_j(l_j; a_j)$
Linear | Nominal | $\big(z^{(j)}_v\big)^\top z^{(j)}_{v'}$ | $\cos\theta^{(j)}_{v,l}\prod_{\iota=1}^{l-1}\sin\theta^{(j)}_{v,\iota}$ for $1 \le l \le l_j - 1$; $\prod_{\iota=1}^{l_j-1}\sin\theta^{(j)}_{v,\iota}$ for $l = l_j$ | $0 \le \theta^{(j)}_{v,l} \le \pi$ for $1 \le l < \min(v, l_j)$; $0 \le \theta^{(j)}_{v,l_j-1} \le 2\pi$ for $v \ge l_j$; $\theta^{(j)}_{v,l} = 0$ for $v \le l$ | $(a_j - 1)(l_j - 1)$
Linear | Ordinal ($l_j = 2$) | $\big(z^{(j)}_v\big)^\top z^{(j)}_{v'}$ | $\cos\big(\sum_{\iota=1}^{v}\Delta^{(\theta,j)}_\iota\big)$ for $l = 1$; $\sin\big(\sum_{\iota=1}^{v}\Delta^{(\theta,j)}_\iota\big)$ for $l = 2$ | $\Delta^{(\theta,j)}_1 = 0$; $\Delta^{(\theta,j)}_v \ge 0$; $0 \le \sum_{\iota=1}^{a_j}\Delta^{(\theta,j)}_\iota \le \pi$ | $a_j - 1$
Isotropic | Nominal | $K_j\big(\|z^{(j)}_v - z^{(j)}_{v'}\|_2\big)$ | $z^{(j)}_{v,l}$ | $z^{(j)}_{v,v-1} > 0$; $z^{(j)}_{v,l} = 0$ for $l \ge v$ | $(2a_j - l_j - 1)l_j/2$
Isotropic | Ordinal ($l_j = 1$) | $K_j\big(\|z^{(j)}_v - z^{(j)}_{v'}\|_2\big)$ | $\sum_{\iota=1}^{v}\Delta^{(z,j)}_\iota$ for $l = 1$ | $\Delta^{(z,j)}_1 = 0$; $\Delta^{(z,j)}_v \ge 0$ | $a_j - 1$

It is helpful to illustrate how different kernel choices for qualitative variables impose fundamentally different low-dimensional structures. Consider a simple scenario with a single ordinal qualitative variable having $a_1 = 3$ levels and latent dimension $l_1 = 1$. Let $\tau^{(1)}_{v,v'} = K_Z\big(z^{(1)}_v, z^{(1)}_{v'}\big)$ denote the kernel-induced correlation between levels $v$ and $v'$ of the qualitative variable. We reparameterize the latent embeddings via non-negative increments as $\theta^{(1)}_{1,1} = \Delta^{(\theta,1)}_1 = 0$, $\theta^{(1)}_{2,1} = \Delta^{(\theta,1)}_2$, $\theta^{(1)}_{3,1} = \Delta^{(\theta,1)}_2 + \Delta^{(\theta,1)}_3$, and similarly $z^{(1)}_{1,1} = \Delta^{(z,1)}_1 = 0$, $z^{(1)}_{2,1} = \Delta^{(z,1)}_2$, $z^{(1)}_{3,1} = \Delta^{(z,1)}_2 + \Delta^{(z,1)}_3$. Under the linear kernel, the correlations satisfy
$$\tau^{(1)}_{1,3} = \tau^{(1)}_{1,2}\tau^{(1)}_{2,3} - \big\{1 - (\tau^{(1)}_{1,2})^2\big\}^{1/2}\big\{1 - (\tau^{(1)}_{2,3})^2\big\}^{1/2},$$
which arises from the cosine law on the unit circle in two-dimensional space. In contrast, for the Gaussian kernel, the correlation must satisfy
$$\tau^{(1)}_{1,3} = \tau^{(1)}_{1,2}\tau^{(1)}_{2,3}\exp\Big\{-2\big(-\log\tau^{(1)}_{1,2}\big)^{1/2}\big(-\log\tau^{(1)}_{2,3}\big)^{1/2}\Big\},$$
which follows from solving for $\Delta^{(z,1)}_2$ and $\Delta^{(z,1)}_3$ given $\tau^{(1)}_{1,2}$ and $\tau^{(1)}_{2,3}$, and then substituting them into the expression for $\tau^{(1)}_{1,3}$. These expressions reveal that even if the pairwise similarities between adjacent levels are identical, the implied similarity between non-adjacent levels can differ substantially depending on the kernel choice. As a result, while Roustant et al. (2020) recommended the linear kernel to capture potential negative correlation and ordinal information simultaneously, this choice may be suboptimal if the Gaussian or other kernels are more suitable for capturing the underlying structure. In Section 5, we will explore model selection and model averaging techniques to determine the appropriate kernel for analysis and to combine predictions for enhanced performance.

4 Estimation and prediction

As shown in (1), the kernels for the quantitative and qualitative factors are user-specified.
Following standard practice, we employ the Gaussian kernel for the quantitative factors. Denote the unknown parameters within the kernel function as Φ. We have

K_U(U, U') = K_U(U, U' | Φ) = exp{ −Σ_{i=1}^I ϕ_i (u_i − u'_i)² },

where Φ = (ϕ_1, ϕ_2, ..., ϕ_I). Other popular kernels, such as the exponential kernel and the Matérn kernel, can also be used.

In the following, we focus on the category of data-driven latent variables described in Section 2.2 because the estimated latent variables provide valuable insights into the similarities between different levels. Some of the suggested kernels for qualitative variables are listed in Table 1. We examine combinations of the multiplicative or additive structure, as defined in (7) and (8), with different choices of K_{Z_j}(·, ·), including Gaussian, exponential, and linear kernels. We consider one and two dimensions for the Gaussian and exponential kernels, and two and three dimensions for the linear kernel. When ordinal information is present, we also incorporate the corresponding method for comparison.

Estimation. Suppose there are n response values Y = (Y_1, ..., Y_n)^⊤ corresponding to input values D = {(U_1, V_1), (U_2, V_2), ..., (U_n, V_n)}. The log-likelihood function, up to an additive constant, is

l(µ, σ², Φ, {Ω_V}) = −(1/2){ n log(σ²) + log|R| + (Y − µ1)^⊤ R^{−1} (Y − µ1)/σ² },   (9)

where 1 is a vector of length n with all elements one, |·| is the determinant, and {Ω_V} are the reparameterized parameters. The latent parameterization {Z_V} is calculated using the information in column 4 of Table 2. Then, R is the correlation matrix whose (i, j)th element is calculated through K_U(U_i, U_j | Φ) K_Z(Z_{V_i}, Z_{V_j}).
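As an illustration, the correlation matrix R and the log-likelihood in (9) can be assembled as in the following sketch. The function names are ours, and the latent vectors Z_V are assumed to have already been formed from {Ω_V} via Table 2 (one latent row per observation):

```python
import numpy as np

def gaussian_kernel_u(U, phi):
    # K_U(U, U' | Phi) = exp(-sum_i phi_i (u_i - u'_i)^2), evaluated for all pairs
    diff = U[:, None, :] - U[None, :, :]               # shape (n, n, I)
    return np.exp(-np.sum(phi * diff**2, axis=-1))     # shape (n, n)

def gaussian_kernel_z(Z):
    # Gaussian kernel on the latent vectors Z_V (one row of Z per observation)
    d2 = np.sum((Z[:, None, :] - Z[None, :, :])**2, axis=-1)
    return np.exp(-d2)

def log_likelihood(Y, U, Z, phi, mu, sigma2):
    # Log-likelihood in (9), up to an additive constant, under the
    # multiplicative structure K_U(U_i, U_j | Phi) * K_Z(Z_{V_i}, Z_{V_j})
    R = gaussian_kernel_u(U, phi) * gaussian_kernel_z(Z)
    n = len(Y)
    resid = Y - mu
    _, logdet = np.linalg.slogdet(R)
    quad = resid @ np.linalg.solve(R, resid) / sigma2
    return -0.5 * (n * np.log(sigma2) + logdet + quad)
```

In practice this function would be wrapped in a numerical optimizer over (Φ, {Ω_V}) subject to the constraints M_Ω.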
In our experiments, the computation of R^{−1} is sometimes numerically unstable due to ill-conditioning. Following Peng and Wu (2014), we use a nugget term in kriging to improve its conditioning, which adds a small positive constant to the diagonal of R and ensures that the smallest eigenvalue is above a prescribed threshold ϵ. Specifically, we consider a sequence of candidate thresholds, e.g., ϵ ∈ {10^{−1}, 10^{−2}, ..., 10^{−8}}, select the value that minimizes the negative log-likelihood, and then replace R by R + δI_n, where δ = max{0, ϵ − λ_min(R)}, with λ_min(R) denoting the smallest eigenvalue of the original correlation matrix and I_n ∈ R^{n×n} being the identity matrix.

Given Φ and {Z_V}, µ̂ and σ̂² are estimated by

µ̂ = (1^⊤ R^{−1} 1)^{−1} 1^⊤ R^{−1} Y and σ̂² = (Y − µ̂1)^⊤ R^{−1} (Y − µ̂1)/n.

Plugging the estimated mean and variance into the log-likelihood function, the remaining parameters are estimated through

(Φ̂, {Ω̂_V}) = argmin_{Φ, {Ω_V ∈ M_Ω}} { n log(σ̂²) + log|R| },

where M_Ω denotes the region describing the identifiability conditions, as provided in Table 2.

Prediction. We can perform prediction and interpolation in the same manner as in ordinary kriging with quantitative-only variables. Denote the estimated parameters as µ̂, σ̂², Φ̂, and {Ω̂_V}. Then, the estimated latent representations {Ẑ_V} are obtained using the information in column 4 of Table 2. The estimated correlation matrix, R̂, has its (i, j)th entry K_U(U_i, U_j | Φ̂) K_Z(Ẑ_{V_i}, Ẑ_{V_j}).
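A minimal sketch of the nugget adjustment and the closed-form estimates of µ and σ² given in the Estimation paragraph, assuming the correlation matrix R has already been assembled (the function names are ours):

```python
import numpy as np

def add_nugget(R, eps):
    # Replace R by R + delta * I with delta = max(0, eps - lambda_min(R)),
    # so that the smallest eigenvalue is at least the threshold eps.
    lam_min = np.linalg.eigvalsh(R)[0]
    delta = max(0.0, eps - lam_min)
    return R + delta * np.eye(R.shape[0])

def profile_estimates(Y, R):
    # Closed-form estimates of mu and sigma^2 given the correlation matrix R:
    #   mu_hat     = (1' R^{-1} 1)^{-1} 1' R^{-1} Y
    #   sigma2_hat = (Y - mu_hat 1)' R^{-1} (Y - mu_hat 1) / n
    n = len(Y)
    one = np.ones(n)
    mu = (one @ np.linalg.solve(R, Y)) / (one @ np.linalg.solve(R, one))
    resid = Y - mu
    sigma2 = resid @ np.linalg.solve(R, resid) / n
    return mu, sigma2
```

The threshold search then reduces to a one-dimensional loop over the candidate values of ϵ, re-evaluating the negative log-likelihood for each adjusted matrix.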
For a new input (U⋆, V⋆), we predict its response and corresponding variance as follows:

Ŷ(U⋆, V⋆) = µ̂ + r̂^⊤ R̂^{−1} (Y − µ̂1) and s²(U⋆, V⋆) = σ̂² { 1 − r̂^⊤ R̂^{−1} r̂ + (r̂^⊤ R̂^{−1} 1 − 1)² / (1^⊤ R̂^{−1} 1) },

where r̂ = (r_1, ..., r_n) with r_i = K_U(U_i, U⋆ | Φ̂) K_Z(Ẑ_{V_i}, Ẑ_{V⋆}).

5 Model selection and model averaging

The general framework can induce different models by selecting different kernels and varying the dimension of the latent spaces (see Table 1). Suppose there are K candidate models, denoted by {M_1, ..., M_K}. A natural question then arises: how should we determine which kernel to use? Motivated by the model selection problem in GPs with quantitative inputs, we propose two types of criteria to address this question. The first type utilizes leave-one-out cross-validation (LOOCV), which has been commonly used for kernel selection in Gaussian processes with quantitative inputs (Dubrule, 1983; Rasmussen and Williams, 2006) and for simulator selection (Hung et al., 2023). The second type employs the Bayesian information criterion, BIC (Schwarz, 1978), which has been shown to provide satisfactory selection performance in a different context (Chen et al., 2024).

5.1 LOOCV-based model selection

The LOOCV procedure proceeds as follows. The leave-one-out prediction involves fitting the model while leaving one observation out, then calculating the error based on the model's performance when making predictions using the fitted model. We calculate the LOOCV score for input i of each model M_k in the prediction step, rather than in the estimation step.
Specifically, we calculate the LOOCV score of M_k as follows:

Ŝ_{M_k} = (1/n) Σ_{i=1}^n Ŝ_{M_k,(i)}, with Ŝ_{M_k,(i)} = L(Y_i; µ̂_{M_k,(i)}, σ̂²_{M_k,(i)}),

where µ̂_{M_k,(i)} and σ̂²_{M_k,(i)} are the estimated mean and variance using D \ {(U_i, V_i)}. The function L(y; µ, σ²) denotes a user-defined marginal error measure for an observation y under the assumption that y follows a normal distribution with mean µ and variance σ². According to Rasmussen and Williams (2006), the leave-one-out estimated mean and variance have closed-form solutions, given by

µ̂_{M_k,(i)} = Y_i − (R̂^{−1} Y)_i / (R̂^{−1})_{ii} and σ̂²_{M_k,(i)} = 1 / (R̂^{−1})_{ii},

where (·)_i is the ith element of a vector and (·)_{ij} is the (i, j)th element of a matrix, respectively. We then choose the model minimizing the LOOCV score. In practice, we have two methods using different error measures L(y; µ, σ²):

1. The LOOCV_log-lik uses the negative log-likelihood of the normal distribution, defined as L(y; µ, σ²) = {log(2πσ²)}/2 + (y − µ)²/(2σ²).

2. The LOOCV_l2 measures the l_2 loss between the predicted value µ̂_{M_k,(i)} and the observed value Y_i, defined as L(y; µ, σ²) = (y − µ)².

5.2 BIC-based model selection

The BIC_MSel method selects the model that minimizes the BIC criterion as the final model. When the multiplicative structure is assigned between the different qualitative variables, the BIC is defined as

BIC_{M_k} = −2 l(µ̂, σ̂², Φ̂, {Ω̂_V}) + { 2 + I + Σ_{j=1}^J P_j(l_j; a_j) } ln(n),   (10)

where l(µ̂, σ̂², Φ̂, {Ω̂_V}) is the maximized log-likelihood and 2 + I + Σ_{j=1}^J P_j(l_j; a_j) is the number of parameters, including two parameters for the mean and variance, I scale parameters for the quantitative variables, and Σ_{j=1}^J P_j(l_j; a_j) parameters, defined in Table 2, for the qualitative variables.
The first term in (10) evaluates the goodness-of-fit of the model, while the second term imposes a penalty for model complexity. When an additive structure is employed, the number of parameters increases by J − 1, as the relative weights {ψ_j}, shown in (8), are also considered unknown parameters and are subject to the constraint Σ_{j=1}^J ψ_j = 1.

5.3 BIC-based model averaging

Treating {M_k} as prior models with equal probabilities, we can leverage the results given by the different models and hence make predictions more robustly. Following Claeskens and Hjort (2008), we can obtain the posterior probability of each model M_k as follows:

p(M_k | D) ∝ p(D | M_k) p(M_k) ≈ exp(−BIC_{M_k}/2) p(M_k).

Therefore, the final model of BIC_MAvr is a weighted average of the different models, where the weights are determined by their BIC values. To derive the prediction and associated uncertainty from the final model, let Ŷ_{M_k} denote the value predicted by model M_k, and let s²_{M_k} be the corresponding predictive variance for a new input (U⋆, V⋆). The final predicted value and variance are then given by

Ŷ(U⋆, V⋆) = Σ_{k=1}^K w_{M_k} Ŷ_{M_k} and s²(U⋆, V⋆) = Σ_{k=1}^K w_{M_k} [ s²_{M_k} + {Ŷ(U⋆, V⋆) − Ŷ_{M_k}}² ],

where w_{M_k} ∝ exp(−BIC_{M_k}/2) and Σ_{k=1}^K w_{M_k} = 1.

6 Numerical comparisons

In this section, we evaluate the performance of the models in Section 4 using our framework. In the simulation examples, we include three competitive methods for comparison, namely EzGP (Xiao et al., 2021), EEzGP (Xiao et al., 2021), and ctGP (Lin et al., 2024), all of which account for interactions between QQ factors, as discussed in Section 2.2. In the subsequent computer experiment examples, we focus on further investigating the practical performance and interpretability of our proposed methods.
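Before turning to the numerical results, the BIC-based weighting and averaging rules of Sections 5.2 and 5.3 can be sketched as follows. The function names are ours, and the per-model predictions Ŷ_{M_k} and variances s²_{M_k} are assumed to be given:

```python
import numpy as np

def bic_weights(bic_values):
    # Model weights w_k proportional to exp(-BIC_k / 2), normalized to sum to 1.
    # Subtracting the minimum BIC first avoids numerical underflow.
    b = np.asarray(bic_values, dtype=float)
    w = np.exp(-(b - b.min()) / 2.0)
    return w / w.sum()

def averaged_prediction(bic_values, preds, variances):
    # Combine per-model predictions and predictive variances at one test input:
    # each model's variance is inflated by its squared deviation from the
    # averaged prediction, following the model-averaging rule in Section 5.3.
    w = bic_weights(bic_values)
    preds = np.asarray(preds, dtype=float)
    y_bar = np.sum(w * preds)
    s2 = np.sum(w * (np.asarray(variances, dtype=float) + (y_bar - preds) ** 2))
    return y_bar, s2
```

BIC_MSel corresponds to keeping only the model with the largest weight, while BIC_MAvr uses the full weighted combination.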
The method names represent Gaussian (Gau), exponential (Exp), and linear (Linear) kernels assigned to the latent variables of the qualitative variables. The subscripts denote the dimension of the latent variables or indicate the incorporation of ordinal information (ord) if it exists. We distinguish between the multiplicative (multi) and additive (add) relationships between the qualitative variables using superscripts. For example, Gau^multi_1 (respectively, Gau^multi_ord) represents the multiplicative Gaussian kernel with one-dimensional (respectively, ordinal) latent variables. To mitigate the risk of local optima, the optimization is initialized from 15 random starting points, and the solution yielding the smallest negative log-likelihood in (9) is retained. Building on these base models, our model selection and model averaging strategies, as introduced in Section 5, lead to four methods for evaluation: LOOCV_log-lik, LOOCV_l2, BIC_MSel, and BIC_MAvr.

Given the hold-out test points D_te = {(U^te_i, V^te_i)}_{i=1}^{n_te}, the accuracy is evaluated by the relative root-mean-squared error (RRMSE), which is defined as

RRMSE = [ Σ_{i=1}^{n_te} {Ŷ(U^te_i, V^te_i) − Y(U^te_i, V^te_i)}² / Σ_{i=1}^{n_te} {Y(U^te_i, V^te_i) − Ȳ}² ]^{1/2},

where Y(U^te_i, V^te_i) and Ŷ(U^te_i, V^te_i) denote the true and predicted values at the input (U^te_i, V^te_i), and Ȳ = n_te^{−1} Σ_{i=1}^{n_te} Y(U^te_i, V^te_i) is the mean of the true responses over the test inputs.

6.1 Simulation examples

Following Zhang et al. (2020), we apply our proposed methods to four real-world engineering models: (i) the beam bending model, (ii) the borehole model, (iii) the output transformerless (OTL) circuit model, and (iv) the piston model. Detailed descriptions of these examples can be found in Section SI.1.
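The RRMSE criterion defined above can be computed as, for example (the function name is ours):

```python
import numpy as np

def rrmse(y_true, y_pred):
    # Relative root-mean-squared error over a hold-out test set:
    # prediction error normalized by the spread of the true responses,
    # so a value of 1 matches predicting the test-set mean everywhere.
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    num = np.sum((y_pred - y_true) ** 2)
    den = np.sum((y_true - y_true.mean()) ** 2)
    return np.sqrt(num / den)
```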
In all examples, the qualitative variables are generated from quantitative variables, making them ordinal in nature. For a fair comparison, we employ the same dataset consisting of 30 replicates as used in Zhang et al. (2020). For each model, the training points were generated using a maximin Latin hypercube design (LHD) (Santner et al., 2003), and 10,000 uniformly distributed test points were used for evaluation. Moreover, the results are insensitive to the choice of experimental design, as illustrated by the additional simulation in Supplementary Section SII.1, where random sampling and the MaxPro design (Joseph et al., 2015, 2020) yield similar performance to the maximin LHD design.

The RRMSEs across the 30 replicates are reported in Figure 1. We summarize the key findings below. Overall, the three competing methods, especially ctGP, tend to yield higher RRMSE values than the methods under our framework. Our methods equipped with different kernels demonstrate varying strengths across the examples. To better showcase the results, we

Figure 1: Comparison of RRMSE across different methods and kernel configurations for the beam bending, borehole, OTL circuit, and piston examples. Each boxplot summarizes the results from 30 independent runs with different training sets generated via maximin LHD.
All methods were evaluated on the same set of 10,000 uniformly distributed test points.

classify the methods into two categories based on whether they impose additive or multiplicative structures across the different qualitative variables. The two classes exhibit distinct performance patterns: (i) the results between the two classes differ significantly in the borehole example; (ii) in the OTL example, the differences between the Gaussian, exponential, and linear kernels are less pronounced when an additive structure is imposed, compared to a multiplicative structure. Within each class, further comparisons can be made based on the choice of kernel, which determines how the relationships between the different levels of each qualitative variable are modeled. Methods using the exponential kernel exhibit poor performance, as reflected by the highest RRMSE values. For the other two kernels, methods with the linear kernel perform better than those with the Gaussian kernel in the beam bending example, while the former is slightly worse than the latter in the piston example. For the borehole and OTL examples, they are comparable. These differences can be attributed to the different correlation structures between the qualitative variables in the four examples.

The dimension of the latent vector plays a critical role in determining the performance of methods using the same kernel. For example, in the beam bending example, methods with the linear kernel and a latent vector dimension of l_j = 2 (Linear^multi_2 and Linear^add_2) outperform those with l_j = 3 (Linear^multi_3 and Linear^add_3). Conversely, in the piston example, the method using the additive Gaussian kernel with l_j = 2 (Gau^add_2) achieves better performance compared to its counterpart with l_j = 1 (Gau^add_1).
These results demonstrate the importance of selecting an appropriate latent vector dimension. When a low-dimensional latent vector is sufficient to capture the structure of the data, increasing the dimension can introduce additional uncertainty, reduce generalizability, and degrade performance. Conversely, an overly low-dimensional vector may fail to capture the essential relationships and structural complexity. Striking the right balance is essential to ensure the model is both accurate and robust and to avoid the pitfalls of over-parameterization or under-specification.

In this simulation, incorporating the ordinal nature of the qualitative variables, when present, generally leads to improved performance. Traditional GP models with QQ inputs often treat ordinal variables as nominal ones. While this approach is convenient, it fails to fully exploit the inherent ordinal structure within the data. In contrast, our methods, which explicitly account for the ordinal structure, consistently achieve superior performance. Specifically, Gau^multi_ord, Gau^add_ord, Exp^multi_ord, and Exp^add_ord in most cases outperform Gau^multi_1, Gau^add_1, Exp^multi_1, and Exp^add_1 across all four examples. Moreover, Linear^multi_ord/Linear^add_ord outperforms Linear^multi_2/Linear^add_2 in the OTL example and performs comparably in the other examples. These findings highlight the limitations of treating ordinal variables as nominal and underscore the importance of leveraging the ordinal structure to enhance modeling accuracy and efficiency.

Now we investigate the reason behind the improved performance when incorporating ordinal information.
Figure 2 visualizes the latent vectors estimated by Gau^multi_1, Gau^multi_ord, Exp^multi_1, and Exp^multi_ord in the OTL example, where Gau^multi_ord and Exp^multi_ord demonstrate significantly better performance than the other two. First, let us focus on the factor R_f. The latent vectors estimated by Gau^multi_1 generally follow the order of the levels, with only one exception. This indicates that when using the Gaussian kernel, the ordinal structure is sufficiently strong and can be effectively learned for R_f in this example. This observation reinforces the rationale for using ordinal information. When the exponential kernel is applied, treating the ordinal variable as nominal (i.e., Exp^multi_1) introduces more noise, as the estimated latent vectors exhibit different ordering patterns across replications. By incorporating ordinal information, Exp^multi_ord significantly enhances its predictive power. For the additive kernel, the phenomenon is similar and can be found in Supplementary Figure S1.

Next, let us examine β, which has six levels and poses a greater challenge. Both Gau^multi_1 and Exp^multi_1 exhibit noisy latent vector estimations. In many replications, most latent vectors corresponding to the six levels estimated by Exp^multi_ord are identical, which indicates that the different levels of β are difficult to distinguish using this dataset. In such cases, estimating too many parameters for the latent parameterization may lead to overfitting. By comparing the methods that treat the variable as nominal (Exp^multi_1 and Gau^multi_1) with those that impose an ordinal structure (Exp^multi_ord and Gau^multi_ord), it becomes clear that incorporating ordinal constraints has the effect of regularizing parameter estimation and hence enhancing generalizability in this case.
Figure 2: The boxplots and scatter points depict the latent vectors z_1 for the resistance R_f and the current gain β, estimated by Gau^multi_1, Gau^multi_ord, Exp^multi_1, and Exp^multi_ord, respectively, across 30 replications in the OTL example. Panels: (a) R_f, multiplicative Gaussian kernel; (b) R_f, multiplicative exponential kernel; (c) β, multiplicative Gaussian kernel; (d) β, multiplicative exponential kernel. Points from the same replication are connected by lines. In the R_f panel for Exp^multi_1, different line types indicate different ordering patterns of the estimated latent vectors: solid lines correspond to replications where the original level order is preserved, dashed lines indicate a reversal between the second and third levels, and dotted lines indicate a swap between the third and fourth levels.

Since assigning different kernels yields varying performance, it is important to select an appropriate one. We compare the three model selection strategies (BIC_MSel, LOOCV_l2, and LOOCV_log-lik) and the one model averaging strategy (BIC_MAvr) proposed in Section 5 through their normalized RRMSE ranks, where a lower rank indicates better (i.e., lower) RRMSE performance. As shown in Table 3, these strategies perform satisfactorily in general because they always select methods ranked in the top half.
For the model selection strategies, LOOCV_log-lik achieves relatively stable ranks, which fall within the top 20% to 30%, and shows competitive performance in the OTL and piston examples. In contrast, BIC_MSel and BIC_MAvr achieve the best performance in the bending and borehole examples, while their performance is less favorable in the piston example. Here, BIC_MAvr achieves more stable performance across various scenarios and demonstrates better performance than BIC_MSel in both the OTL and piston examples.

Table 3: Normalized rank of RRMSE for BIC_MAvr, BIC_MSel, LOOCV_log-lik, and LOOCV_l2 across simulation types. The ranks for BIC_MSel, LOOCV_log-lik, and LOOCV_l2 are computed among the 18 base models based on ascending RRMSE, whereas the rank for BIC_MAvr is determined by including its RRMSE alongside the 18 base models. A lower rank implies a lower RRMSE value. The results are summarized as median, mean, and standard deviation (SD) across replications.

Method         | bending              | borehole             | OTL                  | piston
               | Median  Mean   SD    | Median  Mean   SD    | Median  Mean   SD    | Median  Mean   SD
BIC_MAvr       | 0.105   0.125  0.040 | 0.105   0.130  0.047 | 0.263   0.254  0.136 | 0.342   0.321  0.211
BIC_MSel       | 0.111   0.119  0.111 | 0.111   0.113  0.068 | 0.278   0.285  0.183 | 0.389   0.398  0.236
LOOCV_log-lik  | 0.222   0.248  0.106 | 0.250   0.237  0.076 | 0.278   0.280  0.178 | 0.222   0.302  0.211
LOOCV_l2       | 0.111   0.157  0.117 | 0.278   0.246  0.071 | 0.278   0.296  0.179 | 0.333   0.370  0.209

To sum up, this simulation study highlights the significance of selecting appropriate kernels, optimizing latent vector dimensions, leveraging ordinal structures, and employing effective model selection or averaging strategies.

To further examine the prediction accuracy and computational cost for various dimensions, we conduct additional experiments on the borehole example with different discretization degrees.
Our methods achieve the lowest RRMSE at moderate dimensions. As an example of our approach, Gau^multi_ord delivers more accurate predictions than ctGP, EzGP, and EEzGP across all degrees within a reasonable computational time. Interestingly, a trade-off between accuracy and time can be seen. For instance, while ctGP becomes more accurate with finer discretization, its computational cost increases more rapidly than that of the other methods. Another observation is that Linear^multi_ord attains higher predictive accuracy than both Linear^multi_2 and Linear^multi_1 at the expense of a higher computational cost. The detailed setups and results are provided in Supplementary Section SII.2.

6.2 A 3D coupled finite element model for embankments

In this section, we apply the various methods to a fully 3D coupled finite element model, which has been rigorously validated for its effectiveness in capturing the deformations and stresses of full-scale embankments (Liu and Rowe, 2015). The corresponding computer experiments involve one quantitative factor and three qualitative factors. The quantitative factor u_1 ∈ [0, 14] (in m) represents the distance from the embankment centerline to the embankment shoulder, taking 29 uniformly spaced values. The three qualitative factors are the embankment construction rate v_1 ∈ {1, 5, 10} (in m/month), the Young's modulus of the columns v_2 ∈ {50, 100, 200} (in MPa), and the reinforcement stiffness v_3 ∈ {1578, 4800, 8000} (in kN/m). This example is particularly suitable for studying scenarios with a limited number of levels for the qualitative variables, as each qualitative factor here has only three levels. As described in Deng et al.
(2017) and Kang and Deng (2020), for each value of the quantitative factor, a three-level fractional factorial design with nine runs is employed for the qualitative factors, resulting in a total of 261 design points. The test dataset consists of 29 input settings, where u_1 takes 29 equally spaced values over the interval [0, 14], and the qualitative factors are fixed at (v_1, v_2, v_3) = (5, 100, 4800). Among the design points, two points (u_1, v_1, v_2, v_3) ∈ {(14, 5, 200, 8000), (14, 10, 200, 1578)} are identified as having potentially reversed labels and are excluded from the training process (see Supplementary Section SI.3 for details). Additionally, training on the entire dataset fails. A plausible explanation is that some points that are too close to each other can cause a near singularity when computing R^{−1} in (9).

Figure 3: Comparison of RRMSE across different methods and kernel configurations for the 3D coupled finite element model for embankments, evaluated on the remaining points of the training dataset. Each boxplot summarizes results from 100 runs. In each run, the model is trained using a subset of 27, 45, or 63 design points randomly sampled from the training dataset.
To address this, subsets of 3, 5, and 7 points are randomly selected at each combination of the qualitative factors, resulting in training sets with 27, 45, and 63 design points, respectively. To evaluate the performance of each method, the RRMSE is calculated using (i) the remaining points in the training dataset and (ii) the independent test dataset.

As shown in Figures 3 and 4, increasing the latent dimension generally improves prediction accuracy. Since the number of levels is relatively low (three), increasing the dimension from one to two for the Gaussian and exponential kernels, or from two to three for the Gaussian kernel, only requires estimating one additional parameter, which enhances model flexibility with a slight increase in complexity.

Figure 4: Comparison of RRMSE across different methods and kernel configurations for the 3D coupled finite element model for embankments, evaluated on the independent test dataset. Each boxplot summarizes results from 100 runs. In each run, the model is trained using a subset of 27, 45, or 63 design points randomly sampled from the training dataset.

6.3 A material design example

We then investigate a material design example focusing on the elastic and mechanical properties of materials (Balachandran et al., 2016).
The dataset consists of 223 compounds from the M2AX family, with their elastic properties computed using density functional theory and the planewave/core potential formalism (Cover et al., 2009). The responses include the bulk modulus, shear modulus, and Young's modulus. This example involves three nominal variables, each with multiple levels: the M atom has ten levels {Sc, Ti, V, Cr, Zr, Nb, Mo, Hf, Ta, W}, the A atom has two levels {C, N}, and the X atom has twelve levels {Al, Si, P, S, Ga, Ge, As, Cd, In, Sn, Tl, Pb}. In addition, the M, A, and X atoms are associated with three, two, and two quantitative features, respectively, that describe their physical properties. Using the shear modulus as a response, previous studies have demonstrated the importance of incorporating qualitative variables into the prediction (Zhang et al., 2020).

Figure 5: Comparison of RRMSE across different methods and kernel configurations for the material design example. Each boxplot reflects results from 10 resampling runs, where 200 points are used for training and 23 points for testing in each run.

Here, we extend this investigation by comparing the performance of the various kernel configurations across all three responses. Following the previous approach, we randomly select 200 data points from the total of 223 as the training set, with the remaining 23 data points reserved for testing. The evaluation is repeated 10 times.
Table 4: Summary of RRMSE for the 12 base models, the three model selection methods, and the one model averaging method. Results are presented as mean, median, and standard deviation (SD) across replications. The smallest RRMSE mean and median values for each response are marked with an asterisk.

Method         | Bulk: Mean Median SD    | Shear: Mean Median SD   | Young: Mean Median SD
Gau^multi_1    | 0.314  0.283   0.115    | 0.465  0.424   0.143    | 0.475  0.416   0.163
Gau^multi_2    | 0.419  0.435   0.104    | 0.468  0.455   0.106    | 0.452  0.458   0.071
Exp^multi_1    | 0.350  0.335   0.075    | 0.499  0.498   0.113    | 0.482  0.454   0.101
Exp^multi_2    | 0.441  0.384   0.132    | 0.442  0.472   0.088    | 0.396  0.387   0.095
Linear^multi_2 | 0.357  0.345   0.104    | 0.531  0.506   0.120    | 0.508  0.499   0.126
Linear^multi_3 | 0.331  0.320   0.083    | 0.457  0.445   0.091    | 0.446  0.449   0.075
Gau^add_1      | 0.320  0.312   0.054    | 0.481  0.430   0.151    | 0.478  0.377   0.225
Gau^add_2      | 0.327  0.316   0.123    | 0.504  0.447   0.160    | 0.478  0.436   0.157
Exp^add_1      | 0.284  0.260*  0.073    | 0.486  0.415   0.185    | 0.501  0.409   0.182
Exp^add_2      | 0.315  0.295   0.081    | 0.510  0.445   0.149    | 0.514  0.475   0.143
Linear^add_2   | 0.263* 0.284   0.058    | 0.459  0.387   0.170    | 0.443  0.358*  0.179
Linear^add_3   | 0.291  0.284   0.045    | 0.418* 0.373*  0.145    | 0.377* 0.363   0.109
BIC_MAvr       | 0.314  0.283   0.115    | 0.465  0.424   0.142    | 0.475  0.416   0.163
BIC_MSel       | 0.314  0.283   0.115    | 0.465  0.424   0.143    | 0.475  0.416   0.163
LOOCV_log-lik  | 0.372  0.352   0.083    | 0.437  0.421   0.102    | 0.435  0.441   0.086
LOOCV_l2       | 0.339  0.306   0.101    | 0.454  0.410   0.169    | 0.387  0.363   0.114

The RRMSEs across the ten replicates for the three responses are shown in Figure 5, with a

Figure 6: Comparison of the relative weights ψ_j defined in (8) for methods with additive
k ernels in the material design ex am p l e . summary provided in T able 4 . In t h i s example, either Linea r add 2 or Linear add 3 generally achiev es the lo w est mean or median RRMSE, with Linear add 3 exhibiting lo w er v ariation. Assigning either an additiv e or multiplicativ e k ernel results in comparable p erformance; ho w ev er, the additive k ernel generally p erforms slig htly b etter. T o understand this difference, w e visualize the relative weigh ts { ψ j } of the additive kernel in Figure 6 . Denote the X-atom as the j 0 th qualitative v ariable. W e observe that its relative w eigh ts ψ j 0 are close to zero, which indi cat es that t h e X-atom con tributes minimally to the v ariation in the resp onse. In o t h er w o r d s, there is little distinction b et ween di ffer ent lev els of the X-a t om . The same effect ca n also b e ac hiev ed by using the multiplicativ e k ernel, pro vided that τ j 0 v ,v ′ ≈ 1 in ( 3 ) for all v , v ′ ∈ { 1 , . . . , a j 0 } , i.e., the correlations b et ween an y t w o levels of th e X-atom are appro xima t el y one. When the levels are represented via la t ent v ariables, this condition says that the latent v ariables asso ciat ed wi t h differen t lev els ar e nearly id e ntical. Ho w ever, to determine suc h a structure accurately w ould req u i r e a large amoun t of data . Consequen tly , when a quali ta t i ve v ariable has little or no effect on the resp onse, the use of the a d d i t i ve kernel may captur e this pa t te rn more effect i vely . 32 7 Summary rema r k s and further discussion In this pap er, w e introd u ce a general framew ork for mo deling QQ factors using GP . The main idea comes from an insightful approach b y Zhang et al. ( 2020 ), which maps eac h qualitative factor into a contin uous latent space. 
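To make this latent-space idea concrete, here is a minimal numerical sketch of our own (not code from the paper's MixGP package; the latent vectors and length-scale are hypothetical): the levels of one qualitative factor are mapped to points in a 2-d latent space, and a standard Gaussian kernel on those points induces the cross-level correlation matrix.

```python
import numpy as np

def gaussian_kernel(z1, z2, theta=1.0):
    """Squared-exponential (Gaussian) kernel on continuous latent vectors."""
    return np.exp(-theta * np.sum((z1 - z2) ** 2))

# Hypothetical 2-d latent vectors for a qualitative factor with 3 levels.
Z = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.5, 1.0]])

# Cross-level correlation matrix induced by the latent mapping.
K = np.array([[gaussian_kernel(Z[v], Z[w]) for w in range(len(Z))]
              for v in range(len(Z))])
# K is symmetric with unit diagonal, and positive semidefinite because it
# is built from a valid kernel evaluated on points in the latent space.
```

Because the latent points are estimated rather than fixed, the distances between them, and hence the correlations between levels, are learned from data; this is the property the framework exploits.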
We realize that this approach includes many existing models as special cases by employing multiplicative or additive structures and by adopting kernel functions—such as Gaussian, exponential, and linear kernels—for the latent vectors of each qualitative variable. We systematically evaluate the performance of these models. Methods with the linear kernel achieve superior performance in certain cases. Overall, models using multiplicative kernels tend to outperform those using additive kernels, except when some qualitative variables are inactive in predicting the response. Moreover, leveraging ordinal information can improve performance. Finally, both model averaging and model selection strategies effectively identify appropriate models and yield satisfactory predictive accuracy.

Using this framework, we establish two important connections that enhance both conceptual understanding and practical implementation. First, our framework bridges the gap between quantitative input-only modeling and QQ input modeling. This connection enables the extension of techniques from quantitative input-only Gaussian processes to situations involving both quantitative and qualitative inputs. A common approach is to transform qualitative levels into dummy variables and then apply kernels designed for quantitative variables. While this method is straightforward to implement, it often complicates the interpretation of learned scale parameters. Beyond the approach taken by Zhang et al. (2020), we take a further step by studying its compatibility with other kernels for quantitative inputs. This can facilitate future work to develop more complex kernels for QQ inputs. However, careful attention must be paid to improving interpretability and tackling computational challenges arising from the discrete nature of qualitative inputs.
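For contrast, the dummy-variable route just described can be sketched as follows (a hypothetical illustration of ours, not the paper's method): with one-hot encodings, every pair of distinct levels sits at the same squared distance, so a Gaussian kernel on the encodings assigns a single shared correlation to all level pairs, which is one source of the interpretability limitations noted above.

```python
import numpy as np

def onehot(level, n_levels):
    """Dummy (one-hot) encoding of a qualitative level."""
    e = np.zeros(n_levels)
    e[level] = 1.0
    return e

theta = 0.5  # hypothetical scale parameter
n_levels = 4

# Gaussian kernel applied to the one-hot encodings of all level pairs.
K = np.array([[np.exp(-theta * np.sum((onehot(v, n_levels) - onehot(w, n_levels)) ** 2))
               for w in range(n_levels)]
              for v in range(n_levels)])
# Any two distinct one-hot vectors are at squared distance 2, so every
# off-diagonal entry equals exp(-2 * theta): the encoding cannot express
# that some pairs of levels are more alike than others.
```

Latent-variable parameterizations avoid this rigidity by letting the pairwise distances, and hence the pairwise correlations, differ across level pairs.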
Second, we demonstrate that this framework unifies many existing approaches by choosing different kernels, as shown in Section 2.2. For these models, we propose model selection or model averaging procedures, the effectiveness of which is confirmed by the simulation and numerical results in Section 6. Besides unifying existing approaches, we find that assigning a low-dimensional linear kernel is equivalent to imposing a low-rank structure on the correlation matrix across different levels. The effectiveness of low-rank structures has been well demonstrated in many fields, such as economics (Fan et al., 2008), epidemiology (Zhong et al., 2024), and engineering (Chang et al., 2021).

Finally, our framework can easily handle ordinal variables by imposing constraints on the latent vectors to incorporate the ordinal information. To facilitate computation, we transform these constraints into unconstrained forms for the Gaussian and exponential kernels and box constraints for the linear kernel. The advantages of incorporating ordinal information are demonstrated by our simulation results. However, one potential limitation is that we require the latent vector to have length one for the Gaussian and exponential kernels, and length two for the linear kernel, which may affect its generality. For instance, Qian et al. (2008) proposed restricting the correlation to be non-increasing along the ordinal levels, a theoretically rigorous but computationally demanding approach. Further investigation into optimal strategies for incorporating ordinal information is left for future research.

A Proofs of Theorem 1 and Propositions 1 and 2

Proof of Theorem 1.
Under the linear kernel, the latent parameterization $\{W_V\}$ can be obtained through the Cholesky decomposition of the kernel matrix induced by the latent parameterization $\{Z_V\}$ under the Mercer kernel. Specifically, let $K^{(j)}_{\{Z_V\}}$ denote the kernel matrix of the $j$th qualitative variable, whose $(v, v')$th element is $K_{Z_j}(z^{(j)}_v, z^{(j)}_{v'})$ for $v, v' \in \{1, \cdots, a_j\}$. The Cholesky decomposition of $K^{(j)}_{\{Z_V\}}$ is expressed as $K^{(j)}_{\{Z_V\}} = R^{(j)\top} R^{(j)}$, where $R^{(j)} = (r^{(j)}_1, r^{(j)}_2, \cdots, r^{(j)}_{a_j})$ is an upper triangular matrix of dimension $a_j \times a_j$. Defining $w^{(j)}_i = r^{(j)}_i$ for $i \in \{1, 2, \cdots, a_j\}$, we then obtain a latent parameterization such that $K_{Z_j}(z^{(j)}_v, z^{(j)}_{v'}) = K_{W_j}(w^{(j)}_v, w^{(j)}_{v'})$ for all $v, v' \in \{1, \cdots, a_j\}$. Since the structure between different qualitative variables is either multiplicative as in (7) or additive as in (8), the equality $K_Z(Z_V, Z_{V'}) = K_W(W_V, W_{V'})$ holds for all $V, V' \in \times_{j=1}^{J} \{1, 2, \cdots, a_j\}$.

Proof of Proposition 1.

Proof of (a). Existence. We perform the QR decomposition
$$ \left( z^{(1)}_1, z^{(1)}_2, \cdots, z^{(1)}_{l_1} \right) = QR, $$
where $Q \in \mathbb{R}^{l_1 \times l_1}$ is an orthogonal matrix, and $R \in \mathbb{R}^{l_1 \times l_1}$ is an upper-triangular matrix with positive diagonal entries. Define
$$ \left( w^{(1)}_1, w^{(1)}_2, \cdots, w^{(1)}_{a_1} \right) := \left( R, \; Q^\top z^{(1)}_{l_1+1}, \cdots, Q^\top z^{(1)}_{a_1} \right). $$
Then $\{Z_V\}$ and $\{W_V\}$ are equivalent under the linear kernel because they satisfy the desired condition in Lemma S1.

Uniqueness. We have established the existence of such a latent parameterization. According to Lemma S1, any equivalent parameterization satisfies
$$ \left( z^{(1)}_1, z^{(1)}_2, \cdots, z^{(1)}_{a_1} \right) = Q \left( w^{(1)}_1, \cdots, w^{(1)}_{a_1} \right) $$
for some orthogonal matrix $Q \in \mathbb{R}^{l_1 \times l_1}$.
The requirements on $\{W_V\}$ in (a) ensure the uniqueness of the QR decomposition, and consequently of $Q$ and $w^{(1)}_2, \cdots, w^{(1)}_{l_1}$. The uniqueness of the remaining vectors is derived by solving the remaining $a_1 - l_1$ columns of (11).

Proof of (b). Cartesian coordinates can be uniquely expressed in hyperspherical coordinates as
$$ w^{(1)}_{v,l} = r_v \cos\theta^{(1)}_{v,l} \prod_{\iota=1}^{l-1} \sin\theta^{(1)}_{v,\iota} \quad \text{for } 1 \le l \le l_1 - 1 \qquad \text{and} \qquad w^{(1)}_{v,l_1} = r_v \prod_{\iota=1}^{l_1-1} \sin\theta^{(1)}_{v,\iota}, $$
where $v \in \{1, \cdots, a_1\}$, $0 \le \theta^{(1)}_{v,l} \le \pi$ for $1 \le l < l_1 - 1$, and $0 \le \theta^{(1)}_{v,l} \le 2\pi$ for $l = l_1 - 1$. In the above formula, $r_v = \|w_v\|_2 = \|z_v\|_2 = 1$ for $1 \le v \le a_1$. To ensure $w^{(1)}_{v,l} = 0$ for all $l > v$, we require $\theta^{(1)}_{v,l} = 0$ for $l = v$. Additionally, we impose $\theta^{(1)}_{v,l} = 0$ for $l > v$ to preserve the equality. Finally, to ensure $w^{(1)}_{v,v} > 0$ for $1 \le v \le l_1$, we require $0 \le \theta^{(1)}_{v,v-1} \le \pi$ (rather than less than $2\pi$) for $1 \le v \le l_1$. These constraints are exactly the ones described in (b).

Proof of Proposition 2.

Existence. First, perform the QR decomposition
$$ \left( z^{(1)}_2 - z^{(1)}_1, \cdots, z^{(1)}_{l_1+1} - z^{(1)}_1 \right) = QR, $$
where $Q \in \mathbb{R}^{l_1 \times l_1}$ is an orthogonal matrix, and $R \in \mathbb{R}^{l_1 \times l_1}$ is an upper triangular matrix with positive diagonal elements. We then define
$$ \left( w^{(1)}_1, w^{(1)}_2, \cdots, w^{(1)}_{a_1} \right) := \left( 0, \; R, \; Q^\top (z^{(1)}_{l_1+2} - z^{(1)}_1), \cdots, Q^\top (z^{(1)}_{a_1} - z^{(1)}_1) \right). $$
By setting $Q$ and $w = -Q^\top z^{(1)}_1$, $\{W_V\}$ satisfies the desired condition stated in Lemma S2. Therefore, $\{Z_V\}$ and $\{W_V\}$ are equivalent under the isotropic kernel.

Uniqueness. We have already proved that such a latent parameterization exists. According to Lemma S2, there exist an orthogonal matrix $Q$ and a vector $w$ such that
$$ \left( z^{(1)}_1, z^{(1)}_2, \cdots, z^{(1)}_{a_1} \right) = Q \left( w^{(1)}_1 - w, \; w^{(1)}_2 - w, \cdots, w^{(1)}_{a_1} - w \right). \tag{11} $$
Since $w^{(1)}_1 = 0$, comparing the first columns of the two matrices in (11) gives $w = -Q^\top z^{(1)}_1$. Then, we have
$$ \left( z^{(1)}_2, \cdots, z^{(1)}_{l_1+1} \right) = Q \left( w^{(1)}_2 + Q^\top z^{(1)}_1, \cdots, w^{(1)}_{l_1+1} + Q^\top z^{(1)}_1 \right). $$
Rearranging the above formula yields
$$ \left( z^{(1)}_2 - z^{(1)}_1, \cdots, z^{(1)}_{l_1+1} - z^{(1)}_1 \right) = Q \left( w^{(1)}_2, \cdots, w^{(1)}_{l_1+1} \right), $$
which is precisely the QR decomposition. The uniqueness of $Q$, $w^{(1)}_2, \cdots, w^{(1)}_{l_1+1}$, and $w$ follows directly from the uniqueness of the QR decomposition, given the requirements on the subdiagonal elements. The uniqueness of the remaining vectors can be derived by solving the remaining $a_1 - l_1$ columns of (11).

Disclosure statement

The authors have no conflicts of interest to declare.

Acknowledgments

The authors thank the three reviewers, the Associate Editor, and the Editor for their valuable comments. Deng's work was completed while she was a postdoctoral researcher at The Chinese University of Hong Kong, Shenzhen.

SUPPLEMENTARY MATERIAL

Supplementary File: This file includes additional details and results for the numerical comparisons in Section 6, additional lemmas, and their proofs.

Code: This file contains an R package, MixGP, which implements our methods and includes the code to reproduce all simulations, figures, and tables.

References

Bach, F. and Jordan, M. (2002), "Learning graphical models with Mercer kernels," Advances in Neural Information Processing Systems, 15.

Bachoc, F., Gamboa, F., Loubes, J.-M., and Venet, N. (2017), "A Gaussian process regression model for distribution inputs," IEEE Transactions on Information Theory, 64, 6620–6637.

Balachandran, P. V., Xue, D., Theiler, J., Hogden, J., and Lookman, T. (2016), "Adaptive strategies for materials design using uncertainties," Scientific Reports, 6, 19660.

Byrd, R.
H., Lu, P., Nocedal, J., and Zhu, C. (1995), "A limited memory algorithm for bound constrained optimization," SIAM Journal on Scientific Computing, 16, 1190–1208.

Carpenter, B., Gelman, A., Hoffman, M. D., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., and Riddell, A. (2017), "Stan: A probabilistic programming language," Journal of Statistical Software, 76, 1–32.

Cerda, P., Varoquaux, G., and Kégl, B. (2018), "Similarity encoding for learning with dirty categorical variables," Machine Learning, 107, 1477–1494.

Chang, Y.-H., Wang, X., Zhang, L., Li, Y., Mak, S., Wu, C.-F. J., and Yang, V. (2021), "Reduced-order modeling for complex flow emulation by common kernel-smoothed proper orthogonal decomposition," AIAA Journal, 59, 3291–3303.

Chen, Z., Mak, S., and Wu, C. F. J. (2024), "A hierarchical expected improvement method for Bayesian optimization," Journal of the American Statistical Association, 119, 1619–1632.

Claeskens, G. and Hjort, N. L. (2008), "Frequentist and Bayesian model averaging," in Model Selection and Model Averaging, Cambridge University Press, Cambridge Series in Statistical and Probabilistic Mathematics, pp. 192–226.

Cover, M. F., Warschkow, O., Bilek, M. M. M., and McKenzie, D. R. (2009), "A comprehensive survey of M2AX phase elastic properties," Journal of Physics: Condensed Matter, 21, 305403.

Deng, X., Lin, C. D., Liu, K. W., and Rowe, R. K. (2017), "Additive Gaussian process for computer models with qualitative and quantitative factors," Technometrics, 59, 283–292.

Dubrule, O. (1983), "Cross validation of kriging in a unique neighborhood," Journal of the International Association for Mathematical Geology, 15, 687–699.

Duvenaud, D. K., Nickisch, H., and Rasmussen, C. (2011), "Additive Gaussian processes," Advances in Neural Information Processing Systems, 24.

Fan, J., Fan, Y., and Lv, J. (2008), "High dimensional covariance matrix estimation using a factor model," Journal of Econometrics, 147, 186–197.

Garrido-Merchán, E. C. and Hernández-Lobato, D. (2020), "Dealing with categorical and integer-valued variables in Bayesian optimization with Gaussian processes," Neurocomputing, 380, 20–35.

Gramacy, R. B. and Lee, H. K. H. (2008), "Bayesian treed Gaussian process models with an application to computer modeling," Journal of the American Statistical Association, 103, 1119–1130.

Hung, Y., Lin, L.-H., and Wu, C. F. J. (2023), "Optimal simulator selection," Journal of the American Statistical Association, 118, 1264–1271.

Joseph, V. R., Gul, E., and Ba, S. (2015), "Maximum projection designs for computer experiments," Biometrika, 102, 371–380.

— (2020), "Designing computer experiments with multiple types of factors: The MaxPro approach," Journal of Quality Technology, 52, 343–354.

Kang, X. and Deng, X. (2020), "Design and analysis of computer experiments with quantitative and qualitative inputs: A selective review," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10, e1358.

Lange, K. (1999), Numerical Analysis for Statisticians, Springer.

Li, Z. and Tan, M. H. Y. (2022), "A Gaussian process emulator based approach for Bayesian calibration of a functional input," Technometrics, 64, 299–311.

Lin, W.-A., Sung, C.-L., and Chen, R.-B. (2024), "Category tree Gaussian process for computer experiments with many-category qualitative factors and application to cooling system design," Journal of Quality Technology, 56, 391–408.

Liu, K.-W. and Rowe, R. K. (2015), "Numerical study of the effects of geosynthetic reinforcement viscosity on behaviour of embankments supported by deep-mixing-method columns," Geotextiles and Geomembranes, 43, 567–578.

Luo, H., Cho, Y., Demmel, J. W., Li, X. S., and Liu, Y. (2024), "Hybrid parameter search and dynamic model selection for mixed-variable Bayesian optimization," Journal of Computational and Graphical Statistics, 33, 855–868.

McMillan, N. J., Sacks, J., Welch, W. J., and Gao, F. (1999), "Analysis of protein activity data by Gaussian stochastic process models," Journal of Biopharmaceutical Statistics, 9, 145–160.

Mercer, J. (1909), "Functions of positive and negative type, and their connection with the theory of integral equations," Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 209, 415–446.

Oune, N. and Bostanabad, R. (2021), "Latent map Gaussian processes for mixed variable metamodeling," Computer Methods in Applied Mechanics and Engineering, 387, 114128.

Peng, C.-Y. and Wu, C. J. (2014), "On the choice of nugget in kriging modeling for deterministic computer experiments," Journal of Computational and Graphical Statistics, 23, 151–168.

Pinheiro, J. C. and Bates, D. M. (1996), "Unconstrained parametrizations for variance-covariance matrices," Statistics and Computing, 6, 289–296.

Plate, T. A. (1999), "Accuracy versus interpretability in flexible modeling: Implementing a tradeoff using Gaussian process models," Behaviormetrika, 26, 29–50.

Qian, P. Z. G., Wu, H., and Wu, C. J. (2008), "Gaussian process models for computer experiments with qualitative and quantitative factors," Technometrics, 50, 383–396.

Rasmussen, C. and Williams, C. (2006), "Model selection and adaptation of hyperparameters," in Gaussian Processes for Machine Learning, MIT Press, pp. 105–128.

Rojo-Álvarez, J., Martínez-Ramón, M., Muñoz-Marí, J., and Camps-Valls, G. (2018), "Kernel functions and reproducing kernel Hilbert spaces," in Digital Signal Processing with Kernel Methods, John Wiley & Sons, pp. 165–207.

Roustant, O., Padonou, E., Deville, Y., Clément, A., Perrin, G., Giorla, J., and Wynn, H. (2020), "Group kernels for Gaussian process metamodels with categorical inputs," SIAM/ASA Journal on Uncertainty Quantification, 8, 775–806.

Santner, T. J., Williams, B. J., and Notz, W. I. (2003), The Design and Analysis of Computer Experiments, vol. 1, Springer.

Saves, P., Diouane, Y., Bartoli, N., Lefebvre, T., and Morlier, J. (2023), "A mixed-categorical correlation kernel for Gaussian process," Neurocomputing, 550, 126472.

Schmidt, R. R., Cruz, E. E., and Iyengar, M. (2005), "Challenges of data center thermal management," IBM Journal of Research and Development, 49, 709–723.

Schwarz, G. (1978), "Estimating the dimension of a model," The Annals of Statistics, 6, 461–464.

Tao, S., Apley, D. W., Plumlee, M., and Chen, W. (2021), "Latent variable Gaussian process models: A rank-based analysis and an alternative approach," International Journal for Numerical Methods in Engineering, 122, 4007–4026.

Xiao, Q., Mandal, A., Lin, C. D., and Deng, X. (2021), "EzGP: Easy-to-interpret Gaussian process models for computer experiments with both quantitative and qualitative factors," SIAM/ASA Journal on Uncertainty Quantification, 9, 333–353.

Yerramilli, S., Iyer, A., Chen, W., and Apley, D. W. (2023), "Fully Bayesian inference for latent variable Gaussian process models," SIAM/ASA Journal on Uncertainty Quantification, 11, 1357–1381.
Zhang, Y., Tao, S., Chen, W., and Apley, D. W. (2020), "A latent variable approach to Gaussian process modeling with qualitative and quantitative factors," Technometrics, 62, 291–302.

Zhong, Y., He, K., and Li, G. (2024), "Reduced-rank clustered coefficient regression for addressing multicollinearity in heterogeneous coefficient estimation," Biometrics, 80, ujae076.

Zhou, Q., Qian, P. Z. G., and Zhou, S. (2011), "A simple approach to emulation for computer models with qualitative and quantitative factors," Technometrics, 53, 266–273.

Supplementary Material for "A General Framework For Modeling Gaussian Process with Qualitative and Quantitative Factors"

Linsui Deng and C. F. Jeff Wu
School of Data Science, The Chinese University of Hong Kong, Shenzhen, China
February 19, 2026

Our supplement is organized as follows. Section SI provides additional details on the numerical studies, including the simulation settings for benchmark examples and supplementary results for the OTL circuit example, as well as a diagnostic analysis of erroneous inputs in the 3D finite element model. Section SII presents additional simulation comparisons, including comparisons of different experimental designs and an extensive study of the borehole example under varying discretization levels, with detailed results on prediction accuracy and computational efficiency. Section SIII contains the technical lemmas and their proofs that support the theoretical results in the main paper. The R package and reproducible code for this work are also available at https://github.com/denglinsui/MixGP and https://github.com/denglinsui/MixGP-manuscript-sourcecode, respectively.

SI Additional details and results of numerical comparisons

SI.1 Functions used in simulation examples

In the numerical experiments, we adopt the same setting as presented in Zhang et al. (2020).
We detail the configurations here for completeness and clarity. Four benchmark examples are considered in this study: beam bending, borehole, OTL circuit, and piston. These examples are commonly used for evaluating the performance of surrogate modeling techniques because they are derived from real-world engineering problems and carry significant practical relevance. The functional forms of these four examples are originally defined based on continuous input variables. To adapt them to our study, we discretize certain continuous variables and treat them as ordinal qualitative variables.

The beam bending example models the deflection of a beam with an elastic modulus of $E = 600$ GPa. The beam is fixed at one end, with a vertical force of $P = 600$ N applied at the free end. The deflection at the free end depends on the beam length $L \in [10, 20]$, the height of the beam's cross-section $h \in [1, 2]$, and the cross-sectional shape $v$, and is given by:
$$ y(L, h, v) = \frac{L^3}{3 \times 10^9 \, h^4 \, I(v)}, $$
where $I(v)$ represents the normalized moment of inertia depending on the cross-section type. In this example, six cross-section types are considered, including circular, square, I-shape, hollow square, hollow circular, and H-shape, and the corresponding $I(v)$ are approximately 0.0491, 0.0833, 0.0449, 0.0633, 0.0373, and 0.0167, respectively. The geometric pictures can be found in Zhang et al. (2020).

The borehole example considers the flow of water through a borehole that is drilled from the ground surface through two aquifers (Harper and Gupta, 1983; Morris et al., 1993). The flow rate (in m$^3$/year) through the borehole is given by:
$$ y(r_w, r, T_u, H_u, T_l, H_l, L, K_w) = \frac{2\pi T_u (H_u - H_l)}{\log(r/r_w) \left\{ 1 + \dfrac{2 L T_u}{\log(r/r_w)\, r_w^2 K_w} + \dfrac{T_u}{T_l} \right\}}, $$
where $r_w \in [0.05, 0.15]$ (in m) is the radius of the borehole, $r \in [100, 50000]$ (in m) is the radius of influence, $T_u \in [63070, 115600]$ (in m$^2$/year) is the transmissivity of the upper aquifer, $H_u \in [990, 1110]$ (in m) is the potentiometric head of the upper aquifer, $T_l \in [63.1, 116]$ (in m$^2$/year) is the transmissivity of the lower aquifer, $H_l \in [700, 820]$ (in m) is the potentiometric head of the lower aquifer, $L \in [1120, 1680]$ (in m) is the length of the borehole, and $K_w \in [9855, 12045]$ (in m/year) is the hydraulic conductivity of the borehole lining. Following Zhang et al. (2020), we adapt this function to evaluate surrogate models with mixed inputs by treating $r_w$ (radius of the borehole) and $H_l$ (potentiometric head of the lower aquifer) as qualitative factors. Specifically, $r_w$ is discretized into three levels $\{0.05, 0.10, 0.15\}$, and $H_l$ is discretized into four levels $\{700, 740, 780, 820\}$.

The output transformerless (OTL) circuit example considers the midpoint voltage of a push-pull output transformerless amplifier circuit. This circuit is commonly used in electrical engineering to drive loads without the need for a transformer (Ben-Ari and Steinberg, 2007). The midpoint voltage (in volts) is modeled as:
$$ y(R_{b1}, R_{b2}, R_f, R_{c1}, R_{c2}, \beta) = \frac{\beta (V_{b1} + 0.74)(R_{c2} + 9)}{\beta (R_{c2} + 9) + R_f} + \frac{11.35\, R_f}{\beta (R_{c2} + 9) + R_f} + \frac{0.74\, \beta R_f (R_{c2} + 9)}{R_{c1} \{\beta (R_{c2} + 9) + R_f\}}, $$
where $V_{b1} = 12 R_{b2} / (R_{b1} + R_{b2})$, $R_{b1} \in [50, 150]$ (in k$\Omega$) is the resistance $b1$, $R_{b2} \in [25, 70]$ (in k$\Omega$) is the resistance $b2$, $R_f \in [0.5, 3]$ (in k$\Omega$) is the resistance $f$, $R_{c1} \in [1.2, 2.5]$ (in k$\Omega$) is the resistance $c1$, $R_{c2} \in [0.25, 1.2]$ (in k$\Omega$) is the resistance $c2$, and $\beta \in [50, 300]$ (Amperes) is the current gain of the transistor. Following Zhang et al.
(2020), we adapt this example to evaluate surrogate models with mixed inputs by treating $R_f$ (resistance $f$) and $\beta$ (current gain) as qualitative factors. Specifically, $R_f$ is discretized into four levels $\{0.5, 1.2, 2.1, 2.9\}$, and $\beta$ is discretized into six levels $\{50, 100, 150, 200, 250, 300\}$.

The piston example models the cycle time of a piston moving within a cylinder. This example simulates the linear motion of a piston being transformed into rotational motion via a connected rod and disk. The cycle time (in seconds) is a measure of the piston's performance and is affected by multiple physical factors. The response function for the cycle time is given by
$$ y(M, S, V_0, k, P_0, T_a, T_0) = 2\pi \sqrt{\frac{M}{k + S^2 \dfrac{P_0 V_0 T_a}{V^2 T_0}}}, \tag{S1} $$
where
$$ V = \frac{S}{2k} \left( \sqrt{A^2 + 4k \frac{P_0 V_0}{T_0} T_a} - A \right) \quad \text{and} \quad A = P_0 S + 19.62 M - \frac{k V_0}{S}, \tag{S2} $$
$M \in [30, 60]$ (in kg) represents the mass of the piston, $S \in [0.005, 0.020]$ (in m$^2$) represents the surface area of the piston, $V_0 \in [0.002, 0.010]$ (in m$^3$) is the initial gas volume, $k \in [1000, 5000]$ (in N/m) is the spring coefficient, $P_0 \in [90000, 110000]$ (in N/m$^2$) is the atmospheric pressure, $T_a \in [290, 296]$ (in K) is the ambient temperature, and $T_0 \in [340, 360]$ (in K) is the filling gas temperature. Following Zhang et al. (2020), this example is adapted to evaluate surrogate models with mixed inputs by treating $P_0$ (atmospheric pressure) and $k$ (spring coefficient) as qualitative factors. Specifically, $P_0$ is discretized into three levels $\{90000, 100000, 110000\}$, and $k$ is discretized into five levels $\{1000, 2000, 3000, 4000, 5000\}$.

[Figure S1: The boxplots and scatter points depict the latent vectors $z_1$ for resistance $R_f$ and current gain $\beta$, estimated by Gau-add (1-d), Gau-add (ord), Exp-add (1-d), and Exp-add (ord), respectively, across 30 replications in the OTL example. Points from the same replication are connected by lines. Panels: (a) $R_f$, additive Gaussian kernel; (b) $R_f$, additive exponential kernel; (c) $\beta$, additive Gaussian kernel; (d) $\beta$, additive exponential kernel.]

SI.2 Additional results of the OTL examples

Figure S1 visualizes the latent vectors $z_1$ for resistance $R_f$ and current gain $\beta$, estimated by Gau-add (1-d), Gau-add (ord), Exp-add (1-d), and Exp-add (ord), respectively, across 30 replications in the OTL example. Similar to Figure 2 in the paper, incorporating ordinal information preserves the natural ordinal structure and benefits from regularization effects.

SI.3 Diagnosis of erroneous inputs in the 3D finite element model

In Section 6.2 of our main paper, we point out two data points, $(u_1, v_1, v_2, v_3) \in \{(14, 5, 200, 8000), (14, 10, 200, 1578)\}$, as potentially having reversed labels and exclude them from the training process. Based on the fact that there are only 13 valid combinations for $(v_1, v_2, v_3)$, among which are $(1, 100, 4800)$ and $(5, 200, 1578)$, we suspect the correct inputs should be $(u_1, v_1, v_2, v_3) \in \{(14, 1, 100, 4800), (14, 5, 200, 1578)\}$. To verify our suspicion, we use the 18 base models trained in Section 6.2 to make predictions for both the original inputs—$(14, 5, 200, 8000)$ (Location 1) and $(14, 10, 200, 1578)$ (Location 2)—and the revised inputs—$(14, 1, 100, 4800)$ (Location 1) and $(14, 5, 200, 1578)$ (Location 2). The predicted values are summarized in Table S1.
The observed value at Location 1 is −0.535, and at Location 2 it is −0.768. The predictions based on the revised inputs are much closer to the observed values than those based on the original inputs. This confirms that the two points are indeed outliers and should be excluded from the training process.

Table S1: Predicted values from 18 base models for both the original and revised inputs at two locations identified as potential outliers. Results are summarized as the mean, median, and standard deviation (SD) across replications.

Location 1 (Original Input)

| n | Stat | Gau-multi 1-d | Gau-multi 2-d | Gau-multi ord | Exp-multi 1-d | Exp-multi 2-d | Exp-multi ord | Linear-multi 1-d | Linear-multi 2-d | Linear-multi ord | Gau-add 1-d | Gau-add 2-d | Gau-add ord | Exp-add 1-d | Exp-add 2-d | Exp-add ord | Linear-add 1-d | Linear-add 2-d | Linear-add ord |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27 | Mean | -0.756 | -0.756 | -0.747 | -0.750 | -0.748 | -0.751 | -0.742 | -0.744 | -0.748 | -0.727 | -0.721 | -0.721 | -0.738 | -0.732 | -0.730 | -0.732 | -0.732 | -0.731 |
| 27 | Median | -0.754 | -0.754 | -0.749 | -0.754 | -0.755 | -0.754 | -0.746 | -0.749 | -0.750 | -0.729 | -0.725 | -0.725 | -0.742 | -0.737 | -0.738 | -0.731 | -0.735 | -0.734 |
| 27 | SD | 0.017 | 0.015 | 0.010 | 0.026 | 0.034 | 0.010 | 0.038 | 0.028 | 0.011 | 0.047 | 0.048 | 0.042 | 0.051 | 0.050 | 0.045 | 0.063 | 0.061 | 0.052 |
| 45 | Mean | -0.752 | -0.753 | -0.748 | -0.749 | -0.749 | -0.750 | -0.750 | -0.754 | -0.751 | -0.722 | -0.719 | -0.717 | -0.731 | -0.728 | -0.726 | -0.723 | -0.722 | -0.721 |
| 45 | Median | -0.751 | -0.753 | -0.749 | -0.752 | -0.752 | -0.755 | -0.752 | -0.754 | -0.752 | -0.718 | -0.717 | -0.716 | -0.730 | -0.726 | -0.726 | -0.718 | -0.718 | -0.717 |
| 45 | SD | 0.010 | 0.009 | 0.007 | 0.010 | 0.009 | 0.009 | 0.033 | 0.010 | 0.008 | 0.032 | 0.032 | 0.024 | 0.031 | 0.031 | 0.028 | 0.038 | 0.037 | 0.033 |
| 63 | Mean | -0.752 | -0.754 | -0.750 | -0.750 | -0.750 | -0.752 | -0.752 | -0.757 | -0.754 | -0.711 | -0.708 | -0.709 | -0.718 | -0.716 | -0.714 | -0.709 | -0.709 | -0.709 |
| 63 | Median | -0.750 | -0.755 | -0.750 | -0.753 | -0.754 | -0.756 | -0.753 | -0.756 | -0.755 | -0.709 | -0.708 | -0.708 | -0.715 | -0.713 | -0.713 | -0.709 | -0.709 | -0.709 |
| 63 | SD | 0.008 | 0.008 | 0.005 | 0.009 | 0.009 | 0.009 | 0.030 | 0.025 | 0.006 | 0.020 | 0.017 | 0.016 | 0.021 | 0.020 | 0.018 | 0.016 | 0.016 | 0.015 |

Location 1 (Revised Input)

| n | Stat | Gau-multi 1-d | Gau-multi 2-d | Gau-multi ord | Exp-multi 1-d | Exp-multi 2-d | Exp-multi ord | Linear-multi 1-d | Linear-multi 2-d | Linear-multi ord | Gau-add 1-d | Gau-add 2-d | Gau-add ord | Exp-add 1-d | Exp-add 2-d | Exp-add ord | Linear-add 1-d | Linear-add 2-d | Linear-add ord |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27 | Mean | -0.531 | -0.531 | -0.533 | -0.536 | -0.536 | -0.533 | -0.531 | -0.533 | -0.533 | -0.578 | -0.575 | -0.575 | -0.595 | -0.587 | -0.584 | -0.597 | -0.595 | -0.594 |
| 27 | Median | -0.533 | -0.533 | -0.534 | -0.535 | -0.536 | -0.534 | -0.534 | -0.534 | -0.534 | -0.559 | -0.558 | -0.559 | -0.575 | -0.567 | -0.567 | -0.564 | -0.566 | -0.560 |
| 27 | SD | 0.007 | 0.008 | 0.007 | 0.007 | 0.007 | 0.006 | 0.011 | 0.007 | 0.008 | 0.064 | 0.060 | 0.056 | 0.069 | 0.061 | 0.055 | 0.090 | 0.084 | 0.088 |
| 45 | Mean | -0.534 | -0.535 | -0.535 | -0.537 | -0.537 | -0.536 | -0.535 | -0.536 | -0.535 | -0.577 | -0.570 | -0.572 | -0.586 | -0.583 | -0.581 | -0.578 | -0.580 | -0.576 |
| 45 | Median | -0.535 | -0.535 | -0.535 | -0.536 | -0.536 | -0.535 | -0.535 | -0.535 | -0.535 | -0.559 | -0.554 | -0.555 | -0.570 | -0.563 | -0.563 | -0.559 | -0.559 | -0.560 |
| 45 | SD | 0.005 | 0.005 | 0.007 | 0.004 | 0.004 | 0.004 | 0.007 | 0.006 | 0.007 | 0.064 | 0.066 | 0.058 | 0.061 | 0.063 | 0.061 | 0.066 | 0.064 | 0.059 |
| 63 | Mean | -0.536 | -0.536 | -0.536 | -0.536 | -0.536 | -0.537 | -0.537 | -0.537 | -0.537 | -0.552 | -0.550 | -0.551 | -0.558 | -0.558 | -0.556 | -0.554 | -0.554 | -0.554 |
| 63 | Median | -0.536 | -0.536 | -0.536 | -0.536 | -0.536 | -0.536 | -0.536 | -0.536 | -0.537 | -0.550 | -0.549 | -0.550 | -0.557 | -0.554 | -0.555 | -0.552 | -0.552 | -0.551 |
| 63 | SD | 0.002 | 0.002 | 0.003 | 0.002 | 0.002 | 0.002 | 0.004 | 0.003 | 0.005 | 0.019 | 0.017 | 0.016 | 0.020 | 0.019 | 0.017 | 0.017 | 0.017 | 0.017 |

Location 2 (Original Input)

| n | Stat | Gau-multi 1-d | Gau-multi 2-d | Gau-multi ord | Exp-multi 1-d | Exp-multi 2-d | Exp-multi ord | Linear-multi 1-d | Linear-multi 2-d | Linear-multi ord | Gau-add 1-d | Gau-add 2-d | Gau-add ord | Exp-add 1-d | Exp-add 2-d | Exp-add ord | Linear-add 1-d | Linear-add 2-d | Linear-add ord |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27 | Mean | -0.841 | -0.841 | -0.829 | -0.841 | -0.841 | -0.843 | -0.811 | -0.822 | -0.820 | -0.882 | -0.881 | -0.881 | -0.880 | -0.881 | -0.879 | -0.877 | -0.877 | -0.878 |
| 27 | Median | -0.841 | -0.839 | -0.833 | -0.838 | -0.838 | -0.838 | -0.828 | -0.830 | -0.827 | -0.876 | -0.878 | -0.877 | -0.876 | -0.879 | -0.874 | -0.876 | -0.876 | -0.871 |
| 27 | SD | 0.028 | 0.026 | 0.028 | 0.019 | 0.020 | 0.017 | 0.058 | 0.035 | 0.029 | 0.051 | 0.052 | 0.048 | 0.050 | 0.051 | 0.051 | 0.062 | 0.063 | 0.058 |
| 45 | Mean | -0.818 | -0.823 | -0.802 | -0.832 | -0.830 | -0.834 | -0.812 | -0.821 | -0.803 | -0.883 | -0.884 | -0.883 | -0.882 | -0.884 | -0.883 | -0.883 | -0.883 | -0.883 |
| 45 | Median | -0.827 | -0.831 | -0.788 | -0.833 | -0.832 | -0.836 | -0.817 | -0.823 | -0.816 | -0.876 | -0.877 | -0.876 | -0.876 | -0.875 | -0.876 | -0.874 | -0.874 | -0.874 |
| 45 | SD | 0.029 | 0.027 | 0.027 | 0.015 | 0.015 | 0.013 | 0.030 | 0.023 | 0.035 | 0.035 | 0.033 | 0.032 | 0.037 | 0.035 | 0.036 | 0.040 | 0.040 | 0.039 |
| 63 | Mean | -0.808 | -0.821 | -0.799 | -0.831 | -0.831 | -0.834 | -0.820 | -0.826 | -0.822 | -0.879 | -0.878 | -0.878 | -0.877 | -0.878 | -0.878 | -0.878 | -0.878 | -0.878 |
| 63 | Median | -0.811 | -0.835 | -0.790 | -0.835 | -0.834 | -0.837 | -0.819 | -0.829 | -0.827 | -0.876 | -0.876 | -0.875 | -0.874 | -0.876 | -0.876 | -0.875 | -0.875 | -0.875 |
| 63 | SD | 0.031 | 0.029 | 0.028 | 0.013 | 0.014 | 0.010 | 0.033 | 0.027 | 0.022 | 0.027 | 0.026 | 0.026 | 0.028 | 0.027 | 0.027 | 0.026 | 0.026 | 0.026 |

Location 2 (Revised Input)

| n | Stat | Gau-multi 1-d | Gau-multi 2-d | Gau-multi ord | Exp-multi 1-d | Exp-multi 2-d | Exp-multi ord | Linear-multi 1-d | Linear-multi 2-d | Linear-multi ord | Gau-add 1-d | Gau-add 2-d | Gau-add ord | Exp-add 1-d | Exp-add 2-d | Exp-add ord | Linear-add 1-d | Linear-add 2-d | Linear-add ord |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27 | Mean | -0.773 | -0.773 | -0.773 | -0.775 | -0.775 | -0.775 | -0.775 | -0.774 | -0.772 | -0.826 | -0.821 | -0.822 | -0.827 | -0.825 | -0.823 | -0.817 | -0.817 | -0.819 |
| 27 | Median | -0.769 | -0.769 | -0.769 | -0.770 | -0.769 | -0.770 | -0.770 | -0.768 | -0.768 | -0.821 | -0.814 | -0.817 | -0.825 | -0.825 | -0.823 | -0.814 | -0.814 | -0.813 |
| 27 | SD | 0.015 | 0.016 | 0.015 | 0.017 | 0.020 | 0.015 | 0.017 | 0.017 | 0.015 | 0.049 | 0.048 | 0.044 | 0.049 | 0.048 | 0.050 | 0.059 | 0.059 | 0.056 |
| 45 | Mean | -0.768 | -0.768 | -0.768 | -0.769 | -0.768 | -0.769 | -0.766 | -0.765 | -0.765 | -0.822 | -0.821 | -0.821 | -0.825 | -0.825 | -0.824 | -0.820 | -0.818 | -0.819 |
| 45 | Median | -0.766 | -0.766 | -0.766 | -0.766 | -0.767 | -0.766 | -0.765 | -0.765 | -0.765 | -0.817 | -0.815 | -0.815 | -0.817 | -0.819 | -0.818 | -0.813 | -0.812 | -0.813 |
| 45 | SD | 0.010 | 0.010 | 0.009 | 0.013 | 0.014 | 0.011 | 0.012 | 0.010 | 0.006 | 0.034 | 0.033 | 0.031 | 0.036 | 0.033 | 0.035 | 0.039 | 0.040 | 0.039 |
| 63 | Mean | -0.767 | -0.766 | -0.766 | -0.766 | -0.766 | -0.768 | -0.766 | -0.766 | -0.766 | -0.818 | -0.816 | -0.816 | -0.821 | -0.819 | -0.818 | -0.815 | -0.815 | -0.814 |
| 63 | Median | -0.768 | -0.769 | -0.768 | -0.769 | -0.769 | -0.768 | -0.767 | -0.769 | -0.768 | -0.816 | -0.813 | -0.813 | -0.819 | -0.818 | -0.818 | -0.811 | -0.811 | -0.811 |
| 63 | SD | 0.012 | 0.012 | 0.012 | 0.011 | 0.012 | 0.007 | 0.008 | 0.012 | 0.009 | 0.026 | 0.025 | 0.025 | 0.029 | 0.027 | 0.025 | 0.026 | 0.026 | 0.025 |

Figure S2: Comparison of RRMSE across different methods and kernel configurations for the OTL circuit example. Each boxplot summarizes the results from 30 independent runs with different training sets generated via minimax LHS, random, and MaxPro designs, respectively. All methods were evaluated on the same set of 10,000 uniformly distributed test points.

SI I Additional simulation comparisons

SI I.1 Comparisons of different experimental designs

In this section, we examine the effect of experimental design on model performance. Specifically, we consider random sampling, the MaxPro design (Joseph et al., 2015, 2020), and the maximin Latin hypercube design (LHD) (Santner et al., 2003), which was previously examined in Section 6.1, each using 80 training points. As shown in Figure S2, the RRMSE values are highly consistent across all three designs for every method, indicating that the choice of design has a negligible impact on performance.

SI I.2 The borehole example with varying discretization degree

In this section, we evaluate the predictive accuracy and computational efficiency of various methods for the borehole example under different discretization levels. In the preliminary example presented in Section 6.1, r_w was discretized into q_1 = 3 levels and H_l into q_2 = 4 levels.
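Such a discretization, which maps each continuous input range to a set of equally spaced levels, can be sketched as follows. This is a minimal illustration, not the authors' code; whether the endpoints of each range are included among the levels is our assumption.

```python
import numpy as np

def discretize(lo, hi, q):
    """Map a continuous range [lo, hi] to q equally spaced levels.

    Including both endpoints is one natural choice; the text only states
    that the ranges are mapped to equally spaced values.
    """
    return np.linspace(lo, hi, q)

# Borehole inputs from the text: r_w in [0.05, 0.15], H_l in [700, 820].
rw_levels = discretize(0.05, 0.15, 3)    # q_1 = 3 levels for r_w
hl_levels = discretize(700.0, 820.0, 4)  # q_2 = 4 levels for H_l
```

Each qualitative level of r_w or H_l then corresponds to one of these equally spaced values when the GP model is fitted.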
Here, we extend our analysis by considering multiple combinations of discretization degrees: q_1 ∈ {2, 4, 6} for r_w and q_2 ∈ {2, 4, 6, 8, 10} for H_l. We generate 5·q_1·q_2 design points using a minimax LHD and map r_w ∈ [0.05, 0.15] and H_l ∈ [700, 820] to equally spaced values. To save computation time, we run our methods with only three random restarts and select the solution yielding the largest log-likelihood.

Figures S3–S5 report the RRMSEs of all methods. Consistent with the results in Section 6.1, methods with additive kernels (including the competing methods EzGP and EEzGP) exhibit poor predictive performance. Methods with multiplicative Gaussian and exponential kernels generally perform well. In contrast, methods with multiplicative linear kernels deliver satisfactory results only under very coarse discretization (q_1 = 2, q_2 = 2) if ordinal information is not leveraged, possibly because fewer restarts increase the risk of getting stuck in local optima. By comparison, Linear-multi-ord, together with Gau-multi-ord, generally exhibits the best performance, which indicates that incorporating ordinal information can substantially enhance predictive accuracy and also improve optimization stability.

Table S2 summarizes the computational times of the three competing methods and our 18 base methods. The computational times of the model selection and model averaging approaches are not reported: they are constructed by post-processing and aggregating the results from our base models rather than being standalone methods, and thus incur minimal additional computational cost. All computational times were measured on a personal computer equipped with a 13th Gen Intel(R) Core(TM) i7-13700KF CPU (3.40 GHz) and 64 GB RAM, running Windows 10 and R 4.4.1. EEzGP is the fastest overall.
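The model-averaging post-processing mentioned above can be illustrated with the standard BIC-weight construction, w_m ∝ exp(−BIC_m/2). This is a hedged sketch: the exact BIC_MAvr weighting used in the paper may differ, and the BIC values and predictions below are hypothetical.

```python
import numpy as np

def bic_weights(bics):
    """Standard BIC weights: w_m proportional to exp(-BIC_m / 2).

    Subtracting the minimum BIC first is a routine numerical-stability step
    that leaves the normalized weights unchanged.
    """
    b = np.asarray(bics, dtype=float)
    w = np.exp(-0.5 * (b - b.min()))
    return w / w.sum()

def bic_model_average(preds, bics):
    """Combine base-model predictions (rows = models) using BIC weights."""
    return bic_weights(bics) @ np.asarray(preds, dtype=float)

# Hypothetical BIC values and test-point predictions for three base models.
bics = [100.0, 102.0, 110.0]
preds = [[-0.75, -0.84],
         [-0.74, -0.83],
         [-0.73, -0.82]]
yhat = bic_model_average(preds, bics)  # averaged predictions at 2 test points
```

Because the weights are computed once from quantities already produced by the base-model fits, the aggregation step is essentially free compared to the fits themselves.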
ctGP is efficient when the number of discretization levels is small, but its computational time increases rapidly as the number of levels grows. EzGP has complexity comparable to that of our base methods. For the base methods, computational time generally increases with the latent dimension. With Gaussian or exponential kernels, methods incorporating ordinal information (Gau-multi-ord, Gau-add-ord, Exp-multi-ord, and Exp-add-ord) exhibit computational times comparable to those of the corresponding methods with latent dimension l_j = 1 (Gau-multi-1, Gau-add-1, Exp-multi-1, and Exp-add-1), and can even be faster when the discretization degree is high. However, with multiplicative linear kernels, incorporating ordinal information greatly increases the cost. Despite the higher computational time, Linear-multi-ord achieves higher prediction accuracy than Linear-multi-2 or Linear-multi-1. Thus, there is a trade-off between accuracy and efficiency.

Figure S6 displays the RRMSE and computational time trends of EzGP, EEzGP, ctGP, Gau-multi-ord, Linear-multi-ord, and Linear-multi-2. Among our methods, Linear-multi-ord and Linear-multi-2 represent the slowest and fastest in terms of computational time, respectively; from the perspective of prediction accuracy, Gau-multi-ord consistently ranks among the top-performing base methods; and we include BIC-MAvr as an exemplar of the four model averaging and model selection methods because they yield similar predictive accuracy in the reported RRMSE results. EEzGP is the fastest method, but the predictive accuracy of EzGP and EEzGP is relatively low. Although finer discretization requires estimating correlations among more levels, the RRMSE of ctGP decreases as larger sample sizes offset the added complexity. This gain in accuracy comes at the cost of reduced computational efficiency.
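For reference, the RRMSE metric reported throughout these comparisons normalizes the RMSE by the spread of the true responses. The sketch below uses the standard-deviation normalization, which is an assumption on our part; the paper's exact definition appears in the main text.

```python
import numpy as np

def rrmse(y_true, y_pred):
    """Relative RMSE: RMSE divided by the standard deviation of the truth.

    One common convention; the paper's exact normalization may differ.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.std(y_true)

# Sanity checks: a perfect predictor scores 0; predicting the mean scores 1.
y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = rrmse(y, y)
baseline = rrmse(y, np.full(4, y.mean()))
```

Under this convention, values well below 1 indicate that a method explains most of the response variation, which is the regime in which the boxplots above should be read.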
For our methods, the RRMSE decreases at low to moderate discretization degrees and increases when the degree is high. Finally, Gau-multi-ord delivers high predictive accuracy while maintaining reasonable computational time.

Figure S3: Comparison of RRMSE across different methods and kernel configurations for the borehole example with discretization levels q_1 = 2 and q_2 ∈ {2, 4, 6, 8, 10}. Each boxplot summarizes the results from 5·q_1·q_2 independent runs with different training sets generated via maximin Latin hypercube designs. All methods were evaluated on the same set of 10,000 uniformly distributed test points.

Figure S4: Comparison of RRMSE across different methods and kernel configurations for the borehole example with discretization levels q_1 = 4 and q_2 ∈ {2, 4, 6, 8, 10}, using the same experimental setup as in Figure S3.

Figure S5: Comparison of RRMSE across different methods and kernel configurations for the borehole example with discretization levels q_1 = 6 and q_2 ∈ {2, 4, 6, 8, 10}, using the same experimental setup as in Figure S3.

Figure S6: Comparison of computational time (in seconds) and prediction accuracy for the borehole example across varying discretization levels, using the same experimental setup as in Figure S3. Panel (a): computational time for selected methods (EzGP, EEzGP, ctGP, Gau-multi-ord, Linear-multi-ord, and Linear-multi-2). Panel (b): RRMSE for the same selected methods, additionally including the BIC-based model averaging method (BIC-MAvr).

Table S2: Computational time (in seconds) for three competing methods and our 18 base methods applied to the borehole problem under different discretization levels, using the same experimental setup as in Figure S3.

| q_1 | q_2 | Stat | EzGP | EEzGP | ctGP | Gau-multi 1-d | Gau-multi 2-d | Gau-multi ord | Exp-multi 1-d | Exp-multi 2-d | Exp-multi ord | Linear-multi 1-d | Linear-multi 2-d | Linear-multi ord | Gau-add 1-d | Gau-add 2-d | Gau-add ord | Exp-add 1-d | Exp-add 2-d | Exp-add ord | Linear-add 1-d | Linear-add 2-d | Linear-add ord |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | Mean | 5 | 2 | 9 | 29 | 29 | 29 | 29 | 29 | 29 | 21 | 21 | 33 | 29 | 29 | 29 | 29 | 29 | 29 | 22 | 22 | 35 |
| 2 | 2 | SD | 1 | <1 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 6 |
| 2 | 4 | Mean | 21 | 5 | 11 | 38 | 61 | 50 | 40 | 68 | 41 | 44 | 53 | 109 | 54 | 71 | 52 | 55 | 73 | 51 | 45 | 53 | 82 |
| 2 | 4 | SD | 3 | 1 | 3 | 4 | 7 | 5 | 4 | 11 | 4 | 6 | 8 | 28 | 9 | 8 | 7 | 9 | 10 | 7 | 6 | 6 | 25 |
| 2 | 6 | Mean | 47 | 12 | 14 | 75 | 110 | 90 | 61 | 131 | 65 | 54 | 83 | 287 | 67 | 97 | 74 | 72 | 100 | 77 | 60 | 79 | 92 |
| 2 | 6 | SD | 3 | 3 | 2 | 12 | 5 | 7 | 8 | 11 | 8 | 9 | 8 | 67 | 14 | 16 | 11 | 10 | 24 | 10 | 11 | 9 | 13 |
| 2 | 8 | Mean | 98 | 21 | 22 | 110 | 162 | 138 | 114 | 201 | 115 | 85 | 125 | 650 | 105 | 156 | 104 | 114 | 172 | 122 | 78 | 125 | 131 |
| 2 | 8 | SD | 1 | 4 | 2 | 13 | 9 | 6 | 11 | 12 | 11 | 10 | 8 | 152 | 14 | 11 | 16 | 17 | 26 | 10 | 15 | 15 | 24 |
| 2 | 10 | Mean | 178 | 38 | 37 | 162 | 245 | 195 | 163 | 302 | 185 | 104 | 190 | 712 | 144 | 228 | 148 | 152 | 269 | 153 | 107 | 185 | 177 |
| 2 | 10 | SD | 13 | 9 | 2 | 21 | 10 | 11 | 14 | 18 | 10 | 14 | 12 | 183 | 29 | 22 | 27 | 40 | 31 | 30 | 22 | 13 | 35 |
| 4 | 2 | Mean | 20 | 6 | 10 | 37 | 51 | 47 | 40 | 63 | 40 | 37 | 42 | 70 | 45 | 62 | 46 | 46 | 63 | 46 | 38 | 43 | 67 |
| 4 | 2 | SD | 1 | 1 | 2 | 4 | 6 | 5 | 3 | 5 | 3 | 5 | 6 | 10 | 6 | 6 | 6 | 6 | 7 | 7 | 6 | 7 | 15 |
| 4 | 4 | Mean | 91 | 18 | 24 | 75 | 143 | 116 | 93 | 168 | 79 | 69 | 110 | 398 | 87 | 130 | 89 | 100 | 137 | 100 | 73 | 102 | 120 |
| 4 | 4 | SD | 9 | 5 | 3 | 13 | 17 | 13 | 14 | 18 | 12 | 16 | 16 | 116 | 18 | 25 | 11 | 11 | 22 | 13 | 13 | 13 | 29 |
| 4 | 6 | Mean | 231 | 48 | 63 | 159 | 264 | 203 | 169 | 312 | 209 | 116 | 193 | 968 | 152 | 244 | 141 | 171 | 275 | 176 | 109 | 186 | 186 |
| 4 | 6 | SD | 6 | 14 | 4 | 25 | 22 | 23 | 15 | 26 | 28 | 19 | 20 | 290 | 31 | 19 | 29 | 31 | 38 | 23 | 26 | 32 | 49 |
| 4 | 8 | Mean | 490 | 89 | 182 | 302 | 467 | 324 | 257 | 481 | 285 | 165 | 314 | 790 | 202 | 408 | 207 | 247 | 461 | 274 | 161 | 311 | 352 |
| 4 | 8 | SD | 10 | 26 | 13 | 51 | 70 | 58 | 51 | 100 | 37 | 42 | 42 | 143 | 56 | 28 | 50 | 62 | 47 | 56 | 39 | 28 | 85 |
| 4 | 10 | Mean | 872 | 166 | 376 | 426 | 688 | 365 | 412 | 720 | 358 | 274 | 478 | 875 | 401 | 657 | 414 | 465 | 756 | 451 | 345 | 521 | 392 |
| 4 | 10 | SD | 22 | 36 | 15 | 102 | 107 | 75 | 65 | 99 | 62 | 55 | 95 | 191 | 69 | 69 | 35 | 85 | 71 | 30 | 60 | 72 | 102 |
| 6 | 2 | Mean | 50 | 13 | 14 | 63 | 109 | 89 | 75 | 125 | 76 | 57 | 78 | 138 | 74 | 98 | 76 | 75 | 112 | 84 | 63 | 80 | 104 |
| 6 | 2 | SD | 2 | 3 | 2 | 10 | 4 | 4 | 9 | 8 | 9 | 9 | 10 | 49 | 14 | 17 | 11 | 9 | 22 | 9 | 10 | 15 | 23 |
| 6 | 4 | Mean | 228 | 39 | 65 | 164 | 272 | 213 | 185 | 308 | 183 | 109 | 190 | 892 | 151 | 247 | 139 | 168 | 264 | 167 | 110 | 175 | 200 |
| 6 | 4 | SD | 5 | 13 | 4 | 29 | 22 | 21 | 11 | 25 | 18 | 22 | 19 | 226 | 35 | 14 | 33 | 37 | 33 | 30 | 28 | 28 | 66 |
| 6 | 6 | Mean | 630 | 92 | 261 | 324 | 582 | 359 | 309 | 594 | 333 | 191 | 388 | 774 | 248 | 482 | 258 | 311 | 585 | 278 | 216 | 383 | 407 |
| 6 | 6 | SD | 12 | 31 | 19 | 72 | 77 | 67 | 38 | 88 | 55 | 54 | 70 | 192 | 63 | 19 | 71 | 71 | 69 | 70 | 52 | 33 | 77 |
| 6 | 8 | Mean | 1273 | 208 | 713 | 550 | 975 | 522 | 494 | 1115 | 472 | 402 | 660 | 1110 | 533 | 879 | 541 | 605 | 1035 | 613 | 464 | 698 | 589 |
| 6 | 8 | SD | 80 | 51 | 32 | 170 | 204 | 114 | 103 | 257 | 80 | 89 | 149 | 244 | 106 | 85 | 40 | 93 | 144 | 43 | 65 | 124 | 163 |
| 6 | 10 | Mean | 2051 | 327 | 1542 | 998 | 1485 | 845 | 993 | 1797 | 848 | 674 | 1145 | 2902 | 753 | 1293 | 756 | 836 | 1534 | 873 | 567 | 999 | 867 |
| 6 | 10 | SD | 101 | 86 | 107 | 213 | 295 | 195 | 185 | 301 | 149 | 91 | 214 | 493 | 152 | 96 | 148 | 99 | 104 | 66 | 119 | 160 | 172 |

SI II Technical lemmas and their proofs

In this section, we demonstrate the parametric equivalence under the conditions $I = 0$ and $J = 1$, and when the two kernels $K_Z(\cdot,\cdot)$ and $K_W(\cdot,\cdot)$ are identical. Under these assumptions, the equivalence of parameterizations depends solely on the latent parameterizations $\{Z_V\}$ and $\{W_V\}$. For the linear kernel, Lemma S1 states that the equivalence of the latent parameterizations is determined up to an orthogonal transformation.

Lemma S1 (Equivalence under Linear Kernel). Two latent parameterizations $\{W_V\}$ and $\{Z_V\}$ are equivalent under the linear kernel if and only if
$$\left(z^{(1)}_1, z^{(1)}_2, \cdots, z^{(1)}_{a_1}\right) = Q\left(w^{(1)}_1, w^{(1)}_2, \cdots, w^{(1)}_{a_1}\right)$$
for some orthogonal matrix $Q \in \mathbb{R}^{l_1 \times l_1}$ such that $Q^\top Q = I_{l_1 \times l_1}$.

Proof of Lemma S1. Denote $Z^{(1)} = \left(z^{(1)}_1, z^{(1)}_2, \cdots, z^{(1)}_{a_1}\right)$ and $W^{(1)} = \left(w^{(1)}_1, w^{(1)}_2, \cdots, w^{(1)}_{a_1}\right)$. If there exists an orthogonal matrix $Q \in \mathbb{R}^{l_1 \times l_1}$ such that $Z^{(1)} = QW^{(1)}$, then we have
$$W^{(1)\top} W^{(1)} = W^{(1)\top} Q^\top Q W^{(1)} = Z^{(1)\top} Z^{(1)},$$
which implies that $w^{(1)\top}_{v} w^{(1)}_{v'} = z^{(1)\top}_{v} z^{(1)}_{v'}$ for any $v, v' \in \{1, \cdots, a_1\}$. For the reverse direction, we perform the QR decompositions
$$Z^{(1)} = Q_Z R_Z \quad \text{and} \quad W^{(1)} = Q_W R_W,$$
where $Q_Z, Q_W \in \mathbb{R}^{l_1 \times l_1}$ are orthogonal matrices, and $R_Z$ and $R_W$ are upper triangular matrices with positive diagonal elements. Since $\{W_V\}$ and $\{Z_V\}$ are equivalent under the linear kernel, we have
$$R_W^\top R_W = W^{(1)\top} W^{(1)} = Z^{(1)\top} Z^{(1)} = R_Z^\top R_Z.$$
By the uniqueness of the Cholesky decomposition, we deduce that $R_W = R_Z$.
The orthogonal matrix can then be defined as $Q = Q_Z Q_W^\top$, since $Z^{(1)} = Q_Z R_Z = Q_Z Q_W^\top Q_W R_W = Q W^{(1)}$. This completes the proof.

For an isotropic kernel, the kernel value is uniquely determined by the distance between two latent vectors. Consequently, any isometric transformation of $\mathbb{R}^{l_1}$ results in an equivalent latent parameterization, which is formalized in Lemma S2.

Lemma S2 (Equivalence under Isotropic Kernel). Two latent parameterizations $\{W_V\}$ and $\{Z_V\}$ are equivalent under an isotropic kernel if and only if
$$\left(z^{(1)}_1, z^{(1)}_2, \cdots, z^{(1)}_{a_1}\right) = Q\left(w^{(1)}_1 - w, w^{(1)}_2 - w, \cdots, w^{(1)}_{a_1} - w\right)$$
for some vector $w \in \mathbb{R}^{l_1}$ and some orthogonal matrix $Q \in \mathbb{R}^{l_1 \times l_1}$ such that $Q^\top Q = I_{l_1 \times l_1}$.

Proof of Lemma S2. Let $Z^{(1)} = \left(z^{(1)}_1, z^{(1)}_2, \cdots, z^{(1)}_{a_1}\right)$ and $W^{(1)} = \left(w^{(1)}_1, w^{(1)}_2, \cdots, w^{(1)}_{a_1}\right)$. Suppose there exists an orthogonal matrix $Q \in \mathbb{R}^{l_1 \times l_1}$ such that
$$\left(z^{(1)}_1, z^{(1)}_2, \cdots, z^{(1)}_{a_1}\right) = Q\left(w^{(1)}_1 - w, w^{(1)}_2 - w, \cdots, w^{(1)}_{a_1} - w\right), \tag{S3}$$
where $w \in \mathbb{R}^{l_1}$. In this case, parameter equivalence is guaranteed by
$$K_Z(Z_V, Z_{V'}) = K_1\left(\left\|z^{(1)}_{v_1} - z^{(1)}_{v'_1}\right\|_2\right) = K_1\left(\left\|Q\left(w^{(1)}_{v_1} - w^{(1)}_{v'_1}\right)\right\|_2\right) = K_W(W_V, W_{V'}),$$
where $K_1(\cdot)$ is the decreasing kernel generating function and the last equality holds because the orthogonal matrix $Q$ preserves Euclidean norms. On the other hand, assume the above formula holds for any $v_1, v'_1 \in \{1, \cdots, a_1\}$. Then we have
$$\left\|z^{(1)}_{v_1} - z^{(1)}_{v'_1}\right\|_2^2 = \left\|w^{(1)}_{v_1} - w^{(1)}_{v'_1}\right\|_2^2,$$
since $K_1(\cdot)$ is a decreasing function. Note that the Euclidean distance can be expressed using the inner product. By Lemma S1, there exists an orthogonal matrix $Q \in \mathbb{R}^{l_1 \times l_1}$ such that
$$z^{(1)}_{v_1} - z^{(1)}_{v'_1} = Q\left(w^{(1)}_{v_1} - w^{(1)}_{v'_1}\right)$$
for any $v_1, v'_1 \in \{1, \cdots, a_1\}$. Thus, defining $w = w^{(1)}_{v_1} - Q^\top z^{(1)}_{v_1}$ yields a vector $w$ that is constant in $v_1$, by the formula above. Finally, the defined $Q$ and $w$ ensure that (S3) holds, which justifies the reverse direction as well. This completes the proof.

References

Ben-Ari, E. N.
and Steinberg, D. M. (2007), "Modeling data from computer experiments: An empirical comparison of kriging with MARS and projection pursuit regression," Quality Engineering, 19, 327–338.

Harper, W. V. and Gupta, S. K. (1983), "Sensitivity/uncertainty analysis of a borehole scenario comparing Latin hypercube sampling and deterministic sensitivity approaches," Tech. rep., Battelle Memorial Inst., Columbus, OH (USA), Office of Nuclear Waste Isolation.

Joseph, V. R., Gul, E., and Ba, S. (2015), "Maximum projection designs for computer experiments," Biometrika, 102, 371–380.

— (2020), "Designing computer experiments with multiple types of factors: The MaxPro approach," Journal of Quality Technology, 52, 343–354.

Morris, M. D., Mitchell, T. J., and Ylvisaker, D. (1993), "Bayesian design and analysis of computer experiments: Use of derivatives in surface prediction," Technometrics, 35, 243–255.

Santner, T. J., Williams, B. J., and Notz, W. I. (2003), The Design and Analysis of Computer Experiments, vol. 1, Springer.

Zhang, Y., Tao, S., Chen, W., and Apley, D. W. (2020), "A latent variable approach to Gaussian process modeling with qualitative and quantitative factors," Technometrics, 62, 291–302.