Calculations of Sobol indices for the Gaussian process metamodel

Global sensitivity analysis of complex numerical models can be performed by calculating variance-based importance measures of the input variables, such as the Sobol indices. However, these techniques, requiring a large number of model evaluations, ar…

Authors: Am, ine Marrel (LMTE), Bertr

Calculations of Sobol indices for the Gaussian process metamodel
CALCULA TIONS OF SOBOL INDICES F OR THE GA USS IAN PR OCESS MET AMODEL Amandine MARREL ∗ , 1 , Bertrand IOOSS † , B ´ eatrice LA URENT ⋄ , Olivier R OUST ANT ‡ Submitted t o: R eliability E ngine ering a nd System Safety for t he special SAMO 2007 issue ∗ CEA Cadarac he, DEN/DTN/SMTM/LMTE, 13108 Saint P aul lez Durance, Cedex, F r ance † CEA Cadarache, DEN/DER/SESI/LCFR, 13108 Sain t Paul lez Du rance, Cedex, F r an ce ⋄ Institut de Math ´ ematiques, Universit ´ e de T ou lous e (UMR 5219), F rance ‡ Ecole des Mines de Sain t-Etienne, F rance Abstract Global sensitivity analysis o f complex numerical mo dels can b e p erformed by calculat- ing v ariance-based imp ortance meas ur es of the input v ariables, such a s the So bo l indices. How ev er, these tec hniques, requiring a lar ge n umber of mo del ev aluations, ar e often un- acceptable for time exp ensive computer co des. A well known and widely used decisio n consists in repla c ing the computer co de by a metamo del, predicting the mo del r esp onses with a negligible computation time and rending stra ightforw ard the estimation of Sob ol indices. In this pap er , we discuss ab out the Gaussia n pro cess mo del which gives analy tica l expressions of Sob ol indices . Two appr oaches ar e studied to compute the Sob ol indices: the fir st based on the pr edictor of the Gaussia n pro ces s mo del and the seco nd based o n the global sto chastic pro cess mo del. Comparisons b etw een the tw o estimates, made on analytical ex a mples, show the sup erior ity of the second appro ach in terms of conv ergence and robustnes s . Moreov er, the second a pproach allows to int egra te the mo deling err or of the Gaussian pr o cess mo del by directly giving so me confidence interv als on the Sobol indices. These techniques are finally applied to a real case of hydrogeolo gical mo deling. Keyw ords: Gaussian pro c ess, cov aria nce, metamodel, s ensitivity ana lysis, uncertaint y , computer co de. 1 Corresponding author: A. Marrel, Email: amandine.marrel@cea.fr Phone: +33 (0)4 42 25 26 52, F ax: +33 (0)4 42 25 62 72 1 1 INTR ODUCTION Environmen tal risk a s sessment is o ften based on complex computer co des, simulating for instance an atmospher ic o r hydrogeological p ollution tra ns po rt. These computer mo d- els c a lculate several output v a lue s (sca lars o r functions) which can depend on a high nu mber of input parameter s and physical v a riables. T o provide guidance to a b etter un- derstanding of this kind of mo deling and in o r der to reduce the resp onse uncerta int ies most effectiv ely , sensitivity meas ures o f the input imp o rtance o n the r esp onse v ariability can b e use ful (Saltelli et al. [2 4], Kleijnen [12], Helton et al. [9 ]). How e ver, the estimatio n of these measures (based on Monte-Carlo methods for exa mple) r equires a large num b er of mo del ev aluations, which is una cceptable for time exp ensive computer co des. This k ind of problem is of c o urse not limited to environmental mo deling a nd can b e applied to any simulation s ystem. T o av oid the problem of huge c a lculation time in sensitivity analysis, it can b e useful to replace the complex computer c o de by a mathematical a ppr oximation, called a resp onse surface o r a surr ogate mo del or also a metamo del. The respo nse surface metho d (Box & Drap er [2]) consists in constr ucting a function fro m few e xpe riments, that simulates the behavior o f the r eal phenomenon in the domain of influential par ameters. These metho ds hav e b een g e ne r alized to develop sur roga tes for costly computer co des (Sacks et a l. [23], Kleijnen & Sargent [1 3]). Sev eral meta mo dels ar e classically used: p oly nomials, splines, generalized linear mo dels, or lear ning statistica l mode ls like neural net w ork s, regress ion trees, supp or t vector machines (Chen et al. [3], F ang et al. [8]). Our attention is fo cused o n the Gaussia n pro cess mo del which can be viewed as an extension of the kriging principles (Matheron [18], Cr essie [6], Sacks et al. [23]). This metamo del whic h is characterize d by its mean and cov a riance functions, presents several adv antages: it is an exac t interpolator and it is interpretable (not a black-box function). Moreov er, n umerous a uthors (for example, Curr in et a l. [7 ], Santner et al. [25], V azquez et al. [2 8], Ra smussen & Williams [2 2]) have shown how this mo del can provide a statistical basis for computing an efficient predictor of code resp onse. In a ddition to its efficiency , this mo del gives an analytical formula whic h is very useful for sensitivity analysis, esp ecially for the v ariance- based impo r tance measures, the so -called Sob ol indices (Sobol [26], Saltelli et al. [24]). T o der ive analytical expression of So b o l indices, Chen et al. [4] used tensor- pro duct fo rmulation and Oa kley & O’Ha gan [20] consider ed the Bayesian fo r malism of 2 Gaussian pro cesses. W e pro po se to co mpare these tw o analytica l formulations of Sob ol indices for the Gaussian pro ces s mo del: the first is obtained consider ing o nly the predic to r, i.e. the mean of the Ga ussian pro cess mo del (Chen et al. [4]), while the second is obtained using all the g lobal stochastic model (Oakley & O’Ha gan [2 0]). In the last c a se, the estimate of a So bo l index is itself a ra ndom v ariable . Its standar d deviation is av ailable and we prop ose an o riginal alg orithm to estimate its distribution. Consequently , our metho d le a ds to build confidence interv als for the Sob ol indices. T o o ur knowledge, this information has not b een pro po sed b efore and ca n b e obtained thanks to the ana lytical fo rmulation of the Gaussian pr o cess model error. This is particularly interesting in practice, when the predictive quality o f the metamo del is not high (b ecaus e of small lear ning sample size for example), and our co nfidence on Sob ol index e stimates via the metamo de l is p o or. The next sec tio n briefly explains the Gaussian pro cess mo deling a nd the Sob ol indices defined in the tw o appr oaches (predictor -only and globa l mo del). In section 3 , the numer- ical computation o f a Sob ol index is pr esented. In the case of the glo bal sto chastic mo del, a pro cedure is develop ed to s im ulate its distribution. Sec tio n 4 is devoted to applica tions on a nalytical functions. First, compar isons a re made be t ween the So bo l indices based on the pr edictor and those based on the global mo del. The p e r tinence of simulating all the distribution of Sobo l indices is therefore ev aluated. Finally , Sob ol indices and their uncertaint y are computed for a real data set coming from a hydrogeolo gical transp or t mo del based on waterflo w a nd diffusion disper sion equations. The last sec tio n provides some p oss ible extensions a nd co ncluding remarks . 2 SOBOL INDICE S WITH GA USSIAN PR OCESS MODEL 2.1 Gaussian pro cess mo del Let us co nsider n realiz a tions of a computer co de. Each realizatio n y ( x ) of the computer co de output co r resp onds to a d-dimensional input vector x = ( x 1 , ..., x d ). The n input po int s c orresp onding to the co de runs a r e called a n exp erimental desig n and ar e deno ted as X s = ( x (1) , ..., x ( n ) ). The outputs will b e denoted as Y s = ( y (1) , ..., y ( n ) ) with y ( i ) = y ( x ( i ) ) , i = 1 , ..., n . Gaussian pro cess (Gp) mo deling treats the deter ministic resp onse y ( x ) 3 as a rea lization of a ra ndom function Y ( x ), including a regr e ssion part a nd a centered sto chastic pr o cess. The sample space Ω denotes the spa ce of all p oss ible outco mes ω , which is usually the Leb esgue-mea surable se t of r eal num b ers . The Gp is defined on R d × Ω and ca n be written as: Y ( x , ω ) = f ( x ) + Z ( x , ω ) . (1) In the following, we use indifferently the terms Gp mo del a nd Gp metamo del. The deter minis tic function f ( x ) pr ovides the mean approximation o f the computer co de. O ur study is limited to the parametric case where the function f is a linear co mb i- nation of elementary functions . Under this assumption, f ( x ) ca n b e written as follows: f ( x ) = k X j =0 β j f j ( x ) = F ( x ) β , where β = [ β 0 , . . . , β k ] t is the regres sion par ameter vector, f j ( j = 1 , . . . , k ) are ba sis functions a nd F ( x ) = [ f 0 ( x ) , . . . , f k ( x )] is the co rresp onding regr ession matrix. In the case o f the one-degree p olyno mial regres sion, ( d + 1) basis functions ar e used:    f 0 ( x ) = 1 , f i ( x ) = x i for i = 1 , . . . , d. In o ur applicatio ns, we use this o ne-degree p olyno mial as the regr ession part in or - der to simplify all the analytical numerical co mputation of sensitivity indices. This can be extended to other bases o f regressio n functions. Without prior information on the relationship b etw een the output and the inputs, a ba s is of one-dimensional functions is recommended to simplify the co mputations in sensitivity analysis and to keep one of the most adv antages of Gp mo del (Martin & Simpson [17]). The sto chastic par t Z ( x , ω ) is a Gaus s ian centered pro ce s s fully characterized by its cov a riance function: Cov Ω ( Z ( x , ω ) , Z ( u , ω )) = σ 2 R ( x , u ) , where σ 2 denotes the v a riance of Z and R is the c orrelatio n function that provides int erp ola tion and spatial correla tion prop erties. T o simplify , a stationary pro cess ( Z ( x , ω )) is c onsidered, which means that the correlation b etw een Z ( x , ω ) and Z ( u , ω ) is a function of the difference b etw een x and u . Moreover, our study is restricted to a family o f correlatio n functions that can b e 4 written a s a pro duct of o ne-dimensional co r relation functions: Cov Ω ( Z ( x , ω ) , Z ( u , ω )) = σ 2 R ( x − u ) = σ 2 d Y l =1 R l ( x l − u l ) . (2) This form of cor relation function is par ticularly well adapted to get some simplifications of the integrals in the future analy tical developmen ts: in the cas e of indepe ndent inputs, it implies the c o mputation of only one o r tw o -dimensional integrals to compute the Sob ol indices. Indeed, as des crib ed in section 3.2, the a pplication a nd the computation of the Sob ol index formulae are simplified when the corr elation function has the form of a one- dimensional pr o duct (Santner et a l. [25]). Among other authors, Chil` es & Delfiner [5] and Rasmussen & Willia ms [22] give a list of cor relation functions with their adv a ntages and drawbac ks. Among all these functions, our a tten tion is devoted to the g eneralized exp onential corr elation function: R θ , p ( x − u ) = d Y l =1 exp( − θ l | x l − u l | p l ) with θ l ≥ 0 and 0 < p l ≤ 2 , where θ = [ θ 1 , . . . , θ d ] t and p = [ p 1 , . . . , p d ] t are the corr elation parameters . This choice is motiv ated by the deriv a tion a nd regular it y pr op erties of this function. Moreover, within the ra nge of cov ariance para meters v alues, a wide sp ectrum o f sha pes are p ossible: for example p = 1 gives the expone ntial c o rrelatio n function while p = 2 g ives the Gaussian correla tion function. 2.2 Join t and conditional distributions Under the hypothesis o f a Gp mo del, the learning s ample Y s follows a mult iv a riate nor ma l distribution p Ω ( Y s | X s ): p Ω ( Y s , ω | X s ) = N ( F s β , Σ s ) , where F s = [ F ( x (1) ) t , . . . , F ( x ( n ) t )] is the r e gressio n matrix and Σ s = σ 2 R θ , p  x ( i ) − x ( j )  i,j =1 ...n is the cov a riance matrix . If a new p o in t x ∗ = ( x ∗ 1 , ..., x ∗ d ) is considered, the joint pro ba bilit y distribution of 5 ( Y s , Y ( x ∗ , ω )) is: p Ω ( Y s , Y ( x ∗ , ω ) | X s , x ∗ , β , σ , θ , p ) = N     F s F ( x ∗ )   β ,   Σ s k ( x ∗ ) k ( x ∗ ) t σ 2     , (3) with k ( x ∗ ) = ( Cov Ω ( y (1) , Y ( x ∗ , ω )) , . . . , Cov Ω ( y ( n ) , Y ( x ∗ , ω )) ) t = σ 2 ( R θ , p ( x (1) , x ∗ ) , . . . , R θ , p ( x ( n ) , x ∗ ) ) t . (4) By conditioning this join t distribution on the lear ning sample, we can r eadily obtain the co nditional distribution of Y ( x ∗ , ω ) which is Gaussian (von Mises [3 0]): p Ω ( Y ( x ∗ , ω ) | Y s , X s , x ∗ , β , σ , θ , p ) = N ( I E Ω [ Y ( x ∗ , ω ) | Y s , X s , x ∗ , β , σ , θ , p ] , V ar Ω [ Y ( x ∗ , ω ) | Y s , X s , x ∗ , β , σ, θ , p ]) , (5) with I E Ω [ Y ( x ∗ , ω ) | Y s , X s , x ∗ , β , σ , θ , p ] = F ( x ∗ ) β + k ( x ∗ ) t Σ − 1 s ( Y s − F s β ) , (6) V ar Ω [ Y ( x ∗ , ω ) | Y s , X s , x ∗ , β , σ , θ , p ] = σ 2 − k ( x ∗ ) t Σ − 1 s k ( x ∗ ) . (7) The conditional mean of Eq. ( 6) is used as a predictor. The conditional v aria nce formula o f Eq. (7) corr esp onds to the mean squared error (MSE) of this pr edictor and is also k nown as the krig ing v ariance. As we o btained the distribution for a new p oint conditionally to the lear ning sample, we can co nsider the cov a riance b etw een t wo new sites. A Gp conditional to the lea r ning sample is obtained a nd denoted as follows: ( Y | Y s , X s , β , σ , θ , p ) ∼ Gp( I E Ω [ Y ( x ∗ , ω ) | Y s , X s , β , σ, θ , p ] , Cov Ω ( Y ( x ∗ , ω ) , Y ( u ∗ , ω ) | Y s , X s , x ∗ , β , σ , θ , p )) (8) with the same e xpression for the conditiona l mean than Eq. (6) and Cov Ω ( Y ( x ∗ , ω ) , Y ( u ∗ , ω ) | Y s , X s , β , σ, θ , p ) = σ 2  R θ , p ( x ∗ , u ∗ ) − k ( x ∗ ) t Σ − 1 s k ( u ∗ )  . (9) The conditional Gp mo del (8) provides an ana lytical formula which ca n b e directly used for sens itivity analysis , and mo r e precisely to compute the Sob ol indices. T o simplify the no tations, the co nditional Gp ( Y | Y s , X s , β , σ , θ , p ) will now b e written in a simplified 6 form: Y Gp | Y s ,X s ( X, ω ). 2.3 Sob ol indices Metho ds based on v ariance decomp osition aim at determining the pa rt of the v a riance of the output Y ( x ) resulting from each v aria ble x i , i = 1 , . . . , d . A global measur e of the sensitivity o f Y ( x ) to each input x i is given b y the fir st order Sob ol index (Sob ol [2 6], Saltelli et al. [24]): S i = V ar X i [ I E X 1 ,...,X d Y | X i ] V ar X 1 ,...,X d [ Y ] for i = 1 , . . . , d. These indices hav e b een defined for deter ministic functions Y o f the inputs X 1 , . . . , X d but, in the case of the conditio na l Gp mo del, we hav e a sto chastic function of the inputs. A first solutio n is applying the Sob ol index formula to the predictor, i.e. the mean o f the co nditional Gp (Eq . (6)) whic h is a deterministic function of the inputs. Analytica l calculations are develop ed by Chen et al. [4]. The second approa ch that we consider consists in using the who le glo bal conditiona l Gp by taking int o account not only the mean of conditiona l Gp mo del but als o its cov ariance structure a s Oakley & O ’Hagan [20] did. In this case, when the Sob ol definition is applied to the global Gp mo del, a random v aria ble is obtained and co ns titutes a new sensitivity meas ur e. Its exp ectation ca n b e then considered as a sens itivit y index. Its v ariance and mo re generally its distribution can then b e used as an indicator o f sensitivity index accur acy . T o sum up, the tw o appr oaches can b e defined as follows: • Approach 1: Sob ol indices computed with the predicto r-only S i = V ar X i I E X 1 ,...,X d [ I E Ω [ Y Gp | Y s ,X s ( X, ω )] | X i ] V ar X 1 ,...,X d I E Ω [ Y Gp | Y s ,X s ( X, ω )] for i = 1 , . . . , d. (10) • Approach 2: Sob ol indices computed with the glo bal Gp mo del ˜ S i ( ω ) = V ar X i I E X 1 ,...,X d [ Y Gp | Y s ,X s ( X, ω ) | X i ] I E Ω V ar X 1 ,...,X d [ Y Gp | Y s ,X s ( X, ω )] for i = 1 , . . . , d. (11) ˜ S i ( ω ) is then a r a ndom v ar iable; its mean can b e considered a s a sensitivity index 7 and its v ar ia nce as a n indicator of its accuracy :                µ ˜ S i = I E Ω V ar X i I E X 1 ,...,X d [ Y Gp | Y s ,X s ( X, ω ) | X i ] I E Ω V ar X 1 ,...,X d [ Y Gp | Y s ,X s ( X, ω )] for i = 1 , . . . , d. σ 2 ˜ S i = V ar Ω V ar X i I E X 1 ,...,X d [ Y Gp | Y s ,X s ( X, ω ) | X i ] ( I E Ω V ar X 1 ,...,X d [ Y Gp | Y s ,X s ( X, ω )]) 2 for i = 1 , . . . , d. (12) Our work fo cuses on the computation and the study o f the sensitivity indices defined following the t wo appr o aches, r esp ectively S i and µ ˜ S i . W e will also pr op ose a metho dology to n umerically simulate the probability distribution of ˜ S i . Then, a study to compa re the accuracy and the ro bustness of the tw o indices is made on s everal test functions and the use of the dis tribution of ˜ S i is illustra ted to build confidence interv als. 3 IMPLEMENT A TION OF SOBOL INDICES 3.1 Estimation of Gp parameters First of all, to build the co nditional Gp defined by Eq. (8), reg ression and corr e la tion parameters (often c a lled hyperpara meters) have to b e determined. Indeed, the Gp mo de l is characterized by the reg ression pa r ameter v ector β , the correla tion parameter s ( θ , p ) and the v ar iance pa rameter σ 2 . The maxim um likelihoo d method is commonly used to estimate these parameter s fro m the learning sample ( X s , Y s ). Several algorithms ha ve b een pro po sed in previous paper s to numerically solve the maximization o f likelihoo d. W elch a t al. [31] use the simplex sea rch metho d and int ro duce a k ind of for ward se lection a lgorithm in which corr elation parameter s are added step by step to inc r ease the log- likelihoo d function. In Kennedy and O’Hagan’s GEM- SA softw a re (O’Hag an [21]), which uses the Bayesian formalism, the p o s terior distribution of hyperpa r ameters is ma x imized, using a co njugate gradient metho d (the Po wel metho d is used as the numerical recip e). The DA CE Matlab free to olb ox (Lophaven et al. [14]) uses a p ow erful sto chastic algo r ithm based on the Ho oke & Jee ves metho d (Baza r aa et al. [1]), which requires a starting point and some b o unds to constrain the optimization. In complex applica tions, W elch’s algor ithm reveals some limitations and for complex model with high dimensional input, GEM-SA and DA CE softw are cannot b e applied dir e c tly on data including a ll the input v ariables. T o so lve this problem, we us e a sequential version 8 (inspired by W elch’s a lgorithm) of the DA CE algor ithm. It is based on the step b y step inclusion of input v aria bles (previously sor ted). This methodo logy , desc r ib ed in details in Marr el et a l. [1 5], allows progr essive parameter estimation by input v aria bles se le ction bo th in the regressio n part a nd in the cov aria nc e function. 3.2 Computation of Sob ol indices for the tw o approac hes T o p erform a v ariance -based sensitivity a nalysis for time consuming co mputer mo dels, some authors prop ose to appr oximate the computer co de by a metamo del (neural netw orks in Martin & Simpson [16], p olyno mials in Io oss et al. [10], b o o s ting regress ion trees in V olkov a et a l. [29]). F o r metamo dels with s ufficient prediction capa bilities, the bias due to the use o f the metamo del instead of the true mo del is negligible (Jacques [11]). The metamo del’s predicto r hav e to b e ev a luated a la rge num ber of times to compute So bo l indices via Monte Ca rlo methods. Recent works based on poly nomial chaos expansio ns (Sudret [27]) have used the sp ecial form of this orthogo na l functions expa nsion to derive analytical estimation of Sob ol indices. How ever, the mo deling er r or of this metamode l is not av aila ble and then has not been integrated inside the Sob o l index estimates. The conditiona l Gp metamo del provides a n ana ly tic formula which can be ea sily used for sensitivity a na lysis in an a nalytical way . Moreov er, in the ca se of indep endent inputs and with a c ov ar iance which is a pr o duct of o ne-dimensional co v a riances (Eq . (2)), the analytical formulae o f S i and µ ˜ S i (resp ectively E qs. (10) and (12)) lea d to numerical int egra ls, more precisely to resp ectively one-dimensional and tw o-dimensio nal int egr a ls. The accura cy of these numerical integrations can b e ea sily co ntrolled a nd ar e less computer time exp ensive than Mo n te Car lo simulations. F ew analytica l developmen ts o f Sob ol indices computatio n (for S i , µ ˜ S i and σ 2 ˜ S i ) ca n b e found in O akley & O’Haga n [2 0]. 3.3 Sim ulation of the distribution of ˜ S i F or the se cond appro ach where ˜ S i is a random v ariable, the distribution o f ˜ S i is not directly av ailable. By taking the mean r elated to all the inputs exce pt X i , the main effect of X i is defined and denoted A ( X i , ω ): A ( X i , ω ) = I E X 1 ,...,X d [ Y Gp | Y s ,X s ( X, ω ) | X i ] . 9 A ( X i , ω ) is still a Gaus sian pro cess defined on R × Ω and characterized by its mean a nd cov a riance which can b e determined in an ana lytical wa y by integrating the Gp mo del ov er all the inputs except X i . In the ca se of indep endent inputs, one-dimensiona l integrals are o btained and ca n b e n umerically c o mputed. Then, to obtain the Sob ol indice s, we consider the v ariance r elated to X i of the Gaus sian pro cess defined by the centered ma in effect. This v ariance is written Z b i a i  A ( x i , ω ) − Z A ( x i , ω ) dη x i  2 dη x i with dη x i the probability density function of the input X i defined on [ a i ; b i ]. This last expression is a one-dimensiona l random in tegra l which has to b e discretized and approxi- mated by simulations. The discretiza tion of this random integral ov er the spa ce of X i leads to a Gauss ian vector of n dis elements: V n dis ( ω ) =  A ( a i , ω ) , A ( a i + ( b i − a i ) n dis , ω ) , . . . , A ( a i + ( n dis − 1) n dis ( b i − a i ) , ω ) , A ( b i , ω )  t . The mean and cov ariance matrix of this vector ar e computed using those of the Gaus sian pro cess A ( X i , ω ). The ra ndom vector V n dis is then m ultiplied by the matrix re la ted to the n umerical sc heme us ed to co mpute the in tegral (r ectangle or tr a p e z e metho d, Simpson’s formula ...). The Gaussia n vector obtained from this multiplication is denoted ˜ V n dis . T o sim ulate it, we use the well known sim ulation method based on the Cholesky factorisatio n of the cov ar iance matrix (Cres s ie [6]). W e simulate a n dis -size centered and reduced Gaussia n vector and multiply it b y the tria ng ular matrix from the Cho lesky decomp osition. Then, a n ev aluation of the random integral which constitutes a realiz a tion of ˜ S i is c o mputed fro m the simulation of the vector ˜ V n dis . This op eratio n is done k sim times to obtain a probability distribution for ˜ S i . It can b e noted, that only one Choles ky factorization o f the c ov a riance matrix o f the n dis -size vector is nec essary , and used for all the k sim simulations o f ˜ S i . T o determine if the dis cretization num b er n dis and the num b er of simulations k sim are sufficient, the conv er gence o f the mean and v ariance of ˜ S i can b e studied. Indeed, their v alues can b e e a sily computed following their a na lytical expressio ns (11). 10 4 APPLICA TIONS 4.1 Comparison of S i and µ ˜ S i T o co mpa re and study the b ehavior of the tw o sens itivit y indices S i and µ ˜ S i , we consider several test functions wher e the true v a lues of Sob ol indices ar e known. Compariso ns b e- t ween the tw o approa ches ar e p erformed in terms of metamo del predictivity , i.e. r elatively to the a ccuracy o f the Gp model, co nstructed fro m a learning sample. This accura cy is represented by the predictivity co efficient Q 2 . It corres po nds to the cla ssical co efficient of determination R 2 for a test s ample, i.e. for prediction res iduals: Q 2 ( Y , ˆ Y ) = 1 − P n test i =1  Y i − ˆ Y i  2 P n test i =1  ¯ Y − Y i  2 , where Y deno tes the n test true observ ations of the test s et a nd ¯ Y is their empirical mean. ˆ Y represents the Gp mo del predicted v alues . T o o btain different v a lues o f Q 2 , we simulate different learning samples with v a rying size n . F o r e ach s ize n , a La tin Hyp er c ube Sample of the inputs is simulated (McKay et al. [19]) to give the matrix X s ( n rows, d columns). Then, the test function is ev aluated o n the n data p oints to constitute ( X s , Y s ) and a conditional Gp mo del is built on ea ch learning sample. F or each Gp mode l built, the predictivity co efficient Q 2 is estimated o n a new test sample of size 10 000 and the tw o sensitivity indices S i and µ ˜ S i are computed. F or each v alue of the learning sample size n , all this pro cedure, i.e . Gp mo deling and estimation of sensitivit y indices, is done 100 times. Co nsequently , the empirical mean, 0.05-qua nt ile and 0.9 5 -quantile o f S i and µ ˜ S i are computed for s a me v alues of lea rning sample size n , and similar approximate v alue s of Q 2 . 4.2 T est on t he g-function of Sob ol First, an analytical function called the g-function of Sob ol is used to co mpare the Sob ol indices S i based o n the predictor and the Sob ol indices µ ˜ S i based on the g lobal Gp mo del. The g -function of Sob ol is defined for d inputs ( X 1 , . . . , X d ) uniformly distributed on [0 , 1] d : g Sob ol ( X 1 , . . . , X d ) = d Y k =1 g k ( X k ) wher e g k ( X k ) = | 4 X k − 2 | + a k 1 + a k and a k ≥ 0 . 11 Because of its complex ity (consider able nonlinear and non-monotonic rela tionships) and to the av ailability of analytical sensitivity indices, it is a w ell k nown test exa mple in the studies of global sensitivity a nalysis algorithms (Saltelli et al. [24]). The imp ortance of each input X k is re pr esented by the co efficient a k . The low er this co efficient a k , the mo re significant the v aria ble X k . The theoretical v alues o f first order Sobo l indices a re known: S i = 1 3(1+ a i ) 2 Q d k =1 1 3(1+ a k ) 2 for i = 1 , . . . , d. F or our a nalytical test, we choo se d = 5 and a k = k for k = 1 , . . . , 5 . . Let us recall that we study only first order sensitivity indices. F or ea ch input X i , the conv ergence of S i and µ ˜ S i in function of the predictivity co efficient Q 2 is illustrated in figure 1. The c o nv e r gence o f sensitivity index estimates to their exa c t v a lues in function of the meta mo del predictivity is verified. In practica l situations, a meta mo del with a predictivity low er than 0 . 7 is often co ns idered as a po o r appr oximation of the computer co de. T able 1 shows the connection b etw een the lear ning sample size n and the pre dic tiv ity co efficient Q 2 . As the sim ulation of a learning sample and its Gp modeling are done 1 00 times for each v alue of n , the mean and the standard deviatio n of Q 2 are indicated. [T able 1 ab out here.] Figure 1 also shows how the g lobal Gp model outp erfor ms the pr edictor-only model by showing smaller confidence in terv als for the five sensitivity indices . [Figure 1 ab out here .] T o sum up the convergence of the indices fo r the different inputs, it can be useful to consider the error b etw een the theoretical v a lues of So bo l indices S theo i and the estimated ones in L 2 norm: err L 2 = d X i =1 ( S theo i − ˆ S i ) 2 (13) where ˆ S i denotes the indices es tima ted with o ne o f the tw o metho ds ( ˆ S i = S i or ˆ S i = µ ˜ S i ). Figure 2 illustrates this con vergence in function o f the lear ning sample size n and in function o f the predic tiv ity co efficient Q 2 . [Figure 2 ab out here .] F rom Figur e 2, we conclude that the sensitivity indice s defined using the global Gp mo del ( µ ˜ S i ) are better in mean than the o ne estimated with the predictor only ( S i ). This 12 difference b etw een the t wo a pproaches is espec ia lly significant for high v alues of So bo l indices like the indices related to the first input ( S 1 and µ ˜ S 1 ). F or lower indices, these t wo approaches give in mean the sa me results. E ven if the t wo sensitivit y indices seem to hav e the same rate of c onv e rgence in function o f n or Q 2 , it is impo rtant to notice that the second a pproach is more robust. Indeed, µ ˜ S i has a low er sampling deviation a nd v aria bilit y than S i . Besides, this higher r obustness is more significant when the accuracy of the metamo del is weak ( Q 2 < 0 . 8). So , ta k ing int o acc o unt the cov ar iance structure o f the Gp mo del app ea r s useful to reduce the v a riability of the estimation of the sensitivity index. 4.3 T est on I shigami function W e now consider another analytical function curr ently used in se ns itivit y studies (Saltelli et al. [24]), the Is higami function, wher e each of the three input r andom v aria bles ( X 1 , X 2 , X 3 ) follows a uniform pro ba bilit y distribution on [ − π , + π ]: f Ishig ( X 1 , X 2 , X 3 ) = sin( X 1 ) + 7 sin 2 ( X 2 ) + 0 . 1 X 4 3 sin( X 1 ) The theo retical v a lue s of firs t order Sobo l indices are known:          S 1 = 0 . 3139 S 2 = 0 . 4424 S 3 = 0 Like for the g-function of Sob ol, the err or with the theoretical v alues of So b o l indices in L 2 norm is computed fo r the t wo approa ches for different lea rning s a mple size n a nd consequently for different v a lues o f Q 2 . As b efore (Eq. (13)), the dia grams of convergence are s hown in figure 3. [Figure 3 ab out here .] As obser ved for the g-function of Sob ol, the indices defined with the g lobal mo de l are still more robust and less v a riable pa rticularly for low v a lues o f Q 2 . How ever, the difference betw een the mean of the tw o indices is not significant. F o r high v alues of the Gp mo del accuracy ( Q 2 > 0 . 8), the tw o appr oaches g ive the same v a lues but the first o ne (with only the pr edictor) rema ins eas ier to compute. So, the use of the cov aria nce str ucture thro ugh 13 the index ˜ S i seems to have a sig nifica nt interest when the Gp metamo del is ina ccurate or when few data are av ailable to avoid to o muc h v aria bilit y of the estimated indices. 4.4 Construction of confidence in terv als for sensitivit y indices As well as b eing mor e r o bust in mean, the index defined with the s e cond appro ach ˜ S i has the adv antage to ha ve a v ariance easy to compute. More generally , it is po ssible to build a confidence in terv a l of any lev el for this sensitivit y index, us ing the metho do logy describ ed in section 3.3 to s im ulate its distribution. This estima tion of the uncertaint y on the estimation o f Sob ol indices is ano ther adv antage of using the global Gp mo del: in practical cases with s mall learning sample s ize, only one Gp model is constructed. The predictivity co efficient Q 2 can b e es timated by cro ss-v alida tion or leav e-one-out, and if Q 2 shows a low predictivity (typically less than 0 . 8), we wish to hav e s ome confidence in the estimation of Sob ol indices c omputed from the Gp mo del. Contrary to Gp model, other metamo dels do not a llow to directly estimate the Sob ol indices uncerta in ties due to the mo del uncer tainties. T o illustrate this, let us co nsider aga in the g-function of Sob ol. Like in the pr evious section 4.2, we c o nsider different s izes of the learning sample (fro m n = 20 to n = 50 ). F or each v alue of n , we build a conditional Gp mo del and we co n trol its accura cy estimating the Q 2 on a test s a mple. W e s imulate the distribution of ˜ S i to obtain the empir ic al 0 . 0 5 and 0 . 95-qua ntiles and co nsequently an empiric a l 9 0%-confidence interv al. Then, we check if the theor etical v alues of Sob ol indices b elong to the empirica l 90%-confidence int erv al. W e rep ea t this pro cedure 1 00 times for each size n . Therefor e, we are able to estimate the real level of our confidence interv al and compare it to the 90% exp ected. The r eal levels obtained in mean for any size n and ea ch input are presented in T able 2. [T able 2 ab out here.] F or the high v a lues of Sob ol indices ( S 1 and S 2 for example), the obser ved levels of the 90%- c o nfidence in terv a l built from the sim ulation of the distr ibution of ˜ S i are really satisfactory and close to the exp ected level. In this case, the use of the glo bal Gp mo del which gives confidence interv als for So bo l indices has a sig nificant interest. On the o ther hand, for very low indices (close to zero ), the Gp metamo del ov erestimates the Sob ol indices. It e x plains the inaccura cy of the confidence interv al. Indeed, witho ut a pr o cedure of inputs selectio n, each v ariable app ears in the Gp metamo del a nd in its cov a riance. 14 T aking into account the v aria nce leads to give a minimal bo und for the influence of all the v ar iables a nd consequently to ov erestimate the low est Sob ol indices. This tendency is confirmed by the o bserv a tion of the mea n o f µ ˜ S i estimated for the three last inputs in T able 2. W e can make the s ame study with the Ishigami function for n = 30 to n = 130 which induces a Q 2 v arying from 0 . 5 to 0 . 95. As a ll the pro cedure (i.e. learning sample simulation, Gp mo deling and sensitivity analysis) is r ep e ated 1 00 times for each size n , the conv ergence o f the obse r ved level of the empir ical 90%-confidence interv al can b e o bserved in function of n . Similarly , we ca n study this c onv ergence in function of Q 2 . Figur e 4 shows all these diagrams of conv er gence. [Figure 4 ab out here .] As previously remarked on the g-function of Sob ol, the 90%-co nfidence interv als are ef- ficient for the high v alues of So bo l indices ( S 1 and S 2 for exa mple). F o r these indices, the observed level of the c o nfidence interv al converges to theore tica l level 0 . 9. W e can also notice that the predictivity quality of the Gp mo deling which is required to obtain accurate confidence int erv al corres po nds appr oximately to Q 2 > 0 . 80. How ever, we judge that for Q 2 > 0 . 6 , the error is not to o strong and the obtained 9 0%-confidence in terv a l can b e considered as a reliable a nd useful information. O n the other hand, for very low indices (close to zero), the pro blem o f overestimating the So bo l indices still dama g es the accuracy of the interv al confidence fo r a ny size n and any Q 2 . This remar k is particular ly true when the index is equal to zero (for e x ample S 3 ). 4.5 Application on an h ydrogeologic transp ort co de The tw o a pproaches to compute the So b o l indices ar e now applied to the data obtained from the mo deling of stro n tium 90 (noted 90 Sr) transp or t in saturated p oro us media using the MAR THE softw are (dev elop ed b y BR GM, F rance). The MAR THE computer co de mo dels flow and tr ansp ort equatio ns in three-dimensional p oro us formations. In the context of an e nvironmental impact study , the MAR THE computer co de ha s b een a pplied to the mo del of 90 Sr transp or t in sa turated media for a r adwaste tempo rary storag e site in Russia (V olko v a et al. [29]). One of the final purpo ses is to deter mine the short-term evolution of 90 Sr transp ort in soils in order to help the rehabilitatio n decision making. Only a par tia l characterization of the site has b een made and, cons equently , v alues of the 15 mo del input parameters a re not known precisely . One of the first goals is to identify the most influential para meters of the co mputer co de in order to improve the characterization of the site in a judicious way . T o realize this global sensitivity analysis and b ecause of large computing time of the MAR THE co de, a Gp metamo del is built on the basis of a first lear ning sa mple. 4.5.1 Data presen tation The 20 uncer tain mo del para meters are p ermeability of different geolo gical layers comp os- ing the simulated field (pa rameters 1 to 7), lo ngitudinal disp ers ivity co efficients (par ame- ters 8 to 10), tra ns verse disp ersivity co efficients (para meters 1 1 to 1 3), s orption co efficients (parameters 14 to 16), p o rosity (parameter 17) and meteoric water infiltration intensities (parameters 18 to 2 0 ). T o study sensitivity of the MAR THE co de to these par ameters, 300 simulations of these 2 0 parameters hav e b een made by the LHS metho d. F or ea ch simulated set of pa rameters, MAR THE co de computes tr a nsp ort equations of 90 Sr and predicts the evolution o f 90 Sr concentration for year 2010. F or ea ch simulation, the o utput we consider is the v alue of 90 Sr concentration, predicted for year 2010, in a piezometer lo cated on the waste rep os itory site. 4.5.2 Gp mo deling and computation of Sobol indices T o mo del the concentration in the piezometer predicted by MAR THE co de in 2010 in func- tion o f the 20 input parameter s , w e fit a Gp metamo del conditionally to 30 0 simulations of the co de. The regre ssion a nd corr elation parameters of the Gp mo del ar e estimated by maximum likeliho o d and a pro cedure of input selection is used. The input v ariables int ro duced in the metamo del a re the sor ption co efficie nt o f the upper lay er (par ameter 14 denoted k d 1), an infiltration intensit y (parameter 20 denoted i 3 ) a nd the per meability of the upp er lay er (parameter 1 denoted per 1). The accura c y of the Gp mo de l is check ed with the estimatio n o f Q 2 by a cro ss v alidation on the lear ning sample. The predictivity co efficient estimated is: Q 2 = 0 . 9 2 . F rom pr e vious study (Marr el et a l. [15]), we ha ve found that the linear r e gressio n gives a Q 2 = 0 . 69 a nd the metamo del bas ed on b o osting of re g ression trees gives a Q 2 = 0 . 83 . F rom lab or atory measures a nd bibliographica l infor - mation, prior distributions have b een determined for the inputs k d 1, i 3 and per 1 and ar e resp ectively a W eibull, a trap ezoida l and a uniform distributions. The par ameters of these 16 distributions has been es timated o r fixed a prior i. Then, us ing the g lobal Gp mo del, the Sob ol indices defined by µ ˜ S i are computed (Eq. (12)) as well a s the standard deviation σ ˜ S i and the 90%-confidence interv al asso ciated (cf. metho do logy 3 .3). The results ar e presented in T able 3, with the Sob ol indices obtained with the pr e dictor-only approach and with the b o os ting predictor . [T able 3 ab out here.] The use of Gp mo del gives a b etter predictivity than the b o osting of regres sion trees (resp ectively Q 2 = 0 . 92 and Q 2 = 0 . 83) and conseq uent ly a more accura te es timation of Sob ol indices. Besides, the Sobo l indices estimated with the bo o s ting mo del do not even belo ng to the confidence in terv a ls given by the Gp mo del. Even if the sensitivity indices based on the predictor only and the o nes e s timated with the who le Gp mo del a re very close, the se cond appro ach has the a dv antage to give confidence interv als a nd consequently to ha ve a more r igorous analy sis. Without consider ing their in teractio ns, the 3 inputs k d 1, i 3 and per 1 explained nearly 90% of the total v ariance of the output and the most influent input is clea r ly k d 1, fo llowed by i 3 a nd per 1. So, k d 1 is the most imp ortant parameter to b e characterized in order to reduce the v ariability of the co ncent ratio n predicted by MAR THE co de. Using the whole Gp mo del, we also ha ve a n indication o f the accuracy of So b o l indices. The sta nda rd deviation of the indices a r e s mall and increase the confidence in the v alue estimated ( µ ˜ S kd 1 , µ ˜ S i 3 and µ ˜ S per 1 ). Mor eov er, the very small overlap of the 90 %-confidence in terv al of the 3 indices indicates that the order o f influence of the inputs is w ell deter mined a nd strongly c o nfirms the pr edominance of k d 1. So, the confidence interv als and the sta ndard deviation obtained with the whole Gp model give more confidence in the interpretation of So bo l indices. T aking in to account the v aria bility of the Gp model via its cov ariance structure gives more robustnes s to the r esults a nd their analys is. Howev e r , this increase of precision and confidence has a numerical cost. Indeed, the num b er o f numerical integrals being com- puted is of orde r O ( dn 2 ) where d is the num b er of inputs a nd n the num ber o f simulations, i.e. the lea rning sample size. The numerical cost dep ends also on the numerical preci- sion r equired for the appr oximation of the int egr a ls. Mo reov er, a hig h precisio n is often essential to provide the ro bustness of the computatio n o f So bo l indices, esp ecially when the distributio n of the inputs is narr ow and far from the uniform distribution (like the 17 W eibull distribution of k d 1). In this last case, it can b e judicious to adapt the numerical scheme in order to increase the precisio n in the region o f high densit y . 5 CONCLUSION W e hav e s tudied the Ga ussian pro ces s metamo del to per form sensitivity a nalysis, by es ti- mating Sobo l indices , of complex computer co des . This metamo del is built conditionally to a lea rning sample, i.e. to n simulations o f the co mputer co de. The Gp mo del pr op oses an analytical fo r mula which can b e directly used to derive ana lytical expres s ions of Sob ol indices. Indeed, in the case of indep endent inputs and with our choice o f r egressio n and cov a riance functions, the formula of Gp mo del le ads to one and t wo-dimensional numeri- cal integrals, av oiding a large num b er of metamo del predictor ev aluations in Monte Ca rlo metho ds. The use of Gp mo del instead o f o ther metamo del is therefore highly efficient . Another adv antage o f Gp metamo del stands in using its cov aria nce structure to compute Sob ol indices and to build a s so ciated confidence in terv a ls, b y using the glo bal sto chastic mo del including its cov ar iance. On analy tical functions, the b ehavior and conv erg e nc e of the So bo l index es timates were studied in function of the learning s a mple size n and the predictivity of the Gp metamo del. This a nalysis reveals the significant interest of the glo bal stochastic mo del approach when the Gp metamo del is inaccur ate or when few data are av ailable. Indeed, the use of the c ov a riance structure gives sensitivity indices whic h are more ro bust a nd less v aria ble. Moreover, all the distr ibutio n of the sensitivity index (defined as a random v aria ble) can b e simulated following an orig inal algor ithm. Co nfidence interv als of any level for the Sob ol index can then b e built. In our tests, the obser ved level of the interv al was compa red to the exp ected o ne on analytical functions. F or the highest v a lues o f Sob ol indices and under the hypothesis o f a Gp metamo del with a predictivity co efficient larg e r than 6 0%, the co nfidence interv als are s atisfactory . In this case, the use of the glo bal Gp mo del which gives confidence interv als for Sob ol indices has a sig nificant in terest. The only drawbac k is that the use o f cov ar iance structure has a tendency to give a minimal bo und for the influence of all the v ariables and co nsequently to ov e r estimate the low est Sob ol indice s a nd to give inaccura te confidence in terv a ls for very low indice s (close to zero). The use o f cov a riance structure was a lso illustrated o n real data , obtained from a 18 complex hydrogeologic al co mputer co de, simulating radionuclide g roundwater tr ansp ort. This application confirmed the in terest of the second approa ch and the adv a ntage o f Gp metamo del which, unlike other efficient metamo dels (neur al net works, reg ression trees, po lynomial chaos, . . . ), gives confidence interv als for the estimated sensitivity indice s. The same a ppr oach based o n the use of the global Gp metamo del can be used to mak e uncertaint y pr o pagation studies and to estima te the distribution o f the co mputer c o de output in function of the uncertainties on the inputs. 6 A CKNO WLEDGMENTS This work w as supp orted by the MRIMP pro ject of the “Risk Control Domain” that is managed by CEA/Nuclear Ener gy Division/Nuclear Developmen t and Innov ation Divi- sion. W e are als o grateful to S ´ ebastien da V eig a for helpful discus s ions. References [1] M.S. Bazar a a, H.D. Sherali, and C.M. Shetty . Nonline ar pr o gr amming . Jo hn Wiley & Sons, Inc, 199 3. [2] G.E. Box and N.R. Drap er. Empiric al mo del building and r esp onse surfac es . Wiley Series in Probability a nd Mathematical Statistics . Wiley , 1987 . [3] V.C.P . Chen, K-L. Tsui, R.R. Barton, and M. Meck esheimer. A r eview on desig n, mo deling and a pplications of computer experiments. IIE T r ansactions , 38:273–2 9 1, 2006. [4] W. Chen, R. Jin, and A. Sudjian to. A nalytical metamo del-based global sensitiv- it y a nalysis and uncertaint y propag ation for robust design. Journal of Me chanic al Design , 127 :875–8 86, 2005 . [5] J-P . Chil` es and P . Delfiner . Ge ostatistics: Mo deling sp atial unc ertainty . Wiley , New- Y ork , 1 999. [6] N.A.C. Cressie. Statistics for sp atial data . Wiley Series in P robability and Mathe- matical Statistics. Wiley , 1993. [7] C. Currin, T. Mitchell, M. Morris , and D. Ylvisa ker. Bay esian prediction of determin- istic functions with applications to the design a nd ana ly sis o f computer ex per iments. Journal of the Americ an St atistic al Asso ciation , 8 6(416):95 3–96 3 , 1991 . [8] K-T. F ang, R. Li, and A. Sudjianto. Design and mo deling for c omputer exp eriments . Chapman & Hall/CRC, 2 006. [9] J.C. Helton, J.D. Johnson, C.J. Sala b er ry , and C.B. Storlie . Sur vey of sampling- based metho ds for uncerta in ty a nd sensitivity analys is . R eliabili ty Engine ering and System Safety , 9 1:117 5 –120 9 , 2006 . 19 [10] B. Io oss , F. V an Dor pe, a nd N. Devictor. Resp ons e s urfaces and sens itivit y analys es for a n en viro nmen tal mo del of dose ca lculations. Reli ability Engine ering and System Safety , 91 :1241– 1251, 2006 . [11] J. Ja cques. Contributions ` a l’analyse de sensibilit ´ e et ` a l’analyse discriminante g ´ en´ er alis ´ ee . Th` ese de l’Universit ´ e Jos e ph F ourier, Gr enoble 1, 2005. [12] J. Kleijnen. Sensitivity analysis and related a na lyses: a revie w of some statistical techn iques. Journal of Statistic al Computation and S imu lation , 57 :1 11–1 42, 1997. [13] J. Kleijnen and R.G. Sar g ent. A metho dology for fitting and v alidating meta mo dels in simulation. Eur op e an Journal of Op er ational Rese ar ch , 12 0:14–2 9, 200 0 . [14] S.N. Lopha ven, H.B. Nielsen, and J. Sondergaa rd. D ACE - A Mat- lab kriging to o lbox, version 2.0 . T echnical Rep ort IMM-TR-2 002-1 2, Infor- matics and Ma thematical Mo delling, T echnical Universit y of Denmark, 2 002. < http ://ww w .immm.dtu.dk/ ∼ hb n/dace > . [15] A. Mar r el, B. Io o ss, F. V an Dorp e, and E. V olko v a. An efficient metho dology for mo d- eling complex computer co des with gaussian pro cesse s. Submitte d in Computational Statistics and Data Analysis , 20 07. [16] M. Marseguerr a, R. Mas ini, E. Zio, and G. Co jazzi. V ar iance deco mpo sition-based sensitivity ana lysis via neural netw o rks. Re liability Engine ering and System Safety , 79:229 –238, 20 03. [17] J.D. Ma rtin a nd T.W. Simpson. On the use of k r iging mo dels to approximate deter- ministic computer models . AIAA Journal , 4 3:4:853 –863, 2005 . [18] G. Ma theron. La Th´ eorie des V ariables R´ egiona lis´ ees, et ses A pplic ations . Les Cahiers du Centre de Mo rphologie Math´ ematique de F ont ainebleau, F as c ic ule 5. Eco le des Mines de Paris, 1970. [19] M.D. McKay , R.J . B eckman, a nd W.J. Conov er. A compariso n of three metho ds for selecting v alues of input v ariables in the analys is of output from a computer co de. T e chnometrics , 2 1:239– 245, 197 9. [20] J.E. Oakley and A. O’Hagan. Pro babilistic sens itivit y analysis of complex mo dels : a bay es ian approach. Journal of the Ro yal S tatistic al S o ciety, Series B , 66:75 1–769 , 2004. [21] A. O ’Hagan. B ay esian ana ly sis of computer co de outputs: A tutoria l. R eliability Engine ering and System Safety , 91:1 290–1 300, 2 006. [22] C.E. Ras m ussen and C.K.I. Williams. Gaussian pr o c esses for machine le arning . MIT Press, 2 0 06. [23] J. Sa cks, W.J. W elch, T.J. Mitchell, a nd H.P . Wynn. Design and analysis of c o mputer exp eriments. Statistic al Scienc e , 4 :409–4 35, 1989 . [24] A. Saltelli, K. Chan, and E.M. Sco tt, editors. Sen s it ivity analysis . Wiley Series in Probability and Statistics. Wiley , 20 00. [25] T. Santner, B. Williams, and W. Notz. The design and analysis of c omputer ex p eri- ments . Springer, 200 3. [26] I.M. Sob o l. Sensitivity estimates for non linear mathematical mo dels. Mathematic al Mo del ling and Computational Exp eriments , 1:4 07–41 4, 199 3. 20 [27] B. Sudret. Global sensitivity analy s is using p oly no mial chaos expa nsion. T o app e ar in R eliability Engine ering and S ystem Safety , 20 07. [28] E. V a zquez, E. W alter, and G. Fleury . Intrinsic krig ing and prior information. Applie d Sto chastic Mo dels in Business and Industry , 21:2 15–22 6, 200 5 . [29] E. V olko v a, B. Io oss , a nd F. V a n Dorp e. Global sensitivity analysis for a nu meri- cal mo del of r adionuclide migra tion from the RRC ” Kurchatov Institute” radwaste disp o sal s ite. T o app e ar in Sto chastic Envir onm en tal R ese ar ch and R isk Assesment , 2007. [30] R. von Mis es. Mathema tic al The ory of Pr ob ability and Statistics . Academic Press , 1964. [31] W.J. W elch, R.J. Buck, J. Sacks, H.P . Wynn, T.J. Mitchell, and M.D. Morris . Scree n- ing, predicting, and computer exp eriments. T e chnometrics , 34(1):15–2 5, 199 2. 21 List of F igures 1 Conv ergence of sensitivity ind ices in function of the p redictivit y coefficient Q 2 (g-Sobol function). . . . . . . . . . . . . . . . . . . . 23 2 Error in L 2 norm for sensitivity indices in fun ction of n and Q 2 (g-Sobol function). . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 4 3 Error in L 2 norm for sensitivity indices in fun ction of n and Q 2 (Ishigami function). . . . . . . . . . . . . . . . . . . . . . . . . . . 2 5 4 Conv ergence of the observe d lev el of the empirical 90%-confidence in fu nction of n and Q 2 (Ishigami function). . . . . . . . . . . . . . 26 22 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Q 2 S 1 Mean, 0.05−quantile and 0.95−quantile Predictor only Global Gp model Theoretical 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Q 2 S 2 Mean, 0.05−quantile and 0.95−quantile Predictor only Global Gp model Theoretical 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 Q 2 S 3 Mean, 0.05−quantile and 0.95−quantile Predictor only Global Gp model Theoretical 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 Q 2 S 4 Mean, 0.05−quantile and 0.95−quantile Predictor only Global Gp model Theoretical 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 Q 2 S 5 Mean, 0.05−quantile and 0.95−quantile Predictor only Global Gp model Theoretical Figure 1: Conv ergence of sensitivit y in d ices in function of the predictivity coefficient Q 2 (g-Sobol function). 23 25 30 35 40 45 50 55 60 65 70 75 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Learning sample size n Error in L 2 norm Mean, 0.05−quantile and 0.95−quantile Predictor only Global Gp model 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.2 0.4 0.6 0.8 1 1.2 1.4 Q 2 Error in L 2 norm Mean, 0.05−quantile and 0.95−quantile Predictor only Global Gp model Figure 2: Er ror in L 2 norm for s en sitivit y indices in fu nction of n and Q 2 (g-Sobol f unction). 24 25 30 35 40 45 50 55 60 65 70 75 0 0.2 0.4 0.6 0.8 1 1.2 1.4 Learning sample size n Error in L 2 norm Mean, 0.05−quantile and 0.95−quantile Predictor only Global Gp model 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.2 0.4 0.6 0.8 1 1.2 1.4 Q 2 Error in L 2 norm Mean, 0.05−quantile and 0.95−quantile Predictor only Global Gp model Figure 3: Error in L 2 norm f or sensitivit y indices in f u nction of n and Q 2 (Ishigami f u nction). 25 30 40 50 60 70 80 90 100 110 120 130 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Learning sample size n S 1 Observed level of confidence interval 90% theoretical level of confidence interval 30 40 50 60 70 80 90 100 110 120 130 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Learning sample size n S 2 Observed level of confidence interval 90% theoretical level of confidence interval 30 40 50 60 70 80 90 100 110 120 130 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Learning sample size n S 3 Observed level of confidence interval 90% theoretical level of confidence interval 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Q 2 S 1 Observed level of confidence interval 90% theoretical level of confidence interval 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Q 2 S 2 Observed level of confidence interval 90% theoretical level of confidence interval 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Q 2 S 3 Observed level of confidence interval 90% theoretical level of confidence interval Figure 4: Conv ergence of the observed level of the empirical 90%-confidence in function of n and Q 2 (Ishigami fu nction). 26 List of T ables 1 Connection b et ween the learnin g samp le size n an d the predictivity coefficient Q 2 (g-Sobol function). . . . . . . . . . . . . . . . . . . . 28 2 Real observe d lev el of the empirical 90%-co nfid ence in terv al b uilt with the Gp model for the Sob ol index of each input p arameter (g-Sobol function). . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3 Estimated Sob ol indices, associated standard deviation and confi- dence interv als for MAR THE data. . . . . . . . . . . . . . . . . . . 30 27 Learning sample size n Predictivit y co efficient Q 2 Mean Standard deviation 25 0.67 0.21 35 0.88 0.09 45 0.96 0.02 55 0.98 0.01 65 0.98 6 . 10 − 3 75 0.99 4 . 10 − 3 85 0.99 3 . 10 − 3 95 0.99 2 . 10 − 3 T able 1: Connection b et ween the learning sample size n and the p redictivit y coefficient Q 2 (g-Sobol function). 28 V ariable Theoretical v alue Mean of µ ˜ S i Observed level of the empirical of Sob ol index confidence interv al X 1 0.7164 0.7341 0.9381 X 2 0.1791 0.1574 0.9369 X 3 0.0237 0.0242 0.5830 X 4 0.0072 0.0156 0.8886 X 5 0.0001 0.0160 0.0674 T able 2: Real observe d lev el of the empirical 90%-confidence interv al bu ilt with the Gp mo del for the Sobol index of eac h input p arameter (g-Sobol function). 29 input v ariable Boosting of Predicto r only Whole Gp model regression trees (Gp mo del) S i S i µ ˜ S i σ ˜ S i 90%-confidence interv al p er1 0.03 0.081 0.078 0.020 [ 0 . 046 ; 0 . 112 ] kd1 0.90 0.756 0.687 0.081 [ 0 . 562 ; 0 . 825 ] i3 0.03 0.148 0.132 0.022 [ 0 . 100 ; 0 . 170 ] T able 3: Estimated Sob ol indices, asso ciated standard d eviation and confid en ce interv als for MAR THE data. 30

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment