Bayesian experimental design for the active nitridation of graphite by atomic nitrogen


Authors: Gabriel Terejanu, Rochan R. Upadhyay, Kenji Miki

Center for Predictive Engineering and Computational Sciences (PECOS), Institute for Computational Engineering and Sciences (ICES), The University of Texas at Austin

Abstract

The problem of optimal data collection to efficiently learn the model parameters of a graphite nitridation experiment is studied in the context of Bayesian analysis using both synthetic and real experimental data. The paper emphasizes that the optimal design can be obtained as a result of an information-theoretic sensitivity analysis. Thus, the preferred design is where the statistical dependence between the model parameters and observables is the highest possible. In this paper, the statistical dependence between random variables is quantified by mutual information and estimated using a k-nearest neighbor based approximation. It is shown that, by monitoring the inference process via measures such as entropy or Kullback-Leibler divergence, one can determine when to stop the data collection process. The methodology is applied to select the most informative designs on both a simulated data set and on an experimental data set previously published in the literature. It is also shown that the sequential Bayesian analysis used in the experimental design can also be useful in detecting conflicting information between measurements and model predictions.

Corresponding author: Gabriel Terejanu. E-mail address: terejanu@ices.utexas.edu. Address: ICES, The University of Texas at Austin, 1 University Station, C0200, Austin, TX 78712, USA.

Preprint submitted to Experimental Thermal and Fluid Science, October 31, 2018.

Keywords: Optimal experimental design, Uncertainty quantification, Bayesian analysis, Information gain, Mutual information

1. Introduction
The paper examines the problem of optimal data collection in order to calibrate a graphite nitridation model. The calibration of the model is done in a Bayesian framework, where the probability distribution of the uncertain parameters is inferred from observations using Bayes' theorem. Here, the goal of the experimental design is to select the optimal values for the control variables which can be tuned by the experimentalist such that the post-experimental uncertainty in the model parameters is reduced.

Since pre-experimental decisions have to be made before any measurements are taken, it is convenient to frame the design problem also in the Bayesian framework, in order to average over the unknown future observations. A comprehensive and unified view of Bayesian experimental design is given in Chaloner and Verdinelli (1995). In this paper, the optimal design is found by maximizing the expected utility when the purpose of the experiment is to estimate the parameters of the mathematical model. Thus the utility function employed in this work is based on Shannon's measure of information, Shannon (1948), which reflects our goal of parameter estimation as described by Lindley (1956). Other utility functions can be tailored when the purpose of the experiment is prediction, hypothesis testing, or mixed objectives including minimizing the financial cost of the experiment.

The proposed approach is applied to a sequential design, where the acquisition of the data can be done in the context of experiments on demand. Such a sequential approach is described by Fedorov (1972) and Loredo and Chernoff (2003), which clearly identify the three stages of the experimental process: experiment - inference - design.
The data already collected is used to update the probability density function (pdf) of the parameters, and the result of this inference can be further used in identifying the next design which best resolves our questions of interest in the model. After the experiment is performed, the cycle continues with a new inference step, followed by a design step, and so on until the experimental objective is reached.

Due to the high computational complexity of Bayesian experimental design, its adoption is rather low, especially when dealing with nonlinear models. A large body of work can be found in the literature on optimal design for linear or linearized models under the name of Bayesian alphabetic criteria; see DasGupta (1995). However, in recent years a revived interest in Bayes optimal designs for nonlinear models can be attributed to Bayesian recursive updates, efficient estimators of information-theoretic measures, and to Markov chain Monte Carlo (MCMC) algorithms which can efficiently sample complicated posterior distributions. In Muller (1999), the author makes use of MCMC to create a new simulation-based design approach which jointly samples from an artificial probability model on the design, data, and parameters.

By exploiting the independence property of the noise from the design, Sebastiani and Wynn (1997) extended the maximum entropy sampling proposed by Shewry and Wynn (1987) to estimation problems. They show that the experiment which provides the maximum amount of information about the model parameters is the one for which the predictive distribution has the largest entropy. In other words, the most informative experiment is the one where we know the least. A similar information-theoretic approach is presented by Farhang-Mehr and Azarm (2002), which maximizes the entropy of Gaussian priors.
These methods are aligned with previous ideas found in Lindley (1956) of incorporating information-theoretic approaches in the experimental design process.

The main contribution of this paper is to emphasize the intuitive interpretation of Bayesian experimental design as an information-theoretic sensitivity analysis, a methodology which is applied to an engineering problem for which real experimental measurements exist. It is shown that the optimal design for estimation problems is the one which maximizes the mutual information of the parameters and the future observations. Given the connection between mutual information and copula functions used to model statistical dependencies, see Calsaverini and Vicente (2009), this interpretation of Bayesian experimental design reveals that optimal sampling for parameter estimation can be yielded by an information-theoretic sensitivity analysis; see also Appendix B. Thus the optimal design occurs where the statistical dependence between observables and parameters is maximized. In contrast to maximum entropy sampling, no assumptions are made about the functional dependence of the entropy of the conditional distribution on the design.

In the inference stage, we use an adaptive MCMC algorithm proposed by Cheung and Beck (2009) to obtain samples from the posterior distribution of the model parameters. Estimators based on k-nearest neighbors are used to compute the information-theoretic measures required in the design stage. The use of these estimators is advantageous when only samples are available to describe the underlying distributions. The mutual information is estimated using Kraskov's approach, see Kraskov et al. (2004), which extends the k-nearest neighbor based estimator for differential Shannon entropy developed by Kozachenko and Leonenko (1987).
While papers on experimental design can be found in geoscience (Guest and Curtis (2009)), neuroscience (Paninski (2005)), biomedical applications (Clyde et al. (1996), Chung and Haber (2011), Horesh et al. (2011)), and engineering (Tucker (2008)), just to name a few, the application of Bayesian design principles to actual experiments still lags far behind the theoretical advancements; see Chaloner and Verdinelli (1995). In Curtis and Maurer (2000), the authors compare the small number of papers published on average per year on experimental design to the large number of papers published on inverse methods, emphasizing the disconnect between the amount of data and the amount of information contained in the data. A more efficient learning of model parameters can be accomplished by using Bayesian experimental design, which tightly couples computational modeling, experimental endeavors, and data analysis.

In this work, we consider an experiment for the nitridation of graphite conducted by Zhang et al. (2009). The main objective of the experiment is to measure the reaction rate of graphite with active nitrogen. This quantity is of great importance for assessment of the effectiveness of the thermal protection system of space vehicles. The main parameter of interest is the reaction probability of the graphite nitridation reaction. The values of this quantity presented in the literature (Park and Bogdanoff (2006), Goldstein (1964), Suzuki et al. (2008), Zhang et al. (2009)) vary by several orders of magnitude. The experiments reported in Zhang et al. (2009) have a detailed description of the scenarios with readily available data. The authors conducted a series of runs at different conditions that are not based on experimental design considerations. Hence the effectiveness of a rigorous experimental design strategy can be demonstrated.
In Section 2 the problem of optimal experimental design is described in the Bayesian framework. The experimental setup for the nitridation of graphite is detailed in Section 3, followed by the model description in Section 4. The numerical results for both simulated data and real experimental data are presented in Section 5, and the concluding remarks are given in Section 6.

2. Description of the experimental design

Given a set of observations $D_{n-1} = \{\tilde{d}_1, \tilde{d}_2, \ldots, \tilde{d}_{n-1}\}$, we are concerned with finding the next experimental design $\xi_n \in \Xi$ such that the model parametric uncertainty is reduced after the experiment is performed and the associated measurement data $\tilde{d}_n$ is collected. The mathematical models used in this paper are generally represented by the following abstract model:

$$r(u, \theta, \xi, \epsilon_s) = 0 \quad (1)$$
$$d = d(u, \theta, \xi, \epsilon_m) \quad (2)$$

Here $u$ is the state of the system, which obeys the governing equations defined by $r(\cdot)$; $\xi \in \Xi$ denotes the control variables associated with the experimental scenario, $\Xi$ is the design space, and $\theta \in \Theta$ are the model parameters, where $\Theta$ is the parameter space of the model. The model prediction $d_n$, calculated using the measurement model $d(\cdot)$, is comparable with the experimental data $\tilde{d}_n$ for a particular scenario input $\xi_n$. The random vector or random field $\epsilon_s$ captures any stochastic forcing present in the governing equations, such as unmodeled dynamics, and $\epsilon_m$ models the discrepancy between model predictions and experimental data.

2.1. Bayesian experimental design

In the following it is assumed that we can afford to perform up to $N$ experiments to obtain the desired measurements. Preferably, one will want to perform as few informative experiments as possible in order to calibrate the model.
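As a concrete toy instance of the abstract model in Eqs. (1)-(2), the sketch below wires a trivial governing equation to a measurement model with additive discrepancy; the function name and noise level are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy instance of Eqs. (1)-(2); `forward_model` is a hypothetical name.
def forward_model(theta, xi, rng=None):
    """Solve r(u, theta, xi) = u - theta * xi = 0 for the state u, then
    evaluate the measurement model d = u + eps_m."""
    u = theta * xi                                          # governing equation r(.) = 0
    eps_m = 0.0 if rng is None else rng.normal(0.0, 0.01)   # measurement discrepancy
    return u + eps_m
```

With the discrepancy switched off, the model prediction is deterministic in $(\theta, \xi)$, which is the role played by $d(\cdot)$ in the design loop below.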
The experimental design process employed in this paper generates an optimal sequence of designs such that the information gain is maximized after each experiment. More about this sequential process of designing experiments can be found in Fedorov (1972). Note that since this decision process is sequential, and the next design depends on the previous experiments, it is not guaranteed to find the optimal $N$ designs among all possible combinations of $N$ designs. That would correspond to a batch update of the prior parametric uncertainty and an expensive combinatorial decision process, which would require a large number of Monte Carlo samples to estimate the necessary expected utilities for each possible combination of designs.

The entire experimental process is divided into three stages: the experiment stage, the inference stage, and the design stage. In the experiment stage, new data is collected according to the strategy obtained in the previous design stage. Initially, this data can also be obtained from any experiments documented in the literature that are related to the underlying modeling problem. In the inference stage, the newly obtained experimental data is used to update the prior pdf of the model parameters using Bayes' rule. The resulting pdf is further used to predict the distribution of the future observations for a variety of scenarios which cover the design space. The best design is chosen by maximizing the expected Shannon information gain, the expectation being taken with respect to the distribution of future data. The experimental process is depicted in Fig. 1.

2.2. Bayesian inference step

The inference stage consists in applying Bayes' theorem to calculate the distribution of the model parameters conditioned on all the available data, $p(\theta | D_n)$.
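The three-stage cycle above can be sketched as a generic loop; `perform_experiment`, `update_posterior`, and `expected_information_gain` are hypothetical stand-ins for the experiment, inference, and design stages, not the paper's implementation.

```python
# Hedged sketch of the sequential experiment - inference - design loop.
def sequential_design(prior, designs, n_experiments,
                      perform_experiment, update_posterior,
                      expected_information_gain):
    posterior, history = prior, []
    xi = designs[0]                       # initial design, e.g. from the literature
    for _ in range(n_experiments):
        d = perform_experiment(xi)        # experiment stage: collect new datum
        posterior = update_posterior(posterior, d, xi)   # inference stage (Bayes' rule)
        # design stage: pick the design maximizing the expected information gain
        xi = max(designs, key=lambda x: expected_information_gain(posterior, x))
        history.append((xi, d))
    return posterior, history

# tiny smoke demo with dummy stages (purely illustrative)
post, hist = sequential_design(
    prior=0.0, designs=[0, 1, 2], n_experiments=3,
    perform_experiment=lambda xi: xi + 1,
    update_posterior=lambda p, d, xi: p + d,
    expected_information_gain=lambda p, x: x)
```

In practice the loop also checks a stopping rule between stages, e.g. a threshold on the posterior entropy, as discussed in Subsection 2.2.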
Using the conditioning rule, one can derive the following recursive Bayesian update:

$$p(\theta | D_n) = \frac{p(D_n | \theta)\, p(\theta)}{p(D_n)} = \frac{p(\tilde{d}_n, D_{n-1} | \theta)\, p(\theta)}{p(\tilde{d}_n, D_{n-1})} = \frac{p(\tilde{d}_n | D_{n-1}, \theta)\, p(D_{n-1} | \theta)\, p(\theta)}{p(\tilde{d}_n | D_{n-1})\, p(D_{n-1})} = \frac{p(\tilde{d}_n | D_{n-1}, \theta, \xi_n)\, p(\theta | D_{n-1})}{p(\tilde{d}_n | D_{n-1}, \xi_n)} \quad (3)$$

Under the assumption of conditionally independent measurements given the model parameters, the above expression can be simplified using the relation $p(\tilde{d}_n | D_{n-1}, \theta, \xi_n) = p(\tilde{d}_n | \theta, \xi_n)$. The denominator is given by

$$p(\tilde{d}_n | D_{n-1}, \xi_n) = \int_\Theta p(\tilde{d}_n | \theta, \xi_n)\, p(\theta | D_{n-1})\, d\theta \quad (4)$$

and quantifies the evidence provided by the new experimental data, $\tilde{d}_n$, in support of our model conditioned on all the previously measured data. The use of Bayesian inference in the sequential experimental design is advantageous as it permits the iterative accumulation of information. The posterior distribution obtained in this stage becomes the prior distribution in the next stage of inference. After all $N$ experiments have been performed, the last posterior distribution summarizes the information contained in all the available measurements.

Instead of performing $N$ experiments, the data collection process can be stopped earlier, when precision measures computed in the interim stages, such as the determinant of the sample covariance matrix, satisfy thresholds predefined by the user. The process can also be ceased when the rate of reduction of the uncertainty in the parameters has slowed enough, or when indication exists that model predictions and the corresponding experimental observations are in disagreement.

The inverse problem of calibrating the model parameters from the measurement data is solved using Markov chain Monte Carlo simulations.
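The recursion of Eqs. (3)-(4) can be illustrated with a grid-based update for a scalar parameter and a Gaussian likelihood. This is a sketch for intuition only; the paper itself uses MCMC via QUESO, and the linear model and noise level here are our own assumptions.

```python
import numpy as np

def bayes_update(prior, theta_grid, d_obs, xi, sigma=0.1):
    """Return p(theta | D_n) on a grid given p(theta | D_{n-1}) and a new datum,
    assuming the toy prediction d(theta, xi) = theta * xi with Gaussian noise."""
    pred = theta_grid * xi
    like = np.exp(-0.5 * ((d_obs - pred) / sigma) ** 2)   # p(d_n | theta, xi_n)
    post = like * prior
    dz = theta_grid[1] - theta_grid[0]
    evidence = np.sum(post) * dz                          # denominator, Eq. (4)
    return post / evidence

theta_grid = np.linspace(0.0, 3.0, 601)
prior = np.ones_like(theta_grid) / 3.0                    # uniform prior on [0, 3]
post = bayes_update(prior, theta_grid, d_obs=1.5, xi=1.0)
theta_map = theta_grid[np.argmax(post)]                   # posterior mode
```

The returned density integrates to one by construction, so it can serve directly as the prior for the next stage of the recursion.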
In our simulations, samples from the posterior distribution $\theta \sim p(\theta | D_n)$ are obtained using the statistical library QUESO, Prudencio and Schulz (2011), equipped with the Hybrid Gibbs Transitional Markov Chain Monte Carlo method proposed by Cheung and Beck (2008, 2009).

2.3. Optimal experimental design step

According to Lindley (1956), when the objective of the experiment is to learn about the model parameters $\theta$, the utility function is given by the amount of information provided by the measurement $\tilde{d}_n$ as a result of performing the experiment with the design $\xi_n$. Since $\tilde{d}_n$ has not yet been observed, and we have to make a decision regarding the design $\xi_n$ prior to the experiment, in the following we use the unknown measurement $d_n$ and its corresponding predictive distribution.

$$U(d_n, \xi_n) = \int_\Theta p(\theta | d_n, D_{n-1}) \log p(\theta | d_n, D_{n-1})\, d\theta - \int_\Theta p(\theta | D_{n-1}) \log p(\theta | D_{n-1})\, d\theta \quad (5)$$

Here, the prior pdf for the model parameters is given by $p(\theta | D_{n-1})$, and it is used to compute the following predictive distribution for the observables when the input scenario is $\xi_n$:

$$p(d_n | D_{n-1}, \xi_n) = \int_\Theta p(d_n | \theta, \xi_n)\, p(\theta | D_{n-1})\, d\theta \quad (6)$$

Since we are in the pre-experimental stage, we can compute the average amount of information provided by an experiment $\xi_n$ by marginalizing over all the unknown future observations:

$$E_{d_n}[U(d_n, \xi_n)] = \int_D U(d_n, \xi_n)\, p(d_n | D_{n-1}, \xi_n)\, dd_n \quad (7)$$

The optimal experiment is obtained by solving the following optimization problem:

$$\xi_n^* = \arg\max_{\xi_n \in \Xi} E_{d_n}[U(d_n, \xi_n)] \quad (8)$$

By substituting Eq. (5) into Eq. (7), the expected utility can be written in the following form:

$$E_{d_n}[U(d_n, \xi_n)] = \int_D \int_\Theta p(\theta, d_n | D_{n-1}, \xi_n) \log \frac{p(\theta, d_n | D_{n-1}, \xi_n)}{p(d_n | D_{n-1}, \xi_n)}\, d\theta\, dd_n - \int_D \int_\Theta p(\theta, d_n | D_{n-1}, \xi_n) \log p(\theta | D_{n-1})\, d\theta\, dd_n \quad (9)$$

$$= \int_D \int_\Theta p(\theta, d_n | D_{n-1}, \xi_n) \log \frac{p(\theta, d_n | D_{n-1}, \xi_n)}{p(\theta | D_{n-1})\, p(d_n | D_{n-1}, \xi_n)}\, d\theta\, dd_n = I(\theta; d_n | D_{n-1}, \xi_n) \quad (10)$$

Therefore the optimal experiment is the one which maximizes the mutual information between the model parameters and the model predictions. In other words, we would like to sample where the expected observations have the highest impact on the model parameters.

$$\xi_n^* = \arg\max_{\xi_n \in \Xi} I(\theta; d_n | D_{n-1}, \xi_n) \quad (11)$$

From information theory, see for example Cover and Thomas (1991), mutual information quantifies the reduction in uncertainty that knowing the model parameters provides about the model predictions, and vice versa. Being written as the Kullback-Leibler divergence between the joint distribution of parameters and predictions and the product of their marginals, we can also say that mutual information provides a measure of statistical dependence between the two random variables. This has been shown more formally in Calsaverini and Vicente (2009) by making the connection between mutual information and copula functions, which are used to model the dependence between random variables. For completeness this connection is also presented in Appendix B. It turns out that the mutual information is just the negative copula entropy. Hence, mutual information is independent of the marginal distributions and quantifies only the dependence information contained in the copula function.

In neuroscience, Paninski (2005) uses the same Bayesian optimal design based on maximizing the mutual information between model parameters and model predictions. It is shown that the information-maximization sampling is asymptotically more efficient than an independent and identically distributed sampling strategy.
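For intuition about Eq. (11), consider a linear-Gaussian toy problem of our own (not from the paper): $d = \theta \xi + \epsilon$ with $\theta \sim N(0, \sigma_\theta^2)$ and $\epsilon \sim N(0, \sigma_\epsilon^2)$. The mutual information then has the closed form $I(\theta; d \,|\, \xi) = \tfrac{1}{2} \log(1 + \xi^2 \sigma_\theta^2 / \sigma_\epsilon^2)$, which is monotone in $|\xi|$, so the optimal design lies on the boundary of $\Xi$.

```python
import numpy as np

# Closed-form mutual information for the linear-Gaussian toy design problem.
def mutual_information(xi, s_th=1.0, s_e=0.1):
    """I(theta; d | xi) = 0.5 * log(1 + xi^2 s_th^2 / s_e^2), in nats."""
    return 0.5 * np.log(1.0 + (xi * s_th / s_e) ** 2)

# discretize the design space and pick the MI-maximizing design, as in Eq. (11)
designs = np.linspace(-1.0, 1.0, 21)
xi_star = designs[np.argmax([mutual_information(x) for x in designs])]
```

Because the expected utility grows with $|\xi|$ here, the discrete search selects a boundary design, which mirrors the behavior of IM sampling in Example 1 below.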
Compared with the maximum entropy sampling for parameter estimation proposed by Sebastiani and Wynn (2000), the sampling based on maximizing the mutual information is more general: no independence assumption is made about the discrepancy model with respect to the design.

Finally, in the design step, the experimental space $\Xi$ is discretized and the corresponding marginal distributions of model predictions and parameters, as well as the joint pdf of the two, are computed. The mutual information corresponding to each experimental design is calculated as in Subsection 2.4, and the design corresponding to the highest MI is selected for the next experiment. Here, samples from the joint distribution of parameters and model predictions are obtained using Monte Carlo simulations according to the following relation:

$$p(d_n, \theta | D_{n-1}, \xi_n) = p(d_n | \theta, \xi_n)\, p(\theta | D_{n-1}) \quad (12)$$

2.4. Estimating the mutual information

The estimation of the mutual information in this paper is done using the kNN (k-nearest neighbor) method proposed by Kraskov et al. (2004). This is based on the kNN estimator for the entropy proposed by Kozachenko and Leonenko (1987). The entropy of a random variable can be approximated from $N$ samples using the Kozachenko-Leonenko estimator:

$$H(X) \approx -\psi(k) + \psi(N) + \frac{d_X}{N} \sum_{i=1}^{N} \log \epsilon(i) \quad (13)$$

where $\psi(\cdot)$ is the digamma function, $d_X$ is the dimensionality of the random variable $X$, and $\epsilon(i)$ is the distance from sample $X_i$ to its $k$th nearest neighbor, $X_{k(i)}$, given by the maximum norm:

$$\epsilon(i) = \| X_i - X_{k(i)} \|_\infty \quad (14)$$

This is a biased estimator due to the assumption of local uniformity of the density. By definition, the MI can be decomposed as the sum of the marginal entropies minus the joint entropy. Therefore, the above estimator can be used to approximate all three entropies and thus the mutual information.
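A direct sketch of the Kozachenko-Leonenko estimator of Eq. (13): we follow the Kraskov et al. convention in which $\epsilon(i)$ is twice the max-norm distance to the $k$th nearest neighbor (this supplies the unit-ball volume term), and we replace the digamma function by its integer-argument harmonic-number form to stay dependency-free. The sample size, seed, and test distribution are our own choices.

```python
import numpy as np

def psi(m):
    """Digamma at a positive integer m: psi(m) = -gamma + H_{m-1}."""
    return -0.5772156649015329 + sum(1.0 / i for i in range(1, int(m)))

def kl_entropy(samples, k=3):
    """Kozachenko-Leonenko kNN entropy estimate (nats), cf. Eq. (13)."""
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    # brute-force pairwise max-norm distances; O(n^2), fine for a sketch
    dist = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)
    np.fill_diagonal(dist, np.inf)
    eps = 2.0 * np.sort(dist, axis=1)[:, k - 1]   # twice distance to k-th neighbor
    return -psi(k) + psi(n) + d * np.mean(np.log(eps))

# sanity check: the differential entropy of U(0, 1) is exactly 0 nats
rng = np.random.default_rng(0)
h_est = kl_entropy(rng.uniform(0.0, 1.0, size=1500), k=3)
```

As the text notes, the estimator is biased by the local-uniformity assumption, but for smooth densities the bias shrinks as the sample size grows.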
However, the errors made in estimating the individual entropies will not cancel out in general. To partially alleviate this problem, Kraskov's approach is not to fix $k$ when estimating the marginal entropies. It can be shown, see Kraskov et al. (2004), that the MI of the parameters and the model prediction can be computed as follows:

$$I(\theta; d_n | D_{n-1}, \xi_n) \approx \psi(k) - \frac{1}{N} \sum_{i=1}^{N} \left[ \psi(n_\theta(i) + 1) + \psi(n_d(i) + 1) \right] + \psi(N) \quad (15)$$

where $n_d(i)$ and $n_\theta(i)$ are the numbers of points in the respective marginal spaces within a distance $\epsilon(i)$ of the $i$th sample. Here, $\epsilon(i)$ is the distance from the $i$th sample to its $k$th nearest neighbor in the joint space.

Based on numerical studies, Khan et al. (2007) show that estimators based on kNN and kernel density estimation (KDE) outperform commonly used dependence measures such as linear correlation, the cross-correlogram, or Kendall's $\tau$. Furthermore, it is also shown that in general the kNN estimator captures nonlinear dependence better than the KDE. However, Kraskov's MI estimator has to be used with care: a small value for $k$ will result in a small bias but a larger variance, and vice versa. Also, the efficiency of the estimator decreases as the dimensionality of the joint space increases.

For completeness, a test has been carried out to show the performance of the kNN estimator for the MI as a function of the dimensionality of the joint space, the number of samples, and the number of nearest neighbors $k$. The joint pdf of two random variables of known dimensions is modeled using a Gaussian density function for which we can compute the exact MI. For each combination of joint dimensionality, $k$, and sample size, 1000 trial runs have been used to generate samples according to the Gaussian density function. The entries in the covariance matrix have been set to 1 on the diagonal and 0.9 for the off-diagonal elements.
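Eq. (15) can be sketched with a brute-force O(N²) neighbor search. In the spirit of the Gaussian study described above, the estimate is checked against the exact MI of a correlated Gaussian pair with $\rho = 0.9$, for which $I = -\tfrac{1}{2}\log(1-\rho^2) \approx 0.830$ nats; the sample size and seed are our own choices.

```python
import numpy as np

def psi(m):
    """Digamma at a positive integer m: psi(m) = -gamma + H_{m-1}."""
    return -0.5772156649015329 + sum(1.0 / i for i in range(1, int(m)))

def kraskov_mi(x, y, k=6):
    """Kraskov et al. (2004) estimator of I(X; Y), cf. Eq. (15)."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    n = len(x)
    dx = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=2)  # max-norm, X space
    dy = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=2)  # max-norm, Y space
    dz = np.maximum(dx, dy)                                     # max-norm, joint space
    np.fill_diagonal(dz, np.inf)
    eps = np.sort(dz, axis=1)[:, k - 1]        # distance to k-th neighbor, joint space
    # neighbors strictly within eps(i) in each marginal space; -1 drops the point itself
    n_x = np.sum(dx < eps[:, None], axis=1) - 1
    n_y = np.sum(dy < eps[:, None], axis=1) - 1
    return psi(k) + psi(n) - np.mean([psi(i + 1) + psi(j + 1)
                                      for i, j in zip(n_x, n_y)])

rng = np.random.default_rng(1)
cov = [[1.0, 0.9], [0.9, 1.0]]
xy = rng.multivariate_normal([0.0, 0.0], cov, size=1000)
mi_est = kraskov_mi(xy[:, 0], xy[:, 1], k=6)   # exact value ~0.830 nats
```

Note the variable-$k$ effect mentioned in the text: the marginal counts $n_x(i)$, $n_y(i)$ play the role of an adaptive $k$ in the marginal spaces, which is what lets the individual entropy errors partially cancel.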
Using the true value of the MI and the estimated value, one can compute the relative absolute error for each run. The average absolute relative error over the 1000 trial runs is shown in Table 1.

2.5. Example 1: Simulation experiment

Consider the following nonlinear design problem:

$$d = \theta_1 \xi + \theta_2 \xi^2 + f(\xi)\, \epsilon \quad (16)$$

The goal of the experimental design in this example is to efficiently learn the model parameters, $\theta_1$ and $\theta_2$, when the design space is given by $\Xi = [-1, 1]$. The discrepancy model is given by $f(\xi)\epsilon$, where $\epsilon$ is a standard normal random variable, $N(0, 1)$. The two parameters are uniformly distributed according to $\theta_1 \sim U(1, 2)$ and $\theta_2 \sim U(3, 4)$. Two cases are analyzed: additive noise, when $f(\xi) = 0.01$, and noise that is multiplicative with respect to the design, when $f(\xi) = 0.5(1.1 - |\xi|)$. Given the initial uncertainty of the parameters and the discrepancy function, the response of the model is shown in Figs. 2(a)-2(b).

In this respect we want to compare maximum entropy sampling with the information maximization sampling described in this paper. Hence, the two cost functions to be maximized in the design stage are:

• $J(\xi_n) = I((\theta_1, \theta_2); d_n | D_{n-1}, \xi_n)$ for information maximization (IM) sampling

• $J(\xi_n) = H(d_n | D_{n-1}, \xi_n)$ for maximum entropy (ME) sampling

The two sampling approaches, IM and ME, are also compared with a strategy based on random sampling (RND), which at each stage randomly chooses a design, all designs being equally likely at every stage.

In this example we consider that up to 10 experiments can be performed, with scenario values equally distributed in the design space as indicated by the dashed lines in Figs. 2(a)-2(b). For each case corresponding to the discrepancy function, additive noise or multiplicative noise, two further cases are analyzed with respect to experimental sampling restrictions.
In the first case no repeated measurements are allowed. Thus, once an experimental design has been chosen from the set of 10 designs, it can no longer be used in the subsequent stages. The second case corresponds to allowing repeated measurements in our calibration.

Synthetic measurements are created using Eq. (16), where the true values of the parameters are set to $\theta_1 = 1.5$ and $\theta_2 = 3.5$. A different noise sample has been generated for each of the 10 scenarios. For this example we have used 2000 samples in computing the MI, and $k$ has been set to 6.

For the two cases considered in this example (additive and multiplicative noise), each strategy will yield a sequence of designs. At each stage of the design, the performance of each strategy is given by the reduction in uncertainty of the parameters, which is quantified by the entropy of the parameter distribution. The entropy is computed here using Eq. (13), and for comparison purposes, since we are generating synthetic measurements and use a random sampling approach, the entropy is averaged over 100 Monte Carlo runs.

As expected, for the additive noise the maximum entropy sampling is equivalent to the information maximization sampling, as shown in Fig. 3(a) and Fig. 3(c). Table 2 shows that in this case the highest initial reduction in uncertainty is obtained by starting the sampling at the boundary of the design domain. For multiplicative noise, however, due to the discrepancy model used, maximum entropy sampling is forced to start the sampling using the design points in the middle of the domain, where the uncertainty is the highest; see Table 3 and Fig. 2(b). Because of the distribution of the noise, the performance of the maximum entropy strategy is worse on average even compared with random sampling.
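The synthetic-data setup of Eq. (16) can be sketched as follows; evaluating the discrepancy amplitude $f(\xi)$ at the middle and boundary of the design space makes the behavior of ME sampling under multiplicative noise visible. The function name is ours, but the model, noise forms, and true parameter values are those stated above.

```python
import numpy as np

# Forward model of Eq. (16); theta_1 = 1.5, theta_2 = 3.5 are the true values
# used to generate the synthetic measurements.
def simulate(theta1, theta2, xi, noise="additive", rng=None):
    f = 0.01 if noise == "additive" else 0.5 * (1.1 - abs(xi))  # discrepancy f(xi)
    eps = 0.0 if rng is None else rng.normal()                  # standard normal
    return theta1 * xi + theta2 * xi ** 2 + f * eps

# For the multiplicative case the noise amplitude peaks at xi = 0 and is
# smallest at the boundary of Xi = [-1, 1]; ME sampling is therefore drawn
# to the (least informative) middle designs while IM picks the boundary.
f_mid = 0.5 * (1.1 - abs(0.0))    # amplitude at the middle design
f_edge = 0.5 * (1.1 - abs(1.0))   # amplitude at a boundary design
```

The gap between `f_mid` and `f_edge` is exactly the mechanism behind the ME results in Table 3: maximum predictive entropy and maximum parameter information need not coincide once the noise depends on the design.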
A faster reduction in uncertainty is obtained in this case using information maximization sampling, as shown in Fig. 3(b) and Fig. 3(d). For both the additive and multiplicative noise cases, an accurate approximation to the true values of the parameters is given by information maximization sampling using only two experiments, as shown in Figs. 4(a)-4(d). Therefore, an efficient learning of model parameters in general nonlinear design problems can be achieved using information maximization sampling.

Regarding the experimental sampling restriction, in both cases, whether or not we allow repeated measurements, the performance of IM is superior when compared with ME and RND; see Fig. 3. When repeated measurements are allowed and we have additive noise, both IM and ME consistently select the same boundary design points; see Table 2. For multiplicative noise and repeated measurements, we have the same consistency in choosing the designs, IM selecting the boundary design points while ME selects the middle design points, which in this case are the least informative designs. This explains the poor performance of ME compared with IM and RND; see Fig. 3(d).

3. A description of the experimental setup and data

The experimental scenario is described in much detail in Zhang et al. (2009). For completeness, we briefly describe the experimental setup, the measurements, and their associated uncertainties. The experiments were performed to measure the reaction probability of the graphite nitridation reaction, C(Gr) + N → CN. High-purity graphite (grade DFP-2) in the form of 3.175 mm diameter rods was used as the sample. The sample was placed in a furnace-heated quartz tube, shown in Fig. 10. A quartz flow tube (22 mm inner diameter) extended through a high-temperature tube furnace with a hot region of approximately 45.5 cm.
Atomic nitrogen was generated by flowing nitrogen through a microwave discharge located upstream of the furnace. The microwave discharge was operated at a constant power of around 90-95 W under steady-state temperature and gas flow conditions. The pressure was measured before and after the furnace using a 10 torr Baratron capacitance manometer. The data sheet from the instrument reports the uncertainty as ±1% of the recorded value. The temperatures of the flow tube were measured using Omega stick-on type K thermocouples at the locations of both pressure ports and near the entrance and exit of the furnace. Temperatures were monitored periodically to ensure steady-state operation. The gas pressure and bulk velocity in the flow tube were varied by simultaneous adjustment of the incoming N2 flow rate with a throttling valve located in the downstream pumping manifold. The uncertainty in the flow rate is quoted as being no more than 1%. The mole fractions of atomic nitrogen entering and exiting the furnace were measured by gas-phase titration with nitric oxide. In this method, controlled quantities of nitric oxide (NO) are introduced into the titration region. The NO reacts extremely rapidly with atomic N, and the disappearance of the NO signals the titration end point.

The nitrogen atoms that strike the graphite surface react with the graphite sample and cause mass loss. Each sample was weighed immediately before it was placed into the flow tube and immediately after it was removed from the flow tube, using a Mettler Toledo XP105 analytic balance with a 0.01 mg resolution. Control experiments were run at various conditions without the microwave discharge to measure the extra mass loss that occurs due to de-volatilization of some residual hydrocarbons in the sample or reaction with oxygen leaking into the flow system.
The measured mass loss was adjusted by subtracting the control mass loss. The mass loss rate was obtained by dividing by the time span of the experiment. The most extensive set of measurements has been performed at a furnace temperature of 1273 K. The measurements that are required for the calibration of the various parameters described in Section 4 are the pressure exiting the furnace and the mass loss of the sample.

4. Model description and uncertain parameters

In this section, we present the model that is used to compute the measured data as a function of the model and control parameters, the system defined abstractly in Eq. (1) in Section 2. The experiment (Zhang et al. (2009)) is briefly described in Section 3. Since the focus of this work is not the detailed physical modeling but the investigation of the most informed experimental design strategy, we use an extremely simplified model with several assumptions proposed by Zhang et al. (2009). The flow is assumed to be one dimensional and given by the Hagen-Poiseuille flow model with temperature-varying density given by the ideal gas law. The bulk flow consists entirely of nitrogen, as the concentrations of atomic nitrogen and cyanates are negligibly small. The temperature is assumed not to vary with the tube radius: the temperature along the cross section of the tube is the same as the wall temperature. The wall temperature is linearly interpolated between known temperature values along the tube length. The extra pressure drop due to the sample holder is taken into account by using an effective tube diameter in the Hagen-Poiseuille flow model. With these assumptions, the pressure, mean velocity and density profiles along the tube are obtained by solving:

    P \frac{dP}{dz} = - \frac{128 \, \dot{m}_{N_2} \, \mu_{N_2} \, R \, T(z)}{\pi \, d_{eff}^4 \, W_{N_2}}    (17)

    v(z) = \frac{4 \, \dot{m}_{N_2}}{\rho(z) \, \pi \, d_{eff}^2}    (18)

    \rho(z) = \frac{P(z) \, W_{N_2}}{R \, T(z)}    (19)

    P(z=0) = P_1; \quad v(z=0) = U_1; \quad T(z) = \Psi(z)    (20)

In the above, P(z) denotes the pressure as a function of the length z along the tube, v(z) is the bulk flow velocity, \dot{m}_{N_2} is the mass flow rate of N2, \rho(z) is the density of N2, T(z) is the temperature of the wall and also of the fluid along the tube, given by interpolating the measured temperatures (\Psi(z) is the linear interpolant), d_{eff} is the effective diameter, W_{N_2} is the molecular weight of N2, P_1 and U_1 are the inlet pressure and velocity respectively, and R is the universal gas constant. In Eq. (17), Sutherland's model is used for the viscosity:

    \mu_{N_2} = \mu_{ref,N_2} \, \frac{T_{ref} + S_{N_2}}{T + S_{N_2}} \left( \frac{T}{T_{ref}} \right)^{3/2}.

The constants appearing in Sutherland's model are also assumed to be known precisely. They are: \mu_{ref,N_2} = 0.01781 \times 10^{-3} Pa s, S_{N_2} = 111.0 K and T_{ref} = 300.55 K.

We also need the values of the nitrogen atom concentration as a function of z. The nitrogen atom concentration is also assumed not to vary with the tube radius. This means that the diffusive processes for the N atoms, as well as for the temperature, are assumed to be instantaneous. The assumption is justified by the small radius, the reasonably high speed flow and the relatively slow wall recombination reactions. With these assumptions, the N-atom concentration profile along the tube is obtained by solving:

    \frac{d\,(v(z) \, C_N)}{dz} = - \frac{\gamma_N(T) \, \bar{v}_N(T) \, C_N}{d_{eff}} - 2 \, k(T) \, C_{N_2} \, C_N^2    (21)

    C_N(z=0) = C_{N,1}    (22)

C_N is the molar concentration of atomic N, \gamma_N is the reaction coefficient for reactions with the quartz wall defined below in Eq. (24), k(T) is the reaction rate for the gas phase recombination of N atoms defined in Eq. (23), \bar{v}_N(T) is the thermal velocity of N atoms given by kinetic theory and defined below in Eq. (25), and C_{N_2} is the concentration of N2 gas.
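As a minimal numerical sketch of Eqs. (17)-(20), the pressure equation can be marched along the tube with forward Euler, with the velocity and density following algebraically. Only the Sutherland constants quoted above are taken from the text; the operating point (inlet pressure, mass flow rate, wall-temperature profile) is illustrative, not the paper's tabulated data.

```python
import math

# Constants quoted in the text; SI units assumed throughout
MU_REF = 0.01781e-3   # Pa*s, reference viscosity of N2
S_N2 = 111.0          # K, Sutherland constant for N2
T_REF = 300.55        # K, reference temperature
R = 8.314             # J/(mol*K), universal gas constant
W_N2 = 0.028          # kg/mol, molecular weight of N2

def mu_N2(T):
    """Sutherland's viscosity model for N2."""
    return MU_REF * (T_REF + S_N2) / (T + S_N2) * (T / T_REF) ** 1.5

def solve_tube(P1, mdot, T_of_z, d_eff, L, n=2000):
    """March Eq. (17) for the pressure with forward Euler; the velocity
    and density follow from Eqs. (18)-(19).
    Returns a list of (z, P, v, rho) tuples along the tube."""
    dz = L / n
    z, P = 0.0, P1
    out = []
    for _ in range(n + 1):
        T = T_of_z(z)
        rho = P * W_N2 / (R * T)                        # Eq. (19), ideal gas
        v = 4.0 * mdot / (rho * math.pi * d_eff ** 2)   # Eq. (18)
        out.append((z, P, v, rho))
        # Eq. (17): dP/dz = -128 mdot mu R T(z) / (pi d_eff^4 W_N2 P)
        P += -128.0 * mdot * mu_N2(T) * R * T \
             / (math.pi * d_eff ** 4 * W_N2) / P * dz
        z += dz
    return out

# Hypothetical operating point: 300 Pa inlet, linear wall-temperature ramp
profile = solve_tube(P1=300.0, mdot=1.0e-6,
                     T_of_z=lambda z: 300.0 + 970.0 * z,
                     d_eff=0.022, L=1.0)
```

A stiff ODE integrator would be preferable in practice; the explicit march above is only meant to show how Eqs. (17)-(19) couple through the wall-temperature profile.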
The first term on the right hand side models the loss of N atoms as a result of recombination at the wall; the second term accounts for the loss of N atoms due to reaction in the gas phase to produce molecular nitrogen N2. The models/definitions for the various quantities introduced are:

    k(T) = A_{NN} \exp\left( - \frac{E_{a,NN}}{R T} \right)    (23)

    \gamma_N(T) = \gamma_N(T_{ref}) \left( \frac{T}{T_{ref}} \right)^{\alpha}    (24)

    \bar{v}_N(T) = \sqrt{\frac{R T}{2 \pi W_N}}    (25)

We assume that sublimation is negligible. The only reaction is a global first order reaction between the solid graphite and the nitrogen atoms to give CN gas: C(Gr) + N → CN. In the chosen model, called the Gas Kinetic (GK) model, the transport of the nitrogen atoms to the surface is assumed to be instantaneous. The backward reaction is ignored. Hence, for the GK model, the mass loss of carbon is given by the following:

    \Delta M_C = (\Delta t \, \pi \, d_{sample} \, W_C) \left[ \int_{z_s}^{z_s + L_s} C_N(z) \sqrt{T(z)} \, \beta_N \, dz \right] \sqrt{\frac{R}{2 \pi W_N}}    (26)

The main quantity of interest for this experiment is the nitridation coefficient \beta_N. \Delta t is the time interval of the test, d_{sample} is the diameter of the sample, W_C is the molecular weight of carbon, z_s is the sample location, L_s is the length of the sample and C_N(z) is the molar concentration of N along the sample obtained by solving Eq. (21). In Eq. (26), only the forward reaction is considered and the wall concentration of N is taken to be the same as the free stream concentration. In the following, the mass loss and the output pressure, d_n = {ΔM_C, P_2}, are used as the observables. They can be obtained as a solution of Eqs. (17) and (26) along with the supplementary Eqs. (18) to (25). From a review of the literature, we choose the most uncertain parameters to be θ = {d_eff, γ_N, β_N}. The stochastic forcing term ε_s in Eq. (1) is neglected. The discrepancy term ε_m in Eq.
(2) is a Gaussian random variable whose 95% confidence interval is taken to be the estimated error quoted in Zhang et al. (2009), and it is used here to define the likelihood function in the Bayesian inversion. The control variables are the volumetric flow rate, the inlet pressure and the inlet N atom mole fraction: ξ_n = {F_{N2,in}, P_1, X_{N,1}}, from which the mass flow rate \dot{m}_{N_2} and the inlet nitrogen atom concentration C_{N,1} appearing in Eqs. (18) and (22) can be computed.

5. Results

To recapitulate, the most uncertain parameters appearing in the model formulation described in Section 4 are the effective diameter d_eff, the reaction efficiency γ_N and the nitridation coefficient β_N. To learn the model parameters θ = {d_eff, γ_N, β_N}, the mass loss and the output pressure are used as observables, d_n = {ΔM_C, P_2}. In what follows we present the experimental design analysis on both simulated data and real experimental data as given in Zhang et al. (2009). For these examples, we have used 1000 samples per level in our adaptive MCMC and 2000 samples at the last level. These 2000 samples have also been used in the forward uncertainty propagation and in the MI calculation. We have set the kNN parameter k to 6 in the following results.

5.1. Example 2: Simulated measurements

In this example we use the model presented in Section 4 to generate synthetic measurements for the pressure and the mass loss by adding noise to the model predictions, using 1% uncertainty for the pressure and 5% uncertainty for the mass loss. To generate the measurements, the following values have been used for the normalized parameters: d_eff/d_eff0 = 0.95, log(γ_N/γ_N0) = 1.0 and β_N/β_N0 = 2.0, where the nominal values are given by: d_eff0 = 0.022, γ_N0 = 10^{-3} and β_N0 = 3.29 × 10^{-3}. The values for the experimental scenarios (F_{N2,in}, P_1, X_{N,1}, Δt, L_s) are the same as in Zhang et al.
(2009) and are presented in Fig. 7. Similar to the previous example, three strategies are used to select the optimal sequence of designs from the set of 28 designs: maximum entropy sampling, information maximization sampling and ascending sampling, which starts with the first tabulated design in Zhang et al. (2009) (enumerated in Fig. 7 from E0 to E27) and continues providing the next available one at each stage. Fig. 8(a) shows the drop in uncertainty versus the number of design stages. The design sequences provided by both maximum entropy sampling and information maximization sampling, see Table 4, give comparable learning rates, which are superior to that of ascending sampling. Fig. 6 gives the relative entropy for ME sampling and the relative mutual information for IM sampling at each design stage. The difference between the information theoretic approaches and the ascending sampling is that the former give higher priority to the designs with high inflow.

ME sampling preferably selects the experimental scenarios with large values for the inflow conditions, either F_{N2,in}, P_1 or X_{N,1}. For example, the scenario of the 14th measurement, selected first, has the largest inflow rate and pressure among all scenarios. It is followed by the same type of experimental scenario. A physical explanation for this trend could be hypothesized as follows. A large value of F_{N2,in} results in a large pressure drop through the experimental region. This enhances the sensitivity of the pressure to d_eff through Eq. (18). Also, a large pressure (e.g., the 14th and 6th measurements) accelerates the chemical reactions, especially the gas-phase recombination through the three-body reaction: N + N + N2 ↔ N2 + N2. The significance of the 20th and 21st measurements, preferably selected in the early stages, is related to the combination of a large volumetric flow rate and a relatively small pressure, resulting in a large velocity.
This reduces the residence time of the flow and consequently has a large influence on the determination of the N-atom concentration. On the contrary, the scenarios with small values of the inflow conditions (e.g., the 0th and 1st measurements) are selected in the later stages.

On the other hand, IM sampling prefers alternately selecting experimental scenarios with large and small values of the inflow conditions after the first 3 scenarios. This difference is clearly seen in the different order of selecting the 0th and 1st measurements. It seems to the authors that IM sampling performs in a more logical manner by mixing a variety of information. This can be seen in Fig. 7, which provides a graphical representation of the inflow conditions preferred by both IM and ME.

The marginal pdfs of the parameters at different stages are plotted in Fig. 9. Note that both information theoretic approaches provide a better determination of the model parameters than the ascending sampling approach. While by including all 28 measurements we are able to recover the model parameters with good accuracy, information maximization can provide an adequate estimate after only the first two stages, followed closely by maximum entropy sampling.

The Kullback-Leibler (KL) divergence between the pdfs obtained at the current stage and the previous one is also included in order to monitor the evolution of the inference process, see Fig. 8(b). The KL divergence is also calculated using a kNN estimator, as described in Appendix A. While the entropy quantifies how certain we are about the model parameters, the KL divergence quantifies how much information has been gained by adding the current measurement. This includes the reduction in uncertainty as well as a change in the support of the pdf. Decreasing entropy and large values of KL are indicative that the model is still learning.
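The kNN-based divergence estimate used for this monitoring (detailed in Appendix A, following Wang et al. (2006)) can be sketched in its nearest-neighbor (k = 1) form as below; the synthetic sample sets and seed are illustrative, not the paper's posterior samples.

```python
import math
import random

def knn_kl_divergence(X_new, X_old):
    """Nearest-neighbor estimate of D_KL(p_new || p_old) from two sample
    sets, in the spirit of Wang et al. (2006): the sum of log distance
    ratios plus a sample-size correction, with sup-norm distances.
    X_new, X_old: lists of d-dimensional tuples drawn from the two pdfs."""
    d = len(X_new[0])
    n, m = len(X_new), len(X_old)

    def dist(a, b):
        return max(abs(ai - bi) for ai, bi in zip(a, b))  # sup-norm

    total = 0.0
    for i, x in enumerate(X_new):
        rho = min(dist(x, y) for j, y in enumerate(X_new) if j != i)
        nu = min(dist(x, y) for y in X_old)
        total += math.log(nu / rho)  # assumes no duplicate samples
    return d / n * total + math.log(m / (n - 1))

# Illustrative check on synthetic 1-D samples (not the paper's data)
random.seed(0)
same_a = [(random.gauss(0.0, 1.0),) for _ in range(200)]
same_b = [(random.gauss(0.0, 1.0),) for _ in range(200)]
shifted = [(random.gauss(3.0, 1.0),) for _ in range(200)]
kl_close = knn_kl_divergence(same_a, same_b)  # should be small
kl_far = knn_kl_divergence(same_a, shifted)   # clearly larger
```

Monitoring would then amount to applying this estimator to consecutive posterior sample sets and watching the resulting sequence of divergences.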
Such behavior corresponds to obtaining tighter pdfs at each stage whose high probability support, however, changes from one stage to another. A sequence of low entropies as well as low KL divergences indicates that the learning process has slowed down and one can stop the experimental process. In this example, by thresholding the KL at 0.5 nats, depending on the desired level of uncertainty, given here by the entropy, one can stop the experimental process at any stage greater than 5. Thus, for the simulated data, not all 28 measurements are needed to recover the model parameters with a given level of accuracy.

5.2. Example 3: Study case on existing experimental measurements

The same analysis as in the previous example is now carried out on real experimental data as provided in Zhang et al. (2009). As in the simulated measurement case, both information theoretic approaches give design sequences which start with high flow designs. See Table 5 for the corresponding design sequences, or Fig. 11 for the relative expected utility corresponding to the two information theoretic strategies at each design stage. The profile of the uncertainty reduction shown in Fig. 13(a) is similar to the one in Fig. 8(a). However, the same cannot be said about the KL divergence plotted in Fig. 13(b). No matter which design strategy is used, each subsequent measurement brings new information which moves the high probability support of the pdf from one stage to another, see Fig. 14 for the marginal pdfs at different design stages. As mentioned in the previous section, the model discrepancy is assumed to be Gaussian and thus in this case the likelihood function has nonzero tails. Given that the prior is uniform, the subsequent posterior pdfs will have nonzero tails over the support of the prior.
When compared with the simulated data case, this behavior is indicative of conflicting information between the experimental measurements and the model predictions. In this case the posterior distribution asserts that the high probability supports of both the prior and the data are highly implausible; for more information see O'Hagan and Forster (2004). When dealing with real data this is not unexpected, given that most of the time we have model error or some of the measurements are not self-consistent. In other words, either different biases exist in the measurements, or the model along with the probabilistic assumptions has limited explanatory power for the real phenomenon, or both model errors and corrupted measurements exist at the same time. In such situations, the experimental process should be stopped after a pre-determined number of experiments to find the cause of the disagreement. While one can use a more direct way to check for this disagreement, the authors believe that using the KL divergence in this setting can provide another useful tool for model calibration diagnostics.

Generally speaking, the order of the measurement sequence picked by ME sampling is very close to the one in the simulated data case. In other words, it picks the scenarios based on large values of either F_{N2,in}, P_1 or X_{N,1}, see Fig. 12. However, IM sampling picks the experimental scenarios in a slightly different way than in the simulated data case. For example, with the real data, IM sampling prefers large values of the inflow conditions in the early stages and does not alternately pick different scenarios until the 10th design stage. Note that the 1st measurement is selected at the second stage; however, the choice of the 0th measurement is further delayed. As mentioned above, large values of the inflow scenario parameters are more informative due to the abundance of nitrogen atoms at the surface of the sample.
However, when the real experimental data is used, this kind of data can be problematic because the physical model error and the experimental uncertainty tend to be magnified in such conditions. In Fig. 14, the pdfs of d_eff from ME sampling stay close to the lower bound until the 24th design stage. However, the pdfs from IM sampling start to shift toward the final estimate earlier (say, after it considers the 26th measurement, which has a quite small volumetric flow rate). This explains why sampling in ascending order, which considers the 0th and 1st measurements first, does a better job for this parameter. For the other parameters, γ_N and β_N, large values of the inflow scenario parameters are preferable for getting more information, due to the enhancement of the chemical reactions. In the middle column of Fig. 14, the posterior pdfs of γ_N from IM sampling and ME sampling vary appreciably from Stage 6 (Fig. 14(b)) to Stage 12 (Fig. 14(e)). This can be partially understood from the experimental data uncertainties related to the 13th and 27th measurements. The authors recognize that these measurements are problematic and probably bring quite different information on this parameter. The scenarios of these experiments are almost identical. Nevertheless, the observed N-atom concentrations are quite different (i.e., χ_{N,13}/χ_{N,27} ≈ 1.6). As a result, the measurements of the mass loss that are used for calibration differ significantly from each other. In fact, the local peaks seen in Fig. 13(a) at the 9th design stage for ME sampling and at the 7th design stage for IM sampling correspond to these experiments. Moreover, the knurling feature seen in Fig. 13(b) appears to be related to some inconsistency in the experimental measurements. For example, for ME sampling, the first peaks, around the 4th and 5th design stages, are related to the 13th and 6th measurements.
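A crude numerical screen for such suspect measurement pairs, assuming the mass loss of two equal-duration runs roughly scales with (average pressure) × (average N mole fraction), could be sketched as below. The proportionality and all numerical values are illustrative assumptions, not the paper's tabulated data.

```python
def predicted_mass_loss_ratio(P_bar_a, chi_bar_a, P_bar_b, chi_bar_b):
    """Rough predicted ratio of the mass losses of two runs with equal
    test times, assuming mass loss ~ (average pressure) x (average N mole
    fraction). This proportionality is a simplifying assumption."""
    return (P_bar_a * chi_bar_a) / (P_bar_b * chi_bar_b)

def inconsistent(predicted, measured, tol=0.3):
    """Flag a pair whose measured mass-loss ratio deviates from the crude
    prediction by more than a relative tolerance (30% assumed here)."""
    return abs(predicted - measured) / predicted > tol

# Illustrative pair: nearly identical scenarios, different mole fractions
pred = predicted_mass_loss_ratio(2.0, 0.0040, 2.1, 0.0025)
flag = inconsistent(pred, measured=0.9)  # large mismatch flags the pair
```

Such a screen only highlights candidate pairs for closer inspection; it does not identify which of the two measurements is in error.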
The problem can be substantiated by a simple analysis as follows. From Eq. (26), we get the following approximate relationship:

    \Delta M_C \approx \Delta t \, \bar{C}_N \approx \Delta t \, \bar{P} \, \bar{\chi}_N,    (27)

where \bar{*} denotes the value averaged along the sample and χ_N is the mole fraction (= C_N R T / P). Using linear interpolation for the pressure and the concentration, we can roughly estimate \bar{\chi}_N and \bar{P} by averaging the experimental measurements of P and χ at the two measurement locations. The ratio of the mass losses for these experiments estimated by Eq. (27) is \Delta M'_{C,6} / \Delta M'_{C,13} ≈ 0.9. This is significantly different from the actual measurement, 0.7. The same kind of analysis explains the first peak appearing at the 3rd and 4th design stages, corresponding to the 20th and 12th measurements, for IM sampling (\Delta M'_{C,20} / \Delta M'_{C,12} ≈ 0.35, which is much larger than the measured value, 0.18). This simple analysis illustrates that different experimental data provide the model with different information. As a consequence, Fig. 13(b) is significantly different from Fig. 8(b), where the simulated data is perfectly consistent with the model itself.

6. Conclusions

In this paper the experimental design problem is formulated in the Bayesian framework, and it is shown to be equivalent to an information theoretic sensitivity analysis between the model parameters and the observables. The optimal design selected by information maximization sampling is the one which provides the highest statistical dependence between the two random variables. The statistical dependence is quantified using mutual information, which is approximated in this paper using a k nearest neighbor estimator. Theoretically, information maximization sampling is more general than maximum entropy sampling and more efficient than random sampling.

The entropy and Kullback-Leibler information metrics are used to monitor the learning process.
A decreasing trend in the entropy is indicative that the uncertainty in the model parameters is reduced by assimilating each additional measurement. The information gain brought by each measurement is quantified by the Kullback-Leibler divergence. Large values are indicative that the model is still learning. A large divergence between subsequent posterior distributions might be explained by a large uncertainty reduction or by the existence of conflicting information between the model predictions and the measurements. When the trend in the Kullback-Leibler divergence is decreasing, one can stop the experimental process given a level of accuracy; otherwise, one may still choose to stop the experimental process to check the formulation of the model and the probabilistic assumptions, and/or to analyze the collected measurements to find potential outliers.

Appendix A: Estimating the Kullback-Leibler divergence using kNN

Similar to the entropy and the mutual information, the approximation of the Kullback-Leibler divergence is based on a k nearest neighbor approach, as proposed in Wang et al. (2006):

    D_{KL}\left( p(x | D_n) \, \| \, p(x | D_{n-1}) \right) \approx \frac{d_X}{N_n} \sum_{i=1}^{N_n} \log \frac{\nu_{n-1}(i)}{\rho_n(i)} + \log \frac{N_{n-1}}{N_n - 1}    (28)

where d_X is the dimensionality of the random variable X, N_n and N_{n-1} give the number of samples \{X_n^i \,|\, i = 1, \ldots, N_n\} \sim p(x | D_n) and \{X_{n-1}^j \,|\, j = 1, \ldots
, N_{n-1}\} \sim p(x | D_{n-1}) respectively, and the two distances \nu_{n-1}(i) and \rho_n(i) are defined as follows:

    \rho_n(i) = \min_{j = 1 \ldots N_n, \, j \neq i} \| X_n^i - X_n^j \|_\infty    (29)

    \nu_{n-1}(i) = \min_{j = 1 \ldots N_{n-1}} \| X_n^i - X_{n-1}^j \|_\infty    (30)

Appendix B: Information-theoretic sensitivity analysis

Mutual information is a convenient way to quantify the statistical dependence between two random variables, X_1 and X_2:

    I(X_1, X_2) = \int f(x_1, x_2) \log \frac{f(x_1, x_2)}{f_{X_1}(x_1) \, f_{X_2}(x_2)} \, dx_1 \, dx_2    (31)

A more formal relation between statistical dependence and mutual information can be shown with the help of copula functions, as presented in Calsaverini and Vicente (2009), and is included here for completeness. Sklar (1959) proved that the joint distribution of two random variables, F(x_1, x_2), can be represented using a copula function, C : [0,1]^2 → [0,1], and the marginal distributions F_{X_i}:

    F(x_1, x_2) = C\left( F_{X_1}(x_1), F_{X_2}(x_2) \right)    (32)

Given that F_{X_1} and F_{X_2} are continuous, F_{X_1}(X_1) and F_{X_2}(X_2) are two uniformly distributed random variables. Thus, the copula function can be regarded as the joint cumulative distribution of two uniformly distributed random variables. It separates the study of the dependence between the random variables from that of their marginal distributions. Denoting u_i = F_{X_i}(x_i), Eq. (32) can be written as follows:

    F\left( F_{X_1}^{-1}(u_1), F_{X_2}^{-1}(u_2) \right) = C(u_1, u_2)    (33)

Assuming that the densities f_{X_i} exist, the joint probability density function is given by:

    f(x_1, x_2) = \frac{\partial^2}{\partial x_1 \partial x_2} F(x_1, x_2) = c\left( F_{X_1}(x_1), F_{X_2}(x_2) \right) f_{X_1}(x_1) \, f_{X_2}(x_2)    (34)

where c(u_1, u_2) = \frac{\partial^2}{\partial u_1 \partial u_2} C(u_1, u_2) is the copula density. By substituting Eq. (34) into Eq.
(31), and using u_i = F_{X_i}(x_i), the mutual information can be rewritten in terms of the copula density function:

    I(X_1, X_2) = \int c\left( F_{X_1}(x_1), F_{X_2}(x_2) \right) f_{X_1}(x_1) \, f_{X_2}(x_2) \log c\left( F_{X_1}(x_1), F_{X_2}(x_2) \right) dx_1 \, dx_2
                = \int c(u_1, u_2) \log c(u_1, u_2) \, du_1 \, du_2    (35)

Eq. (35) reveals that the mutual information is the negative of the copula entropy. Thus it does not depend on the marginal distributions; it only quantifies the dependence between the random variables, which is contained in the copula function. This has been pointed out in the literature in Calsaverini and Vicente (2009).

Acknowledgments

We thank Dr. Jochen Marschall from SRI International for valuable discussions regarding the nitridation experiment. This material is based upon work supported by the Department of Energy [National Nuclear Security Administration] under Award Number [DE-FC52-08NA28615].

References

K. Chaloner, I. Verdinelli, Bayesian Experimental Design: A Review, Statistical Science 10(3) (1995) 273–304.

C. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal 27 (1948) 379–423, 623–656.

D. V. Lindley, On a Measure of the Information Provided by an Experiment, Ann. Math. Statist. 27(4) (1956) 986–1005.

V. Fedorov, Theory of Optimal Experiments, Academic Press, New York and London, 1972.

T. Loredo, D. Chernoff, Bayesian Adaptive Exploration, Statistical Challenges in Astronomy (2003) 57–70.

A. DasGupta, Review of Optimal Bayes Designs, Tech. Rep., Purdue University, 1995.

P. Muller, Simulation-based optimal design, in: Bayesian Statistics 6, Oxford University Press, 459–474, 1999.

P. Sebastiani, H. P. Wynn, Bayesian Experimental Design and Shannon Information, in: Bayesian Statistical Science, American Statistical Association, 176–181, 1997.

M. Shewry, H. Wynn, Maximum entropy sampling, Journal of Applied Statistics (1987) 165–170.

A.
Farhang-Mehr, S. Azarm, A sequential information-theoretic approach to design of computer experiments, in: AIAA, Atlanta, Georgia, AIAA 2002-5571, 2002.

R. Calsaverini, R. Vicente, An information-theoretic approach to statistical dependence: Copula information, EPL (Europhysics Letters) 88(6) (2009) 68003.

S. H. Cheung, J. L. Beck, New Bayesian Updating Methodology for Model Validation and Robust Predictions of a Target System based on Hierarchical Subsystem Tests, Computer Methods in Applied Mechanics and Engineering, accepted for publication.

A. Kraskov, H. Stogbauer, P. Grassberger, Estimating mutual information, Physical Review E 69(6) (2004) 066138.

L. F. Kozachenko, N. N. Leonenko, Sample Estimate of the Entropy of a Random Vector, Problems of Information Transmission 23(2) (1987) 9–16.

T. Guest, A. Curtis, Iteratively constructive sequential design of experiments and surveys with nonlinear parameter-data relationships, Journal of Geophysical Research 114 (2009) B04307.

L. Paninski, Asymptotic theory of information-theoretic experimental design, Neural Computation 17 (2005) 1480–1507.

M. Clyde, P. Muller, G. Parmigiani, Inference and Design Strategies for a Hierarchical Logistic Regression Model, in: Bayesian Biostatistics, Dekker, New York, 297–320, 1996.

M. Chung, E. Haber, Experimental design in biological systems, in revision at SIAM Journal on Control and Optimization, 2011.

L. Horesh, E. Haber, L. Tenorio, Optimal Experimental Design for the Large-Scale Nonlinear Ill-posed Problem of Impedance Imaging, in: Large-Scale Inverse Problems and Quantification of Uncertainty, Wiley, 273–290, 2011.

M. A. A. Tucker, Application of Design of Experiments to Flight Test: A Case Study, in: AIAA, Los Angeles, California, AIAA 2008-847, 2008.

A. Curtis, H. Maurer, Optimizing the design of geophysical experiments: Is it worthwhile?, EOS, Trans. Am. Geophys. Un. 81(20) (2000) 224–225.
L. Zhang, D. A. Pejakovic, J. Marschall, D. G. Fletcher, Laboratory Investigation of Active Carbon Nitridation by Atomic Nitrogen, AIAA Paper 2009-4251, AIAA, 2009.

C. Park, D. W. Bogdanoff, Shock-Tube Measurement of Nitridation Coefficient of Solid Carbon, Journal of Thermophysics and Heat Transfer 20 (2006) 487–492.

H. W. Goldstein, The Reaction of Active Nitrogen with Graphite, Journal of Physical Chemistry 68(1) (1964) 39–42.

T. Suzuki, K. Fujita, K. Ando, T. Sakai, Experimental Study of Graphite Ablation in Nitrogen Flow, Journal of Thermophysics and Heat Transfer 22 (2008) 382–389.

E. E. Prudencio, K. W. Schulz, The Parallel C++ Statistical Library 'QUESO': Quantification of Uncertainty for Estimation, Simulation and Optimization, submitted to IEEE IPDPS.

S. H. Cheung, J. L. Beck, New Bayesian updating methodology for model validation and robust predictions based on data from hierarchical subsystem tests, EERL Report No. 2008-04, California Institute of Technology, 2008.

T. Cover, J. Thomas, Elements of Information Theory, Wiley, New York, 1991.

P. Sebastiani, H. P. Wynn, Maximum entropy sampling and optimal Bayesian experimental design, J. R. Statist. Soc. B 62, Part 1 (2000) 145–157.

S. Khan, S. Bandyopadhyay, A. R. Ganguly, S. Saigal, Relative performance of mutual information estimation methods for quantifying the dependence among short and noisy data, Physical Review E 76 (2007) 026209.

A. O'Hagan, J. Forster, Kendall's Advanced Theory of Statistics: Bayesian Inference, vol. 2B, Oxford University Press, New York, 79–80, 2004.

Q. Wang, S. Kulkarni, S. Verdu, A Nearest-Neighbor Approach to Estimating Divergence between Continuous Random Vectors, in: 2006 IEEE International Symposium on Information Theory, 242–246, doi:10.1109/ISIT.2006.261842, 2006.

A.
Sklar, Fonctions de répartition à n dimensions et leurs marges, Publications de l'Institut de Statistique de l'Université de Paris 8 (1959).

                          Sample size:    500            1,000           2,000
  dim(X,Y)  dim(X)  dim(Y)   kNN:   3    6    9     3    6    9     3    6    9
  2         1       1              16%  10%   8%   11%   7%   5%    8%   5%   3%
  3         2       1              21%  13%  11%   16%   9%   7%   12%   7%   5%
  4         2       2              15%  11%  11%   10%   7%   6%    7%   5%   4%
  5         3       2              18%  14%  13%   13%   9%   9%    9%   6%   9%

Table 1: Absolute relative error for MI, averaged over 1000 trial runs.

  Repeated measurements not allowed
    ME Sampling:  10 1 9 2 3 8 5 7 4 6
    IM Sampling:  1 10 9 2 3 8 7 4 6 5
    RND Sampling: 6 10 3 1 5 9 8 2 4 7
  Repeated measurements allowed
    ME Sampling:  10 1 10 1 1 10 10 1 1 1
    IM Sampling:  1 10 1 10 10 1 10 10 1 1

Table 2: Results for example 1, the additive noise case. Design sequences yielded by the three strategies for one particular Monte Carlo run.

  Repeated measurements not allowed
    ME Sampling:  5 6 7 4 1 3 8 9 2 10
    IM Sampling:  1 10 9 7 2 8 4 3 5 6
    RND Sampling: 6 1 8 5 10 9 4 2 3 7
  Repeated measurements allowed
    ME Sampling:  5 6 6 5 5 5 6 5 6 5
    IM Sampling:  1 10 1 10 10 1 1 1 1 10

Table 3: Results for example 1, the multiplicative noise case. Design sequences yielded by the three strategies for one particular Monte Carlo run.

    ME Sampling:  14 21 6 20 13 12 27 19 5 11 22 4 18 17 10 3 9 16 15 26 2 7 23 8 24 25 1 0
    IM Sampling:  21 5 20 0 1 13 27 14 2 12 15 16 11 22 19 9 10 3 4 6 26 25 23 7 8 24 18 17
    ASC Sampling: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

Table 4: Results for example 2 (simulated data). Design sequences yielded by the three strategies.

    ME Sampling:  14 21 12 13 6 4 20 5 27 11 3 22 19 16 15 18 17 10 9 2 26 7 8 23 24 1 0 25
    IM Sampling:  21 1 20 12 27 14 13 11 19 22 26 9 10 25 23 7 24 8 18 17 0 5 2 15 16 3 4 6
    ASC Sampling: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

Table 5: Results for example 3 (real data).
Design sequences yielded by the three strate- gies. 36 Figure 1: Schematic of sequential exp erimen tal design pro cess −1 −0.5 0 0.5 1 −2 −1 0 1 2 3 4 5 6 ξ d 1 2 3 4 5 6 7 8 9 10 (a) Initial resp onse (additive noise) −1 −0.5 0 0.5 1 −2 −1 0 1 2 3 4 5 6 ξ d 1 2 3 4 5 6 7 8 9 10 (b) Initial resp onse (multiplicativ e noise) Figure 2: Results for example 1. Mo del resp onse given initial discrepancy and parametric uncertain ty . 37 0 2 4 6 8 10 −8 −7 −6 −5 −4 −3 −2 −1 0 n th design stage H( p( θ |D n ) ) IM ME RND (a) Average en tropy (additiv e noise - re- p eated measurements not allow ed) 0 2 4 6 8 10 −6 −5 −4 −3 −2 −1 0 n th design stage H( p( θ |D n ) ) IM ME RND (b) Av erage entrop y (multiplicativ e noise - rep eated measurements not allow ed) 0 2 4 6 8 10 −10 −8 −6 −4 −2 0 n th design stage H( p( θ |D n ) ) IM ME RND (c) Average entrop y (additive noise - re- p eated measurements allow ed) 0 2 4 6 8 10 −8 −7 −6 −5 −4 −3 −2 −1 0 n th design stage H( p( θ |D n ) ) IM ME RND (d) Av erage entrop y (multiplicativ e noise - rep eated measurements allow ed) Figure 3: The evolution of the en tropy of parameter distribution av eraged ov er 100 Monte Carlo runs for the three strategies: IM - Information Maximization Sampling, ME - Max- im um Entrop y Sampling, RND - Random Sampling, and the tw o case: no rep eated mea- suremen ts and rep eated measurements. 38 1 1.2 1.4 1.6 1.8 2 0 10 20 30 40 50 60 θ 1 p( θ 1 | D 2 ) IM ME RND (a) p ( θ 1 , D 2 ) (additive noise) 1 1.2 1.4 1.6 1.8 2 0 2 4 6 8 10 12 14 16 θ 1 p( θ 1 | D 2 ) IM ME RND (b) p ( θ 1 , D 2 ) (multiplicativ e noise) 3 3.2 3.4 3.6 3.8 4 0 10 20 30 40 50 60 θ 2 p( θ 2 | D 2 ) IM ME RND (c) p ( θ 2 , D 2 ) (additive noise) 3 3.2 3.4 3.6 3.8 4 0 5 10 15 20 θ 2 p( θ 2 | D 2 ) IM ME RND (d) p ( θ 2 , D 2 ) (multiplicativ e noise) Figure 4: Results for example 1. 
Marginal pdfs of mo del parameters giv en b y the three strategies after the second exp erimental design stage 39 200 300 400 500 600 700 800 1.5 2 2.5 3 3.5 4 4.5 5 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 E 21 E 20 F N2 [sccm] E 14 E 27 E 13 E 12 E 11 E 19 E 22 E 25 E 9 E 23 E 7 E 24 E 17 E 10 E 26 E 8 E 18 E 0 E 2 E 5 E 15 E 3 E 1 E 4 E 6 E 16 P 1 [Torr] X N1 [%] Figure 5: Results for example 2 (simulated data). 3D plot for inflow scenarios used in Ref. Zhang et al. (2009). 40 # # # # # # # # # # # # # # # # # # # # # # # # # # # # n th design stage measurement 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 (a) IM: Relativ e mutual information # # # # # # # # # # # # # # # # # # # # # # # # # # # # n th design stage measurement 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 (b) ME: Relativ e entrop y Figure 6: Results for example 2 (simulated data). Relative exp ected utilit y for Information Maximization Sampling and Maximum En tropy Sampling. Darker shadings indicate large exp ected utilities (mutual information for IM or entrop y for ME) relative to all av ailable scenarios within a design stage. The # sign indicates which scenario has b een selected at a given stage. Previously selected scenarios are excluded from subsequent analysis which is indicated here by white cells. 41 n th design stage Information Maximization Sampling 1 3 5 7 9 11 13 15 17 19 21 23 25 27 Flow Pressure Concentration n th design stage Maximum Entropy Sampling 1 3 5 7 9 11 13 15 17 19 21 23 25 27 Flow Pressure Concentration Figure 7: Results for example 2 (simulated data). Graphical representation for the inflow conditions selected by the tw o information theoretic strategies. Darker shadings indicate large inflow conditions such as flow, pressure, and concentration. 
Figure 8: Results for example 2 (simulated data). Evolution of (a) the entropy and (b) the Kullback-Leibler divergence of the posterior distributions yielded by the three strategies for experimental design: IM - Information Maximization Sampling, ME - Maximum Entropy Sampling, and ASC - Ascending Sampling. The 0.5 nats threshold is marked in (b).

Figure 9: Results for example 2 (simulated data). Marginal pdfs of the model parameters (d_eff/d_eff0, log(γ_N/γ_N0), β_N/β_N0) given by the three strategies after a selected number of experimental design stages (stages 2, 4, 6, 8, and 28).

Figure 10: Schematic of the experimental setup used in Zhang et al. (2009). Also shown, in the inset, is the sample and sample holder.

Figure 11: Results for example 3 (real data). Relative expected utility for Information Maximization Sampling and Maximum Entropy Sampling. Darker shading indicates a large expected utility (mutual information for IM, entropy for ME) relative to all available scenarios within a design stage. The # sign marks the scenario selected at a given stage; previously selected scenarios are excluded from subsequent stages, indicated here by white cells.

Figure 12: Results for example 3 (real data). Graphical representation of the inflow conditions (flow, pressure, concentration) selected by the two information-theoretic strategies; darker shading indicates larger inflow conditions.
Figure 13: Results for example 3 (real data). Evolution of (a) the entropy and (b) the Kullback-Leibler divergence of the posterior distributions yielded by the three strategies for experimental design: IM - Information Maximization Sampling, ME - Maximum Entropy Sampling, and ASC - Ascending Sampling.

Figure 14: Results for example 3 (real data). Marginal pdfs of the model parameters (d_eff/d_eff0, log(γ_N/γ_N0), β_N/β_N0) given by the three strategies after a selected number of experimental design stages (stages 6, 12, 18, 24, and 28).
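Figures 8 and 13 monitor the entropy of the posterior and the Kullback-Leibler divergence between successive posteriors, with data collection stopped once the information gained per design stage drops below 0.5 nats. As a minimal sketch (not the authors' implementation), the idea can be reproduced with a Kozachenko-Leonenko k-nearest-neighbor entropy estimator applied to posterior samples, in the spirit of the kNN-based information estimates used in the paper; the function names and the Gaussian test samples below are illustrative assumptions.

```python
# Sketch of an information-based stopping rule for sequential design:
# stop collecting data once the entropy drop between successive
# posterior sample sets falls below a threshold (0.5 nats here).
import numpy as np
from scipy.special import digamma, gammaln
from scipy.spatial import cKDTree

def knn_entropy(samples, k=4):
    """Kozachenko-Leonenko estimate of differential entropy in nats."""
    x = np.atleast_2d(np.asarray(samples, dtype=float))
    if x.shape[0] < x.shape[1]:
        x = x.T                      # ensure shape (n_samples, dim)
    n, d = x.shape
    # distance to the k-th nearest neighbor (index 0 is the point itself)
    dists, _ = cKDTree(x).query(x, k=k + 1)
    eps = dists[:, k]
    # log volume of the d-dimensional unit ball: pi^(d/2) / Gamma(d/2 + 1)
    log_c_d = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)
    return digamma(n) - digamma(k) + log_c_d + d * np.mean(np.log(eps))

def keep_collecting(prev_samples, new_samples, tol_nats=0.5):
    """Continue the design sequence only while the entropy reduction
    between successive posteriors still exceeds tol_nats."""
    return bool(knn_entropy(prev_samples) - knn_entropy(new_samples) > tol_nats)
```

For a posterior that narrows from a unit-variance Gaussian to one with standard deviation 0.1, the estimated entropy drop is close to log(10) ≈ 2.3 nats, so `keep_collecting` returns True; for two draws of identical spread it returns False, signaling that further measurements are uninformative.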
