Aerodynamic Data Fusion Towards the Digital Twin Paradigm

We consider the fusion of two aerodynamic data sets originating from differing fidelity physical or computer experiments. We specifically address the fusion of: 1) noisy and in-complete fields from wind tunnel measurements and 2) deterministic but bi…

Authors: S. Ashwin Renganathan, Kohei Harada, Dimitri N. Mavris

Aerodynamic Data Fusion Towards the Digital Twin Paradigm
A er odynamic Data Fusion T o w ar ds the Digit al T win P aradigm ∗ S. Ashwin Reng anathan † Argonne N ational Laboratory Lemont, IL 60439 K ohei Harada ‡ and Dimitri N. Ma vr is § Georgia Ins titute of T ec hnology Atlanta, GA 30332 W e consider the fusion of tw o aerodynamic data sets originating from differing fidelity ph ysical or computer experiments. W e specifically address the fusion of: 1) noisy and in- complete fields from wind tunnel measur ements and 2) deterministic but biased fields fr om numerical simulations. These tw o data sources are fused in or der to estimate the true field that best matc hes measured q uantities that serv es as the ground truth. F or exam ple, two sources of pr essure fields about an air craft are fused based on measured forces and moments from a wind-tunnel experiment. A fundamental challenge in this problem is that the true field is unkno wn and can not be estimated with 100% certainty . W e emplo y a Ba y esian framew ork to inf er the true fields conditioned on measured quantities of interest; essentially we perform a statistical cor rection to the dat a. The fused data ma y then be used to construct more accurate surrog ate models suitable f or earl y stages of aerospace design. W e also introduce an extension of the Proper Orthogonal Decomposition with constraints to solv e the same problem. Bo th methods are demonstrated on fusing the pressure distributions for flow past the RAE2822 airfoil and the Common Researc h Model wing at transonic conditions. Comparison of both methods rev eal that the Bay esian method is more robust when data is scarce while capable of also accounting for uncertainties in the dat a. Furthermore, giv en adequate dat a, the POD based and Ba y esian approaches lead to similar results. Nomenclatur e N = normal distribution µ = mean of multivariate normal distribution σ, τ = standard deviation ∗ Submitted to the editors † Pos tdoctoral Appointee, Division of Mathematics & Computer Science, 9700 S Cass A ve. Member ‡ Ph.D. Candidate, School of Aerospace Engineering, 270 Ferst Dr NW . Student Member § Reg ents Professor , School of Aerospace Engineering, 270 Ferst Dr NW . Fello w π = probability density z = measurements n = dataset size I n = identity matrix of size n × n H = output operator E = e xpectation of a random variable Σ , Γ = variance-co variance matr ix x = spatial-coordinates f (· ) = f or ward model ` = cor relation length scale M AP = maximum a-posteriori estimate M / R e / α = Mach/R eynolds number/ angle of attack C P = coefficient of pressure θ = fusion parameter C m / C M , C l / C L = coefficient of moment, lift (2D/3D) I. Introduction The Digital T win (DT) concept in aerospace systems design is a vision that aims to achiev e paradigm shift in flight certification, fleet management and sustainment [ 1 ]. The basic idea is to de velop a simulator integrating high-fidelity ph ysics models, historical flight-test data and sensor updates amongst others, to mir ror the operation of the cor responding flying twin. This w ould in tur n enable real-time health monitor ing of those v ehicles dur ing their operation that could potentiall y result in reliable and efficient designs that meet the design requirements of the future. The trend in aircraft design ov er the past couple decades ha ve resulted in an abundance of data being g enerated from three main sources namely , numer ical simulations, wind-tunnel measurements and flight testing; thereby la ying the f oundation to wards realizing an aerodynamic digital twin. Particularly , the steady g ro wth in computational capabilities in the past f ew decades ha v e assisted aircraft manuf acturers to numer ically simulate the aerodynamics of an aircraft under real flight conditions. This comes at a significantly cheaper cost than wind-tunnel or flight tests and thereby replacing or supplementing them to a good e xtent [ 2 ]. On the other hand, flight tes t and wind tunnel data, though not abundant are a vailable in non-negligible amounts. Numerical simulations at the practical scale are seldom ex act and suffer from model bias , whereas measurements are noisy , incomplete and could be contaminated b y er rors introduced due to model scale and instrumentation among other factors. Theref ore, as a means f or realizing the DT vision, there is a need to take advantag e of av ailable data b y combining them in a fashion that is likel y to make them more accurate than 2 their individual selv es. W e ref er to this problem as multifidelity data fusion and within the context of this w ork, we consider aerodynamic pressure fields ar riving from computational fluid dynamics (CFD) simulations and wind-tunnel measurements. W e distinguish the present work from some of the past work where the ter m data fusion has been used in the conte xt of using variable fidelity simulation codes to accelerate design optimization; f or instance see Keane (2003) [ 3 ]. On the contrary , the present w ork is purely data-dr iv en and uses only domain know ledg e about the actual models in vol v ed as will be demonstrated in later sections. An associated challeng e with the data fusion problem is to also quantify the confidence in the fused data. As mentioned ear lier, neither the CFD predictions nor the wind-tunnel field measurements are dev oid of uncertainties and theref ore their fusion is e xpected to inherit those uncer tainties. Uncertainty in CFD results could originate from inadequate phy sical models (such as turbulence closures [ 4 , 5 ]), g eometr y discrepancy , and discretization er rors that lead to numerical diffusion [ 6 ], just to name a f ew . The wind-tunnel measurements are affected by model installation effects (such as due to wall and sting), simulation of proper boundar y -lay er dev elopment, and flo w angle cor rection that could lead to errors in directional static stability . Quantifying these uncer tainties is a challeng e in itself; how ev er w e assume they are known and show ho w to properl y account for them in the proposed approach. The goal of data fusion theref ore, is to synergisticall y combine datasets from multiple fidelity sources, to improv e the o v erall prediction in addition to quantifying uncer tainty in the ov erall prediction. Combining aero/fluid dynamic data from computer and ph ysical e xper iments is commonly done with the Proper Orthogonal Decomposition (POD) originally introduced b y Lumle y [ 7 ] and Siro vich [ 8 ] in the context of turbulent fluid flo w , where the combined flo w snapshots are stack ed in a matr ix and a set of or thonormal modes are e xtracted via the singular v alue decomposition (SVD); the state of the flo w is then assumed to be a linear combination of these modes. In a wa y , the POD leads to a dimensionality reduction from the or iginal state-space dimension to the number of POD modes which are typically much f ew er . T aking advantag e of this fact, the gappy POD introduced by Sirovic h [ 9 ] imputes missing data in fluid dynamic datasets where the POD modes are first e xtracted from the complete datasets. Then the missing data from the incomplete sets are estimated as the best linear combination of the POD modes nearest to the incomplete dataset in the least-sq uares sense. See [ 10 ] for fur ther details. Rusc her et al [ 11 ] supplement the gapp y POD with wa velet methods to ensure continuity in the reconstructed data, while W en et al [ 12 ] appl y it when there are two sets of missing data but in a patter n that complement each other . Another technique that has a very similar goal as data fusion is the so-called Data Assimilation (D A), which ref ers to combining phy sical measurements with mathematical models par ticularl y in the context of dynamical sys tems. D A or iginated in the geosciences field f or weather prediction [ 13 , 14 ]. The methodology proposed in this work differs from D A b y the f act that it is purel y data-driven. Although domain know ledge about the models that generated the data is le verag ed, the models themselv es don ’ t feature in the proposed method, as in D A. Ov erall, our method is more g ener ic, robust to small data and applies to an y situation where there are multiple sources of data and some measurement of the g round truth. 3 As a first step, this work proposes a methodology to fuse two datasets that could be cor r upted by noise, bias and/or incompleteness. As mentioned before, we consider field data that represent the state of the system; such as the distribution of pressure on the sur face of an aircraft wing at a specific flight condition. Using this inf ormation and additional measurements of quantities of interest (QoIs) such as forces and moments, that are considered the g round truth, the inv erse problem of estimating the true field by fusing the av ailable datasets that best matches the measurements is posed and sol ved. The measurements of QoIs, although cor rupted by noise and wind-tunnel scaling effects like the field measurements are extracted from more mature and reliable instrumentation such as force balance. Further more, these measurements are independent of the field measurements, i.e. the y are not calculated as a function of the fields but are directly measured. For these reasons, the QoI measurements are treated as the g round tr uth in this work. An schematic of the ov erall methodology is pro vided in Figure 1. The in version of the pressure field from the measured f orces and moments is an ill-posed problem which is easy to see. For instance an arbitrary shift in the pressure distribution o ver a closed aerodynamic sur f ace does not chang e the the value of its integral; i.e. ∫ S pd s = ∫ S ( p − p ∞ ) d s . In this regard, w e incorporate a Ba yesian framew ork such that regularizing pr iors can be specified to tackle the ill-posedness of the in v erse problem, in addition to accounting f or uncertainties associated with the datasets; this method w as originally introduced in [ 15 ]. This wa y , the method performs a statis tical cor rection on the datasets and infers the probability distribution of the fused data. The output of the proposed method may then be used to construct more accurate sur rogate models of the fields which can be queried cheapl y to sol v e problems in aerospace design such as optimization and uncertainty quantification. Additionall y , w e e xtend the work in [ 15 ] and introduce a POD-based method to sol ve the same problem, namely the POD with constraints (CPOD). In contrast to the Bay esian approach, this method searches the POD subspace generated from the data to find the fused data cor responding to a giv en flight condition. Ho we ver , the CPOD, similar to the Ba yesian method also treats the measured f orces and moments as the ground tr uth. This w ork makes the f ollowing assumptions. Firs tly , we consider onl y two lev els of fidelity f or the sake of simplicity although it e xtends to a hierarchy of fidelities without any modification e xcept f or the notations. Secondly , w e assumes identical geometric fidelity f or all the sources. For instance, it does not fuse wing and airfoil pressure distributions. Thirdly , we assume that a pair of fidelity data are av ailable f or all flight conditions. In practice, the case might be that the wind-tunnel and CFD date are do not cor respond to identical flight conditions. T o meet this requirement, the CFD data used in this work is generated to match the flight conditions of a pre-e xisting wind-tunnel dataset. Fourthly , we assume that the uncer tainty in the a vailable data sources are kno wn. Whereas in the wind-tunnel measurements an es timate of these uncertainties are av ailable from replications and sensor calibration tests, quantifying all forms of uncertainty in CFD data is not trivial. While this assumption makes the proposed method subjectiv e, it also adds the flexibility of lev eraging expert domain know ledg e when a vailable. Finally , viscous effects are not accounted f or in the methodology because field measurements of shear stress are typicall y hard to obtain. For this reason, the QoIs considered in this w ork 4 Fig. 1 Schematic of the ov erall methodology are restricted to the lift and moment coefficients where the contributions from viscosity are relativel y marginal. The rest of the ar ticle is org anized as follo ws. The Bay esian methodology is outlined in the ne xt section. The tw o test cases and their associated detail are discussed in section III . Demonstration of the proposed Bay esian method is sho wn in section IV and the CPOD method and compar ison of both methods are discussed in section V. Finall y the paper concludes with a summary of main outcomes of the study and some directions f or future work. II. Data Fusion via Ba y esian Inf erence In section IV we demonstrate the methodology on the coefficient of pressure ( C P ) distr ibution on an aircraft wing-section as w ell as the entire aircraft. Ho we ver , here we keep the e xposition g eneral and consider two independent fidelity sources of data y 1 ∈ R n and y 2 ∈ R n represented b y random variable Y with densities π Y ( y 1 ) = N ( µ 1 , σ 2 1 I n ) π Y ( y 2 ) = N ( µ 2 , σ 2 2 I n ) (1) where σ 2 1 and σ 2 2 are the variance in the datasets and are assumed kno wn. Fur thermore, µ 1 and µ 2 are the e xpected values of the datasets which in the case of computer experiments are the direct predictions and in the case of ph ysical e xper iments could be the ensemble av eraged measurements of the fields. Since the source of µ 1 and µ 2 could be v ectors 5 of disparate lengths, we inter polate them onto a common g rid of length n . Let z = [ z 1 , . . . , z m ] > , z i ∈ R be m QoI’ s which are a function of y and whose direct measurements are also av ailable. Further more, w e assume that the f or ward problem z = f ( y ) is kno wn and w e are interested in sol ving the inverse problem y = f − 1 ( z ) giv en z . Finally we restr ict the cur rent scope of the problem to be linear and hence z = f ( y ) = H > y , where H is a linear operator ∈ R n × m that maps the field to the QoI. In the field of applied aerodynamics such a linear assumption is valid since quantities such as the lift, drag and moment coefficients are linear functions of the pressure distribution ∗ . The z are assumed to be noisy and are related to the f or w ard model via the follo wing relationship z = H > y +  (2) where  represents additiv e Gaussian white noise with probability density π E giv en by N ( 0 , τ 2 I m ) , where τ 2 is again assumed kno wn. Our goal is no w to estimate the probability distribution of the unkno wn y giv en the measurements z and this is done via Ba yes ’ rule explained as follo ws. A. Ba y es’ Rule The cor nerstone of the present methodology is the Bay es ’ rule, stated in (3) , whose highlight is that it operates on the probability densities of the v ar iables rather than the variables themsel ves, thereby incor porating the uncer tainty in the data naturally . The main idea is that, to inf er the dis tr ibution of some unknown parameter conditioned on a vailable data, π ( y | z ) † , the likelihood of obser ving the data giv en the model and its inputs, π ( z | y ) can be consolidated with whatev er prior belief is av ailable about the unknown parameter itself, denoted by π ( y ) and are related to each other as follo ws π ( y | z ) = π ( z | y ) π ( y ) π ( z ) (3) In the equation abo ve, π ( y | z ) is called the posterior density and π ( z ) is the marginal density of z which can be ev aluated b y integrating out y from the joint density π ( z , y ) . For practical purposes we are interested in some moment of the posterior density such as its mode or e xpected v alue. Since π ( z ) ev aluates to a constant, its ev aluation can be ignored in such situations. In this w ork w e shall work with the v alue of the parameter estimate that maximizes the posterior probability which is called the maximum aposteriori (MAP) estimate giv en b y y ∗ = ar g max y π ( z | y ) π ( y ) (4) which can be e valuated via gradient-based non-linear optimization methods. W e now proceed to give specific details on the prior and likelihood models. ∗ note that we do not account for viscous effects in this context since they are typically unav ailable from phy sical e xper iments † Note that we remov e the subscr ipts in the density f or conv enience of notation 6 B. Prior Distribution A prior distribution on the unknown y can be thought of as a regular izer that restricts the posterior distribution to ph ysicall y valid solutions. This is where w e take advantag e of the av ailable inf or mation in the form of y 1 and y 2 and combine them to define the pr ior . In this work, we introduce a fusion parameter θ ∈ R ⊆ [ 0 , 1 ] which combines y 1 and y 2 as ˜ µ = θ × E ( y 1 ) + ( 1 − θ ) × E ( y 2 ) = θ × µ 1 + ( 1 − θ ) × µ 2 (5) where θ is chosen as the solution of θ = ar g min θ k H > ˜ µ − z k 2 2 (6) The the prior distribution on y is then set as a multivariate Gaussian distribution centered on ˜ µ giv en by π ( y ) = 1 ( 2 π ) n / 2 | Σ | 1 / 2 e xp  − 1 2 ( y − ˜ µ ) > Σ − 1 ( y − ˜ µ )  (7) In other w ords, w e first chose the linear combination of the two sources of information that best matches the observed data (in the least squares sense) and define a Gaussian distribution centered around this estimate. This is built on the belief that the tr ue v alue of y lies some where in the neighborhood of y 1 and y 2 which is appro ximated as a linear w eighted combination. Although the cur rent wa y of estimating ˜ µ is not unique, it provides a very simple and intuitiv e wa y of specifying the pr ior; i.e. it is a linearl y w eighted av erage of the a vailable data. Further more, such prior specification are called sample based priors [ 16 ] where prior belief is e xpressed via a combination of sample solutions of the unkno wn. Note that alter nativ ely , one can treat θ as a random v ar iable and use a hierarchical Bay es [ 17 ] frame work and instead inf er the posterior π ( θ | z ) with some pr ior on π ( θ ) . In that case, with a uniform pr ior on theta and a Gaussian likelihood, using the MAP es timate of π ( θ | z ) is equiv alent to the present approac h. How ev er w e do not take that route in the present work to fa v or simplicity of e xposition. In (7) w e treat y to be spatiall y cor related and hence define the co variance matr ix Σ as Σ i j = C o v ( y ( x i ) , y ( x j )) = σ 2 e xp − k x i − x j k 2 2 2 ` 2 ! (8) where, x denote the spatial coordinates with k · k 2 denoting the Euclidean distance, the parameter ` represents the length scale that is assumed to be kno wn and (due to independence of y 1 and y 2 ) σ 2 = θ 2 σ 2 1 + ( 1 − θ ) 2 σ 2 2 . The choice of the squar ed-exponential kernel in (8) is to specify smoothness in the pr ior realizations although other choice of kernels may be considered depending on the problem. See Ch.4 of [18] for a compendium of cov ariance kernels. 7 C. Lik elihood Model Assuming that  and y are mutuall y independent, the probability density of z , conditional on Y = y is obtained b y shifting the density π E around H > y leading to the likelihood density π ( z | y ) ∼ N ( H > y , τ 2 I m ) = 1 √ 2 π τ e xp  − 1 2 τ 2 ( z − H > y ) > ( z − H > y )  (9) where again, τ 2 is the measurement noise. D. Maximum a-P osteriori (MAP) Estimation W e are interested in sol ving f or the inv erse problem of estimating the true field given the measurements in terms of its probability density function, π ( y | z ) , which is giv en by π ( y | z ) ∝ π ( z | y ) × π ( y ) π ( y | z ) ∝ exp  − 1 2 τ 2 k z − H > y k 2  × e xp  − 1 2 ( y − ˜ µ ) > Σ − 1 ( y − ˜ µ )  (10) By (10) what w e mean is that we estimate the tr ue y distribution that best fits the measured value of the quantity of interest while also being similar to the pr ior elicited f or y via (7) . The mode of the resulting posterior probability distribution is the MAP es timate we are interes ted in. It should be noted that since the poster ior distribution in the present conte xt is Gaussian, the mean, median and mode are identical and hence the choice does not matter . Theref ore w e are interested in solving the f ollowing optimization problem y M A P = ar g max y e xp  − 1 2 τ 2 k z − H > y k 2  × e xp  − 1 2 ( y − ˜ µ ) > Σ − 1 ( y − ˜ µ )  (11) which is equivalent to solving y M A P = ar g min y 1 2 τ 2 k z − H > y k 2 + 1 2 ( y − ˜ µ ) > Σ − 1 ( y − ˜ µ ) = ar g min y J ( y ) (say) (12) where J denotes the entire cost function. Since (12) is differentiable ev erywhere we ev aluate its gradient and set it to zero. Further more, J ( y ) is a symmetr ic positiv e-definite quadratic form which has a unique minimizer . Theref ore we write the g radient as 8 ∂ J ∂ y = 1 τ 2  −( z − H > y ) > H > + ( y − ˜ µ ) > Σ − 1  = 0 (13) Rearranging abo ve equation giv es y > M A P  1 τ 2 HH > + ˜ − 1  = 1 τ 2 z > H > + ˜ µ > Σ − 1 which gives y > M A P =  1 τ 2 z > H > + ˜ µ > Σ − 1   1 τ 2 HH > + Σ − 1  − 1 (14) It can then be sho wn that the posterior distr ibution is given by π ( y | z ) ∼ N ( y > M A P , Γ ) where, Γ =  1 τ 2 HH > + Σ − 1  − 1 (15) The diagonal elements of Γ contain the pointwise variance of y | z which may be used to construct confidence intervals on the predictions. The hyperparameters of the methodology are γ = { σ 2 1 , σ 2 2 , τ 2 , ` } . Among them, σ 2 1 , σ 2 2 , τ 2 represent the uncer tainty in the a v ailable dataset including the measured QoIs. Quantifying these uncertainties in the data is a v er y elaborate task which is not under taken in this work. Instead w e present a method that fuse the datasets by accounting f or these uncer tainties. As f or the length-scale parameter ` , although it can be estimated from data, in the present w ork we fix its v alue f or each of the test cases listed in section III. It is chosen from tr ial-and-er ror such that the realizations from the prior look phy sically reasonable. Note that one of the pr imar y advantag es of the Ba yesian frame work is the ability to specify subjective pr iors that lev erage domain know ledg e. The method is summarized in Algorithm 1. Result : y M A P , Γ , confidence intervals Data: fields: µ 1 , µ 2 , measurements: z , parameters: γ ; 1) Estimate θ from (6) 2) Compute y M A P from (14) 3) Compute posterior co variance matr ix from (15) 4) Extract standard deviation to constr uct confidence inter vals; ± p diag ( Γ ) Algorithm 1: Ba yesian Data Fusion 9 III. T est Cases and Dat a The tw o main test cases for the methods used in this w ork are the viscous transonic flow past the RAE2822 airfoil and the NAS A Common Research Model (CRM). The cor responding wind-tunnel data are e xtracted from [ 19 ] and [ 20 ] respectivel y . Specificall y , f or the CRM test case, the pressure distributions from pressure-sensitive paint (PSP) [ 21 , 22 ] measurements are obtained from the NAS A Ames 11ft T ransonic Wind T unnel provided in [ 23 ]. All the CFD simulations w ere per f or med using the commercial, finite-v olume based unstructured code ST AR -CCM+ [24]. RAE-2822 The RAE2822 is a commonly used benchmark test case f or transonic flight conditions. The air f oil shape and the mesh f or the RANS analy sis are sho wn in Figure 2. A mesh with approximatel y 122,500 polyhedral mesh elements is generated with 41 pr ism la yers to capture the boundar y lay er . The first lay er of the pr ism la y er is placed approximatel y 1 . 6 × 10 − 6 m aw ay from the wall such that the w all y + ≈ 1 f or the Re ynolds number ranges considered in this work. The A G ARD [ 19 ] wind-tunnel measurements are av ailable f or both the pressure distr ibution as w ell as the lift and moment coefficients and the operating conditions are summarized in T able 1. The air f oil sur f ace is discretized into an n = 128 size equall y-spaced g rid on which both the CFD and wind-tunnel data are inter polated before per f or ming the fusion. T able 1 RAE2822: summary of test cases Case Mach R eynolds number (millions) Angle of attack (degrees) 1 0.676 5.7 2.40 2 0.676 5.7 -2.18 3 0.600 6.3 2.57 4 0.725 6.5 2.92 5 0.725 6.5 2.55 6 0.728 6.5 3.22 7 0.730 6.5 3.19 8 0.750 6.2 3.19 9 0.730 2.7 3.19 10 0.745 2.7 3.19 11 0.740 2.7 3.19 Fig. 2 The RAE2822 airfoil shape and near-field mesh with polyhedral elements. 10 Fig. 3 T rimmed hexahedral mesh f or the CRM wing-body geometry containing 24M elements Common Resear ch Model The clean wing-body configuration of the CRM without tail-planes is used in this w ork. A mesh with approximatel y 24 million tr immed he xahedral elements w ere generated as sho wn in Figure 3, where the boundar y lay er is resol ved with 41 prism lay ers and the wall y + ≈ 1 . In order to keep the computational costs of the method tractable, the CFD and PSP data are interpolated onto a coarse g rid consisting of n = 8688 cells (further details are provided in the Appendix .B). The viscous contr ibutions to the lift coefficient is in O ( 10 − 5 ) and the moment coefficient is in O ( 10 − 3 ) across all the test cases; while this is negligible for the lift, it is not so f or the moments. In this work, the viscous contr ibutions are ignored to keep the e xposition simple although they can be easily included by adding an offset paramter in (6) and (9) of the form k z − H > y − δ k where δ > 0 is a discrepancy parameter and is in the same order as the viscous contributions. The test data for the CRM are set at a Re ynolds number of 5 million and a full-factorial design of Mach = { 0 . 70 , 0 . 85 , 0 . 87 } , angle of attack = { 0 , 0 . 5 , 1 . 0 , 1 . 5 , 2 . 0 , 3 . 0 , 4 . 0 } . The results are demonstrated in the next section for a selected combination of operating conditions. IV . Method Demonstration Here we demonstrate the methodology on inferr ing the true C P distributions given measurements. Here z 1 = C L / C l , z 2 = C M / C m , µ 1 = C P , P S P / W T , µ 2 = C P , C F D and ˜ µ = ˜ C P . The matr ix H ∈ R n × 2 where the two columns contain inf or mation about the local surface nor mals, cell areas and free-stream flo w necessar y to numer icall y integrate the pressure distributions to compute C L / C l and C M / C m respectiv ely . 11 A. RAE2822 Consider the Figure 4a where the CFD prediction and the wind-tunnel measurement of the C P distribution (for flight condition M a c h = 0 . 676 , Re = 5 . 7 M , α = − 2 . 18 de g . ) match quite closely . Ho we ver , the QoI’s ( C l , C m ) obtained b y integ rating the C P curves (ev aluating the f or w ard model) from the wind-tunnel and CFD C P ’ s are (− 0 . 122 , − 0 . 074 ) and (− 0 . 101 , − 0 . 075 ) respectiv ely . Both QoI’ s are off from the measur ed QoI’ s (− 0 . 121 , − 0 . 028 ) although, the discrepancy is more pronounced f or C m . In this case, no matter what value of θ is chosen, the resulting ˜ C P is not e xpected to produce outputs that match the measurements. Theref ore, we are interested in adjusting the cur v es in Fig. 4a such that the QoI’ s der iv ed from the adjusted cur v e matches the measured v alues in the least-squares sense. W e set τ 2 = 10 − 6 , σ 2 1 = 10 − 2 , σ 2 2 = 10 − 2 based on the belief that the the pressure distributions from CFD predictions and wind-tunnel measurements ha ve greater uncer tainty than the f orce and moment measurements; while ` is set to 10 − 4 . The statis tically adjusted solution is then giv en by the posterior distribution (15) and 500 dra ws from this distribution are sho wn in Fig.4b. The MAP estimate is the e xpectation of the posterior distribution and is shown in Fig.4c o ver laid with plots for the CFD and wind-tunnel distr ibutions. Notice that the current approach shrinks the best combination of the two distributions in order to minimize the misfit betw een the deriv ed QoIs and the measurements. It should be noted that the MAP estimate should not be treated as the tr ue C P distribution that uniquely defines the state of the sy stem at the giv en operating conditions. This is because the MAP estimate depends on the chosen values of the h yper parameters. A dditionally , it tr ies to minimize the misfit between noisy data and a deter ministic model while not accounting f or viscous effects. Therefore the fused C P should be anal yzed in ter ms of its probability densities which account f or the uncer tainty rather than treating it as a deterministic estimate. Having said that, the MAP estimate pro vides an inter pretable visualization of the posterior distribution and is used while compar ing v arious cur v es. A point w or th mentioning is the impact on the results due to the parameters. In Fig.4, the parameter τ 2 was set to 10 − 6 which implies that we trust the measured QoIs to posses very high signal-noise ratio. As a result the proposed approach tr ies to match them as closely as possible leading to relativel y more adjustment on the or iginal C P curves. On the other hand if w e admit our ignorance about the actual value of the measurements and set a higher τ 2 , then the method leads to relativel y less adjustment. This is demonstrated in figures 5a and 5b where the τ 2 is set to 10 − 4 and 10 − 2 respectiv ely . Notice that when τ 2 = 10 − 2 the uncertainty in the QoIs (particularly C m ) is very high that the misfit term in (12) becomes less impor tant and the pr ior dominates; as a result C P , M AP ≈ ˜ C P . Larger the τ 2 parameter , greater the misfit between the measured QoIs and those obtained by integrating the MAP estimate of C P (see the leg end entries of Figures 5). As a counter ex ample, consider the case sho wn in Figure 6 which cor responds to the flight conditions ( M a c h = 0 . 740 , Re = 2 . 7 M , α = 3 . 19 de g . ). For this case the QoI’ s integrated from the wind-tunnel cur v e (0.7049, -0.0875) match v er y closely with the measurements (0.7061, -0.087) and as a result the predicted MAP estimate f or C P almost ov erlaps with the wind-tunnel curve (Fig. 6c). Therefore in this case, the choice of the parameters pla y a relativel y minor role. 12 0.0 0.2 0.4 0.6 0.8 1.0 x 1.5 1.0 0.5 0.0 0.5 1.0 1.5 Cp Measured: Cl = -0.1208, Cm = -0.0280 WT CFD (a) Original C P distributions 0.0 0.2 0.4 0.6 0.8 1.0 x/c 1.5 1.0 0.5 0.0 0.5 1.0 1.5 -CP (b) 500 posterior draw s 0.0 0.2 0.4 0.6 0.8 1.0 x 1.5 1.0 0.5 0.0 0.5 1.0 1.5 Cp Measured Cl = -0.1208, Cm = -0.0280 CFD: Cl=-0.1014, Cm=-0.0748 WT: Cl=-0.1227, Cm=-0.0738 MAP: Cl=-0.1209, Cm=-0.0288 95% CI (c) MAP with confidence bands Fig. 4 C P distribution corresponding to Case-2 ( M a c h = 0 . 676 , R e = 5 . 7 M , α = − 2 . 18 de g . ) B. Common Resear ch Model For the CRM tes t case, the approach is repeated and results are presented f or a select fe w cases as f ollow s. The h yper parameters are set to τ 2 = 10 − 6 , σ 2 1 = 10 − 2 , σ 2 2 = 10 − 2 and ` = 0 . 01 . Figure 7 show s results f or M a c h = 0 . 87 and α = 4 . 0 de g . where the CFD predicts a relativ ely strong er shock compared to PSP measurements. Incidentally , the MAP estimate matches more closely with the CFD results compared to PSP . Further more, the PSP curve falls outside the 95% confidence region of the MAP in the vicinity of the shock, suggesting that the PSP measurements in this region are less reliable. While a different choice of parameter values will chang e the size of the confidence bands, the results clearl y demonstrate that the CFD results agree more closel y with the measurements than PSP f or this specific case. A contrasting behavior is obser v ed in Figure 8 where the MAP estimate matches more closel y with the PSP results compared to CFD. Finally , in the case of Figure 9, the MAP estimate lies betw een the CFD and PSP estimates where 13 0.0 0.2 0.4 0.6 0.8 1.0 x 1.5 1.0 0.5 0.0 0.5 1.0 1.5 Cp Measured Cl = -0.1208, Cm = -0.0280 CFD: Cl=-0.1014, Cm=-0.0748 WT: Cl=-0.1227, Cm=-0.0738 MAP: Cl=-0.1239, Cm=-0.0560 95% CI (a) τ 2 = 10 − 4 0.0 0.2 0.4 0.6 0.8 1.0 x 1.5 1.0 0.5 0.0 0.5 1.0 1.5 Cp Measured Cl = -0.1208, Cm = -0.0280 CFD: Cl=-0.1014, Cm=-0.0748 WT: Cl=-0.1227, Cm=-0.0738 MAP: Cl=-0.1064, Cm=-0.0740 95% CI (b) τ 2 = 10 − 2 Fig. 5 Effect of the measurement noise parameter τ 2 . Smaller value for ces method to adjust the C P curv es more. the CFD predictions f all outside of the 95% confidence bounds around the MAP estimate, in the vicinity of the shoc k. These results demonstrate that the proposed approach allo ws one to quantify the spatially distributed confidence in the fused data by accounting f or the uncer tainties in the or iginal datasets. W e claim that this makes the fused data more useful that the or iginal datasets and one ma y then use the inf er red posterior distributions to construct sur rogate models of the fields using model reduction methods [25, 26] f or instance. An alter nativ e wa y to use the proposed method is to in vok e domain know ledg e in the pr ior specification. F or instance, if domain know ledge recommends more tr ust in one amongst the PSP or CFD data, then the θ parameter can be treated as a tuning parameter whose v alue can be directly set as opposed to estimating it. The plots in Figure 10 represent an operating condition ( M a c h = 0 . 87 , α = 1 . 5 de g . ) where a strong shock wa ve is expected on the wing. While this is predicted by the CFD simulations, the PSP measurements show rather shallow g radients in the coefficient of pressure (Figs. 11d through 11f). Theref ore the user might c hoose to put more prior belief in the CFD predictions by setting θ = 0 in (5) . In this case, the method tries to adjust the CFD predictions so as match the measured QoI which results in a MAP estimate that looks relativel y similar to the actual CFD prediction. This is demonstrated in Figure 11. It should be noted that the MAP estimate is itself non-unique - i.e. there can be more than one MAP estimate that agrees with the measurements z equall y well. The estimate depends on the prior specification as illustrated through Figures 10 and 11; re-iterating the ill-posedness issue of the inv erse problem mentioned in section I. Therefore, it is critical to specify the prior more judiciously . The T able 2 summarizes the integ rated QoIs (b y ev aluating f or ward model) from the MAP , CFD and PSP C P distributions and compares them against the measured v alues. It can be seen that the MAP estimates match the measured 14 0.0 0.2 0.4 0.6 0.8 1.0 x 1.5 1.0 0.5 0.0 0.5 1.0 1.5 Cp Measured: Cl = 0.7061, Cm = -0.0870 WT CFD (a) Original C P distributions 0.0 0.2 0.4 0.6 0.8 1.0 x/c 1.5 1.0 0.5 0.0 0.5 1.0 1.5 -CP (b) 500 posterior draw s 0.0 0.2 0.4 0.6 0.8 1.0 x 1.5 1.0 0.5 0.0 0.5 1.0 1.5 Cp Measured Cl = 0.7061, Cm = -0.0870 CFD: Cl=0.8244, Cm=-0.0963 WT: Cl=0.7049, Cm=-0.0875 MAP: Cl=0.7061, Cm=-0.0870 95% CI (c) MAP with confidence bands Fig. 6 C P distribution corresponding to Case-11 ( M a c h = 0 . 740 , R e = 2 . 7 M , α = 3 . 19 de g . ) values very closel y . Theref ore, the methodology o v erall finds the bes t es timate that ag rees with measurements giv en the multiple sources of data, regular ized b y the specified pr ior belief on the estimate. The methodology is computationally efficient since the posterior distributions are analyticall y der iv ed as opposed to making numer ical appro ximations and requires only approximatel y 1 minute of wall-cloc k time to fuse one pair of dataset ( n ≈ 10000 ) on an Intel 4-core i7 processor with 16gb RAM. This establishes the scalability of the proposed approach f or larg e datasets. One of the ke y character istics of the proposed Ba yesian approach is that it assumes that more than one source of the dataset is av ailable f or a given operating condition and operates only on this dataset. How e v er, typically windtunnel and CFD campaigns are conducted in batches and the av ailable datasets span a range of operating conditions. In the f ollo wing section, w e propose another solution for the same problem that relies on lear ning flo w features from the entire dataset, unlike the proposed Bay esian approach follo wing which, we show compar isons of both methods. 15 (a) CFD (b) MAP (c) PSP 0.0 0.2 0.4 0.6 0.8 1.0 x 1.0 0.5 0.0 0.5 Cp Measured CL = 0.6733, CM = -0.0898 MAP, CL=0.6732 CM=-0.0898 PSP CFD 95% CI (d) 20% Span 0.0 0.2 0.4 0.6 0.8 1.0 x 1.25 1.00 0.75 0.50 0.25 0.00 0.25 0.50 0.75 Cp Measured CL = 0.6733, CM = -0.0898 MAP, CL=0.6732 CM=-0.0898 PSP CFD 95% CI (e) 50% Span 0.0 0.2 0.4 0.6 0.8 1.0 x 1.25 1.00 0.75 0.50 0.25 0.00 0.25 0.50 0.75 Cp Measured CL = 0.6733, CM = -0.0898 MAP, CL=0.6732 CM=-0.0898 PSP CFD 95% CI (f ) 80% Span Fig. 7 M a c h = 0 . 87 , R e = 5 M , α = 4 . 0 de g . T able 2 Summary of the CRM results in terms of the QoIs ( C L and C M ). The data under columns labeled MAP , PSP and CFD cont ain the QoI’s from ev aluating the f orward model at the C P distributions. Bold-face entries indicate best match to measurements Measurements MAP PSP CFD M a c h / R e / α C L C M C L C M C L C M C L C M 0.87/5m/4.0 0.6733 -0.0898 0.6732 -0.0898 0.6024 -0.1111 0.6773 -0.0932 0.87/5m/3.0 0.5416 -0.0762 0.5416 -0.0767 0.5105 -0.1159 0.2593 -0.0870 0.7/5m/1.5 0.2938 -0.0429 0.2935 -0.0431 0.2703 -0.0654 0.3646 -0.0959 0.85/5m/1.5 0.3383 -0.0486 0.3378 -0.0489 0.2758 -0.0482 0.4206 -0.1206 16 (a) CFD (b) MAP (c) PSP 0.0 0.2 0.4 0.6 0.8 1.0 x 1.25 1.00 0.75 0.50 0.25 0.00 0.25 0.50 Cp Measured CL = 0.2938, CM = -0.0429 MAP, CL=0.2935 CM=-0.0431 PSP CFD 95% CI (d) 20% Span 0.0 0.2 0.4 0.6 0.8 1.0 x 0.8 0.6 0.4 0.2 0.0 0.2 0.4 0.6 Cp Measured CL = 0.2938, CM = -0.0429 MAP, CL=0.2935 CM=-0.0431 PSP CFD 95% CI (e) 50% Span 0.0 0.2 0.4 0.6 0.8 1.0 x 0.4 0.2 0.0 0.2 0.4 0.6 Cp Measured CL = 0.2938, CM = -0.0429 MAP, CL=0.2935 CM=-0.0431 PSP CFD 95% CI (f ) 80% Span Fig. 8 M = 0 . 7 , Re = 5 M , α = 1 . 5 (a) CFD (b) MAP (c) PSP 0.0 0.2 0.4 0.6 0.8 1.0 x 1.25 1.00 0.75 0.50 0.25 0.00 0.25 0.50 0.75 Cp Measured CL = 0.5416, CM = -0.0762 MAP, CL=0.5416 CM=-0.0768 PSP CFD 95% CI (d) 20% Span 0.0 0.2 0.4 0.6 0.8 1.0 x 1.00 0.75 0.50 0.25 0.00 0.25 0.50 Cp Measured CL = 0.5416, CM = -0.0762 MAP, CL=0.5416 CM=-0.0768 PSP CFD 95% CI (e) 50% Span 0.0 0.2 0.4 0.6 0.8 1.0 x 1.0 0.8 0.6 0.4 0.2 0.0 0.2 0.4 0.6 Cp Measured CL = 0.5416, CM = -0.0762 MAP, CL=0.5416 CM=-0.0768 PSP CFD 95% CI (f ) 80% Span Fig. 9 M a c h = 0 . 87 , R e = 5 M , α = 3 . 0 de g . 17 (a) CFD (b) MAP (c) PSP 0.0 0.2 0.4 0.6 0.8 1.0 x 1.0 0.8 0.6 0.4 0.2 0.0 0.2 0.4 0.6 Cp Measured CL = 0.3441, CM = -0.0551 MAP, CL=0.3438 CM=-0.0552 PSP CFD 95% CI (d) 20% Span 0.0 0.2 0.4 0.6 0.8 1.0 x 0.8 0.6 0.4 0.2 0.0 0.2 0.4 0.6 Cp Measured CL = 0.3441, CM = -0.0551 MAP, CL=0.3438 CM=-0.0552 PSP CFD 95% CI (e) 50% Span 0.0 0.2 0.4 0.6 0.8 1.0 x 0.6 0.4 0.2 0.0 0.2 0.4 0.6 Cp Measured CL = 0.3441, CM = -0.0551 MAP, CL=0.3438 CM=-0.0552 PSP CFD 95% CI (f ) 80% Span Fig. 10 M = 0 . 87 , Re = 5 M , α = 1 . 5 . θ estimated from (6) (a) CFD (b) MAP (c) PSP 0.0 0.2 0.4 0.6 0.8 1.0 x 1.00 0.75 0.50 0.25 0.00 0.25 0.50 0.75 Cp Measured CL = 0.3441, CM = -0.0551 MAP, CL=0.3465 CM=-0.0556 PSP CFD 95% CI (d) 20% Span 0.0 0.2 0.4 0.6 0.8 1.0 x 1.00 0.75 0.50 0.25 0.00 0.25 0.50 0.75 Cp Measured CL = 0.3441, CM = -0.0551 MAP, CL=0.3465 CM=-0.0556 PSP CFD 95% CI (e) 50% Span 0.0 0.2 0.4 0.6 0.8 1.0 x 0.75 0.50 0.25 0.00 0.25 0.50 0.75 Cp Measured CL = 0.3441, CM = -0.0551 MAP, CL=0.3465 CM=-0.0556 PSP CFD 95% CI (f ) 80% Span Fig. 11 M a c h = 0 . 87 , R e = 5 M , α = 1 . 5 de g . . θ set to 0 to express greater prior belief on the CFD predictions 18 V . Data Fusion via Proper Orthogonal Decomposition W e now present the proper or thogonal decomposition with constraints (CPOD), which uses the POD [ 7 , 8 ] to first construct an orthogonal sub-space (also known as ’POD modes ’) from data (a vailable C P distributions). Then the unkno wn fused C P is appro ximated as a linear e xpansion of the POD modes, whose parameters (coordinates) can be estimated via linear least-squares methodology . A schematic of the method is pro vided in Figure 12. W e begin with a brief revie w of POD before proceeding to outline the proposed CPOD method. A. Proper Orthogonal Decomposition POD was or iginall y introduced in the context of turbulent flow modeling by Holmes et al [ 7 ], where it was used to characterize the coherent structures in the flo w from wind tunnel measurements. POD has a special characteristic of optimality in that it pro vides the most efficient means to capture the dominant components of a process [ 27 ]. Given a state variable u ∈ R n which may be the numer ical solution of a PDE on a computational mesh of size n or measurements from a ph ysical e xper iment (such as Particle Image V elocimetry), the POD expresses u as the linear e xpansion on a finite number of k or thonormal basis vectors φ i ∈ R n . That is, u ≈ k Õ i = 1 a i φ i (16) where, a i is the i th component of a ∈ R k and are the coefficients of the basis e xpansion. It can be sho wn that [ 27 , 28 ] the POD modes in the abov e equation are the same as the left singular v ectors of the snapshot matr ix (obtained b y stacking q snapshots of u ), U = [ u 1 , . . . , u q ] . That is, U = thin-svd ΦDΨ > (17) then Φ k represents the first k columns of Φ ∈ R n × q , after truncating the last q − k columns based on the relativ e magnitudes of the cumulativ e sum of their singular values. The L 2 error in approximation of the state variables due to the POD basis e xpansion is then given as q Õ j = 1   u j − ( Φ k Φ > k ) u j   2 2 = q Õ i = k + 1 d 2 i (18) where d i is the singular value corresponding to the i th column of Φ and is also the i th diagonal element of D . W e choose k such that Í k i = 1 d i / Í q i = 1 d i ≈ 0 . 99 , which essentially means we retain the modes that e xplain appro ximately 99% of the variability in the dataset. Let the (unkno wn) fused C P be denoted ˜ u . Firstl y , w e make the assumption that ˜ u is missing from the dataset but is in the subspace spanned by Φ k and hence ˜ u ≈ Φ k a , where a is to be estimated. Secondly , w e kno w that ˜ u has to 19 Fig. 12 Schematic of the CPOD methodology match the measurements z and hence one may consider estimating a b y minimizing k H > Φ k a − z k 2 2 , ho we ver this is an underdetermined problem unless k ≤ m , which is practically an unlikel y scenario. T o ov ercome this w e combine it with the minimization of k Φ k a − ˜ u k 2 2 , ho we ver ˜ u is unkno wn. Theref ore, we introduce an iter ative method that implements the CPOD which is descr ibed as f ollow s. B. CPOD The unkno wn ˜ u is first pro vided an initial guess f ollo wing which a is estimated via solving the minimization problem a ∗ = ar g min a 1 2 k Φ k a − ˜ u k 2 2 subject to H > Φ k a = z (19) Then, the guess f or ˜ u is updated as ˜ u ← Φ k a ∗ . The updated ˜ u is then used to enr ich the POD modes, i.e. Φ k ← thin-svd ( [ U , ˜ u ] ) and Eq. (19) is solv ed again. The procedure is repeated until the cost function ( J , to be defined in (20) ) reaches a steady value, which in this w ork is measured based on the standard deviation ( ∆ c ) of J o ver the previous c iterations f alling within some specified threshold  c , i.e. ∆ c = 1 c Í c i = 1 J 2 i −  1 c Í c i = 1 J i  2 ≤  c , where J i is the value of the cost function at the i th iteration. This stopping criter ion is used based on the assumption that w e are interested in the objectiv e function reaching a steady v alue rather than strictly zero. Note that this iterative approach in vol ves 20 re-computing the SVD of the data at ev ery iteration whose computational cost can be mitigated by per f or ming rank -1 updates to S VD [ 29 , 30 ]. W e no w show ho w to sol ve (19) , which is a linearl y constrained least-squares problem [ 31 ]. W e introduce the penalized cost-function J and re-wr ite the constrained optimization problem (19) as a ∗ , λ ∗ = ar g min a , λ J ( a , λ ) = 1 2 ( Φ k a − ˜ u ) > ( Φ > k a − ˜ u ) + λ > ( H > Φ k a − z ) (20) where λ ∈ R m × 1 is the v ector of Lagrang e parameters. The par tial derivativ es of the cos t function in (20) with respect to a and λ are then set equal to zero ∂ J ∂ a = Φ > k ( Φ k a − ˜ u ) + Φ > k H λ = 0 ∂ J ∂ λ = H > Φ k a − z = 0 (21) where the first line of (21) consists of k equations and the second consists of m equations. The full set of equations in (21) represent the Karush-Kuhn- T ucker (KKT) conditions [ 31 , 32 ] f or constrained optimization which can be wr itten in matrix form as         I k Φ > k H H > Φ k 0                 a λ         =         Φ > k ˜ u z         (22) where the I k is a k × k identity matr ix that results from Φ > k Φ k = I k and 0 is a m × m matrix of zeros. It can be shown that the coefficient matr ix in (22) is in vertible pro vided H is full rank (see Appendix .A) and hence the least-squares problem has a unique solution. The solution of (22) in vol v es computation of a matr ix inv erse during ev er y iteration but is computationall y cheap since the matrix has the reduced ( k + m ) × ( k + m ) size instead of the full n × n size where, k + m < < n . The CPOD method is summarized in Algor ithm 2 where it treats the av ailable data to be dev oid of an y uncer tainties and hence the u i are equivalent to the µ used the in the Ba yesian method. Further more, the CPOD algor ithm relies on a rich basis set obtained from POD and hence the snapshot matrix ( U ) combines all the av ailable data (CFD and 21 wind-tunnel) in step-1 of Algor ithm 2. Result : Fused v ector ˜ u Data: Snapshot matrix U = [ u 1 , . . . , u q ] , Fields at target flight condition u C F D , u W T , QoIs at targ et flight condition z , Stopping cr iterion c = 5 ,  c = 10 − 6 1) Extract POD modes U = ΦDΨ > Determine rank k ; Φ k = Φ ( : , 1 : k ) 2) Guess ˜ u as θ × u C F D + ( 1 − θ ) × u W T where θ ⊆ [ 0 , 1 ] ; 3) while ∆ c ≥  c do • Compute a ∗ via sol ving (19) • Update ˜ u ← Φ k a ∗ • Update Φ k : [ U , ˜ u ] = thin-svd ΦDΨ > • Compute cost function, (20) • Update ∆ c end Algorithm 2: CPOD - Proper Orthogonal Decomposition with Constraints C. Discussion W e now compare and contrast the per f ormance of the CPOD against the Bay esian approach. As we shall demonstrate the CPOD relies on lear ning the flo w features from the entire dataset. Theref ore when it is deemed necessary to enrich the POD basis set, additional data is generated via CFD since e xper imental data are not av ailable on demand. A dditionally , since the cost of generating CFD data for the RAE test case is significantly cheaper than the CRM model, w e restr ict compar ison of the methods based on the RAE test case only . Exper imental data cor responding to 11 flight conditions are av ailable (see T able 1) for the RAE test case; the number of snapshots, q is theref ore the CFD data generated in addition to these 11 cases. W e pick tw o contras ting cases (2 & 11) to demons trate the compar ison in fa v or of keeping the discussion precise. 1. Effect of Snapshot Size q Since the CPOD relies on searching f or the unknown ˜ u from a subspace, the r ichness of the snapshot matrix play s a major role in the per f or mance of the method unsur pr isingl y; this is demonstrated in Figure 13. Here, a maximin Latin Hypercube designs [ 17 ] of size 20, 40 and 80 are generated in the ( M ac h , R e , α ) space to augment the experimental data leading to q = 31 , 51 and 91 snapshots respectivel y . Note that the q = 22 case contains 11 CFD cases cor responding to e xactly the same flight conditions as in T able 1. 22 With too fe w snapshots ( q = 22 ), the CPOD determines a ˜ u that satisfies the f or w ard model within the specified tolerance but results in a C P curve that looks noisy and phy sically unrealistic (Figure 13a). This also demonstrates the ill-conditioning of the problem mentioned in section I, i.e. there are more than one C P curve that satisfies the forw ard model to matc h the measured QoIs. R ecall that the Ba yesian approach discussed ear lier ov ercomes this issue with the prior regular ization. With more snapshots, the CPOD subspace is enr iched leading to smoother C P curves. The sensitivity to the snapshot size manif ests more prominently only when the discrepancy between the f orward model predictions and the measured QoIs is significant, as in Case 2 shown in Figure 13a. For this specific case, q = 91 is necessary to ensure sufficiently smooth C P curves. As a counter e xample, Figure 13b show s that f e wer snapshots are sufficient f or this specific case. How ev er , based on the w orst-case scenar io (Case-2), q is fix ed at 91 for the rest of the results. 0.0 0.2 0.4 0.6 0.8 1.0 x 1.00 0.75 0.50 0.25 0.00 0.25 0.50 0.75 C P C P O D , q = 2 2 C P O D , q = 3 1 C P O D , q = 5 1 C P O D , q = 9 1 (a) Case-2: M a c h = 0 . 676 , Re = 5 . 7 × 10 6 , α = − 2 . 18 d e g . 0.0 0.2 0.4 0.6 0.8 1.0 x 1.0 0.5 0.0 0.5 1.0 C P C P O D , q = 2 2 C P O D , q = 3 1 C P O D , q = 5 1 C P O D , q = 9 1 (b) Case-11: M a c h = 0 . 74 , Re = 2 . 7 × 10 6 , α = 3 . 19 d e g . Fig. 13 Effect of snapshot size q on the smoothness of CPOD predictions. An aspect we care about as par t of the data fusion frame work is uncer tainty quantification. Whereas the MAP results are a function of the input uncer tainties in the data, the CPOD results do not account f or any of the uncer tainties and treats the dataset { u i } as ouputs of deterministic experiments as mentioned bef ore. Ho we ver , in the ne xt subsection we sho w how approximate confidence bounds can be constructed for the CPOD predictions. 2. Confidence bounds for CPOD One of the obvious limitations of the CPOD is its inability to account for uncer tainty in the datasets. Ho we ver , one can use a frequentist approach to obtain confidence bounds as demonstrated here. Assuming that the unkno wn fused C P is the true mean of a multi-variate nor mally distributed population with unknown cov ariance function, the confidence intervals are appro ximated via the S tudent ’s t − distribution [ 33 ] described as f ollo ws. Given T realizations of the ˜ u that 23 results from repeating Algorithm 2 with independent draw s of θ from a unif or m distribution U ( 0 , 1 ) , a t − distribution of ν = T − 1 degrees of freedom is defined as the dis tr ibution of the location of the sample mean relativ e to the tr ue mean, divided b y the sample s tandard deviation. This wa y , the t − distribution can used to construct confidence bounds for the true mean. The sample mean ( ˆ µ ) and co variance ( ˆ Σ ) are giv en by ˆ µ = 1 T T Õ i = 1 ˜ u i (23) ˆ Σ = 1 T − 1 T Õ i = 1 [ ˜ u i − ˆ µ ][ ˜ u i − ˆ µ ] > (24) The 1 − β confidence inter vals can then be provided as ˆ µ ± t 1 − β,ν diag ( ˆ Σ )/ √ T where diag ( ˆ Σ ) are the diagonal elements of ˆ Σ and t 1 − β,ν is the t − value cor responding to a ( 1 − β ) × 100 % confidence. The 95% confidence bounds are plotted f or the tw o RAE cases in Figure 14 with T = 1000 . The bounds are more pronounced in for the CPOD results in 14b compared to 14a par ticularl y around the shock. This is because the CPOD relies mainly on minimizing the QoI data misfit whereas the Bay esian method accounts f or uncer tainties in the data. It is emphasized that the confidence bounds in CPOD are an artifact of the randomness in the initial guess provided (via the parameter θ ) and does not account f or the the input uncer tainties, unlike the Bay esian method. In that sense, the CPOD confidence bounds might not be highly useful in uncertainty quantification although they visualize the effect of the mis-specification of the initial guess to the method. In all the plots in this section showing CPOD predictions, T = 1000 unless otherwise mentioned. The CPOD and Bay esian methods present tw o approaches to wards fusing data from multiple fidelity experiments. The common thread between the two approaches is that the y both they find the best fit between the predictions of the f or w ard model and the measurements of the QoI’ s in the least-squares sense. How ev er , they differ in a fundamental w ay - whereas the CPOD is a data-dr iv en approach that essentially depends on the flow -f eatures learned from the data (in the f or m of POD modes), the Bay esian approach relies on subjectiv e prior specifications. Another impor tant aspect that distinguish the approaches is that while the Bay esian approach accounts and propagates uncer tainties in the a vailable data, the CPOD does not. Despite the differences, the authors hav e obser v ed that with adequate snapshots the CPOD predictions are v er y similar to the MAP predictions of the Bay esian method. 3. CPOD vs MAP The predictions from the Bay esian (MAP) and the CPOD (with q = 91 ) methods are compared for the RAE cases 2 and 10 in Figure 15. Sur pr isingl y , the MAP and the CPOD predictions are very similar despite the fact that the CPOD depends on the entire dataset ( q = 91 in this case) to learn the flow features in the data and provide a ph ysicall y realis tic result whereas the MAP required only one snapshot each from the two fidelity experiments. The difference is more pronounced in cases where the discrepancy betw een the QoI’ s predicted from the f or ward model and the measurements 24 0.0 0.2 0.4 0.6 0.8 1.0 x 1.00 0.75 0.50 0.25 0.00 0.25 0.50 0.75 C P CPOD 95% CI 0.0 0.2 0.4 0.6 0.8 1.0 x 1.0 0.5 0.0 0.5 C P MAP 95% CI (a) Case-2: M a c h = 0 . 676 , Re = 5 . 7 10 6 , α = − 2 . 18 d e g . 0.0 0.2 0.4 0.6 0.8 1.0 x 1.0 0.5 0.0 0.5 1.0 C P CPOD 95% CI 0.0 0.2 0.4 0.6 0.8 1.0 x 1.0 0.5 0.0 0.5 1.0 1.5 C P MAP 95% CI (b) Case-11: M a c h = 0 . 74 , Re = 2 . 7 10 6 , α = 3 . 19 d e g . Fig. 14 Confidence bound comparison is more pronounced, as sho wn in Figure 15a. Although the CPOD has lear ned flo w-f eatures from data, in this case the snapshots are dominated by CFD data which inherits the bias in the data as w ell. Theref ore it leads to a slightly different prediction compared to the MAP which depends onl y on the model misfit and prior specification (which depends onl y on u C F D and u W T ). Despite the differences, one can appreciate the similar ity between the predictions at a very high lev el. 4. Computational Aspects The CPOD relies on an appropr iate initial guess very similar to the requirement of the pr ior in the Bay esian method. As mentioned in section I, both methods can lead to phy sically unrealistic results while still satisfying the f or w ard problem within a specified tolerance. This work relies on the assumption that the unkno wn (fused) C P is some where in the neighborhood of the av ailable C P ’ s from the various fidelity e xperiments. And within such a sufficiently small search space, it is obser v ed that the CPOD and the MAP predictions are quite similar . One of the significant differences between the tw o methods (in addition to those mentioned in the previous subsection) 25 0.0 0.2 0.4 0.6 0.8 1.0 x 1.00 0.75 0.50 0.25 0.00 0.25 0.50 0.75 C P CPOD MAP (a) Case-2: M a c h = 0 . 676 , Re = 5 . 7 10 6 , α = − 2 . 18 d e g . 0.0 0.2 0.4 0.6 0.8 1.0 x 1.0 0.5 0.0 0.5 1.0 C P CPOD MAP (b) Case-11: M a c h = 0 . 74 , Re = 2 . 7 10 6 , α = 3 . 19 d e g . Fig. 15 Comparison of the CPOD and Ba yesian methods for data fusion T able 3 Comparison of computational costs Computation CPOD Ba yesian SVD nq 2 - linear sys tem solv e ( k + m ) 3 - Cost function computation n 2 - Estimation of θ - n + m 3 Compute y M A P , Γ - n 3 + n 2 is the computational time comple xity . It w ould appear that the CPOD incurs significantly more computational cost since it operates on a snapshot set in contrast to the Ba yesian approach. Additionall y , the SVD (and recursiv e S VD-update) steps scale with the data size. How e ver , the Bay esian approac h in v olv es the inv ersion of matr ices of size n × n which is e xpensive as n gets bigger . On the other hand, the CPOD method is iterativ e although it was obser v ed to con ver ge monotonically within 3-5 iterations f or all the cases tested in this work; an ex ample of which is sho wn in Figure 16. It is ho we ver w orth recalling that the reduced computations in the Bay esian approach are a consequence of the choice of conjugate pr iors that lead to analyticall y closed-f or m e xpression f or the MAP and the confidence bounds. This will not be the case in a more general setup with g ener ic pr iors and the estimation of the hyperparameters included in the method. A v er y rough comparison of the computational time-comple xity for both methods are pro vided in T able 3 in ter ms of the order of magnitude ( O ) of floating point operations. Note that in the comparison, the Bay esian method assumes Gaussian prior and likelihoods and known hyperparameters whereas the CPOD represents one repetition at a fixed θ . Although from a time-complexity point of view , the Bay esian method seems expensiv e, from a wall-cloc k time point of vie w either methods take O ( s e c s ) in a standard w orkstation (Intel i7, 1.9GHz, 4 Core) at the time of writing this ar ticle f or both the 2D and 3D cases. This demons trates the feasibility of the present implementation with real large-scale data. 26 0 1 2 3 4 5 optimization iterations 0.0 0.2 0.4 0.6 0.8 1.0 cost function (a) Conv ergence of cost function 0 1 2 3 4 5 optimization iterations 8 × 1 0 1 9 × 1 0 1 Measured CL Measured CM * -10 (b) Conv ergence of QoIs. Solid lines are QoI histories Fig. 16 Optimizer history for the CPOD VI. Conclusion A no vel method to fuse noisy , incomplete and biased aerodynamic field inf or mation from wind-tunnel measurements and high-fidelity mathematical model predictions is presented. The method applies Bay esian inf erence to sol v e f or the fused C P distribution given predictions from computer experiments and wind-tunnel measurements. The approac h depends on providing a pr ior belief on the unkno wn C P distribution as a linear combination of the field data. Then a likelihood model is defined that minimizes the least-squares misfit between the output QoIs (computed via the f or w ard model (2) ) and the measurements. Combining the prior belief with the likelihood via Bay es ’ r ule, the fused C P distribution is estimated as a probability distribution. The parameters of the methodology are the uncer tainties associated with the datasets as well as the measurements in addition to a cor relation length-scale that deter mines the smoothness of the pr ior . One of the main assumptions made in this work is the specification of the prior via the estimation of the parameter θ . Future directions in this work shall add more r igor to the method by ackno wledging our ignorance on θ and estimating it via a Hierarchical Bay es method. The method is successfully demonstrated on the transonic flo w past a wing section (RAE2822 airf oil) as w ell as an entire aircraft (CRM). Ov erall, the proposed method performs a statistical adjustment to the a v ailable field data based on very measurements of QoIs treated as g round truth. The adjusted data can then be used to construct data-dr iv en models tow ards realizing the DT goal. As an alter nativ e method to sol ve the same problem, the CPOD method is introduced which, lik e the Bay esian method minimizes the misfit between the predicted and measured QoIs, how ev er the search space is constrained b y the POD subspace. Compar ison of the CPOD and Bay esian methods rev ealed that they result in sur prisingly similar results despite significant differences in their approach. A kno wn limitation of the CPOD is that it cur rently does not account f or the uncer tainty in the dataset; future directions in this work aims to devise methods that account f or the noise in the SVD step. Another obser v ed limitation of the CPOD is that in cases where experimental measurements of the fields is 27 limited, additional snapshots are likely to be generated from CFD. This could lead to biased results since the POD basis could inherit the bias present in the CFD data. Another f ocus of future w ork is the hybridization of the Ba yesian and CPOD method - i.e. specification of inf or mativ e priors derived from the POD. While this is a relativ ely straightf orward e xtension, the authors are in ves tigating the remedies to the kno wn limitations of the proposed methodologies as a first step. A ckno wledgments The mater ial is based upon work suppor ted b y Airbus in the frame of the Airbus / Georgia T ech Center for MBSE-enabled Ov erall Aircraft Design and by the U .S. Depar tment of Energy , Office of Science, under contract number DE-A C02-06CH11357. The insets used in Figures 1 and 12 sho wing wind-tunnel measurements are adapted from www.nasa.gov . Go vernment License: The submitted manuscript has been created b y UChicago Argonne, LLC, Operator of Argonne National Laboratory ("Argonne”). Argonne, a U.S. Depar tment of Energy Office of Science laborator y , is operated under Contract No. DE-A C02-06CH11357. The U .S. Go vernment retains f or itself, and others acting on its behalf, a paid-up nonex clusiv e, ir rev ocable w orldwide license in said ar ticle to reproduce, prepare der iv ative w orks, distribute copies to the public, and per f or m publicly and display publicly , b y or on behalf of the Go v ernment. The Depar tment of Energy will pro vide public access to these results of f ederally sponsored research in accordance with the DOE Public A ccess Plan (http://energy .gov/do wnloads/doe-public-access-plan). Ref erences [1] Glaessgen, E., and Starg el, D., “The digital twin paradigm for future NAS A and US Air Force v ehicles, ” 53rd AIAA/ASME/ASCE/AHS/ASC Structur es, S tr uctur al Dynamics and Materials Confer ence 20th AIAA/ASME/AHS Adaptiv e Structur es Conf erence 14th AIAA , 2012, p. 1818. [2] Johnson, F . T ., Tinoco, E. N., and Y u, N. J., “Thir ty y ears of dev elopment and application of CFD at Boeing Commercial Airplanes, Seattle, ” Computers & Fluids , V ol. 34, No. 10, 2005, pp. 1115–1151. [3] Keane, A., “Wing optimization using design of e xper iment, response sur face, and data fusion methods, ” Journal of Aircr aft , V ol. 40, No. 4, 2003, pp. 741–750. [4] Edeling, W ., Cinella, P ., Dwight, R., and Bi jl, H., “Bay esian estimates of parameter variability in the k–  turbulence model, ” Journal of Computational Physics , V ol. 258, No. 4, 2014, pp. 1–9. [5] Emory , M., Larsson, J., and Iaccarino, G., “Modeling of structural uncer tainties in Re ynolds-a verag ed Na vier -Stokes closures, ” Physics of Fluids , V ol. 25, No. 11, 2013. doi:10.1063/1.4824659. 28 [6] Mathelin, L., Hussaini, M. Y ., and Zang, T . A., “Stoc hastic approaches to uncer tainty quantification in CFD simulations,” Numerical Algorithms , V ol. 38, No. 1-3, 2005, pp. 209–236. [7] Holmes, Philip., Lumley , John L., Berk ooz, Gahl and Ro wle y, C. W ., T urbulence, Coherent Structures, Dynamical Sys tems and Symmetry , V ol. 36, 1998. doi:10.2514/2.399. [8] Siro vich, L., “T urbulence and the dynamics of coherent str uctures. I - Coherent structures. II, ” Quar ter ly of Applied Mathematics (ISSN 0033-569X) , V ol. 45, No. July , 1987, p. 561. [9] Everson, R., and Siro vich, L., “Karhunen–Loev e procedure f or gapp y data, ” JOSA A , V ol. 12, No. 8, 1995, pp. 1657–1664. [10] Bui- Thanh, T ., Damodaran, M., and Willco x, K. E., “ Aerodynamic data reconstr uction and inv erse design using proper orthogonal decomposition, ” AIAA jour nal , V ol. 42, No. 8, 2004, pp. 1505–1516. [11] Rusc her, C. J., Dannenhoffer , J. F ., and Glauser , M. N., “Repairing Occluded Data f or a Mach 0.6 Jet via Data Fusion, ” AIAA Journal , V ol. 55, No. 1, 2016, pp. 255–264. doi:10.2514/1.j054785, URL https://doi.org/10.2514/1.J054785 . [12] W en, X., Li, Z., Peng, D., Zhou, W ., and Liu, Y ., “Missing data reco very using data fusion of incomplete complementar y data sets: A par ticle image velocimetry application, ” Physics of Fluids , V ol. 31, No. 2, 2019. doi:10.1063/1.5079896. [13] Dee, D. P ., Uppala, S., Simmons, A., Ber risford, P ., Poli, P ., Koba yashi, S., Andrae, U., Balmaseda, M., Balsamo, G., Bauer, d. P ., et al., “The ERA-Interim reanaly sis: Configuration and performance of the data assimilation sys tem, ” Quarterly Journal of the ro yal meteor ological society , V ol. 137, No. 656, 2011, pp. 553–597. [14] Kalnay , E., Atmospheric modeling, data assimilation and pr edictability , Cambridge university press, 2003. [15] Reng anathan, A., Harada, K., and Ma vr is, D. N., “Multifidelity Data Fusion via Bay esian Inference, ” AIAA A viation 2019 F orum , 2019, p. 3556. [16] Cal vetti, D., and Somersalo, E., “Inv erse problems: From regularization to Bay esian inference, ” Wiley Inter disciplinar y Reviews: Computational Statistics , V ol. 10, No. 3, 2018, p. e1427. [17] Santner , T . J., Williams, B. J., Notz, W ., and Williams, B. J., The design and analysis of computer experiments , V ol. 1, Springer , 2003. [18] Williams, C. K., and Rasmussen, C. E., Gaussian processes for machine lear ning , V ol. 2, MIT press Cambridge, MA, 2006. [19] “EXPERIMENT AL D A T A B ASE FOR COMPUTER PR OGRAM ASSESSMENT - R eport of the Fluid Dynamics Panel W orking Group 04, ” T ech. rep., A GARD Advisory Report No. 138, 1979. URL http://eda- ltd.com.tr/caeeda_doc/AGARD- AR- 138.pdf . [20] V assberg, J., Dehaan, M., Riv ers, M., and W ahls, R., “Dev elopment of a common research model f or applied CFD validation studies, ” 26th AIAA Applied Aer odynamics Confer ence , 2008, p. 6919. 29 [21] McLachlan, B. G., and Bell, J. H., “Pressure-sensitiv e paint in aerodynamic testing, ” Experimental ther mal and fluid science , V ol. 10, No. 4, 1995, pp. 470–485. [22] Morr is, M., Donov an, J., Keg elman, J., Schwab, S., Levy , R., and Cr ites, R., “ Aerodynamic applications of pressure sensitive paint, ” AIAA jour nal , V ol. 31, No. 3, 1993, pp. 419–425. [23] Riv ers, M. B., and Dittberner, A., “Exper imental inv estig ations of the N ASA common research model, ” Jour nal of Aircr af t , V ol. 51, No. 4, 2014, pp. 1183–1193. [24] ST AR -CCM+, 2019. URL https://mdx.plm.automation.siemens.com/star- ccm- plus . [25] Reng anathan, S. A., Liu, Y ., and Ma vr is, D. N., “Koopman-Based Approach to Nonintrusive Projection-Based Reduced-Order Modeling with Black -Box High-Fidelity Models, ” AIAA Journal , V ol. 56, No. 10, 2018, pp. 4087–4111. [26] Reng anathan, S. A., “ A Methodology f or Non-Intrusiv e projection-based model reduction of e xpensive black -box PDE-based sys tems and application in the many -quer y conte xt, ” Ph.D. thesis, Georgia Institute of T echnology , 2018. [27] Holmes, Philip., Lumley , John L., Berk ooz, Gahl and Ro wle y, C. W ., T urbulence, Coherent Structures, Dynamical Sys tems and Symmetry , V ol. 36, 1998. doi:10.2514/2.399. [28] Chatterjee, A., “An introduction to the proper or thogonal decomposition, ” Curr ent Science , V ol. 78, No. 7, 2000, pp. 808–817. doi:10.1109/LPT .2009.2020494. [29] Lev e y , A., and Lindenbaum, M., “Sequential Karhunen-Loev e basis extraction and its application to images, ” IEEE T ransactions on Imag e pr ocessing , V ol. 9, No. 8, 2000, pp. 1371–1374. [30] Brand, M., “Fas t low -rank modifications of the thin singular value decomposition, ” Linear algebr a and its applications , V ol. 415, No. 1, 2006, pp. 20–30. [31] Boyd, S., and V andenberghe, L., Convex optimization , Cambr idge university press, 2004. [32] Nocedal, J., and W right, S., Numerical optimization , Springer Science & Business Media, 2006. [33] Ross, S., A firs t course in probability , Pearson, 2014. [34] Press, W . H., T eukolsky , S. A., V etterling, W . T ., and Flanner y , B. P ., “Numerical recipes in C++, ” The ar t of scientific computing , V ol. 2, 1992, p. 1002. [35] Ahrens, J., Ge veci, B., and La w , C., “Para view : An end-user tool for large data visualization, ” The visualization handbook , V ol. 717, 2005. 30 Appendix A. Proof of existence of solution f or (22) Here w e show that a unique solution e xists for the Equation 22. Theorem .1 The KKT matrix         I k Φ > k H H > Φ k 0         is inv er tible if H is full rank. Consider the f ollowing matrix inv ersion lemma by Pr ess et al [34]. If A =         P Q R S         , then A − 1 =         ˜ P ˜ Q ˜ R ˜ S         wher e, ˜ P = P − 1 + P − 1 QMRP − 1 ˜ Q = − P − 1 QM ˜ R = − MRP − 1 ˜ S = M and M = ( S − RP − 1 Q ) − 1 Proof. For the coefficient matr ix in (22) to be in vertible, it suffices to sho w that M e xists, since P = I m and hence P − 1 e xists. The M can be wr itten as M = [ 0 − ( H > Φ k ) × ( Φ > k H )] − 1 (25) The m columns of H contain the integral operators to compute the f orces and moments from the C P distributions. More specifically , the first column computes C L and the second column computes C M . If the columns of H w ere linear ly dependent, then C L and C M w ould be scalar multiples of each other which is not the case. Therefore it is concluded that H has full column rank of m . Since the or thogonal transformation preserves the rank, rank  ( H > Φ k ) × ( Φ > k H )  is equal to rank  H > H  which is full-rank. Theref ore M exis ts and hence the inv erse to the coefficient matr ix in (22) e xists. B. Pre-Processing The pressure distribution from the PSP measurement contains missing data and in the datasets considered in the present study , they are f ound predominantl y in regions near nose and tail of the fuselage; see Figure 17. These missing regions are then inter polated from the surrounding sur f ace data using the v oronoi kernel inter polation av ailable in open-source visualization tool Para view [ 35 ], although other techniques f or inter polation may be used. Once the 31 missing data are imputed, we inter polate them onto a common gr id to perf orm the data fusion. The common gr id is created b y coarsening the PSP measurement gr id to keep the computational costs tractable. The final number of cells and points of the gr ids are shown in the T able 4. The compar ison of the computed coefficients of lift and moment f or the original and coarse g rids are shown in T able 5 where the summar y statistic of the percent error across all the test cases is pro vided; a maximum er ror of 1.72% er ror confir ms that the final gr id is not too coarse. The Figure 18 and Figure 19 also sho w the contour plots of pressure distribution comparing the or iginal gr id and the coarse g rid. Overall we obser v e that the gr id coarsening has a negligible impact on the quantities of interest while using only ∼ 10% of the inf or mation on the original g rids which leads to impro ved computational efficiency . Num. of Cells Num. of Points CFD 595642 533443 PSP 84288 85711 Coarse 8688 9103 T able 4 Summary of grids geometry Mean Max Min CFD CL 0.53 0.74 0.37 CM 0.39 0.61 0.09 PSP CL 1.07 1.72 0.02 CM 0.45 1.30 0.03 T able 5 Absolute percent difference be tween orig- inal and coarse grids (a) Ra w Data (b) Inv alid Dat a Remo ved Fig. 17 PSP measurements cont aining in valid dat a points 32 (a) Original data (b) Coarsen data Fig. 18 CFD pressure distribution on upper wing (a) Original data (b) Coarsen data Fig. 19 PSP pressure distribution on upper wing 33

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment