Bayesian estimate of the degree of a polynomial given a noisy data sample
A widely used method to create a continuous representation of a discrete data-set is regression analysis. When the regression model is not based on a mathematical description of the physics underlying the data, heuristic techniques play a crucial rol…
Authors: Giovanni Mana, Paolo Alberto Giuliano Albo, Simona Lago
Ba y esian estimate of the degree of a p olynomial giv en a noisy data sample Gio v anni Mana, P aolo Alb erto Giuliano Alb o, and Simona Lago INRIM – Istituto Nazionale di Ricerca M etrologica, str. delle Cacce 91, 10135 T orino, Italy Abstract. A widely used metho d to create a con tinuous represent ation of a discrete data-set is regression analysis. When the regression model is not based on a mat hematical description of t he physics underlying the data, heuristic tec hniques play a cr ucial role a nd the model choice can ha ve a s ignifican t impact on the result. In this pap er, the problem of identifying the most appropriate mo del is formulated and solved in terms of Bay esian selection. Besides, probability calculus is the best wa y to choose among different al ternativ es. The results obtained are applied to the case of both univ ar iate and biv ari ate polynomial s used as trial solutions of systems of thermodynamic partial differen tial equations. Submitted to: Metr olo gia P ACS num bers : 07.05.Kf , 02.50.Cw, 06.20.Dk, 02.50.Tt E-mail: g.mana@ inrim.it 1. Intr o ductio n A problem in regress ion a nalysis is to determine how many basis functions to include in the reg ression mo del, for instance, when determining the calibr ation cur ve that bes t fits t he da ta [1]. An y s et of bas is functions can b e considered; when they are po lynomials, the pro blem is determining the degree o f the reg ression. A maximum likelihoo d appr oach, whic h leads to the highest po ssible num b er of the basis functions, cannot be the rig h t choice. This problem ha s b een consider ed by many a uthors in different statistical settings and their inv estigatio ns led to a num b er of prop osed solutions [2, 3, 4, 5, 6]. An or iginal a nd undeservedly neglected one is hidden in a tutorial pap er o n B ayesian r e a soning by Gull [7], wher e the basic idea is to calculate and to compare ea ch mo del pr obabilit y , given the data. In order to bring this result to the metrologist’s atten tion, we rea ssess the Gull work and make clea r its us e fulness in selecting amo ng linea r mo dels. In a dditio n, b y slightly c hang ing the mo del parametr is ation, we obtain an exact ana ly tical solution and demonstra te that, in suitable limit cas es, it reduces to the Gull’s approximate one. Here obtained results may hav e an impac t when the polyno mial co efficients are used for solving s ystems of par tial differential equations [8, 9]. In this ca se, different choices o f the p olynomial degr ee lead to different sets of co efficients and, consequently , 2 to different solutions. The av ailability o f a rig orous criter io n based on the probability calculus allows any arbitr ary choice – in general, driven only b y the residuals analysis – to b e avoided. T o illustrate the concepts her e describ ed, it is shown how to determine the set of ba sis functions that best fits the measur ed v alues o f the sp eed o f s o und in acetone, a s a function of the temp erature a nd pressur e. 2. Problem statement W e wan t to represe nt the y = [ y 1 , y 2 , ... y N ] T measurement results by the linear mo del y = W ( l ) a + ǫ , (1) where ǫ = [ ǫ 1 , ǫ 2 , ... ǫ N ] T are additive unco r related Gaussian errors having unknown v ar iance σ 2 and zero mean, a = [ a 0 , a 1 , ... a l − 1 ] T are l mo del parameters, W ( l ) is a N × l matrix explaining the data , W nm = w m ( x n ), a nd { w 0 ( x ) , w 1 ( x ) , ... w l − 1 ( x ) } is a s et of l basis functions. The basis functions may b e po lynomials, for instance, w m ( x ) = x m , but, in general, they a re a n y set o f linearly indep enden t functions. The problem is to find the set of basis functions most suppo r ted by the da ta; when they are p olynomia ls, this co rresp onds to find the optimal degree o f the reg ression. The in terpr e ta tiv e mo del of the data is summar ised by the matrix W ( l ); therefore , the problem is equiv alent to find – within a set o f matrices explaining the da ta – the one most suppo rted by the data. Since it explicitly a ppea r s in the final formulae a nd for the sake of notational simplicity , we lab el the W matrices by the num b er l of the mo del fr e e-parameters. Ho wev er, we can compare a lso mo dels having the same nu mber of pa rameters but differ en t basis functions. 3. Bay esian inferences According to the Bayes theorem – by assigning the same probability to all the mo dels – the proba bilit y of the l -th mo del to explain the data is pro por tional to the pro babilit y of the obser ved data given W ( l ), no ma tter what the v alues of the mo de l parameters may b e. In turn, it is the normalising factor of the likelihoo d of the mo del para meters times the probability distr ibution synthesising the info r mation av ailable ab out the parameter v alues b efore the measurement results ar e av aila ble . T o steer the calcula tion, we must first determine the post-data probability dens it y , P ( a , σ | y , l ), of the pa rameters of each mo del (which pa rameters include the unknown v ar iance σ 2 of the data ) given the y n data and the da ta-explaining matrix W ( l ). The po st-data pro ba bilit y dens ity is found via the pro duct rule of probabilities [10, 11], P ( a , σ | y , l ) Z ( y | l ) = N N ( y | σ , a , l ) π ( σ, a | l ) , (2) where the N -dimensional Gaussia n function N N ( y | σ , a , l ) = 1 p (2 π ) N σ N exp − | y − W a | 2 2 σ 2 (3) is the likelihoo d of the a and σ pa rameters, π ( σ , a | l ) is their pre-da ta pr obability density , the sought nor malisation factor o f N N ( y | σ , a , l ) π ( σ, a | l ), Z ( y | l ) = Z Γ N N ( y | σ , a , l ) π ( σ, a | l ) d σ d a , (4) is named mo del evidence , and the integration is car ried out over the hypervolume Γ asso ciated to the p ossible a and σ v alues. 3 Next, by o bserving that Z ( y | l ) is als o the proba bilit y densit y of the data given W ( l ) – whatever the v alues of a and σ may b e – we get the p ost-data mo del-pro ba bilit y , Prob( l | y ), by a pply ing a gain the pro duct rule of probabilities to the { l , y } pa ir. Hence [12], Prob( l | y ) A ( y ) = Z ( y | l ) , (5) where, prior the data are a t hand, we assig ned the same probability to each mo del and A ( y ) = L X l =1 Z ( y | l ) , (6) where L is the num b er of mo dels to b e compared, is the nor malisation factor of Z ( y | l ). Therefore, to solve the s ta ted problem, the calculation of the e v idence (4) is central. 4. Pre-data distri bution T o set the pre-data distribution of a a nd σ , we assume that they are indepe nden t. Hence, π ( σ, a | l ) = π σ ( σ ) π a ( a | l ). As rega rds σ , w e use the impro per J effreys prior [1 3] π σ ( σ ) = 1 /σ, (7) which is inv ariant for a change of the measur emen t unit of the data. As regards the a parameter s, let the m ea n o f y , whatever the a v alues may be, n ull. The r elev ant av erag e is carr ied out over the joint distribution N N ( y | σ , a , l ) π ( σ, a | l ), not over the sampling distribution of the data N N ( y | σ , a , l ), where the a v alues are fixed. Consequently , since y = W a + ǫ and ǫ are z e r o-mean error s, als o the pre-da ta mean of the a parameter s is zero. In a ddition, let β 2 1 b e t he pre-data c o v arianc e of y , wha tev er the a v alues may b e and where 1 is the unit matrix . Also in this cas e, the stated cov aria nce is relev ant to the joint distribution N N ( y | σ , a , l ) π ( σ, a | l ). Hence, by obs erving that W C aa W T + σ 2 1 = β 2 1 , b ecause o f (1), the pr e-data cov ariance C aa of the mo del parameters is C aa = ( β 2 − σ 2 )( W T W ) − 1 , (8) where β 2 > σ 2 . Even tually , since the prior distr ibution of a is constrained by h a i = 0 and (8), where the angle bracket indicate the mean, the pr inciple of maximum en tropy fixes the so ugh t pr e-data distribution to the l - dimensional Gaussia n distribution [11] π a ( a | β , l ) = N l ( a | 0 , C aa ) (9) having zer o mean and C aa cov ariance. Actually , the β v alue is unknown. Therefore, we should eliminate it by margina lisation, π a ( a | l ) = Z ∞ σ π a ( a , β | l ) d β , (10) where π a ( a , β | l ) = π a ( a | β , l ) π β ( β ) = 1 β s det( W T W ) (2 π ) l ( β 2 − σ 2 ) l exp − a T W T W a 2( β 2 − σ 2 ) , (11) and π β ( β ) = 1 / β is a Jeffre ys prior. How ever, since the in tegr ation in (10) cannot b e done ana lytically , we add β to the mo del par ameters and delay the marginalisa tion 4 ov er β a s muc h as po ssible. Hence, by putting (7) and (11) together , we o btain the pre-data distribution of the full set of mo del pa r ameters, π ( σ , β , a | l ) = 1 β σ s det( W T W ) (2 π ) l ( β 2 − σ 2 ) l exp − a T W T W a 2( β 2 − σ 2 ) . (12) A comment is necess a ry . In g e neral, the use of impro per prio rs – like π σ ( σ ) = 1 /σ and π β ( β ) = 1 /β – s hould b e avoided, becaus e, in such a case, the mo del evidence (4) is defined only up to unknown scale factors. Howev er, since in this case the sa me factor is included in a ll the evidences, this do es no t jeopar dise the mo del co mparison. 5. Mo de l sel ection 5.1. Evidenc e c alculation By c om bining (3) and (12), the evidence of W ( l ) is Z ( y | l ) = Z + ∞ 0 d σ 1 σ N +1 Z + ∞ σ d β 1 β s det( W T W ) (2 π ) N + l ( β 2 − σ 2 ) l × Z + ∞ −∞ exp − | y − W a | 2 2 σ 2 exp − a T W T W a 2( β 2 − σ 2 ) d a . (13) Before ca rrying out the integration, we o bserve that | y − W a | 2 = ( a − ˆ a ) T W T W ( a − ˆ a ) + y T ( y − ˆ y ) , (14) where ˆ a = ( W T W ) − 1 W T y is the least square s estimate o f a and ˆ y = W ˆ a is the measurand estimate. Hence, the first in tegr ation is exp − y T ˆ ǫ 2 σ 2 Z + ∞ −∞ exp − ( a − ˆ a ) T W T W ( a − ˆ a ) 2 σ 2 exp − a T W T W a 2( β 2 − σ 2 ) d a = σ β l s (2 π ) l ( β 2 − σ 2 ) l det( W T W ) exp − y T ˆ ǫ 2 σ 2 exp −| ˆ y | 2 2 β 2 , (15) where ˆ ǫ = y − ˆ y are the residuals a nd | ˆ y | 2 = ˆ y T ˆ y = ˆ a T W T W ˆ a . It m ust b e noted that y T ˆ ǫ > 0 b ecause | y − W a | 2 > 0 no matter what the a v alue may b e . Consequently , the right-hand side of (14) is gr eater than zero also when a = ˆ a . Hence, y T ˆ ǫ = y T ( y − ˆ y ) > 0. The next integration, 1 σ N +1 − l p (2 π ) N exp − y T ˆ ǫ 2 σ 2 Z + ∞ σ 1 β l +1 exp −| ˆ y | 2 2 β 2 d β = √ 2 l − 2 Γ( l / 2) − Γ p/ 2 , | ˆ y | 2 / (2 σ 2 ) σ N +1 − l p (2 π ) N | ˆ y | l exp − y T ˆ ǫ 2 σ 2 , (16) where Γ( z ) is the Euler g a mma function, elimina tes β . Even tually , provided N > l , the evidence is Z ( y | l ) = √ 2 l − 2 p (2 π ) N | ˆ y | l Z + ∞ 0 Γ( l / 2) − Γ p/ 2 , | ˆ y | 2 / (2 σ 2 ) σ N +1 − l exp − y T ˆ ǫ 2 σ 2 d σ = Γ N − l 2 4 √ π N Γ l 2 | ˆ y | l ( y T ˆ ǫ ) ( N − l ) / 2 − Γ l 2 2 ˜ F 1 l 2 , N − l 2 ; N +2 − l 2 ; y T ˆ ǫ | ˆ y | 2 | ˆ y | N , (17) where 2 ˜ F 1 ( a, b ; c ; z ) is the reg ularised hypergeo metric function. 5 5.2. Mo del pr ob ability By assuming that, prior the data are av ailable, each W ( l ) has the same probability , according to (5) and (6), the l -model probability is propor tional t o the l -model evidence; that is, Prob( l | y ) ∝ Z ( y | l ) ∝ Γ N − l 2 Γ l 2 | ˆ y | l ( y T ˆ ǫ ) ( N − l ) / 2 − Γ N − l 2 Γ N 2 2 ˜ F 1 l 2 , N − l 2 ; N +2 − l 2 ; y T ˆ ǫ | ˆ y | 2 | ˆ y | N . (18) It is worth noting that, since Z ( y | l ) is the marg inal probability density of the data given W ( l ) (no ma tter what the v alues o f the mo del para meters may b e), the dimensions of Pro b( l | y ) a re the same of | ˆ y | − N . If W ( l 0 ) expla ins the data exa c tly , then ˆ ǫ ( l 0 ) = 0 a nd Pr ob( l 0 | y ) = 1, as expected. F urther more, P rob( l | y ) is indep endent of the y scale. In fact, when y → λ y , the evidence o f l transforms as Z ( y | l ) → λ − N Z ( y | l ), which leav es Pro b( l | y ) unchanged. In addition, Z ( y | l ) de p ends only o n y and ˆ y ; ther efore, it is indep enden t o f the choice of the sampling p oints x n . Even tually , Pr ob ( l | y ) is not inv ar ian t fo r tra nslation of the or igin of the y -axis; this is a consequence of the h y i = 0 as s umption, which is embedded into the pre-data distr ibutio n of the a c oefficients. 5.2.1. Asymptotic b ehaviours. In the case when N − l ≫ 2, we can use the approximations ( N + 2 − l ) / 2 ≈ ( N − l ) / 2 and 2 ˜ F 1 ( N / 2 , ( N − l ) / 2; ( N − l ) / 2; z ) ≈ (1 − z ) − N/ 2 Γ[( N − l ) / 2] . (19) In addition, for a la rge da ta s ample, since ˆ y T ˆ ǫ /χ 2 y ≈ 0, wher e χ 2 y = | ˆ y | 2 is the sum of the squared data, and y = ˆ y + ˆ ǫ , it follows that y T ˆ ǫ /χ 2 y ≈ | ˆ ǫ | 2 /χ 2 y = χ 2 ǫ /χ 2 y , where χ 2 ǫ = | ˆ ǫ | 2 indicates the sum o f the squared residuals . Therefo r e, apart from the ˆ y − N ≈ const. factor that we omit, we can rewr ite (18) as Prob( l | y ) ≈ Γ[( N − l ) / 2]Γ( l / 2) ( χ ǫ /χ y ) N − l − Γ( N / 2) (1 + χ 2 ǫ /χ 2 y ) N/ 2 . ( 20 ) Even tually , if χ 2 ǫ /χ 2 y ≪ 1 – which mea ns g o o d data a nd goo d models – and N ≫ l , (20) simplifies further as Prob( l | y ) ≈ Γ[( N − l ) / 2]Γ( l / 2) ( χ ǫ /χ y ) N − l . (21) This equatio n br ings in to light that, among the mo dels having the same num b er of free pa rameters, the most s upported by the data is tha t whose a sso ciated sum of the squared residuals is minim um. In additions, it shows that the optimal model minimises the residuals by keeping at the s a me time l as small as p ossible, in or der to maximise N − l . As a last step we wr ite (21) as ln[Prob( l | y )] ≈ ln Γ N − l 2 + ln Γ l 2 + ( N − l ) ln( χ y /χ ǫ ) , (22) which is the a pproximation given in [7]. 6 - 1.0 - 0.5 0.0 0.5 1.0 - 15 - 10 - 5 0 5 x arbitrary units y arbitrary units - 1.0 - 0.5 0.0 0.5 1.0 - 15 - 10 - 5 0 5 x arbitrary units y arbitrary units Figure 1. Si m ulated noisy data-samples of the p olynomial (25); solid line are the most probable polynomials explaining these sp ecific data sets (left, a third degree polynomial; ri gh t, a fifth degree p olynomial). Now, we study the case when the data ar e samples o f a p olynomial having deg r ee q and the ǫ v aria nce tends to zero. In this case, provided l ≥ q + 1, the residuals are independent of the degree o f the fitting p olynomial, χ y /χ ǫ → ∞ , a nd ln[Prob( l | y )] ≈ ( N − l ) ln( χ y /χ ǫ ) , (23) which shows that the evidence is maximum when l is minimum. Therefore, the deg ree most s uppor ted by the data is q , as exp ected. If χ ǫ /χ y → ∞ – whic h means bad data or bad mo dels – and N ≫ l , (20) simplifies as Prob( l | y ) ≈ Γ[( N − l ) / 2]Γ( l / 2) , (24) which indica tes that the optimal data mo del has only one deg ree of freedom, that is, y n = a 0 + ǫ n . 6. Nume rical example The figure 1 shows t wo indep endent sets of N = 50 sim ulated data each, uniformly sampled in the [ x min = − 1 , x max = 1 ] in ter v al from the fifth deg ree po lynomial y = − 1 x − 10 x 2 + 2 x 3 + 5 x 5 . (25) The outputs of a Gaussia n random-n umber generator ha ving zero m ea n and 0.4 standard devia tion were added to the data. In order to fulfill the h y i = 0 requirement, the data hav e b een pre- pro cessed to remov e the arithmetic mea n. The σ = 0 . 4 v alue of the ǫ standard deviation was chosen intermediate b et ween the go o d and bad data limit cases. T o explain the data , a set of ten p olynomials – with degree fro m zer o to nine – hav e b een c onsidered, each p olynomial ha s been fitted to the da ta, and b oth the e rror and data estimates – ˆ ǫ a nd ˆ y , resp ectively – hav e b een calculated. Even tually , the evidence of e a c h po ly nomial has bee n found by application of (1 8) as well as, after normalisatio n, eac h p olynomial pr o babilit y to explain the data has b een calculated. The r esults for ea c h o f the data sets s hown in Fig. 1 are shown in Fig. 2. With the σ = 0 . 4 choice, the degr ee of the po lynomials tha t b est explain the data hav e b een a lw ays found eq ua l to three or five, dep ending on the sp ecific da ta s et. The figures 1 a nd 2 show thes e alterna ting cases. It is w or th noting that the probability to explain the da ta of a p olynomia l of fourth degree – whose ba sis function is missing 7 0 2 4 6 8 10 0.0 0.2 0.4 0.6 0.8 1.0 polynomial degree probability 0 2 4 6 8 10 0.0 0.2 0.4 0.6 0.8 1.0 polynomial degree probability Figure 2. Probab i l ities of the degree of the p olynomials explaining the dat a sets sho wn in Fig. 1. in (25) – is very low. As exp ected, by reducing the no ise level, the most likely degree stabilises o n five, whereas, by inc r easing the noise le vel, it stabilises o n three. 7. Sp eed of sound in acetone In [8] it was shown how to solve the thermo dynamic different ia l equations ∂ ρ ∂ p T = T c p 1 ρ 2 ∂ ρ ∂ T 2 p + 1 w 2 (26 a ) ∂ c p ∂ p T = − T ρ " 2 ρ 2 ∂ ρ ∂ T 2 p − 1 ρ ∂ 2 ρ ∂ T 2 p # (26 b ) relating density ρ ( p, T ), heat ca pacit y c p ( p, T ), a nd sp eed o f so und w ( p, T ), as a function o f the temper ature T and press ure p . These e q uations can b e s o lv ed if initial conditions ρ ( p 0 , T ) a nd c p ( p 0 , T ) are given at a the refer e nce pressure, p 0 , and the sp eed of sound is known on the entire rang e of press ures and temp erature s o f interest. When a num er ical integration of (26 a - b ) is carried o ut, the heat capacity shows often diverging v a lues at the extre mes of the temper ature r ange. Approa c hing the int eg r ation problem by using lo cal p olynomial representations of the thermo dynamic quantities eliminates the divergence a nd allows the uncer tain ty of the results to b e estimated. Hence, b y using the trial solutions ρ ( p, T ) = X i,j a ij ( p − p 0 ) i ( T − T 0 ) j , (27 a ) c p ( p, T ) = X i,j b ij ( p − p 0 ) i ( T − T 0 ) j , (27 b ) w ( p, T ) = X i,j c ij ( p − p 0 ) i ( T − T 0 ) j , (27 c ) and – once the deg rees of the p o lynomials hav e b een fixed – the unknown co efficien ts a ij , b ij , a nd c ij are o btained as describ ed in [8]. Briefly , the b est p olynomial approximations of the initial conditions and sp eed of sound a re used to determine a 0 j , b 0 j , a nd c ij ; s ubsequen tly , the remaining co efficients a ij and b ij are calcula ted by means of the e quations (26 a ) a nd (26 b ). As a n application example, we show ho w to deter mine the optimal p olynomia l when smo othing the measured v alues of the sp eed of sound in aceto ne as a function o f 8 - 1.0 - 0.5 0.0 0.5 1.0 x - 1.0 - 0.5 0.0 0.5 1.0 y - 1.0 - 0.5 0.0 0.5 1.0 z Figure 3. Measured v al ues of the sp eed of sound in acetone. The data ha ve b een scaled in [ − 1 , +1] inte r v als; x i s the temp erature, y i s the pressure, and z is the speed of sound. The p olynomial mo del most likely supported by the data is als o sho wn. Red dots are the data higher than what predicted by the model, blue dots are those low er. 0 2 4 6 8 0.0 0.2 0.4 0.6 0.8 1.0 polynomial degree probability Figure 4. Probability of the pol ynomial s having degree from zero to s i x to explain the data in Fig. 3. temper ature a nd pressur e [8, 9]; a set of meas ur emen t r esults is shown in Fig. 3. F or the sake of n umerica l conv enience, the temp erature, pres sure, and sp eed hav e b een scaled in [ − 1 , + 1] interv als according to x = ( T − T 0 ) / ∆ T , y = ( p − p 0 ) / ∆ p , and z = ( w − w 0 ) / ∆ w , wher e the o ffsets and scale factors hav e bee n suitably chosen. As shown in Fig. 4, the regr ession ana ly ses using the s ev en ba sis-function s ets { 1 , x, y , ... x r y s , ... } q , whe r e 0 ≤ r + s ≤ q a nd q = 0 , 1 , ... 6, indicate that the q = 4 set is the o nly one supp orted by the da ta. This sharp s election is due to the fast incr ease of the num b er l of basis functions as the p olynomial degree increases . F or insta nce, if q = 3, then l = 10 ; if q = 4, then l = 15; if q = 5 , then l = 21. In order to carry o ut a more detailed a nalysis, reg ressions were car r ied o ut also by using the 190 893 s ubsets o f 14, 15, and 1 6 basis functions chosen in the { 1 , x, y , ... P r ( x ) P s ( y ) , ... } list, wher e 0 ≤ r + s ≤ 5 and P r ( x ) is a Legendr e p olynomials of degr ee r . The results a re shown in Fig . 5. The 14 ba sis functions who s e linear combination – which cor resp o nds to a fifth degree p olynomia l – most likely expla ins 9 æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ 6400 6450 6500 6550 6600 6650 6700 6750 0.0 0.1 0.2 0.3 0.4 basis function set probability Figure 5 . Zo om of the pr obabilit y to explain the data set i n Fig. 3 of the subsets from l = 6400 to l = 6750 of 14, 15, and 16 basis functions c hosen in the { 1 , x, y , .. . P r ( x ) P s ( x ) , ... } list, where 0 ≤ r + s ≤ 5. The subsets are n umbered wi th the shortest fir s t and the l ater elements in the list omitted first. The pr obabilit y of the remaining subsets is zero for al l practical purp oses. the data are { 1, P 1 ( x ), P 1 ( y ), P 2 ( x ), P 2 ( y ), P 1 ( x ) P 1 ( y ), P 3 ( x ), P 3 ( y ), P 1 ( x ) P 2 ( y ), P 4 ( x ), P 4 ( y ), P 1 ( x ) P 3 ( y ), P 2 ( x ) P 2 ( y ), P 1 ( x ) P 4 ( y ) } . A fallout of this Bay esian a nalysis are the probabilities of all the sets of smoothing po lynomials c onsidered to mo del the data. Conseq ue ntly , when, a s in this cas e, a nu mber o f basis function sets hav e a significant probability to explain the data, the quantities of in terest – in this ca s e, the speed of sound v alues – and the uncer tain ty inherent the mo del selec tio n can b e inferred by mo del averaging [14, 1 5]. 8. Conclus ions An a na lytical solutio n has been found for the problem of finding what basis functions m ust b e use d in linea r reg ression analyses. It relies on the Bay esian mo del s election and complement s the numerical r esults given in [7]. It uses the pr o babilit y algebra to select among different basis function sets and enco des a preference for the smallest set capable to explain the da ta. In practice, a probability density is assigned to the regres s ion co efficien ts prior the data a r e av ailable, consistently with the given prior information and a ccording to the maximum en tro p y principle. Next, the probability algebra allows this pro babilit y distribution to be upda ted according to the additional information delivered by the data. The r egression pr obability is prop ortional to the normalising fa c tor of the parameter likelihoo d times the pa r ameter prior dis tribution. A featur e o f this s olution is that, for a larg e data sample, the re gression probability depe nds only on the residuals and the num b er o f free pa rameters. The smaller a re the residuals, the higher the proba bilit y; but, a p enalt y exists for increas ing the num be r of parameters . In a ddition, if a basis-function s et explains the data exa ctly , its probability to ex plain the data is one. Ac kno wledg men ts This work was jointly funded by the E urop ean Metrolo g y Research P rogramme (EMRP) par ticipating co un tries within the E urop e an Asso ciatio n of National Metrology Institutes (EURAMET) and the Euro pean Union. 10 References [1] Massa E and Mana G 2013 An automated resistor net wo rk to i nspect the linearit y of resistance- thermometry measurements Me as. Sci. T e chnol. submitted [2] Anderson T W 1962 The c hoice of the degree of a p olynomial regression as a multiple decision problem Ann. Math. Statist. 33 255-65 [3] Sch wartz G 1978 Estimating the dimension of a mo del Ann . Statist . 6 461-4 [4] Shao J 1996 Bo otstrap m o del selection J. Amer. Stati st. Asso c . 9 1 655-65 [5] Philips R and Guttman I 1998 A new criteri on for v ar iable selection Stati stics & Pr ob ability L etters 38 11-9 [6] Guttman I, Pen a D and Redondas D 2005 A Ba yesian approach for predicting with p olynomial regression of unkno wn degree T e chnometrics 47 23-33 [7] Gull S F 1988 Ba yesian inductiv e inferences and maximum en tropy in: Maximum entr opy and Bayesian meyho ds in science and e ngine e ring 1 53-74 (Dordrec ht, the Netherlands: K lu wer Academic Publi shers) [8] Lago S and Giuliano Al bo P 2008 A new method to calcu l ate the thermo dynamical properties of liquids fr om accurate sp eed-of-sound measurements J. Chem. Thermo dynamics 40 1558-64 [9] Lago S and Giul i ano A lbo P 2013 A no vel application of r ecursiv e equation method f or determining thermo dynamic prop erties of single phase fluids from density and speed-of-sound measuremen ts J. Chem. Thermo dynamics 58 422-7 [10] Sivia D and Skilling J 2006 Data Analysis: A Bay esian T utorial (Oxford: Oxford University Press) [11] Jaynes E T 2003 Pr obabilit y theory: The logic of science (Cambridge: Cambridge Uni v ersity Press) [12] Mc Kay D JC 2003 Information Theory , Inference, and Learni ng Al gorithms (Cambridge: Camb r i dge Universit y Press) [13] Jaynes E T 1968 Pr ior Probabilities IEEE T r ans. Sy s. Sci. Cyb ernetic s 4 227-41 [14] W aserman L 2000 Ba yesian mo del selection and mo del av eraging Journal of Mathematic al Psycholo gy 44 92-107 [15] Mana G, M assa E and Predescu M 2012 Mo del selection in the av erage of inconsisten t data: an analysis of the measured Planck -constant v alues Met r olo gia 49 492-500
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment