Bayesian estimate of the degree of a polynomial given a noisy data sample

Giovanni Mana, Paolo Alberto Giuliano Albo, and Simona Lago

INRIM – Istituto Nazionale di Ricerca Metrologica, str. delle Cacce 91, 10135 Torino, Italy

Abstract. Regression analysis is a widely used method for creating a continuous representation of a discrete data set. When the regression model is not based on a mathematical description of the physics underlying the data, heuristic techniques play a crucial role, and the model choice can have a significant impact on the result. In this paper, the problem of identifying the most appropriate model is formulated and solved in terms of Bayesian selection, since probability calculus is the best way to choose among different alternatives. The results are applied to both univariate and bivariate polynomials used as trial solutions of systems of thermodynamic partial differential equations.

Submitted to: Metrologia

PACS numbers: 07.05.Kf, 02.50.Cw, 06.20.Dk, 02.50.Tt

E-mail: g.mana@inrim.it

1. Introduction

A problem in regression analysis is to determine how many basis functions to include in the regression model, for instance, when determining the calibration curve that best fits the data [1]. Any set of basis functions can be considered; when they are polynomials, the problem is determining the degree of the regression. A maximum likelihood approach cannot be the right choice: since adding basis functions never increases the residuals, it leads to the highest possible number of basis functions. This problem has been considered by many authors in different statistical settings, and their investigations led to a number of proposed solutions [2, 3, 4, 5, 6]. An original and undeservedly neglected one is hidden in a tutorial paper on Bayesian reasoning by Gull [7], where the basic idea is to calculate and compare the probability of each model given the data.
To bring this result to the metrologist's attention, we reassess Gull's work and make clear its usefulness in selecting among linear models. In addition, by slightly changing the model parametrisation, we obtain an exact analytical solution and demonstrate that, in suitable limit cases, it reduces to Gull's approximate one. The results obtained here may have an impact when the polynomial coefficients are used for solving systems of partial differential equations [8, 9]. In this case, different choices of the polynomial degree lead to different sets of coefficients and, consequently, to different solutions. A rigorous criterion based on probability calculus avoids arbitrary choices, which in general are driven only by the analysis of the residuals. To illustrate the concepts described here, we show how to determine the set of basis functions that best fits the measured values of the speed of sound in acetone as a function of temperature and pressure.

2. Problem statement

We want to represent the measurement results $\mathbf{y} = [y_1, y_2, \ldots, y_N]^T$ by the linear model

$$ \mathbf{y} = W(l)\,\mathbf{a} + \boldsymbol{\epsilon}, \tag{1} $$

where $\boldsymbol{\epsilon} = [\epsilon_1, \epsilon_2, \ldots, \epsilon_N]^T$ are additive uncorrelated Gaussian errors with zero mean and unknown variance $\sigma^2$, $\mathbf{a} = [a_0, a_1, \ldots, a_{l-1}]^T$ are the $l$ model parameters, $W(l)$ is an $N \times l$ matrix explaining the data, $W_{nm} = w_m(x_n)$, and $\{w_0(x), w_1(x), \ldots, w_{l-1}(x)\}$ is a set of $l$ basis functions. The basis functions may be polynomials, for instance, $w_m(x) = x^m$, but, in general, they can be any set of linearly independent functions. The problem is to find the set of basis functions most supported by the data; when they are polynomials, this corresponds to finding the optimal degree of the regression.
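To make the roles of $W(l)$, $W_{nm} = w_m(x_n)$ and the least-squares estimate concrete, here is a minimal pure-Python sketch with the monomial basis $w_m(x) = x^m$. The helper names (`design_matrix`, `solve`, `least_squares`) are illustrative, not taken from the paper, and solving the normal equations is adequate only for small, well-conditioned problems.

```python
from typing import Callable, List, Sequence

Basis = Sequence[Callable[[float], float]]


def design_matrix(xs: Sequence[float], basis: Basis) -> List[List[float]]:
    """Return the N x l matrix W with W[n][m] = w_m(x_n)."""
    return [[w(x) for w in basis] for x in xs]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def least_squares(W: List[List[float]], y: Sequence[float]) -> List[float]:
    """a_hat = (W^T W)^{-1} W^T y from the normal equations."""
    l = len(W[0])
    WtW = [[sum(row[i] * row[j] for row in W) for j in range(l)] for i in range(l)]
    Wty = [sum(row[i] * yn for row, yn in zip(W, y)) for i in range(l)]
    return solve(WtW, Wty)


# Monomial basis w_m(x) = x^m with l = 3 parameters (a second-degree polynomial).
l = 3
basis = [lambda x, m=m: x ** m for m in range(l)]
xs = [-1 + 2 * n / 9 for n in range(10)]          # 10 equally spaced points in [-1, 1]
ys = [1.0 - 2.0 * x + 0.5 * x * x for x in xs]     # noise-free data, so a_hat is exact
W = design_matrix(xs, basis)
a_hat = least_squares(W, ys)                        # close to [1.0, -2.0, 0.5]
```

Changing `basis` (for example, to Legendre polynomials) changes $W(l)$ but not the rest of the procedure, which is why the paper labels models only by $W(l)$.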
The interpretative model of the data is summarised by the matrix $W(l)$; therefore, the problem is equivalent to finding, within a set of matrices explaining the data, the one most supported by the data. Since it appears explicitly in the final formulae, and for the sake of notational simplicity, we label the $W$ matrices by the number $l$ of free model parameters. However, we can also compare models having the same number of parameters but different basis functions.

3. Bayesian inferences

According to Bayes' theorem, if the same probability is assigned to all the models, the probability that the $l$-th model explains the data is proportional to the probability of the observed data given $W(l)$, whatever the values of the model parameters may be. In turn, this is the normalising factor of the likelihood of the model parameters times the probability distribution synthesising the information available about the parameter values before the measurement results are available.

To carry out the calculation, we must first determine the post-data probability density, $P(\mathbf{a}, \sigma | \mathbf{y}, l)$, of the parameters of each model (which include the unknown variance $\sigma^2$ of the data), given the data $y_n$ and the data-explaining matrix $W(l)$. The post-data probability density is found via the product rule of probabilities [10, 11],

$$ P(\mathbf{a}, \sigma | \mathbf{y}, l)\, Z(\mathbf{y}|l) = \mathcal{N}_N(\mathbf{y}|\sigma, \mathbf{a}, l)\, \pi(\sigma, \mathbf{a}|l), \tag{2} $$

where the $N$-dimensional Gaussian function

$$ \mathcal{N}_N(\mathbf{y}|\sigma, \mathbf{a}, l) = \frac{1}{\sqrt{(2\pi)^N}\,\sigma^N} \exp\left( -\frac{|\mathbf{y} - W\mathbf{a}|^2}{2\sigma^2} \right) \tag{3} $$

is the likelihood of the parameters $\mathbf{a}$ and $\sigma$, $\pi(\sigma, \mathbf{a}|l)$ is their pre-data probability density, and the sought normalisation factor of $\mathcal{N}_N(\mathbf{y}|\sigma, \mathbf{a}, l)\,\pi(\sigma, \mathbf{a}|l)$,

$$ Z(\mathbf{y}|l) = \int_\Gamma \mathcal{N}_N(\mathbf{y}|\sigma, \mathbf{a}, l)\, \pi(\sigma, \mathbf{a}|l)\, \mathrm{d}\sigma\, \mathrm{d}\mathbf{a}, \tag{4} $$

is called the model evidence; the integration is carried out over the hypervolume $\Gamma$ of the possible values of $\mathbf{a}$ and $\sigma$.
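The Gaussian likelihood (3) is usually evaluated in logarithmic form to avoid underflow when $N$ is large. A minimal sketch, with the hypothetical helper `log_likelihood` and the rows of $W$ stored as plain lists:

```python
import math
from typing import List, Sequence


def log_likelihood(y: Sequence[float], W: List[List[float]], a: Sequence[float],
                   sigma: float) -> float:
    """ln of (3): -(N/2) ln(2 pi) - N ln(sigma) - |y - W a|^2 / (2 sigma^2)."""
    N = len(y)
    # |y - W a|^2, with (W a)_n = sum_m W_nm a_m
    resid2 = sum((yn - sum(w * am for w, am in zip(row, a))) ** 2 for yn, row in zip(y, W))
    return -0.5 * N * math.log(2 * math.pi) - N * math.log(sigma) - resid2 / (2 * sigma ** 2)


# Two observations explained by a single constant: W = [[1], [1]], a = [1.5], sigma = 0.5.
y = [1.0, 2.0]
W = [[1.0], [1.0]]
value = log_likelihood(y, W, [1.5], 0.5)
```

Because the errors are uncorrelated, this equals the sum of the univariate normal log-densities of the residuals.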
Next, by observing that $Z(\mathbf{y}|l)$ is also the probability density of the data given $W(l)$, whatever the values of $\mathbf{a}$ and $\sigma$ may be, we obtain the post-data model probability, $\mathrm{Prob}(l|\mathbf{y})$, by applying the product rule of probabilities again, this time to the pair $\{l, \mathbf{y}\}$. Hence [12],

$$ \mathrm{Prob}(l|\mathbf{y})\, A(\mathbf{y}) = Z(\mathbf{y}|l), \tag{5} $$

where, before the data are at hand, we assigned the same probability to each model, and

$$ A(\mathbf{y}) = \sum_{l=1}^{L} Z(\mathbf{y}|l), \tag{6} $$

where $L$ is the number of models to be compared, is the normalisation factor of $Z(\mathbf{y}|l)$. Therefore, to solve the stated problem, the calculation of the evidence (4) is central.

4. Pre-data distribution

To set the pre-data distribution of $\mathbf{a}$ and $\sigma$, we assume that they are independent. Hence, $\pi(\sigma, \mathbf{a}|l) = \pi_\sigma(\sigma)\,\pi_a(\mathbf{a}|l)$. As regards $\sigma$, we use the improper Jeffreys prior [13]

$$ \pi_\sigma(\sigma) = 1/\sigma, \tag{7} $$

which is invariant under a change of the measurement unit of the data.

As regards the parameters $\mathbf{a}$, let the mean of $\mathbf{y}$, whatever the values of $\mathbf{a}$ may be, be zero. The relevant average is taken over the joint distribution $\mathcal{N}_N(\mathbf{y}|\sigma, \mathbf{a}, l)\,\pi(\sigma, \mathbf{a}|l)$, not over the sampling distribution of the data $\mathcal{N}_N(\mathbf{y}|\sigma, \mathbf{a}, l)$, where the values of $\mathbf{a}$ are fixed. Consequently, since $\mathbf{y} = W\mathbf{a} + \boldsymbol{\epsilon}$, the errors $\boldsymbol{\epsilon}$ have zero mean, and $W$ has full column rank, the pre-data mean of the parameters $\mathbf{a}$ is also zero. In addition, let $\beta^2 \mathbf{1}$ be the pre-data covariance of $\mathbf{y}$, whatever the values of $\mathbf{a}$ may be, where $\mathbf{1}$ is the unit matrix. Here too, the stated covariance refers to the joint distribution $\mathcal{N}_N(\mathbf{y}|\sigma, \mathbf{a}, l)\,\pi(\sigma, \mathbf{a}|l)$. Hence, by observing that $W C_{aa} W^T + \sigma^2 \mathbf{1} = \beta^2 \mathbf{1}$ because of (1), the pre-data covariance $C_{aa}$ of the model parameters is

$$ C_{aa} = (\beta^2 - \sigma^2)(W^T W)^{-1}, \tag{8} $$

where $\beta^2 > \sigma^2$.
Finally, since the prior distribution of $\mathbf{a}$ is constrained by $\langle \mathbf{a} \rangle = 0$ and (8), where the angle brackets indicate the mean, the principle of maximum entropy fixes the sought pre-data distribution to the $l$-dimensional Gaussian distribution [11]

$$ \pi_a(\mathbf{a}|\beta, l) = \mathcal{N}_l(\mathbf{a}|0, C_{aa}), \tag{9} $$

having zero mean and covariance $C_{aa}$. In fact, the value of $\beta$ is unknown. Therefore, we should eliminate it by marginalisation,

$$ \pi_a(\mathbf{a}|l) = \int_\sigma^\infty \pi_a(\mathbf{a}, \beta|l)\, \mathrm{d}\beta, \tag{10} $$

where

$$ \pi_a(\mathbf{a}, \beta|l) = \pi_a(\mathbf{a}|\beta, l)\,\pi_\beta(\beta) = \frac{1}{\beta}\sqrt{\frac{\det(W^T W)}{(2\pi)^l (\beta^2 - \sigma^2)^l}}\, \exp\left( -\frac{\mathbf{a}^T W^T W \mathbf{a}}{2(\beta^2 - \sigma^2)} \right), \tag{11} $$

and $\pi_\beta(\beta) = 1/\beta$ is a Jeffreys prior. However, since the integration in (10) cannot be done analytically, we add $\beta$ to the model parameters and delay the marginalisation over $\beta$ as much as possible. Hence, by putting (7) and (11) together, we obtain the pre-data distribution of the full set of model parameters,

$$ \pi(\sigma, \beta, \mathbf{a}|l) = \frac{1}{\beta\sigma}\sqrt{\frac{\det(W^T W)}{(2\pi)^l (\beta^2 - \sigma^2)^l}}\, \exp\left( -\frac{\mathbf{a}^T W^T W \mathbf{a}}{2(\beta^2 - \sigma^2)} \right). \tag{12} $$

A comment is necessary. In general, improper priors, such as $\pi_\sigma(\sigma) = 1/\sigma$ and $\pi_\beta(\beta) = 1/\beta$, should be avoided because, in such a case, the model evidence (4) is defined only up to unknown scale factors. However, since here the same factor is included in all the evidences, this does not jeopardise the model comparison.

5. Model selection

5.1. Evidence calculation

By combining (3) and (12), the evidence of $W(l)$ is

$$ Z(\mathbf{y}|l) = \int_0^{+\infty} \frac{\mathrm{d}\sigma}{\sigma^{N+1}} \int_\sigma^{+\infty} \frac{\mathrm{d}\beta}{\beta} \sqrt{\frac{\det(W^T W)}{(2\pi)^{N+l} (\beta^2 - \sigma^2)^l}} \times \int_{-\infty}^{+\infty} \exp\left( -\frac{|\mathbf{y} - W\mathbf{a}|^2}{2\sigma^2} \right) \exp\left( -\frac{\mathbf{a}^T W^T W \mathbf{a}}{2(\beta^2 - \sigma^2)} \right) \mathrm{d}\mathbf{a}. \tag{13} $$

Before carrying out the integration, we observe that

$$ |\mathbf{y} - W\mathbf{a}|^2 = (\mathbf{a} - \hat{\mathbf{a}})^T W^T W (\mathbf{a} - \hat{\mathbf{a}}) + \mathbf{y}^T(\mathbf{y} - \hat{\mathbf{y}}), \tag{14} $$

where $\hat{\mathbf{a}} = (W^T W)^{-1} W^T \mathbf{y}$ is the least-squares estimate of $\mathbf{a}$ and $\hat{\mathbf{y}} = W\hat{\mathbf{a}}$ is the measurand estimate. Identity (14) holds because the residuals $\mathbf{y} - \hat{\mathbf{y}}$ are orthogonal to the columns of $W$.
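Identity (14) is what makes the integral over $\mathbf{a}$ Gaussian. The sketch below checks it numerically for a straight-line model ($l = 2$, basis $\{1, x\}$) on simulated data; the data, the seed, and all names are illustrative assumptions.

```python
import random

random.seed(1)
xs = [i / 10 for i in range(-5, 6)]
W = [[1.0, x] for x in xs]                                   # basis {1, x}, l = 2
y = [0.3 + 1.7 * x + random.gauss(0.0, 0.2) for x in xs]     # illustrative data


def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


# Normal equations G a = h with G = W^T W and h = W^T y, solved by Cramer's rule (l = 2).
cols = [[row[i] for row in W] for i in range(2)]
G = [[dot(cols[i], cols[j]) for j in range(2)] for i in range(2)]
h = [dot(cols[i], y) for i in range(2)]
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
a_hat = [(h[0] * G[1][1] - G[0][1] * h[1]) / det,
         (G[0][0] * h[1] - G[1][0] * h[0]) / det]
y_hat = matvec(W, a_hat)
yTe = dot(y, [yn - yh for yn, yh in zip(y, y_hat)])          # y^T eps_hat


def lhs(a):
    """|y - W a|^2"""
    r = [yn - wa for yn, wa in zip(y, matvec(W, a))]
    return dot(r, r)


def rhs(a):
    """(a - a_hat)^T W^T W (a - a_hat) + y^T (y - y_hat), written as |W (a - a_hat)|^2 + yTe."""
    Wd = matvec(W, [ai - bi for ai, bi in zip(a, a_hat)])
    return dot(Wd, Wd) + yTe
```

For any trial `a`, `lhs(a)` and `rhs(a)` agree to rounding error, and `yTe` is positive unless the data are fitted exactly.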
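Hence, the first integration is

$$ \exp\left( -\frac{\mathbf{y}^T\hat{\boldsymbol{\epsilon}}}{2\sigma^2} \right) \int_{-\infty}^{+\infty} \exp\left( -\frac{(\mathbf{a} - \hat{\mathbf{a}})^T W^T W (\mathbf{a} - \hat{\mathbf{a}})}{2\sigma^2} \right) \exp\left( -\frac{\mathbf{a}^T W^T W \mathbf{a}}{2(\beta^2 - \sigma^2)} \right) \mathrm{d}\mathbf{a} = \left( \frac{\sigma}{\beta} \right)^l \sqrt{\frac{(2\pi)^l (\beta^2 - \sigma^2)^l}{\det(W^T W)}}\, \exp\left( -\frac{\mathbf{y}^T\hat{\boldsymbol{\epsilon}}}{2\sigma^2} \right) \exp\left( -\frac{|\hat{\mathbf{y}}|^2}{2\beta^2} \right), \tag{15} $$

where $\hat{\boldsymbol{\epsilon}} = \mathbf{y} - \hat{\mathbf{y}}$ are the residuals and $|\hat{\mathbf{y}}|^2 = \hat{\mathbf{y}}^T\hat{\mathbf{y}} = \hat{\mathbf{a}}^T W^T W \hat{\mathbf{a}}$. Note that $\mathbf{y}^T\hat{\boldsymbol{\epsilon}} > 0$ because $|\mathbf{y} - W\mathbf{a}|^2 > 0$, whatever the value of $\mathbf{a}$ may be. Consequently, the right-hand side of (14) is greater than zero also when $\mathbf{a} = \hat{\mathbf{a}}$. Hence, $\mathbf{y}^T\hat{\boldsymbol{\epsilon}} = \mathbf{y}^T(\mathbf{y} - \hat{\mathbf{y}}) > 0$.

The next integration,

$$ \frac{1}{\sigma^{N+1-l}\sqrt{(2\pi)^N}} \exp\left( -\frac{\mathbf{y}^T\hat{\boldsymbol{\epsilon}}}{2\sigma^2} \right) \int_\sigma^{+\infty} \frac{1}{\beta^{l+1}} \exp\left( -\frac{|\hat{\mathbf{y}}|^2}{2\beta^2} \right) \mathrm{d}\beta = \frac{\sqrt{2}^{\,l-2} \left[ \Gamma(l/2) - \Gamma\big(l/2, |\hat{\mathbf{y}}|^2/(2\sigma^2)\big) \right]}{\sigma^{N+1-l}\sqrt{(2\pi)^N}\,|\hat{\mathbf{y}}|^l} \exp\left( -\frac{\mathbf{y}^T\hat{\boldsymbol{\epsilon}}}{2\sigma^2} \right), \tag{16} $$

where $\Gamma(z)$ is the Euler gamma function and $\Gamma(s, z)$ is the upper incomplete gamma function, eliminates $\beta$. Finally, provided $N > l$, the evidence is

$$ Z(\mathbf{y}|l) = \frac{\sqrt{2}^{\,l-2}}{\sqrt{(2\pi)^N}\,|\hat{\mathbf{y}}|^l} \int_0^{+\infty} \frac{\Gamma(l/2) - \Gamma\big(l/2, |\hat{\mathbf{y}}|^2/(2\sigma^2)\big)}{\sigma^{N+1-l}} \exp\left( -\frac{\mathbf{y}^T\hat{\boldsymbol{\epsilon}}}{2\sigma^2} \right) \mathrm{d}\sigma = \frac{\Gamma\left(\frac{N-l}{2}\right)}{4\sqrt{\pi}^{\,N}} \left[ \frac{\Gamma\left(\frac{l}{2}\right)}{|\hat{\mathbf{y}}|^l (\mathbf{y}^T\hat{\boldsymbol{\epsilon}})^{(N-l)/2}} - \frac{\Gamma\left(\frac{N}{2}\right) {}_2\tilde{F}_1\left( \frac{N}{2}, \frac{N-l}{2}; \frac{N+2-l}{2}; -\frac{\mathbf{y}^T\hat{\boldsymbol{\epsilon}}}{|\hat{\mathbf{y}}|^2} \right)}{|\hat{\mathbf{y}}|^N} \right], \tag{17} $$

where ${}_2\tilde{F}_1(a, b; c; z)$ is the regularised hypergeometric function.

5.2. Model probability

By assuming that, before the data are available, each $W(l)$ has the same probability, according to (5) and (6) the probability of the $l$-th model is proportional to its evidence; that is,

$$ \mathrm{Prob}(l|\mathbf{y}) \propto Z(\mathbf{y}|l) \propto \frac{\Gamma\left(\frac{N-l}{2}\right) \Gamma\left(\frac{l}{2}\right)}{|\hat{\mathbf{y}}|^l (\mathbf{y}^T\hat{\boldsymbol{\epsilon}})^{(N-l)/2}} - \frac{\Gamma\left(\frac{N-l}{2}\right) \Gamma\left(\frac{N}{2}\right) {}_2\tilde{F}_1\left( \frac{N}{2}, \frac{N-l}{2}; \frac{N+2-l}{2}; -\frac{\mathbf{y}^T\hat{\boldsymbol{\epsilon}}}{|\hat{\mathbf{y}}|^2} \right)}{|\hat{\mathbf{y}}|^N}. \tag{18} $$

It is worth noting that, since $Z(\mathbf{y}|l)$ is the marginal probability density of the data given $W(l)$ (whatever the values of the model parameters may be), the right-hand side of (18) has the dimensions of $|\hat{\mathbf{y}}|^{-N}$.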
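The closed form (17) can be evaluated in logarithms with only the standard library. The sketch below is an illustrative implementation, not the authors' code. It evaluates the hypergeometric term through the Pfaff transformation ${}_2F_1(a, b; b+1; -z) = (1+z)^{-a}\,{}_2F_1(a, 1; b+1; z/(1+z))$, whose power series converges for $z > 0$, and it switches to the $|\hat{\mathbf{y}}| \to 0$ limit $Z = \Gamma(N/2)/[2l\,\pi^{N/2}(\mathbf{y}^T\hat{\boldsymbol{\epsilon}})^{N/2}]$, which follows from (16) and is needed when mean-removed data are fitted by a constant. The names `log_evidence` and `model_probabilities` are hypothetical; the normalisation (5)-(6) uses the log-sum-exp trick.

```python
import math
from typing import List


def log_evidence(N: int, l: int, yhat2: float, yTe: float, max_terms: int = 100_000) -> float:
    """ln Z(y|l) from the closed form (17).

    yhat2 = |y_hat|^2 and yTe = y^T eps_hat > 0; requires N > l >= 1.
    """
    if not (N > l >= 1 and yTe > 0):
        raise ValueError("need N > l >= 1 and y^T eps_hat > 0")
    s, k, h = l / 2, (N - l) / 2, N / 2
    if yhat2 <= 1e-8 * yTe:
        # |y_hat| -> 0 limit: both terms of (17) diverge, but their difference does not.
        return math.lgamma(h) - math.log(2 * l) - h * math.log(math.pi * yTe)
    z = yTe / yhat2
    w = z / (1 + z)  # Pfaff-transformed argument, 0 < w < 1
    # total = 2F1(h, 1; k + 1; w) = sum_n (h)_n / (k + 1)_n * w^n
    term = total = 1.0
    n = 0
    while term > 1e-17 * total and n < max_terms:
        term *= (h + n) / (k + 1 + n) * w
        total += term
        n += 1
    # ln of the two bracketed terms of (17); regularised 2F1 = 2F1 / Gamma(k + 1).
    log_t1 = math.lgamma(s) - s * math.log(yhat2) - k * math.log(yTe)
    log_t2 = (math.lgamma(h) - h * math.log(yhat2) - h * math.log1p(z)
              + math.log(total) - math.lgamma(k + 1))
    const = math.lgamma(k) - math.log(4) - h * math.log(math.pi)
    return const + log_t1 + math.log1p(-math.exp(log_t2 - log_t1))


def model_probabilities(log_z: List[float]) -> List[float]:
    """Prob(l|y) = Z(y|l) / sum_l Z(y|l), eqs (5)-(6), via log-sum-exp."""
    top = max(log_z)
    weights = [math.exp(v - top) for v in log_z]
    total = sum(weights)
    return [wt / total for wt in weights]
```

As a spot check, for $N = 4$ and $l = 2$, (17) reduces to $Z = 1/[4\pi^2\,\mathbf{y}^T\hat{\boldsymbol{\epsilon}}\,(|\hat{\mathbf{y}}|^2 + \mathbf{y}^T\hat{\boldsymbol{\epsilon}})]$, and scaling $\mathbf{y} \to \lambda\mathbf{y}$ shifts $\ln Z$ by $-N\ln\lambda$, as stated in the text.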
If $W(l_0)$ explains the data exactly, then $\hat{\boldsymbol{\epsilon}}(l_0) = 0$ and $\mathrm{Prob}(l_0|\mathbf{y}) = 1$, as expected. Furthermore, $\mathrm{Prob}(l|\mathbf{y})$ is independent of the scale of $\mathbf{y}$. In fact, when $\mathbf{y} \to \lambda\mathbf{y}$, the evidence of $l$ transforms as $Z(\mathbf{y}|l) \to \lambda^{-N} Z(\mathbf{y}|l)$, which leaves $\mathrm{Prob}(l|\mathbf{y})$ unchanged. In addition, $Z(\mathbf{y}|l)$ depends only on $\mathbf{y}$ and $\hat{\mathbf{y}}$; therefore, it is independent of the choice of the sampling points $x_n$. Finally, $\mathrm{Prob}(l|\mathbf{y})$ is not invariant under a translation of the origin of the $y$-axis; this is a consequence of the $\langle \mathbf{y} \rangle = 0$ assumption, which is embedded in the pre-data distribution of the coefficients $\mathbf{a}$.

5.2.1. Asymptotic behaviours. When $N - l \gg 2$, we can use the approximations $(N+2-l)/2 \approx (N-l)/2$ and

$$ {}_2\tilde{F}_1\big( N/2, (N-l)/2; (N-l)/2; z \big) \approx \frac{(1-z)^{-N/2}}{\Gamma[(N-l)/2]}. \tag{19} $$

In addition, since the least-squares residuals are orthogonal to the estimates, $\hat{\mathbf{y}}^T\hat{\boldsymbol{\epsilon}} = 0$, and $\mathbf{y} = \hat{\mathbf{y}} + \hat{\boldsymbol{\epsilon}}$, it follows that $\mathbf{y}^T\hat{\boldsymbol{\epsilon}}/\chi_y^2 = |\hat{\boldsymbol{\epsilon}}|^2/\chi_y^2 = \chi_\epsilon^2/\chi_y^2$, where $\chi_y^2 = |\hat{\mathbf{y}}|^2$ is the sum of the squared data estimates and $\chi_\epsilon^2 = |\hat{\boldsymbol{\epsilon}}|^2$ is the sum of the squared residuals. Therefore, for a large data sample, apart from the factor $|\hat{\mathbf{y}}|^{-N} \approx \text{const.}$, which we omit, we can rewrite (18) as

$$ \mathrm{Prob}(l|\mathbf{y}) \approx \frac{\Gamma[(N-l)/2]\,\Gamma(l/2)}{(\chi_\epsilon/\chi_y)^{N-l}} - \frac{\Gamma(N/2)}{(1 + \chi_\epsilon^2/\chi_y^2)^{N/2}}. \tag{20} $$

Finally, if $\chi_\epsilon^2/\chi_y^2 \ll 1$, which means good data and good models, and $N \gg l$, (20) simplifies further to

$$ \mathrm{Prob}(l|\mathbf{y}) \approx \frac{\Gamma[(N-l)/2]\,\Gamma(l/2)}{(\chi_\epsilon/\chi_y)^{N-l}}. \tag{21} $$

This equation shows that, among the models having the same number of free parameters, the one most supported by the data is the one whose sum of squared residuals is minimum. In addition, it shows that the optimal model minimises the residuals while keeping $l$ as small as possible, in order to maximise $N - l$.
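The trade-off stated after (21) can be made quantitative. For fixed $N$, the sketch below evaluates the logarithm of (21), up to an additive constant, and finds the ratio $\chi_y/\chi_\epsilon$ that a larger model needs in order to tie with a smaller one. The function names are hypothetical, and the expression is only the large-sample approximation, not the exact evidence (17).

```python
import math


def log_prob_21(N: int, l: int, ratio: float) -> float:
    """ln of (21), up to an additive constant; ratio = chi_y / chi_eps."""
    return math.lgamma((N - l) / 2) + math.lgamma(l / 2) + (N - l) * math.log(ratio)


def break_even_ratio(N: int, l_small: int, ratio_small: float, l_big: int) -> float:
    """chi_y / chi_eps that a model with l_big parameters needs to tie, under (21),
    with a model having l_small parameters and residual ratio ratio_small."""
    target = log_prob_21(N, l_small, ratio_small)
    rest = math.lgamma((N - l_big) / 2) + math.lgamma(l_big / 2)
    return math.exp((target - rest) / (N - l_big))


N = 50
fewer_wins = log_prob_21(N, 4, 10.0) > log_prob_21(N, 6, 10.0)  # True: same fit, fewer parameters
r = break_even_ratio(N, 4, 10.0, 6)                              # about 11.7
```

With $N = 50$, two extra parameters pay off only if they raise $\chi_y/\chi_\epsilon$ from 10 to about 11.7, that is, if they reduce the residuals by roughly 15%.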
As a last step, we write (21) as

$$ \ln[\mathrm{Prob}(l|\mathbf{y})] \approx \ln\Gamma\left( \frac{N-l}{2} \right) + \ln\Gamma\left( \frac{l}{2} \right) + (N-l)\ln(\chi_y/\chi_\epsilon), \tag{22} $$

which is the approximation given in [7].

[Figure 1: two panels plotting y (arbitrary units) against x (arbitrary units) over x in [-1, 1].]

Figure 1. Simulated noisy data samples of the polynomial (25); the solid lines are the most probable polynomials explaining these specific data sets (left, a third-degree polynomial; right, a fifth-degree polynomial).

Now, we study the case when the data are samples of a polynomial of degree $q$ and the variance of $\boldsymbol{\epsilon}$ tends to zero. In this case, provided $l \geq q + 1$, the residuals are independent of the degree of the fitting polynomial, $\chi_y/\chi_\epsilon \to \infty$, and

$$ \ln[\mathrm{Prob}(l|\mathbf{y})] \approx (N-l)\ln(\chi_y/\chi_\epsilon), \tag{23} $$

which shows that the evidence is maximum when $l$ takes its minimum admissible value, $l = q + 1$. Therefore, the degree most supported by the data is $q$, as expected.

If $\chi_\epsilon/\chi_y \to \infty$, which means bad data or bad models, and $N \gg l$, (20) simplifies to

$$ \mathrm{Prob}(l|\mathbf{y}) \approx \Gamma[(N-l)/2]\,\Gamma(l/2), \tag{24} $$

which indicates that the optimal data model has only one degree of freedom, that is, $y_n = a_0 + \epsilon_n$.

6. Numerical example

Figure 1 shows two independent sets of $N = 50$ simulated data each, uniformly sampled in the interval $[x_{\min} = -1, x_{\max} = 1]$ from the fifth-degree polynomial

$$ y = -x - 10x^2 + 2x^3 + 5x^5. \tag{25} $$

The outputs of a Gaussian random-number generator with zero mean and 0.4 standard deviation were added to the data. To fulfil the $\langle \mathbf{y} \rangle = 0$ requirement, the data have been pre-processed to remove the arithmetic mean. The value $\sigma = 0.4$ of the standard deviation of $\boldsymbol{\epsilon}$ was chosen intermediate between the good-data and bad-data limit cases.
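A minimal sketch of this numerical experiment follows: simulate one data set from (25), fit polynomials of degree zero to nine, evaluate the evidence, and normalise. Several choices are assumptions not stated in the paper: the 50 abscissae are equally spaced, the noise comes from Python's `random.gauss` with a fixed seed, and the fits use Legendre polynomials orthonormalised by modified Gram-Schmidt (same span as the monomials, hence the same $\hat{\mathbf{y}}$ and $\hat{\boldsymbol{\epsilon}}$). The evidence is (17) evaluated in logarithms, with the $|\hat{\mathbf{y}}| \to 0$ limit used for the constant model, whose fit vanishes after mean removal. Which degree wins depends on the noise realisation, so no particular outcome is claimed here.

```python
import math
import random


def simulate(N=50, sigma=0.4, seed=1):
    """Samples of (25) on an equally spaced grid in [-1, 1] plus Gaussian noise, mean removed."""
    rng = random.Random(seed)
    xs = [-1 + 2 * n / (N - 1) for n in range(N)]
    ys = [-x - 10 * x**2 + 2 * x**3 + 5 * x**5 + rng.gauss(0, sigma) for x in xs]
    mean = sum(ys) / N
    return xs, [y - mean for y in ys]


def legendre(r, x):
    """Legendre polynomial P_r(x) from (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    if r == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, r):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def fit_stats(xs, ys, l):
    """|y_hat|^2 and y^T eps_hat for the least-squares fit with l basis functions."""
    q_cols = []  # orthonormal basis of the column space (modified Gram-Schmidt)
    for m in range(l):
        v = [legendre(m, x) for x in xs]
        for q in q_cols:
            c = sum(a * b for a, b in zip(q, v))
            v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        q_cols.append([a / norm for a in v])
    y_hat = [0.0] * len(ys)
    for q in q_cols:  # projection of y onto the column space
        c = sum(a * b for a, b in zip(q, ys))
        y_hat = [yh + c * b for yh, b in zip(y_hat, q)]
    return sum(v * v for v in y_hat), sum(y * (y - v) for y, v in zip(ys, y_hat))


def log_evidence(N, l, yhat2, yTe):
    """ln Z(y|l) from (17), hypergeometric term via the Pfaff-transformed series."""
    s, k, h = l / 2, (N - l) / 2, N / 2
    if yhat2 <= 1e-8 * yTe:  # |y_hat| -> 0 limit of (17)
        return math.lgamma(h) - math.log(2 * l) - h * math.log(math.pi * yTe)
    z = yTe / yhat2
    w = z / (1 + z)
    term = total = 1.0
    n = 0
    while term > 1e-17 * total and n < 100_000:
        term *= (h + n) / (k + 1 + n) * w
        total += term
        n += 1
    log_t1 = math.lgamma(s) - s * math.log(yhat2) - k * math.log(yTe)
    log_t2 = (math.lgamma(h) - h * math.log(yhat2) - h * math.log1p(z)
              + math.log(total) - math.lgamma(k + 1))
    return (math.lgamma(k) - math.log(4) - h * math.log(math.pi)
            + log_t1 + math.log1p(-math.exp(log_t2 - log_t1)))


def degree_probabilities(xs, ys, max_degree=9):
    """Prob(degree | y) for degrees 0..max_degree, with equal prior model probabilities."""
    N = len(ys)
    log_z = [log_evidence(N, d + 1, *fit_stats(xs, ys, d + 1)) for d in range(max_degree + 1)]
    top = max(log_z)
    weights = [math.exp(v - top) for v in log_z]
    return [wt / sum(weights) for wt in weights]


xs, ys = simulate()
probs = degree_probabilities(xs, ys)
best_degree = max(range(len(probs)), key=probs.__getitem__)
```

Changing `seed` or `sigma` reproduces, in spirit, the behaviour described below: the winning degree moves with the noise realisation and the noise level.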
To explain the data, a set of ten polynomials, with degrees from zero to nine, has been considered; each polynomial has been fitted to the data, and both the error and the data estimates, $\hat{\boldsymbol{\epsilon}}$ and $\hat{\mathbf{y}}$ respectively, have been calculated. Finally, the evidence of each polynomial has been found by applying (18) and, after normalisation, the probability that each polynomial explains the data has been calculated. The results for each of the data sets shown in Fig. 1 are shown in Fig. 2. With $\sigma = 0.4$, the degree of the polynomial that best explains the data has always been found equal to three or five, depending on the specific data set. Figures 1 and 2 show these alternating cases. It is worth noting that the probability that a fourth-degree polynomial explains the data (the $x^4$ basis function is missing in (25)) is very low. As expected, by reducing the noise level, the most likely degree stabilises at five, whereas, by increasing the noise level, it stabilises at three.

[Figure 2: two panels plotting probability (0 to 1) against polynomial degree (0 to 10).]

Figure 2. Probabilities of the degree of the polynomials explaining the data sets shown in Fig. 1.

7. Speed of sound in acetone

In [8] it was shown how to solve the thermodynamic differential equations

$$ \left( \frac{\partial\rho}{\partial p} \right)_T = \frac{T}{c_p}\frac{1}{\rho^2} \left( \frac{\partial\rho}{\partial T} \right)_p^2 + \frac{1}{w^2}, \tag{26a} $$

$$ \left( \frac{\partial c_p}{\partial p} \right)_T = -\frac{T}{\rho} \left[ \frac{2}{\rho^2} \left( \frac{\partial\rho}{\partial T} \right)_p^2 - \frac{1}{\rho} \left( \frac{\partial^2\rho}{\partial T^2} \right)_p \right], \tag{26b} $$

relating the density $\rho(p, T)$, the heat capacity $c_p(p, T)$, and the speed of sound $w(p, T)$ as functions of the temperature $T$ and pressure $p$. These equations can be solved if the initial conditions $\rho(p_0, T)$ and $c_p(p_0, T)$ are given at the reference pressure $p_0$ and the speed of sound is known over the entire range of pressures and temperatures of interest.
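To make the structure of (26a) and (26b) explicit, the sketch below evaluates their right-hand sides from local values of $\rho$, $c_p$, $w$ and the temperature derivatives of $\rho$ (SI units assumed, with $c_p$ per unit mass). The function names are hypothetical. The final check uses the identity $(\partial c_p/\partial p)_T = -T(\partial^2 v/\partial T^2)_p$ with $v = 1/\rho$, which is equivalent to (26b).

```python
def drho_dp(T: float, rho: float, cp: float, w: float, drho_dT: float) -> float:
    """Right-hand side of (26a): T/(c_p rho^2) (d rho/dT)_p^2 + 1/w^2."""
    return T / (cp * rho ** 2) * drho_dT ** 2 + 1.0 / w ** 2


def dcp_dp(T: float, rho: float, drho_dT: float, d2rho_dT2: float) -> float:
    """Right-hand side of (26b): -(T/rho) [2/rho^2 (d rho/dT)_p^2 - (1/rho) (d^2 rho/dT^2)_p]."""
    return -(T / rho) * (2.0 / rho ** 2 * drho_dT ** 2 - d2rho_dT2 / rho)


# Consistency check: for v = 1/rho = u0 + u1 T + u2 T^2, (26b) must give -T v'' = -2 u2 T.
u0, u1, u2, T = 1e-3, 1e-6, 1e-9, 300.0
u, du, d2u = u0 + u1 * T + u2 * T ** 2, u1 + 2 * u2 * T, 2 * u2
rho = 1 / u
drho = -du / u ** 2                              # first T-derivative of rho = 1/u
d2rho = -d2u / u ** 2 + 2 * du ** 2 / u ** 3     # second T-derivative of rho = 1/u
check = dcp_dp(T, rho, drho, d2rho)              # equals -2 * u2 * T up to rounding
```

When $\rho$ does not depend on $T$, (26a) reduces to $(\partial\rho/\partial p)_T = 1/w^2$, which is why the speed of sound must be known over the whole $(p, T)$ range.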
When a numerical integration of (26a)-(26b) is carried out, the heat capacity often shows diverging values at the extremes of the temperature range. Approaching the integration problem with local polynomial representations of the thermodynamic quantities eliminates the divergence and allows the uncertainty of the results to be estimated. Hence, the trial solutions

$$ \rho(p, T) = \sum_{i,j} a_{ij} (p - p_0)^i (T - T_0)^j, \tag{27a} $$

$$ c_p(p, T) = \sum_{i,j} b_{ij} (p - p_0)^i (T - T_0)^j, \tag{27b} $$

$$ w(p, T) = \sum_{i,j} c_{ij} (p - p_0)^i (T - T_0)^j \tag{27c} $$

are used and, once the degrees of the polynomials have been fixed, the unknown coefficients $a_{ij}$, $b_{ij}$, and $c_{ij}$ are obtained as described in [8]. Briefly, the best polynomial approximations of the initial conditions and of the speed of sound are used to determine $a_{0j}$, $b_{0j}$, and $c_{ij}$; subsequently, the remaining coefficients $a_{ij}$ and $b_{ij}$ are calculated by means of equations (26a) and (26b).

As an application example, we show how to determine the optimal polynomial when smoothing the measured values of the speed of sound in acetone as a function of temperature and pressure [8, 9]; a set of measurement results is shown in Fig. 3.

[Figure 3: three-dimensional plot of the scaled data, with axes x, y, and z, each in [-1, 1].]

Figure 3. Measured values of the speed of sound in acetone. The data have been scaled to $[-1, +1]$ intervals; $x$ is the temperature, $y$ is the pressure, and $z$ is the speed of sound. The polynomial model most likely supported by the data is also shown. Red dots are the data higher than predicted by the model; blue dots are those lower.

[Figure 4: probability (0 to 1) against polynomial degree.]

Figure 4. Probability that the polynomials having degree from zero to six explain the data in Fig. 3.
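The trial solutions (27a)-(27c) are bivariate polynomials in $p - p_0$ and $T - T_0$, and (26a) needs their temperature derivative at fixed pressure. A minimal sketch, assuming the coefficients are stored in a dictionary keyed by the exponent pair $(i, j)$ (an illustrative choice, not the paper's):

```python
from typing import Dict, Tuple

Coeffs = Dict[Tuple[int, int], float]  # (i, j) -> coefficient of (p - p0)^i (T - T0)^j


def poly2(coeffs: Coeffs, p: float, T: float, p0: float, T0: float) -> float:
    """Evaluate sum_ij a_ij (p - p0)^i (T - T0)^j, the form of (27a)-(27c)."""
    dp, dT = p - p0, T - T0
    return sum(a * dp ** i * dT ** j for (i, j), a in coeffs.items())


def poly2_dT(coeffs: Coeffs, p: float, T: float, p0: float, T0: float) -> float:
    """Partial derivative with respect to T at fixed p, term by term."""
    dp, dT = p - p0, T - T0
    return sum(j * a * dp ** i * dT ** (j - 1) for (i, j), a in coeffs.items() if j > 0)


a = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 3.0, (1, 2): 4.0}
value = poly2(a, p=1.0, T=2.0, p0=0.0, T0=0.0)       # 1 + 2 + 6 + 16 = 25
slope = poly2_dT(a, p=1.0, T=2.0, p0=0.0, T0=0.0)    # 3 + 2*4*1*2 = 19
```

Because the derivatives are exact polynomials, substituting (27a) and (27b) into (26a)-(26b) gives algebraic relations among the coefficients rather than finite-difference approximations.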
For the sake of numerical convenience, the temperature, pressure, and speed of sound have been scaled to $[-1, +1]$ intervals according to $x = (T - T_0)/\Delta T$, $y = (p - p_0)/\Delta p$, and $z = (w - w_0)/\Delta w$, where the offsets and scale factors have been suitably chosen. As shown in Fig. 4, the regression analyses using the seven basis-function sets $\{1, x, y, \ldots, x^r y^s, \ldots\}_q$, where $0 \le r + s \le q$ and $q = 0, 1, \ldots, 6$, indicate that the $q = 4$ set is the only one supported by the data. This sharp selection is due to the fast increase of the number of basis functions, $l = (q+1)(q+2)/2$, as the polynomial degree increases. For instance, if $q = 3$, then $l = 10$; if $q = 4$, then $l = 15$; if $q = 5$, then $l = 21$.

For a more detailed analysis, regressions were also carried out using the 190 893 subsets of 14, 15, and 16 basis functions chosen from the list $\{1, x, y, \ldots, P_r(x)P_s(y), \ldots\}$, where $0 \le r + s \le 5$ and $P_r(x)$ is the Legendre polynomial of degree $r$. The results are shown in Fig. 5. The 14 basis functions whose linear combination, which corresponds to a fifth-degree polynomial, most likely explains the data are $\{1, P_1(x), P_1(y), P_2(x), P_2(y), P_1(x)P_1(y), P_3(x), P_3(y), P_1(x)P_2(y), P_4(x), P_4(y), P_1(x)P_3(y), P_2(x)P_2(y), P_1(x)P_4(y)\}$.

[Figure 5: probability (0 to 0.4) against basis-function set index (6400 to 6750).]

Figure 5. Zoom of the probability of explaining the data set in Fig. 3 for the subsets numbered from 6400 to 6750 of 14, 15, and 16 basis functions chosen from the list $\{1, x, y, \ldots, P_r(x)P_s(y), \ldots\}$, where $0 \le r + s \le 5$. The subsets are numbered with the shortest first and with the later elements in the list omitted first. The probability of the remaining subsets is zero for all practical purposes.
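The counts quoted above follow from elementary combinatorics: there are $(q+1)(q+2)/2$ monomials $x^r y^s$ with $r + s \le q$, and the list with $r + s \le 5$ has 21 elements. The sketch checks $l$ for each $q$ and the number of subsets. The ordering produced by `itertools.combinations` (shortest subsets first, later list elements omitted first) is one reading of the numbering described in the Fig. 5 caption, not something the paper specifies.

```python
import math
from itertools import combinations


def exponents(q: int):
    """(r, s) pairs of the basis functions x^r y^s with 0 <= r + s <= q, by total degree."""
    return [(r, t - r) for t in range(q + 1) for r in range(t, -1, -1)]


sizes = {q: len(exponents(q)) for q in range(7)}            # l for q = 0..6
n_subsets = sum(math.comb(21, k) for k in (14, 15, 16))     # 116280 + 54264 + 20349


def numbered_subsets(n: int = 21, lengths=(14, 15, 16)):
    """Index subsets of the 21-element list: shortest first, later elements omitted first."""
    for k in lengths:
        yield from combinations(range(n), k)
```

`exponents(1)` returns `[(0, 0), (1, 0), (0, 1)]`, matching the $\{1, x, y\}$ start of the list, and `n_subsets` reproduces the 190 893 regressions quoted in the text.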
A by-product of this Bayesian analysis is the probability of each set of smoothing polynomials considered to model the data. Consequently, when, as in this case, several basis-function sets have a significant probability of explaining the data, the quantities of interest (here, the speed-of-sound values) and the uncertainty inherent in the model selection can be inferred by model averaging [14, 15].

8. Conclusions

An analytical solution has been found for the problem of deciding which basis functions must be used in linear regression analyses. It relies on Bayesian model selection and complements the numerical results given in [7]. It uses probability algebra to select among different basis-function sets and encodes a preference for the smallest set capable of explaining the data. In practice, a probability density is assigned to the regression coefficients before the data are available, consistently with the given prior information and according to the maximum entropy principle. Next, probability algebra allows this distribution to be updated according to the additional information delivered by the data. The regression probability is proportional to the normalising factor of the parameter likelihood times the parameter prior distribution. A feature of this solution is that, for a large data sample, the regression probability depends only on the residuals and the number of free parameters. The smaller the residuals, the higher the probability; but a penalty exists for increasing the number of parameters. In addition, if a basis-function set explains the data exactly, its probability of explaining the data is one.

Acknowledgments

This work was jointly funded by the European Metrology Research Programme (EMRP) participating countries within the European Association of National Metrology Institutes (EURAMET) and the European Union.
References

[1] Massa E and Mana G 2013 An automated resistor network to inspect the linearity of resistance-thermometry measurements Meas. Sci. Technol. submitted
[2] Anderson T W 1962 The choice of the degree of a polynomial regression as a multiple decision problem Ann. Math. Statist. 33 255-65
[3] Schwarz G 1978 Estimating the dimension of a model Ann. Statist. 6 461-4
[4] Shao J 1996 Bootstrap model selection J. Amer. Statist. Assoc. 91 655-65
[5] Philips R and Guttman I 1998 A new criterion for variable selection Statistics & Probability Letters 38 11-9
[6] Guttman I, Pena D and Redondas D 2005 A Bayesian approach for predicting with polynomial regression of unknown degree Technometrics 47 23-33
[7] Gull S F 1988 Bayesian inductive inferences and maximum entropy in: Maximum Entropy and Bayesian Methods in Science and Engineering 1 53-74 (Dordrecht, the Netherlands: Kluwer Academic Publishers)
[8] Lago S and Giuliano Albo P 2008 A new method to calculate the thermodynamical properties of liquids from accurate speed-of-sound measurements J. Chem. Thermodynamics 40 1558-64
[9] Lago S and Giuliano Albo P 2013 A novel application of recursive equation method for determining thermodynamic properties of single phase fluids from density and speed-of-sound measurements J. Chem. Thermodynamics 58 422-7
[10] Sivia D and Skilling J 2006 Data Analysis: A Bayesian Tutorial (Oxford: Oxford University Press)
[11] Jaynes E T 2003 Probability Theory: The Logic of Science (Cambridge: Cambridge University Press)
[12] MacKay D J C 2003 Information Theory, Inference, and Learning Algorithms (Cambridge: Cambridge University Press)
[13] Jaynes E T 1968 Prior probabilities IEEE Trans. Sys. Sci. Cybernetics 4 227-41
[14] Wasserman L 2000 Bayesian model selection and model averaging Journal of Mathematical Psychology 44 92-107
[15] Mana G, Massa E and Predescu M 2012 Model selection in the average of inconsistent data: an analysis of the measured Planck-constant values Metrologia 49 492-500
