Approaches for multi-step density forecasts with application to aggregated wind power

The generation of multi-step density forecasts for non-Gaussian data mostly relies on Monte Carlo simulations which are computationally intensive. Using aggregated wind power in Ireland, we study two approaches of multi-step density forecasts which c…

Authors: Ada Lau, Patrick McSharry

The Annals of Applie d Statistics 2010, V ol. 4, No. 3, 1311–134 1 DOI: 10.1214 /09-A OAS320 c  Institute of Mathematical Statistics , 2 010 APPR OA CHES FOR MUL TI-STEP DENS ITY F ORECASTS WITH APPLICA TION TO AG GREGA TED WIND POWER By Ada La u 1 and P a trick McSharr y 2 University o f Oxfor d The generation of multi-step density forecasts for non-Gaussian data mostly relies on Monte Carlo simulations which are computa- tionally i ntensiv e. U sing aggr egated wind p ow er in Ireland, w e stud y tw o approac h es of m ulti-step density forecasts which can be obtained from simple iterations so that intensiv e computations are av oided. In the first approach, we apply a logistic transformation to normal- ize t he d ata approximately and describ e the transformed data using ARIMA–GARCH mo dels so that multi-step forecas ts can b e iterated easily . In th e second approach, w e describ e the forecast densities by truncated normal d istribu tions which are gov erned by tw o parame- ters, namely , the conditional mean and conditional v ariance. W e ap- ply ex p onentia l smoothing metho ds to forecast the tw o p arameters sim u ltaneously . Since the underlying mo del of ex p onential smo oth- ing is Gaussian, we are able to obtain multi-step forecasts of the parameters by simple iterations and thus generate forecast densities as truncated normal distributions. W e generate forecasts for wind p ow er from 15 minutes to 24 hours ahead. R esults show that the first approach generates sup erior forecasts and slightly outp erforms the second app roac h under v arious prop er scores. Nevertheless, the second approac h is computationally more efficient and gives more ro- bust results und er different lengths of training data. It also provides an attractive alternative app roac h since one is allo w ed to choose a particular parametric density for the forecasts, and is v aluable when there are no obvious transformations to normalize th e data. 1. In tro d uction. Wind p o wer forecasts are essen tial f or the efficien t op- eration and integrat ion of win d p o wer into the national grid. Since wind is Received Sep tember 2009; revised December 2009. 1 Supp orted by the Ox ford–Man Institut e of Q uantitativ e Finance. 2 Supp orted by a Roy al A cademy of Engineering/EPSRC Researc h F ello wship and by the Eu rop ean Commission under th e SafeWind Pro ject (ENK7-CT2008-213740). Key wor ds and phr ases. Non- Gaussian t ime series, logistic transformation, exp onential smoothing, trun cated normal distribution, AR I MA–GARCH mo del, continuous rank ed probabilit y score. This is an electro nic reprint o f the or iginal article published by the Institute of Mathematical Statistics in The Annals of Applie d S tatistics , 2010, V ol. 4, No . 3, 1311 – 1341 . This reprint differs from the origina l in pagina tion and typogr aphic detail. 1 2 A. LAU AN D P . M C S HARR Y v ariable and wind energy cannot b e s tored efficientl y , there are risks of p o wer shortages d uring p erio ds of lo w win d sp eed. Wind turb ines ma y also need to b e shut do w n w hen w ind sp eeds are to o high, leading to an abrup t dr op of p ow er su pply . It is extremely imp ortan t for p o we r system op erators to qu an- tify th e uncertainti es of win d p o wer generation in ord er to plan for system reserv e efficien tly [Dohert y and O ’Malley ( 2005 )]. In addition, win d farm op erators require accurate estimations of the u ncertain ties of win d p o w er generation to reduce p enalties and maximize reve nues from the electricit y mark et [Pinson, Chev allier and Karin iotakis ( 2007 )]. Since th e work of Bro w n , K atz and Murp hy ( 1984 ) in wind sp eed fore- casting using autoregressiv e mo dels, there has b een an increasing amount of researc h in wind sp eed and wind p o wer forecasts. Most of the early litera- ture fo cuses on p oin t forecasts, and in recen t y ears more emph asis h as b een placed on prob ab ilistic or density forecasts b ecause of the need to quan- tify uncertain ties. Ho we v er, the num b er of studies on m ulti-step den sit y forecasts is still relativ ely s mall, not to ment ion the ev aluation of f orecast p erform an ces for h orizons h > 1. Early w orks on multi-ste p d ensit y forecasts can b e found in Da vies, Pem b erton and Petruccell i ( 1988 ) and Mo eanaddin and T ong ( 1990 ), w h ere the densities are estimated u s ing recursive numer- ical quadr ature that requires s ignifican t computational time. Manzan an d Zerom ( 2008 ) pr op ose a nonp arametric wa y to generate density forecasts for the U.S. Indu strial Pr o duction series, which is b ased on b o otstrap metho d s . Ho we v er, Mon te Carlo s imulations are required and th is appr oac h is also computationally int ensiv e. One of the approac h es to w ind p o wer forecasting is to fo cus on th e mo d - eling of wind sp eed an d then transform the d ata into win d p o wer through a p ow er curve [Sanchez ( 2006 )]. An ad v antag e is that w ind sp eed time series are smo other and m ore easily d escrib ed by lin ear mo d els. Ho wev er, a ma jor difficult y is that the shap e of the p o wer curve ma y v ary w ith time, and also it is difficult to qu an tify the un certain ties in calibrating the nonlinear p ow er curv e. Another ap p roac h is to transform meteorological f orecasts into wind p ow er forecasts, where ens emble forecasts are generated from soph isticated n u merical wea th er prediction (NWP) mo dels [T a ylor, McSharr y and Buizza ( 2009 ), Pinson and Madsen ( 2009 )]. This approac h is able to pro duce r eliable wind p o wer forecasts up to 10 da ys ahead, b ut it requires th e computation of a large num b er of scenarios as wel l as exp ens ive NWP mo dels. A third approac h to wind p o wer forecasting fo cus es on the direct statistical mo del- ing of wind p o we r time series. In this case the difficult y lies on the fact that wind p o wer time series are highly nonlinear and n on-Gaussian. In particular, wind p ow er time series at in d ividual wind farms alw a ys conta in long c hains of zeros and su dden jumps fr om maximum capacit y to a lo w v alue d ue to gusts of wind since turbines ha v e to b e sh u t d o wn temp orarily . Nev erthe- less, it has b een sh o wn th at statistical time series mo dels ma y outp erform MUL TI-STEP WIN D POWER DENSITY FORECASTS 3 sophisticated meteorological forecasts for sh ort forecast horizons within 6 hours [Milli gan, Sch w artz and W an ( 2004 )]. Extensiv e reviews of the short term state-of-the-a rt w in d p ow er prediction are conta ined in Landb erg et al. ( 2003 ), Gieb el, Kariniotakis and Bro wn swo r d ( 2003 ) and Costa et al. ( 2008 ), in which p o wer curv e mo dels, NWP mo dels and other statistical mo dels are discussed. In this pap er we ad op t the third appr oac h and consider mo deling the win d p ow er data directly . W e aim at s hort f orecast horizons within 24 hours ahead, since for longer forecast horizons the NWP mo dels m ay b e more r eliable. As men tioned ab ov e, win d p ow er time series are highly n onlinear. Aggregating the ind ividual wind p o wer time series will s m o oth out the irregularities, resulting in a time series wh ic h is more app ropriately d escrib ed by linear mo dels under suitable tr an s formations. Aggrega ted wind p o w er generation is also more relev an t to p o we r companies since they mainly consider th e total lev el of wind p ow er generation av ailable for d ispatc h. Thus, it is economica lly imp ortant to generate r eliable densit y forecasts f or aggregated wind p o wer generation. F or this reason, as a fir st study , this pap er considers the mo deling of aggrega ted wind p o w er time series. On e ma y argue th at u tilizing sp atiotem- p oral correlations among ind ividual win d f arms may imp ro ve the r esults in forecasting aggregat ed wind p o wer. W e will show in Section 4 that this is not the case here, at least by the use of a simple multiple time series ap- proac h. Unless one is in terested in the p o wer generate d at individual w ind farms, it is more appropriate to forecast the aggregat ed w ind p ow er as a univ ariate time s er ies. W e prop ose tw o approac h es of generating m u lti-step ahead densit y forecasts for wind p o wer generation, and w e d emons trate the v alue of our approac h es usin g wind p o wer generation from 64 wind farms in Ireland. In the fir st approac h, we demonstrate that the logistic function is a suitable transf ormation to normalize the aggrega ted wind p ow er data. In the second approac h, we d escrib e the forecast densities by trun cated normal distributions whic h are go verned b y tw o parameters, namely , the conditional mean and conditional v ariance. W e app ly exp onen tial smo othing metho ds to forecast the t wo p arameters simultaneo usly . S ince th e un derlying m o del of exp onentia l sm o othing is Gaussian, w e are able to obtain multi-ste p forecasts of the parameters b y simple iterations and th us generate forecast d ensities as truncated normal distr ibutions. Although th e s econd ap p roac h p erforms similarly to the fi r st in terms of our ev aluatio n of the wind p o wer forecasts, it h as n umerous adv ant ages. It is computationally more efficient, its f ore- cast p erform ances are more r ob u st, and it p ro vid es the flexibilit y to c h o ose a suitable parametric fu nction for the densit y forecasts. I t is also v aluable when there are no ob vious tr ansformations to n ormalize the data. Our p ap er is organized as follo ws. In S ection 2 we describ e th e wind p ow er data that we us e in our study . Then we explain the t wo app roac hes 4 A. LAU AN D P . M C S HARR Y Fig. 1. The lo c ations of 64 wind farms in I r eland. Ther e ar e 68 wi nd farms and wind p ower time series in the r aw data, but 4 p airs of wi nd farms ar e so close that they ar e essential ly extensions f r om the c orr esp onding old wind farm. As a r esult, w e si mply c on- sider 64 wind f arms her e. The wi nd farms ar e distribute d thr oughout Ir eland, and Arklow Banks i s the only offshor e wind farm. of generating m u lti-step densit y forecasts in Section 3 . The fi rst approac h concerning the logistic transformation is describ ed in Section 3.1 , while in Section 3.2 w e give the detail s on the second approac h using exp onen tial smo othing metho d s and trun cated normal distrib utions. In Section 4 we construct 4 b enchmarks to ga uge the p erformances of our ap p roac hes, and w e ev aluate the forecast p erformances using v arious prop er scores. Finally , w e conclude our p ap er in Section 5 , where we sum marize the b enefits of our approac hes and discu s s imp ortan t futur e researc h directions. 2. Wind p o we r data. W e consider aggregated wind p o wer generated from 64 wind farm s in Ireland for appr oximate ly six mon th s from 13-Jul- 2007 to 01-Jan-2008. Th e d ata are recorded ev er y 15 minutes, giving a total n u m b er of 16,512 observ ations du ring the p erio d. The lo cations of the wind farms are sho w n in Figure 1 . One of th e wind farms, kno w n as Ar klo w Banks, is offsh ore. 3 W e sum up the capac ities 4 of all wind farms and the total ca- pacit y is 792.355 MW. In ord er to facil itate comparisons b et wee n data sets 3 Detailed inf ormation of individual wind farms, such as latitude, longitude and capacit y , is p rovided by Eirgrid p lc and can b e found in Lau ( 2010 ). 4 The capacity is the maxim u m output of a wind farm when all turbines op erate at their maximum nominal p ow er. MUL TI-STEP WIN D POWER DENSITY FORECASTS 5 Fig. 2. Time series of normalize d aggr e gate d wind p ower fr om 64 wi nd farms in Ir eland, wher e the aggr e gate d wind p ower i s normalize d by the total c ap acity of 792.355 MW. The data ar e disse cte d into a tr aining set and a testing set as shown by the dashe d line. Ab out four months of data ar e use d for p ar ameter estimation, and the r emaining two months of data ar e use d for out-of-sample evaluation. with d ifferen t capacities, we norm alize the aggrega ted wind p o we r by divid- ing b y the total capacit y , that is, 792.355 MW, and so the normalized data is b ou n ded w ithin [0 , 1]. W e ha v e c h ec ke d that forecast r esu lts, in particular, for approac hes inv olving nonlinear transf ormations, are in fact in sensitiv e to the exact v alue of normalization. 5 W e dissect th e data into a training set of ab out 4 mon th s (the first 11,008 data p oints) for parameter estimation, and a testing s et of ab out t wo months (the remaining 5504 data p oints) for out-of-sample forecast ev aluations. Figures 2 and 3 s h o w the original and the fi rst differences of the normalized aggregated wind p o wer resp ectiv ely . It is clear that wind p ow er data are nonstationary . The v ariance is c hang- ing with time, sho win g clusters of high and lo w v ariabilit y . Also, there are some o ccasional spike s . Figures 4 and 5 s h o w the auto correlation function of the wind p o wer and its first d ifferen ces resp ective ly . Auto correlation is significan tly reduced by taking fir st differen ces. Since our aim is to generate sh ort term forecasts u p to 24 hours ahead, w e d o not fo cus on mo d eling an y long term seasonalit y , which often app ears in wind data due to the changing wind patterns thr oughout the y ear. F or example, w e can mo d el a cycle of 90 days by regressing the data in the 5 In our pap er t h e va lue of n ormalization must n ot b e smaller th an the total capacity since we will consider the logistic t ransformation ( 1 ). 6 A. LAU AN D P . M C S HARR Y Fig. 3. First differ enc es of normalize d aggr e gate d wind p ower. It is cle ar that the varianc e changes with time, and ther e is volatility clustering as wel l as sudden spikes. The data ar e disse cte d by the dashe d li ne into a tr aining set and a testing set. Fig. 4. Sample ACF of the time series of normalize d aggr e gate d wind p ower up to a lag of 7 days. The auto c orr elations de c ay very slow ly. I t shows that the wind p ower data ar e highly c orr elate d and may inc orp or ate long memory effe cts. MUL TI-STEP WIN D POWER DENSITY FORECASTS 7 Fig. 5. Sample ACF of the first differ enc es of normalize d aggr e gate d wi nd p ower up to a lag of 7 days. The dashe d lines ar e the c onfidenc e b ounds at 2 standar d deviations, assuming that the data fol low a Gaussian whi te noise pr o c ess. The auto c orr elations ar e signific antly r e duc e d, but they ar e stil l signific ant up to a lag of 7 days. training set with 16 harmonics of sin es and cosines with p erio ds T = j / (90 × 96) , j = 1 , . . . , 16 . This give s a fitted time ser ies as shown in Figure 6 with R 2 = 0 . 395. One ma y then mo del the d eseasonalize d data, but stud ies show that results ma y b e worse than those obtained by m o deling the seasonalit y directly [Jorgenson ( 1967 )]. On the other hand, w e are more in terested in the diurnal cycle since it p la ys a more imp ortan t role in in tr ada y forecasts. Diurnal cycle s ma y app ear in w ind data due to differen t temp eratures and air p ressures dur ing the da y an d the n ight, and win d sp eeds are sometimes larger dur ing the d a y wh en con ve ction currents are driven by the heating of the sun. Thus, w e try to fit th e training d ata with harm on ics of higher frequencies, such as those with T = j / 96 where j is an intege r. Ho w eve r, results sho w that those harmonics cannot help us to explain the v ariances in the data, and, thus, w e decide to exclude the m o deling of an y diu r nal cycle in this p ap er. Aggregat ed w ind p ow er time series, although smo other than those from individual w ind farm s, are non-Gaussian. In particular, they are n onnega- tiv e. Figure 7 sh o ws the u nconditional dens ity of aggregated wind p o wer. This distribution has a sh arp er p eak than the normal distribution and is also significan tly right-sk ew ed. Common transformations for normalizing wind sp eed data include the logarithmic transf ormation and the square ro ot transformation [T a ylor, McSharry and Buizza ( 2009 )]. Ho wev er, those trans- formations are s h o wn to b e unsatisfactory for our particular wind p ow er d ata 8 A. LAU AN D P . M C S HARR Y Fig. 6. L ong term se asonality app e ars in the wind data. We r e gr ess the data in the tr ain- ing set with 16 harmonics of sines and c osines with p erio ds T = j / (90 × 96) , j = 1 , . . . , 16 , so that the maxim um p erio d is 90 days. T he fit gi ves an R 2 = 0 . 395 . The thin dashe d line is the obser ve d normalize d w i nd p ower and the solid line is the fitte d time series with a cycle of 90 days. The vertic al dashe d line disse cts the data into a tr aining set and a testing set. Fig. 7. Unc onditional empiric al density of the normalize d aggr e gate d wind p ower, fitte d using the data in the tr aining set. T he density is cle arly non-G aussian sinc e the data is b ounde d. The density is skewe d and has a sharp er p e ak than the Gaussian distribution. This density gi ves the climatolo gy f or e c ast b enchmark. MUL TI-STEP WIN D POWER DENSITY FORECASTS 9 Fig. 8. Density of the wind p ower data af ter applying the lo garithmic tr ansformation, which r emains non-G aussian. The lo garithmic tr ansformation is a c om mon tr ansforma- tion to c onvert wind sp e e d data into an appr oximate Gaussian distribution, but is cle arly unappr opriate for wind p ower data. The solid line is the fitte d Gaussian distribution by maximizing the likeli ho o d. as demons trated in Figures 8 and 9 . Nev ertheless, we could transform the wind p ow er data y t b y a logistic transf ormation. This can b e traced bac k to the w ork of John son ( 1949 ), and recen tly Bjørn ar Bremnes ( 2006 ) ap- plies this transformation to mo d el w in d p o w er. T he logistic transformation is giv en by z t = log  y t 1 − y t  , 0 < y t < 1 , (1) and th e trans f ormed d ata z t giv es a distr ib ution wh ic h can b e we ll approx- imated b y a Gaussian distribution as shown in Figur e 10 . I n con trast with individual wind p o wer data, w e d o n ot encounte r an y v alues of zero or on e and so ( 1 ) is wel l defined. In Section 3.1 we app ly this transformation and build a Gaussian mo del to generate m ulti-step densit y forecasts for wind p ow er. 3. Approac h es for densit y forecasting. Since our aim of this pap er is to generate m u lti-step ahead density forecasts without relying on Mont e Carlo sim u lations, it is im p ortant that our appr oac h can b e iterated easily . F or this reason, in b oth of the follo wing approac hes, we consid er a Gaussian mo d el at certain s tages so that w e can iterate the forecasts in a tr actable man n er. 10 A. LAU AN D P . M C S HARR Y Fig. 9. Density of the wind p ower data after applying the squar e r o ot tr ansformation, which r emains non-G aussian. The squar e r o ot tr ansformat ion is a c ommon tr ansforma- tion to c onvert wind sp e e d data into an appr oximate Gaussian distribution, but is cle arly inappr opriate for wind p ower data. The solid line is the fitte d Gaussian distribution by maximizing the likeli ho o d. Fig. 10. Density of the wind p ower data after applying the lo gistic tr ansformation, which c an b e wel l appr oximate d by a Gaussian distribution. The solid l ine i s the fitte d Gaussian distribution by maximizing the likeliho o d. MUL TI-STEP WIN D POWER DENSITY FORECASTS 11 Fig. 11. First differ enc es of the lo gistic tr ansforme d wind p ower. The varianc e is not changing as f ast as b efor e, and the amount of volatil ity clusterings i s r e duc e d. However, the tim e series i s stil l nonstationary . The data ar e disse cte d by the dashe d li ne into a tr aining set and a testing set. 3.1. Gaussian mo del for tr ansforme d data. In the fi rst app r oac h, w e con- sider the transf ormation of w ind p o wer data in to an appro ximately Gaus- sian distribution so that we could describ e the transformed data by a sim - ple Gaussian mo del, in particular, the con ven tional ARIMA–GAR C H mo del with Gaussian in no v ations. As d iscussed in Section 2 , w e tr an s form the wind p ow er data b y the logistic function in ( 1 ). This transformation m ap s th e sup- p ort fr om (0 , 1) to the ent ire real axis, an d Figure 10 sho ws that this resu lts in an app ro ximately Gaussian distribu tion. As w ind p o wer data are n onstationary , so are the transformed data and w e consider the first differences w t = z t − z t − 1 . When compared w ith the original fi rst differences y t − y t − 1 in Figure 3 , the logistic trans f ormed v alues z t ha ve few er v olatilit y clusterings and a smaller autocorrelation. Th is is sho w n in Figure 11 and Figure 12 , resp ectiv ely . Thus, we mo del z t b y an ARIMA( p, 1 , q )–GARCH( r , s ) mo d el 6 w t = µ + p X i =1 φ i w t − i + q X j =1 θ j ε t − j + ε t , ε t |F t − 1 i . i . d . ∼ N (0 , σ 2 ε ; t ) , (2) 6 W e hav e also considered mo d eling z t by A RMA( p, q ) –GARCH( r , s ) mo d els, but they are n ot selected b ased on the BIC v alues. 12 A. LAU AN D P . M C S HARR Y Fig. 12. Sample A CF of the first di ffer enc es of l o gistic tr ansforme d wind p ower up to a lag of 7 days. The dashe d lines ar e the c onfidenc e b ounds at 2 standar d deviations, assuming that the data fol low a Gaussian whi te noise pr o c ess. The auto c orr elations ar e slightly smal ler than that f or the original data, which is shown i n Figur e 5 . σ 2 ε ; t = ω + r X i =1 α i ε 2 t − i + s X j =1 β j σ 2 ε ; t − j , where w t = z t − z t − 1 , µ, φ i , θ j , ω , α i , β j are constan t co efficien ts satisfying the usual conditions [Tsa y ( 2005 )] and F t consists of all the past v alues of z up to time t . W e also consider an ARIMA ( p, 1 , q ) mo del for z t with con- stan t cond itional v ariance V ar[ ε t |F t − 1 ] = σ 2 ε ; t = σ 2 ε , so as to compare with the ARIMA( p, 1 , q )–GAR CH ( r , s ) mo del. W e select the mo dels b y minimiz- ing the Ba y esian Information Criteria (BIC). P arameters are estimated b y maximizing the Gaussian likel iho o d. The optimal h -step ahead f orecasts ˆ z t + h | t and ˆ σ 2 ε ; t + h | t can b e easily ob- tained, and the corresp ond ing h -step ahead density forecast of Z t + h is give n b y the Gaussian distribution, that is, f Z t + h | t ∼ N ( ˆ z t + h | t , ˆ σ 2 t + h | t ) so that ˆ σ 2 t + h | t = V ar [ z t + h |F t ] can b e obtained from { ˆ σ 2 ε ; t + j | t } h j =1 in a standard wa y , for example, b y expressing the mod el in a mo vin g a verag e (MA) represen- tation [Tsa y ( 2005 )]. T o restore the densit y of the normalized aggregate d wind p o w er Y t + h , w e compute the Jacobia n of the transformation in ( 1 ) where | J | = | dz /dy | = 1 / [ y (1 − y )] . Th e densit y of Y t + h is then giv en by MUL TI-STEP WIN D POWER DENSITY FORECASTS 13 f Y t + h | t ( y t + h ) = | J | f Z t + h | t ( z t + h ), that is, f Y t + h | t ( y t + h ) = 1 y t + h (1 − y t + h ) 1 q 2 π ˆ σ 2 t + h | t (3) × exp  −  log  y t + h 1 − y t + h  − ˆ z t + h | t  2  /(2 ˆ σ 2 t + h | t )  . Note that ( 3 ) is the h -step ahead conditional density of Y t + h giv en the conditional p oint forecast of ˆ z t + h | t at time t . 3.2. Exp onential smo othing and trunc ate d normal distribution. The sec- ond appr oac h deals with the original wind p o w er data y t directly . Ho w- ev er, since the data are non-Gaussian, there is a problem with th e iter- ation of multi-step ahead d ensit y forecasts. W e h andle this by expr essing the h -step ahead conditional density as a function of its fi r st t wo moment s. F or instance, the one-step ahead d ensit y is written as f t +1 | t ( y ; ˆ µ t +1 | t , ˆ σ 2 t +1 | t ), where ˆ µ t +1 | t = E[ y t +1 |F t ] is the conditional mean and ˆ σ 2 t +1 | t = V ar[ y t +1 |F t ] = V ar[ ε t +1 |F t ] = ˆ σ 2 ε ; t +1 | t is the conditional v ariance. 7 A t this momen t, w e do not attempt to figure out the exact form of the densit y fu nction f t +1 | t . Given an y f t +1 | t and a mo d el M for the d ynamics, we can alw ays ev olve the densit y function so that f t +1 | t ( y ; ˆ µ t +1 | t , ˆ σ 2 t +1 | t ) M − → f t + h | t ( y ; ˆ µ t + h | t , ˆ σ 2 t + h | t ) , (4) ˆ µ t + h | t = p ( h ) M ( ˆ µ t +1 | t , . . . , ˆ µ t + h − 1 | t ; y 1 , . . . , y t ) , ˆ σ 2 t + h | t = q ( h ) M ( ˆ σ 2 ε ; t +1 | t , . . . , ˆ σ 2 ε ; t + h | t ) , where M − → denotes the pro cess of ev olving the dynamics an d generating h - step ahead density forecasts under the u nkno wn m o del M , whic h in practice ma y requ ire th e u se of Monte Carlo simulat ions. Here p ( h ) M and q ( h ) M stand for fu nctions that giv e the conditional mean and the cond itional v ariance of y t , w ith parameters that dep end on the mo del M and the forecast horizon h . It is d iffi cu lt to obtain an y closed form for f t + h | t if the d istribution of inno v ations ε t is n on-Gaussian. Thus, w e pr op ose to use a t wo-step approac h to approxi mate f t + h | t . I n the first step, w e attempt to mod el the dynamics 7 In this pap er, ˆ σ 2 t + h | t denotes th e conditional v ariance of the data y t + h , while ˆ σ 2 ε ; t + h | t denotes th e conditional v ariance of the innov ation ε t + h , so that in general ˆ σ 2 t + h | t is a function of ˆ σ 2 ε ; t + j | t with j = 1 , . . . , h . 14 A. LAU AN D P . M C S HARR Y of the cond itional mean ˆ ℓ t + h | t and the conditional v ariance ˆ s 2 t + h | t of the data using a Gaussian mo del G . This is expr essed as Step 1: ˆ ℓ t + h | t = p ( h ) G ( ˆ ℓ t +1 | t , . . . , ˆ ℓ t + h − 1 | t ; y 1 , . . . , y t ) , (5) ˆ s 2 t + h | t = q ( h ) G ( ˆ s 2 ε ; t +1 | t , . . . , ˆ s 2 ε ; t + h | t ) , where p ( h ) G and q ( h ) G stand for f unctions that giv e the conditional mean and the conditional v ariance of y t + h , with p arameters that dep end on the Gaus- sian mo d el G and h orizon h . In mo del G , the inn o v ations are add itiv e and are assumed to b e i.i.d. Gaussian distrib uted. F or example, G can b e th e con ven tional ARIMA–GAR CH mo del w ith Gaussian innov ations. This ma y b e violated in realit y , so ˆ ℓ t + h | t and ˆ s 2 t + h | t obtained from mo del G ma y not b e the true cond itional mean ˆ µ t + h | t and conditional v ariance ˆ σ 2 t + h | t resp ectiv ely . They only ser ve as p ro xies to the true v alues. Although mo del G ma y n ot d escrib e r eal s ituations, w e r ely on a second step for remedial adjustments suc h that th e fi nal densit y forecast is an ap- pro x im ation to r ealit y . In the second step, w e assu me that the h -step ahead densit y f t + h | t can b e approximate d b y a p arametric fu nction D , whic h is c haracterized by a lo cation parameter an d a scale parameter. In p articular, the lo cation parameter and th e scale p arameter are obtained from the con- ditional mean ˆ ℓ t + h | t and the conditional v ariance ˆ s 2 t + h | t resp ectiv ely , whic h are estimated f rom the Gaussian mo del G . T h us, we simply take Step 2: f t + h | t ( y ; ˆ µ t + h | t , ˆ σ 2 t + h | t ) ≈ D ( y ; ˆ ℓ t + h | t , ˆ s 2 t + h | t ) (6) as the h -step ahead density forecast wh ere D is a function dep ending on t wo parameters only . As a result, the t w o-step approac h may b e able to giv e a go o d estimatio n of f t + h | t if ( 6 ) is a close appro ximation. In ( 6 ) the correct conditional mean ˆ µ t + h | t and conditional v ariance ˆ σ 2 t + h | t are generate d b y p ( h ) M ( · ) and q ( h ) M ( · ) under the tru e mo del M , while the corresp onding p ro xy v alues ˆ ℓ t + h | t and ˆ s 2 t + h | t are generated by p ( h ) G ( · ) and q ( h ) G ( · ) und er a Gauss ian mo del G . E m pirical studies will b e needed to determine the appropriate Gaussian mo del G as we ll as the b est c h oice D in order to app ro ximate th e final d ensit y f t + h | t . F or our normalized aggregated wind p o wer y t , c ho osing D as the tru n - cated normal d istribution b ounded within [0 , 1] gives a go o d approxima tion of f t + h | t . T runcated normal distributions ha ve b een applied successfully in mo deling b ounded, nonnegativ e data [S anso and Guenni ( 1999 ), Gneiting et al. ( 2006 )]. W e consider D to b e parameterized by the location parameter ˆ ℓ t + h | t and the scale parameter ˆ s 2 t + h | t , where N ( ˆ ℓ t + h | t , ˆ s 2 t + h | t ) is th e corre- sp ond ing normal distribu tion without truncation. Note that ˆ ℓ t + h | t and ˆ s 2 t + h | t MUL TI-STEP WIN D POWER DENSITY FORECASTS 15 will b e the true conditional mean and conditional v ariance if the data are indeed Gauss ian. T h e dens it y function f t + h | t is then give n by ( 6 ) so that f t + h | t ( y ; ˆ µ t + h | t , ˆ σ 2 t + h | t ) = 1 ˆ s t + h | t  ϕ  y − ˆ ℓ t + h | t ˆ s t + h | t  (7) .  Φ  1 − ˆ ℓ t + h | t ˆ s t + h | t  − Φ  − ˆ ℓ t + h | t ˆ s t + h | t  for y ∈ (0 , 1), wher e ϕ and Φ are the standard normal d ensit y and d istribu- tion f u nction resp ectiv ely . Instead of directly estimating ˆ ℓ t + h | t and ˆ s 2 t + h | t using the ARIMA–GAR CH mo dels, w e find that a b etter wa y is to smo oth th e tw o parameters s imultane- ously b y exp onenti al smo othin g metho ds. Exp onenti al smo othin g metho ds ha ve b een wid ely and successfully adopted in areas suc h as in v en tory fore- casting [Brown and Meye r ( 1961 )], electricit y forecasting [T a ylor ( 2003 )] and v olatilit y forecasting [T a ylor ( 2004 )]. A comprehens iv e review of exp onen- tial smo othing is give n by Gardner ( 2006 ). Hyndm an et al. ( 2008 ) provide a state sp ace framew ork for exp onen tial smo othing, whic h furth er strengthens its v alue as a statistical mo d el instead of an ad ho c forecasting pro cedure. Ledolter and Bo x ( 1978 ) sho w that exp onenti al smo othing metho ds pro- duce optimal p oin t forecasts if and only if the underlyin g data generating pro cess is within a su b class of ARIMA( p, d, q ) pro cesses. W e exte nd this prop er ty and demonstrate th at simultaneous exp onent ial smo othing on the mean and v ariance can pro d uce op timal p oin t forecasts if the d ata follo w a corresp onding ARIMA( p, d, q )–GAR CH( r , s ) pro cess. Th is enables us to generate m ulti-step ahead forecasts for the parameters ˆ ℓ t + h | t and ˆ s 2 t + h | t b y iterating the underlyin g ARIMA–GAR CH m o del of exp onentia l s m o othing. 3.2.1. Smo othing the lo c ation p ar ameter only. F or the simplest case, let us assume that the conditional v ariance of wind p o wer is constan t. This means that we only need to sm o oth the conditional mean ℓ t , while the con- ditional v ariance s 2 t will b e estimated d irectly f r om the data via estimating the v ariance of innov ations ˆ s 2 ε . F r om n o w on, we r efer to the conditional mean as the lo cation parameter and the conditional v ariance as the s cale parameter so as to remind us that they corresp ond to the truncated n ormal distribution. Again, the h -step ahead scale parameter ˆ s 2 t + h | t is obtained as a fu nction of ˆ s 2 ε . By simple exp onential smo othing, the smo othed series of the lo cation parameter ℓ t is giv en b y S t , which is up dated according to S t = αy t + (1 − α ) S t − 1 , (8) 16 A. LAU AN D P . M C S HARR Y where y t is the observed win d p o wer at time t and 0 < α < 1 is a smo othing parameter. W e initialize the series by setting S 1 = y 1 , and the one-step ahead forecast is ˆ ℓ t +1 | t = S t . Iterating ( 8 ) giv es ˆ ℓ t + h | t = S t . Ho wev er, the forecast errors y t − ˆ ℓ t | t − 1 are highly correlated, with a significan t lag one sample auto correlation of 0 . 2723. A simple wa y to improv e the forecast is to add a parameter φ s to accoun t for auto correlations in the forecast equation [T a ylor ( 2003 )]. W e call this th e simple exp onen tial smo othin g with error correction. The up dating equatio n is still giv en b y ( 8 ), bu t the forecast equation is mo dified as ˆ ℓ t +1 | t = S t + φ s ( y t − S t − 1 ) , (9) where | φ s | < 1 . Note that it is no w p ossible to ob tain negativ e v alues for ˆ ℓ t +1 | t in ( 9 ) and in such cases ˆ ℓ t +1 | t is ob viously not the true conditional mean. Nev er th eless, this is not a p roblem here since ˆ ℓ t +1 | t essen tially serves as the lo cation parameter of th e truncated normal distribu tion, whic h can b e negativ e. F ollo win g th e taxonomy introd uced by Hyn dman et al. ( 2008 ), w e denote ( 8 ) and ( 9 ) as the ETS( A, N , N | EC ) metho d , wh ere ETS stands for b oth an abbreviation for exp onen tial smo othing as we ll as an acron ym for error, tr end and s easonalit y resp ectiv ely . Th e A inside the brac k et stands for add itive errors in the m o del, the first N s tands for no trend, the second N stand s for no seasonalit y and E C stands for err or correction. By d irectly iterating ( 8 ) and ( 9 ) and exp ressing ˆ y t + h | t = ˆ ℓ t + h | t , we hav e ˆ ℓ t + h | t = S t + αφ s (1 − φ h − 1 s ) 1 − φ s ( y t − S t − 1 ) + φ h s ( y t − S t − 1 ) (10) for h > 1. T o generate h -step ahead forecasts of ˆ s 2 t + h | t , it is imp ortant that w e identify an und er lyin g mo del corresp ond ing to our u p d ating and forecast equations ( 8 ) and ( 9 ). It can b e easily c h ec ke d that the ETS( A, N , N | EC ) metho d is optimal for the ARIMA(1 , 1 , 1) mo del, in the sense that the fore- casts in ( 9 ) are the min im u m mean sq u are error (MMSE) forecasts. Ex- pressed in the form of an ARIMA(1 , 1 , 1) mod el with Gaussian inn o v ations, the ETS( A, N , N | EC ) method can b e w ritten as w t = φ s w t − 1 + ε t + ( α − 1) ε t − 1 , ε t i . i . d . ∼ N (0 , s 2 ǫ ) , (11) where w t = y t − y t − 1 , ε t is the Gaussian inno v ation with mean zero and constan t v ariance s 2 ǫ , and α, φ s are the smoothing parameters in ( 8 ) and ( 9 ). It can also b e easily ve rified that ( 10 ) is identic al to the h -step ahead forecasts obtained fr om the ARIMA(1 , 1 , 1) mo del in ( 11 ). It then follo ws from the ARIMA(1 , 1 , 1) mo del that the h -step ahead forecast v ariance is giv en b y ˆ s 2 t + h | t = ˆ s 2 ǫ h X j =1 Ω 2 h − j , (12) MUL TI-STEP WIN D POWER DENSITY FORECASTS 17 where Ω 0 = 1 , Ω h = φ h s + α (1 − φ h s ) / (1 − φ s ) for h ≥ 1, and ˆ s 2 ε is th e estimated constan t v ariance of the in n o v ations. Note that in this case, ( 12 ) is the explicit form of ˆ s 2 t + h | t = q ( h ) G ( ˆ s 2 ǫ ) in ( 5 ). Since maximum lik eliho o d estimators are w ell kn o wn to ha ve nice asymp- totic prop erties, w e estimate the three parameters α, φ s and ˆ s 2 ε b y maximiz- ing the likeli ho o d of the truncated normal distribution f t +1 | t ( y t +1 ; ˆ ℓ t +1 | t , ˆ s 2 t +1 | t ). One ma y also consid er minimizing the mean conti n uous r ank ed probabilit y scores (CRPS) of the density forecasts [ Gneiting et al. ( 2005 , 2006 )], b ut this requires a muc h larger amoun t of compu tation. Although it ma y s lightly im- pro ve the densit y forecasts, min im izing the C R P S is not app ealing here since w e aim at generating multi-st ep forecasts in a computationally efficien t wa y . After obtaining the parameters, from ( 10 ) and ( 12 ) w e can generate the h -step ahead densit y forecasts u sing ( 7 ). 3.2.2. Smo othing b oth the lo c ation and sc ale p ar ameters simultane ously. Next, we consid er heteroscedasticit y for the conditional v ariances of wind p ow er. In this case, apart from smoothin g the location parameter ℓ t , w e also simultaneo usly smo oth the s cale parameter s 2 t . In fact, w e sm o oth the v ariance of innov ations s 2 ε ; t and obtain the scale parameter s 2 t as a fu nction of s 2 ε ; t as in ( 5 ). Equipp ed with th e one-step ahead forecast of the lo cation parameter ˆ ℓ t | t − 1 , w e ma y calculate the squ ared difference b et wee n ˆ ℓ t | t − 1 and the ob- serv ed wind p o wer y t , that is, ( y t − ˆ ℓ t | t − 1 ) 2 , as the estimated v ariance s 2 ε ; t at time t . App lyin g simple exp on ential smo othing, the smo othed series of s 2 ε ; t is giv en by V t , which is up dated according to V t = γ ( y t − ˆ ℓ t | t − 1 ) 2 + (1 − γ ) V t − 1 , (13) where y t is the observed wind p o wer at time t , ˆ ℓ t | t − 1 is obtained by ( 9 ) and 0 < γ < 1 is a smo othing parameter. W e initialize the series by setting V 1 to b e the v ariance of the data in the training set. In fact, the forecasts are not sensitiv e to the c h oice of initial v alues due to the size of the d ata s et. The one-step ahead forecast is giv en by ˆ s 2 ε ; t +1 | t = V t . Again, the forecast errors are highly correlat ed and it is b etter to include an additional parameter φ v in the forecast equation to accoun t f or auto correlations. The mo difi ed forecast equation is then giv en b y ˆ s 2 ε ; t +1 | t = V t + φ v [( y t − ˆ ℓ t | t − 1 ) 2 − V t − 1 ] , (14) where | φ v | < 1. Unf ortunately , a ma jor drawbac k of introdu cing this extra term in the f orecast equation is that negativ e v alues of ˆ s 2 ε ; t +1 | t ma y o c- cur. Although this do es n ot h ap p en in our data, w e mo dify our appr oac h and consider smo othing the logarithmic transf ormed scale parameter log s 2 ε ; t 18 A. LAU AN D P . M C S HARR Y suc h that negativ e v alues are allo wed sin ce w e aim at dev eloping a general metho dology that applies to differen t data sets. The smo othed series for log s 2 ε ; t is then giv en by log V t . Denoting ε t = y t − ˆ ℓ t | t − 1 and e t = ε t / √ V t , the estimated logarithmic v ariance at time t is no w c hosen to b e g ( e t ) instead of log ε 2 t so that g ( e t ) = θ ( | e t | − E[ | e t | ]) , (15) where θ is a constant p arameter. T his ensures that g ( e t ) is p ositive for large v alues of e t and n egativ e if e t is small. T he up dating equation and the forecast equation are now written resp ectiv ely as log V t = γ g ( e t ) + (1 − γ ) log V t − 1 , (16) log ˆ s 2 ε ; t +1 | t = log V t + φ v [ g ( e t ) − log V t − 1 ] , whic h are analogous to ( 13 ) and ( 14 ), except that a logarithmic transf orma- tion is tak en and ( y t − ˆ ℓ t | t − 1 ) 2 is r eplaced by g ( e t ). W e initialize the series b y setting log V 1 = 0. In fact, th e smo othing pr o cedure is insensitiv e to the initial v alue du e to the size of the data set. No w, the h -step ahead forecasts of ˆ ℓ t + h | t are still obtained from ( 10 ), b ut to generate h -step ahead forecasts of ˆ s 2 t + h | t w e need to iden tify an underlying mo del for this smo othing metho d . W e summarize our exp onential smo othing metho d for b oth ℓ t and s 2 t b y com bining ( 8 ), ( 9 ) and ( 16 ): S t = αy t + (1 − α ) S t − 1 , ˆ ℓ t +1 | t = S t + φ s ( y t − S t − 1 ) , (17) log V t = γ g ( e t ) + (1 − γ ) log V t − 1 , log ˆ s 2 ε ; t +1 | t = log V t + φ v [ g ( e t ) − log V t − 1 ] , where g ( e t ) is given in ( 15 ) and e t as d efi ned previously . There are four smo othing parameters α, γ , φ s , φ v and a parameter θ for the estimated loga- rithmic v ariance g ( e t ). W e adopt the taxonom y similar to that for exp onen- tial smo othing for the lo cation parameter as d escrib ed in Section 3.2.1 , and denote ( 17 ) as the ET S( A, N , N | EC )–( A, N , N | EC ) metho d where the sec- ond brac ket of ( A, N , N | EC ) indicates the exp onen tial smo othing metho d applied for smo othing the v ariance. W e aim at iden tifying ( 17 ) with an ARIMA–GAR CH mo del. Using ( 11 ) as th e ARIMA(1 , 1 , 1) mod el for y t and w riting ε t = y t − ˆ ℓ t | t − 1 , the last equation in ( 17 ) can b e wr itten as log ˆ s 2 ε ; t +1 | t = log V t + φ v [ g ( e t ) − log V t − 1 ] = γ g ( e t ) + (1 − γ ) log V t − 1 + φ v [ g ( e t ) − log V t − 1 ] = ( γ + φ v ) g ( e t ) − φ v log V t − 1 (18) MUL TI-STEP WIN D POWER DENSITY FORECASTS 19 + (1 − γ ) { log s 2 ε ; t − φ v [ g ( e t − 1 ) − log V t − 2 ] } = ( γ + φ v ) g ( e t ) − φ v g ( e t − 1 ) + (1 − γ ) log s 2 ε ; t , where we ha v e used the up dating equation in ( 16 ). This is th e exp onential GAR CH, that is, EGARCH(2 , 1) mo del for the conditional v ariance of in - no v ations ε t [Nelson ( 1991 )]. Unlik e the con v en tional EGARCH mo dels for asset pr ices, g ( e t ) is symmetric since there is no reason to exp ect vol atilit y to increase w hen wind p o wer generation drops . In summary , the exp onen tial smo othing metho d in ( 17 ) is optimal for the ARIMA(1 , 1 , 1)–E GARCH(2 , 1) mo del, which can b e written as w t = φ s w t − 1 + ε t + ( α − 1) ε t − 1 , ε t |F t − 1 i . i . d . ∼ N (0 , s 2 ε ; t ) , (19) log s 2 ε ; t = (1 − γ ) log s 2 ε ; t − 1 + ( γ + φ v ) g ( e t − 1 ) − φ v g ( e t − 2 ) , where w t = y t − y t − 1 and g ( e t ) is giv en in ( 15 ), and w e ha v e assumed Gaus- sian inno v ations so that E[ | e t | ] = p 2 /π . S imilarly , we estimate the five pa- rameters α, φ s , γ , φ v and θ by maximizing the truncated n orm al lik eliho o d as menti oned in S ection 3.2.1 . No w, equipp ed with the ARIMA(1 , 1 , 1)– EGAR CH (2 , 1) mo del in ( 19 ), the h -step ahead forecasts f or the scale p a- rameter ˆ s 2 ε ; t + h | t can b e easily obtained [Tsa y ( 2005 )]. Consequentl y , the h - step ahead forecasts ˆ s 2 t + h | t can b e expressed as a fun ction of { ˆ s 2 ε ; t + j | t } h j =1 , whic h is analogous to ( 12 ) except that th e expression is muc h more compli- cated and, in practice, one would simply iterate the f orecasts. The h -step ahead d ensit y f orecasts can then b e obtained u sing ( 7 ). 4. F orecast ev aluations. 4.1. Benchmark mo dels. In this section we apply the appr oac hes of den - sit y forecasts in S ection 3 to forecast normalized aggregated win d p o w er in Ireland. T o ev al uate the forecast p erformances of our approac hes, w e com- pare the results w ith four simp le b enchmarks. The fi rst t wo b enc h marks are the p ersistence (random w alk) forecast and the constant forecast, wh ic h are b oth obtained as trun cated normal distribu tions in ( 7 ). F or the p ersistence forecast, we estimate the h -step ahead lo cation parameter ˆ ℓ t + h | t and scale parameter ˆ s 2 t + h | t using th e latest observ ations, that is, ˆ ℓ t + h | t = y t , ˆ s 2 t + h | t = P N j =1 ( y t +1 − j − y t − j ) 2 N (20) for t > N . W e fi nd that taking N = 48, that is, using data in the past 12 hours, giv es an appr opriate estimate for ˆ s 2 t + h | t . F or the constan t f orecast, we estimate th e constant lo cation parameter ˆ ℓ t + h | t and scale parameter ˆ s 2 t + h | t using data in the w hole trainin g set. They 20 A. LAU AN D P . M C S HARR Y are give n by th e sample mean and the sample v ariance of the 11,008 obser- v ations in the training set, so th at ˆ ℓ t + h | t = ˆ ℓ = P 11 , 008 j =1 y j 11 , 008 , ˆ s 2 t + h | t = ˆ s 2 = P 11 , 008 j =1 ( y j − ˆ ℓ ) 2 11 , 007 . (21) W e ha ve also consider ed generating the p ersistence and constan t f orecasts using the first approac h as describ ed in Section 3.1 . How ev er, our resu lts sho w that the second approac h giv es a b etter b enc hm ark in terms of forecast p erform an ce. On the other hand, th e th ir d and the fourth b enc h marks are obtained b y estimating emp irical d ensities from the data. The third b enc hm ark is the climatolo gy forecast, in which an empirical u nconditional densit y is fitted using d ata in the w h ole training set. The density has b een sh own in Figure 7 previously . The fourth b enchmark is th e empirical conditional densit y forecast. T o b e in line with th e use of exp onential smo othing to estimate the lo cation and scale parameters in Section 3.2 , w e consider an exp onenti ally w eighte d moving a ve r age (EWMA) of a set of empirical conditional densities. Due to computational efficiency as well as reliabilit y of dens it y estimations, at eac h time t we essen tially consider the EWMA of 14 empirical conditional densities g emp ( { Λ j t } ), where eac h of them is fitted using observ ations in th e past j days with j = 1 , 2 , . . . , 14 and { Λ j t } = { y t − 96 j +1 , y t − 96 j +2 , . . . , y t } is th e set of (96 × j ) latest observ ations used to fit the emp irical density . Up to an appropriate n orm alizatio n constan t, th e h -step ahead EWMA empir ical conditional densit y forecast is giv en b y f t + h | t ( y ) ∝ 14 X j =1 λ (1 − λ ) j − 1 g emp ( { Λ j t } ) (22) so that f or an y fixed forecast origin t , the h -step ahead d ensit y forecasts are identi cal for all h > 1. Th e smo othing p arameter in ( 22 ) is estimated to b e λ = 0 . 1988, wh ic h is obtained b y maximizing the log lik eliho o d, that is, P log f t +1 | t ( λ ; y t +1 ), using the data in th e training set on ly . It is p ossible to estimate a smo othing p arameter for eac h f orecast horizon h . Ho we v er, th e impro v ements are not significan t and, th u s, we simp ly ke ep u s ing λ = 0 . 1988 for all horizons. Figure 13 s h o ws the exp onen tial decrease of the weig hts b eing assigned to d ifferent empirical densities g emp ( { Λ j t } ). In summary , we consider the follo wing 4 b enc h marks and 4 approac hes of generating multi-step densit y forecasts, and compare th eir forecast p erfor- mances f rom 15 minutes up to 24 hours ahead: 1. P ersistence forecast [TN] 2. Constan t forecast [TN] 3. Climatology forecast [Empirical density] MUL TI-STEP WIN D POWER DENSITY FORECASTS 21 Fig. 13. The exp onential de cr e ase of the w ei ghts λ (1 − λ ) j − 1 assigne d to the empiric al c onditional densities g emp ( { Λ j t } ) fitte d with j days of latest observations, wher e λ = 0 . 1988 is obtaine d by maximizing the l ikeliho o d using data in the tr aining set. The EWMA em- piric al c onditional density for e c asts ar e obtaine d as the weighte d aver age of g emp ( { Λ j t } ) . 4. EWMA conditional densit y foreca st [Empir ical densit y] 5. The ARIMA(2 , 1 , 3) mo del [L T] 6. The ARIMA(4 , 1 , 3) – GAR CH (1 , 1) mo del [L T] 7. The ETS( A, N , N | EC ) metho d [TN] 8. The ETS( A, N , N | EC )–( A, N , N | E C ) metho d [TN], where [L T] stands for logistic transformation and [T N] stands f or tru ncated normal distribution, so as to remind u s h o w the densities are generated. 4.2. Point for e c asts. First, let us ev aluate the p oin t foreca sts generated b y differen t approac hes. W e consider the exp ected v alues of the d ensit y forecasts as the optimal p oint forecasts. Giv en a forecast densit y , w e can obtain the exp ected v alue directly b y n umerical integ ration. In particular, for f orecast densities in the form of tr uncated normal distributions, one may easily write down the exp ected v alue as ˆ y t + h | t = ˆ ℓ t + h | t − ˆ ℓ t + h | t  ϕ  1 − ˆ ℓ t + h | t ˆ s t + h | t  − ϕ  − ˆ ℓ t + h | t ˆ s t + h | t  (23) .  Φ  1 − ˆ ℓ t + h | t ˆ s t + h | t  − Φ  − ˆ ℓ t + h | t ˆ s t + h | t  , 22 A. LAU AN D P . M C S HARR Y where ˆ ℓ t + h | t and ˆ s 2 t + h | t are the lo cation and scale parameters of the trun cated normal distribu tion in ( 7 ). Note that due to the tru ncation, th e d istribution ma y not b e symmetric and so the exp ected v alue is in general different from th e lo cation parameter, that is, ˆ y t + h | t 6 = ˆ ℓ t + h | t . In fact, referr ing to ( 5 ), ˆ ℓ t + h | t = p ( h ) G ( ˆ ℓ t +1 | t , . . . , ˆ ℓ t + h − 1 | t ; y 1 , . . . , y t ) is obtained according to a Gaus- sian mo d el G , w h ic h ma y not giv e the true conditional mean ˆ y t + h | t of the data, and ma y ev en b e negativ e. Since the final density f t + h | t is only ob- tained when an appropr iate function D is c hosen, we see that D transf orms the cond itional mean from ˆ ℓ t + h | t for Gaussian data to th e optimal forecast ˆ y t + h | t for our d ata. This is analogous to calculating optimal p oint forecasts when the loss function is asymmetric [Christoffersen and Diebold ( 1997 ), P atton and Timmermann ( 2007 )]. Since the normalized agg regated win d p ow er is b ounded within [0 , 1], th e loss fu nction is alw a ys asymmetric unless the cond itional m ean is ˆ ℓ t + h | t = 0 . 5. When the conditional mean is not the optimal forecast, an additional term is added to comp ensate f or the asym- metric loss. Ch ristoffersen and Dieb old ( 1997 ) suggest an appro ximation to calculate the optimal forecast for conditionally Gaussian data by assum- ing ˆ y t + h | t = G ( µ t + h | t , σ 2 t + h | t ), w h ere µ t + h | t , σ 2 t + h | t are the cond itional mean and conditional v ariance. Their metho d inv olv es expandin g G into a T aylor series. T o ev aluate the p er f ormances of different forecasting app roac hes, we cal- culate h -step ahead p oin t forecasts for eac h of th e 5504 v alues in the testing set, wher e 1 ≤ h ≤ 96, that is, f r om 15 minutes up to 24 hours ahead. F or eac h forecast horizon h , w e calculate the mean absolute error (MAE) and the ro ot mean squared err or (RMSE) of the p oin t forecasts, where th e mean is tak en o v er the 5504 h -step ahead forecasts in the testing set. Figures 14 and 15 sho w th e results of p oin t foreca sts u nder MAE an d RMSE resp ectiv ely . Th e rankings of d ifferen t approac h es are similar un- der either MAE or RMSE, except for th e ETS( A, N , N | EC )–( A, N , N | EC ) metho d whic h p erforms relativ ely b etter un der MAE than RMSE. It p er- forms th e b est u n der MAE f or long f orecast horizons b ey ond 14 h ours. On the other hand, the tw o ARIMA–GAR CH mo dels outp erform all other approac hes for short forecast horizons within 12 h ours, and are almost as go o d as the ET S( A, N , N | EC )–( A, N , N | EC ) metho d for h orizons b eyo nd 12 hours. In terestingly , the ARIMA(2 , 1 , 3) mo del is p erforming almost iden tically to the ARIMA(4 , 1 , 3)–GARCH(1 , 1) mo d el. Th is phenomenon is in contrast with that for the ETS metho ds, w here smo othing b oth the location and scale parameters do p erform m uch b etter. It seems that includ ing the d ynamics of the conditional v ariance in the mo d eling of the logistic transformed wind p ow er z t cannot impro v e the p oint forecasts under MAE or RMSE. These ma y b e explained by Figure 3 whic h sho ws a significant ly changing v ariance MUL TI-STEP WIN D POWER DENSITY FORECASTS 23 Fig. 14. Me an absolute err or (MAE) of p oint for e c asts gener ate d by differ ent appr o aches for for e c ast horizons fr om 15 minutes to 24 hours ahe ad. The AR IMA–GARCH m o dels on lo gistic tr ansforme d data p erform b est for short horizons less than 12 hours wher e as the ETS ( A, N , N | E C ) – ( A, N , N | E C ) metho d wi th trunc ate d normal di stribution is b est for horizons gr e ater than 12 hours. Fig. 15. R o ot me an squar e d err or (RMSE) of p oint f or e c asts gener ate d by differ ent ap- pr o aches for for e c ast horizons fr om 15 minutes to 24 hours ahe ad. R esults ar e sim ilar to those under MAE. 24 A. LAU AN D P . M C S HARR Y in the original win d p o wer data y t , and b y Figure 11 whic h sho ws a fairly constan t v ariance for z t . W e will fu rther inv estigate this issue in th e ev al- uation of den sit y forecasts usin g the probabilit y integral transform (PIT), where we see that the conditional v ariance m o dels are ind eed capturing the c hanges in vola tilit y b etter and th us generate more reliable density f orecasts. As d iscussed in S ection 1 , one may argue that spatiotemp oral in formation among in dividual wind farm s should b e deplo yed to forecast aggregat ed wind p ow er. T o sho w that it is in deed b etter to forecast th e aggregated p o wer as a univ ariate time series, w e consider a simple multiple time series approac h. W e obtain the b est linear unbiased pr ed ictor (BLUP) of w ind p o wer gener- ation at a single wind farm us in g observ ations in the neighborh o o d, where the pred ictor is the b est in the sense that it min imizes m ean square errors. In other w ords, it is simply the kriging p redictor whic h is widely applied in spatial statistics [Cressie ( 1993 ), Stein ( 1999 )]. It can b e easily extended to deal with sp atiotemporal data [Gneiting, Gen ton and Guttorp ( 2007 )], and more details could b e foun d in Lau ( 2010 ). Compu ting the BLUP relies on the kn owledge of the co v ariances of th e pro cess b et we en different s ites. In the con text of spatiote mp oral data, w e obtain th e BLUP by calculating the empirical co v ariances among th e wind p ow er at different spatial as w ell as temp oral lags. 8 W e then substitute the empirical co v ariances into th e form u la of BLUP . W e apply this method and obtain 1, 6, 12 and 24 hours ahead p oin t forecasts for the p o w er generated at eac h individual win d farm, aggrega te all p ow er and normalize the result b y dividing by 792.35 5 MW. W e compute the RMSE of these aggrega ted forecasts, and fin d th at aggre- gating individ u al forecasts cannot b eat the p erform ances of our approac h es in Section 3 . The r esults are disp lay ed in T able 1 . Of course, one may exp ect that more sophisticated sp atiotemp oral mod els may b e able to outp erform our m etho ds h er e, b ut this will b e of more int erest to ind ividual p o wer gen- eration instead of aggregated ones as d iscussed in this pap er. 4.3. Density for e c asts. F or the densit y forecasts, w e use the con tinuous rank ed pr obabilit y score (CRPS) to rank the p erf ormances. Gneiting and Raftery ( 2007 ) discussed the pr op erties of CRPS extensiv ely , sho w ing that it is a strictly p rop er score and a lo wer s core alwa ys in dicates a b etter d ensit y forecast. CRPS has b ecome one of the p opular tools for d ensit y forecast ev aluatio ns, esp ecially for ensemble f orecasts in meteo rology . W e hav e also analyzed the p erformances of den s it y forecasts using other common m etrics suc h as the negativ e log lik eliho o d (NLL) scores. Ho wev er, we advocate the 8 One needs to d ecide the number of temp oral lags t o b e included in calculating the BLUP . In our case of empirical cov ariances, w e find th at including t emp oral lags within the past h our is generally the b est. F oreca st p erformances deteriorate when on e considers too many temp oral lags. MUL TI-STEP WIN D POWER DENSITY FORECASTS 25 T able 1 Summary of p oint for e c ast p erformanc es of differ ent appr o aches under R M SE. The b old numb ers indic ate the b est appr o ach at that for e c ast horizon 1 hour 6 hours 12 hours 24 hours P ersistence forecast 0.036 0.138 0.191 0.229 Constan t forecast 0 . 263 0 . 263 0 . 263 0 . 263 Climatolo gy forecast 0.285 0.285 0.285 0.285 EWMA conditional density 0.177 0.192 0.203 0.211 ARIMA(2 , 1 , 3) 0.032 0 . 118 0 . 177 0 . 207 ARIMA(4 , 1 , 3)–GARC H(1 , 1) 0 . 031 0 . 117 0 . 177 0.209 ETS( A, N , N | EC ) 0.032 0.126 0.193 0.230 ETS( A, N , N | EC )–( A, N , N | EC ) 0.034 0.126 0 . 176 0.21 5 BLUP (Multiple time series approach) 0.037 0.123 0.188 0.229 use of CRPS for r anking different approac hes since CRPS is more robust than the NLL scores, while the latter is alw a ys s everely affecte d by a few extreme outliers [Gneiting et al. ( 2005 )]. One ma y need to calculate the trimmed m ean of the NLL scores in ord er to resolv e this p roblem [W eigend and Sh i ( 2000 )]. Also, C RPS assesses b oth the calibration and the sharpn ess of th e density forecasts, while the NLL scores assesses sharp n ess only . Similar to ev aluating p oint forecasts, w e generate h -step ahead density forecasts for eac h of the 5504 v alues in th e testing set wher e 1 ≤ h ≤ 96. F or eac h h -step ahead densit y forecast f t + h | t , let F t + h | t b e the corresp onding cum u lativ e d istr ibution f u nction. Th e CRPS is computed as CRPS = Z 1 0 [ F t + h | t ( y ) − 1 ( y − y t + h )] 2 dy , (24) where 1 ( · ) is the indicator fun ction wh ic h is equal to on e wh en the argument is p ositiv e. Again, the mean CR P S is tak en o ver the 5504 h -step ahead densit y foreca sts in th e testing s et. Figure 16 sho ws the p erformances of d ensit y forecasts und er mean CRPS. The rankings are similar to those u nder MAE and RMSE in p oin t f ore- casts. The tw o ARIMA–GAR CH m o dels outp erf orm all other app roac hes for all forecast horizons. T a ble 2 summarizes the main results. Again, the p erform an ces of the ARIMA(2 , 1 , 3) mo del are v ery similar to that of th e ARIMA(4 , 1 , 3)–GARCH(1 , 1) m o del and, in con trast, the ET S( A, N , N | EC )– ( A, N , N | EC ) metho d is significan tly b etter than the ETS( A, N , N | E C ) metho d. T o in vestig ate the v alue of including the dynamics of conditional v ariances, we consider the p r obabilit y inte gral tr an s form (PIT). F or one-step ahead d ensit y f orecasts f t +1 | t , th e PIT v alues are giv en b y z ( y t +1 ) = Z y t +1 0 f t +1 | t ( y ) dy . (25) 26 A. LAU AN D P . M C S HARR Y T able 2 Summary of density for e c ast p erformanc es of di ffer ent appr o aches under C RPS. The b old numb ers indic ate the b est appr o ach at that for e c ast horizon 1 hour 6 hours 12 hours 24 h ours P ersistence forecast 0.019 0.077 0.111 0.137 Constan t forecast 0 . 159 0 . 159 0 . 159 0 . 159 Climatolo gy forecast 0.175 0.175 0.175 0.175 EWMA conditional density 0.098 0.111 0.120 0.127 ARIMA(2 , 1 , 3) 0.017 0.065 0 . 100 0 . 119 ARIMA(4 , 1 , 3)–GARC H(1 , 1) 0 . 016 0 . 063 0 . 099 0.120 ETS( A, N , N | EC ) 0.017 0.068 0.109 0.129 ETS( A, N , N | EC )–( A, N , N | EC ) 0.017 0.069 0.100 0.124 Dieb old, Gun ther and T ay ( 1998 ) s ho w that the s eries of PIT v alues z are i.i.d. un if orm if f t +1 | t coincides with the tru e und er lyin g dens it y from whic h y t +1 is generated. F or eac h forecasting app r oac h, we calculate th e p ercen tage of PIT v alues b elo w th e 5th, 50th and 95th quantiles of the U [0 , 1] d istribu- tion, that is, the p ercen tage of PIT v alues smaller than 0.05, 0.5 and 0.95 resp ectiv ely . W e denote them b y P 5 , P 50 and P 95 , and calculate the devia- tions of the p ercen tages ( P 5 − 5) , ( P 50 − 50) an d ( P 95 − 95). Figure 17 sho ws the deviations, where w e only fo cus on th e t wo ET S metho ds and the t wo Fig. 16. Me an c ontinuous r anke d pr ob ability sc or e (CRPS) of density for e c asts gener ate d by differ ent appr o aches for for e c ast horizons fr om 15 mi nutes to 24 hours ahe ad. R ankings ar e similar to those under MAE and RMSE in p oint for e c asts. MUL TI-STEP WIN D POWER DENSITY FORECASTS 27 Fig. 17. We c alculate the p er c entages P 5 , P 50 and P 95 of PI T values smal ler than 0.05, 0.5 and 0. 95 r esp e ctively, and c alculate the deviations ( P 5 − 5) , ( P 50 − 50) and ( P 95 − 95) . The ETS( A, N , N | EC )–( A , N , N | EC ) metho d and th e AR I MA(4 , 1 , 3) – GARCH(1 , 1) mo del i nde e d gener ate b etter c alibr ate d density for e c asts. The over al l c ali br ation of the ETS( A, N , N | EC )–( A , N , N | EC ) metho d is the b est, indi c ating that i t pr ovides the most r eliable descriptions of the changing volatility over time. Note that a p ositive slop e i mplies a ddnsity for e c ast which is over-c onserva tive, while a ne gative slop e impl i es the opp osite. ARIMA–GAR CH mo d els. W e see that the ETS( A, N , N | EC )–( A, N , N | EC ) metho d and the ARIMA(4 , 1 , 3)–GARCH(1 , 1) mo d el indeed generate den- sit y forecasts whic h are b etter calibrated. In particular, the ov erall calibra- tion of the ETS( A, N , N | EC )–( A, N , N | EC ) metho d is the b est, indicating that it p ro vid es the most reliable descriptions of th e changing v olatilit y o ver time. Note that a p ositiv e slop e in Figure 17 imp lies a d ensit y forecast which is o ver-conserv a tiv e and h as a large spread, while a negativ e slop e implies the opp osite. Th us, f or one-step ahead forecasts, the ARIMA–GAR CH mo dels are ov er-conserv ativ e, while the ETS metho ds are ov er-confiden t. Figure 17 only r eflects inf orm ation on the marginal d istributions of th e PIT v alues. Stein ( 2009 ) su ggests that it is also v aluable to ev al uate the distributions conditioned on vo latile p erio ds. It is particularly imp ortant to capture the v ariance dyn amics d uring times of large v olatilitie s , since for most of the times one do es not wan t to und erestimate the risk b y p r op os- ing an o ver-confiden t densit y forecast. Under estimating large risks u sually leads to a more d isastrous outcome than o v erestimating small risks. F ol- lo wing Stein ( 2009 ), we compare the abilit y of th e approac hes in capturin g v olatilit y dyn amics du ring the largest 10% of v ariance. T o estimate the v ari- ance of the d ata in the testing set, we d irectly adop t the p ersistence forecast 28 A. LAU AN D P . M C S HARR Y Fig. 18. Estimate d varianc e of data in the tes ting set using the p ersistenc e for e c ast ˆ s 2 ε ; t +1 | t in ( 20 ), which essential ly gi ves the 12-hour moving aver age of r e ali ze d varianc e. Cle arly, the varianc e changes with time and the lar gest values m ostly o c cur in e arly De- c emb er. ˆ s 2 ε ; t +1 | t in ( 20 ), wh ic h essen tially giv es the 12-hour mo ving a verage of realized v ariance. Figure 18 sho w s the c hanging v ariance, where the largest v alues mostly o ccur in early Decem b er. T he times corresp onding to the largest 10% of v aria nce are selecte d and w e compare the d istribution of z ( y t +1 ) at those times. The PIT d iagrams are sho wn in Figure 19 . It demonstrates th at the ARIMA–GAR CH mo del indeed giv es b etter calibrated one-step ahead densit y forecasts than the ARIMA mo del du ring v olatile p erio d s. The dif- ferences b etw een the t w o ETS metho ds are ev en more significan t, where the ETS( A, N , N | EC ) metho d giv es o ver-c onfident density f orecasts that underestimate the spread. 5. Conclusions and discussions. In this pap er we study t wo approac hes for generating multi-ste p d ensit y forecasts for b ounded n on-Gaussian data, and we apply our m etho ds to forecast wind p o wer generation in Ireland. In the first approac h, w e demonstrate that the logistic transformation is a go o d metho d to normalize wind p o we r data which are otherwise highly non- Gaussian and nons tationary . W e fi t ARIMA–GAR CH m o dels with Gaus- sian in no v ations for the logistic transformed d ata, and out-of-sample fore- cast ev aluations s ho w that they generate b oth sup erior p oint and density forecasts for all horizons from 15 minutes up to 24 hours ahead. A second ap- proac h is to assume that the h -step ahead conditional densities are describ ed MUL TI-STEP WIN D POWER DENSITY FORECASTS 29 Fig. 19. Histo gr ams of PIT values c onditione d on the lar gest 10% of estimate d varianc e, wher e the one-step ahe ad density f or e c asts ar e gener ate d using the AR I MA(2 , 1 , 3) m o del (top left), the ARIMA(4 , 1 , 3) – GA RCH(1 , 1) mo del (top ri ght), the ETS ( A, N , N | EC ) metho d (b ottom left) and the ETS( A , N , N | EC )–( A, N , N | EC ) metho d (b ottom right). T he dotte d lines c orr esp ond to 2 standar d deviations fr om the uni f orm density. b y a p arametric fun ction D with a lo cation parameter ˆ ℓ and scale parameter ˆ s 2 , namely , the conditional mean and the conditional v ariance of y t that are generated by an app r opriate Gaussian mo del G . Results sh o w that c h o osing D as the tr u ncated normal distribution is appropriate for aggreg ated wind p ow er data, and in this case ˆ ℓ and ˆ s 2 are the mean and v ariance of the original n ormal d istribution resp ectiv ely . W e apply exp onenti al smo othing metho ds to generate h -step ahead forecasts for the lo cation and scale param- eters. S ince the underlying mo dels of the exp onentia l sm o othing metho d s are Gaussian, we are able to obtain multi-step f orecasts by simple iterations and generate forecast d ensities as truncated normal distributions. Although the approac h usin g exp onent ial smo othing metho d s with tr u n- cated normal distributions cannot b eat the approac h considering logistic transformed data, they are still a useful alternativ e to pr o duce go o d densit y forecasts d ue to sev eral r easons. First, forecast p erformances of the exp onen- 30 A. LAU AN D P . M C S HARR Y tial smo othing metho ds are more robu st under differen t lengths of trainin g data, esp ecially when the size of the training set is relativ ely small and the estimation of th e ARIMA–GAR CH mod els ma y not b e reliable to extrap o- late into the testing set. T his has b een d emonstrated in ou r data, where we tak e 40% of the data as the training set and th e r emaining as the testing set. In suc h a case, the ET S( A, N , N | EC )–( A, N , N | EC ) metho d p erf orms b etter than the app roac h with logistic transformed data [Lau ( 2010 )]. Second, in the fi rst app r oac h using ARIMA–GAR CH m o dels, we hav e to select the b est mo del using BIC whenev er we consider an up dated training set. This is n ot necessary for the exp onential smo othing metho ds. Third, an adv an tage of forecasting by exp onenti al sm o othing metho ds is that it is compu tationally more efficien t to calculate p oin t forecasts due to th e closed form of densit y function that we ha ve c hosen, namely , the tru ncated normal d istribution D . On the other hand , in the first approac h , we ha ve to trans form the Gaussian densities and calculate the exp ected v alue of the transformed densities b y n u merical in tegrations, w h ic h require m uc h more computational p ow er. The second and third p oint s are critical sin ce, in practice, m any forecasting p rob- lems require frequent online up d ating. Finally , the second approac h allo ws us to c ho ose a parametric fun ction D for th e forecast d ensities, which giv es us more fl exib ility and on e m a y generate improv ed d ensit y f orecasts by test- ing v arious p ossible choic es of D . T h is adv anta ge is p articularly imp ortant when there are no ob vious transformations to normalize the data, and when there is evidence that supp orts simple parametric f orecast densities. In su mmary , w e ha ve dev elop ed a general approac h of generati ng m u lti- step densit y forecasts for non-Gaussian data. In particular, w e ha ve ap- plied our approac h es to generate multi- step densit y forecasts for aggregated wind p ow er data, whic h would b e economically v aluable to p o wer compa- nies, national grids and wind farm op erators. It w ould b e int eresting and c hallenging to prop ose mo dified metho ds b ased on our current approac h es, so that reliable d ensit y f orecasts for individ ual win d p ow er generation could b e obtained. Individ ual w ind p o wer time series are interesti ng sin ce they are highly nonlinear. Su d den jump s from maxim u m capacit y to zero m a y o ccur due to gusts of wind s, and there may b e long chains of zero v alues b ecause of lo w win d sp eeds or mainte nance of turbines. Characteristics of individu al wind p ow er d ensities include a p ositiv e p robabilit y mass at zero as wel l as a h ighly righ t-ske w ed d istribution, and it would b e c h allenging to generate m u lti-step density forecasts for ind ivid ual win d p o wer data. Another imp or- tan t area of f uture researc h is to dev elop spatiotemp oral mo dels to generate densit y forecasts for a p ortfolio of win d farms at d ifferent locations. Re- cen t dev elopments in this area include Hering and Gento n ( 2010 ). Some p ossible appr oac hes in clude the pro cess-conv olution metho d dev elop ed and studied b y Higdon ( 1998 ), which has b een ap p lied to the mo d eling of o cean temp eratures and ozone concen trations. Another p ossible approac h is the MUL TI-STEP WIN D POWER DENSITY FORECASTS 31 use of laten t Gaussian pro cesses. Th ose ap p roac hes hav e b een studied b y Sanso and Guenni ( 1999 ) who consider the p o wer tr uncated norm al (PTN) mo del, and by Berro cal, Raftery and Gn eiting ( 2008 ) who consider a mo d - ified v ersion of the PTN mo del called the t wo-st age mo d el. S p atiotemp oral mo dels will b e imp ortan t to wind farm inv estors to iden tify p oten tial s ites for new farm s. It would also b e of great imp ortance to th e national grid sys- tems wh er e a large p ortfolio of wind farms are conn ected, and sophisticated spatiotemp oral m o dels m a y b e constru cted to impro v e densit y forecasts for aggrega ted wind p o wer by exploring the correlations of p o wer generations b et w een n eigh b oring wind f arm s. Ac kn o wledgments. The wind p o wer generati on data in I r eland w as kindly pro v id ed by Eirgrid plc. The authors th ank Pierr e Pins on, James T aylo r, Max Little, the referees and th e asso ciate editors for ins igh tful commen ts and su ggestions. REFERENCES Berrocal , V. J., Rafter y , A . E. and G neiting , T. (2008). Probabilistic quantitativ e precipitation field forecasting using a tw o-stage spatial model. Ann. Appl. Statist. 2 1170–11 93. Bjørnar Bre mnes , J. (2006). A comparison of a few statistical mod els for making q uan- tile wind p ow er forecasts. Wi nd Ener gy 9 3–11. Bro wn , B. G., Ka tz , R. W. and Murphy , A. H. ( 1984). Time series mo dels t o sim ulate and forecast wind sp eed and wind p ow er. Journal of Appli e d Mete or olo gy 23 1184– 1195. Bro wn , R. G. and Me yer , R. F. (1961). The fundamental theorem of exp onential smooth - ing. O p er. R es. 9 673–685. MR0143317 Christoffersen , P . F. and Diebold , F. X. (1997). Optimal prediction under asymmetric loss. Ec onometric The ory 13 808–817. MR1610075 Cost a , A., C re spo , A., Na v arr o , J., Lizcano , G., Mad sen , H. and Feitona , E. (2008). A review on t h e young history of wind p ow er short-term prediction. R enewable & Sus- tainable Ener gy R eviews 12 1725–17 44. Cressie , N. A. C. (1993). Statistics for Sp atial Data . Wiley , New Y ork. MR1239641 Da vies , N., Pember ton , J. and Petr uccelli , J. D. (1988). An automatic proced ure for identificatio n, estimation and forecas ting univ ariate self exiting threshold autoregressiv e mod els. J. R oy. Statist. So c. Ser. D 37 199–204. Diebold , F. X., Gun th e r , T. A. and T a y , A. S. (1998). Ev aluating den sit y forecasts: With app lications to financial risk management. Internat. Ec onom. R ev. 39 863–883. Doher ty , R. and O’Malley , M. (2005). A new approach to qu antif y reserv e demand in systems with significan t installed wind capacity . I EEE T r ansact ions on Power Systems 20 587–595. Gardner , E. S. (2006). Exp onential smo othing: The state of th e art. J. F or e c ast. 4 1–28. Giebel , G., Karini ot akis , G. and Bro wnsw ord , R. ( 2003). The state-of-the-art in short-term prediction of wind p ow er—A literature ov erview. EU pro ject Anemos, De- liv erable Rep ort D1.1. Gneiting , T., Genton , M. G. and Guttorp , P . (2007). Geostatistical space–time models, stationarit y , separabilit y and full symmetry . In Statistic al M etho ds for Sp atio-T emp or al Systems . Mono gr aphs on Statistics and Appli e d Pr ob abil ity 107 151–175. Chapman & Hall/CR C, Boca Raton, FL. 32 A. LAU AN D P . M C S HARR Y Gneiting , T. and Rafter y , A. E. (2007). St rictly prop er scoring rules, prediction, and estimation. J. Amer. Statist. Asso c. 102 359–378. MR2345548 Gneiting , T., Rafte r y , A. E., Westveld , A. H. and Goldman , T. (2005). Calibrated probabilistic forecasting u sing ensemble mo del outp ut statistics and minimum CRPS estimation. Monthly We ather Re view 133 1098–1118. Gneiting , T., Larson , K., Westrick , K., Ge nton , M. G. and A ldrich , E. (2006). Calibrated probabilistic forecasting at the stateline wind energy center: The regime- switc hin g sp acetime metho d. J. Amer. Statist. Asso c. 101 968–979. MR 2324108 Hering , A. S . and Genton , M. G. (2010). Po wering up with space-time wind forecasting. J. Amer. Statist. Asso c. 105 92–104. Higdon , D. M. (1998). A pro cess-conv olution approach to mo deling temp eratures in th e North Atlantic Ocean. Journal of Ec olo gic al and Envir onmental Statistics 5 173–190. Hyndman , R. J., Koehler , A. B., Ord , J. K. and Sny d er , R. D. ( 2008). F or e c asting with Exp onential Smo othing: T he State Sp ac e Appr o ach . Springer, Berlin. Johnson , N. L. (1949). S ystems of frequency cu rves generated by meth od s of translation. Biometrika 36 149–176. MR0033994 Jor ge nson , D. W. (1967). Seasonal adjustment of d ata for econometric analysis. J. Amer. Statist. Asso c. 62 137–140. MR0215602 Landberg , L., Giebel , G., Ni elsen , H. A., Nielsen , T. and Madsen , H. (2003). Sh ort- term pred iction—An ov erv iew. Wind Ener gy 6 273–280. Lau , A. (2010). Probabilistic wind pow er forecasts: F rom aggregated approach to spa- tiotemporal mo dels. Ph.D. thesis, Mathematical Institut e, Un iv . Oxford. Ledol ter , J. and Box , G. E. P . (1978 ). Conditions for the optimalit y of exp onential smoothing forecast pro cedures. Metrika 25 77–93. MR0525280 Manzan , S. and Zerom , D. (2008). A b o otstrap-based n on - parametric forecast density . International Journal of F or e c asting 24 535–550. Milligan , M., Schw ar tz , M. and W an , Y. (2004). Statistical wind p ow er forecasting for U.S. wind farms. In The 17th Confer enc e on Pr ob ability and Statistics in the Atmo- spheric Scienc es/2004 Americ an Mete or olo gic al So ciety Annua l Me eting, Se attle, Wash- ington, January 11–15, 2004 . Moeanaddin , R. an d Tong , H. (1990). Numerical ev aluation of distributions in non linear autoregression. J. Time Ser. Anal. 11 33–48. MR1046979 Nelson , D. B. (1991). Conditional h eteroskedas ticit y in asset returns: A new approach. Ec onometric a 59 347–370 . MR1097532 P a tton , A. J. and Timmerm ann , A. (2007). Prop erties of optimal forecasts und er asym- metric loss and nonlinearity . J. Ec onometrics 140 884–918. MR2408931 Pinson , P ., Ch e v a llie r , C. and Kariniot akis , G. (2007). T rading wind generation with short-term probabilistic forecasts of wind p ow er. IEEE T r ansac tions on Power Systems 22 1148–1156. Pinson , P . and Mad sen , H. (2009). En sem ble-based prob ab ilistic forecasting at Horns Rev. Wind Ener gy 12 137–15 5. Sanchez , I. (2006). Short-term p rediction of wind energy pro duction. International Jour- nal of F or e c asting 22 43–56. Sanso , B. and Guenni , L. (1999). V enezuelan rainfall data analysed by using a Bay esian space-time mo del. Appl. Statist. 48 345–362 . Stein , M. L. (1999). Interp olation of Sp atial Data: Some The ory for Kriging . S pringer, New Y ork. MR1697409 Stein , M. L. (2009). Spatial interpolation of high-frequency monitoring data. Ann. Appl. Statist. 3 272–291. MUL TI-STEP WIN D POWER DENSITY FORECASTS 33 T a ylor , J. W. (2003). Short-term electricity demand forecasting using doub le seasonal exp onential smoothing. Journal of the Op er ational R ese ar ch So ciety 54 799–805. T a ylor , J. W. ( 2004). V olatilit y forecasting with smooth transition exp onential smooth- ing. I nternational Journal of F or e c asting 20 273–286. T a ylor , J. W., McSharr y , P . E. and Buizza , R. (2009). Wind p ow er density forecasting using wind ensemble predictions and time series mo dels. IEEE T r ansact ions on Ener gy Conversion 24 775–782. Tsa y , R. S. ( 2005). Analysis of Financial Tim e Series , 2nd ed. Wiley , H ob oken. MR2162112 Weigend , A . S. and Shi , S. (2000). Predicting daily probability distribut ions of S &P500 returns. J. F or e c ast. 19 375–392. Oxf ord–Man Institute University of Oxford Eagle House, W al ton Well Ro ad Oxf ord, O X2 6ED United Kingdom E-mail: ada.lau@o xf ord-man.ox.ac.uk Smith School of Enterprise and the Environment University of Oxford Ha yes House 75 George S treet Oxf ord, O X2 2BQ United Kingdom E-mail: patric k@mcsharry .net

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment