A Parallel Algorithm for solving BSDEs - Application to the pricing and hedging of American options


Authors: Céline Labart (LAMA), Jérôme Lelong (LJK)

Céline Labart∗
Laboratoire de Mathématiques, CNRS UMR 5127, Université de Savoie, Campus Scientifique, 73376 Le Bourget du Lac, France

Jérôme Lelong†
Laboratoire Jean Kuntzmann, Université de Grenoble and CNRS, BP 53, 38041 Grenoble, Cedex 09, France

October 15, 2018

Abstract

We present a parallel algorithm for solving backward stochastic differential equations (BSDEs in short), which are very useful theoretical tools to deal with many financial problems ranging from option pricing to risk management. Our algorithm, based on Gobet and Labart (2010), exploits the link between BSDEs and nonlinear partial differential equations (PDEs in short) and hence enables us to solve high dimensional nonlinear PDEs. In this work, we apply it to the pricing and hedging of American options in high dimensional local volatility models, which remains very computationally demanding. We have tested our algorithm up to dimension 10 on a cluster of 512 CPUs and we obtained linear speedups, which proves the scalability of our implementation.

Keywords: backward stochastic differential equations, parallel computing, Monte-Carlo methods, nonlinear PDE, American options, local volatility model.

1 Introduction

Pricing and hedging American options with a large number of underlying assets is a challenging financial issue. On a single processor system, it can require several hours of computation in high dimensions. Recent advances in parallel computing hardware such as multi-core processors, clusters and GPUs are then of high interest for the finance community. For a couple of years, some research teams have been tackling the parallelization of numerical algorithms for option pricing.
Thulasiram and Bondarenko (2002) developed a parallel algorithm using MPI for pricing a class of multidimensional financial derivatives using a binomial lattice approach. Huang and Thulasiram (2005) presented algorithms for pricing American style Asian options using a binomial tree method. Concerning the parallelization of Monte-Carlo methods for pricing multi-dimensional Bermudan/American options, the literature is quite rare. We refer to Toke and Girard (2006) and to Dung Doan et al. (2010). Both papers propose a parallelization through grid computing of the Ibáñez and Zapatero (2004) algorithm, which computes the optimal exercise boundary. A GPGPU approach based on quantization techniques has recently been developed by Pagès and Wilbertz (2011).

Our approach is based on solving Backward Stochastic Differential Equations (BSDEs in short). As explained in the seminal paper by El Karoui et al. (1997b), pricing and hedging European options in local volatility models boil down to solving standard BSDEs. From El Karoui et al. (1997a), we know that the price of an American option is also linked to a particular class of BSDEs called reflected BSDEs. Several sequential algorithms to solve BSDEs can be found in the literature. Ma et al. (1994) adopted a PDE approach, whereas Bouchard and Touzi (2004) and Gobet et al. (2005) used a Monte-Carlo approach based on the dynamic programming equation. The Monte-Carlo approach was also investigated by Bally and Pagès (2003) and Delarue and Menozzi (2006), who applied quantization techniques to solve the dynamic programming equation. Our approach is based on the algorithm developed by Gobet and Labart (2010), which combines Picard's iterations and an adaptive control variate.

∗Email: celine.labart@univ-savoie.fr
†Email: jerome.lelong@imag.fr
It enables us to solve standard BSDEs, i.e., to get the price and delta of European options in a local volatility model. Compared to the algorithms based on the dynamic programming equation, ours provides regular solutions in time and space (which is coherent with the regularity of the option price and delta). To apply it to the pricing and hedging of American options, we use a technique introduced by El Karoui et al. (1997a), which consists in approximating a reflected BSDE by a sequence of standard BSDEs with penalisation.

The paper is organized as follows. In Section 2, we briefly recall the link between BSDEs and PDEs, which is the heart of our algorithm. In Section 3, we describe the algorithm, and in Section 4 we explain how the parallelization has been carried out. Section 5 describes how American options can be priced using BSDEs. Finally, in Section 6, we conclude the paper with some numerical tests of our parallel algorithm for pricing and hedging European and American basket options in dimension up to 10.

1.1 Definitions and Notations

• Let C_b^{k,l} be the set of continuously differentiable functions φ : (t, x) ∈ [0, T] × R^d with continuous and uniformly bounded derivatives w.r.t. t (resp. w.r.t. x) up to order k (resp. up to order l).
• C_p^k denotes the set of C^{k−1} functions with piecewise continuous k-th order derivative.
• For α ∈ ]0, 1], C^{k+α} is the set of C^k functions whose k-th order derivative is Hölder continuous with order α.

2 BSDEs

2.1 General results on standard BSDEs

Let (Ω, F, P) be a given probability space on which is defined a q-dimensional standard Brownian motion W, whose natural filtration, augmented with P-null sets, is denoted (F_t)_{0≤t≤T} (T is a fixed terminal time).
We denote by (Y, Z) the solution of the following backward stochastic differential equation (BSDE) with fixed terminal time T:

−dY_t = f(t, X_t, Y_t, Z_t) dt − Z_t dW_t,  Y_T = Φ(X_T),   (2.1)

where f : [0, T] × R^d × R × R^q → R, Φ : R^d → R and X is the R^d-valued process solution of

X_t = x + ∫_0^t b(s, X_s) ds + ∫_0^t σ(s, X_s) dW_s,   (2.2)

with b : [0, T] × R^d → R^d and σ : [0, T] × R^d → R^{d×q}. From now on, we assume the following hypothesis, which ensures the existence and uniqueness of the solution to Equations (2.1)-(2.2).

Hypothesis 1

• The driver f is a bounded Lipschitz continuous function, i.e., there exists L_f > 0 such that for all (t_1, x_1, y_1, z_1), (t_2, x_2, y_2, z_2) ∈ [0, T] × R^d × R × R^q,

|f(t_1, x_1, y_1, z_1) − f(t_2, x_2, y_2, z_2)| ≤ L_f (|t_1 − t_2| + |x_1 − x_2| + |y_1 − y_2| + |z_1 − z_2|).

• σ is uniformly elliptic on [0, T] × R^d, i.e., there exist two positive constants σ_0, σ_1 such that for any ξ ∈ R^d and any (t, x) ∈ [0, T] × R^d,

σ_0 |ξ|² ≤ Σ_{i,j=1}^d [σσ*]_{i,j}(t, x) ξ_i ξ_j ≤ σ_1 |ξ|².

• Φ is bounded in C^{2+α}, α ∈ ]0, 1].
• b and σ are in C_b^{1,3} and ∂_t σ is in C_b^{0,1}.

2.2 Link with semilinear PDEs

Let us also recall the link between BSDEs and semilinear PDEs. Although this relation is the keystone of our algorithm, as explained in Section 3, we do not develop it here and refer to Pardoux and Peng (1992) or El Karoui et al. (1997b) for more details. According to Pardoux and Peng (1992, Theorem 3.1), we can link the solution (Y, Z) of the BSDE (2.1) to the solution u of the following PDE:

∂_t u(t, x) + L u(t, x) + f(t, x, u(t, x), (∂_x u σ)(t, x)) = 0,  u(T, x) = Φ(x),   (2.3)

where L is defined by

L_{(t,x)} u(t, x) = (1/2) Σ_{i,j} [σσ*]_{ij}(t, x) ∂²_{x_i x_j} u(t, x) + Σ_i b_i(t, x) ∂_{x_i} u(t, x).

Theorem 1 (Delarue and Menozzi (2006), Theorem 2.1).
Under Hypothesis 1, the solution u of PDE (2.3) belongs to C_b^{1,2}. Moreover, the solution (Y_t, Z_t)_{0≤t≤T} of (2.1) satisfies

∀ t ∈ [0, T],  (Y_t, Z_t) = (u(t, X_t), ∂_x u(t, X_t) σ(t, X_t)).   (2.4)

3 Presentation of the Algorithm

3.1 Description

We present the algorithm introduced by Gobet and Labart (2010) to solve standard BSDEs. It is based on Picard's iterations combined with an adaptive Monte-Carlo method. We recall that we aim at numerically solving BSDE (2.1), which is equivalent to solving the semilinear PDE (2.3). The algorithm provides an approximation of the solution of this PDE. Let u_k denote the approximation of the solution u of (2.3) at step k. If we are able to compute an explicit solution of (2.2), the approximation of (Y, Z) at step k follows from (2.4): (Y_t^k, Z_t^k) = (u_k(t, X_t), ∂_x u_k(t, X_t) σ(t, X_t)), for all t ∈ [0, T]. Otherwise, we introduce X^N, the approximation of X obtained with an N-time step Euler scheme:

∀ s ∈ [0, T],  dX_s^N = b(φ_N(s), X_{φ_N(s)}^N) ds + σ(φ_N(s), X_{φ_N(s)}^N) dW_s,   (3.1)

where φ_N(s) = sup{t_j : t_j ≤ s} is the largest discretization time not greater than s and {0 = t_0 < t_1 < ⋯ < t_N = T} is a regular subdivision of the interval [0, T]. Then, we write (Y_t^k, Z_t^k) = (u_k(t, X_t^N), ∂_x u_k(t, X_t^N) σ(t, X_t^N)), for all t ∈ [0, T].

It remains to explain how to build the approximations (u_k)_k of u. The basic idea is the following:

u_{k+1} = u_k + Monte-Carlo evaluations of the error (u − u_k).

Combining Itô's formula applied to u(s, X_s) and u_k(s, X_s^N) between t and T with the semilinear PDE (2.3) satisfied by u, we get that the correction term c_k is given by

c_k(t, x) = (u − u_k)(t, x) = E[ Ψ(t, x, f_u, Φ, W) − Ψ^N(t, x, −(∂_t + L^N) u_k, u_k(T, ·), W) | G_k ],
where

• L^N u(s, X_s^N) = (1/2) Σ_{i,j} [σσ*]_{ij}(φ_N(s), X_{φ_N(s)}^N) ∂²_{x_i x_j} u(s, X_s^N) + Σ_i b_i(φ_N(s), X_{φ_N(s)}^N) ∂_{x_i} u(s, X_s^N);
• f_v : [0, T] × R^d → R denotes the function f_v(t, x) = f(t, x, v(t, x), (∂_x v σ)(t, x)), where f is the driver of BSDE (2.1), σ is the diffusion coefficient of the SDE satisfied by X, and v : [0, T] × R^d → R is C¹ w.r.t. its second argument;
• Ψ and Ψ^N denote

Ψ(s, y, g_1, g_2, W) = ∫_s^T g_1(r, X_r^{s,y}(W)) dr + g_2(X_T^{s,y}(W)),
Ψ^N(s, y, g_1, g_2, W) = ∫_s^T g_1(r, X_r^{N,s,y}(W)) dr + g_2(X_T^{N,s,y}(W)),

where X^{s,y} (resp. X^{N,s,y}) denotes the diffusion process solving (2.2) and starting from y at time s (resp. its approximation using an Euler scheme with N time steps), and W denotes the standard Brownian motion appearing in (2.2) and used to simulate X^N, as given in (3.1);
• G_k is the σ-algebra generated by the set of all random variables used to build u_k.

In the above formula for c_k, we compute the expectation w.r.t. the law of X and X^N and not w.r.t. the law of u_k, which is G_k-measurable. (See Definition 2 for a rigorous definition of G_k.) Note that Ψ and Ψ^N can actually be written as expectations by introducing a random variable U uniformly distributed on [0, 1]:

Ψ(s, y, g_1, g_2, W) = E_U[ (T − s) g_1(s + (T − s)U, X_{s+(T−s)U}^{s,y}(W)) + g_2(X_T^{s,y}(W)) ],
Ψ^N(s, y, g_1, g_2, W) = E_U[ (T − s) g_1(s + (T − s)U, X_{s+(T−s)U}^{N,s,y}(W)) + g_2(X_T^{N,s,y}(W)) ].

In the following, let ψ^N(s, y, g_1, g_2, W, U) denote

ψ^N(s, y, g_1, g_2, W, U) = (T − s) g_1(s + (T − s)U, X_{s+(T−s)U}^{N,s,y}(W)) + g_2(X_T^{N,s,y}(W)),   (3.2)

such that Ψ^N(s, y, g_1, g_2, W) = E_U[ψ^N(s, y, g_1, g_2, W, U)].
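To make the last two displays concrete, here is a small Python sketch of an Euler path for (3.1) and of the one-draw estimator ψ^N of (3.2) in dimension d = q = 1 (the coefficients b, σ and the integrands g_1, g_2 below are illustrative placeholders, not the paper's model): the time integral is replaced by (T − s) g_1(τ, X_τ) at a single randomized time τ, and averaging over draws of U recovers Ψ^N.

```python
import math
import random

def euler_path(x0, b, sigma, T, N, rng):
    """One path of the Euler scheme (3.1) for dX_t = b dt + sigma dW_t
    on the regular grid t_j = j T / N (dimension d = q = 1)."""
    h = T / N
    path = [x0]
    x = x0
    for j in range(N):
        t = j * h
        dw = rng.gauss(0.0, math.sqrt(h))  # increment W_{t_{j+1}} - W_{t_j}
        x = x + b(t, x) * h + sigma(t, x) * dw
        path.append(x)
    return path

def psi_N(s, T, g1, g2, path, N, u):
    """One-sample estimator psi_N of (3.2): the integral int_s^T g1(r, X_r) dr
    is replaced by (T - s) g1(tau, X_tau), tau = s + (T - s) u, u ~ U[0, 1];
    X is read off the discrete path via the left-point index (phi_N)."""
    h = T / N
    tau = s + (T - s) * u
    j = min(int(tau / h), N)               # largest grid index with t_j <= tau
    return (T - s) * g1(tau, path[j]) + g2(path[N])

# Illustrative ingredients: constant coefficients and toy integrands.
rng = random.Random(42)
path = euler_path(0.0, lambda t, x: 0.05, lambda t, x: 0.2, 1.0, 10, rng)
est = psi_N(0.0, 1.0, lambda t, x: x, lambda x: x * x, path, 10, rng.random())
flat = [0.1 * j for j in range(11)]        # a frozen path for deterministic checks
```

Averaging `psi_N` over many independent draws of `u` gives a Monte-Carlo approximation of Ψ^N along the simulated path.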
From a practical point of view, the PDE (2.3) is solved on [0, T] × D, where D ⊂ R^d is chosen such that sup_{0≤t≤T} |X_t| ∈ D with a probability very close to 1.

Algorithm 1. We begin with u_0 ≡ 0. Assume that an approximated solution u_k of class C^{1,2} is built at step k − 1. Here are the different steps to compute u_{k+1}.

• Pick at random n points (t_i^k, x_i^k)_{1≤i≤n} uniformly distributed over [0, T] × D.
• Evaluate the Monte-Carlo correction c_k at step k at the points (t_i^k, x_i^k)_{1≤i≤n} using M independent simulations:

c_k(t_i^k, x_i^k) = (1/M) Σ_{m=1}^M ψ^N( t_i^k, x_i^k, f_{u_k} + (∂_t + L^N) u_k, Φ − u_k, W^{m,k,i}, U^{m,k,i} ).

• Compute the vector (u_k(t_i^k, x_i^k))_{1≤i≤n}. Now, we know the vector ((u_k + c_k)(t_i^k, x_i^k))_{1≤i≤n}. From these values, we extrapolate the function u_{k+1} = u_k + c_k on [0, T] × D:

u_{k+1}(t, x) = P_k(u_k + c_k)(t, x), for (t, x) ∈ [0, T] × D,   (3.3)

where P_k is a deterministic operator which only uses the values of the function at the points (t_i^k, x_i^k)_{1≤i≤n} to approximate the function on the whole domain [0, T] × D. The choice of the operator P_k is discussed in Section 3.2.

Since c_k is computed using Monte-Carlo simulations instead of a true expectation, the values (c_k(t_i^k, x_i^k))_{1≤i≤n} are random variables. Therefore, u_{k+1} is a random function depending on the random variables needed to compute u_k and on W^{m,k,i}, U^{m,k,i}, 1 ≤ m ≤ M, 1 ≤ i ≤ n. In view of this comment, the σ-algebra G_k has to be redefined to take into account this new source of randomness.

Definition 2 (Definition of the σ-algebra G_k). Let G_{k+1} be the σ-algebra generated by the set of all random variables used to build u_{k+1}.
Using (3.3) yields G_{k+1} = G_k ∨ σ(A_k, S_k), where A_k is the set of random points used at step k to build the estimator P_k (see below), S_k = {W^{m,k,i}, U^{m,k,i}, 1 ≤ m ≤ M, 1 ≤ i ≤ n} is the set of independent Brownian motions used to simulate the paths X^{m,k,N}(x_i^k), and G_k is the σ-algebra generated by the set of all random variables used to build u_k.

3.2 Choice of the operator

The most delicate part of the algorithm is how to extrapolate a function h and its derivatives when only knowing its values at n points (t_i, x_i)_{i=1,…,n} ∈ [0, T] × D.

3.2.1 A kernel operator

In the first version of Algorithm 1 presented in Gobet and Labart (2010), a function h was extrapolated from the values computed on the grid by using a kernel operator of the form

h(t, x) = Σ_{i=1}^n u(t_i, x_i) K_t(t − t_i) K_x(x − x_i),

where K_t is a one-dimensional kernel, whereas K_x is a product of d one-dimensional kernels. Hence, evaluating the function h at a given point (t, x) requires O(n × d) computations. The convergence result established by Gobet and Labart (2010, Theorem 5.1) is based on the properties of the operator presented in Gobet and Labart (2010, Section 4). Using the linearity and the boundedness of the operator, they managed to prove that the errors ‖v − P_k v‖ and ‖∂_x v − ∂_x(P_k v)‖ are bounded, which is a key step in proving the convergence of the algorithm. At the end of their paper, they present an operator based on kernel estimators satisfying the assumptions required to prove the convergence of the algorithm.

3.2.2 An extrapolating operator

The numerical properties of kernel operators are very sensitive to the choice of their window parameters, which is quite hard to tune for each new problem. Hence, we have tried to use another solution.
Basically, we have borrowed the solution proposed by Longstaff and Schwartz (2001), which consists in extrapolating a function by solving a least squares problem defined by the projection of the original function on a countable set of functions. Assume we know the values (y_i)_{i=1,…,n} of a function h at the points (t_i, x_i)_{i=1,…,n}; the function h can be extrapolated by computing

α = argmin_{α ∈ R^p} Σ_{i=1}^n ( y_i − Σ_{l=1}^p α_l B_l(t_i, x_i) )²,   (3.4)

where (B_l)_{l=1,…,p} are some real valued functions defined on [0, T] × D. Once α is computed, we set ĥ(t, x) = Σ_{l=1}^p α_l B_l(t, x). For the implementation, we have chosen the (B_l)_{l=1,…,p} as a free family of multivariate polynomials. For such a choice, ĥ is known to converge uniformly to h when p goes to infinity, provided D is a compact set and h is continuous on [0, T] × D. Our algorithm also requires the first and second derivatives of h, which are approximated by the first and second derivatives of ĥ. Although the idea of approximating the derivatives of a function by the derivatives of its approximation is not theoretically well justified, it proves to be very efficient in practice. We refer to Wang and Caflisch (2010) for an application of this principle to the computation of the Greeks for American options.

Practical determination of the vector α. In this part, we use the notation d′ = d + 1. It is quite easy to see from Equation (3.4) that α is the solution of a linear system: α is a critical point of the criterion to be minimized in Equation (3.4), and the vector α solves

Σ_{l=1}^p α_l Σ_{i=1}^n B_l(t_i, x_i) B_j(t_i, x_i) = Σ_{i=1}^n y_i B_j(t_i, x_i), for j = 1, …, p.
In matrix form,

Aα = Σ_{i=1}^n y_i B(t_i, x_i),   (3.5)

where A is the p × p matrix A = (Σ_{i=1}^n B_l(t_i, x_i) B_j(t_i, x_i))_{l,j=1,…,p} and B = (B_1, …, B_p)*. The matrix A is symmetric and positive definite but often ill-conditioned, so we cannot rely on the Cholesky factorization to solve the linear system; instead we have to use more elaborate techniques, such as a QR factorization with pivoting or a singular value decomposition approach, which can better handle an almost rank deficient matrix. In our implementation of Algorithm 1, we rely on the routine dgelsy from Lapack (Anderson et al. (1999)), which solves a linear system in the least squares sense by using a QR decomposition with pivoting combined with orthogonalization techniques. Fortunately, the ill-conditioning of the matrix A is not fate; we can improve the situation by centering and normalizing the polynomials (B_l)_l such that the domain [0, T] × D is actually mapped to [−1, 1]^{d′}. This reduction improves the numerical behaviour of the chaos decomposition by a great deal.

The construction of the matrix A has a complexity of O(p²nd′). The computation of α (Equation (4.3)) requires solving a linear system of size p × p, which costs O(p³) operations. The overall complexity for computing α is then O(p³ + np²d′).

Choice of the (B_l)_l. The function u_k we want to extrapolate at each step of the algorithm is proved to be quite regular (at least C^{1,2}), so using multivariate polynomials for the B_l should provide a satisfactory approximation. Actually, we used polynomials with d′ variates, which are built using tensor products of univariate polynomials.
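As a minimal illustration of (3.4)-(3.5), the Python sketch below assembles the normal equations for a toy basis and solves them with a small pivoted elimination routine (a stand-in for the QR-based dgelsy used in the actual implementation; the basis, points and data are illustrative, and the toy system is well conditioned):

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting; a stand-in for Lapack's
    dgelsy, adequate here because the toy system is well conditioned."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit(points, ys, basis):
    """Least squares coefficients alpha of (3.4) via the normal equations (3.5)."""
    p = len(basis)
    A = [[sum(basis[l](t, x) * basis[j](t, x) for (t, x) in points)
          for j in range(p)] for l in range(p)]
    rhs = [sum(y * basis[j](t, x) for (t, x), y in zip(points, ys))
           for j in range(p)]
    return solve(A, rhs)

# Toy basis B_1 = 1, B_2 = t, B_3 = x and data sampled from 2 + 3t - x.
pts = [(0.1 * i, ((7 * i) % 10) * 0.1 - 0.3) for i in range(10)]
ys = [2.0 + 3.0 * t - x for (t, x) in pts]
alpha = fit(pts, ys, [lambda t, x: 1.0, lambda t, x: t, lambda t, x: x])
```

On data generated exactly by 2 + 3t − x, the fit recovers the coefficients (2, 3, −1); on noisy data it returns the least squares projection of (3.4).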
If one wants the vector space Vect{B_l, l = 1, …, p} to be the space of d′-variate polynomials with global degree at most η, then p has to be equal to the binomial coefficient C(d′+η, η). For instance, for η = 3 and d′ = 6 we find p = 84. This little example shows that p cannot be fixed by specifying the maximum global degree of the polynomials B_l without leading to an explosion of the computational cost; we therefore had to find another approach. To cope with the curse of dimensionality, we studied different strategies for truncating polynomial chaos expansions. We refer the reader to Chapter 2 of Blatman (2009) for a detailed review on the topic. From a computational point of view, we could not afford the use of adaptive sparse polynomial families, because the construction of the family is inevitably sequential and it would have been detrimental to the speed-up of our parallel algorithm. Therefore, we decided to use a sparse polynomial chaos approximation based on a hyperbolic set of indices, as introduced by Blatman and Sudret (2009).

A canonical polynomial with d′ variates can be defined by a multi-index ν ∈ N^{d′}, ν_i being the degree of the polynomial with respect to the variate i. Truncating a polynomial chaos expansion by keeping only the polynomials with total degree not greater than η corresponds to the set of multi-indices {ν ∈ N^{d′} : Σ_{i=1}^{d′} ν_i ≤ η}. The idea of hyperbolic sets of indices is to consider the pseudo q-norm of the multi-index ν with q ≤ 1:

{ ν ∈ N^{d′} : ( Σ_{i=1}^{d′} ν_i^q )^{1/q} ≤ η }.   (3.6)

Note that choosing q = 1 gives the full family of polynomials with total degree not greater than η. The effect of introducing this pseudo-norm is to favor low-order interactions.
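The hyperbolic truncation (3.6) is easy to enumerate by brute force for small d′ (a sketch only; the actual implementation relies on the reduction facilities of the PNL library described in Section 4.5):

```python
from itertools import product
from math import comb

def hyperbolic_indices(d, eta, q):
    """Multi-indices nu in N^d with (sum_i nu_i^q)^(1/q) <= eta, as in (3.6).
    q = 1 recovers the full total-degree family."""
    idx = []
    for nu in product(range(eta + 1), repeat=d):
        # small tolerance guards against floating-point rounding in the power
        if sum(v ** q for v in nu) ** (1.0 / q) <= eta + 1e-12:
            idx.append(nu)
    return idx

full = hyperbolic_indices(3, 4, 1.0)    # total degree <= 4 in d' = 3 variates
sparse = hyperbolic_indices(3, 4, 0.5)  # hyperbolic truncation with q = 1/2
```

For d′ = 3 and η = 4, the full family (q = 1) has C(3+4, 4) = 35 indices, while q = 1/2 keeps only 16 of them, discarding most of the interaction terms, which illustrates how the pseudo-norm favors low-order interactions.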
4 Parallel approach

In this part, we present a parallel version of Algorithm 1, which is far from being embarrassingly parallel as a crude Monte-Carlo algorithm would be. We explain the difficulties encountered when parallelizing the algorithm and how we solved them.

4.1 Detailed presentation of the algorithm

Here are the notations we use in the algorithm.

• u^k = (u_k(t_i^k, x_i^k))_{1≤i≤n} ∈ R^n
• c^k = (c_k(t_i^k, x_i^k))_{1≤i≤n} ∈ R^n
• n: number of points of the grid
• K_it: number of iterations of the algorithm
• M: number of Monte-Carlo samples
• N: number of time steps used for the discretization of X
• p: number of functions B_l used in the extrapolating operator. This is not a parameter of the algorithm on its own, as it is determined by fixing η and q (the maximum total degree and the parameter of the hyperbolic multi-index set), but the parameter p is of great interest when studying the complexity of the algorithm.
• (B_l)_{1≤l≤p}: a family of multivariate polynomials used for extrapolating functions from a finite number of values.
• α^k ∈ R^p: the vector of the weights of the chaos decomposition of u_k.
• d′ = d + 1: the number of variates of the polynomials B_l.

Algorithm 1 Iterative algorithm
1: u_0 ≡ 0, α^0 ≡ 0.
2: for k = 0 : K_it − 1 do
3:   Pick at random n points (t_i^k, x_i^k)_{1≤i≤n}.
4:   for i = 1 : n do
5:     for m = 1 : M do
6:       Let W be a Brownian motion with values in R^d discretized on a time grid with N time steps.
7:       Let U ~ U_[0,1].
8:       Compute a_m^{i,k} = ψ^N( t_i^k, x_i^k, f_{u_k} + (∂_t + L^N) u_k, Φ − u_k, W, U ).
         /* We recall that u_k(t, x) = Σ_{l=1}^p α_l^k B_l(t, x) */
9:     end for
       c_i^k = (1/M) Σ_{m=1}^M a_m^{i,k}   (4.1)
       u_i^k = Σ_{l=1}^p α_l^k B_l(t_i^k, x_i^k)   (4.2)
10:  end for
11:  Compute α^{k+1} = argmin_{α ∈ R^p} Σ_{i=1}^n ( (u_i^k + c_i^k) − Σ_{l=1}^p α_l B_l(t_i^k, x_i^k) )².   (4.3)
12: end for
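A runnable miniature of this loop may help fix ideas (a Python sketch, not the paper's C/MPI code): take d = 1, driver f ≡ 0 and u_0 = 0, so that the first correction (4.1) reduces to a plain Monte-Carlo estimate of E[Φ(X_T^{t,x})] with X = W (b = 0, σ = 1, so the Euler scheme is exact), and (4.3) becomes an ordinary least squares fit.

```python
import math
import random

def solve(A, b):
    """Tiny dense solver (partial pivoting) for the normal equations of (4.3)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def first_iteration(phi, T, D, n, M_samples, basis, rng):
    """First pass (k = 0) with u_0 = 0 and f = 0, for dX_t = dW_t (d = 1):
    the inner average (4.1) estimates E[phi(X_T^{t,x})] and (4.3) fits alpha^1."""
    lo, hi = D
    pts, vals = [], []
    for _ in range(n):
        t = rng.random() * T                      # random point (t, x) in [0,T] x D
        x = lo + rng.random() * (hi - lo)
        # c^0_i: average of a^{i,0}_m = phi(x + (W_T - W_t)), exact in law here
        c = sum(phi(x + rng.gauss(0.0, math.sqrt(T - t)))
                for _ in range(M_samples)) / M_samples
        pts.append((t, x))
        vals.append(c)                            # u^0_i = 0, so u^0_i + c^0_i = c^0_i
    p = len(basis)
    A = [[sum(basis[l](t, x) * basis[j](t, x) for (t, x) in pts)
          for j in range(p)] for l in range(p)]
    rhs = [sum(v * basis[j](t, x) for (t, x), v in zip(pts, vals)) for j in range(p)]
    alpha = solve(A, rhs)
    return lambda t, x: sum(a * B(t, x) for a, B in zip(alpha, basis))

# With phi(x) = x^2, the exact solution u(t, x) = x^2 + (T - t) lies in the
# span of the toy basis {1, t, x^2}, so u_1 is already close to u.
rng = random.Random(7)
basis = [lambda t, x: 1.0, lambda t, x: t, lambda t, x: x * x]
u1 = first_iteration(lambda x: x * x, 1.0, (-1.0, 1.0), 40, 1000, basis, rng)
```

With a non-zero driver, subsequent iterations would add the corrections c_k in exactly the same way, reusing the previous fit α^k inside the integrand of (4.1).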
4.2 Complexity of the algorithm

In this section, we study in detail the different parts of Algorithm 1 to determine their complexities. Before diving into the algorithm, we would like to briefly look at the evaluations of the function u_k and its derivatives. We recall that

u_k(t, x) = Σ_{l=1}^p α_l^k B_l(t, x),

where the B_l(t, x) are of the form t^{β_{l,0}} Π_{i=1}^d x_i^{β_{l,i}} and the β_{l,i} are some integers. Then the computational time for the evaluation of u_k(t, x) is proportional to p × d′. The first and second derivatives of u_k write

∇_x u_k(t, x) = Σ_{l=1}^p α_l^k ∇_x B_l(t, x),  ∇²_x u_k(t, x) = Σ_{l=1}^p α_l^k ∇²_x B_l(t, x),

and the evaluation of ∇_x B_l(t, x) (resp. ∇²_x B_l(t, x)) has a computational cost proportional to d² (resp. d³).

• The computation (at line 6) of the discretization of the d-dimensional Brownian motion with N time steps requires O(Nd) computations.
• The computation of each a_m^{i,k} (line 8) requires the evaluation of the function u_k and its first and second derivatives, which has a cost O(pd³). Then, the computation of c_i^k for given i and k has a complexity of O(Mpd³).
• The computation of α (Equation (4.3)) requires O(p³ + np²d) operations, as explained in Section 3.2.2.

The overall complexity of Algorithm 1 is O(K_it n M (pd³ + dN) + K_it (p²nd + p³)).

To parallelize an algorithm, the first idea coming to mind is to find loops with independent iterations which could be spread out on different processors with a minimum of communications. In that respect, an embarrassingly parallel example is the well-known Monte-Carlo algorithm. Unfortunately, Algorithm 1 is far from being so simple. The iterations of the outer loop (line 2) are linked from one step to the following; consequently, there is no hope of parallelizing this loop.
On the contrary, the iterations over i (loop line 4) are independent, as are the ones over m (loop line 5), so we have at hand two candidates for parallelization. We could even think of a two-stage parallelism: first parallelizing the loop over i over a small set of processors and, inside this first level, parallelizing the loop over m. Actually, M is not large enough for the parallelization of the loop over m to be efficient (see Section 3.2). It turns out to be far more efficient to parallelize the loop over i, as each iteration of the loop requires a significant amount of work.

4.3 Description of the parallel part

As we have just explained, we have decided to parallelize the loop over i (line 4 in Algorithm 1). We have used a Robin Hood approach. In the following, we assume we have P + 1 processors at hand, with n > P. We use the following master/slave scheme:

1. Send to each of the P slave processors the solution α^k computed at the previous step of the algorithm.
2. Spread the first P points (t_i^k, x_i^k)_{1≤i≤P} to the P slave processors and assign each of them the computation of the corresponding c_i^k.
3. As soon as a processor finishes its computation, it sends its result back to the master, which in turn sends it a new point (t_i^k, x_i^k) at which to evaluate ψ^N to compute c_i^k, and this process goes on until all the (c_i^k)_{i=1,…,n} have been computed.

At the end of this process, the master knows c^k, which corresponds to the approximation of the correction at the random points (t_i^k, x_i^k)_{1≤i≤n}. From these values and the vector u^k, the master computes α^{k+1}, and the algorithm can go through a new iteration over k. What is the best way to send the data to each slave process?
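The greedy dispatch of steps 2-3 is easy to simulate (a Python simulation with synthetic per-point costs, not MPI code), and the simulation already hints at why dispatching point by point helps when the iterations have uneven costs:

```python
import heapq

def dynamic_dispatch(costs, P):
    """Simulate the master/slave scheme of Section 4.3: each new point goes
    to the first slave that becomes free (costs[i] = work of computing c_i).
    Returns the makespan and the list of points handled by each slave."""
    free_at = [(0.0, p) for p in range(P)]   # (time the slave becomes free, id)
    heapq.heapify(free_at)
    assigned = [[] for _ in range(P)]
    for i, c in enumerate(costs):
        t, p = heapq.heappop(free_at)        # first slave to finish so far
        assigned[p].append(i)
        heapq.heappush(free_at, (t + c, p))
    return max(t for t, _ in free_at), assigned

# Uneven iteration costs spread over 4 slaves.
makespan, assigned = dynamic_dispatch([5, 1, 1, 1, 1, 1, 1, 1], 4)
```

With these costs, the slave that receives the expensive point 0 gets nothing else and the makespan is 5, whereas a static split into blocks of two points per slave would finish at 6.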
• Before starting any computations, assign to each process a block of iterations over i and send the corresponding data all at once. This way, just one connection has to be initialised, which is faster. But the time spent by the master to take care of its slaves is longer, which implies that at the beginning the slave processes remain unemployed longer. When applying such a strategy, we implicitly assume that all the iterations have the same computational cost.
• Send data iteration by iteration. The latency at the beginning is smaller than in the block strategy, and this performs better when all iterations do not have the same computational cost.

Considering the wide range of data to be sent and the intensive use of elaborate structures, the most natural way to pass these objects was to rely on the packing mechanism of MPI. Moreover, the library we are using in the code (see Section 4.5) already has an MPI binding, which makes the manipulation of the different objects all the easier. The use of packing enabled us to reduce the number of communications between the master process and a slave process to just one communication at the beginning of each iteration of the loop over i (line 4 of Algorithm 1).

4.4 Random numbers in a parallel environment

One of the basic problems when solving a probabilistic problem in parallel computing is the generation of random numbers. Random number generators are usually devised for sequential use only, and special care should be taken in parallel environments to ensure that the sequences of random numbers generated on each processor are independent. We would like to have minimal communications between the different random number generators: ideally, after the initialisation process, each generator should live independently of the others. Several strategies exist for that.
1. Newbies in parallel computing might be tempted to take any random number generator and fix a different seed on each processor. Although this naive strategy often leads to satisfactory results on toy examples, it can induce an important bias and lead to detrimental results.

2. The first reasonable approach is to split a sequence of random numbers across several processors. To efficiently implement this strategy, the generator must have some splitting facilities, such that there is no need to draw all the samples prior to any computations. We refer to L'Ecuyer and Côté (1991); L'Ecuyer et al. (2002) for a presentation of a generator with splitting facilities. To efficiently split the sequence, one should know in advance the number of samples needed by each processor, or at least an upper bound of it. To circumvent this problem, the splitting can be done in substreams by jumping ahead P steps at each call to the random procedure, if P is the number of processors involved. This way, each processor uses a sub-sequence of the initial random number sequence rather than a contiguous part of it. However, as noted by Entacher et al. (1999), long range correlations in the original sequence can become short range correlations between different processors when using substreams. Actually, the best way to implement splitting is to use a generator with a huge period, such as the Mersenne Twister (its period is 2^19937 − 1), and divide the period by a million or so if we think we will not need more than a million independent substreams. Doing so, we come up with substreams which still have an impressive length; in the case of the Mersenne Twister, each substream is still about 2^19917 long.
3. A totally different approach is to find generators which can be easily parametrised and to compute sets of parameters ensuring the statistical independence of the related generators. Several generators offer such a facility, such as the ones included in the SPRNG package (see Mascagni (1997) for a detailed presentation of the generators implemented in this package) or the dynamically created Mersenne Twister (DCMT in short), see Matsumoto and Nishimura (2000).

For our experiments, we have decided to use the DCMT. This generator has a sufficiently long period (2^521 for the version we used), and we can create at most 2^16 = 65536 independent generators with this period, which is definitely enough for our needs. Moreover, the dynamic creation of the generators follows a deterministic process (if we use the same seeds), which makes it reproducible. The drawback of the DCMT is that its initialization process might be quite lengthy; actually, the CPU time needed to create a new generator is not bounded. We give the distribution in Figure 1.

Figure 1: Distribution of the CPU time needed for the creation of one Mersenne Twister generator

4.5 The library used for the implementation

Our code has been implemented in C using the PNL library (see Lelong (2007-2011)). This is a scientific library available under the Lesser General Public Licence, and it offers various facilities for implementing mathematics; more recently, some MPI bindings have been added to easily manipulate the different objects available in PNL, such as vectors and matrices. In our problem, we needed to manipulate matrices and vectors and pass them from the master process to the slave processes, and we decided to use the packing facilities offered by PNL through its MPI binding.
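The substream splitting of strategy 2 above can be illustrated with a toy leapfrog (a sketch only: each substream skips the draws owned by the others, whereas a production generator such as those of L'Ecuyer et al. (2002) or the DCMT jumps ahead without generating the skipped values):

```python
import random

class LeapfrogStream:
    """Substream p of P interleaved substreams of one base sequence: each
    call consumes P draws of the shared sequence and keeps the p-th one."""
    def __init__(self, seed, p, P):
        self.rng = random.Random(seed)
        self.P = P
        for _ in range(p):             # advance to this substream's first draw
            self.rng.random()
    def next(self):
        u = self.rng.random()
        for _ in range(self.P - 1):    # skip the draws owned by the others
            self.rng.random()
        return u

# The P substreams together reproduce the base sequence, each one consuming
# a disjoint arithmetic subsequence of indices (p, p + P, p + 2P, ...).
base = random.Random(123)
seq = [base.random() for _ in range(12)]
streams = [LeapfrogStream(123, p, 4) for p in range(4)]
interleaved = [streams[i % 4].next() for i in range(12)]
```

Used this way, processor p never touches the draws of processor p′, and no communication is needed after initialisation, which is exactly the property sought in Section 4.4.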
The technical part was not only message passing but also random number generation, as already mentioned above, and PNL offers many functions to generate random vectors or matrices using several random number generators, among which the DCMT. Besides message passing, the algorithm also requires many other facilities, such as the multivariate polynomial chaos decomposition, which is part of the library. For the moment, three families of polynomials (canonical, Hermite and Tchebichev polynomials) are implemented, along with a very efficient mechanism to compute their first and second derivatives. The implementation tries to make the most of code factorization to avoid recomputing common quantities several times. The polynomial chaos decomposition toolbox is quite flexible and offers a reduction facility, as described in Section 3.2.2, which is completely transparent from the user's side. To face the curse of dimensionality, we used sparse polynomial families based on a hyperbolic set of indices.

5 Pricing and Hedging American options in local volatility models

In this section, we present a method to price and hedge American options by using Algorithm 1, which solves standard BSDEs.

5.1 Framework

Let (Ω, F, P) be a probability space and (W_t)_{t≥0} a standard Brownian motion with values in R^d. We denote by (F_t)_{t≥0} the P-completion of the natural filtration of (W_t)_{t≥0}. Consider a financial market with d risky assets, with prices S^1_t, …, S^d_t at time t, and let X_t be the d-dimensional vector of log-returns X^i_t = log S^i_t, for i = 1, …, d.
We assume that (X_t) satisfies the following stochastic differential equation on a finite interval [0, T], where T is the maturity of the option:

dX^i_t = ( r(t) − δ^i(t) − (1/2) Σ_{j=1}^d σ_{ij}^2(t, e^{X_t}) ) dt + Σ_{j=1}^d σ_{ij}(t, e^{X_t}) dW^j_t,   i = 1, …, d.   (5.1)

We denote by X^{t,x}_s a continuous version of the flow of the stochastic differential equation (5.1); in particular, X^{t,x}_t = x almost surely. In the following, we assume

Hypothesis 2
1. r : [0, T] → R is a C^1_b function and δ : [0, T] → R^d is a C^1_b function.
2. σ : [0, T] × R^d → R^{d×d} is a C^{1,2}_b function.
3. σ satisfies the following coercivity property: there exists ε > 0 such that for all (t, x) ∈ [0, T] × R^d and all ξ ∈ R^d,
Σ_{1≤i,j≤d} [σσ*]_{ij}(t, x) ξ_i ξ_j ≥ ε Σ_{i=1}^d ξ_i^2.

We are interested in computing the price of an American option with payoff Φ(X_t), where Φ : R^d → R_+ is a continuous function (in the case of a put option, Φ(x) = (K − (1/d)(e^{x_1} + ⋯ + e^{x_d}))_+). From Jaillet et al. (1990), we know that if

Hypothesis 3 Φ is continuous and satisfies |Φ(x)| ≤ M e^{M|x|} for some M > 0,

the price at time t of the American option with payoff Φ is given by

V(t, X_t) = ess sup_{τ ∈ T_{t,T}} E[ e^{−∫_t^τ r(s) ds} Φ(X_τ) | F_t ],

where T_{t,T} is the set of all stopping times with values in [t, T] and V : [0, T] × R^d → R_+ is defined by

V(t, x) = sup_{τ ∈ T_{t,T}} E[ e^{−∫_t^τ r(s) ds} Φ(X^{t,x}_τ) ].

We also introduce the amount ∆(t, X_t) involved in the asset at time t to hedge the American option. The function ∆ is given by ∆(t, x) = ∇_x V(t, x). In order to link American option prices to BSDEs, we need three steps:

1. writing the price of an American option as a solution of a variational inequality (see Section 5.2);
2. linking the solution of a variational inequality to the solution of a reflected BSDE (see Section 5.3);
3.
approximating the solution of a RBSDE by a sequence of standard BSDEs (see Section 5.4).

We refer to Jaillet et al. (1990) for the first step and to El Karoui et al. (1997a) for the second and third steps.

5.2 American option and Variational Inequality

First, we recall the variational inequality satisfied by V. We refer to Jaillet et al. (1990, Theorem 3.1) for more details. Under Hypotheses 2 and 3, V solves the following parabolic PDE:

max( Φ(x) − u(t, x), ∂_t u(t, x) + A u(t, x) − r(t) u(t, x) ) = 0,   u(T, x) = Φ(x),   (5.2)

where

A u(t, x) = Σ_{i=1}^d ( r(t) − δ^i(t) − (1/2) Σ_{j=1}^d σ_{ij}^2(t, e^x) ) ∂_{x_i} u(t, x) + (1/2) Σ_{1≤i,j≤d} [σσ*]_{ij}(t, e^x) ∂^2_{x_i x_j} u(t, x)

is the generator of X.

5.3 Variational Inequality and Reflected BSDEs

This section is based on El Karoui et al. (1997a, Theorem 8.5). We assume

Hypothesis 4 Φ is continuous and has at most polynomial growth (i.e., there exist C, p > 0 such that |Φ(x)| ≤ C(1 + |x|^p)).

Let us consider the following reflected BSDE:

−dY_t = −r(t) Y_t dt − Z_t dW_t + dH_t,   Y_T = Φ(X_T),   Y_t ≥ Φ(X_t) for all t,   ∫_0^T (Y_t − Φ(X_t)) dH_t = 0.   (5.3)

Then, under Hypotheses 2 and 4, u(t, x) = Y^{t,x}_t is a viscosity solution of the obstacle problem (5.2), where Y^{t,x}_t is the value at time t of the solution of (5.3) on [t, T] and the superscripts t, x mean that X starts from x at time t. We also have (∇_x u(t, x))* σ(t, x) = Z^{t,x}_t (* means transpose). Then, we get that the price V and the delta ∆ of the option are given by

V(t, X_t) = Y_t,   ∆(t, X_t) = (Z_t σ(t, X_t)^{−1})*.

5.4 Approximation of a RBSDE by a sequence of standard BSDEs

We present a way of approximating a RBSDE by a sequence of standard BSDEs. The idea was introduced by El Karoui et al.
(1997a, Section 6) to prove the existence of a solution to a RBSDE by turning the constraint Y_t ≥ Φ(X_t) into a penalisation. Let us consider the following sequence of BSDEs indexed by i:

Y^i_t = Φ(X_T) − ∫_t^T r(s) Y^i_s ds + i ∫_t^T (Y^i_s − Φ(X_s))^− ds − ∫_t^T Z^i_s dW_s,   (5.4)

whose solutions are denoted (Y^i, Z^i). We define H^i_t = i ∫_t^T (Y^i_s − Φ(X_s))^− ds. Under Hypotheses 2 and 4, the sequence (Y^i, Z^i, H^i)_i converges to the solution (Y, Z, H) of the RBSDE (5.3) when i goes to infinity. Moreover, Y^i converges increasingly to Y. The term H^i is often called a penalisation. From a practical point of view, there is no use solving such a sequence of BSDEs, because we can directly apply our algorithm to Equation (5.4) for a given i. Therefore, we actually consider the following penalised BSDE:

Y_t = Φ(X_T) − ∫_t^T r(s) Y_s ds + ω ∫_t^T (Y_s − Φ(X_s))^− ds − ∫_t^T Z_s dW_s,   (5.5)

where ω ≥ 0 is the penalisation weight. In practice, the magnitude of ω must remain reasonably small, as it appears as a contraction constant when studying the speed of convergence of our algorithm. Hence, the larger ω is, the slower our algorithm converges. So, a trade-off has to be found between the convergence of our algorithm to the solution Y of (5.5) and the accuracy of the approximation of the American option price by Y.

5.5 European options

Let us consider a European option with payoff Φ(X_T), where X follows (5.1). We denote by V the option price and by ∆ the hedging strategy associated to the option. From El Karoui et al. (1997b), we know that the couple (V, ∆) satisfies

−dV_t = −r(t) V_t dt − ∆*_t σ(t, X_t) dW_t,   V_T = Φ(X_T).

Then, (V, ∆) is the solution of a standard BSDE. This corresponds to the particular case ω = 0 of (5.5), Y corresponding to V and Z to ∆*_t σ(t, X_t).
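To illustrate the role of the penalisation weight ω, one can discretise (5.5) backward in time on a one-dimensional binomial tree with constant volatility. This is a sketch of the penalisation mechanism only, not the algorithm of the paper: at each step, the driver −rY + ω(Y − Φ(X))^− is applied to the conditional expectation of the next-step value, so ω = 0 recovers the European price and increasing ω pushes the price up towards the American price.

```python
import numpy as np

def penalised_put(S0, K, r, sigma, T, n, omega):
    """Explicit backward scheme for the penalised BSDE (5.5) on a
    binomial tree (d = 1).  The term omega*(phi - E)^+ = omega*(Y - phi)^-
    penalises excursions of Y below the obstacle phi = (K - S)^+."""
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)        # risk-neutral up probability
    # asset values at maturity, from lowest to highest node
    S = S0 * d ** np.arange(n, -1.0, -1) * u ** np.arange(0, n + 1.0)
    Y = np.maximum(K - S, 0.0)                # terminal condition Phi(X_T)
    for _ in range(n):
        S = S[1:] * d                         # asset nodes one step earlier
        E = p * Y[1:] + (1.0 - p) * Y[:-1]    # E[Y_{k+1} | F_k]
        obstacle = np.maximum(K - S, 0.0)
        Y = E + dt * (-r * E + omega * np.maximum(obstacle - E, 0.0))
    return Y[0]
```

On this sketch the computed value is nondecreasing in ω, interpolating between the European price (ω = 0) and an approximation of the American price for larger ω, in line with the trade-off between convergence speed and accuracy discussed above.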
6 Numerical results and performance

6.1 The cluster

All our performance tests have been carried out on a 256-PC cluster from SUPELEC Metz. Each node is a dual core processor: INTEL Xeon-3075 at 2.66 GHz with a front side bus at 1333 MHz. The two cores of each node share 4 GB of RAM, and all the nodes are interconnected using a Gigabit Ethernet network. In none of the experiments did we make the most of the dual core architecture, since our code is single-threaded. Hence, in our implementation, a dual core processor is actually seen as two single core processors. The accuracy tests have been carried out using the facilities offered by the University of Savoie computing center MUST.

6.2 Black-Scholes' framework

We consider a d-dimensional Black-Scholes model in which the dynamics of each asset S^i under the risk neutral measure is supposed to be given by

dS^i_t = S^i_t ((r − δ^i) dt + σ^i dW^i_t),   S_0 = (S^1_0, …, S^d_0),   (6.1)

where W = (W^1, …, W^d) and each component W^i is a standard Brownian motion. For the numerical experiments, the covariance structure of W will be assumed to be given by ⟨W^i, W^j⟩_t = ρ t 1_{i≠j} + t 1_{i=j}. We suppose that ρ ∈ (−1/(d−1), 1), which ensures that the matrix C = (ρ 1_{i≠j} + 1_{i=j})_{1≤i,j≤d} is positive definite. Let L denote the lower triangular matrix involved in the Cholesky decomposition C = LL*. To simulate W on the time grid 0 < t_1 < t_2 < … < t_N, we need d × N independent standard normal variables and set

(W_{t_1}, W_{t_2}, …, W_{t_N})* = M G,

where G is a standard normal random vector in R^{d×N} and M is the block lower triangular matrix whose d × d block (k, l) is given by M_{kl} = √(t_l − t_{l−1}) L for l ≤ k (with t_0 = 0) and 0 otherwise. The vector σ = (σ^1, …
, σ^d) is the vector of volatilities, δ = (δ^1, …, δ^d) is the vector of instantaneous dividend rates, and r > 0 is the instantaneous interest rate. We denote the maturity time by T. Since we know how to simulate the law of (S_t, S_T) exactly for t < T, there is no use discretizing Equation (6.1) with the Euler scheme. In this section, N = 2.

6.2.1 European options

We want to study the numerical accuracy of our algorithm, and to do that we first consider the case of European basket options, for which we can compute a benchmark price using very efficient Monte-Carlo methods (see Jourdain and Lelong (2009) for instance), whereas this is no longer possible for American options. In this paragraph, the parameter ω appearing in (5.5) is 0.

European put basket option. Consider the following put basket option with maturity T:

( K − (1/d) Σ_{i=1}^d S^i_T )_+.   (6.2)

Figure 2 presents the influence of the parameters M, n and q. The results obtained for n = 1000, M = 50,000 and q = 1 (curve (+)) are very close to the true price, and moreover we can see that the algorithm stabilizes after very few iterations (less than 10).

Figure 2: Convergence of the algorithm for a European put basket option with d = 5, ρ = 0.1, T = 3, S_0 = 100, σ = 0.2, δ = 0, r = 0.05, K = 100, η = 3, ω = 0. Curves: (+) n = 1000, M = 50,000, q = 1; (×) n = 1000, M = 5000, q = 1; (∗) n = 1000, M = 50,000, q = 0.4; (♦) n = 2000, M = 5000, q = 1. The benchmark price computed with a high precision Monte-Carlo method yields 2.0353 with a confidence interval of (2.0323, 2.0383).
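The construction of the correlated Brownian motion described at the beginning of Section 6.2 can be sketched as follows (illustrative NumPy code, not the paper's C/PNL implementation): the constant-correlation matrix C is factorised by Cholesky, and each Brownian increment is an independent standard normal vector scaled by √(t_l − t_{l−1}) L.

```python
import numpy as np

def correlation_cholesky(d, rho):
    """Cholesky factor L of C = (1 - rho) I + rho * ones(d, d);
    C is positive definite for rho in (-1/(d-1), 1)."""
    C = (1.0 - rho) * np.eye(d) + rho * np.ones((d, d))
    return np.linalg.cholesky(C)

def brownian_path(L, times, rng):
    """Correlated Brownian motion on the grid `times`, with
    <W^i, W^j>_t = C_ij * t, built from independent N(0, 1) draws."""
    d = len(L)
    dt = np.diff(np.concatenate(([0.0], np.asarray(times, dtype=float))))
    G = rng.standard_normal((len(dt), d))          # independent normals
    increments = np.sqrt(dt)[:, None] * (G @ L.T)  # sqrt(dt_l) * L G_l
    return np.cumsum(increments, axis=0)           # W_{t_1}, ..., W_{t_N}
```

Cumulating the increments reproduces the block lower triangular structure of the matrix M above, without ever forming M explicitly.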
• Influence of M: curves (+) and (×) show that taking M = 5000 is not enough to get the stabilization of the algorithm.

• Joint influence of n and M: curves (×) and (♦) show the importance of well balancing the number of discretization points n with the number of Monte-Carlo simulations M. The sharp behaviour of the curve (♦) may look surprising at first, since we are tempted to think that a larger number of points n will increase the accuracy. However, increasing the number of points while keeping the number of Monte-Carlo simulations constant creates an overfitting phenomenon, because the Monte-Carlo errors arising at each point are too large and independent, and this leads the approximation astray.

• Influence of q: we can see on curves (+) and (∗) that decreasing the hyperbolic index q can lead to a highly biased although smooth convergence. This highlights the impact of the choice of q on the solution computed by the algorithm.

To conclude, we notice that the larger the number of Monte-Carlo simulations is, the smoother the convergence is, but when the polynomial family considered is too sparse it can lead to a biased convergence.

European call basket option. Consider the following call basket option with maturity T:

( (1/d) Σ_{i=1}^d S^i_T − K )_+.   (6.3)

Figure 3: Convergence of the price of a European call basket option with d = 10, ρ = 0.2, T = 1, S_0 = 100, σ = 0.2, δ = 0, r = 0.05, K = 100, n = 1000 and ω = 0. Curves: (+) M = 30,000, η = 5, q = 0.6; (×) M = 30,000, η = 4, q = 0.6; (∗) M = 5000, η = 5, q = 0.6. The benchmark price computed with a high precision Monte-Carlo method yields 7.
0207 with a confidence interval of (7.0125, 7.0288).

Figure 3 illustrates the impact of the sparsity of the polynomial basis considered on the convergence of the algorithm. The smoothest convergence is achieved by the curve (+), i.e. when M = 30,000, η = 5 and q = 0.6. The algorithm stabilizes very close to the true price and after very few iterations.

• Influence of η: for a fixed value of q, the sparsity increases when η decreases, so the basis with η = 4, q = 0.6 is sparser than the one with η = 5, q = 0.6. We compare curves (+) (η = 5) and (×) (η = 4) for fixed values of q (= 0.6) and M (= 30,000). We can see that for η = 4 (curve (×)) the algorithm stabilizes after 7 iterations, whereas for η = 5 (curve (+)) fewer iterations are needed to converge.

• Influence of M: for fixed values of η (= 5) and q (= 0.6), we compare curves (+) (M = 30,000) and (∗) (M = 5000). Using a large number of simulations is not enough to get a good convergence, as shown by curve (∗). Actually, when the polynomial basis becomes too sparse, the approximation of the solution computed at each step of the algorithm incorporates a significant amount of noise, which has a similar effect to reducing the number of Monte-Carlo simulations. This is precisely what we observe on Figure 3: the curves (×) and (∗) have a very similar behaviour, although one of them uses a much larger number of simulations.

6.2.2 American options

In this paragraph, the penalisation parameter ω appearing in (5.5) is 1.

Pricing American put basket options.
We have tested our algorithm on the pricing of a multidimensional American put option with payoff given by Equation (6.2).

Figure 4: Convergence of the price of an American put basket option with d = 5, ρ = 0.2, T = 1, S_0 = 100, σ = 0.25, δ = 0.1, K = 100, r = 0.05 and ω = 1. Curves: (+) M = 5000, n = 1000, η = 3, q = 1; (×) M = 5000, n = 1000, η = 5, q = 0.8; (∗) M = 5000, n = 1000, η = 5, q = 0.6; (♦) M = 30,000, n = 2000, η = 3, q = 1; (◊) M = 30,000, n = 2000, η = 5, q = 0.8; (△) M = 30,000, n = 2000, η = 5, q = 0.6.

Figure 4 presents the influence of the parameters M and q.

• Influence of M: when zooming on Figure 4, one can indeed see that the curves using 30,000 Monte-Carlo simulations are a little smoother than the others, but these extra simulations do not improve the convergence as much as in Figures 2 and 3 (compare curves (+) and (♦) of Figure 4). The main explanation of this fact is that put options have in general less variance than call options, and in Figure 2 a maturity of T = 3 was used, which leads to a larger variance than with T = 1.

• Influence of q: once again, we can observe that increasing the sparsity of the polynomial basis (i.e., decreasing q) can lead to a biased convergence. When q = 0.6, we get a biased result (see curves (∗) and (△)), even for M large (curve (△), M = 30,000). Hence, for American put options, it is advisable to use an almost full polynomial basis with fewer Monte-Carlo simulations in order to master the computational cost, rather than doing the contrary.

Hedging American put basket options.
Now, let us present the convergence of the approximation of the delta at time 0. Table 1 presents the values of the delta of an American put basket option as the iterations increase. We see that the convergence is very fast (we only need 3 iterations to get a stabilized value). The parameters of the algorithm are the following ones: n = 1000, M = 5000, q = 1, η = 3 and ω = 1.

Iteration | ∆_1 | ∆_2 | ∆_3 | ∆_4 | ∆_5
1 | -0.203931 | -0.205921 | -0.203091 | -0.205264 | -0.201944
2 | -0.105780 | -0.102066 | -0.103164 | -0.102849 | -0.108371
3 | -0.109047 | -0.105929 | -0.105604 | -0.105520 | -0.111327
4 | -0.108905 | -0.105687 | -0.105841 | -0.105774 | -0.111137
5 | -0.108961 | -0.105648 | -0.105725 | -0.105647 | -0.111274

Table 1: Convergence of the delta for an American put basket option with d = 5, ρ = 0.2, T = 1, S_0 = 100, σ = 0.25, δ = 0.1, K = 100, r = 0.05.

6.3 Dupire's framework

We consider a d-dimensional local volatility model in which the dynamics of each asset under the risk-neutral measure is supposed to be given by

dS^i_t = S^i_t ((r − δ^i) dt + σ(t, S^i_t) dW^i_t),   S_0 = (S^1_0, …, S^d_0),

where W = (W^1, …, W^d) is defined and generated as in the Black-Scholes framework. The local volatility function σ we have chosen is of the form

σ(t, x) = 0.6 (1.2 − e^{−0.1 t} e^{−0.001 (x e^{r t} − s)^2}) e^{−0.05 √t},   (6.4)

with s > 0. Since there exists a duality between the variables (t, x) and (T, K) in Dupire's framework, one should choose s equal to the spot price of the underlying asset. Then, the bottom of the smile is located at the forward money. The parameters of the algorithm in this paragraph are the following: n = 1000, M = 30,000, N = 10, q = 1, η = 3.

Pricing and Hedging European put basket options. We consider the put basket option with payoff given by (6.2).
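As a sanity check (illustrative code, not the authors' implementation), the local volatility surface (6.4) can be evaluated directly: at t = 0 and x = s it equals 0.6 × (1.2 − 1) = 0.12, and for fixed t its minimum in x sits at the forward money x = s e^{−rt}, as noted above.

```python
import numpy as np

def local_vol(t, x, r, s):
    """Local volatility function (6.4); s is the spot price of the
    underlying, so the bottom of the smile sits at the forward money."""
    bump = np.exp(-0.1 * t) * np.exp(-0.001 * (x * np.exp(r * t) - s) ** 2)
    return 0.6 * (1.2 - bump) * np.exp(-0.05 * np.sqrt(t))
```

The Gaussian bump is largest where x e^{rt} = s, which is exactly where the volatility, and hence the smile, is lowest.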
The benchmark price and delta are computed using the algorithm proposed by Jourdain and Lelong (2009), which is based on Monte-Carlo methods. Concerning the delta, we get at the last iteration the following vector: ∆ = (−0.062403, −0.061271, −0.062437, −0.069120, −0.064743). The benchmark delta is −0.0625.

Pricing and Hedging American put basket options. We still consider the put payoff given by (6.2). In the case of American options in local volatility models, there is no benchmark. However, Figure 6 shows that the algorithm converges after few iterations. We get a price around 6.30. At the last iteration, we get ∆ = (−0.102159, −0.102893, −0.103237, −0.110546, −0.106442).

6.4 Speed up

Remark 3. Because in high dimensions the sequential algorithm can run several hours before giving a price, we could not afford to run the sequential algorithm on the cluster to have a benchmark value for the reference CPU time used in the speed up measurements. Instead, we have computed speed ups as the ratio

speed up = (CPU time for 8 processors / CPU time for n processors) × (8 / n).   (6.5)

This explains why we may get in the tables below some speed ups slightly greater than 1.

Figure 5: Convergence of the algorithm for a European put basket option with d = 5, ρ = 0, T = 1, S_0 = 100, δ = 0, K = 100, r = 0.05, η = 3, ω = 0, showing the approximated price (+) against the reference price. The benchmark price computed with a high precision Monte-Carlo method yields 1.745899 with a confidence interval of (1.737899, 1.753899).

Figure 6: Convergence of the algorithm for an American put basket option with d = 5, ρ = 0, T = 1, S_0 = 100, δ = 0.1, K = 100, r = 0.05 and ω = 1.
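The normalisation of Equation (6.5) can be checked directly against the tables below; here is a small illustrative helper, where T_p denotes the measured CPU time with p processors:

```python
def speedup(t_ref, t_n, n, n_ref=8):
    """Speed up of Equation (6.5), normalised so that the n_ref-processor
    run has speed up 1: (T_ref / T_n) * (n_ref / n)."""
    return (t_ref / t_n) * (n_ref / n)

# First rows of Table 2 (d = 3): 543.677 s on 8 processors,
# 262.047 s on 16 processors, giving a slightly super-linear speed up.
s16 = speedup(543.677, 262.047, 16)
```

Values above 1 indicate scaling slightly better than linear relative to the 8-processor reference run, as noted in Remark 3.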
Our goal in this paper was to design a scalable algorithm for high dimensional problems, so it is not surprising that the algorithm does not behave so well in relatively small dimensions, as highlighted in Table 2. In dimension 3, the speed ups are linear up to 28 processors, but then they dramatically decrease towards zero: this can be explained by the small CPU load of the computation of the correction term at a given point (t_i, x_i). The cost of each iteration of the loop at line 4 of Algorithm 1 is proportional to M p d^3, and when d is small so is p, the number of polynomials of total degree less than or equal to η. For instance, for d = 3 and η = 3, we have p = 20, which gives a small complexity for each iteration over i. Hence, when the number of processors used increases, the amount of work to be done by each processor between two synchronisation points decreases to such a point that most of the CPU time is used for transmitting data or waiting. This explains why the speed ups decrease so much. Actually, we were expecting such results, as the parallel implementation of the algorithm has been designed for high dimensional problems, in which the amount of work to be done by each processor cannot decrease so much unless several dozens of thousands of processors are used. This phenomenon can be observed in Table 3, which shows very impressive speed ups: in dimension 6, even with 256 processors the speed ups are still linear, which highlights the scalability of our implementation. Even though computational times may look a little higher than with other algorithms, one should keep in mind that our algorithm not only computes prices but also hedges; therefore, the efficiency of the algorithm remains quite impressive.

Nb proc. | Time | Speed up
8 | 543.677 | 1
16 | 262.047 | 1.03737
18 | 233.082 | 1.03669
20 | 210.114 | 1.03501
24 | 177.235 | 1.02252
28 | 158.311 | 0.981206
32 | 140.858 | 0.964936
64 | 97.0629 | 0.70016
128 | 103.513 | 0.328267
256 | 162.936 | 0.104274

Table 2: Speed ups for the American put option with d = 3, r = 0.02, T = 1, σ = 0.2, ρ = 0, S_0 = 100, K = 95, M = 1000, N = 10, K_it = 10, n = 2000, η = 3, q = 1, ω = 1. See Equation (6.5) for the definition of the "Speed up" column.

7 Conclusion

In this work, we have presented a parallel algorithm for solving BSDEs and applied it to the pricing and hedging of American options, which remains a computationally demanding problem for which very few scalable implementations have been studied. Our parallel algorithm shows an impressive scalability in high dimensions. To improve the efficiency of the algorithm, we could try to refactor the extrapolation step to make it more accurate and less sensitive to the curse of dimensionality.

Nb proc. | Time | Speed up
8 | 1196.79 | 1
16 | 562.888 | 1.06308
24 | 367.007 | 1.08698
32 | 272.403 | 1.09836
40 | 217.451 | 1.10075
48 | 181.263 | 1.10042
56 | 154.785 | 1.10457
64 | 135.979 | 1.10016
72 | 121.602 | 1.09354
80 | 109.217 | 1.09579
88 | 99.6925 | 1.09135
96 | 91.9594 | 1.08453
102 | 85.6052 | 1.0965
110 | 80.2032 | 1.08523
116 | 75.9477 | 1.08676
128 | 68.6815 | 1.08908
256 | 35.9239 | 1.04108

Table 3: Speed ups for the American put option with d = 6, r = 0.02, T = 1, σ = 0.2, ρ = 0, S_0 = 100, K = 95, M = 5000, N = 10, K_it = 10, n = 2000, η = 3, q = 1, ω = 1. See Equation (6.5) for the definition of the "Speed up" column.

References

E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999. ISBN 0-89871-447-8 (paperback).

V. Bally and G. Pagès. Error analysis of the optimal quantization algorithm for obstacle problems. Stochastic Processes and their Applications, 106(1):1-40, 2003.

G. Blatman.
Adaptive sparse polynomial chaos expansions for uncertainty propagation and sensitivity analysis. PhD thesis, Université Blaise Pascal - Clermont II, 2009.

G. Blatman and B. Sudret. Anisotropic parcimonious polynomial chaos expansions based on the sparsity-of-effects principle. In Proc. ICOSSAR'09, International Conference in Structural Safety and Reliability, 2009.

B. Bouchard and N. Touzi. Discrete time approximation and Monte Carlo simulation of backward stochastic differential equations. Stochastic Processes and their Applications, 111:175-206, 2004.

F. Delarue and S. Menozzi. A forward-backward stochastic algorithm for quasi-linear PDEs. Annals of Applied Probability, 16(1):140-184, 2006.

V. Dung Doan, A. Gaikwad, M. Bossy, F. Baude, and I. Stokes-Rees. Parallel pricing algorithms for multidimensional Bermudan/American options using Monte Carlo methods. Mathematics and Computers in Simulation, 81(3):568-577, 2010.

N. El Karoui, C. Kapoudjian, E. Pardoux, S. Peng, and M. Quenez. Reflected solutions of backward SDE's, and related obstacle problems for PDE's. The Annals of Probability, 25(2):702-737, 1997a.

N. El Karoui, S. Peng, and M. Quenez. Backward Stochastic Differential Equations in Finance. Mathematical Finance, 7(1):1-71, 1997b.

K. Entacher, A. Uhl, and S. Wegenkittl. Parallel random number generation: Long-range correlations among multiple processors, 1999.

E. Gobet and C. Labart. Solving BSDE with adaptive control variate. SIAM Journal of Num. Anal., 48(1), 2010.

E. Gobet, J. Lemor, and X. Warin. A regression-based Monte Carlo method to solve backward stochastic differential equations. Annals of Applied Probability, 15(3):2172-2202, 2005.

K. Huang and R. Thulasiram. Parallel Algorithm for pricing American Asian Options with Multi-Dimensional Assets.
Proceedings of the 19th International Symposium on High Performance Computing Systems and Applications, pages 177-185, 2005.

A. Ibáñez and F. Zapatero. Valuation by simulation of American options through computation of the optimal exercise frontier. Journal of Financial Quantitative Analysis, 93:253-275, 2004.

P. Jaillet, D. Lamberton, and B. Lapeyre. Variational inequalities and the pricing of American options. Acta Appl. Math., 21:263-289, 1990.

B. Jourdain and J. Lelong. Robust adaptive importance sampling for normal random vectors. Annals of Applied Probability, 19(5):1687-1718, 2009.

P. L'Ecuyer and S. Côté. Implementing a random number package with splitting facilities. ACM Trans. Math. Softw., 17(1):98-111, 1991. ISSN 0098-3500. doi: http://doi.acm.org/10.1145/103147.103158.

P. L'Ecuyer, R. Simard, E. J. Chen, and W. D. Kelton. An object-oriented random-number package with many long streams and substreams. Oper. Res., 50(6):1073-1075, 2002. ISSN 0030-364X. doi: http://dx.doi.org/10.1287/opre.50.6.1073.358.

J. Lelong. Pnl. http://www-ljk.imag.fr/membres/Jerome.Lelong/soft/pnl/index.html, 2007-2011.

F. Longstaff and R. Schwartz. Valuing American options by simulation: A simple least-squares approach. Review of Financial Studies, 14:113-147, 2001.

J. Ma, P. Protter, and J. Yong. Solving forward backward stochastic differential equations explicitly - a four step scheme. Probability Theory Related Fields, 98(1):339-359, 1994.

M. Mascagni. Some methods of parallel pseudorandom number generation. In Proceedings of the IMA Workshop on Algorithms for Parallel Processing, pages 277-288. Springer Verlag, 1997. Available at http://www.cs.fsu.edu/~mascagni/papers/RCEV1997.pdf.

M. Matsumoto and T. Nishimura. Monte Carlo and Quasi-Monte Carlo Methods 1998, chapter Dynamic Creation of Pseudorandom Number Generators.
Springer, 2000. Available at http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf.

G. Pagès and B. Wilbertz. GPGPUs in computational finance: Massive parallel computing for American style options. ArXiv e-prints, Jan. 2011. URL http://arxiv.org/abs/1101.3228v1.

E. Pardoux and S. Peng. Backward Stochastic Differential Equations and Quasilinear Parabolic Partial Differential Equations. Lecture Notes in CIS, 176(1):200-217, 1992.

R. K. Thulasiram and D. A. Bondarenko. Performance evaluation of parallel algorithms for pricing multidimensional. In ICPP Workshops, pages 306-313, 2002.

I. Toke and J. Girard. Monte Carlo Valuation of Multidimensional American Options Through Grid Computing. In Lecture Notes in Computer Science, volume 3743, pages 462-469. Springer-Verlag, 2006.

Y. Wang and R. Caflisch. Pricing and hedging American-style options: a simple simulation-based approach. The Journal of Computational Finance, 13(3), 2010.
