Variational Cumulant Expansions for Intractable Distributions

Journal of Artificial Intelligence Research 10 (1999) 435-455. Submitted 8/98; published 6/99.

David Barber (davidb@mbfys.kun.nl)
Pierre van de Laar (pierre@mbfys.kun.nl)
RWCP (Real World Computing Partnership) Theoretical Foundation SNN (Foundation for Neural Networks)
University of Nijmegen, CPK1 231, Geert Grooteplein 21, Nijmegen, The Netherlands

Abstract

Intractable distributions present a common difficulty in inference within the probabilistic knowledge representation framework, and variational methods have recently been popular in providing an approximate solution. In this article, we describe a perturbational approach in the form of a cumulant expansion which, to lowest order, recovers the standard Kullback-Leibler variational bound. Higher-order terms describe corrections on the variational approach without incurring much further computational cost. The relationship to other perturbational approaches such as TAP is also elucidated. We demonstrate the method on a particular class of undirected graphical models, Boltzmann machines, for which our simulation results confirm improved accuracy and enhanced stability during learning.

1. Introduction

In recent years, interest has steadily grown over many associated fields in representing and processing information in a probabilistic framework.
Arguably, the reason for this is that prior knowledge of a problem domain is rarely absolute, and the probabilistic framework therefore arises naturally as a candidate for dealing with this uncertainty. Better known examples of models are Belief Networks (Pearl, 1988) and probabilistic neural networks (Bishop, 1995; MacKay, 1995). However, incorporating many prior beliefs can make models of domains so ambitious that dealing with them in a probabilistic framework is intractable, and some form of approximation is inevitable. One well-known approximate technique is Monte Carlo sampling (see e.g., Neal, 1993; Gilks, Richardson, & Spiegelhalter, 1996), which has the benefit of universal applicability. However, not only can sampling be prohibitively slow, but also the lack of a suitable convergence criterion can lead to low confidence in the results.

Recently, variational methods have provided a popular alternative, since they not only approximate but can also bound quantities of interest giving, potentially, greater confidence in the results (Jordan, Ghahramani, Jaakkola, & Saul, 1998; Barber & Wiegerinck, 1999; Barber & Bishop, 1998). The accuracy of variational methods, however, is limited by the flexibility of the variational distribution employed. Whilst, in principle, the flexibility, and therefore the accuracy of the model, can be increased at will (see e.g., Jaakkola & Jordan, 1998; Lawrence, Bishop, & Jordan, 1998), this generally is at the expense of incurring even greater computational difficulties and optimization problems.

© 1999 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

In this article, we describe an alternative approach which is a perturbation around standard variational methods. This has the property of potentially improving the performance at only a small extra computational cost. Since most quantities of interest are derivable from knowledge of the normalizing constant of a distribution, we present an approximation of this quantity which, to lowest order, recovers the standard Kullback-Leibler variational bound. Higher-order corrections in this expansion are expected to improve on the accuracy of the variational solution, despite the loss of a strict bound.

In section (2) we will briefly discuss why the normalizing constant of probability distributions is of such importance. We briefly review the Kullback-Leibler variational bound in section (3) before introducing the perturbational approach in section (4). This approach is illustrated on a well-known class of undirected graphical models, Boltzmann machines, in section (5), in which we also explain the relation of this approach to other perturbational methods such as TAP (Thouless, Anderson, & Palmer, 1977; Kappen & Rodríguez, 1998a).
We conclude in section (6) with a discussion of the advantages and drawbacks of this approach compared to better known techniques.

2. Normalizing Constants and Generating Functions

We consider a family of probability distributions $Q$ over a vector of random variables $s = \{ s_i \mid i \in [1, \ldots, N] \}$, parameterized by $w$,¹

$$Q(s; w) = \frac{\exp\left(H(s; w)\right)}{Z(w)}. \tag{1}$$

With a slight abuse of notation, we write the normalizing constant as

$$Z(w) = \int ds\, \exp\left(H(s; w)\right). \tag{2}$$

The 'potential' $H(s; w)$ thus uniquely determines the distribution $Q$. We have assumed here that the random variables $s$ are continuous; if not, the corresponding integral (see equation 2) should be replaced by a summation over all possible discrete states.

One approach in statistics to obtain marginals and other quantities of interest is given by considering generating functions (Grimmett & Stirzaker, 1992), which take the form

$$G(\theta; w) = \int ds\, \exp\left(\log Q(s; w) + \theta \cdot s\right). \tag{3}$$

Averages of variables with respect to $Q$ are given by derivatives of $G(\theta; w)$ with respect to $\theta$ for fixed $w$, so that, for example, $\langle s_1 s_2 \rangle = \partial^2 G(\theta; w) / \partial\theta_1 \partial\theta_2$, evaluated at $\theta = 0$. We can equally well consider the function

$$Z(\theta; w) = \int ds\, \exp\left(H(s; w) + \theta \cdot s\right) \tag{4}$$

so that

$$\langle s_1 s_2 \rangle = \left. \frac{1}{Z(\theta_1, \theta_2; w)} \frac{\partial^2 Z(\theta_1, \theta_2; w)}{\partial\theta_1\, \partial\theta_2} \right|_{\theta_1, \theta_2 = 0}. \tag{5}$$

¹ We assume that the distribution is strictly positive over the domain.
Formally, all quantities of interest can be calculated from "normalizing constants" of distributions, possibly modified in a suitable manner, such as in (4) above. It is clear that all moments of $Q$ can be generated by differentiating (4) in a similar manner, and can be obtained equally easily from the generating function approach as from the normalizing constant approach. We prefer here the normalizing constant approach since this enables us more easily to make contact with results from statistical physics. The normalizing constant approach is also more natural for the examples of undirected networks that we shall consider in later sections. In the following sections, we lay out how to approximate the normalizing constant, assuming that any additional terms in the normalizing constant, of the form $\theta \cdot s$ in equation (4), are absorbed into the definition of the potential $H$. As we will see in section (5.2.3), the normalizing constant also plays a central role in learning.

Unfortunately, for many probability distributions that arise in applications, calculation of the normalizing constant is intractable due to the high dimensionality of the random variables. When $s$ is an $N$-dimensional binary vector, naively at least, a summation over $2^N$ states has to be performed to calculate the normalizing constant.
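As a small illustration of equations (4) and (5) and of the $2^N$ cost, the following sketch enumerates all states of a binary model and recovers $\langle s_1 s_2 \rangle$ from a finite-difference second derivative of $Z(\theta; w)$. The pairwise form of the potential, the numerical weights, and all function names are assumptions made for the example and are not taken from the text.

```python
import itertools
import math

def potential(s, w_lin, w_pair):
    """H(s; w): linear terms plus pairwise terms, each pair counted once (assumed form)."""
    h = sum(wi * si for wi, si in zip(w_lin, s))
    for (i, j), wij in w_pair.items():
        h += wij * s[i] * s[j]
    return h

def Z(theta, w_lin, w_pair):
    """Z(theta; w) of equation (4), by explicit summation over all 2^N binary states."""
    n = len(w_lin)
    total = 0.0
    for s in itertools.product((0, 1), repeat=n):
        total += math.exp(potential(s, w_lin, w_pair)
                          + sum(t * si for t, si in zip(theta, s)))
    return total

def exact_moment(i, j, w_lin, w_pair):
    """<s_i s_j> computed directly as a weighted average over all states."""
    n = len(w_lin)
    num = 0.0
    for s in itertools.product((0, 1), repeat=n):
        num += s[i] * s[j] * math.exp(potential(s, w_lin, w_pair))
    return num / Z([0.0] * n, w_lin, w_pair)

def moment_from_Z(i, j, w_lin, w_pair, eps=1e-3):
    """<s_i s_j> for i != j from equation (5), via a central finite difference."""
    n = len(w_lin)

    def z_at(di, dj):
        theta = [0.0] * n
        theta[i] += di
        theta[j] += dj
        return Z(theta, w_lin, w_pair)

    d2 = (z_at(eps, eps) - z_at(eps, -eps)
          - z_at(-eps, eps) + z_at(-eps, -eps)) / (4 * eps ** 2)
    return d2 / Z([0.0] * n, w_lin, w_pair)

# Hypothetical four-variable example.
w_lin = [0.3, -0.5, 0.2, 0.1]
w_pair = {(0, 1): 0.8, (1, 2): -0.4, (0, 3): 0.5, (2, 3): 0.6}
direct = exact_moment(0, 1, w_lin, w_pair)
via_z = moment_from_Z(0, 1, w_lin, w_pair)
```

Both routes visit every one of the $2^N$ states, which is the cost that the approximations in the following sections aim to avoid.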
However, sometimes it is possible to exploit the structure of the distribution in order to reduce the computational expense. A trivial example is that of factorised models, $Q(s) \propto \prod_i \phi(s_i)$, in which $Z = \int ds \prod_i \phi(s_i) = \prod_i \int ds_i\, \phi(s_i)$, so that the computation scales only linearly with $N$. Similarly, in the case of a Gaussian distribution, computation scales (roughly) with $N^3$. These tractable distributions have recently been exploited in the variational method to approximate intractable distributions, see for example (Saul, Jaakkola, & Jordan, 1996; Jaakkola, 1997; Barber & Wiegerinck, 1999; Jordan et al., 1998). In the following section we will briefly review one of the most common variational methods, which exploits the Kullback-Leibler bound.

3. The Kullback-Leibler Variational Bound

Our aim in this section is to briefly describe the current state-of-the-art in approximating the normalizing constant of an intractable probability distribution. We denote the intractable distribution of interest as $Q_1$ to distinguish it from an auxiliary distribution $Q_0$ that we will introduce. Without loss of generality, we write the distribution over a vector of random variables $s$, parameterized by a vector $w$, as

$$Q_1(s; w) = \frac{e^{H_1(s; w)}}{Z_1(w)}, \tag{6}$$

where the potential $H_1(s; w)$ is a known function of $s$ and $w$; e.g., in the case of a factorised model this would have the form $H_1 = \sum_k w_k s_k$.
The corresponding (assumed intractable) normalizing constant is given by

$$Z_1(w) = \int ds\, e^{H_1(s; w)}. \tag{7}$$

We will use a family of tractable distributions, $Q_0$, parameterized by $\theta$, which we write as

$$Q_0(s; \theta) = \frac{e^{H_0(s; \theta)}}{Z_0(\theta)} \quad \text{with} \quad Z_0(\theta) = \int ds\, e^{H_0(s; \theta)} \tag{8}$$

where, by assumption, the normalizing constant $Z_0$ is tractably computable. Standard variational methods attempt to find the best approximating distribution $Q_0(s; \theta^*)$ that matches $Q_1(s; w)$ by minimizing the Kullback-Leibler divergence,

$$\mathrm{KL}(Q_0, Q_1) = \int ds\, Q_0(s; \theta) \log \frac{Q_0(s; \theta)}{Q_1(s; w)}. \tag{9}$$

Since, using Jensen's inequality, $\mathrm{KL}(Q_0, Q_1) \geq 0$, we immediately obtain the lower bound

$$\log Z_1 \geq -\int ds\, Q_0(s; \theta) \log Q_0(s; \theta) + \int ds\, Q_0(s; \theta)\, H_1(s; w). \tag{10}$$

Using the definition (8) we obtain the bound in the more intuitive form,

$$\log Z_1 \geq \log Z_0 + \int ds\, Q_0(s; \theta) \left( H_1(s; w) - H_0(s; \theta) \right). \tag{11}$$

The lower bound is made as tight as possible by maximizing the right-hand side with respect to the parameters $\theta$ of $Q_0$, which corresponds to minimizing the Kullback-Leibler divergence (9). This approach has recently received attention in the artificial intelligence community and has been demonstrated to be a useful technique (Saul et al., 1996; Jaakkola, 1997; Barber & Wiegerinck, 1999).
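The bound (11) can be checked numerically on a model small enough to enumerate. The sketch below assumes a three-variable binary model with pairwise couplings for $Q_1$ and a factorised $Q_0$ with $H_0 = \sum_i \theta_i s_i$; the weights, the trial values of $\theta$, and the function names are illustrative assumptions. The expectation is computed by brute force only to keep the example transparent.

```python
import itertools
import math

def h1(s, w_lin, w_pair):
    """Potential of the 'intractable' model: biases plus pairwise couplings."""
    return (sum(wi * si for wi, si in zip(w_lin, s))
            + sum(wij * s[i] * s[j] for (i, j), wij in w_pair.items()))

def h0(s, theta):
    """Potential of the factorised (tractable) model."""
    return sum(t * si for t, si in zip(theta, s))

def log_z1_exact(w_lin, w_pair):
    """Exact log Z_1 by summing over all binary states."""
    n = len(w_lin)
    return math.log(sum(math.exp(h1(s, w_lin, w_pair))
                        for s in itertools.product((0, 1), repeat=n)))

def kl_lower_bound(theta, w_lin, w_pair):
    """Right-hand side of equation (11): log Z_0 + <H_1 - H_0>_0."""
    n = len(theta)
    # For a factorised binary model, Z_0 is a product of single-variable sums.
    log_z0 = sum(math.log(1.0 + math.exp(t)) for t in theta)
    avg = 0.0
    for s in itertools.product((0, 1), repeat=n):
        q0 = math.exp(h0(s, theta) - log_z0)
        avg += q0 * (h1(s, w_lin, w_pair) - h0(s, theta))
    return log_z0 + avg

# Hypothetical parameters.
w_lin = [0.3, -0.5, 0.2]
w_pair = {(0, 1): 0.8, (1, 2): -0.4, (0, 2): 0.5}
exact = log_z1_exact(w_lin, w_pair)
bounds = [kl_lower_bound(theta, w_lin, w_pair)
          for theta in ([0.0, 0.0, 0.0], [0.5, -0.3, 0.4], [1.0, 0.2, 0.6])]
```

Every entry of `bounds` lies at or below `exact`, whatever $\theta$ is chosen; the value of $\theta$ only determines how tight the bound is.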
Note that whilst this lower bound is useful in providing an objective comparison of two approximations to the normalizing constant (the approximation with the higher bound value is preferred), it does not translate directly into a bound on conditional probabilities. An upper bound on the normalizing constant is also required in this case. Whilst, in some cases, this may be feasible, in general this tends to be a rather more difficult task, and we restrict our attention to the lower bound (Jaakkola, 1997). In the next section, we describe another approach that enables us to exploit such tractable distributions further.

4. The Variational Cumulant Expansion

In order to extend the Kullback-Leibler variational lower bound with a view to improving its accuracy without much greater computational expense, we introduce the following family of probability distributions $Q_\lambda$, parameterized by $\lambda$, $w$ and $\theta$:

$$Q_\lambda(s; w, \theta) = \frac{e^{H_\lambda(s; w, \theta)}}{Z_\lambda(w, \theta)} \tag{12}$$

where the potential is given by

$$H_\lambda(s; w, \theta) = \lambda H_1(s; w) + (1 - \lambda) H_0(s; \theta). \tag{13}$$

The probability distributions in this family interpolate between a tractable distribution, $Q_0$, which is obtained for $\lambda = 0$, and an intractable distribution, $Q_1$, which is obtained for $\lambda = 1$. For intermediate values of $\lambda \in (0, 1)$, the distributions remain intractable.
The normalizing constant of a distribution from this family is given by

$$\log Z_\lambda(w, \theta) = \log \int ds\, e^{H_0(s; \theta) + \lambda \left( H_1(s; w) - H_0(s; \theta) \right)} = \log Z_0 + \log \left\langle e^{\lambda \left( H_1(s; w) - H_0(s; \theta) \right)} \right\rangle_0, \tag{14}$$

where $\langle \cdot \rangle_0$ denotes expectation with respect to the tractable distribution $Q_0$. The final term in (14) is intractable for any $\lambda \neq 0$, and we shall therefore develop a Taylor series expansion for it around $\lambda = 0$. That is,

$$\log \left\langle e^{\lambda (H_1 - H_0)} \right\rangle_0 = \lambda \left\langle H_1 - H_0 \right\rangle_0 + \frac{\lambda^2}{2} \left\langle \left( H_1 - H_0 - \langle H_1 \rangle_0 + \langle H_0 \rangle_0 \right)^2 \right\rangle_0 + O(\lambda^3), \tag{15}$$

where $H_1 = H_1(s; w)$ and $H_0 = H_0(s; \theta)$. In terms of the random variable $\Delta H \equiv H_1(s; w) - H_0(s; \theta)$, we see that (15) contains the first terms in a power series in $\Delta H$, the first being the mean, and the second the variance of $\Delta H$. More formally, we recognize the final term in (14) as the cumulant generating function $K_{\Delta H}(\lambda) \equiv \log \left\langle e^{\lambda \Delta H} \right\rangle_0$ with respect to the random variable $\Delta H$ (Grimmett & Stirzaker, 1992). If the logarithm of the moment generating function $\left\langle e^{\lambda \Delta H} \right\rangle_0$ is finite in the neighborhood of the origin, then the cumulant generating function $K_{\Delta H}(\lambda)$ has a convergent Taylor series,

$$K_{\Delta H}(\lambda) = \sum_{n=1}^{\infty} \frac{1}{n!}\, k_n^0(\Delta H)\, \lambda^n, \tag{16}$$

where $k_n^0(\Delta H)$ is the $n$th cumulant of $\Delta H$ with respect to the distribution $Q_0$. (The definition of the $n$th cumulant is $k_n^0(\Delta H) = \left. \frac{\partial^n}{\partial \lambda^n} \log \left\langle e^{\lambda \Delta H} \right\rangle_0 \right|_{\lambda = 0}$.) For convenience we have no longer denoted the explicit dependence on the parameters $\theta$ and $w$.
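To make the expansion (16) concrete, the following sketch computes the first three cumulants of $\Delta H$ under a factorised $Q_0$ by enumeration and compares the truncated series with the exact cumulant generating function at a small value of $\lambda$. The model, its parameters, and the function names are assumptions chosen for illustration; the identities used (first cumulant = mean, second = variance, third = third central moment) are standard.

```python
import itertools
import math

def delta_h_distribution(theta, w_lin, w_pair):
    """Return (probability under Q_0, Delta H) for every binary state (brute force)."""
    n = len(theta)
    log_z0 = sum(math.log(1.0 + math.exp(t)) for t in theta)
    out = []
    for s in itertools.product((0, 1), repeat=n):
        h0 = sum(t * si for t, si in zip(theta, s))
        h1 = (sum(wi * si for wi, si in zip(w_lin, s))
              + sum(wij * s[i] * s[j] for (i, j), wij in w_pair.items()))
        out.append((math.exp(h0 - log_z0), h1 - h0))
    return out

def first_three_cumulants(dist):
    """k1 = mean, k2 = variance, k3 = third central moment of Delta H under Q_0."""
    mean = sum(p * x for p, x in dist)
    k2 = sum(p * (x - mean) ** 2 for p, x in dist)
    k3 = sum(p * (x - mean) ** 3 for p, x in dist)
    return mean, k2, k3

def cgf(dist, lam):
    """Cumulant generating function K(lambda) = log <exp(lambda * Delta H)>_0."""
    return math.log(sum(p * math.exp(lam * x) for p, x in dist))

# Hypothetical parameters.
theta = [0.2, -0.1, 0.3]
w_lin = [0.3, -0.5, 0.2]
w_pair = {(0, 1): 0.8, (1, 2): -0.4, (0, 2): 0.5}
dist = delta_h_distribution(theta, w_lin, w_pair)
k1, k2, k3 = first_three_cumulants(dist)

lam = 0.1
exact = cgf(dist, lam)
order1 = lam * k1
order2 = order1 + lam ** 2 / 2 * k2
order3 = order2 + lam ** 3 / 6 * k3
errors = [abs(exact - approx) for approx in (order1, order2, order3)]
```

For this small $\lambda$ the truncation error shrinks with each added term. At $\lambda = 1$, which is the case of interest for $\log Z_1$, no such ordering is guaranteed in general.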
Since the intractable normalizing constant $Z_1$ corresponds to setting $\lambda = 1$ in equation (14), we can write the Taylor series as

$$\log Z_1 = \log Z_0 + \sum_{n=1}^{l} \frac{1}{n!}\, k_n^0(\Delta H) + \frac{1}{(l+1)!}\, k_{l+1}^{\lambda}(\Delta H), \tag{17}$$

where the remainder follows from the Mean Value Theorem (see, for example, Spivak (1967)), and $k_{l+1}^{\lambda}(\Delta H)$ is the $(l+1)$th cumulant of $\Delta H$ with respect to $Q_\lambda$ for some $\lambda$ with $0 \leq \lambda \leq 1$. An approximation is obtained by simply neglecting this (intractable) remainder and using the truncated expansion:

$$\log Z_1 \approx \log Z_0 + \sum_{n=1}^{l} \frac{1}{n!}\, k_n^0(\Delta H). \tag{18}$$

Note that the right-hand side of equation (18) depends on the free parameters $\theta$, and we shall discuss later how these parameters can be set to improve the accuracy of the approximation. In the following two subsections, we will look in more detail at the first and second-order approximations of this Taylor series, since these illuminate the variational method and the difficulties in retaining a lower bound.

4.1 First Order and the Variational Bound

The Taylor series (equation (17)) up to first order yields

$$\log Z_1 = \log Z_0 + \langle \Delta H \rangle_0 + \frac{1}{2}\, k_2^{\lambda}(\Delta H), \tag{19}$$

where the remainder $k_2^{\lambda}(\Delta H)$ is equal to the variance (the second cumulant) of $\Delta H$ with respect to $Q_\lambda$ with $0 \leq \lambda \leq 1$, i.e., $k_2^{\lambda} = \mathrm{var}_\lambda(\Delta H) = \left\langle \Delta H^2 \right\rangle_\lambda - \langle \Delta H \rangle_\lambda^2$, and $\langle \cdot \rangle_\lambda$ denotes expectation with respect to the distribution $Q_\lambda$.
Since the variance is nonnegative for any value of $\lambda$, we have

$$\log Z_1 \geq \log Z_0 + \langle \Delta H \rangle_0, \tag{20}$$

so that $\log Z_1$ is not only approximated but also bounded from below by the right-hand side of equation (20). This bound is the standard mean field bound (see also equation (10)), and is employed in many variational methods (see, for example, Saul et al., 1996; Jaakkola, 1997; Barber & Wiegerinck, 1999). This bound can then be made as tight as possible by optimizing with respect to the free parameters $\theta$ of the tractable distribution $Q_0$.

4.2 Second and Higher Order

To second order, equation (17) yields

$$\log Z_1 = \log Z_0 + \langle \Delta H \rangle_0 + \frac{1}{2}\, \mathrm{var}_0(\Delta H) + \frac{1}{6}\, k_3^{\lambda}(\Delta H), \tag{21}$$

with the remainder $k_3^{\lambda} = \left\langle \Delta H^3 \right\rangle_\lambda - 3 \left\langle \Delta H^2 \right\rangle_\lambda \langle \Delta H \rangle_\lambda + 2 \langle \Delta H \rangle_\lambda^3$ and $0 \leq \lambda \leq 1$. Since this remainder cannot be bounded in a tractable manner for general distributions $Q_0$ and $Q_1$, we have no obvious criterion to prefer one set of parameters $\theta$ of the tractable distribution over another. Similarly, for higher-order expansions it is unclear, within this expansion, how to obtain a tractable bound and, consequently, how to select a set of parameters $\theta$. We stress that any criterion is essentially a heuristic since the error made by the resulting approximation is, by assumption, not tractably computable. Whilst, in principle, a whole range of criteria could be considered in this framework, we mention here only two. The first, the bound criterion, is arguably the simplest and most natural choice.
The second criterion we briefly mention is the independence method, which we discuss mainly for its relation to the TAP method of statistical physics (Thouless et al., 1977).

4.2.1 Bound Criterion

The first criterion assigns the free parameters $\theta$ to maximize the (first-order) bound, equation (20). The resulting distribution $Q_0(s; \theta^*)$ is used in the truncated approximation, equation (18). To second order this is explicitly given by

$$\log Z_1 \approx \log Z_0^* + \langle \Delta H \rangle_0^* + \frac{1}{2}\, \mathrm{var}_0^*(\Delta H), \tag{22}$$

where a '$*$' denotes that the distribution $Q_0(s; \theta^*)$ is to be used. Note that the computational cost involved over that of the variational method of section (4.1) is small. No further optimization is required, only the calculation of the second cumulant $\mathrm{var}_0^*(\Delta H)$, which by assumption is tractable.

4.2.2 Independence Criterion

The second criterion we consider is based on the fact that the exact value of $Z_1$ is independent of the free parameters $\theta$, i.e.,

$$\frac{\partial \log Z_1}{\partial \theta_i} \equiv 0. \tag{23}$$

Since $\log Z_1$ is intractable, we substitute the truncated cumulant expansion into equation (23),

$$\frac{\partial}{\partial \theta_i} \left( \log Z_0 + \sum_{n=1}^{l} \frac{1}{n!}\, k_n^0(\Delta H) \right) = 0. \tag{24}$$

In general, there are many solutions resulting from the independence criterion, and alone it is too weak to provide reliable solutions.
Under certain restrictions, however, this approach is closely related to the TAP approach of statistical mechanics (Thouless et al., 1977; Plefka, 1982; Kappen & Rodríguez, 1998a, 1998b), as will also be described in section (5.1.2).

5. Boltzmann Machines

To illustrate the foregoing theory, we will consider a class of discrete undirected graphical models, Boltzmann machines (Ackley, Hinton, & Sejnowski, 1985). These have application in artificial intelligence as stochastic connectionist models (Jordan et al., 1998), in image restoration (Geman & Geman, 1984), and in statistical physics (Itzykson & Drouffe, 1989). The potential of a Boltzmann machine, with binary random variables $s_i \in \{0, 1\}$, is given by

$$H(s; w) = \sum_i w_i s_i + \frac{1}{2} \sum_{i,j} w_{ij} s_i s_j \tag{25}$$

with $w_{ij} \equiv w_{ji}$ and $w_{ii} \equiv 0$. Unfortunately, Boltzmann machines are in general intractable since calculation of the normalizing constant involves a summation over an exponential number of states. In an early effort to overcome this intractability, Peterson and Anderson (1987) proposed the variational approximation using a factorised model as a fast alternative to stochastic sampling. However, Galland (1993) pointed out that this approach often fails since it inadequately captures second-order statistics of the intractable distribution. Using the potentially more accurate TAP approximation did not overcome these difficulties and often led to unstable solutions (Galland, 1993).
Furthermore, it is not clear how to extend the TAP approach to deal with more complex, non-factorised approximating distributions. We examine the relationship of our approach to the TAP method in section (5.1.2). Another approach which aims to improve the accuracy of the correlations, though not the normalizing constant, is linear response (Parisi, 1988), which was applied to Boltzmann machines by Kappen and Rodríguez (1998a) for the case of using a factorised model, see section (5.1). The approach we take here is a little different in that we wish to find a better approximation primarily to the normalizing constant. Approaches such as linear response can then be applied to this approximation to derive approximations for the correlations, if desired.

More flexible approximating distributions have also been considered in the variational approach, for example, mixtures of factorised distributions (Jaakkola & Jordan, 1998; Lawrence et al., 1998), and decimatable structures (Saul & Jordan, 1994; Barber & Wiegerinck, 1999). We show how to combine the use of such more flexible approximating distributions with higher-order corrections to the variational procedure. For clarity and simplicity, we will initially approximate the normalizing constant of a general intractable Boltzmann machine $Q_1$, see equation (25), using only a factorised Boltzmann machine as our tractable model $Q_0$.
Subsequently, in our simulations, we will use a more flexible tractable model, a decimatable Boltzmann machine.

5.1 Using Factorised Boltzmann Machines

The potential of a factorised Boltzmann machine is given by

$$H_0 = \sum_i \theta_i s_i. \tag{26}$$

For this simple model, calculating higher-order cumulants is straightforward once the values of the free parameters $\theta$ are known since

$$\langle s_{i_1} s_{i_2} \cdots s_{i_l} \rangle_0 = \langle s_{i_1} \rangle_0 \langle s_{i_2} \rangle_0 \cdots \langle s_{i_l} \rangle_0 \quad \text{with} \quad i_1 \neq i_2 \neq \ldots \neq i_l. \tag{27}$$

For notational convenience, we define $m_i$ to be the expectation value of $s_i$,

$$m_i = \langle s_i \rangle_0 = \left( 1 + \exp(-\theta_i) \right)^{-1}. \tag{28}$$

5.1.1 Bound Criterion

Using a factorised distribution to approximate the intractable normalizing constant corresponding to equation (25) gives, from equation (20), the variational bound

$$\log Z_1 \geq S(m) + \sum_i w_i m_i + \frac{1}{2} \sum_{i,j} w_{ij} m_i m_j, \tag{29}$$

where, for convenience, we have defined the entropy

$$S(m) = -\sum_i \left\{ m_i \log m_i + (1 - m_i) \log (1 - m_i) \right\}. \tag{30}$$

One can either maximize the bound, i.e., the right-hand side of equation (29), directly, or use the fact that at the maximum,

$$\frac{\partial}{\partial \theta_k} \left[ S(m) + \sum_i w_i m_i + \frac{1}{2} \sum_{i,j} w_{ij} m_i m_j \right] = 0, \tag{31}$$

which leads to the fixed point equation (with the understanding that the means $m$ are related to the parameters $\theta$ by equation (28))

$$\theta_i = w_i + \sum_j w_{ij} m_j. \tag{32}$$

The well-known "mean field" equations (Parisi, 1988) are a re-expression of (32) in terms of the means only, given by (28).
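The fixed point equation (32), combined with (28), can be iterated one variable at a time. The sketch below does this for a hypothetical four-variable machine and records the value of the bound (29) after each sweep. The weights, the number of sweeps, the starting point $m_i = 0.5$, and the function names are assumptions made for the example. Each single-variable update maximizes (29) with respect to that $m_i$ while the others are held fixed, so the recorded bound should not decrease.

```python
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def entropy(m):
    """S(m) of equation (30)."""
    return -sum(mi * math.log(mi) + (1 - mi) * math.log(1 - mi) for mi in m)

def bound(m, w_lin, W):
    """Right-hand side of equation (29); W is symmetric with zero diagonal."""
    n = len(m)
    pair = 0.5 * sum(W[i][j] * m[i] * m[j] for i in range(n) for j in range(n))
    return entropy(m) + sum(w_lin[i] * m[i] for i in range(n)) + pair

def mean_field(w_lin, W, sweeps=200):
    """Asynchronous iteration of theta_i = w_i + sum_j W_ij m_j, m_i = sigmoid(theta_i)."""
    n = len(w_lin)
    m = [0.5] * n
    history = [bound(m, w_lin, W)]
    for _ in range(sweeps):
        for i in range(n):
            theta_i = w_lin[i] + sum(W[i][j] * m[j] for j in range(n))
            m[i] = sigmoid(theta_i)
        history.append(bound(m, w_lin, W))
    return m, history

def log_z_exact(w_lin, W):
    """Exact log Z_1 of equation (25) by enumeration, for comparison only."""
    n = len(w_lin)
    total = 0.0
    for s in itertools.product((0, 1), repeat=n):
        h = sum(w_lin[i] * s[i] for i in range(n))
        h += 0.5 * sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
        total += math.exp(h)
    return math.log(total)

# Hypothetical parameters.
w_lin = [0.3, -0.5, 0.2, 0.1]
W = [[0.0, 0.8, 0.5, -0.3],
     [0.8, 0.0, -0.4, 0.2],
     [0.5, -0.4, 0.0, 0.6],
     [-0.3, 0.2, 0.6, 0.0]]
m, history = mean_field(w_lin, W)
exact = log_z_exact(w_lin, W)
```

The asynchronous schedule is chosen here because of its monotonicity; a synchronous update of all $m_i$ at once carries no such guarantee.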
The optimal parameters of the factorised model might therefore also be obtained by iterating, either synchronously or asynchronously (Peterson & Anderson, 1987), these fixed point equations. Note that equation (31) also holds for minima and saddle-points, so in general additional checks are needed to assert that the parameters correspond to a maximum of (29). Insertion of the fixed point solution into the second-order expansion for the bound criterion, equation (22), yields (up to second order)

$$\log Z_1 \approx S(m) + \sum_i w_i m_i + \frac{1}{2} \sum_{i,j} w_{ij} m_i m_j + \frac{1}{4} \sum_{i,j} w_{ij}^2\, m_i (1 - m_i)\, m_j (1 - m_j), \tag{33}$$

where we emphasize that the $m_i$ are given by the variational bound solution, i.e., equation (32).

5.1.2 Independence Criterion

Using a factorised distribution to approximate the intractable normalizing constant, see equation (25), gives up to second order

$$\log Z_1 \approx S(m) + \sum_i w_i m_i + \frac{1}{2} \sum_{i,j} w_{ij} m_i m_j + \frac{1}{4} \sum_{i,j} w_{ij}^2\, m_i (1 - m_i)\, m_j (1 - m_j) + \frac{1}{2} \sum_i \Big( \theta_i - w_i - \sum_j w_{ij} m_j \Big)^2 m_i (1 - m_i). \tag{34}$$

The independence criterion leads to the equations

$$\left( \sum_j w_{ij}^2\, m_j (1 - m_j) + \Big( \theta_i - w_i - \sum_j w_{ij} m_j \Big)^2 \right) \left( \frac{1}{2} - m_i \right) = \sum_j \Big( \theta_j - w_j - \sum_k w_{jk} m_k \Big) w_{ij}\, m_j (1 - m_j). \tag{35}$$

In general, there are many solutions to these equations, and the search can be limited by inserting in equation (34) the constraint that the second-order solution is close to the first-order solution, i.e., $\theta_i = w_i + \sum_j w_{ij} m_j + O(w^2)$, and by neglecting terms of $O(w^4)$ and higher. This simplifies equation (34) to $-F_{TAP}$, the negative TAP free energy² (Thouless et al., 1977; Plefka, 1982; Kappen & Rodríguez, 1998a). The independence criterion, i.e., $\partial F_{TAP} / \partial \theta = 0$, leads now to the fixed point condition

$$\theta_i = w_i + \sum_j w_{ij} m_j + \frac{1}{2} \sum_j w_{ij}^2\, (1 - 2 m_i)\, m_j (1 - m_j). \tag{36}$$

These TAP equations can have poor convergence properties and often produce a poor solution (Bray & Moore, 1979; Nemoto & Takayama, 1985; Galland, 1993). The additional constraint that the TAP solution should correspond to a minimum of $F_{TAP}$ improves the solution. However, since TAP is therefore essentially an expansion around the first-order solution, we expect the numerical difference between the bound criterion and convergent TAP results to be small. For these reasons, we prefer the straightforward bound criterion and leave other criteria for separate study.

² In Thouless et al. (1977), Plefka (1982) and Kappen and Rodríguez (1998a) the random variables $s_i \in \{-1, 1\}$.

[Figure 1: (a) With $s_1$; (b) Without $s_1$. The Boltzmann machine with the random variable $s_1$ can be decimated to a Boltzmann machine without $s_1$.]

5.2 Numerical Results

In this section we present results of computer simulations to validate our approach. In section (5.2.1) we will compare the different methods of section (4.1) and section (4.2) on approximating the normalizing constant $Z$. A similar comparison is made for approximating the correlations in section (5.2.2). In section (5.2.3) we show the results of a learning problem in which a Boltzmann machine is used to learn a set of sample patterns, a typical machine learning problem. In all cases, the numbers of random variables in the Boltzmann machines are chosen to be small to facilitate comparisons with the exact results.

In some simulations we will not only use factorised Boltzmann machines but also more flexible models that can make the variational bound (see equation (20)) tighter, namely decimatable Boltzmann machines (Saul & Jordan, 1994; Rüger, 1997; Barber & Wiegerinck, 1999). Decimation is a technique that eliminates random variables so that the normalizing constant for the distribution on the remaining variables remains unchanged up to a constant known factor. For completeness, we briefly describe this technique in the current context. Suppose we have a Boltzmann machine with many random variables, for which a three-variable subgraph is depicted in Figure 1(a), so that random variable $s_1$ is connected only to random variables $s_2$ and $s_3$.
Mathematically, the condition for invariance of $Z$ after removing random variable $s_1$, Figure 1(b), becomes

$$\sum_{s \setminus s_1} e^{\theta' + \theta'_2 s_2 + \theta'_3 s_3 + \theta'_{23} s_2 s_3} = \sum_{s} e^{\theta + \theta_1 s_1 + \theta_2 s_2 + \theta_3 s_3 + \theta_{12} s_1 s_2 + \theta_{13} s_1 s_3 + \theta_{23} s_2 s_3}, \tag{37}$$

where we include the constant $\theta$ for convenience. Invariance is fulfilled if

$$\theta'_2 = \theta_2 + \log \frac{1 + e^{\theta_1 + \theta_{12}}}{1 + e^{\theta_1}}, \qquad \theta'_3 = \theta_3 + \log \frac{1 + e^{\theta_1 + \theta_{13}}}{1 + e^{\theta_1}},$$

$$\theta'_{23} = \theta_{23} + \log \frac{\left(1 + e^{\theta_1}\right)\left(1 + e^{\theta_1 + \theta_{12} + \theta_{13}}\right)}{\left(1 + e^{\theta_1 + \theta_{12}}\right)\left(1 + e^{\theta_1 + \theta_{13}}\right)} \quad \text{and} \quad \theta' = \theta + \log\left(1 + e^{\theta_1}\right). \tag{38}$$

Figure 2: Boltzmann machines of eight random variables. (a) Factorised model. (b) Decimatable. (c) Fully connected.

For decimatable Boltzmann machines, one can repeat the decimation process until, finally, one ends up with a Boltzmann machine which consists of a single random variable whose normalizing constant is trivial to compute. This means that for decimatable Boltzmann machines the normalizing constant can be computed in time linear in the number of variables.

Decimation in undirected models is similar in spirit to variable elimination schemes in graphical models.
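As a numerical aside, relations (38) can be checked directly on the three-variable subgraph of Figure 1. The following Python sketch is illustrative only: the function names and the parameter values are invented for the example, and binary units $s_i \in \{0, 1\}$ are assumed. It sums out $s_1$ using (38) and compares both sides of the invariance condition (37) by brute-force enumeration.

```python
import itertools
import math


def decimate_s1(theta, t1, t2, t3, t12, t13, t23):
    """Apply relations (38): return (theta', theta2', theta3', theta23')."""
    a = math.log1p(math.exp(t1))               # log(1 + e^{t1})
    b = math.log1p(math.exp(t1 + t12))         # log(1 + e^{t1 + t12})
    c = math.log1p(math.exp(t1 + t13))         # log(1 + e^{t1 + t13})
    d = math.log1p(math.exp(t1 + t12 + t13))   # log(1 + e^{t1 + t12 + t13})
    return theta + a, t2 + b - a, t3 + c - a, t23 + a + d - b - c


def z_with_s1(theta, t1, t2, t3, t12, t13, t23):
    """Right-hand side of (37): sum over s1, s2, s3 in {0, 1}."""
    return sum(
        math.exp(theta + t1 * s1 + t2 * s2 + t3 * s3
                 + t12 * s1 * s2 + t13 * s1 * s3 + t23 * s2 * s3)
        for s1, s2, s3 in itertools.product((0, 1), repeat=3)
    )


def z_without_s1(theta_p, t2_p, t3_p, t23_p):
    """Left-hand side of (37): sum over s2, s3 only, with primed parameters."""
    return sum(
        math.exp(theta_p + t2_p * s2 + t3_p * s3 + t23_p * s2 * s3)
        for s2, s3 in itertools.product((0, 1), repeat=2)
    )


# Arbitrary example values (not taken from the paper).
params = dict(theta=0.0, t1=0.3, t2=-0.7, t3=1.1, t12=0.5, t13=-1.2, t23=0.8)
z_full = z_with_s1(**params)
z_reduced = z_without_s1(*decimate_s1(**params))
print(z_full, z_reduced)  # the two values should agree up to rounding
```

Because the identity holds for every configuration of $(s_2, s_3)$ separately, the same transformation applies unchanged when the subgraph is embedded in a larger machine, provided $s_1$ has no other neighbours.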
For example, in directed belief networks, "bucket elimination" enables variables to be eliminated by passing compensating messages to other buckets (collections of nodes) in such a way that the desired marginal on the remaining nodes remains the same (Dechter, 1999).

5.2.1 Approximating the Normalizing Constant

As our 'intractable' distribution $Q_1$, we took a fully connected, eight-node Boltzmann machine, which is not decimatable (see Figure 2(c)). As our tractable Boltzmann machine, $Q_0$, we took both the factorised model of Figure 2(a) and the decimatable model of Figure 2(b). We then tried to approximate the normalizing constant for the fully connected machine in which the parameters were randomly drawn from a zero-mean Gaussian distribution with unit variance. See the appendix for additional theoretical and experimental details regarding the optimization scheme used. The relative error of the approximation,

$$E = \frac{\log Z_{exact} - \log Z_{approx}}{\log Z_{exact}}, \tag{39}$$

was then determined for both the first- and second-order approach using both the factorised and the decimatable model. This was repeated 550 times for different random drawings of the parameters of the fully connected Boltzmann machine. The resulting histograms are depicted in Figure 3. In these histograms one can clearly see that the bound, while present in the first-order approach, is lost in the second-order approach, since some normalising constant estimates were above the true value, violating the lower bound.
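The first-order, factorised part of this experiment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the energy convention $H(s) = \sum_i h_i s_i + \sum_{i<j} J_{ij} s_i s_j$ with $s_i \in \{0, 1\}$, the asynchronous update order, the number of sweeps and the random seed are all choices made for the example. The sketch computes the standard mean field lower bound on $\log Z$ (entropy of the factorised model plus expected energy), the exact $\log Z$ by enumeration, and the relative error (39).

```python
import itertools
import math
import random


def energy(s, h, J):
    """Assumed convention: H(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j."""
    n = len(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return sum(h[i] * s[i] for i in range(n)) + pair


def exact_log_z(h, J):
    """Brute-force log Z over all 2^n binary states (feasible for small n)."""
    states = itertools.product((0, 1), repeat=len(h))
    return math.log(sum(math.exp(energy(s, h, J)) for s in states))


def binary_entropy(p):
    """Entropy of a Bernoulli(p) variable, with 0 log 0 taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def mean_field_bound(h, J, sweeps=200):
    """Factorised (first-order) lower bound on log Z via asynchronous updates."""
    n = len(h)
    m = [0.5] * n
    for _ in range(sweeps):
        for i in range(n):
            field = h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i)
            m[i] = 1.0 / (1.0 + math.exp(-field))
    # H is multilinear in s, so its mean under the factorised model is H(m).
    return sum(binary_entropy(p) for p in m) + energy(m, h, J)


random.seed(0)
n = 8
h = [random.gauss(0.0, 1.0) for _ in range(n)]
J = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        J[i][j] = J[j][i] = random.gauss(0.0, 1.0)

log_z_exact = exact_log_z(h, J)
bound = mean_field_bound(h, J)
rel_error = (log_z_exact - bound) / log_z_exact  # equation (39)
print(log_z_exact, bound, rel_error)
```

Any setting of the means gives a valid lower bound, so `bound` cannot exceed `log_z_exact` even if the iteration has not fully converged. The second-order correction (33) is not reproduced here because its definition lies outside this section.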
In five out of 1100 cases (two times 550) the synchronous mean field iterations did not converge and these outliers were neglected in the plots. It is, however, possible to make the number of non-convergent cases even lower by using asynchronous instead of synchronous updates of the mean field parameters (Peterson & Anderson, 1987; Ansari, Hou, & Yu, 1995; Saul et al., 1996). Note that, in both cases, the second-order method improves on average on the standard (first-order) variational results. Indeed, using only a factorised model with the second-order correction improves on the standard variational decimatable result.

We also determined whether the second-order approach improves the accuracy of the prediction in each individual run. In order to determine this we used the paired difference,

$$\Delta = \left| \frac{\log Z_{exact} - \log Z_{first}}{\log Z_{exact}} \right| - \left| \frac{\log Z_{exact} - \log Z_{second}}{\log Z_{exact}} \right|. \tag{40}$$

If $\Delta$ is positive the second-order approach improves on the first-order approach. The corresponding results of our computer simulations are depicted in Figure 4. In all runs $\Delta$ was positive, corresponding to a consistent improvement in accuracy.

5.2.2 Approximating the Means and Correlations

There are many techniques that can be called upon to approximate moments and correlations. Primarily, the cumulant expansion, as we have described it, is intended to improve the accuracy in estimating the normalising constant, and not directly the moments of the distribution.
Arguably the simplest way to approximate correlations is to use the standard variational approach to find the best approximating distribution to $Q_1$ and then to approximate the moments of $Q_1$ by the moments of $Q_0$. In this approach, however, certain correlations that are not present in the structure of the approximating distribution $Q_0$ will be trivially approximated. For example, approximating $\langle s_i s_j \rangle_{Q_1}$ using the factorised distribution gives $\langle s_i s_j \rangle_{Q_1} \approx \langle s_i \rangle_{Q_0} \langle s_j \rangle_{Q_0}$. We will examine in this section some ways of improving the estimation of the correlations.

One procedure that can be used to approximate correlations such as $\langle s_i s_j \rangle_{Q_1}$ is given by the so-called linear response theory (Parisi, 1988). This approach uses the relationship (5) in which the intractable normalising constant is replaced with its approximation (29). The approach that we adopt here, due to its straightforward relationship to normalising constants, is given by considering the following relationship:

$$\langle s_i s_j \rangle_{Q_1} = Q_1(s_i = 1, s_j = 1) = e^{h_i + h_j + 2 J_{ij}} \frac{Z_{-[i,j]}}{Z}, \tag{41}$$

where $Z_{-[i,j]}$ is the normalising constant of the original network in which nodes $i$ and $j$ have been removed and the biases are transformed to $h'_k = h_k + J_{ik} + J_{jk}$. This means that we need to evaluate $O(n^2)$ normalising constants to approximate the correlations. To approximate the normalising constants, we use the higher-order expansion (33).
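The idea of writing a moment as a ratio of normalising constants can be checked exactly on a small machine. The Python sketch below is an illustration under an explicitly assumed energy convention, $H(s) = \sum_i h_i s_i + \sum_{i<j} J_{ij} s_i s_j$ with symmetric $J$ and $s_i \in \{0, 1\}$; the function names, network size and seed are hypothetical. Under this convention each pair is counted once, so the prefactor contains $J_{ij}$ once; the factor multiplying $J_{ij}$ in (41) depends on how the energy counts each pair, which is defined outside this section. Both normalising constants are computed by enumeration here, whereas the text approximates them with the expansion (33).

```python
import itertools
import math
import random


def energy(s, h, J):
    """Assumed convention: H(s) = sum_i h_i s_i + sum_{i<j} J_ij s_i s_j."""
    n = len(s)
    pair = sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return sum(h[i] * s[i] for i in range(n)) + pair


def partition(h, J):
    """Exact normalising constant Z by enumeration."""
    states = itertools.product((0, 1), repeat=len(h))
    return sum(math.exp(energy(s, h, J)) for s in states)


def moment_by_enumeration(h, J, nodes):
    """Q1(all units in `nodes` equal 1), summed directly over the states."""
    on = sum(
        math.exp(energy(s, h, J))
        for s in itertools.product((0, 1), repeat=len(h))
        if all(s[i] == 1 for i in nodes)
    )
    return on / partition(h, J)


def moment_by_ratio(h, J, nodes):
    """Same moment as prefactor * Z(reduced network) / Z(full network)."""
    n = len(h)
    rest = [k for k in range(n) if k not in nodes]
    clamped = [1 if k in nodes else 0 for k in range(n)]
    prefactor = math.exp(energy(clamped, h, J))  # energy of the clamped units alone
    # Removing the clamped units shifts each remaining bias by its couplings to them.
    h_rest = [h[k] + sum(J[k][i] for i in nodes) for k in rest]
    J_rest = [[J[a][b] for b in rest] for a in rest]
    return prefactor * partition(h_rest, J_rest) / partition(h, J)


random.seed(1)
n = 5
h = [random.gauss(0.0, 1.0) for _ in range(n)]
J = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        J[i][j] = J[j][i] = random.gauss(0.0, 1.0)

print(moment_by_ratio(h, J, (1, 3)), moment_by_enumeration(h, J, (1, 3)))
print(moment_by_ratio(h, J, (0,)), moment_by_enumeration(h, J, (0,)))
```

Clamping a single node gives the corresponding expression for a mean, and clamping a pair gives a correlation, so one helper covers both cases.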
Similarly, we attempt to obtain a more accurate approximation to the means $\langle s_i \rangle$, based on the identity

$$\langle s_i \rangle_{Q_1} = Q_1(s_i = 1) = e^{h_i} \frac{Z_{-[i]}}{Z}, \tag{42}$$

where $Z_{-[i]}$ is the normalising constant of the original network in which node $i$ has been removed and the biases are transformed to $h'_k = h_k + J_{ik}$. The "intractable" Boltzmann machine in our simulations has 8 fully connected nodes with biases and weights drawn from a standard zero-mean, unit-variance Gaussian distribution. In Figures 5 and 6 we plot results for approximating the means $\langle s_i \rangle$ and correlations $\langle s_i s_j \rangle$ respectively. As can be seen, the approach we take to approximate the correlations is an improvement on the standard variational factorised model. We emphasize that, since we are primarily concerned with approximating the normalising constant, there may be more suitable methods dedicated to the task of approximating the correlations themselves. Indeed, one of the drawbacks of these perturbative approaches is that physical constraints, such as that moments should be bounded between 0 and 1, can be violated.

Figure 3: Approximation of the normalizing constant of a fully connected, eight-node Boltzmann machine (see Figure 2(c)), whose parameters are randomly drawn from a zero-mean Gaussian distribution with unit variance. A factorised model (see Figure 2(a)) is used as the approximating distribution in (a) and (b) and a decimatable model (see Figure 2(b)) is used in (c) and (d). The histograms (frequency against absolute error) are based on 550 different random drawings of the parameters. For the readability of the histograms, we have excluded five outliers (four in (a) and (b), one in (c) and (d)), for which the synchronous mean field iteration did not converge. The mean error is the mean of the absolute value of equation (39).
(a) Factorised first order. The mean (absolute) error is 0.036 (0.034 without outliers).
(b) Factorised second order. The mean absolute error is 0.0079 (0.0074 without outliers).
(c) Decimatable first order. The mean (absolute) error is 0.0186 (0.0181 without outlier).
(d) Decimatable second order. The mean absolute error is 0.0031 (0.0027 without outlier).

Figure 4: Difference between first- and second-order approximation of the normalizing constant of a fully connected, eight-node Boltzmann machine using (a) a factorised and (b) a decimatable model. The histograms (frequency against absolute error) are based on the same 550 random drawings as used in Figure 3. All 1100 differences, including the five non-convergent runs, were larger than zero. Again for readability of the histograms, the non-convergent runs are not included. The mean difference was 0.0281 (0.0269 without the non-convergent runs) and 0.0155 (0.0154 without the non-convergent run) for the factorised model and the decimatable model, respectively.

5.2.3 Learning

The results of the previous section show that, for randomly chosen connection matrices $J$, the higher-order terms can significantly improve the approximation to the normalising constant of the distribution. However, in a learning scenario, such results may be of little interest since the connection matrix will become intimately related to the patterns to be learned; that is, $J$ will take on a form that is appropriate for storing the patterns in the training set as learning progresses. We are therefore interested here in training networks on a set of visible patterns. That is, we split the nodes of the network into a set of visible units $S_V$ and a set of hidden units $S_H$ with the aim of learning a set of patterns $S_V^1, \ldots, S_V^p$ with a Boltzmann machine $Q_1(S_V, S_H)$.
Figure 5: Approximating the means $\langle s_i \rangle$ for 1000 randomly chosen 8-node fully connected Boltzmann machines. The weights were drawn from the standard normal distribution. (a) The standard variational approach in which the means are estimated by the means of the best approximating factorised distribution. (b) Using the ratio of normalising constants (41), in which second-order corrections are included in approximating the normalising constants. Using the ratio of normalising constants without the higher-order corrections gave a mean error of 0.046, slightly worse than the standard variational result.
(a) Factorised first-order estimates of first moments. The mean absolute error in approximating the means $\langle s_i \rangle$ is 0.0350.
(b) Factorised second-order estimates of first moments. The mean absolute error in approximating the means $\langle s_i \rangle$ is 0.0129.

Figure 6: Approximating the correlations $\langle s_i s_j \rangle$ for 1000 randomly chosen 8-node fully connected Boltzmann machines. The weights were drawn from the standard normal distribution. (a) The standard variational approach in which the means are estimated by the means of the best approximating factorised distribution, and (b) using the ratio of normalising constants, including second-order corrections. Without second-order corrections, this gives a mean error of 0.0428.
(a) Factorised first-order estimates of correlations. Mean absolute error = 0.0329.
(b) Factorised second-order estimates of correlations. Mean absolute error = 0.0103.

By using the KL divergence between an approximating distribution $\tilde{Q}_0(S_H | S_V)$ on the hidden units and the conditional distribution $Q_1(S_H | S_V)$ we obtain the following bound

$$\ln Q_1(S_V) \geq -\int \tilde{Q}_0(S_H | S_V) \ln \tilde{Q}_0(S_H | S_V) + \int \tilde{Q}_0(S_H | S_V) \ln Q_1(S_H, S_V). \tag{43}$$

For the case of Boltzmann machines, $\ln Q_1(S_H, S_V) = H(S_H, S_V) - \ln Z$ in which $\ln Z$ is (assumed) intractable. In order to obtain an approximation to this quantity, we therefore introduce a further variational distribution, $Q_0(S_V, S_H)$ (not the same as $\tilde{Q}_0(S_H | S_V)$), which can be used as in (10) to obtain a lower bound on $\ln Z$. Unfortunately, used in (43), this lower bound on $\ln Z$ does not give a lower bound on the likelihood of the visible units $Q_1(S_V)$. Nevertheless, we may hope that the approximation to the likelihood gradient is sufficiently accurate such that ascent of the likelihood can be made. Taking the derivative of (43) with respect to the parameters of the Boltzmann machine $Q_1(S_H, S_V)$, we arrive at the following learning rule for gradient ascent given a pattern $S_V$:

$$\Delta J = \eta \left( \langle s_i s_j \rangle_{\tilde{Q}_0(S_H | S_V)} - \langle s_i s_j \rangle_{Q_1(S_H, S_V)} \right), \tag{44}$$

where $\eta$ is a chosen learning rate.
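To make the structure of rule (44) concrete, the following Python sketch performs one update on a toy machine with 4 visible and 3 hidden units. It is a simplified stand-in, not the procedure used in the experiments: both expectations are computed exactly by enumeration (the exact posterior over hidden units replaces $\tilde{Q}_0$, and no approximation is made for the free statistics), biases are omitted, and the energy convention $H(s) = \sum_{i<j} J_{ij} s_i s_j$ with $s_i \in \{0, 1\}$, the pattern, the seed and all function names are assumptions made for the example.

```python
import itertools
import math
import random

N_VIS, N_HID = 4, 3
N = N_VIS + N_HID


def energy(s, J):
    """Assumed convention: H(s) = sum_{i<j} J_ij s_i s_j (no biases)."""
    return sum(J[i][j] * s[i] * s[j] for i in range(N) for j in range(i + 1, N))


def pair_stats(J, clamp=None):
    """Return (<s_i s_j> for i<j, normaliser); visible units fixed to `clamp` if given."""
    z = 0.0
    stats = [[0.0] * N for _ in range(N)]
    for s in itertools.product((0, 1), repeat=N):
        if clamp is not None and s[:N_VIS] != clamp:
            continue
        w = math.exp(energy(s, J))
        z += w
        for i in range(N):
            for j in range(i + 1, N):
                stats[i][j] += w * s[i] * s[j]
    return [[x / z for x in row] for row in stats], z


def log_likelihood(J, v):
    """ln Q1(S_V = v): log of clamped normaliser minus log of free normaliser."""
    return math.log(pair_stats(J, v)[1]) - math.log(pair_stats(J)[1])


def update(J, v, eta=0.05):
    """One step of rule (44) with exact clamped and free correlations."""
    clamped, _ = pair_stats(J, v)
    free, _ = pair_stats(J)
    new_J = [row[:] for row in J]
    for i in range(N):
        for j in range(i + 1, N):
            new_J[i][j] = new_J[j][i] = J[i][j] + eta * (clamped[i][j] - free[i][j])
    return new_J


random.seed(2)
J = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        J[i][j] = J[j][i] = random.gauss(0.0, 1.0)

v = (1, 0, 1, 0)  # hypothetical visible pattern
ll_before = log_likelihood(J, v)
ll_after = log_likelihood(update(J, v), v)
print(ll_before, ll_after)
```

With exact statistics the step is a true gradient ascent step on $\ln Q_1(S_V)$, so for a small learning rate the likelihood of the pattern increases; replacing either expectation by an approximation removes that guarantee.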
The correlations in the first term of (44) are straightforward to compute in the case of using a factorised distribution $\tilde{Q}_0$. The "free" expectations in the second term are more troublesome and need to be approximated in some manner. We examine using the standard factorised model approximations for the free correlations and more accurate "higher-order" approximations as described in section (5.2.2). Here, we are primarily concerned with monitoring the likelihood bound (43) under such gradient dynamics, so we will look at the exact bound value, its approximation using a factorised model for $\ln Z$ (29), and its second-order correction (33).

We consider a small learning problem with 3 hidden units and 4 visible units. Ten visible training patterns were formed in which the elements were chosen to be on (value = 1) with probability 0.4. In Figure 7(a) we demonstrate variational learning in which the free statistics, required for the likelihood gradient ascent rule (44), are approximated with a simple factorised assumption, $\langle s_i s_j \rangle_{Q_1(S_H, S_V)} \approx m_i m_j$ $(i \neq j)$. We display the time evolution of the exact training pattern likelihood bound (43) (solid line; see footnote 3), and two approximations to it. The first approximates the intractable $\ln Z$ term by using the standard variational approach with a factorised model (dashed line). The second approximation uses the second-order correction to the factorised model value for $\ln Z$ (dot-dash line). The learning rate was fixed at 0.05.
In monitoring the progress of learning, the higher-order correction to the likelihood bound can be seen to be a more accurate approximation of the likelihood compared to that given by the use of the standard variational approximation alone. Interestingly, using the second-order correction to the likelihood, a maximum in the likelihood is detected at a point close to saturation of the exact bound. This is not the case using the first-order approximation alone and, with the standard approximation to the likelihood, the dynamics of the learning process, in terms of the training set likelihood, are poorly monitored.

In Figure 7(b) the gradient dynamics are provided again by equation (44) but now with the free correlations $\langle s_i s_j \rangle_{Q_1(S_H, S_V)}$ approximated by the more accurate ratio-of-normalising-constants method, as described in section (5.2.2). Again, we see an improvement in the accuracy of monitoring the progress of the learning dynamics by the simple inclusion of the higher-order term and, again, the higher-order approximation displays a maximum at roughly the correct position. The likelihood is higher than in Figure 7(a) since the gradient of the likelihood bound is more accurately approximated through the more accurate correlation estimates. Note that the reason that the exact likelihood lower bound can be above 1 is due to pattern repetitions in the training set.
The instabilities in learning after approximately 120 updates are due to the emergence of multimodality in the learned distribution $Q_1(S_H, S_V | J)$, indicating that a more powerful approximation method, able to capture such multimodal effects, should be used to obtain further improvements in the likelihood.

3. In practice, the likelihood on a test set is more appropriate. However, the small number of possible patterns here (16) makes it rather difficult to form a representative test set.

Figure 7: Learning a set of 10 random, 4-visible-unit patterns with a Boltzmann machine with 3 hidden units. The solid line is the exact value for the total likelihood bound (43) of all the patterns in the training set. The dashed line is the likelihood bound approximation based on using a factorised model to approximate $\ln Z$, equation (29). The dash-dot line uses the second-order correction to $\ln Z$, equation (33). Both panels plot total pattern likelihood against time.
(a) Learning using dynamics in which the free correlations are approximated using a factorised model (standard mean field estimates).
(b) Learning using dynamics in which the free correlations are given by the ratio of normalising constants approach (second-order estimates).

6. Discussion

In this article we have described perturbational approximations of intractable probability distributions.
The approximations are based on a Taylor series using a family of probability distributions that interpolate between the intractable distribution and a tractable approximation thereto. These approximations can be seen as an extension of the variational method, although they no longer bound the quantities of interest. We have illustrated our approach both theoretically and by computer simulations for Boltzmann machines. These simulations showed that the approximation can be improved beyond the variational bound by including higher-order terms of the corresponding cumulant expansion. Simulations showed that the accuracy in monitoring the training set likelihood during the learning process can be improved by including higher-order corrections.

However, these perturbational approximations cannot be expected to consistently improve on zeroth order (variational) solutions. For instance, if the distribution is strongly multimodal, then using a unimodal variational distribution cannot be expected to be improved much by the inclusion of higher-order perturbational terms. On the other hand, higher-order corrections to multimodal approximations may well improve the solution considerably.

We elucidated the relationship of our approach to TAP, arguing that our approach is expected to offer a more stable solution. Furthermore, the application of our approach to models other than the Boltzmann machine is transparent.
Indeed, these techniques are readily applicable to other variational methods in a variety of contexts both for discrete and continuous systems (Barber & Bishop, 1998; Wiegerinck & Barber, 1998).

One drawback of the perturbational approach that we have described is that known "physical" constraints, for example that moments must be bounded in a certain range, are not necessarily adhered to. It would be useful to develop a perturbation method that ensures that solutions are at least physical.

We hope to apply these methods to variational techniques other than the standard Kullback-Leibler bound, and also to study the evaluation of other suitable criteria for the setting of the variational parameters.

Acknowledgments

We would like to thank Tom Heskes, Bert Kappen, Martijn Leisink, Peter Sollich and Wim Wiegerinck for stimulating and helpful discussions and the referees for excellent comments on an earlier version of this paper.

Appendix A.

In this appendix we describe how to optimize the variational bound when the normalization constant of an intractable Boltzmann machine is approximated using another tractable Boltzmann machine. For convenience, we denote the potential of an intractable Boltzmann machine as

$$H_1 \equiv \sum_I w_I s_I, \tag{45}$$

where the "extended" parameters are given by $w_I \in \{ w_i, w_{ij} \mid i \in [1, \ldots, N],\ j = [i+1, \ldots, N] \}$ and $s_I \in \{ s_i, s_i s_j \mid i \in [1, \ldots, N],\ j = [i+1, \ldots, N] \}$.
The potential of the other tractable Boltzmann machine is denoted as

$$H_0 \equiv \sum_{I \in \Theta} \theta_I s_I, \tag{46}$$

where $\Theta$ denotes the set of parameters of the tractable Boltzmann machine. We want to optimize the bound of equation (20) with respect to the free parameters $\theta_J$ $(J \in \Theta)$, i.e.,

$$\frac{\partial}{\partial \theta_J} \left[ \log Z_0 + \langle \Delta H \rangle_0 \right] = 0, \tag{47}$$

which leads to

$$\langle \Delta H\, s_J \rangle_0 - \langle \Delta H \rangle_0 \langle s_J \rangle_0 = 0 \tag{48}$$

and thus to the fixed point equation (see footnote 4)

$$\theta_I = \sum_{J \in \Theta} \sum_K \left[ F_\Theta^{-1} \right]_{IJ} F_{JK}\, w_K, \tag{49}$$

where the Fisher matrix is given by $F_{IJ} = \langle s_I s_J \rangle_0 - \langle s_I \rangle_0 \langle s_J \rangle_0$, which depends on the free parameters $\theta$, and $F_\Theta$ is the Fisher matrix of the tractable Boltzmann machine. For a factorised approximating model, $F_\Theta$ is a diagonal matrix and equation (49) simplifies to equation (32). For a decimatable Boltzmann machine, the elements of the Fisher matrix are tractable.

The iteration can be performed either synchronously or asynchronously (Peterson & Anderson, 1987). We prefer here the synchronous case since for all $N$ parameter updates we only need to calculate and invert a single Fisher matrix instead of $N$ matrices in the asynchronous case. In most applications, however, the asynchronous method seems to be advantageous with respect to convergence (Peterson & Anderson, 1987; Ansari et al., 1995; Saul et al., 1996).

4. This fixed point equation is the generalized mean field equation. This form of iteration is not restricted to Boltzmann machines but is a general characteristic of exponential models.

References

Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9, 147-169.

Ansari, N., Hou, E. S. H., & Yu, Y. (1995). A new method to optimize the satellite broadcasting schedules using the mean field annealing of a Hopfield neural network. IEEE Transactions on Neural Networks, 6(2), 470-483.

Barber, D., & Bishop, C. M. (1998). Ensemble learning in Bayesian neural networks. In Bishop, C. M. (Ed.), Proceedings of the Newton Institute Program on Neural Networks and Machine Learning. Kluwer.

Barber, D., & Wiegerinck, W. (1999). Tractable variational structures for approximating graphical models. In Kearns, M., Solla, S., & Cohn, D. (Eds.), Advances in Neural Information Processing Systems NIPS 11. MIT Press.

Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Clarendon Press, Oxford.

Bray, A. J., & Moore, M. A. (1979). Evidence for massless modes in the `solvable model' of a spin glass. Journal of Physics C: Solid State Physics, 12, L441-L448.

Dechter, R. (1999). Bucket elimination: A unifying framework for probabilistic inference. In Jordan, M. I. (Ed.), Learning in Graphical Models, pp. 75-104. Kluwer.

Galland, C. C. (1993). The limitations of deterministic Boltzmann machine learning. Network: Computation in Neural Systems, 4, 355-379.

Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6), 721-741.

Gilks, W., Richardson, S., & Spiegelhalter, D. (Eds.). (1996). Markov chain Monte Carlo in practice. Chapman and Hall.

Grimmett, G. R., & Stirzaker, D. R. (1992). Probability and Random Processes (Second Edition). Oxford Science Publications. Clarendon Press, Oxford.

Itzykson, C., & Drouffe, J.-M. (1989). Statistical Field Theory. Cambridge Monographs on Mathematical Physics. Cambridge University Press.

Jaakkola, T. S. (1997). Variational Methods for Inference and Estimation in Graphical Models. Ph.D. thesis, Massachusetts Institute of Technology.

Jaakkola, T. S., & Jordan, M. I. (1998). Improving the mean field approximation via the use of mixture distributions. In Jordan, M. I. (Ed.), Learning in Graphical Models, Vol. 89, pp. 163-173. Kluwer.

Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1998). An Introduction to Variational Methods for Graphical Models. In Jordan, M. I. (Ed.), Learning in Graphical Models, pp. 105-161. Kluwer.

Kappen, H. J., & Rodríguez, F. B. (1998a). Boltzmann machine learning using mean field theory and linear response correction. In Jordan, M. I., Kearns, M. J., & Solla, S. A. (Eds.), Advances in Neural Information Processing Systems 10, pp. 280-286. Cambridge. The MIT Press.

Kappen, H. J., & Rodríguez, F. B. (1998b). Efficient learning in Boltzmann machines using linear response theory. Neural Computation, 10(5), 1137-1156.

Lawrence, N. D., Bishop, C. M., & Jordan, M. I. (1998). Mixture representations for inference and learning in Boltzmann machines. In Fourteenth Conference on Uncertainty in Artificial Intelligence.

MacKay, D. J. C. (1995). Probable networks and plausible predictions - a review of practical Bayesian methods for supervised neural networks. Network: Computation in Neural Systems, 6(3), 469-505.

Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods. Tech. rep., Department of Computer Science, University of Toronto. CRG-TR-93-1, http://www.cs.toronto.edu/~radford/papers-online.html.

Nemoto, K., & Takayama, H. (1985). TAP free energy structure of SK spin glasses. Journal of Physics C: Solid State Physics, 18, L529-L535.

Parisi, G. (1988). Statistical Field Theory. Frontiers in Physics Lecture Note Series. Addison Wesley.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco.

Peterson, C., & Anderson, J. R. (1987). A mean field theory learning algorithm for neural networks. Complex Systems, 1, 995-1019.

Plefka, T. (1982). Convergence condition of the TAP equation for the infinite-ranged Ising spin glass model. Journal of Physics A: Mathematical and General, 15, 1971-1978.

Rüger, S. M. (1997). Efficient inference and learning in decimatable Boltzmann machines. Tech. rep. TR 97-5, Fachbereich Informatik der Technischen Universität Berlin.

Saul, L., & Jordan, M. I. (1994). Learning in Boltzmann trees. Neural Computation, 6(6), 1174-1184.

Saul, L. K., Jaakkola, T., & Jordan, M. I. (1996). Mean field theory for sigmoid belief networks. Journal of Artificial Intelligence Research, 4, 61-76.

Spivak, M. (1967). Calculus. W.A. Benjamin, Inc., London.

Thouless, D. J., Anderson, P. W., & Palmer, R. G. (1977). Solution of `solvable model of a spin glass'. Philosophical Magazine, 35(3), 593-601.

Wiegerinck, W., & Barber, D. (1998). Mean field theory based on belief networks for approximate inference. In ICANN 98: Proceedings of the 8th International Conference on Artificial Neural Networks.
