Error correcting code using tree-like multilayer perceptron
An error correcting code using a tree-like multilayer perceptron is proposed. An original message $\boldsymbol{s}^0$ is encoded into a codeword $\boldsymbol{y}_0$ using a tree-like committee machine (committee tree) or a tree-like parity machine (parity tree)…
Authors: Hosaka M., Mimura N., Shinzato T.
Statistical mechanics of error correcting code using monotonic and non-monotonic tree-like multilayer perceptrons

Florent Cousseau
Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-5861, Japan ∗

Kazushi Mimura
Faculty of Information Sciences, Hiroshima City University, Hiroshima 731-3194, Japan †

Masato Okada
Graduate School of Frontier Sciences, University of Tokyo, Chiba 277-5861, Japan
Brain Science Institute, RIKEN, Saitama 351-0198, Japan

(Dated: November 4, 2018)

Abstract

An error correcting code using a tree-like multilayer perceptron is proposed. An original message $\boldsymbol{s}^0$ is encoded into a codeword $\boldsymbol{y}_0$ using a tree-like committee machine (committee tree) or a tree-like parity machine (parity tree). Based on these architectures, several schemes featuring monotonic or non-monotonic units are introduced. The codeword $\boldsymbol{y}_0$ is then transmitted via a Binary Asymmetric Channel (BAC) where it is corrupted by noise. The analytical performance of these schemes is investigated using the replica method of statistical mechanics. Under some specific conditions, some of the proposed schemes are shown to saturate the Shannon bound at the infinite codeword length limit. The influence of the monotonicity of the units on the performance is also discussed.

∗ Electronic address: florent@mns.k.u-tokyo.ac.jp
† Electronic address: mimura@hiroshima-cu.ac.jp

I. INTRODUCTION

Reliability in communication has always been a major concern when dealing with digital data. Especially in today's information-dependent society, it is vital to design efficient ways of preventing data corruption when transmitting information. Error correcting codes have been developed for this purpose since the birth of the information theory field following the work of Shannon [1].
In 1989, Sourlas derived a set of error correcting codes, the so-called Sourlas codes, which theoretically saturate the Shannon bound [2]. Although these codes turned out to be impractical, the main point of interest of this paper was the parallel made between physical spin glass systems and information theory. Following this paper, the tools of statistical mechanics have been successfully applied to a wide range of problems of information theory in recent years. In the field of error correcting codes itself [3–5], as well as in spreading codes [6, 7], and compression codes [8–13], statistical mechanical techniques have shown great potential. The present paper uses similar techniques to investigate an error correcting code scheme where the codeword is encoded using tree-like multilayer perceptron neural networks.

It is known that there exists a natural duality between lossy compression codes and error correcting codes. Indeed, a lossy compression code can be regarded as a standard error correcting code, but one where the codeword is generated using the original decoder of the error correcting code scheme and where the decompressed message is obtained using the original encoder of the scheme (cf. [14] for details).

Recently, a lossy compression scheme based on a simple perceptron decoder was investigated by Hosaka et al. [10]. In their paper, they used statistical mechanical techniques to investigate the theoretical performance of their scheme at the infinite codeword length limit. The perceptron they defined in their model uses a special hat-shaped non-monotonic transfer function. This rather uncommon feature enables the scheme to deal with biased messages and it is known that this type of function maximizes the storage capacity of the simple perceptron [15, 16]. They found that their scheme can theoretically yield Shannon optimal performance.
Subsequently, Shinzato et al. [17] investigated the same model but in the framework of error correcting code. They found that their model can theoretically yield Shannon optimal performance.

Based on these studies, Mimura et al. [12] proposed a tree-like multilayer perceptron network for lossy compression purposes, but used only the standard sign function as the transfer function of their model. They showed that the parity tree model can theoretically yield Shannon optimal performance, but only when considering unbiased messages. In contrast, they showed that the committee tree model cannot yield optimal performance, even for unbiased messages. However, the advantage of using a multilayer structure is improved replica symmetric solution stability, and an increased number of codewords sharing the same distortion properties [18].

In a recent study, Cousseau et al. [13] investigated the same tree-like multilayer perceptron model but used the hat-shaped non-monotonic transfer function introduced by Hosaka et al. [10], thus combining both advantages of [10, 12]. By doing so, they were able to show that both parity tree and committee tree structures can then theoretically yield Shannon optimal performance even for biased messages under some specific conditions.

The purpose of the present paper is to discuss the performance of the same tree-like perceptron models but in the error correcting code framework, thus completing the topic of perceptron type network applications in coding theory. In this paper, we make use of the Binary Asymmetric Channel (BAC). Indeed, the use of the non-monotonic hat-shaped transfer function introduced by Hosaka et al. [10] enables us to control the bias of the codeword sequence, and enables the relevant schemes to deal with such an asymmetric channel (the BAC was also used by Shinzato et al. [17]).
On the other hand, we expect the schemes which use the standard monotonic sign function to be able to deal only with the Binary Symmetric Channel (BSC), which corresponds to a particular case of the BAC.

The majority of popular error correcting codes like turbo codes [19] and low density parity check (LDPC) codes [20, 21], which provide near Shannon performance in practical time frames, have been widely studied, but this was generally restricted to symmetric channels. On the other hand, apart from a few studies [22, 23], little is known when dealing with asymmetric channels.

Multilayer perceptrons have been widely studied over the years by the machine learning community and a wide range of problems have been considered (storage capacity, learning rules, etc.). These works revealed non-trivial behaviors of even simple models like the simple perceptron network. Many of these previous results are summarized in reference [24]. The present analysis gives us an opportunity to discuss, in a systematic manner and in the context of multilayer networks, the difficulty of decoding for densely connected systems (or dense systems, as opposed to sparsely connected systems like LDPC codes). There has been relatively little discussion of dense systems, mainly because of the computational cost, which is obviously higher than for sparse systems. However, because of their rich randomness, dense systems can possibly be regarded as pseudo-random codes like the dense limit of LDPC codes.

In this paper we mainly focus on the necessary conditions to get Shannon optimal performance. To discuss practical decoders, it is first necessary to investigate the optimality of our schemes. This includes discussion of the optimal parameters for the transfer function since we need to know these parameters to discuss the optimal decoder.
In other words, we need a theoretical analysis of the performance before we can study the decoding problem.

The paper is organized as follows. Section II introduces the framework of error correcting codes. Section III describes our model. Section IV deals with the BAC capacity. Section V presents the mathematical tools used to evaluate the performance of the present scheme. Section VI states the results and elucidates the location of the phase transition, which characterizes the best achievable performance of the model. Section VII is devoted to the conclusion and discussion.

II. ERROR CORRECTING CODES

In a general scheme, an original message $\boldsymbol{s}^0$ of size $N$ is encoded into a codeword $\boldsymbol{y}_0$ of size $M$ by some encoding device. The aim of this stage is to add redundancy to the original data. Therefore, we necessarily have $M > N$. Based on this redundancy, a proper decoder device should be able to recover the original data even if it were corrupted by noise in the transmission channel. The quantity $R = N/M$ is called the code rate and evaluates the trade-off between redundancy and codeword size. The codeword $\boldsymbol{y}_0$ is then fed into a channel where the bits are subject to noise. The received noisy message $\boldsymbol{y}$ (which is also $M$-dimensional) is then decoded using its redundancy to infer the original $N$-dimensional message $\boldsymbol{s}^0$. In other words, in a Bayesian framework, one tries to maximize the following posterior probability,

$$P(\boldsymbol{s}|\boldsymbol{y}) \propto P(\boldsymbol{y}|\boldsymbol{s})\,P(\boldsymbol{s}). \tag{1}$$

As data transmission is costly, generally one wants to be able to ensure error-free transmission while transmitting the fewest possible bits. In other words, one wants to ensure error-free transmission while keeping the code rate as large as possible.
For this purpose, the well known Shannon bound [1] gives a way to compute the best achievable code rate which allows error-free recovery. However, while this gives us the value of such an optimal code rate, it does not give any clue as to how to construct such an optimal code. Therefore, several codes have been proposed over the years in an ongoing quest to find a code which can reach this theoretical bound.

III. ERROR CORRECTING CODES USING MONOTONIC AND NON-MONOTONIC MULTILAYER PERCEPTRONS

In this paper, since we make use of techniques derived from statistical mechanics, we will use Ising variables rather than Boolean ones. The Boolean 0 is mapped onto 1 in the Ising framework while the Boolean 1 is mapped to $-1$. This mapping can be used without any loss of generality. We assume that the original message $\boldsymbol{s}^0$ is generated from the uniform distribution and that all the bits are independently generated so that we have

$$P(\boldsymbol{s}^0) = \frac{1}{2^N}. \tag{2}$$

The channel considered in this study is the Binary Asymmetric Channel (BAC) where each bit is flipped independently of the others with asymmetric probabilities. If the original bit fed into the channel is 1, then it is flipped with probability $p$. Conversely, if the original bit is $-1$, it is flipped with probability $r$. Figure 1 shows the BAC properties in detail. The well known Binary Symmetric Channel (BSC) corresponds to the particular case $r = p$.

FIG. 1: The Binary Asymmetric Channel (BAC)

When the corrupted message $\boldsymbol{y}$ is received at the output of the channel, the goal is then to recover $\boldsymbol{s}^0$ using $\boldsymbol{y}$. The state of the estimated message is denoted by the vector $\boldsymbol{s}$. The general outline of the scheme is shown in Figure 2.
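The channel model described above can be sketched in a few lines of Python. This is only an illustrative simulation, not code from the paper: the function names `boolean_to_ising` and `bac_transmit` are hypothetical, and the random generator is passed in explicitly so that runs are reproducible.

```python
import random


def boolean_to_ising(bits):
    """Map Boolean 0 -> +1 and Boolean 1 -> -1 (the Ising convention of the text)."""
    return [1 - 2 * b for b in bits]


def bac_transmit(codeword, p, r, rng):
    """Pass an Ising codeword through a Binary Asymmetric Channel.

    A +1 bit is flipped with probability p, a -1 bit with probability r,
    independently for every position. `rng` is a random.Random instance.
    """
    received = []
    for bit in codeword:
        flip_prob = p if bit == 1 else r
        received.append(-bit if rng.random() < flip_prob else bit)
    return received


def empirical_flip_rates(sent, received):
    """Return the observed flip fractions for +1 inputs and for -1 inputs."""
    plus = [(a, b) for a, b in zip(sent, received) if a == 1]
    minus = [(a, b) for a, b in zip(sent, received) if a == -1]
    rate_plus = sum(1 for a, b in plus if b != a) / max(len(plus), 1)
    rate_minus = sum(1 for a, b in minus if b != a) / max(len(minus), 1)
    return rate_plus, rate_minus
```

For a long random codeword the two empirical rates returned by `empirical_flip_rates` should approach $p$ and $r$ respectively; setting `r = p` reduces the simulation to the BSC.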
From Figure 1 we can easily derive the following conditional probability,

$$P(y^\mu | y_0^\mu) = \frac{1}{2} + \frac{y^\mu}{2}\left[(1 - r - p)\,y_0^\mu + (r - p)\right], \tag{3}$$

where we make use of the notations $\boldsymbol{y}_0 = (y_0^1, \ldots, y_0^\mu, \ldots, y_0^M)$, $\boldsymbol{y} = (y^1, \ldots, y^\mu, \ldots, y^M)$. Since we assume that the bits are flipped independently, we deduce

$$P(\boldsymbol{y} | \boldsymbol{y}_0) = \prod_{\mu=1}^{M}\left\{\frac{1}{2} + \frac{y^\mu}{2}\left[(1 - r - p)\,y_0^\mu + (r - p)\right]\right\}. \tag{4}$$

FIG. 2: Layout of the scheme: original message $\boldsymbol{s}^0$ (size $N$), codeword $\boldsymbol{y}_0$ (size $M$), channel noise, received message $\boldsymbol{y}$ (size $M$), estimated message $\boldsymbol{s}$ (size $N$).

To encode the original message $\boldsymbol{s}^0$ into a codeword $\boldsymbol{y}_0$, we use three non-monotonic tree-like parity machine or committee machine neural networks ((I), (II) and (III)). In the same way, we also investigate the standard monotonic parity tree and committee tree neural networks ((IV) and (V)).

(I) Multilayer parity tree with non-monotonic hidden units (PTH).

$$y_0^\mu(\boldsymbol{s}^0) \equiv \prod_{l=1}^{K} f_k\left(\sqrt{\frac{K}{N}}\,\boldsymbol{s}_l^0 \cdot \boldsymbol{x}_l^\mu\right). \tag{5}$$

(II) Multilayer committee tree with non-monotonic hidden units (CTH).

$$y_0^\mu(\boldsymbol{s}^0) \equiv \mathrm{sgn}\left(\sum_{l=1}^{K} f_k\left[\sqrt{\frac{K}{N}}\,\boldsymbol{s}_l^0 \cdot \boldsymbol{x}_l^\mu\right]\right). \tag{6}$$

Note that in this case, if the number of hidden units $K$ is even, it is possible to get 0 as the argument of the sign function. We avoid this uncertainty by considering only an odd number of hidden units for the committee tree with non-monotonic hidden units in the sequel.

(III) Multilayer committee tree with a non-monotonic output unit (CTO).

$$y_0^\mu(\boldsymbol{s}^0) \equiv f_k\left(\sqrt{\frac{1}{K}}\sum_{l=1}^{K} \mathrm{sgn}\left[\sqrt{\frac{K}{N}}\,\boldsymbol{s}_l^0 \cdot \boldsymbol{x}_l^\mu\right]\right). \tag{7}$$

(IV) Multilayer parity tree (PT).

$$y_0^\mu(\boldsymbol{s}^0) \equiv \prod_{l=1}^{K} \mathrm{sgn}\left(\sqrt{\frac{K}{N}}\,\boldsymbol{s}_l^0 \cdot \boldsymbol{x}_l^\mu\right). \tag{8}$$

(V) Multilayer committee tree (CT).

$$y_0^\mu(\boldsymbol{s}^0) \equiv \mathrm{sgn}\left(\sqrt{\frac{1}{K}}\sum_{l=1}^{K} \mathrm{sgn}\left[\sqrt{\frac{K}{N}}\,\boldsymbol{s}_l^0 \cdot \boldsymbol{x}_l^\mu\right]\right). \tag{9}$$

In this case also, if the number of hidden units $K$ is even, it is possible to get 0 as the argument of the sign function.
We again avoid this uncertainty by considering only an odd number of hidden units for the committee tree in the sequel.

The original message $\boldsymbol{s}^0$ is split into $K$ disjoint $N/K$-dimensional vectors so that $\boldsymbol{s}^0$ can be written $\boldsymbol{s}^0 = (\boldsymbol{s}_1^0, \ldots, \boldsymbol{s}_K^0)$. In schemes (I), (II), and (III), $f_k$ is a non-monotonic function of a real parameter $k$ of the form

$$f_k(x) = \begin{cases} 1 & \text{if } |x| \le k, \\ -1 & \text{if } |x| > k, \end{cases} \tag{10}$$

and the vectors $\boldsymbol{x}_l^\mu$ are fixed $N/K$-dimensional independent vectors whose components are uniformly distributed on $\{-1, 1\}$. The use of random input vectors is known to maximize the storage capacity of perceptron networks, making such a scheme promising for error correcting tasks. The sgn function denotes the sign function taking 1 for $x \ge 0$ and $-1$ for $x < 0$. Each of these architectures applies a different non-linear transformation to the original data $\boldsymbol{s}^0$. The general architecture of these perceptron-based encoders and the non-monotonic function $f_k$ are displayed in Figure 3.

Note that we can also consider an encoder based on a committee tree where both the hidden units and the output unit are non-monotonic. However, this introduces an extra parameter to tune (we will have one threshold parameter for the hidden units and one for the output unit) and the performance should not change drastically. For simplicity, we restrict our study to the above three cases.

To keep the notation as general as possible, as long as explicit use of the encoder is not necessary in computations, we will denote the transformation performed on vector $\boldsymbol{s}$ by the respective encoders using the following notation:

$$F_k\left(\left\{\sqrt{\frac{K}{N}}\,\boldsymbol{s}_l \cdot \boldsymbol{x}_l^\mu\right\}\right). \tag{11}$$

FIG. 3: Left: General architecture of the tree-like multilayer perceptrons with $N$ input units and $K$ hidden units. Right: The non-monotonic function $f_k$.
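The five encoders (5)–(9) can be written compactly once the hidden fields $u_l = \sqrt{K/N}\,\boldsymbol{s}_l \cdot \boldsymbol{x}_l^\mu$ are computed. The following is a minimal sketch under stated assumptions rather than the authors' implementation: the function names and the string labels `"PTH"`, `"CTH"`, `"CTO"`, `"PT"`, `"CT"` are hypothetical, each input vector is stored as one flat list of length $N$ split into $K$ consecutive blocks, and $N$ is assumed to be divisible by $K$.

```python
import math


def sgn(x):
    """Sign function of the text: +1 for x >= 0, -1 for x < 0."""
    return 1 if x >= 0 else -1


def f_k(x, k):
    """Non-monotonic transfer function, Eq. (10)."""
    return 1 if abs(x) <= k else -1


def hidden_fields(s, x_mu, K):
    """Return u_l = sqrt(K/N) * (s_l . x_l^mu) for the K disjoint blocks."""
    N = len(s)
    block = N // K  # assumes K divides N
    scale = math.sqrt(K / N)
    return [
        scale * sum(s[i] * x_mu[i] for i in range(l * block, (l + 1) * block))
        for l in range(K)
    ]


def encode_bit(s, x_mu, K, scheme, k=None):
    """One codeword bit y_0^mu for the chosen scheme (k is unused by PT and CT)."""
    u = hidden_fields(s, x_mu, K)
    if scheme == "PTH":   # Eq. (5)
        return math.prod(f_k(ul, k) for ul in u)
    if scheme == "CTH":   # Eq. (6), meant for odd K
        return sgn(sum(f_k(ul, k) for ul in u))
    if scheme == "CTO":   # Eq. (7)
        return f_k(sum(sgn(ul) for ul in u) / math.sqrt(K), k)
    if scheme == "PT":    # Eq. (8)
        return math.prod(sgn(ul) for ul in u)
    if scheme == "CT":    # Eq. (9), meant for odd K
        return sgn(sum(sgn(ul) for ul in u) / math.sqrt(K))
    raise ValueError(f"unknown scheme: {scheme}")


def encode(s, xs, K, scheme, k=None):
    """Full codeword: one bit per fixed input vector x^mu in xs (len(xs) = M)."""
    return [encode_bit(s, x_mu, K, scheme, k) for x_mu in xs]
```

Passing $M$ random $\pm 1$ vectors as `xs` gives a codeword of length $M$, so the code rate of the sketch is `len(s) / len(xs)`.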
In the notation (11), $F_k$ takes a different expression for the five different types of network and $k$ denotes the fact that all the encoders depend on a real threshold parameter $k$ (except for schemes (IV) and (V), where this function does not depend on $k$; for consistency, however, we will keep this notation for these schemes). Furthermore, note that $F_k$ contains all the terms depending on index $l$ (i.e., $F_k(\{u_l\})$ contains all the terms $u_1, \ldots, u_l, \ldots, u_K$).

IV. BINARY ASYMMETRIC CHANNEL (BAC) CAPACITY

In this section, we compute the capacity of the BAC. According to Shannon's channel coding theorem, the optimal code rate is given by the capacity of the channel. Any code rate bigger than the channel capacity will inevitably lead to information loss. The definition of the channel capacity $C$ is

$$C = \max_{\text{input probability}} \{ I(X, Y) \}, \tag{12}$$

where $I$ denotes mutual information, $X$ denotes the channel input distribution, and $Y$ denotes the channel output distribution. Computation of the capacity of such a binary channel requires only simple algebra and calculations are straightforward, giving

$$C_{BAC} = H_2(\gamma_C) - \frac{1 + \Omega_C}{2} H_2(p) - \frac{1 - \Omega_C}{2} H_2(r), \tag{13}$$

where

$$H_2(x) = -x \log_2(x) - (1 - x)\log_2(1 - x), \tag{14}$$

$$\gamma_C = \frac{1}{1 + \Delta_C} = \frac{1}{2}\left[(1 - p)(1 + \Omega_C) + r(1 - \Omega_C)\right], \tag{15}$$

$$\Delta_C = \left[\frac{r^r (1 - r)^{1 - r}}{p^p (1 - p)^{1 - p}}\right]^{\frac{1}{1 - r - p}}, \tag{16}$$

$$\Omega_C = \frac{2\gamma_C - 1 - r + p}{1 - r - p}. \tag{17}$$

In the special case $r = p$, the capacity simplifies to

$$C_{BSC} = 1 - H_2(p), \tag{18}$$

which corresponds to the capacity of the BSC.

V. ANALYTICAL EVALUATION

As stated in Section II, our goal is to maximize the posterior $P(\boldsymbol{s}|\boldsymbol{y})$. Let us define the following Hamiltonian:

$$H(\boldsymbol{y}, \boldsymbol{s}) = -\ln\left[P(\boldsymbol{y}|\boldsymbol{s})\,P(\boldsymbol{s})\right] = -\ln P(\boldsymbol{y}, \boldsymbol{s}). \tag{19}$$

The ground state of the above Hamiltonian trivially corresponds to the maximum a posteriori (MAP) estimator of the posterior $P(\boldsymbol{s}|\boldsymbol{y})$.
Then, let us compute the joint probability of $\boldsymbol{y}$ and $\boldsymbol{s}$. We have

$$P(\boldsymbol{y}, \boldsymbol{s}) = P(\boldsymbol{y}|\boldsymbol{s})\,P(\boldsymbol{s}). \tag{20}$$

Since the relation between an arbitrary message $\boldsymbol{s}$ and the codeword fed into the channel is deterministic, for any $\boldsymbol{s}$, we can write

$$P(\boldsymbol{y}|\boldsymbol{s}) = P\left(\boldsymbol{y}\,\middle|\,F_k\left(\left\{\sqrt{\tfrac{K}{N}}\,\boldsymbol{s}_l \cdot \boldsymbol{x}_l^\mu\right\}\right)\right) = \prod_{\mu=1}^{M}\left\{\frac{1}{2} + \frac{y^\mu}{2}\left[(1 - r - p)\,F_k\left(\left\{\sqrt{\tfrac{K}{N}}\,\boldsymbol{s}_l \cdot \boldsymbol{x}_l^\mu\right\}\right) + (r - p)\right]\right\}. \tag{21}$$

We finally get the explicit expression of the Hamiltonian,

$$H(\boldsymbol{y}, \boldsymbol{s}) = -\ln P(\boldsymbol{y}, \boldsymbol{s}) = -\ln\left[\frac{1}{2^N}\prod_{\mu=1}^{M}\left\{\frac{1}{2} + \frac{y^\mu}{2}\left[(1 - r - p)\,F_k\left(\left\{\sqrt{\tfrac{K}{N}}\,\boldsymbol{s}_l \cdot \boldsymbol{x}_l^\mu\right\}\right) + (r - p)\right]\right\}\right]. \tag{22}$$

Using this Hamiltonian, we can define the following partition function

$$Z(\beta, \boldsymbol{y}, \boldsymbol{x}) = \sum_{\boldsymbol{s}} \exp\left[-\beta H(\boldsymbol{y}, \boldsymbol{s})\right], \tag{23}$$

where the sum over $\boldsymbol{s}$ represents the sum over all possible states for vector $\boldsymbol{s}$, and $\beta$ is the inverse temperature parameter. Such a partition function can be identified with the partition function of a spin glass system with dynamical variables $\boldsymbol{s}$ and quenched variables $\boldsymbol{x}$. The average of this partition function over $\boldsymbol{y}$ and $\boldsymbol{x}$ naturally contains all the interesting typical properties of the scheme, such as the free energy. However, it is hard to evaluate this average and we need some techniques to investigate it. In this paper, we use the so-called replica method to calculate the average of the partition function. Once the free energy is obtained, one can compute the critical code rate at which a phase transition occurs between the ferromagnetic phase (error recovery possible) and the paramagnetic phase (decoding impossible). This gives us the best code rate the scheme can achieve. A code rate exceeding this critical value will make decoding impossible. The calculations to obtain the average of the partition function $\langle Z(\beta, \boldsymbol{y}, \boldsymbol{x}) \rangle_{\boldsymbol{y}, \boldsymbol{x}}$ are detailed in Appendix A.
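For very small $N$, the Hamiltonian (22) and the partition function (23) can be evaluated by exhaustive enumeration, which makes the definitions concrete even though it says nothing about the large-system limit treated by the replica method. The sketch below is illustrative only: it uses the monotonic parity tree (8) as the encoder (the prefactor $\sqrt{K/N}$ is dropped because it does not change a sign), all function names are hypothetical, and $0 < p, r < 1$ is assumed so that every logarithm is finite.

```python
import itertools
import math


def sgn(x):
    """Sign function of the text: +1 for x >= 0, -1 for x < 0."""
    return 1 if x >= 0 else -1


def parity_tree_bit(s, x_mu, K):
    """Monotonic parity tree output, Eq. (8), for one input vector x^mu."""
    block = len(s) // K  # assumes K divides N
    out = 1
    for l in range(K):
        lo = l * block
        out *= sgn(sum(s[i] * x_mu[i] for i in range(lo, lo + block)))
    return out


def hamiltonian(y, s, xs, K, p, r):
    """H(y, s) = -ln P(y, s) as in Eq. (22), with a uniform prior 2^{-N}."""
    log_joint = -len(s) * math.log(2.0)
    for y_mu, x_mu in zip(y, xs):
        F = parity_tree_bit(s, x_mu, K)
        log_joint += math.log(0.5 + 0.5 * y_mu * ((1 - r - p) * F + (r - p)))
    return -log_joint


def partition_function(y, xs, N, K, p, r, beta=1.0):
    """Z(beta, y, x) of Eq. (23) by brute force over all 2^N candidate messages."""
    return sum(
        math.exp(-beta * hamiltonian(y, s, xs, K, p, r))
        for s in itertools.product((1, -1), repeat=N)
    )


def map_estimate(y, xs, N, K, p, r):
    """A ground state of H, i.e. one MAP estimate (ties are broken arbitrarily)."""
    return min(
        itertools.product((1, -1), repeat=N),
        key=lambda s: hamiltonian(y, s, xs, K, p, r),
    )
```

At $\beta = 1$ the partition function equals the marginal probability $P(\boldsymbol{y})$, so summing `partition_function` over all $2^M$ received words should return 1; this is a convenient sanity check on the implementation.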
After long calculations, the replica symmetric (RS) free energy is obtained,

$$-f_{RS}(q, \hat{q}, m, \hat{m}) = \mathop{\mathrm{extr}}_{q, \hat{q}, m, \hat{m}}\Bigg\{\sum_{y = \pm 1}\int_{-\infty}^{\infty}\left[\prod_{l=1}^{K} DR_l\right]\int_{-\infty}^{\infty}\left[\prod_{l=1}^{K} Dt_l\right] \ln\left[I(y, R_l, t_l, m, q)\right] \times \left\{\frac{1}{2} + \frac{y}{2}\left[(1 - r - p)\,F_k(\{R_l\}) + (r - p)\right]\right\} + R\int_{-\infty}^{\infty} DU \ln\left[2\cosh\left(\sqrt{\hat{q}}\,U + \hat{m}\right)\right] - R\ln 2 - R\,m\,\hat{m} - \frac{R\,\hat{q}\,(1 - q)}{2}\Bigg\}, \tag{24}$$

where

$$I(y, R_l, t_l, m, q) = \int_{-\infty}^{\infty}\left[\prod_{l=1}^{K} Dz_l\right]\left[\frac{1}{2} + \frac{y}{2}(r - p) + \frac{y}{2}(1 - r - p)\,F_k\left(\left\{\sqrt{1 - q}\,z_l + \sqrt{q - m^2}\,t_l + m R_l\right\}\right)\right], \tag{25}$$

$$Dx = \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx, \tag{26}$$

and where extr denotes extremization. The sum denotes the sum over all possible states for the variable $y$, that is, $\pm 1$. Note also that we set $\beta = 1$. This choice of finite temperature decoding (in contrast to $\beta \to \infty$, which corresponds to the zero temperature limit) corresponds to the maximizer of posterior marginals (MPM) estimator, while the zero temperature decoding corresponds to the MAP estimator [25, 28]. The MPM estimator is known to be optimal for the purpose of decoding [26–28]. On top of that, in this paper we suppose that all the channel properties (i.e., the true values of $(p, r)$) are known to the decoder, which implies that the system's state we consider is located on the Nishimori line [26, 27].

To retrieve the free energy one has to extremize (24) with respect to the order parameters $q, \hat{q}, m, \hat{m}$.
This is done by solving the following saddle point equations

$$\frac{\partial f_{RS}}{\partial q} = 0 \Leftrightarrow \hat{q} = -2R^{-1}\sum_{y = \pm 1}\int_{-\infty}^{\infty}\left[\prod_{l=1}^{K} DR_l\right]\int_{-\infty}^{\infty}\left[\prod_{l=1}^{K} Dt_l\right]\frac{I'_q(y, R_l, t_l, m, q)}{I(y, R_l, t_l, m, q)} \times \left\{\frac{1}{2} + \frac{y}{2}\left[(1 - r - p)\,F_k(\{R_l\}) + (r - p)\right]\right\}, \tag{27}$$

$$\frac{\partial f_{RS}}{\partial m} = 0 \Leftrightarrow \hat{m} = R^{-1}\sum_{y = \pm 1}\int_{-\infty}^{\infty}\left[\prod_{l=1}^{K} DR_l\right]\int_{-\infty}^{\infty}\left[\prod_{l=1}^{K} Dt_l\right]\frac{I'_m(y, R_l, t_l, m, q)}{I(y, R_l, t_l, m, q)} \times \left\{\frac{1}{2} + \frac{y}{2}\left[(1 - r - p)\,F_k(\{R_l\}) + (r - p)\right]\right\}, \tag{28}$$

$$\frac{\partial f_{RS}}{\partial \hat{q}} = 0 \Leftrightarrow q = \int_{-\infty}^{\infty} DU \tanh^2\left(\sqrt{\hat{q}}\,U + \hat{m}\right), \tag{29}$$

$$\frac{\partial f_{RS}}{\partial \hat{m}} = 0 \Leftrightarrow m = \int_{-\infty}^{\infty} DU \tanh\left(\sqrt{\hat{q}}\,U + \hat{m}\right), \tag{30}$$

where

$$I'_q(y, R_l, t_l, m, q) = \frac{\partial I(y, R_l, t_l, m, q)}{\partial q}, \tag{31}$$

$$I'_m(y, R_l, t_l, m, q) = \frac{\partial I(y, R_l, t_l, m, q)}{\partial m}. \tag{32}$$

An error correcting code scheme typically admits two solutions: one where $m = q = 1$, called the ferromagnetic solution, and one where $m = q = 0$, called the paramagnetic solution. As the names indicate, these solutions come from the physical ferromagnet state and correspond to the case where the spins are all ordered ($m = q = 1$) or to the case where the spins take completely random states ($m = q = 0$). As we can deduce from equations (A3) and (A6), the ferromagnetic solution corresponds to decoding success since $m = 1$ implies perfect overlap. Conversely, the paramagnetic phase implies failure in the decoding process (overlap $m$ is 0).

A. Replica symmetric solution using a parity tree with non-monotonic hidden units

Using a parity tree with non-monotonic hidden units (5), the encoder function becomes

$$F_k(\{u_l\}) = \prod_{l=1}^{K} f_k(u_l). \tag{33}$$

Using this encoder function and substituting $m = q = 0$ in the saddle point equations, one can find a consistent solution where $q = m = \hat{q} = \hat{m} = 0$. This corresponds to the paramagnetic solution, where decoding of the received message fails.
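The two saddle point equations (29) and (30) involve only a one-dimensional Gaussian integral, so their behavior in the paramagnetic and ferromagnetic limits can be checked numerically. The following sketch is an illustration, not the procedure used in the paper: the function names are hypothetical, the Gaussian measure is integrated with a plain trapezoid rule on a truncated interval, and the ferromagnetic limit is probed along $\hat{m} = \hat{q}$, which is an assumption made here for simplicity.

```python
import math


def gauss_average(g, half_width=12.0, n=4801):
    """Approximate the Gaussian average of g(U), with DU as in Eq. (26).

    Uses the trapezoid rule on [-half_width, half_width]; the tails beyond
    that range carry negligible Gaussian weight.
    """
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        u = -half_width + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * math.exp(-0.5 * u * u) * g(u)
    return total * h / math.sqrt(2.0 * math.pi)


def overlap_m(q_hat, m_hat):
    """Right-hand side of Eq. (30)."""
    return gauss_average(lambda u: math.tanh(math.sqrt(q_hat) * u + m_hat))


def overlap_q(q_hat, m_hat):
    """Right-hand side of Eq. (29)."""
    return gauss_average(lambda u: math.tanh(math.sqrt(q_hat) * u + m_hat) ** 2)
```

With $\hat{q} = \hat{m} = 0$ both functions return 0, the paramagnetic values. Increasing the two conjugate parameters together drives both overlaps towards 1, the ferromagnetic values.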
Using the conditions $q = m = \hat{q} = \hat{m} = 0$ in (24), one can retrieve the free energy of the paramagnetic phase,

$$-f_{para} = -H_2\left(\frac{1}{2}\left[(1 - p)(1 + \Omega_{PTH}) + r(1 - \Omega_{PTH})\right]\right) \times \ln 2, \tag{34}$$

where

$$\Omega_{PTH} = \prod_{l=1}^{K}\int_{-\infty}^{+\infty} Dz_l\, f_k(z_l). \tag{35}$$

In the same way, substituting $m = q = 1$ in the saddle point equations, one can find a consistent solution. However, the ferromagnetic solution cannot be computed analytically, so we proceed numerically by simply checking the integrand of equations (27) and (28). We did that extensively for $K = 1$, $K = 2$, and $K = 3$. In each case we found that the integrand diverges, so that when $(q, m) \to (1, 1)$ we have both $\hat{q} \to \infty$ and $\hat{m} \to \infty$. Substituting $\hat{q} \to \infty$ and $\hat{m} \to \infty$ into (29) and (30) clearly yields $q = m = 1$. So $q = m = 1$, $\hat{q} \to \infty$ and $\hat{m} \to \infty$ is a consistent solution of the saddle point equations which corresponds to the ferromagnetic solution, where decoding of the received message succeeds. We also checked higher values of $K$ (up to $K = 5$) and did not find any other consistent solution. We conjecture that this result holds for any finite value of $K$. Finally, substituting $m = q = 1$, $\hat{m} \to \infty$ and $\hat{q} \to \infty$ into (24), one can get the free energy of the ferromagnetic phase,

$$-f_{ferro} = -\frac{\ln 2}{2}\left[(1 + \Omega_{PTH}) H_2(p) + (1 - \Omega_{PTH}) H_2(r)\right] - R\ln 2. \tag{36}$$

Note that when $K = 1$, the present scheme corresponds to the case of Shinzato et al. [17]. The result we obtained when $K = 1$ is indeed equivalent to what they found.

B. Replica symmetric solution using a committee tree with non-monotonic hidden units

When a committee tree with non-monotonic hidden units (6) is used, the encoder function becomes

$$F_k(\{u_l\}) = \mathrm{sgn}\left[\sum_{l=1}^{K} f_k(u_l)\right]. \tag{37}$$

Using this encoder function and substituting $m = q = 0$ in the saddle point equations, one can find a consistent solution where $q = m = \hat{q} = \hat{m} = 0$.
This corresponds to the paramagnetic solution, where decoding of the received message fails. Using these conditions in (24), one can retrieve the free energy of the paramagnetic phase,

$$-f_{para} = -H_2\left(\frac{1}{2}\left[(1 - p)(1 + \Omega_{CTH}) + r(1 - \Omega_{CTH})\right]\right) \times \ln 2, \tag{38}$$

where

$$\Omega_{CTH} = \int_{-\infty}^{+\infty}\left[\prod_{l=1}^{K} Dz_l\right] \times \mathrm{sgn}\left[\sum_{l=1}^{K} f_k(z_l)\right]. \tag{39}$$

In the same way, by substituting $m = q = 1$ in the saddle point equations one can find a consistent solution. However, the ferromagnetic solution cannot be computed analytically, so we proceed numerically by simply checking the integrand of equations (27) and (28). We did that extensively for $K = 3$ (we consider only odd values of $K$ for this scheme, and when $K = 1$ the present scheme is equivalent to the parity tree case). We found that the integrand diverges, so that when $(q, m) \to (1, 1)$ we have both $\hat{q} \to \infty$ and $\hat{m} \to \infty$. We also checked higher values of $K$ (up to $K = 5$) and did not find any other consistent solution. We conjecture that this result holds for any finite value of $K$. Finally, substituting $m = q = 1$, $\hat{m} \to \infty$ and $\hat{q} \to \infty$ into (24), one can get the free energy of the ferromagnetic phase,

$$-f_{ferro} = -\frac{\ln 2}{2}\left[(1 + \Omega_{CTH}) H_2(p) + (1 - \Omega_{CTH}) H_2(r)\right] - R\ln 2. \tag{40}$$

C. Replica symmetric solution using a committee tree with a non-monotonic output unit

When a committee tree with a non-monotonic output unit (7) is used, the encoder function becomes

$$F_k(\{u_l\}) = f_k\left[\sqrt{\frac{1}{K}}\sum_{l=1}^{K} \mathrm{sgn}(u_l)\right]. \tag{41}$$

Using this encoder function and substituting $m = q = 0$ in the saddle point equations does not imply $\hat{m} = \hat{q} = 0$, and a non-trivial solution is found, which makes the free energy too complex to be investigated. This scheme is likely to give non-optimal performance in such a case and will not be considered in what follows.
Note that the limit where $K \to \infty$ was not studied because the saddle point equations take a non-trivial form that is difficult to investigate (in the lossy compression case, this study is still tractable). The techniques to investigate the free energy in the $K \to \infty$ limit described in reference [24] cannot be easily applied here. However, based on the previous results of Cousseau et al. [13], it is probable that in the $K \to \infty$ limit, the committee tree with a non-monotonic output unit saturates the Shannon bound in the general BAC case.

D. Replica symmetric solution using a parity tree

Using a parity tree (8), the encoder function becomes

$$F_k(\{u_l\}) = \prod_{l=1}^{K} \mathrm{sgn}(u_l). \tag{42}$$

Using this encoder function and substituting $m = q = 0$ in the saddle point equations, one can find a consistent solution where $q = m = \hat{q} = \hat{m} = 0$, but only when $K > 1$. This corresponds to the paramagnetic solution, where decoding of the received message fails. Using these conditions in (24), one can retrieve the free energy of the paramagnetic phase,

$$-f_{para} = -H_2\left(\frac{1}{2}\left[(1 - p)(1 + \Omega_{PT}) + r(1 - \Omega_{PT})\right]\right) \times \ln 2, \tag{43}$$

where

$$\Omega_{PT} = \prod_{l=1}^{K}\int_{-\infty}^{+\infty} Dz_l \times \mathrm{sgn}(z_l). \tag{44}$$

When $K = 1$ is considered, $m = q = 0$ does not imply $\hat{m} = \hat{q} = 0$ and a non-trivial solution is found that makes the free energy too complex to be investigated. The scheme is likely to give non-optimal performance in such a case and will not be considered in what follows.

In the same way, substituting $m = q = 1$ in the saddle point equations, one can find a consistent solution, but only when $K > 1$. However, the ferromagnetic solution cannot be computed analytically, so we proceed numerically by simply checking the integrand of equations (27) and (28). We did that extensively for $K = 2$ and $K = 3$. In each case, we found that the integrand diverges, so that when $(q, m) \to (1, 1)$ we have both $\hat{q} \to \infty$ and $\hat{m} \to \infty$.
We also checked higher values of $K$ (up to $K = 5$) and did not find any other consistent solution. We conjecture that this result holds for any finite value of $K > 1$. Finally, substituting $m = q = 1$, $\hat{m} \to \infty$ and $\hat{q} \to \infty$ into (24), one can get the free energy of the ferromagnetic phase,

$$-f_{ferro} = -\frac{\ln 2}{2}\left[(1 + \Omega_{PT}) H_2(p) + (1 - \Omega_{PT}) H_2(r)\right] - R\ln 2. \tag{45}$$

E. Replica symmetric solution using a committee tree

Using a committee tree (9), the encoder function becomes

$$F_k(\{u_l\}) = \mathrm{sgn}\left[\sqrt{\frac{1}{K}}\sum_{l=1}^{K} \mathrm{sgn}(u_l)\right]. \tag{46}$$

Using this encoder function and substituting $m = q = 0$ in the saddle point equations does not imply $\hat{m} = \hat{q} = 0$, and a non-trivial solution is found that makes the free energy too complex to be investigated. This scheme is likely to give non-optimal performance in such a case and will not be considered in what follows. As in the lossy compression case [12], the committee tree is unable to yield Shannon optimal performance.

Note that the limit where $K \to \infty$ was not studied because the saddle point equations take a non-trivial form that is difficult to investigate (in the lossy compression case, this study is still tractable). The techniques to investigate the free energy in the $K \to \infty$ limit described in reference [24] cannot be easily applied here. However, based on the previous results of Mimura et al. [12], it is probable that in the $K \to \infty$ limit the committee tree still fails to saturate the Shannon bound even in the BSC case.

VI. PHASE TRANSITION

For the parity and committee trees with non-monotonic hidden units and for the standard parity tree, we found a paramagnetic and a ferromagnetic solution of the following form:

$$-f_{para} = -H_2\left(\frac{1}{2}\left[(1 - p)(1 + \Omega) + r(1 - \Omega)\right]\right) \times \ln 2, \tag{47}$$

$$-f_{ferro} = -\frac{\ln 2}{2}\left[(1 + \Omega) H_2(p) + (1 - \Omega) H_2(r)\right] - R\ln 2, \tag{48}$$

where $\Omega$ is given by $\Omega_{PTH}$, $\Omega_{CTH}$, or $\Omega_{PT}$ depending on the encoder considered.
It then becomes possible to calculate the critical value of the code rate $R$ at which a sharp phase transition occurs between the ferromagnetic and the paramagnetic phase. This indicates the boundary between possible decoding (ferromagnetic phase) and impossible decoding (paramagnetic phase). In other words, this enables us to calculate the optimal code rate for each scheme. At the phase transition point, we have

$$f_{para} = f_{ferro}. \tag{49}$$

Simple algebra leads to

$$R = H_2(\gamma) - \frac{1 + \Omega}{2} H_2(p) - \frac{1 - \Omega}{2} H_2(r), \tag{50}$$

where

$$\gamma = \frac{1}{2}\left[(1 - p)(1 + \Omega) + r(1 - \Omega)\right] \tag{51}$$

and where $\Omega$ is given by the encoder considered ($\Omega_{PTH}$, $\Omega_{CTH}$, or $\Omega_{PT}$). This equation has exactly the same form as the BAC capacity equation (13) and in fact is equivalent to the BAC capacity if and only if $\Omega = \Omega_C$. Since $\Omega$ depends on the encoder, we will treat each case in the following subsections.

A. Tuning of the parity tree with non-monotonic hidden units

In the parity tree with non-monotonic hidden units case, we have

$$\Omega \equiv \Omega_{PTH} = \prod_{l=1}^{K}\int_{-\infty}^{+\infty} Dz_l\, f_k(z_l). \tag{52}$$

The parity tree with non-monotonic hidden units is optimal if and only if

$$\Omega_{PTH} = \Omega_C \Leftrightarrow H(k) = \frac{1}{4}\left(1 - \sqrt[K]{\Omega_C}\right), \tag{53}$$

where

$$H(x) = \int_{x}^{+\infty} Dz. \tag{54}$$

This gives us a condition on the threshold parameter $k$ of the non-monotonic transfer function $f_k$. If the threshold $k$ is tuned to satisfy (53), the scheme achieves the Shannon limit. The only remaining issue is whether such an optimal threshold $k$ exists. We solved (53) numerically with parameters $(p, r) \in\, ]0, 1[^2$ and always found an optimal threshold parameter $k$ up to $K = 11$. Note that $\Omega_C$ can be negative, which causes problems for the $K$-th root when considering an even number of hidden units $K$. However, a simple permutation of the probabilities $p$ and $r$ changes the sign of $\Omega_C$.
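The tuning condition (53) can be checked numerically: compute $\Omega_C$ and $C_{BAC}$ from (13)–(17), invert (53) for $k$, and confirm that the critical rate (50) evaluated at $\Omega_{PTH}(k)$ matches the capacity. The sketch below is illustrative and not the authors' code. The function names are hypothetical, $0 < p, r < 1$ with $p + r \neq 1$ is assumed, $H(k)$ is obtained from the standard normal distribution in Python's `statistics` module, and (52) is evaluated through $\int Dz\, f_k(z) = 1 - 4H(k)$, which follows from (10) and (54).

```python
import math
from statistics import NormalDist


def h2(x):
    """Binary entropy in bits, Eq. (14)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def bac_optimum(p, r):
    """Return (Omega_C, gamma_C, C_BAC) following Eqs. (13)-(17)."""
    log_delta = (
        r * math.log(r) + (1 - r) * math.log(1 - r)
        - p * math.log(p) - (1 - p) * math.log(1 - p)
    ) / (1 - r - p)                                   # ln(Delta_C), Eq. (16)
    gamma = 1.0 / (1.0 + math.exp(log_delta))         # Eq. (15)
    omega = (2 * gamma - 1 - r + p) / (1 - r - p)     # Eq. (17)
    capacity = critical_rate(omega, p, r)             # Eq. (13)
    return omega, gamma, capacity


def critical_rate(omega, p, r):
    """Critical code rate of Eq. (50) for a given encoder bias Omega."""
    gamma = 0.5 * ((1 - p) * (1 + omega) + r * (1 - omega))  # Eq. (51)
    return h2(gamma) - (1 + omega) / 2 * h2(p) - (1 - omega) / 2 * h2(r)


def omega_pth(k, K):
    """Eq. (52) in closed form: (1 - 4 H(k))^K."""
    tail = 1.0 - NormalDist().cdf(k)  # H(k) of Eq. (54)
    return (1.0 - 4.0 * tail) ** K


def pth_threshold(omega_c, K):
    """Solve Eq. (53) for k; a real K-th root of a negative Omega_C needs odd K."""
    if omega_c < 0:
        if K % 2 == 0:
            raise ValueError("negative Omega_C with even K: exchange p and r first")
        root = -((-omega_c) ** (1.0 / K))
    else:
        root = omega_c ** (1.0 / K)
    tail = (1.0 - root) / 4.0
    return NormalDist().inv_cdf(1.0 - tail)
```

For example, `bac_optimum(0.1, 0.3)` followed by `pth_threshold(omega, 3)` gives a threshold whose `omega_pth` value reproduces $\Omega_C$, so `critical_rate` then coincides with the capacity. Any other value of $\Omega$ gives a strictly smaller critical rate, since the capacity is the maximum of the mutual information over the input bias.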
Since the original messages are drawn from the uniform distribution, this permutation can be done without any loss of generality. Instead of using $\boldsymbol{s}^0$, one uses $-\boldsymbol{s}^0$. We did not check higher values of $K$, but we conjecture that the same result holds. This means that the parity tree with non-monotonic hidden units saturates the Shannon bound in the large codeword length limit for any number of hidden units $K$.

B. Tuning of the committee tree with non-monotonic hidden units

In the committee tree with non-monotonic hidden units case, we have

$$ \Omega \equiv \Omega_{CTH} = \int_{-\infty}^{+\infty} \left[\prod_{l=1}^{K} Dz_l\right] \times \mathrm{sgn}\left[\sum_{l=1}^{K} f_k(z_l)\right]. \tag{55} $$

The committee tree with non-monotonic hidden units is optimal if and only if

$$ \Omega_{CTH} = \Omega_C \;\Leftrightarrow\; \Omega_C = \sum_{l=0}^{\frac{K-1}{2}} \binom{K}{l} \left\{ [2H(k)]^{l}\, [1 - 2H(k)]^{K-l} - [2H(k)]^{K-l}\, [1 - 2H(k)]^{l} \right\}, \tag{56} $$

where $\binom{x}{y}$ denotes the binomial coefficient. This gives us a condition on the threshold parameter $k$ of the non-monotonic transfer function $f_k$. If the threshold $k$ is tuned to satisfy (56), the scheme achieves the Shannon limit. Thus, we should check if such an optimal threshold $k$ exists. We solved (56) numerically with parameters $(p, r) \in \{\,]0, 1[\,\}^2$ and always found an optimal threshold parameter $k$ up to $K = 11$. We did not check higher values of $K$, but we conjecture that the same result holds. Note that, as mentioned in the definition of this encoder, we considered only an odd number of hidden units $K$. Therefore, these results mean that the committee tree with non-monotonic hidden units saturates the Shannon bound in the large codeword length limit for any odd number of hidden units $K$.

C. Tuning of the parity tree

In the parity tree case, we have

$$ \Omega \equiv \Omega_{PT} = \prod_{l=1}^{K} \int_{-\infty}^{+\infty} Dz_l \times \mathrm{sgn}(z_l). \tag{57} $$

The parity tree is optimal if and only if

$$ \Omega_{PT} = \Omega_C \;\Leftrightarrow\; \Omega_C = 0. \tag{58} $$

This gives us a strong condition on $\Omega_C$.
From the definition (17), it can be easily seen that $\Omega_C = 0$ if and only if $r = p$, that is, when the BAC turns into the particular case of the BSC. This means that the standard monotonic parity tree saturates the Shannon bound in the large codeword length limit, but only in the BSC case and for a number of hidden units $K > 1$. This confirms what we expected and is the equivalent of the lossy compression result of Mimura et al. [12].

VII. CONCLUSION AND DISCUSSION

We investigated an error correcting code scheme for uniformly unbiased Boolean messages using parity tree and committee tree multilayer perceptrons. All the schemes which use the non-monotonic transfer function $f_k$ in their hidden layer were shown to saturate the Shannon bound under some specific conditions. The use of $f_k$ enables the relevant schemes to deal with asymmetric channels like the BAC, while monotonic networks using only the standard sign function can deal only with symmetric channels like the BSC. Indeed, we confirmed that the standard monotonic parity tree saturates the Shannon bound only in the case of the BSC. The standard monotonic committee tree, however, fails to provide optimal performance even in the BSC case.

As a general conclusion, this paper shows that tree-like multilayer perceptrons introduced in [10, 12, 13] within the framework of lossy compression can also be used efficiently in an error correcting code scheme. For each network considered, we provided a theoretical analysis of the typical performance and gave the necessary conditions for obtaining optimal performance. In each case, we were able to derive results similar to the lossy compression results. Finally, in the case of error correcting code, the stability of the replica symmetric solution [18] was not checked because no replica symmetry breaking is expected on the Nishimori line [29].
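The tuning conditions (53) and (56) summarized above can be explored numerically. The sketch below is illustrative only and is not the authors' procedure. It assumes that $\Omega_C$ can be obtained as the value of $\Omega$ that maximizes the rate (50), used here as a stand-in for definition (17), and it evaluates $H(k)$ with the complementary error function. All function names are hypothetical.

```python
import math


def h2(x):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def bac_rate(omega, p, r):
    """Right-hand side of Eq. (50) for a given Omega."""
    gamma = 0.5 * ((1.0 - p) * (1.0 + omega) + r * (1.0 - omega))
    return h2(gamma) - 0.5 * (1.0 + omega) * h2(p) - 0.5 * (1.0 - omega) * h2(r)


def omega_c(p, r, iters=200):
    """Assumed stand-in for definition (17): maximize Eq. (50) over Omega
    in [-1, 1] by ternary search (the rate is concave in Omega)."""
    lo, hi = -1.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if bac_rate(m1, p, r) < bac_rate(m2, p, r):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def gauss_tail(x):
    """H(x) of Eq. (54): upper tail of the standard Gaussian measure."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bisect_increasing(func, target, lo=0.0, hi=40.0, iters=200):
    """Solve func(k) = target for a function that increases with k."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def parity_threshold(oc, K):
    """Threshold k satisfying Eq. (53) for the non-monotonic parity tree."""
    if oc < 0.0 and K % 2 == 0:
        raise ValueError("even K needs Omega_C >= 0; swap p and r first")
    root = math.copysign(abs(oc) ** (1.0 / K), oc)  # real K-th root
    target = 0.25 * (1.0 - root)
    # -H(k) increases with k, so solve -H(k) = -target.
    return bisect_increasing(lambda k: -gauss_tail(k), -target)


def omega_cth(k, K):
    """Right-hand side of Eq. (56) for odd K."""
    a = 2.0 * gauss_tail(k)
    b = 1.0 - a
    return sum(
        math.comb(K, l) * (a ** l * b ** (K - l) - a ** (K - l) * b ** l)
        for l in range((K - 1) // 2 + 1)
    )


def committee_threshold(oc, K):
    """Threshold k satisfying Eq. (56) for the non-monotonic committee tree."""
    if K % 2 == 0:
        raise ValueError("Eq. (56) is stated for odd K only")
    return bisect_increasing(lambda k: omega_cth(k, K), oc)


if __name__ == "__main__":
    p, r, K = 0.1, 0.2, 3  # illustrative values, not taken from the paper
    oc = omega_c(p, r)
    print("Omega_C     :", oc)
    print("k, parity   :", parity_threshold(oc, K))
    print("k, committee:", committee_threshold(oc, K))
```

Bisection is sufficient here because, for odd $K$, both $(1 - 4H(k))^K$ and the right-hand side of (56) increase monotonically with $k$; for even $K$ the parity condition is handled through the non-negative $K$-th root, as in (53).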
This paper discusses only the typical performance of the schemes at the infinite codeword length, however, and does not provide any explicit decoder. Because the present schemes make use of densely connected systems, a formal decoder cannot be implemented, as it would require a decoding time that grows exponentially with the size of the original message. One promising alternative is to use the popular belief propagation (BP) algorithm to calculate an approximation of the marginalized posterior probabilities. The BP algorithm is known for giving good results when working in the ferromagnetic phase, where no frustration is present in the system. With the previous work done on lossy compression [10, 12, 13, 30] and on error correcting code [17] using perceptron type networks, there is now a sufficient theoretical background to investigate and compare the practical performance (at finite codeword length) of all the schemes with the theoretical performance. In the case of lossy compression with a simple perceptron, the study of the BP algorithm performance has already been done by Hosaka et al. [30]. Their work provides a solid base from which to begin investigating the more complicated multilayer structure. The influence of the number of hidden units on the practical performance of the scheme is an interesting issue which will be examined in future work.

Acknowledgements

This work was partially supported by a Grant-in-Aid for Encouragement of Young Scientists (B) (Grant No. 18700230), Grant-in-Aid for Scientific Research on Priority Areas (Grant Nos. 18079003, 18020007), Grant-in-Aid for Scientific Research (C) (Grant No. 16500093), and a Grant-in-Aid for JSPS Fellows (Grant No. 06J06774) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.
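Appendix A: Analytical Evaluation using the replica method

The free energy can be evaluated by the replica method,

$$ f(\beta, R) = -\frac{1}{\beta N} \lim_{n \to 0} \frac{\left\langle Z(\beta, \boldsymbol{y}, \boldsymbol{x})^n \right\rangle_{\boldsymbol{y}, \boldsymbol{x}} - 1}{n}, \tag{A1} $$

where $Z(\beta, \boldsymbol{y}, \boldsymbol{x})^n$ denotes the $n$-times replicated partition function

$$ Z(\beta, \boldsymbol{y}, \boldsymbol{x})^n = \sum_{\boldsymbol{s}^1, \ldots, \boldsymbol{s}^n} \prod_{a=1}^{n} \exp\left[-\beta \mathcal{H}(\boldsymbol{y}, \hat{\boldsymbol{y}}(\boldsymbol{s}^a))\right]. \tag{A2} $$

Vector $\boldsymbol{s}^a$ is given by $\boldsymbol{s}^a = (\boldsymbol{s}^a_1, \ldots, \boldsymbol{s}^a_K)$ and superscript $a$ denotes the replica index. We proceed to the calculation of the replicated partition function (A2). Inserting the following two identities,

$$ 1 = \prod_{a=1}^{n} \prod_{l=1}^{K} \int_{-\infty}^{+\infty} dm^a_l\, \delta\!\left(\boldsymbol{s}^0_l \cdot \boldsymbol{s}^a_l - \frac{N}{K} m^a_l\right) = \left(\frac{1}{2\pi i}\right)^{nK} \int \left(\prod_{a} \prod_{l} dm^a_l\, d\hat{m}^a_l\right) \times \exp\left[\sum_{a} \sum_{l} \hat{m}^a_l \left(\boldsymbol{s}^0_l \cdot \boldsymbol{s}^a_l - \frac{N}{K} m^a_l\right)\right] \tag{A3} $$

and $1 = \prod_{a}^{n} \ldots$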