Fast Arithmetics Using Chinese Remaindering

George Davida ∗, Bruce Litow † and Guangwu Xu ‡

∗ Department of EE & CS, University of Wisconsin-Milwaukee, WI, USA; e-mail: davida@cs.uwm.edu
† School of Information Technology, James Cook University, Townsville, QLD, Australia; e-mail: bruce@cs.jcu.edu.au
‡ Department of EE & CS, University of Wisconsin-Milwaukee, WI, USA; e-mail: gxu4uwm@uwm.edu

Abstract

In this paper, some issues concerning the Chinese remaindering representation are discussed. Some new converting methods, including an efficient probabilistic algorithm based on a recent result of von zur Gathen and Shparlinski [5], are described. An efficient refinement of the $NC^1$ division algorithm of Chiu, Davida and Litow [2] is given, where the number of moduli is reduced by a factor of $\log n$.

Keywords: Parallel algorithm; Chinese remaindering representation.

1 Introduction

For the fundamental arithmetic operations, it is often desirable to represent an integer as a vector of smaller integers. This can be done by selecting a set of pairwise coprime positive integers $m_1, m_2, \ldots, m_r$, and mapping an integer $x$ to the vector of residues $(|x|_{m_1}, |x|_{m_2}, \ldots, |x|_{m_r})$, where $|x|_{m_i}$ denotes $x \pmod{m_i}$. This approach is called the Chinese remaindering representation (CRR), as the Chinese remainder theorem (CRT) guarantees that such a mapping is meaningful. Using CRR, large calculations can be split into a series of smaller calculations that can be performed independently and in parallel. This approach therefore has a significant role to play in applications such as cryptography and high-precision scientific computation.

It is well known that the three basic arithmetic operations, addition, subtraction, and multiplication, can be performed in $O(\log n)$ time using $n^{O(1)}$ processors. These operations can also be done in a log-space uniform manner.
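As a small illustration of the residue mapping and of why addition, subtraction and multiplication split into independent per-modulus computations, the following Python sketch may help. The function names (`to_crr`, `crr_add`, `crr_sub`, `crr_mul`) and the sample moduli are assumptions chosen for this example and do not come from the paper.

```python
from math import prod

def to_crr(x, moduli):
    """Map an integer x to its residue vector (|x|_{m_1}, ..., |x|_{m_r})."""
    return tuple(x % m for m in moduli)

def crr_add(a, b, moduli):
    """Componentwise addition; each coordinate is independent of the others."""
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, moduli))

def crr_sub(a, b, moduli):
    """Componentwise subtraction."""
    return tuple((ai - bi) % m for ai, bi, m in zip(a, b, moduli))

def crr_mul(a, b, moduli):
    """Componentwise multiplication."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))

# Hypothetical sample data: pairwise coprime moduli with product M = 5005.
moduli = (5, 7, 11, 13)
M = prod(moduli)
x, y = 123, 45

# The residues of x*y agree with the componentwise product of the residues.
product_residues = crr_mul(to_crr(x, moduli), to_crr(y, moduli), moduli)
```

Each coordinate only ever involves numbers smaller than its own modulus, which is what allows the coordinates to be processed in parallel. The results are only meaningful as integers modulo $M$.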
The parallel complexity of integer division, however, is a subtle problem and has attracted a lot of attention. The first $O(\log n)$ time, $n^{O(1)}$ sized circuit for integer division was exhibited by Beame, Cook and Hoover [1]. More recently, a log-depth, polynomial size, logspace-uniform circuit family for integer division (i.e., integer division is in logspace-uniform $NC^1$) was described by Chiu, Davida and Litow [2]. This settled a longstanding open problem and provided a theoretically optimal computation efficiency.

In this paper, we discuss some issues concerning the Chinese remaindering representation. The organization of the paper is as follows. Section 2 describes the Chinese remaindering system. Two methods for converting a vector to the corresponding integer are presented in this section. Section 3 focuses on integer division using CRR. Under the framework of $NC^1$, an efficient refinement of the division algorithm of Chiu, Davida and Litow [2] is proposed.

2 Chinese Remainder Representation

Let $\mathcal{M} = \{m_1, m_2, \ldots, m_r\}$ be a set of pairwise coprime integers and

$$M = \prod_{i=1}^{r} m_i.$$

For a set of integers $x_1, x_2, \ldots, x_r$ with $0 \le x_i < m_i$, the Chinese Remainder Theorem says that the system of congruences

$$\begin{cases}
x \equiv x_1 \pmod{m_1} \\
x \equiv x_2 \pmod{m_2} \\
\quad\cdots \\
x \equiv x_r \pmod{m_r}
\end{cases}$$

has a unique solution $0 \le x < M$. In fact, using the extended Euclidean algorithm, one finds integers $u_1, u_2, \ldots, u_r$ such that

$$\sum_{i=1}^{r} u_i \frac{M}{m_i} = 1,$$

and it is easy to verify that

$$x = \sum_{i=1}^{r} x_i u_i \frac{M}{m_i} \pmod{M} \tag{1}$$

gives the desired solution. We remark that one can also choose $u_i = \left(\frac{M}{m_i}\right)^{-1} \pmod{m_i}$; this choice of $u_i$ will be used in the rest of our discussion. The above system is called a Chinese remaindering representation (CRR) based on the set $\mathcal{M}$, and is denoted by CRR($\mathcal{M}$).
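Formula (1) with the choice $u_i = (M/m_i)^{-1} \pmod{m_i}$ can be sketched in a few lines of Python. This is only a minimal illustration: the function name `crt_reconstruct` is an assumption, and the modular inverse is obtained from Python's built-in three-argument `pow` (Python 3.8 or later) rather than from an explicit extended Euclidean routine.

```python
from math import prod

def crt_reconstruct(residues, moduli):
    """Return the unique 0 <= x < M with x = x_i (mod m_i), following formula (1)."""
    M = prod(moduli)
    x = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i            # the cofactor M / m_i
        u_i = pow(M_i, -1, m_i)   # u_i = (M / m_i)^(-1) mod m_i
        x += x_i * u_i * M_i
    return x % M

# Classical example: x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7).
solution = crt_reconstruct((2, 3, 2), (3, 5, 7))  # 23
```

The term for index $i$ is congruent to $x_i$ modulo $m_i$ and to $0$ modulo every other modulus, which is why the sum satisfies all $r$ congruences at once.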
Now we present a method of finding the $u_i$'s which can be seen as an alternative to the Garner algorithm described in [7] (pages 290, 293). For each $j > 1$, $m_j$ is coprime to $m_1 \cdots m_{j-1}$. Therefore, by the extended Euclidean algorithm, there exist integers $\alpha_j, \beta_j$ such that

$$\alpha_j m_j + \beta_j m_1 \cdots m_{j-1} = 1. \tag{2}$$

With these $r - 1$ pairs $(\alpha_j, \beta_j)$, the coefficients $u_i$ can be computed as follows:

$$\begin{aligned}
u_1 &\leftarrow \alpha_2 \alpha_3 \cdots \alpha_r \pmod{m_1} \\
u_2 &\leftarrow \beta_2 \alpha_3 \cdots \alpha_r \pmod{m_2} \\
u_3 &\leftarrow \beta_3 \alpha_4 \cdots \alpha_r \pmod{m_3} \\
&\;\;\vdots \\
u_r &\leftarrow \beta_r \pmod{m_r}
\end{aligned}$$

The correctness of the above algorithm is based on the following identity:

$$(\alpha_2 \cdots \alpha_r)\, m_2 m_3 \cdots m_r + (\beta_2 \alpha_3 \cdots \alpha_r)\, m_1 m_3 \cdots m_r + (\beta_3 \alpha_4 \cdots \alpha_r)\, m_1 m_2 m_4 \cdots m_r + \cdots + \beta_r\, m_1 m_2 \cdots m_{r-1} = 1.$$

This identity can be verified by induction: for $i > 2$, suppose that

$$(\alpha_2 \cdots \alpha_{i-1})\, m_2 m_3 \cdots m_{i-1} + (\beta_2 \alpha_3 \cdots \alpha_{i-1})\, m_1 m_3 \cdots m_{i-1} + (\beta_3 \alpha_4 \cdots \alpha_{i-1})\, m_1 m_2 m_4 \cdots m_{i-1} + \cdots + \beta_{i-1}\, m_1 m_2 \cdots m_{i-2} = 1.$$

Multiplying both sides of the above by $\alpha_i m_i$ and applying equation (2) for $j = i$ (that is, $\alpha_i m_i = 1 - \beta_i m_1 \cdots m_{i-1}$), one gets

$$(\alpha_2 \cdots \alpha_i)\, m_2 m_3 \cdots m_i + (\beta_2 \alpha_3 \cdots \alpha_i)\, m_1 m_3 \cdots m_i + (\beta_3 \alpha_4 \cdots \alpha_i)\, m_1 m_2 m_4 \cdots m_i + \cdots + \beta_i\, m_1 m_2 \cdots m_{i-1} = 1.$$

We remark that in this process the extended Euclidean algorithm is called $r - 1$ times. For the method described in [7], $\frac{r(r-1)}{2}$ instances of the extended Euclidean algorithm need to be invoked, one for each pair $(m_i, m_j)$ with $i < j$.

Next we present a probabilistic converting method for CRT. For positive integers $N_1, N_2$, let $a_1, a_2, \ldots, a_r$ be in $\{1, 2, \ldots, N_1\}$. Pick $2r$ uniformly distributed random integers $s_1, s_2, \ldots, s_r$ and $t_1, t_2, \ldots, t_r$ in $\{1, 2, \ldots, N_2\}$ and consider the linear forms

$$S = \sum_{i=1}^{r} a_i s_i, \qquad T = \sum_{i=1}^{r} a_i t_i.$$
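To make the two linear forms concrete, the following Python sketch draws the coefficients $s_i, t_i$ and computes $S$, $T$ together with the two gcds being compared. The function name `random_linear_forms`, the sample values of $a_i$, the bound `N2 = 100` and the fixed seed are all assumptions made for this illustration.

```python
import random
from functools import reduce
from math import gcd

def random_linear_forms(a, N2, rng):
    """Return (S, T) with coefficients s_i, t_i drawn uniformly from {1, ..., N2}."""
    s = [rng.randint(1, N2) for _ in a]
    t = [rng.randint(1, N2) for _ in a]
    S = sum(a_i * s_i for a_i, s_i in zip(a, s))
    T = sum(a_i * t_i for a_i, t_i in zip(a, t))
    return S, T

# Hypothetical sample input with gcd(a_1, ..., a_r) = 6.
a = [12, 18, 30]
g = reduce(gcd, a)

rng = random.Random(0)          # fixed seed only to make the run repeatable
S, T = random_linear_forms(a, 100, rng)

# gcd(S, T) is always a multiple of g, because g divides every a_i.
# Whether it equals g depends on the random draw.
g_forms = gcd(S, T)
```

Since every $a_i$ is divisible by $\gcd(a_1, \ldots, a_r)$, so are $S$ and $T$; the only question is whether the random coefficients introduce an extra common factor.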
It has been proved by Cooperman, Feisel, von zur Gathen and Havas in [3] that with high probability

$$\gcd(a_1, a_2, \ldots, a_r) = \gcd(S, T). \tag{3}$$

This was improved recently by von zur Gathen and Shparlinski [5], who gave the following strong result: with probability at least $\frac{6}{\pi^2} + o(1)$,

$$\gcd(a_1, a_2, \ldots, a_r) = \gcd(S, T),$$

provided that $\frac{N_2}{r + \ln N_1}$ is large enough.

This result can be used to produce a very efficient probabilistic algorithm for Chinese remaindering. Let us take $a_i = \frac{M}{m_i}$. We can find $x$ such that

$$\begin{cases}
x \equiv x_1 \pmod{m_1} \\
x \equiv x_2 \pmod{m_2} \\
\quad\cdots \\
x \equiv x_r \pmod{m_r}
\end{cases}$$

by the following steps:

1. Choose random linear forms $S, T$ until $\gcd(S, T) = 1$. (The expected number of trials needed to get the desired pair $S, T$ is less than 2.)

2. Use the extended Euclidean algorithm to get integers $u, v$ such that
$$uS + vT = \sum_{i=1}^{r} (u s_i + v t_i) \frac{M}{m_i} = 1.$$

3. The solution $x$ is
$$x = \sum_{i=1}^{r} x_i (u s_i + v t_i) \frac{M}{m_i} \pmod{M}.$$

Remark. It can be seen that in this routine, if the extended Euclidean algorithm is used to compute all gcds, then the expected number of rounds to get $u, v$ in step 2 is less than 2. In step 3, $u s_i + v t_i$ can be replaced by $(u s_i + v t_i) \pmod{m_i}$.

3 An Improved $NC^1$ Division Algorithm

In this section, we discuss the division algorithm of Chiu, Davida and Litow [2]. A careful analysis enables us to reduce the number of prime moduli by a factor of $\log n$.

Let $\alpha$ be a real number. A rational number $\alpha'$ is said to be an $n$-bit underapproximation to $\alpha$ if

$$0 \le \alpha - \alpha' \le \frac{1}{2^n}.$$

The next result improves Lemma 3.2 of [2]:

Lemma 1. Let $\frac{1}{2} \le \alpha < 1$ and $\beta = 1 - \alpha$. If $\frac{t_1}{A_1}, \frac{t_2}{A_2}, \ldots, \frac{t_{n+1}}{A_{n+1}}$ are $(n+3)$-bit underapproximations to $\beta$, then

$$1 + \frac{t_1}{A_1} + \frac{t_1 t_2}{A_1 A_2} + \cdots + \prod_{i=1}^{n+1} \frac{t_i}{A_i}$$

is an $n$-bit underapproximation to $\frac{1}{\alpha}$.

Proof. Let $\eta = \min_{1 \le i \le n+1} \left\{ \frac{t_i}{A_i} \right\}$.
Noting that $0 \le \beta \le \frac{1}{2}$ and $0 \le \beta - \eta \le \frac{1}{2^{n+3}}$, we see that

$$\begin{aligned}
\frac{1}{\alpha} - \Bigl(1 + \frac{t_1}{A_1} + \frac{t_1 t_2}{A_1 A_2} + \cdots + \prod_{i=1}^{n+1} \frac{t_i}{A_i}\Bigr)
&\le \frac{1}{\alpha} - \bigl(1 + \eta + \eta^2 + \cdots + \eta^{n+1}\bigr) \\
&= \frac{1}{1-\beta} - \frac{1 - \eta^{n+2}}{1 - \eta} \\
&= \Bigl(\frac{1}{1-\beta} - \frac{1}{1-\eta}\Bigr) + \frac{\eta^{n+2}}{1-\eta} \\
&= \frac{\beta - \eta}{(1-\beta)(1-\eta)} + \frac{\eta^{n+2}}{1-\eta} \\
&\le \frac{\frac{1}{2^{n+3}}}{\frac{1}{2} \cdot \frac{1}{2}} + \frac{\frac{1}{2^{n+2}}}{\frac{1}{2}} = \frac{1}{2^n}.
\end{aligned}$$

In [2], the log-depth, polynomial size, logspace-uniform circuit family for integer division was constructed by Chiu, Davida and Litow. In other words, integer division is proved to be in logspace-uniform $NC^1$. This solves a longstanding open problem. Notice that the original construction of the $NC^1$ circuit family for integer division needs $3n^2$ (actually $2n^2 + 5n$) prime numbers. The main purpose of this section is to refine the Chiu-Davida-Litow construction to achieve more efficiency. To be more specific, we shall show that $\frac{n^2}{\log n} + 3n$ primes will be sufficient.

Theorem 1. The number of prime moduli of the Chiu-Davida-Litow $NC^1$ integer division algorithm can be reduced to $\frac{n^2}{\log n} + 3n$.

Proof. The proof follows the same lines as in [2]. The goal is: given $x, y < 2^n$, compute the CRR of $\left\lfloor \frac{x}{y} \right\rfloor$.

Let $N = \left\lfloor \frac{n^2}{\log n} \right\rfloor + 3n$. Suppose that $x, y$ are represented in a CRR system with base $\{m_1, m_2, \ldots, m_n\}$, where $m_i$ is the $(i+2)$th prime ($m_1 > 3$). This base is extended to $\{m_1, m_2, \ldots, m_n, m_{n+1}, \ldots, m_N\}$.

A product $D$ of the initial part of the base and some power of 2 will be constructed so that

$$\frac{1}{2} \le \frac{y}{D} < 1.$$

According to [2], if $y = 2$, set $D = 2$. If $y > 2$, then take $j < n$ to be the number such that

$$m_1 m_2 \cdots m_j \le y < m_1 m_2 \cdots m_j m_{j+1}.$$

Let $k$ be the smallest positive integer such that $y < 2^k m_1 m_2 \cdots m_j$ (therefore $\frac{y}{2^k m_1 m_2 \cdots m_j} \ge \frac{1}{2}$), and set

$$D = 2^k m_1 m_2 \cdots m_j.$$

Let $r = \left\lfloor \frac{n}{\log n} \right\rfloor$. If $n \ge 2^6$, then $\frac{n - \log n - (\log n)^2}{\log n} > 3$.
The fact that $m_{n+1} > 2n$ gives

$$(m_{n+1})^r > (2n)^{\left\lfloor \frac{n}{\log n} \right\rfloor} \ge \bigl(2^{\log n + 1}\bigr)^{\frac{n}{\log n} - 1} = 2^{\,n + \frac{n - \log n - (\log n)^2}{\log n}} > 2^{n+3}. \tag{4}$$

Since $n + (n+1) r \le N$, we can form the following products:

$$\begin{aligned}
A_1 &= m_{n+1} m_{n+2} \cdots m_{n+r} \\
A_2 &= m_{n+r+1} m_{n+r+2} \cdots m_{n+2r} \\
&\;\;\vdots \\
A_{n+1} &= m_{n+nr+1} m_{n+nr+2} \cdots m_{n+(n+1)r}.
\end{aligned}$$

We note that $A_i > 2^{n+3}$ for $i = 1, 2, \ldots, n+1$, by (4). Next, choose

$$t_i = \left\lfloor \frac{(D - y) A_i}{D} \right\rfloor, \quad \text{for } i = 1, 2, \ldots, n+1.$$

Similar to [2], $\frac{t_i}{A_i}$ can be computed in $NC^1$. It is also routine to check that $\frac{t_i}{A_i}$ is an $(n+3)$-bit underapproximation to $\beta = \frac{D - y}{D}$.

Finally, by Lemma 1, we get an $n$-bit underapproximation to $\frac{1}{\alpha}$, where $\alpha = \frac{y}{D}$:

$$\gamma = 1 + \frac{t_1}{A_1} + \frac{t_1 t_2}{A_1 A_2} + \cdots + \frac{t_1 t_2 \cdots t_{n+1}}{A_1 A_2 \cdots A_{n+1}}.$$

Again, similar to [2], we have

$$\left\lfloor \frac{x}{y} \right\rfloor = \left\lfloor \frac{x \gamma}{D} \right\rfloor \quad \text{or} \quad \left\lfloor \frac{x}{y} \right\rfloor = \left\lfloor \frac{x \gamma}{D} \right\rfloor + 1,$$

and all the computations are done in $NC^1$.

Remark. The Chebyshev bounds for primes can be used to get an inequality which is a bit sharper than inequality (4), but there is no significant reduction in the number of prime moduli.

References

[1] P. Beame, S. Cook and J. Hoover, Log depth circuits for division and related problems, SIAM J. Comput. 15: 994-1003 (1986).

[2] A. Chiu, G. Davida and B. Litow, Division in logspace-uniform $NC^1$, Theoret. Informatics Appl. 35: 259-275 (2001).

[3] G. Cooperman, S. Feisel, J. von zur Gathen and G. Havas, GCD of many integers (Extended abstract), COCOON'99, LNCS vol. 1627, pp. 310-317 (1999).

[4] G. Davida and B. Litow, Fast Parallel Arithmetic via Modular Representation, SIAM J. Comput. 20(4): 756-765 (1991).

[5] J. von zur Gathen and I. Shparlinski, GCD of random linear forms, ISAAC 2004, LNCS vol. 3341, pp. 464-469 (2004).

[6] M. Hitz and E. Kaltofen, Integer division in residue number systems, IEEE Transactions on Computers 44(8): 983-989 (1995).

[7] D. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, 3rd edition, Addison-Wesley, Reading, 1997.
