A SWAR Approach to Counting Ones

A SW AR Approac h to Coun ting Ones Holger P etersen Reinsburgstr. 75 70197 Stuttgart German y Septem b er 3, 2018 Abstract W e inv estigate the complexity of algorithms counting ones in diﬀerent sets of operations. With addition and logical op erations (but no shift) O (log 2 ( n )) steps suﬃce to count o nes. P arity can b e computed with complexity O (log ( n )), whic h is the same b ound as for metho ds using shift- operations. If m ultiplication is av ailable, a solution of time complexity O (log ∗ ( n )) is p ossible improving the known boun d O (log lo g ( n )). 1 In tro duction The op eratio n o f counting the num b er of ones in a binary word consis ting o f n bits is als o known as sidew ays addition [9], bit co unt [8], or p opulation count [1]. F ast solutions based o n lo ok- up tables hav e be e n suggested by several a uthors (see the references in [11]), but for lar ge n they ar e clearly impractical. Under unit cost measur e, methods of asymptotical complexit y O (log log n ) are known, but they are based on mu ltiplicatio n like the one due to Gillies and Miller fro m the ﬁrst textb o o k on progr amming published in 1 9 57 [13, 9] or division as in [4, Item 169] (the a lgorithms are presented for spec iﬁc w or d lengths and we assume a gener alization in a natura l wa y). Since these mor e powerful arithmetical op erations might not b e executed as e ﬃcie ntly a s, e.g ., addition or lo gical ins tr uctions, we are int er ested in appro aches to the bit count problem avoiding these oper a tions, also ruling out shift instructions a s sp ecial cases of multiplication and divisio n. Exercise 2-9 o f [8] as ks for a solutio n of the bit count pr oblem based on an observ ation or ig inally due to W egner [12]. It is a remar k able fact that in tw o’s complement represe ntation for x 6 = 0 the right-most one of x ca n b e deleted b y the operation x & ( x − 1 ), where & deno tes the bit-wise AND of t wo v alues. With the help of this fact the bit count of the input can b e computed in a lo op tha t is executed once for each one. This will save considerable work in compar ison to a naive implementation when the n umber of one s is spar se. When the zero es are sparse the method can b e applied to the complement (this was already men tion in the last section o f [12]). The worst case as well as the av erage case complexity of these metho ds is how ever Θ( n ) (for the av erage c ase consider a binary w or d x with ν ( x ) ones and its complement having n − ν ( x ) ones). 1 In the current work we will develop algorithms for coun ting ones in a binary word bas ed on the concept of “SIMD with a r egister (SW AR)” [6]. The idea is to pa rtition larg e registers into ﬁelds of bits that a re pro cessed in pa rallel. W e obtain a lg orithms of complexity O ( √ n ) and O (log 2 n ) in settings without m ultiplicatio n and division, where the latter appr oach is less practical fo r curre nt word-lengths as it requires a lo t of larg e constants. Computing the pa rity of a binary word eﬃciently has received so me attent io n in its own rig h t, see, e.g., [2, 5]. It can of cour s e b e determined from the count o f o ne s , but a sp eciﬁc approach might b e faster . F or a modiﬁed parity function we get an O (log n ) solution in a r estricted computational mo del which is comp etitive with methods using shift instructions. If multiplication is included in the set of op erations , the complexity of count- ing ones can be reduced to O (log ∗ ( n )). Also included is a pa rit y function mak- ing use of multiplication and integer division which is sup erio r (in terms of “C-op erations ”) to the implementations the author is a ware o f. It is ins pired by the O (log ∗ ( n )) solutio n combined with [4, Item 16 9], which we outline in Appendix B. 2 Preliminaries W e will denote the bina ry log arithm by lo g. The w or d- length will b e deno ted by n , which is usually a power of 2. Bit po sitions are num b e r ed starting at 0 with the least sig niﬁcant bit. Thus bits of the words b eing pro cessed hav e weigh ts 1 to 2 n − 1 . Using the nota tion fro m [9], the bit co unt function is c alled ν . The tower function is de ﬁned by 0 2 = 1 a nd k 2 = 2 ( ( k − 1) 2) = 2 2 . . . 2 for k > 0 where k is the num b er of tw os . The in verse o f the to wer function is the iter ate d lo garithm lo g ∗ . It can b e extended to a n y n ≥ 1 as the n umber of times the function log must be applied until the result is less than or equa l to 1 Our ﬁrst re s ults refer to pro gramming models that rule o ut m ultiplication, division, and shift instructions. The restricted set of instr uc tio ns including only logical o per ations (and, or, xo r) and addition which will b e called here Obliv- ious Parallel Addition a nd Lo g ical Instructions (OP AL). Including subtraction do es not change this mo del in ter ms of asymptotic complexit y , since it ca n be simulated with neg ation and addition. OP AL is the mo del employ ed in [11]. Co de w ill b e presented in (subsets) of C [8]. With P ar allel Addition and Logical Instructions (P A L) we will denote the extension by ﬂow control instructions like “ if ” and “ while”, usually found in mo dern progr amming lang uages. Within the P AL model the bit count function from [8] and the improvemen t of E xercise 2-9 can be realized directly . 3 Results Theorem 1 The bit c ount function ν c an b e c ompute d with the help of O ( √ n ) instructions in the P AL mo del. 2 Pro of: The central idea is to apply W egner’s technique [1 2] to approximately √ n ﬁelds o f √ n bits each, sepa r ated by spacer bits [6]. Since the o riginal input do es not have spacer bits, the ﬁrs t task is to count o nes at the p ositions of the prosp ective spacer bits. This num b er is stored in the v aria ble s um . Then the following steps are r epe a ted. First the spacer bits ar e se t. By subtracting t wice the current most signiﬁcant bits of each ﬁeld (plus one for the least s ig niﬁcant p o sition), a one is deleted from all ﬁelds that con tain a one. F or ﬁelds without a one, the spa cer bit it r e set by this op e r ation. The diﬀerence to the previo us state of the most signiﬁcant bits of the ﬁelds is c omputed and the change is determined with the help of the sa me technique. The cur rent num b er of “active” ﬁelds is stored in the v aria ble count and additional bits r eset are recorded in sum . The subtraction from an empt y ﬁeld will cause a b or r ow and therefore no explicit subtraction from the nex t ﬁeld is neces sary . These steps are car ried out until all ﬁelds are empty . The num b er of iterations is b ounded above by the length o f the ﬁelds and th us O ( √ n ). The amortized cos t of up dating v aria ble count is also O ( √ n ), since ea c h of the O ( √ n ) spacer bits can change only once (it will never change from 0 to 1 within the lo o p). Therefo r e the c laimed complexity fo llows. W e pr esent the pr oc e dure just outlined for an 3 2 bit input, where fo r the sake of simplicity we ro und the length and the n umber o f ﬁelds to powers of 2: #defin e HIBI TSL 0x888 88888 // 8 field s with 4 bit s each bitcou nt(a) // ALU-bas ed regist er unsig ned long a; { regist er int sum; // overa ll bit sum regist er int count; // incre mental cont ributio ns regist er unsig ned long x; // auxi liary variable regist er unsig ned long oldhi , newh i; // hi bit s of fields x = (a & HIBITS L); for(su m = 0; x ; x &= (x-1) ) sum++; count = 8; // bitco unt(HIB ITSL) oldhi = HIBITSL ; a |= oldhi ; while( oldhi) { a &= (a - oldhi - oldh i - 1); newhi = (a & oldhi ); x = newhi ^ oldhi; while( x) { x &= (x-1) ; count- -; } oldhi = newhi ; sum += count; }; return (sum); } 3 ✷ Next we exclude ﬂow control ( if , while ) from the op erations allowed. By the result from [11], function ν cannot be computed under this r estriction. W e can how ever compute a mo diﬁed version o f ν for a po rtion of the bits. First we descr ibe an impo rtant building blo ck that allows us to shift bits rapidly using o nly a ddition and log ical op erations. Lemma 1 In a set of disjoint ﬁelds, the le ast signiﬁc ant bit of e ach ﬁeld c an b e shifte d to the most signiﬁc ant p osition in c onstant time, if al l interme diate p ositions c ontain zer o. Pro of: W e will descr ibe the tec hnique for a single ﬁeld s tretc hing from p ositio n i to p osition j > i . By incorp orating infor mation for the other ﬁelds int o the constants, the corresp onding mo diﬁcatio ns ca n b e achieved. First bit j is cle a red by ma s king with ¬ 2 j . Then the v alue 2 j − 2 i is added to the res ult. Finally we mask with ¬ (2 j − 2 i ). If bit i initially was 1 , then there will b e a ca rry that propag ates to p osition j . If it was 0, then p osition j will not be mo diﬁed. ✷ W e note that the tec hnique of Lemma 1 can replace the bit-by-bit shift of bits in the constr uc tio n from [11], which has a time complexity prop ortional to the num b er of p ositions. Theorem 2 L et m = n −⌈ lo g n ⌉ . The mo diﬁe d bit c ount function 2 m − 1 ν ( x mo d 2 m ) for the lowest m bits of an n bit input x c an b e c ompute d with the help of O (log 2 ( n )) instruct ions in t he O P AL mo del. Pro of: W e will demonstr a te how to co mpute the function for the lower n/ 2 bits of a word with n bits. The cla im then follows b y computing in addition the function for the most signiﬁca nt n/ 2 − ⌈ log n ⌉ bits (shift all constants ac- cordingly), adding the results a nd subtra cting the count of ones in the ov er la p by ﬁrst s hifting each bit to the target area a nd then subtracting it. The latter correctio n can b e done in time O (log n ) by Lemma 1. In stag e i o f the metho d, counts o f ones in ﬁelds o f length 2 i − 1 are com bined int o counts for ﬁelds of length 2 i by applying Lemma 1 the bits of the coun ts. Since by Lemma 1 we can only shift left, the counts are are lo ca ted at the most signiﬁcant bit o f ﬁelds. There are O (log n ) stages and each stage takes time O (log n ), resulting in the claimed b ound O log 2 ( n ) The following co de implements the metho d for the lower 16 bits of a 32 bit word. Bina ry r epresentations of the mas ks are given in the comments // least signif icant 16 bits count ed, 20 bits requ ired // (exce ss 4 bits for 5 bit count) bitcou nt(x) regist er unsig ned x; { regist er unsig ned y; y = x & 0x5555; // 01010 101010 10101 x -= y; // delet e fiel ds 4 y += 0 x5555; // 01010 1010101 0101 y &= 0xAAA A; // 10101 0101010 1010 x += y; y = x & 0x6666; // 00001 100110 01100110 x -= y; // delet e fiel ds y += 0xCCC C; // 00011 0011001 1001100 y &= 0x733 33; // 11100 1100110 0110011 y += 0x666 6; // 00001 1001100 1100110 y &= 0x799 99; // 11110 0110011 0011001 x += y; y = x & 0x3838; // 00000 111000 00111000 x -= y; // delet e fiel ds y += 0x1E1 E0; // 00111 1000011 1100000 y &= 0x61E 1F; // 11000 0111100 0011111 y += 0xF0F 0; // 00011 1100001 1110000 y &= 0x70F 0F; // 11100 0011110 0001111 y += 0x787 8; // 00001 1110000 1111000 y &= 0x787 87; // 11110 0001111 0000111 x += y; y = x & 0x780; // 00000 0000000 0011110000000 x -= y; // delet e fiel ds y += 0x3FC 00; // 00000 0011111 1110000000000 y &= 0x7C0 3FF; // 00111 1100000 0001111111111 y += 0x1FE 00; // 00000 0001111 1111000000000 y &= 0x3E0 1FF ; // 0001 1111000 00000111111111 y += 0xFF0 0; // 00000 0000111 1111100000000 y &= 0x1FF 00FF; // 11111 1111000 0000011111111 y += 0x7F8 0; // 00000 0000011 1111110000000 y &= 0xF80 7F; // 00000 1111100 0000001111111 x += y; return (x); } ✷ By so me pre- and p ost-pro cessing we can obtain a s olution in the P AL mo del of the same a symptotic complexity . Due to the lar ge consta n t factor it do es how ever not app ear to b e of practica l v a lue for sma ll word-lengths. Corollary 1 The bit c ount fu n ction ν ( x ) of an n bit input x c an b e c ompute d with the help of O (log 2 ( n )) instruct ions in t he P AL mo del. Pro of: W e pro ceed a s fo llows: 1. The Θ(log n ) most signiﬁca n t p ositio ns no t handled by the metho d of Theorem 2 a re copied to a v a riable and set to 0 in x (time complex it y O (1)). 5 2. F or the n − Θ (log n ) bits the mo diﬁed bit count function is computed according to Theo r em 2 (time complexity O (log 2 n )). 3. The resulting O (log n ) bits are transferred one by one to the least sig niﬁ- cant p ositions by extracting each bit with the help of a ma sk and building up the result in another v a riable (time co mplexit y O (log n )). 4. The most s ig niﬁcant bits of the orig inal input saved in step 1) are counted in a naive wa y and added to the result of step 3) (time co mplexit y O (log n )). The overall time complexity is dominated by s tep 2 ) and thus O (log 2 n ). ✷ F or the parity function o nly a single bit has to b e shifted in an a pproach as in Theorem 2. There fo re we obtain a more eﬃcient so lutio n than for bit count. Theorem 3 The mo diﬁe d p arity fun ction 2 n − 1 ( ν ( x ) mo d 2 ) c an b e c ompute d with the help of O (log n ) instructions in t he OP AL mo del. Pro of: In the initial stage the bits o f the input ar e mov ed up o ne p osition and the pa rity of pairs of bits is computed by a XOR. By mas k ing out the lower bits of each pair, we obta in 2 bit ﬁelds containing the pa rity in the most sig niﬁcant bit. In the fo llowing stages we a pply a v aria n t of Lemma 1, wher e we pro pa gate the most signiﬁcant bit of one ﬁeld dir ectly to the most sig niﬁcant bit of the next ﬁeld. W e make use of the fact that in the least sig niﬁcant p ositions of each ﬁeld space r bits in the sense of [6] a r e av aila ble after ma sking the most signiﬁcant bit. Each stage doubles the size o f the ﬁelds. W e illus trate the metho d for 32 bits in the following co de: parity (x) regist er unsig ned x; { x ^= (x + x); x &= 0xaaa aaaaa; // parity of 2 bit fields x += 0x666 66666; x &= 0x888 88888; // 4 bit fields x += 0x787 87878; x &= 0x808 08080; // 8 bit fields x += 0x7f8 07f80; x &= 0x800 08000; // 16 bit fields x += 0x7ff f8000; x &= 0x800 00000; // all 32 bits return (x); } ✷ By the r esult from [11] the O P AL mo del is no t able to directly co mpute the parity function. With the help of a single test a non-zer o result can how ever be transformed into 1. W e thus o bta in: Corollary 2 The p arity function ν ( x ) mo d 2 c an b e c ompute d with the help of O (log ( n )) inst ructions in the P AL mo del. 6 W e now turn to a more p ow erful mo del of co mputation. Theorem 4 The bit c ount function ν c an b e c ompute d in t ime log ∗ ( n ) with the help of addition, shift, lo gic al op er ations, and multiplic ation. Pro of: The alg orithm pro c e eds in iter ations. W e will maintain the inv aria n t that at the end of iter a tion k of the alg orithm ﬁelds o f length k 2 of the current int er media te r esult a contain a count of ones in corr espo nding p o sitions of the original input. The count will b e stored in the low orde r bits of each ﬁeld. Ini- tially the inv aria n t holds fo r k = 0 and the input a , since a single bit repre s en ts its own co un t. Consider k > 0 and assume the in v ariant holds for the cur rent intermediate result a a fter iter ation k − 1. W e deﬁne bit mask b k = · · · 11 · · · 11 | {z } ( k − 1) 2 00 · · · 00 | {z } ( k − 1) 2 11 · · · 11 | {z } ( k − 1) 2 that selects ha lf of the ﬁelds from the pr evious iteration a nd c k = · · · 00 · · · 00 | {z } k 2 − 2 · ( k − 1) 2 11 · · · 11 | {z } 2 · ( k − 1) 2 00 · · · 00 | {z } k 2 − 2 · ( k − 1) 2 11 · · · 11 | {z } 2 · ( k − 1) 2 that selects the low er 2 · ( k − 1) 2 bits of each ﬁeld of length k 2. W e let x = a & b k , y = ( a >> ( k − 1) 2)& b k . Next we multiply x and y with a num b e r m k = 0 0 · · · 01 | {z } 2 · ( k − 1 2 00 · · · 01 | {z } 2 · ( k − 1) 2 00 · · · 01 | {z } 2 · ( k − 1) 2 that contains k 2 / (2 · ( k − 1) 2) ones and form a = (( x · m k ) > > ( k 2 − ( k − 1) 2)& c k ) + (( y · m k ) > > ( k 2 − ( k − 1) 2)& c k ) . Notice that in each ﬁeld of length 2 · ( k − 1) 2 at most k 2 v alues of size less than 2 ( ( k − 1) 2) = k 2 are added. Therefore the s um is less than ( k 2) 2 and ﬁts into 2 · ( k − 1) 2 bits. The pro ces s stops when k 2 = 2 ( ( k − 1) 2) > n at the start of iteration k and a ﬁeld of ( k − 1) 2 bits can sto re the count of all bits of the input in binary . Then we multiply the intermediate result with 00 · · · 01 | {z } ( k − 1) 2 00 · · · 0 1 | {z } ( k − 1) 2 00 · · · 01 | {z } ( k − 1) 2 containing n/ ( ( k − 1) 2) ones and shift rig h t by n − ( k − 1) 2 p ositions to obta in the result. ✷ W e ﬁnally include co de for computing the pa rity function by combining ideas of the bit count function from Theor em 4 with an initia l stag e of an O (lo g( n )) metho d. By reducing the num b er o f p ossible o nes via a XOR, the ﬁe lds ca n be smaller than in the more gene r al setting of a full count. The ﬁnal multiply and shift of Theorem 4 is r eplaced by one divisio n a s in [4 , Item 169] to obtain 7 the following solution, w hich use s only 7 “C - ope rations” as o ppos e d to the 8 op erations of the “parity of word w ith a multiply” from [2]. parity (x) regist er unsig ned x; { x ^= x >> 1; // parit y of 2 bits x &= 0x555 55555; // select low bits of 2 bit fields x *= 0x15; // add three 2 bit fields x &= 0x410 41041; // select low bits of 6 bit fields return ((x % 0x3f) & 0x1); // apply HAKMEM techni que, // and return least sign. bit } This approach would w ork for w ord-leng ths up to 37 2 (with constants ad- justed). 4 Discussion W e hav e obtained several bit count and par it y alg orithms for se ts of op erations not including multiplication or division. Asymptotically the (modiﬁed) parity function with co mplexit y O (lo g n ) for the OP AL-mo del is as fast as the known solutions based on broa dw or d steps [9]. The la tter may include shift-op erations by a co nstant n umber of bit p o sitions. In a setting with the p owerful arithmetical instruction multiplication we could improve the co mplex ity of co un ting o nes from O (log log( n )) to O (lo g ∗ ( n )). The following table summarizes upp er time bo unds for countin g ones with diﬀerent sets o f op erations. Op erations depe nding on n dep. on ν ( x ) increment, decrement, O ( n ) O ( ν ( x )) logical op erations see App endix A [12] addition, logica l o p era tions O (log 2 ( n )) (P AL) Cor. 1 addition, shift, logical op erations O (log ( n )) (broadword steps) folklore, see [1] addition, shift, logical op erations, O (log ∗ ( n )) O (log log( ν ( x ))) m ultiplicatio n Thm. 4 [10] addition, shift, logical op erations, O (log log( n )) division [4, Item 169] F or most sets of o per ations we are not aware of a lgorithms depending on ν ( x ) and thus taking adv antage o f sparse inputs. A low er time bo und Ω(log n/ log log n ) on parit y (and thu s counting ones) in the mo del of broadword steps follows from a r esult in cir cuit complexity (se e [9, Ex- 127]). Although incremen t, decrement, and logical op erations seem to admit no eﬃcient algorithms, we were not a ble to prov e a low er b ound Ω( ν ( x )) corres p onding to the upper b ound from [1 2]. 8 App endix A The algo rithm from [12] of complex ity O ( ν ( x )) is bas ed on incre men t, decre- men t, and log ical opera tions. In contrast the standard metho d of complexity O ( n ) presented in [12] a s well a s the appro aches in [7] and [8, p. 47] r equire shift- or ro tate-op erations. W e therefore include an alg orithm of complexity O ( n ) that uses the mo re restricted set without shift/rota te a nd general addi- tion. It is based on technique (2) fro m [1 1]. bitcou nt(a) regist er unsig ned long a; { regist er int sum; // b it count regist er unsig ned long mask; // bit mask mask = 1; sum = 0; while( mask) { if (a & mask) sum++ ; mask |= (mask -1); // propa gate righ tmost 1 mask++ ; // new bit mask }; return (sum); } App endix B HAKMEM item 169 [4] des c r ibes a reduction of bit counting for word-lengths of at most 6 2 bits to int eg e r division. In its orig inal form the program is prese nted in a ssembly la nguage for the PDP- 6 /10 (36 bit architectures), which is not very well-kno wn now adays. 1 W e render it here in C (where the v ar iable names a and b co rresp ond to the registers of the original prog ram, co mmen ts ar e g iv en se pa rately): #defin e TWOB ITS 03333 333333 3 #defin e THRE EBITS 0307070707 07 bitcou nt(a) unsign ed a; { unsign ed b; 1 Most of the PDP-6/10 instructions are quite suggestiv e. Two p oss i ble exceptions are: LDB B,[01430 0,,A] : Load 35 bits (octal 043 ) from A with an oﬀset 1, coun ting bits from the ri gh t. Store them righ t adjusted in B (line 1 of the C program, this cannot b e implement ed directly in C and we use the approach mentione d in the comment of [4, item 169] on the LDB i nstruction). SUBB A,B : Subtract and s tore in b oth A and B (lines 6 and 7 of the C pr ogram). See [3] for mor e infor mation on the PDP-6/10 instruction set. 9 b = a>>1; // line 1 b &= TWOBI TS; // line 2 a -= b; // line 3 b >>= 1; b &= TWOBI TS; a -= b; // line 6 b = a; // line 7 b >>= 3; a += b; // line 9 a &= THREE BITS; return (a % 077); // line 11 } Comments: lines 1-6 : Cons ide r an o ctal digit o f v ar iable a co mpos ed of three bits x , y , and z with weight 4 x + 2 y + z . Then in line 3 the v alue 2 x + y + z is computed and in line 6 the sum x + y + z of the three o riginal bits is computed. line 7 : This simulates the second trans fer of SUBB . line 9 : Neighbo ring g roups of o ctal digits a re a dded in orde r to c ompute the sum of six bits. Notice that the three resulting bits are able to hold the maximum count for 6 bits without a carr y to the next o ctal digit. line 11 : Co nsider the conten ts of a as a num b e r in ba se 64 representation. Then the digit at p osition i fro m the right (starting at 0) has w eight 64 i = (63 + 1 ) i = 6 3 i + 63 i − 1 i + · · · + 6 3 i + 1 amd mo dulo 63 the sum o f all ba se 64 digits is computed. This limits the admissable word-length to 62 bits. References [1] Page “Population Coun t”. https ://che ssprogramming.wikispaces.com/ Popula tion+C ount (download June 8 , 2 015). [2] S. E. Anderson. Bit t widdling hacks. http ://grap hics.stanford.edu/ ∼ sean der/bit hacks.html (download Mar ch 19 , 2 0 14). [3] H. Bak er. PDP-10 machine languag e. http: //pdp10 .nocrew.org/docs/ instru ction- set (download Mar ch 19 , 2 014). [4] M. Beeler, R. W. Gos p er, a nd R. Schro epp el. HAKMEM. T echnical Rep ort AIM-239, Massach usetts Institute of T echnology , Cambridge, MA, USA, 1972. [5] S. B rumme. the bit t widdler. ht tp://bi ts.ste phan-brumme.com/ (down- load March 19 , 2 0 14). 10 [6] R. J. Fisher and H. G. Dietz. Co mpiling for SIMD within a register. In 11th Annual Workshop on L anguages and Compil ers for Par al lel Computing (LCPC98) , pages 2 90–304 . Spr inger V er lag, Cha p el Hill, 199 8 . [7] H. F rieden. A v aria n t technique for counting ones. Co mmu n. ACM , 3(8):474, 196 0. [8] B. W. Ker nighan and D. Ritchie. The C Pr o gr amming L anguage . P rent ice- Hall, 1978 . [9] D. Knuth. The Art of Computer Pr o gr amming: V ol 4, F ascicles 0-4 . Addison-W esley , 20 08. [10] H. Petersen. A non-oblivio us reductio n of counting o nes to multiplication. http:/ /arxiv .org/abs/1506.03473 . [11] H. S. W arren J r. F unctions realiza ble with word-parallel logical a nd tw o’s- complement addition ins tructions. Commun. ACM , 20(6):43 9–441, 197 7. [12] P . W egner . A tec hnique for co un ting ones in a binary computer. Commun. ACM , 3(5):322, 196 0. [13] M. V. Wilkes, D. J. Wheeler, and S. Gill. The Pr ep ar ation of Pr o gr ams for an Ele ctr onic D igital Computer . Addison-W esley , 2nd edition, 1957 . 11

A SWAR Approach to Counting Ones

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment