On the Probability Distribution of Superimposed Random Codes
A systematic study of the probability distribution of superimposed random codes is presented through the use of generating functions. Special attention is paid to the cases of either uniformly distributed but not necessarily independent or non uniform but independent bit structures. Recommendations for optimal coding strategies are derived.
Authors: Bernd Günther
This paper has been published by IEEE: IEEE Trans. Inf. Theory, 54(7):3206–3210, 2008; DOI 10.1109/TIT.2008.924658.

Index Terms: Database indexing, false drop estimates, generating functions, probability distribution, superimposed coding

I. INTRODUCTION

Chemical structure retrieval systems are frequently presented with the task of producing a list of all stored chemical graphs containing a prescribed subgraph [1], [2]. Due to the absence of a linear order among the stored data, tree based search strategies fail, and a sequential search has to be performed. To accelerate this time consuming process, the actual graph theoretical substructure match is preceded by prescreening: the entire database is matched against a library of simple but common descriptors, and the validity of descriptors is recorded in a bitstring for each stored structure. Suitable choices for descriptors are small chemical subgraphs containing only few vertices, graph diameters, ring sizes or any other property that passes from subgraphs to supergraphs. When a query structure is submitted to the system, the descriptors are evaluated for this query structure, resulting in a query bit string. Only those stored structures whose bits are turned on in all positions where the query bits are turned on are candidates for a match, and only these will be subjected to the expensive graph theoretical matching algorithm. For example, let us consider the compounds in Table I.
A chemist might ask for a list of all structures in our database containing 2-(cyclohexylmethyl)naphthalene, which is too complex to be one of the index descriptors. However, any matching structure must necessarily contain cyclohexane and naphthalene, and these might be indexed. We will produce an intermediate result set that also contains 2-(2-cyclohexylethyl)naphthalene, which is not in accordance with the original query specification and must be singled out by graph matching.

TABLE I: 2-(cyclohexylmethyl)naphthalene; cyclohexane; naphthalene; 2-(2-cyclohexylethyl)naphthalene

The Beilstein database of organic compounds contains around 10 million structures, each a chemical graph of up to 255 vertices, and the number of descriptors will have the magnitude of one thousand. With such characteristics, the preevaluated bitstrings will consume a considerable amount of storage and hence of processing time. On the other hand, one may expect the 1-bits to be relatively scarce, whence it should be possible to compress the bitstrings without losing too much information. Thus, we are looking for a map $\psi: \dot I^N \to \dot I^n$, $\dot I = \{0,1\}$, transforming bitstrings of length $N$ into strings of length $n$. (Author's address: Beilstein-Institut, Trakehner Straße 7-9, 60487 Frankfurt, Germany.)

However, we have to observe the partial order relation $\beta \le \beta'$ between bitstrings, defined such that the bits satisfy $\beta(i) \le \beta'(i)$ at all positions $i$. Since we want to use the compressed strings in the same manner (by bitmasking) as the original ones, the transformation must be monotone: $\beta \le \beta' \Rightarrow \psi(\beta) \le \psi(\beta')$, and in particular $\psi(\beta) \vee \psi(\beta') \le \psi(\beta \vee \beta')$, where the wedge denotes the bitwise inclusive or operation. If the latter inequality were strict, information would be lost; hence we require

\psi(\beta) \vee \psi(\beta') = \psi(\beta \vee \beta') \quad (1)

for any two source strings $\beta, \beta' \in \dot I^N$.
This is the defining condition of superimposed coding. If we denote by $\beta_j \in \dot I^N$ the elementary string having bit 1 in position $j$ and 0 elsewhere, and set $\psi_j := \psi(\beta_j) \in \dot I^n$ equal to the code word assigned to bit $j$, then we must have

\psi(\beta) = \bigvee_{j \in \beta^{-1}(1)} \psi_j, \quad (2)

explaining the term. Here $\beta^{-1}(1)$ is the inverse image of 1, i.e. the set of all positions $i$ where the $i$-th bit $\beta(i)$ is turned on. As a summary we emphasize: in our context superposition is not a matter of choice but a requirement.

We speak of superimposed random coding if the code words $\psi_j$ are constructed using a random number generator. Non random approaches have been considered e.g. in [3], [4] and perform very well when the number of code words superposed in equation (2) is bounded, but are unpredictable beyond this barrier. The random approach was probably initiated by [5], but there an invalid probability analysis was given. A thorough study of the case where source bitstrings and code words are of fixed weight was given in [6]. Bloom filters (cf. [7] and [8, p. 572]) constitute an application of this technique where the primary keys of a database are encoded in a bitstring; it should be noted that the approach favored by us produces a two-dimensional bit array for the entire database, one bitstring for each record. Despite the continuing popularity of the subject in the context of chemical substructure search, a broader systematic approach seems to be lacking, a gap that the current paper intends to fill.

One question that has to be addressed is the optimal choice of codes. It is known at least since Roberts' paper [6] that the target bitstrings should contain a more or less even balance of 0- and 1-bits, but if we want to achieve this other than by trial and error we must study the distribution of target bits.
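The superposition (2) and the bitmask prescreening test are easy to sketch in code. The following Python illustration is ours, not from the paper (which specifies no implementation); it generates fixed weight random code words as $n$-bit integers and superimposes them by bitwise OR, and all names are illustrative:

```python
import random

def make_codewords(N, n, w, seed=0):
    """One random fixed-weight code word (an n-bit int with exactly w
    bits set) for each of the N source bit positions."""
    rng = random.Random(seed)
    codewords = []
    for _ in range(N):
        word = 0
        for b in rng.sample(range(n), w):  # w distinct bit positions
            word |= 1 << b
        codewords.append(word)
    return codewords

def encode(source_bits, codewords):
    """Superimpose (bitwise OR) the code words of all 1-bits, eq. (2)."""
    sig = 0
    for j, bit in enumerate(source_bits):
        if bit:
            sig |= codewords[j]
    return sig

def covers(stored_sig, query_sig):
    """Prescreening: every 1-bit of the query must be set in the record."""
    return stored_sig & query_sig == query_sig
```

By construction `encode` satisfies condition (1): encoding the bitwise OR of two source strings yields the OR of their encodings.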
Of particular practical importance is the situation where the source bits are turned on with non uniform probability. A few statements hereabout are made by Roberts in [6], but they are based on intuition, not computation, and in fact we disagree with his conclusion. In [9] the author gives recommendations about the selection of descriptors, whereas we set ourselves the task of adapting the coding design to a given set of descriptors.

We must notice that the notion "random coding" has an inherent semantic difficulty. It must be understood clearly that coding and decoding of a bit pattern have to be performed within one and the same run of the code generation random experiment. After each coding-decoding cycle the code generation should be repeated independently. However, this procedure does not exactly fit when recording bit patterns in databases: here the codes must be fixed once and for all. Since a large set of $N$ codes is needed, we might expect good statistical behavior in this context too.

The observant reader will have noticed that equation (2) allows for a target string $\psi(\beta)$ to be covered by a comparison pattern, $\psi(\beta) \le \psi(\gamma)$, even if the source patterns do not cover: $\beta \not\le \gamma$. This will result in the inclusion of fake hits in our result sets. As long as their number is small this is acceptable, because the bit comparison will be followed by graph matching anyway. However, we want to predict and minimize this number.

In section II we will develop the probabilistic tools necessary to manage our random bit strings. In section III we demonstrate how to compute the distribution of the target bits from those of source and code bits, and binomially distributed and fixed weight code words are introduced as primary examples for code generation. This settles the case of uniform but not necessarily independent source bit distributions.
The requirement of uniformity will be dropped in section IV, but the assumption of independence will be added. The completely general case will be addressed in section V.

II. ISOTROPIC BIT DISTRIBUTIONS

Let's consider a fixed source bit pattern $\beta \in \dot I^N$ and a fixed target bit pattern $\alpha \in \dot I^n$; then

P(\psi(\beta) \le \alpha) = P\left(\forall j \in \beta^{-1}(1): \psi_j \le \alpha\right) \quad (3)
= \prod_{j \in \beta^{-1}(1)} P(\psi_j \le \alpha), \quad (4)

where the probability is that of the code generation. Now, varying the source pattern $\beta$ independently, we obtain an expression for the probability of target patterns:

P(\psi(-) \le \alpha) = \sum_{\beta \in \dot I^N} P(\beta) \prod_{j \in \beta^{-1}(1)} P(\psi_j \le \alpha). \quad (5)

Definition 1: We call a probability distribution on $\dot I^n$ isotropic if it depends only on the number of 1-bits but not on their position.

This means that the distribution is invariant under all coordinate permutations (in [10] the term homogeneous is used). As is evident from equation (5), the distribution of the target patterns is isotropic if the code generation is, even if the source patterns are non isotropically distributed. Since this observation leads to a considerable simplification of our analysis, we now give a short exposition of isotropic distributions.

We are given numbers $p_0, \ldots, p_n \ge 0$ with $\sum_{k=0}^n \binom{n}{k} p_k = 1$ such that the probability of a particular bit pattern $\alpha \in \dot I^n$ is given by $P(\alpha) = p_a$ with $a := \#\alpha^{-1}(1)$. Our main analytical tool will be the probability generating function

f(t) := \sum_{k=0}^n \binom{n}{k} p_k t^k, \quad (6)

this being the standard definition obtained by considering the number of 1-bits as random variable. For fixed $\alpha \in \dot I^n$ with $a := \#\alpha^{-1}(1)$ we define

F_a := P(\xi \le \alpha) = \sum_{k=0}^a \binom{a}{k} p_k \quad (7)
G_a := P(\xi \ge \alpha) = \sum_{k=a}^n \binom{n-a}{k-a} p_k. \quad (8)

Of course the quantities $F_a$ and $G_{n-a}$ are dual to each other; in fact the one is transformed into the other by switching 0- and 1-bits.
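As a numeric illustration (our own sketch, not part of the paper), the coefficients of equations (7) and (8) can be evaluated directly from the $p_k$; for a binomial distribution with bit probability $1-q$ they reduce to $q^{n-a}$ and $(1-q)^a$ respectively:

```python
from math import comb

def F_coeff(p, a):
    """F_a = P(xi <= alpha) = sum_{k<=a} C(a,k) p_k, eq. (7);
    p[k] is the probability of one particular pattern with k one-bits."""
    return sum(comb(a, k) * p[k] for k in range(a + 1))

def G_coeff(p, a):
    """G_a = P(xi >= alpha) = sum_{k>=a} C(n-a,k-a) p_k, eq. (8)."""
    n = len(p) - 1
    return sum(comb(n - a, k - a) * p[k] for k in range(a, n + 1))
```

For instance, with $n = 6$, $q = 0.3$ and $p_k = (1-q)^k q^{n-k}$, one checks `F_coeff(p, 2) == q**4` and `G_coeff(p, 2) == (1-q)**2` up to rounding.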
The quantities $F_a$ and $G_a$ play very different roles for us, though: notice the occurrence of $F_a$ on the right hand side of equation (5), whereas $G_a$ is the relative number of candidates that will be selected by prescreening with bit pattern $\alpha$, because the condition $\xi \ge \alpha$ singles out precisely those patterns $\xi$ that have 1-bits in the same positions as $\alpha$, and possibly some more. We define generating functions as follows:

F(t) := \sum_{m=0}^n \binom{n}{m} F_m t^m \quad (9)
G(t) := \sum_{k=0}^n \binom{n}{k} G_{n-k} t^k \quad (10)

The following relations are readily established:

F(t) = (1+t)^n f\left(\frac{t}{1+t}\right) \quad (11)
G(t) = (-1)^n F(-1-t) \quad (12)
G(t) = t^n f\left(\frac{1+t}{t}\right) \quad (13)
p_m = \sum_{k=0}^m (-1)^{m+k} \binom{m}{k} F_k \quad (14)
G_m = \sum_{k=0}^m (-1)^k \binom{m}{k} F_{n-k} \quad (15)

The three sets of coefficients $p_m$, $F_m$ and $G_m$ carry the same information and can be readily converted into each other using the above relations. The reader will certainly notice the difference between (11) and the standard situation [11, Thm. XI.1], which arises from the fact that the values $\alpha$ appearing in definition (7) are partially but not linearly ordered.

For later reference we need the derivatives of $f$ at 1. By (13) we have $f(1+\varepsilon) = \varepsilon^n G(1/\varepsilon) = \sum_{k=0}^n \binom{n}{k} G_k \varepsilon^k$. This allows us to read off the Taylor coefficients of the function $f$ at the point 1 at once:

\frac{f^{(k)}(1)}{k!} = \binom{n}{k} G_k \quad (16)

We define the moments $\mu_m = \sum_{k=0}^n k^m \binom{n}{k} p_k$ of our distribution as usual and remember their generating function [11, Exc. XI.7.24]:

f(e^t) = \sum_{m=0}^\infty \frac{\mu_m}{m!} t^m. \quad (17)

Equations (16) and (17) allow us to express the first two moments in terms of the distribution coefficients $F_m$:

\mu_1 = n(1 - F_{n-1}) = n G_1 \quad (18)
\mu_2 = n\left[n - (2n-1)F_{n-1} + (n-1)F_{n-2}\right] = n\left[(n-1)G_2 + G_1\right]. \quad (19)

The variance evaluates to

\mu_2 - \mu_1^2 = n\left[F_{n-1} - n F_{n-1}^2 + (n-1)F_{n-2}\right], \quad (20)

whence in particular:

F_{n-2} \ge (n-1)^{-1}\left(n F_{n-1} - 1\right) F_{n-1}. \quad (21)
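The conversions (7), (14) and (15) are easy to check numerically. The sketch below is ours (illustrative names, not from the paper); it round-trips $p \to F \to p$ and produces the $G_m$ from the $F_m$:

```python
from math import comb

def p_to_F(p):
    """F_m = sum_{k<=m} C(m,k) p_k, eq. (7) with a = m."""
    n = len(p) - 1
    return [sum(comb(m, k) * p[k] for k in range(m + 1))
            for m in range(n + 1)]

def F_to_p(F):
    """Invert via eq. (14): p_m = sum_k (-1)^{m+k} C(m,k) F_k."""
    n = len(F) - 1
    return [sum((-1) ** (m + k) * comb(m, k) * F[k] for k in range(m + 1))
            for m in range(n + 1)]

def F_to_G(F):
    """G_m = sum_k (-1)^k C(m,k) F_{n-k}, eq. (15)."""
    n = len(F) - 1
    return [sum((-1) ** k * comb(m, k) * F[n - k] for k in range(m + 1))
            for m in range(n + 1)]
```

For the fixed weight distribution with $n = 5$, $w = 2$ (so $p_2 = 1/\binom{5}{2} = 0.1$) this reproduces $F_3 = 0.3$, $F_4 = 0.6$ and $G_1 = 0.4$, in agreement with the closed forms of example 2.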
Two examples are of primary importance in our context.

Example 1: The binomial distribution $p_m = (1-q)^m q^{n-m}$ with parameter $1-q$. Here we have:

f(t) = (q + (1-q)t)^n \quad (22)
F(t) = (q+t)^n \quad (23)
G(t) = (1-q+t)^n \quad (24)
F_m = q^{n-m} \quad (25)
G_m = (1-q)^m \quad (26)
\mu_1 = n(1-q) \quad (27)
\mu_2 = n(1-q)\left[n(1-q) + q\right] \quad (28)
\mu_2 - \mu_1^2 = n q (1-q). \quad (29)

Example 2: Fixed weight. Here only bit patterns with a fixed number $w$ of 1-bits are permitted. We get:

p_m = \binom{n}{w}^{-1} \text{ for } m = w, \quad p_m = 0 \text{ for } m \ne w \quad (30)
f(t) = t^w \quad (31)
F(t) = (1+t)^{n-w} t^w \quad (32)
G(t) = (1+t)^w t^{n-w} \quad (33)
F_m = \binom{n-w}{n-m} \Big/ \binom{n}{m} \text{ for } m \ge w, \text{ else } 0 \quad (34)
G_m = \binom{w}{m} \Big/ \binom{n}{m} \text{ for } m \le w, \text{ else } 0 \quad (35)
\mu_m = w^m \quad (36)
\mu_2 - \mu_1^2 = 0. \quad (37)

Hardly surprising, inequality (21) is sharp for this distribution.

III. UNIFORM CODE WORD GENERATION

Denoting the target distribution's coefficients by $\check F_a$, equation (5) can be written as

\check F_a = \sum_{\beta \in \dot I^N} P(\beta) \prod_{j \in \beta^{-1}(1)} F_a^{(j)}, \quad (38)

where the coefficients $F_a^{(j)}$ describe the random experiment used for generating the $j$-th code word. If we take these independent of $j$, (38) simplifies considerably. Introducing the source bits' probability generating function

\Pi(t) := \sum_{\beta \in \dot I^N} P(\beta)\, t^{\#\beta^{-1}(1)}, \quad (39)

we can formulate a simple but significant theorem:

Theorem 1: If the code word generation is performed uniformly with coefficients $F_a$ and $\Pi$ is the generating function of the source pattern space, then the target bit distribution is given by $\check F_m = \Pi(F_m)$.

This theorem is the pivot enabling us to compute the target distribution when source and code distribution are known. The source bit distribution acts as a transformation turning the code bit distribution into the target bit distribution. The target distribution depends linearly on the source distribution but in a complex way on the code distribution.
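Theorem 1 is a one-liner to apply. The sketch below (ours; names are illustrative) evaluates $\check F_m = \Pi(F_m)$ given the weight distribution of the source words and the code coefficients:

```python
def source_gf(weight_probs, t):
    """Pi(t) = sum_w P(source weight = w) * t**w, eq. (39);
    weight_probs[w] is the probability that a source word has weight w."""
    return sum(pw * t ** w for w, pw in enumerate(weight_probs))

def target_coeffs(weight_probs, F_code):
    """Theorem 1: target coefficients F̌_m = Pi(F_m) for uniform
    code word generation with coefficients F_code[m]."""
    return [source_gf(weight_probs, Fm) for Fm in F_code]
```

With fixed weight $r$ source words ($\Pi(t) = t^r$) and binomial code words ($F_m = q^{n-m}$) this reproduces the closed form $\check F_m = q^{r(n-m)}$ obtained in example 3.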
We can immediately derive the first two moments of the target bit distribution from equations (18) and (19):

\check\mu_1 = n\left[1 - \Pi(F_{n-1})\right] \quad (40)
\check\mu_2 = n\left[n - (2n-1)\Pi(F_{n-1}) + (n-1)\Pi(F_{n-2})\right] \quad (41)
\check\mu_2 - \check\mu_1^2 = n\left[\Pi(F_{n-1}) - n\,\Pi(F_{n-1})^2 + (n-1)\Pi(F_{n-2})\right] \quad (42)

Example 3: Let's consider fixed weight source words of weight $r$ and binomially generated code words with parameter $q$. From (31) we can read off the transformation function $\Pi(t) = t^r$, and by (25) the coefficients are given by $F_m = q^{n-m}$. Now our theorem implies $\check F_m = q^{r(n-m)}$, i.e., the target bits are binomially distributed with parameter $q^r$. Then (26) immediately implies $\check G_m = (1 - q^r)^m$, and by (29) the variance equals $\check\mu_2 - \check\mu_1^2 = n q^r (1 - q^r)$. This case is simple because the individual target bits are independently distributed, which is not true in general.

By definition $\check G_m$ equals the expected relative number of candidates matching a test pattern of weight $m$. These candidates are determined by statistics without reference to the original content, hence they are considered "false drops" (an idiomatic expression which is historic and derives from the application to punched cards), and their number shall be minimized. Pursuing our example further, we assume that the test pattern is obtained by submitting a query source string of weight $s$ to the same superimposed coding process. Any pattern of weight $m$ in the target query space will occur with probability $\tilde p_m = (1-q^s)^m q^{s(n-m)}$, and we can expect a proportion of

\vartheta = \sum_{m=0}^n \binom{n}{m} \check G_m \tilde p_m = \left[(1-q^r)(1-q^s) + q^s\right]^n

random hits. $\vartheta$ is minimal if we choose $q^s = \frac{r}{r+s}$, giving

\vartheta = \left[1 - \left(\frac{r}{r+s}\right)^{r/s} \frac{s}{r+s}\right]^n.

For $r \gg s$ we have $q^r = \left(\frac{r}{r+s}\right)^{r/s} = \left(1 - \frac{s}{r+s}\right)^{r/s} \approx e^{-\frac{r}{r+s}} \approx e^{-1}$ and $\vartheta = \left[1 - \frac{1}{e}(1-q^s)\right]^n = \left[1 - \frac{1}{e}\left(1 - e^{-s/r}\right)\right]^n$, hence $\ln\vartheta \approx -\frac{ns}{er}$.
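The false drop estimate of example 3 can be explored numerically. A small sketch (ours, under the paper's assumptions of weight-$r$ records, weight-$s$ queries and binomial code words) together with the analytic optimum $q^s = r/(r+s)$:

```python
def false_drop_rate(n, r, s, q):
    """theta = [(1 - q**r)(1 - q**s) + q**s]**n: expected proportion
    of random hits (example 3)."""
    return ((1 - q ** r) * (1 - q ** s) + q ** s) ** n

def optimal_q(r, s):
    """Analytic minimizer of theta: choose q with q**s = r/(r + s)."""
    return (r / (r + s)) ** (1.0 / s)
```

Evaluating `false_drop_rate` at `optimal_q(r, s)` and at nearby values of `q` confirms that the analytic choice is indeed the minimum.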
If we do not want the relative number of false drops to exceed a proportion $\vartheta_{\max}$ but have to allow for query words of weight at least $s_{\min}$, then we must choose our target bit patterns of length $n \gtrsim \frac{e\,r}{s_{\min}}\,|\ln\vartheta_{\max}|$. Notice that the original bitstring length $N$ does not enter at all; just the number of 1-bits $r$ is relevant.

Example 4: Our example above was chosen for its simplicity; we don't recommend using binomially distributed code words in practice. Fixed weight code words of weight $w$ can be expected to exhibit a much more robust behavior: because of equations (40) and (42) and the monotonicity of probability generating functions, fixed weight code words will produce minimal variance in the target distribution if the expectation is prescribed. Both cases may be directly compared if the parameter of the binomial distribution is chosen as $q = 1 - \frac{w}{n}$, because by (34) $F_{n-1} = \binom{n-w}{1}/\binom{n}{1} = \frac{n-w}{n} = q$. Furthermore $F_{n-2} = \binom{n-w}{2}/\binom{n}{2} = \frac{(n-w)(n-w-1)}{n(n-1)} = q\,\frac{nq-1}{n-1}$ and $\check F_{n-2} = q^r \left(\frac{nq-1}{n-1}\right)^r$. The variance

\mathrm{var} = \check\mu_2 - \check\mu_1^2 = n\left[q^r - n q^{2r} + (n-1)\, q^r \left(\frac{nq-1}{n-1}\right)^r\right] \quad (43)
= n\left[q^r - q^{2r} - (n-1)\left(q^{2r} - q^r\left(q - \frac{1-q}{n-1}\right)^r\right)\right] \quad (44)

can be computed asymptotically for large $n$ by developing the rightmost bracket in a power series, $\left(q - \frac{1-q}{n-1}\right)^r = q^r - r q^{r-1}\,\frac{1-q}{n-1} \pm \cdots$, where all higher powers may be safely discarded:

\mathrm{var} \approx n q^r (1 - q^r) \left[1 - \frac{r(1-q)\, q^{r-1}}{1 - q^r}\right], \quad (45)

the first factor being equal to the value in the binomial case. With suitable parameters the factor in square brackets may be quite small, i.e., the fixed weight variance may be negligible while the binomial variance is not.

We reconsider example 3 with fixed weight code words. In [6] it is shown (the equation is in fact derived by using the binomial case as an approximation and observing that target query weights have small variance) that $\vartheta \approx (1-q^r)^{n(1-q^s)}$. The minimum value is attained for $q^r \approx \frac12$ with $\ln\vartheta \approx -\frac{ns}{r}(\ln 2)^2$. This value is slightly better than in example 3.
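The two design rules of this section — the length rule $n \gtrsim (e\,r/s_{\min})\,|\ln\vartheta_{\max}|$ and the fixed weight choice with $q^r \approx 1/2$ quoted from [6] — can be packaged as a sketch (ours; the helper names are illustrative):

```python
import math

def required_length(r, s_min, theta_max):
    """Example 3 rule of thumb: n >= (e*r / s_min) * |ln theta_max|."""
    return math.ceil(math.e * r / s_min * abs(math.log(theta_max)))

def fixed_weight(n, r):
    """Code word weight w with ((n - w)/n)**r close to 1/2, i.e. the
    optimum q**r = 1/2 for fixed weight code words."""
    return round(n * (1 - 0.5 ** (1.0 / r)))
```

For instance, with $n = 1000$ and $r = 20$ this yields $w = 34$, so that $q = 0.966$ and $q^{20} \approx 0.5$ as desired.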
We have $\mathrm{var} \approx n \frac{s^2}{r^2} (\ln 2)^2$, which is smaller than the binomial case by a factor $\frac{s}{r}\ln 2$.

IV. ADAPTING TO NON UNIFORM BUT INDEPENDENT SOURCE BIT DISTRIBUTIONS

If the individual source bits are independent and the $i$-th bit is turned on with probability $p_i$, the probability of a given source pattern $\beta \in \dot I^N$ is

P(\beta) = \prod_{i \in \beta^{-1}(1)} p_i \prod_{i \notin \beta^{-1}(1)} (1 - p_i). \quad (46)

Inserting (46) into (38) and distributively collecting terms yields

\check F_a = \prod_{j=1}^N \left(p_j F_a^{(j)} + 1 - p_j\right). \quad (47)

In our selection of optimal code word distributions we let ourselves be guided by what we learned in section III: we set $\check F_{n-1} = \frac12$ and minimize $\check F_{n-2}$, observing that by equation (19) this is equivalent to minimization of the second moment and hence of the variance. By inequality (21),

F_{n-2}^{(j)} \ge (n-1)^{-1}\left(n F_{n-1}^{(j)} - 1\right) F_{n-1}^{(j)},

the lower bound being attained for fixed weight code words. Substituting

u_j := p_j F_{n-1}^{(j)} + 1 - p_j \quad (48)

we have to solve the minimization problem

\check F_{n-2} = \frac{n^N}{(n-1)^N \prod_j p_j} \prod_{j=1}^N \left\{\left(u_j - \nu_j (1 - p_j)\right)^2 + \omega_j p_j (1 - p_j)\right\} = \min \quad (49)

under the constraint

\frac12 = \check F_{n-1} = \prod_{j=1}^N u_j \quad (50)

in the domain

1 - \frac{n-1}{n}\, p_j \le u_j \le 1 \quad (51)

with

\nu_j := (1 - p_j)^{-1} \left[1 - \left(1 - \frac{1}{2n}\right) p_j\right] \quad (52)
\omega_j := (1 - p_j)^{-1} \left[1 - \frac{1}{n} - \left(1 - \frac{1}{2n}\right)^2 p_j\right] = p_j^{-1}\left[1 - \nu_j^2 (1 - p_j)\right]. \quad (53)

Observe that the terms $\nu_j$ and $\omega_j$ introduced for abbreviation are very close to 1; in particular $\omega_j > 0$. Ignoring the restrictions (51) temporarily, we see that (49) tends to $+\infty$ if at least one of the coordinates $u_j$ does; therefore (49) must have an absolute minimum.
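The product formula (47) itself is cheap to evaluate numerically; a minimal sketch (ours, with illustrative names) for independent, non uniform source bits:

```python
def target_coeff_indep(p_bits, F_codes, a):
    """Eq. (47): F̌_a = prod_j (p_j * F_a^{(j)} + 1 - p_j), where
    p_bits[j] is the probability that source bit j is on and
    F_codes[j][a] is the coefficient F_a of the code used for bit j."""
    out = 1.0
    for pj, Fj in zip(p_bits, F_codes):
        out *= pj * Fj[a] + 1.0 - pj
    return out
```

For equal $p_j = p$ and identical codes this reduces to $(p F_a + 1 - p)^N$, i.e. Theorem 1 applied to a binomial source distribution.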
We are going to locate this minimum using a Lagrange multiplier and will then have to check conditions (51). Thus

0 = \frac{\partial}{\partial u_j}\left[\ln \check F_{n-2} - \lambda \sum_{j=1}^N \ln u_j\right] \quad (54)
2\left[u_j - \nu_j(1-p_j)\right] u_j = \lambda \left[u_j^2 - 2\nu_j(1-p_j)u_j + 1 - p_j\right] \quad (55)
\prod_{j=1}^N \frac{u_j - \nu_j(1-p_j)}{p_j} = 2\left(\frac{\lambda}{2}\right)^N \left(1 - \frac{1}{n}\right)^N \check F_{n-2}. \quad (56)

We see that $\lambda/2$ will be approximately the geometric mean of the quantities $F_{n-1}^{(j)}$ and hence close to 1. Expanding $u_j = \alpha + \beta(\lambda - 2) + \gamma(\nu_j - 1) \pm \cdots$ up to linear order in the small quantities $\lambda - 2$ and $\nu_j - 1$ and substituting into (55) leads to

\lambda - 2 \approx \frac{1}{n} - \frac{2\ln 2}{\sum_{k=1}^N \frac{p_k}{1-p_k}} \quad (57)
u_j \approx 1 - \frac{p_j}{1-p_j} \cdot \frac{\ln 2}{\sum_{k=1}^N \frac{p_k}{1-p_k}} \quad (58)
F_{n-1}^{(j)} \approx 1 - \frac{1}{1-p_j} \cdot \frac{\ln 2}{\sum_{k=1}^N \frac{p_k}{1-p_k}}. \quad (59)

Summarizing:

Theorem 2: An optimal target distribution is obtained by choosing fixed weight code words of weight

\frac{n}{1-p_j} \cdot \frac{\ln 2}{\sum_{k=1}^N \frac{p_k}{1-p_k}}

for encoding of bit $j$.

If the probabilities $p_j$ are actually independent of $j$ then this coincides with section III.

V. THE GENERAL CASE

There remains the question what to do if the source bit distributions are neither uniform nor independent. We may take a clue from theorem 2: as long as the individual bit probabilities $p_j$ are not too large, say $< 1/2$, the code word weights recommended there do not vary significantly. We may try one and the same code distribution for all source bits and thus place ourselves in the situation of theorem 1. The generating function defined in equation (39) is the same that would be obtained from an isotropic bit distribution with

p_m = \binom{N}{m}^{-1} \sum_{\beta \in \dot I^N,\ \#\beta^{-1}(1) = m} P(\beta). \quad (60)

Choosing fixed weight code words of weight $n(1-q)$ for a suitable parameter $q$, theorem 1 tells us that the target bits will have an expected weight of $n(1 - \Pi(q))$, and we want to arrange for $\Pi(q) = 1 - \check G_1 = \frac12$. Substituting $q = e^{-\varepsilon}$ with $0 < \varepsilon \ll 1$ we derive from (17):

\check G_1 = 1 - \Pi\left(e^{-\varepsilon}\right) = \sum_{m=1}^\infty (-1)^{m+1} \frac{\mu_m}{m!} \varepsilon^m. \quad (61)
This equation is quite suitable for practical application, because the power series converges fast and the lower moments of the source bit distribution are easily evaluated. We can solve for $\varepsilon$:

\varepsilon = \frac{\check G_1}{\mu_1} + \frac{\mu_2}{2\mu_1^3}\,\check G_1^2 + \frac{3\mu_2^2 - \mu_1\mu_3}{6\mu_1^5}\,\check G_1^3 \pm \cdots \quad (62)

Convergence of this power series is again fast enough to use the partial sum above as a practical estimate. Notice that in the case of source words of fixed weight $r$ we have $\mu_m = r^m$ and we recover section III exactly.

REFERENCES

[1] J. M. Barnard, "Substructure searching methods: Old and new," J. Chem. Inf. Comput. Sci., vol. 33, pp. 532–538, 1993.
[2] J. M. Barnard and D. Walkowiak, "Computer systems for substructure searching," in The Beilstein System – Strategies for Effective Searching, S. Heller, Ed. American Chemical Society, 1998, pp. 55–72.
[3] W. H. Kautz and R. C. Singleton, "Nonrandom binary superimposed codes," IEEE Trans. on Information Theory, vol. 10, no. 4, pp. 363–377, October 1964.
[4] A. M. Rashad, "Superimposed codes for the search model of A. Renyi," Intern. J. Computer Math., vol. 36, pp. 47–56, 1990.
[5] C. N. Mooers, "Application of random codes to the gathering of statistical information," Master's thesis, Massachusetts Institute of Technology, 1948.
[6] C. S. Roberts, "Partial-match retrieval via the method of superimposed codes," Proceedings of the IEEE, vol. 67, no. 12, pp. 1624–1642, December 1979.
[7] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," CACM, vol. 13, no. 7, pp. 422–426, July 1970.
[8] D. E. Knuth, The Art of Computer Programming, 2nd ed., vol. 3. Addison-Wesley, 1998.
[9] L. Hodes, "Selection of descriptors according to discrimination and redundancy. Application to chemical structure searching," J. Chem. Inf. Comput. Sci., vol. 16, pp. 88–93, 1976.
[10] A. Renyi, "On the theory of random search," Bull. Amer. Math. Soc., vol. 71, no. 6, pp. 809–928, 1965.
[11] W. Feller, An Introduction to Probability Theory and Its Applications, 3rd ed. Princeton, 1970.