On the Probability Distribution of Superimposed Random Codes
A systematic study of the probability distribution of superimposed random codes is presented through the use of generating functions. Special attention is paid to the cases of either uniformly distributed but not necessarily independent or non uniform but independent bit structures. Recommendations for optimal coding strategies are derived.
Authors: Bernd Günther
This paper has been published by IEEE: IEEE Trans. Inf. Theory, 54(7):3206–3210, 2008; DOI 10.1109/TIT.2008.924658.

Index Terms: Database indexing, false drop estimates, generating functions, probability distribution, superimposed coding

I. INTRODUCTION

Chemical structure retrieval systems are frequently presented with the task of producing a list of all stored chemical graphs containing a prescribed subgraph [1], [2]. Due to the absence of a linear order among the stored data, tree based search strategies fail, and a sequential search has to be performed. To accelerate this time consuming process, the actual graph theoretical substructure match is preceded by prescreening: the entire database is matched against a library of simple but common descriptors, and the validity of descriptors is recorded in a bitstring for each stored structure. Suitable choices for descriptors are small chemical subgraphs containing only few vertices, graph diameters, ring sizes or any other property that passes from subgraphs to supergraphs. When a query structure is submitted to the system, the descriptors are evaluated for this query structure, resulting in a query bit string. Only those stored structures whose bits are turned on in all positions where the query bits are turned on are candidates for a match, and only these will be subjected to the expensive graph theoretical matching algorithm. For example, let us consider the compounds in Table I.
A chemist might ask for a list of all structures in our database containing 2-(cyclohexylmethyl)naphthalene, which is too complex to be one of the index descriptors. However, any matching structure must necessarily contain cyclohexane and naphthalene, and these might be indexed. We will produce an intermediate result set that also contains 2-(2-cyclohexylethyl)naphthalene, which is not in accordance with the original query specification and must be singled out by graph matching.

TABLE I: 2-(cyclohexylmethyl)naphthalene; cyclohexane; naphthalene; 2-(2-cyclohexylethyl)naphthalene

The Beilstein database of organic compounds contains around 10 million structures, each a chemical graph of up to 255 vertices, and the number of descriptors will have the magnitude of one thousand. With such characteristics, the preevaluated bitstrings will consume a considerable amount of storage and hence of processing time. On the other hand, one may expect the 1-bits to be relatively scarce, whence it should be possible to compress the bitstrings without losing too much information. Thus, we are looking for a map $\psi: \dot I^N \to \dot I^n$, $\dot I = \{0,1\}$, transforming bitstrings of length $N$ into strings of length $n$. (Author's address: Beilstein-Institut, Trakehner Straße 7-9, 60487 Frankfurt, Germany.)

However, we have to observe the partial order relation $\beta \le \beta'$ between bitstrings, defined such that the bits satisfy $\beta(i) \le \beta'(i)$ at all positions $i$. Since we want to use the compressed strings in the same manner (by bitmasking) as the original ones, the transformation must be monotone: $\beta \le \beta' \Rightarrow \psi(\beta) \le \psi(\beta')$, and in particular $\psi(\beta) \vee \psi(\beta') \le \psi(\beta \vee \beta')$, where the wedge denotes the bitwise inclusive or operation. If the latter inequality were strict, information would be lost; hence we require

\psi(\beta) \vee \psi(\beta') = \psi(\beta \vee \beta') \quad (1)

for any two source strings $\beta, \beta' \in \dot I^N$.
This is the defining condition of superimposed coding. If we denote by $\beta_j \in \dot I^N$ the elementary string having bit 1 in position $j$ and 0 elsewhere, and set $\psi_j := \psi(\beta_j) \in \dot I^n$ equal to the code word assigned to bit $j$, then we must have

\psi(\beta) = \bigvee_{j \in \beta^{-1}(1)} \psi_j, \quad (2)

explaining the term. Here $\beta^{-1}(1)$ is the inverse image of 1, i.e. the set of all positions $i$ where the $i$-th bit $\beta(i)$ is turned on. As a summary we emphasize: in our context superposition is not a matter of choice but a requirement.

We speak of superimposed random coding if the code words $\psi_j$ are constructed using a random number generator. Non random approaches have been considered e.g. in [3], [4] and perform very well when the number of code words superposed in equation (2) is bounded, but are unpredictable beyond this barrier. The random approach was probably initiated by [5], but there an invalid probability analysis was given. A thorough study of the case where source bitstrings and code words are of fixed weight was given in [6]. Bloom filters (cf. [7] and [8, p. 572]) constitute an application of this technique where the primary keys of a database are encoded in a bitstring; it should be noted that the approach favored by us produces a two-dimensional bit array for the entire database, one bitstring for each record. Despite the continuing popularity of the subject in the context of chemical substructure search, a broader systematic approach seems to be lacking, a gap that the current paper intends to fill.

One question that has to be addressed is the optimal choice of codes. It is known at least since Roberts' paper [6] that the target bitstrings should contain a more or less even balance of 0- and 1-bits, but if we want to achieve this other than by trial and error we must study the distribution of target bits.
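The superposition (2) and the bitmask prescreening test are easy to sketch in code. The following Python illustration is ours, not from the paper (which specifies no implementation); it generates fixed weight random code words as $n$-bit integers and superimposes them by bitwise OR, and all names are illustrative:

```python
import random

def make_codewords(N, n, w, seed=0):
    """One random fixed-weight code word (an n-bit int with exactly w
    bits set) for each of the N source bit positions."""
    rng = random.Random(seed)
    codewords = []
    for _ in range(N):
        word = 0
        for b in rng.sample(range(n), w):  # w distinct bit positions
            word |= 1 << b
        codewords.append(word)
    return codewords

def encode(source_bits, codewords):
    """Superimpose (bitwise OR) the code words of all 1-bits, eq. (2)."""
    sig = 0
    for j, bit in enumerate(source_bits):
        if bit:
            sig |= codewords[j]
    return sig

def covers(stored_sig, query_sig):
    """Prescreening: every 1-bit of the query must be set in the record."""
    return stored_sig & query_sig == query_sig
```

By construction `encode` satisfies condition (1): encoding the bitwise OR of two source strings yields the OR of their encodings.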
Of particular practical importance is the situation where the source bits are turned on with non uniform probability. A few statements hereabout are made by Roberts in [6], but they are based on intuition, not computation, and in fact we disagree with his conclusion. In [9] the author gives recommendations about the selection of descriptors, whereas we set ourselves the task of adapting the coding design to a given set of descriptors.

We must notice that the notion "random coding" has an inherent semantic difficulty. It must be understood clearly that coding and decoding of a bit pattern have to be performed within one and the same run of the code generation random experiment. After each coding-decoding cycle the code generation should be repeated independently. However, this procedure does not exactly fit when recording bit patterns in databases: here the codes must be fixed once and for all. Since a large set of $N$ codes is needed, we might expect good statistical behavior in this context too.

The observant reader will have noticed that equation (2) allows for a target string $\psi(\beta)$ to be covered by a comparison pattern, $\psi(\beta) \le \psi(\gamma)$, even if the source patterns do not cover: $\beta \not\le \gamma$. This will result in the inclusion of fake hits in our result sets. As long as their number is small this is acceptable, because the bit comparison will be followed by graph matching anyway. However, we want to predict and minimize this number.

In section II we will develop the probabilistic tools necessary to manage our random bit strings. In section III we demonstrate how to compute the distribution of the target bits from those of source and code bits, and binomially distributed and fixed weight code words are introduced as primary examples for code generation. This settles the case of uniform but not necessarily independent source bit distributions.
The requirement of uniformity will be dropped in section IV, but the assumption of independence will be added. The completely general case will be addressed in section V.

II. ISOTROPIC BIT DISTRIBUTIONS

Let's consider a fixed source bit pattern $\beta \in \dot I^N$ and a fixed target bit pattern $\alpha \in \dot I^n$; then

P(\psi(\beta) \le \alpha) = P\left(\forall j \in \beta^{-1}(1): \psi_j \le \alpha\right) \quad (3)
= \prod_{j \in \beta^{-1}(1)} P(\psi_j \le \alpha), \quad (4)

where the probability is that of the code generation. Now, varying the source pattern $\beta$ independently, we obtain an expression for the probability of target patterns:

P(\psi(-) \le \alpha) = \sum_{\beta \in \dot I^N} P(\beta) \prod_{j \in \beta^{-1}(1)} P(\psi_j \le \alpha). \quad (5)

Definition 1: We call a probability distribution on $\dot I^n$ isotropic if it depends only on the number of 1-bits but not on their position.

This means that the distribution is invariant under all coordinate permutations (in [10] the term homogeneous is used). As is evident from equation (5), the distribution of the target patterns is isotropic if the code generation is, even if the source patterns are non isotropically distributed. Since this observation leads to a considerable simplification of our analysis, we now give a short exposition of isotropic distributions.

We are given numbers $p_0, \ldots, p_n \ge 0$ with $\sum_{k=0}^n \binom{n}{k} p_k = 1$ such that the probability of a particular bit pattern $\alpha \in \dot I^n$ is given by $P(\alpha) = p_a$ with $a := \#\alpha^{-1}(1)$. Our main analytical tool will be the probability generating function

f(t) := \sum_{k=0}^n \binom{n}{k} p_k t^k, \quad (6)

this being the standard definition obtained by considering the number of 1-bits as random variable. For fixed $\alpha \in \dot I^n$ with $a := \#\alpha^{-1}(1)$ we define

F_a := P(\xi \le \alpha) = \sum_{k=0}^a \binom{a}{k} p_k \quad (7)
G_a := P(\xi \ge \alpha) = \sum_{k=a}^n \binom{n-a}{k-a} p_k. \quad (8)

Of course the quantities $F_a$ and $G_{n-a}$ are dual to each other; in fact the one is transformed into the other by switching 0- and 1-bits.
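As a numeric illustration (our own sketch, not part of the paper), the coefficients of equations (7) and (8) can be evaluated directly from the $p_k$; for a binomial distribution with bit probability $1-q$ they reduce to $q^{n-a}$ and $(1-q)^a$ respectively:

```python
from math import comb

def F_coeff(p, a):
    """F_a = P(xi <= alpha) = sum_{k<=a} C(a,k) p_k, eq. (7);
    p[k] is the probability of one particular pattern with k one-bits."""
    return sum(comb(a, k) * p[k] for k in range(a + 1))

def G_coeff(p, a):
    """G_a = P(xi >= alpha) = sum_{k>=a} C(n-a,k-a) p_k, eq. (8)."""
    n = len(p) - 1
    return sum(comb(n - a, k - a) * p[k] for k in range(a, n + 1))
```

For instance, with $n = 6$, $q = 0.3$ and $p_k = (1-q)^k q^{n-k}$, one checks `F_coeff(p, 2) == q**4` and `G_coeff(p, 2) == (1-q)**2` up to rounding.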
The quantities $F_a$ and $G_a$ play very different roles for us, though: notice the occurrence of $F_a$ on the right hand side of equation (5), whereas $G_a$ is the relative number of candidates that will be selected by prescreening with bit pattern $\alpha$, because the condition $\xi \ge \alpha$ singles out precisely those patterns $\xi$ that have 1-bits in the same positions as $\alpha$, and possibly some more. We define generating functions as follows:

F(t) := \sum_{m=0}^n \binom{n}{m} F_m t^m \quad (9)
G(t) := \sum_{k=0}^n \binom{n}{k} G_{n-k} t^k \quad (10)

The following relations are readily established:

F(t) = (1+t)^n f\left(\frac{t}{1+t}\right) \quad (11)
G(t) = (-1)^n F(-1-t) \quad (12)
G(t) = t^n f\left(\frac{1+t}{t}\right) \quad (13)
p_m = \sum_{k=0}^m (-1)^{m+k} \binom{m}{k} F_k \quad (14)
G_m = \sum_{k=0}^m (-1)^k \binom{m}{k} F_{n-k} \quad (15)

The three sets of coefficients $p_m$, $F_m$ and $G_m$ carry the same information and can be readily converted into each other using the above relations. The reader will certainly notice the difference between (11) and the standard situation [11, Thm. XI.1], which arises from the fact that the values $\alpha$ appearing in definition (7) are partially but not linearly ordered.

For later reference we need the derivatives of $f$ at 1. By (13) we have $f(1+\varepsilon) = \varepsilon^n G(1/\varepsilon) = \sum_{k=0}^n \binom{n}{k} G_k \varepsilon^k$. This allows us to read off the Taylor coefficients of the function $f$ at the point 1 at once:

\frac{f^{(k)}(1)}{k!} = \binom{n}{k} G_k \quad (16)

We define the moments $\mu_m = \sum_{k=0}^n k^m \binom{n}{k} p_k$ of our distribution as usual and remember their generating function [11, Exc. XI.7.24]:

f(e^t) = \sum_{m=0}^\infty \frac{\mu_m}{m!} t^m. \quad (17)

Equations (16) and (17) allow us to express the first two moments in terms of the distribution coefficients $F_m$:

\mu_1 = n(1 - F_{n-1}) = n G_1 \quad (18)
\mu_2 = n\left[n - (2n-1)F_{n-1} + (n-1)F_{n-2}\right] = n\left[(n-1)G_2 + G_1\right]. \quad (19)

The variance evaluates to

\mu_2 - \mu_1^2 = n\left[F_{n-1} - n F_{n-1}^2 + (n-1)F_{n-2}\right], \quad (20)

whence in particular:

F_{n-2} \ge (n-1)^{-1}\left(n F_{n-1} - 1\right) F_{n-1}. \quad (21)
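The conversions (7), (14) and (15) are easy to check numerically. The sketch below is ours (illustrative names, not from the paper); it round-trips $p \to F \to p$ and produces the $G_m$ from the $F_m$:

```python
from math import comb

def p_to_F(p):
    """F_m = sum_{k<=m} C(m,k) p_k, eq. (7) with a = m."""
    n = len(p) - 1
    return [sum(comb(m, k) * p[k] for k in range(m + 1))
            for m in range(n + 1)]

def F_to_p(F):
    """Invert via eq. (14): p_m = sum_k (-1)^{m+k} C(m,k) F_k."""
    n = len(F) - 1
    return [sum((-1) ** (m + k) * comb(m, k) * F[k] for k in range(m + 1))
            for m in range(n + 1)]

def F_to_G(F):
    """G_m = sum_k (-1)^k C(m,k) F_{n-k}, eq. (15)."""
    n = len(F) - 1
    return [sum((-1) ** k * comb(m, k) * F[n - k] for k in range(m + 1))
            for m in range(n + 1)]
```

For the fixed weight distribution with $n = 5$, $w = 2$ (so $p_2 = 1/\binom{5}{2} = 0.1$) this reproduces $F_3 = 0.3$, $F_4 = 0.6$ and $G_1 = 0.4$, in agreement with the closed forms of example 2.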
Two examples are of primary importance in our context.

Example 1: The binomial distribution $p_m = (1-q)^m q^{n-m}$ with parameter $1-q$. Here we have:

f(t) = (q + (1-q)t)^n \quad (22)
F(t) = (q+t)^n \quad (23)
G(t) = (1-q+t)^n \quad (24)
F_m = q^{n-m} \quad (25)
G_m = (1-q)^m \quad (26)
\mu_1 = n(1-q) \quad (27)
\mu_2 = n(1-q)\left[n(1-q) + q\right] \quad (28)
\mu_2 - \mu_1^2 = n q (1-q). \quad (29)

Example 2: Fixed weight. Here only bit patterns with a fixed number $w$ of 1-bits are permitted. We get:

p_m = \binom{n}{w}^{-1} \text{ for } m = w, \quad p_m = 0 \text{ for } m \ne w \quad (30)
f(t) = t^w \quad (31)
F(t) = (1+t)^{n-w} t^w \quad (32)
G(t) = (1+t)^w t^{n-w} \quad (33)
F_m = \binom{n-w}{n-m} \Big/ \binom{n}{m} \text{ for } m \ge w, \text{ else } 0 \quad (34)
G_m = \binom{w}{m} \Big/ \binom{n}{m} \text{ for } m \le w, \text{ else } 0 \quad (35)
\mu_m = w^m \quad (36)
\mu_2 - \mu_1^2 = 0. \quad (37)

Hardly surprising, inequality (21) is sharp for this distribution.

III. UNIFORM CODE WORD GENERATION

Denoting the target distribution's coefficients by $\check F_a$, equation (5) can be written as

\check F_a = \sum_{\beta \in \dot I^N} P(\beta) \prod_{j \in \beta^{-1}(1)} F_a^{(j)}, \quad (38)

where the coefficients $F_a^{(j)}$ describe the random experiment used for generating the $j$-th code word. If we take these independent of $j$, (38) simplifies considerably. Introducing the source bits' probability generating function

\Pi(t) := \sum_{\beta \in \dot I^N} P(\beta)\, t^{\#\beta^{-1}(1)}, \quad (39)

we can formulate a simple but significant theorem:

Theorem 1: If the code word generation is performed uniformly with coefficients $F_a$ and $\Pi$ is the generating function of the source pattern space, then the target bit distribution is given by $\check F_m = \Pi(F_m)$.

This theorem is the pivot enabling us to compute the target distribution when source and code distribution are known. The source bit distribution acts as a transformation turning the code bit distribution into the target bit distribution. The target distribution depends linearly on the source distribution but in a complex way on the code distribution.
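Theorem 1 is a one-liner to apply. The sketch below (ours; names are illustrative) evaluates $\check F_m = \Pi(F_m)$ given the weight distribution of the source words and the code coefficients:

```python
def source_gf(weight_probs, t):
    """Pi(t) = sum_w P(source weight = w) * t**w, eq. (39);
    weight_probs[w] is the probability that a source word has weight w."""
    return sum(pw * t ** w for w, pw in enumerate(weight_probs))

def target_coeffs(weight_probs, F_code):
    """Theorem 1: target coefficients F̌_m = Pi(F_m) for uniform
    code word generation with coefficients F_code[m]."""
    return [source_gf(weight_probs, Fm) for Fm in F_code]
```

With fixed weight $r$ source words ($\Pi(t) = t^r$) and binomial code words ($F_m = q^{n-m}$) this reproduces the closed form $\check F_m = q^{r(n-m)}$ obtained in example 3.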
We can immediately derive the first two moments of the target bit distribution from equations (18) and (19):

\check\mu_1 = n\left[1 - \Pi(F_{n-1})\right] \quad (40)
\check\mu_2 = n\left[n - (2n-1)\Pi(F_{n-1}) + (n-1)\Pi(F_{n-2})\right] \quad (41)
\check\mu_2 - \check\mu_1^2 = n\left[\Pi(F_{n-1}) - n\,\Pi(F_{n-1})^2 + (n-1)\Pi(F_{n-2})\right] \quad (42)

Example 3: Let's consider fixed weight source words of weight $r$ and binomially generated code words with parameter $q$. From (31) we can read off the transformation function $\Pi(t) = t^r$, and by (25) the coefficients are given by $F_m = q^{n-m}$. Now our theorem implies $\check F_m = q^{r(n-m)}$, i.e., the target bits are binomially distributed with parameter $q^r$. Then (26) immediately implies $\check G_m = (1 - q^r)^m$, and by (29) the variance equals $\check\mu_2 - \check\mu_1^2 = n q^r (1 - q^r)$. This case is simple because the individual target bits are independently distributed, which is not true in general.

By definition $\check G_m$ equals the expected relative number of candidates matching a test pattern of weight $m$. These candidates are determined by statistics without reference to the original content, hence they are considered "false drops" (an idiomatic expression which is historic and derives from the application to punched cards), and their number shall be minimized. Pursuing our example further, we assume that the test pattern is obtained by submitting a query source string of weight $s$ to the same superimposed coding process. Any pattern of weight $m$ in the target query space will occur with probability $\tilde p_m = (1-q^s)^m q^{s(n-m)}$, and we can expect a proportion of

\vartheta = \sum_{m=0}^n \binom{n}{m} \check G_m \tilde p_m = \left[(1-q^r)(1-q^s) + q^s\right]^n

random hits. $\vartheta$ is minimal if we choose $q^s = \frac{r}{r+s}$, giving

\vartheta = \left[1 - \left(\frac{r}{r+s}\right)^{r/s} \frac{s}{r+s}\right]^n.

For $r \gg s$ we have $q^r = \left(\frac{r}{r+s}\right)^{r/s} = \left(1 - \frac{s}{r+s}\right)^{r/s} \approx e^{-\frac{r}{r+s}} \approx e^{-1}$ and $\vartheta = \left[1 - \frac{1}{e}(1-q^s)\right]^n = \left[1 - \frac{1}{e}\left(1 - e^{-s/r}\right)\right]^n$, hence $\ln\vartheta \approx -\frac{ns}{er}$.
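The false drop estimate of example 3 can be explored numerically. A small sketch (ours, under the paper's assumptions of weight-$r$ records, weight-$s$ queries and binomial code words) together with the analytic optimum $q^s = r/(r+s)$:

```python
def false_drop_rate(n, r, s, q):
    """theta = [(1 - q**r)(1 - q**s) + q**s]**n: expected proportion
    of random hits (example 3)."""
    return ((1 - q ** r) * (1 - q ** s) + q ** s) ** n

def optimal_q(r, s):
    """Analytic minimizer of theta: choose q with q**s = r/(r + s)."""
    return (r / (r + s)) ** (1.0 / s)
```

Evaluating `false_drop_rate` at `optimal_q(r, s)` and at nearby values of `q` confirms that the analytic choice is indeed the minimum.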
If we do not want the relative number of false drops to exceed a proportion $\vartheta_{\max}$ but have to allow for query words of weight at least $s_{\min}$, then we must choose our target bit patterns of length $n \gtrsim \frac{e\,r}{s_{\min}}\,|\ln\vartheta_{\max}|$. Notice that the original bitstring length $N$ does not enter at all; just the number of 1-bits $r$ is relevant.

Example 4: Our example above was chosen for its simplicity; we don't recommend using binomially distributed code words in practice. Fixed weight code words of weight $w$ can be expected to exhibit a much more robust behavior: because of equations (40) and (42) and the monotonicity of probability generating functions, fixed weight code words will produce minimal variance in the target distribution if the expectation is prescribed. Both cases may be directly compared if the parameter of the binomial distribution is chosen as $q = 1 - \frac{w}{n}$, because by (34) $F_{n-1} = \binom{n-w}{1}/\binom{n}{1} = \frac{n-w}{n} = q$. Furthermore $F_{n-2} = \binom{n-w}{2}/\binom{n}{2} = \frac{(n-w)(n-w-1)}{n(n-1)} = q\,\frac{nq-1}{n-1}$ and $\check F_{n-2} = q^r \left(\frac{nq-1}{n-1}\right)^r$. The variance

\mathrm{var} = \check\mu_2 - \check\mu_1^2 = n\left[q^r - n q^{2r} + (n-1)\, q^r \left(\frac{nq-1}{n-1}\right)^r\right] \quad (43)
= n\left[q^r - q^{2r} - (n-1)\left(q^{2r} - q^r\left(q - \frac{1-q}{n-1}\right)^r\right)\right] \quad (44)

can be computed asymptotically for large $n$ by developing the rightmost bracket in a power series, $\left(q - \frac{1-q}{n-1}\right)^r = q^r - r q^{r-1}\,\frac{1-q}{n-1} \pm \cdots$, where all higher powers may be safely discarded:

\mathrm{var} \approx n q^r (1 - q^r) \left[1 - \frac{r(1-q)\, q^{r-1}}{1 - q^r}\right], \quad (45)

the first factor being equal to the value in the binomial case. With suitable parameters the factor in square brackets may be quite small, i.e., the fixed weight variance may be negligible while the binomial variance is not.

We reconsider example 3 with fixed weight code words. In [6] it is shown (the equation is in fact derived by using the binomial case as an approximation and observing that target query weights have small variance) that $\vartheta \approx (1-q^r)^{n(1-q^s)}$. The minimum value is attained for $q^r \approx \frac12$ with $\ln\vartheta \approx -\frac{ns}{r}(\ln 2)^2$. This value is slightly better than in example 3.
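The two design rules of this section — the length rule $n \gtrsim (e\,r/s_{\min})\,|\ln\vartheta_{\max}|$ and the fixed weight choice with $q^r \approx 1/2$ quoted from [6] — can be packaged as a sketch (ours; the helper names are illustrative):

```python
import math

def required_length(r, s_min, theta_max):
    """Example 3 rule of thumb: n >= (e*r / s_min) * |ln theta_max|."""
    return math.ceil(math.e * r / s_min * abs(math.log(theta_max)))

def fixed_weight(n, r):
    """Code word weight w with ((n - w)/n)**r close to 1/2, i.e. the
    optimum q**r = 1/2 for fixed weight code words."""
    return round(n * (1 - 0.5 ** (1.0 / r)))
```

For instance, with $n = 1000$ and $r = 20$ this yields $w = 34$, so that $q = 0.966$ and $q^{20} \approx 0.5$ as desired.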
We have $\mathrm{var} \approx n \frac{s^2}{r^2} (\ln 2)^2$, which is smaller than the binomial case by a factor $\frac{s}{r}\ln 2$.

IV. ADAPTING TO NON UNIFORM BUT INDEPENDENT SOURCE BIT DISTRIBUTIONS

If the individual source bits are independent and the $i$-th bit is turned on with probability $p_i$, the probability of a given source pattern $\beta \in \dot I^N$ is

P(\beta) = \prod_{i \in \beta^{-1}(1)} p_i \prod_{i \notin \beta^{-1}(1)} (1 - p_i). \quad (46)

Inserting (46) into (38) and distributively collecting terms yields

\check F_a = \prod_{j=1}^N \left(p_j F_a^{(j)} + 1 - p_j\right). \quad (47)

In our selection of optimal code word distributions we let ourselves be guided by what we learned in section III: we set $\check F_{n-1} = \frac12$ and minimize $\check F_{n-2}$, observing that by equation (19) this is equivalent to minimization of the second moment and hence of the variance. By inequality (21),

F_{n-2}^{(j)} \ge (n-1)^{-1}\left(n F_{n-1}^{(j)} - 1\right) F_{n-1}^{(j)},

the lower bound being attained for fixed weight code words. Substituting

u_j := p_j F_{n-1}^{(j)} + 1 - p_j \quad (48)

we have to solve the minimization problem

\check F_{n-2} = \frac{n^N}{(n-1)^N \prod_j p_j} \prod_{j=1}^N \left\{\left(u_j - \nu_j (1 - p_j)\right)^2 + \omega_j p_j (1 - p_j)\right\} = \min \quad (49)

under the constraint

\frac12 = \check F_{n-1} = \prod_{j=1}^N u_j \quad (50)

in the domain

1 - \frac{n-1}{n}\, p_j \le u_j \le 1 \quad (51)

with

\nu_j := (1 - p_j)^{-1} \left[1 - \left(1 - \frac{1}{2n}\right) p_j\right] \quad (52)
\omega_j := (1 - p_j)^{-1} \left[1 - \frac{1}{n} - \left(1 - \frac{1}{2n}\right)^2 p_j\right] = p_j^{-1}\left[1 - \nu_j^2 (1 - p_j)\right]. \quad (53)

Observe that the terms $\nu_j$ and $\omega_j$ introduced for abbreviation are very close to 1; in particular $\omega_j > 0$. Ignoring the restrictions (51) temporarily, we see that (49) tends to $+\infty$ if at least one of the coordinates $u_j$ does; therefore (49) must have an absolute minimum.
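The product formula (47) itself is cheap to evaluate numerically; a minimal sketch (ours, with illustrative names) for independent, non uniform source bits:

```python
def target_coeff_indep(p_bits, F_codes, a):
    """Eq. (47): F̌_a = prod_j (p_j * F_a^{(j)} + 1 - p_j), where
    p_bits[j] is the probability that source bit j is on and
    F_codes[j][a] is the coefficient F_a of the code used for bit j."""
    out = 1.0
    for pj, Fj in zip(p_bits, F_codes):
        out *= pj * Fj[a] + 1.0 - pj
    return out
```

For equal $p_j = p$ and identical codes this reduces to $(p F_a + 1 - p)^N$, i.e. Theorem 1 applied to a binomial source distribution.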
We are going to locate this minimum using a Lagrange multiplier and will then have to check conditions (51). Thus

0 = \frac{\partial}{\partial u_j}\left[\ln \check F_{n-2} - \lambda \sum_{j=1}^N \ln u_j\right] \quad (54)
2\left[u_j - \nu_j(1-p_j)\right] u_j = \lambda \left[u_j^2 - 2\nu_j(1-p_j)u_j + 1 - p_j\right] \quad (55)
\prod_{j=1}^N \frac{u_j - \nu_j(1-p_j)}{p_j} = 2\left(\frac{\lambda}{2}\right)^N \left(1 - \frac{1}{n}\right)^N \check F_{n-2}. \quad (56)

We see that $\lambda/2$ will be approximately the geometric mean of the quantities $F_{n-1}^{(j)}$ and hence close to 1. Expanding $u_j = \alpha + \beta(\lambda - 2) + \gamma(\nu_j - 1) \pm \cdots$ up to linear order in the small quantities $\lambda - 2$ and $\nu_j - 1$ and substituting into (55) leads to

\lambda - 2 \approx \frac{1}{n} - \frac{2\ln 2}{\sum_{k=1}^N \frac{p_k}{1-p_k}} \quad (57)
u_j \approx 1 - \frac{p_j}{1-p_j} \cdot \frac{\ln 2}{\sum_{k=1}^N \frac{p_k}{1-p_k}} \quad (58)
F_{n-1}^{(j)} \approx 1 - \frac{1}{1-p_j} \cdot \frac{\ln 2}{\sum_{k=1}^N \frac{p_k}{1-p_k}}. \quad (59)

Summarizing:

Theorem 2: An optimal target distribution is obtained by choosing fixed weight code words of weight

\frac{n}{1-p_j} \cdot \frac{\ln 2}{\sum_{k=1}^N \frac{p_k}{1-p_k}}

for encoding of bit $j$.

If the probabilities $p_j$ are actually independent of $j$ then this coincides with section III.

V. THE GENERAL CASE

There remains the question what to do if the source bit distributions are neither uniform nor independent. We may take a clue from theorem 2: as long as the individual bit probabilities $p_j$ are not too large, say $< 1/2$, the code word weights recommended there do not vary significantly. We may try one and the same code distribution for all source bits and thus place ourselves in the situation of theorem 1. The generating function defined in equation (39) is the same that would be obtained from an isotropic bit distribution with

p_m = \binom{N}{m}^{-1} \sum_{\beta \in \dot I^N,\ \#\beta^{-1}(1) = m} P(\beta). \quad (60)

Choosing fixed weight code words of weight $n(1-q)$ for a suitable parameter $q$, theorem 1 tells us that the target bits will have an expected weight of $n(1 - \Pi(q))$, and we want to arrange for $\Pi(q) = 1 - \check G_1 = \frac12$. Substituting $q = e^{-\varepsilon}$ with $0 < \varepsilon \ll 1$ we derive from (17):

\check G_1 = 1 - \Pi\left(e^{-\varepsilon}\right) = \sum_{m=1}^\infty (-1)^{m+1} \frac{\mu_m}{m!} \varepsilon^m. \quad (61)
This equation is quite suitable for practical application, because the power series converges fast and the lower moments of the source bit distribution are easily evaluated. We can solve for $\varepsilon$:

\varepsilon = \frac{\check G_1}{\mu_1} + \frac{\mu_2}{2\mu_1^3}\,\check G_1^2 + \frac{3\mu_2^2 - \mu_1\mu_3}{6\mu_1^5}\,\check G_1^3 \pm \cdots \quad (62)

Convergence of this power series is again fast enough to use the partial sum above as a practical estimate. Notice that in the case of source words of fixed weight $r$ we have $\mu_m = r^m$ and we recover section III exactly.

REFERENCES

[1] J. M. Barnard, "Substructure searching methods: Old and new," J. Chem. Inf. Comput. Sci., vol. 33, pp. 532–538, 1993.
[2] J. M. Barnard and D. Walkowiak, "Computer systems for substructure searching," in The Beilstein System – Strategies for Effective Searching, S. Heller, Ed. American Chemical Society, 1998, pp. 55–72.
[3] W. H. Kautz and R. C. Singleton, "Nonrandom binary superimposed codes," IEEE Trans. on Information Theory, vol. 10, no. 4, pp. 363–377, October 1964.
[4] A. M. Rashad, "Superimposed codes for the search model of A. Renyi," Intern. J. Computer Math., vol. 36, pp. 47–56, 1990.
[5] C. N. Mooers, "Application of random codes to the gathering of statistical information," Master's thesis, Massachusetts Institute of Technology, 1948.
[6] C. S. Roberts, "Partial-match retrieval via the method of superimposed codes," Proceedings of the IEEE, vol. 67, no. 12, pp. 1624–1642, December 1979.
[7] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," CACM, vol. 13, no. 7, pp. 422–426, July 1970.
[8] D. E. Knuth, The Art of Computer Programming, 2nd ed., vol. 3. Addison-Wesley, 1998.
[9] L. Hodes, "Selection of descriptors according to discrimination and redundancy. Application to chemical structure searching," J. Chem. Inf. Comput. Sci., vol. 16, pp. 88–93, 1976.
[10] A. Renyi, "On the theory of random search," Bull. Amer. Math. Soc., vol. 71, no. 6, pp. 809–928, 1965.
[11] W. Feller, An Introduction to Probability Theory and Its Applications, 3rd ed. Princeton, 1970.