Balanced Families of Perfect Hash Functions and Their Applications
The construction of perfect hash functions is a well-studied topic. In this paper, this concept is generalized with the following definition. We say that a family of functions from $[n]$ to $[k]$ is a $\delta$-balanced $(n,k)$-family of perfect hash …
Authors: Noga Alon, Shai Gutner
Noga Alon¹ and Shai Gutner²

¹ Schools of Mathematics and Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel.⋆ noga@math.tau.ac.il.
² School of Computer Science, Tel-Aviv University, Tel-Aviv, 69978, Israel.⋆⋆ gutner@tau.ac.il.

Abstract. The construction of perfect hash functions is a well-studied topic. In this paper, this concept is generalized with the following definition. We say that a family of functions from $[n]$ to $[k]$ is a $\delta$-balanced $(n,k)$-family of perfect hash functions if for every $S \subseteq [n]$, $|S| = k$, the number of functions that are 1-1 on $S$ is between $T/\delta$ and $\delta T$ for some constant $T > 0$. The standard definition of a family of perfect hash functions requires that there be at least one function that is 1-1 on $S$, for each $S$ of size $k$. In the new notion of balanced families, we require the number of 1-1 functions to be almost the same (taking $\delta$ to be close to 1) for every such $S$. Our main result is that for any constant $\delta > 1$, a $\delta$-balanced $(n,k)$-family of perfect hash functions of size $2^{O(k \log\log k)} \log n$ can be constructed in time $2^{O(k \log\log k)} n \log n$. Using the technique of color-coding we can apply our explicit constructions to devise approximation algorithms for various counting problems in graphs. In particular, we exhibit a deterministic polynomial time algorithm for approximating both the number of simple paths of length $k$ and the number of simple cycles of size $k$ for any $k \le O\left(\frac{\log n}{\log\log\log n}\right)$ in a graph with $n$ vertices. The approximation is up to any fixed desirable relative error.

Key words: approximate counting of subgraphs, color-coding, perfect hashing.

1 Introduction

This paper deals with explicit constructions of balanced families of perfect hash functions.
The topic of perfect hash functions has been widely studied under the more general framework of $k$-restriction problems (see, e.g., [3],[13]). These problems have an existential nature of requiring a set of conditions to hold at least once for any choice of $k$ elements out of the problem domain. We generalize the definition of perfect hash functions, and introduce a new, simple, and yet useful notion which we call balanced families of perfect hash functions. The purpose of our new definition is to incorporate more structure into the constructions. Our explicit constructions together with the method of color-coding from [5] are applied to problems of approximating the number of times that some fixed subgraph appears within a large graph. We focus on counting simple paths and simple cycles. Recently, the method of color-coding has found interesting applications in computational biology ([17],[18],[19],[12]), specifically in detecting signaling pathways within protein interaction networks. This problem is formalized using an undirected edge-weighted graph, where the task is to find a minimum weight path of length $k$. The application of our results in this case is for approximating deterministically the number of minimum weight paths of length $k$.

⋆ Research supported in part by a grant from the Israel Science Foundation, and by the Hermann Minkowski Minerva Center for Geometry at Tel Aviv University.
⋆⋆ This paper forms part of a Ph.D. thesis written by the author under the supervision of Prof. N. Alon and Prof. Y. Azar in Tel Aviv University.

Perfect Hash Functions. An $(n,k)$-family of perfect hash functions is a family of functions from $[n]$ to $[k]$ such that for every $S \subseteq [n]$, $|S| = k$, there exists a function in the family that is 1-1 on $S$. There is an extensive literature dealing with explicit constructions of perfect hash functions.
The construction described in [5] (following [11] and [16]) is of size $2^{O(k)} \log n$. The best known explicit construction is of size $e^k k^{O(\log k)} \log n$, which closely matches the known lower bound of $\Omega(e^k \log n / \sqrt{k})$ [15].

Finding and Counting Paths and Cycles. The foundations for the graph algorithms presented in this paper have been laid in [5]. Two main randomized algorithms are presented there, as follows. A simple directed or undirected path of length $k-1$ in a graph $G = (V,E)$ that contains such a path can be found in $2^{O(k)} |E|$ expected time in the directed case and in $2^{O(k)} |V|$ expected time in the undirected case. A simple directed or undirected cycle of size $k$ in a graph $G = (V,E)$ that contains such a cycle can be found in either $2^{O(k)} |V||E|$ or $2^{O(k)} |V|^{\omega}$ expected time, where $\omega < 2.376$ is the exponent of matrix multiplication. The derandomization of these algorithms incurs an extra $\log |V|$ factor. As for the case of even cycles, it is shown in [20] that for every fixed $k \ge 2$, there is an $O(|V|^2)$ algorithm for finding a simple cycle of size $2k$ in an undirected graph. Improved algorithms for detecting given length cycles have been presented in [6] and [21]. An interesting result from [6], related to the questions addressed in the present paper, is an $O(|V|^{\omega})$ algorithm for counting the number of cycles of size at most 7.

Flum and Grohe proved that the problem of counting exactly the number of paths and cycles of length $k$ in both directed and undirected graphs, parameterized by $k$, is #W[1]-complete [10]. Their result implies that most likely there is no $f(k) \cdot n^c$-algorithm for counting the precise number of paths or cycles of length $k$ in a graph of size $n$ for any computable function $f : \mathbb{N} \to \mathbb{N}$ and constant $c$. This suggests the problem of approximating these quantities.
Arvind and Raman obtained a randomized fixed-parameter tractable algorithm to approximately count the number of copies of a fixed subgraph with bounded treewidth within a large graph [7]. We settle in the affirmative the open question they raise concerning the existence of a deterministic approximate counting algorithm for this problem. For simplicity, we give algorithms for approximately counting paths and cycles. These results can be easily extended to the problem of approximately counting bounded treewidth subgraphs, combining the same approach with the method of [5]. The main new ingredient in our deterministic algorithms is the application of balanced families of perfect hash functions, a combinatorial notion introduced here which, while simple, appears to be very useful.

Balanced Families of Perfect Hash Functions. We say that a family of functions from $[n]$ to $[k]$ is a $\delta$-balanced $(n,k)$-family of perfect hash functions if for every $S \subseteq [n]$, $|S| = k$, the number of functions that are 1-1 on $S$ is between $T/\delta$ and $\delta T$ for some constant $T > 0$. Balanced families of perfect hash functions are a natural generalization of the usual concept of perfect hash functions. To assist with our explicit constructions, we define also the even more generalized notion of balanced splitters. (See Section 2 for the definition. This is a generalization of an ordinary splitter defined in [15].)

Our Results. The main focus of the paper is on explicit constructions of balanced families of perfect hash functions and their applications. First, we give non-constructive upper bounds on the size of different types of balanced splitters. Then, we compare these bounds with those achieved by constructive algorithms.
Our main result is an explicit construction, for every $1 < \delta \le 2$, of a $\delta$-balanced $(n,k)$-family of perfect hash functions of size $2^{O(k \log\log k)} (\delta-1)^{-O(\log k)} \log n$. The running time of the procedure that provides the construction is $2^{O(k \log\log k)} (\delta-1)^{-O(\log k)} n \log n + (\delta-1)^{-O(k/\log k)}$.

Constructions of balanced families of perfect hash functions can be applied to various counting problems in graphs. In particular, we describe deterministic algorithms that approximate the number of times that a small subgraph appears within a large graph. The approximation is always up to some multiplicative factor, that can be made arbitrarily close to 1. For any $1 < \delta \le 2$, the number of simple paths of length $k-1$ in a graph $G = (V,E)$ can be approximated up to a multiplicative factor of $\delta$ in time $2^{O(k \log\log k)} (\delta-1)^{-O(\log k)} |E| \log |V| + (\delta-1)^{-O(k/\log k)}$. The number of simple cycles of size $k$ can be approximated up to a multiplicative factor of $\delta$ in time $2^{O(k \log\log k)} (\delta-1)^{-O(\log k)} |E||V| \log |V| + (\delta-1)^{-O(k/\log k)}$.

Techniques. We use probabilistic arguments in order to prove the existence of different types of small size balanced splitters (whose precise definition is given in the next section). To construct a balanced splitter, a natural randomized algorithm is to choose a large enough number of independent random functions. We show that in some cases, the method of conditional probabilities, when applied on a proper choice of a potential function, can derandomize this process in an efficient way. Constructions of small probability spaces that admit $k$-wise independent random variables are also a natural tool for achieving good splitting properties.
The use of error correcting codes is shown to be useful when we want to find a family of functions from $[n]$ to $[l]$, where $l$ is much bigger than $k^2$, such that for every $S \subseteq [n]$, $|S| = k$, almost all of the functions should be 1-1 on $S$. Balanced splitters can be composed in different ways and our main construction is achieved by composing three types of splitters. We apply the explicit constructions of balanced families of perfect hash functions together with the color-coding technique to get our approximate counting algorithms.

2 Balanced Families of Perfect Hash Functions

In this section we formally define the new notions of balanced families of perfect hash functions and balanced splitters. Here are a few basics first. Denote by $[n]$ the set $\{1, \ldots, n\}$. For any $k$, $1 \le k \le n$, the family of $k$-sized subsets of $[n]$ is denoted by $\binom{[n]}{k}$. We denote by $k \bmod l$ the unique integer $0 \le r < l$ for which $k = ql + r$, for some integer $q$. We now introduce the new notion of balanced families of perfect hash functions.

Definition 1. Suppose that $1 \le k \le n$ and $\delta \ge 1$. We say that a family of functions from $[n]$ to $[k]$ is a $\delta$-balanced $(n,k)$-family of perfect hash functions if there exists a constant real number $T > 0$, such that for every $S \in \binom{[n]}{k}$, the number of functions that are 1-1 on $S$, which we denote by $inj(S)$, satisfies the relation $T/\delta \le inj(S) \le \delta T$.

The following definition generalizes both the last definition and the definition of a splitter from [15].

Definition 2. Suppose that $1 \le k \le n$ and $\delta \ge 1$, and let $H$ be a family of functions from $[n]$ to $[l]$. For a set $S \in \binom{[n]}{k}$ we denote by $split(S)$ the number of functions $h \in H$ that split $S$ into equal-sized parts $h^{-1}(j) \cap S$, $j = 1, \ldots, l$. In case $l$ does not divide $k$ we separate between two cases.
If $k \le l$, then $split(S)$ is defined to be the number of functions that are 1-1 on $S$. Otherwise, $k > l$ and we require the first $k \bmod l$ parts to be of size $\lceil k/l \rceil$ and the remaining parts to be of size $\lfloor k/l \rfloor$. We say that $H$ is a $\delta$-balanced $(n,k,l)$-splitter if there exists a constant real number $T > 0$, such that for every $S \in \binom{[n]}{k}$ we have $T/\delta \le split(S) \le \delta T$.

The definitions of balanced families of perfect hash functions and balanced splitters given above enable us to state the following easy composition lemmas.

Lemma 1. For any $k < l$, let $H$ be an explicit $\delta$-balanced $(n,k,l)$-splitter of size $N$ and let $G$ be an explicit $\gamma$-balanced $(l,k)$-family of perfect hash functions of size $M$. We can use $H$ and $G$ to get an explicit $\delta\gamma$-balanced $(n,k)$-family of perfect hash functions of size $NM$.

Proof. We compose every function of $H$ with every function of $G$ and get the needed result. □

Lemma 2. For any $k > l$, let $H$ be an explicit $\delta$-balanced $(n,k,l)$-splitter of size $N$. For every $j$, $j = 1, \ldots, l$, let $G_j$ be an explicit $\gamma_j$-balanced $(n,k_j)$-family of perfect hash functions of size $M_j$, where $k_j = \lceil k/l \rceil$ for every $j \le k \bmod l$ and $k_j = \lfloor k/l \rfloor$ otherwise. We can use these constructions to get an explicit $(\delta \prod_{j=1}^{l} \gamma_j)$-balanced $(n,k)$-family of perfect hash functions of size $N \prod_{j=1}^{l} M_j$.

Proof. We divide the set $[k]$ into $l$ disjoint intervals $I_1, \ldots, I_l$, where the size of $I_j$ is $k_j$ for every $j = 1, \ldots, l$. We think of $G_j$ as a family of functions from $[n]$ to $I_j$. For every combination of $h \in H$ and $g_j \in G_j$, $j = 1, \ldots, l$, we create a new function that maps an element $x \in [n]$ to $g_{h(x)}(x)$. □

3 Probabilistic Constructions

We will use the following two claims: a variant of the Chernoff bound (cf., e.g., [4]) and Robbins' formula [9] (a tight version of Stirling's formula).

Claim.
Let $Y$ be the sum of mutually independent indicator random variables, $\mu = E[Y]$. For all $1 \le \delta \le 2$,
$$\Pr\left[\frac{\mu}{\delta} \le Y \le \delta\mu\right] > 1 - 2e^{-(\delta-1)^2 \mu/8}.$$

Claim. For every integer $n \ge 1$,
$$\sqrt{2\pi}\, n^{n+1/2} e^{-n+1/(12n+1)} < n! < \sqrt{2\pi}\, n^{n+1/2} e^{-n+1/(12n)}.$$

Now we state the results for $\delta$-balanced $(n,k,l)$-splitters of the three types: $k = l$, $k < l$ and $k > l$.

Theorem 1. For any $1 < \delta \le 2$, there exists a $\delta$-balanced $(n,k)$-family of perfect hash functions of size $O\left(\frac{e^k \sqrt{k} \log n}{(\delta-1)^2}\right)$.

Proof. (sketch) Set $p = k!/k^k$ and $M = \left\lceil \frac{8(k \ln n + 1)}{p(\delta-1)^2} \right\rceil$. We choose $M$ independent random functions. For a specific set $S \in \binom{[n]}{k}$, the expected number of functions that are 1-1 on $S$ is exactly $pM$. By the Chernoff bound, the probability that for at least one set $S \in \binom{[n]}{k}$, the number of functions that are 1-1 on $S$ will not be as needed is at most
$$\binom{n}{k}\, 2e^{-(\delta-1)^2 pM/8} \le 2n^k e^{-(k \ln n + 1)} < 1.$$ □

Theorem 2. For any $k < l$ and $1 < \delta \le 2$, there exists a $\delta$-balanced $(n,k,l)$-splitter of size $O\left(\frac{e^{k^2/l}\, k \log n}{(\delta-1)^2}\right)$.

Proof. (sketch) We set $p = \frac{l!}{(l-k)!\, l^k}$ and $M = \left\lceil \frac{8(k \ln n + 1)}{p(\delta-1)^2} \right\rceil$. Using Robbins' formula, we get
$$\frac{1}{p} \le e^{k+1/12} \left(1 - \frac{k}{l}\right)^{l-k+1/2} \le e^{k+1/12}\, e^{-\frac{k}{l}(l-k+1/2)} = e^{\frac{k^2 - k/2}{l} + 1/12}.$$
We choose $M$ independent random functions and proceed as in the proof of Theorem 1. □

For the case $k > l$, the probabilistic arguments from [15] can be generalized to prove existence of balanced $(n,k,l)$-splitters. Here we focus on the special case of balanced $(n,k,2)$-splitters, which will be of interest later.

Theorem 3. For any $k \ge 2$ and $1 < \delta \le 2$, there exists a $\delta$-balanced $(n,k,2)$-splitter of size $O\left(\frac{k \sqrt{k} \log n}{(\delta-1)^2}\right)$.

Proof. (sketch) Set $M = \left\lceil \frac{8(k \ln n + 1)}{p(\delta-1)^2} \right\rceil$, where $p$ denotes the probability to get the needed split in a random function.
It follows easily from Robbins' formula that $1/p = O(\sqrt{k})$. We choose $M$ independent random functions and proceed as in the proof of Theorem 1. □

4 Explicit Constructions

In this paper, we use the term explicit construction for an algorithm that lists all the elements of the required family of functions in time which is polynomial in the total size of the functions. For a discussion on other definitions for this term, the reader is referred to [15]. We state our results for $\delta$-balanced $(n,k,l)$-splitters of the three types: $k = l$, $k < l$ and $k > l$.

Theorem 4. For any $1 < \delta \le 2$, a $\delta$-balanced $(n,k)$-family of perfect hash functions of size $O\left(\frac{e^k \sqrt{k} \log n}{(\delta-1)^2}\right)$ can be constructed deterministically within time $\binom{n}{k} \frac{e^k k^{O(1)} n \log n}{(\delta-1)^2}$.

Proof. We set $p = k!/k^k$ and $M = \left\lceil \frac{16(k \ln n + 1)}{p(\delta-1)^2} \right\rceil$. Denote $\lambda = (\delta-1)/4$, so obviously $0 < \lambda \le 1/4$. Consider a choice of $M$ independent random functions from $[n]$ to $[k]$. This choice will be derandomized in the course of the algorithm. For every $S \in \binom{[n]}{k}$, we define $X_S = \sum_{i=1}^{M} X_{S,i}$, where $X_{S,i}$ is the indicator random variable that is equal to 1 iff the $i$th function is 1-1 on $S$. Consider the following potential function:
$$\Phi = \sum_{S \in \binom{[n]}{k}} \left( e^{\lambda(X_S - pM)} + e^{\lambda(pM - X_S)} \right).$$
Its expectation can be calculated as follows:
$$E[\Phi] = \binom{n}{k} \left( e^{-\lambda pM} \prod_{i=1}^{M} E[e^{\lambda X_{S,i}}] + e^{\lambda pM} \prod_{i=1}^{M} E[e^{-\lambda X_{S,i}}] \right) = \binom{n}{k} \left( e^{-\lambda pM} [pe^{\lambda} + (1-p)]^M + e^{\lambda pM} [pe^{-\lambda} + (1-p)]^M \right).$$
We now give an upper bound for $E[\Phi]$. Since $1 + u \le e^u$ for all $u$ and $e^{-u} \le 1 - u + u^2/2$ for all $u \ge 0$, we get that $pe^{-\lambda} + (1-p) \le e^{p(e^{-\lambda} - 1)} \le e^{p(-\lambda + \lambda^2/2)}$. Define $\epsilon = e^{\lambda} - 1$, that is $\lambda = \ln(1+\epsilon)$. Thus $pe^{\lambda} + (1-p) = 1 + \epsilon p \le e^{\epsilon p}$. This implies that
$$E[\Phi] \le \binom{n}{k} \left( \left( \frac{e^{\epsilon}}{1+\epsilon} \right)^{pM} + e^{\lambda^2 pM/2} \right).$$
Since $e^u \le 1 + u + u^2$ for all $0 \le u \le 1$, we have that $\frac{e^{\epsilon}}{1+\epsilon} = e^{e^{\lambda} - 1 - \lambda} \le e^{\lambda^2}$.
We conclude that $E[\Phi] \le 2\binom{n}{k} e^{\lambda^2 pM} \le e^{2(k \ln n + 1)}$.

We now describe a deterministic algorithm for finding $M$ functions, so that $E[\Phi]$ will still obey the last upper bound. This is performed using the method of conditional probabilities (cf., e.g., [4], chapter 15). The algorithm will have $M$ phases, where each phase will consist of $n$ steps. In step $i$ of phase $j$ the algorithm will determine the $i$th value of the $j$th function. Out of the $k$ possible values, we greedily choose the value that will decrease $E[\Phi]$ as much as possible. We note that at any specific step of the algorithm, the exact value of the conditional expectation of the potential function can be easily computed in time $\binom{n}{k} k^{O(1)}$.

After all the $M$ functions have been determined, every set $S \in \binom{[n]}{k}$ satisfies the following:
$$e^{\lambda(X_S - pM)} + e^{\lambda(pM - X_S)} \le e^{2(k \ln n + 1)}.$$
This implies that
$$-2(k \ln n + 1) \le \lambda(X_S - pM) \le 2(k \ln n + 1).$$
Recall that $\lambda = (\delta-1)/4$, and therefore
$$\left(1 - \frac{8(k \ln n + 1)}{(\delta-1)pM}\right) pM \le X_S \le \left(1 + \frac{8(k \ln n + 1)}{(\delta-1)pM}\right) pM.$$
Plugging in the values of $M$ and $p$ we get that
$$\left(1 - \frac{\delta-1}{2}\right) pM \le X_S \le \left(1 + \frac{\delta-1}{2}\right) pM.$$
Using the fact that $1/u \le 1 - (u-1)/2$ for all $1 \le u \le 2$, we get the desired result $pM/\delta \le X_S \le \delta pM$. □

Theorem 5. For any $1 < \delta \le 2$, a $\delta$-balanced $\left(n, k, \left\lceil \frac{2k^2}{\delta-1} \right\rceil\right)$-splitter of size $\frac{k^{O(1)} \log n}{(\delta-1)^{O(1)}}$ can be constructed in time $\frac{k^{O(1)} n \log n}{(\delta-1)^{O(1)}}$.

Proof. Denote $q = \left\lceil \frac{2k^2}{\delta-1} \right\rceil$. Consider an explicit construction of an error correcting code with $n$ codewords over alphabet $[q]$ whose normalized Hamming distance is at least $1 - \frac{2}{q}$. Such explicit codes of length $O(q^2 \log n)$ exist [1]. Now let every index of the code correspond to a function from $[n]$ to $[q]$.
If we denote by $M$ the length of the code, which is in fact the size of the splitter, then for every $S \in \binom{[n]}{k}$, the number of good splits is at least
$$\left(1 - \binom{k}{2} \frac{2}{q}\right) M \ge \left(1 - \frac{\delta-1}{2}\right) M \ge M/\delta,$$
where the last inequality follows from the fact that $1 - (u-1)/2 \ge 1/u$ for all $1 \le u \le 2$. □

For our next construction we use small probability spaces that support a sequence of almost $k$-wise independent random variables. A sequence $X_1, \ldots, X_n$ of random Boolean variables is $(\epsilon, k)$-independent if for any $k$ positions $i_1 < \cdots < i_k$ and any $k$ bits $\alpha_1, \ldots, \alpha_k$ we have
$$\left| \Pr[X_{i_1} = \alpha_1, \ldots, X_{i_k} = \alpha_k] - 2^{-k} \right| < \epsilon.$$
It is known ([14],[2],[1]) that sample spaces of size $2^{O(k + \log\frac{1}{\epsilon})} \log n$ that support $n$ random variables that are $(\epsilon, k)$-independent can be constructed in time $2^{O(k + \log\frac{1}{\epsilon})} n \log n$.

Theorem 6. For any $k \ge l$ and $1 < \delta \le 2$, a $\delta$-balanced $(n,k,l)$-splitter of size $2^{O(k \log l - \log(\delta-1))} \log n$ can be constructed in time $2^{O(k \log l - \log(\delta-1))} n \log n$.

Proof. We use an explicit probability space of size $2^{O(k \log l - \log(\delta-1))} \log n$ that supports $n \lceil \log_2 l \rceil$ random variables that are $(\epsilon, k \lceil \log_2 l \rceil)$-independent where $\epsilon = 2^{-k \lceil \log_2 l \rceil - 1} (\delta-1)$. We attach $\lceil \log_2 l \rceil$ random variables to each element of $[n]$, thereby assigning it a value from $[2^{\lceil \log_2 l \rceil}]$. In case $l$ is not a power of 2, all elements of $[2^{\lceil \log_2 l \rceil}] - [l]$ can be mapped to $[l]$ by some arbitrary fixed function. It follows from the construction that there exists a constant $T > 0$ so that for every $S \in \binom{[n]}{k}$, the number of good splits satisfies
$$\frac{T}{\delta} \le \left(1 - \frac{\delta-1}{2}\right) T \le split(S) \le \left(1 + \frac{\delta-1}{2}\right) T \le \delta T.$$ □

Corollary 1. For any fixed $c > 0$, a $(1 + c^{-k})$-balanced $(n,k,2)$-splitter of size $2^{O(k)} \log n$ can be constructed in time $2^{O(k)} n \log n$.
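The counting argument behind Theorem 5 (each pair of codewords collides in few positions, so most positions are 1-1 on any $k$-set) can be checked by brute force on a toy instance. The sketch below is only an illustration under stated assumptions: it substitutes a Reed-Solomon code over a prime field for the codes of [1] that the paper actually uses, it works with tiny parameters rather than the regime of the theorem, elements are indexed from 0, and the names `rs_family` and `inj_range` are hypothetical.

```python
from itertools import combinations, product
from math import comb, sqrt

def rs_family(q, d):
    """Return q functions from [n] to [q], n = q**d, one per code position.

    Codeword i is the i-th polynomial of degree < d over F_q (q prime);
    function a maps i to the value of that polynomial at the point a.
    This is a stand-in for the codes of [1], not the paper's construction.
    """
    polys = list(product(range(q), repeat=d))  # coefficient tuples (c0, c1, ...)

    def evaluate(coeffs, a):
        value = 0
        for c in reversed(coeffs):  # Horner's rule modulo q
            value = (value * a + c) % q
        return value

    return [[evaluate(p, a) for p in polys] for a in range(q)]

def inj_range(family, n, k):
    """Return (min, max) over all k-subsets S of the number of functions 1-1 on S."""
    lo, hi = len(family), 0
    for S in combinations(range(n), k):
        inj = sum(1 for h in family if len({h[x] for x in S}) == k)
        lo, hi = min(lo, inj), max(hi, inj)
    return lo, hi

q, d, k = 7, 2, 3
family = rs_family(q, d)
lo, hi = inj_range(family, q ** d, k)
# Two distinct polynomials of degree < d agree on at most d - 1 points,
# so at most C(k, 2) * (d - 1) positions can fail to be 1-1 on S.
assert lo >= q - comb(k, 2) * (d - 1)
# The smallest delta for which T/delta <= inj(S) <= delta*T can hold is sqrt(hi/lo).
print(lo, hi, sqrt(hi / lo))
```

Since $k \le l$ here, $split(S)$ is the number of functions that are 1-1 on $S$, so the printed ratio is the best balance parameter this toy family achieves in the sense of Definition 2.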
Setting $l = k$ in Theorem 6, we get that a $\delta$-balanced $(n,k)$-family of perfect hash functions of size $2^{O(k \log k - \log(\delta-1))} \log n$ can be constructed in time $2^{O(k \log k - \log(\delta-1))} n \log n$. Note that if $k$ is small enough with respect to $n$, say $k = O(\log n / \log\log n)$, then for any fixed $1 < \delta \le 2$, this already gives a family of functions of size polynomial in $n$. We improve upon this last result in the following theorem, which is our main construction.

Theorem 7. For $1 < \delta \le 2$, a $\delta$-balanced $(n,k)$-family of perfect hash functions of size $\frac{2^{O(k \log\log k)}}{(\delta-1)^{O(\log k)}} \log n$ can be constructed in time $\frac{2^{O(k \log\log k)}}{(\delta-1)^{O(\log k)}} n \log n + (\delta-1)^{-O(k/\log k)}$. In particular, for any fixed $1 < \delta \le 2$, the size is $2^{O(k \log\log k)} \log n$ and the time is $2^{O(k \log\log k)} n \log n$.

Proof. (sketch) Denote $l = \lceil \log_2 k \rceil$, $\delta' = \delta^{1/3}$, $\delta'' = \delta^{1/(3l)}$, and $q = \left\lceil \frac{2k^2}{\delta' - 1} \right\rceil$. Let $H$ be a $\delta'$-balanced $(q,k,l)$-splitter of size $2^{O(k \log\log k)} (\delta' - 1)^{-O(1)}$ constructed using Theorem 6. For every $j$, $j = 1, \ldots, l$, let $B_j$ be a $\delta''$-balanced $(q,k_j)$-family of perfect hash functions of size $O(e^{k/\log k} k)(\delta'' - 1)^{-O(1)}$ constructed using Theorem 4, where $k_j = \lceil k/l \rceil$ for every $j \le k \bmod l$ and $k_j = \lfloor k/l \rfloor$ otherwise. Using Lemma 2 for composing $H$ and $\{B_j\}_{j=1}^{l}$, we get a $\delta'^2$-balanced $(q,k)$-family $D'$ of perfect hash functions. Now let $D''$ be a $\delta'$-balanced $(n,k,q)$-splitter of size $k^{O(1)} (\delta' - 1)^{-O(1)} \log n$ constructed using Theorem 5. Using Lemma 1 for composing $D'$ and $D''$, we get a $\delta$-balanced $(n,k)$-family of perfect hash functions, as needed.

Note that for calculating the size of each $B_j$, we use the fact that $e^{u/2} \le 1 + u \le e^u$ for all $0 \le u \le 1$, and get the following:
$$\delta'' - 1 = (1 + (\delta-1))^{\frac{1}{3l}} - 1 \ge e^{\frac{\delta-1}{6l}} - 1 \ge \frac{\delta-1}{6l}.$$
The time needed to construct each $B_j$ is $2^{O(k)} (\delta' - 1)^{-O(k/\log k)}$. The $2^{O(k)}$ term is omitted in the final result, as it is negligible with respect to the other terms. □

5 Approximate Counting of Paths and Cycles

We now state what it means for an algorithm to approximate a counting problem.

Definition 3. We say that an algorithm approximates a counting problem by a multiplicative factor $\delta \ge 1$ if for every input $x$, the output $ALG(x)$ of the algorithm satisfies $N(x)/\delta \le ALG(x) \le \delta N(x)$, where $N(x)$ is the exact output of the counting problem for input $x$.

The technique of color-coding is used for approximate counting of paths and cycles. Let $G = (V,E)$ be a directed or undirected graph. In our algorithms we will use constructions of balanced $(|V|,k)$-families of perfect hash functions. Each such function defines a coloring of the vertices of the graph. A path is said to be colorful if each vertex on it is colored by a distinct color. Our goal is to count the exact number of colorful paths in each of these colorings.

Theorem 8. For any $1 < \delta \le 2$, the number of simple (directed or undirected) paths of length $k-1$ in a (directed or undirected) graph $G = (V,E)$ can be approximated up to a multiplicative factor of $\delta$ in time $\frac{2^{O(k \log\log k)}}{(\delta-1)^{O(\log k)}} |E| \log |V| + (\delta-1)^{-O(k/\log k)}$.

Proof. (sketch) We use the $\delta$-balanced $(|V|,k)$-family of perfect hash functions constructed using Theorem 7. Each function of the family defines a coloring of the vertices in $k$ colors. We know that there exists a constant $T > 0$, so that for each set $S \subseteq V$ of $k$ vertices, the number of functions that are 1-1 on $S$ is between $T/\delta$ and $\delta T$. The exact value of $T$ can be easily calculated in all of our explicit constructions.
For each coloring, we use a dynamic programming approach in order to calculate the exact number of colorful paths. We do this in $k$ phases. In the $i$th phase, for each vertex $v \in V$ and for each subset $C \subseteq \{1, \ldots, k\}$ of $i$ colors, we calculate the number of colorful paths of length $i-1$ that end at $v$ and use the colors of $C$. To do so, for every edge $(u,v) \in E$, we check whether it can be the last edge of a colorful path of length $i-1$ ending at either $u$ or $v$. Its contribution to the number of paths of length $i-1$ is calculated using our knowledge on the number of paths of length $i-2$. The initialization of phase 1 is easy and after performing phase $k$ we know the exact number of paths of length $k-1$ that end at each vertex $v \in V$. The time to process each coloring is therefore $2^{O(k)} |E|$.

We sum the results over all colorings and all ending vertices $v \in V$. The result is divided by $T$. In case the graph is undirected, we further divide by 2. This is guaranteed to be the needed approximation. □

Theorem 9. For any $1 < \delta \le 2$, the number of simple (directed or undirected) cycles of size $k$ in a (directed or undirected) graph $G = (V,E)$ can be approximated up to a multiplicative factor of $\delta$ in time $\frac{2^{O(k \log\log k)}}{(\delta-1)^{O(\log k)}} |E||V| \log |V| + (\delta-1)^{-O(k/\log k)}$.

Proof. (sketch) We use the $\delta$-balanced $(|V|,k)$-family of perfect hash functions constructed using Theorem 7. For every set $S$ of $k$ vertices, the number of functions that are 1-1 on $S$ is between $T/\delta$ and $\delta T$. Every function defines a coloring and for each such coloring we proceed as follows. For every vertex $s \in V$ we run the algorithm described in the proof of Theorem 8 in order to calculate for each vertex $v \in V$ the exact number of colorful paths of length $k-1$ from $s$ to $v$.
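The per-coloring dynamic program from the proof of Theorem 8, including the single-source variant invoked here, might be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: vertices and colors are assumed to be numbered from 0, color sets are stored as bitmasks, and the function name `count_colorful_paths` and its parameters are hypothetical.

```python
from collections import defaultdict

def count_colorful_paths(n, edges, coloring, k, source=None, directed=False):
    """Return ends, where ends[v] is the number of colorful paths on k vertices
    (length k - 1) that end at v. If source is given, only paths starting at
    source are counted. In an undirected graph each path is counted once per
    orientation, which is why the proof divides by 2.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)

    # paths[v][mask] = number of colorful paths ending at v with color set mask
    paths = [defaultdict(int) for _ in range(n)]
    for v in range(n):  # phase 1: single-vertex paths
        if source is None or v == source:
            paths[v][1 << coloring[v]] = 1

    for _ in range(k - 1):  # phases 2..k: extend every path by one edge
        nxt = [defaultdict(int) for _ in range(n)]
        for u in range(n):
            for mask, cnt in paths[u].items():
                for v in adj[u]:
                    bit = 1 << coloring[v]
                    if not mask & bit:  # the color of v is still unused
                        nxt[v][mask | bit] += cnt
        paths = nxt

    return [sum(p.values()) for p in paths]

triangle = [(0, 1), (1, 2), (0, 2)]
print(count_colorful_paths(3, triangle, [0, 1, 2], 3))
print(count_colorful_paths(3, triangle, [0, 1, 2], 3, source=0))
```

With `source=None`, summing the returned list gives the per-coloring total used in the proof of Theorem 8; with `source=s`, entry `v` is the per-pair count used in this proof. A walk whose colors are all distinct never repeats a vertex, so the recurrence counts only simple paths.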
In case there is an edge $(v,s)$ that completes a cycle, we add the result to our count. We sum the results over all the colorings and all pairs of vertices $s$ and $v$ as described above. The result is divided by $kT$. In case the graph is undirected, we further divide by 2. The needed approximation is achieved. □

Corollary 2. For any constant $c > 0$, there is a deterministic polynomial time algorithm for approximating both the number of simple paths of length $k$ and the number of simple cycles of size $k$ for every $k \le O\left(\frac{\log n}{\log\log\log n}\right)$ in a graph with $n$ vertices, where the approximation is up to a multiplicative factor of $1 + (\ln\ln n)^{-c \ln\ln n}$.

6 Concluding Remarks

– An interesting open problem is whether for every fixed $\delta > 1$, there exists an explicit $\delta$-balanced $(n,k)$-family of perfect hash functions of size $2^{O(k)} \log n$. The key ingredient needed is an improved construction of balanced $(n,k,2)$-splitters. Such splitters can be applied successively to get the balanced $(n,k,\lceil \log_2 k \rceil)$-splitter needed in Theorem 7. It seems that the constructions presented in [2] could be good candidates for balanced $(n,k,2)$-splitters, although the Fourier analysis in this case (along the lines of [8]) seems elusive.
– Other algorithms from [5] can be generalized to deal with counting problems. In particular it is possible to combine our approach here with the ideas of [5] based on fast matrix multiplication in order to approximate the number of cycles of a given length. Given a forest $F$ on $k$ vertices, the number of subgraphs of $G$ isomorphic to $F$ can be approximated using a recursive algorithm similar to the one in [5]. For a weighted graph, we can approximate, for example, both the number of minimum (maximum) weight paths of length $k-1$ and the number of minimum (maximum) weight cycles of size $k$.
Finally, all the results can be readily extended from paths and cycles to arbitrary small subgraphs of bounded tree-width. We omit the details.
– In the definition of a balanced $(n,k)$-family of perfect hash functions, there is some constant $T > 0$, such that for every $S \subseteq [n]$, $|S| = k$, the number of functions that are 1-1 on $S$ is close to $T$. We note that the value of $T$ need not be equal to the expected number of 1-1 functions on a set of size $k$, for the case that the functions were chosen independently according to a uniform distribution. For example, the value of $T$ in the construction of Theorem 7 is not even asymptotically equal to what one would expect in a uniform distribution.

References

1. Noga Alon, Jehoshua Bruck, Joseph Naor, Moni Naor, and Ron M. Roth. Construction of asymptotically good low-rate error-correcting codes through pseudo-random graphs. IEEE Transactions on Information Theory, 38(2):509, 1992.
2. Noga Alon, Oded Goldreich, Johan Håstad, and René Peralta. Simple construction of almost k-wise independent random variables. Random Struct. Algorithms, 3(3):289–304, 1992.
3. Noga Alon, Dana Moshkovitz, and Shmuel Safra. Algorithmic construction of sets for k-restrictions. ACM Transactions on Algorithms, 2(2):153–177, April 2006.
4. Noga Alon and Joel H. Spencer. The Probabilistic Method. Second edition. Wiley, New York, 2000.
5. Noga Alon, Raphael Yuster, and Uri Zwick. Color-coding. Journal of the ACM, 42(4):844–856, July 1995.
6. Noga Alon, Raphael Yuster, and Uri Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209–223, March 1997.
7. Vikraman Arvind and Venkatesh Raman. Approximation algorithms for some parameterized counting problems. In Prosenjit Bose and Pat Morin, editors, ISAAC, volume 2518 of Lecture Notes in Computer Science, pages 453–464. Springer, 2002.
8. Yossi Azar, Rajeev Motwani, and Joseph Naor. Approximating probability distributions using small sample spaces. Combinatorica, 18(2):151–171, 1998.
9. William Feller. An introduction to probability theory and its applications. Vol. I. Third edition. Wiley, New York, 1968.
10. Jörg Flum and Martin Grohe. The parameterized complexity of counting problems. SIAM Journal on Computing, 33(4):892–922, August 2004.
11. Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst case access time. Journal of the ACM, 31(3):538–544, July 1984.
12. Falk Hüffner, Sebastian Wernicke, and Thomas Zichner. Algorithm engineering for color-coding to facilitate signaling pathway detection. In David Sankoff, Lusheng Wang, and Francis Chin, editors, Proceedings of 5th Asia-Pacific Bioinformatics Conference, APBC 2007, 15-17 January 2007, Hong Kong, China, volume 5 of Advances in Bioinformatics and Computational Biology, pages 277–286. Imperial College Press, 2007.
13. Daphne Koller and Nimrod Megiddo. Constructing small sample spaces satisfying given constraints. SIAM Journal on Discrete Mathematics, 7(2):260–274, May 1994.
14. Joseph Naor and Moni Naor. Small-bias probability spaces: Efficient constructions and applications. SIAM Journal on Computing, 22(4):838–856, August 1993.
15. Moni Naor, Leonard J. Schulman, and Aravind Srinivasan. Splitters and near-optimal derandomization. In 36th Annual Symposium on Foundations of Computer Science, pages 182–191, 1995.
16. Jeanette P. Schmidt and Alan Siegel. The spatial complexity of oblivious k-probe hash functions. SIAM Journal on Computing, 19(5):775–786, October 1990.
17. Jacob Scott, Trey Ideker, Richard M. Karp, and Roded Sharan. Efficient algorithms for detecting signaling pathways in protein interaction networks. Journal of Computational Biology, 13(2):133–144, 2006.
18. Roded Sharan and Trey Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology, 24(4):427–433, 2006.
19. Tomer Shlomi, Daniel Segal, Eytan Ruppin, and Roded Sharan. QPath: a method for querying pathways in a protein-protein interaction network. BMC Bioinformatics, 7:199, 2006.
20. Raphael Yuster and Uri Zwick. Finding even cycles even faster. SIAM Journal on Discrete Mathematics, 10(2):209–222, May 1997.
21. Raphael Yuster and Uri Zwick. Detecting short directed cycles using rectangular matrix multiplication and dynamic programming. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 254–260, 2004.