Secure Network Coding Against the Contamination and Eavesdropping Adversaries
In this paper, we propose an algorithm that targets contamination and eavesdropping adversaries. We consider the case when the number of independent packets available to the eavesdropper is less than the multicast capacity of the network. By means of…
Authors: Yejun Zhou, Hui Li, Jianfeng Ma
Key Lab of CNIS, Ministry of Education, Xidian University, Xi’an, Shanxi 710071, China
Email: {yjzhou, lihui, jfma}@mail.xidian.edu.cn

Abstract: In this paper, we propose an algorithm that targets contamination and eavesdropping adversaries. We consider the case when the number of independent packets available to the eavesdropper is less than the multicast capacity of the network. By means of our algorithm, every node can verify the integrity of the received packets easily, and an eavesdropper is unable to get any “meaningful information” about the source. We call it “practical security” if an eavesdropper is unable to get any meaningful information about the source. We show that, by giving up a small amount of overall capacity, our algorithm achieves the practically secure condition with probability one, which is much higher than that of Bhattad and Narayanan [1]. Furthermore, the communication overhead of our algorithm is negligible compared with previous works, since the transmission of both the hash values and the code coefficients is avoided.

I. INTRODUCTION

The concept of network coding was first introduced by Ahlswede et al. [2]. They showed that multicast rates could be increased by allowing network coding instead of just routing. Shortly afterwards, Li, Yeung and Cai [3] showed that it is sufficient for the encoding functions at the interior nodes to be linear. Ho et al. [4], [5] proposed a random coding scheme in which the message on each outgoing edge of a node is chosen to be a random linear combination of the messages on its incoming edges.

In reality, network transmission may suffer from two kinds of adversaries: contamination and eavesdropping. Network coding has been studied as a way to counter both kinds of adversaries. Ho et al.
[6] considered the problem of network coding in the presence of a Byzantine attacker. Gkantsidis et al. [7] also considered a related problem. Jaggi et al. [8] designed a resilient network coding algorithm which is information-theoretically secure and rate-optimal for different adversarial strengths. Homomorphic hash functions were first proposed in [9]; they allow nodes to check blocks on the fly in a system where content is encoded at the source using rateless codes. However, the total size of the hash values in that scheme is proportional to the number of blocks, which could be very large, and the cryptographic hash function is computationally expensive. Li et al. [10] employed a batch content distribution verification scheme, which reduced the computational cost for each node of caching and scanning all the received packets when computing a new packet. The cryptographic hash function of their scheme is computationally inexpensive compared with that of [9]. Unfortunately, their scheme deviates from the classical network coding scheme; this consumes bandwidth and may induce delay at the sinks. Moreover, although batching can decrease the computation time, batched block verification risks letting some malicious packets propagate, since packets are exchanged without being checked. Thus, standard batching techniques do not work well with network coding. Zhao et al. [11] presented a signature scheme with low computation, but their scheme required a long start-up latency. Finally, all the works presented above have to distribute the coefficients, which consumes bandwidth.

Cai and Yeung [12] considered the problem of using network coding to achieve perfect information security against an eavesdropper who can eavesdrop on a limited number of network links, and presented the construction of a secure linear network code for this purpose.
A similar problem was considered in [13], featuring a random coding approach in which only the input vector is modified. Bhattad and Narayanan [1] first defined a model for security that is more suitable for practical applications. In this paper, we also consider this type of model, which is not information-theoretically secure but is secure enough for the application. An interesting observation made in [14] was that, for a computationally limited eavesdropper and with the use of a one-way function, it is possible to transmit at a high rate without the eavesdropper getting any meaningful information about the source. A more general threat posed by intermediate nodes was considered in [15]. In this paper, we consider these two kinds of adversaries at the same time; that is, the adversary can contaminate the transmission on a subset of channels and, at the same time, eavesdrop on another subset of channels whose cardinality is less than the multicast capacity of the network. Ngai and Yang [16] studied a similar problem and constructed secure error-correcting network codes.

The main contribution of this paper is an algorithm that not only lets nodes verify the integrity of the received packets easily but also achieves the practically secure condition with probability one. In our scheme, we use the public parameters as the “intended” hash values. The original packets are padded so that they hash to the public parameters. In this way, the transmission of the hash values is avoided. The code coefficients in our scheme are generated by a pseudo-random number generator in each node, so the distribution of the coefficients is also avoided. We show that the communication overhead and the start-up latency are negligible, since the transmission of both the hash values and the coefficients is avoided.

This paper is organized as follows.
In the next section we give the notation used in this paper. The secure network coding scheme is proposed in Section III. In Section IV, we analyze the security of our algorithm. The overhead and start-up latency of our scheme are discussed in Section V. Finally, the paper is concluded in Section VI.

II. NETWORK MODEL AND NOTIONS

In this paper, we assume that all the messages and coefficients are elements of $\mathbb{F}_p$, where $p$ is a sufficiently large prime. We use lowercase letters $x, y$, etc. to denote vectors whose dimensions will be clear from the context. Matrices are denoted by capital letters such as $X, \hat{X}$, etc. The transpose operator of vectors and matrices is denoted by “$T$”; thus $x^T$ stands for a column vector.

A. Network Model

We represent a network by a directed graph $G = (V, E)$, where $V$ is the set of vertices (nodes) and $E$ is the set of edges (channels). We assume an order on $V$ that is consistent with the associated partial order on $G$. A network code is said to be linear if the message on any outgoing edge of any node is a linear combination of the messages on the incoming edges of that node.

In this paper, we assume that the source node sends information $X$ of the following form:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix} \qquad (1)$$

We call $x_i$, $i = 1, \ldots, m$, a packet. Therefore, for a linear code the message on edge $e_j \in E$ can be written as $F_{e_j} X$, where $F_{e_j}$ is a length-$m$ vector over $\mathbb{F}_p$ (we call it the global encoding kernel of edge $e_j$ in this paper).

B. Threat Model

There is a source, Alice, and a destination, Bob, who communicate over a wired or wireless network. There is also an eavesdropper, Calvin, hidden somewhere in the network. He aims to eavesdrop on the transfer of information from Alice to Bob and to inject his own packets.
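The linear model of Section II-A can be made concrete with a toy example. The Python sketch below is an illustration rather than part of the paper: the small prime 1019 standing in for $p$, the helper names `edge_message` and `solve`, and the two kernels are assumptions. It computes the message $F_{e}X$ carried by an edge and shows that a sink whose incoming edges carry $m$ linearly independent global encoding kernels can recover $X$ by Gauss-Jordan elimination over $\mathbb{F}_p$ (it needs Python 3.8+ for `pow(a, -1, p)`).

```python
# Toy prime standing in for the large prime p of Section II (an assumption).
P_FIELD = 1019


def edge_message(F_e, X):
    """Message on edge e: the row vector F_e X over F_p."""
    return [sum(f * X[i][k] for i, f in enumerate(F_e)) % P_FIELD
            for k in range(len(X[0]))]


def solve(F, Y):
    """Recover X from Y = F X by Gauss-Jordan elimination over F_p.

    F is the m x m matrix whose rows are the global encoding kernels on a
    sink's incoming edges; it must be invertible (rank m, the max-flow).
    """
    m = len(F)
    M = [list(F[i]) + list(Y[i]) for i in range(m)]  # augmented matrix [F | Y]
    for col in range(m):
        piv = next(i for i in range(col, m) if M[i][col] % P_FIELD)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, P_FIELD)           # modular inverse of the pivot
        M[col] = [v * inv % P_FIELD for v in M[col]]
        for i in range(m):
            if i != col and M[i][col]:
                f = M[i][col]
                M[i] = [(a - f * b) % P_FIELD for a, b in zip(M[i], M[col])]
    return [row[m:] for row in M]


if __name__ == "__main__":
    X = [[3, 1, 4, 1],       # packet x_1
         [5, 9, 2, 6]]       # packet x_2  (m = 2, n = 4)
    F = [[1, 0], [1, 1]]     # hypothetical kernels of the sink's two incoming edges
    Y = [edge_message(f, X) for f in F]
    print(Y[1])              # [8, 10, 6, 7], i.e. x_1 + x_2
    print(solve(F, Y) == X)  # True
```

With $m = 2$ this mirrors the multicast capacity of the small network used in Section II-C; an edge with kernel $(1, 1)$ carries $x_1 + x_2$, which is exactly the kind of observation $A_i X$ that Calvin could make.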
A malicious node can generate corrupted packets and then distribute them to other nodes, which in turn use them to (unintentionally) create new encoded packets that are also corrupted.

A wiretap network is specified by a collection $\mathcal{A}$ of sets of edges, $\mathcal{A} = \{A_1, A_2, \ldots, A_{|\mathcal{A}|}\}$, $A_i \subseteq E$. Calvin selects a particular set $A_i \in \mathcal{A}$ and listens to all messages transmitted on the edges in $A_i$ to get some information. We assume that this set does not change with time.

[Fig. 1. Networks. (a) Shannon-secure coding with a uniform random symbol $w$; (b) practically secure coding. Seven nodes; source messages $x_1, x_2$.]

Given a linear code and a wiretap network, we use $A_i$ to represent a matrix whose rows contain all linearly independent global encoding kernels corresponding to edges $e_j \in A_i$. In this case, the messages available to Calvin are $A_i X$. The number of rows in $A_i$ is denoted by $k_i$, and we define $k = \max_i k_i$.

C. Notions

1) The network capacity is the time average of the maximum number of packets that can be delivered from Alice to Bob, assuming no adversarial interference, i.e., the max-flow. To simplify notation, in this paper we assume that the max-flow from Alice to Bob is $m$.

2) Practical security: Consider a set of messages $M$. Let $U$ be a subset of the set containing the multicast information $X$. We say that $M$ has no information about $U$ if $I(U; M) = 0$. We say that $M$ has no meaningful information about $U$ if $I(x_i; M) = 0$ for all $x_i \in U$. In this paper we concentrate on two special cases and generalize the results towards the end. We say that Calvin has no information about the source if $I(X; M) = 0$, where $M$ is the set of messages that Calvin chooses to observe. The security condition considered by Cai and Yeung [12] falls into this category. We will use Shannon security to refer to this security requirement.
The second case we consider is when Calvin gets no meaningful information about the source, i.e., $I(x_i; M) = 0$ for all $x_i$, for the messages $M$ observed by Calvin. We call this type of security practical security. Note that if Alice transmits a linear transformation of $X$, namely $PX$, instead of $X$, then the message transmitted on edge $e_j$ would be $F_{e_j} P X$, where $P$ is an $m \times m$ matrix unknown to Calvin. In this case, although Calvin has some information about the source, he is unable to get any meaningful information.

As shown in Fig. 1, let us assume that Calvin can listen to any one edge of this network. The multicast capacity of this network is 2, and $x_1$ and $x_2$ are Alice's messages. In Fig. 1(a), $w$ is a uniform random sequence independent of the messages; this is an example of the coding scheme constructed by Cai and Yeung [12]. Clearly, the maximum multicast rate supported is 1 when the system has to be Shannon secure. When the security condition is relaxed to practical security, as shown in Fig. 1(b), the max-flow can be achieved.

III. SECURE NETWORK CODING

A. The Homomorphic Hash Function

We first choose the hash parameters $q$ and $g$. Let $o(x)$ denote the multiplicative order of $x$ in the field $\mathbb{F}_q$. Here we choose $g$ with $o(g) = p$ in $\mathbb{F}_q$ (this requires $p \mid q - 1$, and it allows exponents to be reduced modulo $p$). Furthermore, we randomly select $n + 2$ numbers $u_0, u_1, \ldots, u_n, u_{n+1}$ from $\mathbb{F}_p$. Next, we compute $g_i = g^{u_i} \pmod q$ for all $0 \le i \le n + 1$. The public parameters of the hash function are $p, q, g_0, g_1, \ldots, g_{n+1}$, whereas $u_0, u_1, \ldots, u_{n+1}$ and $g$ should be kept secret. Formally, we define $DL[g, p, q]$ to be the following computational problem: given $y$, $g$ and $q$, where $o(g) = p$ in $\mathbb{F}_q$, find $x$ such that $y = g^x \pmod q$.
The hardness of $DL[g, p, q]$ gives the following lemma.

Lemma 1: Given $g_0, g_1, \ldots, g_{n+1}$ and the public parameters $p, q$, it is computationally infeasible for a node to find $u_0, u_1, \ldots, u_{n+1}$ such that $g_i = g^{u_i} \pmod q$ if $DL[g, p, q]$ is hard.

Assume that each message is of the form $x = (x_0, x_1, \ldots, x_n, r)$, where $x_i, r \in \mathbb{F}_p$ for $0 \le i \le n$, and the hash of $x$ is computed as

$$H(x) = \Big(\prod_{i=0}^{n} g_i^{x_i}\Big)\, g_{n+1}^{r} \pmod q. \qquad (2)$$

Based on this construction, we have

$$H(x) = g^{\left(\sum_{i=0}^{n} u_i x_i + u_{n+1} r\right) \bmod p} \pmod q. \qquad (3)$$

For any two messages $x = (x_0, x_1, \ldots, x_n, r_1)$ and $y = (y_0, y_1, \ldots, y_n, r_2)$, we define the addition of $x$ and $y$ as

$$x + y = (z_0, z_1, \ldots, z_n, r), \qquad (4)$$

where $r = (r_1 + r_2) \bmod p$ and $z_i = (x_i + y_i) \bmod p$ for $0 \le i \le n$. Hence, this hash function has the following homomorphic property:

$$H(x)\, H(y) = H(x + y) \pmod q. \qquad (5)$$

Applying (5) repeatedly gives $H\big(\sum_i c_i x_i\big) = \prod_i H(x_i)^{c_i} \pmod q$ for any coefficients $c_i \in \mathbb{F}_p$, which is the identity the verification algorithm in Section III-C relies on.

The security of $H$ is defined in terms of the difficulty of finding collisions. It can be shown that the hash function is indeed collision-free if $DL[g, p, q]$ is hard. In particular, we have:

Lemma 2: The hash function $H$ is collision-free (namely, it is computationally infeasible to find two different messages $x_1$ and $x_2$ such that $H(x_1) = H(x_2)$) if $DL[g, p, q]$ is hard.

This can be proved using an argument in [17] (proof of Theorem 3.4).

B. Alice's Encoder and Bob's Decoder

Alice's encoder: Alice encodes $X$ in the following steps. She first chooses $m$ parity symbols $r_d$, $d \in \{1, \ldots, m\}$, uniformly at random from the field $\mathbb{F}_p$ and then generates a Vandermonde matrix $P$ as follows:

$$P = \begin{pmatrix} r_1 & r_2 & \cdots & r_m \\ r_1^2 & r_2^2 & \cdots & r_m^2 \\ \vdots & \vdots & \ddots & \vdots \\ r_1^m & r_2^m & \cdots & r_m^m \end{pmatrix} \qquad (6)$$

$P$ is invertible exactly when the $r_d$ are nonzero and pairwise distinct, because $\det P = \prod_{d} r_d \prod_{i<j} (r_j - r_i)$; this is what allows Bob to compute $P^{-1}$ below.

In the second step, Alice pre-multiplies the source message $X$ by $P$:

$$X' = PX = \begin{pmatrix} x'_{11} & x'_{12} & \cdots & x'_{1n} \\ x'_{21} & x'_{22} & \cdots & x'_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x'_{m1} & x'_{m2} & \cdots & x'_{mn} \end{pmatrix} = \begin{pmatrix} x'_1 \\ x'_2 \\ \vdots \\ x'_m \end{pmatrix} \qquad (7)$$

In the third step, Alice appends $r_1, r_2, \ldots, r_m$ to $X'$ and obtains $X''$:

$$X'' = \begin{pmatrix} x'_{11} & x'_{12} & \cdots & x'_{1n} & r_1 \\ x'_{21} & x'_{22} & \cdots & x'_{2n} & r_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x'_{m1} & x'_{m2} & \cdots & x'_{mn} & r_m \end{pmatrix} = \begin{pmatrix} x''_1 \\ x''_2 \\ \vdots \\ x''_m \end{pmatrix} \qquad (8)$$

Alice uses $g_i$, $1 \le i \le m$, as the “intended” hash values of $x''_1, x''_2, \ldots, x''_m$ (this is possible since any practical network coding system has $m \ll n$). A list of paddings $x_{10}, x_{20}, \ldots, x_{m0}$ can then be computed. We add the padding to every packet and obtain $\hat{X}$, as shown in Fig. 2. When a message packet $x''_i$ is padded with $x_{i0}$ to form the new packet $\hat{x}_i$, then $H(\hat{x}_i) = g_i$, $1 \le i \le m$.

Bob's decoder: Bob first decodes $\hat{X}$ and obtains $r_1, r_2, \ldots, r_m$. The Vandermonde matrix $P$ can be computed from $r_1, r_2, \ldots, r_m$. Bob then pre-multiplies the recovered matrix $PX$ by $P^{-1}$ and obtains the original packets $X$.

C. The Basic Verification Scheme

As shown in Fig. 2, the message can only be padded using the secret key $u_0, u_1, \ldots, u_{n+1}$, which is known only by Alice. Next, Alice chooses a seed $c$ and feeds it to a pseudo-random generator $G$. Instead of choosing the coefficients, the source uses the random numbers $c_1, c_2, \ldots, c_m$ generated by $G$ as the “intended” coefficients. Since the coefficients can be computed from the public function $G$, there is no need to distribute the coefficients; it suffices that all the nodes know $c$.

Our proposed scheme consists of two algorithms, namely the encoding algorithm and the verification algorithm.

Encoding Algorithm: The encoder performs the following steps:
1) Choose a random seed $c$.
2) Generate pseudo-random numbers $c_1, c_2, \ldots, c_m$ from $G$ with $c$.
3) For each $1 \le i \le m$, choose $g_i = g^{u_i} \pmod q$ as the “intended” hash value.
4) For each $1 \le i \le m$, compute $x_{i0} = \big(u_i - \sum_{j=1}^{n} x'_{ij} u_j - u_{n+1} r_i\big)\, u_0^{-1} \pmod p$.
5) Let $\hat{X} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m)^T$, where $\hat{x}_i = (x_{i0}, x'_{i1}, \ldots, x'_{in}, r_i)$ for all $1 \le i \le m$.
6) Output $x$, $c$ and the public parameters $g_0, g_1, \ldots, g_{n+1}, p, q$, where $x$ is the linear combination $x = \sum_{i=1}^{m} c_i \hat{x}_i$.

[Fig. 2. The basic verification scheme. Labelled blocks: Source Node (holding $g_1, \ldots, g_m$ and $c_1, \ldots, c_m$); Homomorphic Hash Function; Perform padding ($X'$ to $\hat{X}$); Pseudo-random generator with seed; Linear combination; Distribute $g_0, g_1, \ldots, g_{n+1}$ to all nodes; Perform verification.]

Verification Algorithm: During verification, each node is given a packet $x$ and public information $t$. In the case where this packet has not been tampered with, $x$ is the linear combination $x = \sum_{i=1}^{m} c_i \hat{x}_i$, and $t$ represents the public parameters $g_0, g_1, \ldots, g_{n+1}, p, q$ and $c$. Each node can verify the integrity of the packet as follows:
1) From $c$, compute $c_1, c_2, \ldots, c_m$.
2) Compute the hash value $H_1 = H(x)$.
3) Compute the hash value $H_2 = \prod_{i=1}^{m} h_i^{c_i} \pmod q$, where $h_i = g_i$ for $1 \le i \le m$.
4) Verify that $H_1 = H_2$.

In our scheme, every node selects and distributes random values to all its downstream nodes instead of transmitting the coefficients. The coefficients are generated from a shared pseudo-random number generator in each node, and the global encoding kernel can be calculated recursively in any upstream-to-downstream order.
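To tie Sections III-A to III-C together, the following Python sketch runs the basic scheme end to end at toy scale: parameter generation, the hash of Eq. (2), Alice's encoder (Eqs. (6) to (8) plus padding step 4), seed-derived coefficients, the check $H(x) = \prod_i g_i^{c_i}$, and Bob's inversion of $P$. It is an illustration under stated assumptions, not the authors' implementation: the primes 1019 and 2039 are far too small for security, Python's `random.Random` stands in for the unspecified generator $G$ (it is not cryptographically secure), the $r_d$ are drawn distinct and nonzero so that $P$ is invertible, and Bob is assumed to already hold $\hat{X}$ (decoding it from network combinations is omitted). All function names are hypothetical; Python 3.8+ is needed for `pow(a, -1, p)`.

```python
import random

# Toy primes with P_FIELD | Q_FIELD - 1 (2039 = 2 * 1019 + 1); far too small for real use.
P_FIELD, Q_FIELD = 1019, 2039

def keygen(n, rng):
    """Alice's secret g and u_0..u_{n+1}, plus the public g_i = g^{u_i} mod q."""
    while True:
        g = pow(rng.randrange(2, Q_FIELD), (Q_FIELD - 1) // P_FIELD, Q_FIELD)
        if g != 1:  # any such g != 1 has order exactly P_FIELD
            break
    u = [rng.randrange(1, P_FIELD) for _ in range(n + 2)]  # nonzero, so u_0^{-1} exists
    return g, u, [pow(g, ui, Q_FIELD) for ui in u]

def H(x, gs):
    """Eq. (2) for x = (x_0, ..., x_n, r): prod_i g_i^{x_i} * g_{n+1}^r mod q."""
    h = 1
    for xi, gi in zip(x, gs):  # the last pair is (r, g_{n+1})
        h = h * pow(gi, xi, Q_FIELD) % Q_FIELD
    return h

def coefficients(seed, m):
    """Stand-in for the shared generator G: c_1..c_m derived from the seed c.
    random.Random is NOT cryptographically secure; illustration only."""
    prg = random.Random(seed)
    return [prg.randrange(1, P_FIELD) for _ in range(m)]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % P_FIELD for col in zip(*B)] for row in A]

def mat_inv(A):
    """Gauss-Jordan inversion over F_p; assumes A is invertible."""
    k = len(A)
    M = [row[:] + [int(i == j) for j in range(k)] for i, row in enumerate(A)]
    for col in range(k):
        piv = next(i for i in range(col, k) if M[i][col] % P_FIELD)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, P_FIELD)
        M[col] = [v * inv % P_FIELD for v in M[col]]
        for i in range(k):
            if i != col and M[i][col]:
                f = M[i][col]
                M[i] = [(a - f * b) % P_FIELD for a, b in zip(M[i], M[col])]
    return [row[k:] for row in M]

def vandermonde(r):
    """Eq. (6): row e (e = 1..m) holds r_1^e, ..., r_m^e."""
    return [[pow(rd, e, P_FIELD) for rd in r] for e in range(1, len(r) + 1)]

def alice_encode(X, u, rng):
    """Eqs. (6)-(8) and padding step 4: returns X_hat with H(x_hat_i) = g_i."""
    m, n = len(X), len(X[0])
    r = rng.sample(range(1, P_FIELD), m)  # distinct and nonzero, so P is invertible
    Xp = mat_mul(vandermonde(r), X)       # X' = P X
    inv_u0 = pow(u[0], -1, P_FIELD)
    X_hat = []
    for i in range(m):                    # row i is packet i+1 in the paper's indexing
        s = sum(u[j + 1] * Xp[i][j] for j in range(n)) + u[n + 1] * r[i]
        X_hat.append([(u[i + 1] - s) * inv_u0 % P_FIELD] + Xp[i] + [r[i]])
    return X_hat

def combine(X_hat, c):
    """x = sum_i c_i x_hat_i, componentwise mod p as in Eq. (4)."""
    return [sum(ci * row[k] for ci, row in zip(c, X_hat)) % P_FIELD
            for k in range(len(X_hat[0]))]

def verify(x, seed, gs, m):
    """Verification steps 1)-4): accept iff H(x) == prod_i g_i^{c_i} mod q."""
    h2 = 1
    for i, ci in enumerate(coefficients(seed, m), start=1):
        h2 = h2 * pow(gs[i], ci, Q_FIELD) % Q_FIELD
    return H(x, gs) == h2

def bob_decode(X_hat):
    """Read r from the last column, rebuild P and return P^{-1} X' = X."""
    r = [row[-1] for row in X_hat]
    return mat_mul(mat_inv(vandermonde(r)), [row[1:-1] for row in X_hat])

if __name__ == "__main__":
    rng = random.Random(7)
    m, n = 3, 6
    X = [[rng.randrange(P_FIELD) for _ in range(n)] for _ in range(m)]
    g, u, gs = keygen(n, rng)
    X_hat = alice_encode(X, u, rng)
    seed = 12345
    x = combine(X_hat, coefficients(seed, m))
    print(verify(x, seed, gs, m))         # True
    tampered = x[:]
    tampered[2] = (tampered[2] + 1) % P_FIELD
    print(verify(tampered, seed, gs, m))  # False
    print(bob_decode(X_hat) == X)         # True
```

Changing a single symbol of $x$ by $\delta \ne 0$ multiplies $H(x)$ by $g_j^{\delta} \ne 1$ for the corresponding $g_j$, so the check fails; Theorem 1 addresses the stronger question of whether a computationally bounded adversary can craft a different packet with the same hash.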
In practice, the need for distributing random values can be further eliminated by using a public random function. For example, it can be the SHA-1 hash of the original file identifier, creation date, publisher and other data that are public and should be known to all the receivers before the download session begins. Our verification scheme enables the nodes to check the integrity of packets without requiring a secure channel. Also, the computation involved in generating and verifying the hash values is very simple.

IV. SECURITY OF OUR ALGORITHM

A. Security against the contamination adversaries

It can be shown that the basic verification scheme is indeed secure if $DL[g, p, q]$ is hard, using an argument similar to that in [10] (proof of Theorem 3).

Theorem 1: It is computationally infeasible to find $\hat{X} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m)^T$, $y$ and $c = (c_1, c_2, \ldots, c_m)$ such that, for $x = \sum_{i=1}^{m} c_i \hat{x}_i$, we have $y \ne x$ and $H(x) = H(y)$; namely, the basic scheme is secure if $DL[g, p, q]$ is hard.

Proof: We prove this theorem by showing that if there is a polynomial-time algorithm $\mathcal{A}$ that finds $\hat{X} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m)^T$, $y$ and $c = (c_1, c_2, \ldots, c_m)$ such that, for $x = \sum_{i=1}^{m} c_i \hat{x}_i$, we have $y \ne x$ and $H(x) = H(y)$ with a non-negligible probability $\epsilon$, then we can use it to construct a polynomial-time algorithm $\mathcal{B}$ that finds a collision $x, y$ in $H$ with the same non-negligible probability $\epsilon$ ($\mathcal{B}$ simply runs $\mathcal{A}$ and outputs the pair $(x, y)$). However, if $DL[g, p, q]$ is hard, Lemma 2 shows that the hash function $H$ is collision-free, and thus $\epsilon$ must be negligible, which is a contradiction. Therefore, the basic scheme is secure if the discrete logarithm problem $DL[g, p, q]$ is hard.

B. Security against the eavesdropping adversaries

Theorem 2: Consider a network in which the number of independent messages available to Calvin is less than the multicast capacity, i.e., $k = \max_i k_i < m$. The algorithm in Section III-B achieves the practically secure condition with probability one when a random code is used.

Proof: In our algorithm, Alice transmits $\hat{X}$ instead of $X$, so the messages available to Calvin are $A_i \hat{X}$. As long as Calvin does not obtain $r_1, r_2, \ldots, r_m$ from $A_i \hat{X}$, he cannot obtain the global encoding kernels with respect to $X$, and without these kernels he cannot get any meaningful information about $X$. Hence Calvin must not be able to recover $r_1, r_2, \ldots, r_m$ by taking linear combinations of the observed packets $A_i \hat{X}$, which implies

$$b_i A_i \hat{X} \ne I \hat{X} \quad (\forall\, b_i, i), \qquad (9)$$

where $b_i$ is an $m \times k_i$ matrix over $\mathbb{F}_p$ and $I$ is the $m \times m$ identity matrix. Since the number of independent messages available to the eavesdropper is less than the multicast capacity of the network, condition (9) can always be satisfied; in particular, $b_i A_i$ has rank at most $k_i < m$ and therefore never equals $I$. Moreover, merely obtaining the values $r_1, r_2, \ldots, r_m$ must not give Calvin any packet of $X$, which implies

$$b_i A_i P \ne I_{m,n} \quad (\forall\, b_i, n, i). \qquad (10)$$

Multiplying both sides on the right by $P^{-1}$, we have

$$b_i A_i \ne I_{m,n} P^{-1} \quad (\forall\, b_i, n, i), \qquad (11)$$

where $I_{m,n}$ is the $n$th row of the $m \times m$ identity matrix and $b_i$ is a row vector of length $k_i$ over $\mathbb{F}_p$. The above condition is satisfied if no row of $P^{-1}$ lies in the row space of any $A_i$.

Theorem 3: In a network that supports a multicast capacity of $m$, if at most $k$ ($k < m$) edges can be tapped simultaneously, then the multicast capacity under practical security requirements is $m - \frac{2m}{n}$. (Here the asymptotically negligible term $\frac{2m}{n}$ corresponds to the overhead due to the redundancy Alice appends to form $\hat{X}$.)

Proof: The network supports a multicast capacity of $m$, so a linear code can be found to multicast $m$ packets [3]. By Theorem 2, a transformation at the source can be applied to make it practically secure.

V. DISCUSSION

In this section, we examine the overhead and the start-up latency induced by our scheme. For fair comparisons, we choose $n = 410$, $|p| = 320$ bits and $|q| = 1024$ bits in the following discussion. The size of every packet is 16 KB. Moreover, we assume that the original file is divided into $m = 10$ packets. The comparisons are shown in Fig. 3.

Fig. 3. Comparisons of Communication Overhead and Start-Up Latency

Index / Paper          | Ours    | [9]     | [10]    | [11]
Communication Overhead | 0.48%   | 3.23%   | >2.44%  | 4.86%
Start-Up Latency       | 0.127 s | 0.125 s | 0.127 s | 0.25 s

A. Communication Overhead

The communication overhead is caused by two parts. The first part is the amount of data we need to distribute to each node for the security of the scheme. The second part is the code coefficients. The actual communication overhead largely depends on the parameters chosen for the actual implementation. In the scheme proposed by Krohn et al. [9], the parameters chosen for the homomorphic hash function generate a hash value of 1024 bits per packet, and the coefficients total 3200 bits per packet. Hence, the “first-order” hash values and the coefficients together amount to 3.22% of the original data. For a file of size 1 GB, their method would require hash values of size 8 MB. To distribute these hash values, Krohn et al. [9] proposed to recursively apply the same scheme to the 8 MB of hash values, which generates “second-order” or higher-order hash values.
The size of the higher-order hash values is 0.01% of the size of the original data. Hence the total overhead is 3.23%.

In the scheme proposed by Zhao et al. [11], if the file is divided into 10 packets, each packet is a vector in $\mathbb{F}_q$. The size of each packet is also about 16 KB. The size of each augmented vector (with the coding vector in front) is about 16.4 KB, and thus the overhead of each packet is 2.43%. In addition, after the initial setup, the scheme of [11] has to publish 3200 bits for the new signature vector for the security of the scheme. Thus the total overhead of their scheme is 4.86%. In short, although they proposed a simple signature scheme, its communication overhead is very high.

The scheme in [10] requires padding of three values, and it must also distribute the coefficients. The overhead caused by the coefficients alone is 2.44%. Therefore the communication overhead of their scheme is higher than ours, even though they use batch verification.

In our scheme, the coefficients are generated by a pseudo-random number generator in each node, so their distribution is avoided. The communication overhead of our scheme is caused only by the padding we add to every packet. Each distributed packet incurs only 0.48% overhead, which is negligible compared with previous works. Formally, let the file size be $S$, where the file consists of $m$ packets, each a vector of $n$ elements of $\mathbb{F}_p$. The size of each vector is $B = n \log p$, so $S = mn \log p$. The size of each augmented vector (with the padding $x_{i0}$ in front and $r_i$ at the back) is $B_a = (n + 2) \log p$, and thus the total overhead is $\frac{2}{n}$ times the file size. Note that the communication overhead of our scheme is asymptotically negligible.

B. Start-Up Latency

At the beginning of a content distribution session, the source and all the nodes participating in the distribution have to agree on the set of parameters used for coding and verification. The public parameters in our scheme are $p, q, g_0, g_1, \ldots, g_{n+1}$, and their total size is approximately 16.3 KB. With these parameters, any node can perform verification. Assuming that the bandwidth between a node and the source (or any other node from which these parameters are distributed) is 1 Mbps, it would take approximately 0.127 seconds before the node is ready to perform verification. The start-up latency of our scheme is fixed once the parameters of the hash function and the block size are chosen, and it is independent of the size of the content to be distributed. The start-up latency of [10] is 0.127 seconds, about the same as ours. For the scheme in [9], the size of all the public parameters equals the size of the data in a packet, 16 KB, which takes 0.125 seconds to transmit over the same link. However, when the node needs to receive the 8 MB of hash values of a 1 GB file, as in the example given in [9], it would require 64 seconds over the same 1 Mbps link; that is, the start-up latency is proportional to the size of the file. The public parameters of [11] consist of two parts: the public parameters proper and the signature vectors. The size of their public parameters is 32.8 KB, which takes 0.25 seconds to transmit over the same link. Furthermore, they have to publish new signature vectors for the security of their scheme in every setup.

VI. CONCLUSION

In this paper, we investigate the security issues that arise from using network coding and propose a secure algorithm.
By means of our a lgorithm every node can verify the in tegrity of the r eceived packets easily and an eavesdropper is un able to get a ny meaning ful information abou t the source. W e show that when we give up a small amoun t of overall cap acity , the practically secu re co ndition can be ac hiev ed at a p robab ility of 1, which is much h igher th an th at of [1]. W e also pro pose a new p aradigm where the public parame ters are selected as the “intended ”hash values and the co de coefficients a re g enerated in a pseud o-rand om n umber generato r in e very no de. In this way th e distribution o f th e hash values and the coefficients are av oid ed. W e h av e shown that the co mmunicatio n overhead of our algorithm is 2 n , which is negligible co mpared with previous works and the start-up latency is tran sitory . A C K N OW L E D G M E N T The work o f Y ejun Zh ou, Hui Li and Jianfeng Ma was supported by National Natural Science Foundation of China grant 60 7721 36, National Natur al Science Founda tion o f China g rant 6 0633 020, 863 Hi-T ech Research an d Devel- opment Prog ram of China gr ant 200 7AA01Z 435, and 863 Hi-T ech Research and Development Pro gram of China grant 2007AA0 1Z429 . R E F E R E N C E S [1] K. Bhattad and K. R. Narayanan, “W eakly secure network coding, ” In Pr oc. of the F irst W orkshop on Network Coding, Theory , and Applica- tions (NetC od) , Ri va del Garda, Italy , 2005. [2] R. Ahlswede, N. Cai, S. -Y . R. Li, and R. W . Y eung, “Netw ork informa- tion flow , ” IEEE T rans. Inf. Theory , vol. 46(4), pp. 1204-1216, 2000. [3] S. -Y . R. Li, R. W . Y eung, and N. Cai, “Linear netw ork coding, ” IEEE T rans. on Informa-tion Theory , vol. IT -49, pp. 371-381, 2003. [4] T . Ho, R. Koe tter , M. M ´ e dard, D. R. Karger , and M. Effro s, “The benefits of coding over routing in a randomized setti ng, ”in Internati onal Symposium on Infor-mati on Theory (ISIT) , 2003. [5] T . Ho, M. M ´ e dard, J. Shi, M. Ef fros and D. R. 
Karger , “On randomized netw ork coding, ” In pr oc. 41st A nnual A llerton Confe ren ce on Commu- nicati onContr ol and Computi ng , Oct. 2003. [6] T . C. Ho, B. L eong, R. Koette r , M. M ´ e dard, M. Effros, and D. R. Karge r , “Byza ntine modification detect ion in multicast netw orks using randomize d netw ork coding, ” in International Symposium on Information Theory ,Chic ago, USA, June 2004. [7] C. Gkantsidis, J . Miller , P . Rodriguez, “Comprehensi ve V ie w of a Li ve Networ k Coding P2P System, ” Pr oceedings of the 6th A CM SIGCOMM confer ence on Internet measurement Oct. 2006 [8] S. Jaggi, M. Langberg, S. Katti, T . Ho, D. Katabi, and M. M ´ e dard, “Resili ent network coding in the presence of Byzantin e adver- saries, ”ac cepte d to IEE E INFOCOM’07 , Anchorage, Alaska, May 2007. [9] M. N. Krohn, M. J. Freedman, and D. Mazi ´ e res, “On-the-fly veri fication of rateless era-sure codes for efficie nt content distributio n, ” IEEE Symp. Securit y and Privacy , Oak-land, CA, pp. 226-240, May 2004. [10] Qiming Li, Dah-Ming Chiu, John C.S. Lui, “On the Practical and Securit y Issues of Batch Content Distrib ution V ia Network Coding, ” 14th IEEE International Confer ence (ICNP ’06) pp. 158 - 167, Nov 2006. [11] Fang Zhao, T on Kalke r , M. M ´ e dard, and Kee sook J. Han, “Signature s for content distribut ion with netwo rk coding, ” ISIT2007 , Nice, France, June 24 - June 29, 2007 [12] N. Cai and R. W . Y eung, “Secure network coding, ” I nternatio nal Sym- posium on Information Theory (ISIT) Lausanne, Switzer land, June 30 - July 5, 2002. [13] J. Feldman, T . Malkin, C. S tein, R. A. Servedio “On the capacit y of secure netwo rk coding, ” In Proc . 42nd Annual Allerton Confer ence on Communicat ion, Contr ol, and Computing , Sep. 2004. [14] K. Jain, “Security based on network topolog y against the wireta pping attac k, ” IEEE W ireless Communications , pp. 68-71, Feb, 2004. [15] L Lima, M. 
M ´ e dard, J Barros, “Random linea r networ k coding: A free cipher?” In Pr oc. of IEEE Internationa l Symposium on Information Theory (ISIT) , Nice, France, June 24-29 2007. [16] Ngai. Chi Kin, Y ang. Shenghao, “Deterministi c Secure Error-Correct ing (SEC) Netw ork Codes, ” IEEE Informati on Theory W o rkshop (ITW) , pp. 96 - 101, Sept2-6 2007. [17] M. Bellare , O. Goldreich, and S. Goldwa sser , “Incrementa l cryptogra - phy:The case of hashing and signing, ” CRYPTO , 1994. Secure Netw ork Coding Against the Contaminati on and Ea v esdropping Adv ersaries Y ejun Zhou, Hu i L i a nd Jianfeng Ma Ke y lab of CNIS Ministry of Educatio n Xidian Un iv ersity Xi’an, Sh anxi 7 10071 , Chin a Email: { yjzhou , lih ui, jfma } @mail.xidian. edu.cn Abstract —In th is p aper , we p ropose an algorithm that targets contamination and eav esdropping adversaries . W e consider th e case when th e number of i ndependent packets av ail able to the ea vesdr opper is less than th e multicast capacity of the network. By means of our algorithm ev ery node can verify the integrity of the recei ved packets easily and an eav esdropper is unable to get any “meaningful information”about the sourc e. W e call it “practical security”if an eaves dropper i s unabl e to get any meaningful info rmation about the sour ce. W e show that, by gi ving up a small amount of overa ll capacity , our algorithm achieves achiev es the practically secure condition at a probability of one, which is much higher than th at of Bhattad and Narayanan’ s [1]. Furthermore, the communication overhead of our algorithm are negligible compared with previous works, sin ce the transmission of the h ash va lues and the code coefficients are both av oided. I . I N T RO D U C T I O N The concept of network coding was first intro duced by Ahlswede et al. [2]. They showed that multicast rates could be increased by allowing for network codin g in stead of ju st routing . 
Shortly afterwards, Li, Yeung and Cai [3] showed that it is sufficient for the encoding functions at the interior nodes to be linear. Ho et al. [4], [5] proposed a random coding scheme in which the message on each outgoing edge of a node is chosen to be a random linear combination of the messages on its incoming edges.

In reality, network transmission may suffer from two kinds of adversaries: contamination and eavesdropping. Network coding has been studied as a means to counter both. Ho et al. [6] considered the problem of network coding in the presence of a Byzantine attacker, and Gkantsidis et al. [7] considered a related problem. Jaggi et al. [8] designed a resilient network coding algorithm that is information-theoretically secure and rate-optimal for different adversarial strengths.

A homomorphic hash function was first proposed in [9]; it allows nodes to check blocks on the fly in a system where content is encoded at the source using rateless codes. However, the total size of the hash values in that scheme is proportional to the number of blocks, which could be very large, and the cryptographic hash function is computationally expensive. Li et al. [10] employed a batch verification scheme for content distribution, which reduced the computational cost each node incurs in caching and scanning all the received packets when computing a new packet. The cryptographic hash function of their scheme is computationally inexpensive compared with that in [9]. Unfortunately, their scheme deviates from the classical network coding scheme, consuming extra bandwidth and possibly inducing delay at the sinks. Moreover, although batching can decrease the computation time, batch verification risks letting some malicious packets propagate, since packets are exchanged without being checked.
Thus, standard batching techniques do not work well with network coding. Zhao et al. [11] presented a signature scheme with low computational cost, but their scheme requires a long start-up latency. Finally, all the works presented above have to distribute the coefficients, which consumes bandwidth.

Cai and Yeung [12] considered the problem of using network coding to achieve perfect information security against an eavesdropper who can eavesdrop on a limited number of network links, and presented the construction of a secure linear network code for this purpose. A similar problem was considered in [13], featuring a random coding approach in which only the input vector is modified. Bhattad and Narayanan [1] first defined a model for security that is more suitable for practical applications. In this paper, we also consider this type of model, which is not information-theoretically secure but is secure enough for the application. An interesting observation made in [14] was that, for a computationally limited eavesdropper, the use of a one-way function makes it possible to transmit at a high rate without the eavesdropper getting any meaningful information about the source. A more general threat posed by intermediate nodes was considered in [15].

In this paper, we consider these two kinds of adversaries at the same time; that is, the adversary can contaminate the transmission on a subset of channels and, at the same time, eavesdrop on another subset of channels with cardinality less than or equal to m. Ngai and Yang [16] studied a similar problem and constructed secure error-correcting network codes. The main contribution of this paper is an algorithm that not only lets every node verify the integrity of the received packets easily but also achieves the practically secure condition with probability one.
In our scheme, we use the public parameters as the “intended” hash values: the original packets are padded so that they hash to the public parameters. In this way the transmission of the hash values is avoided. The code coefficients in our scheme are generated by a pseudo-random number generator in each node, so the distribution of the coefficients is also avoided. We show that the communication overhead and the start-up latency are negligible, since the transmission of both the hash values and the coefficients is avoided.

This paper is organized as follows. In the next section we give the notation used in this paper. The secure network coding scheme is proposed in Section III. In Section IV, we analyze the security of our algorithm. The overhead and start-up latency of our algorithm are discussed in Section V. Finally, the paper is concluded in Section VI.

II. NETWORK MODEL AND NOTIONS

In this paper, we assume that all the messages and coefficients are generated in $\mathbb{F}_p$, where $p$ is a sufficiently large prime number. We use lowercase letters $x, y$, etc. to denote vectors whose dimensions will be clear from the context. Matrices are denoted by capital letters such as $X, \hat{X}$, etc. The transpose of vectors and matrices is denoted by the superscript $T$; thus $x^T$ stands for a column vector.

A. Network Model

We represent a network by a directed graph $G = (V, E)$, where $V$ is the set of vertices (nodes) and $E$ is the set of edges (channels). We assume an order on $V$ that is consistent with the associated partial order on $G$. A network code is said to be linear if the message on any outgoing edge of any node is a linear combination of the messages on the incoming edges of that node.

In this paper, we assume that the source node sends information $X$ of the following form:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix} \qquad (1)$$

We call $x_i$, $i = 1, \ldots, m$, a packet. Therefore, for a linear code the message on edge $e_j \in E$ can be written as $F_{e_j} X$, where $F_{e_j}$ is a length-$m$ vector over $\mathbb{F}_p$ (which we call the global encoding kernel) on edge $e_j$.

B. Threat Model

There is a source, Alice, and a destination, Bob, who communicate over a wired or wireless network. There is also an adversary, Calvin, hidden somewhere in the network. He aims to eavesdrop on the transfer of information from Alice to Bob and to inject his own packets. A malicious node can generate corrupted packets and distribute them to other nodes, which in turn use them to (unintentionally) create new encoded packets that are also corrupted.

A wiretap network is specified by a collection $\mathcal{A} = \{A_1, A_2, \ldots, A_{|\mathcal{A}|}\}$ of sets of edges, $A_i \subseteq E$. Calvin selects a particular set $A_i \in \mathcal{A}$ and listens to all messages transmitted on edges in $A_i$ to get some information. We assume that the selected set does not change with time. When a linear code and a wiretap network are specified, we use $A_i$ to represent a matrix whose rows contain all the linearly independent global encoding kernels corresponding to edges $e_j \in A_i$. In this case, the messages available to Calvin are $A_i X$. The number of rows in $A_i$ is denoted by $k_i$, and we define $k = \max_i k_i$.

[Fig. 1. Networks. (a) A Shannon-secure code in which a uniformly random symbol $w$ is sent alongside $x_1$ and $x_2$; (b) a practically secure code that achieves the max-flow.]

C. Notions

1) Network capacity: The network capacity is the time-average of the maximum number of packets that can be delivered from Alice to Bob, assuming no adversarial interference, i.e., the max-flow. To simplify notation, in this paper we assume the max-flow from Alice to Bob is $m$.

2) Practical security: Consider a set of messages $M$. Let $U$ be a subset of the set containing the multicast information $X$. We say that $M$ has no information about $U$ if $I(U; M) = 0$.
We say that $M$ has no meaningful information about $U$ if $I(x_i; M) = 0$ for all $x_i \in U$. In this paper we concentrate on two special cases and generalize the results towards the end. We say that Calvin has no information about the source if $I(X; M) = 0$, where $M$ is the set of messages that Calvin chooses to observe. The security condition considered by Cai and Yeung [12] falls into this category; we will use Shannon security to refer to this requirement. The second case we consider is when Calvin gets no meaningful information about the source, i.e., $I(x_i; M) = 0$ for all $x_i$, for the messages $M$ observed by Calvin. We call this type of security practical security.

Note that if Alice transmits a linear transformation of $X$, namely $PX$, instead of $X$, then the message transmitted on edge $e_j$ is $F_{e_j} P X$, where $P$ is an $m \times m$ matrix unknown to Calvin. In this case, although Calvin has some information about the source, he is unable to get any meaningful information.

As shown in Fig. 1, assume that Calvin can listen to any one edge of this network. The multicast capacity of this network is 2, and $x_1$ and $x_2$ are Alice's messages. In Fig. 1(a), $w$ is a uniformly random sequence independent of the messages; this is an example of the coding scheme constructed by Cai and Yeung [12]. Clearly, the maximum multicast capacity supported is 1 when the system has to be Shannon secure. When the security condition is relaxed to practical security, as shown in Fig. 1(b), the max-flow can be achieved.

III. SECURE NETWORK CODING

A. The Homomorphic Hash Function

We first choose the hash parameters $q$ and $g$. Let $o(x)$ denote the order of $x$ in the field $\mathbb{F}_q$. Here we choose $g$ with $o(g) = p$ in $\mathbb{F}_q$ (this requires $p \mid q - 1$, so that exponents of $g$ can be taken in $\mathbb{F}_p$). Furthermore, we randomly select $n + 2$ numbers $u_0, u_1, \ldots, u_n, u_{n+1}$ from $\mathbb{F}_p$. Next, we compute $g_i = g^{u_i} \bmod q$ for all $0 \le i \le n + 1$.
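The parameter generation just described can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: it assumes toy primes $p = 1019$ and $q = 2039 = 2p + 1$ (far smaller than the $|p| = 320$ and $|q| = 1024$ bits used in Section V), obtains $g$ by raising a random element to the power $(q-1)/p$, and draws the $u_i$ from the nonzero elements of $\mathbb{F}_p$ so that $u_0$ is invertible, as the padding step of the encoder needs. The names `find_generator` and `keygen` are ours.

```python
import secrets

# Toy parameters (assumption, insecure sizes): q = 2p + 1 with p and q prime,
# so p divides q - 1 and F_q^* contains a subgroup of order p.
P_ORDER = 1019   # p: exponents and packet symbols live in F_p
Q_MOD = 2039     # q: hashes are computed modulo q


def find_generator(p: int, q: int) -> int:
    """Return an element g of F_q^* whose multiplicative order is exactly p."""
    assert (q - 1) % p == 0, "p must divide q - 1"
    while True:
        h = 2 + secrets.randbelow(q - 3)   # random h in [2, q - 2]
        g = pow(h, (q - 1) // p, q)        # g^p = h^(q-1) = 1 (mod q)
        if g != 1:                         # p prime and g != 1  =>  o(g) = p
            return g


def keygen(n: int, p: int = P_ORDER, q: int = Q_MOD):
    """Return the secret (g, u_0..u_{n+1}) and the public g_0..g_{n+1}."""
    g = find_generator(p, q)
    # u_i drawn from the nonzero elements of F_p, so u_0 has an inverse mod p.
    u = [1 + secrets.randbelow(p - 1) for _ in range(n + 2)]
    gs = [pow(g, ui, q) for ui in u]       # g_i = g^{u_i} mod q
    return g, u, gs


g, u, gs = keygen(n=8)
# Since o(g) = p, exponents can be reduced mod p: g^(u_1 + p) = g^(u_1) (mod q).
assert pow(g, u[1] + P_ORDER, Q_MOD) == gs[1]
```

The reduction of exponents modulo $p$ checked in the last line is what allows the exponent of eq. (3) to be written modulo $p$.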
The public parameters of the hash function are $p, q, g_0, g_1, \ldots, g_{n+1}$, whereas $u_0, u_1, \ldots, u_{n+1}$ and $g$ should be kept secret. Formally, we define $DL[g, p, q]$ to be the computational problem: given $y$, $g$ and $q$, where $o(g) = p$ in $\mathbb{F}_q$, find $x$ such that $y = g^x \bmod q$. Hence, we have

Lemma 1: Given $g_0, g_1, \ldots, g_{n+1}$ and the public parameters $p, q$, it is computationally infeasible for a node to find $u_0, u_1, \ldots, u_{n+1}$ such that $g_i = g^{u_i} \bmod q$, if $DL[g, p, q]$ is hard.

Assume that each message is of the form $x = (x_0, x_1, \ldots, x_n, r)$, where $x_i, r \in \mathbb{F}_p$ for $0 \le i \le n$, and the hash of $x$ is computed as

$$H(x) = \left(\prod_{i=0}^{n} g_i^{x_i}\right) g_{n+1}^{r} \bmod q \qquad (2)$$

Based on this construction, we have

$$H(x) = g^{\left(\sum_{i=0}^{n} u_i x_i + u_{n+1} r\right) \bmod p} \bmod q \qquad (3)$$

For any two messages $x = (x_0, x_1, \ldots, x_n, r_1)$ and $y = (y_0, y_1, \ldots, y_n, r_2)$, we define the addition of $x$ and $y$ as

$$x + y = (z_0, z_1, \ldots, z_n, r) \qquad (4)$$

where $r = (r_1 + r_2) \bmod p$ and $z_i = (x_i + y_i) \bmod p$ for $0 \le i \le n$. Hence, this hash function has the following homomorphic property:

$$H(x)\,H(y) \bmod q = H(x + y) \qquad (5)$$

The security of $H$ is defined in terms of the difficulty of finding collisions. In particular, we have:

Lemma 2: The hash function $H$ is collision-free (namely, it is computationally infeasible to find two different messages $x_1$ and $x_2$ such that $H(x_1) = H(x_2)$) if $DL[g, p, q]$ is hard.

This can be proved using an argument in [17] (proof of Theorem 3.4).

B. Alice’s Encoder and Bob’s Decoder

Alice’s encoder: Alice encodes $X$ in the following steps.
She first chooses $m$ parity symbols $r_d$, $d \in \{1, \ldots, m\}$, uniformly at random from $\mathbb{F}_p$ and then generates a Vandermonde matrix $P$ as follows:

$$P = \begin{pmatrix} r_1 & r_2 & \cdots & r_m \\ r_1^2 & r_2^2 & \cdots & r_m^2 \\ \vdots & \vdots & \ddots & \vdots \\ r_1^m & r_2^m & \cdots & r_m^m \end{pmatrix} \qquad (6)$$

In the second step, Alice pre-multiplies the source message $X$ by $P$:

$$X' = PX = \begin{pmatrix} x'_{11} & x'_{12} & \cdots & x'_{1n} \\ x'_{21} & x'_{22} & \cdots & x'_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x'_{m1} & x'_{m2} & \cdots & x'_{mn} \end{pmatrix} = \begin{pmatrix} x'_1 \\ x'_2 \\ \vdots \\ x'_m \end{pmatrix} \qquad (7)$$

In the third step, Alice appends $r_1, r_2, \ldots, r_m$ to $X'$ and obtains $X''$:

$$X'' = \begin{pmatrix} x'_{11} & x'_{12} & \cdots & x'_{1n} & r_1 \\ x'_{21} & x'_{22} & \cdots & x'_{2n} & r_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x'_{m1} & x'_{m2} & \cdots & x'_{mn} & r_m \end{pmatrix} = \begin{pmatrix} x''_1 \\ x''_2 \\ \vdots \\ x''_m \end{pmatrix} \qquad (8)$$

Alice uses $g_i$, $1 \le i \le m$, as the “intended” hash values of $x''_1, x''_2, \ldots, x''_m$ (this is possible since any practical network coding system has $m \ll n$). A list of paddings $x_{10}, x_{20}, \ldots, x_{m0}$ can then be computed. We add the padding to every packet and obtain $\hat{X}$, as shown in Fig. 2: when a message packet $x''_i$ is padded with $x_{i0}$ to form the new packet $\hat{x}_i$, then $H(\hat{x}_i) = g_i$ for $1 \le i \le m$.

Bob’s decoder: Bob first decodes $\hat{X}$ and obtains $r_1, r_2, \ldots, r_m$, from which the Vandermonde matrix $P$ can be computed. Bob then pre-multiplies the received matrix $PX$ by $P^{-1}$ and recovers the original packets $X$.

C. The Basic Verification Scheme

As shown in Fig. 2, a message can only be padded using the secret key, which is known only to Alice. Next, Alice chooses a seed $c$ and feeds it to a pseudo-random generator $G$. Instead of choosing the coefficients, the source uses the random numbers $c_1, c_2, \ldots, c_m$ generated by $G$ as the “intended” coefficients. Since the coefficients can be computed from the public function $G$, there is no need to distribute them; it suffices that all the nodes know $c$.
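Alice's encoder and Bob's decoder from Section III-B can be sketched as follows, leaving out the padding $x_{i0}$ and the network transmission itself. This is a hypothetical illustration over the toy field $\mathbb{F}_{1019}$, not the authors' code. One detail the text leaves implicit is that $P$ must be invertible for Bob's step to work: a matrix of the form (6) has determinant $\prod_d r_d \prod_{i<j}(r_j - r_i)$, so the sketch draws the $r_d$ nonzero and pairwise distinct (for a large $p$, independent uniform draws satisfy this with overwhelming probability).

```python
import secrets

p = 1019  # toy prime field F_p (assumption; Section V uses |p| = 320 bits)


def vandermonde(r, p):
    """Rows r, r^2, ..., r^m (entrywise powers mod p), as in eq. (6)."""
    return [[pow(rd, k, p) for rd in r] for k in range(1, len(r) + 1)]


def matmul(A, B, p):
    """Matrix product over F_p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]


def inverse(A, p):
    """Gauss-Jordan inversion of a square matrix over F_p."""
    m = len(A)
    M = [list(row) + [int(i == j) for j in range(m)] for i, row in enumerate(A)]
    for col in range(m):
        pivot = next((i for i in range(col, m) if M[i][col] % p), None)
        if pivot is None:
            raise ValueError("matrix is singular over F_p")
        M[col], M[pivot] = M[pivot], M[col]
        inv = pow(M[col][col], -1, p)                   # modular inverse (Python 3.8+)
        M[col] = [x * inv % p for x in M[col]]
        for i in range(m):
            if i != col and M[i][col]:
                f = M[i][col]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[col])]
    return [row[m:] for row in M]


def alice_encode(X, p):
    """Return X'' = [PX | r] for fresh nonzero, pairwise distinct r_1..r_m."""
    m = len(X)
    r = []
    while len(r) < m:
        rd = 1 + secrets.randbelow(p - 1)
        if rd not in r:
            r.append(rd)
    Xp = matmul(vandermonde(r, p), X, p)                # X' = PX, eq. (7)
    return [row + [rd] for row, rd in zip(Xp, r)]       # X'' = [X' | r], eq. (8)


def bob_decode(X2, p):
    """Rebuild P from the last column of X'' and return P^{-1}(PX) = X."""
    r = [row[-1] for row in X2]
    Xp = [row[:-1] for row in X2]
    return matmul(inverse(vandermonde(r, p), p), Xp, p)


X = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8]]          # m = 3 packets, n = 4 symbols
assert bob_decode(alice_encode(X, p), p) == X
```

Because the $r_d$ travel inside the packets, anyone who can isolate them can rebuild $P$; the security argument of Section IV-B rests on Calvin being unable to do so from $A_i \hat{X}$.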
Our proposed scheme consists of two algorithms, namely the encoding algorithm and the verification algorithm.

Encoding Algorithm: The encoder performs the following steps.
1) Choose a random seed $c$.
2) Generate pseudo-random numbers $c_1, c_2, \ldots, c_m$ from $G$ with $c$.
3) For each $1 \le i \le m$, choose $g_i = g^{u_i} \bmod q$ as the “intended” hash value.
4) For each $1 \le i \le m$, compute $x_{i0} = \left(u_i - \sum_{j=1}^{n} x'_{ij} u_j - u_{n+1} r_i\right) u_0^{-1} \bmod p$.
5) Let $\hat{X} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m)^T$, where $\hat{x}_i = (x_{i0}, x'_{i1}, \ldots, x'_{in}, r_i)$ for all $1 \le i \le m$.
6) Output $x$, $c$ and the public parameters $g_0, g_1, \ldots, g_{n+1}, p, q$, where $x$ is the linear combination $x = \sum_{i=1}^{m} c_i \hat{x}_i$.

[Fig. 2. The basic verification scheme. Blocks: source node (homomorphic hash function; padding of $X''$ into $\hat{X}$; pseudo-random generator driven by the seed), distribution of $g_0, g_1, \ldots, g_{n+1}$ to all nodes, linear combination, and verification.]

Verification Algorithm: During verification, each node is given a packet $x$ and public information $t$. In the case where this packet has not been tampered with, $x$ is the linear combination $x = \sum_{i=1}^{m} c_i \hat{x}_i$, and $t$ represents the public parameters $g_0, g_1, \ldots, g_{n+1}, p, q$ and $c$. Each node can verify the integrity of the packet as follows:
1) From $c$, compute $c_1, c_2, \ldots, c_m$.
2) Compute the hash value $H_1 = H(x)$.
3) Compute the hash value $H_2 = \prod_{i=1}^{m} h_i^{c_i} \bmod q$, where $h_i = g_i$ for $1 \le i \le m$.
4) Verify that $H_1 = H_2$.

In our scheme, instead of transmitting the coefficients, every node selects random values and distributes them to all its downstream nodes. The coefficients are generated from a shared pseudo-random number generator in each node, and the global encoding kernels can be calculated recursively in any upstream-to-downstream order. In practice, the need for distributing random values can be further eliminated by using a public random function. For example, it can be the SHA-1 hash of the original file identifier, creation date, publisher, and other data that are public and known to all the receivers before the download session begins.

Our verification scheme enables the nodes to check the integrity of packets without requiring a secure channel. Also, the computation involved in hash value generation and verification is very simple.

IV. SECURITY OF OUR ALGORITHM

A. Security against the contamination adversaries

It can be shown that the basic verification scheme is indeed secure if $DL[g, p, q]$ is hard, using an argument similar to that in [10] (proof of Theorem 3).

Theorem 1: It is computationally infeasible to find $\hat{X} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m)^T$, $y$ and $c = (c_1, c_2, \ldots, c_m)$ such that, for $x = \sum_{i=1}^{m} c_i \hat{x}_i$, we have $y \ne x$ and $H(x) = H(y)$; namely, the basic scheme is secure if $DL[g, p, q]$ is hard.
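The hash of eq. (2), the padding of step 4 and the verification algorithm can be exercised together in a short Python sketch. It is a hypothetical illustration with toy parameters ($p = 1019$, $q = 2039$), not a secure implementation. The pseudo-random generator $G$ is stood in for by Python's `random.Random`, seeded with a SHA-1 digest of made-up public metadata following the suggestion above; `random.Random` is not a cryptographic generator and is used here only because every node can reproduce its output from the same seed. The last lines tamper with one symbol of a valid packet, the kind of forgery Theorem 1 says cannot be made to pass; here it fails because it multiplies $H(x)$ by $g_1 \ne 1$.

```python
import hashlib
import random
import secrets

p, q = 1019, 2039  # toy parameters (assumption): q = 2p + 1, both prime


def keygen(n):
    """Secret exponents u_0..u_{n+1} (nonzero) and public g_i = g^{u_i} mod q."""
    g = pow(2 + secrets.randbelow(q - 3), (q - 1) // p, q)  # h != +-1, so o(g) = p
    u = [1 + secrets.randbelow(p - 1) for _ in range(n + 2)]
    return u, [pow(g, ui, q) for ui in u]


def H(x, gs):
    """Eq. (2): prod_{i=0}^{n} g_i^{x_i} * g_{n+1}^r mod q, for x = (x_0..x_n, r)."""
    h = 1
    for gi, xi in zip(gs, x):
        h = h * pow(gi, xi, q) % q
    return h


def pad(i, data, r_i, u):
    """Step 4: prepend x_{i0} so that H(x_hat_i) = g_i (needs the secret u)."""
    n = len(data)
    s = u[i] - sum(u[j] * data[j - 1] for j in range(1, n + 1)) - u[n + 1] * r_i
    x_i0 = s * pow(u[0], -1, p) % p
    return [x_i0] + data + [r_i]


def coefficients(seed, m):
    """Stand-in for G: every node expands the same public seed c into c_1..c_m."""
    rng = random.Random(seed)          # deterministic, NOT cryptographically secure
    return [rng.randrange(1, p) for _ in range(m)]


def combine(packets, c):
    """x = sum_i c_i * x_hat_i, symbol-wise mod p."""
    return [sum(ci * pkt[k] for ci, pkt in zip(c, packets)) % p
            for k in range(len(packets[0]))]


def verify(x, c, gs):
    """Accept iff H_1 = H(x) equals H_2 = prod_i g_i^{c_i} mod q."""
    h2 = 1
    for i, ci in enumerate(c, start=1):
        h2 = h2 * pow(gs[i], ci, q) % q
    return H(x, gs) == h2


n, m = 8, 3
u, gs = keygen(n)
rows = [[secrets.randbelow(p) for _ in range(n)] for _ in range(m)]  # stand-in for X'
r = [1 + secrets.randbelow(p - 1) for _ in range(m)]
packets = [pad(i + 1, rows[i], r[i], u) for i in range(m)]
assert all(H(packets[i], gs) == gs[i + 1] for i in range(m))  # intended hash values

# Hypothetical public metadata hashed with SHA-1 to form the shared seed c.
seed = hashlib.sha1(b"file-id|creation-date|publisher").digest()
c = coefficients(seed, m)
x = combine(packets, c)
assert verify(x, c, gs)

y = list(x)
y[1] = (y[1] + 1) % p              # corrupt one data symbol
assert not verify(y, c, gs)        # H(y) = H(x) * g_1 mod q, and g_1 != 1
```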
Proof: We prove this theorem by showing that, if there is a polynomial-time algorithm $\mathsf{A}$ that finds $\hat{X} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m)^T$, $y$ and $c = (c_1, c_2, \ldots, c_m)$ such that, for $x = \sum_{i=1}^{m} c_i \hat{x}_i$, we have $y \ne x$ and $H(x) = H(y)$ with a non-negligible probability $\varepsilon$, then we can use it to construct a polynomial-time algorithm $\mathsf{B}$ that finds a collision in $H$ with the same probability $\varepsilon$: $\mathsf{B}$ simply runs $\mathsf{A}$ and outputs the pair $x = \sum_{i=1}^{m} c_i \hat{x}_i$ and $y$. However, if $DL[g, p, q]$ is hard, Lemma 2 shows that $H$ is collision-free, and thus $\varepsilon$ must be negligible, which is a contradiction. Therefore, the basic scheme is secure if the discrete logarithm problem $DL[g, p, q]$ is hard.

B. Security against the eavesdropping adversaries

Theorem 2: Consider a network in which the number of independent messages available to Calvin is less than the multicast capacity, i.e., $k = \max_i k_i < m$. Then the algorithm in Section III-B achieves the practically secure condition with probability one when a random code is used.

Proof: In our algorithm, Alice transmits $\hat{X}$ instead of $X$, so the message available to Calvin is $A_i \hat{X}$. As long as Calvin does not obtain $r_1, r_2, \ldots, r_m$ from $A_i \hat{X}$, he cannot get the global encoding kernels with respect to $X$, and without them he cannot get any meaningful information about $X$. So Calvin should not be able to recover $r_1, r_2, \ldots, r_m$ by taking linear combinations of the observed packets $A_i \hat{X}$, which implies

$$b_i A_i \hat{X} \ne I \hat{X} \qquad (\forall\, b_i, i) \qquad (9)$$

where $b_i$ is an $m \times k_i$ matrix over $\mathbb{F}_p$ and $I$ is the $m \times m$ identity matrix. Since the number of independent messages available to the eavesdropper is less than the multicast capacity of the network, $b_i A_i \hat{X}$ has rank at most $k_i < m$, whereas $\hat{X}$ has rank $m$ whenever the source packets are linearly independent; hence condition (9) can always be satisfied. Moreover, even by obtaining $r_1, r_2, \ldots, r_m$, Calvin cannot get any packet of $X$, which implies

$$b_i A_i P \ne I_{m,n} \qquad (\forall\, b_i, n, i) \qquad (10)$$

Right-multiplying both sides by $P^{-1}$, we have

$$b_i A_i \ne I_{m,n} P^{-1} \qquad (\forall\, b_i, n, i) \qquad (11)$$

where $I_{m,n}$ is the $n$th row of the $m \times m$ identity matrix and $b_i$ is a row vector of length $k_i$ over $\mathbb{F}_p$. The above condition is satisfied if no row of $P^{-1}$ lies in the row space of any $A_i$.

Theorem 3: In a network that supports a multicast capacity of $m$, if at most $k$ ($k < m$) edges can be tapped simultaneously, then the multicast capacity under practical security requirements is $m - \frac{2m}{n}$ (here, the asymptotically negligible term $\frac{2m}{n}$ corresponds to the overhead due to the redundancy Alice appends to form $\hat{X}$).

Proof: The network supports a multicast capacity of $m$, so a linear code can be found to multicast $m$ packets [3]. By Theorem 2, a transformation at the source can be applied to make the transmission practically secure.

V. DISCUSSION

In this section, we examine the overhead and the start-up latency induced by our scheme. For a fair comparison, we choose $n = 410$, $|p| = 320$ bits and $|q| = 1024$ bits in the following discussion. The size of every packet is 16 KB. Moreover, we assume that the original file is divided into $m = 10$ packets. The comparisons are shown in Fig. 3.

Fig. 3. Comparisons of communication overhead and start-up latency

| Index \ Paper | Ours | [9] | [10] | [11] |
|---|---|---|---|---|
| Communication overhead | 0.48% | 3.23% | >2.44% | 4.86% |
| Start-up latency | 0.127 s | 0.125 s | 0.127 s | 0.25 s |

A. Communication Overhead

The communication overhead is caused by two parts. The first part is the amount of data we need to distribute to each node for the security of our algorithm. The second part is the code coefficients. The actual communication overhead largely depends on the parameters chosen for the actual implementations.
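With the parameters fixed above, the capacity expression of Theorem 3 can be checked directly. Assuming, as the construction of $\hat{X}$ suggests, that the redundancy consists of the two extra symbols $x_{i0}$ and $r_i$ carried by each packet of $n$ data symbols, the achievable rate is

$$m \cdot \frac{n}{n+2} = m - \frac{2m}{n+2} \approx m - \frac{2m}{n},$$

and for $m = 10$, $n = 410$:

$$10 \cdot \frac{410}{412} \approx 9.951, \qquad 10 - \frac{20}{410} \approx 9.951.$$

The two expressions differ by $\frac{2m}{n} - \frac{2m}{n+2} = \frac{4m}{n(n+2)}$, which vanishes as $n$ grows, consistent with the term being called asymptotically negligible.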
In the scheme proposed by Krohn et al. [9], the parameters chosen for the homomorphic hash function generate a hash value of 1024 bits per packet, and the coefficients total 3200 bits per packet. Hence, the “first-order” hash values and the coefficients together amount to 3.22% of the original data. For a 1 GB file, their method would require 8 MB of hash values. To distribute these hash values, Krohn et al. [9] proposed to apply the same scheme recursively to the 8 MB of hash values, which generates “second”- and higher-order hash values. The higher-order hash values constitute 0.01% of the size of the original data, so the total overhead is 3.23%.

In the scheme proposed by Zhao et al. [11], if the file is divided into 10 packets, each packet is a vector over $\mathbb{F}_q$. The size of each packet is again about 16 KB, and the size of each augmented vector (with the coding vector in front) is about 16.4 KB; thus the overhead of each packet is 2.43%. In addition, after the initial setup, the scheme of [11] has to publish 3200 bits for the new signature vector to maintain its security, so the total overhead of their scheme is 4.86%. In conclusion, although they proposed a simple signature scheme, its communication overhead is very high.

The scheme in [10] requires padding of three values, and the coefficients must also be distributed; the overhead caused by the coefficients alone is 2.44%. Therefore, the communication overhead of their scheme is higher than ours, even though they use batch verification.

In our scheme, the coefficients are generated from a pseudo-random number generator in each node, so their distribution is avoided. The communication overhead of our scheme is caused only by the padding we add to every packet.
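The percentages quoted above can be reproduced from the Section V parameters with a few lines of arithmetic. This sketch assumes that a packet carries $n = 410$ symbols of $|p| = 320$ bits (131,200 bits, roughly 16 KB), that the coefficients are $m = 10$ symbols of $|p|$ bits, and that the 1 GB example uses binary units. Under these assumptions it recovers the 3.22% first-order figure for [9], the 2.44% coefficient overhead for [10], the 8 MB of hash values, and our $2/n$ padding overhead (about 0.488%, reported in the text as 0.48%). The figures for [11] depend on details not given here and are not recomputed.

```python
# Parameters from Section V (sizes in bits).
n, P_BITS, Q_BITS, m = 410, 320, 1024, 10
packet_bits = n * P_BITS                      # 131,200 bits, roughly 16 KB

ours = 2 * P_BITS / packet_bits               # padding x_{i0} plus parity r_i
coeffs = m * P_BITS / packet_bits             # m coefficients of |p| bits each
krohn_first_order = (Q_BITS + m * P_BITS) / packet_bits  # one |q|-bit hash + coefficients

# [9]: one |q|-bit hash per 16 KiB block of a 1 GiB file (binary units assumed).
blocks = 2**30 // 2**14
hash_mib = blocks * Q_BITS / 8 / 2**20

for name, value in [("ours (2/n)", ours),
                    ("coefficients, as in [10]", coeffs),
                    ("[9] first-order hashes + coefficients", krohn_first_order)]:
    print(f"{name}: {100 * value:.3f}%")      # prints 0.488%, 2.439%, 3.220%
print(f"[9] hash values for a 1 GiB file: {hash_mib:.0f} MiB")  # prints 8 MiB
```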
Each distributed packet incurs only 0.48% overhead, which is negligible compared with previous works.

B. Start-Up Latency

At the beginning of a content distribution session, the source and all the nodes participating in the distribution have to agree on the set of parameters used for coding and verification. The public parameters in our scheme are $p, q, g_0, g_1, \ldots, g_{n+1}$, and their total size is approximately 16.3 KB. With these parameters, any node can perform verification. Assuming that the bandwidth between a node and the source (or any other node from which these parameters are distributed) is 1 Mbps, it would take about 0.127 seconds before the node is ready to perform verification. The start-up latency of our scheme is fixed once the parameters of the hash function and the block size are chosen, and it is independent of the size of the content to be distributed.

The start-up latency of [10] is 0.127 seconds, about the same as ours. For the scheme in [9], the size of all the public parameters equals the size of the data in a packet, which is 16 KB, and transmitting them on the same link takes 0.125 seconds. However, when the node needs to receive 8 MB of hash values for a 1 GB file, as in the example given in [9], it would require 64 seconds on the same 1 Mbps link; their start-up latency is thus proportional to the size of the file. The public parameters of [11] consist of two parts: the public parameters proper and the signature vectors. Their public parameters total 32.8 KB, which take 0.25 seconds to transmit on the same link. Furthermore, they have to publish new signature vectors in every setup to maintain the security of their scheme.

VI. CONCLUSION

In this paper, we investigate the security issues that arise from using network coding and propose a secure algorithm.
By means of our algorithm, every node can easily verify the integrity of the received packets, and an eavesdropper is unable to get any meaningful information about the source. We show that, by giving up a small amount of overall capacity, the practically secure condition can be achieved with probability 1, which is much higher than that of [1]. We also propose a new paradigm in which the public parameters are selected as the “intended” hash values and the code coefficients are generated by a pseudo-random number generator in every node. In this way, the distribution of both the hash values and the coefficients is avoided. We have shown that the communication overhead of our algorithm is $2/n$, which is negligible compared with previous works, and that the start-up latency is short and independent of the content size.

ACKNOWLEDGMENT

The work of Yejun Zhou, Hui Li and Jianfeng Ma was supported by National Natural Science Foundation of China grant 60772136, National Natural Science Foundation of China grant 60633020, 863 Hi-Tech Research and Development Program of China grant 2007AA01Z435, and 863 Hi-Tech Research and Development Program of China grant 2007AA01Z429.

REFERENCES

[1] K. Bhattad and K. R. Narayanan, “Weakly secure network coding,” in Proc. of the First Workshop on Network Coding, Theory, and Applications (NetCod), Riva del Garda, Italy, 2005.
[2] R. Ahlswede, N. Cai, S.-Y. R. Li, and R. W. Yeung, “Network information flow,” IEEE Trans. Inf. Theory, vol. 46, no. 4, pp. 1204-1216, 2000.
[3] S.-Y. R. Li, R. W. Yeung, and N. Cai, “Linear network coding,” IEEE Trans. Inf. Theory, vol. IT-49, pp. 371-381, 2003.
[4] T. Ho, R. Koetter, M. Médard, D. R. Karger, and M. Effros, “The benefits of coding over routing in a randomized setting,” in International Symposium on Information Theory (ISIT), 2003.
[5] T. Ho, M. Médard, J. Shi, M. Effros, and D. R. Karger, “On randomized network coding,” in Proc. 41st Annual Allerton Conference on Communication, Control and Computing, Oct. 2003.
[6] T. C. Ho, B. Leong, R. Koetter, M. Médard, M. Effros, and D. R. Karger, “Byzantine modification detection in multicast networks using randomized network coding,” in International Symposium on Information Theory, Chicago, USA, June 2004.
[7] C. Gkantsidis, J. Miller, and P. Rodriguez, “Comprehensive view of a live network coding P2P system,” in Proceedings of the 6th ACM SIGCOMM Conference on Internet Measurement, Oct. 2006.
[8] S. Jaggi, M. Langberg, S. Katti, T. Ho, D. Katabi, and M. Médard, “Resilient network coding in the presence of Byzantine adversaries,” accepted to IEEE INFOCOM’07, Anchorage, Alaska, May 2007.
[9] M. N. Krohn, M. J. Freedman, and D. Mazières, “On-the-fly verification of rateless erasure codes for efficient content distribution,” IEEE Symp. Security and Privacy, Oakland, CA, pp. 226-240, May 2004.
[10] Qiming Li, Dah-Ming Chiu, and John C. S. Lui, “On the practical and security issues of batch content distribution via network coding,” 14th IEEE International Conference (ICNP ’06), pp. 158-167, Nov. 2006.
[11] Fang Zhao, Ton Kalker, M. Médard, and Keesook J. Han, “Signatures for content distribution with network coding,” ISIT 2007, Nice, France, June 24-29, 2007.
[12] N. Cai and R. W. Yeung, “Secure network coding,” International Symposium on Information Theory (ISIT), Lausanne, Switzerland, June 30 - July 5, 2002.
[13] J. Feldman, T. Malkin, C. Stein, and R. A. Servedio, “On the capacity of secure network coding,” in Proc. 42nd Annual Allerton Conference on Communication, Control, and Computing, Sep. 2004.
[14] K. Jain, “Security based on network topology against the wiretapping attack,” IEEE Wireless Communications, pp. 68-71, Feb. 2004.
[15] L. Lima, M. Médard, and J. Barros, “Random linear network coding: A free cipher?” in Proc. IEEE International Symposium on Information Theory (ISIT), Nice, France, June 24-29, 2007.
[16] C. K. Ngai and S. Yang, “Deterministic secure error-correcting (SEC) network codes,” IEEE Information Theory Workshop (ITW), pp. 96-101, Sept. 2-6, 2007.
[17] M. Bellare, O. Goldreich, and S. Goldwasser, “Incremental cryptography: The case of hashing and signing,” CRYPTO, 1994.