Information-theoretically Secure Regenerating Codes for Distributed Storage

Nihar B. Shah, K. V. Rashmi and P. Vijay Kumar

Abstract: Regenerating codes are a class of codes for distributed storage networks that provide reliability and availability of data, and also perform efficient node repair. Another important aspect of a distributed storage network is its security. In this paper, we consider a threat model where an eavesdropper may gain access to the data stored in a subset of the storage nodes, and possibly also to the data downloaded during repair of some nodes. We provide explicit constructions of regenerating codes that achieve information-theoretic secrecy capacity in this setting.

I. INTRODUCTION

We consider a distributed storage system consisting of $n$ storage nodes in a network, each having a capacity to store $\alpha$ symbols over a finite field $\mathbb{F}_q$ of size $q$. Data corresponding to $B$ message symbols (the message), each drawn uniformly and independently from $\mathbb{F}_q$, is to be dispersed across these $n$ nodes. An end-user (called a data-collector) must be able to reconstruct the entire message by downloading the data stored in any subset of $k$ nodes. If data-reconstruction was the only requirement, any $[n, k]$ maximum-distance-separable (MDS) code such as a Reed-Solomon code would suffice.

A second important aspect of a distributed storage system is the handling of node failures. When a storage node fails, it is replaced by a new, empty node. The replacement node is required to obtain the data that was previously stored in the failed node by downloading data from the remaining nodes in the network. A typical means of accomplishing this is to download the entire message from the network, and extract the desired data from it. However, downloading the entire message, when it eventually stores only a fraction $1/k$ of it, is clearly wasteful of the network resources.
Recently, Dimakis et al. [1] introduced a new class of codes called 'regenerating codes' which are efficient with respect to both storage space utilization and the amount of data downloaded for repair (termed repair-bandwidth). Regenerating codes permit node repair by downloading $\beta$ symbols from any subset of $d$ ($\geq k$) remaining nodes, and the total repair-bandwidth $d\beta$ is typically much smaller than the message size $B$. In [1] the authors also establish that the parameters involved must necessarily satisfy the bound:
\[ B \leq \sum_{i=0}^{k-1} \min\left(\alpha, (d-i)\beta\right). \tag{1} \]
It can be deduced (see [1]) that achieving equality in (1), with parameters $B$, $k$ and $d$ fixed, leads to a tradeoff between the storage space $\alpha$ and the repair-bandwidth $d\beta$. In this tradeoff, the case of minimizing $\alpha$ first and then $\beta$ (for fixed $d$) is termed the minimum storage regenerating (MSR) case, while carrying out the minimization in the reverse order is termed the minimum bandwidth regenerating (MBR) case. More details on the MSR and MBR cases are provided later in the paper. Explicit constructions of MSR and MBR codes achieving this bound can be found in [2], [4]–[6].

The authors are with the Dept. of ECE, Indian Institute of Science, Bangalore, India. Email: {nihar, rashmikv, vijay}@ece.iisc.ernet.in. P. Vijay Kumar is also an adjunct faculty member of the Electrical Engineering Systems Department at the University of Southern California, Los Angeles, CA 90089-2565. This work was supported by Infosys Technologies Limited.

The focus of the present paper is on an additional, important aspect of distributed storage systems, namely, security of the data. Nowadays, individuals as well as businesses are increasingly storing their data over untrusted networks. Peer-to-peer storage systems have storage nodes spread out geographically.
Such situations make the data prone to prying adversaries that may gain access to the data stored in some of the nodes. An eavesdropper can also gain additional information by listening to the data downloaded during multiple instances of repair of these nodes. It is imperative to prevent such entities from gaining any useful information. The present paper constructs explicit codes which, while satisfying the reconstruction and repair requirements in the distributed storage network, prevent such an eavesdropper from obtaining any information about the original message.

The threat model considered in this paper is as follows. An eavesdropper can gain read-access to the data stored in any set of at most $\ell$ storage nodes, where $\ell < k$. The eavesdropper may also gain read-access to the data being downloaded during (possibly multiple instances of) repair of some $\ell'$ ($\leq \ell$) of these $\ell$ nodes. Note that the data downloaded by a replacement node during any instance of repair also contains the data that is eventually stored in that node. This is formalized in the following definition.

Definition 1 ($\{\ell, \ell'\}$ secure distributed storage system): Consider a distributed storage system in which an eavesdropper gains access to the data stored in some $(\ell - \ell')$ nodes, and the data stored as well as the data downloaded during repair in some other $\ell'$ nodes. An $\{\ell, \ell'\}$ secure distributed storage system is one in which such an eavesdropper obtains no information about the message.

We assume that the eavesdroppers have unbounded computational power, are passive and non-collusive, and that the underlying code is globally known. As an example of this model, consider a peer-to-peer storage system.
The $\ell'$ nodes described above may represent nodes that are in a network belonging to an adversary, thereby allowing the eavesdropper to listen to all the data downloaded as these $\ell'$ nodes undergo (possibly multiple) failures and repairs across time. On the other hand, the $(\ell - \ell')$ nodes may represent the nodes which may be exposed only momentarily, allowing the eavesdropper access to only the data stored.

The problem of providing information-theoretic secrecy in distributed storage systems can be related to the Wiretap Channel II [7] where an eavesdropper, listening to any arbitrary subset (of fixed size) of symbols being transmitted over a noiseless point-to-point channel, obtains essentially no information about the original message. While schemes providing secrecy in a distributed storage system with only the reconstruction requirement would follow from [7], the requirement of addressing node-repair makes the problem harder. Among recent results in the context of distributed storage, the problem of securely disseminating encoded data to the storage nodes is considered in [8], and an analysis of communication and interaction requirements between the nodes is provided. In [9], the authors consider the situation where data is stored over two networks, and an eavesdropper may gain access to any one of these networks. Connections between optimal repair in distributed storage and communication across multiple-access wiretap channels are established in [10].

The system model considered in the present paper is based on the model introduced by Pawar et al. [3]. In [3], the authors consider the case when $\ell' = \ell$ and provide an upper bound on the number of message symbols $B^{(s)}$ that can be stored in the information-theoretically secure system as
\[ B^{(s)} \leq \sum_{i=\ell}^{k-1} \min\left(\alpha, (d-i)\beta\right). \tag{2} \]
The bound in (2) can be interpreted in the following intuitive manner.
Out of the $k$ nodes to which a data-collector connects, consider the case where the first $\ell$ of these nodes are compromised. Thus, assuming the secrecy goals have been met, these $\ell$ nodes will provide zero information about the message symbols, and only the remaining $(k - \ell)$ nodes, i.e., the terms $i = \ell, \ldots, k-1$ of the summation in (1), provide useful information. It can be shown that the bound in (2) is, in fact, an upper bound on the number of message symbols in an information-theoretically secure system for all values of $\ell'$.

In the sequel, notation pertaining to the secure version of the code will frequently be indicated by the superscript $(s)$. For instance, $B^{(s)}$ denotes the number of message symbols in a system with secrecy constraints, and $B$ denotes the number of message symbols in a system without secrecy constraints (i.e., when $\ell = \ell' = 0$). Note that the difference $B - B^{(s)}$ is the price paid for the additional secrecy constraint.

In [3], the authors also show that the MBR code presented in [4] for the parameters $[n, k, d = n-1]$ can be made information-theoretically secure by making use of a nested MDS code in the construction. In the present paper, we provide explicit constructions for information-theoretically secure MBR and MSR codes for:
1) MBR, all parameters $[n, k, d]$, and
2) MSR, all parameters $[n, k, d \geq 2k-2]$.
Each of the constructions presented is $\{\ell, \ell'\}$ information-theoretically secure, for all values of $\ell$ and $\ell'$. The secure MBR code presented is optimal for all $\{\ell, \ell'\}$, and the secure MSR code presented is optimal for all values of $\ell$ when $\ell' = 0$. Thus this also establishes the secrecy capacity of such a system for each of these parameter values. It is unknown at present whether or not the MSR code presented here is optimal for $\ell' \geq 1$.
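The bounds (1) and (2) are easy to evaluate numerically. The following minimal sketch computes the right-hand side of either bound and the resulting price of secrecy $B - B^{(s)}$, assuming both bounds are met with equality. The function name `storage_bound` and the sample parameters are ours, chosen only for illustration.

```python
def storage_bound(k, d, alpha, beta, ell=0):
    """Right-hand side of bound (1) when ell == 0, and of bound (2) otherwise:
    the sum over i = ell, ..., k-1 of min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(ell, k))


# Illustrative parameters (an assumption made for this sketch):
# k = 3, d = 4, beta = 1 and alpha = d * beta = 4.
B = storage_bound(k=3, d=4, alpha=4, beta=1)            # no secrecy constraint
B_s = storage_bound(k=3, d=4, alpha=4, beta=1, ell=1)   # one compromised node
print(B, B_s, B - B_s)  # prints: 9 5 4
```

With these values, securing the system against a single compromised node costs 4 of the 9 symbols that could otherwise carry message data.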
The secure codes provided in the present paper are based on our previous work [2], where we construct explicit regenerating codes for the parameters listed above. The codes in [2] are based on a new Product-Matrix (PM) framework. We will call the MBR and MSR codes of [2] the PM-MBR and PM-MSR codes respectively, and the corresponding secure versions constructed in the present paper the secure PM-MBR and the secure PM-MSR codes respectively. While all other regenerating codes in the literature require the number of nodes $n$ to be equal to $d + 1$, the PM codes [2] do not pose any such constraint. Thus the PM codes are well suited for distributed storage systems where the number of nodes $n$ may vary in time, or where the connectivity $d$ required for repair may be low. These codes are also linear, i.e., each symbol in the system is a linear combination of the message symbols. As we shall subsequently see, the PM framework possesses two additional attributes that make it more attractive for constructing secure codes: (a) exact-repair, and (b) data downloaded by a node for repair is independent of the set of $d$ nodes to which it connects. A more detailed discussion is provided in Section V.

The rest of the paper is organized as follows. Section II presents the general approach followed in the paper for code construction and for proving information-theoretic secrecy. Section III presents the secure MBR code for all parameters $[n, k, d]$ and $\{\ell, \ell'\}$. Section IV presents the secure MSR codes for all parameters $[n, k, d \geq 2k-2]$ and $\{\ell, \ell'\}$. The paper concludes with a discussion in Section V.

II. APPROACH

We approach the problem of providing secrecy in the presence of eavesdroppers in the following manner. To construct a secure code for a given $[n, k, d]$, we choose the corresponding PM code [2] with the same values of system parameters $[n, k, d]$.
In the input to the PM code (without secrecy), we replace a specific, carefully chosen set of
\[ R = B - B^{(s)} \tag{3} \]
message symbols with $R$ random symbols. Each of these random symbols is chosen uniformly and independently from $\mathbb{F}_q$, and is also independent of the message symbols. If the random symbols are treated as message symbols, the secure code becomes identical to the original code. Hence, the processes of reconstruction and repair in the secure code can be carried out in the same way as in the original code.

To prove $\{\ell, \ell'\}$ secrecy of our codes, we consider the worst-case scenario where an eavesdropper has access to precisely $\{\ell, \ell'\}$ nodes. Let $\mathcal{U}$ denote the collection of the $B^{(s)}$ message symbols, and let $\mathcal{R}$ denote the collection of $R$ random symbols as defined in (3). Further, let $\mathcal{E}$ denote the collection of symbols that the eavesdropper gains access to. For each of the codes presented in this paper, the proof of information-theoretic secrecy proceeds in the following manner. All logarithms are taken to the base $q$.

Step 1: We show that given all the message symbols $\mathcal{U}$ as side-information, the eavesdropper can recover all the $R$ random symbols, i.e., $H(\mathcal{R} \mid \mathcal{E}, \mathcal{U}) = 0$.
Step 2: Next we show that all but $R$ of the symbols obtained by the eavesdropper are functions of the remaining $R$ symbols, i.e., $H(\mathcal{E}) \leq R$.
Step 3: We finally show that the two conditions listed in Steps 1 and 2 above necessarily imply that the mutual information between the message symbols $\mathcal{U}$ and the symbols $\mathcal{E}$ obtained by the eavesdropper is zero, i.e., $I(\mathcal{U}; \mathcal{E}) = 0$.

III. SECURE MBR CODES FOR ALL $[n, k, d]$, $\{\ell, \ell'\}$

MBR codes achieve the minimum possible repair-bandwidth: a replacement node downloads only what it stores, i.e., $d\beta = \alpha$.
Substituting this in the bound in (1), and replacing the inequality with equality, we get that in the absence of secrecy requirements an MBR code must satisfy
\[ B = \left( kd - \binom{k}{2} \right) \beta, \qquad \alpha = d\beta. \tag{4} \]
In this section, we present explicit constructions of information-theoretically secure MBR codes for all parameter values $[n, k, d]$ and all $\{\ell, \ell'\}$. These codes meet the upper bound (2) on the total number of message symbols, thus showing that (2) is indeed the secrecy capacity at the MBR point for all parameters. These codes are based on the PM-MBR codes constructed in [2]. We first provide a brief description of the PM-MBR codes, before moving on to the construction of the secure PM-MBR codes.

We construct codes for the case $\beta = 1$, and codes for any higher value of $\beta$ can be obtained by a simple concatenation of the $\beta = 1$ code. In the terminology of distributed storage, this process is known as striping. Thus an MBR code with $\beta = 1$ has $\alpha = d$.

A. Recap of the Product-Matrix MBR Codes

The PM-MBR code [2] can be described in terms of an $(n \times \alpha)$ code matrix $C$, where the $\alpha$ elements in its $i$th row represent the $\alpha$ symbols stored in node $i$ ($1 \leq i \leq n$). The code matrix $C$ is a product of two matrices: a fixed $(n \times d)$ encoding matrix $\Psi$ and a $(d \times \alpha)$ message matrix $M$ comprising the $B$ message symbols in a possibly redundant fashion, i.e.,
\[ C = \Psi M. \tag{5} \]
Denoting the $i$th row of $\Psi$ as $\psi_i^t$, the $\alpha$ symbols stored in the $i$th storage node are expressed as $\psi_i^t M$. The superscript '$t$' denotes the transpose of a matrix.
In the PM-MBR code, the encoding matrix $\Psi$ and the message matrix $M$ are of the form
\[ \underbrace{\Psi}_{n \times d} = \Big[ \underbrace{\Phi}_{n \times k} \;\; \underbrace{\Delta}_{n \times (d-k)} \Big], \qquad \underbrace{M}_{d \times d} = \begin{bmatrix} \underbrace{S}_{k \times k} & \underbrace{T}_{k \times (d-k)} \\ \underbrace{T^t}_{(d-k) \times k} & \underbrace{0}_{(d-k) \times (d-k)} \end{bmatrix}. \]
The matrices $\Phi$ and $\Delta$ are chosen in such a way that (a) any $k$ rows of $\Phi$ are linearly independent, and (b) any $d$ rows of $\Psi$ are linearly independent. These requirements can be met, for example, by choosing $\Psi$ to be either a Cauchy or a Vandermonde matrix. The choice of the matrix $\Psi$ governs the choice of the size $q$ of the finite field $\mathbb{F}_q$, e.g., choosing $\Psi$ as Vandermonde allows us to use any $q \geq n$.

The matrices $S$ and $T$ in the message matrix $M$ are populated by the $B$ message symbols,
\[ B = kd - \binom{k}{2} = k(d-k) + \frac{k(k+1)}{2}, \tag{6} \]
as follows. The $\frac{k(k+1)}{2}$ symbols in the upper triangular half of the $(k \times k)$ symmetric matrix $S$ and the $k(d-k)$ elements in the $(k \times (d-k))$ matrix $T$ are set equal to the $B$ message symbols. Note that the symmetry of matrix $S$ makes $M$ also symmetric.

Example 1: We illustrate the code with an example; this example will also be used subsequently to illustrate the secure code. Let $n = 6$, $k = 3$, $d = 4$. Then with $\beta = 1$, we get $\alpha = d = 4$ and $B = 9$. We design the code over the finite field $\mathbb{F}_7$. The $(6 \times 4)$ encoding matrix $\Psi$ can be chosen as a Vandermonde matrix with its $i$th row as $\psi_i^t = [1 \;\; i \;\; i^2 \;\; i^3]$. The matrices $S$ and $T$, and hence the message matrix $M$, are populated by the 9 message symbols $\{u_i\}_{i=1}^{9}$ as
\[ S = \begin{bmatrix} u_1 & u_2 & u_3 \\ u_2 & u_4 & u_5 \\ u_3 & u_5 & u_6 \end{bmatrix}, \qquad T = \begin{bmatrix} u_7 \\ u_8 \\ u_9 \end{bmatrix}, \qquad M = \begin{bmatrix} u_1 & u_2 & u_3 & u_7 \\ u_2 & u_4 & u_5 & u_8 \\ u_3 & u_5 & u_6 & u_9 \\ u_7 & u_8 & u_9 & 0 \end{bmatrix}. \]
We now describe the reconstruction and the repair processes in the PM-MBR code.

1) Reconstruction: Let $\Psi_{DC} = [\Phi_{DC} \;\; \Delta_{DC}]$ be the $(k \times d)$ submatrix of $\Psi$ corresponding to the $k$ rows of $\Psi$ to which the data-collector connects.
Thus the data-collector has access to the symbols
\[ \Psi_{DC} M = \left[ \Phi_{DC} S + \Delta_{DC} T^t \quad \Phi_{DC} T \right]. \]
By construction, the matrix $\Phi_{DC}$ is nonsingular. Hence, by multiplying the matrix $\Psi_{DC} M$ on the left by $\Phi_{DC}^{-1}$, one can recover first the matrix $T$ and subsequently the matrix $S$.

2) Repair: Let $\psi_f^t$ be the row of $\Psi$ corresponding to the failed node $f$. Thus the $d$ symbols stored in the failed node are $\psi_f^t M$. The replacement for the failed node $f$ connects to an arbitrary set $\{h_i \mid 1 \leq i \leq d\}$ of $d$ remaining nodes. Each of these $d$ nodes passes on the inner product $(\psi_{h_i}^t M)\, \psi_f$ to the replacement node. Thus from these $d$ nodes, the replacement node obtains the $d = \alpha$ symbols $\Psi_{rep} M \psi_f$, where $\Psi_{rep} = [\psi_{h_1} \cdots \psi_{h_d}]^t$. By construction, the $(d \times d)$ matrix $\Psi_{rep}$ is invertible. This allows the replacement node to recover $M \psi_f$. Since $M$ is symmetric, $(M \psi_f)^t = \psi_f^t M$, which is precisely the data stored in the node prior to failure.

B. Information-theoretic Secrecy in the PM-MBR Code

For the MBR code, we have $d\beta = \alpha$, i.e., a replacement node stores all the data that it downloads during its repair. Thus an eavesdropper does not obtain any extra information from the data that is downloaded for repair. Hence for an MBR code, we can assume without loss of generality that $\ell' = 0$.

In this section, we will construct codes that achieve the upper bound in (2) at the MBR point. Substituting $\alpha = d\beta$ in (2) and replacing the inequality with equality, we get that such a code must necessarily satisfy
\[ B^{(s)} = \left( kd - \binom{k}{2} \right) \beta - \left( \ell d - \binom{\ell}{2} \right) \beta. \tag{7} \]
We now construct an $\{\ell, \ell'\}$ secure MBR code satisfying (7), based on the PM-MBR code. We denote the PM-MBR code [2] described above as $\mathcal{C}$, and the secure PM-MBR code constructed here as $\mathcal{C}^{(s)}$. As mentioned previously, we will present the construction for the case $\beta = 1$. Let $\Psi^{(s)}$ be the $(n \times d)$ encoding matrix of code $\mathcal{C}^{(s)}$.
Choose $\Psi^{(s)}$ to satisfy the following property in addition to those required of $\Psi$: when restricted to the first $\ell$ columns, any $\ell$ rows are linearly independent. The choice of $\Psi^{(s)}$ as a Cauchy or Vandermonde matrix satisfies this additional property as well.

We now modify the message matrix $M$ of code $\mathcal{C}$ to obtain the message matrix $M^{(s)}$ of code $\mathcal{C}^{(s)}$. Replace the
\[ R = B - B^{(s)} = \ell d - \binom{\ell}{2} \tag{8} \]
message symbols in the first $\ell$ rows (and hence the first $\ell$ columns) of the symmetric matrix $M$ by $R$ random symbols. Each random symbol is chosen independently and uniformly from the elements of $\mathbb{F}_q$. Thus the $(n \times \alpha)$ code matrix for the secure PM-MBR code $\mathcal{C}^{(s)}$ is given by $C^{(s)} = \Psi^{(s)} M^{(s)}$.

Example 2: We will use the PM-MBR code in Example 1 to obtain a secure PM-MBR code for $[n = 6, k = 3, d = 4]$ with $\ell = 1$. From (7) with $\beta = 1$ we get $B^{(s)} = 5$. Thus we have $R = B - B^{(s)} = 4$. We replace the four message symbols $u_1$, $u_2$, $u_3$ and $u_7$ in Example 1 with random symbols $r_1$, $r_2$, $r_3$ and $r_7$ drawn uniformly and independently from $\mathbb{F}_7$ to get the new message matrix $M^{(s)}$ as
\[ M^{(s)} = \begin{bmatrix} r_1 & r_2 & r_3 & r_7 \\ r_2 & u_4 & u_5 & u_8 \\ r_3 & u_5 & u_6 & u_9 \\ r_7 & u_8 & u_9 & 0 \end{bmatrix}. \tag{9} \]
Since the matrix $\Psi$ in Example 1 is a Vandermonde matrix which already satisfies the additional property, we retain it in the new code, i.e., $\Psi^{(s)} = \Psi$. Thus the secure PM-MBR code for the desired parameters is given by $C^{(s)} = \Psi^{(s)} M^{(s)}$.

The following theorems establish the properties of reconstruction, repair and secrecy in the secure PM-MBR code.

Theorem 1 (Reconstruction and Repair): In code $\mathcal{C}^{(s)}$ presented above, a data-collector can recover all the $B^{(s)}$ message symbols by downloading data stored in any $k$ nodes, and a failed node can be repaired by downloading one symbol each from any $d$ remaining nodes.
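The code of Example 2 is small enough to check numerically. The sketch below encodes it over $\mathbb{F}_7$, reconstructs the message symbols from three nodes, repairs one node, and tests secrecy for $\ell = 1$. All function names and the sample symbol values are ours. The secrecy test is a standard linear-algebra criterion for linear codes with uniform random symbols (the message coefficients of the eavesdropped symbols must lie in the span of the random-symbol coefficients); it is offered as a sanity check under that assumption and is not the entropy argument used in the paper.

```python
Q = 7                      # field size from Example 1 (prime, so arithmetic is mod Q)
n, k, d = 6, 3, 4          # parameters of Examples 1 and 2; alpha = d since beta = 1


def rref(rows):
    """Reduced row-echelon form over F_Q; returns (matrix, list of pivot columns)."""
    A = [[x % Q for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(A[0])):
        p = next((i for i in range(r, len(A)) if A[i][c]), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        inv = pow(A[r][c], Q - 2, Q)            # inverse via Fermat's little theorem
        A[r] = [x * inv % Q for x in A[r]]
        for i in range(len(A)):
            if i != r and A[i][c]:
                f = A[i][c]
                A[i] = [(x - f * y) % Q for x, y in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
    return A, pivots


def solve(A, Y):
    """Solve A X = Y over F_Q for a square invertible A (Y may have several columns)."""
    m = len(A)
    R, pivots = rref([list(a) + list(y) for a, y in zip(A, Y)])
    assert pivots[:m] == list(range(m)), "A is singular"
    return [row[m:] for row in R]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % Q for col in zip(*B)] for row in A]


# Vandermonde encoding matrix: the row for node i is [1, i, i^2, i^3] mod Q.
Psi = [[pow(i, j, Q) for j in range(d)] for i in range(1, n + 1)]


def message_matrix(r, u):
    """M^(s) of (9): r = (r1, r2, r3, r7) random, u = (u4, u5, u6, u8, u9) message."""
    r1, r2, r3, r7 = r
    u4, u5, u6, u8, u9 = u
    return [[r1, r2, r3, r7],
            [r2, u4, u5, u8],
            [r3, u5, u6, u9],
            [r7, u8, u9, 0]]


def reconstruct(nodes, C):
    """Recover (S, T) from the contents of k nodes: first T from Phi_DC T, then S."""
    Phi = [Psi[i][:k] for i in nodes]
    Delta = [Psi[i][k:] for i in nodes]
    T = solve(Phi, [C[i][k:] for i in nodes])
    DTt = matmul(Delta, [list(col) for col in zip(*T)])
    S = solve(Phi, [[(c - x) % Q for c, x in zip(C[i][:k], row)]
                    for i, row in zip(nodes, DTt)])
    return S, T


def repair(f, helpers, C):
    """Each helper h sends (psi_h^t M) psi_f; solving with Psi_rep gives M psi_f."""
    sent = [[sum(a * b for a, b in zip(C[h], Psi[f])) % Q] for h in helpers]
    return [row[0] for row in solve([Psi[h] for h in helpers], sent)]


def leaks_nothing(node):
    """Rank test: message coefficients lie in the span of random-symbol coefficients."""
    cols = []
    for j in range(9):                       # unit vectors over (r1, r2, r3, r7, u4, ..., u9)
        v = [int(i == j) for i in range(9)]
        cols.append(matmul([Psi[node]], message_matrix(v[:4], v[4:]))[0])
    A_r = [list(row) for row in zip(*cols[:4])]     # 4 symbols x 4 random coefficients
    A_all = [list(row) for row in zip(*cols)]       # 4 symbols x 9 coefficients
    return len(rref(A_all)[1]) == len(rref(A_r)[1])


M_s = message_matrix(r=(3, 1, 4, 1), u=(5, 2, 6, 0, 3))   # arbitrary sample values
C = matmul(Psi, M_s)                    # row index i holds the content of node i + 1
S, T = reconstruct([0, 2, 5], C)
assert [S[1][1], S[1][2], S[2][2], T[1][0], T[2][0]] == [5, 2, 6, 0, 3]
assert repair(1, [0, 2, 3, 5], C) == C[1]
assert all(leaks_nothing(node) for node in range(n))
```

Node indices in the sketch are 0-based, so index `i` corresponds to node $i + 1$ in the text. Any other choice of three nodes for reconstruction, or of four helpers for repair, should behave the same way because every such submatrix of the Vandermonde $\Psi$ is invertible.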
Proof: Treating the random symbols also as message symbols, the secure PM-MBR code $\mathcal{C}^{(s)}$ becomes identical to the PM-MBR code $\mathcal{C}$. Thus reconstruction and repair in $\mathcal{C}^{(s)}$ are identical to those in $\mathcal{C}$.

Theorem 2 (Information-theoretic Secrecy): In code $\mathcal{C}^{(s)}$ designed for a given value of $\ell$, an eavesdropper having access to at most $\ell$ nodes gets no information pertaining to the message.

Proof: Let $\Psi^{(s)}_{eve}$ be the $(\ell \times d)$ submatrix of $\Psi^{(s)}$ corresponding to the $\ell$ rows of $\Psi^{(s)}$ to which the eavesdropper has gained access. Thus the eavesdropper has access to the $\ell d$ symbols in the $(\ell \times d)$ matrix $E^{(s)}$ defined as
\[ E^{(s)} = \Psi^{(s)}_{eve} M^{(s)}. \tag{10} \]
Following the approach described in Section II, we first show that given the message symbols as side information, an eavesdropper can decode all the random symbols. To this end, define $\tilde{M}^{(s)}$ as a $(d \times d)$ matrix obtained by setting all message symbols in $M^{(s)}$ to zero. Thus $\tilde{M}^{(s)}$ has its first $\ell$ rows and first $\ell$ columns identical to those of $M^{(s)}$, and zeros elsewhere. Let
\[ \tilde{E}^{(s)} = \Psi^{(s)}_{eve} \tilde{M}^{(s)}, \tag{11} \]
which are the $\ell d$ symbols that the eavesdropper has access to, given the message symbols as side information. Recall the property of $\Psi^{(s)}$ wherein any $\ell$ rows, when restricted to the first $\ell$ columns, are linearly independent. Thus, recovering the $R$ random symbols from $\tilde{E}^{(s)}$ is identical to data reconstruction in the original PM-MBR code $\hat{\mathcal{C}}$ designed for $[\hat{n} = n, \hat{k} = \ell, \hat{d} = d]$, $\hat{\ell} = 0$. Thus, given the message symbols, the eavesdropper can decode all the random symbols.

The next step is to show that $H(\mathcal{E}) \leq R$. From the value of $R$ in (8), it suffices to show that out of the $\ell d$ symbols that the eavesdropper has access to, $\binom{\ell}{2}$ of them are functions (linear combinations) of the rest. Consider the $(\ell \times \ell)$ matrix
\[ E^{(s)} \left( \Psi^{(s)}_{eve} \right)^t = \Psi^{(s)}_{eve} M^{(s)} \left( \Psi^{(s)}_{eve} \right)^t. \tag{12} \]
Since $M^{(s)}$ is symmetric, the $(\ell \times \ell)$ matrix in (12) is also symmetric. Thus $\binom{\ell}{2}$ dependencies among the elements of $E^{(s)}$ can be described by the $\binom{\ell}{2}$ upper-triangular elements of the expression
\[ E^{(s)} \left( \Psi^{(s)}_{eve} \right)^t - \Psi^{(s)}_{eve} \left( E^{(s)} \right)^t = 0. \tag{13} \]
Using the linear-independence property of the rows of $\Psi^{(s)}$, it can be shown that these $\binom{\ell}{2}$ redundant equations are linearly independent. Thus the eavesdropper has access to at most $\ell d - \binom{\ell}{2}$ independent symbols, i.e., $H(\mathcal{E}) \leq R$.

We have shown that in the secure PM-MBR code, Steps 1 and 2 of the approach described in Section II hold true. The final part of the proof, Step 3, establishes that the eavesdropper obtains no information about the message:
\begin{align*}
I(\mathcal{U}; \mathcal{E}) &= H(\mathcal{E}) - H(\mathcal{E} \mid \mathcal{U}) \tag{14} \\
&\leq R - H(\mathcal{E} \mid \mathcal{U}) \tag{15} \\
&= R - H(\mathcal{E} \mid \mathcal{U}) + H(\mathcal{E} \mid \mathcal{U}, \mathcal{R}) \tag{16} \\
&= R - I(\mathcal{E}; \mathcal{R} \mid \mathcal{U}) \\
&= R - \left( H(\mathcal{R} \mid \mathcal{U}) - H(\mathcal{R} \mid \mathcal{E}, \mathcal{U}) \right) \tag{17} \\
&= R - H(\mathcal{R} \mid \mathcal{U}) \tag{18} \\
&= R - R \tag{19} \\
&= 0, \tag{20}
\end{align*}
where (15) follows from the result of Step 2; (16) follows since every symbol in the system is a function of $\mathcal{U}$ and $\mathcal{R}$, giving $H(\mathcal{E} \mid \mathcal{U}, \mathcal{R}) = 0$; (18) follows from the result of Step 1; and (19) follows since the random symbols are uniformly distributed and independent of the message symbols, so that $H(\mathcal{R} \mid \mathcal{U}) = H(\mathcal{R}) = R$. Since mutual information is non-negative, $I(\mathcal{U}; \mathcal{E}) = 0$.

IV. SECURE MSR CODES FOR ALL $[n, k, d \geq 2k-2]$, $\{\ell, \ell'\}$

MSR codes achieve the minimum possible storage at each node. Since a data-collector connecting to any $k$ nodes should be able to recover all the $B$ message symbols, each node must necessarily store at least a fraction $1/k$ of the entire data. Hence for an MSR code we have $\alpha = B/k$. It follows from (1) (replacing the inequality with equality) that in the absence of secrecy requirements an MSR code must satisfy
\[ B = k\alpha, \qquad d\beta = \alpha + (k-1)\beta. \tag{21} \]
From (21) we see that, in general, for an MSR code $d\beta > \alpha$. Thus the amount of data downloaded during repair is greater than what is eventually stored.
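As a quick numerical illustration of (21), the sketch below solves (21) for $\alpha$ and $B$ and reports the repair-bandwidth $d\beta$ alongside $\alpha$. The function name `msr_parameters` and the sample values are ours, chosen only for illustration.

```python
def msr_parameters(k, d, beta=1):
    """Solve (21) for alpha and B: alpha = (d - k + 1) * beta and B = k * alpha."""
    alpha = (d - k + 1) * beta
    return {"alpha": alpha, "B": k * alpha, "repair_bandwidth": d * beta}


# Illustrative values (an assumption): k = 3 and d = 2k - 2 = 4 with beta = 1.
params = msr_parameters(k=3, d=4)
print(params)  # prints: {'alpha': 2, 'B': 6, 'repair_bandwidth': 4}
```

Here the replacement node downloads 4 symbols but stores only 2; the excess $d\beta - \alpha = (k-1)\beta$ is exactly what a repair-observing eavesdropper sees beyond the stored data.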
This requires us to distinguish between the situations when the eavesdropper has access to only the data stored in a node, and when it has access to the data downloaded during repair. Note that the data downloaded by a replacement node during repair also contains the data that is eventually stored in it.

In this section we present explicit constructions of information-theoretically secure MSR codes for all parameter values $[n, k, d \geq 2k-2]$ and all $\{\ell, \ell'\}$. The secure MSR codes are based on the PM-MSR codes presented in [2].

A. Recap of the Product-Matrix MSR Codes

We first provide a brief description of the PM-MSR code [2]. The code is designed for the case $d = 2k-2$, and can be extended to $d > 2k-2$ via shortening (see [2], [5] for a detailed description of shortening in MSR codes). As in the MBR case, we construct codes for the case when $\beta = 1$. Setting $d = 2k-2$ and $\beta = 1$ in (21) gives
\[ B = \alpha(\alpha+1), \qquad \alpha = k-1, \qquad d = 2\alpha. \tag{22} \]
The PM-MSR code $\mathcal{C}$ in [2] can be described in terms of an $(n \times \alpha)$ code matrix $C = \Psi M$, with the $i$th row of $C$ containing the $\alpha$ symbols stored in node $i$. The $(n \times d)$ encoding matrix $\Psi$ is of the form $\Psi = [\Phi \;\; \Lambda\Phi]$, where $\Phi$ is an $(n \times \alpha)$ matrix and $\Lambda$ is an $(n \times n)$ diagonal matrix satisfying: (a) any $\alpha$ rows of $\Phi$ are linearly independent, (b) any $d$ rows of $\Psi$ are linearly independent, and (c) the diagonal elements of $\Lambda$ are all distinct. The $((d = 2\alpha) \times \alpha)$ message matrix $M$ is of the form $M = [S_1 \;\; S_2]^t$, where $S_1$ and $S_2$ are $(\alpha \times \alpha)$ symmetric matrices. The two matrices $S_1$ and $S_2$ together contain $\alpha(\alpha+1)$ distinct symbols, and these positions are populated by the $B = \alpha(\alpha+1)$ message symbols. This completes the description of the code construction. A description of the reconstruction and repair operations under this code can be found in [2].
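To make the construction concrete, the sketch below builds a small PM-MSR code and repairs one node. Several choices are our own assumptions rather than details given above: the field $\mathbb{F}_{13}$, a Vandermonde $\Psi$ with $\Phi$ as its first $\alpha$ columns and $\lambda_i = x_i^{\alpha}$, the sample symmetric matrices, and all function names. The repair steps follow our reading of [2]: each helper $h$ sends $\psi_h^t M \phi_f$, the replacement node solves for $M \phi_f$, and the symmetry of $S_1$ and $S_2$ then gives $\psi_f^t M = \phi_f^t S_1 + \lambda_f \phi_f^t S_2$.

```python
Q = 13                           # prime field size (assumed; makes the lambdas distinct)
n, k = 6, 3
alpha, d = k - 1, 2 * (k - 1)    # from (22) with beta = 1


def solve(A, y):
    """Solve A x = y over F_Q for a square invertible A by Gauss-Jordan elimination."""
    m = len(A)
    aug = [[v % Q for v in row] + [y[i] % Q] for i, row in enumerate(A)]
    for c in range(m):
        p = next(r for r in range(c, m) if aug[r][c])
        aug[c], aug[p] = aug[p], aug[c]
        inv = pow(aug[c][c], Q - 2, Q)          # inverse via Fermat's little theorem
        aug[c] = [v * inv % Q for v in aug[c]]
        for r in range(m):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [(v - f * w) % Q for v, w in zip(aug[r], aug[c])]
    return [row[m] for row in aug]


def dot(a, b):
    return sum(s * t for s, t in zip(a, b)) % Q


x = list(range(1, n + 1))                                   # one field element per node
Phi = [[pow(xi, j, Q) for j in range(alpha)] for xi in x]   # n x alpha
lam = [pow(xi, alpha, Q) for xi in x]                       # diagonal of Lambda
Psi = [phi + [lm * v % Q for v in phi] for phi, lm in zip(Phi, lam)]  # [Phi  Lambda*Phi]
assert len(set(lam)) == n                                   # condition (c)

S1 = [[1, 2], [2, 3]]        # sample symmetric matrices holding the
S2 = [[4, 5], [5, 6]]        # B = alpha * (alpha + 1) = 6 message symbols
M = S1 + S2                  # (d x alpha): rows of S1 stacked over rows of S2


def node_content(i):
    """The alpha symbols psi_i^t M stored in node i (0-based index)."""
    return [dot(Psi[i], col) for col in zip(*M)]


def repair(f, helpers):
    """Rebuild node f from one symbol sent by each of d helper nodes."""
    sent = [dot(node_content(h), Phi[f]) for h in helpers]   # psi_h^t M phi_f
    m_phi = solve([Psi[h] for h in helpers], sent)           # M phi_f (length d)
    s1_phi, s2_phi = m_phi[:alpha], m_phi[alpha:]            # S1 phi_f and S2 phi_f
    return [(a + lam[f] * b) % Q for a, b in zip(s1_phi, s2_phi)]


assert repair(0, [1, 2, 3, 4]) == node_content(0)
assert repair(0, [2, 3, 4, 5]) == node_content(0)   # a different helper set also works
```

In this sketch the intermediate vector `m_phi` equals $M \phi_f$ whichever helpers are used, so the information obtained during repair does not depend on the helper set.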
The repair algorithm in [2] is such that the data downloaded by any node for repair is independent of the set of $d$ nodes to which it connects. This property is highly advantageous while constructing secure codes, as discussed in Section V.

B. Information-theoretic Secrecy in the PM-MSR Code

For the MSR case, from (2) we get
\[ B^{(s)} \leq (k - \ell)\alpha. \tag{23} \]
On the other hand, the $\{\ell, \ell'\}$ secure MSR codes constructed in the present paper (for $d \geq 2k-2$) achieve
\[ B^{(s)} = (k - \ell)(\alpha - \ell'\beta). \tag{24} \]
Thus our codes are optimal for $\ell' = 0$. As mentioned previously, it is unknown at present whether or not our codes are optimal when $\ell' \geq 1$.

The expression for $B^{(s)}$ in (24) can be interpreted as follows. Consider a data-collector attempting to reconstruct the message from the data stored in some $k$ nodes, and an eavesdropper having access to some $\ell$ of these $k$ nodes. These $\ell$ nodes will not provide any useful information, thus resulting in the first term $(k - \ell)$ in the product. Furthermore, the eavesdropper may have access to the data passed for repair of some $\ell'$ of the $\ell$ nodes, and hence to the $\ell'\beta$ (potentially distinct) symbols passed by each of the remaining $(k - \ell)$ nodes during repair. These symbols should not reveal any information, and hence the second term $(\alpha - \ell'\beta)$.

We now describe the construction of the secure PM-MSR code (for $\beta = 1$). We retain the notation used in Section III-B. Choose $\Psi^{(s)}$ such that it satisfies the following property in addition to those required for $\Psi$: when restricted to the first $\ell$ columns, any $\ell$ rows of $\Psi^{(s)}$ are linearly independent. Next, define a collection $\mathcal{R}$ of
\[ R = B - B^{(s)} = \ell\alpha + (k - \ell)\ell' \tag{25} \]
random symbols picked independently with a uniform distribution over the elements of $\mathbb{F}_q$, where (25) follows from (21) and (24).
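The count in (25) can be verified by a short expansion. The derivation below uses only (21) and (24) with $\beta = 1$, so that $B = k\alpha$:

```latex
\begin{align*}
R &= B - B^{(s)} \\
  &= k\alpha - (k-\ell)(\alpha - \ell') \\
  &= k\alpha - (k-\ell)\alpha + (k-\ell)\ell' \\
  &= \ell\alpha + (k-\ell)\ell' .
\end{align*}
```

For $\ell' = 0$ this reduces to $R = \ell\alpha$, the number of symbols stored in $\ell$ nodes.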
Use these $R$ random symbols to replace the following $R$ symbols in the message matrix $M$ of code $\mathcal{C}$, to obtain the matrix $M^{(s)}$: (i) the $\ell\alpha - \binom{\ell}{2}$ symbols in the first $\ell$ rows (and hence the first $\ell$ columns) of the symmetric matrix $S_1$, (ii) the $\binom{\ell}{2}$ symbols in the intersection of the first $(\ell - 1)$ rows and first $(\ell - 1)$ columns of the symmetric matrix $S_2$, and (iii) the $(k - \ell)\ell'$ remaining symbols in the first $\ell'$ rows (and hence the first $\ell'$ columns) of $S_2$. The secure PM-MSR code is given by $C^{(s)} = \Psi^{(s)} M^{(s)}$.

The following theorems establish the properties of reconstruction, repair and secrecy in the secure PM-MSR code.

Theorem 3 (Reconstruction and Repair): In code $\mathcal{C}^{(s)}$ presented above, a data-collector can recover all the $B^{(s)}$ message symbols by downloading data stored in any $k$ nodes, and a failed node can be repaired by downloading one symbol each from any $d$ remaining nodes.

Proof: As in the proof of Theorem 1, treating the random symbols also as message symbols, the secure PM-MSR code $\mathcal{C}^{(s)}$ becomes identical to the PM-MSR code $\mathcal{C}$. Thus reconstruction and repair in $\mathcal{C}^{(s)}$ are identical to those in $\mathcal{C}$.

Theorem 4 (Information-theoretic Secrecy): In code $\mathcal{C}^{(s)}$ designed for a given value of $\ell$, an eavesdropper having access to at most $\ell$ nodes gets no information pertaining to the message.

Proof (Sketch): Let $\Psi^{(s)}_{eve}$ be the $(\ell \times d)$ submatrix of $\Psi^{(s)}$ corresponding to the $\ell$ rows of $\Psi^{(s)}$ to which the eavesdropper has gained access. Further, let $\Phi^{(s)}_{eve1}$ be the $(\ell' \times \alpha)$ submatrix of $\Phi^{(s)}$ corresponding to the $\ell'$ nodes in which the eavesdropper has access to the repair downloads as well. Note that by definition of an $\{\ell, \ell'\}$ secure system, these $\ell'$ nodes are a subset of the set of $\ell$ nodes that constitute the matrix $\Psi^{(s)}_{eve}$.
From the repair algorithm of the PM-MSR code of [2], it turns out that the symbols $\mathcal{E}$ that the eavesdropper gains access to comprise the elements of the $(\ell \times \alpha)$ matrix $\Psi^{(s)}_{eve} M^{(s)}$ and the elements of the $(d \times \ell')$ matrix $M^{(s)} (\Phi^{(s)}_{eve1})^t$. Following the approach described in Section II, and in a manner analogous to the proof of Theorem 2, it can first be shown that given the message symbols as side information, an eavesdropper can decode all the random symbols. Next, using the properties of the matrix $\Psi^{(s)}$ and the specific structure of the message matrix $M^{(s)}$, it can also be shown that $H(\mathcal{E}) \leq R$. Finally, the arguments in (14) to (20) establish that the eavesdropper obtains no information about the message.

The extension to the case $d > 2k-2$ can be achieved via shortening ([2], [5]), using which one can use any linear secure MSR code with parameters $[n+1, k+1, d+1, \ell+1, \ell']$ to construct a linear secure MSR code for parameters $[n, k, d, \ell, \ell']$.

V. DISCUSSION

The Product-Matrix framework [2] possesses two particular attributes that make the codes built in this framework attractive from the security perspective. First, many codes in the literature, including those in [1], consider functional repair, wherein the data stored in the replacement node is permitted to be different from that of the failed node as long as it satisfies the reconstruction and functional-repair properties of the system. This allows an eavesdropper to gain a greater amount of information by reading the data stored in a node across multiple instances of repair. On the other hand, PM codes offer exact-repair, wherein the data stored in the replacement node is identical to that in the failed node.
Second, even if repair is exact, the data downloaded during repair of a particular node may depend on the set of $d$ nodes helping in the repair process, and hence may be different during different instances of repair of that node. The PM framework, by design, ensures that the information contained in the symbols downloaded by the replacement node is independent of the identities of the helper nodes. This restricts the information exposed to an eavesdropper that has access to the data downloaded during repair.

REFERENCES

[1] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. Wainwright, and K. Ramchandran, "Network Coding for Distributed Storage Systems," IEEE Trans. on Inf. Theory, vol. 56, no. 9, pp. 4539–4551, 2010.
[2] K. V. Rashmi, N. B. Shah, and P. V. Kumar, "Optimal Exact-Regenerating Codes for the MSR and MBR Points via a Product-Matrix Construction," IEEE Trans. on Inf. Theory, vol. 57, no. 8, pp. 5227–5239, Aug. 2011.
[3] S. Pawar, S. El Rouayheb, and K. Ramchandran, "On secure distributed data storage under repair dynamics," in Proc. ISIT, Austin, Jun. 2010.
[4] K. V. Rashmi, N. B. Shah, P. V. Kumar, and K. Ramchandran, "Explicit Construction of Optimal Exact Regenerating Codes for Distributed Storage," in Proc. Allerton Conf., Urbana-Champaign, Sep. 2009.
[5] N. B. Shah, K. V. Rashmi, P. V. Kumar, and K. Ramchandran, "Explicit Codes Minimizing Repair Bandwidth for Distributed Storage," in Proc. IEEE ITW, Cairo, Jan. 2010.
[6] C. Suh and K. Ramchandran, "Exact Regeneration Codes for Distributed Storage Repair Using Interference Alignment," in Proc. ISIT, Austin, Jun. 2010.
[7] L. Ozarow and A. Wyner, "Wire-tap channel II," in Advances in Cryptology, 1985, pp. 33–50.
[8] S. El Rouayheb, V. Prabhakaran, and K. Ramchandran, "Secure distributive storage of decentralized source data: Can interaction help?" in Proc. IEEE ISIT, Austin, Jun. 2010.
[9] P. Oliveira, L. Lima, T. Vinhoza, J. Barros, and M. Médard, "Trusted Storage over Untrusted Networks," in Proc. IEEE GLOBECOM, Miami, Dec. 2010.
[10] D. Papailiopoulos and A. Dimakis, "Distributed Storage Codes Meet Multiple-Access Wiretap Channels," in Proc. Allerton Conf., Urbana-Champaign, Sep. 2010.
