Minimization of Storage Cost in Distributed Storage Systems with Repair Consideration

In a distributed storage system, the storage costs of different storage nodes, in general, can be different. How to store a file in a given set of storage nodes so as to minimize the total storage cost is investigated. By analyzing the min-cut constr…

Authors: Quan Yu, Kenneth W. Shum, Chi Wan Sung

Quan Yu, Department of Electronic Engineering, City University of Hong Kong. Email: quanyu2@student.cityu.edu.hk
Kenneth W. Shum, Institute of Network Coding, The Chinese University of Hong Kong. Email: wkshum@inc.cuhk.edu.hk
Chi Wan Sung, Department of Electronic Engineering, City University of Hong Kong. Email: albert.sung@cityu.edu.hk

Abstract: In a distributed storage system, the storage costs of different storage nodes, in general, can be different. How to store a file in a given set of storage nodes so as to minimize the total storage cost is investigated. By analyzing the min-cut constraints of the information flow graph, the feasible region of the storage capacities of the nodes can be determined. The storage cost minimization can then be reduced to a linear programming problem, which can be readily solved. Moreover, the tradeoff between storage cost and repair-bandwidth is established.

I. INTRODUCTION

Distributed storage systems provide an elegant way for reliable data storage. The storage nodes are distributed across a wide geographical area. When a small subset of storage nodes encounters a disaster, the source data object can still be reconstructed from the surviving nodes. To keep the reliability of the distributed storage system above a certain level, redundancy is essential.

Two strategies are widely employed to introduce redundancy. The most straightforward strategy is replication, in which each storage node stores an entire copy of the source data object. This method, though simple, has low storage efficiency. The other strategy is erasure coding, adopted in the OceanStore [1] and Total Recall [2] systems. A source data object is divided into $k$ equal-size fragments, and these $k$ fragments are then encoded and distributed over $n$ storage nodes; each node stores one encoded fragment.
As a result, the source data object can be reconstructed from any $k$ available storage nodes. Compared with the replication strategy, erasure coding provides better storage efficiency. However, when a failed storage node has to be repaired, erasure coding wastes bandwidth. This is because a newcomer has to first reconstruct the entire source data object by downloading data from any $k$ surviving nodes, and then re-encode and store only a fraction of the downloaded data.

In order to minimize the repair-bandwidth, Dimakis et al. in [3], [4] propose the concept of regenerating codes. In their formulation, the data allocated to each storage node is equal to $\alpha$ units. When a node failure occurs, a newcomer arbitrarily chooses $d$ ($d \ge k$) available nodes to connect to and downloads $\beta$ units of data from each of these $d$ nodes. By introducing the information flow graph, they translate the repair problem into a single-source multicast problem in network coding theory. A tradeoff between the storage capacity per node and repair-bandwidth is also established.

(Footnote: This work was partially supported by a grant from the University Grants Committee of the Hong Kong Special Administrative Region, China (Project No. AoE/E-02/08).)

In [5], a distributed storage system in which different download costs are associated with storage nodes is introduced. Specifically, the authors focus on the scenario in which there are two sets of storage nodes, distinguished by their download costs. A tradeoff between download cost and repair-bandwidth is identified.

In most current studies of distributed storage systems, the amount of data stored on each node is simply assumed to be identical. How to distribute the data across a collection of storage nodes is not an easy problem. Given the total storage budget, for different access models, Leong et al. in [6] try to find the corresponding optimal storage allocation, in the sense of maximizing the probability of successful data recovery. It is shown that symmetric allocation is not always an optimal solution. However, their model deals only with the recovery problem of the source data object; the repair problem of failed nodes is not considered. In a realistic scenario, the storage nodes should be allowed to store different amounts of data according to the conditions of the transmission links between the source node and the storage nodes, as well as the storage cost associated with each storage node.

It is natural that different storage nodes may have different storage costs in a real distributed storage system. Since the storage nodes are distributed across a wide geographical area, the storage costs are affected by many factors, such as rents of the data storage centers, storage hardware costs and labor costs for maintenance.

In this paper, we combine the storage allocation problem with the repair problem, and take different storage costs into consideration. Our objective is to seek an optimal storage allocation, which minimizes the total storage cost, subject to the constraints obtained by analyzing the corresponding information flow graphs. More specifically, we focus on the case in which there are two types of storage nodes, each having a different storage cost. We will show that our storage cost minimization problem can be solved as a Linear Programming (LP) problem. By identifying the feasible region of this LP problem, the minimum storage cost is obtained at one of the corner points. Moreover, the tradeoff between the storage cost and repair-bandwidth can also be established.

This paper is organized as follows. The problem of storage cost minimization is formulated in Section II. In Section III, we draw the information flow graph, and identify the min-cut constraints.
In Section IV, we characterize the minimum storage cost by a linear programming problem. In Section V, we illustrate the tradeoff between storage cost and repair-bandwidth. We conclude in Section VI.

II. PROBLEM FORMULATION

Consider a distributed storage system consisting of two types of storage nodes, each having a different storage cost per unit data. Let the storage cost for the first type of nodes be $C_1$, and the storage cost for the second type be $C_2$. We assume that there are $n$ storage nodes in total, among which $n_1$ nodes belong to type 1 and $n_2$ nodes belong to type 2. A data object of size $M$ units is encoded and distributed among the $n$ storage nodes. For simplicity of presentation, we assume that the storage capacities of the nodes of type 1 are identical and equal to $\alpha_1$, while the storage capacities of type 2 nodes are identical and equal to $\alpha_2$. The total storage cost for storing the original data object can be calculated as $C_1 n_1 \alpha_1 + C_2 n_2 \alpha_2$.

There are two components in the design of distributed storage systems:
(i) A data collector (DC) connecting to any $k$ available storage nodes should be able to reconstruct the original data object by downloading a number of packets from these $k$ storage nodes.
(ii) Once a storage node fails, a newcomer initializes a repair process and regenerates the failed node so that any DC, connecting to this newcomer and $k-1$ other existing nodes, is able to rebuild the original data object.

During the repair process, the newcomer chooses $d$ ($d \ge k$) surviving storage nodes to connect to, each belonging either to type 1 or type 2, and then downloads $\beta$ units of data from each of these $d$ nodes. The traffic $d\beta$ incurred by the repair operation is defined as the repair-bandwidth.

There are two modes for storage-node repair. The first one is called functional repair and the second one is exact repair.
In functional repair, the content of the newcomer is not necessarily the same as the content in the failed node to be replaced. We only need to ensure that any DC connecting to any $k$ storage nodes is able to rebuild the original data file. In exact repair, the content of the newcomer is required to be exactly the same as the content in the failed node. We refer the readers to [7], [8] for code construction for exact repair. In this paper, we focus on functional repair.

We model the distributed storage system as an information flow graph introduced in [3], [4]. For any information flow graph, to be detailed in the next section, if the minimum of the cut capacities between the source and each data collector is not less than the object data size $M$, then there always exists a linear network code such that all data collectors can reconstruct the data object [9]. Our objective in this work is to seek an optimal storage allocation across the $n$ storage nodes that minimizes the total storage cost $C_S$ under the constraints described above.

III. MIN-CUT CONSTRAINTS

The distributed storage network with storage cost is abstracted and modeled by an information flow graph $G = (V, E)$. We label the storage nodes from 1 to $n$, so that the storage nodes 1 to $n_1$ are of type 1, while the storage nodes $n_1 + 1$ to $n$ are of type 2. The vertices are divided into stages, starting from stage $-1$. In the $i$-th stage, we have one newcomer which replaces a failed node. The edges are directed, and labeled by the corresponding capacities. We define the information flow graph more formally as follows.

Fig. 1. Information flow graph ($n_1 = n_2 = 2$, $d = 3$, $k = 2$).

1) There is a single source vertex, $S$, in stage $-1$. It represents the data object to be distributed among the storage nodes.
2) We put $2n$ vertices in stage 0. These vertices are called $\mathrm{In}_i$ and $\mathrm{Out}_i$, for $i = 1, 2, \ldots, n$. For each $i$, we draw a directed edge from the source vertex to $\mathrm{In}_i$ with infinite capacity. For $i = 1, 2, \ldots, n_1$, we draw a directed edge from $\mathrm{In}_i$ to $\mathrm{Out}_i$ with capacity $\alpha_1$. This signifies that the storage capacities of the storage nodes of type 1 are limited to $\alpha_1$ units. For $i = n_1 + 1, n_1 + 2, \ldots, n$, we draw a directed edge from $\mathrm{In}_i$ to $\mathrm{Out}_i$ with capacity $\alpha_2$. This indicates that each node of type 2 can store no more than $\alpha_2$ units of data.
3) For $s = 1, 2, \ldots$, we put two vertices in stage $s$. If storage node $i$ fails in the $s$-th stage, we construct two vertices, $\mathrm{In}_i$ and $\mathrm{Out}_i$, in stage $s$. The vertex $\mathrm{In}_i$ is connected to $d$ "Out" nodes in earlier stages. The capacities of these $d$ edges are all equal to $\beta$. If node $i$ is of type $j$ ($j$ is either 1 or 2), we draw an edge from $\mathrm{In}_i$ to $\mathrm{Out}_i$ with capacity $\alpha_j$.
4) A data collector is represented by a vertex, called $\mathrm{DC}$, which is connected to $k$ "Out" nodes with distinct subscripts. All these $k$ edges have infinite capacity.

An example of the information flow graph is shown in Fig. 1.

A flow on the information flow graph $G$ is an assignment of non-negative real numbers to the edges, satisfying the flow conservation constraints and the capacity constraints. A flow $F$ can be regarded as a function from the edge set $E$ to the set of non-negative real numbers, $F : E \to \mathbb{R}_+$, such that (i) for each edge $e \in E$, $F(e)$ is less than or equal to the capacity of $e$, and (ii) for each vertex other than the source vertex and the data collectors, the sum of incoming flows is equal to the sum of outgoing flows, i.e., if $v \in V$ is either an "in" or "out" vertex, then

$$\sum_{e:\,\mathrm{Head}(e) = v} F(e) = \sum_{e:\,\mathrm{Tail}(e) = v} F(e),$$

where $\mathrm{Head}(e)$ and $\mathrm{Tail}(e)$ stand for the head and tail of edge $e$ respectively.
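To make the definition of a flow concrete, the following Python sketch stores the edge capacities of a small information flow graph and checks conditions (i) and (ii) for a candidate flow. The graph uses the parameters of Fig. 1 ($n_1 = n_2 = 2$, $d = 3$, $k = 2$), but the failure pattern (node 4 fails and the newcomer, written `4'`, contacts nodes 1, 2 and 3), the choice of data collector, and the numerical values of $\alpha_1$, $\alpha_2$ and $\beta$ are assumptions made only for illustration. The helper names `build_graph` and `is_feasible_flow` are hypothetical.

```python
import math


def build_graph(alpha1, alpha2, beta):
    """Edge capacities of a small information flow graph (n1 = n2 = 2, d = 3, k = 2)."""
    cap = {}
    # Stage 0: source -> In_i with infinite capacity, In_i -> Out_i with the storage capacity.
    for i in (1, 2, 3, 4):
        cap[("S", f"In{i}")] = math.inf
        cap[(f"In{i}", f"Out{i}")] = alpha1 if i <= 2 else alpha2
    # Stage 1 (assumed failure pattern): node 4 fails and the newcomer 4'
    # downloads beta units from each of the d = 3 surviving nodes.
    for i in (1, 2, 3):
        cap[(f"Out{i}", "In4'")] = beta
    cap[("In4'", "Out4'")] = alpha2  # node 4 is of type 2
    # Data collector connected to k = 2 "Out" vertices with distinct subscripts.
    cap[("Out1", "DC")] = math.inf
    cap[("Out4'", "DC")] = math.inf
    return cap


def is_feasible_flow(cap, flow, source="S", sinks=("DC",)):
    """Check (i) the capacity constraints and (ii) flow conservation.

    Edges that do not appear in `flow` are taken to carry zero flow.
    """
    for edge, value in flow.items():
        if edge not in cap or value < 0 or value > cap[edge]:
            return False
    balance = {}
    for (tail, head), value in flow.items():
        balance[head] = balance.get(head, 0) + value  # incoming flow
        balance[tail] = balance.get(tail, 0) - value  # outgoing flow
    return all(b == 0 for v, b in balance.items()
               if v != source and v not in sinks)


if __name__ == "__main__":
    cap = build_graph(alpha1=2, alpha2=1, beta=1)
    flow = {
        ("S", "In1"): 2, ("In1", "Out1"): 2, ("Out1", "DC"): 2,
        ("S", "In2"): 1, ("In2", "Out2"): 1, ("Out2", "In4'"): 1,
        ("In4'", "Out4'"): 1, ("Out4'", "DC"): 1,
    }
    print(is_feasible_flow(cap, flow))
```

Integer capacities are used here so that the conservation check can compare with zero exactly; with floating-point values a small tolerance would be needed.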
The value of a flow $F$ with respect to a data collector $\mathrm{DC}$ is defined as the sum of incoming flows to this data collector,

$$\sum_{e:\,\mathrm{Head}(e) = \mathrm{DC}} F(e).$$

The maximal flow value with respect to a specific data collector $\mathrm{DC}$, denoted by $\text{max-flow}(\mathrm{DC})$, is the maximal value of flow to this data collector $\mathrm{DC}$, over all legitimate flows. The max-flow theorem in network coding [9], [10] says that if $\text{max-flow}(\mathrm{DC}) \ge M$ for every data collector $\mathrm{DC}$, then there exists a linear network code which sends $M$ units of data to every data collector.

Given a particular data collector $\mathrm{DC}$, an $(S, \mathrm{DC})$-cut is a partition of the vertices $(\mathcal{W}, \bar{\mathcal{W}})$ such that $S \in \mathcal{W}$ and $\mathrm{DC} \in \bar{\mathcal{W}}$. (Here $\bar{\mathcal{W}}$ stands for the set complement of $\mathcal{W}$ in $V$.) The capacity of an $(S, \mathrm{DC})$-cut is defined as the sum of capacities of the edges from $\mathcal{W}$ to $\bar{\mathcal{W}}$. It is well known that the max-flow with respect to a data collector $\mathrm{DC}$ is equal to the minimum cut capacity. Let the capacity of an edge $e$ be denoted by $c(e)$. For each $(S, \mathrm{DC})$-cut, we have the following constraint

$$\sum_{\substack{e:\ \mathrm{Tail}(e) \in \mathcal{W} \\ \mathrm{Head}(e) \in \bar{\mathcal{W}}}} c(e) \ge M. \qquad (1)$$

The summation in (1) is over all edges with tails in $\mathcal{W}$ and heads in $\bar{\mathcal{W}}$.

The storage cost minimization problem can be expressed as follows:

$$\min\ C_S \triangleq C_1 n_1 \alpha_1 + C_2 n_2 \alpha_2, \qquad (2)$$

subject to the constraints (1) for all $(S, \mathrm{DC})$-cuts $(\mathcal{W}, \bar{\mathcal{W}})$. The optimization is a linear programming problem with two variables $\alpha_1$ and $\alpha_2$. Given parameters $n_1$, $n_2$, $k$, $d$, $M$, $\beta$, $C_1$ and $C_2$, we let the minimum storage cost in the above linear program be $C_S^*$. The values of $\alpha_1$ and $\alpha_2$ which achieve $C_S^*$ are denoted by $\alpha_1^*$ and $\alpha_2^*$. We will also investigate the tradeoff between the storage cost and the repair-bandwidth. In this context, we will write $C_S^*(\beta)$, $\alpha_1^*(\beta)$ and $\alpha_2^*(\beta)$ as functions of $\beta$.

Theorem 1: Let $\mathcal{A}$ be the set of $k$-vectors $\boldsymbol{\alpha} = (\alpha(1), \alpha(2), \ldots, \alpha(k))$ whose components are either $\alpha_1$ or $\alpha_2$, and in which the number of components equal to $\alpha_i$ is at most $n_i$, for $i = 1, 2$. Given $n_1$, $n_2$, $k$, $d$ and $\beta$, the file size $M$ is upper bounded by

$$M \le \sum_{i=1}^{k} \min\{\alpha(i), (d - i + 1)\beta\}, \qquad (3)$$

for any $\boldsymbol{\alpha} \in \mathcal{A}$. Furthermore, we can construct an information flow graph such that equality in (3) holds for some $\boldsymbol{\alpha} \in \mathcal{A}$.

Proof: (sketch) The proof is based on the analysis of min-cut in the information flow graph, and is similar to the proof of [4, Lemma 2]. The main difference is that in this paper, the capacity of an edge between an "in" node and an "out" node may be either $\alpha_1$ or $\alpha_2$, whereas in [4], all $\alpha$'s are identical. Because the number of storage nodes of type $i$ is equal to $n_i$ ($i = 1, 2$), there are at most $n_i$ edges with capacity $\alpha_i$ in a min-cut. Therefore we take the minimum only over the set $\mathcal{A}$. As the proof of (3) is basically the same as that of Lemma 2 in [4], the details are omitted.

Fig. 2. An example of cut ($d = 3$, $k = 2$).

We illustrate Theorem 1 by the example in Fig. 1. A sample cut $(\mathcal{W}, \bar{\mathcal{W}})$ is shown in Fig. 2. The vertices in $\bar{\mathcal{W}}$ are drawn in shaded color. The values of $\alpha(1)$ and $\alpha(2)$ are either $\alpha_1$ or $\alpha_2$. The set $\mathcal{A}$ consists of four pairs $(\alpha_1, \alpha_1)$, $(\alpha_1, \alpha_2)$, $(\alpha_2, \alpha_1)$, and $(\alpha_2, \alpha_2)$. The file size $M$ is upper bounded by

$$\begin{aligned}
M &\le \min\{\alpha_1, 3\beta\} + \min\{\alpha_1, 2\beta\}, \\
M &\le \min\{\alpha_2, 3\beta\} + \min\{\alpha_1, 2\beta\}, \\
M &\le \min\{\alpha_1, 3\beta\} + \min\{\alpha_2, 2\beta\}, \\
M &\le \min\{\alpha_2, 3\beta\} + \min\{\alpha_2, 2\beta\}.
\end{aligned}$$

The cost minimization problem is to minimize $C_S$ in (2), subject to the constraints in (3) over all $\boldsymbol{\alpha} \in \mathcal{A}$. This optimization can be reduced to a linear programming problem, as shown in the next theorem.

Theorem 2: Let $\theta_m \triangleq (k - m)(2d - k - m + 1)\beta/2$.
The cost minimization problem is equivalent to minimizing $C_S$ as defined in (2) subject to the following $2(k + 1)$ linear constraints,

$$M \le \min\{m, n_1\}\,\alpha_1 + (m - \min\{m, n_1\})\,\alpha_2 + \theta_m, \qquad (4)$$
$$M \le (m - \min\{m, n_2\})\,\alpha_1 + \min\{m, n_2\}\,\alpha_2 + \theta_m, \qquad (5)$$

for $m = 0, 1, \ldots, k$.

Proof: For each $\boldsymbol{\alpha} \in \mathcal{A}$, the inequality in (3) can be replaced by $2^k$ linear inequalities. We introduce a "switch" function

$$s_b(x, y) \triangleq \begin{cases} x & \text{if } b = 0, \\ y & \text{if } b = 1. \end{cases}$$

Let $\mathcal{B} = \{0, 1\}^k$ be the set of all binary vectors of length $k$. The inequality in (3) is equivalent to the following $2^k$ inequalities:

$$M \le \sum_{i=1}^{k} s_{b_i}(\alpha(i), (d - i + 1)\beta),$$

where $(b_1, b_2, \ldots, b_k) \in \mathcal{B}$. This yields $|\mathcal{A}|\,2^k$ linear inequalities.

We may group these $|\mathcal{A}|\,2^k$ linear inequalities by the number of zeros in $(b_1, b_2, \ldots, b_k)$. Among those linear inequalities with $m$ zeros in $(b_1, b_2, \ldots, b_k)$, where $m$ is an integer between 0 and $k$, the most stringent inequality is the one associated with

$$(b_1, b_2, \ldots, b_k) = (\underbrace{0, 0, \ldots, 0}_{m}, \underbrace{1, 1, \ldots, 1}_{k - m}),$$

which is

$$M \le \sum_{i=1}^{m} \alpha(i) + \sum_{i=m+1}^{k} (d - i + 1)\beta = \sum_{i=1}^{m} \alpha(i) + \theta_m.$$

If there are $p$ $\alpha_1$'s and $q$ $\alpha_2$'s among $\alpha(1), \ldots, \alpha(m)$, we have

$$M \le p\alpha_1 + q\alpha_2 + \theta_m.$$

Among the group of linear inequalities with $m$ zeros in $(b_1, b_2, \ldots, b_k)$, many inequalities are redundant, meaning that we can remove them without altering the feasible region. We only retain two inequalities, the one in which the coefficient of $\alpha_1$ is smallest, and the one in which the coefficient of $\alpha_2$ is smallest, namely the inequalities in (4) and (5). The other inequalities in the same group are convex combinations of these two inequalities, and hence can be ignored without changing the shape of the feasible region.
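The linear program of Theorem 2 is small enough to be solved by enumerating the vertices of the feasible region. The following Python sketch builds the $2(k+1)$ constraints (4) and (5), adds the non-negativity constraints $\alpha_1 \ge 0$ and $\alpha_2 \ge 0$, intersects every pair of boundary lines, and keeps the feasible intersection point with the smallest cost. It is only one possible way to solve the problem, not necessarily the method used by the authors; the function names are hypothetical, positive costs $C_1$, $C_2$ are assumed, and exact rational arithmetic (`fractions.Fraction`) is used so that the feasibility tests are not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations


def theta(m, k, d, beta):
    """theta_m = (k - m)(2d - k - m + 1) * beta / 2, as defined in Theorem 2."""
    return Fraction((k - m) * (2 * d - k - m + 1)) * beta / 2


def constraints(n1, n2, k, d, M, beta):
    """Rows (a, b, c) meaning a*alpha1 + b*alpha2 >= c."""
    rows = [(Fraction(1), Fraction(0), Fraction(0)),   # alpha1 >= 0
            (Fraction(0), Fraction(1), Fraction(0))]   # alpha2 >= 0
    for m in range(k + 1):
        rhs = Fraction(M) - theta(m, k, d, beta)
        p, q = min(m, n1), min(m, n2)
        rows.append((Fraction(p), Fraction(m - p), rhs))  # constraint (4)
        rows.append((Fraction(m - q), Fraction(q), rhs))  # constraint (5)
    return rows


def min_storage_cost(n1, n2, k, d, M, beta, C1, C2):
    """Return (cost, alpha1, alpha2) at an optimal vertex, or None if infeasible."""
    rows = constraints(n1, n2, k, d, M, beta)
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel (or degenerate) boundary lines
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y >= c for a, b, c in rows):
            cost = C1 * n1 * x + C2 * n2 * y
            if best is None or cost < best[0]:
                best = (cost, x, y)
    return best


if __name__ == "__main__":
    # Parameters used for Fig. 3: n1 = 8, n2 = 2, d = 8, k = 6, M = 66, beta = 3.3.
    print(min_storage_cost(8, 2, 6, 8, 66, Fraction(33, 10), C1=1, C2=1))
```

Because the feasible region lies in the first quadrant and the costs are positive, the minimum is attained at a vertex whenever the region is non-empty, which is why checking pairwise intersections suffices. For the parameters above with $C_1 = C_2 = 1$, the sketch returns $\alpha_1 = \alpha_2 = 561/50 = 11.22$ with cost $561/5 = 112.2$.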
If we put $m = 0$ in either (4) or (5), we see that there is no feasible solution to the linear programming problem if $\beta$ is strictly less than $\frac{2M}{k(2d - k + 1)}$. From now on, we will assume that $\beta$ is no less than $\frac{2M}{k(2d - k + 1)}$.

IV. STORAGE COST MINIMIZATION

We solve the linear programming problem in Theorem 2 by considering four different cases: (A) $n_1 \ge k$ and $n_2 \ge k$, (B) $n_1 \ge k$ and $n_2 < k$, (C) $n_1 < k$ and $n_2 \ge k$, and (D) $n_1 < k$ and $n_2 < k$.

A. Case A: $n_1 \ge k$ and $n_2 \ge k$

When both $n_1$ and $n_2$ are larger than or equal to $k$, the two inequalities in (4) and (5) can be written as

$$M \le m\alpha_1 + \theta_m, \quad \text{and} \qquad (6)$$
$$M \le m\alpha_2 + \theta_m. \qquad (7)$$

The region defined by these two inequalities is the intersection of two half-planes, which can be obtained by translating the first quadrant in the $\alpha_1$-$\alpha_2$ plane diagonally along the 45-degree line $\alpha_1 = \alpha_2$.

Theorem 3: For $\beta \ge \frac{2M}{k(2d - k + 1)}$, we have

$$\alpha_1^*(\beta) = \alpha_2^*(\beta) = \max_{1 \le m \le k} (M - \theta_m)/m.$$

Proof: Taking all constraints (6) and (7), for $m = 1, 2, \ldots, k$, into consideration, the feasible region is of the form

$$\{(\alpha_1, \alpha_2) : \alpha_1 \ge \mu \text{ and } \alpha_2 \ge \mu\},$$

where $\mu$ is the maximum value as defined in the theorem. No matter what the costs $C_1$ and $C_2$ are (provided that they are positive), the optimal solution to the linear program is at the corner point of the feasible region, namely $(\alpha_1^*, \alpha_2^*) = (\mu, \mu)$.

In the case where $n_1$ and $n_2$ are both larger than or equal to $k$, we see that the optimal storage allocation is to put the same amount of data in both type 1 and type 2 nodes. The storage costs of the two types of nodes do not matter.

B. Case B: $n_1 \ge k$ and $n_2 < k$

For $m = 1, 2, \ldots, k$, the two inequalities in (4) and (5) can be written as

$$m\alpha_1 \ge M - \theta_m,$$
$$(m - q_m)\alpha_1 + q_m\alpha_2 \ge M - \theta_m,$$

where $q_m \triangleq \min\{m, n_2\}$. These two inequalities define an infinite polyhedral region. For $m = 1, 2, \ldots, k$, let $\mathcal{R}_m$ be the region

$$\mathcal{R}_m \triangleq \{(\alpha_1, \alpha_2) \in \mathbb{R}_+^2 : m\alpha_1 \ge M - \theta_m,\ (m - q_m)\alpha_1 + q_m\alpha_2 \ge M - \theta_m\}.$$

The feasible region of the linear program is thus the intersection of $\mathcal{R}_1, \mathcal{R}_2, \ldots, \mathcal{R}_k$. The corner point of the region $\mathcal{R}_m$ can be obtained by solving the two equations obtained by setting the inequalities to equalities, and has coordinates

$$\alpha_1 = \alpha_2 = (M - \theta_m)/m.$$

In other words, for $m = 1, 2, \ldots, k$, the corner point of $\mathcal{R}_m$ lies on the line $\alpha_1 = \alpha_2$ in the $\alpha_1$-$\alpha_2$ plane.

An example of the feasible region is shown in Fig. 3. The horizontal and the vertical axes are $\alpha_1$ and $\alpha_2$ respectively. The parameters of the distributed storage system are $n_1 = 8$, $n_2 = 2$, $d = 8$, $k = 6$, $M = 66$ and $\beta = 3.3$. The region to the right of and above all lines is the feasible region. The dashed line indicates the 45-degree line $\alpha_1 = \alpha_2$. The optimal point is one of the vertices of the feasible region. The choice of the vertex which minimizes the storage cost depends on the ratio $C_1 n_1/(C_2 n_2)$, i.e., the slope of the objective function.

Fig. 3. An example of the feasible region in the linear program (horizontal axis: $\alpha_1$; vertical axis: $\alpha_2$).

We can observe from Fig. 3 that if the cost $C_1$ is much greater than $C_2$, then the optimal point always lies on the line $\alpha_1 = \alpha_2$, i.e., $\alpha_1^*(\beta) = \alpha_2^*(\beta)$ for all $\beta$.

Case C is similar to Case B. The feasible region of Case C can be regarded as the mirror image of the feasible region of Case B with respect to the line $\alpha_1 = \alpha_2$. We therefore skip the discussion of Case C.

C. Case D: $n_1 < k$ and $n_2 < k$

The feasible region of the linear program in Theorem 2 is bounded by

$$p_m\alpha_1 + (m - p_m)\alpha_2 \ge M - \theta_m,$$
$$(m - q_m)\alpha_1 + q_m\alpha_2 \ge M - \theta_m,$$

for $m = 1, 2, \ldots, k$, where $q_m$ is defined as in the previous section and $p_m \triangleq \min\{m, n_1\}$. The feasible region is the intersection of

$$\mathcal{R}_m \triangleq \{(\alpha_1, \alpha_2) \in \mathbb{R}_+^2 : p_m\alpha_1 + (m - p_m)\alpha_2 \ge M - \theta_m,\ (m - q_m)\alpha_1 + q_m\alpha_2 \ge M - \theta_m\}$$

for $m = 1, 2, \ldots, k$. As in Case B, we can show that for $m = 1, 2, \ldots, k$, the vertex of the polyhedral region $\mathcal{R}_m$ lies on the line $\alpha_1 = \alpha_2$ in the $\alpha_1$-$\alpha_2$ plane.

V. TRADEOFF BETWEEN STORAGE COST AND REPAIR-BANDWIDTH

Explicit formulae for $\alpha_1^*(\beta)$, $\alpha_2^*(\beta)$ and $C_S^*(\beta)$ can be found, but due to space limitations, we do not include the formulae in this paper.

To illustrate the tradeoff between storage cost and repair-bandwidth, we consider a distributed storage system with the parameters used in Fig. 3: $n_1 = 8$, $n_2 = 2$, $d = 8$, $k = 6$, $M = 66$. The minimum repair-bandwidth is $2Md/(k(2d - k + 1)) = 16$. We fix the cost $C_1$ for the storage nodes of type 1 to be 1, and increase $C_2$ from 0.2 to 1.8, with step size 0.4. For each value of $C_2$ we plot $C_S^*(\beta)$ for $d\beta$ from 16 to 32. The resulting curves are shown in Fig. 4. The curve in the middle corresponds to $C_1 = C_2 = 1$. This reduces to the case in [4] where the costs of both types of nodes are the same.

Fig. 4. Storage cost and repair-bandwidth tradeoff, $C_1 = 1$ (horizontal axis: repair bandwidth $d\beta$; vertical axis: storage cost; curves for $C_2 = 0.2, 0.6, 1, 1.4, 1.8$).

VI. CONCLUSION

In this paper, we seek an optimal storage allocation that minimizes the storage cost in distributed storage systems. Specifically, we focus on the network with two types of storage nodes, each having a different storage cost. We demonstrate that the minimization problem can be solved as a linear programming problem.
It is shown that the feasible region can be determined by analyzing the min-cut constraints of the corresponding information flow graph. The minimum storage cost can be achieved at the corner points. Moreover, the tradeoff between the storage cost and repair-bandwidth is established. Our method can be extended to more general cases, in which the storage costs of all storage nodes are not the same.

We can implement the coding scheme and repair protocol for a distributed storage system with storage cost by using random linear network coding over a finite field. The packets transmitted from a surviving storage node to the newcomer are linear combinations of the data in the memory of the surviving storage node. If we apply existing code construction methods from linear network coding to the distributed storage system, the required finite field size may be unbounded. This is because the finite field size requirement is a monotonically increasing function of the number of data collectors, which may be unbounded. To make sure that the regeneration process will be successful after arbitrarily many stages of repairs, it is important to show that the finite field size requirement is upper bounded by some constant. How to construct a linear network code for a distributed storage system with storage cost is an interesting direction for future studies.

REFERENCES

[1] J. Kubiatowicz et al., "OceanStore: an architecture for global-scale persistent storage," in Proc. 9th Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Cambridge, MA, Nov. 2000, pp. 190–201.
[2] R. Bhagwan, K. Tati, Y. Cheng, S. Savage, and G. Voelker, "Total recall: system support for automated availability management," in Proc. of the 1st Conf. on Networked Systems Design and Implementation, San Francisco, Mar. 2004.
[3] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, "Network coding for distributed storage system," in Proc. IEEE Int. Conf. on Computer Commun. (INFOCOM '07), Anchorage, Alaska, May 2007.
[4] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, "Network coding for distributed storage systems," IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, Sep. 2010.
[5] S. Akhlaghi, A. Kiani, and M. R. Ghanavati, "Cost-bandwidth tradeoff in distributed storage systems," Computer Communications, vol. 33, no. 17, pp. 2105–2115, Nov. 2010.
[6] D. Leong, A. G. Dimakis, and T. Ho, "Distributed storage allocations," Nov. 2010, arXiv:1011.5287 [cs.IT].
[7] K. V. Rashmi, N. B. Shah, P. V. Kumar, and K. Ramchandran, "Explicit construction of optimal exact regenerating codes for distributed storage," in Allerton 47th Annual Conf. on Commun., Control, and Computing, Monticello, Oct. 2009, pp. 1243–1249.
[8] C. Suh and K. Ramchandran, "Exact-repair MDS code construction using interference alignment," IEEE Trans. Inf. Theory, vol. 57, no. 3, pp. 1425–1442, Mar. 2011.
[9] S.-Y. R. Li, R. W. Yeung, and N. Cai, "Linear network coding," IEEE Trans. Inf. Theory, vol. 49, pp. 371–381, Feb. 2003.
[10] R. Kötter and M. Médard, "An algebraic approach to network coding," IEEE/ACM Trans. on Networking, vol. 11, no. 5, pp. 782–905, Oct. 2003.
