Minimization of Storage Cost in Distributed Storage Systems with Repair Consideration

Quan Yu
Department of Electronic Engineering, City University of Hong Kong
Email: quanyu2@student.cityu.edu.hk

Kenneth W. Shum
Institute of Network Coding, The Chinese University of Hong Kong
Email: wkshum@inc.cuhk.edu.hk

Chi Wan Sung
Department of Electronic Engineering, City University of Hong Kong
Email: albert.sung@cityu.edu.hk

Abstract: In a distributed storage system, the storage costs of different storage nodes, in general, can be different. How to store a file in a given set of storage nodes so as to minimize the total storage cost is investigated. By analyzing the min-cut constraints of the information flow graph, the feasible region of the storage capacities of the nodes can be determined. The storage cost minimization can then be reduced to a linear programming problem, which can be readily solved. Moreover, the tradeoff between storage cost and repair-bandwidth is established.

I. INTRODUCTION

Distributed storage systems provide an elegant way for reliable data storage. The storage nodes are distributed across a wide geographical area. When a small subset of storage nodes encounters a disaster, the source data object can still be reconstructed from the surviving nodes. To keep the reliability of the distributed storage system above a certain level, redundancy is essential.

Two strategies are widely employed to introduce redundancy. The most straightforward strategy is replication, in which each storage node stores an entire copy of the source data object. This method, though simple, has low storage efficiency. The other strategy is erasure coding, adopted in the OceanStore [1] and Total Recall [2] systems. A source data object is divided into $k$ equal-size fragments, and these $k$ fragments are then encoded and distributed over $n$ storage nodes; each node stores one encoded fragment.
As a result, the source data object can be reconstructed from any $k$ available storage nodes. Compared with the replication strategy, erasure coding provides better storage efficiency. However, when repairing a failed storage node, erasure coding wastes bandwidth. This is because a newcomer has to first reconstruct the entire source data object by downloading data from any $k$ surviving nodes, and then re-encode and store only a fraction of the downloaded data.

In order to minimize the repair-bandwidth, Dimakis et al. in [3], [4] propose the concept of regenerating codes. In their formulation, the data allocated to each storage node is equal to $\alpha$ units. When a node failure occurs, a newcomer arbitrarily chooses $d$ ($d \ge k$) available nodes to connect to and downloads $\beta$ units of data from each of these $d$ nodes. By introducing the information flow graph, they translate the repair problem into a single-source multicast problem in network coding theory. A tradeoff between the storage capacity per node and repair-bandwidth is also established. In [5], a distributed storage system in which different download costs are associated with storage nodes is introduced. Specifically, the authors focus on the scenario in which there are two sets of storage nodes, distinguished by their download costs. A tradeoff between download cost and repair-bandwidth is identified.

(This work was partially supported by a grant from the University Grants Committee of the Hong Kong Special Administrative Region, China (Project No. AoE/E-02/08).)

In most current studies of distributed storage systems, the amount of data stored on each node is simply assumed to be identical. How to distribute the data across a collection of storage nodes is not an easy problem. Given the total storage budget, for different access models, Leong et al. in [6] try to find the corresponding optimal storage allocation, in the sense of maximizing the probability of successful data recovery. It is shown that symmetric allocation is not always an optimal solution. However, their model deals with only the recovery problem of the source data object; the repair problem of failed nodes is not considered. In a realistic scenario, the storage nodes should be allowed to store different amounts of data according to the conditions of the transmission links between the source node and the storage nodes, as well as the storage cost associated with each storage node.

It is natural that different storage nodes may have different storage costs in a real distributed storage system. Since the storage nodes are distributed across a wide geographical area, the storage costs are affected by many factors, such as rents of the data storage centers, storage hardware costs and labor costs for maintenance.

In this paper, we combine the storage allocation problem with the repair problem, and take different storage costs into consideration. Our objective is to seek an optimal storage allocation, which minimizes the total storage cost, subject to the constraints obtained by analyzing the corresponding information flow graphs. More specifically, we focus on the case in which there are two types of storage nodes, each having a different storage cost. We will show that our storage cost minimization problem can be solved as a Linear Programming (LP) problem. By identifying the feasible region of this LP problem, the minimum storage cost is obtained at one of its corner points. Moreover, the tradeoff between the storage cost and repair-bandwidth can also be established.

This paper is organized as follows. The problem of storage cost minimization is formulated in Section II. In Section III, we draw the information flow graph, and identify the min-cut constraints.
In Section IV, we characterize the minimum storage cost by a linear programming problem. In Section V, we illustrate the tradeoff between storage cost and repair-bandwidth. We conclude in Section VI.

II. PROBLEM FORMULATION

Consider a distributed storage system consisting of two types of storage nodes, each having a different storage cost per unit data. Let the storage cost for the first type of nodes be $C_1$, and the storage cost for the second type be $C_2$. We assume that there are $n$ storage nodes in total, among which $n_1$ nodes belong to type 1 and $n_2$ nodes belong to type 2. A data object of size $M$ units is encoded and distributed among the $n$ storage nodes. For simplicity in presentation, we assume that the storage capacities of the nodes of type 1 are identical and equal to $\alpha_1$, while the storage capacities of type 2 nodes are identical and equal to $\alpha_2$. The total storage cost for storing the original data object can be calculated as $C_1 n_1 \alpha_1 + C_2 n_2 \alpha_2$.

There are two components in the design of distributed storage systems:

(i) A data collector (DC) connecting to any $k$ available storage nodes should be able to reconstruct the original data object by downloading a number of packets from these $k$ storage nodes.

(ii) Once a storage node fails, a newcomer initializes a repair process and regenerates the failed node so that any DC, connecting to this newcomer and $k - 1$ other existing nodes, is able to rebuild the original data object.

During the repair process, the newcomer chooses $d$ ($d \ge k$) surviving storage nodes to connect to, each belonging to either type 1 or type 2, and then downloads $\beta$ units of data from each of these $d$ nodes. The traffic $d\beta$ incurred by the repair operation is defined as the repair-bandwidth.

There are two modes for storage-node repair. The first one is called functional repair and the second one is exact repair.
In functional repair, the content of the newcomer is not necessarily the same as the content in the failed node to be replaced. We only need to ensure that any DC connecting to any $k$ storage nodes is able to rebuild the original data file. In exact repair, the content of the newcomer is required to be exactly the same as the content in the failed node. We refer the readers to [7], [8] for code construction for exact repair. In this paper, we focus on functional repair.

We model the distributed storage system as an information flow graph introduced in [3], [4]. For any information flow graph, to be detailed in the next section, if the minimum of the cut capacities between the source and each data collector is not less than the data object size $M$, then there always exists a linear network code such that all data collectors can reconstruct the data object [9]. Our objective in this work is to seek an optimal storage allocation across the $n$ storage nodes that minimizes the total storage cost $C_S$ under the constraints described above.

III. MIN-CUT CONSTRAINTS

The distributed storage network with storage cost is abstracted and modeled by an information flow graph $G = (V, E)$. We label the storage nodes from 1 to $n$, so that storage nodes 1 to $n_1$ are of type 1, while storage nodes $n_1 + 1$ to $n$ are of type 2. The vertices are divided into stages, starting from stage $-1$. In each stage $s \ge 1$, we have one newcomer which replaces a failed node. The edges are directed, and labeled by the corresponding capacities. We define the information flow graph more formally as follows.

1) There is a single source vertex, $S$, in stage $-1$. It represents the data object to be distributed among the storage nodes.

2) We put $2n$ vertices in stage 0. These vertices are called $\mathrm{In}_i$ and $\mathrm{Out}_i$, for $i = 1, 2, \ldots, n$. For each $i$, we draw a directed edge from the source vertex to $\mathrm{In}_i$ with infinite capacity. For $i = 1, 2, \ldots, n_1$, we draw a directed edge from $\mathrm{In}_i$ to $\mathrm{Out}_i$ with capacity $\alpha_1$. This signifies that the storage capacities in the storage nodes of type 1 are limited to $\alpha_1$ units. For $i = n_1 + 1, n_1 + 2, \ldots, n$, we draw a directed edge from $\mathrm{In}_i$ to $\mathrm{Out}_i$ with capacity $\alpha_2$. This indicates that each node of type 2 can store no more than $\alpha_2$ units of data.

3) For $s = 1, 2, \ldots$, we put two vertices in stage $s$. If storage node $i$ fails in the $s$-th stage, we construct two vertices, $\mathrm{In}_i$ and $\mathrm{Out}_i$, in stage $s$. The vertex $\mathrm{In}_i$ is connected to $d$ "Out" nodes in earlier stages. The capacities of these $d$ edges are all equal to $\beta$. If node $i$ is of type $j$ ($j$ is either 1 or 2), we draw an edge from $\mathrm{In}_i$ to $\mathrm{Out}_i$ with capacity $\alpha_j$.

4) A data collector is represented by a vertex, called DC, which is connected to $k$ "Out" nodes with distinct subscripts. All these $k$ edges have infinite capacity.

An example of the information flow graph is shown in Fig. 1.

Fig. 1. Information flow graph ($n_1 = n_2 = 2$, $d = 3$, $k = 2$).

A flow on the information flow graph $G$ is an assignment of non-negative real numbers to the edges, satisfying the flow conservation constraints and the capacity constraints. A flow $F$ can be regarded as a function from the edge set $E$ to the set of non-negative real numbers, $F : E \to \mathbb{R}_+$, such that (i) for each edge $e \in E$, $F(e)$ is less than or equal to the capacity of $e$, and (ii) for each vertex other than the source vertex and the data collectors, the sum of incoming flows is equal to the sum of outgoing flows, i.e., if $v \in V$ is either an "in" or "out" vertex, then

$$\sum_{e:\ \mathrm{Head}(e) = v} F(e) = \sum_{e:\ \mathrm{Tail}(e) = v} F(e),$$

where $\mathrm{Head}(e)$ and $\mathrm{Tail}(e)$ stand for the head and tail of edge $e$ respectively.
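The two conditions that define a flow can be checked mechanically. The following Python sketch is an illustration only: the function name `is_valid_flow`, the dictionary representation of edges, and the numerical tolerance are assumptions made here and are not part of the original formulation.

```python
def is_valid_flow(capacity, flow, source, collectors, tol=1e-9):
    """Check conditions (i) and (ii) for a candidate flow.

    capacity, flow: dicts mapping a directed edge (tail, head) to a number.
    Edges absent from `flow` carry zero flow (hypothetical convention).
    """
    # (i) Capacity constraint: 0 <= F(e) <= c(e) on every edge.
    for edge, value in flow.items():
        if value < -tol or value > capacity.get(edge, 0.0) + tol:
            return False

    # (ii) Conservation: incoming flow equals outgoing flow at every vertex
    # other than the source and the data collectors.
    balance = {}
    for (tail, head), value in flow.items():
        balance[head] = balance.get(head, 0.0) + value   # incoming
        balance[tail] = balance.get(tail, 0.0) - value   # outgoing
    return all(
        abs(net) <= tol
        for vertex, net in balance.items()
        if vertex != source and vertex not in collectors
    )


if __name__ == "__main__":
    inf = float("inf")
    # One stage-0 storage node with capacity alpha = 2 feeding one collector.
    cap = {("S", "In1"): inf, ("In1", "Out1"): 2.0, ("Out1", "DC"): inf}
    flow = {("S", "In1"): 2.0, ("In1", "Out1"): 2.0, ("Out1", "DC"): 2.0}
    print(is_valid_flow(cap, flow, "S", {"DC"}))
```

Infinite-capacity edges are represented with `float("inf")`, so only the In-to-Out edges (capacity $\alpha_1$ or $\alpha_2$) and the repair edges (capacity $\beta$) can ever be saturated.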
The value of a flow $F$ with respect to a data collector DC is defined as the sum of incoming flows to this data collector,

$$\sum_{e:\ \mathrm{Head}(e) = \mathrm{DC}} F(e).$$

The maximal flow value with respect to a specific data collector DC, denoted by max-flow(DC), is the maximal value of flow to this data collector DC, over all legitimate flows. The max-flow theorem in network coding [9], [10] says that if max-flow(DC) $\ge M$ for all data collectors DC, then there exists a linear network code which sends $M$ units of data to every data collector.

Given a particular data collector DC, an $(S, \mathrm{DC})$-cut is a partition of the vertices $(\mathcal{W}, \bar{\mathcal{W}})$ such that $S \in \mathcal{W}$ and $\mathrm{DC} \in \bar{\mathcal{W}}$. (Here $\bar{\mathcal{W}}$ stands for the set complement of $\mathcal{W}$ in $V$.) The capacity of an $(S, \mathrm{DC})$-cut is defined as the sum of capacities of the edges from $\mathcal{W}$ to $\bar{\mathcal{W}}$. It is well known that the max-flow with respect to a data collector DC is equal to the minimum cut capacity. Let the capacity of an edge $e$ be denoted by $c(e)$. For each $(S, \mathrm{DC})$-cut, we have the following constraint:

$$\sum_{\substack{e:\ \mathrm{Tail}(e) \in \mathcal{W} \\ \mathrm{Head}(e) \in \bar{\mathcal{W}}}} c(e) \ge M. \tag{1}$$

The summation in (1) is over all edges with tails in $\mathcal{W}$ and heads in $\bar{\mathcal{W}}$.

The storage cost minimization problem can be expressed as follows:

$$\min\ C_S \triangleq C_1 n_1 \alpha_1 + C_2 n_2 \alpha_2, \tag{2}$$

subject to the constraints (1) for all $(S, \mathrm{DC})$-cuts $(\mathcal{W}, \bar{\mathcal{W}})$. The optimization is a linear programming problem with two variables $\alpha_1$ and $\alpha_2$. Given parameters $n_1$, $n_2$, $k$, $d$, $M$, $\beta$, $C_1$ and $C_2$, we let the minimum storage cost in the above linear program be $C_S^*$. The values of $\alpha_1$ and $\alpha_2$ which achieve $C_S^*$ are denoted by $\alpha_1^*$ and $\alpha_2^*$. We will also investigate the tradeoff between the storage cost and the repair-bandwidth. In this context, we will write $C_S^*(\beta)$, $\alpha_1^*(\beta)$ and $\alpha_2^*(\beta)$ as functions of $\beta$.

Theorem 1: Let $\mathcal{A}$ be the set of $k$-vectors $\boldsymbol{\alpha} = (\alpha(1), \alpha(2), \ldots, \alpha(k))$ whose components are either $\alpha_1$ or $\alpha_2$, and in which the number of components equal to $\alpha_i$ is at most $n_i$, for $i = 1, 2$. Given $n_1$, $n_2$, $k$, $d$ and $\beta$, the file size $M$ is upper bounded by

$$M \le \sum_{i=1}^{k} \min\{\alpha(i),\ (d - i + 1)\beta\}, \tag{3}$$

for any $\boldsymbol{\alpha} \in \mathcal{A}$. Furthermore, we can construct an information flow graph such that equality in (3) holds for some $\boldsymbol{\alpha} \in \mathcal{A}$.

Proof: (sketch) The proof is based on the analysis of the min-cut in the information flow graph, and is similar to the proof of [4, Lemma 2]. The main difference is that in this paper, the capacity of an edge between an "in" node and an "out" node may be either $\alpha_1$ or $\alpha_2$, whereas in [4], all $\alpha$'s are identical. Because the number of storage nodes of type $i$ is equal to $n_i$ ($i = 1, 2$), there are at most $n_i$ edges with capacity $\alpha_i$ in a min-cut. Therefore we take the minimum only over the set $\mathcal{A}$. As the proof of (3) is basically the same as that of Lemma 2 in [4], the details are omitted.

We illustrate Theorem 1 by the example in Fig. 1. A sample cut $(\mathcal{W}, \bar{\mathcal{W}})$ is shown in Fig. 2. The vertices in $\bar{\mathcal{W}}$ are drawn in shaded color. The values of $\alpha(1)$ and $\alpha(2)$ are either $\alpha_1$ or $\alpha_2$. The set $\mathcal{A}$ consists of four pairs $(\alpha_1, \alpha_1)$, $(\alpha_1, \alpha_2)$, $(\alpha_2, \alpha_1)$, and $(\alpha_2, \alpha_2)$. The file size $M$ is upper bounded by

$$\begin{aligned}
M &\le \min\{\alpha_1, 3\beta\} + \min\{\alpha_1, 2\beta\}, \\
M &\le \min\{\alpha_2, 3\beta\} + \min\{\alpha_1, 2\beta\}, \\
M &\le \min\{\alpha_1, 3\beta\} + \min\{\alpha_2, 2\beta\}, \\
M &\le \min\{\alpha_2, 3\beta\} + \min\{\alpha_2, 2\beta\}.
\end{aligned}$$

Fig. 2. An example of a cut ($d = 3$, $k = 2$).

The cost minimization problem is to minimize $C_S$ in (2), subject to the constraints in (3) over all $\boldsymbol{\alpha} \in \mathcal{A}$. This optimization can be reduced to a linear programming problem, as shown in the next theorem.

Theorem 2: Let $\theta_m \triangleq (k - m)(2d - k - m + 1)\beta/2$. The cost minimization problem is equivalent to minimizing $C_S$ as defined in (2) subject to the following $2(k + 1)$ linear constraints,

$$M \le \min\{m, n_1\}\,\alpha_1 + (m - \min\{m, n_1\})\,\alpha_2 + \theta_m, \tag{4}$$

$$M \le (m - \min\{m, n_2\})\,\alpha_1 + \min\{m, n_2\}\,\alpha_2 + \theta_m, \tag{5}$$

for $m = 0, 1, \ldots, k$.

Proof: For each $\boldsymbol{\alpha} \in \mathcal{A}$, the inequality in (3) can be replaced by $2^k$ linear inequalities. We introduce a "switch" function

$$s_b(x, y) \triangleq \begin{cases} x & \text{if } b = 0, \\ y & \text{if } b = 1. \end{cases}$$

Let $\mathcal{B} = \{0, 1\}^k$ be the set of all binary vectors of length $k$. The inequality in (3) is equivalent to the following $2^k$ inequalities:

$$M \le \sum_{i=1}^{k} s_{b_i}\big(\alpha(i),\ (d - i + 1)\beta\big),$$

where $(b_1, b_2, \ldots, b_k) \in \mathcal{B}$. This yields $|\mathcal{A}|\, 2^k$ linear inequalities.

We may group these $|\mathcal{A}|\, 2^k$ linear inequalities by the number of zeros in $(b_1, b_2, \ldots, b_k)$. Among those linear inequalities with $m$ zeros in $(b_1, b_2, \ldots, b_k)$, where $m$ is an integer between 0 and $k$, the most stringent inequality is the one associated with

$$(b_1, b_2, \ldots, b_k) = (\underbrace{0, 0, \ldots, 0}_{m}, \underbrace{1, 1, \ldots, 1}_{k - m}),$$

because the terms $(d - i + 1)\beta$ decrease with $i$, so keeping them in the last $k - m$ positions gives the smallest right-hand side. This inequality is

$$M \le \sum_{i=1}^{m} \alpha(i) + \sum_{i=m+1}^{k} (d - i + 1)\beta = \sum_{i=1}^{m} \alpha(i) + \theta_m.$$

If there are $p$ $\alpha_1$'s and $q$ $\alpha_2$'s among $\alpha(1), \ldots, \alpha(m)$, we have

$$M \le p\,\alpha_1 + q\,\alpha_2 + \theta_m.$$

Among the group of linear inequalities with $m$ zeros in $(b_1, b_2, \ldots, b_k)$, many inequalities are redundant, meaning that we can remove them without altering the feasible region. We only retain two inequalities, the one in which the coefficient of $\alpha_2$ is smallest and the one in which the coefficient of $\alpha_1$ is smallest, namely the inequalities in (4) and (5) respectively. The other inequalities in the same group are convex combinations of these two inequalities, and hence can be ignored without changing the shape of the feasible region.
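The reduction from (3) to (4) and (5) can be checked numerically on small instances. The Python sketch below is only an illustration under stated assumptions: the function names (`theta`, `theorem1_bound`, `theorem2_rows`, `feasible_by_theorem2`) and the tolerance are choices made here, and the brute-force enumeration over $\mathcal{A}$ is practical only for small $k$.

```python
from itertools import product


def theta(m, k, d, beta):
    """theta_m = sum_{i=m+1}^{k} (d - i + 1) * beta, in closed form."""
    return (k - m) * (2 * d - k - m + 1) * beta / 2


def theorem1_bound(alpha1, alpha2, n1, n2, k, d, beta):
    """Smallest right-hand side of (3) over all vectors in the set A."""
    best = float("inf")
    for types in product((1, 2), repeat=k):
        # A vector belongs to A only if it uses at most n_i entries of type i.
        if types.count(1) > n1 or types.count(2) > n2:
            continue
        total = 0.0
        for i, t in enumerate(types, start=1):
            alpha_i = alpha1 if t == 1 else alpha2
            total += min(alpha_i, (d - i + 1) * beta)
        best = min(best, total)
    return best


def theorem2_rows(n1, n2, k, d, beta, M):
    """Rows (a, b, rhs) meaning a*alpha1 + b*alpha2 >= rhs, from (4) and (5)."""
    rows = []
    for m in range(k + 1):
        rhs = M - theta(m, k, d, beta)
        p, q = min(m, n1), min(m, n2)
        rows.append((p, m - p, rhs))   # constraint (4)
        rows.append((m - q, q, rhs))   # constraint (5)
    return rows


def feasible_by_theorem2(alpha1, alpha2, n1, n2, k, d, beta, M, tol=1e-9):
    """True if (alpha1, alpha2) satisfies all 2(k+1) constraints."""
    rows = theorem2_rows(n1, n2, k, d, beta, M)
    return all(a * alpha1 + b * alpha2 >= rhs - tol for a, b, rhs in rows)


if __name__ == "__main__":
    # Parameters of the example in Fig. 1: n1 = n2 = 2, d = 3, k = 2.
    print(theorem1_bound(1.0, 5.0, 2, 2, 2, 3, 1.0))
```

For a given pair $(\alpha_1, \alpha_2)$, the file size $M$ is supported exactly when $M$ does not exceed `theorem1_bound(...)`, and by Theorem 2 this should coincide with `feasible_by_theorem2(...)` returning true.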
If we put $m = 0$ in either (4) or (5), we see that there is no feasible solution to the linear programming problem if $\beta$ is strictly less than $\frac{2M}{k(2d - k + 1)}$. From now on, we will assume that $\beta$ is no less than $\frac{2M}{k(2d - k + 1)}$.

IV. STORAGE COST MINIMIZATION

We solve the linear programming problem in Theorem 2 by considering four different cases: (A) $n_1 \ge k$ and $n_2 \ge k$, (B) $n_1 \ge k$ and $n_2 < k$, (C) $n_1 < k$ and $n_2 \ge k$, and (D) $n_1 < k$ and $n_2 < k$.

A. Case A: $n_1 \ge k$ and $n_2 \ge k$

When both $n_1$ and $n_2$ are larger than or equal to $k$, the two inequalities in (4) and (5) can be written as

$$M \le m\alpha_1 + \theta_m, \text{ and} \tag{6}$$

$$M \le m\alpha_2 + \theta_m. \tag{7}$$

The region defined by these two inequalities is the intersection of two half-planes, which can be obtained by translating the first quadrant in the $\alpha_1$-$\alpha_2$ plane diagonally along the 45-degree line $\alpha_1 = \alpha_2$.

Theorem 3: For $\beta \ge \frac{2M}{k(2d - k + 1)}$, we have

$$\alpha_1^*(\beta) = \alpha_2^*(\beta) = \max_{1 \le m \le k} (M - \theta_m)/m.$$

Proof: Taking all constraints (6) and (7), for $m = 1, 2, \ldots, k$, into consideration, the feasible region is of the form

$$\{(\alpha_1, \alpha_2) : \alpha_1 \ge \mu \text{ and } \alpha_2 \ge \mu\},$$

where $\mu$ is the maximum value as defined in the theorem. No matter what the costs $C_1$ and $C_2$ are (provided that they are positive), the optimal solution to the linear program is at the corner point of the feasible region, namely $(\alpha_1^*, \alpha_2^*) = (\mu, \mu)$.

In the case where $n_1$ and $n_2$ are both larger than or equal to $k$, we see that the optimal storage allocation is to put the same amount of data in both type 1 and type 2 nodes. The storage costs of the two types of nodes do not matter.

B. Case B: $n_1 \ge k$ and $n_2 < k$

For $m = 1, 2, \ldots, k$, the two inequalities in (4) and (5) can be written as

$$m\alpha_1 \ge M - \theta_m,$$

$$(m - q_m)\alpha_1 + q_m \alpha_2 \ge M - \theta_m,$$

where $q_m \triangleq \min\{m, n_2\}$. These two inequalities define an infinite polyhedral region. For $m = 1, 2, \ldots, k$, let $R_m$ be the region

$$R_m \triangleq \{(\alpha_1, \alpha_2) \in \mathbb{R}_+^2 : m\alpha_1 \ge M - \theta_m,\ (m - q_m)\alpha_1 + q_m\alpha_2 \ge M - \theta_m\}.$$

The feasible region of the linear program is thus the intersection of $R_1, R_2, \ldots, R_k$. The corner point of the region $R_m$ can be obtained by solving the two equations obtained by setting the inequalities to equalities, and has coordinates

$$\alpha_1 = \alpha_2 = (M - \theta_m)/m.$$

In other words, for $m = 1, 2, \ldots, k$, the corner point of $R_m$ lies on the line $\alpha_1 = \alpha_2$ in the $\alpha_1$-$\alpha_2$ plane.

An example of the feasible region is shown in Fig. 3. The horizontal and the vertical axes are $\alpha_1$ and $\alpha_2$ respectively. The parameters of the distributed storage system are $n_1 = 8$, $n_2 = 2$, $d = 8$, $k = 6$, $M = 66$ and $\beta = 3.3$. The region to the right of and above all lines is the feasible region. The dashed line indicates the 45-degree line $\alpha_1 = \alpha_2$. The optimal point is one of the vertices of the feasible region. The choice of the vertex which minimizes the storage cost depends on the ratio $C_1 n_1/(C_2 n_2)$, i.e., the slope of the objective function.

Fig. 3. An example of the feasible region in the linear program (horizontal axis: $\alpha_1$; vertical axis: $\alpha_2$).

We can observe from Fig. 3 that if the cost $C_1$ is much greater than $C_2$, then the optimal point always lies on the line $\alpha_1 = \alpha_2$, i.e., $\alpha_1^*(\beta) = \alpha_2^*(\beta)$ for all $\beta$.

Case C is similar to Case B. The feasible region of Case C can be regarded as the mirror image of the feasible region of Case B with respect to the line $\alpha_1 = \alpha_2$. We therefore skip the discussion on Case C.

C. Case D: $n_1 < k$ and $n_2 < k$

The feasible region of the linear program in Theorem 2 is bounded by

$$p_m\alpha_1 + (m - p_m)\alpha_2 \ge M - \theta_m,$$

$$(m - q_m)\alpha_1 + q_m\alpha_2 \ge M - \theta_m,$$

for $m = 1, 2, \ldots, k$, where $q_m$ is defined as in the previous section and $p_m \triangleq \min\{m, n_1\}$. The feasible region is the intersection of

$$R_m \triangleq \{(\alpha_1, \alpha_2) \in \mathbb{R}_+^2 : p_m\alpha_1 + (m - p_m)\alpha_2 \ge M - \theta_m,\ (m - q_m)\alpha_1 + q_m\alpha_2 \ge M - \theta_m\}$$

for $m = 1, 2, \ldots, k$. As in Case B, we can show that for $m = 1, 2, \ldots, k$, the vertex of the polyhedral region $R_m$ lies on the line $\alpha_1 = \alpha_2$ in the $\alpha_1$-$\alpha_2$ plane.

V. TRADEOFF BETWEEN STORAGE COST AND REPAIR-BANDWIDTH

Explicit formulae for $\alpha_1^*(\beta)$, $\alpha_2^*(\beta)$ and $C_S^*(\beta)$ can be found, but due to space limitations, we omit them in this paper.

To illustrate the tradeoff between storage cost and repair-bandwidth, we consider a distributed storage system with the parameters used in Fig. 3: $n_1 = 8$, $n_2 = 2$, $d = 8$, $k = 6$, $M = 66$. The minimum repair-bandwidth is $2Md/(k(2d - k + 1)) = 16$. We fix the cost $C_1$ for the storage nodes of type 1 to be 1, and increase $C_2$ from 0.2 to 1.8, with step size 0.4. For each value of $C_2$ we plot $C_S^*(\beta)$ for $d\beta$ from 16 to 32. The resulting curves are shown in Fig. 4. The curve in the middle corresponds to $C_1 = C_2 = 1$. This reduces to the case in [4] where the costs of both types of nodes are the same.

Fig. 4. Storage cost and repair-bandwidth tradeoff, $C_1 = 1$ (horizontal axis: repair bandwidth $d\beta$; vertical axis: storage cost; curves for $C_2 = 0.2, 0.6, 1, 1.4, 1.8$).

VI. CONCLUSION

In this paper, we aim at seeking an optimal storage allocation that minimizes the storage cost in distributed storage systems. Specifically, we focus on the network with two types of storage nodes, each having a different storage cost. We demonstrate that the minimization problem can be solved as a linear programming problem.
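As a numerical illustration of this linear program, the following Python sketch minimizes $C_1 n_1 \alpha_1 + C_2 n_2 \alpha_2$ subject to constraints (4) and (5) and sweeps the repair-bandwidth $d\beta$. It is a sketch under stated assumptions, not the authors' implementation: the function names `lp_rows`, `min_storage_cost` and `tradeoff_curve` are hypothetical, and the two-variable program is solved by enumerating intersections of constraint lines, a method chosen here only because it needs no external solver.

```python
from itertools import combinations


def lp_rows(n1, n2, k, d, beta, M):
    """Rows (a, b, rhs) meaning a*alpha1 + b*alpha2 >= rhs."""
    rows = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # alpha1 >= 0, alpha2 >= 0
    for m in range(k + 1):
        rhs = M - (k - m) * (2 * d - k - m + 1) * beta / 2  # M - theta_m
        p, q = min(m, n1), min(m, n2)
        rows.append((p, m - p, rhs))   # constraint (4)
        rows.append((m - q, q, rhs))   # constraint (5)
    return rows


def min_storage_cost(n1, n2, k, d, beta, M, C1, C2, tol=1e-7):
    """Return (cost, alpha1, alpha2) at an optimal vertex, or None if infeasible."""
    rows = lp_rows(n1, n2, k, d, beta, M)

    def feasible(x, y):
        return all(a * x + b * y >= rhs - tol for a, b, rhs in rows)

    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue  # parallel (or trivial m = 0) rows define no vertex
        x = (r1 * b2 - r2 * b1) / det   # Cramer's rule
        y = (a1 * r2 - a2 * r1) / det
        if feasible(x, y):
            cost = C1 * n1 * x + C2 * n2 * y
            if best is None or cost < best[0]:
                best = (cost, x, y)
    return best


def tradeoff_curve(n1, n2, k, d, M, C1, C2, bandwidths):
    """Return [(d*beta, minimum storage cost or None)] for each bandwidth."""
    curve = []
    for gamma in bandwidths:
        result = min_storage_cost(n1, n2, k, d, gamma / d, M, C1, C2)
        curve.append((gamma, None if result is None else result[0]))
    return curve


if __name__ == "__main__":
    # Parameters of Section V: n1 = 8, n2 = 2, d = 8, k = 6, M = 66, C1 = 1.
    for C2 in (0.2, 0.6, 1.0, 1.4, 1.8):
        print(C2, tradeoff_curve(8, 2, 6, 8, 66, 1.0, C2, range(16, 33, 4)))
```

With positive costs the minimum of a two-variable program over this region is attained at a vertex, so checking every feasible intersection of two constraint lines suffices. When $\beta$ is below $2M/(k(2d - k + 1))$ the $m = 0$ constraint fails everywhere and the sketch returns `None`.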
It is shown that the feasible region can be determined by analyzing the min-cut constraints of the corresponding information flow graph. The minimum storage cost can be achieved at the corner points. Moreover, the tradeoff between the storage cost and repair-bandwidth is established. Our method can be extended to more general cases, in which the storage costs of all storage nodes are not the same.

We can implement the coding scheme and repair protocol for distributed storage systems with storage cost by using random linear network coding over a finite field. The packets transmitted from a surviving storage node to the newcomer are linear combinations of the data in the memory of the surviving storage node. If we apply existing code construction methods from linear network coding to distributed storage systems, the required finite field size may be unbounded. This is because the finite field size requirement is a monotonically increasing function of the number of data collectors, which may be unbounded. To make sure that the regeneration process will be successful after arbitrarily many stages of repairs, it is important to show that the finite field size requirement is upper bounded by some constant. How to construct linear network codes for distributed storage systems with storage cost is an interesting direction for future studies.

REFERENCES

[1] J. Kubiatowicz et al., "OceanStore: an architecture for global-scale persistent storage," in Proc. 9th Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Cambridge, MA, Nov. 2000, pp. 190–201.
[2] R. Bhagwan, K. Tati, Y. Cheng, S. Savage, and G. Voelker, "Total recall: system support for automated availability management," in Proc. of the 1st Conf. on Networked Systems Design and Implementation, San Francisco, Mar. 2004.
[3] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, "Network coding for distributed storage system," in Proc. IEEE Int. Conf. on Computer Commun. (INFOCOM '07), Anchorage, Alaska, May 2007.
[4] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, "Network coding for distributed storage systems," IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, Sep. 2010.
[5] S. Akhlaghi, A. Kiani, and M. R. Ghanavati, "Cost-bandwidth tradeoff in distributed storage systems," Computer Communications, vol. 33, no. 17, pp. 2105–2115, Nov. 2010.
[6] D. Leong, A. G. Dimakis, and T. Ho, "Distributed storage allocations," Nov. 2010, arXiv: 1011.5287 [cs.IT].
[7] K. V. Rashmi, N. B. Shah, P. V. Kumar, and K. Ramchandran, "Explicit construction of optimal exact regenerating codes for distributed storage," in Allerton 47th Annual Conf. on Commun., Control, and Computing, Monticello, Oct. 2009, pp. 1243–1249.
[8] C. Suh and K. Ramchandran, "Exact-repair MDS code construction using interference alignment," IEEE Trans. Inf. Theory, vol. 57, no. 3, pp. 1425–1442, Mar. 2011.
[9] S.-Y. R. Li, R. W. Yeung, and N. Cai, "Linear network coding," IEEE Trans. Inf. Theory, vol. 49, pp. 371–381, Feb. 2003.
[10] R. Kötter and M. Médard, "An algebraic approach to network coding," IEEE/ACM Trans. on Networking, vol. 11, no. 5, pp. 782–795, Oct. 2003.
