Repair Optimal Erasure Codes through Hadamard Designs
Dimitris S. Papailiopoulos and Alexandros G. Dimakis
Electrical Engineering, University of Southern California, Los Angeles, CA 90089. Email: {papailio, dimakis}@usc.edu

Viveck R. Cadambe
Electrical Engineering and Computer Science, University of California Irvine, Irvine, CA 92697. Email: vcadambe@uci.edu

Abstract—In distributed storage systems that employ erasure coding, the issue of minimizing the total communication required to exactly rebuild a storage node after a failure arises. This repair bandwidth depends on the structure of the storage code and the repair strategies used to restore the lost data. Designing high-rate maximum-distance separable (MDS) codes that achieve the optimum repair communication has been a well-known open problem. In this work, we use Hadamard matrices to construct the first explicit 2-parity MDS storage code with optimal repair properties for all single node failures, including the parities. Our construction relies on a novel method of achieving perfect interference alignment over finite fields with a finite file size, or number of extensions. We generalize this construction to design m-parity MDS codes that achieve the optimum repair communication for single systematic node failures and show that there is an interesting connection between our m-parity codes and the systematic-repair optimal permutation-matrix based codes of Tamo et al. [21] and Cadambe et al. [22], [23].

I. INTRODUCTION

Distributed storage systems have reached such a massive scale that recovery from failures is now part of regular operation rather than a rare exception [4]. Large scale deployments typically need to tolerate multiple failures, both for high availability and to prevent data loss. Erasure coded storage achieves high failure tolerance without requiring a large number of replicas that increase the storage cost.
Three application contexts where erasure coding techniques are currently deployed or under investigation are cloud storage systems [5], archival storage, and peer-to-peer storage systems like Cleversafe and Wuala. One central problem in erasure coded distributed storage systems is that of maintaining an encoded representation when failures occur. To maintain the same redundancy when a storage node leaves the system, a newcomer node has to join the array, access some existing nodes, and exactly reproduce the contents of the departed node. In its most general form this problem is known as the Exact Code Repair Problem [2], [1]. There are several metrics that can be optimized during repair: the total information read from existing disks during repair [8], [9], the total information communicated in the network (repair bandwidth [2]), or the total number of disks required for each repair [5], [10], [21], [23]. Currently, the most well-understood metric is that of repair bandwidth. For designing $(n, k)$ MDS erasure codes that have $n$ storage nodes and can tolerate any $n - k$ failures, the information theoretic cut-set bounds for repair communication were specified in [2] and shown to be achievable for all values of $n, k$ in a series of recent papers [3], [13], [16], [18]–[20]. In particular, it was shown that for an $(n, k)$ code, if a single node fails, downloading a $\frac{1}{n-k}$ fraction of every surviving disk is sufficient and optimal in terms of repair bandwidth for the repair of a failed node. Beyond MDS codes, [2] demonstrated a tradeoff between storage and repair communication, and code constructions for other points of this tradeoff are under active investigation; see e.g. [3], [20], or [24] for multiple node repair schemes. On this tradeoff, the minimum storage point is achieved by MDS erasure codes with optimal repair, also known as Minimum Storage Regenerating (MSR) codes.
For code rates $k/n \le 1/2$, explicit MSR codes were designed by Shah et al. [16], Rashmi et al. [20], and Suh et al. [15]. For the high-rate regime, however, the only known complete constructions [18], [19] require large file sizes (symbol extensions) and field sizes. These constructions use the symbol extension interference alignment (IA) technique of [11] to establish that there exist MDS storage codes that come arbitrarily close to (but do not exactly match) the information theoretic lower bound for the repair bandwidth for all $n, k$. These asymptotic constructions are impractical due to the arbitrarily large finite field size and the fast growing file size required even for small values of $n$ and $k$.

Our Contribution: We introduce the first explicit high-rate $(k+2, k)$ MDS storage code with optimal repair communication. Our storage code exploits fundamental properties of Hadamard designs and perfect IA instances that can be understood through the use of a lattice representation of the symbol extension technique of Cadambe et al. [11], [18], [19]. Our coding and repair strategy bears resemblance to the notion of ergodic interference alignment [25], which is a finite-symbol-extension IA scheme for wireless channels. Independently of this work, there has recently been substantial progress in designing high-rate explicit MSR codes. Tamo et al. [21] and Cadambe et al. [23] designed MDS codes for any $(n, k)$ parameters that have optimal repair for the systematic nodes, but not the code parities. It seems that extending these designs to allow optimal parity repair is not straightforward. The advantage of our work is that all $n$ nodes are optimally repaired; the disadvantage is that our construction is currently only optimal for $n - k = 2$. Our key technical contribution is a scheme that achieves perfect interference alignment with a finite number of extensions.
This was developed in [26] and used in a 2-parity storage code with optimal repair for $k$ nodes and near optimal repair of the 2 parity nodes, that can handle any single node failure. We use a combinatorial view of different interference alignment schemes through a framework we call dots-on-a-lattice. Hadamard matrices are shown to be crucial in achieving finite perfect alignment and ensuring the full rank of desired subspaces. Finally, we present m-parity MDS code constructions based on Hadamard designs that achieve optimal repair for systematic node failures, but suboptimal repair for parity nodes. We show that these codes are equivalent, under a similarity transformation, to codes that involve permutation matrices in the manner of [21] and [23].

2-parity Code Parameters: Assuming that the file to be encoded has size $M = k2^{k+1}$, each of the $k+2$ storage nodes stores a coded block of size $\frac{M}{k}$. Repairing a single node failure costs $\frac{k+1}{2k}M$ in repair communication bandwidth, matching the information theoretic lower bound. Finally, we give explicit conditions for the MDS property of the code and show that finite fields of size greater than or equal to $2k+3$ suffice to satisfy them.

m-parity Code Parameters: For file sizes $M = km^k$, our $(k+m, k)$ codes achieve a repair communication bandwidth of $\frac{k+m-1}{mk}M$ for single systematic node failures, matching the information theoretic lower bound. The MDS property of these codes is shown to hold over arbitrarily large finite fields with high probability.

II. MDS STORAGE CODES WITH 2 PARITY NODES

In this section, we consider the code repair problem for MDS storage codes with 2 parity nodes. After we lay down the model for repair, we continue by introducing our code construction. Let a file of size $M = kN$, denoted by the vector $f \in \mathbb{F}_q^{kN}$,¹ be partitioned in $k$ parts $f = \left[f_1^T \dots f_k^T\right]^T$, each of size $N$, where $N$ denotes the subpacketization factor, with $\frac{N}{2} \in \mathbb{N}^*$.
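As a quick numerical sanity check of the 2-parity parameters stated above, the following plain-Python sketch (the value $k = 3$ is an illustrative choice of ours, not from the paper) evaluates the file size, per-node storage, and repair cost, and compares the latter against the cut-set bound $(k+1)\frac{N}{2}$ of Remark 1:

```python
# Sanity check of the 2-parity code parameters; k = 3 is an illustrative choice.
k = 3
M = k * 2 ** (k + 1)                 # file size M = k * 2^(k+1)
N = M // k                           # per-node storage M/k = N = 2^(k+1)
repair_bw = (k + 1) * M // (2 * k)   # claimed repair cost (k+1)/(2k) * M

# This should coincide with the cut-set lower bound (k+1) * N/2 of Remark 1:
# N/2 symbols downloaded from each of the k+1 surviving nodes.
assert repair_bw == (k + 1) * (N // 2)
print(f"M = {M}, N = {N}, repair bandwidth = {repair_bw}")
```

For $k = 3$ this gives $M = 48$, $N = 16$, and a repair bandwidth of $32$ symbols, i.e., half of each surviving node's content, consistent with the $\frac{1}{n-k} = \frac{1}{2}$ fraction from the cut-set bound.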
We wish to store $f$ across $k$ systematic and 2 parity storage units, each having storage capacity $\frac{M}{k} = N$; hence we consider a data rate of $\frac{k}{k+2}$. We require that the encoded storage array is resilient to any 2 node erasures. To satisfy the redundancy and erasure resiliency properties, the file is encoded using a $(k+2, k)$ MDS distributed storage code. A storage code has the MDS property when any possible collection of $k$ storage nodes can reconstruct the file $f$.

Fig. 1. A $(k+2, k)$ coded storage array: systematic node $i$ stores the systematic data $f_i$, $i \in \{1, \dots, k\}$; parity node 1 stores $f_1 + \dots + f_k$ and parity node 2 stores $A_1^T f_1 + \dots + A_k^T f_k$.

In Fig. 1 we provide the general structure of a two-parity MDS encoded storage array. The first $k$ storage nodes store the systematic file parts. Without loss of generality, the first parity stores the sum of all $k$ systematic parts $f_1 + \dots + f_k$ and the second parity stores a linear combination of them, $A_1^T f_1 + \dots + A_k^T f_k$. Here, $A_i$ denotes an $N \times N$ matrix of coding coefficients used by the second parity node to "scale and mix" the contents of the $i$-th file piece $f_i$, $i \in \{1, \dots, k\}$. This representation is a systematic one: $k$ nodes store uncoded file pieces and each of the 2 parities stores a linear combination of the $k$ file parts.

In this work, we are interested in maintaining the same level of redundancy when a storage component fails or leaves the system. To do that, the code repair process has to take place to exactly regenerate the lost data in a newcomer storage component. Let, for example, a systematic node $i \in \{1, \dots, k\}$ fail. Then, a newcomer joins the storage network, connects to the remaining nodes, and has to download sufficient data to reconstruct $f_i$. It is important to note that the lost systematic part $f_i$ exists only as a term of a linear combination at each parity node, as seen in Fig. 1.
Therefore, to regenerate the $N$ elements of $f_i$, the newcomer has to download from the parity nodes a volume of data equal to the size of the lost piece, i.e., $N$ linearly independent coded elements. Assuming that it downloads the same amount of data from both parities, the downloaded contents can be represented as a stack of $N$ equations

$$\begin{bmatrix} p_i^{(1)} \\ p_i^{(2)} \end{bmatrix} \triangleq \begin{bmatrix} V_i^{(1)T} f_1 + \dots + V_i^{(1)T} f_k \\ V_i^{(2)T} A_1^T f_1 + \dots + V_i^{(2)T} A_k^T f_k \end{bmatrix} = \underbrace{\begin{bmatrix} V_i^{(1)T} \\ \left(A_i V_i^{(2)}\right)^T \end{bmatrix} f_i}_{\text{useful data}} + \sum_{s=1, s\neq i}^{k} \underbrace{\begin{bmatrix} V_i^{(1)T} \\ \left(A_s V_i^{(2)}\right)^T \end{bmatrix} f_s}_{\text{interference by } f_s}, \quad (1)$$

where $p_i^{(1)}, p_i^{(2)} \in \mathbb{F}_q^{N/2}$ are the equations downloaded from the first and second parity node, respectively, and $V_i^{(1)}, V_i^{(2)} \in \mathbb{F}_q^{N \times \frac{N}{2}}$ are the repair matrices. Each repair matrix is used to mix the $N$ parity contents so that a set of $\frac{N}{2}$ equations is formed. Then, retrieving $f_i$ from (1) is equivalent to solving an underdetermined set of $N$ equations in the $kN$ unknowns of $f$, with respect to the $N$ desired unknowns of $f_i$.

¹ $\mathbb{F}_q$ denotes the finite field over which all operations are performed.

Fig. 2. Repair of a $(4, 2)$ code. Let systematic node 1 fail. Then, a newcomer node joins the system and downloads data from the 3 remaining nodes to regenerate $f_1$. The useful information is mixed with the undesired part $f_2$ in both information chunks downloaded from the parities; these interference parts are highlighted in red. To retrieve $f_1$, a basis of the interference equations needs to be downloaded from systematic node 2. Then, the newcomer can erase the interference and invert the matrix multiplying $f_1$ to retrieve it. Note that for invertibility, we need the additional condition that the matrix $\left[V_1^{(1)}\ A_1V_1^{(2)}\right]^T$ has full rank $N$.
However, this is not directly possible due to the $k-1$ additive interference components in the received equations, generated by the undesired unknowns $f_s$, $s \in \{1, \dots, k\}\backslash i$, as noted in (1). These $k-1$ interference terms corrupt the desired data and need to be canceled. Hence, the newcomer needs to download additional data from the remaining $k-1$ systematic nodes, which will "replicate" and cancel the interference terms from the downloaded equations. To cancel a single interference term of (1) that has size $N$, it suffices to download a basis of equations that generates it. The dimension of this basis does not need to be equal to $N$. For example, to erase

$$\begin{bmatrix} V_i^{(1)T} \\ V_i^{(2)T} A_s^T \end{bmatrix} f_s, \quad s \in \{1, \dots, k\}\backslash i, \quad (2)$$

the newcomer needs to connect to systematic node $s$ and download a number of linear equations in $f_s$ that can generate (2); this number is

$$\frac{N}{2} \le \operatorname{rank}\left(\begin{bmatrix} V_i^{(1)T} \\ V_i^{(2)T} A_s^T \end{bmatrix}\right) \le N. \quad (3)$$

This is exactly the communication bandwidth price we are paying to delete a single interference term in order to be able to reconstruct $f_i$. The lower bound in (3) comes from the fact that $\frac{N}{2}$ linearly independent equations need to be downloaded from each of the parities, hence $\operatorname{rank}\left(V_i^{(1)}\right) = \operatorname{rank}\left(V_i^{(2)}\right) = \frac{N}{2}$ for any $i \in \{1, \dots, k\}$. Eventually, we need to regenerate all undesired terms at the newcomer, so as to subtract them from (1). Then, a full rank system of $N$ equations in the $N$ unknowns has to be formed. A generic example of a code repair instance for a $(4, 2)$ storage code is given in Fig. 2.

In general, to repair a systematic node $i \in \{1, \dots, k\}$ of an arbitrary $(k+2, k)$ MDS storage code, we need to obtain a feasible solution to the following rank constrained rank minimization (performed over $\mathbb{F}_q$)

$$R_i: \min_{V_i^{(1)}, V_i^{(2)}} \sum_{s=1, s\neq i}^{k} \operatorname{rank}\left(\left[V_i^{(1)}\ A_sV_i^{(2)}\right]\right) \quad \text{s.t.:} \quad \operatorname{rank}\left(\left[V_i^{(1)}\ A_iV_i^{(2)}\right]\right) = N,$$

where i) the full rank constraint corresponds to the requirement that the $N$ equations downloaded from the parities are linearly independent, when viewed as equations in the $N$ components of $f_i$, and ii) the rank minimization corresponds to minimizing the sum of the basis dimensions needed to cancel each interference term. For a specific feasible selection of repair matrices, the repair bandwidth to exactly regenerate systematic node $i$ is given by

$$\gamma_i = \underbrace{N}_{\#\text{equations lost}} + \sum_{s=1, s\neq i}^{k} \underbrace{\operatorname{rank}\left(\left[V_i^{(1)}\ A_sV_i^{(2)}\right]\right)}_{\text{dim. of interference equations by } f_s}, \quad (4)$$

where the sum-rank term is the aggregate of the interference dimensions. An optimal solution to $R_i$ is guaranteed to minimize the repair bandwidth we need to communicate to repair systematic node $i \in \{1, \dots, k\}$.

Fig. 3. A repair optimal $(5, 3)$ MDS code over $\mathbb{F}_{11}$, with $N = 16$ and diagonal coding matrices $A_1 = 9X_1 + 7X_4 + I_{16}$, $A_2 = 5X_2 + 2X_4 + I_{16}$, and $A_3 = 9X_3 + 4X_4 + I_{16}$, where the $X_i$ are the diagonal $\pm 1$ matrices of (6).

Remark 1: From [2] it is known that the theoretical minimum repair bandwidth for any single node repair of an optimal (linear or nonlinear) $(k+2, k)$ MDS code is exactly $(k+1)\frac{N}{2}$, where $N$ has to be an even number. This bound is proven using cut-set bounds on infinite flow graphs.
Here, we provide an interpretation of this bound in terms of linear codes by calculating the minimum possible sum of ranks in $R_i$: since each repair matrix has to have full column rank $\frac{N}{2}$ to be a feasible solution, the minimum number of dimensions each interference term can be suppressed to is $\frac{N}{2}$. This aggregates to a minimum repair bandwidth of $(k+1)\frac{N}{2}$ repair equations. If we wish to achieve this bound, interference alignment has to be employed, so that undesired components like (2) are confined to the minimum number of dimensions. Interestingly, linear codes suffice to asymptotically achieve this bound [18], [19].

We know what the theoretical minimum repair bandwidth is and that there exist asymptotically optimal schemes; however, designing MDS codes with repair strategies that achieve it has been challenging. The difficulty in designing optimal MDS storage codes lies in a threefold requirement: i) the code has to satisfy the MDS property, ii) systematic nodes of the code have to be optimally repaired, and iii) parity nodes of the code have to be optimally repaired. Currently, there exist MDS codes for rates $\frac{k}{n} \le \frac{1}{2}$ [15], [20] for which all nodes can be optimally repaired. For the high data rate regime, Tamo et al. [21] and Cadambe et al. [23] presented the first MDS codes where any systematic node failure can be optimally repaired. However, prior to this work, there did not exist MDS storage codes of arbitrarily high rate that can optimally repair any node. In the following, we present the first explicit, high-rate, repair optimal $(k+2, k)$ MDS storage code that achieves the minimum repair bandwidth bound for the repair of any single systematic or parity node failure.

III. A REPAIR OPTIMAL 2-PARITY STORAGE CODE

Let a $(k+2, k)$ MDS storage code for file size $M = k2^{k+1}$ have coding matrices

$$A_i = a_iX_i + b_iX_{k+1} + I_N, \quad i \in \{1, \dots, k\}, \quad (5)$$

where $N = 2^{k+1}$,

$$X_i = I_{2^{i-1}} \otimes \operatorname{blkdiag}\left(I_{\frac{N}{2^i}}, -I_{\frac{N}{2^i}}\right), \quad (6)$$

and the $a_i, b_i$ satisfy $a_i^2 - b_i^2 = -1$, for all $i \in \{1, \dots, k\}$.²

Theorem 1: There exists a finite field $\mathbb{F}_q$ of order $q \ge 2k+3$ and explicit constants $a_i, b_i \in \mathbb{F}_q$, $\forall i \in \{1, \dots, k\}$, such that the $(k+2, k)$ storage code in (5) is a repair optimal MDS storage code.

In Fig. 3, we give the coding matrices of a $(5, 3)$ MDS code over $\mathbb{F}_{11}$ based on our construction.

Remark 2: The code constructions presented here have generator matrices that are as sparse as possible, since any additional sparsity would violate the MDS property. This creates the additional benefit of minimum update complexity when some bits of the stored data object change.

Before we proceed with proving Theorem 1, we state the intuition behind our code construction and the tools that we use. Motivated by the asymptotic IA schemes, we use similar concepts through a combinatorial explanation of interference alignment in terms of dots on lattices. In contrast to the asymptotic IA codes, here, instead of letting randomness choose the coding matrices, we select particular constructions based on Hadamard matrices that achieve exact interference alignment for file sizes (symbol extensions) that are fixed in $k$. In Section V we prove the optimal repair of systematic nodes, in Section VII we show the optimal repair of parity nodes, and in Section VIII we state explicit conditions for the MDS property.

² We use $-1$ to denote the field element $q-1$ of $\mathbb{F}_q$.

IV. DOTS-ON-A-LATTICE AND HADAMARD DESIGNS

In Section II, we showed that minimizing the communication bandwidth to repair nodes of a storage code is equivalent to the problem of minimizing the dimensions of the interference terms generated during each repair process.
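Before developing this machinery, the construction in (5)–(6) can be instantiated concretely in a few lines. The following plain-Python sketch (our illustration) builds the diagonal matrices $X_i$ of (6), represented by their diagonals, for the $(5, 3)$ example over $\mathbb{F}_{11}$, and checks both the involution property $X_i^2 = I_N$ and the constant condition $a_i^2 - b_i^2 = -1$ for the constants of Fig. 3:

```python
# The diagonal +/-1 matrices X_i of Eq. (6), stored as their diagonals,
# for the (5,3) code of Fig. 3 over F_11; (a_i, b_i) = (9,7), (5,2), (9,4).
q, k = 11, 3
N = 2 ** (k + 1)  # subpacketization N = 16

def X_diag(i, N):
    # X_i = I_{2^(i-1)} kron blkdiag(I_{N/2^i}, -I_{N/2^i}):
    # the diagonal alternates +1/-1 in blocks of length N / 2^i.
    block = N >> i
    return [1 if (j // block) % 2 == 0 else -1 for j in range(N)]

X = {i: X_diag(i, N) for i in range(1, k + 2)}

# Each X_i is a diagonal involution: X_i^2 = I_N.
assert all(s * s == 1 for d in X.values() for s in d)

# The constants of Fig. 3 satisfy a_i^2 - b_i^2 = -1 in F_11 (-1 is q-1 = 10).
consts = {1: (9, 7), 2: (5, 2), 3: (9, 4)}
assert all((a * a - b * b) % q == q - 1 for a, b in consts.values())

# Diagonal of the coding matrix A_i = a_i X_i + b_i X_{k+1} + I_N, mod 11.
def A_diag(i):
    a, b = consts[i]
    return [(a * X[i][j] + b * X[k + 1][j] + 1) % q for j in range(N)]

print([A_diag(i) for i in (1, 2, 3)])
```

Since all the matrices involved are diagonal, storing only the diagonals suffices, which is also what makes the generator matrices of Remark 2 as sparse as possible.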
Here, we consider the problem of designing coding and repair matrices that can achieve perfect interference alignment in a finite number of extensions. We begin by assuming arbitrary constructions, and then use a combinatorial explanation of IA to find conditions under which perfect alignment is possible in the finite file-size regime. Eventually, we show that the exact IA conditions and the linear independence requirements posed by our problem are simultaneously satisfied through the use of Hadamard designs.

Assume two arbitrary $N \times N$ full rank matrices $T_1$ and $T_2$ that commute. We wish to construct a full rank matrix $V$, with at most $\frac{N}{2}$ columns, such that the span of $T_1V$ aligns as much as possible with the span of $T_2V$: we have to pick $V$ such that it minimizes the dimension of the union of the two spans, that is, the rank of $[T_1V\ T_2V]$. How can we construct such a matrix? Assume that we start with one vector with nonzero entries, i.e., $V = w$, and for simplicity let it be the all-ones vector. Then, in the general case, the spans of $T_1w$ and $T_2w$ have zero intersection, which is not desired. However, we can augment $V$ such that it has as columns the elements of the set $\{w, T_1w, T_2w, T_1T_2w\}$. Observe that each vector $T_1^{x_1}T_2^{x_2}w$ of $V$ can be represented by the power tuple $(x_1, x_2)$. This helps us visualize $V$ as a set of dots on the 2-dimensional integer lattice, as shown in Fig. 4.

Fig. 4. Representing $V$ as dots on a lattice: the columns $w, T_1w, T_2w, T_1T_2w$ map to the points $(0,0), (1,0), (0,1), (1,1)$ on the $(x_1, x_2)$ plane.

For this new selection of $V$, we have

$$T_1V = \left[T_1w\ \ T_1^2w\ \ T_1T_2w\ \ T_1^2T_2w\right] \quad (7)$$

and

$$T_2V = \left[T_2w\ \ T_1T_2w\ \ T_2^2w\ \ T_1T_2^2w\right]. \quad (8)$$

The intersection of the spans of these two matrices is now nonzero: the matrix $[T_1V\ T_2V]$ has rank 7 instead of the maximum possible of 8. This happens because the vector $T_1T_2w$ is repeated in both matrices $T_1V$ and $T_2V$. In Fig. 5 we illustrate this concatenation in terms of dots on $\mathbb{Z}^2$, where the intersection between the two spans is manifested as an overlap of dots.

Fig. 5. Representing $[T_1V\ T_2V]$ as dots on a lattice.

Remark 3: Observe how matrix multiplication of $T_1$ and $T_2$ with the vectors in $V$ is pronounced through the dots representation: the dot representations of $T_1V$ and $T_2V$ are shifted versions of $V$ along the $x_1$ and $x_2$ axes.

The key idea behind choosing a new $V$ at each step is to iteratively augment the old one with products of the $T_i$ matrices, raised to specific powers, times the current $V$:

initialize: $V \leftarrow w$ (9)
multiply with powers of $T_1$: $V \leftarrow \left[V\ \ T_1V\ \dots\ T_1^{m-1}V\right]$ (10)
multiply with powers of $T_2$: $V \leftarrow \left[V\ \ T_2V\ \dots\ T_2^{m-1}V\right]$. (11)

In general, by using powers up to $m$, with $m^2 \le \frac{N}{2}$, we obtain $V$ with $m^2$ columns that are the elements of the set

$$\mathcal{V} = \left\{T_1^{x_1}T_2^{x_2}w : x_1, x_2 \in \{0, \dots, m-1\}\right\}, \quad (12)$$

where $w = 1_{N \times 1}$. Then, the matrix $V$ achieves the following property

$$m^2 < \operatorname{rank}\left(\left[T_1V\ T_2V\right]\right) < (m+1)^2, \quad (13)$$

which means that we can asymptotically create as much alignment as we desire within the spans of the matrices $T_iV$: for arbitrarily large "symbol extensions," i.e., for sufficiently large $N$, the ratio $(m+1)^2/m^2$ is arbitrarily close to 1. For example, we give the $m = 4$ case in Fig. 6, where we observe that the alignment is more substantial (with respect to the size of $V$) compared to Fig. 5.

Fig. 6. Representing $[T_1V\ T_2V]$ as dots on a lattice, for $m = 4$.

This alignment scheme, in a more general form, was presented by Cadambe and Jafar in [11] to prove the Degrees-of-Freedom of the $K$-user interference channel. For that wireless scenario, the $T_i$ matrices are given by nature and are i.i.d. diagonals. Perfect alignment of spaces for these matrices is not known to be possible for finite $m$ [11], [28].
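The rank-7 overlap in the $m = 2$ example can be reproduced exactly. In the sketch below (our illustration; the diagonal entries $u_j$ and $u_j^3$ are arbitrary generic choices, not from the paper), $V$ collects the four dots $\{w, T_1w, T_2w, T_1T_2w\}$ and an exact rational Gaussian elimination confirms that $[T_1V\ T_2V]$ has rank 7 rather than 8:

```python
# Rank of [T_1 V  T_2 V] for the m = 2 dots-on-a-lattice example.
# T_1 = diag(u_j), T_2 = diag(u_j^3) are generic commuting diagonals
# (an arbitrary illustrative choice); w is the all-ones vector.
from fractions import Fraction
from itertools import product

N = 8
u = list(range(2, 2 + N))
t1, t2 = u, [x ** 3 for x in u]

def col(x1, x2):
    # Column T_1^{x1} T_2^{x2} w, i.e., the dot (x1, x2).
    return [t1[j] ** x1 * t2[j] ** x2 for j in range(N)]

V = [col(x1, x2) for x1, x2 in product((0, 1), repeat=2)]
T1V = [[t1[j] * c[j] for j in range(N)] for c in V]  # dots shifted along x1
T2V = [[t2[j] * c[j] for j in range(N)] for c in V]  # dots shifted along x2

def rank(cols):
    # Exact Gaussian elimination over the rationals; columns stored as rows.
    rows = [[Fraction(x) for x in c] for c in cols]
    r = 0
    for c in range(N):
        p = next((t for t in range(r, len(rows)) if rows[t][c]), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for t in range(len(rows)):
            if t != r and rows[t][c]:
                f = rows[t][c]
                rows[t] = [a - f * b for a, b in zip(rows[t], rows[r])]
        r += 1
    return r

# Eight columns, but the dot (1, 1) -- the vector T_1 T_2 w -- occurs in both.
assert rank(T1V + T2V) == 7
print("rank([T1V T2V]) =", rank(T1V + T2V))
```

The shared dot is visible combinatorially as well: the shifted grids $\{(x_1+1, x_2)\}$ and $\{(x_1, x_2+1)\}$ with $x_1, x_2 \in \{0, 1\}$ cover only 7 distinct lattice points.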
For network coding problems, and in particular for storage coding problems, the analogous $T_i$ matrices (our coding matrices) are free to design under some specific constraints that ensure the MDS property of the code. Before we give explicit matrices that achieve alignment in a finite number of extensions, we answer the analogous question for our toy example: do there exist $T_1$ and $T_2$ matrices such that we can construct a full-rank $V$ that achieves perfect intersection (exact alignment) of the spans of $T_1V$ and $T_2V$, for some $m$ and $N = m^3$? That is, can we find matrices such that

$$\operatorname{span}(T_1V) = \operatorname{span}(T_2V) \ \text{ and } \ \operatorname{rank}(V) = m^2 \quad (14)$$

is possible? We show that a sufficient condition for perfect alignment is satisfied when the elements of the matrices are $m$-th roots of unity, i.e.,

$$T_i^m = I_N. \quad (15)$$

To see this, we formally state the dots-on-a-lattice representation. Let $\mathcal{L}$ be a map from a matrix with $r$ columns, each generated as $T_1^{x_1}T_2^{x_2}w$, to a set of $r$ points, such that the column $T_1^{x_1}T_2^{x_2}w$ maps to the point $(x_1, x_2)$. Then, for $V$ we have

$$\mathcal{L}(V) \triangleq \{x_1e_1 + x_2e_2 : x_1, x_2 \in [m]\}, \quad (16)$$

where $[m] = \{0, \dots, m-1\}$ and $e_i$ is the $i$-th column of the identity matrix. Using this representation, the products $T_1V$ and $T_2V$ map to

$$\mathcal{L}(T_1V) = \{(x_1+1)e_1 + x_2e_2 : x_1, x_2 \in [m]\} \ \text{ and } \ \mathcal{L}(T_2V) = \{x_1e_1 + (x_2+1)e_2 : x_1, x_2 \in [m]\}, \quad (17)$$

respectively. For perfect alignment, we have to design the $T_i$ matrices such that

$$\mathcal{L}(T_1V) = \mathcal{L}(T_2V). \quad (18)$$

A sufficient set of conditions for perfect span intersection is that $V$, $T_1V$, and $T_2V$ perfectly intersect, i.e.,

$$\mathcal{L}(T_1V) = \mathcal{L}(V) \Leftrightarrow \{(x_1+1)e_1 + x_2e_2 : x_1, x_2 \in [m]\} = \{x_1e_1 + x_2e_2 : x_1, x_2 \in [m]\}, \quad (19)$$

$$\mathcal{L}(T_2V) = \mathcal{L}(V) \Leftrightarrow \{x_1e_1 + (x_2+1)e_2 : x_1, x_2 \in [m]\} = \{x_1e_1 + x_2e_2 : x_1, x_2 \in [m]\}. \quad (20)$$

The above conditions are satisfied when the matrix powers "wrap around" upon reaching a certain modulus $m$. This wrap-around property is obtained when the $T_1$ and $T_2$ matrices have elements that are $m$-th roots of unity:

$$T_1^m = T_1^0 = T_2^m = T_2^0 = I_N. \quad (21)$$

However, arbitrary diagonal matrices whose elements are $m$-th roots of unity are not sufficient to ensure the full rank property of $V$. To hint at a general procedure which outputs "good" $T_i$ matrices, we examine an example where we pick them such that $V$ has orthogonal columns. Let us briefly consider the case where $m = 2$ and $N = 2^3$, for which we choose

$$T_1 = \operatorname{diag}(1, -1, 1, -1, 1, -1, 1, -1) \ \text{ and } \ T_2 = \operatorname{diag}(1, 1, -1, -1, 1, 1, -1, -1). \quad (22)$$

For these matrices, $V$ has $m^2 = 4$ orthogonal columns

$$V = [w\ \ T_1w\ \ T_2w\ \ T_1T_2w] = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} \quad (23)$$

and $T_1V = [T_1w\ \ w\ \ T_1T_2w\ \ T_2w]$ and $T_2V = [T_2w\ \ T_1T_2w\ \ w\ \ T_1w]$ indeed have fully overlapping spans. Interestingly, we observe that for the additional matrix

$$T_3 = \operatorname{diag}(1, 1, 1, 1, -1, -1, -1, -1) \quad (24)$$

we have that $[V\ T_3V] = H_8$, where $H_8$ is the $8 \times 8$ Hadamard matrix. In the following, we see that Hadamard designs provide the conditions for perfect alignment and linear independence in a more general setting. Let $m = 2$, $N = 2^L$, and $X_i = I_{2^{i-1}} \otimes \operatorname{blkdiag}\left(I_{\frac{N}{2^i}}, -I_{\frac{N}{2^i}}\right)$, for $i \in [L]$, and consider the set

$$\mathcal{H}_N = \left\{\prod_{i=1}^{L} X_i^{x_i}w : x_i \in \{0, 1\}\right\}. \quad (25)$$

Lemma 1: Let $H_N$ be the $N \times N$ Hadamard matrix of Sylvester's construction,

$$H_N \triangleq \begin{bmatrix} H_{\frac{N}{2}} & H_{\frac{N}{2}} \\ H_{\frac{N}{2}} & -H_{\frac{N}{2}} \end{bmatrix}, \quad (26)$$

with $H_1 = 1$. Then, $H_N$ is full rank with mutually orthogonal columns, which are the $N$ elements of $\mathcal{H}_N$.

The proof of Lemma 1 can be found in the Appendix.
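The observation in (24) is easy to verify mechanically. The sketch below (our plain-Python illustration) builds $V = [w\ T_1w\ T_2w\ T_1T_2w]$ from the diagonals in (22), multiplies it by $T_3$, and checks that the eight resulting columns are exactly the columns of the Sylvester matrix $H_8$ of (26), pairwise orthogonal as Lemma 1 asserts:

```python
# Check that [V  T_3 V] has exactly the columns of the Sylvester Hadamard H_8.
T1 = [1, -1, 1, -1, 1, -1, 1, -1]
T2 = [1, 1, -1, -1, 1, 1, -1, -1]
T3 = [1, 1, 1, 1, -1, -1, -1, -1]
w = [1] * 8

def mul(sign, vec):
    # Apply a diagonal +/-1 matrix (given by its diagonal) to a vector.
    return [s * v for s, v in zip(sign, vec)]

V = [w, mul(T1, w), mul(T2, w), mul(T1, mul(T2, w))]
cols = {tuple(c) for c in V} | {tuple(mul(T3, c)) for c in V}

def sylvester(n):
    # H_1 = [1]; H_2N = [[H_N, H_N], [H_N, -H_N]] (Eq. (26)).
    H = [[1]]
    while len(H) < n:
        H = [r + r for r in H] + [r + [-x for x in r] for r in H]
    return H

assert cols == {tuple(r) for r in sylvester(8)}  # H_8 is symmetric: rows = columns
# Lemma 1: the N columns are mutually orthogonal.
assert all(sum(a * b for a, b in zip(c1, c2)) == 0
           for c1 in cols for c2 in cols if c1 != c2)
print("[V T3V] has the columns of H_8, mutually orthogonal")
```

Here $T_1, T_2, T_3$ are exactly the matrices $X_3, X_2, X_1$ of (6) for $N = 8$, which is why their products applied to $w$ exhaust the set $\mathcal{H}_8$ of (25).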
Example: To illustrate the connection between $H_N$ and $\mathcal{H}_N$, we "decompose" the Hadamard matrix of order 4:

$$H_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix} = [w\ \ X_2w\ \ X_1w\ \ X_2X_1w], \quad (27)$$

where $X_1 = \operatorname{diag}(1, 1, -1, -1)$ and $X_2 = \operatorname{diag}(1, -1, 1, -1)$. Due to the commutativity of $X_1$ and $X_2$, the columns of $H_4$ are also the elements of $\mathcal{H}_4 = \{w, X_1w, X_2w, X_1X_2w\}$.

Now, consider the matrix $V_i$ that has as columns the elements of

$$\mathcal{V}_i = \left\{\prod_{s=1, s\neq i}^{L} X_s^{x_s}w : x_s \in \{0, 1\}\right\}. \quad (28)$$

We know that the space of $V_i$ is invariant with respect to $X_j$, $j \neq i$, since the corresponding lattice representation wraps around itself due to $X_j^2 = I_N$. Additionally, we have

$$\mathcal{L}(X_iV_i) = \left\{e_i + \sum_{s=1, s\neq i}^{L} x_se_s : x_s \in \{0, 1\}\right\},$$

and we observe that $\mathcal{L}(X_iV_i) \cap \mathcal{L}(V_i) = \emptyset$, i.e., $\mathcal{L}(V_i)$ does not include any points with nonzero $x_i$ coordinate. Then, due to the orthogonality of the elements within $\mathcal{H}_N$, we have

$$|\mathcal{L}(V_i)| = |\mathcal{L}(X_jV_i)| = \operatorname{rank}(V_i) = \operatorname{rank}(X_iV_i) = \frac{N}{2}, \quad (29)$$

for any $i, j \in \{1, \dots, L\}$. Hence, we obtain the following lemma for the set $\mathcal{H}_N$ and its associated $\mathcal{L}$ map.

Lemma 2: For any $i, j \in \{1, 2, \dots, L\}$ we have that

$$\operatorname{rank}([V_i\ X_jV_i]) = |\mathcal{L}(V_i) \cup \mathcal{L}(X_jV_i)| = \begin{cases} N, & i = j, \\ \frac{N}{2}, & i \neq j. \end{cases} \quad (30)$$

In Fig. 7 we give an illustrative example of the aforementioned definitions and properties: for $N = 2^3$, we consider $H_8$ and $V_3$, along with the matrix product $X_2V_3$ and their corresponding lattice representations.

Fig. 7. For $N = 8$, the dots representations of $H_8$, $V_3$, and $X_2V_3$.

We use the aforementioned properties of Hadamard matrices to construct repair matrices $V_i$ for our code construction; these matrices have perfect space alignment properties for the repair instances of the code in (5) induced by single node failures.
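Lemma 2 can also be checked computationally. The following sketch (our illustration) enumerates the sets $\mathcal{V}_i$ of (28) for $N = 8$ ($L = 3$) and confirms, by exact rank computation over the rationals, that $[V_i\ X_jV_i]$ has rank $N$ when $i = j$ and rank $\frac{N}{2}$ otherwise:

```python
# Exhaustive check of Lemma 2 for N = 8 (L = 3).
from fractions import Fraction
from itertools import product

L, N = 3, 8

def X_diag(i):
    # Diagonal of X_i = I_{2^(i-1)} kron blkdiag(I_{N/2^i}, -I_{N/2^i}).
    block = N >> i
    return [1 if (j // block) % 2 == 0 else -1 for j in range(N)]

X = {i: X_diag(i) for i in range(1, L + 1)}
mul = lambda sign, vec: [s * v for s, v in zip(sign, vec)]

def V_cols(i):
    # Columns prod_{s != i} X_s^{x_s} w, x_s in {0, 1} (Eq. (28)).
    cols, others = [], [s for s in range(1, L + 1) if s != i]
    for bits in product((0, 1), repeat=L - 1):
        c = [1] * N
        for s, b in zip(others, bits):
            if b:
                c = mul(X[s], c)
        cols.append(c)
    return cols

def rank(cols):
    # Exact Gaussian elimination over the rationals; columns stored as rows.
    rows = [[Fraction(x) for x in c] for c in cols]
    r = 0
    for c in range(N):
        p = next((t for t in range(r, len(rows)) if rows[t][c]), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        rows[r] = [x / rows[r][c] for x in rows[r]]
        for t in range(len(rows)):
            if t != r and rows[t][c]:
                f = rows[t][c]
                rows[t] = [a - f * b for a, b in zip(rows[t], rows[r])]
        r += 1
    return r

for i in range(1, L + 1):
    Vi = V_cols(i)
    for j in range(1, L + 1):
        assert rank(Vi + [mul(X[j], c) for c in Vi]) == (N if i == j else N // 2)
print("Lemma 2 holds for N = 8")
```

The $i \neq j$ case reflects the wrap-around of the lattice ($X_j^2 = I_N$), while the $i = j$ case reflects that $\mathcal{L}(V_i)$ and $\mathcal{L}(X_iV_i)$ are disjoint and together exhaust $\mathcal{L}(\mathcal{H}_N)$.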
Remark 4: Notice that equations (22) and (23) are respectively analogous to the channel matrices and beamforming vectors used for ergodic interference alignment in wireless channels [25]. In particular, for the $K$-user interference channel, the channel matrices used for ergodic alignment are diagonalized versions of the column vectors of $H_2$.

V. OPTIMAL SYSTEMATIC NODE REPAIR

Let systematic node $i \in \{1, \dots, k\}$ of the code in (5) fail. The coding matrix $A_i$ corresponding to the lost systematic piece $f_i$ holds one matrix, namely $X_i$, which is unique among all other coding matrices $A_s$, $s \in \{1, \dots, k\}\backslash i$. We pick the repair matrix as a set of $\frac{N}{2}$ vectors whose lattice representation is invariant to all $X_j$s but one key matrix: the unique $X_i$ component of $A_i$. We construct the $N \times \frac{N}{2}$ repair matrix $V_i$ whose columns are the elements of the set

$$\mathcal{V}_i = \left\{\prod_{s=1, s\neq i}^{k+1} X_s^{x_s}w : x_s \in \{0, 1\}\right\}. \quad (31)$$

This repair matrix is used to multiply the contents of both parity nodes 1 and 2, that is, $V_i^{(1)} = V_i^{(2)} = V_i$. During the repair, the useful (desired signal) space populated by $f_i$ is

$$[V_i\ \ A_iV_i] \quad (32)$$

and the interference space due to file part $f_s$, $s \in \{1, \dots, k\}\backslash i$, is

$$[V_i\ \ A_sV_i]. \quad (33)$$

Remember that an optimal solution to $R_i$ requires the useful space to have rank $N$ and each of the interference spaces rank $\frac{N}{2}$. Observe that the following holds for each of the interference spaces:

$$\frac{N}{2} \le \operatorname{rank}([V_i\ \ (a_sX_s + b_sX_{k+1} + I_N)V_i]) \le |\mathcal{L}(V_i) \cup \mathcal{L}(X_sV_i) \cup \mathcal{L}(X_{k+1}V_i)| = |\mathcal{L}(V_i)| = \frac{N}{2}, \quad (34)$$

for $s \in \{1, \dots, k\}\backslash i$, since

$$\mathcal{L}(X_sV_i) = \mathcal{L}(V_i), \quad s \in \{1, \dots, k+1\}\backslash i. \quad (35)$$

Then, for the useful data space we have

$$N \ge \operatorname{rank}([V_i\ \ A_iV_i]) = \operatorname{rank}([V_i\ \ (a_iX_i + b_iX_{k+1} + I_N)V_i]) \stackrel{(*)}{=} \operatorname{rank}([V_i\ \ X_iV_i]) = |\mathcal{L}(V_i) \cup \mathcal{L}(X_iV_i)| = |\mathcal{L}(\mathcal{H}_N)| = N, \quad (36)$$

for any $a_i \neq 0$, where $(*)$ comes from the fact that $(a_iX_i + b_iX_{k+1} + I_N)V_i$ is a linear combination of columns from $V_i$, $X_{k+1}V_i$, and $X_iV_i$. The column spans of $V_i$ and $X_{k+1}V_i$ are identical, hence we can generate the columns of $(a_iX_i + b_iX_{k+1} + I_N)V_i$ by linear combinations of the columns in $X_iV_i$ and in $V_i$; moreover, $V_i$ is already in the concatenation $[V_i\ \ (a_iX_i + b_iX_{k+1} + I_N)V_i]$. This means that $[V_i\ X_iV_i]$ and $[V_i\ \ (a_iX_i + b_iX_{k+1} + I_N)V_i]$ have the same span. Therefore, we are able to generate the minimum amount of interference and, at the same time, satisfy the full rank constraint of $R_i$. The repair matrix in (31) is an optimal solution for $R_i$, and systematic node $i$ can be optimally repaired by downloading $(k+1)\frac{N}{2}$ worth of data equations, for all $i \in \{1, \dots, k\}$.

In Fig. 8, we sketch the structure of our code. In each block of the second parity we denote the key matrices that comprise it. We select our repair matrix such that it "absorbs" all matrices but the key one. That way, the interference aligns in half the dimensions, and the useful space spans all $N$ dimensions.

Fig. 8. A $(5, 3)$ repair optimal code: during the repair of systematic node 1 with repair matrix $V_1$, the interference aligns in $N/2$ dimensions while the desired data spans all $N$ dimensions.

VI. OPTIMAL PARITY REPAIR

The ingredient of our construction that "unlocks" optimal repair for the first parity is the inclusion of the identity matrix in each $A_i$. The same goes for the $X_{k+1}$ matrix and the repair of the second parity. Both of these additionally included matrices refine the parity repair process such that optimality is feasible.
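Before treating the parities, the systematic repair argument of Section V can be verified end to end on the $(5, 3)$ code of Fig. 3. The sketch below (our illustration) builds the repair matrix $V_1$ of (31) and checks, by Gaussian elimination over $\mathbb{F}_{11}$, the two rank conditions of $R_1$: the useful space $[V_1\ A_1V_1]$ has full rank $N = 16$, while each interference space $[V_1\ A_sV_1]$ has rank $\frac{N}{2} = 8$:

```python
# Systematic repair check for the (5,3) code of Fig. 3 over F_11.
from itertools import product

q, k = 11, 3
N = 2 ** (k + 1)                              # N = 16
consts = {1: (9, 7), 2: (5, 2), 3: (9, 4)}    # (a_i, b_i) from Fig. 3

def X_diag(i):
    block = N >> i
    return [1 if (j // block) % 2 == 0 else -1 for j in range(N)]

X = {i: X_diag(i) for i in range(1, k + 2)}
A = {i: [(a * X[i][j] + b * X[k + 1][j] + 1) % q for j in range(N)]
     for i, (a, b) in consts.items()}

# Repair matrix V_1: the 8 columns prod_{s=2..4} X_s^{x_s} w of Eq. (31).
V1 = []
for bits in product((0, 1), repeat=k):
    c = [1] * N
    for s, b in zip(range(2, k + 2), bits):
        if b:
            c = [x * y for x, y in zip(X[s], c)]
    V1.append([x % q for x in c])

def rank_mod(cols):
    # Gaussian elimination over F_11; columns stored as rows.
    rows = [list(c) for c in cols]
    r = 0
    for c in range(N):
        p = next((t for t in range(r, len(rows)) if rows[t][c] % q), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        inv = pow(rows[r][c], q - 2, q)
        rows[r] = [x * inv % q for x in rows[r]]
        for t in range(len(rows)):
            if t != r and rows[t][c] % q:
                f = rows[t][c]
                rows[t] = [(a - f * b) % q for a, b in zip(rows[t], rows[r])]
        r += 1
    return r

AV = lambda i: [[A[i][j] * c[j] % q for j in range(N)] for c in V1]
assert rank_mod(V1 + AV(1)) == N        # useful space: full rank 16
assert rank_mod(V1 + AV(2)) == N // 2   # interference aligned in 8 dims
assert rank_mod(V1 + AV(3)) == N // 2
print("repair of node 1 downloads", (k + 1) * N // 2, "symbols (optimal)")
```

Because all the coding matrices are diagonal, each product $A_sV_1$ is just an entrywise scaling of the columns of $V_1$, which keeps the check compact.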
Selecting appropriate constants $a_i$ and $b_i$ is also essential to our developments. To optimally solve the problem, we rewrite the parity repair as a systematic one in an equivalent re-interpretation of our code.

A. Repairing the first parity

Let the first parity node fail. We make a change of variables to obtain a new representation of our code in (5), in which the first parity is a systematic node. We start with our $(k+2, k)$ MDS storage code of (5),

$\begin{bmatrix} I_N & 0_N & \dots & 0_N \\ 0_N & I_N & \dots & 0_N \\ \vdots & \vdots & \ddots & \vdots \\ 0_N & 0_N & \dots & I_N \\ I_N & I_N & \dots & I_N \\ A_1 & A_2 & \dots & A_k \end{bmatrix} f, \quad (37)$

and make the following change of variables:

$\sum_{i=1}^{k} f_i = y_1, \quad (38)$

$f_s = y_s, \ s \in \{2,\dots,k\}. \quad (39)$

We solve (38) and (39) for $f_1$ in terms of the $y_i$ variables and obtain

$f_1 = y_1 - \sum_{s=2}^{k} y_s. \quad (40)$

Then, we plug (39) and (40) into (37) to obtain the equivalent representation

$\begin{bmatrix} I_N & -I_N & \dots & -I_N \\ 0_N & I_N & \dots & 0_N \\ \vdots & \vdots & \ddots & \vdots \\ 0_N & 0_N & \dots & I_N \\ I_N & 0_N & \dots & 0_N \\ A_1 & A_2 - A_1 & \dots & A_k - A_1 \end{bmatrix} y, \quad (41)$

where $y = \left[y_1^T \dots y_k^T\right]^T \in \mathbb{F}_q^{kN}$. The first parity node of the code in (5) now corresponds to the node that contains $y_1$ in this representation. The coding matrices under the new representation are

$A_1 = a_1X_1 + b_1X_{k+1} + I_N, \quad (42)$

$A_s - A_1 = a_sX_s + (b_s - b_1)X_{k+1} - a_1X_1, \quad (43)$

for $s \in \{2,\dots,k\}$. In contrast to the systematic node repair process, in the following we use a repair matrix of a slightly different structure. We construct the repair matrix $V_a$ with columns in the set

$\mathcal{V}_a = \left\{ \prod_{s=2}^{k+1} (X_1X_s)^{x_s} w \,:\, x_s \in \{0,1\} \right\}. \quad (44)$

Observe that this set is also a subset of $\mathcal{H}_N$. Then, to repair the node of (41) that contains $y_1$ (i.e., the one that corresponds to the first parity node of (37)), we download $X_1V_a$ times the contents of the first parity in (41) and $V_a$ times the contents of the second parity.
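The choice of $V_a$ and of the downloaded combinations can be sanity-checked numerically before the formal argument. The sketch below is our own illustrative code over the reals (with $k = 3$, $N = 16$), not the paper's; as before, real ranks suffice because all the spaces involved are spanned by $\pm 1$ Hadamard columns.

```python
import numpy as np
from itertools import product

k = 3
N = 2 ** (k + 1)                       # N = 16

def X(i):
    # X_i = I_{2^{i-1}} (x) blkdiag(I_{N/2^i}, -I_{N/2^i})
    blk = np.diag(np.concatenate([np.ones(N // 2 ** i), -np.ones(N // 2 ** i)]))
    return np.kron(np.eye(2 ** (i - 1)), blk)

w = np.ones((N, 1))

# V_a columns: prod_{s=2}^{k+1} (X_1 X_s)^{x_s} w, as in (44)
cols = []
for xs in product([0, 1], repeat=k):   # exponents for s = 2, ..., k+1
    v = w
    for s, x in zip(range(2, k + 2), xs):
        if x:
            v = X(1) @ X(s) @ v
    cols.append(v)
Va = np.hstack(cols)

# The useful space [X_1 V_a  V_a] spans all N dimensions ...
assert np.linalg.matrix_rank(np.hstack([X(1) @ Va, Va])) == N
# ... while each interference space [X_1 V_a  X_s V_a] stays in N/2 dimensions.
for s in range(2, k + 2):
    assert np.linalg.matrix_rank(np.hstack([X(1) @ Va, X(s) @ Va])) == N // 2
```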
Hence, during this repair, the useful space is spanned by

$[X_1V_a\ \ A_1V_a] \quad (45)$

and the interference space due to file part $y_s$, $s \in \{2,\dots,k\}$, is

$[X_1V_a\ \ (A_s - A_1)V_a]. \quad (46)$

Before we proceed, observe that the following hold:

$\mathcal{L}(X_1X_sV_a) = \mathcal{L}(V_a) = \left\{\left(\sum_{s=2}^{k+1} x_s \ (\mathrm{mod}\ 2)\right)e_1 + \sum_{s=2}^{k+1} x_se_s \,;\, x_s \in \{0,1\}\right\} \quad (47)$

$\Leftrightarrow \mathcal{L}(X_1V_a) = \mathcal{L}(X_sV_a) = \left\{\left(1 + \sum_{s=2}^{k+1} x_s \ (\mathrm{mod}\ 2)\right)e_1 + \sum_{s=2}^{k+1} x_se_s \,;\, x_s \in \{0,1\}\right\} \quad (48)$

$\Rightarrow \mathcal{L}(X_{s_1}V_a) = \mathcal{L}(X_{s_2}V_a), \quad (49)$

for any $s, s_1, s_2 \in \{1,\dots,k+1\}$. The above equations imply that

$\mathcal{L}(V_a) \cup \mathcal{L}(X_1V_a) = \left\{\sum_{s=1}^{k+1} x_se_s \,;\, x_s \in \{0,1\}\right\} = \mathcal{L}(H_N). \quad (50)$

Therefore, we have the following for each of the interference spaces:

$\frac{N}{2} \le \mathrm{rank}\left(\left[X_1V_a\ (a_sX_s + (b_s - b_1)X_{k+1} - a_1X_1)V_a\right]\right) \le |\mathcal{L}(X_1V_a) \cup \mathcal{L}(X_sV_a) \cup \mathcal{L}(X_{k+1}V_a)| = |\mathcal{L}(X_1V_a)| = \frac{N}{2}. \quad (51)$

Moreover, for the useful data space we have

$\mathrm{rank}\left(\left[X_1V_a\ (a_1X_1 + b_1X_{k+1} + I_N)V_a\right]\right) = \mathrm{rank}([X_1V_a\ V_a]) = |\mathcal{L}(V_a) \cup \mathcal{L}(X_1V_a)| = |\mathcal{L}(H_N)| = N. \quad (52)$

Thus, we can perform optimal repair of the node containing $y_1$ in (41), which is equivalent to optimally repairing the first parity of our code in (5).

B. Repairing the second parity

Here, we have an additional step. We first manipulate the coding matrices of (5) to obtain an equivalent representation of the same code; then, in the same manner as before, we rewrite this code in a form where the second parity of (5) is a systematic node. Without loss of generality, we can multiply any coding column block that encodes the $i$th file part,

$\begin{bmatrix} I_N \\ A_i \end{bmatrix} = \begin{bmatrix} I_N \\ a_iX_i + b_iX_{k+1} + I_N \end{bmatrix}, \quad (53)$

by a full-rank matrix and maintain the same code properties, as shown in [20]. In the following derivations, we use the fact that $X_s^2 = I_N$ for any $s \in \{1,\dots,k+1\}$.
We multiply the $i$th block of (5) by $a_iX_i - b_iX_{k+1} + I_N$ to obtain

$\begin{bmatrix} I_N \\ a_iX_i + b_iX_{k+1} + I_N \end{bmatrix} \equiv \begin{bmatrix} a_iX_i - b_iX_{k+1} + I_N \\ (a_iX_i - b_iX_{k+1} + I_N)(a_iX_i + b_iX_{k+1} + I_N) \end{bmatrix} = \begin{bmatrix} a_iX_i - b_iX_{k+1} + I_N \\ (a_iX_i + I_N)^2 - b_i^2I_N \end{bmatrix} = \begin{bmatrix} a_iX_i - b_iX_{k+1} + I_N \\ 2a_iX_i + (a_i^2 - b_i^2 + 1)I_N \end{bmatrix} \overset{(*)}{=} \begin{bmatrix} a_iX_i - b_iX_{k+1} + I_N \\ 2a_iX_i \end{bmatrix}, \quad (54)$

where in $(*)$ we use the fact that $a_i^2 - b_i^2 + 1 = 0$. We continue by multiplying the $i$th column block by $a_i^{-1}X_i$ to obtain

$\begin{bmatrix} a_iX_i - b_iX_{k+1} + I_N \\ 2a_iX_i \end{bmatrix} \equiv \begin{bmatrix} I_N - a_i^{-1}b_iX_{k+1}X_i + a_i^{-1}X_i \\ 2I_N \end{bmatrix} \equiv \begin{bmatrix} I_N - a_i^{-1}b_iX_{k+1}X_i + a_i^{-1}X_i \\ I_N \end{bmatrix}, \quad (55)$

where in the last step we multiplied the contents of the second parity by $2^{-1}$. Hence, let

$A_i' = I_N - a_i^{-1}b_iX_{k+1}X_i + a_i^{-1}X_i, \quad i \in \{1,\dots,k\}. \quad (56)$

Then, we rewrite our original code as

$\begin{bmatrix} I_N & 0_N & \dots & 0_N \\ 0_N & I_N & \dots & 0_N \\ \vdots & \vdots & \ddots & \vdots \\ 0_N & 0_N & \dots & I_N \\ A_1' & A_2' & \dots & A_k' \\ I_N & I_N & \dots & I_N \end{bmatrix} f', \quad (57)$

where $f'$ is a full-rank row transformation of $f$. We proceed in the same manner as for the first parity repair: we make a change of variables such that the second parity becomes a systematic node in a new representation,

$\sum_{i=1}^{k} f_i' = y_1', \quad (58)$

and obtain the equivalent form

$\begin{bmatrix} I_N & -I_N & \dots & -I_N \\ 0_N & I_N & \dots & 0_N \\ \vdots & \vdots & \ddots & \vdots \\ 0_N & 0_N & \dots & I_N \\ A_1' & A_2' - A_1' & \dots & A_k' - A_1' \\ I_N & 0_N & \dots & 0_N \end{bmatrix} y', \quad (59)$

where

$A_1' = I_N - a_1^{-1}b_1X_{k+1}X_1 + a_1^{-1}X_1, \quad (60)$

$A_s' - A_1' = a_s^{-1}X_s - a_s^{-1}b_sX_{k+1}X_s + a_1^{-1}b_1X_{k+1}X_1 - a_1^{-1}X_1. \quad (61)$

Then, the parity node that corresponds to systematic node $1$ here can be repaired using $V_b$ with columns in the set

$\mathcal{V}_b = \left\{ X_{k+1}^{x_{k+1}} \prod_{s=2}^{k} (X_1X_s)^{x_s} w \,:\, x_{k+1}, x_s \in \{0,1\} \right\}. \quad (62)$

Again, the following equations hold:

$\mathcal{L}(X_{k+1}V_b) = \mathcal{L}(V_b) = \left\{\left(\sum_{s=2}^{k} x_s \ (\mathrm{mod}\ 2)\right)e_1 + \sum_{s=2}^{k+1} x_se_s \,;\, x_s \in \{0,1\}\right\}, \quad (63)$

$\mathcal{L}(X_{s_1}V_b) = \mathcal{L}(X_{s_2}V_b) = \left\{\left(1 + \sum_{s=2}^{k} x_s \ (\mathrm{mod}\ 2)\right)e_1 + \sum_{s=2}^{k+1} x_se_s \,;\, x_s \in \{0,1\}\right\}, \quad (64)$

and

$\mathcal{L}(X_{s_1}X_{k+1}V_b) = \mathcal{L}(X_{s_1}V_b), \quad (65)$

for all $s_1, s_2 \in \{1,\dots,k\}$. Hence, for the interference space generated by component $y_s'$, $s \in \{2,\dots,k\}$, we have

$\frac{N}{2} \le \mathrm{rank}\left(\left[X_1V_b\ (A_s' - A_1')V_b\right]\right) \le |\mathcal{L}(X_sV_b) \cup \mathcal{L}(X_1V_b) \cup \mathcal{L}(X_{k+1}X_1V_b) \cup \mathcal{L}(X_{k+1}X_sV_b)| = |\mathcal{L}(X_1V_b) \cup \mathcal{L}(X_sV_b)| = \frac{N}{2}. \quad (66)$

Moreover, the useful space is full rank:

$\mathrm{rank}\left(\left[X_1V_b\ \left(I_N - a_1^{-1}b_1X_{k+1}X_1 + a_1^{-1}X_1\right)V_b\right]\right) = \mathrm{rank}([X_1V_b\ V_b]) = N. \quad (67)$

Thus, we can perform optimal repair of the second parity of the code in (5), with repair bandwidth $(k+1)\frac{N}{2}$.

VII. THE MDS PROPERTY

In this section, we give explicit conditions on the constants $a_i, b_i$, for all $i \in \{1,\dots,k\}$, and on the size of the finite field $\mathbb{F}_q$, under which the code in (5) is MDS. We discuss the MDS property using the notion of data collectors (DCs), in the same manner as in [2]. A DC can be considered an external user that connects to, and has complete access to, the contents of some subset of $k$ nodes. A storage code in which each node expends $\frac{M}{k}$ worth of storage has the MDS property when all $\binom{n}{k}$ possible DCs can decode the file $f$. Testing the MDS property is equivalent to checking the rank of a specific matrix associated with each DC. This DC matrix is the vertical concatenation of the $k$ stacks of equations stored by the nodes to which the DC connects. If all $\binom{n}{k}$ DC matrices are full rank, then we declare that the storage code has the MDS property.

We start with a DC that connects to systematic nodes $\{1,\dots,k-1\}$ and the first parity node. The determinant of the corresponding DC matrix is

$\det \begin{bmatrix} I_N & \dots & 0_{N\times N} & 0_{N\times N} \\ \vdots & \ddots & \vdots & \vdots \\ 0_{N\times N} & \dots & I_N & 0_{N\times N} \\ I_N & \dots & I_N & I_N \end{bmatrix} = \det(I_N) \neq 0, \quad (68)$

since $I_N$ is a full-rank diagonal matrix. We continue with a DC that connects to systematic nodes $\{1,\dots,k-1\}$ and the second parity node, for which we have

$\det \begin{bmatrix} I_N & \dots & 0_{N\times N} & 0_{N\times N} \\ \vdots & \ddots & \vdots & \vdots \\ 0_{N\times N} & \dots & I_N & 0_{N\times N} \\ A_1 & \dots & A_{k-1} & A_k \end{bmatrix} = \det(A_k) \neq 0, \quad (69)$

due to $A_k$ being full rank. Finally, we consider DCs that connect to $k-2$ systematic nodes and both parity nodes. Let a DC connect to systematic nodes $\{1,\dots,k-2\}$ and the two parities. The corresponding DC matrix is

$\begin{bmatrix} I_N & \dots & 0_{N\times N} & 0_{N\times N} & 0_{N\times N} \\ \vdots & \ddots & \vdots & \vdots & \vdots \\ 0_{N\times N} & \dots & I_N & 0_{N\times N} & 0_{N\times N} \\ I_N & \dots & I_N & I_N & I_N \\ A_1 & \dots & A_{k-2} & A_{k-1} & A_k \end{bmatrix}. \quad (70)$

The leftmost $(k-2)N$ columns of the matrix in (70) are linearly independent, due to the upper-left identity block; moreover, by an analogous argument, they are linearly independent of the rightmost $2N$ columns. Hence, we only need to check the rank of the sub-matrix

$\begin{bmatrix} I_N & I_N \\ A_{k-1} & A_k \end{bmatrix}. \quad (71)$

In the general case, a DC that connects to some $(k-2)$-subset of the systematic nodes and the two parities has a corresponding DC matrix in which the block

$\begin{bmatrix} I_N & I_N \\ A_i & A_j \end{bmatrix}, \quad (72)$

for $i, j \in \{1,\dots,k\}$, $i \neq j$, needs to be full rank for the MDS property to be satisfied. The code is MDS when

$\mathrm{rank}\begin{bmatrix} I_N & I_N \\ a_iX_i + b_iX_{k+1} + I_N & a_jX_j + b_jX_{k+1} + I_N \end{bmatrix} = \mathrm{rank}\left(\begin{bmatrix} I_N & I_N \\ a_iX_i + b_iX_{k+1} + I_N & a_jX_j + b_jX_{k+1} + I_N \end{bmatrix}\begin{bmatrix} I_N & I_N \\ 0_{N\times N} & -I_N \end{bmatrix}\right) = \mathrm{rank}\begin{bmatrix} I_N & 0_{N\times N} \\ a_iX_i + b_iX_{k+1} + I_N & a_iX_i - a_jX_j + (b_i - b_j)X_{k+1} \end{bmatrix} = N + \mathrm{rank}\left(a_iX_i - a_jX_j + (b_i - b_j)X_{k+1}\right) = 2N, \quad (73)$

for all $i, j \in \{1,\dots,k\}$, $i \neq j$, which holds exactly when

$\mathrm{rank}\left(a_iX_i - a_jX_j + (b_i - b_j)X_{k+1}\right) = N. \quad (74)$

Since the diagonal elements of the $X_i$ are in $\{\pm 1\}$, the previous requirement gives the following lemma.
Lemma 3: The code in (5) is MDS when

i) $a_i - a_j + (b_i - b_j) \neq 0$, (75)

ii) $a_i + a_j - (b_i - b_j) \neq 0$, (76)

iii) $a_i - a_j - (b_i - b_j) \neq 0$, (77)

and iv) $a_i + a_j + (b_i - b_j) \neq 0$, (78)

for all $i \neq j \in \{1,\dots,k\}$.

Now, remember that our initial constraint on the $a_i$ and $b_i$ constants was

$a_i^2 - b_i^2 = -1 \Leftrightarrow (a_i - b_i)(a_i + b_i) = -1. \quad (79)$

One solution to the previous equation is the following:

$a_i - b_i = x_i, \quad (80)$

$a_i + b_i = -x_i^{-1}. \quad (81)$

If we substitute this solution into (75)-(78), the MDS conditions become

$a_i - a_j + (b_i - b_j) = (a_i + b_i) - (a_j + b_j) = -x_i^{-1} + x_j^{-1} \neq 0 \Leftrightarrow x_i^{-1} \neq x_j^{-1}, \quad (82)$

$a_i + a_j - (b_i - b_j) = (a_i - b_i) + (a_j + b_j) = x_i - x_j^{-1} \neq 0 \Leftrightarrow x_i \neq x_j^{-1}, \quad (83)$

$a_i - a_j - (b_i - b_j) = (a_i - b_i) - (a_j - b_j) = x_i - x_j \neq 0 \Leftrightarrow x_i \neq x_j, \quad (84)$

$a_i + a_j + (b_i - b_j) = (a_i + b_i) + (a_j - b_j) = -x_i^{-1} + x_j \neq 0 \Leftrightarrow x_i^{-1} \neq x_j. \quad (85)$

The above conditions can be equivalently stated as

$x_i \neq x_j \ \text{ and } \ x_ix_j \neq 1, \quad (86)$

for any $i \neq j \in \{1,\dots,k\}$. Then, consider a prime field $\mathbb{F}_q$ of size $q$. The set of $x_i$s that satisfies our MDS requirements is one in which no two elements are inverses of each other. It is known that, over a prime field, half of the nonzero elements are the inverses of the other nonzero half. If we additionally exclude $x_i \in \{1, q-1\}$ (the self-inverse elements), then the remaining $q - 3$ elements form $\frac{q-3}{2}$ inverse pairs, from each of which we may pick one element. Therefore, we can consider a prime field of size $q$ with

$k \le \frac{q-3}{2} \Leftrightarrow q \ge 2k+3 \quad (87)$

and obtain $x_1,\dots,x_k$ such that our requirements are satisfied. Then, the elements $a_i$ and $b_i$, for all $i \in \{1,\dots,k\}$, are obtained through

$a_i = 2^{-1}x_i - 2^{-1}x_i^{-1}, \quad (88)$

$b_i = -2^{-1}x_i - 2^{-1}x_i^{-1}. \quad (89)$

Observe that the above solutions yield $a_i \neq 0$ (which is needed for successful repair), for all $i \in \{1,\dots,k\}$, when $x_i \notin \{0, 1, q-1\}$.
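As a concrete sanity check of this recipe, the sketch below (our own illustrative code, not from the paper) picks constants for $k = 3$ over the prime field $\mathbb{F}_{11}$ (so $q = 11 \ge 2k + 3$), taking one element from each inverse pair; the particular choice $x = (2, 3, 5)$ is just one valid selection.

```python
k = 3
q = 11                              # prime, q >= 2k + 3
inv = lambda t: pow(t, q - 2, q)    # inverse in F_q via Fermat's little theorem

xs = [2, 3, 5]                      # pairwise distinct, no two mutual inverses mod 11

# a_i = 2^{-1}(x_i - x_i^{-1}),  b_i = -2^{-1}(x_i + x_i^{-1}), as in (88)-(89)
a = [inv(2) * (x - inv(x)) % q for x in xs]
b = [-inv(2) * (x + inv(x)) % q for x in xs]

for ai, bi in zip(a, b):
    assert (ai * ai - bi * bi) % q == q - 1   # a_i^2 - b_i^2 = -1, cf. (79)
    assert ai != 0                            # a_i != 0, needed for repair

# the four MDS conditions (75)-(78) of Lemma 3, for all i != j
for i in range(k):
    for j in range(k):
        if i == j:
            continue
        d = (b[i] - b[j]) % q
        assert (a[i] - a[j] + d) % q != 0
        assert (a[i] + a[j] - d) % q != 0
        assert (a[i] - a[j] - d) % q != 0
        assert (a[i] + a[j] + d) % q != 0
```

Any other selection of pairwise-distinct, pairwise non-inverse $x_i \notin \{0, 1, q-1\}$ passes the same checks.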
Therefore, a prime field of size greater than or equal to $2k + 3$ always suffices to obtain the MDS property.

VIII. GENERALIZING TO MORE THAN 2 PARITIES

A. m-parity codes with optimal systematic repair

We generalize the Hadamard design construction of Section III and of the code in [26] to construct $(k+m, k)$ MDS storage codes for file sizes $M = km^k$. Our constructions are based on a generalization of the Sylvester construction of complex Hadamard matrices that uses $m$th roots of unity. We generate these matrices as

$H_{m^k} = H_m \otimes H_{m^{k-1}}, \quad (90)$

where $H_m$ is the $m$-point Discrete Fourier Transform matrix over a finite field. For example, for $m = 3$ and $\mathbb{F}_7$, we have

$H_3 = \begin{bmatrix} 1 & 1 & 1 \\ 1 & \rho & \rho^2 \\ 1 & \rho^2 & \rho \end{bmatrix} \ \text{and} \ H_9 = \begin{bmatrix} H_3 & H_3 & H_3 \\ H_3 & \rho H_3 & \rho^2 H_3 \\ H_3 & \rho^2 H_3 & \rho H_3 \end{bmatrix}, \quad (91)$

where $\rho = 2$. Then, we consider the set

$\mathcal{H}_{m^k} = \left\{ \prod_{i=1}^{k} X_i^{x_i} w \,:\, x_i \in \{0,1,\dots,m-1\} \right\}, \quad (92)$

where $w = 1_{m^k \times 1}$ and

$X_i = I_{m^{i-1}} \otimes \mathrm{blkdiag}\left(I_{\frac{N}{m^i}}, \rho I_{\frac{N}{m^i}}, \dots, \rho^{m-1} I_{\frac{N}{m^i}}\right), \quad (93)$

with $N = m^k$. Here, $\rho$ denotes an $m$th root of unity, which yields

$X_i^m = I_{m^k}. \quad (94)$

As in the $m = 2$ case, there is a one-to-one correspondence between the elements of the set $\mathcal{H}_{m^k}$ and the columns of $H_{m^k}$. The proof of this property for general $m$ follows the same lines as the $m = 2$ case, so we omit it.

Remark 5: To maintain the full-rank property of $H_{m^k}$, the finite field over which we operate should be chosen such that all $m$th roots of unity are distinct. The number of distinct $m$th roots of unity over a finite field $\mathbb{F}_q$ is given by the number of (distinct) solutions of the equation $x^m = 1$. This is equal to the order of the cyclic group that generates the $m$th roots of unity within the multiplicative group of $\mathbb{F}_q$; this subgroup has order $m$ when $m$ divides $q - 1$ [27].

1) Code construction: Our $(k+m, k)$ MDS code encodes a file $f$ of size $M = km^k$ as

$\begin{bmatrix} I_{km^k} \\ A^{(k,m)} \end{bmatrix} f, \quad (95)$

where

$A^{(k,m)} = \begin{bmatrix} I_{m^k} & I_{m^k} & \dots & I_{m^k} \\ \lambda_{1,1}X_1 & \lambda_{1,2}X_2 & \dots & \lambda_{1,k}X_k \\ \lambda_{2,1}X_1^2 & \lambda_{2,2}X_2^2 & \dots & \lambda_{2,k}X_k^2 \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{m-1,1}X_1^{m-1} & \lambda_{m-1,2}X_2^{m-1} & \dots & \lambda_{m-1,k}X_k^{m-1} \end{bmatrix}, \quad (96)$

with $\lambda_{i,j} \in \mathbb{F}_q$.

2) Optimal repair of the systematic nodes: For this code, let systematic node $i \in \{1,\dots,k\}$ fail. To repair it, we construct the repair matrix $V_i$ that has as columns the elements of the set

$\mathcal{V}_i = \left\{ \prod_{s=1, s \neq i}^{k} X_s^{x_s} w \,:\, x_s \in \{0,1,\dots,m-1\} \right\}. \quad (97)$

This matrix multiplies the contents of each of the parity nodes. Here, the useful space during the repair is given by

$\left[V_i\ \ X_iV_i\ \ X_i^2V_i\ \ \dots\ \ X_i^{m-1}V_i\right] \quad (98)$

and the interference space generated by systematic component $j \neq i$ is spanned by

$\left[V_i\ \ X_jV_i\ \ X_j^2V_i\ \ \dots\ \ X_j^{m-1}V_i\right]. \quad (99)$

Due to the modulo-$m$ property of the powers of the $X_i$ matrices, we obtain the following under the lattice representation:

$\mathcal{L}(V_i) = \mathcal{L}\left(X_j^lV_i\right) \ \text{ and } \ \mathcal{L}\left(X_i^{l_1}V_i\right) \cap \mathcal{L}\left(X_i^{l_2}V_i\right) = \emptyset, \quad (100)$

for any $j \in \{1,\dots,k\}\backslash i$ and $l, l_1, l_2 \in \{0,\dots,m-1\}$ with $l_1 \neq l_2$. This property, together with the fact that the elements of $\mathcal{H}_{m^k}$ are linearly independent, leads us to the following lemma.

Lemma 4: For any $i, j \in \{1,2,\dots,k\}$ we have

$\mathrm{rank}\left(\left[V_i\ X_jV_i\ X_j^2V_i\ \dots\ X_j^{m-1}V_i\right]\right) = \left|\mathcal{L}(V_i) \cup \mathcal{L}(X_jV_i) \cup \mathcal{L}\left(X_j^2V_i\right) \cup \dots \cup \mathcal{L}\left(X_j^{m-1}V_i\right)\right| \quad (101)$

$= \begin{cases} m^k, & i = j, \\ m^{k-1}, & i \neq j. \end{cases} \quad (102)$

By Lemma 4, each of the $k-1$ interference terms is confined within $m^{k-1}$ dimensions while the full-rank property of the useful space is maintained. This is equivalent to stating that we can repair a single systematic node failure by downloading exactly $m^k + (k-1)m^{k-1} = (n-1)m^{k-1}$ equations, which matches exactly the information-theoretic repair optimum of [2].

In Fig. 9 we give an illustration of the repair spaces for a $(6,3)$ code. We sketch the structure of our code on the left of the figure. Each parity block is associated with a specific key matrix $X_i$.
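The rank dichotomy of Lemma 4 can be verified numerically on a small instance. The sketch below is our own illustrative code for $m = 3$, $k = 2$ over $\mathbb{F}_7$ (so $\rho = 2$ is a cube root of unity and $m \mid q - 1$, as Remark 5 requires); the helper `rank_mod` is a plain Gaussian-elimination rank over $\mathbb{F}_q$, not something from the paper.

```python
from itertools import product

q, m, k = 7, 3, 2
rho = 2                      # 3rd root of unity in F_7: 2^3 = 8 = 1 (mod 7)
N = m ** k                   # 9

def rank_mod(rows, p):
    """Rank of an integer matrix (list of rows) over F_p, by Gaussian elimination."""
    M = [[v % p for v in r] for r in rows]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        s = pow(M[r][c], p - 2, p)
        M[r] = [v * s % p for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c]:
                f = M[i][c]
                M[i] = [(vi - f * vr) % p for vi, vr in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def Xdiag(i):
    """Diagonal of X_i = I_{m^{i-1}} (x) blkdiag(I_{N/m^i}, rho I, ..., rho^{m-1} I)."""
    base = []
    for t in range(m):
        base += [pow(rho, t, q)] * (N // m ** i)
    return base * m ** (i - 1)

def col(exps):
    """The column prod_i X_i^{e_i} w, with w the all-ones vector."""
    v = [1] * N
    for i, e in enumerate(exps, start=1):
        v = [vi * pow(d, e, q) % q for vi, d in zip(v, Xdiag(i))]
    return v

# Repair matrix for systematic node 1: columns X_2^{x} w, x in {0, 1, 2}, as in (97)
V1 = [col((0, x)) for x in range(m)]

def span_cols(i, cols):
    """Columns of [cols  X_i cols  X_i^2 cols], returned as a row-major matrix."""
    d = Xdiag(i)
    out = []
    for l in range(m):
        for cvec in cols:
            out.append([pow(di, l, q) * ci % q for di, ci in zip(d, cvec)])
    return [list(r) for r in zip(*out)]   # transpose: columns -> rows

assert rank_mod(span_cols(2, V1), q) == m ** (k - 1)   # interference: 3 dimensions
assert rank_mod(span_cols(1, V1), q) == m ** k         # desired: all 9 dimensions
```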
This allows a selection of $V_i$ that is an invariant subspace of all matrices but the key one, which multiplies the desired and lost file piece. This selection of $V_i$ results in perfect alignment of the interference in $3^2$ dimensions, while ensuring a full-rank $3^3$-dimensional useful space.

Fig. 9. A $(6,3)$ systematic-repair optimal code.

3) Suboptimal repair of the parities: In contrast to our 2-parity code of (5), for this $m$-parity code a parity node failure is repaired using the scheme of Wu et al. [12]. We first rewrite our code in a new systematic re-interpretation, where the lost parity is now in systematic form, in the same manner as the parity repair of our 2-parity code. During the repair, we align a single interference block by inverting the corresponding matrices. This induces a repair download of $m^{k-1} + (n-2)m^k$ equations, which suffices to exactly reconstruct what was lost. This repair strategy is only optimal for $(n,2)$ codes and asymptotically matches the file size for large $k$.

4) The MDS property: We establish the MDS property of our $m$-parity codes in a probabilistic sense: we show that when the $\lambda_{i,j}$ variables are selected uniformly at random over a sufficiently large finite field, the code is MDS with probability arbitrarily close to $1$. This is shown by applying the Schwartz-Zippel lemma [29], [30] to a nonzero polynomial in the $\lambda_{i,j}$s induced by the product of all possible DC matrix determinants.

Consider a DC of the code in (95) that connects to $k-p$ systematic nodes and $p$ parities. For simplicity, let this be the DC connected to the last $k-p$ systematic nodes and the first $p$ parity nodes. The DC matrix stacks the systematic rows $[0_{(k-p)m^k \times pm^k}\ \ I_{(k-p)m^k}]$ over the $p$ parity block rows of (95); eliminating the systematic part, its determinant is zero only if the following determinant is zero:

$\det \begin{bmatrix} I_{m^k} & I_{m^k} & \dots & I_{m^k} \\ \lambda_{1,1}X_1 & \lambda_{1,2}X_2 & \dots & \lambda_{1,p}X_p \\ \lambda_{2,1}X_1^2 & \lambda_{2,2}X_2^2 & \dots & \lambda_{2,p}X_p^2 \\ \vdots & \vdots & \ddots & \vdots \\ \lambda_{p-1,1}X_1^{p-1} & \lambda_{p-1,2}X_2^{p-1} & \dots & \lambda_{p-1,p}X_p^{p-1} \end{bmatrix}, \quad (103)$

i.e., the restriction of the parity block rows to the first $p$ systematic block columns. Since each $X_i$ is diagonal, each column of the matrix in (103) has exactly $p$ nonzero elements. These $pm^k$ columns fall into $m^k$ groups, the columns of a group having identical nonzero support. Any two columns within a block

$\begin{bmatrix} I_{m^k} \\ \lambda_{1,i}X_i \\ \lambda_{2,i}X_i^2 \\ \vdots \\ \lambda_{p-1,i}X_i^{p-1} \end{bmatrix} \quad (104)$

are orthogonal, since their nonzero supports have zero overlap. Hence, a linear dependence can only exist among columns with the same nonzero support. We can then rewrite the determinant in (103) as

$\det\left( P_r \begin{bmatrix} B_1 & & & \\ & B_2 & & \\ & & \ddots & \\ & & & B_{m^k} \end{bmatrix} P_c \right) = |P_r||P_c| \prod_{i=1}^{m^k} |B_i|, \quad (105)$

where $P_r$ and $P_c$ are the permutation matrices that group the rows and columns of the matrix according to their nonzero support so as to generate the block diagonal matrix. Each $p \times p$ matrix $B_i$ is of the form

$B_i = \begin{bmatrix} \rho_{i_1,j_1}\lambda_{i_1,j_1} & \rho_{i_1,j_2}\lambda_{i_1,j_2} & \dots & \rho_{i_1,j_p}\lambda_{i_1,j_p} \\ \rho_{i_2,j_1}\lambda_{i_2,j_1} & \rho_{i_2,j_2}\lambda_{i_2,j_2} & \dots & \rho_{i_2,j_p}\lambda_{i_2,j_p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{i_p,j_1}\lambda_{i_p,j_1} & \rho_{i_p,j_2}\lambda_{i_p,j_2} & \dots & \rho_{i_p,j_p}\lambda_{i_p,j_p} \end{bmatrix}, \quad (106)$

where each $\rho_{i_s,j_t}$ is some $m$th root of unity, the indices depend on $i$, and no $\lambda_{i,j}$ appears more than once within each matrix. Expanding the determinant of any $B_i$ with the Leibniz formula yields $p!$ monomials of degree $p$, each involving a different subset of the $\lambda_{i,j}$ variables. Hence, the induced polynomial cannot be the zero polynomial.
Therefore, the determinant of $B_i$ is a nonzero polynomial of degree $p$ in the $\lambda_{i,j}$ variables, and hence $\prod_{i=1}^{m^k}|B_i|$ is also a nonzero polynomial, of degree $pm^k$, in the $\lambda_{i,j}$ variables. The determinant of every other DC matrix can be computed in the same way, and each is likewise a nonzero polynomial in the $\lambda_{i,j}$s. The product of all these determinants is then a nonzero polynomial in the $\lambda_{i,j}$s of some degree $d$. By the Schwartz-Zippel lemma [30], when we draw the $\lambda_{i,j}$s uniformly at random over a field of size $q$, this induced polynomial evaluates to zero with probability at most $\frac{d}{q}$. Hence, the MDS property is satisfied with probability arbitrarily close to $1$ for sufficiently large finite fields.

IX. CONNECTION TO PERMUTATION-MATRIX BASED CODES

Here we investigate some interesting connections between our systematic-repair optimal codes of Section VIII and the permutation-matrix based codes presented in [21] and [23]. Under a similarity transformation, our codes are equivalent to codes whose coding matrices are specific permutation matrices. Conjugating an $X_i$ matrix of our construction by the Hadamard matrix $H_{m^k}$ yields a permutation matrix:

$H_{m^k}^{-1} X_i H_{m^k} = H_{m^k}^{-1} H_{m^k} P_i = P_i, \quad (107)$

where $P_i$ is some permutation matrix. This is due to the fact that the elements of $\mathcal{H}_{m^k}$ wrap around, i.e., $\mathcal{L}(H_{m^k}) = \mathcal{L}(X_iH_{m^k})$ for any $i$.

Example: Consider the $m = 2$, $k = 2$ case:

$H_{2^2} = [w\ \ X_2w\ \ X_1w\ \ X_1X_2w] \quad (108)$

$X_1H_{2^2} = [X_1w\ \ X_1X_2w\ \ w\ \ X_2w] = H_{2^2}P_1 \quad (109)$

$X_2H_{2^2} = [X_2w\ \ w\ \ X_1X_2w\ \ X_1w] = H_{2^2}P_2, \quad (110)$

where $P_1$ and $P_2$ are permutation matrices. The wrap-around property of the columns of the Hadamard matrix produces permutations of itself when multiplied by the $X_i$ matrices, and each permutation is distinct.
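The wrap-around relation (107) can be checked directly for the $m = 2$ Sylvester-Hadamard matrix. The sketch below is our own check (over the reals for brevity, with $N = 8$): it confirms that $H_NH_N^T = NI_N$ and that $H_N^{-1}X_iH_N$ is a 0/1 permutation matrix for each $X_i$.

```python
import numpy as np

N = 8

# Sylvester-Hadamard H_8: H_{2M} = [[H_M, H_M], [H_M, -H_M]]
H = np.array([[1.0]])
for _ in range(3):
    H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)

def X(i):
    # X_i = I_{2^{i-1}} (x) blkdiag(I_{N/2^i}, -I_{N/2^i})
    blk = np.diag(np.concatenate([np.ones(N // 2 ** i), -np.ones(N // 2 ** i)]))
    return np.kron(np.eye(2 ** (i - 1)), blk)

assert np.allclose(H @ H.T, N * np.eye(N))       # H_N H_N^T = N I_N, cf. (114)

# H^{-1} X_i H is a 0/1 permutation matrix, i.e., X_i H_N = H_N P_i as in (107)
for i in (1, 2, 3):
    P = np.rint(np.linalg.inv(H) @ X(i) @ H).astype(int)
    assert set(P.flatten().tolist()) == {0, 1}   # entries are only 0 and 1
    assert (P.sum(axis=0) == 1).all() and (P.sum(axis=1) == 1).all()
```

Each $X_i$ sends every Hadamard column to another Hadamard column (with a leading $+1$, so no sign flips occur), which is why $P$ comes out as an exact permutation rather than a signed one.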
Without loss of generality [16], we can rewrite the $A^{(k,m)}$ matrix of (95) as

$\left(I_m \otimes H_{m^k}^{-1}\right) A^{(k,m)} \left(I_k \otimes H_{m^k}\right) = \begin{bmatrix} H_{m^k}^{-1}I_{m^k}H_{m^k} & \dots & H_{m^k}^{-1}I_{m^k}H_{m^k} \\ \lambda_{1,1}H_{m^k}^{-1}X_1H_{m^k} & \dots & \lambda_{1,k}H_{m^k}^{-1}X_kH_{m^k} \\ \vdots & \ddots & \vdots \\ \lambda_{m-1,1}H_{m^k}^{-1}X_1^{m-1}H_{m^k} & \dots & \lambda_{m-1,k}H_{m^k}^{-1}X_k^{m-1}H_{m^k} \end{bmatrix} = \begin{bmatrix} I_{m^k} & \dots & I_{m^k} \\ \lambda_{1,1}P_{1,1} & \dots & \lambda_{1,k}P_{1,k} \\ \lambda_{2,1}P_{2,1} & \dots & \lambda_{2,k}P_{2,k} \\ \vdots & \ddots & \vdots \\ \lambda_{m-1,1}P_{m-1,1} & \dots & \lambda_{m-1,k}P_{m-1,k} \end{bmatrix}, \quad (111)$

where each $P_{i,j}$ is a permutation matrix. The systematic nodes of this equivalent $(k+m, k)$ MDS code can be optimally repaired using the repair matrices $H_{m^k}^{-1}V_i$, where $V_i$ has as columns the elements of the set $\mathcal{V}_i = \left\{\prod_{s=1, s\neq i}^{k} X_s^{x_s} w : x_s \in \{0,1,\dots,m-1\}\right\}$. This is true because the rank properties of the corresponding useful and interference spaces are preserved under full-rank column transformations.

Interestingly, this connection is two-way. We give an example of a permutation code from [23] that maps exactly to our designs.

Example: Consider the $(5,3)$ permutation code of [23], designed for file size $M = 3 \cdot 2^3$. The three coding matrices of the first parity of this code are identity matrices $I_8$. The three coding matrices of the second parity are the permutation matrices

$P_1 = I_{\{5,6,7,8,1,2,3,4\},:}, \quad P_2 = I_{\{3,4,1,2,7,8,5,6\},:}, \quad P_3 = I_{\{2,1,4,3,6,5,8,7\},:}, \quad (112)$

where $I_{\{i_1,i_2,\dots,i_8\},:}$ indicates a permutation of the columns of the $8 \times 8$ identity matrix.
We know that these matrices commute; since they are also normal, they can be simultaneously diagonalized under a common eigenbasis. It can be checked that a common basis for the above commuting permutation matrices is the Hadamard matrix, which gives

$\tfrac{1}{8}H_8P_1H_8^T = X_1, \quad \tfrac{1}{8}H_8P_2H_8^T = X_2, \quad \tfrac{1}{8}H_8P_3H_8^T = X_3. \quad (113)$

The connection manifested by the above equivalence examples seems very interesting. We believe that further investigation of it can lead to a better understanding of the repair-optimal high-rate MDS code regime.

X. CONCLUSIONS

We presented the first explicit, high-rate, $(k+2, k)$ erasure MDS storage code that achieves the optimal repair bandwidth for any single node failure, including the parities. Our construction is based on the perfect interference alignment properties offered by Hadamard designs. We generalized our 2-parity construction to erasure codes with $m$ parities that achieve optimal repair of the systematic parts.

XI. ACKNOWLEDGEMENT

The authors would like to thank Changho Suh for insightful discussions.

APPENDIX

Proof of Lemma 1: Observe that $H_N = H_N^T$ and

$H_NH_N^T = H_NH_N = \begin{bmatrix} 2H_{\frac{N}{2}}H_{\frac{N}{2}} & 0_{\frac{N}{2}\times\frac{N}{2}} \\ 0_{\frac{N}{2}\times\frac{N}{2}} & 2H_{\frac{N}{2}}H_{\frac{N}{2}} \end{bmatrix} = 2I_2 \otimes \left(H_{\frac{N}{2}}H_{\frac{N}{2}}\right) = 2I_2 \otimes \left(2I_2 \otimes H_{\frac{N}{4}}H_{\frac{N}{4}}\right) = 4I_4 \otimes H_{\frac{N}{4}}H_{\frac{N}{4}} = \dots = N \cdot (I_N \otimes H_1H_1) = N \cdot I_N. \quad (114)$

We also have that $N \neq 0 \ (\mathrm{mod}\ q)$ for $q > 2$; thus the rank of $H_N$ is $N$ and its columns are mutually orthogonal. Then, let the $N \times N$ diagonal matrix

$X_i = I_{2^{i-1}} \otimes \mathrm{blkdiag}\left(I_{\frac{N}{2^i}}, -I_{\frac{N}{2^i}}\right) \quad (115)$

be defined for $i \in \{1,\dots,\log_2(N)\}$. $X_i$ is a diagonal matrix whose diagonal is a series of alternating $1$s and $-1$s, starting with $\frac{N}{2^i}$ $1$s that flip to $-1$s and back every $\frac{N}{2^i}$ positions. We can now expand $H_N$ in the following way:

$H_N = \begin{bmatrix} H_{\frac{N}{2}} & H_{\frac{N}{2}} \\ H_{\frac{N}{2}} & -H_{\frac{N}{2}} \end{bmatrix} = \Big[\underbrace{1_{2\times 1} \otimes H_{\frac{N}{2}}}_{F_1}\ \ X_1\big(1_{2\times 1} \otimes H_{\frac{N}{2}}\big)\Big]. \quad (116)$

We proceed in the same manner by expanding all the "smaller" $H_{\frac{N}{2^i}}$s:

$F_1 = 1_{2\times 1} \otimes \Big[1_{2\times 1} \otimes H_{\frac{N}{2^2}}\ \ X_1\big(1_{2\times 1} \otimes H_{\frac{N}{2^2}}\big)\Big] = \Big[\underbrace{1_{2^2\times 1} \otimes H_{\frac{N}{2^2}}}_{F_2}\ \ X_2\big(1_{2^2\times 1} \otimes H_{\frac{N}{2^2}}\big)\Big], \quad F_2 = \Big[\underbrace{1_{2^3\times 1} \otimes H_{\frac{N}{2^3}}}_{F_3}\ \ X_3\big(1_{2^3\times 1} \otimes H_{\frac{N}{2^3}}\big)\Big], \ \dots, \ F_{\log_2(N)-1} = \big[1_{N\times 1}\ \ X_{\log_2(N)}1_{N\times 1}\big], \quad (117)$

where $F_i$ is an $N \times \frac{N}{2^i}$ matrix. Thus,

$\mathrm{span}(H_N) = \mathrm{span}([F_1\ X_1F_1]) = \mathrm{span}([F_2\ X_2F_2\ X_1F_2\ X_1X_2F_2]) = \dots = \mathrm{span}\left(\left\{\prod_{i=1}^{\log_2(N)} X_i^{x_i} w \,:\, x_i \in \{0,1\}\right\}\right), \quad (118)$

which proves the final part of Lemma 1.

REFERENCES

[1] The Coding for Distributed Storage wiki, http://tinyurl.com/storagecoding
[2] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, "Network coding for distributed storage systems," IEEE Trans. on Inform. Theory, vol. 56, pp. 4539-4551, Sep. 2010.
[3] A. G. Dimakis, K. Ramchandran, Y. Wu, and C. Suh, "A survey on network codes for distributed storage," Proceedings of the IEEE, vol. 99, pp. 476-489, Mar. 2011.
[4] S. Ghemawat, H. Gobioff, and S.-T. Leung, "The Google file system," in Proc. ACM Symp. on Operating Systems Principles (SOSP), Oct. 2003.
[5] O. Khan, R. Burns, J. Plank, and C. Huang, "In search of I/O-optimal recovery from disk failures," in HotStorage 2011, 3rd Workshop on Hot Topics in Storage and File Systems, Portland, OR, Jun. 2011.
[6] H. Weatherspoon and J. D. Kubiatowicz, "Erasure coding vs. replication: a quantitative comparison," in Proc. IPTPS, 2002.
[7] M. Blaum, J. Brady, J. Bruck, and J. Menon, "EVENODD: an efficient scheme for tolerating double disk failures in RAID architectures," IEEE Transactions on Computers, 1995.
[8] Z. Wang, A. G. Dimakis, and J. Bruck, "Rebuilding for array codes in distributed storage systems," in Proc. Workshop on the Application of Communication Theory to Emerging Memory Technologies (ACTEMT), 2010.
[9] L. Xiang, Y. Xu, J. C. S. Lui, and Q. Chang, "Optimal recovery of single disk failure in RDP code storage systems," in Proc. ACM SIGMETRICS, 2010.
[10] F. Oggier and A. Datta, "Self-repairing homomorphic codes for distributed storage systems," in Proc. IEEE INFOCOM 2011, Shanghai, China, Apr. 2011.
[11] V. R. Cadambe and S. A. Jafar, "Interference alignment and the degrees of freedom for the K-user interference channel," IEEE Trans. on Inform. Theory, vol. 54, pp. 3425-3441, Aug. 2008.
[12] Y. Wu and A. G. Dimakis, "Reducing repair traffic for erasure coding-based storage via interference alignment," in Proc. IEEE Int. Symp. on Information Theory (ISIT), Seoul, Korea, Jul. 2009.
[13] K. V. Rashmi, N. B. Shah, P. V. Kumar, and K. Ramchandran, "Exact regenerating codes for distributed storage," in Allerton Conf. on Control, Computing, and Communication, Urbana-Champaign, IL, Sep. 2009.
[14] N. B. Shah, K. V. Rashmi, P. V. Kumar, and K. Ramchandran, "Explicit codes minimizing repair bandwidth for distributed storage," in Proc. IEEE ITW, Jan. 2010.
[15] C. Suh and K. Ramchandran, "Exact regeneration codes for distributed storage repair using interference alignment," in Proc. 2010 IEEE Int. Symp. on Information Theory (ISIT), Seoul, Korea, Jun. 2010.
[16] N. B. Shah, K. V. Rashmi, P. V. Kumar, and K. Ramchandran, "Interference alignment in regenerating codes for distributed storage: necessity and code constructions," Sep. 2010. Preprint available at http://arxiv.org/abs/1005.1634
[17] Y. Wu, "A construction of systematic MDS codes with minimum repair bandwidth," submitted to IEEE Transactions on Information Theory, Aug. 2009. Preprint available at http://arxiv.org/abs/0910.2486
[18] V. Cadambe, S. Jafar, and H. Maleki, "Distributed data storage with minimum storage regenerating codes - exact and functional repair are asymptotically equally efficient," in 2010 IEEE Intern. Workshop on Wireless Network Coding (WiNC), Apr. 2010.
[19] C. Suh and K. Ramchandran, "On the existence of optimal exact-repair MDS codes for distributed storage," Apr. 2010. Preprint available at http://arxiv.org/abs/1004.4663
[20] K. Rashmi, N. B. Shah, and P. V. Kumar, "Optimal exact-regenerating codes for distributed storage at the MSR and MBR points via a product-matrix construction," submitted to IEEE Transactions on Information Theory. Preprint available at http://arxiv.org/pdf/1005.4178
[21] I. Tamo, Z. Wang, and J. Bruck, "MDS array codes with optimal rebuilding," to appear in 2011 IEEE Int. Symp. on Information Theory (ISIT). Preprint available at http://arxiv.org/abs/1103.3737
[22] V. R. Cadambe, C. Huang, and J. Li, "Permutation codes: optimal exact-repair of a single failed node in MDS code based distributed storage systems," to appear in 2011 IEEE Int. Symp. on Information Theory (ISIT).
[23] V. R. Cadambe, C. Huang, S. A. Jafar, and J. Li, "Optimal repair of MDS codes in distributed storage via subspace interference alignment," arXiv preprint, 2011. Available at http://arxiv.org/abs/1106.1250
[24] K. W. Shum and Y. Hu, "Exact minimum-repair-bandwidth cooperative regenerating codes for distributed storage systems," to appear in 2011 IEEE Int. Symp. on Information Theory (ISIT). Preprint available at http://arxiv.org/abs/1102.1609
[25] B. Nazer, S. A. Jafar, M. Gastpar, and S. Vishwanath, "Ergodic interference alignment," in Proc. 2009 IEEE Int. Symp. on Information Theory (ISIT), pp. 1769-1773, Jun. 2009.
[26] D. S. Papailiopoulos and A. G. Dimakis, "Distributed storage codes through Hadamard designs," to appear in ISIT 2011.
[27] R. Lidl and H. Niederreiter, Finite Fields (Encyclopedia of Mathematics and its Applications), Cambridge Univ. Press, 2008.
[28] G. Bresler and D. N. C. Tse, "3 user interference channel: degrees of freedom as a function of channel diversity," in 47th Allerton Conf. on Comm., Control, and Computing, pp. 265-271, Sep. 2009.
[29] T. Ho, R. Koetter, M. Medard, M. Effros, J. Shi, and D. Karger, "A random linear network coding approach to multicast," IEEE Trans. on Inform. Theory, vol. 52, pp. 4413-4430, Oct. 2006.
[30] R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge Univ. Press, 1995.