Residual-Based Detections and Unified Architecture for Massive MIMO Uplink
Massive multiple-input multiple-output (M-MIMO) technique brings better energy efficiency and coverage but higher computational complexity than small-scale MIMO. For linear detections such as minimum mean square error (MMSE), prohibitive complexity l…
Authors: Chuan Zhang (1, 2, 3)
Journal of Signal Processing Systems manuscript No. (will be inserted by the editor)

Chuan Zhang · Yufeng Yang · Shunqing Zhang · Zaichen Zhang · Xiaohu You

Received: March 2, 2018 / Accepted: date

Abstract Massive multiple-input multiple-output (M-MIMO) technique brings better energy efficiency and coverage but higher computational complexity than small-scale MIMO. For linear detections such as minimum mean square error (MMSE), the prohibitive complexity lies in solving large-scale linear equations. For a better trade-off between bit-error-rate (BER) performance and computational complexity, iterative linear algorithms like conjugate gradient (CG) have been applied and have shown their feasibility in recent years. In this paper, residual-based detection (RBD) algorithms are proposed for M-MIMO detection, including the minimal residual (MINRES) algorithm, the generalized minimal residual (GMRES) algorithm, and the conjugate residual (CR) algorithm. RBD algorithms focus on minimizing the residual norm per iteration, whereas most existing algorithms focus on approximating the exact signal. Numerical results show that, for 64-QAM $128 \times 8$ MIMO, RBD algorithms are only 0.13 dB away from the exact matrix inversion method at BER $= 10^{-4}$. The stability of RBD algorithms has also been verified under various correlation conditions. Complexity comparison shows that the CR algorithm requires 87% less complexity than the traditional method for $128 \times 60$ MIMO.
The unified hardware architecture is proposed with flexibility, which guarantees a low-complexity implementation for a family of RBD M-MIMO detectors.

Keywords Massive MIMO · residual-based detection · minimal residual · conjugate residual · unified hardware

Chuan Zhang#,* · Yufeng Yang# · Zaichen Zhang · Xiaohu You
Lab of Efficient Architectures for Digital-communication and Signal-processing (LEADS), National Mobile Communications Research Laboratory, Quantum Information Center, Southeast University, Nanjing, China
E-mail: {chzhang, yfyang, zczhang, xhyu}@seu.edu.cn
# contributed equally to this work, * corresponding author

Shunqing Zhang
Shanghai Institute for Advanced Communications and Data Science, Shanghai University, Shanghai, China.
E-mail: shunqing@shu.edu.cn

1 Introduction

Multiple-input multiple-output (MIMO) is a key technique for wireless communications [1] and has been incorporated into standards such as the 3rd generation partnership project (3GPP) long term evolution (LTE) and IEEE 802.11n [2]. By equipping hundreds of antennas at transmitters and serving a relatively small number of users [3], its advanced version, massive MIMO (M-MIMO), provides significant improvement in spectral efficiency, interference reduction, transmit-power efficiency, and link reliability [4].

Because of the large antenna number at the base station (BS) or user side, computational complexity becomes unaffordable in M-MIMO detection. Among existing detections, zero forcing (ZF) is a basic approach that neglects the effect of noise [5]; however, its performance is not satisfactory. Though linear schemes like minimum mean square error (MMSE) [6] improve the performance compared with ZF, their computational complexity still increases drastically as the antenna number grows.
For an M-MIMO channel $\mathbf{H}$, the computational complexity of MMSE inversion is $O(M^3)$, which makes it costly in applications [7]. To avoid matrix inversion, Neumann series expansion (NSE) [8-10] has been employed for approximation. However, complexity remains unaffordable when more than 2 NSE terms are used. Thus, iterative linear solvers have been proposed for further reduction, such as Gauss-Seidel [11, 12] and conjugate gradient (CG) [13, 14]. Methods like successive over-relaxation (SOR) [15, 16] and its variation [17] are also considered. Meanwhile, efficient optimizations of these algorithms, such as preconditioning [18, 19], have also been proposed. Most iterative linear detectors can reduce MMSE's complexity to $O(M^2)$ with tolerable performance loss.

It is worth noting that existing algorithms mainly focus on approximating the exact solution [20], whereas this paper proposes residual-based detection (RBD) algorithms, which focus on minimizing the residual norm. Firstly, the minimal residual (MINRES) algorithm, which is a basic RBD, is considered. Its extended version, generalized minimal residual (GMRES) [21], is also considered, together with its unavoidable drawbacks, which are detailed below. In the M-MIMO scenario, GMRES can be specialized to another version: conjugate residual (CR). The computation process and convergence proof of each RBD algorithm are elaborated. Numerical results under different antenna configurations and correlations are given as well. For lower complexity, the iteration number is chosen as 2, 3, and 4, respectively. A complexity comparison between the proposed RBD algorithms and the traditional one is also shown, to demonstrate RBD algorithms' advantages in performance and complexity.

For application considerations, efficient hardware architectures of RBD algorithms are required. In this paper, hardware architectures of MINRES and CR algorithms are proposed.
Since RBD algorithms form a family, their computational similarity can be exploited, and accordingly a unified design method is proposed. The hardware architecture can be built from two common modules: an iterative module and a coefficient module. A unified hardware architecture for both MINRES and CR algorithms is further proposed, which can also take care of GMRES. Moreover, the proposed design method can also be applied to some other RBD and iterative detectors.

The remainder of the paper is organized as follows. Section 2 gives the system models of uncorrelated and correlated M-MIMO detectors. Section 3 presents the RBD algorithms and shows the convergence proof for each algorithm. Numerical results are given in Section 4. Section 5 elaborates the computational complexity of RBD algorithms. Section 6 proposes the hardware architectures of RBD algorithms and the unified design method. Finally, Section 7 concludes the entire paper.

Notation: Lowercase and uppercase boldface letters stand for column vectors and matrices, respectively. The operations $(\cdot)^T$ and $(\cdot)^H$ denote transpose and conjugate transpose, respectively. The entry in the $i$-th row and $j$-th column of $\mathbf{A}$ is $\mathbf{A}(i,j)$. The vector $\boldsymbol{\alpha}$ in the $k$-th iteration is $\boldsymbol{\alpha}_k$. Complexity is measured in terms of the number of complex-valued multiplications.

2 System Model for Massive MIMO Uplink

2.A Linear Detection Model

Consider an uplink of a massive MIMO system with $N$ antennas at the base station (BS), which simultaneously serves $M$ single-antenna users. Here, $N$ is always much larger than $M$ ($N \gg M$). The transmitted and received signal vectors are denoted by $\mathbf{s} = [s_1, s_2, \ldots, s_M]^T$ and $\mathbf{y} = [y_1, y_2, \ldots, y_N]^T$, respectively, where $\mathbf{s} \in \mathbb{C}^M$ and $\mathbf{y} \in \mathbb{C}^N$. Then the system model is

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n}, \tag{1}$$

where $\mathbf{H}$ is an $N \times M$ uplink channel matrix and $\mathbf{n}$ is the vector representing additive white Gaussian noise (AWGN) with zero mean and variance $\sigma^2$.
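As a rough illustration of Eq. (1), the following sketch draws one uplink realization using only the Python standard library. The function names, the QPSK alphabet, and the unit-variance complex Gaussian channel entries are assumptions made for this example only; they are not prescribed by the model above.

```python
import random

def complex_gaussian(rng, variance=1.0):
    """Draw one zero-mean circularly symmetric complex Gaussian sample."""
    std = (variance / 2.0) ** 0.5
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))

def simulate_uplink(num_bs, num_users, noise_var, seed=0):
    """Return (H, s, y) with y = H s + n for one channel use (hypothetical helper)."""
    rng = random.Random(seed)
    # Assumed QPSK alphabet with unit average power, for illustration only.
    qpsk = [complex(a, b) / 2 ** 0.5 for a in (-1, 1) for b in (-1, 1)]
    s = [rng.choice(qpsk) for _ in range(num_users)]
    # N x M channel matrix with assumed i.i.d. unit-variance entries.
    H = [[complex_gaussian(rng) for _ in range(num_users)] for _ in range(num_bs)]
    n = [complex_gaussian(rng, noise_var) for _ in range(num_bs)]
    y = [sum(h * sj for h, sj in zip(row, s)) + ni for row, ni in zip(H, n)]
    return H, s, y

H, s, y = simulate_uplink(num_bs=16, num_users=4, noise_var=0.1)
print(len(y), len(H[0]))
```

The received vector has one entry per BS antenna, while the symbol vector has one entry per user, matching the dimensions stated for $\mathbf{y}$ and $\mathbf{s}$.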
According to the MMSE equalization scheme, at the BS side, the estimate of the transmitted symbol vector $\hat{\mathbf{s}}$ is

$$\hat{\mathbf{s}} = (\mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I}_M)^{-1}\mathbf{H}^H\mathbf{y} = \mathbf{A}^{-1}\tilde{\mathbf{y}}, \tag{2}$$

where $\mathbf{I}_M$ is the identity matrix of dimension $M$, and the MMSE filtering matrix $\mathbf{A}$ is defined based on the Gram matrix $\mathbf{G}$:

$$\mathbf{A} = \mathbf{G} + \sigma^2\mathbf{I}_M, \tag{3}$$

where $\mathbf{G} = \mathbf{H}^H\mathbf{H}$. Correspondingly, the output of the matched filter $\tilde{\mathbf{y}}$ is

$$\tilde{\mathbf{y}} = \mathbf{H}^H\mathbf{y}. \tag{4}$$

Nevertheless, the computational complexity of the exact matrix inversion $\mathbf{A}^{-1}$ is $O(M^3)$. Methods such as the Cholesky-decomposition-based method are not suitable for M-MIMO detection when its scale increases.

2.B Correlated Channel Model

To account for antenna correlation in M-MIMO, this paper applies the Kronecker model in [22], and $\mathbf{H}$ can be denoted by $\mathbf{H} = \mathbf{R}_r^{1/2}\mathbf{W}\mathbf{R}_t^{1/2}$, where $\mathbf{W} \in \mathbb{C}^{N \times M}$ is an $N \times M$ i.i.d. channel matrix with zero mean and unit variance. Meanwhile, $\mathbf{R}_r \in \mathbb{C}^{N \times N}$ and $\mathbf{R}_t \in \mathbb{C}^{M \times M}$ are the spatial correlation matrices at the BS and user side:

$$\mathbf{R}_r(i,k) = \begin{cases} (\zeta_r e^{j\theta})^{k-i}, & i \le k, \\ \mathbf{R}_r'(k,i), & i > k; \end{cases} \tag{5}$$

$$\mathbf{R}_t(i,k) = \begin{cases} (\zeta_t e^{j\theta})^{k-i}, & i \le k, \\ \mathbf{R}_t'(k,i), & i > k. \end{cases} \tag{6}$$

The entry in the $i$-th row and $k$-th column is denoted by $\mathbf{R}(i,k)$. $\mathbf{R}_t'$ and $\mathbf{R}_r'$ are the conjugate matrices of $\mathbf{R}_t$ and $\mathbf{R}_r$, respectively. This paper considers four correlation scenarios to cover common M-MIMO detectors.

- Uncorrelated: In this condition, correlations of BS and users are ignored, which means the correlation factors are $\zeta_t = \zeta_r = 0$. Under this circumstance, $\mathbf{R}_r^{1/2}$ and $\mathbf{R}_t^{1/2}$ are actually $\mathbf{I}_N$ and $\mathbf{I}_M$, respectively. Then $\mathbf{H}$ is the ideal i.i.d. Rayleigh fading channel matrix.
- User Correlated: For multi-antenna users, if the distance between two BS antennas is larger than half a wavelength, the correlation between BS antennas can be neglected. In this condition $\mathbf{R}_r^{1/2}$ becomes a diagonal matrix $\mathbf{D}_r$, thus $\mathbf{H} = \mathbf{D}_r\mathbf{W}\mathbf{R}_t^{1/2}$.
- BS Correlated: For single-antenna users, correlation among users is omitted. Nevertheless, as M-MIMO contains a large-scale antenna array, the pathloss between BS and users cannot be ignored. Thus the channel is $\mathbf{H} = \mathbf{R}_r^{1/2}\mathbf{W}\mathbf{D}_t$, where $\mathbf{D}_t$ is a diagonal matrix that represents the pathloss attenuation factors.
- Fully Correlated: When fully correlated, the correlation at both the user and BS sides should be considered. Thus the channel matrix remains $\mathbf{H} = \mathbf{R}_r^{1/2}\mathbf{W}\mathbf{R}_t^{1/2}$, where $\mathbf{R}_r$ and $\mathbf{R}_t$ are given in Eqs. (5) and (6), respectively.

3 Residual-Based Detection Algorithms

In this section, residual-based detection (RBD) is proposed as a family of algorithms. For a linear detection problem

$$\mathbf{A}\mathbf{s} = \mathbf{y}, \tag{7}$$

suppose that $\mathbf{s}^*$ denotes the exact solution. Existing detection methods mainly focus on the approximation of $\mathbf{s}$ to $\mathbf{s}^*$, which is measured by the absolute error $\|\mathbf{s} - \mathbf{s}^*\|$. In contrast, with $\mathbf{r} = \mathbf{y} - \mathbf{A}\mathbf{s}$ denoting the residual vector and $\|\mathbf{r}\|$ the residual norm, RBD algorithms mainly focus on minimizing $\|\mathbf{r}\|$ in the computation process. For MMSE detection, the right-hand side $\mathbf{y}$ in Eq. (7) is the matched-filter output $\tilde{\mathbf{y}}$ of Eq. (4). This section gives a detailed description of the RBD algorithms and of the relationships between them.

3.A Minimal Residual Algorithm

As a kind of projection algorithm for massive MIMO detection, the proposed minimal residual (MINRES) [23] is the simplest algorithm owing to its short calculation process, which is shown in Algorithm 1.

Algorithm 1 Minimal Residual Algorithm
Input: $\mathbf{A}$ and $\tilde{\mathbf{y}}$
1: for $k = 0, \ldots, K$ do
2:   $\mathbf{r}_k = \tilde{\mathbf{y}} - \mathbf{A}\mathbf{s}_k$
3:   $\alpha_k = \mathbf{r}_k^H\mathbf{A}\mathbf{r}_k / \|\mathbf{A}\mathbf{r}_k\|^2$
4:   $\mathbf{s}_{k+1} = \mathbf{s}_k + \alpha_k\mathbf{r}_k$
5: end for
Output: $\hat{\mathbf{s}} = \mathbf{s}_{K+1}$

It is easily shown that MINRES minimizes the function $f(\mathbf{s}) = \|\mathbf{y} - \mathbf{A}\mathbf{s}\|_2^2$ in the direction of $\mathbf{r}$. Since MINRES is the simplest RBD algorithm, it only requires the filtering matrix $\mathbf{A}$ to be positive definite. Since the MMSE filtering matrix $\mathbf{A}$ is symmetric positive definite (SPD), the requirement is met easily. So

$$\|\mathbf{r}_{k+1}\|_2^2 = (\mathbf{r}_k - \alpha_k\mathbf{A}\mathbf{r}_k,\ \mathbf{r}_k - \alpha_k\mathbf{A}\mathbf{r}_k) = (\mathbf{r}_k - \alpha_k\mathbf{A}\mathbf{r}_k,\ \mathbf{r}_k) - \alpha_k(\mathbf{r}_k - \alpha_k\mathbf{A}\mathbf{r}_k,\ \mathbf{A}\mathbf{r}_k). \tag{8}$$

Since the vector $\mathbf{r}_k - \alpha_k\mathbf{A}\mathbf{r}_k$ is orthogonal to the search direction $\mathbf{A}\mathbf{r}_k$, the second term on the right-hand side of Eq. (8) vanishes, and therefore

$$\begin{aligned} \|\mathbf{r}_{k+1}\|_2^2 &= (\mathbf{r}_k - \alpha_k\mathbf{A}\mathbf{r}_k,\ \mathbf{r}_k) \\ &= (\mathbf{r}_k, \mathbf{r}_k) - \alpha_k(\mathbf{A}\mathbf{r}_k, \mathbf{r}_k) \\ &= \|\mathbf{r}_k\|_2^2\left(1 - \frac{(\mathbf{A}\mathbf{r}_k, \mathbf{r}_k)}{(\mathbf{r}_k, \mathbf{r}_k)}\cdot\frac{(\mathbf{A}\mathbf{r}_k, \mathbf{r}_k)}{(\mathbf{A}\mathbf{r}_k, \mathbf{A}\mathbf{r}_k)}\right) \\ &= \|\mathbf{r}_k\|_2^2\left(1 - \frac{(\mathbf{A}\mathbf{r}_k, \mathbf{r}_k)^2}{(\mathbf{r}_k, \mathbf{r}_k)^2}\cdot\frac{\|\mathbf{r}_k\|_2^2}{\|\mathbf{A}\mathbf{r}_k\|_2^2}\right). \end{aligned} \tag{9}$$

For the positive definite matrix $\mathbf{A}$,

$$\frac{(\mathbf{A}\mathbf{x}, \mathbf{x})}{(\mathbf{x}, \mathbf{x})} \ge \lambda_{\min}\left(\frac{\mathbf{A} + \mathbf{A}^T}{2}\right) > 0. \tag{10}$$

Since the matrix $\mathbf{A}$ is positive definite, its inverse $\mathbf{A}^{-1}$ is positive definite, too. Similarly, let $\mathbf{t} = \mathbf{A}\mathbf{x}$; then

$$\frac{(\mathbf{A}\mathbf{x}, \mathbf{x})}{(\mathbf{A}\mathbf{x}, \mathbf{A}\mathbf{x})} = \frac{(\mathbf{t}, \mathbf{A}^{-1}\mathbf{t})}{(\mathbf{t}, \mathbf{t})} \ge \lambda_{\min}\left(\frac{\mathbf{A}^{-1} + \mathbf{A}^{-T}}{2}\right) > 0. \tag{11}$$

Finally, let $\mu(\mathbf{A})$ denote $\lambda_{\min}\left((\mathbf{A} + \mathbf{A}^T)/2\right)$; then

$$\|\mathbf{r}_{k+1}\|_2^2 \le \left(1 - \mu(\mathbf{A})\mu(\mathbf{A}^{-1})\right)\|\mathbf{r}_k\|_2^2. \tag{12}$$

From this derivation, the residual norm in the MINRES algorithm decreases after each iteration, which proves the convergence of MINRES.

3.B Generalized Minimal Residual Algorithm

The generalized minimal residual (GMRES) algorithm is an iterative method for solving nonsymmetric systems of linear equations [23]. It is the generalized version of MINRES. The GMRES interference canceller was first proposed in [21, 24]; in this paper, the essence of GMRES is introduced, and some computation steps are supplemented to elaborate its computation process. As a projection method based on $\kappa = \kappa_V$, in which $\kappa_V$ is the $V$-th Krylov subspace, GMRES minimizes the residual norm to approximate the exact solution of $\mathbf{A}\mathbf{s} = \mathbf{y}$ by the vector $\mathbf{s}_V \in \kappa_V$, where

$$\kappa_V = \mathrm{span}\{\mathbf{y}, \mathbf{A}\mathbf{y}, \mathbf{A}^2\mathbf{y}, \ldots, \mathbf{A}^{V-1}\mathbf{y}\}. \tag{13}$$

Because the vectors $\mathbf{y}, \mathbf{A}\mathbf{y}, \ldots, \mathbf{A}^{V-1}\mathbf{y}$ may be nearly linearly dependent, Arnoldi iteration [25] is used to form an orthonormal basis $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_V$ for $\kappa_V$.
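Thus the vector $\mathbf{s}_V \in \kappa_V$ can be rewritten as $\mathbf{s}_V = \mathbf{s}_0 + \mathbf{Q}_V\mathbf{p}_V$, where $\mathbf{Q}_V$ is an $M$-by-$V$ matrix formed by the basis vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_V$. Meanwhile, a $(V+1)$-by-$V$ upper Hessenberg matrix $\tilde{\mathbf{H}}_V$ is produced in the Arnoldi iteration process, where

$$\mathbf{A}\mathbf{Q}_V = \mathbf{Q}_{V+1}\tilde{\mathbf{H}}_V. \tag{14}$$

Thus, the whole GMRES process can be deduced. With $\mathbf{r}_0 = \mathbf{y} - \mathbf{A}\mathbf{s}_0$, $\beta = \|\mathbf{r}_0\|_2$, and $\mathbf{q}_1 = \mathbf{r}_0/\beta$ as in Algorithm 2, define

$$\begin{aligned} J(\mathbf{p}) &= \|\mathbf{y} - \mathbf{A}\mathbf{s}\|_2 = \|\mathbf{y} - \mathbf{A}(\mathbf{s}_0 + \mathbf{Q}_V\mathbf{p})\|_2 \\ &= \|\mathbf{r}_0 - \mathbf{A}\mathbf{Q}_V\mathbf{p}\|_2 = \|\beta\mathbf{q}_1 - \mathbf{Q}_{V+1}\tilde{\mathbf{H}}_V\mathbf{p}\|_2 \\ &= \|\mathbf{Q}_{V+1}(\beta\mathbf{e}_1 - \tilde{\mathbf{H}}_V\mathbf{p})\|_2. \end{aligned} \tag{15}$$

Since the column vectors of $\mathbf{Q}_{V+1}$ are orthonormal, it follows that

$$J(\mathbf{p}) = \|\beta\mathbf{e}_1 - \tilde{\mathbf{H}}_V\mathbf{p}\|_2. \tag{16}$$

With this definition of $J(\mathbf{p})$, the GMRES algorithm minimizes it to obtain the approximation within $\mathbf{s}_0 + \kappa_V$. The GMRES approximation can then be written as

$$\mathbf{s}_V = \mathbf{s}_0 + \mathbf{Q}_V\mathbf{p}_V, \tag{17}$$

where

$$\mathbf{p}_V = \arg\min_{\mathbf{p}} \|\beta\mathbf{e}_1 - \tilde{\mathbf{H}}_V\mathbf{p}\|_2. \tag{18}$$

Accordingly, the computation process of the GMRES algorithm is shown in Algorithm 2.

With the information given, M-MIMO detection problems can be solved easily. However, the key step of GMRES is step 12 in Algorithm 2, which is not detailed in [21, 24]. To supplement the process of GMRES and make it easier to understand, Givens rotation is introduced in this paper to solve this optimization problem; see Appendix A.

If $(\mathbf{A}^T + \mathbf{A})/2$ is positive definite, then in the $k$-th iteration,

$$\|\mathbf{r}_k\| \le \left(1 - \frac{\lambda_{\min}^2\left(\frac{1}{2}(\mathbf{A}^T + \mathbf{A})\right)}{\lambda_{\max}(\mathbf{A}^T\mathbf{A})}\right)^{k/2}\|\mathbf{r}_0\|, \tag{19}$$

where $\lambda_{\min}(\mathbf{M})$ and $\lambda_{\max}(\mathbf{M})$ denote the minimum and maximum eigenvalues of a matrix $\mathbf{M}$, respectively. In the M-MIMO detection scheme, the matrix $\mathbf{A}$ is SPD, so Eq. (19) becomes

$$\|\mathbf{r}_k\| \le \left(\frac{\tau_2(\mathbf{A})^2 - 1}{\tau_2(\mathbf{A})^2}\right)^{k/2}\|\mathbf{r}_0\|, \tag{20}$$

where $\tau_2(\mathbf{A})$ is the condition number of $\mathbf{A}$.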
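To get a feeling for Eq. (20), the contraction factor can be evaluated for a few iteration counts. The sketch below is only a numerical reading of the bound; the function name and the condition number 2.0 are hypothetical choices for illustration, while the iteration counts 2, 3, and 4 follow the values used in this paper.

```python
def gmres_residual_bound(cond_number, num_iter):
    """Upper bound on ||r_k|| / ||r_0|| according to Eq. (20) for an SPD matrix."""
    if cond_number < 1.0:
        raise ValueError("a condition number is at least 1")
    factor = (cond_number ** 2 - 1.0) / cond_number ** 2
    return factor ** (num_iter / 2.0)

# Hypothetical condition number; a smaller value gives a tighter bound.
for k in (2, 3, 4):
    print(k, gmres_residual_bound(cond_number=2.0, num_iter=k))
```

For a condition number of 2 the factor inside the parentheses is 0.75, so the bound shrinks by that factor every two iterations. A well-conditioned $\mathbf{A}$, as expected when $N \gg M$, therefore needs only a few iterations.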
From Eqs. (19) and (20), it can be seen that the residual norm of GMRES strictly decreases over the iterations, which shows its convergence. Combining the Arnoldi-based GMRES algorithm with Givens rotation, the complete GMRES algorithm serves as an advanced M-MIMO detection method that minimizes the norm of the residual vector.

Algorithm 2 Generalized Minimal Residual Algorithm
Input: $\mathbf{A}$ and $\tilde{\mathbf{y}}$
$\mathbf{r}_0 = \tilde{\mathbf{y}} - \mathbf{A}\mathbf{s}_0$, $\beta = \|\mathbf{r}_0\|_2$ and $\mathbf{q}_1 = \mathbf{r}_0/\beta$
Define the $(V+1)$-by-$V$ matrix $\tilde{\mathbf{H}}_V$ and set $\tilde{\mathbf{H}}_V = \mathbf{0}$
1: for $j = 1, \ldots, V$ do
2:   $\mathbf{w}_j = \mathbf{A}\mathbf{q}_j$
3:   for $i = 1, \ldots, j$ do
4:     $\tilde{\mathbf{H}}(i,j) = \mathbf{q}_i^H\mathbf{w}_j$
5:     $\mathbf{w}_j = \mathbf{w}_j - \tilde{\mathbf{H}}(i,j)\mathbf{q}_i$
6:   end for
7:   $\tilde{\mathbf{H}}(j+1,j) = \|\mathbf{w}_j\|_2$
8:   if $\tilde{\mathbf{H}}(j+1,j) = 0$ then
9:     $V = j$ and go to 13
10:   end if
11:   $\mathbf{q}_{j+1} = \mathbf{w}_j / \tilde{\mathbf{H}}(j+1,j)$
12:   $\mathbf{p}_V = \arg\min_{\mathbf{p}} \|\beta\mathbf{e}_1 - \tilde{\mathbf{H}}_V\mathbf{p}\|_2$
13:   $\mathbf{s}_k = \mathbf{s}_{k-1} + \mathbf{Q}_V\mathbf{p}_V$
14: end for
Output: $\hat{\mathbf{s}} = \mathbf{s}_V$

3.C Conjugate Residual Algorithm

As can be seen in Section 3.B, the complete GMRES algorithm needs too many operations, some of which are square roots and even the matrix inversion arising from Givens rotation, which should be avoided in M-MIMO detection. To remedy this while keeping the performance of the algorithm for M-MIMO detection, GMRES can be specialized to an advanced version.

GMRES is an algorithm for nonsymmetric problems, while M-MIMO detection amounts to solving an SPD problem. Hence some restrictions can be added to GMRES, which makes the GMRES algorithm evolve into the proposed conjugate residual (CR) algorithm. By switching from nonsymmetric problems to Hermitian problems, CR lowers the computational complexity of GMRES. Being another Krylov subspace iterative method, CR also minimizes the residual norm in each iteration and is feasible for M-MIMO detection. The computation process of the CR algorithm is shown in Algorithm 3.

The convergence of the algorithm can be proved as in [26].
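The CR recursion listed in Algorithm 3 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, matrices are nested lists, and the small real-valued $2 \times 2$ SPD system is a toy input chosen so that the result can be checked by hand.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inner(x, y):
    """Hermitian inner product x^H y."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def axpy(alpha, x, y):
    """Return alpha * x + y element-wise."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def conjugate_residual(A, y_mf, num_iter):
    """Run num_iter CR iterations on A s = y_mf, starting from s_0 = 0."""
    s = [0j] * len(y_mf)
    r = list(y_mf)            # r_0 = y - A s_0 with s_0 = 0
    p = list(r)               # p_0 = r_0
    m = matvec(A, r)          # m_0 = A r_0
    e = list(m)               # e_0 = A p_0 = A r_0
    rm = inner(r, m)          # r_{k-1}^H m_{k-1}
    for _ in range(num_iter):
        ee = inner(e, e).real
        if ee == 0.0 or rm == 0:
            break             # residual already zero, nothing left to update
        alpha = rm / ee
        s = axpy(alpha, p, s)
        r = axpy(-alpha, e, r)
        m = matvec(A, r)
        rm_new = inner(r, m)
        beta = rm_new / rm
        p = axpy(beta, p, r)
        e = axpy(beta, e, m)
        rm = rm_new
    return s

# Toy SPD system (assumed example); its exact solution is [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
y_mf = [1.0, 2.0]
print(conjugate_residual(A, y_mf, num_iter=2))
```

Only one matrix-vector product is needed per iteration, which is the main reason for the low multiplication count of CR.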
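Algorithm 3 CR for MMSE detection
Input: $\mathbf{A}$ and $\tilde{\mathbf{y}}$
$\mathbf{s}_0 = \mathbf{0}$, $\mathbf{r}_0 = \tilde{\mathbf{y}} - \mathbf{A}\mathbf{s}_0$, $\mathbf{p}_0 = \mathbf{r}_0$, $\mathbf{e}_0 = \mathbf{A}\mathbf{p}_0$, $\mathbf{m}_0 = \mathbf{A}\mathbf{r}_0$
1: for $k = 1, \ldots, K$ do
2:   $\alpha_k = \mathbf{r}_{k-1}^H\mathbf{m}_{k-1} / \|\mathbf{e}_{k-1}\|^2$
3:   $\mathbf{s}_k = \mathbf{s}_{k-1} + \alpha_k\mathbf{p}_{k-1}$
4:   $\mathbf{r}_k = \mathbf{r}_{k-1} - \alpha_k\mathbf{e}_{k-1}$
5:   $\mathbf{m}_k = \mathbf{A}\mathbf{r}_k$
6:   $\beta_k = \mathbf{r}_k^H\mathbf{m}_k / \mathbf{r}_{k-1}^H\mathbf{m}_{k-1}$
7:   $\mathbf{p}_k = \mathbf{r}_k + \beta_k\mathbf{p}_{k-1}$
8:   $\mathbf{e}_k = \mathbf{m}_k + \beta_k\mathbf{e}_{k-1}$
9: end for
Output: $\hat{\mathbf{s}} = \mathbf{s}_K$

For CR on an SPD system,

$$\|\mathbf{s}_k\|^2 - \|\mathbf{s}_{k-1}\|^2 = 2\alpha_k\mathbf{s}_{k-1}^T\mathbf{p}_{k-1} + \alpha_k^2\mathbf{p}_{k-1}^T\mathbf{p}_{k-1} \ge 0. \tag{21}$$

Therefore,

$$\|\mathbf{s}_k\| \ge \|\mathbf{s}_{k-1}\|. \tag{22}$$

Then, with the final solution $\mathbf{s}_l = \mathbf{s}^*$, it can be expressed as

$$\begin{aligned} \mathbf{s}_l &= \mathbf{s}_{l-1} + \alpha_l\mathbf{p}_{l-1} = \cdots \\ &= \mathbf{s}_k + \alpha_{k+1}\mathbf{p}_k + \cdots + \alpha_l\mathbf{p}_{l-1} \\ &= \mathbf{s}_{k-1} + \alpha_k\mathbf{p}_{k-1} + \alpha_{k+1}\mathbf{p}_k + \cdots + \alpha_l\mathbf{p}_{l-1}. \end{aligned} \tag{23}$$

From the conclusion above, it can be deduced that

$$\|\mathbf{s}_l - \mathbf{s}_{k-1}\|^2 - \|\mathbf{s}_l - \mathbf{s}_k\|^2 = 2\alpha_k\mathbf{p}_{k-1}^T(\alpha_{k+1}\mathbf{p}_k + \cdots + \alpha_l\mathbf{p}_{l-1}) + \alpha_k^2\mathbf{p}_{k-1}^T\mathbf{p}_{k-1} \ge 0. \tag{24}$$

For the MMSE linear detection problem, the linear equation $\mathbf{A}\mathbf{s} = \mathbf{y}$ is to be solved. Thus

$$\begin{aligned} \|\mathbf{s}_l - \mathbf{s}_{k-1}\|_{\mathbf{A}}^2 - \|\mathbf{s}_l - \mathbf{s}_k\|_{\mathbf{A}}^2 &= 2\alpha_k\mathbf{p}_{k-1}^T\mathbf{A}(\alpha_{k+1}\mathbf{p}_k + \cdots + \alpha_l\mathbf{p}_{l-1}) + \alpha_k^2\mathbf{p}_{k-1}^T\mathbf{A}\mathbf{p}_{k-1} \\ &= 2\alpha_k\mathbf{q}_{k-1}^T(\alpha_{k+1}\mathbf{p}_k + \cdots + \alpha_l\mathbf{p}_{l-1}) + \alpha_k^2\mathbf{q}_{k-1}^T\mathbf{p}_{k-1} > 0, \end{aligned} \tag{25}$$

where $\mathbf{q}_{k-1} = \mathbf{A}\mathbf{p}_{k-1}$.

The derivation above indicates that the $\mathbf{A}$-norm error $\|\mathbf{s}_l - \mathbf{s}_k\|_{\mathbf{A}}$ is strictly decreasing in $k$. Thus CR is feasible for massive MIMO detection.

4 Numerical Results and Comparison

4.A Results with Different Antenna Configurations

With 64-QAM and the i.i.d. channel model, the bit-error-rate (BER) performance of each RBD algorithm is compared under two antenna configurations. Here the iteration number $k$ is set to 2, 3, and 4, respectively. It is worth noting that, because the CR algorithm is derived from the GMRES algorithm in the M-MIMO scheme, they have the same BER performance, as mentioned in Section 3.C. For better elaboration, this is also shown in Fig. 1.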
It can be seen that when $k = 4$, both of them approximate traditional matrix inversion. To be specific, at BER $= 10^{-4}$, CR has only a 0.28 dB gap from Cholesky decomposition.

Fig. 1: Performance comparison with $N \times M = 128 \times 16$ (BER versus SNR [dB]; curves: GMRES and CR with $k = 2, 3, 4$, and Cholesky inversion).

Figs. 2 and 3 compare the BER performance of each RBD algorithm when the antenna configuration is $N \times M = 128 \times 16$ and $128 \times 8$, respectively. In Fig. 2, at BER $= 10^{-3}$ and iteration number $k = 4$, MINRES suffers a 2.3 dB loss compared with Cholesky decomposition, while GMRES and CR have only a 0.18 dB gap from Cholesky decomposition. The performance of RBD algorithms improves considerably as the iteration number $k$ increases. Meanwhile, GMRES and CR clearly outperform MINRES. For example, at BER $= 10^{-2}$ and $k = 3$, CR and GMRES outperform MINRES by an SNR gap of 2.73 dB.

Fig. 2: Performance comparison with $N \times M = 128 \times 16$ (BER versus SNR [dB]; curves: GMRES, CR, and MINRES with $k = 2, 3, 4$, and Cholesky inversion).

For the other antenna configuration, $N \times M = 128 \times 8$, as shown in Fig. 3, RBD algorithms perform well in approximating the Cholesky decomposition scheme. MINRES shows a large performance improvement as the iteration number increases. Take BER $= 7 \times 10^{-2}$ for instance: MINRES gains 6.2 dB when the iteration number increases from $k = 2$ to $k = 3$. CR and GMRES have almost the same performance as exact matrix inversion when $k \ge 3$, in which case the SNR gap between them is less than 0.2 dB.
Fig. 3: Performance comparison with $N \times M = 128 \times 8$ (BER versus SNR [dB]; curves: GMRES, CR, and MINRES with $k = 2, 3, 4$, and Cholesky inversion).

4.B Results with Different Correlation Conditions

Consider an $N \times M = 128 \times 8$ M-MIMO system with 4 iterations. The BER performances of each RBD algorithm and Cholesky decomposition are given in Fig. 4. Here three conditions are considered: i) User Correlated case ($\zeta_t = 0.2$, $\zeta_r = 0$), ii) BS Correlated case ($\zeta_t = 0$, $\zeta_r = 0.3$), iii) Fully Correlated case ($\zeta_t = 0.2$, $\zeta_r = 0.3$).

Fig. 4: Performance comparison with correlations (BER versus SNR [dB]; curves: Cholesky inversion, CR and GMRES, and MINRES for the User Correlated, BS Correlated, and Fully Correlated cases).

In Fig. 4, as the correlation factor $\zeta$ varies, the performance of GMRES and CR remains stable and the performance loss is less than 0.5 dB. MINRES is more affected by the correlation condition and loses up to 1.7 dB at BER $= 9 \times 10^{-2}$. Overall, RBD algorithms are not very sensitive to correlation conditions for M-MIMO.

5 Computational Complexity Analysis

In this section, the computational complexity of each RBD algorithm is analyzed and compared for a better understanding of RBD algorithms. As mentioned in Section 3.A, MINRES is the basic RBD algorithm. GMRES is the generalized version of MINRES and has a complex computation process.
To meet the requirements of M-MIMO systems, GMRES can be specialized into the CR algorithm, which is suitable for M-MIMO detection. Table 1 summarizes the comparison of different algorithms in terms of complex-valued additions and complex-valued multiplications.

Table 1: Complexity Comparison of Different Algorithms.

| Operation | MINRES | GMRES | CR |
| Addition | $2kM$ | $(\frac{1}{2}k^2 + \frac{3}{2}k + 1)M$ | $(4k + 1)M$ |
| Multiplication | $4kM^2 + 2kM$ | $(\frac{5}{2}k^2 + \frac{1}{2}k + 1)M^2 + (\frac{1}{2}k^2 + \frac{1}{2}k)M$ | $(k + 3)M^2 + 8kM$ |

The detection complexity is mainly contributed by complex-valued multiplications. The complexity of each algorithm is compared with Cholesky decomposition. Suppose the antenna number at the BS is 128 and the SNR is 20 dB. The complexity comparison is shown in Fig. 5.

Fig. 5: Computational complexity comparison (number of complex-valued multiplications, $\times 10^5$, versus number of user antennas $M$; curves: Cholesky inversion, and MINRES, GMRES, and CR with $k = 2, 3$).

5.A Complexity of MINRES

Being the basic RBD, MINRES has the simplest computation process, though in Fig. 5 its complexity is not the lowest. Nevertheless, when the user antenna number is 60, MINRES achieves a 76% complexity reduction compared with traditional matrix inversion after 3 iterations. Thus, in terms of trade-off, the complexity reduction outweighs the performance loss.

5.B Complexity of GMRES

As a generalized version of MINRES, GMRES has more application scenarios. However, its complexity rises as well. As shown in Fig. 5, GMRES has higher complexity than the other RBD algorithms. Similarly, when the user antenna number is 60, GMRES reduces the complexity of the traditional method by 50% after 3 iterations.
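The multiplication counts of Table 1 can be evaluated directly for the operating point used in this section ($M = 60$ users, $k = 3$ iterations). The sketch below only transcribes the three formulas; the function names are hypothetical, and the count for Cholesky inversion is not reproduced because its closed form is not given here.

```python
def minres_mults(k, M):
    """Complex multiplications of MINRES: 4kM^2 + 2kM."""
    return 4 * k * M**2 + 2 * k * M

def gmres_mults(k, M):
    """GMRES: (5/2 k^2 + 1/2 k + 1) M^2 + (1/2 k^2 + 1/2 k) M, in integer arithmetic."""
    # Both bracketed coefficients are whole numbers after doubling, so // 2 is exact.
    return (5 * k * k + k + 2) * M**2 // 2 + (k * k + k) * M // 2

def cr_mults(k, M):
    """Complex multiplications of CR: (k + 3) M^2 + 8kM."""
    return (k + 3) * M**2 + 8 * k * M

for name, count in (("MINRES", minres_mults), ("GMRES", gmres_mults), ("CR", cr_mults)):
    print(name, count(3, 60))
```

At this operating point the formulas give 43560 multiplications for MINRES, 90360 for GMRES, and 23040 for CR, so CR needs roughly half the multiplications of MINRES and about a quarter of those of GMRES.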
5.C Complexity of CR

It is clear that CR has the lowest complexity among RBD algorithms: under the same condition, CR reduces the complexity of the traditional method by 87% when the user antenna number is 60 after 3 iterations. Given its BER performance in Section 4, CR is the best of the RBD algorithms and can substitute for GMRES in M-MIMO.

6 Hardware Architecture for RBD Algorithms

The computational process of RBD algorithms is introduced in Section 3. To further elaborate RBD algorithms, the corresponding hardware architectures are shown in Sections 6.A and 6.B. Since the GMRES algorithm delivers the same BER performance as the CR algorithm but at unaffordable computational complexity, GMRES is not hardware friendly. Thus the implementation of GMRES is replaced by the CR algorithm. A method to unify the hardware design is also proposed in Section 6.C. Using the new design method, RBD algorithms can be built from only two basic modules. Unified architectures of MINRES and CR are proposed in Section 6.C to validate the design method.

6.A Hardware Architecture of MINRES Algorithm

As mentioned in Section 3.A, MINRES is the most basic RBD algorithm, so its hardware architecture is not very complex. Fig. 7 shows the architecture of the MINRES algorithm, which is divided into two units. The Preprocessing Unit computes the matched-filter output $\tilde{\mathbf{y}}$ and the matrix $\mathbf{A}$. The Minimal Residual Algorithm Unit is the main unit of the architecture and minimizes the residual norm in each iteration. In Fig. 7, $\tilde{\mathbf{y}}$ is denoted by $\mathbf{y}_E$ and the symbol output is denoted by $\mathbf{s}_{k+1}$, where the index $k$ is the iteration number.

6.A.1 Preprocessing Unit

In this unit, the matched filter module computes $\tilde{\mathbf{y}}$ by $\tilde{\mathbf{y}} = \mathbf{H}^H\mathbf{y}$, while the MMSE filtering matrix $\mathbf{A}$ is computed by the Gram matrix module. Since the matrix $\mathbf{A}$ is Hermitian, an $M \times M$ lower triangular systolic array is adopted to compute it.
Each processing element (PE) performs a multiply-accumulate (MAC) operation with the same inputs.

6.A.2 Minimal Residual Algorithm Unit

In this unit, symbol signals are stored and computed iteratively. The square modules labeled $\mathbf{r}$, $\mathbf{m}$, $\mathbf{s}$, and $\alpha$ store the corresponding signals of each iteration. The Hermitian conjugate module outputs the conjugate transpose of its input, and the module labeled "/" performs division, in which the input from the bottom is the divisor. The module labeled "D" is the delay element, which provides the signal of the previous iteration. The "Mod" module computes the modulus of the input signal. At the end of the iterations, the output $\mathbf{s}_k$ is the final symbol output of MINRES.

6.B Hardware Architecture of CR Algorithm

Being another RBD algorithm, CR has much lower computational complexity compared with traditional exact matrix inversion. As shown in Section 4.A, CR has better performance than MINRES. Thus CR performs well within RBD algorithms in terms of performance. The hardware architecture of CR contains three parts: preprocessing unit, conjugate residual algorithm unit, and output unit. As in Section 6.A.1, the preprocessing unit computes $\tilde{\mathbf{y}}$ and $\mathbf{A}$. The conjugate residual algorithm unit is the main unit; it computes symbol signals iteratively to minimize the residual norm as well.

Fig. 6: Hardware architecture of CR method (block diagram: Gram matrix and matched filter modules, storage modules for r, s, m, e, p, Hermitian conjugate, division, "Mod", and delay modules).

Fig. 7: Hardware architecture of MINRES method (block diagram: Preprocessing Unit with Gram matrix and matched filter modules, Minimal Residual Algorithm Unit, and Output).

6.B.1 Preprocessing Unit

Functioning as a preliminary unit, this unit has the same architecture as that in Section 6.A.1.
6.B.2 Conjugate Residual Algorithm Unit

As the main computing unit of the CR algorithm, this unit adopts functional modules similar to those in Section 6.A.2. The difference is that, with the output of the preprocessing unit, the CR algorithm needs initialization, which is denoted by the input from the bottom of the storage modules of the vectors $\mathbf{r}$, $\mathbf{p}$, $\mathbf{e}$, and $\mathbf{m}$. Meanwhile, in this architecture the left input of the division module is the dividend and the upper input is the divisor.

6.B.3 Output Unit

With the output of the CR algorithm unit, the estimate of the transmitted signal stored in $\mathbf{s}$ is provided. When the iteration ends, the final symbol output is denoted by $\mathbf{s}_E$.

6.C Unified Hardware Architecture

Although both are RBD algorithms, MINRES and CR have different architectures. Thus, in terms of implementation, they are unrelated. Thanks to the special characteristic of RBD algorithms, namely the minimization of the residual norm, RBD algorithms can be designed by a unified design method. In this part, a design method to normalize the hardware architecture of RBD algorithms is first introduced, and then the unified architectures of MINRES and CR are given.

6.C.1 Normalizing Design Method of RBD Algorithms

The purpose of the normalizing design method is to make the hardware design of RBD algorithms flexible and reusable, so that users can switch detectors with existing hardware resources whenever they want. To this end, the normalizing design method builds on the common characteristic of RBD algorithms, which is minimizing the residual norm.

Considering the computation process and hardware modules of RBD algorithms, this method takes two modules as basic modules: an iterative module and a coefficient module. The iterative module iteratively updates the signal or computes the residual in each iteration. The coefficient module computes the coefficient of each vector in the computation process.
Hardware architectures of both basic modules are shown in Fig. 8. The iterative module consists of two operation units, a multiplier and an accumulator, which together perform a MAC operation. The coefficient module consists of two Hermitian conjugate modules, a division module, and a multiplier, and provides the coefficient for each iterative module. The upper input of this module is the dividend and the input from the bottom is the divisor. The operation of each module can be denoted by

$$\mathbf{y} = \mathbf{x} + a\mathbf{b}, \qquad c = \frac{\mathbf{m}^H\mathbf{n}}{\mathbf{p}^H\mathbf{q}}. \tag{26}$$

Fig. 8: Basic modules of unified hardware architecture (iterative module with inputs x, a, b and output y; coefficient module with inputs m, n, p, q and output c).

Having these two basic modules, hardware architectures of RBD algorithms can be unified. Besides the basic modules, only some multipliers and delay elements are needed. Thus the flexibility of the hardware is improved and the architecture can be reused. These two basic modules can also be used in some other iterative detection algorithms. To validate this design method, unified hardware architectures are given in Sections 6.C.2 and 6.C.3.

6.C.2 Unified Architecture of MINRES Algorithm

As the basic RBD algorithm, MINRES does not need many basic modules: its unified architecture contains two iterative modules and a coefficient module.

The input signal of this unified architecture is also computed by the Gram matrix module and the matched filter. With the normalizing design method, the MINRES algorithm adopts two iterative modules to store the residual $\mathbf{r}$ and signal $\mathbf{s}$. The coefficient module serves for the coefficient $\alpha$. Aside from the basic modules, the unified hardware architecture of MINRES only has an additional multiplier. After the iterations are completed, the symbol output is given as the output of an iterative module.
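The decomposition of MINRES into the two basic modules of Eq. (26) can be mimicked in software. The following sketch is a behavioural illustration only, not a hardware description: the function names are hypothetical, and the residual is updated recursively as $\mathbf{r}_{k+1} = \mathbf{r}_k - \alpha_k\mathbf{A}\mathbf{r}_k$, which is algebraically the same as recomputing $\tilde{\mathbf{y}} - \mathbf{A}\mathbf{s}_{k+1}$ in Algorithm 1.

```python
def matvec(A, x):
    """The one additional multiplier: matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def iterative_module(x, a, b):
    """Iterative module of Eq. (26): y = x + a * b (one MAC per entry)."""
    return [xi + a * bi for xi, bi in zip(x, b)]

def coefficient_module(m, n, p, q):
    """Coefficient module of Eq. (26): c = (m^H n) / (p^H q)."""
    num = sum(mi.conjugate() * ni for mi, ni in zip(m, n))
    den = sum(pi.conjugate() * qi for pi, qi in zip(p, q))
    return num / den

def minres_unified(A, y_mf, num_iter):
    """MINRES built from two iterative modules and one coefficient module."""
    s = [0j] * len(y_mf)
    r = list(y_mf)                      # r_0 = y - A s_0 with s_0 = 0
    for _ in range(num_iter):
        Ar = matvec(A, r)
        if all(v == 0 for v in Ar):
            break                       # zero residual, nothing left to update
        alpha = coefficient_module(r, Ar, Ar, Ar)
        s = iterative_module(s, alpha, r)       # signal update
        r = iterative_module(r, -alpha, Ar)     # residual update
    return s, r

# Toy SPD system chosen for illustration only.
A = [[4.0, 1.0], [1.0, 3.0]]
y_mf = [1.0, 2.0]
for k in (1, 2, 3):
    s_k, r_k = minres_unified(A, y_mf, k)
    print(k, sum(abs(v) ** 2 for v in r_k) ** 0.5)
```

The printed residual norms shrink from one iteration to the next, in line with Eq. (12). Reusing the same two functions for both updates is the software analogue of sharing hardware modules between detectors.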
Fig. 9: Unified architecture of MINRES algorithm.

6.C.3 Unified Hardware Architecture of CR Algorithm

The traditional hardware architecture of the CR algorithm is relatively complex, as shown in Fig. 6. After normalization, the architecture is shown in Fig. 10.

The unified hardware architecture of the CR algorithm contains four iterative modules and two coefficient modules. The iterative modules store the signals $\mathbf{r}$, $\mathbf{e}$, $\mathbf{p}$, and $\mathbf{s}$. The coefficient modules store the values of the coefficients $\alpha$ and $\beta$. Within each iteration, signal $\mathbf{m}$ is updated by a multiplier, and two delay elements in the architecture store the corresponding signals of the previous iteration, as mentioned in Algorithm 3. The initial value of each signal is the upper input of each module.

With the method proposed in Section 6.C.1, the hardware architectures of RBD algorithms can be unified. Furthermore, the design method can also be applied to other linear iterative detectors such as CG.

7 Conclusion

In this paper, RBD algorithms are proposed for M-MIMO detection, including the MINRES, GMRES, and CR algorithms. Unlike most other iterative linear detection algorithms, the proposed RBD algorithms focus on minimizing the residual norm. Numerical results for different antenna configurations show that their performance approaches that of traditional matrix inversion, and results under different correlation conditions demonstrate their stability. In addition, the computational complexity of the RBD algorithms is compared, and the comparison with matrix inversion shows their advantage in complexity reduction. Finally, hardware architectures of the RBD algorithms are given, and unified hardware architectures are then derived from them with the proposed normalizing design method.
Therefore, the proposed RBD algorithms offer good performance, low complexity, and robustness to correlation, which are favorable for M-MIMO systems. Future work will be directed towards FPGA implementation and further optimization of RBD algorithms.

Acknowledgements

To be edited.

Fig. 10: Unified architecture of CR algorithm.

A Derivation of Givens rotation

Consider the problem $\mathbf{p} = \arg\min_{\mathbf{p}} \| \beta \mathbf{e}_1 - \tilde{\mathbf{H}}_V \mathbf{p} \|_2$, where $\tilde{\mathbf{H}}_V$ is a $(V+1)$-by-$V$ matrix. This is an over-constrained linear system of $V+1$ equations in $V$ unknowns, and the minimum can be computed by QR decomposition [27]. There exist a $(V+1)$-by-$(V+1)$ orthogonal matrix $\mathbf{\Omega}_V$ and a $(V+1)$-by-$V$ upper triangular matrix $\tilde{\mathbf{R}}_V$ such that $\mathbf{\Omega}_V \tilde{\mathbf{H}}_V = \tilde{\mathbf{R}}_V$. Because of the structure of $\tilde{\mathbf{H}}_V$ and $\tilde{\mathbf{R}}_V$, they can be denoted as

$$ \tilde{\mathbf{H}}_{V+1} = \begin{bmatrix} \tilde{\mathbf{H}}_V & \mathbf{h}_{V+1} \\ \mathbf{0} & h_{V+2,V+1} \end{bmatrix}, \qquad \tilde{\mathbf{R}}_V = \begin{bmatrix} \mathbf{R}_V \\ \mathbf{0} \end{bmatrix}, \tag{27} $$

where $\mathbf{h}_{V+1} = (h_{1,V+1}, \ldots, h_{V+1,V+1})^T$. Premultiplying the Hessenberg matrix $\tilde{\mathbf{H}}_{V+1}$ by $\mathbf{\Omega}_V$, extended with zeros and a row containing the multiplicative identity, yields a nearly triangular matrix:

$$ \begin{bmatrix} \mathbf{\Omega}_V & \mathbf{0} \\ \mathbf{0} & 1 \end{bmatrix} \tilde{\mathbf{H}}_{V+1} = \begin{bmatrix} \mathbf{R}_V & \mathbf{r}_{V+1} \\ \mathbf{0} & \rho \\ \mathbf{0} & \sigma \end{bmatrix}. \tag{28} $$

If $\sigma = 0$, this matrix would be triangular. The Givens rotation [28] remedies the general case with

$$ \mathbf{G}_{V+1} = \begin{bmatrix} \mathbf{I}_V & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & c_V & b_V \\ \mathbf{0} & -b_V & c_V \end{bmatrix}, \tag{29} $$

where

$$ c_V = \frac{\rho}{\sqrt{\rho^2 + \sigma^2}} \quad \text{and} \quad b_V = \frac{\sigma}{\sqrt{\rho^2 + \sigma^2}}. \tag{30} $$

After applying the Givens rotation, the orthogonal matrix is updated as

$$ \mathbf{\Omega}_{V+1} = \mathbf{G}_{V+1} \begin{bmatrix} \mathbf{\Omega}_V & \mathbf{0} \\ \mathbf{0} & 1 \end{bmatrix}. \tag{31} $$

Meanwhile, a triangular matrix is obtained as

$$ \mathbf{\Omega}_{V+1} \tilde{\mathbf{H}}_{V+1} = \begin{bmatrix} \mathbf{R}_V & \mathbf{r}_{V+1} \\ \mathbf{0} & r_{V+1,V+1} \\ \mathbf{0} & 0 \end{bmatrix}, \tag{32} $$

where $r_{V+1,V+1} = \sqrt{\rho^2 + \sigma^2}$.

Given the QR decomposition, and because multiplication by the orthogonal matrix $\mathbf{\Omega}_V$ preserves the norm, the minimization problem can be transformed as

$$ \| \tilde{\mathbf{H}}_V \mathbf{p}_V - \beta \mathbf{e}_1 \| = \| \mathbf{\Omega}_V ( \tilde{\mathbf{H}}_V \mathbf{p}_V - \beta \mathbf{e}_1 ) \| = \| \tilde{\mathbf{R}}_V \mathbf{p}_V - \beta \mathbf{\Omega}_V \mathbf{e}_1 \|. \tag{33} $$

Next, the vector $\tilde{\mathbf{g}}_V$ is used to denote $\beta \mathbf{\Omega}_V \mathbf{e}_1$ as

$$ \tilde{\mathbf{g}}_V = \begin{bmatrix} \mathbf{g}_V \\ \gamma_V \end{bmatrix}, \tag{34} $$

where $\mathbf{g}_V \in \mathbb{R}^V$ and $\gamma_V \in \mathbb{R}$. Finally, the norm $\| \tilde{\mathbf{H}}_V \mathbf{p}_V - \beta \mathbf{e}_1 \|$ can be written as

$$ \| \tilde{\mathbf{H}}_V \mathbf{p}_V - \beta \mathbf{e}_1 \| = \| \tilde{\mathbf{R}}_V \mathbf{p}_V - \beta \mathbf{\Omega}_V \mathbf{e}_1 \| = \left\| \begin{bmatrix} \mathbf{R}_V \\ \mathbf{0} \end{bmatrix} \mathbf{p}_V - \begin{bmatrix} \mathbf{g}_V \\ \gamma_V \end{bmatrix} \right\|. \tag{35} $$

So the vector $\mathbf{p}_V$ that minimizes the norm is

$$ \mathbf{p}_V = \mathbf{R}_V^{-1} \mathbf{g}_V, \tag{36} $$

where the vector $\mathbf{g}_V$ can be updated easily, and the minimization problem is thus solved.

References

1. Erik Larsson et al. Massive MIMO for next generation wireless systems. IEEE Commun. Mag., 52(2):186–195, 2014.
2. Juho Lee, Jin-Kyu Han, and Jianzhong Charlie Zhang. MIMO technologies in 3GPP LTE and LTE-advanced. EURASIP Journal on Wireless Communications and Networking, 2009(1):1–10, 2009.
3. Hoon Huh, Giuseppe Caire, Haralabos C Papadopoulos, and Sean A Ramprashad. Achieving "massive MIMO" spectral efficiency with a not-so-large number of antennas. IEEE Trans. Wireless Commun., 11(9):3226–3239, 2012.
4. Lu Lu, Geoffrey Ye Li, A Lee Swindlehurst, Alexei Ashikhmin, and Rui Zhang. An overview of massive MIMO: Benefits and challenges. IEEE Journal of Selected Topics in Signal Processing, 8(5):742–758, 2014.
5. Quentin H Spencer, A Lee Swindlehurst, and Martin Haardt. Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels. IEEE Trans. Signal Process., 52(2):461–471, 2004.
6. Erik G Larsson. MIMO detection methods: How they work [lecture notes]. IEEE Signal Process. Mag., 26(3), 2009.
7. Aravindh Krishnamoorthy and Deepak Menon. Matrix inversion using Cholesky decomposition. In Proc. IEEE Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA), pages 70–72, 2013.
8. Feng Wang, Chuan Zhang, Junmei Yang, Xiao Liang, Xiaohu You, and Shugong Xu. Efficient matrix inversion architecture for linear detection in massive MIMO systems. In Proc.
IEEE Digital Signal Processing (DSP), pages 248–252, 2015.
9. Xiao Liang, Chuan Zhang, Shugong Xu, and Xiaohu You. Coefficient adjustment matrix inversion approach and architecture for massive MIMO systems. In Proc. Inter. Conf. on ASIC (ASICON), pages 1–4, 2015.
10. Michael Wu, Bei Yin, Guohui Wang, Chris Dick, Joseph R Cavallaro, and Christoph Studer. Large-scale MIMO detection for 3GPP LTE: Algorithms and FPGA implementations. IEEE Journal of Selected Topics in Signal Processing, 8(5):916–929, 2014.
11. Linglong Dai, Xinyu Gao, Xin Su, Shuangfeng Han, I Chih-Lin, and Zhaocheng Wang. Low-complexity soft-output signal detection based on Gauss–Seidel method for uplink multiuser large-scale MIMO systems. IEEE Trans. Veh. Technol., 64(10):4839–4845, 2015.
12. Zhizhen Wu, Chuan Zhang, Ye Xue, Shugong Xu, and Xiaohu You. Efficient architecture for soft-output massive MIMO detection with Gauss-Seidel method. In Proc. IEEE Circuits and Systems (ISCAS), pages 1886–1889, 2016.
13. Bei Yin, Michael Wu, Joseph R Cavallaro, and Christoph Studer. VLSI design of large-scale soft-output MIMO detection using conjugate gradients. In Proc. IEEE Circuits and Systems (ISCAS), pages 1498–1501, 2015.
14. Bei Yin, Michael Wu, Joseph R Cavallaro, and Christoph Studer. Conjugate gradient-based soft-output detection and precoding in massive MIMO systems. In Proc. IEEE International Workshop on Green Communications, parallel with IEEE GLOBECOM, pages 3696–3701, 2014.
15. Peng Zhang, Leibo Liu, Guiqiang Peng, and Shaojun Wei. Large-scale MIMO detection design and FPGA implementations using SOR method. In Proc. IEEE Inter. Conf. on Communication Software and Networks (ICCSN), pages 206–210, 2016.
16. Xinyu Gao, Linglong Dai, Yuting Hu, Zhongxu Wang, and Zhaocheng Wang. Matrix inversion-less signal detection using SOR method for uplink large-scale MIMO systems. In Proc. IEEE Global Communications Conference (GLOBECOM), pages 3291–3295, 2014.
17. Anlan Yu, Chuan Zhang, Shunqing Zhang, and Xiaohu You. Efficient SOR-based detection and architecture for large-scale MIMO uplink. In Proc. IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), pages 402–405, 2016.
18. Ye Xue, Chuan Zhang, Shunqing Zhang, and Xiaohu You. A fast-convergent pre-conditioned conjugate gradient detection for massive MIMO uplink. In Proc. IEEE Digital Signal Processing (DSP), pages 331–335, 2016.
19. J. Jin, Y. Xue, Y. L. Ueng, X. You, and C. Zhang. A split pre-conditioned conjugate gradient method for massive MIMO detection. In Proc. IEEE International Workshop on Signal Processing Systems (SiPS), pages 1–6, 2017.
20. Shaoshi Yang and Lajos Hanzo. Fifty years of MIMO detection: The road to large-scale MIMOs. IEEE Commun. Surveys Tuts., 17(4):1941–1988, 2015.
21. Abderrazek Abdaoui, Marion Berbineau, and Hichem Snoussi. GMRES interference canceler for doubly iterative MIMO system with a large number of antennas. In Proc. IEEE International Symposium on Signal Processing and Information Technology, pages 449–453, 2007.
22. Jean-Philippe Kermoal, Laurent Schumacher, Klaus I Pedersen, Preben E Mogensen, and Frank Frederiksen. A stochastic MIMO radio channel model with experimental validation. IEEE J. Sel. Areas Commun., 20(6):1211–1226, 2002.
23. Yousef Saad. Iterative methods for sparse linear systems. SIAM, 2003.
24. Abderrazak Abdaoui, Marion Berbineau, and Hichem Snoussi. GMRES interference canceller for MIMO relay network. In Proc. IEEE GLOBECOM, pages 1–5, 2008.
25. Heinrich Voß. An Arnoldi method for nonlinear eigenvalue problems. BIT Numerical Mathematics, 44(2):387–401, 2004.
26. David Chin-Lung Fong and Michael A Saunders. CG versus MINRES: An empirical comparison. SQU Journal for Science, 17(1):44–62, 2012.
27. Dirk Wubben, Ronald Bohnke, Volker Kuhn, and K-D Kammeyer. MMSE extension of V-BLAST based on sorted QR decomposition. In Proc. IEEE Vehicular Technology Conference (VTC)-Fall, volume 1, pages 508–512, 2003.
28. Fuyun Ling. Givens rotation based least squares lattice and related algorithms. IEEE Trans. Signal Process., 39(7):1541–1551, 1991.