5Gperf: signal processing performance for 5G
The 5Gperf project was conducted by Huawei research teams in 2016-17. It was concerned with the acceleration of signal-processing algorithms for a 5G base-station prototype. It improved on already optimized SIMD-parallel CPU algorithms and designed a…
Authors: Gaétan Hains, Wijnand Suijlen, LIANG Wenliang and WU Zixu. Affiliations: Huawei Central Research Institute (CRI) Wireless Technology Lab; Huawei Central Software Institute (CSI) Paris Team.
Improved algorithms and SIMD software acceleration for base-stations

Technical Report PADAL-TR-2018-2, 2018-2-7
Huawei Technologies 2012Labs/CSI/DPSL/PADAL, Huawei Paris R&D Center

Gaétan Hains and Wijnand Suijlen
Huawei Parallel and Distributed Algorithms Lab., Paris Research Center, Boulogne-Billancourt, France

LIANG Wenliang and WU Zixu
Huawei 5G Research Department, Shanghai, P.R.C.

Abstract: The 5Gperf project was conducted by Huawei research teams in 2016-17. It was concerned with the acceleration of signal-processing algorithms for a 5G base-station prototype. It improved on already optimized SIMD-parallel CPU algorithms and designed a new software tool for higher programmer productivity when converting MATLAB code to optimized C.

Keywords: 5G wireless communication systems, software acceleration, signal processing algorithms, SIMD CPU operations.

I. INTRODUCTION

As a leading vendor of wireless telecommunication systems, Huawei/CRI (Central Research Institute)’s Wireless Technology Lab is developing a 5G base-station prototype and has demonstrated its very high performance based on MIMO technology [1,2,3]. Base-station power consumption and throughput depend critically on the efficiency of the signal-processing system. Its algorithms are designed by wireless signal experts, usually in MATLAB, and then have to be converted to high-performance sequential C, a labor-intensive process of up to one man-month per new algorithm or pipeline module version.
The 5Gperf project has been a collaboration with the Paris team of Huawei’s CSI (Central Software Institute) to improve key algorithms and to design a software tool that raises human productivity when writing high-performance C code. This paper summarizes the project and its results.

II. 5G BASE STATION PROTOTYPE

A new 5G base-station has been prototyped, built and tested. Its signal-processing system is a pair of algorithm pipelines for processing signal packets, as shown in figure 1. Each stage in the pipeline implements a specialized algorithm, and the system throughput is limited by the speed of each one, so the slowest stage bounds the whole pipeline. The system currently runs on a hardware configuration of five Huawei E9000 blade servers connected by InfiniBand. One pipeline instance per CPU core runs in continuous mode.

Each algorithm has been carefully designed in MATLAB to maximize signal quality. It has then been converted to optimized sequential C code to become the compute kernel that implements the corresponding pipeline stage. This process is too labor-intensive and the results are not always optimal because of the complex interplay between hardware architecture and relatively small compute kernels. The 5Gperf project has scrutinized some performance-critical algorithms and provided a new software tool for improving both the productivity of the MATLAB-to-C conversion and the performance of the resulting code.

Fig. 1. The base-station’s signal-processing pipeline

Figure 2 illustrates one of the prototype field tests [9]. The top-right image shows 26 cell-phone stands like the one in the bottom-right image. They were all connected to the antenna array shown in the top-left image. The antenna array is less than 1 m wide. It features 4 x 8 x 2 = 64 transceivers operating over a radio-frequency band of 100 MHz.

Fig. 2. The 5G base-station prototype

Fig. 5. Beamforming

Turbo decoding has also been re-implemented, tested and given acceleration factors of 1.7 to 1.9 for its pipeline stage. The corresponding number of CPUs required for a parallel multi-stream execution of multiple instances of turbo decode has been reduced from 20 to 12, with similar energy savings.

IV. IMPROVING PROGRAMMER PRODUCTIVITY: A GENERIC TOOL

The 5Gperf project has also designed and implemented a new software tool called the optimizer so that programmers can convert performance-naive MATLAB code to optimized C in much less time and in a reliable fashion. It provides high programmer productivity, highest-possible performance for the predefined operations it applies, and portability to ARM architectures through Numscale’s bSIMD library [4, 5].

The optimizer’s principle and design are summarized in figure 6. The application developer in charge of the signal-processing pipeline’s algorithms can transform a high-level algorithm description into optimized and portable C code by annotating critical portions of the code. Each such code segment corresponds to an algorithm building block available in the optimizer’s database. The optimizer tool then replaces the annotations by the most efficient version of the building block for a given matrix size and target architecture.

Fig. 6. The 5Gperf software tool

The optimizer’s parser looks for pragma instructions in the source C code.

/// PRAGMA INCLUDES
This line tells the optimizer to put the extra includes required by building blocks where it appears.

/// PRAGMA FUNCTIONS
This line tells the optimizer to put the extra code required by building blocks where it appears.

/// PRAGMA BEGIN ke, p2, p3, ..., pn
This line tells the optimizer to insert the optimized code produced by the kernel named ke.

The following example is valid input for optimizer.py:

     1. #include
     2. #include
     3. /// PRAGMA INCLUDES
     4. /// PRAGMA FUNCTIONS
     5. int main() {
     6.   std::cout << "begin" << std::endl;
     7.   /// PRAGMA BEGIN algo, b, c,
     8.   std::cout << "BAD" << std::endl;
     9.   /// g, h
    10.   std::cout << "BAD" << std::endl;
    11.   /// PRAGMA END
    12.   std::cout << "end" << std::endl;
    13.   return 0;
    14. }

In the above example the optimizer will replace lines 7-11 of the listing, from PRAGMA BEGIN to PRAGMA END (lines 6-10 in the zero-based numbering that the optimizer reports), by the best algorithm given by the executable bbs/algo with parameters b, c, g and h. The choice of best algorithm depends on vector sizes for SIMD libraries, loops and target architecture. The parameters are passed to the bbs/algo executable as command-line arguments. The output will look like this:

    #include
    #include

    #ifdef OPTIMIZER_ACTIVATED
    // Extra includes here required by building blocks
    // This block of code replaces the original /// PRAGMA INCLUDES
    #endif

    #ifdef OPTIMIZER_ACTIVATED
    // Extra code here required by building blocks
    // This block of code replaces the original /// PRAGMA FUNCTIONS
    #endif

    int main() {
      std::cout << "begin" << std::endl;
    #ifndef OPTIMIZER_ACTIVATED
      std::cout << "BAD" << std::endl;
      std::cout << "BAD" << std::endl;
    #else
      // OPTIMIZED CODE HERE PROPERLY INDENTED
    #endif // PRAGMA BEGIN line 6
      std::cout << "end" << std::endl;
      return 0;
    }

V. CONCLUSIONS

Even highly optimized signal-processing code can be improved by moderate factors for such a critical application as 5G signal processing. But the process is very work-intensive, especially because the compute tasks are small and memory access is a limiting factor. So a special type of generic programming is needed. Programmer productivity should be multiplied, and useful speedups (20% to 100% accelerations) can be obtained across the whole signal-processing pipeline.
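The turbo-decoding figures reported above (acceleration factors of 1.7 to 1.9, CPU count reduced from 20 to 12) can be cross-checked with a small calculation. The sketch below assumes that the workload is fixed and that the number of CPUs needed scales inversely with the per-stage speedup; the report does not state this model, and the function names are hypothetical.

```python
import math


def cpus_needed(baseline_cpus: int, speedup: float) -> int:
    """CPUs needed after a speedup.

    Assumption (not from the report): the total work is fixed and spreads
    evenly over CPUs, so the count scales as baseline / speedup, rounded up.
    """
    return math.ceil(baseline_cpus / speedup)


def relative_saving(before: int, after: int) -> float:
    """Fraction of CPUs saved when going from `before` to `after`."""
    return (before - after) / before


if __name__ == "__main__":
    for factor in (1.7, 1.9):
        print(f"speedup {factor}: {cpus_needed(20, factor)} CPUs")
    print(f"saving from 20 to 12 CPUs: {relative_saving(20, 12):.0%}")
```

Under these assumptions a factor of 1.7 gives 12 CPUs, which matches the reported count, a factor of 1.9 would give 11, and going from 20 to 12 CPUs is a 40% reduction.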
ACKNOWLEDGMENT

Guillaume Quintin and Sylvain Jubertie of Numscale were the main developers of the optimizer software tool and contributed to algorithms. The authors thank Yang Ganghua and Bill McColl for initiating and supporting the 5Gperf project. Antoine Petitet, Alain Dominguez and Chong Li were involved in some of the technical decisions and made suggestions that contributed to the project’s success.

REFERENCES

[1] Y. Baehr, E. Ben-Dror, S. Chai, M. Komm, W. Liang, V. Mirkis, H. Moushkatel, M. Naaman and D. Touitou, “ADS: A Framework for Running 5G Radio Access Network in the Cloud”, IEEE Conference on Standards for Communications and Networking (CSCN), 2016.
[2] W. Liang, Y. Wang, B. Li, W. Wang, J. Sheng, Y. Han, H. Shen, L. Gu, Y. Saito, A. Benjebbour, Y. Kishiyama, X. Wang, X. Hou and H. Jiang, “Ultra-High-Throughput Massive MIMO Field Trial over Radio Computing Architecture with Peak Spectrum Efficiency of 79.82 bps/Hz”, IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), 2017.
[3] W. Liang, Y. Wang, K. Song, W. Wang, B. Li, X. Wang, H. Shen, S. Chai, S. Zhang, D. Han, L. Gu, Y. Saito, A. Benjebbour, Y. Kishiyama, “Field Trial Investigation of Wired and Wireless Calibration Schemes for Real-time Massive MIMO Prototype”, IEEE Vehicular Technology Conference (VTC2017), 2017.
[4] bSIMD La programmation vectorielle simplifiée efficace et universelle. Numscale SARL, 2017. [Online]. Available: www.numscale.com
[5] P. Estérie, M. Gaunard, J. Falcou, J. T. Lapresté and B. Rozoy, “Boost.SIMD: Generic programming for portable SIMDization”, 21st International Conference on Parallel Architectures and Compilation Techniques (PACT), Minneapolis, MN, 2012, pp. 431-432.
[6] A. Fog, Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD and VIA CPUs. Technical University of Denmark, 2017-5-2. [Online]. Available: http://www.agner.org/optimize/instruction_tables.pdf
[7] E. Dahlman, S. Parkvall, J. Skold. 4G LTE/LTE-Advanced for Mobile Broadband. Second Edition, Academic Press, 2013.
[8] S. Sesia, I. Toufik and M. Baker. LTE The UMTS Long Term Evolution. Second Edition, Wiley, 2011.
[9] X. Wang, X. Hou, H. Jiang, J. Qiu, H. Shen, C. Tang, T. Tian, A. Benjebbour, Y. Saito, Y. Kishiyama and T. Kashima, “Large scale experimental trial of 5G mobile communication systems – TDD massive MIMO with linear and non-linear precoding schemes”, IEEE 27th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC): Workshop: Inclusive Radio Communication Networks for 5G and Beyond (IRACON2016), 2016.
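The pragma substitution described in Section IV can be illustrated with a short Python sketch. This is not the code of optimizer.py: the names `expand_pragmas`, `split_params`, `guarded_placeholder` and the `run_kernel` callback are hypothetical, and the callback merely stands in for running the bbs/<kernel> executable with the parameters as command-line arguments. Only the pragma spellings, the OPTIMIZER_ACTIVATED guard and the zero-based "PRAGMA BEGIN line" comment are taken from the report's example.

```python
import re

# Matches "/// PRAGMA <KEYWORD> <rest>"; continuation lines are plain "/// ...".
PRAGMA = re.compile(r"^\s*///\s*PRAGMA\s+(\w+)\s*(.*)$")
CONTINUATION = re.compile(r"^\s*///\s*(.*)$")


def split_params(text):
    """Split 'algo, b, c,' into ['algo', 'b', 'c'], dropping empty items."""
    return [p.strip() for p in text.split(",") if p.strip()]


def guarded_placeholder(keyword):
    """Block emitted in place of PRAGMA INCLUDES / PRAGMA FUNCTIONS."""
    return [
        "#ifdef OPTIMIZER_ACTIVATED",
        f"// This block of code replaces the original /// PRAGMA {keyword}",
        "#endif",
    ]


def expand_pragmas(source, run_kernel):
    """Return `source` with every pragma replaced by guarded code.

    `run_kernel(name, args)` is a hypothetical callback returning the
    optimized code as a string; it stands in for running bbs/<name>.
    """
    lines = source.splitlines()
    out = []
    i = 0
    while i < len(lines):
        match = PRAGMA.match(lines[i])
        if match is None:
            out.append(lines[i])
            i += 1
            continue
        keyword, rest = match.groups()
        if keyword in ("INCLUDES", "FUNCTIONS"):
            out += guarded_placeholder(keyword)
            i += 1
            continue
        if keyword != "BEGIN":
            raise ValueError(f"unexpected PRAGMA {keyword} at line {i}")
        begin = i  # zero-based, as in the report's "PRAGMA BEGIN line 6"
        indent = lines[i][: len(lines[i]) - len(lines[i].lstrip())]
        params = split_params(rest)
        original = []
        i += 1
        while i < len(lines):
            inner = PRAGMA.match(lines[i])
            if inner is not None and inner.group(1) == "END":
                break
            cont = CONTINUATION.match(lines[i])
            if cont is not None:
                params += split_params(cont.group(1))  # e.g. "/// g, h"
            else:
                original.append(lines[i])  # naive code kept as fallback
            i += 1
        else:
            raise ValueError(f"PRAGMA BEGIN at line {begin} has no PRAGMA END")
        i += 1  # skip the PRAGMA END line
        if not params:
            raise ValueError(f"PRAGMA BEGIN at line {begin} names no kernel")
        optimized = run_kernel(params[0], params[1:])
        out.append("#ifndef OPTIMIZER_ACTIVATED")
        out += original
        out.append("#else")
        out += [indent + line for line in optimized.splitlines()]
        out.append(f"#endif // PRAGMA BEGIN line {begin}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    demo = "\n".join([
        "/// PRAGMA INCLUDES",
        "int main() {",
        "  /// PRAGMA BEGIN algo, b, c,",
        "  naive();",
        "  /// g, h",
        "  /// PRAGMA END",
        "  return 0;",
        "}",
    ])
    print(expand_pragmas(demo, lambda name, args: f"// optimized {name} {args}"))
```

Keeping the naive code under `#ifndef OPTIMIZER_ACTIVATED` mirrors the report's output listing: the same file still builds without the optimized kernels. Passing the kernel lookup in as a callback is a choice made here so the sketch can be exercised without any bbs executables.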