Multiplierless DFT Approximation Based on the Prime Factor Algorithm

L. Portella*, F. M. Bayer†, R. J. Cintra‡

Abstract

Matrix approximation methods have successfully produced efficient, low-complexity approximate transforms for the discrete cosine transforms and the discrete Fourier transforms. For the DFT case, the literature archives approximations operating at small power-of-two blocklengths, such as {8, 16, 32}, or at large blocklengths, such as 1024, which are obtained by means of the Cooley-Tukey-based approximation relying on the small-blocklength approximate transforms. Cooley-Tukey-based approximations inherit the intermediate multiplications by twiddle factors, which are usually not approximated; otherwise, the resulting error propagation would prevent the overall good performance of the approximation. In this context, the prime factor algorithm can furnish the necessary framework for deriving fully multiplierless DFT approximations. We introduce an approximation method based on small prime-sized DFT approximations which entirely eliminates intermediate multiplication steps and prevents internal error propagation. To demonstrate the proposed method, we design a fully multiplierless 1023-point DFT approximation based on 3-, 11-, and 31-point DFT approximations. The performance evaluation according to popular metrics showed that the proposed approximations not only presented a significantly lower arithmetic complexity but also resulted in smaller approximation error measurements when compared to competing methods.

Keywords: Fast algorithms, approximate DFT, multiplicative complexity, prime factor algorithm.

1 Introduction

The discrete Fourier transform (DFT) is a central tool in signal processing [9], finding applications in a very large number of contexts, such as spectral analysis [74], filtering [64], data compression [69], and fast convolution [11], to cite a few.
The widespread usage of the DFT is due to its rich physical interpretation [10] and the existence of efficient methods for its computation [61]. Although the direct computation of the $N$-point DFT is an operation in $O(N^2)$, which is prohibitively expensive [35], efficient algorithms [10, 15, 61] collectively known as fast Fourier transforms (FFTs) [6] are capable of evaluating the DFT with far fewer numerical operations, placing the resulting complexity in $O(N \log N)$ [15]. Despite such substantial reduction in complexity, the remaining operations can still be significant in contexts where severe restrictions in computational power and/or energy autonomy are present [48]. Such restrictive conditions arise in the framework of wireless communication [54, 85], embedded systems [44, 49], and Internet of Things (IoT) [50, 73].

Inspired by the successful methods for approximating the discrete cosine transform [5, 7, 8, 12, 19, 21, 22, 33], in [75], a suite of multiplierless DFT approximations was derived for $N = 8$, 16, and 32 [2, 20, 46, 52, 53, 76]. These DFT approximations were demonstrated to provide spectral estimates close to the exact DFT computation, while requiring only 26, 54, and 144 additions for real-valued input, respectively [20, 75]. Broadly, finding approximate transforms that closely match the performance of the exact ones is a hard task, because it is often posed as an integer non-linear matrix optimization problem with a large number of variables [27]. Thus, as $N$ increases, obtaining good approximations becomes an exceedingly demanding problem to be solved [65]. As a consequence, designers of DFT approximations make use of indirect methods such as (i) mathematical relationships between small-sized and large-sized DFT matrices [15], (ii) matrix functional recursions [62], and (iii) matrix decompositions [70]. The systematic derivation of good DFT approximations for large block sizes is still an open problem and technical advances occur in a case-by-case fashion due to the inherent numerical difficulties of finding integer matrices that ensure competitive performance.

* L. Portella was with the Department of Statistics, Universidade Estadual de Campinas, Campinas 13083-859, Brazil and the Industrial Signal Processing Laboratory, Universidade Federal de Pernambuco, Caruaru, Brazil (e-mail: luanps@unicamp.br).
† F. M. Bayer is with the Department of Statistics and LACESM, Universidade Federal de Santa Maria, Santa Maria 97105-900, Brazil and the Department of Mathematics and Natural Sciences, Blekinge Institute of Technology, Karlskrona, 37179, Sweden (e-mail: bayer@ufsm.br).
‡ R. J. Cintra is with the Industrial Signal Processing Laboratory, Department of Technology, Universidade Federal de Pernambuco, Caruaru 55014-900, Brazil (e-mail: rjdsc@de.ufpe.br).

Following such an indirect approach, the 32-point DFT approximation discussed in [20, 75] was employed as the fundamental block of the 1024-point DFT approximation introduced in [53]. When employed as a fundamental block to obtain larger transforms, the DFT is referred to as a ground transformation. The methodology described in [53] revisits the Cooley-Tukey algorithm and effectively extends a given 32-point DFT approximation, resulting in a $32^2$-point DFT approximation. This extension stems from the fact that the Cooley-Tukey algorithm can be formulated according to a two-dimensional mapping such that the computation of the 1024-point DFT is performed by 2 × 32 instantiations of the 32-point DFT [25].

However, even considering multiplierless 32-point DFT approximations, the resulting 1024-point DFT approximations proposed in [53] are not multiplication-free. Indeed, the Cooley-Tukey-based approximations inherit the twiddle factors present in the exact formulation of the traditional Cooley-Tukey algorithm [61].
Thus, the final resulting arithmetic complexity of the best Cooley-Tukey-based 1024-point DFT approximation in [53] is 2883 real multiplications and 25155 additions, which represent approximately a 72% reduction in terms of real multiplications and an 18% reduction in terms of additions when compared to the exact Cooley-Tukey algorithm [6].

The goal of the present paper is to propose a framework for deriving large DFT approximations that are fully multiplierless. Although, in fixed-point arithmetic, any multiplication can theoretically be expressed as a sum of dyadic terms, we adopt the term multiplierless in a more restrictive and practical sense, consistent with [12], where the minimum number of adders is sought. In this work, matrix elements assume values in $\{0, \pm 1, \pm\frac{1}{2}\}$ and eventual scaling constants have their dyadic representation limited to at most two additions. Such multiplierlessness criterion emphasizes that the proposed approximations rely only on additions and bit-shifting operations, aiming at the minimum number of adders, so that future implementations can achieve reductions in chip area, power consumption, and delay [2].

To this end, we aim at exploiting the prime factor algorithm (PFA) [31, 78], also known as the Good-Thomas algorithm. The PFA has distinct number-theoretical properties capable of performing the DFT computation without intermediate computations such as twiddle factors, on which the traditional radix-2 algorithm heavily relies. This approach allows the construction of scalable, multiplierless DFT approximations, significantly reducing arithmetic complexity while maintaining competitive approximation accuracy.

Due to the very number-theoretical nature of the PFA, the resulting transform blocklength cannot be a power of two [6].
However, the design of non-power-of-two DFT methods [57, 80, 84] is a promising topic in (i) beamforming and direction of arrival (DOA) estimation [1, 60, 68, 71, 79, 81]; (ii) 5G broadcasting, which usually presents $2^n \cdot 3^m$-point ($n \le 11$) input signals [24]; (iii) hybrid algorithms [42, 72]; (iv) cases in which the length cannot be chosen, such as in digital radio technology [23, 39, 43, 47]; (v) channel equalization [13, 45]; and (vi) scenarios that present flexible or adaptive transform lengths, such as LTE [17] and MIMO-OFDM systems [83]. For example, FFT blocklengths like 128, 512, 1024, and 2048, which are usually implemented with radix-2 algorithms, could be replaced by alternative lengths such as 130 (2 × 5 × 13), 510 (2 × 3 × 5 × 17), 1023 (which is adopted in this work to demonstrate the proposed method), and 2046 (2 × 3 × 11 × 31), which are compatible with the PFA. Although power-of-two DFT algorithms are more common, in the end, the total number of operations might play a decisive role.

As a consequence, we selected the particular blocklength $N = 1023$ as a representative case study to highlight the benefits of the proposed approach in contrast with the state of the art in large-scale DFT approximations introduced in [53]. The comparison is justified by the fact that [53] provides a practical implementation benchmark for low-power applications such as beamforming, reinforcing the relevance of a direct comparison. Other multiplierless strategies based on twiddle-factor approximations, such as those using sum-of-powers-of-two (SOPOT) coefficients [16] and streaming multiplierless FFTs (SMUL-FFT) [58], are also effective in specific contexts. However, these methods tend to be more suitable for small-to-medium transform sizes.
As the blocklength increases, the number of distinct twiddle-factor matrices to be approximated grows rapidly, shifting the computational burden from multiplications to managing a large number of additive operations or fixed-pattern multipliers. This can significantly reduce their practical efficiency in large-scale settings.

It is worth emphasizing that, although this work focuses on the 1023-point DFT approximation to enable a direct comparison with the method in [53], the proposed approach is built upon the PFA, which naturally inherits scalability to any blocklength that can be factorized into coprime factors. This scalability stems from the Chinese remainder theorem, which underlies the Good-Thomas mapping and enables the decomposition of large transforms into smaller ground transforms. Furthermore, it is also possible to develop hybrid algorithms where a radix-2 algorithm processes the power-of-two factors, while the proposed PFA-based multiplierless method computes the remaining coprime factors.

The paper is organized as follows. Section 2 provides an overview of the DFT and the PFA. Section 3 describes the methodology to obtain the proposed DFT approximation. In Section 4, the proposed approximations and algorithm are detailed. In Section 5, the proposed method is employed to obtain a multiplierless 1023-point DFT approximation. Fast algorithms and arithmetic complexity are presented in Section 6. Section 7 reports the error analysis of the approximations. In Section 8, conclusions are summarized.

2 Mathematical Background

In this section, we review the DFT mathematical background and briefly describe the prime factor algorithm.

2.1 Definition of the DFT

The DFT is a linear transformation that maps an $N$-point discrete signal $\mathbf{x} = [x[0], x[1], \ldots, x[N-1]]^\top$ into an output signal $\mathbf{X} = [X[0], X[1], \ldots, X[N-1]]^\top$ by means of the following expression [6]:

$$X[k] \triangleq \sum_{n=0}^{N-1} \omega_N^{nk} \cdot x[n], \quad k = 0, 1, \ldots, N-1, \tag{1}$$

where $X[k]$ is the $k$th DFT coefficient, $\omega_N = e^{-j\frac{2\pi}{N}}$ is the $N$th root of unity, and $j \triangleq \sqrt{-1}$. Although the input signal $\mathbf{x}$ may be real or complex, this work focuses on the general case of complex-valued inputs. The DFT can also be expressed in matrix format according to the next expression:

$$\mathbf{X} = \mathbf{F}_N \cdot \mathbf{x},$$

where $\mathbf{F}_N$ is the DFT matrix defined by

$$\mathbf{F}_N = \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & \omega_N & \omega_N^{2} & \cdots & \omega_N^{N-1} \\
1 & \omega_N^{2} & \omega_N^{4} & \cdots & \omega_N^{2(N-1)} \\
1 & \omega_N^{3} & \omega_N^{6} & \cdots & \omega_N^{3(N-1)} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \omega_N^{N-1} & \omega_N^{2(N-1)} & \cdots & \omega_N^{(N-1)(N-1)}
\end{bmatrix}.$$

2.2 Prime Factor Algorithm

Comparable to the more popular Cooley-Tukey FFT [6], the PFA is a factorization-based FFT capable of computing the $N$-point DFT, where $N = N_1 \times N_2$, with $N_1$ and $N_2$ being relatively prime, i.e., $\gcd(N_1, N_2) = 1$. The method is based on a number-theoretical re-indexing [61] of the input signal coefficients into a two-dimensional array [64], which is based on the Chinese remainder theorem [6]. The PFA is summarized according to the following description [6, Fig. 3.8]:

(i) Obtain $n_1$ and $n_2$ that satisfy $(n_1 \cdot N_1 + n_2 \cdot N_2) \bmod N = 1$ [61];

(ii) Map $\mathbf{x}$ into a block of size $N_1 \times N_2$ according to the following 1D to 2D rearrangement of elements:

$$\operatorname{map}\left(\begin{bmatrix} x[0] \\ x[1] \\ x[2] \\ \vdots \\ x[N-1] \end{bmatrix}\right) = \begin{bmatrix}
x[0] & x[r] & \cdots & x[(N_2-1)r] \\
x[s] & x[r+s] & \cdots & x[s+(N_2-1)r] \\
\vdots & \vdots & \ddots & \vdots \\
x[(N_1-1)s] & x[(N_1-1)s+r] & \cdots & x[(N_1-1)s+(N_2-1)r]
\end{bmatrix},$$

where $r = N_1 \cdot n_1$ and $s = N_2 \cdot n_2$;

(iii) Compute the $N_2$-point DFT of each column of the 2D array obtained in Step (ii);

(iv) Compute the $N_1$-point DFT of each row of the resulting 2D array from Step (iii);

(v) Reconstruct the vector $\mathbf{X}$ from the resulting block according to the following mapping:

$$\operatorname{invmap}\left(\begin{bmatrix}
X[0] & X[N_1] & \cdots & X[(N_2-1)N_1] \\
X[N_2] & X[N_1+N_2] & \cdots & X[N_2+(N_2-1)N_1] \\
\vdots & \vdots & \ddots & \vdots \\
X[(N_1-1)N_2] & X[(N_1-1)N_2+N_1] & \cdots & X[(N_1-1)N_2+(N_2-1)N_1]
\end{bmatrix}\right) = \begin{bmatrix} X[0] \\ X[1] \\ X[2] \\ \vdots \\ X[N-1] \end{bmatrix}.$$

All index operations are performed in modulo $N$ arithmetic, ensuring the correct size of the arrays. The algorithm can be synthesized as follows:

$$\mathbf{X} = \operatorname{invmap}\left(\mathbf{F}_{N_1} \cdot \left[\mathbf{F}_{N_2} \cdot (\operatorname{map}(\mathbf{x}))^\top\right]^\top\right). \tag{2}$$

Notice that if $N_1$ or $N_2$ can be factored into relative primes, then the algorithm can be reapplied. The $N_1$- and $N_2$-point transformations are referred to as ground transformations.

3 Approximate DFT Methodology

Being alternatives to the exact transformations, approximate transforms possess a low computational cost and provide similar mathematical properties and performance to their exact counterparts. In this context, an approximate DFT matrix $\hat{\mathbf{F}}^*_N$ can be derived by solving the following optimization problem:

$$\hat{\mathbf{F}}^*_N = \arg\min_{\hat{\mathbf{F}}_N} \operatorname{error}(\hat{\mathbf{F}}_N, \mathbf{F}_N), \tag{3}$$

where $\operatorname{error}(\cdot)$ represents the adopted error measure and $\hat{\mathbf{F}}_N$ is a candidate approximation obtained from a suitable search space.

Approximate transforms can be derived from low-complexity matrices [19] according to an orthogonalization process referred to as polar decomposition [37]. Such an approach consists of two matrices: a low-complexity matrix and a real-valued diagonal matrix.
It is important to note that an auxiliary method is required to generate the low-complexity matrix because the polar decomposition alone does not provide such a matrix. If the polar decomposition is applied directly to the exact DFT matrix or to an already orthogonal approximation, the resulting diagonal matrix becomes the identity matrix. The auxiliary method to obtain the low-complexity matrix is presented in the next section.

A candidate approximation $\hat{\mathbf{F}}_N$ for the exact transformation $\mathbf{F}_N$ can be written as:

$$\hat{\mathbf{F}}_N = \sqrt{N} \cdot \mathbf{S}_N \cdot \mathbf{T}_N, \tag{4}$$

where $\mathbf{T}_N$ is a low-complexity matrix and $\mathbf{S}_N$ is a diagonal matrix expressed by

$$\mathbf{S}_N = \operatorname{diag}\left(\sqrt{\left[\operatorname{diag}\left(\mathbf{T}_N \cdot \mathbf{T}_N^{H}\right)\right]^{-1}}\right). \tag{5}$$

Here, $\operatorname{diag}(\cdot)$ is a function that returns a diagonal matrix if the argument is a vector, or a vector with the diagonal elements if the argument is a matrix; the superscript $H$ denotes the Hermitian operation [70]; and $\sqrt{\cdot}$ is the matrix square root operation [38]. Therefore, a suitable choice of $\mathbf{T}_N$ is central to the above approach. Thus, from this point onward, we focus on the derivation of $\mathbf{T}_N$. For simplicity of notation, the constant $\sqrt{N}$ is absorbed into the matrix $\mathbf{S}_N$ as follows

$$\hat{\mathbf{F}}_N = \hat{\mathbf{S}}_N \cdot \mathbf{T}_N, \tag{6}$$

where $\hat{\mathbf{S}}_N = \sqrt{N} \cdot \mathbf{S}_N$.

3.1 Search Space

The low-complexity matrices $\mathbf{T}_N$ are taken from the search space given by the matrix space $\mathcal{M}_N(\mathcal{P})$, which is the set of all $N \times N$ matrices with entries over a set of low-complexity multipliers $\mathcal{P}$. Popular choices for $\mathcal{P}$ are $\{-1, 0, 1\}$ and $\{-1, -\frac{1}{2}, 0, \frac{1}{2}, 1\}$, which contain only trivial multipliers [6]. The set $\mathcal{M}_N(\mathcal{P})$ can be extremely large. For instance, $\mathcal{M}_N(\mathcal{P})$ contains $3^{64} \approx 3.43 \times 10^{30}$ elements (distinct matrices) for $N = 8$ and $\mathcal{P} = \{-1, 0, 1\}$. Therefore, we propose as a working search space a subset of $\mathcal{M}_N(\mathcal{P})$ given by the expansion factor methodology [12].
Thus, low-complexity matrices $\mathbf{T}_N$ can be generated according to the following expression:

$$\mathbf{T}_N = g(\alpha \cdot \mathbf{F}_N), \tag{7}$$

where $g(\cdot)$ is an entry-wise integer matrix function, such as the rounding, truncation, ceiling, and floor functions [19], and $\alpha$ is a real number referred to as the expansion factor [55]. To ensure that the integer function $g(\cdot)$ returns only values within $\mathcal{P}$, the values of $\alpha$ are judiciously restricted to an interval $\mathcal{D}$ given by

$$\alpha_{\min} \le \alpha \le \alpha_{\max}, \tag{8}$$

where $\alpha_{\min} = \inf\{\alpha \in \mathbb{R}^+ : g(\alpha \cdot \gamma_{\max}) \ne 0\}$ and $\alpha_{\max} = \sup\{\alpha \in \mathbb{R}^+ : g(\alpha \cdot \gamma_{\max}) = \max(\mathcal{P})\}$, with $\gamma_{\max} = \max_{m,n}(|\Re(f_{m,n})|, |\Im(f_{m,n})|)$ and $f_{m,n}$ being the $(m,n)$th entry of $\mathbf{F}_N$. The symmetries of $\mathbf{F}_N$ allow us to restrict the analysis to $\alpha \ge 0$ and, since the entries of $\mathbf{F}_N$ are bounded by unity, $\gamma_{\max} = 1$.

3.2 Optimization Problem and Objective Function

The general optimization problem shown in (3) can be formulated as

$$\alpha^* = \arg\min_{\alpha \in \mathcal{D}} \operatorname{error}(\hat{\mathbf{F}}_N, \mathbf{F}_N), \tag{9}$$

employing (6)

$$\alpha^* = \arg\min_{\alpha \in \mathcal{D}} \operatorname{error}(\hat{\mathbf{S}}_N \cdot \mathbf{T}_N, \mathbf{F}_N), \tag{10}$$

and rewritten applying (7)

$$\alpha^* = \arg\min_{\alpha \in \mathcal{D}} \operatorname{error}(\hat{\mathbf{S}}_N \cdot g(\alpha \cdot \mathbf{F}_N), \mathbf{F}_N). \tag{11}$$

Therefore, the low-complexity matrix is obtained by

$$\mathbf{T}^*_N = g(\alpha^* \cdot \mathbf{F}_N), \tag{12}$$

and the optimal approximation is furnished by

$$\hat{\mathbf{F}}^*_N = \hat{\mathbf{S}}^*_N \cdot \mathbf{T}^*_N, \tag{13}$$

where $\hat{\mathbf{S}}^*_N$ stems from $\mathbf{T}^*_N$ as detailed in (5), mutatis mutandis.

Now we aim at specifying the error function in (11). As shown in the literature [52, 66, 77], usual choices for such a function are: (i) the total error energy [21]; (ii) the mean absolute percentage error (MAPE) [28]; and (iii) the deviation from orthogonality [22, 29]. These metrics are described below.
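(i) The total error energy is defined by

$$\epsilon(\hat{\mathbf{F}}_N) = \pi \cdot \|\mathbf{F}_N - \hat{\mathbf{F}}_N\|_F^2,$$

where $\|\cdot\|_F$ represents the Frobenius norm [82];

(ii) The MAPE of the transformation matrix is obtained by

$$M(\hat{\mathbf{F}}_N) = 100 \cdot \frac{1}{N^2} \cdot \sum_{m=1}^{N} \sum_{n=1}^{N} \left|\frac{f_{m,n} - \hat{f}_{m,n}}{f_{m,n}}\right|,$$

where $\hat{f}_{m,n}$ is the $(m,n)$th entry of $\hat{\mathbf{F}}_N$;

(iii) The deviation from orthogonality [22] is defined by:

$$\phi(\hat{\mathbf{F}}_N) = 1 - \frac{\|\operatorname{diag}(\hat{\mathbf{F}}_N \cdot \hat{\mathbf{F}}_N^{H})\|_F}{\|\hat{\mathbf{F}}_N \cdot \hat{\mathbf{F}}_N^{H}\|_F}.$$

Small values of $\phi(\cdot)$ indicate proximity to orthogonality. Orthogonal matrices have null deviation.

Combining the above error functions in a single optimization problem, we obtain the following multicriteria problem [4, 27]:

$$\alpha^* = \arg\min_{\alpha \in \mathcal{D}} \left\{ \epsilon\left(\hat{\mathbf{S}}^*_N \cdot g(\alpha \cdot \mathbf{F}_N)\right),\ M\left(\hat{\mathbf{S}}^*_N \cdot g(\alpha \cdot \mathbf{F}_N)\right),\ \phi\left(\hat{\mathbf{S}}^*_N \cdot g(\alpha \cdot \mathbf{F}_N)\right) \right\}. \tag{14}$$

4 Multiplierless Prime Factor Approximation

In this section, we formalize the mathematical structure of the approximate DFT based on the prime factor algorithm. In addition to the main structure, we describe two variations of the method: (i) the unscaled approximation and (ii) the hybrid approximation.

4.1 Mathematical Definition

Under the assumption of the PFA, we compute the $N$-point DFT approximation as follows

$$\hat{\mathbf{X}} = \operatorname{invmap}\left(\hat{\mathbf{F}}^*_{N_1} \cdot \left[\hat{\mathbf{F}}^*_{N_2} \cdot (\operatorname{map}(\mathbf{x}))^\top\right]^\top\right), \tag{15}$$

where $\hat{\mathbf{F}}^*_{N_1}$ and $\hat{\mathbf{F}}^*_{N_2}$ are approximations of $\mathbf{F}_{N_1}$ and $\mathbf{F}_{N_2}$, respectively (cf. (2)). If the approximations used in (15) admit the format expressed in (13), then the above equation can be rewritten as

$$\hat{\mathbf{X}} = \operatorname{invmap}\left(\hat{\mathbf{S}}^*_{N_1} \cdot \mathbf{T}^*_{N_1} \cdot \left[\hat{\mathbf{S}}^*_{N_2} \cdot \mathbf{T}^*_{N_2} \cdot (\operatorname{map}(\mathbf{x}))^\top\right]^\top\right). \tag{16}$$

Notice that $\hat{\mathbf{S}}^*_{N_1}$ and $\hat{\mathbf{S}}^*_{N_2}$ are real diagonal matrices that can be factored out from the mapping operator as follows:

$$\hat{\mathbf{X}} = \hat{\mathbf{S}} \cdot \operatorname{invmap}\left(\mathbf{T}^*_{N_1} \cdot \left[\mathbf{T}^*_{N_2} \cdot (\operatorname{map}(\mathbf{x}))^\top\right]^\top\right), \tag{17}$$

where $\hat{\mathbf{S}}$ is a diagonal matrix given by

$$\hat{\mathbf{S}} = \operatorname{diag}\left[\operatorname{invmap}\left(\operatorname{diag}(\hat{\mathbf{S}}^*_{N_1}) \cdot \operatorname{diag}(\hat{\mathbf{S}}^*_{N_2})^\top\right)\right].$$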
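The index mappings of Section 2.2 and the structure shared by (2) and (15) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it follows the matrix form (2) literally, uses exact $O(N^2)$ ground transforms so that the output can be checked against a direct DFT, and the names `pfa_steps`, `pfa_dft`, `ground1`, and `ground2` are hypothetical. Passing approximate ground transforms through `ground1` and `ground2` would give the unscaled structure of (17).

```python
import cmath
from math import gcd


def dft(x):
    """Direct DFT from definition (1); O(N^2), used as ground transform and reference."""
    n = len(x)
    w = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
    return [sum(w[(k * i) % n] * x[i] for i in range(n)) for k in range(n)]


def pfa_steps(n1_len, n2_len):
    """Step (i): return r = N1*n1 and s = N2*n2 with (r + s) mod N == 1."""
    if gcd(n1_len, n2_len) != 1:
        raise ValueError("N1 and N2 must be relatively prime")
    n1 = pow(n1_len, -1, n2_len)  # modular inverse (Python 3.8+)
    n2 = pow(n2_len, -1, n1_len)
    return n1_len * n1, n2_len * n2


def pfa_dft(x, n1_len, n2_len, ground1=dft, ground2=dft):
    """Evaluate (2): invmap(F_N1 . [F_N2 . map(x)^T]^T)."""
    n = n1_len * n2_len
    r, s = pfa_steps(n1_len, n2_len)
    # Step (ii): N1 x N2 block whose entry (a, b) is x[(a*s + b*r) mod N].
    block = [[x[(a * s + b * r) % n] for b in range(n2_len)]
             for a in range(n1_len)]
    # Bracketed term of (2): N2-point transform of every length-N2 line.
    block = [ground2(line) for line in block]
    # Left product with F_N1: N1-point transform along the other axis.
    lines = [ground1([block[a][b] for a in range(n1_len)])
             for b in range(n2_len)]
    # Step (v): block entry (k1, k2) is X[(k1*N2 + k2*N1) mod N].
    out = [0j] * n
    for k1 in range(n1_len):
        for k2 in range(n2_len):
            out[(k1 * n2_len + k2 * n1_len) % n] = lines[k2][k1]
    return out


if __name__ == "__main__":
    x = [complex(i % 7, (3 * i) % 5) for i in range(33)]
    err = max(abs(a - b) for a, b in zip(dft(x), pfa_dft(x, 3, 11)))
    print(f"max |PFA - direct| for N = 33: {err:.2e}")
```

No twiddle factors appear between the two transform stages, which is the property the proposed method relies on.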
4.2 Unscaled Approximation

Because the matrix $\hat{\mathbf{S}}$ is diagonal, its role in the approximate DFT computation consists of scaling each spectral component. Depending on the context in which the DFT is applied, the scaling can be embedded, absorbed, computed in parallel, or even neglected when the unscaled spectrum is sufficient [30, 41, 67]. The unscaled $N$-point DFT approximation is obtained by

$$\tilde{\mathbf{X}} = \operatorname{invmap}\left(\mathbf{T}^*_{N_1} \cdot \left[\mathbf{T}^*_{N_2} \cdot (\operatorname{map}(\mathbf{x}))^\top\right]^\top\right). \tag{18}$$

Thus, we have the following relationship between (17) and (18):

$$\hat{\mathbf{X}} = \hat{\mathbf{S}} \cdot \tilde{\mathbf{X}}. \tag{19}$$

4.3 Hybrid Approximations

It might be advantageous to approximate only part of the DFT computation instead of the entire transform. This approach, referred to as hybrid algorithms, allows for a balance between computational efficiency and accuracy, targeting specific components of the computation for approximation. In this context, we also provide two hybrid algorithms approximating only part of the DFT computation. First, we keep the row-wise $N_1$-point DFT exact while the column-wise $N_2$-point DFT is approximated. The diagonal matrix $\hat{\mathbf{S}}$ can also be factored out in the hybrid algorithms. Thus, this algorithm is given by

$$\hat{\mathbf{X}}_1 = \hat{\mathbf{S}} \cdot \operatorname{invmap}\left(\left\{\mathbf{F}_{N_1} \cdot \left[\mathbf{T}^*_{N_2} \cdot (\operatorname{map}(\mathbf{x}))\right]^\top\right\}^\top\right), \tag{20}$$

where $\hat{\mathbf{S}} = \operatorname{diag}\left[\operatorname{invmap}\left(\mathbf{1}_{N_1} \cdot \operatorname{diag}(\hat{\mathbf{S}}^*_{N_2})^\top\right)\right]$ and $\mathbf{1}_r$ is a column vector of ones with length equal to $r$. Second, the column-wise $N_2$-point DFT is kept exact and the row-wise $N_1$-point DFT is approximated. Then, the $N$-point DFT approximation is calculated by

$$\hat{\mathbf{X}}_2 = \hat{\mathbf{S}} \cdot \operatorname{invmap}\left(\left\{\mathbf{T}^*_{N_1} \cdot \left[\mathbf{F}_{N_2} \cdot (\operatorname{map}(\mathbf{x}))\right]^\top\right\}^\top\right), \tag{21}$$

where $\hat{\mathbf{S}} = \operatorname{diag}\left[\operatorname{invmap}\left(\operatorname{diag}(\hat{\mathbf{S}}^*_{N_1}) \cdot (\mathbf{1}_{N_2})^\top\right)\right]$.

5 Approximations for the 1023-point DFT

In this section, we advance two results. First, we apply the prime factor algorithm detailed in Section 4 to obtain approximations for the 1023-point DFT.
Second, we apply the methodology described in Section 3 to obtain approximations for the 3-, 11-, and 31-point DFTs, which are required for the 1023-point DFT approximations.

5.1 1023-point DFT Approximation

Invoking (15) for $N_1 = 31$ and $N_2 = 33$, we introduce a 1023-point DFT approximation according to the following equation:

$$\hat{\mathbf{X}} = \operatorname{invmap}\left(\hat{\mathbf{F}}^*_{31} \cdot \left[\hat{\mathbf{F}}^*_{33} \cdot (\operatorname{map}(\mathbf{x}))^\top\right]^\top\right). \tag{22}$$

The term in square brackets in (22) requires 31 calls of a 33-point DFT approximation. Because $N_2 = 33 = 11 \times 3$ is suitable for the proposed method formalism, a 33-point DFT approximation can be obtained based on approximations for the 3- and 11-point DFTs, as follows

$$\hat{\mathbf{Y}} = \operatorname{invmap}\left(\hat{\mathbf{F}}^*_{11} \cdot \left[\hat{\mathbf{F}}^*_{3} \cdot (\operatorname{map}(\mathbf{y}))^\top\right]^\top\right), \tag{23}$$

where $\mathbf{y}$ is the 33-point column vector corresponding to the rows of $\operatorname{map}(\mathbf{x})$. As shown in (17), the scaling matrix $\hat{\mathbf{S}}$ can be calculated separately, allowing the 1023-point DFT approximation to be rewritten as

$$\hat{\mathbf{X}} = \hat{\mathbf{S}} \cdot \operatorname{invmap}\left(\hat{\mathbf{T}}^*_{31} \cdot \left[\hat{\mathbf{T}}^*_{33} \cdot (\operatorname{map}(\mathbf{x}))^\top\right]^\top\right), \tag{24}$$

and

$$\hat{\mathbf{Y}} = \operatorname{invmap}\left(\hat{\mathbf{T}}^*_{11} \cdot \left[\hat{\mathbf{T}}^*_{3} \cdot (\operatorname{map}(\mathbf{y}))^\top\right]^\top\right). \tag{25}$$

Notice that the diagonal matrix $\hat{\mathbf{S}}$ encompasses the intermediate diagonals $\hat{\mathbf{S}}^*_{3}$, $\hat{\mathbf{S}}^*_{11}$, and $\hat{\mathbf{S}}^*_{31}$ and is given by

$$\hat{\mathbf{S}} = \operatorname{diag}\left\{\operatorname{invmap}\left[\operatorname{diag}(\hat{\mathbf{S}}^*_{31}) \cdot \operatorname{invmap}\left(\operatorname{diag}(\hat{\mathbf{S}}^*_{11}) \cdot \operatorname{diag}(\hat{\mathbf{S}}^*_{3})^\top\right)^\top\right]\right\},$$

where $\hat{\mathbf{S}}^*_{3}$, $\hat{\mathbf{S}}^*_{11}$, and $\hat{\mathbf{S}}^*_{31}$ are the diagonal matrices required by the DFT approximations $\hat{\mathbf{F}}^*_{3}$, $\hat{\mathbf{F}}^*_{11}$, and $\hat{\mathbf{F}}^*_{31}$, respectively, as described in (13).

5.2 Design Parameters

The algorithm detailed in the previous section requires approximations to the 3-, 11-, and 31-point DFTs. To obtain such approximations, we numerically apply the methodology described in Section 3, for which $g(\cdot)$ and $\mathcal{P}$ must be specified. As suggested in [77], we define $\mathcal{P} = \{-1, -\frac{1}{2}, 0, \frac{1}{2}, 1\}$ as the set of low-complexity multipliers.
Among the integer functions mentioned, the round function, as implemented in Matlab/Octave [26, 56], has been reported to offer superior performance when compared with other integer functions [21, 22, 77]. Thus, as suggested in [63], we adopted the following round-to-multiple function:

$$g(x) = \frac{1}{2} \cdot \operatorname{round}(2 \cdot x) \in \mathcal{P}. \tag{26}$$

The related $\alpha$ search space is $\mathcal{D} = [0.26, 1.25]$ (cf. (8)). The $\alpha$ step used was $10^{-5}$, providing a total of 6, 16, and 42 different approximations for the 3-, 11-, and 31-point DFTs, respectively. Smaller $\alpha$ steps do not alter the results.

5.3 3-, 11-, and 31-point DFT Approximations

The obtained optimal expansion factors $\alpha^*$ are in the intervals [0.86603, 1.25000], [0.99240, 1.14528], and [1.08859, 1.15141] for the 3-, 11-, and 31-point DFT approximations, respectively. Since $\frac{9}{8} = 1.125$ belongs to all three intervals, we selected $\alpha^* = \frac{9}{8}$ for convenience. Then, the low-complexity matrices $\mathbf{T}^*_{3}$, $\mathbf{T}^*_{11}$, and $\mathbf{T}^*_{31}$ are obtained by

$$\mathbf{T}^*_N = \frac{1}{2} \cdot \operatorname{round}\left(2 \cdot \frac{9}{8} \cdot \mathbf{F}_N\right), \quad N = 3, 11, 31.$$

From (5), we obtain the scaling matrices $\hat{\mathbf{S}}^*_N$ according to the following general structure

$$\hat{\mathbf{S}}^*_N = \operatorname{diag}\left(1, \sqrt{\eta_N} \cdot \mathbf{I}_{N-1}\right), \quad N = 3, 11, 31,$$

where $\mathbf{I}_m$ is the identity matrix of order $m$, and the constants are $\eta_3 = \frac{6}{7}$, $\eta_{11} = \frac{11}{13}$, and $\eta_{31} = \frac{31}{38}$. Thus, the 3-, 11-, and 31-point DFT approximations are obtained by

$$\hat{\mathbf{F}}^*_N = \hat{\mathbf{S}}^*_N \cdot \mathbf{T}^*_N, \quad N = 3, 11, 31.$$

5.4 Approximate Scale Factors

Despite the dramatic reduction in arithmetic complexity, including the absence of twiddle factors, the computation shown in (24) requires $2(N-1)$ real multiplications due to the elements of the diagonal $\hat{\mathbf{S}}$. The elements of $\hat{\mathbf{S}}$, $s_i$, $i = 0, 1, \ldots, 1022$, are given by

$$s_i = \begin{cases}
1, & \text{if } i = 0, \\
\sqrt{\frac{6}{7}}, & \text{if } i \bmod 31 = 0 \wedge i \bmod 11 = 0 \wedge i \bmod 3 \ne 0, \\
\sqrt{\frac{11}{13}}, & \text{if } i \bmod 31 = 0 \wedge i \bmod 11 \ne 0 \wedge i \bmod 3 = 0, \\
\sqrt{\frac{66}{91}}, & \text{if } i \bmod 31 = 0 \wedge i \bmod 11 \ne 0 \wedge i \bmod 3 \ne 0, \\
\sqrt{\frac{31}{38}}, & \text{if } i \bmod 31 \ne 0 \wedge i \bmod 11 = 0 \wedge i \bmod 3 = 0, \\
\sqrt{\frac{93}{133}}, & \text{if } i \bmod 31 \ne 0 \wedge i \bmod 11 = 0 \wedge i \bmod 3 \ne 0, \\
\sqrt{\frac{341}{494}}, & \text{if } i \bmod 31 \ne 0 \wedge i \bmod 11 \ne 0 \wedge i \bmod 3 = 0, \\
\sqrt{\frac{1023}{1729}}, & \text{otherwise.}
\end{cases}$$

Further complexity reductions can be achieved by approximating the elements of $\hat{\mathbf{S}}$ using truncated representations from the canonical signed digit (CSD) number system [18, 36], which satisfy the minimum adder representation criterion [32]. We approximated each element in $\hat{\mathbf{S}}$ in such a way that its truncated CSD representation admits at most two additions [3]. Table 1 provides, for each element of $\hat{\mathbf{S}}$, its dyadic approximation, the absolute error between the constant and its approximation, and the truncated CSD representation of the approximation ($\bar{1}$ represents $-1$). Since the elements from $\hat{\mathbf{S}}_{3}$, $\hat{\mathbf{S}}_{11}$, and $\hat{\mathbf{S}}_{31}$ are present in $\hat{\mathbf{S}}$, multiplierless approximations for the 3-, 11-, and 31-point DFTs are also possible using Table 1. The approximations in which the diagonal matrices were also approximated follow the notation $\hat{\mathbf{F}}'_N$.

6 Fast Algorithms and Arithmetic Complexity

In this section, we introduce fast algorithms based on sparse matrix factorizations [6] for the proposed 3-, 11-, and 31-point DFT approximations to reduce the number of remaining operations. These approximations are employed within the PFA to approximately compute the 1023-point DFT. Although the PFA has the inherent property of reducing the total number of operations by nearly half when the input is real-valued, as shown in [34], for the sake of generality and consistency, the analyses presented in this section are based on complex-valued input.
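Looking back at the design of Section 5, the low-complexity matrices, the constants $\eta_N$, and the scale factors $s_i$ can be reproduced numerically. The sketch below is only illustrative: it assumes that $g(\cdot)$ in (26) acts separately on the real and imaginary parts of each entry and that ties are rounded away from zero as in Matlab/Octave (Python's built-in `round` behaves differently, so it is not used). The helper names `g`, `t_matrix`, `eta`, and `scale_factor` are hypothetical.

```python
import cmath
from fractions import Fraction
from math import floor, sqrt


def g(v):
    """(26): nearest multiple of 1/2, with ties rounded away from zero."""
    steps = floor(abs(2 * v) + 0.5)
    return (steps if v >= 0 else -steps) / 2


def t_matrix(n, alpha=9 / 8):
    """(7): T_N = g(alpha * F_N), applied to real and imaginary parts separately."""
    rows = []
    for k in range(n):
        row = []
        for m in range(n):
            w = cmath.exp(-2j * cmath.pi * ((k * m) % n) / n)
            row.append(complex(g(alpha * w.real), g(alpha * w.imag)))
        rows.append(row)
    return rows


def eta(n):
    """N divided by the energy of row 1 of T_N, i.e. a squared entry of S-hat."""
    row = t_matrix(n)[1]
    # Entries are multiples of 1/2, so this sum is exact in floating point.
    energy = sum(v.real ** 2 + v.imag ** 2 for v in row)
    return Fraction(n) / Fraction(energy)


ETA = {p: eta(p) for p in (3, 11, 31)}


def scale_factor(i):
    """s_i: product of sqrt(eta_p) over the prime factors p that do not divide i."""
    prod = Fraction(1)
    for p, e in ETA.items():
        if i % p != 0:
            prod *= e
    return sqrt(prod)


if __name__ == "__main__":
    print(ETA)
    print(scale_factor(1), 49 / 64)  # generic element and its dyadic value in Table 1
```

Under these assumptions the computed constants agree with $\eta_3 = 6/7$, $\eta_{11} = 11/13$, and $\eta_{31} = 31/38$, and the piecewise definition of $s_i$ reduces to a product over the prime factors that do not divide $i$.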
Table 1: Truncated CSD Approximations for the Constants from $\hat{\mathbf{S}}$

| Constant | Approximation | Absolute error | CSD |
|---|---|---|---|
| $\sqrt{66/91} \approx 0.85163$ | $55/64 = 0.859375$ | 0.00774 | $1.00\bar{1}00\bar{1}0$ |
| $\sqrt{11/13} \approx 0.91987$ | $59/64 = 0.921875$ | 0.00201 | $1.000\bar{1}0\bar{1}0$ |
| $\sqrt{6/7} \approx 0.92582$ | $119/128 = 0.9296875$ | 0.00387 | $1.000\bar{1}00\bar{1}$ |
| $\sqrt{341/494} \approx 0.83083$ | $27/32 = 0.84375$ | 0.01292 | $1.00\bar{1}0\bar{1}00$ |
| $\sqrt{93/133} \approx 0.83621$ | $27/32 = 0.84375$ | 0.00754 | $1.00\bar{1}0\bar{1}00$ |
| $\sqrt{31/38} \approx 0.90321$ | $29/32 = 0.90625$ | 0.00304 | $1.00\bar{1}0100$ |
| $\sqrt{1023/1729} \approx 0.76920$ | $49/64 = 0.765625$ | 0.00358 | $1.0\bar{1}00010$ |

A complex multiplication can be translated into three real multiplications and three real additions [6, p. 3]. The matrices $\mathbf{T}^*_N$, $N = 3, 11, 31$, employed in the 1023-point DFT approximations do not require multiplications; only additions and bit-shifting operations are needed [12, p. 221]. In the following, the butterfly-structure matrix is defined as

$$\mathbf{B}_m = \begin{bmatrix} \mathbf{I}_{m/2} & \bar{\mathbf{I}}_{m/2} \\ -\bar{\mathbf{I}}_{m/2} & \mathbf{I}_{m/2} \end{bmatrix},$$

where $m$ is an even integer and $\bar{\mathbf{I}}_{m/2}$ is the counter-identity matrix [40] of order $m/2$.

6.1 3-point Approximation Fast Algorithm

The low-complexity matrix $\mathbf{T}^*_3$ can be represented as:

$$\mathbf{T}^*_3 = \mathbf{A}_1^\top \cdot \mathbf{C}_1 \cdot \mathbf{A}_1, \tag{27}$$

where

$$\mathbf{A}_1 = \operatorname{diag}(1, \mathbf{B}_2), \tag{28}$$

and the matrix $\mathbf{C}_1$ is

$$\mathbf{C}_1 = \begin{bmatrix} 1 & 1 & 0 \\ 1 & -\frac{1}{2} & 0 \\ 0 & 0 & -j \end{bmatrix}.$$

The matrix $\mathbf{A}_1$ requires only 4 real additions and $\mathbf{C}_1$ needs 4 real additions and 2 bit-shifting operations. Considering a complex input, the direct implementation of $\mathbf{T}^*_3$ requires 20 additions and 8 bit-shifting operations. However, if the factorization in (27) is applied, then $\mathbf{T}^*_3$ requires 12 real additions (4 for $\mathbf{A}_1$, 4 for $\mathbf{C}_1$, and 4 for $\mathbf{A}_1^\top$) and 2 bit-shifting operations. To scale the approximation maintaining the exact $\hat{\mathbf{S}}^*_3$, 4 real multiplications are added to the previous arithmetic cost. This approximation is denoted by $\hat{\mathbf{F}}^*_3$. Otherwise, if the diagonal matrix is approximated, 8 additions and 8 bit-shifting operations are needed instead of the multiplications.
This approximation is called $\hat{\mathbf{F}}'_3$.

6.2 11-point Approximation Fast Algorithm

For the 11-point approximation, the low-complexity matrix $\mathbf{T}^*_{11}$ is given by

$$\mathbf{T}^*_{11} = \mathbf{A}_2^\top \cdot \mathbf{C}_2 \cdot \mathbf{A}_2, \tag{29}$$

where

$$\mathbf{A}_2 = \operatorname{diag}(1, \mathbf{B}_{10}), \tag{30}$$

and the block matrix $\mathbf{C}_2$ is

$$\mathbf{C}_2 = \operatorname{diag}\left(
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & \frac{1}{2} & 0 & -\frac{1}{2} & -1 \\
1 & \frac{1}{2} & -\frac{1}{2} & -1 & 0 & 1 \\
1 & 0 & -1 & \frac{1}{2} & 1 & -\frac{1}{2} \\
1 & -\frac{1}{2} & 0 & 1 & -1 & \frac{1}{2} \\
1 & -1 & 1 & -\frac{1}{2} & \frac{1}{2} & 0
\end{bmatrix},
\begin{bmatrix}
-j & j & -j & \frac{j}{2} & -\frac{j}{2} \\
j & -\frac{j}{2} & -\frac{j}{2} & j & -j \\
-j & -\frac{j}{2} & j & \frac{j}{2} & -j \\
\frac{j}{2} & j & \frac{j}{2} & -j & -j \\
-\frac{j}{2} & -j & -j & -j & -\frac{j}{2}
\end{bmatrix}
\right).$$

Matrix $\mathbf{A}_2$ requires 20 real additions and $\mathbf{C}_2$ needs 90 real additions and 40 bit-shifting operations. While the direct implementation of $\mathbf{T}^*_{11}$ requires 380 additions and 160 bit-shifting operations, the factorization presented in (29) needs 130 real additions and 40 bit-shifting operations. The scaling to $\hat{\mathbf{F}}^*_{11}$ requires 20 extra multiplications. In turn, $\hat{\mathbf{F}}'_{11}$ needs 40 additions and 40 bit-shifting operations instead of the 20 multiplications of $\hat{\mathbf{F}}^*_{11}$.

6.3 31-point Approximation Fast Algorithm

The low-complexity matrix $\mathbf{T}^*_{31}$ can be expressed as:

$$\mathbf{T}^*_{31} = \mathbf{A}_3^\top \cdot \mathbf{C}_3 \cdot \mathbf{A}_3, \tag{31}$$

where $\mathbf{A}_3 = \operatorname{diag}(1, \mathbf{B}_{30})$.
The m a trix C 3 is a block matrix given by C 1 = dia g( E 1 , E 2 ) , where E 1 =                               1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 − 1 2 − 1 2 − 1 2 − 1 − 1 − 1 − 1 1 1 1 1 2 − 1 2 − 1 − 1 − 1 − 1 − 1 2 − 1 2 1 2 1 1 1 1 1 2 − 1 2 − 1 − 1 − 1 − 1 2 1 1 1 1 2 − 1 2 − 1 1 1 − 1 − 1 − 1 2 1 1 1 2 − 1 2 − 1 − 1 − 1 2 1 2 1 1 1 2 − 1 2 − 1 − 1 2 1 2 1 1 − 1 2 − 1 − 1 1 1 − 1 1 1 2 − 1 − 1 1 1 2 − 1 2 − 1 1 1 − 1 2 − 1 − 1 2 1 1 − 1 − 1 2 1 1 − 1 2 − 1 1 2 1 − 1 − 1 2 1 1 2 − 1 1 − 1 1 − 1 2 − 1 1 2 1 − 1 2 − 1 1 2 1 − 1 2 − 1 1 1 − 1 2 − 1 1 1 2 − 1 1 − 1 2 − 1 1 1 2 − 1 1 − 1 2 1 − 1 2 − 1 2 1 − 1 2 − 1 1 − 1 1 − 1 1 1 2 − 1 1 2 1 − 1 2 − 1 2 1 − 1 1 − 1 1 2 1 2 − 1 1 − 1 1 − 1 2 1 − 1 1 2 − 1 1 − 1 2 − 1 2 1 − 1 1 − 1 2 1 − 1 1 2 1 − 1 1 2 − 1 2 1 − 1 1 − 1 2 1 2 − 1 1 − 1 1 − 1 2 1 − 1 1 − 1 2 1 2 − 1 2 1 2 − 1 1 − 1 1 − 1 1 − 1 2 1 − 1 1 − 1 1 − 1 1 − 1 1 − 1 2 1 2 − 1 2 1 2 − 1 2                               , 11 E 2 = j ·                             − 1 1 − 1 1 − 1 1 − 1 1 − 1 2 1 2 − 1 2 1 2 − 1 2 1 − 1 1 − 1 2 − 1 2 1 2 − 1 1 − 1 1 − 1 1 2 − 1 2 − 1 1 − 1 2 1 2 − 1 1 − 1 1 2 − 1 2 1 − 1 1 − 1 2 1 − 1 2 1 − 1 1 − 1 2 1 − 1 1 2 1 2 − 1 1 − 1 2 − 1 1 2 − 1 1 2 1 2 − 1 1 − 1 1 − 1 2 − 1 2 1 − 1 1 − 1 1 1 2 − 1 1 2 1 2 − 1 1 2 1 2 − 1 1 − 1 − 1 − 1 2 1 − 1 1 2 1 2 − 1 1 − 1 2 − 1 1 1 2 − 1 1 1 2 − 1 − 1 2 1 1 2 − 1 − 1 2 1 1 2 − 1 1 − 1 − 1 2 − 1 1 2 1 − 1 1 1 2 − 1 − 1 1 2 1 − 1 2 − 1 1 2 1 − 1 − 1 1 2 1 1 2 − 1 − 1 1 1 2 − 1 2 − 1 − 1 2 − 1 − 1 2 1 2 1 1 2 − 1 2 − 1 − 1 1 1 − 1 − 1 1 2 1 1 1 2 − 1 2 − 1 − 1 1 2 1 1 − 1 2 − 1 − 1 − 1 2 − 1 − 1 − 1 − 1 2 1 1 1 1 2 − 1 2 − 1 − 1 − 1 2 1 2 1 1 1 1 1 2 − 1 2 − 1 2 − 1 − 1 − 1 − 1 − 1 2 − 1 2 − 1 2 − 1 2 − 1 − 1 − 1 − 1 − 1 − 1 − 1 − 1 − 1 2 − 1 2                             . 
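The factorizations (27), (29), and (31) can be checked numerically with a short script. The sketch below is an illustration under stated assumptions rather than the authors' code: $g(\cdot)$ is assumed to act separately on real and imaginary parts with ties rounded away from zero, and the helper names are hypothetical. Instead of typing the core matrices by hand, it solves $\mathbf{T}^*_N = \mathbf{A}^\top \cdot \mathbf{C} \cdot \mathbf{A}$ for $\mathbf{C}$, using the fact that $\mathbf{B}_m \cdot \mathbf{B}_m^\top = 2\mathbf{I}_m$, and then tests whether $\mathbf{C}$ is block diagonal with a real upper block and a purely imaginary lower block.

```python
import cmath
from math import floor


def g(v):
    """Nearest multiple of 1/2, ties away from zero (assumed rounding rule)."""
    steps = floor(abs(2 * v) + 0.5)
    return (steps if v >= 0 else -steps) / 2


def t_matrix(n, alpha=9 / 8):
    """T*_N = g(alpha * F_N), applied to real and imaginary parts separately."""
    t = []
    for k in range(n):
        ws = [cmath.exp(-2j * cmath.pi * ((k * m) % n) / n) for m in range(n)]
        t.append([complex(g(alpha * w.real), g(alpha * w.imag)) for w in ws])
    return t


def a_matrix(n):
    """A = diag(1, B_{N-1}) with B_m = [[I, J], [-J, I]], J the counter-identity."""
    h = (n - 1) // 2
    a = [[0.0] * n for _ in range(n)]
    a[0][0] = 1.0
    for i in range(h):
        a[1 + i][1 + i] = 1.0          # I  (top left)
        a[1 + i][n - 1 - i] = 1.0      # J  (top right)
        a[1 + h + i][h - i] = -1.0     # -J (bottom left)
        a[1 + h + i][1 + h + i] = 1.0  # I  (bottom right)
    return a


def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]


def transpose(p):
    return [list(col) for col in zip(*p)]


def core_matrix(n):
    """Solve T = A^T C A for C, using A^{-1} = diag(1, B^T / 2)."""
    a_inv = transpose(a_matrix(n))
    for i in range(1, n):
        a_inv[i] = [v / 2 for v in a_inv[i]]
    return matmul(matmul(transpose(a_inv), t_matrix(n)), a_inv)


def is_block_diagonal(c):
    """True if C = diag(E1, E2) with E1 real and E2 purely imaginary."""
    n, h = len(c), (len(c) + 1) // 2
    for i in range(n):
        for j in range(n):
            if (i < h) != (j < h):
                if c[i][j] != 0:
                    return False
            elif i < h and c[i][j].imag != 0:
                return False
            elif i >= h and c[i][j].real != 0:
                return False
    return True


if __name__ == "__main__":
    for n in (3, 11, 31):
        print(n, is_block_diagonal(core_matrix(n)))
```

All quantities involved are small dyadic fractions, so the floating-point arithmetic is exact and the comparisons can use strict equality.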
The matrix $A_{3}$ requires only 60 real additions, while $C_{3}$ needs 780 real additions and 300 bit-shifting operations. The direct implementation of $T^{*}_{31}$ requires 3180 additions and 1200 bit-shifting operations. After applying the factorization in (31), the number of arithmetic operations is reduced to 900 real additions and 300 bit-shifting operations. The scaling to $\hat{F}^{*}_{31}$ requires 60 multiplications, whereas $\hat{F}'_{31}$ needs 120 additions and 120 bit-shifting operations instead.

6.4 1023-point DFT Approximation

The PFA computation of the 1023-point DFT, as defined in Section 2.2, requires $(33 \times 2700) + (93 \times 300) + (341 \times 12) = 121092$ real multiplications and $(33 \times 4560) + (93 \times 520) + (341 \times 24) = 207024$ real additions when the 3-, 11-, and 31-point DFTs are computed directly from their definitions. However, if the exact 1023-point DFT is computed by the sparse matrix factorizations detailed in Appendix A, then $(33 \times 900) + (93 \times 100) + (341 \times 2) = 39682$ real multiplications, $(33 \times 1020) + (93 \times 140) + (341 \times 12) = 50772$ real additions, and $(341 \times 2) = 682$ bit-shifting operations are needed.

The unscaled algorithm to compute the 1023-point DFT approximation, referred to as $\hat{T}^{*}_{1023}$, has null multiplicative complexity. To calculate $\hat{T}^{*}_{1023}$, the matrix $T^{*}_{31}$ is called 33 times, contributing $900 \times 33 = 29700$ real additions and $300 \times 33 = 9900$ bit-shifting operations; $T^{*}_{11}$ is called 93 times, contributing $130 \times 93 = 12090$ real additions and $40 \times 93 = 3720$ bit-shifting operations; and $T^{*}_{3}$ is called 341 times, which corresponds to $12 \times 341 = 4092$ real additions and $2 \times 341 = 682$ bit-shifting operations. The resulting arithmetic cost of $\hat{T}^{*}_{1023}$ is therefore 45882 real additions and 14302 bit-shifting operations. To compute the scaled 1023-point DFT approximation with the exact $S$, called $\hat{F}^{*}_{1023}$, 2044 additional multiplications are necessary.
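The totals quoted in this subsection follow from multiplying each ground-transform cost by the number of times the PFA calls it (33, 93, and 341 calls for the 31-, 11-, and 3-point blocks) and summing. A minimal Python sketch of that bookkeeping is given below. The per-call costs are the ones stated in the text; the dictionary layout and the function name `pfa_cost` are hypothetical choices made for illustration.

```python
# Per-call cost (real multiplications, real additions, bit shifts) of each
# ground transform for a complex input, as quoted in the text.
COSTS = {
    "definition": {31: (2700, 4560, 0), 11: (300, 520, 0), 3: (12, 24, 0)},
    "fast exact": {31: (900, 1020, 0), 11: (100, 140, 0), 3: (2, 12, 2)},
    "unscaled approximation": {31: (0, 900, 300), 11: (0, 130, 40), 3: (0, 12, 2)},
}

# Number of calls of each ground transform in the 1023-point PFA (1023 = 3 * 11 * 31).
CALLS = {31: 33, 11: 93, 3: 341}


def pfa_cost(kind, scaling=(0, 0, 0)):
    """Total (multiplications, additions, shifts) of the 1023-point transform.

    `scaling` is the extra cost of the final diagonal scaling, if any.
    """
    total = list(scaling)
    for size, calls in CALLS.items():
        for i, per_call in enumerate(COSTS[kind][size]):
            total[i] += calls * per_call
    return tuple(total)


print(pfa_cost("definition"))              # (121092, 207024, 0)
print(pfa_cost("fast exact"))              # (39682, 50772, 682)
print(pfa_cost("unscaled approximation"))  # (0, 45882, 14302)
# Exact scaling adds 2044 real multiplications on top of the unscaled transform.
print(pfa_cost("unscaled approximation", scaling=(2044, 0, 0)))  # (2044, 45882, 14302)
```

Keeping the scaling cost as a separate argument makes it easy to swap the exact diagonal for an approximate one without touching the per-block tallies.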
If $S$ is instead approximated following Table 1, then 4088 additions and 4088 bit-shifting operations replace the 2044 multiplications required by the exact scaling, yielding $\hat{F}'_{1023}$.

6.5 Hybrid 1023-point DFT Approximation

Applying the hybrid approach detailed in Section 4.3 to the 1023-point DFT approximation defined in (24) and (25), we obtained 12 distinct approximations. The 1023-point DFT approximations consist of four elements: a diagonal matrix $\hat{S}$, a 3-point, an 11-point, and a 31-point transformation. In the hybrid approximations, these four elements are alternated between exact and approximate form. Table 2 lists the combinations of these four elements. The computation of the exact DFT was performed according to the algorithms detailed in Appendix A. The arithmetic complexity of the 1023-point DFT approximations is summarized in Table 4 and can be obtained using the following equation:

\[
\mathcal{A}(\text{1023-point DFT}) = \mathcal{A}(\hat{S}) + 33 \cdot \mathcal{A}(\text{31-point DFT}) + 93 \cdot \mathcal{A}(\text{11-point DFT}) + 341 \cdot \mathcal{A}(\text{3-point DFT}), \tag{32}
\]

where $\mathcal{A}(\cdot)$ represents the arithmetic complexity of the argument, including operations such as multiplications, additions, and bit-shifting.

Table 2: Hybrid Approximations for the 1023-point DFT (employed transformations)

| Approximation | $\hat{S}$ | 3-point | 11-point | 31-point |
|---|---|---|---|---|
| $\hat{F}^{*}_{1023,\mathrm{I}}$ | Exact | $T^{*}_{3}$ | $F_{11}$ | $F_{31}$ |
| $\hat{F}'_{1023,\mathrm{I}}$ | CSD approx. | $T^{*}_{3}$ | $F_{11}$ | $F_{31}$ |
| $\hat{F}^{*}_{1023,\mathrm{II}}$ | Exact | $F_{3}$ | $T^{*}_{11}$ | $F_{31}$ |
| $\hat{F}'_{1023,\mathrm{II}}$ | CSD approx. | $F_{3}$ | $T^{*}_{11}$ | $F_{31}$ |
| $\hat{F}^{*}_{1023,\mathrm{III}}$ | Exact | $T^{*}_{3}$ | $T^{*}_{11}$ | $F_{31}$ |
| $\hat{F}'_{1023,\mathrm{III}}$ | CSD approx. | $T^{*}_{3}$ | $T^{*}_{11}$ | $F_{31}$ |
| $\hat{F}^{*}_{1023,\mathrm{IV}}$ | Exact | $F_{3}$ | $F_{11}$ | $T^{*}_{31}$ |
| $\hat{F}'_{1023,\mathrm{IV}}$ | CSD approx. | $F_{3}$ | $F_{11}$ | $T^{*}_{31}$ |
| $\hat{F}^{*}_{1023,\mathrm{V}}$ | Exact | $T^{*}_{3}$ | $F_{11}$ | $T^{*}_{31}$ |
| $\hat{F}'_{1023,\mathrm{V}}$ | CSD approx. | $T^{*}_{3}$ | $F_{11}$ | $T^{*}_{31}$ |
| $\hat{F}^{*}_{1023,\mathrm{VI}}$ | Exact | $F_{3}$ | $T^{*}_{11}$ | $T^{*}_{31}$ |
| $\hat{F}'_{1023,\mathrm{VI}}$ | CSD approx. | $F_{3}$ | $T^{*}_{11}$ | $T^{*}_{31}$ |

7 Comparison and Discussion

In this section, we assess and compare the proposed methods with competing methods.
The comparisons encompass arithmetic complexity, error analysis, and frequency response.

7.1 Arithmetic Complexity

7.1.1 Complexity Measurements

Table 3 shows the arithmetic complexity of the proposed ground transformation matrices compared with: (i) their respective exact counterparts, (ii) the ground transformation used in [53], denoted by $\hat{F}_{32}$, and (iii) the exact 32-point DFT calculated by the fully optimized Cooley-Tukey radix-2 algorithm [6], denoted by $F_{32}$.

In Table 4, we compare the arithmetic complexity of the proposed 1023-point DFT approximation algorithm with (i) the 1023-point exact DFT computed by the prime factor algorithm without fast algorithms for the ground transforms; (ii) the 1023-point exact DFT computed by the prime factor algorithm with fast algorithms for the ground transforms; (iii) the 1024-point exact DFT computed by definition ($F_{1024}$); (iv) the 1024-point exact DFT computed by the fully optimized Cooley-Tukey radix-2 algorithm ($F_{1024}$); and (v) the fast algorithms for the 1024-point DFT approximation ($\hat{F}^{\mathrm{I}}_{1024}$, $\hat{F}^{\mathrm{II}}_{1024}$, and $\hat{F}^{\mathrm{III}}_{1024}$) proposed in [53]. Such methods are the closest comparable algorithms in the literature. In $\hat{F}^{\mathrm{I}}_{1024}$, both row- and column-wise 32-point DFTs are replaced by the multiplierless $\hat{F}_{32}$ [20]. In $\hat{F}^{\mathrm{II}}_{1024}$ and $\hat{F}^{\mathrm{III}}_{1024}$, either the column- or row-wise operation is kept in the exact form.

Table 3: Arithmetic Complexity of the Approximate Ground Transforms and Comparison

| $N$ | Transform | Real Mult. | Real Add. | Bit-shifting |
|---|---|---|---|---|
| 3 | $F_{3}$ (by definition [64]) | 12 | 24 | 0 |
| | $F_{3}$ (by the proposed FFT) | 2 | 12 | 2 |
| | $\hat{T}^{*}_{3}$ | 0 | 12 | 2 |
| | $\hat{F}^{*}_{3}$ | 4 | 12 | 2 |
| | $\hat{F}'_{3}$ | 0 | 20 | 10 |
| 11 | $F_{11}$ (by definition [64]) | 300 | 520 | 0 |
| | $F_{11}$ (by the proposed FFT) | 100 | 140 | 0 |
| | $\hat{T}^{*}_{11}$ | 0 | 130 | 40 |
| | $\hat{F}^{*}_{11}$ | 20 | 130 | 40 |
| | $\hat{F}'_{11}$ | 0 | 170 | 80 |
| 31 | $F_{31}$ (by definition [64]) | 2700 | 4560 | 0 |
| | $F_{31}$ (by the proposed FFT) | 900 | 1020 | 0 |
| | $\hat{T}^{*}_{31}$ | 0 | 900 | 300 |
| | $\hat{F}^{*}_{31}$ | 60 | 900 | 300 |
| | $\hat{F}'_{31}$ | 0 | 1020 | 420 |
| 32 | $F_{32}$ (Cooley-Tukey radix-2 [6]) | 88 | 408 | 0 |
| | $\hat{F}_{32}$ (proposed in [53]) | 0 | 348 | 0 |

Table 4: Arithmetic Complexity Assessment and Comparison

| $N$ | Transform | Real Mult. | Real Add. | Bit-shifting |
|---|---|---|---|---|
| 1023 | $F_{1023}$ (PFA [64]) | 121092 | 207024 | 0 |
| | $F_{1023}$ (PFA and fast algorithms) | 39682 | 50772 | 682 |
| | $\hat{F}^{*}_{1023,\mathrm{I}}$ | 40364 | 50772 | 682 |
| | $\hat{F}'_{1023,\mathrm{I}}$ | 39000 | 53500 | 3410 |
| | $\hat{F}^{*}_{1023,\mathrm{II}}$ | 32242 | 49842 | 4402 |
| | $\hat{F}'_{1023,\mathrm{II}}$ | 30382 | 53562 | 8122 |
| | $\hat{F}^{*}_{1023,\mathrm{III}}$ | 11962 | 46812 | 10582 |
| | $\hat{F}'_{1023,\mathrm{III}}$ | 9982 | 50772 | 14542 |
| | $\hat{F}^{*}_{1023,\mathrm{IV}}$ | 31684 | 49842 | 4402 |
| | $\hat{F}'_{1023,\mathrm{IV}}$ | 29700 | 53810 | 8370 |
| | $\hat{F}^{*}_{1023,\mathrm{V}}$ | 11324 | 46812 | 10582 |
| | $\hat{F}'_{1023,\mathrm{V}}$ | 9300 | 50860 | 14630 |
| | $\hat{F}^{*}_{1023,\mathrm{VI}}$ | 2722 | 45882 | 14302 |
| | $\hat{F}'_{1023,\mathrm{VI}}$ | 682 | 49962 | 18382 |
| | $\hat{T}^{*}_{1023}$ | 0 | 45882 | 14302 |
| | $\hat{F}^{*}_{1023}$ | 2044 | 45882 | 14302 |
| | $\hat{F}'_{1023}$ | 0 | 49970 | 18390 |
| 1024 | $F_{1024}$ (exact, by definition [6]) | 3084288 | 5159936 | 0 |
| | $F_{1024}$ (Cooley-Tukey [6]) | 10248 | 30728 | 0 |
| | $\hat{F}^{\mathrm{I}}_{1024}$ (proposed in [53]) | 2883 | 25155 | 0 |
| | $\hat{F}^{\mathrm{II}}_{1024}$ (proposed in [53]) | 5699 | 27075 | 0 |
| | $\hat{F}^{\mathrm{III}}_{1024}$ (proposed in [53]) | 5699 | 27075 | 0 |

Table 5: Error Measurements of the Ground Approximate Transforms

| $N$ | Transform | $\epsilon$ | $M$ | $\phi \times 10^{3}$ |
|---|---|---|---|---|
| 3 | $\hat{F}^{*}_{3}$ | 0.0968 | 1.59 | 6.73 |
| | $\hat{F}'_{3}$ | 0.0975 | 1.60 | 6.77 |
| 11 | $\hat{F}^{*}_{11}$ | 8.88 | 1.19 | 14.12 |
| | $\hat{F}'_{11}$ | 8.90 | 1.20 | 14.11 |
| 31 | $\hat{F}^{*}_{31}$ | 76.60 | 0.45 | 19.83 |
| | $\hat{F}'_{31}$ | 76.90 | 0.45 | 19.84 |
| 32 | $\hat{F}_{32}$ | 332 | 0.84 | 36.07 |

7.1.2 Discussion

Table 3 shows that power-of-two ground transformation matrices benefit more from factorization than prime-length transformations. However, when ground transforms are used as building blocks to derive larger transforms, a different phenomenon occurs.
Indeed, the $N$ complex multiplications due to the twiddle factors present in Cooley-Tukey-based approximations offset the complexity reductions from their factorization. In contrast, the proposed PFA-based approximations do not require intermediate multiplications by twiddle factors. In particular, the proposed approximations $\hat{T}^{*}_{1023}$ and $\hat{F}'_{1023}$ are entirely multiplierless. This feature has a direct impact on hardware efficiency. As a reference, the approximation with the lowest arithmetic complexity proposed in [53] ($\hat{F}^{\mathrm{I}}_{1024}$) achieved reductions of up to 48.5% in chip area, 30% in critical path delay (CPD), and 66.0% in energy consumption compared to the conventional radix-2 Cooley-Tukey FFT implementation. Given that the proposed approximations $\hat{T}^{*}_{1023}$ and $\hat{F}'_{1023}$ require no multiplications, it is therefore reasonable to expect that the proposed method could achieve at least comparable reductions in terms of power consumption, chip area, and CPD.

7.2 Error Analysis

Tables 5 and 6 summarize the proximity measurements (Section 3.2) for the proposed approximations compared with the approximations proposed in [20, 53]. Although the proposed approximations are not strictly orthogonal, their deviations from orthogonality are extremely low ($\approx 10^{-2}$). MAPE measurements are also smaller for the proposed approximations. The total error energy ($\epsilon$) indicates that the proposed approximations are more refined than the approximations in [20, 53]. The good performance of the proposed ground approximations is transferred to the 1023-point DFT approximations. The applicability of approximate DFTs to practical scenarios such as digital beamforming and spectrum sensing has been demonstrated in [20, 53]. In both cases, the adopted approximations proved adequate for real-world implementations despite presenting higher error levels compared to the methods proposed in this work.
Therefore, given that blocklength is not a critical constraint in these contexts and that the proposed approximations achieve even lower error levels, it is reasonable to expect that they are equally, if not more, suitable for these classes of applications, while fully eliminating multiplications.

Table 6: Error Measurements of the Proposed 1023-point Approximation and Comparison

| $N$ | Transform | $\epsilon \times 10^{-4}$ | $M \times 10^{3}$ | $\phi \times 10^{3}$ |
|---|---|---|---|---|
| 1023 | $\hat{F}^{*}_{1023,\mathrm{I}}$ | 1.13 | 4.67 | 6.73 |
| | $\hat{F}'_{1023,\mathrm{I}}$ | 1.13 | 4.69 | 6.77 |
| | $\hat{F}^{*}_{1023,\mathrm{II}}$ | 7.68 | 12.83 | 14.12 |
| | $\hat{F}'_{1023,\mathrm{II}}$ | 7.70 | 12.86 | 14.11 |
| | $\hat{F}^{*}_{1023,\mathrm{III}}$ | 8.35 | 13.68 | 19.83 |
| | $\hat{F}'_{1023,\mathrm{III}}$ | 8.38 | 13.70 | 19.84 |
| | $\hat{F}^{*}_{1023,\mathrm{IV}}$ | 8.80 | 14.12 | 20.76 |
| | $\hat{F}'_{1023,\mathrm{IV}}$ | 8.88 | 14.18 | 20.79 |
| | $\hat{F}^{*}_{1023,\mathrm{V}}$ | 9.46 | 14.77 | 26.43 |
| | $\hat{F}'_{1023,\mathrm{V}}$ | 9.55 | 14.82 | 26.49 |
| | $\hat{F}^{*}_{1023,\mathrm{VI}}$ | 15.93 | 18.67 | 33.68 |
| | $\hat{F}'_{1023,\mathrm{VI}}$ | 16.66 | 19.86 | 33.78 |
| | $\hat{F}^{*}_{1023}$ | 17.03 | 19.41 | 40.18 |
| | $\hat{F}'_{1023}$ | 17.10 | 19.45 | 40.06 |
| 1024 | $\hat{F}^{\mathrm{I}}_{1024}$ | 93.00 | 44 | 69.42 |
| | $\hat{F}^{\mathrm{II}}_{1024}$ | 34.02 | 25.31 | 36.07 |
| | $\hat{F}^{\mathrm{III}}_{1024}$ | 34.02 | 25.31 | 36.07 |

7.3 Frequency Response

Considering the rows of the linear transform matrix as a finite impulse response (FIR) filter bank [64], it is possible to evaluate the approximation performance according to the frequency response of the considered filters. This approach is justified by the fact that, in linear time-invariant systems, any input signal can be expressed as a linear combination of shifted impulses [59]. In this way, analyzing the impulse response of each row provides a complete and interpretable characterization of the transformation behavior.

The analysis is focused on $\hat{F}'_{3}$, $\hat{F}'_{11}$, $\hat{F}'_{31}$, and $\hat{F}'_{1023}$ because they are fully approximated, scaled, and free of multiplications.

7.3.1 Overall Assessment

The frequency response error energy ($\epsilon$) measurements for $\hat{F}'_{3}$, $\hat{F}'_{11}$, and $\hat{F}'_{31}$ relative to the exact filter bank are given in Table 7. The lowest measurements are in boldface. Fig. 1(a)–(d) shows the error energy for all filters from the 3-, 11-, 31-, and 1023-point approximations. Each row of the transform matrix is interpreted as an FIR filter and is represented by a distinct color in the plot. The vertical axis corresponds to the normalized magnitude response in decibels (dB), computed as:

\[
20 \log_{10}\left( \frac{|H(\omega)|}{\max_{\omega} |H(\omega)|} \right),
\]

where $H(\omega)$ is the frequency response of the row under analysis. This normalization sets the main lobe peak at 0 dB, allowing for visual inspection of attenuation and spectral leakage across frequencies. Although the curves appear visually similar to each other, it is still possible to observe that the error is kept under $-17$ dB in all cases. In [53], the approximations $\hat{F}^{\mathrm{I}}_{1024}$, $\hat{F}^{\mathrm{II}}_{1024}$, and $\hat{F}^{\mathrm{III}}_{1024}$ produced errors below $-6.8$ dB, $-11.52$ dB, and $-10.61$ dB, respectively.

Table 7: Error Energy of 3-, 11-, and 31-point DFT approximations (least performing are highlighted)

| Method | Row 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat{F}'_{3}$ | 0.00 | 0.08 | 0.01 | | | | | | | | | 0.09 |
| $\hat{F}'_{11}$ | 0.00 | 0.44 | 1.01 | 0.93 | 1.09 | 1.33 | 0.46 | 0.69 | 0.85 | 0.77 | 1.34 | 8.90 |

| Method | Row 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat{F}'_{31}$ | 0.00 | 2.08 | 2.91 | 1.56 | 3.66 | 1.97 | 3.69 | 3.26 | 0.82 | 3.36 | 1.56 | 3.38 | 2.54 | 1.60 | 2.73 | 1.91 | 76.8 |

| Method | Row 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat{F}'_{31}$ | 3.20 | 2.38 | 3.51 | 2.57 | 1.73 | 3.56 | 1.77 | 4.29 | 1.87 | 1.45 | 3.16 | 1.47 | 3.55 | 2.22 | 3.04 |

[Figure 1 panels: (a) $\hat{F}'_{3}$, (b) $\hat{F}'_{11}$, (c) $\hat{F}'_{31}$, (d) $\hat{F}'_{1023}$. Horizontal axis: normalized frequency, from $-\pi$ to $\pi$. Vertical axis: magnitude (dB). Marked levels: $-22.86$ dB, $-17.69$ dB, $-19.91$ dB, and $-20.9$ dB, respectively.]

Figure 1: Error plots between the filter bank frequency response magnitude for (a) 3-point approximation, (b) 11-point approximation, (c) 31-point approximation, and (d) 1023-point approximation and their exact counterparts.

Notice that, for all proposed approximations (see Table 7), including those with blocklength 1023 derived from the PFA, the DC component remains unchanged and is computed exactly as in the exact DFT. This holds because neither the approximation procedure nor the orthogonalization process modifies the 0th row of the DFT matrix, which is solely composed of ones and inherently has low complexity. Therefore, the approximation error is exclusively associated with the non-DC frequency components, while the DC level remains identical to the one calculated by the exact DFT.

In addition, an analysis is presented for the frequency response to a pure cosine input generated by

\[
x[n] = \cos\left( \frac{2\pi \cdot 100}{N} \cdot n \right), \quad n = 0, 1, \ldots, N-1. \tag{33}
\]

Fig. 4 presents the magnitude responses obtained by applying the exact DFT and the DFT approximations to a pure cosine signal, where the dashed line represents the maximum value of the non-dominant frequencies. In this analysis, the results are shown in linear scale (rather than in dB) to improve visualization. As shown in Fig. 4(a), the approximation $\hat{F}^{\mathrm{I}}_{1024}$ exhibits noticeable leakage across several non-dominant frequencies, along with distortion in one of the main lobes (left side). In the hybrid approximation $\hat{F}^{\mathrm{II}}_{1024}$, in Fig. 4(b), the leakage in non-dominant frequencies is reduced but distortion in the left main lobe remains. In Fig. 4(c), the hybrid approximation $\hat{F}^{\mathrm{III}}_{1024}$ presents main lobes close to their exact counterparts, but it presents leakage in more non-dominant frequencies than $\hat{F}^{\mathrm{II}}_{1024}$. In contrast, the proposed approximation $\hat{F}'_{1023}$, shown in Fig. 4(d), achieves a closer match to the exact DFT in the main lobes and a lower maximum leakage in non-dominant frequencies.

7.3.2 Worst-case Scenario

Fig. 2(a)–(c) displays the frequency response magnitude plots of the three least performing filters for the ground 3-, 11-, and 31-point approximations compared with their exact counterparts, respectively.

Fig. 3(a)–(c) addresses the 1023-point approximation and shows the frequency response magnitude plots associated with matrix rows (filters) 86, 699, and 854. These are the least performing filters, presenting error energy measurements of 306.08, 287.1, and 286.29, respectively, whereas the average error energy over all filters is 167.15. Even under such worst-case scenario analysis, the approximate methods were able to preserve the main and the secondary lobes from the exact DFT.

[Figure 2 panels: (a) $\hat{F}'_{3}$, rows 3, 2, and 1; (b) $\hat{F}'_{11}$, rows 11, 6, and 5; (c) $\hat{F}'_{31}$, rows 24, 7, and 5; each compared with the exact response. Horizontal axis: normalized frequency, from $-\pi$ to $\pi$. Vertical axis: magnitude (dB), from $-40$ to 0.]

Figure 2: Comparison between the magnitude of the filter-bank responses of the approximations and their exact counterparts for the three least performing rows.

[Figure 3 panels: (a) row 854, (b) row 699, (c) row 86; curves for $F_{1023}$ and $\hat{F}'_{1023}$. Horizontal axis: normalized frequency. Vertical axis: magnitude (dB), from $-50$ to 0.]

Figure 3: Comparison between the magnitude of the filter-bank responses of the approximation $\hat{F}'_{1023}$ and the exact DFT for the three least performing rows.
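The two numerical devices used in Section 7.3, namely the normalized magnitude response of a matrix row viewed as FIR taps and the pure-cosine test input of (33), can be sketched in plain Python as follows. The sketch uses the exact DFT only, since it does not assume any particular approximate matrix. The function names, the 512-point frequency grid, and the `max_leakage` helper (the largest normalized magnitude outside the dominant bins, which mirrors the dashed level described for Fig. 4) are assumptions made for illustration.

```python
import cmath
import math


def normalized_response_db(taps, num_points=512):
    """Return (omegas, dB values) of 20*log10(|H(w)| / max|H(w)|) on [-pi, pi)."""
    omegas = [-math.pi + 2 * math.pi * i / num_points for i in range(num_points)]
    mags = [abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps)))
            for w in omegas]
    peak = max(mags)
    floor = 1e-12 * peak  # keeps log10 finite at exact spectral nulls
    return omegas, [20 * math.log10(max(m, floor) / peak) for m in mags]


def dft_row(k, n_points):
    """Row k of the exact n_points-point DFT matrix."""
    return [cmath.exp(-2j * math.pi * k * n / n_points) for n in range(n_points)]


def cosine_input(n_points, tone=100):
    """Test signal of (33): x[n] = cos(2*pi*tone*n / n_points)."""
    return [math.cos(2 * math.pi * tone * n / n_points) for n in range(n_points)]


def exact_dft(x):
    """Direct O(N^2) DFT using a precomputed twiddle table."""
    n_points = len(x)
    w = [cmath.exp(-2j * math.pi * m / n_points) for m in range(n_points)]
    return [sum(x[n] * w[(k * n) % n_points] for n in range(n_points))
            for k in range(n_points)]


def max_leakage(spectrum, dominant):
    """Largest magnitude outside the dominant bins, relative to the peak."""
    mags = [abs(v) for v in spectrum]
    peak = max(mags)
    return max(m / peak for i, m in enumerate(mags) if i not in dominant)


if __name__ == "__main__":
    _, db = normalized_response_db(dft_row(3, 11))
    print(f"main-lobe peak: {max(db):.1f} dB")
    spectrum = exact_dft(cosine_input(1023))
    # For N = 1023 and tone 100, the dominant bins are 100 and 1023 - 100 = 923.
    print(f"leakage of the exact DFT: {max_leakage(spectrum, {100, 923}):.2e}")
```

For the exact transform the leakage is at the level of floating-point round-off; replacing `exact_dft` with a routine that applies an approximate transform would give the quantity that the dashed lines in Fig. 4 are described as marking.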
[Figure 4 panels: (a) $\hat{F}^{\mathrm{I}}_{1024}$ and $F_{1024}$, dashed level 0.17; (b) $\hat{F}^{\mathrm{II}}_{1024}$ and $F_{1024}$, dashed level 0.17; (c) $\hat{F}^{\mathrm{III}}_{1024}$ and $F_{1024}$, dashed level 0.11; (d) $\hat{F}'_{1023}$ and $F_{1023}$, dashed level 0.09. Horizontal axis: DFT index. Vertical axis: magnitude (linear scale).]

Figure 4: Comparison between the magnitude responses obtained from [53] and $\hat{F}'_{1023}$ with their exact counterparts, applied to a pure cosine signal.

8 Conclusion

In this paper, we proposed a method to obtain matrices with null multiplicative complexity based on the prime factor algorithm. We demonstrated that, if the transform length can be decomposed into relatively prime numbers, then the entire approximate DFT computation can be performed without multiplications. The absence of twiddle factors simplifies the design when compared to Cooley-Tukey-based approximations and also reduces the error propagation from the approximate ground transforms. We applied the proposed method to derive a 1023-point DFT approximation along with a collection of approximations that trade off accuracy against computational cost. The proposed method outperformed the comparable methods in the literature according to popular figures of merit. The main contributions of this paper are summarized as follows:

• A generic method to obtain multiplierless DFT approximations using the prime factor algorithm;
• Three novel approximations for the 3-point DFT;
• Three novel approximations for the 11-point DFT;
• Three novel approximations for the 31-point DFT;
• A procedure to construct large multiplierless DFT approximations when the transform length can be decomposed into relatively prime factors;
• Fifteen approximations for the 1023-point DFT, derived using the proposed method.

For future work, we aim to extend this framework to different blocklengths.
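Preliminary results for the 65-point DFT ($65 = 5 \times 13$) have already corroborated the scalability and consistency of the method, preserving its fully multiplierless and low-complexity characteristics. In particular, large transform sizes that result from coprime factorizations, such as 2046 ($2 \times 3 \times 11 \times 31$), directly benefit from the results and approximations developed in this work. Additionally, practical hardware implementations of the proposed method are to be investigated. In this context, we highlight memory-efficient strategies such as in-place and in-order computation. As shown in [51], such techniques are directly possible in PFA algorithms only when the transform blocklength prime factors are quadratic residues of each other. However, an earlier work [14] introduced a variation of the PFA that enables fully in-place and in-order execution for arbitrary factorizations. This approach significantly reduces memory usage, eliminates the need for post-processing (unscrambling), and may achieve faster computation than traditional FFT algorithms such as Cooley-Tukey radix-4.

A Fast Algorithms for the Exact Transforms

This appendix discusses the arithmetic complexity and provides fast algorithms for the exact 3-, 11-, and 31-point DFT. The resulting arithmetic complexity is shown in Table 3.

A.1 3-, 11-, and 31-point DFT

The direct computation of the DFT requires complex multiplications in the order of $O(N^{2})$ [64]. Therefore, disregarding the trivial multipliers and considering a complex input, the direct computation of the 3-, 11-, and 31-point DFT requires 12 real multiplications and 24 real additions; 300 real multiplications and 520 real additions; and 2700 real multiplications and 4560 real additions, respectively.

A.2 Fast Algorithms

A.2.1 3-point DFT

Matrix $F_{3}$ can be factorized as:

\[
F_{3} = A_{1}^{\top} \cdot
\begin{bmatrix}
1 & 1 & \\
1 & -\tfrac12 & \\
 & & -j\tfrac{\sqrt{3}}{2}
\end{bmatrix}
\cdot A_{1}. \tag{34}
\]

A.2.2 11-point DFT

Let $\beta_{x} = \cos\left(\frac{x \cdot \pi}{N}\right)$ and $\delta_{x} = j \cdot \sin\left(\frac{x \cdot \pi}{N}\right)$. Matrix $F_{11}$ can be represented as:

\[
F_{11} = A_{2}^{\top} \cdot
\left[\begin{array}{*{11}{c}}
1 & 1 & 1 & 1 & 1 & 1 & & & & & \\
1 & \beta_{2} & \beta_{4} & \beta_{6} & \beta_{8} & \beta_{10} & & & & & \\
1 & \beta_{4} & \beta_{8} & \beta_{10} & \beta_{6} & \beta_{2} & & & & & \\
1 & \beta_{6} & \beta_{10} & \beta_{4} & \beta_{2} & \beta_{8} & & & & & \\
1 & \beta_{8} & \beta_{6} & \beta_{2} & \beta_{10} & \beta_{4} & & & & & \\
1 & \beta_{10} & \beta_{2} & \beta_{8} & \beta_{4} & \beta_{6} & & & & & \\
 & & & & & & -\delta_{5} & \delta_{4} & -\delta_{3} & \delta_{2} & -\delta_{1} \\
 & & & & & & \delta_{4} & -\delta_{1} & -\delta_{2} & \delta_{5} & -\delta_{3} \\
 & & & & & & -\delta_{3} & -\delta_{2} & \delta_{4} & \delta_{1} & -\delta_{5} \\
 & & & & & & \delta_{2} & \delta_{5} & \delta_{1} & -\delta_{3} & -\delta_{4} \\
 & & & & & & -\delta_{1} & -\delta_{3} & -\delta_{5} & -\delta_{4} & -\delta_{2}
\end{array}\right]
\cdot A_{2}. \tag{35}
\]

A.2.3 31-point DFT

Matrix $F_{31}$ can be expressed as:

\[
F_{31} = A_{3}^{\top} \cdot
\begin{bmatrix}
E_{3} & \\
 & E_{4}
\end{bmatrix}
\cdot A_{3}, \tag{36}
\]

where

\[
E_{3} = \left[\begin{array}{*{16}{c}}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & \beta_{2} & \beta_{4} & \beta_{6} & \beta_{8} & \beta_{10} & \beta_{12} & \beta_{14} & \beta_{16} & \beta_{18} & \beta_{20} & \beta_{22} & \beta_{24} & \beta_{26} & \beta_{28} & \beta_{30} \\
1 & \beta_{4} & \beta_{8} & \beta_{12} & \beta_{16} & \beta_{20} & \beta_{24} & \beta_{28} & \beta_{30} & \beta_{26} & \beta_{22} & \beta_{18} & \beta_{14} & \beta_{10} & \beta_{6} & \beta_{2} \\
1 & \beta_{6} & \beta_{12} & \beta_{18} & \beta_{24} & \beta_{30} & \beta_{26} & \beta_{20} & \beta_{14} & \beta_{8} & \beta_{2} & \beta_{4} & \beta_{10} & \beta_{16} & \beta_{22} & \beta_{28} \\
1 & \beta_{8} & \beta_{16} & \beta_{24} & \beta_{30} & \beta_{22} & \beta_{14} & \beta_{6} & \beta_{2} & \beta_{10} & \beta_{18} & \beta_{26} & \beta_{28} & \beta_{20} & \beta_{12} & \beta_{4} \\
1 & \beta_{10} & \beta_{20} & \beta_{30} & \beta_{22} & \beta_{12} & \beta_{2} & \beta_{8} & \beta_{18} & \beta_{28} & \beta_{24} & \beta_{14} & \beta_{4} & \beta_{6} & \beta_{16} & \beta_{26} \\
1 & \beta_{12} & \beta_{24} & \beta_{26} & \beta_{14} & \beta_{2} & \beta_{10} & \beta_{22} & \beta_{28} & \beta_{16} & \beta_{4} & \beta_{8} & \beta_{20} & \beta_{30} & \beta_{18} & \beta_{6} \\
1 & \beta_{14} & \beta_{28} & \beta_{20} & \beta_{6} & \beta_{8} & \beta_{22} & \beta_{26} & \beta_{12} & \beta_{2} & \beta_{16} & \beta_{30} & \beta_{18} & \beta_{4} & \beta_{10} & \beta_{24} \\
1 & \beta_{16} & \beta_{30} & \beta_{14} & \beta_{2} & \beta_{18} & \beta_{28} & \beta_{12} & \beta_{4} & \beta_{20} & \beta_{26} & \beta_{10} & \beta_{6} & \beta_{22} & \beta_{24} & \beta_{8} \\
1 & \beta_{18} & \beta_{26} & \beta_{8} & \beta_{10} & \beta_{28} & \beta_{16} & \beta_{2} & \beta_{20} & \beta_{24} & \beta_{6} & \beta_{12} & \beta_{30} & \beta_{14} & \beta_{4} & \beta_{22} \\
1 & \beta_{20} & \beta_{22} & \beta_{2} & \beta_{18} & \beta_{24} & \beta_{4} & \beta_{16} & \beta_{26} & \beta_{6} & \beta_{14} & \beta_{28} & \beta_{8} & \beta_{12} & \beta_{30} & \beta_{10} \\
1 & \beta_{22} & \beta_{18} & \beta_{4} & \beta_{26} & \beta_{14} & \beta_{8} & \beta_{30} & \beta_{10} & \beta_{12} & \beta_{28} & \beta_{6} & \beta_{16} & \beta_{24} & \beta_{2} & \beta_{20} \\
1 & \beta_{24} & \beta_{14} & \beta_{10} & \beta_{28} & \beta_{4} & \beta_{20} & \beta_{18} & \beta_{6} & \beta_{30} & \beta_{8} & \beta_{16} & \beta_{22} & \beta_{2} & \beta_{26} & \beta_{12} \\
1 & \beta_{26} & \beta_{10} & \beta_{16} & \beta_{20} & \beta_{6} & \beta_{30} & \beta_{4} & \beta_{22} & \beta_{14} & \beta_{12} & \beta_{24} & \beta_{2} & \beta_{28} & \beta_{8} & \beta_{18} \\
1 & \beta_{28} & \beta_{6} & \beta_{22} & \beta_{12} & \beta_{16} & \beta_{18} & \beta_{10} & \beta_{24} & \beta_{4} & \beta_{30} & \beta_{2} & \beta_{26} & \beta_{8} & \beta_{20} & \beta_{14} \\
1 & \beta_{30} & \beta_{2} & \beta_{28} & \beta_{4} & \beta_{26} & \beta_{6} & \beta_{24} & \beta_{8} & \beta_{22} & \beta_{10} & \beta_{20} & \beta_{12} & \beta_{18} & \beta_{14} & \beta_{16}
\end{array}\right],
\]

\[
E_{4} = \left[\begin{array}{*{15}{c}}
-\delta_{15} & \delta_{14} & -\delta_{13} & \delta_{12} & -\delta_{11} & \delta_{10} & -\delta_{9} & \delta_{8} & -\delta_{7} & \delta_{6} & -\delta_{5} & \delta_{4} & -\delta_{3} & \delta_{2} & -\delta_{1} \\
\delta_{14} & -\delta_{11} & \delta_{8} & -\delta_{5} & \delta_{2} & \delta_{1} & -\delta_{4} & \delta_{7} & -\delta_{10} & \delta_{13} & -\delta_{15} & \delta_{12} & -\delta_{9} & \delta_{6} & -\delta_{3} \\
-\delta_{13} & \delta_{8} & -\delta_{3} & -\delta_{2} & \delta_{7} & -\delta_{12} & \delta_{14} & -\delta_{9} & \delta_{4} & \delta_{1} & -\delta_{6} & \delta_{11} & -\delta_{15} & \delta_{10} & -\delta_{5} \\
\delta_{12} & -\delta_{5} & -\delta_{2} & \delta_{9} & -\delta_{15} & \delta_{8} & -\delta_{1} & -\delta_{6} & \delta_{13} & -\delta_{11} & \delta_{4} & \delta_{3} & -\delta_{10} & \delta_{14} & -\delta_{7} \\
-\delta_{11} & \delta_{2} & \delta_{7} & -\delta_{15} & \delta_{6} & \delta_{3} & -\delta_{12} & \delta_{10} & -\delta_{1} & -\delta_{8} & \delta_{14} & -\delta_{5} & -\delta_{4} & \delta_{13} & -\delta_{9} \\
\delta_{10} & \delta_{1} & -\delta_{12} & \delta_{8} & \delta_{3} & -\delta_{14} & \delta_{6} & \delta_{5} & -\delta_{15} & \delta_{4} & \delta_{7} & -\delta_{13} & \delta_{2} & \delta_{9} & -\delta_{11} \\
-\delta_{9} & -\delta_{4} & \delta_{14} & -\delta_{1} & -\delta_{12} & \delta_{6} & \delta_{7} & -\delta_{11} & -\delta_{2} & \delta_{15} & -\delta_{3} & -\delta_{10} & \delta_{8} & \delta_{5} & -\delta_{13} \\
\delta_{8} & \delta_{7} & -\delta_{9} & -\delta_{6} & \delta_{10} & \delta_{5} & -\delta_{11} & -\delta_{4} & \delta_{12} & \delta_{3} & -\delta_{13} & -\delta_{2} & \delta_{14} & \delta_{1} & -\delta_{15} \\
-\delta_{7} & -\delta_{10} & \delta_{4} & \delta_{13} & -\delta_{1} & -\delta_{15} & -\delta_{2} & \delta_{12} & \delta_{5} & -\delta_{9} & -\delta_{8} & \delta_{6} & \delta_{11} & -\delta_{3} & -\delta_{14} \\
\delta_{6} & \delta_{13} & \delta_{1} & -\delta_{11} & -\delta_{8} & \delta_{4} & \delta_{15} & \delta_{3} & -\delta_{9} & -\delta_{10} & \delta_{2} & \delta_{14} & \delta_{5} & -\delta_{7} & -\delta_{12} \\
-\delta_{5} & -\delta_{15} & -\delta_{6} & \delta_{4} & \delta_{14} & \delta_{7} & -\delta_{3} & -\delta_{13} & -\delta_{8} & \delta_{2} & \delta_{12} & \delta_{9} & -\delta_{1} & -\delta_{11} & -\delta_{10} \\
\delta_{4} & \delta_{12} & \delta_{11} & \delta_{3} & -\delta_{5} & -\delta_{13} & -\delta_{10} & -\delta_{2} & \delta_{6} & \delta_{14} & \delta_{9} & \delta_{1} & -\delta_{7} & -\delta_{15} & -\delta_{8} \\
-\delta_{3} & -\delta_{9} & -\delta_{15} & -\delta_{10} & -\delta_{4} & \delta_{2} & \delta_{8} & \delta_{14} & \delta_{11} & \delta_{5} & -\delta_{1} & -\delta_{7} & -\delta_{13} & -\delta_{12} & -\delta_{6} \\
\delta_{2} & \delta_{6} & \delta_{10} & \delta_{14} & \delta_{13} & \delta_{9} & \delta_{5} & \delta_{1} & -\delta_{3} & -\delta_{7} & -\delta_{11} & -\delta_{15} & -\delta_{12} & -\delta_{8} & -\delta_{4} \\
-\delta_{1} & -\delta_{3} & -\delta_{5} & -\delta_{7} & -\delta_{9} & -\delta_{11} & -\delta_{13} & -\delta_{15} & -\delta_{14} & -\delta_{12} & -\delta_{10} & -\delta_{8} & -\delta_{6} & -\delta_{4} & -\delta_{2}
\end{array}\right].
\]

References

[1] S. A. Alawsh and A. H. Muqaibel, Multi-level prime array for sparse sampling, IET Signal Processing, 12 (2018), p. 688–699.
[2] V. Ariyarathna, D. F. G. Coelho, S. Pulipati, R. J. Cintra, F. M. Bayer, V. S. Dimitrov, and A. Madanayake, Multibeam digital array receiver using a 16-point multiplierless DFT approximation, IEEE Transactions on Antennas and Propagation, 67 (2018), p. 925–933.
[3] A. Avizienis, Signed-digit number representations for fast parallel arithmetic, IRE Transactions on Electronic Computers, EC-10 (1961), p. 389–400.
[4] V. Barichard, X. Gandibleux, and V. T'Kindt, Multiobjective programming and goal programming: theoretical results and practical applications, vol.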
618, Springer Science & Business Media, Heidelberg, BW, 2008.
[5] F. M. Bayer and R. J. Cintra, DCT-like transform for image compression requires 14 additions only, Electronics Letters, 48 (2012), p. 919–921.
[6] R. E. Blahut, Fast algorithms for signal processing, Cambridge University Press, Cambridge, UK, 2010.
[7] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, Low-complexity 8 × 8 transform for image compression, Electronics Letters, 44 (2008), p. 1249–1250.
[8] S. Bouguezel, M. O. Ahmad, and M. N. S. Swamy, A novel transform for image compression, in IEEE International Midwest Symposium on Circuits and Systems, 2010, p. 509–512.
[9] R. N. Bracewell, The Fourier transform and its applications, McGraw-Hill, New York, NY, 2000.
[10] W. L. Briggs and V. E. Henson, The DFT: an owners' manual for the discrete Fourier transform, vol. 45, SIAM, Philadelphia, PA, 1995.
[11] E. O. Brigham, The fast Fourier transform and its applications, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1988.
[12] V. Britanak, P. C. Yip, and K. R. Rao, Discrete cosine and sine transforms: general properties, fast algorithms and integer approximations, Elsevier, Oxford, UK, 2010.
[13] R. Buchert, S. Shahrier, and P. Becker, Optimized discrete Fourier transform method and apparatus using prime factor algorithm, 2004.
[14] C. Burrus and P. Eschenbacher, An in-place, in-order prime factor FFT algorithm, IEEE Transactions on Acoustics, Speech, and Signal Processing, 29 (1981), pp. 806–817.
[15] C. Burrus and T. Parks, DFT/FFT and convolution algorithms. Theory and implementation, 1985.
[16] S.-C. Chan and P. Yiu, An efficient multiplierless approximation of the fast Fourier transform using sum-of-powers-of-two (SOPOT) coefficients, IEEE Signal Processing Letters, 9 (2002), pp. 322–325.
[17] A. X. G. X.
Chelliah, B. S. P. S. Robinson, M. Sellathurai, and L. Gopalakrishnan, A power-efficient variable-length prime factor MDC FFT architecture for high-speed wireless communication applications, AEU-International Journal of Electronics and Communications, 120 (2020), p. 153194.
[18] R. Cintra, A note on the conversion of nonnegative integers to the canonical signed-digit representation, arXiv preprint arXiv:2501.10908, (2025).
[19] R. J. Cintra, An integer approximation method for discrete sinusoidal transforms, Circuits, Systems, and Signal Processing, 30 (2011), p. 1481.
[20] R. J. Cintra, An approximation for the 32-point discrete Fourier transform, arXiv, (2024).
[21] R. J. Cintra and F. M. Bayer, A DCT approximation for image compression, IEEE Signal Processing Letters, 18 (2011), p. 579–582.
[22] R. J. Cintra, F. M. Bayer, and C. Tablada, Low-complexity 8-point DCT approximations based on integer functions, Signal Processing, 99 (2014), p. 201–214.
[23] E. N. Committee et al., Digital radio mondiale (DRM) – system specification, ETSI ES, 201 (2009).
[24] J. Dai and H. Yin, Design and realization of non-radix-2 FFT prime factor processor for 5G broadcasting in release 16, in International Symposium on Computational Intelligence and Design, 2020, p. 406–409.
[25] P. Duhamel and M. Vetterli, Fast Fourier transforms: a tutorial review and a state of the art, Signal Processing, 19 (1990), p. 259–299.
[26] J. W. Eaton, D. Bateman, and S. Hauberg, GNU Octave Manual Version 3, Network Theory Ltd., Oct. 2008.
[27] M. Ehrgott, Multicriteria optimization, vol. 491, Springer Science & Business Media, Heidelberg, BW, 2005.
[28] B. Everitt and A. Skrondal, The Cambridge dictionary of statistics, vol.
106, Cambridge University Press Cambridge, New York, NY, 2002.
[29] B. N. Flury and W. Gautschi, An algorithm for simultaneous orthogonal transformation of several positive definite symmetric matrices to nearly diagonal form, SIAM Journal on Scientific and Statistical Computing, 7 (1986), p. 169–184.
[30] M. Frigo and S. G. Johnson, The fastest Fourier transform in the West, tech. rep., Massachusetts Institute of Technology, 1997.
[31] I. J. Good, The interaction algorithm and practical Fourier analysis, Journal of the Royal Statistical Society: Series B, 20 (1958), p. 361–372.
[32] R. I. Hartley, Subexpression sharing in filters using canonic signed digit multipliers, IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 43 (1996), p. 677–688.
[33] T. I. Haweel, A new square wave transform based on the DCT, Signal Processing, 81 (2001), p. 2309–2319.
[34] M. Heideman, C. Burrus, and H. Johnson, Prime factor FFT algorithms for real-valued series, in ICASSP'84. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 9, IEEE, 1984, pp. 492–495.
[35] M. T. Heideman and C. S. Burrus, Multiplicative complexity, convolution, and the DFT, Springer, New York, NY, 1988.
[36] R. Hewlitt and E. Swartzlantler, Canonical signed digit representation for FIR digital filters, in IEEE Workshop on Signal Processing Systems, 2000, p. 416–426.
[37] N. J. Higham, Computing the polar decomposition – with applications, SIAM Journal on Scientific and Statistical Computing, 7 (1986), p. 1160–1174.
[38] N. J. Higham, Computing real square roots of a real matrix, Linear Algebra and its Applications, 88 (1987), p. 405–430.
[39] F. Hofmann, C. Hansen, and W.
Schafer, Digital radio mondiale (DRM) digital sound broadcasting in the AM bands, IEEE Transactions on Broadcasting, 49 (2003), p. 319–328.
[40] R. A. Horn and C. R. Johnson, Matrix analysis, Cambridge University Press, New York, NY, 2012.
[41] S. G. Johnson and M. Frigo, A modified split-radix FFT with fewer arithmetic operations, IEEE Transactions on Signal Processing, 55 (2006), p. 111–119.
[42] D.-S. Kim, S.-S. Lee, J.-Y. Song, K.-Y. Wang, and D.-J. Chung, Design of a mixed prime factor FFT for portable digital radio mondiale receiver, IEEE Transactions on Consumer Electronics, 54 (2008), p. 1590–1594.
[43] D.-S. Kim, S.-S. Lee, J.-Y. Song, K.-Y. Wang, and D.-J. Chung, Design of a mixed prime factor FFT for portable digital radio mondiale receiver, IEEE Transactions on Consumer Electronics, 54 (2008), p. 1590–1594.
[44] I. Kitamura, S. Kanai, and T. Kishinami, Copyright protection of vector map using digital watermarking method based on discrete Fourier transform, in IEEE International Geoscience and Remote Sensing Symposium, vol. 3, IEEE, 2001, p. 1191–1193.
[45] K. Krishnegowda, R. Kraemer, A. C. Wolf, and E. R. Bammidi, High-speed channel equalization scheme for 100 Gbps system, in IEEE International Conference on Industrial Technology, 2018, p. 1430–1435.
[46] S. Kulasekera, A. Madanayake, D. Suarez, R. J. Cintra, and F. M. Bayer, Multi-beam receiver apertures using multiplierless 8-point approximate DFT, in IEEE Radar Conference, 2015, p. 1244–1249.
[47] S.-C. Lai, W.-H. Juang, Y.-S. Lee, and S.-F. Lei, High-performance RDFT design for applications of digital radio mondiale, in IEEE International Symposium on Circuits and Systems, 2013, p. 2601–2604.
[48] D.-U. Lee, H. Kim, M. Rahimi, D. Estrin, and J. D.
V I L L A S E N O R , Energy-efﬁcient image compre ssion for resource-constrained platforms , IEEE Transactions on Image Processing , 18 (2009), p. 2100–2113 . [49] S. L I N , N. L I U , M . N A Z E M I , H . L I , C . D I N G , Y. W A N G , A N D M . P E D R A M , FFT-base d d eep learning deployment in embedded systems , in Design, A utomation Test in Europe Con ference Exhibition, 2018, p. 1045–10 50. [50] N. L U , N . C H E N G , N. Z H A N G , X . S H E N , A N D J. W . M A R K , Connected vehicles: Solutions and challenges , IEEE Internet of Things J ou rnal, 1 (2014), p. 289–299 . [51] D. - K . L U N A N D W . - C . S I U , An analysis for the realization of an in -place and in -order prime f actor algorith m , IEEE Transactions on Signal Processing, 41 (1993), pp. 2362–2370. [52] A . M A D A N AY A K E , V . A R I YA R A T H N A , S. M A D I S H E T T Y , S. P U L I P A T I , R . J. C I N T R A , D . C O E L H O , R . O L I V I E R A , F . M . B AY E R , L . B E L O S T O T S K I , S . M A N D A L , A N D T . S. R A P PA P O R T , T owards a low-SWaP 1024-beam digital array: A 32-beam subsystem at 5.8 GHz , IEEE Transactions on An te nnas and Propagati on , 68 (2019), p. 900–912. [53] A . M A D A N AYA K E , R . J. C I N T R A , N. A K R A M , V . A R I YA R A T H N A , S . M A N D A L , V. A . C O U T I N H O , F . M . B AY E R , D . C O E L H O , A N D T . S . R A P PA P O R T , F ast radix-32 approximate DFTs for 1024-beam digit al RF beamforming , IEEE Access, 8 (2020), p. 96613–96627. [54] K . M A H A R A T N A , E . G R A S S , A N D U. J A G D H O L D , A 64-poin t F ourier transfor m chip for high-speed wireless LAN application using OFDM , IE EE Journal of Solid-State Circuits, 39 (2004), p. 484–493. [55] H . M A LVA R , A . H A L L A P U R O , M . K A R C Z E W I C Z , A N D L . K E R O F S K Y , Low-complexity trans form and quanti zation with 16-bit arith- metic for H. 26L , in IE EE International Co n ference on Image Processing, vol. 
2, 2002, p. II–II. [56] M A T L A B , version 8.1. (R2013a) , The MathW orks Inc., Natick, Massachusetts, 2013. [57] P . A . M I L D E R , F . F R A N C H E T T I , J. C . H O E , A N D M . P Ü S C H E L , Hardwa re implementation of the discrete F ou rier transform with non-power -o f-two problem size , in IEEE International Conference o n A cous tics , Speech and Signal Processing, 2010, p. 1546–154 9. [58] S. H . M I R F A R S H B A FA N , S . T A N E R , A N D C . S T U D E R , SMUL-FFT : a streaming multi plierless fast F ourier transform , IEEE Transac- tions o n C ircuits and Systems II: E xpress Briefs, 68 (2021), pp. 1715–1719. [59] S. K . M I T R A , Digital signal proces sin g: a computer -based approac h , McGra w-Hill Higher Education, 2001. [60] G. S . M O G H A D A M A N D A . A . B . S H I R A Z I , DOA estimation with co-prime arrays based on multiplicative bea mforming , in Interna- tional Symposium on Telecommunica tions, 2018, p. 501–506. [61] D. G . M Y E R S , Di gital signal proces sin g: efﬁcient convolution and F ourier transfor m t echniques , Prentice-Hall, Inc., New Y ork, NY , 1990. [62] H . J. N U S S B A U M E R , The f ast Fourier transform , in F ast Fouri er Transf o rm and C onvolution A lgorithms , Spri n ger , Heidelberg , BW , 1981, p. 80–111. 23 [63] K . B . O L D H A M , J. M Y L A N D , A N D J. S PA N I E R , An atlas of functi ons: wit h equator , t h e atlas f unction calculator , Springer Scienc e & Business Media, New Y ork, NY , 2010. [64] A . V . O P P E N H E I M , Di screte - time signal processing , Prentice-Hall, Upper Saddle River , NJ , 1999. [65] L . P O R T E L L A , D . F . C O E L H O , F . M . B AY E R , A . M A D A N AYA K E , A N D R . J. C I N T R A , Radix- N algorithm for computi ng N 2 n -point DFT approximations , IEE E Signal Processing Letters, 29 (2022), p. 1838–184 2. [66] U . S . P O T L U R I , A . M A D A N AYA K E , R . J. C I N T R A , F . M . B AY E R , A N D N . 
R A J A PA K S H A , Multipli er -free DCT approximations for RF multi -beam digital aperture-arra y spac e imaging and directional sensing , Measurement Science and Technology , 23 (2012), p. 114003. [67] S. Q A D E E R , M . Z . A . K H A N , S . A . S A T T A R , A N D A H M E D , A radix-2 DIT FFT with reduced arithmetic complexity , in IEE E Interna- tional C onference on Advances in Computing, Commun ications and Inf ormatics , 201 4, p. 1892–1896 . [68] S. Q I N , Y. D. Z H A N G , A N D M . G . A M I N , Generalized coprime array conﬁgurations for di rection-of-arrival estimation , IEEE Trans- actions on S ignal Processing, 63 (2015), p. 1377–1390. [69] K . R . R A O A N D P . C. Y I P , The transform and data compress i on handbook , CRC press, Boca Raton, FL, 2018. [70] G. A . S E B E R , A matrix handbook for statisticians , John Wiley & Son s, Hoboken , NJ , 2008. [71] J. S H I , G. H U , X . Z H A N G , F . S U N , W . Z H E N G , A N D Y . X I A O , Generalized co-prime MIMO rada r for DOA estimation wi th enhanced degrees of f reedom , IEE E Sensors Journa l, 18 (2017), p. 1203–1212. [72] S. S O N G , K . S O N G , T . X U , W . Z H O U , H . L I , A N D W . L I U , T h e electronics design of real-time feedback control system in KTX , IEEE Transactions on Nuclear Science, 68 (2021), p. 2066–2073. [73] J. A . S T A N K O V I C , Researc h directions for the Internet of Thin gs , IEEE Internet of Things Journal, 1 (2014), p. 3–9. [74] J. Y. S T E I N , Digi tal signal process i ng: a computer science perspe cti ve , J ohn Wiley & Sons, Inc., New Y ork, NY , 2000. [75] D. M . S U Á R E Z V I L L A G R Á N , Aproximações para a transformada di screta de F ou rier e aplicações em d eteção e estimação , Master’ s thesis, Universidade F ederal de Pernamb u o, Recife, Brazil, 2015. [76] D. M . S U Á R E Z V I L L A G R Á N , R . J. C I N T R A , F . M . B AY E R , A . S E N G U P T A , S . K U L A S E K E R A , A N D A . 
M A D A N AYA K E , Multi- beam RF aperture usin g mul t iplierless FFT approximation , Electronics Letters , 50 (2014), p. 1788–1790 . [77] C. T A B L A D A , F . M . B AY E R , A N D R . J. C I N T R A , A class of DCT app roxi mations based on the Feig–Winograd algorith m , Signal Processing , 113 (2015), p. 38–51. [78] L . H . T H O M A S , Using a computer t o solve problems in physics , Applicati on s of Digital Computers, (1963), p. 44–45. [79] P . P . V A I D YA N A T H A N A N D P . P A L , Sparse sensing with co-prime sampler s and arr ays , IE E E Transactions on Signal Proc e s sing, 59 (2010), p. 573–586. [80] M . D . V A N D E B U R G WA L , P . T . W O L K O T T E , A N D G. J. S M I T , Non- power -of-two FFTs: Exploring th e ﬂexibilit y of the montium TP , International J ournal of Reconﬁgurable C omputing , 2009 (2009). [81] W . W A N G , S . R E N , A N D Z . C H E N , Uni ﬁed coprime array with mul t i-period subarra y s for direction-of-arrival estimation , Digital Signal Processing, 74 (2018), p. 30–42. [82] D. S . W A T K I N S , Fund amentals of matrix computations , vol. 64, John Wiley & Sons, New Y ork, NY , 2004. [83] K . - J. Y A N G , S. - H . T S A I , A N D G. C . H . C H U A N G , MDC FFT/IFF T processor with variable length for MIMO-OFDM systems , IEEE Transactions on V ery Large Scale Integration (VLSI) Systems, 21 ( 2013), pp. 720–731. [84] Z . - X . Y A N G , Y. - P . H U , C . - Y . P A N , A N D L . Y A N G , Design of a 3780-point IFFT processor for TDS-OFDM , IEEE Transactions on Broadcasting , 48 ( 2002), p. 57–61. [85] Z . Z H A N G , X . W A N G , K . L O N G , A . V . V A S I L A K O S , A N D L . H A N Z O , Large-scale MIMO-based wireless back h aul i n 5G networks , IEEE Wireless Communications, 22 (2015), p. 58–66. 24
