Estimates of the Reconstruction Error in Partially Redressed Warped Frames Expansions

Pr oceedings of the 19 th International Confer ence on Digital Audio Ef fects (DAFx-16), Brno, Czec h Republic, September 5–9, 2016 ESTIMA TES OF THE RECONSTR UCTION ERR OR IN P AR TIALL Y REDRESSED W ARPED FRAMES EXP ANSIONS Thomas Mejstrik ∗ Department of Mathematics Univ ersity of V ienna V ienna, Austria tommsch@gmx.at Gianpaolo Evangelista Institute for Composition and Electroacoustics, MD W , Univ ersity of Music and Performing Arts V ienna, Austria evangelista@mdw.ac.at ABSTRA CT In recent work, redressed w arped frames hav e been introduced for the analysis and synthesis of audio signals with non-uniform fre- quency and time resolutions. In these frames, the allocation of frequency bands or time intervals of the elements of the represen- tation can be uniquely described by means of a warping map. In- verse warping applied after time-frequenc y sampling pro vides the key to reduce or eliminate dispersion of the w arped frame elements in the conjugate v ariable, making it possible, e.g., to construct fre- quency warped frames with synchronous time alignment through frequency . The redressing procedure is howe ver e xact only when the analysis and synthesis windows hav e compact support in the domain where warping is applied. This implies that frequency warped frames cannot have compact support in the time domain. This property is undesirable when online computation is required. Approximations in which the time support is ﬁnite are howe ver possible, which lead to small reconstruction errors. In this pa- per we study the approximation error for compactly supported fre- quency warped analysis-synthesis elements, providing a few ex- amples and case studies. 1. INTR ODUCTION The a vailability of conﬁgurable time-frequenc y schemes is an as- set for sound analysis and synthesis, where the elements of the representation can efﬁciently capture features of the signal. These include features of interpretation: e.g. glissandi and vibrati; human perception: e.g. non-uniform frequency sensitivity of the cochlea; music theory: e.g. scales (pentatonic, 12-tone, Indian scales with unequally spaced tones, etc.); physical ef fects: e.g. non-harmonic ov ertones in the lo w register of the piano or of percussion instru- ments and so forth. When used in the conte xt of Music Informa- tion Retriev al, the adaptation of the representation to music scales is bound to improve. e.g. for instrument-, note-, chord- and ov er- tone detection and recognition. T raditionally , two extreme cases have been considered: Ga- bor expansions featuring uniform time and frequency resolutions and orthogonal wavelet expansions and frames featuring octave band allocation and constant uncertainty of the representation el- ements. In previous work [1, 2, 3, 4, 5, 6, 7], generalised Gabor frames ha ve been constructed which allow for non-uniform time- frequency schemes with perfect reconstruction. In [3] the alloca- tion of generalised Gabor atoms is speciﬁed according to a fre- quency or time warping map. In [8] the STFT redressing method ∗ Part of the research contained in this work was performed by this au- thor while he was a Master student at MDW , Wien, under the guidance of Prof. Gianpaolo Evangelista is introduced, which, with the use of additional warping in time- frequency , shows under which conditions one can have generalised Gabor frames. These conditions stem from the interaction of sam- pling in time-frequency and frequenc y or time warping operators, which allow to incorporate the results in [3] in a more general context. It is shown that arbitrary allocation of the atoms is ex- actly possible in the so called painless case , i.e. in the case of ﬁnite time support of the windows for arbitrary time interval al- location and of ﬁnite frequency support of the windows for arbi- trary frequency band allocation. Non-uniform frequency analysis by means of warping was introduced in [7]. Non-uniform Gabor frames with constant-Q were previously introduced in [1] , based on the theory developed in [2], where an ad hoc procedure was employed for their construction. In [3] we provided an alterna- tiv e general method for their construction, using warping. In [8] the redressing method was introduced and applied to the general construction of non-stationary Gabor frames. In [9] the general theory was revisited with the real-time computational aspects in mind. There, the ﬁrst approximate schemes were introduced with- out extensi ve testing. Since online computation of the generalised Gabor analysis- synthesis is only possible with ﬁnite duration windows, the arbi- trary frequenc y band allocation does not lead to an exact solution in applications that require real-time, while the arbitrary time inter- val allocation presents little or no problem. In [9] approximations leading to nearly exact representations were introduced. In this paper we expand on the results in [9] and provide a study of the approximation error on a wide class of signals when ﬁnite duration windows are required in arbitrary frequency band allocation. The paper is or ganised as follo ws. In Section 2 we revie w the concept of applying time and frequency warping to time-frequency representations, together with the redressing method, which in- volv es a further warping operation in the time-frequency domain to reduce or eliminate dispersion. In Section 3 we introduce approx- imations suitable for the online computation of redressed frame expansions. In Section 4 the results of numerical experiments are shown, which provide estimations of the approximation error . In Section 5 we draw our conclusions. 2. REDRESSED W ARPED GABOR FRAMES In this section we revie w the concepts leading to the deﬁnition of redressing in the context of frequency warped time-frequency representations. First we re vie w the basic notions of STFT (Short- T ime Fourier T ransform) and Gabor frames. Then we mov e on to the deﬁnition of warped frames and then to the redressing proce- dure. D AFX-1 Pr oceedings of the 19 th International Confer ence on Digital Audio Effects (DAFx-16), Brno, Czech Republic, September 5–9, 2016 2.1. STFT and Gabor Frames Gabor expansions can be considered as a form of sampling and ex- act reconstruction of the STFT . As is well-kno wn, gi ven a windo w h and deﬁning the time-shift operator T τ s ( t ) = s ( t − τ ) and the modulation operator M ν s ( t ) = e j 2 πν t s ( t ) , the STFT is obtained by applying the operator S to the signal s : [ S s ] ( τ , ν ) = h s, h τ ,ν i = h s, T τ M ν h i = Z + ∞ −∞ s ( t ) h 0 , 0 ( t − τ ) e − j 2 πν ( t − τ ) dt, (1) where the ov erbar denotes complex conjugation. A Gabor system is generated by the kernel of S by sampling the time-frequency plane ( τ , ν ) : G ( h, a, b ) = { T na M qb h : n, q ∈ Z } (2) where a, b > 0 are sampling parameters. The scalar products of the signal with the members of the Ga- bor system h s, T na M qb h i , n, q ∈ Z (3) provide e valuations of the STFT (1) of a signal s with window h at the time-frequency grid of points ( na, q b ) , with n, q ∈ Z . The question whether the signal s can be reconstructed from these ev aluations can be addressed by introducing the concept of frame. A sequence of functions { ψ l } l ∈ I in the Hilbert space H is called a frame if there exist both positi ve constant lo wer and upper bounds A and B , respectiv ely , such that A k s k 2 ≤ X l ∈ I |h s, ψ l i| 2 ≤ B k s k 2 ∀ s ∈ H , (4) where k s k 2 = h s, s i is the norm square or total energy of the signal. Frames generate signal expansions, i.e., the signal can be perfectly reconstructed from its projections ov er the frame. A Gabor system that is a frame is called a Gabor frame . In this case, the signal can be reconstructed from the corresponding samples of the STFT (3). While not unique, reconstruction can be achiev ed with the help of a dual frame, which in turn is a Gabor frame generated by a dual window ˜ h . Perfect reconstruction es- sentially depends on the choice of the window and the sampling grid. One can sho w that there exist no Gabor frames when ab > 1 . See [10] for more informations about Gabor frames. 2.2. W arped STFT and Gabor Frames The warped STFT can be obtained by warping the signal prior to applying the STFT operator . In this paper we focus on pure frequency warping. A frequency warping operator W ˜ θ is completely characterised by a function composition operator W θ , such that W θ x = x ◦ θ , in the frequency domain: W ˜ θ = F − 1 W θ F , (5) where F is the Fourier transform operator . The function θ is the frequency warping map, which transforms the Fourier transform ˆ s = F s of a signal s into the Fourier transform ˆ s f w = F s f w of another signal s f w . W e afﬁx the ˜ symbol over the map θ as a reminder that the map operates in the frequency domain. If the warping map is one-to-one and almost e verywhere dif- ferentiable then a unitary form U ˜ θ of the warping operator can be deﬁned by the following frequenc y domain action ˆ s f w ( ν ) = h d U ˜ θ s i ( ν ) = q   dθ dν   ˆ s ( θ ( ν )) , (6) where ν denotes frequency . W e assume henceforth that all warping maps are almost ev erywhere increasing so that the magnitude sign can be dropped from the deriv ati ve under the square root. Giv en a frequency w arping operator W ˜ θ , the warped STFT is deﬁned through the operator S ˜ θ as follows [ S ˜ θ s ] ( τ , ν ) = [ S W ˜ θ s ] ( τ , ν ) = h W ˜ θ s, h τ ,ν i = D s, W † ˜ θ h τ ,ν E , (7) which is indeed a w arped v ersion of (1), where W † ˜ θ is the adjoint of the warping operator . If the warping operator is unitary then we have W † ˜ θ = W − 1 ˜ θ = W ˜ θ − 1 . In that case, warping the signal prior to STFT is perfectly equiv alent to perform STFT analysis with inv ersely frequency warped windows. The warped STFT is unitarily equiv alent to the STFT so that a number of properties concerning conditioning and reconstruction hold [11]. The Fourier transforms of the frequenc y w arped STFT analy- sis elements are ˆ ˜ h τ ,ν ( f ) = h \ W ˜ θ − 1 h τ ,ν i ( f ) = q dθ − 1 d f ˆ h ( θ − 1 ( f ) − ν ) e − j 2 πθ − 1 ( f ) τ , (8) i.e., the warped STFT analysis elements are obtained from fre- quency warped modulated windows centred at frequencies f = θ ( ν ) . The windo ws are time-shifted with dispersive delay , where the group delay is τ dθ − 1 d f . Frequency warping generally disrupts the time organisation of signals due to the fact that the time-shift operator T τ does not commute with the frequency warping operator [9]. From (4) it is easy to see that any unitary operation, in partic- ular unitary warping, on a frame results in a new frame with the same frame bounds A and B [11]. Since the atoms are not gen- erated by shifting and modulating a single window function, the resulting frames are not necessarily of the Gabor type. Howe ver , warping prior to con ventional Gabor analysis and unwarping after Gabor synthesis always leads to perfect reconstruction. Starting from a Gabor frame (analysis) { ϕ n,q } q,n ∈ Z and dual frame (synthesis) { γ n,q } n,q ∈ Z : ϕ n,q = T na M qb h γ n,q = T na M qb g , (9) where h and g are dual windo ws, warped frames can be generated, following (8), by unw arping the analysis and synthesis frames. In the case of non-unitary warping, a frequency domain scaling op- eration is necessary in order to reconstruct the original signal. For the case of unitary warping we simply ha ve: ˜ ϕ n,q = U † ˜ θ ϕ n,q = U ˜ θ − 1 T na M qb h ˜ γ n,q = U † ˜ θ γ n,q = U ˜ θ − 1 T na M qb g , (10) where { ˜ ϕ n,q } q,n ∈ Z is the frequency warped analysis frame and { ˜ γ n,q } n,q ∈ Z is the dual warped frame for the synthesis. With these D AFX-2 Pr oceedings of the 19 th International Confer ence on Digital Audio Effects (DAFx-16), Brno, Czech Republic, September 5–9, 2016 deﬁnitions, one obtains the signal expansion s = X n,q ∈ Z h s, ˜ ϕ n,q i ˜ γ n,q . (11) W arped Gabor frames suffer from the same problem as the warped STFT . Indeed the F ourier transforms of the warped Gabor frame elements bear frequency dispersiv e delays so that disper- siv e time samples are produced by the direct application of the frequency warped frame analysis. 2.3. Redressing Methods As shown in [8, 9], the dispersive delays intrinsic to the warped STFT can be redressed, i.e. made into constant delays in each anal- ysis band if frequenc y unw arping is performed in the transformed time domain, i.e. with respect to time shift. In other w ords, instead of (7) we consider the similarity transformation W † ˜ θ S W ˜ θ on the STFT operator , which is time-shift cov ariant. In fact, one has: h \ W ˜ θ − 1 S W ˜ θ s i ( f , ν ) = ˆ h 0 , 0 ( θ − 1 ( f ) − ν ) ˆ s ( f ) , (12) which is in the form of a time-inv ariant ﬁltering operation, cor- responding to con volution in time domain. The ﬁlters are fre- quency warped versions of the modulated windows in the tradi- tional STFT . The Fourier transform of the redressed analysis ele- ments are ˆ ˜ ˜ h τ ,ν ( f ) = h \ T τ W ˜ θ h 0 ,ν i ( f ) = ˆ h 0 , 0 ( θ − 1 ( f ) − ν ) e − j 2 πf τ , (13) which sho ws that the dispersi ve delays in the analysis elements (8) are brought back to non-dispersiv e delays. In redressing warped Gabor frames one faces a further difﬁ- culty due to time-frequency sampling. In this case, inv erse fre- quency warping can only be applied to sequences (with the respect to the time shift inde x) and may not perfectly reverse the dispersi ve effect of the original map on delays. Unitary frequency warping in discrete time can be realised with the help of an orthonormal basis of ` 2 ( Z ) constructed from an almost e verywhere differentiable warping map ϑ that is one-to-one and onto [ − 1 2 , + 1 2 [ , as follows: µ m ( n ) = Z + 1 2 − 1 2 q dϑ dν e j 2 π ( nϑ ( ν ) − mν ) dν, (14) where n, m ∈ Z (see [12, 13, 14, 15, 16]). The map ϑ can be extended over the entire real axis as congruent modulo 1 to a 1 - periodic function. Giv en any sequence { x ( n ) } in ` 2 ( Z ) , the action of the discrete- time unitary warping operator D ˜ ϑ is deﬁned as follows: ˜ x ( m ) = [ D ˜ ϑ x ] ( m ) = h x, µ m i ` 2 ( Z ) . (15) In fact, the sequence { ˜ x ( m ) } in ` 2 ( Z ) satisﬁes ˆ ˜ x ( ν ) = q dϑ dν ˆ x ( ϑ ( ν )) , (16) where the ˆ symbol, when applied to sequences, denotes discrete- time Fourier transform. The sequences η m ( n ) deﬁne the nucleus of the in verse unitary frequency warping ` 2 ( Z ) operator D ˜ ϑ − 1 = D † ˜ ϑ . where η m ( n ) = µ n ( m ) . In order to limit or eliminate time dispersion in the frequenc y warped Gabor expansion, the discrete-time frequency warping op- erator D ˜ ϑ − 1 is applied to the time sequence of expansion coefﬁ- cients o ver the warped Gabor frames. Since the operator is applied only on the time index, for generality , one can include dependency of the map and of the sequences η n on the frequency inde x q . The process can be equiv alently described by deﬁning the redressed frequency warped Gabor analysis and synthesis frames as follo ws: ˜ ˜ ϕ n,q = D ˜ ϑ − 1 q ˜ ϕ • ,q = X m η n,q ( m ) ˜ ϕ m,q ˜ ˜ γ n,q = D ˜ ϑ − 1 q ˜ γ • ,q = X m η n,q ( m ) ˜ γ m,q , (17) obtaining: s = X n,q ∈ ( Z )  s, ˜ ˜ ϕ n,q  ˜ ˜ γ n,q . (18) One can sho w [9] that the Fourier transforms of the redressed frame are ˆ ˜ ˜ ϕ n,q ( f ) = A ( f ) ˆ h ( θ − 1 ( f ) − q b ) e − j 2 πnϑ q ( aθ − 1 ( f )) , (19) where A ( f ) = q dθ − 1 d f q dϑ q dν     ν = aθ − 1 ( f ) . (20) Hence, dispersion is completely eliminated if ϑ q ( aθ − 1 ( f )) = d q f (21) for an y f ∈ R , where d q are positi ve constants controlling the time scale in each frequenc y band. In this case, the Fourier transforms of the redressed frame elements become: ˆ ˜ ˜ ϕ n,q ( f ) = q d q a ˆ h ( θ − 1 ( f ) − q b ) e − j 2 πnd q f . (22) When all d q are identical all the time samples are aligned to a uniform time scale throughout frequencies. If the d q are distinct, time realignment when displaying the non-uniform spectrogram is a simple matter of different time base or time scale for each frequency band. Unfortunately , due to the discrete nature of the redressing warp- ing operation, each map ϑ q is constrained to be congruent modulo 1 to a 1 -periodic function, while the global warping map θ can be arbitrarily selected. Moreov er , the functions ϑ q must be one- to-one in each unit interval, therefore they can have at most an increment of 1 there. In Fig. 1 we illustrate the phase linearisation problem. There, the black curve is the amplitude scaled warping map d q θ ( ν ) and the grey curve represents the map ϑ q ( aν ) , which is 1 /a -periodic. Both maps are plotted in the abscissa ν = θ − 1 ( f ) . By amplitude scaling the warping map θ one can allo w the values of the map to lie in the range of the discrete-time warping map ϑ q . The ampli- tude scaling factors happen to be the ne w time sampling interv als d q of the redressed warped Gabor expansion. In the “painless” case, which was hand picked in [3], the win- dow h is chosen to have compact support in the frequency domain. Through equation (19) and condition (21), the redressing method shows that, giv en any continuous and almost anywhere differen- tiable and increasing warping map, only in the painless case one can e xactly eliminate the dispersiv e delays with the help of (17). In fact, in this case linearisation of the phase is only required within a D AFX-3 Pr oceedings of the 19 th International Confer ence on Digital Audio Effects (DAFx-16), Brno, Czech Republic, September 5–9, 2016 J ( a n ) q d q ( n ) q q n = q ( f ) -1 d q ( n ) 1 1 a n - n + q d q ( n ) + - Figure 1: Locally eliminating dispersion by means of discrete-time frequency warping. Black line: curve derived from the original map θ by amplitude scaling. Gray line: discrete-time frequency warping characteristics for local phase linearisation. ﬁnite frequenc y range gi ven by the frequenc y support of the frame elements [8, 9], which is compatible with the periodicity constraint of the redressing maps ϑ q . In the general case, a perfect time realignment of the compo- nents is not guaranteed. Notice, ho wev er , that, by construction, the redressed warped Gabor systems are guaranteed to be frames for any choice of the maps ϑ q satisfying the stated periodicity con- ditions, e ven when the phase is not completely linearised. Lo- cally , within a certain band of the warped modulated windows it is possible to linearise the phase of the complex exponentials in (19). In the sequel we will refer to this band as the essential band- width since, hopefully , the magnitude of the Fourier transform of the window is ne gligible outside it, at least as a design goal. Unfortunately , in general both the painless case and the par- tially redressed cases lead to inﬁnite duration windows, which are undesirable when online computation is required. In the next we are going to propose some approximations and study the recon- struction error also through numerical experiments. 3. REAL-TIME COMPUT A TION OF THE W ARPED GABOR EXP ANSION For real-time computation one needs to make assumptions on the signal, as well on the window functions h in order to keep the computational load as low as possible. By requiring the window h to be real-valued and θ to be odd, we obtain ˆ ˜ ˜ ϕ n, − q ( − f ) = ˆ ˜ ˜ ϕ n,q ( f ) . If, additionally the input sig- nal is real-valued, then the coef ﬁcients c n,q = h s, ˜ ˜ ϕ n,q i fulﬁl c n, − q = c n,q and thus we only need to compute about half the coefﬁcients and frame elements. By enforcing shift-inv ariance of the w arped frame elements (i.e. ˜ ˜ ϕ n,q ( t ) = ˜ ˜ ϕ 0 ,q ( t − na ) ), which we must do since otherwise the pre-computation of the frame ele- ments and hence the real-time computation of the expansion would be impossible, it is sufﬁcient to only store ˜ ˜ ϕ 0 ,q for non-negativ e values of q. It is advisable that the warping map is selected as a continuously differentiable function, since the error in the resyn- thesised signal in the frequency domain at the points of discon- tinuity of θ 0 is very high. A strictly positiv e deriv ati ve helps the atoms not get too stretched in the time domain, which is undesir - able in real time applications. T o av oid aliasing we further require that θ ( q b ) = SR / 2 and θ 00 ( q b ) = 0 for some q ∈ N , where SR denotes the sampling rate. This ensures that the high frequencies are smoothly mapped back to the negativ e frequencies and avoids extra aliasing introduced by the approximation. By requiring the Fourier transform of the windo w h to have an analytic e xpression, we can av oid numerical errors in the computation of the warped windows. W e further only worked with windows which are dual to itself, i.e. ϕ n,q = γ n,q which is actually not a necessary restric- tion. 3.1. Implementation Details W e implemented the approximate warped redressed Gabor analy- sis and synthesis as C externals interfaced to Pure-Data (32 bit and 64 bit). It runs both under W indows and Linux and can use multi- ple cores. The warped windo ws are computed using equation (22) and applying an IDFT of that data. The inner products c n,q are directly computed with a loop. Also for the synthesizing part, the sum P n,q c n,q ˜ ˜ ϕ n,q is directly computed. It turned out that this is sufﬁciently fast for real-time computation and whence we made no use of fast con volution algorithm. Since the data rate of the anal- ysis part is not uniform in time, PD’ s signal connections are not suitable to transfer the coefﬁcients. Therefore we use the signal connections to transfer pointers rather than signals. 3.2. Interface Details For simplicity we require that the essential bandwidth B is a mul- tiple K ∈ N of the frequency shift parameter b , i.e. B = K b . Then, in order to obtain a frame the follo wing conditions must be fulﬁlled as can be seen easily in Figure 1 [9, eq. (34) to (36)]: abK ≤ 1 d q B q ≤ 1 (23) where B q is the ess. bandwidth of the warped modulated windo w . B q = θ  q b + B 2  − θ  q b − B 2  (24) In the case of an exponential increasing warping map, it makes sense to set the frequency shifting parameter in a way that adjacent notes f all aw ay from the windo ws main lobe in the frequency do- main (If there is such a main lobe, as in the case of the raised cosine window). This on the other hand determines the windo w length T h , a minimal value for R and the time shift parameter a = T h R (25) with R ≥ K due to equation (23). W e set R = K for simplicity . Equations (23) are fulﬁlled by setting b = 1 aRC b d q ' 1 B q C d (26) where C b , C d can be chosen inside the PD-patch. C b together with R controls the oversampling, where C d controls the bandwidth in which the phase is linearised and also inﬂuences the o versampling. The d q are different for each band which results in a non-uniform D AFX-4 Pr oceedings of the 19 th International Confer ence on Digital Audio Effects (DAFx-16), Brno, Czech Republic, September 5–9, 2016 Figure 2: This is a sample patch for the use of our Pure-Data Im- plementation. wabor˜ and in vwabor˜ are for analysing and synthe- sising. They are connected with two signal-paths. invwabor˜ holds at the right outlet the delay in samples due to the analysis-synthesis procedure. welay˜ is a simple delay line. If perfect reconstruction is achie ved, the output at dac˜ is zero. With the toggles the e xter- nals can be switched on and off. The bangs are for outputting the parameters used. data rate for each band. W e remark that the numbers d q hav e to be chosen, such that d q / SR ∈ N , where SR is the sampling rate. The number of bands q sup we need for a giv en SR is q sup = θ − 1 ( S R/ 2) b . (27) The assumptions on the warping map ensure that this is a natural number . Due to the warping, the supports of the windo ws are in general unbounded. Thus we compute the windows with a zero padded array of length of T c ( c for compute), which is deﬁned as T c = T h θ 0 inf C T c (28) where C T c ≥ 1 can be chosen inside the PD-patch and θ 0 inf = inf f ∈ R θ 0 ( f ) . Due to the same reason it is indispensable to cut the windo ws after warping. T o deﬁne a sensible atom length T q after warping, we set the atoms to zero after the point from which they were smaller than   ˜ ˜ ϕ q ( t )   < 1 /C cut max t   ˜ ˜ ϕ q ( t )   , with C cut a constant which can be chosen inside the PD-patch. Afterwards the windows are truncated accordingly and with respect to the pa- rameter T max , which deﬁnes the maximal desired window length. It is important that the windo ws are cut after aligning them to the time origin. The second approach suggested in [9] to obtain win- dows with ﬁnite length by computing only a linearised version of the warped windo ws, leads to very bad reconstruction. 3.3. Computational Costs 3.3.1. Analysis A rough estimation yields the follo wing. Let T q denote the length of the q th -window . It will be clear from the context if T q denotes seconds or samples. T o compute the inner product of that windo w with the signal one needs 2 T q many real multiplications and sum- mations. This has to be done e very d q samples. Hence, per sample we ha ve 4 T q /d q many ﬂoating point operations (ﬂops) in a verage. If the essential bandwidth is not too lar ge, then T q is proportional to T 0 /θ 0 ( q b ) and d q is proportional to 1 / ( K bθ 0 ( q b )) since, lin- earising θ around qb yields θ ( q b + K b 2 ) ' θ ( q b ) + θ 0 ( q b ) K b 2 and Abbrev . Explanation SR Sampling Rate B q Essential bandwidth of the q th window T c , C T c Length of the window used in pre-computation T h Length of the original window T q Length of the q th window T max Maximal window length used for real time computation C cut Parameter controlling the truncation of the win- dows in the time domain after precomputing b , C b Frequency shift parameter d , C d T ime shift parameter of the warped windows q sup Number of bands Figure 3: Summary of the variables used in the PD implementa- tion. The parameters C X control their variable X . hence B q = θ  q b + K b 2  − θ  q b − K b 2  ' θ 0 ( q b ) K b ⇒ d q = 1 B q ' 1 θ 0 ( q b ) K b (29) Summing up o ver all windows we get the average number of operations per sample N avg : N avg = X q 4 T q d q ∼ X q 4 T 0 θ 0 ( q b ) K bθ 0 ( q b ) 1 ' 4 K θ − 1 ( SR / 2) T 0 (30) Where q sup , deﬁned abov e, denotes the number of bands and de- pends on θ − 1 . That means the complexity of the analysis part is proportional to T 0 , K and b and q sup ∼ θ − 1 ( S R ) . If there is a lo wer bound for the d q ’ s, one can choose identical numbers for all bands. Howe ver , this increases the computational load tremendously , making real time computation impossible. The above is an estimation of the a verage computational cost. In the w orst case all inner products of the windo ws with the signal hav e to be computed starting at one frame. The number of opera- tions for that frame is X q 4 T q ' 4 X q T 0 θ 0 ( q b ) (31) Howe ver , if one processes the audio stream block wise, the worst case cannot arise for all samples in the block at once, be- cause two worst case scenarios have a distance of at least M = max q d q (actually M = lcm ( { d q } ) ), the next M samples have for sure lower computational cost. Furthermore, the next m = min q d q samples have zero computational cost. Therefore the av- erage costs are a suitable measure for the analysis process. 3.3.2. Synthesis P art The complexity of the synthesis part is the same as that of the analysis part. Furthermore, in the synthesis part the worst case scenario can be av oided, because only the parts of the next frame buf fer have to be computed in real-time, the rest can be computed later . Nev ertheless, our given implementation is not optimised in this direction. D AFX-5 Pr oceedings of the 19 th International Confer ence on Digital Audio Effects (DAFx-16), Brno, Czech Republic, September 5–9, 2016 3.3.3. Memory Costs Our algorithm for precomputing the windows needs 2 q T 0 cells. W ith slight modiﬁcations it only needs T 0 + P q T q . The small- est possible number of cells being needed is P q T q . The anal- ysis and synthesis algorithm both need at least a buf fer of size audiobuf fer + T max , where T max denotes the window length of the longest windows used in the analysis and synthesis, and au- diobuf fer the b uffer length in which the audio is processed. The frame elements for our tests below needed between 50 and 400 MB. 4. COMPUT A TIONAL ERR OR The measured er r -numbers are the difference between the aver - aged RMS-amplitude in dB of the input signal and the analysed- synthesised output signal (i.e. negati ve signal to noise ratio). For comparison: 16 bit quantization (which is CD-quality) has err ' − 96 dB, 8 bit quantization has err ' − 50 dB. The tests were con- ducted with the PD-patch av ailable at tommsch.com/science.php in real-time over a time of about 20 s with double precision ﬂoat- ing point numbers and a sample rate of 44 . 1 kHz. W e used the following stationary and non-stationary test signals • white: White noise • sine X: A pure sine tone with X Hz • const: A constant signal • clicks: Clicks with a spacing of 1 s • beet: Beethov en - Piano Sonata op 31.2, length 25 s. • speech: A man counting from one to twenty , length 20 s. • ﬁr e: A ﬁrework, length 23 s. • atom: A sample of sparse synthesised warped Gabor atoms which were also used for that speciﬁc test run. Since our method would lead to perfect reconstruction if the windows were time-shift in variant, the behaviour of the algorithm for stationary signals o ver a long time is of greatest interest. The constant signal is of interest since it is the one with the lowest possible frequenc y and our algorithm may bear problems with lo w frequencies due to the necessary cutting. The clicks represent the other extreme point of signals. The atoms are of interest since they shows whether our algorithm has the ability to reproduce its atoms with high quality . For the warping map, we used a function which is exponen- tially increasing between two frequencies f in and f out , namely θ ( f ) = f 0 2 − f /k where f 0 , k ∈ R are constant parameters which can be set inside the PD-patch. For all the tests we set f 0 = 12 and k = 36 . Outside of ± f out and between ± f in the map is linear . The function attains exactly Nyquist frequency , i.e. at θ (( q sup − 1) b ) = SR / 2 . The frequencies f in , f out are chosen such, that the resulting map is C 1 . See Figure 4 for a plot of θ − 1 . 4.1. Raised Cosine Windo w As proposed in [9] we ﬁrst performed tests with a raised cosine window . h ( t ) = ( q 2 b R cos πt T − T 2 ≤ t ≤ + T 2 0 otherwise (32) with Fourier -Transform ˆ h ( ν ) = r b 2 R  sinc ( ν T − 1 2 ) + sinc ( ν T + 1 2 )  (33) Ν Θ - 1 H Ν L Ν in Ν out SR  2 f in f out H q sup - 1 L b Figure 4: The used w arping map θ . It is C 1 , linear between ± f in and outside of ± f out and passes through S R/ 2 . Figure 5: The picture shows a warped raised cosine window in the time domain. The upper part is a magniﬁcation around zero. Ripples, due to the slow decay in the frequency domain, are clearly visible. where T is the total duration of the window , R ≥ 3 is an integer , f 0 = 3 2 T , a = 3 2 Rf 0 , b = 2 f 0 R . See [9, Section 4] for an ansatz about ho w to compute these parameters. This windo w has a very slow decay in the frequenc y domain after warping: ˆ ˜ ˜ ϕ 0 ,q ( f ) = r d q a ˆ h ( θ − 1 e ( f ) − q b ) ' 1 log 2 f (34) This means that either the warped windo ws are not bounded an y- more (i.e.: ˜ ˜ ϕ q / ∈ L ∞ ( R ) ) or that they are discontinuous, which is clearly visible in the warped windows, a plot of one window can be seen in Figure 5. Hence the computation of the warped windo ws with the IDFT bears numerical errors. Also the test results with this window were suboptimal, yielding an error between − 50 dB and − 60 dB, depending on the chosen parameters and input signal. Changing the parameter R or the parameter C d had a signiﬁ- cant effect on the error . In Figure 6 on can see the inﬂuence of R . This can be expected, since the essential bandwidth, in which the phase is linearised, depends on that parameter . Since ov ersampling is directly proportional to the computational load (as computed in (30)), for real-time applications there is a natural upper bound. C d ≈ 2 , 5 pro vided good results. T oo small and too big numbers for C d both gav e worst results. On the contrary , changing the parameter b while preserving the ov ersampling factor R and the parameter a had no effect, as long C b ≤ 2 . The relation between the windo w length after cutting and the error was nearly proportional to the value of P q T q in a certain range, see Figure 8. From this observ ation one can determine a suitable cutting parameter C cut . Changing the parameter T max has clearly only an inﬂuence on the er r for low frequencies. D AFX-6 Pr oceedings of the 19 th International Confer ence on Digital Audio Effects (DAFx-16), Brno, Czech Republic, September 5–9, 2016 4 P T q /d q R q sup b er r 13.1k 2,0 67 6 Hz − 74 . 4 22.1k 2,5 83 4,8 Hz − 86 . 4 25.9k 3,0 99 4 Hz − 94 . 4 Figure 6: Inﬂuence of the parameter R on the error . Gaussian W indow , B=24 Hz, R , q sup , b =variable, T=0.167 s, C d = 2 , T max =0.629 s, C T c =2, a =0.04167 s, C cut = 20 , test signal: white noise. V alues in dB. T c / T h Raised Cosine Gaussian 1,0 − 63 , 1 − 62 , 8 1,2 − 66 , 9 − 66 , 1 1,4 − 70 , 2 − 71 , 3 1,6 − 71 , 0 − 73 , 3 1,8 − 71 , 1 − 76 , 0 2,0 − 71 , 1 − 77 , 7 2,5 − 71 , 1 − 78 , 3 5,0 − 71 , 0 − 78 , 3 16,0 − 71 , 0 − 71 , 5 Figure 7: Inﬂuence of the parameter T c on the error for the Gaus- sian and the raised cosine window . Parameters as in Figure 10 and 9. Only the Parameter C T c was changed. Signal: White noise. V alues in dB The parameter C T c had no big inﬂuence as long it was about T c was about twice the windo w length T h , see Figure 7 The warping map has a very big inﬂuence on the error . At points where the map is not smooth the error for these frequencies is order of magnitudes higher . This can be seen in Figures 9 and 10 for the sine with 30 Hz signal. At this point, our used warping map is only C 1 . For a C 0 warping map the error was again 10 dB higher . The table in Figure 9 shows selected numerical results with well chosen parameters for the raised cosine window . All v al- ues in dB, v alues abov e − 60 dB and belo w − 70 dB are coloured. The number 4 P T q /d q denotes the average computational av er- age complexity (see abo ve). 4.2. Gaussian Windo w In order to obtain a window with proper decay in the frequency domain after warping, we used a Gaussian window . This windo w does not overlap-add to one. Hence higher overlap is necessary to minimize the de viation from one. The warped windo ws were still fast decaying in the time domain, resulting in the possibility to cut them much shorter then the warped raised-cosine windows which compensated the high computational load due to the high ov erlap. Our used Gaussian window and its F ourier T ransform is h ( t ) = C r b 2 R e − 1 4 t 2 T 2 , ˆ h ( t ) = C T r b R e − t 2 T 2 (35) where T is the total duration of the window , R ≥ 2 the ov erlap fac- tor is an integer , f 0 = R 2 T , a = T R , b = 1 aR and C ' 0 . 893249 · · · is a constant factor used to approximate the overlap-add to one condition. This window has a fast enough decay to ensure that the warped windows in the frequency domain still are in L 1 ( R ) and hence æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ æ à à à à à à à à à à à à à à à à à ì ì ì ì ì ì ì ì ì ì ì 2 4 6 8 10 12 1 1000 â q T q - 100 - 80 - 60 - 40 - 20 err Figure 8: Inﬂuence of the truncation length of the computed win- dow . T est signal: white noise. Red squares and yellow diamonds: Raised Cosine W indo w with dif ferent parameters, blue dots: Gaus- sian window . C cut =variable, all other parameters are ﬁxed. The value of P T q is directly proportional to the error in a certain range. signal er r signal er r white − 70 , 6 clicks − 57 , 0 sine 30 − 53 , 3 beet − 66 , 8 sine 440 − 63 , 3 speech − 61 , 9 sine 20k − 76 , 9 ﬁre − 62 , 8 const − 84 , 7 atom − 62 , 7 Figure 9: T est results for the raised cosine window . B=24 Hz, R =7, q sup =229, T=0.30 s, T c =1.4 s, T max =0.5 s, a =0.04167 s, b =1.714 Hz, C b =2, C d =4, C cut =55, 4 P T q /d q = 33 . 6 k. V alues in dB. signal er r signal er r white − 71 , 3 clicks − 57 , 3 sine 30 − 61 , 5 beet − 70 , 2 sine 440 − 70 , 0 speech − 69 , 8 sine 20k − 81 , 2 ﬁre − 68 , 4 const − 63 , 7 atom − 69 , 1 Figure 10: T est results for the Gaussian windo w . B=24 Hz, R =4, q sup =132, T=0.167 s, T c =0.8 s, T max =0.4 s, a =0.04167 s, b =3 Hz, C b =2, C d =2, C cut =1000, 4 P T q /d q = 7 . 3 k. V alues in dB. their In verse F ourier transformed (i.e. the warped windows in the time domain) are bounded and continuous. This windo w leads to signiﬁcantly better results do wn to − 96 dB which is the limit when using PD’ s single precision numbers. The inﬂuence of the param- eters on the error was the same as for the raised-cosine windo w . The tables in Figure 10 show selected numerical results. D AFX-7 Pr oceedings of the 19 th International Confer ence on Digital Audio Effects (DAFx-16), Brno, Czech Republic, September 5–9, 2016 4.3. Comparison of these two windows For the raised cosine a high number of bands must be used to achiev e a small error . In Figure 8 the tests of the red dots are conducted using 92 bands, the tests with yellow dots were with 229 bands. Since the windows ha ve a bad decay in the frequency domain, a high essential bandwidth has to be used too, which en- tails big overlap in time. and hence a very high average compu- tational load, in our example 33 . 6 k ﬂoating point operations per sample. If one uses similar parameters to the ones used for the Gaussian windo w in Figure 10, the error is in the range of − 36 dB. Gaussian windo ws on the other hand do not o verlap add to one and hence are not dual to themselves. Hence we at ﬁrst used a very high overlap to minimize the deviation from 1, which turned out to be not necessary later . The fast decay in the time domain allo ws to cut the windows much shorter than the raised cosine window . This decreases the computational load, in our example only 7.3k ﬂops per sample in average. In Figure 8 one can see that for the Gaussian windo w , with proper parameters, the error is in the range of − 90 dB. 5. CONCLUSIONS W e ha ve introduced a nov el, ﬂexible, easy way to construct frames starting from a classical Gabor frame. T ests show , that ev en in the painful case , where perfect time realignment of the components is not guaranteed and hence our method does not lead to perfect reconstruction, the error can be made as small as the accuracy of single precision ﬂoating point numbers for a wide range of signals. This does not prov e that the error is small for all signals, but gi ves a good estimation of the error to be expected. The transform is working in real time. W e are going to implement maps that can be arbitrarily de- ﬁned, e.g., by means of interpolation of a selected number of points. W e will also implement Gabor Multipliers [17] in the redressed warped Gabor e xpansion (partially done already with the external winary˜ ). Since we already implemented this method as a Pure Data e x- ternal, it is ready to use for audio-applications. The whole external explained in detail as well as the source code can be found online at tommsch.com/science.php and in [18]. 6. A CKNO WLEDGEMENTS Many thanks to the great number of suggestions by the great num- ber of anonymous re viewers on ho w to improv e the paper . 7. REFERENCES [1] G. A. V elasco, N. Holighaus, M. Dörﬂer, and T . Grill, “Constructing an inv ertible constant-Q transform with non- stationary Gabor frames, ” in Proceedings of the Digital A u- dio Effects Confer ence (DAFx-11) , Paris, France, 2011, pp. 93–99. [2] P . Balazs, M. Dörﬂer , F . Jaillet, N. Holighaus, and G. A. V e- lasco, “Theory , implementation and applications of nonsta- tionary Gabor Frames, ” Journal of Computational and Ap- plied Mathematics , vol. 236, no. 6, pp. 1481–1496, 2011. [3] G. Ev angelista, M. Dörﬂer, and E. Matusiak, “Phase vocoders with arbitrary frequency band selection, ” in Pro- ceedings of the 9th Sound and Music Computing Confer ence , Copenhagen, Denmark, 2012, pp. 442–449. [4] N. Holighaus and C. W iesmeyr , “Construction of warped time-frequency representations on nonuniform frequency scales, Part I: Frames, ” ArXiv e-prints , Sep. 2014. [5] T . T waroch and F . Hlawatsch, “Modulation and warping op- erators in joint signal analysis, ” in T ime-F r equency and T ime- Scale Analysis, 1998. Pr oceedings of the IEEE-SP Interna- tional Symposium on , Oct. 1998, pp. 9–12. [6] R. G. Baraniuk and D. L. Jones, “W arped wa velet bases: unitary equiv alence and signal processing, ” in Acoustics, Speech, and Signal Processing , 1993. ICASSP-93., 1993 IEEE International Conference on , Apr . 1993, vol. 3, pp. 320–323 vol.3. [7] C. Braccini and A. Oppenheim, “Unequal bandwidth spec- tral analysis using digital frequency w arping, ” IEEE T rans- actions on Acoustics, Speech, and Signal Pr ocessing , vol. 22, no. 4, pp. 236–244, Aug. 1974. [8] G. Evangelista, “Warped Frames: dispersiv e vs. non- dispersiv e sampling, ” in Pr oceedings of the Sound and Mu- sic Computing Conference (SMC-SMAC-2013) , Stockholm, Sweden, 2013, pp. 553–560. [9] G. Ev angelista, “Approximations for Online Computation of Redressed Frequency Warped Vocoders, ” in Proceedings of the Digital Audio Effects Conference (DAFx-14) , Erlangen, Germany , 2014, pp. 1–7. [10] K. Gröchenig, F oundations of T ime-F r equency Analysis , Ap- plied and Numerical Harmonic Analysis. Birkhäuser Boston, 2001. [11] R. G. Baraniuk and D. L. Jones, “Unitary equiv alence : A new twist on signal processing, ” IEEE T ransactions on Sig- nal Pr ocessing , vol. 43, no. 10, pp. 2269–2282, Oct. 1995. [12] P . W . Broome, “Discrete orthonormal sequences, ” Journal of the A CM , vol. 12, no. 2, pp. 151–168, Apr . 1965. [13] L. Knockaert, “On Orthonormal Muntz-Laguerre Filters, ” IEEE T ransactions on Signal Processing , vol. 49, no. 4, pp. 790–793, Apr . 2001. [14] G. Evangelista, “Dyadic W arped W avelets, ” Advances in Imaging and Electr on Physics , vol. 117, pp. 73–171, Apr . 2001. [15] G. Ev angelista and S. Cav aliere, “Frequency W arped Filter Banks and W av elet Transform: A Discrete-Time Approach V ia Laguerre Expansions, ” IEEE T ransactions on Signal Pr ocessing , vol. 46, no. 10, pp. 2638–2650, Oct. 1998. [16] G. Ev angelista and S. Cav aliere, “Discrete Frequency W arped W avelets: Theory and Applications, ” IEEE T rans- actions on Signal Pr ocessing , vol. 46, no. 4, pp. 874–885, Apr . 1998, special issue on Theory and Applications of Fil- ter Banks and W avelets. [17] Hans G. Feichtinger and Krzysztof No wak, A F irst Survey of Gabor Multipliers , pp. 99–128, Birkhäuser Boston, Boston, MA, 2003. [18] Thomas Mejstrik, “Real time computation of redressed frequency warped gabor expansion, ” Master thesis, Uni- versity of Music and Performing Arts V ienna, tomm- sch.com/science.php, 2015. D AFX-8

Estimates of the Reconstruction Error in Partially Redressed Warped Frames Expansions

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment