Blind Source Separation Using Mixtures of Alpha-Stable Distributions


Authors: Nicolas Keriven (DMA), Antoine Deleforge (PANAMA), Antoine Liutkus (ZENITH)

Nicolas Keriven*, Antoine Deleforge* and Antoine Liutkus†
*Inria Rennes - Bretagne Atlantique, France; †Inria and LIRMM, University of Montpellier, France

ABSTRACT

We propose a new blind source separation algorithm based on mixtures of α-stable distributions. Complex symmetric α-stable distributions have recently been shown to model audio signals in the time-frequency domain better than classical Gaussian distributions, thanks to their larger dynamic range. However, inference with these models is notoriously hard to perform because their probability density functions do not have a closed-form expression in general. Here, we introduce a novel method for estimating mixtures of α-stable distributions based on characteristic function matching. We apply this to the blind estimation of binary masks in individual frequency bands from multichannel convolutive audio mixtures. We show that the proposed method yields better separation performance than Gaussian-based binary-masking methods.

Index Terms — Blind Source Separation, Binary Masking, Alpha-Stable, Generalized Method of Moments

1. INTRODUCTION

This paper is concerned with source separation, a topic in applied mathematics that aims at processing mixture signals so as to recover their constitutive components, called sources [1]. It is a field of important and widespread practical applications, notably in audio. It is traditionally exemplified by the cocktail party problem, which consists in isolating a specific discussion within the recording of a crowd [2, 3]. Apart from such speech processing scenarios, source separation has also enjoyed much interest in the music processing literature, due to its important applications in the entertainment industry [4].
From the perspective of this paper, it is worth mentioning that a significant portion of the research on source separation first makes some assumptions on the source signals and then picks a mixing model. While the former usually stands on probabilistic grounds, the latter often comes from physical assumptions and explains how the observed mixtures are generated from the sources. Historically, the overdetermined linear case was considered, i.e., more mixtures than sources are available [1]. The interesting fact about such mixing models is that they can be inverted easily, allowing recovery of the sources from the mixtures, provided their parameters are known. The breakthrough brought by source separation is to allow identification of such mixing parameters with only very general assumptions about the sources. These assumptions are mostly either that sources are independent, identically distributed (iid.) and non-Gaussian, as in Independent Component Analysis (ICA, [5]), or that they are Gaussian but not iid., as in Second-Order Blind Identification (SOBI, [6]). Moving to the frequency domain allowed such approaches to be extended to convolutive mixtures, i.e., those for which the sensors capture the sources after some acoustic propagation whose duration is not negligible.

The validity of the mixing model and its invertibility are crucial for applying separation methods that make only broad assumptions on the sources. When such assumptions are violated, those approaches are not applicable. This typically happens in the underdetermined scenario, where fewer mixtures than sources are available, which is common in audio. In that case, separation may only be achieved through more involved source models and time-varying filtering procedures [4].

(This work was partly supported by the research programme KAMoulox (ANR-15-CE38-0003-01) funded by ANR, the French State agency for research.)
For this reason, it is natural that research in underdetermined separation has focused on highly parameterized and tractable source models. In short, a huge part of the models proposed in the literature stands on Gaussian grounds, where one wants to estimate time-varying power-spectral densities and steering vectors for building the corresponding multichannel Wiener filters [7, 8]. In that framework, estimation is typically achieved through maximization of likelihoods, for instance using the Expectation-Maximization (EM) algorithm [9]. This line of thought leaves room for much flexibility, and a large community has strived to provide effective audio spectrogram models, from sophisticated linear factorizations [10] to recent developments in deep learning [11].

An intrinsic weakness of Gaussian processes for modelling audio sources is that they require many parameters to faithfully represent sophisticated signals. This is made unavoidable by their light tails, which only allow for small excursions around averages and standard deviations. One typically has to pick a different Gaussian distribution for each time-frequency bin to obtain a good model [8], and precise estimation of all parameters is required for good performance. This inevitably makes all related estimation methods very sensitive to initialization. Using distributions with heavier tails than the Gaussian for underdetermined separation has been less explored [12], although it is common practice in the overdetermined case [13, 14]. Among such distributions, the α-stable distribution [15] has enjoyed some interest in signal processing [16], and more particularly in source separation recently, because it was shown to straightforwardly yield effective filters with better perceived audio quality than the more classical Wiener filter [17, 18].

However, the delicate question of how to estimate the parameters of α-stable source models remains quite an open issue.
It appears to be very challenging because such distributions do not provide an analytical expression for their likelihood, which prevents the use of classical inference methods. Two alternative options have been considered so far. First, Markov Chain Monte Carlo methods [19] are applicable and effective, yet at a high computational cost. Second, classical moment-matching methods were proposed [20] that are effective, but somewhat ad hoc and hard to translate to the multichannel case of several mixtures.

In this paper, we use a variant of the recent algorithm introduced in [21] for the estimation of mixture models by generalized moment matching (GeMM), to exploit mixtures of multivariate α-stable distributions in the context of audio source separation. This algorithm, referred to as Compressive Learning-Orthogonal Matching Pursuit with Replacement (CL-OMPR), is a greedy, heuristic method that was initially used in the context of sketching [21], to estimate mixture models on large-scale databases using only a collection of generalized moments computed in one pass. Sketching has enjoyed several successful applications in machine learning [22], but also in source localization [23]. Here, we exploit instead the capacity of CL-OMPR to estimate an α-stable mixture model whose probability density function does not enjoy an analytical expression.

2. ALPHA-STABLE UNMIXING

2.1. Alpha-stable mixture model

Let us consider a mixture of K sound sources observed through M channels. We denote by {s_k(f,t)}_{k=1}^K the emitted source spectrograms and by {x_m(f,t)}_{m=1}^M the observed channel spectrograms in the complex short-time Fourier domain, where f ∈ [1 ... F] and t ∈ [1 ... T] denote the discrete frequency and time indexes.
Assuming the time-domain convolutive filters from sources to channels are short compared to the Fourier windows, the mixing model at (f,t) can be written

    x(f,t) = Σ_{k=1}^K a_k(f) s_k(f,t)    (1)

where x(f,t) ∈ C^M is the observed vector, s(f,t) ∈ C^K the source vector and a_k(f) ∈ C^M source k's steering vector.

Now, we choose an original probabilistic model for the source signals, inspired by recent research on α-harmonizable processes [17, 23]. For each f, all {s_k(f,t)}_{t=1}^T are assumed independent and identically distributed (iid.) with respect to (wrt.) a symmetric complex and centered α-stable distribution of unit scale parameter and characteristic exponent α_kf, which we write:

    p(s_k(f,t); α_kf) = S_c(s_k(f,t); α_kf).    (2)

In short, the symmetric centered α-stable distribution generalizes the isotropic Gaussian one [24], while providing significantly heavier tails as its characteristic exponent α_kf ∈ ]0, 2] gets small, with α_kf = 2 corresponding to the Gaussian case. Contrary to classical Gaussian audio source models [7, 8], the parameters of the proposed model are time-invariant, drastically reducing its size. This is permitted by the fact that the distribution S_c enables important dynamics for s_k(f,t). In other words, (2) corresponds to a model for the marginal distribution of the sources. Such ideas have already been considered in [23]. The particularity of our approach in this regard is to feature a frequency-dependent characteristic exponent α_kf for increased expressive power. The choice of a unit scale for the distribution comes with no loss of generality: any frequency-dependent scaling of the sources is incorporated in the steering vectors a_k(f). We highlight that the probability density function (pdf) of s_k(f,t) in (2) does not have a closed-form expression except for α_kf = 1 (Cauchy) and α_kf = 2 (Gaussian).
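As a concrete illustration of the mixing model (1), the following numpy sketch simulates a toy stereo mixture with gain-and-delay steering vectors, mimicking the mixing used later in the evaluation. All dimensions and parameter values here are ours; the heavy-tailed sources use independent Cauchy real and imaginary parts (the α = 1 special case) as a crude stand-in for the isotropic complex α-stable law, which is not directly available in numpy.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, F, T = 3, 2, 257, 100          # sources, channels, frequency bins, frames
n_fft = 2 * (F - 1)

# Steering vectors for "pure gain and delay" mixing: channel 1 is the
# reference; channel 2 applies a gain and a delay, i.e. a frequency-dependent
# phase shift in the STFT domain.
gains = rng.uniform(0.6, 1.6, K)
delays = rng.integers(0, 20, K)      # delays in samples
f_idx = np.arange(F)
phase = np.exp(-2j * np.pi * f_idx[None, :] * delays[:, None] / n_fft)   # (K, F)
a = np.stack([np.ones((K, F)), gains[:, None] * phase], axis=1)          # (K, M, F)

# Heavy-tailed sources: independent Cauchy real/imaginary parts. This is only
# an illustration, not the isotropic complex S_c distribution of (2).
s = rng.standard_cauchy((K, F, T)) + 1j * rng.standard_cauchy((K, F, T))

# Mixing model (1): x(f,t) = sum_k a_k(f) s_k(f,t)
x = np.einsum('kmf,kft->mft', a, s)  # observed spectrograms, shape (M, F, T)
print(x.shape)
```

Because of the Cauchy tails, a handful of time-frequency bins carry most of the energy, which is exactly the large dynamic range the α-stable model is meant to capture.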
However, its characteristic function, defined as the Fourier transform of its pdf, does. We have [15, 17]:

    ∀ω ∈ C,  E{exp(i Re[ω* s_k(f,t)])} = exp(−|ω|^{α_kf}).    (3)

At this point, we make one important simplifying assumption: we suppose only one source is significantly active at each time-frequency (TF) point. More specifically, let z(f,t) be the index of the source that has the strongest magnitude |s_k(f,t)| at TF bin (f,t). Our assumption is that all other sources have a magnitude close to 0. This is less strong than the so-called W-disjoint orthogonality assumption [3], where a single source is assumed to be active. It allows us to assume that weak sources are approximately distributed wrt a Gaussian distribution. Indeed, even if it lacks an analytical expression, the pdf of a symmetric α-stable distribution is infinitely differentiable close to the origin [15], justifying this second-order approximation for weak sources. As a result of these assumptions, we take our mixture as:

    x(f,t) = Σ_{k=1}^K I(z(f,t) = k) {a_k(f) s_k(f,t) + e_k(f,t)},    (4)

where I is the indicator function and e_k(f,t) ∈ C^M is a residual Gaussian term containing all non-dominating signals (other than k) and possible additional noise. For convenience, we neglect the inter-channel correlations coming from weak sources, to simply assume that e_k is composed of iid. entries with variance σ²_kf:

    p(e_k(f,t) | z(f,t) = k; σ²_kf) = N_c(e_k(f,t); 0, σ²_kf I_M)    (5)

where N_c denotes the multivariate complex circular-symmetric Gaussian distribution [24], I_M is the M-dimensional identity matrix and σ²_kf is the residual variance at frequency f when source k dominates. Furthermore, the indexes z(f,t) of the strongest source for each TF bin are modelled as iid. multinomial variables:

    p(z(f,t) = k; π_f) = π_kf    (6)

where π_kf is the probability of source k dominating in frequency band f, and Σ_k π_kf = 1.

From all the preceding assumptions, and dropping the indexes (f,t) for convenience, we deduce the characteristic functions of a_k s_k, e_k and x | z = k, where ω ∈ C^M:

    ψ_{a_k s_k}(ω) = exp(−|a_k* ω|^{α_k})    (7)
    ψ_{e_k}(ω) = exp(−σ²_k ||ω||²₂)    (8)
    ψ_{x|z=k}(ω) = exp(−|a_k* ω|^{α_k} − σ²_k ||ω||²₂).    (9)

Combining (6) and (9), we deduce that {x(f,t)}_t follows a mixture model parametrized by

    θ_f = {α_kf, σ²_kf, a_k(f), π_kf}_{k=1}^K.    (10)

Following the two-stage approach of [25], the proposed blind source separation method consists in clustering the observations x(f,t) independently at each frequency according to this mixture model. The resulting classical source permutation ambiguity across frequencies is left aside here (see Section 2.4), and a binary mask is then obtained for each source [2, 3]. The special Gaussian case α_kf = 2 is discussed in Section 2.2, while a parameter estimation method for the general case is given in Section 2.3.

2.2. Special case α_kf = 2

Let us consider the special Gaussian case where α_kf = 2 for all f, k. The observation model at each frequency becomes

    p(x_t | z_t = k; θ) = N_c(x_t; 0, a_k a_k* + σ²_k I_M)    (11)

where the frequency indexes have been dropped for convenience. The parameters θ of this mixture model can be straightforwardly estimated via an expectation-maximization (EM) procedure [26]. Interestingly, using the re-parameterization a_k ← σ_k a_k and σ²_k ← 2σ²_k, it turns out that these EM updates match those of the blind source separation model proposed in [25], up to a small additive constant for σ²_k. A key difference is that in [25] the observations are normalized so that ||x_t||²₂ = 1.
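The characteristic functions (7)-(9) involve only elementary operations, so the model side of the estimation problem is cheap to evaluate. A minimal numpy sketch (function and parameter names are ours):

```python
import numpy as np

def psi_x_given_k(omega, a_k, alpha_k, sigma2_k):
    """Characteristic function (9) of x given z = k, at frequency vectors
    omega of shape (J, M): exp(-|a_k^* omega|^alpha_k - sigma2_k ||omega||^2)."""
    inner = np.abs(omega @ np.conj(a_k))             # |a_k^* omega_j| per row
    sq_norm = np.sum(np.abs(omega) ** 2, axis=-1)    # ||omega_j||^2
    return np.exp(-inner ** alpha_k - sigma2_k * sq_norm)

def psi_mixture(omega, params):
    """CF of the full mixture: sum_k pi_k * psi_{x|z=k}(omega), with params a
    list of (pi_k, a_k, alpha_k, sigma2_k) tuples as in (10)."""
    return sum(pi_k * psi_x_given_k(omega, a_k, alpha_k, s2_k)
               for (pi_k, a_k, alpha_k, s2_k) in params)

# Sanity check: alpha_k = 2 with steering vector e_1 and sigma2_k = 0 reduces
# to the Gaussian-type CF exp(-|omega_1|^2).
omega = np.array([[0.5 + 0.1j, -0.2j], [1.0, 0.3 + 0.3j]])
a = np.array([1.0 + 0j, 0.0])
print(psi_x_given_k(omega, a, 2.0, 0.0))
```

The convention here matches (3): ω* denotes the conjugate transpose, so a_k* ω is a complex scalar and only its modulus enters the CF.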
As such, [25] belongs to the class of spatial-feature clustering-based methods, similarly to DUET [3], while our method operates in the signal domain.

2.3. Parameter estimation via generalized moment matching

In the general case α ≠ 2, estimation is done by generalized moment matching, that is, by minimizing the difference between the empirical and theoretical values of a finite number of generalized moments. These are here samples of the empirical characteristic function of the data at some frequency vectors ω_j ∈ C^M, j ∈ [1 ... J], to be matched with their analytical expression (9). Following the methodology in [21], the vectors ω_j are drawn randomly according to some probability distribution Λ, in practice designed automatically from the data using the method prescribed in [21]. More precisely, given the data points to cluster x_1, ..., x_T ∈ C^M (where the index f has been dropped), the estimation is performed as follows:

1. Draw J random vectors ω_j iid. ∼ Λ, for j ∈ [1 ... J];
2. Compute the empirical characteristic function at these frequencies: y = [ (1/T) Σ_{t=1}^T exp(i Re(ω_j* x_t)) ]_{j=1}^J ∈ C^J;
3. Estimate the model parameters (10) by (approximately) solving

    min_θ || y − [ Σ_{k=1}^K π_k ψ_{x|z=k}(ω_j) ]_{j=1}^J ||²₂    (12)

where ψ_{x|z=k}(ω) is defined by (9), parameterized by θ.

CL-OMPR. The minimization (12) is carried out by a modified version of the CL-OMPR algorithm [21], adapted to our model. It is a greedy, heuristic algorithm precisely designed to perform mixture model estimation by generalized moment matching. Although it offers limited theoretical guarantees except in very particular settings [27], it has been empirically shown to perform well for a large variety of models [21]. In particular, it is applicable as soon as the considered mixture model has a closed-form characteristic function with respect to the parameters of the model, which is the case for mixtures of α-stable distributions.
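The three estimation steps above can be sketched end-to-end on synthetic data. In the sketch below, which is ours, Λ is simplified to a fixed isotropic Gaussian (whereas [21] adapts it to the data), and we use a single component with α = 2 so that the empirical sketch can be checked against the closed-form CF (9); the actual minimization of (12) would then be handed to CL-OMPR.

```python
import numpy as np

rng = np.random.default_rng(1)
M, T, J = 2, 100_000, 30

# Synthetic single-component data in the Gaussian regime (alpha = 2):
# x_t = a s_t + e_t, with scales chosen so that the scalar CF of s is
# exp(-|w|^2) and the noise CF is exp(-sigma2 ||omega||^2), matching (7)-(9).
a = np.array([1.0, 0.5 - 0.5j])
sigma2 = 0.05
s = (rng.normal(size=T) + 1j * rng.normal(size=T)) * np.sqrt(2)
e = (rng.normal(size=(T, M)) + 1j * rng.normal(size=(T, M))) * np.sqrt(2 * sigma2)
x = s[:, None] * a[None, :] + e                                   # (T, M)

# Step 1: draw J frequency vectors (fixed isotropic Gaussian stand-in for Lambda).
omega = (rng.normal(size=(J, M)) + 1j * rng.normal(size=(J, M))) * 0.3

# Step 2: empirical characteristic function ("sketch") of the data.
y = np.mean(np.exp(1j * np.real(x @ np.conj(omega.T))), axis=0)   # (J,)

# Step 3 (objective only): squared residual (12), evaluated at the true theta.
def psi(omega, a_k, alpha_k, s2_k):
    return np.exp(-np.abs(omega @ np.conj(a_k)) ** alpha_k
                  - s2_k * np.sum(np.abs(omega) ** 2, axis=-1))

cost = np.sum(np.abs(y - psi(omega, a, 2.0, sigma2)) ** 2)
print(cost)   # small: the sketch matches the model CF up to Monte Carlo error
```

At mismatched parameters the residual grows, which is what drives the gradient steps inside CL-OMPR.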
Although it was initially designed to perform mixture model estimation on large databases, we use it here mostly because the probability density function of the proposed model (9) does not enjoy an analytical expression. This forbids the use of classical methods such as EM. To our knowledge, there is no other algorithm in the literature capable of estimating mixtures of multivariate α-stable distributions.

The CL-OMPR algorithm is a variant of Orthogonal Matching Pursuit (OMP), a classical greedy algorithm in compressive sensing. Like OMP, it iteratively adds a component to the mixture model by maximizing its correlation with the residual signal. Since the space of parameters is continuous, this is done here with a randomly initialized gradient ascent. Furthermore, CL-OMPR alternates this greedy step with a non-convex, global gradient descent on (12) initialized with the current support. This additional step adjusts the whole support when a component is added. Finally, it also performs more iterations than OMP and includes a hard thresholding step to maintain the number of components at K, to allow for replacing spurious components. The CL-OMPR algorithm is described in detail in [21], where it is applied to Gaussian Mixture Model (GMM) estimation. Replacing the GMM by our α-stable model is easily implemented and only requires computation of the gradient of ψ_{x|z=k} with respect to the different parameters. The code is available at https://github.com/nkeriven/alpha_stable_bss.

Approximate clustering. A drawback of the α-stable model, to investigate in future work, is that the pdf p(x | z = k) does not have an explicit expression. Therefore, the clustering of data points x_t cannot be done by exactly maximizing the posterior p(z = k | x_t) with respect to k. In other words, once we have estimated the mixture of α-stable distributions, it is difficult to actually assign each point to a component of the mixture.
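One pragmatic workaround, used below, is to score each point with the Gaussian likelihood of model (11), which is available in closed form. A minimal sketch of that assignment step (all names and parameter values are ours):

```python
import numpy as np

def gaussian_log_posteriors(X, A, sigma2, pi):
    """log(pi_k) + log N_c(x_t; 0, a_k a_k^H + sigma2_k I) for every (t, k).
    X: (T, M) complex observations; A: (K, M) steering vectors;
    sigma2: (K,) residual variances; pi: (K,) priors."""
    T, M = X.shape
    K = A.shape[0]
    logp = np.empty((T, K))
    for k in range(K):
        cov = np.outer(A[k], np.conj(A[k])) + sigma2[k] * np.eye(M)
        cov_inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)            # Hermitian PD: real logdet
        quad = np.real(np.einsum('tm,mn,tn->t', np.conj(X), cov_inv, X))
        logp[:, k] = np.log(pi[k]) - M * np.log(np.pi) - logdet - quad
    return logp

# Toy demo: two well-separated steering vectors; each point should be assigned
# to the component whose rank-one-plus-noise covariance explains it best.
rng = np.random.default_rng(2)
A = np.array([[1.0 + 0j, 1.0], [1.0, -1.0 + 0j]])
s0 = (1 + rng.random(60)) * np.exp(2j * np.pi * rng.random(60))
s1 = (1 + rng.random(60)) * np.exp(2j * np.pi * rng.random(60))
noise = 0.1 * (rng.normal(size=(120, 2)) + 1j * rng.normal(size=(120, 2)))
X = np.concatenate([s0[:, None] * A[0], s1[:, None] * A[1]]) + noise

logp = gaussian_log_posteriors(X, A, np.array([0.02, 0.02]), np.array([0.5, 0.5]))
z = np.argmax(logp, axis=1)   # binary-mask assignment per observation
```

The same scoring rule applies whether the parameters (a_k, σ²_k, π_k) come from the EM fit or from the α-stable fit; only their values differ.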
Although a few methods may exist to approximately compute this posterior using approximate numerical integration [28], in practice we found them to be extremely unstable and time-consuming. Instead, we decided to cluster the data as if the model were Gaussian, i.e. with α_k = 2, since the likelihood is then computable. Hence, the "clustering" part (and therefore the final source separation step) of both the EM (Section 2.2) and α-stable models are in fact the same. The difference between the two lies in the estimation of the parameters (a_k, σ²_k, π_k). Our hope is that by using the more realistic α-stable source model, the steering vectors a_k will be estimated more precisely.

2.4. Frequency permutation ambiguity

Once clustering is performed at each frequency, a permutation ambiguity remains, as the assignment of frequency masks to sources is not known. This is a classical problem in blind source separation, referred to as permutation alignment. It notably occurs when using ICA [5] and clustering-based methods [8, 25]. A number of techniques have been proposed to tackle it, based on temporal activation patterns [25], steering vector models [8] or the similarity of adjacent frequency bands [29]. The selection and tuning of a specific permutation technique highly depends on the type of signal and mixing model considered, which is out of the scope of this study. For this reason, and for fairness, all methods evaluated in the next section benefited from the same oracle permutation scheme: at each frequency, the permutation minimizing the mean-squared error between estimated and true source images is selected.

3. EVALUATION AND RESULTS

We use two datasets for evaluation. First, a subset of the QUASI database (www.tsi.telecom-paristech.fr/aao/en/2012/03/12/quasi/) consisting of 10 musical excerpts of 30 s. For each excerpt, we produced stereo (M = 2) mixes of K = 4 musical tracks (vocals, bass, drums, electric guitar, keyboard, ...) using random pure gains and delays.
Second, the TIMIT speech database (catalog.ldc.upenn.edu/ldc93s1), from which we created 10 tracks of 30 s. For each experiment we mix K = 3 of them, selected at random, into M = 2 channels, again with random pure gains and delays. In all cases, the gain difference between the two channels is at most 5 dB and the delay is at most 20 samples. Note that none of the tested methods makes assumptions on the specific convolutive filters used for mixing, as long as they are relatively short compared to the Fourier analysis window. The STFT parameters were fixed to 64 ms Hamming windows at 16 kHz with 75% overlap. Each experiment is averaged over 100 trials: each of the 10 songs is selected 10 times, and at each speech trial random utterances are picked from TIMIT and mixed. The results are evaluated using the classical bss_eval toolbox [30].

            SDR (dB)        SIR (dB)        MER (dB)
  Mix      −5.96 ± 4.96    −5.49 ± 4.85    N/A
  Oracle    8.33 ± 3.16    18.3  ± 4.13    N/A
  [25]      1.26 ± 2.44     2.88 ± 3.82    10.5 ± 9.84
  EM        3.50 ± 2.87     9.04 ± 4.92    12.3 ± 11.0
  CF-GMM    3.80 ± 2.53     8.60 ± 3.62    12.3 ± 9.90
  CF-α      4.11 ± 2.59     9.17 ± 3.51    12.7 ± 9.73

  (a) QUASI database (music), K = 4

            SDR (dB)        SIR (dB)        MER (dB)
  Mix      −3.14 ± 1.91    −3.13 ± 1.90    N/A
  Oracle   11.9  ± 0.98    25.9  ± 1.05    N/A
  [25]      2.16 ± 1.33     4.90 ± 2.54    22.0 ± 6.57
  EM        0.54 ± 0.50     1.44 ± 1.21    12.0 ± 3.64
  CF-GMM    1.60 ± 1.10     4.13 ± 2.46    14.8 ± 3.32
  CF-α      2.70 ± 1.74     6.11 ± 3.31    18.9 ± 2.72

  (b) TIMIT database (speech), K = 3

Table 1: Separation results with K sources and M = 2 channels, for the four clustering algorithms as well as oracle and mixture results. Each slot contains the mean and standard deviation over the 100 trials and K sources, i.e. over 100K values.
They are expressed in terms of the signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR), evaluating the quality of the reconstructed source signals, and the mixing error ratio (MER), defined in [31], evaluating the estimation of the steering vectors a_k. We compare the following four clustering algorithms (recall that in each case, binary masks are created using the oracle permutation method of Sec. 2.4):

• EM: the clustering is done with a GMM as described in Sec. 2.2. The EM algorithm is repeated 10 times and the parameters yielding the best log-likelihood are kept.

• [25]: our implementation of the method of Sawada et al. using normalized observations, as described in Section 2.2. The EM is also repeated 10 times.

• CF-GMM: the clustering is performed with the moment matching method of Sec. 2.3, but with all the α_k fixed to 2. Hence, both EM and CF-GMM achieve estimation in a Gaussian setting, but with different cost functions: while EM maximizes the likelihood, CF-GMM performs generalized moment matching of the characteristic function (CF).

• CF-α: the clustering is done with the mixture of α-stable distributions of Sec. 2.3. As mentioned before, the clustering part is done by approximating the model as Gaussian; only the estimation of the parameters differs.

To put the results in context, we also report the "best" and "worst" possible results. In oracle, the separation is performed with the binary mask formed by considering the source that has the highest energy at each TF bin (with oracle knowledge of each source signal). In mix, the results are obtained by directly feeding the mixture signal into the function bss_eval_images.

Fig. 1 (two panels: QUASI, K = 4; TIMIT, K = 3): Log-likelihood of the data at each frequency index for each trial (i.e. 100F values), for the EM, CF-GMM and CF-α algorithms.
For the latter, the "likelihood" is computed with α = 2 (Gaussian), even if a different α was estimated. For readability, the low end of the y-axis has been cut at −7·10⁴; the CF-GMM and CF-α algorithms have outliers that go down to −2·10¹⁰ and −3·10¹⁰, respectively.

Separation results. Table 1 shows the separation results for all algorithms. Recall that [25] performs separation purely based on spatial clustering, while EM, CF-GMM and CF-α also rely on statistical source models. The results suggest, first, that using source models is more beneficial in heavily underdetermined scenarios, e.g. Table 1(a), where source signals are less sparse and more numerous. Second, the proposed α-stable model is better suited than Gaussian models for both speech and music sources. Finally, the proposed approach blindly estimates mixing filters in a more stable way than the EM approach of [25] despite its multiple initializations, as shown by the lower standard deviations of the MER.

Relevance of log-likelihood. A somewhat surprising observation is that CF-GMM significantly outperforms EM on speech data, despite the fact that both estimate a GMM. In Fig. 1 we compare the log-likelihoods obtained with the three algorithms during the clustering phase subsequent to the estimation of the parameters (recall that all three algorithms have the same clustering phase, which does not use the estimated α_k). As expected, EM significantly outperforms the two other algorithms on this criterion. This is not surprising, since EM aims at maximizing the log-likelihood while the two CF algorithms consider only the characteristic function. Since the CF approaches outperform EM in terms of separation results, we conclude that maximization of the log-likelihood, while natural, might not be the most appropriate approach to estimate the mixture parameters in this case, which is an interesting lead for future research.
4. CONCLUSION

We presented a novel method for multichannel blind separation of audio sources using an α-stable model for the source signals, combined with the assumption that only one source dominates each (f,t) point. The parameters of the proposed model, including distinct scale and α values for each source, are estimated at each frequency using a novel method based on random generalized moment matching. Results show that, using oracle permutations, the proposed model performs better than Gaussian models, and that the proposed estimation method outperforms EM even when using the same Gaussian model. Future work will further investigate the α and scale values estimated by our method. In particular, it would be interesting to see if they can be constrained or exploited to resolve permutation ambiguities. The potential of random generalized moment matching versus maximum likelihood methods in source separation should also be further studied.

5. REFERENCES

[1] P. Comon and C. Jutten, Eds., Handbook of Blind Source Separation: Independent Component Analysis and Blind Deconvolution, Academic Press, 2010.
[2] O. Yilmaz and S. Rickard, "Blind separation of speech mixtures via time-frequency masking," IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1830–1847, July 2004.
[3] S. Rickard, "The DUET blind source separation algorithm," Blind Speech Separation, pp. 217–237, 2007.
[4] E. Vincent, N. Bertin, R. Gribonval, and F. Bimbot, "From blind to guided audio source separation: How models and side information can improve the separation of sound," IEEE Signal Processing Magazine, vol. 31, no. 3, pp. 107–115, May 2014.
[5] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, vol. 46, John Wiley & Sons, 2004.
[6] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and E. Moulines, "A blind source separation technique using second-order statistics," IEEE Transactions on Signal Processing, vol. 45, no. 2, pp. 434–444, 1997.
[7] L. Benaroya, F. Bimbot, and R. Gribonval, "Audio source separation with a single sensor," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 1, pp. 191–199, Jan. 2006.
[8] N.Q.K. Duong, E. Vincent, and R. Gribonval, "Under-determined reverberant audio source separation using a full-rank spatial covariance model," IEEE Transactions on Audio, Speech and Language Processing, vol. 18, no. 7, pp. 1830–1840, Sept. 2010.
[9] M. Feder and E. Weinstein, "Parameter estimation of superimposed signals using the EM algorithm," IEEE Transactions on Acoustics, vol. 36, pp. 477–489, 1988.
[10] A. Ozerov, E. Vincent, and F. Bimbot, "A general flexible framework for the handling of prior information in audio source separation," IEEE Transactions on Audio, Speech and Language Processing, vol. PP, no. 99, p. 1, 2011.
[11] A. Nugraha, A. Liutkus, and E. Vincent, "Multichannel audio source separation with deep neural networks," IEEE Transactions on Audio, Speech, and Language Processing, vol. 24, no. 9, pp. 1652–1664, 2016.
[12] K. Yoshii, K. Itoyama, and M. Goto, "Student's t nonnegative matrix factorization and positive semidefinite tensor factorization for single-channel audio source separation," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, April 2016.
[13] P. Kidmose, Blind Separation of Heavy Tail Signals, Ph.D. thesis, Technical University of Denmark, Lyngby, Denmark, 2001.
[14] P. Kidmose, "Independent component analysis using the spectral measure for alpha-stable distributions," in IEEE-EURASIP 2001 Workshop on Nonlinear Signal and Image Processing, 2001, vol. 400.
[15] G. Samoradnitsky and M. Taqqu, Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance, vol. 1, CRC Press, 1994.
[16] C. Nikias and M. Shao, Signal Processing with Alpha-Stable Distributions and Applications, Wiley-Interscience, 1995.
[17] A. Liutkus and R. Badeau, "Generalized Wiener filtering with fractional power spectrograms," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, April 2015.
[18] M. Fontaine, A. Liutkus, L. Girin, and R. Badeau, "Explaining the parameterized Wiener filter with alpha-stable processes," in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2017.
[19] U. Şimşekli, A. Liutkus, and A.T. Cemgil, "Alpha-stable matrix factorization," IEEE Signal Processing Letters, vol. 22, no. 12, pp. 2289–2293, 2015.
[20] A. Liutkus, T. Olubanjo, E. Moore, and M. Ghovanloo, "Source separation for target enhancement of food intake acoustics from noisy recordings," in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, United States, Oct. 2015.
[21] N. Keriven, A. Bourrier, R. Gribonval, and P. Pérez, "Sketching for large-scale learning of mixture models," Information and Inference: A Journal of the IMA, 2017.
[22] R. Gribonval, G. Blanchard, N. Keriven, and Y. Traonmilin, "Compressive statistical learning with random feature moments," arXiv preprint arXiv:1706.07180, 2017.
[23] M. Fontaine, C. Vanwynsberghe, A. Liutkus, and R. Badeau, "Scalable source localization with multichannel alpha-stable distributions," in 25th European Signal Processing Conference (EUSIPCO 2017), 2017.
[24] R. Gallager, "Circularly symmetric complex Gaussian random vectors - a tutorial," Tech. Rep., Massachusetts Institute of Technology, 2008.
[25] H. Sawada, S. Araki, and S. Makino, "A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures," in Applications of Signal Processing to Audio and Acoustics, 2007 IEEE Workshop on. IEEE, 2007, pp. 139–142.
[26] M.E. Tipping and C.M. Bishop, "Mixtures of probabilistic principal component analyzers," Neural Computation, vol. 11, no. 2, pp. 443–482, 1999.
[27] N. Boyd, G. Schiebinger, and B. Recht, "The alternating descent conditional gradient method for sparse inverse problems," pp. 1–21, 2015.
[28] J.P. Nolan, "Multivariate elliptically contoured stable distributions: Theory and estimation," Computational Statistics, vol. 28, no. 5, pp. 2067–2089, 2013.
[29] L.E. Di Persia and D.H. Milone, "Using multiple frequency bins for stabilization of FD-ICA algorithms," Signal Processing, vol. 119, pp. 162–168, 2016.
[30] E. Vincent, R. Gribonval, and C. Févotte, "Performance measurement in blind audio source separation," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 4, pp. 1462–1469, July 2006.
[31] E. Vincent, S. Araki, and P. Bofill, "The 2008 signal separation evaluation campaign: A community-based approach to large-scale evaluation," in International Conference on Independent Component Analysis and Signal Separation. Springer, 2009, pp. 734–741.
