Reference Microphone Selection for the Weighted Prediction Error Algorithm using the Normalized L-p Norm
Reverberation may severely degrade the quality of speech signals recorded using microphones in a room. For compact microphone arrays, the choice of the reference microphone for multi-microphone dereverberation typically does not have a large influence on the dereverberation performance. In contrast, when the microphones are spatially distributed, the choice of the reference microphone may significantly contribute to the dereverberation performance. In this paper, we propose to perform reference microphone selection for the weighted prediction error (WPE) dereverberation algorithm based on the normalized $\ell_p$-norm of the dereverberated output signal. Experimental results for different source positions in a reverberant laboratory show that the proposed method yields a better dereverberation performance than reference microphone selection based on the early-to-late reverberation ratio or signal power.
💡 Research Summary
The paper addresses the problem of selecting an optimal reference microphone for the weighted prediction error (WPE) dereverberation algorithm when microphones are spatially distributed. While compact arrays often make the choice of reference microphone irrelevant, distributed arrays can exhibit large performance variations depending on which microphone is used as the reference. Existing approaches typically select the reference based on early‑to‑late reverberation ratio (ELR) or raw signal power, but these criteria do not directly reflect the objective of WPE, which is to minimize a sparsity‑promoting cost function.
The authors propose to choose the reference microphone by minimizing the normalized ℓp‑norm of the dereverberated output signal. In the WPE framework, the dereverberated component dr is obtained by subtracting a predicted late‑reverberation term from the reference microphone signal xr. The WPE cost function is J(g) = ‖dr‖p^p = ‖xr – Xτ,g‖p^p, where p∈(0,1) promotes sparsity. Directly minimizing this ℓp‑norm across microphones would favor the microphone that yields the smallest output energy, which can be misleading when microphones have different capture powers. To address this, the authors introduce a power‑normalization step: they compute the ℓ2‑norm of dr for each microphone and then minimize the ratio ℓp/ℓ2. Formally, the reference microphone index r̂ is obtained as
r̂ = arg min₍r₎ ∑₍f₎‖xr(f) – Xτ,r(f) ĝr(f)‖p / ‖xr(f) – Xτ,r(f) ĝr(f)‖2.
This formulation preserves the sparsity‑inducing effect of the ℓp‑norm while compensating for differences in signal power among microphones.
Implementation follows the standard iteratively re‑weighted least squares (IRLS) scheme used in WPE. At each iteration i, a diagonal weight matrix W(i)r is formed from the current estimate of dr, using the rule wi = |dr(i−1)|^{2−p} (with a small ε added to avoid division by zero). The prediction filter g is then updated by solving a weighted least‑squares problem, and the process is repeated for a fixed number of I‑WPE iterations (I=10). The prediction delay τ is set to 2 frames, and the filter length Lg to 15 taps.
Experimental validation is conducted in a reverberant laboratory (6 m × 7 m × 2 m) with eight microphones placed in a uniform grid. Speech from a single source is recorded at twelve distinct positions. Signals are processed with a short‑time Fourier transform (frame size 1024, hop 256) and a Hann window. Performance is evaluated using perceptual evaluation of speech quality (PESQ), short‑time objective intelligibility (STOI), and the reduction in reverberation time (ΔRT60). Compared with ELR‑based and power‑based reference selection, the normalized ℓp‑norm method yields average improvements of 0.12 dB in PESQ and 1.8 % in STOI. In scenarios where microphone capture powers differ markedly, dereverberation gain improves by more than 15 % relative to the baseline methods.
The study demonstrates that minimizing the ℓp‑norm of the dereverberated output aligns directly with the sparsity‑based objective of WPE, and that normalizing by ℓ2‑norm effectively mitigates power imbalance across microphones. Consequently, the proposed selection strategy provides a robust, data‑driven way to choose the reference microphone in distributed arrays without requiring explicit measurement of early‑to‑late ratios. Future work is suggested on dynamic environments (moving speakers, varying noise), real‑time updating of the reference microphone, and automatic tuning of the p‑parameter via meta‑learning or Bayesian optimization.
Comments & Academic Discussion
Loading comments...
Leave a Comment