Audio Inpainting in Time-Frequency Domain with Phase-Aware Prior
We address the problem of time-frequency audio inpainting, where the goal is to fill missing spectrogram portions with reliable information. Despite recent advances, existing approaches still face limitations in both reconstruction quality and computational efficiency. To bridge this gap, we propose a method that utilizes a phase-aware signal prior which exploits estimates of the instantaneous frequency. An optimization problem is formulated and solved using the generalized Chambolle-Pock algorithm. The proposed method is evaluated against other time-frequency inpainting methods, specifically a deep-prior audio inpainting neural network and the autoregression-based approach known as Janssen-TF. Our proposed approach surpassed these methods by a large margin in the objective evaluation as well as in the conducted subjective listening test, improving the state of the art. In addition, the reconstructions are obtained with a substantially reduced computational cost compared to alternative methods.
💡 Research Summary
This paper tackles the problem of audio inpainting directly in the time‑frequency (TF) domain by filling missing columns of a short‑time Fourier transform (STFT) spectrogram. The authors observe that conventional sparsity‑based approaches, which typically minimise an ℓ₁ norm of the spectrogram, suffer from two fundamental drawbacks: (i) an “energy loss” effect where the reconstructed signal’s amplitude decays toward the centre of the gap, and (ii) a lack of phase continuity across time, which is especially detrimental for sinusoidal components that dominate musical audio.
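The setting and the phase-continuity issue described above can be illustrated with a short NumPy sketch. It is not the authors' method: the sampling rate, window length, hop size, gap position, and the helper `stft` are all assumptions chosen for illustration. It zeroes a block of STFT columns to create a gap, then checks that for a stationary sinusoid the phase in the dominant bin advances by a fixed amount per frame, so the missing coefficients are predictable from the last reliable column.

```python
import numpy as np

def stft(x, win, hop):
    """Plain STFT (hypothetical helper): rows are frequency bins, columns are frames."""
    n_fft = len(win)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * win for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)

# Illustrative parameters (assumptions, not taken from the paper).
fs, n_fft, hop = 16000, 512, 128
k0 = 41                          # sinusoid placed exactly on bin 41
f0 = k0 * fs / n_fft             # 1281.25 Hz
t = np.arange(fs) / fs           # one second of signal
x = np.cos(2 * np.pi * f0 * t + 0.3)

# Periodic Hann window, so a bin-centred sinusoid does not leak into bin k0
# from its negative-frequency image.
win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
X = stft(x, win, hop)

# Time-frequency gap: a block of consecutive spectrogram columns is missing.
gap = slice(50, 60)
reliable = np.ones(X.shape[1], dtype=bool)
reliable[gap] = False
X_obs = X * reliable             # missing columns are set to zero

# Phase advance per frame in bin k0, measured and predicted from f0.
measured = np.angle(X[k0, 1:] * np.conj(X[k0, :-1]))
expected = np.angle(np.exp(1j * 2 * np.pi * f0 * hop / fs))

# Extrapolate the last reliable coefficient across the gap by rotating its phase.
steps = np.arange(1, gap.stop - gap.start + 1)
predicted = X_obs[k0, gap.start - 1] * np.exp(1j * expected * steps)
gap_error = np.max(np.abs(predicted - X[k0, gap]))
```

For this idealised stationary tone, `gap_error` is at the level of floating-point round-off. Real audio has time-varying amplitude and frequency, which is why a prior based on estimated instantaneous frequency is used rather than a fixed rotation.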
To overcome these issues, the authors introduce a phase‑aware prior called instantaneous‑phase‑corrected total variation (iPCTV). The prior consists of two parts. First, an instantaneous frequency (IF) map ω