SCRAPL: Scattering Transform with Random Paths for Machine Learning


The Euclidean distance between wavelet scattering transform coefficients (known as paths) provides informative gradients for perceptual quality assessment of deep inverse problems in computer vision, speech, and audio processing. However, these transforms are computationally expensive when employed as differentiable loss functions for stochastic gradient descent due to their numerous paths, which significantly limits their use in neural network training. To address this problem, we propose “Scattering transform with Random Paths for machine Learning” (SCRAPL): a stochastic optimization scheme for efficient evaluation of multivariable scattering transforms. We implement SCRAPL for the joint time-frequency scattering transform (JTFS), which demodulates spectrotemporal patterns at multiple scales and rates, allowing a fine characterization of intermittent auditory textures. We apply SCRAPL to differentiable digital signal processing (DDSP); specifically, unsupervised sound matching of a granular synthesizer and of the Roland TR-808 drum machine. We also propose an initialization heuristic based on importance sampling, which adapts SCRAPL to the perceptual content of the dataset, improving neural network convergence and evaluation performance. We make our code and audio samples available and provide SCRAPL as a Python package.


💡 Research Summary

The paper introduces SCRAPL (Scattering transform with Random Paths for machine Learning), a stochastic optimization framework designed to make wavelet scattering transforms (ST) tractable as differentiable loss functions in deep learning. Traditional ST decomposes a signal into a large set of coefficients (paths) indexed by scale, time, and possibly frequency. Computing the Euclidean distance between full sets of coefficients (the ST loss) requires evaluating all paths, which can be thousands to tens of thousands, leading to prohibitive memory and compute costs during back‑propagation.

SCRAPL addresses this by randomly sampling a single path at each training iteration and using the corresponding per‑path loss as an unbiased estimator of the full ST loss. Proposition 3.1 proves that uniform sampling yields an unbiased gradient estimate. However, because gradients from different paths have distinct distributions, naïve use of Adam would suffer from high variance and biased moment estimates. To mitigate this, the authors propose two path‑aware optimizers:
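The uniform-sampling idea can be sketched in a few lines. The scalar `path_loss` below is a hypothetical stand-in for the per-path scattering distance, not the paper's implementation; it only illustrates why the single-path estimate is unbiased:

```python
import numpy as np

rng = np.random.default_rng(0)

P = 8  # toy number of scattering paths (real JTFS has thousands)

# hypothetical per-path targets: scattering coefficients of the reference signal
targets = rng.normal(size=P)
theta = 0.5  # toy scalar "model output"

def path_loss(p, theta):
    # hypothetical squared distance along a single path p
    return (theta - targets[p]) ** 2

# full scattering loss: average of the per-path losses over all paths
full_loss = np.mean([path_loss(p, theta) for p in range(P)])

# SCRAPL-style estimate: sample ONE path uniformly per iteration;
# the Monte-Carlo average of these single-path losses converges
# to the full loss, so each sample is an unbiased estimator
samples = [path_loss(rng.integers(P), theta) for _ in range(20000)]
estimate = np.mean(samples)
```

Because each path is drawn with probability 1/P, the expectation of the single-path loss equals the mean over all paths, which is exactly the unbiasedness statement of Proposition 3.1.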

  1. P‑Adam – maintains separate first‑ and second‑order moment estimates (mₚ, vₚ) for each path p. The update rule adapts the exponential decay rates based on how many iterations have elapsed since the path was last sampled (τₚ), effectively giving fresher information higher weight while older estimates decay more slowly.

  2. P‑SAGA – a path‑wise variant of the SAGA algorithm. It stores the most recent gradient for each path (ĝₚ) and corrects the current stochastic gradient by subtracting the stored value and adding the average of all stored gradients. This reduces variance without requiring a memory footprint proportional to the dataset size; the memory scales with the number of paths, which is manageable.

Both techniques can be used together, providing a variance‑reduced, unbiased estimator while keeping computational overhead low.
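A minimal sketch of the SAGA-style correction behind P‑SAGA, using fixed toy gradients in place of real back-propagated ones (all names here are illustrative, and the per-path Adam moments of P‑Adam are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)

P = 4    # toy number of paths
dim = 3  # parameter dimensionality

# fixed per-path gradients (in practice these come from backprop)
true_grads = rng.normal(size=(P, dim))
full_grad = true_grads.mean(axis=0)

# table storing the most recent gradient seen for each path (g_hat_p)
table = np.zeros((P, dim))

def p_saga_step(p):
    g = true_grads[p]  # fresh gradient for the sampled path
    # SAGA correction: subtract the stale stored gradient for p,
    # add the average of all stored gradients
    corrected = g - table[p] + table.mean(axis=0)
    table[p] = g       # refresh the stored gradient for path p
    return corrected

# once every path has been visited, the table matches the true
# gradients and the corrected estimate equals the full-batch gradient
for p in range(P):
    p_saga_step(p)
est = p_saga_step(int(rng.integers(P)))
```

Note the memory cost: one stored gradient per path rather than per training example, which is what makes the path-wise variant tractable.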

Beyond uniform sampling, the paper introduces θ‑Importance Sampling (θ‑IS), a heuristic that biases the path‑selection distribution toward paths that are most sensitive to the model’s parameters θ (e.g., the latent vector produced by a DDSP encoder). For each path‑parameter pair (p, u), the method computes a sensitivity measure sₓ,ᵤ,ₚ = ∂L_{φₚ}(x, D(θ))/∂θᵤ, where D is a non‑learnable synthesizer. This sensitivity is multiplied by the gradient of the encoder output with respect to θᵤ, forming a matrix whose largest eigenvalue λ_max is taken as an importance score C_{u,p}. Averaging C_{u,p} over a representative unlabeled dataset yields a per‑path importance C_p, which is normalized to a categorical distribution π over paths. Sampling paths according to π (instead of uniformly) focuses training on the most informative components of the scattering representation, effectively introducing a controlled bias that improves convergence. The eigenvalue computation is performed via stochastic power iteration and Hessian‑vector products, incurring only a constant factor overhead relative to a standard back‑propagation step.
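At sampling time, the θ-IS machinery reduces to drawing paths from a categorical distribution π. The sketch below uses made-up importance scores and a plain power iteration on a toy symmetric matrix in place of the paper's Hessian-vector-product variant:

```python
import numpy as np

rng = np.random.default_rng(2)

# --- power iteration for the largest eigenvalue (toy sensitivity matrix) ---
A = rng.normal(size=(4, 4))
M = A @ A.T  # symmetric PSD stand-in for a sensitivity matrix
v = rng.normal(size=4)
for _ in range(500):
    v = M @ v
    v /= np.linalg.norm(v)
lam_max = v @ M @ v  # Rayleigh quotient approximates lambda_max

# --- importance sampling over paths ---
P = 6
# hypothetical per-path importance scores C_p (in the paper, averaged
# largest eigenvalues of the per-example sensitivity matrices)
C = np.array([0.1, 0.05, 2.0, 0.3, 1.0, 0.55])

# normalize scores into a categorical distribution pi over paths
pi = C / C.sum()

# sample paths according to pi instead of uniformly
draws = rng.choice(P, size=50000, p=pi)
freq = np.bincount(draws, minlength=P) / draws.size
```

High-importance paths (here path 2) are visited far more often, concentrating gradient updates on the components of the representation that most affect the synthesizer parameters.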

The authors evaluate SCRAPL on two differentiable digital signal processing (DDSP) tasks: (i) unsupervised matching of a stochastic granular synthesizer, and (ii) parameter recovery for the classic Roland TR‑808 drum machine. The baseline losses are the multiscale spectral loss (MSS), which is fast but perceptually weak for these tasks, and the full joint time‑frequency scattering (JTFS) loss, which is accurate but 25× more expensive. Experiments show that SCRAPL attains roughly 90 % of the JTFS accuracy while requiring only about twice the runtime of MSS, thus occupying a sweet spot on the accuracy‑efficiency Pareto front. Moreover, because SCRAPL’s memory footprint is dramatically smaller than full JTFS, larger batch sizes become feasible, further accelerating training.

Key contributions of the paper are:

  1. Formal unbiased stochastic approximation of scattering loss via uniform path sampling.
  2. Path‑wise adaptive moment estimation (P‑Adam) that respects the non‑i.i.d. nature of path gradients.
  3. Path‑wise SAGA with acceleration (P‑SAGA) that stores per‑path gradient history, reducing variance without dataset‑size memory.
  4. θ‑Importance Sampling, a parallelizable heuristic that biases path selection according to parameter sensitivity, improving convergence.

The authors release a Python package implementing SCRAPL, along with code and audio samples, facilitating reproducibility. They argue that the framework is generic and can be applied to other multivariate scattering transforms, such as image‑based JTFS or 3‑D scattering, opening avenues for efficient perceptual losses in a broad range of deep learning applications.

