Hybrid operator learning of wave scattering maps in high-contrast media
Surrogate modeling of wave propagation and scattering (i.e., the map from wave speed and source to wave field) in heterogeneous media has significant potential in applications such as seismic imaging and inversion. High-contrast settings, such as subsurface models with salt bodies, exhibit strong scattering and phase sensitivity that challenge existing neural operators. We propose a hybrid architecture that decomposes the scattering operator into two separate contributions: a smooth background propagation and a high-contrast scattering correction. The smooth component is learned with a Fourier Neural Operator (FNO), which produces globally coupled feature tokens encoding background wave propagation; these tokens are then passed to a vision transformer, where attention is used to model the high-contrast scattering correction dominated by strong spatial interactions. Evaluated on high-frequency Helmholtz problems with strong contrasts, the hybrid model achieves substantially improved phase and amplitude accuracy compared to standalone FNOs or transformers, with favorable accuracy-parameter scaling.
💡 Research Summary
The paper tackles the long‑standing challenge of building fast, accurate surrogate models for wave propagation in heterogeneous, high‑contrast media such as subsurface salt bodies. Traditional numerical solvers (finite‑difference, finite‑element) are accurate but prohibitively expensive for repeated forward solves required in seismic inversion. Recent neural operators, especially Fourier Neural Operators (FNOs), provide mesh‑independent inference and have shown strong performance on smooth‑coefficient PDEs, yet they struggle when the medium exhibits sharp velocity jumps that generate strong scattering, phase distortion, and multiple reflections.
To address this, the authors propose a physics‑inspired operator decomposition. The spatially varying wave speed \(v(\mathbf{x})\) is split into a smooth background component \(v_{\text{bg}}(\mathbf{x})\), obtained by convolution with a mollifier, and a high‑contrast residual \(\delta v(\mathbf{x}) = v(\mathbf{x}) - v_{\text{bg}}(\mathbf{x})\). Correspondingly, the Helmholtz solution \(p(\mathbf{x})\) is expressed as the sum of a background field \(p_{\text{bg}}(\mathbf{x})\) (the solution for the smooth speed) and a scattering correction \(\delta p(\mathbf{x})\) that satisfies a Lippmann‑Schwinger‑type integral equation driven by \(\delta v\) and the background field. This yields two distinct forward maps: a smooth background operator \(\mathcal{F}_{\text{bg}}\) and a high‑contrast scattering operator \(\mathcal{F}_{\text{sc}}\).
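The velocity split described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: a Gaussian kernel stands in for the mollifier, and the kernel width `sigma`, the helper names, and the toy velocity values are all hypothetical.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float, truncate: float = 3.0) -> np.ndarray:
    """Normalized 1-D Gaussian, used here as a stand-in mollifier (assumption)."""
    radius = max(1, int(truncate * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_axis(field: np.ndarray, kernel: np.ndarray, axis: int) -> np.ndarray:
    """Convolve along one axis, replicating edge values at the boundary."""
    radius = len(kernel) // 2
    pad = [(0, 0)] * field.ndim
    pad[axis] = (radius, radius)
    padded = np.pad(field, pad, mode="edge")
    return np.apply_along_axis(
        lambda line: np.convolve(line, kernel, mode="valid"), axis, padded
    )


def split_velocity(v: np.ndarray, sigma: float = 4.0):
    """Return (v_bg, delta_v) such that v = v_bg + delta_v."""
    kernel = gaussian_kernel_1d(sigma)
    # A 2-D Gaussian is separable, so smooth one axis at a time.
    v_bg = smooth_axis(smooth_axis(v, kernel, axis=0), kernel, axis=1)
    return v_bg, v - v_bg


# Toy model: slow sediment with a fast salt-like disc (values are illustrative).
n = 64
yy, xx = np.mgrid[0:n, 0:n]
v = np.full((n, n), 2000.0)
v[(xx - 32) ** 2 + (yy - 32) ** 2 < 10 ** 2] = 4500.0

v_bg, delta_v = split_velocity(v, sigma=4.0)
```

Away from the inclusion the residual `delta_v` vanishes, while near the salt boundary it carries the sharp jump that the smooth background cannot represent. A wider `sigma` pushes more of the model into the residual.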
Each map is learned with a model suited to its mathematical character. The smooth background operator is approximated by an FNO, which excels at capturing global spectral relationships and thus reproduces the smooth phase evolution of the wavefield. The scattering correction, which involves highly localized, geometry‑dependent interactions, is modeled by a vision transformer (specifically the scOT architecture with shifted‑window self‑attention). The transformer receives as input the background field \(p_{\text{bg}}\) and the contrast \(\delta v\) encoded as patches, allowing it to adaptively learn long‑range dependencies induced by strong scattering while retaining linear complexity in the number of patches.
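To make the two ingredients concrete, the following NumPy sketch shows the core operation of a Fourier layer (transform, keep the lowest modes, mix channels, transform back), a patch tokenizer, and a single self-attention step. It is a simplified illustration, not the paper's model: the weights are random placeholders rather than trained parameters, the attention is plain global attention rather than the shifted-window attention of scOT, and all function names and shapes are assumptions.

```python
import numpy as np


def spectral_conv2d(x: np.ndarray, weights: np.ndarray, modes: int) -> np.ndarray:
    """Core FNO operation on a real field x of shape (channels_in, H, W).

    weights is complex with shape (2, channels_in, channels_out, modes, modes):
    one block for low non-negative row frequencies, one for the negative ones.
    """
    c_in, h, w = x.shape
    c_out = weights.shape[2]
    x_hat = np.fft.rfft2(x, axes=(-2, -1))  # (c_in, H, W // 2 + 1)
    out_hat = np.zeros((c_out, h, w // 2 + 1), dtype=complex)
    # Mix channels mode by mode; all higher frequencies are discarded.
    out_hat[:, :modes, :modes] = np.einsum(
        "ixy,ioxy->oxy", x_hat[:, :modes, :modes], weights[0]
    )
    out_hat[:, -modes:, :modes] = np.einsum(
        "ixy,ioxy->oxy", x_hat[:, -modes:, :modes], weights[1]
    )
    return np.fft.irfft2(out_hat, s=(h, w), axes=(-2, -1))


def patchify(features: np.ndarray, patch: int) -> np.ndarray:
    """Turn (C, H, W) features into (num_patches, C * patch * patch) tokens."""
    c, h, w = features.shape
    gh, gw = h // patch, w // patch
    blocks = features[:, : gh * patch, : gw * patch].reshape(c, gh, patch, gw, patch)
    return blocks.transpose(1, 3, 0, 2, 4).reshape(gh * gw, c * patch * patch)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head global self-attention (quadratic in the number of tokens)."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 32, 32))  # e.g. two input channels on a 32 x 32 grid
weights = 0.1 * (
    rng.standard_normal((2, 2, 4, 8, 8)) + 1j * rng.standard_normal((2, 2, 4, 8, 8))
)
y = spectral_conv2d(x, weights, modes=8)  # globally coupled features, (4, 32, 32)
tokens = patchify(y, patch=4)             # (64, 64): 64 patches, 64 values each
w_q, w_k, w_v = (0.1 * rng.standard_normal((64, 16)) for _ in range(3))
mixed = self_attention(tokens, w_q, w_k, w_v)  # (64, 16)
```

The spectral convolution couples every grid point to every other through the retained Fourier modes, which suits the smooth background; the attention step then lets each patch weight its interaction with other patches based on content, which is the property the summary attributes to the scattering branch.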
The authors ground their design in recent approximation theory (Kratsios et al., 2024), which shows that the effective rank required for a given accuracy grows sharply as the Sobolev regularity of the input‑output map decreases. High‑contrast regimes thus demand far more parameters if a single neural operator is used. By separating the problem, each component retains higher regularity, enabling accurate approximation with far fewer parameters.
Experimental validation is performed on 2‑D Helmholtz problems at 40 Hz with synthetic velocity models containing salt‑like inclusions. The authors compare the hybrid model against standalone FNOs, pure vision transformers, and recent hybrid neural operators such as NSNO and Fourier‑DeepONet. Metrics include phase error and L2 amplitude error. Across all test sets, the hybrid architecture reduces errors by roughly 30‑45 % relative to the best baseline and demonstrates superior accuracy‑parameter scaling: for a fixed parameter budget, the hybrid model achieves markedly lower error, and for a target error it requires substantially fewer parameters.
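The summary does not give the exact metric definitions, so the sketch below shows one common way to measure such errors on complex-valued Helmholtz fields. The function names and formulas are assumptions chosen for illustration: a relative L2 error on the full field, a relative L2 error on the magnitudes, and a mean absolute phase difference wrapped to [-π, π].

```python
import numpy as np


def relative_l2_error(pred: np.ndarray, ref: np.ndarray) -> float:
    """||pred - ref||_2 / ||ref||_2 on the complex field."""
    return float(np.linalg.norm(pred - ref) / np.linalg.norm(ref))


def relative_amplitude_error(pred: np.ndarray, ref: np.ndarray) -> float:
    """Relative L2 error of the magnitudes only, ignoring phase."""
    return float(
        np.linalg.norm(np.abs(pred) - np.abs(ref)) / np.linalg.norm(np.abs(ref))
    )


def mean_phase_error(pred: np.ndarray, ref: np.ndarray, eps: float = 1e-12) -> float:
    """Mean absolute phase difference in radians, wrapped to [-pi, pi]."""
    mask = np.abs(ref) > eps  # phase is undefined where the field vanishes
    diff = np.angle(pred[mask] * np.conj(ref[mask]))
    return float(np.mean(np.abs(diff)))


# Toy check: a plane wave versus a copy with 10% amplitude loss and 0.1 rad shift.
xs = np.linspace(0.0, 1000.0, 256)
k = 2.0 * np.pi * 40.0 / 2000.0  # wavenumber for 40 Hz at 2000 m/s (illustrative)
ref = np.exp(1j * k * xs)
pred = 0.9 * np.exp(0.1j) * ref

amp_err = relative_amplitude_error(pred, ref)
phase_err = mean_phase_error(pred, ref)
full_err = relative_l2_error(pred, ref)
```

Separating the two matters because a prediction can match amplitudes well while being out of phase, and the full-field error then overstates or conflates the two effects.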
Ablation studies explore the impact of transformer window size, patch resolution, and token dimension. Smaller windows fail to capture the global scattering patterns, while overly large windows increase computational cost without proportional accuracy gains. Adding residual connections and layer normalization between the FNO and transformer improves convergence and final performance.
Limitations are acknowledged. The current implementation is restricted to 2‑D Cartesian grids; extending to 3‑D or unstructured meshes will require additional engineering. High frequencies (>60 Hz) and multiple simultaneous sources increase the number of tokens, stressing memory. The authors suggest future work integrating physics‑informed loss terms (e.g., PDE residual penalties), multi‑frequency training, and automated hyper‑parameter tuning to further enhance robustness and generalization.
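A back-of-the-envelope calculation illustrates why frequency stresses memory. If the grid must resolve the shortest wavelength with a fixed number of points, the token count in 2‑D grows with the square of the frequency. Every number below (minimum velocity, domain size, points per wavelength, patch size) is a hypothetical choice for illustration and is not taken from the paper.

```python
import math


def token_count(
    freq_hz: float,
    v_min: float = 1500.0,            # slowest velocity in m/s (assumed)
    domain_m: float = 3000.0,         # side length of a square domain (assumed)
    points_per_wavelength: int = 10,  # sampling rule of thumb (assumed)
    patch: int = 8,                   # patch side length in grid points (assumed)
) -> int:
    """Number of patch tokens needed for a square 2-D grid at a given frequency."""
    wavelength = v_min / freq_hz
    dx = wavelength / points_per_wavelength
    n_side = math.ceil(domain_m / dx)
    patches_per_side = math.ceil(n_side / patch)
    return patches_per_side ** 2


print(token_count(40.0))  # 10000
print(token_count(60.0))  # 22500
```

Under these assumptions, moving from 40 Hz to 60 Hz multiplies the token count by 2.25, and each additional source handled as a separate input adds its own set of tokens.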
In summary, the paper presents a novel hybrid neural operator–transformer framework that leverages a physics‑driven decomposition of the Helmholtz forward map. By assigning the smooth background propagation to an FNO and the high‑contrast scattering correction to a vision transformer, the method achieves significantly better phase and amplitude fidelity than existing surrogate models while maintaining favorable parameter efficiency. This approach opens a promising pathway for real‑time, high‑fidelity wavefield prediction in challenging geophysical settings, with potential extensions to other wave‑based inverse problems.