Improving Full Waveform Inversion in Large Model Era

Improving Full Waveform Inversion in Large Model Era
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Full Waveform Inversion (FWI) is a highly nonlinear and ill-posed problem that aims to recover subsurface velocity maps from surface-recorded seismic waveforms data. Existing data-driven FWI typically uses small models, as available datasets have limited volume, geological diversity, and spatial extent, leading to substantial concerns about overfitting. Although they perform well on synthetic datasets, current methods fail to generalize to more realistic geological structures. In this work, we show that a model trained entirely on simulated and relatively simple data can generalize remarkably well to challenging and unseen geological benchmarks. We provide a working recipe that tames a billion-parameter model for FWI through coordinated scaling across three axes: model capacity, data diversity, and training strategy. Our model achieves state-of-the-art performance on OpenFWI and significantly narrows the generalization gap in data-driven FWI. Across six challenging geophysical benchmarks, including Marmousi, 2D SEG/EAGE Salt and Overthrust, 2004 BP, Sigsbee, and SEAM Phase I, it infers complex structures absent from the training set and delivers significant performance improvements (SSIM from 0.5844 to 0.7669). Overall, our results demonstrate that with an appropriate scaling strategy, large models trained on simple synthetic data can achieve substantial generalization to more complex and realistic geological structures.


💡 Research Summary

**
This paper tackles the long‑standing challenges of data‑driven full‑waveform inversion (FWI): severe non‑linearity, ill‑posedness, and, most critically, the scarcity and limited diversity of training data. Existing deep‑learning FWI approaches rely on relatively small models trained on a few hundred thousand synthetic velocity‑seismic pairs, which leads to over‑fitting and poor generalization to realistic geological settings containing complex features such as salt bodies, strong heterogeneity, and fault systems.

The authors propose a comprehensive “large‑model recipe” that simultaneously scales three axes—model capacity, data diversity, and training strategy—to tame a one‑billion‑parameter transformer for FWI. The key components are:

  1. Model Architecture – A billion‑parameter transformer backbone processes a unified token sequence composed of seismic patches (embedded via a patch‑embedding layer) and velocity tokens (discretized by a VQ‑GAN). Unlike conventional autoregressive decoders, the model adopts a non‑causal, parallel decoding scheme: placeholder tokens occupy the velocity positions and are jointly optimized with the transformer, allowing full self‑attention across the entire sequence. This yields global context, higher efficiency, and better reconstruction quality.

  2. Tokenizer – The velocity tokenizer is upgraded to a ViT‑VQGAN that removes the compression bottleneck by up‑sampling the latent space five‑fold. This preserves fine‑scale geological details while still providing a discrete codebook for sequence modeling. Rotary positional embeddings maintain spatial coherence.

  3. Data Scaling – To overcome data scarcity, the authors train a latent diffusion model on the OpenFWI velocity domain, then synthesize millions of new velocity maps. Each synthetic map is paired with a forward‑simulated acoustic seismic record, guaranteeing physical consistency. This expands the paired dataset from ~408 k to >5 M samples and introduces hybrid geological structures that blend features across sub‑datasets, dramatically increasing structural diversity.

  4. Two‑Stage Training
    a. Supervised Pre‑training – The transformer learns token‑wise mappings from seismic tokens to velocity tokens using cross‑entropy loss.
    b. Reinforcement‑Learning (RL) Post‑training – The pretrained model is treated as a stochastic policy πθ(y|s). A map‑level reward encourages geological continuity and physical plausibility (negative L2 distance between ground‑truth and decoded velocity). Policy optimization via GRPO re‑weights token predictions toward globally coherent velocity fields, bridging the gap between token‑level supervision and structural fidelity.

  5. Latent‑Space Gradient Refinement – After token prediction, the continuous latent vectors of the VQ‑GAN decoder are refined by gradient descent on a reconstruction loss computed from the forward‑modeled seismic data. Updating in latent space, rather than directly on the velocity field, preserves high‑frequency details while enforcing consistency with the wave equation, avoiding the instability typical of classic gradient‑based FWI.

Experimental Results – Ablation studies show each component contributes measurable gains: data augmentation alone lifts SSIM from 0.584 to 0.660; non‑causal decoding and the ViT‑VQGAN add ~0.03–0.05; RL adds another ~0.02; latent‑space refinement adds ~0.03–0.04, culminating in a final SSIM of 0.767 on the OpenFWI synthetic benchmark. More importantly, on six realistic, previously unseen benchmarks (Marmousi, 2D SEG/EAGE Salt, Overthrust, 2004 BP, Sigsbee, SEAM Phase I) the method achieves an average SSIM of 0.71, a >30 % improvement over the prior state‑of‑the‑art BigFWI (SSIM ≈ 0.58). Qualitative visualizations demonstrate sharper interfaces, correctly recovered salt bodies, and finer geological textures.

Conclusions and Impact – The study convincingly demonstrates that a sufficiently large model, when paired with massive, physically consistent synthetic data and a carefully designed training pipeline, can generalize far beyond the synthetic domain it was trained on. This opens a pathway for applying deep‑learning FWI to real‑world 3‑D surveys, multi‑physics inversion (elastic, electromagnetic), and domain‑adaptation scenarios where labeled field data are scarce. The work sets a new benchmark for data‑driven geophysical inversion and provides a reproducible recipe for future research.


Comments & Academic Discussion

Loading comments...

Leave a Comment