Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method
Seismic inversion is a core problem in geophysical exploration, where traditional methods suffer from high computational costs and are susceptible to initial model dependence. In recent years, deep generative model-based seismic inversion methods hav…
Authors: Haofei Xu, Wei Cheng, Sizhe Li
Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 1 - Conditional Rectified Flow-based End-to-End Rapid Seis mic Inversion Method Haofei Xu, Wei Cheng, Sizhe Li, Jie Xiong* School of Electronic lnformation and Electrical Engineering, Yangtze UniversityJingzhou, Hubei 434023, PR.China A b s t r a c t S e i s m i c i n v e r s i o n i s a c o r e p r o b l e m i n g e o p h y s i c a l e x p l o r a t i o n , w h e r e t r a d i t i o n a l m e t h o d s s u f f e r f r o m h i g h c o m p u t a t i o n a l c o s t s a n d a r e s u s c e p t i b l e t o i n i t i a l m o d e l d e p e n d e n c e . I n r e c e n t y e a r s , d e e p g e n e r a t i v e m o d e l - b a s e d s e i s m i c i n v e r s i o n m e t h o d s h a v e a c h i e v e d r em a r k a b l e p ro g r e s s , b ut e xi st i n g g en e ra ti ve m od e l s s t r u g g l e t o b a l a n c e s a m p l i n g ef f ic ie nc y an d i n v e r s i o n a c c u r a c y . T h i s p a p e r p r op os es a n e nd -t o- en d fa st s ei sm i c i nv er s i o n m et ho d b as ed o n Co nd it io n al R e c t i f i e d F lo w [ 1 ] , wh i c h d es ig n s a d ed i c at ed s ei sm i c e n c o d e r t o e x t r a c t m u l t i - s c a l e s e i s m i c f ea tu re s an d a d o p t s a l a y e r - b y - l a y e r i n j e c t i o n c o nt r o l s tr a t e g y t o ac hi ev e fi ne -g r a i n e d c on di t i o n a l c on tr o l . E xp er im e nt al r e s u l t s d em on s t r a t e t ha t th e pr op o s e d me t h o d a ch i e v e s e xc e l l e n t i nv e r s i o n a cc u r a c y o n th e Op en F W I [ 2 ] b en c h m a r k d at a s e t . C om p a r e d w i th D if fu s i o n [ 3 , 4] me th od s, it a ch i e v e s sa mp li ng ac ce le ra t i o n ; co mp ar ed wi th I nv e r s i o n Ne t [ 5 , 6, 7] me th od s, it ac h i e v e s h ig he r a cc ur a c y i n ge ne ra ti o n . O ur z er o- sh ot g en er a l i z a t i o n e xp er i m e n t s o n M a r m ou si [ 8 , 9] r e al d at a fu r t h e r v e r i f y t h e p r a c t i c a l v a l u e o f t h e me th od . E x p e r i m e n t a l r e s u l t s s h o w t h a t t h e pr o p o se d m e th od a ch ie v e s e x c e l l e n t i n v e r s i o n a c c u r a c y o n t h e Op e nF WI b en ch m a r k d a t a s e t ; c o m p a r e d w i t h D i f f u s i o n m e t h o d s , it ac hi e v e s sa mp l i n g ac c e l e r a t i o n wh il e ma i n t a i n i n g hi gh e r ac cu r a c y t h a n In ve r s i o n N e t m et ho ds ; ex p e r i m e n t s b a s e d o n t h e M a r m o u s i s t a n d a r d m o d e l f u r t h e r v e r i f y t h a t t h i s m e t h o d c a n g e ne r a te hi g h - q u a l i t y i ni ti a l v el oc i t y m od el s i n a z e r o - s h o t m an ne r , e ff ec t iv e ly al l e v i a t i ng th e i ni ti a l m od el d e pe nd en cy pr o b l e m i n t ra di t i o n a l F u l l W a v e f o r m I n v e r s i o n ( F W I ) , a n d p o s s e s s e s i n d u s t r i a l p r a c t i c a l v a lu e. K ey wo rd s: S e i sm ic in v e r si o n; Re ct i f i e d F lo w; Co n d i t i o n a l g en er at iv e m od el ; S e i s m i c e n c o d i n g ; E nd -t o- e nd l ea rn in g F ig ur e 1 Ov e r a l l Al go ri th m F l o w c h a r t Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 2 - 1 Introduction Seismic inversion stands as a cornerstone tech nique in oil and gas exploration and development, with the primary objective of reconstructing the ve locity model of subsurface media from seismic wa vefield data acquired at the Earth’s surface. An acc urate velocity model is of paramount significance f or reservoir characterization, fluid identification, a nd drilling risk evaluation—three critical aspects th at underpin efficient and cost-effective hydrocarbo n exploration. Nevertheless, seismic inversion is m athematically categorized as a highly nonlinear ill- posed inverse problem, presenting substantial chall enges to its practical implementation. Traditional a pproaches, such as Full Waveform Inversion (FWI [ 10,11] ), possess a rigorous physical basis; however, t hey rely heavily on complex numerical simulations of the wave equation, which incur considerable co mputational costs. Furthermore, these methods exh ibit extreme sensitivity to the initial model, often l eading to convergence toward local optimal solutio ns rather than the global optimum. In recent years, deep learning has furnished a data-driven paradigm for seismic waveform inversi on. Specifically, end-to-end convolutional neural n etwork (CNN) architectures—exemplified by Inver sionNet [12] and X-Net [13] —have emerged as the pre vailing technical approach. Leveraging encoder-de coder topologies to directly learn the mapping from seismic data to velocity models, these deterministic frameworks reduce the inference process to a singl e forward pass, thereby affording exceptional comp utational efficiency (millisecond-scale inference) a longside acceptable inversion fidelity. Nevertheles s, such deterministic mappings inherently fail to ch aracterize the ill-posed nature and solution non-uni queness intrinsic to geophysical inverse problems, manifesting in suboptimal delineation of complex structural boundaries and compromised resolution of thin stratigraphic intervals. However, these methods encounter an intrinsi c limitation: severe sampling inefficiency. Traditio nal diffusion frameworks require 50–1000 sequenti al denoising steps to attain high-quality reconstruct ions, resulting in inference durations of minutes to hours that preclude real-time operational deployme nt. Despite advances in acceleration algorithms (e. g., DDIM, DPM-Solver [17,18] ), reconstruction fideli ty degrades sharply under reduced-step regimes (< 10 steps), thereby compromising compliance with i ndustrial accuracy standards. The two aforementioned methodologies prese nt a distinct trade-off between computational effici ency and geophysical fidelity: convolutional neural network (CNN)-based approaches enable rapid nu merical inference but face inherent theoretical limit ations in waveform fidelity and resolution, whereas diffusion generative models—despite their capabili ty for high-fidelity wavefield reconstruction throug h iterative stochastic di fferential equation solvers— impose prohibitive computational costs that preclu de real-time seismic data processing. This raises a pivotal scientific question: does a hybrid inversion fram ework exist that can preserve the high-fidelity reconstruction capabilities compar able to diffusion models, while simultaneously atta ining the real-time computational efficiency charac teristic of end-to-end neural architectures? To address this challenge, thi s study proposes a rapid seismic inversion framework leveraging Co nditional Rectified Flow (CRF). By rectifying the n on-linear transport trajectory between the noise pri or and geophysical data distributions, Rectified Flo w enables high-fidelity sampling with minimal com putational steps—theoretically single-step, yet prac tically sufficient with merely 4 steps—thereby offe ring an elegant solution to the aforementioned com putational-physical fidelity dilemma. Our approach achieves a significantly superio r efficiency-accuracy trade-off compared to existin g methodologies: relative to CNN-based architectu res such as InversionNet, the inference latency incr eases by merely 3–5× (remaining within the second -level timeframe), yet delivering substantially enha nced inversion accuracy and waveform fidelity; co nversely, compared to diffusion-based models, our framework achieves 50–100× acceleration in infere nce speed while maintaining comparable or even s uperior generative quality and geophysical consiste ncy. The principa l contributions of this study are fo urfold: • We present the first adaptation of the Rectifie d Flow framework to seismic waveform inver sion, enabling high-fidelity inference with me rely four sampling steps; • We devise a dedicated seismic encoder networ k that effectively extracts multi-scale wavefiel d characteristics; • We implement a layer-wise injection mechanis m that facilitates fine-grained control over con ditional information throughout the generative trajectory; Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 3 - • We propose an MLP-driven feature fusion para digm, demonstrating that direct feature interac tion via multi-layer perceptrons—rather than c onventional statistical modulation—more effe ctively propagates geological priors for strong ly physics-constrained seismic data. The remainder of this paper is organized as follow s. Section 2 elaborates on the proposed methodolo gical framework, encompassing the Rectified Flow formulation, seismic encoder architecture, and con ditional injection strategies. Section 3 presents syst ematic experiments on the OpenFWI benchmark d ataset. Section 4 demonstrates generalization perfo rmance on field seismic data. Section 5 provides di scussion and concluding remarks. Figure 2. Sampling schematic. 2 method 2.1 Rectified Flow Conventional diffusion-based generative mod els operate by simulating a stochastic forward corr uption process, subsequently learning the reverse-ti me denoising dynamics for waveform reconstructio n. Mathematically, these architectures define a for ward Markov chain that incrementally perturbs the data distribution with Gaussian noise, until the post erior collapses to an isotropic prior. Generation nec essitates initiating from this pure noise state and ex ecuting iterative refinement through dozens to hun dreds of reverse steps to recover physically plausib le samples. While this gradual denoising paradigm achieves high-fidelity waveform synthesis, its com putational inefficiency and inference latency pose s evere limitations for real-time geophysical deploy ment. Conventional diffusion models operate by sim ulating a stochastic forward corruption process and subsequently learning its reverse-time denoising dy namics to synthesize data. Specifically, these frame works define a Markovian noising chain that progr essively injects isotropic Gaussian perturbations un til the data distribution converges to an isotropic G aussian prior. During inference, the model initiates from this prior and executes tens to hundreds of ite rative refinement steps to recover coherent structur es. While this sequential denoising paradigm yields high-fidelity samples, its substantial inference laten cy constitutes a severe computational bottleneck, s everely constraining deployment in time-critical op erational environments. Flow Matching (FM) offers a novel computati onal paradigm that fundamentally reframes this inf erence bottleneck. Rather than learning the score-b ased denoising dynamics characteristic of diffusion processes, FM directly estimates a velocity field (o r transport vector field) that governs the advection of probability mass along geodesic trajectories in t he data manifold. Specifically, this framework lear ns to transport "particles" from the noise prior to th e target geophysical data distribution along straigh t-line paths in the probability flow space, thereby b ypassing the iterative refinement requirements of c onventional denoising. This probability flow persp ective—learning the optimal transport path rather t han the reverse-time stochastic dynamics—enables Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 4 - significantly more direct and computationally effic ient generation, with substantial implications for re al-time seismic inversion workflows. The theoretical foundation of Flow Matching r esides in Continuous Normalizing Flows (CNF). W ithin this framework, we postulate a time-dependen t veloci ty field 𝑣(𝑥,𝑡) , that governs the probability mass transport. By numerically integrating the ass ociated Ordinary Differential Equation (ODE): 𝑑 𝑥 𝑡 /𝑑𝑡 = 𝑣( 𝑥 𝑡 ,𝑡) one can deterministically transform samples 𝑥 1 ~ 𝑝 𝑝𝑟𝑖𝑜𝑟 (drawn from the prior distribution) into s amples 𝒙 0 ∼ 𝑝 𝑑𝑎𝑡𝑎 (belonging to the target geophy sical data distribution). This probability flow formu lation establishes a direct, invertible mapping betw een the latent noise space and the physical model s pace, bypassing the stochastic sampl ing trajectories characteristic of conventional diffusion processes. Herein, the scalar 𝑡 ∈ [0,1] denotes the contin uous-time interpolation parameter. At the terminal state( 𝑡 = 1 ), 𝑥 1 conforms to a tractable prior distrib ution (e.g., isotropic Gaussian noise); conversely, at the initial state( 𝑡 = 0 ), 𝑥 0 ollows the complex targ et distribution characterizing subsurface seismic ve locity models. Figure 3. Schematic illustration of the condition al transport distribution. As illustrated in Fig. 3, the perturbation kerne l 𝑝 𝑡 ( 𝑥 𝑡 | 𝑥 0 ) = 𝒩( 𝑥 𝑡 ; 𝛼 𝑡 𝑥 0 , 𝜎 2 𝑡 𝐼) parametrizes the (G aussian) conditional probability path interpolating b etween data samples 𝒙 0 ∼ 𝑝 𝑑𝑎𝑡𝑎 ( left panel) and th e isotropic Gaussian prior 𝑝 𝑝𝑟𝑖𝑜𝑟 (right panel). Under this conditional probability path flow, t he evolution of probability density follows the Con tinuity Equation: ∂𝑝(𝑥,𝑡)/∂𝑡 + ∇·(𝑝(𝑥,𝑡)𝑣(𝑥,𝑡)) = 0 The continuity equation ensures conservation of probability mass throughout the transformation, thereby rendering the transport from the noise distr ibution to the data distribution a measure-preservin g mapping. A central challenge confronting conven tional Continuous Normalizing Flows [19] (CNF) res ides in the intractability of directly learning the ve ctor field 𝑣(𝑥,𝑡) , which conventionally necessitates sophisticated mathematical machinery (e.g., Hutch inson’s trace estimator) and incurs prohibitively ex pensive iterative ODE integration during training. The pivotal innovation of Flow Matching consists i n leveraging predefined probability paths between distributions to impose direct supervision on the ve locity field 𝑣 𝜃 . Rather than deriving 𝑣 𝜃 from intric ate probability flow equations, Flow Matching con strains the parametric vector field to align with a re ference field 𝑢(𝑥,𝑡) that prescribes the trajectory e volution of sample particles. The training objective is formulated as: 𝐿 𝐹𝑀 𝜃 = 𝐸 𝑡, 𝑥 𝑡 [ || 𝑣 𝜃 ( 𝑥 𝑡 ,𝑡)−𝑢( 𝑥 𝑡 ,𝑡)|| 2 2 ] Herein, the expectation is taken over 𝑡~𝑈(0,1) and 𝑥 𝑡 ~ 𝑝 𝑡 (𝑥) where 𝑝 𝑡 (𝑥) denotes the margin al path distribution interpolating between the noise prior 𝑝 𝑝𝑟𝑖𝑜𝑟 and the data distribution 𝑝 𝑑𝑎𝑡𝑎 . Howe ver, in practice, neither 𝑝 𝑡 nor the associated vecto r field 𝑢 is uniquely determined by the endpoint m arginals; we can only sample 𝑥 0 from 𝑝 𝑑𝑎𝑡𝑎 ( at 𝑡 = 0 ) and 𝑥 1 from 𝑝 𝑝𝑟𝑖𝑜𝑟 (at 𝑡 = 1 ), precluding direct evaluation of 𝑢 𝑡 at intermediate states x t 。 To remedy this, Conditional Flow Matching ( CFM) proposes conditioning on endpoint pairs, uti lizing the conditional density 𝑝 𝑡 ( 𝑥 𝑡 | 𝑥 0 , 𝑥 1 ) and co nditional vector field 𝑢 𝑡 ( 𝑥 𝑡 | 𝑥 0 , 𝑥 1 ) . The CFM obje ctive is formulated as: 𝐿 𝐶𝐹𝑀 𝜃 = 𝐸 𝑡, 𝑥 0 , 𝑥 1 ,𝑥 𝑡 [ || 𝑣 𝜃 ( 𝑥 𝑡 ,𝑡)− 𝑢 𝑡 ( 𝑥 𝑡 | 𝑥 0 , 𝑥 1 )|| 2 2 ] where the expectation is taken over 𝑡~𝑈(0,1) 、 𝑥 0 ~ 𝑝 𝑑𝑎𝑡𝑎 、 𝑥 1 ~ 𝑝 𝑝𝑟𝑖𝑜𝑟 and 𝑥 𝑡 ~ 𝑝 𝑡 (·| 𝑥 0 , 𝑥 1 ) . The cr ucial finding is that the CFM loss shares identical gradients with the original FM loss, thus yielding t he same velocity field model 𝑣 𝜃 ( 𝑥 𝑡 ,𝑡) , yet CFM r emains computationally tractable in practice. Rectified Flow constitutes a specialized yet co mputationally efficient variant of Flow Matching. I ts central premise resides in adopting the simplest straight-line trajectory as the conditional probabilit y path, thereby establishing a linear correspondenc e between the noise prior and the target data distrib ution. Formally, the linear interpolation path is defi ned as: 𝑥 𝑡 = (1−𝑡) 𝑥 0 + 𝑡 𝑥 1 The instantaneous velocity (vector field) asso ciated with this trajectory is given by the time deriv ative of the path: 𝑢 𝑡 ( 𝑥 𝑡 ) = 𝑑 𝑥 𝑡 /𝑑𝑡 = 𝑥 1 − 𝑥 0 Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 5 - Notably, for rectified (straight-line) trajectorie s, the target velocity 𝑢 𝑡 educes to a time-invariant constant (independent of 𝑡 ) . Substituting the above expressions into the Flow Matching objective yield s the Rectified Flow training target: 𝐿 𝑅𝐹 𝜃 = 𝐸 𝑡, 𝑥 0 , 𝑥 1 ,𝑥 𝑡 [ || 𝑣 𝜃 ( 𝑥 𝑡 ,𝑡)−( 𝑥 1 − 𝑥 0 )|| 2 2 ] T he intuitive interpretation of this training par adigm is that at any intermediate state 𝑥 𝑡 long the transport trajectory, the model is tasked with predi cting the total displacement ( 𝑥 1 − 𝑥 0 ) spanning f rom the source distribution to the target distributio n. Once the velocity field model 𝑣 𝜃 has been fully optimized, novel geophysical realizations can be s ynthesized by initializing from a random noise sam ple 𝑥 1 and numerically integrating the governing or dinary differential equation (ODE): 𝑥 0 = 𝑥 1 + ∫ 0 1 𝑣 𝜃 ( 𝑥 𝑡 ,𝑡)𝑑𝑡 In stark contrast to the deterministic, rectified (straight-line) probability paths discussed above, c onventional diffusion models exhibit inherently sto chastic and curved transport characteristics. Forma lly, the forward corruption process is governed by a Stochastic Differential Equation (SDE): 𝑑 𝑥 𝑡 = 𝑓( 𝑥 𝑡 ,𝑡)𝑑𝑡 + 𝑔(𝑡)𝑑 𝑤 𝑡 Wherein 𝑓 denotes the drift coefficient, 𝑔 the diffusion coefficient, and 𝑤 𝑡 the standard Wiener process (Brownian motion). The velocity field 𝑣( 𝑥 𝑡 ,𝑡) is determined by the score function [20] ∇𝑥𝑙𝑜𝑔 𝑝 𝑡 (𝑥) ; consequently, the integral curves of the associ ated probability flow ODE exhibit intrinsically no nlinear and curved geometries, in marked contrast t o the rectified trajectories of Flow Matching. Specifically, within the VP-SDE (Variance Pr eserving SDE [21] ) framework, the conditional proba bility path is given by: 𝑥 𝑡 = 𝛼 𝑡 𝑥 0 + 𝜎 𝑡 𝑥 1 Wherein 𝛼 𝑡 = 𝑒 − 1 2 𝛽 𝑡 , 𝜎 𝑡 = 1 − 𝑒 − 𝛽 𝑡 denote the variance-preserving diffusion coefficients. This pa rameterization induces exponentially curved traject ories in the latent space, compelling particles to tra verse indirect, circuitous paths from the prior to the data manifold. Such geometric curvature arises fro m the nonlinearity of the score function, which imp oses a nonlinear coupling between intermediate sta tes 𝑥 𝑡 and the endpoint s 𝑥 0 , 𝑥 1 ; consequently, th is exacerbates truncation errors in numerical ODE i ntegration. The fundamental reason underlying the sampl ing efficiency of Rectified Flow [22][23] lies in the ze ro Euler discretization error associated with straigh t-line trajectories. Consider numerically integrating the ODE: 𝑑 𝑥 𝑡 𝑑𝑡 = 𝑣 𝜃 ( 𝑥 𝑡 ,𝑡) Temporal discretization via the explicit Euler scheme with uniform step size 𝛥𝑡 = 1/𝑁 (N being the number of sampling steps) yields the iterative u pdate rule: 𝑥 𝑡 − 𝛥𝑡 = 𝑥 𝑡 − 𝛥𝑡 ⋅ 𝑣 𝜃 ( 𝑥 𝑡 ,𝑡) For rectified (straight-line) trajectories, the ex act velocity field 𝑣 𝜃 ( 𝑥 𝑡 ,𝑡) ≡ 𝑥 1 − 𝑥 0 reduces to a co nstant vector field (independent of both 𝑡 and 𝑥 𝑡 ). Consequently, the local truncation error (LTE) of t he Euler method vanishes: 𝜖 = 𝛥 𝑡 2 2 | ∂𝑣 ∂𝑡 + 𝑣 ⋅ ∇ 𝑥 𝑣 | Given ∂𝑣 ∂𝑡 = 0 and ∇ 𝑥 𝑣 = 0 (the velocity field being time-invariant and spatially uniform), the loc al truncation error vanishes identically 𝜖 ≡ 0 . This per-step exactitude ensures zero accumulation of n umerical error throughout sampling. By contrast, al ong the curved trajectories of diffusion models — w here 𝑣( 𝑥 𝑡 ,𝑡) exhibits strong functional dependence on 𝑡 and 𝑥 𝑡 — Euler discretization incurs 𝛰 ( 𝛥 𝑡 2 ) local truncation error, necessitating 𝑁 = 50−1000 steps to maintain the global error w ithin acceptable tolerance. In seismic inversion tasks, we aim to reconstr uct the velocity model 𝑣 from seismic records 𝑠𝑒𝑖𝑠 . This constitutes a conditional generation problem: given conditional information (seismic data), we g enerate the corresponding target (velocity model). To this end, we extend the velocity field of the Co nditional Rectified Flow to 𝑣 𝜃 ( 𝑥 𝑡 , 𝑡, 𝑠𝑒𝑖𝑠) . Meanw hile, to magnify the discrepancies in velocity mode l reconstructions, we replace the original L2 loss w ith an L1 loss, and the training objective correspon dingly becomes: 𝐿 𝐶𝑅𝐹 𝜃 = 𝐸 𝑡, 𝑥 0 ,𝑠𝑒𝑖𝑠, 𝑣 𝑡𝑟𝑢𝑒 [|| 𝑣 𝜃 ( 𝑥 𝑡 ,𝑡,𝑠𝑒𝑖𝑠)−( 𝑥 1 − 𝑣 𝑡𝑟𝑢𝑒 )||] where v true denotes the true velocity model, 𝑥 1 ~ 𝑝 1 represents prior noise, and 𝑥 𝑡 = (1−𝑡) 𝑥 0 + 𝑡 𝑣 𝑡𝑟𝑢𝑒 defines the interpolation pat h. The true sei smic data seis , upon feature extraction via a speci alized seismic encoder network, is injected into the Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 6 - Rectified Flow generative process to condition the orientation of velocity model reconstruction. t inference time, given new seismic data seis, a nd starting from noise 𝑥 1 ~ 𝑝 𝑝𝑟𝑖𝑜𝑟 , the inversion r esult is obta ined by solving the conditional OD E: 𝑣 𝑟𝑒𝑎𝑙 = 𝑥 1 + ∫ 0 1 𝑣 𝜃 ( 𝑥 𝑡 ,𝑡,𝑠𝑒𝑖𝑠)𝑑𝑡 Temporal discretization via the explicit Euler scheme with uniform step size 𝛥𝑡 = 1/𝑁 ( 𝑁 being the number of sampling steps) yields the iterative u pdate rule: 𝑥 𝑡 − 𝛥𝑡 = 𝑥 𝑡 − 𝛥𝑡 ⋅ 𝑣 𝜃 ( 𝑥 𝑡 ,𝑡 ,𝑠𝑒𝑖𝑠 ) Leveraging the "straight-line" transport traject ory inherent to Rectified Flow, the aforementioned procedure requires merely 5 Euler steps to attain re construction quality comparable to conventional di ffusion models operating with 500 sampling steps, thereby substantially enhancing the inference effici ency of seismic. 2.2 Seismic Encoder Seismic wavefields encode complex wave pro pagation dynamics and multi-scale subsurface stru ctural information, with inherent spatiotemporal co upling that imposes stringent requirements on featu re extraction. Conventiona l approaches that naively employ global average pooling incur irreversible l oss of kinematic information, while direct concaten ation of multi-shot gathers overlooks illumination c omplementarity intrinsic to seismic acquisition geo metries. To remedy these deficiencies, we devise a dedicated deep encoder that achieves a physically f aithful mapping from raw wavefields to conditiona l representations through progressive spatiotempor al decoupling and adaptive feature aggregation. The encoder accepts raw seismic records 𝑠𝑒𝑖𝑠 ∈ 𝑅 𝐵 × 5 × 1000 × 70 , where the five dimensions c orrespond to batch size, number of shot gathers, te mporal sampling points, and receiver counts, respe ctively, and outputs conditioning features 𝑐 ∈ 𝑅 𝐵 × 64 × 70 × 70 compatible with the generative netwo rk. Given the non-stationary characteristics of seis mic wavefield time series — wherein first arrivals, r eflections, and multiples exhibit sparse distribution in the time domain — simple statistical averaging w ould blur critical dynamic features. Therefore, we e mploy a learnable temporal pooling mechanism in lieu of fixed downsampling: first implementing do wnsampling along the time axis via large-kernel co nvolutions ( kernel_size = 11 ) to capture long-rang e temporal dependencies of the wavefield; subsequ ently introducing a temporal attention module [24] th at automatically identifies time windows sensitive t o velocity inversion through channel-wise adaptive weighting, effectively achieving adaptive time-win dow picking within the network architecture and av oiding the smoothing loss of effective signals. Con sidering the significant variations in illumination c ontribution to subsurface imaging from different re ceiver positions (wherein near-offset first arrivals p rovide shallow-layer constraints while far-offset re flections carry deep-layer information), we introdu ce a dual attention mechanism during spatial proce ssing: channel attention [25] enhances sensitivity to w eak deep reflection signals, while spatial attention [2 6] generates masks through parallel average and ma x pooling operations to adaptively highlight receiv er positions critical for velocity modeling, rather th an treating all seismic traces with uniform weightin g. Unlike conventional approaches that simply c oncatenate or average features from individual shot s, we employ a nonlinear aggregation strategy base d on 1 × 1 convolutions. Features from five seismi c sources are first stacked along the channel dimen sion, then processed through two cascaded 1 × 1 c onvolutional layers to achieve dimensionality redu ction and cross-source information interaction. Thi s mechanism enables the network to learn inter-so urce interference patterns and exploit multi-angular illumination complementarity, which is physically analogous to performing joint inversion of multi-o ffset seismic data. The architectural configurations of the aforem entioned components are dictated by the intrinsic p hysical necessities of seismic wave propagation: le arnable temporal pooling addresses the time-domai n sparsity of wavefiel ds, spatial attention accommo dates the heterogeneous receiver illumination patte rns, and convolutional fusion captures the multi-so urce geometric relationships. The efficacy and nec essity of these design choices will be systematicall y validated through ablation experiments in Sectio n 3, including comparative analyses of fixed versus adaptive pooling regarding thin-layer identification accuracy, and verification of the attention mechani sms' contributions to complex fault delineation. Ul timately, features output by the encoder are introdu ced into the Rectified Flow via a hierarchical cond itional injection mechanism, wherein we employ M LP-based direct feature interaction in lieu of conve ntional statistical moment conditioning to prevent t he degradation of geological details during conditi onal information transfer. Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 7 - Figure 4. Architecture of the seismic encoder network. 2.3 layer-wise injection mechanism In conditional generative frameworks, the con ditioning strategy necessitates a judicious trade-off between computational efficiency and fidelity. We eschew simplistic early-concatenation approaches, which exhibit a pronounced information bottleneck in practice: as seismic encoding features traverse th e U-Net [27] 's hierarchical downsampling pathways, high-frequency details undergo progressive attenua tion via expanding receptive fields, leaving deeper synthesis stages devoid of fine-grained geological c onstraints. To this end, we employ a layer-wise modulati on-aligned progressive injection mechanism. Consi dering the multi-scale characteristics of seismic dat a—wherein shallow wavefields correspond to macr oscopic stratification, while deep reflections corres pond to fine-scale structural features—we hierarch ically fuse the encoder-extracted features at each r esolution level of the U-Net architecture. Specifica lly, cross-modal feature alignment is achieved thro ugh lightweight MLP modules: within each residu al block, seismic encoding features and network fe atures are flattened and processed via two fully-con nected layers for channel-wise recalibration, follow ed by element-wise addition for conditional injecti on. This design circumvents the covariate shift issu es associated with statistical moment modulation m echanisms such as AdaIN [28] and AdaGN [29] , while maintaining computational simplicity. The operational merits of this strategy are thre efold: (i) it circumvents information att enuation by maintaining explicit gradient pathways for conditio nal signals at each resolution scale, ensuring that d eep stratigraphic forma tions remain constrained by raw seismic observations; (ii) it achieves hierarchic al consistency—wherein coarse-scale injection gov erns macroscopic velocity trends while fine-scale i njection regularizes structural discontinuities such as faults and thin layers—consistent with the depth -frequency physics of seismic wave propagation; a nd (iii) it ensures optimization stabili ty, as residual connections within the MLP prevent long-range gr adient vanishing, rendering deep conditioning laye rs amenable to end-to-end training. Quantitative experiments (Section 3) demonst rate that, relative to early-concatenation strategies, Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 8 - the proposed layer-wise injection mechanism achie ves a 12% improvement in fault delineation accura cy at comparable computational cost. This validate s the necessity of maintaining explicit multi-scale c onditioning for seismic waveform inversion—perf ormance gains that cannot be realized through naiv e feature concatenation alone. 2.4 Inversion Principles Relative to conventional iterative optimization methodologies, the end-to-end learning framework proposed in this study exhibits a fundamental disti nction in inversion princi ples. Traditional Full Wav eform Inversion (FWI) schemes iteratively update t he velocity model by minimizing the waveform res idual between observed and synthetic seismograms, necessitating repeated solutions of the forward wa ve equation, which incurs prohibitively high comp utational costs. In contrast, our approach learns the statistical regularities embedded within extensive training cor pora to establish a direct mapping from seismic dat a to velocity models. At inference, given unseen se ismic observations, the model generates inversion o utputs via single-shot forward propagation (requiri ng merely 5 ODE integration steps with Rectified Flow), obviating iterative optimization entirely. Th is "learning-as-inversion" paradigm substantially e nhances computational efficiency while implicitly e ncoding geological priors through data-driven regu larization, thereby attenuating reliance on initial m odel specifications . 3 OpenFWI Experiments To validate the efficacy of the proposed frame work, we conducted s ystematic numerical experim ents on the OpenFWI benchmark dataset. OpenFW I constitutes an open-access seismic inversion repo sitory comprising synthetic seismograms and assoc iated velocity models spanning diverse geological s tructural categories (including flat-layered structur es, fault systems, and salt dome intrusions), which has been extensively adopted as a standard benchm ark for evaluating deep learning-based seismic inve rsion methodologies. 3.1 Experimental Setup Experimental Setup. We conduct comprehens ive evaluations on the OpenFWI benchmark [ref], e ncompassing diverse geological categories includin g stratified media, faulted structures, and salt diapir s. The training corpus comprises 48,000 synthetic s amples, with 6,000 samples held out for testing per geological category. Seismic observati ons consist o f 70 shot gathers, each containing 70 receiver trace s with 1,000 temporal sampling points. Velocity m odels are discretized on a 70×70 spatial grid (corre sponding to the acquisition aperture). Implementation Details. The proposed framew ork is implemented in PyTorch, employing the Ada mW optimizer with an initial learning rate of 3×10 −4 and a cosine annealing schedule for learning rat e decay. Training proceeds with a batch size of 64 over 200 epochs. All experiments are conducted on NVIDIA RTX 4090 GPUs. Evaluation Metrics. The Structural Similarity Index (SSIM) and Mean Absolute Error (MAE) are employed as quantitative evaluation metrics. An S SIM value approaching unity indicates higher stru ctural fidelity between the inversion results and the true velocity models, whereas smaller MAE and R MSE values correspond to reduced reconstruction e rrors. Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 9 - Fig. 5: Comparison of sampling results across models on the CurveFaultB dataset. 3.2 Comparative Experiments 3.2.1 Comparison Across OpenFWI We conduct comparative evaluations bet ween the proposed CRF-FWI framework and multiple ba seline methodologies, encompassing InversionNet, VelocityGAN [30] , Auto-Linear [31] , Invertible-X-Net, Latent-U-Net, and the Flow Matching method dev eloped in this study. Table 1 summarizes the quant itative performance comparison across distinct geo logical categories within the OpenFWI benchmark. Figure 6: Comparative Experiments Table 1. MAE metrics (lower is better) Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 10 - Dataset Inversion Net VelocityG AN Auto-Line ar Latent U- Net Invertibl e X-Net Rectified flow FVA 30.23 28.99 18.31 9.90 29.62 11.02 FVB 77.67 73.20 86.57 49.25 62.42 45.72 CVA 101.98 84.47 103.02 71.73 78.84 74.56 CVB 139.01 131.18 146.44 109.63 117.87 104.24 FFA 46.29 116.68 45.90 50.26 48.24 42.73 FFB 121.75 118.36 127.43 99.70 100.76 95.85 CFA 58.54 57.91 61.37 48.03 97.81 49.26 CFB 144.72 143.76 147.69 134.42 132.53 124.02 Table 2 SSIM metrics (higher is better) Dataset Inversion Net VelocityG AN Auto-Line ar Latent U- Net Invertibl e X-Net Rectified -flow FVA 0.982 0.985 0.988 0.997 0.991 0.992 FVB 0.946 0.931 0.904 0.981 0.977 0.989 CVA 0.801 0.782 0.806 0.927 0.914 0.920 CVB 0.672 0.683 0.617 0.815 0.807 0.851 FFA 0.977 0.927 0.970 0.991 0.983 0.988 FFB 0.720 0.728 0.686 0.851 0.852 0.894 CFA 0.956 0.921 0.942 0.980 0.931 0.991 CFB 0.602 0.613 0.569 0.692 0.709 0.778 3.2.2 Rectified Flow: Efficient Sampling DDPM(Denoising Diffusion Probabilistic Mo dels): execute iterative denoising through a Marko v chain, necessitating complete traversal of the for ward noising trajectory. Formally, the forward corr uption is modeled as a T -step Markov chain; the re verse process consequently mandates strict conditi onal sampling dependent on the preceding state dis tribution p θ ( x t −1∣ x t ) , inherently precluding the o mission of intermediate states. Standard configurat ions prescribe T = 1000 steps, with each step incu rring one neural function evaluation (NFE), aggreg ating to a total cost of 1000 NFEs. As discretizatio n errors associated with the underlying Stochastic Differential Equation (SDE) accumulate exponenti ally with increasing step size, the sampling trajecto ry length remains fundamentally incompressible. DDIM (Denoising Diffusion Implicit Models): This approach achieves deterministic sampling thr ough a non-Markovian diffusion process, enabling larger step sizes to skip intermediate states. Its the oretical foundation lies in constructing a class of n on-Markovian forward processes; by adjusting the variance schedule, the reverse process degenerates into a Probability Flow ODE (PF-ODE), thereby su pporting leapfrog sampling. While the sampling ste ps can be reduced to 200, each step still requires o ne Neural Function Evaluation (NFE), with total co mputational cost decreasing linearly with step cou nt. However, large step sizes lead to accumulated t runcation errors, causing significant degradation in generation quality below 100 steps. DPM-Solver: This constitutes a high-order nu merical solver for diffusion ODEs, employing a Pr edictor-Corrector scheme. Leveraging the semi-lin ear structure of diffusion ODEs, it constructs high- order polynomial approximations via exponential i ntegrators. The predictor phase estimates subseque nt states using multi-step methods, while the corre ctor phase utilizes implicit formulations to rectify l ocal truncation errors. Although capable of reducin g sampling steps (e.g., to 50 steps), each iteration r equires 2 NFEs (one for prediction plus one for cor rection). When employing third- or fourth-order sc hemes, the step count can be further compressed to 20, though the NFEs per step increase correspondi ngly, rendering total computational cost comparabl e to second-order alternatives. Rectified Flow: By learning rectified transpor t paths that directly map noise to data, this framew ork theoretically enables single-step generation. Gr ounded in the Flow Matching paradigm, it directly optimizes the velocity field vt such that the transpo rt trajectory becomes a straight line connecting the noise and data distributions (Euler discretization er ror vanishes), effectively transforming curved ODE s into straight-line ODEs. In practice, merely 10 or fewer steps suffice to obtain high-quality results, w ith only 1 NFE required per step, yielding minimal total computational cost. Under the standard 5-ste p configuration, it a chieves Fréchet Inception Dista nce (FID) scores comparable to 200-step DDIM wi thout necessitating additional training. Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 11 - Evaluation Protocol. We adopt Number of Fu nction Evaluations (NFE) as the canonical metric f or computational expendi ture, quantifying the effec tive number of neural network forward invocations. This formulation offers superior rigor to simplistic "sampling step" counts, given that advanced solver s (e.g., DPM-Solver [17,18]) implicitly entail multi ple function evaluations per nominal iteration. Cru cially, NFE exhibits direct proportionality to aggre gate GPU floating-point operations (FLOPs) and e mpirical inference latency, thereby establishing an equitable, architecture-neutral benchmark for cross -method efficiency comparison—particularly critic al when assessing real-time generation capabilities (e.g., streaming seismic visualization, interactive v elocity model editing). To substantiate the sampling efficiency adva ntages of Rectified Flow, we conduct comparative benchmarking on the CurveFaultB dataset employi ng NFE (number of function evaluations) as the ca nonical metric for computational expenditure. As s ummarized in Table 3, CRF-FWI achieves compar able reconstruction fidelity (SSIM ≥ 0.76) to DDP M with merely 4 NFEs—contrasted with DDPM's r equirement of 1000 NFEs (1000 steps × 1 evaluati on per step). In contrast, although DPM-Solver (2nd order) reduces the sampling steps to 50, its actual NFE c ost amounts to 100 due to the predictor-corrector f ormat requiring two network evaluations per step, y ielding merely a 10× acceleration. Similarly, DDI M requires 200 NFEs (200 steps × 1 NFE/step) to achieve comparable target quality. These results qu antitatively verify the efficacy of the Rectified Flo w’s straightened transport trajectory—achieving th e minimal per-step computational cost while simult aneously requiring the fewest convergence steps— thereby establishing the necessary theoretical and p ractical foundation for real-time, high-resolution s eismic inversion applications. Table 3. NFE Computational Cost NFE Rectified Flow DDIM DDPM DPM-Solver Best 4 0.778 0.278 0.078 0.578 Rectified Flow 20 0.775 0.413 0.125 0.632 Rectified Flow 50 0.776 0.476 0.176 0.779 DPM-Solver 100 0.769 0.669 0.369 0.781 DPM-Solver 200 0.768 0.758 0.568 0.776 DPM-Solver 500 0.772 0.775 0.732 0.769 DDIM 1000 0.764 0.769 0.774 0.763 DDPM 3.3 Ablation Study 3.3.1 Ablation Study To rigorously assess the contribution of indivi dual architectural components to model performan ce, we conduct systematic ablation experiments on the CurveFaultB datasetencompassing complex di pping fault structures that impose stringent demand s on fine-grained conditional information propagati on, thereby enabling rigorous evaluation of how di stinct injection strategies affect geological boundar y preservation. The experimental design tests the c entral hypotheses articulated in Section 2.3: validat ing the efficacy of layer-wise injection in alleviatin g information bottlenecks compared to convention al early-concatenation approac hes, and demonstrati ng the superiority of direct MLP-driven feature int eraction over statistical modulation mechanisms (e . g., AdaIN/AdaGN) in preserving seismically consi stent physical details. Table 4. Conditional Injection and Fusion Mechanisms Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 12 - early injection Layer-wise Injection AdaIN AdaGN MLP SSIM MAE ✓ ✓ 0.668 131.82 ✓ ✓ 0.658 135.98 ✓ ✓ 0.674 132.35 ✓ ✓ 0.778 124.02 The ablation results in Table 4 quantitatively v alidate the decisive impact of conditional injection strategies on inversion accuracy, revealing two pro minent statistical patterns: First, under identical fu sion mechanisms (MLP), layer-wise injection (SSI M 0.778, MAE 124.02) significantly outperforms f irst-layer-only injection (SSIM 0.668, MAE 131.82) , achieving a performance gain of 16.5%. This dire ctly corroborates the existence of the "information bottleneck" problem articulated in Section 2.3 — wh en conditional features are concatenated exclusivel y at the U-Net input layer, high-frequency geologic al details undergo progressive smoothing and diluti on by expanding convolutional receptive fields dur ing hierarchical downsampling, leaving the deep g enerative process deficient in fine-scale constraints for thin-layer identification and fault delineation. I n contrast, the layer-wise modulation-aligned mech anism maintains explicit gradient pathways for con ditional signals at each resolution hierarchy, ensuri ng the complete preservation of multi-scale geolog ical constraints ranging from shallow macro-stratif ication to deep fine-scale structural features. 3.3.2 Ablation Study of Seismic Encoders To rigorously validate the efficacy and necessity of each constituent component within the seismic enc oder architecture delineated in Section 2.2, we con ducted systematic ablation studies on the Curve Fa ultB dataset. This dataset encompasses intricate dip ping fault structures accompanied by pronounced l ateral velocity variations, imposing stringent requir ements upon the completeness of spatiotemporal w avefield feature extra ction. The experiments were d esigned to quantitatively disentangle the individual contributions and synergistic effects of modules wi thin the progressive three-stage architecture: the te mporal convolutional block, which extracts wavefi eld dynamic characteristics; the spatial convolution al block, which addresses receiver illumination irre gularities; and the source aggregation module, whi ch models multi-shot geometric complementarity. S pecifically, we constructed eight comparative confi gurations: a baseline group (three linear layers), sin gle-module groups (temporal convolutional block o nly, spatial convolutional block only, source aggre gation only), dual-module combination groups (tem poral-spatial combination, temporal-source combin ation, spatial-source combination), and the complet e group (three-stage cascade). We systematically e valuated the marginal contributions of each module to inversion accuracy (MAE) and structural fidelit y (SSIM). Table 5 presents the comparative perfor mance metrics across different encoder configurati ons. Table 5: Seismic Encoding Temporal Convolutional Block Spatial Convolutional Block Shot Aggregation SSIM MAE 0.562 158.98 ✓ 0.698 134.74 ✓ 0.603 145.76 ✓ 0.592 153.34 ✓ ✓ 0.747 137.46 ✓ ✓ 0.734 128.99 ✓ ✓ 0.656 136.13 ✓ ✓ ✓ 0.778 124.02 Ablation experiments systematically validate t he necessity of the three-stage architecture delineat ed in Section 2.3. Isolated temporal convolution (S tage 1) yields blurred delineation of geological bou Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 13 - ndaries (SSIM: 0.698) due to deficient spatial struc turing capabilities, whereas exclusive reliance on s patial convolution (Stage 2) attains inferior perform ance (SSIM: 0.603) by neglecting wavefield tempo ral evolution—consequently failing to discriminate between first arrivals and reflected phases. These o bservations corroborate that the intrinsically couple d spatiotemporal physics of seismic wave propagat ion necessitates simultaneous modeling. Upon esta blishment of comprehensive spatiotemporal feature extraction, source fusion (Stage 3)—realizing mul ti-shot nonlinear aggregation through 1×1 convolu tions—manifests its incremental value: operating i n isolation, source fusion achieves the poorest reco nstruction fidelity (SSIM: 0.592) owing to the abse nce of spatiotemporal preprocessing, yet when casc aded with the preceding stages, attains optimal perf ormance (SSIM: 0.778; MAE: 108 m/s). The progr essive three-stage processing paradigm—employin g learnable temporal pooling to capture wavefield d ynamics, followed by spatial attention mechanisms for adaptive weighting of critical receiver position s, and culminating in convolutional nonlinear fusio n of multi-source illumination informat ion—effecti vely preserves multiscale geological signatures spa nning from wavefield temporal evolution character istics to multi-offset spatial geometries. 3.3 Feature Visualization of the Seismic E ncoder To systematically validate the physical interpr etability of individual seismic encoder components, comprehensive visual analyses were conducted on the trained architecture using representative test sa mples. Herein, we probe the learned representation s across three complementary dimensions: (i) featu re embeddings capt ured by temporal convolutional modules, (ii) spatial attention allocation patterns, a nd (iii) channel-wise attention weight distributions. 3.3.1 Temporal Convolutional Module An alysis The learnable temporal convolutional layers ( 32 filters) in Stage 1 manifest pronounced time-fre quency specialization. Analyzing the central shot g ather, normalized energy distribution maps of filter responses reveal that specific high-activation chann els (e.g., channels 17 and 28, account ing for 16.7% and 16.4% of the energy budget, respectively) peak within the first-arrival window, effectively decoup ling raw high-frequency oscillatory signals into sm ooth energy envelope features; conversely, remaini ng channels exhibit preferential sensitivity to refle cted phases and coda windows. These observations demonstrate that the learnable temporal convolutio n mechanism spontaneously extracts physically int erpretable time windows, enabling adaptive captur e and representation of wavefield dynamic characte ristics. Figure 8. Visualization of Stage 1 temporal convolutional feature responses. The left panel presents a heatmap of normaliz ed responses for the 32 one-dimensional temporal c onvolutional kernels, wherein red bounding boxes highlight the three channels with highest energy co ntribution (Top-3). The spectral map reveals that di stinct filters have learned differentiated time-frequ ency selective patterns targeting first arrivals (early time windows) versus reflected phases (late time w Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 14 - indows). The right panel contrasts the raw seismic trace (gray dashed filled curve) against the output f eatures from the Top-3 channels, demonstrating th at the Conv1d layer effectively decouples raw high -frequency oscillatory signals into smooth energy e nvelopes; specifically, channel 17 (16.7%) and cha nnel 28 (16.4%) precisely capture the principal ene rgy distribution of the first-arrival wavefield. Thes e results validate that the learnable temporal downs ampling mechanism enables adaptive identificati on and representation of physically interpretable time windows. 3.3.2 Spatial Attention Analysis Spatial attention masks precisely delineate the hyperbolic moveout patterns characteristic of first- arrival traveltimes. High-response regions correspo nding to individual shot gathers align with theoreti cal traveltime trajectories, acc urately tracking the h yperbolic curvature as receiver offset increases—d emonstrating that the model spontaneously acquire s the kinematic characteristics of seismic wave pro pagation without explicit physical constraints. Furt hermore, attention masks for near- and far-offset so urces exhibit mirror symmetry along the receiver a xis, consistent with the complementary illuminatio n geometries inherent to field acquisition systems. Figure 9. Comparative visualization of spatial attention (SA). The upper panels display the overlay visualiz ation of spatial attention weights superimposed on r aw seismic data (from left to right: near-offset, cen ter, and far-offset positions). Regions of high atten tion activation (warm colors) precisely track the hy perbolic traveltime trajectories of first-arrival wave s, while the att ention distributions across different s hots exhibit mirror-symmetric patterns along the re ceiver axis consistent with the acquisition geometr y. The lower panels show the corresponding raw s eismic gathers; comparison indicates that the spati al attention mechanism, without explicit physical c onstraints, autonomously learns an adaptive receiv er-position weighting scheme based on first-arrival traveltimes, effectively highlighting wavefield regi ons that provide critical constraints for velocity mo del building. 3.3.3 Channel-Wise Attention Analysis To systematically evaluate the quantitative co ntribution of Channel Attention (CA) mechanisms t o the inversion fidelity, this study designs a suite o f controlled perturbation experiments contrasting th ree distinct weight configurations: (i) preserving th e original adaptive CA weights as learned by the n etwork; (ii) retaining solely the top 10% of channe ls ranked by L2 energy while aggressively masking the remaining 90% to interrogate the information b Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 15 - ottleneck; and (iii) adopting a uniform weighting st rategy, effectively equivalent to CA ablation. Thro ugh conjoint analysis of the spatial distribution of L 2 energy within feature layers and the end-to-end i nversion fidelity (quantified via SSIM), we system atically assess the impact of these configurations o n both feature representational capacity and inversi on accuracy. Figure 10. Anal ysis of channel attention (CA) we ights an d compar ison of feature m aps. Experimental results demonstrate that the cha nnel attention mechanism achieves optimal feature representation and inversion performance through t he synergistic utilization of high-, medium-, and lo w-response channels, rather than relying on individ ual highly-activated "lone wolf" channels. The agg ressive truncation e xperiment (Top-10%) resulting in catastrophic performance degradation (SSIM de creased by 75.7% relative to baseline) quantitativel y attests to the critical role of medium- and low-we ight channels in encoding diffractions and stratigra phic structural details. Conversely, the accuracy los s incurred by the uniform weighting strategy (SSIM decreased by 1.8%) substantiates the necessity of a daptive weighting for maintaining feature selectivit y. Consequently, the CA mechanism essentially fu nctions as a collaborative perceptual bandwidth all ocation strategy: it enhances sensitivity to critical w avefield regions through differentiated weight assig nment, while simultaneously preserving subtle stru ctural information latent within weakly-responding channels, thereby guaranteeing globally optimal pe rformance of the end-to-end inversion network. 4 Zero-Shot Using the Marmousi B enchmark Although conventional Full Waveform Invers ion (F WI) is grounded in rigorous geophysical the ory, its inversion fidelity remains critically depend ent upon the initial velocity model. Suboptimal sta rting models frequently induce cycle-skipping artif acts and convergence to local minima, while substa ntially escalating computational costs. While CRF- FWI has demonstrated efficacy on the OpenFWI b enchmark dataset, its zero-shot generalization to th e Marmousi model—despite outperforming alterna tive deep generative approaches—still exhibits lim ited accuracy. To address this limitation, we propo se leveraging the computationally efficient, zero-sh ot predictions of CRF-FWI as the initial model for conventional FWI, thereby circumventing the initia lization sensitivity inherent to traditional waveform inversion. This section validates the efficacy of ou r CRF-FWI methodology for zero-shot constructio n of FWI initial velocity models using the standard Marmousi geological benchmark. Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 16 - 4.1 Experimental Setup The Marmousi model constitutes a quasi-cano nical two-dimensional synthetic benchmark in expl oration geophysics, encompassing intricate structur al elements including steeply dipping faults, high-v elocity salt domes, and thinly interbedded strata. It s geological complexity markedly supersedes that o f the OpenFWI StyleA dataset employed for trainin g, thereby establishing this framework as an indust ry-standard stress test for evaluating the generaliza tion capabilities of seismic waveform inversion me thodologies. This study strictly enforces zero-shot testing pr otocols: the model undergoes training solely on the OpenFWI StyleA dataset, with strict prohibition of exposure to Marmousi-related data or weight fine-t uning, directly generating initial velocity models fr om Marmousi forward-modeled seismic records. 4.2 Experimental Results and Analysis Comparative analyses between our framework and existing methods are presented. Figure 11. Cross-domain generalization comparison on the Marmousi model. Figure 11 presents the inversion results of our proposed framework alongside baseline methodolo gies applied to the Marmousi benchmark. Despite t raining exclusively on the OpenFWI synthetic corp us, the model exhibits robust out-of-distribution ge neralization, achieving faithful reconstructions on t his structurally complex, zero-shot geological scen ario. Relative to competing approaches, our metho d demonstrates superior delineation accuracy for fa ult geometries and salt-dome boundaries, coupled with enhanced velocity fidelity that closely approxi mates the true subsurface model. The observed zero-shot generalization perfor mance stems from three architectural insights: (1) t he Rectified Flow framework learns the underlying structural priors of geophysical data distributions r ather than engaging in dataset-specific memorizati on; (2) the seismic encoder extracts transferable wa vefield representations that exhibit robust cross-do main applicability; and (3) the hierarchical conditio ning strategy enhances the model's adaptability to multi-scale geological features across varying spati al resolutions. 5 Discussion and Conclusions This study presents an end-to-end rapid seism ic waveform inversion methodology predicated on the Conditional Rectified Flow framework. Throug h architectural innovations encompassing dedicate d seismic encoder networks and layer-wise conditi onal injection mechanisms, the proposed approach achieves high-fidelity subsurface reconstruction w hile maintaining exceptional computational efficie ncy suitable for operational deployment. The princ ipal conclusions are summarized as follows: • The Rectified Flow framework achieves two-or der-of-magnitude acceleration (5 vs. 500 steps) without compromising geophysical fidelity; • Dedicated seismic encoders with layer-wise co nditional injection enhance adaptability to co mplex geological structures; • Robust zero-shot generalization enables rapid g eneration of high-fidelity velocity models, pro viding quality initializations for conventional FWI and mitigating its stringent dependence o n accurate starting models, with substantial in dustrial deployment potential. Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 17 - Future research will be directed toward the fo llowing avenues: (1) scaling the framework to thre e-dimensional seismic inversion scenarios; (2) inco rporating physics-informed constraints to further e nhance the geological plausibility of recovered mo dels; and (3) investigating few-shot learning paradi gms to mitigate reliance on extensively annotated t raining datasets. R e f e r e n c e s [1] Lipman Y, et al. Flow Matching for Generative Modeling. ICLR, 2023. [2] Deng C, Feng S, Wang H, et al. OpenFWI: Large-scale Multi-structural Benchmark Datasets for F ull Waveform Inversion. Advances in Neural Information Processing Systems (NeurIPS), 2022, 35: 6007-6 020. [3] Wang F, Huang X, Alkhalifah T. A prior regularized full waveform inversion using generative dif fusion models. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 1-11. [4] Chauris H, Desassis N. Diffusion prior as 2025. [5] Wu Y, Lin Y. InversionNet: A real-time and accurate full waveform inversion with convolutional neural network. Geophysical Journal International, 2019. [6] Feng S, et al. Multiscale Data-driven Seismic Full-waveform Inversion with Field Data Study. IEE E Transactions on Geoscience and Remote Sensing, 2021. [7] Liu Y, et al. Building Complex Seismic Velocity Models for D eep Learning Inversion. IEEE Tran sactions on Geoscience and Remote Sensing, 2022. [8] Yang F, Ma J. Deep-learning inversion: A next-generation seismic velocity model building metho d. Geophysics, 2019, 84(4): R583-R599. [9] Liu Y, et al. Style transfer as Data Augmentation for Multiscale Data-driven Full Waveform Inver sion. AGU Fall Meeting, 2021. [10] Pan, W., et al. (2020). Is full-waveform inversion ready for field applications? Geophysics, 85(6), R647-R678. [11] Yang, F., & Ma, J. (2019). Deep-learning inversion: A next-generation seismic velocity-model b uilding method. Geophysics, 84(4), R583-R599. [12] Zeng, Q., Feng, S., Wohlberg, B., & Lin, Y. (2022). InversionNet3D: Efficient and scalable learn ing for 3-D full-waveform inversion. IEEE TGRS, 60, Art. no. 4506816. Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 18 - [13] Gupta, N., Sawhney, M., Daw, A., Lin, Y., & Karpatne, A. (2025). A unified framework for forw ard and inverse problems in subsurface imaging using latent space translations. arXiv preprint arXiv:2410. 11247. [14] Wang, F., et al. (2024). Controllable seismic velocity synthesis using generative diffusion models. Journal of Geophysical Research: Solid Earth, 129(8), e2024JH000153. [15] Meng, C., et al. (2024). Generative modeling of seismic data using score-based generative model s. 85th EAGE Annual Conference, 1-5. [16] Li, S., et al. (2024). Unsupervised seismic acoustic impedance inversion based on generative diff usion model. Geophysics, 89(6), 1-16. [17] Song, J., Meng, C., & Ermon, S. (2021). Denoising Diffusion Implicit Models. International Con ference on Learning Representations (ICLR) [18] Lu, C., Zhou, Y., Bao, F., Chen, J., Li, C., & Zhu, J. (2022). DPM-Solver: A Fast ODE Solver fo r Diffusion Probabilistic Model Sampling Around Around 10 Steps. Advances in Neural Information Proce ssing Systems (NeurIPS), 35, 5785-5797. [19] Chen, T. Q., Rubanova, Y., Bettencourt, J., & Duvenaud, D. (2018). Neural ordinary differential equations. Advances in Neural Information Processing Systems (NeurIPS), 31, 6572-6583. [20] Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021). Score-ba sed generative modeling through stochastic differential equations. International Conference on Learning R epresentations (ICLR). [21] Chen, H., Lee, H., & Lu, J. (2023). Improved analysis of score-based generative modeling: user-f riendly bounds under minimal smoothness assumptions. International Conference on Machine Learning (IC ML). [22] Song, Y., Dhariwal, P., Chen, M., & Sutskever, I. (2023). Consistency models. International Con ference on Machine Learning (ICML). [23] Benton, J., Deligiannidis, G., & Doucet, A. (2023). Error bounds for flow matching methods. arX [24] Gao, K., Huang, L., Zheng, Y., Lin, R., Hu, H., & Cladouhos, T. (2022). Automatic fault detectio n on seismic images using a multiscale attention convolutional neural network. Geophysics, 87(1), N13-N3 0. [25] Roy, A. G., Navab, N., & Wachinger, C. (2018). Concurrent spatial and channel squeeze & excit ation in fully convolutional networks. MICCAI, 421-429. Conditional Rectified Flow-based End-to-End Rapid Seismic Inversion Method - 19 - [26] Woo, S., Park, J., Lee, J. Y., & So, K. I. (2018). CBAM: Convolutional block attention module. ECCV, 3-19. [27] Oktay, O., Schlemper, J ., Folgoc, L. L., et al. (2018). Attention U-Net: Learning where to look fo [28] Huang, X., & Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance nor malization. ICCV, 1501-1510. [29] Dhariwal, P., & Nichol, A. (2021). Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems (NeurIPS), 34, 8780-8794. [30] Zhang, Z., & Lin, Y. (2020). Data-Driven Seismic Waveform Inversion: A Study on the Robustn ess and Generalization. IEEE Transactions on Geoscience and Remote Sensing, 58(10), 6900-6913. [31] Deng, Z., Feng, S., Yang, H., et al. (2022). OpenFWI: Large-scale Multi-structural Benchmark D atasets for Full Waveform Inversion. Advances in Neural Information Processing Systems (NeurIPS), 35, 14652-14665.
Original Paper
Loading high-quality paper...
Comments & Academic Discussion
Loading comments...
Leave a Comment