Data-driven modeling of multivariate stochastic trajectories -- Application to water waves
A data-driven methodology is proposed to model the distribution of multivariate stochastic trajectories from an observed sample. As a first step, each trajectory in the sample is reduced to a vector of features by means of Functional Principal Component Analysis. Next, the joint distribution of features is modeled using (i) a non-parametric vine copula approach for the bulk of the distribution, and (ii) the conditional modeling framework of Heffernan and Tawn (2004) for the multivariate tail. The method is applied to the modeling of water waves. The dataset used is the DeRisk database, which consists of numerical simulations of water waves. The analysis is restricted to the portion of the wave period between the free-surface zero-upcrossing and the wave crest. The kinematic variables considered are the free-surface slope, the normal component of the fluid velocity at the free surface, and the vertical Lagrangian acceleration of the fluid at the free surface. The stochastic trajectories of these three variables are modeled jointly. The vertical Lagrangian acceleration of the fluid is employed to enforce a wave-breaking filter in the stochastic model. The capabilities of the model are illustrated by predicting the distributions of selected response variables and by generating synthetic trajectories.
💡 Research Summary
This paper presents a novel data-driven methodology for modeling the joint probability distribution of multivariate stochastic trajectories, with a specific application to the kinematics of water waves for offshore engineering design.
The core problem addressed is the stochastic modeling of nonlinear wave kinematics, which are critical for estimating extreme loads (like slamming forces) on marine structures. Traditional model-driven approaches (linear or second-order wave theory) struggle to capture higher-order nonlinearities, especially under extreme conditions. To overcome this, the authors propose a purely data-driven framework that learns the statistical structure directly from a large database of high-fidelity numerical simulations—the DeRisk database, generated using the fully nonlinear potential flow code OceanWave3D.
The methodology is a two-stage hybrid approach. In the first stage, dimensionality reduction is performed on the raw trajectory data. Each observed trajectory (time series) of the kinematic variables, namely the free-surface slope (s), the normal fluid velocity at the surface (u_n), and the vertical Lagrangian acceleration (ẇ), is converted into a low-dimensional feature vector using Functional Principal Component Analysis (FPCA). This step summarizes the infinite-dimensional functional data as a finite set of scores that capture the main modes of variation in wave shape and dynamics during the “water-entry” phase (from the free-surface zero-upcrossing to the wave crest).
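The FPCA step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes every trajectory has already been resampled onto a common, uniformly spaced grid of rescaled time, so that FPCA reduces to an ordinary PCA of the discretized curves. The function names `fpca_fit` and `fpca_reconstruct` and the toy data are hypothetical.

```python
import numpy as np

def fpca_fit(X, n_components):
    """Discretized FPCA. X has shape (n_samples, n_grid): one trajectory per
    row, all sampled on the same uniform grid (an assumption of this sketch)."""
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center the curves
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                  # discretized eigenfunctions
    scores = Xc @ components.T                      # feature vector per trajectory
    explained = (S ** 2)[:n_components] / np.sum(S ** 2)
    return mean, components, scores, explained

def fpca_reconstruct(mean, components, scores):
    """Map feature vectors (scores) back to trajectories on the grid."""
    return mean + scores @ components

# Toy data (hypothetical): 200 curves built from two fixed shapes.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
a = rng.normal(size=(200, 2))
X = a[:, [0]] * np.sin(np.pi * t) + 0.3 * a[:, [1]] * np.sin(2 * np.pi * t)

mean, comps, scores, expl = fpca_fit(X, n_components=2)
X_hat = fpca_reconstruct(mean, comps, scores)
```

For several variables modeled jointly, one common choice is to concatenate the suitably scaled curves of each variable along the grid axis before calling `fpca_fit`; whether the paper does exactly this is not stated in the summary.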
The second stage models the joint probability distribution of these feature vectors. The authors treat the bulk (central part) of the distribution separately from the multivariate tail (extreme region). For the bulk, they employ a non-parametric vine copula model. This flexible approach can capture complex, non-linear dependence structures among the FPCA scores without assuming a specific parametric form for the marginal distributions or their copula. For the multivariate tail, where extrapolation beyond the observed data is needed and standard copulas often fail, they implement the semi-parametric conditional modeling framework of Heffernan and Tawn (2004). This method models the conditional distribution of the remaining variables given that one variable exceeds a high threshold, allowing for a more realistic characterization of extreme dependence.
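The conditional tail model can be illustrated for a single pair of variables. The sketch below is a simplified, hypothetical version of the Heffernan and Tawn idea, not the paper's implementation: margins are rank-transformed to a standard Laplace scale, and for Y1 above a high threshold the model Y2 = a·Y1 + Y1^b·Z is fitted with a Gaussian working likelihood, profiled over a grid of b values. The usual constraint on a (between -1 and 1) is not enforced here, and all function names and the toy data are assumptions.

```python
import numpy as np

def to_laplace(x):
    """Rank-transform a 1-D sample to standard Laplace margins."""
    n = len(x)
    u = (np.argsort(np.argsort(x)) + 1) / (n + 1)   # pseudo-observations in (0, 1)
    return np.where(u < 0.5, np.log(2 * u), -np.log(2 * (1 - u)))

def fit_ht(y1, y2, q=0.9, b_grid=np.linspace(0.0, 0.95, 20)):
    """Fit Y2 | Y1 = y ~ a*y + y**b * Z for y above the q-quantile (q > 0.5)."""
    thr = np.quantile(y1, q)
    keep = y1 > thr
    yc, yo = y1[keep], y2[keep]
    best = None
    for b in b_grid:
        # For fixed b: yo / yc**b = a * yc**(1 - b) + mu + noise, solved by OLS.
        A = np.column_stack([yc ** (1 - b), np.ones_like(yc)])
        target = yo / yc ** b
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        # Profile negative log-likelihood under the Gaussian working assumption.
        nll = 0.5 * len(yc) * np.log(np.mean(resid ** 2)) + b * np.sum(np.log(yc))
        if best is None or nll < best[0]:
            best = (nll, coef[0], b, resid + coef[1])   # Z = residual + mu
    _, a, b, z = best
    return thr, float(a), float(b), z

def simulate_ht(thr, a, b, z, n, rng):
    """Draw (Y1, Y2) with Y1 above thr, resampling the empirical residuals Z."""
    y1 = thr + rng.exponential(size=n)   # Laplace excess above thr > 0 is Exp(1)
    zz = rng.choice(z, size=n, replace=True)
    return y1, a * y1 + y1 ** b * zz

# Toy data (hypothetical): a correlated Gaussian pair on Laplace margins.
rng = np.random.default_rng(1)
g = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=5000)
y1, y2 = to_laplace(g[:, 0]), to_laplace(g[:, 1])
thr, a, b, z = fit_ht(y1, y2)
s1, s2 = simulate_ht(thr, a, b, z, 1000, rng)
```

Because new values of Y1 are drawn from the exponential tail rather than resampled, the simulated pairs can lie beyond the largest observation, which is the extrapolation property that motivates using a separate tail model.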
The proposed model is applied to two contrasting sea states from the DeRisk database: one highly nonlinear and one moderate. The vertical Lagrangian acceleration is used as a physically based filter so that the generated synthetic trajectories respect the wave-breaking limit (ẇ > -0.5g). The model’s performance is validated by its ability to: (1) generate realistic synthetic water-entry trajectories that visually match the characteristics of the original dataset, including extreme waves, and (2) accurately predict the distributions of key nonlinear response variables. These response variables, such as the maximum of K_x (a proxy for horizontal slamming force) and the time-integral I_y (a proxy for vertical impulse), are nonlinear functions of u_n and s, providing a stringent test of the model’s ability to capture joint dynamics.
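The wave-breaking filter amounts to a rejection step on the generated trajectories. A minimal sketch, assuming each synthetic trajectory of the vertical Lagrangian acceleration is stored as one row of an array in m/s² and that g = 9.81 m/s²; the function name and the toy values are hypothetical:

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def passes_breaking_filter(w_dot, limit=-0.5 * G):
    """Return True for each trajectory whose vertical Lagrangian acceleration
    stays above the breaking limit at every time step."""
    return np.min(w_dot, axis=-1) > limit

# Toy trajectories (hypothetical values): the second one falls below -0.5 g.
w_dot = np.array([[0.0, -1.0, -2.0],
                  [0.0, -3.0, -6.0]])
keep = passes_breaking_filter(w_dot)
accepted = w_dot[keep]   # only non-breaking trajectories are retained
```

The same boolean mask would be applied to the slope and normal-velocity trajectories of each sample, so that the three variables are kept or rejected together.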
In conclusion, the paper demonstrates that this hybrid data-driven approach—combining FPCA for dimensionality reduction, vine copulas for bulk dependence, and the Heffernan-Tawn model for multivariate extremes—provides a powerful and flexible tool for the stochastic emulation of complex physical phenomena like nonlinear water waves. It offers a practical pathway for incorporating high-fidelity simulation results into probabilistic engineering design, particularly for estimating the statistics of extreme events where nonlinearities are dominant.