SoilGen: A Comprehensive Tool for Generating Synthetic Soil Profiles for Geotechnical and Seismic Analysis


📝 Original Info

  • Title: SoilGen: A Comprehensive Tool for Generating Synthetic Soil Profiles for Geotechnical and Seismic Analysis
  • ArXiv ID: 2512.12429
  • Date: 2025-12-13
  • Authors: Mersad Fathizadeh, Hosna Kianfar

📝 Abstract

Geotechnical and seismic applications, ranging from site response analysis and HVSR simulations to dispersion curve modeling, increasingly depend on large, well-labeled datasets for robust model development. However, the scarcity of publicly available borehole datasets, coupled with the proprietary nature of high-quality field records, creates a significant bottleneck for data-driven research, particularly in machine learning. To address this limitation, this study introduces SoilGen, an open-source framework that procedurally generates physically consistent multilayer soil columns as synthetic soil profiles. Unlike simple randomization, SoilGen computes a complete suite of geotechnical properties, including layer thickness, shear-wave velocity, P-wave velocity, density, and Poisson ratio, while enforcing physical constraints to ensure realism. The algorithmic foundations of the framework and its implementation are outlined, and its utility is demonstrated through representative near-surface geological scenarios relevant to site characterization and near-surface geophysics. By facilitating the rapid generation of large-scale model libraries exceeding one hundred thousand realizations, SoilGen enables comprehensive parametric studies and the training of deep learning inversion networks that require extensive labeled datasets for shear-wave velocity profiling and related site characterization tasks.

💡 Deep Analysis

Figure 1

📄 Full Content

SoilGen: A Comprehensive Tool for Generating Synthetic Soil Profiles for Geotechnical and Seismic Analysis

Mersad Fathizadeh¹, Hosna Kianfar²

¹ University of Arkansas, Graduate Research Assistant, Dept. of Civil Eng., 4190 Bell Engineering Center, Fayetteville, AR 72701, USA, mersadf@uark.edu
² University of Arkansas, Graduate Research Assistant, Dept. of Civil Eng., 4190 Bell Engineering Center, Fayetteville, AR 72701, USA, hkianfar@uark.edu

ABSTRACT

Geotechnical and seismic applications, ranging from site response analysis and HVSR simulations to dispersion curve modeling, increasingly depend on large, well-labeled datasets for robust model development. However, the scarcity of publicly available borehole datasets, coupled with the proprietary nature of high-quality field records, creates a significant bottleneck for data-driven research, particularly in machine learning. To address this limitation, this study introduces SoilGen, an open-source framework that procedurally generates physically consistent, multilayered soil columns as synthetic soil profiles. Unlike simple randomization, SoilGen computes a complete suite of geotechnical properties, including thickness, shear-wave velocity (Vs), P-wave velocity (Vp), density, and Poisson’s ratio, while enforcing physical constraints to ensure realism. The algorithmic foundations of the framework and its implementation are outlined, and its utility is demonstrated through representative near-surface geological scenarios relevant to site characterization and near-surface geophysics. By facilitating the rapid generation of large-scale model libraries (N > 10^5), SoilGen enables comprehensive parametric studies and the training of deep learning inversion networks that require extensive, labeled datasets for shear-wave velocity (Vs) profiling and other site characterization tasks.

Keywords: synthetic soil profiles; near-surface geophysics; machine learning; site characterization; shear-wave velocity (Vs)

1 INTRODUCTION

Accurate characterization of the near-surface velocity structure is fundamental to seismic site response evaluation, dispersion curve analysis, and a broad range of geotechnical studies. Techniques such as Horizontal-to-Vertical Spectral Ratio (HVSR), Multichannel Analysis of Surface Waves (MASW), and numerical site response modeling all rely on robust subsurface models to yield reliable predictions.
However, traditional inversion methods are often computationally intensive and suffer from non-uniqueness, while publicly available borehole datasets containing complete geotechnical properties remain scarce. This data deficit is particularly critical for data-hungry machine learning approaches, which demand hundreds of thousands of labeled models to learn robust mappings from geophysical observations to subsurface properties.

SoilGen addresses this need by programmatically generating one-dimensional layered soil profiles that exhibit realistic thicknesses and velocities, subject to rigorous geophysical constraints. Crucially, the package computes a complete suite of geotechnical parameters, including layer thickness, shear-wave velocity (Vs), P-wave velocity (Vp), density, and Poisson’s ratio, ensuring that each generated model is immediately applicable to dispersion curve forward modeling, HVSR simulation, site response analysis, or machine learning pipelines. Integrated validation routines strictly enforce physical laws, such as ensuring that Vp exceeds Vs and that material properties remain within plausible limits. The framework facilitates the rapid generation of extensive model libraries (N > 10^5), allowing users to assign profiles to predefined geological scenarios, export them in multiple formats, and visualize them via a modern graphical user interface.

The remainder of this paper is organized as follows: Section 2 outlines the SoilGen methodology, detailing the scenario definitions, empirical relationships, and implementation specifics. Section 3 presents representative results, illustrating the tool’s output through multi-panel figures for various geological settings. Finally, Section 4 concludes with a discussion of the package’s broader applications in geotechnical modeling, including its integration with complementary tools such as hvstrip-progressive (Fathizadeh et al., 2025) for advanced layer-stripping analyses.
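The validation rules mentioned in the introduction (Vp exceeding Vs, material properties within plausible limits) can be illustrated with a minimal Python sketch. This is not SoilGen's actual code: the function names `poisson_ratio` and `is_physically_valid`, and the numeric bounds on Poisson's ratio and density, are hypothetical placeholders chosen for illustration. The only relation taken as given is the standard isotropic-elasticity formula linking Poisson's ratio to the two velocities.

```python
import math


def poisson_ratio(vp: float, vs: float) -> float:
    """Poisson's ratio of an isotropic elastic medium from Vp and Vs (same units)."""
    return (vp**2 - 2.0 * vs**2) / (2.0 * (vp**2 - vs**2))


def is_physically_valid(
    vp: float,
    vs: float,
    density: float,
    nu_range: tuple = (0.0, 0.5),              # assumed bounds, not from the paper
    density_range: tuple = (1000.0, 3000.0),   # kg/m^3, assumed bounds
) -> bool:
    """Return True if one layer passes the illustrative consistency checks."""
    # Velocities must be positive and Vp must exceed Vs.
    if vs <= 0.0 or vp <= vs:
        return False
    # Poisson's ratio must fall strictly inside the assumed range.
    nu = poisson_ratio(vp, vs)
    if not (nu_range[0] < nu < nu_range[1]):
        return False
    # Density must fall inside the assumed range.
    return density_range[0] <= density <= density_range[1]
```

Note that requiring Vp > Vs alone is not sufficient in this sketch: a layer with Vp only slightly above Vs yields a negative Poisson's ratio, so the second check rejects it under the assumed bounds.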
2 METHODOLOGY AND DATA PROCESSING

2.1 Profile Generation Algorithm

SoilGen generates randomized 1D soil profiles, typically comprising 3 to 8 layers, by stochastically sampling layer thicknesses and shear-wave velocities (Vs), subsequently computing derived elastic properties and validating the physical consistency of each model. The overall generation workflow is illustrated in Figure 1, which depicts the primary interface for parameter definition. To create a synthetic dataset, the user selects a target geological scenario and specifies boundary conditions, including the tot
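The sampling step described in Section 2.1 can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the function name `generate_profile`, the uniform distributions, and all numeric ranges are hypothetical. Because the excerpt does not show the paper's empirical relationships, two stand-ins are used: Vp is derived from Vs and a sampled Poisson's ratio through the isotropic-elasticity relation, and density comes from Gardner's relation, which is a placeholder here rather than SoilGen's choice.

```python
import math
import random


def generate_profile(
    rng: random.Random,
    n_layers_range: tuple = (3, 8),          # layer count, as stated in the text
    thickness_range: tuple = (1.0, 20.0),    # m, assumed
    vs_range: tuple = (100.0, 800.0),        # m/s, assumed
    nu_range: tuple = (0.25, 0.45),          # Poisson's ratio, assumed
) -> list:
    """Sample one synthetic 1D profile as a list of layer dictionaries."""
    n_layers = rng.randint(*n_layers_range)
    layers = []
    for _ in range(n_layers):
        thickness = rng.uniform(*thickness_range)
        vs = rng.uniform(*vs_range)
        nu = rng.uniform(*nu_range)
        # Isotropic elasticity: Vp/Vs = sqrt((2 - 2*nu) / (1 - 2*nu)).
        vp = vs * math.sqrt((2.0 - 2.0 * nu) / (1.0 - 2.0 * nu))
        # Gardner's relation (placeholder): rho [kg/m^3] = 310 * Vp[m/s]^0.25.
        density = 310.0 * vp**0.25
        layers.append(
            {"thickness": thickness, "vs": vs, "vp": vp,
             "density": density, "poisson": nu}
        )
    return layers


# Usage: a seeded generator makes the synthetic library reproducible.
profiles = [generate_profile(random.Random(seed)) for seed in range(5)]
```

Sampling Poisson's ratio below 0.5 guarantees Vp > Vs by construction, so a separate validation pass would only need to check the remaining property limits.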


Reference

This content is AI-processed based on open access ArXiv data.
