An efficient, accurate, and interpretable machine learning method for computing probability of failure
We introduce a novel machine learning method, the Penalized Profile Support Vector Machine based on the Gabriel edited set, for computing the probability of failure of a complex system as determined by a threshold condition on a computer model of system behavior. The method is designed to minimize the number of evaluations of the computer model while preserving the geometry of the decision boundary that determines the probability. It employs an adaptive sampling strategy that concentrates points near the failure boundary and, by clustering the training points, builds a locally linear surrogate boundary consistent with the true boundary's geometry. We prove two convergence results and compare the performance of the method against several state-of-the-art classification methods on four test problems. We also apply the method to determine the probability of survival in the Lotka–Volterra model for competing species.
💡 Research Summary
The paper addresses the challenging problem of estimating the probability of failure for complex engineering systems whose behavior is described by an expensive computer model Q(x). The failure event is defined by a threshold q₀ on the scalar response Q, which induces a nonlinear decision boundary Q⁻¹(q₀) in the input space Λ. Traditional Monte‑Carlo estimation requires a huge number of model evaluations because most samples lie far from this boundary and provide little information about its geometry.
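To make the cost argument concrete, the baseline the paper improves on can be sketched as plain Monte Carlo estimation. The model `Q` below is a cheap hypothetical stand-in (a sum of squares, not from the paper); in practice each call to `Q` would be an expensive simulation, which is exactly why the sample count matters.

```python
import numpy as np

rng = np.random.default_rng(0)

def Q(x):
    # Hypothetical stand-in for the expensive model response Q(x);
    # the real Q(x) would be a costly computer-model evaluation.
    return np.sum(x**2, axis=-1)

def mc_failure_probability(n_samples, q0=1.5, dim=2):
    # Plain Monte Carlo: draw inputs from their distribution and count
    # how often the response crosses the failure threshold q0. The
    # error decays like 1/sqrt(n_samples), so small probabilities need
    # very many model evaluations.
    x = rng.normal(size=(n_samples, dim))
    return np.mean(Q(x) > q0)

p_hat = mc_failure_probability(100_000)
```

For this toy `Q` with standard normal inputs, the response is chi-square with 2 degrees of freedom, so the exact probability is exp(-q0/2) ≈ 0.472, which the estimate approaches only slowly as the sample budget grows.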
To overcome this inefficiency, the authors propose a novel classification framework called Penalized Profile Support Vector Machine based on the Gabriel edited set (PSVMG). The approach consists of two synergistic components. First, an adaptive sampling scheme named Probability of Failure – Darts (POF‑Darts) generates training points that concentrate near the unknown decision boundary. Starting from a small initial design, each point is associated with a sphere whose radius estimates the distance to the boundary; new points are placed on hyper‑planes (“darts”) so that these spheres do not overlap, thereby automatically focusing sampling effort where the boundary is poorly described.
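The sphere-based sampling idea can be illustrated with a deliberately simplified sketch. Note the hedges: the actual POF-Darts algorithm throws line darts / hyperplanes, whereas this toy version uses point-dart rejection sampling; the Lipschitz bound `L`, the toy model `Q`, and the domain are all illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def Q(x):
    # Hypothetical cheap stand-in for the expensive model.
    return x[0]**2 + x[1]**2

q0 = 1.5   # failure threshold
L = 6.0    # assumed Lipschitz bound on Q over the domain (illustrative)

def sphere_radius(x):
    # |Q(x) - q0| / L underestimates the distance from x to the boundary
    # Q^{-1}(q0), so a sphere of this radius around x cannot contain it.
    return abs(Q(x) - q0) / L

# Simplified point-dart variant: accept a uniform candidate only if it
# lies outside every existing sphere. Radii shrink near the boundary,
# so accepted samples automatically concentrate there.
points = [rng.uniform(-2, 2, size=2) for _ in range(4)]
radii = [sphere_radius(p) for p in points]

for _ in range(2000):
    if len(points) >= 200:
        break
    c = rng.uniform(-2, 2, size=2)
    if all(np.linalg.norm(c - p) > r for p, r in zip(points, radii)):
        points.append(c)
        radii.append(sphere_radius(c))
```

Because every sphere is boundary-free by construction, regions far from the boundary are quickly blocked out by large spheres, while the thin band around Q⁻¹(q₀) keeps accepting new points.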
Second, the method extracts from the POF‑Darts sample set the Gabriel edited set (GES), i.e., pairs of points that are Gabriel neighbors with opposite class labels. The mid‑points of these pairs are called Characteristic Boundary Points (CBPs) and serve as proxies for the true boundary. CBPs are clustered using a modified k‑means algorithm (MagKmeans) that mitigates class‑imbalance. For each cluster, a soft‑margin linear Support Vector Machine (penalized SVM) is trained, yielding a locally linear surrogate hyperplane. The collection of local hyperplanes is combined via a weighted ensemble average to produce a piecewise‑linear approximation of the global nonlinear boundary.
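The Gabriel-neighbor test and the resulting boundary proxies are simple enough to state directly. The brute-force O(n³) sketch below (the quadratic-or-worse scaling noted later is visible here) extracts opposite-label Gabriel pairs and their midpoints; the clustering and per-cluster SVM steps are omitted.

```python
import numpy as np

def gabriel_opposite_pairs(X, y):
    # Points a and b are Gabriel neighbors if no third point lies strictly
    # inside the ball whose diameter is the segment ab. The Gabriel edited
    # set keeps only neighbor pairs with opposite class labels.
    n = len(X)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:
                continue
            mid = (X[i] + X[j]) / 2.0
            r2 = np.sum((X[i] - X[j]) ** 2) / 4.0
            if all(np.sum((X[k] - mid) ** 2) >= r2
                   for k in range(n) if k not in (i, j)):
                pairs.append((i, j))
    return pairs

def characteristic_boundary_points(X, pairs):
    # Midpoints of opposite-label Gabriel pairs serve as proxies (CBPs)
    # for the unknown decision boundary.
    return np.array([(X[i] + X[j]) / 2.0 for i, j in pairs])

# Tiny example: four collinear points with the class change between
# x = 1 and x = 2, so the only CBP sits at the midpoint (1.5, 0).
X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
y = np.array([0, 0, 1, 1])
pairs = gabriel_opposite_pairs(X, y)
cbps = characteristic_boundary_points(X, pairs)
```

In the full method these CBPs would then be grouped by the MagKmeans clustering and each cluster fitted with a soft-margin linear SVM, giving the piecewise-linear surrogate.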
The authors prove two convergence results: (1) as the number of samples grows, the GES and CBPs converge to the true decision boundary, guaranteeing that the locally linear surrogates become asymptotically accurate; (2) the probability estimate obtained by classifying new points with the ensemble converges almost surely to the true failure probability.
Empirical evaluation is performed on four benchmark problems—including the Brusselator chemical reaction model, a high-dimensional synthetic test, and a Lotka–Volterra competing-species model—against state-of-the-art classifiers such as kernel SVM, Random Forest, XGBoost, and the earlier method of Pujol. With a modest budget of a few thousand model evaluations, PSVMG achieves lower absolute error (typically 0.02–0.05) than competing methods while preserving interpretability, since each local surrogate is a simple linear equation that can be related back to physical parameters.
The paper also discusses limitations: the distance‑to‑boundary estimate required by POF‑Darts depends on the initial design and may be inaccurate early on; constructing the Gabriel graph scales quadratically with the number of points, which can become costly in very high dimensions; and the performance is sensitive to the regularization parameter β and the number of clusters K, suggesting a need for automated hyper‑parameter tuning.
Overall, the work makes a significant contribution to reliability engineering and uncertainty quantification by delivering a method that simultaneously reduces computational cost, maintains high classification accuracy, and provides a physically interpretable surrogate of the failure boundary.