Modelling heavy tail data with Bayesian nonparametric mixtures
In the study of heavy tail data, several models have been introduced. If the interest is in the tail of the distribution, block maxima or excesses over a threshold are the typical approaches, wasting relevant information in the bulk of the data. To avoid this, mixture models with two building blocks, one for the body (below the threshold) and one for the tail (above the threshold), have been proposed. In this paper, we exploit the richness of nonparametric mixture models to model heavy tail data. We specifically consider mixtures of shifted gamma-gamma distributions with four parameters and a normalised stable process as the mixing distribution. One of these parameters is associated with the tail. By studying the posterior distribution of the tail parameter, we are able to estimate the proportion of the data that supports a heavy tail component. We develop an efficient MCMC method with adaptive Metropolis-Hastings steps to obtain posterior inference and illustrate it with simulated and real datasets.
💡 Research Summary
The paper addresses the problem of modeling heavy‑tailed data without discarding the bulk of the observations, a limitation common to traditional extreme‑value approaches that rely on block maxima or threshold exceedances. The authors propose a Bayesian non‑parametric mixture model that simultaneously captures the body and the tail of a distribution. The building block of the model is the shifted gamma‑gamma (SGG) density, a four‑parameter family (location µ, shape γ, tail α, scale β) that nests both the generalized Pareto distribution (GPD) and the ordinary gamma‑gamma distribution. The tail parameter α governs the heaviness of the tail: α ≤ 1 yields a regularly varying (heavy‑tailed) distribution, while α > 1 produces lighter tails.
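The SGG family is easiest to grasp through its hierarchical representation (X − µ | Y ∼ Gamma(γ, rate = Y), Y ∼ Gamma(α, rate = β)), which also gives a direct way to simulate from it. Below is a minimal sketch, not the paper's code: it assumes the rate parameterization above and uses the two tail settings (α = 3 light, α = 0.5 heavy) from the paper's simulation study; the other parameter values are illustrative.

```python
import random

def sample_sgg(mu, gamma_, alpha, beta, rng):
    """Draw one value from the shifted gamma-gamma (SGG) distribution via
    its hierarchical representation:
        Y ~ Gamma(shape=alpha, rate=beta)
        X - mu | Y ~ Gamma(shape=gamma_, rate=Y)
    Note: random.gammavariate takes (shape, scale), so rates are inverted."""
    y = rng.gammavariate(alpha, 1.0 / beta)        # mixing variable Y
    return mu + rng.gammavariate(gamma_, 1.0 / y)  # shifted observation X

rng = random.Random(0)
# Heavy tail (alpha = 0.5 <= 1) vs. light tail (alpha = 3 > 1), matching the
# two components in the paper's simulation; mu, gamma, beta are illustrative.
heavy = [sample_sgg(0.0, 2.0, 0.5, 1.0, rng) for _ in range(20000)]
light = [sample_sgg(0.0, 2.0, 3.0, 1.0, rng) for _ in range(20000)]
print(max(heavy), max(light))  # the heavy-tail component produces far larger extremes
```

A small Y drags the conditional rate of X − µ toward zero, which is what generates the occasional enormous draws when α ≤ 1.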
To allow an unknown number of mixture components, the authors employ a normalized stable (NS) process as a mixing measure. The NS process, indexed by a stability parameter ν∈(0,1), is almost surely discrete and can be represented either as a normalized random measure or via a stick‑breaking construction. Small ν leads to few clusters, large ν to many; thus ν controls the model’s complexity. Each observation i is associated with a parameter vector θ_i = (µ_i, γ_i, α_i, β_i) drawn from the NS process, and the data are generated as X_i | θ_i ∼ SGG(θ_i). The base measure G₀ for the NS process is taken as independent gamma priors on each component of θ.
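The stick-breaking construction mentioned above can be sketched using the fact that the normalized stable process coincides with the Pitman–Yor process with discount ν and zero concentration, whose stick proportions are Beta(1 − ν, kν); this identity and the truncation level are assumptions of the sketch, not details given in the summary.

```python
import random

def ns_stick_breaking(nu, n_sticks, rng):
    """Truncated stick-breaking sketch for the normalized stable (NS) process,
    via the identity NS(nu) = Pitman-Yor(discount=nu, concentration=0):
    the k-th stick proportion is V_k ~ Beta(1 - nu, k * nu).
    Atoms theta_k = (mu_k, gamma_k, alpha_k, beta_k) would be drawn i.i.d.
    from the base measure G0 (independent gammas); only weights are built here."""
    weights, remaining = [], 1.0
    for k in range(1, n_sticks + 1):
        v = rng.betavariate(1.0 - nu, k * nu)  # stick proportion V_k
        weights.append(remaining * v)          # weight of atom k
        remaining *= 1.0 - v                   # mass left for later atoms
    return weights

rng = random.Random(1)
# Small nu concentrates mass on a few atoms (few clusters);
# larger nu spreads mass over many atoms (many clusters).
w_small = ns_stick_breaking(0.05, 50, rng)
w_large = ns_stick_breaking(0.80, 50, rng)
print(round(sum(w_small), 3), round(sum(w_large), 3))
```

Because the stick proportions shrink as k grows, the truncated weights need not sum exactly to one; in a full sampler the leftover mass is either renormalized or handled by the marginal (Neal-style) scheme the paper actually uses.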
Because the SGG likelihood is not conjugate to the gamma priors, standard Gibbs sampling is infeasible. The authors adapt Neal’s Algorithm 8 for non‑conjugate Dirichlet‑process mixtures, extending it to the NS prior. The sampler proceeds by (i) augmenting each observation with a latent Gamma variable Y_i, exploiting the hierarchical representation X_i − µ_i | Y_i ∼ Gamma(γ_i, Y_i) and Y_i ∼ Gamma(α_i, β_i); (ii) drawing a set of r auxiliary component parameters from the prior; (iii) re‑assigning each θ_i to either an existing cluster or a new auxiliary component using a Metropolis–Hastings step that combines the prior cluster weights (governed by ν) with the likelihood contribution; (iv) updating the parameters of each occupied cluster conditional on its assigned data; and (v) updating ν itself with a Metropolis–Hastings step under a Beta hyper‑prior. To improve mixing, the proposal distributions for each scalar parameter (µ, γ, α, β, ν) are uniform over a symmetric interval whose width δ is adapted every 50 iterations to target an acceptance rate between 30 % and 40 %. This adaptive scheme follows the recommendations of Robert and Casella (2010) and ensures efficient exploration of the high‑dimensional posterior.
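The adaptive proposal-width rule in step (v) can be sketched in isolation. The fragment below is a stand-in, not the paper's sampler: the target is a standard normal log-density rather than an SGG full conditional, and the 0.9/1.1 scaling factors are illustrative choices; only the Uniform(−δ, δ) proposal and the every-50-iterations adaptation toward a 30–40 % acceptance rate come from the summary.

```python
import math
import random

def log_target(x):
    # Stand-in log-density (standard normal). In the paper's sampler this
    # would be the full conditional of one scalar parameter (mu, gamma,
    # alpha, beta, or nu).
    return -0.5 * x * x

def adaptive_mh(n_iter, rng, delta=1.0):
    """Random-walk Metropolis-Hastings with a Uniform(-delta, delta) proposal
    whose width is re-tuned every 50 iterations to keep the acceptance rate
    inside [0.30, 0.40], mirroring the paper's adaptive scheme."""
    x, accepts, samples = 0.0, 0, []
    for t in range(1, n_iter + 1):
        prop = x + rng.uniform(-delta, delta)
        if math.log(rng.random()) < log_target(prop) - log_target(x):
            x, accepts = prop, accepts + 1   # accept the proposed move
        samples.append(x)
        if t % 50 == 0:                      # adapt the proposal width
            rate = accepts / 50.0
            if rate < 0.30:
                delta *= 0.9                 # too few acceptances: shrink steps
            elif rate > 0.40:
                delta *= 1.1                 # too many acceptances: widen steps
            accepts = 0
    return samples, delta

samples, final_delta = adaptive_mh(5000, random.Random(2))
```

One caveat worth noting: adapting δ throughout the run technically perturbs the stationary distribution, so in practice such tuning is often confined to a burn-in phase or made diminishing.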
The methodology is evaluated through a simulation study and two real‑world applications. In the simulation, data are generated from a two‑component SGG mixture: one component with light tail (α = 3) and another with heavy tail (α = 0.5). The model is fitted with vague gamma priors (shape = rate = 0.5) and several values of ν (0.5, 0.1, 0.05, 0.01) as well as two Beta hyper‑priors for ν. After 15,000 MCMC iterations, the posterior correctly identifies two clusters, recovers the true α values, and adapts ν to the appropriate number of clusters.
For real data, the authors analyze a financial loss dataset and a natural‑disaster insurance claim dataset. Compared with standard GPD‑based peak‑over‑threshold methods, the proposed mixture captures the full distribution, providing estimates of the bulk parameters (µ, γ, β) and, crucially, the posterior probability that a component belongs to the heavy‑tail regime (α ≤ 1). This yields interpretable measures of tail risk that can inform pricing, capital allocation, and risk management decisions.
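Given MCMC output, the heavy-tail probability described above reduces to simple Monte Carlo averages over the posterior draws. The sketch below uses invented illustrative numbers, not values from the paper's analyses:

```python
# Hypothetical posterior draws of the tail parameter alpha for one mixture
# component, and of that component's mixture weight (illustrative values only).
alpha_draws  = [0.42, 0.55, 0.61, 0.48, 1.20, 0.73, 0.95, 1.05, 0.66, 0.58]
weight_draws = [0.18, 0.22, 0.15, 0.20, 0.25, 0.17, 0.19, 0.21, 0.16, 0.23]

# Posterior probability that the component is in the heavy-tail regime (alpha <= 1):
# the fraction of draws satisfying the condition.
p_heavy = sum(a <= 1.0 for a in alpha_draws) / len(alpha_draws)

# Posterior mean proportion of data attributed to heavy-tail behavior:
# average the component's weight over draws, counting it only when alpha <= 1.
heavy_mass = sum(w for a, w in zip(alpha_draws, weight_draws) if a <= 1.0) / len(alpha_draws)

print(p_heavy, round(heavy_mass, 3))  # → 0.8 0.15
```

With several components, the same averages would be taken per component (and summed for the total heavy-tail mass), which is how the model turns tail heaviness into a directly reportable risk quantity.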
In summary, the paper makes three principal contributions: (1) introducing a novel Bayesian non‑parametric mixture that combines the flexible SGG kernel with a normalized stable mixing measure; (2) offering a direct Bayesian inference framework for the tail‑heaviness parameter, allowing probabilistic statements about the presence and proportion of heavy‑tailed behavior; and (3) developing an adaptive Metropolis‑within‑Gibbs sampler capable of handling the non‑conjugate structure efficiently. By integrating the strengths of extreme‑value theory and mixture modeling, the approach provides a powerful tool for analysts dealing with heavy‑tailed phenomena across finance, insurance, environmental science, and related fields.