Mean regression for (0,1) responses via beta scale mixtures
To make models for heavy-tailed bounded responses more flexible, a beta scale mixture model is proposed. Each member of the family is obtained by multiplying the scale parameter of the conditional beta distribution by a mixing random variable that takes values on all or part of the positive real line and whose distribution depends on a single parameter governing the tail behavior of the resulting compound distribution. These family members allow for a wider range of skewness and kurtosis values. To assess the proposed model, we conduct experiments on both simulated and real data. The results indicate that the beta scale mixture model outperforms the classical beta regression model and competing methods for responses on the unit interval.
💡 Research Summary
The paper introduces a novel class of distributions called Beta Scale Mixtures (BSM) for modeling response variables that lie in the unit interval (0, 1). Traditional beta regression, while flexible in shape, is sensitive to outliers and cannot adequately capture heavy‑tailed behavior because its dispersion (or precision) parameter ϕ controls only the overall spread. The authors address this limitation by scaling the dispersion parameter with a positive mixing random variable W. Conditional on a given W = w, the response Y follows a beta distribution with mean μ and scaled dispersion ϕ/w. The mixing distribution h(w; θ) is governed by a single tail‑weight parameter θ, which determines how much probability mass is placed on large values of W and therefore how heavy the resulting mixture tails become. When W is degenerate at 1, the model collapses to the ordinary beta distribution, guaranteeing that BSM nests the classical model as a special case.
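The hierarchical construction described above can be illustrated with a short simulation. This is only a sketch under stated assumptions, not the authors' implementation: it assumes the usual mean-precision parametrization of the beta distribution, so that Y | W = w has shape parameters μϕ/w and (1 − μ)ϕ/w, and it uses a gamma mixing variable with mean 1 purely as an example. The function names sample_bsm and excess_kurtosis and all numeric values are hypothetical.

```python
import numpy as np


def sample_bsm(mu, phi, w, rng):
    """Draw Y | W = w from a beta law with mean mu and precision phi / w.

    Assumption (not confirmed by the source): mean-precision parametrization,
    i.e. shape parameters a = mu * phi / w and b = (1 - mu) * phi / w.
    """
    w = np.asarray(w, dtype=float)
    precision = phi / w
    return rng.beta(mu * precision, (1.0 - mu) * precision)


def excess_kurtosis(y):
    """Sample excess kurtosis (fourth standardized moment minus 3)."""
    centered = y - y.mean()
    return (centered**4).mean() / (centered**2).mean() ** 2 - 3.0


rng = np.random.default_rng(0)
mu, phi, n = 0.3, 20.0, 200_000  # illustrative values only

# Hypothetical mixing law: gamma with shape theta and mean 1.
theta = 2.0
w = rng.gamma(shape=theta, scale=1.0 / theta, size=n)

y_mix = sample_bsm(mu, phi, w, rng)             # beta scale mixture draws
y_beta = sample_bsm(mu, phi, np.ones(n), rng)   # W degenerate at 1: plain beta

print("mixture mean:", y_mix.mean(), "beta mean:", y_beta.mean())
print("mixture excess kurtosis:", excess_kurtosis(y_mix))
print("beta excess kurtosis:", excess_kurtosis(y_beta))
```

Because the conditional mean does not depend on w, both sample means should sit close to μ. Setting W to 1 for every draw recovers the ordinary beta model, which is the nesting property noted above.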
Four specific mixing distributions are examined: (i) a two‑point (Bernoulli) mixing variable, leading to the Two‑Point Beta (TPB) distribution; (ii) a Gamma mixing variable, yielding the Gamma‑Beta (GB) distribution; (iii) a Log‑Normal mixing variable, giving the Log‑Normal‑Beta (LNB) distribution; and (iv) an Inverse Gaussian mixing variable, producing the Inverse‑Gaussian‑Beta (IGB) distribution. Each choice provides a different mechanism for inflating variance and thickening tails. The TPB model is particularly interpretable: θ₁ represents the proportion of “good” observations drawn from the reference beta component, while θ₂ (> 1) inflates the variance of the contaminant component. The continuous mixing families (Gamma, Log‑Normal, Inverse Gaussian) allow a smooth spectrum of tail heaviness controlled by θ.
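The interpretation of the two-point case can be made concrete with a small simulation. The sketch below assumes that W equals 1 with probability θ₁ (the "good" component) and θ₂ with probability 1 − θ₁ (the contaminant), and that the conditional precision is ϕ/w; the summary does not spell out either detail, so treat both as assumptions. The name sample_tpb and the parameter values are hypothetical.

```python
import numpy as np


def sample_tpb(mu, phi, theta1, theta2, size, rng):
    """Simulate a two-point beta mixture under the assumptions stated above.

    Returns the draws and a boolean mask marking "good" observations.
    """
    good = rng.random(size) < theta1       # True with probability theta1
    w = np.where(good, 1.0, theta2)        # assumed support of W: {1, theta2}
    precision = phi / w                    # assumed scaling: phi / w
    y = rng.beta(mu * precision, (1.0 - mu) * precision)
    return y, good


rng = np.random.default_rng(1)
mu, phi = 0.3, 20.0                        # illustrative values only
theta1, theta2 = 0.9, 5.0                  # 90% good points, theta2 > 1

y, good = sample_tpb(mu, phi, theta1, theta2, size=100_000, rng=rng)

print("share of good observations:", good.mean())
print("variance, good component:", y[good].var())
print("variance, contaminant component:", y[~good].var())
```

Returning the component indicator alongside the draws makes it easy to check that both components share the mean μ while the contaminant has the larger variance, which is what θ₂ > 1 is meant to achieve.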
The authors derive closed‑form expressions for the first four moments of Y under the BSM framework using the hierarchical representation. The mean remains μ, while the variance, skewness, and excess kurtosis involve expectations of functions of W such as E