The exact amount of t-ness that the normal model can tolerate
Suppose that the normal model is used for data $Y_1,\ldots,Y_n$, but that the true distribution is a t-distribution with location and scale parameters $ξ$ and $σ$ and $m$ degrees of freedom. The normal model corresponds to $m=\infty$. Using a local asymptotic framework where $m$ is allowed to increase with $n$, two classes of estimands are identified. One small class, which in particular contains the functions of $ξ$ alone, is only affected by t-ness to the second order, and the maximum likelihood estimators from the two- and three-parameter models become equivalent. For all other estimands it is shown that if $m\ge1.458\sqrt{n}$, then maximum likelihood estimation using the incorrect normal model is still more precise than using the correct three-parameter model. This is furthermore shown to be true in regression models with t-distributed residuals. We also propose and analyse compromise estimators that in various ways interpolate between the normal and the nonnormal models. A separate section extends the t-ness results to general normal scale mixtures, in which case the tolerance radius around the normal error distribution takes the form of an upper bound $0.3429/\sqrt{n}$ for the variance of the scale mixture distribution. Proving our results requires somewhat nonstandard ‘corner asymptotics’, since the behaviour of estimators must be studied when the crucial parameter $γ=1/m$ is close to zero, which is not an inner point of the parameter space, and the maximum likelihood estimator of $m$ is equal to $\infty$ with positive probability.
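As an illustrative aside (not from the paper itself), the headline tolerance condition $m\ge1.458\sqrt{n}$ is easy to evaluate numerically; the function name and its form below are our own sketch of that rule:

```python
import math

# Tolerance condition from the abstract: if m >= 1.458 * sqrt(n),
# maximum likelihood estimation under the (misspecified) normal model
# is still more precise than under the correct three-parameter t-model.
def normal_model_tolerates(m, n, c=1.458):
    """Return True if the normal model tolerates t-ness with m d.f. at sample size n."""
    return m >= c * math.sqrt(n)

for n in (100, 400, 10_000):
    print(f"n = {n:6d}: normal model suffices for m >= {1.458 * math.sqrt(n):.2f}")
```

For example, at $n=100$ the normal model tolerates any $m\ge14.58$, so even fairly heavy-tailed t-data is handled; the required $m$ grows only like $\sqrt{n}$.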
💡 Research Summary
The paper investigates how much deviation from normality, in the form of heavy‑tailed t‑distributed errors, can be tolerated before the normal model becomes inferior to a correctly specified three‑parameter t‑model. The authors adopt a “corner asymptotics” framework in which the degrees‑of‑freedom parameter m (with γ = 1/m as its inverse) grows to infinity at a rate proportional to √n, where n is the sample size. By re‑parameterising the t‑density as f(y, ξ, σ, γ) and expanding the log‑likelihood around γ = 0, they obtain explicit expressions for the score functions and the Fisher information matrix.
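To see the reparameterisation concretely, here is a small self‑contained numerical check (our own sketch, not the authors' code) that the t log‑density f(y, ξ, σ, γ) approaches the normal log‑density as γ = 1/m → 0:

```python
import math

def t_logpdf(y, xi, sigma, m):
    """Log-density of the t-distribution with location xi, scale sigma
    and m degrees of freedom; in the paper's parameterisation gamma = 1/m,
    with gamma = 0 corresponding to the normal model."""
    z = (y - xi) / sigma
    return (math.lgamma((m + 1) / 2) - math.lgamma(m / 2)
            - 0.5 * math.log(m * math.pi) - math.log(sigma)
            - (m + 1) / 2 * math.log1p(z * z / m))

def normal_logpdf(y, xi, sigma):
    """Log-density of N(xi, sigma^2), the gamma = 0 corner of the t-family."""
    z = (y - xi) / sigma
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z

# The gap shrinks at rate O(1/m) = O(gamma) as the degrees of freedom grow.
for m in (5, 50, 5000):
    gap = abs(t_logpdf(1.3, 0.5, 2.0, m) - normal_logpdf(1.3, 0.5, 2.0))
    print(f"m = {m:5d}: |log f_t - log f_normal| = {gap:.6f}")
```

The O(1/m) decay of this gap is what makes γ = 1/m the natural expansion parameter at the corner γ = 0.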
Two estimation strategies are compared: (i) the “narrow” approach that fits only location ξ and scale σ under the normal model (γ = 0), and (ii) the “wide” approach that fits all three parameters (ξ, σ, γ) using maximum likelihood. Because the MLE of γ can be exactly zero with positive probability, the usual asymptotic normality breaks down for the wide model; the limiting distribution becomes piecewise, depending on whether the random component C (derived from the third score) exceeds −δ, where δ = γ√n.
The authors derive the asymptotic mean‑squared error (MSE) for a generic estimand μ(ξ, σ, γ). For the narrow model the limiting risk combines the baseline normal‑model variance with a squared bias term that grows with δ²; in the standard local‑misspecification notation it takes the form τ₀² + ω²δ², where ω measures the sensitivity of μ to γ.
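Ignoring the boundary complication at γ = 0, the narrow‑versus‑wide trade‑off can be sketched schematically; the formulas τ₀² + ω²δ² (narrow) and τ₀² + ω² (wide) below are the textbook local‑misspecification forms used purely as an illustration, not the paper's exact piecewise limits:

```python
# Schematic narrow-vs-wide limiting risks (boundary effects ignored):
# the narrow model trades a squared bias omega^2 * delta^2 for not
# paying the extra variance omega^2 of also estimating gamma.
def narrow_risk(tau0, omega, delta):
    return tau0 ** 2 + omega ** 2 * delta ** 2

def wide_risk(tau0, omega):
    return tau0 ** 2 + omega ** 2

tau0, omega = 1.0, 0.8
for delta in (0.0, 0.5, 1.0, 2.0):
    better = "narrow" if narrow_risk(tau0, omega, delta) <= wide_risk(tau0, omega) else "wide"
    print(f"delta = {delta}: narrow {narrow_risk(tau0, omega, delta):.3f}, "
          f"wide {wide_risk(tau0, omega):.3f} -> {better}")
```

In this schematic the narrow model wins exactly when |δ| ≤ 1; the paper's boundary analysis, which accounts for the MLE of γ sitting at zero with positive probability, shifts this crossover and yields the m ≥ 1.458√n rule.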