Information theoretic limits of robust sub-Gaussian mean estimation under star-shaped constraints


We obtain the minimax rate for a mean location model with a bounded star-shaped constraint set $K \subseteq \mathbb{R}^n$ on the mean, in an adversarially corrupted data setting with Gaussian noise. We assume that an unknown fraction $ε \le 1/2-κ$, for some fixed $κ\in(0,1/2]$, of the $N$ observations is arbitrarily corrupted. We obtain a minimax risk, up to proportionality constants, under the squared $\ell_2$ loss of $\max(η^{*2},σ^2ε^2)\wedge d^2$ with \begin{align*} η^* = \sup \bigg\{η\ge 0 : \frac{Nη^2}{σ^2} \leq \log \mathcal{M}_K^{\operatorname{loc}}(η,c)\bigg\}, \end{align*} where $\log \mathcal{M}_K^{\operatorname{loc}}(η,c)$ denotes the local entropy of the set $K$, $d$ is the diameter of $K$, $σ^2$ is the noise variance, and $c$ is a sufficiently large absolute constant. A variant of our algorithm achieves the same rate for settings with known or symmetric sub-Gaussian noise, with a smaller breakdown point, still of constant order. We further study the case of unknown sub-Gaussian noise and show that the rate is slightly slower: $\max(η^{*2},σ^2ε^2\log(1/ε))\wedge d^2$. We generalize our results to the case where $K$ is star-shaped but unbounded.


💡 Research Summary

This paper studies the fundamental statistical limits of multivariate mean estimation when the unknown mean vector μ is known to lie in a star‑shaped set K⊂ℝⁿ and a fraction ε≤½−κ (for a fixed κ∈(0,½]) of the N observations may be arbitrarily corrupted by an adversary. The clean data follow a location model ˜X_i=μ+ξ_i, where the noise ξ_i is either Gaussian N(0,σ²I_n) or sub‑Gaussian with parameter σ (the variance proxy). The goal is to design an estimator μ̂ that minimizes the worst‑case expected squared ℓ₂ error
R(N,ε,K,σ) = inf_{μ̂} sup_{μ∈K} sup_{C} E_μ‖μ̂ − μ‖₂²,
where the inner supremum runs over all adversarial corruptions C of at most an ε-fraction of the N observations.

