Unified Unbiased Variance Estimation for Maximum Mean Discrepancy: Robust Finite-Sample Performance with Imbalanced Data and Exact Acceleration under Null and Alternative Hypotheses

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

The maximum mean discrepancy (MMD) is a kernel-based nonparametric statistic for two-sample testing, whose inferential accuracy depends critically on variance characterization. Existing work provides various finite-sample estimators of the MMD variance, often differing under the null and alternative hypotheses and across balanced or imbalanced sampling schemes. In this paper, we study the variance of the MMD statistic through its U-statistic representation and Hoeffding decomposition, and establish a unified finite-sample characterization covering different hypotheses and sample configurations. Building on this analysis, we propose an exact acceleration method for the univariate case under the Laplacian kernel, which reduces the overall computational complexity from $\mathcal O(n^2)$ to $\mathcal O(n \log n)$.
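To make the claimed $\mathcal O(n \log n)$ acceleration concrete, the following is a minimal sketch of the standard sorting-plus-prefix-sum identity that makes univariate Laplacian kernel sums fast: since $e^{-\gamma|x-y|}$ factors as $e^{-\gamma y}e^{\gamma x}$ on one side of $y$ and $e^{\gamma y}e^{-\gamma x}$ on the other, all pairwise sums reduce to prefix and suffix sums over sorted data. The function name and interface below are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def laplacian_cross_sum(x, y, gamma=1.0):
    """Compute sum_{i,j} exp(-gamma * |x_i - y_j|) in O((n+m) log(n+m)).

    Illustrative sketch only; for large gamma * (data range), a numerically
    stable variant should accumulate in log-space to avoid overflow.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.asarray(y, dtype=float)
    # pre[t] = sum of exp(gamma * x_i) over the t smallest x_i
    pre = np.concatenate(([0.0], np.cumsum(np.exp(gamma * x))))
    # suf[t] = sum of exp(-gamma * x_i) over x_i with sorted index >= t
    suf = np.concatenate((np.cumsum(np.exp(-gamma * x)[::-1])[::-1], [0.0]))
    # k[j] = number of x_i <= y_j; splits |x_i - y_j| into two one-sided sums
    k = np.searchsorted(x, y, side="right")
    return float(np.sum(np.exp(-gamma * y) * pre[k] + np.exp(gamma * y) * suf[k]))
```

The same identity handles the two within-sample sums, so all three kernel sums entering the MMD statistic cost one sort plus linear passes.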


💡 Research Summary

The paper addresses a fundamental limitation in the use of the Maximum Mean Discrepancy (MMD) for non‑parametric two‑sample testing: the lack of a unified, unbiased finite‑sample variance estimator that works equally well under the null hypothesis (identical distributions), the alternative hypothesis (different distributions), and for both balanced and imbalanced sample sizes. Existing approaches either provide separate estimators for each regime or rely on asymptotic approximations that become biased when the sample sizes are moderate.

Theoretical contribution
The authors start from the well‑known unbiased estimator of MMD², which can be written as a linear combination of two single‑sample U‑statistics (the within‑sample kernel sums) and one generalized U‑statistic (the cross‑sample kernel sum). By applying Hoeffding's decomposition to each component separately, they obtain explicit first‑order projection terms ($g_1$) and second‑order residual terms ($g_2$). For each degree‑two U‑statistic component, the orthogonality of the projections yields the clean variance expression

$$\operatorname{Var}(U_n) \;=\; \frac{4}{n}\,\operatorname{Var}\!\big[g_1(X_1)\big] \;+\; \frac{2}{n(n-1)}\,\operatorname{Var}\!\big[g_2(X_1, X_2)\big].$$
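For concreteness, the well‑known unbiased estimator referred to above takes the following standard form (stated here for reference; the sample sizes $n$, $m$ and kernel $k$ are notational choices, not taken from the paper's text):

$$\widehat{\mathrm{MMD}}_u^2 \;=\; \frac{1}{n(n-1)}\sum_{i \neq j} k(x_i, x_j) \;+\; \frac{1}{m(m-1)}\sum_{i \neq j} k(y_i, y_j) \;-\; \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} k(x_i, y_j),$$

whose first two terms are the single‑sample U‑statistics and whose last term is the generalized (two‑sample) U‑statistic.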

