Minimax Generalized Cross-Entropy

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Loss functions play a central role in supervised classification. Cross-entropy (CE) is widely used, whereas the mean absolute error (MAE) loss can offer robustness but is difficult to optimize. Interpolating between the CE and MAE losses, generalized cross-entropy (GCE) has recently been introduced to provide a trade-off between optimization difficulty and robustness. Existing formulations of GCE result in a non-convex optimization over classification margins that is prone to underfitting, leading to poor performance on complex datasets. In this paper, we propose a minimax formulation of generalized cross-entropy (MGCE) that results in a convex optimization over classification margins. Moreover, we show that MGCE can provide an upper bound on the classification error. The proposed bilevel convex optimization can be efficiently implemented using stochastic gradients computed via implicit differentiation. Using benchmark datasets, we show that MGCE achieves strong accuracy, faster convergence, and better calibration, especially in the presence of label noise.


💡 Research Summary

The paper addresses a fundamental limitation of the recently proposed Generalized Cross‑Entropy (GCE) loss, namely its non‑convexity with respect to classifier margins when the interpolation parameter β lies between the extremes of Mean Absolute Error (MAE, β = 1) and Cross‑Entropy (CE, β → ∞). Existing GCE methods use a fixed softmax link to convert margins into class probabilities, which makes the resulting loss non‑convex and prone to under‑fitting on complex datasets. To overcome this, the authors introduce a Minimax Generalized Cross‑Entropy (MGCE) framework that integrates the α‑loss (re‑parameterized as β) into a minimax formulation over an uncertainty set of data distributions.
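The paper re-parameterizes the CE-to-MAE interpolation with β; the GCE loss it builds on is commonly stated in the q-parameterization of Zhang & Sabuncu, L_q = (1 − p_y^q)/q, which recovers CE as q → 0 and MAE (up to scale) at q = 1. A minimal sketch of that standard form (not the paper's minimax variant):

```python
import numpy as np

def gce_loss(probs, labels, q=0.7):
    """Generalized cross-entropy loss, L_q = (1 - p_y^q) / q.

    probs:  (n, k) array of predicted class probabilities
    labels: (n,) array of integer class labels
    q:      interpolation parameter in (0, 1]; q -> 0 recovers
            cross-entropy, q = 1 gives MAE (1 - p_y).
    """
    p_y = probs[np.arange(len(labels)), labels]
    return np.mean((1.0 - p_y ** q) / q)
```

As the summary notes, this q-parameterized objective is non-convex in the classifier margins once a fixed softmax link is used, which is the limitation the minimax formulation removes.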

The uncertainty set U is defined by moment constraints on the feature mapping Φ, with a confidence vector λ derived from sample statistics. The minimax objective seeks a classifier h that minimizes the worst‑case expected β‑loss over all distributions p ∈ U. By exploiting the structure of the β‑loss, the authors derive a bilevel convex optimization problem: the outer level optimizes a linear‑plus‑ℓ1 regularized term in the margin parameters μ, while the inner level defines a scalar function ϕβ(x, μ) as the maximal ν satisfying a convex constraint on the transformed margins. Crucially, ϕβ is concave in μ for β ≥ 1, which guarantees that the overall objective is convex in μ.
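The implicit-differentiation machinery behind the bilevel scheme can be sketched generically: the inner problem implicitly defines the scalar ν(μ) through an optimality condition g(ν, μ) = 0, and the implicit function theorem gives dν/dμ = −(∂g/∂μ)/(∂g/∂ν), so outer stochastic gradients never need to unroll the inner solver. A minimal sketch with a hypothetical scalar constraint g (not the paper's actual ϕβ):

```python
import numpy as np

def inner_solve(mu, g, lo=-10.0, hi=10.0):
    """Bisection for the scalar root nu(mu) of g(nu, mu) = 0,
    assuming g is increasing in nu with g(lo) < 0 < g(hi)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid, mu) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def implicit_grad(nu, mu, g, eps=1e-6):
    """d nu / d mu via the implicit function theorem:
    d nu/d mu = -(dg/d mu) / (dg/d nu),
    with central finite-difference partials."""
    dg_dnu = (g(nu + eps, mu) - g(nu - eps, mu)) / (2 * eps)
    dg_dmu = (g(nu, mu + eps) - g(nu, mu - eps)) / (2 * eps)
    return -dg_dmu / dg_dnu
```

For example, with the toy condition g(ν, μ) = ν³ + ν − μ and μ = 2, the inner solve returns ν = 1 and the implicit gradient is 1/(3ν² + 1) = 0.25; the same pattern applies with the concave ϕβ in place of this toy g.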

A novel β-dependent link function h_β(x)_y is then obtained; its closed-form expression is truncated in this summary, so refer to the original paper for the full formula.

