Stability Regularized Cross-Validation

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We revisit the problem of ensuring strong test set performance via cross-validation, and propose a nested $k$-fold cross-validation scheme that selects hyperparameters by minimizing a weighted sum of the usual cross-validation metric and an empirical model-stability measure. The weight on the stability term is itself chosen via a nested cross-validation procedure. This reduces the risk of strong validation set performance and poor test set performance due to instability. We benchmark our procedure on a suite of $13$ real-world datasets, and find that, compared to $k$-fold cross-validation over the same hyperparameters, it improves the out-of-sample MSE for sparse ridge regression and CART by $4\%$ and $2\%$ respectively on average, but has no impact on XGBoost. It also reduces the user’s out-of-sample disappointment, sometimes significantly. For instance, for sparse ridge regression, the nested $k$-fold cross-validation error is on average $0.9\%$ lower than the test set error, while the $k$-fold cross-validation error is $21.8\%$ lower than the test error. Thus, for unstable models such as sparse regression and CART, our approach improves test set performance and reduces out-of-sample disappointment.
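The selection rule described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is a one-dimensional ridge regression without intercept, the data are synthetic, and the stability measure (mean absolute change in predictions when one fold is dropped) is a stand-in chosen for simplicity, since the abstract does not define the paper's measure. All function names (`fit_ridge_1d`, `cv_and_stability`, `select_lam`, `select_weight`) and the candidate grids are hypothetical.

```python
import random

def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge without intercept: w = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(w, xs, ys):
    """Mean squared error of the linear predictor w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def folds(n, k):
    """Split indices 0..n-1 into k contiguous folds."""
    return [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]

def split(xs, ys, held):
    """Return (train_x, train_y, held_x, held_y) for one held-out fold."""
    held_set = set(held)
    train = [i for i in range(len(xs)) if i not in held_set]
    return ([xs[i] for i in train], [ys[i] for i in train],
            [xs[i] for i in held], [ys[i] for i in held])

def cv_and_stability(xs, ys, lam, k):
    """Return (k-fold CV error, stand-in stability measure) for one lam."""
    w_full = fit_ridge_1d(xs, ys, lam)
    errs, shifts = [], []
    for held in folds(len(xs), k):
        xtr, ytr, xho, yho = split(xs, ys, held)
        w = fit_ridge_1d(xtr, ytr, lam)
        errs.append(mse(w, xho, yho))
        # ASSUMPTION: stability = mean |change in prediction| when a fold is
        # dropped. This is an illustrative proxy, not the paper's definition.
        shifts.append(sum(abs((w - w_full) * x) for x in xs) / len(xs))
    return sum(errs) / k, sum(shifts) / k

def select_lam(xs, ys, lams, weight, k):
    """Pick lam minimizing CV error + weight * stability."""
    def score(lam):
        err, stab = cv_and_stability(xs, ys, lam, k)
        return err + weight * stab
    return min(lams, key=score)

def select_weight(xs, ys, lams, weights, k):
    """Outer CV loop: pick the stability weight with the lowest held-out MSE."""
    outer = {weight: 0.0 for weight in weights}
    for held in folds(len(xs), k):
        xtr, ytr, xho, yho = split(xs, ys, held)
        for weight in weights:
            lam = select_lam(xtr, ytr, lams, weight, k)  # inner CV
            outer[weight] += mse(fit_ridge_1d(xtr, ytr, lam), xho, yho) / k
    return min(weights, key=lambda weight: outer[weight])

# Synthetic data: y = 2x + noise (hypothetical, for illustration only).
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
lams = [0.01, 0.1, 1.0, 10.0]
weights = [0.0, 0.5, 1.0, 2.0]

best_weight = select_weight(xs, ys, lams, weights, k=5)
best_lam = select_lam(xs, ys, lams, best_weight, k=5)
print("stability weight:", best_weight, "ridge penalty:", best_lam)
```

Setting the weight to zero recovers ordinary $k$-fold cross-validation, so the candidate grid for the weight includes the standard procedure as a special case.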

💡 Research Summary

The paper tackles a well‑known shortcoming of standard k‑fold cross‑validation (CV): when hyper‑parameters are selected by minimizing the CV error, the resulting model often exhibits an “adaptivity gap” – the validation error is systematically lower than the true test error, especially in small‑sample or high‑dimensional settings. This phenomenon, sometimes called the optimizer’s curse or out‑of‑sample disappointment, arises because the CV error itself is a random quantity; optimizing it can inadvertently over‑fit to the validation folds.
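The adaptivity gap can be reproduced with a small simulation. The sketch below is a toy model, not an experiment from the paper: each hyperparameter candidate has a fixed, hypothetical true test error, and its CV error is modelled as that true error plus zero-mean Gaussian noise. The function name `simulate_gap` and all numeric settings are assumptions made for illustration.

```python
import random
import statistics

def simulate_gap(n_candidates, noise_sd, n_trials, seed=0):
    """Average (true error - CV error) of the candidate with the lowest CV error."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Hypothetical true test errors, one per hyperparameter candidate.
        true_err = [1.0 + 0.05 * j for j in range(n_candidates)]
        # Toy model: the CV error is an unbiased but noisy estimate.
        cv_err = [t + rng.gauss(0, noise_sd) for t in true_err]
        # Select the candidate that minimizes the CV error.
        chosen = min(range(n_candidates), key=lambda j: cv_err[j])
        gaps.append(true_err[chosen] - cv_err[chosen])
    return statistics.mean(gaps)

# With a single candidate there is no selection, so the CV error is unbiased.
gap_one = simulate_gap(n_candidates=1, noise_sd=0.3, n_trials=5000)
# With many candidates, minimizing over noisy estimates favors lucky draws.
gap_many = simulate_gap(n_candidates=20, noise_sd=0.3, n_trials=5000)
print("gap with 1 candidate:", gap_one)
print("gap with 20 candidates:", gap_many)
```

Even though every individual CV estimate is unbiased in this toy model, the estimate attached to the selected candidate is optimistic on average, because the minimum is taken after the noise is realized.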

To mitigate this, the authors build on the algorithmic stability framework of Bousquet & Elisseeff (2002). They first extend the classic stability‑based generalization bound from leave‑one‑out CV to arbitrary k‑fold CV, obtaining a bound of the form
