Generalization of Gibbs and Langevin Monte Carlo Algorithms in the Interpolation Regime
This paper provides data-dependent bounds on the expected error of the Gibbs algorithm in the overparameterized interpolation regime, where low training errors are also obtained for impossible data, such as random labels in classification. The results show that generalization in the low-temperature regime is already signaled by small training errors in the noisier high-temperature regime. The bounds are stable under approximation with Langevin Monte Carlo algorithms. The analysis motivates the design of an algorithm to compute bounds, which on the MNIST and CIFAR-10 datasets yield nontrivial, close predictions on the test error for true labeled data, while maintaining a correct upper bound on the test error for random labels.
💡 Research Summary
The paper tackles a central puzzle of modern deep learning: in the over‑parameterized “interpolation” regime, models can achieve near‑zero training error even on completely random labels, yet sometimes they also generalize well on real data. Classical generalization bounds become vacuous in this setting because they depend only on the size of the hypothesis class or on the training error, neither of which is informative when the model interpolates: the complexity measure is enormous, and the training error is near zero regardless of whether the labels are real or random.
The authors focus on the Gibbs posterior, a probability distribution over hypotheses that assigns weight proportional to \(\exp(-\beta \hat L(h,x))\), where \(\beta\) is the inverse temperature and \(\hat L\) is the empirical loss. They observe that the log‑density of the Gibbs posterior \(\pi_\beta\) relative to the prior \(\pi\) admits a simple integral representation with respect to temperature:

\[
\ln \frac{d\pi_\beta}{d\pi}(h) = -\beta\,\hat L(h,x) + \int_0^\beta \mathbb{E}_{h' \sim \pi_\gamma}\!\left[\hat L(h',x)\right] d\gamma,
\]

which follows from \(\tfrac{d}{d\gamma}\ln Z(\gamma) = -\mathbb{E}_{h' \sim \pi_\gamma}[\hat L(h',x)]\), where \(Z(\gamma)\) is the partition function. In words, the density of the low‑temperature posterior at a hypothesis is controlled by the expected empirical losses of the noisier, higher‑temperature posteriors.
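The identity above can be checked numerically on a toy problem. The sketch below (an illustration, not from the paper: it assumes a finite hypothesis class with a uniform prior and arbitrary made-up loss values) compares the Gibbs log‑density computed directly against the temperature integral evaluated by trapezoidal quadrature.

```python
import numpy as np

# Hypothetical setup: 5 hypotheses with fixed empirical losses \hat L(h, x),
# uniform prior. These loss values are arbitrary, for illustration only.
rng = np.random.default_rng(0)
losses = rng.uniform(0.0, 1.0, size=5)

def gibbs(beta):
    """Gibbs posterior pi_beta over the finite class (uniform prior)."""
    w = np.exp(-beta * losses)          # unnormalized weights exp(-beta * L)
    return w / w.sum()

beta, h = 3.0, 2                        # inverse temperature, hypothesis index

# Direct log-density relative to the uniform prior: log(pi_beta(h) / (1/N))
direct = np.log(len(losses) * gibbs(beta)[h])

# Integral representation:
#   log d(pi_beta)/d(pi)(h) = -beta * L(h) + int_0^beta E_{pi_gamma}[L] d gamma
gammas = np.linspace(0.0, beta, 10001)
expected = np.array([gibbs(g) @ losses for g in gammas])
integral = np.sum((expected[1:] + expected[:-1]) / 2 * np.diff(gammas))
via_integral = -beta * losses[h] + integral

assert abs(direct - via_integral) < 1e-4  # the two agree up to quadrature error
```

The agreement illustrates why small training losses observed at high temperatures (small \(\gamma\)) already constrain the low‑temperature posterior, which is the mechanism the paper's bounds exploit.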