Deep Probabilistic Supervision for Image Classification
Supervised training of deep neural networks for classification typically relies on hard targets, which promote overconfidence and can limit calibration, generalization, and robustness. Self-distillation methods aim to mitigate this by leveraging inter-class and sample-specific information present in the model’s own predictions, but they often remain dependent on hard targets and do not explicitly model predictive uncertainty. With this in mind, we propose Deep Probabilistic Supervision (DPS), a principled learning framework that constructs sample-specific target distributions via statistical inference on the model’s own predictions and remains independent of hard targets after initialization. We show that DPS consistently yields higher test accuracy (e.g., +2.0% for DenseNet-264 on ImageNet) and significantly lower Expected Calibration Error (ECE) (−40% for ResNet-50 on CIFAR-100) than existing self-distillation methods. When combined with a contrastive loss, DPS achieves state-of-the-art robustness under label noise.
💡 Research Summary
The paper introduces Deep Probabilistic Supervision (DPS), a novel self-distillation framework that replaces hard one-hot targets with dynamically constructed probabilistic targets derived from the model’s own predictions. Each training sample i is associated with a latent class distribution y_i modeled as a Dirichlet distribution Dir(α_i). The prior α_i^0 is set to concentrate most of its mass on the original label (using a large concentration c for the true class and a tiny ε for all others). At every training epoch t the network produces a softmax prediction ŷ_{t,i}, which is interpreted as fractional evidence and added to the discounted Dirichlet parameters: α_{t,i} = γ α_{t−1,i} + ŷ_{t,i}, where γ ∈ (0, 1) is a discount factor that downweights evidence from earlier epochs.
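The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the values of c, ε, and γ are placeholders, and using the Dirichlet mean as the supervision target is an assumption for this sketch (the summary does not specify how the target distribution is read off the posterior).

```python
import numpy as np

def init_prior(label, num_classes, c=10.0, eps=1e-2):
    """Dirichlet prior concentrated on the labeled class.
    c and eps are illustrative values, not the paper's settings."""
    alpha = np.full(num_classes, eps)
    alpha[label] = c
    return alpha

def dps_update(alpha_prev, y_pred, gamma=0.9):
    """Discounted evidence accumulation: alpha_t = gamma * alpha_{t-1} + y_hat_t,
    where y_pred is the model's softmax output for this sample at epoch t."""
    return gamma * alpha_prev + y_pred

def target_distribution(alpha):
    """A plausible soft target: the Dirichlet mean (assumption for this sketch)."""
    return alpha / alpha.sum()

# One sample, 5 classes, true label 2.
alpha = init_prior(label=2, num_classes=5)
y_hat = np.array([0.1, 0.1, 0.6, 0.1, 0.1])  # softmax prediction at epoch t
alpha = dps_update(alpha, y_hat)
soft_target = target_distribution(alpha)  # a valid distribution, peaked on class 2
```

Training would then supervise the network against `soft_target` (e.g., with a KL-divergence loss) instead of the one-hot label, with the prior keeping the target anchored to the original annotation early on while accumulated predictions gradually reshape it.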