Optimal Transport-Induced Samples against Out-of-Distribution Overconfidence
Deep neural networks (DNNs) often produce overconfident predictions on out-of-distribution (OOD) inputs, undermining their reliability in open-world environments. Singularities in semi-discrete optimal transport (OT) mark regions of semantic ambiguity, where classifiers are particularly prone to unwarranted high-confidence predictions. Motivated by this observation, we propose a principled framework to mitigate OOD overconfidence by leveraging the geometry of OT-induced singular boundaries. Specifically, we formulate an OT problem between a continuous base distribution and the latent embeddings of training data, and identify the resulting singular boundaries. By sampling near these boundaries, we construct a class of OOD inputs, termed optimal transport-induced OOD samples (OTIS), which are geometrically grounded and inherently semantically ambiguous. During training, a confidence suppression loss is applied to OTIS to guide the model toward more calibrated predictions in structurally uncertain regions. Extensive experiments show that our method significantly alleviates OOD overconfidence and outperforms state-of-the-art methods.
💡 Research Summary
The paper tackles the pervasive problem of deep neural networks (DNNs) producing overconfident predictions on out‑of‑distribution (OOD) inputs, which jeopardizes reliability in open‑world settings. Existing mitigation strategies either perform post‑hoc OOD detection or expose the model to heuristically generated proxy OOD samples (e.g., external datasets, input corruptions, class mixing). While effective to some degree, these approaches lack a principled connection to the regions where overconfidence is most likely to arise—namely, semantically ambiguous zones near class boundaries.
The authors observe that in semi‑discrete optimal transport (OT), where a continuous source measure (e.g., a Gaussian) is mapped onto a discrete target measure (the latent embeddings of training data), the optimal transport map is given by the gradient of a convex potential (Brenier’s theorem). This potential is the upper envelope of affine functions and induces a partition of the source domain into convex Laguerre cells. The interfaces between adjacent cells are non‑differentiable loci—transport singularities—where the transport direction changes abruptly. The paper argues that these singular sets correspond to structurally unstable zones that align with semantic transitions in classification tasks, and thus are natural candidates for generating OOD samples that provoke overconfidence.
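Concretely, the Brenier potential in this semi-discrete setting and the induced Laguerre cells can be written down explicitly (standard semi-discrete OT notation; y_i are the target support points and h_i the dual offsets):

```latex
u_h(x) = \max_{1 \le i \le n}\bigl(\langle x, y_i\rangle + h_i\bigr),
\qquad
\mathrm{Lag}_i(h) = \bigl\{\, x : \langle x, y_i\rangle + h_i \ge \langle x, y_j\rangle + h_j \ \ \forall j \,\bigr\}.
```

On the interior of Lag_i(h) the transport map is the constant gradient ∇u_h(x) = y_i; on the interfaces, where the maximum is attained by two or more indices, u_h is non-differentiable, and these interfaces are exactly the singular set described above.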
To exploit this insight, the authors first train an autoencoder to embed images into a compact latent space. The latent vectors of the training set become the support points of the discrete target measure ν. They then solve the semi‑discrete OT problem between a chosen continuous source distribution μ (Gaussian or uniform) and ν. Using Monte‑Carlo sampling, they estimate the μ‑volume of each Laguerre cell and optimize the offset vector h to match the desired cell masses (uniform in their experiments). This yields a power‑diagram partition of the latent space.
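The mass-matching step can be sketched as stochastic gradient ascent on the concave semi-discrete dual, where each step nudges the offsets toward cells whose Monte-Carlo mass falls short of its target. This is a minimal NumPy sketch under stated assumptions: `fit_offsets` is a hypothetical helper, and the learning rate, iteration count, and Gaussian source are illustrative choices, not the paper's exact solver.

```python
import numpy as np

def fit_offsets(Y, nu, n_mc=20000, lr=0.5, n_iters=500, seed=0):
    """Adjust offsets h so the mu-volume of each Laguerre cell matches the
    target mass nu, via stochastic ascent on the semi-discrete OT dual.
    Y: (n, d) target support points; nu: (n,) target cell masses."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    h = np.zeros(n)
    for _ in range(n_iters):
        Z = rng.standard_normal((n_mc, d))             # samples from mu = N(0, I)
        # Assign each sample to its Laguerre cell: argmax_i <z, y_i> + h_i.
        cell = np.argmax(Z @ Y.T + h, axis=1)
        vol = np.bincount(cell, minlength=n) / n_mc    # Monte-Carlo cell masses
        h += lr * (nu - vol)                           # dual gradient step
    return h
```

Grow an under-weighted cell by raising its offset; the dual is concave, so this simple fixed-step ascent is enough for a small illustration.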
From the resulting partition, they compute a geometric "singularity score" for each cell boundary S_ij as the angular deviation between the vectors y_i and y_j (the target support points of the two adjacent cells). Boundaries with the largest angular deviations are selected as the singular set S′, under the hypothesis that larger angular changes indicate sharper transport‑direction shifts and a higher likelihood of semantic ambiguity.
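The scoring and selection step can be sketched as follows. This is an illustrative NumPy sketch: the helper names, the `adjacency` input (pairs of neighboring cells, however computed), and the top-fraction threshold are assumptions, since the summary does not specify how adjacency is determined.

```python
import numpy as np

def singularity_scores(Y, adjacency):
    """Angular deviation between the target points of adjacent Laguerre cells.
    Larger angles suggest sharper transport-direction shifts across the boundary.
    Y: (n, d) target support points; adjacency: list of (i, j) neighbor pairs."""
    scores = {}
    for i, j in adjacency:
        cos = Y[i] @ Y[j] / (np.linalg.norm(Y[i]) * np.linalg.norm(Y[j]))
        scores[(i, j)] = float(np.arccos(np.clip(cos, -1.0, 1.0)))
    return scores

def select_singular(scores, top_frac=0.2):
    """Keep the top fraction of boundaries by angular score as the set S'."""
    k = max(1, int(len(scores) * top_frac))
    return sorted(scores, key=scores.get, reverse=True)[:k]
```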
For each selected singular boundary, the method estimates the mass centers c_i and c_j of the two adjacent cells. It then draws a random source sample z∼μ, computes inverse‑distance interpolation weights λ_i and λ_j, and constructs a smoothed transport extension ˜T(z)=λ_i T(c_i)+λ_j T(c_j). The resulting latent vector ˆy=˜T(z) is decoded back to image space, producing an Optimal Transport‑Induced OOD Sample (OTIS). Because OTIS are generated near transport singularities, they are intrinsically semantically ambiguous—often containing blended features from neighboring classes—yet they are grounded in the geometry of the data distribution rather than arbitrary noise.
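Since the semi-discrete transport map is constant on each cell (all of cell i maps to its target point, so T(c_i) = y_i), the smoothed extension reduces to an inverse-distance blend of the two cell targets. A minimal sketch (`otis_latent` is a hypothetical helper name; the normalization of the weights is an assumption consistent with "inverse-distance interpolation"):

```python
import numpy as np

def otis_latent(z, c_i, c_j, y_i, y_j, eps=1e-8):
    """Smoothed transport extension near a singular boundary:
    T~(z) = lam_i * T(c_i) + lam_j * T(c_j), with inverse-distance weights
    from z to the cell mass centers c_i, c_j and T(c_i) = y_i, T(c_j) = y_j."""
    w_i = 1.0 / (np.linalg.norm(z - c_i) + eps)
    w_j = 1.0 / (np.linalg.norm(z - c_j) + eps)
    lam_i = w_i / (w_i + w_j)
    lam_j = w_j / (w_i + w_j)
    # The returned latent vector would then be decoded to image space
    # to produce the OTIS input.
    return lam_i * y_i + lam_j * y_j
```

A source sample equidistant from both mass centers yields an even blend of the two targets, i.e., a latent sitting squarely between the two classes.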
During training, each mini‑batch consists of 50 % original in‑distribution (ID) samples (trained with standard cross‑entropy) and 50 % OTIS, trained with a confidence suppression loss L_sup = −(1/K) Σ_{k=1}^K log V_k(·), i.e., the cross‑entropy against the uniform distribution, where V_k denotes the softmax probability for class k. This loss is minimized when the model outputs a uniform distribution on OTIS, thereby reducing maximum confidence in structurally ambiguous regions.
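The suppression term can be sketched in NumPy as the cross-entropy against the uniform distribution over the K classes (a minimal illustration of the loss described above, not the authors' training code):

```python
import numpy as np

def suppression_loss(logits):
    """L_sup = -(1/K) * sum_k log softmax_k(logits), averaged over the batch.
    Minimized (value log K) when the softmax output is exactly uniform."""
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return float(-log_probs.mean())
```

A peaked (confident) prediction incurs a larger penalty than a uniform one, which is what pushes the model toward near-uniform outputs on OTIS.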
The authors evaluate the approach on several benchmarks (CIFAR‑10/100, SVHN) and architectures (ResNet‑18, WideResNet‑28‑10). Metrics include test error, mean maximum confidence on ID data (ID MMC), and mean maximum confidence on OOD data (OOD MMC). Table 1 shows that the proposed method consistently lowers OOD MMC—often below 10 %—while preserving or slightly improving ID accuracy compared to strong baselines such as Outlier Exposure (OE), Confidence Calibration (CCU), and recent OOD‑aware training schemes. Ablation studies varying the proportion of singular boundaries, latent dimensionality, and number of Monte‑Carlo samples demonstrate robustness of the method to hyper‑parameter choices.
Strengths of the work include: (1) a solid theoretical foundation linking OT singularities to semantic ambiguity; (2) a practical pipeline that operates in a low‑dimensional latent space, making OT computation tractable; (3) empirical evidence that geometry‑driven OOD samples are more effective than heuristic proxies. Limitations are acknowledged: the quality of OT‑induced samples depends on the autoencoder’s representation power; Monte‑Carlo estimation of cell volumes and singular scores incurs computational overhead, which may be challenging for very large datasets; and the method currently assumes a static target distribution (the training embeddings) without adapting to evolving feature spaces during training.
Future directions suggested include integrating more expressive latent encoders (e.g., contrastive or self‑supervised models), developing scalable OT solvers that avoid Monte‑Carlo sampling, and combining OT‑based sample generation with Bayesian uncertainty estimation or ensemble methods for further robustness.
In summary, the paper introduces a novel, theoretically grounded framework that leverages optimal transport singularities to generate semantically ambiguous OOD samples, and demonstrates that training with these samples effectively suppresses overconfident predictions on OOD inputs while maintaining high performance on in‑distribution data.