DAVIS: OOD Detection via Dominant Activations and Variance for Increased Separation
Detecting out-of-distribution (OOD) inputs is a critical safeguard for deploying machine learning models in the real world. However, most post-hoc detection methods operate on penultimate feature representations derived from global average pooling (GAP) – a lossy operation that discards valuable distributional statistics of the pre-pooling activation maps. We contend that these overlooked statistics, particularly channel-wise variance and dominant (maximum) activations, are highly discriminative for OOD detection. We introduce DAVIS, a simple and broadly applicable post-hoc technique that enriches feature vectors with these crucial statistics, directly addressing the information loss from GAP. Extensive evaluations show DAVIS sets a new benchmark across diverse architectures, including ResNet, DenseNet, and EfficientNet. It achieves significant reductions in the false positive rate (FPR95), with improvements of 48.26% on CIFAR-10 using ResNet-18, 38.13% on CIFAR-100 using ResNet-34, and 26.83% on ImageNet-1k benchmarks using MobileNet-v2. Our analysis reveals the underlying mechanism for this improvement, providing a principled basis for moving beyond the mean in OOD detection.
💡 Research Summary
The paper addresses the critical problem of out‑of‑distribution (OOD) detection for deep neural networks that are deployed in real‑world applications. While many recent post‑hoc OOD detectors operate on the penultimate feature vector obtained after global average pooling (GAP), GAP discards the spatial distribution of activations, keeping only a single mean value per channel. The authors argue that this loss of information is especially detrimental because OOD samples often exhibit distinctive patterns in the activation maps: they may contain unusually high peaks or display a different variance across spatial locations.
To recover these overlooked cues, the authors propose DAVIS (Dominant Activations and Variance for Increased Separation), a lightweight plug‑and‑play module that enriches the feature representation before the final fully‑connected layer. Two variants are introduced: (1) DAVIS(m), which replaces the GAP vector with the per‑channel maximum (the “dominant” activation), and (2) DAVIS(µ,σ), which augments the per‑channel mean with a scaled standard deviation (γ·σ). Both variants require no additional training; they simply compute extra statistics from the pre‑pooling activation maps and feed the resulting vector into the existing classifier head.
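The two variants amount to swapping the statistic that is pooled over each channel's spatial positions. A minimal NumPy sketch of this idea follows; the function name is ours, and reading DAVIS(µ,σ) as the combination µ + γ·σ is one plausible interpretation of "augments the mean with a scaled standard deviation", not the authors' released code:

```python
import numpy as np

def davis_features(fmap, variant="m", gamma=0.3):
    """Enriched pooling statistics from a pre-pooling activation map.

    fmap: array of shape (C, H, W) -- the feature map before GAP.
    variant "m":        per-channel maximum (the dominant activation),
                        used in place of the GAP mean.
    variant "mu_sigma": per-channel mean augmented with a gamma-scaled
                        standard deviation (here: mu + gamma * sigma).
    """
    flat = fmap.reshape(fmap.shape[0], -1)  # (C, H*W)
    if variant == "m":
        return flat.max(axis=1)
    if variant == "mu_sigma":
        return flat.mean(axis=1) + gamma * flat.std(axis=1)
    raise ValueError(f"unknown variant: {variant!r}")

# A single 64-channel, 8x8 activation map (random stand-in)
fmap = np.random.rand(64, 8, 8)
z_max = davis_features(fmap, "m")                    # DAVIS(m)
z_ms = davis_features(fmap, "mu_sigma", gamma=0.3)   # DAVIS(mu, sigma)
```

Either way the output is a length-C vector with the same shape as the GAP vector, so it can be fed into the existing fully-connected head without retraining.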
The method is evaluated on three scales of data: CIFAR‑10/100 (with six OOD test sets) and ImageNet‑1k (with four OOD benchmarks). A variety of backbone architectures are used, including ResNet‑18, ResNet‑34, DenseNet‑101, MobileNet‑v2, DenseNet‑121, ResNet‑50, and EfficientNet‑b0. Performance is measured with the standard OOD metrics FPR95 (false‑positive rate at 95 % true‑positive rate) and AUROC.
Results show that even the simplest DAVIS(m) alone dramatically improves over the baseline energy score: on CIFAR‑10 with ResNet‑18, FPR95 drops from 35.61 % to 18.21 %; on CIFAR‑100 with ResNet‑34, it falls from 59.43 % to 52.26 %. When combined with existing post‑hoc refinements such as DICE, ASH, ReAct, or SCALE, DAVIS yields further gains. For example, DAVIS(m)+DICE reduces FPR95 by 48.27 % relative to the baseline on CIFAR‑10 across four architectures, and similar relative reductions (≈38 %) are observed on CIFAR‑100. On ImageNet‑1k, DAVIS variants achieve average FPR95 reductions of roughly 10 % and AUROC improvements of up to 0.97 points, surpassing strong baselines.
A statistical analysis (Appendix B) demonstrates why the dominant activation and variance are effective. The per‑channel maximum creates a larger separation gap between ID and OOD distributions than the mean, as illustrated in Figure 1 of the paper. The variance captures the spread of activations, providing an orthogonal signal that further separates the two classes. By concatenating or combining these statistics, the feature space becomes more discriminative, allowing downstream scoring functions (energy, MSP, ODIN) to operate with higher confidence.
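This intuition can be reproduced with a synthetic toy example (our illustration, not the paper's data): two activation populations that share the same per-channel mean but differ in spread are nearly invisible to GAP, while the per-channel maximum and standard deviation separate them cleanly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 500 "activation maps" of 64 spatial positions each.
# ID and OOD share the same mean activation, but OOD activations are more
# spread out and therefore have higher peaks.
id_acts = rng.normal(loc=1.0, scale=0.3, size=(500, 64))
ood_acts = rng.normal(loc=1.0, scale=0.9, size=(500, 64))

def separation(stat):
    """Gap between ID and OOD score distributions, in pooled-std units."""
    a, b = stat(id_acts), stat(ood_acts)
    return abs(a.mean() - b.mean()) / (a.std() + b.std())

mean_sep = separation(lambda x: x.mean(axis=1))  # what GAP sees
max_sep = separation(lambda x: x.max(axis=1))    # dominant activation
std_sep = separation(lambda x: x.std(axis=1))    # variance cue

print(f"mean: {mean_sep:.2f}  max: {max_sep:.2f}  std: {std_sep:.2f}")
```

In this toy setting the mean-based separation is close to zero while the max- and std-based separations are large, mirroring the qualitative picture in Figure 1 of the paper.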
Importantly, DAVIS is compatible with any existing post‑hoc detector because it operates purely on the feature level before scoring. The authors show that adding DAVIS to ReAct, DICE, ASH, or SCALE never degrades performance and often yields additive improvements, confirming its role as a complementary enhancement rather than a replacement. The hyper‑parameter γ is found to be robust across datasets (typically 0.1–0.5), and the extra computational cost is negligible: computing per‑channel maxima and variances is O(N·K²), where N is the number of channels and K the spatial resolution, which is trivial on modern GPUs.
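Because DAVIS only changes the feature vector, any logit-based score can be applied unchanged afterwards. The sketch below pairs a DAVIS(m) feature with the standard energy score; the head weights W and b here are random stand-ins for a trained network's classifier, and all names are ours:

```python
import numpy as np

def energy_score(logits, T=1.0):
    """Energy OOD score: E(x) = -T * logsumexp(logits / T).
    Lower (more negative) energy indicates a more ID-like input."""
    m = logits.max(axis=-1)  # subtract the max for numerical stability
    return -m - T * np.log(np.exp((logits - m[..., None]) / T).sum(axis=-1))

# Toy stand-ins for a trained network's head and one pre-pooling map
C, num_classes = 64, 10
rng = np.random.default_rng(1)
W, b = rng.normal(size=(num_classes, C)), np.zeros(num_classes)
fmap = rng.random((C, 8, 8))

# DAVIS(m): the per-channel maximum replaces the GAP mean, then the
# unchanged linear head and energy score are applied as usual.
z = fmap.reshape(C, -1).max(axis=1)
logits = W @ z + b
score = energy_score(logits[None, :])[0]
```

Thresholding this score at the value giving 95 % TPR on ID data yields the FPR95 operating point; refinements such as ReAct, DICE, ASH, or SCALE would modify z or W before this scoring step.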
In summary, the paper identifies a fundamental limitation of GAP‑based OOD detectors, proposes a simple yet powerful remedy that leverages channel‑wise maxima and variance, and validates the approach across multiple datasets, architectures, and scoring functions. DAVIS sets new state‑of‑the‑art results while remaining easy to implement, making it a valuable addition to the toolbox of practitioners seeking reliable OOD detection in safety‑critical systems.