Catalyst: Out-of-Distribution Detection via Elastic Scaling

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Out-of-distribution (OOD) detection is critical for the safe deployment of deep neural networks. State-of-the-art post-hoc methods typically derive OOD scores from the output logits or from the penultimate feature vector obtained via global average pooling (GAP). We contend that this exclusive reliance on the logit or feature vector discards a rich, complementary signal: the raw channel-wise statistics of the pre-pooling feature map that are lost in GAP. In this paper, we introduce Catalyst, a post-hoc framework that exploits these under-explored signals. Catalyst computes an input-dependent scaling factor ($γ$) on the fly from these raw statistics (e.g., mean, standard deviation, and maximum activation). This $γ$ is then fused with the existing baseline score, modulating it multiplicatively ("elastic scaling") to push the ID and OOD score distributions further apart. We demonstrate that Catalyst is a generalizable framework: it integrates seamlessly with logit-based methods (e.g., Energy, ReAct, SCALE) and also provides a significant boost to distance-based detectors such as KNN. As a result, Catalyst achieves substantial and consistent performance gains, reducing the average False Positive Rate by 32.87% on CIFAR-10 (ResNet-18), 27.94% on CIFAR-100 (ResNet-18), and 22.25% on ImageNet (ResNet-50). Our results highlight the untapped potential of pre-pooling statistics and demonstrate that Catalyst is complementary to existing OOD detection approaches.


💡 Research Summary

The paper tackles the critical problem of out‑of‑distribution (OOD) detection for deep neural networks, proposing a simple yet powerful post‑hoc framework called Catalyst. Existing state‑of‑the‑art OOD detectors (e.g., Energy, ReAct, SCALE) rely exclusively on the model’s logits or on the feature vector obtained after global average pooling (GAP). This reliance discards the rich channel‑wise statistics that exist in the pre‑pooling activation map, such as per‑channel mean, standard deviation, and maximum activation. The authors argue that these statistics contain complementary information that can help separate in‑distribution (ID) from OOD samples.

Catalyst extracts three statistics from the penultimate layer’s activation map g(x) of shape n × k × k:

  1. Channel mean µ(x) – identical to the GAP feature vector.
  2. Channel standard deviation σ(x) – measures spatial variability within each channel.
  3. Channel maximum m(x) – captures the strongest response per channel.

Because OOD samples often produce extreme values, each statistic is clipped at a configurable threshold c to prevent a few extreme activations from dominating the scaling factor. The clipped vector f̄(x) is summed across channels to obtain an input-dependent scalar γ(x) = ∑ᵢ f̄ᵢ(x). This γ is then fused with any existing OOD score S(x; θ) through either multiplicative (γ·S) or additive (γ+S) combination. The authors adopt the multiplicative “elastic scaling” as the primary method, noting its robustness to hyper-parameter changes and its ability to stretch or shrink the score distribution elastically.
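The statistic extraction, clipping, and summation described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction from the summary, not the authors' code; the function names and the fusion helper are mine, and the exact reduction order in the paper may differ:

```python
import numpy as np

def catalyst_gamma(feature_map, statistic="max", c=1.0):
    """Compute Catalyst's input-dependent scaling factor gamma.

    feature_map: pre-pooling activation map g(x) of shape (n, k, k),
                 i.e. n channels over a k x k spatial grid.
    statistic:   which per-channel statistic to use ("mean", "std", "max").
    c:           clipping threshold applied element-wise before summation.
    """
    flat = feature_map.reshape(feature_map.shape[0], -1)  # (n, k*k)
    if statistic == "mean":    # mu(x): identical to the GAP feature vector
        f = flat.mean(axis=1)
    elif statistic == "std":   # sigma(x): spatial variability per channel
        f = flat.std(axis=1)
    else:                      # m(x): strongest response per channel
        f = flat.max(axis=1)
    f_bar = np.minimum(f, c)   # clip extreme values at threshold c
    return f_bar.sum()         # gamma(x) = sum_i f_bar_i(x)

def catalyst_fuse(baseline_score, gamma, fusion="multiplicative"):
    """Fuse gamma with any baseline OOD score S(x; theta)."""
    if fusion == "multiplicative":   # the paper's primary "elastic scaling"
        return gamma * baseline_score
    return gamma + baseline_score    # additive variant from the ablations
```

In this sketch `catalyst_fuse` is deliberately score-agnostic: it only assumes the baseline convention that a higher score means more in-distribution.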

Catalyst is model‑agnostic and method‑agnostic: it can be applied on top of Energy, MSP, ODIN, ReAct, DICE, ASH, SCALE, and even distance‑based detectors such as K‑Nearest Neighbour (KNN). The paper provides a formal analysis (Appendix B) showing that scaling the score by γ increases the statistical distance between ID and OOD score distributions, thereby improving detection metrics.
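As a concrete example of pairing γ with a logit-based baseline, here is a minimal, self-contained sketch using the standard Energy score (the helper names are mine, and the temperature and sign conventions are assumptions based on the usual Energy formulation, not details confirmed by the summary):

```python
import numpy as np

def energy_score(logits, T=1.0):
    """Energy-based OOD score: T * logsumexp(logits / T).

    Higher values indicate in-distribution. Computed with the
    max-shift trick for numerical stability.
    """
    z = logits / T
    m = z.max()
    return T * (m + np.log(np.exp(z - m).sum()))

def catalyst_energy(logits, gamma, T=1.0):
    # Elastic scaling: multiply the baseline score by the
    # input-dependent gamma derived from pre-pooling statistics.
    return gamma * energy_score(logits, T)
```

Because γ is itself computed from the same forward pass, the fused score requires no extra network evaluations, which matches the paper's claim of negligible overhead.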

Experimental evaluation spans three standard benchmarks: CIFAR‑10, CIFAR‑100, and ImageNet‑1k, using a variety of architectures (ResNet‑18/34/50, DenseNet‑121, MobileNet‑v2). The baseline is the Energy score; Catalyst is evaluated both alone (using each statistic separately) and in combination with other methods (e.g., Catalyst + ReAct). Results are reported as average FPR@95% (FPR95) and AUROC over six OOD test sets per benchmark.

Key findings:

  • On CIFAR‑10 with ResNet‑18, Catalyst + ReAct (using the maximum statistic) reduces FPR95 from 29.76 % (ReAct alone) to 13.19 %, a 55 % relative reduction, while AUROC rises from 91.70 % to 95.19 % (higher is better).
  • On CIFAR‑100, Catalyst + ReAct (maximum) cuts FPR95 from 57.76 % to 34.66 %, again a substantial gain.
  • On ImageNet‑1k (ResNet‑50), Catalyst alone lowers average FPR95 by 22.25 % relative to the Energy baseline, and Catalyst + ReAct yields further improvements.
  • When applied to KNN, Catalyst still provides a noticeable boost, confirming its generality beyond logit‑based scores.
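To illustrate the generality claim in the last finding, here is a rough sketch of how the same γ could modulate a distance-based detector such as KNN. The fusion sign handling and helper names here are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def knn_ood_score(feat, train_feats, k=5):
    """Distance-based OOD score: negative Euclidean distance to the
    k-th nearest ID training feature (higher => more in-distribution)."""
    dists = np.linalg.norm(train_feats - feat, axis=1)
    return -np.sort(dists)[k - 1]

def catalyst_knn_score(feat, train_feats, gamma, k=5):
    # One plausible fusion: scale the negative-distance score by gamma,
    # stretching the gap between ID and OOD score distributions.
    return gamma * knn_ood_score(feat, train_feats, k)
```

The point of the sketch is only that γ attaches to the score itself, so any detector producing a scalar ID-vs-OOD score can be modulated the same way.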

Ablation studies explore:

  1. Statistic choice – σ and m generally outperform µ, indicating that variability and peak activation are more discriminative for OOD.
  2. Clipping threshold c – performance is stable for c in the range 0.5–2.0 (dataset‑dependent).
  3. Multiplicative vs. additive fusion – both improve performance, but multiplicative scaling is less sensitive to hyper‑parameters and yields more consistent gains.

The method adds negligible computational overhead (a few per‑channel reductions and a sum) and requires no retraining, making it attractive for real‑world deployment where models are already fixed. Limitations include the need to tune the clipping threshold for new domains and a modest increase in computation for very wide networks with thousands of channels.

In summary, Catalyst introduces an elastic scaling factor derived from under‑exploited pre‑pooling channel statistics, and demonstrates that this simple augmentation can substantially improve OOD detection across diverse datasets, architectures, and baseline methods. The work opens a new direction for leveraging internal activation statistics in safety‑critical AI systems.

