Buffer layers for Test-Time Adaptation
In recent advancements in Test Time Adaptation (TTA), most existing methodologies focus on updating normalization layers to adapt to the test domain. However, the reliance on normalization-based adaptation presents key challenges. First, normalization layers such as Batch Normalization (BN) are highly sensitive to small batch sizes, leading to unstable and inaccurate statistics. Moreover, normalization-based adaptation is inherently constrained by the structure of the pre-trained model, as it relies on training-time statistics that may not generalize well to unseen domains. These issues limit the effectiveness of normalization-based TTA approaches, especially under significant domain shift. In this paper, we introduce a novel paradigm based on the concept of a Buffer layer, which addresses the fundamental limitations of normalization layer updates. Unlike existing methods that modify the core parameters of the model, our approach preserves the integrity of the pre-trained backbone, inherently mitigating the risk of catastrophic forgetting during online adaptation. Through comprehensive experimentation, we demonstrate that our approach not only outperforms traditional methods in mitigating domain shift and enhancing model robustness, but also exhibits strong resilience to forgetting. Furthermore, our Buffer layer is modular and can be seamlessly integrated into nearly all existing TTA frameworks, resulting in consistent performance improvements across various architectures. These findings validate the effectiveness and versatility of the proposed solution in real-world domain adaptation scenarios. The code is available at https://github.com/hyeongyu-kim/Buffer_TTA.
💡 Research Summary
Test‑time adaptation (TTA) aims to adjust a pretrained model to the distribution of unlabeled target data during inference, without accessing source data. Most recent TTA methods focus on updating batch‑normalization (BN) statistics and affine parameters, because BN is a convenient entry point for adaptation. However, BN‑based adaptation suffers from two fundamental drawbacks. First, BN relies on accurate batch statistics, which become unreliable when the test batch size is small (e.g., 2–4 samples), leading to unstable or even harmful updates. Second, BN is tightly coupled to the pretrained backbone; modifying its parameters can disturb the learned feature representations and cause catastrophic forgetting of the source knowledge, especially under prolonged online adaptation.
To overcome these limitations, the authors propose a lightweight, modular “Buffer layer” that is inserted in parallel to the backbone but never alters the backbone’s weights. Each Buffer layer consists of a 1×1 convolution for channel reduction/expansion followed by a 3×3 convolution for local feature refinement. The output of the Buffer is scaled by a learnable scalar and added residually to the original activation. Crucially, only the Buffer parameters are updated during test time, while the backbone (including its BN layers) remains frozen. This design yields three key advantages: (1) the pretrained representation is preserved, eliminating catastrophic forgetting; (2) adaptation does not depend on batch statistics, so it remains stable even with very small batches; and (3) the extra computational and memory overhead is minimal because the Buffer is shallow and back‑propagation is confined to it.
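The description above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and argument names, the channel-reduction ratio, and the zero initialization of the scaling scalar are all guesses for illustration (the paper only specifies a 1×1 conv, a 3×3 conv, a learnable scale, and a residual connection).

```python
import torch
import torch.nn as nn

class BufferLayer(nn.Module):
    """Illustrative sketch of a Buffer layer: a small residual branch
    attached to a frozen backbone activation. Names and hyperparameters
    (reduction ratio, zero-init scale) are assumptions, not from the paper."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        # 1x1 conv for channel reduction ...
        self.reduce = nn.Conv2d(channels, hidden, kernel_size=1, bias=False)
        # ... followed by a 3x3 conv for local feature refinement
        # (here it also expands back to the original channel count)
        self.refine = nn.Conv2d(hidden, channels, kernel_size=3,
                                padding=1, bias=False)
        # learnable scalar gating the residual branch; initialized to 0
        # so the layer starts as an identity and cannot disturb the
        # pretrained features before any adaptation step
        self.scale = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # residual addition onto the original activation
        return x + self.scale * self.refine(self.reduce(x))

# Example: the layer is shape-preserving and, at init, an exact identity.
layer = BufferLayer(channels=8)
x = torch.randn(2, 8, 5, 5)
y = layer(x)
```

Because the branch is additive and gated by a scalar, it can be dropped into a backbone without changing any existing weights; during test time only `reduce`, `refine`, and `scale` would receive gradients.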
The paper evaluates the Buffer layer across a wide range of state‑of‑the‑art TTA algorithms—TENT, EA‑TTA, SAR, DeYO, CMF, ROID, among others—by simply replacing the original update target (“@BN”) with the Buffer (“@Buffer”). All other components (learning rate, loss functions such as entropy minimization or consistency regularization) are kept unchanged, allowing a clean ablation of “what to update”. Experiments are conducted on several corruption benchmarks (CIFAR‑10‑C, CIFAR‑100‑C, ImageNet‑C) and on CIFAR‑10‑W, using diverse backbones (WRN‑28, WRN‑40, ResNeXt).
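Swapping the update target from "@BN" to "@Buffer" amounts to changing which parameters the TTA optimizer sees. The sketch below illustrates that setup with a toy model; `TinyBuffer`, the layer placement, and the optimizer settings are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class TinyBuffer(nn.Module):
    """Toy stand-in for a Buffer module (illustrative, not the paper's)."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.scale = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.scale * self.proj(x)

# A toy "backbone" with a BN layer and one attached Buffer module.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8),
    TinyBuffer(8),
)

# "@Buffer": freeze everything, including BN affine parameters ...
for p in model.parameters():
    p.requires_grad_(False)
model.eval()  # eval mode also keeps BN running statistics fixed

# ... then re-enable gradients only for Buffer parameters.
buffer_params = []
for m in model.modules():
    if isinstance(m, TinyBuffer):
        for p in m.parameters():
            p.requires_grad_(True)
            buffer_params.append(p)

# The TTA loss (e.g. entropy minimization) and learning rate stay as in
# the original method; only the parameter group changes.
optimizer = torch.optim.SGD(buffer_params, lr=1e-3)
```

Confining `requires_grad` to the Buffer also confines back-propagation to it, which is why the memory and compute overhead of adaptation stays small.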
Results consistently show that Buffer‑based adaptation outperforms BN‑based adaptation, especially in the low‑batch regime. For example, on WRN‑28 with batch size 2, TENT @BN yields an 82.56 % error rate, whereas TENT @Buffer reduces it to 37.05 %, a reduction of 45.5 percentage points. Similar gains are observed for EA‑TTA, SAR, DeYO, and CMF across both CIFAR‑10‑C and CIFAR‑100‑C. When the Buffer layer is combined with the original BN update (denoted “@BN+Buffer”), additional improvements are often observed, demonstrating that the Buffer acts as a complementary adaptation module rather than a mere replacement.
Beyond accuracy, the Buffer layer’s modularity enables seamless integration into any TTA pipeline without architectural changes. Its lightweight nature keeps inference latency low, making it suitable for resource‑constrained or real‑time deployments. Moreover, because the backbone is never altered, the method inherently mitigates catastrophic forgetting, a problem that plagues many online adaptation schemes that continuously fine‑tune the whole network.
In summary, the paper reframes TTA from “how to update” to “what to update”, introducing a dedicated auxiliary network that isolates adaptation from the core model. The Buffer layer offers a practical, general‑purpose solution that addresses the instability of BN under small batches, preserves pretrained knowledge, and delivers consistent performance gains across a variety of benchmarks and adaptation strategies. Future work may explore richer Buffer architectures, meta‑learning objectives, or multi‑task extensions, but the current study already establishes the Buffer layer as a compelling building block for robust, source‑free test‑time adaptation.