Stylized Synthetic Augmentation further improves Corruption Robustness

Notice: This research summary and analysis were generated automatically using AI. For complete accuracy, please refer to the original arXiv source.

This paper proposes a training data augmentation pipeline that combines synthetic image data with neural style transfer in order to address the vulnerability of deep vision models to common corruptions. We show that although applying style transfer to synthetic images degrades their quality with respect to the common Fréchet Inception Distance (FID) metric, these images are surprisingly beneficial for model training. We conduct a systematic empirical analysis of the effects of both augmentations and their key hyperparameters on the performance of image classifiers. Our results demonstrate that stylization and synthetic data complement each other well and can be combined with popular rule-based data augmentation techniques such as TrivialAugment, while being incompatible with others. Our method achieves state-of-the-art corruption robustness on several small-scale image classification benchmarks, reaching 93.54%, 74.9%, and 50.86% robust accuracy on CIFAR-10-C, CIFAR-100-C, and TinyImageNet-C, respectively.


💡 Research Summary

The paper tackles the well‑known vulnerability of deep vision models to real‑world image corruptions (noise, blur, weather effects, etc.) by introducing a novel data‑augmentation pipeline that fuses two model‑guided techniques: synthetic image generation and neural style transfer (NST). Synthetic images are produced with a state‑of‑the‑art diffusion model (EDM) yielding a pool of one million samples. NST is applied using the AdaIN formulation, which transfers the statistical moments (mean and variance) of a style image’s feature maps onto a content image’s feature maps, allowing a smooth interpolation between the original and fully stylized appearance via a strength parameter α.
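The AdaIN operation described above can be sketched compactly. The following is a minimal NumPy illustration (the paper applies it to VGG-style encoder feature maps in a full NST network; shapes and the `adain` name here are for illustration only):

```python
import numpy as np

def adain(content, style, alpha=1.0, eps=1e-5):
    """AdaIN sketch: align channel-wise mean/std of content features
    to those of style features. content, style: arrays of shape (C, H, W).
    alpha interpolates between the original (0) and fully stylized (1) result."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    # Normalize content statistics, then re-scale with style statistics.
    stylized = (content - c_mean) / c_std * s_std + s_mean
    return alpha * stylized + (1 - alpha) * content
```

With `alpha = 0` the content features pass through unchanged; with `alpha = 1` each channel exactly adopts the style image's mean and standard deviation.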

The pipeline is governed by five key hyper‑parameters: λ (the proportion of synthetic images in each batch), λ_o and λ_s (the probabilities of applying NST to original and synthetic images, respectively), and α_o and α_s (the style strengths for each image type). Systematic grid searches reveal non‑trivial optima (approximately λ ≈ 0.5, λ_s ≈ 0.7, λ_o ≈ 0.3, α_s ≈ 0.8, α_o ≈ 0.4), indicating that heavy stylization of synthetic data combined with modest stylization of real data yields the best corruption robustness.
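The per-batch logic implied by these hyper-parameters can be sketched as follows. This is a hypothetical reconstruction using the approximate optima quoted above; the `stylize` callback and function names are illustrative, not the authors' code:

```python
import random

# Approximate optima reported in the summary (assumed, not exact).
LAMBDA = 0.5      # fraction of synthetic images per batch
LAMBDA_S = 0.7    # probability of stylizing a synthetic image
LAMBDA_O = 0.3    # probability of stylizing an original image
ALPHA_S = 0.8     # style strength for synthetic images
ALPHA_O = 0.4     # style strength for original images

def augment_batch(originals, synthetics, stylize, batch_size=128):
    """stylize(img, alpha) is assumed to apply AdaIN-based NST with a
    randomly chosen style image. Returns a mixed, shuffled batch."""
    n_syn = int(LAMBDA * batch_size)
    batch = []
    for img in random.sample(synthetics, n_syn):
        batch.append(stylize(img, ALPHA_S) if random.random() < LAMBDA_S else img)
    for img in random.sample(originals, batch_size - n_syn):
        batch.append(stylize(img, ALPHA_O) if random.random() < LAMBDA_O else img)
    random.shuffle(batch)
    return batch
```

The key design point is that the two populations are stylized independently: synthetic images are stylized often and strongly, real images rarely and mildly.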

A striking observation is that NST degrades the Fréchet Inception Distance (FID) of synthetic images, suggesting poorer visual fidelity, yet models trained on these stylized synthetic samples outperform those trained on higher‑fidelity but non‑stylized data. The authors argue that NST strips away unrealistic textures from synthetic images, forcing the network to focus on shape cues that are more invariant to corruptions. This aligns with prior findings that training on Stylized ImageNet reduces texture bias, but the current work uniquely demonstrates the synergy between synthetic data and NST.
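For context, the FID metric mentioned here fits Gaussians to Inception feature distributions of real (r) and generated (g) images and compares them as

```latex
\mathrm{FID}(r, g) = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```

where μ and Σ are the mean and covariance of the features. Since stylization shifts low-level texture statistics, it moves (μ_g, Σ_g) away from the real-data statistics, which explains why FID worsens even when the images become more useful for robust training.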

The authors evaluate the method on three benchmark classification datasets—CIFAR‑10, CIFAR‑100, and TinyImageNet—using their corrupted counterparts (CIFAR‑10‑C, CIFAR‑100‑C, and TinyImageNet‑C) for robustness testing. The primary architecture is WideResNet‑28‑4, with additional experiments on DenseNet‑201‑12, ResNeXt‑29‑32x4d, and a Vision Transformer (ViT‑B‑16). The augmentation pipeline is combined with TrivialAugment (TA), a simple yet effective rule‑based policy. When paired with TA, the stylized‑synthetic pipeline yields state‑of‑the‑art robust accuracies: 93.54% on CIFAR‑10‑C, 74.9% on CIFAR‑100‑C, and 50.86% on TinyImageNet‑C. These numbers surpass previous bests (≈92% on CIFAR‑10‑C) while preserving clean‑test accuracy, thereby mitigating the usual accuracy‑robustness trade‑off.
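The robust-accuracy figures above are, by the usual convention for the -C benchmarks, averages over all corruption types and severity levels. A minimal sketch of that aggregation, assuming accuracies have already been measured per corruption and severity (the exact averaging convention used by the paper is not stated in this summary):

```python
import numpy as np

def robust_accuracy(acc_by_corruption):
    """acc_by_corruption: {corruption_name: [accuracy at each severity 1..5]}.
    Averages over severities within each corruption, then over corruptions."""
    per_corruption = [np.mean(accs) for accs in acc_by_corruption.values()]
    return float(np.mean(per_corruption))
```

This single scalar is what allows direct comparison across methods and datasets such as CIFAR‑10‑C and TinyImageNet‑C.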

Conversely, the authors find that the pipeline is incompatible with more complex rule‑based policies such as AutoAugment, AugMix, or NoisyMix; these combinations actually degrade performance. The likely cause is that NST dramatically alters global color and contrast statistics, which interferes with the expectations of those policies that rely on consistent photometric distributions.

The paper’s contributions are fourfold: (1) introduction of the first augmentation scheme that jointly leverages synthetic data and NST, (2) empirical evidence that NST‑induced FID degradation does not impede—and may even help—training for corruption robustness, (3) a thorough ablation showing which rule‑based augmentations complement or conflict with the proposed method, and (4) demonstration of consistent, superior robustness across multiple architectures and datasets.

Limitations include the computational overhead of NST (especially for high‑resolution images) and the reliance on a fixed style‑image pool (1,000 paintings) which may limit diversity. Moreover, the disconnect between traditional image‑quality metrics (FID) and downstream robustness suggests a need for new evaluation criteria for augmentation quality. Future work could explore lightweight style‑transfer networks, adaptive α scheduling via meta‑learning, and scaling the approach to large‑scale datasets such as ImageNet.

Overall, the study provides a compelling case that stylizing synthetic data can bridge the appearance gap between generated and real images, yielding models that are both accurate and markedly more robust to a wide spectrum of real‑world corruptions.

