FD$^2$: A Dedicated Framework for Fine-Grained Dataset Distillation
Dataset distillation (DD) compresses a large training set into a small synthetic set, reducing storage and training cost, and has shown strong results on general benchmarks. Decoupled DD further improves efficiency by splitting the pipeline into pretraining, sample distillation, and soft-label generation. However, existing decoupled methods largely rely on coarse class-label supervision and optimize samples within each class in a nearly identical manner. On fine-grained datasets, this often yields distilled samples that (i) retain large intra-class variation with subtle inter-class differences and (ii) become overly similar within the same class, limiting localized discriminative cues and hurting recognition. To address these problems, we propose FD$^{2}$, a dedicated framework for Fine-grained Dataset Distillation. FD$^{2}$ localizes discriminative regions and constructs fine-grained representations for distillation. During pretraining, counterfactual attention learning aggregates discriminative representations to update class prototypes. During distillation, a fine-grained characteristic constraint aligns each sample with its class prototype while repelling others, and a similarity constraint diversifies attention across same-class samples. Experiments on multiple fine-grained and general datasets show that FD$^{2}$ integrates seamlessly with decoupled DD and improves performance in most settings, indicating strong transferability.
💡 Research Summary
Dataset Distillation (DD) aims to compress a large training set into a tiny synthetic one while preserving the original training utility. Recent decoupled DD methods (e.g., SRe2L, SRe2L++) improve efficiency by separating pre‑training, sample distillation, and soft‑label generation, but they rely only on coarse class‑level supervision. When applied to fine‑grained datasets—where intra‑class variation is large and inter‑class differences are subtle—these methods suffer from two major issues: (i) the distilled set still exhibits the undesirable fine‑grained characteristic (high intra‑class dispersion, low inter‑class separability), and (ii) samples within the same class become overly similar, limiting the diversity of discriminative cues.
FD² (Fine‑grained Dataset Distillation) addresses both problems by augmenting the decoupled pipeline with attention‑based fine‑grained supervision. The framework consists of three stages:
- Pre-training with Counterfactual Attention Learning (CAL). A backbone network is coupled with a CAL branch that produces multiple attention maps for each image. CAL builds a factual representation by aggregating the backbone feature with its attention maps, then builds a counterfactual representation by replacing the attention maps with perturbed ones while keeping the feature fixed. The difference between the factual and counterfactual logits (the "effect") serves as an additional loss, encouraging the model to focus on truly discriminative regions. Simultaneously, CAL maintains a normalized prototype vector for each class, updated online with a momentum rule and a center regularizer that pulls the factual representation toward its class prototype.
- Distillation with Two Novel Constraints. Distilled images are organized into same-class groups of size $N_S$. For each sample, its backbone feature $z$ and attention maps $A$ are obtained from the pretrained teacher.
  - Fine-grained characteristic constraint ($L_F$): aligns the sample's representation with its own class prototype while pushing it away from the prototypes of other classes.
  - Similarity constraint: penalizes overly similar attention maps among the $N_S$ same-class samples, diversifying the discriminative cues captured by the distilled set.
- Soft-label generation, following the standard decoupled pipeline.
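The counterfactual "effect" and the momentum prototype update from the pre-training stage can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact formulation: the pooling scheme, the classifier `W`, the perturbation (uniform random attention), and the momentum value are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_pool(feat, attn):
    # feat: (C, H*W) backbone feature map; attn: (M, H*W) attention maps.
    # Aggregate the feature under each attention map -> (M*C,) representation.
    pooled = attn @ feat.T / attn.shape[1]          # (M, C)
    return pooled.reshape(-1)

def counterfactual_effect(feat, attn, W):
    """Difference between factual and counterfactual logits (the "effect").
    W: (num_classes, M*C) linear classifier, hypothetical for this sketch."""
    factual = W @ attention_pool(feat, attn)
    perturbed_attn = rng.uniform(size=attn.shape)   # replace attention, keep feature
    counterfactual = W @ attention_pool(feat, perturbed_attn)
    return factual - counterfactual                 # used as an extra training signal

def update_prototype(proto, rep, momentum=0.9):
    """Online momentum update of a normalized class prototype."""
    proto = momentum * proto + (1.0 - momentum) * rep
    return proto / np.linalg.norm(proto)
```

Training would maximize the effect logit of the true class (e.g., via cross-entropy on the effect), so that the prediction genuinely depends on where the model attends rather than on the raw feature alone.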
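The two distillation constraints can likewise be sketched in NumPy. This assumes an InfoNCE-style contrastive form for the prototype alignment term and a mean pairwise cosine similarity for the attention-diversity term; the paper's exact losses may differ.

```python
import numpy as np

def fine_grained_loss(z, prototypes, y, tau=0.1):
    """Pull z toward its class prototype, push it away from the others
    (InfoNCE-style sketch of the fine-grained characteristic constraint)."""
    z = z / np.linalg.norm(z)
    P = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = P @ z / tau                 # cosine similarity to every prototype
    sims = sims - sims.max()           # numerical stability
    return -np.log(np.exp(sims[y]) / np.exp(sims).sum())

def attention_similarity_loss(attn_maps):
    """Penalize pairwise cosine similarity of attention maps across the
    same-class group, encouraging diverse attended regions."""
    A = attn_maps.reshape(len(attn_maps), -1)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    S = A @ A.T                        # pairwise cosine similarities
    n = len(A)
    return S[~np.eye(n, dtype=bool)].mean()
```

Minimizing the second term drives the off-diagonal similarities toward zero, so the samples in a group stop attending to the same discriminative region.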