ACIL: Active Class Incremental Learning for Image Classification
Continual learning (or class incremental learning) is a realistic learning scenario for computer vision systems, where deep neural networks are trained on episodic data, and the data from previous episodes are generally inaccessible to the model. Existing research in this domain has primarily focused on avoiding catastrophic forgetting, which occurs due to the continuously changing class distributions in each episode and the inaccessibility of the data from previous episodes. However, these methods assume that all the training samples in every episode are annotated; this not only incurs a huge annotation cost, but also results in a waste of annotation effort, since most of the samples in a given episode will not be accessible to the model in subsequent episodes. Active learning algorithms identify the salient and informative samples from large amounts of unlabeled data and are instrumental in reducing the human annotation effort in training a deep neural network. In this paper, we propose ACIL, a novel active learning framework for class incremental learning settings. We exploit a criterion based on uncertainty and diversity to identify the exemplar samples that need to be annotated in each episode and are appended to the data in the next episode. Such a framework can drastically reduce annotation cost while also mitigating catastrophic forgetting. Our extensive empirical analyses on several vision datasets corroborate the promise and potential of our framework against relevant baselines.
💡 Research Summary
The paper introduces ACIL (Active Class Incremental Learning), a novel framework that integrates active learning into the class incremental learning (CIL) paradigm to simultaneously address two major challenges: the high annotation cost traditionally assumed in CIL and the problem of catastrophic forgetting when new classes are introduced over time. In a typical CIL setting, data arrives in a sequence of episodes, each containing samples from a set of classes that have not been seen before. Existing CIL methods assume that all samples in every episode are fully labeled, which is unrealistic for many real‑world applications where labeling is expensive and storage constraints limit long‑term data retention.
ACIL relaxes this assumption by allowing only a small, fixed‑size exemplar budget k per episode. At episode n, the data is split into a small labeled subset X_L^n, a large unlabeled pool X_U^n, and the exemplar set E_{n‑1} carried over from the previous episode. The algorithm must select exactly k samples to be annotated and added to the exemplar set E_n, which will be used in the next episode. The selection budget is divided between X_U^n and E_{n‑1} proportionally to the number of distinct classes present in each set, ensuring that both new and previously seen classes receive representation. No samples are taken from X_L^n because this set is already labeled and typically much smaller than X_U^n.
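The proportional budget split described above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function name, signature, and the tie-breaking rule (giving the rounding remainder to the exemplar set) are assumptions.

```python
def split_budget(k, n_new_classes, n_old_classes):
    """Divide the per-episode annotation budget k between the unlabeled
    pool X_U^n and the previous exemplar set E_{n-1}, proportionally to
    the number of distinct classes each contains."""
    total = n_new_classes + n_old_classes
    # share of the budget allotted to the unlabeled pool (new classes)
    k_unlabeled = round(k * n_new_classes / total)
    # the remainder goes to re-sampling the old exemplar set,
    # so the two parts always sum exactly to k
    k_exemplar = k - k_unlabeled
    return k_unlabeled, k_exemplar
```

For example, with a budget of 100 samples, 10 new classes in the current episode, and 40 previously seen classes, the split is 20 samples from the unlabeled pool and 80 from the old exemplar set, reflecting the dominance of earlier classes.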
The core of the selection strategy is a joint uncertainty–diversity criterion. Uncertainty is measured by Shannon entropy of the model’s softmax output for each unlabeled sample, capturing how ambiguous the current model’s prediction is. Diversity is enforced by clustering the feature embeddings of candidate samples using a weighted k‑means algorithm, where each sample’s weight equals its uncertainty. The objective minimizes the weighted within‑cluster variance, effectively grouping similar samples while preferring those with higher uncertainty. After clustering into k clusters, the sample closest to each cluster centroid is chosen for annotation. This procedure is applied independently to the unlabeled pool and to the previous exemplar set, guaranteeing that the newly formed exemplar set E_n contains a balanced and informative mix of old and new class instances.
Training in each episode proceeds on the union of the newly labeled data X_L^n and the carried‑over exemplars E_{n‑1}. Because class frequencies can be highly imbalanced (the new episode may contain many samples of each new class, while the exemplar set holds only a few per old class), a weighted cross‑entropy loss is employed, with class weights inversely proportional to the number of samples per class. To further mitigate forgetting, knowledge distillation is incorporated: the model from the previous episode M_{n‑1} is applied to the exemplar set E_{n‑1}, and a distillation loss L_D is computed between its soft targets and the current model M_n’s outputs. The total loss is L = L_WCE + λ L_D, where λ balances the classification and distillation objectives.
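The combined objective can be written out per sample as follows. This is a hedged sketch: the choice of KL divergence for L_D and the exact inverse-frequency weighting (N / count per class) are common conventions assumed here, not formulas restated from the paper.

```python
import math

def weighted_cross_entropy(probs, label, class_weights):
    """L_WCE on one sample: negative log-likelihood of the true class,
    scaled by a weight inversely proportional to that class's frequency."""
    return -class_weights[label] * math.log(probs[label])

def distillation_loss(teacher_probs, student_probs):
    """L_D as KL divergence between the previous model M_{n-1}'s soft
    targets and the current model M_n's predictions (one common choice)."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)

def total_loss(probs, label, class_weights, teacher_probs, lam):
    """L = L_WCE + lambda * L_D, as described above."""
    return (weighted_cross_entropy(probs, label, class_weights)
            + lam * distillation_loss(teacher_probs, probs))
```

When the current model agrees perfectly with the previous one, L_D vanishes and only the weighted classification term remains; as the predictions on old exemplars drift, the λ-scaled distillation term penalizes the drift, which is how the objective curbs forgetting.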
Extensive experiments on CIFAR‑100, ImageNet‑Subset, and TinyImageNet demonstrate that ACIL can reduce the annotation budget to as low as 5‑10 % of the total data while maintaining classification accuracy comparable to state‑of‑the‑art CIL methods that require full labeling. Compared against recent active learning baselines such as Coreset and BADGE, ACIL achieves superior class coverage because it deliberately samples from both the current unlabeled pool and the previous exemplar set, thereby preserving knowledge of earlier classes. The memory footprint remains constant (fixed k exemplars), and the additional computational overhead is limited to a weighted k‑means clustering step, making the approach practical for real‑time or resource‑constrained deployments.
In summary, ACIL offers a realistic solution for continual visual learning: it drastically cuts human labeling effort, respects a fixed memory budget, and simultaneously curbs catastrophic forgetting through a principled uncertainty‑diversity sampling scheme and a combined classification‑distillation training objective. The framework bridges the gap between active learning and class incremental learning, opening avenues for more efficient lifelong learning systems in computer vision and beyond.