MECAD: A multi-expert architecture for continual anomaly detection
In this paper we propose MECAD, a novel approach for continual anomaly detection using a multi-expert architecture. Our system dynamically assigns experts to object classes based on feature similarity and employs efficient memory management to preserve the knowledge of previously seen classes. By leveraging an optimized coreset selection and a specialized replay buffer mechanism, we enable incremental learning without requiring full model retraining. Our experimental evaluation on the MVTec AD dataset demonstrates that the optimal 5-expert configuration achieves an average AUROC of 0.8259 across 15 diverse object categories while significantly reducing knowledge degradation compared to single-expert approaches. This framework balances computational efficiency, specialized knowledge retention, and adaptability, making it well-suited for industrial environments with evolving product types.
💡 Research Summary
The paper introduces MECAD, a Multi‑Expert Continual Anomaly Detection framework designed for industrial settings where product types and defect patterns evolve over time. Traditional anomaly detectors are trained on a fixed set of objects and must be retrained from scratch when new categories appear, leading to high annotation costs and catastrophic forgetting. MECAD tackles these issues by combining a shared feature‑extraction backbone (WideResNet‑50) with a set of N specialized expert modules, each maintaining its own patch‑level memory bank of normal embeddings.
Key technical contributions are:
- Patch‑level memory with coreset compression – Normal training images are broken into 32×32 patches, embedded, and then reduced to a compact representative set using an online coreset selection algorithm. This drastically cuts memory usage while preserving the diversity of normal patterns.
- Similarity‑driven expert assignment – When a new object class arrives, its embedding centroid is compared to the centroids of each expert’s memory using cosine similarity. If the highest similarity exceeds a threshold (θ = 0.9), the class is routed to that expert; otherwise it is assigned to an unused expert. This dynamic routing automatically groups semantically similar classes under the same expert, encouraging specialization and reducing interference.
- Expert‑specific replay buffer – Only the expert receiving the new class is updated. Its training set consists of the new class’s selected embeddings plus replay samples drawn from that expert’s previously assigned classes. A fixed replay ratio of 0.2 balances stability (knowledge retention) and plasticity (learning new patterns). Replay samples are also drawn from the coreset, ensuring they are informative yet memory‑efficient.
- Inference procedure – At test time, an image is processed by the expert that was assigned to its class. Patch embeddings are compared to the expert’s memory via nearest‑neighbor distance; the maximum patch‑level distance becomes the image‑level anomaly score. Because experts operate in parallel, inference remains fast enough for real‑time deployment.
The authors evaluate MECAD on the MVTec‑AD benchmark (15 object categories, 5,354 images) using a continual‑learning protocol where classes are introduced sequentially. They vary the number of experts from 1 to 8 while keeping all other hyper‑parameters constant. Results show a clear performance‑vs‑expert trade‑off: a single expert yields an average AUROC of 0.7494 and severe forgetting (−0.3736). Adding experts quickly improves AUROC—0.7793 with two experts, 0.8212 with three, peaking at 0.8269 with four. The configuration with five experts achieves the best overall balance: average AUROC = 0.8259 and forgetting reduced to −0.1396. Adding more experts yields diminishing returns on AUROC but continues to lower forgetting, indicating that specialist experts help preserve prior knowledge. Per‑class analysis reveals that some categories (e.g., leather, bottle) consistently achieve AUROC > 0.95 regardless of expert count, while others (e.g., screw, transistor) remain challenging, suggesting avenues for further data augmentation or model refinement.
MECAD’s strengths lie in (i) mitigating catastrophic forgetting through expert‑level isolation and replay, (ii) efficient memory usage via coreset selection, and (iii) automatic specialization through similarity‑based routing. Limitations include a fixed expert pool that may become saturated as the number of new classes grows, and sensitivity to the similarity threshold and memory budget, which the paper does not explore in depth. Future work could investigate dynamic expert creation, importance‑based replay sampling, and deployment on edge devices.
In summary, MECAD offers a practical, scalable solution for continual anomaly detection in dynamic industrial environments, achieving high detection accuracy while substantially reducing knowledge degradation compared to single‑expert baselines.