CLEANN: Accelerated Trojan Shield for Embedded Neural Networks

We propose CLEANN, the first end-to-end framework that enables online mitigation of Trojans for embedded Deep Neural Network (DNN) applications. A Trojan attack works by injecting a backdoor into the DNN during training; at inference time, the Trojan is activated by a specific backdoor trigger. What differentiates CLEANN from prior work is its lightweight methodology, which recovers the ground-truth class of Trojan samples without the need for labeled data, model retraining, or prior assumptions on the trigger or the attack. We leverage dictionary learning and sparse approximation to characterize the statistical behavior of benign data and identify Trojan triggers. CLEANN is devised based on algorithm/hardware co-design and is equipped with specialized hardware to enable efficient real-time execution on resource-constrained embedded platforms. Proof-of-concept evaluations of CLEANN against state-of-the-art Neural Trojan attacks on visual benchmarks demonstrate its competitive advantage in terms of attack resiliency and execution overhead.


💡 Research Summary

The paper introduces CLEANN, a novel end‑to‑end framework designed to mitigate neural backdoor (Trojan) attacks on deep neural networks (DNNs) deployed in resource‑constrained embedded systems. Unlike prior defenses that rely on labeled data, model retraining, or costly reverse‑engineering of triggers, CLEANN operates entirely in an unsupervised manner, requiring only a small set of unlabeled benign samples (typically less than 1 % of the original training set) to learn a dictionary for sparse representation.

CLEANN consists of two complementary analyzers: a DCT (Discrete Cosine Transform) analyzer that works in the frequency domain on raw input images, and a feature analyzer that operates on latent feature maps extracted from the penultimate layer of the victim DNN. Both analyzers share the same core pipeline: (1) transform the signal (image patches or feature tensors) into a suitable domain (spatial‑frequency for DCT, raw latent space for features); (2) perform sparse recovery using the pre‑learned dictionary, reconstructing the signal with a small number of atoms; (3) compute the reconstruction error and feed it to an outlier detection module based on concentration inequalities.
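Steps (2) and (3) of this shared pipeline can be illustrated with a small NumPy sketch. It assumes Orthogonal Matching Pursuit (OMP) as the sparse solver and a relative L2 reconstruction error as the anomaly score; the summary does not say which solver or error metric CLEANN actually uses, and the function names `omp` and `reconstruction_error` are hypothetical.

```python
import numpy as np

def omp(D, x, k, tol=1e-10):
    """Orthogonal Matching Pursuit: approximate x with at most k atoms of D.

    D: (n, m) dictionary with unit-norm columns (atoms); x: (n,) signal.
    Returns the sparse code (m,) and the reconstruction D @ code.
    """
    residual = x.astype(float)
    support = []
    code = np.zeros(D.shape[1])
    for _ in range(k):
        if np.linalg.norm(residual) < tol:
            break  # x is already represented exactly
        # Greedily pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        code[:] = 0.0
        code[support] = coef
        residual = x - D @ code
    return code, D @ code

def reconstruction_error(D, x, k):
    """Relative L2 error of the k-sparse approximation of x.

    Benign inputs should be well explained by a dictionary learned on benign data,
    so a large error suggests the input carries an out-of-distribution pattern.
    """
    _, x_hat = omp(D, x, k)
    return np.linalg.norm(x - x_hat) / (np.linalg.norm(x) + 1e-12)
```

The error value is what would be handed to the outlier detector in step (3); a thresholding sketch appears with the offline calibration discussion.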

In the DCT analyzer, each input image is divided into non‑overlapping P × P patches (P = 4 for tiny images, P = 8 for larger ones). A group convolution layer with kernels initialized to DCT basis coefficients implements the transform efficiently on hardware. High‑frequency components, which are typically weak in natural images, become pronounced when a Trojan trigger (e.g., a sticky note, a digital watermark) is present. Sparse recovery suppresses these anomalous frequencies, and the resulting reconstruction error is used to generate a binary mask that blanks out suspicious regions before the image proceeds to the main DNN.
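To make the DCT front end concrete, the following NumPy sketch computes the 2-D DCT of every non-overlapping P × P patch. It is written so that it equals a stride-P convolution with the P² kernels `outer(C[u], C[v])` built from the orthonormal DCT-II basis, which is one way to realize the group-convolution formulation described above. The masking step is a hypothetical simplification: each patch is scored by the fraction of its energy in coefficients with u + v ≥ P, used here in place of CLEANN's dictionary-based reconstruction error, and the threshold is an arbitrary illustrative value.

```python
import numpy as np

def dct_matrix(P):
    """Orthonormal DCT-II basis: row u is the u-th cosine basis vector of length P."""
    n = np.arange(P)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * P))
    C[0] *= np.sqrt(1.0 / P)
    C[1:] *= np.sqrt(2.0 / P)
    return C

def block_dct(img, P):
    """2-D DCT of every non-overlapping P x P patch of a single-channel image.

    Equivalent to a stride-P convolution with kernels outer(C[u], C[v]).
    Returns coefficients of shape (H // P, W // P, P, P).
    """
    H, W = img.shape
    C = dct_matrix(P)
    patches = img[: H - H % P, : W - W % P].reshape(H // P, P, W // P, P).transpose(0, 2, 1, 3)
    # coeffs[i, j] = C @ patch[i, j] @ C.T
    return np.einsum("ux,ijxy,vy->ijuv", C, patches, C)

def high_freq_ratio(coeffs, P):
    """Fraction of each patch's energy in coefficients with u + v >= P.

    Simplified stand-in score, NOT the paper's sparse-recovery error.
    """
    u, v = np.meshgrid(np.arange(P), np.arange(P), indexing="ij")
    hf = (u + v) >= P
    energy = coeffs ** 2
    return energy[..., hf].sum(-1) / (energy.sum((-2, -1)) + 1e-12)

def mask_image(img, P, threshold):
    """Blank out (zero) every P x P patch whose score exceeds the threshold."""
    score = high_freq_ratio(block_dct(img, P), P)
    keep = np.kron(score <= threshold, np.ones((P, P)))  # upsample patch decisions to pixels
    out = img.astype(float).copy()
    h, w = keep.shape
    out[:h, :w] *= keep
    return out, score > threshold
```

For example, on a smooth intensity ramp with a small checkerboard pasted into one 4 × 4 patch, only that patch is flagged and zeroed, because a linear ramp has no energy in the u + v ≥ P coefficients while the checkerboard concentrates most of its energy there.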

The feature analyzer is placed just before the final classification layer. By applying sparse recovery to the high‑dimensional feature tensor, CLEANN both denoises trigger‑induced distortions, thereby recovering the true class label of a poisoned sample, and flags abnormally large reconstruction errors as a sign of a Trojan. A dimensionality‑reduction module adapts the recovery process to various feature sizes, while a restoring layer reshapes the output back to the original tensor dimensions so that inference continues seamlessly.
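The reduce, recover, restore flow of the feature analyzer can be sketched as follows. The reduction scheme is an assumption: the summary does not describe how CLEANN adapts to different feature sizes, so this hypothetical `feature_analyzer` simply splits the flattened tensor into dictionary-sized chunks (zero-padding the tail), sparse-codes each chunk, and reassembles the original shape. A compact OMP is repeated so the block is self-contained.

```python
import numpy as np

def omp(D, x, k):
    """k-step Orthogonal Matching Pursuit reconstruction of x over unit-norm atoms D."""
    support, residual = [], x.copy()
    x_hat = np.zeros_like(x)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        x_hat = D[:, support] @ coef
        residual = x - x_hat
    return x_hat

def feature_analyzer(feat, D, k):
    """Hypothetical reduce -> sparse-recover -> restore pass over a feature tensor.

    Returns the denoised tensor (same shape as feat) and its relative reconstruction error.
    """
    n = D.shape[0]
    flat = feat.ravel().astype(float)
    pad = (-flat.size) % n
    # "Reduce": cut the flattened features into chunks matching the atom length.
    chunks = np.concatenate([flat, np.zeros(pad)]).reshape(-1, n)
    recon = np.stack([omp(D, c, k) for c in chunks])
    # "Restore": drop the padding and reshape back to the original tensor.
    restored = recon.ravel()[: flat.size].reshape(feat.shape)
    err = np.linalg.norm(flat - restored.ravel()) / (np.linalg.norm(flat) + 1e-12)
    return restored, err
```

The denoising effect is visible when a perturbation lies outside the span of the dictionary: the restored tensor matches the clean features, while the reported error grows with the size of the perturbation.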

CLEANN’s offline phase builds the dictionary and calibrates the outlier thresholds using only benign data. No Trojan‑infected samples, no labeled data, and no modifications to the victim model’s weights are required. The online phase is lightweight: the DCT extraction and up‑sampling are realized as an additional convolutional front‑end, and the sparse recovery, outlier detection, and dimensionality‑reduction/restoration blocks are implemented as custom FPGA IP cores. This co‑design yields sub‑millisecond latency and orders‑of‑magnitude higher throughput per watt compared with software‑only baselines.
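The offline phase can be sketched in NumPy as below. Two substitutions are assumptions, not the paper's method: a truncated SVD of the benign data stands in for a dictionary-learning algorithm (an overcomplete method such as K-SVD would be more faithful), and Cantelli's one-sided inequality is used as the concentration bound, since the summary does not name the inequality CLEANN applies. The function names are hypothetical.

```python
import numpy as np

def learn_dictionary(benign, n_atoms):
    """Stand-in for dictionary learning: top right-singular vectors of benign data.

    benign: (N, n) unlabeled benign vectors. Returns (n, n_atoms) orthonormal atoms.
    """
    _, _, Vt = np.linalg.svd(benign, full_matrices=False)
    return Vt[:n_atoms].T

def sparse_error(D, x, k):
    """Relative error of the best k-term approximation over orthonormal atoms D."""
    coef = D.T @ x
    keep = np.argsort(np.abs(coef))[-k:]  # indices of the k largest coefficients
    x_hat = D[:, keep] @ coef[keep]
    return np.linalg.norm(x - x_hat) / (np.linalg.norm(x) + 1e-12)

def calibrate_threshold(errors, alpha):
    """Cantelli: P(e - mu >= t) <= sigma^2 / (sigma^2 + t^2).

    Setting t = sigma * sqrt((1 - alpha) / alpha) bounds the benign false-alarm
    rate by alpha, provided mu and sigma describe the benign error distribution.
    """
    mu, sigma = float(np.mean(errors)), float(np.std(errors))
    return mu + sigma * np.sqrt((1 - alpha) / alpha)
```

Because Cantelli's inequality holds for any distribution, it also holds for the empirical distribution of the calibration errors, so at most a fraction `alpha` of the calibration set can exceed the returned threshold. This distribution-free property is what makes a concentration bound attractive when no Trojan samples are available for tuning.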

The authors evaluate CLEANN against state‑of‑the‑art backdoor attacks, including BadNets (physical triggers such as a sticky note) and TrojanNN (digital triggers such as squares and watermarks). Across CIFAR‑10, GTSRB, and other visual benchmarks, CLEANN reduces the attack success rate to 0 % while keeping the drop in clean accuracy below 1 %. Compared with prior defenses (Neural Cleanse, STRIP, NIC, and others), CLEANN achieves comparable or superior detection rates with dramatically lower computational overhead, making it suitable for deployment on embedded platforms such as autonomous vehicles, drones, and IoT edge devices.
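The two figures reported here, attack success rate and clean-accuracy drop, are standard backdoor-evaluation metrics. As a hedged illustration of how they are commonly computed (the summary does not give the exact evaluation protocol, and the helper names are hypothetical), a NumPy sketch:

```python
import numpy as np

def attack_success_rate(pred_triggered, true_labels, target):
    """Share of triggered inputs predicted as the attacker's target class.

    Inputs whose true class already equals the target are excluded, as is common practice.
    """
    pred, y = np.asarray(pred_triggered), np.asarray(true_labels)
    valid = y != target
    return float(np.mean(pred[valid] == target))

def clean_accuracy_drop(pred_plain, pred_defended, true_labels):
    """Accuracy lost on clean data when the defense is inserted, in percentage points."""
    y = np.asarray(true_labels)
    acc_plain = np.mean(np.asarray(pred_plain) == y)
    acc_defended = np.mean(np.asarray(pred_defended) == y)
    return float(100.0 * (acc_plain - acc_defended))
```

Under these definitions, the reported result corresponds to `attack_success_rate` reaching 0 on triggered test inputs while `clean_accuracy_drop` stays below 1.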

In summary, CLEANN advances Trojan mitigation by (1) eliminating the need for labeled data or model fine‑tuning, (2) providing a unified detection and remediation pipeline that works for both physical and digital triggers, and (3) delivering a hardware‑accelerated solution that meets the stringent latency and energy constraints of modern embedded AI systems.
