DART: Input-Difficulty-AwaRe Adaptive Threshold for Early-Exit DNNs
Early-exit deep neural networks enable adaptive inference by terminating computation when sufficient confidence is achieved, reducing cost for edge AI accelerators in resource-constrained settings. Existing methods, however, rely on suboptimal exit policies, ignore input difficulty, and optimize thresholds independently. This paper introduces DART (Input-Difficulty-Aware Adaptive Threshold), a framework that overcomes these limitations. DART introduces three key innovations: (1) a lightweight difficulty estimation module that quantifies input complexity with minimal computational overhead, (2) a joint exit policy optimization algorithm based on dynamic programming, and (3) an adaptive coefficient management system. Experiments on diverse DNN benchmarks (AlexNet, ResNet-18, VGG-16) demonstrate that DART achieves up to 3.3× speedup, 5.1× lower energy, and up to 42% lower average power compared to static networks, while preserving competitive accuracy. Extending DART to Vision Transformers (LeViT) yields power (5.0×) and execution-time (3.6×) gains but also accuracy loss (up to 17%), underscoring the need for transformer-specific early-exit mechanisms. We further introduce the Difficulty-Aware Efficiency Score (DAES), a novel multi-objective metric, under which DART achieves up to a 14.8 improvement over baselines, highlighting superior accuracy, efficiency, and robustness trade-offs.
💡 Research Summary
The paper tackles the inefficiency of static inference on edge AI devices by proposing DART (Input‑Difficulty‑Aware Adaptive Threshold), a unified framework that brings together three novel components: (1) a lightweight difficulty‑estimation module, (2) a joint exit‑policy optimization algorithm based on dynamic programming, and (3) an online adaptive coefficient management system.
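The summary does not spell out the dynamic-programming formulation behind component (2), but the idea of jointly choosing one confidence threshold per exit can be sketched with a backward DP. The sketch below assumes (hypothetically) that each exit's stopping probability and utility depend only on its own threshold once a sample reaches it; all threshold candidates and probability/utility values are illustrative, not figures from the paper.

```python
def optimize_thresholds(exits, final_utility):
    """Backward DP selecting one threshold per early exit.

    `exits` is a list with one entry per exit, in network order. Each entry
    maps a candidate threshold -> (p_exit, utility), where p_exit is the
    probability that a sample reaching this exit stops here, and utility
    folds accuracy and compute cost (e.g. acc - lambda * cost). These
    tables are assumptions for illustration; the paper's exact objective
    is not given in this summary.
    """
    V = final_utility  # expected utility of running the full network
    best = [None] * len(exits)
    for i in range(len(exits) - 1, -1, -1):
        # pick the threshold maximizing expected utility from exit i onward
        t, V = max(
            ((t, p * u + (1 - p) * V) for t, (p, u) in exits[i].items()),
            key=lambda x: x[1],
        )
        best[i] = t
    return best, V
```

Because later exits' values are folded into `V` before earlier exits are optimized, each pass over the candidate thresholds is O(|candidates|) and the whole policy is solved in one backward sweep rather than per-exit in isolation.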
The difficulty estimator computes three inexpensive visual metrics—edge density via Sobel filters, pixel variance, and Laplacian gradient magnitude—and fuses them with learned weights into a normalized difficulty score α ∈ [0, 1].
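A minimal NumPy sketch of such an estimator is below. The three metrics match those named above; the edge-strength threshold and the fusion weights are illustrative placeholders (the paper learns the weights), and the clipping-based normalization is an assumption, not the paper's method.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)

def _filter2d(img, kernel):
    # 'valid' 2-D correlation via sliding windows (no padding)
    kh, kw = kernel.shape
    windows = np.lib.stride_tricks.sliding_window_view(img, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def difficulty_score(img, weights=(0.4, 0.3, 0.3)):
    """Fuse three cheap visual metrics into a difficulty score in [0, 1].

    `weights` are illustrative placeholders standing in for the learned
    fusion weights described in the paper.
    """
    img = img.astype(np.float64) / 255.0
    gx = _filter2d(img, SOBEL_X)
    gy = _filter2d(img, SOBEL_Y)
    grad_mag = np.sqrt(gx**2 + gy**2)
    edge_density = np.mean(grad_mag > 0.5)   # assumed edge-strength cutoff
    pixel_variance = img.var()
    laplacian_mag = np.mean(np.abs(_filter2d(img, LAPLACIAN)))
    feats = np.array([edge_density, pixel_variance, laplacian_mag])
    w = np.asarray(weights, dtype=np.float64)
    # weighted fusion, clipped to [0, 1] as a stand-in normalization
    return float(np.clip(feats @ w / w.sum(), 0.0, 1.0))
```

A flat image scores 0 (no edges, no variance), while a noisy one scores higher, which is the qualitative behavior the exit policy needs from α.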