Enhancing Predictability of Multi-Tenant DNN Inference for Autonomous Vehicles' Perception
Autonomous vehicles (AVs) rely on sensors and deep neural networks (DNNs) to perceive their surrounding environment and make maneuver decisions in real time. However, achieving real-time DNN inference in the AV's perception pipeline is challenging due to the large gap between the computation requirement and the AV's limited resources. Most, if not all, existing studies focus on optimizing DNN inference time to achieve faster perception by compressing the DNN model with pruning and quantization. In contrast, we present a Predictable Perception system with DNNs (PP-DNN) that reduces the amount of image data to be processed while maintaining the same level of accuracy for multi-tenant DNNs by dynamically selecting critical frames and regions of interest (ROIs). PP-DNN is based on our key insight that the critical frames and ROIs for an AV vary with its surrounding environment; however, identifying and using critical frames and ROIs in multi-tenant DNNs for predictable inference is challenging. Given image-frame streams, PP-DNN leverages an ROI generator to identify critical frames and ROIs based on the similarities of consecutive frames and on traffic scenarios. PP-DNN then leverages a FLOPs predictor to estimate the multiply-accumulate operations (MACs) required by the dynamic critical frames and ROIs. An ROI scheduler coordinates the processing of critical frames and ROIs across multiple DNN models. Finally, we design a detection predictor for the perception of non-critical frames. We have implemented PP-DNN in a ROS-based AV pipeline and evaluated it with the BDD100K and nuScenes datasets. PP-DNN is observed to significantly enhance perception predictability, increasing the number of fusion frames by up to 7.3x, reducing the fusion delay by >2.6x and fusion-delay variations by >2.3x, and improving detection completeness by 75.4% and cost-effectiveness by up to 98% over the baseline.
💡 Research Summary
The paper introduces PP‑DNN, a Predictable Perception system for autonomous vehicles that improves the timing predictability of multi‑tenant deep neural network (DNN) inference without sacrificing detection accuracy. Traditional approaches focus on compressing DNN models (pruning, quantization) or accelerating inference, which often lead to accuracy loss or hardware‑specific solutions. PP‑DNN takes a complementary route: it reduces the amount of visual data that must be processed by dynamically selecting “critical” frames and region‑of‑interest (ROI) patches based on environmental context.
Key components of PP‑DNN are:
- ROI Generator – Computes structural similarity (SSIM) between consecutive frames, tracks static and moving objects with a lightweight tracker, and incorporates driving conditions (speed, highway vs. city) to decide which frames are critical and which spatial regions must be processed. It can emit multiple ROIs per frame, allowing a much smaller total pixel area than a single large ROI.
- FLOPs Predictor – Estimates the multiply‑accumulate operations (MACs) required for each DNN given the selected ROI sizes. The predictor captures the non‑linear relationship between input size and MACs, especially for two‑stage detectors such as Faster‑RCNN where small inputs can paradoxically increase MACs due to preprocessing resizing.
- ROI Scheduler / Task Coordinator – Uses the FLOPs predictions together with real‑time GPU load information to orchestrate the execution order of the concurrent DNN tasks. It may ask a task to yield resources, skip a frame, or adjust ROI dimensions to keep the overall fusion delay bounded and to avoid processing outdated frames.
- Detection Predictor – For non‑critical frames, it generates anticipated detection results by extrapolating object trajectories from previous detections and tracking data. These predicted results are merged with the actual detections during the sensor‑fusion stage, preserving functional completeness.
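To make the ROI Generator's first signal concrete, here is a minimal sketch of SSIM-based critical-frame selection. It computes a simplified single-window SSIM over grayscale pixel sequences (the standard SSIM uses local sliding windows, and PP-DNN additionally folds in tracking and driving-context signals not shown here); the 0.9 threshold and the pure-Python pixel-list representation are illustrative assumptions, not the paper's implementation.

```python
def ssim_global(x, y, data_range=255.0):
    """Simplified single-window SSIM between two equal-length grayscale
    pixel sequences. The full SSIM averages this statistic over local
    sliding windows; one global window is enough to illustrate the idea."""
    assert len(x) == len(y) and len(x) > 1
    n = len(x)
    c1 = (0.01 * data_range) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def is_critical_frame(prev_frame, cur_frame, threshold=0.9):
    """Mark a frame critical when it is sufficiently dissimilar from its
    predecessor (hypothetical threshold; PP-DNN adapts this decision to
    speed and traffic scenario)."""
    return ssim_global(prev_frame, cur_frame) < threshold
```

In practice one would use an optimized SSIM (e.g. `skimage.metrics.structural_similarity`) on full frames; the point is that a cheap similarity test lets the pipeline skip DNN inference on near-duplicate frames.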
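The FLOPs Predictor's core relationship between ROI size and compute can be sketched with the standard per-layer MAC count for a convolution, MACs = C_out x H_out x W_out x C_in x K x K, accumulated over a hypothetical layer stack. This illustrates why shrinking an ROI reduces MACs; the layer specification below is invented for illustration, and the sketch deliberately omits the preprocessing-resize effect the paper highlights for two-stage detectors such as Faster-RCNN, where small inputs can paradoxically increase MACs.

```python
def conv_macs(h, w, c_in, c_out, k, stride=1, pad=0):
    """MACs for one conv layer on an h x w x c_in input."""
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1
    return c_out * h_out * w_out * c_in * k * k

def predict_macs(roi_h, roi_w, layers):
    """Accumulate MACs layer by layer for one ROI size.
    `layers` is a list of (c_out, kernel, stride, pad) tuples; an online
    predictor would replace this loop with a lookup table or regression
    fitted per model."""
    total, h, w, c = 0, roi_h, roi_w, 3  # assume 3-channel RGB input
    for c_out, k, stride, pad in layers:
        total += conv_macs(h, w, c, c_out, k, stride, pad)
        h = (h + 2 * pad - k) // stride + 1
        w = (w + 2 * pad - k) // stride + 1
        c = c_out
    return total
```

The scheduler only needs these estimates to be rank-consistent across candidate ROI sizes, so a coarse analytic model like this can stand in for profiling every configuration.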
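Finally, the Detection Predictor's extrapolation step can be sketched as constant-velocity propagation of a tracked bounding box: given the two most recent detections of an object, predict where it will be on the next (skipped) frame. This linear model is an assumption standing in for whatever trajectory model the paper uses.

```python
def predict_box(prev_box, cur_box, dt_ratio=1.0):
    """Linearly extrapolate a bounding box (x, y, w, h) one step ahead
    from its two most recent detections. `dt_ratio` scales the step for
    non-uniform frame gaps; both are illustrative assumptions."""
    return tuple(c + (c - p) * dt_ratio for p, c in zip(prev_box, cur_box))

def predict_detections(prev_dets, cur_dets):
    """Produce anticipated detections for a non-critical frame by
    extrapolating every object seen in both of the last two frames;
    dicts map track IDs to boxes."""
    return {tid: predict_box(prev_dets[tid], cur_dets[tid])
            for tid in cur_dets if tid in prev_dets}
```

These predicted boxes are what gets merged with actual detections at the sensor-fusion stage, so skipped frames still contribute a complete object list downstream.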
The system is implemented on a ROS‑based autonomous‑driving stack and evaluated with two large public datasets: BDD100K and nuScenes. Experiments compare PP‑DNN against a baseline that processes every frame with a fixed full‑frame ROI. Results show:
- Up to 7.3× more fusion frames per second, meaning the perception pipeline can output more timely updates.
- >2.6× reduction in average fusion latency and >2.3× reduction in latency variance, addressing the critical issue of timing predictability.
- 75.4% improvement in detection completeness, indicating that the dynamic ROI selection does not miss important objects.
- Up to 98% improvement in cost‑effectiveness, measured as the ratio of perception quality to computational resources consumed.
The authors also provide two empirical insights that motivate the design: (1) critical frames and ROI configurations vary with the traffic scenario and must be selected adaptively; (2) dynamic input sizes and the presence of out‑of‑date frames are the primary sources of inference‑time variation in multi‑tenant DNN pipelines. By making frame and ROI selection context‑aware and coordinating the multi‑tenant tasks accordingly, PP‑DNN achieves a stable, high‑throughput perception pipeline suitable for safety‑critical autonomous driving. Future work may extend the approach to incorporate additional sensor modalities, more sophisticated context awareness (weather, illumination), and deployment on edge hardware with stricter power budgets.