Vision-based Pedestrian Alert Safety System (PASS) for Signalized Intersections
Although Vehicle-to-Pedestrian (V2P) communication can significantly improve pedestrian safety at signalized intersections, this benefit is limited because pedestrians often do not carry hand-held devices (e.g., Dedicated Short-Range Communication (DSRC)- or 5G-enabled cell phones) that can communicate with nearby connected vehicles. To overcome this limitation, this study uses traffic cameras at a signalized intersection to accurately detect and locate pedestrians via a vision-based deep learning technique and to generate real-time safety alerts about possible conflicts between vehicles and pedestrians. The contribution of this paper lies in the development of a system, based on a vision-based deep learning model, that generates Personal Safety Messages (PSMs) in real time (every 100 milliseconds). We develop a Pedestrian Alert Safety System (PASS) that uses these PSMs to warn of an imminent pedestrian-vehicle crash and thereby improve pedestrian safety at a signalized intersection. Our approach estimates the location and velocity of a pedestrian more accurately than existing DSRC-enabled pedestrian hand-held devices. A connected vehicle application, the Pedestrian in Signalized Crosswalk Warning (PSCW), was developed to evaluate the vision-based PASS. Numerical analyses show that our vision-based PASS satisfies the accuracy and latency requirements of pedestrian safety applications in a connected vehicle environment.
💡 Research Summary
The paper introduces a Vision‑based Pedestrian Alert Safety System (PASS) designed to improve pedestrian safety at signalized intersections without requiring pedestrians to carry any communication device. Recognizing that Vehicle‑to‑Pedestrian (V2P) safety solutions based on Dedicated Short‑Range Communications (DSRC) or 5G rely on hand‑held devices that many pedestrians simply do not possess, the authors propose leveraging existing traffic‑camera infrastructure combined with a deep‑learning pedestrian detector to generate Personal Safety Messages (PSMs) in real time.
The system pipeline consists of four main stages. First, video streams from intersection cameras are down-sampled to reduce computational load. Second, a YOLOv3-based convolutional neural network, pre-trained on public datasets (INRIA, MIT, CUHK) and fine-tuned with locally collected, annotated footage, detects pedestrians with >81 % accuracy and a per-frame processing time of roughly 51 ms (≈20 fps). Third, detected bounding boxes undergo non-maximum suppression to eliminate duplicates, and a homography transformation, derived from a one-time calibration of the camera's intrinsic and extrinsic parameters, maps image coordinates to world latitude/longitude. By tracking the centroid of each pedestrian across successive frames, the system computes velocity (in 0.02 m/s increments) and heading (in 0.125° increments).
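The third stage described above, mapping pixel coordinates to the ground plane and deriving speed and heading from successive centroids, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the homography matrix `H` is assumed to come from the one-time camera calibration, and the quantization steps match the 0.02 m/s and 0.125° resolutions stated in the summary.

```python
import math

def apply_homography(H, u, v):
    """Map a pixel centroid (u, v) to ground-plane coordinates via a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w  # planar world coordinates (e.g., local east/north in meters)

def speed_and_heading(p_prev, p_curr, dt):
    """Speed (m/s) and heading (deg, clockwise from north) from two successive
    ground-plane centroids dt seconds apart, quantized to the resolutions above."""
    dx = p_curr[0] - p_prev[0]  # east displacement
    dy = p_curr[1] - p_prev[1]  # north displacement
    speed = math.hypot(dx, dy) / dt
    heading = math.degrees(math.atan2(dx, dy)) % 360.0
    return round(speed / 0.02) * 0.02, round(heading / 0.125) * 0.125
```

In practice the planar world coordinates would be converted to latitude/longitude around a surveyed reference point at the intersection; that final conversion is omitted here.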
Fourth, the extracted kinematic data are packaged into PSMs that conform to SAE J2945/J2735 standards. Each PSM includes a Message ID, timestamp (millisecond resolution), message count, a temporary anonymized ID, latitude, longitude, elevation (static for the intersection), positional accuracy, velocity, and heading. These messages are broadcast via a DSRC transceiver at a fixed interval of 100 ms. Connected vehicles receive the PSMs and run the Pedestrian in Signalized Crosswalk Warning (PSCW) application, which issues visual and auditory alerts to drivers when a potential conflict is detected.
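The PSM fields listed above can be collected into a simple payload structure. The sketch below is illustrative only: a real PSM is ASN.1-encoded per SAE J2735, which this code does not attempt, and the unit scalings shown (1e-7-degree position, 0.02 m/s speed, 0.125° heading) are assumptions based on the resolutions given in the summary.

```python
import itertools
import time

_msg_counter = itertools.count()  # message count field, incremented per PSM

def build_psm(temp_id, lat_deg, lon_deg, elev_m, speed_mps, heading_deg, pos_accuracy_m):
    """Assemble an illustrative PSM payload with the fields described above."""
    return {
        "messageId": "PSM",
        "timestamp_ms": int(time.time() * 1000),   # millisecond resolution
        "msgCnt": next(_msg_counter) % 128,        # wrapping message counter
        "tempId": temp_id,                         # temporary anonymized ID
        "lat": round(lat_deg * 1e7),               # position in 1e-7 degree units
        "lon": round(lon_deg * 1e7),
        "elev_dm": round(elev_m * 10),             # static per intersection
        "posAccuracy_m": pos_accuracy_m,
        "speed_units": round(speed_mps / 0.02),    # 0.02 m/s resolution
        "heading_units": round(heading_deg / 0.125),  # 0.125 deg resolution
    }
```

A broadcast loop would call this every 100 ms and hand the encoded message to the DSRC transceiver.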
Performance evaluation is conducted in two parts. In a controlled laboratory setting, PASS’s localization error averages 0.35 m (σ ≈ 0.12 m), dramatically outperforming GPS‑based DSRC hand‑held devices that typically exhibit ~10 m errors. Velocity estimation error remains below 0.15 m/s. End‑to‑end latency—from image capture through PSM transmission—is measured at 92 ms on average, satisfying the sub‑100 ms latency requirement for safety‑critical alerts. A field trial at a real intersection demonstrates that the PSCW application, powered by PASS, can warn drivers of an imminent pedestrian‑vehicle collision up to 0.9 seconds before impact, providing a meaningful safety margin.
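The summary does not spell out how the PSCW application decides when to alert the driver. One plausible, deliberately simplified formulation is a constant-velocity time-to-conflict check: project the vehicle and pedestrian trajectories forward and warn when the estimated time to conflict drops below a threshold. The 0.9 s threshold below is taken from the field-trial lead time reported above and is an assumption, not the paper's actual decision rule.

```python
import math

def time_to_conflict(veh_pos, veh_vel, ped_pos, ped_vel):
    """Time (s) until vehicle and pedestrian meet, assuming constant velocities.
    Returns infinity if the gap between them is not closing."""
    rel = (ped_pos[0] - veh_pos[0], ped_pos[1] - veh_pos[1])
    rel_v = (ped_vel[0] - veh_vel[0], ped_vel[1] - veh_vel[1])
    dist = math.hypot(rel[0], rel[1])
    if dist == 0.0:
        return 0.0
    # Closing speed: rate at which the separation distance is shrinking.
    closing = -(rel[0] * rel_v[0] + rel[1] * rel_v[1]) / dist
    return dist / closing if closing > 0 else math.inf

def should_warn(ttc_s, threshold_s=0.9):
    """Trigger the in-vehicle alert when the conflict is within the threshold."""
    return ttc_s <= threshold_s
```

A production system would also account for crosswalk geometry, signal phase, and measurement uncertainty before alerting.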
The authors acknowledge several limitations. Camera‑based perception is vulnerable to occlusion, adverse weather, and low‑light conditions; each intersection requires precise geometric calibration; and privacy concerns demand strategies for video anonymization or edge‑only processing. Moreover, the current implementation is confined to signalized intersections and may not generalize to unsignalized or highly congested urban environments without additional sensor fusion.
In conclusion, PASS offers a device‑free, high‑accuracy, low‑latency V2P safety solution that meets SAE standards and demonstrates practical feasibility through both simulation and real‑world testing. Future work is suggested in multi‑camera fusion, weather‑robust image enhancement, and deployment of edge AI accelerators to broaden applicability and further reduce processing delays.