PMMA: The Polytechnique Montreal Mobility Aids Dataset
This study introduces PMMA, a new object detection dataset of pedestrians using mobility aids. The dataset was collected in an outdoor environment where volunteers used wheelchairs, canes, and walkers, yielding nine pedestrian categories: ordinary pedestrians; cane users; two walker categories (users while walking and while resting); and five wheelchair-related categories (wheelchair users, people pushing empty wheelchairs, and, for occupied pushed wheelchairs, the entire pushing group, the pusher, and the seated person). To establish a benchmark, seven object detection models (Faster R-CNN, CenterNet, YOLOX, DETR, Deformable DETR, DINO, and RT-DETR) and three tracking algorithms (ByteTrack, BOT-SORT, and OC-SORT) were implemented under the MMDetection framework. Experimental results show that YOLOX, Deformable DETR, and Faster R-CNN achieve the best detection performance, while the differences among the three trackers are relatively small. The PMMA dataset is publicly available at https://doi.org/10.5683/SP3/XJPQUG, and the video processing and model training code is available at https://github.com/DatasetPMMA/PMMA.
💡 Research Summary
The paper introduces PMMA, a novel object detection and multi‑object tracking dataset focused on pedestrians who use mobility aids such as wheelchairs, walkers, and canes. Collected in an outdoor parking lot at Polytechnique Montréal, the dataset comprises approximately 28,000 high‑resolution (2208 × 1242) stereo frames captured at 15 fps from two fixed camera poles positioned 4 m above the ground. Although the recordings are stereo, only the left view is annotated, and the annotations follow the COCO format to ensure compatibility with mainstream deep‑learning frameworks.
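To make the annotation layout concrete, the COCO object-detection format referenced above can be sketched as a minimal JSON structure. The field names follow the standard COCO convention; the file name, category names, and IDs below are illustrative assumptions, not PMMA's actual label map.

```python
import json

# Minimal COCO-style annotation structure (sketch). Field names follow the
# COCO object-detection format; the category names/IDs are illustrative
# assumptions, not PMMA's actual label map.
coco = {
    "images": [
        # One entry per annotated left-view frame (2208 x 1242 per the summary).
        {"id": 1, "file_name": "frame_000001.png", "width": 2208, "height": 1242},
    ],
    "annotations": [
        # Bounding boxes use COCO's [x, y, width, height] convention.
        {"id": 1, "image_id": 1, "category_id": 5,
         "bbox": [640.0, 480.0, 120.0, 210.0],
         "area": 120.0 * 210.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "pedestrian"},
        {"id": 5, "name": "wheelchair_user"},
        # ... remaining entries for the nine classes
    ],
}

# Round-trip through JSON, as a COCO annotation file is stored on disk.
restored = json.loads(json.dumps(coco))
print(restored["annotations"][0]["bbox"])  # [640.0, 480.0, 120.0, 210.0]
```

Keeping this schema is what makes the dataset directly loadable by MMDetection and other mainstream toolboxes without a custom parser.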
PMMA defines nine fine‑grained categories: (1) ordinary pedestrians, (2) cane users, (3) walker users while walking, (4) walker users while resting, and five wheelchair‑related categories – (5) a person alone in a self‑propelled wheelchair, (6) a person pushing an empty wheelchair, (7) a group pushing an occupied wheelchair (the whole group), (8) the pusher within that group, and (9) the seated person being pushed. This hierarchical labeling captures both the mobility aid itself and the interaction dynamics between users, a level of detail absent from existing autonomous‑driving and surveillance datasets. Occlusion is encoded with four levels (no occlusion, partial, full, and “shadow” occlusion), extending the KITTI convention.
The authors benchmark seven state‑of‑the‑art detectors—Faster R‑CNN, CenterNet, YOLOX, DETR, Deformable DETR, DINO, and RT‑DETR—using the MMDetection toolbox. All models are initialized with COCO‑pretrained weights and fine‑tuned for 50 epochs (early stopping) with AdamW (lr = 1e‑4, weight decay = 1e‑4). Detection performance is evaluated with standard COCO metrics (mAP and AP₅₀). YOLOX, Deformable DETR, and Faster R‑CNN achieve the highest overall mAP (≈ 0.55, 0.53, and 0.52 respectively), while CenterNet and DINO lag behind. Class‑wise analysis reveals that the most challenging categories are the wheelchair‑group and its sub‑components, where small bounding boxes, heavy overlap, and similar visual appearance cause a steep drop in AP.
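The training setup described above might look roughly like the following MMDetection-style config fragment. This is a sketch under stated assumptions: the optimizer and schedule values come from the summary (AdamW, lr 1e-4, weight decay 1e-4, up to 50 epochs), but the annotation path and hook layout are hypothetical, not the authors' released configuration.

```python
# Sketch of an MMDetection 3.x-style config fragment matching the training
# setup described in the summary. The annotation file path is a hypothetical
# placeholder, not the dataset's actual layout.
optim_wrapper = dict(
    type="OptimWrapper",
    optimizer=dict(type="AdamW", lr=1e-4, weight_decay=1e-4),
)
train_cfg = dict(type="EpochBasedTrainLoop", max_epochs=50, val_interval=1)
val_evaluator = dict(
    type="CocoMetric",                           # standard COCO mAP / AP50
    ann_file="data/pmma/annotations/val.json",   # hypothetical path
    metric="bbox",
)
```

In MMDetection, such a fragment would typically inherit from a COCO-pretrained base config via `_base_`, which matches the summary's note that all models start from COCO-pretrained weights.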
For tracking, three recent MOT algorithms—ByteTrack, BOT‑SORT, and OC‑SORT—are applied to the detection outputs. The three trackers yield comparable MOTA and IDF1 scores (≈ 0.71–0.73), indicating that detection quality, rather than the tracking algorithm, is the primary bottleneck. Nevertheless, identity switches are frequent in group‑movement scenarios, highlighting the need for trackers that can model inter‑object relationships.
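As a reminder of what the MOTA scores above measure, the CLEAR MOT accuracy metric penalizes false negatives, false positives, and identity switches relative to the number of ground-truth objects. A minimal sketch (the example numbers are illustrative, not results from the paper):

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    An identity switch (IDSW) is counted whenever a tracked object's ID
    changes between frames; the summary notes these are frequent in
    group-movement scenarios.
    """
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

# Illustrative numbers only, not results from the paper:
print(mota(false_negatives=10, false_positives=5, id_switches=2, num_gt=100))
# prints approximately 0.83
```

Because all three error types share one denominator, a detector that reduces FN and FP lifts MOTA for every tracker at once, which is consistent with the observation that detection quality, not the tracker, is the bottleneck.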
The dataset collection protocol was approved by the Polytechnique Montréal ethics committee. Volunteers were graduate students who simulated mobility‑aid usage; no actual wheelchair or walker users participated, which limits the ecological validity but mitigates privacy and safety concerns. The authors acknowledge class imbalance (some wheelchair sub‑categories contain only a few hundred instances) and the fact that depth information from the stereo pair is not exploited in the current benchmarks.
By providing a publicly available, richly annotated, outdoor dataset that distinguishes nine pedestrian sub‑categories, PMMA fills a critical gap in computer‑vision resources for inclusive transportation systems. The release of both the data (doi:10.5683/SP3/XJPQUG) and the processing code (GitHub) enables reproducibility and encourages future work on multimodal fusion (e.g., leveraging stereo depth), behavior prediction, risk assessment, and the development of detection‑tracking pipelines tailored to vulnerable road users.