InScope: A New Real-world 3D Infrastructure-side Collaborative Perception Dataset for Open Traffic Scenarios


Perception systems of autonomous vehicles are susceptible to occlusion, especially when sensing from a vehicle-centric perspective. Such occlusion can lead to missed detections: large vehicles such as trucks or buses may create blind spots in which cyclists or pedestrians are obscured, heightening the safety concerns associated with these perception limitations. To mitigate these challenges, the vehicle-to-everything (V2X) paradigm suggests employing an infrastructure-side perception system (IPS) to complement autonomous vehicles with a broader perceptual scope. Nevertheless, the scarcity of real-world 3D infrastructure-side datasets constrains the advancement of V2X technologies. To bridge this gap, this paper introduces a new 3D infrastructure-side collaborative perception dataset, abbreviated as InScope. Notably, InScope is the first dataset dedicated to addressing occlusion challenges by strategically deploying Light Detection and Ranging (LiDAR) systems at multiple positions on the infrastructure side. Specifically, InScope encapsulates a 20-day capture duration with 303 tracking trajectories and 187,787 3D bounding boxes annotated by experts. Building on this dataset, four benchmarks are presented for open traffic scenarios: collaborative 3D object detection, multisource data fusion, data domain transfer, and 3D multi-object tracking. Additionally, a new metric is designed to quantify the impact of occlusion, facilitating the evaluation of detection degradation ratios among various algorithms. Experimental findings demonstrate the enhanced performance of leveraging InScope to assist in detecting and tracking multiple 3D objects in real-world scenarios, particularly obscured, small, and distant objects. The dataset and benchmarks are available at https://github.com/xf-zh/InScope.


💡 Research Summary

The paper addresses a critical limitation of vehicle‑centric perception systems: occlusion caused by large objects such as trucks or buses, which creates blind spots for cyclists and pedestrians. To overcome this, the authors propose leveraging infrastructure‑side perception (IPS) as part of a vehicle‑to‑everything (V2X) framework. Existing real‑world 3D infrastructure datasets (e.g., DAIR‑V2X, V2X‑Seq) either lack multi‑sensor coverage or do not provide comprehensive annotations for occlusion‑focused research.

InScope is introduced as the first large‑scale real‑world dataset specifically designed to mitigate occlusion through the deployment of multiple LiDAR units on infrastructure. The system consists of an 80‑beam primary LiDAR (range ≤ 230 m, 10 Hz) and a 32‑beam secondary LiDAR (range ≤ 150 m, 10 Hz) mounted at different heights and orientations at a busy T‑intersection in Guangdong, China. The primary sensor covers a wide horizontal field of view (90°–275°) while the secondary sensor fills the blind‑spot region (15°–305°). Temporal synchronization is achieved at the microsecond level, ensuring that data from both sensors can be fused without significant time drift.

Data collection spanned 20 days, yielding 87 sequences, 21,317 point‑cloud frames, 303 distinct object trajectories, and 187,787 expertly annotated 3D bounding boxes (including class, position, dimensions, orientation, and tracking ID). The dataset is split into three parts: InScope‑Pri (primary LiDAR only), InScope‑Sec (secondary LiDAR only), and the fused InScope (both LiDARs combined). Calibration involved more than 20 reference points to compute precise transformation matrices between the two sensor coordinate systems and a unified world frame.
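The paper reports that 20+ matched reference points were used to compute the transformation between the two sensor frames, but does not detail the solver. A standard choice for this step is least-squares rigid alignment (the Kabsch/Umeyama method); the sketch below shows that approach under this assumption:

```python
# Hedged sketch: estimating the rigid transform between two LiDAR
# coordinate systems from matched reference points via the Kabsch/Umeyama
# least-squares method. The paper's exact calibration procedure is not
# specified; this is one standard solver for the problem it describes.
import numpy as np

def rigid_transform(src, dst):
    """Find rotation R (3x3) and translation t (3,) with R @ src_i + t ≈ dst_i.

    src, dst: (N, 3) arrays of corresponding 3D reference points, N >= 3.
    """
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t
```

With 20+ well-spread points this over-determined system averages out per-point measurement noise, which is why more than the minimal three correspondences are used.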

Four benchmarks are defined:

  1. Collaborative 3D Object Detection – evaluates detection performance using single‑LiDAR data versus fused multi‑LiDAR data, quantifying the benefit of additional viewpoints.
  2. Multisource Data Fusion – assesses various fusion strategies (point‑level, feature‑level, decision‑level) and their robustness under occlusion.
  3. Data‑Domain Transfer – tests the applicability of state‑of‑the‑art vehicle‑side detectors on InScope, facilitating unsupervised domain adaptation research.
  4. 3D Multi‑Object Tracking – measures tracking metrics (MOTA, ID‑Switch, ID‑F1) over time, emphasizing the ability to maintain identities for occluded, small, or distant objects.
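Of the fusion strategies in benchmark 2, point-level fusion is the simplest to sketch: secondary-LiDAR points are mapped into the primary frame with the calibrated rotation and translation, then concatenated so a single detector sees one denser, occlusion-reduced cloud. This is an assumed minimal formulation, not the paper's exact pipeline:

```python
# Minimal point-level fusion sketch (assumed, not the paper's exact
# pipeline): transform secondary points into the primary LiDAR frame
# using the calibrated (R, t), then stack the two clouds.
import numpy as np

def fuse_point_clouds(primary_pts, secondary_pts, R, t):
    """primary_pts, secondary_pts: (N, 3) XYZ arrays; R: (3, 3); t: (3,)."""
    secondary_in_primary = np.asarray(secondary_pts) @ np.asarray(R).T + np.asarray(t)
    return np.vstack([primary_pts, secondary_in_primary])
```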

To systematically measure occlusion mitigation, the authors introduce a new metric ξ_D, defined as the degradation ratio of detection performance when only one LiDAR is used compared to the multi‑LiDAR scenario. Lower ξ_D indicates stronger anti‑occlusion capability. Experiments show ξ_D values between 0.12 and 0.18 for InScope, substantially better than previous infrastructure datasets (≈ 0.35).
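One plausible formulation of this degradation ratio, consistent with the description above (the paper's exact definition may differ), is the relative drop in average precision when detecting on a single LiDAR versus the fused multi-LiDAR input:

```python
# Assumed formulation of the degradation ratio ξ_D described above:
# ξ_D = (AP_multi - AP_single) / AP_multi. Lower values mean the
# detector loses less performance when a viewpoint is removed, i.e.
# stronger anti-occlusion capability.
def degradation_ratio(ap_single, ap_multi):
    """ap_single, ap_multi: average precision in [0, 1], ap_multi > 0."""
    if ap_multi <= 0:
        raise ValueError("AP on the fused input must be positive")
    return (ap_multi - ap_single) / ap_multi
```

For example, AP falling from 0.60 (fused) to 0.51 (single LiDAR) gives ξ_D = 0.15, inside the 0.12–0.18 range reported for InScope.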

Baseline experiments reproduce several popular LiDAR detectors (SECOND, PointPillars, PV‑RCNN) and fusion models (PointRCNN‑Fusion, VoxelFusion). Results demonstrate that multi‑LiDAR fusion improves average precision for small objects (pedestrians, cyclists) by 8–12 % and raises tracking MOTA by 3.4 % while reducing ID‑Switches by 27 %.

The dataset contains only point‑cloud data, with all personally identifiable information removed, and is publicly available at https://github.com/xf-zh/InScope. By providing rich annotations, multi‑sensor coverage, and dedicated benchmarks, InScope offers the research community a solid platform to develop and evaluate occlusion‑robust infrastructure‑centric perception algorithms, ultimately advancing the safety and reliability of V2X systems.

