The Vienna 4G/5G Drive-Test Dataset
Machine learning for mobile network analysis, planning, and optimization is often limited by the lack of large, comprehensive real-world datasets. This paper introduces the Vienna 4G/5G Drive-Test Dataset, a city-scale open dataset of georeferenced Long Term Evolution (LTE) and 5G New Radio (NR) measurements collected across Vienna, Austria. The dataset combines passive wideband scanner observations with active handset logs, providing complementary network-side and user-side views of deployed radio access networks. The measurements cover diverse urban and suburban settings and are aligned with time and location information to support consistent evaluation. For a representative subset of base stations (BSs), we provide inferred deployment descriptors, including estimated BS locations, sector azimuths, and antenna heights. The release further includes high-resolution building and terrain models, enabling geometry-conditioned learning and calibration of deterministic approaches such as ray tracing. To facilitate practical reuse, the data are organized into scanner, handset, estimated cell information, and city-model components, and the accompanying documentation describes the available fields and intended joins between them. The dataset enables reproducible benchmarking across environment-aware learning, propagation modeling, coverage analysis, and ray-tracing calibration workflows.
💡 Research Summary
The paper presents the Vienna 4G/5G Drive‑Test Dataset, a city‑scale, open‑access collection of real‑world LTE and 5G New Radio measurements gathered across Vienna, Austria between March 2024 and March 2025. The dataset uniquely combines two complementary measurement modalities: (i) passive wide‑band scanning using a PCTEL IBflex receiver equipped with two OmniLOG PRO omnidirectional antennas and a GPS unit, and (ii) active handset logs from three 5G‑capable smartphones mounted on the test vehicle. The scanner provides technology‑agnostic observations of cell‑specific reference signals (CRS), synchronization signal blocks (SSB), and associated system information (MIB/SIB), together with cell identifiers (PCI, channel, eNB ID) and radio‑layer KPIs (RSRP, RSRQ, etc.). The handsets deliver user‑centric metrics such as RSRP, RSRQ, RSSI, SINR, throughput, and timing‑advance (TA). Both streams are time‑ and location‑synchronized, enabling joint analysis of network‑side and user‑side perspectives for the same cells.
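The time synchronization of the two streams can be illustrated with a nearest-timestamp lookup. The following is only a minimal sketch, not the authors' alignment procedure: the function name `nearest_match`, the use of timestamps in seconds, and the `max_gap_s` tolerance are all assumptions made for illustration.

```python
from bisect import bisect_left


def nearest_match(scanner_times, handset_time, max_gap_s=1.0):
    """Return the index of the scanner sample closest in time to a handset sample.

    scanner_times must be sorted in ascending order (seconds).
    Returns None if no scanner sample lies within max_gap_s (a hypothetical tolerance).
    """
    if not scanner_times:
        return None
    i = bisect_left(scanner_times, handset_time)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(scanner_times)]
    best = min(candidates, key=lambda j: abs(scanner_times[j] - handset_time))
    if abs(scanner_times[best] - handset_time) > max_gap_s:
        return None
    return best


if __name__ == "__main__":
    scanner_times = [0.0, 1.0, 2.0, 5.0]
    print(nearest_match(scanner_times, 1.4))  # index of the sample at t = 1.0
    print(nearest_match(scanner_times, 3.6))  # no sample within 1 s
```

A tolerance is used so that gaps in either stream produce an explicit miss instead of pairing samples that were recorded far apart.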
Beyond raw measurements, the authors infer deployment descriptors for a representative subset of base stations, using the Austrian Senderkataster as a spatial constraint and Google Earth Pro for antenna‑height estimation. LTE sites (≈50 distinct BSs across operators) are located with the methodology of Eller et al. For NR, PCI‑based clustering combined with TA‑circle overlap validation locates ≈30 sites for one operator. Sector azimuths are derived from the RSRP‑weighted centroid of handset measurement locations and expressed as a north‑referenced clockwise angle.
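One way to read the azimuth step is as the bearing from the estimated BS position to the RSRP-weighted centroid of the measurement locations. The sketch below follows that reading under several assumptions that the summary does not confirm: weights are taken as linear power converted from dBm, positions are projected with a local flat-earth approximation, and the names `local_east_north` and `estimate_azimuth_deg` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, adequate for short distances


def local_east_north(lat, lon, ref_lat, ref_lon):
    """Project a point to east/north offsets in metres relative to a reference point."""
    east = EARTH_RADIUS_M * math.radians(lon - ref_lon) * math.cos(math.radians(ref_lat))
    north = EARTH_RADIUS_M * math.radians(lat - ref_lat)
    return east, north


def estimate_azimuth_deg(bs_lat, bs_lon, samples):
    """Bearing (degrees clockwise from north) from the BS to the weighted centroid.

    samples: iterable of (lat, lon, rsrp_dbm) tuples.
    Assumption: each sample is weighted by its linear power, 10 ** (rsrp_dbm / 10).
    """
    sum_w = sum_e = sum_n = 0.0
    for lat, lon, rsrp_dbm in samples:
        weight = 10.0 ** (rsrp_dbm / 10.0)
        east, north = local_east_north(lat, lon, bs_lat, bs_lon)
        sum_w += weight
        sum_e += weight * east
        sum_n += weight * north
    if sum_w == 0.0:
        raise ValueError("no samples to average")
    # atan2(east, north) gives a north-referenced, clockwise-positive angle.
    return math.degrees(math.atan2(sum_e / sum_w, sum_n / sum_w)) % 360.0


if __name__ == "__main__":
    samples = [(48.20, 16.38, -70.0), (48.21, 16.37, -100.0)]
    print(round(estimate_azimuth_deg(48.20, 16.37, samples), 2))
```

With linear-power weights the strongest samples dominate the centroid. Weighting in dB instead would change the result, so the choice should be checked against the dataset documentation.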
The release is organized into four main components: (1) scanner_data tables containing cleaned, standardized CRS/SSB events with MIB/SIB context; (2) phone_data tables with harmonized KPI fields and interpolated TA records; (3) cell_info providing the estimated BS coordinates, azimuths, and antenna heights; and (4) city_model comprising high‑resolution 3‑D building footprints and a digital elevation model of Vienna. Detailed documentation specifies primary keys (PCI, channel, measurement ID) and recommended joins, allowing researchers to construct pipelines such as “scanner + handset → cell → city model”.
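The recommended joins can be pictured as a left join of measurement rows onto the cell table. The sketch below is illustrative only: the field names (`pci`, `channel`, `lat`, `lon`, `azimuth_deg`, `height_m`) and the function `join_on_cell` are hypothetical and are not taken from the dataset documentation, which defines the actual keys.

```python
def join_on_cell(measurements, cell_info):
    """Left-join measurement rows to cell descriptors on (pci, channel).

    Both arguments are lists of dicts with hypothetical field names.
    Measurements without a matching cell keep None for the descriptor fields,
    since deployment descriptors exist only for a subset of cells.
    """
    index = {(c["pci"], c["channel"]): c for c in cell_info}
    joined = []
    for m in measurements:
        cell = index.get((m["pci"], m["channel"]))
        row = dict(m)  # copy so the input rows stay unchanged
        row["bs_lat"] = cell["lat"] if cell else None
        row["bs_lon"] = cell["lon"] if cell else None
        row["bs_azimuth_deg"] = cell["azimuth_deg"] if cell else None
        row["bs_height_m"] = cell["height_m"] if cell else None
        joined.append(row)
    return joined


if __name__ == "__main__":
    cells = [{"pci": 101, "channel": 1300, "lat": 48.2, "lon": 16.37,
              "azimuth_deg": 120.0, "height_m": 25.0}]
    rows = [{"pci": 101, "channel": 1300, "rsrp_dbm": -95.0},
            {"pci": 7, "channel": 1300, "rsrp_dbm": -101.0}]
    for row in join_on_cell(rows, cells):
        print(row)
```

Because PCIs are reused across a network, a (PCI, channel) pair may not be unique city-wide. An additional key, such as the operator, may be needed in practice.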
The authors discuss a broad spectrum of use cases. First, ray‑tracing calibration: measured link‑level data together with the inferred BS metadata and 3‑D environment enable systematic tuning of deterministic simulators and quantitative benchmarking against field observations. Second, environment‑aware radio‑map learning: deep models can ingest building and terrain features to predict coverage, with the dataset providing both training targets (e.g., RSRP maps) and baseline deterministic predictions. Third, deployment inference: the TA‑based localization and azimuth estimation procedures can be evaluated or improved using the released deployment descriptors as a reference, keeping in mind that these are themselves inferred estimates. Fourth, 5G beam‑management research: NR beam‑index and associated RSRP measurements support studies of beam selection and tracking under mobility. Finally, digital twin prototyping: the combination of deployment descriptors, high‑fidelity city geometry, and real measurements provides the building blocks for a digital twin of a metropolitan cellular network, useful for scenario testing, algorithm validation, and policy analysis.
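For the calibration use case, the quantitative comparison against field observations can be as simple as a mean error and a root-mean-square error between simulated and measured RSRP. The sketch below also applies a constant-offset correction. This is a deliberately simple stand-in for real ray-tracer tuning (which would adjust material or antenna parameters) and is not a method described by the authors; the function names are hypothetical.

```python
import math


def error_stats(measured_dbm, predicted_dbm):
    """Return (mean_error_db, rmse_db) of predicted minus measured RSRP."""
    if not measured_dbm or len(measured_dbm) != len(predicted_dbm):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - m for m, p in zip(measured_dbm, predicted_dbm)]
    mean_error = sum(errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mean_error, rmse


def apply_offset(predicted_dbm, offset_db):
    """Subtract a constant bias (in dB) from every prediction."""
    return [p - offset_db for p in predicted_dbm]


if __name__ == "__main__":
    measured = [-90.0, -100.0, -110.0]   # made-up example values
    predicted = [-85.0, -97.0, -104.0]   # made-up simulator output
    bias, rmse_before = error_stats(measured, predicted)
    _, rmse_after = error_stats(measured, apply_offset(predicted, bias))
    print(f"bias={bias:.2f} dB, RMSE before={rmse_before:.2f} dB, after={rmse_after:.2f} dB")
```

Removing the mean bias can only lower the RMSE or leave it unchanged. The residual after the correction shows how much error remains for geometry- or material-level tuning to address.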
Compared with existing public datasets (e.g., DoNext, Rochman, Yapar synthetic maps), Vienna’s dataset stands out by jointly providing (i) dense, city‑wide measurement coverage across urban and suburban zones, (ii) dual passive/active modalities, (iii) explicit deployment metadata for a subset of cells, and (iv) a high‑resolution 3‑D city model. This integration addresses a critical gap for environment‑conditioned learning and reproducible benchmarking of both AI‑driven and physics‑based approaches. The authors conclude that the Vienna 4G/5G Drive‑Test Dataset will serve as a valuable benchmark and development resource for the wireless research community, facilitating more accurate propagation modeling, robust network optimization, and realistic digital twin construction at city scale.