Multimodal-Wireless: A Large-Scale Dataset for Sensing and Communication

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv source.

This paper presents Multimodal-Wireless, a large-scale open-source dataset for multimodal sensing and communication research. The dataset is generated through an integrated and customizable data pipeline built upon the CARLA simulator and the Sionna framework, and features high-resolution communication channel state information (CSI) fully synchronized with five other sensor modalities, namely LiDAR, RGB camera, depth camera, inertial measurement unit (IMU), and radar, all sampled at 100 Hz. It contains approximately 160,000 frames collected across four virtual towns, sixteen communication scenarios, and three weather conditions. This paper provides a comprehensive overview of the dataset, outlining its key features, overall framework, and technical implementation details. In addition, it explores potential research applications in communication and collaborative perception, exemplified by beam prediction using a multimodal large language model. The dataset is openly available at https://le-liang.github.io/mmw/.


💡 Research Summary

The paper introduces Multimodal‑Wireless, a large‑scale open‑source dataset that simultaneously captures high‑resolution wireless channel state information (CSI) and five conventional sensor modalities (LiDAR, RGB camera, depth camera, inertial measurement unit, and radar) at a synchronized 100 Hz rate. Built on a tightly integrated pipeline that combines the CARLA autonomous‑driving simulator, Blender for electromagnetic‑material enrichment, and the Sionna ray‑tracing framework, the dataset provides temporally aligned multimodal data across four virtual towns, sixteen distinct V2X communication scenarios, and three weather conditions (sunny, rainy, foggy). In total, approximately 160,000 frames (8–13 seconds per scenario) are collected, each frame containing raw CSI path parameters (complex gain, azimuth/elevation angles, delay) for every multipath component, full‑resolution LiDAR point clouds (64 channels, ~30,000 points), RGB and depth images (640 × 480, 110° field of view), IMU measurements with realistic noise models, and radar scans (~2,000 points, 30° × 110° field of view).
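The per-path CSI representation described above (a complex gain, angles, and a delay for each multipath component) can be converted into a channel frequency response by summing over paths. A minimal NumPy sketch of this standard sum-of-paths reconstruction follows; the function and argument names are illustrative, not the dataset's actual API:

```python
import numpy as np

def freq_response(gains, delays, freqs):
    """Sum-of-paths channel frequency response.

    gains:  complex path gains, shape (P,)   (hypothetical field names)
    delays: path delays in seconds, shape (P,)
    freqs:  subcarrier frequencies in Hz, shape (F,)
    Returns H of shape (F,), where H[f] = sum_p g_p * exp(-j 2*pi*f*tau_p).
    """
    gains = np.asarray(gains, dtype=complex)
    delays = np.asarray(delays, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    phase = np.exp(-2j * np.pi * np.outer(freqs, delays))  # (F, P)
    return phase @ gains                                   # (F,)

# Example: a two-path channel over 64 subcarriers near a 28 GHz carrier
freqs = 28e9 + np.arange(64) * 120e3   # 120 kHz subcarrier spacing (assumed)
H = freq_response([1.0, 0.3j], [50e-9, 120e-9], freqs)
print(H.shape)  # (64,)
```

With per-frame path parameters available at 10 ms intervals, this reconstruction can be repeated across frames to study the temporal evolution of the channel.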

Key contributions are:

  1. High‑frequency, high‑detail CSI – Unlike existing wireless datasets (e.g., DeepMIMO, Boston‑Twin) that provide only static channel statistics or received power, Multimodal‑Wireless delivers per‑frame, per‑path complex gains and geometric parameters at 5G NR frame granularity (10 ms). This enables research on beam prediction, channel estimation, physical‑layer waveform design, and data‑driven channel modeling with unprecedented temporal fidelity.

  2. Weather‑aware multimodal synchronization – The dataset systematically incorporates adverse weather effects. Rain and fog alter LiDAR back‑scatter, degrade camera contrast, and modify material permittivity for ray‑tracing, thereby affecting both sensing and propagation. Researchers can directly study robustness of perception‑communication pipelines under realistic environmental stressors, a capability largely missing from prior works.

  3. Full V2X and collaborative perception integration – Each frame includes 3‑D bounding boxes and object labels (as in OPV2V/DAIR‑V2X) together with V2I/V2V channel matrices. This unified resource supports joint optimization problems such as context‑aware beamforming guided by visual or LiDAR cues, sensor‑aided channel prediction, and cooperative perception where communication constraints are explicitly modeled.

  4. Extensible configuration‑driven pipeline – Scenario definition, sensor placement, antenna array geometry, carrier frequency, and weather parameters are all specified in a single human‑readable configuration file. By modifying this file, users can generate new towns, alter vehicle/RSU densities, experiment with massive MIMO arrays, or switch to mmWave frequencies without rewriting code. The pipeline automates: (i) CARLA scenario execution and sensor capture, (ii) Blender reconstruction with material enrichment (concrete, glass, metal, brick, etc.), and (iii) Sionna ray‑tracing to produce CSI. This design transforms the dataset from a static archive into a dynamic research tool.

  5. Demonstration of multimodal LLM‑based beam prediction – As a proof‑of‑concept, the authors fine‑tune a large language model that ingests concatenated embeddings from LiDAR, RGB, depth, radar, and CSI to predict optimal transmit beams. Experiments show that incorporating visual and depth cues improves beam selection accuracy, especially under rainy and foggy conditions, highlighting the value of cross‑modal context for next‑generation communication systems.
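The configuration-driven design described in item 4 can be pictured as a single declarative structure that the three pipeline stages (CARLA capture, Blender material enrichment, Sionna ray tracing) all consume. The sketch below uses a plain Python dictionary with illustrative field names; the dataset's actual configuration schema may differ:

```python
# Illustrative scenario configuration; all field names are hypothetical,
# not the dataset's actual schema.
scenario_cfg = {
    "town": "Town05",
    "weather": "rainy",            # sunny | rainy | foggy
    "carrier_frequency_hz": 28e9,  # e.g. switch to mmWave by editing this
    "antenna_array": {"rows": 4, "cols": 8},  # BS array geometry
    "sensors": {
        "lidar": {"channels": 64, "rate_hz": 100},
        "camera": {"width": 640, "height": 480, "fov_deg": 110},
    },
}

def validate(cfg):
    """Basic sanity checks a generation pipeline might run before
    launching scenario execution and ray tracing."""
    assert cfg["weather"] in {"sunny", "rainy", "foggy"}
    assert cfg["carrier_frequency_hz"] > 0
    rows, cols = cfg["antenna_array"]["rows"], cfg["antenna_array"]["cols"]
    assert rows * cols > 0
    return True

print(validate(scenario_cfg))  # True
```

The key design point is that every experiment-level choice (town, weather, array geometry, carrier frequency) lives in one place, so generating a new variant of the dataset is an edit to this structure rather than a code change.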

The paper also provides exhaustive tables detailing sensor specifications, material assignments for electromagnetic simulation, and scenario statistics (e.g., roundabout, sky bridge, overpass, rural crossroad). The authors emphasize that the dataset is released under an open-source license at https://le-liang.github.io/mmw/, along with scripts for data generation, preprocessing, and baseline model training.
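As a toy version of the beam-prediction target used in the multimodal LLM demonstration, the "optimal beam" for a channel vector can be defined as the codeword that maximizes beamforming gain over a DFT codebook. The NumPy sketch below illustrates that common formulation; it is a simplified stand-in, not the paper's actual setup:

```python
import numpy as np

def dft_codebook(n_ant, n_beams):
    """Columns are unit-norm DFT beamforming vectors."""
    ant = np.arange(n_ant)[:, None]
    angles = np.linspace(-0.5, 0.5, n_beams, endpoint=False)[None, :]
    return np.exp(2j * np.pi * ant * angles) / np.sqrt(n_ant)  # (n_ant, n_beams)

def best_beam(h, codebook):
    """Index of the codeword with the largest gain |w^H h|."""
    return int(np.argmax(np.abs(codebook.conj().T @ h)))

# A single-path channel aligned with one codebook direction
W = dft_codebook(n_ant=16, n_beams=32)
h = np.sqrt(16) * W[:, 7]      # steering vector toward beam 7's direction
print(best_beam(h, W))          # 7
```

A learned predictor (such as the fine-tuned multimodal LLM in item 5) tries to output this index from sensor data alone, without access to the channel vector itself.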

In summary, Multimodal‑Wireless fills a critical gap between autonomous‑driving perception datasets and wireless channel simulators by delivering a richly annotated, temporally synchronized, weather‑diverse, and easily extensible multimodal dataset. It opens new research avenues in context‑aware beamforming, data‑driven channel modeling, robust V2X perception, and multimodal AI for communications, positioning itself as a foundational resource for 6G and beyond‑5G investigations.

