Multimodal-NF: A Wireless Dataset for Near-Field Low-Altitude Sensing and Communications
Environment-aware 6G wireless networks demand the deep integration of multimodal and wireless data. However, most existing datasets are confined to 2D terrestrial far-field scenarios, lacking the 3D spatial context and near-field characteristics cruc…
Authors: Mengyuan Li, Qianfan Lu, Jiachen Tian
Mengyuan Li, Graduate Student Member, IEEE, Qianfan Lu, Jiachen Tian, Hongjun Hu, Yu Han, Member, IEEE, Xiao Li, Senior Member, IEEE, Chao-Kai Wen, Fellow, IEEE, and Shi Jin, Fellow, IEEE

Abstract: Environment-aware 6G wireless networks demand the deep integration of multimodal and wireless data. However, most existing datasets are confined to 2D terrestrial far-field scenarios, lacking the 3D spatial context and near-field characteristics crucial for low-altitude extremely large-scale multiple-input multiple-output (XL-MIMO) systems. To bridge this gap, this letter introduces Multimodal-NF, a large-scale dataset and specialized generation framework. Operating in the upper midband, it synchronizes high-fidelity near-field channel state information (CSI) and precise wireless labels (e.g., Top-5 beam indices, LoS/NLoS) with comprehensive sensory modalities (RGB images, LiDAR point clouds, and GPS). Crucially, these multimodal priors provide spatial semantics that help reduce the near-field search space and thereby lower the overhead of wireless sensing and communication tasks. Finally, we validate the dataset through representative case studies, demonstrating its utility and effectiveness. The open-source generator and dataset are available at https://lmyxxn.github.io/6GXLMIMODatasets/.

Index Terms: Near-field, XL-MIMO, upper midband, multimodal dataset, low-altitude, sensing, communications.

I. Introduction
In the evolution toward sixth-generation (6G) wireless networks, extremely large-scale multiple-input multiple-output (XL-MIMO) and the emerging low-altitude economy (LAE) have become key driving forces [1, 2]. Building on the momentum of the upper 6 GHz (U6G) band, the broader upper midband (7–24 GHz) is a strategic spectrum choice for these 3D low-altitude systems [3].
Its short wavelengths allow for dense XL-MIMO packing, while contiguous wide bandwidths significantly boost transmission rates [4]. Although artificial intelligence (AI) has demonstrated immense potential in optimizing communication systems [5], its efficacy fundamentally relies on massive training data. The lack of large-scale, high-quality datasets that capture the complex 3D near-field characteristics of upper midband LAE systems creates a severe bottleneck.

M. Li, Q. Lu, J. Tian, H. Hu, Y. Han, X. Li, and S. Jin are with the School of Information Science and Engineering, Southeast University, Nanjing 210096, China (email: mengyuan_li@seu.edu.cn; qianfan_lu@seu.edu.cn; huhongjun@seu.edu.cn; tianjiachen@seu.edu.cn; li_xiao@seu.edu.cn; hanyu@seu.edu.cn; jinshi@seu.edu.cn). C.-K. Wen is with the Institute of Communications Engineering, National Sun Yat-sen University, Kaohsiung 804, Taiwan (e-mail: chaokai.wen@mail.nsysu.edu.tw).

Existing representative wireless datasets and generators (summarized in Table I) face critical limitations for LAE applications. Real-world datasets (e.g., DeepSense 6G [6]) suffer from rigid physical settings, while prevailing simulators (e.g., DeepMIMO [7]) strictly rely on far-field planar-wave assumptions or lack 3D dynamic multimodal support. Even recent near-field models [8] remain confined to 2D terrestrial environments.

To bridge this gap, we introduce Multimodal-NF, with the main contributions summarized as follows:
• Customizable 3D Near-Field Generator: We develop an open-source dataset generator for low-altitude XL-MIMO systems, which allows researchers to flexibly define 3D UAV trajectories, array configurations, and environments.
• Large-Scale Multimodal Dataset: We provide a large-scale dataset that synchronizes high-fidelity wireless data (e.g., near-field channel state information (CSI), Top-5 beam indices, line-of-sight (LoS)/non-LoS (NLoS) labels) with rich sensory information, including RGB images, LiDAR point clouds, and GPS coordinates.
• Dataset Analysis and Validation: We analyze and validate the dataset through representative case studies. These results demonstrate its efficacy in empowering environment-aware sensing and communications.

TABLE I: Comparison of Representative Wireless Datasets and Generation Frameworks.

| Category | Dataset/Framework | BS Antenna | Frequency Band | Near-Field | 3D | Customizable | Modalities |
|---|---|---|---|---|---|---|---|
| Real-World Measured | DeepSense 6G [6] | 16 ULA | mmWave | × | ✓ | × | CSI, LiDAR, Radar, RGB, GPS |
| | LuViRA [9] | 25 × 4 UPA | Mid-band | × | × | × | CSI, RGB, Depth, IMU, Audio |
| Simulated | DeepMIMO [7] | 16 × 16 UPA | mmWave | × | × | × | CSI only |
| | Raymobtime [10] | 4 × 4 UPA | mmWave | × | × | × | CSI, LiDAR, Radar, RGB |
| | BUPTCMCC-6G [8] | Custom | Upper midband, mmWave, THz | ✓ | × | ✓ | CSI only |
| | CAVIAR6G [11] | 8 × 8 UPA | mmWave | × | ✓ | ✓ | CSI, LiDAR, RGB, Position |
| | Multimodal Wireless [12] | ULA (≤ 256), UPA (≤ 8 × 8) | mmWave | × | × | ✓ | CSI, LiDAR, Radar, RGB, Depth, IMU |
| | Multimodal-NF | 64 × 64 UPA | U6G, upper midband | ✓ | ✓ | ✓ | CSI, RGB, LiDAR, GPS, Wireless Labels |

II. System Model
A. Near-Field Channel Model
As depicted in Fig. 1, we investigate a low-altitude XL-MIMO system where a base station (BS) is equipped with an $M$-element ($M = M_y \times M_z$) uniform planar array (UPA) in the $yz$-plane with half-wavelength spacing. A single-antenna unmanned aerial vehicle (UAV), acting as the user equipment (UE), moves in the system. To ensure cross-modal spatial alignment, an RGB camera and a LiDAR are co-located with the UPA, oriented towards the $+x$-axis with identical field-of-view (FoV) and sharing the same global coordinate system. The UAV's ground-truth (GT) 3D spatial coordinate at time $t$ is denoted by $\mathbf{u}_t \in \mathbb{R}^3$. The near-field uplink channel between the $m$-th BS antenna at $\mathbf{p}_m$ and the UE at time $t$ and frequency $f$ is

$$h_m(t, f) = \sum_{l=1}^{L(t)} g_{l,m}(t, f)\, e^{-j 2\pi \frac{f}{c} d_{l,m}(t)}, \tag{1}$$

where $L(t)$, $g_{l,m}(t, f)$, and $d_{l,m}(t)$ denote the number of paths, the complex path gain, and the Euclidean propagation distance from the UE to the location $\mathbf{p}_m$ of the $m$-th antenna, respectively, and $c$ is the speed of light. This per-antenna exact distance computation explicitly captures the near-field spherical wavefront. To provide GT labels for downstream learning tasks, the received signal for a predefined near-field codebook $\mathcal{W}$ is computed as

$$y(t, f) = \mathbf{w}^{H} \sqrt{P_r}\, \mathbf{h}(t, f) + n(t, f), \tag{2}$$

where $\mathbf{w}$ is the beamforming vector, $P_r$ is the receive power, and $n(t, f) \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})$ is the additive Gaussian noise.

B. Motivation for Multimodal Support
We use $x_t = (\mathbf{u}_t, \mathcal{E})$ to denote the underlying geometric state, where $\mathbf{u}_t$ is the UAV location and $\mathcal{E}$ denotes the propagation environment. Let $s_t$ and $c_t$ denote sensing- and communication-related target variables, respectively. For analytical tractability, we quantize the state spaces so that $x_t$, $s_t$, $c_t$, and the multimodal observation $V_t$ are discrete random variables. Accordingly, $H(\cdot)$ and $H(\cdot \mid \cdot)$ denote the Shannon entropy and conditional entropy. Both sensing and communication tasks are governed by the same geometric state, yet recovering it from wireless observations alone is inherently uncertain.
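As a concrete illustration of the per-antenna distance computation in (1), the following minimal sketch builds a half-wavelength UPA in the $yz$-plane and evaluates a single-path LoS channel. The unit path gain, the UAV position, and the helper names `upa_positions`, `los_channel`, and `planar_phase_error` are assumptions made for illustration; they are not taken from the released generator, which relies on ray tracing.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s


def upa_positions(m_y, m_z, wavelength, center):
    """Element positions of a UPA in the yz-plane with half-wavelength spacing."""
    spacing = wavelength / 2.0
    y = (np.arange(m_y) - (m_y - 1) / 2.0) * spacing
    z = (np.arange(m_z) - (m_z - 1) / 2.0) * spacing
    yy, zz = np.meshgrid(y, z, indexing="ij")
    offsets = np.stack([np.zeros_like(yy), yy, zz], axis=-1).reshape(-1, 3)
    return offsets + np.asarray(center, dtype=float)


def los_channel(ant_pos, ue_pos, freq):
    """Single-path (L = 1) instance of (1) with unit path gain (assumption)."""
    dist = np.linalg.norm(ant_pos - np.asarray(ue_pos, dtype=float), axis=1)
    return np.exp(-1j * 2.0 * np.pi * freq / C * dist)


def planar_phase_error(ant_pos, center, ue_pos, freq):
    """Largest phase gap between exact distances and a planar-wave approximation."""
    center = np.asarray(center, dtype=float)
    v = np.asarray(ue_pos, dtype=float) - center
    d0 = np.linalg.norm(v)
    exact = np.linalg.norm(ant_pos - np.asarray(ue_pos, dtype=float), axis=1)
    planar = d0 - (ant_pos - center) @ (v / d0)  # first-order (far-field) distance
    return np.max(np.abs(exact - planar)) * 2.0 * np.pi * freq / C


f_c = 7e9
bs_center = (0.0, 0.0, 65.0)
ue = (40.0, 10.0, 30.0)  # hypothetical UAV position in metres
ants = upa_positions(64, 64, C / f_c, bs_center)
h = los_channel(ants, ue, f_c)  # one complex coefficient per antenna
err = planar_phase_error(ants, bs_center, ue, f_c)
print(h.shape, f"max planar-wave phase error: {err:.2f} rad")
```

For this geometry the planar-wave approximation is off by more than $\pi/8$ rad at the array corners, which is why the exact distances are kept antenna by antenna.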
Proposition 1 (Entropy Reduction for Sensing and Communication via Multimodal Side Information): Suppose the sensing-related target $s_t$ and the communication-related target $c_t$ both depend on the geometric state $x_t$ up to small residual uncertainties, quantified by

$$H(s_t \mid x_t) \le \varepsilon_s, \qquad H(c_t \mid x_t) \le \varepsilon_c. \tag{3}$$

Then, for a multimodal observation $V_t$, the residual uncertainties satisfy

$$H(s_t \mid V_t) \le H(x_t \mid V_t) + \varepsilon_s, \qquad H(c_t \mid V_t) \le H(x_t \mid V_t) + \varepsilon_c. \tag{4}$$

In the special case where both $s_t$ and $c_t$ are deterministic functions of $x_t$, the above bounds reduce to

$$H(s_t \mid V_t) \le H(x_t \mid V_t), \qquad H(c_t \mid V_t) \le H(x_t \mid V_t). \tag{5}$$

Therefore, informative multimodal observations that reduce geometric uncertainty can simultaneously reduce the effective search spaces of both sensing and communication tasks.

Fig. 1: Illustration of the low-altitude XL-MIMO system.

The above result follows directly from the chain rule of conditional entropy. For example, for the communication task,

$$H(c_t \mid V_t) \le H(c_t, x_t \mid V_t) = H(x_t \mid V_t) + H(c_t \mid x_t, V_t) \le H(x_t \mid V_t) + H(c_t \mid x_t), \tag{6}$$

where the last inequality follows from the fact that conditioning cannot increase entropy. Combining (6) with (3) yields the bound in (4). The same holds for the sensing task.

Remark 1: The proposition shows that multimodal sensing benefits both task categories through a shared mechanism: it reduces the uncertainty of the underlying geometric state, which in turn reduces the uncertainty of both sensing-related and communication-related targets. In complex low-altitude environments, explicitly deriving the mappings from multimodal observations to these task states is analytically intractable. This motivates the proposed Multimodal-NF dataset, which provides aligned multimodal observations together with sensing and communication labels for data-driven learning.

III. Multimodal-NF: Dataset Generator
In this section, we detail the workflow of the proposed generator, including: (i) scene construction, (ii) trajectory simulation, and (iii) wireless and multimodal data generation.

A. Scene Construction and Trajectory Simulation
As illustrated in Fig. 2(a), the simulation scene spans a 120 m × 120 m area, with the BS deployed at a fixed height at the center of the boundary. The environment allows for flexible configuration of road topologies, building densities, and geometric attributes, where building heights are realistically constrained between 20 m and 60 m. To facilitate subsequent channel generation via the Sionna ray-tracing (RT) engine, we assign specific electromagnetic materials to all 3D geometric entities according to ITU-R Recommendation P.2040-3 [13]. These primarily include ITU-concrete for roads, ITU-marble/wood/metal for buildings, and medium dry ground for the terrain. This material parameterization allows reflection, diffraction, and scattering to be modeled in a physically consistent manner.

Fig. 2: Example visualizations of (a) the LAE scene with the pre-defined trajectory modes and (b) the near-field rays.

For each UAV, the proposed dataset observes $T$ consecutive time slots. To reflect typical low-altitude operations, we incorporate 10 trajectory modes representing diverse kinematic behaviors, ranging from near-ground urban canyons to the upper urban canopy. As detailed in Table II, each mode is characterized by horizontal and vertical velocity ranges as well as specific flight altitude layers. Furthermore, these trajectories are categorized into two difficulty levels: hard and easy. The hard modes (Zigzag, Sudden Turn, Wall Hug, and Inspect) introduce challenging propagation conditions.
Specifically, highly mobile modes, such as Zigzag and Sudden Turn, are designed to test the ability to track rapid spatial variations, while Wall Hug and Inspect simulate NLoS-dominant environments to evaluate robustness against blockages. Conversely, the easy modes (Street Patrol, City Cruise, Orbit, Scan, Fast Transit, and Hover) feature relatively stable flights.

TABLE II: Parameters of Pre-defined UAV Trajectory Modes.

| ID | Trajectory | Horiz. Vel. (m/s) | Vert. Vel. (m/s) | Altitude (m) | Flight Altitude Layer |
|---|---|---|---|---|---|
| 1 | Zigzag | 0–5 | 0–1.5 | 5–15 | Urban Ultra-Low |
| 2 | Wall Hug | 5–15 | 0 | 5–20 | Urban Low |
| 3 | Inspect | 0 | 0–2 | 2–60 | Urban Low-High |
| 4 | Sudden Turn | 8–12 | 0–2 | 5–45 | Urban Mid-Low |
| 5 | Street Patrol | 8–12 | 0–2 | 5–45 | Urban Mid-Low |
| 6 | Hover | 0 | 0–0.5 | 10–80 | Full Coverage |
| 7 | City Cruise | 8–15 | 0 | 30–60 | Urban Mid-High |
| 8 | Orbit | 0–10 | 0 | 30–60 | Urban Mid-High |
| 9 | Fast Transit | 15–25 | 0 | 50–80 | Urban Ultra-High |
| 10 | Scan | 0–12 | 0 | 50–80 | Urban Ultra-High |

B. Wireless and Multimodal Data Generation
1) Wireless Data: We configure Sionna RT with synthetic_array=False to enforce per-antenna channel generation. By calculating element-wise propagations, this approach accurately captures the spherical wavefront characteristics inherent to near-field channels, as visualized by individual ray paths in Fig. 2(b); the generation details can be found in [14]. Stacking the channels across all $M$ antennas yields $\mathbf{H}(t) \in \mathbb{C}^{M \times K}$ over $K$ subcarriers, and the CSI over a trajectory of $T$ frames is stored in the released dataset by separating the real and imaginary parts into a tensor in $\mathbb{R}^{M \times K \times T \times 2}$.

Alongside the generated near-field CSI matrix, we provide supplementary wireless labels for downstream tasks: (i) a LoS indicator, (ii) the Top-5 optimal beam indices, and (iii) the corresponding normalized beamforming gains.
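The real/imaginary storage layout described above can be sketched as follows. This is a minimal illustration under stated assumptions: the ordering of the last axis (index 0 for the real part, index 1 for the imaginary part), the float32 dtype, and the helper names `pack_csi` and `unpack_csi` are hypothetical and should be checked against the released files.

```python
import numpy as np


def pack_csi(h_complex):
    """Complex CSI of shape (M, K, T) -> real tensor of shape (M, K, T, 2)."""
    return np.stack([h_complex.real, h_complex.imag], axis=-1).astype(np.float32)


def unpack_csi(h_real):
    """Inverse of pack_csi: (M, K, T, 2) real -> (M, K, T) complex."""
    return h_real[..., 0] + 1j * h_real[..., 1]


# Toy dimensions for illustration; the dataset uses much larger M and K.
M, K, T = 16, 8, 20
rng = np.random.default_rng(0)
h = rng.standard_normal((M, K, T)) + 1j * rng.standard_normal((M, K, T))

packed = pack_csi(h)            # what would be written to disk
restored = unpack_csi(packed)   # what a training pipeline would consume
print(packed.shape, restored.shape)
```

Keeping the two parts on a trailing axis lets real-valued neural network layers consume the tensor directly, while the complex form is recovered with a single slicing operation when beamforming gains need to be recomputed.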
To derive these beam labels, we construct a 3D near-field codebook $\mathcal{W}$ by uniformly sampling the main UAV operational region across azimuth $\theta_{k_\theta} \in [-72^\circ, 72^\circ]$, elevation $\phi_{k_\phi} \in [60^\circ, 150^\circ]$, and distance $d_{k_r} \in [20, 155]$ m, using $N_\theta = 20$, $N_\phi = 20$, and $N_r = 10$ grid points, respectively. Converting the sampled polar tuple $(\theta_{k_\theta}, \phi_{k_\phi}, d_{k_r})$ to Cartesian coordinates $\mathbf{p}_{\mathrm{cw}} \in \mathbb{R}^3$, the near-field codeword is

$$\mathbf{w} = \frac{1}{\sqrt{M}} \left[ e^{-j 2\pi \frac{f_c}{c} \lVert \mathbf{p}_{\mathrm{cw}} - \mathbf{p}_1 \rVert}, \ldots, e^{-j 2\pi \frac{f_c}{c} \lVert \mathbf{p}_{\mathrm{cw}} - \mathbf{p}_M \rVert} \right]^{T}, \tag{7}$$

where $f_c$ denotes the carrier frequency. For a codeword $\mathbf{w} \in \mathcal{W}$ at time slot $t$, the achievable rate is

$$R(\mathbf{w}, t, f_c) = \log_2 \left( 1 + \frac{P_r \left| \mathbf{w}^{H} \mathbf{h}(t, f_c) \right|^2}{\sigma^2} \right). \tag{8}$$

Each beam is uniquely identified by a global index $k$:

$$k = (k_\theta - 1) N_\phi N_r + (k_\phi - 1) N_r + k_r. \tag{9}$$

We select the Top-5 global indices that maximize $R(\mathbf{w}, t, f_c)$ and map them back to their spatial tuples $(k_\theta, k_\phi, k_r)$ to generate the GT Top-5 decomposed labels. To evaluate beam alignment, the normalized beamforming gain is defined as

$$G_{\mathrm{norm}}(\mathbf{w}, t, f_c) = \frac{\left| \mathbf{w}^{H} \mathbf{h}(t, f_c) \right|^2}{\left| (\mathbf{w}^{*})^{H} \mathbf{h}(t, f_c) \right|^2}, \tag{10}$$

where $\mathbf{w}^{*}$ denotes the optimal GT beamforming vector.

2) Multimodal Sensing Data: To facilitate environmental sensing, we simulate multiple virtual sensors to construct a diverse multimodal dataset, with detailed configurations and data dimensions summarized in Table III. First, to capture the visual and structural context of the environment, we generate RGB images from a fixed camera viewpoint at the BS and simultaneously simulate a co-located LiDAR sensor using Open3D [15]. The integration of point cloud data provides 3D geometric information, which effectively compensates for potential visual degradation under adverse lighting conditions or bad weather. Representative visualizations of these two modalities are illustrated in Fig. 1. To emulate realistic positioning measurements, we add Gaussian noise to the GT coordinate to obtain a noisy GPS observation $\tilde{\mathbf{u}}_t = \mathbf{u}_t + \mathbf{z}_t$, where $\mathbf{z}_t \sim \mathcal{N}(\mathbf{0}, \sigma^2_{\mathrm{GPS}} \mathbf{I})$. Finally, each sample is annotated with a specific trajectory mode ID that categorizes the UAV's flight pattern.

TABLE III: Contents of Multimodal-NF Dataset.

| Modality | Data Components & Attributes |
|---|---|
| Wireless | CSI tensor $\tilde{\mathbf{H}} \in \mathbb{R}^{M \times K \times T \times 2}$ (real/imaginary stacked), LoS Indicators, Top-5 Beam Indices, Normalized Beamforming Gains |
| GPS | 3D Coordinates with Gaussian noise $\mathcal{N}(0, \sigma^2_{\mathrm{GPS}})$ |
| Vision | RGB Image (FoV = 90°, 512 × 512) |
| LiDAR | 10,000-point Cloud (co-located with camera) |
| Label | Trajectory ID (10 kinematic modes) |

TABLE IV: Summary of the Multimodal-NF dataset splits and sample statistics (UPA 64 × 64).

| Split | Mode | Cities | Traj. | Total Samples | LoS | NLoS |
|---|---|---|---|---|---|---|
| Train | Easy | 22 | 2,614 | 52,280 | 49,923 | 2,357 |
| | Hard | | 5,185 | 103,700 | 94,251 | 9,449 |
| Val | Easy | 4 | 488 | 9,760 | 9,374 | 386 |
| | Hard | | 988 | 19,760 | 19,139 | 621 |
| Test | Easy | 4 | 494 | 9,880 | 9,381 | 499 |
| | Hard | | 1,001 | 20,020 | 19,007 | 1,013 |
| Total | – | 30 | 10,770 | 215,400 | 201,075 | 14,325 |

Note: All trajectories contain $T = 20$ frames with a sampling interval of 0.1 s and altitude range 5–80 m. The BS UPA is located at $(0, 0, 65)$ m. The dataset is split by cities into Train/Val/Test = 22/4/4. Overall, 93.35% of samples are LoS and 6.65% are NLoS.

IV. Dataset Analysis and Case Studies
In this section, we evaluate the Multimodal-NF dataset through two typical benchmark case studies. To align with the 6G upper midband spectrum standardization, the communication system operates at a carrier frequency $f_c = 7$ GHz with a subcarrier spacing $\Delta f = 30$ kHz across $K = 1024$ subcarriers.
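As a quick sanity check on these parameters, the short calculation below derives the wavelength, the occupied bandwidth, and the array aperture for the 64 × 64 half-wavelength UPA listed in Table IV. The Rayleigh distance $2D^2/\lambda$ is a standard textbook quantity and is not stated in the letter; using the array diagonal as $D$ is likewise an assumption of this sketch.

```python
import math

C = 299_792_458.0        # speed of light in m/s

f_c = 7e9                # carrier frequency (Hz)
delta_f = 30e3           # subcarrier spacing (Hz)
K = 1024                 # number of subcarriers
M_y, M_z = 64, 64        # UPA dimensions listed in Table IV

wavelength = C / f_c
bandwidth = K * delta_f
side_y = (M_y - 1) * wavelength / 2.0          # half-wavelength element spacing
side_z = (M_z - 1) * wavelength / 2.0
aperture = math.hypot(side_y, side_z)          # array diagonal, taken as D
rayleigh = 2.0 * aperture ** 2 / wavelength    # classical 2 D^2 / lambda

print(f"wavelength        = {wavelength * 100:.2f} cm")
print(f"bandwidth         = {bandwidth / 1e6:.2f} MHz")
print(f"aperture diagonal = {aperture:.2f} m")
print(f"Rayleigh distance = {rayleigh:.1f} m")
```

Under these assumptions the occupied bandwidth is 30.72 MHz and the Rayleigh distance comes out at about 170 m, so the 20–155 m distance range sampled by the codebook lies inside the classical near-field region of the array.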
The BS is positioned at a height $H_{\mathrm{BS}} = 65$ m and is equipped with a vertically polarized UPA comprising $M_y \times M_z = 64 \times 64$ antennas. The comprehensive dataset splits and sample statistics generated under these configurations are detailed in Table IV. To validate the dataset's practical utility, we conduct two downstream case studies: near-field localization and multimodal beam prediction. For these evaluations, we extract the single-carrier CSI evaluated at $f_c$ and set the GPS noise variance to $\sigma^2_{\mathrm{GPS}} = 0.5$.

1) Case Study I: User Localization: To evaluate the proposed dataset, we conduct 3D near-field user localization using the orthogonal matching pursuit (OMP) algorithm of [16] with a fine-grained polar-domain codebook. Unlike conventional far-field models, which are typically restricted to 2D angle-of-arrival (AoA) estimation, near-field channels characterized by spherical wavefronts enable joint acquisition of azimuth, elevation, and distance. As illustrated in Fig. 3(a), the estimated UE positions in the 3D Cartesian coordinate system align closely with the GT. This alignment demonstrates the dataset's ability to simultaneously preserve accurate azimuth and elevation features alongside distance information. This is further corroborated by the angular-domain response in Fig. 3(b), where a pronounced angular spread effect is observed. These distinct near-field characteristics indicate that the generated channels capture the essential 3D geometric characteristics of the propagation environment, supporting the dataset's suitability for high-resolution sensing tasks.

Fig. 3: Visualizations of (a) the Cartesian-domain channel and 3D location of UE (estimation and GT) and (b) the angular-domain channel and angles of UE (estimation and GT).

Fig. 4: Spatial-temporal variation of beam indices in (a) LoS and (b) NLoS scenarios, including the global index variations and decomposed distance $d$, elevation $\phi$, and azimuth $\theta$ indices over time. The color indicates the beam index value.

Fig. 5: System achievable rate comparison of the LLM-based beam prediction method (trained on the proposed dataset) with traditional beam training baselines.

2) Case Study II: Multimodal Beam Prediction: First, we illustrate the inherent challenges of capturing the beam features in the proposed dataset by analyzing the dynamic trends of the GT beam indices. Fig. 4 visualizes the spatial-temporal evolution of beam indices relative to the UAV's trajectory under both LoS and NLoS scenarios, including the global beam index and the individual indices for each dimension (azimuth, elevation, and distance). As shown in Fig. 4(a), although the global beam index is highly correlated with the UAV's trajectory, it exhibits periodic sharp jumps. This occurs because (9) maps a 3D index tuple to a 1D index, causing index discontinuities when the UAV crosses elevation or azimuth boundaries. Conversely, NLoS scenarios (Fig. 4(b)) exhibit highly irregular spatial discontinuities, as the beam index is affected not only by changes in the UAV's position but also by the surrounding scatterers. These observations highlight the necessity of multimodal environmental awareness to anticipate UAV trajectories and perform blockage-aware beam prediction.
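The index discontinuities attributed to (9) can be reproduced with a few lines of arithmetic. The sketch below uses the grid sizes $N_\theta = N_\phi = 20$ and $N_r = 10$ from the codebook definition; the track of grid cells is hypothetical and chosen so that every step moves to an adjacent cell.

```python
N_THETA, N_PHI, N_R = 20, 20, 10   # grid sizes used for the codebook


def global_index(k_theta, k_phi, k_r):
    """Eq. (9): 1-based (azimuth, elevation, distance) indices -> global index."""
    return (k_theta - 1) * N_PHI * N_R + (k_phi - 1) * N_R + k_r


def decompose(k):
    """Inverse of (9): global index -> 1-based (k_theta, k_phi, k_r)."""
    k_theta, rem = divmod(k - 1, N_PHI * N_R)
    k_phi, k_r = divmod(rem, N_R)
    return k_theta + 1, k_phi + 1, k_r + 1


# Hypothetical track that moves to an adjacent grid cell at every step.
track = [(10, 8, 4), (10, 8, 5), (10, 9, 5), (11, 9, 5), (11, 9, 6)]
indices = [global_index(*cell) for cell in track]
jumps = [b - a for a, b in zip(indices, indices[1:])]

for cell, k in zip(track, indices):
    print(cell, "->", k)
print("index jumps:", jumps)
```

A one-cell step in distance changes the global index by 1, a one-cell step in elevation by $N_r = 10$, and a one-cell step in azimuth by $N_\phi N_r = 200$. This is one reason the decomposed labels $(k_\theta, k_\phi, k_r)$ can be easier to regress than the flattened index, even when the motion itself is smooth.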
Consequently, these characteristics indicate that the proposed dataset reflects the complexities of real-world communication environments, offering challenging and realistic scenarios for related research.

Following the large language model (LLM)-based beam prediction algorithm proposed in [17], we use the pre-trained GPT-2 as the backbone to predict the beam indices of the next $T_p = 10$ time slots from the multimodal data of the previous $T_h = 10$ time slots. As shown in Fig. 5, the baseline using the efficient two-stage near-field beam training of [18] consistently outperforms its counterpart based on a far-field codebook, verifying the necessity of near-field modeling in XL-MIMO datasets. Specifically, in our setup, the two-stage method requires 110 pilot symbols for near-field training (100 for coarse angular scanning and 10 for fine distance refinement) and 100 pilots (64 coarse, 36 fine) for the far-field counterpart. These pilot counts follow the specific search grid settings used in our implementation. Furthermore, across both LoS and NLoS scenarios, the LLM-based method trained on our dataset approaches the upper bound on the achievable rate defined by exhaustive search, and it does so with zero beam-training overhead thanks to multimodal assistance.

Fig. 6: Ablation study on different input modalities (x-axis: prediction time step, 1 to 10; y-axis: normalized beamforming gain; curves: GPS+IMG+LiDAR+Prompt, GPS+IMG+LiDAR, GPS+IMG+Prompt, GPS+LiDAR+Prompt, GPS+Prompt).

More importantly, Fig. 6 presents an ablation study highlighting the value of the provided multimodal data. While GPS data provides baseline spatial awareness, the integration of RGB images and LiDAR point clouds significantly enhances the normalized beamforming gain, especially in NLoS scenarios, where they provide useful geometric cues about the surrounding environment.
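The normalized beamforming gain plotted in Fig. 6 is the metric defined in (10). A minimal sketch of how a predicted beam index could be scored is given below, assuming that $\mathbf{w}^{*}$ is the best codeword in the codebook, that azimuth is measured from the $+x$-axis and elevation from the $+z$-axis, and that a reduced 16 × 16 array with a 5 × 4 × 3 grid is acceptable for illustration. All helper names are hypothetical and the channel is a unit-gain LoS path rather than ray-traced CSI.

```python
import numpy as np

C = 299_792_458.0   # speed of light in m/s
F_C = 7e9           # carrier frequency in Hz


def upa_positions(m_y, m_z, center):
    """Half-wavelength UPA in the yz-plane centred at `center`."""
    spacing = C / F_C / 2.0
    y = (np.arange(m_y) - (m_y - 1) / 2.0) * spacing
    z = (np.arange(m_z) - (m_z - 1) / 2.0) * spacing
    yy, zz = np.meshgrid(y, z, indexing="ij")
    offsets = np.stack([np.zeros_like(yy), yy, zz], axis=-1).reshape(-1, 3)
    return offsets + center


def sph_to_cart(az_deg, el_deg, dist):
    """Assumed convention: azimuth from +x in the xy-plane, elevation from +z."""
    az, el = np.deg2rad(az_deg), np.deg2rad(el_deg)
    return dist * np.array([np.sin(el) * np.cos(az),
                            np.sin(el) * np.sin(az),
                            np.cos(el)])


def codeword(ant_pos, point):
    """Near-field codeword of (7) focused on a Cartesian point."""
    r = np.linalg.norm(ant_pos - point, axis=1)
    return np.exp(-1j * 2.0 * np.pi * F_C / C * r) / np.sqrt(len(ant_pos))


def beam_gains(codebook, h):
    """|w^H h|^2 for every codeword (rows of `codebook`)."""
    return np.abs(np.conj(codebook) @ h) ** 2


def normalized_gain(codebook, h, k):
    """Eq. (10) for the 1-based global index k; w* is taken as the best codeword."""
    gains = beam_gains(codebook, h)
    return gains[k - 1] / gains.max()


# Reduced setup for a quick check (assumption): 16 x 16 UPA, 5 x 4 x 3 grid.
center = np.array([0.0, 0.0, 65.0])
ants = upa_positions(16, 16, center)
az_grid = np.linspace(-72.0, 72.0, 5)
el_grid = np.linspace(60.0, 150.0, 4)
d_grid = np.linspace(20.0, 155.0, 3)
codebook = np.array([codeword(ants, center + sph_to_cart(a, e, d))
                     for a in az_grid for e in el_grid for d in d_grid])

n_phi, n_r = len(el_grid), len(d_grid)
k_true = (4 - 1) * n_phi * n_r + (2 - 1) * n_r + 3   # cell (4, 2, 3) via (9)
k_off = (3 - 1) * n_phi * n_r + (2 - 1) * n_r + 3    # neighbouring azimuth cell

# Unit-gain LoS channel for a UAV located exactly at the (4, 2, 3) grid point.
ue = center + sph_to_cart(az_grid[3], el_grid[1], d_grid[2])
h = np.exp(-1j * 2.0 * np.pi * F_C / C * np.linalg.norm(ants - ue, axis=1))

top5 = np.argsort(beam_gains(codebook, h))[::-1][:5] + 1   # 1-based Top-5 labels
g_true = normalized_gain(codebook, h, k_true)
g_off = normalized_gain(codebook, h, k_off)
print("Top-5:", top5, "gain(true):", g_true, "gain(off-by-one azimuth):", g_off)
```

In this toy setting the matched beam attains a normalized gain of 1, while a prediction that is off by a single azimuth cell loses most of the gain, which illustrates why the metric is sensitive to the angular accuracy of the predicted indices.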
This improvement underscores the critical role of visual and structural modalities in perceiving spatial blockages, demonstrating that the proposed dataset supports environment-aware communications.

V. Conclusion
In this letter, we introduced Multimodal-NF, an open-source multimodal dataset and generator tailored for sensing and communications research in low-altitude near-field XL-MIMO systems in the upper midband. By providing high-fidelity CSI paired with rich sensory modalities (e.g., GPS, RGB, and LiDAR) and precise wireless labels, this dataset helps bridge the gap left by traditional 2D far-field datasets. We validated its utility through representative tasks, demonstrating the critical advantage of multimodal environmental awareness in complex 3D scenarios. Furthermore, the open-source generator empowers researchers to simulate diverse environments, such as terrestrial vehicular networks, custom UPA configurations, and alternative frequency bands, with few modifications.

References
[1] Y. Wu et al., “Low-altitude UAV position prediction-assisted near-field adaptive beamwidth control for XL-MIMO systems,” IEEE Internet Things J., vol. 13, no. 4, pp. 5477–5490, Feb. 2026.
[2] Y. Jiang et al., “Integrated sensing and communication for low altitude economy: Opportunities and challenges,” IEEE Commun. Mag., vol. 63, no. 12, pp. 72–78, 2025.
[3] J. Zhang et al., “New midband for 6G: Several considerations from the channel propagation characteristics perspective,” IEEE Commun. Mag., vol. 63, no. 1, pp. 175–180, Jan. 2025.
[4] J. Tian et al., “Mid-band extra large-scale MIMO system: Channel modeling and performance analysis,” IEEE Trans. Commun., vol. 73, no. 2, pp. 1025–1041, Feb. 2025.
[5] P. Zhang et al., “ComAI: The convergence of communication and artificial intelligence,” IEEE Commun. Surv. & Tut., vol. 28, pp. 2163–2197, 2026.
[6] A. Alkhateeb et al., “DeepSense 6G: A large-scale real-world multi-modal sensing and communication dataset,” IEEE Commun. Mag., vol. 61, no. 9, pp. 122–128, Sept. 2023.
[7] A. Alkhateeb, “DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO applications,” in Proc. Inf. Theory Appl. Workshop (ITA), 2019, pp. 1–8.
[8] L. Yu et al., “BUPTCMCC-6G-DataAI+: A generative channel dataset for 6G AI air-interface research,” Sci. China Inf. Sci., vol. 68, no. 9, p. 197301, Sep. 2025, doi: 10.1007/s11432-024-4445-0.
[9] O. Yaman et al., “The LuViRA dataset: Synchronized vision, radio, and audio sensors for indoor localization,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), 2024, pp. 11920–11926.
[10] A. Klautau et al., “5G MIMO data for machine learning: Application to beam-selection using deep learning,” in Proc. Inf. Theory Appl. Workshop (ITA), 2018, pp. 1–9.
[11] J. Borges et al., “CAVIAR: Co-simulation of 6G communications, 3-D scenarios, and AI for digital twins,” IEEE Internet Things J., vol. 11, no. 19, pp. 31287–31300, 2024.
[12] T. Mao et al., “Multimodal-Wireless: A large-scale dataset for sensing and communication,” arXiv preprint arXiv:2511.03220, 2025.
[13] ITU-R, “Effects of building materials and structures on radio wave propagation above about 100 MHz,” International Telecommunication Union (ITU), Geneva, Switzerland, Recommendation ITU-R P.2040-3, Aug. 2023. [Online]. Available: https://www.itu.int/rec/R-REC-P.2040/en
[14] J. Hoydis et al., “Sionna RT: Differentiable ray tracing for radio propagation modeling,” in Proc. IEEE Globecom Workshops (GC Wkshps), Kuala Lumpur, Malaysia, Dec. 2023, pp. 317–321.
[15] Q.-Y. Zhou, J. Park, and V. Koltun, “Open3D: A modern library for 3D data processing,” arXiv:1801.09847, 2018.
[16] M. Li et al., “Keypoint detection empowered near-field user localization and channel reconstruction,” IEEE Trans. Wireless Commun., vol. 24, no. 7, pp. 5664–5677, Jul. 2025.
[17] M. Li et al., “Structure-aware multimodal LLM framework for trustworthy near-field beam prediction,” arXiv preprint arXiv:2603.16143, 2026. [Online]. Available:
[18] C. Wu et al., “Two-stage hierarchical beam training for near-field communications,” IEEE Trans. Veh. Technol., vol. 73, no. 2, pp. 2032–2044, Feb. 2024.