Robust Semantic Transmission for Low-Altitude UAVs: Predictive Channel-Aware Scheduling and Generative Reconstruction
Unmanned aerial vehicle (UAV) downlink transmission facilitates critical time-sensitive visual applications but is fundamentally constrained by bandwidth scarcity and dynamic channel impairments. The rapid fluctuation of the air-to-ground (A2G) link creates a regime where reliable transmission slots are intermittent and future channel quality can only be predicted with uncertainty. Conventional deep joint source-channel coding (DeepJSCC) methods transmit coupled feature streams, causing global reconstruction failure when specific time slots experience deep fading. Decoupling semantic content into a deterministic structure component and a stochastic texture component enables differentiated error protection strategies aligned with channel reliability. A predictive transmission framework is developed that utilizes a split-stream variational codec and a channel-aware scheduler to prioritize the delivery of structural layout over reliable slots. Experimental evaluations indicate that this approach achieves a 5.6 dB gain in peak signal-to-noise ratio (PSNR) over single-stream baselines and maintains structural fidelity under significant prediction mismatch.
💡 Research Summary
The paper tackles the problem of delivering high‑quality visual data from a low‑altitude unmanned aerial vehicle (UAV) to a ground user over a highly dynamic air‑to‑ground (A2G) channel. In such environments, line‑of‑sight (LOS) and non‑LOS (NLOS) transitions, rapid distance changes, and correlated shadowing cause the instantaneous signal‑to‑noise ratio (SNR) to fluctuate dramatically from slot to slot. Conventional deep joint source‑channel coding (DeepJSCC) approaches treat the learned latent representation as a single homogeneous stream and adapt only to the instantaneous channel state information (CSI). Consequently, when a deep fade occurs, both structural information (edges, object outlines) and fine‑grained texture degrade simultaneously, leading to catastrophic semantic loss.
To overcome this limitation, the authors propose a predictive, channel‑aware semantic transmission framework that (1) explicitly disentangles an image into a deterministic structure component and a stochastic texture component, (2) predicts future SNR values over a finite horizon using a neural predictor that ingests past SNR measurements and the UAV’s planned trajectory, and (3) schedules the transmission of the two components according to the predicted reliability of each time slot.
Core Technical Contributions
**Structure‑Texture Variational Auto‑Encoder (ST‑VAE)**
- A shared encoder extracts a 256 × 16 × 16 feature map from the input image.
- The structure branch produces a deterministic latent map (z_s) (128 × 16 × 16) that captures edges, shapes, and object layout. This branch is treated as a single global block, ensuring that the most semantically critical information is transmitted in a compact form.
- The texture branch models the residual details with a diagonal Gaussian posterior (q_\phi(z_t|x)). During training, the re‑parameterization trick enables back‑propagation.
- A conditional prior (p_\psi(z_t|z_s)) is learned so that, at the receiver, missing texture blocks can be sampled conditioned on the already received structure. This prior is a lightweight convolutional network that outputs mean and variance parameters.
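The texture branch's reparameterization and the receiver-side fallback sampling from the conditional prior can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the `prior_params` function stands in for the lightweight convolutional prior network, and the texture latent is assumed to share the 128 × 16 × 16 shape of `z_s`.

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, eps ~ N(0, I) (reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# Deterministic structure latent z_s (128 x 16 x 16 per the paper).
z_s = rng.standard_normal((128, 16, 16))

# Posterior parameters from the texture branch (placeholder values;
# the texture latent shape is an assumption here).
mu_t = np.zeros((128, 16, 16))
log_var_t = np.zeros((128, 16, 16))
z_t = reparameterize(mu_t, log_var_t, rng)  # training-time sample

def prior_params(z_s):
    """Stand-in for the learned conditional prior p_psi(z_t | z_s):
    maps structure to (mean, log-variance) of a diagonal Gaussian."""
    return 0.1 * z_s, np.full_like(z_s, -1.0)

# At the receiver, a texture block lost to a deep fade is resampled
# from the structure-conditioned prior instead of being zero-filled.
mu_p, log_var_p = prior_params(z_s)
z_t_filled = reparameterize(mu_p, log_var_p, rng)
```

Because the prior is conditioned on the already received structure, the sampled replacement texture stays consistent with the scene layout rather than being generic noise.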
**Predictive SNR Forecasting**
- A neural network (F_\varphi) receives three inputs: a history of measured SNRs, a history of UAV positions, and the planned future trajectory for the next (K) slots.
- It outputs a predicted SNR sequence (\hat\gamma_{k+1},\dots,\hat\gamma_{k+K}). These predictions are used solely for resource allocation; the actual transmission still experiences the realized SNR (\gamma_k).
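The predictor's interface can be illustrated with a toy stand-in for (F_\varphi). The function below is purely hypothetical: it replaces the neural network with a naive free-space path-loss extrapolation (SNR drops by 20 log₁₀ of the distance ratio relative to the last measurement), and all names and the ground-user position are assumptions.

```python
import numpy as np

def predict_snr_db(snr_hist_db, pos_hist, traj_future, user_pos=np.zeros(3)):
    """Toy stand-in for the neural predictor F_phi.

    Assumes free-space path loss: predicted SNR for each planned slot
    is the last measured SNR minus 20*log10(d_future / d_last).
    """
    d_last = np.linalg.norm(pos_hist[-1] - user_pos)
    d_future = np.linalg.norm(traj_future - user_pos, axis=1)
    return snr_hist_db[-1] - 20.0 * np.log10(d_future / d_last)

# History of measured SNRs (dB) and UAV positions, plus the planned
# trajectory for the next K = 3 slots (all values illustrative).
snr_hist = np.array([18.0, 17.5, 17.0])
pos_hist = np.array([[0, 0, 100], [5, 0, 100], [10, 0, 100]], dtype=float)
traj = np.array([[15, 0, 100], [20, 0, 100], [40, 0, 100]], dtype=float)

pred = predict_snr_db(snr_hist, pos_hist, traj)  # one value per future slot
```

The key property preserved here is the one the paper relies on: the output is a per-slot SNR forecast used only for scheduling, while the realized channel during transmission may differ.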
**Channel‑Aware Scheduler**
- Slots whose predicted SNR exceeds a minimum usability threshold (\gamma_{\text{min}}) form the set (\mathcal{K}_{\text{use}}).
- Each usable slot receives a weight (\bar c_k = \log_2(1+\hat\gamma_{\text{lin},k})), inspired by the AWGN capacity formula.
- The total per‑image sample budget (n_{\text{tot}}) is proportionally divided among usable slots: (\hat n_k = n_{\text{tot}} \frac{\bar c_k}{\sum_j \bar c_j} + \epsilon). After flooring to integers and distributing the remaining budget to slots with the largest fractional parts, the final per‑slot budgets (n_k) sum exactly to (n_{\text{tot}}). Outage slots receive zero budget.
- The scheduler first places the single structure block into the most reliable slots, then distributes the 16 texture blocks across the remaining budget.
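The proportional budget split with integer rounding can be sketched directly from the description above. This is a minimal NumPy version under stated assumptions (function and variable names are mine; (\gamma_{\text{min}}) is taken in dB): capacity-style weights are computed for usable slots, the budget is floored to integers, and the leftover samples go to the slots with the largest fractional parts so the totals sum exactly to (n_{\text{tot}}).

```python
import numpy as np

def allocate_budget(snr_pred_db, n_tot, gamma_min_db=0.0):
    """Split n_tot channel uses across slots in proportion to the
    predicted AWGN capacity log2(1 + gamma); outage slots get zero."""
    snr_pred_db = np.asarray(snr_pred_db, dtype=float)
    usable = snr_pred_db >= gamma_min_db          # the set K_use
    n = np.zeros(len(snr_pred_db), dtype=int)
    if not usable.any():
        return n

    gamma_lin = 10.0 ** (snr_pred_db[usable] / 10.0)
    c = np.log2(1.0 + gamma_lin)                  # capacity-inspired weights
    ideal = n_tot * c / c.sum()                   # real-valued shares

    base = np.floor(ideal).astype(int)            # floor to integers
    remainder = n_tot - base.sum()
    order = np.argsort(-(ideal - base))           # largest fractional parts first
    base[order[:remainder]] += 1                  # hand out leftover samples

    n[usable] = base
    return n

budgets = allocate_budget([12.0, 3.0, -5.0, 8.0], n_tot=100)
```

Largest-remainder rounding guarantees an exact budget match without ever shifting more than one sample per slot away from the ideal proportional share.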
**Loss Function and Training Strategy**
- Reconstruction loss combines pixel‑wise MSE and a perceptual loss (e.g., VGG‑based) with weights (\lambda_{\text{pix}}) and (\lambda_{\text{perc}}).
- A KL‑divergence term (L_{\text{KL},t}=D_{\text{KL}}\big(q_\phi(z_t|x)\,\|\,p_\psi(z_t|z_s)\big)) regularizes the texture posterior toward the structure‑conditioned prior, so that texture samples drawn from (p_\psi(z_t|z_s)) at the receiver remain consistent with what the encoder would have produced.
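Since both the posterior and the conditional prior are diagonal Gaussians, the KL term has a closed form. A small helper (names are mine) computed per latent map:

```python
import numpy as np

def kl_diag_gauss(mu_q, log_var_q, mu_p, log_var_p):
    """Closed-form KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal
    Gaussians, summed over all latent dimensions."""
    var_q = np.exp(log_var_q)
    var_p = np.exp(log_var_p)
    return 0.5 * np.sum(
        log_var_p - log_var_q
        + (var_q + (mu_q - mu_p) ** 2) / var_p
        - 1.0
    )

# Identical distributions give zero divergence.
zeros = np.zeros(4)
kl_same = kl_diag_gauss(zeros, zeros, zeros, zeros)

# Unit-variance Gaussians whose means differ by 1 in each of 4 dims:
# KL = 0.5 * sum((mu_q - mu_p)^2) = 2.0.
kl_shift = kl_diag_gauss(np.ones(4), zeros, zeros, zeros)
```

In the full objective this term would be weighted alongside the pixel and perceptual losses, in the usual (\beta)-VAE fashion.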