Federated Learning Enhanced by Feature Reconstruction for Semantic Communication Module Updates of Agents
Recent advancements in semantic communication have primarily focused on image transmission, where neural network-based joint source-channel coding modules play a central role. However, such systems often experience semantic communication errors due to mismatched knowledge bases between agents and performance degradation from outdated models, necessitating regular model updates. To address these challenges in vector quantization (VQ)-based image semantic communication systems, we propose FedSFR, a novel federated learning framework that incorporates semantic feature reconstruction (FR). FedSFR introduces an FR step at the parameter server and allows a subset of clients to transmit compact feature vectors in place of full local model updates, thereby improving training stability and communication efficiency. To enable effective FR learning, we design a loss function tailored for VQ-based image semantic communication and demonstrate its validity as a surrogate for image reconstruction error. We further establish a rigorous convergence analysis of FedSFR. Experimental results on two benchmark datasets validate the superiority of FedSFR over existing baselines, especially in capacity-constrained settings, confirming both its effectiveness and robustness.
💡 Research Summary
The paper tackles the problem of model staleness and knowledge‑base mismatch in vector‑quantization (VQ) based image semantic communication systems, which are increasingly important for task‑oriented 6G networks. Traditional federated learning (FL) approaches such as FedAvg require every client to upload its entire neural‑network‑based joint source‑channel coding (JSCC) model, leading to prohibitive uplink traffic, especially when the communication links are capacity‑constrained. Moreover, existing FL methods for semantic communication ignore the intrinsic auto‑encoder structure of JSCC models and the digital nature of VQ‑based transmission.
To address these gaps, the authors propose FedSFR, a novel FL framework that integrates a Feature Reconstruction (FR) step at the parameter server (PS) and allows clients to dynamically choose between two uplink strategies based on instantaneous channel quality. Clients with a good signal-to-noise ratio (SNR) compress and sparsify their local model updates, using top-S sparsification with error feedback, and send them to the PS. Clients with poor SNR instead transmit only the compact feature vectors produced by their locally updated JSCC encoders; these vectors are quantized by the shared VQ codebook into index sequences, drastically reducing the payload.
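The good-SNR uplink path can be illustrated with a minimal sketch of top-S sparsification with error feedback. This is a generic implementation of the technique, not the paper's code; the function name and toy sizes are illustrative.

```python
import numpy as np

def top_s_sparsify_with_feedback(update, residual, s):
    """Generic top-S sparsification with error feedback (illustrative sketch).

    Keeps the s largest-magnitude entries of (update + residual) for uplink
    transmission and stores the discarded mass locally as the next residual,
    so no gradient information is permanently lost."""
    corrected = update + residual               # re-inject previously dropped error
    idx = np.argsort(np.abs(corrected))[-s:]    # indices of the s largest entries
    sparse = np.zeros_like(corrected)
    sparse[idx] = corrected[idx]                # sparse update sent to the PS
    new_residual = corrected - sparse           # error kept for the next round
    return sparse, new_residual
```

Note the invariant `sparse + new_residual == update + residual`: the client transmits only `s` values per round, yet the accumulated error is eventually flushed in later rounds.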
At the PS, received feature indices are first mapped back to codewords, then passed through the shared decoder and re-encoded by the encoder, mirroring the auto-encoder pipeline, to reconstruct a refined feature representation. This FR process exploits the symmetry of the encoder-decoder pair to improve consistency between transmitted features and the original image, effectively acting as a regularizer for the global model update. The global model is then aggregated using a standard FedAvg-like rule, but the loss function now includes an FR-specific term.
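The server-side FR pass described above can be sketched as follows. The codebook, linear encoder/decoder stand-ins, and all dimensions here are hypothetical toy choices; the actual system uses the trained JSCC networks.

```python
import numpy as np

rng = np.random.default_rng(0)
CODEBOOK = rng.standard_normal((16, 8))     # toy: 16 codewords of dimension 8
W_DEC = rng.standard_normal((8, 8)) * 0.1   # linear stand-in for the JSCC decoder
W_ENC = rng.standard_normal((8, 8)) * 0.1   # linear stand-in for the JSCC encoder

def decoder(z):
    """Hypothetical decoder: feature -> image estimate (linear toy model)."""
    return z @ W_DEC

def encoder(x):
    """Hypothetical encoder: image -> feature (linear toy model)."""
    return x @ W_ENC

def feature_reconstruction(indices):
    """Server-side FR sketch: indices -> codewords -> decode -> re-encode.

    Re-encoding the decoded image exploits the auto-encoder symmetry to
    produce a refined feature used in the global update."""
    z_hat = CODEBOOK[indices]   # map received indices back to codewords
    x_hat = decoder(z_hat)      # decode to an image estimate
    return encoder(x_hat)       # re-encode into a refined feature
```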
The FR‑specific loss is carefully crafted for VQ‑based systems. In addition to the conventional mean‑squared error (MSE) and a Kullback‑Leibler (KL) regularizer that enforces uniform codeword usage, the authors add (i) an L2 distance between the reconstructed feature and the original encoder output, and (ii) a Lipschitz continuity regularizer that bounds the Jacobian variation of the encoder/decoder. The paper provides a first‑order Taylor expansion and Lipschitz analysis to show that minimizing this loss serves as a surrogate for minimizing the final image reconstruction error.
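A minimal numpy sketch of the loss described above is shown below. The weights `alpha` and `beta` are illustrative placeholders, and the Lipschitz (Jacobian) regularizer is omitted for brevity since it requires access to network derivatives; only the MSE, KL-to-uniform, and feature-L2 terms are computed.

```python
import numpy as np

def fr_loss(x, x_rec, z_orig, z_refined, code_usage, alpha=1.0, beta=0.1):
    """Sketch of the FR-specific loss (illustrative weights; Lipschitz term omitted).

    x, x_rec       : original and reconstructed images
    z_orig         : encoder output for x
    z_refined      : feature after server-side FR
    code_usage     : per-codeword usage counts over a batch"""
    mse = np.mean((x - x_rec) ** 2)                       # image reconstruction MSE
    p = code_usage / code_usage.sum()                     # empirical codeword distribution
    K = len(p)
    kl = np.sum(p * np.log(np.clip(p, 1e-12, None) * K))  # KL(p || uniform), enforces uniform usage
    feat = np.mean((z_refined - z_orig) ** 2)             # L2 feature-consistency term
    return mse + alpha * kl + beta * feat
```

With perfect reconstruction, consistent features, and uniform codeword usage, all three terms vanish, which matches the intuition that the loss upper-bounds (a surrogate of) the image reconstruction error.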
A rigorous convergence analysis is presented under standard FL assumptions (smooth, possibly non‑convex loss, bounded variance, unbiased stochastic gradients). By modeling the FR step as an additional stochastic gradient correction, the authors derive an upper bound on the expected decrease of the global objective per round. They prove that, with a diminishing stepsize η_t = O(1/√t) and appropriate compression ratios, FedSFR converges at the same O(1/√T) rate as conventional FL, while the extra FR error terms remain bounded.
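For intuition, nonconvex FL guarantees of this type typically take a form like the following (an illustrative generic sketch, not the paper's exact bound; the constants $C_1, \dots, C_4$ and error terms are placeholders):

```latex
\min_{t \in \{0,\dots,T-1\}} \mathbb{E}\,\bigl\|\nabla F(\theta_t)\bigr\|^2
\;\le\; \frac{C_1\bigl(F(\theta_0) - F^\star\bigr)}{\sqrt{T}}
\;+\; \frac{C_2\,\sigma^2}{\sqrt{T}}
\;+\; C_3\,\epsilon_{\mathrm{comp}}
\;+\; C_4\,\epsilon_{\mathrm{FR}},
```

where $\sigma^2$ bounds the stochastic-gradient variance and $\epsilon_{\mathrm{comp}}$, $\epsilon_{\mathrm{FR}}$ collect the compression and FR error contributions; the first two terms recover the standard $O(1/\sqrt{T})$ rate cited in the summary.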
Experimental validation is carried out on two benchmark datasets: CIFAR-10 (low-resolution) and DIV2K (high-resolution). The authors simulate heterogeneous (non-IID) data across 20 clients and impose uplink bandwidth limits ranging from 0.5 to 1.0 Mbps. Compared with baselines (plain FedAvg, FedAvg with knowledge distillation, and sparsification-only schemes), FedSFR achieves an average PSNR gain of 1.8 dB and an SSIM improvement of 0.04 under the same bandwidth constraints. Moreover, the total transmitted bits are reduced by roughly 45% and convergence is attained in about 30% fewer global rounds. The performance advantage is most pronounced at low SNR (≤5 dB), where the majority of clients adopt the feature-vector transmission mode.
In summary, FedSFR introduces (1) a channel‑aware dual‑mode uplink strategy, (2) a server‑side feature reconstruction mechanism that leverages the auto‑encoder symmetry, (3) a VQ‑aware loss function with theoretical justification, and (4) a provable convergence guarantee. These contributions collectively enable efficient, stable, and robust model updates for VQ‑based image semantic communication in capacity‑limited 6G scenarios, opening avenues for future work on adaptive codebook learning, multi‑modal extensions, and real‑world protocol integration.