Model-free Reinforcement Learning of Semantic Communication by Stochastic Policy Gradient

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv source.

Following the recent success of Machine Learning tools in wireless communications, the idea of semantic communication by Weaver from 1949 has gained attention. It breaks with Shannon’s classic design paradigm by aiming to transmit the meaning, i.e., semantics, of a message instead of its exact version, allowing for information rate savings. In this work, we apply the Stochastic Policy Gradient (SPG) to design a semantic communication system by reinforcement learning, separating transmitter and receiver, and not requiring a known or differentiable channel model – a crucial step towards deployment in practice. Further, we derive the use of SPG for both classic and semantic communication from the maximization of the mutual information between received and target variables. Numerical results show that our approach achieves comparable performance to a model-aware approach based on the reparametrization trick, albeit with a decreased convergence rate.


💡 Research Summary

The paper tackles the emerging problem of semantic communication, where the goal is to convey the meaning (semantics) of a message rather than its exact bitwise representation. Starting from an information‑theoretic formulation, the authors model the end‑to‑end system as a Markov chain z → s → x → y → ẑ. Here z is a hidden semantic variable, s is a semantic observation generated by a semantic channel p(s|z), x is the transmitted signal produced by an encoder pθ(x|s), y is the received signal after a physical channel p(y|x), and ẑ is the estimate obtained by a decoder qϕ(z|y). The design objective is to maximize the mutual information Iθ(z;y), which directly quantifies how much of the original meaning survives the transmission. This objective can also be expressed as an Information Bottleneck (IB) problem: maximize Iθ(z;y) while constraining the compression rate Iθ(s;y) (e.g., by fixing the number of transmit antennas).
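The Markov chain above can be made concrete with a toy simulation. The sketch below instantiates each stage with simple stand-in distributions (Gaussian noise for both channels, linear encoder and decoder); these choices are illustrative assumptions, not the paper's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instantiation of the Markov chain z -> s -> x -> y -> z_hat.
# All distributions below are illustrative stand-ins, not the paper's models.
def semantic_channel(z):
    # p(s|z): noisy observation of the hidden semantic variable
    return z + 0.1 * rng.standard_normal(z.shape)

def encoder(s, theta):
    # p_theta(x|s): a simple linear encoder (deterministic mean)
    return theta * s

def physical_channel(x, noise_std=0.5):
    # p(y|x): additive white Gaussian noise channel
    return x + noise_std * rng.standard_normal(x.shape)

def decoder(y, phi):
    # q_phi(z|y): linear estimate of z from y
    return phi * y

z = rng.standard_normal(1000)          # hidden semantics
s = semantic_channel(z)
x = encoder(s, theta=1.0)
y = physical_channel(x)
z_hat = decoder(y, phi=0.8)

# End-to-end fidelity proxy: correlation between z and z_hat.
# (High correlation means much of the "meaning" z survives transmission,
# in the spirit of maximizing I(z;y).)
print(round(float(np.corrcoef(z, z_hat)[0, 1]), 2))
```

The correlation between z and ẑ here is only a crude proxy for the mutual information Iθ(z;y), but it illustrates how noise in both the semantic and physical channels degrades how much of z is recoverable.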

Traditional model‑aware approaches solve the problem by re‑parameterizing the channel output y = μθ(s) + Σ^{1/2} n, which makes the loss differentiable and allows the use of standard back‑propagation. However, this requires a known, differentiable channel model—a restriction that prevents deployment in realistic wireless environments where the channel may be unknown, non‑linear, or time‑varying.
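The key property of the reparametrization trick is that, once the noise sample n is drawn and held fixed, y = μθ(s) + Σ^{1/2} n is an ordinary differentiable function of θ, so gradients flow through the channel by the chain rule. The following minimal sketch (scalar case, linear toy encoder μθ(s) = θ·s, all choices assumed for illustration) verifies the pathwise gradient against a finite difference.

```python
import numpy as np

rng = np.random.default_rng(1)

# Reparametrization (pathwise) gradient for a Gaussian channel output
# y = mu_theta(s) + sigma * n, with toy encoder mu_theta(s) = theta * s.
# Fixing the noise samples n makes y differentiable in theta.
s = rng.standard_normal(10000)
n = rng.standard_normal(10000)       # frozen channel noise samples
sigma, target = 0.5, 0.0

def loss_and_grad(theta):
    y = theta * s + sigma * n        # reparametrized channel output
    loss = np.mean((y - target) ** 2)
    grad = np.mean(2 * (y - target) * s)   # chain rule: dy/dtheta = s
    return loss, grad

theta = 1.0
_, g_pathwise = loss_and_grad(theta)

# Check against a central finite difference of the same fixed-noise loss
eps = 1e-4
lp, _ = loss_and_grad(theta + eps)
lm, _ = loss_and_grad(theta - eps)
g_fd = (lp - lm) / (2 * eps)
print(abs(g_pathwise - g_fd) < 1e-6)
```

This only works because the noise enters additively through a known model p(y|x); when the channel is a black box, no such differentiable path exists, which motivates the model-free approach below.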

To overcome this limitation, the authors propose a model‑free reinforcement‑learning (RL) framework based on the Stochastic Policy Gradient (SPG). They replace the deterministic encoder with a stochastic policy pθ(x|s) (e.g., a multivariate Gaussian with mean μθ(s) and exploration variance σ²_exp). Using the log‑trick, the gradient of the cross‑entropy loss with respect to θ becomes an expectation of the form E_{x∼pθ(x|s)}[ℓ(s, x) ∇θ log pθ(x|s)], which can be estimated purely from samples: the transmitter only needs scalar loss values fed back from the receiver, not a differentiable channel model.
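The score-function estimator can be sketched end to end: the channel and receiver loss are treated as a black box returning only scalar losses, and the encoder parameter is updated with sampled gradients. The Gaussian policy, linear encoder, target mapping, and hyperparameters below are all illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

# Model-free stochastic policy gradient for a Gaussian policy
# p_theta(x|s) = N(mu_theta(s), sigma_exp^2), toy encoder mu_theta(s) = theta * s.
# The "channel + receiver loss" is a black box: only scalar loss values
# are used, never its gradient.
def black_box_loss(x, s):
    y = x + 0.5 * rng.standard_normal(x.shape)   # unknown channel
    return (y - 2.0 * s) ** 2                    # per-sample receiver loss

def spg_gradient(theta, s, sigma_exp=0.3):
    mu = theta * s
    x = mu + sigma_exp * rng.standard_normal(s.shape)   # sample the policy
    ell = black_box_loss(x, s)
    # Score function of the Gaussian policy w.r.t. theta:
    # d/d theta log p_theta(x|s) = (x - mu) * s / sigma_exp^2
    score = (x - mu) * s / sigma_exp ** 2
    return np.mean(ell * score)        # Monte Carlo gradient estimate

# A few SGD steps drive theta toward the loss-minimizing value 2.0,
# without ever differentiating through the channel.
theta = 0.0
for _ in range(2000):
    s = rng.standard_normal(256)
    theta -= 0.05 * spg_gradient(theta, s)
print(abs(theta - 2.0) < 0.2)
```

The exploration variance σ²_exp plays the same role as in the paper's Gaussian policy: larger values explore more but inflate the variance of the gradient estimate, which is consistent with the slower convergence reported relative to the reparametrization-based, model-aware baseline.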

