Stable Deep Reinforcement Learning via Isotropic Gaussian Representations
Deep reinforcement learning systems often suffer from unstable training dynamics due to non-stationarity, where learning objectives and data distributions evolve over time. We show that under non-stationary targets, isotropic Gaussian embeddings are provably advantageous. In particular, they induce stable tracking of time-varying targets for linear readouts, achieve maximal entropy under a fixed variance budget, and encourage balanced use of all representational dimensions, all of which enable agents to be more adaptive and stable. Building on this insight, we propose Sketched Isotropic Gaussian Regularization, which shapes representations toward an isotropic Gaussian distribution during training. We demonstrate empirically, across a variety of domains, that this simple and computationally inexpensive method improves performance under non-stationarity while reducing representation collapse, neuron dormancy, and training instability.
💡 Research Summary
Deep reinforcement learning (DRL) suffers from intrinsic non‑stationarity: as the policy improves, both the data distribution and the bootstrapped learning targets drift continuously. This drift often leads to representation collapse, neuron dormancy, and unstable gradients, limiting performance and scalability. While prior work has tackled instability through algorithmic tricks, auxiliary losses, or architectural heuristics, the geometric structure of the learned representations has received far less attention.
The authors address this gap by asking: what statistical properties should a representation have to remain stable under continuously evolving targets? They argue that isotropic Gaussian embeddings are optimal for this setting. The core theoretical contribution is a tracking-error analysis for a linear critic \(Q_\theta(s,a)=w^\top\phi(s,a)\) with a time-varying TD target \(y_t\). Defining the feature covariance \(\Sigma_\phi(t)=\mathbb{E}[\phi_t\phi_t^\top]\), the analysis shows that, for a fixed variance budget, the readout's tracking error under a drifting target is minimized when \(\Sigma_\phi\) is isotropic, i.e., when all representational dimensions carry equal variance.
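The summary does not give the regularizer's exact form, but a plausible sketch of an isotropic-Gaussian-shaping penalty follows directly from the stated goal: push a batch of embeddings toward zero mean and identity covariance, optionally through a random projection ("sketch") to avoid the full \(d \times d\) covariance cost. The function name `sigr_loss`, the sketch dimension, and the Frobenius-norm penalty below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sigr_loss(phi, sketch_dim=None, rng=None):
    """Hypothetical isotropic-Gaussian regularizer (illustrative sketch).

    Penalizes deviation of a batch of embeddings `phi` (shape: batch x d)
    from a zero-mean isotropic Gaussian: mean -> 0, covariance -> I.
    If `sketch_dim` is given, a Gaussian random projection reduces the
    feature dimension before the covariance penalty is computed.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    b, d = phi.shape
    if sketch_dim is not None and sketch_dim < d:
        # Johnson-Lindenstrauss-style Gaussian sketch: d -> sketch_dim,
        # scaled so inner products are preserved in expectation.
        S = rng.normal(0.0, 1.0 / np.sqrt(sketch_dim), size=(d, sketch_dim))
        phi = phi @ S
        d = sketch_dim
    mu = phi.mean(axis=0)
    centered = phi - mu
    cov = centered.T @ centered / max(b - 1, 1)
    # Frobenius distance of the covariance from identity, plus a mean penalty.
    return float(np.linalg.norm(cov - np.eye(d), "fro") ** 2 + mu @ mu)
```

In practice such a term would be added to the TD loss with a small weight; a well-spread batch of embeddings incurs a much smaller penalty than a collapsed one, which is the failure mode the paper targets.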