BAP-SRL: Bayesian Adaptive Priority Safe Reinforcement Learning for Vehicle Motion Planning at Mixed Traffic Intersections

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Navigating urban intersections, especially when interacting with heterogeneous traffic participants, presents a formidable challenge for autonomous vehicles (AVs). In such environments, safety risks arise simultaneously from multiple sources, each carrying distinct priority levels and sensitivities that necessitate differential protection preferences. While safe reinforcement learning (RL) offers a robust paradigm for constrained decision-making, existing methods typically model safety as a single constraint or employ static, heuristic weighting schemes for multiple constraints. These approaches often fail to address the dynamic nature of multi-source risks, leading to gradient cancellation that hampers learning and to suboptimal trade-offs in critical dilemma zones. To address this, we propose a Bayesian adaptive priority safe reinforcement learning (BAP-SRL) based motion planning framework. Unlike heuristic weighting schemes, BAP formulates constraint prioritization as a probabilistic inference task. By modeling historical optimization difficulty as a Bayesian prior and instantaneous risk evidence as a likelihood, BAP dynamically gates gradient updates using a Bayesian inference mechanism on latent constraint criticality. Extensive experiments demonstrate that our approach outperforms state-of-the-art baselines in handling interactions with stochastic, heterogeneous agents, achieving lower collision rates and smoother conflict resolution.


💡 Research Summary

This paper tackles the challenging problem of autonomous vehicle motion planning at busy urban intersections where multiple heterogeneous traffic participants (human‑driven cars, cyclists, pedestrians, etc.) can generate simultaneous safety risks. Traditional safe reinforcement learning (Safe RL) approaches either treat safety as a single constraint or assign fixed heuristic weights to several constraints. Such static schemes fail to capture the rapid, context‑dependent changes in risk levels and often suffer from gradient cancellation: opposing constraint gradients negate each other, leading to policy paralysis or unsafe oscillations.
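The gradient-cancellation failure mode described above can be illustrated with a small numerical sketch (the gradient values below are invented for illustration, not taken from the paper):

```python
import numpy as np

# Hypothetical constraint gradients in a 2-D policy-parameter space:
g_front = np.array([1.0, 0.2])    # "brake harder" to avoid a front collision
g_rear = np.array([-0.9, -0.1])   # "hold speed" to avoid a rear-end collision

# With static unit weights, the summed gradient nearly vanishes even though
# both individual safety signals are strong -- gradient cancellation.
summed = g_front + g_rear
print(np.linalg.norm(summed))     # ~0.14, far smaller than either term
print(np.linalg.norm(g_front))    # ~1.02
```

Because the summed update is near zero, the policy receives almost no corrective signal precisely when two constraints are simultaneously active, which is the dilemma-zone behavior the paper targets.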

The authors formalize the task as a Constrained Markov Decision Process (CMDP) with K cost functions Cₖ, each representing a distinct safety requirement (e.g., front‑collision avoidance, rear‑collision avoidance, pedestrian proximity). Using the standard Lagrangian relaxation, the objective becomes a minimax problem min_{λ≥0} max_π L(π, λ), where L(π, λ) = J_R(π) − ∑ₖ λₖ (J_Cₖ(π) − dₖ). In conventional primal‑dual methods, the Lagrange multipliers λₖ are updated episodically and applied uniformly as weights for the constraint gradients. This static weighting cannot react to sub‑second spikes in risk, and when gradients from different constraints point in opposite directions, the summed gradient may be near zero (gradient cancellation).
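A conventional primal-dual step for this Lagrangian might look like the following sketch (function name, learning rates, and argument layout are illustrative, not the paper's implementation):

```python
import numpy as np

def primal_dual_step(grad_J, grad_C, J_C, d, lam, lr_pi=1e-3, lr_lam=1e-2):
    """One primal-dual update for L(π,λ) = J_R(π) − Σₖ λₖ (J_Cₖ(π) − dₖ).

    grad_J : policy gradient of the reward objective J_R
    grad_C : list of gradients, one per constraint cost J_Cₖ
    J_C    : current constraint returns J_Cₖ(π)
    d      : constraint thresholds dₖ
    lam    : Lagrange multipliers λₖ (held fixed within an episode)
    """
    # Primal ascent on π: constraints enter as a fixed weighted sum,
    # which is exactly where opposing gradients can cancel.
    policy_grad = grad_J - sum(l * g for l, g in zip(lam, grad_C))
    theta_update = lr_pi * policy_grad
    # Dual update on λ: multipliers only grow when J_Cₖ exceeds dₖ,
    # and only at the slow, episodic timescale.
    lam_new = np.maximum(0.0, lam + lr_lam * (np.array(J_C) - np.array(d)))
    return theta_update, lam_new

# Toy example: two constraints with opposing gradients.
theta_update, lam_new = primal_dual_step(
    grad_J=np.array([1.0, 0.0]),
    grad_C=[np.array([0.5, 0.0]), np.array([-0.5, 0.0])],
    J_C=[1.2, 0.3], d=[1.0, 1.0], lam=np.array([1.0, 1.0]))
```

In the toy example the two constraint gradients cancel exactly, so the policy update reduces to the unconstrained reward gradient, and λ adjusts only slowly via the episodic violation signal.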

To overcome these limitations, the paper introduces Bayesian Adaptive Priority Safe Reinforcement Learning (BAP‑SRL). The key idea is to treat the importance of each constraint as a latent random variable wₖ. A Bayesian inference process computes a posterior weight wₖ at every time step by combining:

  • Prior – a distribution derived from historical optimization difficulty, i.e., the accumulated violation count and average cost for constraint k over past episodes. This captures how “hard” the constraint has been to satisfy.
  • Likelihood – an instantaneous risk evidence term obtained from the current cost critic’s output (e.g., the predicted immediate cost for constraint k).

Assuming Gaussian forms for both prior and likelihood, the posterior over wₖ can be obtained analytically as another Gaussian, which is then normalized to the interval [0, 1] so that the resulting weight can gate the corresponding constraint gradient in the policy update.
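The Gaussian conjugate update behind this step can be sketched as follows (the precision-weighted posterior is standard; the sigmoid normalization is an assumption for illustration, since the paper's exact normalization is not reproduced in this summary):

```python
import math

def posterior_weight(mu_prior, var_prior, risk_obs, var_obs):
    """Combine a Gaussian prior on constraint criticality (from historical
    optimization difficulty) with Gaussian risk evidence (from the cost
    critic). The posterior is Gaussian with a precision-weighted mean."""
    precision = 1.0 / var_prior + 1.0 / var_obs
    var_post = 1.0 / precision
    mu_post = var_post * (mu_prior / var_prior + risk_obs / var_obs)
    # Squash the posterior mean into (0, 1) to act as a priority weight
    # (hypothetical normalization choice).
    w = 1.0 / (1.0 + math.exp(-mu_post))
    return w, mu_post, var_post

# Low-variance (confident) risk evidence pulls the posterior toward it.
w, mu_post, var_post = posterior_weight(
    mu_prior=0.0, var_prior=1.0, risk_obs=2.0, var_obs=0.1)
```

With sharp instantaneous evidence (var_obs ≪ var_prior), the posterior mean moves close to the observed risk, so the constraint's gating weight rises within a single time step rather than waiting for an episodic multiplier update.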

