Reinforcement Learning-assisted Constraint Relaxation for Constrained Expensive Optimization
Constraint handling plays a key role in solving realistic complex optimization problems. Although intensively studied over the last few decades, existing constraint handling techniques predominantly rely on human experts' designs, which often fall short of generality. Motivated by recent progress in Meta-Black-Box Optimization, where automated algorithm design can be learned to boost optimization performance, in this paper we propose learning an effective, adaptive and generalizable constraint handling policy through reinforcement learning. Specifically, a tailored Markov Decision Process is first formulated, in which, given optimization-dynamics features, a deep Q-network-based policy controls the constraint relaxation level along the underlying optimization process. Such adaptive constraint handling provides a flexible tradeoff between objective-oriented exploitation and feasible-region-oriented exploration, and hence leads to promising optimization performance. We train our approach on the CEC 2017 Constrained Optimization benchmark under a limited evaluation budget (expensive cases) and compare the trained constraint handling policy to strong baselines such as recent winners of CEC/GECCO competitions. Extensive experimental results show that our approach performs competitively with or even surpasses the compared baselines under both leave-one-out cross-validation and ordinary train-test split validation. Further analysis and ablation studies reveal key insights into our designs.
💡 Research Summary
This paper introduces RLECEO (Reinforcement Learning‑assisted Constraint Relaxation for Constrained Expensive Optimization), a meta‑learning framework that automatically learns how to adjust ε‑relaxation levels during the solution of constrained expensive optimization problems (CEOPs) where the number of allowable function evaluations is severely limited. Traditional constraint‑handling techniques—fixed or hand‑crafted ε‑relaxation, penalty functions, or surrogate‑assisted methods—rely heavily on expert knowledge and often fail to generalize across diverse problem instances. Inspired by recent advances in Meta‑Black‑Box Optimization (Meta‑BBO), the authors formulate the optimization process as a Markov Decision Process (MDP) and train a Double Deep Q‑Network (DDQN) to control the relaxation parameter dynamically.
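Since the framework trains a Double Deep Q-Network, its core update is the standard Double DQN target, in which the online network selects the next action and the target network evaluates it. The helper below is a minimal sketch of that target computation (the function name and batch layout are assumptions, and terminal-state masking is omitted for brevity):

```python
import numpy as np

def double_dqn_targets(q_online_next, q_target_next, rewards, gamma=0.99):
    """Double DQN bootstrap target (sketch).

    The online net picks the greedy next action; the target net scores it.
    Decoupling selection from evaluation reduces Q-value overestimation.

    q_online_next, q_target_next: (batch, n_actions) Q-values at s_{t+1}
    rewards: (batch,) immediate rewards r_t
    """
    a_star = q_online_next.argmax(axis=1)                    # action selection (online net)
    q_eval = q_target_next[np.arange(len(a_star)), a_star]   # action evaluation (target net)
    return rewards + gamma * q_eval
```

In training, the DQN loss would regress the online network's Q(s_t, a_t) toward these targets over minibatches of sampled transitions.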
MDP Design
The state at generation t, S_t, is a 10‑dimensional vector composed of: (1) four landscape‑analysis features (standard deviation of normalized decision variables, normalized objective‑value spread, mean of normalized variables, mean of normalized objectives) that capture exploration‑exploitation balance; (2) a progress metric for the objective function; (3) two constraint‑focused metrics – average violation of the top‑5 best individuals and the proportion of feasible solutions; (4) the ratio of consumed evaluations to the total budget; (5) the previous action a_{t‑1}; and (6) a correlation term measuring the trade‑off between objective values and constraint violations within the current population. This rich representation enables the agent to recognize whether the search is still in an exploratory phase, approaching convergence, or struggling with infeasibility.
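The ten features listed above can be assembled from the current population's decision variables, objective values, and constraint violations. The sketch below is an illustrative implementation under stated assumptions: the exact normalizations and the progress-metric definition are not published in this summary, so the specific formulas here are guesses that only mirror the feature descriptions:

```python
import numpy as np

def extract_state(X, f, cv, t_used, budget, prev_action):
    """Assemble the 10-dim state S_t (sketch; normalizations are assumptions).

    X  : (N, D) decision variables, normalized to [0, 1]
    f  : (N,)  objective values
    cv : (N,)  total constraint violations (>= 0)
    """
    f_norm = (f - f.min()) / (f.max() - f.min() + 1e-12)
    # (1) landscape features: dispersion and central tendency of X and f
    s1, s2 = X.std(), f_norm.std()
    s3, s4 = X.mean(), f_norm.mean()
    # (2) objective-progress proxy: gap between best and mean objective
    s5 = (f.mean() - f.min()) / (abs(f.mean()) + 1e-12)
    # (3) constraint-focused metrics
    top5 = np.argsort(f)[:5]
    s6 = cv[top5].mean()        # average violation of the 5 best individuals
    s7 = (cv <= 0).mean()       # proportion of feasible solutions
    # (4) fraction of the evaluation budget already consumed
    s8 = t_used / budget
    # (5) previous action a_{t-1}
    s9 = prev_action
    # (6) objective/violation trade-off: Pearson correlation over the population
    s10 = np.corrcoef(f, cv)[0, 1] if f.std() > 0 and cv.std() > 0 else 0.0
    return np.array([s1, s2, s3, s4, s5, s6, s7, s8, s9, s10])
```

Because every component is normalized or bounded, the resulting vector is scale-free across problems, which is what lets a single trained policy transfer between benchmark instances.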
Action Space
The policy selects an action a_t from an 11‑point discrete set {0, 0.1, …, 1}. The ε‑relaxation for each constraint is computed as ε_t = (ε_base)^{a_t}·δ, where ε_base is a vector of average initial violations per constraint and δ=10^{-3} is a minimal floor. Small a_t values enforce strict feasibility (ε≈δ), while large a_t values allow the optimizer to ignore modest violations and focus on objective improvement. Because the actions are equally spaced in the exponent, the resulting ε values are uniformly spaced on a logarithmic scale, providing sufficient diversity for learning.
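The action-to-relaxation mapping above is direct to implement; the sketch below follows the stated formula ε_t = (ε_base)^{a_t}·δ (the function and variable names are assumptions, not the paper's code):

```python
import numpy as np

# The 11-point discrete action set {0, 0.1, ..., 1}
ACTIONS = np.round(np.linspace(0.0, 1.0, 11), 1)

def relaxation(a_t, eps_base, delta=1e-3):
    """Map action a_t to per-constraint eps-relaxation levels.

    Implements eps_t = eps_base**a_t * delta, as stated above:
    a_t = 0 gives the strict floor eps = delta for every constraint,
    while larger a_t scales eps by eps_base (avg. initial violation).
    """
    return np.power(eps_base, a_t) * delta
```

Under the ε-constrained comparison rule, a solution with total violation below ε_t is treated as feasible, so raising a_t temporarily widens the region the optimizer may exploit.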
Reward Structure
A novel composite reward combines objective improvement (r1) and constraint‑violation reduction (r2).
- r1(t) =
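The r1 formula is truncated above, so the sketch below is only a hypothetical instantiation of the composite idea (normalized objective improvement plus violation reduction); the weighting and normalizations are invented for illustration and are not the paper's reward:

```python
def composite_reward(f_best_prev, f_best, cv_best_prev, cv_best, w=0.5):
    """Hypothetical composite reward (NOT the paper's exact formula).

    r1 rewards relative improvement of the best objective value;
    r2 rewards relative reduction of the best solution's violation.
    """
    r1 = (f_best_prev - f_best) / (abs(f_best_prev) + 1e-12)
    r2 = (cv_best_prev - cv_best) / (cv_best_prev + 1e-12)
    return w * r1 + (1 - w) * r2
```

Any reward of this shape gives the agent positive feedback both for tightening toward feasibility and for pushing the objective down, matching the exploration/exploitation trade-off the relaxation level controls.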