Bridging the Sim-to-real Gap: A Control Framework for Imitation Learning of Model Predictive Control
To address the computational challenges of Model Predictive Control (MPC), recent research has studied using imitation learning to approximate MPC with a computationally efficient Deep Neural Network (DNN). However, this introduces a common issue in learning-based control: the simulation-to-reality (sim-to-real) gap. Inspired by Robust Tube MPC, this study proposes a new control framework that addresses this issue from a control perspective. The framework ensures the DNN operates in the same environment as the source domain, closing the sim-to-real gap with high data-collection efficiency. Moreover, an input refinement governor is introduced to address the DNN's inability to adapt to variations in model parameters, enabling the system to satisfy MPC constraints more robustly under parameter variations. The proposed framework was validated through two case studies, cart-pole control and vehicle collision avoidance, which analyze its principles in detail and demonstrate its application to a vehicle control case.
💡 Research Summary
The paper addresses the well‑known challenge of the simulation‑to‑real (sim‑to‑real) gap that arises when deep neural networks (DNNs) are trained to imitate Model Predictive Control (MPC) using data collected in simulation. While imitation learning can reduce the computational burden of MPC by orders of magnitude, the resulting controller often fails in the real world because the state distribution encountered during deployment differs from the one seen during training. Existing remedies such as Domain Randomization (DR) attempt to broaden the source domain by injecting random disturbances, but they suffer from four major drawbacks: (1) difficulty in selecting appropriate randomization factors, (2) a steep increase in required training data, (3) overly conservative control policies, and (4) incomplete coverage of the true gap.
Inspired by Robust Tube MPC (R‑TMPC), the authors propose a novel control architecture that eliminates the need for extensive randomization. In R‑TMPC, the system dynamics are split into a nominal model and an error dynamics driven by disturbances. A nominal controller computes an optimal input for the nominal model, while an ancillary controller (often a linear feedback law) keeps the actual state inside a bounded “error tube” around the nominal trajectory. The tube size is derived from known bounds on disturbances and model uncertainties, and the original state/input constraints are tightened accordingly, guaranteeing recursive feasibility and robust constraint satisfaction.
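The nominal/ancillary split described above can be sketched in a few lines. The discrete-time dynamics, feedback gain, and disturbance bounds below are illustrative assumptions (a double-integrator toy system), not the paper's models:

```python
import numpy as np

# Hypothetical discrete-time double integrator, used only for illustration.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
K = np.array([[-5.0, -3.0]])  # assumed stabilizing ancillary feedback gain

def step(x, u, w):
    """Real plant: nominal dynamics plus an additive bounded disturbance w."""
    return A @ x + B @ u + w

def step_nominal(x_bar, v):
    """Nominal model: disturbance-free copy of the same dynamics."""
    return A @ x_bar + B @ v

def tube_mpc_input(v, x, x_bar):
    """R-TMPC control law: nominal input v plus ancillary feedback that
    keeps the error e = x - x_bar inside a bounded tube, since the error
    obeys e+ = (A + B K) e + w with (A + B K) stable and w bounded."""
    return v + K @ (x - x_bar)
```

Because the error dynamics depend only on the stable matrix `A + B K` and the bounded disturbance, the real trajectory stays within a computable distance of the nominal one, which is what justifies tightening the constraints by the tube size.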
The proposed framework adopts this structure but replaces the nominal controller with a DNN that has been trained to mimic MPC only on the nominal model. Consequently, the DNN receives the nominal state (\bar{x}) as its sole input and outputs a nominal control command (\pi_{\theta}(\bar{x})). The ancillary controller (\kappa(x,\bar{x})) then corrects the discrepancy between the real state (x) and the nominal state (\bar{x}). Because the DNN never sees the real state, its operational domain is automatically confined to the Nominal Model‑Based Domain (S_{nom}), eliminating exposure to untrained states and rendering DR unnecessary. Training therefore requires only demonstrations generated by MPC on the nominal model, dramatically reducing data collection effort.
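The key structural point, that the DNN's input distribution is confined to the nominal domain, follows from how the loop is wired: the nominal state propagates under the disturbance-free model regardless of what the real state does. A minimal sketch of one control step, with an illustrative linear policy standing in for the trained DNN and assumed toy dynamics:

```python
import numpy as np

# Illustrative toy dynamics and gain; not the paper's plant.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
K = np.array([[-5.0, -3.0]])  # assumed ancillary feedback gain

def pi_theta(x_bar):
    """Stand-in for the trained DNN: maps the *nominal* state to a nominal
    input. A simple linear policy plays that role in this sketch."""
    return np.array([-2.0, -1.5]) @ x_bar

def control_step(x, x_bar, w):
    v = pi_theta(x_bar)              # DNN sees only the nominal state
    u = v + (K @ (x - x_bar))[0]     # ancillary correction uses the real state
    x_next = A @ x + B.flatten() * u + w      # real plant with disturbance w
    x_bar_next = A @ x_bar + B.flatten() * v  # nominal model: no disturbance
    return x_next, x_bar_next
```

Note that `x_bar_next` depends only on `x_bar` and `v`, so every state the policy ever receives is generated by the nominal closed loop, i.e. it lies in S_{nom} by construction.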
To handle variations in plant parameters (e.g., mass, friction, aerodynamic drag) that the DNN alone cannot compensate for, the authors introduce an Input Refinement Governor. This add‑on refines the combined input (u = \pi_{\theta}(\bar{x}) + \kappa(x,\bar{x})) in real time to ensure that all MPC constraints (state limits, input saturation, safety margins) remain satisfied despite parameter changes. The governor operates without additional learning, leveraging the ancillary controller’s feedback to adjust the DNN output as needed.
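This summary does not spell out the governor's algorithm; one plausible projection-style reading, on a hypothetical scalar system x+ = a·x + b·u with made-up limits, is to saturate the combined input and then back it off until a one-step prediction satisfies the state constraint:

```python
U_MAX = 2.0   # input saturation limit (illustrative)
X_MAX = 1.5   # state constraint (illustrative)

def refine_input(u, x, a=1.0, b=0.5):
    """Toy projection-style governor for the scalar system x+ = a*x + b*u:
    clip u to the saturation limits, then shrink it toward zero while the
    one-step prediction would still violate the state constraint."""
    u_ref = max(-U_MAX, min(U_MAX, u))
    for _ in range(20):
        if abs(a * x + b * u_ref) <= X_MAX:
            break
        u_ref *= 0.5
    return u_ref
```

This captures the summary's two properties, no additional learning and real-time refinement of the combined input, though the paper's actual governor may use a different refinement rule.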
Theoretical contributions include a formal definition of the nominal model‑based domain, a proposition proving that the DNN’s target domain equals this nominal domain when embedded in the proposed architecture, and a loss formulation that minimizes the mean‑squared error between MPC’s nominal inputs and the DNN’s outputs over trajectories generated in (S_{nom}).
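The loss formulation above reduces to a standard mean-squared imitation objective over nominal-domain demonstrations. A sketch (a real implementation would use a DNN framework and minibatch gradient descent; the data layout here is an assumption):

```python
import numpy as np

def imitation_loss(policy, dataset):
    """Mean-squared imitation loss over nominal-domain demonstrations.
    `dataset` is a list of (x_bar, u_mpc) pairs produced by running MPC
    on the nominal model; `policy` maps a nominal state to an input."""
    errs = [(policy(x_bar) - u_mpc) ** 2 for x_bar, u_mpc in dataset]
    return float(np.mean(errs))
```

Minimizing this over trajectories drawn from S_{nom} is exactly what confines the trained policy's competence to the domain the architecture guarantees it will see.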
Experimental validation is performed on two benchmark problems:
- Cart‑Pole Swing‑Up – The proposed method achieves significantly lower overshoot and faster settling than a DR‑trained DNN while maintaining constraint satisfaction (e.g., pole‑angle limits). The ancillary controller successfully keeps the real pole trajectory inside the predicted error tube.
- Vehicle Collision‑Avoidance – A nonlinear vehicle model with varying road‑friction coefficients and wind disturbances is used. The DNN‑based nominal controller, combined with the ancillary controller and the input refinement governor, respects acceleration and steering‑angle constraints even when parameters shift by up to 30%. In contrast, a DR‑based baseline frequently violates constraints under the same conditions.
Key advantages highlighted by the authors are:
- Data Efficiency: No need for large randomized datasets; only nominal‑model demonstrations are required.
- Reduced Conservatism: Because the DNN imitates a less conservative MPC (the tube tightening is handled by the ancillary controller), performance is closer to the original MPC.
- Robustness to Parameter Changes: The input refinement governor enables constraint‑compliant operation without retraining.
The paper concludes by noting that while the current implementation assumes known bounds on disturbances and a reasonably accurate nominal model, future work will explore automated tube‑size computation for highly nonlinear systems, integration with adaptive or learning‑based ancillary controllers, and real‑hardware experiments to further confirm the approach’s practicality.