CMR: Contractive Mapping Embeddings for Robust Humanoid Locomotion on Unstructured Terrains
Robust disturbance rejection remains a longstanding challenge in humanoid locomotion, particularly on unstructured terrains where sensing is unreliable and model mismatch is pronounced. While perception information, such as height maps, enhances terrain awareness, sensor noise and sim-to-real gaps can destabilize policies in practice. In this work, we provide a theoretical analysis that bounds the return gap under observation noise when the induced latent dynamics are contractive. Furthermore, we present the Contractive Mapping for Robustness (CMR) framework, which maps high-dimensional, disturbance-prone observations into a latent space where local perturbations are attenuated over time. Specifically, this approach couples contrastive representation learning with Lipschitz regularization to preserve task-relevant geometry while explicitly controlling sensitivity. Notably, the formulation can be incorporated into modern deep reinforcement learning pipelines as an auxiliary loss term with minimal additional engineering effort. Our extensive humanoid experiments further show that CMR outperforms other locomotion algorithms under increased noise.
💡 Research Summary
The paper introduces Contractive Mapping for Robustness (CMR), a novel framework designed to improve the disturbance rejection capabilities of humanoid robots operating on unstructured terrains where sensor noise and sim‑to‑real gaps are severe. The authors first provide a theoretical analysis of how observation noise affects the expected return of a policy. In Theorem 1 they show that, assuming the policy Jacobian is bounded by M, the return gap scales as O(H·L_r·L_f·M·δ_max), where H is the horizon, L_r and L_f are the Lipschitz constants of the reward and dynamics, and δ_max is the maximum observation perturbation. This bound reveals that when the dynamics are not contractive (L_f ≥ 1) the error can grow exponentially with horizon length, making long‑horizon locomotion fragile.
To overcome this, the authors propose mapping the high‑dimensional observations into a latent space via an embedding ϕ that satisfies a contraction property: ‖ϕ(sₜ₊₁) − ϕ(s′ₜ₊₁)‖ ≤ κ‖ϕ(sₜ) − ϕ(s′ₜ)‖ + εₜ with 0 < κ < 1. Theorem 2 proves that in such a contractive latent space the return gap is bounded by O(η/(1−κ)), where η measures the expected action discrepancy between the optimal and current policies. Crucially, this bound is independent of the horizon H, indicating that the contraction eliminates the exponential error amplification seen in Theorem 1.
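The horizon independence of the contractive case can be checked numerically. The sketch below is not taken from the paper: it replaces the norms with a scalar gap d_t, treats the contraction inequality as an equality with a constant disturbance ε, and uses hypothetical values (κ = 0.9, an expansive factor of 1.05, ε = 0.01). Under those assumptions, unrolling d_{t+1} = κ·d_t + ε gives a geometric series, so d_t ≤ κ^t·d_0 + ε/(1−κ) for every t, whereas a factor of 1 or more lets the gap keep growing.

```python
def rollout_gap(factor, eps, d0, steps):
    """Iterate d_{t+1} = factor * d_t + eps and return the whole trajectory."""
    gaps = [d0]
    for _ in range(steps):
        gaps.append(factor * gaps[-1] + eps)
    return gaps


def contraction_bound(kappa, eps, d0, t):
    """Geometric-series bound kappa^t * d0 + eps / (1 - kappa), for 0 < kappa < 1."""
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie strictly between 0 and 1")
    return kappa ** t * d0 + eps / (1.0 - kappa)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    d0, eps, steps = 1.0, 0.01, 200
    contractive = rollout_gap(0.9, eps, d0, steps)   # kappa < 1
    expansive = rollout_gap(1.05, eps, d0, steps)    # factor >= 1
    print("contractive gap after", steps, "steps:", contractive[-1])
    print("expansive gap after", steps, "steps:", expansive[-1])
    print("horizon-free ceiling eps / (1 - kappa):", eps / (1.0 - 0.9))
```

The contractive trajectory settles near ε/(1−κ) regardless of how many steps are simulated, which is the same structure as the O(η/(1−κ)) bound quoted for Theorem 2.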
Implementation-wise, CMR combines three loss components: (1) an InfoNCE contrastive loss (L_InfoNCE) that preserves semantic discrimination by pulling together temporally close states and pushing apart distant ones; (2) a Lipschitz regularization term (L_Lipschitz) that penalizes violations of the contraction condition, explicitly encouraging ‖ϕ(sₜ₊₁) − ϕ(s′ₜ₊₁)‖² ≤ κ²‖ϕ(sₜ) − ϕ(s′ₜ)‖²; and (3) the standard PPO objective (L_PPO) for policy optimization. The total objective is L_CMR = L_InfoNCE + λ·L_Lipschitz + L_PPO, where λ balances semantic fidelity against contraction strength.
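The two auxiliary terms can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the dot-product similarity, the temperature, the hinge form of the contraction penalty, and all function names are assumptions, and the PPO term is passed in as a precomputed number. Embeddings are lists of float vectors; z_t and z_t_pert stand for ϕ(sₜ) and ϕ(s′ₜ), and z_next and z_next_pert for their successors.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE over a batch: positives[i] matches anchors[i];
    every other positives[j] serves as a negative for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [dot(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[i]  # negative log-softmax of the true pair
    return total / len(anchors)


def contraction_penalty(z_t, z_t_pert, z_next, z_next_pert, kappa=0.9):
    """Assumed hinge form: mean of
    max(0, ||z_next - z_next_pert||^2 - kappa^2 * ||z_t - z_t_pert||^2)."""
    total = 0.0
    for a, b, c, d in zip(z_t, z_t_pert, z_next, z_next_pert):
        total += max(0.0, sq_dist(c, d) - kappa ** 2 * sq_dist(a, b))
    return total / len(z_t)


def cmr_objective(l_info_nce, l_lipschitz, l_ppo, lam=1.0):
    """L_CMR = L_InfoNCE + lam * L_Lipschitz + L_PPO, as stated in the summary."""
    return l_info_nce + lam * l_lipschitz + l_ppo
```

The hinge is zero whenever the squared contraction condition already holds, so under this assumed form the regularizer only acts on pairs that violate it. In a real training loop these terms would be written with an autodiff library so gradients reach the encoder.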
The observation space includes command velocities, proprioceptive signals (joint positions, velocities, base angular velocity), an egocentric height‑map (15 × 15 grid), and the previous action. Uniform noise scaled by a global factor α is added during training, primarily to the perception channel, to emulate real‑world sensor degradation. The authors evaluate CMR on six challenging terrain types (including stairs, stepping stones, and balance beams) under three noise regimes (observation, perception, and sim‑to‑real mismatches). Baselines comprise classic model‑based controllers (ZMP, LIP), recent Lipschitz‑constrained RL (LCP), and other learning‑based locomotion methods.
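The noise model described above can be sketched as follows. Only the 15 × 15 height map and the global factor α come from the text; the per-channel noise scales, the proprioceptive dimension, the choice to leave commands and the previous action clean, and every function name are hypothetical choices for illustration.

```python
import random

HEIGHT_MAP_SIDE = 15  # from the text: egocentric 15 x 15 grid


def add_uniform_noise(values, alpha, scale, rng):
    """Add noise drawn uniformly from [-alpha*scale, +alpha*scale] to each entry."""
    bound = alpha * scale
    return [v + rng.uniform(-bound, bound) for v in values]


def build_observation(command, proprio, height_map, prev_action, alpha, rng,
                      proprio_scale=0.01, height_scale=0.1):
    """Concatenate the observation channels, corrupting the sensed ones.

    proprio_scale and height_scale are assumed values; the larger height
    scale reflects noise being applied mainly to the perception channel.
    """
    if len(height_map) != HEIGHT_MAP_SIDE ** 2:
        raise ValueError("height_map must be a flattened 15 x 15 grid")
    noisy_proprio = add_uniform_noise(proprio, alpha, proprio_scale, rng)
    noisy_height = add_uniform_noise(height_map, alpha, height_scale, rng)
    # Commands and the previous action are known exactly, so they stay clean.
    return list(command) + noisy_proprio + noisy_height + list(prev_action)


if __name__ == "__main__":
    rng = random.Random(0)
    obs = build_observation(
        command=[0.5, 0.0, 0.0],          # hypothetical velocity command
        proprio=[0.0] * 6,                # hypothetical proprioceptive size
        height_map=[0.0] * HEIGHT_MAP_SIDE ** 2,
        prev_action=[0.0] * 4,            # hypothetical action size
        alpha=2.0,
        rng=rng,
    )
    print("observation length:", len(obs))
```

Raising alpha at evaluation time, with the scales held fixed, is one way to reproduce the "increased noise" setting without retraining.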
Results show that CMR consistently outperforms all baselines, achieving 15–30% higher average returns across noise levels and maintaining a stable gait on the most difficult terrains, where other methods fail. Ablation studies reveal that removing the contrastive term causes a loss of task‑relevant information, while omitting the Lipschitz regularizer reduces noise attenuation, confirming that both components are necessary. A “sim‑to‑sim” transfer experiment demonstrates a near‑zero performance drop when the trained policy is deployed in a different simulator, indicating strong generalization afforded by the contractive embedding.
The paper’s contributions are threefold: (1) introduction of a contractive embedding framework for humanoid locomotion; (2) rigorous theoretical bounds linking contraction strength κ to robustness against observation noise; (3) a practical training pipeline that integrates seamlessly with existing deep RL workflows. Limitations include sensitivity to the hyper‑parameters κ and λ (over‑contracting can discard essential environmental cues) and the current focus on height‑map perception, which leaves multi‑modal extensions (e.g., raw images, lidar) for future work. Overall, CMR offers a compelling blend of theory and practice, advancing the state of robust humanoid locomotion on noisy, unstructured terrains.