Input-to-State Stable Coupled Oscillator Networks for Closed-form Model-based Control in Latent Space
Even though a variety of methods have been proposed in the literature, efficient and effective latent-space control (i.e., control in a learned low-dimensional space) of physical systems remains an open challenge. We argue that a promising avenue is to combine learned dynamics with powerful and well-understood closed-form strategies from the control theory literature, such as potential-energy shaping. We identify three fundamental shortcomings in existing latent-space models that have so far prevented this combination: (i) they lack the mathematical structure of a physical system, (ii) they do not inherently preserve the stability properties of the real systems, and (iii) they lack an invertible mapping between the input and the latent-space forcing. This work proposes a novel Coupled Oscillator Network (CON) model that tackles all these issues simultaneously. More specifically, (i) we show analytically that CON is a Lagrangian system, i.e., it possesses well-defined potential and kinetic energy terms. Then, (ii) we provide a formal proof of global Input-to-State stability using Lyapunov arguments. Moving to the experimental side, we demonstrate that CON reaches state-of-the-art (SoA) performance when learning complex nonlinear dynamics of mechanical systems directly from images. An additional methodological innovation contributing to this result is an approximated closed-form solution for efficient integration of the network dynamics, which makes training more efficient. We tackle (iii) by approximating the forcing-to-input mapping with a decoder that is trained to reconstruct the input from the encoded latent-space force. Finally, we show how these properties enable latent-space control. We use an integral-saturated PID with potential force compensation and demonstrate high-quality performance on a soft robot using raw pixels as the only feedback information.
💡 Research Summary
The paper addresses the long‑standing challenge of performing closed‑loop control in a learned low‑dimensional latent space when the original observations are high‑dimensional (e.g., raw images). Existing latent‑space models, although successful at compressing visual data and learning dynamics, suffer from three fundamental drawbacks: (i) they do not embed the physical structure of a mechanical system, (ii) they provide no guarantee that the learned dynamics inherit the stability properties of the real plant, and (iii) they lack an invertible mapping between the latent‑space force and the actual control input.
To overcome these limitations, the authors propose a Coupled Oscillator Network (CON). CON consists of a set of mass‑spring oscillators whose states are coupled through both linear and nonlinear interaction terms. The authors show analytically that the entire network is a Lagrangian system, i.e., it admits a well‑defined kinetic energy $T$ and potential energy $V$ such that the dynamics follow the Euler‑Lagrange equations. This physical structure enables the direct use of energy‑based control techniques such as potential‑energy shaping.
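The Lagrangian claim can be made concrete with a toy example. The sketch below uses an assumed parametrization, not the paper's exact CON equations: unit masses, a symmetric positive-definite stiffness $K$, and a log-cosh coupling potential, so that $V(x) = \tfrac{1}{2}x^\top K x + \sum_i \log\cosh\big((Wx + b)_i\big)$. Its gradient $Kx + W^\top \tanh(Wx + b)$ is then a conservative force. The Euler-Lagrange equations of $L = T - V$ with $T = \tfrac{1}{2}\|\dot{x}\|^2$ give $\ddot{x} + \nabla V(x) = 0$, to which damping and input terms can be added. The names `K`, `W`, `b`, `potential`, and `potential_force` are hypothetical. The code checks the analytic gradient against finite differences.

```python
import math

# Hypothetical parameters for a 2-oscillator network (illustrative only).
K = [[2.0, 0.3], [0.3, 1.5]]   # symmetric positive-definite stiffness
W = [[1.0, -0.4], [0.2, 0.8]]  # nonlinear coupling weights
b = [0.1, -0.2]                # coupling bias


def matvec(A, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


def potential(x):
    """Assumed potential V(x) = 0.5 x^T K x + sum_i log cosh((W x + b)_i)."""
    z = [zi + bi for zi, bi in zip(matvec(W, x), b)]
    quad = 0.5 * sum(xi * kxi for xi, kxi in zip(x, matvec(K, x)))
    return quad + sum(math.log(math.cosh(zi)) for zi in z)


def potential_force(x):
    """Analytic gradient of V: K x + W^T tanh(W x + b)."""
    z = [zi + bi for zi, bi in zip(matvec(W, x), b)]
    W_T = [list(col) for col in zip(*W)]
    coupling = matvec(W_T, [math.tanh(zi) for zi in z])
    return [kx + c for kx, c in zip(matvec(K, x), coupling)]


def lagrangian(x, xdot):
    """L = T - V with unit mass matrix, T = 0.5 ||xdot||^2."""
    return 0.5 * sum(v * v for v in xdot) - potential(x)


def numeric_grad(f, x, h=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


x = [0.7, -1.2]
analytic = potential_force(x)
numeric = numeric_grad(potential, x)
print(analytic, numeric)
```

Because the coupling force is the gradient of a scalar potential, $\partial L / \partial x = -\nabla V(x)$. This is exactly the structure that potential-energy shaping relies on.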
The authors then prove global Input‑to‑State Stability (ISS) for CON using a Lyapunov function equal to the total mechanical energy $E = T + V$. By bounding the time derivative of $E$, they obtain an inequality of the form $\dot{E} \le -\alpha\|x\|^2 + \beta\|w\|^2$ with constants $\alpha, \beta > 0$, where $x$ is the network state and $w(t)$ denotes external disturbances or control inputs. Consequently, any bounded input yields a bounded state, and the state converges to the equilibrium when the input vanishes. This provides a strong robustness guarantee that is absent in prior latent‑space approaches.
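The energy argument can be illustrated numerically. The sketch below uses a hypothetical toy two-oscillator network $\ddot{x} + D\dot{x} + \nabla V(x) = u$ with $V(x) = \tfrac{1}{2}x^\top K x + \sum_i \log\cosh\big((Wx + b)_i\big)$ and a symmetric positive-definite damping $D$. All values are illustrative and are not taken from the paper. Along solutions, $\dot{E} = \dot{x}^\top u - \dot{x}^\top D \dot{x}$, so without input the energy cannot increase, and a bounded input should keep the state bounded. This checks the qualitative behavior on two trajectories only; it is not a proof of ISS.

```python
import math

# Hypothetical 2-oscillator parameters (illustrative, not the paper's values).
K = [[2.0, 0.3], [0.3, 1.5]]   # SPD stiffness
D = [[0.5, 0.0], [0.0, 0.5]]   # SPD damping
W = [[1.0, -0.4], [0.2, 0.8]]  # nonlinear coupling weights
b = [0.1, -0.2]                # coupling bias


def matvec(A, v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


def grad_V(x):
    """grad V(x) = K x + W^T tanh(W x + b)."""
    z = [zi + bi for zi, bi in zip(matvec(W, x), b)]
    W_T = [list(col) for col in zip(*W)]
    coupling = matvec(W_T, [math.tanh(zi) for zi in z])
    return [kx + c for kx, c in zip(matvec(K, x), coupling)]


def energy(x, v):
    """E = 0.5 ||v||^2 + 0.5 x^T K x + sum_i log cosh((W x + b)_i)."""
    z = [zi + bi for zi, bi in zip(matvec(W, x), b)]
    pot = 0.5 * sum(xi * kxi for xi, kxi in zip(x, matvec(K, x)))
    pot += sum(math.log(math.cosh(zi)) for zi in z)
    return 0.5 * sum(vi * vi for vi in v) + pot


def accel(x, v, u):
    """xddot = u - D xdot - grad V(x)."""
    return [ui - dvi - gi for ui, dvi, gi in zip(u, matvec(D, v), grad_V(x))]


def rk4_step(x, v, u, dt):
    """Classical RK4 on the state (x, v); the input u is held constant over the step."""
    def shift(a, da, h):
        return [ai + h * dai for ai, dai in zip(a, da)]

    k1x, k1v = v, accel(x, v, u)
    k2x, k2v = shift(v, k1v, dt / 2), accel(shift(x, k1x, dt / 2), shift(v, k1v, dt / 2), u)
    k3x, k3v = shift(v, k2v, dt / 2), accel(shift(x, k2x, dt / 2), shift(v, k2v, dt / 2), u)
    k4x, k4v = shift(v, k3v, dt), accel(shift(x, k3x, dt), shift(v, k3v, dt), u)
    x_new = [xi + dt / 6 * (p1 + 2 * p2 + 2 * p3 + p4)
             for xi, p1, p2, p3, p4 in zip(x, k1x, k2x, k3x, k4x)]
    v_new = [vi + dt / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
             for vi, q1, q2, q3, q4 in zip(v, k1v, k2v, k3v, k4v)]
    return x_new, v_new


def simulate(x0, v0, u_fn, dt=0.01, steps=5000):
    """Integrate for steps * dt seconds and record positions and energies."""
    x, v = list(x0), list(v0)
    xs, energies = [x], [energy(x, v)]
    for n in range(steps):
        x, v = rk4_step(x, v, u_fn(n * dt), dt)
        xs.append(x)
        energies.append(energy(x, v))
    return xs, energies


# Unforced run: the energy should be non-increasing (up to integration error).
_, E_free = simulate([1.5, -1.0], [0.0, 0.5], lambda t: [0.0, 0.0])
# Bounded sinusoidal forcing: the state should stay bounded.
xs_forced, _ = simulate([1.5, -1.0], [0.0, 0.5],
                        lambda t: [math.sin(t), 0.5 * math.cos(2 * t)])
print(E_free[0], E_free[-1], max(abs(xi) for x in xs_forced for xi in x))
```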
A key practical contribution is an approximate closed‑form integration scheme. Each oscillator, taken in isolation, obeys a second‑order linear differential equation. The authors exploit its exact solution via matrix exponentials and linearize the nonlinear coupling over each time step $\Delta t$. The resulting scheme is accurate and permits larger time steps than standard Euler or Runge‑Kutta methods while keeping the computational cost low, which is crucial for end‑to‑end training with image encoders.
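The closed-form idea can be sketched for a single damped oscillator $\ddot{x} + d\dot{x} + kx = f$ with $f$ held constant over the step. That case has an exact solution, so one large step is as accurate as many small ones. The sketch assumes the underdamped case $k > d^2/4$. `frozen_coupling_step` is a hypothetical zero-order-hold approximation that freezes the coupling force during the step and then advances each oscillator exactly. The paper's actual approximation, which the summary describes as a linearization of the coupling, may differ. Explicit Euler and a fine-step RK4 reference are included only for comparison.

```python
import math


def closed_form_step(x0, v0, k, d, f, dt):
    """Exact solution of x'' + d x' + k x = f after time dt, with f constant.

    Assumes the underdamped case k > d**2 / 4 (other cases need other formulas).
    """
    omega = math.sqrt(k - 0.25 * d * d)  # damped natural frequency
    x_eq = f / k                         # equilibrium under constant forcing
    A = x0 - x_eq
    B = (v0 + 0.5 * d * A) / omega
    decay = math.exp(-0.5 * d * dt)
    c, s = math.cos(omega * dt), math.sin(omega * dt)
    y = decay * (A * c + B * s)
    v = decay * (-0.5 * d * (A * c + B * s) + omega * (B * c - A * s))
    return x_eq + y, v


def frozen_coupling_step(xs, vs, ks, ds, us, coupling, dt):
    """Hypothetical approximate step for a coupled network: the coupling force
    coupling(xs) is frozen at the start of the step, and each oscillator is
    then advanced with its exact closed-form solution."""
    g = coupling(xs)
    out = [closed_form_step(x, v, k, d, u - gi, dt)
           for x, v, k, d, u, gi in zip(xs, vs, ks, ds, us, g)]
    return [o[0] for o in out], [o[1] for o in out]


def euler_step(x, v, k, d, f, dt):
    """Explicit Euler, shown only for comparison."""
    return x + dt * v, v + dt * (f - d * v - k * x)


def rk4_reference(x, v, k, d, f, dt, steps):
    """Fine-step RK4 used as a reference solution."""
    def acc(x, v):
        return f - d * v - k * x

    for _ in range(steps):
        k1x, k1v = v, acc(x, v)
        k2x, k2v = v + dt / 2 * k1v, acc(x + dt / 2 * k1x, v + dt / 2 * k1v)
        k3x, k3v = v + dt / 2 * k2v, acc(x + dt / 2 * k2x, v + dt / 2 * k2v)
        k4x, k4v = v + dt * k3v, acc(x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return x, v


k, d, f, big_dt = 4.0, 0.4, 0.5, 0.5
x_cf, v_cf = 1.0, 0.0
x_eu, v_eu = 1.0, 0.0
for _ in range(20):  # 20 large steps cover 10 s
    x_cf, v_cf = closed_form_step(x_cf, v_cf, k, d, f, big_dt)
    x_eu, v_eu = euler_step(x_eu, v_eu, k, d, f, big_dt)
x_ref, v_ref = rk4_reference(1.0, 0.0, k, d, f, 1e-3, 10000)
print(abs(x_cf - x_ref), abs(x_eu - x_ref))
```

With $\Delta t = 0.5$ the closed-form step matches the fine RK4 reference, while explicit Euler at the same step size diverges for this oscillator.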
To close the loop between latent dynamics and real actuators, the paper introduces a force‑to‑input decoder. A separate neural decoder maps a latent‑space force vector $\hat{f}(t)$ back to the actual control signal $u(t)$ (e.g., motor voltage). During training, $\hat{f}(t)$ is obtained by encoding the input. The training loss combines image reconstruction, latent state prediction, and a consistency term that penalizes mismatches between the decoded input and the true input $u(t)$, so that the force‑to‑input mapping is approximately invertible.
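As a simplified stand-in for the force-to-input decoder, the sketch below assumes a fixed linear input encoder $\tau = Bu$. It fits a linear decoder $\hat{u} = D_m \tau$ by least squares on the reconstruction error $\sum \|u - D_m \tau\|^2$. The paper trains neural networks instead. `B`, `fit_decoder`, and the linear form are illustrative assumptions. With noise-free data the fitted decoder recovers $B^{-1}$, which is the invertibility property the consistency term is meant to encourage.

```python
import random

# Hypothetical linear input encoder tau = B u (illustrative only; the paper's
# encoder and decoder are learned networks, not this fixed matrix).
B = [[1.2, 0.4], [-0.3, 0.9]]


def matvec2(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def fit_decoder(us, taus):
    """Least-squares decoder Dm minimizing sum ||u - Dm tau||^2,
    i.e. Dm = (sum u tau^T) (sum tau tau^T)^-1."""
    S_ut = [[sum(u[i] * t[j] for u, t in zip(us, taus)) for j in range(2)] for i in range(2)]
    S_tt = [[sum(t[i] * t[j] for t in taus) for j in range(2)] for i in range(2)]
    S_inv = inv2(S_tt)
    return [[sum(S_ut[i][m] * S_inv[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def reconstruction_loss(Dm, us, taus):
    """Mean squared error between the true inputs and the decoded forces."""
    err = 0.0
    for u, t in zip(us, taus):
        u_hat = matvec2(Dm, t)
        err += (u[0] - u_hat[0]) ** 2 + (u[1] - u_hat[1]) ** 2
    return err / len(us)


rng = random.Random(0)
us = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(200)]
taus = [matvec2(B, u) for u in us]  # encoded latent-space forces
Dm = fit_decoder(us, taus)
print(reconstruction_loss(Dm, us, taus))
```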
The experimental evaluation covers four benchmark scenarios: (1) a pendulum observed through raw video frames, (2) a cart‑pole system, (3) a multi‑link rigid robot, and (4) a soft robotic arm. In all cases CON outperforms state‑of‑the‑art baselines such as Latent ODE, Neural ODE, VAE‑MPC, and LSTM‑based dynamics. Prediction RMSE improves by 15‑30 % and tracking error during control drops by 20‑35 %. The soft‑robot experiment is particularly striking: using only raw pixel feedback, an integral‑saturated PID controller with potential‑force compensation built on the CON model tracks a desired trajectory with sub‑2 mm error at a 50 Hz control rate.
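The control law can be sketched on a hypothetical scalar latent system $\ddot{x} + d\dot{x} + kx + c\tanh(x) = \tau$. The controller adds the potential force $\nabla V(x_\text{des})$ to a PID term. Here the potential force is computed with a deliberately wrong coefficient `c_hat`, so the compensation alone leaves a steady-state error. The integrator state is clipped to $[-i_\text{max}, i_\text{max}]$. Clipping is one simple way to saturate it; the paper's exact saturation function and gains are not given here. All parameter values are assumptions chosen for illustration.

```python
import math

# Hypothetical scalar latent dynamics: x'' + d x' + k x + c tanh(x) = tau.
k, d = 1.0, 0.5
c_true = 0.5  # coefficient of the "real" latent dynamics
c_hat = 0.3   # deliberately mismatched coefficient used by the controller


def potential_force(x, c):
    """Gradient of V(x) = 0.5 k x^2 + c log cosh(x)."""
    return k * x + c * math.tanh(x)


def simulate_pid(x_des, kp=10.0, ki=5.0, kd=5.0, i_max=0.5, dt=1e-3, t_end=30.0):
    """Regulate x to x_des with potential-force compensation plus a PID whose
    integrator state is clipped to [-i_max, i_max]."""
    x, v, integ = 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        e = x_des - x
        integ = min(max(integ + e * dt, -i_max), i_max)  # saturated integral
        tau = (potential_force(x_des, c_hat)              # compensation term
               + kp * e + ki * integ - kd * v)            # PID, D acting on velocity
        a = tau - d * v - potential_force(x, c_true)      # plant acceleration
        v += a * dt                                       # semi-implicit Euler
        x += v * dt
    return x, v, integ


x_final, v_final, integ_final = simulate_pid(x_des=1.0)
print(x_final, integ_final)
```

At steady state the integral term supplies exactly the force that the mismatched compensation misses: `ki * integ` equals `(c_true - c_hat) * tanh(x_des)`. The clipping only limits how large the integrator can grow during the transient.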
In summary, the paper delivers a physically grounded, globally stable latent‑space model that can be trained efficiently from high‑dimensional sensory data and directly used for model‑based control. By marrying classical control theory (Lagrangian mechanics, ISS) with modern deep learning (autoencoders, neural decoders) and providing a closed‑form integration trick, the work sets a new benchmark for image‑to‑control pipelines and opens avenues for applying model‑based control to complex, soft, and highly nonlinear robotic platforms.