Data-Driven Control via Conditional Mean Embeddings: Formal Guarantees via Uncertain MDP Abstraction
Controlling stochastic systems with unknown dynamics under complex specifications is especially challenging in safety-critical settings, where performance guarantees are essential. We propose a data-driven policy synthesis framework that yields formal performance guarantees for such systems using conditional mean embeddings (CMEs) and uncertain Markov decision processes (UMDPs). From trajectory data, we learn the system’s transition kernel as a CME, then construct a finite-state UMDP abstraction whose transition uncertainties capture learning and discretization errors. Next, we generate a policy with formal performance bounds through robust dynamic programming. We demonstrate and empirically validate our method on a temperature regulation benchmark.
💡 Research Summary
The paper introduces a novel data-driven control synthesis framework that provides formal performance guarantees for stochastic systems with unknown dynamics, by combining conditional mean embeddings (CMEs) with uncertain Markov decision process (UMDP) abstractions. The authors start from trajectory data consisting of state-control-next-state triples for each control input. Using a positive-semidefinite kernel (typically Gaussian), they embed the unknown transition kernel into a reproducing kernel Hilbert space (RKHS) and obtain an empirical CME estimator $\hat\mu_u(x)$ via kernel ridge regression. A key theoretical contribution is a finite-sample concentration inequality: for any confidence level $\delta$ and error tolerance $\epsilon$, if the number of samples $N$ exceeds a computable bound $N_{\epsilon,\delta}$, then $\|\mu_u-\hat\mu_u\|_{\mathcal{H}_K}\le\epsilon$ holds with probability at least $1-\delta$. The bound is made explicit under the assumption that the underlying dynamics are Lipschitz and the kernel is Gaussian, linking the error to the kernel bandwidth, Lipschitz constant, and regularization parameter.
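The empirical CME estimator can be sketched as a standard kernel ridge regression: the weights $w(x)$ below let one estimate $\mathbb{E}[g(x')\,|\,x]$ as a weighted sum over observed next states. This is a minimal illustration for a fixed control input on toy 1-D dynamics; the bandwidth and regularization values are illustrative choices, not values from the paper:

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=0.5):
    """Gaussian (RBF) kernel matrix between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def cme_weights(X, x_query, lam=1e-3, bandwidth=0.5):
    """Kernel-ridge weights w(x) so the empirical CME predicts
    E[g(x') | x] ~ w(x) @ g(X').  X: (N, d) sampled states, x_query: (d,)."""
    N = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    k_x = gaussian_kernel(X, x_query[None, :], bandwidth)[:, 0]
    # regularized kernel system (K + N*lam*I) w = k(X, x)
    return np.linalg.solve(K + N * lam * np.eye(N), k_x)

# toy data: 1-D dynamics x' = 0.8 x + noise, fixed control input
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
Xp = 0.8 * X + 0.05 * rng.standard_normal((200, 1))

w = cme_weights(X, np.array([0.5]))
est_mean = w @ Xp[:, 0]   # estimate of E[x' | x = 0.5], true value 0.4
```

The same weights can be reused with any RKHS function `g`, which is what makes the embedding convenient for the abstraction step that follows.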
Next, the continuous state space is discretized into a finite grid of cells. For each cell $s$ and action $a$, the empirical CME evaluated at the cell’s centroid provides a nominal transition probability vector $\hat p_{s,a}$. The authors then construct an ambiguity set $\Gamma_{s,a}$ that captures (i) the CME estimation error $\epsilon$, (ii) the discretization error proportional to the cell radius $\eta$ and the system’s Lipschitz constant, and (iii) any additional modeling uncertainties. This yields a finite-state UMDP where each transition is not a single probability distribution but a set of admissible distributions.
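The abstraction step can be sketched as follows: project the CME weights onto grid cells to get a nominal transition vector, then inflate it into per-cell probability intervals. The interval radius `eps` stands in for the combined estimation and discretization error; both the cell layout and `eps` here are illustrative assumptions, not the paper's construction in full detail:

```python
import numpy as np

def nominal_transitions(w, Xp, cells):
    """Project empirical CME weights onto grid cells:
    p_hat[j] ~ P(x' in cell j).  cells: list of 1-D intervals (lo, hi)."""
    p = np.array([w[(Xp[:, 0] >= lo) & (Xp[:, 0] < hi)].sum()
                  for lo, hi in cells])
    p = np.clip(p, 0.0, None)     # CME weights can be slightly negative
    return p / p.sum()            # normalize to a probability vector

def ambiguity_interval(p_hat, eps):
    """Interval ambiguity set around the nominal vector: each cell's
    probability may deviate by at most eps (estimation + discretization)."""
    lo = np.clip(p_hat - eps, 0.0, 1.0)
    hi = np.clip(p_hat + eps, 0.0, 1.0)
    return lo, hi

# toy demo: three CME weights, three next-state samples, three cells
w = np.array([0.5, 0.3, 0.2])
Xp = np.array([[0.1], [0.4], [0.9]])
cells = [(0.0, 0.25), (0.25, 0.5), (0.5, 1.0)]
p_hat = nominal_transitions(w, Xp, cells)
p_lo, p_hi = ambiguity_interval(p_hat, eps=0.1)
```

Interval sets of this form are a common concrete choice for $\Gamma_{s,a}$ because the robust Bellman backup over them admits a simple greedy solution.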
Policy synthesis proceeds via robust dynamic programming (RDP). The UMDP is interpreted as a two-player zero-sum game: the controller selects actions while an adversary selects the worst-case transition within each ambiguity set. By solving the robust Bellman equations, the algorithm computes a value function that lower-bounds the true reach-avoid probability for any admissible transition model. The resulting stationary policy $\pi$ is then lifted back to the original continuous system by mapping each continuous state to its containing cell and applying the cell’s action. The authors prove that for any initial state $x_0$, the true probability of satisfying the reach-avoid specification lies within a computable interval.
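For interval ambiguity sets, the adversary's inner problem in the robust Bellman backup has a well-known greedy solution: push as much probability mass as the intervals allow toward the lowest-value states. The sketch below runs robust value iteration on a hypothetical three-state UMDP (transient state, target, unsafe sink) with one action; the interval bounds are made-up numbers for illustration:

```python
import numpy as np

def worst_case_expectation(lo, hi, v):
    """Adversary's inner problem: choose p with lo <= p <= hi, sum(p) = 1,
    minimizing p @ v.  Greedy: fill lowest-value states first."""
    p = lo.copy()
    slack = 1.0 - p.sum()
    for j in np.argsort(v):              # states in ascending value order
        add = min(hi[j] - lo[j], slack)
        p[j] += add
        slack -= add
    return p @ v

def robust_reach_value(lo, hi, target, n_iter=50):
    """Robust value iteration lower-bounding the reach probability.
    lo/hi: (S, A, S) interval bounds; target: boolean mask over states."""
    S, A, _ = lo.shape
    V = target.astype(float)
    for _ in range(n_iter):
        Q = np.array([[worst_case_expectation(lo[s, a], hi[s, a], V)
                       for a in range(A)] for s in range(S)])
        V = np.where(target, 1.0, Q.max(axis=1))   # controller maximizes
    return V

# states: 0 = transient, 1 = target (absorbing), 2 = unsafe sink (absorbing)
lo = np.array([[[0.0, 0.3, 0.2]], [[0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]]])
hi = np.array([[[0.5, 0.7, 0.6]], [[0.0, 1.0, 0.0]], [[0.0, 0.0, 1.0]]])
target = np.array([False, True, False])
V = robust_reach_value(lo, hi, target)   # V[0] is the guaranteed lower bound
```

In this toy instance the fixed point satisfies $V_0 = 0.1\,V_0 + 0.3$, so the guaranteed reach probability from state 0 is $1/3$ under the worst admissible transition model.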