Online Coreset Selection for Learning Dynamic Systems


With the increasing availability of streaming data in dynamic systems, a critical challenge in data-driven modeling for control is how to efficiently select informative data to characterize system dynamics. In this work, we develop an online coreset selection method for set-membership identification in the presence of process disturbances, improving data efficiency while preserving convergence guarantees. Specifically, we derive a stacked polyhedral representation that over-approximates the feasible parameter set. Based on this representation, we propose a geometric selection criterion that retains a data point only if it induces a sufficient contraction of the feasible set. Theoretically, the feasible-set volume is shown to converge to zero almost surely under persistently exciting data and a tight disturbance bound. When the disturbance bound is mismatched, an explicit Hausdorff-distance bound is derived to quantify the resulting identification error. In addition, an upper bound on the expected coreset size is established and extensions to nonlinear systems with linear-in-the-parameter structures and to bounded measurement noise are discussed. The effectiveness of the proposed method is demonstrated through comprehensive simulation studies.


💡 Research Summary

This paper addresses the problem of online identification of dynamic systems when data arrive continuously and the system is subject to bounded process disturbances. Classical set‑membership identification (SMI) characterizes the set of all parameters consistent with the observed data, but retaining every data point quickly becomes computationally prohibitive in streaming scenarios. The authors propose an online coreset selection framework that dramatically reduces the amount of stored data while preserving the convergence guarantees of SMI.
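To make the SMI setup concrete, here is a minimal sketch of how a single data pair constrains the parameters. It assumes a scalar-output linear model x_k = θᵀz_{k-1} + w_k with |w_k| ≤ W (the paper works with the general stacked multi-output form); the function name and variables are illustrative, not from the paper.

```python
import numpy as np

def halfspaces_from_datum(z_prev, x_next, W):
    """One data pair (z_{k-1}, x_k) under x_k = theta @ z_{k-1} + w_k,
    |w_k| <= W, yields two half-space constraints on theta, in the
    form A @ theta <= b:
        theta @ z_prev <= x_next + W
       -theta @ z_prev <= W - x_next
    Scalar-output case; the paper stacks such rows over all data."""
    z_prev = np.asarray(z_prev, dtype=float)
    A = np.vstack([z_prev, -z_prev])
    b = np.array([x_next + W, W - x_next])
    return A, b

# The true parameter is always feasible when W truly bounds the disturbance:
theta_true = np.array([0.5, -0.2])
z, w, W = np.array([1.0, 2.0]), 0.05, 0.1
x_next = theta_true @ z + w          # one noisy measurement
A, b = halfspaces_from_datum(z, x_next, W)
```

Intersecting these half-spaces over all retained data points gives the polyhedral feasible parameter set that the method maintains online.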

The key technical contribution is a “stacked polyhedral over‑approximation” of the feasible parameter set. By stacking row‑wise polyhedral constraints derived from each data point, the feasible set is represented as a single high‑dimensional polyhedron Θ̂ that can be updated efficiently. When a new data pair (zₖ₋₁, xₖ) arrives, the algorithm evaluates how much the volume of Θ̂ would shrink if the new constraint were added. This evaluation relies on a generalized Grünbaum inequality, which provides a lower bound on the ratio of the volumes before and after adding a constraint. If the contraction ratio falls below a user‑defined threshold α₀ (0 < α₀ < 1), the data point is deemed informative and is retained in the coreset; otherwise it is discarded.
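The selection rule can be mimicked numerically. The paper evaluates contraction via a generalized Grünbaum lower bound rather than by computing volumes; the sketch below substitutes a plain Monte Carlo volume estimate over a 2-D parameter box — a hypothetical stand-in to illustrate the decision logic, not the paper's actual test.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_volume(A, b, box_lo, box_hi, n=200_000):
    """Monte Carlo estimate of vol({theta : A @ theta <= b})
    restricted to the axis-aligned box [box_lo, box_hi]^d."""
    d = A.shape[1]
    pts = rng.uniform(box_lo, box_hi, size=(n, d))
    inside = np.all(pts @ A.T <= b, axis=1)
    return inside.mean() * (box_hi - box_lo) ** d

def keep_datum(A, b, a_new, b_new, alpha0, box=(-2.0, 2.0)):
    """Selection rule sketch: retain the datum only if appending its
    constraint contracts the feasible-set volume below alpha0 times
    the current volume (0 < alpha0 < 1)."""
    v_old = mc_volume(A, b, *box)
    v_new = mc_volume(np.vstack([A, a_new]), np.hstack([b, b_new]), *box)
    return v_new < alpha0 * v_old

# Current feasible set: the square [-1, 1]^2 (volume 4).
A = np.array([[1.0, 0], [-1, 0], [0, 1], [0, -1]])
b = np.ones(4)
# A halving cut (theta_1 <= 0) contracts the volume to ~0.5x -> retained:
informative = keep_datum(A, b, np.array([1.0, 0.0]), 0.0, alpha0=0.7)
# A loose cut (theta_1 <= 1.5) barely contracts -> discarded:
redundant = keep_datum(A, b, np.array([1.0, 0.0]), 1.5, alpha0=0.7)
```

Replacing the Monte Carlo estimate with the Grünbaum-type bound avoids volume computation entirely, which is what makes the criterion cheap enough for streaming data.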

Theoretical analysis proceeds in three stages. First, under persistently exciting inputs and a correctly specified disturbance bound W, the volume of the coreset‑based polyhedron converges to zero almost surely (Theorem 1). The proof uses martingale convergence arguments and the Borel‑Cantelli lemma to establish “with probability one” convergence. Second, when the disturbance bound is over‑estimated, the authors derive an explicit Hausdorff‑distance bound between the coreset‑induced feasible set and the full‑data feasible set (Theorem 2). This bound scales linearly with the bound mismatch and with the threshold α₀, providing a clear trade‑off between robustness to model misspecification and data reduction. Third, they bound the expected cardinality of the coreset (Theorem 3). By linking the selection probability to the information content of each incoming datum, they show that the expected number of retained points grows at most logarithmically with the total number of samples, and that the constant factor can be tuned via α₀.

The framework is extended to (i) nonlinear systems that are linear in the unknown parameters (e.g., polynomial or rational models) and (ii) scenarios with bounded measurement noise. In both cases, analogous stacked polyhedral representations are constructed, and the same selection rule guarantees convergence (Corollaries 4 and 5).
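For extension (i), a short sketch of a linear-in-the-parameter regressor: the model below (a hypothetical scalar example, not one from the paper) is nonlinear in the state but linear in θ, so each datum still yields the same pair of half-space constraints, now in feature space.

```python
import numpy as np

def phi(x_prev, u_prev):
    """Hypothetical regressor for a scalar system
        x_k = th1*x_{k-1} + th2*sin(x_{k-1}) + th3*u_{k-1} + w_k,
    which is nonlinear in the state but linear in the parameters:
    with features phi, the model reads x_k = theta @ phi + w_k."""
    return np.array([x_prev, np.sin(x_prev), u_prev])

theta_true = np.array([0.9, 0.3, 0.5])
x_prev, u_prev, W = 0.4, 1.0, 0.1
x_next = theta_true @ phi(x_prev, u_prev) + 0.05  # disturbance within W
# The datum then induces x_next - W <= theta @ phi(x_prev, u_prev) <= x_next + W,
# exactly the half-space form used in the linear case.
```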

Algorithm 1 details the online procedure: (1) maintain the current stacked polyhedron; (2) compute the contraction ratio for each new datum; (3) add the datum if the ratio is below α₀; (4) periodically prune redundant constraints using the double‑description method to keep the polyhedron compact.
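Step (4)'s pruning can be approximated with a per-constraint LP redundancy test — a simpler stand-in for the double-description method, sketched here under the assumption that the polyhedron is stored as {θ : Aθ ≤ b}; the function name is illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def prune_redundant(A, b):
    """Drop constraints of {theta : A @ theta <= b} implied by the rest.
    Constraint i is redundant if maximizing A[i] @ theta subject to the
    remaining constraints still yields a value <= b[i]; if that LP is
    unbounded or exceeds b[i], the constraint is binding and is kept."""
    keep = []
    for i in range(A.shape[0]):
        others = [j for j in range(A.shape[0]) if j != i]
        res = linprog(-A[i], A_ub=A[others], b_ub=b[others],
                      bounds=[(None, None)] * A.shape[1])
        if res.status != 0 or -res.fun > b[i] + 1e-9:
            keep.append(i)
    return A[keep], b[keep]

# Unit square [-1, 1]^2 plus one redundant cut theta_1 <= 2:
A = np.array([[1.0, 0], [-1, 0], [0, 1], [0, -1], [1, 0]])
b = np.array([1.0, 1, 1, 1, 2])
A2, b2 = prune_redundant(A, b)  # the loose cut is removed
```

The double-description method used in the paper additionally maintains a vertex representation, which is more work per update but keeps both representations available for downstream control computations.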

Simulation studies validate the theory. In a 2‑dimensional linear system, the coreset size is reduced to roughly 10–15 % of the total data while the parameter estimation error remains indistinguishable from that obtained with all data. A 3‑joint robotic arm with a nonlinear dynamics model demonstrates similar reductions (≈15 % of data) and maintains tracking errors within 5 % of the full‑data benchmark. Experiments with over‑estimated disturbance bounds confirm the predicted Hausdorff‑distance behavior.

Overall, the paper delivers a principled, computationally tractable method for online data reduction in set‑membership system identification. By coupling a geometric contraction criterion with a stacked polyhedral representation, it achieves substantial storage and computational savings without sacrificing the almost‑sure convergence of the feasible set. The approach is broadly applicable to real‑time control, adaptive robotics, and any domain where streaming data and bounded uncertainties coexist. Future work may explore scalable implementations for very high‑dimensional systems and integration with controller synthesis that directly exploits the evolving feasible set.

