Score-based change point detection via tracking the best of infinitely many experts

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

We propose an algorithm for nonparametric online change point detection based on sequential score function estimation and the tracking-the-best-expert approach. The core of the procedure is a version of the fixed share forecaster tailored to the case of an infinite number of experts and quadratic loss functions. The algorithm shows promising results in numerical experiments on artificial and real-world data sets. Its performance is supported by rigorous high-probability bounds describing the behaviour of the test statistic in the pre-change and post-change regimes.


💡 Research Summary

The paper tackles the problem of online, non‑parametric change‑point detection by recasting it as a prediction‑with‑expert‑advice task. Observations X₁,…,X_T are assumed i.i.d. from a pre‑change density p up to an unknown time τ, after which they are drawn from a different density q. No parametric assumptions are placed on p or q; the only requirement is that their score functions ∇log p and ∇log q are smooth and can be approximated by a suitable function basis.

The authors introduce a continuous family of “experts” indexed by a d‑dimensional parameter θ ∈ ℝᵈ. Each expert corresponds to a parametric density p_θ. To evaluate an expert at round t they define a quadratic loss ℓₜ(θ) = ½ θᵀAₜθ − bₜᵀθ, where Aₜ and bₜ are constructed from the current observation Xₜ using Green’s first identity. This loss is an unbiased estimator of the Fisher divergence between the true density (p before τ, q after τ) and the candidate p_θ. Consequently, the expert that minimizes the expected loss before the change differs from the one that minimizes it afterwards, so a change in distribution manifests as a change in the best expert.
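
The summary does not reproduce the exact Green’s-identity construction of Aₜ and bₜ. As a hedged sketch, assuming the standard score-matching form for a linear-in-θ score model s_θ(x) = Σⱼ θⱼ ∇ψⱼ(x) with hypothetical basis functions ψⱼ (not taken from the paper), integration by parts yields exactly this quadratic shape:

```python
import numpy as np

# Assumption: score model s_θ(x) = Σ_j θ_j ∇ψ_j(x). Integrating the
# score-matching objective by parts (Green's first identity) then gives a
# per-observation loss that is quadratic in θ: ℓ_t(θ) = ½ θᵀA_tθ − b_tᵀθ.
def loss_terms(x, grad_psi, lap_psi):
    """A_t = Gram matrix of basis gradients at x; b_t = minus basis Laplacians.

    grad_psi(x): (d, n) array of gradients ∇ψ_j(x); lap_psi(x): (d,) Laplacians Δψ_j(x).
    """
    J = grad_psi(x)
    return J @ J.T, -lap_psi(x)

def quadratic_loss(theta, A, b):
    return 0.5 * theta @ A @ theta - b @ theta

# 1-D illustration with basis ψ₁(x) = x, ψ₂(x) = x²/2, so ∇ψ = (1, x), Δψ = (0, 1).
# For N(0, 1) data the minimizer of the expected loss is θ* = (0, −1),
# which recovers the true score s(x) = −x.
grad_psi = lambda x: np.array([[1.0], [x]])
lap_psi = lambda x: np.array([0.0, 1.0])
A, b = loss_terms(2.0, grad_psi, lap_psi)
```

Note that the loss depends on the data only through derivatives of the basis, so no knowledge of the normalizing constant of p_θ is needed.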

Two online forecasters are built on top of this loss structure:

  1. Exponentially Weighted Average (EWA) – with a Gaussian prior N(0,λ⁻¹I). Because the loss is quadratic, the posterior mean can be expressed in closed form as θ̂_t = (A₁:t + λ η_t I)⁻¹ b₁:t, where η_t is the learning rate. The cumulative loss of EWA, L^{EW}_{1:t}, serves as a baseline that tracks the best static expert.

  2. Fixed‑Share (FS) forecaster – a variant of the classic fixed‑share algorithm adapted to an infinite expert set. A special prior ρ over compound experts (sequences of experts) is defined by mixing a “stay” probability (1 − α) with a “switch” probability α. The resulting predictive distribution r_t can be updated recursively via a simple linear combination of the previous posterior and a fresh prior, avoiding the need to enumerate infinitely many compound experts.
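
As a hedged sketch of these two forecasters (not the paper’s exact recursion over compound experts), the snippet below shows the closed-form EWA mean and one round of the classic fixed-share update; a finite grid of experts stands in for the infinite set, and all names are illustrative:

```python
import numpy as np

def ewa_mean(A_sum, b_sum, lam, eta):
    """Closed-form EWA posterior mean θ̂_t = (A_{1:t} + λ η_t I)⁻¹ b_{1:t}
    for cumulative quadratic losses with Gaussian prior N(0, λ⁻¹I)."""
    return np.linalg.solve(A_sum + lam * eta * np.eye(b_sum.size), b_sum)

def fixed_share_step(w, losses, eta, alpha, prior):
    """One fixed-share round on a finite expert grid: exponential reweighting
    by the current losses ('stay'), then mixing a fraction α of the mass back
    toward the prior ('switch')."""
    w = w * np.exp(-eta * losses)
    w = w / w.sum()
    return (1.0 - alpha) * w + alpha * prior
```

The α-mixing keeps every expert’s weight bounded below by α times its prior mass, which is what allows the forecaster to re-concentrate quickly on a new best expert after a change.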

The test statistic is the difference of cumulative losses: S_t = L^{EW}_{1:t} − L^{FS}_{1:t}. Before the change (t < τ) the two forecasters behave similarly, so S_t remains close to zero. After the change, the FS forecaster quickly adapts to the new optimal expert, causing its cumulative loss to drop relative to EWA; consequently S_t grows approximately linearly with slope equal to the Fisher divergence Δ = F(p, q). Detection is declared when S_t exceeds a pre‑chosen threshold γ.
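
A minimal end-to-end sketch of this detection rule follows, under simplifying assumptions not taken from the paper: a finite expert grid, the plain squared loss ½(θ − Xₜ)² in place of the Fisher-divergence estimator, and illustrative constants η, α, γ. EWA is implemented as fixed share with α = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
tau, T = 200, 400
xs = np.concatenate([rng.normal(0.0, 1.0, tau),       # pre-change: N(0, 1)
                     rng.normal(3.0, 1.0, T - tau)])  # post-change: N(3, 1)

thetas = np.linspace(-5.0, 5.0, 41)                   # finite expert grid
eta, alpha, gamma = 0.1, 0.02, 25.0                   # illustrative constants
prior = np.full(thetas.size, 1.0 / thetas.size)
w_ewa, w_fs = prior.copy(), prior.copy()              # EWA = fixed share, α = 0

S, detected = 0.0, None
for t, x in enumerate(xs, start=1):
    # losses of the grid experts and of the two forecasters' point predictions
    losses = 0.5 * (thetas - x) ** 2
    S += 0.5 * (w_ewa @ thetas - x) ** 2 - 0.5 * (w_fs @ thetas - x) ** 2
    if detected is None and S > gamma:
        detected = t                                  # change declared at time t
    for w, a in ((w_ewa, 0.0), (w_fs, alpha)):        # in-place weight updates
        w *= np.exp(-eta * losses)
        w /= w.sum()
        w[:] = (1.0 - a) * w + a * prior
```

In this setup S_t hovers near zero before τ and starts growing roughly linearly after it, triggering detection shortly after S_t crosses γ.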

Theoretical contributions include high‑probability bounds for both regimes. In the pre‑change regime, with appropriate choices of η_t and α, |S_t| ≤ C√(t log t) holds with probability at least 1 − δ, guaranteeing a low false‑alarm rate. In the post‑change regime, S_t ≥ Δ(t − τ) − C√((t − τ) log(t − τ)) with the same confidence, which yields a detection delay of order O((log(1/δ))/Δ). Importantly, these results hold without sub‑Gaussian tail assumptions or explicit parametric forms for p and q.

Empirical evaluation covers synthetic scenarios (univariate mean shift, multivariate Gaussian mixtures, high‑dimensional Gaussian changes) and real‑world streams (network traffic, financial price series). The proposed FS‑EWA method consistently outperforms classical CUSUM, Page‑Hinkley, and recent online mirror‑descent based detectors in terms of both detection delay and false‑alarm probability. Moreover, because the loss is quadratic, the algorithm runs in O(d²) time per observation, making it scalable to dimensions where log‑loss based methods become computationally prohibitive.

In summary, the paper delivers a novel, theoretically sound, and practically efficient framework for online change‑point detection that leverages infinite‑expert fixed‑share learning and Fisher‑divergence based loss estimation. Future directions suggested include extending the approach to other Bregman divergences, handling multiple simultaneous change points, and integrating deep neural representations for the score functions.

