A variational approach to dimension-free self-normalized concentration
We study the self-normalized concentration of vector-valued stochastic processes. We focus on bounds for “sub-ψ” processes, a well-known and quite general class of processes that encompasses a wide variety of tail conditions (including sub-exponential, sub-Gaussian, sub-gamma, and sub-Poisson, as well as several heavy-tailed settings without a moment generating function, such as symmetric or bounded second or third moments). Our results recover and generalize the influential bound of de la Peña et al. [20] (later reproved by Abbasi-Yadkori et al. [2]) in the sub-Gaussian case. Further, we fill a gap in the literature between determinant-based bounds and more recent bounds based on condition numbers. As applications we prove a Bernstein inequality for random vectors satisfying a moment condition (a condition more general than boundedness), and also provide the first dimension-free self-normalized empirical Bernstein inequality. Our techniques are based on the variational (PAC-Bayes) approach to concentration.
💡 Research Summary
The paper tackles the problem of self‑normalized concentration for vector‑valued stochastic processes. Classical results, dating back to de la Peña and later popularized by Abbasi‑Yadkori et al., provide high‑probability bounds on the quantity ‖Sₜ‖_{Vₜ⁻¹} that depend on the log‑determinant of the accumulated variance matrix Vₜ, but these results were limited to sub‑Gaussian processes. More recent work by Whitehouse et al. extended the theory to the much broader sub‑ψ class, yet the resulting bounds involve the condition number κ(Vₜ) and the ambient dimension d, which can be problematic when Vₜ is ill‑conditioned or the dimension is large.
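For concreteness, the self‑normalized quantity throughout is the sum Sₜ measured in the norm induced by the inverse of Vₜ; this is the standard definition of the weighted norm, not something specific to this paper:

```latex
% The self-normalized norm: S_t is the vector-valued process,
% V_t the (typically regularized) accumulated variance matrix.
\|S_t\|_{V_t^{-1}} \;:=\; \sqrt{\,S_t^{\top} V_t^{-1}\, S_t\,}.
```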
The authors introduce a variational (PAC‑Bayes) framework to derive self‑normalized concentration inequalities that are both dimension‑free and determinant‑based for the full sub‑ψ family. The key technical device is a data‑dependent mixture distribution: by selecting a posterior distribution Q that depends on the observed (Sₜ,Vₜ) and paying a price measured by the KL‑divergence KL(Q‖P) (where P is a fixed prior, typically Gaussian), they obtain a martingale inequality that naturally yields a log‑determinant term. This approach can be viewed as a data‑dependent “method of mixtures” where the mixture’s parameters are optimized via a variational bound rather than fixed a priori.
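A minimal sketch of the variational step, assuming only the standard Donsker–Varadhan duality (the exact exponential process the paper applies it to may differ):

```latex
% Donsker--Varadhan: for any prior P, any posterior Q << P,
% and any measurable f,
\mathbb{E}_{\theta \sim Q}\bigl[f(\theta)\bigr]
  \;\le\; \mathrm{KL}(Q \,\|\, P)
  \;+\; \log \mathbb{E}_{\theta \sim P}\!\bigl[e^{f(\theta)}\bigr].
```

Choosing f(θ) roughly of the form λ⟨θ, Sₜ⟩ − ψ(λ)·θᵀVₜθ makes the right‑hand expectation a nonnegative supermartingale under the sub‑ψ condition, while Gaussian choices of P and Q render both the KL term and the left‑hand side computable in closed form; this is where the log‑determinant enters.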
The paper’s contributions are organized as follows:
- Recovery of the sub‑Gaussian bound (Theorem 3.1). Using the variational technique with ψ(λ) = λ²/2, the authors exactly reproduce the Abbasi‑Yadkori bound ‖S_τ‖_{V_τ⁻¹} ≤ √(2 log det V_τ + 2 log(1/δ)) for any stopping time τ, demonstrating that the new method subsumes the classical result. (A small numeric sanity check appears after this list.)
- General sub‑ψ line‑crossing inequality (Theorem 4.1). For any λ > 0 they prove a bound of the form
  ‖Sₜ‖_{Vₜ⁻¹} ≤ ψ*⁻¹((log det Vₜ + log(1/δ))/λ) + λ,
  where ψ* is the convex conjugate of ψ. Crucially, the bound contains no factor of the ambient dimension d. (A worked conjugate example follows this list.)
- Stitching across time (Theorems 4.4 and 4.9). By partitioning time into geometrically growing epochs and applying the line‑crossing inequality with a different λ in each epoch, the authors obtain a uniform‑in‑time bound that incurs only an iterated‑logarithm penalty (log log det Vₜ). For sub‑gamma processes they give explicit constants; for the broader sub‑ψ class they provide an asymptotic version.
- Self‑normalized Bennett and Bernstein inequalities (Section 5). When ψ corresponds to a Bernstein‑type condition (i.e., the process satisfies a variance‑plus‑bias bound rather than a strict sub‑Gaussian tail), the authors derive a Bennett‑type inequality that reduces to a Bernstein bound of the form
  ‖Sₜ‖_{Vₜ⁻¹} ≤ √(2σ² log det Vₜ + 2c log(1/δ)) + (…),
  where σ² and c are parameters governing the second and third moments. This extends prior work that required bounded observations.
- Self‑normalized empirical Bernstein inequality (Section 6). By replacing the true variance matrix with its empirical counterpart V̂ₜ (the sum of outer products of the observed vectors), they obtain a dimension‑free bound that mirrors the classical empirical Bernstein inequality but holds for the self‑normalized quantity. This is the first such result that is both determinant‑based and dimension‑free.
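To make the line‑crossing bound concrete, recall that ψ*(u) = sup_λ (λu − ψ(λ)). In the sub‑Gaussian case ψ(λ) = λ²/2, a standard computation (ours, not quoted from the paper) gives:

```latex
% Convex conjugate of psi(lambda) = lambda^2 / 2:
\psi^*(u) \;=\; \sup_{\lambda}\Bigl(\lambda u - \tfrac{\lambda^2}{2}\Bigr)
          \;=\; \tfrac{u^2}{2},
\qquad\text{hence}\qquad
(\psi^*)^{-1}(v) \;=\; \sqrt{2v}.
```

This explains the square‑root, log‑determinant shape of the sub‑Gaussian specialization in Theorem 3.1, up to the tuning of λ.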
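And a small Monte Carlo sanity check of the sub‑Gaussian bound from Theorem 3.1 as stated above; the construction below (isotropic Gaussian features with 1‑sub‑Gaussian noise and V₀ = I) is an illustrative choice of ours, not the paper's experimental setup:

```python
import numpy as np

# Empirically check the stated sub-Gaussian bound
#   ||S_t||_{V_t^{-1}} <= sqrt(2 log det V_t + 2 log(1/delta))
# at a fixed time t, over many independent runs.
rng = np.random.default_rng(0)
d, T, delta, n_trials = 5, 200, 0.05, 2000

violations = 0
for _ in range(n_trials):
    V = np.eye(d)               # regularized variance accumulator, V_0 = I
    S = np.zeros(d)             # running sum S_t
    for _ in range(T):
        v = rng.normal(size=d) / np.sqrt(d)   # predictable feature direction
        eps = rng.normal()                    # 1-sub-Gaussian noise
        S += eps * v
        V += np.outer(v, v)
    lhs = np.sqrt(S @ np.linalg.solve(V, S))  # ||S_t||_{V_t^{-1}}
    _, logdet = np.linalg.slogdet(V)
    rhs = np.sqrt(2.0 * logdet + 2.0 * np.log(1.0 / delta))
    violations += lhs > rhs

# The empirical violation rate should be well below delta = 0.05.
print(f"violation rate: {violations / n_trials:.4f}")
```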
The paper also includes detailed proofs, a discussion of the relationship between determinant‑based and condition‑number‑based bounds, and simulation studies confirming the tightness of the new inequalities. The authors argue that determinant‑based bounds are advantageous in settings where Vₜ is poorly conditioned, a common situation in contextual bandits, system identification, and high‑dimensional time‑series analysis.
In summary, the work demonstrates that the variational (PAC‑Bayes) approach, previously successful for non‑self‑normalized concentration, can be adapted to the self‑normalized setting to produce sharp, dimension‑free, determinant‑based concentration inequalities for the broad sub‑ψ class. This unifies and extends a line of research spanning scalar self‑normalized results, vector‑valued sub‑Gaussian bounds, and recent sub‑ψ developments, and opens the door to further applications in online learning, adaptive control, and high‑dimensional statistics where robust, data‑dependent confidence regions are essential.