Sparsified-Learning for High-Dimensional Heavy-Tailed Locally Stationary Time Series, Concentration and Oracle Inequalities

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Sparse learning is ubiquitous in many machine learning tasks. It aims to regularize the goodness-of-fit objective by adding a penalty term that encodes structural constraints on the model parameters. In this paper, we develop a flexible sparse learning framework tailored to high-dimensional heavy-tailed locally stationary time series (LSTS). The data-generating mechanism incorporates a regression function that changes smoothly over time and is observed under noise belonging to the class of sub-Weibull and regularly varying distributions. We introduce a sparsity-inducing penalized estimation procedure that combines additive modeling with kernel smoothing and define an additive kernel-smoothing hypothesis class. In the presence of locally stationary dynamics, we assume exponentially decaying $β$-mixing coefficients to derive concentration inequalities for kernel-weighted sums of locally stationary processes with heavy-tailed noise. We further establish nonasymptotic prediction-error bounds, yielding both slow and fast convergence rates under different sparsity structures, including Lasso and total variation penalization with the least-squares loss. To support our theoretical results, we conduct numerical experiments on simulated LSTS with sub-Weibull and Pareto noise, highlighting how tail behavior affects prediction error across different covariate dimensions as the sample size increases.


💡 Research Summary

This paper develops a comprehensive sparse learning framework for high‑dimensional locally stationary time series (LSTS) subject to heavy‑tailed innovations. The authors consider a non‑parametric regression model Y_{t,T} = m⋆(t/T, X_{t,T}) + ε_{t,T}, where the regression surface m⋆(·,·) varies smoothly over rescaled time u = t/T, the covariates X_{t,T} are d‑dimensional, and the noise ε_{t,T} follows either a sub‑Weibull distribution (with tail parameter η) or a regularly varying Pareto‑type distribution (with tail index α). The underlying stochastic process is assumed to be β‑mixing with exponentially decaying coefficients, which captures realistic temporal dependence in many applications such as finance, economics, and environmental science.
To handle the non‑stationarity, the authors employ kernel smoothing with a bandwidth h_T that shrinks as the sample size T grows. The localized weighted least‑squares loss is
L_T(θ) = Σₜ K_h(t/T − u) (Y_{t,T} − θᵀX_{t,T})²,
where K_h is a symmetric kernel. Sparsity is induced by a combination of an ℓ₁ (Lasso) penalty λ₁‖θ‖₁ and a weighted total‑variation (TV) penalty λ_TV TV(θ) = Σ_j w_j|θ_{j+1} − θ_j|. The resulting estimator solves
θ̂ = arg min_θ { L_T(θ) + λ₁‖θ‖₁ + λ_TV TV(θ) }.
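The penalized objective above can be written down directly as a NumPy function. The sketch below is a minimal illustration under stated assumptions: the kernel choice (Epanechnikov) and all function names are hypothetical, since the summary does not specify them.

```python
import numpy as np

def epanechnikov(z):
    """Symmetric Epanechnikov kernel supported on [-1, 1]."""
    return 0.75 * np.maximum(1.0 - z**2, 0.0)

def penalized_loss(theta, X, Y, u, h, lam1, lam_tv, w=None):
    """Kernel-weighted least-squares loss at rescaled time u,
    plus Lasso and weighted total-variation penalties."""
    T = len(Y)
    t = np.arange(1, T + 1) / T                # rescaled time t/T
    K = epanechnikov((t - u) / h) / h          # K_h(t/T - u)
    resid = Y - X @ theta
    loss = np.sum(K * resid**2)                # L_T(theta)
    if w is None:
        w = np.ones(len(theta) - 1)            # TV weights w_j
    tv = np.sum(w * np.abs(np.diff(theta)))    # Σ_j w_j |θ_{j+1} − θ_j|
    return loss + lam1 * np.sum(np.abs(theta)) + lam_tv * tv
```

Minimizing this function over θ (e.g., with the proximal scheme discussed later in the summary) yields the estimator θ̂.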
The theoretical contributions are twofold. First, the paper derives novel concentration inequalities for kernel‑weighted sums of β‑mixing LSTS with heavy‑tailed noise. By combining block‑wise mixing arguments with martingale difference techniques, the authors obtain tail bounds of the form
P( | Σₜ K_h(t/T − u) ε_{t,T} | > x ) ≤ 2 exp( −c x^α / (T h_T)^{α/2} ),
where α depends on the sub‑Weibull parameter η or the Pareto index. These bounds explicitly reflect the interaction between the bandwidth, sample size, mixing decay, and tail heaviness.
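The qualitative effect of tail heaviness on such kernel-weighted sums can be checked by simulation. The sketch below is purely illustrative: the sub-Weibull construction (|N(0,1)|^{1/η}, symmetrized) and all constants are assumptions of this example, not the paper's noise model, and the simulated noise here is i.i.d. rather than β-mixing.

```python
import numpy as np

rng = np.random.default_rng(1)
T, h, u = 500, 0.1, 0.5
t = np.arange(1, T + 1) / T
K = np.maximum(1.0 - ((t - u) / h)**2, 0.0)   # Epanechnikov weights

def sub_weibull(size, eta):
    """Symmetric draws with sub-Weibull(eta)-type tails via |N(0,1)|**(1/eta)
    (an illustrative construction; smaller eta means heavier tails)."""
    g = rng.normal(size=size)
    return np.sign(g) * np.abs(g)**(1.0 / eta)

def tail_quantile(eta, reps=2000, q=0.99):
    """Empirical 99% quantile of |Σ_t K_t ε_t| over Monte Carlo repetitions."""
    S = np.array([np.sum(K * sub_weibull(T, eta)) for _ in range(reps)])
    return np.quantile(np.abs(S), q)

q_heavy, q_light = tail_quantile(eta=0.5), tail_quantile(eta=2.0)
# heavier tails (smaller eta) inflate the quantiles of the weighted sum
```

Consistent with the bound, the heavier-tailed case produces markedly larger tail quantiles for the same bandwidth and sample size.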
Second, leveraging the concentration results, the paper establishes non‑asymptotic oracle inequalities for the proposed estimator. Without any restricted eigenvalue (RE) condition, a “slow‑rate” bound of order √(s log d / (T h_T)) is proved for the prediction error, where s is the true sparsity level. Under a standard RE condition, a “fast‑rate” bound of order s log d / (T h_T) is obtained. For the TV penalty, an additional term proportional to the number of change‑points appears, reflecting the piecewise‑constant structure. These results extend classical Lasso theory to the challenging setting of locally stationary, heavy‑tailed, dependent data.
On the computational side, the authors design a proximal algorithm (a FISTA‑type scheme) that alternates between kernel‑weighted gradient steps and soft‑thresholding operations for both ℓ₁ and TV penalties. The algorithm enjoys an O(1/k²) convergence rate and requires only O(d) memory, making it scalable to high dimensions.
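A FISTA-type scheme of the kind described can be sketched as follows. This is a generic proximal-gradient sketch handling only the ℓ₁ part of the penalty (the TV proximal operator would be composed in analogously); the step-size rule and function names are this example's assumptions, not the authors' implementation.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding: the proximal operator of tau*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_lasso(X, Y, K, lam1, n_iter=500):
    """FISTA iterations for the kernel-weighted Lasso objective
    Σ_t K_t (Y_t - x_tᵀθ)² + lam1 ||θ||_1 (sketch; TV prox omitted)."""
    T, d = X.shape
    W = K[:, None] * X
    L = 2.0 * np.linalg.norm(X.T @ W, 2)       # Lipschitz constant of the gradient
    theta = z = np.zeros(d)
    s = 1.0
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (K * (X @ z - Y))   # gradient of the weighted LS loss
        theta_new = soft_threshold(z - grad / L, lam1 / L)
        s_new = (1.0 + np.sqrt(1.0 + 4.0 * s**2)) / 2.0
        z = theta_new + (s - 1.0) / s_new * (theta_new - theta)  # momentum step
        theta, s = theta_new, s_new
    return theta
```

The momentum extrapolation on z is what yields the O(1/k²) objective-gap rate mentioned above, and the per-iteration state is a handful of length-d vectors, matching the O(d) memory claim.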
Extensive simulations are conducted with dimensions d = 50, 200, 500 and sample sizes T = 200, 500, 1000. The noise is generated from sub‑Weibull distributions with η ∈ {0.5, 1, 2} and Pareto distributions with α ∈ {1.5, 2.5}, combined with β‑mixing coefficients of varying strength. Performance is evaluated in terms of mean‑squared prediction error and variable‑selection metrics (F1‑score). The experiments demonstrate that (i) heavier tails (smaller η or α) increase error but the proposed method consistently outperforms standard Lasso without kernel smoothing, (ii) the TV penalty substantially improves support recovery when the true coefficient path is piecewise constant, and (iii) appropriate bandwidth selection yields notable gains in sample efficiency.
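Noise with the two tail classes used in the experiments is easy to generate by inverse-CDF sampling. The symmetrization and parameterizations below are this sketch's choices (standard textbook constructions), not necessarily the exact simulation design of the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def pareto_noise(size, alpha, rng=rng):
    """Symmetrized Pareto(alpha) noise: P(|eps| > x) = x**(-alpha) for x >= 1,
    via inverse-CDF sampling U**(-1/alpha) with U ~ Uniform(0, 1)."""
    mag = rng.uniform(size=size)**(-1.0 / alpha)
    return rng.choice([-1.0, 1.0], size=size) * mag

def sub_weibull_noise(size, eta, rng=rng):
    """Symmetric sub-Weibull(eta) noise: |eps| = E**(1/eta) with E ~ Exp(1),
    so P(|eps| > x) = exp(-x**eta); smaller eta gives heavier tails."""
    mag = rng.exponential(size=size)**(1.0 / eta)
    return rng.choice([-1.0, 1.0], size=size) * mag
```

For instance, `pareto_noise(T, alpha=1.5)` has infinite variance while `alpha=2.5` has finite variance but infinite third moment, which is exactly the regime contrast probed in the simulations.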
In summary, the paper provides a rigorous statistical foundation for sparse estimation in high‑dimensional, non‑stationary, heavy‑tailed time series, delivers practical algorithms, and validates the theory through comprehensive numerical studies. Future directions suggested include extensions to multivariate dependence structures, adaptive bandwidth selection, and online/streaming implementations.

