TIFO: Time-Invariant Frequency Operator for Stationarity-Aware Representation Learning in Time Series
Nonstationary time series forecasting suffers from distribution shift because the training and test data are produced by different distributions. Existing methods attempt to alleviate this shift by, e.g., removing low-order moments from each individual sample, but such per-sample solutions fail to capture the time-evolving structure shared across samples and do not model more complex temporal structure. In this paper, we address distribution shift in the frequency space while accounting for all possible time structures. To this end, we propose the Time-Invariant Frequency Operator (TIFO), which learns stationarity-aware weights over the frequency spectrum of the entire dataset. The learned weights highlight stationary frequency components while suppressing non-stationary ones, thereby mitigating the distribution shift issue in time series. To justify our method, we show that the Fourier transform of time series data implicitly induces an eigen-decomposition in the frequency space. TIFO is a plug-and-play approach that can be seamlessly integrated into various forecasting models. Experiments demonstrate that our method achieves 18 top-1 and 6 top-2 results out of 28 forecasting settings. Notably, it yields 33.3% and 55.3% improvements in average MSE on the ETTm2 dataset. In addition, TIFO reduces computational costs by 60%-70% compared to baseline methods, demonstrating strong scalability across diverse forecasting models.
💡 Research Summary
The paper tackles the pervasive problem of distribution shift in non‑stationary time‑series forecasting by moving the analysis from the time domain to the frequency domain. Existing approaches typically normalize low‑order statistics (mean, variance) on a per‑sample basis, which assumes that a global standard distribution (e.g., N(0, 1)) adequately represents the whole dataset. This assumption fails when the data exhibit complex, time‑varying structures such as modality changes, high‑order moments, or shifting spectral characteristics.
To address these shortcomings, the authors propose the Time‑Invariant Frequency Operator (TIFO), a two‑stage, plug‑and‑play module that learns stationarity‑aware weights over the entire frequency spectrum of a dataset.
Stage I – Dataset‑level stationarity learning
All training sequences are transformed with a Discrete Fourier Transform (DFT). For each frequency‑channel pair (k, c), the mean μ and standard deviation σ of the amplitude across all samples are computed. A stationarity score S(k, c) is defined (e.g., μ/σ); high values indicate that the frequency component is consistently present (stationary) across the dataset, while low values signal non‑stationary behavior. This score is stored as a tensor S and later supplied to a lightweight neural network.
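Stage I can be sketched in a few lines of NumPy. This is a minimal illustration of the score described above, assuming the example form S(k, c) = μ/σ; the paper's exact score, FFT variant, and any smoothing are not specified here, so the function name and epsilon are ours:

```python
import numpy as np

def stationarity_scores(X, eps=1e-8):
    """Sketch of Stage I: dataset-level stationarity scores.

    X: array of shape (num_samples, seq_len, num_channels).
    Returns S of shape (num_freqs, num_channels) with S[k, c] = mu / sigma
    of the amplitude of frequency k in channel c across all samples.
    (mu/sigma is the example score form; the paper's may differ.)
    """
    # Real FFT over the time axis: (N, K, C) with K = seq_len // 2 + 1 frequencies
    amp = np.abs(np.fft.rfft(X, axis=1))
    mu = amp.mean(axis=0)       # mean amplitude per (k, c) pair
    sigma = amp.std(axis=0)     # spread of that amplitude across samples
    return mu / (sigma + eps)   # high score -> consistently present (stationary)

# Toy data: a stable sine (stationary) plus a randomly drifting component
rng = np.random.default_rng(0)
t = np.arange(96)
stable = np.sin(2 * np.pi * 4 * t / 96)
samples = np.stack([
    stable + rng.normal(0, 0.1) * np.sin(2 * np.pi * rng.integers(10, 40) * t / 96)
    for _ in range(200)
])[..., None]

S = stationarity_scores(samples)
print(S.shape)            # (49, 1)
print(S[4, 0] > S[20, 0]) # True: the stable frequency scores far higher
```

The stable component at frequency index 4 appears with identical amplitude in every sample, so its σ is near zero and its score is large; the drifting component appears only sporadically at any given bin, so its score stays low.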
Stage II – Sample‑specific weighting and forecasting
During model training, each input sequence is again DFT‑transformed to obtain real (R) and imaginary (I) coefficient matrices. Two independent multilayer perceptrons (MLPs) take the pre‑computed S as input and output multiplicative weights λ_r and λ_i for the real and imaginary parts, respectively. The weighted coefficients (R ⊙ λ_r, I ⊙ λ_i) are inverse‑DFT‑transformed back to the time domain, producing a modified series X̃ that emphasizes stationary frequencies and suppresses non‑stationary ones. The resulting X̃ is fed into any downstream forecasting architecture (e.g., DLinear, PatchTST, iTransformer). The whole pipeline is trained end‑to‑end with the usual forecasting loss (MSE).
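A minimal PyTorch sketch of Stage II follows. The module name, MLP depth, and hidden size are illustrative assumptions, not the paper's exact architecture; only the dataflow (S → two MLPs → λ_r, λ_i → weighted DFT coefficients → inverse DFT) mirrors the description above:

```python
import torch
import torch.nn as nn

class TIFOWeighting(nn.Module):
    """Sketch of Stage II (layer sizes are illustrative, not the paper's):
    two MLPs map the stationarity tensor S to multiplicative weights for
    the real and imaginary DFT coefficients of each input sequence."""

    def __init__(self, num_freqs, num_channels, hidden=64):
        super().__init__()
        def mlp():
            return nn.Sequential(
                nn.Linear(num_freqs * num_channels, hidden),
                nn.ReLU(),
                nn.Linear(hidden, num_freqs * num_channels),
            )
        self.mlp_r, self.mlp_i = mlp(), mlp()
        self.num_freqs, self.num_channels = num_freqs, num_channels

    def forward(self, x, S):
        # x: (batch, seq_len, channels); S: (num_freqs, channels) from Stage I
        coeffs = torch.fft.rfft(x, dim=1)  # complex, shape (B, K, C)
        flat = S.reshape(-1)
        lam_r = self.mlp_r(flat).reshape(self.num_freqs, self.num_channels)
        lam_i = self.mlp_i(flat).reshape(self.num_freqs, self.num_channels)
        # R * lam_r and I * lam_i, recombined into a complex spectrum
        weighted = coeffs.real * lam_r + 1j * (coeffs.imag * lam_i)
        # Back to the time domain; feed the result to any forecaster
        return torch.fft.irfft(weighted, n=x.shape[1], dim=1)

x = torch.randn(8, 96, 7)        # batch of 8 sequences, 7 channels
S = torch.rand(96 // 2 + 1, 7)   # precomputed Stage I scores
x_tilde = TIFOWeighting(49, 7)(x, S)
print(x_tilde.shape)  # torch.Size([8, 96, 7])
```

Because every operation is differentiable, the MLPs train jointly with the downstream forecaster under the ordinary MSE loss, as the summary states.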
Theoretical justification
The authors argue that the Fourier transform implicitly defines a kernel in frequency space, which admits an orthonormal eigen‑basis via spectral decomposition (Berg et al., 1984). The learned λ values correspond to eigenvalues that capture how much each spectral component contributes to the overall data distribution. By learning dataset‑wide eigenvalues, TIFO can isolate and down‑weight the frequencies responsible for distributional discrepancies between training and test periods.
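The link between the Fourier basis and eigen-decomposition can be checked numerically with a standard fact (this illustrates the claim, it is not the paper's proof): a stationary covariance kernel k(t, t′) depends only on t − t′, so its (circular) covariance matrix is circulant, and every circulant matrix is diagonalized by the DFT basis with eigenvalues given by the DFT of its first row:

```python
import numpy as np

# For a stationary process, the covariance matrix is circulant; its
# eigenvectors are the DFT basis and its eigenvalues are the DFT of the
# autocovariance, i.e., the power at each frequency. These per-frequency
# eigenvalues play the same role as the learned lambda weights above.
n = 32
autocov = 0.9 ** np.minimum(np.arange(n), n - np.arange(n))   # symmetric circular kernel
C = np.array([np.roll(autocov, shift) for shift in range(n)])  # circulant covariance

eigvals = np.sort(np.linalg.eigvalsh(C))
spectrum = np.sort(np.fft.fft(autocov).real)  # DFT of the first row
print(np.allclose(eigvals, spectrum))  # True
```

Under this view, learning λ over frequencies amounts to reweighting the eigenvalues of such a kernel, which is why down-weighting unstable spectral components can suppress the directions along which train and test distributions disagree.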
Empirical evaluation
Experiments span seven public multivariate time‑series benchmarks (ETTm1, ETTm2, Weather, Traffic, Electricity, ILI, etc.) and 28 different forecasting configurations (varying input lengths, horizons, and variable counts). TIFO is integrated with three state‑of‑the‑art forecasters: DLinear, PatchTST, and iTransformer. Results show that TIFO‑augmented models achieve the best performance in 18 settings and second‑best in 6 settings. Notably, on the highly non‑stationary ETTm2 dataset, PatchTST and iTransformer improve their average MSE by 33.3 % and 55.3 %, respectively. Moreover, because TIFO replaces costly per‑sample normalization with a single dataset‑level statistic and lightweight MLPs, computational overhead drops by 60 %–70 % in 16 out of 28 scenarios, demonstrating strong scalability.
Strengths
- Directly addresses spectral shifts, which are the primary manifestation of non‑stationarity in many real‑world series.
- Provides a unified, dataset‑wide view of stationarity, overcoming the myopic per‑sample normalization of prior work.
- Plug‑and‑play design allows seamless adoption across diverse forecasting backbones without architectural redesign.
- Theoretical grounding links the learned weights to eigenvalues of a frequency‑space kernel, offering interpretability.
Limitations and future directions
- Computing DFT and the global statistics requires a full pass over the training set, which may be memory‑intensive for extremely large or streaming datasets.
- The current linear Fourier basis may struggle with highly non‑linear or transient patterns; extensions to wavelet or adaptive time‑frequency representations could capture richer dynamics.
- Separate MLPs for real and imaginary parts increase parameter count modestly; joint complex‑valued networks might be more efficient.
- Online updating of the stationarity tensor S to accommodate evolving data streams remains an open problem.
In summary, TIFO introduces a novel frequency‑domain perspective on distribution shift, learns dataset‑level stationarity weights, and demonstrates substantial accuracy gains and computational savings across a broad set of forecasting tasks. It represents a significant step toward robust, non‑stationary time‑series prediction and opens several avenues for further research in adaptive spectral learning.