ChaosNexus: A Foundation Model for ODE-based Chaotic System Forecasting with Hierarchical Multi-scale Awareness

Notice: This research summary and analysis were automatically generated using AI. For accuracy, please refer to the original arXiv source.

Foundation models have shown great promise in achieving zero-shot or few-shot forecasting for ODE-based chaotic systems via large-scale pretraining. However, existing architectures often fail to capture the multi-scale temporal structures and distinct spectral characteristics of chaotic dynamics. To address this, we introduce ChaosNexus, a foundation model for chaotic system forecasting underpinned by the proposed ScaleFormer architecture. By processing temporal contexts across hierarchically varying patch sizes, ChaosNexus effectively captures long-range dependencies and preserves high-frequency fluctuations. To address heterogeneity across distinct systems, we integrate Mixture-of-Experts (MoE) layers into each ScaleFormer block and explicitly condition the final forecasts on a learned frequency fingerprint, providing the model with a global spectral view of the system. Extensive evaluations on over 9,000 synthetic systems demonstrate that ChaosNexus achieves superior fidelity in long-term attractor statistics while maintaining competitive point-wise accuracy. Furthermore, in real-world applications, it achieves a remarkable zero-shot mean error below 1°C for 5-day station-based weather forecasting. Code is available at https://github.com/TomXaxaxa/ChaosNexus.


💡 Research Summary

ChaosNexus is a foundation model designed for zero‑shot and few‑shot forecasting of ordinary‑differential‑equation (ODE) based chaotic systems. The authors identify two major shortcomings of existing architectures: (1) an inability to capture the intrinsic multi‑scale temporal structure of chaotic dynamics, and (2) a lack of mechanisms to handle the pronounced spectral heterogeneity across different chaotic systems (e.g., Lorenz‑63 versus Lorenz‑96). To address these issues, they introduce a novel backbone called ScaleFormer, which adopts a U‑Net‑style encoder‑decoder architecture that processes temporal contexts at hierarchically varying patch sizes.

In the encoder, successive patch‑merging layers double the effective temporal receptive field, thereby acting as low‑pass filters in deeper layers while preserving high‑frequency information in the early layers. This design is motivated by the Nyquist‑Shannon sampling theorem, ensuring that each hierarchical level naturally focuses on a specific frequency band. The decoder mirrors this process with patch‑expansion layers and skip connections, reconstructing fine‑grained details lost during encoding.
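The halving-and-doubling behavior of the encoder can be sketched in a few lines. This is an illustrative NumPy stand-in, not the authors' code: the weight matrix `w` stands in for a trained linear layer, and `patch_merging` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_merging(x, w):
    """Merge adjacent time patches: (T, D) -> (T//2, D).
    w has shape (2*D, D) and stands in for a trained linear projection."""
    t, d = x.shape
    pairs = x.reshape(t // 2, 2 * d)  # concatenate each pair of neighboring patches
    return pairs @ w                  # project the pair back down to width D

x = rng.standard_normal((64, 32))     # 64 temporal patches of width 32
for level in range(3):                # three encoder levels, each halves T
    w = rng.standard_normal((64, 32)) / 8.0
    x = patch_merging(x, w)
print(x.shape)                        # (8, 32)
```

Each level halves the number of patches, so the effective receptive field per patch doubles; the decoder would invert this with patch-expansion layers and skip connections.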

Each ScaleFormer block incorporates a Mixture‑of‑Experts (MoE) module. A set of specialist experts and a shared expert are gated by a routing network that selects the top‑K specialists based on the input’s dynamical characteristics. Dual axial attention (separating variable‑axis and temporal‑axis) reduces computational complexity from O(S²V²) to O(S²+V²), while rotary positional embeddings and FlashAttention improve stability and efficiency. This MoE‑enhanced transformer enables the model to allocate dedicated parameters to distinct dynamical regimes, effectively handling the spectral diversity observed across chaotic systems.
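A minimal sketch of the routing described above, assuming a dense top-K gate with a renormalized softmax over the selected specialists plus an always-on shared expert. All names (`moe_layer`, the linear experts) are illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, experts, shared, gate_w, k=2):
    """Route each token to its top-k specialist experts plus one shared expert.
    experts: list of (D, D) matrices; shared: (D, D); gate_w: (D, E)."""
    logits = x @ gate_w                          # (N, E) routing scores
    topk = np.argsort(logits, axis=-1)[:, -k:]   # top-k expert indices per token
    out = x @ shared                             # the shared expert sees every token
    for n in range(x.shape[0]):
        weights = softmax(logits[n, topk[n]])    # renormalize over selected experts
        for w_gate, e_idx in zip(weights, topk[n]):
            out[n] += w_gate * (x[n] @ experts[e_idx])
    return out

d, n_experts = 16, 4
x = rng.standard_normal((8, d))
experts = [rng.standard_normal((d, d)) / 4 for _ in range(n_experts)]
shared = rng.standard_normal((d, d)) / 4
gate_w = rng.standard_normal((d, n_experts))
y = moe_layer(x, experts, shared, gate_w, k=2)
print(y.shape)  # (8, 16)
```

Because only k of the E specialists fire per token, parameter count grows with E while per-token compute stays roughly constant, which is what lets distinct dynamical regimes claim dedicated parameters.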

A further innovation is the frequency‑enhanced joint scale readout. The decoder produces representations at L different temporal scales; each is temporally pooled, concatenated, and linearly fused into a unified dynamics vector. Crucially, the model is conditioned on a “frequency fingerprint” extracted from the historical observations using a wavelet scattering transform. The scattering coefficients are pooled to a compact spectral summary, which is concatenated with the multi‑scale decoder features before the final linear projection. This explicit spectral conditioning provides the model with a global view of the system’s energy distribution, allowing it to adapt its multi‑scale fusion to the specific power spectrum of the target system.
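The fingerprint idea can be illustrated compactly. The paper uses a wavelet scattering transform; the sketch below substitutes a simpler FFT power-spectrum band summary as a stand-in, and `frequency_fingerprint` is a hypothetical helper name:

```python
import numpy as np

def frequency_fingerprint(history, n_bands=8):
    """Pool each variable's power spectrum into coarse log-energy bands.
    (Simplified stand-in for the paper's wavelet scattering coefficients.)"""
    spec = np.abs(np.fft.rfft(history, axis=0)) ** 2    # (F, V) power spectrum
    bands = np.array_split(spec, n_bands, axis=0)       # partition the frequency axis
    energy = np.stack([b.mean(axis=0) for b in bands])  # (n_bands, V) band energies
    return np.log1p(energy).ravel()                     # compact spectral summary

t = np.linspace(0, 10, 512)
history = np.stack([np.sin(2 * np.pi * 3 * t),          # slow variable
                    np.sin(2 * np.pi * 40 * t)], axis=1)  # fast variable
fp = frequency_fingerprint(history)
print(fp.shape)  # (16,)
```

In the model, such a spectral summary is concatenated with the pooled multi-scale decoder features before the final linear projection, so the fusion weights can adapt to where the system's energy lives in frequency.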

Training employs a composite loss that balances short‑term pointwise accuracy (e.g., MSE/MAE) with long‑term statistical fidelity. The latter is measured by discrepancies between the reconstructed and true attractor distributions using metrics such as Maximum Mean Discrepancy (MMD) or optimal transport distances, encouraging preservation of invariant chaotic statistics (Lyapunov exponents, fractal dimensions, autocorrelation functions).
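A toy version of such a composite objective, assuming MSE for the pointwise term and a squared MMD with an RBF kernel for the distributional term (treating trajectory points as samples from the attractor distribution); the weighting `lam` and all function names are illustrative:

```python
import numpy as np

def rbf_mmd2(x, y, sigma=1.0):
    """Squared maximum mean discrepancy between samples x, y of shape (N, D)."""
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def composite_loss(pred, target, lam=0.1):
    """Short-term pointwise MSE plus a long-term attractor-statistics term."""
    mse = ((pred - target) ** 2).mean()
    return mse + lam * rbf_mmd2(pred, target)

rng = np.random.default_rng(2)
target = rng.standard_normal((100, 3))        # points on the "true" attractor
loss_same = composite_loss(target, target)    # identical trajectories: zero loss
loss_diff = composite_loss(target + 2.0, target)
print(loss_same, loss_diff)
```

The MMD term is invariant to the ordering of points, so it rewards matching the attractor's geometry even when individual trajectories have decorrelated, which is exactly the regime where pointwise losses stop being informative for chaotic systems.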

Experiments are conducted on two fronts. First, the model is pretrained on a synthetic corpus of ~20,000 ODE chaotic systems (the “Panda” dataset) covering a wide range of dimensions, parameters, and initial conditions. Zero‑shot evaluation on over 9,000 unseen synthetic systems shows that ChaosNexus outperforms state‑of‑the‑art baselines (Panda, DynaMix) in both pointwise error and attractor‑level metrics. Second, real‑world weather forecasting is tested on station‑based 5‑day temperature prediction. Without any fine‑tuning, ChaosNexus achieves a mean absolute error below 1 °C, surpassing baselines that were fine‑tuned on more than 470,000 samples from the target domain.

In summary, ChaosNexus advances chaotic system forecasting by (i) structurally encoding multi‑scale temporal information through hierarchical patch processing, (ii) dynamically allocating expert parameters via MoE to respect system‑specific spectral signatures, and (iii) explicitly conditioning on a wavelet‑derived frequency fingerprint to provide a global spectral context. The paper demonstrates that these design choices lead to superior zero‑shot performance on both massive synthetic benchmarks and practical meteorological tasks, establishing a new paradigm for foundation models in dynamical system prediction. Future work may explore adaptive expert scaling, incorporation of physical constraints (e.g., conservation laws), and extension to partial‑differential‑equation‑based high‑dimensional systems.

