Enhancing Time Series Classification with Diversity-Driven Neural Network Ensembles

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

Ensemble methods have played a crucial role in achieving state-of-the-art (SOTA) performance across various machine learning tasks by leveraging the diversity of features learned by individual models. In Time Series Classification (TSC), ensembles have proven highly effective, whether based on neural networks (NNs) or on traditional methods such as HIVE-COTE. However, most existing NN-based ensemble methods for TSC train multiple models with identical architectures and configurations. These ensembles aggregate predictions without explicitly promoting diversity, which often leads to redundant feature representations and limits the benefits of ensembling. In this work, we introduce a diversity-driven ensemble learning framework that explicitly encourages feature diversity among neural network ensemble members. Our approach employs a decorrelated learning strategy using a feature orthogonality loss applied directly to the learned feature representations, ensuring that each model in the ensemble captures complementary rather than redundant information. We evaluate our framework on 128 datasets from the UCR archive and show that it achieves SOTA performance with fewer models, making our method both efficient and scalable compared to conventional NN-based ensemble approaches.


💡 Research Summary

The paper addresses a critical shortcoming of current neural‑network ensembles for time‑series classification (TSC): although ensembles of identical deep models (e.g., InceptionTime, H‑InceptionTime, LITE) achieve state‑of‑the‑art (SOTA) accuracy, they rely solely on random initialization to induce diversity. In practice, the individual networks often converge to very similar feature representations, limiting the theoretical benefits of ensembling.

To remedy this, the authors propose a diversity‑driven ensemble framework that explicitly encourages each member to learn complementary features. The core idea is a feature orthogonality loss that penalizes cosine similarity between the global feature vectors (taken just before the final pooling layer) of different ensemble members. Formally, the total loss for an M‑member ensemble is

L_total = L_CE + λ · (1/(M(M‑1))) ∑_{i≠j} cos_sim(f_i, f_j),

where L_CE is the standard cross‑entropy loss, f_i denotes the feature vector of model i, and λ controls the strength of the orthogonal regularization. Unlike prior work that applies orthogonality constraints to convolutional filters, this approach operates at the feature level, ensuring that the representations fed to the classifier are truly diverse.
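The penalty term above can be sketched in a few lines of NumPy. This is a minimal illustration of the formula, not the authors' implementation; `lam` plays the role of λ, and each feature vector is assumed to be flattened into a 1‑D array:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two flattened feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def orthogonality_penalty(features, lam=1.0):
    """lam * (1/(M(M-1))) * sum over all ordered pairs i != j of
    cos_sim(f_i, f_j), i.e. the regularization term added to L_CE."""
    M = len(features)
    total = 0.0
    for i in range(M):
        for j in range(M):
            if i != j:
                total += cosine_sim(features[i], features[j])
    return lam * total / (M * (M - 1))
```

For perfectly orthogonal features the penalty is 0, while identical features drive it to λ, so minimizing the total loss pushes members' representations apart.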

Training proceeds sequentially: the first model is trained normally; each subsequent model is trained while keeping the already‑trained models fixed and adding the orthogonality term that references their features. This sequential scheme guarantees that every new model is explicitly pushed away from the feature subspace already occupied by earlier models. The authors adopt the lightweight LITE architecture as the base learner because it offers a small parameter budget (≈2 % of InceptionTime's parameter count) while retaining competitive accuracy.

Experiments were conducted on all 128 datasets of the UCR archive, using a 5‑fold cross‑validation protocol. The proposed method was compared against (i) standard ensembles of five independently trained LITE models, (ii) InceptionTime‑5, (iii) H‑InceptionTime‑5, and (iv) LITETime‑5. Results show that the diversity‑driven ensemble matches or exceeds the average accuracy of the five‑model baselines while using fewer models (as few as three). On the BirdChicken dataset, the decorrelated ensemble achieved 100 % test accuracy, surpassing the 95 % best reported by InceptionTime‑5.

To quantify diversity, the authors compute the Fréchet Inception Distance (FID) between the feature distributions of ensemble members, observing a ~30 % increase relative to standard ensembles, indicating more dispersed (less redundant) representations. t‑SNE visualizations further illustrate that feature clusters from different models are well separated when the orthogonality loss is applied.
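As a rough illustration of this FID-style measure, the Fréchet distance between Gaussian fits of two members' feature sets can be computed with NumPy. This is a sketch under the assumption of well-conditioned covariance matrices, not the paper's code; standard FID implementations instead use a matrix square root routine such as `scipy.linalg.sqrtm`:

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets
    (rows = samples): ||mu_a - mu_b||^2 + Tr(C_a + C_b - 2 (C_a C_b)^{1/2})."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((C_a C_b)^{1/2}) via the eigenvalues of the product, which are
    # real and non-negative for positive semi-definite covariances.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Identical feature distributions give a distance of zero, and the distance grows as member representations drift apart, which is what makes it usable as a diversity score.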

The paper acknowledges limitations: sequential training increases wall‑clock time compared with fully parallel ensembles, the regularization weight λ must be tuned per dataset, and the study is confined to homogeneous architectures. Future work could explore adaptive λ scheduling, parallelizable decorrelation mechanisms, and extensions to heterogeneous ensembles (e.g., mixing CNNs with Transformers).

In summary, the work makes three substantive contributions: (1) a novel feature‑level orthogonal loss that directly enforces representation diversity, (2) a sequential training protocol that integrates this loss without altering the underlying network architecture, and (3) extensive empirical validation showing SOTA‑level performance with fewer models and demonstrable diversity gains. This advances the practical efficiency and theoretical understanding of deep ensembles for time‑series classification.

