MRMS-Net and LMRMS-Net: Scalable Multi-Representation Multi-Scale Networks for Time Series Classification

1st Celal Alagöz, Computer Engineering, Sivas Bilim ve Teknoloji Üniversitesi, Sivas, Türkiye, ORCID 0000-0001-9812-1473
2nd Mehmet Kurnaz, Computer Engineering, Sivas Bilim ve Teknoloji Üniversitesi, Sivas, Türkiye, ORCID 0009-0003-1303-1623
3rd Farhan Aadil, Computer Engineering, Sivas Bilim ve Teknoloji Üniversitesi, Sivas, Türkiye, ORCID 0000-0001-8737-2154

Abstract: Time series classification (TSC) performance depends not only on architectural design but also on the diversity of input representations. In this work, we propose a scalable multi-scale convolutional framework that systematically integrates structured multi-representation inputs for univariate time series. We introduce two architectures: MRMS-Net, a hierarchical multi-scale convolutional network optimized for robustness and calibration, and LMRMS-Net, a lightweight variant designed for efficiency-aware deployment. In addition, we adapt LiteMV, originally developed for multivariate inputs, to operate on multi-representation univariate signals, enabling cross-representation interaction.

We evaluate all models across 142 benchmark datasets under a unified experimental protocol. Critical Difference (CD) analysis confirms statistically significant performance differences among the top models. Results show that LiteMV achieves the highest mean accuracy, MRMS-Net provides superior probabilistic calibration (lowest NLL), and LMRMS-Net offers the best efficiency–accuracy tradeoff. Pareto analysis further demonstrates that multi-representation multi-scale modeling yields a flexible design space that can be tuned for accuracy-oriented, calibration-oriented, or resource-constrained settings.

These findings establish scalable multi-representation multi-scale learning as a principled and practical direction for modern TSC.
Reference implementation of MRMS-Net and LMRMS-Net is available at: https://github.com/alagoz/mrmsnet-tsc

Index Terms: Time Series Classification, Multi-Scale CNN, Multi-Representation Learning, Lightweight Deep Neural Networks, Computational Efficiency

I. INTRODUCTION

TSC has witnessed substantial progress with the emergence of deep convolutional and transformer-based architectures. Despite these advances, two fundamental aspects remain underexplored in a unified manner: (i) the role of structured representation diversity, and (ii) the trade-off between accuracy, calibration, and computational efficiency at scale.

Most existing deep TSC models operate on raw time-domain inputs, implicitly expecting the network to learn all relevant transformations internally. However, classical signal processing suggests that complementary representations, such as derivatives, frequency-domain projections, and autocorrelation structures, encode discriminative information that may not be easily recoverable from raw signals alone. While prior studies have explored representation ensembles or feature concatenation, systematic multi-representation learning within scalable deep architectures remains limited.

In parallel, multi-scale convolutional networks have proven effective for capturing temporal dependencies across varying receptive fields. Yet current multi-scale models are typically optimized purely for predictive accuracy, with limited analysis of calibration quality and efficiency trade-offs. For large benchmark collections, such as the 142-dataset UCR archive [1], scalability and robustness become critical design considerations.

In this work, we propose a principled multi-representation multi-scale learning framework for TSC. Our contributions are threefold:

• Scalable Multi-Scale Architecture (MRMS-Net). We introduce MRMS-Net, a hierarchical multi-scale convolutional network designed to integrate structured representation groups while maintaining stable calibration performance.
• Lightweight Efficiency-Oriented Variant (LMRMS-Net). We design LMRMS-Net as a computationally efficient alternative that preserves competitive predictive performance while significantly reducing training cost. LMRMS-Net incorporates a dynamic inference mechanism inspired by early-exit architectures [2]. Unlike static models that apply uniform computation to all samples, LMRMS-Net employs a confidence-based gating strategy. By prioritizing shallow feature extraction for high-confidence samples and reserving the deeper fusion block for ambiguous cases, LMRMS-Net achieves a favorable trade-off between predictive latency and classification accuracy.
• Multi-Representation Adaptation of LiteMV. We repurpose LiteMV, originally developed for multivariate time series, to operate on structured multi-representation inputs of univariate signals, enabling cross-representation interaction through multivariate-style modeling.

We evaluate our models across 142 benchmark datasets under a unified experimental protocol with Monte Carlo resampling. Beyond reporting accuracy, we analyze macro-F1, Area Under the ROC Curve (AUC), negative log-likelihood (NLL), and runtime. Statistical validation using CD analysis confirms significant differences among the top-performing models.

Our results reveal three key findings. First, structured multi-representation learning consistently improves performance over raw inputs. Second, MRMS-Net achieves superior calibration performance, while LiteMV attains the highest overall accuracy. Third, LMRMS-Net establishes a strong efficiency–accuracy Pareto frontier, demonstrating that multi-scale modeling can be adapted to resource-constrained scenarios.
These findings establish scalable multi-representation multi-scale learning as a flexible and statistically validated paradigm for modern TSC.

II. RELATED WORK

A. Deep Learning for Time Series Classification

Early TSC methods relied on distance-based approaches such as nearest neighbor classifiers [3] with elastic similarity measures. The introduction of deep learning shifted focus toward convolutional neural networks (CNNs), which demonstrated strong performance by automatically learning hierarchical temporal features from raw inputs. Architectures such as fully convolutional networks and residual networks became competitive baselines across large benchmark collections [4], [5].

More recently, attention-based [6] transformer architectures [7] have further advanced TSC performance. However, many of these approaches prioritize predictive accuracy without explicitly addressing calibration quality or computational scalability across diverse dataset characteristics.

B. Multi-Scale Modeling

Early multi-scale approaches, such as the Multi-Scale Convolutional Neural Network (MCNN) [8], introduced a transformation stage to extract multi-resolution features through down-sampling and smoothing. More recent multi-scale convolutional architectures [9] aim to capture temporal dependencies at different resolutions through parallel convolutional branches or hierarchical receptive fields. These designs have proven effective in modeling both short-term and long-term dynamics. Other CNN models [10] similarly stack dilated or multi-size filters to capture patterns of varying receptive fields. These architectures excel at accuracy, but rarely analyze their calibration or efficiency. Nonetheless, existing multi-scale networks generally operate on a single raw representation, implicitly assuming that scale diversity alone suffices to capture signal complexity. Beyond complex designs, Fully Convolutional Networks (FCNs) have demonstrated that superior performance can be achieved through relatively simple, parameter-efficient architectures that bypass pooling layers to preserve temporal resolution [11].

In contrast, our work combines scale diversity with representation diversity, enabling complementary information sources (e.g., time-domain derivatives, frequency magnitudes, autocorrelation) to be jointly modeled within a unified framework.

C. Representation Learning and Multi-View Approaches

Feature-based TSC approaches [12]–[14] have long leveraged handcrafted transformations such as wavelets [15], Fourier coefficients [16], and autocorrelation features to capture diverse temporal characteristics. Ensemble-based methods [17], [18] further combine heterogeneous representations at the classifier level to improve robustness and accuracy.

Beyond individual feature sets, recent work has explored systematic integration of features extracted from multiple signal representations. For example, Crossfire [19] integrates features derived from derivative, autocorrelation, Fourier, cosine, wavelet, and Hilbert representations within a unified feature extraction framework. Evaluated on the 142 datasets of the UCR archive, this approach demonstrated that combining complementary representations can improve classification robustness while maintaining strong computational efficiency and scalability.

In parallel, convolutional kernel-based methods such as ROCKET and its variants [20], [21] have shown that generating large stochastic representation spaces using thousands of random convolutional kernels can effectively capture complex temporal patterns. Deep learning approaches have also been widely applied to TSC, often relying on ensembles of identical architectures trained independently to improve predictive performance [22]. While ensembling improves accuracy, these models typically rely on random initialization to introduce diversity, which may lead to redundant feature representations.

More recently, representation learning paradigms have emerged to learn robust embeddings directly from time series data. Self-supervised methods such as TS2Vec [23] and TF-C [24] aim to capture temporal and spectral dependencies through contrastive learning objectives. In parallel, multi-view learning frameworks have been explored in other domains to integrate complementary data sources and improve generalization.

Despite these advances, systematic integration of multiple signal representations within deep convolutional architectures remains relatively underexplored. In this work, we introduce multiple representation regimes and evaluate their impact across 142 datasets, providing large-scale empirical evidence that carefully designed representation combinations can yield consistent performance improvements.

D. Multivariate Modeling and LiteMV

LiteMV [25] was originally proposed for multivariate TSC, modeling interactions across channels. In this work, we reinterpret distinct signal representations as structured channels, allowing LiteMV to operate in a multi-representation setting. This adaptation enables cross-representation interaction without requiring inherently multivariate input signals, extending the applicability of multivariate architectures to representation-enhanced univariate problems.

E. Calibration and Efficiency in TSC

While accuracy remains the dominant evaluation metric in TSC, probabilistic calibration has gained increasing attention due to its importance in risk-sensitive applications. NLL provides a principled measure of predictive confidence quality.
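For concreteness, NLL over a set of predictions is the mean negative logarithm of the probability assigned to the true class. The following is a minimal sketch, assuming each row of `probs` is a softmax output; the function name, variable names, and toy values are illustrative and are not taken from the reference implementation.

```python
import math

def mean_nll(probs, labels, eps=1e-12):
    """Mean negative log-likelihood of the true class.

    probs  : list of per-sample class-probability lists (each sums to 1)
    labels : list of integer true-class indices
    eps    : floor that avoids log(0) for confidently wrong predictions
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total -= math.log(max(p[y], eps))
    return total / len(labels)

# Two hypothetical classifiers with identical accuracy (both pick class 0,
# both are correct) but different confidence in the correct class.
confident = [[0.9, 0.1], [0.8, 0.2]]
hesitant = [[0.6, 0.4], [0.55, 0.45]]
labels = [0, 0]

nll_confident = mean_nll(confident, labels)
nll_hesitant = mean_nll(hesitant, labels)
```

The two toy classifiers are indistinguishable by accuracy, yet `nll_confident` is smaller than `nll_hesitant`. This is why NLL can separate models whose accuracies are nearly tied.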
To target calibration, MRMS-Net is designed as a high-capacity, hierarchical architecture that leverages full representation diversity to achieve superior calibration and robustness across complex signal domains.

Furthermore, large-scale empirical evaluations necessitate careful analysis of training and inference cost. The LITE model [26] showed that accuracy-competitive CNN architectures for TSC can be achieved with significantly reduced parameter counts. Similarly, Omni-Scale architectures like OS-CNN [27] emphasize the importance of capturing universal patterns through diverse kernel sizes. Inspired by these efficiency-oriented design principles and neural scaling laws, LMRMS-Net incorporates lightweight convolutional strategies, such as reduced filter sizes and computationally efficient feature extraction, to maintain competitive performance while lowering computational cost.

Our study explicitly analyzes accuracy, macro-F1, AUC, NLL, and runtime, and visualizes trade-offs using Pareto analysis. To our knowledge, this is among the first works to jointly evaluate multi-scale, multi-representation architectures with statistical significance testing and efficiency–calibration tradeoff analysis across the full 142-dataset benchmark suite.

III. METHODOLOGY

A. Multi-Representation Framework

Rather than relying solely on raw time-domain signals, we construct structured representation sets designed to capture complementary temporal characteristics. For each univariate time series x(t), we consider the following representations: TIME, DT1, DT2, HLB_MAG, DWT_A, FFT_MAG, DCT, and ACF.

Here, DT1 and DT2 denote first and second derivatives, HLB_MAG and FFT_MAG correspond to frequency magnitude projections, DWT_A represents wavelet approximation coefficients, DCT denotes discrete cosine transform coefficients, and ACF represents autocorrelation features.
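To make a few of these representations concrete, the sketch below computes stand-ins for DT1, DT2, ACF, and FFT_MAG using only the Python standard library. It is an illustration under assumptions: the paper does not give exact definitions, so finite differences, the biased sample autocorrelation, and a naive one-sided DFT are used here, and all function names are hypothetical.

```python
import cmath
import math

def diff(x):
    """First-order finite difference (a discrete stand-in for DT1)."""
    return [b - a for a, b in zip(x, x[1:])]

def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / var
        for k in range(max_lag + 1)
    ]

def fft_mag(x):
    """Magnitude of the one-sided DFT, computed naively in O(n^2)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
        for f in range(n // 2 + 1)
    ]

# Toy series: one full sine period sampled at 8 points.
x = [math.sin(2 * math.pi * t / 8) for t in range(8)]
dt1 = diff(x)          # length 7
dt2 = diff(dt1)        # length 6 (difference of the difference)
rho = acf(x, max_lag=3)
mag = fft_mag(x)       # energy concentrates in frequency bin 1
```

Note that these transforms return sequences of different lengths. Stacking them as channels of a common length L therefore needs some alignment step (padding, truncation, or resampling); the source does not specify which one is used.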
Each representation is treated as an input channel, enabling structured multi-representation learning within convolutional architectures. This formulation allows controlled analysis of representation impact across datasets.

B. Architectural Overview

Figure 1 provides a visual comparison between the MRMS-Net and LMRMS-Net architectures. Both architectures process multi-representation inputs with shape (R × L), where R is the number of representations and L is the time series length.

C. MRMS-Net: Multi-Scale Representation Network

MRMS-Net is designed to capture temporal dependencies at multiple receptive field scales while integrating structured representations.

Given an input tensor of shape (R, L), where R is the number of representations and L is the series length, MRMS-Net applies parallel convolutional branches with different kernel sizes. These branches capture short-term and long-term temporal patterns simultaneously.

Branch outputs are concatenated and passed through hierarchical convolutional fusion blocks consisting of:

• Batch normalization
• ReLU activation
• Stacked 1D convolutions
• Dropout regularization

Global average pooling aggregates temporal information before classification.

The architecture is optimized for stable training, controlled capacity growth, and robust calibration performance.

D. LMRMS-Net: Lightweight Multi-Scale Network with Early Exit

To address computational efficiency, we introduce LMRMS-Net (implemented as FastMultiScaleCNN), a lightweight multi-scale architecture with conditional early exit.

1) Ultra-Light Multi-Scale Feature Extraction: LMRMS-Net uses two shallow convolutional branches with kernel sizes 3 and 5:

$$b_3 = \mathrm{Conv1d}(R, 16, k = 3), \qquad b_5 = \mathrm{Conv1d}(R, 16, k = 5)$$

The branch outputs are concatenated to form a compact 32-channel representation.
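As a rough sense of scale, the parameter count of these two branches can be worked out directly. The sketch below assumes standard non-grouped convolutions with bias terms (the default in PyTorch's `Conv1d`, which the notation resembles). The source states neither, so the numbers are illustrative and the helper names are hypothetical.

```python
def conv1d_params(in_ch, out_ch, kernel, bias=True):
    """Learnable parameters of a standard (non-grouped) 1D convolution."""
    return in_ch * out_ch * kernel + (out_ch if bias else 0)

def branch_params(num_repr):
    """Parameters of the two 16-filter branches (k=3 and k=5) combined."""
    return conv1d_params(num_repr, 16, 3) + conv1d_params(num_repr, 16, 5)

raw_only = branch_params(1)   # R = 1: raw series only
all_eight = branch_params(8)  # R = 8: all eight listed representations
```

Under these assumptions the count is 128R + 32, that is, 160 parameters for R = 1 and 1056 for R = 8. The feature extractor therefore grows only linearly with the number of representations and stays small even with all eight.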
2) Early Exit Classifier: An early classifier operates directly on pooled branch features:

• Adaptive average pooling
• Fully connected layer (32 → 64)
• ReLU activation
• Output layer (64 → C)

During inference, prediction confidence is computed via softmax probabilities. If the mean maximum class probability exceeds a threshold τ = 0.8, the early prediction is returned.

3) Main Pathway (Fallback): If confidence is below the threshold, features are processed through a deeper fusion block:

• BatchNorm + ReLU
• Conv1d(32 → 64)
• Conv1d(64 → 128)
• Dropout (0.3)

After global average pooling, a final linear classifier produces predictions.

During training, only the main pathway is used to ensure stable gradient flow. Early exit is activated only during inference.

This design enables LMRMS-Net to reduce inference cost on “easy” samples while maintaining competitive accuracy.

[Figure 1, panel (a) MRMS-Net architecture: Input (R × L) → three Conv1D branches, k = 3 (short), k = 5 (medium), k = 7 (long) → Concatenate → Feature Fusion (BN → ReLU → Conv → Dropout) → Global Avg Pool + FC. Panel (b) LMRMS-Net architecture: Input (R × L) → two Conv1D branches, k = 3 and k = 5 → Concatenate (32 ch) → test conf ≥ τ; Yes: Early Exit (Pool + FC); No: Main Path (Conv → Pool → FC). Training: main path only.]

Fig. 1: Architecture comparison of the proposed models. MRMS-Net employs three multi-scale convolution branches (k = 3, 5, 7) followed by a feature fusion block. LMRMS-Net uses a lightweight two-branch design and incorporates a confidence-based early-exit mechanism to reduce inference cost.

E. LiteMV Multi-Representation Adaptation

LiteMV was originally designed for multivariate TSC. We reinterpret representation channels as structured pseudo-variables, enabling cross-representation interaction modeling.

Formally, for a representation set of size R, the input tensor is treated as multivariate with R channels.
LiteMV thus models the mapping

$$F: \mathbb{R}^{R \times L} \to \mathbb{R}^{C}$$

where C is the number of classes. This adaptation allows structured interaction between time-domain and frequency-domain signals without requiring inherently multivariate datasets.

IV. EXPERIMENTAL PROTOCOL

A. Datasets and Evaluation

We evaluate all models on 142 benchmark TSC datasets. For each dataset, we employ Monte Carlo resampling with repeated train/test splits (30 per dataset; see Section V). Predefined resampling indices are used when available to ensure strict comparability with prior state-of-the-art (SOTA) studies.

Performance is reported as:

• Mean across resamples (per dataset),
• Then macro-averaged across datasets.

This avoids dataset-size bias and follows established large-scale evaluation protocols.

B. Training Configuration

All models are trained using the Adam optimizer with cross-entropy loss for a maximum of 1500 epochs. Early stopping is applied based on training loss with a fixed patience parameter. The best model state is restored before evaluation.

Batch size is automatically selected as a function of dataset workload (N × L), with dynamic adjustment to prevent GPU out-of-memory failures.

C. Evaluation Metrics

For each resample, we compute Accuracy, Macro F1-score, AUC (computed for both binary and multi-class cases), NLL, and training and test time.

Final rankings and statistical comparisons are conducted using the Friedman test with Nemenyi post-hoc analysis.

V. RESULTS

We evaluate four primary architectures across 142 UCR/UEA datasets using 30 Monte Carlo resamples per dataset. Performance is measured using accuracy, macro-F1, AUC, and NLL. Training and test times are also recorded to assess computational efficiency.

[Figure 2: critical difference diagram over LMRMS-Net, LiteMV, MRMS-Net, and Lite.]

Fig. 2: CD diagram based on accuracy rankings across 142 datasets. Lower ranks indicate better performance. Methods connected by a horizontal bar are not significantly different according to the Nemenyi test.
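The rankings behind a CD diagram such as Fig. 2 are per-dataset ranks averaged over datasets. The sketch below shows that computation on hypothetical accuracies; the values are not from the paper, the function names are illustrative, and ties share the mean rank, as is conventional for the Friedman test.

```python
def rank_row(scores):
    """Rank one dataset's scores: 1 = best (highest); ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of models tied with the one at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def average_ranks(table):
    """Mean rank per model; table[d][m] is the score of model m on dataset d."""
    rows = [rank_row(r) for r in table]
    return [sum(col) / len(rows) for col in zip(*rows)]

# Hypothetical accuracies: 3 datasets x 3 models.
toy = [
    [0.90, 0.85, 0.80],
    [0.70, 0.75, 0.75],
    [0.60, 0.65, 0.55],
]
avg = average_ranks(toy)
```

In the standard Nemenyi procedure, two of k models compared on N datasets are judged significantly different when their average ranks differ by more than q_α · sqrt(k(k + 1) / (6N)), where q_α is taken from the Studentized range table. Whether the authors' tooling applies exactly this form is an assumption.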
[Figure 3: pairwise comparison matrix reporting, for each pair of models, the mean accuracy difference, win/tie/loss counts (r > c / r = c / r < c), and the Wilcoxon p-value; bold entries indicate p < 0.05. Mean accuracies shown: Lite 0.8278, LiteMV 0.8361, MRMS-Net 0.8275, LMRMS-Net 0.8273.]

Fig. 3: Multi-comparison matrix.

The four architectures compared in detail are:

• Lite (baseline)
• LiteMV (multi-view adaptation)
• LMRMS-Net (Lightweight Multi-Scale Network)
• MRMS-Net (Multi-Scale Network)

All statistical comparisons are performed using the Friedman test followed by Nemenyi post-hoc analysis across 142 datasets.

A. Overall Performance Comparison

Table I reports mean performance across all datasets. LiteMV achieves the highest mean accuracy and macro-F1, while MRMS-Net achieves the best calibration (lowest NLL). LMRMS-Net provides competitive accuracy with significantly reduced computational cost.

B. Statistical Significance Across 142 Datasets

Figure 2 presents the CD diagram based on accuracy rankings across 142 datasets.

The Friedman test indicates statistically significant differences among methods (p < 0.05). The Nemenyi post-hoc test shows: (i) LiteMV ranks first overall, (ii) Lite and MRMS-Net are statistically indistinguishable from LiteMV, and (iii) LMRMS-Net remains competitive but slightly lower in average rank.

Importantly, no architecture dominates all others across every dataset, confirming that improvements are dataset-dependent.

C. Efficiency–Performance Tradeoff

Figure 4 shows the Pareto tradeoff between mean training time and accuracy. Marker size represents AUC, and color encodes NLL. LMRMS-Net lies near the Pareto frontier, achieving near-SOTA accuracy with substantially reduced training time. LiteMV provides the strongest accuracy while maintaining moderate training cost. MRMS-Net achieves superior calibration but at increased computational expense.

[Figure 4: scatter plot of the four models; x-axis: mean training time (seconds); y-axis: mean accuracy; color bar: mean negative log-likelihood (NLL).]

Fig. 4: Pareto tradeoff between mean training time and classification accuracy. Marker size represents mean AUC, while color encodes mean NLL. The dashed curve denotes the Pareto frontier, identifying models that achieve optimal tradeoffs between predictive performance and computational cost.

This demonstrates that scalable multi-scale modeling can be adapted to different operating regimes:

• Accuracy-oriented regime: LiteMV
• Efficiency-oriented regime: LMRMS-Net
• Calibration-oriented regime: MRMS-Net

D. Calibration Analysis

Figure 5 plots accuracy versus NLL. While several models achieve similar accuracy, MRMS-Net consistently achieves lower NLL values, indicating better probabilistic calibration. This suggests that multi-scale feature aggregation contributes not only to classification accuracy but also to improved uncertainty estimation.

[Figure 5: scatter plot of the four models; x-axis: mean negative log-likelihood (NLL); y-axis: mean accuracy; an arrow marks the direction of better calibration and accuracy.]

Fig. 5: Accuracy versus mean NLL across evaluated architectures.

TABLE I: Mean performance across 142 datasets (best value in each column marked with *).

Architecture   Accuracy   F1       AUC      NLL ↓    Train Time (s)   Test Time (s)
Lite           0.828      0.802    0.936    0.675    21.26            0.059
LiteMV         0.836*     0.812*   0.938    0.647    18.72            0.088
LMRMS-Net      0.827      0.801    0.939*   0.677    11.70*           0.027*
MRMS-Net       0.828      0.799    0.938    0.615*   25.35            0.088

E. Impact of Representations

1) Architecture–Representation Interaction: The benefit of representation expansion varies across architectures:

• LiteMV benefits most strongly, likely due to cross-channel interaction.
• MRMS-Net shows stable improvements, indicating inherent robustness to representation diversity.
• LMRMS-Net achieves optimal efficiency under the Minimal setting.

These findings demonstrate that representation diversity interacts with architectural design in non-trivial ways.

2) Efficiency Considerations: Although the Default representation often yields the highest accuracy, it increases training time. The Minimal set captures most gains at significantly lower computational cost, offering a strong tradeoff point.

F. Summary of Findings

Across 142 datasets, the results support three main conclusions:

1) Multi-view representation expansion significantly improves performance over raw time-domain inputs.
2) Scalable multi-scale convolution provides strong calibration benefits.
3) Lightweight variants can achieve near state-of-the-art performance with substantially reduced computational cost.

Overall, combining structured representation diversity with scalable multi-scale architectures forms a robust and efficient framework for TSC.

VI. DISCUSSION

This study provides large-scale empirical evidence across 142 datasets that performance in TSC is governed by three interacting factors: architectural capacity, representation diversity, and computational scalability.

A. Architecture vs. Representation

A key finding is that representation diversity consistently improves performance across all architectures. Moving from raw time-domain inputs to the Minimal representation set yields substantial gains in both accuracy and macro-F1. Expanding further to the Default set provides smaller but consistent improvements, suggesting diminishing returns beyond a compact, informative transformation core.
Interestingly, the magnitude of improvement depends on architectural design. LiteMV benefits the most from representation expansion, indicating that cross-view interactions effectively exploit complementary feature domains. In contrast, MRMS-Net exhibits more stable performance across representation regimes, suggesting inherent robustness due to multi-scale aggregation. LMRMS-Net achieves its best efficiency–accuracy balance under the Minimal setting, indicating that lightweight models benefit most from carefully curated representation subsets.

These results highlight that representation engineering and architectural design should not be treated independently; their interaction is central to scalable TSC performance.

B. Accuracy vs. Calibration

While LiteMV achieves the highest mean accuracy, MRMS-Net consistently attains the lowest NLL, indicating superior probabilistic calibration. This suggests that multi-scale hierarchical aggregation improves uncertainty estimation beyond pure classification accuracy.

This distinction is important for applications requiring reliable confidence estimates, such as medical diagnosis or anomaly detection. The results imply that architectural depth and multi-scale structure contribute differently to discrimination and calibration.

C. Efficiency Considerations

From an efficiency standpoint, LMRMS-Net demonstrates that competitive accuracy can be achieved with substantially reduced training and inference cost. The Pareto analysis shows that LMRMS-Net lies near the efficiency frontier, making it attractive for large-scale or resource-constrained deployments.

Importantly, the representation expansion strategy does not break scalability. The Minimal representation captures most performance gains while preserving computational efficiency, making it a strong default configuration for practical applications.

D. Statistical Robustness

The Friedman and Nemenyi analyses confirm statistically significant differences among architectures. However, no single model dominates across all datasets. This reinforces the importance of reporting average ranks and performing multi-dataset statistical testing rather than relying solely on mean accuracy.

Overall, the results demonstrate that scalable multi-scale convolution combined with structured multi-representation inputs forms a robust and adaptable TSC framework.

VII. CONCLUSION

We introduced a scalable multi-scale convolutional framework for TSC and systematically evaluated its behavior across 142 benchmark datasets. Our contributions can be summarized as follows:

1) We demonstrated that structured representation expansion (Raw → Minimal → Default) consistently improves classification performance.
2) We showed that adapting LiteMV to multi-representation univariate inputs provides strong accuracy gains.
3) We proposed MRMS-Net and LMRMS-Net, scalable multi-scale architectures that balance accuracy, calibration, and computational efficiency.
4) We provided statistically rigorous comparisons using CD analysis and Pareto tradeoff evaluation.

The results reveal that:

• LiteMV achieves the highest mean accuracy across 142 datasets.
• MRMS-Net provides superior calibration performance.
• LMRMS-Net achieves competitive accuracy with significantly reduced training cost.

These findings suggest that combining representation diversity with scalable multi-scale modeling offers a flexible design space that can be tuned for accuracy-oriented, calibration-oriented, or efficiency-oriented regimes.

Future work will investigate adaptive representation selection, dynamic multi-scale attention mechanisms, and extension to large-scale multivariate benchmarks.

Overall, this study establishes that scalable multi-representation multi-scale learning is a principled and practical direction for modern TSC.
REFERENCES

[1] H. A. Dau, A. Bagnall, K. Kamgar, C.-C. M. Yeh, Y. Zhu, S. Gharghabi, C. A. Ratanamahatana, and E. Keogh, “The UCR time series archive,” IEEE/CAA Journal of Automatica Sinica, vol. 6, no. 6, pp. 1293–1305, 2019.
[2] S. Teerapittayanon, B. McDanel, and H.-T. Kung, “BranchyNet: Fast inference via early exiting from deep neural networks,” in 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2016, pp. 2464–2469.
[3] Y.-H. Lee, C.-P. Wei, T.-H. Cheng, and C.-T. Yang, “Nearest-neighbor-based approach to time-series classification,” Decision Support Systems, vol. 53, no. 1, pp. 207–217, 2012.
[4] B. Dhariyal, T. Le Nguyen, and G. Ifrim, “Back to basics: A sanity check on modern time series classification algorithms,” in International Workshop on Advanced Analytics and Learning on Temporal Data. Springer, 2023, pp. 205–229.
[5] A. Shifaz, C. Pelletier, F. Petitjean, and G. I. Webb, “Elastic similarity and distance measures for multivariate time series,” Knowledge and Information Systems, vol. 65, no. 6, pp. 2665–2698, 2023.
[6] Y. Wang, Z. Zhang, L. Feng, Y. Ma, and Q. Du, “A new attention-based CNN approach for crop mapping using time series Sentinel-2 images,” Computers and Electronics in Agriculture, vol. 184, p. 106090, 2021.
[7] X.-M. Le, L. Luo, U. Aickelin, and M.-T. Tran, “ShapeFormer: Shapelet transformer for multivariate time series classification,” in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 1484–1494.
[8] Z. Cui, W. Chen, and Y. Chen, “Multi-scale convolutional neural networks for time series classification,” arXiv preprint, 2016.
[9] H. Ismail Fawaz, B. Lucas, G. Forestier, C. Pelletier, D. F. Schmidt, J. Weber, G. I. Webb, L. Idoumghar, P.-A. Muller, and F. Petitjean, “InceptionTime: Finding AlexNet for time series classification,” Data Mining and Knowledge Discovery, vol. 34, no. 6, pp. 1936–1962, 2020.
[10] W. X. Cheng, P. N. Suganthan, and R. Katuwal, “Time series classification using diversified ensemble deep random vector functional link and ResNet features,” Applied Soft Computing, vol. 112, p. 107826, 2021.
[11] Z. Wang, W. Yan, and T. Oates, “Time series classification from scratch with deep neural networks: A strong baseline,” in 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017, pp. 1578–1585.
[12] C. H. Lubba, S. S. Sethi, P. Knaute, S. R. Schultz, B. D. Fulcher, and N. S. Jones, “catch22: Canonical time-series characteristics: Selected through highly comparative time-series analysis,” Data Mining and Knowledge Discovery, vol. 33, no. 6, pp. 1821–1852, 2019.
[13] M. Middlehurst and A. Bagnall, “The FreshPRINCE: A simple transformation based pipeline time series classifier,” in International Conference on Pattern Recognition and Artificial Intelligence. Springer, 2022, pp. 150–161.
[14] M. Christ, N. Braun, J. Neuffer, and A. W. Kempa-Liehr, “Time series feature extraction on basis of scalable hypothesis tests (tsfresh: a Python package),” Neurocomputing, vol. 307, pp. 72–77, 2018.
[15] D. Li, T. F. Bissyande, J. Klein, and Y. L. Traon, “Time series classification with discrete wavelet transformed data,” International Journal of Software Engineering and Knowledge Engineering, vol. 26, no. 09n10, pp. 1361–1377, 2016.
[16] P. Schäfer, “Scalable time series classification,” Data Mining and Knowledge Discovery, vol. 30, no. 5, pp. 1273–1298, 2016.
[17] V. M. Souza, P. S. Veiga, and A. G. Ribeiro, “Visemble: A fast ensemble approach for time series classification with multiple visual representations,” Knowledge-Based Systems, vol. 309, p. 112864, 2025.
[18] M. Middlehurst, J. Large, M. Flynn, J. Lines, A. Bostrom, and A. Bagnall, “HIVE-COTE 2.0: a new meta ensemble for time series classification,” Machine Learning, vol. 110, no. 11, pp. 3211–3243, 2021.
[19] C. Alagöz, “Crossfire: cross-domain feature integration for robust time series classification,” PeerJ Computer Science, vol. 11, p. e3328, 2025.
[20] A. Dempster, F. Petitjean, and G. I. Webb, “ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels,” Data Mining and Knowledge Discovery, vol. 34, no. 5, pp. 1454–1495, 2020.
[21] C. W. Tan, A. Dempster, C. Bergmeir, and G. I. Webb, “MultiRocket: multiple pooling operators and transformations for fast and effective time series classification,” Data Mining and Knowledge Discovery, vol. 36, no. 5, pp. 1623–1646, 2022.
[22] J. Abdullayev, M. Devanne, C. Meyer, A. Ismail-Fawaz, J. Weber, and G. Forestier, “Enhancing time series classification with diversity-driven neural network ensembles,” in 2025 International Joint Conference on Neural Networks (IJCNN). IEEE, 2025, pp. 1–10.
[23] Z. Yue, Y. Wang, J. Duan, T. Yang, C. Huang, Y. Tong, and B. Xu, “TS2Vec: Towards universal representation of time series,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 8, 2022, pp. 8980–8987.
[24] X. Zhang, Z. Zhao, T. Tsiligkaridis, and M. Zitnik, “Self-supervised contrastive pre-training for time series via time-frequency consistency,” Advances in Neural Information Processing Systems, vol. 35, pp. 3988–4003, 2022.
[25] A. Ismail-Fawaz, M. Devanne, S. Berretti, J. Weber, and G. Forestier, “Look into the LITE in deep learning for time series classification,” International Journal of Data Science and Analytics, vol. 20, no. 4, pp. 4029–4049, 2025.
[26] A. Ismail Fawaz, M. Devanne, S. Berretti, J. Weber, and G. Forestier, “LITE: Light inception with boosting techniques for time series classification,” in 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2023, pp. 1–10.
[27] W. Tang, G. Long, L. Liu, T. Zhou, M. Blumenstein, and J. Jiang, “Omni-scale CNNs: a simple and effective kernel size configuration for time series classification,” arXiv preprint, 2020.
