Turning Time Series into Algebraic Equations: Symbolic Machine Learning for Interpretable Modeling of Chaotic Time Series



Madhurima Panja¹,†, Grace Younes¹,†, and Tanujit Chakraborty¹,²,∗

¹ SAFIR, Sorbonne University Abu Dhabi, UAE
² Sorbonne Center for Artificial Intelligence, Sorbonne University, Paris, France
† Joint First Author

March 10, 2026

Abstract

Chaotic time series are notoriously difficult to forecast: small uncertainties in initial conditions amplify rapidly, while strong nonlinearities and regime-dependent variability constrain predictability. Although modern deep learning often delivers strong short-horizon accuracy, its black-box nature limits scientific insight and practical trust in settings where understanding the underlying dynamics matters. To address this gap, we propose two complementary symbolic forecasters that learn explicit, interpretable algebraic equations from chaotic time series data. The Symbolic Neural Forecaster (SyNF) adapts a neural network–based equation learning architecture to the forecasting setting, enabling end-to-end differentiable discovery of compact and interpretable algebraic relations. The Symbolic Tree Forecaster (SyTF) builds on evolutionary symbolic regression to search directly over equation structures under a principled accuracy–complexity trade-off. We evaluate both approaches in a rolling-window nowcasting setting (one-step-ahead) using several accuracy metrics and compare against a broad suite of baselines spanning classical statistical models, tree ensembles, and state-of-the-art deep learning architectures. Numerical experiments cover a benchmark of 132 low-dimensional chaotic attractors and two real-world chaotic time series (weekly dengue incidence in San Juan and the Niño 3.4 sea-surface temperature index).
Across datasets, symbolic forecasters achieve competitive one-step-ahead accuracy while providing transparent equations that reveal salient aspects of the underlying dynamics.

1 Introduction

Forecasting complex dynamical systems remains one of the central challenges in modern science [57]. In domains ranging from epidemiology and climate to ecology and networked systems, the evolution of observable quantities is governed by nonlinear interactions, regime changes, and feedback mechanisms that are only partially understood [3]. Classical forecasting approaches typically fall into two broad categories: empirical models that fit observed trajectories directly [28] and mechanistic models derived from governing differential equations [9]. While empirical models often provide short-term predictive accuracy, they may lack structural validity outside the calibration window. Conversely, mechanistic models offer interpretability and theoretical grounding but require strong prior assumptions and involve challenging inverse problems for parameter estimation [62].

∗ Email: tanujit.chakraborty@sorbonne.ae

Recent advances in data-driven modeling have sought to bridge this gap. The emerging paradigm of data-driven equation discovery aims to infer explicit symbolic representations of governing equations directly from observational data, combining the predictive flexibility of machine learning with the transparency of mathematical models [59]. In contrast to black-box predictive architectures, equation discovery methods yield compact algebraic or differential expressions that are interpretable, analytically tractable, and potentially generalizable across regimes. This paradigm has gained momentum across disciplines, including physics, geosciences, and epidemiology, where uncovering hidden dynamical structure is as important as forecasting accuracy.
Among modern approaches, neural-symbolic regression has emerged as a powerful strategy for discovering governing equations in high-dimensional and networked systems [27]. By combining neural networks for nonlinear representation learning with symbolic regression for equation parsing, these methods reduce dimensionality while retaining interpretability. However, despite rapid progress in discovering governing equations of dynamical systems, the role of symbolic regression in time-series forecasting, particularly under chaotic dynamics, remains underexplored. Most existing works focus on recovering differential equations or static functional relationships, rather than evaluating predictive performance in rolling forecasting settings.

Chaotic systems present a particularly stringent challenge for forecasting methodologies [32]. Their sensitive dependence on initial conditions limits long-horizon predictability, while nonlinear interactions and regime-dependent behaviors restrict the use of conventional linear or purely data-driven models [3]. In practice, forecasters often observe only partial system states, further complicating equation recovery. This raises a critical question: Can symbolic regression discover compact, interpretable forecasting equations that remain competitive with modern machine learning models in chaotic time-series prediction? Furthermore, chaotic forecasting is not merely a theoretical challenge but a problem of substantial practical importance. Applications range from predicting the El Niño–Southern Oscillation [1] and tracking dengue fever outbreaks [2] to anticipating rogue ocean waves [4] and monitoring instability in financial markets [36]. In each of these domains, uncertainty amplifies rapidly, and nonlinear feedback mechanisms dominate system evolution.
Improving short-term predictive accuracy in such settings therefore carries significant societal and economic implications, particularly when forecasts inform public health interventions and climate risk management.

Modern data-driven approaches such as reservoir computing [29] and deep recurrent networks [26] have demonstrated strong short-term forecasting ability in chaotic settings, yet they remain largely "black-box" models. While effective locally, their long-horizon forecasts deteriorate rapidly, and their performance is often sensitive to noise, data scarcity, and hyperparameter tuning. Even recent long-sequence architectures, including Transformers and their time-series variants [64, 72], demand substantial computational resources while offering little structural interpretability. To address these concerns, methods grounded in dynamical systems theory have been explored. Phase-space reconstruction, based on Takens' embedding theorem and its extensions [61, 55, 18], recovers attractor geometry through delay embeddings, but may struggle with complex nonlinear couplings, even with refinements such as Ma's inverse delayed embedding [60]. Analytical indicators can provide early-warning signals when governing equations are known [16, 3], yet lack general applicability, while vector-field reconstruction estimates the local flow directly from data but deteriorates under sparse or noisy observations [20]. These strands reveal a persistent tension: highly accurate models often lack interpretability, whereas transparent dynamical techniques can fail in complex, data-limited environments. Recent advances in sparse system identification, notably SINDy [10], and perturbation-based learning approaches [40, 19] demonstrate that compact, closed-form equations can be inferred directly from data, rekindling interest in symbolic techniques for interpretable forecasting.
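The delay-embedding idea behind Takens-style phase-space reconstruction, mentioned above, can be sketched in a few lines. This is an illustrative snippet, not code from the paper; the function name `delay_embed` and the parameters `dim` (embedding dimension) and `tau` (delay) are our notation.

```python
def delay_embed(series, dim, tau):
    """Delay vectors [x_t, x_{t-tau}, ..., x_{t-(dim-1)*tau}], one per time t
    late enough that all dim coordinates exist in the observed series."""
    start = (dim - 1) * tau
    return [[series[t - k * tau] for k in range(dim)] for t in range(start, len(series))]

# toy check on the scalar series 0, 1, ..., 9
vectors = delay_embed(list(range(10)), dim=3, tau=2)
# first delay vector: [4, 2, 0]; six vectors in total
```

In practice, choosing `dim` and `tau` (e.g., via false nearest neighbors and mutual information) is the hard part; this sketch only shows the reconstruction step itself.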
Symbolic regression (SR) provides a natural resolution to this problem by jointly minimizing prediction error and model complexity, thereby uncovering concise formulas that both fit observed data and expose the underlying functional relationships [35, 56]. Although the search problem is NP-hard [58, 66] and computationally intensive, SR offers several important advantages: it does not impose a predefined model structure, enabling the discovery of novel functional forms; it yields compact algebraic equations that enhance interpretability [34, 68]; and it often exhibits strong generalization, particularly in data-scarce regimes [70]. Despite recent methodological advances, symbolic regression has not been systematically benchmarked for chaotic time-series forecasting across a broad and diverse set of systems. In particular, its performance relative to modern forecasting architectures remains largely unexplored.

We address this gap by assembling a curated benchmark of 132 low-dimensional chaotic attractors, including classical systems such as Lorenz and Rössler, following [24, 22, 23]. In this work, we adapt symbolic frameworks to discrete time-series forecasting, enabling the extraction of interpretable dynamical relations directly from lagged observations. We propose two complementary symbolic forecasters that learn explicit algebraic mappings from past observations to future states. The first is a neural-symbolic architecture, the Symbolic Neural Forecaster (SyNF), which extends the Equation Learner (EQL) framework [54] to the forecasting setting through end-to-end differentiable training. The second approach, the Symbolic Tree Forecaster (SyTF), builds on the PySR symbolic regression library [15] and employs evolutionary symbolic regression to search over expression trees under an explicit accuracy–complexity trade-off.
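The accuracy–complexity trade-off at the heart of SR can be made concrete with a simple scalarization: score each candidate equation by MSE plus a parsimony weight times its complexity, and keep the minimizer. The candidate formulas, MSE values, and complexities below are invented for illustration; real SR systems (PySR included) use more refined model-selection rules, such as Pareto-front heuristics.

```python
def select_equation(candidates, parsimony=0.01):
    """Scalarized trade-off: pick the candidate minimizing MSE + parsimony * complexity."""
    return min(candidates, key=lambda c: c["mse"] + parsimony * c["complexity"])

# hypothetical Pareto-style candidates: better fit costs more complexity
candidates = [
    {"expr": "0.96*y[t-1]",                  "mse": 0.040, "complexity": 3},
    {"expr": "0.94*y[t-1] + 0.01*y[t-1]**2", "mse": 0.038, "complexity": 7},
    {"expr": "15-term polynomial",           "mse": 0.037, "complexity": 31},
]
parsimonious = select_equation(candidates)                # favors the simplest formula
flexible = select_equation(candidates, parsimony=0.0001)  # tolerates more complexity
```

Raising the parsimony weight pushes the selection toward shorter equations; lowering it lets marginal accuracy gains justify extra terms.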
By reframing symbolic regression as a forecasting methodology rather than solely a scientific discovery tool, this work advances the development of forecasting models that are not only competitive in accuracy but also transparent and mechanistically meaningful. We evaluate both methods on a nowcasting task (one-step-ahead prediction) and compare their performance, using several error metrics, against a broad suite of baselines including deep-learning architectures and tree-based regressors. To evaluate real-world applicability, we additionally include two empirical time series: the El Niño SST index [51] and weekly dengue cases in San Juan [11].

The rest of this paper is structured as follows. Section 2 reviews background material on symbolic regression. Section 3 introduces our benchmarking datasets: 132 low-dimensional chaotic attractors (e.g., Lorenz, Rössler, Chua) along with two real-world series (the El Niño 3.4 index and dengue cases in San Juan). Section 4 presents our proposed methodology. Section 5 outlines our experimental evaluation on both synthetic and real-world datasets. Finally, Section 6 concludes the paper with a discussion of the limitations and future directions of this study.

2 Background: Symbolic Regression

Symbolic Regression (SR) is an emerging subfield of machine learning that aims to discover symbolic mathematical expressions describing the relationships within data, wherein both the parameters and the functional structure of the analytical model are simultaneously optimized [56, 41]. Unlike conventional deep learning architectures, SR emphasizes the trade-off between predictive accuracy and interpretability. A model is said to be interpretable if the relationship between model inputs and outputs can be traced concisely through structured analytical equations [37].
As data-driven methodologies are increasingly applied in scientific disciplines and high-stakes decision-making, the need for interpretable models has become more critical. In domains such as physics, interpretable mathematical models grounded in first principles facilitate reasoning about the underlying mechanisms in ways that opaque predictive models, such as deep neural networks, cannot. Similarly, in high-stakes decision fields such as public health, the deployment of accurate but non-interpretable models remains limited due to safety and accountability concerns [67].

Consider a dataset D = {(x_i, z_i)}_{i=1}^N, where x ∈ R^q denotes the input features and z represents the target. A linear model z = θ^⊤ x is fully interpretable, since the contribution of each input can be read off from its coefficient. However, this model is often structurally too restrictive to capture complex nonlinear patterns. In contrast, a neural network z = NN(x; θ) can model complex relationships but remains opaque, as the mapping from x to z is entangled within multiple layers of learned transformations. SR addresses this gap by seeking a mapping ẑ(x) = α̂(x, β̂) that is both accurate and interpretable. By assuming an underlying analytical relation z(x) = α*(x, β*) + ϵ, SR jointly explores the space of candidate expressions α* and their associated parameters β*. In this way, SR directly targets the recovery of an explicit functional form that reflects the true structure of the data-generating process, achieving interpretability through model transparency rather than post-hoc explanation [53].

A variety of algorithmic frameworks have been developed to make the SR search process computationally feasible and scalable.
These methods can be broadly classified into four categories: exhaustive (brute-force) search, genetic programming, sparse regression, and deep learning–based frameworks, each with distinct strengths and limitations. Exhaustive search approaches, such as BACON [38] and FAHRENHEIT [39], systematically generate and evaluate candidate expressions, successfully rediscovering empirical laws from idealized datasets. However, their reliance on heuristics and the exponential growth of the search space make brute-force methods infeasible for high-dimensional, complex nonlinear problems. Genetic programming (GP)–based SR methods iteratively evolve candidate expressions using mutation, crossover, and selection [35]. Modern variants, including PySR [15], GP-GOMEA [65], and QLattice [7], can explore large, unstructured model spaces and uncover complex nonlinear relationships. Despite their flexibility, GP-based methods often suffer from expression bloat, are computationally intensive in high dimensions, and are sensitive to hyperparameters, which limits scalability in real-world applications. Sparse regression approaches, on the other hand, assume that systems can be described using a few active terms. Exact methods based on Mixed-Integer Nonlinear Programming (MINLP) [14] guarantee sparsity but quickly become intractable as the feature library (in terms of the size of the feature space and the number of mathematical operations) grows. More scalable ℓ1-regularized variants (e.g., LASSO, elastic net) promote sparsity efficiently but often retain unnecessary terms. The Sparse Identification of Nonlinear Dynamics (SINDy) framework [10] extends sparse regression to dynamical systems by fitting coefficients over a predefined function library; while fast and effective for structured dynamics, its performance is sensitive to noise and to the selection of the library.
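The core loop behind SINDy-style sparse regression, sequentially thresholded least squares (STLSQ), is short enough to sketch. This is an illustrative re-implementation assuming NumPy is available, not the actual SINDy codebase; `stlsq` and the cubic toy problem are ours.

```python
import numpy as np

def stlsq(theta, y, threshold=0.1, iters=10):
    """Sequentially thresholded least squares: alternate a least-squares fit
    over the active library columns with hard-thresholding of small coefficients."""
    xi = np.linalg.lstsq(theta, y, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # prune negligible terms
        if (~small).any():                   # refit the surviving terms
            xi[~small] = np.linalg.lstsq(theta[:, ~small], y, rcond=None)[0]
    return xi

# recover y = 2x - 0.5x^3 from the polynomial library [1, x, x^2, x^3]
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 200)
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
y = 2.0 * x - 0.5 * x**3
xi = stlsq(theta, y)  # close to [0, 2, 0, -0.5]
```

The noise and library sensitivities noted above show up directly here: a poorly chosen `threshold` or a library missing the true terms yields wrong sparsity patterns.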
The Sure Independence Screening and Sparsifying Operator (SISSO) [48] and SyMANTIC [45] methods improve scalability and interpretability through feature screening, recursive expansion, Pareto-front tracking, and GPU acceleration [44], but they remain limited to expressions within the predefined feature set. Deep learning–inspired SR approaches embed symbolic discovery within neural architectures. Recently developed models such as Deep Symbolic Optimization [50], the Equation Learner (EQL) [54], and SR-Transformer [31] optimize both structure and parameters using differentiable networks and sequence-to-sequence models. Hybrid methods like AI-Feynman [63] combine neural fitting with targeted searches informed by domain knowledge. While powerful for capturing complex patterns, these approaches often require large datasets for effective training and can struggle in low-data regimes.

Thus, symbolic regression has emerged as a powerful tool for uncovering governing equations across diverse scientific domains, including physics, biochemistry, ecology, epidemiology, and complex network dynamics [27]. By automatically extracting interpretable mathematical relations from rich observational data, it reveals hidden mechanisms underlying phenomena such as epidemic transmission, social interaction patterns, and nonlinear system evolution, thereby supporting more transparent modeling and informed decision-making [21]. The choice of modeling approach is often guided by the structure of the problem and the degree of prior knowledge available [59]. When the objective is to discover complex, nonlinear equations with minimal prior assumptions, symbolic regression provides a flexible framework for uncovering explicit mathematical relationships directly from data [27].
In contrast, when substantial structural knowledge is available, such as when seeking to identify governing ODEs or PDEs, sparse regression methods offer a principled way to recover parsimonious dynamical systems. Finally, for problems involving high-dimensional interactions and rich nonlinear dependencies, equation learner networks provide a scalable neural-symbolic alternative that balances expressive power with structural interpretability. These variations in SR methods provide the context for symbolic forecasting approaches that leverage the strengths of GP-based algorithms, such as PySR, and neural-symbolic architectures like EQL, to discover interpretable and accurate models for sequential learning tasks on time series datasets.

3 Motivating Examples

In this study, we evaluate the performance of symbolic forecasting approaches relative to baseline architectures by assessing their modeling and forecasting capabilities across a diverse set of datasets. We consider two distinct categories of data: synthetic time series generated from chaotic dynamical systems and complex real-world observations drawn from applied domains. This dual evaluation strategy allows us to demonstrate both the theoretical robustness of the symbolic forecasters and their practical relevance. In this section, we briefly describe these two data categories and analyze their global characteristics.

3.1 Synthetic Dataset from Chaotic Attractors

Figure 1: A dataset of 132 distinct low-dimensional chaotic systems, colored by largest Lyapunov exponent (λ_max).

In our analysis, we consider 132 low-dimensional chaotic system datasets collected from the publicly available 'dysts' repository [22, 23].
These dynamical systems encompass a wide range of applied domains, including climatology, neuroscience, astrophysics, and ecology, thereby providing a diverse benchmark for evaluating forecasting performance on simulated observations. Each dataset is univariate and corresponds to a fine-grained temporal granularity, with 100 points sampled per period. The time series for each system is obtained by numerically integrating the canonical differential equations that define the underlying chaotic dynamics. The parameters and initial conditions are selected based on the literature to ensure sustained chaotic behavior [22]. By slightly perturbing the initial conditions for each system, we capture the intrinsic sensitivity to initial states characteristic of chaotic systems, resulting in diverse yet structurally consistent trajectories. These attractors provide a controlled yet complex benchmark for evaluating forecasting and data-driven modeling approaches. In our study, we simulate each chaotic attractor series with 1200 observations and split each series chronologically into the first 1000 observations for training and the remaining 200 observations for testing. This preserves temporal dependencies and avoids data leakage during forecast evaluation.

Furthermore, to characterize the chaotic behavior of each simulated dataset, we compute the largest Lyapunov exponent (λ_max). This exponent quantifies the average exponential rate of divergence between nearby trajectories in the system's state space. A positive value of λ_max indicates chaotic dynamics, a value of zero corresponds to neutral stability, often associated with quasi-periodicity, and a negative value represents stable behavior where trajectories converge to fixed points.
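For a one-dimensional map, λ_max reduces to the orbit average of log|f′(x)|, which makes the definition concrete. The sketch below estimates it for the logistic map (a standard textbook case, not one of the flows in the benchmark, where Lyapunov estimation is more involved); the function name and settings are ours.

```python
import math

def logistic_lyapunov(r=4.0, x0=0.2, n=100_000, transient=1_000):
    """Largest Lyapunov exponent of the logistic map x -> r*x*(1-x),
    estimated as the orbit average of log|f'(x)| with f'(x) = r - 2*r*x."""
    x = x0
    for _ in range(transient):      # discard the transient before averaging
        x = r * x * (1.0 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(r - 2.0 * r * x) + 1e-300)  # guard log(0)
        x = r * x * (1.0 - x)
    return total / n

lam = logistic_lyapunov()  # for r = 4 the exact value is ln 2 ≈ 0.693
```

A positive estimate, as here, signals exponential divergence of nearby orbits, exactly the criterion used to color Figure 1.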
Figure 1 provides a visual representation of all the dynamical systems, color-coded according to the values of the largest Lyapunov exponent (λ_max). As shown in the figure, all synthetic datasets have positive values of λ_max, indicating that they exhibit significant chaotic behavior.

3.2 Real-World Time Series

Alongside the synthetic datasets from chaotic attractors, we consider two real-world datasets from the epidemic and climate domains, namely the San Juan dengue cases and the El Niño sea-surface temperature (SST). The San Juan dengue dataset comprises the total number of dengue infections reported weekly in the Puerto Rico region from 1990 to 2013. This dataset has been widely used in epidemic forecasting studies due to its complex structural patterns and strong seasonal cycles [30, 49]. The El Niño SST dataset characterizes the El Niño–Southern Oscillation, a recurrent disruption in the equatorial Pacific Ocean driven by interactions between atmospheric and oceanic circulations and marked by elevated sea-surface temperatures [51]. This phenomenon profoundly influences global climate variability and marine ecosystems, making accurate SST forecasting a major challenge in climate science. In our analysis, we consider weekly SST data recorded in the Niño 3.4 region spanning January 3, 1990, to April 21, 2021.

In our study, we perform several preliminary analyses to examine the global behavior of these real-world datasets. To evaluate the performance of the symbolic forecasting approaches, each dataset is split chronologically into training and testing subsets. For the San Juan dengue dataset, the first 1,144 observations are used for training and the remaining 52 observations for testing, while for the El Niño SST dataset, 1,582 observations are allocated for training and 52 for testing.
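The chronological split and the one-step-ahead rolling-window protocol used throughout the paper can be sketched as follows. This is an illustrative snippet with our own function names, not the authors' evaluation code: at each test time the model sees the true observed lags, so every forecast is genuinely one step ahead.

```python
def make_lagged_dataset(series, p):
    """Pairs (lag vector, target): X[i] = (y_{t-1}, ..., y_{t-p}), target y_t."""
    X = [[series[t - k] for k in range(1, p + 1)] for t in range(p, len(series))]
    y = [series[t] for t in range(p, len(series))]
    return X, y

def rolling_one_step(series, n_train, psi, p):
    """Rolling-window nowcasts over the test segment: each y_t is predicted
    from the p *observed* values immediately preceding it."""
    return [psi([series[t - k] for k in range(1, p + 1)])
            for t in range(n_train, len(series))]

# toy series: train on the first 3 points, nowcast the last 2
X, y = make_lagged_dataset([1, 2, 3, 4, 5], p=2)
# X == [[2, 1], [3, 2], [4, 3]], y == [3, 4, 5]
preds = rolling_one_step([1, 2, 3, 4, 5], n_train=3, psi=lambda lags: lags[0], p=1)
# persistence baseline: preds == [3, 4]
```

With the splits quoted above, `n_train` would be 1,144 (dengue) or 1,582 (SST), and `psi` would be the fitted symbolic equation.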
Figure 2 visualizes the temporal patterns of these real-world datasets along with their autocorrelation function (ACF) and partial autocorrelation function (PACF). As highlighted in the plot, the ACF and PACF reveal distinct underlying dynamics for these datasets. For the San Juan dengue data, the ACF decays rapidly to near zero after a few lags, while the PACF displays a significant correlation at lag 1, followed by negligible values. This pattern indicates that the epidemic incidence series exhibits short-memory behavior, with only limited dependence on past observations. In contrast, the El Niño SST dataset shows a slowly decaying, sinusoidal ACF pattern consistent with seasonal or cyclic dynamics, while its PACF again highlights a strong influence at lag 1 and weaker effects at higher lags.

To analyze the global characteristics of these real-world datasets, we examine several statistical features, namely non-stationarity, nonlinearity, seasonality, normality, and chaotic dynamics. These features offer comprehensive insights into the underlying structure and dynamical patterns of the time series. To evaluate nonlinearity, we employ Teräsvirta's neural network test, which examines the null hypothesis that the observed time series follows a linear structure. Stationarity is another fundamental property of a time series, guaranteeing that statistical features of the series, such as the mean and variance, do not change over time. To assess stationarity, we employ the Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test. We also examine seasonality, which captures recurring patterns or periodic fluctuations at fixed intervals, using Ollech and Webel's combined test.
Additionally, normality of the time series is assessed with the Anderson–Darling test, which determines whether the series follows a Gaussian distribution, with deviations indicating skewness, heavy tails, or other structural irregularities. The results of these statistical tests, reported in Table 1, suggest that both observed series exhibit stationary patterns and chaotic behavior, and deviate significantly from normality. The San Juan dengue dataset exhibits nonlinear dynamics with seasonal fluctuations at regular intervals, while the El Niño SST dataset exhibits linear trends coupled with seasonal variations. Overall, these observations highlight the inherent temporal structures across both real-world datasets and underscore the need for forecasting methods that can effectively balance accuracy with interpretability.

Figure 2: Real-world datasets along with their autocorrelation function (ACF) and partial autocorrelation function (PACF).

Table 1: Statistical characteristics of real-world datasets. In the table, a ✓ denotes the presence of a feature, while a ✗ denotes its absence. Values in parentheses represent the p-values from the corresponding statistical tests and the maximum Lyapunov exponent.

Data             | Time Span | Length | Granularity | Linearity | Stationarity | Normality | Seasonality | Chaotic
San Juan Dengue  | 1990–2013 | 1196   | Weekly      | ✗ (0.026) | ✓ (0.098)    | ✗ (0.001) | ✓ (0.001)   | ✓ (0.001)
El Niño SST      | 1990–2021 | 1634   | Weekly      | ✓ (0.777) | ✓ (0.100)    | ✗ (0.001) | ✓ (0.001)   | ✓ (0.001)

4 Symbolic Forecasting Approaches

This section provides the formulation of the symbolic forecasting techniques, which aim to combine strong predictive performance with model interpretability for time series forecasting. We introduce two distinct approaches, SyNF and SyTF, based respectively on the neural training and evolutionary search techniques of symbolic regression models.
SyNF framework adopts the EQL architecture [42] to time series forecasting, where the conv en tional neural netw ork activ ations (e.g., ReLU, tanh) are replaced with a set of unary and binary mathematical op erations. These op erations serve as sym b olic building blo c ks that the mo del combines into larger expressions as training progresses. Since the en tire arc hitecture remains differen tiable, the netw ork can b e trained end-to-end while simultaneously pro ducing explicit, 7 in terpretable equations expressed in terms of lagged input features. Our second approach, the genetic-programming-based SyTF, builds on the PySR framework [15]. SyTF employs an ev olutionary search pro cess inspired b y natural selection, where candidate mathematical expressions are constructed from a predefined library of basic op erations, such as addition, subtraction, m ultiplication, division, and elementary nonlinear or trigonometric functions. These expression trees are iterativ ely mutated, recombined, and selected based on their predictive accuracy and mo del complexit y . Similar to SyNF, SyTF models temporal dependencies through lagged inputs, but unlike the d ifferentiable arc hitecture of SyNF, it explores a broad, unstructured model space using ev olutionary operators without relying on gradient-based training. T ogether, SyNF and SyTF offer tw o distinct yet complemen tary p erspectives on symbolic forecasting. Figure 3 illustrates the schematic arc hitecture of the symbolic forecasting approac hes. The upp er panel (I) depicts the SyNF framew ork, where a neural optimization- driv en mechanism is utilized for discov ering interpretable predictive expressions, while the low er panel (I I) demonstrates the workflo w of the SyTF mo del and reflects the strengths of evolutionary exploration in navigating large sym b olic search spaces. 
Through these two distinct approaches, we demonstrate how symbolic forecasting frameworks can yield accurate, transparent, and interpretable models across both synthetic chaotic systems and real-world datasets.

4.1 Symbolic Neural Forecaster

Given a univariate input series {y_1, y_2, ..., y_T}, the SyNF framework assumes that the observation at time t can be expressed as

y_t = Φ(y_{t-1}) + ξ_t,

where y_{t-1} = (y_{t-1}, y_{t-2}, ..., y_{t-p})^⊤ ∈ R^p contains the p lagged observations, Φ : R^p → R denotes the true underlying functional mapping, and ξ_t is additive noise. The objective is to learn a compact and interpretable approximation Ψ : R^p → R that minimizes the expected squared error

E[ (Ψ(y_{t-1}) − Φ(y_{t-1}))^2 ].

Since the expected loss is unknown, we estimate it using the empirical risk

(1 / (T − p)) Σ_{t=p+1}^{T} (Ψ(y_{t-1}) − y_t)^2.

The resulting analytical approximation Ψ(·) provides the one-step-ahead forecast of the observed series, ŷ_{T+1} = Ψ(y_T).

To learn Ψ(·) from historical data, SyNF modifies a standard feed-forward neural network by replacing conventional activations with fixed algebraic base functions, including sine, cosine, identity, and multiplication [54]. These operations act as symbolic building blocks that can be combined within a differentiable architecture, enabling gradient-based training while still producing interpretable expressions. SyNF employs a single hidden-layer architecture. A linear transformation maps the lag-embedded input y_{t-1} ∈ R^p to a d-dimensional intermediate representation

z^{(1)}_{t-1} = W^{(1)} y_{t-1} + b^{(1)},

where W^{(1)} ∈ R^{d×p} and b^{(1)} ∈ R^d are learnable parameters. The hidden layer consists of u unary functions f_i : R → R and v binary functions g_j : R × R → R, applied to the components of z^{(1)}_{t-1} = [z_{t-1,1}, ..., z_{t-1,d}], where d = u + 2v.
The unary operations process the first u components, while the binary operations process the remaining components in pairs. The hidden output y^{(1)}_{t-1} ∈ R^k is obtained by concatenating the symbolic outputs as

y^{(1)}_{t-1} := [ f_1(z_{t-1,1}), f_2(z_{t-1,2}), ..., f_u(z_{t-1,u}), g_1(z_{t-1,u+1}, z_{t-1,u+2}), ..., g_v(z_{t-1,d-1}, z_{t-1,d}) ]^⊤,   (1)

where k = u + v.

Figure 3: Symbolic Forecasting Architectures. The upper panel illustrates the architecture of the Symbolic Neural Forecaster (SyNF), where a differentiable neural training mechanism is used to learn interpretable forecasting equations from symbolic operators and lagged inputs. The lower panel presents the workflow of the Symbolic Tree Forecaster (SyTF), which employs an evolutionary search to construct and refine symbolic expression trees based on predictive performance and interpretability. The demonstration is conducted on a simulated dataset. The mathematical expression obtained from the SyNF model is ŷ_t = 0.1110 + 0.9421 y_{t-1} + 0.0001 (y_{t-1})^2 (top image), whereas the SyTF architecture yields ŷ_{t,SyTF} = 0.9613 y_{t-1} (bottom image) using a single historical lagged observation. The one-step-ahead rolling-window forecasts are also presented in the upper and lower panels for the simulated example datasets.

In the SyNF framework, to enhance model interpretability and robustness, we use three unary functions, f_i ∈ {identity, sine, cosine}, with the binary operations given by element-wise multiplication. The one-step-ahead prediction is generated from the output layer by applying a linear transformation to the hidden activation y^{(1)}_{t-1}:

ŷ_t = W^{(2)} y^{(1)}_{t-1} + b^{(2)},

where W^{(2)} ∈ R^{1×k} and b^{(2)} ∈ R are the learnable parameters of the output layer.
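The forward pass just described (linear map, symbolic hidden layer, linear read-out) can be sketched numerically. This is an illustrative re-implementation in plain Python with our own function name, not the authors' trainable model; it computes a single prediction from fixed weights.

```python
import math

def synf_forward(x, W1, b1, unary, n_binary, W2, b2):
    """SyNF-style forward pass: z = W1 x + b1; the hidden layer applies the u
    unary functions to the first u components of z and element-wise products
    to the remaining 2v components taken in pairs; a linear read-out follows."""
    z = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W1, b1)]
    u = len(unary)
    hidden = [f(z[i]) for i, f in enumerate(unary)]
    hidden += [z[u + 2 * j] * z[u + 2 * j + 1] for j in range(n_binary)]
    return sum(wk * hk for wk, hk in zip(W2, hidden)) + b2

# toy instance: p = 2 lags, u = 3 unary units (identity, sin, cos), v = 1 product unit
W1 = [[1, 0], [0, 1], [1, 0], [1, 0], [0, 1]]   # d = u + 2v = 5 rows
b1 = [0.0] * 5
W2 = [1.0, 0.0, 0.0, 1.0]                        # k = u + v = 4 read-out weights
out = synf_forward([1.0, 2.0], W1, b1, [lambda v: v, math.sin, math.cos], 1, W2, 0.0)
# hidden = [1, sin(2), cos(1), 1*2]; out = 1 + 2 = 3
```

In SyNF the weights are learned by backpropagation, and reading the trained weights off this structure directly yields the algebraic forecasting equation.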
Hence, by stacking several symbolic units, SyNF effectively constructs weighted combinations of elementary algebraic terms, enabling interpretable equation discovery while maintaining strong predictive capability. The base SyNF architecture is trained by minimizing the mean squared error (MSE),

MSE = (1/(T−p)) Σ_{t=p+1}^{T} (y_t − ŷ_t)²,

which is differentiable in its parameters {W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}}. Training is performed using gradient-descent backpropagation with the Adam optimizer, without regularization. To prevent overfitting and promote parsimony, we additionally introduce a regularized variant, SyNF-Reg, which augments the MSE loss with an ℓ1 penalty,

MSE + λ (∥W^{(1)}∥_1 + ∥W^{(2)}∥_1),

where λ controls the sparsity level and encourages the removal of redundant symbolic components.

SyNF-Div Framework. While the SyNF architecture captures a broad class of nonlinear relationships through additive, multiplicative, and trigonometric operations, many real-world and dynamical systems inherently rely on division-based functional forms, such as rational dependencies, saturation dynamics, and feedback mechanisms. To model such behavior, we extend the SyNF framework by introducing learnable division units, following the design principles of the EQL÷ model [54]. A critical challenge in introducing division operations is the numerical instability that occurs when denominators approach zero. The SyNF-Div family addresses this by penalizing small denominators and gradually relaxing this constraint during training, allowing the network to learn meaningful rational structures while maintaining stable optimization. This results in two variants: SyNF-Div, which incorporates division into the symbolic function set, and SyNF-Div-Reg, which further integrates ℓ1 regularization to promote sparsity and interpretability.
The SyNF-Div framework retains the single-hidden-layer feed-forward architecture of the SyNF model, with the output layer augmented by a division operation. Specifically, the hidden activation y^{(1)}_{t−1} (as in Eq. (1)) is linearly transformed as z^{(2)}_{t−1} = W^{(2)}_* y^{(1)}_{t−1} + b^{(2)}_*, where z^{(2)}_{t−1} = [z^{(2)}_{t−1,1}, z^{(2)}_{t−1,2}] ∈ R^2 and {W^{(2)}_* ∈ R^{2×k}, b^{(2)}_* ∈ R^2} are the learnable parameters. The one-step-ahead prediction is generated from the SyNF-Div model by applying the division operation h_γ to z^{(2)}_{t−1}:

ŷ_{t,*} = h_γ(z^{(2)}_{t−1,1}, z^{(2)}_{t−1,2}) = z^{(2)}_{t−1,1} / z^{(2)}_{t−1,2} if z^{(2)}_{t−1,2} > γ, and 0 if z^{(2)}_{t−1,2} ≤ γ, (2)

where ŷ_{t,*} is the forecast generated by the SyNF-Div framework and γ ≥ 0 is a threshold controlling numerical stability: if the denominator falls below γ, the output and its gradient are set to zero. Additionally, to prevent division by zero, a penalty term is added that penalizes small or negative denominators. For each denominator activation z^{(2)}_{t−1,2}, a local penalty is defined as

p_γ(z^{(2)}_{t−1,2}) := max(γ − z^{(2)}_{t−1,2}, 0), (3)

where γ is the stability threshold of Eq. (2). This penalty activates whenever the denominator falls below γ, and summing it over all training samples yields the global penalty

P_γ = Σ_{t=p+1}^{T} p_γ(z^{(2)}_{t−1,2}). (4)

While Eq. (4) discourages negative denominators during training, additional constraints are required to ensure stable behavior in out-of-sample and extrapolative regimes. To control instability arising from negative denominators or large output magnitudes, penalty epochs are introduced at fixed intervals (every 50 epochs), during which the model is trained with an additional penalty loss

L_Penalty = P_γ + P_bound. (5)

The term P_bound enforces bounded outputs and is given by

P_bound = Σ_{t=p+1}^{T} [ max(Ψ_*(y_{t−1}) − B, 0) + max(−Ψ_*(y_{t−1}) − B, 0) ], (6)

where Ψ_*(y_{t−1}) = ŷ_{t,*} denotes the network's predicted output at time t. Here, B is a user-defined bound on the output magnitude, chosen based on the range of the training data, with results largely insensitive to its exact value. Furthermore, to balance numerical stability against accurate approximation of symbolic expressions, the threshold γ is gradually relaxed during training, with γ(t) = 1/√(t+1). This curriculum ensures safe optimization in the early stages while progressively approximating exact division. During validation and testing, γ is fixed to a small value (10^{−4}).

The SyNF-Div framework is fully differentiable in its parameters {W^{(1)}, W^{(2)}_*, b^{(1)}, b^{(2)}_*}, which allows network training through gradient-descent backpropagation. The SyNF-Div training objective combines predictive accuracy with the global penalty term,

L_SyNF-Div = (1/(T−p)) Σ_{t=p+1}^{T} (y_t − ŷ_{t,*})² + P_γ,

while the SyNF-Div-Reg framework further adds an ℓ1 regularizer to promote sparsity: L_SyNF-Div + λ_* (∥W^{(1)}∥_1 + ∥W^{(2)}_*∥_1). The bound penalty P_bound (Eq. (6)) enters the training of both SyNF-Div and its regularized variant only during the designated penalty epochs. Overall, the SyNF-Div architecture enables the learning of accurate, stable, and interpretable rational structures, extending the symbolic forecasting capabilities of SyNF to a wider range of real-world systems.

4.2 Symbolic Tree Forecaster

The SyTF framework is a multi-population evolutionary algorithm that adopts an evolve-simplify-optimize loop inspired by the PySR architecture [15], specialized for time series forecasting. Given the lagged inputs y_{t−1} = {y_{t−1}, y_{t−2}, ..., y_{t−p}}^⊤ ∈ R^p, the framework builds an interpretable, expression-tree-based symbolic function F to generate the one-step-ahead forecast of the observed time series: ŷ_{t,SyTF} = F(y_{t−1}).

The symbolic mapping F : R^p → R is obtained through evolutionary search over a structured space of mathematical expressions. In SyTF, candidate forecasting functions are represented as expression trees composed of lagged variables, scalar constants, and a predefined set of unary and binary operators (e.g., addition, subtraction, multiplication, division, sin, cos, or exp). These trees evolve across multiple asynchronously interacting subpopulations, which helps preserve diversity during the search. Within each subpopulation, tournament selection [6, 25] is used to choose parents, favoring fitter expressions while maintaining a non-zero probability of selecting weaker candidates. The selected parent is cloned and modified through mutation and crossover operations, and the resulting offspring replaces the oldest candidate in the subpopulation. This age-regularized replacement prevents the population from stagnating and reduces the risk of early convergence to local optima [52]. To further enhance exploration, SyTF integrates a simulated annealing mechanism during mutation [15]. Under this scheme, a structurally modified candidate with fitness L_F may replace its parent with fitness L_E according to the rejection probability

P = exp((L_F − L_E)/(α τ)),

where τ ∈ [0, 1] is a temperature parameter and α controls the temperature scale. This strategy allows the evolution to alternate between high-temperature and low-temperature phases, with high τ increasing the diversity of individuals and low τ narrowing in on the fittest individuals. Alongside the simulated annealing strategy, SyTF performs an evolve-simplify-optimize cycle on the evolutionary process.
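The protected division of Eq. (2), the local penalty of Eq. (3), and the curriculum schedule γ(t) = 1/√(t+1) can be sketched as scalar helpers. In the actual model these operations are applied elementwise inside the differentiable network; this sketch only shows the arithmetic:

```python
import numpy as np

def h_div(num, den, gamma):
    """Protected division of Eq. (2): num/den when the denominator
    exceeds the stability threshold gamma, else 0."""
    return num / den if den > gamma else 0.0

def local_penalty(den, gamma):
    """Local denominator penalty of Eq. (3): max(gamma - den, 0),
    active only when the denominator falls below gamma."""
    return max(gamma - den, 0.0)

def gamma_schedule(epoch):
    """Curriculum threshold gamma(t) = 1/sqrt(t + 1): strict early
    in training, relaxed toward exact division later."""
    return 1.0 / np.sqrt(epoch + 1)
```

Summing `local_penalty` over all training samples gives the global penalty P_γ of Eq. (4).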
In the evolve phase, new candidate equations are generated through repeated application of tournament selection and mutation. The simplify phase then applies algebraic identities to reduce expressions into more compact yet mathematically equivalent forms, constraining the search without discarding useful intermediate structures. In the optimize phase, real-valued constants within the symbolic expressions are fine-tuned using the BFGS algorithm [8, 43], improving predictive accuracy while retaining interpretability.

The SyTF framework controls model complexity through an adaptive parsimony mechanism. In the architecture, the complexity of an expression tree E is measured by its number of nodes, denoted C(E). To discourage overly complex mathematical functions, SyTF incorporates an adaptive penalty that depends on how frequently and how recently expressions of each complexity level have appeared, a measure referred to as 'frecency'. The modified loss function, following [15], is defined as

ℓ(E) = ℓ_pred(E) · exp(frecency[C(E)]),

where ℓ_pred(E) denotes the predictive loss and frecency[c] records how often and how recently expressions of complexity c have appeared, computed with a moving time window and a tunable normalization constant. This adaptive mechanism dynamically regulates search pressure across complexity levels. When expressions of a particular complexity become overrepresented, their effective penalty increases, encouraging exploration of alternative structural forms. Conversely, underrepresented complexity levels incur a smaller penalty, making them more likely to be selected. This balanced exploration reduces the risk of early convergence to either overly simple or excessively complex formulas, thereby improving the likelihood of identifying forecasting models that achieve a practical balance between accuracy and interpretability.
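The frecency-adjusted loss can be sketched as below. As a stand-in for the paper's moving-window record, this sketch uses a simple exponentially decaying counter (the `DECAY` constant is our own hypothetical choice), so it approximates rather than reproduces the exact mechanism:

```python
import math
from collections import Counter

frecency = Counter()
DECAY = 0.99  # hypothetical decay constant standing in for the moving window

def record(complexity):
    """Register a newly created expression of a given complexity."""
    for c in frecency:
        frecency[c] *= DECAY       # older appearances fade out
    frecency[complexity] += 1.0    # recent appearances weigh more

def adjusted_loss(pred_loss, complexity):
    """l(E) = l_pred(E) * exp(frecency[C(E)]): over-represented
    complexity levels receive an inflated effective loss."""
    return pred_loss * math.exp(frecency[complexity])
```

After several expressions of the same complexity are recorded, candidates at that level become more expensive relative to under-explored levels, which is the search-pressure balancing described above.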
Additionally, SyTF adapts the Pareto front strategy for model selection [15]. Instead of returning a single expression, it maintains a set of non-dominated solutions that balance predictive performance (loss) and model simplicity (expression complexity). An expression lies on the Pareto front if no other candidate is both more accurate and simpler. This multi-objective approach allows the selection of forecasting models that achieve an appropriate trade-off between interpretability and performance.

In this study, we develop two variants of the SyTF framework: the base SyTF architecture employs core operators such as addition, subtraction, multiplication, and the trigonometric functions sin and cos, whereas the SyTF-Div-Exp variant enlarges the operator set to include division and exponential functions. The restricted search space of the base model encourages simple expressions that align with the periodic and multiplicative patterns commonly observed in chaotic dynamics. In contrast, the division operator enables the extended variant to capture rational relationships prevalent in many physical and scientific systems, while the exponential function facilitates the modeling of growth and decay processes. By comparing these two configurations, we assess how a richer set of unary and binary operations influences both predictive performance and the interpretability of the resulting forecasting equations. Overall, by explicitly balancing forecasting performance and expression complexity, the SyTF framework produces human-readable mathematical expressions that provide transparent insights into the underlying temporal dynamics of the time series.

Remark 1. The proposed symbolic forecasting frameworks offer different trade-offs between model complexity, interpretability, and computational efficiency.
In practice, the choice of model can be guided by the following considerations:

• SyNF and its variants. The SyNF architecture is more suitable for datasets exhibiting complex nonlinear dynamics. These models tend to produce richer symbolic equations involving polynomial interactions and trigonometric components, allowing them to capture intricate temporal relationships. However, this increased expressive power results in higher computational complexity and longer training time. In practice, SyNF is often more effective for real-world datasets. Several variants further enhance the modeling capability: SyNF-Reg incorporates ℓ1 regularization to encourage sparsity and improve interpretability; SyNF-Div extends the operator library with division operators to represent rational functional relationships commonly observed in oscillatory or physical processes; and SyNF-Div-Reg combines division operations with ℓ1 regularization to balance expressive nonlinear modeling with sparse and interpretable symbolic equations.

• SyTF and its variants. The SyTF framework is preferred when the underlying time series dynamics are relatively simple or when computational efficiency is important. Owing to its evolve-simplify-optimize learning procedure, SyTF typically produces compact symbolic expressions that resemble autoregressive structures. This makes it particularly suitable for simulated datasets or scenarios where simpler symbolic relationships are sufficient. The extended variant SyTF-Div-Exp further enriches the operator set by incorporating division and exponential functions, enabling the model to capture more complex nonlinear patterns while maintaining the compact symbolic structure of the SyTF framework.
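The Pareto-front selection used by SyTF (keeping an expression iff no other candidate is both more accurate and simpler) can be sketched as:

```python
def pareto_front(candidates):
    """Return the non-dominated (loss, complexity) pairs: an expression
    is kept iff no other candidate is at least as good on both axes."""
    front = []
    for loss, comp in candidates:
        dominated = any(l <= loss and c <= comp and (l, c) != (loss, comp)
                        for l, c in candidates)
        if not dominated:
            front.append((loss, comp))
    return sorted(front)
```

A user (or an automated rule) then picks one expression from the returned front according to the desired accuracy-complexity trade-off.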
5 Experimental Setup and Results

In this section, we evaluate the performance of the proposed symbolic forecasting approaches in predicting the future dynamics of both chaotic systems and real-world datasets. To establish a comprehensive comparison, the symbolic forecasters are benchmarked against a suite of state-of-the-art machine learning forecasting techniques.

5.1 Baseline Models

To ensure a systematic evaluation, we benchmark the symbolic forecasting methods against several forecasting frameworks spanning tree-based and deep learning techniques. This diverse selection of baseline models allows us to assess the robustness and generalization ability of symbolic methods across different modeling paradigms. In particular, we include several ensemble approaches: the bagging-based Random Forest model [5], the boosting-based Extreme Gradient Boosting (XGBoost) [13], and the Light Gradient Boosting Machine (LightGBM) [33]. These models are known for their ability to capture complex nonlinear relationships through aggregation and boosting strategies. From the deep learning paradigm, we consider several architectures that have demonstrated strong performance in sequence modeling tasks. We employ the Normalization-based Linear (NLinear) method [71], which introduces normalization to stabilize learning dynamics and improve generalization. We also use the Neural Basis Expansion Analysis for Time Series (NBeats) [47] and the Neural Hierarchical Interpolation for Time Series (N-HiTS) [12] frameworks, which employ basis expansion and hierarchical decomposition to learn interpretable temporal patterns. In addition, we incorporate the recurrent Long Short-Term Memory (LSTM) network [26], which is well suited to handling long-term temporal dependencies in time series data.
Finally, to include attention-driven sequence modeling techniques, we evaluate two encoder-decoder frameworks: the Transformer model with a multi-head self-attention mechanism and the Time-series Dense Encoder (TiDE) [17], which employs dense representations to enhance temporal encoding efficiency. Together, these models serve as robust baselines to comprehensively evaluate the efficacy of the symbolic forecasting approaches across diverse datasets for one-step-ahead (rolling-window) forecasting.

5.2 Performance Indicators

In our analysis, we evaluate the performance of the symbolic forecasting approaches and the baseline architectures using four scale-dependent and scale-independent metrics: Symmetric Mean Absolute Percentage Error (SMAPE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Ranged Relative Error (MARRE). These evaluation metrics are defined as follows:

SMAPE = (1/h) Σ_{i=1}^{h} [ 2 |ŷ_{t+i} − y_{t+i}| / (|ŷ_{t+i}| + |y_{t+i}|) ] × 100%,
RMSE = √[ (1/h) Σ_{i=1}^{h} (ŷ_{t+i} − y_{t+i})² ],
MAE = (1/h) Σ_{i=1}^{h} |ŷ_{t+i} − y_{t+i}|,
MARRE = (1/h) Σ_{i=1}^{h} | (y_{t+i} − ŷ_{t+i}) / (max_i y_{t+i} − min_i y_{t+i}) |,

where y_{t+i} ∈ R denotes the ground-truth observation recorded at time t+i, ŷ_{t+i} ∈ R is the corresponding point forecast, and h is the forecast horizon. By definition, smaller values of these metrics indicate better performance of the forecasting model [28, 49].

5.3 Forecast Evaluation for Chaotic Attractors

We evaluate the performance of the symbolic forecasting approaches in predicting the future trajectories of 132 synthetic chaotic attractors. Our evaluation uses a rolling-window mechanism, in which each model generates a one-step-ahead forecast over the predetermined test period.
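As a minimal sketch, the four metrics defined in Section 5.2 can be computed with NumPy as follows (our own helper functions, assuming `y` and `yhat` are aligned arrays over the forecast horizon):

```python
import numpy as np

def smape(y, yhat):
    """Symmetric MAPE, in percent."""
    return float(np.mean(2 * np.abs(yhat - y) / (np.abs(yhat) + np.abs(y))) * 100)

def rmse(y, yhat):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((yhat - y) ** 2)))

def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(yhat - y)))

def marre(y, yhat):
    """Mean absolute ranged relative error: absolute errors scaled by
    the range of the ground-truth values over the horizon."""
    return float(np.mean(np.abs((y - yhat) / (y.max() - y.min()))))
```

Note that SMAPE is undefined when both the forecast and the observation are exactly zero, and MARRE requires a non-constant ground-truth segment; production code should guard those cases.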
In this evaluation, four variants of the symbolic forecasting techniques are considered: SyNF, its regularized variant SyNF-Reg, SyTF, and the extended SyTF-Div-Exp model, which includes division and exponential operators to capture more complex nonlinear interactions. To ensure a comprehensive evaluation, all symbolic and baseline models are trained with different numbers of lagged observations (5, 10, and 25 lags), which serve as input features providing temporal context. A smaller number of lags enhances model interpretability and computational efficiency, while a larger number of lags offers richer historical information at the cost of more complex representations.

The empirical results, illustrated in Figure 4, summarize the forecasting performance of all models trained with five lagged observations. The error-bar plots display the median and interquartile range of four key performance metrics across the chaotic attractor datasets. The results highlight the superiority of the symbolic forecasting architectures, with the genetic programming-based SyTF-Div-Exp and SyTF models consistently achieving the lowest median errors and dispersion, reflecting both high accuracy and stability across attractors. These expression-tree-based forecasters excel by discovering explicit analytical expressions that approximate the governing dynamics of chaotic systems, yielding interpretable and generalizable predictive relations.

Figure 4: Comparative performance of symbolic forecasting approaches and state-of-the-art frameworks for one-step-ahead (rolling-window) forecasting of chaotic datasets using five historical observations for each model. Panels show error distributions across four metrics: RMSE (top-left), MAE (top-right), MARRE (bottom-left), and SMAPE (bottom-right). Red dots denote median performance; error bars represent the interquartile range (IQR). In all panels, models are ranked by ascending median error (lower values indicate better performance).

The SyNF model and its regularized variant, which integrate symbolic operations with a neural training mechanism, achieve competitive yet less stable performance compared to SyTF. While the gradient-based optimization in the SyNF models enhances flexibility and smooths the learning dynamics, the stochastic nature of neural training introduces variability that reduces their generalization across chaotic regimes. Despite this higher variability, both SyNF variants consistently outperform state-of-the-art deep learning architectures in forecasting performance. Among the baseline models, deep neural forecasters such as NBeats, NHiTS, TiDE, and LSTM show moderate median performance but fail to generalize to the inherent instability of chaotic systems. Tree-based ensemble methods such as XGBoost, LightGBM, and Random Forest exhibit larger variance, while the attention-based Transformer framework records both higher median errors and wider error dispersion, primarily due to the small sample size of the datasets.

Ablation Study. Additionally, we conduct an ablation study to determine the influence of the number of lagged observations on the performance of the symbolic forecasting approaches. The empirical evaluations with 10 and 25 lagged observations, presented in Figures 5 and 6, depict similar patterns. While increasing the lag window enhances the temporal representation and allows the models to capture longer-term dependencies, the relative ranking among the models remains consistent. The SyTF architecture and its extended variant exhibit the lowest median forecasting errors and highest stability across all chaotic attractors.
This demonstrates the robustness of the SyTF frameworks to changes in input dimensionality and their ability to balance model performance and complexity. The SyNF and SyNF-Reg models also remain competitive with the SyTF frameworks. The deep learning forecasters, on the other hand, show only marginal improvements from the additional lags and continue to suffer from increased error dispersion and reduced generalization. The tree-based models and the other baseline architectures exhibit a similar pattern of limited adaptability as the input window expands.

Statistical Significance Test. Furthermore, to validate the robustness of the performance improvements of the SyTF and SyTF-Div-Exp approaches in forecasting the dynamics of the chaotic attractors, we conduct the Multiple Comparisons with the Best (MCB) test [46]. This non-parametric procedure ranks the competing models by their performance across the different forecasting tasks and computes each model's average rank together with its critical distance. The model with the lowest average rank is considered the 'best' performing technique, and its critical distance serves as the reference value for the test. Figure 7 presents the MCB test results, in terms of the RMSE and SMAPE metrics, for the symbolic and baseline forecasting models with five lagged inputs. The SyTF model achieves the lowest rank under both performance metrics and hence serves as the 'best' forecasting model, followed by the SyTF-Div-Exp framework. Since the critical distances of all other models lie beyond the reference value (shaded region) of the test, their performance is significantly inferior to that of the SyTF architecture.
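The average-rank computation at the core of the MCB comparison can be sketched as follows (the critical-distance calculation is omitted; `errors` is a hypothetical tasks-by-models matrix of metric values):

```python
import numpy as np

def average_ranks(errors):
    """errors: (n_tasks, n_models) array of metric values, lower is better.
    Rank the models within each task (1 = best, ties broken by column
    order) and average the ranks over tasks."""
    ranks = errors.argsort(axis=1).argsort(axis=1) + 1
    return ranks.mean(axis=0)
```

The model with the smallest average rank would be labeled 'best'; the full MCB test additionally attaches a critical-distance interval to each average rank to judge significance.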
Overall, the empirical evaluation on the chaotic attractor datasets underscores the superior performance of the symbolic forecasting frameworks: SyTF and SyTF-Div-Exp emerge as the most accurate and stable architectures, while the SyNF variants offer a robust balance between symbolic representation and neural adaptability, outperforming all baselines. Beyond their predictive performance, these symbolic models possess the critical benefit of interpretability, providing analytical relationships that explain the underlying chaotic dynamics, unlike conventional deep learning approaches, which function as black-box architectures.

5.4 Empirical Evaluation for Real-world Datasets

We assess the forecasting performance of the proposed symbolic forecasting frameworks on two complex real-world time series datasets, San Juan dengue and El Niño SST, as described in Section 3.2. These datasets represent diverse real-world phenomena, encompassing epidemiological and climatic domains, each characterized by nonlinear, nonstationary, and chaotic dynamics.

Figure 5: Comparative performance of symbolic forecasting approaches and state-of-the-art frameworks for one-step-ahead forecasting of chaotic datasets using ten historical observations for each model. Panels show error distributions across four metrics: RMSE (top-left), MAE (top-right), MARRE (bottom-left), and SMAPE (bottom-right). Red dots denote median performance; error bars represent the interquartile range (IQR). In all panels, models are ranked by ascending median error (lower values indicate better performance).

Figure 6: Comparative performance of symbolic forecasting approaches and state-of-the-art frameworks for one-step-ahead forecasting of chaotic datasets using twenty-five historical observations for each model. Panels show error distributions across four metrics: RMSE (top-left), MAE (top-right), MARRE (bottom-left), and SMAPE (bottom-right). Red dots denote median performance; error bars represent the interquartile range (IQR). In all panels, models are ranked by ascending median error (lower values indicate better performance).

Figure 7: Visualization of the MCB test results for synthetic chaotic attractor forecasting tasks based on RMSE (left) and SMAPE (right) metrics. The vertical axis represents the average ranks of the forecasting methods, while the horizontal axis lists the corresponding techniques. For example, SyTF – 1.98 indicates that the SyTF framework obtained an average rank of 1.98 under the RMSE metric, with similar interpretations applicable to the remaining models.

In addition to the competitive benchmark models discussed earlier, we evaluate two enhanced variants of the proposed SyNF framework: (i) SyNF-Div, which incorporates division-based symbolic operators for richer functional expressiveness, and (ii) SyNF-Div with ℓ1 regularization (SyNF-Div-Reg), which introduces sparsity constraints to prevent overfitting and enhance model interpretability. For consistency and fair comparison, we employ an experimental setup in which four historical observations of the target variable are used as inputs to predict one-step-ahead values over the test horizon across all forecasting techniques. This choice is motivated by the short-memory behavior of these time series, as evident from the preliminary analysis (Figure 2), which indicates that only a few recent observations are sufficient to capture the relevant temporal dependencies. Table 2 presents a comprehensive performance evaluation for one-step-ahead forecasting of the real-world datasets for all competing models during the test period.
For the San Juan dengue dataset, conventional machine learning approaches such as Random Forest, XGBoost, and LightGBM yield moderate performance but struggle to capture sudden epidemic outbreaks, resulting in higher RMSE and MAE values than the neural forecasting models. Deep learning architectures such as NBeats, N-HiTS, and TiDE provide competitive performance, while recurrent models such as LSTM and attention-based Transformers exhibit significantly higher errors due to their inability to model irregular epidemic patterns from limited data. In contrast, the SyNF-Reg architecture achieves the lowest errors across all evaluation metrics, indicating that its sparse symbolic connections effectively capture the inherent nonlinear patterns of dengue incidence. Notably, SyNF and SyNF-Div also outperform most deep learning baselines, suggesting that symbolic representations provide a more data-efficient mechanism for modeling epidemiological time series than conventional neural architectures. For the El Niño SST dataset, characterized by complex oscillatory behavior and long-range temporal dependencies, forecasting performance differs across model paradigms. While several forecasting architectures, such as NBeats, N-HiTS, and LightGBM, achieve relatively low errors, the performance of the recurrent and attention-based models consistently deteriorates. Notably, the SyNF-Div-Reg model outperforms all baseline frameworks, achieving the lowest SMAPE, RMSE, MAE, and MARRE values on this dataset.
The inclusion of division operations combined with sparse connections enables the model to reconstruct complex SST fluctuations by representing rational functional relationships, which are particularly effective for describing oscillatory physical processes. The other symbolic forecasting approaches also remain highly competitive, consistently performing on par with or better than the state-of-the-art forecasting models. Additionally, we present the explicit analytical expressions learned by the symbolic forecasting frameworks in Table 3. These symbolic equations describe the temporal dynamics of the real-world datasets as modeled by the SyNF and SyTF architectures.

Table 2: Comparative evaluation of forecasting performance for machine learning models and symbolic forecasting frameworks on real-world datasets across multiple performance indicators. In the table, the best and second-best performing models are highlighted.

                   San Juan Dengue                El Niño SST
Models          SMAPE  RMSE   MAE    MARRE   SMAPE  RMSE  MAE   MARRE
NLinear         21.70  21.37  16.97   7.44    1.03  0.34  0.27   8.29
RandomForest    25.21  25.46  20.08   8.81    0.87  0.30  0.22   6.94
XGBoost         27.58  28.09  22.16   9.72    0.92  0.31  0.24   7.29
LightGBM        24.27  32.62  21.99   9.64    0.82  0.28  0.21   6.55
LSTM            50.21  72.43  49.98  21.92    5.23  1.62  1.40  42.41
NBeats          21.77  21.94  16.84   7.38    0.83  0.29  0.21   6.61
NHiTS           21.10  21.59  16.29   7.14    0.86  0.30  0.22   6.87
Transformer     81.54  94.41  70.11  30.75    4.36  1.38  1.15  35.12
TiDE            21.94  22.18  17.14   7.51    1.19  0.37  0.32   9.59
SyNF            21.39  20.80  16.19   7.10    0.83  0.27  0.22   6.66
SyNF-Reg        20.95  20.69  15.76   6.91    1.41  0.44  0.37  11.37
SyNF-Div        21.09  20.82  16.04   7.03    0.84  0.27  0.22   6.70
SyNF-Div-Reg    21.36  20.96  16.60   7.28    0.79  0.26  0.20   6.30
SyTF            23.30  23.32  17.76   7.79    0.86  0.27  0.22   6.88
SyTF-Div-Exp    22.73  24.04  18.37   8.06    0.85  0.27  0.22   6.79
Compared to SyNF, the equations obtained from the SyTF model are considerably simpler and largely resemble compact autoregressive relationships involving only a few lagged observations. This simplicity can be attributed to the evolve-simplify-optimize loop in SyTF, which balances exploration and exploitation while capturing most temporal dependencies in the latent representation, allowing the symbolic layer to remain concise. In contrast, the SyNF model produces richer nonlinear expressions involving polynomial interactions and trigonometric terms. For the El Niño SST dataset, these periodic components reflect the oscillatory nature of sea-surface temperature dynamics, while for the San Juan dengue dataset, they help capture nonlinear seasonal patterns in epidemic incidence. Overall, these symbolic equations provide interpretable insights into the dominant temporal dependencies governing the evolution of the underlying processes.

Furthermore, to validate the statistical significance of the observed performance improvements, we conduct the distribution-free MCB test. The results, illustrated in Figure 8, indicate that the SyNF-Div-Reg framework consistently outperforms the competitive deep learning and ensemble models across both real-world forecasting tasks, followed by the SyNF and SyNF-Div models.

Overall, the empirical findings for the real-world datasets underscore the superior generalization ability of the symbolic neural architectures (the SyNF family) compared to the expression-tree-based symbolic forecasters. The performance improvements observed for SyNF arise from its hybrid design, which effectively integrates symbolic reasoning with neural parameter optimization. Unlike conventional neural networks, SyNF maintains a fixed computational structure in which only the connection weights between the input and symbolic nodes are optimized through gradient-based learning.
This design ensures both computational efficiency and symbolic interpretability, enabling the model to capture complex nonlinear dependencies of real-world time series observations while remaining structurally stable. In contrast, the SyTF framework employs an evolutionary symbolic search mechanism that iteratively constructs expression trees, allowing both the symbolic structure and the associated parameters to evolve over time. Although this approach provides flexibility and can achieve strong performance in low-dimensional and noise-free settings, its effectiveness diminishes for high-dimensional, noisy, and chaotic datasets.

Table 3: Symbolic equations learned by SyTF and SyNF for real-world datasets.

San Juan Dengue:
- SyTF: $\hat{y}_t = y_{t-4} + 0.087 \sin\!\big(y_{t-2} + \sin(0.280\, y_{t-3} - 0.696)\big)$
- SyNF: $\hat{y}_t = -1.1 \times 10^{-4}\, y_{t-1}^2 + 6.7 \times 10^{-5}\, y_{t-1} y_{t-2} + 0.288\, y_{t-1} + 3.7 \times 10^{-5}\, y_{t-1}^2 + 0.684\, y_{t-2} + 0.938 - 0.203 \sin(-3.6 \times 10^{-4}\, y_{t-1} + 1.337\, y_{t-2} + 1.439) + 0.601 \cos(1.568\, y_{t-1} - 0.288\, y_{t-2} + 0.063)$

El Niño SST:
- SyTF: $\hat{y}_t = 0.985\, y_{t-1} + 0.417$
- SyNF: $\hat{y}_t = -8.2 \times 10^{-3}\, y_{t-1}^2 + 0.017\, y_{t-1} y_{t-2} + 0.553\, y_{t-1} + 0.019\, y_{t-2}^2 - 0.324\, y_{t-2} + 0.031 - 0.137 \sin(1.438\, y_{t-1} + 0.078\, y_{t-2} - 0.107) + 1.792 \cos(0.536\, y_{t-1} + 3.2 \times 10^{-5}\, y_{t-2} - 0.463)$

Figure 8: Visualization of the MCB test results for real-world forecasting tasks based on RMSE (left) and SMAPE (right) metrics. The vertical axis represents the average ranks of the forecasting methods, while the horizontal axis lists the corresponding techniques. For instance, "SyNF-Div-Reg – 2.50" denotes that the SyNF-Div-Reg framework achieved an average rank of 2.50 according to the RMSE metric (a lower value indicates better performance), and similar interpretations apply to the other methods.
The rapidly expanding symbolic search space under these conditions leads to computational inefficiency and suboptimal convergence, ultimately degrading forecasting performance on real-world datasets.

5.5 Uncertainty Quantification

Alongside point forecasts, we apply a conformal prediction approach to quantify the uncertainty in the forecasts produced by the symbolic forecasting models. This non-parametric method offers a systematic way to transform deterministic predictions from the symbolic forecasters ($SF$) into reliable prediction intervals [69]. Given the input sequence $y_t$, the approach constructs an uncertainty model ($UM$) based on the lagged inputs $\{y_{t-1}\}$ to capture the predictive uncertainty. Using this setup, a conformal score ($CS_t$) is then derived as

$$CS_t = \frac{\left| y_t - SF\!\left(y_{t-1}\right) \right|}{UM\!\left(y_{t-1}\right)}.$$

The model-agnostic conformal prediction approach leverages the inherent temporal patterns of the input series and computes the conformal quantile ($CQ_t$) using a weighted conformal method with an $\alpha$-sized window $\Gamma_{\tilde{t}} = \mathbb{1}\left(\tilde{t} \geq t - \alpha\right), \ \forall\, \tilde{t} < t$, as

$$CQ_t = \inf\left\{ \Delta : \frac{1}{\min\!\left(\alpha, t-1\right) + 1} \sum_{\tilde{t}=1}^{t-1} \mathbb{1}\!\left(CS_{\tilde{t}} \leq \Delta\right) \Gamma_{\tilde{t}} \geq 1 - \delta \right\}.$$

Hence, the $100(1-\delta)\%$ conformal prediction interval based on these weighted quantiles is given by

$$SF\!\left(y_{t-1}\right) \pm CQ_t \cdot UM\!\left(y_{t-1}\right).$$

In this analysis, we construct 90% conformal prediction intervals for the real-world forecasting tasks using the best-performing SyNF-Div-Reg model, as identified in Figure 8. To ensure the statistical validity of these intervals and prevent any potential data leakage, the conformal residuals are derived using a separate calibration (validation) set, distinct from the test period. Figure 9 illustrates both the point forecasts produced by the SyNF-Div-Reg model and the corresponding ground-truth observations, with shaded regions representing the conformal prediction intervals.
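The conformal machinery above can be sketched compactly. The code below is a simplified illustration, not the paper's implementation: the $SF$ and $UM$ outputs are passed in as plain numbers, and we use the unweighted split-conformal quantile (the special case $\Gamma \equiv 1$ over the calibration window) with hypothetical calibration scores:

```python
import numpy as np

def conformal_interval(sf_pred, um_pred, calib_scores, delta=0.10):
    """Split-conformal prediction interval for one forecast.

    sf_pred      -- point forecast SF(y_{t-1})
    um_pred      -- uncertainty-model output UM(y_{t-1}), assumed > 0
    calib_scores -- scores CS = |y - SF(y_{t-1})| / UM(y_{t-1}) from a
                    held-out calibration set
    delta        -- miscoverage level; 0.10 yields a 90% interval
    """
    n = len(calib_scores)
    # Finite-sample-corrected empirical quantile of the calibration scores.
    level = min(1.0, np.ceil((n + 1) * (1.0 - delta)) / n)
    cq = float(np.quantile(calib_scores, level))
    return sf_pred - cq * um_pred, sf_pred + cq * um_pred

# Hypothetical calibration scores and a new point forecast.
rng = np.random.default_rng(0)
scores = np.abs(rng.normal(size=200))
lo, hi = conformal_interval(sf_pred=27.1, um_pred=0.3, calib_scores=scores)
print(round(lo, 3), round(hi, 3))
```

Because the interval half-width is $CQ_t \cdot UM(y_{t-1})$, a volatility-tracking uncertainty model makes the bands widen and narrow with the data regime, which is the adaptive behavior visible in Figure 9.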
As observed, the SyNF-Div-Reg framework effectively captures the intricate temporal patterns and nonlinear fluctuations inherent in both the San Juan dengue incidence and El Niño SST datasets. The predicted trajectories align closely with the observed values, and the majority of data points lie within the conformal intervals, underscoring the model's predictive reliability. Notably, the intervals adapt dynamically to the underlying data regimes, expanding during highly volatile periods and contracting in stable phases, demonstrating the ability of SyNF-Div-Reg to represent regime-dependent uncertainty and temporal heterogeneity in real-world dynamics. Overall, the SyNF-Div-Reg model provides the best performance in both point forecasting and uncertainty estimation for real-world chaotic time series datasets.

Figure 9: The plot visualizes the ground-truth observations (red points) of the (a) San Juan Dengue and (b) El Niño SST datasets during the test period, along with the point forecasts (blue line) and the conformal prediction intervals of the SyNF-Div-Reg model (yellow shaded).

6 Conclusion and Future Work

The integration of symbolic discovery with contemporary forecasting pipelines offers a pathway toward models that are not only accurate but also analytically tractable and scientifically meaningful. This study provides the first large-scale benchmark for symbolic machine learning in chaotic time-series forecasting, positioning symbolic model discovery as a viable approach to interpretable forecasting of chaotic dynamics. Across 132 chaotic attractors, expression-tree methods, such as SyTF and its richer operator variant, achieve the most accurate and stable one-step-ahead forecasts, outperforming a wide range of deep learning and ensemble baselines. SyTF also yields compact equations that expose the inherent patterns of the underlying dynamics.
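To make this deployability concrete: a discovered equation is itself the forecaster, requiring no model object at inference time. A small sketch rolling the SyTF El Niño rule from Table 3, $\hat{y}_t = 0.985\, y_{t-1} + 0.417$, over a short synthetic series (the input values are illustrative, not the actual SST data):

```python
import numpy as np

def sytf_elnino_step(y_prev):
    # One-step-ahead rule discovered by SyTF for the El Niño SST
    # series (Table 3): a near-persistence autoregressive equation.
    return 0.985 * y_prev + 0.417

# Rolling one-step nowcasts: each forecast conditions on the
# *observed* previous value, matching the rolling-window setting.
series = np.array([26.1, 26.4, 27.0, 27.3])  # synthetic SST-like values
forecasts = np.array([sytf_elnino_step(v) for v in series[:-1]])
residuals = series[1:] - forecasts
print(forecasts, residuals)
```

Iterating the same rule on its own outputs instead of the observed values would give multi-step forecasts, which is exactly the stability question raised below for future work.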
On real-world data, the strongest performance shifts toward the neural-symbolic family, where sparsity and rational operators improve generalization. In particular, regularized SyNF variants performed best for dengue incidence, and the division-regularized variant performed best for the Niño 3.4 SST index. Beyond point accuracy, the symbolic neural framework supports uncertainty quantification through conformal prediction intervals that adapt to volatility, offering a practical pathway toward reliable, interpretable forecasting in high-stakes applications.

Several extensions follow naturally. First, while we focus on rolling-window nowcasting, an important next step is to evaluate multi-step-ahead performance, including the stability of discovered equations under iterative forecasting. Second, incorporating multivariate inputs and exogenous drivers (especially for real-world climate and epidemic series) could improve both the accuracy and interpretability of the recovered relationships. Third, tighter integration between symbolic discovery and dynamical-system constraints, such as stability or invariance priors, may further reduce the expression search space while improving robustness. Although this work represents an initial attempt to integrate symbolic machine learning techniques for forecasting real-world time series, future research should also explore their application in high-risk domains such as mobile health monitoring and continuous time series from electronic health records, including heart rate, ECG, and EEG signals, where the interpretability of forecasting models is critically important. We hope this benchmark stimulates further research at the intersection of forecasting, dynamical systems, and interpretable machine learning.

Data and Code Availability Statement

Data and codes are available in our GitHub repository: https://github.com/mad-stat/Symbolic_Forecasting.
References

[1] Rob Allan, Janette Lindesay, and David Parker. El Niño Southern Oscillation & Climatic Variability. Cambridge University Press, 1996.
[2] Corey M Benedum, Kimberly M Shea, Helen E Jenkins, Louis Y Kim, and Natasha Markuzon. Weekly dengue forecasts in Iquitos, Peru; San Juan, Puerto Rico; and Singapore. PLoS Neglected Tropical Diseases, 14(10):e0008710, 2020.
[3] Guido Boffetta, Massimo Cencini, Massimo Falcioni, and Angelo Vulpiani. Predictability: a way to characterize complexity. Physics Reports, 356(6):367–474, 2002.
[4] Cristian Bonatto, Michael Feyereisen, Stéphane Barland, Massimo Giudici, Cristina Masoller, Jose R Rios Leite, and Jorge R Tredicce. Deterministic optical rogue waves. Physical Review Letters, 107(5):053901, 2011.
[5] Leo Breiman. Random forests. Machine Learning, 45:5–32, 2001.
[6] Anne Brindle. Genetic algorithms for function optimization. 1980.
[7] Kevin René Broløs, Meera Vieira Machado, Chris Cave, Jaan Kasak, Valdemar Stentoft-Hansen, Victor Galindo Batanero, Tom Jelen, and Casper Wilstrup. An approach to symbolic regression using Feyn. arXiv preprint arXiv:2104.05417, 2021.
[8] Charles George Broyden. The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA Journal of Applied Mathematics, 6(1):76–90, 1970.
[9] Steven L Brunton and J Nathan Kutz. Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control. Cambridge University Press, 2022.
[10] Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113(15):3932–3937, 2016.
[11] Tanujit Chakraborty, Swarup Chattopadhyay, and Indrajit Ghosh. Forecasting dengue epidemics using a hybrid methodology. Physica A: Statistical Mechanics and its Applications, 527:121266, 2019.
[12] C. Challu, K. G. Olivares, B. N. Oreshkin, F. Garza Ramirez, M. Mergenthaler Canseco, and A. Dubrawski. NHITS: Neural hierarchical interpolation for time series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 6989–6997, 2023.
[13] Tianqi Chen and Carlos Guestrin. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016.
[14] Alison Cozad and Nikolaos V Sahinidis. A global MINLP approach to symbolic regression. Mathematical Programming, 170:97–119, 2018.
[15] Miles Cranmer. Interpretable machine learning for science with PySR and SymbolicRegression.jl. arXiv preprint arXiv:2305.01582, 2023.
[16] Vasilis Dakos, Marten Scheffer, Egbert H Van Nes, Victor Brovkin, Vladimir Petoukhov, and Hermann Held. Slowing down as an early warning signal for abrupt climate change. Proceedings of the National Academy of Sciences, 105(38):14308–14312, 2008.
[17] Abhimanyu Das, Weihao Kong, Andrew Leach, Shaan K Mathur, Rajat Sen, and Rose Yu. Long-term forecasting with TiDE: Time-series dense encoder. Transactions on Machine Learning Research, pages 2835–8856, 2023.
[18] Ethan R Deyle and George Sugihara. Generalized theorems for nonlinear state space reconstruction. PLoS ONE, 6(3):e18295, 2011.
[19] Joseph Enguehard. Learning perturbations to explain time series predictions. In International Conference on Machine Learning, pages 9329–9342. PMLR, 2023.
[20] J Doyne Farmer and John J Sidorowich. Predicting chaotic time series. Physical Review Letters, 59(8):845, 1987.
[21] AS Fokas, N Dikaios, and YC Yortsos. An algebraic formula, deep learning and a novel SEIR-type model for the COVID-19 pandemic. Royal Society Open Science, 10(8), 2023.
[22] William Gilpin. Chaos as an interpretable benchmark for forecasting and data-driven modelling. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2021.
[23] William Gilpin. Model scale versus domain knowledge in statistical forecasting of chaotic systems. Physical Review Research, 5(4):043252, 2023.
[24] William Gilpin. Generative learning for nonlinear dynamics. Nature Reviews Physics, 6(3):194–206, 2024.
[25] David E Goldberg and Kalyanmoy Deb. A comparative analysis of selection schemes used in genetic algorithms. In Foundations of Genetic Algorithms, volume 1, pages 69–93. Elsevier, 1991.
[26] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
[27] Jiao Hu, Jiaxu Cui, and Bo Yang. Learning interpretable network dynamics via universal neural symbolic regression. Nature Communications, 16(1):6226, 2025.
[28] Rob J Hyndman and George Athanasopoulos. Forecasting: Principles and Practice. OTexts, 2018.
[29] Herbert Jaeger and Harald Haas. Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667):78–80, 2004.
[30] Michael A Johansson, Karyn M Apfeldorf, Scott Dobson, Jason Devita, Anna L Buczak, Benjamin Baugher, Linda J Moniz, Thomas Bagley, Steven M Babin, Erhan Guven, et al. An open challenge to advance probabilistic forecasting for dengue epidemics. Proceedings of the National Academy of Sciences, 116(48):24268–24274, 2019.
[31] Pierre-Alexandre Kamienny, Stéphane d'Ascoli, Guillaume Lample, and François Charton. End-to-end symbolic regression with transformers. Advances in Neural Information Processing Systems, 35:10269–10281, 2022.
[32] Holger Kantz and Thomas Schreiber. Nonlinear Time Series Analysis. Cambridge University Press, 2003.
[33] Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, 30, 2017.
[34] Mark E Kotanchek, Ekaterina Vladislavleva, and Guido Smits. Symbolic regression is not enough: it takes a village to raise a model. Genetic Programming Theory and Practice X, pages 187–203, 2013.
[35] John R Koza. Genetic programming as a means for programming computers by natural selection. Statistics and Computing, 4:87–112, 1994.
[36] Olha Kuzmenko, Yuriy Bilan, Evgenia Bondarenko, Beata Gavurova, and Hanna Yarovenko. Dynamic stability of the financial monitoring system: Intellectual analysis. PLoS ONE, 18(1):e0276533, 2023.
[37] William La Cava, Bogdan Burlacu, Marco Virgolin, Michael Kommenda, Patryk Orzechowski, Fabrício Olivetti de França, Ying Jin, and Jason H Moore. Contemporary symbolic regression methods and their relative performance. Advances in Neural Information Processing Systems, 2021(DB1):1, 2021.
[38] Pat Langley. BACON: A production system that discovers empirical laws. In IJCAI, page 344. Citeseer, 1977.
[39] Pat Langley and Jan M Zytkow. Data-driven approaches to empirical discovery. Artificial Intelligence, 40(1-3):283–312, 1989.
[40] Xiaoyi Liu, Duxin Chen, Wenjia Wei, Xia Zhu, and Wenwu Yu. Interpretable sparse system identification: Beyond recent deep learning techniques on time-series prediction. In The Twelfth International Conference on Learning Representations, 2024.
[41] Nour Makke and Sanjay Chawla. Interpretable scientific discovery with symbolic regression: a review. Artificial Intelligence Review, 57(1):2, 2024.
[42] Georg Martius and Christoph H Lampert. Extrapolation and learning equations. arXiv preprint arXiv:1610.02995, 2016.
[43] P Mogensen and A Riseth. Optim: A mathematical optimization package for Julia. Journal of Open Source Software, 3(24), 2018.
[44] Madhav Muthyala, Farshud Sorourifar, and Joel A Paulson. TorchSISSO: A PyTorch-based implementation of the sure independence screening and sparsifying operator for efficient and interpretable model discovery. Digital Chemical Engineering, 13:100198, 2024.
[45] Madhav R Muthyala, Farshud Sorourifar, You Peng, and Joel A Paulson. SyMANTIC: An efficient symbolic regression method for interpretable and parsimonious model discovery in science and beyond. Industrial & Engineering Chemistry Research, 2025.
[46] Peter Bjorn Nemenyi. Distribution-free Multiple Comparisons. Princeton University, 1963.
[47] Boris N Oreshkin, Dmitri Carpov, Nicolas Chapados, and Yoshua Bengio. N-BEATS: Neural basis expansion analysis for interpretable time series forecasting. arXiv preprint arXiv:1905.10437, 2019.
[48] Runhai Ouyang, Stefano Curtarolo, Emre Ahmetcik, Matthias Scheffler, and Luca M Ghiringhelli. SISSO: A compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates. Physical Review Materials, 2(8):083802, 2018.
[49] Madhurima Panja, Tanujit Chakraborty, Uttam Kumar, and Nan Liu. Epicasting: an ensemble wavelet neural network for forecasting epidemics. Neural Networks, 165:185–212, 2023.
[50] Brenden K Petersen, Mikel Landajuela, T Nathan Mundhenk, Claudio P Santiago, Soo K Kim, and Joanne T Kim. Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients. arXiv preprint arXiv:1912.04871, 2019.
[51] Arnob Ray, Tanujit Chakraborty, and Dibakar Ghosh. Optimized ensemble deep learning framework for scalable forecasting of dynamics containing extreme events. Chaos: An Interdisciplinary Journal of Nonlinear Science, 31(11), 2021.
[52] Esteban Real, Chen Liang, David So, and Quoc Le. AutoML-Zero: Evolving machine learning algorithms from scratch. In International Conference on Machine Learning, pages 8007–8019. PMLR, 2020.
[53] Federico Sabbatini. Four decades of symbolic knowledge extraction from sub-symbolic predictors. A survey. ACM Computing Surveys, 58(3):1–36, 2025.
[54] Subham Sahoo, Christoph Lampert, and Georg Martius. Learning equations for extrapolation and control. In International Conference on Machine Learning, pages 4442–4450. PMLR, 2018.
[55] Tim Sauer, James A Yorke, and Martin Casdagli. Embedology. Journal of Statistical Physics, 65:579–616, 1991.
[56] Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data. Science, 324(5923):81–85, 2009.
[57] Jagadish Shukla. Predictability in the midst of chaos: A scientific basis for climate forecasting. Science, 282(5389):728–731, 1998.
[58] Jinglu Song, Qiang Lu, Bozhou Tian, Jingwen Zhang, Jake Luo, and Zhiguang Wang. Prove symbolic regression is NP-hard by symbol graph. arXiv preprint arXiv:2404.13820, 2024.
[59] Wenxiang Song, Shijie Jiang, Gustau Camps-Valls, Mathew Williams, Lu Zhang, Markus Reichstein, Harry Vereecken, Leilei He, Xiaolong Hu, and Liangsheng Shi. Towards data-driven discovery of governing equations in geosciences. Communications Earth & Environment, 5(1):589, 2024.
[60] Steven H. Strogatz. Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. CRC Press, 2018.
[61] Floris Takens. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980: Proceedings of a Symposium Held at the University of Warwick 1979/80, pages 366–381. Springer, 2006.
[62] Yang Tang, Jürgen Kurths, Wei Lin, Edward Ott, and Ljupco Kocarev. Introduction to focus issue: When machine learning meets complex systems: Networks, chaos, and nonlinear dynamics. Chaos: An Interdisciplinary Journal of Nonlinear Science, 30(6), 2020.
[63] Silviu-Marian Udrescu and Max Tegmark. AI Feynman: A physics-inspired method for symbolic regression. Science Advances, 6(16):eaay2631, 2020.
[64] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
[65] Marco Virgolin, Tanja Alderliesten, Cees Witteveen, and Peter AN Bosman. Improving model-based genetic programming for symbolic regression of small expressions. Evolutionary Computation, 29(2):211–237, 2021.
[66] Marco Virgolin and Solon P Pissis. Symbolic regression is NP-hard. arXiv preprint arXiv:2207.01018, 2022.
[67] Marco Virgolin, Ziyuan Wang, Tanja Alderliesten, and Peter AN Bosman. Machine learning for the prediction of pseudorealistic pediatric abdominal phantoms for radiation dose reconstruction. Journal of Medical Imaging, 7(4):046501, 2020.
[68] Ekaterina J Vladislavleva, Guido F Smits, and Dick Den Hertog. Order of nonlinearity as a complexity measure for models generated by symbolic regression via Pareto genetic programming. IEEE Transactions on Evolutionary Computation, 13(2):333–349, 2008.
[69] Vladimir Vovk, Alexander Gammerman, and Glenn Shafer. Conformal prediction. Algorithmic Learning in a Random World, pages 17–51, 2005.
[70] Casper Wilstrup and Jaan Kasak. Symbolic regression outperforms other models for small data sets. arXiv preprint arXiv:2103.15147, 2021.
[71] Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 11121–11128, 2023.
[72] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 11106–11115, 2021.
