A Tree Adjoining Grammar Representation for Models Of Stochastic Dynamical Systems
Model structure and complexity selection remains a challenging problem in system identification, especially for parametric non-linear models. Many Evolutionary Algorithm (EA) based methods have been proposed in the literature for estimating model str…
Authors: Dhruv Khandelwal, Maarten Schoukens, Roland Tóth
Dhruv Khandelwal, Maarten Schoukens, Roland Tóth
Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands

Abstract

Model structure and complexity selection remains a challenging problem in system identification, especially for parametric non-linear models. Many Evolutionary Algorithm (EA) based methods have been proposed in the literature for estimating model structure and complexity. In most cases, the proposed methods are devised for estimating structure and complexity within a specified model class, and hence these methods do not extend to other model structures without significant changes. In this paper, we propose a Tree Adjoining Grammar (TAG) for stochastic parametric models. TAGs can be used to generate models in an EA framework while imposing desirable structural constraints and incorporating prior knowledge. Specifically, we propose a TAG that can systematically generate models ranging from FIRs to polynomial NARMAX models. Furthermore, we demonstrate that TAGs can be easily extended to more general model classes, such as the non-linear Box-Jenkins model class, enabling the realization of flexible and automatic model structure and complexity selection via EA.

Key words: System identification, tree adjoining grammar, evolutionary algorithms

1 Introduction

In recent years, there has been a resurgence in the use of Evolutionary Algorithms (EAs) for data-driven modelling of dynamical systems. Undoubtedly, one of the main driving forces for this is the steady growth of computational power. EAs are being increasingly used in a multitude of engineering domains and in the life sciences (Eiben et al., 2003; Arias-Montano et al., 2012).
Across several domains, EAs have generated results that are competitive and, sometimes, even surprising (Eiben et al., 2003). Another factor contributing to the growing popularity of EAs is that these algorithms can be used to generate solutions for complex problems for which no systematic solution approach exists in general. In parametric system identification, the estimation of model structure and model complexity is one such problem.

Model structure selection is a classical problem in system identification. Over the years, a variety of methods for system identification have been developed. Each of these methods adopts a different approach to the problem of model structure selection. While methods like Prediction Error Minimization (PEM) treat model structure selection as a user's choice (Ljung, 1999), other methods (for example, Pillonetto et al. (2011), Laurain et al. (2020)) rely on a flexible model structure, and attempt to estimate or control the complexity of the model-to-be-estimated via regularization. Furthermore, the appropriate model complexity is often chosen by ranking models based on an information metric, such as AIC or BIC, or based on a user-defined complexity measure (Rojas et al., 2014). In cases where the number of candidate models grows combinatorially with respect to the length (or the complexity) of the model, a ranking-based complexity selection strategy becomes intractable, restricting model structure selection to regularization or shrinkage based methods.

Footnotes: This research is supported by the Dutch Organization for Scientific Research (NWO, domain TTW, grant: 13852), which is partly funded by the Ministry of Economic Affairs of The Netherlands. Corresponding author: D. Khandelwal. Email addresses: D.Khandelwal@tue.nl (Dhruv Khandelwal), M.Schoukens@tue.nl (Maarten Schoukens), R.Toth@tue.nl (Roland Tóth).
Preprint submitted to Automatica, 26 May 2020.

As a consequence of the aforementioned challenges, heuristics-based methods such as EAs have been used to estimate model structure and complexity, with a fair amount of success. However, the application of EAs has been, to some extent, superficial. The premise of the biologically-inspired heuristics used in EAs is that the solutions of a given problem can be constructed from fundamental building blocks, and these fundamental components can be interchanged between different solutions. In the system identification literature, the proposed EA-based approaches to model structure and complexity selection can be categorized as follows:
i) approaches that choose a fixed model structure and use EAs to determine the appropriate model complexity (or model terms), and
ii) approaches that use EAs to explore model structure and model complexity.

In the first category of EA-based approaches, the basic building blocks of an EA are chosen such that only models with a specific model structure can be generated. Hence, these approaches typically cannot be extended to other model structures without significant modifications. This approach can be found in Fonseca and Fleming (1996); Rodriguez-Vazquez et al. (2004); Rodríguez-Vázquez and Fleming (2000), where the authors use EAs to perform term selection within a chosen model structure. This approach is also used in Kristinsson and Dumont (1992), where the authors use GAs to estimate pole-zero locations for ARMAX models.

In the second category of EA-based approaches, a more generic set of building blocks is used in the EA, allowing the generation of models with arbitrary model structures. In this case, EAs are used to determine not just the appropriate complexity of the model, but also the appropriate model structure (e.g., in terms of the non-linear functions to be included in the model).
However, unrestrained generation of arbitrary model structures using EA may result in models that are not well-posed, e.g., models with discontinuities, non-causality, or finite escape-time. Typically, these problems are avoided by using arbitrary ad-hoc solutions, e.g., setting all discontinuities to 0. Another common drawback of EA-based approaches that fall in the second category is that prior knowledge of the dynamical system cannot be incorporated systematically in the identification procedure. In Madár et al. (2005), the authors use GP to identify NARMAX models that may contain arbitrary non-linearities. While the authors are interested in models that are linear-in-the-parameters, GP may return models that do not belong to that class. Consequently, the authors use an ad-hoc solution to ensure that the candidate model structures generated by GP are linearly parameterized. A similar approach was used in Quade et al. (2016) with a larger set of mathematical operations. Again, the proposed approach does not allow for systematic inclusion of model structure constraints or prior knowledge of the system. A slightly different approach is used in Gray et al. (1998), where the authors use GP to construct linear or non-linear models from basic elements like SIMULINK blocks and static non-linearities. Again, the combination of various SIMULINK blocks cannot be systematically structured to avoid ill-posed models.

In this paper, we propose a generative grammar based representation of stochastic parametric dynamical systems. The proposed representation allows for the generation of complex, yet well-posed dynamical models by combining a set of fundamental building blocks in well-specified ways. The resulting generative declaration of models defines a notion of model set that is more general than that conventionally used, for example, in Ljung (1999).
The generative grammar used in this work is called Tree Adjoining Grammar (TAG) (Joshi and Schabes, 1997). The use of TAG in an EA-based approach makes it possible to develop a system identification framework where EAs are used to automatically determine the structure and complexity of a model from a generic, well-posed class of dynamical models, while systematically incorporating model structure constraints and prior knowledge. A preliminary concept of the proposed framework (without proofs) was presented in Khandelwal et al. (2019b). The proposed approach for grammar-based identification was found to produce results that were comparable to state-of-the-art non-linear system identification approaches, while using no specialized knowledge of the benchmark system being identified.

The main contributions of this paper are the following. We present a detailed discussion on the discrete-time input-output representation of dynamical systems using TAG, and introduce a new notion of a model set defined by the generative capacity of a TAG. Subsequently, we develop a TAG for the polynomial NARMAX model class. We prove that any model structure generated by the proposed TAG belongs to the class of polynomial NARMAX models, and conversely, any polynomial NARMAX model can be represented using the proposed TAG (for which an algorithm is also proposed). We demonstrate that the model set corresponding to the proposed TAG includes, as special cases, other commonly used model structures such as FIR, ARX and Truncated Volterra series models. We also demonstrate that the proposed representation can be easily extended to other model structures (namely polynomial Non-linear Box-Jenkins, or NBJ).
Note that, while the TAG-based model set notion developed in this contribution is motivated by its applicability in an EA-based identification methodology, the identification approach itself is not in the scope of the present contribution. A preliminary version of such an identification methodology can be found in Khandelwal et al. (2019a) and Khandelwal et al. (2019b). The contributions in this paper differ from Khandelwal et al. (2019b) in the following respects:
• we formulate a TAG for a larger class of dynamical systems (the polynomial NARMAX class), and prove their equivalence,
• we provide an algorithm to compute an equivalent TAG representation of a given polynomial NARMAX model,
• we illustrate, via examples, the restriction (and generalization) of the proposed TAG in order to generate models with more specific (or generic) structures.

The remainder of the paper is structured as follows. The concept of TAG is introduced, both informally and formally, in Sec. 2. In Sec. 3 we introduce the notion of model set as defined by a given TAG, and propose a TAG that generates the class of polynomial NARMAX models. Several examples are used to illustrate the concept in Sec. 4, followed by concluding statements in Sec. 5.

2 Tree Adjoining Grammar

To set the stage for the development of TAG for stochastic non-linear systems, we first introduce the basic concepts of TAG. Since TAG was initially developed from linguistic considerations, a linguistic example will be used to illustrate the methodology. This will be followed by formal definitions. To make the example illustrative, we first specify an example string, and then infer a TAG that would generate the given string. Conversely, for the formal definitions, we will begin with the basic components of a TAG and lead up to the definition of TAG and the operations that can be performed on TAGs.
2.1 An informal description

Informally, a formal grammar can be described as a set of rules for generating strings. The resulting set of strings is called the language generated by the grammar. In contrast, TAG describes a set of rules for generating trees. The resulting set of trees is called the tree language of the TAG. The yield of all the trees in the tree set subsequently determines the corresponding language.

The following example has been derived from Joshi and Schabes (1997). Consider the sentence "A man saw Mary". Simple grammatical constructs can be used to decompose the given sentence into its basic components. For example, the sentence consists of articles ("A"), nouns ("man", "Mary") and verbs ("saw"). Other underlying structures, such as subjects and predicates, can also be observed in the sentence. The sentence, together with the underlying grammatical structure, can be represented in a single tree structure as shown in Fig. 1.

Fig. 1. A derived tree with the yield "A man saw Mary". The tree depicts the grammatical constructs that are evident in the structure of the sentence: a subject (sub) and a predicate (pred), an article (art), a verb (V) and nouns (N).

The tree depicted in Fig. 1 is called a derived tree. The yield of a derived tree is the sequence of labels associated with the leaves of the tree. Hence, the yield of the derived tree in Fig. 1 is "A man saw Mary". The given derived tree can be obtained by combining basic building blocks that are constituents of the TAG. Fig. 2 depicts the set of initial trees I and auxiliary trees A, collectively known as elementary trees, that can be combined in specific ways to produce the derived tree in Fig. 1. The set of initial trees I can be informally described as a set of non-recursive replacement rules that can be used to generate a set of trees. The set of auxiliary trees can be described as a set of recursive replacement rules. Consequently, each auxiliary tree has a terminal node with the same label as that of its root node.

Fig. 2. The sets I and A serve as building blocks of the tree set of a TAG. (a) Set of initial trees (I). (b) Set of auxiliary trees (A).

The downward arrow symbol ↓ and the star symbol ∗ in Fig. 2 represent nodes in a tree that are available for a substitution and an adjunction operation, respectively. A substitution operation can be used to substitute an initial tree into, for instance, another initial tree, if and only if the latter has a terminal node (leaf) with a label that matches the label of the root node of the former. On the other hand, adjunction can be loosely described as the operation of inserting an auxiliary tree into a syntactic tree. Adjunction of an auxiliary tree can take place on a non-terminal node of a syntactic tree if and only if the node has a label that matches the label of the root node of the auxiliary tree to be adjoined.

Consider the following sequence of operations. The initial tree α3 can be substituted in α1 at the location of the "sub" node. Let us denote the resulting tree as γ1.
The tree γ1 is an example of a syntactic tree, a tree obtained by applying an arbitrary number of substitution and adjunction operations to a given initial tree. Again, the initial tree α4 can be substituted into the syntactic tree γ1 at the location of the "pred" node. Let the result be denoted as γ2. Note that γ2 has the same structure as the example in Fig. 1, up to the last level of the derived tree, where specific articles, nouns and verbs are substituted in the tree to obtain the yield "a man saw Mary". Substitution can be performed on an initial tree or syntactic tree as long as there exist nodes available for substitution, marked by ↓. A derived tree is a syntactic tree in which none of the terminal nodes (leaves) are available for substitution. The initial and auxiliary trees provide an alternative representation, the derivation tree, as shown in Fig. 3a.

Fig. 3. Derivation tree representation: dashed lines represent substitutions, solid lines represent adjunction, and labels on the edges represent the Gorn addresses (a method to assign a label to a node in a tree structure, see Gorn (1965)) of the nodes participating in substitution or adjunction. (a) Derivation tree for "a man saw Mary". (b) Derivation tree for "yesterday a man saw Mary".

Based on the TAG in Fig. 2, more complex sentences can also be generated. For example, the auxiliary tree β1 can be adjoined to the root node of γ2 since both root nodes have the label "sentence".
This operation effectively adds an adverb before the sentence, yielding the sentence "yesterday a man saw Mary". The resulting derivation tree is depicted in Fig. 3b.

The set of all derived trees that can be obtained by starting from a given start symbol, say "sentence", and applying an arbitrary number of adjunctions and/or substitutions using elementary trees is called the tree language of the corresponding TAG. The string yield of all trees in the tree set is called the string language of the corresponding TAG.

We can now introduce the formal definitions of the concepts that were informally described in this example.

2.2 The formal definitions

The formal definitions of TAG and related concepts can be found in Joshi and Schabes (1997) and Kallmeyer (2009). These definitions are reproduced here for completeness.

Definition 1 A finite tree is a directed graph, denoted by $\gamma = \langle V, E, r \rangle$, where $V$ is the set of vertices, $E$ is the set of edges, and $r \in V$ is the root node, such that
- $\gamma$ contains no cycles,
- $r \in V$ has in-degree (number of incoming edges) 0,
- all $v \in V \setminus \{r\}$ have in-degree 1,
- every $v \in V$ is accessible from $r$,
- a vertex with out-degree (i.e., number of outgoing edges) 0 is a leaf.

Definition 2 A labeling of a graph $\gamma = \langle V, E \rangle$ over a signature $\langle A_1, A_2 \rangle$ is a pair of functions $l : V \to A_1$ and $g : E \to A_2$, with $A_1$, $A_2$ being disjoint alphabets.

Fig. 4. Illustration of the TAG operations (Khandelwal et al., 2019b). (a) TAG substitution operation. (b) TAG adjunction operation.

For the next definitions, assume $N$ and $T$ to be disjoint sets of non-terminals and terminals, respectively.
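The two operations described informally in Sec. 2.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `Node` class, the helper names (`substitute`, `adjoin`, `tree_yield`, `lex`, `derive_example`) and the single-word lexical trees such as `N -> man` are assumptions introduced here, and the elementary trees are written out by hand rather than numbered as in Fig. 2.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False  # leaf available for substitution (down arrow)
    foot: bool = False   # foot node of an auxiliary tree (star)


def preorder(tree):
    """Visit the nodes of a tree from left to right, parents first."""
    yield tree
    for child in tree.children:
        yield from preorder(child)


def tree_yield(tree):
    """Labels of the leaves, read from left to right."""
    return [n.label for n in preorder(tree) if not n.children]


def substitute(tree, initial):
    """Substitute `initial` at the first leaf marked for substitution
    whose label matches the root label of `initial`."""
    for n in preorder(tree):
        if n.subst and not n.children and n.label == initial.label:
            n.children = copy.deepcopy(initial.children)
            n.subst = False
            return tree
    raise ValueError("no matching substitution node")


def adjoin(tree, aux):
    """Adjoin `aux` at the first internal node whose label matches the
    root label of `aux`; the old subtree is re-attached below the foot."""
    new = copy.deepcopy(aux)
    foot = next(n for n in preorder(new) if n.foot)
    assert foot.label == new.label, "foot and root labels must match"
    for n in preorder(tree):
        if n.children and n.label == new.label:
            foot.children, foot.foot = n.children, False
            n.children = new.children
            return tree
    raise ValueError("no matching adjunction node")


def lex(category, word):
    """Hypothetical one-word initial tree, e.g. N -> man."""
    return Node(category, [Node(word)])


def derive_example():
    sentence = Node("sentence", [Node("sub", subst=True), Node("pred", subst=True)])
    sub = Node("sub", [Node("art", subst=True), Node("N", subst=True)])
    pred = Node("pred", [Node("V", subst=True), Node("N", subst=True)])
    beta = Node("sentence", [Node("adv", subst=True), Node("sentence", foot=True)])

    g = copy.deepcopy(sentence)
    for t in (sub, pred, lex("art", "a"), lex("N", "man"),
              lex("V", "saw"), lex("N", "Mary")):
        substitute(g, t)
    before = tree_yield(g)
    adjoin(g, beta)                      # adds the adverb slot at the root
    substitute(g, lex("adv", "yesterday"))
    return before, tree_yield(g)
```

Calling `derive_example()` returns the yield before and after the adjunction, which mirrors the step from "a man saw Mary" to "yesterday a man saw Mary". The sketch always operates on the first matching node; a full implementation would address nodes explicitly, for instance by Gorn address.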
Definition 3 A syntactic tree is an ordered, labelled tree $\langle V, E, r \rangle$ such that the label $l(v) \in N$ for each vertex $v$ with out-degree at least 1 and $l(v) \in (N \cup T \cup \{\varepsilon\})$ for each leaf $v$.

Definition 4 An auxiliary tree is a syntactic tree $\langle V, E, r \rangle$ such that there is a unique leaf $f$, marked as foot node, with $l(f) = l(r)$. An auxiliary tree is denoted as $\langle V, E, r, f \rangle$.

Definition 5 An initial tree is a non-auxiliary syntactic tree.

With the basic concepts defined, we can now define TAG and the related operations.

Definition 6 A Tree Adjoining Grammar is a tuple $G = \langle N, T, S, I, A \rangle$, where
- $N$, $T$ are disjoint alphabets of non-terminals and terminals,
- $S \in N$ is a start symbol,
- $I$ is a finite set of initial trees and $A$ is a finite set of auxiliary trees.
The trees in the set $I \cup A$ are called elementary trees.

Definition 7 (Substitution) Let $\gamma = \langle V, E, r \rangle$ be a syntactic tree, $\gamma' = \langle V', E', r' \rangle$ be an initial tree and $v \in V$. The result of substituting $\gamma'$ into $\gamma$ at node $v$, denoted as $\gamma[v, \gamma']$, is defined as follows:
- if $v$ is not a leaf or $v$ is a foot node or $l(v) \neq l(r')$, then $\gamma[v, \gamma']$ is not defined,
- otherwise, $\gamma[v, \gamma'] = \langle V'', E'', r \rangle$ with

$$V'' = V \cup V' \setminus \{v\}, \tag{1}$$

and

$$E'' = \big(E \setminus \{\langle v_1, v_2 \rangle \mid v_2 = v \text{ and } v_1 \in V\}\big) \cup E' \cup \{\langle v_1, r' \rangle \mid \langle v_1, v \rangle \in E\}. \tag{2}$$

The substitution operation is illustrated in Fig. 4a.

Definition 8 (Adjunction) Let $\gamma = \langle V, E, r \rangle$ be a syntactic tree, $\gamma' = \langle V', E', r', f \rangle$ be an auxiliary tree and $v \in V$ with out-degree at least 1.
The result of adjoining $\gamma'$ into $\gamma$ at node $v$, denoted as $\gamma[v, \gamma']$, is defined as follows:
- if $l(v) \neq l(r')$, then $\gamma[v, \gamma']$ is undefined,
- else $\gamma[v, \gamma'] = \langle V'', E'', r'' \rangle$ with

$$V'' = V \cup V' \setminus \{v\}, \tag{3}$$

and

$$E'' = \big(E \setminus \{\langle v_1, v_2 \rangle \mid v_1 = v \text{ or } v_2 = v\}\big) \cup E' \cup \{\langle v_1, r' \rangle \mid \langle v_1, v \rangle \in E\} \cup \{\langle f, v_2 \rangle \mid \langle v, v_2 \rangle \in E\}. \tag{4}$$

The adjunction operation is illustrated in Fig. 4b.

Recall that a tree obtained by performing an arbitrary number of valid substitution and adjunction operations on an initial tree $\gamma = \langle V, E, r \rangle$ with $l(r) = S$ is called a derived tree (for example, as in Fig. 1). Also recall that the substitution and adjunction operations performed can be represented in a tree representation called the derivation tree (for example, as in Fig. 3). A derived tree is said to be saturated if all leaves of the derived tree belong to the set $T$ and cannot be further substituted. The corresponding derivation tree is also said to be saturated.

Definition 9 (Tree language and string language) Let $G = \langle N, T, S, I, A \rangle$ be a TAG. The tree language $L_T(G)$ of grammar $G$ is defined as the set of all saturated derived trees in $G$ with root $S$. The string language $L(G)$ of $G$ is the set of yields of the trees in $L_T(G)$.

3 TAG Description of Dynamical Systems

In this section, we define a notion of model set based on TAG and propose a TAG for a generic class of dynamical models, the polynomial NARMAX class.

3.1 Model set

Consider the following discrete-time input-output representation of a non-linear dynamical model

$$y_k = f(u_k, \ldots, u_{k-n_u}, y_{k-1}, \ldots, y_{k-n_y}, \xi_{k-1}, \ldots, \xi_{k-n_\xi}) + \xi_k, \tag{5}$$

where $u_k, y_k \in \mathbb{R}$ are the input and output signals at time instant $k$, $\xi_k \sim \mathcal{N}(0, \sigma_\xi^2)$ is a noise signal independent of the input $u$, the constants $n_u$, $n_y$ and $n_\xi$ are the corresponding maximum time lags, and the non-linear function $f(\cdot)$ belongs to an arbitrary set of functions $\mathcal{M}$. In PEM, the set of functions $\mathcal{M}$, also known as the model set, along with a specified choice for $n_u$, $n_y$ and $n_\xi$, is determined by a user based on expert knowledge, prior information and informative experiments.

It will be demonstrated in Sec. 3.2 that TAG can be used to generate trees that yield non-linear functions $f(\cdot)$ with desirable structural properties and varying choices of arguments (time lags of the involved $u$, $y$ and $\xi$ signals). This capability of TAG leads to a more general notion of model set $\mathcal{M}$. In order to formalize this concept, we introduce a function $\Pi_f(u, y, \xi, k)$ that maps the function $f$ to the right-hand-side expression in (5) (in string form). We can now define a new notion of model set, based on TAG, as follows.

Definition 10 For a given TAG $G$, the corresponding model set $\mathcal{M}(G)$ is defined as the set of models in the form of (5) such that $\Pi_f(u, y, \xi, k) \in L_T(G)$.

Note that this is a more general notion of model set than that used in PEM. In PEM, a model set is typically determined by choosing a fixed model structure along with a suitable parameterization (i.e., model complexity). On the other hand, in this work, the choice of initial and auxiliary trees of a TAG automatically determines the model set. The advantage of such a declaration of a model set is that, when no prior information is available, the model set can be chosen to span a number of commonly used model classes without a prior specification of the model complexity.
On the other hand, when prior information on the structure or complexity of the model is available, the grammar can be suitably refined to restrict the model set. In the subsequent sections, we propose a TAG for a generic model class, and demonstrate that the resulting model set spans a number of model structures commonly used in PEM.

3.2 The polynomial NARMAX model class

The NARMAX model class is a flexible class of non-linear input-output dynamical models, see Leontaritis and Billings (1985). The polynomial NARMAX model class is the set of all NARMAX models where the non-linear relationships are of the polynomial kind. Polynomial NARMAX is a convenient model representation since any continuous function on a compact set can be approximated arbitrarily well using polynomial functions (based on Weierstrass' theorem, see Stone (1948)). Furthermore, the family of polynomial NARMAX models includes, as special cases, other commonly used model classes such as FIR and ARMAX. It will be shown that these models can be generated by suitably restricting the TAG presented here.

Fig. 5. Initial Trees $I$ of TAG $G_N$.

A discrete-time SISO polynomial NARMAX model can be represented as (see Billings (2013))

$$y_k = \theta_0 + \sum_{i_1=1}^{n} \theta_{i_1} x_{i_1,k} + \sum_{i_1=1}^{n} \sum_{i_2=i_1}^{n} \theta_{i_1 i_2} x_{i_1,k} x_{i_2,k} + \ldots + \sum_{i_1=1}^{n} \cdots \sum_{i_l=i_{l-1}}^{n} \theta_{i_1 i_2 \ldots i_l} x_{i_1,k} x_{i_2,k} \ldots x_{i_l,k} + \xi_k, \tag{6}$$

where $l$ is the order of the polynomial non-linearity, $\theta_{i_1 i_2 \ldots i_m}$ are the model parameters, and $x_k = (x_{1,k} \; \cdots \; x_{n_y+n_u+n_\xi+1,k})^\top$ is a vector consisting of the past input, output and noise values building up the regressors

$$x_{m,k} = \begin{cases} y_{k-m} & 1 \le m \le n_y, \\ u_{k-(m-n_y-1)} & n_y+1 \le m \le n_y+n_u+1, \\ \xi_{k-(m-n_y-n_u-1)} & n_y+n_u+2 \le m \le n_y+n_u+n_\xi+1. \end{cases} \tag{7}$$

We will also use the following alternative and equivalent representation for polynomial NARMAX models:

$$y_k = \sum_{i=1}^{p} c_i \prod_{j=0}^{n_u} u_{k-j}^{b_{i,j}} \prod_{l=1}^{n_\xi} \xi_{k-l}^{d_{i,l}} \prod_{m=1}^{n_y} y_{k-m}^{a_{i,m}} + \xi_k, \tag{8}$$

where $p$ is the number of model terms, $c_i$ are the model parameters, and $a_{i,m}, b_{i,j}, d_{i,l} \in \mathbb{N}$ are the exponents for the output, input and noise terms.

3.3 Proposed TAG representation

In this section we propose a TAG for the polynomial NARMAX model class. The proposed TAG captures the structural relationships in (8). In the sequel, the time index will be dropped in the context of the proposed TAG, as $q^{-1}$ will be used to denote a backward time shift. For convenience, we introduce the following notation. For a given model in the form of (8), define $J_i := \{j \in \mathbb{N}_{\ge 0} \mid b_{i,j} \neq 0\}$, $L_i := \{l \in \mathbb{N}_{>0} \mid d_{i,l} \neq 0\}$ and $M_i := \{m \in \mathbb{N}_{>0} \mid a_{i,m} \neq 0\}$. For the $i$th model term, the sequences of delays in the input, noise and output factors are denoted by $\{\bar{j}^{(i)}_n\}_{n \in J_i}$, $\{\bar{l}^{(i)}_n\}_{n \in L_i}$ and $\{\bar{m}^{(i)}_n\}_{n \in M_i}$, respectively.
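To make representation (8) and the index sets $J_i$, $L_i$, $M_i$ concrete, the following Python sketch stores each model term as a coefficient plus three lag-to-exponent maps. The dictionary layout and the function names `delay_sets` and `narmax_rhs` are assumptions made for this illustration only; they do not appear in the paper.

```python
from math import prod


def delay_sets(term):
    """Return (J_i, L_i, M_i): the lags with non-zero exponent in the
    input, noise and output factors of one model term of (8)."""
    J = sorted(j for j, b in term.get("u", {}).items() if b != 0)
    L = sorted(l for l, d in term.get("xi", {}).items() if d != 0)
    M = sorted(m for m, a in term.get("y", {}).items() if a != 0)
    # Causality as in (8): input lags start at 0, noise and output lags at 1.
    if any(j < 0 for j in J) or any(l < 1 for l in L) or any(m < 1 for m in M):
        raise ValueError("term violates the lag ranges of (8)")
    return J, L, M


def narmax_rhs(terms, u, y, xi, k):
    """Evaluate the right-hand side of (8) at time index k.

    `terms` is a list of dicts {"c": c_i, "u": {j: b_ij},
    "xi": {l: d_il}, "y": {m: a_im}}; `u`, `y`, `xi` are sequences
    indexed by time.
    """
    total = 0.0
    for term in terms:
        delay_sets(term)  # validates the lags
        total += (
            term["c"]
            * prod(u[k - j] ** b for j, b in term.get("u", {}).items())
            * prod(xi[k - l] ** d for l, d in term.get("xi", {}).items())
            * prod(y[k - m] ** a for m, a in term.get("y", {}).items())
        )
    return total + xi[k]


# Hypothetical model: y_k = 0.5 y_{k-1}^2 + 2 u_k u_{k-1} - xi_{k-1} u_{k-2} + xi_k
example_terms = [
    {"c": 0.5, "y": {1: 2}},
    {"c": 2.0, "u": {0: 1, 1: 1}},
    {"c": -1.0, "xi": {1: 1}, "u": {2: 1}},
]
```

For the second term of `example_terms`, `delay_sets` returns `([0, 1], [], [])`, i.e. $J_2 = \{0, 1\}$ and $L_2 = M_2 = \emptyset$. Storing only the non-zero exponents keeps the representation independent of any fixed choice of $n_u$, $n_y$ and $n_\xi$, which matches the way the grammar leaves the model complexity open.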
Theorem 1 Consider the TAG $G_N = \langle N, T, S, I, A \rangle$ with
- $N = \{\text{expr0}, \text{expr1}, \text{expr2}, \text{op}, \text{par}\}$,
- $T = \{u, y, \xi, +, c, \times, q^{-1}\}$,
- $S = \text{expr0}$,
- $I = \{\alpha_1\}$, where the initial tree $\alpha_1$ is depicted in Fig. 5,
- $A = \{\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6, \beta_7\}$, where the auxiliary trees $\beta_i$ are depicted in Fig. 6.
The model set $\mathcal{M}(G_N)$ is equivalent to the set of all models that can be expressed as (8) with finite values of $p$, $n_u$, $n_y$ and $n_\xi$.

Fig. 6. Auxiliary trees $A$ of TAG $G_N$.

PROOF. For the first part of the proof, we show that for any polynomial NARMAX model in the form of (8), there exists a derivation tree such that the resulting derived tree has a yield that is equal to the RHS of (8). Algorithm 1 constructs such a derivation tree for a given polynomial NARMAX model. The procedure Delays($\gamma$, $v$, $n$) adjoins $n$ copies of the auxiliary tree $\beta_7$ to the derivation tree $\gamma$ at vertex $v$. The algorithm constructs the derivation tree by introducing the first factor ($u$, $y$ or $\xi$) of each of the $p$ model terms, and subsequently building each of the branches by introducing the remaining factors with the corresponding delays and exponents.

For the second part of the proof, it needs to be shown that all expressions in $L(G_N)$, i.e., yields of all possible trees generated by $G_N$, are RHS expressions of polynomial NARMAX models. This is proven by structural induction.
We first observe that the simplest tree in $L(G_N)$ is the initial tree $\alpha_1$ with the yield $\xi$. This corresponds to the model

$$y_k = \xi_k, \tag{9}$$

which belongs to the polynomial NARMAX class. Now, consider an arbitrary saturated derived tree $\gamma \in L_T(G_N)$ whose yield is the RHS of a polynomial NARMAX model. This implies that the yield is a polynomial expression in terms of the factors $u$, $y$ and $\xi$. To complete the induction, it must be shown that any possible adjunction to $\gamma$ results in a new tree in $L_T(G_N)$ whose yield is also a polynomial expression in terms of the aforementioned factors. For convenience, the auxiliary trees are grouped based on the operators involved: $\beta_1, \beta_2, \beta_3$ are called additive-type auxiliary trees, $\beta_4, \beta_5, \beta_6$ are called multiplicative-type, and $\beta_7$ is called the delay-type auxiliary tree. The following adjunctions can be made on $\gamma$:

• adjunction of an additive-type tree. Such an adjunction introduces an input, output or noise term additively in the expression while respecting the causality of the expression. Hence the resulting expression is also a polynomial;
• adjunction of a multiplicative-type tree. This simply introduces multiplicative factors to an existing model term, and hence, the resulting expression is also a polynomial;
• adjunction of a delay-type tree. This operation simply adds delays to an existing monomial, and hence preserves the polynomial structure of the expression.

Since all possible operations yield a causal polynomial expression, it can be concluded that $L(G_N)$ consists only of dynamical polynomial expressions in terms of the factors $u$, $y$ and $\xi$, which correspond to polynomial NARMAX models. This concludes the proof. □

Algorithm 1 Parse NARMAX model (8) to derivation tree.
Require: $p$, $J_i$, $L_i$, $M_i$, $\{\bar{j}^{(i)}_n\}_{n \in J_i}$, $\{\bar{l}^{(i)}_n\}_{n \in L_i}$, $\{\bar{m}^{(i)}_n\}_{n \in M_i}$
 1: $V \leftarrow \{v_0\}$; $l(v_0) \leftarrow \alpha_1$   ▷ initialize with start tree
 2: $r \leftarrow v_0$
 3: $V \leftarrow \bigcup_{i=1}^{p} \{v_{i,1}\} \cup V$   ▷ insert $p$ vertices to begin the $p$ summation branches
 4: $E \leftarrow \bigcup_{i=2}^{p} \{\langle v_{i-1,1}, v_{i,1} \rangle\} \cup \{\langle v_0, v_{1,1} \rangle\}$
 5: for $i \leftarrow 1, p$ do
 6:     if $J_i \neq \emptyset$ then   ▷ if there is an input factor in the $i$-th term
 7:         $l(v_{i,1}) \leftarrow \beta_1$   ▷ for each summation branch, assign the appropriate label to the first vertex
 8:         $\langle V, E, r \rangle \leftarrow \mathrm{Delays}(\langle V, E, r \rangle, v_{i,1}, \bar{j}^{(i)}_1)$   ▷ adjoin delay trees
 9:         $b_{i,\bar{j}^{(i)}_1} \leftarrow b_{i,\bar{j}^{(i)}_1} - 1$   ▷ reduce the corresponding exponent by 1
10:     else if $L_i \neq \emptyset$ then
11:         $l(v_{i,1}) \leftarrow \beta_3$
12:         $\langle V, E, r \rangle \leftarrow \mathrm{Delays}(\langle V, E, r \rangle, v_{i,1}, \bar{l}^{(i)}_1)$
13:         $d_{i,\bar{l}^{(i)}_1} \leftarrow d_{i,\bar{l}^{(i)}_1} - 1$
14:     else if $M_i \neq \emptyset$ then
15:         $l(v_{i,1}) \leftarrow \beta_2$
16:         $\langle V, E, r \rangle \leftarrow \mathrm{Delays}(\langle V, E, r \rangle, v_{i,1}, \bar{m}^{(i)}_1)$
17:         $a_{i,\bar{m}^{(i)}_1} \leftarrow a_{i,\bar{m}^{(i)}_1} - 1$
18:     $s_i \leftarrow 1$   ▷ counter for multiplying remaining factors
19:     for all $j \in J_i$ do
20:         $V \leftarrow \bigcup_{n=1}^{b_{i,j}} \{v_{i,s_i+n,1}\} \cup V$; $l(v_{i,s_i+n,1}) \leftarrow \beta_4$
21:         $E \leftarrow \bigcup_{n=1}^{b_{i,j}} \{\langle v_{i,s_i+n-1,1}, v_{i,s_i+n,1} \rangle\} \cup E$
22:         for $n \leftarrow 1, \bar{b}^{(i)}_j$ do   ▷ adjoin delays for multiple factors
23:             $\langle V, E, r \rangle \leftarrow \mathrm{Delays}(\langle V, E, r \rangle, v_{i,s_i+n,1}, j)$
24:         $s_i \leftarrow s_i + \bar{b}^{(i)}_j$
25:     for all $l \in L_i$ do
26:         $V \leftarrow \bigcup_{n=1}^{d_{i,l}} \{v_{i,s_i+n,1}\} \cup V$; $l(v_{i,s_i+n,1}) \leftarrow \beta_6$
27:         $E \leftarrow \bigcup_{n=1}^{d_{i,l}} \{\langle v_{i,s_i+n-1,1}, v_{i,s_i+n,1} \rangle\} \cup E$
28:         for $n \leftarrow 1, \bar{d}^{(i)}_l$ do
29:             $\langle V, E, r \rangle \leftarrow \mathrm{Delays}(\langle V, E, r \rangle, v_{i,s_i+n,1}, l)$
30:         $s_i \leftarrow s_i + \bar{d}^{(i)}_l$
31:     for all $m \in M_i$ do
32:         $V \leftarrow \bigcup_{n=1}^{a_{i,m}} \{v_{i,s_i+n,1}\} \cup V$; $l(v_{i,s_i+n,1}) \leftarrow \beta_5$
33:         $E \leftarrow \bigcup_{n=1}^{a_{i,m}} \{\langle v_{i,s_i+n-1,1}, v_{i,s_i+n,1} \rangle\} \cup E$
34:         for $n \leftarrow 1, \bar{a}^{(i)}_m$ do
35:             $\langle V, E, r \rangle \leftarrow \mathrm{Delays}(\langle V, E, r \rangle, v_{i,s_i+n,1}, m)$
36:         $s_i \leftarrow s_i + \bar{a}^{(i)}_m$
    return $\langle V, E, r \rangle$

Theorem 1 demonstrates that structural properties of a rich class of dynamical models can be captured within a compact set of trees of a TAG.
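The induction step of the proof can be mimicked at the level of the yield. The following Python sketch is a simplified illustration, not the authors' implementation: it ignores the tree structure and only tracks the monomials of the right-hand side. The function names (`add_term`, `multiply_term`, `delay_factor`, `render`) are hypothetical, and the minimum delays in `MIN_DELAY` (inputs undelayed, outputs and noise delayed by at least one sample) are an assumption inferred from the example models of the form (16).

```python
from collections import Counter

# Assumed minimum delays: inputs may enter undelayed, while output and noise
# factors enter with at least one delay (this keeps the expression causal).
MIN_DELAY = {"u": 0, "y": 1, "xi": 1}


def add_term(model, signal):
    """Additive-type step: append a new monomial with a single factor."""
    return model + [[(signal, MIN_DELAY[signal])]]


def multiply_term(model, term, signal):
    """Multiplicative-type step: append one factor to monomial `term`."""
    new = [list(t) for t in model]
    new[term].append((signal, MIN_DELAY[signal]))
    return new


def delay_factor(model, term, factor):
    """Delay-type step: shift one factor of monomial `term` by one sample."""
    new = [list(t) for t in model]
    signal, delay = new[term][factor]
    new[term][factor] = (signal, delay + 1)
    return new


def is_causal_polynomial(model):
    """Invariant of the induction: every factor respects its minimum delay."""
    return all(d >= MIN_DELAY[s] for t in model for s, d in t)


def render(model):
    """Write the model as text; the trailing xi[k] comes from alpha_1."""
    terms = []
    for n, t in enumerate(model, start=1):
        factors = []
        for (s, d), power in sorted(Counter(t).items()):
            name = f"{s}[k]" if d == 0 else f"{s}[k-{d}]"
            factors.append(name if power == 1 else f"{name}^{power}")
        terms.append(f"c{n}*" + "*".join(factors))
    return "y[k] = " + " + ".join(terms + ["xi[k]"])


# Build a polynomial NARMAX right-hand side one step at a time.
model = []                             # y[k] = xi[k]
model = add_term(model, "y")           # + c1*y[k-1]
model = multiply_term(model, 0, "y")   # c1*y[k-1]^2
model = add_term(model, "u")           # + c2*u[k]
model = add_term(model, "xi")          # + c3*xi[k-1]
model = multiply_term(model, 2, "xi")  # c3*xi[k-1]^2
model = delay_factor(model, 2, 1)      # c3*xi[k-1]*xi[k-2]
print(render(model))
```

Each of the three operations returns a list of monomials again, which is the yield-level counterpart of the statement that every adjunction preserves the causal polynomial structure.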
The expansive representational capability of TAG can be exploited using EAs such as GP to identify models without prior specification of structure and complexity, as demonstrated in Khandelwal et al. (2019b). Furthermore, Algorithm 1 provides a method to compute the derivation tree representation of a given polynomial NARMAX model in terms of grammar $G_N$. Consequently, available prior information about the model of the system can be translated to TAG representation (or incorporated in tree sets $I$, $A$), thereby making the evolutionary search more efficient. Hence, the use of TAG enables identification within a larger class of dynamical models without requiring user-interaction, while simultaneously allowing the user to restrict the evolutionary search effectively.

4 Illustrations

In this section we discuss aspects of TAG useful for EA-based SI. We demonstrate the use of TAG $G_N$ to generate polynomial NARMAX models. It is also shown that models belonging to simpler model classes can be generated by scaling down the set of elementary trees of $G_N$ appropriately. Furthermore, more flexible model classes can be represented by scaling up the set of elementary trees. This is demonstrated by extending the proposed TAG to generate Non-linear Box-Jenkins (NBJ) models.

4.1 Model generation using $G_N$

Three illustrative examples are used to demonstrate the generation of models using $G_N$. The models generated belong to the ARX, polynomial NARX and polynomial NARMAX model classes. It will be demonstrated that by restricting the elementary trees $I$ and $A$ to subsets of the elementary trees in the proposed TAG $G_N$, we can generate models that only belong to model sub-classes that are properly included in the set of polynomial NARMAX models, such as FIR and truncated Volterra series.

Fig. 7. Illustrative examples, each showing a derivation tree (A), a derived tree (B) and the symbolic model. (a) Example 1: $y_k = c_1 y_{k-1} + c_2 u_k + \xi_k$. (b) Example 2: $y_k = c_1 y_{k-1}^2 + c_2 u_k + \xi_k$. (c) Example 3: $y_k = c_1 y_{k-1}^2 + c_2 u_k + c_3 \xi_{k-1} \xi_{k-2} + \xi_k$.

4.1.1 ARX example

ARX models can be described by the equation

$$y_k = \sum_{i=0}^{n_u} b_i u_{k-i} + \sum_{j=1}^{n_y} a_j y_{k-j} + \xi_k, \tag{10}$$

where $a_j, b_i \in \mathbb{R}$ are coefficients. The grammar $G_N$ can be used to generate ARX models by restricting the auxiliary tree set $A$ as

$$A' = \{\beta_1, \beta_2, \beta_7\} \subset A. \tag{11}$$

Consider the example depicted in Fig. 7a. Tree (A) is a derivation tree with initial tree $\alpha_1$ at the root node, and auxiliary trees $\beta_1$ and $\beta_2$ in subsequent vertices. The edges are labelled with Gorn addresses of vertices in the auxiliary trees at which adjunctions take place. Performing the adjunctions results in derived tree (B) in Fig. 7a. The RHS of the resulting model appears at the leaves of the derived tree, and the corresponding model is

$$y_k = c_1 y_{k-1} + c_2 u_k + \xi_k. \tag{12}$$

4.1.2 NARX example

Polynomial NARX models can be described by the equation

$$y_k = \sum_{i=1}^{p} c_i \prod_{j=0}^{n_u} u_{k-j}^{b_{i,j}} \prod_{m=1}^{n_y} y_{k-m}^{a_{i,m}} + \xi_k. \tag{13}$$

By restricting auxiliary trees to the set

$$A'' = \{\beta_1, \beta_2, \beta_4, \beta_5, \beta_7\} \subset A \tag{14}$$

we can restrict the proposed grammar to generate polynomial NARX models only. Consider the example derivation tree (A) in Fig. 7b, which is an extension of the previous example. The derivation tree consists of the initial tree $\alpha_1$, and auxiliary trees $\beta_1$, $\beta_2$ and $\beta_4$. Performing the adjunctions described by the derivation tree results in the derived tree (B) in Fig. 7b. The corresponding symbolic model is

$$y_k = c_1 y_{k-1}^2 + c_2 u_k + \xi_k. \tag{15}$$

Fig. 8. Initial and auxiliary trees ($I$ and $A$) of TAG $G_{\mathrm{NBJ}}$: initial tree $\alpha_1$ and auxiliary trees $\beta_1$ to $\beta_{11}$.

4.1.3 NARMAX example

This example builds on the previous example by using the complete auxiliary tree set $A$ and adjoining trees $\beta_3$, $\beta_6$ and $\beta_7$ to the tree $\beta_2$. The new derivation tree and derived tree are depicted in Fig. 7c. The corresponding model,

$$y_k = c_1 y_{k-1}^2 + c_2 u_k + c_3 \xi_{k-1} \xi_{k-2} + \xi_k, \tag{16}$$

is a polynomial NARMAX model.
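The effect of the restricted auxiliary tree sets in (11) and (14) can be illustrated with a small lookup. The Python sketch below is a hypothetical helper, not part of the proposed method: given the auxiliary trees that appear in a derivation tree, it reports the smallest of the three model classes discussed above that can generate it. The labels `beta1` to `beta7` and the function name `smallest_model_class` are illustrative, and the full set $A$ is assumed to consist of $\beta_1$ to $\beta_7$, as grouped in the proof of Theorem 1.

```python
# Auxiliary tree subsets taken from (11) and (14); the full set A of G_N is
# assumed to be beta1..beta7.
A_FULL = frozenset(f"beta{i}" for i in range(1, 8))
A_ARX = frozenset({"beta1", "beta2", "beta7"})
A_NARX = frozenset({"beta1", "beta2", "beta4", "beta5", "beta7"})


def smallest_model_class(used_trees):
    """Return the smallest listed class whose tree set covers `used_trees`."""
    used = frozenset(used_trees)
    unknown = used - A_FULL
    if unknown:
        raise ValueError(f"not auxiliary trees of G_N: {sorted(unknown)}")
    if used <= A_ARX:
        return "ARX"
    if used <= A_NARX:
        return "polynomial NARX"
    return "polynomial NARMAX"


print(smallest_model_class({"beta1", "beta2"}))
print(smallest_model_class({"beta1", "beta2", "beta4"}))
print(smallest_model_class({"beta1", "beta2", "beta3", "beta6", "beta7"}))
```

Because membership is decided by set inclusion alone, further sub-classes such as FIR could be added as additional subsets once their tree sets are specified; those sets are not given here and are therefore left out.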
4.2 Non-linear Box-Jenkins Extension

Just as the proposed grammar can be scaled down to generate specific dynamic sub-classes, it can also be extended to generate models that belong to a more general class of models. We illustrate this by extending the proposed grammar to a more general model structure, Non-linear Box-Jenkins (NBJ).

In the case of linear systems, a Box-Jenkins model structure is an extension of the Output Error (OE) model structure, where the error is modelled as an ARMA process (Ljung, 1999). The BJ class also includes, as special cases, other linear model structures such as ARMAX and OE. In the same spirit, the NBJ model structure can be expressed as a Non-linear Output Error (NOE) model where the error is subsequently modelled as a NARMA process. The NBJ model structure is given by the following equations

$$\begin{aligned}
\hat{y}_k &= f(\hat{y}_{k-1}, \ldots, \hat{y}_{k-n_y}, u_k, \ldots, u_{k-n_u}), \\
v_k &= g(v_{k-1}, \ldots, v_{k-n_v}, u_k, \ldots, u_{k-n_u}, \xi_{k-1}, \ldots, \xi_{k-n_\xi}) + \xi_k, \\
y_k &= \hat{y}_k + v_k,
\end{aligned} \tag{17}$$

where $f(\cdot)$ and $g(\cdot)$ are polynomial functions in terms of their arguments. Notice that the RHS expressions of the equations describing the process and noise dynamics have the same structure that was studied in Sec. 3.2 for NARMAX models (see (8)). Hence, the proposed TAG can be extended to generate NBJ models. Fig. 8 depicts the initial and auxiliary trees of the grammar $G_{\mathrm{NBJ}}$ for NBJ model structures. The structure of the initial tree $\alpha_1$ ensures that all elements in $L(G_{\mathrm{NBJ}})$ contain two expressions, separated by a comma, that represent the functions $f(\cdot)$ and $g(\cdot)$ respectively. Each of these expressions can be expanded by adjoining auxiliary trees that ensure that the polynomial structure is maintained.

5 Conclusions

We presented a TAG-based concept of a model set that is more general than that commonly used in the system identification literature.
A TAG $G_N$ was proposed that captures the dynamical structure of polynomial NARMAX models. It was demonstrated that sub-classes of the polynomial NARMAX class can be represented by choosing an appropriate subset of the elementary trees of $G_N$. Similarly, more flexible model classes like Non-linear Box-Jenkins can be represented by extending the set of elementary trees. This illustrates that a compact set of elementary trees can be used to express the dynamical relationships across a variety of model classes, thereby enabling the design of TAG-based EA approaches for SI that require minimal user-interaction. The practical soundness of this concept has been demonstrated in Khandelwal et al. (2019b), where a TAG-based EA approach was used to identify a non-linear benchmark dataset with minimal user-interaction, and also in Khandelwal et al. (2019a), where the same TAG-based EA approach is used to identify multiple real physical systems and benchmark data sets with minimal changes in the methodology itself.

References

Arias-Montano, A., Coello, C. A. C., and Mezura-Montes, E. (2012). Multiobjective evolutionary algorithms in aeronautical and aerospace engineering. IEEE Transactions on Evolutionary Computation, 16(5):662–694.
Billings, S. A. (2013). Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains. John Wiley & Sons.
Eiben, A. E., Smith, J. E., et al. (2003). Introduction to evolutionary computing, volume 53. Springer.
Fonseca, C. M. and Fleming, P. J. (1996). Non-linear system identification with multiobjective genetic algorithms. In Proc. of 13th IFAC World Congress, pages 1169–1174, San Francisco, USA.
Gorn, S. (1965). Explicit definitions and linguistic dominoes.
Gray, G. J., Murray-Smith, D. J., Li, Y., Sharman, K. C., and Weinbrenner, T. (1998). Nonlinear model structure identification using genetic programming. Control Engineering Practice, 6(11):1341–1352.
Joshi, A. K. and Schabes, Y. (1997). Tree-adjoining grammars. In Handbook of formal languages, pages 69–123. Springer.
Kallmeyer, L. (2009). A declarative characterization of different types of multicomponent tree adjoining grammars. Research on Language and Computation, 7(1):55–99.
Khandelwal, D., Schoukens, M., and Tóth, R. (2019a). Data-driven modelling of dynamical systems using tree adjoining grammar and genetic programming. In Proc. of the IEEE Congress on Evolutionary Computation, pages 2673–2680, Wellington, New Zealand.
Khandelwal, D., Schoukens, M., and Tóth, R. (2019b). Grammar-based representation and identification of dynamical systems. In Proc. of the 18th European Control Conference (ECC), pages 1318–1323, Naples, Italy.
Kristinsson, K. and Dumont, G. A. (1992). System identification and control using genetic algorithms. IEEE Transactions on Systems, Man, and Cybernetics, 22(5):1033–1046.
Laurain, V., Tóth, R., Piga, D., and Darwish, M. A. H. (2020). Sparse RKHS estimation via globally convex optimization and its application in LPV-IO identification. Automatica, 115:108914.
Leontaritis, I. and Billings, S. A. (1985). Input-output parametric models for non-linear systems part I: deterministic non-linear systems. International Journal of Control, 41(2):303–328.
Ljung, L., editor (1999). System Identification (2nd Ed.): Theory for the User. Prentice Hall PTR.
Madár, J., Abonyi, J., and Szeifert, F. (2005). Genetic programming for the identification of nonlinear input-output models. Industrial & Engineering Chemistry Research, 44(9):3178–3186.
Pillonetto, G., Chiuso, A., and De Nicolao, G. (2011). Prediction error identification of linear systems: a nonparametric Gaussian regression approach. Automatica, 47(2):291–305.
Quade, M., Abel, M., Shafi, K., Niven, R. K., and Noack, B. R. (2016). Prediction of dynamical systems by symbolic regression. Physical Review E, 94(1):012214.
Rodríguez-Vázquez, K. and Fleming, P. J. (2000). Use of genetic programming in the identification of rational model structures. In Proc. of the European Conference on Genetic Programming, pages 181–192, Berlin, Heidelberg.
Rodriguez-Vazquez, K., Fonseca, C. M., and Fleming, P. J. (2004). Identifying the structure of nonlinear dynamic systems using multiobjective genetic programming. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 34(4):531–545.
Rojas, C. R., Tóth, R., and Hjalmarsson, H. (2014). Sparse estimation of polynomial and rational dynamical models. IEEE Trans. Automat. Contr., 59(11):2962–2977.
Stone, M. H. (1948). The generalized Weierstrass approximation theorem. Mathematics Magazine, 21(5):237–254.