Regular Expression Matching and Operational Semantics

Many programming languages and tools, ranging from grep to the Java String library, contain regular expression matchers. Rather than first translating a regular expression into a deterministic finite automaton, such implementations typically match the regular expression on the fly. Thus they can be seen as virtual machines interpreting the regular expression much as if it were a program with some non-deterministic constructs such as the Kleene star. We formalize this implementation technique for regular expression matching using operational semantics. Specifically, we derive a series of abstract machines, moving from the abstract definition of matching to increasingly realistic machines. First a continuation is added to the operational semantics to describe what remains to be matched after the current expression. Next, we represent the expression as a data structure using pointers, which enables redundant searches to be eliminated via testing for pointer equality. From there, we arrive both at Thompson's lockstep construction and a machine that performs some operations in parallel, suitable for implementation on a large number of cores, such as a GPU. We formalize the parallel machine using process algebra and report some preliminary experiments with an implementation on a graphics processor using CUDA.

Authors: Asiri Rathnayake (University of Birmingham, United Kingdom), Hayo Thielecke (University of Birmingham, United Kingdom)

M.A. Reniers, P. Sobocinski (Eds.): Workshop on Structural Operational Semantics 2011 (SOS 2011), EPTCS 62, 2011, pp. 31-45, doi:10.4204/EPTCS.62.3

1 Introduction

Regular expressions form a minimalistic language of pattern-matching constructs. Originally defined in Kleene's work on the foundations of computation, they have become ubiquitous in computing.
Their practical significance was boosted by Thompson's efficient construction [13] of a regular expression matcher based on the "lockstep" simulation of a Non-deterministic Finite Automaton (NFA), and by the wide use of regular expressions in Unix tools such as grep and awk.

The regular expression matchers used in such tools differ in detail from the implementation of regular expressions used in compiler construction for lexical analysis. In compiling, lexical analyzers are typically built by constructing a Deterministic Finite Automaton (DFA), using one of the standard results of automata theory. The DFA can process input very efficiently, but its construction incurs an additional overhead before any input can be matched. Moreover, the DFA construction only works if the matching language really is a regular language, so that it can be recognized by a DFA. Many matching languages add constructs that take the language beyond what a DFA can recognize, for instance back references. (By abuse of terminology, such extended languages are sometimes still referred to as "regexes".)

Recently, Cox [5] has given a rational reconstruction of Thompson's classic NFA matcher in terms of virtual machines. In essence, a regular expression is interpreted on the fly, much as a program in an interpreted programming language. The interpreter is a kind of virtual machine, with a small set of instructions suitable for running regular expressions. For instance, the Kleene star e* gives a form of non-deterministic loop. Cox emphasizes that the virtual machine approach in the style of Thompson is both flexible and efficient. Once a basic virtual machine for regular expressions is set up, other constructs such as back-references can be added with relative ease.
Moreover, the machine is much more efficient than other implementation techniques based on a more naive backtracking interpreter [4], which exhibit exponential run-time in some cases. Surprisingly, these inefficient matchers are widely used in Java and Perl [4].

In this paper, we formalize the view of regular expression matchers as machines by using tools from programming language theory, specifically operational semantics. We do so starting from the usual definition of regular expressions and their meaning, and then defining increasingly realistic machines. We first define some preliminaries and recall what it means for a string to match a regular expression in Section 2; from our perspective, matching is a simple form of big-step semantics, and we aim to refine it into a small-step semantics. To do so in Section 3, we introduce a distinction between a current expression and its continuation. We then refine this semantics by representing the regular expression as a syntax tree using pointers in memory (Section 4). Crucially, the pointer representation allows us to compare sub-expressions by pointer equality (rather than structurally). This pointer equality test is needed for the efficient elimination of redundant match attempts, which underlies the general lockstep NFA simulation presented in Section 5. We recover Thompson's machine as a sequential implementation of the lockstep construction (Section 6). Since the lockstep construction involves simulating many non-deterministic machines in parallel, we then explore a parallel version using some simple process algebra in Section 7. The parallel process semantics is then related to a prototype implementation we have written in CUDA [3] to run on a Graphics Processor Unit (GPU) in Section 8. Section 9 concludes with some future directions.
The overall plan of the paper can be visualised as follows, with each arrow naming the refinement that takes us from one machine to the next (the sequential and parallel matchers both refine the lockstep construction):

  Regular expression matching as big-step semantics (Sec. 2)
    -> EKW machine (Sec. 3), via small steps with continuations
    -> PWπ machine (Sec. 4), via the pointer representation
    -> Generic lockstep construction (Sec. 5), via macro steps
    -> Sequential matcher (Sec. 6), via sequential scheduling
    -> Parallel matcher (Sec. 7), via parallel scheduling
    -> Implementation on Graphics Processor (Sec. 8), via processes as threads in CUDA

2 Regular expression matching as a big-step semantics

Let Σ be a finite set, regarded as the input alphabet. We use the following abstract syntax for regular expressions:

  e ::= ε
  e ::= a        where a ∈ Σ
  e ::= e*
  e ::= e1 e2
  e ::= e1 | e2

We let e range over regular expressions, a over characters, and w over strings of characters. The empty string is written as ε. Note that there is also a regular expression constant ε. We also write the sequential composition e1 e2 as e1 • e2 when we want to emphasise it as the occurrence of an operator applied to e1 and e2, for instance in a syntax tree. For strings w1 and w2, we write their concatenation as juxtaposition w1 w2. A single character a is also regarded as a string of length 1.

Our starting point is the usual definition of what it means for a string w to match a regular expression e. We write this relation as e ↓ w, regarding it as a big-step operational semantics for a language with non-deterministic branching e1 | e2 and a non-deterministic loop e*. The rules are given in Figure 2.1.

  ------- (MATCH)              ------- (EPSILON)
   a ↓ a                        ε ↓ ε

  e1 ↓ w1    e2 ↓ w2
  ------------------ (SEQ)
  (e1 e2) ↓ (w1 w2)

  e ↓ w1    e* ↓ w2
  ----------------- (KLEENE1)  ------- (KLEENE2)
    e* ↓ (w1 w2)                e* ↓ ε

     e1 ↓ w                       e2 ↓ w
  ------------- (ALT1)         ------------- (ALT2)
  (e1 | e2) ↓ w                (e1 | e2) ↓ w

  Figure 2.1: Regular expression matching as a big-step semantics

Some of our operational semantics will use lists.
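To make the big-step rules concrete, the following Python sketch reads Figure 2.1 as a recursive procedure. The tuple encoding of expressions is an assumption of this sketch, not part of the paper; the guard on the star rule is a standard device to keep the recursion well-founded.

```python
# A direct reading of the big-step rules of Figure 2.1, with regular
# expressions encoded as tuples of our own choosing:
# ('eps',), ('chr', a), ('seq', e1, e2), ('alt', e1, e2), ('star', e).

def matches(e, w):
    """Decide e ↓ w by trying the applicable rules."""
    tag = e[0]
    if tag == 'eps':                      # (EPSILON): ε ↓ ε
        return w == ''
    if tag == 'chr':                      # (MATCH): a ↓ a
        return w == e[1]
    if tag == 'alt':                      # (ALT1)/(ALT2): either branch may match
        return matches(e[1], w) or matches(e[2], w)
    if tag == 'seq':                      # (SEQ): try every split w = w1 w2
        return any(matches(e[1], w[:i]) and matches(e[2], w[i:])
                   for i in range(len(w) + 1))
    if tag == 'star':
        if w == '':                       # (KLEENE2): e* ↓ ε
            return True
        # (KLEENE1): w = w1 w2 with e ↓ w1 and e* ↓ w2.  Requiring w1 to be
        # non-empty keeps the recursion well-founded without changing the
        # matched language.
        return any(matches(e[1], w[:i]) and matches(e, w[i:])
                   for i in range(1, len(w) + 1))
    raise ValueError(tag)
```

For the running example a** • b, `matches(('seq', ('star', ('star', ('chr', 'a'))), ('chr', 'b')), 'aab')` holds. The naive enumeration of splits already hints at the exponential search space that the later machines are designed to tame.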
We write h :: t for constructing a list with head h and tail t. The concatenation of two lists s and t is written as s @ t. For example, 1 :: [2] = [1, 2] and [1, 2] @ [3] = [1, 2, 3]. The empty list is written as [].

3 The EKW machine

The big-step operational semantics of matching in Figure 2.1 gives us little information about how we should attempt to match a given input string w. We define a small-step semantics, called the EKW machine, that makes the matching process more explicit. In the tradition of the SECD machine [7], the machine is named after its components: E for expression, K for continuation, W for word to be matched.

Definition 3.1 A configuration of the EKW machine is of the form ⟨e; k; w⟩ where e is a regular expression, k is a list of regular expressions, and w is a string. The transitions of the EKW machine are given in Figure 3.1. The accepting configuration is ⟨ε; []; ε⟩.

Here e is the regular expression the machine is currently focusing on. What remains to the right of the current expression is represented by k, the current continuation. The combination of e and k together is attempting to match w, the current input string. Note that many of the rules are fairly standard, specifically the pushing and popping of the continuation stack. The machine is non-deterministic. The paired rules with the same current expressions e* or (e1 | e2) give rise to branching in order to search for matches, where it is sufficient that one of the branches succeeds.
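The non-deterministic search that the EKW machine performs can be sketched as an exhaustive exploration of its configuration space. This is our own illustration, not the paper's construction: the visited set over whole configurations cuts the infinite loops discussed in Example 3.3, whereas Thompson-style redundancy elimination (Section 4 onwards) works on pointers instead.

```python
# Brute-force search over EKW configurations ⟨e; k; w⟩ following the
# transitions of Figure 3.1.  Expressions use the tuple encoding
# ('eps',), ('chr', a), ('seq', e1, e2), ('alt', e1, e2), ('star', e);
# the continuation k is a tuple used as a stack.

def ekw_accepts(e, w):
    EPS = ('eps',)

    def steps(cfg):
        e, k, w = cfg
        tag = e[0]
        if tag == 'alt':                         # rules (3.1), (3.2)
            yield (e[1], k, w)
            yield (e[2], k, w)
        elif tag == 'seq':                       # rule (3.3)
            yield (e[1], (e[2],) + k, w)
        elif tag == 'star':                      # rules (3.4), (3.5)
            yield (e[1], (e,) + k, w)
            yield (EPS, k, w)
        elif tag == 'chr' and w[:1] == e[1]:     # rule (3.6)
            yield (EPS, k, w[1:])
        elif tag == 'eps' and k:                 # rule (3.7)
            yield (k[0], k[1:], w)

    frontier, seen = [(e, (), w)], set()
    while frontier:
        cfg = frontier.pop()
        if cfg == (EPS, (), ''):                 # the accepting configuration
            return True
        if cfg in seen:                          # cut cycles such as Example 3.3
            continue
        seen.add(cfg)
        frontier.extend(steps(cfg))
    return False
```

On the configuration of Example 3.3, `ekw_accepts(('star', ('star', ('chr', 'a'))), 'a')` terminates with a successful match because the repeated configuration ⟨a**; []; a⟩ is recognised as already seen.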
Theorem 3.2 (Partial correctness) e ↓ w if and only if there is a run

  ⟨e; []; w⟩ → ··· → ⟨ε; []; ε⟩

  ⟨e1 | e2; k; w⟩ → ⟨e1; k; w⟩         (3.1)
  ⟨e1 | e2; k; w⟩ → ⟨e2; k; w⟩         (3.2)
  ⟨e1 e2; k; w⟩   → ⟨e1; e2 :: k; w⟩   (3.3)
  ⟨e*; k; w⟩      → ⟨e; e* :: k; w⟩    (3.4)
  ⟨e*; k; w⟩      → ⟨ε; k; w⟩          (3.5)
  ⟨a; k; a w⟩     → ⟨ε; k; w⟩          (3.6)
  ⟨ε; e :: k; w⟩  → ⟨e; k; w⟩          (3.7)

  Figure 3.1: EKW machine transition steps

Example 3.3 Unfortunately, while Theorem 3.2 ensures that all matching strings are correctly accepted, there is no guarantee that the machine accepts all strings that it should on every run. In fact, there are valid inputs on which the machine may enter an infinite loop; an example is the configuration ⟨a**; []; a⟩:

  ⟨a**; []; a⟩ → ⟨a*; [a**]; a⟩ → ⟨ε; [a**]; a⟩ → ⟨a**; []; a⟩ → ···

Such infinite loops can be prevented by backtracking and pruning. However, backtracking implementations can still take a very long time matching expressions like a** against a string consisting of, say, 1000 occurrences of a character a followed by some other b, due to the exponentially increasing search space [4].

In Thompson's matcher, such loops are avoided by means of redundancy elimination. The matcher checks whether it has encountered the same expression before. Note, however, that "the same" expression is to be taken in the sense of pointer equality rather than structural equality. For instance, the two occurrences of a in (a b) | (a c) would be taken as not the same, given their different positions in the syntax tree.

4 The PWπ machine

We refine the EKW machine by representing the regular expression as a data structure in a heap π, which serves as the program run by the machine.
That way, the machine can distinguish between different positions in the syntax tree.

Definition 4.1 A heap π is a finite partial function from addresses to values. There exists a distinguished address null, which is not mapped to any value.

In our setting, the values are syntax tree nodes, represented by an operator from the syntax of regular expressions together with pointers to the trees for the arguments (if any) of the operator. For example, for sequential composition, we have a node containing (p1 • p2), where the two pointers p1 and p2 point to the trees of the two expressions being composed.

  p    π(p)      cont p
  p0   p1 • p2   null
  p1   p3 *      p2
  p2   b         null
  p3   p4 *      p1
  p4   a         p3

  Figure 4.1: The regular expression a** • b as a tree with continuation pointers (in the accompanying diagram, solid arrows are the child pointers of the tree and dashed arrows are cont)

Definition 4.2 We write ⊗ for the partial operation of forming the union of two partial functions provided that their domains are disjoint. More formally, let f1 : A ⇀ B and f2 : A ⇀ B be two partial functions. Then if dom(f1) ∩ dom(f2) = ∅, the function (f1 ⊗ f2) : A ⇀ B is defined as f1 ⊗ f2 = f1 ∪ f2.

Note that ⊗ is the same as the operation ∗ on heaps in separation logic [11], and hence a partial commutative monoid. We avoid the notation ∗ as it could be confused with the Kleene star. As in separation logic, we use ⊗ to describe data structures with pointers in memory.

Definition 4.3 We write π, p |= e if p points to the root node of a regular expression e in a heap π.
The relation is defined by induction on e as follows:

  π, p |= a          if π(p) = a
  π, p |= ε          if π(p) = ε
  π, p |= (e1 | e2)  if π = π0 ⊗ π1 ⊗ π2 ∧ π0(p) = (p1 | p2) ∧ π1, p1 |= e1 ∧ π2, p2 |= e2
  π, p |= (e1 e2)    if π = π0 ⊗ π1 ⊗ π2 ∧ π0(p) = (p1 • p2) ∧ π1, p1 |= e1 ∧ π2, p2 |= e2
  π, p |= e1*        if π = π0 ⊗ π1 ∧ π0(p) = p1* ∧ π1, p1 |= e1

Here the definition of π, p |= e precludes any cycles in the child pointer chain. As an example, consider the regular expression e = a** b. A π and p0 such that π, p0 |= e is given by the table in Figure 4.1. The tree structure, represented by the solid arrows, is drawn on the right.

  p → q or p −a→ q, relative to π:

  p → p1     if π(p) = p1 | p2
  p → p2     if π(p) = p1 | p2
  p → p1     if π(p) = p1 • p2
  p → p1     if π(p) = p1*
  p → p2     if π(p) = p1* and cont p = p2
  p → p1     if π(p) = ε and cont p = p1
  p −a→ p′   if π(p) = a and p′ = cont p

  Figure 4.2: PWπ transitions

Definition 4.4 Let cont be a function

  cont : dom(π) → (dom(π) ∪ {null})

We write π |= cont if

• If π(p) = (p1 | p2), then cont p1 = cont p and cont p2 = cont p.
• If π(p) = (p1 • p2), then cont p1 = p2 and cont p2 = cont p.
• If π(p) = (p1)*, then cont p1 = p.
• cont p0 = null, where p0 is the pointer to the root of the syntax tree.

The function cont is uniquely determined by the tree structure laid out in π, and it is easy to compute by a recursive tree walk.

We elide cont when it is clear from the context, assuming that π always comes equipped with a cont such that π |= cont.
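The recursive tree walk that computes cont can be sketched as follows. The tuple encoding of expressions and the preorder numbering of addresses are assumptions of this sketch; Figure 4.1 numbers the same nodes in a different order.

```python
# Building a heap π (Definition 4.1) and a continuation map cont satisfying
# the clauses of Definition 4.4, for expressions encoded as tuples
# ('eps',), ('chr', a), ('seq', e1, e2), ('alt', e1, e2), ('star', e).
# Addresses are integers allocated in preorder; None plays the role of null.

def build_heap(e):
    heap = {}                          # address p -> node, e.g. ('seq', p1, p2)

    def alloc(e):
        p = len(heap)
        heap[p] = None                 # reserve address p before recursing
        tag = e[0]
        if tag in ('eps', 'chr'):
            heap[p] = e
        elif tag == 'star':
            heap[p] = ('star', alloc(e[1]))
        else:                          # 'seq' or 'alt'
            heap[p] = (tag, alloc(e[1]), alloc(e[2]))
        return p

    root = alloc(e)

    cont = {root: None}                # cont p0 = null
    def wire(p):                       # the recursive tree walk of Definition 4.4
        node = heap[p]
        tag = node[0]
        if tag == 'alt':               # both branches continue like p itself
            cont[node[1]] = cont[p]
            cont[node[2]] = cont[p]
            wire(node[1]); wire(node[2])
        elif tag == 'seq':             # the left part continues with the right part
            cont[node[1]] = node[2]
            cont[node[2]] = cont[p]
            wire(node[1]); wire(node[2])
        elif tag == 'star':            # the body loops back to the star node
            cont[node[1]] = p
            wire(node[1])
    wire(root)
    return heap, cont, root
```

For e = a** • b this produces the same heap and cont as the table in Figure 4.1, up to the renaming of addresses; in particular, the cont of the inner star's body points back into the tree, giving the dashed cycle noted below.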
By treating cont as a function, we have not committed to a particular implementation; for instance cont could be represented as a hash table indexed by pointer values, or it could be added as another pointer field to the nodes in the heap.

In the graphical representation in Figure 4.1, dashed arrows represent cont. In particular, note the cycle leading downward from p1 and up again via dashed arrows. Following such a cycle could lead to infinite loops as for the EKW machine in Example 3.3.

Definition 4.5 The PWπ machine is defined as follows. Transitions of this machine are always relative to some heap π, which does not change during evaluation. We elide π if it is clear from the context. Configurations of the machine are of the form ⟨p; w⟩, where p is a pointer in π and w is a string of input symbols. Given the transition relation between pointers defined in Figure 4.2, the machine has the following transitions:

     p −a→ q                    p → q
  -----------------        ---------------
  ⟨p; a w⟩ → ⟨q; w⟩        ⟨p; w⟩ → ⟨q; w⟩

The accepting state of the machine is ⟨null; ε⟩. That is, both the continuation and the remaining input have been consumed.

Example 4.6 For a regular expression e = a** b, let π and p0 be such that π, p0 |= e. See Figure 4.1 for the representation of π as a tree with pointers. The diagram below illustrates two possible executions of the PWπ machine on the input aab.

Execution 1: Infinite loop

  ⟨p0; aab⟩ → ⟨p1; aab⟩ → ⟨p3; aab⟩ → ⟨p1; aab⟩ → ⟨p3; aab⟩ → ⟨p1; aab⟩ → ···

Execution 2: Successful match

  ⟨p0; aab⟩ → ⟨p1; aab⟩ → ⟨p3; aab⟩ → ⟨p4; aab⟩ → ⟨p3; ab⟩ → ⟨p4; ab⟩ → ⟨p3; b⟩ → ⟨p1; b⟩ → ⟨p2; b⟩ → ⟨null; ε⟩

Theorem 4.7 (Simulation) Let π be a heap such that π, p |= e.
Then there is a run of the EKW machine of the form ⟨e; []; w⟩ → ··· → ⟨ε; []; ε⟩ if and only if there is a run of the PWπ machine of the form ⟨p; w⟩ → ··· → ⟨null; ε⟩.

One needs to show that each step of the EKW machine can be simulated by the PWπ machine and vice versa. The invariant in this simulation is that the stack k in the EKW machine can be reconstructed by following the chain of pointers in the heap of the PWπ machine via the following function:

  stack p = []               if cont p = null
  stack p = e :: (stack q)   if q = cont p ≠ null and π, q |= e

5 The lockstep construction in general

As we have seen, the PWπ machine is built from two kinds of steps. Pointers can be evolved via p → q by moving in the syntax tree without reading any input. When a node for a constant is reached, it can be matched to the first character in the input via a step p −a→ q.

Definition 5.1 Let S ⊆ dom(π) ∪ {null} be a set of pointers. We define the evolution ⇝S of S as the following set:

  ⇝S = { q ∈ dom(π) | ∃p ∈ S. p →* q ∧ ∃a. π(q) = a }

Forming ⇝S is similar to computing the ε-closure in automata theory. However, this operation is not a closure operator, because S ⊆ ⇝S does not hold in general. When one computes ⇝S incrementally, elements are removed as well as added. Avoiding infinite loops by adding and removing the same element is the main difficulty in the computation.

We define a transition relation analogous to Definition 4.5, but as a deterministic relation on sets of pointers. We refer to these as macro steps, as they assume the computation of ⇝S as given in a single step, whereas an implementation needs to compute it incrementally.

Definition 5.2 (Lockstep transitions) Let S, S′ ⊆ dom(π) ∪ {null} be sets of pointers.

  S ⇒ S′     if S′ = ⇝S
  S =a⇒ S′   if S′ = { q | ∃p ∈ S. p −a→ q }

A set of pointers is first evolved from S to ⇝S. Then, moving from a set of pointers ⇝S to S′ via ⇝S =a⇒ S′ advances the state of the machine by advancing all pointers that can match a to their continuations. All other pointers are deleted as unsuccessful matches.

Definition 5.3 (Generic lockstep machine) The generic lockstep machine has configurations of the form ⟨S; w⟩. Transitions are defined using Definition 5.2:

      S =a⇒ S′                   S ⇒ S′
  ------------------        -----------------
  ⟨S; a w⟩ ⇒ ⟨S′; w⟩        ⟨S; w⟩ ⇒ ⟨S′; w⟩

Accepting states of the machine are of the form ⟨S; ε⟩, where null ∈ S.

Theorem 5.4 For a heap π, p |= e there is a run of the PWπ machine

  ⟨p; w⟩ → ··· → ⟨null; ε⟩

if and only if there is a run of the lockstep machine

  ⟨{p}; w⟩ ⇒ ··· ⇒ ⟨S; ε⟩

for some set of pointers S with null ∈ S.

6 The sequential lockstep machine

The sequential lockstep machine maintains two lists of pointers c, n corresponding to pointers being incrementally evolved within the current macro step and pointers to be evolved in the next macro step.
Another pointer list t is maintained which provides support for redundancy elimination; we also introduce an auxiliary function ψ(p, l1, l2) to aid in this regard.

Definition 6.1 The auxiliary function ψ(p, l1, l2) is defined as:

  ψ(p, l1, l2) = p :: l1   if p ∉ l1 @ l2
  ψ(p, l1, l2) = l1        if p ∈ l1 @ l2

  ⟨p :: c; t; n; w⟩   → ⟨c′; p :: t; n; w⟩    if π(p) = p′ | p″   where c′ = ψ(p″, ψ(p′, c, t), t)
  ⟨p :: c; t; n; w⟩   → ⟨c′; p :: t; n; w⟩    if π(p) = p′ • p″   where c′ = ψ(p′, c, t)
  ⟨p :: c; t; n; w⟩   → ⟨c′; p :: t; n; w⟩    if π(p) = (p′)*     where c′ = ψ(cont p, ψ(p′, c, t), t)
  ⟨p :: c; t; n; w⟩   → ⟨c′; p :: t; n; w⟩    if π(p) = ε         where c′ = ψ(cont p, c, t)
  ⟨p :: c; t; n; a w⟩ → ⟨c; t; n; a w⟩        if p = null
  ⟨p :: c; t; n; a w⟩ → ⟨c; p :: t; n′; a w⟩  if π(p) = a         where n′ = ψ(cont p, n, [])
  ⟨p :: c; t; n; a w⟩ → ⟨c; p :: t; n; a w⟩   if π(p) = b
  ⟨[]; t; n; a w⟩     → ⟨n; []; []; w⟩        if n ≠ []
  ⟨p :: c; t; n; ε⟩   → ⟨c; p :: t; n; ε⟩     if π(p) = a

  Figure 6.1: Sequential lockstep machine with redundancy elimination

Definition 6.2 The redundancy-eliminating sequential lockstep machine has configurations of the form ⟨c; t; n; w⟩. Its transitions are given in Figure 6.1. The accepting states are of the form ⟨null :: c′; t′; n′; ε⟩.

We regard this machine as a rational reconstruction of Thompson's matcher [13] in the light of Cox's elucidation as a virtual machine [5]. This machine uses a sequential schedule for incrementally evolving pointers, keeping a list of pointers that have been evolved already to prevent loops and search space explosion. However, our main interest is in performing this computation in parallel.
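A sketch of this machine in Python follows. The heap and cont for a** • b are hand-coded after Figure 4.1 (with our own renumbering of addresses), and we read the star rule of Figure 6.1 as pushing the star's body p′ together with cont p, in line with the PWπ transitions of Figure 4.2; both choices are assumptions of this sketch.

```python
# The sequential lockstep machine of Figure 6.1 on configurations
# ⟨c; t; n; w⟩, with lists for c, t, n and None for the null pointer.
# Heap nodes are tuples: ('seq', p1, p2), ('alt', p1, p2), ('star', p1),
# ('chr', a), ('eps',).

def psi(p, l1, l2):
    """The auxiliary function ψ of Definition 6.1."""
    return l1 if p in l1 or p in l2 else [p] + l1

def lockstep_match(heap, cont, root, w):
    c, t, n = [root], [], []
    while True:
        if not c:
            if w and n:                    # macro step: consume one character
                c, t, n, w = n, [], [], w[1:]
                continue
            return False                   # stuck: no match on this input
        p, c = c[0], c[1:]
        if p is None:                      # null at the head of c
            if w == '':
                return True                # accepting state ⟨null :: c'; t'; n'; ε⟩
            continue                       # otherwise null is simply dropped
        node, tag = heap[p], heap[p][0]
        if tag == 'alt':
            c = psi(node[2], psi(node[1], c, t), t)
        elif tag == 'seq':
            c = psi(node[1], c, t)
        elif tag == 'star':                # push the body and the continuation
            c = psi(cont[p], psi(node[1], c, t), t)
        elif tag == 'eps':
            c = psi(cont[p], c, t)
        elif tag == 'chr' and w and node[1] == w[0]:
            n = psi(cont[p], n, [])        # schedule cont p for the next macro step
        # a character node that cannot match the next input is discarded
        t = [p] + t                        # p has now been evolved

# The heap of Figure 4.1 for a** • b, renumbered: 0 = seq, 1 = outer star,
# 2 = inner star, 3 = a, 4 = b.
HEAP = {0: ('seq', 1, 4), 1: ('star', 2), 2: ('star', 3),
        3: ('chr', 'a'), 4: ('chr', 'b')}
CONT = {0: None, 1: 4, 2: 1, 3: 2, 4: None}
```

On the problematic input of Example 3.3, the list t prevents the star nodes from being evolved twice within a macro step, so `lockstep_match(HEAP, CONT, 0, 'aab')` terminates with a match.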
7 Parallel lockstep semantics

We now define an operational semantics where each pointer is given a dedicated thread for evolving it. Our motivation is to leverage the large number of cores and hence threads available on GPUs. The semantics in this section is intended as an idealization of the implementation described in Section 8 below, capturing the essentials of the computation while abstracting from implementation details.

To describe the parallel computation, we define a simple process calculus. Its transition rules are given in Figure 7.1. Most of our calculus is a subset of CCS [8], with one-to-one directional message passing and parallel composition. However, we also need an n-way synchronization with a synchronous transition inspired by Synchronous CCS [9]. We let M range over processes, p over pointers that may be sent as asynchronous messages, and a over input symbols, which may be used for n-way synchronisation. The syntax of processes is as follows:

  M ::= p | M ∥ M | p.M | $a.M

The two judgment forms are M → M′ and M −a→ M′:

       M1 → M2
  ----------------- (PAR)
  M1 ∥ M3 → M2 ∥ M3

  ----------------- (SEND)
   ((p.M) ∥ p) → M

       M′ ≢ ($a.M″) ∥ M‴      M′ ↛
  ---------------------------------------------- (SYNC)
  ($a.M1 ∥ ··· ∥ $a.Mn ∥ M′) −a→ (M1 ∥ ··· ∥ Mn)

  Figure 7.1: Process calculus

We impose some structural congruences ≡, identifying terms up to associativity and commutativity of parallel composition ∥. Process transitions can be interleaved with rule PAR. We have CCS-style handshake communication in rule SEND. Here p.M receives the message p and proceeds with M afterwards. Note that receivers of the form p.M are not replicated (in the pi-calculus sense [10]), so that each communication consumes the receiver. This behaviour is essential, as the processes we generate could become trapped in an infinite loop otherwise.
We also have an n-way synchronisation SYNC. This rule is the most complex, and it is needed to implement matching of input once all pointers have been evolved. The idea is as follows:

• The current process is factorized into those processes that are of the form $a.Mj and an M′ comprising everything else.
• There are no further → transitions inside M′, written as M′ ↛.
• If these conditions are met, then all the processes waiting to participate in an n-way synchronization on a are advanced in one synchronous step.
• The remaining processes in M′ are discarded in the same step.

Rules in this style, in which a number of processes are advanced in a single step, are sometimes referred to as "lockstep" [9]. Indeed, we use this rule to implement the lockstep matching of regular expressions in the sense of Thompson and Cox. (In practice, this rule may require a little ad-hoc protocol to implement on a given architecture.)

We translate each expression pointer p in the heap π into a process [[p]]π as follows:

  [[p]]π = p.(q1 ∥ q2)   if π(p) = (q1 | q2)
  [[p]]π = p.q1          if π(p) = (q1 • q2)
  [[p]]π = p.(q1 ∥ q2)   if π(p) = q1* and cont p = q2
  [[p]]π = p.q           if π(p) = ε and cont p = q
  [[p]]π = p.$a.q        if π(p) = a and cont p = q

Intuitively, for each internal node in the expression tree identified by the pointer p, we create a dedicated little process that listens on a channel uniquely corresponding to p. For simplicity, we use the same name for the channel as for the pointer. The process may be activated by messages p sent to it, and it may send such messages itself. These messages trigger a chain reaction that evolves the current pointer set of a macro step. There is no need for these messages to be externally visible, as their only purpose is to wake up their unique recipient.
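The chain reaction set off by the translation can be simulated sequentially with a worklist of pending messages, as in the following sketch. This is our own illustration: threads are replaced by a worklist, a character node's two-stage receiver p.$a.q is collapsed into one step, and the SYNC rule becomes a filter over the blocked processes. The heap is hand-coded after Figure 4.1 with our own renumbering.

```python
# A sequential simulation of the parallel lockstep semantics: each heap
# node p is a one-shot receiver for the message p (rule SEND); delivering
# a message runs [[p]]π, which may emit further messages or block as $a.q;
# rule SYNC advances all processes blocked on the next input symbol and
# discards everything else.

def run_processes(heap, cont, root, w):
    msgs = [root]                      # pending asynchronous messages
    while True:
        alive = set(heap)              # a fresh copy of [[π]] per macro step
        blocked = []                   # processes of the form $a.q
        nulls = False
        while msgs:                    # the asynchronous -> steps
            p = msgs.pop()
            if p is None:
                nulls = True           # the message null has been emitted
            elif p in alive:           # SEND consumes the one-shot receiver
                alive.discard(p)
                node, tag = heap[p], heap[p][0]
                if tag == 'alt':
                    msgs += [node[1], node[2]]
                elif tag == 'seq':
                    msgs.append(node[1])
                elif tag == 'star':
                    msgs += [node[1], cont[p]]
                elif tag == 'eps':
                    msgs.append(cont[p])
                else:                  # chr a: wait for an n-way sync on a
                    blocked.append((node[1], cont[p]))
        if w == '':
            return nulls               # accept iff the process null is present
        a, w = w[0], w[1:]             # SYNC: advance every $a.q, drop the rest
        msgs = [q for (b, q) in blocked if b == a]
        if not msgs:
            return False               # no process survived the synchronization

# The heap of Figure 4.1 for a** • b, renumbered: 0 = seq, 1 = outer star,
# 2 = inner star, 3 = a, 4 = b.
HEAP = {0: ('seq', 1, 4), 1: ('star', 2), 2: ('star', 3),
        3: ('chr', 'a'), 4: ('chr', 'b')}
CONT = {0: None, 1: 4, 2: 1, 3: 2, 4: None}
```

Running `run_processes(HEAP, CONT, 0, 'aab')` reproduces the macro steps of Example 7.1: each round blocks on $a and $b, synchronizes on the next input symbol, and feeds the residual messages into a fresh copy of [[π]].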
A process p.M listening for p is consumed by the transition that receives the message. Processes for nodes that point to input characters a at the leaves of the expression tree use a different form of communication. All these nodes synchronize on the input symbol. The symbol a is visible in the resulting synchronous transition step −a→, because we need it to agree with the next input symbol.

If dom(π) = {p1, ..., pn}, we define the translation [[π]] as the translation of all its pointers:

  [[π]] = [[p1]]π ∥ ··· ∥ [[pn]]π

If the input string is not empty, let a be the first character, so that a w′ = w. The parallel machine launches processes for all the nodes in the tree, and sends a message to the process for the root. The resulting process makes a number of asynchronous transitions, followed by a synchronous move for a:

  [[π]] ∥ p → ··· → −a→ M

All these steps together represent one macro step. The machine then repeats the above with the next symbol a′ and M:

  [[π]] ∥ M → ··· → −a′→ M′

The machine accepts if the remaining input is empty and the current process is of the form null ∥ M.

Example 7.1 For e = a** b, let π and p0 be such that π, p0 |= e. See Figure 4.1 for the representation of π as a tree with pointers. Translating the tree structure to parallel processes gives us:

  [[π]] = (p0.p1) ∥ p1.(p3 ∥ p2) ∥ p2.$b.null ∥ p3.(p4 ∥ p1) ∥ p4.$a.p3

Assume an input string of aab. We have the pointer evolution as follows:

  p0 ∥ [[π]] ≡ p0 ∥ p0.p1 ∥ p1.(p3 ∥ p2) ∥ p2.$b.null ∥ p3.(p4 ∥ p1) ∥ p4.$a.p3
  → p1 ∥ p1.(p3 ∥ p2) ∥ p2.$b.null ∥ p3.(p4 ∥ p1) ∥ p4.$a.p3
  → p3 ∥ p2 ∥ p2.$b.null ∥ p3.(p4 ∥ p1) ∥ p4.$a.p3
  → p3 ∥ $b.null ∥ p3.(p4 ∥ p1) ∥ p4.$a.p3
  → $b.null ∥ p4 ∥ p1 ∥ p4.$a.p3
  → $b.null ∥ p1 ∥ $a.p3

Since no more micro transitions are possible, we have reached the n-way synchronization point:

  $b.null ∥ p1 ∥ $a.p3 −a→ p3

Now we feed the residual messages back into a fresh [[π]]:

  p3 ∥ [[π]] ≡ p3 ∥ p0.p1 ∥ p1.(p3 ∥ p2) ∥ p2.$b.null ∥ p3.(p4 ∥ p1) ∥ p4.$a.p3
  → p0.p1 ∥ p1.(p3 ∥ p2) ∥ p2.$b.null ∥ p4 ∥ p1 ∥ p4.$a.p3
  → p0.p1 ∥ p1.(p3 ∥ p2) ∥ p2.$b.null ∥ p1 ∥ $a.p3
  → p0.p1 ∥ p3 ∥ p2 ∥ p2.$b.null ∥ $a.p3
  → p0.p1 ∥ p3 ∥ $b.null ∥ $a.p3
  −a→ p3 → ··· −b→ null

Therefore, we have received a null while the input string has become empty, resulting in a successful match.

We need to prove that the construction above can correctly evolve and match any set of pointers. Let S = {p1, ..., pn} ⊆ dom(π) ∪ {null} be a set of pointers in the heap. We define S̄ = p1 ∥ ··· ∥ pn to represent this set as a parallel composition of messages.

Theorem 7.2 Let S, S′ ⊆ dom(π) ∪ {null}. We have S ⇒ =a⇒ S′ (that is, ⇝S =a⇒ S′) if and only if

  S̄ ∥ [[π]] →* −a→ S̄′

Moreover, each → transition sequence starting from S̄ ∥ [[π]] is finite.

Theorem 7.2 assures us that the parallel operational semantics correctly implements the lockstep construction. The pointers p in the tree, represented as processes p, are evolved in parallel. Although this evolution is non-deterministic, its end result is determinate. Moreover, the cycles in the pointer chain do not lead to cyclic processes looping forever, since each receiving process becomes inactive once the node has been visited.

The correctness proof of the parallel implementation relies on a factorisation of the processes into four components. At each step i, we have:

• A set Si of pointers, indicating nodes that should be evolved.
• A heap of receivers πi ⊆ π, representing nodes that have not been visited in the current macro step.
• A set Ei of evolved nodes, whose process representations are of the form $a.q, ready to match a character.
• A parallel composition Di of messages to nodes that have already been processed.

Let E be a set of pointers E = {p1, ..., pn} such that π(pj) = aj and cont pj = qj. We write

  $E = $a1.q1 ∥ ··· ∥ $an.qn

We need to consider transition sequences of the form

  S̄0 ∥ [[π0]] ∥ $E0 ∥ D0 → ··· → S̄n ∥ [[πn]] ∥ $En ∥ Dn

where π0 = π and E0 = ∅. The invariant we need to establish for all transition steps consists of:

  ⇝S0 = ⇝(Si ∩ dom(πi)) ∪ Ei
  ⇝Ri ⊆ ⇝(Si ∩ dom(πi)) ∪ Ei
  { p | ∃D. Di ≡ (p ∥ D) } ⊆ Si ∪ Ri

where Ri = dom(π) \ dom(πi). The factorization of processes at each step and the invariant are verified by case analysis on the kind of node π(p), and hence the possible → steps that its translation [[p]]π can make using the rules from Figure 7.1. In the final configuration we have Sn ∩ dom(πn) = ∅. Hence,

  ⇝S0 = ⇝(Sn ∩ dom(πn)) ∪ En = ⇝∅ ∪ En = En

Therefore, we have ⇝S0 = En, as required. From that configuration, there can only be an −a→ transition, exactly matching the generic lockstep transition S ⇒ =a⇒ S′.

8 Implementation on a GPU

As a proof of concept, we have written a simple regular expression matcher where the evolution of pointers is performed in parallel on a GPU.¹ Programming the GPU was done via CUDA [3]. The main points are:

• The regular expression is parsed, and the syntax tree nodes are packed into an array d. This array represents our heap π. A second pass through the syntax tree performs the wiring of continuation pointers, corresponding to cont.
• Two integer vectors c, n of the same size as the regular expression vector above are created. Here a value of t, the macro step count, on c[i] implies that regular expression d[i] is to be simulated within the current macro step. On the other hand, a value of −t on c[i] implies that the corresponding regular expression has already been simulated for the current macro step. This protocol realizes the semantics of a process being consumed once it has received a message. The vector n is used to collect those search attempts which are able to match the current input character. A value of t + 1 on n[j] indicates that the regular expression d[j] is to be simulated on the next macro step.

• Each regular expression node d[i] is assigned a GPU thread. This GPU thread is responsible for conditionally simulating the regular expression d[i] at each invocation (depending on the c[i] value). While simulating an expression, a GPU thread might schedule another GPU thread / expression d[j] by setting c[j] to t (this could happen, for example, in the case of e = e1 • e2). Note that one thread scheduling another thread via the c vector corresponds to the sending of a message p from one process to another.

• At each invocation of the GPU threads (called a kernel launch in CUDA terminology), each thread which performs a successful simulation updates either of two shared flags which indicate if there were more threads activated on the c or n vectors during the current invocation. A macro transition involves swapping the c and n vectors while incrementing the t counter. It corresponds to the n-way synchronization transition.

¹ The code is available at http://www.cs.bham.ac.uk/~hxt/research/regexp.shtml.
• The initial state of the machine has only d[0], the root node process, scheduled for simulation.

However, note that this description corresponds to a minimalistic GPU-based parallel lockstep machine and does not yet incorporate any optimizations from the literature [14], such as persistent threads and task queues.

9 Conclusions

We have derived regular expression matchers as abstract machines. In doing so, we have used a number of concepts and techniques from programming language theory. The EKW machine zooms in on a current expression while maintaining a continuation for keeping track of what to do next. In that sense, the machine is a distant relative of machines for interpreting lambda terms, such as the SECD machine [7] or the CEK machine [6]. On the other hand, regular expressions are a much simpler language to interpret than the lambda calculus, so that continuations can be represented by a single pointer into the tree structure (or to machine code in Thompson's original implementation). While the idea of continuations as code pointers is sometimes advanced as a helpful intuition, the representation of continuations in CPS compiling [1] is more complex, involving an environment pointer as well.

To represent pointers and the structures they build up, we found it convenient to use a small fragment of separation logic [11], given by just the separating conjunction and the points-to predicate. (They are written as ⊗ and π(p) = e above, to avoid clashes with other notation.) A similar use of these connectives to describe trees in the setting of abstract machines was used in our earlier work on B+trees [12]. Here we translate a tree-shaped data structure into a network of processes that communicate in a cascade of messages mirroring the pointers in the tree structure. The semantics of the processes is inspired by the process algebra literature [8, 9, 10].
One reason why a process algebra is suitable for formalizing the lockstep construction with redundancy elimination is that receiving processes are eliminated once they have received a message; they are used linearly, and so are reminiscent of linearly-used continuations [2].

We intend to extend both the process algebra view and our CUDA implementation, while maintaining a close correspondence between them. Regular expression matching is an instance of irregular-parallel processing [14] on a GPU, which presents some optimization problems. At the moment, the parallel processing power of the GPU cores is not exercised, as each thread does little more than access the expression tree and activate threads for other nodes. We expect the load on the GPU cores to become more significant when more expensive constructs such as back-references (known to be NP-hard) are added to our matching language. It remains to be seen whether a GPU implementation will become more efficient than a sequential CPU-based one, particularly as the number of GPU cores continues to increase (it is currently in the hundreds of cores). More generally, the operational semantics and abstract machine approach may be fruitful for reasoning about other forms of General Purpose Graphics Processing Unit (GPGPU) programming.

References

[1] Andrew Appel (1992): Compiling with Continuations. Cambridge University Press.
[2] Josh Berdine, Peter W. O'Hearn, Uday Reddy & Hayo Thielecke (2002): Linear Continuation Passing. Higher-Order and Symbolic Computation 15(2/3), pp. 181–208, doi:10.1023/A:1020891112409.
[3] NVIDIA Corporation (2011): What is CUDA? Available at http://www.nvidia.com/object/what_is_cuda_new.html.
[4] Russ Cox (2007): Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...). Available at http://swtch.com/~rsc/regexp/regexp1.html.
[5] Russ Cox (2009): Regular Expression Matching: the Virtual Machine Approach. Available at http://swtch.com/~rsc/regexp/regexp2.html.
[6] Matthias Felleisen & Daniel P. Friedman (1986): Control operators, the SECD-machine, and the λ-calculus. In M. Wirsing, editor: Formal Description of Programming Concepts, North-Holland, pp. 193–217.
[7] Peter J. Landin (1964): The Mechanical Evaluation of Expressions. The Computer Journal 6(4), pp. 308–320.
[8] Robin Milner (1980): A Calculus of Communicating Systems. Lecture Notes in Computer Science 92, Springer, doi:10.1007/3-540-10235-3.
[9] Robin Milner (1983): Calculi for Synchrony and Asynchrony. Theoretical Computer Science 25, pp. 267–310.
[10] Robin Milner (1999): Communicating and Mobile Systems: The Pi Calculus. Cambridge University Press.
[11] John C. Reynolds (2002): Separation Logic: A Logic for Shared Mutable Data Structures. In: Logic in Computer Science (LICS), IEEE, pp. 55–74, doi:10.1109/LICS.2002.1029817.
[12] Alan P. Sexton & Hayo Thielecke (2008): Reasoning about B+ Trees with Operational Semantics and Separation Logic. In: Twenty-fourth Conference on the Mathematical Foundations of Programming Semantics (MFPS24), Electronic Notes in Theoretical Computer Science, pp. 355–369, doi:10.1016/j.entcs.2008.10.021.
[13] Ken Thompson (1968): Programming Techniques: Regular expression search algorithm. Communications of the ACM 11(6), pp. 419–422, doi:10.1145/363347.363387.
[14] Stanley Tzeng, Anjul Patney & John D. Owens (2010): Task Management for Irregular-Parallel Workloads on the GPU. In: High Performance Graphics, Eurographics Association, pp. 29–37.
