A Survey on Deep Packet Inspection for Intrusion Detection Systems

Tamer AbuHmed 1, Abedelaziz Mohaisen 2 ∗, and DaeHun Nyang 1
1 Information Security Research Laboratory, Inha University, Incheon 402-751, Korea
2 Electronics and Telecommunication Research Institute, Daejeon 305-700, Korea
tamer@seclab.inha.ac.kr, a.mohaisen@etri.re.kr, nyang@inha.ac.kr

Abstract

Deep packet inspection is widely recognized as a powerful technique used by intrusion detection systems for inspecting, deterring, and deflecting malicious attacks over the network. Fundamentally, almost all intrusion detection systems have the ability to search through packets and identify contents that match known attacks. In this paper, we survey deep packet inspection implementation techniques, research challenges, and algorithms. Finally, we provide a comparison between the different applied systems.

Keywords: Deep packet inspection, intrusion detection system, network security, algorithms.

1 Introduction

The enormous number of attacks from the Internet, such as viruses, spam, and exploits of software vulnerabilities, makes protection methods an important means of safeguarding human effort from destruction. Therefore, a variety of methods have been used to protect data. These methods began with cryptography, policies, firewalls, and intrusion detection systems (IDS), and finally arrived at intrusion prevention systems (IPS) [42]. IDS and IPS are considered the second line of defense against outsider attackers, who do not know the cryptographic information. In addition, they work as the first line of defense against insider attackers, who can bypass the cryptographic system. Deep packet inspection (DPI) is a core component of many systems plugged into the network, including proxies, packet filters, sniffers, IDS, and IPS.
Network components use DPI as an essential inspector that is applied in different layers of the OSI model. In its early days DPI was applied in only one layer, depending on the header (e.g., in proxies and firewalls). Nowadays, layer-independent attacks force us to inspect traffic in all layers. According to the intrusion detection literature, efforts to obtain a fast implementation fall into two main categories [31]: (1) the design of an efficient data structure with an optimized memory access rate, and (2) the design of a high-throughput algorithm to process intruder signatures.

∗ This work was done while the author was a graduate student at Inha University.

In this paper, we survey deep packet inspection algorithms and their usage in several existing technologies used for intrusion detection systems. The rest of this paper is organized as follows: section 2 gives an overview of the challenges and goals (or simply objectives) of using deep packet inspection for efficient intrusion detection systems. Section 3 and section 4 introduce the software and hardware implementations of DPI systems, respectively. Section 5 overviews the finite state machine, section 6 compares the existing technologies and architectures, and finally section 7 draws concluding remarks.

2 Challenges and Goals

The design and implementation of deep packet inspection face several challenges that hinder its advancement. There are also several ultimate goals and design objectives that are always considered when a new DPI design is made. In this section, we list the different challenges and design objectives.

2.1 Deep Packet Inspection Challenges

When DPI is used as the means to detect intrusion, several challenges arise in applying it on the network.
In the following, we summarize these challenges.

1. Search algorithm complexity: the complexity of the algorithm and of the comparison operations against intruder signatures decreases the throughput of the system. Search algorithms are therefore the main focus of DPI research, since the matching process is resource consuming. For example, the string matching routines in SNORT [35] account for up to 70% of total execution time and 80% of instructions executed on real traces [4].

2. Increasing number of intruder signatures: as the variety of attacks grows, the need for new intruder signatures increases. The large number of signatures makes the task of the IDS harder, because the matching process must inspect traffic against all attack fingerprints.

3. Overlapping of signatures: attack signatures are usually not general, so they can be categorized into groups according to common properties such as protocol type. For example, HTTP packets in SNORT [35] have 1096 signatures. Therefore, packets need to be preprocessed before the matching process.

4. Unknown location of signatures: because of the variety of attacks on different types of applications, the intruder pattern is not localized in a specific place in the packet. The IDS must therefore inspect the entire payload of the packet against the attacker signatures.

5. Encrypted data: data that is encrypted cannot be inspected by DPI. However, this problem can be overcome by placing the DPI component behind the decryption device.

The DPI system, as mentioned above, faces many challenges, and at the same time it has to meet the requirements of the network.
There are two main requirements that a DPI system should satisfy, detailed further in subsection 2.2: (1) high packet processing speed, which determines the throughput of the system and must keep up with the core speed of the network (10 Gbps to 40 Gbps) and the edge speed (1 Gbps); and (2) low cost of the DPI system in terms of memory and power consumption.

2.2 DPI Design Objectives

DPI systems have to satisfy specific objectives to sustain the growth in traffic rate and in intrusion signatures. We summarize the objectives that a DPI architecture has to satisfy as follows [45] [40]:

1. Deterministic performance: the architecture has to process the traffic stream independently of signature characteristics or traffic characteristics. The system therefore has to handle worst-case traffic in both software and hardware based systems.

2. Memory efficiency: memory access time is one of the main bottlenecks of DPI systems in software implementations. It is also critical in hardware designs, because of both access time and memory scarcity. Thus, a highly memory-efficient design is preferable.

3. Dynamic update: this objective is very important in hardware based designs, where intruder signatures must be added to and removed from the system without affecting system operation.

4. Signatures: the DPI system should support both fixed intruder patterns and regular expressions, and should deal with all types of intruder patterns [20], as discussed in section 4.4.

5. Scalability: scalability is not a big issue in software based systems. On the other hand, it is critical in hardware based systems. Thus, the hardware design has to support an unlimited number of signatures.

6. Additional functions: a DPI system can support additional functions, such as inspecting multiple traffic sessions separately, not only detecting intruders but also locating them, and customizing the signature subsets, or the entire signature set, to inspect.

Figure 1. DPI implementations (hardware: CAM (content addressable memory), TCAM, FPGA, network processors (NP); software: SNORT, Bro).

3 Software Deep Packet Inspection Systems

There are many packet scanning applications that require deep packet inspection. Here, we review three popular ones: SNORT [35], Bro [10], and Linux L7-filter [28]. SNORT and Bro are two popular intrusion detection systems, while L7-filter is an application for application layer protocol analysis that classifies packets based on application layer data. These systems are all open source, which allows us to perform a detailed analysis and show their abilities and constraints.

3.1 SNORT Intrusion Detection System

SNORT is an open source intrusion detection system used for protocol analysis and full packet inspection against intruder signatures. SNORT processes packet traffic in multiple stages, as illustrated in Figure 2 [47]. SNORT and all common IDS use a method called analyze-normalized-matching (ANM) [32]. SNORT uses many string matching algorithms. One of them is the Boyer-Moore (BM) algorithm, which is listed with the other matching algorithms in section 4.1.

A SNORT rule may contain header and content fields. The header part checks the protocol, the source and destination IP addresses, and the ports. The content part scans the packet payload for one or more patterns. Rules with more than one pattern are called correlated rules. Furthermore, rules can also contain negation patterns, where the negation of a pattern stands for no occurrence of that pattern. The matching pattern may be in ASCII, HEX, or mixed format.
HEX parts are enclosed between vertical bar symbols "|". An example of a Snort rule is [35]:

    alert tcp any any -> 198.165.200.24/32 111 (content: "idc|3a3b|"; msg: "mountd access";)

Figure 2. SNORT process stages (packet decoding, preprocessing, content normalization, detection engine, alert; state information).

4 Hardware Implementation

Because of the need to speed up the inspection process, hardware (HW) implementations always appear as a preferable solution for high speed DPI. However, the different requirements of DPI limit how deep packet inspection can be performed in HW. These limitations stem from the large number of signatures, the complexity and overlapping of signatures, and the high rate of signature updates and additions. Therefore, a HW solution has to satisfy these requirements through special properties, which are as follows:

1. Use of a high degree of pipelining to support inspection against a large number of intruder patterns.

2. The HW component must have a high degree of processing capability to manage complex patterns at LAN speed (e.g., 10 Gbps).

3. The HW must be configurable so that it suits the changing situation of intruder patterns.

4. It must be designed to be capable of updating or adding a pattern without turning off the DPI component.

Hardware implementations can be categorized into three groups depending on the technology used, as follows:

1. Ternary content addressable memory (TCAM) implementations [41]

2. Field-programmable gate array (FPGA) implementations [17]

3. Multi-core processors [22]

Each implementation has its advantages and limitations, as we will see later when we detail each one. In general, multi-core processor implementations are considered the most preferable among the implementations because of their programming flexibility.
On the other hand, TCAM is preferable when speed is the main consideration.

4.1 Matching Algorithms

Pattern matching depends on the algorithmic way of processing the data and reporting, in reasonable time, whether the pattern exists or not. Accordingly, many algorithms have been introduced to perform string matching. String matching algorithms, however, always suffer from two factors that affect the throughput of the processed data. The first factor is the computation needed to compare the pattern with the data, and the second is the number of patterns that need to be compared with the incoming traffic. Historically, the first string matching algorithm was the brute force (BF) algorithm, which compares the first character of the pattern with the data stream. If a single character matches, BF compares the next character of the pattern with the next character of the data, and so on. Finally, when the whole pattern has been consumed, it reports the pattern matching result.

Later on, many algorithms appeared that increase matching performance. These algorithms can be categorized according to their implementation as software based, HW based, or a mixture of both. Briefly, there are many algorithms for pattern matching. The most famous software based algorithms are Knuth-Morris-Pratt (KMP) [24], Boyer-Moore (BM) [9], Aho-Corasick (AC) [1], the AC BM algorithm [14], Wu-Manber [48], and Commentz-Walter (CW) [15]. We summarize the concepts behind selected algorithms and their implementation, design, and applicability to DPI. On the other hand, the best known HW based approaches are parallel Bloom filters [17], CAM (content addressable memory), TCAM, and finally FPGA implementations.
KMP Algorithm: the Knuth-Morris-Pratt (KMP) algorithm [24] came as an enhancement of the brute force algorithm, which we introduced above as the early work on pattern matching. KMP improves on BF by skipping characters when a mismatch occurs in the comparison phase. This skipping depends on a preprocessing phase that KMP applies to the patterns. The result of the preprocessing is somewhat similar to a finite automaton representation of the pattern, in which every match or mismatch causes a certain jump over the input stream. Both the KMP [24] and BM [9] algorithms are designed for single pattern searching. If the pattern length is m bytes, matching this pattern in an n byte stream has a complexity of O(m + n). If there are k patterns, the search time is O(k(m + n)), since the single pattern search is performed k times. In [7], Baker and Prasanna implemented a hardware based DPI architecture for the KMP algorithm that exploits HW parallelism to reduce the complexity of the above bound.

4.2 Bloom Filter

The Bloom filter is a technique that builds a compact structure representing the pattern strings as hashed values. The same hash functions that were used to store the patterns are then applied to the input traffic to test it for membership. This method was first applied to intrusion detection systems by Dharmapurikar et al. [17], whose implementation was on FPGA. The implementation achieves a throughput of 2.12 Gbps. Bloom filters are very elegant in representing set membership, but they have two potential drawbacks. First, they require multiple hash functions and memories. Second, they give only an approximate match answer, since they allow false positives.
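To make the two drawbacks concrete, the following is a minimal software sketch of Bloom filter based scanning. It is not the FPGA design of [17]. The class BloomFilter, the function scan, the use of salted SHA-256 as the hash family, and the restriction to signatures of one fixed length are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Bit array plus k hash functions; membership answers may be false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting one hash with a seed byte (assumption).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def scan(payload, signatures, bloom, length):
    """Slide a fixed-length window over the payload. A Bloom hit is only a
    candidate, so it is confirmed with an exact set lookup."""
    hits = []
    for i in range(len(payload) - length + 1):
        window = payload[i:i + length]
        if bloom.might_contain(window) and window in signatures:
            hits.append((i, window))
    return hits


if __name__ == "__main__":
    sigs = {b"evil", b"worm"}
    bf = BloomFilter()
    for s in sigs:
        bf.add(s)
    print(scan(b"xxevilyywormzz", sigs, bf, 4))
```

The exact lookup after each Bloom hit is what removes false positives. The filter itself can never produce a false negative for a stored signature.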
4.3 Content Addressable Memory

Nowadays, the most popular HW technique used in commercial packet inspection products is content addressable memory (CAM) [41]. A CAM is a special memory that compares its contents in parallel against the input value and returns the address of the matching entry. Hence, CAM is considerably fast and has many desirable properties, such as a high access speed of nearly 4 nanoseconds and a search time complexity of O(1), bounded by a single memory access. However, CAM does not perform longest prefix matching, which is essential for the many DPI patterns that share the same prefix. Therefore, it is suitable for deterministic fixed-length matching.

Because of this shortcoming of CAM, a new HW component was developed under the name of Ternary CAM (or simply TCAM). TCAM memory stores data with three logical values (i.e., 0, 1, and ? for don't care), and its circuit diagram is constructed as illustrated in Figure 3(b) [41]. Furthermore, each entry stores a value that is considered to be an intruder signature, and entries are arranged in descending index order as illustrated in Figure 3(a) [41].

As a result of these properties, which add longest prefix matching to the properties of CAM, TCAM became the backbone of many network devices that depend on packet inspection. For example, routers and switches primarily use TCAMs to perform forwarding lookups for Internet Protocol addresses. TCAMs can also be used in devices that support packet classification, network address translation, route lookups in storage networks, layer 4 to layer 7 switching, server load balancing, label switching, high performance firewall functions, and finally in network intrusion detection systems (NIDS) and network intrusion prevention systems (NIPS) that depend on DPI techniques. However, TCAM has some general disadvantages, which are as follows [41]:

1. High cost per bit relative to other memory technologies, about 30 times that of SRAM per bit.

2. Storage inefficiency.

3. High power consumption, about 180 times that of SRAM per bit. The power consumption is proportional to the number of entries searched in a memory lookup.

4. Limited scalability to long input keys.

The disadvantages specific to DPI are as follows [29]:

1. Range representation problem: TCAM can represent a pattern prefix easily (e.g., "attaXX" catches any word that starts with "atta" followed by two letters). However, a range signature, which catches a sub-word and then, after an arbitrary number of characters, catches the remaining sub-word, consumes more TCAM entries.

2. Multi-match classification problem: all the matching results from all matching TCAM entries must be returned, not just the highest priority entry.

Bitwise CAM: In [50], CAM hardware has been implemented based on a tree-based content addressable memory structure called "Bitwise CAM", which involves HW sharing at the bit level in order to exploit powerful logic optimizations for multiple strings represented as a Boolean expression. The design can run at a rate of approximately 2.5 Gbps, and is approximately 30% smaller in area when compared with published results. The authors also exploited parallelism in the design of an extended system.

4.4 TCAM Implementations

In the literature on TCAM's contribution to DPI, Yu et al. [20] were the first to design a scheme that deals with all types of intruder patterns, which we discuss below. In [20], they implement a scheme for IDS that handles intruder signatures through a deep analysis of intruder patterns.
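The ternary matching of Section 4.3 and the multi-match problem listed above can be illustrated with a small software emulation. A real TCAM compares all entries in parallel in a single access. The sequential loop here only mimics the result, and the function names are hypothetical.

```python
def entry_matches(entry, key):
    """A ternary entry matches when every non-'?' symbol equals the key bit."""
    return len(entry) == len(key) and all(
        e == "?" or e == k for e, k in zip(entry, key)
    )


def tcam_lookup(entries, key):
    """Return the index of the first (highest-priority) matching entry, or None.
    Storing longer prefixes first yields longest-prefix matching."""
    for index, entry in enumerate(entries):
        if entry_matches(entry, key):
            return index
    return None


def tcam_multi_match(entries, key):
    """Return the indices of all matching entries (the multi-match case)."""
    return [i for i, entry in enumerate(entries) if entry_matches(entry, key)]


if __name__ == "__main__":
    table = ["1010", "10??", "1???"]  # ordered from most to least specific
    print(tcam_lookup(table, "1001"), tcam_multi_match(table, "1010"))
```

A standard TCAM reports only what tcam_lookup returns. A DPI system that needs every matching signature has to recover the tcam_multi_match result by other means, which is the source of the multi-match classification problem.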
The scheme of Yu et al. [20] categorizes intruder patterns into two types. The first type is complex patterns, such as long patterns, patterns with negation (which means the absence of specific patterns in the traffic), and correlated patterns (which means patterns separated by a specific number of arbitrary characters). The second type is simple patterns.

The work by Yu et al. discusses a scheme and algorithms for dealing with each type of pattern and for mapping it into TCAM. The scheme uses SRAM memory, which is considered slow in access compared to TCAM, as a partial hit list (PHL) that stores the partial correlated patterns detected in the traffic. Nonetheless, the scheme has a bottleneck when the intruder intentionally sends packets that make the PHL access rate very high and thereby affect the system throughput. This is due to the need for multiple memory lookups.

According to the simulation, this scheme can operate on 2 Gbps traffic. The implementation of Yu et al. in [20] performs a lookup on the TCAM entries for each new character. Thus, an input of n characters requires O(n) TCAM lookups. On the other hand, Jung et al. in [38] presented a scheme in which jumps are made over the input traffic by a window of size m. This is called the jumping window scheme, and it matches intruder signatures within a single packet. It reduced the number of TCAM lookups over n input characters to O(n/m) and provided a throughput of 10 Gbps using 2,394 SNORT rules. Also, Sung et al. in [39] extended the jumping window scheme to work on intruder signatures that span multiple packets.

4.5 Multi-core Processor Implementations

Multi-core processor implementations are preferable for designing IDS because of their flexibility. However, multi-core processors are still limited in the number of processors and the size of on-chip memory, which affects the efficiency of IDS implementations on them. In the following, we survey part of the efforts made to implement IDS on network processors (NP), which are a type of multi-core processor implementation.

In [16], Bruijn et al. developed the SafeCard design, a framework for network-based intrusion prevention at the network edge that is able to cope with all levels of abstraction and can be easily extended with new techniques. Furthermore, it is capable of reconstructing and scanning TCP streams at Gbps rates while preventing polymorphic buffer-overflow attacks. Additionally, CardGuard by Bos et al. in [8] uses the IXP1200 network processor as an IDS and achieved Ethernet performance of a few hundred Mbps when scanning the payloads of TCP connections. In [34], Singh et al. introduce the Early-bird prototype, which consists of a sensor to detect attacks and an aggregator for administrative reporting and control. Early-bird can cope with 200 Mbps without packet dropping. In [12], Clark et al. introduced a combination of IXP network processors and Xilinx Virtex FPGAs to build an IDS.

Figure 3. (a) TCAM; (b) circuit diagram of a standard TCAM cell, whose value (0, 1, ?) is encoded by two registers a1 and a2: (a1, a2) = (0, 0) don't care, (0, 1) value 1, (1, 0) value 0, (1, 1) undefined.

5 Finite State Machine

One of the most important tools for designing hardware implementations of DPI is the finite state machine (FSM). FSM implementations are classified into two categories: deterministic finite automata (DFA) and nondeterministic finite automata (NFA). In this section, we survey the research that has been performed on the FSM, including both categories.

5.1 Nondeterministic Finite Automata

A nondeterministic finite automaton (NFA) is a directed graph that has nodes called states and labeled edges that connect the states. More specifically, the NFA has an initial state and one or more final states. Moreover, the edges can be labeled with single characters or with null (φ), which means that multiple states can be active simultaneously in an NFA. The NFA is very useful in parallel processing because it can process an input character along multiple branches and, in contrast to a DFA, may reach multiple accepting states for an input [21].

Because of this usefulness, there have been many efforts to construct DPI systems that depend on NFA. In [33], Reetinder et al. were the first to show how to use an NFA to match regular expressions in a given text using FPGAs. To match a regular expression of length n, a serial machine requires O(2^n) memory and takes O(1) time per text character. In contrast, they proposed an approach that requires O(n^2) space and still processes a text character in O(1) time (one clock cycle). Additionally, they presented a simple and fast algorithm that quickly constructs the NFA for the given regular expression. Fast NFA construction is crucial because the NFA structure depends on the regular expression, which is known only at run time. Furthermore, in [13], Clark et al. implemented an FPGA based multi-character decoder for DPI that is based on NFA.

5.2 Deterministic Finite Automata

A deterministic finite automaton (DFA) consists of a finite set of input symbols (denoted as Σ), a finite set of states, and a transition function, denoted as δ, that moves from one state to another. In contrast to an NFA, a DFA has only one active state at any given time [21].
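The difference between the two automata can be sketched as follows: the NFA simulation tracks a set of active states, while the DFA obtained by the textbook subset construction tracks exactly one. The example automaton (it accepts strings over {a, b} that contain "ab") and all function names are illustrative assumptions, and null transitions are omitted for brevity.

```python
def run_nfa(transitions, start, accepting, text):
    """Simulate an NFA (without null transitions) by tracking all active states."""
    active = {start}
    for ch in text:
        active = {nxt for s in active for nxt in transitions.get((s, ch), ())}
    return bool(active & accepting)


def nfa_to_dfa(transitions, start, accepting, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset([start])
    dfa, seen, pending = {}, {start_set}, [start_set]
    while pending:
        current = pending.pop()
        for ch in alphabet:
            target = frozenset(
                n for s in current for n in transitions.get((s, ch), ())
            )
            dfa[(current, ch)] = target
            if target not in seen:
                seen.add(target)
                pending.append(target)
    return dfa, start_set, {s for s in seen if s & accepting}


def run_dfa(dfa, start, accepting, text):
    """Exactly one active state and one table lookup per input character.
    Every character of text must belong to the alphabet used in construction."""
    state = start
    for ch in text:
        state = dfa[(state, ch)]
    return state in accepting


# Example NFA: state 0 loops on any symbol, 0 -a-> 1 -b-> 2, state 2 accepts.
NFA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2},
       (2, "a"): {2}, (2, "b"): {2}}

if __name__ == "__main__":
    dfa, d_start, d_accept = nfa_to_dfa(NFA, 0, {2}, "ab")
    for text in ["bbab", "ba", ""]:
        print(text, run_nfa(NFA, 0, {2}, text), run_dfa(dfa, d_start, d_accept, text))
```

In the worst case the subset construction produces exponentially many DFA states, which is the memory cost that the DFA compaction work in this section tries to reduce.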
Regular Expression: Regular expressions are needed to inspect the payloads of packets that belong to different protocols. Fixed string sets limit the ability of a DPI system to deal with all packet structures. As a result of this limitation, state-of-the-art systems replace the string sets of intrusion signatures with more expressive regular expression (regexp) systems. Several content inspection engines have therefore partially or fully migrated to regexps, including those in Snort [35], Bro [10], 3Com's TippingPoint X506 [42], SafeXcel [19], and Cisco Systems [23]. However, using a regexp to represent patterns involves converting the regexp to a deterministic finite automaton (DFA) [21]. This DFA is represented in the DPI system as a table. The table stores the states and transitions of the DFA as records, which means that the size of the memory table for a regexp depends on the size of its DFA.

Experimentally, the DFA of a regexp set that contains hundreds of patterns grows to tens of thousands of states, which means memory consumption in the hundreds of megabytes. One of the common problems of HW based DPI solutions is memory access, because the number of accesses to off-chip memory is proportional to the number of bytes in the packet.

In [26], Kumar et al. noted that implementing the regexps of intruder signatures consumes much memory. They argued that there should be a way to reduce regexp memory consumption without increasing the number of memory lookups needed to operate the DPI system, since additional lookups are a further problem because of the related lookup delay. To reduce memory consumption, they introduced the delayed input DFA (D²FA), which compacts the traditional DFA of a regexp based on the observation that some states in the DFA have the same outgoing transitions.
For example, if two states s1 and s2 have transitions to the same set of next states (S) for a set of input characters C, these transitions can be eliminated from state s1 by adding a default transition (DT) from s1 to s2.

Under this scheme, state s1 retains access to all of these transitions by following the default transition to s2 and then moving on to the next state. D²FA thus constructs a compact DFA, which decreases the memory consumed by the DFA. However, compacting the memory representation with default transitions means that multiple default transitions may have to be followed before reaching the next state. Following multiple DTs requires multiple memory accesses, which decreases the throughput of the DPI process. However, Kumar et al. found that applying D²FA can reduce memory usage dramatically, by about 95%. This helps to implement DPI in on-chip memory, which provides high memory access bandwidth and reduces the effect of the multiple DT accesses needed to process an input character. The construction of a D²FA from a DFA is NP-hard. Therefore, they introduce heuristic algorithms to find a D²FA that balances the depth of the DTs against the memory consumption of the D²FA. The D²FA construction heuristic based upon a maximum weight spanning tree creates long default paths [25].

Table 1. Comparison between Existing Architectures

Algorithm / Component | Implementation Device | Throughput (Gbps)
Parallel Bloom Filters [17] | FPGA XCV2000E | 2.46
Aho-Corasick [3] | FPGA | 12.35
TCAM [20] | TCAM | 2
Aho-Corasick [44] | – | 8
TCAM/FPGA [43] | Xilinx Virtex2 | 10
nnnnn/SRAM [2] | – | 14
Selective multi-character transitions/FPGA [37] | Xilinx XC2V6000-6 | 14
B-FSM/(FPGA or ASIC) [45] | Xilinx Virtex-4 | 10~20
nnn/SRAM [3] | FPGA/ASIC | 1~20
RTCAM [46] | TCAM | 12.35
Pre-Decoded CAM [36] | Virtex 2-6000 | 9.7
Quad Bloom Filter/FPGA [6] | Xilinx Virtex4 | 20.4
BITWISE CAM [50] | FPGA Xilinx XC2V8000 | 2.5
FPGA [18] | Virtex-4 | 10
UCLA Packet/FPGA [11] | Xilinx Spartan 3-XC3S2000 | 3.2
NFA/(FPGA and IXP) [12] | Xilinx Virtex2-6000 & IXP 2400 | 1
GaTech Decoder Trees/FPGA [13] | Virtex 2-8000 | 2
WashU Bloom/FPGA [5] | Virtex 4-100 | 20.4
Hash Function [49] | Xilinx Virtex-II Pro XC2VP70 | 2
Hash Function and CRC [30] | Xilinx Virtex2 | 2.712~4.560
TCAM/Network Processor [38] | Network Processor IXDP28xx [22] | 10

Figure 5. Aho-Corasick DFA for patterns "he", "she", "his", and "her". Not all failure edges are included, for simplicity.

In [27], also by Kumar et al., a new representation for regexps was developed as an alternative to D²FA. It keeps the compression of D²FA and improves the handling of multiple DTs per input character by introducing more information into the state identifiers. The content-addressed D²FA (CD²FA) replaces state identifiers with content labels that include part of the information that would normally be stored in the table entry for the state. The main idea of CD²FA is to exploit the compaction that D²FA applies to the DFA while avoiding the traversal of multiple DTs to process the input. On the other hand, CD²FA needs to increase the size of the state labels to hold more information about the next state and the DTs. Hence, there are two objectives to satisfy: first, to ensure that states have few labeled transitions, and second, to ensure that default paths are as short as possible.

Figure 4. Compressed AC for high speed DPI: (a) Aho-Corasick finite state machine, (b) compressed AC.

According to experimental evaluation, CD²FA outperforms the uncompressed DFA. Furthermore, CD²FA with a 1 KB cache achieves double the throughput of the uncompressed DFA with 10% of the memory requirement.
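The default transition idea behind D²FA can be sketched as follows. This is a toy illustration, not the construction heuristic of Kumar et al. The three-state DFA, the hand-picked default transition from s1 to s2, and the function names are all assumptions.

```python
def build_d2fa(dfa, defaults):
    """For each state with a default target, keep only the labeled transitions
    that differ from the target's; the rest are reached via the default."""
    labeled = {}
    for state, table in dfa.items():
        target = defaults.get(state)
        if target is None:
            labeled[state] = dict(table)
        else:
            labeled[state] = {ch: nxt for ch, nxt in table.items()
                              if dfa[target].get(ch) != nxt}
    return labeled


def next_state(labeled, defaults, state, ch):
    """Follow default transitions (which consume no input) until a labeled
    transition for ch exists. Returns (next state, memory accesses).
    Assumes every default chain ends in a state with a full table."""
    accesses = 1
    while ch not in labeled[state]:
        state = defaults[state]
        accesses += 1
    return labeled[state][ch], accesses


# Toy DFA over {a, b, c}: s1 and s2 agree on 'b' and 'c'.
DFA = {
    "s1": {"a": "s1", "b": "s3", "c": "s3"},
    "s2": {"a": "s2", "b": "s3", "c": "s3"},
    "s3": {"a": "s1", "b": "s2", "c": "s3"},
}
DEFAULTS = {"s1": "s2"}

if __name__ == "__main__":
    d2fa = build_d2fa(DFA, DEFAULTS)
    stored = sum(len(t) for t in d2fa.values()) + len(DEFAULTS)
    print(stored, next_state(d2fa, DEFAULTS, "s1", "b"))
```

The second element of the returned pair shows the trade-off described above: fewer stored transitions, at the price of extra memory accesses whenever a default transition is followed.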
Aho- Corasick Algorithm: Aho- Coras ic k Algo - rithm (AC) [1] is one of the well known a lgorithms for m ulti-string (patterns) matc hing by e nco ding in truder patterns in FSM in a prepro cessing phas e. After that, the g enerated FSM has ro ot state which represent that no string have b een matc hed or even par tia lly ma tched and all patterns characters enumerated from ro ot. If any pattern has same preﬁx, it means that the pattern shares a common preﬁx also with the cor responding set of pa ren t no des in the tier . Figure 5 shows a ex- ample of the AC FSM construction for patterns “ he”, “she”, “his”, and “ her”. Ho wev er, AC construction is memory consumption as a result of the huge num b er o f failed transitions that pr oportiona l with the num b er of patterns in FSM. Thus,classical AC takes more sto rage than it is likely to ﬁt in a on-c hip SRAM or the ca c he of a pro cessor [44]. Additionally , In [3], Mansoor et al. constructed a compressed ﬁnite state machine that enco des all the in- trusion patterns and makes state transitions on m ulti- ple (at mo st k ) input characters . Therefore, they sta rt constructing Aho-Coras ic k DF A as in Fig ure 4(a), then they create an equiv alent state machine called the com- pressed DF A as illustrated in Figure 4(b) where it has transitions on multiple input characters by co mbining k consecutive s ta tes of Aho-Cor asic k DF A. Co n versely , in [40], Lin et al. pro p osed a new construction for A C by splitting the input character to bits a nd co n- structing sma ll blo c ks that represe nt p ortion of r ules with p ortion of bits for eac h rule. This co ns truction exploits a sp eedy on-chip memory to uploa d the s ma ll blo c k of the s ystem and sp eed up the overall system throughput. 6 Comparison b et w een E xis ting Mod- ules and Implemen tations In this section, we in tro duce a comparison be t w een recent applied IDS with diﬀeren t har dw are implemen- tations. 
Our comparison focuses on the algorithm, the type of hardware implementation used in designing the DPI architecture, and the resulting throughput, as illustrated in Table 1. Other related properties, including the required memory and other specifications, can be found in the corresponding references.

7 Conclusion

In this paper, we introduced a survey of some of the existing and ongoing research on DPI. Our survey included the challenges and ultimate goals behind the design of DPI and its implementations. We also gave an overview of the existing implementations, including both software and hardware. As the finite state machine (or automaton) is an important component of the hardware design, we considered its different types and the ongoing research on each type. Finally, we presented a concluding comparison between the existing modules and hardware implementations and related this comparison to the achieved throughput.

We believe that this area of research is still active and that further work is needed on the different sides of the implementation (hardware and software), in addition to the design of fast matching algorithms that fit the increasing throughput demands. Our survey is a first step toward introducing readers to DPI systems and the open research topics in the field.

References

[1] Alfred V. Aho and Margaret J. Corasick. Efficient string matching: An aid to bibliographic search. Commun. ACM, 18(6):333–340, 1975.
[2] Monther Aldwairi, Thomas M. Conte, and Paul D. Franzon. Configurable string matching hardware for speeding up intrusion detection. SIGARCH Computer Architecture News, 33(1):99–107, 2005.
[3] Mansoor Alicherry, M. Muthuprasanna, and Vijay Kumar. High speed pattern matching for network IDS/IPS. In ICNP, pages 187–196, 2006.
[4] Spyros Antonatos, Kostas G. Anagnostakis, and Evangelos P. Markatos. Generating realistic workloads for network intrusion detection systems. In WOSP, pages 207–215, 2004.
[5] Michael Attig, Sarang Dharmapurikar, and John W. Lockwood. Implementation results of bloom filters for string matching. In FCCM, pages 322–323, 2004.
[6] Michael Attig and John W. Lockwood. SIFT: Snort intrusion filter for TCP. In Hot Interconnects, pages 121–127. IEEE Computer Society, 2005.
[7] Zachary K. Baker and Viktor K. Prasanna. Automatic synthesis of efficient intrusion detection systems on FPGAs. In FPL, pages 311–321, 2004.
[8] Herbert Bos and Kaiming Huang. Towards software-based signature detection for intrusion prevention on the network card. In RAID, pages 102–123, 2005.
[9] Robert S. Boyer and J Strother Moore. A fast string searching algorithm. Communications of the ACM, 20(10):762–772, 1977.
[10] Bro. Intrusion detection system. http://www.bro-ids.org/.
[11] Young H. Cho and William H. Mangione-Smith. Deep packet filter with dedicated logic and read only memories. FCCM, 00:125–134, 2004.
[12] Chris Clark, Wenke Lee, David Schimmel, Didier Contis, Mohamed Kon, and Ashley Thomas. A hardware platform for network intrusion detection and prevention. In Third Workshop on Network Processors and Applications, Madrid, Spain, 2004.
[13] Christopher R. Clark and David E. Schimmel. Scalable pattern matching for high speed networks. In IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), pages 249–257, 2004.
[14] C.J. Coit, S. Staniford, and J. McAlerney. Towards faster string matching for intrusion detection or exceeding the speed of Snort. In DARPA Information Survivability Conference & Exposition II, pages 367–373, 2001.
[15] B. Commentz-Walter. A string matching algorithm fast on the average. In Proceedings of ICALP, pages 118–132, 1979.
[16] Willem de Bruijn, Asia Slowinska, Kees van Reeuwijk, Tomas Hruby, Li Xu, and Herbert Bos. SafeCard: A gigabit IPS on the network card. In RAID, pages 311–330, 2006.
[17] Sarang Dharmapurikar, Praveen Krishnamurthy, Todd S. Sproull, and John W. Lockwood. Deep packet inspection using parallel bloom filters. IEEE Micro, 24(1):52–61, 2004.
[18] Sarang Dharmapurikar and John Lockwood. Fast and scalable pattern matching for content filtering. In Proceedings of the 2005 Symposium on Architecture for Networking and Communications Systems, pages 183–192. ACM Press, 2005.
[19] SafeXcel Content Inspection Engine. Hardware regex acceleration IP. http://safenet-inc.com/Library/3/SafeXcel-4850 ProductBrief.pdf.
[20] Yu Fang, Randy H. Katz, and T. V. Lakshman. Gigabit rate packet pattern-matching using TCAM. In ICNP, pages 174–183, 2004.
[21] John E. Hopcroft, Jeffrey D. Ullman, and Rajeev Motwani. Introduction to Automata Theory, Languages and Computation. Addison-Wesley, 2001.
[22] Intel. Intel 2800 network processor, hardware reference manual. Jan. 2004.
[23] Cisco IOS. Intrusion prevention systems deployment guide. http://www.cisco.com/.
[24] Donald Knuth. The Art of Computer Programming: Seminumerical Algorithms, Vol. 2, third edition. Addison-Wesley, ISBN: 0-201-89684-2, 1997.
[25] J.B. Kruskal. On the shortest spanning subtree of a graph and traveling salesman problem. The American Mathematical Society, 7:45–50, 1956.
[26] Sailesh Kumar, Sarang Dharmapurikar, Fang Yu, Patrick Crowley, and Jonathan S. Turner. Algorithms to accelerate multiple regular expressions matching for deep packet inspection. In SIGCOMM, pages 339–350, 2006.
[27] Sailesh Kumar, Jonathan S. Turner, and John Williams. Advanced algorithms for fast and scalable deep packet inspection. In ANCS, pages 81–92, 2006.
[28] L7-filter. Application layer packet classifier. http://l7-filter.sourceforge.net/.
[29] Karthik Lakshminarayanan, Anand Rangarajan, and Srinivasan Venkatachary. Algorithms for advanced packet classification with ternary CAMs. In SIGCOMM, pages 193–204, 2005.
[30] Giorgos Papadopoulos and Dionisios N. Pnevmatikatos. Hashing + memory = low cost, exact pattern matching. In FPL, pages 39–44, 2005.
[31] Michael Rash, Angela D. Orebaugh, Graham Clark, Becky Pinkard, and Jake Babbin. Intrusion Prevention and Active Response: Deploying Network and Host IPS. Syngress, 2005.
[32] Shai Rubin, Somesh Jha, and Barton P. Miller. Protomatching network traffic for high throughput network intrusion detection. In ACM Conference on Computer and Communications Security, pages 47–58, 2006.
[33] Reetinder Sidhu and Prasanna V. K. Fast regular expression matching using FPGAs. In FPL, pages 484–493, 2004.
[34] Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage. Automated worm fingerprinting. In OSDI, pages 45–60, 2004.
[35] SNORT. Network intrusion detection system. http://www.snort.org/.
[36] Ioannis Sourdis and Dionisios Pnevmatikatos. Pre-decoded CAMs for efficient and high-speed NIDS pattern matching. In FCCM, pages 258–267, 2004.
[37] Yutaka Sugawara, Mary Inaba, and Kei Hiraki. Over 10Gbps string matching mechanism for multi-stream packet scanning systems. In FCCM, IEEE, pages 227–238, 2001.
[38] Jung-Sik Sung, Seok Min Kang, Youngseok Lee, Taeck-Geun Kwon, and Bong-Tae Kim. A multi-gigabit rate deep packet inspection algorithm using TCAM. In GLOBECOM, pages 453–457, 2005.
[39] Jung-Sik Sung, Seok-Min Kang, and Taeck-Geun Kwon. A fast pattern-matching algorithm for network intrusion detection system. In Networking, pages 1157–1162, 2006.
[40] Lin Tan, Brett Brotherton, and Timothy Sherwood. Bit-split string-matching engines for intrusion detection and prevention. TACO, ACM, 3(1):3–34, 2006.
[41] David E. Taylor. Survey and taxonomy of packet classification techniques. ACM Comput. Surv., 37(3):238–275, 2005.
[42] TippingPointX0506. Tippingpoint intrusion prevention systems. http://www.tippingpoint.com/products ips.html.
[43] Gerald Tripp. A finite-state-machine based string matching system for intrusion detection on high-speed networks. In EICAR 2005 Conference Proceedings, pages 26–40, May 2005.
[44] Nathan Tuck, Timothy Sherwood, Brad Calder, and George Varghese. Deterministic memory-efficient string matching algorithms for intrusion detection. In INFOCOM, 2004.
[45] Jan van Lunteren. High-performance pattern-matching for intrusion detection. In INFOCOM, 2006.
[46] Yaron Weinsberg, Shimrit Tzur-David, Danny Dolev, and Tal Anker. High performance string matching algorithm for a network intrusion prevention system (NIPS). In HPSR, 7 pp., 2006.
[47] Patrick Wheeler and Errin W. Fulp. A taxonomy of parallel techniques for intrusion detection. In ACM Southeast Regional Conference, pages 278–282, 2007.
[48] Sun Wu and Udi Manber. A fast algorithm for multi-pattern searching. Department of Computer Science, University of Arizona, 1994.
[49] Seungyong Yoon, Byoungkoo Kim, and Jintae Oh. High-performance stateful intrusion detection system. In IEEE, Computational Intelligence and Security, volume 01, pages 574–579, 2006.
[50] Sherif Yusuf and Wayne Luk. Bitwise optimised CAM for network intrusion detection systems. In FPL, pages 444–449, 2005.
