Enhanced User Authentication through Trajectory Clustering
Password authentication is the most commonly used technique to authenticate the user validity. However, due to its simplicity, it is vulnerable to pseudo attacks. It can be enhanced using various biometric techniques such as thumb impression, finger …
Authors: Hazarath Munaga (Dr MHM Krishna Prasad), J. V. R. Murthy, N. B. Venkateswarlu
International Journal of Recent Trends in Engineering

Enhanced User Authentication through Trajectory Clustering

Hazarath Munaga* (1), J. V. R. Murthy (1), and N. B. Venkateswarlu (2)
(1) Dept. of CSE, JNTU Kakinada, India. Email: {hazarath.munaga, mjonnalagedda}@gmail.com
(2) Dept. of CSE, AITAM, Tekkali, India. Email: venkat_ritch@yahoo.com
* Hazarath Munaga alias MHM Krishna Prasad

Abstract: Password authentication is the most commonly used technique to authenticate the validity of a user. However, due to its simplicity, it is vulnerable to pseudo attacks. It can be enhanced using various biometric techniques such as thumb impression, finger movement, eye movement, etc. In this paper, we concentrate on the most economical technique, based on the user's habitual rhythm pattern, i.e., not what they type but how they type is the measure for authenticating the user. We consider the latency between key events as the trajectory, and trajectory clustering is used to obtain the hidden patterns of the user. The obtained pattern can be considered as a cluster of measurements that can be used to differentiate the user from other users. We evaluated the proposed technique on data obtained from 100 users.

Index Terms: keystroke analysis, trajectory clustering, keystroke latency, keystroke biometrics

I. INTRODUCTION

A main goal of a computer engineer is to protect information and resources from unauthorized users. In the literature there are several methods to authenticate the validity of a user, varying from the use of biometrics to smart cards. Among them, password authentication is the most accepted and widely used mechanism because of its economic, operational and implementation advantages. The method relies on the fact that only the authorized user knows the correct password. Passwords provide no security if an impostor knows the password.
Hence, to improve the security of user authentication, one option is to replace passwords with a biometric identification of the user. Currently, there are three major forms of biometrics: physiological, behavioral, and token based. Physiological biometrics rely on biological attributes such as fingerprints, iris, and retina patterns. Behavioral biometrics utilize behavioral attributes such as voice, signature, and keystroke dynamics. Token-based systems require the possession of a security device such as an ID card.

Behavioral biometrics work on the way we interact with the authentication system. The most popular systems are based on voice, signature, and keystroke dynamics. Of these, keystroke dynamics is purely software-based, less expensive and more transparent to the user. As observed by Gaines et al. [1], a user's keystroke pattern is highly repeatable and distinct from that of other users (typing biometric), which can be used to discriminate the owner from impostors. Hence, authentication based on typing biometrics uses an individual's unique typing pattern to validate the authentic user among impostors. The action of typing the password can be analyzed with respect to its physiological characteristics, i.e., the latency time between keystrokes, keystroke pressure, key displacement, and key displacement duration.

Ref. [1] introduced the use of keystroke timings as a means of authentication using seven professional typists. Since then there have been a number of research studies [2-9] on authentication of users based on keystroke timings, using techniques ranging from deterministic algorithms to machine learning, with clustering algorithms used as learning algorithms for classification. Ref. [2] develops a profile using a statistical analysis method involving means and standard deviations of latencies between consecutive keystrokes.
Under these types of statistical models, if a user were to type each key much faster than usual, then he would most likely be rejected, because the timing measurement of each of his pairs of consecutive keystrokes would fall beyond the stored mean of his trained profile. In 1997, Monrose and Rubin used the Euclidean distance and probabilistic calculations based on the assumption that the latency times for one digraph exhibit a Normal distribution [3]. Afterwards, in 2000, they also presented an algorithm for identification based on the similarity models of Bayes, and in 2001 they presented an algorithm that uses polynomials and vector spaces to generate complex passwords from a simple one, using the keystroke pattern [4]. In 2000, Cho et al. [5] demonstrated a neural network (NN) novelty detection model, which was built by training on the owner's patterns only and was used to detect impostors using a similarity measure, reporting a 1.0% false rejection rate (hereinafter, FRR) and a 0% false acceptance rate (hereinafter, FAR). Sung et al. [6] have also applied NNs to keystroke dynamics, generating error rates on the order of 2-4%. However, such a solution suffers from typical NN limitations, e.g., conditional independence (i.e., being in a state depends only on the previous state).

Revett et al. have used the rough sets induction algorithm to extract rules that form models for predicting the validity of a login ID/password attempt [7]. The results indicate that the error rate can be as low as 2% in many cases. In addition, a multiple sequence alignment algorithm has been successfully deployed to authenticate a group of users with virtually 100% success [8]. Ref. [9] demonstrated the use of k-means clustering for validating the user using keystroke dynamics.
However, k-means (i) is not suitable for generating non-globular clusters, and (ii) makes it difficult to compare the quality of the computed clusters (e.g., different initial partitions and the value of k affect the outcome). Finally, the k-means approach requires a lot of computational time for convergence.

The use of keystroke dynamics for user authentication is no longer a novel concept. The novelty of this study lies in the adopted dissimilarity measure and the technique used to distinguish the user from impostors. Ref. [11] demonstrated the use of trajectory clustering for visualizing, analyzing and obtaining hidden patterns from user navigations obtained from virtual environments. Ref. [12] demonstrated the use of trajectory clustering for selecting cluster heads, which is implicitly used to extend the lifetime of wireless sensor networks. In this study, we employ a trajectory-based clustering algorithm to authenticate the legitimate user based on two explicit events (key pressed, key released) and one implicit event (key typed, which is used to obtain the pressure applied to the key).

II. TRAJECTORY CLUSTERING ALGORITHM

The success of any clustering algorithm depends on the adopted dissimilarity measure. This section explains the dissimilarity measure adopted here. Ref. [13] proposed the use of the Euclidean distance between time series of equal length as the measure of their similarity. The idea was generalized in [14] for subsequence matching. In a similar way, [15] used the Discrete Wavelet Transform and [16] used Principal Component Analysis for measuring time series similarity. Another approach, brought from image processing, is the time warping technique, which is used in [17] to match signals in speech recognition. Berndt and Clifford [18] suggested this technique to measure the similarity of time-series data in data mining.
Recent works have also used this similarity measure [19][20]. Ref. [9] suggested the use of the Canberra distance for finding the distance between user samples. Here we use the Hausdorff measure [21] for calculating the dissimilarity between trajectories, and observe that the Hausdorff measure is more sensitive to small changes than the Canberra distance, as can be seen clearly from Fig. 1 (trajectories A and B overlap because they are identical). Note that we also consider the keystroke latency id when calculating the Hausdorff dissimilarity. The following definitions are used in our algorithm.

Definition 1. A trajectory (t) is represented as trj($t_{id}$, $u_0$, $u_1$, $u_2$, ..., $u_n$), where $t_{id}$ is a unique trajectory id (user id), and $u_0, u_1, \ldots, u_n$ is a sequence of key events reflecting the keystroke latencies.

Definition 2. We define the keystroke dissimilarity function between two trajectories $t_1$ and $t_2$ as the maximum of the one-way distances between the two trajectories.

As mentioned in [12], the one-way distance from a trajectory $t_1$ to another trajectory $t_2$ is defined as the integral of the Hausdorff distance from the points of $t_1$ to trajectory $t_2$, divided by the number of points ($|t_1|$) in $t_1$:

$$\mathrm{dist}_{ow}(t_1, t_2) = \frac{1}{|t_1|} \int_{p \in t_1} d_h(p, t_2)\, dp$$

The Hausdorff distance from a trajectory point $p$ to another trajectory $t_2$ is defined as

$$d_h(p, t_2) = \min_{q \in t_2} \{\, d(p, q) \,\}$$

The distance between trajectories $t_1$ and $t_2$ is the maximum of their one-way distances:

$$\mathrm{dist}(t_1, t_2) = \max \{\, \mathrm{dist}_{ow}(t_1, t_2),\ \mathrm{dist}_{ow}(t_2, t_1) \,\}$$

Clearly, $\mathrm{dist}_{ow}(t_1, t_2)$ is not symmetric, but $\mathrm{dist}(t_1, t_2)$ is symmetric. Note that $\mathrm{dist}_{ow}(t_1, t_2)$ is the integral of the shortest distances from points in $t_1$ to $t_2$.
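As a rough illustration of Definition 2, the following minimal Python sketch replaces the integral with a mean over the discrete trajectory points. It assumes that each point is the pair (latency id, latency), with the latency id taken as the 0-based position in the sequence, and that d(p, q) is the Euclidean distance; both are assumptions made for illustration, and the function names are hypothetical rather than taken from the paper. Under these assumptions the sketch gives about 8.22 for sample trajectories A and C, which agrees with the value listed in Fig. 1(c).

```python
import math

def point_to_trajectory(p, t2):
    """d_h(p, t2): shortest Euclidean distance from point p to any point of t2."""
    return min(math.dist(p, q) for q in t2)

def one_way_distance(t1, t2):
    """dist_ow(t1, t2): mean of d_h(p, t2) over the points p of t1.

    Assumption: the integral in the definition is approximated by a
    plain average over the discrete points of t1.
    """
    return sum(point_to_trajectory(p, t2) for p in t1) / len(t1)

def trajectory_dissimilarity(latencies1, latencies2):
    """dist(t1, t2): maximum of the two one-way distances.

    Assumption: each point is (latency id, latency), where the latency id
    is the 0-based position of the latency in the sequence.
    """
    t1 = list(enumerate(latencies1))
    t2 = list(enumerate(latencies2))
    return max(one_way_distance(t1, t2), one_way_distance(t2, t1))

# Sample trajectories A and C from Fig. 1(a).
A = [206, 232, 192, 212, 210, 168, 277]
C = [216, 242, 202, 222, 220, 178, 287]
print(round(trajectory_dissimilarity(A, C), 2))
```

Because every point of one trajectory is compared with every point of the other, the cost is quadratic in the trajectory length, which is negligible for password-length sequences.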
Fig. 1(a) Sample trajectories

Label   Trajectory
A       {206,232,192,212,210,168,277}
B       {206,232,192,212,210,168,277}
C       {216,242,202,222,220,178,287}
D       {254,285,135,120,190,228,350}
E       {190,220,160,175,235,248,312}

Fig. 1(b) Canberra dissimilarity for sample trajectories

Trajectory   A      B      C      D      E
A            0      0      0.16   0.03   0.05
B            0      0      0.16   0.03   0.05
C            0.16   0.16   0      0.19   0.11
D            0.03   0.03   0.19   0      0.09
E            0.05   0.05   0.11   0.09   0

Fig. 1(c) Hausdorff dissimilarity for sample trajectories

Trajectory   A       B       C       D       E
A            0       0       8.22    27.73   11.8
B            0       0       8.22    27.73   11.8
C            8.22    8.22    0       28.56   10.95
D            27.73   27.73   28.56   0       21.34
E            11.8    11.8    10.95   21.34   0

Fig. 1(d) Visualization of the sample trajectories

A. Generalized Trajectory Cluster Routine

Trajectories are grouped into clusters using the threshold. Here the threshold is taken as a maximum value, such that all trajectories belonging to each user are grouped into a single cluster. The trajectory cluster routine contains the following stages:

1. The dissimilarity matrix for the trajectories is computed using the Hausdorff distance.
2. Using the Initialization algorithm (Table I), trajectories are grouped into initial clusters.
3. Using the RepTraj algorithm (Table I), representative trajectories are computed.
4. Taking the trajectories obtained in step 3 as initial cluster centres, the Re-cluster algorithm (Table I) recomputes the clusters and their representative trajectories until there is no change in the representative trajectories.

TABLE I
TRAJECTORY CLUSTERING ALGORITHM

Algorithm 1 (Initialization)
a. Take the first sample as the first cluster. Classify all the remaining trajectories into this cluster if they are within the threshold.
b. Take a trajectory (sequentially) which is not already classified into any cluster and consider it as a new cluster. Take all the other trajectories which are not yet in any cluster and place them in this cluster if they satisfy the threshold limit.
c. Repeat step b until no new clusters are added.

Algorithm 2 (Representative Trajectory of Cluster C)
For each trajectory of cluster C, calculate the cumulative dissimilarity with all other trajectories of the same cluster C. Select the trajectory with the minimum cumulative dissimilarity and take it as the representative trajectory of that cluster.

Algorithm 3 (Re-compute)
1. For each trajectory, calculate the dissimilarity with all K representative trajectories and assign it to the cluster for which the dissimilarity is lowest (if it is within the threshold).
2. Re-calculate the representative trajectories using Algorithm 2.

III. EXPERIMENTAL WORK

The proposed algorithm was evaluated on a dataset obtained from 100 computer science undergraduate and graduate students of JNTUK (see footnote 2). The students were asked to enroll and authenticate during the one-month period of this study. The users were asked to enter their registration number as the login ID and a self-chosen string of random length as the password. From the obtained samples, a user feature is created for each user in the following way:

1. Each participant is asked to type his/her password five times in three random sessions.
2. From each session, a repTraj is generated using the above trajectory clustering algorithm. In this way, three repTraj are generated from the three random sessions.
3. The average dissimilarity between the three repTraj is taken as the userThreshold, and a userRepTraj is generated by clustering the three repTraj obtained in step 2.
4. The userRepTraj and the userThreshold are stored as the user feature.

Footnote 2: http://jntukakinada.edu.in; the dataset is available at http://sites.google.com/site/munaga71.
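The clustering routine of Table I and the enrolment steps above can be sketched in Python as follows. This is an illustrative reading under stated assumptions, not the authors' implementation: all function names are hypothetical, the dissimilarity is the discrete mean-based approximation of Definition 2 over (latency id, latency) points, the per-session threshold is taken to be large enough that each session forms a single cluster (so the repTraj is the cluster medoid), and the latencies in the usage example are made-up values.

```python
import math
from itertools import combinations

def dissimilarity(lat1, lat2):
    """Discrete stand-in for Definition 2 over (latency id, latency) points."""
    t1, t2 = list(enumerate(lat1)), list(enumerate(lat2))
    def one_way(a, b):
        return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    return max(one_way(t1, t2), one_way(t2, t1))

def initial_clusters(dist, threshold):
    """Algorithm 1: sequentially seed clusters and absorb items within the threshold."""
    unassigned = list(range(len(dist)))
    clusters = []
    while unassigned:
        seed = unassigned[0]
        members = [i for i in unassigned if dist[seed][i] <= threshold]
        clusters.append(members)
        unassigned = [i for i in unassigned if i not in members]
    return clusters

def representative(cluster, dist):
    """Algorithm 2: member with the minimum cumulative dissimilarity."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))

def assign(dist, threshold, reps):
    """Assign each item to its nearest representative if within the threshold."""
    clusters = {r: [] for r in reps}
    for i in range(len(dist)):
        nearest = min(reps, key=lambda r: dist[i][r])
        if dist[i][nearest] <= threshold:
            clusters[nearest].append(i)
    return [c for c in clusters.values() if c]

def recluster(dist, threshold, reps, max_iter=100):
    """Algorithm 3: repeat assignment until the representatives stop changing."""
    for _ in range(max_iter):
        new_reps = [representative(c, dist) for c in assign(dist, threshold, reps)]
        if new_reps == reps:
            break
        reps = new_reps
    return assign(dist, threshold, reps), reps

def cluster_trajectories(trajs, threshold):
    """Stages 1 to 4 of the routine; returns index clusters and representative indices."""
    dist = [[dissimilarity(a, b) for b in trajs] for a in trajs]
    reps = [representative(c, dist) for c in initial_clusters(dist, threshold)]
    return recluster(dist, threshold, reps)

def user_feature(sessions):
    """Enrolment: one repTraj per session, then userRepTraj and userThreshold."""
    # Assumption: an infinite threshold forces a single cluster per session.
    rep_trajs = []
    for session in sessions:
        _, reps = cluster_trajectories(session, math.inf)
        rep_trajs.append(session[reps[0]])
    pair_d = [dissimilarity(a, b) for a, b in combinations(rep_trajs, 2)]
    user_threshold = sum(pair_d) / len(pair_d)
    _, reps = cluster_trajectories(rep_trajs, math.inf)
    return rep_trajs[reps[0]], user_threshold

# Made-up latencies (ms) for three short sessions, for illustration only.
sessions = [
    [[200, 230, 190], [204, 228, 193], [198, 233, 188]],
    [[210, 225, 195], [207, 229, 191], [212, 224, 197]],
    [[202, 231, 189], [205, 227, 192], [201, 232, 190]],
]
user_rep_traj, user_threshold = user_feature(sessions)
print(user_rep_traj, round(user_threshold, 2))
```

Choosing the representative as an actual member of the cluster (a medoid) rather than an averaged trajectory follows Algorithm 2, and it means only the pairwise dissimilarity matrix is needed. In this sketch, trajectories that are not within the threshold of any representative are simply left unassigned.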
After obtaining the user feature, the participants are asked to go through the authentication phase for computing the FRR and FAR. To authenticate, the user is asked, as usual, to enter his login ID and password. From the entered data a trajectory is generated and its dissimilarity with the userRepTraj is computed. If the dissimilarity is within the userThreshold, the user is considered to be the authorised user.

Fig. 2 shows the visualizations obtained from the tool. Fig. 2(i) shows the visualization for a normal user (one who is familiar with the computer): panels a, b and c of Fig. 2(i) show the three random sessions, with the trajectory highlighted in red being the repTraj of the respective session, and the highlighted trajectory in Fig. 2(i)d is the userRepTraj, which is obtained by clustering the three repTraj shown in (i)a, b and c. Similarly, Fig. 2(ii) shows the example case of a novice user (new to the computer), where the FRR is observed.

Fig. 2(i) Visualizations of the normal user
Fig. 2(ii) Visualizations of the novice user, where the FRR is observed

IV. CONCLUSIONS

In this paper, we have presented a novel trajectory clustering technique for authenticating the user using keystroke dynamics. We have demonstrated the effectiveness of our solution by testing the proposed technique on data obtained from 100 computer science undergraduate and graduate students. As discussed in Section III, the technique proved to be efficient in supporting the administrator in authenticating the valid user among other users, even when password theft has occurred. Moreover, for the typing biometrics system, it was observed that the Hausdorff dissimilarity measure can be adopted to obtain effective results, i.e., as shown in Section II, the Hausdorff dissimilarity measure is more sensitive to small variations.
As a limitation of the proposed technique, during the experimentation phase we tested the tool on a novice user, for whom we observed false rejections. Although continuous monitoring based on keystroke dynamics is available, it is not applicable to all applications (e.g., accessing an ATM, where only a small amount of time and few keys are used to access the system); in such cases this static use of keystroke dynamics will be useful. User authentication supported by keystroke dynamics has many applications in today's electronic world, especially applications where secure data transfer is mandatory.

REFERENCES

[1] R. Gaines, W. Lisowski, S. Press, and S. Shapiro, "Authentication by keystroke timing: Some preliminary results," Rand Report R-256-NSF, Rand Corp, 1980.
[2] R. Joyce and G. Gupta, "Identity Authentication Based on Keystroke Latencies," Communications of the ACM, Vol. 33, No. 2, pp. 168-176, 1990.
[3] F. Monrose and A. D. Rubin, "Authentication via Keystroke Dynamics," Proc. of the Fourth ACM Conference on Computer and Communication Security, Zurich, Switzerland, 1997.
[4] F. Monrose, M. K. Reiter, and S. Wetzel, "Password hardening based on keystroke dynamics," International Journal of Information Security, pp. 69-83, 2001.
[5] S. Cho, C. Han, D. Han, and H. Kim, "Web-based keystroke dynamics identity verification using neural network," J Organ Comput Electron Commerce, 10(4):295-307, 2000.
[6] K. S. Sung and S. Cho, "GA SVM wrapper ensemble for keystroke dynamics authentication," Proc. of International Conference on Biometrics, Hong Kong, pp. 654-660, 2006.
[7] K. Revett, S. Magalhaes, and H. Santos, "Developing a keystroke dynamics based agent using rough sets," Proc. of The 2005 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, Workshop on Rough Sets and Soft Computing in Intelligent Agents and Web Technology, Compiegne, France, pp. 56-61, 19-22 September, 2005.
[8] K. Revett, "On the use of multiple sequence alignment for user authentication via keystroke dynamics," Proc. of International Conference on Global eSecurity, University of East London, pp. 112-120, 16-18 April, 2007.
[9] Cheng Soon Ong and Weng Kin Lai, "Enhanced Password Authentication through Typing Biometrics with the K-Means Clustering Algorithm," Proc. of Seventh International Symposium on Manufacturing with Applications (World Automation Congress), Maui, Hawaii, June 11-16, 2000.
[10] Edmond Lau, Xia Liu, Chen Xiao, and Xiao Yu, "Enhanced User Authentication Through Keystroke Biometrics," 6.857: Computer and Network Security, Final Project Report, Massachusetts Institute of Technology, December 9, 2004.
[11] H. Munaga, L. Ieronutti, and L. Chittaro, "CAST - A Novel Trajectory Clustering and visualization tool for spatio temporal data," In IHCI-2009: Proc. of the First International Conference on Intelligent Human Computer Interaction, pages 169–175, Springer, India, January 2009.
[12] H. Munaga, J. V. R. Murthy, and N. B. Venkateswarlu, "A Novel Trajectory Clustering technique for selecting cluster heads in wireless sensor networks," International Journal on Recent Trends in Engineering, 1:357–361, May 2009.
[13] R. Agrawal, C. Faloutsos, and A. Swami, "Efficient similarity search in sequence databases," In Proc. of the Fourth International Conference on Foundations of Data Organization and Algorithms, Chicago, pp. 69–84, 1993.
[14] C. Faloutsos, M. Ranganathan, and Y. Manolopoulos, "Fast subsequence matching in time-series databases," In Proc. of the ACM SIGMOD Conference on Management of Data, pp. 419–429, 1994.
[15] Z. Struzik and A. Sibes, "Measuring time series similarity through large singular features revealed with wavelet transformation," In Proc. of the 10th International Workshop on Database and Expert Systems Appl., pp. 162–166, 1999.
[16] M. Gavrilov, D. Anguelov, P. Indyk, and R. Motwani, "Mining the stock market: Which measure is best?," In Proc. of International Conference on Knowledge Discovery and Data Mining, pp. 487–496, 2000.
[17] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Trans. Acoustics, Speech and Signal Processing, ASSP-26(1):43–49, Feb. 1978.
[18] D. Berndt and J. Clifford, "Using Dynamic Time Warping to Find Patterns in Time Series," In Proc. of AAAI94 Workshop on KDD, 1994.
[19] E. Keogh and M. Pazzani, "Scaling up Dynamic Time Warping for Data mining Applications," In Proc. 6th International Conference on Knowledge Discovery and Data Mining, Boston, MA, 2000.
[20] S. Park, W. Chu, J. Yoon, and C. Hsu, "Efficient Searches for Similar Subsequences of Different Lengths in Sequence Databases," In Proceedings of IEEE ICDE, pages 23–32, 2000.
[21] D. P. Huttenlocher and K. Kedem, "Computing the minimum Hausdorff distance for point sets under translation," In SCG '90: Proceedings of the Sixth Annual Symposium on Computational Geometry, New York, NY, USA: ACM, 1990, pp. 340–349.