Characterization of Request Sequences for List Accessing Problem and New Theoretical Results for MTF Algorithm

List Accessing Problem is a well studied research problem in the context of linear search. Input to the list accessing problem is an unsorted linear list of distinct elements along with a sequence of requests, where each request is an access operatio…

Authors: Rakesh Mohanty, Burle Sharma, Sasmita Tripathy

International Journal of Computer Applications (0975 – 8887), Volume 22 – No.8, May 2011

Rakesh Mohanty, Dept. of Comp. Sc. & Engg., Indian Institute of Technology Madras, Chennai, India
Burle Sharma, Dept. of Comp. Sc. & Engg., Sambalpur University Institute of Information Technology, Jyoti Vihar, Burla, Orissa, India
Sasmita Tripathy, Dept. of Comp. Sc. & Engg., Veer Surendra Sai Institute of Technology, Burla, Orissa, India

ABSTRACT

The List Accessing Problem is a well-studied research problem in the context of linear search. The input to the list accessing problem is an unsorted linear list of distinct elements along with a sequence of requests, where each request is an access operation on an element of the list. A list accessing algorithm reorganizes the list while processing a request sequence on the list in order to minimize the access cost. The Move-To-Front (MTF) algorithm has been proved to be the best performing online list accessing algorithm reported in the literature to date. Characterization of the input request sequences corresponding to practical real life situations is a big challenge for the list accessing problem. To the best of our knowledge, no characterization of request sequences has been done in the literature to date for the list accessing problem. In this paper, we have characterized the request sequences for the list accessing problem based on several factors such as size of the list, size of the request sequence, ordering of elements, and frequency of occurrence of elements in the request sequence. We have made a comprehensive study of the MTF list accessing algorithm and obtained new theoretical results for our characterized special classes of request sequences.
Our characterization will open up a new direction of research for empirical analysis of list accessing algorithms for real life inputs.

General Terms
Data Structures, Algorithms, Linked List, Linear List, Data Compression.

Keywords
List, Request Sequence, Linear Search, Move-To-Front, Cost Model

1. INTRODUCTION

In Computer Science, linear search is one of the simplest search algorithms for finding a particular element in a linear list. In linear search, we search for an element sequentially, one by one, in a fixed size unsorted linear list, starting from the front of the list and moving towards the end until the requested element is found. The performance of this data structure can be enhanced by making it self organizing. Each time after accessing the requested element, we reorganize the list by performing exchanges of adjacent elements so that the frequently requested elements are moved closer to the front of the list, thereby reducing the access cost of subsequent requests. The whole problem of efficiently reorganizing and accessing the elements of the list to obtain optimal cost is called the List Accessing Problem. An algorithm that accesses the sequence of elements in the list based on the current and past requests is called a List Accessing Algorithm. A list accessing algorithm uses a cost model to define the way in which cost is assigned to a requested element when it is accessed in the linear unsorted list.

1.1 Problem Statement

In a list accessing problem, we are given a list $L$ of $l$ distinct elements and a request sequence $\sigma = \langle \sigma_1, \sigma_2, \ldots, \sigma_n \rangle$ of $n$ elements such that every requested element $\sigma_i$ ($1 \le i \le n$) belongs to $L$. Each time we access an element $\sigma_i$ from $\sigma$ in list $L$, we incur some access cost. After each access, list $L$ is reorganized in order to process $\sigma$ efficiently. When we rearrange the list, we incur some reorganization cost. The total cost for accessing an element in the list is the sum of the access cost and the reorganization cost.
Our objective is to minimize the total cost while processing a request sequence on the list.

1.2 Applications

List accessing techniques have been extensively used for storing and maintaining small dictionaries. There are various applications in which a linear list is the implementation of choice. It is used for organizing the list of identifiers maintained by a compiler and for resolving collisions in a hash table. Another important application of list accessing techniques is data compression. Other uses of list accessing algorithms are computing point maxima and convex hulls in computational geometry. The List Accessing Problem is also significant in the application of self organizing data structures.

1.3 Related Work

The list accessing problem has been of significant theoretical and practical interest for the last four decades. To the best of our knowledge, the study of list accessing techniques was initiated by the pioneering work of McCabe [1] in 1965. He investigated the problem of maintaining a sequential file and developed two algorithms, Move-To-Front (MTF) and Transpose. From 1965 to 1985, the list update problem was studied by many researchers [2], [3], [4], [5] under the assumption that a request sequence is generated by a probability distribution. Hester and Hirschberg [6] have provided an extensive survey of average case analysis of list update algorithms. The seminal paper by Sleator and Tarjan [7] in 1985 made the competitive analysis of online algorithms very popular. The first use of randomization and the demonstration of its advantage in the competitive analysis context was done by Borodin, Linial and Saks [8] with respect to metrical task systems in 1985. Bachrach et al. provided an extensive theoretical and experimental study of online list accessing algorithms in 2002 [9]. Angelopoulos et al. [10] have shown that MTF outperforms all other list accessing algorithms for request sequences with the locality of reference property.

1.4 Our Contribution

In this paper, we have characterized the request sequences for the list accessing problem based on several factors such as size of the list, size of the request sequence, ordering of elements, and frequency of occurrence of elements in the request sequence. Our characterization and classification of request sequences is a novel method which will facilitate generation of different request sequences for modeling real world inputs for the list accessing problem. We have also made a comprehensive study of the MTF list accessing algorithm and obtained new theoretical results for our characterized special classes of request sequences.

1.5 Organization of Paper

The paper is organized as follows. Section 2 contains a description of cost models and list accessing algorithms as well as an illustration of the MTF algorithm. Section 3 contains the characterization of request sequences based on list size, request sequence size, ordering of elements, and frequency of occurrence of elements. Section 4 contains the analytical results for the MTF algorithm. Section 5 provides the concluding remarks and focuses on future research issues.

2. PRELIMINARIES

2.1 List Accessing Cost Models

A cost model defines the way in which cost is assigned to an element when it is accessed in the linear unsorted list. The two most widely used cost models for the list accessing problem are the Full Cost Model by Sleator and Tarjan and the Partial Cost Model by Ambühl. In the standard full cost model, the cost of accessing a requested element is equal to the position of that element in the list, i.e. for accessing the $i$-th element in the list, the access cost is $i$.
Immediately after an access, the accessed element can be moved any distance forward in the list without paying any cost. These exchanges cost nothing and are called free exchanges. For any other exchange between two adjacent elements in the list, the cost is 1. These exchanges are called paid exchanges. Hence the total cost in the full cost model is the sum of the number of paid exchanges and the access cost. In the partial cost model, the access cost is the number of comparisons between the accessed element and the elements present before it in the list. For accessing the $i$-th element of the list, we have to make $i-1$ comparisons. Hence the access cost in the partial cost model is $i-1$. The reorganization cost is the same as in the full cost model.

2.2 List Accessing Algorithms

There are two types of list accessing algorithms: online and offline. In online algorithms, the request sequence is partially known, i.e. we know only the current request, and future requests arrive on the fly. In offline algorithms, we know the whole request sequence in advance. Many list accessing algorithms have been developed to date, of which the primitive ones are MTF, TRANSPOSE, and FREQUENCY COUNT (FC). In MTF, after accessing an element, the element is moved to the front of the list, without changing the relative order of the other elements. In TRANSPOSE, after accessing an element of the request sequence, it is exchanged with the immediately preceding element of the list. In FREQUENCY COUNT, we maintain a frequency count for each element of the list, each initialized to zero. We increase the count of an element by one whenever it is accessed. We maintain the list so that the elements are in non-increasing order of frequency count. MTF has been proved to be 2-competitive [7] and is regarded as an optimal online algorithm for the list accessing problem. In our study, we have considered the Move-To-Front algorithm for the list accessing problem.
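The three reorganization rules and the two cost models described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `serve`, its parameters, and the tie-breaking rule for FREQUENCY COUNT (the accessed element is not moved ahead of elements with an equal count) are hypothetical choices, and paid exchanges are not modelled, so only access cost is counted.

```python
from typing import List, Sequence, Tuple


def serve(initial: Sequence[str], requests: Sequence[str],
          rule: str = "mtf", partial: bool = False) -> Tuple[int, List[str]]:
    """Serve `requests` on a copy of `initial`.

    Returns (total access cost, final list configuration).
    rule    : "mtf", "transpose" or "fc" (hypothetical labels).
    partial : False -> full cost model (cost i for the i-th element),
              True  -> partial cost model (cost i - 1).
    """
    lst = list(initial)
    counts = {x: 0 for x in lst}      # only used by Frequency Count
    total = 0
    for r in requests:
        i = lst.index(r)              # 0-based position of the request
        total += i if partial else i + 1
        if rule == "mtf":
            # Move the accessed element to the front (free exchanges).
            lst.insert(0, lst.pop(i))
        elif rule == "transpose":
            # Swap with the immediately preceding element, if any.
            if i > 0:
                lst[i - 1], lst[i] = lst[i], lst[i - 1]
        elif rule == "fc":
            # Keep the list in non-increasing order of access count.
            counts[r] += 1
            j = i
            while j > 0 and counts[lst[j - 1]] < counts[r]:
                j -= 1
            lst.insert(j, lst.pop(i))
        else:
            raise ValueError(f"unknown rule: {rule}")
    return total, lst


if __name__ == "__main__":
    for rule in ("mtf", "transpose", "fc"):
        print(rule, serve("ABCD", "CAADB", rule))
```

Keeping the rule as a parameter makes it easy to compare the algorithms on the same list and request sequence; for the list A B C D and the request sequence C A A D B, the MTF run reproduces the cost of 14 used in the illustration of Section 2.3.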
2.3 MTF Algorithm and Illustration

According to the MTF algorithm, "after accessing an element in the input list, it is moved to the front of the list, without changing the relative order of the other elements." We illustrate the MTF algorithm with the help of an example as follows. Let the list configuration be A B C D and the request sequence be C A A D B. Each time after accessing a requested element in the list, the accessed element is moved to the front of the list, thereby shifting each of the elements that preceded it one position towards the rear of the list (this is shown in Table 1). Here, the total access cost for the above input list and request sequence using the MTF algorithm is 3+2+1+4+4=14.

Table 1. Illustration of MTF algorithm

Step   Accessed Element   List Configuration (before access)   Access Cost
1      C                  A B C D                              3
2      A                  C A B D                              2
3      A                  A C B D                              1
4      D                  A C B D                              4
5      B                  D A C B                              4
6      (none)             B D A C                              0
                                                               Total = 14

3. CHARACTERISATION OF REQUEST SEQUENCES

For the characterization of request sequences we have considered the following parameters.
(i) Size of the list, $l$
(ii) Size of the request sequence, $n$
(iii) A permutation representing the order of the elements in the list
(iv) Frequency of occurrence of the elements of the list in the request sequence

Fig 1: Classification of Request Sequences

We have classified the request sequences as shown in Fig. 1. Our characterization of request sequences is based on the size of the request sequence and the ordering of elements in the list with reference to the request sequence. Based on the comparison of the size of the request sequence $n$ with the size of the list $l$, we can classify the request sequences into two groups. In Group 1, the size of the request sequence is the same as the size of the list. In Group 2, the size of the request sequence is greater than the size of the list.

3.1. Characterization of Group 1

When the size of the request sequence is the same as the size of the list ($n = l$), we classify the request sequence based on the occurrence of elements in the list and request sequence into two different types: Class A and Class B.

In Class A, all the elements of the list must be present in the request sequence. Class A request sequences can be characterized as follows.
Type I: The request sequence is exactly the same as the list.
Type II: The request sequence is in the reverse order of the list.
Type III: The request sequence is a permutation of the list in arbitrary order (other than Type I and Type II).

In Class B, all elements of the list may not be present in the request sequence. Class B request sequences can be characterized as follows.
Type IV: The request sequence consists of any single element of the list, at position $p$, repeated $n$ times, where $1 \le p \le n$.
Type V: The request sequence consists of more than one element, each repeated at least once.

3.2. Characterization of Group 2

When the size of the request sequence is greater than the size of the list ($n > l$), we again classify the request sequence based on the size of the request sequence relative to the size of the list into two different types: Class C and Class D.

In Class C, the size of the request sequence is a multiple of the size of the list ($n = ml$). Class C request sequences can be characterized as follows.
Class C(a): All elements of the list must be present in the request sequence.
Class C(b): All elements of the list may not be present in the request sequence.

Class C(a) request sequences can be characterized based on the frequency of occurrence of elements in the request sequence as follows.
Class C(a)(i): The frequency of all elements in the request sequence must be the same.
Type VI: The Type I sequence is repeated $m$ times.
Type VII: The Type II sequence is repeated $m$ times.
Class C(a)(ii): The frequency of elements in the request sequence may not be the same.

In Class D, the size of the request sequence is not a multiple of the size of the list ($n = ml + k$).

[Fig. 1, classification tree for parameters $n$, $l$: Group 1 ($n = l$) comprises Class A (Types I, II, III) and Class B (Types IV, V); Group 2 ($n > l$) comprises Class C ($n = ml$), subdivided into Class C(a) (with Class C(a)(i): Types VI, VII, and Class C(a)(ii)) and Class C(b), and Class D ($n = ml + k$).]

4. RESULTS FOR MTF ALGORITHM

4.1 Assumptions

Let $l$ be the size of the list and $n$ be the size of the request sequence. The elements of the request sequence are considered to be distinct. We consider the Full Cost Model (FCM) and a singly linked list for our analysis of MTF. In the proofs below, $x_1, x_2, \ldots, x_n$ denote the elements of the list in their initial order and $\sigma_1, \sigma_2, \ldots, \sigma_n$ denote the elements of the request sequence.

4.2. Theoretical Results

Theorem 1: MTF attains its best-case cost for a request sequence of $n$ distinct elements whose order is the same as the order of the list. The best case cost of the MTF algorithm using FCM is denoted by $C_B(n) = \frac{n(n+1)}{2}$.

Proof: Let $L$ be a list with elements $x_1, x_2, \ldots, x_n$. Let $\sigma$ be a request sequence with elements $\sigma_1, \sigma_2, \ldots, \sigma_n$ such that $\sigma_1 = x_1, \ldots, \sigma_n = x_n$. Let $C_B(n)$ be the best case cost for serving $\sigma$ on $L$ using MTF. We claim that $C_B(n) = \frac{n(n+1)}{2}$ and prove this using induction.

Base: $n = 1$. Let there be a single element in the list, i.e. $x_1$, and a single element in the request sequence, i.e. $\sigma_1$. When $\sigma_1$ is served on $L$ using MTF, the access cost is 1. Hence, $C_B(1) = 1$ is true.

Induction step: Let $C_B(n)$ be true for $n = k$, i.e. the best case cost is $C_B(k) = \frac{k(k+1)}{2}$. Now we have to prove that $C_B(k+1) = \frac{(k+1)(k+2)}{2}$. Let the elements of the list of size $k+1$ be $x_1, x_2, \ldots, x_{k+1}$ and the elements of the request sequence be $\sigma_1, \sigma_2, \ldots, \sigma_{k+1}$ such that all the elements of the list are present in the request sequence. Let the element $x_{k+1}$ occur after $x_k$ in the list and $\sigma_{k+1}$ occur after $\sigma_k$ in the request sequence. The access cost of the first $k$ elements of the request sequence is $\frac{k(k+1)}{2}$, whereas the access cost of the $(k+1)$-th element of the request sequence in the list is $k+1$. Hence, the total cost of serving $k+1$ elements in the request sequence is

$C_B(k+1) = C_B(k) + (k+1) = \frac{k(k+1)}{2} + (k+1) = \frac{k^2 + 3k + 2}{2} = \frac{(k+1)(k+2)}{2}$.

Hence the statement is true for all $n$.

Illustration: Let the list be 1,2,3. A request sequence of equal size and distinct elements will be one of the following permutations of the list: 123, 132, 213, 231, 312, 321. The costs for the above request sequences, when processed using the MTF algorithm, are 6, 7, 7, 8, 8, 9 respectively. So, the worst case cost is found to be 9, i.e. $3^2$. Similarly, by increasing the size of the list and request sequence we can observe that the worst case cost for MTF will be $n^2$ for a request sequence of size $n$.

Theorem 2: MTF attains its worst-case cost for a request sequence of $n$ distinct elements whose order is the reverse of the order of the list. The worst case cost of the MTF algorithm using FCM is denoted by $C_W(n) = n^2$.

Proof: Let $L$ be a list with elements $x_1, x_2, \ldots, x_n$. Let $\sigma$ be a request sequence with elements $\sigma_1, \sigma_2, \ldots, \sigma_n$ such that $\sigma_1 = x_n, \ldots, \sigma_n = x_1$. Let $C_W(n)$ be the worst case cost for serving $\sigma$ on $L$ using MTF. We claim that $C_W(n) = n^2$ and prove this using induction.

Base: $n = 1$. Let there be a single element in the list, i.e. $x_1$, and a single element in the request sequence, i.e. $\sigma_1$. When $\sigma_1$ is served on $L$ using MTF, the access cost is 1. Hence, $C_W(1) = 1$ is true.

Induction step: Let $C_W(n)$ be true for $n = k$, i.e. the worst case cost is $C_W(k) = k^2$. Now we have to prove that $C_W(k+1) = (k+1)^2$. Let the elements of the list of size $k+1$ be $x_1, x_2, \ldots, x_{k+1}$ and the elements of the request sequence be $\sigma_1, \sigma_2, \ldots, \sigma_{k+1}$ such that $\sigma_1 = x_{k+1}$, $\sigma_2 = x_k$, ..., $\sigma_{k+1} = x_1$. Let the element $x_{k+1}$ occur after $x_k$ in the list and occur before $x_k$ in the request sequence. When $\sigma_1$ is served, the access cost of $x_{k+1}$ is $k+1$. Then, according to the MTF rule, $x_{k+1}$ is moved to the front of the list. Now the list configuration becomes $x_{k+1}, x_1, x_2, \ldots, x_k$. The remaining request sequence left to be served is $\sigma_2, \sigma_3, \ldots, \sigma_{k+1}$. After serving the first request and moving the accessed element to the front of the list, the access cost of each subsequent element in the list is increased by 1. Hence, the total cost of serving the next $k$ elements, i.e. from $\sigma_2$ to $\sigma_{k+1}$, is $k^2 + k$. Therefore, the total cost of serving $k+1$ elements in the request sequence is

$C_W(k+1) = (k+1) + (k^2 + k) = k^2 + 2k + 1 = (k+1)^2$.

Corollary 1: Let $C_{MTF}(\text{Type-III})$ denote the total access cost incurred by the MTF algorithm for a Type III request sequence. Then $\frac{n(n+1)}{2} < C_{MTF}(\text{Type-III}) < n^2$.

Illustration: Let the list be 1,2,3. A request sequence with repetition of the 2nd element 4 times will be 2,2,2,2. So, let $p = 2$ and $n = 4$. Then the cost for the above sequence when processed using the MTF algorithm is 5, i.e. $n + p - 1 = 4 + 2 - 1 = 5$.

Theorem 3: For a Type IV request sequence of size $n$, the cost of MTF is $n + p - 1$, i.e. it ranges from $n$ to $2n - 1$.

Proof: Let $L$ be a list with elements $x_1, x_2, \ldots, x_n$. Let $\sigma$ be a request sequence with elements $\sigma_1, \sigma_2, \ldots, \sigma_n$ such that $\sigma_1 = \sigma_2 = \cdots = \sigma_n = x_p$, where $x_p$ may have any position $p$ in the list. Let $C(n)$ be the access cost for serving $\sigma$ on $L$ using MTF. The access cost will be $n + p - 1$, where $n$ is the number of elements of the request sequence. This will be proved using induction.

Base: $n = 1$. Let there be a single element in the list, i.e. $x_1$, and a single element in the request sequence, i.e. $\sigma_1$. When $\sigma_1$ is served on $L$ using MTF, the access cost is 1. Hence, $C(1) = 1 + 1 - 1 = 1$ is true.

Induction step: Let $C(n)$ be true for $n = k$, i.e. the access cost is $C(k) = k + p - 1$. Now we have to prove that $C(k+1) = k + p$. The access cost of the first request for the $p$-th element of the list is $p$. After the element is accessed for the first time, i.e. for $\sigma_1$, it is moved to the front of the list. So for the next requests $\sigma_2, \sigma_3, \ldots, \sigma_k$, the access cost is 1 for each element of the request sequence. So the access cost for element $\sigma_{k+1}$ will be 1. Hence the total cost for serving $k+1$ elements in the request sequence is $(k + p - 1) + 1 = k + p$. Hence it is true for all $n$.

Corollary 2: For a Type IV request sequence of size $n$, the best case cost for the MTF algorithm is $n$ (for $p = 1$) and the worst case cost is $2n - 1$ (for $p = n$).

Illustration: Let the list be 1,2,3. A request sequence of equal size and distinct elements will be one of the following permutations of the list: 123, 132, 213, 231, 312, 321. The costs for the above request sequences, when processed using the MTF algorithm, are 6, 7, 7, 8, 8, 9 respectively. For the request sequence 213, let every element be repeated twice, forming the new request sequence 221133. Then the cost for this request sequence can be derived as 7+3(2-1)=10, where 7 is the cost of the original request sequence, 3 is the number of elements in the original request sequence, and 2 is the number of times each element of the original request sequence is repeated.

Theorem 4: Let the access cost of a request sequence of distinct elements be represented by $C$, for a list with the same number of elements as the request sequence. Then for a new request sequence, of any order, in which each element of the request sequence is repeated $m$ times in succession, the total cost of the MTF algorithm for processing the request sequence can be evaluated using the following formula:

$C_{MTF} = C + n(m-1)$

where $n$ is the size of the list and $m$ is the number of times each element of the request sequence is repeated.

Proof: Let $L$ be a list with elements $x_1, x_2, \ldots, x_n$. Let $\sigma$ be a request sequence with elements $\sigma_1, \sigma_2, \ldots, \sigma_n$ such that the request sequence consists of all the elements of the list. Let $C$ be the access cost for the request sequence $\sigma_1, \sigma_2, \ldots, \sigma_n$, and let $c_i$ be the access cost of $\sigma_i$ in it, so that $C = \sum_{i=1}^{n} c_i$. Let each element of the request sequence be repeated $m$ times; then the access cost for the request sequence with repetition of elements is $C + n(m-1)$, where $n$ is the size of the original request sequence. We will prove this using induction.

Base: $n = 1$. Let there be a single element in the list, i.e. $x_1$, and a single element in the request sequence, i.e. $\sigma_1$. When $\sigma_1$ is served on $L$ using MTF, the access cost is 1 for each of the $m$ requests, giving $m = 1 + 1 \cdot (m-1)$. Hence, the statement is true for $n = 1$.

Induction step: Let the statement be true for $n = s$, i.e. the cost is $C_s + s(m-1)$, where $C_s = \sum_{i=1}^{s} c_i$. Now we have to prove that the cost for $n = s+1$ is $C_{s+1} + (s+1)(m-1)$, where $m$ is fixed. Let the elements of the list of size $s+1$ be $x_1, x_2, \ldots, x_{s+1}$ and the elements of the request sequence be $\sigma_1, \sigma_2, \ldots, \sigma_{s+1}$ such that the request sequence consists of all the elements of the list. So for $s+1$ with $m$ repetitions the request sequence will be $\sigma_1, \sigma_1, \ldots, \sigma_{s+1}, \sigma_{s+1}$. The total access cost is obtained as follows: up to $\sigma_s$ the cost is $C_s + s(m-1)$; the access cost of the first occurrence of $\sigma_{s+1}$ is $c_{s+1}$, because repeated accesses to an element that is already at the front do not change the list; and each of the $m-1$ subsequent accesses to $\sigma_{s+1}$ costs 1. The cost up to $\sigma_s$ can be represented as

$C_s + s(m-1)$ ---------------- eqn (1)

Then the cost up to $\sigma_{s+1}$ can be represented as

$C_s + s(m-1) + c_{s+1} + (m-1) = (C_s + c_{s+1}) + (s+1)(m-1) = C_{s+1} + (s+1)(m-1)$.

Hence, the statement is true for $s+1$. Now, we have to prove that the statement is true for all $m$. From eqn (1) the statement is true for $m$ repetitions, i.e. the cost is $C + n(m-1)$. For $m+1$ repetitions, each of the $n$ elements is accessed once more while it is at the front of the list, at cost 1 each, so the cost is $C + n(m-1) + n = C + n\big((m+1)-1\big)$. Hence, the statement is true for $m+1$. So, the statement is true for all $n$ and $m$.

Illustration: Let the list be 1,2,3. A request sequence of equal size and distinct elements will be one of the following permutations of the list: 123, 132, 213, 231, 312, 321. The costs for the above request sequences, when processed using the MTF algorithm, are 6, 7, 7, 8, 8, 9 respectively. For the request sequence 213, let the elements be repeated 2, 3 and 4 times respectively, forming the new request sequence 221113333. Then the cost for this request sequence can be derived as 7+(2-1)+(3-1)+(4-1)=13, where 7 is the cost of the original request sequence and 2, 3 and 4 are the numbers of times the elements of the original request sequence are repeated.

Theorem 5: Let the access cost of a request sequence of distinct elements be represented by $C$, for a list with the same number of elements as the request sequence. Then for a new request sequence, of any order, in which the elements of the request sequence are repeated $m_1, m_2, \ldots, m_n$ times in succession respectively, the total cost of MTF for processing the request sequence can be evaluated using the following formula:

$C_{MTF} = C + \sum_{i=1}^{n} (m_i - 1)$

where $n$ is the size of the list and $m_i$ is the number of times the $i$-th element of the request sequence is repeated.

Proof: Let $L$ be a list with elements $x_1, x_2, \ldots, x_n$. Let $\sigma$ be a request sequence with elements $\sigma_1, \sigma_2, \ldots, \sigma_n$ such that the request sequence consists of all the elements of the list, and let $c_i$ be the access cost of $\sigma_i$ in it, so that $C = \sum_{i=1}^{n} c_i$. Let the elements of the request sequence be repeated differently, i.e. let $\sigma_1, \sigma_2, \ldots, \sigma_n$ be repeated $m_1, m_2, \ldots, m_n$ times respectively. Then the access cost for the new request sequence with repetition of elements is $C + \sum_{i=1}^{n} (m_i - 1)$. We will prove this using induction.

Base: $n = 1$. Let there be a single element in the list, i.e. $x_1$, and a single element in the request sequence, i.e. $\sigma_1$. When $\sigma_1$ is served on $L$ using MTF, the access cost is 1 for each of the $m_1$ requests, giving $m_1 = 1 + (m_1 - 1)$. Hence, the statement is true for $n = 1$.

Induction step: Let the statement be true for $n = s$, i.e. the cost is $C_s + \sum_{i=1}^{s} (m_i - 1)$, where $C_s = \sum_{i=1}^{s} c_i$. Now we have to prove that the cost for $n = s+1$ is $C_{s+1} + \sum_{i=1}^{s+1} (m_i - 1)$, where $m_1, m_2, \ldots, m_{s+1}$ are fixed. Let the elements of the list of size $s+1$ be $x_1, x_2, \ldots, x_{s+1}$ and the elements of the request sequence be $\sigma_1, \sigma_2, \ldots, \sigma_{s+1}$ such that the request sequence consists of all the elements of the list. So for $s+1$ with $m_1, m_2, \ldots, m_{s+1}$ repetitions respectively, the request sequence consists of $m_1$ copies of $\sigma_1$, followed by $m_2$ copies of $\sigma_2$, and so on up to $m_{s+1}$ copies of $\sigma_{s+1}$. The total access cost is obtained as follows: up to $\sigma_s$ the cost is $C_s + \sum_{i=1}^{s} (m_i - 1)$; the access cost of the first occurrence of $\sigma_{s+1}$ is $c_{s+1}$; and each of the $m_{s+1} - 1$ subsequent accesses costs 1. The cost up to $\sigma_s$ can be represented as

$C_s + \sum_{i=1}^{s} (m_i - 1)$ ---------------- eqn (1)

Let the statement be true for $m_1, m_2, \ldots, m_s$. Then the cost up to $\sigma_{s+1}$ can be represented as

$C_s + \sum_{i=1}^{s} (m_i - 1) + c_{s+1} + (m_{s+1} - 1) = C_{s+1} + \sum_{i=1}^{s+1} (m_i - 1)$.

Hence, the statement is true for $s+1$ and $m_1, m_2, \ldots, m_{s+1}$. Now, we have to prove that the statement is true for all $n$ with $m_1, m_2, \ldots, m_n$ repetitions respectively. The access cost for $\sigma$ with these repetitions is

$\sum_{i=1}^{n} \big(c_i + (m_i - 1)\big) = \sum_{i=1}^{n} c_i + \sum_{i=1}^{n} (m_i - 1) = C + \sum_{i=1}^{n} (m_i - 1)$.

So, the statement is true for all $n$ and all $m_1, m_2, \ldots, m_n$.

5. CONCLUDING REMARKS

Our characterization and classification of request sequences is a novel method which will facilitate generation of different request sequences for modeling real world inputs for the list accessing problem. Further characterization of request sequences can be done based on locality of reference and the look-ahead property of the input. This characterization can be used as an important tool for making comparative performance analysis of various list accessing algorithms.
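As a small illustration of how the characterized classes can drive an empirical check, the following Python sketch generates Type I, II, III and IV request sequences, together with the repeated sequences of Theorems 4 and 5, and compares the simulated MTF cost under the full cost model with the closed forms of Section 4: n(n+1)/2, n^2, n + p - 1, C + n(m - 1) and C + sum(m_i - 1). The helper names `mtf_cost`, `repeat_each` and `check`, and the particular repetition counts used, are hypothetical choices made for this sketch; it is a brute-force check for small n, not a proof.

```python
from itertools import permutations
from typing import List, Sequence


def mtf_cost(initial: Sequence[int], requests: Sequence[int]) -> int:
    """Total MTF access cost under the full cost model (no paid exchanges)."""
    lst = list(initial)
    total = 0
    for r in requests:
        i = lst.index(r)            # 0-based position
        total += i + 1              # cost = 1-based position
        lst.insert(0, lst.pop(i))   # move accessed element to the front
    return total


def repeat_each(seq: Sequence[int], reps: Sequence[int]) -> List[int]:
    """Repeat seq[i] reps[i] times in succession (e.g. 213 -> 221113333)."""
    out: List[int] = []
    for x, m in zip(seq, reps):
        out.extend([x] * m)
    return out


def check(n: int) -> bool:
    """Compare simulated costs with the closed forms for a list 1..n."""
    lst = list(range(1, n + 1))
    same, rev = tuple(lst), tuple(reversed(lst))
    costs = {p: mtf_cost(lst, p) for p in permutations(lst)}
    ok = costs[same] == n * (n + 1) // 2                        # Theorem 1
    ok &= costs[rev] == n * n                                   # Theorem 2
    ok &= all(n * (n + 1) // 2 < c < n * n                      # Corollary 1
              for p, c in costs.items() if p not in (same, rev))
    ok &= all(mtf_cost(lst, [x] * n) == n + p - 1               # Theorem 3
              for p, x in enumerate(lst, start=1))
    reps = [(i % 3) + 1 for i in range(n)]   # arbitrary unequal counts
    for p, c in costs.items():
        ok &= mtf_cost(lst, repeat_each(p, [2] * n)) == c + n * (2 - 1)  # Thm 4
        ok &= (mtf_cost(lst, repeat_each(p, reps))
               == c + sum(m - 1 for m in reps))                 # Theorem 5
    return ok


if __name__ == "__main__":
    print(all(check(n) for n in range(1, 6)))
```

Enumerating all permutations keeps the check exhaustive but limits it to small lists; for larger lists a random sample of Type III sequences would be the natural substitute.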
New improved list accessing algorithms can be designed in future for a specific class of request sequences. Each characterization corresponds to a specific real life application of the list accessing problem. New cost models can be developed based on the characterization of request sequences. Based on our characterization, the best list accessing algorithm can be determined for different inputs. This characterization will help us in developing new alternative performance metrics for list accessing algorithms. A new experimental setup can be designed which will cover a wide range of request sequences for measuring the performance of list accessing algorithms.

6. ACKNOWLEDGMENTS

Our special thanks to Dr. N. S. Narayanaswamy of the Department of Computer Science and Engineering, Indian Institute of Technology, Madras for his initial motivation and support.

7. REFERENCES

[1] J. McCabe, "On serial files with relocatable records," Operations Research, vol. 12, pp. 609-618, 1965.
[2] J. L. Bentley and C. C. McGeoch, "Amortized analysis of self organizing sequential search heuristics," CACM, vol. 28, pp. 404-411, 1985.
[3] G. H. Gonnet, J. I. Munro, and H. Suwanda, "Towards self-organizing linear search," IEEE, pp. 169-174, 1979.
[4] R. Rivest, "On self organizing sequential search heuristics," Communications of the ACM, vol. 19, pp. 63-67, 1976.
[5] G. H. Gonnet, J. I. Munro, and H. Suwanda, "Towards self organising linear search," SIAM Journal of Computing, vol. 10, no. 3, pp. 613-637, 1981.
[6] J. H. Hester and D. S. Hirschberg, "Self-organizing linear search," vol. 17, pp. 295-312, 1985.
[7] D. D. Sleator and R. E. Tarjan, "Amortized efficiency of list update and paging rules," Commun. ACM, vol. 28, no. 2, pp. 202-208, 1985.
[8] A. Borodin, N. Linial, and M. Saks, "An optimal online algorithm for metrical task systems," J. ACM, vol. 52, pp. 46-52, 1985.
[9] R. Bachrach, R. El-Yaniv, and M. Reinstadtler, "On the competitive theory and practice of online list accessing algorithms," Algorithmica, vol. 32, no. 2, pp. 201-245, 2002.
[10] S. Angelopoulos, R. Dorrigiv, and A. López-Ortiz, "List update with locality of reference: MTF outperforms all other algorithms," TR CS-2006-46, School of Computer Science, University of Waterloo, November 2006.
