Generalized Compression Dictionary Distance as Universal Similarity Measure
Authors: Andrey Bogomolov, Bruno Lepri, Fabio Pianesi
GENERALIZED COMPRESSION DICTIONARY DISTANCE AS UNIVERSAL SIMILARITY MEASURE

Andrey Bogomolov* — SKIL Telecom Italia Lab, University of Trento, Via Sommarive, 5, I-38123, Trento, Italy. E-mail: andrey.bogomolov@unitn.it
Bruno Lepri, Fabio Pianesi† — Fondazione Bruno Kessler, Via Sommarive, 18, I-38123, Trento, Italy. E-mail: {lepri, pianesi}@fbk.eu

ABSTRACT

We present a new similarity measure based on information-theoretic measures that is superior to the Normalized Compression Distance for clustering problems and inherits the useful properties of conditional Kolmogorov complexity. We show that Normalized Compression Dictionary Size and Normalized Compression Dictionary Entropy are computationally more efficient, since the need to perform the compression itself is eliminated. They also scale linearly with exponential vector-size growth and are content independent. We show that the normalized compression dictionary distance is compressor independent, if limited to lossless compressors, which leaves room for optimizations and implementation speed improvements for real-time and big-data applications. The introduced measure is applicable to machine learning tasks such as parameter-free unsupervised clustering, supervised learning (classification and regression), and feature selection, and to big-data problems with an order-of-magnitude speed increase.

Index Terms — dissimilarity, distance function, normalized compression distance, time-series clustering, parameter-free data mining, heterogeneous data analysis, Kolmogorov complexity, information theory, machine learning, big data

1. INTRODUCTION

A similarity measure between objects is fundamental for machine learning tasks. Most similarity measures require prior assumptions on the statistical model and/or parameter limits.
For most applications in computational social science, economics, finance, and human dynamics analysis, this implies a certain risk of being biased or of failing to account for fundamental properties of a partially observed source signal [1].

* Thanks to Telecom Italia for funding.
† Performed the work while at Fondazione Bruno Kessler.

For more technical applications, such as digital signal processing, telecommunications, and remote sensing, even when the signal can be observed and modelled, we face the problems of noise, feature representation, and algorithmic efficiency.

We easily accept the following outcome of 20 trials of a fair coin toss, "01111011001101110001", but we do not accept the result "00000000000000000000". However, both results have equal probability given that the fair-coin model assumption holds. This is a common example of a paradox in probability theory; our reaction is caused by the belief that the first sequence is complicated while the second is simple [2].

A second example of human-inspired limitations is the "Green Lumber Fallacy" introduced by Nassim Nicholas Taleb. It is the fallacy of "mistaking the source of important or even necessary knowledge, for another less visible from the outside, less tractable one". Mathematically, it can be expressed as using an incorrect function which, by some chance, returns the correct output, such that g(x) is confused with f(x). The root of the fallacy is that "although people may be focusing on the right things, due to complexity of the thing, [they] are not good enough to figure it out intellectually" [1].

Despite the psychological limitations and fallacies with which we humans reason and develop models, over the last few decades a number of robust methods have been developed to enhance model generalization and resistance to noise, such as filtering, cross-validation, boosting, bootstrapping, bagging, and random forests [3].
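The intuition behind the coin-toss example can be made concrete with any off-the-shelf lossless compressor: the compressed size of a string is a crude, computable upper bound on its Kolmogorov complexity. A minimal sketch in Python using zlib (an illustrative choice of ours, not part of the paper, whose prototype is in Java):

```python
import zlib

# The two 20-trial coin-toss outcomes discussed in the text.
random_looking = b"01111011001101110001"
all_zeros = b"00000000000000000000"

def compressed_size(s: bytes) -> int:
    """Length of the zlib-compressed representation of s: a crude,
    computable upper bound on its Kolmogorov complexity."""
    return len(zlib.compress(s, 9))

# The all-zeros sequence has a short description ("repeat '0' twenty
# times"), so it compresses better than the irregular sequence.
print(compressed_size(random_looking))
print(compressed_size(all_zeros))
```

Both strings are equally probable under the fair-coin model; only their descriptive complexity differs, which is exactly what the compressor detects.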
The most promising approach to the challenging paradigm of approaching antifragility uses Kolmogorov complexity theory [4] and the concepts of Computational Irreducibility and the Principle of Computational Equivalence introduced by Stephen Wolfram [5]. Unfortunately, Kolmogorov complexity is uncomputable in the general case. For practical applications we have to implement algorithms which run on computers having Turing machine properties.

2. METHODOLOGY

To define a similarity measure that uses Kolmogorov complexity, researchers introduced the Information Distance, defined for strings x and y as the length of the shortest program p that computes x from y and vice versa. The Information Distance is absolute; to obtain a similarity metric, the Normalized Information Distance (NID) was introduced:

NID(x, y) = max{K(x|y), K(y|x)} / max{K(x), K(y)}    (1)

Unfortunately, the Normalized Information Distance is also uncomputable in the general case, as it depends on the uncomputable Kolmogorov complexity measure [6]. To approximate NID in a practical environment, the Normalized Compression Distance (NCD) was developed, based on a real-world lossless abstract compressor C [7]:

NCD(x, y) = (C(xy) − min{C(x), C(y)}) / max{C(x), C(y)}    (2)

Daniele Cerra and Mihai Datcu introduced another approximation metric, the Fast Compression Distance (FCD), which is applicable to medium-to-large datasets:

FCD(x, y) = (|D(x)| − ∩(D(x), D(y))) / |D(x)|,    (3)

where |D(x)| and |D(y)| are the sizes of the relative dictionaries, represented by the number of entries they contain, and ∩(D(x), D(y)) is the number of patterns found in both dictionaries.
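Equation (2) can be sketched directly by substituting any concrete lossless compressor for the abstract compressor C; here zlib serves as an illustrative stand-in (the compressor choice and function names are ours, not the paper's):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance (Eq. 2), with zlib standing in
    for the abstract lossless compressor C."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)

# Similar inputs share structure, so the concatenation compresses well
# and the NCD is small; unrelated inputs yield a value closer to 1.
a = b"the quick brown fox jumps over the lazy dog" * 10
b = b"the quick brown fox jumps over the lazy cat" * 10
c = bytes(range(256)) * 2
print(ncd(a, b))  # small: a and b differ in one word
print(ncd(a, c))  # near 1: no shared structure
```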
FCD accounts for the number of patterns found in both dictionaries extracted during compression by the Lempel-Ziv-Welch algorithm, and reduces the computational effort by computing the intersection between dictionaries, which represents the joint compression step performed in NCD [8].

We found that the number of patterns present in both dictionaries depends on the compression algorithm used. Also, the intersection of the dictionaries of x and y could be coded with different symbols, as the frequencies of strings may differ across x, y, and ∩(x, y), which leads to a less accurate approximation. The size of a compression dictionary does not account for the symbol-frequency properties of the dictionary or for the size of a possible algorithmic "description of the string in some fixed universal description language", which is the essence of Kolmogorov complexity. This means that we lose a lot of information about x and y if we compute only the size of a compression dictionary.

To overcome this problem we introduce the Generalized Compression Dictionary Distance (GCDD) metric, defined as:

GCDD(x, y) = (Φ(x·y) − min{Φ(x), Φ(y)}) / max{Φ(x), Φ(y)},    (4)

where Φ(x·y) is a functional characteristic of the compression dictionary extracted from the concatenation of the x and y byte arrays. GCDD returns an n-dimensional vector which characterizes the conditional similarity between x and y; each dimension of GCDD represents a real-valued function.

The algorithmic complexity of the proposed solution is proportional to:

O_GCDD(x,y) → k · m_x log m_y,    (5)

where m_x and m_y are the dictionary sizes of x and y, and k is a constant dependent on the dimensionality of the resulting vector.
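To make Eq. (4) concrete, the following sketch builds an LZW phrase dictionary without producing the compressed output, and illustratively takes Φ as the pair (dictionary size, dictionary entropy), echoing the Normalized Compression Dictionary Size and Entropy of the abstract. The exact Φ used by the authors may differ, and all names here are ours:

```python
import math
from collections import Counter

def lzw_dictionary(data: bytes) -> Counter:
    """Phrases emitted by LZW while scanning `data`, with usage counts.
    Only the dictionary is built; no compressed stream is produced."""
    table = {bytes([i]): i for i in range(256)}
    phrases = Counter()
    w = b""
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc
        else:
            phrases[w] += 1      # emit current phrase
            table[wc] = len(table)
            w = bytes([byte])
    if w:
        phrases[w] += 1
    return phrases

def phi(data: bytes) -> tuple:
    """Illustrative Phi: (dictionary size, phrase-frequency entropy in bits)."""
    d = lzw_dictionary(data)
    total = sum(d.values())
    entropy = -sum(c / total * math.log2(c / total) for c in d.values())
    return (len(d), entropy)

def gcdd(x: bytes, y: bytes) -> tuple:
    """Eq. (4), applied componentwise to each dimension of Phi."""
    fx, fy, fxy = phi(x), phi(y), phi(x + y)
    return tuple((cxy - min(cx, cy)) / max(cx, cy)
                 for cx, cy, cxy in zip(fx, fy, fxy))
```

A periodic string compared with itself yields a much smaller dictionary-size component than the same string compared with an irregular one, since concatenating similar inputs adds few new phrases.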
In comparison, the algorithmic complexity of the other measures is:

O_FCD(x,y) → m_x log m_y,    (6)

O_NCD(x,y) → (n_x + n_y) log(m_x + m_y),    (7)

which shows an asymptotically small increase in computational time for GCDD, while preserving the informational gain obtained through the transfer of additional characteristics of x and y.

3. EXPERIMENTAL RESULTS AND DISCUSSION

A prototype of the Generalized Compression Dictionary Distance was implemented on the Java Platform, Standard Edition 8, for double-precision 64-bit input vectors, using Huffman coding and Lempel-Ziv-Welch compression for the byte-array case with a binary-to-string encoding approach.

The experiments were run on the "Synthetic Control Chart Time Series" dataset, a well-known dataset published in the UCI Machine Learning Repository [9], which contains control charts synthetically generated by Alcock and Manolopoulos [10] for six different classes of time series: normal, cyclic, increasing trend, decreasing trend, upward shift, and downward shift.

Experimental results show that Normalized Compression Dictionary Size and Normalized Compression Dictionary Entropy, as instances of GCDD, give more stable and accurate results for the time-series clustering problem, when tested on heterogeneous input vectors, than NCD and other traditional distance (i.e. dissimilarity) measures such as the Euclidean distance.

The experimental results shown in Fig. 1 are produced from the above-mentioned collection of I time-series vectors, on which four distance functions are defined (GCDD, NCD, L2 norm, Pearson correlation). Applying a distance function to each pair of vectors, the dissimilarity matrix is constructed, such that:

Δ := [ δ_{1,1}  δ_{1,2}  · · ·  δ_{1,I} ;  δ_{2,1}  δ_{2,2}  · · ·  δ_{2,I} ;  . . . ;  δ_{I,1}  δ_{I,2}  · · ·  δ_{I,I} ]    (8)

Then multidimensional scaling is performed: given the dissimilarity matrix Δ, we find I vectors x_1, . . . , x_I ∈ R^N such that ‖x_i − x_j‖ ≈ δ_{i,j} for all i, j ∈ 1, . . .
. , I, where ‖ · ‖ is a vector norm.

On the plots we use the following symbols to encode the trend types of the time-series vectors: N - normal, C - cyclic, IT - increasing trend, DT - decreasing trend, US - upward shift, and DS - downward shift.

From Fig. 1 we see that the GCDD-based distance metric efficiently groups the time series on a hyperplane, thus increasing separation ability. It has properties similar to NCD, and much better ones than the L2-norm and Pearson-correlation-based metrics, where the time-series vectors are mixed.

We then ran experiments with computationally intensive state-of-the-art methods for time-series clustering: (1) an autocorrelation-based method, (2) a Linear Predictive Coding based method as proposed by Kalpakis, 2001 [11], (3) an Adaptive Dissimilarity Index based method [12], and (4) an ARIMA-based method (Piccolo, 1990) [13].

From Fig. 2 we see that the numerically intensive methods do not much enhance the separation ability achieved with the GCDD-based distance metric. These distance methods also require much more computation time and are not applicable to big-data problems.

Furthermore, the result proposed in this paper could be used for unsupervised clustering, supervised classification, and feature representation for deep learning tasks, given the useful properties of GCDD: (1) it scales linearly with exponential growth of the input vector size, and (2) it is content independent, as the semantics is coded inside the extracted dictionary itself.

Future research steps include testing the concept on diverse data sets, including image-processing data, and using the GCDD output for different machine learning tasks.

4. REFERENCES

[1] Nassim Nicholas Taleb, Antifragile: Things that Gain from Disorder, Allen Lane, London, UK, 2012.

[2] Andrei A. Muchnik, Alexei L. Semenov, and Vladimir A. Uspensky, "Mathematical metaphysics of randomness," Theoretical Computer Science, vol. 207, no. 2, pp. 263-317, 1998.
[3] Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning, Springer Series in Statistics, Springer New York Inc., New York, NY, USA, 2001.

[4] A.N. Kolmogorov, "On tables of random numbers," Theoretical Computer Science, vol. 207, no. 2, pp. 387-395, 1998.

[5] Stephen Wolfram, A New Kind of Science, Wolfram Media, 2002.

[6] R. Cilibrasi and P.M.B. Vitanyi, "Clustering by compression," IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1523-1545, April 2005.

[7] Ming Li, Xin Chen, Xin Li, Bin Ma, and P.M.B. Vitanyi, "The similarity metric," IEEE Transactions on Information Theory, vol. 50, no. 12, pp. 3250-3264, Dec. 2004.

[8] Daniele Cerra and Mihai Datcu, "A fast compression-based similarity measure with applications to content-based image retrieval," J. Vis. Commun. Image Represent., vol. 23, no. 2, pp. 293-302, Feb. 2012.

[9] A. Asuncion and D.J. Newman, "UCI machine learning repository," 2007.

[10] R.J. Alcock and Y. Manolopoulos, "Time-series similarity queries employing a feature-based approach," in 7th Hellenic Conference on Informatics, Ioannina, 1999, pp. 27-29.

[11] K. Kalpakis, D. Gada, and V. Puttagunta, "Distance measures for effective clustering of ARIMA time-series," in Proceedings of the IEEE International Conference on Data Mining (ICDM 2001), 2001, pp. 273-280.

[12] Ahlame Douzal Chouakria and Panduranga Naidu Nagabhushan, "Adaptive dissimilarity index for measuring time series proximity," Advances in Data Analysis and Classification, vol. 1, no. 1, pp. 5-21, 2007.

[13] Domenico Piccolo, "A distance measure for classifying ARIMA models," Journal of Time Series Analysis, vol. 11, no. 2, pp. 153-164, 1990.
[Fig. 1. Fast Methods Comparison — multidimensional-scaling projections of the dissimilarity matrices based on GCDD, NCD, the L2 norm, and Pearson correlation; points are labelled N, C, IT, DT, US, DS.]

[Fig. 2. Numerically Intensive Methods Comparison — multidimensional-scaling projections of the dissimilarity matrices based on autocorrelation, Linear Predictive Coding/ARIMA, the Adaptive Dissimilarity Index, and the ARIMA-based measure of Piccolo (1990); points are labelled N, C, IT, DT, US, DS.]