The Coron System

November 28, 2011


📝 Original Info

Title: The Coron System
ArXiv ID: 1111.5690
Date: 2011-11-28
Authors: Mehdi Kaytoue (INRIA Lorraine - LORIA), Florent Marcuola (INRIA Lorraine - LORIA), Amedeo Napoli (INRIA Lorraine - LORIA), Laszlo Szathmary (INRIA Lorraine - LORIA), Jean Villerd (INRIA Lorraine - LORIA)

📝 Abstract

Coron is a domain and platform independent, multi-purposed data mining toolkit, which incorporates not only a rich collection of data mining algorithms, but also allows a number of auxiliary operations. To the best of our knowledge, a data mining toolkit designed specifically for itemset extraction and association rule generation like Coron does not exist elsewhere. Coron also provides support for preparing and filtering data, and for interpreting the extracted units of knowledge.

📄 Full Content

The Coron System

Mehdi Kaytoue (1), Florent Marcuola (1), Amedeo Napoli (1), Laszlo Szathmary (2), and Jean Villerd (1)

(1) Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Campus Scientifique - BP 239 - 54506 Vandœuvre-lès-Nancy Cedex (France), {kaytouem, marcuolf, napoli, villerd}@loria.fr
(2) Département d'Informatique - Université du Québec à Montréal (UQAM), C.P. 8888 - Succ. Centre-Ville, Montréal H3C 3P8 (Canada), Szathmary.L@gmail.com

Abstract. Coron is a domain and platform independent, multi-purposed data mining toolkit, which incorporates not only a rich collection of data mining algorithms, but also allows a number of auxiliary operations. To the best of our knowledge, a data mining toolkit designed specifically for itemset extraction and association rule generation like Coron does not exist elsewhere. Coron also provides support for preparing and filtering data, and for interpreting the extracted units of knowledge.

Key words: knowledge discovery, data mining, itemset extraction, association rules generation, rare item problem

1 System Overview

Born from a particular need in a cohort study [1], Coron is now a framework for knowledge discovery in databases in its own right, used in several application domains, e.g. [4-6]. Intended for educational and scientific use, the Coron system is organized into several modules for preparing and mining binary data, and for filtering and interpreting the extracted units. Thus, from binary data (possibly obtained from a discretization procedure), Coron allows one to extract itemsets (frequent, closed, generators, etc.) and then to generate association rules (non-redundant, informative, etc.). Building concept lattices is also possible. The system includes many classical algorithms from the literature, as well as others that are specific to Coron [9-11]. The software is freely available at http://coron.loria.fr.
Mainly written in Java, Coron is compatible with the Unix, Mac and Windows operating systems and is used from the command line.

2 A Global Data Mining Methodology

The methodology was initially designed for mining biological cohorts, but it generalizes to any kind of database. It is important to note that the whole process is guided by an expert, a specialist of the domain to which the database relates. The expert's role may be crucial, especially for selecting the data and for interpreting the extracted units, in order to fully turn them into knowledge units. In our case, the extracted knowledge units are mainly association rules. At present, finding association rules is one of the most important tasks in data mining. Association rules allow one to reveal "hidden" relationships in a dataset. Finding association rules first requires the extraction of frequent itemsets.

arXiv:1111.5690v1 [cs.DB] 24 Nov 2011

The methodology consists of the following steps:
- Definition of the study framework;
- Iterative step: data preparation and cleaning, pre-processing step, processing step, post-processing step;
- Validation of the results and generation of new research hypotheses;
- Feedback on the experiment.

The life-cycle of the methodology is shown in Figure 1. Coron is designed to support this methodology and offers all the tools necessary for its application in a single platform.

Pre-processing. These modules provide several tools for manipulating and formatting large data. The data are described by binary tables in a simple text-file format: individuals, in rows, either possess or lack properties, in columns. The main possible operations are: (i) discretization of numerical data, (ii) conversion between different file formats, (iii) creation of the complement of the binary table, and (iv) other projection operations such as transposition of the table.

Fig. 1. Architecture of the Coron System

Data mining. Extracting itemsets and association rules is a very popular task in data mining. Concept lattices are mathematical structures supported by a rich and well-established formalism, namely Formal Concept Analysis [13]. A concept lattice is represented by a diagram that gives a clear visualization of the classes of objects of a domain. Thus, the data mining modules of the Coron System offer the following possibilities:
- Itemset extraction: frequent, closed, rare, generators, etc. This task is performed by a large collection of algorithms based on different search strategies (depth-first, level-wise, etc.).
- Association rules generation: frequent, rare, closed, informative, minimal non-redundant, Duquenne-Guigues basis, etc. These rules are given with a set of measures such as support, confidence, lift, conviction, etc.
- Concept lattice construction.

Post-processing. The units extracted in the data mining step may be very numerous and may hide some units of higher interest. Thus, Coron proposes some filtering operations that should be carried out in interaction with a domain expert. The analyst may fi...
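To make the notions used in the data mining step concrete, the following is a minimal sketch of level-wise frequent itemset extraction, a closedness check, and the rule measures named above (support, confidence, lift, conviction) on a toy binary table. It is not Coron's implementation and does not use Coron's file format or command-line interface; the table contents and all function names (`support`, `frequent_itemsets`, `is_closed`, `rule_measures`) are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy binary table (not from the paper): each row is an
# individual, and the set lists the properties that individual possesses.
TABLE = {
    "o1": {"a", "b", "c"},
    "o2": {"a", "b"},
    "o3": {"a", "c"},
    "o4": {"b"},
    "o5": {"a", "b", "c"},
}


def support(itemset, table=TABLE):
    """Relative support: fraction of rows that contain every item."""
    itemset = frozenset(itemset)
    return sum(1 for row in table.values() if itemset <= row) / len(table)


def frequent_itemsets(min_support, table=TABLE):
    """Level-wise search: filter size-k candidates, then build size k+1."""
    items = sorted(set().union(*table.values()))
    result = {}
    level = [frozenset([i]) for i in items]
    while level:
        kept = [s for s in level if support(s, table) >= min_support]
        for s in kept:
            result[s] = support(s, table)
        # A size-(k+1) candidate is the union of two frequent size-k itemsets.
        level = sorted(
            {a | b for a, b in combinations(kept, 2) if len(a | b) == len(a) + 1},
            key=sorted,
        )
    return result


def is_closed(itemset, table=TABLE):
    """Closed itemset: no proper superset has the same support."""
    itemset = frozenset(itemset)
    all_items = set().union(*table.values())
    s = support(itemset, table)
    return all(support(itemset | {i}, table) < s for i in all_items - itemset)


def rule_measures(antecedent, consequent, table=TABLE):
    """Support, confidence, lift and conviction of antecedent -> consequent."""
    x, y = frozenset(antecedent), frozenset(consequent)
    supp_xy = support(x | y, table)
    conf = supp_xy / support(x, table)
    supp_y = support(y, table)
    # Conviction is unbounded for rules that never fail (confidence 1).
    conviction = float("inf") if conf == 1.0 else (1 - supp_y) / (1 - conf)
    return {
        "support": supp_xy,
        "confidence": conf,
        "lift": conf / supp_y,
        "conviction": conviction,
    }


if __name__ == "__main__":
    for itemset, supp in frequent_itemsets(0.5).items():
        print(sorted(itemset), supp, "closed" if is_closed(itemset) else "not closed")
    print(rule_measures({"a"}, {"c"}))
```

In this toy table every row containing `c` also contains `a`, so `{c}` has the same support as `{a, c}` and is not closed, while the rule `c -> a` has confidence 1. The candidate generation here relies only on the support filter; real level-wise algorithms typically add subset-based pruning before counting.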
