Title: CORON
CORON
- A Platform for Itemset Mining Algorithms
- Laszlo SZATHMARY
- szathmar_at_loria.fr
- Orpailleur team, LORIA, Nancy, France
Journée Fouille de données (Data Mining Day), 12 July 2005, LORIA
Outline
- 1) Overview
- 2) Itemset mining algorithms of the system
  - the Zart algorithm
  - finding informative association rules with Zart
- 3) Rule mining module
- 4) Demo
Overview
- Coron is an integrated platform for
- finding frequent (closed) itemsets
- extracting different kinds of association rules
- Coron is designed to cover a wide range of basic tasks of symbolic data mining, including
- pre-processing the dataset
- extracting frequent (closed) itemsets
- generating association rules
- post-processing (filtering) the results
- A rich set of algorithmic methods for symbolic data mining is included in the system's architecture.
Coron: algorithms
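The central part of Coron contains a rich set of algorithms for extracting frequent (closed) itemsets. Algorithms implemented (FIs: frequent itemsets, FCIs: frequent closed itemsets):
- Apriori: FIs
- Apriori-Close: FIs, marks FCIs
- Close: FCIs
- Pascal: FIs using key generators
- Pascal+: like Pascal, but it also marks FCIs
- Titanic: FCIs using key generators
- Charm: FCIs
- Eclat: FIs
- Zart: FIs, FCIs and key generators, with the key generators associated to their closures
- Eclat-Z: same output as Zart, but uses Eclat to find the FIs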
The system is implemented entirely in Java.
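Apriori, the first entry in the list, explores itemsets level by level and discards every candidate that has an infrequent subset (support is anti-monotone). The following is a minimal Python sketch of that idea, not Coron's Java implementation; the function name `apriori` and the toy dataset (each string is one object's set of single-letter items) are illustrative assumptions:

```python
from itertools import combinations


def apriori(transactions, min_supp):
    """Level-wise frequent itemset mining (Apriori-style sketch).

    min_supp is an absolute count. Returns {frozenset(itemset): support}.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of `itemset`.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = sorted({i for t in transactions for i in t})
    level = {}
    for i in items:
        s = support(frozenset([i]))
        if s >= min_supp:
            level[frozenset([i])] = s
    frequent = dict(level)

    k = 2
    while level:
        prev = list(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Count supports and keep the frequent candidates as the next level.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_supp:
                level[c] = s
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical toy dataset: five objects over the items a, b, c, e.
TOY_DATASET = ["abce", "abce", "bce", "be", "ac"]
```

With `apriori(TOY_DATASET, 2)`, every one of the 15 non-empty subsets of abce is frequent; for example, be has support 4 and abce has support 2.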
Zart
- Zart is an extension of Pascal:
  - Pascal finds FIs and their key generators
  - Zart finds FIs, FCIs and key generators, and associates the key generators to their closures
- The association of key generators to their closures is necessary to generate informative association rules (Generic Basis GB, Informative Basis IB) efficiently!
- Eclat-Z: instead of Pascal, it uses Eclat to find the FIs. After that it works like Zart; its output is identical to Zart's.
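What Zart adds on top of Pascal can be illustrated with a brute-force Python sketch: it enumerates every frequent itemset, keeps the key generators (itemsets whose support is strictly smaller than that of each immediate subset) and groups them under their closures. This is not Zart's level-wise algorithm, only a way to see the shape of its result; the function name and the toy dataset are hypothetical:

```python
from itertools import combinations


def generators_to_closures(transactions, min_supp):
    """Brute-force sketch (NOT the Zart algorithm): map each frequent closed
    itemset to the list of its frequent key generators."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        # The empty itemset is contained in every transaction.
        return sum(1 for t in transactions if itemset <= t)

    def closure(itemset):
        # Intersection of all transactions that contain `itemset`.
        covering = [t for t in transactions if itemset <= t]
        return frozenset.intersection(*covering)

    result = {}  # closure (an FCI) -> its key generators
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            s = support(x)
            if s < min_supp:
                continue
            # Since support is anti-monotone, x is a key generator iff every
            # immediate subset has strictly larger support.
            if all(support(x - {i}) > s for i in x):
                result.setdefault(closure(x), []).append(x)
    return result


# Hypothetical toy dataset: five objects over the items a, b, c, e.
TOY_DATASET = ["abce", "abce", "bce", "be", "ac"]
```

With `generators_to_closures(TOY_DATASET, 2)`, the generators b and e are grouped under be, bc and ce under bce, and ab and ae under abce; the pairs (generator, closure) are exactly what the rule generation below needs.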
Zart vs. Pascal
[Figure: output of Pascal on dataset D, min_supp = 2 (40%)]
Zart vs. Pascal
[Figure: output of Zart on dataset D, min_supp = 2 (40%)]
Informative association rules
An informative association rule has the form
r: P1 → P2 \ P1, where
- P1 is a minimal generator,
- P2 is maximal, i.e. P2 is an FCI,
- P1 ⊂ P2.
Informative rules allow one to deduce maximal information from a minimal hypothesis (Bastide et al., 2002).
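The support and confidence of such a rule follow from the usual definitions. Written out with \(\mathcal{O}\) for the set of objects, identifying each object with its itemset (our notation, not the slide's):

```latex
\begin{aligned}
\operatorname{supp}(P) &= \bigl|\{\, o \in \mathcal{O} : P \subseteq o \,\}\bigr| \\
\operatorname{supp}(r) &= \operatorname{supp}\bigl(P_1 \cup (P_2 \setminus P_1)\bigr) = \operatorname{supp}(P_2) \\
\operatorname{conf}(r) &= \frac{\operatorname{supp}(P_2)}{\operatorname{supp}(P_1)}
\end{aligned}
```

Because P2 is closed and P1 ⊂ P2, conf(r) = 1 exactly when P2 is the closure of P1. These exact rules make up the Generic Basis; the remaining, approximate ones (conf(r) < 1) make up the Informative Basis.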
Finding informative rules
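min_supp = 2 (40%)
Minimal generator: e
Closed itemsets containing e (CG): be, bce, abce
Informative rules: e → b, e → bc, e → abc
(e → b is exact, since be is the closure of e; e → bc and e → abc are approximate.)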
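The procedure on this slide can be sketched in Python: collect the frequent closed supersets of a generator g, then emit one rule g → C \ g per closed set C, with confidence supp(C) / supp(g). The dataset below is hypothetical (the talk's dataset is not reproduced here); it was constructed so that, for g = e, the closed supersets are exactly be, bce and abce, as on the slide. The function name is also an assumption, and g is assumed to be frequent:

```python
from fractions import Fraction
from itertools import combinations


def informative_rules(generator, transactions, min_supp):
    """Return (antecedent, consequent, confidence) for each rule g -> C \\ g,
    where C ranges over the frequent closed itemsets that contain g."""
    transactions = [frozenset(t) for t in transactions]
    g = frozenset(generator)
    others = sorted({i for t in transactions for i in t} - g)

    def support(x):
        return sum(1 for t in transactions if x <= t)

    def closure(x):
        return frozenset.intersection(*[t for t in transactions if x <= t])

    # The closures of the frequent supersets of g are exactly the FCIs containing g.
    closed = set()
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            x = g | frozenset(extra)
            if support(x) >= min_supp:
                closed.add(closure(x))

    rules = []
    for c in sorted(closed, key=lambda s: (len(s), sorted(s))):
        if c == g:
            continue  # P1 = P2 would give an empty consequent
        conf = Fraction(support(c), support(g))
        rules.append(("".join(sorted(g)), "".join(sorted(c - g)), conf))
    return rules


# Hypothetical toy dataset: five objects over the items a, b, c, e.
TOY_DATASET = ["abce", "abce", "bce", "be", "ac"]
```

On this data, `informative_rules("e", TOY_DATASET, 2)` yields e → b with confidence 1, e → bc with confidence 3/4 and e → abc with confidence 1/2. Fractions keep the confidences exact, which makes the exact/approximate split unambiguous.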
Characteristics of Coron
- Any of its algorithms can be called from the command line as a standalone program.
- Due to its Java API, it can easily be integrated into other projects.
- Since Coron is a platform, large parts of it can be reused to implement a new itemset mining algorithm.
Datasets differ widely in size, number of objects, number of attributes, density, etc., and no single algorithm is best for arbitrary datasets. With Coron, users can try different algorithms and choose the one that best suits their needs.
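Eclat, also in Coron's list, shows how differently two algorithms can organize the same search. Instead of Apriori's level-wise candidate generation with support counting over the objects, it stores for each item the set of ids of the objects that contain it (its tidset) and computes supports depth-first by intersecting tidsets. Which organization is faster depends on the dataset. Below is a minimal Python sketch of the Eclat idea, not Coron's implementation; the function name and the toy dataset are assumptions:

```python
def eclat(transactions, min_supp):
    """Depth-first frequent itemset mining on vertical tidsets (Eclat-style sketch).

    min_supp is an absolute count. Returns {frozenset(itemset): support}.
    """
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs; each prefix + item is already known to be frequent.
        for idx, (item, tids) in enumerate(candidates):
            itemset = prefix | frozenset([item])
            frequent[itemset] = len(tids)
            # Intersect with the tidsets of the later items to build the next class.
            suffix = []
            for other, other_tids in candidates[idx + 1:]:
                common = tids & other_tids
                if len(common) >= min_supp:
                    suffix.append((other, common))
            if suffix:
                extend(itemset, suffix)

    start = [(i, tids) for i, tids in sorted(tidsets.items()) if len(tids) >= min_supp]
    extend(frozenset(), start)
    return frequent


# Hypothetical toy dataset: five objects over the items a, b, c, e.
TOY_DATASET = ["abce", "abce", "bce", "be", "ac"]
```

On this toy dataset with min_supp = 3, `eclat` returns nine itemsets: a, b, c, e, ac, bc, be, ce and bce. The support of an itemset is simply the size of its tidset, so no further pass over the objects is needed once the vertical layout is built.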
Rule mining module
[Figure: the rule mining module takes an input and outputs association rules]
- What kinds of rules can be extracted with AssRuleX?
  - all possible association rules
  - all/reduced informative association rules
    - Generic Basis (exact informative association rules)
    - Informative Basis (approximate informative association rules)
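Given the FCIs together with their key generators (the kind of output Zart produces), the two bases can be separated with a short Python sketch: a generator paired with its own closure gives an exact rule (Generic Basis), and a generator paired with a strictly larger FCI gives an approximate rule (Informative Basis). This is not AssRuleX's implementation and it computes the full, non-reduced Informative Basis; the input table, the function name `bases` and the `min_conf` parameter are hypothetical:

```python
from fractions import Fraction

# Hypothetical input: FCI (string of items) -> (support, key generators).
FCIS = {
    "c": (4, ["c"]),
    "ac": (3, ["a"]),
    "be": (4, ["b", "e"]),
    "bce": (3, ["bc", "ce"]),
    "abce": (2, ["ab", "ae"]),
}


def bases(fcis, min_conf=Fraction(0)):
    """Return (GB, IB): exact rules (antecedent, consequent) and
    approximate rules (antecedent, consequent, confidence)."""
    gb, ib = [], []
    for closed, (supp_c, gens) in fcis.items():
        for g in gens:
            # Exact rule: generator -> rest of its own closure (confidence 1).
            if set(g) != set(closed):
                gb.append((g, "".join(sorted(set(closed) - set(g)))))
            # Approximate rules: generator -> rest of a strictly larger FCI.
            # A generator has the same support as its closure, so conf = supp_o / supp_c.
            for other, (supp_o, _) in fcis.items():
                if set(closed) < set(other):
                    conf = Fraction(supp_o, supp_c)
                    if conf >= min_conf:
                        ib.append((g, "".join(sorted(set(other) - set(g))), conf))
    return gb, ib
```

On this input, the Generic Basis has 7 rules (among them e → b) and the Informative Basis has 10 (among them e → bc with confidence 3/4 and e → abc with confidence 1/2); raising `min_conf` to 3/4 keeps 4 approximate rules.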
Building concept lattices
- Frequent closed itemsets can be used to build concept lattices:
  - FCIs are the intents of the corresponding concepts
  - find the extent of each intent
  - find the order among the previously identified concepts
- This approach can be used to build both complete and iceberg concept lattices.
- But the main goals of Coron are
  - itemset mining
  - rule extraction
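The three steps for building a concept lattice from closed itemsets can be sketched in Python: each closed itemset is an intent, its extent is the set of objects containing it, and the lattice order keeps only the covering pairs (no intent strictly in between). Passing only the FCIs gives the iceberg part; passing all closed itemsets gives the complete lattice. The function name, the toy dataset and the intent list are hypothetical:

```python
def concept_lattice(transactions, intents):
    """Return ({intent: extent}, covers), where covers holds (lower, upper)
    pairs of intents with `lower` directly below `upper` in the lattice."""
    transactions = [frozenset(t) for t in transactions]
    intents = [frozenset(i) for i in intents]

    # Step 2: the extent of an intent is the set of ids of the objects containing it.
    concepts = {
        i: frozenset(o for o, t in enumerate(transactions) if i <= t)
        for i in intents
    }

    # Step 3: a concept with a larger intent (smaller extent) lies below one
    # with a smaller intent. Keep only covers, i.e. pairs with no intent between.
    covers = []
    for lower in intents:
        for upper in intents:
            if upper < lower:
                if not any(upper < mid < lower for mid in intents):
                    covers.append((lower, upper))
    return concepts, covers


# Hypothetical toy dataset and its FCIs for min_supp = 2.
TOY_DATASET = ["abce", "abce", "bce", "be", "ac"]
TOY_FCIS = ["c", "ac", "be", "bce", "abce"]
```

For these FCIs, the extent of bce is objects {0, 1, 2}, and the covers are ac below c, bce below c and below be, and abce below ac and below bce; abce is not directly linked to c because ac lies between them.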
Demo
We will work with the following test dataset D, with min_supp = 2 (40%):
[Figure: test dataset D]