Title: Discovering Fuzzy Classification Rules using Genetic Network Programming
1Discovering Fuzzy Classification Rules using
Genetic Network Programming
Karla Taboada
2Contents
- What is Data Mining?
- Why Data Mining?
- Data mining tasks.
- Genetic Network Programming (GNP).
- GNP for association rule mining.
- GNP-Fuzzy data mining method for classification.
- Simulation results.
- Conclusions
3- Introduction to Data Mining
4Why Mine Data?
- Lots of data is being collected and warehoused
- Web data, e-commerce
- purchases at department/grocery stores
- Bank/Credit Card transactions.
- Computers have become cheaper and more powerful
- Competitive Pressure is Strong
- Provide better, customized services for an edge
- (e.g. in Customer Relationship Management)
5Why Mine Data?
- Data collected and stored at enormous speeds
(GB/hour)
- remote sensors on a satellite
- telescopes scanning the skies
- microarrays generating gene expression data
- scientific simulations generating terabytes of
data
- Traditional techniques infeasible for raw data.
6Knowledge Discovery in Databases
- The abundance of data, coupled with the need for
powerful data analysis tools, has been described
as a data-rich but information-poor situation.
How do you explore millions of records, tens or
hundreds of fields, and find patterns?
7Knowledge Discovery in Databases
- Knowledge Discovery in Databases is the
non-trivial process of identifying valid, novel,
potentially useful, and ultimately understandable
patterns in data.
8What is Data Mining?
- Process of semi-automatically analyzing large
databases to find patterns that are
- valid: hold on new data with some certainty.
- novel: non-obvious to the system.
- useful: it should be possible to act on the item.
- understandable: humans should be able to
interpret the pattern.
9Why Data Mining
- Credit ratings/targeted marketing
- Given a database of 100,000 names, which persons
are the least likely to default on their credit
cards?
- Identify likely responders to sales promotions
- Fraud detection
- Which types of transactions are likely to be
fraudulent, given the demographics and
transactional history of a particular customer?
- Customer relationship management
- Which of my customers are likely to be the most
loyal, and which are most likely to leave for a
competitor?
Data Mining helps extract such information
10Data Mining Tasks
- Classification
- Clustering
- Association Rule Discovery
- Sequential Pattern Discovery
- Regression
- Deviation Detection
12Classification Application
- Direct Marketing
- Goal: Reduce the cost of mailing by targeting a set
of consumers likely to buy a new cell-phone
product.
- Approach:
- Use the data for a similar product introduced
before.
- We know which customers decided to buy and which
decided otherwise. This buy / don't buy decision
forms the class attribute.
- Collect various demographic, lifestyle, and
company-interaction related information about all
such customers (type of business, where they
stay, how much they earn, etc.).
- Use this information as input attributes to learn
a classifier model.
13Market Basket Example
Association Rule Mining
- Where should detergents be placed in the store to
maximize their sales?
- Are window cleaning products purchased when
detergents and orange juice are bought together?
- Is soda typically purchased with bananas? Does
the brand of soda make a difference?
- How are the demographics of the neighborhood
affecting what customers are buying?
14Association Rule Mining
- Searches for interesting relationships among
items in a given data set.
- Support and confidence are the two most important
quality measures for evaluating the
interestingness of an association rule.
- An association rule is an implication of the
form X ⇒ Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅
(I is the set of all items).
Example: When a customer buys bread and butter,
they buy milk 85% of the time.
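As a minimal sketch of the two quality measures named above, the following Python computes support and confidence for one rule. The basket data and the function names are invented for illustration; they are not taken from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) for the rule antecedent => consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Toy baskets, invented for illustration only.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"milk", "cereal"},
]

# Rule: {bread, butter} => {milk}
rule_support = support(baskets, {"bread", "butter", "milk"})
rule_confidence = confidence(baskets, {"bread", "butter"}, {"milk"})
print(rule_support, rule_confidence)
```

Here the rule holds in 2 of 5 baskets (support 0.4) and in 2 of the 3 baskets that contain bread and butter (confidence about 0.67).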
15Association Rule Mining
Rules Discovered:
Milk --> Cereal
Diaper, Milk --> Beer
Milk and cereal sell together!
Applications: Catalog design, store layout,
cross-marketing
16Genetic Network Programming
17Genetic Network Programming (GNP)
GNP is an extension of Genetic Algorithms (GA)
and Genetic Programming (GP).
- The main difference between them is the
representation of the solution.
- GA evolves strings as solutions and is mainly
applied to optimization problems.
- GP expands the expressive ability of GA by using
tree structures.
- GNP uses directed graph structures as solutions;
therefore GNP can deal with complex problems more
effectively and efficiently than GA and GP.
[Figure: GNP node types: processing node, judgment node, start node.]
18(Roulette, tournament and elite selection are
established in GNP.)
Reproduction
21GNP for class association rule mining
22Objective
Propose a data mining method for dealing with
continuous values, based on Genetic Network
Programming (GNP) and Fuzzy Set Theory.
[Figure: GNP producing fuzzy classification rules, for example:]
A1_High => Z1
A4_Med ∧ A7_Low => Z2
23Empowering classical association rules
Why Fuzzy Association Rules?
The original idea derives from dealing with
continuous attributes, where discretizing the
continuous values into intervals leads to
under- or overestimating values near the interval
borders. This is called the sharp boundary
problem.
Fuzzy Set Theory
- Can help to overcome this problem by allowing
different degrees of membership, not only 1 and
0.
- Has been shown to be a very useful tool because
the mined rules are expressed in linguistic
terms, which are more natural and understandable
for human beings.
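The sharp boundary problem can be illustrated with a small sketch that compares a crisp interval with a fuzzy membership degree for two nearly identical values. The salary threshold, the ramp limits and the function names are assumptions chosen for illustration; the slides do not specify them.

```python
def crisp_high(salary, threshold=50_000):
    """Crisp discretization: a salary is either 'High' (1) or not (0)."""
    return 1.0 if salary >= threshold else 0.0


def fuzzy_high(salary, low=40_000, high=60_000):
    """Fuzzy 'High': degree rises linearly from 0 at `low` to 1 at `high`."""
    if salary <= low:
        return 0.0
    if salary >= high:
        return 1.0
    return (salary - low) / (high - low)


# Two salaries that differ by only 200 units around the crisp border.
for salary in (49_900, 50_100):
    print(salary, crisp_high(salary), round(fuzzy_high(salary), 3))
```

The crisp version assigns the two salaries to different intervals, while the fuzzy version gives them almost the same degree of membership (0.495 and 0.505).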
24Extraction of Association Rules using GNP
- GNP examines the attribute values of the database
using judgment nodes.
- GNP calculates the measurements of association
rules using processing nodes.
- The connections of judgment nodes are represented
as association rules.
[Figure: GNP structure for class association rule mining. Labels: processing node P1; judgment nodes A11, A21, A31, A41 with Yes and No branches; counts N, a, b, c, d and a(C), b(C), c(C), d(C).]
25Extraction of Association Rules using GNP
[Figure: the same GNP structure (processing node P1; judgment nodes A11, A21, A31, A41), annotated with the class label C = 0, 1, ..., K.]
26Extraction of Fuzzy Classification Rules using GNP
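Class Association Rules (C = 0, 1, ..., K)
[Figure: GNP structure with fuzzy judgment nodes. Labels: processing nodes P1 and P2; judgment nodes A1_High, A2_Low, A3_Mid, A4_High with Yes and No branches; counts N, a, b, c, d and a(C), b(C), c(C), d(C).]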
27Fuzzy Classification Rules using GNP
Our proposed model consists of two major phases:
1) Generating fuzzy class association rules by
using Genetic Network Programming.
2) Building a classifier model based on the
extracted fuzzy rules.
In the first phase, the task is to extract fuzzy
class association rules from a fuzzy training set
using a GNP-based algorithm. Moreover, the fuzzy
membership functions are evolved by non-uniform
mutation in every generation, in order to perform
a more global search in the space of candidate
membership functions and thereby enable the
discovery of new fuzzy rules.
In the second phase, all of the generated fuzzy
rules in the pool are used to predict the class
of each record in the test set. For each test
record, the classifier computes the average
distance between the record and the rules in each
class. Finally, the class with the smallest
average distance is assigned to the test record.
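The non-uniform mutation mentioned for the first phase can be sketched as follows. The slides name the operator but do not give its formula, so this uses the commonly cited form attributed to Michalewicz as an assumption. The shape parameter `b`, the parameter bounds and the choice of mutating a single membership-function parameter are hypothetical.

```python
import random


def _delta(t, y, max_gen, b, rng):
    """Step size in [0, y] that shrinks toward 0 as t approaches max_gen."""
    r = rng.random()
    return y * (1.0 - r ** ((1.0 - t / max_gen) ** b))


def non_uniform_mutate(value, lo, hi, t, max_gen, b=2.0, rng=random):
    """Mutate one membership-function parameter, keeping it inside [lo, hi].

    Early generations (small t) allow large jumps; late generations
    only fine-tune. Assumed operator, not confirmed by the slides.
    """
    if rng.random() < 0.5:
        return value + _delta(t, hi - value, max_gen, b, rng)
    return value - _delta(t, value - lo, max_gen, b, rng)


rng = random.Random(42)
# Hypothetical parameter: midpoint of a "Medium" membership function.
for generation in (0, 50, 99):
    print(generation, non_uniform_mutate(50.0, 0.0, 100.0, generation, 100, rng=rng))
```

Because the step size decays with the generation counter, the search is broad at first and local near the end of the run.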
28GNP-Fuzzy DM method
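Fuzzy membership functions for handling
continuous attributes
[Figure: a sample database; membership functions Low, Medium, High for the Salary attribute and Young, Middle, Old for the Age attribute, each with a maximum degree of 1; and the resulting database with the fuzzy membership values.]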
29GNP-Fuzzy DM method
Extraction of Fuzzy Association Rules using GNP
Probability of moving to the Yes-side
Fuzzy values are used as probabilities for the
transition of judgment nodes.
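[Figure: the judgment-node chain after processing node P1, with nodes labeled A2_Low, A3_Med, A4_High, A1_High, evaluated on two tuples. TID 1: Pb = 0.8, 0.7, 0.75, 0.9, with the numbers 1, 1, 1, 1, 1 shown on the nodes. TID 2: Pb = 0.7, 0.65, 0.2, 0.7, with the numbers 2, 1, 1, 2, 2 shown on the nodes.]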
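The idea of using fuzzy values as transition probabilities can be sketched as a random walk along one chain of judgment nodes. The probability values are the ones shown for TID 1 and TID 2; which value belongs to which node, the rule that a No-side transition ends the walk, and all function names are assumptions made for this sketch.

```python
import random


def walk_chain(memberships, rng):
    """Return how many consecutive judgment nodes took the Yes-side.

    Each membership value is used as the probability of moving to the
    Yes-side; the first No-side transition ends the walk (assumption).
    """
    depth = 0
    for mu in memberships:
        if rng.random() < mu:
            depth += 1
        else:
            break
    return depth


def count_yes(tuples, rng):
    """counts[i] = number of tuples that passed judgment node i via Yes."""
    counts = [0] * len(tuples[0])
    for memberships in tuples:
        for i in range(walk_chain(memberships, rng)):
            counts[i] += 1
    return counts


rng = random.Random(0)
tuples = [
    [0.8, 0.7, 0.75, 0.9],   # TID 1
    [0.7, 0.65, 0.2, 0.7],   # TID 2
]
print(count_yes(tuples, rng))
```

The per-node counts play the role of the counters from which support-like measurements of candidate rules are derived; they can only decrease along the chain.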
30GNP-Fuzzy DM method
Extract fuzzy rules through generations
- Each fuzzy rule is stored with its
- χ² value.
- Support.
- Fuzzy parameters.
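The slides do not state how the χ² value of a rule is computed. A minimal sketch, assuming the standard χ² statistic of the 2x2 contingency table for a rule X ⇒ Y expressed through relative supports, is shown below; the function name and the example numbers are hypothetical.

```python
def chi_squared(n, sup_x, sup_y, sup_xy):
    """Chi-squared statistic for the rule X => Y.

    n       number of records
    sup_x   support of the antecedent X (fraction of records)
    sup_y   support of the consequent Y
    sup_xy  support of X and Y together
    """
    denom = sup_x * sup_y * (1.0 - sup_x) * (1.0 - sup_y)
    if denom == 0:
        raise ValueError("supports of 0 or 1 make the statistic undefined")
    return n * (sup_xy - sup_x * sup_y) ** 2 / denom


# Invented example: 100 records, X and Y each hold in half of them,
# and they hold together in 40 records.
value = chi_squared(100, 0.5, 0.5, 0.4)
print(value, value > 6.63)
```

With one degree of freedom, 6.63 is approximately the critical value at the 1% significance level, so a rule whose statistic exceeds it is unlikely to reflect independent X and Y.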
31Fuzzy Classification Rules using GNP
Each run of the algorithm discovers fuzzy rules
for a single class; therefore the algorithm must
run K+1 times, where K+1 is the number of classes
(C = 0, 1, ..., K).
Pool Class 0
Pool Class 1
Pool Class 2
...
Pool Class K
One pool of fuzzy rules for each class in the DB.
32Fuzzy Classification Rules using GNP
For each test record, the classifier computes the
average distance between the record and the rules
in each class. Finally, the class with the
smallest average distance is assigned to the test
record.
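The classification step can be sketched as below. The distance measure is not given in this transcript, so the one used here (the mean of 1 minus the membership degree over a rule's antecedent terms) is purely an assumption, as are the rule pools, the membership values and the function names.

```python
def rule_distance(memberships, rule):
    """Assumed distance: mean of (1 - membership) over the rule's terms."""
    return sum(1.0 - memberships[term] for term in rule) / len(rule)


def classify(memberships, pools):
    """Return (predicted class, average distance per class)."""
    avg = {}
    for label, rules in pools.items():
        avg[label] = sum(rule_distance(memberships, r) for r in rules) / len(rules)
    return min(avg, key=avg.get), avg


# Hypothetical rule pools: each rule is a tuple of fuzzy antecedent terms.
pools = {
    0: [("A1_High",), ("A4_Med", "A7_Low")],
    1: [("A2_Low", "A3_Med")],
}
# Hypothetical membership degrees of one test record.
record = {"A1_High": 0.9, "A4_Med": 0.8, "A7_Low": 0.6,
          "A2_Low": 0.2, "A3_Med": 0.3}

label, distances = classify(record, pools)
print(label, distances)
```

In this toy case the record matches the class 0 rules closely (average distance 0.2) and the class 1 rule poorly (0.75), so class 0 is assigned.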
[Figure: each record of the test set is compared against the rule pool of every class (Pool Class 1, Pool Class 2, ..., Pool Class K).]
33Fuzzy Classification Rules using GNP
Therefore, the classification of test data d is
determined as follows
34Fuzzy Classification Rules using GNP
Therefore, the classification of test data d is
determined as follows
35Experimental results
36Experimental results
- We have evaluated our proposed method on
three public-domain data sets from the UCI data
set repository. The results reported below were
produced by using a 10-fold cross-validation
procedure.
- Population size: 120.
- Number of processing nodes: 20.
- Number of judgment nodes: 200.
- Number of generations: 100.
- χ²: 6.63
- supmin: 0.01, 0.05, 0.1, 0.15, 0.25, 0.3
- anew 150
- rc15/78
- rm11/3
- rm21/5
- All algorithms were coded in Java. Experiments
were done on a 1.50 GHz Pentium M with 504 MB RAM.
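For readers unfamiliar with the evaluation procedure, here is a minimal sketch of how a 10-fold cross-validation split can be generated. It is written in Python for brevity (the original experiments were coded in Java), uses contiguous folds without shuffling, and the function name is hypothetical; the actual fold construction used in the experiments is not described.

```python
def k_fold_indices(n_records, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Folds are contiguous and their sizes differ by at most one record.
    In practice the records would usually be shuffled first.
    """
    base, extra = divmod(n_records, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_records))
        yield train, test
        start += size


# 303 is the record count quoted for the heart data set.
folds = list(k_fold_indices(303, k=10))
print([len(test) for _, test in folds])
```

Each record is used for testing exactly once, and the reported accuracy is the average over the ten test folds.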
37Experimental results
Heart stat-log DB: 303 records, 14 attributes.
38Experimental results
Heart stat-log DB: 303 records, 14 attributes.
39Experimental results
Ionosphere DB: 351 records, 35 attributes.
40Experimental results
Ionosphere DB: 351 records, 35 attributes.
41Experimental results
CRX DB: 351 records, 35 attributes.
42Experimental results
CRX DB: 351 records, 35 attributes.
43Experimental results
In order to evaluate the performance of our
proposed method, we have compared it to another
evolutionary system found in the literature,
CEFR-MINER [1].
Table 5. Accuracy rate, in %.
[1] R. Mendes, A. Freitas, Fuzzy Classification
Rules with Genetic Programming and Co-Evolution,
Conference on Principles of Data Mining and
Knowledge Discovery, 2001.
44Conclusions
- Compared with traditional classification rules,
fuzzy rules provide good linguistic explanation
and can deal with both discrete and continuous
attributes.
- A method for discovering fuzzy classification
rules using GNP has been proposed. We have
performed experiments and evaluated the
performance of the GNP-based method.
- Important association rules are extracted through
the generations.
- The pool is updated in every generation: a stored
association rule with a lower χ² value is replaced
by the same rule when it is found again with a
higher χ² value.
- The final result of the evolutionary process is a
fuzzy rule set and a set of fuzzy membership
functions.
- The results have shown that the GNP-based method
extracts important association rules from the
database effectively and obtains good results in
comparison with other methods.
45Fin
- Thank you very much.
- Any questions?