Title: An Excel-based Data Mining Tool
An Excel-based Data Mining Tool
- iDA
ESX: A Multipurpose Tool for Data Mining
The Algorithmic Logic Behind ESX
- Given:
  - A set of existing concept-level nodes C1, ..., Cn
  - An average class resemblance score S
  - A new instance I to be classified
- Classify I with the concept class that will improve S the most, or hurt S the least.
- If learning is unsupervised, create a new concept node with I alone if that results in a better S score.
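A minimal Python sketch of this placement rule; the similarity function is a caller-supplied placeholder, and scoring a singleton cluster as 1.0 is our assumption, not a formula taken from ESX:
    def resemblance(cluster, similarity):
        # Average pairwise similarity of the instances in one concept node.
        pairs = [(a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]]
        if not pairs:
            return 1.0  # assumed convention: a lone instance resembles itself
        return sum(similarity(a, b) for a, b in pairs) / len(pairs)

    def average_score(clusters, similarity):
        # S: the mean class resemblance over all concept-level nodes.
        return sum(resemblance(c, similarity) for c in clusters) / len(clusters)

    def place_instance(clusters, instance, similarity, unsupervised=True):
        # Try I in each existing concept class and keep the placement that
        # improves S the most (or hurts it the least).
        candidates = []
        for i in range(len(clusters)):
            trial = [c + [instance] if j == i else c for j, c in enumerate(clusters)]
            candidates.append((average_score(trial, similarity), trial))
        if unsupervised:
            # Also consider a brand-new concept node holding I alone.
            trial = clusters + [[instance]]
            candidates.append((average_score(trial, similarity), trial))
        return max(candidates, key=lambda pair: pair[0])[1]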
iDAV Format for Data Mining
- iDA attribute/value format:
  - First row: attribute names
  - Second row: attribute type identifier (C = categorical, R = real, where real stands for any numeric field)
  - Third row: attribute usage identifier (I = input, O = output, U = unused, D = display only)
  - Fourth row: test set data
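For illustration, a tiny hypothetical sheet in this layout (all attribute names and values below are invented) would look like:
    Age   Income   Sex   Risk     <- row 1: attribute names
    R     R        C     C        <- row 2: types (R = real, C = categorical)
    I     I        I     O        <- row 3: usage (I = input, O = output)
    34    52000    F     low      <- row 4 onward: the data
    51    18000    M     high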
A Five-Step Approach for Unsupervised Clustering
- Step 1: Enter the Data to Be Mined
- Step 2: Perform a Data Mining Session
- Step 3: Read and Interpret Summary Results
- Step 4: Read and Interpret Individual Class Results
- Step 5: Visualize Individual Class Rules
Step 1: Enter the Data to Be Mined
Step 2: Perform a Data Mining Session
- iDA -> Begin Mining Session
- Select the instance similarity and real-valued tolerance settings
RuleMaker Settings
Step 3: Read and Interpret Summary Results
- Class Resemblance Scores
  - Similarity of the instances in the class
- Domain Resemblance Score
  - Similarity of the instances in the entire set
- Cluster Quality
  - Class resemblance with reference to domain resemblance (clusters should be at least as good as the domain)
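For example, with a hypothetical domain resemblance of 0.60, a class whose resemblance is 0.75 is a tight cluster, while a class scoring 0.55 holds together less well than the undifferentiated data and is a questionable cluster.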
Step 3: Results about Attributes
- Categorical
  - Domain Predictability
    - Given categorical attribute A with possible values v1, ..., vn, domain predictability gives the percentage of instances that have A equal to vi. (If the domain predictability score is close to 100, most of the instances share the same value, and the attribute is not very valuable for learning purposes.)
- Numeric
  - Attribute Significance
    - Given attribute A, find the range of the class means and divide by the domain standard deviation (higher values are better for differentiation purposes).
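A minimal Python sketch of the numeric significance score, assuming "range of class means" is the maximum class mean minus the minimum class mean:
    from statistics import mean, pstdev

    def attribute_significance(classes, domain_values):
        # classes: one list of numeric values per class for attribute A.
        # Significance = (range of class means) / (domain standard deviation).
        class_means = [mean(values) for values in classes]
        return (max(class_means) - min(class_means)) / pstdev(domain_values)

    # Example: class means of 22 and 32 against a domain std of about 5.3
    # give a significance near 1.9.
    print(attribute_significance([[20, 22, 24], [30, 32, 34]],
                                 [20, 22, 24, 30, 32, 34]))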
Step 4: Read and Interpret Individual Class Results
- Class Predictability is a within-class measure.
  - Given class C and categorical attribute A with possible values v1, ..., vn, class predictability gives the percentage of instances in C that have A equal to vi.
- Class Predictiveness is a between-class measure.
  - Given class C and categorical attribute A with possible values v1, ..., vn, class predictiveness for vi is the probability that an instance belongs to C given that it has value vi for A.
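A small Python sketch of the two measures over dictionary-style instances (the data, attribute, and class names are illustrative, not from ESX):
    def class_predictability(instances, cls, attr, value):
        # P(A = value | class = cls): a within-class measure.
        members = [x for x in instances if x["class"] == cls]
        return sum(x[attr] == value for x in members) / len(members)

    def class_predictiveness(instances, cls, attr, value):
        # P(class = cls | A = value): a between-class measure.
        holders = [x for x in instances if x[attr] == value]
        return sum(x["class"] == cls for x in holders) / len(holders)

    data = [{"class": "A", "color": "red"}, {"class": "A", "color": "red"},
            {"class": "B", "color": "red"}, {"class": "B", "color": "blue"}]
    class_predictability(data, "A", "color", "red")   # 1.0: every A is red
    class_predictiveness(data, "A", "color", "red")   # 0.67: not every red is A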
Necessary and Sufficient Conditions
- A predictiveness score of 1.0 tells us that all instances with the particular attribute value belong to this particular class.
  => Attribute value v is a sufficient condition for membership in this class.
- A predictability score of 1.0 tells us that all the instances in this class have attribute value v.
  => Attribute value v is a necessary condition for membership in this class.
Necessary and/or Sufficient Conditions
- If both the predictability and predictiveness scores are 1.0, the particular value for the attribute is necessary and sufficient for class membership.
- ESX outputs attribute values whose scores meet a particular cut-off (0.80) as highly necessary and highly sufficient.
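A sketch of how such a cut-off could be applied; the 0.80 default comes from the text above, while the function itself is our own illustration:
    def condition_labels(predictability, predictiveness, cutoff=0.80):
        labels = []
        if predictiveness >= cutoff:
            labels.append("highly sufficient")  # the value (nearly) implies the class
        if predictability >= cutoff:
            labels.append("highly necessary")   # the class (nearly) implies the value
        return labels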
Step 5: Visualize Individual Class Rules
RuleMaker Settings
- Recall that we used the setting that asks RuleMaker to generate all rules. This is a good way to learn about the nature of the problem at hand.
A Six-Step Approach for Supervised Learning
- Step 1: Choose an Output Attribute
- Step 2: Perform the Mining Session
- Step 3: Read and Interpret Summary Results
- Step 4: Read and Interpret Test Set Results
- Step 5: Read and Interpret Class Results
- Step 6: Visualize and Interpret Class Rules
Perform the Mining Session
- Decide on the size of the training set.
- The remaining items will be used by the software to test the model that is developed (and the evaluation results will be reported).
Read and Interpret Summary Results
- The worksheet RES SUM contains summary information.
- Class resemblance scores, attribute summary information (categorical and numeric), and the most commonly occurring attribute values for each class are given.
Read and Interpret Test Set Results
- Worksheets: RES TST, RES MTX
- These report performance on the test set (which was not part of model training).
- RES MTX reports the confusion matrix.
- RES TST reports, for each instance in the test set, the model's classification and whether it is accurate or not.
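The kind of matrix RES MTX reports can be rebuilt in a few lines; the class labels in this sketch are made up:
    def confusion_matrix(actual, predicted):
        # Rows: actual class; columns: the model's predicted class.
        labels = sorted(set(actual) | set(predicted))
        counts = {(a, p): 0 for a in labels for p in labels}
        for a, p in zip(actual, predicted):
            counts[(a, p)] += 1
        return labels, counts

    labels, counts = confusion_matrix(["yes", "no", "yes", "yes"],
                                      ["yes", "no", "no", "yes"])
    # Diagonal cells such as (yes, yes) count correct classifications;
    # off-diagonal cells such as (yes, no) count the model's errors.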
Read and Interpret Class Results
- Just as individual clusters are of interest in unsupervised learning, information about individual classes is relevant in supervised learning.
- The worksheet RES CLS contains this information.
- The most and least typical instances are also given here.
- The worksheet RUL TYP gives typicality scores for all of the instances in the test set.
Visualize and Interpret Class Rules
- All rules or a covering set of rules?
- The worksheet RES RUL contains the rules generated by RuleMaker.
- If all rules are generated, there may be overlapping coverage.
- The covering set algorithm works iteratively, identifying the best covering rule and updating the set of instances still to be covered (see the sketch below).
- It is possible to run RuleMaker without running the mining algorithm again. This menu item can be used to change the RuleMaker settings and generate alternative rule sets.
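RuleMaker's internals are not spelled out here, but the iterative idea reads as a greedy set cover. A sketch, assuming covers(rule) returns the set of instance ids a rule matches:
    def covering_rules(rules, instances, covers):
        # Greedy covering: repeatedly pick the rule that covers the most
        # still-uncovered instances, then remove the instances it covers.
        uncovered, chosen = set(instances), []
        while uncovered and rules:
            best = max(rules, key=lambda r: len(covers(r) & uncovered))
            gained = covers(best) & uncovered
            if not gained:
                break  # no remaining rule covers anything new
            chosen.append(best)
            uncovered -= gained
        return chosen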
Generating Rules: The General Idea
- Step 1: Choose the attribute that best differentiates all domain/subclass instances.
- Step 2: Use the attribute to subdivide the instances into subclasses.
- Step 3: For each subclass:
  - If the instances meet a predefined criterion, generate a defining rule for the subclass.
  - If the predefined criterion is not met, return to Step 1.
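A generic recursive sketch of this loop; the purity test (is_pure) and the attribute scorer (best_attribute) are caller-supplied placeholders rather than ESX's own functions:
    def generate_rules(instances, attributes, is_pure, best_attribute, path=()):
        # Steps 1-2: pick the most differentiating attribute and split on it.
        # Step 3: emit a defining rule for each pure subclass, else recurse.
        if is_pure(instances) or not attributes:
            return [(path, instances)]
        attr = best_attribute(instances, attributes)
        rules = []
        for value in {x[attr] for x in instances}:
            subset = [x for x in instances if x[attr] == value]
            remaining = [a for a in attributes if a != attr]
            rules += generate_rules(subset, remaining, is_pure,
                                    best_attribute, path + ((attr, value),))
        return rules
Each returned path is a conjunction of attribute/value tests that defines one subclass.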
Techniques for Generating Rules
- Define the scope of the rules.
- Choose the instances.
- Set the minimum rule correctness.
- Define the minimum rule coverage.
- Choose an attribute significance value.
Instance Typicality
- Typicality scores:
  - Identify prototypical and outlier instances.
  - Select a best set of training instances.
  - Compute individual instance classification confidence scores.
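A sketch assuming typicality is an instance's average similarity to the other members of its class, which matches its use here for spotting prototypes and outliers:
    def typicality(index, cluster, similarity):
        # Mean similarity between one instance and the rest of its class.
        others = [x for i, x in enumerate(cluster) if i != index]
        if not others:
            return 1.0  # assumed: a lone instance is trivially typical
        return sum(similarity(cluster[index], x) for x in others) / len(others)
High scores mark prototypical instances (good training candidates); low scores flag outliers.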
Special Considerations and Features
- Avoid Mining Delays
- The Quick Mine Feature
- Erroneous and Missing Data