DATA MINING - PowerPoint PPT Presentation

About This Presentation

Title:

DATA MINING

Description:

Seniors. Medium. Sunset Years. Equifax. MicroVision Medium - High. Young- Mix. Very Low ... Initial used for game playing strategies for chess games. ... – PowerPoint PPT presentation

Number of Views:35

Avg rating:3.0/5.0

Slides: 65

Provided by: patrickg6

Learn more at: http://www23.homepage.villanova.edu

Category:

more less

Transcript and Presenter's Notes

Title: DATA MINING

1
DATA MINING

Patrick J. Gallagher
March 21, 2006

2
What is Data Mining?
3
DEFININTION

The automated extraction of hidden predictive
information from (large) databases.

4
DATA MINING TECHNIQUES

CLASSICAL
1. Statistics
2. Nearest Neighborhoods 3. Clustering
NEXT GENERATION
1. Decision Trees
2. Neural Networks
3. Rule Induction

5
THE CLASSICS

Techniques discussed will be those that have been
used for decades
They have also used almost all of the time on
existing business problems

6
1. STATISTICSWHAT IS IT
7
STATISTICS branch of mathematics concerning the
collection and the description of data
8

Born of real world problems from business,
biology and gambling
Knowing statistics helps the average business
person make better decisions by allowing them to
figure out risk and uncertainty when all facts
either arent known or can be collected
Has been around for a long, long time (easily a
century)

9
HISTOGRAMSSTATISTICAL SUMMARIZATION
10
Database Information
ID NAME Prediction Age Balance Income Eyes Gender
1 Amy NO 62 0 Medium Brown F
2 Al NO 53 1,800 Medium Green M
3 Betty NO 47 16,543 High Brown F
4 Bob YES 32 45 Medium Green M
5 Carla YES 21 2,300 High Blue F
6 Carl NO 27 5,400 High Brown M
7 Donna YES 50 165 Low Blue F
8 Don YES 46 0 High Blue M
9 Edna YES 27 500 Low Blue F
10 Ed NO 68 1,200 Low Blue M
11
This histogram shows the number of customers with
various eye colors. As you can see, the
histogram can show important information about
the database.
12
WHAT QUESTIONS CAN STATISTICS ANSWER?
13

What patterns are there in my database?
What is the chance that an event will occur?
What patterns are significant?
What is a high level summary of the data that
gives me some idea of what is contained in my
database

14
NOT ALL HISTOGRAMS ARE THIS SIMPLE
15
Complex histograms provide more information
(Predictors)
16
SUMMARY STATISTICS

Max - max value of predictor
Min - minimum value of predictor
Mean - average value of predictor
Median - value for a given predictor that divides
the database as nearly as possible into two
databases of equal number of records.
Mode common value for the predictor
Variance measure of how spread out the values
are from the average value

17
STATISTICS FOR PREDICTIONPrediction ?

A Regression
B Simulation
C Decision

18
PREDICTION REGRESIONA
19
LINEAR REGRESSION

One predictor and a prediction. The relationship
between the two can be mapped on a two
dimensional space and the records plotted for the
prediction values along the Y axis and the
predictor values along the X axis
Seeks to build a predictive model that is a line
that maps between each predictor value to a
prediction value.

20
Sample Linear Regression Predictive Model

The line will take a given value for a predictor
and map it into a given value for a prediction
Equation is Predictionabpredictor
Trick with predictive modeling is to find the
model that best minimizes the error

21
2. NEAREST NEIGHBORWhat does it mean?
22
Nearest Neighbor

In order to predict what a prediction value is in
one record look for records with similar
predictor values in the historical database and
use the prediction value from the record that it
nearest to the unclassified record

23
NEAREST NEIGHBOR EXAMPLE ?
24
EXAMPLE

Determining that people in your
neighborhood have an income of over
100,000 per year
NEAREST NEIGHBOR ASSUMES
Your income is also over 100,000

Prediction for a prediction value in one record
is determined by looking for similar predictor
values in the historical database and use the
prediction value from the record that is nearest
to the unclassified record
(ex salaries of people in your neighborhood)
The techniques are among the easiest to use and
understand because the techniques work similar to
the ways a person thinks
Are among the oldest techniques used in data
mining.

26
NEAREST NEIGHBORPREDICTION TECHNIQUEUSES

Business
Stock Market Data

27
PREDICTIONIN NEAREST NEIGHBOR MEANS

Objects that are NEAR to each other will have
similar prediction values as well.
Thus if you know the prediction value of one of
the objects you can predict it for its nearest
neighbor.

28
BUSINESS

Text Retrieval This particular technique is used
to find other documents that share important
characteristics with those documents that have
been marked as interesting.

29
STOCK MARKET DATA

The input data is just a long series of stock
prices over time without any particular record
that could be considered to be an object.
Example
Predictor Values
10 12 14 15 10 13 11 14 15
Prediction Value
11 (10th number)

30
3. CLUSTERINGWhat does it mean?
31
CLUSTERING

Clustering is a method which like records are
grouped together in order to give the end user a
high level view of what is going on in the data
base and business.

32
CLUSTERINGIn the real world

Two clustering systems are the PRIZM system from
Claritias Corporation and MicroVision from
Equifax Corporation. These companies have
grouped the population by demographic information
into segments that they believe are useful direct
marketing and sales.

33
NAME INCOME AGE EDUCATION VENDOR
Blue Blood Estates Wealthy 35-54 College Claritas Prism
Shot Gun and Pickups Middle 35-64 High School Claritas Prism
Southside City Poor Mix Grade School Claritas Prism
Living Off the Land Middle Poor School Age Families Low Equifax MicroVision
University USA Very Low Young- Mix Medium - High Equifax MicroVision
Sunset Years Medium Seniors Medium Equifax MicroVision
34
CLUSTERING VS NEAREST NEIGHBOR

Nearest Neighbor
Used for prediction as well as consolidation
Space is defined by the problem to be solved.
(Supervised learning technique)
Generally only uses distance metrics to determine
nearness.

Clustering
Used mostly for consolidating data into a
high-level view and general grouping of records
into like behaviors.
Space is defined as default n-dimensional space,
or is defined by the user or is a predefined
space driven by past experience. (Unsupervised
learning technique)
Can use other metrics besides distance to
determine nearest of two records for example
linking two points together.

35
What are the two main types of Clustering
techniques?
36

HIERARCHICAL
NON-HIERARCHICAL
CLUSTERING

37
HIERARCHYofCLUSTERS
38
NON- HIERARCHIAL CLUSTERING

1. Single Pass Methods
2. Reallocation Methods

39
Hierarchical Clustering

It is created by starting either at the top and
subdividing (dividing clustering) or starting at
the bottom with as many clusters as there are
records and merging (agglomerative clustering).
Has advantage over non-hierarchical in that the
clusters are solely by the data and that the
number of clusters can be increased or decreased
by simply moving up and down the hierarchy.

40
NEXT GENERATION

Represent the most often used techniques that
have been developed over the past two decades of
research.
It can be used for either discovering new
information within large databases or for
building predictive models.

41
NEXT GENERATION

1. DECISION TREES
2. NEURAL NETWORK
3. RULE INDUCTION

42
DECISION TREESWhat are they ?
43
DECISION TREES

A predictive model that, as its name implies, can
be viewed as a tree. Specifically, each branch
of the tree is a classification question and the
leaves of the tree are partitions of the dataset
with their classification

44
DECISION TREE EXAMPLE
45
DECISION TREE HISITORY

Similar technologies have been around for almost
20 years and early versions of the algorithms
date back in the 1960s
Originally, these techniques were developed for
statisticians to automate the process of
determining which fields in their database were
actually useful or correlated with the particular
problem that they were trying to understand.

46
DECISION TREE USES

EXPLORATION looks at predictors and values that
are chosen for each split of the tree. Often
times, these predictors provide usable insights
or propose questions that need to be answered.
DATA PREPROCESSING can be used on the first
pass of data mining to create a subset of useful
predictors that can be used in neural networks,
nearest neighbor and normal statistical routines.
PREDICTION used as a by product by
statisticians because decision trees are used for
exploratory analysis.

47
DECISION TREE ALGORITHMS

ID3
CART
CHAID

48
ID3

Developed in late 1970s by J. Ross Quinlan
First Decision Tree algorithm
Based on previous inference systems and concept
learning systems from decades preceding.
Initial used for game playing strategies for
chess games.
Picks predictors and splitting values based on
gain and information that the split/s provide.
The difference between the entropy of the
original segment and the accumulated entropies of
the resulting split segments.

49
ID3 to C4.5 ENHANCEMENTS

Predictors with missing values can still be used.
Predictors with continuous values can be used.
Pruning is introduced
Rule derivation

50
CART

Stands for Classification and Regression Trees
Data exploration and prediction algorithm
developed by Leo Breiman, Jerome Friedman,
Richard Olshen and Charles Stone.
Each predictor is picked on how well it teases
apart the records with different predictions.

51
CHAID

Stands for Chi Square Automatic Interaction
Detector
Similar to CART
It builds a decision tree
Different from CART
In the way it chooses its splits.

52
2. NEURAL NETWORKSWhat is it?
53
NEURAL NETWORK

Computer programs implementing sophisticated
pattern detection and machine learning algorithms
on a computer to build predictive models from
large historical databases.

Artificial neural networks derive their name from
their historical development which started off
with the premise that machines could be made to
think if scientists found ways to mimic the
structure and functioning of the human brain on
the computer.
Greatest breakthroughs in neural networks in
recent years have been in there application to
more mundane real world problems like customer
response prediction or fraud detection.
They technically are considered to learn and
make better predictions by detecting patterns
using analogies in similar ways that humans do.

55
NEURAL NETWORKUSES

Clustering
Outlier Analysis
Example Wine Distributor
Wine distributor store stands out as making
significantly lower profit. Upon further
examination the distributor was delivering
product but not collecting payment.
Feature Extraction

56
Neural Networks(Components)

Node- corresponds to the neuron in the human
brain.
Link- it corresponds to the connections between
neurons.

57
NEURAL NETWORKSample
58
NEURAL NETWORKTYPES

Back Propagation- refers to the propagation of
the error backwards from the output nodes through
the hidden layers and to the input layer.
Kohonen Feature Maps- developed in the 1970s and
are feed forward Neural Network generally with no
hidden layer.
- Used for unsupervised learning and clustering.
Radial Basic Function represent a hybrid
between nearest neighbor and neural network
classification.
- Used for supervised and learning

59
3. RULE INDUCTIONWhat is it?
60
RULE INDUCTION

Is one of the major forms of data mining
and the most common form of knowledge
discovery in unsupervised learning systems.
It mines for a rule that is interesting.

It is a massive undertaking were all possible
patterns are systematically pulled out of the
data and then an accuracy and significance are
added to them that tell the user how strong the
pattern is and how likely it is to occur again.
Rule induction systems are highly automated and
are probably the best of data mining techniques
for exposing all possible predictive patterns in
a database

62
Neural NetworkVSRule Induction

NEURAL NETWORKS
Extremely proficient and saying exactly what must
be done in a predictive task with little
explanation.
Example- Who do I give
credit to and who do I
deny credit to.

RULE INDUCTION
When used for prediction, they are like having a
committee of trusted advisors each with a
slightly different opinion as to what to do but
relatively well grounded reasoning and a good
explanation for why it should be done.

63
What is a RULE?