DATA MINING - PowerPoint PPT Presentation

Slides: 65
Provided by: patrickg6
Transcript and Presenter's Notes


1
DATA MINING
  • Patrick J. Gallagher
  • March 21, 2006

2
What is Data Mining?
3
DEFINITION
  • The automated extraction of hidden predictive
    information from (large) databases.

4
DATA MINING TECHNIQUES
  • CLASSICAL
  • 1. Statistics
  • 2. Nearest Neighbor
  • 3. Clustering
  • NEXT GENERATION
  • 1. Decision Trees
  • 2. Neural Networks
  • 3. Rule Induction

5
THE CLASSICS
  • Techniques discussed will be those that have been
    used for decades
  • They have also been used almost all of the time
    on existing business problems

6
1. STATISTICS: WHAT IS IT?
7
STATISTICS: the branch of mathematics concerning
the collection and the description of data
8
  • Born of real world problems from business,
    biology and gambling
  • Knowing statistics helps the average business
    person make better decisions by allowing them to
    figure out risk and uncertainty when all the
    facts either aren't known or can't be collected
  • Has been around for a long, long time (easily a
    century)

9
HISTOGRAMS: STATISTICAL SUMMARIZATION
10
Database Information
ID  NAME   Prediction  Age  Balance  Income  Eyes   Gender
1   Amy    NO          62   0        Medium  Brown  F
2   Al     NO          53   1,800    Medium  Green  M
3   Betty  NO          47   16,543   High    Brown  F
4   Bob    YES         32   45       Medium  Green  M
5   Carla  YES         21   2,300    High    Blue   F
6   Carl   NO          27   5,400    High    Brown  M
7   Donna  YES         50   165      Low     Blue   F
8   Don    YES         46   0        High    Blue   M
9   Edna   YES         27   500      Low     Blue   F
10  Ed     NO          68   1,200    Low     Blue   M
11
This histogram shows the number of customers with
various eye colors. As you can see, the
histogram can show important information about
the database.
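
The histogram image is not part of this transcript. As a rough sketch, the same counts can be rebuilt from the Eyes column of the Database Information table; the variable names are illustrative only.

```python
from collections import Counter

# Eye colors of the ten customers, in ID order, copied from the
# Database Information table.
eyes = ["Brown", "Green", "Brown", "Green", "Blue",
        "Brown", "Blue", "Blue", "Blue", "Blue"]

# Count how many customers have each eye color.
counts = Counter(eyes)

# Print a simple text histogram, most common color first.
for color, count in counts.most_common():
    print(f"{color:<6} {'#' * count} ({count})")
```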
12
WHAT QUESTIONS CAN STATISTICS ANSWER?
13
  • What patterns are there in my database?
  • What is the chance that an event will occur?
  • What patterns are significant?
  • What is a high-level summary of the data that
    gives me some idea of what is contained in my
    database?

14
NOT ALL HISTOGRAMS ARE THIS SIMPLE
15
Complex histograms provide more information
(Predictors)
16
SUMMARY STATISTICS
  • Max - maximum value of the predictor
  • Min - minimum value of the predictor
  • Mean - average value of the predictor
  • Median - value for a given predictor that divides
    the database as nearly as possible into two
    databases with an equal number of records
  • Mode - most common value for the predictor
  • Variance - measure of how spread out the values
    are from the average value
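
As an illustration, the summary statistics above can be computed for the Age column of the Database Information table with Python's standard `statistics` module. The choice of the population variance (`pvariance`) rather than the sample variance is an assumption; the slides do not say which is meant.

```python
import statistics

# Age column of the ten customers in the Database Information table.
ages = [62, 53, 47, 32, 21, 27, 50, 46, 27, 68]

summary = {
    "max": max(ages),
    "min": min(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),   # middle of the sorted values
    "mode": statistics.mode(ages),       # most common value
    # Population variance; an assumption, see the note above.
    "variance": statistics.pvariance(ages),
}

for name, value in summary.items():
    print(f"{name:<9}{value}")
```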

17
STATISTICS FOR PREDICTION: Prediction = ?
  • A. Regression
  • B. Simulation
  • C. Decision

18
PREDICTION = REGRESSION (A)
19
LINEAR REGRESSION
  • One predictor and one prediction. The
    relationship between the two can be mapped in a
    two-dimensional space, with the records plotted
    by prediction value along the Y axis and
    predictor value along the X axis
  • Seeks to build a predictive model that is a line
    mapping each predictor value to a prediction
    value.

20
Sample Linear Regression Predictive Model
  • The line will take a given value for a predictor
    and map it into a given value for a prediction
  • Equation is Prediction = a + b * predictor
  • Trick with predictive modeling is to find the
    model that best minimizes the error
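
One common way to find the line that minimizes the error is ordinary least squares, which minimizes the sum of squared errors. The sketch below assumes that error measure (the slides do not name one) and uses made-up data points; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line: prediction = a + b * predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b

# Made-up predictor and prediction values lying on the line y = 1 + 2x.
predictors = [1, 2, 3, 4]
predictions = [3, 5, 7, 9]

a, b = fit_line(predictors, predictions)
new_prediction = a + b * 5  # map a new predictor value to a prediction
print(a, b, new_prediction)
```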

21
2. NEAREST NEIGHBOR: What does it mean?
22
Nearest Neighbor
  • In order to predict the prediction value of one
    record, look for records with similar predictor
    values in the historical database and use the
    prediction value from the record that is nearest
    to the unclassified record

23
NEAREST NEIGHBOR EXAMPLE ?
24
EXAMPLE
  • Determining that people in your neighborhood
    have an income of over 100,000 per year
  • NEAREST NEIGHBOR ASSUMES
  • Your income is also over 100,000

25
  • The prediction value for one record is
    determined by looking for similar predictor
    values in the historical database and using the
    prediction value from the record that is nearest
    to the unclassified record
  • (e.g., salaries of people in your neighborhood)
  • The techniques are among the easiest to use and
    understand because they work similarly to the
    way a person thinks
  • Are among the oldest techniques used in data
    mining.
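
A minimal sketch of the idea, assuming the Age and Balance columns of the Database Information table as predictors and plain Euclidean distance as the measure of nearness. Both choices, and the `predict` helper name, are illustrative assumptions.

```python
import math

# (age, balance, prediction) for the ten customers in the
# Database Information table; this is the historical database.
history = [
    (62, 0, "NO"), (53, 1800, "NO"), (47, 16543, "NO"), (32, 45, "YES"),
    (21, 2300, "YES"), (27, 5400, "NO"), (50, 165, "YES"), (46, 0, "YES"),
    (27, 500, "YES"), (68, 1200, "NO"),
]

def predict(age, balance):
    """Return the prediction value of the nearest historical record."""
    nearest = min(
        history,
        key=lambda rec: math.dist((age, balance), (rec[0], rec[1])),
    )
    return nearest[2]

# An unclassified 30-year-old with a balance of 100 is nearest to Bob.
print(predict(30, 100))
```

Because balance spans a much wider range than age, it dominates the unscaled distance here; in practice the predictors would usually be rescaled first.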

26
NEAREST NEIGHBOR PREDICTION TECHNIQUE: USES
  • Business
  • Stock Market Data

27
PREDICTION IN NEAREST NEIGHBOR MEANS
  • Objects that are NEAR to each other will have
    similar prediction values as well.
  • Thus if you know the prediction value of one of
    the objects you can predict it for its nearest
    neighbor.

28
BUSINESS
  • Text Retrieval: This technique is used to find
    other documents that share important
    characteristics with those documents that have
    been marked as interesting.

29
STOCK MARKET DATA
  • The input data is just a long series of stock
    prices over time, without any particular record
    that could be considered to be an object.
  • Example
  • Predictor Values: 10 12 14 15 10 13 11 14 15
  • Prediction Value: 11 (10th number)
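
One way to turn such a series into records that nearest neighbor can compare is a sliding window: each run of consecutive prices becomes the predictor values and the price that follows becomes the prediction value. The sketch below assumes that approach; `make_records` and `window` are hypothetical names.

```python
def make_records(series, window):
    """Cut a price series into (predictor values, prediction value) records."""
    records = []
    for start in range(len(series) - window):
        predictors = series[start:start + window]
        prediction = series[start + window]  # the price right after the window
        records.append((predictors, prediction))
    return records

# The nine predictor values from the slide followed by the 10th number.
prices = [10, 12, 14, 15, 10, 13, 11, 14, 15, 11]

# A window of 9 reproduces the slide's single example record.
records = make_records(prices, window=9)
print(records)

# A shorter window yields several records from the same series.
short_records = make_records(prices, window=3)
print(len(short_records))
```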

30
3. CLUSTERING: What does it mean?
31
CLUSTERING
  • Clustering is a method by which like records are
    grouped together in order to give the end user a
    high-level view of what is going on in the
    database and the business.

32
CLUSTERING: In the real world
  • Two clustering systems are the PRIZM system from
    Claritas Corporation and MicroVision from
    Equifax Corporation. These companies have
    grouped the population by demographic information
    into segments that they believe are useful for
    direct marketing and sales.

33
NAME                  INCOME       AGE                  EDUCATION      VENDOR
Blue Blood Estates    Wealthy      35-54                College        Claritas PRIZM
Shotguns and Pickups  Middle       35-64                High School    Claritas PRIZM
Southside City        Poor         Mix                  Grade School   Claritas PRIZM
Living Off the Land   Middle-Poor  School Age Families  Low            Equifax MicroVision
University USA        Very Low     Young - Mix          Medium - High  Equifax MicroVision
Sunset Years          Medium       Seniors              Medium         Equifax MicroVision
34
CLUSTERING VS NEAREST NEIGHBOR
  • Nearest Neighbor
  • Used for prediction as well as consolidation
  • Space is defined by the problem to be solved.
    (Supervised learning technique)
  • Generally only uses distance metrics to determine
    nearness.
  • Clustering
  • Used mostly for consolidating data into a
    high-level view and general grouping of records
    into like behaviors.
  • Space is defined as default n-dimensional space,
    or is defined by the user or is a predefined
    space driven by past experience. (Unsupervised
    learning technique)
  • Can use metrics other than distance to determine
    the nearness of two records, for example
    linking two points together.

35
What are the two main types of Clustering
techniques?
36
  • HIERARCHICAL CLUSTERING
  • NON-HIERARCHICAL CLUSTERING

37
HIERARCHY of CLUSTERS
38
NON-HIERARCHICAL CLUSTERING
  • 1. Single Pass Methods
  • 2. Reallocation Methods

39
Hierarchical Clustering
  • It is created by starting either at the top and
    subdividing (divisive clustering) or starting at
    the bottom with as many clusters as there are
    records and merging (agglomerative clustering).
  • Has an advantage over non-hierarchical
    clustering in that the clusters are defined
    solely by the data and that the number of
    clusters can be increased or decreased by simply
    moving up and down the hierarchy.
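
The bottom-up (agglomerative) variant can be sketched as follows. The sketch assumes one-dimensional records (the Age column of the Database Information table) and single-linkage merging, where the distance between two clusters is the distance between their closest members. Neither choice comes from the slides, and `agglomerate` is a hypothetical name.

```python
def agglomerate(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per record
    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

ages = [62, 53, 47, 32, 21, 27, 50, 46, 27, 68]

# Stopping at different levels of the hierarchy changes the cluster count.
three = sorted(sorted(c) for c in agglomerate(ages, 3))
two = sorted(sorted(c) for c in agglomerate(ages, 2))
print(three)
print(two)
```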

40
NEXT GENERATION
  • Represent the most often used techniques that
    have been developed over the past two decades of
    research.
  • They can be used either for discovering new
    information within large databases or for
    building predictive models.

41
NEXT GENERATION
  • 1. DECISION TREES
  • 2. NEURAL NETWORK
  • 3. RULE INDUCTION

42
DECISION TREES: What are they?
43
DECISION TREES
  • A predictive model that, as its name implies, can
    be viewed as a tree. Specifically, each branch
    of the tree is a classification question and the
    leaves of the tree are partitions of the dataset
    with their classification

44
DECISION TREE EXAMPLE
45
DECISION TREE HISTORY
  • Similar technologies have been around for almost
    20 years, and early versions of the algorithms
    date back to the 1960s
  • Originally, these techniques were developed for
    statisticians to automate the process of
    determining which fields in their database were
    actually useful or correlated with the particular
    problem that they were trying to understand.

46
DECISION TREE USES
  • EXPLORATION: looks at the predictors and values
    that are chosen for each split of the tree.
    Oftentimes, these predictors provide usable
    insights or propose questions that need to be
    answered.
  • DATA PREPROCESSING: can be used on the first
    pass of data mining to create a subset of useful
    predictors that can be used in neural networks,
    nearest neighbor and normal statistical routines.
  • PREDICTION: used as a by-product by
    statisticians, because decision trees are used
    primarily for exploratory analysis.

47
DECISION TREE ALGORITHMS
  • ID3
  • CART
  • CHAID

48
ID3
  • Developed in the late 1970s by J. Ross Quinlan
  • One of the earliest decision tree algorithms
  • Based on inference systems and concept learning
    systems from the preceding decades.
  • Initially used to learn game-playing strategies
    for chess.
  • Picks predictors and splitting values based on
    the gain in information that the split or splits
    provide.
  • Gain is the difference between the entropy of
    the original segment and the accumulated
    entropies of the resulting split segments.
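
As a sketch of this gain calculation, the code below compares two candidate predictors (Eyes and Gender) from the Database Information table. It assumes the "accumulated" entropy is the size-weighted sum of the segment entropies, which is the usual reading; the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of prediction values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, predictor):
    """Entropy of the original segment minus the weighted split entropies."""
    labels = [row["prediction"] for row in rows]
    segments = {}
    for row in rows:
        segments.setdefault(row[predictor], []).append(row["prediction"])
    weighted = sum(len(seg) / len(rows) * entropy(seg)
                   for seg in segments.values())
    return entropy(labels) - weighted

# (eyes, gender, prediction) for customers 1-10 in the table.
rows = [dict(eyes=e, gender=g, prediction=p) for e, g, p in [
    ("Brown", "F", "NO"), ("Green", "M", "NO"), ("Brown", "F", "NO"),
    ("Green", "M", "YES"), ("Blue", "F", "YES"), ("Brown", "M", "NO"),
    ("Blue", "F", "YES"), ("Blue", "M", "YES"), ("Blue", "F", "YES"),
    ("Blue", "M", "NO"),
]]

gain_eyes = information_gain(rows, "eyes")
gain_gender = information_gain(rows, "gender")
print(round(gain_eyes, 3), round(gain_gender, 3))
```

On this small table, Eyes yields the larger gain, so a gain-based algorithm would prefer it over Gender for the first split.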

49
ID3 to C4.5 ENHANCEMENTS
  • Predictors with missing values can still be used.
  • Predictors with continuous values can be used.
  • Pruning is introduced
  • Rule derivation

50
CART
  • Stands for Classification and Regression Trees
  • Data exploration and prediction algorithm
    developed by Leo Breiman, Jerome Friedman,
    Richard Olshen and Charles Stone.
  • Each predictor is picked on how well it teases
    apart the records with different predictions.
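
The slides do not say how CART scores a split. A measure commonly associated with CART for classification is the Gini impurity, so the sketch below assumes it and evaluates one binary split of the Database Information table (blue eyes versus all other eye colors). The function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 when all records share one prediction value."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted impurity of the two segments produced by a split."""
    total = len(left) + len(right)
    return (len(left) / total * gini(left)
            + len(right) / total * gini(right))

# Prediction values from the table, split on Eyes == Blue.
blue = ["YES", "YES", "YES", "YES", "NO"]      # Carla, Donna, Don, Edna, Ed
not_blue = ["NO", "NO", "NO", "YES", "NO"]     # Amy, Al, Betty, Bob, Carl

before = gini(blue + not_blue)
after = split_gini(blue, not_blue)
print(before, after)  # lower impurity after the split means purer segments
```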

51
CHAID
  • Stands for Chi-Square Automatic Interaction
    Detector
  • Similar to CART in that it builds a decision
    tree
  • Different from CART in the way it chooses its
    splits.

52
2. NEURAL NETWORKS: What is it?
53
NEURAL NETWORK
  • Computer programs implementing sophisticated
    pattern detection and machine learning algorithms
    to build predictive models from large historical
    databases.

54
  • Artificial neural networks derive their name from
    their historical development which started off
    with the premise that machines could be made to
    think if scientists found ways to mimic the
    structure and functioning of the human brain on
    the computer.
  • The greatest breakthroughs in neural networks in
    recent years have been in their application to
    more mundane real-world problems like customer
    response prediction or fraud detection.
  • They are considered to learn, making better
    predictions by detecting patterns in ways
    analogous to how humans do.

55
NEURAL NETWORK: USES
  • Clustering
  • Outlier Analysis
  • Example: Wine Distributor
  • A wine distributor stands out as making
    significantly lower profit. Upon further
    examination, the distributor was delivering
    product but not collecting payment.
  • Feature Extraction

56
Neural Networks (Components)
  • Node: corresponds to a neuron in the human
    brain.
  • Link: corresponds to the connections between
    neurons.
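
A rough sketch of how the two components interact: each link carries a weight, and a node combines its weighted inputs into a single output value. The sigmoid activation and the example weights are assumptions, not taken from the slides.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights):
    """One node: sum the inputs over their weighted links, then squash."""
    total = sum(value * weight for value, weight in zip(inputs, weights))
    return sigmoid(total)

# Two input values arriving over two links with made-up weights.
output = node_output([1.0, 0.5], [0.4, -0.8])
print(output)
```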

57
NEURAL NETWORK: Sample
58
NEURAL NETWORK: TYPES
  • Back Propagation: refers to the propagation of
    the error backwards from the output nodes through
    the hidden layers and to the input layer.
  • Kohonen Feature Maps: developed in the 1970s;
    they are feed-forward neural networks, generally
    with no hidden layer.
  • - Used for unsupervised learning and clustering.
  • Radial Basis Function networks: represent a
    hybrid between nearest neighbor and neural
    network classification.
  • - Used for supervised and unsupervised learning
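
To make the backward propagation of error concrete, here is a minimal sketch of one training step for a tiny network with two inputs, two hidden nodes and one output node. Sigmoid nodes, squared error, the absence of bias terms, the learning rate and the starting weights are all illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, target, w_hidden, w_out, rate=0.5):
    """One forward pass, one backward pass, one weight update."""
    # Forward: inputs -> hidden nodes -> output node.
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, ws))) for ws in w_hidden]
    output = sigmoid(sum(h * w for h, w in zip(hidden, w_out)))
    # Error signal at the output node (derivative of the squared error).
    delta_out = (output - target) * output * (1 - output)
    # Propagate the error signal backwards to each hidden node.
    delta_hidden = [delta_out * w * h * (1 - h)
                    for w, h in zip(w_out, hidden)]
    # Adjust every link weight against its share of the error.
    new_w_out = [w - rate * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w - rate * dh * xi for w, xi in zip(ws, x)]
                    for ws, dh in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out, 0.5 * (output - target) ** 2

# One made-up training example and made-up starting weights.
x, target = [1.0, 0.0], 1.0
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.2, -0.1]

errors = []
for _ in range(200):
    w_hidden, w_out, error = train_step(x, target, w_hidden, w_out)
    errors.append(error)

print(errors[0], errors[-1])  # the error should shrink as training proceeds
```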

59
3. RULE INDUCTION: What is it?
60
RULE INDUCTION
  • Is one of the major forms of data mining and the
    most common form of knowledge discovery in
    unsupervised learning systems.
  • It mines for a rule that is interesting.

61
  • It is a massive undertaking where all possible
    patterns are systematically pulled out of the
    data, and then an accuracy and a significance
    are attached to them that tell the user how
    strong the pattern is and how likely it is to
    occur again.
  • Rule induction systems are highly automated and
    are probably the best of the data mining
    techniques for exposing all possible predictive
    patterns in a database.

62
Neural Network VS Rule Induction
  • NEURAL NETWORKS
  • Extremely proficient at saying exactly what must
    be done in a predictive task, with little
    explanation.
  • Example: Who do I give credit to and who do I
    deny credit to?
  • RULE INDUCTION
  • When used for prediction, they are like having a
    committee of trusted advisors each with a
    slightly different opinion as to what to do but
    relatively well grounded reasoning and a good
    explanation for why it should be done.

63
What is a RULE?
  • if this and this and this then this.
  • EXAMPLES
  • - If paper plates then plastic forks.
  • - If dip then potato chips.

64
Presenter
  • Dr. Balaji Padmanabhan
  • Assistant Professor of Operations and Information
    Management
  • The Wharton School, University of Pennsylvania
  • Teaches
  • Enabling (Information) Technologies
  • Data Mining / Decision Support Systems
  • Introduction to the Computer as an Analysis Tool