COT5230 Data Mining

1
COT5230 Data Mining
  • Week 2
  • Data Mining Tasks

MONASH, Australia's International University
2
Lecture Outline
  • Market Basket Analysis
  • Machine Learning - Basic Concepts

3
Data Mining Tasks 1
  • Various taxonomies exist. Berry and Linoff define six
    tasks:
  • Classification
  • Estimation
  • Prediction
  • Affinity Grouping
  • Clustering
  • Description

4
Data Mining Tasks 2
  • The tasks are also referred to as operations.
    Cabena et al. define four operations:
  • Predictive Modeling
  • Database Segmentation
  • Link Analysis
  • Deviation Detection

5
Affinity Grouping
  • Affinity grouping is also referred to as Market
    Basket Analysis
  • A common example is the discovery of which items
    are frequently sold together at a supermarket. If
    this is known, decisions can be made about
  • arranging items on shelves
  • which items should be promoted together
  • which items should not simultaneously be
    discounted

6
Market Basket Analysis
Example rule: When a customer buys a shirt, in 70% of
cases he or she will also buy a tie! We find this happens
in 13.5% of all purchases.
  • Rule body (condition): the customer buys a shirt
  • Rule head (result): he or she will also buy a tie
  • Confidence: 70%
  • Support: 13.5%
7
The Usefulness of Market Basket Analysis
  • Some rules are useful: unknown, unexpected, and
    indicative of some action to take.
  • Some rules are trivial: known by anyone familiar
    with the business.
  • Some rules are inexplicable: they seem to have no
    explanation and do not suggest a course of action.
  • "The key to success in business is to know something
    that nobody else knows." Aristotle Onassis

8
Co-Occurrence Table
Customer Items 1 orange juice (OJ),
cola 2 milk, orange juice, window
cleaner 3 orange juice, detergent 4 orange
juice, detergent, cola 5 window cleaner,
cola OJ Cleaner Milk Cola Detergent OJ 4
1 1 2 2 Cleaner 1 2
1 1 0 Milk 1 1 1 0 0 Cola 2
1 0 3 1 Detergent 2 0 0 1 2
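The matrix can be rebuilt mechanically. Below is a minimal
Python sketch using the five transactions above (item names
abbreviated); the diagonal holds each item's frequency and
the off-diagonal cells hold pair counts.

    from itertools import combinations
    from collections import Counter

    # The five transactions from the slide (names abbreviated).
    transactions = [
        {"OJ", "cola"},
        {"milk", "OJ", "cleaner"},
        {"OJ", "detergent"},
        {"OJ", "detergent", "cola"},
        {"cleaner", "cola"},
    ]

    items = sorted({item for t in transactions for item in t})
    counts = Counter()
    for t in transactions:
        for item in t:
            counts[(item, item)] += 1   # diagonal: how often the item occurs
        for a, b in combinations(t, 2):
            counts[(a, b)] += 1         # off-diagonal: pair co-occurrence
            counts[(b, a)] += 1

    # Print the co-occurrence matrix.
    print("".join(f"{col:>10}" for col in [""] + items))
    for row in items:
        print(f"{row:>10}" + "".join(f"{counts[(row, col)]:>10}" for col in items))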
9
The Process for Market Basket Analysis
  • A co-occurrence cube would show associations in
    three dimensions; more than three is hard to visualize
  • We must
  • Choose the right set of items
  • Generate rules by deciphering the counts in the
    co-occurrence matrix
  • Overcome the practical limits imposed by many
    items in large numbers of transactions

10
Choosing the Right Set of Items
  • Choosing the right level of detail (the creation
    of classes and a taxonomy)
  • Virtual items may be added to take advantage of
    information that goes beyond the taxonomy
  • Anonymous versus signed transactions

11
What is a Rule?
If condition then result.
Note: "If nappies and Thursday then beer" is usually
better than "If Thursday then nappies and beer" (in the
sense that it is more actionable) because it has just one
item in the result. If a 3-way combination is the most
common, then consider rules with just one item in the
result, e.g.
If A and B, then C
If A and C, then B
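As an illustration (not from the slides), a minimal Python
sketch that enumerates all single-result rules for a 3-way
combination:

    itemset = {"A", "B", "C"}
    for result in sorted(itemset):
        condition = sorted(itemset - {result})
        # Each rule keeps just one item in the result.
        print(f"If {' and '.join(condition)}, then {result}")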
12
Is the Rule a Useful Predictor? - 1
  • Confidence is the ratio of the number of
    transactions with all the items in the rule to
    the number of transactions with just the items in
    the condition. Consider "if B and C then A"
  • If this rule has a confidence of 0.33, it means
    that when B and C occur in a transaction, there
    is a 33% chance that A also occurs (see the
    sketch below)
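A minimal sketch of this calculation, assuming transactions
are represented as Python sets (the three transactions are
made up to reproduce the 0.33 figure):

    def confidence(transactions, condition, result):
        # Transactions containing every item in the condition.
        has_condition = [t for t in transactions if condition <= t]
        # Of those, the ones that also contain the result.
        has_all = [t for t in has_condition if result <= t]
        return len(has_all) / len(has_condition)

    transactions = [{"A", "B", "C"}, {"B", "C"}, {"B", "C", "D"}]
    print(confidence(transactions, {"B", "C"}, {"A"}))  # 0.333...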

13
Is the Rule a Useful Predictor? - 2
  • Consider the following table of probabilities of
    items and their combinations

14
Is the Rule a Useful Predictor? - 3
  • Now consider the following rules. It is
    tempting to choose "If B and C then A", because
    it is the most confident (33%), but there is a
    problem

15
Is the Rule a Useful Predictor? - 4
  • This rule is actually worse than just saying that
    A randomly occurs in the transaction, which
    happens 45% of the time
  • A measure called improvement indicates whether
    the rule predicts the result better than just
    assuming the result in the first place:

    improvement = p(condition and result) /
                  (p(condition) × p(result))
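Continuing the confidence sketch above, improvement can be
computed the same way, estimating each probability as a
relative frequency (a sketch, not the lecture's code):

    def p(transactions, items):
        # Fraction of transactions containing every item in `items`.
        return sum(items <= t for t in transactions) / len(transactions)

    def improvement(transactions, condition, result):
        return p(transactions, condition | result) / (
            p(transactions, condition) * p(transactions, result))

    # improvement > 1 means the rule beats assuming the result outright.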
16
Is the Rule a Useful Predictor? - 5
  • Improvement measures how much better a rule is at
    predicting a result than just assuming the result
    in the first place
  • When improvement > 1, the rule is better at
    predicting the result than random chance

17
Is the Rule a Useful Predictor? - 6
  • Consider the improvement for our rules
  • None of the rules with three items shows any
    improvement; the best rule in the data actually
    has only two items: "If A then B". A predicts the
    occurrence of B 1.31 times better than chance.

18
Is the Rule a Useful Predictor? - 7
  • When improvement < 1, negating the result
    produces a better rule. For example, "if B and C
    then not A" has a confidence of 0.67 and thus an
    improvement of 0.67/0.55 = 1.22
  • Negated rules may not be as useful as the
    original association rules when it comes to
    acting on the results

19
Strengths and Weaknesses
  • Strengths
  • Clear, understandable results
  • Supports undirected data mining
  • Works on variable length data
  • Is simple to understand
  • Weaknesses
  • Requires exponentially more computational effort
    as the problem size grows
  • Suits items in transactions but not all problems
    fit this description
  • It can be difficult to determine the right set of
    items to analyse
  • It does not handle rare items well; simply
    considering the level of support will exclude
    these items

20
Machine Learning
  • "A general law can never be verified by a finite
    number of observations. It can, however, be
    falsified by only one observation." Karl Popper
  • The patterns that machine learning algorithms
    find can never be definitive theories
  • Any results discovered must be tested for
    statistical relevance

21
The Empirical Cycle
(Diagram: the empirical cycle linking Theory, Prediction,
Observation, and Analysis.)
22
Concept Learning - 1
  • Example the concept of a wombat
  • a learning algorithm could consider many animals
    and be advised in each case whether it is a
    wombat or not. From this a definition would be
    deduced.
  • The definition is
  • complete if it recognizes all instances of a
    concept ( in this case a wombat).
  • consistent if it does not classify any negative
    examples as falling under the concept.
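A minimal sketch of the two checks, assuming labelled
examples as (animal, is_wombat) pairs and a candidate
definition as a Python predicate (all names hypothetical):

    def is_complete(definition, examples):
        # Complete: recognizes every positive example.
        return all(definition(x) for x, is_wombat in examples if is_wombat)

    def is_consistent(definition, examples):
        # Consistent: rejects every negative example.
        return all(not definition(x) for x, is_wombat in examples if not is_wombat)

    examples = [({"legs": 4, "burrows": True}, True),
                ({"legs": 2, "burrows": False}, False)]
    definition = lambda a: a["legs"] == 4 and a["burrows"]
    print(is_complete(definition, examples),
          is_consistent(definition, examples))  # True True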

23
Concept Learning - 2
  • An incomplete definition is too narrow and would
    not recognize some wombats.
  • An inconsistent definition is too broad and would
    classify some non-wombats as wombats.
  • A bad definition could be both inconsistent and
    incomplete.

24
Hypothesis Characteristics - 1
  • Classification Accuracy
  • 1 in a million wrong is better than 1 in 10
    wrong.
  • Transparency
  • A person is able to understand the hypothesis
    generated. It is then much easier to take action.

25
Hypothesis Characteristics - 2
  • Statistical Significance
  • The hypothesis must perform better than the naïve
    prediction. (Imagine if 80% of animals considered
    are wombats and the theory is that all animals
    are wombats; then the theory is right 80% of the
    time! But nothing has been learnt. See the sketch
    after this list.)
  • Information Content
  • We look for a rich hypothesis. The more
    information contained (while still being
    transparent) the more understanding is gained and
    the easier it is to formulate an action plan.
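A tiny sketch of the naïve-baseline comparison, using the
80/20 split from the wombat example above:

    labels = ["wombat"] * 80 + ["other"] * 20      # 80% of animals are wombats
    majority = max(set(labels), key=labels.count)  # naive theory: always predict this
    baseline = labels.count(majority) / len(labels)
    print(majority, baseline)  # wombat 0.8; a useful hypothesis must beat this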

26
Complexity of Search Space
  • Machine learning can be considered as a search
    problem. We wish to find the correct hypothesis
    from among many.
  • If there are only a few hypotheses we could try
    them all but if there are an infinite number we
    need a better strategy.
  • If we have a measure of the quality of a
    hypothesis we can use that measure to select
    potentially good hypotheses and, based on the
    selection, try to improve them (hill-climbing
    search; a sketch follows this list)
  • Consider the metaphor of the kangaroo in the
    mist.
  • This demonstrates that it is important to know
    the complexity of the search space, and that
    some pattern recognition problems are almost
    impossible to solve.
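A minimal sketch of hill-climbing over hypotheses, assuming
hypothetical quality() and neighbours() functions; the
kangaroo-in-the-mist point is that such a search sees only
its local neighbourhood and can stall on a local optimum.

    def hill_climb(start, neighbours, quality):
        current = start
        while True:
            candidates = list(neighbours(current))
            if not candidates:
                return current
            best = max(candidates, key=quality)
            if quality(best) <= quality(current):
                return current      # local optimum: no neighbour is better
            current = best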

27
Learning as Compression
  • We have learnt something if we have an algorithm
    that creates a description of the data that is
    shorter than the original data set
  • A knowledge representation is required that is
    incrementally compressible and an algorithm that
    can achieve that incremental compression
  • The file-in could be a relational table and the
    file-out a prediction or a suggested clustering

(Diagram: File-in → Algorithm → File-out.)
28
Types of Input Message (File-in)
  • Unstructured or random messages
  • Highly structured messages with patterns that are
    easy to find
  • Highly structured messages that are difficult to
    decipher
  • Partly structured messages
  • Most data sets considered by data mining are in
    this class. There are patterns to be found but
    the data sets are not highly regular

29
Minimum Message Length Principle
  • The best theory to explain a set of data is the
    one that minimizes the sum of the length, in
    bits, of the description of the theory, plus the
    length of the data when encoded with the help of
    the theory.
  • Put another way, if regularity is found in a data
    set and the description of this regularity
    together with the description of the exceptions
    is still shorter than the original data set, then
    something of value has been found.

(Diagram: the original data set is replaced by a theory
plus the data set coded with the help of the theory.)
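As a toy illustration (not from the lecture), the
comparison can be spelt out by measuring lengths directly;
the theory plus the exceptions it cannot explain must come
out shorter than the raw data:

    data = "AB" * 100 + "XQ"            # mostly regular, with an exception
    theory = "repeat 'AB' 100 times"    # hypothetical description of the regularity
    coded = "then 'XQ'"                 # the data encoded with the theory's help
    raw_bits = 8 * len(data)
    mml_bits = 8 * (len(theory) + len(coded))
    print(raw_bits, mml_bits, mml_bits < raw_bits)  # the theory wins if True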
30
Noise and Redundancy
  • The distortion or mutation of a message is the
    number of bits that are corrupted
  • Making the message longer by including redundant
    information can ensure that a message is received
    correctly even in the presence of noise (see the
    sketch after this list)
  • Some pattern recognition algorithms cope well
    with the presence of noise, others do not
  • We could consider a database which lacks
    integrity to contain a large amount of noise
  • Patterns may exist for a small percentage of the
    data due solely to noise
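A minimal sketch of redundancy defeating noise, using a 3x
repetition code (an illustration, not from the lecture):
each bit is sent three times and recovered by majority vote.

    def encode(bits):
        # Send each bit three times.
        return [b for bit in bits for b in (bit, bit, bit)]

    def decode(coded):
        # Majority vote over each group of three received bits.
        return [int(sum(coded[i:i + 3]) >= 2) for i in range(0, len(coded), 3)]

    sent = encode([1, 0, 1])
    sent[1] = 0                 # noise corrupts one bit
    print(decode(sent))         # still [1, 0, 1]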