Title: COT5230 Data Mining
Monash - Australia's International University
Lecture Outline
- Market Basket Analysis
- Machine Learning - Basic Concepts
Data Mining Tasks 1
- Various taxonomies exist. Berry and Linoff define six tasks:
- Classification
- Estimation
- Prediction
- Affinity Grouping
- Clustering
- Description
Data Mining Tasks 2
- The tasks are also referred to as operations. Cabena et al. define four operations:
- Predictive Modeling
- Database Segmentation
- Link Analysis
- Deviation Detection
Affinity Grouping
- Affinity grouping is also referred to as Market Basket Analysis
- A common example is the discovery of which items are frequently sold together at a supermarket. If this is known, decisions can be made about
- arranging items on shelves
- which items should be promoted together
- which items should not simultaneously be discounted
Market Basket Analysis
- Example rule: "When a customer buys a shirt, in 70% of cases he or she will also buy a tie! We find this happens in 13.5% of all purchases."
- Here "buys a shirt" is the rule body, "will also buy a tie" is the rule head, 70% is the confidence of the rule, and 13.5% is its support.
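- As a concrete illustration, support and confidence can be computed directly from a list of transactions. A minimal sketch in Python (the transactions below are invented; they are not the data behind the 70%/13.5% figures):

    def support(transactions, items):
        """Fraction of all transactions containing every item in `items`."""
        items = set(items)
        return sum(items <= t for t in transactions) / len(transactions)

    def confidence(transactions, body, head):
        """Of the transactions matching the body, the fraction that also match the head."""
        return support(transactions, set(body) | set(head)) / support(transactions, body)

    transactions = [{"shirt", "tie"}, {"shirt", "tie", "belt"}, {"shirt"}, {"belt"}]
    print(support(transactions, {"shirt", "tie"}))        # 0.5
    print(confidence(transactions, {"shirt"}, {"tie"}))   # 2/3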
The Usefulness of Market Basket Analysis
- Some rules are useful: unknown, unexpected, and indicative of some action to take.
- Some rules are trivial: known by anyone familiar with the business.
- Some rules are inexplicable: they seem to have no explanation and do not suggest a course of action.
- "The key to success in business is to know something that nobody else knows" - Aristotle Onassis
Co-Occurrence Table

Customer  Items
1         orange juice (OJ), cola
2         milk, orange juice, window cleaner
3         orange juice, detergent
4         orange juice, detergent, cola
5         window cleaner, cola

            OJ  Cleaner  Milk  Cola  Detergent
OJ           4     1      1     2      2
Cleaner      1     2      1     1      0
Milk         1     1      1     0      0
Cola         2     1      0     3      1
Detergent    2     0      0     1      2
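- The matrix above can be reproduced mechanically. A small sketch using the five transactions listed (item names abbreviated):

    transactions = [
        {"OJ", "cola"},
        {"milk", "OJ", "cleaner"},
        {"OJ", "detergent"},
        {"OJ", "detergent", "cola"},
        {"cleaner", "cola"},
    ]
    items = ["OJ", "cleaner", "milk", "cola", "detergent"]
    counts = {(a, b): 0 for a in items for b in items}
    for t in transactions:
        for a in t:
            for b in t:
                counts[(a, b)] += 1   # diagonal = item frequency, off-diagonal = pair count
    for a in items:
        print(a, [counts[(a, b)] for b in items])   # prints the co-occurrence matrix above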
The Process for Market Basket Analysis
- A co-occurrence cube would show associations in three dimensions - hard to visualize more
- We must
- Choose the right set of items
- Generate rules by deciphering the counts in the co-occurrence matrix
- Overcome the practical limits imposed by many items in large numbers of transactions
Choosing the Right Set of Items
- Choosing the right level of detail (the creation of classes and a taxonomy)
- Virtual items may be added to take advantage of information that goes beyond the taxonomy
- Anonymous versus signed transactions
What is a Rule?
- If condition then result.
- Note: "If nappies and Thursday then beer" is usually better (in the sense that it is more actionable) than "If Thursday then nappies and beer", because it has just one item in the result.
- If a 3-way combination is the most common, then consider rules with just one item in the result, e.g.
- If A and B, then C
- If A and C, then B
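- A tiny sketch of this heuristic: given a frequent 3-way combination, enumerate every rule with a single item in the result:

    itemset = {"A", "B", "C"}              # the most common 3-way combination
    for result in sorted(itemset):
        condition = sorted(itemset - {result})
        print(f"If {' and '.join(condition)}, then {result}")
    # If B and C, then A / If A and C, then B / If A and B, then C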
Is the Rule a Useful Predictor? - 1
- Confidence is the ratio of the number of transactions with all the items in the rule to the number of transactions with just the items in the condition. Consider: if B and C then A
- If this rule has a confidence of 0.33, it means that when B and C occur in a transaction, there is a 33% chance that A also occurs.
Is the Rule a Useful Predictor? - 2
- Consider the following table of probabilities of items and their combinations
Is the Rule a Useful Predictor? - 3
- Now consider the following rules. It is tempting to choose "if B and C then A", because it is the most confident (33%) - but there is a problem
Is the Rule a Useful Predictor? - 4
- This rule is actually worse than just saying that A randomly occurs in the transaction - which happens 45% of the time
- A measure called improvement indicates whether the rule predicts the result better than just assuming the result in the first place:

improvement = p(condition and result) / (p(condition) × p(result))
Is the Rule a Useful Predictor? - 5
- Improvement measures how much better a rule is at predicting a result than just assuming the result in the first place
- When improvement > 1, the rule is better at predicting the result than random chance
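- A quick numeric check of the definition, using the figures quoted above (confidence 0.33 for "if B and C then A", with A occurring in 45% of transactions). Note that the formula reduces to confidence divided by p(result):

    p_a = 0.45        # A occurs in 45% of transactions
    conf = 0.33       # confidence of "if B and C then A"
    improvement = conf / p_a          # = p(cond and result) / (p(cond) * p(result))
    print(round(improvement, 2))      # 0.73 - below 1, so the rule is worse than chance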
Is the Rule a Useful Predictor? - 6
- Consider the improvement for our rules
- None of the rules with three items shows any improvement - the best rule in the data actually has only two items: "if A then B". A predicts the occurrence of B 1.31 times better than chance.
Is the Rule a Useful Predictor? - 7
- When improvement < 1, negating the result produces a better rule. For example, "if B and C then not A" has a confidence of 0.67 and thus an improvement of 0.67/0.55 = 1.22
- Negated rules may not be as useful as the original association rules when it comes to acting on the results
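- Checking the negated rule the same way, with the same figures as before:

    p_a = 0.45
    conf_not_a = 1 - 0.33                   # 0.67: when B and C occur, A is absent 67% of the time
    improvement = conf_not_a / (1 - p_a)    # 0.67 / 0.55
    print(round(improvement, 2))            # 1.22 - better than chance, as stated above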
Strengths and Weaknesses
- Strengths
- Clear, understandable results
- Supports undirected data mining
- Works on variable-length data
- Is simple to understand
- Weaknesses
- Requires exponentially more computational effort as the problem size grows
- Suits items in transactions, but not all problems fit this description
- It can be difficult to determine the right set of items to analyze
- It does not handle rare items well; simply considering the level of support will exclude these items
Machine Learning
- "A general law can never be verified by a finite number of observations. It can, however, be falsified by only one observation." - Karl Popper
- The patterns that machine learning algorithms find can never be definitive theories
- Any results discovered must be tested for statistical relevance
The Empirical Cycle
[Diagram: the empirical cycle - theory, prediction, observation and analysis feeding back into one another]
Concept Learning - 1
- Example: the concept of a wombat
- A learning algorithm could consider many animals and be advised in each case whether it is a wombat or not. From this a definition would be deduced.
- The definition is
- complete if it recognizes all instances of a concept (in this case, a wombat)
- consistent if it does not classify any negative examples as falling under the concept
Concept Learning - 2
- An incomplete definition is too narrow and would not recognize some wombats.
- An inconsistent definition is too broad and would classify some non-wombats as wombats.
- A bad definition could be both inconsistent and incomplete.
Hypothesis Characteristics - 1
- Classification Accuracy
- 1 in a million wrong is better than 1 in 10 wrong.
- Transparency
- A person is able to understand the hypothesis generated. It is then much easier to take action.
Hypothesis Characteristics - 2
- Statistical Significance
- The hypothesis must perform better than the naïve prediction; a baseline check of this kind is sketched after this list. (Imagine if 80% of the animals considered are wombats and the theory is that all animals are wombats - then the theory is right 80% of the time! But nothing has been learnt.)
- Information Content
- We look for a rich hypothesis. The more information contained (while still being transparent), the more understanding is gained and the easier it is to formulate an action plan.
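- A minimal sketch of that naïve baseline, using the 80% wombat example (the sample is invented for illustration):

    labels = ["wombat"] * 80 + ["other"] * 20         # 80% of the animals are wombats
    majority = max(set(labels), key=labels.count)     # naive theory: always predict the majority
    accuracy = labels.count(majority) / len(labels)
    print(majority, accuracy)    # wombat 0.8 - any useful hypothesis must beat this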
Complexity of Search Space
- Machine learning can be considered a search problem: we wish to find the correct hypothesis from among many.
- If there are only a few hypotheses we could try them all, but if there are an infinite number we need a better strategy.
- If we have a measure of the quality of a hypothesis, we can use that measure to select potentially good hypotheses and, based on the selection, try to improve the theories (hill-climbing search, sketched below).
- Consider the metaphor of the kangaroo in the mist.
- This demonstrates that it is important to know the complexity of the search space, and that some pattern recognition problems are almost impossible to solve.
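- Hill-climbing is only named above; as a generic sketch, here it searches bit-string hypotheses against a toy quality measure (all details invented):

    import random

    def hill_climb(score, hypothesis, steps=1000):
        """Keep a random one-bit change whenever it scores strictly better."""
        for _ in range(steps):
            candidate = hypothesis.copy()
            candidate[random.randrange(len(candidate))] ^= 1   # flip one bit
            if score(candidate) > score(hypothesis):
                hypothesis = candidate
        return hypothesis

    target = [1, 0, 1, 1, 0, 0, 1, 0]                  # hidden "correct" hypothesis
    score = lambda h: sum(a == b for a, b in zip(h, target))
    print(hill_climb(score, [0] * 8))                  # usually climbs all the way to target

- On this smooth landscape every accepted step is real progress; the kangaroo-in-the-mist point is that on a rugged or flat landscape the same strategy stalls at local optima.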
Learning as Compression
- We have learnt something if we have an algorithm that creates a description of the data that is shorter than the original data set
- A knowledge representation is required that is incrementally compressible, and an algorithm that can achieve that incremental compression
- The file-in could be a relational table and the file-out a prediction or a suggested clustering
[Diagram: file-in -> algorithm -> file-out]
Types of Input Message (File-in)
- Unstructured or random messages
- Highly structured messages with patterns that are easy to find
- Highly structured messages that are difficult to decipher
- Partly structured messages
- Most data sets considered by data mining are in this class: there are patterns to be found, but the data sets are not highly regular
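- The difference between these classes can be made concrete with a general-purpose compressor standing in for a learning algorithm (zlib here is just a stand-in, not a data mining method):

    import os, zlib

    structured = b"AB" * 500             # highly structured message, easy pattern
    random_msg = os.urandom(1000)        # unstructured/random message
    print(len(zlib.compress(structured)))   # a handful of bytes - the pattern is found
    print(len(zlib.compress(random_msg)))   # roughly 1000 bytes - nothing to exploit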
Minimum Message Length Principle
- The best theory to explain a set of data is the one that minimizes the sum of the length, in bits, of the description of the theory, plus the length of the data when encoded with the help of the theory.
- Put another way, if regularity is found in a data set and the description of this regularity together with the description of the exceptions is still shorter than the original data set, then something of value has been found.
[Diagram: a long bit string (the original data set) next to a shorter pair of strings (the theory, plus the data set coded with the theory)]
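- A toy illustration of the principle (the bit costs are invented): describe a 100-bit sequence either literally, or as the theory "every bit is 0" plus a list of exceptions:

    data = [0] * 100
    for i in (13, 40, 77):           # three exceptional 1-bits
        data[i] = 1

    raw_bits = len(data)             # 100 bits sent literally
    theory_bits = 8                  # assumed fixed cost to state "every bit is 0"
    exception_bits = sum(data) * 7   # ~7 bits to name each position in 0..99
    print(raw_bits, theory_bits + exception_bits)   # 100 vs 29: real regularity found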
Noise and Redundancy
- The distortion or mutation of a message is the number of bits that are corrupted
- Making the message longer by including redundant information can ensure that a message is received correctly even in the presence of noise (see the sketch after this list)
- Some pattern recognition algorithms cope well with the presence of noise; others do not
- We could consider a database which lacks integrity to contain a large amount of noise
- Patterns may exist for a small percentage of the data due solely to noise
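- A minimal sketch of redundancy beating noise: a 3x repetition code decoded by majority vote survives any single corrupted bit:

    def encode(bits):
        return [b for b in bits for _ in range(3)]     # send every bit three times

    def decode(bits):
        triples = [bits[i:i + 3] for i in range(0, len(bits), 3)]
        return [1 if sum(t) >= 2 else 0 for t in triples]   # majority vote per triple

    message = [1, 0, 1, 1]
    sent = encode(message)
    sent[4] ^= 1                       # noise corrupts one transmitted bit
    print(decode(sent) == message)     # True - the redundancy absorbed the error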