Fast Algorithms for Association Rule Mining - PowerPoint PPT Presentation

Slides: 14

Provided by: tcU9

1
Fast Algorithms for Association Rule Mining

2
Background and Motivation

Basket Data
A collection of records, each consisting of a transaction
identifier and the items bought in that
transaction.
Mining for associations among items in a large
database of sales transactions, in order to predict the
occurrence of an item based on the occurrences of
other items in the transaction.
For Example

3
Problem Definition

Output: Association rules having support >= minsup
threshold and confidence >= minconf threshold

Objective
Find rules that will predict the occurrence of an
item based on the occurrences of other items in
the transaction

books -> Stationery, books, Bags ->
grocery, Coke, utensils, coke -> books
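
The support and confidence thresholds in the problem definition can be made concrete with a small sketch. The transactions below are hypothetical toy data (the item names only echo the slide's examples), and the function names `support`, `confidence` and `rule_qualifies` are illustrative, not taken from the presentation.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy basket data, invented for illustration only.
TRANSACTIONS: List[FrozenSet[str]] = [
    frozenset({"books", "stationery", "bags"}),
    frozenset({"books", "stationery"}),
    frozenset({"books", "bags", "grocery"}),
    frozenset({"coke", "utensils"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """support(X and Y together) / support(X) for the rule X -> Y."""
    x = frozenset(antecedent)
    x_support = support(x, transactions)
    if x_support == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(x | frozenset(consequent), transactions) / x_support


def rule_qualifies(antecedent: Iterable[str], consequent: Iterable[str],
                   transactions: List[FrozenSet[str]],
                   minsup: float, minconf: float) -> bool:
    """True when the rule meets both the minsup and minconf thresholds."""
    both = frozenset(antecedent) | frozenset(consequent)
    return (support(both, transactions) >= minsup
            and confidence(antecedent, consequent, transactions) >= minconf)
```

With this toy data, the rule books -> stationery appears in 2 of 4 transactions (support 0.5) and holds in 2 of the 3 transactions containing books (confidence about 0.67), so it qualifies for minsup 0.5 and minconf 0.6 but not for minconf 0.7.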
4
Major Contribution

Proposed two new algorithms for fast association
rule mining:
Apriori and AprioriTID, along with a hybrid
of the two algorithms.
Empirical evaluations of the performance of the
proposed algorithms as compared with the
contemporary algorithms.

5
Key Concepts

6
Key Concepts

Large Itemset
Itemsets having support greater than or equal to
the minimum support (minsup) are called
large itemsets.
Small Itemset
Itemsets having support less than the minimum
support are called
small itemsets.
Association Rule
An association rule is an implication of the form
X -> Y, where X and Y are itemsets. Rules of
interest have
support >= minsup threshold
confidence >= minconf threshold
Candidate Itemsets
A set of itemsets generated from a seed set of
itemsets that were found to be large in
the previous pass.

7
Key Concepts - Apriori

Input
The market basket transaction dataset.
Process
Determine large 1-itemsets.
Repeat until no new large itemsets are
identified:
  Generate (k+1)-length candidate itemsets from
  length-k large itemsets.
  Prune candidate itemsets that contain a length-k
  subset that is not large.
  Count the support of each candidate itemset.
  Eliminate candidate itemsets that are small.
Output
Itemsets that are large, i.e. that meet the min
support threshold. Rules that meet the min
confidence threshold are then generated from them.
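
The process above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the join step is simplified to "union of two large k-itemsets that differ in one item" rather than a lexicographic prefix join, `minsup_count` is assumed to be an absolute transaction count, and the function names are hypothetical.

```python
from itertools import combinations


def apriori_gen(large_k):
    """Join then prune: build (k+1)-candidates from the large k-itemsets."""
    large_k = set(large_k)
    candidates = set()
    for a in large_k:
        for b in large_k:
            union = a | b
            if len(union) != len(a) + 1:
                continue  # a and b must differ in exactly one item
            # Prune: every k-subset of the candidate must itself be large.
            if all(frozenset(s) in large_k
                   for s in combinations(union, len(a))):
                candidates.add(union)
    return candidates


def apriori(transactions, minsup_count):
    """Return {itemset: support count} for all large itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items and keep the large 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: c for s, c in counts.items() if c >= minsup_count}
    result = dict(large)

    # Later passes: generate candidates, count them against the database,
    # and eliminate the small ones.
    while large:
        candidates = apriori_gen(large)
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        large = {s: c for s, c in counts.items() if c >= minsup_count}
        result.update(large)
    return result


# Hypothetical four-transaction database with numeric item ids.
DB = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
LARGE = apriori(DB, minsup_count=2)
```

In this toy run, {1, 2} is counted as a candidate but eliminated as small, so the 3-candidate {1, 2, 3} is pruned before any counting; the only large 3-itemset is {2, 3, 5}.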

8
Key Concepts

AprioriTID
Uses the same candidate generation function as Apriori.
Does not use the database for counting support after
the first pass.
Instead, it uses an encoding of the candidate itemsets
used in the previous pass.
This saves reading effort.
Apriori Hybrid
Apriori Hybrid uses Apriori in the initial
passes and switches to AprioriTID when it expects
that the encoded candidate itemsets at the end of the
pass will fit in memory.
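
A rough sketch of the AprioriTID idea, under stated assumptions: after the first pass, each transaction is replaced by the set of candidate itemsets it contains, and later passes count support from that encoding alone. The names `apriori_tid` and `encoded` are hypothetical, the candidate generation is the same simplified join-and-prune used for illustration elsewhere, and no attempt is made to model memory use or the hybrid switch-over.

```python
from itertools import combinations


def apriori_gen(large_prev):
    """Join then prune, as in Apriori (simplified union-based join)."""
    large_prev = set(large_prev)
    candidates = set()
    for a in large_prev:
        for b in large_prev:
            union = a | b
            if len(union) == len(a) + 1 and all(
                frozenset(s) in large_prev
                for s in combinations(union, len(a))
            ):
                candidates.add(union)
    return candidates


def apriori_tid(transactions, minsup_count):
    """transactions: {tid: set of items}. Returns {itemset: support count}."""
    # Pass 1 is the only pass that reads the raw transactions: each one is
    # encoded as the set of 1-itemsets it contains.
    encoded = [(tid, {frozenset([i]) for i in items})
               for tid, items in transactions.items()]
    counts = {}
    for _, itemsets in encoded:
        for s in itemsets:
            counts[s] = counts.get(s, 0) + 1
    large = {s: c for s, c in counts.items() if c >= minsup_count}
    result = dict(large)

    k = 2
    while large:
        candidates = apriori_gen(large)
        counts = {c: 0 for c in candidates}
        next_encoded = []
        for tid, prev_sets in encoded:
            # A k-candidate is in the transaction exactly when all of its
            # (k-1)-subsets were recorded for this transaction last pass.
            present = {c for c in candidates
                       if all(frozenset(s) in prev_sets
                              for s in combinations(c, k - 1))}
            for c in present:
                counts[c] += 1
            if present:  # transactions with no candidates drop out
                next_encoded.append((tid, present))
        encoded = next_encoded
        large = {s: c for s, c in counts.items() if c >= minsup_count}
        result.update(large)
        k += 1
    return result


# Hypothetical database keyed by transaction identifier.
DB_TID = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
LARGE_TID = apriori_tid(DB_TID, minsup_count=2)
```

Because transactions that contain no candidates are dropped from the encoding, the structure tends to shrink in later passes, which is the reading effort the slide refers to.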

9
Validation Methodology - Synthetic Data

Synthetic transaction data sets were generated
to evaluate the performance of the
algorithms.
Each itemset that can appear in a transaction has a weight
associated with it, which corresponds to the
probability that the itemset is picked.
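
The weighted picking described above might look like the following sketch. It is a deliberate simplification and an assumption throughout: fixed itemset and transaction sizes, exponentially distributed weights normalised to sum to 1, and none of the other details of the original generator. All function and parameter names are hypothetical.

```python
import random


def make_itemset_pool(num_items, num_itemsets, itemset_size, rng):
    """Build a pool of itemsets, each with a normalised picking weight."""
    pool = [frozenset(rng.sample(range(num_items), itemset_size))
            for _ in range(num_itemsets)]
    raw = [rng.expovariate(1.0) for _ in pool]  # assumed weight distribution
    total = sum(raw)
    weights = [w / total for w in raw]  # weights now sum to 1
    return pool, weights


def generate_transactions(pool, weights, num_transactions,
                          itemsets_per_transaction, rng):
    """Each transaction is the union of itemsets picked by weight."""
    transactions = []
    for _ in range(num_transactions):
        picked = rng.choices(pool, weights=weights,
                             k=itemsets_per_transaction)
        transactions.append(frozenset().union(*picked))
    return transactions


rng = random.Random(0)  # fixed seed so the sketch is repeatable
POOL, WEIGHTS = make_itemset_pool(num_items=20, num_itemsets=5,
                                  itemset_size=3, rng=rng)
SYNTHETIC = generate_transactions(POOL, WEIGHTS, num_transactions=10,
                                  itemsets_per_transaction=2, rng=rng)
```

Itemsets with a higher weight show up in more transactions, so they are the ones most likely to turn out large when the data is mined.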

10
Validation Methodology - Strength and Weakness

Strength
The authors use substantial basket data to guide
the process of designing fast algorithms for
association rule mining.
Weakness
A synthetic data set is used for validation. The
data might be too artificial to give any
valuable information about real-world datasets.

11
Assumptions

A synthetic dataset is used. It is assumed that
the performance of the algorithm on the synthetic
dataset is indicative of its performance on a
real-world dataset.
All the items in the data are in
lexicographical order.
It is assumed that all the data is present at
the same site or in the same table, and that there
are no cases in which joins would be required.

12
Possible Revision

Some real-world datasets should be used to
perform the experiments.
The number of large itemsets could increase
exponentially with large databases. A modification of
the representation structure is required so that it
captures just a subset of the candidate large
itemsets.

13
Questions?
