Fast Algorithms for Association Rule Mining

Title: Fast Algorithms for Association Rule Mining


1
Fast Algorithms for Association Rule Mining
  • Muhammad Aurangzeb Ahmad
  • Nupur Bhatnagar

2
Background and Motivation
  • Basket Data
  • Collection of records, each consisting of a
    transaction identifier and the items bought in
    that transaction.
  • Mining for associations among items in a large
    database of sales transactions to predict the
    occurrence of an item based on the occurrences of
    other items in the transaction.
  • For Example

3
Problem Definition
  • INPUT
  • A set of transactions

  • OUTPUT
  • Association rules having support ≥ minsup
    threshold and confidence ≥ minconf threshold
  • Objective
  • Find rules that will predict the occurrence of an
    item based on the occurrences of other items in
    the transaction

  • Example rules
  • {books} → {Stationery}
  • {books, Bags} → {grocery, Coke}
  • {utensils, coke} → {books}

4
Major Contribution
  • Proposed two new algorithms for fast association
    rule mining
  • Apriori and AprioriTID, along with a hybrid
    of the two algorithms.
  • Empirical evaluations of the performance of the
    proposed algorithms as compared with the
    contemporary algorithms.

5
Key Concepts
  • Itemset
  • A collection of one or more items in a market
    basket transaction.
  • k-Itemset
  • An itemset with k items is referred to as a
    k-itemset.
  • Support
  • Given a rule X → Y
  • Fraction of transactions that contain both X and
    Y.
  • Support = count(X ∪ Y) / N, where count(Z) is the
    number of transactions containing every item of Z
    and N is the total number of transactions.
  • Confidence
  • Given the rule X → Y
  • It measures how often the items in Y appear in
    transactions that contain X.
  • Confidence = count(X ∪ Y) / count(X)
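The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the helper names (`count`, `support`, `confidence`) and the four sample transactions are assumptions made for the example.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(x, y, transactions):
    """Fraction of all transactions that contain both X and Y."""
    return count(set(x) | set(y), transactions) / len(transactions)


def confidence(x, y, transactions):
    """Among transactions containing X, the fraction that also contain Y.

    Assumes X occurs in at least one transaction.
    """
    return count(set(x) | set(y), transactions) / count(x, transactions)


# Hypothetical basket data, invented for this example.
transactions = [
    frozenset({"books", "bags", "coke"}),
    frozenset({"books", "bags"}),
    frozenset({"books", "stationery"}),
    frozenset({"coke", "utensils"}),
]

print(support({"books"}, {"bags"}, transactions))     # 0.5
print(confidence({"books"}, {"bags"}, transactions))  # 2/3, about 0.667
```

Here {books, bags} appears in 2 of 4 transactions, and books appears in 3, so the rule {books} → {bags} has support 2/4 and confidence 2/3.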

6
Key Concepts
  • Large Itemset
  • Itemsets having support at or above the minimum
    support threshold are called large itemsets.
  • Small Itemset
  • Itemsets having support below the minimum
    support threshold are called small itemsets.
  • Association Rule Mining
  • An association rule is an implication of the form
    X → Y, where X and Y are itemsets.
  • Candidate Itemsets
  • A set of itemsets generated from a seed of
    itemsets that were found to be large in the
    previous pass, i.e. having
  • support ≥ minsup threshold
  • (confidence ≥ minconf threshold is checked later,
    on the rules derived from the large itemsets)
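Candidate generation from the previous pass can be sketched as a join step followed by a prune step. The sketch below assumes itemsets are stored as sorted tuples; the function name `apriori_gen` and this representation are choices made for the example, not details taken from the slides.

```python
from itertools import combinations


def apriori_gen(large_prev):
    """Build candidate k-itemsets from the large (k-1)-itemsets.

    Itemsets are sorted tuples, so two itemsets are joined when they
    agree on everything except their last item.
    """
    large_prev = set(large_prev)
    candidates = set()
    for a in large_prev:
        for b in large_prev:
            # Join step: same prefix, and a's last item sorts before b's.
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                c = a + (b[-1],)
                # Prune step: drop c if any (k-1)-subset is not large.
                if all(s in large_prev for s in combinations(c, len(c) - 1)):
                    candidates.add(c)
    return candidates


large_3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
print(apriori_gen(large_3))  # {(1, 2, 3, 4)}
```

In this run the join step produces (1, 2, 3, 4) and (1, 3, 4, 5); the second is pruned because its subset (1, 4, 5) is not among the large 3-itemsets.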

7
Key Concepts - Apriori
  • Input
  • The market basket transaction dataset.
  • Process
  • Determine large 1-itemsets.
  • Repeat until no new large itemsets are
    identified.
  • Generate (k+1)-length candidate itemsets from
    length-k large itemsets.
  • Prune candidate itemsets that contain a length-k
    subset that is not large.
  • Count the support of each candidate itemset.
  • Eliminate candidate itemsets that are small.
  • Output
  • Itemsets that are large, i.e. meet the min
    support threshold; rules meeting the min
    confidence threshold are then derived from them.
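The process above can be sketched as a level-wise loop. This is a simplified illustration under stated assumptions: itemsets are frozensets, candidates are counted with a plain scan over every transaction rather than a specialised lookup structure, and `minsup` is given as a fraction of the number of transactions. The function name `apriori` and the sample database are invented for the example.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {itemset: support count} for every large itemset."""
    transactions = [frozenset(t) for t in transactions]
    min_count = minsup * len(transactions)

    # Pass 1: count single items and keep the large 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s: n for s, n in counts.items() if n >= min_count}
    result = dict(large)

    k = 1
    while large:
        prev = set(large)
        # Generate (k+1)-candidates by merging pairs of large k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune candidates that have a k-subset which is not large.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        # Count support with one scan of the database.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # Eliminate small candidates; the survivors seed the next pass.
        large = {s: n for s, n in counts.items() if n >= min_count}
        result.update(large)
        k += 1
    return result


db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
large_itemsets = apriori(db, minsup=0.5)
print(large_itemsets[frozenset({2, 3, 5})])  # 2
```

With a minimum support of 2 of 4 transactions, this database yields four large 1-itemsets, four large 2-itemsets, and the single large 3-itemset {2, 3, 5}.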

8
Key Concepts
  • AprioriTID
  • Same candidate generation function as Apriori.
  • Does not use the database for counting support
    after the first pass.
  • Instead uses an encoding of the candidate
    itemsets used in the previous pass.
  • Saves reading effort
  • Apriori Hybrid
  • Apriori Hybrid uses Apriori in the initial
    passes and switches to AprioriTid when it expects
    that the encoded candidate itemsets at the end of
    the pass will fit in memory.
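The idea of counting support from an encoding instead of the raw database can be sketched as follows. This is an illustrative reading of the bullets above, not the authors' implementation: each transaction is replaced by the set of candidate itemsets it contains, and a candidate of the next pass is present in a transaction exactly when the two itemsets it was joined from are both in that set. Itemsets are sorted tuples, `min_count` is an absolute count, and all names are assumptions made for the example.

```python
from itertools import combinations


def apriori_gen(large_prev):
    """Join-and-prune candidate generation over sorted tuples."""
    large_prev = set(large_prev)
    joined = {a + b[-1:] for a in large_prev for b in large_prev
              if a[:-1] == b[:-1] and a[-1] < b[-1]}
    return {c for c in joined
            if all(s in large_prev for s in combinations(c, len(c) - 1))}


def apriori_tid(transactions, min_count):
    """Return {itemset: support count}, reading the database only once."""
    # First pass: encode each transaction as the set of 1-itemsets it contains.
    encoded = [{(item,) for item in t} for t in transactions]
    counts = {}
    for entry in encoded:
        for c in entry:
            counts[c] = counts.get(c, 0) + 1
    large = {c: n for c, n in counts.items() if n >= min_count}
    result = dict(large)

    while large:
        candidates = apriori_gen(large)
        counts = dict.fromkeys(candidates, 0)
        next_encoded = []
        for entry in encoded:
            # c is in the transaction iff both itemsets it was joined from are.
            present = {c for c in candidates
                       if c[:-1] in entry and c[:-2] + c[-1:] in entry}
            for c in present:
                counts[c] += 1
            if present:  # transactions with no candidates drop out
                next_encoded.append(present)
        encoded = next_encoded
        large = {c: n for c, n in counts.items() if n >= min_count}
        result.update(large)
    return result


db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(apriori_tid(db, min_count=2)[(2, 3, 5)])  # 2
```

The encoded list shrinks as transactions stop containing any candidate, which is where the saved reading effort comes from in later passes. In early passes the encoding can be larger than the database itself, which is why a hybrid would start with Apriori.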

9
Validation Methodology - Synthetic Data
  • Generated synthetic data sets involving
    transactions to evaluate the performance of
    algorithms.
  • Each itemset in a transaction has a weight
    associated with it, which corresponds to the
    probability that the itemset is picked.
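Weighted picking of itemsets can be sketched with the standard library. The slide says only that each weight is the probability of picking the itemset; drawing the raw weights from an exponential distribution and normalising them to sum to 1 is an assumption made for this example, as are the pool of itemsets and the function names.

```python
import random


def normalize(weights):
    """Scale weights so they sum to 1 and can be read as probabilities."""
    total = sum(weights)
    return [w / total for w in weights]


def pick_itemsets(pool, weights, n, rng):
    """Pick n itemsets from the pool, each with probability equal to its weight."""
    return rng.choices(pool, weights=weights, k=n)


rng = random.Random(0)  # fixed seed so runs are repeatable

# Hypothetical pool of itemsets to draw transactions from.
pool = [("books", "bags"), ("coke", "utensils"), ("stationery",)]

# Assumed weighting scheme: exponential draws with unit mean, then normalised.
weights = normalize([rng.expovariate(1.0) for _ in pool])

picked = pick_itemsets(pool, weights, 5, rng)
print(picked)
```

A synthetic transaction would then be assembled from the picked itemsets; how many are combined per transaction is left out of this sketch.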

10
Validation Methodology - Weakness and Strength
  • Strength
  • The authors use substantial basket data to guide
    the design of fast algorithms for association
    rule mining.
  • Weakness
  • A synthetic data set is used for validation. The
    data might be too artificial to give any
    valuable information about real-world datasets.

11
Assumptions
  • A synthetic dataset is used. It is assumed that
    the performance of the algorithm on the synthetic
    dataset is indicative of its performance on a
    real-world dataset.
  • All the items in the data are in
    lexicographical order.
  • It is assumed that all the data is present at
    the same site or in the same table, and there are
    no cases in which joins would be required.

12
Possible Revision
  • Some real-world datasets should be used to
    perform the experiments.
  • The number of large itemsets could increase
    exponentially with large databases. A modified
    representation structure is required that
    captures just a subset of the candidate large
    itemsets.

13
Questions?