Data Mining Association Rules

Description:

Different values of M affect the performance of the DIC algorithm; the Apriori algorithm is included here as a reference. We can see that as M increases, ...

Provided by: IVPR
Learn more at: https://www.cs.uml.edu
Transcript and Presenter's Notes

1
Data Mining Association Rules
  • Yao Meng
  • Hongli Li
  • 91.574 Database II
  • Fall 2002

2
Outline
  • Overview
  • Apriori
  • AprioriTid
  • DIC
  • Data Structure
  • Experiment Environment
  • Experiment Result and Analysis

3
Overview: Apriori Algorithm

4
Overview: AprioriTid

5
Overview: DIC
  • Read M transactions
  • Increment the counters of the itemsets that are
    currently being counted
  • If all the children (immediate subsets) of an
    itemset have become large, begin counting that
    itemset
  • If an itemset has been counted through all the
    transactions, remove it from the current counting
    list
  • On reaching the end of the DB, rewind and go back
    to the first step
  • Stop when no itemsets remain to be counted
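
The steps above can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the presenters' implementation: the class `DicSketch`, the method `mine`, the helpers `sampleDb` and `setOf`, the four-transaction sample database, and the use of an absolute support count `minSup` are all hypothetical. Candidates are kept in plain hash maps here rather than a hash tree, and stop points are placed every M transactions and at the end of the DB.

```java
import java.util.*;

/** Illustrative DIC sketch; every name here is hypothetical, not taken from the slides. */
class DicSketch {

    /** Returns each itemset whose support count is at least minSup, mapped to its full count. */
    static Map<Set<Integer>, Integer> mine(List<Set<Integer>> db, int minSup, int m) {
        int n = db.size();
        Map<Set<Integer>, int[]> active = new HashMap<>(); // itemset -> {count, transactions seen}
        Map<Set<Integer>, Integer> result = new HashMap<>();
        Set<Set<Integer>> large = new HashSet<>();          // itemsets whose count reached minSup
        Set<Set<Integer>> started = new HashSet<>();        // itemsets that ever received a counter
        Set<Integer> items = new TreeSet<>();
        for (Set<Integer> t : db) items.addAll(t);
        for (int i : items) {                               // begin by counting every 1-itemset
            Set<Integer> single = setOf(i);
            active.put(single, new int[2]);
            started.add(single);
        }
        int pos = 0;
        while (!active.isEmpty()) {                         // stop when nothing is left to count
            Set<Integer> t = db.get(pos);
            for (Map.Entry<Set<Integer>, int[]> e : active.entrySet()) {
                if (t.containsAll(e.getKey())) e.getValue()[0]++;
                e.getValue()[1]++;
            }
            pos++;
            if (pos % m == 0 || pos == n) {                 // stop point: every M transactions, or end of DB
                List<Set<Integer>> newlyLarge = new ArrayList<>();
                for (Map.Entry<Set<Integer>, int[]> e : active.entrySet())
                    if (e.getValue()[0] >= minSup && large.add(e.getKey())) newlyLarge.add(e.getKey());
                // Retire itemsets that have now been counted through all n transactions.
                Iterator<Map.Entry<Set<Integer>, int[]>> it = active.entrySet().iterator();
                while (it.hasNext()) {
                    Map.Entry<Set<Integer>, int[]> e = it.next();
                    if (e.getValue()[1] >= n) {
                        if (e.getValue()[0] >= minSup) result.put(e.getKey(), e.getValue()[0]);
                        it.remove();
                    }
                }
                // Start counting a superset once all of its immediate subsets are large.
                for (Set<Integer> base : newlyLarge)
                    for (int i : items) {
                        if (base.contains(i)) continue;
                        Set<Integer> cand = new TreeSet<>(base);
                        cand.add(i);
                        if (!started.contains(cand) && allSubsetsLarge(cand, large)) {
                            started.add(cand);
                            active.put(cand, new int[2]);
                        }
                    }
            }
            if (pos == n) pos = 0;                          // end of DB: rewind to the first transaction
        }
        return result;
    }

    static boolean allSubsetsLarge(Set<Integer> cand, Set<Set<Integer>> large) {
        for (int i : cand) {
            Set<Integer> sub = new TreeSet<>(cand);
            sub.remove(i);
            if (!large.contains(sub)) return false;
        }
        return true;
    }

    static Set<Integer> setOf(Integer... xs) { return new TreeSet<>(Arrays.asList(xs)); }

    /** Hypothetical four-transaction database used only for illustration. */
    static List<Set<Integer>> sampleDb() {
        return Arrays.asList(setOf(1, 2, 3), setOf(1, 2), setOf(2, 3), setOf(1, 2, 4));
    }

    public static void main(String[] args) {
        System.out.println(mine(sampleDb(), 2, 2));
    }
}
```

Calling `mine` with M equal to the number of transactions places a stop point only at the end of each pass, so in this sketch new itemsets begin counting only at pass boundaries, which is the level-wise behaviour of Apriori.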

6
Hypothesis of Performance Analysis
  • Given a memory size
  • AprioriTid generally performs better than
    Apriori because it saves I/O
  • DIC performs better than Apriori in a fairly
    homogeneous data environment
  • DIC performance should approach that of Apriori
    as M approaches the total number of transactions
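
The I/O saving assumed for AprioriTid comes from reading the database only in the first pass; later passes work on a per-transaction list of the candidates found in the previous pass. The Java sketch below illustrates that idea as described by Agrawal and Srikant. It is a simplified illustration under assumptions: the class `AprioriTidSketch`, its method names, the sample database, and the absolute support count `minSup` are hypothetical, and hash sets stand in for the array-based structure the presenters used.

```java
import java.util.*;

/** Illustrative AprioriTid sketch; all names are hypothetical, not taken from the slides. */
class AprioriTidSketch {

    static Map<List<Integer>, Integer> mine(List<Set<Integer>> db, int minSup) {
        Map<List<Integer>, Integer> result = new LinkedHashMap<>();
        // Pass 1 is the only scan of the database: each transaction becomes
        // the set of 1-itemsets it contains.
        List<Set<List<Integer>>> cbar = new ArrayList<>();
        Map<List<Integer>, Integer> counts = new HashMap<>();
        for (Set<Integer> t : db) {
            Set<List<Integer>> entry = new HashSet<>();
            for (int item : t) {
                List<Integer> one = Collections.singletonList(item);
                entry.add(one);
                counts.merge(one, 1, Integer::sum);
            }
            cbar.add(entry);
        }
        List<List<Integer>> large = keepLarge(counts, minSup, result);
        while (!large.isEmpty()) {
            List<List<Integer>> cands = generate(large);
            counts = new HashMap<>();
            List<Set<List<Integer>>> next = new ArrayList<>();
            for (Set<List<Integer>> entry : cbar) {          // iterate the encoded entries, not the DB
                Set<List<Integer>> present = new HashSet<>();
                for (List<Integer> c : cands) {
                    int k = c.size();
                    List<Integer> g1 = c.subList(0, k - 1);  // c without its last item
                    List<Integer> g2 = new ArrayList<>(c.subList(0, k - 2));
                    g2.add(c.get(k - 1));                    // c without its second-to-last item
                    if (entry.contains(g1) && entry.contains(g2)) {
                        present.add(c);
                        counts.merge(c, 1, Integer::sum);
                    }
                }
                if (!present.isEmpty()) next.add(present);   // entries with no candidates drop out
            }
            cbar = next;
            large = keepLarge(counts, minSup, result);
        }
        return result;
    }

    static List<List<Integer>> keepLarge(Map<List<Integer>, Integer> counts, int minSup,
                                         Map<List<Integer>, Integer> result) {
        List<List<Integer>> large = new ArrayList<>();
        for (Map.Entry<List<Integer>, Integer> e : counts.entrySet())
            if (e.getValue() >= minSup) {
                large.add(e.getKey());
                result.put(e.getKey(), e.getValue());
            }
        return large;
    }

    /** Join large k-itemsets sharing their first k-1 items, then prune by subset check. */
    static List<List<Integer>> generate(List<List<Integer>> large) {
        Set<List<Integer>> largeSet = new HashSet<>(large);
        List<List<Integer>> out = new ArrayList<>();
        for (List<Integer> a : large)
            for (List<Integer> b : large) {
                int k = a.size();
                if (!a.subList(0, k - 1).equals(b.subList(0, k - 1))
                        || a.get(k - 1) >= b.get(k - 1)) continue;
                List<Integer> c = new ArrayList<>(a);
                c.add(b.get(k - 1));
                boolean ok = true;
                for (int drop = 0; drop < c.size() && ok; drop++) {
                    List<Integer> sub = new ArrayList<>(c);
                    sub.remove(drop);                        // removes the element at index drop
                    ok = largeSet.contains(sub);
                }
                if (ok) out.add(c);
            }
        return out;
    }

    /** Hypothetical four-transaction database used only for illustration. */
    static List<Set<Integer>> sampleDb() {
        return Arrays.asList(
            new HashSet<>(Arrays.asList(1, 2, 3)), new HashSet<>(Arrays.asList(1, 2)),
            new HashSet<>(Arrays.asList(2, 3)), new HashSet<>(Arrays.asList(1, 2, 4)));
    }

    public static void main(String[] args) {
        System.out.println(mine(sampleDb(), 2));
    }
}
```

Because entries that contain no candidate are dropped, the encoded list tends to shrink in later passes, which is one plausible reading of the I/O saving hypothesized above.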

7
Experiment Environment
  • Data Sets
      • IBM Synthetic Dataset Generation Code for
        Association Rules
  • Environment
      • Operating System: Microsoft Windows XP
        Professional
      • Computer
          • Intel Pentium III processor, 550 MHz
          • RAM: 384 MB
  • Source code written in Java

8
Data Structure
  • Apriori and DIC
      • Candidate itemsets are stored in a hash tree
      • Each internal node is a hash table
      • The leaves store the candidate itemsets
  • AprioriTid
      • Uses an array to keep candidates
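
A hash tree of the kind described for Apriori and DIC might look like the Java sketch below. It is an illustration under assumptions, not the presenters' code: the class `HashTreeSketch`, the constants `BUCKETS` and `LEAF_CAP`, and the methods `add` and `countTransaction` are hypothetical, and itemsets and transactions are assumed to be sorted arrays of distinct integers.

```java
import java.util.*;

/** Illustrative hash tree for candidate k-itemsets; all names and constants are assumptions. */
class HashTreeSketch {
    static final int BUCKETS = 3;   // hash-table width of each internal node (assumed value)
    static final int LEAF_CAP = 2;  // a leaf splits once it holds more itemsets than this (assumed)

    static class Candidate {
        final int[] items;          // sorted ascending, length k
        int count;
        Candidate(int[] items) { this.items = items; }
    }

    static class Node {
        Node[] children;                           // non-null only for internal nodes
        List<Candidate> leaf = new ArrayList<>();  // candidates held while the node is a leaf
    }

    final Node root = new Node();
    final int k;

    HashTreeSketch(int k) { this.k = k; }

    Candidate add(int... sortedItems) {
        Candidate c = new Candidate(sortedItems);
        insert(root, c, 0);
        return c;
    }

    private void insert(Node node, Candidate c, int depth) {
        if (node.children != null) {               // internal node: hash on the item at this depth
            insert(node.children[Math.floorMod(c.items[depth], BUCKETS)], c, depth + 1);
            return;
        }
        node.leaf.add(c);
        if (node.leaf.size() > LEAF_CAP && depth < k) {   // overfull leaf becomes an internal node
            List<Candidate> old = node.leaf;
            node.leaf = new ArrayList<>();
            node.children = new Node[BUCKETS];
            for (int b = 0; b < BUCKETS; b++) node.children[b] = new Node();
            for (Candidate o : old) insert(node, o, depth);
        }
    }

    /** Increments the count of every stored candidate contained in the transaction. */
    void countTransaction(int... sortedTransaction) {
        Set<Candidate> hits = new HashSet<>();     // a leaf may be reached by several paths
        visit(root, sortedTransaction, 0, hits);
        for (Candidate c : hits) c.count++;
    }

    private void visit(Node node, int[] t, int start, Set<Candidate> hits) {
        if (node.children == null) {
            for (Candidate c : node.leaf) if (contains(t, c.items)) hits.add(c);
            return;
        }
        for (int i = start; i < t.length; i++)     // hash on each remaining item of the transaction
            visit(node.children[Math.floorMod(t[i], BUCKETS)], t, i + 1, hits);
    }

    private static boolean contains(int[] t, int[] c) {
        int matched = 0;
        for (int x : t) if (matched < c.length && x == c[matched]) matched++;
        return matched == c.length;
    }

    public static void main(String[] args) {
        HashTreeSketch tree = new HashTreeSketch(2);
        Candidate c12 = tree.add(1, 2);
        tree.add(1, 3);
        tree.add(2, 3);
        tree.countTransaction(1, 2, 3);
        tree.countTransaction(1, 2);
        System.out.println("count of {1,2}: " + c12.count);
    }
}
```

Collecting hits in a set before incrementing is a design choice in this sketch: with a small number of buckets, different items of one transaction can hash to the same child, so the same leaf may be visited more than once.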

9
Size vs. Execution Time
  • Number of Items: 8
  • Avg transaction length: 5
  • M: 500

10
Support Threshold
  • Size: 16410 transactions; Number of Items: 8
  • Average Length per transaction: 5; M: 500

11
DIC: Different M Values
  • Size: 12291 transactions; Number of Items: 8
  • Average Length per transaction: 5

12
DIC: Non-Homogeneous Dataset
  • Size: 6000 transactions; Number of Items: 8
  • M: 500

13
Conclusions
  • AprioriTid is the best in our experiment
      • I/O saving
      • AprioriTid uses a small data structure
  • Apriori and DIC are very similar
      • Apriori is a special case of DIC
      • They use the same data structure
  • DIC
      • Sensitive to data
      • M affects performance
