Arif Djunaidy Rully Soelaiman Daning Tyaspamadya - PowerPoint PPT Presentation

About This Presentation



Title: Perancangan dan Pembuatan Perangkat Lunak Data Mining untuk Pencarian Kaidah Asosiasi dengan Metode Bottom-Up (Design and Development of Data Mining Software for Discovering Association Rules Using the Bottom-Up Method)
Author: Arif Djunaidy

Slides: 23
Provided by: Arif73


Transcript and Presenter's Notes


  • Arif Djunaidy, Rully Soelaiman, Daning Tyaspamadya

Faculty of Information Technology ITS - Surabaya
Background - 1
  • In data mining, association rules represent
    relationships that may exist among items in
    transactional databases
  • Since the association rules that can be
    exploited may represent customer behavior,
    identifying the frequent itemsets and forming
    the conditional implication rules among items
    are of paramount importance
  • However, for large databases, extracting a set
    of meaningful association rules may require
    substantial memory and many database scans, which
    in turn increase the overall computing time of
    the mining process
  • Efficient algorithms capable of minimizing these
    overheads in mining meaningful association rules
    are therefore required

Background - 2
  • The task of discovering all frequent associations
    in very large databases is quite challenging
  • The search space is exponential in the number of
    database attributes
  • With millions of database objects, the problem of
    I/O minimization becomes paramount
  • Most current approaches are iterative in nature,
    requiring multiple database scans
  • Most approaches use very complicated internal
    data structures, which have poor locality and
    add extra space and computation overhead

Key Features of Our Approach
  • All frequent itemsets are enumerated via simple
    tid-list intersections
  • A lattice-theoretic approach is used to decompose
    the original search space (lattice) into smaller
    pieces (sub-lattices) that can be processed
    independently and more easily
  • A hybrid search strategy is used to enumerate the
    frequent itemsets within each sub-lattice
  • Our approach is designed to involve only a few
    database scans to minimize the I/O costs
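The slides do not spell out how the lattice is split into sub-lattices. One common way, assumed here rather than taken from the slides, is to group itemsets by their first item (a prefix-based class), so that each class can be mined on its own. The function `prefix_classes` and the item names are hypothetical.

```python
from collections import defaultdict


def prefix_classes(itemsets):
    """Group itemsets into sub-lattices keyed by their first item.

    Assumption: every itemset is a tuple whose items are already in a fixed
    (here lexicographic) order, so itemset[0] acts as the shared prefix.
    """
    classes = defaultdict(list)
    for itemset in sorted(itemsets):
        classes[itemset[0]].append(itemset)
    return dict(classes)


# Hypothetical frequent 2-itemsets over the items A, C, D, T, W.
pairs = [("A", "C"), ("A", "T"), ("A", "W"), ("C", "D"),
         ("C", "T"), ("C", "W"), ("D", "W"), ("T", "W")]
for prefix, members in prefix_classes(pairs).items():
    print(prefix, members)
```

Joining two members of the same class, for example AC and AT into ACT, keeps the prefix A, so processing class [A] never needs itemsets from class [C].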

Problem Statement - 1
  • An association rule can be written as $A \Rightarrow B$, where
  • A is an itemset called the antecedent or
    left-hand side (LHS), and
  • B is an itemset called the consequent or
    right-hand side (RHS)
  • The association mining task is to discover a set
    of association rules among a large number of
    objects in a given database

Problem Statement - 2
  • The basic and fundamental task of mining
    association rules is to generate all
    association rules $X \Rightarrow Y$ (where X and Y are
    itemsets) that can be extracted from the
    database. These rules must satisfy both the
    support and the confidence constraints
  • Support constraint: $\mathrm{Sup}(X \cup Y) \ge \mathrm{MinSup}$
  • Confidence constraint: $\mathrm{Sup}(X \cup Y) / \mathrm{Sup}(X) \ge \mathrm{MinCof}$
  • $\mathrm{Sup}(X)$, the support of an itemset X, is
    defined as the number of transactions in which X
    occurs as a subset
  • An itemset is categorized as a frequent itemset
    if its support is at least the minimum support
    (MinSup) supplied by the user
  • The confidence factor represents the conditional
    probability that a transaction contains Y (given
    that the transaction contains X)
  • An association rule is said to be confident if
    its confidence factor is at least the minimum
    confidence (MinCof) supplied by the user
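The two constraints can be checked directly on a horizontal database. This is a minimal sketch, assuming that Sup counts transactions, that MinSup is given as an absolute count, that MinCof is a fraction, and that both thresholds are inclusive; the function names and the six-transaction database are hypothetical.

```python
def support(itemset, transactions):
    """Sup(X): number of transactions that contain every item of X."""
    target = set(itemset)
    return sum(1 for t in transactions if target <= set(t))


def confidence(lhs, rhs, transactions):
    """Conf(X => Y) = Sup(X ∪ Y) / Sup(X); assumes Sup(X) > 0."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def is_valid_rule(lhs, rhs, transactions, min_sup, min_conf):
    """Check both constraints: min_sup is a count, min_conf a fraction."""
    return (support(set(lhs) | set(rhs), transactions) >= min_sup
            and confidence(lhs, rhs, transactions) >= min_conf)


# Hypothetical toy database: one set of items per transaction.
db = [{"A", "C", "T", "W"}, {"C", "D", "W"}, {"A", "C", "T", "W"},
      {"A", "C", "D", "W"}, {"A", "C", "D", "T", "W"}, {"C", "D", "T"}]

print(support({"A", "W"}, db))          # 4
print(confidence({"A"}, {"W"}, db))     # 1.0
print(confidence({"W"}, {"A"}, db))     # 0.8
```

The rule A ⇒ W is confident at any threshold because every transaction containing A also contains W, whereas W ⇒ A fails a 100% threshold since one transaction with W lacks A.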

Simple Example - 1
  • Consider the sales database of a food store,
    where the objects represent customers and the
    items represent food products
  • In this example, the discovered patterns are the
    sets of food products frequently bought together
    by customers
  • An example of a discovered pattern could be that
    60 percent of the customers who buy cereal also
    buy
  • The store can then use this knowledge for shelf
    placement, stock control, etc.
  • There are many potential application areas for
    association rule technology, which include
    catalog design, customer segmentation, store
    layout, and so on

Simple Example - 2
MinSup = 50%
MinCof = 100%
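With thresholds like MinSup = 50% and MinCof = 100%, rule generation can be sketched as follows. This is an illustration only: frequent itemsets are found here by brute-force counting (not by the lattice-based method of these slides), the database is hypothetical, and both function names are ours.

```python
import math
from itertools import combinations


def frequent_itemsets(db, min_sup_pct):
    """Brute-force support counting: only sensible for tiny illustrative databases."""
    # Convert the percentage threshold into an absolute transaction count.
    min_count = math.ceil(min_sup_pct * len(db) / 100)
    items = sorted(set().union(*db))
    freq = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            count = sum(1 for t in db if set(cand) <= t)
            if count >= min_count:
                freq[frozenset(cand)] = count
    return freq


def confident_rules(freq, min_conf_pct):
    """Return (lhs, rhs, confidence) for each rule X => Y meeting MinCof."""
    rules = []
    for itemset, sup_xy in freq.items():
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                lhs = frozenset(combo)
                sup_x = freq[lhs]  # every subset of a frequent itemset is frequent
                # Integer comparison avoids floating-point rounding at the threshold.
                if sup_xy * 100 >= min_conf_pct * sup_x:
                    rules.append((lhs, itemset - lhs, sup_xy / sup_x))
    return rules


# Hypothetical six-transaction database.
db = [{"A", "C", "T", "W"}, {"C", "D", "W"}, {"A", "C", "T", "W"},
      {"A", "C", "D", "W"}, {"A", "C", "D", "T", "W"}, {"C", "D", "T"}]
freq = frequent_itemsets(db, 50)      # MinSup = 50%
rules = confident_rules(freq, 100)    # MinCof = 100%
for lhs, rhs, conf in rules:
    print(sorted(lhs), "=>", sorted(rhs), conf)
```

At 50% of six transactions an itemset needs a support of 3, and a 100% rule X ⇒ Y survives only when Sup(X ∪ Y) equals Sup(X).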
The Lattice-Based Approach - 1
  • We use the lattice-theoretic approach to
    identify all frequent itemsets and to count
    the support of association rules
  • Prerequisite: construct the tid-lists from the
    transaction database
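Building the tid-lists amounts to converting the horizontal layout (transaction to items) into a vertical one (item to transaction ids). This is a minimal sketch; the database and the function name `build_tidlists` are hypothetical, and each character of a transaction string is treated as one item.

```python
from collections import defaultdict


def build_tidlists(transactions):
    """Convert a horizontal database {tid: items} into vertical tid-lists {item: tids}.

    A single pass over the transactions is enough to build every tid-list.
    """
    tidlists = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            tidlists[item].add(tid)
    return dict(tidlists)


# Hypothetical horizontal database keyed by transaction id.
db = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
for item, tids in sorted(build_tidlists(db).items()):
    print(item, sorted(tids))
```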

The Lattice-Based Approach - 2
  • Construct the power-set lattice P(I) over the set of items I
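To see why the search space grows exponentially, the power-set lattice can be listed level by level: level k holds all k-itemsets, and the lattice has 2^|I| nodes in total. The five item names and the function `powerset_lattice` are hypothetical.

```python
from itertools import combinations


def powerset_lattice(items):
    """Return the levels of P(I); level k holds every k-itemset."""
    items = sorted(items)
    return [list(combinations(items, k)) for k in range(len(items) + 1)]


levels = powerset_lattice({"A", "C", "D", "T", "W"})
for k, level in enumerate(levels):
    print(k, len(level))                      # 1, 5, 10, 10, 5, 1
print(sum(len(level) for level in levels))    # 32 == 2 ** 5
```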

MinSup = 50%
Maximal frequent itemsets
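A frequent itemset is maximal when none of its proper supersets is frequent. Given a collection of frequent itemsets, the maximal ones can be filtered out as below; the collection and the function name `maximal_itemsets` are hypothetical, and the quadratic scan is meant only for small examples.

```python
def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    frequent = [frozenset(s) for s in frequent]
    return [s for s in frequent if not any(s < other for other in frequent)]


# Hypothetical frequent itemsets (each string is a set of single-letter items).
frequent = ["A", "C", "D", "T", "W", "AC", "AT", "AW", "CD", "CT", "CW",
            "DW", "TW", "ACT", "ACW", "ATW", "CDW", "CTW", "ACTW"]
print(sorted("".join(sorted(s)) for s in maximal_itemsets(frequent)))
# ['ACTW', 'CDW']
```

Because every subset of a frequent itemset is also frequent, the maximal itemsets are enough to describe the whole frequent collection (though not the individual supports).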
The Lattice-Based Approach - 3
  • Compute the support of itemsets via tid-list intersections
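With tid-lists in place, the support of an itemset is the size of the intersection of its items' tid-lists, so no further database scan is needed. A minimal sketch follows; the vertical database and the name `support_via_tidlists` are hypothetical.

```python
def support_via_tidlists(itemset, tidlists):
    """Return (Sup(X), t(X)), where t(X) intersects the tid-lists of X's items.

    Assumes a non-empty itemset whose items all appear in `tidlists`.
    """
    tids = set.intersection(*(tidlists[item] for item in itemset))
    return len(tids), tids


# Hypothetical vertical database: item -> set of transaction ids.
tidlists = {"A": {1, 3, 4, 5}, "C": {1, 2, 3, 4, 5, 6}, "D": {2, 4, 5, 6},
            "T": {1, 3, 5, 6}, "W": {1, 2, 3, 4, 5}}
sup, tids = support_via_tidlists("ACT", tidlists)
print(sup, sorted(tids))   # 3 [1, 3, 5]
sup, tids = support_via_tidlists("CDW", tidlists)
print(sup, sorted(tids))   # 3 [2, 4, 5]
```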

Hybrid Search for Freq. Itemsets - 1
  • Hybrid search is used to quickly enumerate all
    frequent itemsets
  • Hybrid Search combines both the top-down and
    bottom-up search strategies and is based on the
    intuition that the greater the support of a
    frequent itemset, the more likely it is to be a
    part of a longer frequent itemset
  • The hybrid approach is divided into two main steps:
  • An initial phase that rearranges the atoms
  • The hybrid process itself, which generates all
    frequent itemsets; in this second step, the
    recursion is repeated until no more frequent
    itemsets can be generated
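The recursive, bottom-up part of the enumeration can be sketched as an Eclat-style search; this organization of the recursion is our assumption, not taken from the slides. Within a class, each member is joined with every later member by intersecting tid-lists, and the recursion continues on the new class until no frequent extension remains. The function `bottom_up` and the data are hypothetical, and MinSup is an absolute count here.

```python
def bottom_up(atoms, min_sup, out):
    """Enumerate all frequent itemsets reachable from one class of atoms.

    atoms: list of (itemset, tidset) pairs that share a common prefix;
    min_sup: absolute minimum support count; out: dict collecting results.
    """
    for i, (x, tx) in enumerate(atoms):
        out[x] = len(tx)
        new_class = []
        for y, ty in atoms[i + 1:]:
            txy = tx & ty                       # tid-list intersection
            if len(txy) >= min_sup:
                new_class.append((x | y, txy))
        if new_class:
            bottom_up(new_class, min_sup, out)  # recurse until nothing new is frequent


# Hypothetical vertical database: item -> set of transaction ids.
tidlists = {"A": {1, 3, 4, 5}, "C": {1, 2, 3, 4, 5, 6}, "D": {2, 4, 5, 6},
            "T": {1, 3, 5, 6}, "W": {1, 2, 3, 4, 5}}
min_sup = 3
atoms = [(frozenset(i), t) for i, t in sorted(tidlists.items()) if len(t) >= min_sup]
found = {}
bottom_up(atoms, min_sup, found)
print(len(found))   # 19
```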

Hybrid Search for Freq. Itemsets - 2
  • The first step simply rearranges the atoms in
    descending order of their supports, using a
    sorting algorithm
  • The second step starts by intersecting a pair of
    atoms one at a time
  • The intersection process starts from the atoms
    with the largest supports, extending them into a
    longer frequent itemset
  • The process stops when an extension becomes
    infrequent (i.e., an itemset that does not satisfy
    the minimum support requirement)
  • The second, bottom-up phase is then entered
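The first two steps described above (sorting the atoms, then greedily extending from the atom with the largest support) can be sketched as follows. This is a hypothetical reading of the slide text: atoms are taken to be single items, ties in support are broken alphabetically, and the name `hybrid_phase` and the data are ours.

```python
def hybrid_phase(tidlists, min_sup):
    """Greedy extension step, sketched from the slide description.

    1. Sort the frequent atoms in descending order of support.
    2. Starting from the atom with the largest support, intersect the running
       tid-set with the next atom's tid-list, one atom at a time.
    3. Stop at the first extension that becomes infrequent.
    Returns the itemset built so far, its tid-set, and the atoms not yet
    joined, which would be handed to the bottom-up phase.
    """
    atoms = sorted((item for item, tids in tidlists.items() if len(tids) >= min_sup),
                   key=lambda item: (-len(tidlists[item]), item))  # ties broken by name
    if not atoms:
        return [], set(), []
    itemset, tids = [atoms[0]], set(tidlists[atoms[0]])
    for pos in range(1, len(atoms)):
        candidate = tids & tidlists[atoms[pos]]
        if len(candidate) < min_sup:  # extension is infrequent: hybrid phase ends
            return itemset, tids, atoms[pos:]
        itemset.append(atoms[pos])
        tids = candidate
    return itemset, tids, []


# Hypothetical vertical database: item -> set of transaction ids.
tidlists = {"A": {1, 3, 4, 5}, "C": {1, 2, 3, 4, 5, 6}, "D": {2, 4, 5, 6},
            "T": {1, 3, 5, 6}, "W": {1, 2, 3, 4, 5}}
itemset, tids, rest = hybrid_phase(tidlists, 3)
print(itemset, sorted(tids), rest)   # ['C', 'W', 'A'] [1, 3, 4, 5] ['D', 'T']
```

On this data the greedy pass yields CWA, which is frequent but not maximal (ACTW is a frequent superset here); this is why the remaining atoms still have to go through the bottom-up phase.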

Hybrid Search for Freq. Itemsets - 3
Infrequent itemsets (MinSup = 50%)

Design of the Application

Test Data
Statistics of Test Data

Experimental Results - 1

Number of k-itemsets
Experimental Results - 2

Number of Association Rules
Experimental Results - 3

Computing Time
Experimental Results - 4

Support Counting Performance
Experimental Results - 5

Comparison Results
  • Experimental results show that this approach,
    together with the hybrid search method, reduces
    the computing time compared with both
    Apriori-based algorithms and a similar
    lattice-based approach that uses the bottom-up
    search strategy
  • Another advantage of the lattice-based algorithm
    concerns the time spent scanning the database:
    it requires only a single database scan, so the
    I/O overhead is kept to a minimum
  • Regarding computing speed, substantial computing
    time still appears to be required for large
    databases. Although the lattice-based approach
    is relatively powerful, this suggests that other
    computing methodologies, such as parallel
    algorithms in distributed computing environments,
    should be considered to address the
    computing-speed problem