1
User-Defined Association Mining
  • Ke Wang
  • Simon Fraser University
  • Yu He
  • National University of Singapore

2
Motivation
  • Current one-framework-per-notion paradigm.
  • Users may not find a pre-determined framework
    suitable for their specific needs.

3
Several examples
  1. Causal relationships among multi-item events:
    MALE ∧ POSTGRAD → HIGH_INCOME
  2. HIGH_INCOME is more correlated with MALE ∧ POSTGRAD
    than with FEMALE ∧ POSTGRAD
  3. BEER is more correlated with CHIPS during
    [6PM, 9PM] ∧ WEEKDAY than in general.

4
Our approach
  • User-Defined Association Mining (UDA mining).
  • Define a language that can specify a broad class of
    associations yet can be implemented efficiently.
  • Existing notions of association mining are
    instances of UDA mining.
  • Many new ad-hoc association mining tasks can be
    defined in the UDA mining framework.

5
Definitions
  • items and events
  • A user-defined association has the form
  • Z1, …, Zp → X1, …, Xk
  • subject events X1, …, Xk
  • context events Z1, …, Zp
  • Support_Filter: P(X1…Xk Zi) ≥ mini_sup
  • Strength_Filter: ψi ≥ mini_stri

6
UDA specification
  • UDA(z1, …, zp → x1, …, xk) (k > 0, p ≥ 0) has the
    form
  • Strength_Filter(z1, …, zp → x1, …, xk) ∧
    Support_Filter(z1, …, zp → x1, …, xk)
  • symmetric vs asymmetric
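
The specification above is a conjunction of two predicates, which can be sketched in Python as follows. This is only an illustration, not the authors' implementation: the names support, support_filter and is_uda are hypothetical, events are modelled as sets of items, and testing the subjects alone when there are no context events (p = 0) is an assumption of the sketch.

```python
def support(transactions, events):
    """Fraction of transactions that contain every item of every event."""
    items = frozenset().union(*events)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def support_filter(transactions, contexts, subjects, mini_sup):
    """Check P(X1...Xk Zi) >= mini_sup for each context event Zi.

    With no context events (p = 0) the subjects alone are tested;
    that fallback is an assumption of this sketch.
    """
    groups = [list(subjects) + [z] for z in contexts] or [list(subjects)]
    return all(support(transactions, g) >= mini_sup for g in groups)


def is_uda(transactions, contexts, subjects, mini_sup, strength_filter):
    """A UDA specification holds when both filters hold.

    strength_filter is a user-supplied callable taking
    (transactions, contexts, subjects) and returning a bool.
    """
    return (support_filter(transactions, contexts, subjects, mini_sup)
            and strength_filter(transactions, contexts, subjects))
```

Passing the strength filter as a callable mirrors the idea that the user, not the framework, decides what "strong" means.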

7
UDA problem
  • Z1, …, Zp → X1, …, Xk is a UDA if
  • Xi ∩ Xj = ∅, i ≠ j, and
  • Xi ∩ Zj = ∅, i ≠ j, and
  • UDA(Z1, …, Zp → X1, …, Xk) is true.
  • The UDA problem is to find all UDAs of the
    specified sizes k (0 < k ≤ a specified maximum)
    for a given UDA specification.

8
Association rules
  • Association rules Z → A introduced in [AIS93] can
    be specified by
  • Support_Filter(z → x): P(zx) ≥ mini_sup
  • Strength_Filter(z → x): ψ(z, x) ≥ mini_conf
  • where ψ(z, x) = P(x|z) is the confidence.
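
As a small worked instance of this slide, the sketch below checks a single rule using support P(zx) and confidence P(x|z) = P(zx) / P(z). The function names and the toy transactions are illustrative assumptions, not part of the original deck.

```python
def support(transactions, items):
    """Fraction of transactions containing all the given items."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def is_association_rule(transactions, z, x, mini_sup, mini_conf):
    """Check the rule z -> x against both thresholds."""
    p_zx = support(transactions, set(z) | set(x))
    p_z = support(transactions, z)
    if p_z == 0 or p_zx < mini_sup:       # support filter fails
        return False
    return p_zx / p_z >= mini_conf        # confidence P(x|z)


# Toy data (made up for illustration).
baskets = [{"bread", "butter"}, {"bread", "butter"}, {"bread"}, {"milk"}]
print(is_association_rule(baskets, {"bread"}, {"butter"}, 0.4, 0.6))
```

Here P(bread, butter) = 0.5 and P(bread) = 0.75, so the confidence is 2/3 and the rule passes at mini_conf = 0.6 but not at 0.7.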

9
Multiway correlation
  • Events X1, …, Xk are correlated when they occur
    together more often than would be expected if they
    were independent.
  • ψ(x1, …, xk) = χ²(x1, …, xk) ≥ χ²_α [BMS97], α = 5%
  • Uncorrelation: 1/χ²(x1, …, xk) ≥ 1/χ²_α, α = 95%
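
The chi-squared statistic for k events can be sketched as below: it compares the observed count of every presence/absence pattern with the count expected if the events were independent. This is a plain textbook computation written for illustration; the function name is hypothetical, and the cutoff to compare against (for example 3.84, the 5% critical value for one degree of freedom) is left to the caller because the slide does not state the degrees of freedom.

```python
from itertools import product


def chi_squared(transactions, events):
    """Chi-squared statistic over the 2^k contingency table of k events."""
    n = len(transactions)
    # present[t][i] is True when transaction t contains event i.
    present = [[frozenset(e) <= t for e in events] for t in transactions]
    marginals = [sum(row[i] for row in present) / n
                 for i in range(len(events))]

    observed = {}
    for row in present:
        key = tuple(row)
        observed[key] = observed.get(key, 0) + 1

    chi2 = 0.0
    for cell in product([True, False], repeat=len(events)):
        expected = n
        for flag, p in zip(cell, marginals):
            expected *= p if flag else 1 - p
        if expected > 0:  # a cell with zero expectation also has zero count
            chi2 += (observed.get(cell, 0) - expected) ** 2 / expected
    return chi2
```

For two events that always occur together in half of 100 transactions, every cell deviates by 25 from its expected count of 25, giving a statistic of 100; for two independent events the statistic is 0.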

10
Conditional association
  • Z → X1, …, Xk
  • Strength_Filter(z → x1, …, xk)
  • INTL ∧ BUSINESS_TRIP → CEO, FIRST_CLASS

11
Comparison association
  • Z1, Z2 → X
  • Strength_Filter(z1, z2 → x):
  • Dist(ψj(z1, x), ψj(z2, x)) ≥ mini_strj
  • INTL ∧ BUSINESS_TRIP, PRIVATE_TRIP → CEO ∧
    FIRST_CLASS
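
A comparison association can be sketched by evaluating the same strength function under two contexts and thresholding their distance. The slide does not say which strength function or which Dist is intended, so this sketch assumes confidence P(x|z) as the strength and absolute difference as Dist; both choices, the function names, and the toy data are illustrative only.

```python
def confidence(transactions, z, x):
    """P(x|z): share of transactions containing z that also contain x."""
    z, x = frozenset(z), frozenset(x)
    with_z = [t for t in transactions if z <= t]
    if not with_z:
        return 0.0
    return sum(1 for t in with_z if x <= t) / len(with_z)


def comparison_association(transactions, z1, z2, x, mini_str,
                           dist=lambda a, b: abs(a - b)):
    """True when the strength of x differs enough between z1 and z2."""
    return dist(confidence(transactions, z1, x),
                confidence(transactions, z2, x)) >= mini_str


# Toy trips (made up): 3 of 4 business trips are first class, 1 of 4 private.
trips = ([{"BUSINESS_TRIP", "FIRST_CLASS"}] * 3 + [{"BUSINESS_TRIP"}]
         + [{"PRIVATE_TRIP", "FIRST_CLASS"}] + [{"PRIVATE_TRIP"}] * 3)
print(comparison_association(trips, {"BUSINESS_TRIP"}, {"PRIVATE_TRIP"},
                             {"FIRST_CLASS"}, mini_str=0.4))
```

Swapping in a ratio for dist would express "x is at least r times as likely under z1 as under z2" without changing the rest of the code.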

12
Emerging association
  • Z1, Z2 → X1, …, Xk
  • Strength_Filter(z1, z2 → x1, …, xk):
  • Dist(ψj(z1, x1, …, xk), ψj(z2, x1, …, xk)) ≥
    mini_strj

13
Causal association
  • CCC rule: if events Z, X1, X2 are pairwise
    correlated, and if X1 and X2 are uncorrelated
    when conditioned on Z, one of the following
    causal relationships exists:
  • X1 ← Z → X2, X1 → Z → X2, X1 ← Z ← X2
  • Strength_Filter(z → x1, x2):
  • ψ1(∅, x1, x2) ≥ mini_str1 ∧ ψ1(∅, x1, z) ≥
    mini_str1 ∧
  • ψ1(∅, x2, z) ≥ mini_str1 ∧ ψ2(z, x1, x2) ≥
    mini_str2
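
The CCC test on this slide can be sketched with a 2x2 chi-squared statistic: three unconditional pairwise tests plus one test restricted to transactions that contain Z. This is a simplified illustration, not the authors' code. The function names are hypothetical, events are single items, conditioning looks only at transactions containing Z, and the two cutoffs are caller-supplied parameters. The uncorrelation condition 1/chi-squared ≥ 1/cutoff is written as chi-squared ≤ cutoff to avoid dividing by zero.

```python
def chi2_pair(transactions, a, b):
    """Chi-squared statistic of the 2x2 table for items a and b."""
    n = len(transactions)
    if n == 0:
        return 0.0
    n_a = sum(1 for t in transactions if a in t)
    n_b = sum(1 for t in transactions if b in t)
    n_ab = sum(1 for t in transactions if a in t and b in t)
    cells = [  # (observed, row total, column total)
        (n_ab, n_a, n_b),
        (n_a - n_ab, n_a, n - n_b),
        (n_b - n_ab, n - n_a, n_b),
        (n - n_a - n_b + n_ab, n - n_a, n - n_b),
    ]
    chi2 = 0.0
    for observed, row, col in cells:
        expected = row * col / n
        if expected > 0:
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def ccc_candidate(transactions, z, x1, x2, corr_cutoff, uncorr_cutoff):
    """Pairwise correlated, and x1, x2 uncorrelated given z."""
    pairs = ((x1, x2), (x1, z), (x2, z))
    pairwise = all(chi2_pair(transactions, a, b) >= corr_cutoff
                   for a, b in pairs)
    given_z = [t for t in transactions if z in t]
    return pairwise and chi2_pair(given_z, x1, x2) <= uncorr_cutoff
```

A positive result only says that one of the three causal structures is possible; it does not say which one.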

14
The specification language
  • ψj is a function of P(v), where v is a conjunction
    of any number of subject events Xi and zero or
    one context event Zj.
  • Supports P(v) are available from mining large
    events because X1…Xk Zi is large.
  • Individual-context assumption (ICA)
  • v satisfies the ICA if v is a conjunction of zero
    or more terms of the form Xi or ¬Xi, and zero or
    one term of the form Zj or ¬Zj.
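
One way to see why negated terms need no extra database scans is inclusion-exclusion: the probability of a conjunction with negations is a signed sum of supports of purely positive conjunctions, which are already known once the large events are mined. The slide does not spell this step out, so treat the sketch below as an assumption about how P(v) could be obtained; the function name and the support table are hypothetical.

```python
from itertools import combinations


def prob_with_negations(support_of, positive, negated):
    """P(all of `positive` occur and none of `negated` occur).

    support_of maps a frozenset of events to its support; the empty
    frozenset must map to 1.0. Uses inclusion-exclusion over subsets
    of the negated events.
    """
    positive = frozenset(positive)
    negated = list(negated)
    total = 0.0
    for r in range(len(negated) + 1):
        for subset in combinations(negated, r):
            total += (-1) ** r * support_of[positive | frozenset(subset)]
    return total


# Made-up supports for two events.
supports = {
    frozenset(): 1.0,
    frozenset({"x1"}): 0.5,
    frozenset({"x2"}): 0.4,
    frozenset({"x1", "x2"}): 0.3,
}
print(prob_with_negations(supports, {"x1"}, {"x2"}))  # P(x1 and not x2)
```

With these numbers P(x1 ¬x2) = 0.5 - 0.3 = 0.2 and P(¬x1 ¬x2) = 1 - 0.5 - 0.4 + 0.3 = 0.4.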

15
Implementation
  • Step 1: find all large events by applying Apriori
    or its variants.
  • Step 2: construct UDAs using large events.
  • In the k-th iteration, a k-tuple (X1, …, Xk, Zi) is
    generated from two (k-1)-tuples (X1, …, Xk-1, Xk)
    and (X1, …, Xk-1, Zi).
  • After generating all k-tuples, we construct a
    candidate UDA Z1, …, Zp → X1, …, Xk using p
    distinct tuples of the form (X1, …, Xk, Zi),
    i = 1, …, p.
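
Step 2 can be sketched as a join followed by a grouping. The code below is a minimal illustration under stated assumptions: events are atomic labels, subject tuples are kept in sorted order, the names join_tuples and candidate_udas are hypothetical, and the support and strength checks that would prune tuples and candidates are omitted.

```python
from collections import defaultdict
from itertools import combinations


def join_tuples(subject_tuples, context_tuples):
    """Join (X1..Xk-1, Xk) with (X1..Xk-1, Zi) into (X1..Xk, Zi).

    subject_tuples: set of sorted tuples (X1, ..., Xk)
    context_tuples: set of pairs ((X1, ..., Xk-1), Zi)
    """
    joined = set()
    for subjects in subject_tuples:
        prefix = subjects[:-1]
        for ctx_subjects, z in context_tuples:
            if ctx_subjects == prefix and z not in subjects:
                joined.add((subjects, z))
    return joined


def candidate_udas(joined, p):
    """Combine p distinct tuples sharing the same subjects into candidates."""
    contexts_by_subjects = defaultdict(set)
    for subjects, z in joined:
        contexts_by_subjects[subjects].add(z)
    return {(contexts, subjects)
            for subjects, zs in contexts_by_subjects.items()
            for contexts in combinations(sorted(zs), p)}


pairs = join_tuples({("A", "B")},
                    {(("A",), "Z1"), (("A",), "Z2"), (("C",), "Z1")})
print(candidate_udas(pairs, 2))
```

Each candidate is a pair (contexts, subjects), read as Z1, …, Zp → X1, …, Xk; a full implementation would then evaluate the UDA specification on each one.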

16
Experiments
  • Census dataset: 63 items, 126,229 transactions.
  • Each transaction represents one individual.
  • One item represents a boolean descriptor.

17
Experiments
  • Some multiway correlations found
  • Some conditional associations found

18
Experiments
  • Some comparison associations found
  • Some emerging associations found

19
Experiments
  • Some CCC causal associations found