Transcript and Presenter's Notes

Title: Association Rule Clustering


1
Association Rule Clustering
  • Term Project Presentation 04-May-1999
  • Gunjan K. Gupta
  • Alexander Strehl

2
Presentation Overview
  • Introduction to Data Mining and Association Rules
  • Association Rule Clustering
    • Motivation
    • Approach
  • Distance Metrics Based on
    • Rule Features
    • Original Transactions
    • Transaction Probabilities
  • Clustering Using
    • Agglomerative Clustering using Chaining
    • Multi-dimensional Scaling and SOM
  • Results on Real Data
    • Intuitive Results of Each Clustering Technique
    • Quantitative Comparison of Techniques

3
Data Mining on Relational DBS
  • Taxonomy
    • Item (Attribute): I
    • Set of all Items I (Relation Schema): R
    • Transaction (Row): t
    • Database of all Transactions t (Relation): r
    • Particular Item-set: X
    • All Transactions t Containing X: m(X)

4
Association Rules
  • Obtained using the Apriori algorithm
  • Left-hand-side (LHS)
  • Right-hand-side (RHS)
  • Both Sides (BS)
  • Support
  • Confidence (both measures are sketched in code below)
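
As a concrete illustration of support and confidence, here is a minimal sketch; the items and transactions are invented for illustration and are not data from the presentation.

```python
# Minimal sketch: support and confidence of an association rule LHS -> RHS.
# Items and transactions below are invented for illustration only.
transactions = [
    {"light-fixture", "bulb-incandescent"},
    {"light-fixture", "bulb-fluorescent"},
    {"light-fixture", "bulb-incandescent", "switch"},
    {"switch", "wall-plate"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Support of LHS-and-RHS relative to the support of the LHS alone."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

lhs, rhs = {"light-fixture"}, {"bulb-incandescent"}
print(support(lhs | rhs, transactions))    # 0.5
print(confidence(lhs, rhs, transactions))  # 0.666...
```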

5
Clustering Association Rules
  • Scenario
  • We use data provided by Knowledge Discovery One
  • Discovering association rules is a standard and
    very popular data mining technique
  • The set of rules discovered may be very large
    (> 10,000)
  • Problem
  • Too many rules for (manual) user
    interpretation
  • Clustering can help in browsing, visualizing,
    ordering, pruning, merging of rules
  • There is no intuitive distance metric for rules
  • Approach
  • Rules are similar when they hold in similar
    settings (such as transactions or customers)
  • Find good distance metrics and clustering
    techniques

6
Direct Distance Metrics
  • Association Rule Features
  • Confidence
  • Support
  • Lift
  • LHS, RHS, BS bit-vectors
  • LHS, RHS, BS counts
  • Domain specific features such as average
    revenue/margin per covered transaction
  • Distance Defined in Terms of Features
  • Bit-vector Hamming is too coarse (discrete)
  • Most others have little relevance
  • Do not capture the interaction on the data

7
Indirect Distance Metrics
  • Rules About Same Items are Considered Equal
  • Transaction Distance
  • Database size dependent, strong correlation to
    support and confidence
  • Conditional Probability Distance (see the sketch below)
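
The transcript does not reproduce the formulas for these two metrics. The sketch below shows one plausible transaction-based distance, a Jaccard-style overlap of the transactions covered by the two rules, purely as an illustrative assumption.

```python
# Illustrative assumption, not the slide's exact metric: a transaction-based
# distance between two rules, defined via the overlap of the transactions
# that each rule's item set covers.
def covered(itemset, transactions):
    """Indices of transactions containing every item of the rule's item set."""
    return {i for i, t in enumerate(transactions) if itemset <= t}

def transaction_distance(items_a, items_b, transactions):
    """0 -> both rules hold in exactly the same transactions,
    1 -> they never co-occur; values in between are the interesting ones."""
    ca, cb = covered(items_a, transactions), covered(items_b, transactions)
    if not ca or not cb:
        return 1.0
    return 1.0 - len(ca & cb) / len(ca | cb)
```

A conditional-probability variant would normalize the overlap by the coverage of one rule, e.g. |ca ∩ cb| / |ca|, rather than by the union.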

8
Good Neighbors
  • Rules at an Interesting Distance are Good
    Neighbors
  • Distance 0: the rules apply to the same transactions
  • Distance 1: no common occurrence
  • Interesting distances are neither 0 nor 1 (see the
    counting sketch below)
  • Subset relationship in the rules' item-sets
  • Meta-association rules
  • Histogram over 50 bins (highest bin with 1,650,000)
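
A small sketch of how good neighbors could be counted from the pairwise distance matrix, assuming distances normalized to [0, 1] as in the bullets above (illustrative, not the authors' code):

```python
import numpy as np

# Count each rule's good neighbors: entries of the distance matrix D that are
# neither (numerically) 0 nor 1.  D is an (n, n) symmetric matrix in [0, 1].
def good_neighbor_counts(D, eps=1e-9):
    interesting = (D > eps) & (D < 1.0 - eps)
    np.fill_diagonal(interesting, False)   # a rule is not its own neighbor
    return interesting.sum(axis=1)
```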

9
Results
  • Home Improvement Data Set
  • 172,000 cash register transactions
  • 2831 frequent item sets
  • 4782 association rules
  • 1311 association rules after clustering by BS
  • Distance Statistics
  • 1311x1311 matrix
  • Median distance is 1
  • Mean distance is 0.9943 (0.9992 maximum)
  • 92.97% of distances are 1 (99.92% maximum)
  • Most rules have between 1 and 20 good neighbors

10
DiVis Examples I
  • Transaction Distance vs. Conditional
    Probability Distance
  • Rule 738 (278533): LIGHTING-FIXTURES-FLUORES.-UTILITY,
    LIGHT-BULBS-INCANDESCENT
  • Rule 606 (277617): LIGHTING-FIXTURES-FLUORES.-UTILITY,
    LIGHT-BULBS-FLUORESCENT

11
DiVis Examples II
Rule 761 has 273 (20.8238%) good neighbors:
  17050 ELECTRICAL-OUTLETS/SWITCHES, 17140 ELECTRICAL-FITTINGS
Rule 282:
  17050 ELECTRICAL-OUTLETS/SWITCHES, 17140 ELECTRICAL-FITTINGS,
  17090 ELECTRICAL-BOXES--COVERS
Rule 144 has 130 (9.9161%) good neighbors:
  18408 FASTENERS---PACKAGED, 17090 ELECTRICAL-BOXES--COVERS
Rule 330:
  17030 ELECTRICAL-WALL-PLATES, 18408 FASTENERS---PACKAGED,
  17090 ELECTRICAL-BOXES--COVERS
12
Agglomerative Clustering
  • A Hierarchical Clustering Technique which
    generates a tree structure.
  • Many different variations available
  • Chaining: O(N²)
  • Single Link: O(N²)
  • Vertical length of branches is defined, giving
    height information for clusters
  • Height represents cluster compactness
  • Splitting along a particular height results in
    clusters of approximately equal compactness.
  • Multiple resolutions of clusters available.

13
Agglomerative Chaining
  • Centroid used as cluster center.
  • Centroids of clusters at level N used for forming
    clusters of level N+1
  • Nearest neighbor linking.
  • Merging two clusters with a link between them.
  • Unique clusters at each level.
  • Cluster Width used as the height of the node.
  • Cluster width increases when the clusters are
    merged to the next level.
  • An example - (a code sketch follows below)
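
The example figure itself is not reproduced in the transcript. As a stand-in, here is a minimal sketch of one chaining level (not the authors' code); it assumes the cluster centers have explicit coordinates, whereas the next two slides show how to estimate centroid distances when only a distance matrix is available.

```python
import numpy as np

# Minimal sketch of one level of agglomerative "chaining": link every cluster
# to its nearest neighbor, merge the connected groups, and use centroids as
# the new cluster centers for level N+1.
def one_chaining_level(centroids):
    """centroids: (n, d) numpy array of level-N cluster centers."""
    n = len(centroids)
    D = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    nearest = D.argmin(axis=1)          # nearest-neighbor link per cluster

    # Merge clusters connected by links (union-find over the link graph).
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in enumerate(nearest):
        parent[find(i)] = find(int(j))

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    new_centroids = np.array([centroids[g].mean(axis=0) for g in groups.values()])
    return new_centroids, list(groups.values())
```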

14
Agglomerative Chaining continued...
15
Centroid estimation without the coordinates.
  • Generic equation for a center

d_k(i,j) = α_i d_ki + α_j d_kj + β d_ij + γ |d_ki - d_kj|

For the centroid: α_i = n_i/(n_i + n_j), α_j = n_j/(n_i + n_j),
β = -α_i α_j, γ = 0

For three points:
Centroid_k(i,j) = 0.5 (d_ki + d_kj) - 0.25 d_ij
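
A quick numerical check (illustrative, not from the presentation): with squared Euclidean distances the three-point formula is exact; the slides apply the same form directly to the rule distances as an estimate.

```python
import numpy as np

# Illustrative check: with *squared* Euclidean distances the three-point
# centroid formula above is exact,
#   d(k, centroid(i, j))^2 = 0.5*d_ki^2 + 0.5*d_kj^2 - 0.25*d_ij^2.
rng = np.random.default_rng(0)
i, j, k = rng.normal(size=(3, 4))            # three random points in 4-D

d2 = lambda a, b: np.sum((a - b) ** 2)       # squared distance
estimate = 0.5 * (d2(k, i) + d2(k, j)) - 0.25 * d2(i, j)
exact = d2(k, (i + j) / 2.0)
print(np.isclose(estimate, exact))           # True
```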
16
Centroid estimation continued ...
  • Approximating the above and applying it to our
    problem, we can estimate distances from the
    centroids of clusters at level N+1 using centroid
    distances of level N

d_ab = C_ab - 0.5 (A_a + A_b)
C_ab = 1/(n_a n_b) Σ_{i=1..n_a} Σ_{j=1..n_b} d_ij
A_a = 1/n_a² Σ_{i=1..n_a} Σ_{j=1..n_a} d_ij
A_b = 1/n_b² Σ_{i=1..n_b} Σ_{j=1..n_b} d_ij
  • As we can see, for two identical (superimposed)
    clusters the distance is zero.
  • An example - (see the estimator sketch below)
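
A sketch of the estimator above, assuming only the pairwise distance matrix D between the level-N members is available (illustrative code, not the authors'):

```python
import numpy as np

# Estimate the distance between the centroids of two clusters a and b,
# given only the pairwise distance matrix D over their members
# (index lists a and b into D), following d_ab = C_ab - 0.5 (A_a + A_b).
def centroid_distance_estimate(D, a, b):
    C_ab = D[np.ix_(a, b)].mean()   # average cross-cluster distance
    A_a = D[np.ix_(a, a)].mean()    # average within-cluster distance of a
    A_b = D[np.ix_(b, b)].mean()    # average within-cluster distance of b
    return C_ab - 0.5 * (A_a + A_b)
```

For two identical index sets the three averages coincide, so the estimate is exactly zero, matching the bullet above.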

17
Performance of the centroid distance estimator on
simulation
One example of the runs, on 10 points (see Slide
14).
  • Order preserved.
  • Zero distance for distance of the cluster on
    itself.
  • Almost identical clusters for up to 100 points.
  • Centroid estimator works better for higher
    dimensions.
  • Clustering results 99% identical even for 1000
    points.

18
Tree Splitting for Agglomerative Clustering
  • Height of a node in the tree is defined as the
    average cluster width; variations like minimum or
    maximum are possible.
  • Height increases for successive levels, but not
    equally for all clusters.
  • Splitting at any height gives the final clusters.
  • Useful splitting points are only the unique
    heights over all the nodes together.
  • Useful splitting points are calculated and stored
    in an array.
  • For a given required number of clusters K, the
    split with a cluster count closest to K is
    returned without compromising on cluster quality
    (see the sketch below).
  • Splitting closer to root results in low
    resolution, near leaf results in high resolution.
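
A sketch of the splitting mechanics described above (an illustrative reading of the bullets, not the authors' code): each tree node contributes its height and the number of clusters it merged, and the requested K picks the nearest useful split.

```python
# Illustrative assumption: each internal node of the agglomerative tree has a
# height (its cluster width) and a fan-in (how many clusters it merged).
# Cutting the tree at height h yields
#   n_leaves - sum(fan_in - 1 for nodes with height <= h)
# clusters, so the unique node heights are the only useful splitting points.
def useful_splits(nodes, n_leaves):
    """nodes: list of (height, fan_in).  Returns (height, cluster count) pairs."""
    splits = []
    for h in sorted({height for height, _ in nodes}):
        merged = sum(fan_in - 1 for height, fan_in in nodes if height <= h)
        splits.append((h, n_leaves - merged))
    return splits

def choose_split(nodes, n_leaves, k):
    """Pick the useful split whose cluster count is closest to the requested K."""
    return min(useful_splits(nodes, n_leaves), key=lambda hc: abs(hc[1] - k))
```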

19
Agglomerative Tree results on a small rule set of
genuine market data -
  • 19 major categories for 1311 itemsets at the top
    level.
  • 953 clusters at the lowest level of split.
  • 289 unique splitting points in between.
  • Many good clusters by visual inspection with one
    kind of rule class in it.
  • Some examples - painting raw-material,
    construction tools, electrical tools, electrical
    supply, christmas-tree construction material,
    plumbing-materials.
  • Separation of rules into categories helps in
    visualization.

20
SOM Training with Distance Matrix
  • Multidimensional scaling: conversion of the N×N
    distance matrix into an N×M vector input space
    (N points of M dimensions each).
  • Each M-dimensional point is input to a SOM.
  • Choosing a 2-D SOM grid of size K×K and mapping
    the N points onto the SOM.

Multi-Dimensional Scaling
  • From matrix M, find the first N singular values
    and vectors (SVD). They represent the N most
    important dimensions.
  • Choose the first L dimensions.
  • Regenerate the distance matrix M'.
  • Calculate the error between M and M'. If the
    error is less than a threshold, stop; else try a
    higher number of dimensions.

21
Multi-Dimensional Scaling Continued...
  • Error Measure - Stress

Stress = Σ (M - M')² / Σ M²   (elementwise over the matrices)
  • For simulated N-d data, the Stress drops
    exponentially and suddenly goes very close to 0
    for L = N.
  • For our data, Stress drops below 5% only for a
    very large value of L.
  • A shortened binary search finds L = 750 in 3
    steps; plain binary search finds L = 757 in more
    than double the steps (see the sketch below).

[Plot: Stress vs. number of dimensions; 2.3% error at 750 dimensions]
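
The transcript omits the scaling code itself. Below is a minimal sketch under the assumption that the N×N distance matrix is reduced with a truncated SVD, the matrix is regenerated from the leading components, and the stress above drives a binary search over L; this is an illustrative reading of the slides, not the authors' implementation.

```python
import numpy as np

# Minimal sketch: keep the L leading components of an SVD of the N x N
# distance matrix M, regenerate M' from them, and compute the stress.
def truncated_svd(M, L):
    U, s, Vt = np.linalg.svd(M)
    M_prime = U[:, :L] @ np.diag(s[:L]) @ Vt[:L, :]   # rank-L reconstruction
    points = U[:, :L] * s[:L]                         # N points of L dims (SOM input)
    return M_prime, points

def stress(M, M_prime):
    """Stress = sum((M - M')^2) / sum(M^2), elementwise."""
    return np.sum((M - M_prime) ** 2) / np.sum(M ** 2)

def smallest_good_L(M, threshold=0.05):
    """Plain binary search for the smallest L whose stress is below the
    threshold; the slide's shortened variant reaches L = 750 in fewer steps."""
    lo, hi = 1, M.shape[0]
    while lo < hi:
        mid = (lo + hi) // 2
        M_prime, _ = truncated_svd(M, mid)
        if stress(M, M_prime) < threshold:
            hi = mid
        else:
            lo = mid + 1
    return lo
```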
22
Combining SOM Training Results with Agglomerative
Clustering
  • Overlap might be taken as a sign of good clusters
    since SOM and agglomerative clustering should
    have different biases.
  • 715 clusters for one splitting of Agglomerative
    Clustering tree.
  • Training 1000 points, 750 dimensions.
  • 1000 Input points mapped to a 2-D SOM of 27x27
    (729 output classes) would give best comparison.
  • Preliminary comparison for 10x10 SOM.
  • 40,000 epochs.
  • Coloring SOM-classified points with the class
    labels of the agglomerative clustering provides
    easy visualization (a tallying sketch follows
    below). See example -

23
SOM Training Results, 1000 to 100 mapping
24
A randomly picked cluster ..
Rule 278932 (278932): 16846,PAINT BRUSHES (new);
  16926,PAINT-INTERIOR-ONE ONLY (new);
  16844,PAINTING DROPCLOTHS (new)
Rule 278899 (278899): 12328,TRIM-A-TREE-ORNAMENTS-GLASS SATIN (new);
  12331,TRIM-A-TREE-IMPORT THEME ORNAMENTS (new);
  12336,TRIM-A-TREE-MISC. CHRISTMAS ITEMS (new)
Rule 278963 (278963): 17131,ELECTRICAL-WIRE/CABLE NM/UF RET CTN (new);
  17090,ELECTRICAL-BOXES COVERS (new);
  17152,ELECTRICAL-CONNECTORS/TERMINALS (new);
  17050,ELECTRICAL-OUTLETS/SWITCHES (new)
25
A Cluster from Agglomerative Clustering
Rule 278932 (278932): 16846,PAINT BRUSHES (new);
  16926,PAINT-INTERIOR-ONE ONLY (new);
  16844,PAINTING DROPCLOTHS (new)
Rule 278773 (278773): 16846,PAINT BRUSHES (new);
  16871,PAINTING ACC. - SHURLINE (new);
  16730,PAINT-MISC SUNDRY ITEMS (new)
Rule 277258 (277258): 16840,PAINTING ACCESSORY ITEMS (new);
  16926,PAINT-INTERIOR-ONE ONLY (new);
  16844,PAINTING DROPCLOTHS (new)
Rule 277269 (277269): 16871,PAINTING ACC. - SHURLINE (new);
  16926,PAINT-INTERIOR-ONE ONLY (new);
  16844,PAINTING DROPCLOTHS (new)
Rule 277433 (277433): 16871,PAINTING ACC. - SHURLINE (new);
  16730,PAINT-MISC SUNDRY ITEMS (new)
Rule 277435 (277435): ... and so on (insufficient space to show here).
26
A Cluster (3,7) from SOM results
27
Conclusions and Future Work
  • Conclusions
  • Sparse and high-dimensional data
  • Future Work
  • Complexity and scalability issues
  • Sub-sampling for distance computation
  • Merge similar rules
  • Incorporate meta-data for validation or to
    support clustering and merging
  • Explore other distance measures (log-likelihood
    functions instead of probabilities)
  • More work on SOM coloring - use hierarchical
    coloring. Also interactive visualization of rules
    in SOM result.
  • Reference
  • Cluster Analysis, Brian Everitt
  • Applied Multivariate Statistical Analysis,
    Johnson & Wichern