Title: Association Rule Clustering
1. Association Rule Clustering
- Term Project Presentation 04-May-1999
- Gunjan K. Gupta
- Alexander Strehl
2Presentation Overview
- Introduction to Data Mining and Association Rules
- Association Rule Clustering
- Motivation
- Approach
- Distance Metrics Based on
- Rule Features
- Original Transactions
- Transaction Probabilities
- Clustering Using
- Agglomerative Clustering using Chaining
- Multi-dimensional Scaling and SOM
- Results on Real Data
- Intuitive Results of Each Clustering Technique
- Quantitative Comparison of Techniques
3. Data Mining on Relational DBs
- Taxonomy
- Item (attribute): I
- Set of all items I (relation schema): R
- Transaction (row): t
- Database of all transactions t (relation): r
- Particular item-set: X
- All transactions t containing X: m(X)
4. Association Rules
- Obtained using the Apriori algorithm
- Left-hand-side (LHS)
- Right-hand-side (RHS)
- Both Sides (BS)
- Support
- Confidence (both illustrated in the sketch below)
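Below is a minimal sketch of how support and confidence are computed for a rule LHS → RHS; the toy transactions and item names are invented for illustration and are not from the project's data set.

```python
# Sketch: support and confidence of a rule LHS -> RHS over a toy transaction set.
transactions = [
    {"paint", "brushes", "dropcloth"},
    {"paint", "brushes"},
    {"bulbs", "fixtures"},
    {"paint", "dropcloth"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item of the item-set."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs, transactions):
    """P(RHS | LHS): support of LHS union RHS divided by support of LHS."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

print(support({"paint", "brushes"}, transactions))      # 0.5
print(confidence({"paint"}, {"brushes"}, transactions))  # 0.666...
```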
5. Clustering Association Rules
- Scenario
- We use data provided by Knowledge Discovery One
- Discovering association rules is a standard and very popular data mining technique
- The set of rules discovered may be very large (> 10,000)
- Problem
- Too many rules for manual user interpretation
- Clustering can help in browsing, visualizing, ordering, pruning, and merging of rules
- There is no intuitive distance metric for rules
- Approach
- Rules are similar when they hold in similar settings (such as transactions or customers)
- Find good distance metrics and clustering techniques
6. Direct Distance Metrics
- Association Rule Features
- Confidence
- Support
- Lift
- LHS, RHS, BS bit-vectors
- LHS, RHS, BS counts
- Domain-specific features such as average revenue/margin per covered transaction
- Distance Defined in Terms of Features
- Bit-vector Hamming distance is too coarse (discrete; see the small sketch below)
- Most others have little relevance
- They do not capture the rules' interaction with the data
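To make the coarseness point concrete, here is a small sketch (with an invented item universe and rules) of the Hamming distance between BS bit-vectors: it can take only a handful of discrete values, no matter how the rules actually behave on the data.

```python
import numpy as np

items = ["paint", "brushes", "dropcloth", "bulbs", "fixtures"]  # toy item universe

def bs_bitvector(rule_items):
    """Bit-vector marking which items occur on either side (BS) of a rule."""
    return np.array([1 if it in rule_items else 0 for it in items])

rule_a = bs_bitvector({"paint", "brushes"})
rule_b = bs_bitvector({"paint", "dropcloth"})

# Only len(items) + 1 distinct distance values are possible, hence "too coarse".
print(int(np.sum(rule_a != rule_b)))  # 2
```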
7. Indirect Distance Metrics
- Rules about the same items are considered equal
- Transaction Distance (a sketch follows below)
- Database-size dependent; strongly correlated with support and confidence
- Conditional Probability Distance
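A sketch of a transaction-based distance follows, assuming a Jaccard-style definition consistent with the next slide (distance 0 when two rules cover exactly the same transactions, distance 1 when they never co-occur); the exact formulas used in the project, including the conditional probability distance, are not reproduced here, and the helper names are ours.

```python
def covering_set(itemset, transactions):
    """m(X): ids of transactions that contain every item of the rule's item-set (BS)."""
    return {tid for tid, t in enumerate(transactions) if itemset <= t}

def transaction_distance(bs1, bs2, transactions):
    """1 - |m(X1) & m(X2)| / |m(X1) | m(X2)|: 0 for identical coverage, 1 for disjoint."""
    m1, m2 = covering_set(bs1, transactions), covering_set(bs2, transactions)
    union = m1 | m2
    if not union:
        return 1.0
    return 1.0 - len(m1 & m2) / len(union)
```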
8. Good Neighbors
- Rules at an interesting distance are good neighbors
- Distance 0: the rules apply to the same transactions
- Distance 1: no common occurrence
- Interesting distances are neither 0 nor 1
- Subset relationships in the rules' item-sets
- Meta-association rules
- Histogram with 50 bins (highest bin with 1,650,000); a neighbor-counting sketch follows below
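A sketch of counting good neighbors from a precomputed distance matrix, directly following the definition above (neighbors at distances strictly between 0 and 1); the tolerance eps is our own choice.

```python
import numpy as np

def good_neighbor_counts(D, eps=1e-9):
    """For each rule, count rules at an 'interesting' distance (neither 0 nor 1)."""
    interesting = (D > eps) & (D < 1.0 - eps)
    np.fill_diagonal(interesting, False)   # a rule is not its own good neighbor
    return interesting.sum(axis=1)
```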
9. Results
- Home Improvement Data Set
- 172,000 cash register transactions
- 2831 frequent item sets
- 4782 association rules
- 1311 association rules after clustering by BS
- Distance Statistics
- 1311x1311 matrix
- Median distance is 1
- Mean distance is 0.9943 (0.9992 maximum)
- 92.97% of distances are 1 (99.92% maximum)
- Most rules have between 1 and 20 good neighbors
10. DiVis Examples I
- Transaction Distance vs. Conditional Probability Distance
- Rule 738 (278533)
- LIGHTING-FIXTURES-FLUORES.-UTILITY
- LIGHT-BULBS-INCANDESCENT
- Rule 606 (277617)
- LIGHTING-FIXTURES-FLUORES.-UTILITY
- LIGHT-BULBS-FLUORESCENT
11. DiVis Examples II
- Rule 761 has 273 (20.8238%) good neighbors: 17050 ELECTRICAL-OUTLETS/SWITCHES; 17140 ELECTRICAL-FITTINGS
- Rule 282: 17050 ELECTRICAL-OUTLETS/SWITCHES; 17140 ELECTRICAL-FITTINGS; 17090 ELECTRICAL-BOXES--COVERS
- Rule 144 has 130 (9.9161%) good neighbors: 18408 FASTENERS---PACKAGED; 17090 ELECTRICAL-BOXES--COVERS
- Rule 330: 17030 ELECTRICAL-WALL-PLATES; 18408 FASTENERS---PACKAGED; 17090 ELECTRICAL-BOXES--COVERS
12. Agglomerative Clustering
- A hierarchical clustering technique which generates a tree structure.
- Many different variations available.
- Chaining: O(N²)
- Single link: O(N²)
- Vertical length of branches is defined, resulting in height information for clusters.
- Height represents cluster compactness.
- Splitting along a particular height results in clusters of approximately equal compactness.
- Multiple resolutions of clusters available.
13. Agglomerative Chaining
- Centroid used as cluster center.
- Centroids of clusters at level N used for forming clusters of level N+1.
- Nearest-neighbor linking.
- Merging two clusters with a link between them.
- Unique clusters at each level.
- Cluster width used as the height of the node.
- Cluster width increases when clusters are merged to the next level.
- An example follows on the next slide; a code sketch of one chaining level is given after it.
14. Agglomerative Chaining (continued)
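A minimal sketch of one chaining level, under our reading of the slides: link every cluster to its nearest neighbor, take the connected components of these links as the next-level clusters, and record the largest within-cluster distance as the cluster width.

```python
import numpy as np

def chain_one_level(D):
    """One chaining level over the current clusters' distance matrix D.

    Returns (groups, widths): member index lists and each group's width
    (maximum pairwise distance inside the group).
    """
    k = D.shape[0]
    parent = list(range(k))

    def find(x):                              # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    Dnn = D + np.diag([np.inf] * k)           # ignore self-distances
    for i in range(k):
        j = int(np.argmin(Dnn[i]))            # nearest neighbor of cluster i
        parent[find(i)] = find(j)             # link i -- j

    comps = {}
    for i in range(k):
        comps.setdefault(find(i), []).append(i)
    groups = list(comps.values())
    widths = [max((D[a, b] for a in g for b in g if a != b), default=0.0) for g in groups]
    return groups, widths
```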
15. Centroid Estimation Without Coordinates
- Generic update equation for the distance from a cluster k to the merged cluster (i, j):
  d_k(i,j) = α_i·d_ki + α_j·d_kj + β·d_ij + γ·|d_ki − d_kj|
- For the centroid: α_i = n_i / (n_i + n_j), α_j = n_j / (n_i + n_j), β = −α_i·α_j, γ = 0
- For three points:
  Centroid_k(i,j) = 0.5·(d_ki + d_kj) − 0.25·d_ij
16. Centroid Estimation (continued)
- Approximating the above and applying it to our problem, we can estimate the distances between centroids of level-(N+1) clusters from the pairwise distances of level N:
  d_ab = C_ab − 0.5·(A_a + A_b)
  C_ab = 1/(n_a·n_b) · Σ_{i=1..n_a} Σ_{j=1..n_b} d_ij
  A_a = 1/(n_a²) · Σ_{i=1..n_a} Σ_{j=1..n_a} d_ij
  A_b = 1/(n_b²) · Σ_{i=1..n_b} Σ_{j=1..n_b} d_ij
- As we can see, for two identical (superimposed) clusters the estimated distance is zero.
- An example follows; a code sketch of the estimator is given below.
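A sketch of the estimator above: the distance between two cluster centroids computed from pairwise point distances only, with no access to coordinates. The function and variable names are ours.

```python
import numpy as np

def estimated_centroid_distance(D, a, b):
    """d_ab = C_ab - 0.5 * (A_a + A_b), estimated from pairwise distances alone.

    D: full pairwise distance matrix over all points.
    a, b: lists of point indices belonging to clusters a and b.
    """
    C_ab = D[np.ix_(a, b)].mean()   # mean cross-cluster distance
    A_a = D[np.ix_(a, a)].mean()    # mean within-cluster distance of a
    A_b = D[np.ix_(b, b)].mean()
    return C_ab - 0.5 * (A_a + A_b)
```

For two identical (superimposed) clusters, C_ab equals A_a and A_b, so the estimate is zero, as noted above.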
17. Performance of the Centroid Distance Estimator in Simulation
One example run, on 10 points (see slide 14).
- Order preserved.
- Zero distance for the distance of a cluster to itself.
- Almost identical clusters for up to 100 points.
- The centroid estimator works better in higher dimensions.
- Clustering results 99% identical even for 1000 points.
18. Tree Splitting for Agglomerative Clustering
- Height of a node in the tree is defined as the average cluster width; variations such as minimum or maximum are possible.
- Height increases for successive levels, but not equally for all clusters.
- Splitting at any height gives the final clusters.
- The useful splitting points are only the unique heights over all nodes taken together.
- Useful splitting points are calculated and stored in an array.
- For a given required number of clusters K, the split whose cluster count is closest to K is returned without compromising cluster quality.
- Splitting closer to the root gives low resolution; splitting near the leaves gives high resolution (a split-selection sketch follows below).
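A sketch of the split selection, assuming (purely for illustration) that the tree is available as a SciPy linkage matrix rather than in the project's own structure: the candidate thresholds are the unique node heights, and the split whose cluster count is closest to the requested K is returned.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster

def split_closest_to_k(Z, K):
    """Pick the unique tree height whose flat clustering size is closest to K.

    Z: SciPy linkage matrix; column 2 holds the merge heights.
    """
    heights = np.unique(Z[:, 2])              # the useful splitting points
    best_labels, best_gap = None, None
    for h in heights:
        labels = fcluster(Z, t=h, criterion="distance")
        gap = abs(int(labels.max()) - K)      # labels.max() == number of clusters
        if best_gap is None or gap < best_gap:
            best_labels, best_gap = labels, gap
    return best_labels
```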
19. Agglomerative Tree Results on a Small Rule Set of Genuine Market Data
- 19 major categories for 1311 item-sets at the top level.
- 953 clusters at the lowest level of the split.
- 289 unique splitting points in between.
- Many good clusters by visual inspection, each containing one kind of rule class.
- Some examples: painting raw materials, construction tools, electrical tools, electrical supplies, Christmas-tree construction material, plumbing materials.
- Separation of rules into categories helps in visualization.
20. SOM Training with Distance Matrix
- Multi-dimensional scaling: conversion of the N×N distance matrix into an N×M vector input space (N points of M dimensions each).
- Each M-dimensional point is input to a SOM.
- Choose a 2-D SOM grid of K×K size and map the points onto the SOM.
Multi-Dimensional Scaling
- From matrix M, find the first N singular values/vectors (SVD); they represent the N most important dimensions.
- Choose the first L dimensions.
- Regenerate the distance matrix M'.
- Calculate the error between M and M'; if the error is less than a threshold, stop, else try more dimensions (a classical-MDS sketch follows below).
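A sketch of the matrix-to-vector-space step using classical MDS (double-centering plus eigendecomposition), which is one standard way to realize the procedure described above; the exact variant used in the project is not specified here, and the function names classical_mds and stress are ours.

```python
import numpy as np

def classical_mds(D, L):
    """Embed an NxN distance matrix D into L dimensions (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:L]              # keep the L largest
    w, V = np.clip(w[idx], 0.0, None), V[:, idx]
    return V * np.sqrt(w)                      # N x L coordinates

def stress(D, X):
    """Stress = sum((D - D')^2) / sum(D^2) for the regenerated distances D'."""
    Dp = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    return ((D - Dp) ** 2).sum() / (D ** 2).sum()
```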
21. Multi-Dimensional Scaling (continued)
Stress = Σ (M_ij − M'_ij)² / Σ M_ij²
- For simulated N-dimensional data, the stress drops exponentially and falls very close to 0 at L = N.
- For our data, stress drops below 5% only for a very large value of L.
- A shortened binary search finds L = 750 in 3 steps; a plain binary search finds 757 in more than double the steps (a plain binary-search sketch follows below).
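A sketch of searching for the smallest L with stress below a threshold, using a plain binary search over the number of dimensions (the shortened variant mentioned above is not reproduced); it reuses the classical_mds and stress helpers sketched on the previous slide and assumes stress decreases monotonically with L.

```python
def find_dimensionality(D, threshold=0.05):
    """Smallest L whose MDS embedding has stress below threshold (binary search)."""
    lo, hi = 1, D.shape[0]
    best = hi
    while lo <= hi:
        mid = (lo + hi) // 2
        if stress(D, classical_mds(D, mid)) < threshold:
            best, hi = mid, mid - 1            # feasible: try fewer dimensions
        else:
            lo = mid + 1
    return best
```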
[Plot: stress vs. number of dimensions; about 2.3% error at 750 dimensions.]
22. Combining SOM Training Results with Agglomerative Clustering
- Overlap might be taken as a sign of good clusters, since SOM and agglomerative clustering should have different biases.
- 715 clusters for one splitting of the agglomerative clustering tree.
- Training: 1000 points, 750 dimensions.
- 1000 input points mapped to a 2-D SOM of 27×27 (729 output classes) would give the best comparison.
- Preliminary comparison for a 10×10 SOM.
- 40,000 epochs.
- Coloring SOM-classified points with the class labels from agglomerative clustering provides easy visualization; see the example, and the training sketch below.
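A minimal from-scratch SOM training sketch on the MDS-embedded points; this is not the project's actual implementation, and the grid size, learning rate, and neighborhood schedule are illustrative defaults.

```python
import numpy as np

def train_som(X, grid=10, epochs=40000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a grid x grid SOM on the rows of X; return codebook and winner cells."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(grid, grid, X.shape[1]))            # codebook vectors
    gy, gx = np.mgrid[0:grid, 0:grid]
    for t in range(epochs):
        frac = t / epochs
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 0.5
        x = X[rng.integers(len(X))]                           # one random sample per step
        d = ((W - x) ** 2).sum(axis=2)
        by, bx = np.unravel_index(np.argmin(d), d.shape)      # best matching unit
        h = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2 * sigma ** 2))
        W += lr * h[:, :, None] * (x - W)                     # neighborhood update
    winners = [np.unravel_index(np.argmin(((W - x) ** 2).sum(axis=2)), (grid, grid))
               for x in X]
    return W, winners
```

Coloring each SOM cell by the agglomerative-cluster labels of the points it wins gives the visualization described above.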
23. SOM Training Results, 1000-to-100 Mapping
24. A Randomly Picked Cluster
- Rule 278932 (278932): 16846,PAINT BRUSHES (new); 16926,PAINT-INTERIOR-ONE ONLY (new); 16844,PAINTING DROPCLOTHS (new)
- Rule 278899 (278899): 12328,TRIM-A-TREE-ORNAMENTS-GLASS SATIN (new); 12331,TRIM-A-TREE-IMPORT THEME ORNAMENTS (new); 12336,TRIM-A-TREE-MISC. CHRISTMAS ITEMS (new)
- Rule 278963 (278963): 17131,ELECTRICAL-WIRE/CABLE NM/UF RET CTN (new); 17090,ELECTRICAL-BOXES COVERS (new); 17152,ELECTRICAL-CONNECTORS/TERMINALS (new); 17050,ELECTRICAL-OUTLETS/SWITCHES (new)
25. A Cluster from Agglomerative Clustering
- Rule 278932 (278932): 16846,PAINT BRUSHES (new); 16926,PAINT-INTERIOR-ONE ONLY (new); 16844,PAINTING DROPCLOTHS (new)
- Rule 278773 (278773): 16846,PAINT BRUSHES (new); 16871,PAINTING ACC. - SHURLINE (new); 16730,PAINT-MISC SUNDRY ITEMS (new)
- Rule 277258 (277258): 16840,PAINTING ACCESSORY ITEMS (new); 16926,PAINT-INTERIOR-ONE ONLY (new); 16844,PAINTING DROPCLOTHS (new)
- Rule 277269 (277269): 16871,PAINTING ACC. - SHURLINE (new); 16926,PAINT-INTERIOR-ONE ONLY (new); 16844,PAINTING DROPCLOTHS (new)
- Rule 277433 (277433): 16871,PAINTING ACC. - SHURLINE (new); 16730,PAINT-MISC SUNDRY ITEMS (new)
- Rule 277435 (277435): ... and so on (insufficient space to show here).
26. A Cluster (3,7) from SOM Results
27. Conclusions and Future Work
- Conclusions
- Sparse and high-dimensional data
- Future Work
- Complexity and scalability issues
- Sub-sampling for distance computation
- Merge similar rules
- Incorporate meta-data for validation or to support clustering and merging
- Explore other distance measures (log-likelihood functions instead of probabilities)
- More work on SOM coloring: use hierarchical coloring; also interactive visualization of rules in the SOM result
- References
- Brian Everitt, Cluster Analysis
- Johnson and Wichern, Applied Multivariate Statistical Analysis