A Framework for Projected Clustering of High Dimensional Data Streams

About This Presentation
Title:

A Framework for Projected Clustering of High Dimensional Data Streams

Description:

Proceedings of the 30th VLDB Conference, Toronto, Canada, 2004 ... The clustering structure properties called additivity and temporal multiplicity ... – PowerPoint PPT presentation

Transcript and Presenter's Notes



1
A Framework for Projected Clustering of High
Dimensional Data Streams
Proceedings of the 30th VLDB Conference, Toronto,
Canada, 2004
2
Motivation and Underlying Concepts
  • Not all dimensions should be considered when
    clustering in a high-dimensional setting
  • The fading cluster structure uses a fading
    function f(t)
  • The half-life t0 of a point is defined as the
    time at which f(t0) = (1/2) f(0)
  • A fading cluster structure is defined at time t
    for a set of d-dimensional points
  • The clustering structure has two properties,
    called additivity and temporal multiplicity
  • The clustering process requires simultaneous
    maintenance of the clusters and of the set of
    dimensions associated with each cluster
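The fading ideas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes an exponential fading function f(t) = 2^(-decay_rate * t), which the slide does not spell out, and a hypothetical FadingCluster container holding per-dimension weighted sums, weighted sums of squares, and a total weight. Under that assumption the half-life is t0 = 1/decay_rate.

```python
from dataclasses import dataclass
from typing import List, Sequence


def fading(t: float, decay_rate: float) -> float:
    """Assumed fading function f(t) = 2^(-decay_rate * t)."""
    return 2.0 ** (-decay_rate * t)


def half_life(decay_rate: float) -> float:
    """Time t0 with f(t0) = (1/2) f(0) for the assumed f."""
    return 1.0 / decay_rate


@dataclass
class FadingCluster:
    """Hypothetical container for faded cluster statistics."""
    fc1: List[float]          # faded sum of values, per dimension
    fc2: List[float]          # faded sum of squared values, per dimension
    weight: float = 0.0       # faded number of points
    last_update: float = 0.0  # time of the last update

    def decay_to(self, now: float, decay_rate: float) -> None:
        # Temporal multiplicity: with no new points, every statistic is
        # multiplied by the same factor f(now - last_update).
        factor = fading(now - self.last_update, decay_rate)
        self.fc1 = [v * factor for v in self.fc1]
        self.fc2 = [v * factor for v in self.fc2]
        self.weight *= factor
        self.last_update = now

    def add_point(self, x: Sequence[float], now: float, decay_rate: float) -> None:
        # Fade the old statistics first, then add the new point with weight 1.
        self.decay_to(now, decay_rate)
        self.fc1 = [a + v for a, v in zip(self.fc1, x)]
        self.fc2 = [a + v * v for a, v in zip(self.fc2, x)]
        self.weight += 1.0

    def merged_with(self, other: "FadingCluster") -> "FadingCluster":
        # Additivity: structures taken at the same time add component-wise.
        if self.last_update != other.last_update:
            raise ValueError("decay both clusters to the same time first")
        return FadingCluster(
            fc1=[a + b for a, b in zip(self.fc1, other.fc1)],
            fc2=[a + b for a, b in zip(self.fc2, other.fc2)],
            weight=self.weight + other.weight,
            last_update=self.last_update,
        )
```

Because both properties hold, a cluster only needs to be refreshed when a point arrives or when two clusters are compared or merged, rather than at every time step.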

3
HPStream: High-Dimensional Projected Stream
Clustering Method
4
HPStream Algorithm: Brief Explanation
  • Set parameters
  • Normalization process
  • Initial clustering using k-means and InitNumber
  • ComputeDimensions: this procedure determines the
    dimensions in such a way that the spread along
    the chosen dimensions is as small as possible
  • FindProjectedDist: determines the closest
    cluster to the incoming data point
  • FindLimitingRadius: determines the limiting
    radius
  • Finally, decide which cluster to add or delete
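The assignment step can be illustrated with a simplified Python sketch. The function names mirror the procedure names on the slide, but the bodies are assumptions: spread is taken as per-dimension variance, each cluster independently keeps its l lowest-spread dimensions, the projected distance is a root-mean-square difference over those dimensions, and the limiting radius is the cluster's own RMS projected radius times a spread factor. Fading weights are left out to keep the example short.

```python
import math
from typing import List, Optional, Sequence

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def compute_dimensions(points: Sequence[Point], l: int) -> List[int]:
    """Pick the l dimensions with the smallest spread (variance here)."""
    c = centroid(points)
    spread = [sum((p[j] - c[j]) ** 2 for p in points) / len(points)
              for j in range(len(c))]
    ranked = sorted(range(len(c)), key=lambda j: spread[j])
    return sorted(ranked[:l])


def find_projected_dist(x: Point, c: Point, dims: Sequence[int]) -> float:
    """RMS difference between x and c over the chosen dimensions only."""
    return math.sqrt(sum((x[j] - c[j]) ** 2 for j in dims) / len(dims))


def find_limiting_radius(points: Sequence[Point], c: Point,
                         dims: Sequence[int], spread_factor: float) -> float:
    """Spread factor times the cluster's RMS projected radius (assumed form)."""
    mean_sq = sum(find_projected_dist(p, c, dims) ** 2 for p in points) / len(points)
    return spread_factor * math.sqrt(mean_sq)


def assign(x: Point, clusters: Sequence[Sequence[Point]],
           l: int, spread_factor: float) -> Optional[int]:
    """Return the index of the closest cluster, or None if x lies outside
    its limiting radius and a new cluster would be needed."""
    best_index, best_dist, best_radius = None, math.inf, 0.0
    for i, pts in enumerate(clusters):
        c = centroid(pts)
        dims = compute_dimensions(pts, l)
        dist = find_projected_dist(x, c, dims)
        if dist < best_dist:
            best_index, best_dist = i, dist
            best_radius = find_limiting_radius(pts, c, dims, spread_factor)
    if best_index is not None and best_dist <= best_radius:
        return best_index
    return None
```

A point that differs from a cluster only along a dimension the cluster has discarded is still assigned to it, which is the reason for projecting before measuring distance.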
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
Experimental Setup
  • HPStream is compared with CluStream, both
    implemented in MS VC++
  • Data: one synthetic data set and two real-world
    data sets (Network Intrusion and Forest Cover
    Type)
  • Comparison criteria for judging the two
    algorithms:
      - accuracy: clustering quality
      - efficiency: stream processing rate
      - sensitivity: varying decay rate, l, and
        radius threshold
      - scalability: varying number of dimensions
        and clusters
  • Parameters are initialized as follows:
    decay rate 0.5, spread radius factor 2,
    InitNumber 2000, average projected
    dimensionality l > d/2
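For readers who want to reproduce a similar setup, the parameter choices can be collected in a small configuration object. This is only a sketch: the class name HPStreamParams and its field names are hypothetical, the defaults assume a decay rate of 0.5, a spread radius factor of 2 and an InitNumber of 2000, and the check encodes the constraint that l exceeds d/2 (and, as an added assumption, does not exceed d).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HPStreamParams:
    """Hypothetical parameter bundle; names are illustrative only."""
    avg_projected_dims: int            # l, average projected dimensionality
    decay_rate: float = 0.5            # decay rate of the fading function
    spread_radius_factor: float = 2.0  # multiplier for the limiting radius
    init_number: int = 2000            # points used for the initial clustering

    def validate(self, d: int) -> None:
        """Check l against the data dimensionality d: d/2 < l <= d."""
        if not (d / 2 < self.avg_projected_dims <= d):
            raise ValueError(
                f"l={self.avg_projected_dims} must satisfy {d}/2 < l <= {d}"
            )
```

For example, HPStreamParams(avg_projected_dims=6).validate(10) passes, while an l of 5 on the same 10-dimensional data raises ValueError.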
9
Comparing Accuracy: Using clustering quality and
cluster purity
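Cluster purity is commonly computed as the average, over clusters, of the fraction of points carrying the cluster's dominant class label. The sketch below uses that definition as an assumption, since the slide text does not define the measure; the function name cluster_purity is illustrative.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def cluster_purity(assignments: Sequence[Hashable],
                   labels: Sequence[Hashable]) -> float:
    """Average fraction of the dominant class label within each cluster."""
    members: Dict[Hashable, List[Hashable]] = {}
    for cluster_id, label in zip(assignments, labels):
        members.setdefault(cluster_id, []).append(label)
    fractions = [
        Counter(group).most_common(1)[0][1] / len(group)
        for group in members.values()
    ]
    return sum(fractions) / len(fractions)
```

With assignments [0, 0, 0, 1, 1] and labels ["a", "a", "b", "c", "c"], cluster 0 has purity 2/3 and cluster 1 has purity 1, so the average is 5/6.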
10
Accuracy comparison continued
11
Accuracy comparison continued
12
Efficiency comparison using Stream Processing
Rate
13
Sensitivity: Varying l
14
Sensitivity: Varying radius threshold and decay
rate
15
Scalability: Varying dimensionality and number
of clusters