Project Presentation CPSC 695

1
Project Presentation CPSC 695
  • Prepared By: Priyadarshi Bhattacharya

2
Outline of Talk
  • Introduction to clustering and its relevance to
    my research interests.
  • Discussion on existing clustering techniques and
    their shortcomings.
  • Introduction to a new Delaunay-based clustering
    algorithm.
  • Experimental Results and comparison with other
    methods.
  • Direction of future research.

3
Clustering: Definition
  • Automatic identification of groups of similar
    objects.
  • A method of grouping data such that intracluster
    similarity is maximized and intercluster
    similarity is minimized.
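
The definition above can be made concrete with a small sketch. It assumes Euclidean distance as the (dis)similarity measure, so a good grouping has a small mean intracluster distance and a large mean intercluster distance; the function name and the sample points are illustrative only.

```python
from itertools import combinations
from math import dist

def mean_intra_inter(points, labels):
    """Return (mean intracluster distance, mean intercluster distance)."""
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        # A pair counts as intracluster when both points share a label.
        (intra if labels[i] == labels[j] else inter).append(d)

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(intra), mean(inter)

# Two well-separated pairs of points (made-up data).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = mean_intra_inter(points, [0, 0, 1, 1])  # groups nearby points
bad = mean_intra_inter(points, [0, 1, 0, 1])   # groups distant points
```

For the sensible labelling the intracluster mean is far below the intercluster mean; for the poor labelling the relation is reversed.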

4
Properties of clustering
  • Scalability: clustering performance should
    degrade only linearly as data size increases
  • Ability to detect clusters of different shapes
  • Minimal input parameters
  • Robustness with regard to noise
  • Insensitivity to data input order
  • Scalability to higher dimensions
  • (Properties adapted, with minor modifications,
    from On Data Clustering Analysis: Scalability,
    Constraints and Validation.)

5
Relevance to my research
  • Identification of high-risk areas in the sea
    based on incident data from the Maritime Activity
    and Risk Investigation System (MARIS), maintained
    primarily by the University of Halifax.

[Diagram labels: Incident Data; Clustering Algorithm; High-risk areas (ESRI Shape File); Location of SAR Bases; Marine Route Planning]

6
Existing clustering algorithms
Clustering
  • Partitioning: K-Means, K-Medoid
  • Hierarchical: BIRCH, CURE, ROCK, CHAMELEON
  • Density-based: DBSCAN, TURN
  • Grid-based: WaveCluster [1], CLIQUE
[1] WaveCluster: a novel clustering approach based
on wavelet transforms. It applies a multi-resolution
grid structure on the data space. For more
details, refer to WaveCluster: a
multi-resolution clustering approach for very
large spatial databases, Proc. 24th Conf. on
Very Large Databases.

7
Shortcomings of existing methods
  • Require a large number of user-supplied
    parameters, for example the number of clusters,
    a threshold to quantify similarity, a stopping
    condition, the number of nearest neighbors, etc.
  • Sensitivity to user-supplied parameters.
  • Capability of identifying clusters degrades with
    increase in noise.
  • Inability to identify clusters of widely varying
    shapes and sizes. Most detect spherical ones
    only.
  • Identification of dense clusters in the presence
    of sparse ones, of clusters connected by multiple
    bridges, and of closely lying dense clusters
    remains elusive.

8
CRYSTAL: A new Delaunay-based clustering
algorithm
  • The algorithm has 3 stages:
  • Triangulation phase: Forms the Delaunay
    Triangulation of the data points and sorts the
    vertices in the order of decreasing average
    length of adjacent edges.
  • Grow cluster phase: Scans the sorted vertex list
    and grows clusters from the vertices in that
    order, first encompassing first order neighbors,
    then second order neighbors and so on. The growth
    stops when the boundary of the cluster is
    determined.
  • Noise removal phase: The algorithm identifies
    noise as sparse clusters. They can be easily
    eliminated by removing clusters which are very
    small in size or which have a very low density.
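
The vertex ordering used by the triangulation phase can be sketched as follows. This is a minimal illustration, assuming the Delaunay Triangulation is already available as a list of vertex-index triples; the function names, the `descending` flag and the sample data are hypothetical and not taken from the CRYSTAL implementation.

```python
from collections import defaultdict
from math import dist

def adjacency_from_triangles(triangles):
    """Map each vertex to the set of vertices it shares an edge with."""
    adj = defaultdict(set)
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def average_adjacent_edge_length(points, adj):
    """Mean length of the edges incident to each vertex."""
    return {v: sum(dist(points[v], points[u]) for u in nbrs) / len(nbrs)
            for v, nbrs in adj.items()}

def sorted_vertices(points, triangles, descending=True):
    """Sort vertices by average adjacent edge length.

    descending=True follows the order stated on the slide.
    """
    adj = adjacency_from_triangles(triangles)
    avg = average_adjacent_edge_length(points, adj)
    return sorted(avg, key=avg.get, reverse=descending), avg

# Four made-up points; the two triangles form their Delaunay Triangulation.
points = [(0, 0), (1, 0), (0, 1), (4, 4)]
triangles = [(0, 1, 2), (1, 3, 2)]
order, avg = sorted_vertices(points, triangles)
```

Here vertex 3, whose only edges are the two long ones, comes first in the order, and vertex 0, with two unit-length edges, comes last.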

9
Description of Stage I
  • Triangulation phase
  • Triangulation is done in O(n log n) time using
    the incremental algorithm.
  • An auxiliary grid structure (O(n) in size) is
    used to speed up the point location problem in
    the Delaunay Triangulation. This considerably
    reduces the length of the walk in the graph to
    locate the triangle containing the data point.
  • The well-known Winged-Edge data structure is
    used to represent the Delaunay Triangulation
    because of its efficiency in answering proximity
    queries.
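
One way such an auxiliary grid might work is sketched below. This is an assumption-laden illustration, not the presented implementation: the `SeedGrid` class is hypothetical, and each cell stores the index of one inserted point (a real implementation would more likely store a triangle or edge) from which the walk towards the query point can start.

```python
from math import ceil, sqrt

class SeedGrid:
    """Uniform k x k grid with about n cells, so storage stays O(n)."""

    def __init__(self, bbox, n_points):
        self.xmin, self.ymin, xmax, ymax = bbox
        self.k = max(1, ceil(sqrt(n_points)))
        # Fall back to a unit cell size for a degenerate bounding box.
        self.cw = (xmax - self.xmin) / self.k or 1.0
        self.ch = (ymax - self.ymin) / self.k or 1.0
        self.cells = {}

    def _cell(self, p):
        cx = min(self.k - 1, max(0, int((p[0] - self.xmin) / self.cw)))
        cy = min(self.k - 1, max(0, int((p[1] - self.ymin) / self.ch)))
        return cx, cy

    def insert(self, idx, p):
        """Remember the most recently inserted point of the cell."""
        self.cells[self._cell(p)] = idx

    def seed(self, p):
        """Return a stored point from the nearest non-empty ring of cells."""
        cx, cy = self._cell(p)
        for r in range(self.k):
            for x in range(cx - r, cx + r + 1):
                for y in range(cy - r, cy + r + 1):
                    on_ring = max(abs(x - cx), abs(y - cy)) == r
                    if on_ring and (x, y) in self.cells:
                        return self.cells[(x, y)]
        return None  # grid is empty

grid = SeedGrid((0, 0, 10, 10), n_points=16)
grid.insert(0, (1, 1))
grid.insert(1, (9, 9))
start = grid.seed((6, 6))  # walk would start near point 1
```

Starting the walk from a point in the same or a neighbouring cell keeps it short compared with starting from an arbitrary triangle.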

10
Description of Stage II
  • Grow Cluster phase
  • A queue is used to maintain a list of
    vertices in order, from which the cluster is
    grown. Only vertices that are not boundary points
    are inserted into the queue.
  • To decide whether a point belongs to the
    cluster, the edge length is compared with the
    average edge length of the cluster. To decide
    whether a point is on the boundary of a cluster,
    the average adjacent edge length of the point is
    compared to the average edge length of the
    cluster.
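
The grow cluster phase described above can be sketched as a breadth-first traversal. The slides do not give the exact comparison rule, so this sketch assumes a single hypothetical tolerance factor `tol` for both tests and initialises the cluster's average edge length with the seed's average adjacent edge length; the adjacency lists and points are made-up data.

```python
from collections import deque
from math import dist

def grow_clusters(points, adj, order, tol=1.6):
    """Label every vertex with a cluster id, growing from seeds in `order`."""
    avg_adj = {v: sum(dist(points[v], points[u]) for u in adj[v]) / len(adj[v])
               for v in adj}
    label, next_id = {}, 0
    for seed in order:
        if seed in label:
            continue  # a vertex assigned once is never reconsidered
        cid, next_id = next_id, next_id + 1
        label[seed] = cid
        total, count = avg_adj[seed], 1  # running edge-length sum (assumed init)
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u in label:
                    continue
                d = dist(points[v], points[u])
                mean = total / count
                if d > tol * mean:
                    continue  # edge too long: u is not part of this cluster
                label[u] = cid
                total, count = total + d, count + 1
                if avg_adj[u] <= tol * mean:
                    queue.append(u)  # interior point: keep growing from it
    return label

# Two tight groups of four points joined by long bridging edges.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 0), (11, 0), (10, 1), (11, 1)]
adj = {0: [1, 2], 1: [0, 2, 3, 4], 2: [0, 1, 3], 3: [1, 2, 4, 6],
       4: [1, 3, 5, 6], 5: [4, 6, 7], 6: [3, 4, 5, 7], 7: [5, 6]}
label = grow_clusters(points, adj, order=list(range(8)))
```

Boundary points join the cluster but are not enqueued, so growth does not spill across the long edges that separate the two groups.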

11
Description of Stage III
  • Noise Removal Phase
  • Noise in the data may be in the form of
    isolated data points or scattered throughout the
    data. In the former case, a cluster based at
    these data points will not be able to grow.
  • However, if the noise is scattered uniformly
    throughout the data, our algorithm identifies it
    as a single sparse cluster. This phase simply
    gets rid of noise by eliminating the cluster with
    the highest average edge length. Also any trivial
    clusters (size less than an acceptable number)
    are removed in this phase.
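
A minimal sketch of this phase is given below, under stated assumptions: `min_size` stands in for the "acceptable number" (its value is not given in the slides), the sparsest cluster is chosen among the non-trivial ones, and it is only dropped when more than one non-trivial cluster exists. The function name and data are illustrative.

```python
from collections import defaultdict
from math import dist

def remove_noise(points, edges, label, min_size=3):
    """Drop trivial clusters and the sparsest cluster; return kept labels."""
    members = defaultdict(list)
    for v, c in label.items():
        members[c].append(v)
    # Accumulate the lengths of edges whose endpoints share a cluster.
    total, count = defaultdict(float), defaultdict(int)
    for u, v in edges:
        if label[u] == label[v]:
            total[label[u]] += dist(points[u], points[v])
            count[label[u]] += 1
    kept = {c for c, vs in members.items() if len(vs) >= min_size}
    mean = {c: total[c] / count[c] for c in kept if count[c]}
    if len(mean) > 1:  # assumed guard: never drop the only cluster
        kept.discard(max(mean, key=mean.get))
    return {v: c for v, c in label.items() if c in kept}

points = [(0, 0), (1, 0), (0, 1),        # cluster 0: dense
          (0, 20), (20, 0), (20, 20),    # cluster 1: sparse (noise)
          (50, 50),                      # cluster 2: isolated point
          (5, 5), (6, 5), (5, 6)]        # cluster 3: dense
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5),
         (7, 8), (7, 9), (8, 9), (2, 3), (5, 6)]
label = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2, 7: 3, 8: 3, 9: 3}
clean = remove_noise(points, edges, label)
```

In this example the isolated point is removed as a trivial cluster and cluster 1 is removed as the one with the highest average edge length.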

12
Complexity Analysis
  • The algorithm operates in O(n log n) time.
  • Delaunay Triangulation is generated in
    O(n log n) time. As a vertex once assigned to a
    cluster is not considered again, the clustering
    is done in O(n) time.

[Plot: Cluster size (1000) vs. Time consumed (ms)]
13
Clustering in action
14
Experimental Results
  • Comparison with K-Means based approaches

15
Experimental Results (contd.)
1. Clusters of different shapes
2. Closely lying dense clusters
16
Experimental Results (contd.)
1. Clusters connected by multiple bridges
2. Clusters of widely varying density
17
Experimental Results (contd.)
[Figure panels: Data set; K-Means; GEM; CRYSTAL]
18
Experimental Results (contd.)
Results on t7.10k.dat (originally used in
CHAMELEON: A Hierarchical Clustering Algorithm
Using Dynamic Modeling)
19
Conclusion and Future Work
  • CRYSTAL is a fast O(n log n) clustering
    algorithm that automatically identifies clusters
    of widely varying shapes, sizes and densities
    without requiring any input from the user.
  • Future work will involve:
  • Application of the clustering algorithm in
    identification of high-risk areas in the sea
    using the MARIS database.
  • Extension of the algorithm to 3D.
  • Considering physical constraints in clustering.
    In GIS, physical constraints such as rivers,
    highways, mountain ranges can hinder or alter the
    clustering result.

20
References
  • G. Papari, N. Petkov: Algorithm That Mimics Human
    Perceptual Grouping of Dot Patterns. Lecture
    Notes in Computer Science (2005) 497-506
  • Vladimir Estivill-Castro, Ickjai Lee: AUTOCLUST:
    Automatic Clustering via Boundary Extraction for
    Mining Massive Point-Data Sets. Fifth
    International Conference on Geocomputation (2000)
  • Osmar R. Zaiane, Andrew Foss, Chi-Hoon Lee,
    Weinan Wang: On Data Clustering Analysis:
    Scalability, Constraints and Validation.
    Advances in Knowledge Discovery and Data
    Mining, Springer-Verlag (2002)
  • Z.S.H. Chan, N. Kasabov: Efficient global
    clustering using the Greedy Elimination Method.
    Electronics Letters 40(25) (2004)
  • Aristidis Likas, Nikos Vlassis, Jakob J. Verbeek:
    The global k-means clustering algorithm.
    Pattern Recognition 36(2) (2003) 451-461
  • Ying Xu, Victor Olman, Dong Xu: Minimum Spanning
    Trees for Gene Expression Data Clustering.
    Computational Protein Structure Group, Life
    Sciences Division, Oak Ridge National Laboratory,
    USA
  • C. Eldershaw, M. Hegland: Cluster Analysis using
    Triangulation. Computational Techniques and
    Applications: CTAC97, 201-208. World Scientific,
    Singapore, 1997
  • Mir Abolfazl Mostafavi, Christopher Gold, Maciej
    Dakowicz: Delete and insert operations in
    Voronoi/Delaunay methods and applications.
    Computers & Geosciences 29(4) 523-530 (2003)
  • Atsuyuki Okabe, Barry Boots, Kokichi Sugihara:
    Spatial Tessellations: Concepts and Applications
    of Voronoi Diagrams.

21
Thank You!
All 11 identified by CRYSTAL! Questions?