Clustering of Uncertain data objects by Voronoi-diagram-based approach

1
Clustering of Uncertain data objects by
Voronoi-diagram-based approach
  • Speaker: Chan Kai Fong, Paul
  • Dept of CS, HKU

2
Presentation Outline
  • Introduction
  • Concept of clustering; clustering of uncertain
    objects
  • Example application of clustering on uncertain
    data
  • UK-means algorithm
  • Motivation
  • Voronoi-diagram-based (VD) clustering
  • MinMax-based (MM) clustering
  • VD is strictly better than MinMax
  • Clustering algorithms
  • VDBi, VDBiP, VD-based methods with Cluster Shift
  • When are VD-based methods better than MM-based
    methods?
  • Experiments
  • Conclusion

3
Introduction
4
Introduction
  • Clustering
  • Group similar data objects together to form
    clusters
  • Partition-based clustering
  • Input: number of clusters (k), number of
    objects (n)
  • Iterative method
  • In each iteration, divide the n data objects into
    k groups to minimize an objective function
  • e.g., minimize the sum of squares of distances
  • Stop when the results have converged
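
The iterative scheme above can be sketched for ordinary (certain) 2D points as follows. This is a minimal illustration, not the speaker's code: the function name kmeans, the random choice of initial representatives, and the max_iter cap are all assumptions made for the example.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Partition 2D points into k groups, reducing the sum of squared distances."""
    rng = random.Random(seed)
    reps = rng.sample(points, k)           # assumed initialisation: k random points
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest representative.
        new_assignment = [
            min(range(k),
                key=lambda j: (p[0] - reps[j][0]) ** 2 + (p[1] - reps[j][1]) ** 2)
            for p in points
        ]
        if new_assignment == assignment:   # converged: no point changed cluster
            break
        assignment = new_assignment
        # Update step: move each representative to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                    # an empty cluster keeps its old representative
                reps[j] = (sum(x for x, _ in members) / len(members),
                           sum(y for _, y in members) / len(members))
    return reps, assignment
```

Calling kmeans(points, 2) on a list of (x, y) tuples returns the two representatives and one cluster index per point.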

5
Introduction
  • To cluster the data points in 2D space
  • Data objects: n data points
  • Apply any partition-based clustering algorithm
    (e.g., K-means)
  • Distance measure: Euclidean distance, Manhattan
    distance, etc.

6
Introduction
  • To cluster the uncertain objects in 2D space
  • Uncertain objects: objects with uncertainty (e.g.
    location uncertainty)
  • No fixed coordinates in 2D space
  • An object's location is described by a
    probability density function (pdf) over an
    uncertainty region
  • Assume the pdf for each object can be obtained
  • Uncertainty region (ur): a region in which the
    object may appear, with a certain probability
    distribution; the probability that the object
    appears outside its uncertainty region is zero
  • Each object may have an irregular uncertainty
    region, and its pdf can be arbitrary

[Figure: the uncertainty region o1.ur of object o1 and the MBR of o1.ur]
7
Expected distance computation
  • The expected distance (ED) is used to measure the
    distance between an uncertain object and a
    cluster representative.
  • In the definition of ED, d is the Euclidean
    distance function, x is any point inside oi's
    uncertainty region, f is the pdf of the uncertain
    object oi, and pj is any cluster representative.
  • ED computations are very expensive; in each
    iteration of K-means, n × k ED computations are
    required (n objects, k clusters).
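
The symbol definitions above correspond to the usual integral form of the expected distance. The block below is a reconstruction from those definitions, not a copy of the slide's own equation; the subscript on f_i and the normalisation condition are added here for clarity.

```latex
ED(o_i, p_j) = \int_{x \in o_i.ur} f_i(x)\, d(x, p_j)\, \mathrm{d}x,
\qquad
\int_{x \in o_i.ur} f_i(x)\, \mathrm{d}x = 1
```

Evaluating this integral for every object and every representative gives the n × k ED computations per iteration mentioned above.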

8
Application: clustering vehicles
  • Objective: get traffic patterns by clustering
    vehicles in a city
  • Data objects: vehicles on a 2D map
  • Uncertainty: location uncertainty of the
    vehicles; each pdf, defined over an object's
    uncertainty region, represents the probability
    distribution of the vehicle's possible location
    over a certain period of time

9
  • The degree of uncertainty is affected by the
    following factors:
  • Time
  • Traffic of the roads
  • Shape of the roads
  • Speed of the vehicles

10
Results
11
UK-means
  • UK-means: the first extension of the K-means
    algorithm to handle uncertain objects
  • Distance measure: expected distance (ED)
  • Disadvantage: slow and inefficient
  • Shows that K-means can be used to cluster
    uncertain objects

12
Two Approaches to solve clustering problem by
UK-means
  • MinMax-based approach (Jacky)
  • Voronoi-Diagram-based approach (Paul)

13
Motivation
14
Two Approaches to solve clustering problem by
UK-means
  • MinMax-based approach (Jacky)
  • Basic MinMax distance pruning (MinMax)
  • MinMax with pre-computation of ED
  • MinMax with Cluster Shift (MinMax-Shift)
  • Voronoi-Diagram-based approach (Paul)
  • Voronoi diagram with Bisector Pruning (VDBi)
  • Voronoi diagram with Bisector Pruning and Partial
    expected distance computations (VDBiP)
  • Voronoi diagram with Bisector Pruning and Cluster
    Shift (VDBi-Shift)
  • Voronoi diagram with Bisector Pruning and Partial
    expected distance computations and Cluster Shift
    (VDBiP-Shift)

15
MinMax-based Approach
  • UK-means with MinMax distance pruning
  • Objective: avoid expected distance computations
  • Use mindist and maxdist between an object's MBR
    and the cluster representatives as lower and
    upper bounds on ED(cj, oi) and ED(cm, oi)
  • E.g., given an object oi and cluster
    representatives cj and cm,
  • if mindist(cj, oi) > maxdist(cm, oi), then cj
    can be pruned

mindist(cj, oi) > maxdist(cm, oi) implies ED(cj, oi) > ED(cm, oi),
so cj is pruned and ED(cj, oi) need not be calculated.
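
A minimal sketch of the basic MinMax pruning rule, assuming axis-aligned MBRs stored as (xmin, ymin, xmax, ymax) tuples and 2D point representatives. The function names mindist, maxdist and minmax_candidates are illustrative choices, not taken from the original work.

```python
import math

def mindist(rep, mbr):
    """Smallest Euclidean distance from point rep to rectangle mbr."""
    x, y = rep
    xmin, ymin, xmax, ymax = mbr
    dx = max(xmin - x, 0.0, x - xmax)   # 0 when x lies inside the x-range
    dy = max(ymin - y, 0.0, y - ymax)   # 0 when y lies inside the y-range
    return math.hypot(dx, dy)

def maxdist(rep, mbr):
    """Largest Euclidean distance from rep to mbr (always reached at a corner)."""
    x, y = rep
    xmin, ymin, xmax, ymax = mbr
    dx = max(abs(x - xmin), abs(x - xmax))
    dy = max(abs(y - ymin), abs(y - ymax))
    return math.hypot(dx, dy)

def minmax_candidates(reps, mbr):
    """Indices of representatives that survive basic MinMax pruning.

    cj is pruned when mindist(cj, mbr) > maxdist(cm, mbr) for some cm,
    i.e. when its lower bound exceeds the smallest upper bound.
    """
    best_upper = min(maxdist(c, mbr) for c in reps)
    return [j for j, c in enumerate(reps) if mindist(c, mbr) <= best_upper]
```

ED values then only need to be computed for the returned indices; if a single index survives, the object can be assigned without any ED computation.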
16
MinMax-based Approach
  • The upper and lower bounds can be made tighter by
    using the Cluster Shift (CS) and ED
    Pre-computation (PC) methods
  • Replace the loose mindist and maxdist estimates
    with tighter distance bounds
  • For details, refer to Jacky's work

17
Voronoi-diagram-based approach
[Figure: uncertain object o1 indexed by an R-tree; Voronoi diagram for 5 cluster representatives]
  • Each object's uncertainty region is bounded by
    its minimum bounding rectangle (MBR)
  • The objects' MBRs are indexed by an R-tree
  • A Voronoi diagram is constructed for the cluster
    representatives in each iteration

18
Voronoi-diagram-based approach
  • If the bisector of two cluster representatives p1
    and p2 does not cut the MBR of an object o1, and
    the MBR falls on p2's side of the bisector, then
  • ED(p1, o1) > ED(p2, o1)
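
The bisector test can be checked without constructing the bisector itself: a side of the bisector is a half-plane, which is convex, so an axis-aligned MBR lies entirely on one side exactly when all four of its corners do. The sketch below assumes MBRs stored as (xmin, ymin, xmax, ymax); the name mbr_on_side_of is hypothetical.

```python
def mbr_on_side_of(mbr, p_near, p_far):
    """True if every point of mbr is strictly closer to p_near than to p_far.

    Equivalent to: the bisector of p_near and p_far does not cut mbr,
    and mbr falls on p_near's side of it.
    """
    xmin, ymin, xmax, ymax = mbr
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]

    def sq_dist(a, b):
        # Squared Euclidean distance; comparing squares avoids a square root.
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    return all(sq_dist(c, p_near) < sq_dist(c, p_far) for c in corners)
```

When mbr_on_side_of(mbr, p2, p1) holds, d(x, p2) < d(x, p1) for every point x the object can occupy, so integrating against the pdf gives ED(p2, o1) ≤ ED(p1, o1) and p1 can be dropped without computing either value.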

19
Voronoi-diagram-based approach (Cluster
Assignment)
ED(o1, p2) < ED(o1, p1) and ED(o1, p2) < ED(o1,
p3), so o1 is assigned to cluster p2.
20
Voronoi-diagram-based approach
[Figure labels: an object enclosed entirely in a Voronoi cell; candidate objects retrieved for the cluster; an object that intersects more than one Voronoi cell]
  • In each iteration,
  • For each Voronoi cell (approximated by an MBR),
    issue a range query to the objects' R-tree to
    retrieve the candidate objects for the cluster
  • If a candidate's MBR is completely enclosed in
    the Voronoi cell, assign the object to the
    cluster
  • If a candidate's MBR intersects more than one
    Voronoi cell, special handling is required to
    prune away the unqualified clusters

21
Advantages of using Voronoi-diagram-based
clustering
  • Avoid expected distance computation
  • If the object is completely enclosed in a Voronoi
    cell, then the object must belong to this cluster
  • In the best case, we need no expensive expected
    distance calculations, and we do not need to
    retrieve the objects' pdfs during the
    clustering

22
Advantages of using Voronoi-diagram-based
clustering
  • The Voronoi diagram construction cost is
    independent of the number of objects
  • We only need O(k log k) time to compute the 2D
    Voronoi diagram in each iteration, where k is the
    number of clusters; k does not depend on the
    number of objects
  • n is much larger than k

23
Difficulties of Voronoi based clustering
  • Handling uncertain objects that intersect more
    than one Voronoi cell
  • We cannot determine the nearest cluster just by
    looking at the Voronoi diagram

24
Is VD better than basic MinMax?
  • Theorem
  • VD is strictly better than basic MinMax
  • Given an object oi that is assigned to cluster
    c1, for any iteration of UK-means, if VD
    calculates ED(oi, cp) for some cp, then MM must
    calculate ED(oi, cp) as well.
  • Conversely, there are cases where VD does not
    calculate ED(oi, cp) but MM must.
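
One direction of the theorem can be argued in a line. Suppose basic MinMax prunes c_j because of some c_m, that is, mindist(c_j, o_i) > maxdist(c_m, o_i). The sketch below uses only the definitions of mindist and maxdist over the object's MBR; it is an illustration of the argument, not the proof as presented in the talk.

```latex
\forall x \in \mathrm{MBR}(o_i):\quad
d(x, c_m) \;\le\; \mathrm{maxdist}(c_m, o_i)
\;<\; \mathrm{mindist}(c_j, o_i) \;\le\; d(x, c_j)
```

Every point of the MBR is therefore strictly closer to c_m than to c_j, so the MBR lies wholly on c_m's side of the bisector of c_m and c_j, and bisector pruning discards c_j as well. Anything MinMax prunes, VD prunes; hence any ED that VD still computes is one that MinMax computes too.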

25
In some situations, the VD-based approach is better
  • VD-based methods are always better than basic
    MinMax, but they may not beat MinMax-Shift
  • In some situations, VD-based methods outperform
    all MM-based methods
  • When the objects' uncertainty is very small,
    VD-based methods are preferred

26
Clustering algorithms
27
Clustering Methods
  • Voronoi-diagram-based approach
  • Voronoi diagram with bisector pruning (VDBi)
  • Voronoi diagram with bisector pruning and partial
    expected distance computation (VDBiP)

28
MinMax-based Methods
  • For each object,
  • Find the upper and lower bounds of the ED values
  • If the Cluster Shift (CS) method is not enabled,
    the upper and lower bounds are estimated by
    maxdist and mindist respectively (MinMax)
  • If the CS method is enabled, the upper and lower
    bounds become tighter (MinMax-Shift)
  • Prune unwanted clusters using the upper and lower
    bounds
  • For all unpruned clusters, compute the ED values
    to determine the cluster assignment of the object

29
Voronoi-diagram-based Methods
  • Before each iteration, a Voronoi diagram is
    constructed for all cluster representatives
  • For each cluster representative,
  • Find the objects that are completely enclosed in
    the cluster's Voronoi cell
  • Apply bisector pruning to prune unrelated
    clusters

30
Voronoi diagram with Bisector Pruning (VDBi)
Comparing c1 and c3, o1 falls on c1's side of
bisector(c1, c3), so c3 can be pruned. Since the
bisector of c1 and c2 cuts o1's MBR, o1 may be
assigned to either c1 or c2.
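
The situation described here can be reproduced with a small sketch that applies bisector pruning to every pair of representatives. It is an illustration under assumptions: MBRs are (xmin, ymin, xmax, ymax) tuples, the name vdbi_candidates is hypothetical, and the all-pairs loop stands in for whatever traversal of the Voronoi diagram the actual method uses.

```python
def vdbi_candidates(mbr, reps):
    """Indices of representatives left after pairwise bisector pruning.

    cj is pruned as soon as some other cm is strictly closer than cj to
    all four corners of mbr, i.e. mbr lies on cm's side of bisector(cm, cj).
    """
    xmin, ymin, xmax, ymax = mbr
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]

    def sq_dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    survivors = []
    for j, cj in enumerate(reps):
        pruned = any(
            m != j and all(sq_dist(c, cm) < sq_dist(c, cj) for c in corners)
            for m, cm in enumerate(reps)
        )
        if not pruned:
            survivors.append(j)
    return survivors

# Hypothetical coordinates mimicking the example: c1 and c2 straddle o1, c3 is far away.
c1, c2, c3 = (3.0, 5.0), (7.0, 5.0), (5.0, 20.0)
o1_mbr = (4.0, 4.0, 6.0, 6.0)
print(vdbi_candidates(o1_mbr, [c1, c2, c3]))  # c3 is pruned; c1 and c2 remain
```

Only the surviving candidates need ED computations. When a single candidate survives, the MBR is inside that representative's Voronoi cell and the object is assigned directly.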
31
Voronoi diagram with bisector pruning and partial
expected distance computation (VDBiP)
  • Cut the object's MBR into two equal halves, (a)
    and (b)

32
VDBiP
  • If the MBR of o1(b) is completely enclosed in the
    Voronoi cell of c2:
  • Compute ED(o1(a), c1) and ED(o1(a), c2)
  • We already know ED(o1(b), c2) < ED(o1(b), c1)
  • If ED(o1(a), c2) < ED(o1(a), c1), then
  • ED(o1(a), c2) + ED(o1(b), c2) < ED(o1(a), c1)
    + ED(o1(b), c1)
  • so c1 can be pruned
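
The step from the two half-object inequalities to pruning c1 relies on the expected distance being additive over a partition of the uncertainty region. The derivation below makes that explicit; it assumes that the partial expected distance of a half is the unnormalised integral over that half, which is an interpretation rather than a definition stated in the slides.

```latex
\begin{aligned}
ED(o_1, c) &= \int_{x \in o_1.ur} f(x)\, d(x, c)\, \mathrm{d}x \\
           &= \underbrace{\int_{x \in (a)} f(x)\, d(x, c)\, \mathrm{d}x}_{ED(o_1^{(a)},\, c)}
            \;+\; \underbrace{\int_{x \in (b)} f(x)\, d(x, c)\, \mathrm{d}x}_{ED(o_1^{(b)},\, c)}
\end{aligned}
```

```latex
ED(o_1^{(a)}, c_2) < ED(o_1^{(a)}, c_1)
\;\wedge\;
ED(o_1^{(b)}, c_2) < ED(o_1^{(b)}, c_1)
\;\Longrightarrow\;
ED(o_1, c_2) < ED(o_1, c_1)
```

Only the integrals over half (a) are evaluated; the inequality for half (b) comes from bisector pruning alone, which is where the saving over a full ED computation arises.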

33
Experiments
34
Experiments
  • Measures
  • Efficiency (number of expected distance
    computations required)
  • Methods compared
  • Basic MinMax distance pruning (MinMax)
  • Voronoi diagram with Bisector Pruning (VDBi)
  • Voronoi diagram with Bisector Pruning and Partial
    expected distance computation (VDBiP)
  • MM-based with Cluster Shift (MinMax-Shift)
  • VD-based with Cluster Shift (VDBi-Shift,
    VDBiP-Shift)

35
Experimental Settings
Data set: randomly generated synthetic data set
Probability density function: random
Domain: 100 x 100 2D space
Number of objects: 10000
Number of clusters: varies
Maximum length of an MBR's side: 10, 1, 0.1
Number of sample points: 20 20
36
Degree of uncertainty is large (MBR width 10)
  1. VDBi performs only slightly better than basic
    MinMax
  2. The cluster shift method greatly improves the
    performance of basic MinMax and VDBi

37
Degree of uncertainty is small (MBR width 1)
  1. The cluster shift method cannot greatly improve
    the performance of MinMax
  2. The VD-based approach outperforms the MM-based
    approach
  1. The VD-based approach is still better than the
    MM-based approach, but VD performs slightly
    better if there are fewer clusters

38
Degree of uncertainty is very small (MBR width
0.1)
39
Performance analysis
Algorithm      Description
MinMax         The worst one
MinMax-Shift   Good when objects are large
VDBi           Good when objects are small
VDBi-Shift     Good in all cases; outperforms the MinMax-based methods
VDBiP          Better than VDBi; performs well when the MBR width is small
VDBiP-Shift    Further improvement over VDBiP
40
Performance Analysis
  • Basic MinMax performs badly because the upper and
    lower bounds estimated by maxdist and mindist
    are loose.
  • When an object's degree of uncertainty is small,
    MinMax with cluster shift (improved distance
    bounds) cannot greatly tighten the distance
    bounds, since mindist and maxdist are already
    accurate enough
  • MinMax-Shift's performance is then similar to
    that of basic MinMax
  • Because the objects are smaller, fewer objects
    intersect multiple Voronoi cells; we have also
    proved that VD is better than basic MinMax
  • VD is good for small objects, and a hybrid of
    cluster shift (CS) and VD performs well in all
    cases

41
Conclusion
  • Uncertain clustering
  • Voronoi-diagram-based approach and MinMax-based
    approach
  • VDBi is strictly better than basic MinMax
  • The Voronoi-diagram-based approach beats the
    MinMax-based approach when the objects'
    uncertainty is small
  • The hybrid approach is good in all cases

42
Thank you
  • Questions?