Unsupervised Approaches - PowerPoint PPT Presentation

Provided by: CHMat
1
Unsupervised Approaches
  • An alternative approach: clustering
  • algorithms that organize and classify data.
  • useful for data compression.
  • clustering partitions a data set into smaller
    subsets based on the 'similarity' of the
    examples.

2
Unsupervised Approaches
  • An alternative approach: clustering
  • An example: K-means clustering
  • Overview (an iterative process):
  • 1. Determine a starting point, i.e. the number of
    clusters and the initial cluster points.
  • 2. Determine a membership matrix for the data
    points, i.e. the cluster point to which each
    data point is closest.
  • 3. Evaluate the cost function.
  • 4. Update the position of the cluster points.
  • 5. Go back to step 2.
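The five steps above can be sketched in plain Python. This is a minimal illustration, not the presenter's code: the names `k_means`, `assign`, `threshold`, `max_iter` and `seed` are assumptions, the starting cluster points are simply drawn from the data, and Euclidean distance is used as a default.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign(points, centres, dist):
    """Steps 2 and 3: nearest-cluster labels and the summed-distance cost."""
    labels = [min(range(len(centres)), key=lambda i: dist(p, centres[i]))
              for p in points]
    cost = sum(dist(p, centres[l]) for p, l in zip(points, labels))
    return labels, cost


def k_means(points, k, dist=euclidean, threshold=1e-6, max_iter=100, seed=0):
    """Hypothetical helper following steps 1-5 of the overview."""
    rng = random.Random(seed)
    # Step 1: pick k starting cluster points (assumption: k distinct data points).
    centres = [tuple(p) for p in rng.sample(points, k)]
    labels, cost = assign(points, centres, dist)
    for _ in range(max_iter):
        # Step 4: move each cluster point to the mean of its members.
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:  # an empty cluster keeps its old position
                centres[i] = tuple(sum(c) / len(members) for c in zip(*members))
        # Step 5: go back to steps 2 and 3 with the moved cluster points.
        labels, new_cost = assign(points, centres, dist)
        improvement = cost - new_cost
        cost = new_cost
        if improvement < threshold:  # stop when the cost stops decreasing
            break
    return centres, labels, cost


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(k_means(data, k=2))
```

Note that the cost here sums plain distances, as the slides describe; the standard K-means formulation sums squared Euclidean distances, for which the mean update is guaranteed not to increase the cost.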

3
Unsupervised Approaches
  • An alternative approach: clustering
  • The cost function is to be minimised. For each
    cluster, it simply sums the distance between each
    data point and that cluster's cluster point.
  • Updating the cluster position involves evaluating
    the average of the data points within a cluster
    and moving the cluster point to this position.

4
Unsupervised Approaches
  • An alternative approach: clustering
  • an example

Selecting two cluster points, c1 and c2
5
Unsupervised Approaches
  • An alternative approach: clustering
  • an example

Evaluating the membership matrix:
the membership of point xj in cluster i is 1 when
dist(xj, ci) < dist(xj, ck) for every other cluster
point ck; otherwise it is 0.
Any suitable distance measure can be used, e.g.
Euclidean or Hamming distance.
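The membership rule can be sketched as a small function. The name `membership_matrix`, the row/column layout (one row per cluster, one column per data point) and the tie-breaking rule are assumptions made for illustration; the points in the usage lines are invented, not the slide's data.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def membership_matrix(points, centres, dist=euclidean):
    """Return U where U[i][j] is 1 if point j is closest to cluster point i, else 0."""
    U = [[0] * len(points) for _ in centres]
    for j, p in enumerate(points):
        closest = min(range(len(centres)), key=lambda i: dist(p, centres[i]))
        U[closest][j] = 1  # assumption: ties go to the lowest-numbered cluster
    return U


if __name__ == "__main__":
    # Illustrative values only.
    pts = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    cs = [(0.0, 0.0), (5.0, 5.0)]
    for row in membership_matrix(pts, cs):
        print(row)
```

Each column contains exactly one 1, so every data point belongs to exactly one cluster.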
6
Unsupervised Approaches
  • An alternative approach: clustering
  • an example

We will use a Hamming distance measure, i.e.
[equation: Hamming distance for n-dimensional vectors (or points)]
Comparing the distances from x1 to each cluster point,
x1 is a member of cluster 2.
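The distance formula itself is not preserved in this transcript. Since the example points are real-valued, one plausible reading (an assumption, not confirmed by the slides) is that "Hamming distance" here means the sum of absolute coordinate differences, often called city-block or Manhattan distance. The sketch below shows that reading next to the classical Hamming distance, which counts differing positions; both function names are illustrative.

```python
def city_block(p, q):
    """Sum of absolute coordinate differences (assumed meaning on the slides)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def hamming(p, q):
    """Classical Hamming distance: the number of positions that differ."""
    return sum(a != b for a, b in zip(p, q))


if __name__ == "__main__":
    # Illustrative values only, not the slide's data.
    print(city_block((0.5, 0.25), (0.25, 0.75)))
    print(hamming((1, 0, 1, 1), (1, 1, 0, 1)))
    print(city_block((1, 0, 1, 1), (1, 1, 0, 1)))
```

On 0/1 vectors the two measures give the same value, which may be why the name is carried over to real-valued points.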
7
Unsupervised Approaches
  • An alternative approach: clustering
  • an example

[Figure: data points grouped into cluster 1 and cluster 2]
8
Unsupervised Approaches
  • An alternative approach: clustering
  • an example
  • Evaluate the cost function (J)

$J = \sum_{i=1}^{n} J_i$, where there are n cluster points.
For each cluster point $c_i$,
$J_i = \sum_{k=1}^{m} \mathrm{dist}(x_k, c_i)$,
where there are m data points $x_k$ within the cluster,
i.e. simply sum the distances between each point
in the cluster and the cluster point itself.
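The cost evaluation can be sketched as follows. The function name `cost`, the choice to pass the data already grouped by cluster, and the city-block default distance are assumptions for illustration; the numbers in the usage lines are invented.

```python
def city_block(p, q):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p, q))


def cost(clusters, centres, dist=city_block):
    """Return (J, per_cluster), where clusters[i] holds the points assigned to centres[i]."""
    per_cluster = [sum(dist(x, c) for x in members)
                   for members, c in zip(clusters, centres)]
    return sum(per_cluster), per_cluster


if __name__ == "__main__":
    # Illustrative values only, not the slide's data.
    clusters = [[(0.0, 0.0), (1.0, 0.0)], [(4.0, 4.0)]]
    centres = [(0.5, 0.0), (4.0, 5.0)]
    print(cost(clusters, centres))
```

Returning the per-cluster sums alongside the total makes it easy to see which cluster contributes most to J.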
9
Unsupervised Approaches
  • An alternative approach: clustering
  • an example
  • using a Hamming distance measure

The two per-cluster costs are 2.8 and 3.05,
thus J = 2.8 + 3.05 = 5.85
10
Unsupervised Approaches
  • An alternative approach: clustering
  • an example
  • 4. Update cluster points

i.e. move each cluster point to the average position
of the data points in that cluster. For cluster 1
(members are x2, x3, x4, x6, x7):
average (1st component) = (0.5 + 0.55 + 0.4 + 0.3 + 0.5) / number of points
  = 2.25/5 = 0.45
average (2nd component) = (0.4 + 0.35 + 0.2 + 0.8 + 0.7) / number of points
  = 2.45/5 = 0.49
so cluster point 1 moves to (0.45, 0.49)
11
Unsupervised Approaches
  • An alternative approach: clustering
  • an example
  • 4. Update cluster points

i.e. move each cluster point to the average position
of the data points in that cluster. For cluster 2
(members are x1, x5, x8, x9):
average (1st component) = (0.6 + 0.7 + 0.6 + 0.75) / number of points
  = 2.65/4 ≈ 0.66
average (2nd component) = (0.3 + 0.8 + 0.9 + 0.55) / number of points
  = 2.55/4 ≈ 0.64
so cluster point 2 moves to (0.66, 0.64)
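The update arithmetic can be checked with a short sketch. The coordinates below are read off the sums on the two update slides, assuming the values are listed in the same order as the member names; the helper name `centroid` is illustrative.

```python
def centroid(members):
    """Component-wise mean of a list of equal-length points."""
    return tuple(sum(c) / len(members) for c in zip(*members))


# Assumed coordinates, taken from the sums on the slides.
cluster_1 = [(0.5, 0.4), (0.55, 0.35), (0.4, 0.2), (0.3, 0.8), (0.5, 0.7)]  # x2, x3, x4, x6, x7
cluster_2 = [(0.6, 0.3), (0.7, 0.8), (0.6, 0.9), (0.75, 0.55)]              # x1, x5, x8, x9

c1 = centroid(cluster_1)  # approximately (0.45, 0.49)
c2 = centroid(cluster_2)  # approximately (0.6625, 0.6375); the slide rounds to (0.66, 0.64)

if __name__ == "__main__":
    print(c1, c2)
```

The unrounded values for cluster 2 are 0.6625 and 0.6375, so the figures quoted on the slide are two-decimal approximations.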
12
Unsupervised Approaches
  • An alternative approach: clustering
  • an example

13
Unsupervised Approaches
  • An alternative approach: clustering
  • an example

[Figure label: cluster 2]
Re-evaluate the membership matrix,
then re-evaluate the cost function and shift the
cluster points again.
14
Unsupervised Approaches
  • An alternative approach: clustering
  • How is the algorithm terminated?
  • Normally a threshold is set and either
  • as soon as the cost function reaches (or falls
    below) the threshold, the algorithm terminates, or
  • when the improvement (decrease) of the cost
    function from one iteration to the next
    falls below the threshold, the algorithm
    terminates.
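Both stopping rules can be sketched as one small check. The function name `should_stop`, the `mode` argument and the idea of keeping a list of past costs are assumptions for illustration; in the usage lines, 5.85 is the cost from the worked example and the other numbers are invented.

```python
def should_stop(cost_history, threshold, mode="improvement"):
    """Decide whether to terminate, given the costs J so far (oldest first).

    mode="absolute":    stop once the latest cost is at or below the threshold.
    mode="improvement": stop once the latest decrease in cost is below the threshold.
    """
    if not cost_history:
        return False
    if mode == "absolute":
        return cost_history[-1] <= threshold
    if mode == "improvement":
        if len(cost_history) < 2:
            return False  # no previous cost to compare against yet
        return cost_history[-2] - cost_history[-1] < threshold
    raise ValueError("mode must be 'absolute' or 'improvement'")


if __name__ == "__main__":
    print(should_stop([5.85, 4.0], 0.01))
    print(should_stop([5.85, 4.0, 3.999], 0.01))
    print(should_stop([5.85], 6.0, mode="absolute"))
```

The absolute rule needs a threshold suited to the scale of the data, whereas the improvement rule only needs a notion of "small enough change", which is often easier to choose.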