Basic Data Mining Techniques II More on Clustering - PowerPoint PPT Presentation

1 / 11
About This Presentation
Title:

Basic Data Mining Techniques II More on Clustering

Description:

... weather data has three input attributes, outlook, humidity and windy that ... between the two instances is equal to the count of different attribute values. ... – PowerPoint PPT presentation

Number of Views:46
Avg rating:3.0/5.0
Slides: 12
Provided by: xx197
Category:

less

Transcript and Presenter's Notes

Title: Basic Data Mining Techniques II More on Clustering


1
Basic Data Mining Techniques II More on
Clustering
2
Goals
  • Generalize the concept of clustering
  • Relate clustering techniques to instance based
    classification techniques such as nearest
    neighbor classification

3
An Alternative Clustering Algorithm
  • Let T denote the set of instances, C the set
    of clusters.
  • Name the first instance i1 in T as a cluster. T
    T \i1 , C1 i1.
  • While T is not empty
  • Pick an instance i from T. If i is similar
    enough to any existing cluster, place it in it.
    Otherwise, start a new cluster using i.
  • Set T T \i.

4
K-means example revisited
5
  • Set required similarity3, C1 (1.0,1.5)
  • Is (1,4.5) similar enough to C1?
  • d3, yes. C1 (1.0,1.5), (1.0,4.5)
  • Is (2,1.5) similar enough to C1?
  • C1 center (1, 3), d1.80, yes
  • C1 (1.0,1.5), (1.0,4.5), (2,1.5)
  • Is (2,3.5) similar enough to C1?
  • C1 center(1.33,2.5), d1.20, yes.
  • ....
  • C1(1.0,1.5), (1.0,4.5), (2,1.5), (2,3.5),
    (3,2.5), C2(5,6)

6
Remarks
  • The measure of similarity is again crucial and
    affects the clusters formed. This has two
    aspects
  • The distance function used
  • Aggregation operator employed (to measure
    similarity to a group of instances)

7
Reflections on Classification
  • Instance-based classification has similar ideas
    present in this clustering algorithm.
  • For classification purposes, instances in the
    test set are saved as they are.
  • A new instance to be classified is evaluated and
    its (one) nearest neighbor is identified. Then
    the class of this neighbor becomes the predicted
    class of the new instance.

8
Example 3-Nearest Neighbors Classification
9
The Problem
  • The fictitious weather data has three input
    attributes, outlook, humidity and windy that
    supposedly impacts whether a (n unspecified) game
    is played or not.
  • Consider the following distance function Given
    two instances I1 ad I2, the distance between the
    two instances is equal to the count of different
    attribute values. For example, d((rainy, high,
    false), (rainy, normal, true))0112.
  • Classify the new instance (rainy, high, true)
    using a 3-nearest neighbors classifier.

10
The Solution
11
Remarks
  • Nearest neighbor classification may be a simple
    and effective technique when the data set is
    relatively small.
  • The basic form presented here is very crude, many
    variations exist.
  • K-nearest neighbors identifies the closest K
    instances in the set, and uses a voting mechanism
    to determine the class of the new instance.
Write a Comment
User Comments (0)
About PowerShow.com