Artificial Intelligence 15-381 Unsupervised Machine Learning Methods

Slides: 20

Provided by: rcp6

Learn more at: http://www.cs.cmu.edu

1
Artificial Intelligence 15-381 Unsupervised
Machine Learning Methods

Jaime Carbonell
1-November-2001
OUTLINE
What is unsupervised learning?
Similarity computations
Clustering Algorithms
Other kinds of unsupervised learning

2
Unsupervised Learning

Definition of Unsupervised Learning
Learning useful structure without labeled
classes, optimization criterion, feedback signal,
or any other information beyond the raw data

3
Unsupervised Learning

Examples
Find natural groupings of Xs (X = human languages,
stocks, gene sequences, animal species, ...)?
Prelude to discovery of underlying properties
Summarize the news for the past month?
Cluster first, then report centroids.
Sequence extrapolation, e.g., predict cancer
incidence over the next decade; predict the rise in
antibiotic-resistant bacteria
Methods
Clustering (n-link, k-means, GAC, ...)
Taxonomy creation (hierarchical clustering)
Novelty detection ("meaningful" outliers)
Trend detection (extrapolation from multivariate
partial derivatives)

4
Similarity Measures in Data Analysis

General Assumptions
Each data item is a tuple (vector)
Values of tuple are nominal, ordinal or numerical
$\text{Similarity} = (\text{Distance})^{-1}$
Pure Numerical Tuples
$\text{Sim}(d_i, d_j) = \sum_k d_{i,k}\, d_{j,k}$
$\text{sim}(d_i, d_j) = \cos(d_i, d_j)$
and many more (slide after next)
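
The two numerical measures above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture; the function names `dot_sim` and `cos_sim` and the sample vectors are assumptions made for the example.

```python
import math

def dot_sim(di, dj):
    """Inner-product similarity: sum over k of di[k] * dj[k]."""
    return sum(a * b for a, b in zip(di, dj))

def cos_sim(di, dj):
    """Cosine of the angle between di and dj (0.0 if either is all zeros)."""
    norm = math.sqrt(dot_sim(di, di)) * math.sqrt(dot_sim(dj, dj))
    return dot_sim(di, dj) / norm if norm else 0.0

# Hypothetical tuples, chosen only to show the behaviour.
d1 = [1.0, 2.0, 0.0]
d2 = [2.0, 4.0, 0.0]   # same direction as d1, twice the length
d3 = [0.0, 0.0, 3.0]   # orthogonal to d1

print(dot_sim(d1, d2), cos_sim(d1, d2), cos_sim(d1, d3))
```

The inner product grows with vector length, while the cosine depends only on direction, which is one reason cosine is often preferred when tuples differ greatly in magnitude.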

5
Similarity Measures in Data Analysis

For Ordinal Values
E.g. "small," "medium," "large," "X-large"
Convert to numerical assuming constant Δ on a
normalized [0,1] scale, where max(v) = 1,
min(v) = 0, others interpolate
E.g. "small" = 0, "medium" = 0.33, etc.
Then, use numerical similarity measures
Or, use similarity matrix (see next slide)
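
The ordinal conversion can be sketched as follows. The helper names `ordinal_scale` and `ordinal_sim` are hypothetical, and taking similarity as one minus the absolute difference is just one possible numerical measure, assumed here for illustration.

```python
def ordinal_scale(levels):
    """Map ordered levels to evenly spaced values on [0, 1]."""
    top = len(levels) - 1          # assumes at least two levels
    return {level: i / top for i, level in enumerate(levels)}

sizes = ordinal_scale(["small", "medium", "large", "X-large"])

def ordinal_sim(a, b, scale=sizes):
    """Assumed measure: 1 minus the absolute difference on the [0, 1] scale."""
    return 1.0 - abs(scale[a] - scale[b])

print(sizes)
print(ordinal_sim("small", "large"))
```

With four levels the constant step is 1/3, so "medium" lands at about 0.33 and "large" at about 0.67.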

6
Similarity Measures (cont.)

For Nominal Values
E.g. "Boston", "LA", "Pittsburgh", or "male",
"female", or "diffuse", "globular", "spiral",
"pinwheel"
Binary rule: if $d_{i,k} = d_{j,k}$, then sim = 1, else 0
Use underlying semantic property, e.g., Sim(Boston,
LA) ∝ dist(Boston, LA)^-1, or Sim(Boston,
LA) ∝ (|size(Boston) - size(LA)|)^-1
Use similarity Matrix

7
Similarity Matrix

        tiny  little  small  medium  large  huge
tiny    1.0   0.8     0.7    0.5     0.2    0.0
little        1.0     0.9    0.7     0.3    0.1
small                 1.0    0.7     0.3    0.2
medium                       1.0     0.5    0.3
large                                1.0    0.8
huge                                        1.0
Diagonal must be 1.0
Monotonicity property must hold
Triangle inequality must hold
Transitive property need not hold
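
A short sketch of how such a matrix might be stored and checked. Only the upper triangle is given on the slide, so the lookup below assumes the matrix is symmetric; the function names are hypothetical, and only the diagonal and monotonicity properties are checked.

```python
LEVELS = ["tiny", "little", "small", "medium", "large", "huge"]

# Upper triangle of the similarity matrix, one row per level.
UPPER = [
    [1.0, 0.8, 0.7, 0.5, 0.2, 0.0],
    [1.0, 0.9, 0.7, 0.3, 0.1],
    [1.0, 0.7, 0.3, 0.2],
    [1.0, 0.5, 0.3],
    [1.0, 0.8],
    [1.0],
]

def matrix_sim(a, b):
    """Look up sim(a, b); symmetry is an assumption of this sketch."""
    i, j = sorted((LEVELS.index(a), LEVELS.index(b)))
    return UPPER[i][j - i]

def non_increasing(seq):
    return all(x >= y for x, y in zip(seq, seq[1:]))

def diagonal_ok():
    return all(matrix_sim(v, v) == 1.0 for v in LEVELS)

def monotonic_ok():
    """Similarity never rises as values move further apart in the ordering."""
    for i in range(len(LEVELS)):
        row = [matrix_sim(LEVELS[i], other) for other in LEVELS]
        if not (non_increasing(row[i:]) and non_increasing(row[i::-1])):
            return False
    return True

print(matrix_sim("small", "little"), diagonal_ok(), monotonic_ok())
```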

8
Document Clustering Techniques

Similarity or Distance Measure: Alternative
Choices
Cosine similarity
Euclidean distance
Kernel functions, e.g.,
Language modeling: $P(y \mid \text{model}_x)$, where x and y are
documents

9
Document Clustering Techniques

Kullback-Leibler distance ("relative entropy")
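
The slide's formula did not survive in this transcript. The sketch below uses the standard definition of relative entropy, $D(P \,\|\, Q) = \sum_i p_i \log (p_i / q_i)$, and assumes both arguments are probability distributions over the same outcomes (for documents, typically smoothed term distributions). The function name and sample values are illustrative only.

```python
import math

def kl_divergence(p, q):
    """D(P || Q) = sum_i p_i * log(p_i / q_i), in nats.

    Assumes q_i > 0 wherever p_i > 0; terms with p_i == 0 contribute 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical two-outcome distributions.
p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, q), kl_divergence(q, p))
```

The two printed values differ: relative entropy is not symmetric, so it is a "distance" only in a loose sense.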

10
Incremental Clustering Methods

Given n data items D = {D1, D2, ..., Di, ..., Dn}
And given minimal similarity threshold Smin
Cluster data incrementally as follows
Procedure Singlelink(D)
Let CLUSTERS = {{D1}}
For i = 2 to n
  Let Dc = Argmax Sim(Di, Dj) over j < i
  If Sim(Di, Dc) > Smin, add Di to Dc's cluster
  Else Append(CLUSTERS, {Di}) as a new cluster
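
A runnable sketch of the single-link procedure, under stated assumptions: cosine is used as Sim, items are identified by their 0-based index, and the sample points are hypothetical. It is an illustration of the procedure, not the lecture's own code.

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def single_link(data, s_min, sim=cos_sim):
    """Incremental single-link clustering; returns lists of item indices."""
    clusters = [[0]]            # CLUSTERS starts with the first item alone
    owner = {0: 0}              # item index -> index of its cluster
    for i in range(1, len(data)):
        # Most similar previously seen item (the argmax over j < i).
        c = max(range(i), key=lambda j: sim(data[i], data[j]))
        if sim(data[i], data[c]) > s_min:
            clusters[owner[c]].append(i)
            owner[i] = owner[c]
        else:
            clusters.append([i])
            owner[i] = len(clusters) - 1
    return clusters

# Hypothetical 2-D items: three near the x-axis, two near the y-axis.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.3]]
print(single_link(points, s_min=0.9))
```

Each new item is compared with every earlier item, so the loop does on the order of n^2 similarity computations in total.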

11
Incremental Clustering (cont.)

Procedure Averagelink(D)
Let CLUSTERS = {{D1}}
For i = 2 to n
  Let Cc = Argmax Sim(Di, centroid(C)) over C in CLUSTERS
  If Sim(Di, centroid(Cc)) > Smin, add Di to cluster Cc
  Else Append(CLUSTERS, {Di}) as a new cluster
Observations
Single pass over the data ⇒ easy to cluster new
data incrementally
Requires arbitrary Smin threshold
O(N^2) time, O(N) space
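
The centroid-based variant can be sketched the same way. As before, cosine similarity, index-based clusters, and the sample points are assumptions of this example; the centroid is recomputed from scratch on each comparison for clarity rather than speed.

```python
import math

def cos_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def average_link(data, s_min, sim=cos_sim):
    """Incremental centroid-based clustering; returns lists of item indices."""
    clusters = [[0]]
    for i in range(1, len(data)):
        # Similarity of item i to the centroid of every existing cluster.
        scores = [sim(data[i], centroid([data[j] for j in c])) for c in clusters]
        best = max(range(len(clusters)), key=scores.__getitem__)
        if scores[best] > s_min:
            clusters[best].append(i)
        else:
            clusters.append([i])
    return clusters

# Hypothetical 2-D items.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.3]]
print(average_link(points, s_min=0.9))
```

Because the data are consumed in one pass, a new item can be clustered later by running the loop body once more against the stored clusters.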

12
Document Clustering Techniques

Example. Group documents based on similarity
Similarity matrix
Thresholding at similarity value of .9 yields
complete graph C1 = {1, 4, 5}, namely Complete
Linkage
connected graph C2 = {1, 4, 5, 6}, namely Single
Linkage
For clustering we need three things
A similarity measure for pairwise comparison
between documents
A clustering criterion (Complete Link, Single
Link, ...)
A clustering algorithm
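
The slide's similarity matrix is not preserved in this transcript, so the sketch below uses hypothetical similarity values, chosen only so that thresholding at .9 reproduces the two groups named above. It shows the difference between the criteria: single linkage needs a chain of above-threshold links, complete linkage needs every pair to be linked.

```python
from itertools import combinations

# Hypothetical pairwise similarities for documents 1..6; unlisted pairs are 0.
SIM = {(1, 4): 0.95, (1, 5): 0.92, (4, 5): 0.91, (5, 6): 0.93,
       (1, 2): 0.40, (2, 3): 0.55, (4, 6): 0.60, (1, 6): 0.30}
DOCS = range(1, 7)

def linked(a, b, threshold):
    return SIM.get((min(a, b), max(a, b)), 0.0) >= threshold

def connected_components(threshold):
    """Single linkage: documents joined by any chain of linked pairs."""
    seen, components = set(), []
    for start in DOCS:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            d = stack.pop()
            if d in comp:
                continue
            comp.add(d)
            stack.extend(o for o in DOCS if o != d and linked(d, o, threshold))
        seen |= comp
        components.append(sorted(comp))
    return components

def is_complete(group, threshold):
    """Complete linkage: every pair in the group must be linked."""
    return all(linked(a, b, threshold) for a, b in combinations(group, 2))

print(connected_components(0.9))
print(is_complete([1, 4, 5], 0.9), is_complete([1, 4, 5, 6], 0.9))
```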

13
Document Clustering Techniques

Clustering Criterion Alternative Linkages
Single-link ("nearest neighbor")
Complete-link
Average-link ("group average clustering", or
GAC)
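
As a rough sketch, the three linkages differ only in how the pairwise similarities between two clusters are combined. The function names are hypothetical, and group average is taken here as the mean over cross-cluster pairs, which is one common reading; some formulations average over all pairs in the merged cluster instead.

```python
def single_link_sim(a, b, sim):
    """Similarity of the most similar cross-cluster pair."""
    return max(sim(x, y) for x in a for y in b)

def complete_link_sim(a, b, sim):
    """Similarity of the least similar cross-cluster pair."""
    return min(sim(x, y) for x in a for y in b)

def average_link_sim(a, b, sim):
    """Mean similarity over all cross-cluster pairs (assumed definition)."""
    return sum(sim(x, y) for x in a for y in b) / (len(a) * len(b))

def inv_dist(x, y):
    """Hypothetical 1-D similarity: 1 / (1 + |x - y|)."""
    return 1.0 / (1.0 + abs(x - y))

cluster_a = [0.0, 1.0]
cluster_b = [3.0, 5.0]
print(single_link_sim(cluster_a, cluster_b, inv_dist),
      complete_link_sim(cluster_a, cluster_b, inv_dist),
      average_link_sim(cluster_a, cluster_b, inv_dist))
```

For any pair of clusters the average-link score lies between the complete-link and single-link scores.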

14
Non-hierarchical Clustering Methods

A Single-Pass Algorithm
1. Treat the first document as the first cluster
(singleton cluster).
2. Compare each subsequent document to all the
clusters processed so far.
3. Add this new document to the closest cluster if
the intercluster similarity is above the
(predetermined) similarity threshold; otherwise,
leave the new document alone as a new cluster.
4. Repeat Steps 2 and 3 until all the documents are
processed.
- O(n^2) time and O(n) space (worst-case
complexity)

15
Non-hierarchical Methods (cont.)

Multi-pass K-means ("reallocation method")
1. Select K initial centroids (the "seeds")
2. Assign each document to the closest centroid,
resulting in K clusters.
3. Recompute the centroid for each of the K
clusters.
4. Repeat Steps 2 and 3 until the centroids are
stabilized.
- O(nK) time and O(K) space per pass
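
The four steps can be sketched in plain Python. Assumptions of this example: "closest" is measured by squared Euclidean distance, seeds are passed in explicitly, an empty cluster keeps its previous centroid, and the data points are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def k_means(data, seeds, max_passes=100):
    """Reallocation k-means; returns (centroids, assignment list)."""
    centroids = [list(s) for s in seeds]              # Step 1: the seeds
    assign = None
    for _ in range(max_passes):
        # Step 2: assign every item to its closest centroid.
        new_assign = [
            min(range(len(centroids)), key=lambda c: sq_dist(x, centroids[c]))
            for x in data
        ]
        if new_assign == assign:                      # Step 4: nothing moved
            break
        assign = new_assign
        # Step 3: recompute the centroid of each non-empty cluster.
        for c in range(len(centroids)):
            members = [x for x, a in zip(data, assign) if a == c]
            if members:
                centroids[c] = mean(members)
    return centroids, assign

# Hypothetical data: two well-separated groups of three points.
data = [[1.0, 1.0], [1.5, 2.0], [1.0, 0.5], [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]]
centroids, assign = k_means(data, seeds=[data[0], data[3]])
print(assign, centroids)
```

Stopping when no assignment changes is equivalent to the centroids having stabilized, since unchanged assignments yield unchanged centroids.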

16
Hierarchical Agglomerative Clustering Methods

Generic Agglomerative Procedure (Salton '89)
results in nested clusters via iterations
1. Compute all pairwise document-document similarity
coefficients
2. Place each of n documents into a class of its own
3. Merge the two most similar clusters into one
- replace the two clusters by the new cluster
- compute intercluster similarity scores w.r.t.
the new cluster
4. Repeat the above step until only one cluster is
left
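
A compact sketch of the generic procedure, under stated assumptions: the linkage is passed in as `max` (single-link) or `min` (complete-link), intercluster scores are recomputed from the cached pairwise table on every iteration rather than updated incrementally, and the 1-D data and similarity function are hypothetical.

```python
def hac(data, sim, linkage=max):
    """Agglomerative clustering; returns the merges as (left, right, score)."""
    n = len(data)
    # Step 1: all pairwise similarity coefficients.
    pair = {(i, j): sim(data[i], data[j]) for i in range(n) for j in range(i + 1, n)}
    # Step 2: each item starts in a class of its own.
    clusters = [[i] for i in range(n)]
    merges = []
    while len(clusters) > 1:                          # Step 4: repeat until one is left
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = linkage(pair[(min(i, j), max(i, j))]
                                for i in clusters[a] for j in clusters[b])
                if best is None or score > best[0]:
                    best = (score, a, b)
        score, a, b = best
        # Step 3: merge the two most similar clusters and replace them.
        merges.append((clusters[a], clusters[b], score))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

def inv_dist(x, y):
    """Hypothetical 1-D similarity: 1 / (1 + |x - y|)."""
    return 1.0 / (1.0 + abs(x - y))

merges = hac([0.0, 1.0, 5.0, 6.0, 20.0], inv_dist, linkage=max)
for left, right, score in merges:
    print(left, right, round(score, 3))
```

The sequence of n - 1 merges is the nested hierarchy: reading it from first to last traces the tree from the leaves up to the root.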

17
Hierarchical Agglomerative Clustering Methods
(cont.)

Heuristic Approaches to Speedy Clustering
Reallocation methods with k selected seeds (O(kn)
time)
- k is the desired number of clusters; n is the
number of documents
Buckshot: random sampling (of $\sqrt{kn}$ documents)
plus global HAC
Fractionation: divide and conquer

18
Creating Taxonomies

Hierarchical Clustering
GAC trace creates binary hierarchy
Incremental-link → Hierarchical version
1. Cluster data with high Smin → 1st hierarchical
level
2. Decrease Smin (stop at Smin = 0)
3. Treat cluster centroids as data tuples and
recluster, creating next level of hierarchy, then
repeat steps 2 and 3.
K-means → Hierarchical k-means
1. Cluster data with large k
2. Decrease k (stop at k = 1)
3. Treat cluster centroids as data tuples and
recluster, creating next level of hierarchy, then
repeat steps 2 and 3.
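
The hierarchical k-means recipe can be sketched as below. Assumptions of this example: squared Euclidean distance, seeding with the first k items (an arbitrary choice made to keep the run deterministic), a caller-supplied decreasing schedule of k values, and hypothetical 1-D data.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def k_means(data, k, passes=50):
    """Plain k-means seeded with the first k items (arbitrary seeding)."""
    centroids = [list(x) for x in data[:k]]
    for _ in range(passes):
        assign = [min(range(k), key=lambda c: sq_dist(x, centroids[c])) for x in data]
        updated = []
        for c in range(k):
            members = [x for x, a in zip(data, assign) if a == c]
            updated.append(mean(members) if members else centroids[c])
        if updated == centroids:      # centroids have stabilized
            break
        centroids = updated
    return centroids, assign

def hierarchical_k_means(data, ks):
    """Build levels bottom-up: cluster, then recluster the centroids."""
    levels, items = [], data
    for k in ks:                      # decreasing schedule ending at k = 1
        centroids, assign = k_means(items, k)
        levels.append(assign)         # assign[i] = parent of item i at this level
        items = centroids             # centroids become the next level's data
    return levels

# Hypothetical 1-D data: four tight pairs spread along a line.
data = [[0.0], [10.0], [20.0], [30.0], [1.0], [11.0], [21.0], [31.0]]
levels = hierarchical_k_means(data, ks=[4, 2, 1])
print(levels)
```

Each entry of `levels` maps the items of one level to their parent cluster at the next, so following the indices upward from a data item gives its path to the root.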