Title: Unsupervised Learning: Clustering
Some material adapted from slides by Andrew Moore, CMU. Visit http://www.autonlab.org/tutorials/ for Andrew's repository of Data Mining tutorials.
Unsupervised Learning
- Supervised learning uses labeled data pairs (x, y) to learn a function f : X → Y.
- But what if we don't have labels?
  - No labels: unsupervised learning
  - Only some points are labeled: semi-supervised learning
    - Labels may be expensive to obtain, so we only get a few.
- Clustering is the unsupervised grouping of data points. It can be used for knowledge discovery.
Clustering Data
K-Means Clustering
- K-Means(k, data):
  - Randomly choose k cluster center locations (centroids).
  - Loop until convergence:
    - Assign each point to the cluster of the closest centroid.
    - Re-estimate the cluster centroids based on the data assigned to each.
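A minimal sketch of this loop in Python, assuming NumPy, Euclidean distance, and a fixed iteration cap; the function and argument names are illustrative rather than taken from the slides.

```python
import numpy as np

def k_means(k, data, max_iters=100, seed=0):
    """Minimal k-means sketch: data is an (n_points, n_dims) array."""
    rng = np.random.default_rng(seed)
    # Randomly choose k data points as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iters):
        # Assign each point to the cluster of the closest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Re-estimate each centroid as the mean of the points assigned to it.
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return centroids, labels
```

Because of the random initialization, different runs can return different clusterings, which is the sensitivity discussed under "Problems with K-Means" below.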
7K-Means Animation
Example generated by Andrew Moore using Dan
Pellegs super-duper fast K-means system Dan
Pelleg and Andrew Moore. Accelerating Exact
k-means Algorithms with Geometric
Reasoning. Proc. Conference on Knowledge
Discovery in Databases 1999.
Problems with K-Means
- Very sensitive to the initial points.
  - Do many runs of k-means, each with different initial centroids.
  - Seed the centroids using a better method than random (e.g., farthest-first sampling), as in the sketch after this list.
- Must manually choose k.
  - Learn the optimal k for the clustering. (Note that this requires a performance measure.)
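A minimal sketch of farthest-first seeding, again assuming NumPy; the greedy max-min rule used here is one common reading of "farthest-first sampling", and the function name is illustrative.

```python
import numpy as np

def farthest_first_centroids(k, data, seed=0):
    """Greedy farthest-first seeding: start from a random point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    rng = np.random.default_rng(seed)
    centroids = [data[rng.integers(len(data))]]
    for _ in range(k - 1):
        # Distance from each point to its nearest already-chosen centroid.
        dists = np.min(
            np.linalg.norm(data[:, None, :] - np.array(centroids)[None, :, :], axis=2),
            axis=1,
        )
        centroids.append(data[dists.argmax()])
    return np.array(centroids)
```

These seeds can then replace the random initialization in the k_means sketch above.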
Problems with K-Means
- How do you tell it which clustering you want?
  - Constrained clustering techniques
Learning Bayes Nets
Some material adapted from lecture notes by Lise Getoor and Ron Parr. Adapted from slides by Tim Finin and Marie desJardins.
Learning Bayesian networks
- Given a training set of data D
- Find the network B that best matches D
  - model selection
  - parameter estimation
[Figure: the training data D is fed to an inducer, which outputs the learned network B]
Parameter estimation
- Assume known structure
- Goal: estimate BN parameters Θ
  - the entries in the local probability models, P(X | Parents(X))
- A parameterization Θ is good if it is likely to generate the observed data
- Maximum Likelihood Estimation (MLE) principle: choose Θ so as to maximize the likelihood of the i.i.d. samples x[1], ..., x[M] in D,
  L(Θ : D) = Π_m P(x[m] | Θ)
Parameter estimation II
- The likelihood decomposes according to the structure of the network
  - → we get a separate estimation task for each parameter
- The MLE (maximum likelihood estimate) solution: for each value x of a node X and each instantiation u of Parents(X),
  θ(x | u) = N(x, u) / N(u)
  - We just need to collect the counts N(x, u) and N(u) (the sufficient statistics) for every combination of parents and children observed in the data
- MLE is equivalent to an assumption of a uniform prior over parameter values
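A minimal sketch of collecting those counts for one node, assuming each data point is a dict mapping variable names to values; the function name and data format are illustrative.

```python
from collections import Counter

def mle_cpt(data, child, parents):
    """Estimate P(child | parents) by counting: theta(x | u) = N(x, u) / N(u).

    data: iterable of dicts, e.g. {"Burglary": True, "Earthquake": False, "Alarm": True}
    """
    joint = Counter()        # N(x, u): counts of (child value, parent values)
    parent_only = Counter()  # N(u): counts of the parent values alone
    for record in data:
        u = tuple(record[p] for p in parents)
        x = record[child]
        joint[(x, u)] += 1
        parent_only[u] += 1
    # theta[(x, u)] is the MLE of P(child = x | parents = u)
    return {(x, u): n / parent_only[u] for (x, u), n in joint.items()}
```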
Sufficient statistics: Example
- Why are the counts sufficient?
[Figure: Bayesian network over Moon-phase, Light-level, Earthquake, Burglary, and Alarm]
  θ(A | E, B) = N(A, E, B) / N(E, B)
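Using the hedged mle_cpt sketch above, the formula on this slide corresponds to estimating the Alarm node's CPT from exactly those counts (the dataset variable here is assumed):

```python
# theta(A | E, B) = N(A, E, B) / N(E, B), via the mle_cpt sketch above
alarm_cpt = mle_cpt(data, child="Alarm", parents=["Earthquake", "Burglary"])
```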
Model selection
- Goal: select the best network structure, given the data
- Input:
  - Training data
  - Scoring function
- Output:
  - A network that maximizes the score
Structure selection: Scoring
- Bayesian: define a prior over parameters and structure
  - we get a balance between model complexity and fit to the data as a byproduct
- Score(G : D) = log P(G | D) ∝ log [P(D | G) P(G)]
  - P(D | G) is the marginal likelihood; it just comes from our parameter estimates
  - P(G) is the prior on structure; it can be any measure we want, typically a function of the network complexity
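A minimal sketch of a score with this shape, assuming the caller supplies a log marginal likelihood routine; the edge-count prior used here is just one common complexity penalty, not necessarily the one intended in the slides, and all names are illustrative.

```python
def structure_score(parents, data, log_marginal_likelihood, complexity_weight=1.0):
    """Score(G : D) proportional to log P(D | G) + log P(G).

    parents: dict mapping each node to its list of parents (the structure G).
    log_marginal_likelihood: caller-supplied function of (parents, data).
    The structure prior below penalizes edge count, a simple complexity measure.
    """
    num_edges = sum(len(ps) for ps in parents.values())
    log_prior = -complexity_weight * num_edges  # log P(G), up to an additive constant
    return log_marginal_likelihood(parents, data) + log_prior
```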
Heuristic search
Exploiting decomposability
Variations on a theme
- Known structure, fully observable: only need to do parameter estimation
- Unknown structure, fully observable: do heuristic search through structure space, then parameter estimation
- Known structure, missing values: use expectation maximization (EM) to estimate parameters
- Known structure, hidden variables: apply adaptive probabilistic network (APN) techniques
- Unknown structure, hidden variables: too hard to solve!
Handling missing data
- Suppose that in some cases we observe earthquake, alarm, light-level, and moon-phase, but not burglary
- Should we throw that data away?
- Idea: guess the missing values based on the other data
[Figure: Bayesian network over Moon-phase, Light-level, Earthquake, Burglary, and Alarm]
EM (expectation maximization)
- Guess probabilities for nodes with missing values (e.g., based on other observations)
- Compute the probability distribution over the missing values, given our guess
- Update the probabilities based on the guessed values
- Repeat until convergence
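A minimal EM sketch for a toy two-node case (Burglary → Alarm, with Burglary sometimes missing), assuming binary variables; the setup, function name, and initialization are illustrative, not a general-purpose implementation.

```python
def em_burglary_alarm(records, iters=50):
    """EM sketch for a toy network Burglary -> Alarm.

    records: list of (b, a) pairs; a is True/False, b is True/False or None
    when Burglary was not observed. Estimates P(B) and P(A | B) using
    expected (fractional) counts in place of the missing values.
    """
    p_b = 0.5                              # initial guess for P(B = true)
    p_a_given_b = {True: 0.5, False: 0.5}  # initial guess for P(A = true | B)

    for _ in range(iters):
        n_given_b = {True: 0.0, False: 0.0}   # expected count of B = b
        n_a_and_b = {True: 0.0, False: 0.0}   # expected count of A = true and B = b

        for b, a in records:
            if b is None:
                # E-step: P(B = true | A = a) under the current parameters
                num = p_b * (p_a_given_b[True] if a else 1 - p_a_given_b[True])
                den = num + (1 - p_b) * (p_a_given_b[False] if a else 1 - p_a_given_b[False])
                w = num / den
            else:
                w = 1.0 if b else 0.0
            for b_val, weight in ((True, w), (False, 1.0 - w)):
                n_given_b[b_val] += weight
                if a:
                    n_a_and_b[b_val] += weight

        # M-step: re-estimate the parameters from the expected counts
        p_b = n_given_b[True] / len(records)
        p_a_given_b = {b_val: n_a_and_b[b_val] / max(n_given_b[b_val], 1e-9)
                       for b_val in (True, False)}

    return p_b, p_a_given_b
```

Each pass corresponds to the bullets above: the E-step fills in a distribution over the missing Burglary values given the current guess, and the M-step re-estimates the parameters as if those soft values had been observed.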
EM example
- Suppose we have observed Earthquake and Alarm but not Burglary for an observation on November 27
- We estimate the CPTs based on the rest of the data
- We then estimate P(Burglary) for November 27 from those CPTs
- Now we recompute the CPTs as if that estimated value had been observed
- Repeat until convergence!
[Figure: Bayesian network over Earthquake, Burglary, and Alarm]