Title: How slow is the k-means method?
1. How slow is the k-means method?
- David Arthur, Sergei Vassilvitskii
- Stanford University
2. The k-means Problem
- Given an integer k and n data points in R^d
- Partition the points into k clusters
- Choose k centers and assign each point to its closest center
- Try to minimize
- f = Σ_x ‖x − c(x)‖², where c(x) is the center closest to x
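The objective above can be sketched in a few lines of Python (the coordinates below are illustrative, not from the talk):

```python
# A minimal sketch of the k-means objective f: each point x is charged
# the squared distance to its closest center c(x), and f sums the charges.

def squared_dist(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def kmeans_cost(points, centers):
    """f = sum_x ||x - c(x)||^2, where c(x) is the nearest center to x."""
    return sum(min(squared_dist(x, c) for c in centers) for x in points)

# Three points in R^2 and k = 2 centers (illustrative values).
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
centers = [(0.5, 0.0), (10.0, 0.0)]
print(kmeans_cost(points, centers))  # 0.25 + 0.25 + 0.0 = 0.5
```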
3. Lloyd's Algorithm (1982)
- Simply called "the k-means method"
- Choose k starting centers
- Uniformly at random, usually
- Repeat until stable:
- Assign each point to the closest center
- Set each center to be the center of mass of the points assigned to it
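The loop above can be rendered compactly in plain Python (the function name, random seeding, and tie-breaking are our own choices, not from the talk):

```python
import random

def lloyd(points, k, seed=0):
    """Lloyd's method sketch: pick k starting centers uniformly at random,
    then alternate assignment and center-of-mass steps until nothing moves."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    while True:
        # Assign each point to its closest center.
        clusters = [[] for _ in range(k)]
        for x in points:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centers[i])))
            clusters[j].append(x)
        # Set each center to the center of mass of its assigned points.
        new_centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                       for i, cl in enumerate(clusters)]
        if new_centers == centers:   # stable: no center moved
            return centers, clusters
        centers = new_centers

centers, clusters = lloyd([(0, 0), (0, 1), (9, 0), (9, 1)], k=2)
```

Since each step decreases f and ties are broken deterministically, the loop must terminate.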
4. Example
[Figure: data points, cluster centers, and cluster boundaries]
- First choose k arbitrary centers
- Assign points to closest centers
- Recompute centers
- k-means has now stabilized
5. About k-means
- It always terminates
- Each step decreases f
- At most k^n configurations
- It can stop with an arbitrarily bad clustering
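The "arbitrarily bad" part can be seen already on the four corners of a 1-by-M rectangle: both center placements below are fixed points of Lloyd's method, yet their costs differ by a factor of M² (a sketch with made-up coordinates):

```python
# Two stable configurations on the corners of a 1-by-M rectangle.
def cost(points, centers):
    d2 = lambda x, c: sum((a - b) ** 2 for a, b in zip(x, c))
    return sum(min(d2(x, c) for c in centers) for x in points)

M = 100.0
corners = [(0, 0), (0, 1), (M, 0), (M, 1)]
good = [(0, 0.5), (M, 0.5)]       # split left/right: cost 4 * 0.25 = 1
bad = [(M / 2, 0), (M / 2, 1)]    # split top/bottom: also stable for Lloyd's
print(cost(corners, good), cost(corners, bad))  # 1.0 vs M^2 = 10000.0
```

Growing M makes the bad fixed point arbitrarily worse than the good one.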
6. About k-means
- Widely used because it is fast
- Usually takes far fewer than n iterations
- How do you formalize this?
- Just look at worst-case performance?
7. k-means (Worst-case iterations)
- Counting the number of configurations:
- Already showed k^n
- Inaba et al. (SoCG 94): O(n^(kd))
- One dimension:
- Dasgupta (COLT 03): Ω(n)
- Har-Peled, Sadri (SODA 05): O(nΔ²)
- Δ = ratio of largest distance to smallest
8. Our Main Result
- Worst case: 2^Ω(√n) iterations
- k-means is superpolynomial!
9. Proof (High Level)
- Start with a configuration M of n points that requires T iterations
- Add O(1) clusters and O(k) points that reset the initial configuration M:
- M stabilizes to M′
- The new clusters and points reset M′ back to M
- M now has to stabilize to M′ again
- Now requires at least 2T iterations
10. Proof (High Level)
- Repeat the reset construction m times:
- O(m²) points
- O(m) clusters
- 2^m iterations
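The parameters above already give the main bound; the one-line check is:

```latex
n = O(m^2) \;\Rightarrow\; m = \Omega(\sqrt{n})
\;\Rightarrow\; \text{iterations} \;\geq\; 2^{m} = 2^{\Omega(\sqrt{n})}.
```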
11. Main Construction (Overview)
[Figure: the original k-means configuration M, with clusters Ci]
12. Main Construction (Overview)
[Figure: the clusters Ci of M, plus new clusters G and H]
- Note the horizontal symmetry
- O(1) new clusters, O(k) new points
13. Main Construction (Overview)
[Figure: clusters Ci, G, and H, with new points pi and qi]
- Points pi: absorbed by Ci when M completes
- Points qi: absorbed by Ci after pi, which resets the center of Ci
- Everything else: balances the important points; helps H absorb pi and qi right after Ci does, completing the reset
14. Main Construction (Zoomed in)
[Figure: Ci, G, and H, zoomed in and more to scale; some distances shown, including 0.989d, 0.2d, and ε, with d ≫ ε]
15. Main Construction (t = 0)
[Figure: the initial configuration, with Ci, G, and H]
- We trace k-means from this initial configuration
16. Main Construction (t = 0 … T)
[Figure: Ci, G, and H]
- Push the new points far enough away: the new clusters are stable while M executes
17. Main Construction (t = T+1): Reassigning points to clusters
[Figure: Ci, G, H, and point pi]
- Take pi to be a direct lift of the final center of Ci
- At time T+1, Ci is closer to taking pi than ever
- Can position G so that pi is absorbed by Ci at time T+1
18. Main Construction (t = T+1): Reassigning points to clusters
[Figure: Ci, G, and H]
- Nasty detail: have to position G to work for each i simultaneously
19. Main Construction (t = T+1): Reassigning points to clusters
[Figure: Ci, G, and H]
- Basic idea: perturb the final Ci centers onto a hypersphere, and align G with its center
20. Main Construction (t = T+1): Recomputing centers
[Figure: Ci, G, and H]
- The center of G moves further away
- The centers of Ci are stable by symmetry
21. Main Construction (t = T+2): Reassigning points to clusters
[Figure: Ci, G, H, and point qi]
- G's center is far away, so it loses points
- qi switches to some Cj; we want it to be Ci, regardless of qi's position in the base space
22. Main Construction (t = T+2): Reassigning points to clusters
[Figure: Ci, G, and H]
- Basic idea: translate each (pi, qi) along a new dimension; qi is now closer to Ci than to any other Cj
23. Main Construction (t = T+2): Recomputing centers
[Figure: Ci, G, and H; centers reset to t = 0!]
- Symmetry: the Ci centers are not lifted towards G
- Can now choose qi's coordinate in the base space to reset Ci
24. Main Construction (t = T+3): Reassigning points to clusters
[Figure: Ci, G, and H; same clusters as t = 1]
- The Ci centers have not moved closer to pi and qi, but H has
25. Main Construction (t = T+3): Recomputing centers
[Figure: Ci, G, and H; same centers as t = 1]
26. Main Construction (t = T+4): Reassigning points to clusters
[Figure: Ci, G, and H; same clusters as t = 2]
27. Main Construction (t = T+4): Recomputing centers
[Figure: Ci, G, and H; same centers as t = 2]
- Success!
- The new clusters are now totally stable
- Ci is free to proceed for another T − 2 steps
28. Summary
- Some configurations take 2^Ω(√n) iterations
- (Yes, we have actually implemented this!)
- What now?
- The lower bound is too precise to arise in practice
- How do you formalize that?
29. The Big Question
- How to guarantee good speed?
- Choose initial centers randomly?
- No: can force the starting configuration w.h.p.
- Har-Peled, Sadri (SODA 05): polynomial spread?
- No: can make the spread ≈ n by adding 1 dimension
- Low dimension?
- Open: we conjecture polynomial only if d = 1
- But k-means is fast in practice even in high dimension
30. The Big Question
- How to guarantee good speed?
- We suggest the smoothed analysis of Spielman and Teng
- Perturb each data point using a normal distribution
- We recently showed O(n^k) and O(2^(n/d))
- Recall the worst-case bound O(n^(kd))
- Still open!
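The smoothed model can be sketched as follows: an adversary fixes the input, then every coordinate gets independent Gaussian noise before k-means runs (sigma and the sample points below are our own illustrative choices):

```python
import random

def perturb(points, sigma, seed=0):
    """Smoothed-analysis input model: add N(0, sigma^2) noise to every
    coordinate of every (adversarially chosen) data point."""
    rng = random.Random(seed)
    return [tuple(a + rng.gauss(0, sigma) for a in x) for x in points]

data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # adversarial input
smoothed = perturb(data, sigma=0.01)          # what k-means actually sees
```

The running-time bounds are then taken in expectation over the noise, so brittle worst-case constructions like the one above are destroyed with high probability.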
31. Thanks for listening!