Title: Statistical Society of Canada
Structuring Interactive Cluster Analysis
- Wayne Oldford
- University of Waterloo
Content by example
- ill-defined problem
- high-interaction desirable
- explore partitions
- recast algorithms
- problems
- resources
- interactive clustering
- partition moves
- implications
geometric/visual structure
context matters
structure in context
segmentation in MRI
[image]
context specific structure
[image]
some specific, some not
[images]
- Find groups in data
- Similar objects are together
- Groups are separated
- What do you mean by similar?
- E.g., what is contiguous structure?
- When are groups separate?
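To answer "similar" and "separate" you have to choose a dissimilarity. The sketch below is a minimal illustration. It assumes numeric cases and uses Euclidean distance, which is only one possible choice, and the helper names are hypothetical. For a given grouping it reports the mean distance within groups and the mean distance between groups, so "similar objects together, groups separated" means a small first value and a large second one.

```python
import math
from itertools import combinations

# Hypothetical helpers; Euclidean distance is an assumed choice of "similar".


def mean(xs):
    """Arithmetic mean; NaN for an empty list."""
    return sum(xs) / len(xs) if xs else float("nan")


def within_between(points, labels):
    """Mean distance between cases in the same group and in different groups."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            within.append(d)
        else:
            between.append(d)
    return mean(within), mean(between)
```

Changing the distance function changes what counts as a group. That is exactly why the question "similar in what sense?" has no single answer.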
Computational resources
- computationally intensive
2. Memory
3. Display
- graphics processors, digital video
- more data, more visual detail
Balance and integrate
High interaction
- integrate computational resources
Example: image analysis
Example: context and function plots
Example: mutual support and shapes
Example: exploratory data analysis
Interactive clustering
- visual grouping
- location, motion, shape, texture, ...
- linking across displays
- manual
- selection
- cases, variates, groups, ...
- colouring
- focus
- immediate and incremental
- context can be used to form groups
- multiple partitions
Automated clustering: typical software
- resources dedicated to numerical computation
- teletype interaction
- runs to completion
- graphical output
- don't always work well (there is no universal solution)
- confirm via exploratory data analysis
Must be integrated with interactive methods
Example: k-means clustering
Example: VERI (Visual Empirical Region of Influence)
join points if no third point falls in this region
Example: VERI
Integrating automatic methods
- Move about the space of partitions
- Pa → Pb → Pc → ...
Which operators f, with f(Pa) → Pb, are of interest?
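One way to make "moves" concrete is to store a partition of n cases as a list of n group labels and to treat an operator f as a function from partitions to partitions. This is a minimal sketch built on that representation, which is itself an assumption. `canonical` renumbers labels so that equal partitions compare equal. `join` is a hypothetical example move (a reduce step). All names are illustrative.

```python
from typing import Callable, Dict, List

# Assumed representation: labels[i] is the group of case i.
Partition = List[int]
Operator = Callable[[Partition], Partition]


def canonical(p: Partition) -> Partition:
    """Renumber groups 0, 1, 2, ... by first appearance, so partitions that
    differ only in group names compare equal."""
    ids: Dict[int, int] = {}
    return [ids.setdefault(g, len(ids)) for g in p]


def join(a: int, b: int) -> Operator:
    """Hypothetical example move: merge group b into group a."""
    return lambda p: [a if g == b else g for g in p]


def move_path(p0: Partition, ops: List[Operator]) -> List[Partition]:
    """Apply the moves in turn, recording Pa -> Pb -> Pc -> ..."""
    seq = [canonical(p0)]
    for f in ops:
        seq.append(canonical(f(seq[-1])))
    return seq
```

Because each operator both takes and returns a partition, any automatic method cast this way can be chained with manual moves.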
Refinement sequence
→ 2 → 3 → 4 → 5
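These slides list breaking the minimal spanning tree as one refinement operator. The following is a minimal sketch of it. It assumes Euclidean points, and the helper names are hypothetical. The tree is built with Prim's algorithm, and its longest edges are then deleted one at a time. The connected components left after k - 1 deletions form a k-group partition, and each partition in the sequence refines the one before it. The result is the single-linkage hierarchy.

```python
import math

# Sketch only: Euclidean points and these helper names are assumptions.


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph, O(n^2).

    Returns the n - 1 tree edges as (length, i, j) tuples.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # length of cheapest known edge into the tree
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def components(n, edges):
    """Label the connected components of the graph on cases 0..n-1."""
    adj = [[] for _ in range(n)]
    for _, a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    labels = [-1] * n
    g = 0
    for s in range(n):
        if labels[s] < 0:
            labels[s] = g
            stack = [s]
            while stack:
                u = stack.pop()
                for v in adj[u]:
                    if labels[v] < 0:
                        labels[v] = g
                        stack.append(v)
            g += 1
    return labels


def refinement_sequence(points, k_max):
    """Partitions with 1, 2, ..., k_max groups (k_max <= n), each refining
    the last, obtained by deleting MST edges from longest to shortest."""
    edges = sorted(mst_edges(points))          # ascending by length
    n = len(points)
    return [components(n, edges[: len(edges) - (k - 1)])
            for k in range(1, k_max + 1)]
```

For example, `refinement_sequence(points, 5)` would give the partitions with 1 to 5 groups. Breaking ties between equal-length edges is arbitrary here.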
Reassign, reduce sequence
→ 5 → 4 → 3 → 2
break minimal spanning tree
join near centres
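The reduce direction can use the "join near centres" rule. This is a minimal sketch of it, not the author's implementation. It assumes that centres are group centroids under Euclidean distance. At each step the two groups with the closest centres are merged, and merging stops at `k_min` groups. Ties go to the pair that comes first in label order. The helper names are hypothetical.

```python
import math

# Sketch only: centroid "centres", Euclidean distance and names are assumed.


def centres(points, labels):
    """Centroid of each group, keyed by group label."""
    groups = {}
    for p, g in zip(points, labels):
        groups.setdefault(g, []).append(p)
    return {g: tuple(sum(c) / len(ps) for c in zip(*ps))
            for g, ps in groups.items()}


def reduce_once(points, labels):
    """Merge the two groups whose centres are closest (needs >= 2 groups)."""
    cen = centres(points, labels)
    a, b = min(((g, h) for g in cen for h in cen if g < h),
               key=lambda gh: math.dist(cen[gh[0]], cen[gh[1]]))
    return [a if g == b else g for g in labels]


def reduce_sequence(points, labels, k_min):
    """Reduce from the starting partition down to k_min groups."""
    seq = [list(labels)]
    while len(set(seq[-1])) > k_min:
        seq.append(reduce_once(points, seq[-1]))
    return seq
```

Starting from five singleton groups, this produces a 5 → 4 → 3 → 2 sequence like the one on the slide.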
- reassign (Pold) → Pnew
k-means: maximize F
- partition (graphic) → Pnew
colours from point cloud
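reassign (Pold) → Pnew can be read as Lloyd's k-means iterations started from Pold rather than from random centres. The slide does not define F. This sketch assumes F is minus the within-group sum of squares, which is equivalent to the between-group sum of squares because their total is fixed. Euclidean points and the function names are also assumptions.

```python
import math

# Sketch only: F = -wss, Euclidean points and names are assumptions.


def centroids(points, labels):
    """Centroid of each group, keyed by group label."""
    groups = {}
    for p, g in zip(points, labels):
        groups.setdefault(g, []).append(p)
    return {g: tuple(sum(c) / len(ps) for c in zip(*ps))
            for g, ps in groups.items()}


def wss(points, labels):
    """Within-group sum of squared distances to the group centroids."""
    cen = centroids(points, labels)
    return sum(math.dist(p, cen[g]) ** 2 for p, g in zip(points, labels))


def reassign(points, labels, max_iter=100):
    """reassign(Pold) -> Pnew by Lloyd's k-means iterations.

    Moving every case to its nearest current centroid never increases
    wss (so never decreases F = -wss). A group that loses all its cases
    disappears.
    """
    labels = list(labels)
    for _ in range(max_iter):
        cen = centroids(points, labels)
        new = [min(cen, key=lambda g: math.dist(p, cen[g])) for p in points]
        if new == labels:
            break
        labels = new
    return labels
```

Because each pass improves F only locally, the result depends on Pold. For the points (0, 0), (0, 1), (10, 0), (10, 1), starting from labels [0, 1, 0, 1] leaves the partition unchanged, while starting from [0, 0, 0, 1] moves it to [0, 0, 1, 1].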
- varying focus
- subsets (selected manually and at random)
- merging new data into partition
- exploring multiple partitions
- interactive display and comparison
- resolving many to one
- interface design
- control panels, options
- interaction
Interface - reduce
Interface - refine
Interface - reassign
Interaction - refine 2
Interaction - refine 3
Interaction - save partition movie
Interaction - refine 4
Interaction - refine 5
Interaction - refine 5 dendrogram
Interaction - reassign
Interaction - cluster plot movie
- partition (Data ...) → Pnew
- manually from colours
- k-means, random start, MST, VERI, etc.
- from an existing classifier
- partition-path (Data ...) → P1, P2, ..., Pn
- partition-path (Pold ...) → Pold, P1, P2, ..., Pn
- e.g. nested sequence from hierarchical clustering
- resolve (P1, ..., Pm) → Pnew
- combine different partitions of the same data
- merge (Data, Pold) → Pnew
- classify additional points
- merge (Pa, Pb) → Pnew
- combine non-overlapping partitions
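Three of these operators can be sketched on partitions stored as integer label lists. The rules inside are assumptions for illustration:

- merge(Data, Pold) classifies each new case by its nearest group centroid. The slides allow any classifier.
- merge(Pa, Pb) shifts Pb's labels so the two sets of groups stay distinct.
- resolve returns the coarsest common refinement. This is only one way to combine partitions; a majority or consensus rule would be others.

All function names are hypothetical.

```python
import math

# Sketch only: label-list partitions, centroid classifier and names assumed.


def centroids(points, labels):
    """Centroid of each group, keyed by group label."""
    groups = {}
    for p, g in zip(points, labels):
        groups.setdefault(g, []).append(p)
    return {g: tuple(sum(c) / len(ps) for c in zip(*ps))
            for g, ps in groups.items()}


def merge_data(points, labels, new_points):
    """merge(Data, Pold): append new cases, each given the group of the
    nearest centroid of Pold."""
    cen = centroids(points, labels)
    return list(labels) + [min(cen, key=lambda g: math.dist(x, cen[g]))
                           for x in new_points]


def merge_partitions(pa, pb):
    """merge(Pa, Pb) for disjoint case sets with non-negative integer
    labels: keep every group, shifting Pb's labels past Pa's."""
    offset = max(pa, default=-1) + 1
    return list(pa) + [g + offset for g in pb]


def resolve(*parts):
    """resolve(P1, ..., Pm) as the coarsest common refinement: two cases
    share a group only if every Pi puts them together."""
    ids = {}
    return [ids.setdefault(key, len(ids)) for key in zip(*parts)]
```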
Other operators
- dissimilarity (Pi, Pj) → d(i, j)
- dendrogram if P1 < ... < Pm
- MDS plot of all clusters in P1, ..., Pm
- MDS plot of all partitions P1, ..., Pm
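The slides do not fix a dissimilarity(Pi, Pj). One possible choice, assumed here, is the fraction of case pairs on which two partitions disagree, which is one minus the Rand index. The `refines` check tests whether one partition is nested inside another, which is what a dendrogram over P1, ..., Pm requires. The resulting matrix could be passed to any MDS routine; none is included here, and the function names are hypothetical.

```python
from itertools import combinations

# Sketch only: pair-disagreement (1 - Rand index) is an assumed choice.


def pair_disagreement(p, q):
    """Fraction of case pairs grouped together in one partition but apart
    in the other."""
    pairs = list(combinations(range(len(p)), 2))
    bad = sum((p[i] == p[j]) != (q[i] == q[j]) for i, j in pairs)
    return bad / len(pairs)


def refines(p, q):
    """True if every group of p lies inside a single group of q."""
    owner = {}
    return all(owner.setdefault(g, h) == h for g, h in zip(p, q))


def dissimilarity_matrix(parts):
    """All pairwise dissimilarities among partitions P1, ..., Pm."""
    m = len(parts)
    return [[pair_disagreement(parts[i], parts[j]) for j in range(m)]
            for i in range(m)]
```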
- Algorithms (re)cast in terms of moves
- refine, reduce
- reassign
- partition, partition-path
- easily understandable (e.g. geometric structures)
- specify required data structures
- e.g. minimal spanning tree, triangulation, variance-covariance matrix, ...
New problems
- interface design
- multiple partitions
- comparison and/or resolution
- multiple display
- inference
- Cluster analysis is naturally exploratory and needs integration with modern interactive data analysis.
- Enlarging the problem to partitions
- simplifies and gives structure
- encourages exploratory approach
- integrates naturally
- introduces new possibilities (analysis and
- Catherine Hurley, Erin McLeish, Rayan Yahfoufi,
Natasha Wiebe - U(W) students in statistical computing
- Quail: Quantitative Analysis in Lisp
- http://www.stats.uwaterloo.ca/Quail