Title: Statistical Society of Canada
1 Structuring Interactive Cluster Analysis
- Wayne Oldford
- University of Waterloo
2 Overview
Content by example
Argument
- ill-defined problem
- high-interaction desirable
- explore partitions
- recast algorithms
- problems
- resources
- interactive clustering
- partition moves
- implications
3 Problem
geometric/visual structure
4 Problem
context matters
5 Problem
structure in context
segmentation in MRI
6 Problem
context-specific structure
7 Problem
some structure context-specific, some not
8 Problem
some structure context-specific, some not
9 Problem
- Find groups in data
- Similar objects are together
- Groups are separated
- What do you mean by similar?
- E.g. what is contiguous structure?
- When are groups separate?
10 Computational resources
2. Memory
3. Display
11 Computational resources
- computationally intensive
2. Memory
3. Display
12 Computational resources
2. Memory
3. Display
13 Computational resources
2. Memory
3. Display
- graphics processors, digital video
- more data, more visual detail
14 Computational resources
2. Memory
3. Display
Balance and integrate
15 High interaction
- integrate computational resources
16 Example: image analysis
17 Example: context and function plots
18 Example: mutual support and shapes
19 Example: exploratory data analysis
20 Interactive clustering
- visual grouping
- location, motion, shape, texture, ...
- linking across displays
- manual
- selection
- cases, variates, groups, ...
- colouring
- focus
- immediate and incremental
- context can be used to form groups
- multiple partitions
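Because brushing gives each case a colour, a coloured display already defines a partition. A minimal Python sketch, assuming colours are stored one per case with None for unbrushed cases; the function name and this representation are hypothetical, not Quail's:

```python
from typing import Dict, List, Optional, Sequence


def partition_from_colours(colours: Sequence[Optional[str]]) -> List[int]:
    """Turn per-case brush colours into a partition (one group label per case).

    Cases sharing a colour form one group; uncoloured cases (None) form a
    single "unassigned" group. Labels are numbered by first appearance, so
    the result does not depend on the colour names themselves.
    """
    label_of: Dict[Optional[str], int] = {}
    labels: List[int] = []
    for colour in colours:
        if colour not in label_of:
            label_of[colour] = len(label_of)
        labels.append(label_of[colour])
    return labels


# Two brushed groups plus two unbrushed cases.
print(partition_from_colours(["red", "red", None, "blue", None]))  # [0, 0, 1, 2, 1]
```

Saving the label list after each round of brushing is one simple way to keep several partitions of the same data side by side.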
21 Automated clustering: typical software
- resources dedicated to numerical computation
- teletype interaction
- runs to completion
- graphical output
- don't always work well (no universal solution)
- confirm via exploratory data analysis
Must be integrated with interactive methods
22 Example: K-means clustering
23 Example: VERI (Visual Empirical Regions of Influence)
join points if no third point falls in this region
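The joining rule can be sketched as an "empty region" graph. The VERI region itself has an empirically derived shape that is not reproduced here; as an assumption, this Python sketch swaps in the lune of the relative neighbourhood graph (a third point r blocks p and q when it is closer to both of them than they are to each other). Function names are hypothetical:

```python
from math import dist
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Region = Callable[[Point, Point, Point], bool]


def lune(p: Point, q: Point, r: Point) -> bool:
    """Stand-in region (NOT the VERI shape): r is closer to both p and q
    than they are to each other, the relative neighbourhood graph rule."""
    d = dist(p, q)
    return dist(p, r) < d and dist(q, r) < d


def empty_region_edges(points: Sequence[Point], in_region: Region = lune) -> List[Tuple[int, int]]:
    """Join cases i and j if no third case falls in the region of (i, j)."""
    n = len(points)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            blocked = any(in_region(points[i], points[j], points[k])
                          for k in range(n) if k != i and k != j)
            if not blocked:
                edges.append((i, j))
    return edges


# The middle case blocks the long edge between the two outer cases.
print(empty_region_edges([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]))  # [(0, 1), (1, 2)]
```

The region is passed in as a function so another shape can be substituted. With the lune stand-in the graph always contains a minimal spanning tree and so stays connected; it is the shape of the region that decides whether the graph falls apart into groups.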
24 Example: VERI
25 Integrating automatic methods
- Move about the space of partitions
- Pa → Pb → Pc → ...
Which operators f, with f(Pa) → Pb, are of interest?
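One way to make moving about the space of partitions concrete is to store a partition as one group label per case and treat each operator f as a function from partitions to partitions. A Python sketch under that assumption; `canonical`, `walk` and the toy move `join_last_two` are hypothetical names:

```python
from typing import Callable, List, Sequence

Partition = List[int]                  # Partition[i] is the group label of case i
Move = Callable[[Partition], Partition]


def canonical(p: Sequence[int]) -> Partition:
    """Relabel groups by first appearance, so equal partitions compare equal."""
    seen = {}
    return [seen.setdefault(g, len(seen)) for g in p]


def walk(p: Sequence[int], *moves: Move) -> List[Partition]:
    """Apply moves in turn, Pa -> Pb -> Pc -> ..., keeping every partition visited."""
    visited = [canonical(p)]
    for f in moves:
        visited.append(canonical(f(visited[-1])))
    return visited


def join_last_two(p: Partition) -> Partition:
    """A toy move: merge the two highest-numbered groups."""
    top = max(p)
    return [top - 1 if g == top else g for g in p]


print(canonical([5, 5, 2, 7]))                           # [0, 0, 1, 2]
print(walk([0, 0, 1, 2], join_last_two, join_last_two))
# [[0, 0, 1, 2], [0, 0, 1, 1], [0, 0, 0, 0]]
```

Relabelling by first appearance means two label vectors describing the same grouping compare equal, which matters when checking whether a move actually changed anything.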
26 Refine
Reduce
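Slide 31 lists "break minimal spanning tree" and "join near centres" as example moves; the first is naturally a refine, the second a reduce. A minimal Python sketch of both kinds of move, assuming Euclidean distance and that refine splits one chosen group at the longest edge of its minimal spanning tree (built with Prim's algorithm); all names are hypothetical:

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mst_edges(points: Sequence[Point], idx: List[int]) -> List[Tuple[float, int, int]]:
    """Prim's algorithm on the cases in idx; returns (length, i, j) edges."""
    if len(idx) < 2:
        return []
    # best[k] = (distance from case k to the tree, nearest tree case)
    best = {k: (dist(points[idx[0]], points[k]), idx[0]) for k in idx[1:]}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        d, i = best.pop(j)
        edges.append((d, i, j))
        for k in best:                       # update distances to the grown tree
            dk = dist(points[j], points[k])
            if dk < best[k][0]:
                best[k] = (dk, j)
    return edges


def refine(points: Sequence[Point], p: Sequence[int], group: int) -> List[int]:
    """Refine: split `group` in two by breaking the longest edge of its minimal spanning tree."""
    idx = [i for i, g in enumerate(p) if g == group]
    edges = mst_edges(points, idx)
    if not edges:                            # a single case cannot be split
        return list(p)
    edges.remove(max(edges))                 # break the longest edge
    parent = {i: i for i in idx}             # union-find over the remaining tree edges

    def find(a: int) -> int:
        while parent[a] != a:
            a = parent[a]
        return a

    for _, i, j in edges:
        parent[find(i)] = find(j)
    new_label, root = max(p) + 1, find(idx[0])
    # Cases in the piece not containing idx[0] get a new group label.
    return [new_label if g == group and find(i) != root else g for i, g in enumerate(p)]


def reduce(p: Sequence[int], a: int, b: int) -> List[int]:
    """Reduce: merge group b into group a."""
    return [a if g == b else g for g in p]


pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(refine(pts, [0, 0, 0, 0], 0))   # [0, 0, 1, 1]
print(reduce([0, 0, 1, 1], 0, 1))     # [0, 0, 0, 0]
```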
27 Reassign
28 Refinement sequence
→ 2
→ 3
→ 4
→ 5
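In a refinement sequence each step splits a group, so each partition refines the one before it: every group of the later partition sits inside a single group of the earlier one. A short Python check of that property, with hypothetical function names:

```python
from typing import List, Sequence


def refines(fine: Sequence[int], coarse: Sequence[int]) -> bool:
    """True if every group of `fine` lies inside a single group of `coarse`."""
    owner = {}
    for f, c in zip(fine, coarse):
        if owner.setdefault(f, c) != c:      # group f seen in two coarse groups
            return False
    return True


def is_refinement_sequence(parts: List[Sequence[int]]) -> bool:
    """Each partition in P1, P2, ... refines the one before it."""
    return all(refines(later, earlier) for earlier, later in zip(parts, parts[1:]))


seq = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 2, 1, 1], [0, 2, 1, 3]]   # 1 -> 2 -> 3 -> 4 groups
print(is_refinement_sequence(seq))                                # True
print(refines([0, 1, 1, 2], [0, 0, 1, 1]))                        # False
```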
29 Reassign, reduce sequence
→ 5
30 Reassign, reduce sequence
→ 5
→ 4
→ 3
→ 2
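The reduce half of this sequence can be sketched with the "join near centres" move: repeatedly merge the two groups whose centroids are closest, 5 → 4 → 3 → 2. A Python sketch assuming Euclidean centroids; the reassign steps that would be interleaved between joins are omitted, and the names are hypothetical:

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centres(points: Sequence[Point], p: Sequence[int]) -> Dict[int, Point]:
    """Centroid of each group, keyed by group label."""
    members: Dict[int, List[Point]] = {}
    for x, g in zip(points, p):
        members.setdefault(g, []).append(x)
    return {g: tuple(sum(col) / len(xs) for col in zip(*xs)) for g, xs in members.items()}


def join_near_centres(points: Sequence[Point], p: Sequence[int]) -> List[int]:
    """Reduce move: merge the two groups whose centroids are closest."""
    c = centres(points, p)
    a, b = min(combinations(sorted(c), 2), key=lambda ab: dist(c[ab[0]], c[ab[1]]))
    return [a if g == b else g for g in p]


def reduce_sequence(points: Sequence[Point], p: Sequence[int], k_min: int) -> List[List[int]]:
    """Join near centres until k_min (>= 1) groups remain, e.g. 5 -> 4 -> 3 -> 2."""
    seq = [list(p)]
    while len(set(seq[-1])) > k_min:
        seq.append(join_near_centres(points, seq[-1]))
    return seq


pts = [(0.0,), (1.0,), (10.0,), (11.0,), (30.0,)]
for q in reduce_sequence(pts, [0, 1, 2, 3, 4], 2):
    print(q)
# [0, 1, 2, 3, 4]
# [0, 0, 2, 3, 4]
# [0, 0, 2, 2, 4]
# [0, 0, 0, 0, 4]
```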
31 Moves
examples
- refine (Pold) → Pnew
break minimal spanning tree
- reduce (Pold) → Pnew
join near centres
- reassign (Pold) → Pnew
k-means: maximize F
- partition (graphic) → Pnew
colours from point cloud
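A reassign move keeps the number of groups but moves cases between them. As an assumption, the sketch below takes F to be the negative within-group sum of squares, so one reassign step is the usual k-means (Lloyd) step; names are hypothetical:

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroids(points: Sequence[Point], p: Sequence[int]) -> Dict[int, Point]:
    """Centroid of each group, keyed by group label."""
    members: Dict[int, List[Point]] = {}
    for x, g in zip(points, p):
        members.setdefault(g, []).append(x)
    return {g: tuple(sum(col) / len(xs) for col in zip(*xs)) for g, xs in members.items()}


def within_ss(points: Sequence[Point], p: Sequence[int]) -> float:
    """Within-group sum of squares; F is assumed here to be its negative."""
    c = centroids(points, p)
    return sum(dist(x, c[g]) ** 2 for x, g in zip(points, p))


def reassign(points: Sequence[Point], p: Sequence[int]) -> List[int]:
    """One k-means (Lloyd) step: move each case to the group with the nearest centroid."""
    c = centroids(points, p)
    return [min(c, key=lambda g: dist(x, c[g])) for x in points]


def kmeans_reassign(points: Sequence[Point], p: Sequence[int], max_iter: int = 100) -> List[int]:
    """Repeat reassign until the partition stops changing (or max_iter is reached)."""
    p = list(p)
    for _ in range(max_iter):
        q = reassign(points, p)
        if q == p:
            break
        p = q
    return p


pts = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
start = [0, 0, 0, 0, 1, 1]            # case 3 starts in the wrong group
end = kmeans_reassign(pts, start)
print(end)                                          # [0, 0, 0, 1, 1, 1]
print(within_ss(pts, start), within_ss(pts, end))   # 63.25 4.0
```

A reassign step never increases the within-group sum of squares, so iterating to a fixed point is safe; the fixed point reached still depends on the starting partition.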
32 Challenges
- varying focus
- subsets (selected manually and at random)
- merging new data into a partition
- exploring multiple partitions
- interactive display and comparison
- resolving many partitions into one
- interface design
- control panels, options
- interaction
33 Interface
34 Interface - reduce
35 Interface - refine
36 Interface - reassign
37 Interaction
38 Interaction - refine 2
39 Interaction - refine 3
40 Interaction - save partition movie
41 Interaction - refine 4
42 Interaction - refine 5
43 Interaction - refine 5 dendrogram
44 Interaction - reassign
45 Interaction - cluster plot movie
46 Creation
- partition (Data ...) → Pnew
- manually from colours
- k-means, random start, MST, VERI, etc.
- from existing classifier
- partition-path (Data) → P1, P2, ..., Pn
- partition-path (Pold ...) → Pold, P1, P2, ..., Pn
- e.g. nested sequence from hierarchical clustering
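A partition-path from hierarchical clustering can be sketched with single linkage, chosen here only for brevity: process case pairs from closest to farthest and join their groups whenever they differ, recording the partition after each join. This gives nested partitions from n groups down to one. The function name is hypothetical:

```python
from itertools import combinations
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def single_linkage_path(points: Sequence[Point]) -> List[List[int]]:
    """Nested partitions from single-linkage agglomeration: n groups down to 1."""
    n = len(points)
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]    # path halving
            a = parent[a]
        return a

    def labels() -> List[int]:
        seen = {}
        return [seen.setdefault(find(i), len(seen)) for i in range(n)]

    path = [labels()]
    pairs = sorted(combinations(range(n), 2), key=lambda ij: dist(points[ij[0]], points[ij[1]]))
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:                         # closest pair spanning two groups: join them
            parent[ri] = rj
            path.append(labels())
    return path


for p in single_linkage_path([(0.0,), (1.0,), (5.0,), (6.0,), (20.0,)]):
    print(p)
# [0, 1, 2, 3, 4]
# [0, 0, 1, 2, 3]
# [0, 0, 1, 1, 2]
# [0, 0, 0, 0, 1]
# [0, 0, 0, 0, 0]
```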
47 Composition
- resolve (P1, ..., Pm) → Pnew
- combine different partitions of the same data
- merge (Data, Pold) → Pnew
- classify additional points
- merge (Pa, Pb) → Pnew
- combine non-overlapping partitions
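Two of these compositions can be sketched in Python under stated assumptions. Here merge(Data, Pold) classifies each new case to the group with the nearest centroid, one possible classifier among many. Here resolve(P1, ..., Pm) uses a majority rule on co-membership: two cases are joined when more than half of the partitions put them together, and the groups are the connected components of those joins. Both rules and all names are hypothetical choices:

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def merge_points(points: Sequence[Point], p: Sequence[int],
                 new_points: Sequence[Point]) -> List[int]:
    """merge(Data, Pold): label each new case with its nearest group centroid."""
    members: Dict[int, List[Point]] = {}
    for x, g in zip(points, p):
        members.setdefault(g, []).append(x)
    c = {g: tuple(sum(col) / len(xs) for col in zip(*xs)) for g, xs in members.items()}
    return list(p) + [min(c, key=lambda g: dist(x, c[g])) for x in new_points]


def resolve(*parts: Sequence[int]) -> List[int]:
    """resolve(P1, ..., Pm): join cases that more than half the partitions put together."""
    n, m = len(parts[0]), len(parts)
    parent = list(range(n))

    def find(a: int) -> int:
        while parent[a] != a:
            a = parent[a]
        return a

    for i, j in combinations(range(n), 2):
        votes = sum(P[i] == P[j] for P in parts)
        if 2 * votes > m:                    # strict majority
            parent[find(i)] = find(j)
    seen: Dict[int, int] = {}
    return [seen.setdefault(find(i), len(seen)) for i in range(n)]


print(merge_points([(0.0,), (2.0,), (10.0,)], [0, 0, 1], [(1.5,), (9.0,)]))  # [0, 0, 1, 0, 1]
print(resolve([0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]))                     # [0, 0, 1, 1]
```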
48 Other operators
- dissimilarity (Pi, Pj) → d(i,j)
- dendrogram if P1 < ... < Pm
- MDS plot of all clusters in P1, ..., Pm
- MDS plot of all partitions P1, ..., Pm
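For dissimilarity(Pi, Pj), one standard choice (an assumption here, not necessarily the one used in the talk) is one minus the Rand index: the fraction of case pairs that one partition puts together and the other separates. It ignores group labels, so relabelled copies of a partition are at distance 0. Names are hypothetical:

```python
from itertools import combinations
from typing import List, Sequence


def partition_dissimilarity(pa: Sequence[int], pb: Sequence[int]) -> float:
    """1 - Rand index: share of case pairs joined by one partition and split by the other.

    Assumes both partitions cover the same cases, at least two of them.
    """
    pairs = list(combinations(range(len(pa)), 2))
    disagree = sum((pa[i] == pa[j]) != (pb[i] == pb[j]) for i, j in pairs)
    return disagree / len(pairs)


def dissimilarity_matrix(parts: List[Sequence[int]]) -> List[List[float]]:
    """d[i][j] = dissimilarity(Pi, Pj) for partitions P1, ..., Pm."""
    return [[partition_dissimilarity(a, b) for b in parts] for a in parts]


print(partition_dissimilarity([0, 0, 1, 1], [1, 1, 0, 0]))  # 0.0
print(partition_dissimilarity([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.6666666666666666
```

The resulting matrix is the kind of input an MDS plot of the partitions P1, ..., Pm would be computed from.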
49 Implications
- Algorithms (re)cast in terms of moves
- refine, reduce
- reassign
- partition, partition-path
- easily understandable (e.g. geometric structures)
- specify required data structures
- e.g. minimal spanning tree, triangulation, var-cov matrix
50 New problems
- interface design
- multiple partitions
- comparison and/or resolution
- multiple display
- inference
51 Summary
- Cluster analysis is naturally exploratory and needs integration with modern interactive data analysis.
- Enlarging the problem to partitions
- simplifies and gives structure
- encourages exploratory approach
- integrates naturally
- introduces new possibilities (analysis and research)
52 Acknowledgements
- Catherine Hurley, Erin McLeish, Rayan Yahfoufi, Natasha Wiebe - U(W) students in statistical computing
- Quail: Quantitative Analysis in Lisp
- http://www.stats.uwaterloo.ca/Quail