Title: A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data
1. A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data
- Jinwook Seo, Ben Shneiderman
- University of Maryland
Hyun Young Song (hsong_at_cs.umd.edu)
Maryam Farboodi (farboodi_at_cs.umd.edu)
Feb 09, 2006
2. HCE 3.0
- HCE (Hierarchical Clustering Explorer)
- Main Idea: GRID principles
- Graphics, Ranking and Interaction for Discovery
- Feature
- Application
- http://www.cs.umd.edu/hcil/hce/
- User Manual
- http://www.cs.umd.edu/hcil/hce/hce3-manual/hce3_manual.html
- Dataset
- http://www.cs.umd.edu/hcil/hce/examples/application_examples.html
3. Axis-Parallel vs. Non-Axis-Parallel Approach
- Definition
- 3 dimensions: X, Y, Z
- Axis-parallel: projection onto either X-Y, X-Z, or Y-Z
- Non-axis-parallel: can project onto a.X + b.Y and Z
- Simplicity vs. power
- Users
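The difference between the two projection styles can be sketched in a few lines. This is a minimal illustration, assuming 3-D points stored as tuples; the function names and the weights `a` and `b` are hypothetical and not part of HCE.

```python
def axis_parallel(points, i, j):
    """Project onto two original axes by keeping coordinates i and j."""
    return [(p[i], p[j]) for p in points]


def non_axis_parallel(points, a, b):
    """Project onto the plane spanned by a*X + b*Y and Z (illustrative weights)."""
    return [(a * x + b * y, z) for (x, y, z) in points]


points = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
xy = axis_parallel(points, 0, 1)             # X-Y projection
mixed = non_axis_parallel(points, 0.5, 0.5)  # a = b = 0.5, chosen arbitrarily
```

The axis-parallel version only selects existing columns, so its axes keep their original meaning; the non-axis-parallel version creates a new, derived axis that is more flexible but harder to interpret.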
4. Related Work
- Axis-parallel: machine learning, info. vis.
- Pattern recognition
- Subset of dimensions to find specific patterns
- Machine learning and data mining
- Supervised/unsupervised classification
- Subspace-based clustering analysis
- Projections naturally partitioning the data set
- Information Visualization
- Permutation Matrix
- Parallel coordinates dimension ordering
- Conditional Entropy
5. Related Work (contd.)
- Non-axis-parallel: statisticians
- Two-dimensional projection
- SOM (Self Organizing Maps)
- XGobi
- Grand tour, Projection pursuit
- No ranking
- HD-Eye
- interactive hierarchical clustering
- OptiGrid (partitioning clustering algorithm)
6. Major Contributions
- GRID (Graphics, Ranking and Interaction for Discovery)
- Study 1D, study 2D, then find features
- Ranking guides insight, statistics confirm
- Visualization Techniques
- Overview
- Coordination (multiple windows)
- Dynamic query (item slider)
7. General Overview
- Menu
- Toolbar
- Overviews, Color setting
- Dendrogram (binary tree), scatterplot
- 7 tabs
- Color mosaic, Table view, Histogram ordering, Scatterplot ordering, Profile search, Gene ontology, K-means
8. General Overview
9. Load/Transformation Data
- Natural Log
- Standardization
- Normalization
- To the first column
- Median
- Linear scaling
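Three of the listed transformations can be sketched as follows. This is an illustrative sketch, not HCE's implementation: the function names are hypothetical, standardization is assumed to mean a z-score using the sample standard deviation, and linear scaling is assumed to map onto [0, 1] by default.

```python
import math
import statistics


def natural_log(values):
    """Natural log transform; assumes every value is positive."""
    return [math.log(v) for v in values]


def standardize(values):
    """Shift to mean 0 and scale to standard deviation 1 (sample SD assumed)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumes the column is not constant
    return [(v - mean) / sd for v in values]


def linear_scale(values, lo=0.0, hi=1.0):
    """Map the observed range linearly onto [lo, hi]; assumes max > min."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


column = [1.0, 2.0, 3.0, 4.0, 5.0]
scaled = linear_scale(column)
z = standardize(column)
```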
10. Clustering Algorithm
- Initially, each data item is a cluster by itself
- Merge the pair with the highest similarity value
- Update similarity values
- Repeat the merge and update steps n - 1 times to reach one cluster of size n
- No predefined number of clusters
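The steps above can be sketched as a naive agglomerative loop. This is a minimal sketch under stated assumptions: points are 1-D numbers, "highest similarity" is modeled as smallest average pairwise distance, and the names `cluster_distance` and `agglomerative` are hypothetical. It recomputes all distances each round, so it is far slower than a production implementation.

```python
def cluster_distance(a, b, points):
    """Average pairwise distance between two clusters of point indices."""
    total = sum(abs(points[i] - points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerative(points):
    """Return the merge history as a list of (cluster_a, cluster_b) pairs."""
    clusters = [(i,) for i in range(len(points))]  # every item starts alone
    merges = []
    while len(clusters) > 1:
        # Find the closest (most similar) pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        # Replacing the pair with the merged cluster means its distances
        # to all other clusters are recomputed in the next round.
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


merges = agglomerative([0.0, 1.0, 5.0, 6.0, 20.0])
```

The merge history has n - 1 entries and is exactly the information a dendrogram draws; no number of clusters is chosen in advance.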
11. Choosing Algorithm Parameters
12. Linkage Method
- Average Linkage
- Average Group Linkage
- Complete Linkage
- Single Linkage
- Shneiderman's 1-by-1 Linkage
- Tries to grow the newly merged cluster of the last iteration first
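The three textbook linkage rules differ only in how they summarize the pairwise distances between two clusters. The sketch below uses the standard definitions with hypothetical function names; Average Group Linkage and Shneiderman's 1-by-1 Linkage are not shown because the slide does not define them.

```python
def single_linkage(a, b, dist):
    """Distance between the closest pair of members."""
    return min(dist(p, q) for p in a for q in b)


def complete_linkage(a, b, dist):
    """Distance between the farthest pair of members."""
    return max(dist(p, q) for p in a for q in b)


def average_linkage(a, b, dist):
    """Mean distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def abs_diff(p, q):
    """Distance for 1-D points (an illustrative choice)."""
    return abs(p - q)


left, right = [0.0, 1.0], [4.0, 6.0]
single = single_linkage(left, right, abs_diff)
complete = complete_linkage(left, right, abs_diff)
average = average_linkage(left, right, abs_diff)
```

For these two clusters the rules give 3.0, 6.0, and 4.5 respectively, which shows how single linkage favors chaining while complete linkage favors compact clusters.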
13. Dendrogram View
14. 7 Tabs
15. 1D Histogram Interface
- Interface description
- Control panel, Score overview, Ordered list, Histogram browser
16. 1D Histogram Ordering
- Ranking criteria
- Normality of the distribution (0 to ∞)
- s: skewness, k: kurtosis
- Uniformity of the distribution (0 to ∞)
- Number of potential outliers (0 to n)
- IQR = Q3 - Q1, d: item value
- Suspected outlier
- Extreme outlier
- Number of unique values (0 to n)
- Size of the biggest gap (0 to max. dim. range)
- mf: max frequency, t: tolerance
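Four of the five criteria can be sketched as scoring functions that a generic ranker then sorts by. Several details here are assumptions, since the slide does not spell out the formulas: normality is scored as |s| + |k - 3| using population moments, uniformity as the Shannon entropy of an equal-width histogram, and outliers with the common 1.5 x IQR (suspected) and 3 x IQR (extreme) fences. All function names are hypothetical, and the biggest-gap criterion is left out.

```python
import math
import statistics
from collections import Counter


def normality_score(values):
    """|s| + |k - 3|; 0 would match a normal distribution's moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    s = m3 / m2 ** 1.5   # skewness
    k = m4 / m2 ** 2     # kurtosis
    return abs(s) + abs(k - 3)


def entropy(values, bins=10):
    """Shannon entropy of an equal-width histogram; assumes max > min."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def count_outliers(values, factor=1.5):
    """Items d with d < Q1 - factor*IQR or d > Q3 + factor*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return sum(1 for d in values if d < q1 - factor * iqr or d > q3 + factor * iqr)


def unique_values(values):
    return len(set(values))


def rank_dimensions(table, score):
    """Sort dimension names by score, highest first."""
    return sorted(table, key=lambda name: score(table[name]), reverse=True)


table = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8],
    "b": [1, 1, 1, 1, 2, 2, 2, 100],
}
by_outliers = rank_dimensions(table, count_outliers)
by_unique = rank_dimensions(table, unique_values)
```

Passing `factor=3` to `count_outliers` counts only extreme outliers under the same assumption.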
17. 2D Scatterplot Interface
- Interface description
- Control panel, Score overview, Ordered list, Scatterplot browser
18. 2D Scatterplot Ordering
- Ranking criteria
- Statistical Relationship
- Correlation coefficient (-1 to 1): Pearson's coefficient
- Least square error for curvilinear regression (0 to 1)
- Quadracity (-∞ to ∞)
- Distribution Characteristics
- Number of potential outliers (0 to n)
- LOF-based: density-based outlier detection
- Number of items in area of interest (0 to n)
- Uniformity (0 to ∞)
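Ranking scatterplots by the first criterion amounts to scoring every pair of dimensions and sorting. The sketch below uses the standard Pearson formula; the `rank_pairs` helper, the column names, and the data are hypothetical, and any other pairwise score could be passed in instead.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_pairs(table, score=pearson):
    """Score every pair of dimensions and sort, highest score first."""
    scored = [((a, b), score(table[a], table[b]))
              for a, b in combinations(table, 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


table = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "double_x": [2.0, 4.0, 6.0, 8.0],
    "neg_x": [4.0, 3.0, 2.0, 1.0],
}
ranking = rank_pairs(table)
```

Here the pair `("x", "double_x")` ranks first with a coefficient of 1, and both pairs involving `neg_x` land at the bottom with -1.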
19. Demo
20. System Constraints
- Computational Complexity
- n data items in m-dimensional space: O(nm²)
- O(n) scoring complexity
- O(m²) combinations of dimensions
- Display Constraints
- Appropriate number of dimensions for score overview component: 0 to 130
- Lack of sliders to adjust displacement
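The O(nm²) figure follows from running one O(n) scoring pass for each of the m(m - 1)/2 dimension pairs. A rough count, assuming m = 130 dimensions (the upper end of the range above) and a hypothetical n = 1000 items:

```python
def scoring_cost(n, m):
    """Rough operation count: one O(n) pass per unordered pair of dimensions."""
    pairs = m * (m - 1) // 2
    return pairs, pairs * n


pairs, operations = scoring_cost(n=1000, m=130)  # n is a made-up example size
```

That gives 8385 scatterplots to score, which is also the number of cells the score overview has to display.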
21. Evaluation of HCE 3.0
- Linear color mapping (3 color or 1 color)
- Consistent layout of the components
- Focus-context
- Focus: dendrogram; context: rank-by-feature
- Focus: ordered list; context: histogram, scatterplot
- Item slider
- Dynamic query
- Multi-window view
- Dynamic update of data selection in different windows
22. Future Work
HCE 3.0 (HCE 3.5) | HCE 4.0 ??
1D, 2D axis-parallel projection | 3D projection
Numerical data format | Numerical, categorical, binary, nominal
Limited number of applicable datasets (US cities, cereal, netscan) | More meaningful datasets to demonstrate the power of each ranking criterion
1D: 5 ranking criteria; 2D: 6 ranking criteria | Incorporate more criteria into the rank-by-feature framework
- User study
- Various statistical tools and data mining algorithms