A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data

1
A Rank-by-Feature Framework for Interactive
Exploration of Multidimensional Data
  • Jinwook Seo, Ben Shneiderman
  • University of Maryland

Hyun Young Song (hsong_at_cs.umd.edu)
Maryam Farboodi (farboodi_at_cs.umd.edu)
Feb. 9, 2006
2
HCE 3.0
  • HCE (Hierarchical Clustering Explorer)
  • Main idea: GRID principles
  • Graphics, Ranking and Interaction for Discovery
  • Feature
  • Application
  • http://www.cs.umd.edu/hcil/hce/
  • User Manual
  • http://www.cs.umd.edu/hcil/hce/hce3-manual/hce3_manual.html
  • Dataset
  • http://www.cs.umd.edu/hcil/hce/examples/application_examples.html

3
Axis-Parallel vs. Non Axis-Parallel Approach
  • Definition
  • 3 dimensions: X, Y, Z
  • Axis-parallel: projection on either X-Y, X-Z,
    or Y-Z
  • Non axis-parallel: can project on a.X + b.Y and Z
  • Simplicity vs. power
  • Users
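
The two kinds of projection defined above can be illustrated with a minimal sketch. The function names `axis_parallel` and `non_axis_parallel` and the sample points are hypothetical, chosen only for this example; they are not part of HCE.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]  # (X, Y, Z)


def axis_parallel(points: Sequence[Point3], i: int, j: int) -> List[Tuple[float, float]]:
    """Keep two of the original axes (0 = X, 1 = Y, 2 = Z) and drop the third."""
    return [(p[i], p[j]) for p in points]


def non_axis_parallel(points: Sequence[Point3], a: float, b: float) -> List[Tuple[float, float]]:
    """Project onto the derived axis a*X + b*Y, paired with the original Z axis."""
    return [(a * p[0] + b * p[1], p[2]) for p in points]


points = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
xy = axis_parallel(points, 0, 1)             # X-Y projection
mixed = non_axis_parallel(points, 0.5, 0.5)  # (0.5*X + 0.5*Y, Z)
```

The axis-parallel result uses only original dimensions, so its axes stay directly interpretable; the non axis-parallel result can expose structure that no single pair of axes shows, at the cost of an axis whose meaning depends on the chosen weights.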

4
Related Works
  • Axis-parallel: machine learning, Info. Vis.
  • Pattern recognition
  • Subset of dimensions to find specific patterns
  • Machine learning and Data mining
  • Supervised/ Unsupervised classification
  • Subspace-based clustering analysis
  • Projections naturally partitioning the data set
  • Information Visualization
  • Permutation Matrix
  • Parallel coordinates dimension ordering
  • Conditional Entropy

5
Related Works (cont.)
  • Non axis-parallel: statisticians
  • Two-dimensional projection
  • SOM (Self Organizing Maps)
  • XGobi
  • Grand tour, Projection pursuit
  • No ranking
  • HD-Eye
  • interactive hierarchical clustering
  • OptiGrid (partitioning clustering algorithm)

6
Major Contributions
  • GRID (Graphics, Ranking and Interaction for
    Discovery)
  • Study 1D, study 2D, then find features
  • Ranking guides insight, statistics confirm
  • Visualization Techniques
  • Overview
  • Coordination (multiple windows)
  • Dynamic query (item slider)

7
General Overview
  • Menu
  • Toolbar
  • Overviews, Color setting
  • Dendrogram (binary tree), scatterplot
  • 7 tabs
  • Color mosaic, Table view, Histogram Ordering,
    Scatterplot ordering, Profile search, Gene
    ontology, K-means

8
General Overview
9
Load/Transformation Data
  • Natural Log
  • Standardization
  • Normalization
  • To the first column
  • Median
  • Linear scaling
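
Three of the transformations listed above have widely used textbook definitions, sketched here under that assumption. The function names are hypothetical, and the exact formulas HCE applies are not given on the slide. The two normalization options (to the first column, to the median) are left out because the slide does not define them.

```python
import math
import statistics
from typing import List, Sequence


def natural_log(values: Sequence[float]) -> List[float]:
    """Natural log of every value; all inputs must be positive."""
    return [math.log(v) for v in values]


def standardize(values: Sequence[float]) -> List[float]:
    """Shift to mean 0 and scale to standard deviation 1 (z-scores).

    Assumes the values are not all identical, otherwise the deviation is 0.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def linear_scale(values: Sequence[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Map the minimum to lo and the maximum to hi, linearly in between.

    Assumes the values are not all identical.
    """
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


scaled = linear_scale([2.0, 4.0, 6.0])
z_scores = standardize([1.0, 2.0, 3.0])
```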

10
Clustering Algorithm
  • Initially, each data item is a cluster by itself
  • Merge the pair with the highest similarity value
  • Update similarity values
  • Repeat the merge and update steps n - 1 times to
    reach one cluster of size n
  • No predefined number of clusters
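
The steps above can be sketched as a small agglomerative loop. This is an illustration under stated assumptions, not HCE's implementation: similarity is taken to be Euclidean distance (smaller distance means higher similarity), clusters are compared with single linkage, and the names `euclidean`, `cluster_distance`, and `agglomerate` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Sequence[float]
Merge = Tuple[List[int], List[int], float]  # (members of A, members of B, distance)


def euclidean(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def cluster_distance(points: Sequence[Point], a: List[int], b: List[int]) -> float:
    """Single linkage (an assumption): distance between the closest members."""
    return min(euclidean(points[i], points[j]) for i in a for j in b)


def agglomerate(points: Sequence[Point]) -> List[Merge]:
    """Merge clusters pairwise until one remains; returns the n - 1 merges in order."""
    clusters = [[i] for i in range(len(points))]  # each item starts as its own cluster
    merges: List[Merge] = []
    while len(clusters) > 1:
        # Most similar pair = smallest distance; distances are recomputed on every
        # pass, which plays the role of the "update similarity values" step.
        dist, ia, ib = min(
            (cluster_distance(points, clusters[x], clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        merges.append((clusters[ia], clusters[ib], dist))
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)] + [merged]
    return merges


merges = agglomerate([(0.0,), (1.0,), (5.0,)])
```

The list of merges is exactly what a dendrogram draws: each entry is one binary join, and its distance gives the height of that join. Because every merge is recorded, the number of clusters can be chosen afterwards by cutting the tree at any height.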

11
Choosing Algorithm Parameters
12
Linkage Method
  • Average Linkage
  • Average Group Linkage
  • Complete Linkage
  • Single Linkage
  • Shneiderman's 1-by-1 Linkage
  • Tries to grow the newly merged cluster of last
    iteration first
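
For comparison, here is a minimal sketch of three of the linkage methods named above, using their common textbook definitions. The function names are hypothetical. Average Group Linkage and Shneiderman's 1-by-1 Linkage are not covered because the slide does not define them precisely enough to reproduce.

```python
from itertools import product
from typing import Callable, List, Sequence

Dist = Callable[[float, float], float]


def pairwise(a: Sequence[float], b: Sequence[float], dist: Dist) -> List[float]:
    """All distances between a member of cluster a and a member of cluster b."""
    return [dist(p, q) for p, q in product(a, b)]


def single_linkage(a: Sequence[float], b: Sequence[float], dist: Dist) -> float:
    """Distance between the two closest members."""
    return min(pairwise(a, b, dist))


def complete_linkage(a: Sequence[float], b: Sequence[float], dist: Dist) -> float:
    """Distance between the two farthest members."""
    return max(pairwise(a, b, dist))


def average_linkage(a: Sequence[float], b: Sequence[float], dist: Dist) -> float:
    """Mean of all between-cluster member distances."""
    d = pairwise(a, b, dist)
    return sum(d) / len(d)


def abs_diff(p: float, q: float) -> float:
    return abs(p - q)


left, right = [0.0, 1.0], [3.0, 5.0]
results = (
    single_linkage(left, right, abs_diff),
    complete_linkage(left, right, abs_diff),
    average_linkage(left, right, abs_diff),
)
```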

13
Dendrogram View
14
7 Tabs
15
1D Histogram Interface
  • Interface description
  • Control panel, Score overview, Ordered list,
    Histogram browser

16
1D Histogram Ordering
  • Ranking criteria
  • Normality of the distribution (0 to ∞)
  • s: skewness, k: kurtosis
  • Uniformity of the distribution (0 to ∞)
  • Number of potential outliers (0 to n)
  • IQR = Q3 - Q1, d: item value
  • Suspected outlier
  • Extreme outlier
  • Number of unique values (0 to n)
  • Size of the biggest gap (0 to max. dim. range)
  • mf: max frequency, t: tolerance
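
Three of the ranking criteria above can be sketched as scoring functions plus a generic ranking step. Several things here are assumptions rather than HCE's definitions: the outlier thresholds are the conventional 1.5 x IQR (suspected) and 3 x IQR (extreme) fences, since the slide's formulas did not survive in this transcript; the biggest gap is simplified to the largest difference between neighboring sorted values and ignores the mf and t parameters; and all function names are hypothetical.

```python
import statistics
from typing import Callable, Dict, List, Sequence


def count_outliers(values: Sequence[float], k: float = 1.5) -> int:
    """Count items d with d < Q1 - k*IQR or d > Q3 + k*IQR.

    k = 1.5 is the conventional "suspected" fence and k = 3.0 the "extreme" one.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(1 for d in values if d < low or d > high)


def unique_values(values: Sequence[float]) -> int:
    return len(set(values))


def biggest_gap(values: Sequence[float]) -> float:
    """Largest difference between neighboring values after sorting (simplified)."""
    ordered = sorted(values)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0.0)


def rank_dimensions(
    table: Dict[str, Sequence[float]],
    score: Callable[[Sequence[float]], float],
) -> List[str]:
    """Order dimension names by the chosen score, highest first."""
    return sorted(table, key=lambda name: score(table[name]), reverse=True)


table = {
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 15],
    "b": [1, 2, 3, 4, 5, 6, 7, 8, 9],
}
ranking = rank_dimensions(table, biggest_gap)
```

Passing the scoring function as a parameter mirrors the interface idea: the user picks a criterion, and the ordered list of histograms is rebuilt from the resulting scores.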

17
2D Scatterplot Interface
  • Interface description
  • Control panel, Score overview, Ordered list,
    Scatterplot browser

18
2D Scatterplot Ordering
  • Ranking criteria
  • Statistical Relationship
  • Correlation coefficient (-1 to 1): Pearson's
    coefficient
  • Least square error for curvilinear
    regression (0 to 1)
  • Quadracity (-∞ to ∞)
  • Distribution Characteristics
  • Number of potential outliers (0 to n)
  • LOF-based: density-based outlier detection
  • Number of items in area of interest (0 to n)
  • Uniformity (0 to ∞)
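
As an illustration of 2D ordering, the sketch below scores every pair of dimensions with Pearson's correlation coefficient and sorts the pairs by score. Only this one criterion is shown; the names `pearson` and `rank_pairs` and the sample table are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient, in [-1, 1].

    Assumes equal lengths and that neither sequence is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_pairs(table: Dict[str, Sequence[float]]) -> List[Tuple[Tuple[str, str], float]]:
    """Score every pair of dimensions and sort by score, highest first."""
    scored = [
        ((a, b), pearson(table[a], table[b]))
        for a, b in combinations(table, 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


table = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
}
ranked = rank_pairs(table)
```

With m dimensions there are m(m - 1)/2 pairs to score, and each Pearson score is a single pass over n items, so this loop does work proportional to n times m squared.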

19
Demo
20
System Constraints
  • Computational Complexity
  • n data items in m-dimensional space: O(nm²)
  • O(n) scoring complexity
  • O(m²) combination of dimension
  • Display Constraints
  • Appropriate number of dimensions for score
    overview component: 0 to 130
  • Lack of sliders to adjust displacement

21
Evaluation of HCE 3.0
  • Linear color mapping (3 color or 1 color)
  • Consistent layout of the components
  • Focus-context
  • F: dendrogram, C: rank-by-feature
  • F: ordered list, C: histogram, scatterplot
  • Item slider
  • Dynamic query
  • Multi-window view
  • Dynamic update of data selection in different
    windows

22
Future Work
HCE 3.0 (HCE 3.5) | HCE 4.0 ??
1D, 2D axis-parallel projection | 3D projection
Numerical data format | Numerical, categorical, binary, nominal
Limited number of applicable datasets (US cities, cereal, netscan) | More meaningful datasets to demonstrate the power of each ranking criterion
1D: 5 ranking criteria; 2D: 6 ranking criteria | Incorporate more criteria into the rank-by-feature framework
  • User study
  • Various statistical tools and data mining
    algorithms

23
  • Thank you!
  • Questions?