Title: An Introduction to Multivariate Data Visualization and XmdvTool
1. An Introduction to Multivariate Data Visualization and XmdvTool
- Matthew O. Ward
- Computer Science Department
- Worcester Polytechnic Institute
This work was supported under NSF Grant
IIS-9732897
2. What is Multivariate Data?
- Each data point has N variables or observations
- Each observation can be
  - nominal or ordinal
  - discrete or continuous
  - scalar, vector, or tensor
- May or may not have spatial, temporal, or other connectivity attributes
3. Characteristics of a Variable
- Order: grades have an order, brand names do not
- Distance metric: for income, distance equals difference; for rankings, difference is not a distance metric
- Absolute zero: temperature (in kelvin) has an absolute zero, bank account balances do not
- Together, these three attributes determine a variable's scale
- Effective visualizations attempt to match the scale of each data dimension with the graphical attribute conveying it
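One way to make the three attributes concrete is to map them to the four conventional levels of measurement: nominal, ordinal, interval, and ratio, after Stevens. The slide does not use the names interval and ratio. The function below is a hypothetical helper, not part of XmdvTool. It encodes the rule that each level adds one attribute to the level before it.

```python
from enum import Enum


class Scale(Enum):
    NOMINAL = "nominal"    # no order (e.g., brand names)
    ORDINAL = "ordinal"    # order, but differences are not distances (e.g., grades, rankings)
    INTERVAL = "interval"  # order and distance, no absolute zero (the slide's bank-balance example)
    RATIO = "ratio"        # order, distance, and an absolute zero (e.g., temperature in kelvin)


def classify_scale(has_order: bool, has_distance: bool, has_absolute_zero: bool) -> Scale:
    """Map the three attributes on this slide to a conventional scale name."""
    if not has_order:
        # Without an order, distance and an absolute zero are not meaningful.
        return Scale.NOMINAL
    if not has_distance:
        return Scale.ORDINAL
    if not has_absolute_zero:
        return Scale.INTERVAL
    return Scale.RATIO
```

For example, letter grades (order only) classify as ordinal.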
4. Sources of Multivariate Data
- Sensors (e.g., images, gauges)
- Simulations
- Census or other surveys
- Commerce (e.g., stock market)
- Communication systems
- Spreadsheets and databases
5. Issues in Visualizing Multivariate Data
- How many variables?
- How many records?
- Types of variables?
- User task (exploration, confirmation, presentation)
- Data feature of interest (clusters, anomalies, trends, patterns, ...)
- Background of user (domain expert, visualization specialist, decision-maker, ...)
6. Methods for Visualizing Multivariate Data
- Dimensional Subsetting
- Dimensional Reorganization
- Dimensional Embedding
- Dimensional Reduction
7. Dimensional Subsetting
- A scatterplot matrix displays the scatterplots of all pairs of dimensions
- Selection links the views: records selected in one plot are highlighted in all the others
- Clusters, trends, and correlations readily
discerned between pairs of dimensions
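The sketch below shows the scatterplot-matrix layout with linked selection, assuming records are tuples of numbers. scatterplot_matrix is a hypothetical helper. It produces the (x, y, selected) points for each off-diagonal cell. Every cell consults the same set of selected record indices, so a selection made in one cell shows up in all of them.

```python
from itertools import product


def scatterplot_matrix(records, selected=frozenset()):
    """Build the point lists for every off-diagonal cell of an N x N scatterplot matrix.

    Cell (row, col) plots dimension `col` on x against dimension `row` on y.
    The same `selected` set of record indices flags points in every cell,
    which is what links the views.
    """
    n_dims = len(records[0])
    cells = {}
    for row, col in product(range(n_dims), repeat=2):
        if row == col:
            continue  # diagonal left for labels or per-dimension plots
        cells[(row, col)] = [
            (rec[col], rec[row], idx in selected) for idx, rec in enumerate(records)
        ]
    return cells
```

For N dimensions this yields N(N-1) cells. Cell (i, j) is the mirror image of cell (j, i), which is why some tools draw only one triangle of the matrix.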
8. Dimensional Reorganization
- Parallel coordinates lays out the dimensions as parallel, rather than orthogonal, axes
- Each data point corresponds to a polyline across the axes
- Clusters, trends, and anomalies are discernible as groupings or outliers, based on the polylines' intercepts and slopes
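The mapping from a record to a polyline can be sketched as follows. The sketch assumes each axis is scaled independently by the minimum and maximum of its dimension; other scalings are possible. parallel_coordinates is a hypothetical name. Vertex d of a polyline sits at x = d on axis d.

```python
def parallel_coordinates(records):
    """Turn each N-D record into a polyline over N vertical axes placed at x = 0..N-1.

    Each axis is scaled independently to [0, 1] using that dimension's min and max.
    """
    n_dims = len(records[0])
    lows = [min(rec[d] for rec in records) for d in range(n_dims)]
    highs = [max(rec[d] for rec in records) for d in range(n_dims)]
    polylines = []
    for rec in records:
        vertices = []
        for d, value in enumerate(rec):
            span = highs[d] - lows[d]
            # A constant dimension has no spread; park it mid-axis.
            y = (value - lows[d]) / span if span else 0.5
            vertices.append((float(d), y))
        polylines.append(vertices)
    return polylines
```

Records that are close in every dimension produce nearly coincident polylines, which is why clusters appear as bundles of lines.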
9. Dimensional Reorganization (2)
- Glyphs map data dimensions to graphical attributes
- Size, color, shape, and orientation are commonly used
- Similarities and differences in glyph features give insight into relationships among the data
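A star glyph, which reappears on the hierarchical glyphs slide, is one example of a glyph. It gives each dimension a ray from a common center, with the ray's length encoding the value. The sketch assumes values are already scaled to [0, 1] and spaces the rays evenly around the circle. star_glyph is a hypothetical helper.

```python
import math


def star_glyph(values, center=(0.0, 0.0), radius=1.0):
    """Vertices of a star glyph for one record.

    `values` must already be scaled to [0, 1]; dimension k gets a ray at angle
    2*pi*k/N whose length is radius * value. Joining the vertices in order
    (and back to the first) outlines the glyph.
    """
    cx, cy = center
    n = len(values)
    vertices = []
    for k, v in enumerate(values):
        angle = 2 * math.pi * k / n
        vertices.append((cx + radius * v * math.cos(angle), cy + radius * v * math.sin(angle)))
    return vertices
```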
10. Dimensional Embedding
- Dimensional stacking divides the data space into bins
- Each N-D bin has a unique 2-D screen bin
- Screen space is recursively divided based on the bin count for each dimension
- Clusters and trends are manifested as repeated patterns
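The bin-to-screen mapping can be sketched under three assumptions:

- bins are uniform
- dimensions are listed from outermost to innermost
- even-numbered dimensions map to the horizontal direction and odd-numbered ones to the vertical

bin_index and stack_position are hypothetical helpers. Each pair of dimensions subdivides the cells created by the previous pair. This amounts to reading the bin indices as the digits of a mixed-radix number, with the outermost dimension most significant.

```python
def bin_index(value, low, high, n_bins):
    """Uniform binning of one value; the top edge falls in the last bin."""
    if high == low:
        return 0
    b = int((value - low) / (high - low) * n_bins)
    return min(max(b, 0), n_bins - 1)


def stack_position(bins, bin_counts):
    """Map an N-D tuple of bin indices to a 2-D screen cell (column, row).

    Dimensions are ordered outermost first; even-numbered ones subdivide the
    horizontal direction, odd-numbered ones the vertical direction.
    """
    col, row = 0, 0
    for d, (b, count) in enumerate(zip(bins, bin_counts)):
        if d % 2 == 0:
            col = col * count + b  # finer subdivision inside the current column
        else:
            row = row * count + b  # finer subdivision inside the current row
    return col, row
```

With bin counts (2, 2, 3, 3), the screen has 2 x 3 = 6 columns and 6 rows. Each of the 36 N-D bins lands in its own cell.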
11. Dimensional Reduction
- Map N-D locations to an M-D display space (M < N) while preserving the N-D relationships as well as possible
- Approaches include multidimensional scaling (MDS), principal component analysis (PCA), and Kohonen self-organizing maps
- Relationships are conveyed by position, links, color, shape, size, etc.
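PCA, one of the approaches listed, can be sketched without external libraries by power iteration with deflation. This is a teaching sketch under simple assumptions: numeric records, at least two of them, and distinct leading eigenvalues so the iteration converges. A real tool would call a linear-algebra routine such as NumPy's numpy.linalg.eigh instead. The helper names are hypothetical.

```python
import math
import random


def _matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def _top_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration for the dominant eigenvector and eigenvalue of a symmetric PSD matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]  # strictly positive start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = _matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # matrix maps v to zero; keep the current unit vector
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, _matvec(matrix, v)))
    return v, eigenvalue


def pca_project(records, m=2):
    """Project N-D records onto their first m principal components (m <= N)."""
    n, dims = len(records), len(records[0])
    means = [sum(r[d] for r in records) / n for d in range(dims)]
    centered = [[r[d] - means[d] for d in range(dims)] for r in records]
    # Sample covariance matrix (needs at least two records).
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
           for i in range(dims)]
    components = []
    for _ in range(m):
        v, lam = _top_eigenpair(cov)
        components.append(v)
        # Deflation: subtract the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dims)] for i in range(dims)]
    # Display coordinates are the dot products with each principal axis.
    return [[sum(c * x for c, x in zip(comp, row)) for comp in components] for row in centered]
```

Power iteration converges slowly when the leading eigenvalues are close. This is one reason library eigen-solvers are preferred in practice.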
12. The Role of Selection
- Users need to interact with the display to examine interesting patterns or anomalies and to validate hypotheses
- Selection isolates a subset of the data for highlighting, deletion, or focused analysis
- Direct (clicking on displayed items) vs. indirect (range sliders)
- Screen space (2-D) vs. data space (N-D)
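Indirect, data-space selection with range sliders amounts to keeping the records that fall inside an N-D box. The sketch below assumes numeric tuples. range_select and rectangle_brush are hypothetical helpers. rectangle_brush shows how a direct, screen-space rectangle in one scatterplot becomes a data-space constraint on two dimensions. It assumes the rectangle's corners have already been converted from pixels to data units.

```python
def range_select(records, ranges):
    """Indirect, data-space selection with one range slider per constrained dimension.

    `ranges` maps a dimension index to an inclusive (low, high) pair; dimensions
    not mentioned are unconstrained. Returns the indices of the selected records.
    """
    return {
        idx
        for idx, rec in enumerate(records)
        if all(low <= rec[d] <= high for d, (low, high) in ranges.items())
    }


def rectangle_brush(row, col, x_range, y_range):
    """Hypothetical helper: a rectangle dragged in scatterplot cell (row, col),
    already in data units, constrains dimension `col` (x) and dimension `row` (y)."""
    return {col: x_range, row: y_range}
```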
13. Demonstration of XmdvTool
14. Problems with Large Data Sets
- Most techniques are effective with small to moderate-sized data sets
- Large sets (> 50K records) are increasingly common
- When traditional visualizations are used, occlusion and clutter make interpretation difficult
15. One Potential Solution
- Multiresolution displays with aggregation
  - Explicit clustering
    - Break dimensions into bins
    - Aggregate in a particular order (datacubes)
  - Implicit clustering
    - Hierarchical clustering (proximity-based merging)
    - Hierarchical partitioning (proximity-based splits)
- Problem: there are many ways to cluster, each revealing different aspects of the data
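Hierarchical clustering by proximity-based merging can be sketched as classic agglomerative clustering. The slide does not say which distance or linkage to use. This sketch assumes Euclidean distance between cluster centroids; Cluster, centroid, and agglomerate are hypothetical names. Its cost grows roughly with the cube of the number of records, so it only illustrates the idea and would not scale to the 50K-record data sets mentioned earlier. math.dist requires Python 3.8 or later.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    members: list                                  # indices of the records in this cluster
    children: list = field(default_factory=list)  # the two clusters merged to form it


def centroid(records, members):
    dims = len(records[0])
    return [sum(records[i][d] for i in members) / len(members) for d in range(dims)]


def agglomerate(records):
    """Bottom-up hierarchical clustering: repeatedly merge the two closest clusters
    (Euclidean distance between centroids) until a single root remains."""
    active = [Cluster([i]) for i in range(len(records))]
    while len(active) > 1:
        centers = [centroid(records, c.members) for c in active]
        best = None
        for a in range(len(active)):
            for b in range(a + 1, len(active)):
                dist = math.dist(centers[a], centers[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = Cluster(active[a].members + active[b].members, [active[a], active[b]])
        active = [c for k, c in enumerate(active) if k not in (a, b)] + [merged]
    return active[0]
```

The resulting tree is the kind of hierarchy that the display options and navigation tools in the following slides operate on.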
16. Display Options
- For each cluster, show
  - Center
  - Extents for each dimension
  - Population
  - Other descriptors (e.g., quartiles)
- Color clusters so that siblings have colors similar to their parent's
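The per-cluster descriptors can be computed directly from the member records. summarize_cluster is a hypothetical helper that assumes numeric tuples. It uses the mean as the cluster center, which is one reasonable choice; the slide does not specify one. The median stands in as one example of "other descriptors".

```python
from statistics import mean, median


def summarize_cluster(records, members):
    """Per-cluster descriptors: center, per-dimension extents, population, and
    (as one example of other descriptors) the per-dimension median."""
    points = [records[i] for i in members]
    dims = range(len(points[0]))
    return {
        "center": [mean(p[d] for p in points) for d in dims],
        "extents": [(min(p[d] for p in points), max(p[d] for p in points)) for d in dims],
        "population": len(points),
        "median": [median(p[d] for p in points) for d in dims],
    }
```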
17. Hierarchical Parallel Coordinates
- Bands show cluster extents in each dimension
- Opacity conveys cluster population
- Color similarity indicates proximity in hierarchy
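Drawing one cluster in hierarchical parallel coordinates reduces to two elements: a band between the cluster's scaled minimum and maximum on each axis, and a center polyline. The sketch assumes per-axis min/max scaling and a simple linear mapping from population to opacity. Both choices, and the name cluster_band, are illustrative assumptions rather than XmdvTool's actual rendering code.

```python
def cluster_band(extents, center, lows, highs, population, total):
    """Geometry for drawing one cluster in hierarchical parallel coordinates.

    extents: per-dimension (min, max) of the cluster; center: per-dimension mean;
    lows/highs: per-dimension bounds of the whole data set, used to scale each axis
    to [0, 1]; population/total: cluster size and data-set size.
    """
    def scale(value, d):
        span = highs[d] - lows[d]
        return (value - lows[d]) / span if span else 0.5

    # Band: on axis d, a vertical interval from the cluster's min to its max.
    band = [(float(d), scale(lo, d), scale(hi, d)) for d, (lo, hi) in enumerate(extents)]
    # Center polyline drawn through the band, as in ordinary parallel coordinates.
    center_line = [(float(d), scale(c, d)) for d, c in enumerate(center)]
    # Hypothetical linear mapping: more populous clusters are drawn more opaque.
    opacity = population / total
    return band, center_line, opacity
```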
18. Hierarchical Scatterplots
- Clusters are displayed as rectangles showing their extents in 2 dimensions
- Color and opacity are used consistently to convey relational and population information
19. Hierarchical Glyphs
- Star glyph with bands
- Analogous to parallel coordinates, with radial rather than parallel dimensions
- Glyph position is critical for conveying relational information
20. Hierarchical Dimensional Stacking
- Clusters occupy multiple bins
- Overlaps can be reduced by increasing the number of bins
- Cell colors can be blended, or each cell can display the last cluster mapped to it
21. Hierarchical Star Fields
- Dimensional reduction results are commonly displayed as star fields
- Each cluster becomes a circle or sphere in the field
- Alternatively, a glyph can be shown at each cluster's location
22. Navigating Hierarchies
- Drill-down and roll-up operations for more or less detail
- Need a selection operation to identify subtrees for exploration or pruning
- Need indications of where you are in the hierarchy and where you've been during the exploration process
23. Structure-Based Brushing
- Enhancement to screen-based and data-based brushing methods
- Specify focus, extents, and level of detail
- Intuitive: a wedge of the tree and a depth of interest
- Implemented by numbering the terminal (leaf) nodes and propagating their ranges up to the parents
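The last bullet can be sketched as follows, assuming a simple in-memory tree. The names TreeNode, number_terminals, structure_brush, and position_hue are hypothetical. The selection rule is an assumption of this sketch rather than a description of XmdvTool's code: a brush selects only nodes lying wholly inside the wedge, and leaves shallower than the chosen depth stand in for themselves. position_hue illustrates the idea behind the color bar on the next slide: a node's hue is derived from where its terminal range sits in the tree.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    name: str
    children: list = field(default_factory=list)
    lo: int = 0     # smallest terminal number in this subtree
    hi: int = 0     # largest terminal number in this subtree
    depth: int = 0  # root is depth 0


def number_terminals(root):
    """Number the leaves left to right, then propagate [lo, hi] ranges up to the parents."""
    counter = 0

    def visit(node, depth):
        nonlocal counter
        node.depth = depth
        if not node.children:
            node.lo = node.hi = counter
            counter += 1
            return
        for child in node.children:
            visit(child, depth + 1)
        # Children are visited in order, so the parent's range spans first..last child.
        node.lo, node.hi = node.children[0].lo, node.children[-1].hi

    visit(root, 0)
    return counter  # number of terminals


def structure_brush(root, wedge_lo, wedge_hi, depth):
    """Names of the nodes selected by a wedge [wedge_lo, wedge_hi] of terminal numbers
    at the given depth (level of detail)."""
    selected = []

    def visit(node):
        if node.hi < wedge_lo or node.lo > wedge_hi:
            return  # subtree lies entirely outside the wedge
        if node.depth == depth or not node.children:
            # Assumption: a node is selected only if its whole range lies inside the wedge.
            if wedge_lo <= node.lo and node.hi <= wedge_hi:
                selected.append(node.name)
            return
        for child in node.children:
            visit(child)

    visit(root)
    return selected


def position_hue(node, n_terminals):
    """Hypothetical color-bar mapping: a hue in (0, 1) taken from the midpoint of the
    node's terminal range, so a child's hue falls inside its parent's range of hues."""
    return ((node.lo + node.hi) / 2 + 0.5) / n_terminals
```

Numbering the terminals once means a wedge test on any node is just two integer comparisons, regardless of how large the subtree below it is.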
24. Structure-Based Brush
- White contour links terminal nodes
- Red wedge is extents selection
- Color curve is depth specification
- Color bar maps location in tree to unique color
- Direct and indirect manipulation of brush
25. Demonstration of Hierarchical Features in XmdvTool
26. Auxiliary Tools
- Extent scaling to reduce occlusion of bands
- Dimensional zooming: fill the display with the selected subspace (N-D distortion)
- Dynamic masking to fade out selected or unselected data
- Saving selected subsets
- Enabling/disabling dimensions
- Univariate displays (Tukey box plots, tree maps)
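Two of these tools can be sketched in a few lines, under assumptions. Extent scaling is read here as shrinking a cluster's band toward its center by a user-chosen factor. Dimensional zooming is read as re-mapping a selected subrange of an axis onto the full axis and dropping values outside it. scale_extent and zoom_axis are hypothetical helpers.

```python
def scale_extent(low, high, center, factor):
    """Shrink (factor < 1) or grow (factor > 1) a band [low, high] about `center`
    so that overlapping cluster bands occlude each other less."""
    return (center - factor * (center - low), center + factor * (high - center))


def zoom_axis(value, sub_low, sub_high):
    """Map the selected subrange [sub_low, sub_high] onto the full axis [0, 1];
    values outside it fall off the axis and are returned as None."""
    if not sub_low <= value <= sub_high:
        return None
    span = sub_high - sub_low
    return (value - sub_low) / span if span else 0.5
```

Applying zoom_axis to every selected dimension at once gives the N-D distortion the slide mentions. Each axis is stretched independently.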
27. Summary
- There are many ways to map multivariate data to images, each with strengths and weaknesses
- Linking between and within displays with brushing enhances static displays and combines their strengths
- Hierarchies and aggregations allow visualization of large data sets
- Intuitive navigation, filtering, and focus are critical to the exploration process
- Each basic multivariate visualization method is readily extensible to displaying cluster information
28. Problems and Future Work
- Many data characteristics are not currently supported (e.g., text fields, records with missing entries, data quality)
- The navigation tool for hierarchies assumes a linear order of nodes
  - Looking at tools for dynamic reorganization
  - Exploring 2-D or higher-dimensional navigation interfaces
- In very large hierarchies, it is difficult to focus on a narrow subset
  - Developing a multiresolution interface
  - Investigating distortion techniques for navigation
29. Other Future Work
- Projection pursuit or view recommender
- Linking structure and data brushing
- User studies
- Customization based on domain
- Query optimization via caching and prefetching
30. For More Information
- XmdvTool is available in the public domain
- Supports both Unix and Windows
- The next release will have an Oracle interface, with query optimization to support exploratory operations
- http://davis.wpi.edu/xmdv
- Papers in Vis '94, '95, '99, InfoVis '99, and IEEE TVCG Vol. 6, No. 2, 2000
31. Thanks to
- Elke Rundensteiner
- Ying-Huey Fua
- Daniel Stroe
- Yang Jing
- Suggestions from Xmdv users
- NSF