An Introduction to Multivariate Data Visualization and XmdvTool

1
An Introduction to Multivariate Data
Visualization and XmdvTool

Matthew O. Ward
Computer Science Department
Worcester Polytechnic Institute

This work was supported under NSF Grant
IIS-9732897
2
What is Multivariate Data?

Each data point has N variables or observations
Each observation can be
nominal or ordinal
discrete or continuous
scalar, vector, or tensor
May or may not have spatial, temporal, or other
connectivity attribute

3
Characteristics of a Variable

Order: grades have an order, brand names do not.
Distance metric: for income, distance equals
difference. For rankings, difference is not a
distance metric.
Absolute zero: temperature has an absolute zero,
bank account balances do not.
A variable can be classified by these three
attributes, called Scale.
Effective visualizations attempt to match the
scale of the data dimension with the graphical
attribute conveying it.

4
Sources of Multivariate Data

Sensors (e.g., images, gauges)
Simulations
Census or other surveys
Commerce (e.g., stock market)
Communication systems
Spreadsheets and databases

5
Issues in Visualizing Multivariate Data

How many variables?
How many records?
Types of variables?
User task (exploration, confirmation,
presentation)
Data feature of interest (clusters, anomalies,
trends, patterns, ...)
Background of user (domain expert, visualization
specialist, decision-maker, ...)

6
Methods for Visualizing Multivariate Data

Dimensional Subsetting
Dimensional Reorganization
Dimensional Embedding
Dimensional Reduction

7
Dimensional Subsetting

Scatterplot matrix displays all pairwise plots
Selection allows linkage between views
Clusters, trends, and correlations readily
discerned between pairs of dimensions
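
The pairing and linkage described on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not XmdvTool's implementation: the function names `panel_pairs` and `brush_panel`, the dictionary record layout, and the sample dimension names are all assumptions made for the example.

```python
from itertools import combinations


def panel_pairs(dimensions):
    """List every unordered pair of dimensions; each pair is one panel."""
    return list(combinations(dimensions, 2))


def brush_panel(records, x_dim, y_dim, x_range, y_range):
    """Return indices of records inside a rectangle drawn in one panel.

    The same index set can then be highlighted in every other panel,
    which is what links the views.
    """
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    return [
        i for i, rec in enumerate(records)
        if x_lo <= rec[x_dim] <= x_hi and y_lo <= rec[y_dim] <= y_hi
    ]


# Hypothetical sample data, invented for illustration only.
records = [
    {"mpg": 30.0, "weight": 2000.0, "hp": 70.0},
    {"mpg": 15.0, "weight": 4200.0, "hp": 190.0},
    {"mpg": 28.0, "weight": 2300.0, "hp": 85.0},
]
pairs = panel_pairs(["mpg", "weight", "hp"])
selected = brush_panel(records, "mpg", "weight", (25.0, 35.0), (1500.0, 2500.0))
```

With N dimensions there are N(N-1)/2 distinct panels, so the matrix grows quadratically with the number of dimensions.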

8
Dimensional Reorganization

Parallel Coordinates creates parallel, rather
than orthogonal, dimensions.
Data point corresponds to polyline across axes
Clusters, trends, and anomalies discernible as
groupings or outliers, based on intercepts and
slopes
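
As a rough sketch of how one data point becomes a polyline, the Python below normalizes each dimension to [0, 1] and places the axes at equal spacing. The names `dimension_ranges` and `polyline`, the min/max normalization, and the equal axis spacing are assumptions for illustration; the slides do not specify these details.

```python
def dimension_ranges(records):
    """Compute the (min, max) of every dimension over all records."""
    n_dims = len(records[0])
    return [
        (min(r[d] for r in records), max(r[d] for r in records))
        for d in range(n_dims)
    ]


def polyline(record, ranges, width=1.0, height=1.0):
    """Return the (x, y) vertices of the polyline for one record.

    Axis d is a vertical line at x = width * d / (N - 1); the record
    crosses it at a height proportional to its normalized value.
    """
    n = len(record)
    points = []
    for d, value in enumerate(record):
        lo, hi = ranges[d]
        # A constant dimension has no spread, so draw it mid-axis.
        t = 0.5 if hi == lo else (value - lo) / (hi - lo)
        x = width * d / (n - 1) if n > 1 else 0.0
        points.append((x, height * t))
    return points
```

Records with similar values produce polylines with similar intercepts and slopes, which is why clusters appear as bundles of lines.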

9
Dimensional Reorganization (2)

Glyphs map data dimensions to graphical
attributes
Size, color, shape, and orientation are commonly
used
Similarities/differences in features give
insights into relations

10
Dimensional Embedding

Dimensional stacking divides data space into bins
Each N-D bin has a unique 2-D screen bin
Screen space recursively divided based on bin
count for each dimension
Clusters and trends manifested as repeated
patterns
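
The mapping from an N-D bin to a unique 2-D screen bin can be sketched as a mixed-radix encoding. In this hedged Python sketch, even-numbered dimensions are assigned to the horizontal axis and odd-numbered ones to the vertical axis, with earlier dimensions as the outer (slowest varying) subdivisions. That assignment, and the names `to_bin` and `stack`, are assumptions for the example rather than details taken from the slides.

```python
def to_bin(value, lo, hi, count):
    """Discretize a value in [lo, hi] into one of `count` equal-width bins."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    index = int((value - lo) / (hi - lo) * count)
    # Clamp so that value == hi lands in the last bin.
    return min(max(index, 0), count - 1)


def stack(bin_index, bin_counts):
    """Map an N-D bin index to a (column, row) screen cell.

    Each dimension subdivides the cell produced by the previous
    dimension on the same axis, so the screen is split recursively
    according to the bin count of each dimension.
    """
    col = row = 0
    for d, (b, count) in enumerate(zip(bin_index, bin_counts)):
        if not 0 <= b < count:
            raise ValueError(f"bin {b} out of range for dimension {d}")
        if d % 2 == 0:
            col = col * count + b   # even dimensions go on the x axis
        else:
            row = row * count + b   # odd dimensions go on the y axis
    return col, row
```

For bin counts (2, 3, 4, 5) the screen grid is 8 columns by 15 rows, and every one of the 120 N-D bins gets its own cell.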

11
Dimensional Reduction

Map N-D locations to M-D display space while best
preserving N-D relations
Approaches include MDS, PCA, and Kohonen
Self-Organizing Maps
Relationships conveyed by position, links, color,
shape, size, etc.
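
One way to make "best preserving N-D relations" concrete is to score a candidate layout by how much its pairwise distances differ from the original ones. The Python sketch below computes a normalized stress value of the kind used in MDS-style methods. The function name `layout_stress` and the exact normalization are assumptions for illustration; each reduction technique named above optimizes its own criterion.

```python
import math
from itertools import combinations


def layout_stress(high_dim, low_dim):
    """Compare pairwise distances before and after reduction.

    Returns sqrt(sum((d_high - d_low)^2) / sum(d_high^2)) over all
    point pairs: 0.0 means every distance is preserved exactly.
    """
    num = den = 0.0
    for i, j in combinations(range(len(high_dim)), 2):
        d_high = math.dist(high_dim[i], high_dim[j])
        d_low = math.dist(low_dim[i], low_dim[j])
        num += (d_high - d_low) ** 2
        den += d_high ** 2
    return math.sqrt(num / den) if den > 0 else 0.0
```

A layout that collapses all points onto one location scores 1.0, while dropping a coordinate that is constant across the data scores 0.0.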

12
The Role of Selection

User needs to interact with display, examine
interesting patterns or anomalies, validate
hypotheses
Selection allows isolation of subset of data for
highlighting, deleting, focussed analysis
Direct (clicking on displayed items) vs.
indirect (range sliders)
Screen space (2-D) vs. data space (N-D)
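
A data-space (N-D) selection can be sketched as a hyperbox: a range on each constrained dimension, much like a set of range sliders. The Python below is a minimal sketch under that assumption; the names `in_brush` and `split_by_brush` and the dictionary representation of the brush are hypothetical.

```python
def in_brush(record, brush):
    """True if the record lies inside the N-D brush.

    `brush` maps a dimension index to an inclusive (lo, hi) range;
    dimensions that are not listed are unconstrained.
    """
    return all(lo <= record[d] <= hi for d, (lo, hi) in brush.items())


def split_by_brush(records, brush):
    """Partition record indices into (selected, unselected).

    The selected list can be highlighted or analyzed on its own;
    keeping only the unselected list corresponds to deleting the subset.
    """
    selected, unselected = [], []
    for i, rec in enumerate(records):
        (selected if in_brush(rec, brush) else unselected).append(i)
    return selected, unselected
```

Because the test is made on data values rather than pixels, the same brush applies unchanged to any display of the data.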

13
Demonstration of XmdvTool
14
Problems with Large Data Sets

Most techniques are effective with small to
moderate sized data sets
Large sets (> 50K records) are increasingly
common
When traditional visualizations are used, occlusion
and clutter make interpretation difficult

15
One Potential Solution

Multiresolution displays with aggregation
Explicit clustering
Break dimensions into bins
Aggregate in a particular order (datacubes)
Implicit clustering
Hierarchical clustering (proximity-based merging)
Hierarchical partitioning (proximity-based
splits)
Problem: many ways to cluster, each revealing
different aspects of data
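
Proximity-based merging can be sketched as a naive agglomerative procedure that repeatedly joins the two closest clusters until one tree remains. The Python below uses the distance between cluster centers as the proximity measure, which is only one of many possible choices and is an assumption here, as are the function name `agglomerate` and the dictionary node layout.

```python
import math


def agglomerate(points):
    """Build a binary cluster tree by merging the two closest clusters.

    Each node is a dict with the member point indices, the center of
    those points, and its child nodes (empty for a single point).
    This naive version is cubic in the number of points.
    """
    dims = len(points[0])
    clusters = [
        {"members": [i], "center": tuple(p), "children": []}
        for i, p in enumerate(points)
    ]
    while len(clusters) > 1:
        best = None  # (distance, index a, index b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(clusters[a]["center"], clusters[b]["center"])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        left, right = clusters[a], clusters[b]
        members = left["members"] + right["members"]
        center = tuple(
            sum(points[i][k] for i in members) / len(members)
            for k in range(dims)
        )
        merged = {"members": members, "center": center,
                  "children": [left, right]}
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return clusters[0]
```

Swapping the proximity measure (for example, nearest members instead of centers) generally yields a different tree, which illustrates the problem noted on the slide.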

16
Display Options

For each cluster, show
Center
Extents for each dimension
Population
Other descriptors (e.g., quartiles)
Color clusters such that siblings have similar
color to parents
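
The per-cluster descriptors listed above are simple aggregates. A minimal Python sketch follows, assuming the center is the mean of the member points; the function name `summarize` and the returned dictionary layout are hypothetical.

```python
def summarize(points):
    """Compute the center, per-dimension extents, and population of a cluster."""
    n = len(points)
    dims = len(points[0])
    center = tuple(sum(p[d] for p in points) / n for d in range(dims))
    extents = tuple(
        (min(p[d] for p in points), max(p[d] for p in points))
        for d in range(dims)
    )
    return {"center": center, "extents": extents, "population": n}
```

These few numbers stand in for all the member records when the cluster is drawn, which is what keeps the display readable for large data sets.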

17
Hierarchical Parallel Coordinates

Bands show cluster extents in each dimension
Opacity conveys cluster population
Color similarity indicates proximity in hierarchy
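
A band can be sketched as a polygon that follows the cluster's upper extents across the axes and returns along its lower extents. The Python below assumes extents already normalized to [0, 1], equally spaced axes, and a linear mapping from population to opacity with a small floor so that sparse clusters stay visible. All of these choices, and the names `band_polygon` and `band_opacity`, are assumptions for illustration.

```python
def band_polygon(extents, width=1.0):
    """Return the outline of a cluster band across parallel axes.

    `extents` holds one normalized (lo, hi) pair per dimension and
    must cover at least two dimensions.
    """
    n = len(extents)
    if n < 2:
        raise ValueError("a band needs at least two axes")
    xs = [width * d / (n - 1) for d in range(n)]
    upper = [(x, hi) for x, (_, hi) in zip(xs, extents)]
    lower = [(x, lo) for x, (lo, _) in zip(xs, extents)]
    # Walk the upper edge left to right, then the lower edge back.
    return upper + lower[::-1]


def band_opacity(population, largest_population, floor=0.1):
    """Map cluster population linearly onto an opacity in [floor, 1]."""
    return floor + (1.0 - floor) * population / largest_population
```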

18
Hierarchical Scatterplots

Clusters displayed as rectangles, showing extents
in 2 dimensions
Color/opacity consistently used for relational
and population info

19
Hierarchical Glyphs

Star glyph with bands
Analogous to parallel coordinates, with radial
rather than parallel dimensions
Glyph position critical for conveying relational
info
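
The radial layout can be sketched by assigning each dimension a ray at an equal angular step and placing a vertex along it in proportion to the value. The Python below assumes values normalized to [0, 1]; the names `star_glyph` and `star_band` are hypothetical, and the band is represented simply as an inner and an outer outline built from the cluster's lower and upper extents.

```python
import math


def star_glyph(values, center=(0.0, 0.0), radius=1.0):
    """Return the vertices of a star glyph for normalized values.

    Dimension d lies on a ray at angle 2*pi*d/N from the center.
    """
    n = len(values)
    cx, cy = center
    vertices = []
    for d, v in enumerate(values):
        angle = 2.0 * math.pi * d / n
        vertices.append((cx + radius * v * math.cos(angle),
                         cy + radius * v * math.sin(angle)))
    return vertices


def star_band(mins, maxs, center=(0.0, 0.0), radius=1.0):
    """Return (inner, outer) outlines bounding a cluster's extents."""
    return (star_glyph(mins, center, radius),
            star_glyph(maxs, center, radius))
```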

20
Hierarchical Dimensional Stacking

Clusters occupy multiple bins
Overlaps can be reduced by increasing number of
bins
Cell colors can be blended, or each cell can
display the last cluster mapped to it

21
Hierarchical Star Fields

Dimensional reduction techniques commonly
displayed with starfields
Each cluster becomes circle/sphere in field
Alternatively, can show glyph at cluster location

22
Navigating Hierarchies

Drill-down, roll-up operations for more or less
detail
Need selection operation to identify subtrees for
exploration, pruning
Need indications of where you are in hierarchy,
and where you've been during exploration process

23
Structure-Based Brushing

Enhancement to screen-based and data-based
methods
Specify focus, extents, and level of detail
Intuitive - wedge of tree and depth of interest
Implemented by labeling/numbering terminals and
propagating ranges to parents
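
The labeling scheme in the last bullet can be sketched as follows: number the terminal nodes in depth-first order, give every parent the range spanned by its descendants, and then test a wedge with simple interval comparisons. This Python is a simplified sketch, not XmdvTool's code. The dictionary node layout, the names `label_tree` and `brush`, and the rule that a node is selected only when its range lies fully inside the wedge are all assumptions.

```python
def label_tree(node, next_label=0, depth=0):
    """Number terminals depth-first and propagate ranges to parents.

    Stores `depth` and an inclusive `range` of terminal labels on every
    node, and returns the next unused label.
    """
    node["depth"] = depth
    children = node.get("children", [])
    if not children:
        node["range"] = (next_label, next_label)
        return next_label + 1
    start = next_label
    for child in children:
        next_label = label_tree(child, next_label, depth + 1)
    node["range"] = (start, next_label - 1)
    return next_label


def brush(node, lo, hi, level):
    """Select nodes inside the wedge [lo, hi] at the given level of detail.

    Descends until it reaches `level` or a terminal, and keeps the node
    if its whole range falls within the wedge.
    """
    n_lo, n_hi = node["range"]
    if n_hi < lo or n_lo > hi:
        return []  # entirely outside the wedge
    children = node.get("children", [])
    if not children or node["depth"] == level:
        return [node["name"]] if lo <= n_lo and n_hi <= hi else []
    selected = []
    for child in children:
        selected.extend(brush(child, lo, hi, level))
    return selected
```

Because every subtree corresponds to one contiguous range, deciding whether a cluster falls inside the brush takes two comparisons regardless of the subtree's size.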

24
Structure-Based Brush

White contour links terminal nodes
Red wedge is extents selection
Color curve is depth specification
Color bar maps location in tree to unique color
Direct and indirect manipulation of brush

25
Demonstration of Hierarchical Features in XmdvTool
26
Auxiliary Tools

Extent scaling to reduce occlusion of bands
Dimensional zooming - fill display with selected
subspace (N-D distortion)
Dynamic masking to fade out selected or
unselected data
Saving selected subsets
Enabling/disabling dimensions
Univariate displays (Tukey box plots, tree maps)

27
Summary

Many ways to map multivariate data to images,
each with strengths and weaknesses
Linking between and within displays with brushing
enhances static displays and combines their
strengths.
Hierarchies and aggregations allow visualization
of large data sets
Intuitive navigation, filtering, and focus
critical to exploration process
Each basic multivariate visualization method is
readily extensible to displaying cluster
information

28
Problems and Future Work

Many data characteristics not currently supported
(e.g., text fields, records with missing entries,
data quality)
Navigation tool for hierarchies assumes linear
order of nodes
Looking at tools for dynamic reorganization
Exploring 2-D or higher navigation interfaces
In very large hierarchies, it is difficult to
focus on a narrow subset
Developing multiresolution interface
Investigating distortion techniques for navigation

29
Other Future Work

Projection pursuit or view recommender
Linking structure and data brushing
User studies
Customization based on domain
Query optimization via caching and prefetching

30
For More Information

XmdvTool is available in the public domain
Both Unix and Windows support
Next release will have Oracle interface, with
query optimization to support exploratory
operations
http://davis.wpi.edu/xmdv
Papers in Vis '94, '95, '99, InfoVis '99, and IEEE
TVCG Vol. 6, No. 2, 2000

31
Thanks to...