An Introduction to Multivariate Data Visualization and XmdvTool

1
An Introduction to Multivariate Data
Visualization and XmdvTool
  • Matthew O. Ward
  • Computer Science Department
  • Worcester Polytechnic Institute

This work was supported under NSF Grant
IIS-9732897
2
What is Multivariate Data?
  • Each data point has N variables or observations
  • Each observation can be
      • nominal or ordinal
      • discrete or continuous
      • scalar, vector, or tensor
  • May or may not have spatial, temporal, or other
    connectivity attributes

3
Characteristics of a Variable
  • Order: grades have an order, brand names do not.
  • Distance metric: for income, distance equals
    difference. For rankings, difference is not a
    distance metric.
  • Absolute zero: temperature has an absolute zero,
    bank account balances do not.
  • A variable can be classified by these three
    attributes, which together define its scale.
  • Effective visualizations attempt to match the
    scale of the data dimension with the graphical
    attribute conveying it.
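
As a rough illustration, the three attributes above can be turned into a small classifier. The level names (nominal, ordinal, interval, ratio) follow common measurement-scale terminology and are an assumption here; the slides only name "nominal or ordinal" explicitly. The function name `classify_scale` is hypothetical.

```python
def classify_scale(has_order: bool, has_distance: bool, has_absolute_zero: bool) -> str:
    """Classify a variable by the three attributes from the slide.

    Assumption: the attributes are cumulative, so a variable without an
    order is treated as nominal regardless of the other two flags.
    """
    if not has_order:
        return "nominal"      # e.g., brand names
    if not has_distance:
        return "ordinal"      # e.g., grades or rankings
    if not has_absolute_zero:
        return "interval"     # differences are meaningful, ratios are not
    return "ratio"            # differences and ratios are both meaningful


# Usage: brand names have no order, grades have order but no distance metric.
brand_scale = classify_scale(False, False, False)
grade_scale = classify_scale(True, False, False)
```

A visualization tool could use such a label to choose a graphical attribute, for example reserving position or length for variables with a distance metric and using hue for nominal ones.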

4
Sources of Multivariate Data
  • Sensors (e.g., images, gauges)
  • Simulations
  • Census or other surveys
  • Commerce (e.g., stock market)
  • Communication systems
  • Spreadsheets and databases

5
Issues in Visualizing Multivariate Data
  • How many variables?
  • How many records?
  • Types of variables?
  • User task (exploration, confirmation,
    presentation)
  • Data feature of interest (clusters, anomalies,
    trends, patterns, ...)
  • Background of user (domain expert, visualization
    specialist, decision-maker, ...)

6
Methods for Visualizing Multivariate Data
  • Dimensional Subsetting
  • Dimensional Reorganization
  • Dimensional Embedding
  • Dimensional Reduction

7
Dimensional Subsetting
  • Scatterplot matrix displays all pairwise plots
  • Selection allows linkage between views
  • Clusters, trends, and correlations readily
    discerned between pairs of dimensions
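
A minimal sketch of the bookkeeping behind a scatterplot matrix: enumerate every pair of dimensions and, as a stand-in for drawing each panel, compute the Pearson correlation for that pair. The function names and the choice of correlation as the per-panel summary are assumptions for illustration, not part of XmdvTool.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def pairwise_panels(records, names):
    """Return {(dim_a, dim_b): correlation} for every unordered pair of dimensions."""
    columns = {name: [r[i] for r in records] for i, name in enumerate(names)}
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(names, 2)}


# Usage: three dimensions give three distinct panels.
panels = pairwise_panels([(1, 2, 9), (2, 4, 7), (3, 6, 5)], ["x", "y", "z"])
```

With N dimensions there are N(N-1)/2 distinct pairs, which is why the matrix grows quickly as dimensions are added.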

8
Dimensional Reorganization
  • Parallel Coordinates creates parallel, rather
    than orthogonal, dimensions.
  • Each data point corresponds to a polyline across
    the axes
  • Clusters, trends, and anomalies discernible as
    groupings or outliers, based on intercepts and
    slopes
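
The mapping from a record to its polyline can be sketched as follows. This assumes vertical axes spaced evenly along x and each dimension scaled linearly between a given minimum and maximum; the function name `polyline` and its parameters are hypothetical.

```python
def polyline(record, mins, maxs, width=1.0):
    """Map one N-D record to N (x, y) vertices, one per parallel axis.

    x is the position of the axis, y is the value normalized to [0, 1].
    A dimension with zero range is drawn at mid-height (an assumption).
    """
    n = len(record)
    points = []
    for i, (value, lo, hi) in enumerate(zip(record, mins, maxs)):
        x = width * i / (n - 1) if n > 1 else 0.0
        y = (value - lo) / (hi - lo) if hi > lo else 0.5
        points.append((x, y))
    return points


# Usage: a 3-D record crosses the three axes at heights 0.5, 0.0, and 1.0.
vertices = polyline((5, 0, 10), mins=(0, 0, 0), maxs=(10, 10, 10))
```

Drawing line segments between consecutive vertices gives the polyline; records with similar values produce bundles of nearby lines.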

9
Dimensional Reorganization (2)
  • Glyphs map data dimensions to graphical
    attributes
  • Size, color, shape, and orientation are commonly
    used
  • Similarities/differences in features give
    insights into relations

10
Dimensional Embedding
  • Dimensional stacking divides data space into bins
  • Each N-D bin has a unique 2-D screen bin
  • Screen space recursively divided based on bin
    count for each dimension
  • Clusters and trends manifested as repeated
    patterns
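
A minimal sketch of how an N-D bin could be assigned a unique 2-D screen bin. It assumes dimensions are listed from outermost to innermost and alternate between the horizontal and vertical screen axes; that ordering, and the name `stacked_cell`, are assumptions for illustration.

```python
def stacked_cell(bin_indices, bin_counts):
    """Return the (column, row) screen cell for one N-D bin.

    Even-numbered dimensions subdivide the horizontal axis, odd-numbered
    ones the vertical axis. Each later dimension subdivides the cell chosen
    by the earlier ones, which is a mixed-radix encoding of the indices.
    """
    col = row = 0
    for i, (index, count) in enumerate(zip(bin_indices, bin_counts)):
        if not 0 <= index < count:
            raise ValueError(f"bin index {index} out of range for dimension {i}")
        if i % 2 == 0:
            col = col * count + index
        else:
            row = row * count + index
    return col, row


# Usage: four dimensions with 2, 3, 2, and 2 bins respectively.
cell = stacked_cell((1, 2, 0, 1), (2, 3, 2, 2))
```

Because the encoding is mixed-radix, no two N-D bins share a screen cell, which matches the "unique 2-D screen bin" property stated above.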

11
Dimensional Reduction
  • Map N-D locations to M-D display space while best
    preserving N-D relations
  • Approaches include MDS, PCA, and Kohonen
    Self-Organizing Maps
  • Relationships conveyed by position, links, color,
    shape, size, etc.

12
The Role of Selection
  • User needs to interact with display, examine
    interesting patterns or anomalies, validate
    hypotheses
  • Selection allows isolation of a subset of data
    for highlighting, deleting, or focused analysis
  • Direct (clicking on displayed items) vs.
    indirect (range sliders)
  • Screen space (2-D) vs. data space (N-D)
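
As an illustration of indirect selection in data space, the sketch below treats a brush as a set of per-dimension ranges (as range sliders would produce) and returns the records that fall inside all of them. The function name `brush` and the dictionary layout are assumptions.

```python
def brush(records, ranges):
    """Select records inside an N-D box.

    ranges maps a dimension index to an inclusive (low, high) pair.
    Dimensions that are not listed are left unconstrained.
    """
    return [
        record
        for record in records
        if all(lo <= record[dim] <= hi for dim, (lo, hi) in ranges.items())
    ]


# Usage: constrain dimension 0 to [2, 4] and dimension 2 to [0, 6].
data = [(1, 5, 3), (2, 8, 6), (4, 1, 9), (3, 3, 0)]
selected = brush(data, {0: (2, 4), 2: (0, 6)})
```

The selected subset can then be highlighted, deleted, or analyzed separately in any linked view.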

13
Demonstration of XmdvTool
14
Problems with Large Data Sets
  • Most techniques are effective with small to
    moderate sized data sets
  • Large sets (> 50K records) are increasingly
    common
  • When traditional visualizations are used,
    occlusion and clutter make interpretation
    difficult

15
One Potential Solution
  • Multiresolution displays with aggregation
  • Explicit clustering
      • Break dimensions into bins
      • Aggregate in a particular order (datacubes)
  • Implicit clustering
      • Hierarchical clustering (proximity-based
        merging)
      • Hierarchical partitioning (proximity-based
        splits)
  • Problem: many ways to cluster, each revealing
    different aspects of data
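
A minimal sketch of hierarchical clustering by proximity-based merging. It repeatedly merges the two clusters whose centers are closest (Euclidean distance between centroids), which is one of many possible merge criteria and is an assumption here, as are the function name and the dictionary layout of a cluster node.

```python
from math import dist  # available in Python 3.8+


def agglomerate(points):
    """Build a binary cluster tree from a non-empty list of N-D points."""
    clusters = [
        {"center": tuple(float(v) for v in p), "size": 1, "children": [], "members": [i]}
        for i, p in enumerate(points)
    ]
    while len(clusters) > 1:
        # Find the closest pair of cluster centers (quadratic scan, fine for a sketch).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(clusters[i]["center"], clusters[j]["center"])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        a, b = clusters[i], clusters[j]
        size = a["size"] + b["size"]
        # The merged center is the size-weighted mean of the two centers.
        center = tuple(
            (ca * a["size"] + cb * b["size"]) / size
            for ca, cb in zip(a["center"], b["center"])
        )
        merged = {"center": center, "size": size, "children": [a, b],
                  "members": a["members"] + b["members"]}
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


# Usage: two well-separated pairs of points end up in separate subtrees.
root = agglomerate([(0, 0), (0, 1), (10, 10), (10, 11)])
```

Changing the distance measure or the merge rule yields a different tree from the same data, which is the problem the last bullet above points out.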

16
Display Options
  • For each cluster, show
      • Center
      • Extents for each dimension
      • Population
      • Other descriptors (e.g., quartiles)
  • Color clusters such that siblings have similar
    color to parents
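
The per-cluster descriptors listed above can be computed in a few lines. This sketch covers center, extents, and population only; the function name `summarize` and the returned dictionary keys are assumptions.

```python
def summarize(records):
    """Return the center, per-dimension extents, and population of a cluster.

    records is a non-empty list of equal-length numeric tuples.
    """
    population = len(records)
    columns = list(zip(*records))  # one tuple of values per dimension
    return {
        "center": tuple(sum(col) / population for col in columns),
        "extents": tuple((min(col), max(col)) for col in columns),
        "population": population,
    }


# Usage: a two-record cluster in two dimensions.
summary = summarize([(1, 10), (3, 30)])
```

These three values are enough to draw a cluster as a band or rectangle (extents) around a mean line or point (center).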

17
Hierarchical Parallel Coordinates
  • Bands show cluster extents in each dimension
  • Opacity conveys cluster population
  • Color similarity indicates proximity in hierarchy

18
Hierarchical Scatterplots
  • Clusters displayed as rectangles, showing extents
    in 2 dimensions
  • Color/opacity consistently used for relational
    and population info

19
Hierarchical Glyphs
  • Star glyph with bands
  • Analogous to parallel coordinates, with radial
    rather than parallel dimensions
  • Glyph position critical for conveying relational
    info
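
A minimal sketch of the radial layout behind a star glyph: each dimension gets a ray at an evenly spaced angle, and the normalized value sets the distance along that ray. The function name and parameters are hypothetical. For the hierarchical version, the same mapping could be applied to a cluster's minimum and maximum values to obtain the inner and outer edges of a band.

```python
from math import cos, sin, pi


def star_glyph(values, mins, maxs, radius=1.0, center=(0.0, 0.0)):
    """Return one (x, y) vertex per dimension for a star glyph.

    Dimension i lies on a ray at angle 2*pi*i/N. A dimension with zero
    range is drawn at half the radius (an assumption).
    """
    n = len(values)
    cx, cy = center
    vertices = []
    for i, (value, lo, hi) in enumerate(zip(values, mins, maxs)):
        t = (value - lo) / (hi - lo) if hi > lo else 0.5
        angle = 2 * pi * i / n
        vertices.append((cx + radius * t * cos(angle), cy + radius * t * sin(angle)))
    return vertices


# Usage: four dimensions, each ranging over [0, 10].
star = star_glyph((10, 0, 5, 5), (0, 0, 0, 0), (10, 10, 10, 10))
```

Connecting the vertices in order gives the glyph outline; the `center` argument is where glyph placement on the screen would come in.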

20
Hierarchical Dimensional Stacking
  • Clusters occupy multiple bins
  • Overlaps can be reduced by increasing number of
    bins
  • Cell colors can be blended, or each cell can
    display the last cluster mapped to it

21
Hierarchical Star Fields
  • Dimensional reduction techniques commonly
    displayed with starfields
  • Each cluster becomes circle/sphere in field
  • Alternatively, can show glyph at cluster location

22
Navigating Hierarchies
  • Drill-down, roll-up operations for more or less
    detail
  • Need selection operation to identify subtrees for
    exploration, pruning
  • Need indications of where you are in hierarchy,
    and where you've been during exploration process

23
Structure-Based Brushing
  • Enhancement to screen-based and data-based
    methods
  • Specify focus, extents, and level of detail
  • Intuitive - wedge of tree and depth of interest
  • Implemented by labeling/numbering terminals and
    propagating ranges to parents
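
The labeling scheme in the last bullet can be sketched as follows: number the terminal nodes left to right, give each parent the range spanned by its children, then select nodes whose range overlaps the brushed wedge at the requested depth. The tree layout (dictionaries with an optional `children` list) and both function names are assumptions for illustration, not XmdvTool's actual data structures.

```python
def label(node, next_id=0, depth=0):
    """Number terminals in order and propagate (first, last) ranges to parents.

    Adds 'range' and 'depth' keys to every node and returns the next unused
    terminal number.
    """
    node["depth"] = depth
    children = node.get("children") or []
    if not children:
        node["range"] = (next_id, next_id)
        return next_id + 1
    first = next_id
    for child in children:
        next_id = label(child, next_id, depth + 1)
    node["range"] = (first, next_id - 1)
    return next_id


def structure_brush(node, lo, hi, level):
    """Return nodes at depth `level` (or terminals above it) overlapping [lo, hi]."""
    first, last = node["range"]
    if last < lo or first > hi:
        return []  # this subtree lies entirely outside the wedge
    children = node.get("children") or []
    if node["depth"] == level or not children:
        return [node]
    selected = []
    for child in children:
        selected.extend(structure_brush(child, lo, hi, level))
    return selected


# Usage: a small tree with three terminals.
tree = {"name": "root", "children": [
    {"name": "A", "children": [{"name": "a1"}, {"name": "a2"}]},
    {"name": "B", "children": [{"name": "b1"}]},
]}
terminal_count = label(tree)
wedge = [n["name"] for n in structure_brush(tree, 1, 2, level=1)]
```

Because each node stores only a range, testing whether a subtree falls inside the wedge is a constant-time comparison, and whole subtrees can be skipped during selection.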

24
Structure-Based Brush
  • White contour links terminal nodes
  • Red wedge is extents selection
  • Color curve is depth specification
  • Color bar maps location in tree to unique color
  • Direct and indirect manipulation of brush

25
Demonstration of Hierarchical Features in XmdvTool
26
Auxiliary Tools
  • Extent scaling to reduce occlusion of bands
  • Dimensional zooming - fill display with selected
    subspace (N-D distortion)
  • Dynamic masking to fade out selected or
    unselected data
  • Saving selected subsets
  • Enabling/disabling dimensions
  • Univariate displays (Tukey box plots, tree maps)

27
Summary
  • Many ways to map multivariate data to images,
    each with strengths and weaknesses
  • Linking between and within displays with brushing
    enhances static displays and combines their
    strengths.
  • Hierarchies and aggregations allow visualization
    of large data sets
  • Intuitive navigation, filtering, and focus
    critical to exploration process
  • Each basic multivariate visualization method is
    readily extensible to displaying cluster
    information

28
Problems and Future Work
  • Many data characteristics not currently supported
    (e.g., text fields, records with missing entries,
    data quality)
  • Navigation tool for hierarchies assumes linear
    order of nodes
      • Looking at tools for dynamic reorganization
      • Exploring 2-D or higher navigation interfaces
  • Very large hierarchies are difficult to focus on
    a narrow subset
      • Developing multiresolution interface
      • Investigating distortion techniques for
        navigation

29
Other Future Work
  • Projection pursuit or view recommender
  • Linking structure and data brushing
  • User studies
  • Customization based on domain
  • Query optimization via caching and prefetching

30
For More Information
  • XmdvTool is available in the public domain
  • Both Unix and Windows support
  • Next release will have Oracle interface, with
    query optimization to support exploratory
    operations
  • http://davis.wpi.edu/xmdv
  • Papers in Vis '94, '95, '99, InfoVis '99, IEEE
    TVCG Vol. 6, No. 2, 2000

31
Thanks to...
  • Elke Rundensteiner
  • Ying-Huey Fua
  • Daniel Stroe
  • Yang Jing
  • Suggestions from Xmdv users
  • NSF