Information Visualization - PowerPoint PPT Presentation

Slides: 51
Provided by: davidsquir3
Transcript and Presenter's Notes


1
Information Visualization
2
Lecture Outline
  • Overview of information visualization
  • The role of visualization in the process of data
    mining
  • The patterns being sought: clusters and outliers
  • Issues when visualizing higher dimensional
    relationships
  • Criteria for comparison
  • A range of visualization techniques for
    exploratory data analysis

3
Information Visualization
  • A conjunction of a number of fields
  • Data Mining
  • Cognitive Science
  • Graphic Design
  • Interactive Computer Graphics

4
Information Visualization
  • Information Visualization attempts to use visual
    approaches and dynamic controls to provide
    understanding and analysis of multidimensional
    data
  • The data may have no inherent 2D or 3D semantics
    and may be abstract in nature.
  • There is no underlying physical model.
  • Much of the data in databases is of this type

5
Role of Information Visualization
  • Acts as an exploratory tool
  • Useful for identifying subsets of the data
  • Structures, trends and outliers may be identified
  • Statistical tests tend to incorporate isolated
    instances into a broader model as they attempt to
    formulate global features
  • There is no requirement for an hypothesis, but
    the techniques can also support the formulation
    of hypotheses if wanted

6
Integrating Visualization With Data Mining
  • There are four possible approaches
  • Use the visualization technique to present the
    results of the data mining process.
  • Use visualization techniques as complements to
    the data mining process.
  • They complement and increase understanding in a
    passive way.

7
Integrating Visualization With Data Mining
  • Use visualization techniques to steer the data
    mining process.
  • The visualization aids in deciding the
    appropriate data mining technique to use and
    appropriate subsets of the data to consider.
  • Apply data mining techniques to the visualization
    rather than directly to the data.
  • The idea is to capture the essential semantics
    visually then apply the data mining tools.

8
The Process of Knowledge Discovery in Databases
(a.k.a. Data Mining)
[Figure: The Knowledge Discovery in Databases (KDD) process (AdZ1996).
Stages: Data selection, Cleaning, Enrichment, Coding, Data mining, Reporting.
Cleaning covers domain consistency, de-duplication and disambiguation.
Data mining covers clustering, segmentation and prediction.
Other labels: Information Requirement, Action, Feedback, External data,
Operational data.]
9
Visualization in the Context of the Data Mining
Process
  • Visualization tools can potentially be used at a
    number of steps in the data mining (DM) process, but
  • the same tools may not be appropriate at each
    step
  • the way they are used may differ from step to step

10
Visualization in the Context of the Data Mining
Process
  • In general, it is not important whether data
    visualization is the first step in the process or
    not
  • the feedback loop which moves the process forward
    may be commenced by either a visualization or a
    query

11
Visualization in the Context of the Data Mining
Process
  • some visualizations (e.g. see slide 25) require
    an initial query to generate a visualization
  • this is an example of a complementary approach
  • questions generate visualizations, which may
    prompt further questions or generate hypotheses

12
Motivations for Visualization
  • The human visual system is extremely good at
    recognizing patterns
  • it is quicker and easier to understand visual
    representations than to absorb information from
    language or formal notations.
  • Exploratory visualization assists in
  • identifying areas of interest
  • identifying questions which might usefully be
    asked

13
Motivations for Visualization
  • i.e. a relevant or revealing visualization of
    either part or all of a data set, may suggest
    useful questions and/or hypotheses to the
    analyst. These can then be confirmed by more
    rigorous approaches
  • e.g. some clustering techniques require an
    initial estimate of the number of clusters
    present in the data
  • visualization techniques can assist in this
    estimation

14
Criteria for Comparison of Visualization Tools
  • Number of dimensions that can be represented
  • Number of data items that can be handled
  • Ability to handle categorical and other
    non-numeric data types
  • Ability to reveal patterns
  • Ease of use
  • Learning Curve (to what degree is the technique
    intuitive)

15
Examples - Scatterplot
  • Each pair of features (i.e. fields of records) in
    a multidimensional database is graphed as a point
    in two dimensions (2D)
  • This straightforward graphing procedure produces
    a simple scatterplot - a projection of the
    multidimensional data into 2D

16
Examples - Scatterplot
  • The scatterplots of all pair-wise combinations of
    features are arranged in a matrix
  • The figure on the following slide illustrates a
    scatterplot matrix of 3D data from a study of
    abrasion loss in tyres. The features are
    hardness, tensile strength and abrasion loss
    [Tie1989]
  • Each sub-graph gives insight into the
    relationship between a pair of features

17
Scatterplot Matrix
  • Scatterplot matrix of abrasion loss data [Tie1989]

18
Possible Problems With Scatterplots
  • Everitt [Eve78, p. 5] gives two reasons why
    scatterplots can prove unsatisfactory
  • if number of features is greater than 10, the
    number of plots to be examined is very large
  • this is just as likely to lead to confusion as
    to knowledge of the structures in the data.
  • structures existing in multidimensional data set
    do not necessarily appear in the 2D projections
    of the features represented in scatterplots (see
    next slide)
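
To make the first point concrete: with n features there are n(n-1)/2 distinct pair-wise plots. A minimal sketch of that count follows (the function name `panel_count` is an assumption made for this example).

```python
from math import comb


def panel_count(n_features):
    """Number of distinct pair-wise scatterplots for n_features features."""
    return comb(n_features, 2)


for n in (3, 10, 20):
    print(n, "features ->", panel_count(n), "plots")
```

With 10 features there are already 45 plots to examine, and the count grows quadratically from there.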

19
Possible Problems With Scatterplots
  • Despite these potential problems, variations on
    the scatterplot approach are the most commonly
    used of all the visualization techniques

20
Scatterplots: Recognizing High-dimensional
Structures - 1
  • A structure which appears as a cluster in a 2D
    projection may in fact be a pipe in 3D
  • a pipe is a structure in 3D that looks like a rod
    or pipe when viewed in a 3D representation

21
Scatterplots: Recognizing High-dimensional
Structures - 1
  • While the pipe is easily identifiable in a 3D
    display only projections of it will appear in the
    2D components of the scatterplot matrix
  • depending on the orientation of the pipe in 3D,
    it may not appear as an obvious cluster, if at all

22
Scatterplots: Recognizing High-dimensional
Structures - 1
  • Equivalent structures can exist in higher
    dimensions, e.g. a cluster in 5D might be a
    pipe in 6D
  • the appearance of high-D structures in lower-D
    projections depends on the luck and skill of the
    analyst in choosing the projections, and on the
    alignment of the structures to the axes
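
The effect of axis alignment can be illustrated with synthetic data. The sketch below assumes a pipe that runs along the z axis: its projection onto the x-y plane is a compact cluster, while its projection onto the x-z plane is a long streak. The sizes, the seed and the helper name `spread` are arbitrary choices for this example.

```python
import random
from statistics import pstdev

random.seed(0)  # fixed seed so the sketch is repeatable

# A "pipe" aligned with the z axis: narrow in x and y, long in z.
pipe = [
    (random.gauss(0.0, 0.1), random.gauss(0.0, 0.1), random.uniform(-5.0, 5.0))
    for _ in range(500)
]


def spread(points, axis):
    """Standard deviation of the points along one axis (0=x, 1=y, 2=z)."""
    return pstdev([p[axis] for p in points])


sx, sy, sz = (spread(pipe, a) for a in range(3))
print(f"x-y projection: spread {sx:.2f} by {sy:.2f} (looks like a cluster)")
print(f"x-z projection: spread {sx:.2f} by {sz:.2f} (looks like a streak)")
```

If the same pipe were rotated so that it is not aligned with any axis, every 2D projection would show an elongated shape and no panel would show a compact cluster.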

23
Scatterplots: Recognizing High-dimensional
Structures - 2
[Figure: two 2D projections. A random (uniform) distribution in 2D may be
a plane in 3D. A cluster in 2D may be a pipe in 3D (or a cluster in 3D).]
24
Example Tool: Spotfire (http://www.spotfire.com/)
25
Example Tool: Spotfire (http://www.spotfire.com/)
  • The user interacts with data by choosing which
    features will form the horizontal and vertical
    axes
  • Other features can be represented by color
  • this is an example of using the richness of
    visual representations to provide more
    information to the user. As well as 2D spatial
    position, other modes such as colour, size, shape
    and even sound can be used to convey information
    about high-dimensional data
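
As an illustration of encoding a third feature as color, the sketch below maps a value onto a simple blue-to-red ramp. This is a generic example and not Spotfire's own color scheme or API. The function name `to_color` and the ramp itself are assumptions.

```python
def to_color(value, lo, hi):
    """Map value in [lo, hi] to an (r, g, b) tuple on a blue-to-red ramp."""
    if hi == lo:
        t = 0.0  # degenerate range: use the low end of the ramp
    else:
        t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp values outside the range
    return (round(255 * t), 0, round(255 * (1 - t)))


# Each point keeps its 2D position and gains a color from a third feature.
points = [(20, 74, 0.0), (21, 75, 10.0), (60, 10, 4.0)]
colored = [(x, y, to_color(z, 0.0, 10.0)) for x, y, z in points]
print(colored)
```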

26
Example Tool: Spotfire (http://www.spotfire.com/)
  • On the previous slide, the data set contains a
    3D cluster
  • The cluster can be seen, with its centre at around
    (20, 74)
  • all the points in the cluster are red, showing
    that it is a 3D cluster

27
Example Tool: DBMiner (http://www.dbminer.com/)
28
Example Tool: DBMiner (http://www.dbminer.com/)
  • DBMiner is an integrated data mining tool
  • It employs a data visualization known as a data
    cube (see On-Line Analytic Processing - OLAP)

29
Example Tool: DBMiner (http://www.dbminer.com/)
  • After creating a data cube, the user can apply a
    variety of data mining techniques to analyze the
    data further, including
  • association, classification, prediction and
    clustering, etc.
  • The figure on the preceding slide shows a data
    cube for a data set which has a 3D cluster of data
    instances in a 3D space
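
A data cube can be thought of as a count of records per cell, where each cell is a combination of binned values along the chosen dimensions. The sketch below is a generic illustration under that assumption and does not reproduce DBMiner's interface. The record values, the bin width and the name `build_cube` are made up.

```python
from collections import Counter


def build_cube(records, dims, bin_width):
    """Count records per cell, where a cell is a tuple of bin indices."""
    cube = Counter()
    for r in records:
        cell = tuple(int(r[d] // bin_width) for d in dims)
        cube[cell] += 1
    return cube


# Hypothetical data: three nearby records and one far away.
records = [
    {"x": 21, "y": 74, "z": 33},
    {"x": 22, "y": 76, "z": 35},
    {"x": 24, "y": 71, "z": 38},
    {"x": 90, "y": 5, "z": 60},
]
cube = build_cube(records, dims=("x", "y", "z"), bin_width=10)
print(cube.most_common(1))
```

A cell with a high count relative to its neighbours is a candidate 3D cluster, which is the kind of structure the figure on the preceding slide shows.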

30
Examples: Parallel Coordinates - 1
  • Uses the idea of mapping a point in a
    multidimensional feature space on to a number of
    parallel axes
  • Each feature is mapped to one axis
  • as many axes as needed can be lined up side by side
  • there is no limit to the number of dimensions
    that can be represented

31
Examples: Parallel Coordinates - 1
  • A single polygonal line connects the individual
    coordinate mappings for each point
  • The technique has been applied in air traffic
    control, robotics, computer vision and
    computational geometry
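
The mapping described above can be sketched as follows: each data point becomes a list of vertices, one per axis, and the polygonal line joins them in axis order. Scaling each feature to the range [0, 1] is an assumption made here so that all axes share the same height. The function name `polyline` and the sample data are illustrative.

```python
def polyline(point, mins, maxs):
    """Return the vertices (axis_index, height in [0, 1]) for one data point."""
    vertices = []
    for i, (v, lo, hi) in enumerate(zip(point, mins, maxs)):
        # A constant feature has no range, so place it mid-axis.
        height = 0.5 if hi == lo else (v - lo) / (hi - lo)
        vertices.append((i, height))
    return vertices


data = [
    (1.0, 20.0, 300.0, 4.0),
    (3.0, 10.0, 100.0, 8.0),
    (2.0, 15.0, 200.0, 6.0),
]
mins = [min(col) for col in zip(*data)]
maxs = [max(col) for col in zip(*data)]
lines = [polyline(p, mins, maxs) for p in data]
for line in lines:
    print(line)
```

Adding a feature only adds one more axis and one more vertex per line, which is why the number of dimensions is not limited by the technique itself.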

32
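Examples: Parallel Coordinates - 2
[Figure: parallel axes X1, X2, X3, ..., Xi-1, ..., Xn with a polygonal line
through the values C1, ..., Ci-1, Ci, ..., Cn]
  • Parallel axes for $R^N$. The polygonal line shown
    represents the point
    $C = (C_1, \ldots, C_{i-1}, C_i, C_{i+1}, \ldots, C_n)$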

33
Examples: Parallel Coordinates - 3
  • The Parallel Coordinates visualization technique
    is employed in the software WinViz
    (http://www.computer.org/intelligent/ex1996/x5069abs.htm)
  • The main advantage of the technique is that it
    can represent unlimited numbers of dimensions

34
Examples: Parallel Coordinates - 3
  • When many points are represented using the
    parallel coordinates, the overlap of the
    polygonal lines can make it difficult to identify
    structures in the data.
  • Certain structures, such as clusters, can often
    be identified but others are hidden due to the
    overlap.

35
Two Clusters In WinViz
36
Examples: Stick Figures
  • The stick figure technique is intended to make
    use of the user's low-level perceptual processes
    [PGL1995], such as perception of
  • texture, color, motion, and depth
  • The hope is that the user will automatically
    try to make physical sense of the pictures of the
    data created

37
Examples: Stick Figures
  • Visualizations which represent multidimensional
    feature spaces by using a number of subspaces of
    3D or less (e.g. scatterplots) rely more on our
    cognitive abilities than our perceptual abilities
  • Stick figures avoid this, and present all
    variables and data points in a single
    representation.

38
Iconographic display using stick figures - US
Census Data (http://ivpr.cs.uml.edu/gallery/)
42
Examples: Pixel-based techniques
(http://www.dbs.informatik.uni-muenchen.de/dbs/projekt/visdb/visdb.html)
  • Query-Dependent Pixel-based Techniques
  • based on a query, a semantic distance is
    calculated between each of the query feature
    values and the features of each instance in the
    DB
  • Distance is mapped to colour for each attribute
  • Overall distance between the data values for a
    specific instance and the data attribute values
    used in the predicate of the query is also
    calculated
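
A minimal sketch of these steps, under stated assumptions: the slides do not give the distance functions, so the per-attribute distance below is a normalized absolute difference and the overall distance is their plain mean. Both choices, and the names `attribute_distances` and `overall_distance`, are illustrative and are not taken from [KeK1994].

```python
def attribute_distances(instance, query, ranges):
    """Normalized absolute difference per attribute (an assumed distance)."""
    return {a: abs(instance[a] - query[a]) / ranges[a] for a in query}


def overall_distance(dists):
    """Assumed combination rule: the plain mean of the attribute distances."""
    return sum(dists.values()) / len(dists)


# Hypothetical query and instance.
query = {"age": 30, "income": 50000}
ranges = {"age": 50, "income": 100000}
instance = {"age": 40, "income": 50000}

dists = attribute_distances(instance, query, ranges)
print(dists, overall_distance(dists))
```

Each per-attribute distance would then be mapped to a color, and the overall distance would decide where the instance is placed on the screen.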

43
Examples: Pixel-based techniques
(http://www.dbs.informatik.uni-muenchen.de/dbs/projekt/visdb/visdb.html)
  • Instances are arranged on the screen, with the
    data items with highest relevance in the centre
    of the display, and then proceeding outwards in a
    spiral
  • the values for each of the attributes are
    presented in separate subwindows
  • the arrangement inside the subwindows is
    according to the overall distance

44
Query-Dependent Pixel-based Techniques
Overall Distance
  • Result of a complex query [KeK1994]

45
Examples: Worlds within Worlds
(http://www.cs.columbia.edu/graphics/projects/AutoVisual/AutoVisual.html)
  • Employs virtual reality devices to represent an
    nD virtual world in 3D or 4D-Hyperworlds
  • basic approach to reducing the complexity of a
    multidimensional function is to hold one or more
    of its independent variables constant
  • equivalent to taking an infinitely thin slice of
    the world perpendicular to the constant
    variable's axis
  • can be repeated until there are 3 dimensions and
    the resulting slice can be manipulated and
    displayed with conventional 3D graphics hardware
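
Holding variables constant corresponds to partially applying the function. The sketch below uses a made-up function of five independent variables and fixes three of them, leaving a function of two variables whose graph is a surface that fits in 3D. The function `f`, the helper name `slice_function` and the held values are all invented for illustration.

```python
from functools import partial


def f(x1, x2, x3, x4, x5):
    """A made-up function of five independent variables."""
    return x1 + 2 * x2 - x3 + x4 * x5


def slice_function(func, **held):
    """Hold the named variables constant and return the reduced function."""
    return partial(func, **held)


# Three variables held constant: the slice depends on x1 and x2 only.
g = slice_function(f, x3=0.5, x4=1.0, x5=2.0)
surface = [(x1, x2, g(x1, x2)) for x1 in (0.0, 1.0) for x2 in (0.0, 1.0)]
print(surface)
```

Changing one of the held values selects a different slice, which is what moving an inner world within its enclosing world would do in the display.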

46
Examples: Worlds within Worlds
(http://www.cs.columbia.edu/graphics/projects/AutoVisual/AutoVisual.html)
  • After reducing the higher-dimensional space to 3
    dimensions the additional dimensions can be added
    back, by adding additional 3D worlds within the
    first 3D world

47
Worlds within Worlds
48
Dynamic Techniques
  • Allow interaction with the visualization to
    explore the data more effectively. Can
    potentially be applied to all visualization
    techniques
  • Dynamic linking of the data attributes to the
    parameters of the visualization.
  • Filtering
  • Linking and brushing between multiple
    visualizations
  • Zooming
  • Details on demand
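
Two of these techniques, filtering and linking and brushing, can be sketched with plain record sets. The idea assumed here is that a brush in one view selects record ids, and every linked view highlights the same ids. The records and the function names are hypothetical.

```python
# Hypothetical records shared by two linked views.
records = [
    {"id": 0, "age": 25, "income": 30},
    {"id": 1, "age": 40, "income": 80},
    {"id": 2, "age": 60, "income": 55},
]


def filter_records(records, predicate):
    """Filtering: keep only the records that satisfy the predicate."""
    return [r for r in records if predicate(r)]


def brush(records, feature, lo, hi):
    """Brushing: ids of records whose feature lies in the brushed range."""
    return {r["id"] for r in records if lo <= r[feature] <= hi}


def highlight(records, selected):
    """Linking: flag the brushed ids so another view can highlight them."""
    return [(r["id"], r["id"] in selected) for r in records]


selected = brush(records, "age", 30, 65)  # brush made in an "age" view
linked = highlight(records, selected)     # same ids shown in an "income" view
high_income = filter_records(records, lambda r: r["income"] > 50)
print(selected, linked, [r["id"] for r in high_income])
```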

49
Other Techniques
  • Keim and Kriegel's query-independent approach
  • Chernoff faces
    (http://www.fas.harvard.edu/stats/Chernoff/Hcindex.htm)
  • Cone trees
  • Perspective walls
  • Visualization Spreadsheet
  • A number of techniques especially developed for
    web pages and their links

50
Web References
  • More lectures and demo software available at
  • http://www.cs.auc.dk/DVDM/courses.html