Title: Polaris: Query, Analysis, and Visualization of Large Hierarchical Relational Databases
1Polaris: Query, Analysis, and Visualization of
Large Hierarchical Relational Databases
- Chris Stolte
- Computer Science Department
- Stanford University
2Motivation
- Large relational databases have become very common
- Corporate data warehouses
- Amazon, Walmart,
- Scientific projects
- Human Genome Project
- Sloan Digital Sky Survey
- Need tools to extract meaning from these databases
- Programmatic data mining/statistical analysis
- Visual exploration and analysis
3Related Work
- Formalisms for graphics
- Bertin's Semiology of Graphics
- Mackinlay's APT
- Roth et al.'s Sage and SageBrush
- Wilkinson's Grammar of Graphics
- Visual exploration of databases
- DeVise
- DataSplash/Tioga-2
- Visualization and data mining
- SGI's MineSet
- IBM's Diamond
4Outline
- Review of Data Warehouses and Data Cubes
- A Visualization Formalism
- Polaris: Visual Data Mining
- Multiscale Visualization
5Review of Data Warehouses and Data Cubes
6Review: Data Warehouses
- Data warehouse stores data for analysis
- measures (facts) categorized by dimensions
Fact table (coffee chain data, courtesy Visual Insights)
Nominal / Ordinal fields (categorical dimensions): State, Month, Product Name
Quantitative fields (measures): Profit, Sales, Payroll, Marketing, Inventory, Margin, ...
7Hierarchies
- Data warehouses are very large: need to summarize
- Add hierarchical structure to warehouse
Dimension tables
Time: Year, Quarter, Month
Location: Market, State
Products: Product Type, Product Name
Fact table
State, Month, Product Name, Profit, Sales, Payroll, Marketing, Inventory, Margin, ...
8Hierarchical Dimensions
Time: Year, Quarter, Month
9Data Cube
- For each level-of-detail, summarize relations as cubes
- More efficient, powerful model for analysis
Each cell aggregates all measures for those
dimensions
Each cube axis corresponds to a dimension in the
relation at a level-of-detail
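A minimal Python sketch of how one cube could be computed from a fact table at a chosen level of detail. The sample rows, the `build_cube` name, and the use of SUM as the aggregate are assumptions made for illustration; the slides do not specify them.

```python
from collections import defaultdict

# Hypothetical fact rows: (state, month, product_name, profit, sales)
FACTS = [
    ("CA", "Jan", "Decaf Espresso", 10.0, 40.0),
    ("CA", "Jan", "Colombian", 5.0, 20.0),
    ("CA", "Feb", "Colombian", 7.0, 25.0),
    ("NY", "Jan", "Colombian", 3.0, 15.0),
]


def build_cube(facts, key_indexes, measure_indexes):
    """Sum every listed measure for each combination of the chosen dimensions."""
    cube = defaultdict(lambda: [0.0] * len(measure_indexes))
    for row in facts:
        cell = tuple(row[i] for i in key_indexes)  # one cube cell per key combination
        for slot, m in enumerate(measure_indexes):
            cube[cell][slot] += row[m]
    return {cell: tuple(totals) for cell, totals in cube.items()}


# Cube with axes (State, Month); each cell holds (total profit, total sales).
state_month_cube = build_cube(FACTS, key_indexes=(0, 1), measure_indexes=(3, 4))
```

Each key of `state_month_cube` is a cell address along the cube axes, and each value aggregates all measures for that cell.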
10Hierarchies and Data Cubes
Hierarchies define a lattice of cubes
Least detailed
Each cube is defined by a level-of-detail in each
dimension.
Data abstraction
Most detailed
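The lattice can be enumerated by picking one level from each dimension hierarchy. The sketch below uses the three hierarchies named on the earlier slides; the `cube_lattice` helper is hypothetical, and it assumes the lattice has no extra "all" level above the top of each hierarchy.

```python
from itertools import product

# Hierarchies as listed on the slides, ordered from least to most detailed.
HIERARCHIES = {
    "Time": ["Year", "Quarter", "Month"],
    "Location": ["Market", "State"],
    "Products": ["Product Type", "Product Name"],
}


def cube_lattice(hierarchies):
    """Return one {dimension: level} mapping per cube in the lattice."""
    names = list(hierarchies)
    level_lists = [hierarchies[name] for name in names]
    return [dict(zip(names, levels)) for levels in product(*level_lists)]


lattice = cube_lattice(HIERARCHIES)
least_detailed = lattice[0]   # top level of every hierarchy
most_detailed = lattice[-1]   # bottom level of every hierarchy
```

With these hierarchies the lattice contains 3 x 2 x 2 = 12 cubes.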
11Projecting Data Cubes
Can further abstract a cube by projection
Data abstraction
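Projection can be sketched as dropping cube axes and re-aggregating over them. This sketch assumes an additive measure (summed Profit); the cube contents and the `project` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical cube of summed Profit with axes (State, Quarter, Product Type).
PROFIT_CUBE = {
    ("CA", "Qtr1", "Coffee"): 10.0,
    ("CA", "Qtr1", "Tea"): 4.0,
    ("CA", "Qtr2", "Coffee"): 6.0,
    ("NY", "Qtr1", "Coffee"): 3.0,
}


def project(cube, keep):
    """Keep only the axes at positions `keep`, summing over the dropped axes."""
    projected = defaultdict(float)
    for cell, value in cube.items():
        projected[tuple(cell[i] for i in keep)] += value
    return dict(projected)


by_state = project(PROFIT_CUBE, keep=(0,))
by_state_quarter = project(PROFIT_CUBE, keep=(0, 1))
```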
12Data Warehouse Summary
- Industry standard for storing analytic data
- Not operational or transactional data
- Structured as a lattice of data cubes (aggregations)
- Provide summaries of data at meaningful levels of detail
- To perform data abstraction
- Choose a cube in the lattice of cubes
- Project to relevant dimensions
- Where a lot of important data is stored
13A Visualization Formalism
14A Visualization Formalism
- Typical approach
- Monolithic objects defining a single visual metaphor
- Formalism
- Defines a space of visualizations and unifies tables and different graphs as one class of visual representation
- Succinct specification of sophisticated visualizations
- Can be compiled into necessary drawing operations and database queries
- Exposes structure and pattern of effective visual metaphors
- Powerful tool for describing, comparing, and building visualizations
20Polaris Formalism
- Visualizations described using visual specifications that define
- Table configuration for visualization (algebra)
- Type of graphic in each pane
- Encoding of data as visual properties (color, size, shape, ...) of marks
- Data transformations and queries
- Interpreter compiles a specification into drawing
commands and database queries
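As a rough illustration of the "compile into database queries" step, the sketch below turns a tiny specification (ordinal fields plus measures) into a SQL GROUP BY query and runs it with Python's built-in `sqlite3`. The table name `sales_fact`, the column names, the `compile_query` helper, and the choice of SUM are all assumptions; the actual Polaris interpreter is not shown in the slides.

```python
import sqlite3


def compile_query(table, group_fields, measures):
    """Build a GROUP BY query: one result row per combination of ordinal fields.

    Identifiers are interpolated directly, so this sketch assumes trusted names.
    """
    select = list(group_fields) + [f"SUM({m}) AS total_{m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {table}"
    if group_fields:
        keys = ", ".join(group_fields)
        sql += f" GROUP BY {keys} ORDER BY {keys}"
    return sql


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (quarter TEXT, product_type TEXT, profit REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [
        ("Qtr1", "Coffee", 10.0),
        ("Qtr1", "Coffee", 5.0),
        ("Qtr1", "Tea", 2.0),
        ("Qtr2", "Coffee", 4.0),
    ],
)

sql = compile_query("sales_fact", ["quarter", "product_type"], ["profit"])
rows = conn.execute(sql).fetchall()
```

Each returned row corresponds to one combination of the ordinal fields, which is the granularity a table of panes would need.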
21Polaris Algebra Operands
- Ordinal fields: interpret domain as a set that partitions table into rows and columns
- Quarter = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)}
- Quantitative fields: treat domain as single element set and encode spatially as axes
- Profit = {(Profit)}
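The two operand kinds can be modeled as ordered lists of tuples. This is only a sketch: the `ordinal` and `quantitative` helper names are hypothetical, and lists stand in for ordered sets.

```python
def ordinal(domain):
    """Ordinal field: one single-element tuple per domain member, in domain order."""
    return [(member,) for member in domain]


def quantitative(field_name):
    """Quantitative field: a single-element set holding the field itself (an axis)."""
    return [(field_name,)]


quarter = ordinal(["Qtr1", "Qtr2", "Qtr3", "Qtr4"])
profit = quantitative("Profit")
```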
22Concatenation (+) Operator
- Ordered union of two sets
- Quarter + ProductType
- = {(Qtr1),(Qtr2),(Qtr3),(Qtr4)} + {(Coffee),(Espresso)}
- = {(Qtr1),(Qtr2),(Qtr3),(Qtr4),(Coffee),(Espresso)}
- Profit + Sales
- = {(Profit),(Sales)}
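The concatenation operator can be sketched as list concatenation over ordered sets of tuples. The `concat` name is hypothetical, and the sketch assumes the two operands are disjoint, as they are in the slide's examples.

```python
def concat(left, right):
    """Ordered union: every entry of `left`, followed by every entry of `right`."""
    return list(left) + list(right)


quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Espresso",)]

quarter_then_product = concat(quarter, product_type)
profit_then_sales = concat([("Profit",)], [("Sales",)])
```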
23Cross (×) Operator
- Direct-product of two sets
- Quarter × ProductType
- = {(Qtr1, Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea),
(Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4, Tea)}
- ProductType × Profit
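The cross operator can be sketched with `itertools.product`, joining each pair of entries into one longer tuple. The `cross` helper is a hypothetical name, and lists stand in for ordered sets.

```python
from itertools import product


def cross(left, right):
    """Direct product: pair every entry of `left` with every entry of `right`."""
    return [a + b for a, b in product(left, right)]


quarter = [("Qtr1",), ("Qtr2",), ("Qtr3",), ("Qtr4",)]
product_type = [("Coffee",), ("Tea",)]
profit = [("Profit",)]

quarter_by_product = cross(quarter, product_type)
product_by_profit = cross(product_type, profit)
```

Crossing an ordinal field with a quantitative one yields one entry per ordinal value, each carrying the quantitative field as its axis.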
24Categorical Hierarchies
- Quarter × Month
- Direct product of two sets
- Would create twelve entries for each quarter, including meaningless ones such as (Qtr1, December)
- Quarter / Month
- Based on tuples in fact table, not semantics
- Would only create three entries per quarter
- Can be expensive to compute
- Quarter . Month
- Based on tuples in dimension tables
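The difference between the three ways of combining Quarter and Month can be sketched as follows. The dimension table, the fact rows, and the `distinct_pairs` helper are hypothetical; the fact rows are deliberately sparse to show that the fact-table variant only sees combinations that occur in the data.

```python
from itertools import product

QUARTERS = ["Qtr1", "Qtr2", "Qtr3", "Qtr4"]
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Hypothetical dimension table: each month paired with its quarter.
TIME_DIMENSION = [(QUARTERS[i // 3], month) for i, month in enumerate(MONTHS)]

# Hypothetical (quarter, month) columns of a sparse fact table.
FACT_ROWS = [("Qtr1", "January"), ("Qtr1", "March"), ("Qtr1", "January"), ("Qtr2", "April")]


def distinct_pairs(rows):
    """Distinct (quarter, month) pairs in first-seen order."""
    return list(dict.fromkeys(rows))


crossed = list(product(QUARTERS, MONTHS))   # direct product: every combination
nested = distinct_pairs(FACT_ROWS)          # fact-table based: must scan the facts
dotted = distinct_pairs(TIME_DIMENSION)     # dimension-table based: small lookup
```

Scanning the fact table is what makes the fact-based variant potentially expensive, while the dimension table is small and already encodes the hierarchy.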
25Encoding System
<color encoding><measure name="Profit"></color encoding>
26SQL Dataflow
Sort
Query Results
Tuples in Panes
Marks in Panes
- Notes
- Aggregation operators applied after sort
- Only one layer is shown; additional z-sort
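The stages named on this slide (sort, tuples in panes, marks in panes) can be sketched as below. The result rows, the choice of pane key, and the mark format are assumptions made for illustration, and the sketch leaves out the aggregation and z-sort steps mentioned in the notes.

```python
from itertools import groupby

# Hypothetical query results: (quarter, product_type, sales, profit).
QUERY_RESULTS = [
    ("Qtr2", "Coffee", 30.0, 9.0),
    ("Qtr1", "Tea", 12.0, 2.0),
    ("Qtr1", "Coffee", 40.0, 15.0),
    ("Qtr1", "Coffee", 25.0, 6.0),
]


def pane_key(row):
    """Assumed pane address: (table row, table column)."""
    return (row[0], row[1])


def sort_into_panes(rows):
    """Sort the result tuples, then group them by the pane they belong to."""
    ordered = sorted(rows, key=pane_key)
    return {key: list(group) for key, group in groupby(ordered, key=pane_key)}


def to_marks(tuples):
    """Map each tuple in a pane to one mark positioned by (sales, profit)."""
    return [{"x": sales, "y": profit} for _, _, sales, profit in tuples]


panes = sort_into_panes(QUERY_RESULTS)
marks = {key: to_marks(tuples) for key, tuples in panes.items()}
```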
27Polaris: Visual Data Mining
28The Pivot Table Interface
- Common interface to statistical packages/Excel
- Cross-tabulations
- Simple interface based on drag-and-drop
29Extending the Pivot Table Interface
- Extend the interface by
- Generating rich table-based graphical displays rather than tables of text
- Providing a single conceptual model for both graphs and tables
- Preserving the ability to rapidly construct displays
30Polaris Design Goals
- Design guided by two primary goals
- Interactive analysis and exploration versus static visualization
- Simple, consistent interface
31Analysis and Exploration Challenges
- Analysis and exploration place several requirements on the user interface
- Data dense displays: display both many tuples and many dimensions
- Multiple display types: different displays suited to different tasks
- Exploratory interfaces: rapidly change data transformations and views
32Simple, Consistent Interface
- Excel Pivot tables provide a simple interface for building text-based tables
- Graphs require multiple steps and different interfaces and conceptual models
- Want to unify tables, graphs, and database queries in one interface
33Polaris Demo!
34Data Mining and Visualization
- Polaris not solely for visual analysis
- Precursor to algorithmic analysis
- Validate results and establish trust
- Incorporate decision trees and classification
algorithms into data warehouses as hierarchies
35Multiscale Visualization
36Multiscale Visualization
- Directly support analysis process
- Overview first, zoom and filter, then details-on-demand
- Visual representation changes as user pans and zooms
- Overview, lots of data → data highly abstracted
- Zoom, data density decreases → more detailed information shown
- Visual and data abstraction
- Visual abstraction: different representation/same data
- Data abstraction: transformations to reduce data set size
37Existing Multiscale Visualizations
- Cartography
- Multiscale information visualization
- Pad: alternate desktops
- DataSplash
- XmdvTool
- ADVIZOR
- Main limitations
- One zoom path
- Primarily visual abstraction
38Contributions
- Multiscale visualization with both visual and data abstraction using generalized mechanisms
- Data Abstraction → Data Cubes
- Visual Abstraction → Polaris
- Design Patterns
39Path of Exploration
- Can think of an analysis as a path of specifications
40Path of Exploration
Visual abstraction
41Path of Exploration
This is a multiscale visualization!
Data abstraction
42Graphical Notation
43Graphical Notation: Templates
Instance
Template
44Specifying Multiscale Visualizations
- Specify multiscale visualization using a graph of Polaris specifications: zoom graphs
- Infovis 2002 paper describes how to implement
Zoom graph legend: node = Polaris specification, edge = possible zoom
Zooming
45Specifying Multiscale Visualizations
- Can specify a zooming pattern by using templates
46Specifying Multiscale Visualizations
- Independent zooming on different dimensions is
described as a graph
y-axis zoom
x-axis zoom
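A zoom graph with independent x-axis and y-axis zooming can be sketched as a dictionary of nodes and labeled edges. In this sketch each node stands for one Polaris specification, identified only by the level of detail on each axis; the level names, the `build_zoom_graph` helper, and the edge labels are illustrative assumptions.

```python
# Hypothetical levels of detail available on each axis, coarse to fine.
X_LEVELS = ["Year", "Quarter", "Month"]
Y_LEVELS = ["Market", "State"]


def build_zoom_graph(x_levels, y_levels):
    """Map each (x level, y level) node to its outgoing zoom edges."""
    graph = {}
    for xi, x in enumerate(x_levels):
        for yi, y in enumerate(y_levels):
            edges = {}
            if xi + 1 < len(x_levels):
                edges["x-axis zoom"] = (x_levels[xi + 1], y)  # more detail on x only
            if yi + 1 < len(y_levels):
                edges["y-axis zoom"] = (x, y_levels[yi + 1])  # more detail on y only
            graph[(x, y)] = edges
    return graph


zoom_graph = build_zoom_graph(X_LEVELS, Y_LEVELS)
```

Because each edge changes one axis only, a user can reach the most detailed node along several different zoom paths.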
47Design Patterns
48Design Patterns
- Zoom graphs simplify specifying and implementing multiscale visualizations
- Design is still very hard
- Design patterns (a la Gamma et al.)
- Capture zoom structures that have been used effectively; reuse in new designs
- We present four such patterns
- Formal way to discuss multiscale visualization
49Thematic Maps
50Thematic Maps
51Thematic Maps
52Thematic Maps
53Chart Stacks
54Chart Stacks
55Chart Stacks
56Chart Stacks
57Matrices
58Matrices
59Matrices
60Matrices
61Dependent QQ Plots
62Summary
- Polaris
- Spreadsheet for table-based displays
- Simple drag-and-drop interface
- Built on a formalism that allows algebraic manipulation of visual mapping of tuples to marks
- Multiscale visualizations using data and visual abstraction
- Connects to SQL/MDX servers
- See http://www.graphics.stanford.edu/projects/polaris
63Future Work
- Articulate full set of multiscale design patterns
- Transition between levels of detail
- Develop system infrastructure for browsing VLDB
- Support layers/lenses/linking with tuple flow
- Device independence through graphical encodings
- Extend formalism to 3D
- Couple scientific and information visualization