Title: Polaris: Query, Analysis, and Visualization of Large Hierarchical Relational Databases
- Pat Hanrahan
- With Chris Stolte and Diane Tang
- Computer Science Department
- Stanford University
2 Motivation
- Large databases have become very common
  - Corporate data warehouses
    - Amazon, Walmart, ...
  - Scientific projects
    - Human Genome Project
    - Sloan Digital Sky Survey
- Need tools to extract meaning from these databases
3 Related Work
- Formalisms for graphics
  - Bertin's Semiology of Graphics
  - Mackinlay's APT
  - Roth et al.'s Sage and SageBrush
  - Wilkinson's Grammar of Graphics
- Visual exploration of databases
  - DEVise
  - DataSplash/Tioga-2
- Visualization and data mining
  - SGI's MineSet
  - IBM's Diamond
4 Formalism
5 Polaris Formalism
- UI interpreted as a visual specification that defines
  - Table configuration
  - Type of graphic in each pane
  - Encoding of data as visual properties of marks
  - Data transformations and queries
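As a rough illustration only, the four parts of the visual specification could be held in one plain data structure. The class name `VisualSpec`, its field names, and the string form of the shelf expressions are hypothetical and are not Polaris's actual internal representation.

```python
from dataclasses import dataclass, field


@dataclass
class VisualSpec:
    """Hypothetical container for the four parts of a Polaris-style specification."""

    rows: str  # table configuration: row-shelf table-algebra expression
    columns: str  # table configuration: column-shelf table-algebra expression
    mark: str = "bar"  # type of graphic drawn in each pane
    encodings: dict = field(default_factory=dict)  # visual property -> data field
    filters: list = field(default_factory=list)  # data transformations and queries


spec = VisualSpec(
    rows="Profit",
    columns="Quarter",
    encodings={"color": "ProductType"},
    filters=["Profit > 0"],
)
print(spec.mark, spec.encodings["color"])  # bar ProductType
```

Keeping the specification as data rather than as drawing code is what would let a UI edit it by drag-and-drop and an interpreter turn it into queries and marks.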
6 Schema
Coffee chain data (Visual Insights)
- Ordinal fields (categorical): Market, State, Year, Quarter, Month, Product Type, Product
- Quantitative fields (measures): Profit, Sales, Payroll, Marketing, Inventory, Margin, COGS, ...
7 Polaris Visual Encodings
Principle of importance ordering: encode the most important information in the most effective way.
(Effectiveness ranking after Cleveland and McGill)
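The principle can be sketched as a greedy assignment: order the fields by importance and hand out encodings from most to least effective. The ranking below is a simplified version of Cleveland and McGill's ordering for quantitative data, and the function name `assign_encodings` is hypothetical; neither Polaris nor APT necessarily works this way.

```python
# Simplified ranking of quantitative encodings, most to least accurate.
# In Cleveland and McGill's ordering, length and angle share a rank, as do
# volume and curvature; they are listed one after another here.
RANKING = [
    "position (common scale)",
    "position (nonaligned scales)",
    "length",
    "angle",
    "area",
    "volume",
    "color saturation",
]


def assign_encodings(fields_by_importance):
    """Pair the i-th most important field with the i-th most effective encoding."""
    if len(fields_by_importance) > len(RANKING):
        raise ValueError("more fields than available encodings")
    return dict(zip(fields_by_importance, RANKING))


print(assign_encodings(["Profit", "Sales", "Margin"]))
# {'Profit': 'position (common scale)', 'Sales': 'position (nonaligned scales)', 'Margin': 'length'}
```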
8 The Pivot Table Interface
- Common interface in statistical packages and Excel
- Cross-tabulations
- Simple interface based on drag-and-drop
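A cross-tabulation groups records by one categorical field along the rows and another along the columns, then aggregates a measure in each cell. The sketch below sums Sales by Quarter and ProductType; the rows are made-up toy values, not the coffee-chain dataset, and summing is an assumed choice of aggregate.

```python
from collections import defaultdict

# Toy records (Quarter, ProductType, Sales); values are invented for illustration.
rows = [
    ("Qtr1", "Coffee", 10),
    ("Qtr1", "Tea", 4),
    ("Qtr2", "Coffee", 12),
    ("Qtr2", "Coffee", 3),
    ("Qtr2", "Tea", 5),
]


def crosstab(records):
    """Sum Sales for each (Quarter row, ProductType column) cell."""
    table = defaultdict(lambda: defaultdict(int))
    for quarter, product_type, sales in records:
        table[quarter][product_type] += sales
    return {q: dict(cols) for q, cols in table.items()}


for quarter, cells in crosstab(rows).items():
    print(quarter, cells)
# Qtr1 {'Coffee': 10, 'Tea': 4}
# Qtr2 {'Coffee': 15, 'Tea': 5}
```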
10 Data Cubes
- Structure the relation as an n-dimensional cube
- Each cube axis corresponds to a dimension in the relation
- Each cell aggregates all measures for its combination of dimension values
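A minimal in-memory sketch of the cube: each distinct combination of dimension values is one cell, keyed by a tuple with one coordinate per axis, and every measure is aggregated in that cell. Summation as the aggregate, the chosen dimensions, and the toy rows are assumptions for illustration.

```python
from collections import defaultdict

DIMENSIONS = ("State", "Month", "Product")  # one cube axis per dimension
MEASURES = ("Profit", "Sales")

# Toy relation; values are made up.
relation = [
    {"State": "CA", "Month": "Jan", "Product": "Latte", "Profit": 5, "Sales": 20},
    {"State": "CA", "Month": "Jan", "Product": "Latte", "Profit": 3, "Sales": 10},
    {"State": "NY", "Month": "Feb", "Product": "Mocha", "Profit": 2, "Sales": 8},
]


def build_cube(rows, dims=DIMENSIONS, measures=MEASURES):
    """Map each combination of dimension values to the summed measures."""
    cube = defaultdict(lambda: dict.fromkeys(measures, 0))
    for row in rows:
        key = tuple(row[d] for d in dims)  # cell coordinates
        cell = cube[key]
        for m in measures:
            cell[m] += row[m]
    return dict(cube)


cube = build_cube(relation)
print(cube[("CA", "Jan", "Latte")])  # {'Profit': 8, 'Sales': 30}
```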
11 Table Algebra Operands
- Ordinal fields interpret their domain as a set that partitions the table into rows and columns
  - Quarter = {(Qtr1), (Qtr2), (Qtr3), (Qtr4)}
- Quantitative fields treat their domain as a single-element set and are encoded spatially as axes
  - Profit = {(Profit)}
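The two operand interpretations can be written as functions that return a set of tuples. Here Python lists stand in for ordered sets so that domain order is kept; that representation and the names `ordinal` and `quantitative` are assumptions for illustration.

```python
def ordinal(domain):
    """Ordinal operand: one 1-tuple per domain value, in domain order."""
    return [(value,) for value in domain]


def quantitative(name):
    """Quantitative operand: a single-element set containing the field itself."""
    return [(name,)]


print(ordinal(["Qtr1", "Qtr2", "Qtr3", "Qtr4"]))  # [('Qtr1',), ('Qtr2',), ('Qtr3',), ('Qtr4',)]
print(quantitative("Profit"))  # [('Profit',)]
```

The ordinal set yields four row or column headers; the quantitative set yields a single entry, which becomes one axis rather than one header per value.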
12 Concatenation (+) Operator
- Ordered union of two sets
- Quarter + ProductType
  = {(Qtr1), (Qtr2), (Qtr3), (Qtr4)} + {(Coffee), (Espresso)}
  = {(Qtr1), (Qtr2), (Qtr3), (Qtr4), (Coffee), (Espresso)}
- Profit + Sales
  = {(Profit), (Sales)}
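A sketch of concatenation as an ordered union over list-based ordered sets. Dropping tuples of the right operand that already appear in the left is an assumption that follows from union semantics; the function names are hypothetical.

```python
def ordinal(domain):
    """Ordinal operand: one 1-tuple per domain value, in domain order."""
    return [(value,) for value in domain]


def concat(a, b):
    """Ordered union: every tuple of a, then the tuples of b not already in a."""
    return a + [t for t in b if t not in a]


quarter = ordinal(["Qtr1", "Qtr2", "Qtr3", "Qtr4"])
product_type = ordinal(["Coffee", "Espresso"])
print(concat(quarter, product_type))
# [('Qtr1',), ('Qtr2',), ('Qtr3',), ('Qtr4',), ('Coffee',), ('Espresso',)]
print(concat([("Profit",)], [("Sales",)]))  # [('Profit',), ('Sales',)]
```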
13 Cross (×) Operator
- Direct product of two sets
- Quarter × ProductType
  = {(Qtr1, Coffee), (Qtr1, Tea), (Qtr2, Coffee), (Qtr2, Tea), (Qtr3, Coffee), (Qtr3, Tea), (Qtr4, Coffee), (Qtr4, Tea)}
- ProductType × Profit
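The cross operator can be sketched with `itertools.product`, concatenating each pair of tuples so that the left operand varies slowest, matching the order of the pairs listed above. Lists as ordered sets and the function names are assumptions for illustration.

```python
from itertools import product


def ordinal(domain):
    """Ordinal operand: one 1-tuple per domain value, in domain order."""
    return [(value,) for value in domain]


def cross(a, b):
    """Direct product: join every tuple of a with every tuple of b, a varying slowest."""
    return [x + y for x, y in product(a, b)]


quarter = ordinal(["Qtr1", "Qtr2", "Qtr3", "Qtr4"])
product_type = ordinal(["Coffee", "Tea"])
print(cross(quarter, product_type))  # 8 pairs, ('Qtr1', 'Coffee') through ('Qtr4', 'Tea')
print(cross(product_type, [("Profit",)]))  # [('Coffee', 'Profit'), ('Tea', 'Profit')]
```

Because a quantitative operand contributes a single element, crossing ProductType with Profit yields one tuple per product type, which read as a table configuration would presumably give each product type its own Profit axis.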
18 SQL Dataflow
- Notes
  - Aggregation operators are applied after the sort
  - Only one layer is shown; multiple layers require an additional z-sort
[Figure: dataflow diagram with stages Relational Table, Sort, Tuples in Panes, Marks in Panes]
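The talk describes these stages as an SQL dataflow; the sketch below imitates them in memory instead, so it is only an analogy. It sorts tuples by their pane key, groups them into panes, aggregates after the sort as the notes say, and emits one mark per pane. The pane key (ProductType, Quarter), the sum aggregate, the bar mark, and the toy rows are all assumptions.

```python
from itertools import groupby
from operator import itemgetter

# Toy relational table: (ProductType, Quarter, Profit); values are made up.
table = [
    ("Tea", "Qtr2", 4),
    ("Coffee", "Qtr1", 7),
    ("Coffee", "Qtr1", 2),
    ("Tea", "Qtr1", 1),
]


def dataflow(rows):
    pane_key = itemgetter(0, 1)  # (row value, column value) selects the pane
    ordered = sorted(rows, key=pane_key)  # 1. sort tuples by pane
    panes = {k: list(g) for k, g in groupby(ordered, key=pane_key)}  # 2. tuples in panes
    # 3. aggregate after the sort: one summed Profit per pane
    aggregated = {k: sum(r[2] for r in g) for k, g in panes.items()}
    # 4. marks in panes: one hypothetical bar per pane, length = summed Profit
    return [{"pane": k, "mark": "bar", "length": v} for k, v in aggregated.items()]


for mark in dataflow(table):
    print(mark)
# {'pane': ('Coffee', 'Qtr1'), 'mark': 'bar', 'length': 9}
# {'pane': ('Tea', 'Qtr1'), 'mark': 'bar', 'length': 1}
# {'pane': ('Tea', 'Qtr2'), 'mark': 'bar', 'length': 4}
```

Sorting first matters because `groupby` only merges adjacent tuples with equal keys.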
19 Multiscale Visualization
20 Hierarchical Structure
- Challenge: these databases are very large
  - Queries and visualizations should not require all the records
- Augment the database with hierarchical structure
  - Provides meaningful levels of abstraction
  - Derived from the domain or by clustering
  - Provides metadata (context for missing data)
21 Hierarchies and Data Cubes
- Each dimension in the cube is structured as a tree
- Each level in the tree corresponds to a level of detail
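A sketch of one dimension stored as a tree of parent links, with a roll-up that moves leaf-level measures to a coarser level of detail. The Time hierarchy Year, Quarter, Month comes from the schema; the member names (including the year label `Y1`), the sales values, and the function names are made up for illustration.

```python
# Toy Time hierarchy stored as parent links: Month -> Quarter -> Year.
PARENT = {
    "Jan": "Qtr1", "Feb": "Qtr1", "Mar": "Qtr1",
    "Apr": "Qtr2", "May": "Qtr2", "Jun": "Qtr2",
    "Qtr1": "Y1", "Qtr2": "Y1",
}
LEVELS = ["Year", "Quarter", "Month"]  # root to leaf


def ancestor(member, from_level, to_level):
    """Walk up the tree from a member at from_level to its ancestor at to_level."""
    steps = LEVELS.index(from_level) - LEVELS.index(to_level)
    if steps < 0:
        raise ValueError("can only roll up toward the root")
    for _ in range(steps):
        member = PARENT[member]
    return member


def roll_up(month_values, to_level):
    """Aggregate Month-level measures to a coarser level of detail."""
    totals = {}
    for month, value in month_values.items():
        key = ancestor(month, "Month", to_level)
        totals[key] = totals.get(key, 0) + value
    return totals


sales = {"Jan": 3, "Feb": 4, "Apr": 5}
print(roll_up(sales, "Quarter"))  # {'Qtr1': 7, 'Qtr2': 5}
print(roll_up(sales, "Year"))  # {'Y1': 12}
```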
22 Schema: Star Schema
- Fact table: keys State, Month, Product; measures Profit, Sales, Payroll, Marketing, Inventory, Margin, ...
- Existence tables (one per dimension)
  - Location: Market, State
  - Time: Year, Quarter, Month
  - Products: Product Type, Product Name
- Generalizations
  - Snowflake schemas
  - Lattices (DAGs)
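A sketch of how the star schema is used: each fact row carries only leaf-level keys, and joining it to the dimension tables supplies the coarser attributes (Market, Product Type) that can then be grouped on. The dictionaries, state, market, and product values are made up, and the helper names `join` and `total_profit_by` are hypothetical.

```python
# Toy dimension tables: leaf key -> parent attribute.
LOCATION = {"California": "West", "Oregon": "West", "New York": "East"}  # State -> Market
PRODUCTS = {"Latte": "Espresso", "Earl Grey": "Tea"}  # Product -> Product Type

# Toy fact table: (State, Month, Product, Profit).
FACTS = [
    ("California", "Jan", "Latte", 5),
    ("Oregon", "Jan", "Earl Grey", 2),
    ("New York", "Feb", "Latte", 4),
]


def join(fact):
    """Star join: widen one fact row with attributes from the dimension tables."""
    state, month, product, profit = fact
    return {
        "State": state, "Market": LOCATION[state],
        "Month": month,
        "Product": product, "ProductType": PRODUCTS[product],
        "Profit": profit,
    }


def total_profit_by(attribute):
    """Group the joined rows by one attribute and sum Profit."""
    totals = {}
    for row in map(join, FACTS):
        totals[row[attribute]] = totals.get(row[attribute], 0) + row["Profit"]
    return totals


print(total_profit_by("Market"))  # {'West': 7, 'East': 4}
print(total_profit_by("ProductType"))  # {'Espresso': 9, 'Tea': 2}
```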
23 Categorical Hierarchies
- Quarter × Month
  - Direct product of two sets
  - Would create twelve entries for each quarter, e.g. (Qtr1, December)
- Quarter / Month
  - Based on tuples in the database, not on semantics
  - Would only create three entries per quarter
  - Can be expensive to compute
- Quarter . Month
  - Based on tuples in the existence tables (not the database)
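The three ways of combining Quarter and Month can be compared directly. Cross enumerates every pair, nest keeps only pairs that occur in the fact tuples (which requires scanning the data), and dot reads the pairs from the Time existence table. The toy fact list, which deliberately has data for only three months, and the function names are assumptions for illustration.

```python
from itertools import product

QUARTERS = ["Qtr1", "Qtr2", "Qtr3", "Qtr4"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Existence table for the Time dimension: which (Quarter, Month) pairs exist.
EXISTENCE = [(QUARTERS[i // 3], m) for i, m in enumerate(MONTHS)]

# Toy fact tuples projected to (Quarter, Month); only Jan, Feb, Apr have data.
FACTS = [("Qtr1", "Jan"), ("Qtr1", "Feb"), ("Qtr1", "Jan"), ("Qtr2", "Apr")]


def cross(a, b):
    """Quarter x Month: every combination, meaningful or not."""
    return list(product(a, b))


def nest(a, b, facts):
    """Quarter / Month: only combinations that occur in the database tuples."""
    present = set(facts)  # needs a pass over the data, hence potentially costly
    return [pair for pair in product(a, b) if pair in present]


def dot(existence):
    """Quarter . Month: combinations listed in the existence table."""
    return list(existence)


print(len(cross(QUARTERS, MONTHS)))  # 48
print(nest(QUARTERS, MONTHS, FACTS))  # [('Qtr1', 'Jan'), ('Qtr1', 'Feb'), ('Qtr2', 'Apr')]
print(len(dot(EXISTENCE)))  # 12
```

With complete data, nest would also yield three months per quarter. Here the sparse toy facts make nest return only the pairs it sees, while dot still lists all twelve months because it never consults the database.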
24 Cartographic Generalization
[Figure: maps of Canterbury and East Kent at scales 1:50,000 and 1:625,000]
25 Generalization Techniques
- Selection
- Simplification
- Exaggeration
- Regularization
- Displacement
- Aggregation
30 Summary
- Polaris
  - Spreadsheet- or table-based displays
  - Simple drag-and-drop interface
  - Built on a formalism that allows algebraic manipulation of the visual mapping of tuples to marks
  - Multiscale visualizations using data and visual abstraction
  - Connects to SQL/MDX servers
- See http://www.graphics.stanford.edu/projects/polaris
31 Future Work
- Articulate the full set of multiscale design patterns
- Transition between levels of detail
- Develop system infrastructure for browsing VLDB
- Support layers/lenses/linking with tuple flow
- Device independence through graphical encodings
- Extend formalism to 3D
- Couple scientific and information visualization