Title: ON-LINE ANALYTICAL PROCESSING
1ON-LINE ANALYTICAL PROCESSING
- Prof. Navneet Goyal
- Department of Computer Science Information
Systems - BITS, Pilani
2Objectives
- What is OLAP
- Need for OLAP
- Features functions of OLAP
- Different OLAP models
- OLAP implementations
3OLAP
- Term coined in mid 1990s
- Main Goal support ad-hoc but complex querying by
business analysts - Extends worksheet like analysis to work with huge
amounts of data in a DW
4Demand for OLAP
- 2 approaches to developing EDWs
- In both approaches, Data Marts rest on
Dimensional Model - Data Marts are sufficient for basic data analysis
- Users need to go beyond such basic analysis
5Demand for OLAP
- Need for Multidimensional Analysis
- Fast Access Powerful Calculations
- Limitations of other analysis methods like
- SQL
- Spreadsheets
- Report Writers
6Demand for OLAP
- Traditional tools of report writers, query
products, spreadsheets, language interfaces do
not match the user expectations as far as
performing multidimensional analysis with complex
calculations is concerned. - Tools used with OLTP and basic DW environments do
not match up to the task
7OLAP is the Answer!
- OLAP is a category of software technology that
enables analysts, managers, and executives to
gain insight into the data through fast,
consistent, interactive, access in a wide variety
of possible views of information that has been
transformed from raw data to reflect the real
dimensionality of the enterprise as understood by
the user.
8What is OLAP?
OLAP software provides the ability to analyze
large volumes of information to improve decision
making at all levels of an organization.
9What is OLAP?
A wide spectrum of multidimensional analysis
involving intricate calculations and requiring
fast response times.
10What is OLAP?
OLAP has two immediate consequences online part
requires the answers of queries to be fast, the
analytical part is a hint that the queries itself
are complex i.e., Complex questions with Fast
Answers!
11Why a separate OLAP tool?
- Empowers end users to do own analysis
- Frees up IS backlog of report requests
- Ease of use
- No knowledge of tables or SQL required
12OLAP Characteristics
- Multi-user environment
- Client-server architecture
- Rapid response to queries, regardless of DB
size and complexity
13Data Warehouse OLAP
- OLAP is a software system that works on top of a
DW - A front-end tool for a DW
- Information delivery system for the DW
- Compliments the information delivery capacities
of a DW
14Why is OLAP useful?
- Facilitates multidimensional data analysis by
pre-computing aggregates across many sets of
dimensions - Provides for
- Greater speed and responsiveness
- Improved user interactivity
15The OLAP Market
16The OLAP Market
17Data Warehouses
- A data warehouse is based on a multidimensional
data model which views data in the form of a data
cube - A data cube allows data to be modeled and viewed
in multiple dimensions - In data warehousing literature, an n-D base cube
is called a base cuboid. The top most 0-D cuboid,
which holds the highest-level of summarization,
is called the apex cuboid. The lattice of
cuboids forms a data cube.
18Lattice of Cuboids
19CUBE
Multi-dimensional cube
Fact table view
dimensions 3
20Aggregates
- Add up amounts for day 1
- In SQL SELECT sum(amt) FROM SALE
- WHERE date 1
81
21Aggregates
- Add up amounts by day
- In SQL SELECT date, sum(amt) FROM SALE
- GROUP BY date
22Aggregates
- Operators sum, count, max, min, median, ave
- Having clause
- Using dimension hierarchy
- average by region (within store)
- maximum by month (within date)
23Cube Aggregation
Example computing sums
day 2
. . .
day 1
129
24Cube Operators
day 2
. . .
day 1
sale(c1,,)
129
sale(c2,p2,)
sale(,,)
sale(,p1,)
25Extended Cube
day 2
sale(,p2,)
day 1
26Aggregation Using Hierarchies
customer
region
country
(customer c1 in Region A customers c2, c3 in
Region B)
27Pivoting
Fact table view
Multi-dimensional cube
28Cube Aggregates Lattice
129
all
city
product
date
city, product
city, date
product, date
use greedy algorithm to decide what to materialize
city, product, date
29Dimension Hierarchies
all
state
city
30Dimension Hierarchies
all
product
city
date
product, date
city, product
city, date
state
city, product, date
state, date
state, product
state, product, date
not all arcs shown...
31Interesting Hierarchy
all
years
weeks
quarters
conceptual dimension table
months
days
32SAMPLE CUBE
Total annual sales of TV in U.S.A.
Total annual sales of PC in U.S.A.
Total annual sales of VCR in U.S.A.
Total sales In U.S.A
Total Q1 sales In U.S.A
Total sales In Canada
Total Q1 sales In Canada
Total sales In Mexico
Total Q1 sales In Mexico
Total Q2 sales In all countries
TOTAL SALES
Total Q1 sales In all countries
33OLAP Operations
- Roll-Up
- Drill-Down
- Slice Dice
- Pivot
- Drill-Across
- Drill-Through
34Example Schema
- Fact Table
- Sales(Store_id, Product_id, Time_id, Sales_amt)
- Dimension Tables
- Store (Store_id, city, state, region, country)
- Product (Product_id, name, category)
- Day (Time_id, month, quarter, year)
- Hierarchies
- Store ? City ? State ? Region ? Country
- Product ? Category
- Day ? Month ? Quarter ? Year
35Drill-Down
- SELECT S.product_id, S.store_id, SUM(S.sales_amt)
- FROM Sales S
- GROUP BY S.store_id, S. product_id
- SELECT S.product_id, St.state, SUM(S.sales_amt)
- FROM Sales S, Store St
- WHERE St.store_idS.store_id
- GROUP BY S.product_id, St.state
- SELECT S.product_id, St.city, SUM(S.sales_amt)
- FROM Sales S, Store St
- WHERE St.store_idS.store_id
- GROUP BY S.product_id, St.city
State
City
36Drill-Down
37Rolling Up
- SELECT S.product_id, St.city, SUM(S.sales_amt)
- INTO City_sales
- FROM Sales S, Store St
- WHERE St.store_idS.store_id
- GROUP BY S.product_id, St.city
- SELECT T.product_id, St.State, SUM(T.sales_amt)
- FROM City_sales T, Store St
- WHERE St.cityT.City
- GROUP BY T.product_id, St.State
38Pivoting
- When we view the data as a multi-dimensional
cube group on a subset of axes, we are said to
be performing a pivot on those axes - - Pivoting on dimension Dj (j1(1)k) in a cube
Di (i1(1)n) means that we use GROUP BY Aj
(j1(1)k) aggregate over Ak1, . An, where Ai
is an attribute of dimension Di - - Pivoting on product time corresponds to
grouping on prod_id quarter aggregating over
store_id
39Pivoting
-
- SELECT S.product_id, T.quarter, SUM(S.sales_amt)
- FROM Sales S, Time T
- WHERE T.time_idS.time_id
- GROUP BY S.product_id, T.quarter
40Dicing
- When we use GROUP BY to specify part of an
hierarchy, we are performing a range selection
called a DICE - Dicing Sales in the time dimension total sales
for each product in each qurater - SELECT S.product_id, T.quarter, SUM(S.sales_amt)
- FROM Sales S, Time T
- WHERE T.time_idS.time_id
- GROUP BY T.quarter, S.product_id
41Slicing
- When we use WHERE to specify a particular value
for an axis, we are performing a SLICE - Slicing in the time dimension choosing sales
only in week 12, then pivoting to product_id
(aggregating over store_id) - SELECT S.product_id, SUM(S.sales_amt)
- FROM Sales S, Time T
- WHERE T.time_idS.time_id T.week12
- GROUP BY S.product_id
42Slicing
43OLAP Operations
44Slicing
45Dicing (Sub-cube)
46Roll-Up
47Drill-Down
48Other OLAP Operations
- Drill-Across Queries involving more than one
fact table - Drill-Through Makes use of SQL to drill through
the bottom level of a data cube down to its
back-end relational tables - Pivot (rotate) Pivot (also called "rotate") is
a - visualization operation which rotates the data
axes in - view in order to provide an alternative
presentation of - the data. Other examples include rotating the
axes in a - 3-D cube, or transforming a 3-D cube into a
series of 2- - D planes.
49Other OLAP Operations
- Top N or Bottom N queries
- Moving Averages
- Growth Rates
- Depreciation
- Currency Conversion
- Statistical Functions
50Conceptual vs. Actual
- The cube is a logical way of visualizing the
data in an OLAP setting - Not how the data is actually represented on disk
- Two ways of storing data
- ROLAP Relational OLAP
- MOLAP Multidimensional OLAP
51OLAP CUBE
- Construction of the data cube is key to the
operation of OLAP - The computation process creates a set of
aggregates on the various dimensions of the data - The CUBE operator
52An example of the CUBE Operator
53The CUBE Operator
- Proposed by Gray et al
- Effectively involves a series of GROUP-BY
operations to aggregate data - Creates power set on all attributes according to
- A measure
- An aggregator function
J. Gray, S. Chaudhuri, A. Bosworth, A. Layman,D.
Reichart, M. Venkatrao, F. Pellow and H.
Pirahesh. Data cube A relational aggregation
operator generalizing group-by, cross-tab and
sub-totals. Data Mining and Knowledge Discovery,
129-54, 1997.
54CUBING Problem
- Problem this generates a lot of data and work
(2n sets in total, where n is the number of
dimensions) - Solution optimized algorithms to run faster,
consume less memory, and perform fewer I/Os.
55Efficient Computation of Data Cubes
- ROLAP-based cubing algorithms (Agarwal et al96)
- Array-based cubing algorithm
- (Zhao et al97)
S. Agarwal, R. Agrawal, P. M. Deshpande, A.Gupta,
J. F. Naughton, R. Ramakrishnan and
S.Sarawagi. On the computation of
multidimensional aggregates. In VLDB'96. Y. Zhao,
P. M. Deshpande, and J. F. Naughton. An
array-based algorithm for simultaneous
multidimensional aggregates. In SIGMOD'97.
56Efficient Computation of Data Cubes
- How many cuboids in a cube with 3 dimensions?
- Answer
- As many group by operations?
- No hierarchies involved!!
- p (Li 1), where Li is the number of levels
associated with dimension I - 10 dimensions 4 levels for each dimension
- Total Cuboids 510
57Approaches to OLAP Servers
- It is all about which DBMS you choose to store
your data warehouse data - RDBMS ROLAP
- MDDB MOLAP
- BOTH - HOLAP
58 OLAP Flavours
59Approaches to OLAP Servers
- Three possibilities for OLAP servers
- (1) Relational OLAP (ROLAP)
- Relational and specialized relational DBMS to
store and manage warehouse data - OLAP middleware to support missing pieces
- (2) Multidimensional OLAP (MOLAP)
- Array-based storage structures
- Direct access to array data structures
- (3) Hybrid OLAP (HOLAP)
- Storing detailed data in RDBMS
- Storing aggregated data in MDBMS
- User access via MOLAP tools
60ROLAP
- Special schema design star, snowflake
- Special indexes bitmap, multi-table join
- Proven technology (relational model, DBMS), tend
to outperform specialized MDDB especially on
large data sets - Products
- IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
61ROLAP
- Defines complex, multi-dimensional data with
simple model - Reduces the number of joins a query has to
process - Allows the data warehouse to evolve with
relatively low maintenance - Can contain both detailed and summarized data.
- ROLAP is based on familiar, proven, and already
selected technologies. - BUT!!!
- SQL for multi-dimensional manipulation of
calculations.
62MOLAP
- MDDB a special-purpose data model
- Facts stored in multi-dimensional arrays
- Dimensions used to index array
- Sometimes on top of relational DB
- Products
- Pilot, Arbor Essbase, Gentia
63MOLAP
- Pre-calculating or pre-consolidating
transactional data improves speed. - BUT
- Fully pre-consolidating incoming data, MDDs
require an enormous amount of overhead both in
processing time and in storage. An input file of
200MB can easily expand to 5GB - MDDBs are great candidates for the lt 100GB
department data marts. - With MDDs, application design is essentially the
definition of dimensions and calculation rules,
while the RDBMS requires that the database schema
be a star or snowflake.
64Quick Recap of OLAP Needs
- User Needs
- Multidimensional view
- Excellent Performance
- Analytical Flexibility
- Real-Time Data Access
- High Data Capacity
- MIS Needs
- Leverages Data Warehouse
- Easy Development
- Low Structure Maintenance
- Low Aggregate Maintenance
65Quick Recap of OLAP Needs User Needs
- Multidimensional View
- All true OLAP tools, whether they work with a
MDDB or an RDBMS, provide a multidimensional view
of data. - For example, decision makers may view sales by
office, quarter, representative, product, etc.
This perspective on data, which mirrors the way
business professional think, allows for more
intuitive and more powerful analysis.
66Quick Recap of OLAP Needs User Needs
- Excellent Performance
- The performance of your decision support tool
directly depends on the way it manages
aggregates. - RDBMS
- Calculate aggregates on fly (response time
suffers) - DBA creates summary tables to store aggregates
(enormous amount of disk space)
67Quick Recap of OLAP Needs User Needs
- Excellent Performance
- For example, suppose you have a Sales indicator
with six dimensionsRepresentatives, Products,
Customers, Regions, Months, and Years. - MOLAP tools will store a given aggregate, such as
the November 1997 government sales of product
A504 by representative 1040 in New York, in 1
cell of the MDDB. - In contrast, ROLAP tools consume 600 more space,
because they require a record of seven valuessix
foreign keys and the actual aggregatein a
relational summary table.
68Quick Recap of OLAP Needs User Needs
69Quick Recap of OLAP Needs User Needs
RDBMSs must use several summary tables to store
the aggregates that a MOLAP could store in just
one cube. For example, consider a Sales indicator
with three dimensions Months, Regions, and
Products. The indicator cube will contain seven
sets of aggregates Sales by month Sales by
product Sales by region Sales by month and
product Sales by month and region Sales by
product and region Sales by product, month, and
region To store these aggregates in an RDBMS,
youd have to create seven summary tables, one
for each aggregate set. HOW MANY SUMMARY TABLES
FOR 6 DIMENSIONS? (Separate fact table and
shrunken dimension table approach for storing
aggregates)
70Quick Recap of OLAP Needs User Needs
- Huge amounts of extra storage space is required
(even if there is no sparsity failure) - Maintenance costs are high
- Lot of statistical analysis needs to be done to
decide which aggregates are to be precomputed - DBA must keep the cost/performance ratio in
check
71Quick Recap of OLAP Needs User Needs
- In contrast, weve seen that multidimensional
databases store aggregates in a very compact
structure that consumes very little disk space
and requires very little maintenance - All levels of consolidation can therefore be
precomputed and stored in MDDB - As a result, fast response time is not limited
to the most frequently accessed queries all
aggregates can be accessed with lightning speed. -
72Quick Recap of OLAP Needs User Needs
- Analytical Flexibility
- Both ROLAP MOLAP tools offer comparative
performance for - Comparative Analysis
- Roll-up and Drill-down
- Slicing Dicing
- Only MOLAP tools offer what-if analysis
73Quick Recap of OLAP Needs User Needs
- Real-Time Data Access
- MOLAP tools load data into the multidimensional
cubes. Consequently, the data being accessed is
only as recent as the last load. - Some applications require real-time data access
- Process of continually refreshing the data
attaches higher costs to operating a MOLAP system - Some MOLAP tools offer reach-through
functionality to access volatile data stored
outside the MDDB - Unfortunately, users must be aware of the
underlying database structure - Relational data access is too complex for the
typical user
74Quick Recap of OLAP Needs User Needs
- Real-Time Data Access
- ROLAP tools maintain a constant link to the
operational RDBMS, which provides users with
up-to-the-minute, accurate data - (Real-Time Data Warehousing)
- Industries organizations with highly volatile
data particularly benefit from this access to
live, operational data.
75Quick Recap of OLAP Needs User Needs
- High Capacity Data
- MOLAP products are limited by the size of the
cube defined by the multidimensional view. When
dimension elements are predefined, the scope of
available data is limited at the onset. - ROLAP tools circumvent this barrier. Dynamic
dimensions are not stored in the predefined
multidimensional model, but fetched at run time
from the RDBMS.
76Quick Recap of OLAP Needs User Needs
- In MOLAP, only aggregates are stored in the
cube. Atomic, operational data are forced out of
the users analytical realm. -
- ROLAP systems can access extremely detailed
operational data, as well as aggregated data
stored in summary tables.
77Quick Recap of OLAP Needs
- MIS Needs
- Administrators should be able to leverage
their existing relational databases without
devoting large amounts of time and effort to
intricate development, fine tuning, or intensive
maintenance.
78Quick Recap of OLAP Needs MIS Needs
- Leveraging Data Warehouse
- Both the finance and the MIS departments of your
organization will appreciate a decision support
tool that leverages existing investments in data
warehousing. - MIS staff that opts for a MOLAP tool must
duplicate data in its own proprietary MDDB. - MIS staff that chooses a ROLAP tool will be able
to access the data warehouse directly.
79Quick Recap of OLAP Needs MIS Needs
- Easy Development
- MOLAP development is straightforward, it requires
no fine tuning and creates its own aggregates. - ROLAP tools, on the other hand, require a
specific schema for the relational database. - Skilled DBAs must provide the appropriate schema
(star or snowflake schema), tune the database,
and create the appropriate summary tables. - However, many ROLAP tools are metadata-driven,
which means the multidimensional view is
generated and maintained more easily.
80Quick Recap of OLAP Needs MIS Needs
- Low Structure Maintenance
- The structure of a MOLAP tools underlying MDDB
greatly depends on each of its dimensions. When
one dimension changes, the entire MDDB must be
re-structured. - Multi-matrix MDDBs reduce the maintenance burden
- ROLAP systems do not store data in a proprietary
structure. - They build and maintain a constant link between
the multidimensional view and the underlying
RDBMS using the metadata. - No database restructuring is required.
81Quick Recap of OLAP Needs MIS Needs
- Low Aggregate Maintenance
- MOLAP tools automatically create high-level
aggregates based on your lower-level MDDB data
and aggregate definitions. - When data is updated, the aggregates are
automatically updated and stored in the MDDB. - With ROLAP tools, MIS staff must continually
monitor the use of summary tables to keep their
cost/performance ratio in check. - DBAs inevitably use sophisticated statistics to
isolate only the most frequently accessed
aggregates, and store them in summary tables. - These tables leave ROLAP administrators with a
heavy maintenance burden.
82ROLAP vs. MOLAP
83ROLAP vs. MOLAP
- 1) Performance
- How fast will the system appear to the end-user?
- MDD server vendors believe this is a key point
in their favor. - 2) Data volume and scalability
- While MDD servers can handle up to 100GB of
storage, RDBMS servers can handle hundreds of
gigabytes and terabytes.
84Hybrid OLAP - HOLAP
- Best of both worlds
- Storing detailed data in RDBMS
- Storing aggregated data in MDBMS
- User access via MOLAP tools
85HOLAP
86ROLAP, MOPAL, or HOLAP
- IF
- A. You require write access
- B. Your data is under 50 GB
- C. Your timetable to implement is 60-90 days
- D. Lowest level already aggregated
- E. Data access on aggregated level
- F. Youre developing a general-purpose
application for inventory movement or assets
management - THEN
- Consider an MDD /MOLAP solution for your data
mart -
- IF
- A. Your data is over 100 GB
- B. You have a "read-only" requirement
- C. Historical data at the lowest level of
granularity - D. Detailed access, long-running queries
- E. Data assigned to lowest level elements
- THEN
- Consider an RDBMS/ROLAP solution for your data
mart.
87Conclusions
- ROLAP RDBMS -gt star/snowflake schema
- MOLAP MDDB -gt Cube structures
- ROLAP or MOLAP Data models used play major role
in performance differences - MOLAP for summarized and relatively lesser
volumes of data (100GB) - ROLAP for detailed and larger volumes of data
- Both storage methods have strengths and
weaknesses - The choice is requirement specific, though
currently data warehouses are predominantly built
using RDBMSs/ROLAP. - HOLAP is emerging as the OLPA server of choice