ON-LINE ANALYTICAL PROCESSING - PowerPoint PPT Presentation

1 / 87
About This Presentation
Title:

ON-LINE ANALYTICAL PROCESSING

Description:

ON-LINE ANALYTICAL PROCESSING Prof. Navneet Goyal Department of Computer Science & Information Systems BITS, Pilani Objectives What is OLAP Need for OLAP Features ... – PowerPoint PPT presentation

Number of Views:141
Avg rating:3.0/5.0
Slides: 88
Provided by: csisBits6
Category:

less

Transcript and Presenter's Notes

Title: ON-LINE ANALYTICAL PROCESSING


1
ON-LINE ANALYTICAL PROCESSING
  • Prof. Navneet Goyal
  • Department of Computer Science Information
    Systems
  • BITS, Pilani

2
Objectives
  • What is OLAP
  • Need for OLAP
  • Features functions of OLAP
  • Different OLAP models
  • OLAP implementations

3
OLAP
  • Term coined in mid 1990s
  • Main Goal support ad-hoc but complex querying by
    business analysts
  • Extends worksheet like analysis to work with huge
    amounts of data in a DW

4
Demand for OLAP
  • 2 approaches to developing EDWs
  • In both approaches, Data Marts rest on
    Dimensional Model
  • Data Marts are sufficient for basic data analysis
  • Users need to go beyond such basic analysis

5
Demand for OLAP
  • Need for Multidimensional Analysis
  • Fast Access Powerful Calculations
  • Limitations of other analysis methods like
  • SQL
  • Spreadsheets
  • Report Writers

6
Demand for OLAP
  • Traditional tools of report writers, query
    products, spreadsheets, language interfaces do
    not match the user expectations as far as
    performing multidimensional analysis with complex
    calculations is concerned.
  • Tools used with OLTP and basic DW environments do
    not match up to the task

7
OLAP is the Answer!
  • OLAP is a category of software technology that
    enables analysts, managers, and executives to
    gain insight into the data through fast,
    consistent, interactive, access in a wide variety
    of possible views of information that has been
    transformed from raw data to reflect the real
    dimensionality of the enterprise as understood by
    the user.

8
What is OLAP?

OLAP software provides the ability to analyze
large volumes of information to improve decision
making at all levels of an organization.
9
What is OLAP?

A wide spectrum of multidimensional analysis
involving intricate calculations and requiring
fast response times.
10
What is OLAP?

OLAP has two immediate consequences online part
requires the answers of queries to be fast, the
analytical part is a hint that the queries itself
are complex i.e., Complex questions with Fast
Answers!
11
Why a separate OLAP tool?
  • Empowers end users to do own analysis
  • Frees up IS backlog of report requests
  • Ease of use
  • No knowledge of tables or SQL required

12
OLAP Characteristics
  • Multi-user environment
  • Client-server architecture
  • Rapid response to queries, regardless of DB
    size and complexity

13
Data Warehouse OLAP
  • OLAP is a software system that works on top of a
    DW
  • A front-end tool for a DW
  • Information delivery system for the DW
  • Compliments the information delivery capacities
    of a DW

14
Why is OLAP useful?
  • Facilitates multidimensional data analysis by
    pre-computing aggregates across many sets of
    dimensions
  • Provides for
  • Greater speed and responsiveness
  • Improved user interactivity

15
The OLAP Market
16
The OLAP Market
17
Data Warehouses
  • A data warehouse is based on a multidimensional
    data model which views data in the form of a data
    cube
  • A data cube allows data to be modeled and viewed
    in multiple dimensions
  • In data warehousing literature, an n-D base cube
    is called a base cuboid. The top most 0-D cuboid,
    which holds the highest-level of summarization,
    is called the apex cuboid. The lattice of
    cuboids forms a data cube.

18
Lattice of Cuboids
19
CUBE
Multi-dimensional cube
Fact table view
dimensions 3
20
Aggregates
  • Add up amounts for day 1
  • In SQL SELECT sum(amt) FROM SALE
  • WHERE date 1

81
21
Aggregates
  • Add up amounts by day
  • In SQL SELECT date, sum(amt) FROM SALE
  • GROUP BY date

22
Aggregates
  • Operators sum, count, max, min, median, ave
  • Having clause
  • Using dimension hierarchy
  • average by region (within store)
  • maximum by month (within date)

23
Cube Aggregation
Example computing sums
day 2
. . .
day 1
129
24
Cube Operators
day 2
. . .
day 1
sale(c1,,)
129
sale(c2,p2,)
sale(,,)
sale(,p1,)
25
Extended Cube

day 2
sale(,p2,)
day 1
26
Aggregation Using Hierarchies
customer
region
country
(customer c1 in Region A customers c2, c3 in
Region B)
27
Pivoting
Fact table view
Multi-dimensional cube
28
Cube Aggregates Lattice
129
all
city
product
date
city, product
city, date
product, date
use greedy algorithm to decide what to materialize
city, product, date
29
Dimension Hierarchies
all
state
city
30
Dimension Hierarchies
all
product
city
date
product, date
city, product
city, date
state
city, product, date
state, date
state, product
state, product, date
not all arcs shown...
31
Interesting Hierarchy
all
years
weeks
quarters
conceptual dimension table
months
days
32
SAMPLE CUBE
Total annual sales of TV in U.S.A.
Total annual sales of PC in U.S.A.
Total annual sales of VCR in U.S.A.
Total sales In U.S.A
Total Q1 sales In U.S.A
Total sales In Canada
Total Q1 sales In Canada
Total sales In Mexico
Total Q1 sales In Mexico
Total Q2 sales In all countries
TOTAL SALES
Total Q1 sales In all countries
33
OLAP Operations
  • Roll-Up
  • Drill-Down
  • Slice Dice
  • Pivot
  • Drill-Across
  • Drill-Through

34
Example Schema
  • Fact Table
  • Sales(Store_id, Product_id, Time_id, Sales_amt)
  • Dimension Tables
  • Store (Store_id, city, state, region, country)
  • Product (Product_id, name, category)
  • Day (Time_id, month, quarter, year)
  • Hierarchies
  • Store ? City ? State ? Region ? Country
  • Product ? Category
  • Day ? Month ? Quarter ? Year

35
Drill-Down
  • SELECT S.product_id, S.store_id, SUM(S.sales_amt)
  • FROM Sales S
  • GROUP BY S.store_id, S. product_id
  • SELECT S.product_id, St.state, SUM(S.sales_amt)
  • FROM Sales S, Store St
  • WHERE St.store_idS.store_id
  • GROUP BY S.product_id, St.state
  • SELECT S.product_id, St.city, SUM(S.sales_amt)
  • FROM Sales S, Store St
  • WHERE St.store_idS.store_id
  • GROUP BY S.product_id, St.city

State
City
36
Drill-Down
37
Rolling Up
  • SELECT S.product_id, St.city, SUM(S.sales_amt)
  • INTO City_sales
  • FROM Sales S, Store St
  • WHERE St.store_idS.store_id
  • GROUP BY S.product_id, St.city
  • SELECT T.product_id, St.State, SUM(T.sales_amt)
  • FROM City_sales T, Store St
  • WHERE St.cityT.City
  • GROUP BY T.product_id, St.State

38
Pivoting
  • When we view the data as a multi-dimensional
    cube group on a subset of axes, we are said to
    be performing a pivot on those axes
  • - Pivoting on dimension Dj (j1(1)k) in a cube
    Di (i1(1)n) means that we use GROUP BY Aj
    (j1(1)k) aggregate over Ak1, . An, where Ai
    is an attribute of dimension Di
  • - Pivoting on product time corresponds to
    grouping on prod_id quarter aggregating over
    store_id

39
Pivoting
  • SELECT S.product_id, T.quarter, SUM(S.sales_amt)
  • FROM Sales S, Time T
  • WHERE T.time_idS.time_id
  • GROUP BY S.product_id, T.quarter

40
Dicing
  • When we use GROUP BY to specify part of an
    hierarchy, we are performing a range selection
    called a DICE
  • Dicing Sales in the time dimension total sales
    for each product in each qurater
  • SELECT S.product_id, T.quarter, SUM(S.sales_amt)
  • FROM Sales S, Time T
  • WHERE T.time_idS.time_id
  • GROUP BY T.quarter, S.product_id

41
Slicing
  • When we use WHERE to specify a particular value
    for an axis, we are performing a SLICE
  • Slicing in the time dimension choosing sales
    only in week 12, then pivoting to product_id
    (aggregating over store_id)
  • SELECT S.product_id, SUM(S.sales_amt)
  • FROM Sales S, Time T
  • WHERE T.time_idS.time_id T.week12
  • GROUP BY S.product_id

42
Slicing
43
OLAP Operations
44
Slicing
45
Dicing (Sub-cube)
46
Roll-Up
47
Drill-Down
48
Other OLAP Operations
  • Drill-Across Queries involving more than one
    fact table
  • Drill-Through Makes use of SQL to drill through
    the bottom level of a data cube down to its
    back-end relational tables
  • Pivot (rotate) Pivot (also called "rotate") is
    a
  • visualization operation which rotates the data
    axes in
  • view in order to provide an alternative
    presentation of
  • the data. Other examples include rotating the
    axes in a
  • 3-D cube, or transforming a 3-D cube into a
    series of 2-
  • D planes.

49
Other OLAP Operations
  • Top N or Bottom N queries
  • Moving Averages
  • Growth Rates
  • Depreciation
  • Currency Conversion
  • Statistical Functions

50
Conceptual vs. Actual
  • The cube is a logical way of visualizing the
    data in an OLAP setting
  • Not how the data is actually represented on disk
  • Two ways of storing data
  • ROLAP Relational OLAP
  • MOLAP Multidimensional OLAP

51
OLAP CUBE
  • Construction of the data cube is key to the
    operation of OLAP
  • The computation process creates a set of
    aggregates on the various dimensions of the data
  • The CUBE operator

52
An example of the CUBE Operator
53
The CUBE Operator
  • Proposed by Gray et al
  • Effectively involves a series of GROUP-BY
    operations to aggregate data
  • Creates power set on all attributes according to
  • A measure
  • An aggregator function

J. Gray, S. Chaudhuri, A. Bosworth, A. Layman,D.
Reichart, M. Venkatrao, F. Pellow and H.
Pirahesh. Data cube A relational aggregation
operator generalizing group-by, cross-tab and
sub-totals. Data Mining and Knowledge Discovery,
129-54, 1997.
54
CUBING Problem
  • Problem this generates a lot of data and work
    (2n sets in total, where n is the number of
    dimensions)
  • Solution optimized algorithms to run faster,
    consume less memory, and perform fewer I/Os.

55
Efficient Computation of Data Cubes
  • ROLAP-based cubing algorithms (Agarwal et al96)
  • Array-based cubing algorithm
  • (Zhao et al97)

S. Agarwal, R. Agrawal, P. M. Deshpande, A.Gupta,
J. F. Naughton, R. Ramakrishnan and
S.Sarawagi. On the computation of
multidimensional aggregates. In VLDB'96. Y. Zhao,
P. M. Deshpande, and J. F. Naughton. An
array-based algorithm for simultaneous
multidimensional aggregates. In SIGMOD'97.
56
Efficient Computation of Data Cubes
  • How many cuboids in a cube with 3 dimensions?
  • Answer
  • As many group by operations?
  • No hierarchies involved!!
  • p (Li 1), where Li is the number of levels
    associated with dimension I
  • 10 dimensions 4 levels for each dimension
  • Total Cuboids 510

57
Approaches to OLAP Servers
  • It is all about which DBMS you choose to store
    your data warehouse data
  • RDBMS ROLAP
  • MDDB MOLAP
  • BOTH - HOLAP

58
OLAP Flavours

59
Approaches to OLAP Servers
  • Three possibilities for OLAP servers
  • (1) Relational OLAP (ROLAP)
  • Relational and specialized relational DBMS to
    store and manage warehouse data
  • OLAP middleware to support missing pieces
  • (2) Multidimensional OLAP (MOLAP)
  • Array-based storage structures
  • Direct access to array data structures
  • (3) Hybrid OLAP (HOLAP)
  • Storing detailed data in RDBMS
  • Storing aggregated data in MDBMS
  • User access via MOLAP tools

60
ROLAP
  • Special schema design star, snowflake
  • Special indexes bitmap, multi-table join
  • Proven technology (relational model, DBMS), tend
    to outperform specialized MDDB especially on
    large data sets
  • Products
  • IBM DB2, Oracle, Sybase IQ, RedBrick, Informix

61
ROLAP
  • Defines complex, multi-dimensional data with
    simple model
  • Reduces the number of joins a query has to
    process
  • Allows the data warehouse to evolve with
    relatively low maintenance
  • Can contain both detailed and summarized data.
  • ROLAP is based on familiar, proven, and already
    selected technologies.
  • BUT!!!
  • SQL for multi-dimensional manipulation of
    calculations.

62
MOLAP
  • MDDB a special-purpose data model
  • Facts stored in multi-dimensional arrays
  • Dimensions used to index array
  • Sometimes on top of relational DB
  • Products
  • Pilot, Arbor Essbase, Gentia

63
MOLAP
  • Pre-calculating or pre-consolidating
    transactional data improves speed.
  • BUT
  • Fully pre-consolidating incoming data, MDDs
    require an enormous amount of overhead both in
    processing time and in storage. An input file of
    200MB can easily expand to 5GB
  • MDDBs are great candidates for the lt 100GB
    department data marts.
  • With MDDs, application design is essentially the
    definition of dimensions and calculation rules,
    while the RDBMS requires that the database schema
    be a star or snowflake.

64
Quick Recap of OLAP Needs
  • User Needs
  • Multidimensional view
  • Excellent Performance
  • Analytical Flexibility
  • Real-Time Data Access
  • High Data Capacity
  • MIS Needs
  • Leverages Data Warehouse
  • Easy Development
  • Low Structure Maintenance
  • Low Aggregate Maintenance

65
Quick Recap of OLAP Needs User Needs
  • Multidimensional View
  • All true OLAP tools, whether they work with a
    MDDB or an RDBMS, provide a multidimensional view
    of data.
  • For example, decision makers may view sales by
    office, quarter, representative, product, etc.
    This perspective on data, which mirrors the way
    business professional think, allows for more
    intuitive and more powerful analysis.

66
Quick Recap of OLAP Needs User Needs
  • Excellent Performance
  • The performance of your decision support tool
    directly depends on the way it manages
    aggregates.
  • RDBMS
  • Calculate aggregates on fly (response time
    suffers)
  • DBA creates summary tables to store aggregates
    (enormous amount of disk space)

67
Quick Recap of OLAP Needs User Needs
  • Excellent Performance
  • For example, suppose you have a Sales indicator
    with six dimensionsRepresentatives, Products,
    Customers, Regions, Months, and Years.
  • MOLAP tools will store a given aggregate, such as
    the November 1997 government sales of product
    A504 by representative 1040 in New York, in 1
    cell of the MDDB.
  • In contrast, ROLAP tools consume 600 more space,
    because they require a record of seven valuessix
    foreign keys and the actual aggregatein a
    relational summary table.

68
Quick Recap of OLAP Needs User Needs
  • Excellent Performance

69
Quick Recap of OLAP Needs User Needs
  • Excellent Performance

RDBMSs must use several summary tables to store
the aggregates that a MOLAP could store in just
one cube. For example, consider a Sales indicator
with three dimensions Months, Regions, and
Products. The indicator cube will contain seven
sets of aggregates Sales by month Sales by
product Sales by region Sales by month and
product Sales by month and region Sales by
product and region Sales by product, month, and
region To store these aggregates in an RDBMS,
youd have to create seven summary tables, one
for each aggregate set. HOW MANY SUMMARY TABLES
FOR 6 DIMENSIONS? (Separate fact table and
shrunken dimension table approach for storing
aggregates)
70
Quick Recap of OLAP Needs User Needs
  • Excellent Performance
  • Huge amounts of extra storage space is required
    (even if there is no sparsity failure)
  • Maintenance costs are high
  • Lot of statistical analysis needs to be done to
    decide which aggregates are to be precomputed
  • DBA must keep the cost/performance ratio in
    check

71
Quick Recap of OLAP Needs User Needs
  • Excellent Performance
  • In contrast, weve seen that multidimensional
    databases store aggregates in a very compact
    structure that consumes very little disk space
    and requires very little maintenance
  • All levels of consolidation can therefore be
    precomputed and stored in MDDB
  • As a result, fast response time is not limited
    to the most frequently accessed queries all
    aggregates can be accessed with lightning speed.

72
Quick Recap of OLAP Needs User Needs
  • Analytical Flexibility
  • Both ROLAP MOLAP tools offer comparative
    performance for
  • Comparative Analysis
  • Roll-up and Drill-down
  • Slicing Dicing
  • Only MOLAP tools offer what-if analysis

73
Quick Recap of OLAP Needs User Needs
  • Real-Time Data Access
  • MOLAP tools load data into the multidimensional
    cubes. Consequently, the data being accessed is
    only as recent as the last load.
  • Some applications require real-time data access
  • Process of continually refreshing the data
    attaches higher costs to operating a MOLAP system
  • Some MOLAP tools offer reach-through
    functionality to access volatile data stored
    outside the MDDB
  • Unfortunately, users must be aware of the
    underlying database structure
  • Relational data access is too complex for the
    typical user

74
Quick Recap of OLAP Needs User Needs
  • Real-Time Data Access
  • ROLAP tools maintain a constant link to the
    operational RDBMS, which provides users with
    up-to-the-minute, accurate data
  • (Real-Time Data Warehousing)
  • Industries organizations with highly volatile
    data particularly benefit from this access to
    live, operational data.

75
Quick Recap of OLAP Needs User Needs
  • High Capacity Data
  • MOLAP products are limited by the size of the
    cube defined by the multidimensional view. When
    dimension elements are predefined, the scope of
    available data is limited at the onset.
  • ROLAP tools circumvent this barrier. Dynamic
    dimensions are not stored in the predefined
    multidimensional model, but fetched at run time
    from the RDBMS.

76
Quick Recap of OLAP Needs User Needs
  • High Capacity Data
  • In MOLAP, only aggregates are stored in the
    cube. Atomic, operational data are forced out of
    the users analytical realm.
  • ROLAP systems can access extremely detailed
    operational data, as well as aggregated data
    stored in summary tables.

77
Quick Recap of OLAP Needs
  • MIS Needs
  • Administrators should be able to leverage
    their existing relational databases without
    devoting large amounts of time and effort to
    intricate development, fine tuning, or intensive
    maintenance.

78
Quick Recap of OLAP Needs MIS Needs
  • Leveraging Data Warehouse
  • Both the finance and the MIS departments of your
    organization will appreciate a decision support
    tool that leverages existing investments in data
    warehousing.
  • MIS staff that opts for a MOLAP tool must
    duplicate data in its own proprietary MDDB.
  • MIS staff that chooses a ROLAP tool will be able
    to access the data warehouse directly.

79
Quick Recap of OLAP Needs MIS Needs
  • Easy Development
  • MOLAP development is straightforward, it requires
    no fine tuning and creates its own aggregates.
  • ROLAP tools, on the other hand, require a
    specific schema for the relational database.
  • Skilled DBAs must provide the appropriate schema
    (star or snowflake schema), tune the database,
    and create the appropriate summary tables.
  • However, many ROLAP tools are metadata-driven,
    which means the multidimensional view is
    generated and maintained more easily.

80
Quick Recap of OLAP Needs MIS Needs
  • Low Structure Maintenance
  • The structure of a MOLAP tools underlying MDDB
    greatly depends on each of its dimensions. When
    one dimension changes, the entire MDDB must be
    re-structured.
  • Multi-matrix MDDBs reduce the maintenance burden
  • ROLAP systems do not store data in a proprietary
    structure.
  • They build and maintain a constant link between
    the multidimensional view and the underlying
    RDBMS using the metadata.
  • No database restructuring is required.

81
Quick Recap of OLAP Needs MIS Needs
  • Low Aggregate Maintenance
  • MOLAP tools automatically create high-level
    aggregates based on your lower-level MDDB data
    and aggregate definitions.
  • When data is updated, the aggregates are
    automatically updated and stored in the MDDB.
  • With ROLAP tools, MIS staff must continually
    monitor the use of summary tables to keep their
    cost/performance ratio in check.
  • DBAs inevitably use sophisticated statistics to
    isolate only the most frequently accessed
    aggregates, and store them in summary tables.
  • These tables leave ROLAP administrators with a
    heavy maintenance burden.

82
ROLAP vs. MOLAP
83
ROLAP vs. MOLAP
  • 1) Performance
  • How fast will the system appear to the end-user?
  • MDD server vendors believe this is a key point
    in their favor.
  • 2) Data volume and scalability
  • While MDD servers can handle up to 100GB of
    storage, RDBMS servers can handle hundreds of
    gigabytes and terabytes.

84
Hybrid OLAP - HOLAP
  • Best of both worlds
  • Storing detailed data in RDBMS
  • Storing aggregated data in MDBMS
  • User access via MOLAP tools

85
HOLAP
86
ROLAP, MOPAL, or HOLAP
  • IF
  • A. You require write access
  • B. Your data is under 50 GB
  • C. Your timetable to implement is 60-90 days
  • D. Lowest level already aggregated
  • E. Data access on aggregated level
  • F. Youre developing a general-purpose
    application for inventory movement or assets
    management
  • THEN
  • Consider an MDD /MOLAP solution for your data
    mart
  •  
  • IF
  • A. Your data is over 100 GB
  • B. You have a "read-only" requirement
  • C. Historical data at the lowest level of
    granularity
  • D. Detailed access, long-running queries
  • E. Data assigned to lowest level elements
  • THEN
  • Consider an RDBMS/ROLAP solution for your data
    mart.

87
Conclusions
  • ROLAP RDBMS -gt star/snowflake schema
  • MOLAP MDDB -gt Cube structures
  • ROLAP or MOLAP Data models used play major role
    in performance differences
  • MOLAP for summarized and relatively lesser
    volumes of data (100GB)
  • ROLAP for detailed and larger volumes of data
  • Both storage methods have strengths and
    weaknesses
  • The choice is requirement specific, though
    currently data warehouses are predominantly built
    using RDBMSs/ROLAP.
  • HOLAP is emerging as the OLPA server of choice
Write a Comment
User Comments (0)
About PowerShow.com