Title: Data Warehousing
1Data Warehousing
Virtual University of Pakistan
- Lecture-4
- Introduction and Background
Ahsan Abdullah Assoc. Prof. Head Center for
Agro-Informatics Research www.nu.edu.pk/cairindex.
asp FAST National University of Computers
Emerging Sciences, Islamabad
2Introduction and Background
3How is it Different?
- Starts with a 6x12 availability requirement ...
but 7x24 usually becomes the goal.
- Decision makers typically dont work 24 hrs a
day and 7 days a week. An ATM system does. - Once decision makers start using the DWH, and
start reaping the benefits, they start liking it
- Start using the DWH more often, till want it
available 100 of the time.
4How is it Different?
- Starts with a 6x12 availability requirement ...
but 7x24 usually becomes the goal.
- For business across the globe, 50 of the world
may be sleeping at any one time, but the
businesses are up 100 of the time. - 100 availability not a trivial task, need to
take into account loading strategies, refresh
rates etc.
5How is it Different?
- Does not follows the traditional development model
- Classical SDLC
- Requirements gathering
- Analysis
- Design
- Programming
- Testing
- Integration
- Implementation
6How is it Different?
- Does not follows the traditional development model
- DWH SDLC (CLDS)
- Implement warehouse
- Integrate data
- Test for biasness
- Program w.r.t data
- Design DSS system
- Analyze results
- Understand requirement
7Data Warehouse Vs. OLTP
OLTP (On Line Transaction Processing) Select
tx_date, balance from tx_table Where account_ID
23876
8Data Warehouse Vs. OLTP
DWH Select balance, age, sal, gender from
customer_table, tx_table Where age between (30
and 40) and Education graduate
and CustID.customer_table Customer_ID.tx_table
9Data Warehouse Vs. OLTP
OLTP DWH
Primary key used Primary key NOT used
No concept of Primary Index Primary index used
Few rows returned Many rows returned
May use a single table Uses multiple tables
High selectivity of query Low selectivity of query
Indexing on primary key (unique) Indexing on primary index (non-unique)
10Data Warehouse Vs. OLTP
OLTP OnLine Transaction Processing (MIS or
Database System)
11Comparison of Response Times
- On-line analytical processing (OLAP) queries must
be executed in a small number of seconds. - Often requires denormalization and/or sampling.
- Complex query scripts and large list selections
can generally be executed in a small number of
minutes. - Sophisticated clustering algorithms (e.g., data
mining) can generally be executed in a small
number of hours (even for hundreds of thousands
of customers).
12Putting the pieces together
Data (Tier 0)
Data Warehouse Server (Tier 1)
Meta Data
Data Warehouse
Data Marts
Tools
Data sources