Data%20Warehouse - PowerPoint PPT Presentation

About This Presentation
Title:

Data%20Warehouse

Description:

Data Warehouse ... – PowerPoint PPT presentation

Number of Views:361
Avg rating:3.0/5.0
Slides: 66
Provided by: Maso78
Category:

less

Transcript and Presenter's Notes

Title: Data%20Warehouse


1
Data Warehouse
2
Objectives
  • Need for Data Warehouse.
  • What is Data Warehouse?
  • Data Warehouse Properties.
  • Data Warehouse Architectures.
  • Data Marts.
  • Corporate Information Factory.
  • Extraction, Transportation, Loading and
    Transformation.
  • Design in Data Warehouses.
  • Data Warehousing Schemas.

3
Decision support questions that enterprises need
to have answered
  • How did sales representatives perform over
    different periods of time?
  • What are the popular products?
  • What types of customers buy what types of
    products?
  • How much are the various internal organizations
    spending on what products?

4
Cont.
  • What were the variances between the amounts
    budgeted and the amounts spent?
  • What positions are being filled by people with
    what types of background?
  • What is the average pay for people within
    different age brackets?
  • What is the average pay for people within
    different age brackets?

5
What is a Data Warehouse?
  • A data warehouse is a relational database that is
    designed for query and analysis rather than for
    transaction processing
  • A common way of introducing data warehousing is
    to refer to the characteristics of a data
    warehouse as set forth by William Inmon
  • Subject Oriented
  • Integrated
  • Nonvolatile
  • Time Variant

6
Data Warehouse Properties
Data Warehouse
7
Subject Oriented
  • Data is categorized and stored by business
    subject rather than
  • by application.
  • For example, to learn more about your companys
    sales data ,
  • "Who was our best customer for this item,
    in this region last year?"
  • This ability to define a data warehouse by
    subject matter, sales in this case, makes the
    data warehouse subject oriented.

Product
Region
Time
Customer
Operational Systems
8
Integrated
  • Data warehouses must put data from disparate
    sources into a consistent format.

9
Time Variant (time series)
  • Data is stored as a series of snapshots, each
    representing a
  • period of time.

10
Non Volatile
  • Typically data in the data warehouse is not
    updated or deleted.

Nonvolatile means that, once entered into the
warehouse, data should not change .This is
logical because the purpose of a warehouse is to
enable you to analyze what has occurred.
11
Other Characteristics of Data Warehouse
  • Summarized
  • Not Normalized
  • Meta Data
  • Sources (Both operational and external data are
    presents)

12
Summary Data
  • Provide fast access to pre-computed data
  • Reduce use of
  • I/O
  • CPU
  • Memory
  • Distill from
  • Source systems - lightly summarized
  • Pre-calculated summaries - highly summarized
  • Determine requirements early

13
Summary Data
  • Average
  • Maximum
  • Total
  • Percentage

14
Summary Data
Product
Time
Store
Summary Fact (Derived)
15
Normalization
  • Normalized data contains no
  • Redundancy.
  • Repeating data.
  • Key independent columns.
  • Denormalized data often
  • Improves efficiency in OLAP systems.
  • Exists in data warehouse databases.
  • Comprises derived or summary data.
  • Star and snowflake models are denormalized.

16
Meta Data (Data about Data)
  • Provides information about the content of the
    warehouse.
  • Meta Data includes
  • A guide to moving data to the warehouse
  • Rules for summarization
  • Business terms used to describe data
  • Technical terminology
  • Rules for data extractions

17
Data Warehouse Architectures
  • Data Warehouse Architecture (Basic)
  • Data Warehouse Architecture (with a Staging
    Area)
  • Data Warehouse Architecture (with a Staging Area
    and Data Marts)

18
Data Warehouse Architecture (Basic)
  • End users directly access data derived from
    several source systems through the data warehouse.

19
Data Warehouse Architecture (with a Staging Area)
you need to clean and process your operational
data before putting it into the warehouse. You
can do this programmatically, although most data
warehouses use a staging area instead.
20
Data Warehouse Architecture (with a Staging Area
and Data Marts)
you may want to customize your warehouses
architecture for different groups within your
organization. You can do this by adding data
marts, which are systems designed for a
particular line of business.
21
Data Marts
  • A Data Mart is a small warehouse designed for
    strategic business unit or a department.
  • Data Mart Advantages
  • The cost is low.
  • Implementation time is shorter.
  • They are controlled locally rather than
    centrally.
  • They contain less information than the data
    warehouse and hence have more rapid response.
  • They allow a business unit to build its own DSS
    without relying on a centralized IS department.
  • Data Mart Types
  • Replicated Data Marts.
  • Stand-alone Data Marts.

22
Corporate Information Factory
23
Major Business Functions
Business Operations
Business Intelligence
Business Management
24
Operational Systems
Operational Systems are the internal and external
core systems that run the day-to-day business
operations. They are accessed through
application program interfaces (APIs) and are the
source of data for the data warehouse and
operational data store.
25
External Data
External Data is any data outside the normal data
collected through an enterprises internal
applications. Generally, external data, such as
demographic, credit, competitor, and financial
information, is purchased by the enterprise from
a vendor of such information.
26
Data Acquisition
Data Acquisition is the set of processes that
capture, integrate, transform, cleanse, and load
source data into the data warehouse and
operational data store.
27
Data Problems
28
Data Warehouse
The Data Warehouse is a subject-oriented,
integrated, time-variant, non-volatile collection
of data used to support the strategic
decision-making process for the enterprise.
29
Operational Data Store
The Operational Data Store is an
subject-oriented, integrated, current, volatile
collection of data used to support the tactical
decision-making process for the enterprise.
30
Comparing an Operational Data Store and a Data
Warehouse
31
CIF Data Management
CIF Data Management is the set of processes that
protect the integrity and continuity of the data
within and across the data warehouse and
operational data store. It may employ a staging
area for cleansing and synchronizing data.
32
Transactional Interface
The Transactional Interface is an easy-to-use and
intuitive interface for the end user to access
and manipulate data in the operational data
store.
33
Data Delivery
Data Delivery is the set of processes that
enables end users and their supporting IT groups
to filter, format, and deliver data to data marts
and oper-marts.
34
Exploration Warehouse
The Exploration Warehouse is a data mart whose
purpose is to provide a safe haven for
exploratory and ad hoc processing. An
exploration warehouse may utilize specialized
technologies to provide fast response times with
the ability to access the entire database.
35
Data Mining Warehouse
The Data Mining Warehouse includes tasks known as
knowledge extraction, data archaeology, data
exploration, data pattern processing and data
harvesting.
36
OLAP Data Mart
The OLAP (online analytical processing) Data Mart
is aggregated and/or summarized data that is
derived from the data warehouse and tailored to
support the multidimensional requirements of a
given business unit or business function.
37
Oper-Mart
The Oper-Mart is a subset of data derived from of
the operational data store used in tactical
analysis and usually stored in a multidimensional
manner (star schema or hypercube). They may be
created in a temporary manner and dismantled when
no longer needed.
38
Decision Support Interface
The Decision Support Interface is an easy-to-use,
intuitive tool to enable end user capabilities
such as exploration, data mining, OLAP, query,
and reporting to distill information from data.
39
Meta Data Management
Meta Data Management is the set of processes for
managing the information needed to promote data
legibility, use, and administration.
40
Information Feedback
Information Feedback is the set of processes that
transmit the intelligence gained through usage of
the Corporate Information Factory to appropriate
data stores.
41
Information Workshop
Information Workshop is the set of the facilities
that optimize use of the Corporate Information
Factory by organizing its capabilities and
knowledge, and then assimilating them into the
business process.
42
Library and Toolbox
The Library and Toolbox is the collection of meta
data and capabilities that provides information
to effectively use and administer the Corporate
Information Factory. The library provides the
medium from which knowledge is enriched. The
toolbox is a vehicle for organizing, locating,
and accessing capabilities.
43
Workbench
The Workbench is a strategic mechanism for
automating the integration of capabilities and
knowledge into the business process.
44
Operations and Administration
Operation and Administration is the set of
activities required to ensure smooth daily
operations, to ensure that resources are
optimized, and to ensure that growth is managed.
45
Systems Management
Systems Management is the set of processes for
maintaining, versioning, and upgrading the core
technology on which the data, software, and tools
operate.
46
Data Acquisition Management
Data Acquisition Management is the set of
processes that manage and maintain processes used
to capture source data and its preparation for
loading into the data warehouse or operational
data store.
47
Service Management
Service Management is the set of processes for
promoting user satisfaction and productivity
within the Corporate Information Factory. It
includes processes that manage and maintain
service level agreements, requests for change,
user communications, and the data delivery
mechanisms.
48
Change Management
Change Management is the set of processes
coordinating modifications to the Corporate
Information Factory.
49
Extraction, Transportation, Loading and
Transformation (ETL)
  • Purchase specialist tools, or develop programs
  • Extraction - select data using different methods
  • Transportation - move data into the warehouse
  • Loading and Transformation - validate, clean,
    integrate, and time stamp data

50
Data Quality - Importance
  • Ensure data is
  • Relevant
  • Useful
  • Quality
  • Accurate
  • Accessible

Warehouse
Operational systems
  • Large time consuming task

51
An Example
Sale 1/2/98 120001 Ham Pizza 10.00
Sale 1/2/98 120002 Cheese Pizza 15.00
Sale 1/2/98 120002 Anchovy Pizza 12.00
Return 1/2/98 120003 Anchovy Pizza
- 12.00
Sale 1/2/98 120004 Sausage Pizza 11.00
52
Extraction in Data Warehouses
  • Logical Extraction Methods
  • Full Extraction
  • The data is extracted completely from the source
    system.
  • Incremental Extraction
  • At a specific point in time, only the data that
    has changed since a well-defined event back in
    history will be extracted.
  • Physical Extraction Methods
  • Online Extraction
  • The data is extracted directly from the source
    system itself.
  • Offline Extraction
  • Flat files
  • Dump files
  • Redo and archive logs
  • Transportable tablespaces

53
Changing Data
54
Transportation in Data Warehouses
  • Transportation Mechanisms in Data Warehouses
  • Transportation Using Flat Files
  • Transportation Through Distributed Operations
  • Transportation Using Transportable Tablespaces

Oracle
55
Transportation in Data Warehouses
  • Transportation Using Flat Files
  • The most common method for transporting data is
    by the transfer of flat files, using mechanisms
    such as FTP or other remote file system access
    protocols
  • Transportation Through Distributed Operations
  • Distributed queries, either with or without
    gateways, can be an effective mechanism for
    extracting data. These mechanisms also transport
    the data directly to the target system.
  • Transportation Using Transportable Tablespaces
  • Some Databases such as Oracle and DB2 introduced
    an important mechanism for transporting data
    transportable tablespaces. This feature is the
    fastest way for moving large volumes of data
    between two databases.

56
Loading and Transformation in Data Warehouses
  • Loading Mechanisms
  • SQLLoader
  • External Tables
  • OCI and Direct-Path APIs
  • Export/Import
  • Transformation Mechanisms
  • Transformation Using SQL
  • Transformation Using PL/SQL
  • Transformation Using Table Functions

57
Incremental Development
  • Focus on business functionality
  • Deliver business benefit
  • Are suited to warehouse evolution
  • Once an increment is complete the selection and
    scope of the next increment is defined
  • Each increment follows the same phase sequence

58
Roles
  • The project team roles and responsibilities
  • Common roles
  • Analyst, Database Administrator, Programmer,
    Tester
  • Warehouse specific roles
  • DW Architect, Metadata Architect, Data Quality
    Administrator, DW Administrator

59
Design in Data Warehouses
  • Logical Design in Data Warehouses
  • Data Warehousing Schemas
  • Star
  • Snowflake
  • Constellation
  • Physical Design in Data Warehouses
  • Physical Design Structures
  • Tablespaces
  • Tables and Partitioned Tables
  • Views
  • Integrity Constraints
  • Dimensions
  • Indexes and Partitioned Indexes
  • Materialized Views

60
Data Warehousing Schemas
  • Star
  • Snowflake
  • Constellation

61
Star Schema
  • The center of the star consists of
  • one or more fact tables and the
  • points of the star are the
  • dimension tables.

62
Snowflake Schema
Product Table Product_id Product_desc
Store Table Store_id Store_desc District_id
District Table District_id District_desc
  • d

Sales Fact Table Item_id Store_id Sales_dollars Sa
les_units
Time Table Week_id Period_id Year_id
Dept Table Dept_id Dept_desc Mgr_id
Mgr Table Dept_id Mgr_id Mgr_name
Item Table Item_id Item_desc Dept_id
63
Constellation
Warehouse Table Warehouse_id Warehouse_loc
Inventory Fact Table Product_id Shelf_id Cost_doll
ars Qty_on_hand
64
Summary
  • Need for Data Warehouse.
  • What is Data Warehouse?
  • Data Warehouse Properties.
  • Data Warehouse Architectures.
  • Data Marts.
  • Corporate Information Factory.
  • Extraction, Transportation, Loading and
    Transformation.
  • Design in Data Warehouses.
  • Data Warehousing Schemas.

65
Q A
Write a Comment
User Comments (0)
About PowerShow.com