Data Warehouse Components - PowerPoint PPT Presentation

About This Presentation
Title:

Data Warehouse Components

Slides: 26
Provided by: SRID6

Transcript and Presenter's Notes

Title: Data Warehouse Components


1
Data Warehouse Components
2
Overview of the Components
  • Source data component
      • Production data
      • Internal data
      • Archive data
      • External data
  • Data staging component
      • Extraction
      • Transformation
      • Cleaning
      • Standardization
      • Loading
  • Data storage component
  • Information delivery component
  • Metadata component
  • Management and control component

3
Data Warehouse Architecture
[Figure: data warehouse architecture diagram. Labels:
Data Acquisition, Data Storage, Information Delivery;
Data Sources (Operational DBs, External Sources);
Extract / Transform / Load / Refresh; Reconciled data; Data Marts;
Metadata Repository; Monitoring and Administration; OLAP Servers; Serve;
Tools (Analysis, Query/Reporting, Data Mining)]
4
Architectural Framework
5
Data Acquisition
  • You are the data analyst on the project team
    building a DW for an insurance company. List the
    possible data sources from which you will bring
    data into the DW.
  • Production data: data from various operational
    systems
  • External data: for finding trends and comparisons
    against other organizations
  • Internal data: private, confidential data
    important to an organization
  • Archived data: for getting historical
    information

6
Architectural Framework
7
Data Staging
  • Performs ETL (extraction, transformation, loading)
  • Extraction
      • Select data sources, determine filters
      • Automatic replication
      • Create intermediary files
  • Transformation
      • Clean, merge, de-duplicate data
      • Convert data types
      • Calculate derived data
      • Resolve synonyms and homonyms
  • Loading
      • Initial loading
      • Incremental loading
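
The transformation and loading steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`policy_id`, `gender`, `premium`, `claims`, `updated`), the synonym map, and the dictionary standing in for warehouse storage are all hypothetical and do not come from the slides.

```python
from datetime import date

# Hypothetical synonym map: different source codes that mean the same thing.
SYNONYMS = {"M": "male", "Male": "male", "F": "female", "Female": "female"}


def transform(rows):
    """Clean, de-duplicate, convert types, and derive a value for each row."""
    seen = set()
    out = []
    for row in rows:
        policy_id = row["policy_id"].strip()          # clean stray whitespace
        if not policy_id or policy_id in seen:        # drop blanks and duplicates
            continue
        seen.add(policy_id)
        premium = float(row["premium"])               # convert data types
        claims = float(row["claims"])
        out.append({
            "policy_id": policy_id,
            "gender": SYNONYMS.get(row["gender"].strip(), "unknown"),  # resolve synonyms
            "premium": premium,
            "claims": claims,
            # derived data: claims paid per unit of premium
            "loss_ratio": round(claims / premium, 2) if premium else None,
            "updated": date.fromisoformat(row["updated"]),
        })
    return out


def load(warehouse, rows, since=None):
    """Initial load when `since` is None; otherwise load only rows changed after `since`."""
    loaded = 0
    for row in rows:
        if since is None or row["updated"] > since:
            warehouse[row["policy_id"]] = row
            loaded += 1
    return loaded
```

Calling `load(warehouse, rows)` performs an initial load of everything, while `load(warehouse, rows, since=last_run_date)` mimics an incremental load by picking up only rows changed after the previous run.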

8
Why is a separate data staging area required?
  • Data is spread across various operational databases
  • It must be brought together as subject-oriented data
  • A separate data staging area is therefore mandatory

9
Architectural Framework
10
Characteristics of data storage area
  • Separate repository
  • Data content
  • Read only
  • Integrated
  • High volumes
  • Grouped by business subjects
  • Metadata driven
  • Data from the DW is aggregated in multidimensional
    databases (MDDBs)
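
To make the last point concrete, the following sketch shows the kind of aggregation an MDDB holds: a measure summed over combinations of dimension values. It is a minimal illustration in plain Python, not how any particular MDDB product works; the fact rows and the names `region`, `product`, `year`, and `premium` are invented for the example.

```python
from collections import defaultdict


def aggregate(facts, dimensions, measure):
    """Sum one measure over every combination of the chosen dimension values."""
    cube = defaultdict(float)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        cube[key] += fact[measure]
    return dict(cube)


# Hypothetical detail rows as they might sit in the warehouse.
facts = [
    {"region": "East", "product": "Auto", "year": 2023, "premium": 100.0},
    {"region": "East", "product": "Auto", "year": 2024, "premium": 150.0},
    {"region": "West", "product": "Home", "year": 2024, "premium": 200.0},
]

# Pre-aggregate by region and product, dropping the year dimension.
by_region_product = aggregate(facts, ("region", "product"), "premium")
```

The same detail rows can be aggregated along other dimensions, for example `aggregate(facts, ("year",), "premium")`, which is why the detailed data stays in the warehouse and the summaries are built from it.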

11
Architectural Framework
12
Information delivery component
  • Depends on the type of user
  • Novice user: prefabricated reports, preset
    queries
  • Casual user: information once in a while
  • Business analyst: complex analysis
  • Power user: picks out interesting data

13
Information delivery component
14
Architectural Framework
15
Metadata component
  • Data about the data in the data warehouse
  • Metadata can be of 3 types
  • Operational metadata: contains information about
    operational data sources
  • Extraction and transformation metadata: details
    pertaining to extraction frequencies, extraction
    methods, business rules for data extraction
  • End-user metadata: navigational map of the DW
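
One way to picture the three metadata types is as three records attached to each warehouse column. The sketch below is an assumption-laden illustration: the class names, field names, and the lookup helper are hypothetical and are not taken from the slides or from any metadata tool.

```python
from dataclasses import dataclass


@dataclass
class OperationalMetadata:
    """Where the data came from in the operational systems."""
    source_system: str
    source_table: str
    source_field: str


@dataclass
class ExtractionMetadata:
    """How and how often the data is extracted and transformed."""
    frequency: str
    method: str
    business_rule: str


@dataclass
class EndUserMetadata:
    """The navigational map: names and descriptions in business terms."""
    business_name: str
    description: str


@dataclass
class ColumnMetadata:
    warehouse_column: str
    operational: OperationalMetadata
    extraction: ExtractionMetadata
    end_user: EndUserMetadata


def find_by_business_name(catalog, term):
    """Let an end user look up warehouse columns using business vocabulary."""
    term = term.lower()
    return [c.warehouse_column for c in catalog
            if term in c.end_user.business_name.lower()]
```

A developer would mostly read the operational and extraction records, while an end user would search the catalog with `find_by_business_name` and never see source-system details.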

16
Why is metadata especially important in a data
warehouse?
  • It acts as the glue that connects all parts of
    the data warehouse.
  • It provides information about the contents and
    structures to the developers.
  • It opens the door to the end-users and makes the
    contents recognizable in their own terms.

17
(No Transcript)
18
Management and Control
  • Sits on top of all components
  • Coordinates the services and activities within
    the DW
  • Controls the data transformation and transfer into
    DW storage

19
Summing up
  • Data warehouse building blocks or components are
    source data, data staging, data storage,
    information delivery, metadata, and management
    and control.
  • In a data warehouse, metadata is especially
    significant because it acts as the glue holding
    all the components together and serves as a
    roadmap for the end-users.

20
Doubts?
21
Trends in DW
22
Case study 1
  • As a senior analyst on the DW project of a large
    retail chain, you are responsible for improving
    data visualization of the output results. Make a
    list of recommendations.

23
(No Transcript)
24
Parallel processing
  • Performance of DW may be improved using parallel
    processing with appropriate hardware and software
    options.
  • Parallel processing options
      • Symmetric multiprocessing
      • Massively parallel processing
      • Clusters
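
The idea common to these options is to split the work into partitions, process them at the same time, and combine the partial results. The sketch below shows that pattern on a single machine using Python's standard thread pool. It is an assumption-based illustration only: it does not model SMP, MPP, or cluster hardware, CPython threads will not speed up CPU-bound work like this, and the `amount` field is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Split rows into n roughly equal partitions (round-robin)."""
    return [rows[i::n] for i in range(n)]


def partial_sum(rows):
    """Work done independently on one partition."""
    return sum(r["amount"] for r in rows)


def parallel_total(rows, workers=4):
    """Aggregate each partition concurrently, then combine the partial results."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, parts))
```

In a real warehouse the partitions would typically live on separate processors or nodes, and the database engine, not application code, would do the splitting and combining.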

25
DW with ERP packages
26
Web Enabled configuration