Data Warehousing - PowerPoint PPT Presentation

1 / 27
About This Presentation
Title:

Data Warehousing

Description:

A DBMS built for online transaction processing (OLTP) is generally regarded as ... Stores detailed, lightly, and highly summarized data. Data is largely static ... – PowerPoint PPT presentation

Number of Views:51
Avg rating:3.0/5.0
Slides: 28
Provided by: huy7
Category:

less

Transcript and Presenter's Notes

Title: Data Warehousing


1
Data Warehousing
  • Hu Yan?
  • huy_at_cs.tut.fi

2
? Outline
  • What is data warehousing
  • The benefit of data warehousing
  • Differences between OLTP and data warehousing
  • The architecture of data warehouse
  • The main components
  • Data flows
  • Tools and technologies
  • Integration
  • The importance of managing meta-data
  • Data marts

3
? What is data warehousing?
  • data warehousing is subject-oriented, integrated,
    time-variant, and non-volatile collection of data
    in support of managements decision-making
    process.
  • a data warehouse is data management and data
    analysis
  • data webhouse is a distributed data warehouse
    that is implement over the web with no central
    data repository
  • goal is to integrate enterprise wide corporate
    data into a single reository from which users can
    easily run queries

4
? What is data warehousing?
  • Subject-oriented?WH is organized around the major
    subjects of the enterprise..rather than the major
    application areas.. This is reflected in the need
    to store decision-support data rather than
    application-oriented data
  • Integrated?because the source data come together
    from different enterprise-wide applications
    systems. The source data is often inconsistent
    using..The integrated data source must be made
    consistent to present a unified view of the data
    to the users
  • Time-variant?the source data in the WH is only
    accurate and valid at some point in time or over
    some time interval. The time-variance of the data
    warehouse is also shown in the extended time that
    the data is held, the implicit or explicit
    association of time with all data, and the fact
    that the data represents a series of snapshots
  • Non-volatile?data is not update in real time but
    is refresh from OS on a regular basis. New data
    is always added as a supplement to DB, rather
    than replacement. The DB continually absorbs this
    new data, incrementally integrating it with
    previous data

5
? The benefits of data warehousing
  • The potential benefits of data warehousing are
    high returns on investment..
  • substantial competitive advantage..
  • increased productivity of corporate
    decision-makers..

6
? The difference bewteen OLTP and data warehousing
  • A DBMS built for online transaction processing
    (OLTP) is generally regarded as unsuitable for
    data warehousing because each system is designed
    with a differing set of requirements in mind
  • example OLTP systems are design to maximize the
    transaction processing capacity, while data
    warehouses are designed to support ad hoc query
    processing

7
comparision of OLTP systems and data warehousing
system
8
? Problems
  • Underestimation of resources for data loading
  • Hidden problems with source systems
  • Required data not captured
  • Increased end-user demands
  • Data homogenization
  • High demand for resources
  • Data ownership
  • High maintenance
  • Long-duration projects
  • Complexity of integration

9
? The architecture
  • Query Manage
  • Warehouse Manager

Operational data source1
Reporting, query, application development, and
EIS(executive information system) tools
  • DBMS

High summarized data
Meta-data
Operational data source 2
Lightly summarized data
Load Manager
Operational data source n
Detailed data
OLAP(online analytical processing) tools
Operational data store (ods)

Operational data store (ODS)
Data mining
Archive/backup data
End-user access tools
Typical architecture of a data warehouse
10
? The main components
  • Operational data sources?for the DW is supplied
    from mainframe operational data held in first
    generation hierarchical and network databases,
    departmental data held in proprietary file
    systems, private data held on workstaions and
    private serves and external systems such as the
    Internet, commercially available DB, or DB
    assoicated with and organizations suppliers or
    customers
  • Operational datastore(ODS)?is a repository of
    current and integrated operational data used for
    analysis. It is often structured and supplied
    with data in the same way as the data warehouse,
    but may in fact simply act as a staging area for
    data to be moved into the warehouse

11
? The main components
  • load manager?also called the frontend component,
    it performance all the operations associated with
    the extraction and loading of data into the
    warehouse. These operations include simple
    transformations of the data to prepare the data
    for entry into the warehouse
  • warehouse manager?performs all the operations
    associated with the management of the data in the
    warehouse. The operations performed by this
    component include analysis of data to ensure
    consistency, transformation and merging of source
    data, creation of indexes and views, generation
    of denormalizations and aggregations, and
    archiving and backing-up data

12
? The main components
  • query manager?also called backend component, it
    performs all the operations associated with the
    management of user queries. The operations
    performed by this component include directing
    queries to the appropriate tables and scheduling
    the execution of queries
  • detailed, lightly and lightly summarized
    data,archive/backup data
  • meta-data
  • end-user access tools?can be categorized into
    five main groups data reporting and query tools,
    application development tools, executive
    information system (EIS) tools, online analytical
    processing (OLAP) tools, and data mining tools

13
? Data flows
  • Inflow- The processes associated with the
    extraction, cleansing, and loading of the data
    from the source systems into the data warehouse.
  • upflow- The process associated with adding value
    to the data in the warehouse through summarizing,
    packaging , packaging, and distribution of the
    data
  • downflow- The processes associated with archiving
    and backing-up of data in the warehouse
  • outflow- The process associated with making the
    data availabe to the end-users
  • Meta-flow- The processes associated with the
    management of the meta-data

14
  • Warehouse Manager

Reporting, query,application development, and EIS
(executive information system) tools
Operational data source1
Warehouse Manager
  • DBMS

Meta-data
Meta-flow
High summarized data
Inflow
Outflow
Lightly summarized data
Load Manager
OLAP (online analytical processing) tools
Query Manage
Upflow
Operational data source n
Detailed data
Operational data store (ods)
Data mining tools
End-user access tools

Downflow
Archive/backup data
Information flows of a data warehouse
15
? Tools and Technologies
  • The critical steps in the construction of a data
    warehouse
  • a. Extraction
  • b. Cleansing
  • c. Transformation
  • after the critical steps, loading the results
    into target system can be carried out either by
    separate products, or by a single, categories
  • code generators
  • database data replication tools
  • dynamic transformation engines

16
? Data Warehouse DBSM(integration)
  • due to the maturity of such products, most
    relational databases will integrate predictably
    with other types of software
  • The reqirements for data warehose RDBMS
  • Load performance
  • Load processing
  • Data quality management
  • Query perfomance
  • Terabyte scalability
  • Mass user scalability
  • Networked data warehouse
  • Warehouse administration
  • Integrated dimensional analysis
  • Advanced query funtionlity

17
? The importance of managing meta-data(integration
)
  • The integration of meta-data, that is data about
    data
  • Meta-data is used for a variety of purposes and
    the management of it is a critical issue in
    achieving a fully integrated data warehouse
  • The major purpose of meta-data is to show the
    pathway back to where the data began, so that the
    warehouse administrators know the history of any
    item in the warehouse
  • The meta-data associated with data transformation
    and loading must describe the source data and any
    changes that were made to the data
  • The meta-data associated with data management
    describes the data as it is stored in the
    warehouse
  • The meta-data is required by the query manager to
    generate appropriate queries, also is associated
    with the user of queries

18
  • The major integration issue is how to synchronize
    the various types of meta-data use throughout the
    data warehouse. The challenge is to synchronize
    meta-data between different products from
    different vendors using different meta-data
    stores
  • Two major standards for meta-data and modeling in
    the areas of data warehousing and component-based
    development-MDC(Meta Data Coalition) and
    OMG(Object Management Group)

19
? Administration and Management Tools
  • a data warehouse requires tools to support the
    administration and management of such complex
    enviroment.
  • for the various types of meta-data and the
    day-to-day operations of the data warehouse, the
    administration and management tools must be
    capable of supporting those tasks
  • monitoring data loading from multiple sources
  • data quality and integrity checks
  • managing and updating meta-data
  • monitoring database performance to ensure
    efficient query response times and resource
    utilization

20
  • auditing data warehouse usage to provide user
    chargeback information
  • replicating, subsetting, and distributing data
  • maintaining effient data storage management
  • purging data
  • archiving and backing-up data
  • implementing recovery following failure
  • security management

21
? Data mart
  • data mart? a subset of a data warehouse that
    supports the requirements of particular
    department or business function
  • The characteristics that differentiate data marts
    and data warehouses include
  • a data mart focuses on only the requirements of
    users associated with one department or business
    function

22
  • data marts do not normally contain detailed
    operational data, unlike data warehouses
  • as data marts contain less data compared with
    data warehouses, data marts are more easily
    understood and navigated

23
  • Warehouse Manager

Operational data source1
Warehouse Manager
  • DBMS

Meta-data
High summarized data
Reporting, query,application development, and
EIS(executive information system) tools
Operational data source 2
Lightly summarized data
Query Manage
Load Manager
Detailed data
Operational data source n
OLAP(online analytical processing) tools

Operational data store (ods)
Data mining
(First Tier)
(Third Tier)
Operational data store (ODS)
Archive/backup data
End-user access tools
Data Mart
summarized data(Relational database)
Summarized data (Multi-dimension database)
(Second Tier)
Typical data warehouse adn data mart architecture
24
Reasons for creating a data mart
  • To give users access to the data they need to
    analyze most often
  • To provide data in a form that matches the
    collective view of the data by a group of users
    in a department or business function
  • To improve end-user response time due to the
    reduction in the volume of data to be accessed
  • To provide appropriately structured data as
    ditated by the requirements of end-user access
    tools
  • Normally use less data so tasks such as data
    cleansing, loading, transformation, and
    integration are far easier, and hence
    implementing and setting up a data mart is
    simpler than establishing a corporate data
    warehouse

25
  • The cost of implementing data marts is normally
    less than that required to establish a data
    warehouse
  • The potential users of a data mart are more
    clearly defined and can be more easily targeted
    to obtain support for a data mart project rather
    than a corporate data warehouse project

26
data marts issues
  • data mart functionality?the capabilities of data
    marts have increased with the growth in their
    popularity
  • data mart size?the performance deteriorates as
    data marts grow in size, so need to reduce the
    size of data marts to gain improvements in
    performance
  • data mart load performance?two critical
    components end-user response time and data
    loading performance?to increment DB updating so
    that only cells affected by the change are
    updated and not the entire MDDB structure

27
  • users access to data in multiple marts?one
    approach is to replicate data between different
    data marts or, alternatively, build virtual data
    mart?it is views of several physical data marts
    or the corporate data warehouse tailored to meet
    the requirements of specific groups of users
  • data mart internet/intranet access?its products
    sit between a web server and the data analysis
    product.Internet/intranet offers users low-cost
    access to data marts and the data WH using web
    browsers.
  • data mart administration?organization can not
    easily perform administration of multiple data
    marts, giving rise to issues such as data mart
    versioning, data and meta-data consistency and
    integrity, enterprise-wide security, and
    performance tuning . Data mart administrative
    tools are commerciallly available
  • data mart installation?data marts are becoming
    increasingly complex to build. Vendors are
    offering products referred to as data mart in a
    box that provide a low-cost source of data mart
    tools
Write a Comment
User Comments (0)
About PowerShow.com