Title: Data Warehousing
1Data Warehousing
2? Outline
- The benefit of data warehousing
- Differences between OLTP and data warehousing
- The architecture of data warehouse
- The importance of managing meta-data
3? What is data warehousing?
- data warehousing is subject-oriented, integrated,
time-variant, and non-volatile collection of data
in support of managements decision-making
process. - a data warehouse is data management and data
analysis - data webhouse is a distributed data warehouse
that is implement over the web with no central
data repository - goal is to integrate enterprise wide corporate
data into a single reository from which users can
easily run queries
4? What is data warehousing?
- Subject-oriented?WH is organized around the major
subjects of the enterprise..rather than the major
application areas.. This is reflected in the need
to store decision-support data rather than
application-oriented data - Integrated?because the source data come together
from different enterprise-wide applications
systems. The source data is often inconsistent
using..The integrated data source must be made
consistent to present a unified view of the data
to the users - Time-variant?the source data in the WH is only
accurate and valid at some point in time or over
some time interval. The time-variance of the data
warehouse is also shown in the extended time that
the data is held, the implicit or explicit
association of time with all data, and the fact
that the data represents a series of snapshots - Non-volatile?data is not update in real time but
is refresh from OS on a regular basis. New data
is always added as a supplement to DB, rather
than replacement. The DB continually absorbs this
new data, incrementally integrating it with
previous data
5? The benefits of data warehousing
- The potential benefits of data warehousing are
high returns on investment.. - substantial competitive advantage..
- increased productivity of corporate
decision-makers..
6? The difference bewteen OLTP and data warehousing
- A DBMS built for online transaction processing
(OLTP) is generally regarded as unsuitable for
data warehousing because each system is designed
with a differing set of requirements in mind - example OLTP systems are design to maximize the
transaction processing capacity, while data
warehouses are designed to support ad hoc query
processing
7comparision of OLTP systems and data warehousing
system
8? Problems
- Underestimation of resources for data loading
- Hidden problems with source systems
- Required data not captured
- Increased end-user demands
- Data homogenization
- High demand for resources
- Data ownership
- High maintenance
- Long-duration projects
- Complexity of integration
9? The architecture
-
- Query Manage
- Warehouse Manager
Operational data source1
Reporting, query, application development, and
EIS(executive information system) tools
High summarized data
Meta-data
Operational data source 2
Lightly summarized data
Load Manager
Operational data source n
Detailed data
OLAP(online analytical processing) tools
Operational data store (ods)
Operational data store (ODS)
Data mining
Archive/backup data
End-user access tools
Typical architecture of a data warehouse
10? The main components
- Operational data sources?for the DW is supplied
from mainframe operational data held in first
generation hierarchical and network databases,
departmental data held in proprietary file
systems, private data held on workstaions and
private serves and external systems such as the
Internet, commercially available DB, or DB
assoicated with and organizations suppliers or
customers - Operational datastore(ODS)?is a repository of
current and integrated operational data used for
analysis. It is often structured and supplied
with data in the same way as the data warehouse,
but may in fact simply act as a staging area for
data to be moved into the warehouse
11? The main components
- load manager?also called the frontend component,
it performance all the operations associated with
the extraction and loading of data into the
warehouse. These operations include simple
transformations of the data to prepare the data
for entry into the warehouse - warehouse manager?performs all the operations
associated with the management of the data in the
warehouse. The operations performed by this
component include analysis of data to ensure
consistency, transformation and merging of source
data, creation of indexes and views, generation
of denormalizations and aggregations, and
archiving and backing-up data
12? The main components
- query manager?also called backend component, it
performs all the operations associated with the
management of user queries. The operations
performed by this component include directing
queries to the appropriate tables and scheduling
the execution of queries - detailed, lightly and lightly summarized
data,archive/backup data - meta-data
- end-user access tools?can be categorized into
five main groups data reporting and query tools,
application development tools, executive
information system (EIS) tools, online analytical
processing (OLAP) tools, and data mining tools
13? Data flows
- Inflow- The processes associated with the
extraction, cleansing, and loading of the data
from the source systems into the data warehouse. - upflow- The process associated with adding value
to the data in the warehouse through summarizing,
packaging , packaging, and distribution of the
data - downflow- The processes associated with archiving
and backing-up of data in the warehouse - outflow- The process associated with making the
data availabe to the end-users - Meta-flow- The processes associated with the
management of the meta-data
14Reporting, query,application development, and EIS
(executive information system) tools
Operational data source1
Warehouse Manager
Meta-data
Meta-flow
High summarized data
Inflow
Outflow
Lightly summarized data
Load Manager
OLAP (online analytical processing) tools
Query Manage
Upflow
Operational data source n
Detailed data
Operational data store (ods)
Data mining tools
End-user access tools
Downflow
Archive/backup data
Information flows of a data warehouse
15? Tools and Technologies
- The critical steps in the construction of a data
warehouse - a. Extraction
- b. Cleansing
- c. Transformation
- after the critical steps, loading the results
into target system can be carried out either by
separate products, or by a single, categories - code generators
- database data replication tools
- dynamic transformation engines
16? Data Warehouse DBSM(integration)
- due to the maturity of such products, most
relational databases will integrate predictably
with other types of software - The reqirements for data warehose RDBMS
- Load performance
- Load processing
- Data quality management
- Query perfomance
- Terabyte scalability
- Mass user scalability
- Networked data warehouse
- Warehouse administration
- Integrated dimensional analysis
- Advanced query funtionlity
17? The importance of managing meta-data(integration
)
- The integration of meta-data, that is data about
data - Meta-data is used for a variety of purposes and
the management of it is a critical issue in
achieving a fully integrated data warehouse - The major purpose of meta-data is to show the
pathway back to where the data began, so that the
warehouse administrators know the history of any
item in the warehouse - The meta-data associated with data transformation
and loading must describe the source data and any
changes that were made to the data - The meta-data associated with data management
describes the data as it is stored in the
warehouse - The meta-data is required by the query manager to
generate appropriate queries, also is associated
with the user of queries
18- The major integration issue is how to synchronize
the various types of meta-data use throughout the
data warehouse. The challenge is to synchronize
meta-data between different products from
different vendors using different meta-data
stores - Two major standards for meta-data and modeling in
the areas of data warehousing and component-based
development-MDC(Meta Data Coalition) and
OMG(Object Management Group)
19? Administration and Management Tools
- a data warehouse requires tools to support the
administration and management of such complex
enviroment. - for the various types of meta-data and the
day-to-day operations of the data warehouse, the
administration and management tools must be
capable of supporting those tasks - monitoring data loading from multiple sources
- data quality and integrity checks
- managing and updating meta-data
- monitoring database performance to ensure
efficient query response times and resource
utilization
20- auditing data warehouse usage to provide user
chargeback information - replicating, subsetting, and distributing data
- maintaining effient data storage management
- purging data
- archiving and backing-up data
- implementing recovery following failure
- security management
21? Data mart
- data mart? a subset of a data warehouse that
supports the requirements of particular
department or business function - The characteristics that differentiate data marts
and data warehouses include - a data mart focuses on only the requirements of
users associated with one department or business
function
22- data marts do not normally contain detailed
operational data, unlike data warehouses - as data marts contain less data compared with
data warehouses, data marts are more easily
understood and navigated
23Operational data source1
Warehouse Manager
Meta-data
High summarized data
Reporting, query,application development, and
EIS(executive information system) tools
Operational data source 2
Lightly summarized data
Query Manage
Load Manager
Detailed data
Operational data source n
OLAP(online analytical processing) tools
Operational data store (ods)
Data mining
(First Tier)
(Third Tier)
Operational data store (ODS)
Archive/backup data
End-user access tools
Data Mart
summarized data(Relational database)
Summarized data (Multi-dimension database)
(Second Tier)
Typical data warehouse adn data mart architecture
24Reasons for creating a data mart
- To give users access to the data they need to
analyze most often - To provide data in a form that matches the
collective view of the data by a group of users
in a department or business function - To improve end-user response time due to the
reduction in the volume of data to be accessed - To provide appropriately structured data as
ditated by the requirements of end-user access
tools - Normally use less data so tasks such as data
cleansing, loading, transformation, and
integration are far easier, and hence
implementing and setting up a data mart is
simpler than establishing a corporate data
warehouse
25- The cost of implementing data marts is normally
less than that required to establish a data
warehouse - The potential users of a data mart are more
clearly defined and can be more easily targeted
to obtain support for a data mart project rather
than a corporate data warehouse project
26data marts issues
- data mart functionality?the capabilities of data
marts have increased with the growth in their
popularity - data mart size?the performance deteriorates as
data marts grow in size, so need to reduce the
size of data marts to gain improvements in
performance - data mart load performance?two critical
components end-user response time and data
loading performance?to increment DB updating so
that only cells affected by the change are
updated and not the entire MDDB structure
27- users access to data in multiple marts?one
approach is to replicate data between different
data marts or, alternatively, build virtual data
mart?it is views of several physical data marts
or the corporate data warehouse tailored to meet
the requirements of specific groups of users - data mart internet/intranet access?its products
sit between a web server and the data analysis
product.Internet/intranet offers users low-cost
access to data marts and the data WH using web
browsers. - data mart administration?organization can not
easily perform administration of multiple data
marts, giving rise to issues such as data mart
versioning, data and meta-data consistency and
integrity, enterprise-wide security, and
performance tuning . Data mart administrative
tools are commerciallly available - data mart installation?data marts are becoming
increasingly complex to build. Vendors are
offering products referred to as data mart in a
box that provide a low-cost source of data mart
tools -