Title: Data Warehouse
1Presentation on Data warehouse
Made by Himanchal Tiwari
9029084374
Himanchal
2Question What is data
warehouse? Answer Data warehouse is powerful
database model. That Users ability to quickly
analyze large and multi dimensional set.
Himanchal
3- Data in the warehouse must have a strong analytic
characteristics, i.e. - Subject Oriented Data
- Integrated Data
- Time Referenced Data
- Non-Volatile Data .
Himanchal
4- (1)Subject Oriented Data -
- Data in the warehouse is group by subject rather
than by activity. - Data in database is organized by activities e.g.
Payroll processing, Shipping Processing, Loan
processing. - It Helps reducing response time of queries.
- (2)Integrated Data -
- Integration means de-duplicating information and
merging it from many sources into one consistent
location. - Integrating data is one of the most happing
activity that happens in the DW as it reduce
response time of queries. - E.g.
- Employee Table
- Account Table
- Integrated Data
- (3)Time Referenced Data -
- Data is always is referred its time value
characteristics. - (4)Non-Volatile Data-
- Non-Volatile Data helps users to dig into deep
history and In DW is non changeable.
Himanchal
5Explain Operational Vs.
Informational System
Himanchal
6- Operational System
- O.S. as the name implies are the system are the
system that help everyday operational of
business. - For e.g. inventory, or order entry,
manufacturing, payroll and account etc. - They are backbone of any business operational.
- Most organization cannot function without O.S.
and data maintained by the system. - They focus single area and hence have limited
data. - Informational System
- (i) For function like planning, forecasting,
managing, marketing, engineering. - (ii) The above function requires knowledge based
system like informational. - (iii) I.S. spans large area by grouping together
several single areas. - (iv) I.S. deal with analyzing data and making
decisions. - (v) I.S. have different scope and need large
amount of operational.
Himanchal
7Framework of DW Architecture of Data Warehouse
Himanchal
8- Source System.
- Source data transport layer.
- Data quality Control and data profiling layer.
- Metadata management layer.
- Data integration layer.
- Data processing layer.
- End user reporting layer.
Himanchal
9- 1. Source system-
- For the processing of business transection,
operational data exit. - But because of limited focus, operational
database cannot be used to access data for
informational purpose. - Many operational system are 10-15 year old and
hence the data and data accessing technology is
older. - Goal of data warehousing is to free the data
locked in O.S. - Information in outside database obtained may be
demographic, econometric of competitive. - This information super highway required more data
resources. - 2. Source data transport layer-
- Data Transport Layer of DWA does data trafficking
. - It represents tools and process involved in
transporting data from source system to
warehouse. - FTP is extensively used for data transmission.
- (3) Data quality control and data Profiling layer
. - Incomplete and inaccurate data causes DW failures
. - DW does not generate data on its own ,rather
depends on source system. - (4) Metadata management Layer.
- For Fully Function Warehouse, it is necessary top
have verity of metadata. - Metadata is info or data about data.
- End users should be able to access data without
having known where the data reside in the DW.
10- (5)Data integration layer .
- This layer integrate data obtained from various
source system. - Lot of formatting cleansing activities happen
in the layer. - This layer consists of heavy tools job control
procedure to keep the data warehouse up to date. - (6) Data Processing layer.
- DW is where dimensionally modeled data obtained
from different operational database resides
analytical purposes. - Data in the warehouse is stored in a form that it
is easy to access highly flexible. - (7) End user Reporting layer.
- Success of DW depends on access of valuable
information. - Based on business needs, there are different type
of reporting architectures. - Since most of report queries are analytical
there is tight integration between DW and
reporting architecture.
Himanchal
11- Explain the Development of DW
- Developing DW is an IT Project That requires
careful planning, requirement definition , design
prototype and implementation. - Following are the step involved in the
development of warehouse. - Developing strategy.
- Evolving DWA.
- Designing Data warehouses.
- Managing Data warehouses.
Himanchal
12Data Warehouse Options Key factors that need to
be considered ? Scope of the data warehouse ?
Data redundancy ? Type of end-user Scope The
scope of a data warehouse may be as broad as all
the informational data for the entire enterprise
from the beginning of time, or it may be as
narrow as a personal data warehouse for a
single. manager for a single year. There is
nothing that makes one of these more of a data
warehouse than another. In practice, the broader
the scope, the more valuable the warehouse is to
the enterprise and the more expensive and time
consuming it is to create and maintain.
Himanchal
13Data Redundancy There are essentially three
levels of data redundancy that enterprises should
think about when considering their data warehouse
options ? Virtual or point-to-point data
warehouses ? Central data warehouses ?
Distributed data warehouses Virtual or
point-to-point Data Warehouses A virtual or
point-to-point data warehousing strategy means
that end-users are allowed to get at operational
databases directly, using whatever tools are
enabled to the data access network. Virtual
data warehouses often provide a starting point
for organizations to learn what end-users are
really looking for. Central Data Warehouses The
central data warehouse is a single physical
database that contains all data for a specific
functional area, department, division, or
enterprise. Such warehouses are often selected
where there is a common need for informational
data and there are large numbers of end-users
already connected to a central computer or
network. A central data warehouse may contain
records for any specific period of time and
usually, contains information from multiple
operational systems.
Himanchal
14Goals In order to provide information with which
users can increase profitability, gain
competitive advantage or make better business
decisions, the data warehouse must meet the
following goals Provide Easy Access to Corporate
Data ? The user access tools must be easy to
use, so that accessing warehouse data will be
simple and intuitive ? Access should be graphic
so that it will be easier for users to understand
and spot trends or area of interest ? Access
should be manageable by end users of data
warehouse, for these are predominantly business
analysts, mid-level managers, and CEOs who seldom
possess technical acumen to get the data but are
highly qualified to analyze the information. They
must easily get answers to their questions and
ask new questions, all without getting the IT
team involved ? The process of getting and
analyzing data must be fast. Questions leapfrog,
so you must get answers fast. The very nature of
data analysis is that not all questions are known
beforehand. This involves a lot of unpredictable,
ad-hoc inquiry during a typical data analysis
session. In an analytic environment, the user is
directly interacting with the facts to find
patterns, clues or problem areas, rather than
looking at a printed report. The answers have to
be delivered fast before users lose their train
of thought Provide Clean and Reliable Data for
Analysis ? For consistent analysis, the data
environment must be stable. Now that the users
can create their own reports, they must have
consistent data. One department doing an analysis
must get the same result as any other. ? Source
conflicts must be resolved. The source for
warehouse data is the transactional system and it
is frequently seen that these systems often have
data stored in several different applications in
different formats. A customer ID may be
represented as 100044629 in one system and
44629 in another. A straight load of
information for this company into the data
warehouse would result in two customers being
represented with transactions being listed and no
supporting details. ? Historical analysis must
be possible, so that data can be analyzed cross a
span of time.
Himanchal
15Thanks Watch my Presentation